Almost Timely News: 🗞️ How To Get Started with Hosted Open Weights AI (2026-02-22)
All the power of AI at 5% of the cost
Almost Timely News: 🗞️ How To Get Started with Hosted Open Weights AI (2026-02-22) :: View in Browser
The Big Plug
Two things to try out this week:
1. Got a stuck AI project? Try out Katie’s free AI Readiness Assessment tool. A simple quiz to help predict AI project success.
2. Wonder how your website is seen by AI? Try my free AI View tool (limited to 10 URLs per day). It looks at your site and tells you what an AI crawler likely sees - and what to fix.
Content Authenticity Statement
95% of this week’s newsletter content was originated by me, the human. You’ll see outputs from Claude Code in the opening segment. Learn why this kind of disclosure is a good idea and might be required for anyone doing business in any capacity with the EU in the near future.
Watch This Newsletter On YouTube 📺
Click here for the video 📺 version of this newsletter on YouTube »
Click here for an MP3 audio 🎧 only version »
What’s On My Mind: How To Get Started with Hosted Open Weights AI
This week, let’s talk about using open weights models from a hosted provider. There are many situations where you’d want to use something like a state of the art (SOTA) open weights model but you don’t have the hardware to run it yourself. I’ll show you how to get started, what it will cost (it’s not free), and how to start using them.
Part 1: Glossary
If that all sounded like word salad, then let’s get the table set with some definitions.
Open weights: in the world of AI, there are two fundamental types of AI models, closed weights and open weights. Closed weights models are kept secret by providers. You can’t download them or exert much control over them; these are models like OpenAI’s GPT-5.3, or Google’s Gemini 3.1. Open weights models are models that you can download and install on your computer or in a third party provider. The models themselves are usually free.
SOTA: state of the art. Generally, this term refers to any AI model that tops benchmark charts.
Inference: when AI is generating stuff, it’s called inference. When it’s learning, that’s called training. For end users like you and me, we are almost always doing inference. This is important because we’re looking mainly for inference providers, which is the name of the type of company that hosts open weights AI models.
Prompt Caching: when we’re shopping for AI model hosting companies, look for companies that offer solid prompt caching. This saves the unchanging parts of a prompt from task to task, which can result in substantial cost savings.
Parameters: parameters are the statistical associations in a model that represent its knowledge. The more parameters a model has, generally speaking, the more knowledge it has. An 8 billion parameter model (which is relatively small) will have much less broad knowledge than an 8 trillion parameter model. The fewer parameters a model has, the more likely it is to hallucinate without access to tools.
Tools: in the context of AI, tools are anything that an AI model can use if it’s told is available. The most common tool is web search - when we perform a task that requires external knowledge, a model can fire up a web search to get current information. Other tools include things like the ability to talk to specific applications like your CRM or email inbox, etc.
Context window: AI models all have long and short term memory. Their long term memory is encoded in their parameters. Their short term working memory is called a context window, measured in tokens.
Tokens: the mathematical unit that AI operates in, typically about 3/4 of a word. When we talk about a model’s context window, it’s measured in tokens. The more tokens a model has in its context window, the more complex and detailed a task it can do. Models like Claude Opus 4.6 and Gemini 3.1 have 1 million token context windows, which means they can work with about 750,000 words at a time.
API: short for application programming interface, an API is how software packages talk to each other. Your AI interface connects to an inference provider via an API.
Zero Data Retention: A policy used by technology companies that states they do not keep information you send to them. Especially important for AI where your prompts and responses often contain valuable or sensitive information.
Part 2: Reasons for Open Weights Models
Let’s dig into the specific use cases. The most obvious question about open weights models is, why would you want to use an open weights model versus one of the premier SOTA models like Gemini 3.1 or Opus 4.6? If you already have ChatGPT, isn’t that good enough?
There are four major reasons to consider open weights models. First is privacy - depending on the inference provider you work with, they may have policies like Zero Data Retention (ZDR). For data that is commercially sensitive (but still allowable with safe third parties), using an inference provider that offers ZDR will be more private than using a commercial provider like OpenAI or Google that may retain your data for 30 days or more - and if you’re using the free versions, your data is probably being used to train future models and is retained in perpetuity.
Almost every major SOTA big tech provider has some form of data retention, so if privacy is important to you (and you’re working with material that is still acceptable to be briefly on a third party’s infrastructure), then using open weights models via an inference provider might fit the bill. You still get near-SOTA capabilities but much more privacy.
Note that for truly sensitive, confidential data, even a ZDR inference provider is still technically a third party. Use local models hosted on your own infrastructure if you have truly confidential data that cannot ever be in the hands of a third party.
The second major reason is cost. Open weights models typically cost much less than their closed weights counterparts. In this chart from Artificial Analysis, we see the typical consultant’s 2x2 matrix - intelligence versus cost. The model that’s just about right is GLM-5 from z.ai, the Chinese AI company Zhipu.
Intelligence and cost are tradeoffs in AI - because AI providers bill by token amounts, sometimes a smarter model ends up being cheaper because it requires less thinking and chasing its own tail than a dumber, cheaper model that has to retry a task many times to get it right.
Here’s a sense of the costs. Prior to this month, Google’s Gemini 3.0, Claude Opus 4.5 and GLM 5 were comparable on intelligence. Here’s how their API costs compare:
Opus 4.5 input cost: $5 per million tokens
Opus 4.5 output cost: $25 per million tokens
Gemini 3 input cost: $2 per million tokens
Gemini 3 output cost: $12 per million tokens
GLM-5 input cost: $0.80 per million tokens
GLM-5 output cost: $2.56 per million tokens
(Pricing is from US-based DeepInfra inference provider for GLM-5.)
GLM-5, comparable in intelligence to these two peers, costs in some cases 1/10th of what its competitors cost. Moonshot AI’s Kimi K2.5 model is half the cost of GLM-5, making it 1/20th the cost of Opus 4.5 for similar performance. And coders’ favorite Minimax-M2.5 is half the cost of that, at 27 cents per million tokens in, 95 cents per million tokens out.
One important clarification is that when we’re talking about Chinese models like GLM5, et cetera, we are not talking about using them through the Chinese company’s infrastructure. We are talking about using them with an inference provider company of our choosing. Because open weights models are models that anyone can download, there are all these cottage industries of companies that have set up shop that host these freely available models. So be clear, we’re not using a model that is based in the People’s Republic of China. We’re using a model that came from China, but can be used anywhere.
The current generation of open weights models, performance wise, are on par with the previous generation of SOTA models (which were current until the end of January 2026, so not that long ago). For 1/20th the cost of the market leader, you could get the same power.
Now, if you’re using AI in a chat capacity, that might not mean much, especially if you’re doing one-off tasks like writing a blog post. But if you’re using AI agents, and especially the latest breed of autonomous agents like OpenClaw and its derivatives? Using it with Anthropic’s API (which it was originally designed to do) could cost you hundreds of euros or dollars a month, maybe more. When you switch to an open weights model, you’re cutting your costs by up to 95% for the same level of intelligence. One of the reasons Minimax-M2.5 has become so popular in the last month is because it’s well-suited for agents like OpenClaw - fast, smart, and cheap.
This is doubly important for anyone who is on limited resources. For example, Claude Code is incredibly powerful, as an example, but you might not have USD 200 to spend monthly on it. What if you could get the same kind of power and performance on USD 10 a month? Using a model like Minimax-M2.5, if the model is suited for your use cases, would let you have the best of both powers - high intelligence and low costs.
I think of my many friends and colleagues who are between jobs right now. Claude Code’s capabilities could make a huge difference for them, but Anthropic’s limits on the free and lowest cost subscriptions put those capabilities out of reach. And they might not have 200 bucks a month to buy the max subscription. Open weights models would let them have that same capability for 10 bucks a month.
The third reason is control. Many open weights models have less censorship or different censorship than the models and systems we’re used to. Systems like ChatGPT and Gemini are heavily censored, prohibiting you from asking specific types of requests even if you have a legitimate use case. Open weights models vary in terms of level of censorship from model to model, but because an inference provider gives you access to the raw model itself, chances are you may have more flexibility. You have more control over how AI responds.
Whenever I bring this up in conversation, people look at me and wonder exactly what I’m asking AI. I’ll give you a practical use case for wanting to avoid censorship. There’s one client that works in the laboratory reagents space. Their product catalog is a who’s who of nearly every chemical under the sun, including some chemicals that are highly restricted. As a certified reagents provider, they are legally authorized to deal in those chemicals, but if they try to have ChatGPT or similar work on their product catalog, they’re often told they’re asking about restricted things and can’t proceed.
For this company to harness the power of AI, they need models and providers that allow them to work with the chemicals and reagents in their catalog, and many open weights models fit that bill.
The fourth use case is backups. Sometimes... things break. You show up at work one day and everyone else is panicking that ChatGPT is down or Gemini is down or something else. When you use a separate inference provider, depending on where that provider is located, you might be immune to the outage, allowing you to keep getting things done while everyone else wonders what to do besides stare at an error message.
More likely, every provider today that offers tools like Claude Code, OpenAI Codex, Google Antigravity, etc. gives you a certain amount of API usage included with your subscription each week. That usage may or may not go very far depending on the tasks you’re working on. If you have an open weights inference provider that’s pay as you go, then as long as you can keep paying, you can keep going. There are no limits that you have to budget for.
Anyone who’s used Claude Cowork or Claude Code has run into that dreaded “You’ve used 97% of your quota” message at some point - usually when you’re in the middle of a critical change or project. Having an open weights provider on standby means you can use Claude Code with your regular plan and when you hit limits, use a tool like Claude Code Router to switch to your open weights provider and keep on going.
Part 3: Getting Started
Okay, so that sounds great - open weights models are powerful, cheap, private, capable, and have fewer restrictions. What was state of the art three weeks ago is now commodity today. But how do you actually use these things? How would someone get started?
First, you need an interface of some kind. No one ever uses an AI model directly; instead, there’s some kind of wrapper around it, which in nerdspeak is called a harness. Think of the AI model as the engine; the harness is the car that the engine sits in. When we use models through inference providers, we are essentially given the ability to choose the engine AND what kind of car we want to put the engine in.
If you’re used to chatting with an environment like ChatGPT, you’ll want some kind of chat interface. As we talked about in a previous issue, you can use desktop apps like Anything LLM and connect them to any inference provider.
If you’re a coder, you’ll be able to use the model’s API right inside most coding tools like Cline, Kilo Code, Qwen Code, etc. If you’re a highly technical person, you might want something like Claude Code Router that can let you keep all your settings and tools you’ve built in Claude Code, but change the engine it uses. (It also lets you use Claude Code without an Anthropic subscription, which is a nice plus)
Second, you need an inference provider. Here’s a starting prompt to commission any Deep Research report on:
You’re an inference provider specialist who knows open weights model hosting. Provide an analysis of inference providers located in {your country or region} that offer a Zero Data Retention API (mandatory) and offer these specific models: Minimax-M2.5, Kimi K2.5, z.ai GLM-5. Providers must offer all 3 models to qualify for this analysis. Once you’ve located the providers, double check their privacy policies. Then provide the costs of the providers per million tokens of input and output for these three models. Your final output should be a Markdown list with the following fields per provider: provider name, privacy policy URL that contains a valid zero data retention (ZDR) API, cost per million tokens for GLM-5 as input and output, cost per million tokens for Minimax-M2.5 as input and output, cost per million tokens for Kimi K2.5 as input and output. Order providers by cost per million tokens output in ascending order. The intended purpose of this is to find the most affordable, private inference provider for open weights models, especially for coding and general use.
Run this in any Deep Research tool of your choice and examine the results. Double check the privacy policies! There is no substitute for human review when it comes to things like privacy and security, especially if you plan on working with sensitive data.
A few providers that I’m familiar with in my region - and none of them have paid to be listed:
DeepInfra - highly cost effective, my first choice
Together AI
Baseten
Fireworks AI
Once you’ve chosen your inference provider and set up your account with them - almost all of them require you to put some money down - then you’ll get your API keys to use with your environment. Store them someplace securely.
Next, in the interface you’re using, look for how to input your API key and choose your model. Here’s an example; I’m using DeepInfra as my provider, so I have an API key for their service. If I want to use it with Anything LLM, I’d take a screenshot of my settings, the documentation for the app, and this starter prompt:
You’re a local AI expert skilled at model hosting and serving. I’ve just installed AnythingLLM on my computer and now I need help setting it up. I want to use {your selected inference provider} and I want it configured for minimum logging and maximum privacy. Visit the website at
https://docs.anythingllm.com/
and find the documentation to help me set it up to use this provider and their API, then build me a step by step guide for setting it up with my requirements of minimum logging and maximum privacy. I am a {novice/intermediate/advanced} technical user, so create your instructions accordingly, with step by step, specific instructions. Ask me questions until you have enough information to complete this task successfully, keeping in mind my level of technical skill.
This will walk you through the setup based on your level of technical skill, helping you get up and running.
Part 4: Testing Models
Your next step is to develop a model test that will help you decide which models to use that best suit your purposes. What do you usually use AI for?
For example, suppose you use AI to write LinkedIn posts. Take your top 5 best performing LinkedIn posts you’ve ever written and reverse engineer a prompt out of them.
Here’s an example of one tailored specifically for me and my writing style, derived from my own top posts in the last year. Claude reverse engineered this:
You will be writing a LinkedIn post for AI expert Christopher Penn based on raw bullet points provided below.
<raw_thoughts>
{{RAW_THOUGHTS}}
</raw_thoughts>
**Your Role and Audience:**
You are writing as an expert Data Scientist, AI Keynote Speaker, and pragmatist. Your audience consists of 45,000 business professionals, marketers, and tech leaders who follow Christopher Penn on LinkedIn. Your goal is to transform the raw bullet points above into a highly engaging, thought-provoking LinkedIn post that challenges the status quo with empirical evidence and hard facts.
**Required Narrative Arc:**
Your post must follow this exact structure:
1. **The Hook:** Start with a punchy, declarative sentence. Ground it in a real-world scenario (e.g., "I'm sitting on a plane..."), a specific experiment, or a hard news item (e.g., a court order). Make the reader stop scrolling immediately.
2. **The Proof ("Show, Don't Tell"):** Expand on the hook using specific, verifiable details. Use numbers, exact wattages, explicit quotes, or legal realities. Do not summarize; show the raw data or the specific observation.
3. **The Provocation (The Hard Truth):** State the uncomfortable reality. Challenge conventional wisdom directly. If humans are producing bad work, call it out. If a popular tool is failing on privacy, state it plainly. Rely entirely on verifiable facts and objectivity. Do not attempt to provide a "balanced view." Give the most truthful, factual assessment based on the data.
4. **The Business Impact (The Takeaway):** Bring it home for a business leader or stakeholder. Frame the conclusion around scale, quality, speed, cost, or privacy constraints. What is the stark reality they need to accept or act upon today?
**Strict Tone and Formatting Rules:**
- **ACTIVE VOICE ONLY:** Write strictly in the active voice. Absolutely no passive voice constructions. No exceptions.
- **Paragraph Length:** Write for mobile scrollers. Keep paragraphs to 1-3 sentences maximum. Use frequent line breaks.
- **No Fluff or Jargon:** Ban these words and phrases entirely: "delve," "unlock," "in today's fast-paced digital landscape," "synergy," "tapestry," "leverage," "ecosystem," "paradigm shift," "game-changer," "cutting-edge," "revolutionary" (unless describing an actual revolution). Speak like a direct, seasoned data scientist talking to a peer.
- **Standalone Value:** Do not include a Call to Action (CTA). Do not ask a question to "drive engagement." Do not prompt the user to click a link, comment, or share. The post must deliver 100% of its value natively in the text.
- **No Engagement Bait:** Do not end with questions like "What do you think?" or "Have you experienced this?" The post stands on its own merit.
- **Hashtags:** End the post with this exact block of hashtags on a new line:
`#AI #GenerativeAI #GenAI #ChatGPT #ArtificialIntelligence #LargeLanguageModels #MachineLearning #IntelligenceRevolution`
**Before Writing:**
Use a scratchpad to plan your approach. Identify the core insight from the raw thoughts, determine what specific evidence or data you can cite, and outline how you'll structure each section of the narrative arc.
<scratchpad>
[Analyze the raw thoughts here. Identify:
- What is the core observation or experiment?
- What specific, verifiable details can I cite?
- What conventional wisdom does this challenge?
- What is the business impact?
- What visual proof would support this?]
</scratchpad>
**Your Output:**
After your scratchpad, write the complete LinkedIn post following all the rules above. After the post and hashtags, add a separate section with visual suggestions.
Format your final output as follows:
<linkedin_post>
[Your complete post text here, ending with the required hashtags]
</linkedin_post>
<visual_suggestion>
[Suggest 1-2 ideas for a realistic, authentic image or screenshot to attach to this post as visual proof. Examples: a terminal window showing output, a snippet of a legal document, a side-by-side text comparison, a graph of actual data, a screenshot of a specific tool's interface showing the issue discussed.]
</visual_suggestion>
Remember: Your final output should contain only the scratchpad, the complete LinkedIn post inside the specified tags, and the visual suggestion. The post itself must be written entirely in active voice, contain no engagement bait, and deliver complete standalone value.Once you’ve got your own version of this testbed prompt written, then take something you’d write about and put it in the raw thoughts section.
Then in a tool like Anything LLM, run this prompt for every model you want to test out. For good measure, use the exact same prompt with your favorite closed weights model/tool (like ChatGPT) as well.
Compare the results. Which model did the best job, in your opinion? This is critically important - you have to be the judge of whether or not the model got the job done. Of the open weights models available to you in your inference provider, which got the closest to the way you actually do the task?
And every time a new model comes out, you have a benchmark test that is specifically suited to you that you can evaluate the model with. If you want to be sure, run multiple tests per model to see how it does, especially if you use AI for a variety of tasks.
If you’re a more advanced user? Set up a workflow in a system like Claude Code or n8n to programmatically test the different tasks you’d want to evaluate AI on. Maybe you’re a web developer - develop a complex prompt for an infographic or an interactive, and see which model gets closest to the way you’d do it. If you’re a coder, give it a task and your coding standards and see which model produces working code with as few revisions as possible.
Whatever the case is, having some way to validate the performance of an AI model is absolutely mandatory if you want to know whether any given model will do what you want.
Part 5: Get to Work
Now that you’ve got an inference provider chosen, you’ve got a local interface chosen, and you’ve tested the models, you’re ready to get to work. You’re ready to start using open weights models at a safe, private inference provider for anything you’d normally use a closed weights model provider for.
That means you have access to private, secure AI, AI that is on standby (especially with pay as you go providers) for when your normal tool of choice breaks or is unavailable. Anyone who’s been using AI for more than a minute knows that whenever the big tech companies are about to release or do release a new piece of technology, they get overwhelmed by demand and their services are almost unusable for a couple of days while they scramble to meet demand.
During periods like that, you’re ready to go, and everyone else at the office will be wondering how you remain productive while they stand around the coffee machine.
I’ll share my own story here. I love to use AI to build and make stuff, and Trust Insights fortunately is a company that invests very heavily in both technology and its people. However, there are a ton of little projects on the side I have that have zero business value, like this one video game idea I had that’s just absurd. No business value at all. Side projects like this almost never have business value, and thus when I want to work on them, I’ll switch to DeepInfra and use a highly capable open weights model instead of using company resources.
Why? Because we subscribe to Claude’s Max plan, we get a generous amount of usage each week, but we share it as a team. Every token I use for a silly personal project is a token not available for the team to use that week, so to be responsible, I use open weights models for those things instead. (see my January 25th issue about locally hosted open weights models)
The same is true for token-intensive tasks that don’t always require the smartest, most expensive models to get them done, like building my daily briefing. Using an expensive, powerful model like Opus 4.6 to assemble a daily briefing from my to-do list and calendar is like taking a fighter jet to the grocery store. Yes, it can do the task very capably, but it’s vast overkill. Using an open weights model gets the job done just as well but doesn’t eat into our weekly token budget for Claude.
Another time, I was working on a major client project when Anthropic went down, last summer. It was a big outage, lasting for a good chunk of the workday, but I didn’t miss a step. I switched to an open weights model and was able to keep working while every other Claude user was unable to get anything done.
Open weights models should be part of the toolkit for every AI practitioner. You should have options for every major task of economic value you perform, and know - based on the testing you do like the testing in part 4 - which models are best at your specific tasks. Get started today - you might be surprised at how much you can do on a shoestring budget!
How Was This Issue?
Rate this week’s newsletter issue with a single click/tap. Your feedback over time helps me figure out what content to create for you.
Here’s The Unsubscribe
It took me a while to find a convenient way to link it up, but here’s how to get to the unsubscribe.

If you don’t see anything, here’s the text link to copy and paste:
https://almosttimely.substack.com/action/disable_email
Share With a Friend or Colleague
Please share this newsletter with two other people.
Send this URL to your friends/colleagues:
https://www.christopherspenn.com/newsletter
For enrolled subscribers on Substack, there are referral rewards if you refer 100, 200, or 300 other readers. Visit the Leaderboard here.
ICYMI: In Case You Missed It
Here’s content from the last week in case things fell through the cracks:
How AI Coworkers Can Automate Your Most Tedious Software Tasks
Agents vs. Skills in AI: Understanding the Key Difference for Smarter Automation
Why Detecting “AI Writing Style” Is As Useless As Trying to Spot “Men’s Writing
How AI Is Making Open Source Software a Viable Alternative to Expensive Commercial Tools
Almost Timely News: 🗞️ How I Think About Building with AI (2026-02-15)
In-Ear Insights: Cognitive Offloading, Deskilling, and The Impact of AI
On The Tubes
Here’s what debuted on my YouTube channel this week:
So What? How to Conduct Marketing Strategy Review with Agentic AI
Mind Readings: Introduction To Generative Engine Optimization
Skill Up With Classes
These are just a few of the classes I have available over at the Trust Insights website that you can take.
Premium
Free
👉 New! From Text to Video in Seconds, a session on AI video generation!
Never Think Alone: How AI Has Changed Marketing Forever (AMA 2025)
Powering Up Your LinkedIn Profile (For Job Hunters) 2023 Edition
Building the Data-Driven, AI-Powered Customer Journey for Retail and Ecommerce, 2024 Edition
The Marketing Singularity: How Generative AI Means the End of Marketing As We Knew It
Advertisement: New AI Book!
In Almost Timeless, generative AI expert Christopher Penn provides the definitive playbook. Drawing on 18 months of in-the-trenches work and insights from thousands of real-world questions, Penn distills the noise into 48 foundational principles—durable mental models that give you a more permanent, strategic understanding of this transformative technology.
In this book, you will learn to:
Master the Machine: Finally understand why AI acts like a “brilliant but forgetful intern” and turn its quirks into your greatest strength.
Deploy the Playbook: Move from theory to practice with frameworks for driving real, measurable business value with AI.
Secure Your Human Advantage: Discover why your creativity, judgment, and ethics are more valuable than ever—and how to leverage them to win.
Stop feeling overwhelmed. Start leading with confidence. By the time you finish Almost Timeless, you won’t just know what to do; you will understand why you are doing it. And in an age of constant change, that understanding is the only real competitive advantage.
👉 Order your copy of Almost Timeless: 48 Foundation Principles of Generative AI today!
Get Back To Work!
Folks who post jobs in the free Analytics for Marketers Slack community may have those jobs shared here, too. If you’re looking for work, check out these recent open positions, and check out the Slack group for the comprehensive list.
Data Architecture & Data Modeling Consultant (Full Time Role) at Global Data Strategy, Ltd
Vice President Of Artificial Intelligence at James Search Group
Advertisement: New AI Strategy Course
Almost every AI course is the same, conceptually. They show you how to prompt, how to set things up - the cooking equivalents of how to use a blender or how to cook a dish. These are foundation skills, and while they’re good and important, you know what’s missing from all of them? How to run a restaurant successfully. That’s the big miss. We’re so focused on the how that we completely lose sight of the why and the what.
This is why our new course, the AI-Ready Strategist, is different. It’s not a collection of prompting techniques or a set of recipes; it’s about why we do things with AI. AI strategy has nothing to do with prompting or the shiny object of the day — it has everything to do with extracting value from AI and avoiding preventable disasters. This course is for everyone in a decision-making capacity because it answers the questions almost every AI hype artist ignores: Why are you even considering AI in the first place? What will you do with it? If your AI strategy is the equivalent of obsessing over blenders while your steakhouse goes out of business, this is the course to get you back on course.
How to Stay in Touch
Let’s make sure we’re connected in the places it suits you best. Here’s where you can find different content:
My blog - daily videos, blog posts, and podcast episodes
My YouTube channel - daily videos, conference talks, and all things video
My company, Trust Insights - marketing analytics help
My podcast, Marketing over Coffee - weekly episodes of what’s worth noting in marketing
My second podcast, In-Ear Insights - the Trust Insights weekly podcast focused on data and analytics
On Bluesky - random personal stuff and chaos
On LinkedIn - daily videos and news
On Instagram - personal photos and travels
My free Slack discussion forum, Analytics for Marketers - open conversations about marketing and analytics
Listen to my theme song as a new single:
Advertisement: Ukraine 🇺🇦 Humanitarian Fund
The war to free Ukraine continues. If you’d like to support humanitarian efforts in Ukraine, the Ukrainian government has set up a special portal, United24, to help make contributing easy. The effort to free Ukraine from Russia’s illegal invasion needs your ongoing support.
👉 Donate today to the Ukraine Humanitarian Relief Fund »
Events I’ll Be At
Here are the public events where I’m speaking and attending. Say hi if you’re at an event also:
HFCU, Cambridge, March 2026
SSI, Charlotte, April 2026
The Trust Insights Generative AI Workshop, sometime this spring!
SMPS AI Conference, November 2026
There are also private events that aren’t open to the public.
If you’re an event organizer, let me help your event shine. Visit my speaking page for more details.
Can’t be at an event? Stop by my private Slack group instead, Analytics for Marketers.
Required Disclosures
Events with links have purchased sponsorships in this newsletter and as a result, I receive direct financial compensation for promoting them.
Advertisements in this newsletter have paid to be promoted, and as a result, I receive direct financial compensation for promoting them.
My company, Trust Insights, maintains business partnerships with companies including, but not limited to, IBM, Cisco Systems, Amazon, Talkwalker, MarketingProfs, MarketMuse, Agorapulse, Hubspot, Informa, Demandbase, The Marketing AI Institute, and others. While links shared from partners are not explicit endorsements, nor do they directly financially benefit Trust Insights, a commercial relationship exists for which Trust Insights may receive indirect financial benefit, and thus I may receive indirect financial benefit from them as well.
Thank You
Thanks for subscribing and reading this far. I appreciate it. As always, thank you for your support, your attention, and your kindness.
Please share this newsletter with two other people.
See you next week,
Christopher S. Penn





The 5% cost angle is spot on but I reckon the bigger story is how trivially you can plug these models into existing agent workflows now. I have been running Kimi K2.5 through Synthetic.new as a drop-in for Claude Code and the setup is literally one shell function https://reading.sh/how-to-get-3x-claude-rate-limits-for-30-a-month-1d3fdb8658df
The rate limits are the thing that pushed me over. 135 messages per 5-hour window for $30/month vs the roughly 45 you get on Claude Pro. For agentic loops where tool calls stack up fast, that gap matters more than the model delta.
Curious what providers you have been recommending to your audience? Most of the hosted open-weight options I have tried vary wildly in terms of reliability under sustained load.