The unit economics of wrapping an LLM (why flat AI pricing loses money)

Charging a flat $20 for a metered product is a subsidy with a leak: a few heavy users burn most of the tokens, and the average stops protecting you. The three companies that own the models each learned it and moved to usage based pricing. Here is the trap, the research under it, and what to charge instead.

In the first months of 2023, GitHub Copilot was losing money on the people who used it most. The product cost $10 a month. Running it cost Microsoft more than $20 a month per user on average, and for the heaviest users as much as $80, according to figures the Wall Street Journal reported and GitHub disputed. A $10 subscription, a $20 to $80 bill to serve it. That is the whole problem with selling AI for a flat price, in one line.

The pitch you have heard, wrap an LLM, charge $20 a month, print money, skips the part where the model runs every single time someone uses it. Classic software has economies of scale: you build it once and the thousandth user costs you close to nothing. An AI product is the opposite. Every query re-runs the model, every token has a cost, and a flat fee is a bet that your average user will not also be your heaviest. On a metered product that bet has a way of going wrong, and the proof is that the three companies that own the models, the ones with the lowest costs and the best engineers in the business, each made it and each backed out.

The short version

If you are building an AI product to earn, price it like the metered product it is:

Meter the expensive path. A quick chat is cheap. An agent looping over a long context, calling tools, and retrying is not. Charge for the part that actually costs you, by usage or credits, even if a simple base plan sits in front of it.
Find your heavy tail before you price. A small share of users will burn most of your tokens. Model the top few percent, not the average, then set the price so the tail pays for itself or hits a cap.
Cap or tier the top. Unlimited on a metered cost is a standing invitation to the one user who runs your agent around the clock. Every provider below ended up capping exactly that.
Charge for value, not for a door fee. A flat monthly fee anchors the customer to “I pay $20,” which is the worst possible frame on the day their usage costs you $80. Tie the price to what the work is worth.
Do not read a flat-rate margin as profit. If you have not metered your heaviest cohort, the margin you are reporting is an average with a loss hidden inside it.

Everything after this is the evidence, in case you want to argue with it.

Three times the model owners learned it

Start with the companies that should have the best unit economics in the business, because they pay wholesale for the thing everyone else marks up. If flat pricing broke for them, it is not a skill issue.

GitHub Copilot is the first. More than $20 a month per user on average, up to $80 for the heaviest, on a $10 product, in early 2023. GitHub disputed the framing and pointed at its revenue, and the figures are reported rather than audited, so hold the exact numbers loosely. The direction is the part that held: Copilot has since pushed most of its heavier plans toward usage based and metered tiers.

Anthropic is the second, and the receipt is its own policy. In July 2025 it put weekly rate limits on Claude Code, because a small number of users were running it continuously in the background, 24/7, and a few were sharing and reselling accounts. Anthropic estimated the new limits would affect less than 5% of subscribers. Read that number the right way: under 5% of users were consuming enough to make the company that makes the model ration them. That is the heavy tail, named by the outfit best positioned in the world to absorb it.

Cursor is the third, and the most instructive, because you can watch the trap close in real time. In June 2025 it changed its $20 Pro plan so the allowance became “$20 of frontier model usage per month at API pricing” instead of a fixed number of requests. Heavy users of the pricier models burned through $20 of real cost quickly and started hitting overages; one user reported $350 in a week against a $20 expectation. The backlash was loud enough that CEO Michael Truell apologized on July 4, said the company “didn’t handle this pricing rollout well,” and refunded the surprise bills. The lesson is not that Cursor priced wrong. It is that even moving from a broken flat plan toward real costs gets treated by customers as a betrayal, which is the deeper problem, and the next section.

Why you charge flat anyway

If usage based pricing is the answer, why does everyone start at a flat $20, and why did Cursor’s own customers revolt when it metered them? Because customers genuinely prefer flat rates, even when flat costs them more, and this is one of the better-documented findings in pricing research. In a 2006 study in the Journal of Marketing Research, Lambrecht and Skiera found a persistent “flat-rate bias”: in real billing data, about 48% of subscribers on a metered plan were overpaying versus a flat option they could have switched to, and they kept choosing it anyway.

Two forces drive it. An insurance effect: people will pay a premium for a predictable bill, for the comfort of knowing the number before they use the thing. And a taxi-meter effect: watching a meter tick while you actively use a product sours the experience, even when the total comes out lower.

For you, that cuts both ways. The good news is that a flat plan is the easy sell, because customers want it, so it is the path of least resistance to your first dollars. The bad news is the same sentence. The customers most attached to your flat plan are the ones getting the most for their money, and the day you try to line price up with cost, the taxi-meter effect turns your heaviest users into your angriest ones. That is the Cursor story compressed to one line. Flat-rate bias is why the trap is so easy to walk into and so painful to walk back out of.

The case for ignoring all of this

There is a real argument that none of this matters, and it deserves a fair hearing: inference is getting cheaper so fast that a product losing money today will be profitable next year just by waiting. The numbers behind it are genuinely staggering. Epoch AI finds the price to reach a fixed level of capability is falling between 9 and 900 times per year depending on the task, a median of about 50 times a year. By one series tracked in Stanford’s AI Index, the cost to run a model as capable as GPT-3.5 fell from about $20 to about $0.07 per million tokens in under two years, more than a 280-fold drop.

So why not price flat, eat the loss, and ride the curve down? Two reasons. First, the curve cuts the price of a fixed capability, and nobody sells last year’s capability. You and your customers both move to the newer, larger, costlier models as they land, which is why the frontier stays expensive even as yesterday’s frontier gets cheap. Second, and this is the one that settles it, cheaper inference lowers the whole curve but does not change its shape. A flat fee on a metered cost loses money on the heavy tail at $20 per million tokens and at $0.07 per million tokens alike; the tail just runs the meter longer to get there. You cannot out-cheapen a user who runs an agent around the clock. The category’s margins improve. The structure is still the trap.

What survives contact with users

The fix is not to fear AI products. It is to price them like the metered products they are, and to do it in a way customers can actually stomach.

Hybrid beats pure flat. A modest base fee for the predictable part, which is the comfort customers are paying for, with usage or credits on top for the expensive part. You keep the insurance effect for the bulk of users and stop subsidizing the tail.
Pick a value metric, not a token meter. Per token is accurate, but it is the taxi meter, the exact thing people hate. Price on something the customer values and can predict, a seat, a project, a run, a successful outcome, and size it to cover your cost.
Put the cap where the cost is. Unlimited is fine on the cheap path and dangerous on the agentic one. Tier or cap the part that loops.
Warn before the bill, not after. Cursor’s mistake was not metering, it was the surprise. Show usage, warn before overage, and the same change lands as fair instead of as a betrayal.

It helps to know what good looks like, because AI margins sit structurally below the software business they keep getting compared to. Venture analyses put AI-native gross margins around 50% to 60%, against 80% to 90% for mature SaaS, with inference eating roughly a fifth of revenue. Those figures are self-reported and improving as inference gets cheaper, so treat them as a gauge rather than a verdict. The point is to price knowing the meter is real. For what that leaves a solo founder after the model takes its cut, the economics of a one-person AI business has the take-home math; for the number that actually goes on the page, the pricing psychology that holds up is the companion to this piece.

The wrapper is not doomed. The flat $20 is. Wrapping an LLM can be a real business, but only if you price for the user who runs the meter hot, because on a metered product that user is not the exception you can average away. That user is the whole ballgame.

Wrap an LLM, charge $20, lose money: the pricing trap the model owners hit first

The short version

Three times the model owners learned it

Why you charge flat anyway

The case for ignoring all of this

What survives contact with users

Get the receipts in your inbox.

Sources & how we researched this