LLM API pricing confuses almost everyone at first: per-token billing, separate input and output rates, subscriptions versus pay-as-you-go, and prices that range from cents to dollars per million tokens for seemingly similar models. This guide explains each piece, shows representative 2026 prices, and gives you a simple way to estimate and cut your bill before it surprises you.
How per-token billing works
Nearly every LLM API charges by the token, not by the request or the hour. Get this one concept and the rest follows. A token is a piece of text — roughly ¾ of a word — and you’re billed for every token in your prompt and every token the model generates back. Providers quote prices per million tokens, which sounds large but adds up fast in a busy app.
Pricing, step by step
Understand per-token billing
LLM APIs bill by the token — a chunk of text roughly three-quarters of a word (so ~1,000 tokens ≈ 750 words). Every request charges for the tokens going in and coming out. Prices are quoted per million tokens. This is the foundation of every LLM bill: usage × price-per-token.
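To make "usage × price-per-token" concrete, here is a minimal sketch in Python. The function names are illustrative, the 0.75 words-per-token figure is the rule of thumb from the text (real tokenizers vary by model and language), and the $2.50 rate is just an example price.

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb from the text; not a real tokenizer


def estimate_tokens(word_count: int) -> int:
    """Rough token count for a given number of English words."""
    return round(word_count / WORDS_PER_TOKEN)


def token_cost(tokens: int, price_per_million: float) -> float:
    """Dollar cost of `tokens` at a rate quoted per million tokens."""
    return tokens * price_per_million / 1_000_000


tokens = estimate_tokens(750)      # about 1,000 tokens for 750 words
cost = token_cost(tokens, 2.50)    # example rate: $2.50 per million tokens
print(f"{tokens} tokens cost about ${cost:.4f}")
```

For an exact count you would run the provider's own tokenizer on your text. The word-based estimate is only good enough for early budgeting.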
Know input vs output pricing
The catch most newcomers miss: input and output are priced separately, and output almost always costs more, often 3–5× the input rate. A flagship might charge $15 per million input tokens but $75 per million output tokens. As a result, long, verbose responses cost disproportionately more than long prompts. Designing for concise outputs is a real lever on cost.
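A quick way to see the asymmetry is to price two requests with the same total token count, one prompt-heavy and one reply-heavy. This sketch uses the $15 / $75 flagship rates from the paragraph above. The token counts are made-up illustrations.

```python
INPUT_PRICE = 15.00   # dollars per million input tokens (flagship example)
OUTPUT_PRICE = 75.00  # dollars per million output tokens (flagship example)


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request, pricing input and output tokens separately."""
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000


# Both requests move 2,200 tokens in total; only the split differs.
long_prompt = request_cost(input_tokens=2000, output_tokens=200)
long_reply = request_cost(input_tokens=200, output_tokens=2000)

print(f"long prompt, short reply: ${long_prompt:.3f}")
print(f"short prompt, long reply: ${long_reply:.3f}")
```

With these numbers the reply-heavy request costs a little over three times as much, even though the token total is identical.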
Compare subscription vs pay-as-you-go
Two different things get called “pricing.” A consumer subscription (~$20/month) gives one person capped access in a chat app: predictable, but not meant for building products. API pricing is usage-based, billed per token to developers, with no fixed cap, so it scales with your app’s traffic. Some providers also sell prepaid credits. Match the pricing model to whether a human or an application is doing the consuming.
Use free tiers and trials
Most major providers offer a free tier or trial credits to start — enough to prototype and validate before you pay. Use them to benchmark models on your real task and estimate volume before committing to one. Just note free tiers often have rate limits and may use your data differently, so check the terms.
Estimate your monthly bill
The formula: (input tokens × input price) + (output tokens × output price), summed across your monthly requests. Estimate tokens per request (prompt + expected reply), multiply by requests per month, and apply each rate. Always model your busy month, not your average — usage-based costs scale with traffic spikes.
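The formula translates almost directly into code. The sketch below is one way to write it. The `Workload` class and the traffic figures are hypothetical, and the $2.50 / $15 rates are example mid-tier prices. It prices both an average month and a busy month so the gap is visible.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    requests_per_month: int
    input_tokens_per_request: int    # prompt size
    output_tokens_per_request: int   # expected reply size


def monthly_bill(w: Workload, input_price: float, output_price: float) -> float:
    """(input tokens x input price) + (output tokens x output price), per million."""
    input_tokens = w.requests_per_month * w.input_tokens_per_request
    output_tokens = w.requests_per_month * w.output_tokens_per_request
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


average_month = Workload(100_000, 800, 300)
busy_month = Workload(250_000, 800, 300)   # assumed traffic spike

for name, w in [("average", average_month), ("busy", busy_month)]:
    print(f"{name}: ${monthly_bill(w, 2.50, 15.00):,.2f}")
```

Under these assumptions the average month comes to $650 and the busy month to $1,625, which is why budgeting from the average alone can leave you short.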
Cut your costs
Several levers, often compounding: use the cheapest model that meets quality; shorten prompts and cap output length; cache repeated context (many providers discount cached input heavily); batch non-urgent requests (batch APIs are cheaper); and route simple tasks to budget models, reserving flagships for hard work. Together these routinely cut a bill by more than half.
Real 2026 prices (per million tokens)
A snapshot of the range — note how far apart the tiers are, and how output dwarfs input at the top end:
| Model tier | Example | Input ($ per 1M tokens) | Output ($ per 1M tokens) |
|---|---|---|---|
| Budget / open | DeepSeek V3.2 | ~$0.35 | ~$0.35–0.40 |
| Mid / fast | GPT-5.4 | $2.50 | $15 |
| Frontier | Claude Opus (flagship) | $15 | $75 |
| Cheapest frontier output | Gemini 3.1 Pro | Low | Lowest of big three |
The headline: by the figures above, a frontier flagship costs roughly 40× more per input token than a budget model like DeepSeek, and around 200× more per output token. That’s why model choice is the single biggest driver of your bill, and why paying flagship prices for simple tasks is the most common way teams overspend.
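The gap between tiers can be computed straight from the table. This sketch copies the three rows that have numeric prices. Where DeepSeek's output price is given as a range, it uses the upper bound, which is an assumption made only to get a single number.

```python
# (input, output) prices in dollars per million tokens, from the table above.
# DeepSeek's output is listed as ~$0.35-0.40; the upper bound is used here.
PRICES = {
    "DeepSeek V3.2": (0.35, 0.40),
    "GPT-5.4": (2.50, 15.00),
    "Claude Opus (flagship)": (15.00, 75.00),
}


def price_ratio(model: str, baseline: str) -> tuple[float, float]:
    """Return (input ratio, output ratio) of `model` relative to `baseline`."""
    model_in, model_out = PRICES[model]
    base_in, base_out = PRICES[baseline]
    return model_in / base_in, model_out / base_out


in_ratio, out_ratio = price_ratio("Claude Opus (flagship)", "DeepSeek V3.2")
print(f"input: {in_ratio:.1f}x, output: {out_ratio:.1f}x")
```

The output ratio is far larger than the input ratio, because the budget model charges nearly the same for both directions while the flagship charges five times more for output.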
Hidden costs and pricing gotchas
The per-token rate is only the headline. Several real costs hide behind it, and missing them is how a “cheap” model ends up expensive:
- Reasoning tokens. Some advanced models generate hidden “thinking” tokens before their answer — and you pay for those too. A model that reasons more can cost more per request than its rate suggests.
- Context re-sending. In multi-turn conversations or agents, the entire history is usually re-sent on every call, so input costs grow with conversation length. This is why long agent runs get expensive fast.
- Retries and failures. Failed or retried calls still consume tokens. Build in error handling so a loop doesn’t quietly run up a bill.
- Tiered and committed pricing. Some providers offer volume discounts or cheaper committed-use rates — worth it at scale, but they lock you in, so model your real volume first.
- Fine-tuning surcharges. A fine-tuned model often costs more per token to run than the base model, on top of the one-time training cost.
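Context re-sending is the easiest of these to underestimate, so here is a small simulation of it. It assumes every call sends the full history (system prompt, all earlier messages, and the new user message) and that each message has a fixed size. The function name and token counts are illustrative.

```python
def conversation_input_tokens(
    turns: int, system_tokens: int, user_tokens: int, reply_tokens: int
) -> int:
    """Total input tokens billed over a conversation that re-sends its history."""
    history = system_tokens
    billed = 0
    for _ in range(turns):
        history += user_tokens    # the new user message joins the history
        billed += history         # the whole history is sent as input
        history += reply_tokens   # the model's reply is kept for the next turn
    return billed


short_run = conversation_input_tokens(10, system_tokens=500, user_tokens=100, reply_tokens=300)
long_run = conversation_input_tokens(40, system_tokens=500, user_tokens=100, reply_tokens=300)
print(short_run, long_run, long_run / short_run)
```

In this setup a conversation with four times as many turns bills fourteen times as many input tokens, because billed input grows roughly with the square of the number of turns. Trimming or summarizing old history, or using cached input where the provider supports it, targets exactly this growth.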
Subscription or API? A worked example
People often ask whether to just pay for a $20/month subscription or use the API. The answer depends entirely on who’s consuming. If you personally are chatting with the model a few dozen times a day, the flat subscription is almost always cheaper and simpler — you’d struggle to burn $20 of API tokens at human typing speed. But if you’re building an app that makes thousands of calls on behalf of many users, there’s no subscription that covers it; you need the API, and your cost scales with usage. A useful rule: subscriptions price a person, APIs price a workload. Many builders use both — a personal subscription for their own work and the API for what they ship. Once you know which side of that line you’re on, the pricing model practically chooses itself, and you can focus on the levers that actually move your bill: model choice and token discipline.
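A rough break-even check makes the comparison concrete. The sketch below assumes a $20 subscription, example mid-tier API rates of $2.50 / $15 per million tokens, and chats of about 500 input and 500 output tokens each. All of these are assumptions, and it ignores history re-sending, which would raise the API figure for long conversations.

```python
SUBSCRIPTION = 20.00                     # dollars per month
DAYS_PER_MONTH = 30
INPUT_PRICE, OUTPUT_PRICE = 2.50, 15.00  # dollars per million tokens (example)


def chat_cost(input_tokens: int, output_tokens: int) -> float:
    """API cost of a single chat exchange at the example rates."""
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000


per_chat = chat_cost(input_tokens=500, output_tokens=500)

# Monthly API cost for someone chatting 40 times a day.
api_month = 40 * DAYS_PER_MONTH * per_chat

# Chats per day at which the API would cost the same as the subscription.
breakeven_per_day = SUBSCRIPTION / (per_chat * DAYS_PER_MONTH)

print(f"API cost at 40 chats/day: ${api_month:.2f}")
print(f"break-even: about {breakeven_per_day:.0f} chats/day")
```

The result is sensitive to the model: at flagship rates the same usage would cost several times more, so rerun the numbers with the prices of the model you actually plan to use.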
How to cut costs (a checklist)
- Right-size the model. Don’t pay frontier prices for classification or simple generation — a budget model handles it.
- Trim tokens. Shorter prompts and capped output lengths directly lower the bill; output is the expensive side.
- Cache repeated context. Many providers discount cached input heavily — huge for apps that resend the same system prompt.
- Batch non-urgent work. Batch APIs from major providers cost less than real-time calls.
- Route by complexity. Cheap models for the bulk, flagships only for the hard 10%. (See best LLMs for developers.)
- Consider self-hosting open-weight models at very high volume. (See best open-source LLMs.)
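To see what routing by complexity is worth, the sketch below compares sending everything to a flagship against sending only the hard 10% there. Prices come from the table earlier in this guide (with DeepSeek's output taken at $0.40). The traffic volume and per-request token counts are hypothetical.

```python
BUDGET = (0.35, 0.40)      # (input, output) dollars per million tokens
FLAGSHIP = (15.00, 75.00)


def monthly_cost(requests: int, tokens_in: int, tokens_out: int,
                 price_in: float, price_out: float) -> float:
    """Monthly cost of `requests` identical calls at the given rates."""
    return requests * (tokens_in * price_in + tokens_out * price_out) / 1_000_000


REQUESTS = 100_000             # assumed monthly traffic
TOKENS_IN, TOKENS_OUT = 800, 300
HARD_SHARE = 0.10              # fraction of requests that need the flagship

hard = round(REQUESTS * HARD_SHARE)
easy = REQUESTS - hard

all_flagship = monthly_cost(REQUESTS, TOKENS_IN, TOKENS_OUT, *FLAGSHIP)
routed = (monthly_cost(hard, TOKENS_IN, TOKENS_OUT, *FLAGSHIP)
          + monthly_cost(easy, TOKENS_IN, TOKENS_OUT, *BUDGET))

print(f"all flagship: ${all_flagship:,.2f}")
print(f"routed:       ${routed:,.2f}")
print(f"saving:       {1 - routed / all_flagship:.0%}")
```

Under these assumptions the routed bill is dominated by the 10% of traffic that still goes to the flagship, so the next saving comes from shrinking that share or trimming those requests' tokens.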
