LLM Cost Calculator

Compare LLM API pricing for GPT-4, Claude, Gemini, and more.

Estimate monthly API spend across GPT-4o, Claude, Gemini, and more. Enter requests and average token usage to compare input, output, and total cost with March 2026 pricing.

Monthly requests

Avg input tokens per request

Avg output tokens per request

Sort by

Model	Input cost	Output cost	Total/month	Context
Llama 3.3 (local)	—	—	Free — self-hosted	Self-hosted
Gemini Flash	$0.037	$0.060	$0.098	1M
GPT-4o mini	$0.075	$0.120	$0.195	128K
GPT-3.5 turbo	$0.250	$0.300	$0.550	16K
Claude Haiku 4.5	$0.400	$0.800	$1.20	200K
GPT-4o	$1.25	$2.00	$3.25	128K
Gemini 3.1 Pro	$1.00	$2.40	$3.40	1M
Claude Sonnet 4.6	$1.50	$3.00	$4.50	200K
Claude Opus 4.6	$7.50	$15.00	$22.50	200K

Prices last updated: March 2026. Always verify current pricing at each provider's website.

🔒 Runs in your browser · No uploads · Your data never leaves your device

How to use

Enter usage: Set monthly requests plus average input and output tokens per request.
Compare models: Review the table for input cost, output cost, total per month, and context window.
Sort: Sort by total cost, context size, or relative speed to match your priorities.
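
The arithmetic behind each table row can be sketched as follows. This is a minimal illustration, not the calculator's actual source: the function name and the per-million-token rate unit are assumptions, and the rates of $2.50 (input) and $10.00 (output) were chosen only because they reproduce the GPT-4o row at 1,000 requests with 500 input and 200 output tokens each.

```python
def monthly_cost(requests, input_tokens, output_tokens,
                 input_rate_per_m, output_rate_per_m):
    """Return (input_cost, output_cost, total) in dollars for one month.

    Rates are dollars per one million tokens (an assumed unit).
    """
    input_cost = requests * input_tokens / 1_000_000 * input_rate_per_m
    output_cost = requests * output_tokens / 1_000_000 * output_rate_per_m
    return input_cost, output_cost, input_cost + output_cost


# Assumed rates that reproduce the GPT-4o row of the table.
input_cost, output_cost, total = monthly_cost(1_000, 500, 200, 2.50, 10.00)
print(f"${input_cost:.2f} + ${output_cost:.2f} = ${total:.2f}")  # $1.25 + $2.00 = $3.25
```

Because cost is linear in every input, doubling the request count doubles each column.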

Common use cases

Budgeting AI features before launch — Estimate monthly API spend at your expected request volume before committing to a production rollout.
Comparing models for cost vs capability — Find the cheapest model that meets your token requirements by sorting the comparison table by total cost.
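
The second use case amounts to a filter followed by a sort. The sketch below copies the totals and context windows from the comparison table; the `MODELS` list and `cheapest_first` helper are hypothetical names, and reading "128K" as 128,000 tokens is an assumption. The self-hosted Llama row is left out because the table gives no numeric context window for it.

```python
# (name, total $/month from the comparison table, context window in tokens)
MODELS = [
    ("Gemini Flash", 0.098, 1_000_000),
    ("GPT-4o mini", 0.195, 128_000),
    ("GPT-3.5 turbo", 0.550, 16_000),
    ("Claude Haiku 4.5", 1.20, 200_000),
    ("GPT-4o", 3.25, 128_000),
    ("Gemini 3.1 Pro", 3.40, 1_000_000),
    ("Claude Sonnet 4.6", 4.50, 200_000),
    ("Claude Opus 4.6", 22.50, 200_000),
]


def cheapest_first(min_context):
    """Models whose context window is at least min_context, cheapest first."""
    qualifying = [m for m in MODELS if m[2] >= min_context]
    return sorted(qualifying, key=lambda m: m[1])


# Example: prompts that need at least 150,000 tokens of context.
for name, total, context in cheapest_first(150_000):
    print(f"{name}: ${total:.3f}/month, {context:,} tokens")
```

The first entry of the returned list is the cheapest model that still fits the requirement.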

Examples

Default scenario

1,000 requests per month, with 500 input tokens and 200 output tokens per request.

Output

The cheapest row is highlighted; it is often the self-hosted model or a Flash-class model.
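
The table lists monthly costs rather than per-token rates, but the rates can be recovered from a row if one assumes the table was computed at the default scenario. The sketch below does that for the GPT-4o row; the helper name is hypothetical, and because table values are rounded, the recovered rates are approximate for the cheaper rows.

```python
DEFAULT_REQUESTS = 1_000
DEFAULT_INPUT_TOKENS = 500
DEFAULT_OUTPUT_TOKENS = 200


def implied_rate_per_million(table_cost, tokens_per_request,
                             requests=DEFAULT_REQUESTS):
    """Dollars per one million tokens implied by a monthly cost in the table."""
    total_tokens = requests * tokens_per_request
    return round(table_cost / total_tokens * 1_000_000, 2)


# GPT-4o row: $1.25 input cost, $2.00 output cost.
gpt4o_input_rate = implied_rate_per_million(1.25, DEFAULT_INPUT_TOKENS)
gpt4o_output_rate = implied_rate_per_million(2.00, DEFAULT_OUTPUT_TOKENS)
print(gpt4o_input_rate, gpt4o_output_rate)  # 2.5 10.0
```

Treat the result as a cross-check against the provider's published rate, not as a replacement for it.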

Frequently asked questions

Are these prices exact?: Rates reflect March 2026 reference pricing; providers change prices — verify on their sites before budgeting.
Why is local Llama $0?: Self-hosted inference has no per-token API fee in this calculator; you still run your own hardware.

Key concepts

Input tokens: Tokens in the prompt sent to the model, priced separately from output tokens in most LLM APIs.
Output tokens: Tokens in the model's response, typically priced several times higher per token than input tokens (about 3–6× for the models in the table above).
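
The output-to-input price ratio for any row can be checked from the table's own cost columns. The sketch below assumes the table was computed with 500 input and 200 output tokens per request (the request count cancels out); the function name is hypothetical.

```python
def output_to_input_ratio(input_cost, output_cost,
                          input_tokens=500, output_tokens=200):
    """Per-token output price divided by per-token input price."""
    price_per_input_token = input_cost / input_tokens
    price_per_output_token = output_cost / output_tokens
    return price_per_output_token / price_per_input_token


# (input cost, output cost) copied from three rows of the table.
rows = {
    "GPT-3.5 turbo": (0.250, 0.300),
    "GPT-4o": (1.25, 2.00),
    "Gemini 3.1 Pro": (1.00, 2.40),
}
for name, (input_cost, output_cost) in rows.items():
    print(f"{name}: {output_to_input_ratio(input_cost, output_cost):.1f}x")
```

A higher ratio means output-heavy workloads, such as long generated answers, dominate the bill on that model.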
