Input your model size and monthly token volume. Get instant cost-per-million-token breakdown across 32 GPU cloud providers — with direct comparison against OpenAI and Anthropic API pricing.
| # | Provider & GPU | Pricing | $/GPU/hr | Tokens/sec [EST] | Cost / 1M Tokens | Monthly Cost | vs GPT-4o Mini | Region |
|---|
GPU price moves, token cost analysis, and provider comparisons — every Monday in your inbox.
Cost per token for self-hosted inference is a function of GPU hourly rate and token throughput. Here's the math:
Each GPU has a known approximate throughput for a given model size, measured in tokens/second. Throughput scales inversely with parameter count — a 70B model runs ~10× slower than a 7B model on the same hardware.
Divide the GPU hourly rate by the time required to generate 1 million tokens. A cheaper-but-slower GPU can beat an expensive-but-fast one on cost-per-token.
Based on your monthly token volume, we calculate how many GPU-hours you'd need and multiply by the hourly rate. This is your all-in compute cost before storage/egress.
Self-hosted cost per million is compared directly against OpenAI GPT-4o Mini ($0.75/1M) and Claude Haiku ($0.80/1M). The savings percentage shows how much cheaper self-hosting is at your volume.
Approximate tokens/second for a 7B model at FP16 precision with vLLM. Throughput scales inversely with parameter count.
| GPU | 7B Model (tok/s) | 13B Model (tok/s) | 34B Model (tok/s) | 70B Model (tok/s) | Relative Speed |
|---|---|---|---|---|---|
| NVIDIA H200 | 12,000 | 6,480 | 2,520 | 1,200 | |
| NVIDIA H100 | 8,000 | 4,320 | 1,680 | 800 | |
| NVIDIA A100 (80GB) | 4,500 | 2,430 | 945 | 450 | |
| NVIDIA L40S | 3,500 | 1,890 | 735 | 350 | |
| NVIDIA RTX 4090 | 2,500 | 1,350 | 525 | 250 | |
| NVIDIA A10G | 2,200 | 1,188 | 462 | 220 | |
| NVIDIA L4 | 2,000 | 1,080 | 420 | 200 | |
| NVIDIA T4 | 1,200 | 648 | 252 | 120 |
* Estimates assume vLLM with FP16 precision, moderate batch sizes (8–32), and no quantization. INT4 quantization can increase throughput 2–4×. Model size vs GPU VRAM: 7B requires ~14GB (fits on RTX 4090+), 70B requires ~140GB VRAM (requires H100/H200/multi-GPU setup). See full model cost analysis →