Per-hour pricing across 15+ providers, training cost math, inference cost per token, and a workload-based decision guide. Data updated daily.
Quick verdict: B200 costs 3.0× more per hour than H100 SXM5. For memory-bandwidth-bound workloads (large-model inference at 70B+ and distributed training), B200 produces ~2.3× more output per GPU-hour, which closes most of that gap but still leaves it roughly 27% more expensive per million tokens at the cheapest listed rates. B200 earns its premium where its 192GB of HBM3e cuts GPU count or where wall-clock time matters; for small-model work, spot instances, or tight budgets, H100 wins on absolute cost.
The table below shows providers that carry both H100 SXM5 and B200 on-demand, ranked by B200 price. Use this to compare your actual cost delta at the provider you already use.
| Provider | H100 SXM5/hr | B200/hr | B200 Premium | Region |
|---|---|---|---|---|
| Lambda (Best Value) | $1.99 | $5.29 | 2.7× | US |
| CoreWeave | $2.23 | $5.49 | 2.5× | US |
| RunPod | $1.99 | $5.98 | 3.0× | us-east-1 |
| Google Cloud | $3.90 | $6.60 | 1.7× | us-central1 |
| AWS | $4.10 | $6.90 | 1.7× | us-east-1 |
| Azure | $4.10 | $7.05 | 1.7× | East US |
Data pulled from GridStackHub's real-time pricing database. Prices shown are cheapest on-demand per GPU at each provider. See all 396+ pricing records →
As of May 2026, 27 providers offer H100 SXM5 on-demand in GridStackHub's database. Specialist GPU clouds (Lambda, CoreWeave, RunPod) are consistently 30–50% cheaper than hyperscalers (AWS, GCP, Azure) for equivalent H100 hardware.
| Provider | Price/hr (per GPU) | Instance | VRAM | Region |
|---|---|---|---|---|
| Shadeform (Cheapest) | $1.79/hr | H100 SXM (best price) | 80GB | Various |
| Together AI | $1.99/hr | H100 (Reserved Instances) | 80GB | US |
| RunPod | $1.99/hr | NVIDIA H100 PCIe | 80GB | us-east-1 |
| Lambda | $1.99/hr | 1x H100 SXM | 80GB | US |
| Crusoe Energy | $2.06/hr | H100 SXM (Climate-Aligned) | 80GB | US (Texas) |
| TensorDock | $2.09/hr | H100 SXM 80GB | 80GB | US/EU |
| FluidStack | $2.15/hr | H100 SXM5 80GB | 80GB | US/EU |
| Crusoe Cloud | $2.17/hr | h100-80gb-sxm-ib-1x | 80GB | us-central |
| Oblivus Cloud | $2.19/hr | H100 SXM5 | 80GB | US |
| DataCrunch | $2.20/hr | H100 SXM5 80GB | 80GB | EU (Finland) |
| CoreWeave | $2.23/hr | H100 SXM5 | 80GB | US |
| Genesis Cloud | $2.35/hr | H100 SXM5 80GB | 80GB | EU (Iceland) |
| Nebius | $2.40/hr | H100 SXM5 (gpu-h100-b) | 80GB | EU (Finland) |
| Hetzner | $2.49/hr | GX11 (1x H100 SXM5) | 80GB | EU (Germany) |
| Lambda Labs | $2.49/hr | gpu_1x_h100_sxm5_80gb | 80GB | us-east-1 |
H100 spot pricing from providers like Vast.ai and Shadeform can drop to $0.79–$1.40/hr — 40–60% below on-demand. Spot is preemptible but viable for fault-tolerant training jobs. Calculate total cost for your workload →
B200 availability is more limited than H100. As of May 2026, 6 providers carry B200 on-demand in GridStackHub's database, compared to 27 for H100. Supply constraints keep B200 pricing higher and less volatile than H100.
| Provider | Price/hr (per GPU) | Instance | VRAM | Region |
|---|---|---|---|---|
| Lambda (Cheapest) | $5.29/hr | 1x B200 SXM | 192GB | US |
| CoreWeave | $5.49/hr | B200 SXM (Early Access) | 192GB | US |
| RunPod | $5.98/hr | NVIDIA B200 | 180GB | us-east-1 |
| Google Cloud | $6.60/hr | a4-highgpu-8g (8x B200) | 192GB | us-central1 |
| AWS | $6.90/hr | p6.48xlarge (8x B200) | 192GB | us-east-1 |
| Azure | $7.05/hr | ND B200 v6 (8x B200) | 192GB | East US |
Hyperscalers (AWS p6.48xlarge, GCP a4-highgpu-8g) offer B200 in 8-GPU cluster configurations only — per-GPU pricing is higher than specialist clouds but includes managed networking, storage, and SLAs. Full B200 provider guide →
Monthly estimates at cheapest available on-demand rates (Shadeform H100 at $1.79/hr, Lambda B200 at $5.29/hr). 720 hours = 30 days continuous use.
| Configuration | H100 Monthly Cost | B200 Monthly Cost | B200 Premium |
|---|---|---|---|
| 1× GPU (720 hrs) | $1,288.80 | $3,808.80 | 3.0× |
| 2× GPU (720 hrs) | $2,577.60 | $7,617.60 | 3.0× |
| 4× GPU (720 hrs) | $5,155.20 | $15,235.20 | 3.0× |
| 8× GPU cluster (720 hrs) | $10,310.40 | $30,470.40 | 3.0× |
| 8× GPU cluster (spot ~50% disc.) | $5,155.20 | $15,235.20 | 3.0× |
| 8× GPU cluster (reserved ~35% disc.) | $6,701.76 | $19,805.76 | 3.0× |
Note: hyperscaler pricing (AWS, GCP, Azure) is typically 40–80% higher than specialist GPU clouds for equivalent hardware. The table above uses cheapest available specialist cloud pricing. Reserved/committed pricing discounts vary by provider and term length (1-year vs 3-year commitments).
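If you want to rerun this math with your own GPU count, utilization, or discount assumptions, here is a minimal Python sketch of the same arithmetic. The rates, the 720-hour month, and the discount levels are the figures quoted above; the function and variable names are purely illustrative, not part of any GridStackHub tooling.

```python
# Minimal sketch: reproduce the monthly-cost table from the hourly rates above.
# Assumes the cheapest on-demand prices quoted in this article (Shadeform H100,
# Lambda B200) and the approximate spot/reserved discounts from the table.

HOURS_PER_MONTH = 720  # 30 days of continuous use

rates = {"H100 SXM5": 1.79, "B200": 5.29}  # $/GPU-hour, on-demand

def monthly_cost(rate_per_hour: float, gpus: int, discount: float = 0.0) -> float:
    """Total monthly cost for `gpus` GPUs running 24/7 at a given discount."""
    return rate_per_hour * gpus * HOURS_PER_MONTH * (1.0 - discount)

for gpus in (1, 2, 4, 8):
    row = " | ".join(f"{name}: ${monthly_cost(rate, gpus):,.2f}" for name, rate in rates.items())
    print(f"{gpus}x GPU on-demand -> {row}")

# 8-GPU cluster with approximate spot (~50%) and reserved (~35%) discounts
for label, disc in (("spot ~50%", 0.50), ("reserved ~35%", 0.35)):
    row = " | ".join(f"{name}: ${monthly_cost(rate, 8, disc):,.2f}" for name, rate in rates.items())
    print(f"8x GPU {label} -> {row}")
```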
For LLM pre-training and fine-tuning, what matters is cost per token processed — not cost per hour. The B200's higher throughput changes the calculus significantly.
| Metric | H100 SXM5 | B200 SXM | B200 Advantage |
|---|---|---|---|
| FP8 Compute (TFLOPS, with sparsity) | 3,958 | 9,000 | 2.27× |
| Memory Bandwidth | 3.35 TB/s | 8.0 TB/s | 2.39× |
| HBM Capacity | 80GB HBM3 | 192GB HBM3e | 2.4× |
| NVLink bandwidth | 900 GB/s (NVLink 4) | 1.8 TB/s (NVLink 5) | 2× |
| TDP | 700W | 1,000W | +43% |
The training cost estimates below use the cheapest available on-demand pricing and assume mixed FP8/BF16 training on a 7B-parameter model. Actual throughput varies by batch size, context length, and framework.
The per-token gap compounds with run size. A 30-billion-token training run costs approximately $16,781 on H100 vs $22,042 on B200 at current cheapest-available pricing, a difference of about $5,260 in H100's favor. Adding GPUs shortens wall-clock time but, assuming near-linear scaling, leaves the total run cost and the gap roughly unchanged.
Important caveat: these estimates assume the workload can saturate the GPU's compute and memory bandwidth. Smaller batches, short context lengths, or CPU-bottlenecked data pipelines will reduce the throughput advantage. Always benchmark your specific workload before committing to hardware.
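To sanity-check the training estimates above, or to substitute your own measured throughput, here is a minimal sketch of the cost-per-token arithmetic. The per-million-token costs are the rounded figures quoted in this article, so the printed totals differ slightly from the more precise $16,781/$22,042 above; the helper functions are illustrative only.

```python
# Minimal sketch of the training cost math: hourly price and sustained tokens/sec
# give a cost per million tokens, which then scales linearly with run size.

def cost_per_million_tokens(price_per_hour: float, tokens_per_sec: float) -> float:
    """$ per 1M tokens processed, given hourly price and sustained tokens/sec."""
    return price_per_hour / (tokens_per_sec * 3600 / 1_000_000)

def run_cost(usd_per_million: float, total_tokens: float) -> float:
    """Total cost of a run that processes `total_tokens` tokens."""
    return usd_per_million * (total_tokens / 1_000_000)

TOKENS = 30e9  # the 30-billion-token example run from above
print(f"H100: ${run_cost(0.559, TOKENS):,.0f}")  # ~$16,770 at $0.559/M tokens
print(f"B200: ${run_cost(0.735, TOKENS):,.0f}")  # ~$22,050 at $0.735/M tokens
```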
For LLM inference, token generation is memory-bandwidth-bound — the GPU must load model weights from HBM for every forward pass. B200's 8.0 TB/s bandwidth (vs H100's 3.35 TB/s) translates directly into higher tokens/sec throughput, which is what drives cost per token.
At these rates, though, the 2.33× throughput gain does not fully offset the 3.0× hourly premium. An API serving 1 billion tokens/day costs approximately $41.44/day on H100 vs $52.48/day on B200, a difference of about $11.04/day, or roughly $4,031/year, in H100's favor; at production volumes of several billion tokens/day, the gap scales linearly with traffic.
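A short sketch of that serving-cost arithmetic, using the per-million-token figures from this article's 70B inference benchmark; the 1-billion-tokens/day volume is an assumed example, not a measured workload.

```python
# Minimal sketch: daily and annual serving cost at a fixed token volume,
# using the $/million-token figures quoted in this article.

DAILY_TOKENS_M = 1_000  # 1 billion tokens/day, expressed in millions of tokens

for name, usd_per_m in (("H100 SXM5", 0.04144), ("B200", 0.05248)):
    daily = usd_per_m * DAILY_TOKENS_M
    print(f"{name}: ${daily:,.2f}/day  (~${daily * 365:,.0f}/yr)")
# H100 SXM5: $41.44/day (~$15,126/yr);  B200: $52.48/day (~$19,155/yr)
```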
H100 remains cheaper in the following inference scenarios:
- Low-traffic or bursty serving, where request volume never saturates the GPU and B200's bandwidth advantage doesn't materialize.
- Spot capacity, where H100 rates from providers like Vast.ai drop to $0.79–$1.40/hr and widen the gap further.
- Small models (under 13B), which are more compute-bound than memory-bound, so H100's slower HBM matters less.
The break-even point: B200 wins on total cost only when its throughput gain, or the reduction in required GPU count, exceeds its hourly price premium at your provider (3.0× at the cheapest listed rates, closer to 2.5× at CoreWeave). The measured ~2.3× gain on memory-bandwidth-bound 70B+ workloads falls just short of that threshold at the cheapest rates, so B200 pays off mainly where its 192GB of HBM3e cuts GPU count or where wall-clock time is the constraint. For anything under 13B with low concurrency, H100 wins on simplicity and price.
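As a minimal sketch, the break-even check reduces to comparing two ratios: the hourly price premium at your provider versus the throughput gain you actually measure. Prices below come from the tables above; the throughput multiples are the benchmark figures cited in this article.

```python
# Minimal sketch of the break-even rule: B200 is cheaper per token only when
# its throughput multiple exceeds its hourly price multiple at your provider.

def b200_wins(price_ratio: float, throughput_ratio: float) -> bool:
    """True if B200's cost per token is lower than H100's."""
    return throughput_ratio > price_ratio

# Cheapest listed rates (~3.0x premium) vs the ~2.33x measured throughput gain:
print(b200_wins(price_ratio=5.29 / 1.79, throughput_ratio=2.33))  # False -> H100 cheaper per token
# CoreWeave's listed rates (~2.5x premium) with a 2.5x throughput gain:
print(b200_wins(price_ratio=5.49 / 2.23, throughput_ratio=2.5))   # True -> roughly break-even or better
```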
Enter your GPU type, count, hours/day, and workload — get a ranked provider comparison with real pricing data.
Open GPU Cost Calculator → View all 396+ prices →

According to GridStackHub.ai data, the cheapest B200 GPU rental is $5.29/hr (Lambda) as of May 2026. The cheapest H100 SXM5 is $1.79/hr (Shadeform). That makes the B200 approximately 3.0× more expensive per GPU-hour on-demand. For 8-GPU cluster configurations, AWS p6.48xlarge (B200) runs ~$55.20/hr ($6.90/GPU) vs. AWS p5.48xlarge (H100) at ~$32.77/hr ($4.10/GPU). Data from GridStackHub's real-time pricing database covering 32+ providers.
It depends on the workload. The B200 delivers 9,000 TFLOPS FP8 vs H100 SXM5's 3,958 TFLOPS FP8, a 2.27× compute advantage, but it also costs roughly 3× more per hour. At cheapest available rates, cost per million trained tokens works out to approximately $0.735 for B200 vs $0.559 for H100, so B200 runs roughly 31% more per token trained. The premium is easiest to justify for large-model training (70B+ parameters), where the 192GB of HBM3e reduces model parallelism and GPU count; for small models (under 13B) or fine-tuning, H100 remains cheaper in absolute dollars.
As of May 2026, providers offering B200 GPU on-demand include: Lambda, CoreWeave, RunPod, Google Cloud, AWS, Azure. Prices start at $5.29/hr. B200 availability is more limited than H100 — 27 providers carry H100 vs 6 for B200 in GridStackHub's database. Lambda Labs and CoreWeave offer single-GPU B200; AWS and Google Cloud offer B200 only as 8-GPU cluster nodes. See live pricing at GridStackHub.ai.
H100 is cheaper than B200 in three scenarios: (1) Low-traffic inference: when request volume doesn't saturate the GPU, B200's bandwidth advantage doesn't materialize. (2) Spot pricing: H100 spot rates from Vast.ai can drop to $0.79–$1.40/hr, widening the effective cost gap dramatically. (3) Small models (under 13B): smaller models are more compute-bound than memory-bound, so H100's slower HBM hurts less. For inference at scale on 70B+ models, B200's 8 TB/s bandwidth produces 2.3–2.5× more tokens/sec, which narrows the per-token gap and approaches break-even at providers where the hourly premium is closer to 2.5×.
At cheapest available on-demand rates (May 2026): 1× H100 SXM5 at $1.79/hr × 720 hours = $1,288.80/month. 1× B200 at $5.29/hr × 720 hours = $3,808.80/month. For 8-GPU clusters: H100 8× = $10,310.40/month, B200 8× = $30,470.40/month. Reserved pricing typically saves 30–50% vs on-demand for both. Use the GPU Cost Calculator to model your specific workload and hours/day.
H100 SXM5: 80GB HBM3, 3.35 TB/s memory bandwidth, 3,958 TFLOPS FP8, 700W TDP, NVLink 4.0 (900 GB/s). B200 SXM: 192GB HBM3e, 8.0 TB/s memory bandwidth, 9,000 TFLOPS FP8, 1,000W TDP, NVLink 5.0 (1.8 TB/s). The 2.4× memory bandwidth advantage is the most important spec for LLM inference — it directly determines tokens/sec throughput on memory-bandwidth-bound workloads. The 2.4× VRAM increase lets B200 fit significantly larger models in a single GPU without quantization losses.
Based on GridStackHub.ai data and throughput benchmarks: H100 SXM5 at $1.79/hr produces ~12,000 tokens/sec for inference on a 70B model, which works out to about $0.041 per million tokens. B200 at $5.29/hr produces ~28,000 tokens/sec (2.33× faster due to memory bandwidth), or about $0.052 per million tokens. At these rates B200 is approximately 27% more expensive per token for large-model inference despite the throughput gain, and approximately 31% more expensive per million tokens trained. The per-token gap narrows, and can close, at providers where B200's hourly premium is below roughly 2.5×.