According to GridStackHub.ai data, the cheapest B200 GPU rental is $5.29/hr (Lambda) and the cheapest H100 SXM5 is $1.79/hr (Shadeform), a 3.0× price gap as of May 2026. For large-model inference, B200's 8.0 TB/s memory bandwidth produces ~2.3× more tokens/sec than H100, which narrows the per-token cost gap substantially but does not close it at these rates. Source: GridStackHub GPU Pricing Database · 27 H100 providers + 6 B200 providers · Updated 2026-05-02

H100 vs B200 GPU Cost Comparison

Per-hour pricing across 15+ providers, training cost math, inference cost per token, and a workload-based decision guide. Data updated daily.

Live data · Updated 2026-05-02 · Source: GridStackHub.ai · 32+ providers · 396+ records
NVIDIA H100 SXM5: $1.79 per GPU / hour (cheapest on-demand) · Provider: Shadeform · 80GB HBM3 · 3.35 TB/s · 3,958 TFLOPS FP8
NVIDIA B200 SXM: $5.29 per GPU / hour (cheapest on-demand) · Provider: Lambda · 192GB HBM3e · 8.0 TB/s · 9,000 TFLOPS FP8

Quick verdict: B200 costs 3.0× more per hour than H100 SXM5. For memory-bandwidth-bound workloads (large-model inference on 70B+ models, distributed training), B200 produces ~2.3× more output per GPU-hour, which narrows the per-token gap but does not close it at these rates: B200 still works out to roughly 27% more per million tokens generated. H100 therefore wins on cost per token as well as absolute cost at current cheapest-available pricing; B200's case rests on wall-clock speed, 192GB of VRAM, and needing fewer GPUs per deployment. For small-model work, spot instances, or tight budgets, H100 wins outright.

Side-by-Side Provider Price Comparison

The table below shows providers that carry both H100 SXM5 and B200 on-demand, ranked by B200 price. Use this to compare your actual cost delta at the provider you already use.

| Provider | H100 SXM5/hr | B200/hr | B200 Premium | Region |
|---|---|---|---|---|
| Lambda (best value) | $1.99 | $5.29 | 2.7× | US |
| CoreWeave | $2.23 | $5.49 | 2.5× | US |
| RunPod | $1.99 | $5.98 | 3.0× | us-east-1 |
| Google Cloud | $3.90 | $6.60 | 1.7× | us-central1 |
| AWS | $4.10 | $6.90 | 1.7× | us-east-1 |
| Azure | $4.10 | $7.05 | 1.7× | East US |

Data pulled from GridStackHub's real-time pricing database. Prices shown are cheapest on-demand per GPU at each provider. See all 396+ pricing records →
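
The "B200 Premium" column is simply the ratio of the two hourly rates at each provider. A quick Python sketch using the prices from the table above (an illustrative May 2026 snapshot, not a live API feed):

```python
# Per-provider B200 premium = B200 hourly rate / H100 hourly rate.
# Prices are the on-demand figures from the table above.
prices = {
    "Lambda":       (1.99, 5.29),
    "CoreWeave":    (2.23, 5.49),
    "RunPod":       (1.99, 5.98),
    "Google Cloud": (3.90, 6.60),
    "AWS":          (4.10, 6.90),
    "Azure":        (4.10, 7.05),
}
for provider, (h100, b200) in prices.items():
    print(f"{provider:12s} B200 premium: {b200 / h100:.1f}x")
```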

H100 SXM5 Pricing — All Providers

As of May 2026, 27 providers offer H100 SXM5 on-demand in GridStackHub's database. Specialist GPU clouds (Lambda, CoreWeave, RunPod) are consistently 30–50% cheaper than hyperscalers (AWS, GCP, Azure) for equivalent H100 hardware.

| Provider | Price/hr (per GPU) | Instance | VRAM | Region |
|---|---|---|---|---|
| Shadeform (cheapest) | $1.79/hr | H100 SXM (best price) | 80GB | Various |
| Together AI | $1.99/hr | H100 (Reserved Instances) | 80GB | US |
| RunPod | $1.99/hr | NVIDIA H100 PCIe | 80GB | us-east-1 |
| Lambda | $1.99/hr | 1x H100 SXM | 80GB | US |
| Crusoe Energy | $2.06/hr | H100 SXM (Climate-Aligned) | 80GB | US (Texas) |
| TensorDock | $2.09/hr | H100 SXM 80GB | 80GB | US/EU |
| FluidStack | $2.15/hr | H100 SXM5 80GB | 80GB | US/EU |
| Crusoe Cloud | $2.17/hr | h100-80gb-sxm-ib-1x | 80GB | us-central |
| Oblivus Cloud | $2.19/hr | H100 SXM5 | 80GB | US |
| DataCrunch | $2.20/hr | H100 SXM5 80GB | 80GB | EU (Finland) |
| CoreWeave | $2.23/hr | H100 SXM5 | 80GB | US |
| Genesis Cloud | $2.35/hr | H100 SXM5 80GB | 80GB | EU (Iceland) |
| Nebius | $2.40/hr | H100 SXM5 (gpu-h100-b) | 80GB | EU (Finland) |
| Hetzner | $2.49/hr | GX11 (1x H100 SXM5) | 80GB | EU (Germany) |
| Lambda Labs | $2.49/hr | gpu_1x_h100_sxm5_80gb | 80GB | us-east-1 |

H100 spot pricing from providers like Vast.ai and Shadeform can drop to $0.79–$1.40/hr — 40–60% below on-demand. Spot is preemptible but viable for fault-tolerant training jobs. Calculate total cost for your workload →

B200 GPU Pricing — All Providers

B200 availability is more limited than H100. As of May 2026, 6 providers carry B200 on-demand in GridStackHub's database, compared to 27 for H100. Supply constraints keep B200 pricing higher and less volatile than H100.

| Provider | Price/hr (per GPU) | Instance | VRAM | Region |
|---|---|---|---|---|
| Lambda (cheapest) | $5.29/hr | 1x B200 SXM | 192GB | US |
| CoreWeave | $5.49/hr | B200 SXM (Early Access) | 192GB | US |
| RunPod | $5.98/hr | NVIDIA B200 | 180GB | us-east-1 |
| Google Cloud | $6.60/hr | a4-highgpu-8g (8x B200) | 192GB | us-central1 |
| AWS | $6.90/hr | p6.48xlarge (8x B200) | 192GB | us-east-1 |
| Azure | $7.05/hr | ND B200 v6 (8x B200) | 192GB | East US |

Hyperscalers (AWS p6.48xlarge, GCP a4-highgpu-8g) offer B200 in 8-GPU cluster configurations only — per-GPU pricing is higher than specialist clouds but includes managed networking, storage, and SLAs. Full B200 provider guide →

Monthly Cost Breakdown

Monthly estimates at cheapest available on-demand rates (Shadeform H100 at $1.79/hr, Lambda B200 at $5.29/hr). 720 hours = 30 days continuous use.

| Configuration | H100 Monthly Cost | B200 Monthly Cost | B200 Premium |
|---|---|---|---|
| 1× GPU (720 hrs) | $1,288.80 | $3,808.80 | 3.0× |
| 2× GPU (720 hrs) | $2,577.60 | $7,617.60 | 3.0× |
| 4× GPU (720 hrs) | $5,155.20 | $15,235.20 | 3.0× |
| 8× GPU cluster (720 hrs) | $10,310.40 | $30,470.40 | 3.0× |
| 8× GPU cluster (spot, ~50% disc.) | $5,155.20 | $15,235.20 | 3.0× |
| 8× GPU cluster (reserved, ~35% disc.) | $6,701.76 | $19,805.76 | 3.0× |

Note: hyperscaler pricing (AWS, GCP, Azure) is typically 40–80% higher than specialist GPU clouds for equivalent hardware. The table above uses cheapest available specialist cloud pricing. Reserved/committed pricing discounts vary by provider and term length (1-year vs 3-year commitments).
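
To model other GPU counts or discount levels, the table's math is easy to reproduce: price × GPUs × 720 hours, times an optional discount factor. A minimal Python sketch using the cheapest on-demand rates quoted above; the discount factors are the illustrative ones from the table, not provider quotes:

```python
# Monthly GPU rental cost, assuming continuous use (720 hours = 30 days).
# Rates: cheapest on-demand H100 ($1.79/hr) and B200 ($5.29/hr) from above.

HOURS_PER_MONTH = 720

def monthly_cost(price_per_gpu_hr: float, gpus: int = 1, discount: float = 0.0) -> float:
    """Monthly cost for a cluster, with an optional spot/reserved discount (0.0-1.0)."""
    return price_per_gpu_hr * gpus * HOURS_PER_MONTH * (1.0 - discount)

h100, b200 = 1.79, 5.29
print(f"8x H100 on-demand: ${monthly_cost(h100, gpus=8):,.2f}/mo")
print(f"8x B200 on-demand: ${monthly_cost(b200, gpus=8):,.2f}/mo")
print(f"8x H100 reserved (~35% off): ${monthly_cost(h100, gpus=8, discount=0.35):,.2f}/mo")
```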

Training Cost Math: H100 vs B200

For LLM pre-training and fine-tuning, what matters is cost per token processed — not cost per hour. The B200's higher throughput changes the calculus significantly.

Key Training Specs

| Metric | H100 SXM5 | B200 SXM | B200 Advantage |
|---|---|---|---|
| FP8 Compute (TFLOPS) | 3,958 | 9,000 | 2.27× |
| Memory Bandwidth | 3.35 TB/s | 8.0 TB/s | 2.39× |
| HBM Capacity | 80GB HBM3 | 192GB HBM3e | 2.4× |
| NVLink bandwidth | 900 GB/s (NVLink 4) | 1.8 TB/s (NVLink 5) | 2.0× |
| TDP | 700W | 1,000W | +43% |

Cost Per Million Tokens Trained (7B Model Estimate)

Using cheapest available on-demand pricing. Throughput assumes mixed FP8/BF16 training on a 7B-parameter model. Actual throughput varies by batch size, context length, and framework.

H100 price (cheapest on-demand) $1.79/hr
H100 estimated training throughput (7B model) ~3.2M tokens/hr per GPU
H100 cost per million tokens trained $0.559
B200 price (cheapest on-demand) $5.29/hr
B200 estimated training throughput (7B model) ~7.2M tokens/hr per GPU
B200 cost per million tokens trained $0.735
B200 cost premium per million trained tokens ~31% more expensive

The per-token difference compounds at scale. A 30-billion-token training run costs approximately $16,781 on H100 vs $22,042 on B200 at current cheapest-available pricing, about $5,260 more on the B200, but it completes in roughly 4,170 GPU-hours instead of roughly 9,375. In other words, at these rates the B200 buys wall-clock time rather than dollars. At 8 GPUs both figures scale accordingly.

Important caveat: these estimates assume the workload can saturate the GPU's compute and memory bandwidth. Smaller batches, short context lengths, or CPU-bottlenecked data pipelines will reduce the throughput advantage. Always benchmark your specific workload before committing to hardware.
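
A short script makes the same arithmetic reusable with your own prices and measured throughput. This is a sketch under the article's assumptions: the ~3.2M and ~7.2M tokens/hr-per-GPU figures are rough estimates for a 7B model, not benchmarks of your workload:

```python
# Training cost per token: price/hr divided by tokens processed per hour.
# Prices are the cheapest on-demand rates quoted above; throughputs are estimates.

def cost_per_million_trained(price_per_hr: float, tokens_per_hr: float) -> float:
    return price_per_hr / (tokens_per_hr / 1e6)

def run_cost(total_tokens: float, price_per_hr: float, tokens_per_hr: float) -> tuple[float, float]:
    """Return (dollar cost, GPU-hours) for a training run of `total_tokens` tokens."""
    gpu_hours = total_tokens / tokens_per_hr
    return gpu_hours * price_per_hr, gpu_hours

h100_cost, h100_hours = run_cost(30e9, 1.79, 3.2e6)  # ~= $16,781 over ~9,375 GPU-hours
b200_cost, b200_hours = run_cost(30e9, 5.29, 7.2e6)  # ~= $22,042 over ~4,167 GPU-hours
print(f"H100: ${h100_cost:,.0f} ({h100_hours:,.0f} GPU-hours)")
print(f"B200: ${b200_cost:,.0f} ({b200_hours:,.0f} GPU-hours)")
print(f"$/1M tokens trained: H100 {cost_per_million_trained(1.79, 3.2e6):.3f}, "
      f"B200 {cost_per_million_trained(5.29, 7.2e6):.3f}")
```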

Inference Cost Math: Why B200 Wins on Tokens/Sec

For LLM inference, token generation is memory-bandwidth-bound — the GPU must load model weights from HBM for every forward pass. B200's 8.0 TB/s bandwidth (vs H100's 3.35 TB/s) translates directly into higher tokens/sec throughput, which is what drives cost per token.

Cost Per Million Tokens Generated (70B Model Inference)

H100 inference throughput (70B int8, single GPU) ~12,000 tokens/sec
H100 at $1.79/hr → cost per 1M tokens $0.041
B200 inference throughput (70B int8, single GPU) ~28,000 tokens/sec
B200 at $5.29/hr → cost per 1M tokens $0.052
B200 cost premium per 1M tokens ~27% more expensive

At production inference volumes (billions of tokens/day), this per-token difference adds up. An API serving 1 billion tokens/day costs approximately $41.44/day on H100 vs $52.48/day on B200 at these rates, roughly $11/day or about $4,030/year more on the B200. The offsetting advantage is density: at ~28,000 tokens/sec a single B200 handles the throughput of roughly 2.3 H100s, which matters when GPU count or rack space is the constraint.
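
The same per-token arithmetic applies to inference, just measured in tokens/sec rather than tokens/hr. A minimal Python sketch using the throughput figures above; the ~12,000 and ~28,000 tokens/sec estimates for a 70B int8 model are assumptions that vary heavily with batch size, engine, and context length:

```python
# Inference cost per token: hourly price divided by tokens generated per hour.
# Prices are cheapest on-demand rates; throughputs are this article's estimates.

SECONDS_PER_HOUR = 3600

def cost_per_million_generated(price_per_hr: float, tokens_per_sec: float) -> float:
    tokens_per_hr = tokens_per_sec * SECONDS_PER_HOUR
    return price_per_hr / (tokens_per_hr / 1e6)

def daily_cost(tokens_per_day: float, price_per_hr: float, tokens_per_sec: float) -> float:
    return (tokens_per_day / 1e6) * cost_per_million_generated(price_per_hr, tokens_per_sec)

h100 = cost_per_million_generated(1.79, 12_000)  # ~= $0.041 per 1M tokens
b200 = cost_per_million_generated(5.29, 28_000)  # ~= $0.052 per 1M tokens
print(f"H100: ${h100:.4f}/1M tok, B200: ${b200:.4f}/1M tok")
print(f"1B tokens/day: H100 ${daily_cost(1e9, 1.79, 12_000):,.2f}/day, "
      f"B200 ${daily_cost(1e9, 5.29, 28_000):,.2f}/day")
```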

When H100 Beats B200 on Inference

H100 remains cheaper in the following inference scenarios:

  • Low-traffic inference: request volume that doesn't saturate the GPU, so B200's bandwidth advantage never materializes
  • Spot pricing: H100 spot rates can drop to $0.79–$1.40/hr, widening H100's per-token advantage further
  • Small models (under 13B): more compute-bound than memory-bound, so H100's slower HBM hurts less

When to Choose H100 vs B200

✦ Choose H100 SXM5 when…

  • Budget is tight and absolute dollar cost matters
  • Training models under 13B parameters
  • Running inference at low-to-medium traffic
  • Using spot instances (massive price discounts)
  • Model fits in 80GB VRAM without quantization
  • Provider availability / existing contracts
  • Fine-tuning rather than pre-training
  • Batch inference on short sequences

◆ Choose B200 when…

  • Pre-training large models (30B+ parameters)
  • Running high-traffic inference APIs (>100M tokens/day)
  • Model requires 80GB+ VRAM (B200 = 192GB)
  • Optimizing cost-per-token over cost-per-hour
  • Memory-bandwidth-bound workloads
  • Distributed training where NVLink 5.0 matters
  • Maximizing compute density in constrained rack space
  • Long-horizon training jobs where throughput compounds

The break-even point: B200 wins on total cost when its throughput advantage exceeds the hourly price premium you actually pay. At the cheapest-available rates used here (a 3.0× premium against a ~2.3× throughput gain) that threshold is not met, so H100 still wins per token; the gap closes with discounted or reserved B200 capacity, at providers with a smaller premium, or when B200's 192GB of VRAM lets a model run on fewer GPUs. For anything under 13B with low concurrency, H100 wins on simplicity and price.
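
That break-even rule is a one-line comparison: B200 is cheaper per token only when its speedup exceeds the hourly premium. An illustrative Python sketch; the prices come from the tables above, the 2.33× speedup is this article's estimate, and the reserved-discount case is hypothetical rather than a provider quote:

```python
# B200 wins on cost per token only if its throughput gain beats the price premium.

def cheaper_per_token(h100_price: float, b200_price: float, speedup: float) -> str:
    premium = b200_price / h100_price
    return "B200" if speedup > premium else "H100"

# Cheapest-available rates: 3.0x premium vs ~2.33x speedup, so H100 wins per token.
print(cheaper_per_token(1.79, 5.29, 2.33))         # H100
# Same-provider rates (Lambda): 2.7x premium, still H100 at 2.33x speedup.
print(cheaper_per_token(1.99, 5.29, 2.33))         # H100
# Hypothetical reserved B200 at ~35% off vs on-demand H100: ~1.7x premium, B200 wins.
print(cheaper_per_token(1.99, 5.29 * 0.65, 2.33))  # B200
```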

Calculate Your Exact Workload Cost

Enter your GPU type, count, hours/day, and workload — get a ranked provider comparison with real pricing data.

Open GPU Cost Calculator → View all 396+ prices

Frequently Asked Questions

How much does it cost to rent a B200 vs an H100?

According to GridStackHub.ai data, the cheapest B200 GPU rental is $5.29/hr (Lambda) as of May 2026. The cheapest H100 SXM5 is $1.79/hr (Shadeform). That makes the B200 approximately 3.0× more expensive per GPU-hour on-demand. For 8-GPU cluster configurations, AWS p6.48xlarge (B200) runs ~$55.20/hr ($6.90/GPU) vs. AWS p5.48xlarge (H100) at ~$32.77/hr ($4.10/GPU). Data from GridStackHub's real-time pricing database covering 32+ providers.

Is the B200 worth the higher price for training?

It depends on what you're optimizing for. The B200 delivers 9,000 TFLOPS FP8 vs H100 SXM5's 3,958 TFLOPS FP8, a 2.27× compute advantage, plus roughly 2.4× the memory bandwidth and VRAM. At cheapest available rates, cost per million trained tokens works out to approximately $0.735 for B200 vs $0.559 for H100, so B200 is roughly 31% more expensive per token trained; its hourly premium currently outpaces its throughput gain. What the premium buys is wall-clock time, NVLink 5.0 interconnect for distributed training, and 192GB of VRAM for models that won't fit in 80GB. For small models (under 13B) or fine-tuning, H100 remains cheaper in absolute dollars.

Which providers offer B200 GPUs on-demand?

As of May 2026, providers offering B200 GPU on-demand include: Lambda, CoreWeave, RunPod, Google Cloud, AWS, Azure. Prices start at $5.29/hr. B200 availability is more limited than H100 — 27 providers carry H100 vs 6 for B200 in GridStackHub's database. Lambda Labs and CoreWeave offer single-GPU B200; AWS and Google Cloud offer B200 only as 8-GPU cluster nodes. See live pricing at GridStackHub.ai.

When is the H100 cheaper than the B200?

H100 is the cheaper choice in three common scenarios: (1) Low-traffic inference — when request volume doesn't saturate the GPU, B200's bandwidth advantage doesn't materialize; (2) Spot pricing — H100 spot rates from Vast.ai can drop to $0.79–$1.40/hr, widening H100's per-token advantage further; (3) Small models (under 13B) — smaller models are more compute-bound than memory-bound, so H100's slower HBM hurts less. For inference at scale on 70B+ models, B200's 8 TB/s bandwidth produces 2.3–2.5× more tokens/sec, which narrows the per-token gap and can close it where the hourly premium you pay falls below that throughput gain.

How much does an H100 or B200 cost per month?

At cheapest available on-demand rates (May 2026): 1× H100 SXM5 at $1.79/hr × 720 hours = $1,288.80/month. 1× B200 at $5.29/hr × 720 hours = $3,808.80/month. For 8-GPU clusters: H100 8× = $10,310.40/month, B200 8× = $30,470.40/month. Reserved pricing typically saves 30–50% vs on-demand for both. Use the GPU Cost Calculator to model your specific workload and hours/day.

How do the H100 and B200 specs compare?

H100 SXM5: 80GB HBM3, 3.35 TB/s memory bandwidth, 3,958 TFLOPS FP8, 700W TDP, NVLink 4.0 (900 GB/s). B200 SXM: 192GB HBM3e, 8.0 TB/s memory bandwidth, 9,000 TFLOPS FP8, 1,000W TDP, NVLink 5.0 (1.8 TB/s). The 2.4× memory bandwidth advantage is the most important spec for LLM inference — it directly determines tokens/sec throughput on memory-bandwidth-bound workloads. The 2.4× VRAM increase lets B200 fit significantly larger models in a single GPU without quantization losses.

Which is cheaper per token, H100 or B200?

Based on GridStackHub.ai data and throughput estimates: H100 SXM5 at $1.79/hr produces ~12,000 tokens/sec for inference on a 70B model, about $0.041 per million tokens. B200 at $5.29/hr produces ~28,000 tokens/sec (2.33× faster due to memory bandwidth), about $0.052 per million tokens. At these rates the H100 is the cheaper option per token: B200 costs approximately 27% more per million tokens generated and approximately 31% more per million tokens trained, because its 3.0× hourly premium outpaces its ~2.3× throughput gain. B200 becomes cheaper per token only where the premium you actually pay falls below that throughput multiple.