Live data — B200 pricing updated daily from provider APIs

According to GridStackHub.ai data, the cheapest NVIDIA B200 GPU rental in April 2026 is $5.29/hr on Lambda (1x B200 SXM, 192GB HBM3e, on-demand), with prices ranging from $5.29 to $7.05/GPU/hr across the 6 cloud providers currently tracked in real time. For 8-GPU nodes, Google Cloud is the cheapest hyperscaler at $52.80/hr total ($6.60/GPU). The B200 runs on NVIDIA's Blackwell architecture — a full generational leap from Hopper — delivering 9,000 TFLOPS in FP8, roughly 2.3× the FP8 throughput of both the H200 and the H100. B200 availability is constrained in H1 2026 but expanding as NVIDIA ramps Blackwell production. GridStackHub tracks all B200 pricing daily.

$5.29/hr

Cheapest verified B200 SXM cloud price (Lambda, 192GB HBM3e, on-demand) — Blackwell architecture: 9,000 TFLOPS FP8, 8.0 TB/s bandwidth, 2.3× faster than H200. Same framework code runs on Blackwell. More throughput per dollar at scale.

NVIDIA B200 Cloud Pricing — Live Table (April 2026)

GridStackHub tracks NVIDIA B200 pricing across 6 cloud providers. The B200 SXM is available on-demand from independent providers starting at $5.29/hr, while hyperscalers (Google Cloud, AWS, Azure) primarily offer B200 via reserved capacity and committed-use contracts. Here is every provider we track:

| Provider | Instance / Config | GPU VRAM | Pricing Type | Price | Status |
| --- | --- | --- | --- | --- | --- |
| Lambda | 1x B200 SXM | 192 GB HBM3e | On-demand | $5.29/hr | VERIFIED |
| CoreWeave | B200 SXM (Early Access) | 192 GB HBM3e | On-demand | $5.49/hr | EARLY ACCESS |
| RunPod | NVIDIA B200 | 180 GB HBM3e | On-demand | $5.98/hr | VERIFIED |
| Google Cloud | a4-highgpu-8g (8x B200) | 8× 192 GB | On-demand | $52.80/hr ($6.60/GPU) | VERIFIED |
| AWS | p6.48xlarge (8x B200) | 8× 192 GB | On-demand | $55.20/hr ($6.90/GPU) | VERIFIED |
| Azure | ND B200 v6 (8x B200) | 8× 192 GB | On-demand | $56.40/hr ($7.05/GPU) | VERIFIED |

For the 8-GPU node rows, the price shown is the node total per hour, with the per-GPU rate in parentheses.

Data sourced from GridStackHub's live pricing database, April 22, 2026. VERIFIED = confirmed via live provider API or pricing page. EARLY ACCESS = limited availability; contact provider for allocation. Prices subject to change — verify with provider before committing. RunPod B200 VRAM is 180GB (PCIe variant).

B200 supply is tight through mid-2026. NVIDIA is ramping Blackwell production, but demand from AI labs, hyperscalers, and inference providers is absorbing supply faster than it arrives. Lambda and RunPod offer the most accessible on-demand access. For 8-GPU clusters, Google Cloud has the most competitive hyperscaler pricing at $52.80/hr. For reservations, contact providers directly — 90-day+ commitments typically get 15–25% below on-demand pricing.

B200 vs H200 vs H100: Full Specification Comparison

The B200 is NVIDIA's first Blackwell-architecture GPU, succeeding the Hopper-based H100 and H200. It is not an incremental upgrade — Blackwell is a new architecture with significantly higher throughput. Here is the complete side-by-side:

| Spec | NVIDIA B200 SXM | NVIDIA H200 SXM | NVIDIA H100 SXM5 |
| --- | --- | --- | --- |
| Architecture | Blackwell | Hopper | Hopper |
| GPU Memory | 192 GB HBM3e | 141 GB HBM3e | 80 GB HBM3 |
| Memory Bandwidth | 8.0 TB/s | 4.8 TB/s | 3.35 TB/s |
| FP8 Throughput | 9,000 TFLOPS | 3,958 TFLOPS | 3,958 TFLOPS |
| BF16 Throughput | 4,500 TFLOPS | 1,979 TFLOPS | 1,979 TFLOPS |
| Memory Type | HBM3e (gen 2) | HBM3e | HBM3 |
| Min Cloud Price (1 GPU) | $5.29/hr (Lambda) | $2.99/hr (Lambda) | ~$1.74/hr (Lambda) |
| Cost per GB VRAM | $0.0276/GB | $0.0212/GB | $0.0218/GB |
| Inference Throughput vs H100 | ~3–4× faster | ~1.4–1.5× faster | Baseline |
| 70B Model on 1 GPU (BF16) | Yes — with headroom | Yes — tight | No — needs 2× H100 |
| TDP (Power) | 1,000W | 700W | 700W |
| Cloud Availability | Limited (6 providers) | Growing (10+ providers) | Broad (15+ providers) |
| Software Maturity | Early (maturing fast) | Mature (CUDA) | Mature (CUDA) |

The headline: B200 is not an iterative upgrade — it is a generational leap. With 2.27× the FP8 throughput and 1.67× the memory bandwidth of the H200, the B200 delivers meaningfully higher tokens-per-second for inference and significantly faster iteration time for training. The tradeoffs are availability, price, and software maturity — all of which are improving throughout 2026.

Why B200 bandwidth matters more than TFLOPS for inference: LLM token generation during autoregressive decoding is memory-bandwidth-limited, not compute-limited. The B200's 8.0 TB/s (vs H200's 4.8 TB/s) translates almost 1:1 to faster token generation for any model where the bottleneck is reading weights and KV cache from VRAM — which is nearly every production LLM inference deployment. For batch inference and training, the B200's 2.27× FP8 advantage compounds further.
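
To see why, here is a minimal back-of-the-envelope sketch (not a benchmark): it assumes batch-size-1 decoding where each generated token streams every model weight from VRAM once, and it ignores KV-cache traffic and kernel overheads.

```python
# Back-of-the-envelope decode ceiling: at batch size 1, each generated token
# must stream every weight from VRAM, so peak tokens/s <= bandwidth / weight bytes.
# Illustrative only -- real single-stream numbers are lower (KV-cache reads,
# launch overheads), and batched aggregate throughput is far higher.

WEIGHT_BYTES = 70e9 * 2  # 70B-parameter model in BF16: ~140 GB of weights

bandwidth_tb_s = {"B200": 8.0, "H200": 4.8, "H100": 3.35}  # TB/s, from the table above

for gpu, bw in bandwidth_tb_s.items():
    ceiling = bw * 1e12 / WEIGHT_BYTES  # tokens/s upper bound
    print(f"{gpu}: ~{ceiling:.0f} tokens/s ceiling (70B BF16, batch 1)")
# B200: ~57, H200: ~34, H100: ~24
```

The ceilings (~57 vs ~34 vs ~24 tokens/s) track the bandwidth figures almost exactly, which is why the 1.67× bandwidth gap shows up nearly 1:1 in decode speed.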

When to Use B200: Training vs Inference Use Cases

The B200 is the right choice for workloads that can absorb its cost premium through higher utilization, lower latency, or fewer GPUs. Here is the use-case breakdown:

Choose B200 when:

  • You're training large models (70B+ parameters) and throughput is your bottleneck. B200's 4,500 TFLOPS BF16 is 2.27× that of the H200 and H100. A training run that takes 1,000 GPU-hours on H100 needs roughly 430–500 GPU-hours on B200. At $5.29/hr that totals about $2,275–$2,645 versus $1,740 for 1,000 hours at $1.74/hr on H100 — a modest dollar premium in exchange for finishing in less than half the wall-clock time (a cost sketch follows this list).
  • You need maximum inference throughput for high-request-volume services. For production inference with heavy concurrent load (100+ simultaneous requests), the B200's combined bandwidth and compute advantage can serve 3–4× more requests per GPU-hour than H100. At that utilization, B200 can be cheaper per token than H100 despite the higher hourly rate.
  • You're serving large models (70B–130B) and want to minimize GPU count. B200's 192GB VRAM fits Llama 3.1 70B in BF16 with generous KV cache headroom — better than H200's tight fit, and dramatically better than H100's requirement for tensor parallelism across 2+ GPUs. Fewer GPUs means simpler infrastructure and lower network costs.
  • You need sub-100ms time-to-first-token for real-time applications. For interactive applications where latency is the product quality metric, B200's higher bandwidth means faster first token and lower decode latency per request versus H200 and H100.
  • You're building for 2026 and want hardware runway. B200 will remain NVIDIA's flagship GPU throughout 2026. Workloads built on B200 now won't need to migrate for at least 2–3 years. H100 is now two product generations behind.
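
To make the training bullet concrete, here is a minimal cost sketch under stated assumptions: the 2.2× realized speedup is a hypothetical figure between the ~2× real-world training speedup and the 2.27× raw FLOPS ratio quoted in this article, and the rates are the April 2026 on-demand prices from the table above.

```python
# Rough training-run cost/time comparison. ASSUMED_SPEEDUP is a placeholder
# between the ~2x realized and 2.27x theoretical figures cited above.

H100_RATE, B200_RATE = 1.74, 5.29   # $/GPU-hr, Lambda on-demand (April 2026)
H100_GPU_HOURS = 1_000              # baseline run size
ASSUMED_SPEEDUP = 2.2               # hypothetical realized B200 speedup

b200_gpu_hours = H100_GPU_HOURS / ASSUMED_SPEEDUP
print(f"H100: {H100_GPU_HOURS:.0f} GPU-hrs, ${H100_GPU_HOURS * H100_RATE:,.0f} total")
print(f"B200: {b200_gpu_hours:.0f} GPU-hrs, ${b200_gpu_hours * B200_RATE:,.0f} total")
# H100: 1000 GPU-hrs, $1,740 total
# B200: 455 GPU-hrs, $2,405 total -- more dollars, under half the wall-clock time
```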

Choose H200 or H100 instead when:

  • Budget is the primary constraint and utilization is low. At $5.29/hr, a single B200 rented about 7 hours a day (a 30% duty cycle) costs ~$38/day. An H100 at $1.74/hr on the same schedule costs ~$12.50/day. The B200 premium only pays off at high utilization or when throughput-per-dollar is the metric (see the break-even sketch after this list).
  • You need immediate, proven on-demand availability. H200 from Lambda at $2.99/hr or H100 from multiple providers at $1.74/hr are available now with no early-access friction. B200 supply is constrained and provider access requires more planning.
  • Your software stack needs validation on Blackwell. While B200 supports standard CUDA code, some libraries and custom kernels require testing. If you have a production deployment that can't tolerate a migration period, H200 is the lower-risk choice.
  • Your model fits in 80GB and is not inference-heavy. For small models (under 40B parameters) at low utilization, H100's lower cost per hour is hard to beat. The B200 premium doesn't pay back if the model doesn't push memory or compute limits.
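
A minimal break-even sketch for the tradeoff above, assuming B200 throughput is some multiple of H100's: B200 is cheaper per unit of work exactly when that multiple exceeds the hourly-rate ratio. The speedup values below are hypothetical placeholders; plug in your own benchmark numbers.

```python
# Break-even logic: B200 is cheaper per token/request exactly when its
# realized speedup over H100 exceeds the hourly-rate ratio (~3.04x here).

B200_RATE, H100_RATE = 5.29, 1.74
rate_ratio = B200_RATE / H100_RATE  # ~3.04x hourly premium

for speedup in (2.0, rate_ratio, 4.0):  # hypothetical realized speedups
    rel = rate_ratio / speedup          # B200 cost per unit of work vs H100
    verdict = "cheaper" if rel < 1 else ("break-even" if rel == 1 else "pricier")
    print(f"speedup {speedup:.2f}x -> B200 at {rel:.2f}x H100 cost/unit ({verdict})")
```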

Run the B200 vs H200 vs H100 numbers for your workload

Enter your model size, requests per hour, and precision — get exact cost per token and monthly GPU spend for B200, H200, H100, and 50+ other configurations.

Open Calculator →
Blackwell Price Index — Free → | H200 Pricing → | Full GPU Comparison →

B200 Availability Tracker: In Stock vs Waitlist

B200 availability varies significantly by provider. Here is the real-time status as of April 2026:

Lambda

On-demand available at $5.29/hr. Best on-demand access for 1–4 GPU configs. Consistent availability in 2026.

CoreWeave

Early access at $5.49/hr. Apply for B200 SXM allocation. Enterprise SLAs and HPC networking available.

RunPod

On-demand B200 at $5.98/hr. PCIe variant (180GB). Serverless and on-demand billing. Spot pricing sometimes available.

Google Cloud (a4)

$52.80/hr for 8-GPU nodes ($6.60/GPU). Primarily committed-use contracts. On-demand access limited and region-dependent.

AWS (p6)

$55.20/hr for 8-GPU nodes ($6.90/GPU). Reserved instances preferred. On-demand capacity exists but waitlisted.

Azure (ND B200 v6)

$56.40/hr for 8-GPU nodes ($7.05/GPU). Enterprise access through Azure HPC program. Limited on-demand.

Availability outlook for H2 2026: NVIDIA is ramping Blackwell production aggressively. More providers — including Nebius, Crusoe Energy, and additional independents — are expected to list B200 capacity in Q3 2026. Prices are expected to drift lower as supply increases. Set a GridStackHub price alert to be notified when new providers list B200 or existing prices drop.

B200 Price Trend: 30-Day Movement

The B200 launched with early-access pricing above $9/hr at some providers in Q1 2026. As NVIDIA scaled Blackwell production and more providers gained allocation, pricing has compressed toward the current floor of $5.29/hr. Here is the price trend since Blackwell became broadly available:

[Chart: Cheapest B200 on-demand price ($/hr), March–April 2026. Series: min B200 on-demand price; min H200 on-demand price.]

B200 pricing has fallen approximately 40% since early access launched in Q1 2026 as NVIDIA ramped Blackwell production. The $5.29/hr floor (Lambda) represents the current market equilibrium for single-GPU on-demand access. GridStackHub forecasts continued gradual compression through H2 2026 as more providers gain B200 allocation.

Ask GridStackHub About B200 Pricing

Get answers from live pricing data — compare B200 vs H200 cost, estimate monthly spend, or find the cheapest B200 option for your workload.

Frequently Asked Questions

What is the cheapest NVIDIA B200 GPU cloud provider in 2026?
The cheapest NVIDIA B200 cloud rental in April 2026 is $5.29/hr on Lambda (1x B200 SXM, 192GB HBM3e, on-demand). CoreWeave is second at $5.49/hr for early-access B200 SXM capacity. RunPod lists B200 at $5.98/hr on-demand (180GB PCIe variant). For 8-GPU nodes, Google Cloud starts at $52.80/hr total ($6.60/GPU), followed by AWS at $55.20/hr ($6.90/GPU) and Azure at $56.40/hr ($7.05/GPU). GridStackHub tracks all B200 pricing daily — set a price alert to be notified when prices drop or new providers list B200 capacity.
How much faster is the B200 compared to H100 and H200?
The B200 delivers approximately 2.27× higher FP8 compute throughput than H100 and H200 (9,000 vs 3,958 TFLOPS). On memory bandwidth — which directly determines LLM inference speed — the B200 has 8.0 TB/s versus H200's 4.8 TB/s (1.67× faster) and H100's 3.35 TB/s (2.39× faster). In practice, real-world inference benchmarks show 2.5–3× higher tokens/second versus H100 for large language models, and 1.8–2.3× versus H200. The exact speedup depends on model size, batch size, and precision — larger models with bigger KV caches see larger B200 benefits. For training, B200's 2.27× compute advantage translates to roughly 2× faster training iteration times.
Is the B200 available on-demand in 2026?
Limited on-demand B200 capacity is available in April 2026. Lambda is the most accessible option at $5.29/hr on-demand. RunPod also offers on-demand B200 at $5.98/hr. CoreWeave has early-access B200 at $5.49/hr — apply for an allocation directly. Hyperscalers (Google Cloud a4, AWS p6, Azure ND B200) primarily serve B200 via committed-use contracts and reserved instances; on-demand access exists but is subject to quotas and waitlists. Availability is expected to improve throughout H2 2026 as NVIDIA scales Blackwell GPU production. For multi-node clusters (8+ B200s), plan lead time of 1–4 weeks even with independent cloud providers.
B200 vs H200: which should I choose for LLM inference in 2026?
Choose H200 if you need stable on-demand availability right now at a predictable price ($2.99–$6.00/GPU/hr), mature tooling, and no early-access friction. H200 has 10+ providers with consistent supply and proven CUDA compatibility — ideal for production inference systems in H1 2026. Choose B200 if throughput-per-dollar at scale is the goal. The B200 delivers roughly 1.8–2.3× the inference throughput per GPU-hour, so at high utilization (70%+) its effective cost per token is lower than H200's despite the 77% price premium ($5.29 vs $2.99/hr). The break-even: if the B200 serves at least 1.77× as many requests per hour as the H200, it costs the same or less per request. Most high-traffic production deployments cross this threshold. For training, B200 is the clear winner on every metric.
What models fit on a single NVIDIA B200 (192GB)?
The B200's 192GB HBM3e fits: any model up to ~90B parameters at BF16 (Llama 3.1 70B fits with generous KV cache headroom at ~140GB of weights), up to ~350B parameters at 4-bit quantization, and virtually any 7B–34B model with room for multiple concurrent replicas. Llama 3.1 405B at 4-bit (~202GB of weights) does not quite fit on a single B200 — two B200s (384GB combined) give comfortable headroom. Mixtral 8×22B requires 2× B200 at BF16 but fits easily on one at 4-bit. The B200's memory advantage over H100 (192GB vs 80GB) means models that need 4× H100 at BF16 (e.g., Llama 70B served at large batch sizes) often fit on 2× B200 — halving the GPU count and simplifying tensor parallelism, even though the combined hourly rate is higher.
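A rough sketch of the fit arithmetic behind those numbers: weights in GB are approximately parameters (in billions) times bytes per parameter, plus a reserve for KV cache and activations. The 15% reserve below is an illustrative assumption, not a measured figure; real KV-cache needs grow with batch size and context length.

```python
# Rule-of-thumb single-GPU fit check. The 15% reserve for KV cache and
# activations is an illustrative assumption, not a measured figure.

def fits_single_gpu(params_billion: float, bytes_per_param: float,
                    vram_gb: float = 192.0, reserve_frac: float = 0.15) -> bool:
    weights_gb = params_billion * bytes_per_param  # 1B params * 1 byte ~= 1 GB
    return weights_gb <= vram_gb * (1.0 - reserve_frac)

checks = [("Llama 3.1 70B @ BF16", 70, 2.0),
          ("Llama 3.1 405B @ 4-bit", 405, 0.5),
          ("Mixtral 8x22B @ BF16", 141, 2.0)]
for name, params, bpp in checks:
    ok = fits_single_gpu(params, bpp)
    print(f"{name}: {'fits' if ok else 'needs >1 GPU'} on one 192GB B200")
# 70B BF16 fits; 405B 4-bit and Mixtral BF16 exceed the reserve-adjusted budget.
```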
How does B200 pricing compare to H100 per token?
At $5.29/hr (B200) vs $1.74/hr (H100), the B200 costs 3.04× more per GPU-hour. However, for high-throughput inference workloads, the B200 delivers roughly 3–4× more tokens per GPU-hour versus H100 due to its higher memory bandwidth (8.0 TB/s vs 3.35 TB/s) and compute throughput. This means the effective cost per token on B200 can be equal to or lower than H100 at high utilization. The crossover point depends on your batch size, model size, and sequence lengths — use the GridStackHub calculator to run the specific numbers for your workload. For bursty, low-utilization inference, H100 remains cheaper per token due to the hourly rate difference.
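To illustrate that crossover arithmetic, here is a minimal cost-per-token sketch. The throughput figures are hypothetical placeholders consistent with the 3–4× range above; substitute measured numbers from your own deployment, or use the calculator.

```python
# $/1M tokens = hourly rate / (tokens per second * 3600) * 1e6.
# Throughput values are hypothetical placeholders, not benchmarks.

def usd_per_million_tokens(rate_per_hr: float, tokens_per_sec: float) -> float:
    return rate_per_hr / (tokens_per_sec * 3600.0) * 1e6

# Assumed aggregate (batched) throughput for the same 70B deployment:
print(f"H100 @ $1.74/hr, 1000 tok/s -> ${usd_per_million_tokens(1.74, 1000):.2f}/M tok")
print(f"B200 @ $5.29/hr, 3500 tok/s -> ${usd_per_million_tokens(5.29, 3500):.2f}/M tok")
# H100 -> $0.48/M tok; B200 -> $0.42/M tok: B200 wins once speedup tops ~3.04x
```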

Track B200 Prices and Set Alerts

B200 pricing is moving rapidly in 2026 as production scales and more providers gain access. GridStackHub tracks every provider daily — here is how to stay ahead:

Compare B200 against every alternative for your workload

Set your model size, batch size, and hours per month — see exact monthly cost for B200, H200, H100, AMD MI300X, and 50+ more configurations side by side.

Open GPU Cost Calculator →
Blackwell Price Index — Free → | H200 Pricing → | AMD MI300X Alternative →