According to GridStackHub.ai data, the cheapest NVIDIA B200 GPU rental in April 2026 is $5.29/hr on Lambda (1x B200 SXM, 192GB HBM3e, on-demand), with prices ranging from $5.29 to $7.05/GPU/hr across the 6 cloud providers currently tracked in real time. For 8-GPU nodes, Google Cloud is the cheapest hyperscaler at $52.80/hr total ($6.60/GPU). The B200 runs on NVIDIA's Blackwell architecture, a full generational leap from Hopper, delivering 9,000 TFLOPS in FP8: roughly 2.3× the FP8 throughput of both the H200 and the H100, which share the same Hopper compute rate. B200 availability is constrained in H1 2026 but expanding as NVIDIA ramps Blackwell production. GridStackHub tracks all B200 pricing daily.
Cheapest verified B200 SXM cloud price (Lambda, 192GB HBM3e, on-demand) — Blackwell architecture: 9,000 TFLOPS FP8, 8.0 TB/s bandwidth, 2.3× faster than H200. Same framework code runs on Blackwell. More throughput per dollar at scale.
NVIDIA B200 Cloud Pricing — Live Table (April 2026)
GridStackHub tracks NVIDIA B200 pricing across 6 cloud providers. The B200 SXM is available on-demand from independent providers starting at $5.29/hr, while hyperscalers (Google Cloud, AWS, Azure) primarily offer B200 via reserved capacity and committed-use contracts. Here is every provider we track:
| Provider | Instance / Config | GPU VRAM | Pricing Type | Price | Status |
|---|---|---|---|---|---|
| Lambda | 1x B200 SXM | 192 GB HBM3e | On-demand | $5.29/hr | VERIFIED |
| CoreWeave | B200 SXM (Early Access) | 192 GB HBM3e | On-demand | $5.49/hr | EARLY ACCESS |
| RunPod | NVIDIA B200 | 180 GB HBM3e | On-demand | $5.98/hr | VERIFIED |
| *8-GPU nodes below: price shown is total/hr (per-GPU in parentheses)* | | | | | |
| Google Cloud | a4-highgpu-8g (8x B200) | 8× 192 GB | On-demand | $52.80/hr ($6.60/GPU) | VERIFIED |
| AWS | p6.48xlarge (8x B200) | 8× 192 GB | On-demand | $55.20/hr ($6.90/GPU) | VERIFIED |
| Azure | ND B200 v6 (8x B200) | 8× 192 GB | On-demand | $56.40/hr ($7.05/GPU) | VERIFIED |
Data sourced from GridStackHub's live pricing database, April 22, 2026. VERIFIED = confirmed via live provider API or pricing page. EARLY ACCESS = limited availability; contact provider for allocation. Prices subject to change — verify with provider before committing. RunPod B200 VRAM is 180GB (PCIe variant).
B200 supply is tight through mid-2026. NVIDIA is ramping Blackwell production, but demand from AI labs, hyperscalers, and inference providers is absorbing supply faster than it arrives. Lambda and RunPod offer the most accessible on-demand access. For 8-GPU clusters, Google Cloud has the most competitive hyperscaler pricing at $52.80/hr. For reservations, contact providers directly — 90-day+ commitments typically get 15–25% below on-demand pricing.
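To make the table concrete, here is a minimal sketch of monthly spend and the reserved-discount arithmetic described above. The 15–25% discount band is the rule of thumb quoted in this article, not a quoted rate; actual reservation pricing varies by provider and term.

```python
# Rough monthly spend for an always-on rental, using the pricing table above.
HOURS_PER_MONTH = 730  # average hours per month (8,760 / 12)

providers = {
    "Lambda 1x B200":        5.29,
    "CoreWeave 1x B200":     5.49,
    "RunPod 1x B200":        5.98,
    "GCP 8x B200 (node)":   52.80,
    "AWS 8x B200 (node)":   55.20,
    "Azure 8x B200 (node)": 56.40,
}

for name, rate in providers.items():
    on_demand = rate * HOURS_PER_MONTH
    # 90-day+ commitments typically land 15-25% below on-demand (rule of thumb above)
    lo, hi = on_demand * 0.75, on_demand * 0.85
    print(f"{name:22} ${on_demand:>9,.0f}/mo on-demand | ${lo:,.0f}-${hi:,.0f} reserved")
```

A single Lambda B200 at $5.29/hr works out to roughly $3,860/month always-on, which is the number to beat when weighing a reservation.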
B200 vs H200 vs H100: Full Specification Comparison
The B200 is NVIDIA's first Blackwell-architecture GPU, succeeding the Hopper-based H100 and H200. It is not an incremental upgrade — Blackwell is a new architecture with significantly higher throughput. Here is the complete side-by-side:
| Spec | NVIDIA B200 SXM | NVIDIA H200 SXM | NVIDIA H100 SXM5 |
|---|---|---|---|
| Architecture | Blackwell | Hopper | Hopper |
| GPU Memory | 192 GB HBM3e | 141 GB HBM3e | 80 GB HBM3 |
| Memory Bandwidth | 8.0 TB/s | 4.8 TB/s | 3.35 TB/s |
| FP8 Throughput | 9,000 TFLOPS | 3,958 TFLOPS | 3,958 TFLOPS |
| BF16 Throughput | 4,500 TFLOPS | 1,979 TFLOPS | 1,979 TFLOPS |
| Memory Type | HBM3e (gen 2) | HBM3e | HBM3 |
| Min Cloud Price (1 GPU) | $5.29/hr (Lambda) | $2.99/hr (Lambda) | ~$1.74/hr (Lambda) |
| Cost per GB VRAM | $0.0276/GB | $0.0212/GB | $0.0218/GB |
| Inference throughput vs H100 | ~3–4× faster | ~1.4–1.5× faster | Baseline |
| 70B model on 1 GPU (BF16) | Yes — with headroom | Yes — tight | No — needs 2× H100 |
| TDP (Power) | 1,000W | 700W | 700W |
| Cloud Availability | Limited (6 providers) | Growing (10+ providers) | Broad (15+ providers) |
| Software Maturity | Early (maturing fast) | Mature (CUDA) | Mature (CUDA) |
The headline: B200 is not an iterative upgrade — it is a generational leap. With 2.27× higher FP8 throughput and 1.67× more memory bandwidth than H200, the B200 delivers meaningfully higher tokens-per-second for inference and significantly faster iteration time for training. The tradeoff is availability, price, and software maturity — all of which improve throughout 2026.
Why B200 bandwidth matters more than TFLOPS for inference: LLM token generation during autoregressive decoding is memory-bandwidth-limited, not compute-limited. The B200's 8.0 TB/s (vs H200's 4.8 TB/s) translates almost 1:1 to faster token generation for any model where the bottleneck is reading weights and KV cache from VRAM — which is nearly every production LLM inference deployment. For batch inference and training, the B200's 2.27× FP8 advantage compounds further.
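As a sanity check on that claim, here is a back-of-envelope estimate of the bandwidth ceiling on single-stream decode, assuming an illustrative 70B-parameter model in BF16 and batch size 1; these are upper bounds, not measured throughput.

```python
# Bandwidth ceiling on single-stream decode: each token must read all weights.
def decode_ceiling_tok_s(bandwidth_tb_s: float, params_b: float,
                         bytes_per_param: float = 2.0) -> float:
    """Upper bound on tokens/sec when decode is purely bandwidth-bound."""
    bytes_per_token = params_b * 1e9 * bytes_per_param  # weight bytes read per token
    return bandwidth_tb_s * 1e12 / bytes_per_token

# Illustrative 70B model in BF16 (2 bytes/param); H100 shown for the ratio only,
# since 70B BF16 does not actually fit in a single 80 GB H100.
for gpu, bw in [("B200", 8.0), ("H200", 4.8), ("H100", 3.35)]:
    print(f"{gpu}: <= {decode_ceiling_tok_s(bw, 70):.0f} tok/s per stream")
```

The ceilings come out around 57 tok/s (B200) versus 34 tok/s (H200), a 1.67× gap that mirrors the bandwidth ratio exactly, which is the "almost 1:1" relationship described above.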
When to Use B200: Training vs Inference Use Cases
The B200 is the right choice for workloads that can absorb its cost premium through higher utilization, lower latency, or fewer GPUs. Here is the use-case breakdown:
Choose B200 when:
- You're training large models (70B+ parameters) and throughput is your bottleneck. B200's 4,500 TFLOPS BF16 is 2.27× higher than H200 and H100. A training run that takes 1,000 GPU-hours on H100 requires approximately 430–500 GPU-hours on B200. At $5.29/hr on-demand that is roughly $2,300–$2,600 versus ~$1,740 on H100; with a reserved-rate discount (15–25% off, see above) the totals approach parity, and either way the run finishes in less than half the wall-clock time (worked numbers in the sketch after this list).
- You need maximum inference throughput for high-request-volume services. For production inference with heavy concurrent load (100+ simultaneous requests), the B200's combined bandwidth and compute advantage can serve 3–4× more requests per GPU-hour than H100. At that utilization, B200 can be cheaper per token than H100 despite the higher hourly rate.
- You're serving large models (70B–130B) and want to minimize GPU count. B200's 192GB VRAM fits Llama 3.1 70B in BF16 with generous KV cache headroom — better than H200's tight fit, and dramatically better than H100's requirement for tensor parallelism across 2+ GPUs. Fewer GPUs means simpler infrastructure and lower network costs.
- You need sub-100ms time-to-first-token for real-time applications. For interactive applications where latency is the product quality metric, B200's higher bandwidth means faster first token and lower decode latency per request versus H200 and H100.
- You're building for 2026 and want hardware runway. B200 will remain NVIDIA's flagship GPU throughout 2026. Workloads built on B200 now won't need to migrate for at least 2–3 years. H100 is now two product generations behind (the H200 refresh, then Blackwell).
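A minimal sketch of the training-cost and memory arithmetic behind the bullets above. Rates and the 2.27× speedup come from this article's tables; the 70B fit check uses 2 bytes/param for BF16 weights only and ignores activations and optimizer state, which consume part of the headroom in practice.

```python
# Worked numbers behind the training bullet above.
H100_RATE, B200_RATE = 1.74, 5.29   # $/GPU-hr on-demand, from the pricing table
BF16_SPEEDUP = 2.27                 # B200 vs H100 BF16 throughput (spec table)

h100_hours = 1_000
b200_hours = h100_hours / BF16_SPEEDUP          # ~440 GPU-hours

print(f"H100: {h100_hours} h -> ${h100_hours * H100_RATE:,.0f}")
print(f"B200: {b200_hours:.0f} h -> ${b200_hours * B200_RATE:,.0f} on-demand, "
      f"${b200_hours * B200_RATE * 0.75:,.0f} at a 25% reserved discount")

# Single-GPU fit check for a 70B model in BF16 (weights only, 2 bytes/param):
weights_gb = 70e9 * 2 / 1e9
print(f"70B BF16 weights: {weights_gb:.0f} GB -> "
      f"{192 - weights_gb:.0f} GB headroom on B200, "
      f"{141 - weights_gb:.0f} GB on H200; does not fit in H100's 80 GB")
```

The output makes the table's qualitative calls concrete: ~52 GB of KV-cache headroom on B200, ~1 GB on H200 ("tight"), and no single-GPU fit on H100.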
Choose H200 or H100 instead when:
- Budget is the primary constraint and utilization is low. At $5.29/hr, a B200 billed only for the ~7 hours/day it is actually busy (30% utilization) costs ~$38/day; an H100 at $1.74/hr under the same conditions costs ~$12.50/day. The B200 premium only pays off at high utilization or when throughput-per-dollar is the metric (see the break-even sketch after this list).
- You need immediate, proven on-demand availability. H200 from Lambda at $2.99/hr or H100 from multiple providers at $1.74/hr are available now with no early-access friction. B200 supply is constrained and provider access requires more planning.
- Your software stack needs validation on Blackwell. While B200 supports standard CUDA code, some libraries and custom kernels require testing. If you have a production deployment that can't tolerate a migration period, H200 is the lower-risk choice.
- Your model fits in 80GB and is not inference-heavy. For small models (under 40B parameters) at low utilization, H100's lower cost per hour is hard to beat. The B200 premium doesn't pay back if the model doesn't push memory or compute limits.
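A sketch of the break-even logic referenced above. The throughput multipliers are the hedged ~3–4× inference range this article quotes, not benchmark results.

```python
# Break-even check: B200's hourly premium vs its throughput multiple over H100.
B200_RATE, H100_RATE = 5.29, 1.74   # $/GPU-hr from the pricing table

# Assumed inference multipliers spanning the ~3-4x range quoted above
for speedup in (3.0, 3.5, 4.0):
    effective = B200_RATE / speedup          # B200 cost per H100-hour of work
    verdict = "cheaper per token" if effective < H100_RATE else "pricier per token"
    print(f"{speedup:.1f}x: effective ${effective:.2f}/H100-hr-equiv -> {verdict}")

# B200 must deliver at least this multiple of H100 throughput to win on cost:
print(f"break-even multiplier: {B200_RATE / H100_RATE:.2f}x")
```

At current prices the break-even sits near 3.04×, so B200 only wins on cost per token toward the upper end of the quoted range, which is why low-utilization workloads should stay on H100 or H200.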
Run the B200 vs H200 vs H100 numbers for your workload
Enter your model size, requests per hour, and precision — get exact cost per token and monthly GPU spend for B200, H200, H100, and 50+ other configurations.
Open Calculator →
B200 Availability Tracker: In Stock vs Waitlist
B200 availability varies significantly by provider. Here is the real-time status as of April 2026:
Lambda
On-demand available at $5.29/hr. Best on-demand access for 1–4 GPU configs. Consistent availability in 2026.
CoreWeave
Early access at $5.49/hr. Apply for B200 SXM allocation. Enterprise SLAs and HPC networking available.
RunPod
On-demand B200 at $5.98/hr. PCIe variant (180GB). Serverless and on-demand billing. Spot pricing sometimes available.
Google Cloud (a4)
$52.80/hr for 8-GPU nodes ($6.60/GPU). Primarily committed-use contracts. On-demand access limited and region-dependent.
AWS (p6)
$55.20/hr for 8-GPU nodes ($6.90/GPU). Reserved instances preferred. On-demand capacity exists but waitlisted.
Azure (ND B200 v6)
$56.40/hr for 8-GPU nodes ($7.05/GPU). Enterprise access through Azure HPC program. Limited on-demand.
Availability outlook for H2 2026: NVIDIA is ramping Blackwell production aggressively. More providers — including Nebius, Crusoe Energy, and additional independents — are expected to list B200 capacity in Q3 2026. Prices are expected to drift lower as supply increases. Set a GridStackHub price alert to be notified when new providers list B200 or existing prices drop.
B200 Price Trend: 30-Day Movement
The B200 launched with early-access pricing above $9/hr at some providers in Q1 2026. As NVIDIA scaled Blackwell production and more providers gained allocation, pricing has compressed toward the current floor of $5.29/hr. Here is the price trend since Blackwell became broadly available:
B200 pricing has fallen approximately 40% since early access launched in Q1 2026 as NVIDIA ramped Blackwell production. The $5.29/hr floor (Lambda) represents the current market equilibrium for single-GPU on-demand access. GridStackHub forecasts continued gradual compression through H2 2026 as more providers gain B200 allocation.
Ask GridStackHub About B200 Pricing
Get answers from live pricing data — compare B200 vs H200 cost, estimate monthly spend, or find the cheapest B200 option for your workload.
Track B200 Prices and Set Alerts
B200 pricing is moving rapidly in 2026 as production scales and more providers gain access. GridStackHub tracks every provider daily — here is how to stay ahead:
Get B200 price alerts
We'll notify you when B200 prices drop, new providers list capacity, or a better deal appears. Free — no credit card required.
Compare B200 against every alternative for your workload
Set your model size, batch size, and hours per month — see exact monthly cost for B200, H200, H100, AMD MI300X, and 50+ more configurations side by side.
Open GPU Cost Calculator →