H100 Pricing
How much does an H100 GPU cost per hour?
H100 SXM5 costs $1.49/hr on spot (Vast.ai), $1.99/hr on-demand (Lambda), $2.23/hr on-demand (CoreWeave), and $3.90–4.10/hr at AWS, Azure, and GCP. Reserved 1-year pricing: $1.79/hr at CoreWeave.
Which is the cheapest H100 cloud provider?
For on-demand: Lambda Labs at $1.99/hr. For reserved: CoreWeave at ~$1.79/hr (1-year). For spot/interruptible: Vast.ai at ~$1.49/hr. All prices as of May 2026 — check GridStackHub's comparison table for current rates.
Is H100 pricing dropping in 2026?
Yes, but gradually. H100 on-demand prices have fallen ~45% from 2024 peaks. The primary drivers are supply normalization and B200 availability, which is pulling high-end training workloads off the H100. Expect prices to stabilize in the $1.75–2.25/hr range through 2026.
What is the H100 price per month?
One H100 running 24/7 on CoreWeave costs ~$1,606/month on-demand, ~$1,289/month on 1-year reserved, or ~$1,073/month at typical Vast.ai spot rates. AWS costs approximately $2,808/month for equivalent capacity.
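For reference, a minimal Python sketch of the monthly math, assuming a 30-day (720-hour) month and the hourly rates quoted above (illustrative only — billing granularity varies by provider):

```python
# Monthly cost of one GPU running 24/7, assuming a 30-day (720-hour) month.
HOURS_PER_MONTH = 24 * 30

rates_per_hour = {
    "CoreWeave on-demand": 2.23,
    "CoreWeave 1-yr reserved": 1.79,
    "Vast.ai spot (typical)": 1.49,
    "AWS on-demand": 3.90,
}

for provider, hourly in rates_per_hour.items():
    print(f"{provider}: ${hourly * HOURS_PER_MONTH:,.0f}/month")
```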
GPU Types and Selection
What is the cheapest GPU for LLM inference?
For sub-13B models: L40S at $1.10–1.50/hr offers the best cost-per-token. For 30–70B models: A100 80GB at $1.50–2.00/hr. For 70B+ models requiring high throughput: H100 at $1.99/hr is often the most cost-efficient option per token at scale.
H100 vs A100: which is better for my workload?
H100 delivers ~3× the training throughput of A100 for transformer models due to improved tensor core architecture and faster memory bandwidth. For inference, the gap narrows — many 7B–30B models run efficiently on A100 at half the cost. Benchmark your specific model before committing.
What is the B200 GPU and how does it compare to H100?
NVIDIA B200 is the Blackwell architecture successor to H100. It delivers ~2.5× inference throughput per GPU vs H100. B200 costs ~$5.29/hr on-demand (CoreWeave, May 2026) vs $2.23/hr for H100 — a 2.4× price premium for 2.5× throughput, making it roughly equivalent in cost-per-token for inference.
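As a rough parity check using the figures above (real throughput depends on model, batch size, and precision):

```python
# Does the B200's price premium outrun its throughput gain?
h100_hourly, b200_hourly = 2.23, 5.29
b200_speedup = 2.5  # assumed relative inference throughput vs. H100 (see above)

price_premium = b200_hourly / h100_hourly            # ~2.37x
relative_cost_per_token = price_premium / b200_speedup

print(f"Price premium:        {price_premium:.2f}x")
print(f"Cost per token ratio: {relative_cost_per_token:.2f}x")  # ~0.95x: near parity
```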
Is an A100 still worth renting in 2026?
Yes for inference and fine-tuning. A100 80GB on-demand prices have fallen to $1.50–2.00/hr at specialized providers. For 7B–30B model inference and fine-tuning, A100 often delivers better cost-per-token than H100. For large-scale pretraining, H100/H200/B200 are clearly superior.
Pricing Models
What is the difference between on-demand, reserved, and spot GPU pricing?
On-demand: pay-as-you-go, highest rate, no commitment, instant start/stop. Reserved: 1 or 3-year commitment at 20–50% discount, guaranteed capacity. Spot/interruptible: unused capacity at 50–80% discount, may be reclaimed with short notice.
How much can you save with reserved GPU pricing?
Typically 20–35% for 1-year reserved vs on-demand at the same provider. CoreWeave: $2.23/hr on-demand → $1.79/hr reserved (20% savings). AWS: ~$3.90/hr on-demand → ~$2.70/hr 1-year reserved (31% savings). A reservation pays off once sustained utilization exceeds the break-even point (reserved rate ÷ on-demand rate) — roughly 70–80% at these discount levels.
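A minimal sketch of the break-even arithmetic, assuming a reservation is billed for every hour whether used or not, and the rates quoted above:

```python
# Utilization at which a reservation beats pay-as-you-go on-demand pricing.
def break_even_utilization(reserved_hourly: float, on_demand_hourly: float) -> float:
    return reserved_hourly / on_demand_hourly

print(f"CoreWeave 1-yr: {break_even_utilization(1.79, 2.23):.0%}")  # ~80%
print(f"AWS 1-yr:       {break_even_utilization(2.70, 3.90):.0%}")  # ~69%
```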
How reliable are spot GPU instances?
Spot reliability varies by provider and demand conditions. On Vast.ai, average spot instance lifetime for H100s has typically exceeded 6 hours, with many running for days. For fault-tolerant jobs with checkpointing, spot is highly usable. For latency-sensitive inference, avoid spot.
Can I convert on-demand to reserved pricing mid-run?
It depends on the provider. CoreWeave and some other specialized providers allow reserved commitments to be applied to existing on-demand instances. AWS has a similar mechanism through Convertible Reserved Instances. Check with your provider before assuming this option is available.
Provider Comparison
AWS vs CoreWeave for GPU pricing: what is the difference?
CoreWeave H100 on-demand: $2.23/hr. AWS H100 equivalent (p5.48xlarge, per GPU): ~$3.90/hr. That is a 75% premium for AWS. AWS advantages: compliance certifications, IAM, tight integration with S3/EKS. For pure GPU compute without compliance requirements, CoreWeave saves ~$1.67/hr per GPU.
Is Lambda Labs a reliable GPU cloud provider?
Yes. Lambda is one of the more established GPU-specialized providers with a strong reputation for H100 availability and transparent pricing. Their $1.99/hr H100 on-demand rate and 10TB/month free egress make them highly competitive. They lack some enterprise features (no SOC2 as of 2026) but are reliable for ML workloads.
What is Vast.ai and is it reliable?
Vast.ai is a GPU marketplace where individual owners and small providers offer GPU capacity at competitive rates. Reliability varies by host. Vast.ai provides reliability scores for each host. For fault-tolerant training jobs with checkpointing, it is an excellent cost option. Not suitable for production inference.
Do European GPU cloud providers offer competitive pricing?
European providers (OVHcloud, Hetzner, Scaleway) generally run 10–25% above comparable US providers due to higher electricity and land costs. The advantage: EU data residency, GDPR compliance, and lower latency for European users.
Cost Optimization
How do I calculate GPU cost for a training run?
Cost = (GPU count) × (hourly rate) × (total hours). Example: 8× H100 training run for 72 hours at CoreWeave on-demand: 8 × $2.23 × 72 ≈ $1,284. Use GridStackHub's calculator to compare this cost across providers and pricing models.
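The same formula as a small Python helper (GridStackHub's calculator handles the cross-provider comparison; this is just the arithmetic):

```python
# Total cost of a training run: GPU count x hourly rate x wall-clock hours.
def training_run_cost(gpu_count: int, hourly_rate: float, hours: float) -> float:
    return gpu_count * hourly_rate * hours

# 8x H100 for 72 hours at CoreWeave on-demand ($2.23/hr per GPU).
print(f"${training_run_cost(8, 2.23, 72):,.2f}")  # $1,284.48
```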
How much does it cost to train a 7B parameter LLM?
Using the standard ≈6 × parameters × tokens estimate, a 7B parameter model trained on 1 trillion tokens needs ~4.2×10²² FLOPs — roughly 33,000 H100-GPU-hours at 350 TFLOPS effective throughput. At $2.23/hr (CoreWeave on-demand), that is ~$74,000 for compute alone. In practice, efficiency losses and iteration mean real costs run 2–4× the theoretical minimum.
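A back-of-the-envelope version of that estimate, using the common ≈6 × parameters × tokens rule of thumb for training FLOPs (a simplification that ignores data loading, evaluation, and restarts):

```python
# Theoretical-minimum training cost from the ~6 * params * tokens FLOPs estimate.
def training_compute_cost(params: float, tokens: float,
                          effective_tflops: float, hourly_rate: float) -> float:
    total_flops = 6 * params * tokens                      # training FLOPs estimate
    gpu_seconds = total_flops / (effective_tflops * 1e12)  # at sustained throughput
    gpu_hours = gpu_seconds / 3600
    return gpu_hours * hourly_rate

# 7B params, 1T tokens, 350 TFLOPS effective, $2.23/hr -> roughly $74,000.
print(f"${training_compute_cost(7e9, 1e12, 350, 2.23):,.0f}")
```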
How much does inference cost per request?
For a 7B model on an H100 at $2.23/hr serving 100 requests/minute (1,000 tokens each): GPU cost is $0.000372 per request. Add serving overhead and you're at ~$0.0004–0.0006 per request. For a 70B model at 10 requests/minute, expect roughly $0.004–0.008 per request, depending on whether it runs quantized on one H100 or sharded across two. Use our cost-per-token calculator for precise estimates.
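The underlying arithmetic as a sketch (assumes steady traffic and ignores autoscaling headroom and idle time):

```python
# GPU cost per request: hourly GPU spend divided by requests served per hour.
def cost_per_request(hourly_rate: float, requests_per_minute: float,
                     gpu_count: int = 1) -> float:
    return (gpu_count * hourly_rate) / (requests_per_minute * 60)

# 7B model on one H100 at $2.23/hr, 100 requests/minute.
print(f"7B:  ${cost_per_request(2.23, 100):.6f}")              # ~$0.000372
# 70B model sharded across two H100s, 10 requests/minute.
print(f"70B: ${cost_per_request(2.23, 10, gpu_count=2):.4f}")  # ~$0.0074
```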
General Questions
What GPU should I start with for AI development?
For experimentation: A100 40GB or L40S on spot pricing gives excellent cost efficiency for most 7B–13B model work. For fine-tuning or production training: H100 on-demand from Lambda or CoreWeave. Don't over-provision — right-sizing is the biggest cost lever for most teams starting out.
How do I set up a GPU price alert?
GridStackHub offers price alerts for specific GPU models and providers. Sign up and configure alerts to notify you when H100, A100, or other GPU prices drop below your threshold at any tracked provider. Useful for catching spot price dips for planned training runs.
What is the best GPU cloud for startups?
For early-stage: Vast.ai or RunPod for spot pricing maximize compute per dollar. For Series A+ with production workloads: Lambda Labs or CoreWeave offer a good balance of price and reliability. For compliance-bound healthcare/fintech startups: AWS or Azure despite the premium.
Get live GPU price data
See real-time H100, B200, A100, and MI300X pricing across 32+ providers.
Compare GPU Prices →