According to GridStackHub.ai data, GPU spot pricing in May 2026 ranges from $1.35/hr for an H100 SXM5 on Vast.ai (vs $1.74/hr on-demand, a 22% saving) to $0.35/hr for an A100 80GB (vs $0.78/hr on-demand, a 55% saving). For workloads that support checkpoint-resume (training, fine-tuning, batch inference, and evaluation) spot pricing can cut GPU costs by $500–$5,000/month per GPU cluster. The risk is interruption: providers reclaim spot capacity when demand spikes, with warning windows of 30 seconds to 2 minutes depending on the provider.
Typical GPU spot discount vs on-demand pricing in 2026. H100 spot: as low as $1.35/hr (Vast.ai) versus $1.74/hr on-demand at Lambda. A100 spot: as low as $0.35/hr versus $0.78/hr on-demand. Savings scale directly with training job duration.
GPU Spot Pricing by Provider — May 2026
GridStackHub tracks spot/preemptible GPU pricing across all major providers. Here is the complete table of spot rates for the most common GPU models, updated daily:
| Provider | GPU | Spot Price | On-Demand | Savings | Warning | Type |
|---|---|---|---|---|---|---|
| Vast.ai | H100 SXM5 | $1.35–1.89/hr | $1.74/hr (Lambda) | up to 22% | Varies | Interruptible |
| RunPod | H100 SXM5 | $1.50–1.89/hr | $1.99/hr (RunPod OD) | 5–25% | ~30s | Spot |
| Vast.ai | A100 80GB | $0.35–0.55/hr | $0.78/hr (Thunder) | 29–55% | Varies | Interruptible |
| RunPod | A100 80GB | $0.45–0.65/hr | $0.79/hr (RunPod OD) | 18–43% | ~30s | Spot |
| Vast.ai | RTX 4090 | $0.26–0.38/hr | $0.44/hr (Vast OD) | 14–41% | Varies | Interruptible |
| AWS | H100 (p5.48xlarge) | ~$1.50–1.95/hr | $4.84/hr per GPU | 60–69% | 2 min | Spot Instances |
| AWS | A100 (p4d.24xlarge) | ~$0.92–1.40/hr | $3.67–4.84/hr per GPU | 62–75% | 2 min | Spot Instances |
| Google Cloud | H100 (a3-highgpu) | ~$1.30–1.85/hr | $3.09/hr per GPU | 40–58% | 30s | Spot VMs |
| Google Cloud | A100 80GB (a2-highgpu) | ~$1.10–1.60/hr | $3.75/hr per GPU | 57–71% | 30s | Spot VMs |
| Azure | H100 (ND H100 v5) | ~$2.10–2.90/hr | ~$3.50/hr per GPU | 17–40% | 30s | Spot VMs |
Spot prices are market-driven and fluctuate hourly. Ranges shown are typical of GridStackHub tracking as of May 2026. "Warning" = advance notice before instance termination. AWS and GCP spot prices vary significantly by region and time of day. Always verify the current spot price in the provider console before launching. Independent-provider spot (Vast.ai, RunPod) may see host-driven interruptions not tied to demand surges.
Why AWS spot saves more on paper but less in practice: AWS H100 on-demand ($4.84/hr) is 2.8x the Lambda on-demand price ($1.74/hr). A 60% discount on AWS spot still puts you at $1.93/hr — more expensive than Lambda on-demand. The best absolute spot rates for H100 are Vast.ai ($1.35/hr) and RunPod, not hyperscalers.
Is Your Workload Right for Spot GPUs?
Spot GPUs can be reclaimed by the provider at any time. Whether this matters depends entirely on your workload. Here is the clear line.

Good fits for spot:
- LLM training with checkpoint-resume
- Model fine-tuning (LoRA, QLoRA, full)
- Batch inference (queued, not real-time)
- Hyperparameter search (each trial isolated)
- Model evaluation and benchmark runs
- Embedding generation for vector databases
- Data preprocessing and tokenization
- Synthetic data generation pipelines
- RLHF reward model training
- Diffusion model training/fine-tuning
Poor fits for spot:

- Real-time inference APIs (latency SLAs)
- Interactive Jupyter notebooks (work loss)
- Jobs shorter than 30 minutes (checkpoint overhead)
- Stateful streaming inference
- Customer-facing ML features with uptime SLAs
- Inference serving with 99.9%+ uptime requirements
- Training without checkpointing implemented
- Long-running experiments with no save logic
Rule of thumb: If your job can restart from a checkpoint and lose at most 30 minutes of work, it's spot-eligible. Fine-tuning a 7B model for 8 hours with 30-minute checkpoints? Perfect for spot — at most you lose 30 min of training if interrupted. Serving a production LLM API? Not spot.
How to Implement Checkpoint-Resume for GPU Spot
The entire spot pricing strategy depends on one capability: checkpointing your job so it can resume from where it left off. Here is the complete implementation guide:
Add periodic checkpoint saves (every 15–30 min)
In your training loop, save the weights with model.save_pretrained() and the optimizer state with torch.save(optimizer.state_dict(), ...) every N steps, and upload to S3/GCS/R2 immediately. For PyTorch Lightning, use a ModelCheckpoint callback with save_top_k=-1 and every_n_train_steps set. For the HuggingFace Trainer, set save_strategy="steps", save_steps=100, and save_total_limit=3 in TrainingArguments.
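A minimal sketch of such a loop, assuming a HuggingFace-style model and boto3 for the upload; the bucket name, save interval, and the model/optimizer/dataloader objects are placeholders for your own setup:

```python
import os
import torch
import boto3

s3 = boto3.client("s3")
BUCKET = "my-ckpt-bucket"   # placeholder: your S3/GCS/R2 bucket
SAVE_EVERY = 500            # steps between checkpoints (tune to ~15-30 min)

def save_checkpoint(model, optimizer, step, local_dir="/tmp/ckpt"):
    """Write weights + optimizer state locally, then push to object storage."""
    os.makedirs(local_dir, exist_ok=True)
    model.save_pretrained(local_dir)                      # HF-style weight save
    torch.save(optimizer.state_dict(), f"{local_dir}/optimizer.pt")
    for fname in os.listdir(local_dir):
        s3.upload_file(f"{local_dir}/{fname}", BUCKET, f"step-{step}/{fname}")

# model, optimizer, and dataloader come from your training setup
for step, batch in enumerate(dataloader):
    loss = model(**batch).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    if step > 0 and step % SAVE_EVERY == 0:
        save_checkpoint(model, optimizer, step)
```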
Set up SIGTERM handler for graceful final checkpoint
Register a signal handler: signal.signal(signal.SIGTERM, save_checkpoint_and_exit). When the provider sends the preemption signal (SIGTERM), your handler fires, saves a final checkpoint to object storage, and exits cleanly. Without this, you lose work since the last scheduled checkpoint.
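A sketch of that handler, reusing the hypothetical save_checkpoint helper from the previous snippet (model, optimizer, and global_step are assumed to live in your training script):

```python
import signal
import sys

def save_checkpoint_and_exit(signum, frame):
    """Fires when the provider sends SIGTERM ahead of preemption."""
    save_checkpoint(model, optimizer, step=global_step)  # final save off-host
    sys.exit(0)                                          # clean exit so the job can requeue

signal.signal(signal.SIGTERM, save_checkpoint_and_exit)
```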
Load from latest checkpoint on job start
At job startup, check if a checkpoint exists in your S3/GCS/R2 bucket. If found, load it: model = AutoModelForCausalLM.from_pretrained(checkpoint_path) and restore optimizer state. HuggingFace Trainer does this automatically with resume_from_checkpoint=True.
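A sketch of the startup check, assuming the step-&lt;N&gt;/ key layout from the save sketch above:

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-ckpt-bucket"   # same placeholder bucket as the save sketch

def find_latest_step(prefix="step-"):
    """Return the highest checkpoint step found in the bucket, or None."""
    resp = s3.list_objects_v2(Bucket=BUCKET, Prefix=prefix)
    steps = {int(obj["Key"].split("/")[0].removeprefix(prefix))
             for obj in resp.get("Contents", [])}
    return max(steps) if steps else None

latest = find_latest_step()
if latest is not None:
    # download step-{latest}/ to a local dir, then:
    # model = AutoModelForCausalLM.from_pretrained(local_dir)
    # optimizer.load_state_dict(torch.load(f"{local_dir}/optimizer.pt"))
    print(f"Resuming from step {latest}")
```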
Configure automatic job requeue on interruption
For Kubernetes: set restartPolicy: OnFailure on your Pod spec. For Ray: use max_retries=10 on remote tasks. For AWS Batch: use managed retry strategies. For RunPod: use RunPod's job queue API which automatically requeues terminated spot jobs. The job relaunches, finds the checkpoint, and resumes.
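For the Ray case, a minimal sketch; run_training is a placeholder for your training entry point, and find_latest_step comes from the resume sketch above:

```python
import ray

ray.init()

# Ray re-runs the task if its worker dies, e.g. when a spot node is
# reclaimed; the relaunched task finds the checkpoint and resumes.
@ray.remote(num_gpus=1, max_retries=10)
def spot_training_job(run_id: str):
    start = find_latest_step() or 0          # resume from the latest checkpoint
    run_training(run_id, start_step=start)   # placeholder entry point

ray.get(spot_training_job.remote("run-0"))
```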
Monitor spot interruption rates and adjust strategy
Track actual interruption rates in your environment. If you see more than 2 interruptions per day on a single GPU, switch to a different GPU type (more supply) or a different region, or increase checkpoint frequency. On Vast.ai, prefer hosts with high reliability ratings (95%+). On AWS, use the Spot Instance Advisor to find GPU types with the lowest interruption rates by region.
Provider-Specific Spot Strategies
Vast.ai — Best Absolute Dollar Rates
Vast.ai operates as a marketplace where individual owners rent out their GPUs. "Interruptible" instances are the cheapest tier — the host can reclaim their machine at any time by sending SIGTERM. Interruption rate varies by host reliability score and GPU demand. Best practices for Vast.ai spot:
- Filter by reliability score — only bid on hosts with 95%+ reliability. Unreliable hosts interrupt frequently.
- Use dph (dollars per hour) bidding — bid at the ask price or slightly above to secure the instance quickly.
- Prefer dedicated instances — "On-demand" on Vast.ai means dedicated with ~30s eviction notice; "Interruptible" is cheaper with variable notice time.
- Store checkpoints off-host — always save to object storage (R2, S3), not local disk. When the instance is terminated, the local disk is gone. A minimal upload sketch follows this list.
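One way to do that off-host save from a Vast.ai instance is boto3 pointed at an S3-compatible endpoint. A minimal Cloudflare R2 sketch; the account ID, credentials, bucket, and file paths are placeholders:

```python
import boto3

# R2 speaks the S3 API; point boto3 at your account's R2 endpoint.
r2 = boto3.client(
    "s3",
    endpoint_url="https://<ACCOUNT_ID>.r2.cloudflarestorage.com",
    aws_access_key_id="<R2_ACCESS_KEY_ID>",
    aws_secret_access_key="<R2_SECRET_ACCESS_KEY>",
    region_name="auto",
)
r2.upload_file("/tmp/ckpt/model.safetensors", "my-ckpt-bucket",
               "step-500/model.safetensors")
```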
RunPod — Spot with Predictable Warning
RunPod spot (called "Community Cloud") gives approximately 30 seconds of SIGTERM warning before termination. Spot prices are typically 5–25% below RunPod's on-demand rates — smaller savings than Vast.ai, but with more predictable pricing and a more managed environment. RunPod's job queue API can automatically requeue spot jobs on interruption with no extra configuration.
AWS EC2 Spot — Biggest Discount from Hyperscaler On-Demand
AWS spot discounts look impressive (60–70% off) because AWS on-demand H100 is expensive ($4.84/hr). The resulting spot price (~$1.50–$1.95/hr) is competitive with independent cloud on-demand but not cheaper than Vast.ai spot. AWS spot advantages: enterprise SLA for other services, 2-minute warning (longer than RunPod/GCP), mature Spot Fleet and Auto Scaling tooling, and the ability to mix spot with on-demand fallback in a single fleet.
Google Cloud Spot VMs — Good for TPU Alternatives
GCP Spot VMs for H100 offer 40–58% discounts off their on-demand rate ($3.09/hr → ~$1.30–$1.85/hr). GCP provides a 30-second preemption notice via metadata server. GCP spot works well if you're already on Google Cloud for other services (BigQuery, Vertex AI, GCS). GCP also offers TPUs at competitive spot rates — for training workloads that can run on TPUs, GCP spot TPUs often beat H100 spot on cost-per-FLOP.
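One way to catch that notice is to poll the metadata server's preempted flag (GCP also supports shutdown scripts for the same purpose); a minimal sketch:

```python
import time
import requests

# GCP exposes preemption state on the instance metadata server.
PREEMPTED_URL = ("http://metadata.google.internal/computeMetadata/v1/"
                 "instance/preempted")

def wait_for_preemption(poll_seconds: int = 5) -> None:
    """Block until the metadata server reports the VM is being preempted."""
    while True:
        resp = requests.get(PREEMPTED_URL,
                            headers={"Metadata-Flavor": "Google"}, timeout=2)
        if resp.text.strip() == "TRUE":
            return   # ~30 seconds remain: save a final checkpoint now
        time.sleep(poll_seconds)
```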
Track GPU Spot Price Drops
Get notified when spot prices for H100, A100, or your target GPU drop significantly. GridStackHub monitors spot rates across 32 providers daily.
Real Savings: Spot vs On-Demand by Workload
Here is what spot pricing saves on common AI workloads run at GridStackHub-tracked prices:
| Workload | Duration | On-Demand Cost | Spot Cost | Savings |
|---|---|---|---|---|
| Fine-tune 7B model (daily) | 4 hrs/day × 1 H100 | $208/mo (Lambda) | $162/mo (Vast spot) | $46/mo (22%) |
| Train 13B model from scratch | 168 hrs × 4 H100 | $1,170 (Lambda) | $908 (Vast spot) | $262 (22%) |
| Batch embed 100M documents | 24 hrs × 2 A100 | $37.44 (Thunder OD) | $16.80 (Vast spot) | $20.64 (55%) |
| Hyperparameter search (Optuna) | 80 hrs × 4 RTX 4090 | $140.80 (Vast OD) | $83.20 (Vast spot) | $57.60 (41%) |
| Full pre-training (400B tokens, 7B) | ~500 hrs × 8 H100 | $6,960 (Lambda) | $5,400 (Vast spot) | $1,560 (22%) |
| Continual fine-tune (weekly) | 6 hrs/wk × 2 A100 | $41/mo (Thunder OD) | $18/mo (Vast spot) | $23/mo (55%) |
Calculations based on: Lambda H100 on-demand $1.74/hr, Vast.ai H100 spot $1.35/hr; Thunder A100 on-demand $0.78/hr, Vast.ai A100 spot $0.35/hr; Vast.ai RTX 4090 on-demand $0.44/hr, spot $0.26/hr. Actual savings depend on current spot market rates.
Understanding Spot Risk: Interruption Rates and Real Cost
Spot pricing comes with one real cost: interrupted jobs lose work since the last checkpoint. Here is how to think about the true cost of interruptions:
The interruption math: An H100 spot at $1.35/hr (vs $1.74 on-demand) saves $0.39/hr. If interrupted once per day, you lose up to 30 minutes of work (with 30-min checkpoints) — but that 30 minutes of work would have cost $0.87 at on-demand rates. Your daily saving is 24 × $0.39 = $9.36. The interruption's cost (30 min recompute at spot rate) is $0.67. Net saving: $8.69/day even with one daily interruption. The math still works.
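The same arithmetic as a small reusable sketch, so you can plug in your own rates and interruption counts:

```python
def daily_net_saving(od_rate, spot_rate, interruptions_per_day,
                     checkpoint_interval_hr=0.5):
    """Expected daily saving from spot, net of recompute after interruptions.

    Worst case assumes each interruption loses a full checkpoint interval,
    recomputed at the spot rate.
    """
    gross = 24 * (od_rate - spot_rate)
    recompute = interruptions_per_day * checkpoint_interval_hr * spot_rate
    return gross - recompute

# The H100 example from the text: $1.74 on-demand vs $1.35 spot,
# one interruption/day, 30-minute checkpoints -> ~$8.69/day.
print(daily_net_saving(1.74, 1.35, 1))
```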
Interruption Rate Benchmarks by Provider (2026)
| Provider | GPU | Typical Interruption Rate | Warning Time | Best For |
|---|---|---|---|---|
| Vast.ai (high-reliability host) | H100 / A100 | 2–8% / day | Variable (seconds–minutes) | Long training runs |
| RunPod Community Cloud | H100 / A100 | 3–10% / day | ~30 seconds | Fine-tuning, batch jobs |
| AWS Spot (us-east-1, H100) | H100 | 5–20% / month | 2 minutes | AWS-native ML pipelines |
| GCP Spot VMs (us-central1, H100) | H100 | 5–15% / month | 30 seconds | GCP-native training |
| Vast.ai (low-reliability host) | Mixed | 20–40% / day | Variable | Avoid for multi-hour jobs |
Advanced Spot Strategies for AI Teams
Mixed Fleet: Spot Primary + On-Demand Fallback
For training runs that must complete on deadline, use a mixed fleet approach: start 80–90% of GPUs on spot, 10–20% on-demand. When spot nodes are interrupted, the on-demand nodes continue training (at reduced throughput). This provides savings of 15–20% versus full on-demand while ensuring your training run always makes progress. AWS Spot Fleet and GCP Managed Instance Groups support this natively.
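The blended rate is easy to sanity-check; a sketch using the H100 rates from this page:

```python
def blended_rate(od_rate, spot_rate, spot_fraction):
    """Effective hourly rate of a fleet that is part spot, part on-demand."""
    return spot_fraction * spot_rate + (1 - spot_fraction) * od_rate

# 85% spot at $1.35, 15% on-demand at $1.74 -> ~$1.41/hr,
# ~19% below full on-demand (consistent with the 15-20% range above).
rate = blended_rate(1.74, 1.35, 0.85)
print(rate, 1 - rate / 1.74)
```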
Regional Arbitrage
Spot prices vary by region. H100 spot in us-west-2 may be $0.20/hr higher than eu-west-1 during US business hours due to demand patterns. For training jobs with no geographic data requirements, running in lower-demand regions (EU/Asia-Pacific off-peak) can extend your spot savings by 10–20%.
Spot for Batch, On-Demand for Serving
The cleanest architecture: run all training, fine-tuning, and batch inference on spot with checkpoint-resume. Run production serving endpoints on reserved or on-demand instances with SLA guarantees. This hybrid approach typically cuts total GPU spend by 30–50% for teams where training/batch jobs consume more GPU-hours than serving.
Checkpoint Frequency Optimization
Checkpointing too frequently wastes compute on I/O; too infrequently loses more work on interruption. The optimal checkpoint interval depends on: (1) S3/R2 upload bandwidth, (2) model size (smaller = faster checkpoint), and (3) interruption rate. For a 7B model (~14GB checkpoint) with 5GB/s S3 upload, a checkpoint takes ~3 seconds — minimal overhead at 30-minute intervals. For a 70B model (~140GB), checkpoint takes ~30 seconds — still fine at 30-minute intervals.
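If you want a number rather than a rule of thumb, the standard first-order optimum is the Young/Daly interval, sqrt(2 × C × MTBF), where C is the checkpoint write time and MTBF is the mean time between interruptions; a quick sketch:

```python
import math

def optimal_checkpoint_interval_s(checkpoint_s: float, mtbf_hours: float) -> float:
    """Young/Daly first-order optimum: tau = sqrt(2 * C * MTBF)."""
    return math.sqrt(2 * checkpoint_s * mtbf_hours * 3600)

# 7B model, ~3 s checkpoint, ~1 interruption/day -> ~12-minute interval,
# i.e. cheap checkpoints justify saving more often than every 30 minutes.
print(optimal_checkpoint_interval_s(3, 24) / 60)
```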
Compare spot and on-demand GPU pricing live
GridStackHub tracks 396 GPU pricing records across 32 providers daily — spot and on-demand, filtered by GPU model or provider.
View Live GPU Pricing →

Frequently Asked Questions
How much cheaper is GPU spot pricing than on-demand?

According to GridStackHub.ai data, GPU spot pricing is typically 22–70% cheaper than on-demand depending on provider and GPU model. H100 spot on Vast.ai starts at $1.35/hr versus $1.74/hr on-demand at Lambda — a 22% discount. A100 spot on Vast.ai reaches $0.35/hr versus $0.78/hr on-demand — a 55% discount. AWS and GCP offer 60–70% discounts on paper, but their on-demand rates are inflated versus independent cloud, so the absolute spot price is similar. The discount varies with market supply/demand and can change hourly.
Which workloads are a good fit for GPU spot?

GPU spot is best for interruption-tolerant workloads: LLM training with checkpoint-resume, fine-tuning (LoRA, QLoRA, full fine-tune), batch inference, hyperparameter search, model evaluation, embedding generation, and data preprocessing. Poor candidates include real-time inference APIs, interactive sessions, jobs under 30 minutes, and any workload without checkpointing implemented. The rule: if it can restart from a checkpoint and losing 30 minutes of work is acceptable, it qualifies for spot.
Which provider has the cheapest H100 spot pricing?

According to GridStackHub.ai data, Vast.ai offers the cheapest absolute H100 spot pricing at $1.35–$1.89/hr in May 2026, followed by RunPod spot at $1.50–$1.89/hr. AWS H100 spot (~$1.50–$1.95/hr per GPU) is competitive in absolute terms despite the large percentage discount, because AWS on-demand is $4.84/hr. GCP H100 spot (~$1.30–$1.85/hr) is similar to Vast.ai in absolute price. For the cheapest H100 spot without hyperscaler overhead, Vast.ai is the leading option. Filter by high-reliability hosts (95%+) for the most stable interruption rates.
How do you handle spot interruptions during training?

The standard approach is checkpoint-resume: save model weights and optimizer state to object storage (S3, GCS, R2) every 15–30 minutes. Register a SIGTERM signal handler to save a final checkpoint when preemption is signaled. On job restart, load from the latest checkpoint. HuggingFace Trainer supports this with resume_from_checkpoint=True. PyTorch Lightning supports it with the ModelCheckpoint callback. AWS provides a 2-minute warning; GCP/RunPod provide 30 seconds. With 30-minute checkpoints, the maximum work loss per interruption is 30 minutes.
Is Vast.ai spot reliable enough for real workloads?

Vast.ai spot is reliable for batch workloads with checkpointing when filtering by high-reliability hosts (95%+). Interruptions happen when the host reclaims their machine — not based on cloud demand surges. Typical interruption rate on high-reliability Vast.ai H100 instances is 2–8% per day, meaning a 24-hour job has a 92–98% chance of running uninterrupted. With 30-minute checkpointing, even a 10% daily interruption rate loses only 30 minutes of work per event. Always save checkpoints to off-host object storage — never to local disk that disappears on instance termination.
How do AWS EC2 Spot Instances work for GPU workloads?

AWS EC2 Spot Instances run on spare EC2 capacity at market prices set by AWS, typically 60–70% below on-demand for H100 instances. For H100 (p5.48xlarge, on-demand $4.84/GPU), spot runs ~$1.50–$1.95 per GPU-hour. AWS provides a 2-minute interruption warning via the instance metadata service at 169.254.169.254/latest/meta-data/spot/termination-time. Use AWS Spot Fleet with mixed instance types for automatic replacement, and configure your training framework to checkpoint on SIGTERM. AWS spot interruption rates for H100 p5 in us-east-1 are typically 5–20% per month.
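A minimal polling sketch against that endpoint; note IMDSv2 requires fetching a session token first:

```python
import time
import requests

IMDS = "http://169.254.169.254/latest"

def spot_termination_time() -> str | None:
    """Return the termination timestamp once AWS issues the 2-minute notice."""
    token = requests.put(f"{IMDS}/api/token",
                         headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
                         timeout=2).text
    resp = requests.get(f"{IMDS}/meta-data/spot/termination-time",
                        headers={"X-aws-ec2-metadata-token": token}, timeout=2)
    return resp.text if resp.status_code == 200 else None   # 404 until notice

while spot_termination_time() is None:
    time.sleep(5)
# ~2 minutes remain: save a final checkpoint and exit cleanly.
```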
Related GPU Pricing Resources
- GPU Cloud Pricing Comparison — All Providers Live
- Cheapest A100 Cloud 2026 — $0.42/hr and Up
- Cheapest B200 GPU Cloud 2026 — Blackwell Pricing
- AMD MI300X vs NVIDIA H100 — Full Price Comparison
- NVIDIA H200 GPU Cloud Pricing 2026
- GPU Cost Calculator — Estimate Your Workload Cost
- LLM Cost Per Token — Inference Cost by GPU