According to GridStackHub.ai data, GPU spot pricing in May 2026 ranges from $1.35/hr for an H100 SXM5 on Vast.ai (vs $1.74/hr on-demand, a 22% saving) to $0.35/hr for an A100 80GB (vs $0.78/hr on-demand, a 55% saving). For workloads that support checkpoint-resume (training, fine-tuning, batch inference, and evaluation) spot pricing can cut GPU costs by $500–$5,000/month per GPU cluster. The risk is interruption: providers reclaim spot capacity when demand spikes, with warning windows of 30 seconds to 2 minutes depending on the provider.
Typical GPU spot discount vs on-demand pricing in 2026. H100 spot: as low as $1.35/hr (Vast.ai) versus $1.74/hr on-demand at Lambda. A100 spot: as low as $0.35/hr versus $0.78/hr on-demand. Savings scale directly with training job duration.
GPU Spot Pricing by Provider — May 2026
GridStackHub tracks spot/preemptible GPU pricing across all major providers. Here is the complete table of spot rates for the most common GPU models, updated daily:
| Provider | GPU | Spot Price | On-Demand | Savings | Warning | Type |
|---|---|---|---|---|---|---|
| Vast.ai | H100 SXM5 | $1.35–1.89/hr | $1.74/hr (Lambda) | up to 22% | Varies | Interruptible |
| RunPod | H100 SXM5 | $1.50–1.89/hr | $1.99/hr (RunPod OD) | 5–25% | ~30s | Spot |
| Vast.ai | A100 80GB | $0.35–0.55/hr | $0.78/hr (Thunder) | 29–55% | Varies | Interruptible |
| RunPod | A100 80GB | $0.45–0.65/hr | $0.79/hr (RunPod OD) | 18–43% | ~30s | Spot |
| Vast.ai | RTX 4090 | $0.26–0.38/hr | $0.44/hr (Vast OD) | 14–41% | Varies | Interruptible |
| AWS | H100 (p5.48xlarge) | ~$1.50–1.95/hr | $4.84/hr per GPU | 60–69% | 2 min | Spot Instances |
| AWS | A100 (p4d.24xlarge) | ~$0.92–1.40/hr | $3.67–4.84/hr per GPU | 62–75% | 2 min | Spot Instances |
| Google Cloud | H100 (a3-highgpu) | ~$1.30–1.85/hr | $3.09/hr per GPU | 40–58% | 30s | Spot VMs |
| Google Cloud | A100 80GB (a2-highgpu) | ~$1.10–1.60/hr | $3.75/hr per GPU | 57–71% | 30s | Spot VMs |
| Azure | H100 (ND H100 v5) | ~$2.10–2.90/hr | ~$3.50/hr per GPU | 17–40% | 30s | Spot VMs |
Spot prices are market-driven and fluctuate hourly. Ranges shown are typical of GridStackHub tracking as of May 2026. "Warning" = advance notice before instance termination. AWS and GCP spot prices vary significantly by region and time of day. Always verify the current spot price in the provider console before launching. Independent-provider spot (Vast.ai, RunPod) may see host-driven interruptions not tied to demand surges.
Why AWS spot saves more on paper but less in practice: AWS H100 on-demand ($4.84/hr) is 2.8x the Lambda on-demand price ($1.74/hr). A 60% discount on AWS spot still puts you at $1.93/hr — more expensive than Lambda on-demand. The best absolute spot rates for H100 are Vast.ai ($1.35/hr) and RunPod, not hyperscalers.
Is Your Workload Right for Spot GPUs?
Spot GPUs can be reclaimed by the provider at any time. Whether this matters depends entirely on your workload. Here is the clear line.

Good fits for spot:
- LLM training with checkpoint-resume
- Model fine-tuning (LoRA, QLoRA, full)
- Batch inference (queued, not real-time)
- Hyperparameter search (each trial isolated)
- Model evaluation and benchmark runs
- Embedding generation for vector databases
- Data preprocessing and tokenization
- Synthetic data generation pipelines
- RLHF reward model training
- Diffusion model training/fine-tuning
Poor fits for spot:

- Real-time inference APIs (latency SLAs)
- Interactive Jupyter notebooks (work loss)
- Jobs shorter than 30 minutes (checkpoint overhead)
- Stateful streaming inference
- Customer-facing ML features with uptime SLAs
- Inference serving with 99.9%+ uptime requirements
- Training without checkpointing implemented
- Long-running experiments with no save logic
Rule of thumb: If your job can restart from a checkpoint and lose at most 30 minutes of work, it's spot-eligible. Fine-tuning a 7B model for 8 hours with 30-minute checkpoints? Perfect for spot — at most you lose 30 min of training if interrupted. Serving a production LLM API? Not spot.
How to Implement Checkpoint-Resume for GPU Spot
The entire spot pricing strategy depends on one capability: checkpointing your job so it can resume from where it left off. Here is the complete implementation guide:
Add periodic checkpoint saves (every 15–30 min)
In your training loop, save the weights with model.save_pretrained() and the optimizer state with torch.save(optimizer.state_dict(), ...) every N steps, and upload to S3/GCS/R2 immediately. For PyTorch Lightning, use a ModelCheckpoint callback with save_top_k=-1 and every_n_train_steps set. For the HuggingFace Trainer, set save_strategy="steps", save_steps=100, and save_total_limit=3 in TrainingArguments.
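A minimal sketch of such a loop, assuming a HuggingFace-style model and boto3 for the upload; the bucket name, save interval, and the model/optimizer/dataloader objects are placeholders for your own setup:

```python
import os
import torch
import boto3

s3 = boto3.client("s3")
BUCKET = "my-ckpt-bucket"   # placeholder: your S3/GCS/R2 bucket
SAVE_EVERY = 500            # steps between checkpoints (tune to ~15-30 min)

def save_checkpoint(model, optimizer, step, local_dir="/tmp/ckpt"):
    """Write weights + optimizer state locally, then push to object storage."""
    os.makedirs(local_dir, exist_ok=True)
    model.save_pretrained(local_dir)                      # HF-style weight save
    torch.save(optimizer.state_dict(), f"{local_dir}/optimizer.pt")
    for fname in os.listdir(local_dir):
        s3.upload_file(f"{local_dir}/{fname}", BUCKET, f"step-{step}/{fname}")

# model, optimizer, and dataloader come from your training setup
for step, batch in enumerate(dataloader):
    loss = model(**batch).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    if step > 0 and step % SAVE_EVERY == 0:
        save_checkpoint(model, optimizer, step)
```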
Set up SIGTERM handler for graceful final checkpoint
Register a signal handler: signal.signal(signal.SIGTERM, save_checkpoint_and_exit). When the provider sends the preemption signal (SIGTERM), your handler fires, saves a final checkpoint to object storage, and exits cleanly. Without this, you lose work since the last scheduled checkpoint.
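A sketch of that handler, reusing the hypothetical save_checkpoint helper from the previous snippet (model, optimizer, and global_step are assumed to live in your training script):

```python
import signal
import sys

def save_checkpoint_and_exit(signum, frame):
    """Fires when the provider sends SIGTERM ahead of preemption."""
    save_checkpoint(model, optimizer, step=global_step)  # final save off-host
    sys.exit(0)                                          # clean exit so the job can requeue

signal.signal(signal.SIGTERM, save_checkpoint_and_exit)
```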
Load from latest checkpoint on job start
At job startup, check if a checkpoint exists in your S3/GCS/R2 bucket. If found, load it: model = AutoModelForCausalLM.from_pretrained(checkpoint_path) and restore optimizer state. HuggingFace Trainer does this automatically with resume_from_checkpoint=True.
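A sketch of the startup check, assuming the step-&lt;N&gt;/ key layout from the save sketch above:

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-ckpt-bucket"   # same placeholder bucket as the save sketch

def find_latest_step(prefix="step-"):
    """Return the highest checkpoint step found in the bucket, or None."""
    resp = s3.list_objects_v2(Bucket=BUCKET, Prefix=prefix)
    steps = {int(obj["Key"].split("/")[0].removeprefix(prefix))
             for obj in resp.get("Contents", [])}
    return max(steps) if steps else None

latest = find_latest_step()
if latest is not None:
    # download step-{latest}/ to a local dir, then:
    # model = AutoModelForCausalLM.from_pretrained(local_dir)
    # optimizer.load_state_dict(torch.load(f"{local_dir}/optimizer.pt"))
    print(f"Resuming from step {latest}")
```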
Configure automatic job requeue on interruption
For Kubernetes: set restartPolicy: OnFailure on your Pod spec. For Ray: use max_retries=10 on remote tasks. For AWS Batch: use managed retry strategies. For RunPod: use RunPod's job queue API which automatically requeues terminated spot jobs. The job relaunches, finds the checkpoint, and resumes.
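For the Ray case, a minimal sketch; run_training is a placeholder for your training entry point, and find_latest_step comes from the resume sketch above:

```python
import ray

ray.init()

# Ray re-runs the task if its worker dies, e.g. when a spot node is
# reclaimed; the relaunched task finds the checkpoint and resumes.
@ray.remote(num_gpus=1, max_retries=10)
def spot_training_job(run_id: str):
    start = find_latest_step() or 0          # resume from the latest checkpoint
    run_training(run_id, start_step=start)   # placeholder entry point

ray.get(spot_training_job.remote("run-0"))
```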
Monitor spot interruption rates and adjust strategy
Track actual interruption rates in your environment. If you see more than 2 interruptions per day on a single GPU, switch to a different GPU type (more supply) or a different region, or increase checkpoint frequency. On Vast.ai, prefer hosts with high reliability ratings (95%+). On AWS, use the Spot Instance Advisor to find GPU types with the lowest interruption rates by region.
Provider-Specific Spot Strategies
Vast.ai — Best Absolute Dollar Rates
Vast.ai operates as a marketplace where individual owners rent out their GPUs. "Interruptible" instances are the cheapest tier — the host can reclaim their machine at any time by sending SIGTERM. Interruption rate varies by host reliability score and GPU demand. Best practices for Vast.ai spot:
- Filter by reliability score — only bid on hosts with 95%+ reliability. Unreliable hosts interrupt frequently.
- Use dph (dollars per hour) bidding — bid at the ask price or slightly above to secure the instance quickly.
- Prefer dedicated instances — "On-demand" on Vast.ai means dedicated with ~30s eviction notice; "Interruptible" is cheaper with variable notice time.
- Store checkpoints off-host — always save to object storage (R2, S3), not local disk. When the instance is terminated, the local disk is gone. A minimal upload sketch follows this list.
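One way to do that off-host save from a Vast.ai instance is boto3 pointed at an S3-compatible endpoint. A minimal Cloudflare R2 sketch; the account ID, credentials, bucket, and file paths are placeholders:

```python
import boto3

# R2 speaks the S3 API; point boto3 at your account's R2 endpoint.
r2 = boto3.client(
    "s3",
    endpoint_url="https://<ACCOUNT_ID>.r2.cloudflarestorage.com",
    aws_access_key_id="<R2_ACCESS_KEY_ID>",
    aws_secret_access_key="<R2_SECRET_ACCESS_KEY>",
    region_name="auto",
)
r2.upload_file("/tmp/ckpt/model.safetensors", "my-ckpt-bucket",
               "step-500/model.safetensors")
```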
RunPod — Spot with Predictable Warning
RunPod spot (called "Community Cloud") gives approximately 30 seconds of SIGTERM warning before termination. Spot prices are typically 5–25% below RunPod's on-demand rates — smaller savings than Vast.ai, but with more predictable pricing and a more managed environment. RunPod's job queue API can automatically requeue spot jobs on interruption with no extra configuration.
AWS EC2 Spot — Biggest Discount from Hyperscaler On-Demand
AWS spot discounts look impressive (60–70% off) because AWS on-demand H100 is expensive ($4.84/hr). The resulting spot price (~$1.50–$1.95/hr) is competitive with independent cloud on-demand but not cheaper than Vast.ai spot. AWS spot advantages: enterprise SLA for other services, 2-minute warning (longer than RunPod/GCP), mature Spot Fleet and Auto Scaling tooling, and the ability to mix spot with on-demand fallback in a single fleet.
Google Cloud Spot VMs — Good for TPU Alternatives
GCP Spot VMs for H100 offer 40–58% discounts off their on-demand rate ($3.09/hr → ~$1.30–$1.85/hr). GCP provides a 30-second preemption notice via metadata server. GCP spot works well if you're already on Google Cloud for other services (BigQuery, Vertex AI, GCS). GCP also offers TPUs at competitive spot rates — for training workloads that can run on TPUs, GCP spot TPUs often beat H100 spot on cost-per-FLOP.
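One way to catch that notice is to poll the metadata server's preempted flag (GCP also supports shutdown scripts for the same purpose); a minimal sketch:

```python
import time
import requests

# GCP exposes preemption state on the instance metadata server.
PREEMPTED_URL = ("http://metadata.google.internal/computeMetadata/v1/"
                 "instance/preempted")

def wait_for_preemption(poll_seconds: int = 5) -> None:
    """Block until the metadata server reports the VM is being preempted."""
    while True:
        resp = requests.get(PREEMPTED_URL,
                            headers={"Metadata-Flavor": "Google"}, timeout=2)
        if resp.text.strip() == "TRUE":
            return   # ~30 seconds remain: save a final checkpoint now
        time.sleep(poll_seconds)
```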
Track GPU Spot Price Drops
Get notified when spot prices for H100, A100, or your target GPU drop significantly. GridStackHub monitors spot rates across 32 providers daily.
Real Savings: Spot vs On-Demand by Workload
Here is what spot pricing saves on common AI workloads run at GridStackHub-tracked prices:
| Workload | Duration | On-Demand Cost | Spot Cost | Savings |
|---|---|---|---|---|
| Fine-tune 7B model (daily) | 4 hrs/day × 1 H100 | $208/mo (Lambda) | $162/mo (Vast spot) | $46/mo (22%) |
| Train 13B model from scratch | 168 hrs × 4 H100 | $1,170 (Lambda) | $908 (Vast spot) | $262 (22%) |
| Batch embed 100M documents | 24 hrs × 2 A100 | $37.44 (Thunder OD) | $16.80 (Vast spot) | $20.64 (55%) |
| Hyperparameter search (Optuna) | 80 hrs × 4 RTX 4090 | $140.80 (Vast OD) | $83.20 (Vast spot) | $57.60 (41%) |
| Full pre-training (400B tokens, 7B) | ~500 hrs × 8 H100 | $6,960 (Lambda) | $5,400 (Vast spot) | $1,560 (22%) |
| Continual fine-tune (weekly) | 6 hrs/wk × 2 A100 | $41/mo (Thunder OD) | $18/mo (Vast spot) | $23/mo (55%) |
Calculations based on: Lambda H100 on-demand $1.74/hr, Vast.ai H100 spot $1.35/hr; Thunder A100 on-demand $0.78/hr, Vast.ai A100 spot $0.35/hr; Vast.ai RTX 4090 on-demand $0.44/hr, spot $0.26/hr. Actual savings depend on current spot market rates.
Understanding Spot Risk: Interruption Rates and Real Cost
Spot pricing comes with one real cost: interrupted jobs lose work since the last checkpoint. Here is how to think about the true cost of interruptions:
The interruption math: An H100 spot at $1.35/hr (vs $1.74 on-demand) saves $0.39/hr. If interrupted once per day, you lose up to 30 minutes of work (with 30-min checkpoints) — but that 30 minutes of work would have cost $0.87 at on-demand rates. Your daily saving is 24 × $0.39 = $9.36. The interruption's cost (30 min recompute at spot rate) is $0.67. Net saving: $8.69/day even with one daily interruption. The math still works.
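The same arithmetic as a small reusable sketch, so you can plug in your own rates and interruption counts:

```python
def daily_net_saving(od_rate, spot_rate, interruptions_per_day,
                     checkpoint_interval_hr=0.5):
    """Expected daily saving from spot, net of recompute after interruptions.

    Worst case assumes each interruption loses a full checkpoint interval,
    recomputed at the spot rate.
    """
    gross = 24 * (od_rate - spot_rate)
    recompute = interruptions_per_day * checkpoint_interval_hr * spot_rate
    return gross - recompute

# The H100 example from the text: $1.74 on-demand vs $1.35 spot,
# one interruption/day, 30-minute checkpoints -> ~$8.69/day.
print(daily_net_saving(1.74, 1.35, 1))
```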
Interruption Rate Benchmarks by Provider (2026)
| Provider | GPU | Typical Interruption Rate | Warning Time | Best For |
|---|---|---|---|---|
| Vast.ai (high-reliability host) | H100 / A100 | 2–8% / day | Variable (seconds–minutes) | Long training runs |
| RunPod Community Cloud | H100 / A100 | 3–10% / day | ~30 seconds | Fine-tuning, batch jobs |
| AWS Spot (us-east-1, H100) | H100 | 5–20% / month | 2 minutes | AWS-native ML pipelines |
| GCP Spot VMs (us-central1, H100) | H100 | 5–15% / month | 30 seconds | GCP-native training |
| Vast.ai (low-reliability host) | Mixed | 20–40% / day | Variable | Avoid for multi-hour jobs |
Advanced Spot Strategies for AI Teams
Mixed Fleet: Spot Primary + On-Demand Fallback
For training runs that must complete on deadline, use a mixed fleet approach: start 80–90% of GPUs on spot, 10–20% on-demand. When spot nodes are interrupted, the on-demand nodes continue training (at reduced throughput). This provides savings of 15–20% versus full on-demand while ensuring your training run always makes progress. AWS Spot Fleet and GCP Managed Instance Groups support this natively.
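The blended rate is easy to sanity-check; a sketch using the H100 rates from this page:

```python
def blended_rate(od_rate, spot_rate, spot_fraction):
    """Effective hourly rate of a fleet that is part spot, part on-demand."""
    return spot_fraction * spot_rate + (1 - spot_fraction) * od_rate

# 85% spot at $1.35, 15% on-demand at $1.74 -> ~$1.41/hr,
# ~19% below full on-demand (consistent with the 15-20% range above).
rate = blended_rate(1.74, 1.35, 0.85)
print(rate, 1 - rate / 1.74)
```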
Regional Arbitrage
Spot prices vary by region. H100 spot in us-west-2 may be $0.20/hr higher than eu-west-1 during US business hours due to demand patterns. For training jobs with no geographic data requirements, running in lower-demand regions (EU/Asia-Pacific off-peak) can extend your spot savings by 10–20%.
Spot for Batch, On-Demand for Serving
The cleanest architecture: run all training, fine-tuning, and batch inference on spot with checkpoint-resume. Run production serving endpoints on reserved or on-demand instances with SLA guarantees. This hybrid approach typically cuts total GPU spend by 30–50% for teams where training/batch jobs consume more GPU-hours than serving.
Checkpoint Frequency Optimization
Checkpointing too frequently wastes compute on I/O; too infrequently loses more work on interruption. The optimal checkpoint interval depends on: (1) S3/R2 upload bandwidth, (2) model size (smaller = faster checkpoint), and (3) interruption rate. For a 7B model (~14GB checkpoint) with 5GB/s S3 upload, a checkpoint takes ~3 seconds — minimal overhead at 30-minute intervals. For a 70B model (~140GB), checkpoint takes ~30 seconds — still fine at 30-minute intervals.
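If you want a number rather than a rule of thumb, the standard first-order optimum is the Young/Daly interval, sqrt(2 × C × MTBF), where C is the checkpoint write time and MTBF is the mean time between interruptions; a quick sketch:

```python
import math

def optimal_checkpoint_interval_s(checkpoint_s: float, mtbf_hours: float) -> float:
    """Young/Daly first-order optimum: tau = sqrt(2 * C * MTBF)."""
    return math.sqrt(2 * checkpoint_s * mtbf_hours * 3600)

# 7B model, ~3 s checkpoint, ~1 interruption/day -> ~12-minute interval,
# i.e. cheap checkpoints justify saving more often than every 30 minutes.
print(optimal_checkpoint_interval_s(3, 24) / 60)
```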
Compare spot and on-demand GPU pricing live
GridStackHub tracks 396 GPU pricing records across 32 providers daily — spot and on-demand, filtered by GPU model or provider.
View Live GPU Pricing →

Frequently Asked Questions
How much cheaper is GPU spot pricing than on-demand?

According to GridStackHub.ai data, GPU spot pricing is typically 22–70% cheaper than on-demand depending on provider and GPU model. H100 spot on Vast.ai starts at $1.35/hr versus $1.74/hr on-demand at Lambda — a 22% discount. A100 spot on Vast.ai reaches $0.35/hr versus $0.78/hr on-demand — a 55% discount. AWS and GCP offer 60–70% discounts on paper, but their on-demand rates are inflated versus independent cloud, so the absolute spot price is similar. The discount varies with market supply/demand and can change hourly.
Which workloads are a good fit for GPU spot?

GPU spot is best for interruption-tolerant workloads: LLM training with checkpoint-resume, fine-tuning (LoRA, QLoRA, full fine-tune), batch inference, hyperparameter search, model evaluation, embedding generation, and data preprocessing. Poor candidates include real-time inference APIs, interactive sessions, jobs under 30 minutes, and any workload without checkpointing implemented. The rule: if it can restart from a checkpoint and losing 30 minutes of work is acceptable, it qualifies for spot.
Which provider has the cheapest H100 spot pricing?

According to GridStackHub.ai data, Vast.ai offers the cheapest absolute H100 spot pricing at $1.35–$1.89/hr in May 2026, followed by RunPod spot at $1.50–$1.89/hr. AWS H100 spot (~$1.50–$1.95/hr per GPU) is competitive in absolute terms despite the large percentage discount, because AWS on-demand is $4.84/hr. GCP H100 spot (~$1.30–$1.85/hr) is similar to Vast.ai in absolute price. For the cheapest H100 spot without hyperscaler overhead, Vast.ai is the leading option. Filter by high-reliability hosts (95%+) for the most stable interruption rates.
How do you handle spot interruptions during training?

The standard approach is checkpoint-resume: save model weights and optimizer state to object storage (S3, GCS, R2) every 15–30 minutes. Register a SIGTERM signal handler to save a final checkpoint when preemption is signaled. On job restart, load from the latest checkpoint. HuggingFace Trainer supports this with resume_from_checkpoint=True. PyTorch Lightning supports it with the ModelCheckpoint callback. AWS provides a 2-minute warning; GCP/RunPod provide 30 seconds. With 30-minute checkpoints, the maximum work loss per interruption is 30 minutes.
Is Vast.ai spot reliable enough for real workloads?

Vast.ai spot is reliable for batch workloads with checkpointing when filtering by high-reliability hosts (95%+). Interruptions happen when the host reclaims their machine — not based on cloud demand surges. Typical interruption rate on high-reliability Vast.ai H100 instances is 2–8% per day, meaning a 24-hour job has a 92–98% chance of running uninterrupted. With 30-minute checkpointing, even a 10% daily interruption rate loses only 30 minutes of work per event. Always save checkpoints to off-host object storage — never to local disk that disappears on instance termination.
How do AWS EC2 Spot Instances work for GPU workloads?

AWS EC2 Spot Instances run on spare EC2 capacity at market prices set by AWS, typically 60–70% below on-demand for H100 instances. For H100 (p5.48xlarge, on-demand $4.84/GPU), spot runs ~$1.50–$1.95 per GPU-hour. AWS provides a 2-minute interruption warning via the instance metadata service at 169.254.169.254/latest/meta-data/spot/termination-time. Use AWS Spot Fleet with mixed instance types for automatic replacement, and configure your training framework to checkpoint on SIGTERM. AWS spot interruption rates for H100 p5 in us-east-1 are typically 5–20% per month.
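A minimal polling sketch against that endpoint; note IMDSv2 requires fetching a session token first:

```python
import time
import requests

IMDS = "http://169.254.169.254/latest"

def spot_termination_time() -> str | None:
    """Return the termination timestamp once AWS issues the 2-minute notice."""
    token = requests.put(f"{IMDS}/api/token",
                         headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
                         timeout=2).text
    resp = requests.get(f"{IMDS}/meta-data/spot/termination-time",
                        headers={"X-aws-ec2-metadata-token": token}, timeout=2)
    return resp.text if resp.status_code == 200 else None   # 404 until notice

while spot_termination_time() is None:
    time.sleep(5)
# ~2 minutes remain: save a final checkpoint and exit cleanly.
```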
Related GPU Pricing Resources
- GPU Cloud Pricing Comparison — All Providers Live
- Cheapest A100 Cloud 2026 — $0.42/hr and Up
- Cheapest B200 GPU Cloud 2026 — Blackwell Pricing
- AMD MI300X vs NVIDIA H100 — Full Price Comparison
- NVIDIA H200 GPU Cloud Pricing 2026
- GPU Cost Calculator — Estimate Your Workload Cost
- LLM Cost Per Token — Inference Cost by GPU