Live data — spot and on-demand GPU pricing updated daily

According to GridStackHub.ai data, GPU spot pricing in May 2026 ranges from $1.35/hr for H100 SXM5 on Vast.ai (vs $1.74/hr on-demand — 22% savings) to $0.35/hr for A100 80GB spot (vs $0.78/hr on-demand — 55% savings). For workloads that support checkpoint-resume — training, fine-tuning, batch inference, and evaluation — spot pricing can cut GPU costs by $500–$5,000/month per GPU cluster. The risk is interruption: providers reclaim spot capacity when demand spikes, with warning windows ranging from 30 seconds to 2 minutes depending on provider.

22–70%

Typical GPU spot discount vs on-demand pricing in 2026. H100 spot: as low as $1.35/hr (Vast.ai) versus $1.74/hr on-demand at Lambda. A100 spot: as low as $0.35/hr versus $0.78/hr on-demand. Savings scale directly with training job duration.

| GPU | Spot | On-Demand | Discount |
|---|---|---|---|
| H100 SXM5 | $1.35/hr | $1.74/hr | −22% |
| A100 80GB | $0.35/hr | $0.78/hr | −55% |
| RTX 4090 | $0.26/hr | $0.44/hr | −41% |
| A40 | $0.32/hr | $0.55/hr | −42% |
| H100 (AWS) | ~$1.55/hr | $4.84/hr | −68% |

GPU Spot Pricing by Provider — May 2026

GridStackHub tracks spot/preemptible GPU pricing across all major providers. Here is the complete table of spot rates for the most common GPU models, updated daily:

| Provider | GPU | Spot Price | On-Demand | Savings | Warning | Type |
|---|---|---|---|---|---|---|
| Vast.ai | H100 SXM5 | $1.35–1.89/hr | $1.74/hr (Lambda) | 22–29% | Varies | Interruptible |
| RunPod | H100 SXM5 | $1.50–1.89/hr | $1.99/hr (RunPod OD) | 5–25% | ~30s | Spot |
| Vast.ai | A100 80GB | $0.35–0.55/hr | $0.78/hr (Thunder) | 29–55% | Varies | Interruptible |
| RunPod | A100 80GB | $0.45–0.65/hr | $0.79/hr (RunPod OD) | 18–43% | ~30s | Spot |
| Vast.ai | RTX 4090 | $0.26–0.38/hr | $0.44/hr (Vast OD) | 14–41% | Varies | Interruptible |
| AWS | H100 (p5.48xlarge) | ~$1.50–1.95/hr | $4.84/hr per GPU | 60–69% | 2 min | Spot Instances |
| AWS | A100 (p4d.24xlarge) | ~$0.92–1.40/hr | $3.67–4.84/hr per GPU | 62–75% | 2 min | Spot Instances |
| Google Cloud | H100 (a3-highgpu) | ~$1.30–1.85/hr | $3.09/hr per GPU | 40–58% | 30s | Spot VMs |
| Google Cloud | A100 80GB (a2-highgpu) | ~$1.10–1.60/hr | $3.75/hr per GPU | 57–71% | 30s | Spot VMs |
| Azure | H100 (ND H100 v5) | ~$2.10–2.90/hr | ~$3.50/hr per GPU | 17–40% | 30s | Spot VMs |

Spot prices are market-driven and fluctuate hourly. Ranges shown reflect typical GridStackHub tracking in May 2026. "Warning" is the advance notice before instance termination. AWS and GCP spot prices can vary significantly by region and time of day. Always verify the current spot price in the provider console before launching. Independent provider spot (Vast.ai, RunPod) may have host-driven interruptions not tied to demand surges.

Why AWS spot saves more on paper but less in practice: AWS H100 on-demand ($4.84/hr) is 2.8x the Lambda on-demand price ($1.74/hr). A 60% discount on AWS spot still puts you at $1.93/hr — more expensive than Lambda on-demand. The best absolute spot rates for H100 are Vast.ai ($1.35/hr) and RunPod, not hyperscalers.

Is Your Workload Right for Spot GPUs?

Spot GPUs can be reclaimed by the provider at any time. Whether this matters depends entirely on your workload. Here is the clear line:

✔ Spot-Eligible Workloads
  • LLM training with checkpoint-resume
  • Model fine-tuning (LoRA, QLoRA, full)
  • Batch inference (queued, not real-time)
  • Hyperparameter search (each trial isolated)
  • Model evaluation and benchmark runs
  • Embedding generation for vector databases
  • Data preprocessing and tokenization
  • Synthetic data generation pipelines
  • RLHF reward model training
  • Diffusion model training/fine-tuning
✗ Poor Candidates for Spot
  • Real-time inference APIs (latency SLAs)
  • Interactive Jupyter notebooks (work loss)
  • Jobs shorter than 30 minutes (checkpoint overhead)
  • Stateful streaming inference
  • Customer-facing ML features with uptime SLAs
  • Inference serving with <99.9% uptime requirement
  • Training without checkpointing implemented
  • Long-running experiments with no save logic

Rule of thumb: If your job can restart from a checkpoint and lose at most 30 minutes of work, it's spot-eligible. Fine-tuning a 7B model for 8 hours with 30-minute checkpoints? Perfect for spot — at most you lose 30 min of training if interrupted. Serving a production LLM API? Not spot.

How to Implement Checkpoint-Resume for GPU Spot

The entire spot pricing strategy depends on one capability: checkpointing your job so it can resume from where it left off. Here is the complete implementation guide:

1. Add periodic checkpoint saves (every 15–30 min)

In your training loop, call model.save_pretrained() and torch.save(optimizer.state_dict(), path) every N steps, then upload to S3/GCS/R2 immediately. For PyTorch Lightning, use a ModelCheckpoint callback with save_top_k=-1 and every_n_train_steps. For the HuggingFace Trainer, set save_steps=100 and save_total_limit=3.
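The checkpoint cadence above can be sketched with the standard library alone. This is a minimal illustration, not a real training job: the pickle-based `state` dict and `save_checkpoint` helper are stand-ins for framework calls such as `torch.save` or `model.save_pretrained()`, and the upload-to-object-storage step is left as a comment.

```python
import os
import pickle
import tempfile

def save_checkpoint(state, ckpt_dir, step):
    """Atomically write a checkpoint for the given step.

    In a real job, `state` would hold model weights and optimizer
    state, and the file would be uploaded to S3/GCS/R2 right after
    the write completes.
    """
    os.makedirs(ckpt_dir, exist_ok=True)
    path = os.path.join(ckpt_dir, f"checkpoint-{step}.pkl")
    # Write to a temp file first so an interruption mid-write never
    # leaves a truncated checkpoint behind.
    fd, tmp = tempfile.mkstemp(dir=ckpt_dir)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)  # atomic rename on the same filesystem
    return path

def training_loop(total_steps, ckpt_dir, save_every=100, start_step=0):
    state = {"step": start_step}
    for step in range(start_step, total_steps):
        state["step"] = step + 1  # stand-in for a real training step
        if (step + 1) % save_every == 0:
            save_checkpoint(state, ckpt_dir, step + 1)
    return state["step"]
```

The atomic temp-file-then-rename pattern matters on spot: a preemption arriving mid-save must not corrupt the only checkpoint you have.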

2. Set up a SIGTERM handler for a graceful final checkpoint

Register a signal handler: signal.signal(signal.SIGTERM, save_checkpoint_and_exit). When the provider sends the preemption signal (SIGTERM), your handler fires, saves a final checkpoint to object storage, and exits cleanly. Without this, you lose all work since the last scheduled checkpoint.
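A minimal sketch of that handler, using only the standard library. The file path and the string written are illustrative stand-ins; a real handler would call your framework's save routine and push the result to object storage before exiting.

```python
import os
import signal
import sys
import tempfile

# Illustrative path; a real job would upload to S3/GCS/R2 instead.
FINAL_CKPT = os.path.join(tempfile.gettempdir(), "final-checkpoint.txt")

def save_final_checkpoint():
    # Stand-in for torch.save(...) plus an object-storage upload.
    with open(FINAL_CKPT, "w") as f:
        f.write("weights-and-optimizer-state")

def handle_sigterm(signum, frame):
    """Fires when the provider sends the preemption SIGTERM."""
    save_final_checkpoint()
    sys.exit(0)  # exit cleanly before the hard kill arrives

signal.signal(signal.SIGTERM, handle_sigterm)
```

Keep the handler fast: with only 30 seconds of notice on some providers, a large final save may not finish, which is why the periodic checkpoints in step 1 remain the primary safety net.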

3. Load from the latest checkpoint on job start

At job startup, check whether a checkpoint exists in your S3/GCS/R2 bucket. If found, load it: model = AutoModelForCausalLM.from_pretrained(checkpoint_path), and restore optimizer state. The HuggingFace Trainer does this automatically with resume_from_checkpoint=True.
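Finding the latest checkpoint can be sketched as below; the `checkpoint-<step>` naming convention matches what the HuggingFace Trainer writes, but the directory-scanning helper itself is an illustration, assuming checkpoints have already been synced from object storage to a local path.

```python
import os
import re

def latest_checkpoint(ckpt_dir):
    """Return the path of the highest-step checkpoint-<N> entry, or None.

    Compares step numbers numerically: checkpoint-2000 beats
    checkpoint-900, which a plain lexicographic sort would get wrong.
    """
    if not os.path.isdir(ckpt_dir):
        return None
    best_step, best_path = -1, None
    for name in os.listdir(ckpt_dir):
        m = re.fullmatch(r"checkpoint-(\d+)", name)
        if m and int(m.group(1)) > best_step:
            best_step = int(m.group(1))
            best_path = os.path.join(ckpt_dir, name)
    return best_path
```

With the Trainer, the result plugs in as `trainer.train(resume_from_checkpoint=latest_checkpoint(out_dir) or None)`; passing `True` instead lets the Trainer do the same scan itself.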

4. Configure automatic job requeue on interruption

For Kubernetes, set restartPolicy: OnFailure on your Pod spec. For Ray, use max_retries=10 on remote tasks. For AWS Batch, use managed retry strategies. On RunPod, the job queue API automatically requeues terminated spot jobs. The relaunched job finds the checkpoint and resumes.
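All of those orchestrator settings implement the same retry loop. Here is the idea reduced to plain Python; `SpotInterruption` and `run_with_requeue` are illustrative names, not part of any orchestrator API.

```python
import time

class SpotInterruption(RuntimeError):
    """Raised when the provider reclaims the instance mid-job."""

def run_with_requeue(job, max_retries=10, backoff_s=0.0):
    """Relaunch `job` until it completes or retries are exhausted.

    `job` must locate the latest checkpoint itself on each start
    (step 3), so every retry resumes rather than recomputes.
    """
    for attempt in range(max_retries + 1):
        try:
            return job()
        except SpotInterruption:
            if attempt == max_retries:
                raise
            time.sleep(backoff_s)  # wait for replacement capacity
```

This is exactly what `restartPolicy: OnFailure` or Ray's `max_retries` gives you for free; the hand-rolled version is only worth writing when no orchestrator is in the loop.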

5. Monitor spot interruption rates and adjust strategy

Track actual interruption rates in your environment. If you see more than 2 interruptions per day on a single GPU, switch to a different GPU type (more supply) or a different region, or increase checkpoint frequency. On Vast.ai, prefer hosts with high reliability ratings (95%+). On AWS, use the Spot Instance Advisor to find GPU types with the lowest interruption rates by region.
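Tracking that rate is a few lines once you log interruption timestamps. This sketch assumes you record each reclaim event as a datetime; the 2-per-day threshold comes from the guidance above.

```python
from datetime import datetime, timedelta

def interruptions_per_day(events, now, window_days=7):
    """Average daily interruption rate over a trailing window.

    `events` is a list of datetimes at which spot instances were
    reclaimed; `now` is passed in explicitly to keep this testable.
    """
    cutoff = now - timedelta(days=window_days)
    recent = [t for t in events if t > cutoff]
    return len(recent) / window_days

def should_switch_strategy(events, now, threshold_per_day=2.0):
    # >2 interruptions/day: change GPU type, region, or checkpoint
    # frequency, per the guidance above.
    return interruptions_per_day(events, now) > threshold_per_day
```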

Provider-Specific Spot Strategies

Vast.ai — Best Absolute Dollar Rates

Vast.ai operates as a marketplace where individual owners rent out their GPUs. "Interruptible" instances are the cheapest tier — the host can reclaim their machine at any time by sending SIGTERM. Interruption rate varies by host reliability score and GPU demand. Best practices for Vast.ai spot:

  • Filter by reliability score — only bid on hosts with 95%+ reliability. Unreliable hosts interrupt frequently.
  • Use dph (dollars per hour) bidding — bid at the ask price or slightly above to secure the instance quickly.
  • Prefer dedicated instances — "On-demand" on Vast.ai means dedicated with ~30s eviction notice; "Interruptible" is cheaper with variable notice time.
  • Store checkpoints off-host — always save to object storage (R2, S3), not local disk. When the instance is terminated, local disk is gone.

RunPod — Spot with Predictable Warning

RunPod spot (called "Community Cloud") gives approximately 30 seconds of SIGTERM warning before termination. Spot prices are typically 5–25% below RunPod's on-demand rates — smaller savings than Vast.ai, but with more predictable pricing and a more managed environment. RunPod's job queue API can automatically requeue spot jobs on interruption with no extra configuration.

AWS EC2 Spot — Biggest Discount from Hyperscaler On-Demand

AWS spot discounts look impressive (60–70% off) because AWS on-demand H100 is expensive ($4.84/hr). The resulting spot price (~$1.50–$1.95/hr) is competitive with independent cloud on-demand but not cheaper than Vast.ai spot. AWS spot advantages: enterprise SLA for other services, 2-minute warning (longer than RunPod/GCP), mature Spot Fleet and Auto Scaling tooling, and the ability to mix spot with on-demand fallback in a single fleet.

Google Cloud Spot VMs — Good for TPU Alternatives

GCP Spot VMs for H100 offer 40–58% discounts off their on-demand rate ($3.09/hr → ~$1.30–$1.85/hr). GCP provides a 30-second preemption notice via metadata server. GCP spot works well if you're already on Google Cloud for other services (BigQuery, Vertex AI, GCS). GCP also offers TPUs at competitive spot rates — for training workloads that can run on TPUs, GCP spot TPUs often beat H100 spot on cost-per-FLOP.
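GCP exposes preemption status at the documented metadata path `computeMetadata/v1/instance/preempted`, which returns TRUE once the VM is marked for reclaim. A minimal polling check, with the fetch function injectable so the logic can be exercised off-GCP (the injection parameter is an assumption for testing, not part of any GCP API):

```python
import urllib.request

# Documented GCP metadata endpoint; returns "TRUE" once preempted.
PREEMPTED_URL = ("http://metadata.google.internal/computeMetadata/v1/"
                 "instance/preempted")

def _fetch_metadata():
    req = urllib.request.Request(
        PREEMPTED_URL, headers={"Metadata-Flavor": "Google"}
    )
    with urllib.request.urlopen(req, timeout=2) as resp:
        return resp.read().decode()

def is_preempted(fetch=_fetch_metadata):
    """Poll this from a side thread; checkpoint and exit on TRUE."""
    return fetch().strip() == "TRUE"
```

In practice a background thread polls this every few seconds and triggers the same final-checkpoint path as the SIGTERM handler, since 30 seconds leaves little room for a cold reaction.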

Real Savings: Spot vs On-Demand by Workload

Here is what spot pricing saves on common AI workloads run at GridStackHub-tracked prices:

| Workload | Duration | On-Demand Cost | Spot Cost | Monthly Savings |
|---|---|---|---|---|
| Fine-tune 7B model (daily) | 4 hrs/day × 1 H100 | $208/mo (Lambda) | $162/mo (Vast spot) | $46/mo (22%) |
| Train 13B model from scratch | 168 hrs × 4 H100 | $1,170 (Lambda) | $908 (Vast spot) | $262 (22%) |
| Batch embed 100M documents | 24 hrs × 2 A100 | $37.44 (Thunder OD) | $16.80 (Vast spot) | $20.64 (55%) |
| Hyperparameter search (Optuna) | 80 hrs × 4 RTX 4090 | $140.80 (Vast OD) | $83.20 (Vast spot) | $57.60 (41%) |
| Full pre-training (400B tokens, 7B) | ~500 hrs × 8 H100 | $6,960 (Lambda) | $5,400 (Vast spot) | $1,560 (22%) |
| Continual fine-tune (weekly) | 6 hrs/wk × 2 A100 | $41/mo (Thunder OD) | $18/mo (Vast spot) | $23/mo (55%) |

Calculations based on: Lambda H100 on-demand $1.74/hr, Vast.ai H100 spot $1.35/hr; Thunder A100 on-demand $0.78/hr, Vast.ai A100 spot $0.35/hr; Vast.ai RTX 4090 on-demand $0.44/hr, spot $0.26/hr. Actual savings depend on current spot market rates.

Understanding Spot Risk: Interruption Rates and Real Cost

Spot pricing comes with one real cost: interrupted jobs lose work since the last checkpoint. Here is how to think about the true cost of interruptions:

The interruption math: An H100 spot at $1.35/hr (vs $1.74 on-demand) saves $0.39/hr. If interrupted once per day, you lose up to 30 minutes of work (with 30-min checkpoints) — but that 30 minutes of work would have cost $0.87 at on-demand rates. Your daily saving is 24 × $0.39 = $9.36. The interruption's cost (30 min recompute at spot rate) is $0.67. Net saving: $8.69/day even with one daily interruption. The math still works.
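The arithmetic above generalizes to any GPU and interruption rate. A small helper (the function name and parameters are illustrative) reproduces the worst-case figures from the paragraph:

```python
def net_daily_saving(spot, od, interruptions_per_day=1.0,
                     ckpt_interval_hr=0.5):
    """Worst-case net daily saving per GPU on spot.

    Gross saving is the hourly discount over 24 hours; each
    interruption costs, at worst, one checkpoint interval of
    recompute billed at the spot rate.
    """
    gross = 24 * (od - spot)
    recompute = interruptions_per_day * ckpt_interval_hr * spot
    return gross - recompute

# Figures from the paragraph: H100 at $1.35/hr spot vs $1.74/hr OD,
# one interruption/day, 30-minute checkpoints.
saving = net_daily_saving(1.35, 1.74)
```

Plugging in the paragraph's numbers gives roughly $8.69/day, and the break-even point is far away: spot stays ahead until interruptions wipe out the full 24 × $0.39 daily discount.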

Interruption Rate Benchmarks by Provider (2026)

| Provider | GPU | Typical Interruption Rate | Warning Time | Best For |
|---|---|---|---|---|
| Vast.ai (high-reliability host) | H100 / A100 | 2–8% / day | Variable (seconds–minutes) | Long training runs |
| RunPod Community Cloud | H100 / A100 | 3–10% / day | ~30 seconds | Fine-tuning, batch jobs |
| AWS Spot (us-east-1, H100) | H100 | 5–20% / month | 2 minutes | AWS-native ML pipelines |
| GCP Spot VMs (us-central1, H100) | H100 | 5–15% / month | 30 seconds | GCP-native training |
| Vast.ai (low-reliability host) | Mixed | 20–40% / day | Variable | Avoid for multi-hour jobs |

Advanced Spot Strategies for AI Teams

Mixed Fleet: Spot Primary + On-Demand Fallback

For training runs that must complete on deadline, use a mixed fleet approach: start 80–90% of GPUs on spot, 10–20% on-demand. When spot nodes are interrupted, the on-demand nodes continue training (at reduced throughput). This provides savings of 15–20% versus full on-demand while ensuring your training run always makes progress. AWS Spot Fleet and GCP Managed Instance Groups support this natively.
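The blended cost of such a fleet is a one-line calculation. The helper below is illustrative; the example uses the H100 rates tracked above (Vast spot $1.35/hr, Lambda on-demand $1.74/hr) on an 8-GPU fleet with 7 nodes on spot:

```python
def blended_cost_per_hr(n_gpus, spot_frac, spot_rate, od_rate):
    """Hourly fleet cost with spot_frac of GPUs on spot, rest on-demand."""
    n_spot = int(n_gpus * spot_frac)
    n_od = n_gpus - n_spot
    return n_spot * spot_rate + n_od * od_rate

# 8 x H100: 7 on spot, 1 on-demand
mixed = blended_cost_per_hr(8, 0.875, 1.35, 1.74)
full_od = 8 * 1.74
discount = 1 - mixed / full_od  # fraction saved vs full on-demand
```

At these rates the mixed fleet lands near $11.19/hr against $13.92/hr full on-demand, a saving just under 20%, consistent with the 15–20% range quoted above.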

Regional Arbitrage

Spot prices vary by region. H100 spot in us-west-2 may be $0.20/hr higher than eu-west-1 during US business hours due to demand patterns. For training jobs with no geographic data requirements, running in lower-demand regions (EU/Asia-Pacific off-peak) can extend your spot savings by 10–20%.

Spot for Batch, On-Demand for Serving

The cleanest architecture: run all training, fine-tuning, and batch inference on spot with checkpoint-resume. Run production serving endpoints on reserved or on-demand instances with SLA guarantees. This hybrid approach typically cuts total GPU spend by 30–50% for teams where training/batch jobs consume more GPU-hours than serving.

Checkpoint Frequency Optimization

Checkpointing too frequently wastes compute on I/O; too infrequently loses more work on interruption. The optimal checkpoint interval depends on: (1) S3/R2 upload bandwidth, (2) model size (smaller = faster checkpoint), and (3) interruption rate. For a 7B model (~14GB checkpoint) with 5GB/s S3 upload, a checkpoint takes ~3 seconds — minimal overhead at 30-minute intervals. For a 70B model (~140GB), checkpoint takes ~30 seconds — still fine at 30-minute intervals.
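This trade-off has a standard first-order solution not mentioned above: Young's approximation (refined by Daly) sets the optimal interval at sqrt(2 × checkpoint cost × mean time between failures). A sketch, with the function name chosen here for illustration:

```python
import math

def optimal_checkpoint_interval_s(checkpoint_cost_s, mtbf_s):
    """Young's first-order approximation: T_opt = sqrt(2 * C * MTBF).

    checkpoint_cost_s: seconds to write and upload one checkpoint
    mtbf_s: mean seconds between interruptions
    """
    return math.sqrt(2 * checkpoint_cost_s * mtbf_s)

# 70B model (~30 s checkpoint) on a node reclaimed roughly twice a
# day (MTBF = 12 h): the formula lands near 27 minutes.
t_opt = optimal_checkpoint_interval_s(30, 12 * 3600)
```

Reassuringly, for the 30-second / twice-daily case the formula agrees with the 30-minute rule of thumb used throughout this guide; it mainly earns its keep at the extremes (very cheap checkpoints or very flaky hosts).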

Compare spot and on-demand GPU pricing live

GridStackHub tracks 396 GPU pricing records across 32 providers daily — spot and on-demand, filtered by GPU model or provider.

View Live GPU Pricing →

Frequently Asked Questions

How much cheaper is GPU spot pricing versus on-demand?

According to GridStackHub.ai data, GPU spot pricing is typically 22–70% cheaper than on-demand depending on provider and GPU model. H100 spot on Vast.ai starts at $1.35/hr versus $1.74/hr on-demand at Lambda — a 22% discount. A100 spot on Vast.ai reaches $0.35/hr versus $0.78/hr on-demand — a 55% discount. AWS and GCP offer 60–70% discounts on paper, but their on-demand rates are inflated versus independent cloud, so the absolute spot price is similar. The discount varies with market supply/demand and can change hourly.

What workloads are best suited for GPU spot instances?

GPU spot is best for interruption-tolerant workloads: LLM training with checkpoint-resume, fine-tuning (LoRA, QLoRA, full fine-tune), batch inference, hyperparameter search, model evaluation, embedding generation, and data preprocessing. Poor candidates include real-time inference APIs, interactive sessions, jobs under 30 minutes, and any workload without checkpointing implemented. The rule: if it can restart from a checkpoint and losing 30 minutes of work is acceptable, it qualifies for spot.

Which cloud provider offers the cheapest H100 spot?

According to GridStackHub.ai data, Vast.ai offers the cheapest absolute H100 spot pricing at $1.35–$1.89/hr in May 2026, followed by RunPod spot at $1.50–$1.89/hr. AWS H100 spot (~$1.50–$1.95/hr per GPU) is competitive in absolute terms despite the large percentage discount, because AWS on-demand is $4.84/hr. GCP H100 spot (~$1.30–$1.85/hr) is similar to Vast.ai in absolute price. For the cheapest H100 spot without hyperscaler overhead, Vast.ai is the leading option. Filter by high-reliability hosts (95%+) for the most stable interruption rates.

How do I handle GPU spot instance interruptions in training?

The standard approach is checkpoint-resume: save model weights and optimizer state to object storage (S3, GCS, R2) every 15–30 minutes. Register a SIGTERM signal handler to save a final checkpoint when preemption is signaled. On job restart, load from the latest checkpoint. HuggingFace Trainer supports this with resume_from_checkpoint=True. PyTorch Lightning supports it with ModelCheckpoint callback. AWS provides a 2-minute warning; GCP/RunPod provide 30 seconds. With 30-minute checkpoints, the maximum work loss per interruption is 30 minutes.

Is GPU spot pricing on Vast.ai reliable?

Vast.ai spot is reliable for batch workloads with checkpointing when filtering by high-reliability hosts (95%+). Interruptions happen when the host reclaims their machine — not based on cloud demand surges. Typical interruption rate on high-reliability Vast.ai H100 instances is 2–8% per day, meaning a 24-hour job has a 92–98% chance of running uninterrupted. With 30-minute checkpointing, even a 10% daily interruption rate loses only 30 minutes of work per event. Always save checkpoints to off-host object storage — never to local disk that disappears on instance termination.

How does AWS GPU spot pricing work?

AWS EC2 Spot Instances sell unused capacity at prices AWS adjusts gradually with supply and demand (explicit bidding is no longer required), typically 60–70% below on-demand for H100 instances. For H100 (p5.48xlarge, on-demand $4.84/GPU), spot runs ~$1.50–$1.95/GPU. AWS provides a 2-minute interruption warning via the instance metadata service at 169.254.169.254/latest/meta-data/spot/termination-time. Use AWS Spot Fleet with mixed instance types for automatic replacement, and configure your training framework to checkpoint on SIGTERM. AWS spot interruption rates for H100 p5 in us-east-1 are typically 5–20% per month.

Related GPU Pricing Resources