Side-by-Side Comparison (H100, CoreWeave, 2026)
| Model | Hourly Rate | Monthly (720hr) | Annual | Savings vs OD |
|---|---|---|---|---|
| On-Demand | $2.23/hr | $1,606 | $19,535 | Baseline |
| Reserved 1-yr | $1.79/hr | $1,289 | $15,680 | −20% |
| Spot (Vast.ai) | ~$1.49/hr* | ~$1,073 | ~$13,052 | −33%* |
*Spot pricing fluctuates. Listed rate is typical, not guaranteed. Spot instances may be interrupted.
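The table's figures follow directly from the hourly rates. A quick sketch of the arithmetic (rates are the listed CoreWeave/Vast.ai examples; the spot rate is typical, not guaranteed):

```python
# Reproduce the comparison table from hourly rates. The spot rate is a
# typical Vast.ai figure and fluctuates; all rates are illustrative.
HOURS_PER_MONTH = 720    # 30-day month
HOURS_PER_YEAR = 8_760   # 365 days

rates = {
    "on_demand": 2.23,
    "reserved_1yr": 1.79,
    "spot": 1.49,
}

def costs(hourly_rate: float) -> dict:
    """Monthly and annual cost at full utilization, plus savings vs on-demand."""
    return {
        "monthly": hourly_rate * HOURS_PER_MONTH,
        "annual": hourly_rate * HOURS_PER_YEAR,
        "savings_vs_od": 1 - hourly_rate / rates["on_demand"],
    }

for name, rate in rates.items():
    c = costs(rate)
    print(f"{name:12s} ${c['monthly']:,.0f}/mo  ${c['annual']:,.0f}/yr  "
          f"-{c['savings_vs_od']:.0%} vs on-demand")
```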
On-Demand: Maximum Flexibility, Highest Cost
On-demand is pay-as-you-go with no commitments. You can start, stop, and resize instances at any time. Providers guarantee availability (within reason) — you won't lose the instance unless you stop it.
Best for: Development, experimentation, short training runs (< 1 week), and variable-load inference. Any workload where flexibility justifies the premium.
Worst for: Long-running training jobs, production inference with known traffic patterns. You're paying the maximum rate continuously.
Reserved: Best Economics for Steady Workloads
Reserved pricing commits you to paying for capacity for 1 or 3 years. In exchange, you get a 20–50% discount. The GPU is reserved for you — you won't be preempted, and you're guaranteed availability.
Break-even math: 1-year reserved at CoreWeave saves $0.44/hr vs on-demand, or $3,854/year per GPU at full utilization. The catch: reserved bills every hour of the term whether you use it or not, while on-demand bills only hours used. Reserved wins once annual usage exceeds (reserved rate ÷ on-demand rate) × 8,760 ≈ 7,032 hours, roughly 80% utilization. Any always-on workload clears that bar easily.
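That break-even threshold can be computed directly from the two rates in the table:

```python
# Break-even utilization for reserved vs on-demand (rates from the table above).
OD_RATE = 2.23        # $/hr on-demand, billed only for hours used
RESERVED_RATE = 1.79  # $/hr with 1-yr commitment, billed for all hours
HOURS_PER_YEAR = 8_760

# Reserved cost is fixed for the year regardless of usage.
reserved_annual = RESERVED_RATE * HOURS_PER_YEAR

# On-demand cost equals reserved when hours_used * OD_RATE == reserved_annual.
break_even_hours = reserved_annual / OD_RATE
utilization = break_even_hours / HOURS_PER_YEAR

print(f"Break-even: {break_even_hours:,.0f} hr/yr ({utilization:.0%} utilization)")
```

Above roughly 80% utilization, reserved is cheaper; below it, on-demand wins despite the higher hourly rate.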
Best for: Production inference endpoints, ongoing training pipelines, teams with predictable GPU needs exceeding 6 months.
Spot: Maximum Savings, Requires Engineering
Spot instances use excess cloud capacity sold at steep discounts. The key constraint: they can be reclaimed on short notice (typically 30 seconds to 2 minutes). That's the price of the 33–80% discount.
Making spot work:
- Checkpoint training jobs every 10–30 minutes
- Use distributed training across multiple spot instances (interrupting one doesn't kill the job)
- Implement automatic job re-submission on interruption
- Store training state on durable object storage (S3/R2), not local disk
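The checklist above boils down to a checkpoint-and-resume loop. A minimal sketch, with local files standing in for durable object storage; `save_to_durable`, `load_from_durable`, and the checkpoint path are illustrative names, not a real provider API:

```python
# Spot-tolerant training sketch: checkpoint periodically to durable storage,
# and resume from the last checkpoint when a preempted job is re-submitted.
import json
import os
import time

CHECKPOINT_PATH = "checkpoint.json"   # would be an S3/R2 URI in practice
CHECKPOINT_EVERY_SECS = 600           # every 10 minutes, per the guidance above

def save_to_durable(state: dict) -> None:
    # Stand-in for an upload to S3/R2; local disk on a spot node is lost.
    with open(CHECKPOINT_PATH, "w") as f:
        json.dump(state, f)

def load_from_durable() -> dict:
    if os.path.exists(CHECKPOINT_PATH):
        with open(CHECKPOINT_PATH) as f:
            return json.load(f)
    return {"step": 0}                # no checkpoint yet: fresh start

def train(total_steps: int) -> int:
    state = load_from_durable()       # resume where the last instance stopped
    last_ckpt = time.monotonic()
    while state["step"] < total_steps:
        state["step"] += 1            # one training step (placeholder)
        if time.monotonic() - last_ckpt >= CHECKPOINT_EVERY_SECS:
            save_to_durable(state)
            last_ckpt = time.monotonic()
    save_to_durable(state)            # final checkpoint
    return state["step"]
```

On interruption, the job scheduler re-submits `train`, which picks up from the last durable checkpoint instead of step 0; at most one checkpoint interval of work is lost.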
Best for: Large training jobs that can be paused and resumed, embedding generation, batch offline inference.
Read: Full Spot GPU Pricing Guide →
Decision Framework
| Scenario | Recommended Model | Reason |
|---|---|---|
| Development / experimentation | On-demand | Flexibility beats cost at low hours |
| Training run < 1 week | On-demand or Spot | Spot if fault-tolerant, OD if not |
| Training run > 1 month | Spot or Reserved | Spot for max savings; reserved if reliability critical |
| Production inference (< 10k req/day) | On-demand | Variable load, flexibility needed |
| Production inference (> 50k req/day) | Reserved | Predictable load, need guaranteed capacity |
| Batch embedding generation | Spot | Interruptible, maximize cost savings |
| Compliance / SLA required | Reserved | Guaranteed availability, no preemptions |
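The decision table can be read as a small rule function. The workload keys and the 50k requests/day threshold below are illustrative simplifications of the table, not a standard API:

```python
# The decision framework above as rules. All names/thresholds are illustrative.
def recommend(workload: str, *, fault_tolerant: bool = False,
              req_per_day: int = 0, sla_required: bool = False) -> str:
    if sla_required:
        return "reserved"             # guaranteed availability, no preemption
    if workload == "development":
        return "on-demand"            # flexibility beats cost at low hours
    if workload == "training_short":  # run shorter than ~1 week
        return "spot" if fault_tolerant else "on-demand"
    if workload == "training_long":   # run longer than ~1 month
        return "spot" if fault_tolerant else "reserved"
    if workload == "inference":
        return "reserved" if req_per_day > 50_000 else "on-demand"
    if workload == "batch":
        return "spot"                 # interruptible, maximize savings
    raise ValueError(f"unknown workload: {workload}")
```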
Calculate Your Savings
Use our GPU Cost Calculator to model on-demand vs reserved vs spot costs for your specific workload — GPU type, hours per month, and workload duration.
See live spot and reserved prices
Real-time pricing for all three models across 32+ providers.
GPU Spot Pricing Guide →