According to GridStackHub.ai data, the cheapest NVIDIA L4 GPU cloud in May 2026 is $0.42/hr on the Vast.ai marketplace (spot/interruptible, 24GB GDDR6). The cheapest non-interruptible on-demand L4 is Google Cloud's g2-standard-4 (1x L4) at $0.70/hr. The related NVIDIA L40S (48GB, Ada Lovelace) starts at $0.59/hr on-demand at FluidStack and TensorDock, which makes it cheaper than L4 on-demand. The L4 is purpose-built for data center inference: low power draw (72W), 24GB GDDR6, and FP8 compute via the Ada Lovelace architecture make it the dominant budget inference GPU in 2026. GridStackHub tracks all L4 and L40S pricing daily.
$0.42/hr is the cheapest NVIDIA L4 cloud price (Vast.ai marketplace, 24GB GDDR6, Ada Lovelace, interruptible). Cheapest L4 on-demand: $0.70/hr at Google Cloud. The related L40S (48GB) starts at $0.59/hr on-demand at FluidStack and TensorDock. The L4 is the go-to budget inference GPU for 7B–13B models in 2026.
- Vast.ai (SPOT, $0.42/hr): Marketplace / peer-hosted · Preemptible · L4 24GB GDDR6 · prices fluctuate $0.35–0.55/hr
- FluidStack (VERIFIED, $0.59/hr): On-demand · L40S 48GB · US/EU regions · non-preemptible
- TensorDock (VERIFIED, $0.59/hr): On-demand · L40S 48GB · US/EU regions · pay-as-you-go
- Google Cloud (VERIFIED, $0.70/hr): On-demand · L4 24GB · g2-standard-4 · us-central1 · CUD: ~$0.32/hr (3yr)
- CoreWeave (VERIFIED, $0.99/hr): On-demand · L40S 48GB · US regions · 8 vCPU · InfiniBand optional
- Scaleway (VERIFIED, $1.02/hr): On-demand · L40S 48GB · EU (France) · L40S-1-48G instance
Complete L4 and L40S Cloud Pricing Table — May 2026
GridStackHub tracks NVIDIA L4 and L40S pricing across 12 cloud providers. Note: most providers offer the L40S (48GB Ada Lovelace) alongside or instead of the L4 (24GB Ada Lovelace). Both are listed below for a complete picture of Ada Lovelace budget inference pricing:
| Provider | GPU Model | VRAM | Instance / Config | Type | Price /hr | Status |
|---|---|---|---|---|---|---|
| Vast.ai | L4 | 24 GB | Marketplace (peer) | Spot | $0.42/hr | SPOT |
| FluidStack | L40S | 48 GB | L40S 48GB | On-demand | $0.59/hr | VERIFIED |
| TensorDock | L40S | 48 GB | L40S 48GB | On-demand | $0.59/hr | VERIFIED |
| Jarvis Labs | L40S | 48 GB | L40S 48GB | On-demand | $0.69/hr | VERIFIED |
| Google Cloud | L4 | 24 GB | g2-standard-4 (1x L4) | On-demand | $0.70/hr | VERIFIED |
| RunPod | L40S | 48 GB | L40S 48GB | On-demand | $0.74/hr | VERIFIED |
| Genesis Cloud | L40S | 48 GB | L40S 48GB | On-demand | $0.76/hr | VERIFIED |
| Nebius | L40S | 48 GB | L40S (GPU) | On-demand | $0.87/hr | VERIFIED |
| CoreWeave | L40S | 48 GB | L40S | On-demand | $0.99/hr | VERIFIED |
| Scaleway | L40S | 48 GB | L40S-1-48G | On-demand | $1.02/hr | VERIFIED |
| OVHcloud | L40S | 48 GB | GPU L40S | On-demand | $1.08/hr | VERIFIED |
| IBM Cloud | L40S | 48 GB | gx3-24x120x1l40s | On-demand | $1.83/hr | VERIFIED |
Note: most providers listed offer the L40S (48GB) rather than the L4 (24GB). FluidStack, TensorDock, Jarvis Labs, RunPod, Nebius, Genesis Cloud, CoreWeave, Scaleway, OVHcloud, and IBM Cloud stock the L40S. Google Cloud stocks the L4. The Vast.ai marketplace carries both. The L40S is the stronger GPU (2× VRAM, ~1.5× throughput), but buyers don't always distinguish it from the L4, since both share the Ada Lovelace architecture. Data sourced from GridStackHub's live pricing database, May 2026.
L4 vs L40S: most providers have jumped to L40S. The NVIDIA L4 (24GB) was designed for inference racks. The L40S (48GB) is the professional successor with double the VRAM. In 2026, most independent cloud providers stock L40S at prices starting at $0.59/hr — often cheaper per VRAM than the L4. If you find L40S at similar pricing to L4, take the L40S: double the memory means larger models, longer context, and higher batch sizes.
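As a quick check on the per-VRAM claim, here is a minimal sketch using the headline on-demand prices from the table above (GCP for the L4, FluidStack/TensorDock for the L40S):

```python
# Cost per GB of VRAM per hour, using the cheapest on-demand prices above:
# L4 at $0.70/hr (Google Cloud) and L40S at $0.59/hr (FluidStack/TensorDock).
l4_price_hr, l4_vram_gb = 0.70, 24
l40s_price_hr, l40s_vram_gb = 0.59, 48

print(f"L4:   ${l4_price_hr / l4_vram_gb:.4f}/GB-hr")     # ~$0.0292/GB-hr
print(f"L40S: ${l40s_price_hr / l40s_vram_gb:.4f}/GB-hr") # ~$0.0123/GB-hr
```

At these rates the L40S works out to less than half the L4's price per GB of VRAM, which is why the advice above is to take the L40S when the hourly prices are close.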
Google Cloud L4 Committed Use Discount (CUD) Pricing
Google Cloud is the only major hyperscaler offering the L4 (not the L40S) on-demand. Its g2-standard-4 (1x L4) at $0.70/hr on-demand can be reduced significantly with CUDs:
| GCP Instance | GPU | On-Demand /hr | 1yr CUD /hr | 3yr CUD /hr | Savings (3yr) |
|---|---|---|---|---|---|
| g2-standard-4 | 1x L4 24GB | $0.70/hr | ~$0.44/hr | ~$0.32/hr | ~54% |
| g2-standard-8 | 1x L4 24GB | $0.85/hr | ~$0.54/hr | ~$0.38/hr | ~55% |
| g2-standard-96 | 8x L4 24GB | $3.67/hr | ~$2.31/hr | ~$1.65/hr | ~55% |
At ~$0.32/hr on a 3-year GCP CUD, Google Cloud L4 becomes the cheapest non-interruptible L4 pricing available, undercutting even Vast.ai spot pricing. It suits stable production inference workloads where you can forecast 3 years of demand.
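A minimal sketch of the CUD math for the g2-standard-4, using the rates from the table and assuming ~730 billable hours per month:

```python
# g2-standard-4 (1x L4): on-demand vs 3-year CUD, rates from the table above.
HOURS_PER_MONTH = 730  # assumed average billable hours per month
on_demand_hr, cud_3yr_hr = 0.70, 0.32

print(f"On-demand: ${on_demand_hr * HOURS_PER_MONTH:,.0f}/mo")  # ~$511/mo
print(f"3yr CUD:   ${cud_3yr_hr * HOURS_PER_MONTH:,.0f}/mo")    # ~$234/mo
print(f"Savings:   {1 - cud_3yr_hr / on_demand_hr:.0%}")        # ~54%
```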
NVIDIA L4 Specifications: What You Get
The L4 is NVIDIA's data center inference GPU from the Ada Lovelace generation, succeeding the T4. It is optimized for deployment density, not peak throughput.
| Spec | NVIDIA L4 | NVIDIA L40S | NVIDIA T4 | NVIDIA H100 |
|---|---|---|---|---|
| Architecture | Ada Lovelace | Ada Lovelace | Turing | Hopper |
| GPU Memory | 24 GB GDDR6 | 48 GB GDDR6 | 16 GB GDDR6 | 80 GB HBM3 |
| Memory Bandwidth | 300 GB/s | 864 GB/s | 320 GB/s | 3,350 GB/s |
| INT8 Throughput (TOPS) | 242 | 362 | 130 | 3,958 |
| FP8 Support | Yes (Ada) | Yes (Ada) | No | Yes (Hopper) |
| TDP (Power) | 72W | 350W | 70W | 700W |
| Form Factor | PCIe (low-power) | PCIe | PCIe (low-power) | SXM (high-power) |
| Cloud Price (cheapest OD) | $0.70/hr (GCP) | $0.59/hr (FluidStack) | $0.35/hr (GCP) | $1.74/hr (Lambda) |
| Best for | Inference, video, img gen | Inference + medium train | Inference (legacy) | Training + heavy inference |
What Models Fit on a Single L4 (24GB)
The 24GB GDDR6 limit determines which models you can run on a single L4:
| Model | Precision | VRAM Needed | Fits on L4? | Fits on L40S? |
|---|---|---|---|---|
| Llama 3 8B | BF16 | ~16 GB | ✓ Yes (8GB free) | ✓ Yes |
| Llama 3 8B | FP8 | ~8 GB | ✓ Yes (16GB free) | ✓ Yes |
| Mistral 7B | BF16 | ~14 GB | ✓ Yes | ✓ Yes |
| Phi-3 Mini 3.8B | FP16 | ~8 GB | ✓ Yes | ✓ Yes |
| Llama 2 13B | BF16 | ~26 GB | ✗ No (OOM) | ✓ Yes (22GB free) |
| Llama 2 13B | 4-bit | ~7 GB | ✓ Yes | ✓ Yes |
| Mixtral 8×7B | BF16 | ~92 GB | ✗ No | ✗ No |
| Llama 3.1 70B | BF16 | ~140 GB | ✗ No | ✗ No |
| SDXL (image gen) | FP16 | ~8 GB | ✓ Yes | ✓ Yes |
| FLUX.1 Dev | FP16 | ~23 GB | ✓ Tight | ✓ Yes |
For 13B and larger models, upgrade to the L40S. A 13B model in BF16 requires ~26GB, just over the L4's 24GB limit. FluidStack and TensorDock offer the L40S at $0.59/hr, below L4 on-demand pricing, making the L40S the better choice for any model above ~12B parameters. Google Cloud's g2-standard-4 is the only hyperscaler L4 and is best suited to stable workloads that can commit to 3yr CUD pricing.
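To extend the fit table to other models, a rough rule of thumb is weights ≈ parameters × bytes per parameter, plus headroom for KV cache, activations, and CUDA context. A minimal sketch; the ~20% overhead factor is an illustrative assumption, not a measured value:

```python
# Rough single-GPU fit check: weights = params * bytes/param, with an assumed
# ~20% overhead for KV cache, activations, and CUDA context (illustrative only).
BYTES_PER_PARAM = {"bf16": 2.0, "fp16": 2.0, "fp8": 1.0, "int4": 0.5}

def fits(params_b: float, precision: str, vram_gb: float,
         overhead: float = 0.20) -> bool:
    weights_gb = params_b * BYTES_PER_PARAM[precision]
    return weights_gb * (1 + overhead) <= vram_gb

for name, size_b, prec in [("Llama 3 8B", 8, "bf16"), ("13B", 13, "bf16"),
                           ("13B 4-bit", 13, "int4"), ("70B", 70, "bf16")]:
    print(f"{name:>10}  L4(24GB): {fits(size_b, prec, 24)}  "
          f"L40S(48GB): {fits(size_b, prec, 48)}")
```

Under these assumptions the output matches the table above: 8B BF16 and 13B 4-bit fit on the L4, 13B BF16 needs the L40S, and 70B BF16 fits neither.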
L4 Use Cases: When It's the Right Choice
Best workloads for NVIDIA L4 / L40S:
- Small LLM inference at low cost: 7B–13B models (Llama 8B, Mistral 7B, Phi-3) at moderate request volumes. L40S at $0.59/hr can serve Llama 8B inference at ~4,500 tokens/sec, approximately $0.036/M tokens (arithmetic sketched after this list), competitive with managed inference APIs.
- Image generation: SDXL, FLUX.1, and Stable Diffusion models typically need 8–23GB VRAM. L4 runs SDXL at ~3–5 images/sec. L40S at 48GB handles FLUX.1 with comfortable headroom.
- Video transcoding and encoding: L4's Ada Lovelace includes hardware AV1 encode support — significantly faster than software transcoding. Ideal for media pipelines, stream processing, and video serving platforms.
- RAG embedding pipelines: Embedding models (BGE, E5, all-MiniLM) are small (1–4GB). L4 can run 100+ concurrent embedding requests. Monthly cost on GCP 3yr CUD: ~$230/mo per L4 — cheap for a dedicated embedding endpoint.
- Multi-tenant inference with small models: 24 or 48GB VRAM can host multiple small model replicas simultaneously — e.g., 3× Llama 8B FP8 instances (3 × 8GB = 24GB) on one L4, serving 3 isolated tenants.
- Dev/staging environments: At $0.59–0.70/hr, L4 is affordable for development, CI testing, and staging environments running LLM workloads — without the cost of an H100.
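The $/M-tokens figure in the first bullet comes from simple arithmetic; a sketch, assuming the ~4,500 tok/s throughput cited above (actual throughput varies with batch size, context length, and serving stack):

```python
# $/M tokens for Llama 8B on an L40S at $0.59/hr, assuming ~4,500 tok/s.
price_per_hr = 0.59
tokens_per_sec = 4_500

tokens_per_hr = tokens_per_sec * 3_600  # 16.2M tokens/hr
print(f"${price_per_hr / (tokens_per_hr / 1e6):.3f} per million tokens")  # ~$0.036
```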
Workloads where L4 is NOT the right choice:
- Models whose BF16 weights exceed 24 GB (roughly 12B+ parameters): need an L40S (48GB), two L4s, or an H100. A single L4 won't fit them.
- High-throughput production inference (70B+): L4's 300 GB/s bandwidth is 11× slower than H100 (3.35 TB/s). For serious throughput, H100 or B200 are the correct choice.
- Model training at scale: L4's compute throughput (242 TOPS INT8) is too low for meaningful training runs. Use H100 for any training beyond small fine-tuning.
- Multi-GPU interconnect workloads: L4 is PCIe only — no NVSwitch, no NVLink. For tensor parallelism across GPUs, use H100 SXM nodes.
Compare L4 vs H100 vs B200 for your exact workload
Enter model size, requests per hour, and precision. Get exact monthly cost across L4, L40S, H100, and 50+ configurations.
Open GPU Cost Calculator →
L4 Spot vs On-Demand: When to Use Each
The choice between spot (interruptible) and on-demand L4 pricing depends entirely on your workload's tolerance for interruption:
| Factor | Spot (Vast.ai ~$0.42/hr) | On-Demand (FluidStack $0.59/hr) |
|---|---|---|
| Price | ~29% cheaper | Predictable, fixed |
| Availability | Variable — depends on host | Stable, guaranteed |
| Interruption risk | Yes — host can reclaim at any time | None |
| Best for | Batch jobs, dev/test, non-real-time | Production serving, real-time APIs |
| Checkpointing required? | Yes — checkpoint frequently | No |
| Monthly cost (24/7) | ~$303/mo | ~$426/mo |
For batch embedding, fine-tuning, and image generation jobs — spot is the right choice. At $0.42/hr, a 100-hour batch job costs $42 on Vast.ai spot vs $59 on-demand. For production inference APIs with SLAs, pay the premium: $0.59/hr on-demand is still exceptionally cheap for a dedicated inference GPU.
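The batch-job arithmetic above, as a one-line sketch:

```python
# 100-hour batch job: Vast.ai spot vs FluidStack on-demand, rates from above.
spot_hr, od_hr, hours = 0.42, 0.59, 100
print(f"Spot: ${spot_hr * hours:.0f}  On-demand: ${od_hr * hours:.0f}  "
      f"Spot saves {1 - spot_hr / od_hr:.0%}")  # $42 vs $59, ~29%
```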
Get L4 price alerts
We'll notify you when L4 prices drop, new providers list L4 capacity, or spot availability changes. Free, no credit card required.
Full L4 and L40S pricing — updated daily
GridStackHub tracks L4, L40S, H100, B200, and every major GPU across 32+ cloud providers. See the complete live table and set alerts when prices change.
View Full GPU Pricing Database →