Live data — L4 pricing updated daily from provider APIs

According to GridStackHub.ai data, the cheapest L4-class (Ada Lovelace) GPU cloud price in May 2026 is $0.42/hr on the Vast.ai marketplace (spot/interruptible, L40S 48GB GDDR6). For non-interruptible on-demand instances, the cheapest providers are FluidStack and TensorDock at $0.59/hr (L40S 48GB). The cheapest true L4 (24GB) is Google Cloud's g2-standard-4 (1x L4) at $0.70/hr on-demand, sometimes only pennies more than L40S on-demand elsewhere. The L4 is purpose-built for data center inference: low power draw (72W), 24GB GDDR6, and FP8 compute via the Ada Lovelace architecture make it the dominant budget inference GPU in 2026. GridStackHub tracks all L4 and L40S pricing daily.

$0.42/hr spot

Cheapest L4-class (Ada Lovelace) cloud price: Vast.ai marketplace, L40S 48GB GDDR6, interruptible. On-demand from $0.59/hr at FluidStack and TensorDock (L40S). Google Cloud L4 on-demand: $0.70/hr. The L4 remains the go-to budget inference GPU for 7B–13B models in 2026.

Vast.ai SPOT

$0.42/hr

Marketplace / peer-hosted
Preemptible · L40S 48GB GDDR6
Prices fluctuate $0.35–0.55/hr

FluidStack VERIFIED

$0.59/hr

On-demand · L40S 48GB
US/EU regions
Non-preemptible

TensorDock VERIFIED

$0.59/hr

On-demand · L40S 48GB
US/EU regions
Pay-as-you-go

Google Cloud VERIFIED

$0.70/hr

On-demand · L4 24GB
g2-standard-4 · us-central1
CUD: ~$0.32/hr (3yr)

CoreWeave VERIFIED

$0.99/hr

On-demand · L40S 48GB
US regions · 8 vCPU
InfiniBand optional

Scaleway VERIFIED

$1.02/hr

On-demand · L40S 48GB
EU (France)
L40S-1-48G instance

Complete L4 and L40S Cloud Pricing Table — May 2026

GridStackHub tracks NVIDIA L4 and L40S pricing across 12 cloud providers. Note: many providers offer the L40S (48GB Ada Lovelace) alongside or instead of the L4 (24GB Ada Lovelace). Both are listed below for a complete picture of Ada Lovelace budget inference pricing:

| Provider | GPU Model | VRAM | Instance / Config | Type | Price /hr | Status |
|---|---|---|---|---|---|---|
| Vast.ai | L40S | 48 GB | Marketplace (peer) | Spot | $0.42/hr | SPOT |
| FluidStack | L40S | 48 GB | L40S 48GB | On-demand | $0.59/hr | VERIFIED |
| TensorDock | L40S | 48 GB | L40S 48GB | On-demand | $0.59/hr | VERIFIED |
| Jarvis Labs | L40S | 48 GB | L40S 48GB | On-demand | $0.69/hr | VERIFIED |
| Google Cloud | L4 | 24 GB | g2-standard-4 (1x L4) | On-demand | $0.70/hr | VERIFIED |
| RunPod | L40S | 48 GB | L40S 48GB | On-demand | $0.74/hr | VERIFIED |
| Genesis Cloud | L40S | 48 GB | L40S 48GB | On-demand | $0.76/hr | VERIFIED |
| Nebius | L40S | 48 GB | L40S (GPU) | On-demand | $0.87/hr | VERIFIED |
| CoreWeave | L40S | 48 GB | L40S | On-demand | $0.99/hr | VERIFIED |
| Scaleway | L40S | 48 GB | L40S-1-48G | On-demand | $1.02/hr | VERIFIED |
| OVHcloud | L40S | 48 GB | GPU L40S | On-demand | $1.08/hr | VERIFIED |
| IBM Cloud | L40S | 48 GB | gx3-24x120x1l40s | On-demand | $1.83/hr | VERIFIED |

Note: Most providers listed offer the L40S (48GB) rather than the L4 (24GB). FluidStack, TensorDock, Jarvis Labs, RunPod, Genesis Cloud, Nebius, CoreWeave, Scaleway, OVHcloud, and IBM Cloud stock the L40S; Google Cloud stocks the L4; the Vast.ai marketplace carries both. The L40S is the stronger GPU (2× the VRAM, ~1.5× the INT8 throughput), but buyers don't always distinguish it from the L4, since both are Ada Lovelace architecture. Data sourced from GridStackHub's live pricing database, May 2026.

L4 vs L40S: most providers have jumped to the L40S. The NVIDIA L4 (24GB) was designed for inference racks; the L40S (48GB) is the professional successor with double the VRAM. In 2026, most independent cloud providers stock the L40S at prices starting at $0.59/hr, often cheaper per gigabyte of VRAM than the L4. If you find the L40S at similar pricing to the L4, take the L40S: double the memory means larger models, longer context, and higher batch sizes.

Google Cloud L4 Committed Use Discount (CUD) Pricing

Google Cloud is the only major hyperscaler offering the L4 (not the L40S) on-demand. The g2-standard-4 (1x L4) at $0.70/hr on-demand can be reduced significantly with CUDs:

| GCP Instance | GPU | On-Demand /hr | 1yr CUD /hr | 3yr CUD /hr | Savings (3yr) |
|---|---|---|---|---|---|
| g2-standard-4 | 1x L4 24GB | $0.70/hr | ~$0.44/hr | ~$0.32/hr | ~54% |
| g2-standard-8 | 1x L4 24GB | $0.85/hr | ~$0.54/hr | ~$0.38/hr | ~55% |
| g2-standard-96 | 8x L4 24GB | $3.67/hr | ~$2.31/hr | ~$1.65/hr | ~55% |

At ~$0.32/hr on a 3-year GCP CUD, Google Cloud's L4 becomes the cheapest non-interruptible L4 pricing available, competitive even with Vast.ai spot pricing. It suits stable production inference workloads where you can forecast three years of demand.
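To sanity-check whether a CUD pencils out, compare committed spend against on-demand at your expected utilization: a CUD bills every hour of the term, while on-demand bills only the hours you use. A minimal sketch using the rates from the table above (the utilization values are placeholders; break-even sits near 46%, i.e. $0.32 / $0.70):

```python
# Rough CUD vs on-demand comparison for a GCP g2-standard-4 (1x L4).
# Rates come from the table above; adjust for your region and quote.
ON_DEMAND = 0.70        # $/hr, on-demand
CUD_3YR = 0.32          # $/hr, billed 24/7 for the full 3-year term
HOURS_PER_MONTH = 730

def monthly_cost(utilization: float) -> tuple[float, float]:
    """On-demand cost scales with hours used; a CUD bills every hour."""
    on_demand = ON_DEMAND * HOURS_PER_MONTH * utilization
    cud = CUD_3YR * HOURS_PER_MONTH   # committed: paid whether used or not
    return on_demand, cud

for util in (0.25, 0.46, 0.75, 1.00):
    od, cud = monthly_cost(util)
    winner = "3yr CUD" if cud < od else "on-demand"
    print(f"{util:4.0%} utilization: on-demand ${od:6.2f}/mo "
          f"vs CUD ${cud:6.2f}/mo -> {winner}")
```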

NVIDIA L4 Specifications: What You Get

The L4 is NVIDIA's data center inference GPU from the Ada Lovelace generation, succeeding the T4. It is optimized for deployment density, not peak throughput.

| Spec | NVIDIA L4 | NVIDIA L40S | NVIDIA T4 | NVIDIA H100 |
|---|---|---|---|---|
| Architecture | Ada Lovelace | Ada Lovelace | Turing | Hopper |
| GPU Memory | 24 GB GDDR6 | 48 GB GDDR6 | 16 GB GDDR6 | 80 GB HBM3 |
| Memory Bandwidth | 300 GB/s | 864 GB/s | 320 GB/s | 3,350 GB/s |
| INT8 Throughput (TOPS) | 242 | 362 | 130 | 3,958 |
| FP8 Support | Yes (Ada) | Yes (Ada) | No | Yes (Hopper) |
| TDP (Power) | 72W | 350W | 70W | 700W |
| Form Factor | PCIe (low-power) | PCIe | PCIe (low-power) | SXM (high-power) |
| Cloud Price (cheapest OD) | $0.70/hr (GCP) | $0.59/hr (FluidStack) | $0.35/hr (GCP) | $1.74/hr (Lambda) |
| Best for | Inference, video, img gen | Inference + medium training | Inference (legacy) | Training + heavy inference |
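One spec in this table dominates LLM serving latency: memory bandwidth. Single-stream decode must read the full model weights for every generated token, so bandwidth divided by weight bytes gives an optimistic per-stream ceiling. A rough sketch using the bandwidth figures above (it ignores KV-cache traffic and kernel overhead, so real throughput lands lower, and batching lifts aggregate throughput well past these per-stream numbers):

```python
# Bandwidth-bound ceiling on per-stream decode speed:
#   tokens/sec <= memory_bandwidth / bytes_of_weights_read_per_token
# Optimistic upper bound: ignores KV-cache reads and overheads.
GPUS = {"L4": 300e9, "L40S": 864e9, "T4": 320e9, "H100": 3350e9}  # bytes/sec

def decode_ceiling(bandwidth: float, params_b: float, bytes_per_param: float) -> float:
    """Upper bound on tokens/sec for one stream streaming all weights per token."""
    return bandwidth / (params_b * 1e9 * bytes_per_param)

for gpu, bw in GPUS.items():
    print(f"{gpu:>5}: Llama 8B ceiling ~{decode_ceiling(bw, 8, 2):6.1f} tok/s (BF16), "
          f"~{decode_ceiling(bw, 8, 1):6.1f} tok/s (FP8)")
```

This is also why FP8 matters on Ada parts: halving bytes per parameter roughly doubles the bandwidth-bound decode ceiling.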

What Models Fit on a Single L4 (24GB)

The 24GB GDDR6 limit determines which models you can run on a single L4:

| Model | Precision | VRAM Needed | Fits on L4? | Fits on L40S? |
|---|---|---|---|---|
| Llama 3 8B | BF16 | ~16 GB | ✓ Yes (8GB free) | ✓ Yes |
| Llama 3 8B | FP8 | ~8 GB | ✓ Yes (16GB free) | ✓ Yes |
| Mistral 7B | BF16 | ~14 GB | ✓ Yes | ✓ Yes |
| Phi-3 Mini 3.8B | FP16 | ~8 GB | ✓ Yes | ✓ Yes |
| Llama 2 13B | BF16 | ~26 GB | ✗ No (2GB over) | ✓ Yes (22GB free) |
| Llama 2 13B | 4-bit | ~7 GB | ✓ Yes | ✓ Yes |
| Mixtral 8×7B | BF16 | ~92 GB | ✗ No | ✗ No |
| Llama 3.1 70B | BF16 | ~140 GB | ✗ No | ✗ No |
| SDXL (image gen) | FP16 | ~8 GB | ✓ Yes | ✓ Yes |
| FLUX.1 Dev | FP16 | ~23 GB | ✓ Tight | ✓ Yes |

For 13B and larger models, upgrade to the L40S. A 13B model in BF16 needs ~26GB for weights alone, just over the L4's 24GB limit. FluidStack and TensorDock offer the L40S at $0.59/hr, the same price as many L4 offerings, making the L40S the better choice for any model above ~12B parameters. Google Cloud's g2-standard-4 (L4) is the only hyperscaler L4 option, best suited to stable batch workloads priced on a 3yr CUD.
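The VRAM column in the table above follows a simple rule of thumb: weights cost params × bytes-per-param, plus headroom for KV cache and runtime overhead. A quick estimator sketch (the 1.2× overhead factor is an illustrative assumption, not a measured constant; actual headroom depends on context length, batch size, and serving stack):

```python
# Rule-of-thumb VRAM check: weights = params * bytes/param, plus
# headroom for KV cache, activations, and allocator overhead.
BYTES_PER_PARAM = {"bf16": 2.0, "fp16": 2.0, "fp8": 1.0, "int4": 0.5}
OVERHEAD = 1.2  # assumed ~20% serving overhead; tune for your stack

def fits(params_b: float, precision: str, vram_gb: float) -> bool:
    weights_gb = params_b * BYTES_PER_PARAM[precision]
    return weights_gb * OVERHEAD <= vram_gb

for model, params, prec in [("Llama 3 8B", 8, "bf16"),
                            ("Llama 2 13B", 13, "bf16"),
                            ("Llama 2 13B", 13, "int4"),
                            ("Llama 3.1 70B", 70, "bf16")]:
    l4 = "L4 ok" if fits(params, prec, 24) else "L4 no"
    l40s = "L40S ok" if fits(params, prec, 48) else "L40S no"
    print(f"{model} @ {prec}: {l4}, {l40s}")
```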

L4 Use Cases: When It's the Right Choice

Best workloads for NVIDIA L4 / L40S:

  • Small LLM inference at low cost: 7B–13B models (Llama 8B, Mistral 7B, Phi-3) at moderate request volumes. An L40S at $0.59/hr can serve Llama 8B at ~4,500 tokens/sec batched, roughly $0.036/M tokens, competitive with managed inference APIs (see the cost sketch after this list).
  • Image generation: SDXL, FLUX.1, and Stable Diffusion models typically need 8–23GB VRAM. The L4 generates an SDXL image in roughly 3–5 seconds; the L40S at 48GB handles FLUX.1 with comfortable headroom.
  • Video transcoding and encoding: L4's Ada Lovelace includes hardware AV1 encode support — significantly faster than software transcoding. Ideal for media pipelines, stream processing, and video serving platforms.
  • RAG embedding pipelines: Embedding models (BGE, E5, all-MiniLM) are small (1–4GB). L4 can run 100+ concurrent embedding requests. Monthly cost on GCP 3yr CUD: ~$230/mo per L4 — cheap for a dedicated embedding endpoint.
  • Multi-tenant inference with small models: 24 or 48GB VRAM can host multiple small model replicas simultaneously — e.g., 3× Llama 8B FP8 instances (3 × 8GB = 24GB) on one L4, serving 3 isolated tenants.
  • Dev/staging environments: At $0.59–0.70/hr, L4 is affordable for development, CI testing, and staging environments running LLM workloads — without the cost of an H100.
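To compare renting a GPU against per-token API pricing, convert the hourly rate into dollars per million tokens at your sustained throughput. A small sketch using the figures from the first bullet (the 4,500 tok/s number is that bullet's batched-serving estimate, not a guarantee; your throughput varies with model, batch size, and context length):

```python
# Convert $/hr GPU rent into $/M tokens at a sustained throughput.
def dollars_per_million_tokens(price_per_hr: float, tokens_per_sec: float) -> float:
    tokens_per_hr = tokens_per_sec * 3600
    return price_per_hr / tokens_per_hr * 1e6

# L40S at $0.59/hr serving Llama 8B at ~4,500 tok/s (batched, from above):
print(f"${dollars_per_million_tokens(0.59, 4500):.3f}/M tokens")  # ~$0.036

# Same GPU at half that throughput:
print(f"${dollars_per_million_tokens(0.59, 2250):.3f}/M tokens")  # ~$0.073
```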

Workloads where L4 is NOT the right choice:

  • Models whose BF16 weights exceed 24GB (roughly 12B+ parameters): need an L40S (48GB), two L4s, or an H100. A single L4 won't fit them.
  • High-throughput production inference (70B+): L4's 300 GB/s bandwidth is 11× slower than H100 (3.35 TB/s). For serious throughput, H100 or B200 are the correct choice.
  • Model training at scale: L4's compute throughput (242 TOPS INT8) is too low for meaningful training runs. Use H100 for any training beyond small fine-tuning.
  • Multi-GPU interconnect workloads: L4 is PCIe only — no NVSwitch, no NVLink. For tensor parallelism across GPUs, use H100 SXM nodes.

Compare L4 vs H100 vs B200 for your exact workload

Enter model size, requests per hour, and precision. Get exact monthly cost across L4, L40S, H100, and 50+ configurations.

Open GPU Cost Calculator →
GPU Spot Pricing Guide → | B200 vs H100 Cost → | Reserved Pricing Guide →

L4 Spot vs On-Demand: When to Use Each

The choice between spot (interruptible) and on-demand L4 pricing depends entirely on your workload's tolerance for interruption:

| Factor | Spot (Vast.ai ~$0.42/hr) | On-Demand (FluidStack $0.59/hr) |
|---|---|---|
| Price | ~29% cheaper | Predictable, fixed |
| Availability | Variable, depends on host | Stable, guaranteed |
| Interruption risk | Yes, host can reclaim at any time | None |
| Best for | Batch jobs, dev/test, non-real-time | Production serving, real-time APIs |
| Checkpointing required? | Yes, checkpoint frequently | No |
| Monthly cost (24/7) | ~$303/mo | ~$426/mo |

For batch embedding, fine-tuning, and image generation jobs, spot is the right choice: at $0.42/hr, a 100-hour batch job costs $42 on Vast.ai spot versus $59 on-demand. For production inference APIs with SLAs, pay the premium; $0.59/hr on-demand is still exceptionally cheap for a dedicated inference GPU.
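If you do run batch jobs on spot, design for interruption from the start: checkpoint progress on a timer and resume from the last checkpoint when the instance comes back. A minimal resumable-loop sketch (the checkpoint path, interval, and JSON format are illustrative choices, not provider requirements; swap in your framework's save/load):

```python
import json, os, time

CKPT = "checkpoint.json"   # illustrative path; use durable storage in practice
SAVE_EVERY = 300           # seconds between checkpoints (assumed interval)

def load_progress() -> int:
    """Resume from the last completed item, or start fresh."""
    if os.path.exists(CKPT):
        with open(CKPT) as f:
            return json.load(f)["next_item"]
    return 0

def save_progress(next_item: int) -> None:
    tmp = CKPT + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_item": next_item}, f)
    os.replace(tmp, CKPT)  # atomic swap so a kill mid-write can't corrupt

def process(item: int) -> None:
    ...                    # your batch work: embed, generate, transcode

def run(total_items: int) -> None:
    start = load_progress()
    last_save = time.monotonic()
    for i in range(start, total_items):
        process(i)
        if time.monotonic() - last_save >= SAVE_EVERY:
            save_progress(i + 1)
            last_save = time.monotonic()
    save_progress(total_items)
```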

Frequently Asked Questions

What is the cheapest NVIDIA L4 GPU cloud provider in 2026?
According to GridStackHub.ai data, the cheapest NVIDIA L4/L40S GPU in May 2026 is $0.42/hr on Vast.ai marketplace (spot/interruptible, L40S 48GB). For on-demand (non-interruptible) instances, the cheapest are FluidStack and TensorDock at $0.59/hr (L40S 48GB). Google Cloud g2-standard-4 (L4 24GB) is $0.70/hr on-demand, or approximately $0.32/hr with a 3-year Committed Use Discount. CoreWeave L40S is $0.99/hr. Scaleway L40S is $1.02/hr. IBM Cloud L40S is $1.83/hr. Note: most independent GPU clouds stock L40S (48GB Ada Lovelace) rather than L4 (24GB). GridStackHub tracks all L4 and L40S pricing daily.
What workloads is the NVIDIA L4 best for?
NVIDIA L4 (24GB GDDR6, Ada Lovelace) is best suited for: small to medium LLM inference (7B–13B models at FP16/FP8), video transcoding and encoding (AV1 hardware support), image generation (Stable Diffusion, FLUX.1 at standard resolutions), RAG embedding pipelines, and batch inference for smaller models at low cost. The L4 is NOT the right choice for: models larger than ~24GB in BF16 (L40S or H100 needed), high-throughput batch inference (H100/B200 are dramatically faster), or any serious training workload. L4's key advantages are 72W TDP (extremely low power, enables dense deployment) and cost — at $0.59–0.70/hr, it's the most affordable non-T4 inference GPU in any cloud.
Is L4 better than T4 for LLM inference in 2026?
Yes, L4 is significantly better than T4 for LLM inference in 2026. L4 has 24GB GDDR6 versus T4's 16GB — enabling larger models (Llama 8B BF16 at 16GB fits on L4 but not T4 without quantization). L4's Ada Lovelace architecture includes FP8 compute support and 242 TOPS INT8 versus T4's 130 TOPS — approximately 1.9× more throughput. L4 also has hardware AV1 encoding for multimodal workloads. Price-wise, T4 on-demand (AWS g4dn.xlarge) is ~$0.53/hr; L4 on-demand is ~$0.70/hr (GCP) or $0.59/hr on L40S equivalents — a modest premium for significantly more capability. For any new inference deployment in 2026, start with L4/L40S. T4 is only worth it if you're on existing T4 infrastructure and the migration cost isn't justified.
What LLM models run on a single NVIDIA L4 (24GB)?
NVIDIA L4 with 24GB GDDR6 can run: Llama 3 8B in BF16 (16GB weights, leaving 8GB for KV cache), Llama 3 8B in FP8 (8GB weights, excellent headroom), Mistral 7B in BF16 (14GB, comfortable), Llama 3.1 8B Instruct in FP16 (16GB, fits), Phi-3 Mini 3.8B (8GB, fits easily), and most models up to 13B parameters in 4-bit quantization (~7GB). Models that do NOT fit: Llama 3.1 70B (requires ~140GB VRAM) and Llama 2 13B in BF16 (~26GB, too big for the L4). For 13B models, use the L40S (48GB) instead; FluidStack and TensorDock offer the L40S at $0.59/hr, the same as many L4 listings.
What is the L4 spot pricing on Vast.ai in 2026?
According to GridStackHub.ai data, L40S spot/marketplace pricing on Vast.ai in May 2026 is approximately $0.42/hr (typical; ranges $0.35–0.55/hr depending on host availability and time of day). Vast.ai is a peer-to-peer GPU marketplace — hosts set their own prices and can reclaim resources if they need them (spot/interruptible behavior). For batch jobs, fine-tuning, and image generation pipelines that can tolerate occasional interruptions, Vast.ai is the cheapest L4/L40S option by a significant margin. For production inference APIs that require guaranteed uptime, use FluidStack or TensorDock at $0.59/hr on-demand.
L4 vs L40S: which should I use for inference in 2026?
L4 (24GB GDDR6, $0.70/hr on GCP on-demand) vs L40S (48GB GDDR6, $0.59/hr at FluidStack/TensorDock on-demand): in 2026, L40S is often the better value. It costs less per hour at most independent cloud providers, has double the VRAM, 1.5× higher INT8 throughput (362 vs 242 TOPS), and supports larger models. The only advantage of true L4 is its 72W TDP versus L40S's 350W — relevant for on-premise deployment density, not for cloud pricing. For cloud inference, default to L40S at $0.59/hr over L4 at $0.70/hr. Use L4 specifically if you need Google Cloud's CUD pricing (down to ~$0.32/hr on 3yr) and your model fits in 24GB.

Full L4 and L40S pricing — updated daily

GridStackHub tracks L4, L40S, H100, B200, and every major GPU across 32+ cloud providers. See the complete live table and set alerts when prices change.

View Full GPU Pricing Database →
Cheapest A100 Cloud 2026 → | Cheapest B200 Cloud → | B200 vs H100 Cost →