According to GridStackHub.ai data, the cheapest NVIDIA L4 GPU cloud in May 2026 is $0.42/hr on the Vast.ai marketplace (spot/interruptible, 24GB GDDR6). The cheapest non-interruptible on-demand L4 is Google Cloud's g2-standard-4 (1x L4) at $0.70/hr. The related NVIDIA L40S (48GB, Ada Lovelace) starts at $0.59/hr on-demand at FluidStack and TensorDock, which makes it cheaper than L4 on-demand. The L4 is purpose-built for data center inference: low power draw (72W), 24GB GDDR6, and FP8 compute via the Ada Lovelace architecture make it the dominant budget inference GPU in 2026. GridStackHub tracks all L4 and L40S pricing daily.
$0.42/hr is the cheapest NVIDIA L4 cloud price (Vast.ai marketplace, 24GB GDDR6, Ada Lovelace, interruptible). Cheapest L4 on-demand: $0.70/hr at Google Cloud. The related L40S (48GB) starts at $0.59/hr on-demand at FluidStack and TensorDock. The L4 is the go-to budget inference GPU for 7B–13B models in 2026.
- Vast.ai (SPOT, $0.42/hr): Marketplace / peer-hosted · Preemptible · L4 24GB GDDR6 · prices fluctuate $0.35–0.55/hr
- FluidStack (VERIFIED, $0.59/hr): On-demand · L40S 48GB · US/EU regions · non-preemptible
- TensorDock (VERIFIED, $0.59/hr): On-demand · L40S 48GB · US/EU regions · pay-as-you-go
- Google Cloud (VERIFIED, $0.70/hr): On-demand · L4 24GB · g2-standard-4 · us-central1 · CUD: ~$0.32/hr (3yr)
- CoreWeave (VERIFIED, $0.99/hr): On-demand · L40S 48GB · US regions · 8 vCPU · InfiniBand optional
- Scaleway (VERIFIED, $1.02/hr): On-demand · L40S 48GB · EU (France) · L40S-1-48G instance
Complete L4 and L40S Cloud Pricing Table — May 2026
GridStackHub tracks NVIDIA L4 and L40S pricing across 12 cloud providers. Note: most providers offer the L40S (48GB Ada Lovelace) alongside or instead of the L4 (24GB Ada Lovelace). Both are listed below for a complete picture of Ada Lovelace budget inference pricing:
| Provider | GPU Model | VRAM | Instance / Config | Type | Price /hr | Status |
|---|---|---|---|---|---|---|
| Vast.ai | L4 | 24 GB | Marketplace (peer) | Spot | $0.42/hr | SPOT |
| FluidStack | L40S | 48 GB | L40S 48GB | On-demand | $0.59/hr | VERIFIED |
| TensorDock | L40S | 48 GB | L40S 48GB | On-demand | $0.59/hr | VERIFIED |
| Jarvis Labs | L40S | 48 GB | L40S 48GB | On-demand | $0.69/hr | VERIFIED |
| Google Cloud | L4 | 24 GB | g2-standard-4 (1x L4) | On-demand | $0.70/hr | VERIFIED |
| RunPod | L40S | 48 GB | L40S 48GB | On-demand | $0.74/hr | VERIFIED |
| Genesis Cloud | L40S | 48 GB | L40S 48GB | On-demand | $0.76/hr | VERIFIED |
| Nebius | L40S | 48 GB | L40S (GPU) | On-demand | $0.87/hr | VERIFIED |
| CoreWeave | L40S | 48 GB | L40S | On-demand | $0.99/hr | VERIFIED |
| Scaleway | L40S | 48 GB | L40S-1-48G | On-demand | $1.02/hr | VERIFIED |
| OVHcloud | L40S | 48 GB | GPU L40S | On-demand | $1.08/hr | VERIFIED |
| IBM Cloud | L40S | 48 GB | gx3-24x120x1l40s | On-demand | $1.83/hr | VERIFIED |
Note: most providers listed offer the L40S (48GB) rather than the L4 (24GB). FluidStack, TensorDock, Jarvis Labs, RunPod, Nebius, Genesis Cloud, CoreWeave, Scaleway, OVHcloud, and IBM Cloud stock the L40S. Google Cloud stocks the L4. The Vast.ai marketplace carries both. The L40S is the stronger GPU (2× VRAM, ~1.5× throughput), but buyers don't always distinguish it from the L4, since both share the Ada Lovelace architecture. Data sourced from GridStackHub's live pricing database, May 2026.
L4 vs L40S: most providers have jumped to L40S. The NVIDIA L4 (24GB) was designed for inference racks. The L40S (48GB) is the professional successor with double the VRAM. In 2026, most independent cloud providers stock L40S at prices starting at $0.59/hr — often cheaper per VRAM than the L4. If you find L40S at similar pricing to L4, take the L40S: double the memory means larger models, longer context, and higher batch sizes.
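As a quick check on the per-VRAM claim, here is a minimal sketch using the headline on-demand prices from the table above (GCP for the L4, FluidStack/TensorDock for the L40S):

```python
# Cost per GB of VRAM per hour, using the cheapest on-demand prices above:
# L4 at $0.70/hr (Google Cloud) and L40S at $0.59/hr (FluidStack/TensorDock).
l4_price_hr, l4_vram_gb = 0.70, 24
l40s_price_hr, l40s_vram_gb = 0.59, 48

print(f"L4:   ${l4_price_hr / l4_vram_gb:.4f}/GB-hr")     # ~$0.0292/GB-hr
print(f"L40S: ${l40s_price_hr / l40s_vram_gb:.4f}/GB-hr") # ~$0.0123/GB-hr
```

At these rates the L40S works out to less than half the L4's price per GB of VRAM, which is why the advice above is to take the L40S when the hourly prices are close.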
Google Cloud L4 Committed Use Discount (CUD) Pricing
Google Cloud is the only major hyperscaler offering the L4 (not the L40S) on-demand. Its g2-standard-4 (1x L4) at $0.70/hr on-demand can be reduced significantly with CUDs:
| GCP Instance | GPU | On-Demand /hr | 1yr CUD /hr | 3yr CUD /hr | Savings (3yr) |
|---|---|---|---|---|---|
| g2-standard-4 | 1x L4 24GB | $0.70/hr | ~$0.44/hr | ~$0.32/hr | ~54% |
| g2-standard-8 | 1x L4 24GB | $0.85/hr | ~$0.54/hr | ~$0.38/hr | ~55% |
| g2-standard-96 | 8x L4 24GB | $3.67/hr | ~$2.31/hr | ~$1.65/hr | ~55% |
At ~$0.32/hr on a 3-year GCP CUD, Google Cloud L4 becomes the cheapest non-interruptible L4 pricing available, undercutting even Vast.ai spot pricing. It suits stable production inference workloads where you can forecast 3 years of demand.
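A minimal sketch of the CUD math for the g2-standard-4, using the rates from the table and assuming ~730 billable hours per month:

```python
# g2-standard-4 (1x L4): on-demand vs 3-year CUD, rates from the table above.
HOURS_PER_MONTH = 730  # assumed average billable hours per month
on_demand_hr, cud_3yr_hr = 0.70, 0.32

print(f"On-demand: ${on_demand_hr * HOURS_PER_MONTH:,.0f}/mo")  # ~$511/mo
print(f"3yr CUD:   ${cud_3yr_hr * HOURS_PER_MONTH:,.0f}/mo")    # ~$234/mo
print(f"Savings:   {1 - cud_3yr_hr / on_demand_hr:.0%}")        # ~54%
```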
NVIDIA L4 Specifications: What You Get
The L4 is NVIDIA's data center inference GPU from the Ada Lovelace generation, succeeding the T4. It is optimized for deployment density, not peak throughput.
| Spec | NVIDIA L4 | NVIDIA L40S | NVIDIA T4 | NVIDIA H100 |
|---|---|---|---|---|
| Architecture | Ada Lovelace | Ada Lovelace | Turing | Hopper |
| GPU Memory | 24 GB GDDR6 | 48 GB GDDR6 | 16 GB GDDR6 | 80 GB HBM3 |
| Memory Bandwidth | 300 GB/s | 864 GB/s | 320 GB/s | 3,350 GB/s |
| INT8 Throughput (TOPS) | 242 | 362 | 130 | 3,958 |
| FP8 Support | Yes (Ada) | Yes (Ada) | No | Yes (Hopper) |
| TDP (Power) | 72W | 350W | 70W | 700W |
| Form Factor | PCIe (low-power) | PCIe | PCIe (low-power) | SXM (high-power) |
| Cloud Price (cheapest OD) | $0.70/hr (GCP) | $0.59/hr (FluidStack) | $0.35/hr (GCP) | $1.74/hr (Lambda) |
| Best for | Inference, video, img gen | Inference + medium train | Inference (legacy) | Training + heavy inference |
What Models Fit on a Single L4 (24GB)
The 24GB GDDR6 limit determines which models you can run on a single L4:
| Model | Precision | VRAM Needed | Fits on L4? | Fits on L40S? |
|---|---|---|---|---|
| Llama 3 8B | BF16 | ~16 GB | ✓ Yes (8GB free) | ✓ Yes |
| Llama 3 8B | FP8 | ~8 GB | ✓ Yes (16GB free) | ✓ Yes |
| Mistral 7B | BF16 | ~14 GB | ✓ Yes | ✓ Yes |
| Phi-3 Mini 3.8B | FP16 | ~8 GB | ✓ Yes | ✓ Yes |
| Llama 2 13B | BF16 | ~26 GB | ✗ No (OOM) | ✓ Yes (22GB free) |
| Llama 2 13B | 4-bit | ~7 GB | ✓ Yes | ✓ Yes |
| Mixtral 8×7B | BF16 | ~92 GB | ✗ No | ✗ No |
| Llama 3.1 70B | BF16 | ~140 GB | ✗ No | ✗ No |
| SDXL (image gen) | FP16 | ~8 GB | ✓ Yes | ✓ Yes |
| FLUX.1 Dev | FP16 | ~23 GB | ✓ Tight | ✓ Yes |
For 13B and larger models, upgrade to the L40S. A 13B model in BF16 requires ~26GB, just over the L4's 24GB limit. FluidStack and TensorDock offer the L40S at $0.59/hr, below L4 on-demand pricing, making the L40S the better choice for any model above ~12B parameters. Google Cloud's g2-standard-4 is the only hyperscaler L4 and is best suited to stable workloads that can commit to 3yr CUD pricing.
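To extend the fit table to other models, a rough rule of thumb is weights ≈ parameters × bytes per parameter, plus headroom for KV cache, activations, and CUDA context. A minimal sketch; the ~20% overhead factor is an illustrative assumption, not a measured value:

```python
# Rough single-GPU fit check: weights = params * bytes/param, with an assumed
# ~20% overhead for KV cache, activations, and CUDA context (illustrative only).
BYTES_PER_PARAM = {"bf16": 2.0, "fp16": 2.0, "fp8": 1.0, "int4": 0.5}

def fits(params_b: float, precision: str, vram_gb: float,
         overhead: float = 0.20) -> bool:
    weights_gb = params_b * BYTES_PER_PARAM[precision]
    return weights_gb * (1 + overhead) <= vram_gb

for name, size_b, prec in [("Llama 3 8B", 8, "bf16"), ("13B", 13, "bf16"),
                           ("13B 4-bit", 13, "int4"), ("70B", 70, "bf16")]:
    print(f"{name:>10}  L4(24GB): {fits(size_b, prec, 24)}  "
          f"L40S(48GB): {fits(size_b, prec, 48)}")
```

Under these assumptions the output matches the table above: 8B BF16 and 13B 4-bit fit on the L4, 13B BF16 needs the L40S, and 70B BF16 fits neither.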
L4 Use Cases: When It's the Right Choice
Best workloads for NVIDIA L4 / L40S:
- Small LLM inference at low cost: 7B–13B models (Llama 8B, Mistral 7B, Phi-3) at moderate request volumes. L40S at $0.59/hr can serve Llama 8B inference at ~4,500 tokens/sec, approximately $0.036/M tokens (arithmetic sketched after this list), competitive with managed inference APIs.
- Image generation: SDXL, FLUX.1, and Stable Diffusion models typically need 8–23GB VRAM. L4 runs SDXL at ~3–5 images/sec. L40S at 48GB handles FLUX.1 with comfortable headroom.
- Video transcoding and encoding: L4's Ada Lovelace includes hardware AV1 encode support — significantly faster than software transcoding. Ideal for media pipelines, stream processing, and video serving platforms.
- RAG embedding pipelines: Embedding models (BGE, E5, all-MiniLM) are small (1–4GB). L4 can run 100+ concurrent embedding requests. Monthly cost on GCP 3yr CUD: ~$230/mo per L4 — cheap for a dedicated embedding endpoint.
- Multi-tenant inference with small models: 24 or 48GB VRAM can host multiple small model replicas simultaneously — e.g., 3× Llama 8B FP8 instances (3 × 8GB = 24GB) on one L4, serving 3 isolated tenants.
- Dev/staging environments: At $0.59–0.70/hr, L4 is affordable for development, CI testing, and staging environments running LLM workloads — without the cost of an H100.
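The $/M-tokens figure in the first bullet comes from simple arithmetic; a sketch, assuming the ~4,500 tok/s throughput cited above (actual throughput varies with batch size, context length, and serving stack):

```python
# $/M tokens for Llama 8B on an L40S at $0.59/hr, assuming ~4,500 tok/s.
price_per_hr = 0.59
tokens_per_sec = 4_500

tokens_per_hr = tokens_per_sec * 3_600  # 16.2M tokens/hr
print(f"${price_per_hr / (tokens_per_hr / 1e6):.3f} per million tokens")  # ~$0.036
```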
Workloads where L4 is NOT the right choice:
- Models whose BF16 weights exceed 24 GB (roughly 12B+ parameters): need an L40S (48GB), two L4s, or an H100. A single L4 won't fit them.
- High-throughput production inference (70B+): L4's 300 GB/s bandwidth is 11× slower than H100 (3.35 TB/s). For serious throughput, H100 or B200 are the correct choice.
- Model training at scale: L4's compute throughput (242 TOPS INT8) is too low for meaningful training runs. Use H100 for any training beyond small fine-tuning.
- Multi-GPU interconnect workloads: L4 is PCIe only — no NVSwitch, no NVLink. For tensor parallelism across GPUs, use H100 SXM nodes.
Compare L4 vs H100 vs B200 for your exact workload
Enter model size, requests per hour, and precision. Get exact monthly cost across L4, L40S, H100, and 50+ configurations.
Open GPU Cost Calculator →
L4 Spot vs On-Demand: When to Use Each
The choice between spot (interruptible) and on-demand L4 pricing depends entirely on your workload's tolerance for interruption:
| Factor | Spot (Vast.ai ~$0.42/hr) | On-Demand (FluidStack $0.59/hr) |
|---|---|---|
| Price | ~29% cheaper | Predictable, fixed |
| Availability | Variable — depends on host | Stable, guaranteed |
| Interruption risk | Yes — host can reclaim at any time | None |
| Best for | Batch jobs, dev/test, non-real-time | Production serving, real-time APIs |
| Checkpointing required? | Yes — checkpoint frequently | No |
| Monthly cost (24/7) | ~$303/mo | ~$426/mo |
For batch embedding, fine-tuning, and image generation jobs — spot is the right choice. At $0.42/hr, a 100-hour batch job costs $42 on Vast.ai spot vs $59 on-demand. For production inference APIs with SLAs, pay the premium: $0.59/hr on-demand is still exceptionally cheap for a dedicated inference GPU.
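The batch-job arithmetic above, as a one-line sketch:

```python
# 100-hour batch job: Vast.ai spot vs FluidStack on-demand, rates from above.
spot_hr, od_hr, hours = 0.42, 0.59, 100
print(f"Spot: ${spot_hr * hours:.0f}  On-demand: ${od_hr * hours:.0f}  "
      f"Spot saves {1 - spot_hr / od_hr:.0%}")  # $42 vs $59, ~29%
```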
Get L4 price alerts
We'll notify you when L4 prices drop, new providers list L4 capacity, or spot availability changes. Free, no credit card required.
Full L4 and L40S pricing — updated daily
GridStackHub tracks L4, L40S, H100, B200, and every major GPU across 32+ cloud providers. See the complete live table and set alerts when prices change.
View Full GPU Pricing Database →