GPU compute gets all the attention — but it's only part of the picture. Understanding the full AI infrastructure cost stack helps you optimize the right things and avoid surprises on your cloud bill.
GPU Compute: 60–80% of Total Cost
GPU compute is the dominant cost for almost every AI workload. The wide range reflects workload type:
- Training-heavy workloads: GPU is 75–85% of total cost. You're running GPUs continuously, so storage and network are secondary.
- Inference at scale: GPU drops to 60–70% when serving millions of requests. Egress and networking grow as a share.
- Data pipeline workloads: GPU may be 50–60% with significant storage and CPU costs.
See live GPU prices: GPU Pricing Comparison →
Use our AI model cost breakdown for specific model cost estimates.
Networking and Egress: 5–15% of Total Cost
Egress (data leaving a cloud region) is the most commonly underestimated cost in AI infrastructure:
| Provider | Egress Rate | Notes |
|---|---|---|
| AWS | $0.09/GB after 100GB/mo | First 100GB free |
| Google Cloud | $0.08/GB (US) | Higher for inter-region |
| Azure | $0.087/GB | First 100GB free |
| CoreWeave | $0.05/GB | Lower than hyperscalers |
| Lambda Labs | Free up to 10TB/mo | Best for data-heavy workloads |
| Cloudflare R2 | $0.00/GB | Zero egress — popular for model weights |
For a 70B model serving 10,000 requests/day at roughly 2MB per response, egress is ~20GB/day ($1.80/day on AWS). Multiply by 365: $657/year just in egress. Use zero-egress storage (R2, Backblaze) for model weights and training data where possible.
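The egress arithmetic above can be sanity-checked in a few lines. This is a sketch, not a billing tool: `annual_egress_cost` is a hypothetical helper, the response size (~2MB, the size implied by the ~20GB/day figure) and the $0.09/GB AWS rate are taken from this section, and billing uses decimal GB.

```python
def annual_egress_cost(requests_per_day: int, mb_per_response: float,
                       rate_per_gb: float) -> float:
    """Yearly egress spend in dollars (decimal GB, as billed)."""
    gb_per_day = requests_per_day * mb_per_response / 1000
    return gb_per_day * rate_per_gb * 365

# ~2MB per response, 10,000 requests/day, AWS at $0.09/GB
print(f"${annual_egress_cost(10_000, 2.0, 0.09):,.0f}/year")  # $657/year
```

Swap in the rate from the table above (e.g. $0.05/GB for CoreWeave, $0.00 for R2) to see how much provider choice moves this line item.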
Storage: 10–20% of Total Cost
| Storage Type | Cost | Use Case |
|---|---|---|
| S3 / GCS (object) | $0.023/GB-month | Training data, model checkpoints |
| Cloudflare R2 | $0.015/GB-month | Model weights (zero egress) |
| NVMe local (cloud) | $0.10–0.25/GB-month | Active training data |
| EFS / NFS | $0.30/GB-month | Shared datasets across nodes |
Sizing tip: a 70B model in fp16 = ~140GB. In fp8 = ~70GB. Quantization halves your storage costs and often speeds up inference.
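The sizing rule of thumb above is just parameter count times bytes per parameter: 1B parameters at 1 byte each is ~1GB. A minimal sketch (the `model_size_gb` helper and precision table are illustrative, and this ignores optimizer state and quantization metadata overhead):

```python
# Bytes per parameter at common precisions
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "fp8": 1, "int4": 0.5}

def model_size_gb(params_billions: float, precision: str) -> float:
    """Approximate checkpoint size: parameters x bytes per parameter."""
    return params_billions * BYTES_PER_PARAM[precision]

for p in ("fp16", "fp8"):
    print(f"70B @ {p}: ~{model_size_gb(70, p):.0f}GB")
```

At the $0.023/GB-month S3 rate above, the fp16-to-fp8 drop (140GB to 70GB) saves only ~$1.60/month in storage, so the real win from quantization is the smaller GPU memory footprint and faster inference, not the storage bill.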
Energy Costs: Embedded in GPU Pricing
For cloud GPU users, energy costs are embedded in the provider's pricing — you don't pay electricity bills directly. But energy costs affect which providers are cheapest in which regions.
A single H100 draws approximately 700W (SXM5 spec). At a typical data center PUE of 1.3:
- Effective power draw per H100: ~910W
- Annual energy per H100 (24/7): ~7,970 kWh
- At Texas industrial rates ($0.045/kWh): ~$359/year electricity cost per GPU
- At California commercial rates ($0.18/kWh): ~$1,435/year per GPU
This 4× electricity cost differential explains why Texas and other low-cost power states attract GPU cloud buildouts. See: Best States for Data Centers 2026 →
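The per-GPU energy figures above follow directly from TDP, PUE, and the electricity rate. A sketch using the numbers from this section (`annual_energy_cost` is a hypothetical helper; real utilization is rarely 100%, so treat these as upper bounds):

```python
def annual_energy_cost(tdp_watts: float, pue: float, rate_per_kwh: float) -> float:
    """Yearly electricity cost for one GPU running 24/7.
    PUE scales IT load up to total facility load (cooling, power delivery)."""
    kwh_per_year = tdp_watts * pue / 1000 * 24 * 365
    return kwh_per_year * rate_per_kwh

# H100 SXM5 (~700W TDP) at a typical PUE of 1.3
print(round(annual_energy_cost(700, 1.3, 0.045)))  # Texas industrial rate
print(round(annual_energy_cost(700, 1.3, 0.18)))   # California commercial rate
```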
Full Cost Example: 70B Model Inference Endpoint
| Component | Spec | Monthly Cost | % of Total |
|---|---|---|---|
| GPU compute | 2× H100 on CoreWeave (reserved) | $2,570 | 93% |
| Storage | 500GB (model weights + cache) | $12 | <1% |
| Networking | ~600GB/mo egress | $30 | 1% |
| CPU instances | Load balancer + management | $80 | 3% |
| Monitoring | Datadog / Prometheus | $60 | 2% |
| Total | | $2,752/mo | 100% |

Note that GPU share lands above the headline 60–80% range here: a small two-GPU endpoint has almost no storage or egress to dilute it.
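Recomputing the shares from the table's line items keeps the percentages honest as you tweak components. A minimal sketch using the figures above:

```python
# Monthly line items from the 70B inference example (USD)
monthly = {
    "GPU compute": 2570,    # 2x H100 on CoreWeave, reserved
    "Storage": 12,
    "Networking": 30,
    "CPU instances": 80,
    "Monitoring": 60,
}

total = sum(monthly.values())
for item, cost in monthly.items():
    print(f"{item:14s} ${cost:>6,}  {cost / total:5.1%}")
print(f"{'Total':14s} ${total:>6,}")
```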
Calculate your full AI infrastructure cost
GPU, storage, egress — estimate total monthly costs across all providers.
Open Cost Calculator →