What AI models fit on a single AMD MI300X?

A single AMD MI300X with 192GB HBM3 can hold roughly 90B parameters of BF16 weights (2 bytes per parameter), or up to ~350B parameters in 4-bit quantization, before allowing for KV cache. This includes Llama 3 70B (~140GB at BF16, fits with room for KV cache), Falcon 180B (~90GB at 4-bit), and most 7B–34B instruction-tuned models with headroom to spare. Llama 3 405B is the exception: its 4-bit weights alone are ~202GB, so it needs a second GPU. By contrast, the H100 80GB requires multi-GPU tensor parallelism for anything above ~40B parameters at BF16, adding latency, cost, and operational complexity that a single MI300X avoids.

AMD MI300X Cloud Pricing 2026 — From $1.85/hr, 192GB HBM3


The AMD MI300X is AMD's CDNA 3 Instinct GPU aimed at AI inference workloads. Launched in late 2023 and ramping through 2024–2025, the MI300X packs 192GB of HBM3 memory onto a single card, 2.4× what the NVIDIA H100 SXM5 offers. This memory capacity is the MI300X's defining advantage: roughly 90 billion parameters of BF16 weights fit on a single MI300X, so a 70B model that requires multi-GPU tensor parallelism on an H100 runs on one card. For teams running Llama 3 70B at BF16, or Mixtral 8×22B and other 100–180B class models at 8-bit or 4-bit, a single MI300X can replace what would otherwise require two to four H100s, at a lower total hourly cost and with less operational complexity. This page tracks every live MI300X cloud rental available in 2026, from verified on-demand prices to provider availability. Whether you're evaluating MI300X vs H100 for your inference stack, calculating cost-per-token for a 405B model, or building multi-GPU clusters for a large MoE deployment, GridStackHub's MI300X pricing database has the data you need to make the call.

Live data — MI300X pricing updated daily from provider APIs

AMD MI300X Technical Specifications

AMD's Instinct MI300X is built on the 5nm CDNA 3 architecture with 304 compute units, 192GB of HBM3 memory, and 5.3 TB/s of memory bandwidth. Here is the full spec sheet:

Specification	AMD MI300X
Architecture	CDNA 3 (5nm)
Compute Units	304 CUs
FP8 Peak Throughput	2,610 TFLOPS
BF16 Peak Throughput	1,307 TFLOPS
FP32 Peak Throughput	163.4 TFLOPS
GPU Memory	192 GB HBM3
Memory Bandwidth	5.3 TB/s
Memory Interface	8,192-bit HBM3
TDP	750W (socket-configured)
Host Interface (CPU-GPU)	PCIe Gen 5 x16 (up to 128 GB/s bidirectional)
ROCm Support	ROCm 6.x — PyTorch, TensorFlow, JAX, vLLM
Best Single-GPU Fit	≤90B params at BF16; ≤350B params at 4-bit (weights only)
Multi-GPU Interconnect	AMD Infinity Fabric (GPU-to-GPU links between MI300X cards)
Min Cloud Price	$1.85/hr (Thunder Compute, verified)
Cost per GB VRAM	$0.0096/GB (55% cheaper than H100)

AMD MI300X: Best For — Workload Guide

The MI300X's 192GB of HBM3 and 5.3 TB/s bandwidth are not neutral advantages — they are decisive for specific workload types. Here is a direct guide to what the MI300X is built for:

Use Case	Why MI300X Wins	Example Workloads
Large-Model Inference (70B BF16)	Single-GPU fit; no tensor parallelism needed	Llama 3 70B at BF16; Mixtral 8×22B and Command R+ at 8-bit
Ultra-Large Models (100B+ quantized)	192GB holds a 180B model at 4-bit (~90GB) with room for KV cache; H100 needs 2+ GPUs	Falcon 180B (4-bit), Mixtral 8×22B (4-bit or 8-bit)
Long-Context Inference	5.3 TB/s bandwidth; more HBM left over for KV cache	100K+ token contexts, RAG pipelines
Memory-Bound Serving	2.4× the memory of H100; 58% higher bandwidth	Autoregressive decoding, embedding models
Multi-GPU MoE Inference	Fewer GPUs needed per MoE shard; cost scales better	Mixtral, DeepSeek-V2, DBRX
Lower $/GB Inference	$0.0096/GB vs H100's $0.0218/GB — 55% cheaper	Any memory-bound inference at scale
ROCm-Native Stacks	No CUDA lock-in; PyTorch/vLLM run natively on ROCm	New deployments, AMD-positive cloud providers

Decision rule: If your model fits in 80GB, H100 is often faster. If your model needs more than 80GB at your target precision, MI300X wins on cost, latency, and operational simplicity. Run the numbers in the GridStackHub calculator before committing.

$0.0096/GB

The AMD MI300X's VRAM cost at best available pricing — 55% cheaper per GB of GPU memory than H100 ($0.0218/GB). Same transformer attention math. More memory per dollar, fewer GPUs required for large models.
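The per-GB figure is simply the hourly rental price divided by memory capacity. The short sketch below reproduces the arithmetic from the prices quoted on this page; the function name is illustrative, and the result is a cost per GB per hour of rental, not a one-time cost. Depending on where you round, the saving comes out at roughly 55 to 56 percent.

```python
def cost_per_gb_hour(hourly_price_usd: float, memory_gb: float) -> float:
    """Hourly rental price divided by GPU memory capacity (USD per GB per hour)."""
    return hourly_price_usd / memory_gb


# Prices and capacities as quoted on this page (April 2026 minimums).
mi300x = cost_per_gb_hour(1.85, 192)
h100 = cost_per_gb_hour(1.74, 80)

# Fraction by which the MI300X is cheaper per GB than the H100.
saving = 1 - mi300x / h100

print(f"MI300X: ${mi300x:.4f} per GB-hour")
print(f"H100:   ${h100:.4f} per GB-hour")
print(f"MI300X is {saving:.1%} cheaper per GB")
```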

AMD MI300X Cloud Pricing — Live Table (April 2026)

Three cloud providers currently list AMD MI300X (192GB HBM3) capacity in GridStackHub's database. Availability is growing rapidly as AMD scales Instinct production following Meta's 6GW deployment announcement in February 2026.

Provider	MI300X 192GB	Pricing Type	Status	Notes
Thunder Compute	$1.85/hr	On-demand	VERIFIED	Cheapest available; bare-metal MI300X
CoreWeave	$2.50/hr	On-demand	ESTIMATE	HPC-grade networking; enterprise SLAs
RunPod	$3.49/hr	On-demand	VERIFIED	Flexible billing; serverless option available

Data sourced from GridStackHub's live pricing database, April 20, 2026. VERIFIED = confirmed via live provider API. ESTIMATE = sourced from public pricing pages; verify before committing. More providers being tracked as MI300X availability expands.
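To turn the hourly rates above into a monthly budget figure, multiply by hours of use and GPU count. This is a minimal sketch: the 730-hour month assumes a GPU running around the clock, and the function and variable names are illustrative rather than part of any GridStackHub API.

```python
HOURS_PER_MONTH = 730  # assumption: average month with the GPU running 24/7

# On-demand MI300X prices from the table above (USD per GPU-hour).
prices = {"Thunder Compute": 1.85, "CoreWeave": 2.50, "RunPod": 3.49}


def monthly_cost(hourly_usd: float, hours: float = HOURS_PER_MONTH, gpus: int = 1) -> float:
    """Total rental cost for a number of GPUs over a number of hours."""
    return hourly_usd * hours * gpus


# List providers from cheapest to most expensive.
for provider, price in sorted(prices.items(), key=lambda item: item[1]):
    print(f"{provider:16s} ${monthly_cost(price):,.2f}/month")
```

Pass a smaller `hours` value for part-time workloads; on-demand billing means idle hours cost nothing.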

MI300X availability is expanding fast. Following Meta's February 2026 announcement of 6 gigawatts of AMD Instinct GPU deployments, cloud providers are rushing to add MI300X capacity. GridStackHub tracks new providers as they list MI300X — set a price alert to be notified when availability opens at new providers.

AMD MI300X vs NVIDIA H100: Full Comparison

The MI300X and H100 are both frontier GPUs, but they are built for different jobs. Here is a complete side-by-side on the specs that matter for AI workloads:

Spec	AMD MI300X	NVIDIA H100 SXM5	Winner
GPU Memory	192 GB HBM3	80 GB HBM3	MI300X +2.4×
Memory Bandwidth	5.3 TB/s	3.35 TB/s	MI300X +58%
FP8 Throughput	2,610 TFLOPS (dense)	3,958 TFLOPS (with sparsity)	H100 +52% on listed peaks
BF16 Throughput	1,307 TFLOPS (dense)	1,979 TFLOPS (with sparsity)	H100 +51% on listed peaks
Min Cloud Price	$1.85/hr	$1.74/hr	H100 (marginally)
Cost per GB VRAM	$0.0096/GB	$0.0218/GB	MI300X −55%
Models > 130B params (single GPU)	Yes (FP8 or 4-bit; BF16 needs 2+ GPUs)	No at FP8 (requires multi-GPU)	MI300X
Tensor Parallelism Needed (>70B)	Often not required	Usually required	MI300X
Software Ecosystem	ROCm (maturing)	CUDA (dominant)	H100
Cloud Availability	Growing (3+ providers)	Broad (10+ providers)	H100
TDP (Power)	750W	700W	H100 (marginally)

The story in one sentence: H100 wins on raw compute throughput and software maturity; MI300X wins on memory capacity and memory bandwidth. Which GPU you choose depends entirely on whether your workload is compute-bound or memory-bound.

When MI300X Beats H100: Use Case Guide

The decision rule is straightforward: if your model fits in 80GB, start with H100. If it doesn't — or if memory bandwidth is your primary bottleneck — the MI300X is likely the better and cheaper choice.

Choose MI300X when:

You're serving models above 70B parameters at BF16 precision. Llama 3 70B requires ~140GB at BF16, fitting comfortably on a single MI300X. On H100, you'd need two GPUs with tensor parallelism, doubling cost and adding ~15–25% latency overhead from cross-GPU communication.
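You're running 180B-class models in 4-bit quantization. Falcon 180B in AWQ/GPTQ (~90GB of weights) fits on one MI300X with room for KV cache; on H100, you'd need two GPUs. Llama 3.1 405B at 4-bit (~202GB of weights) does not fit in 192GB, so it needs two MI300X, versus three or more H100s.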
Memory bandwidth is your inference bottleneck. For memory-bound workloads (autoregressive decoding, long-context inference), the MI300X's 5.3 TB/s bandwidth can deliver faster token generation than H100's 3.35 TB/s — even if raw TFLOPS favor H100.
You want the lowest cost-per-GB of working memory. At $0.0096/GB vs $0.0218/GB, MI300X is 55% cheaper per GB of GPU memory. For context-heavy inference workloads, this difference compounds significantly at scale.
You're building with ROCm-compatible frameworks. PyTorch, vLLM, and llama.cpp all have ROCm support. If your stack is already tested with ROCm, MI300X runs with minimal changes.
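The memory-bandwidth point in the list above can be made concrete with a rough roofline-style bound: at batch size 1, each generated token has to read every weight from HBM once, so decode speed cannot exceed bandwidth divided by model size. The sketch below is an upper bound under that assumption only; it ignores KV cache reads, kernel efficiency, and compute limits, so real throughput will be lower. The 34B model is a hypothetical example chosen because it fits on both GPUs at BF16.

```python
def max_decode_tokens_per_s(bandwidth_tb_s: float, weights_gb: float) -> float:
    """Upper bound on batch-1 decode speed if each token reads all weights once.

    Uses decimal units (1 TB = 1000 GB).
    """
    return bandwidth_tb_s * 1000 / weights_gb


# Hypothetical 34B-parameter model at BF16: 34e9 params * 2 bytes = 68 GB.
weights_gb = 34 * 2

mi300x_bound = max_decode_tokens_per_s(5.3, weights_gb)
h100_bound = max_decode_tokens_per_s(3.35, weights_gb)

print(f"MI300X bound: {mi300x_bound:.1f} tokens/s")
print(f"H100 bound:   {h100_bound:.1f} tokens/s")
print(f"Ratio:        {mi300x_bound / h100_bound:.2f}x")
```

The ratio between the two bounds equals the bandwidth ratio regardless of model size, which is where the 58% figure comes from.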

Choose H100 when:

You're training or fine-tuning models at scale. H100's higher FP8/BF16 throughput and more mature NVLink fabric make it the faster, more cost-effective choice for multi-GPU training runs.
Your stack is CUDA-dependent. If you're using CUDA-specific libraries, custom kernels, or NVIDIA-only tools (TensorRT, Triton, cuDNN), switching to MI300X adds migration overhead that may outweigh the cost advantage.
Your model fits in 80GB and is compute-bound. Smaller models that fit in 80GB of VRAM (roughly 35B parameters or fewer at BF16) run faster on H100 due to higher raw TFLOPS. The MI300X's memory advantage doesn't help workloads that aren't memory-constrained.
You need broad provider choice and guaranteed availability. H100 is available from 10+ providers with established SLAs. MI300X has fewer options today, though availability is growing fast.
Enterprise compliance or specific cloud integrations are required. AWS, GCP, and Azure offer H100 with enterprise SLAs, compliance certifications, and ecosystem integrations that MI300X providers don't yet match.

The real MI300X opportunity: Most teams running 70B+ models today are paying for 2–4× H100s with tensor parallelism and accepting the latency overhead. A single MI300X eliminates that complexity at a lower total cost. If you have a model above 70B parameters, run the numbers — it's often a slam dunk.
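One way to run those numbers is sketched below: work out how many GPUs are needed to hold the model, then multiply by the hourly price. The function names and the overhead parameter are assumptions for illustration. The GPU count is a plain ceiling division, whereas real tensor-parallel deployments often round up further (for example to 2, 4, or 8 GPUs).

```python
import math


def gpus_needed(weights_gb: float, gpu_memory_gb: float, overhead: float = 0.0) -> int:
    """Minimum GPU count to hold the weights plus a fractional overhead (e.g. KV cache)."""
    return math.ceil(weights_gb * (1 + overhead) / gpu_memory_gb)


def hourly_cost(weights_gb: float, gpu_memory_gb: float, price_per_gpu_hr: float,
                overhead: float = 0.0) -> float:
    """Hourly rental cost for the minimum number of GPUs."""
    return gpus_needed(weights_gb, gpu_memory_gb, overhead) * price_per_gpu_hr


# Llama 3 70B at BF16: about 140 GB of weights. Prices as quoted on this page.
for overhead in (0.0, 0.2):
    mi300x = hourly_cost(140, 192, 1.85, overhead)
    h100 = hourly_cost(140, 80, 1.74, overhead)
    print(f"overhead {overhead:.0%}: MI300X ${mi300x:.2f}/hr, H100 ${h100:.2f}/hr")
```

With weights only, the H100 setup needs two GPUs; adding a 20% allowance for KV cache pushes it to three, while the MI300X stays at one in both cases.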

Run the MI300X numbers for your workload

Compare MI300X against H100, A100, and 26+ other providers in the GridStackHub calculator.


What Models Fit on a Single AMD MI300X?

192GB of HBM3 is a significant amount of GPU memory. Here is a practical reference for what fits at various precision levels:

Model	Parameters	BF16 (2 bytes/param)	4-bit (0.5 bytes/param)	Fits on MI300X?
Llama 3.2 3B	3B	~6 GB	~1.5 GB	Yes — massive headroom
Llama 3.1 8B	8B	~16 GB	~4 GB	Yes — run multiple replicas
Llama 3.1 70B	70B	~140 GB	~35 GB	Yes — BF16 fits with KV cache room
Falcon 180B	180B	~360 GB (needs 2×)	~90 GB	4-bit: Yes — BF16 needs 2 GPUs
Llama 3.1 405B	405B	~810 GB (needs 5×)	~202 GB (needs 2×)	No; 4-bit weights alone exceed 192 GB
Mixtral 8×22B (MoE)	141B total (39B active)	~282 GB (needs 2×)	~71 GB	4-bit: Yes; BF16 needs 2 GPUs

*Memory estimates include model weights only. Production inference also requires KV cache (scales with batch size and context length). Estimate 20–40% additional memory overhead for KV cache in typical serving configurations.
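The table's estimates can be reproduced with two small formulas: weight memory is parameters times bytes per parameter, and KV cache grows linearly with context length and batch size. The sketch below is illustrative rather than a sizing tool. The function names are made up for this example, the 20% default overhead is the low end of the range quoted above, and the layer and head counts passed to `kv_cache_gb` are assumed values for a Llama 3 70B-style model with grouped-query attention.

```python
def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    """Weight memory in GB: 1e9 params * bytes per param / 1e9 bytes per GB."""
    return params_billion * bytes_per_param


def fits(params_billion: float, bytes_per_param: float,
         gpu_memory_gb: float = 192.0, kv_overhead: float = 0.2) -> bool:
    """True if weights plus a fractional KV-cache allowance fit in GPU memory."""
    return weights_gb(params_billion, bytes_per_param) * (1 + kv_overhead) <= gpu_memory_gb


def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                tokens: int, batch: int, bytes_per_value: int = 2) -> float:
    """KV cache size in GB: one key and one value vector per layer, head, and token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens * batch / 1e9


print(fits(70, 2.0))    # Llama 3.1 70B at BF16
print(fits(180, 0.5))   # Falcon 180B at 4-bit
print(fits(405, 0.5))   # Llama 3.1 405B at 4-bit
print(fits(141, 2.0))   # Mixtral 8x22B at BF16 (all experts must be resident)

# Assumed config: 80 layers, 8 KV heads, head dimension 128, BF16 cache.
print(f"{kv_cache_gb(80, 8, 128, tokens=8192, batch=8):.1f} GB of KV cache")
```

For a mixture-of-experts model, the total parameter count determines memory even though only a subset of experts is active per token.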

Frequently Asked Questions

Is AMD MI300X better than NVIDIA H100 for AI inference in 2026?

For models that require more than 80GB of GPU memory, such as Llama 3 70B at BF16 or Falcon 180B at 4-bit, the AMD MI300X is the clear winner: its 192GB HBM3 fits the entire model on a single GPU, eliminating the tensor parallelism overhead required by H100. Very large models such as Llama 3 405B still need more than one MI300X, but fewer GPUs than they would on H100. For smaller models that fit easily in 80GB (roughly 35B parameters or fewer at BF16), the H100 remains faster per dollar due to its higher compute throughput and more mature CUDA software ecosystem. The MI300X's primary advantage is memory capacity, not raw throughput.

Why is AMD MI300X cheaper per GB of VRAM than H100?

The AMD MI300X offers 192GB of HBM3 memory — 2.4× the capacity of the H100 SXM5 80GB — but currently prices at a lower cost-per-GB because AMD cloud availability is newer and faces less demand than NVIDIA's established H100 ecosystem. The best MI300X on-demand price is $1.85/hr ($0.0096/GB), versus H100's cheapest $1.74/hr ($0.0218/GB). That is a 55% lower cost per GB of GPU memory, making MI300X dramatically more efficient for memory-bound workloads like large-model inference. This gap may close as demand grows, making 2026 a window to lock in favorable MI300X pricing.

Which cloud providers offer AMD MI300X rental in 2026?

As of April 2026, GridStackHub tracks AMD MI300X pricing from three providers: Thunder Compute ($1.85/hr, verified on-demand), CoreWeave ($2.50/hr, estimated), and RunPod ($3.49/hr, verified on-demand). Several additional providers are expected to add MI300X capacity throughout 2026 as AMD ramps Instinct GPU production following Meta's 6GW MI300X deployment announcement in February 2026. Set a price alert on GridStackHub to be notified when new providers list MI300X capacity or prices drop.

Does AMD ROCm support the same frameworks as NVIDIA CUDA?

ROCm has improved dramatically. PyTorch, TensorFlow, JAX, and vLLM all have official ROCm support as of 2026. The HuggingFace transformers library runs on ROCm without code changes. llama.cpp has MI300X-optimized builds. However, custom CUDA kernels, TensorRT, and code that calls NVIDIA-specific libraries directly (cuDNN, NCCL) require porting effort, typically to ROCm counterparts such as MIOpen and RCCL. For standard inference serving with PyTorch or vLLM, the migration to MI300X is low-friction. For teams with custom CUDA training code or specialized NVIDIA tooling, the software migration cost should be factored into the ROI calculation.

Track MI300X Prices and Get Alerts

MI300X availability and pricing are moving fast in 2026. As more providers add capacity and AMD scales production, prices will shift. GridStackHub tracks every change — here is how to stay ahead of it:

Get MI300X price alerts

We'll notify you when MI300X prices drop, new providers list capacity, or a better deal appears. Free — no credit card required.

Compare MI300X vs H100 vs A100 for your workload

Set your model size, hours per month, and precision — see exact monthly cost for every provider in 60 seconds.
