LLM Hardware Guide
Build the perfect PC for running language models
- VRAM is the primary bottleneck, so GPU selection is the most important decision
- Formula: VRAM (GB) ≈ Parameters (B) × 0.5 for Q4 quantization
- RTX 4070 Ti Super 16GB is the sweet spot for most users
- RTX 4090 24GB is needed to run 70B models at good speeds (with aggressive quantization or partial offload)
- 32GB of system RAM is the recommended minimum; 64GB for 70B models
VRAM Requirements
GPU VRAM is the primary bottleneck for running LLMs. Understanding VRAM requirements helps you choose the right hardware.
How to Calculate VRAM Needs
Rough formula: VRAM (GB) ≈ Parameters (B) × 0.5 for Q4 quantization, × 1.0 for Q8, × 2.0 for FP16. This covers weights only; context adds more (see below). Example: Llama 3.1 70B needs ~35GB for Q4, ~70GB for Q8.
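A minimal sketch of this rule of thumb in Python (the GB-per-billion-parameter factors are the approximations above, and the estimate covers weights only, not context or runtime overhead):

# Approximate GB of VRAM per billion parameters at each precision (rule of thumb, not exact).
GB_PER_BILLION_PARAMS = {"q4": 0.5, "q8": 1.0, "fp16": 2.0}

def estimate_weight_vram_gb(params_billion: float, quant: str = "q4") -> float:
    """Rough VRAM needed just to hold the model weights, in GB."""
    return params_billion * GB_PER_BILLION_PARAMS[quant]

print(estimate_weight_vram_gb(70, "q4"))   # ~35 GB
print(estimate_weight_vram_gb(70, "q8"))   # ~70 GB
print(estimate_weight_vram_gb(8, "q4"))    # ~4 GB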
Model Size to VRAM Mapping
- 7-8B models: 4-6GB (Q4)
- 13B models: 8-10GB (Q4)
- 32-34B models: 16-20GB (Q4)
- 70B models: 35-40GB (Q4)
- 405B models: 200GB+ (requires multiple GPUs)
Context Length Impact
Longer context windows require more VRAM for the KV cache. At 8K context, expect roughly 1GB of overhead per 10B parameters; 32K adds ~4GB; 128K context can double total VRAM needs.
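A rough way to fold context into the same estimate, using the per-10B-parameter figure quoted above (real KV-cache size depends heavily on the model's attention layout; grouped-query attention shrinks it considerably, so treat this as a ballpark that overshoots at very long contexts):

# Ballpark KV-cache overhead: ~1 GB per 10B parameters at 8K context,
# scaled linearly with context length (a crude assumption).
def context_overhead_gb(params_billion: float, context_tokens: int) -> float:
    return (params_billion / 10.0) * (context_tokens / 8192)

# Llama 3.1 70B at Q4 with a 32K context window:
total_gb = 70 * 0.5 + context_overhead_gb(70, 32768)   # ~35 GB weights + ~28 GB cache
print(total_gb)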
GPU Selection
NVIDIA dominates for LLMs thanks to its CUDA ecosystem. AMD is catching up with ROCm.
Budget Tier ($250-400)
- Intel Arc B580 12GB ($249): best value at 12GB
- RTX 3060 12GB ($270-350): proven, great CUDA support
- RX 7600 8GB ($250): gaming focus, limited AI support
Mid Tier ($400-900)
- RTX 4060 Ti 16GB ($450): cheapest 16GB NVIDIA card
- RTX 4070 Super 12GB ($600): fast but limited by 12GB
- RTX 4070 Ti Super 16GB ($800): sweet spot for 32B models
High End ($900-2000)
- RX 7900 XTX 24GB ($900): best value at 24GB
- RTX 4080 Super 16GB ($1000): fast but only 16GB
- RTX 4090 24GB ($1600): best consumer GPU for LLMs
Professional ($3000+)
- RTX 6000 Ada 48GB ($6000): double the VRAM of an RTX 4090
- A100 80GB ($15000+): enterprise standard
- H100 80GB ($30000+): fastest training
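Before buying or sizing a model, it can help to check how much VRAM the card in an existing machine actually reports. A small check assuming PyTorch with CUDA is installed (nvidia-smi reports the same figure from the command line):

import torch

# Print the name and total VRAM of each visible CUDA device.
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GB VRAM")
else:
    print("No CUDA-capable GPU detected.")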
System RAM
System RAM matters for loading models and handling context.
Minimum Requirements
- 16GB for 7B models
- 32GB for 13-32B models
- 64GB for 70B+ models
Model weights pass through system RAM while being loaded onto the GPU.
CPU Offloading
When VRAM is insufficient, some layers can be offloaded to system RAM and run on the CPU. This is slow (often 10-100x slower than pure GPU inference) but makes larger models runnable at all. With 128GB+ of RAM, a 12GB GPU can run 70B models, though very slowly.
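As a sketch of what offloading looks like in practice with llama-cpp-python (one option among several; the model path and layer count below are placeholders to tune for your GPU):

from llama_cpp import Llama

# Keep only as many transformer layers on the GPU as fit in VRAM;
# the remaining layers run on the CPU out of system RAM (much slower).
llm = Llama(
    model_path="models/llama-3.1-70b-instruct-q4.gguf",  # placeholder path
    n_gpu_layers=30,   # raise until VRAM is full; -1 tries to offload every layer
    n_ctx=8192,
)

out = llm("Explain CPU offloading in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])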
Speed Considerations
DDR5 is 20-30% faster than DDR4 for CPU inference. For GPU inference, RAM speed matters much less. When budgeting, prioritize GPU VRAM over RAM speed.
Storage Requirements
LLM model files are large. Fast storage improves loading times.
Space Requirements
- 7B model: 4-8GB
- 13B model: 8-15GB
- 70B model: 35-70GB
- Complete model collection: 500GB-2TB
An NVMe SSD is strongly recommended.
Loading Speed Impact
An NVMe SSD loads a 70B model in ~30 seconds, a SATA SSD in ~60 seconds, and an HDD in 3-5 minutes. Loading is a one-time cost per session, but frequent model switching benefits from fast storage.
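These figures follow roughly from file size divided by sustained read speed. A quick back-of-the-envelope check (the drive speeds are typical sequential-read assumptions; real loads add memory-mapping and setup time on top of the raw read):

# Rough sustained sequential read speeds in GB/s (assumed, not measured).
DRIVE_SPEED_GBPS = {"nvme": 3.0, "sata_ssd": 0.5, "hdd": 0.15}

def load_time_seconds(model_size_gb: float, drive: str) -> float:
    return model_size_gb / DRIVE_SPEED_GBPS[drive]

for drive in DRIVE_SPEED_GBPS:
    print(f"{drive}: ~{load_time_seconds(40, drive):.0f} s for a 40 GB (70B Q4) file")
# nvme: ~13 s, sata_ssd: ~80 s, hdd: ~267 s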
Complete Build Examples
Recommended builds at different price points.
Budget Build ($800)
RTX 3060 12GB ($300) + Ryzen 5 5600 ($120) + 32GB DDR4 ($70) + 1TB NVMe ($80) + B550 motherboard ($100) + 650W PSU ($70) + Case ($60). Runs 7B-13B models well.
Recommended Build ($1500)
RTX 4070 Ti Super 16GB ($800) + Ryzen 7 7700X ($300) + 32GB DDR5 ($100) + 2TB NVMe ($150) + B650 motherboard ($150). Runs 32B models, fast inference.
Enthusiast Build (~$2500)
RTX 4090 24GB ($1600) + Ryzen 7 7800X3D ($400) + 64GB DDR5 ($200) + 2TB NVMe ($150) + X670 motherboard ($200). Runs quantized 70B models and handles everything smaller without compromise.
Ready to Get Started?
Check our step-by-step setup guides and GPU recommendations.