Quick Answer: Yannic Kilcher runs an RTX 4090 (24GB VRAM) configuration for ML research and paper reviews. The setup runs google/gemma-2-9b-it at 48 tokens/sec and is optimized for 7B-13B models.
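A minimal sketch of running that model in FP16 with Hugging Face transformers, assuming a CUDA build of PyTorch and access to the gated google/gemma-2-9b-it weights; the prompt and generation settings are illustrative, not the author's:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-9b-it"  # model from the benchmark table below
tokenizer = AutoTokenizer.from_pretrained(model_id)

# FP16 weights for a 9B model are roughly 18GB, which fits in the 4090's 24GB.
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")

prompt = "Summarize the main idea of the attention mechanism."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```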
Specs & Performance
| Component | Product | Price | Purchase |
|---|---|---|---|
| GPU | RTX 4090, for reproducing paper experiments | $1,599 | View on Amazon |
| CPU | AMD Ryzen 9 5950X, 16 cores, previous gen but powerful | $359 | View on Amazon |
| MOTHERBOARD | ASUS ROG Crosshair VIII Hero, premium X570 board | $380 | View on Amazon |
| RAM | 64GB DDR4-3600 for the AM4 platform | $160 | View on Amazon |
| STORAGE | Samsung 980 Pro 2TB, fast storage for datasets | $160 | View on Amazon |
| PSU | Seasonic Focus GX-850, 850W 80+ Gold | $130 | View on Amazon |
| CASE | Lian Li Lancool II Mesh, great airflow at a budget price | $110 | View on Amazon |
| COOLING | Noctua NH-U12A, compact but effective air cooler | $100 | View on Amazon |

| Model | Quantization | Tokens/sec | VRAM Used |
|---|---|---|---|
| google/gemma-2-9b-it | FP16 | 48 tok/s | 20GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | FP16 | 52 tok/s | 19GB |
| EssentialAI/rnj-1 | FP16 | 55 tok/s | 19GB |
| NousResearch/Hermes-3-Llama-3.1-8B | FP16 | 54 tok/s | 17GB |
| meta-llama/Llama-3.1-8B-Instruct | FP16 | 50 tok/s | 17GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | FP16 | 75 tok/s | 17GB |
| meta-llama/Llama-Guard-3-8B | FP16 | 74 tok/s | 17GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | 4-bit (MLX) | 72 tok/s | 17GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | FP16 | 71 tok/s | 17GB |
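The article does not state how these tokens/sec figures were measured. A common approach is to time a fixed decode and divide by the number of generated tokens; a rough sketch, with model ID taken from the table and the prompt and lengths as assumptions:

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # any row from the table
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")

inputs = tokenizer("Explain KV caching in one paragraph.", return_tensors="pt").to("cuda")

# Warm-up run so one-time CUDA initialization doesn't skew the timing.
model.generate(**inputs, max_new_tokens=16)

torch.cuda.synchronize()
start = time.perf_counter()
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

# Count tokens actually generated (generation may stop early at EOS).
generated = out.shape[1] - inputs["input_ids"].shape[1]
print(f"{generated / elapsed:.1f} tok/s")
```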
Can this setup run Llama 70B?
No. This setup has 24GB of VRAM, but Llama 70B needs roughly 40GB even at 4-bit quantization. It can run Llama 13B and smaller models comfortably.
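The back-of-envelope arithmetic behind that answer: weight memory is roughly parameter count times bytes per parameter, before KV cache and runtime overhead. A small sketch:

```python
# Estimate GPU memory for model weights alone (KV cache and overhead add more).
def weight_vram_gb(params_billion: float, bytes_per_param: float) -> float:
    return params_billion * 1e9 * bytes_per_param / 1024**3

print(f"{weight_vram_gb(8, 2):.0f} GB")    # 8B  @ FP16: ~15 GB, fits in 24 GB
print(f"{weight_vram_gb(70, 2):.0f} GB")   # 70B @ FP16: ~130 GB, far too big
print(f"{weight_vram_gb(70, 0.5):.0f} GB") # 70B @ 4-bit: ~33 GB, still over 24 GB
```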
How much does this setup cost?
$2,998 total. Budget alternative: an otherwise similar build with an RTX 4080 (~$2,400) still handles smaller models.
Can it run Llama 405B?
No. Llama 405B and similar 400B+ models need 200GB+ VRAM (an 8x A100 or H100 cluster). This 24GB setup is best suited to models of 13B parameters and below.