Quick Answer: PewDiePie runs a 10-GPU rig (8× RTX 4090 modded to 48GB each plus 2× RTX 4000 Ada 20GB, 424GB VRAM total) for AI content creation and local LLM development. This setup runs XiaomiMiMo/MiMo-V2-Flash at 65.1 tokens/sec and handles quantized models up to 235B parameters locally.
Specs & Performance
| Component | Product | Price | Purchase |
|---|---|---|---|
| GPU #1 | RTX 4090 ×8: custom-modded to 48GB VRAM each (originally 24GB), aggressively undervolted to ~175W per card | $20,000 ($2,500 each) | View on Amazon |
| GPU #2 | RTX 4000 Ada 20GB ×2: additional cards for workload distribution | $2,200 ($1,100 each) | View on Amazon |
| CPU | AMD Threadripper PRO 7985WX: 64-core/128-thread, 128 PCIe 5.0 lanes for multi-GPU communication | $10,000 | View on Amazon |
| Motherboard | Asus Pro WS WRX90E-SAGE SE: 7× PCIe 5.0 x16 slots, designed for multi-GPU workstations | $1,200 | View on Amazon |
| RAM | 512GB DDR5: high-capacity memory for loading large language models | $1,800 | View on Amazon |
| Storage | Multiple M.2 NVMe SSDs: fast storage array for model weights and datasets | $1,500 | View on Amazon |
| PSU | Seasonic PRIME TX-1300 ×2: 2,600W combined, 80+ Titanium efficiency | $800 ($400 each) | View on Amazon |
| Cooling | Custom liquid cooling: thermal management for the 10-GPU stack | $2,000 | View on Amazon |
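The ~175W cap on each 4090 is the one setting in this table you can reproduce in software. A true undervolt adjusts the voltage/frequency curve in tools like MSI Afterburner; a plain power limit gets most of the efficiency benefit and is scriptable. A minimal sketch, assuming the NVML Python bindings (`pip install nvidia-ml-py`), root privileges, and that the modded cards expose standard power controls:

```python
# Cap every RTX 4090 in the box at ~175W via NVML (root required).
# Sketch only: a real undervolt tunes the V/F curve, which NVML cannot do.
from pynvml import (
    nvmlInit, nvmlShutdown, nvmlDeviceGetCount,
    nvmlDeviceGetHandleByIndex, nvmlDeviceGetName,
    nvmlDeviceSetPowerManagementLimit,
)

TARGET_MILLIWATTS = 175_000  # ~175W per card, per the build notes above

nvmlInit()
try:
    for i in range(nvmlDeviceGetCount()):
        handle = nvmlDeviceGetHandleByIndex(i)
        name = nvmlDeviceGetName(handle)
        if isinstance(name, bytes):  # older bindings return bytes
            name = name.decode()
        if "4090" in name:           # leave the RTX 4000 Ada cards at stock
            nvmlDeviceSetPowerManagementLimit(handle, TARGET_MILLIWATTS)
            print(f"GPU {i} ({name}): capped at 175 W")
finally:
    nvmlShutdown()
```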
Benchmark Results
| Model | Quantization | Tokens/sec | VRAM Used |
|---|---|---|---|
| XiaomiMiMo/MiMo-V2-Flash | Q4 | 65.1 tok/s | 174GB |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | Q4 | 83.8 tok/s | 115GB |
| Qwen/Qwen3-235B-A22B | Q4 | 68.1 tok/s | 115GB |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | Q8 | 108.5 tok/s | 138GB |
| mistralai/Mistral-Large-Instruct-2411 | Q8 | 84.5 tok/s | 120GB |
| openai/gpt-oss-120b | Q8 | 77.2 tok/s | 117GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | FP8 | 42.5 tok/s | 176GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | FP8 | 43.7 tok/s | 156GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking | FP16 | 43.6 tok/s | 156GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | FP8 | 41.2 tok/s | 156GB |
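If you want to reproduce numbers like these, the simplest harness is vLLM's offline API. A minimal sketch, assuming vLLM is installed, `tensor_parallel_size=8` maps onto the eight 4090s, and you point it at a quantized checkpoint that actually fits the cards (the Q4 rows above imply GPTQ/AWQ-style weights); the model name and prompt are placeholders:

```python
# Rough aggregate-throughput measurement with vLLM's offline API.
import time
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-235B-A22B",  # swap in the quantized variant you have
    tensor_parallel_size=8,        # shard weights across the eight 4090s
)
params = SamplingParams(max_tokens=512, temperature=0.7)
prompts = ["Explain tensor parallelism in two paragraphs."] * 4

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{generated / elapsed:.1f} tok/s aggregate")
```

Note this measures batched aggregate throughput; single-stream numbers like those in the table will be lower for the same hardware.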
**Can this setup run Llama 70B?**
Yes. The rig's 424GB of pooled VRAM (8× 48GB plus 2× 20GB) far exceeds what Llama 70B needs at Q4 quantization (~40GB); a single modded 48GB card can even hold it on its own. With tensor parallelism across cards, expect throughput in the same 65-108 tok/s band as the Q4/Q8 results above.
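The ~40GB figure is easy to sanity-check: weights dominate, at roughly parameters × bits-per-weight / 8 bytes. A back-of-envelope sketch, where the ~4.5 effective bits per Q4 weight is an assumption covering quantization scales and metadata (KV cache and runtime buffers come on top):

```python
# Back-of-envelope check on the "~40GB for Llama 70B at Q4" figure.
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """GB needed for the weights alone: 1e9 * params * bits / 8 bytes."""
    return params_billion * bits_per_weight / 8

print(f"Llama 70B  @ Q4  : {weight_gb(70, 4.5):.0f} GB")   # ~39 GB, the ~40GB figure
print(f"Llama 405B @ Q4  : {weight_gb(405, 4.5):.0f} GB")  # ~228 GB, why 400B-class is harder
print(f"Llama 70B  @ FP16: {weight_gb(70, 16):.0f} GB")    # 140 GB unquantized
```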
**How much does this setup cost?**
$39,500 total; the parts table above gives the per-component breakdown. Budget alternatives: a single RTX 4090 (~$4,200) or an RTX 4080 (~$2,400) can handle smaller models.
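The total is just the parts table summed; a quick check:

```python
# Sanity check of the $39,500 build total against the parts table.
parts = {
    "8x RTX 4090 48GB (modded)":  20_000,
    "2x RTX 4000 Ada 20GB":        2_200,
    "Threadripper PRO 7985WX":    10_000,
    "Asus Pro WS WRX90E-SAGE SE":  1_200,
    "512GB DDR5":                  1_800,
    "NVMe storage array":          1_500,
    "2x Seasonic PRIME TX-1300":     800,
    "Custom liquid cooling":       2_000,
}
print(f"Total: ${sum(parts.values()):,}")  # $39,500
```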
**Can it run Llama 405B?**
Llama 405B and similar 400B+ models need roughly 230GB of VRAM even at Q4 quantization and are usually run on 8× A100 or H100 nodes. This rig's 424GB pool could hold such a checkpoint in principle, but the benchmarks above top out at 235B-parameter models using ~176GB.
**Is multi-GPU worth it over a single card?**
Yes for 70B-class models: a single 48GB card runs a 70B Q4 model at roughly 35 tok/s, versus ~65 tok/s when the model is sharded across several GPUs. For 7B-13B models a single GPU is sufficient. A rough sizing rule is sketched below.
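One way to make that call concrete is to pick the smallest GPU count whose pooled VRAM holds the model with headroom, then only go wider for throughput. A sketch assuming this build's 48GB cards and an ~85% usable-VRAM ceiling (both assumptions, not measured values):

```python
# Smallest power-of-two tensor-parallel degree that fits a model in pooled VRAM.
def min_tp_degree(model_gb: float, card_gb: float = 48, headroom: float = 0.85) -> int:
    """Assumes ~15% of each card is lost to KV cache, buffers, fragmentation."""
    tp = 1
    while model_gb > card_gb * tp * headroom:
        tp *= 2  # vLLM-style tensor parallelism wants power-of-two degrees
    return tp

for name, size_gb in [("Llama 13B Q4", 8), ("Llama 70B Q4", 40), ("Qwen3-235B Q4", 115)]:
    print(f"{name:>14}: tensor_parallel_size={min_tp_degree(size_gb)}")
```

On 48GB cards this returns 1 for both 13B and 70B at Q4, matching the point above: multi-GPU buys throughput for 70B, not the ability to load it.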