Quick Answer: Fireship runs an RTX 4080 Super (16GB VRAM) configuration for web development and AI education. This setup runs Qwen/Qwen2.5-7B-Instruct at 44 tokens/sec in FP16 and is optimized for models in the 7B-13B range.
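To reproduce the headline number, here is a minimal sketch of loading the benchmarked model in FP16 with Hugging Face transformers. The model ID comes from the table below; the prompt and generation settings are illustrative assumptions, not part of the benchmark.

```python
# Minimal sketch: Qwen2.5-7B-Instruct in FP16 on a single 16GB GPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # FP16, as benchmarked below (~16GB VRAM)
    device_map="auto",          # places the weights on the RTX 4080 Super
)

# Example prompt (an assumption, not from the benchmark setup).
messages = [{"role": "user", "content": "Explain closures in JavaScript in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```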
Specs & Performance
| Component | Product | Price | Purchase |
|---|---|---|---|
| GPU | RTX 4080 Super - Great balance for AI demos and video rendering | $999 | View on Amazon |
| CPU | Intel Core i9-13900K - 24-core for fast video encoding | $499 | View on Amazon |
| Motherboard | ASUS ROG Strix Z790-E - Premium Z790 with WiFi 6E | $500 | View on Amazon |
| RAM | 64GB DDR5-6000 - Fast for video editing | $250 | View on Amazon |
| Storage | Samsung 990 Pro 2TB - Fast for video project files | $180 | View on Amazon |
| PSU | Corsair RM850x 850W - 80+ Gold, quiet operation | $130 | View on Amazon |
| Case | NZXT H7 Flow - Clean aesthetic, good airflow | $130 | View on Amazon |
| Cooling | NZXT Kraken X63 - 280mm AIO for cool temps | $150 | View on Amazon |

LLM Benchmarks
| Model | Quantization | Tokens/sec | VRAM Used |
|---|---|---|---|
| Qwen/Qwen2.5-7B-Instruct | FP16 | 44 tok/s | 16GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | FP16 | 53 tok/s | 15GB |
| HuggingFaceTB/SmolLM-135M | FP16 | 53 tok/s | 15GB |
| liuhaotian/llava-v1.5-7b | FP16 | 53 tok/s | 15GB |
| openai-community/gpt2-medium | FP16 | 53 tok/s | 15GB |
| mistralai/Mistral-7B-Instruct-v0.1 | FP16 | 53 tok/s | 15GB |
| tencent/HunyuanVideo-1.5 | FP16 | 53 tok/s | 16GB |
| llamafactory/tiny-random-Llama-3 | FP16 | 52 tok/s | 15GB |
| deepseek-ai/DeepSeek-R1-0528 | FP16 | 52 tok/s | 15GB |
| numind/NuExtract-1.5 | FP16 | 52 tok/s | 15GB |
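The tokens/sec column can be approximated with a simple timing loop around `generate()`. This is a hedged sketch of one way to measure decode throughput, not necessarily the methodology behind the table; it assumes `model` and `tokenizer` are loaded as in the snippet above.

```python
# Rough decode-throughput estimate for any of the models above.
import time
import torch

def tokens_per_second(model, tokenizer, prompt: str, max_new_tokens: int = 256) -> float:
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    torch.cuda.synchronize()           # make sure the GPU is idle before timing
    start = time.perf_counter()
    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    torch.cuda.synchronize()           # wait for generation to actually finish
    elapsed = time.perf_counter() - start
    generated = output.shape[-1] - inputs["input_ids"].shape[-1]
    return generated / elapsed

print(f"{tokens_per_second(model, tokenizer, 'Write a haiku about GPUs.'):.1f} tok/s")
```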
FAQ

**Can this setup run Llama 70B?**
No. This build has 16GB of VRAM, but Llama 70B needs 40GB minimum even with aggressive quantization. It can run Llama 13B and smaller models; see the VRAM estimate sketch below.
**How much does this build cost?**
$2,838 total ($999 + $499 + $500 + $250 + $180 + $130 + $130 + $150). Alternatives: a single RTX 4090 build (~$4,200) if you need more VRAM for larger models, or a non-Super RTX 4080 build (~$2,400) to save money with the same 16GB of VRAM.
**Can this setup run Llama 405B?**
No. Llama 405B and similar 400B+ models need 200GB+ of VRAM even quantized, which means a multi-GPU server (8x A100 or H100 class). This 16GB setup tops out around 13B models.
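The VRAM figures in these answers follow from simple arithmetic: model weights need roughly parameter count times bytes per parameter, plus headroom for the KV cache and activations. Here is a back-of-the-envelope sketch; the 20% overhead factor is an assumption, not a measured value.

```python
# Back-of-the-envelope VRAM estimate: weights = params * bytes/param,
# plus ~20% overhead for KV cache and activations (overhead is an assumption).
QUANT_BYTES = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def vram_gb(params_billion: float, quant: str = "fp16", overhead: float = 1.2) -> float:
    return params_billion * QUANT_BYTES[quant] * overhead

for params, quant in [(7, "fp16"), (13, "int4"), (70, "int4"), (405, "int4")]:
    print(f"{params}B @ {quant}: ~{vram_gb(params, quant):.0f} GB")
# 7B @ fp16: ~17 GB    -> tight on a 16GB card (matches the table above)
# 13B @ int4: ~8 GB    -> comfortable on 16GB
# 70B @ int4: ~42 GB   -> needs a 48GB card or multiple GPUs
# 405B @ int4: ~243 GB -> needs a multi-GPU server (e.g., 8x A100/H100)
```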