Benchmark Methodology

Research-backed approach to estimating GPU performance for large language model inference. This document provides complete transparency on our data sources, calculation methods, and known limitations.

Overview

LocalAI.Computer provides theoretically-derived performance estimates based on established computer architecture principles (Roofline Model, IEEE floating-point standards) and community-validated observations from llama.cpp and related projects.

These are not laboratory measurements. Real-world performance typically varies ±25-50% depending on software implementation (llama.cpp vs vLLM vs TensorRT-LLM), context length, batch size, quantization quality, and thermal conditions.

Our philosophy: provide honest, conservative estimates using widely-available tools (llama.cpp) as the baseline. Users should view these numbers as minimum expected performance, not guarantees. Modern optimizations like FlashAttention and PagedAttention can improve performance 2-3× beyond our conservative estimates.

VRAM Requirement Calculation

How we determine minimum GPU memory needed for each model

Formula

VRAM (GB) = (Parameters × Bytes_Per_Parameter × Overhead_Factor) / 1024³

Bytes Per Parameter (By Quantization)

  • FP16 (16-bit): 2.0 bytes/param
  • Q8 (8-bit): 1.0 bytes/param
  • Q4 (4-bit): 0.5 bytes/param

Source: IEEE 754 floating-point standard; quantization definitions from llama.cpp.

Overhead Factor (Size-Dependent)

  • <10B params: 1.15 (15% overhead)
  • 10-70B params: 1.10 (10% overhead)
  • >70B params: 1.05 (5% overhead)

Rationale: Smaller models have proportionally larger KV cache; larger models are dominated by weight memory.

Example: 20B-Class Model (21.5B Parameters)

  • FP16: 21.5B × 2.0 × 1.10 / 1024³ = 44 GB
  • Q8: 21.5B × 1.0 × 1.10 / 1024³ = 22 GB
  • Q4: 21.5B × 0.5 × 1.10 / 1024³ = 11 GB

Sources: Byte sizes per parameter follow IEEE 754-2019 and llama.cpp quantization formats. Overhead factors are derived from community observations on the llama.cpp GitHub and the text-generation-webui wiki, cross-referenced with NVIDIA guidance (which recommends roughly 2× the parameter count in GB for FP16 weights).
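
For concreteness, a minimal sketch of this calculation in Python (the byte sizes and overhead factors are exactly the values listed above; the function name and rounding are illustrative, not part of any published tool):

```python
def estimate_vram_gb(params_billions: float, bytes_per_param: float) -> float:
    """Estimate minimum VRAM in GB: model weights plus size-dependent overhead."""
    # Size-dependent overhead factor (KV cache, activations, runtime buffers)
    if params_billions < 10:
        overhead = 1.15
    elif params_billions <= 70:
        overhead = 1.10
    else:
        overhead = 1.05
    return params_billions * 1e9 * bytes_per_param * overhead / 1024**3

# 20B-class example (21.5B parameters), matching the figures above
for label, bpp in [("FP16", 2.0), ("Q8", 1.0), ("Q4", 0.5)]:
    print(f"{label}: {estimate_vram_gb(21.5, bpp):.0f} GB")
# FP16: 44 GB, Q8: 22 GB, Q4: 11 GB
```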

Known Limitations

  • Context length affects KV cache size (assumes 2K-4K tokens)
  • MoE models load all parameters but only activate a subset
  • Multi-modal models have additional VRAM requirements
  • FlashAttention-2 can reduce VRAM usage 20-30% (not modeled)

Performance Estimation Methodology

How we estimate tokens per second for GPU + model combinations

Foundational Principle: Memory-Bandwidth Bottleneck

LLM autoregressive inference is primarily memory-bandwidth bound, not compute-bound. This principle was formalized by the Roofline Model (Williams et al., 2009) and remains valid for transformer architectures. Each token generation requires loading the entire model from VRAM, making memory bandwidth—not CUDA cores or TFLOPS—the primary performance bottleneck.
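
As a back-of-envelope illustration (the bandwidth figure is the published RTX 4090 spec of roughly 1 TB/s; the arithmetic is ours), the per-token ceiling implied by this principle is approximately memory bandwidth divided by model size in bytes:

```python
# Rough upper bound: each generated token streams the full weight set from VRAM once.
bandwidth_gb_s = 1008           # RTX 4090 published memory bandwidth (~1 TB/s)
model_gb = 7e9 * 0.5 / 1e9      # 7B parameters at Q4 (0.5 bytes/param) = 3.5 GB

ceiling = bandwidth_gb_s / model_gb
print(f"theoretical ceiling: ~{ceiling:.0f} tokens/sec")  # ~288 tokens/sec
```

Observed llama.cpp throughput (~180 tokens/sec, the calibration point below) sits below this ceiling because of KV-cache traffic, kernel overheads, and imperfect bandwidth utilization, which is consistent with the memory-bound picture.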

Recent research (LLMPerf 2024, FlashAttention 2023) shows that modern optimizations can improve performance significantly. Our estimates use a plain llama.cpp baseline without these advanced optimizations.

Formula Components

tokens/sec = baseline_speed × gpu_performance × model_size × quantization × architecture

1. Baseline Speed (Calibration Point)

RTX 4090 + 7B model (Q4): ~180 tokens/sec

Representative value from llama.cpp community benchmarks and r/LocalLLaMA reports. Used as calibration point for relative performance calculations.

2. GPU Performance Factor

Calculated as: (bandwidth_ratio × 0.75) + (compute_ratio × 0.25)

Memory bandwidth dominates LLM inference performance. The 75% weighting on bandwidth reflects that token generation is primarily memory-bound (loading weights from VRAM) rather than compute-bound. Based on Roofline Model (Williams et al., 2009, Communications of the ACM), which established that low arithmetic-intensity operations are memory-bandwidth limited.
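
A sketch of this weighting, using the RTX 4090 as the reference point (the specification values below are approximate published figures; the function name and the RTX 4070 comparison are illustrative):

```python
def gpu_performance_factor(bandwidth_gb_s: float, tflops_fp32: float,
                           ref_bandwidth: float = 1008.0,   # RTX 4090, approx. GB/s
                           ref_tflops: float = 82.6) -> float:  # RTX 4090, approx. FP32 TFLOPS
    """Weighted ratio vs. the reference GPU: 75% memory bandwidth, 25% compute."""
    bandwidth_ratio = bandwidth_gb_s / ref_bandwidth
    compute_ratio = tflops_fp32 / ref_tflops
    return 0.75 * bandwidth_ratio + 0.25 * compute_ratio

# e.g. an RTX 4070-class card (~504 GB/s, ~29 TFLOPS FP32)
print(round(gpu_performance_factor(504, 29), 2))  # ≈ 0.46
```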

3. Model Size Multiplier

Non-linear scaling based on parameter count:

  • 1-3B: 1.20× (better cache locality)
  • 4-8B: 1.00× (baseline)
  • 9-15B: 0.75×
  • 16-30B: 0.55×
  • 31-70B: 0.35×
  • 70B+: 0.20× (cache misses dominate)

Larger models exceed GPU cache hierarchies, causing more cache misses. This scaling is observed in community benchmarks; treat the multipliers as directional estimates pending systematic validation.

4. Quantization Multiplier

  • Q4: 1.00× (0.5 bytes/param, fastest)
  • Q8: 0.70× (1.0 bytes/param, ~30% slower)
  • FP16: 0.38× (2.0 bytes/param, ~62% slower)

Performance scales inversely with memory bandwidth requirements. llama.cpp benchmarks show FP16 typically runs at 35-45% the speed of Q4, consistent with our 38% multiplier.

5. Architecture Bonus

  • Dense (Llama, Mistral): 1.00×
  • MoE (Mixtral): 1.30×
  • Efficient MoE (DeepSeek): 1.25×

MoE models only activate a subset of parameters per token (e.g., 2 of 8 experts), reducing effective compute and memory bandwidth requirements.
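
Putting the five components together, a minimal sketch of the full estimate (the multiplier tables are the values listed above; function names are illustrative, and the GPU factor comes from the sketch in component 2):

```python
BASELINE_TOK_S = 180.0  # calibration point: RTX 4090 + 7B model at Q4

QUANT_MULT = {"Q4": 1.00, "Q8": 0.70, "FP16": 0.38}
ARCH_MULT = {"dense": 1.00, "moe": 1.30, "efficient_moe": 1.25}

def size_multiplier(params_b: float) -> float:
    """Non-linear scaling with parameter count (directional estimates)."""
    if params_b <= 3:
        return 1.20
    if params_b <= 8:
        return 1.00
    if params_b <= 15:
        return 0.75
    if params_b <= 30:
        return 0.55
    if params_b < 70:
        return 0.35
    return 0.20  # 70B and above: cache misses dominate

def estimate_tokens_per_sec(gpu_factor: float, params_b: float,
                            quant: str, arch: str = "dense") -> float:
    return (BASELINE_TOK_S * gpu_factor * size_multiplier(params_b)
            * QUANT_MULT[quant] * ARCH_MULT[arch])

# RTX 4090 (gpu_factor = 1.0) running a 13B dense model at Q4
print(round(estimate_tokens_per_sec(1.0, 13, "Q4")))  # ≈ 135 tokens/sec
```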

Expected Variance & Confidence

  • ±20-30%: Common GPUs (RTX 40-series) + common models (7B, 13B, 70B) at Q4/Q8
  • ±30-40%: Less common GPUs, uncommon model sizes, FP16 quantization
  • ±50%+: Apple Silicon, models >100B, multi-modal, MoE

These are variance ranges based on architectural principles and community observations, not statistically validated error measurements.

What We Don't Account For

  • Software optimization differences (llama.cpp vs vLLM vs TensorRT-LLM: 2-3× variation)
  • Batch size (batching can increase throughput 10-50×)
  • Context length (longer context = slower, non-linear)
  • Prompt vs generation speed (prefill faster than decode)
  • FlashAttention, PagedAttention, other memory optimizations
  • Thermal throttling (10-20% performance reduction under sustained load)
  • Driver version and CUDA/ROCm toolkit differences

Our estimates represent typical single-user inference with llama.cpp at batch size 1, moderate context (2-4K tokens), and good thermal conditions.

Hardware Specification Data Sources

Where we get GPU specifications and how we validate them

Primary Sources (Tier 1)

  • Manufacturer documentation: official datasheets and product specification pages (memory capacity, bandwidth, compute throughput)

Secondary Sources (Tier 2)

  • TechPowerUp GPU Database: techpowerup.com/gpu-specs (when primary sources unavailable)
  • AnandTech Reviews: Professional hardware testing with detailed specifications

Secondary sources are cross-referenced against at least two independent sources. Discrepancies are resolved using manufacturer documentation.

Model Information Sources

  • Hugging Face Model Hub: huggingface.co/models (parameter counts, architecture, context length)
  • Official Model Repositories: GitHub repos from Meta, Mistral AI, etc.
  • Research Papers: ArXiv preprints and peer-reviewed publications

Community Benchmark Collection

How we collect and verify real-world performance data

Trusted Community Sources

  • llama.cpp GitHub discussions and benchmark issues
  • r/LocalLLaMA community benchmark reports
  • text-generation-webui wiki and community documentation

Verification Process

  1. Sanity Check: Result must be within 2× of estimated value
  2. Hardware Verification: GPU model, VRAM, driver version documented
  3. Software Specification: Runtime version, quantization, context length specified
  4. Reproducibility: Preference for benchmarks with reproducible commands/configs
  5. Outlier Detection: Results deviating >3σ flagged for review
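
A small sketch of how the sanity check (step 1) and outlier flagging (step 5) above could be applied (the 2× and 3σ thresholds are the ones listed; function names and the minimum-sample guard are illustrative):

```python
from statistics import mean, stdev

def passes_sanity_check(reported_tok_s: float, estimated_tok_s: float) -> bool:
    """Step 1: accept results within 2x of our estimate, in either direction."""
    return estimated_tok_s / 2 <= reported_tok_s <= estimated_tok_s * 2

def flag_outliers(results: list[float], sigma: float = 3.0) -> list[float]:
    """Step 5: flag results more than 3 standard deviations from the sample mean."""
    if len(results) < 3:  # too few samples to estimate spread meaningfully
        return []
    mu, sd = mean(results), stdev(results)
    return [r for r in results if sd > 0 and abs(r - mu) > sigma * sd]
```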

Help Improve Accuracy

We need YOUR benchmarks. Real measurements always replace estimates. The more community data we collect, the more accurate recommendations become for everyone. Submit your benchmark →

Modern LLM Inference Research

Recent advances beyond our baseline methodology

Why Our Methodology Uses 2009 Foundations

Our estimates are based on the Roofline Model (Williams et al., 2009), which established the fundamental principle that memory-bandwidth limits performance for operations with low arithmetic intensity. While this foundational work is 16 years old, the core principle—that LLM inference is memory-bound—remains valid.

What's Changed Since 2009

1. FlashAttention & FlashAttention-2 (2023)

Research: Tri Dao et al., Stanford University
Impact: 2-3× speedup through memory-efficient attention
Why we don't model it: Requires specific implementations, benefits vary by hardware and context length

2. PagedAttention (2023)

Research: vLLM Project
Impact: Reduces KV cache memory waste by 50-80%
Why we don't model it: Specific to vLLM engine, depends on batch size and request patterns

3. LLMPerf (2024)

Research: ArXiv 2503.11244
Finding: Demonstrates transformer inference requires updated performance models
Status: Emerging research; methodologies not yet standardized

Our Conservative Approach

We intentionally use conservative baseline estimates for several reasons:

  • Most users run llama.cpp, not optimized inference engines like vLLM or TensorRT
  • Advanced optimizations are deployment-specific and vary by hardware
  • Better to underestimate than overestimate performance
  • Baseline llama.cpp represents a reproducible, widely-available reference point

Our estimates represent typical single-user inference with standard tools. Optimized production deployments can achieve 2-3× better performance.

Limitations & Ongoing Work

What we don't capture and how we're improving

Known Limitations

  • Estimates assume llama.cpp (not vLLM, TensorRT-LLM, etc.)
  • Assumes batch size = 1 (real deployments often batch)
  • Context length assumed ~2-4K tokens
  • Thermal throttling not modeled
  • PCIe bandwidth not considered (relevant for offloading)
  • Multi-GPU setups not yet supported
  • FlashAttention-2 benefits not quantified

In Progress

  • ✅ Improved VRAM calculation (Phase 1 complete)
  • ✅ Multi-GPU configuration support — See Multi-GPU Methodology
  • 🔄 Populating memory bandwidth for all GPUs
  • 🔄 Collecting community-verified benchmarks
  • 🔄 Adding confidence intervals to UI
  • 📅 Software runtime multipliers (vLLM, TRT)
  • 📅 Context length impact modeling

This methodology is continuously refined as we collect more real-world data. Major changes are documented with effective dates. Last updated: November 9, 2025 (Phase 1: VRAM formula correction).

Found an error or have better data? Contact us at feedback@localai.computer

References & Sources

Foundational Computer Architecture

Roofline Model (2009)

Williams, S., Waterman, A., & Patterson, D. (2009). "Roofline: An insightful visual performance model for multicore architectures." Communications of the ACM, 52(4), 65-76. DOI: 10.1145/1498765.1498785

IEEE Floating-Point Standard (2019)

IEEE Computer Society. (2019). IEEE Standard for Floating-Point Arithmetic (IEEE 754-2019). IEEE Standards Association

Modern LLM Inference Research (2023-2024)

LLMPerf (2024)

"LLMPerf: GPU Performance Modeling meets Large Language Models." ArXiv preprint. arXiv:2503.11244

FlashAttention-2 (2023)

Dao, T. (2023). "FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning." Stanford University. arXiv:2307.08691.

PagedAttention / vLLM (2023)

"Efficient Memory Management for Large Language Model Serving with PagedAttention." vLLM Project.

Industry Documentation & Implementation

NVIDIA Developer Resources

"GPU Memory Essentials for AI Performance" (2024). developer.nvidia.com/blog

llama.cpp Implementation

Gerganov, G. et al. (2023-2024). "llama.cpp: Inference of LLaMA model in pure C/C++." github.com/ggerganov/llama.cpp

Community Documentation

text-generation-webui System Requirements Wiki. github.com/oobabooga/text-generation-webui

All sources are cited with specific URLs or DOIs. When community observations are used (e.g., overhead factors, model size multipliers), they are explicitly labeled as "directional estimates" rather than laboratory measurements. We prioritize primary sources (manufacturer datasheets, peer-reviewed papers) over secondary aggregators.

Questions about our methodology? Contact us