© 2025 localai.computer. Hardware recommendations for running AI models locally.

ℹ️We earn from qualifying purchases through affiliate links at no extra cost to you. This supports our free content and research.


Quick Answer: The RTX 5090 offers 32GB of VRAM and currently sells from around $5,196.32. It delivers an estimated 395 tokens/sec on WeiboAI/VibeThinker-1.5B (Q4) and typically draws 575W under load.

RTX 5090

By NVIDIA · Released 2025-01 · MSRP $1,999.00

This GPU offers reliable throughput for local AI workloads. Pair it with the right model quantization to hit your target tokens/sec, and watch the prices below to catch a good deal.

Buy on Amazon: $5,196.32 · View Benchmarks

Specs snapshot

Key hardware metrics for AI workloads.

  • VRAM: 32GB
  • CUDA cores: 21,760
  • TDP: 575W
  • Architecture: Blackwell
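A quick way to see how the 32GB of VRAM maps onto model sizes is the usual rule of thumb: weights take roughly 0.5 bytes per parameter at Q4, 1 at Q8, and 2 at FP16, plus some headroom for KV cache and activations. A minimal sketch (the byte-per-parameter figures and the flat 1.5GB overhead are rough assumptions, not measured values):

```python
# Rough VRAM estimate for local inference: quantized weights plus a
# fixed overhead for KV cache and activations. A rule of thumb only.
BYTES_PER_PARAM = {"Q4": 0.5, "Q8": 1.0, "FP16": 2.0}  # approximate

def estimate_vram_gb(params_billions: float, quant: str, overhead_gb: float = 1.5) -> float:
    """Approximate VRAM (GB) needed to run a model at a given quantization."""
    weights_gb = params_billions * BYTES_PER_PARAM[quant]
    return round(weights_gb + overhead_gb, 1)

print(estimate_vram_gb(8, "Q4"))    # an 8B model at Q4: ~5.5GB, easy fit in 32GB
print(estimate_vram_gb(70, "Q4"))   # a 70B model at Q4: ~36.5GB, over the 32GB limit
```

This matches the pattern in the compatibility data below: 70B-class models fail to fit even at Q4, while anything up to roughly 30B at Q4 has room to spare.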

Where to Buy

Buy directly on Amazon with fast shipping and reliable customer service.

Amazon
$5,196.32
Buy on Amazon


💡 Not ready to buy? Try cloud GPUs first

Test RTX 5090 performance in the cloud before investing in hardware. Pay by the hour with no commitment.

  • Vast.ai: from $0.20/hr
  • RunPod: from $0.30/hr
  • Lambda Labs: enterprise-grade
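To put those hourly rates in perspective, you can work out how many rental hours the card's street price buys. A small sketch using the example rates quoted above (real rates fluctuate and exclude storage/egress fees):

```python
# Hours of cloud GPU rental equal to the RTX 5090's street price.
card_price = 5196.32  # USD, price quoted above

for provider, rate in [("Vast.ai", 0.20), ("RunPod", 0.30)]:
    hours = card_price / rate
    print(f"{provider}: ~{hours:,.0f} hours before buying breaks even")
```

At $0.20/hr that is roughly 26,000 hours, so buying only pays off for sustained, heavy use.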

AI benchmarks

All figures below are auto-generated estimates at Q4 quantization.

Model | Quantization | Tokens/sec (estimated) | VRAM used
WeiboAI/VibeThinker-1.5B | Q4 | 395.40 | 1GB
Qwen/Qwen2.5-3B | Q4 | 391.74 | 2GB
apple/OpenELM-1_1B-Instruct | Q4 | 387.61 | 1GB
google-t5/t5-3b | Q4 | 385.75 | 2GB
ibm-research/PowerMoE-3b | Q4 | 385.66 | 2GB
unsloth/gemma-3-1b-it | Q4 | 385.45 | 1GB
deepseek-ai/DeepSeek-OCR | Q4 | 382.42 | 2GB
deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | 379.84 | 2GB
google-bert/bert-base-uncased | Q4 | 378.88 | 1GB
unsloth/Llama-3.2-3B-Instruct | Q4 | 378.27 | 2GB
tencent/HunyuanOCR | Q4 | 376.53 | 1GB
google/gemma-2b | Q4 | 367.39 | 1GB
allenai/OLMo-2-0425-1B | Q4 | 363.30 | 1GB
google/embeddinggemma-300m | Q4 | 362.84 | 1GB
bigcode/starcoder2-3b | Q4 | 361.76 | 2GB
LiquidAI/LFM2-1.2B | Q4 | 359.25 | 1GB
meta-llama/Llama-Guard-3-1B | Q4 | 358.27 | 1GB
ibm-granite/granite-3.3-2b-instruct | Q4 | 357.16 | 1GB
context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | 353.40 | 2GB
unsloth/Llama-3.2-1B-Instruct | Q4 | 348.23 | 1GB
Qwen/Qwen2.5-3B-Instruct | Q4 | 347.31 | 2GB
google/gemma-3-1b-it | Q4 | 342.38 | 1GB
TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | 338.05 | 1GB
meta-llama/Llama-3.2-3B-Instruct | Q4 | 337.52 | 2GB
facebook/sam3 | Q4 | 333.26 | 1GB
meta-llama/Llama-3.2-1B-Instruct | Q4 | 331.77 | 1GB
trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | 329.61 | 4GB
EleutherAI/pythia-70m-deduped | Q4 | 329.40 | 4GB
meta-llama/Llama-3.2-1B | Q4 | 329.17 | 1GB
allenai/Olmo-3-7B-Think | Q4 | 328.99 | 4GB
deepseek-ai/DeepSeek-V3 | Q4 | 328.42 | 4GB
inference-net/Schematron-3B | Q4 | 328.13 | 2GB
openai-community/gpt2 | Q4 | 328.10 | 4GB
nari-labs/Dia2-2B | Q4 | 327.85 | 2GB
Qwen/Qwen2.5-0.5B | Q4 | 327.73 | 3GB
microsoft/Phi-3.5-mini-instruct | Q4 | 326.64 | 4GB
google/gemma-2-2b-it | Q4 | 326.36 | 1GB
Qwen/Qwen3-4B | Q4 | 326.12 | 2GB
Qwen/Qwen2.5-1.5B | Q4 | 325.98 | 3GB
meta-llama/Llama-3.2-3B | Q4 | 325.57 | 2GB
Qwen/Qwen3-Embedding-0.6B | Q4 | 325.05 | 3GB
microsoft/Phi-3.5-vision-instruct | Q4 | 323.72 | 4GB
Qwen/Qwen3-Embedding-4B | Q4 | 323.49 | 2GB
meta-llama/Meta-Llama-3-8B | Q4 | 323.23 | 4GB
petals-team/StableBeluga2 | Q4 | 323.08 | 4GB
microsoft/VibeVoice-1.5B | Q4 | 322.58 | 3GB
microsoft/Phi-3.5-mini-instruct | Q4 | 322.16 | 2GB
Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | 322.15 | 3GB
unsloth/Meta-Llama-3.1-8B-Instruct | Q4 | 322.14 | 4GB
ibm-granite/granite-3.3-8b-instruct | Q4 | 319.72 | 4GB
4GB

Note: Performance estimates are calculated. Real results may vary. Methodology · Submit real data
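Since the speeds above are estimates, the most useful follow-up is to measure real throughput on your own machine and submit it. A minimal sketch of that measurement, assuming your inference runtime exposes some generate callable that returns a token count (the callable here is a hypothetical placeholder; adapt it to llama-cpp-python, Ollama, or whatever you run):

```python
import time

def measure_tokens_per_sec(generate, prompt: str, max_tokens: int = 256) -> float:
    """Time one generation call and return tokens/sec.

    `generate` stands in for whatever your runtime exposes and is
    assumed to return the number of tokens it actually produced.
    """
    start = time.perf_counter()
    n_tokens = generate(prompt, max_tokens)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed
```

For stable numbers, run a warm-up generation first and average several timed runs, since the first call usually includes model-load and graph-compile time.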

Model compatibility

Model | Quantization | Verdict | Estimated speed | VRAM needed (32GB available)
trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q4 | Fits comfortably | 314.53 tok/s | 4GB
trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | Fits comfortably | 199.31 tok/s | 7GB
trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | FP16 | Fits comfortably | 108.02 tok/s | 15GB
Qwen/Qwen3-4B-Instruct-2507 | Q4 | Fits comfortably | 276.78 tok/s | 2GB
Qwen/Qwen3-4B-Instruct-2507 | FP16 | Fits comfortably | 115.36 tok/s | 9GB
meta-llama/Llama-3.2-1B-Instruct | Q4 | Fits comfortably | 331.77 tok/s | 1GB
meta-llama/Llama-3.2-1B-Instruct | Q8 | Fits comfortably | 262.31 tok/s | 1GB
meta-llama/Llama-3.2-1B-Instruct | FP16 | Fits comfortably | 141.96 tok/s | 2GB
Qwen/Qwen2.5-3B-Instruct | Q4 | Fits comfortably | 347.31 tok/s | 2GB
meta-llama/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 262.24 tok/s | 3GB
meta-llama/Llama-3.2-3B-Instruct | FP16 | Fits comfortably | 135.14 tok/s | 6GB
vikhyatk/moondream2 | Q4 | Fits comfortably | 293.81 tok/s | 4GB
Qwen/Qwen3-4B | FP16 | Fits comfortably | 114.67 tok/s | 9GB
Qwen/Qwen3-30B-A3B-Instruct-2507 | Q4 | Fits comfortably | 170.80 tok/s | 15GB
kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | Fits comfortably | 219.99 tok/s | 4GB
kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | FP16 | Fits comfortably | 104.76 tok/s | 9GB
unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | FP16 | Fits comfortably | 114.71 tok/s | 17GB
meta-llama/Llama-3.3-70B-Instruct | Q4 | Not supported | 114.96 tok/s | 34GB
Qwen/Qwen3-14B | FP16 | Fits comfortably | 83.71 tok/s | 29GB
Qwen/Qwen2.5-0.5B | Q4 | Fits comfortably | 327.73 tok/s | 3GB
Qwen/Qwen2.5-0.5B | Q8 | Fits comfortably | 219.03 tok/s | 5GB
Qwen/Qwen2.5-0.5B | FP16 | Fits comfortably | 108.81 tok/s | 11GB
meta-llama/Llama-3.1-70B-Instruct | Q4 | Not supported | 98.72 tok/s | 34GB
meta-llama/Llama-3.1-70B-Instruct | Q8 | Not supported | 80.46 tok/s | 68GB
meta-llama/Llama-3.1-70B-Instruct | FP16 | Not supported | 43.25 tok/s | 137GB
microsoft/phi-2 | FP16 | Fits comfortably | 114.46 tok/s | 15GB
deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | Fits comfortably | 281.33 tok/s | 4GB
deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | Fits comfortably | 209.59 tok/s | 7GB
deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | FP16 | Fits comfortably | 108.62 tok/s | 15GB
deepseek-ai/DeepSeek-R1-Distill-Llama-8B | FP16 | Fits comfortably | 112.22 tok/s | 17GB
HuggingFaceTB/SmolLM2-135M | Q4 | Fits comfortably | 285.99 tok/s | 4GB
HuggingFaceTB/SmolLM2-135M | Q8 | Fits comfortably | 222.22 tok/s | 7GB
HuggingFaceTB/SmolLM2-135M | FP16 | Fits comfortably | 105.03 tok/s | 15GB
zai-org/GLM-4.6-FP8 | Q4 | Fits comfortably | 293.47 tok/s | 4GB
zai-org/GLM-4.6-FP8 | Q8 | Fits comfortably | 191.61 tok/s | 7GB
microsoft/DialoGPT-medium | Q8 | Fits comfortably | 223.10 tok/s | 7GB
microsoft/DialoGPT-medium | FP16 | Fits comfortably | 109.47 tok/s | 15GB
Qwen/Qwen2-0.5B | Q4 | Fits comfortably | 318.24 tok/s | 3GB
Qwen/Qwen2-0.5B | Q8 | Fits comfortably | 206.04 tok/s | 5GB
deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | Fits comfortably | 379.84 tok/s | 2GB
deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | Fits comfortably | 260.06 tok/s | 3GB
deepseek-ai/deepseek-coder-1.3b-instruct | FP16 | Fits comfortably | 138.21 tok/s | 6GB
microsoft/phi-4 | Q4 | Fits comfortably | 299.03 tok/s | 4GB
deepseek-ai/DeepSeek-V3.1 | Q8 | Fits comfortably | 190.75 tok/s | 7GB
deepseek-ai/DeepSeek-V3.1 | FP16 | Fits comfortably | 118.91 tok/s | 15GB
meta-llama/Llama-3.1-8B | Q4 | Fits comfortably | 314.94 tok/s | 4GB
Qwen/Qwen2.5-32B-Instruct | Q4 | Fits comfortably | 96.11 tok/s | 16GB
Qwen/Qwen2.5-32B-Instruct | Q8 | Not supported | 75.86 tok/s | 33GB
Qwen/Qwen2.5-32B-Instruct | FP16 | Not supported | 37.47 tok/s | 66GB
openai-community/gpt2 | Q8 | Fits comfortably | 208.40 tok/s | 7GB

Note: Performance estimates are calculated. Real results may vary. Methodology · Submit real data
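The verdict column reduces to comparing required VRAM against the card's 32GB. A minimal sketch of that check, consistent with the rows above (this is an inference from the table, not the site's actual code):

```python
GPU_VRAM_GB = 32  # RTX 5090

def verdict(required_gb: float, available_gb: float = GPU_VRAM_GB) -> str:
    """Return the compatibility verdict for a model at a given quantization."""
    return "Fits comfortably" if required_gb <= available_gb else "Not supported"

print(verdict(34))  # Not supported (e.g. Llama-3.3-70B at Q4 needs 34GB)
print(verdict(16))  # Fits comfortably (e.g. Qwen2.5-32B at Q4 needs 16GB)
```

Note that a model can "fit" by this check yet leave little room for KV cache at long contexts; Qwen3-14B at FP16 (29GB of 32GB) is an example where context length will be tightly constrained.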

Alternative GPUs

Explore how these GPUs stack up for local inference workloads:

  • RTX 5070 (12GB)
  • RTX 4060 Ti 16GB
  • RX 6800 XT (16GB)
  • RTX 4070 Super (12GB)
  • RTX 3080 (10GB)