© 2025 localai.computer. Hardware recommendations for running AI models locally.

ℹ️We earn from qualifying purchases through affiliate links at no extra cost to you. This supports our free content and research.


Quick Answer: The RTX 5090 offers 32GB of VRAM and currently sells from around $5,196.32. It delivers an estimated 395 tokens/sec on WeiboAI/VibeThinker-1.5B (Q4) and typically draws 575W under load.

RTX 5090

By NVIDIA · Released 2025-01 · MSRP $1,999.00

This GPU offers reliable throughput for local AI workloads. Pair it with the right model quantization to hit your target tokens/sec, and watch the prices below to catch a good deal.

Buy on Amazon: $5,196.32 · View Benchmarks

Specs snapshot

Key hardware metrics for AI workloads.

  • VRAM: 32GB
  • CUDA cores: 21,760
  • TDP: 575W
  • Architecture: Blackwell
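A quick way to see how the 32GB of VRAM maps onto model sizes is the usual rule of thumb: weights take roughly 0.5 bytes per parameter at Q4, 1 at Q8, and 2 at FP16, plus some headroom for KV cache and activations. A minimal sketch (the byte-per-parameter figures and the flat 1.5GB overhead are rough assumptions, not measured values):

```python
# Rough VRAM estimate for local inference: quantized weights plus a
# fixed overhead for KV cache and activations. A rule of thumb only.
BYTES_PER_PARAM = {"Q4": 0.5, "Q8": 1.0, "FP16": 2.0}  # approximate

def estimate_vram_gb(params_billions: float, quant: str, overhead_gb: float = 1.5) -> float:
    """Approximate VRAM (GB) needed to run a model at a given quantization."""
    weights_gb = params_billions * BYTES_PER_PARAM[quant]
    return round(weights_gb + overhead_gb, 1)

print(estimate_vram_gb(8, "Q4"))    # an 8B model at Q4: ~5.5GB, easy fit in 32GB
print(estimate_vram_gb(70, "Q4"))   # a 70B model at Q4: ~36.5GB, over the 32GB limit
```

This matches the pattern in the compatibility data below: 70B-class models fail to fit even at Q4, while anything up to roughly 30B at Q4 has room to spare.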

Where to Buy

Buy directly on Amazon with fast shipping and reliable customer service.

Amazon
$5,196.32
Buy on Amazon


💡 Not ready to buy? Try cloud GPUs first

Test RTX 5090 performance in the cloud before investing in hardware. Pay by the hour with no commitment.

  • Vast.ai: from $0.20/hr
  • RunPod: from $0.30/hr
  • Lambda Labs: enterprise-grade
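To put those hourly rates in perspective, you can work out how many rental hours the card's street price buys. A small sketch using the example rates quoted above (real rates fluctuate and exclude storage/egress fees):

```python
# Hours of cloud GPU rental equal to the RTX 5090's street price.
card_price = 5196.32  # USD, price quoted above

for provider, rate in [("Vast.ai", 0.20), ("RunPod", 0.30)]:
    hours = card_price / rate
    print(f"{provider}: ~{hours:,.0f} hours before buying breaks even")
```

At $0.20/hr that is roughly 26,000 hours, so buying only pays off for sustained, heavy use.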

AI benchmarks

All figures below are auto-generated estimates at Q4 quantization.

Model | Quantization | Tokens/sec (estimated) | VRAM used
WeiboAI/VibeThinker-1.5B | Q4 | 395.40 | 1GB
Qwen/Qwen2.5-3B | Q4 | 391.74 | 2GB
apple/OpenELM-1_1B-Instruct | Q4 | 387.61 | 1GB
google-t5/t5-3b | Q4 | 385.75 | 2GB
ibm-research/PowerMoE-3b | Q4 | 385.66 | 2GB
unsloth/gemma-3-1b-it | Q4 | 385.45 | 1GB
deepseek-ai/DeepSeek-OCR | Q4 | 382.42 | 2GB
deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | 379.84 | 2GB
google-bert/bert-base-uncased | Q4 | 378.88 | 1GB
unsloth/Llama-3.2-3B-Instruct | Q4 | 378.27 | 2GB
tencent/HunyuanOCR | Q4 | 376.53 | 1GB
google/gemma-2b | Q4 | 367.39 | 1GB
allenai/OLMo-2-0425-1B | Q4 | 363.30 | 1GB
google/embeddinggemma-300m | Q4 | 362.84 | 1GB
bigcode/starcoder2-3b | Q4 | 361.76 | 2GB
LiquidAI/LFM2-1.2B | Q4 | 359.25 | 1GB
meta-llama/Llama-Guard-3-1B | Q4 | 358.27 | 1GB
ibm-granite/granite-3.3-2b-instruct | Q4 | 357.16 | 1GB
context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | 353.40 | 2GB
unsloth/Llama-3.2-1B-Instruct | Q4 | 348.23 | 1GB
Qwen/Qwen2.5-3B-Instruct | Q4 | 347.31 | 2GB
google/gemma-3-1b-it | Q4 | 342.38 | 1GB
TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | 338.05 | 1GB
meta-llama/Llama-3.2-3B-Instruct | Q4 | 337.52 | 2GB
facebook/sam3 | Q4 | 333.26 | 1GB
meta-llama/Llama-3.2-1B-Instruct | Q4 | 331.77 | 1GB
trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | 329.61 | 4GB
EleutherAI/pythia-70m-deduped | Q4 | 329.40 | 4GB
meta-llama/Llama-3.2-1B | Q4 | 329.17 | 1GB
allenai/Olmo-3-7B-Think | Q4 | 328.99 | 4GB
deepseek-ai/DeepSeek-V3 | Q4 | 328.42 | 4GB
inference-net/Schematron-3B | Q4 | 328.13 | 2GB
openai-community/gpt2 | Q4 | 328.10 | 4GB
nari-labs/Dia2-2B | Q4 | 327.85 | 2GB
Qwen/Qwen2.5-0.5B | Q4 | 327.73 | 3GB
microsoft/Phi-3.5-mini-instruct | Q4 | 326.64 | 4GB
google/gemma-2-2b-it | Q4 | 326.36 | 1GB
Qwen/Qwen3-4B | Q4 | 326.12 | 2GB
Qwen/Qwen2.5-1.5B | Q4 | 325.98 | 3GB
meta-llama/Llama-3.2-3B | Q4 | 325.57 | 2GB
Qwen/Qwen3-Embedding-0.6B | Q4 | 325.05 | 3GB
microsoft/Phi-3.5-vision-instruct | Q4 | 323.72 | 4GB
Qwen/Qwen3-Embedding-4B | Q4 | 323.49 | 2GB
meta-llama/Meta-Llama-3-8B | Q4 | 323.23 | 4GB
petals-team/StableBeluga2 | Q4 | 323.08 | 4GB
microsoft/VibeVoice-1.5B | Q4 | 322.58 | 3GB
microsoft/Phi-3.5-mini-instruct | Q4 | 322.16 | 2GB
Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | 322.15 | 3GB
unsloth/Meta-Llama-3.1-8B-Instruct | Q4 | 322.14 | 4GB
ibm-granite/granite-3.3-8b-instruct | Q4 | 319.72 | 4GB
4GB

Note: Performance estimates are calculated. Real results may vary. Methodology · Submit real data
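Since the speeds above are estimates, the most useful follow-up is to measure real throughput on your own machine and submit it. A minimal sketch of that measurement, assuming your inference runtime exposes some generate callable that returns a token count (the callable here is a hypothetical placeholder; adapt it to llama-cpp-python, Ollama, or whatever you run):

```python
import time

def measure_tokens_per_sec(generate, prompt: str, max_tokens: int = 256) -> float:
    """Time one generation call and return tokens/sec.

    `generate` stands in for whatever your runtime exposes and is
    assumed to return the number of tokens it actually produced.
    """
    start = time.perf_counter()
    n_tokens = generate(prompt, max_tokens)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed
```

For stable numbers, run a warm-up generation first and average several timed runs, since the first call usually includes model-load and graph-compile time.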

Model compatibility

Model | Quantization | Verdict | Estimated speed | VRAM needed (32GB available)
trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q4 | Fits comfortably | 314.53 tok/s | 4GB
trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | Fits comfortably | 199.31 tok/s | 7GB
trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | FP16 | Fits comfortably | 108.02 tok/s | 15GB
Qwen/Qwen3-4B-Instruct-2507 | Q4 | Fits comfortably | 276.78 tok/s | 2GB
Qwen/Qwen3-4B-Instruct-2507 | FP16 | Fits comfortably | 115.36 tok/s | 9GB
meta-llama/Llama-3.2-1B-Instruct | Q4 | Fits comfortably | 331.77 tok/s | 1GB
meta-llama/Llama-3.2-1B-Instruct | Q8 | Fits comfortably | 262.31 tok/s | 1GB
meta-llama/Llama-3.2-1B-Instruct | FP16 | Fits comfortably | 141.96 tok/s | 2GB
Qwen/Qwen2.5-3B-Instruct | Q4 | Fits comfortably | 347.31 tok/s | 2GB
meta-llama/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 262.24 tok/s | 3GB
meta-llama/Llama-3.2-3B-Instruct | FP16 | Fits comfortably | 135.14 tok/s | 6GB
vikhyatk/moondream2 | Q4 | Fits comfortably | 293.81 tok/s | 4GB
Qwen/Qwen3-4B | FP16 | Fits comfortably | 114.67 tok/s | 9GB
Qwen/Qwen3-30B-A3B-Instruct-2507 | Q4 | Fits comfortably | 170.80 tok/s | 15GB
kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | Fits comfortably | 219.99 tok/s | 4GB
kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | FP16 | Fits comfortably | 104.76 tok/s | 9GB
unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | FP16 | Fits comfortably | 114.71 tok/s | 17GB
meta-llama/Llama-3.3-70B-Instruct | Q4 | Not supported | 114.96 tok/s | 34GB
Qwen/Qwen3-14B | FP16 | Fits comfortably | 83.71 tok/s | 29GB
Qwen/Qwen2.5-0.5B | Q4 | Fits comfortably | 327.73 tok/s | 3GB
Qwen/Qwen2.5-0.5B | Q8 | Fits comfortably | 219.03 tok/s | 5GB
Qwen/Qwen2.5-0.5B | FP16 | Fits comfortably | 108.81 tok/s | 11GB
meta-llama/Llama-3.1-70B-Instruct | Q4 | Not supported | 98.72 tok/s | 34GB
meta-llama/Llama-3.1-70B-Instruct | Q8 | Not supported | 80.46 tok/s | 68GB
meta-llama/Llama-3.1-70B-Instruct | FP16 | Not supported | 43.25 tok/s | 137GB
microsoft/phi-2 | FP16 | Fits comfortably | 114.46 tok/s | 15GB
deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | Fits comfortably | 281.33 tok/s | 4GB
deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | Fits comfortably | 209.59 tok/s | 7GB
deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | FP16 | Fits comfortably | 108.62 tok/s | 15GB
deepseek-ai/DeepSeek-R1-Distill-Llama-8B | FP16 | Fits comfortably | 112.22 tok/s | 17GB
HuggingFaceTB/SmolLM2-135M | Q4 | Fits comfortably | 285.99 tok/s | 4GB
HuggingFaceTB/SmolLM2-135M | Q8 | Fits comfortably | 222.22 tok/s | 7GB
HuggingFaceTB/SmolLM2-135M | FP16 | Fits comfortably | 105.03 tok/s | 15GB
zai-org/GLM-4.6-FP8 | Q4 | Fits comfortably | 293.47 tok/s | 4GB
zai-org/GLM-4.6-FP8 | Q8 | Fits comfortably | 191.61 tok/s | 7GB
microsoft/DialoGPT-medium | Q8 | Fits comfortably | 223.10 tok/s | 7GB
microsoft/DialoGPT-medium | FP16 | Fits comfortably | 109.47 tok/s | 15GB
Qwen/Qwen2-0.5B | Q4 | Fits comfortably | 318.24 tok/s | 3GB
Qwen/Qwen2-0.5B | Q8 | Fits comfortably | 206.04 tok/s | 5GB
deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | Fits comfortably | 379.84 tok/s | 2GB
deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | Fits comfortably | 260.06 tok/s | 3GB
deepseek-ai/deepseek-coder-1.3b-instruct | FP16 | Fits comfortably | 138.21 tok/s | 6GB
microsoft/phi-4 | Q4 | Fits comfortably | 299.03 tok/s | 4GB
deepseek-ai/DeepSeek-V3.1 | Q8 | Fits comfortably | 190.75 tok/s | 7GB
deepseek-ai/DeepSeek-V3.1 | FP16 | Fits comfortably | 118.91 tok/s | 15GB
meta-llama/Llama-3.1-8B | Q4 | Fits comfortably | 314.94 tok/s | 4GB
Qwen/Qwen2.5-32B-Instruct | Q4 | Fits comfortably | 96.11 tok/s | 16GB
Qwen/Qwen2.5-32B-Instruct | Q8 | Not supported | 75.86 tok/s | 33GB
Qwen/Qwen2.5-32B-Instruct | FP16 | Not supported | 37.47 tok/s | 66GB
openai-community/gpt2 | Q8 | Fits comfortably | 208.40 tok/s | 7GB

Note: Performance estimates are calculated. Real results may vary. Methodology · Submit real data
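The verdict column reduces to comparing required VRAM against the card's 32GB. A minimal sketch of that check, consistent with the rows above (this is an inference from the table, not the site's actual code):

```python
GPU_VRAM_GB = 32  # RTX 5090

def verdict(required_gb: float, available_gb: float = GPU_VRAM_GB) -> str:
    """Return the compatibility verdict for a model at a given quantization."""
    return "Fits comfortably" if required_gb <= available_gb else "Not supported"

print(verdict(34))  # Not supported (e.g. Llama-3.3-70B at Q4 needs 34GB)
print(verdict(16))  # Fits comfortably (e.g. Qwen2.5-32B at Q4 needs 16GB)
```

Note that a model can "fit" by this check yet leave little room for KV cache at long contexts; Qwen3-14B at FP16 (29GB of 32GB) is an example where context length will be tightly constrained.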

Alternative GPUs

Explore how these GPUs stack up for local inference workloads:

  • RTX 5070 (12GB)
  • RTX 4060 Ti 16GB
  • RX 6800 XT (16GB)
  • RTX 4070 Super (12GB)
  • RTX 3080 (10GB)