© 2025 localai.computer. Hardware recommendations for running AI models locally.



Quick Answer: The RTX 4090 offers 24GB of VRAM at a $1,599 MSRP, with street prices currently near MSRP. It delivers approximately 237 tokens/sec (estimated) on meta-llama/Llama-3.2-1B and typically draws 450W under load.

RTX 4090

By NVIDIA · Released 2022-10 · MSRP $1,599.00

RTX 4090 remains the go-to GPU for local AI workloads. It runs every mainstream 70B model, sustains the fastest consumer inference speeds, and anchors premium builds that scale to production deployments.

Specs snapshot

Key hardware metrics for AI workloads:

  • VRAM: 24GB
  • Cores: 16,384
  • TDP: 450W
  • Architecture: Ada Lovelace

Where to Buy

Buy directly on Amazon with fast shipping and reliable customer service.


💡 Not ready to buy? Try cloud GPUs first

Test RTX 4090 performance in the cloud before investing in hardware. Pay by the hour with no commitment.

  • Vast.ai — from $0.20/hr
  • RunPod — from $0.30/hr
  • Lambda Labs — enterprise-grade
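A quick break-even calculation makes the rent-vs-buy tradeoff concrete. This sketch assumes the $1,599 purchase price and the $0.20/hr Vast.ai starting rate quoted above, and ignores electricity, resale value, and cloud storage fees:

```python
# Rough break-even between renting a cloud GPU and buying an RTX 4090.
# Assumptions: $1,599 purchase price, $0.20/hr cloud rate (both from this
# page); electricity, resale value, and cloud egress fees are ignored.
PURCHASE_PRICE = 1599.00  # RTX 4090 MSRP in USD
CLOUD_RATE = 0.20         # USD per hour (Vast.ai starting rate)

break_even_hours = PURCHASE_PRICE / CLOUD_RATE
print(f"Break-even: {break_even_hours:.0f} cloud hours")
# ~7,995 hours: at 2 hours/day that is roughly 11 years of use,
# so occasional experimenters may be better served by renting.
```

Heavy daily users cross break-even much sooner, which is why the purchase still makes sense for sustained local workloads.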

AI benchmarks

All figures below are auto-generated Q4 estimates, sorted by throughput.

Model | Quantization | Tokens/sec (est.) | VRAM used
meta-llama/Llama-3.2-1B | Q4 | 236.93 | 1GB
tencent/HunyuanOCR | Q4 | 236.31 | 1GB
deepseek-ai/DeepSeek-OCR | Q4 | 234.17 | 2GB
google-bert/bert-base-uncased | Q4 | 233.28 | 1GB
google/gemma-3-1b-it | Q4 | 232.46 | 1GB
meta-llama/Llama-Guard-3-1B | Q4 | 231.79 | 1GB
bigcode/starcoder2-3b | Q4 | 231.78 | 2GB
unsloth/gemma-3-1b-it | Q4 | 231.53 | 1GB
Qwen/Qwen2.5-3B-Instruct | Q4 | 230.47 | 2GB
Qwen/Qwen2.5-3B | Q4 | 229.75 | 2GB
WeiboAI/VibeThinker-1.5B | Q4 | 229.43 | 1GB
ibm-granite/granite-3.3-2b-instruct | Q4 | 226.87 | 1GB
nari-labs/Dia2-2B | Q4 | 225.67 | 2GB
google/embeddinggemma-300m | Q4 | 224.35 | 1GB
google/gemma-2-2b-it | Q4 | 223.94 | 1GB
apple/OpenELM-1_1B-Instruct | Q4 | 222.41 | 1GB
inference-net/Schematron-3B | Q4 | 220.50 | 2GB
meta-llama/Llama-3.2-3B-Instruct | Q4 | 212.85 | 2GB
unsloth/Llama-3.2-3B-Instruct | Q4 | 211.46 | 2GB
unsloth/Llama-3.2-1B-Instruct | Q4 | 208.71 | 1GB
meta-llama/Llama-3.2-3B | Q4 | 206.65 | 2GB
google/gemma-2b | Q4 | 206.11 | 1GB
TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | 205.12 | 1GB
facebook/sam3 | Q4 | 204.34 | 1GB
context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | 204.34 | 2GB
LiquidAI/LFM2-1.2B | Q4 | 200.86 | 1GB
deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | 200.35 | 2GB
google-t5/t5-3b | Q4 | 200.19 | 2GB
ibm-research/PowerMoE-3b | Q4 | 199.86 | 2GB
Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | 197.47 | 3GB
MiniMaxAI/MiniMax-M2 | Q4 | 197.36 | 4GB
petals-team/StableBeluga2 | Q4 | 197.28 | 4GB
Qwen/Qwen3-4B | Q4 | 197.19 | 2GB
Qwen/Qwen3-Embedding-8B | Q4 | 197.13 | 4GB
allenai/OLMo-2-0425-1B | Q4 | 196.96 | 1GB
Qwen/Qwen3-0.6B-Base | Q4 | 196.79 | 3GB
Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | 196.43 | 2GB
openai-community/gpt2 | Q4 | 196.28 | 4GB
dicta-il/dictalm2.0-instruct | Q4 | 196.11 | 4GB
openai-community/gpt2-xl | Q4 | 195.90 | 4GB
zai-org/GLM-4.5-Air | Q4 | 195.68 | 4GB
deepseek-ai/DeepSeek-V3 | Q4 | 195.65 | 4GB
Qwen/Qwen2.5-7B-Instruct | Q4 | 194.82 | 4GB
meta-llama/Llama-3.2-1B-Instruct | Q4 | 194.55 | 1GB
Qwen/Qwen2.5-7B | Q4 | 193.74 | 4GB
deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | 193.51 | 4GB
swiss-ai/Apertus-8B-Instruct-2509 | Q4 | 193.22 | 4GB
black-forest-labs/FLUX.2-dev | Q4 | 193.09 | 4GB
Qwen/Qwen3-8B | Q4 | 192.14 | 4GB
Qwen/Qwen3-1.7B | Q4 | 191.89 | 4GB

Note: Performance estimates are calculated. Real results may vary. Methodology · Submit real data
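If you want to submit real data instead of estimates, a wall-clock measurement around a generation call is all a tokens/sec figure requires. A minimal sketch, assuming `generate` is a stand-in for whatever inference backend you run (llama.cpp, vLLM, transformers, ...):

```python
import time

def measure_tokens_per_sec(generate, n_tokens=128):
    """Time one generation call and return decode throughput.

    `generate(n_tokens)` is a placeholder: any callable that produces
    n_tokens tokens through your inference backend of choice.
    """
    start = time.perf_counter()
    generate(n_tokens)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

# Usage with a dummy backend that "generates" at a fixed pace:
def fake_generate(n):
    time.sleep(n * 0.001)  # pretend each token takes 1 ms

print(f"{measure_tokens_per_sec(fake_generate):.0f} tok/s")
# prints a value a little under 1000 tok/s (timer overhead)
```

For comparable numbers, measure decode-only throughput (exclude prompt processing) and average over several runs after a warm-up pass.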

Model compatibility

Model | Quantization | Verdict | Est. speed (tok/s) | VRAM needed (24GB available)
microsoft/Phi-3.5-mini-instruct | Q4 | Fits comfortably | 178.16 | 2GB
facebook/sam3 | Q8 | Fits comfortably | 151.19 | 1GB
AI-MO/Kimina-Prover-72B | FP16 | Not supported | 14.46 | 141GB
lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q4 | Fits comfortably | 89.81 | 15GB
ai-forever/ruGPT-3.5-13B | Q4 | Fits comfortably | 132.50 | 7GB
Qwen/Qwen2.5-72B-Instruct | Q4 | Not supported | 34.18 | 36GB
ibm-research/PowerMoE-3b | Q8 | Fits comfortably | 144.77 | 3GB
IlyaGusev/saiga_llama3_8b | Q4 | Fits comfortably | 169.42 | 4GB
Qwen/Qwen2-7B-Instruct | FP16 | Fits comfortably | 62.57 | 15GB
Qwen/Qwen3-4B-Base | Q4 | Fits comfortably | 178.74 | 2GB
Qwen/Qwen3-4B-Base | Q8 | Fits comfortably | 123.29 | 4GB
Qwen/Qwen2.5-14B | Q4 | Fits comfortably | 140.37 | 7GB
Qwen/Qwen2.5-14B | Q8 | Fits comfortably | 93.37 | 14GB
trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q4 | Fits comfortably | 184.39 | 4GB
trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q8 | Fits comfortably | 121.22 | 7GB
apple/OpenELM-1_1B-Instruct | FP16 | Fits comfortably | 82.25 | 2GB
AI-MO/Kimina-Prover-72B | Q4 | Not supported | 34.32 | 35GB
moonshotai/Kimi-K2-Thinking | Q8 | Not supported | 45.69 | 978GB
moonshotai/Kimi-K2-Thinking | FP16 | Not supported | 23.80 | 1956GB
deepseek-ai/DeepSeek-Math-V2 | Q8 | Not supported | 18.46 | 766GB
deepseek-ai/DeepSeek-Math-V2 | FP16 | Not supported | 10.24 | 1532GB
Tongyi-MAI/Z-Image-Turbo | Q4 | Fits comfortably | 178.19 | 4GB
Tongyi-MAI/Z-Image-Turbo | Q8 | Fits comfortably | 127.07 | 8GB
Tongyi-MAI/Z-Image-Turbo | FP16 | Fits comfortably | 74.91 | 16GB
tencent/HunyuanOCR | Q8 | Fits comfortably | 164.42 | 2GB
facebook/sam3 | FP16 | Fits comfortably | 88.62 | 2GB
MiniMaxAI/MiniMax-VL-01 | Q4 | Not supported | 22.57 | 256GB
MiniMaxAI/MiniMax-VL-01 | Q8 | Not supported | 13.66 | 511GB
MiniMaxAI/MiniMax-VL-01 | FP16 | Not supported | 7.41 | 1021GB
MiniMaxAI/MiniMax-M1-40k | Q4 | Not supported | 20.51 | 255GB
MiniMaxAI/MiniMax-M1-40k | Q8 | Not supported | 16.20 | 510GB
MiniMaxAI/MiniMax-M1-40k | FP16 | Not supported | 8.61 | 1020GB
WeiboAI/VibeThinker-1.5B | Q4 | Fits comfortably | 229.43 | 1GB
WeiboAI/VibeThinker-1.5B | Q8 | Fits comfortably | 162.28 | 2GB
WeiboAI/VibeThinker-1.5B | FP16 | Fits comfortably | 87.59 | 4GB
tencent/HunyuanVideo-1.5 | Q4 | Fits comfortably | 174.11 | 4GB
tencent/HunyuanVideo-1.5 | Q8 | Fits comfortably | 132.51 | 8GB
tencent/HunyuanVideo-1.5 | FP16 | Fits comfortably | 68.66 | 16GB
nari-labs/Dia2-2B | Q4 | Fits comfortably | 225.67 | 2GB
nari-labs/Dia2-2B | Q8 | Fits comfortably | 159.64 | 3GB
nari-labs/Dia2-2B | FP16 | Fits comfortably | 84.00 | 5GB
unsloth/Llama-3.2-1B-Instruct | Q4 | Fits comfortably | 208.71 | 1GB
unsloth/Llama-3.2-1B-Instruct | Q8 | Fits comfortably | 145.22 | 1GB
unsloth/Llama-3.2-1B-Instruct | FP16 | Fits comfortably | 86.07 | 2GB
lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q4 | Fits comfortably | 183.82 | 4GB
lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q8 | Fits comfortably | 133.08 | 9GB
lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | FP16 | Fits comfortably | 68.16 | 17GB
Qwen/Qwen3-235B-A22B | Q4 | Not supported | 22.33 | 115GB
ibm-granite/granite-docling-258M | FP16 | Fits comfortably | 74.85 | 15GB
google/gemma-2-9b-it | FP16 | Fits comfortably | 47.99 | 20GB

Note: Performance estimates are calculated. Real results may vary. Methodology · Submit real data
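The fit verdicts follow from a simple weights-size calculation: parameter count times bytes per weight for the quantization level, plus headroom for KV cache and activations. A rough sketch of the idea (the 20% overhead factor and 90% fit threshold are illustrative assumptions, not this site's exact methodology):

```python
def estimate_vram_gb(params_billion, quant):
    """Approximate VRAM needed to hold model weights plus ~20% overhead
    for KV cache and activations. Bytes/weight are nominal values;
    real GGUF quants (Q4_K_M, Q8_0, ...) differ slightly."""
    bytes_per_weight = {"Q4": 0.5, "Q8": 1.0, "FP16": 2.0}[quant]
    return params_billion * bytes_per_weight * 1.2

def verdict(params_billion, quant, vram_gb=24):
    """Fit verdict, leaving 10% of VRAM free as a safety margin."""
    need = estimate_vram_gb(params_billion, quant)
    return "Fits comfortably" if need <= vram_gb * 0.9 else "Not supported"

print(verdict(7, "Q4"))     # 7B at Q4 needs ~4.2GB -> Fits comfortably
print(verdict(72, "FP16"))  # 72B at FP16 needs ~173GB -> Not supported
```

This is why a 72B model is out of reach even at Q4 on a single 24GB card, while 7B–14B models fit with room to spare.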

GPU FAQs

Data-backed answers pulled from community benchmarks, manufacturer specs, and live pricing.

What throughput does RTX 4090 deliver on modern 30B models?

Community llama.cpp benchmarks of the ubergarm/Qwen3-30B-A3B-GGUF build show the RTX 4090 sustaining roughly 150–160 tokens/sec with CUDA kernels, keeping decode latency under 7 ms per token.

Source: Reddit – /r/LocalLLaMA (mq59v1k)

Can a single RTX 4090 keep Llama 3.1 70B Q4 fully in VRAM?

No. Builders loading Llama 3.1 70B Q4_K_M report roughly half the tensor pages spilling to system RAM on a 24 GB 4090, which drags throughput because PCIe becomes the bottleneck. Multi-GPU setups or 48 GB cards avoid the spill.

Source: Reddit – /r/LocalLLaMA (mqcouez)
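When a model spills, llama.cpp's `--n-gpu-layers` (`-ngl`) flag lets you keep as many transformer layers on the GPU as fit and stream the rest from system RAM. A back-of-the-envelope sketch for picking that number (the ~40GB weight size and 80-layer count for Llama 3.1 70B Q4_K_M are approximate assumptions):

```python
def gpu_layers_that_fit(model_gb, n_layers, vram_gb=24, reserve_gb=3):
    """How many transformer layers fit on the GPU, reserving some VRAM
    for KV cache, CUDA context, and the output head."""
    per_layer_gb = model_gb / n_layers
    usable = max(vram_gb - reserve_gb, 0)
    return min(n_layers, int(usable / per_layer_gb))

# Llama 3.1 70B Q4_K_M: ~40GB of weights across 80 layers (approximate)
print(gpu_layers_that_fit(40, 80))  # -> 42; pass e.g. -ngl 42 to llama.cpp
```

Roughly half the layers staying on the GPU matches the community reports above: the offloaded half runs at system-RAM speed, so end-to-end throughput drops sharply.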

How many large models can RTX 4090 run simultaneously?

Power users running multi-4090 racks note that a single 4090 comfortably hosts one 32B-class model; parallel agents or MoE workloads need tensor parallelism across multiple GPUs to keep speeds high.

Source: Reddit – /r/LocalLLaMA (mqwkgv3)

What power supply and connectors does RTX 4090 require?

NVIDIA rates the RTX 4090 at 450 W board power and recommends at least an 850 W PSU with the 16-pin 12VHPWR connector to maintain headroom for AI workloads.

Source: TechPowerUp – RTX 4090 Specs

What is the current street price for RTX 4090?

Our price tracker (Nov 2025) shows Amazon at $1,599 in stock.

Source: Supabase price tracker snapshot – 2025-11-03

Alternative GPUs

Explore how each of these stacks up for local inference workloads:

  • RTX 4080 — 16GB
  • RTX 4070 Ti — 12GB
  • RTX 3090 — 24GB
  • NVIDIA RTX 6000 Ada — 48GB
  • RX 7900 XTX — 24GB

Compare RTX 4090

Side-by-side VRAM, throughput, efficiency, and pricing benchmarks:

  • RTX 4090 vs RTX 4080
  • RTX 4090 vs RTX 3090
  • RTX 4090 vs RX 7900 XTX