Quick Answer: The AMD Radeon Pro W7900 offers 48GB of VRAM and launched at an MSRP of $3,999. It is estimated to deliver about 189 tokens/sec on unsloth/Llama-3.2-1B-Instruct at Q4 quantization, and it has a 295W TDP.

AMD Radeon Pro W7900

By AMD · Released 2023-04 · MSRP $3,999.00

This GPU offers reliable throughput for local AI workloads. Pair it with the right model quantization to hit your desired tokens/sec, and monitor prices below to catch the best deal.
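
As a rough illustration of why quantization matters when fitting a model into 48GB, the sketch below estimates weight memory from parameter count and bits per weight. The nominal bit widths (4, 8, 16) and the function name are assumptions for illustration, not this page's methodology. Real runtimes add KV cache, activations, and format overhead, so treat the result as a lower bound.

```python
# Rule-of-thumb estimate of weight memory only (no KV cache or runtime overhead).
# The bit widths below are nominal assumptions, not measured file sizes.
BITS_PER_WEIGHT = {"Q4": 4, "Q8": 8, "FP16": 16}


def estimate_weight_gb(params_billions: float, quant: str) -> float:
    """Return approximate weight memory in GB (1 GB = 1e9 bytes)."""
    bytes_total = params_billions * 1e9 * BITS_PER_WEIGHT[quant] / 8
    return bytes_total / 1e9


if __name__ == "__main__":
    # Example: a hypothetical 70B-parameter model at each quantization level.
    for quant in ("Q4", "Q8", "FP16"):
        gb = estimate_weight_gb(70, quant)
        print(f"70B @ {quant}: ~{gb:.0f} GB of weights")
```

Under these assumptions, only the Q4 variant of a 70B model stays below 48GB, which is the same direction as the verdicts in the compatibility table further down.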

Specs snapshot

Key hardware metrics for AI workloads.

VRAM: 48GB

Cores: 6,144

TDP: 295W

Architecture: RDNA 3

Where to Buy

Buy directly on Amazon with fast shipping and reliable customer service.

No purchase links available yet. Try the Amazon search results to find this GPU.

💡 Not ready to buy? Try cloud GPUs first

Test AMD Radeon Pro W7900 performance in the cloud before investing in hardware. Pay by the hour with no commitment.

Vast.ai: from $0.20/hr · RunPod: from $0.30/hr · Lambda Labs: enterprise-grade
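
To put the hourly rates in context, the arithmetic below estimates how many rental hours would cost as much as the card's $3,999 MSRP. It is only a sketch: it assumes the listed "from" prices apply, and it ignores electricity, resale value, and whether a given provider actually offers this GPU. The function name is hypothetical.

```python
# Break-even sketch: rental hours whose cost equals the purchase price.
MSRP_USD = 3999.00  # MSRP from the listing above

# "From" hourly rates quoted above; actual rates vary by provider and GPU.
HOURLY_RATES_USD = {"Vast.ai": 0.20, "RunPod": 0.30}


def break_even_hours(purchase_price: float, hourly_rate: float) -> float:
    """Return the number of rental hours that cost as much as buying outright."""
    if hourly_rate <= 0:
        raise ValueError("hourly_rate must be positive")
    return purchase_price / hourly_rate


if __name__ == "__main__":
    for provider, rate in HOURLY_RATES_USD.items():
        hours = break_even_hours(MSRP_USD, rate)
        print(f"{provider}: ~{hours:,.0f} hours at ${rate:.2f}/hr")
```

If expected usage is well below the break-even figure, renting first is the cheaper way to test a workload.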

AI benchmarks

Model	Quantization	Tokens/sec (estimated, auto-generated)	VRAM used
unsloth/Llama-3.2-1B-Instruct	Q4	188.67 tok/s	1GB
google/gemma-3-1b-it	Q4	185.50 tok/s	1GB
google/gemma-2-2b-it	Q4	185.22 tok/s	1GB
meta-llama/Llama-3.2-1B-Instruct	Q4	184.38 tok/s	1GB
nari-labs/Dia2-2B	Q4	183.43 tok/s	2GB
apple/OpenELM-1_1B-Instruct	Q4	183.43 tok/s	1GB
tencent/HunyuanOCR	Q4	183.02 tok/s	1GB
google/embeddinggemma-300m	Q4	182.98 tok/s	1GB
deepseek-ai/deepseek-coder-1.3b-instruct	Q4	182.68 tok/s	2GB
ibm-granite/granite-3.3-2b-instruct	Q4	182.67 tok/s	1GB
unsloth/Llama-3.2-3B-Instruct	Q4	182.40 tok/s	2GB
google/gemma-2b	Q4	181.63 tok/s	1GB
meta-llama/Llama-3.2-1B	Q4	180.89 tok/s	1GB
context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16	Q4	179.95 tok/s	2GB
meta-llama/Llama-3.2-3B	Q4	179.90 tok/s	2GB
inference-net/Schematron-3B	Q4	179.64 tok/s	2GB
TinyLlama/TinyLlama-1.1B-Chat-v1.0	Q4	179.04 tok/s	1GB
LiquidAI/LFM2-1.2B	Q4	177.72 tok/s	1GB
meta-llama/Llama-3.2-3B-Instruct	Q4	177.41 tok/s	2GB
WeiboAI/VibeThinker-1.5B	Q4	175.44 tok/s	1GB
deepseek-ai/DeepSeek-OCR	Q4	175.29 tok/s	2GB
unsloth/gemma-3-1b-it	Q4	174.94 tok/s	1GB
Qwen/Qwen2.5-3B-Instruct	Q4	172.81 tok/s	2GB
ibm-research/PowerMoE-3b	Q4	167.73 tok/s	2GB
bigcode/starcoder2-3b	Q4	166.29 tok/s	2GB
meta-llama/Llama-Guard-3-1B	Q4	165.70 tok/s	1GB
facebook/sam3	Q4	164.88 tok/s	1GB
google-t5/t5-3b	Q4	164.77 tok/s	2GB
google-bert/bert-base-uncased	Q4	162.62 tok/s	1GB
allenai/OLMo-2-0425-1B	Q4	161.61 tok/s	1GB
Alibaba-NLP/gte-Qwen2-1.5B-instruct	Q4	159.91 tok/s	3GB
bigscience/bloomz-560m	Q4	159.85 tok/s	4GB
microsoft/Phi-3-mini-128k-instruct	Q4	159.56 tok/s	4GB
Qwen/Qwen3-4B-Thinking-2507	Q4	159.52 tok/s	2GB
Qwen/Qwen3-8B	Q4	159.12 tok/s	4GB
meta-llama/Llama-3.1-8B-Instruct	Q4	158.88 tok/s	4GB
Qwen/Qwen3-4B-Instruct-2507	Q4	158.51 tok/s	2GB
Qwen/Qwen2.5-3B	Q4	157.89 tok/s	2GB
deepseek-ai/DeepSeek-R1-0528	Q4	157.69 tok/s	4GB
GSAI-ML/LLaDA-8B-Base	Q4	157.58 tok/s	4GB
vikhyatk/moondream2	Q4	157.43 tok/s	4GB
openai-community/gpt2-large	Q4	157.26 tok/s	4GB
Qwen/Qwen2-7B-Instruct	Q4	157.08 tok/s	4GB
zai-org/GLM-4.5-Air	Q4	156.19 tok/s	4GB
numind/NuExtract-1.5	Q4	156.05 tok/s	4GB
Qwen/Qwen2.5-Coder-7B-Instruct	Q4	155.98 tok/s	4GB
kaitchup/Phi-3-mini-4k-instruct-gptq-4bit	Q4	155.95 tok/s	2GB
allenai/Olmo-3-7B-Think	Q4	155.68 tok/s	4GB
meta-llama/Llama-2-7b-chat-hf	Q4	155.32 tok/s	4GB
ibm-granite/granite-3.3-8b-instruct	Q4	155.23 tok/s	4GB

Note: Performance figures are calculated estimates, not measured results. Real results may vary.
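
Because the figures above are estimates, it can help to measure throughput on your own setup. The sketch below is a minimal timing harness, not tied to any particular inference library: `generate` is a hypothetical callable that you would wrap around your runtime's generation call and that returns the number of tokens it produced. `fake_generate` is a stand-in so the example runs on its own.

```python
import time
from typing import Callable


def measure_tokens_per_sec(generate: Callable[[], int], runs: int = 3) -> float:
    """Call `generate` several times and return overall tokens per second.

    `generate` is assumed to run one generation and return the token count.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    total_tokens = 0
    total_seconds = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        total_tokens += generate()
        total_seconds += time.perf_counter() - start
    return total_tokens / total_seconds


def fake_generate() -> int:
    """Stand-in for a real model call: sleeps briefly and reports 2 tokens."""
    time.sleep(0.01)
    return 2


if __name__ == "__main__":
    rate = measure_tokens_per_sec(fake_generate, runs=3)
    print(f"Measured throughput: {rate:.1f} tok/s")
```

Dividing total tokens by total time, rather than averaging per-run rates, keeps short runs from skewing the result.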

Model compatibility

Model	Quantization	Verdict	Estimated speed	VRAM needed (48GB available)
mlx-community/gpt-oss-20b-MXFP4-Q8	Q8	Fits comfortably	60.94 tok/s	20GB
Qwen/Qwen2.5-Coder-32B-Instruct	FP16	Not supported	20.55 tok/s	67GB
distilbert/distilgpt2	FP16	Fits comfortably	50.10 tok/s	15GB
Qwen/Qwen3-32B	FP16	Not supported	18.39 tok/s	66GB
Qwen/Qwen3-1.7B	Q4	Fits comfortably	136.51 tok/s	4GB
Qwen/Qwen3-1.7B	Q8	Fits comfortably	107.68 tok/s	7GB
Qwen/Qwen2.5-14B-Instruct	Q8	Fits comfortably	76.25 tok/s	14GB
Qwen/Qwen2.5-14B-Instruct	FP16	Fits comfortably	39.50 tok/s	29GB
Qwen/Qwen3-Embedding-8B	FP16	Fits comfortably	59.44 tok/s	17GB
Qwen/Qwen3-14B	FP16	Fits comfortably	40.14 tok/s	29GB
Qwen/Qwen2.5-0.5B	Q4	Fits comfortably	150.50 tok/s	3GB
Qwen/Qwen2.5-0.5B	Q8	Fits comfortably	109.53 tok/s	5GB
Qwen/Qwen2.5-0.5B	FP16	Fits comfortably	55.04 tok/s	11GB
meta-llama/Llama-3.1-70B-Instruct	Q4	Fits comfortably	52.25 tok/s	34GB
meta-llama/Llama-3.1-70B-Instruct	Q8	Not supported	35.51 tok/s	68GB
meta-llama/Llama-3.1-70B-Instruct	FP16	Not supported	19.18 tok/s	137GB
microsoft/phi-2	Q4	Fits comfortably	147.62 tok/s	4GB
microsoft/phi-2	Q8	Fits comfortably	106.25 tok/s	7GB
microsoft/phi-2	FP16	Fits comfortably	57.96 tok/s	15GB
deepseek-ai/DeepSeek-R1-Distill-Qwen-7B	Q4	Fits comfortably	150.98 tok/s	4GB
deepseek-ai/DeepSeek-R1-Distill-Qwen-7B	Q8	Fits comfortably	92.59 tok/s	7GB
deepseek-ai/DeepSeek-R1-Distill-Qwen-7B	FP16	Fits comfortably	52.50 tok/s	15GB
meta-llama/Llama-2-7b-hf	Q4	Fits comfortably	130.97 tok/s	4GB
meta-llama/Llama-2-7b-hf	Q8	Fits comfortably	93.61 tok/s	7GB
meta-llama/Llama-2-7b-hf	FP16	Fits comfortably	53.05 tok/s	15GB
deepseek-ai/DeepSeek-R1-Distill-Llama-8B	Q4	Fits comfortably	133.85 tok/s	4GB
deepseek-ai/DeepSeek-R1-Distill-Llama-8B	Q8	Fits comfortably	111.13 tok/s	9GB
deepseek-ai/DeepSeek-R1-Distill-Llama-8B	FP16	Fits comfortably	50.77 tok/s	17GB
deepseek-ai/DeepSeek-V3.1	FP16	Fits comfortably	60.29 tok/s	15GB
Qwen/Qwen2.5-Math-1.5B	Q4	Fits comfortably	150.18 tok/s	3GB
Qwen/Qwen2.5-Math-1.5B	Q8	Fits comfortably	93.45 tok/s	5GB
Qwen/Qwen2.5-Math-1.5B	FP16	Fits comfortably	51.42 tok/s	11GB
trl-internal-testing/tiny-random-LlamaForCausalLM	Q4	Fits comfortably	137.31 tok/s	4GB
trl-internal-testing/tiny-random-LlamaForCausalLM	Q8	Fits comfortably	96.44 tok/s	7GB
trl-internal-testing/tiny-random-LlamaForCausalLM	FP16	Fits comfortably	59.01 tok/s	15GB
openai-community/gpt2-medium	Q4	Fits comfortably	138.47 tok/s	4GB
openai-community/gpt2-medium	Q8	Fits comfortably	95.97 tok/s	7GB
openai-community/gpt2-medium	FP16	Fits comfortably	59.10 tok/s	15GB
Qwen/Qwen3-0.6B-Base	Q4	Fits comfortably	134.77 tok/s	3GB
Qwen/Qwen3-0.6B-Base	FP16	Fits comfortably	60.26 tok/s	13GB
Qwen/Qwen3-4B-Base	Q8	Fits comfortably	94.48 tok/s	4GB
unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit	Q8	Fits comfortably	34.51 tok/s	33GB
EleutherAI/gpt-neo-125m	Q8	Fits comfortably	100.18 tok/s	7GB
EleutherAI/gpt-neo-125m	FP16	Fits comfortably	60.37 tok/s	15GB
meta-llama/Llama-3.2-3B	Q4	Fits comfortably	179.90 tok/s	2GB
meta-llama/Llama-3.2-3B	FP16	Fits comfortably	61.97 tok/s	6GB
lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit	Q4	Fits comfortably	152.80 tok/s	2GB
lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit	Q8	Fits comfortably	91.65 tok/s	4GB
lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit	FP16	Fits comfortably	51.61 tok/s	9GB
google/gemma-2-27b-it	Q4	Fits comfortably	87.66 tok/s	14GB

Note: Performance figures are calculated estimates, not measured results. Real results may vary.
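
The verdicts in the compatibility table are consistent with a simple comparison of required VRAM against the card's 48GB. The sketch below reproduces that comparison for a few rows taken from the table. The single-threshold rule and the function name are assumptions; the site may apply extra headroom or additional tiers that are not visible here.

```python
AVAILABLE_VRAM_GB = 48  # Radeon Pro W7900, per the specs snapshot


def fit_verdict(required_gb: float, available_gb: float = AVAILABLE_VRAM_GB) -> str:
    """Return a verdict by comparing required VRAM with available VRAM.

    Assumed rule: anything at or below the available VRAM fits.
    """
    return "Fits comfortably" if required_gb <= available_gb else "Not supported"


# (model, quantization, VRAM needed in GB), copied from the table above.
ROWS = [
    ("meta-llama/Llama-3.1-70B-Instruct", "Q4", 34),
    ("meta-llama/Llama-3.1-70B-Instruct", "Q8", 68),
    ("Qwen/Qwen3-32B", "FP16", 66),
    ("google/gemma-2-27b-it", "Q4", 14),
]

if __name__ == "__main__":
    for model, quant, required_gb in ROWS:
        print(f"{model} ({quant}): {fit_verdict(required_gb)}")
```

In practice it is worth leaving some margin below 48GB for KV cache and runtime overhead, especially at long context lengths.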

Alternative GPUs

RTX 5070: 12GB

RTX 4060 Ti 16GB: 16GB

RX 6800 XT: 16GB

RTX 4070 Super: 12GB

RTX 3080: 10GB

AMD Radeon Pro W7900

Check availability

By AMDReleased 2023-04MSRP $3,999.00

This GPU offers reliable throughput for local AI workloads. Pair it with the right model quantization to hit your desired tokens/sec, and monitor prices below to catch the best deal.

Search on Amazon View Benchmarks

Specs snapshot

Key hardware metrics for AI workloads.

VRAM48GB

Cores6,144

TDP295W

ArchitectureRDNA 3

Where to Buy

Buy directly on Amazon with fast shipping and reliable customer service.

No purchase links available yet. Try the Amazon search results to find this GPU.

💡 Not ready to buy? Try cloud GPUs first

Test AMD Radeon Pro W7900 performance in the cloud before investing in hardware. Pay by the hour with no commitment.

Vast.aifrom $0.20/hr RunPodfrom $0.30/hr Lambda Labsenterprise-grade

AI benchmarks

Model	Quantization	Tokens/sec	VRAM used
unsloth/Llama-3.2-1B-Instruct	Q4	188.67 tok/sEstimated Auto-generated benchmark	1GB
google/gemma-3-1b-it	Q4	185.50 tok/sEstimated Auto-generated benchmark	1GB
google/gemma-2-2b-it	Q4	185.22 tok/sEstimated Auto-generated benchmark	1GB
meta-llama/Llama-3.2-1B-Instruct	Q4	184.38 tok/sEstimated Auto-generated benchmark	1GB
nari-labs/Dia2-2B	Q4	183.43 tok/sEstimated Auto-generated benchmark	2GB
apple/OpenELM-1_1B-Instruct	Q4	183.43 tok/sEstimated Auto-generated benchmark	1GB
tencent/HunyuanOCR	Q4	183.02 tok/sEstimated Auto-generated benchmark	1GB
google/embeddinggemma-300m	Q4	182.98 tok/sEstimated Auto-generated benchmark	1GB
deepseek-ai/deepseek-coder-1.3b-instruct	Q4	182.68 tok/sEstimated Auto-generated benchmark	2GB
ibm-granite/granite-3.3-2b-instruct	Q4	182.67 tok/sEstimated Auto-generated benchmark	1GB
unsloth/Llama-3.2-3B-Instruct	Q4	182.40 tok/sEstimated Auto-generated benchmark	2GB
google/gemma-2b	Q4	181.63 tok/sEstimated Auto-generated benchmark	1GB
meta-llama/Llama-3.2-1B	Q4	180.89 tok/sEstimated Auto-generated benchmark	1GB
context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16	Q4	179.95 tok/sEstimated Auto-generated benchmark	2GB
meta-llama/Llama-3.2-3B	Q4	179.90 tok/sEstimated Auto-generated benchmark	2GB
inference-net/Schematron-3B	Q4	179.64 tok/sEstimated Auto-generated benchmark	2GB
TinyLlama/TinyLlama-1.1B-Chat-v1.0	Q4	179.04 tok/sEstimated Auto-generated benchmark	1GB
LiquidAI/LFM2-1.2B	Q4	177.72 tok/sEstimated Auto-generated benchmark	1GB
meta-llama/Llama-3.2-3B-Instruct	Q4	177.41 tok/sEstimated Auto-generated benchmark	2GB
WeiboAI/VibeThinker-1.5B	Q4	175.44 tok/sEstimated Auto-generated benchmark	1GB
deepseek-ai/DeepSeek-OCR	Q4	175.29 tok/sEstimated Auto-generated benchmark	2GB
unsloth/gemma-3-1b-it	Q4	174.94 tok/sEstimated Auto-generated benchmark	1GB
Qwen/Qwen2.5-3B-Instruct	Q4	172.81 tok/sEstimated Auto-generated benchmark	2GB
ibm-research/PowerMoE-3b	Q4	167.73 tok/sEstimated Auto-generated benchmark	2GB
bigcode/starcoder2-3b	Q4	166.29 tok/sEstimated Auto-generated benchmark	2GB
meta-llama/Llama-Guard-3-1B	Q4	165.70 tok/sEstimated Auto-generated benchmark	1GB
facebook/sam3	Q4	164.88 tok/sEstimated Auto-generated benchmark	1GB
google-t5/t5-3b	Q4	164.77 tok/sEstimated Auto-generated benchmark	2GB
google-bert/bert-base-uncased	Q4	162.62 tok/sEstimated Auto-generated benchmark	1GB
allenai/OLMo-2-0425-1B	Q4	161.61 tok/sEstimated Auto-generated benchmark	1GB
Alibaba-NLP/gte-Qwen2-1.5B-instruct	Q4	159.91 tok/sEstimated Auto-generated benchmark	3GB
bigscience/bloomz-560m	Q4	159.85 tok/sEstimated Auto-generated benchmark	4GB
microsoft/Phi-3-mini-128k-instruct	Q4	159.56 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen3-4B-Thinking-2507	Q4	159.52 tok/sEstimated Auto-generated benchmark	2GB
Qwen/Qwen3-8B	Q4	159.12 tok/sEstimated Auto-generated benchmark	4GB
meta-llama/Llama-3.1-8B-Instruct	Q4	158.88 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen3-4B-Instruct-2507	Q4	158.51 tok/sEstimated Auto-generated benchmark	2GB
Qwen/Qwen2.5-3B	Q4	157.89 tok/sEstimated Auto-generated benchmark	2GB
deepseek-ai/DeepSeek-R1-0528	Q4	157.69 tok/sEstimated Auto-generated benchmark	4GB
GSAI-ML/LLaDA-8B-Base	Q4	157.58 tok/sEstimated Auto-generated benchmark	4GB
vikhyatk/moondream2	Q4	157.43 tok/sEstimated Auto-generated benchmark	4GB
openai-community/gpt2-large	Q4	157.26 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen2-7B-Instruct	Q4	157.08 tok/sEstimated Auto-generated benchmark	4GB
zai-org/GLM-4.5-Air	Q4	156.19 tok/sEstimated Auto-generated benchmark	4GB
numind/NuExtract-1.5	Q4	156.05 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen2.5-Coder-7B-Instruct	Q4	155.98 tok/sEstimated Auto-generated benchmark	4GB
kaitchup/Phi-3-mini-4k-instruct-gptq-4bit	Q4	155.95 tok/sEstimated Auto-generated benchmark	2GB
allenai/Olmo-3-7B-Think	Q4	155.68 tok/sEstimated Auto-generated benchmark	4GB
meta-llama/Llama-2-7b-chat-hf	Q4	155.32 tok/sEstimated Auto-generated benchmark	4GB
ibm-granite/granite-3.3-8b-instruct	Q4	155.23 tok/sEstimated Auto-generated benchmark	4GB

unsloth/Llama-3.2-1B-Instruct

1GB

188.67 tok/sEstimated

Auto-generated benchmark

google/gemma-3-1b-it

1GB

185.50 tok/sEstimated

Auto-generated benchmark

google/gemma-2-2b-it

1GB

185.22 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.2-1B-Instruct

1GB

184.38 tok/sEstimated

Auto-generated benchmark

nari-labs/Dia2-2B

2GB

183.43 tok/sEstimated

Auto-generated benchmark

apple/OpenELM-1_1B-Instruct

1GB

183.43 tok/sEstimated

Auto-generated benchmark

tencent/HunyuanOCR

1GB

183.02 tok/sEstimated

Auto-generated benchmark

google/embeddinggemma-300m

1GB

182.98 tok/sEstimated

Auto-generated benchmark

deepseek-ai/deepseek-coder-1.3b-instruct

2GB

182.68 tok/sEstimated

Auto-generated benchmark

ibm-granite/granite-3.3-2b-instruct

1GB

182.67 tok/sEstimated

Auto-generated benchmark

unsloth/Llama-3.2-3B-Instruct

2GB

182.40 tok/sEstimated

Auto-generated benchmark

google/gemma-2b

1GB

181.63 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.2-1B

1GB

180.89 tok/sEstimated

Auto-generated benchmark

context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16

2GB

179.95 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.2-3B

2GB

179.90 tok/sEstimated

Auto-generated benchmark

inference-net/Schematron-3B

2GB

179.64 tok/sEstimated

Auto-generated benchmark

TinyLlama/TinyLlama-1.1B-Chat-v1.0

1GB

179.04 tok/sEstimated

Auto-generated benchmark

LiquidAI/LFM2-1.2B

1GB

177.72 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.2-3B-Instruct

2GB

177.41 tok/sEstimated

Auto-generated benchmark

WeiboAI/VibeThinker-1.5B

1GB

175.44 tok/sEstimated

Auto-generated benchmark

deepseek-ai/DeepSeek-OCR

2GB

175.29 tok/sEstimated

Auto-generated benchmark

unsloth/gemma-3-1b-it

1GB

174.94 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-3B-Instruct

2GB

172.81 tok/sEstimated

Auto-generated benchmark

ibm-research/PowerMoE-3b

2GB

167.73 tok/sEstimated

Auto-generated benchmark

bigcode/starcoder2-3b

2GB

166.29 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-Guard-3-1B

1GB

165.70 tok/sEstimated

Auto-generated benchmark

facebook/sam3

1GB

164.88 tok/sEstimated

Auto-generated benchmark

google-t5/t5-3b

2GB

164.77 tok/sEstimated

Auto-generated benchmark

google-bert/bert-base-uncased

1GB

162.62 tok/sEstimated

Auto-generated benchmark

allenai/OLMo-2-0425-1B

1GB

161.61 tok/sEstimated

Auto-generated benchmark

Alibaba-NLP/gte-Qwen2-1.5B-instruct

3GB

159.91 tok/sEstimated

Auto-generated benchmark

bigscience/bloomz-560m

4GB

159.85 tok/sEstimated

Auto-generated benchmark

microsoft/Phi-3-mini-128k-instruct

4GB

159.56 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen3-4B-Thinking-2507

2GB

159.52 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen3-8B

4GB

159.12 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.1-8B-Instruct

4GB

158.88 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen3-4B-Instruct-2507

2GB

158.51 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-3B

2GB

157.89 tok/sEstimated

Auto-generated benchmark

deepseek-ai/DeepSeek-R1-0528

4GB

157.69 tok/sEstimated

Auto-generated benchmark

GSAI-ML/LLaDA-8B-Base

4GB

157.58 tok/sEstimated

Auto-generated benchmark

vikhyatk/moondream2

4GB

157.43 tok/sEstimated

Auto-generated benchmark

openai-community/gpt2-large

4GB

157.26 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2-7B-Instruct

4GB

157.08 tok/sEstimated

Auto-generated benchmark

zai-org/GLM-4.5-Air

4GB

156.19 tok/sEstimated

Auto-generated benchmark

numind/NuExtract-1.5

4GB

156.05 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-Coder-7B-Instruct

4GB

155.98 tok/sEstimated

Auto-generated benchmark

kaitchup/Phi-3-mini-4k-instruct-gptq-4bit

2GB

155.95 tok/sEstimated

Auto-generated benchmark

allenai/Olmo-3-7B-Think

4GB

155.68 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-2-7b-chat-hf

4GB

155.32 tok/sEstimated

Auto-generated benchmark

ibm-granite/granite-3.3-8b-instruct

4GB

155.23 tok/sEstimated

Auto-generated benchmark

Note: Performance estimates are calculated. Real results may vary. Methodology · Submit real data

Model compatibility

Model	Quantization	Verdict	Estimated speed	VRAM needed
mlx-community/gpt-oss-20b-MXFP4-Q8	Q8	Fits comfortably	60.94 tok/sEstimated	20GB (have 48GB)
Qwen/Qwen2.5-Coder-32B-Instruct	FP16	Not supported	20.55 tok/sEstimated	67GB (have 48GB)
distilbert/distilgpt2	FP16	Fits comfortably	50.10 tok/sEstimated	15GB (have 48GB)
Qwen/Qwen3-32B	FP16	Not supported	18.39 tok/sEstimated	66GB (have 48GB)
Qwen/Qwen3-1.7B	Q4	Fits comfortably	136.51 tok/sEstimated	4GB (have 48GB)
Qwen/Qwen3-1.7B	Q8	Fits comfortably	107.68 tok/sEstimated	7GB (have 48GB)
Qwen/Qwen2.5-14B-Instruct	Q8	Fits comfortably	76.25 tok/sEstimated	14GB (have 48GB)
Qwen/Qwen2.5-14B-Instruct	FP16	Fits comfortably	39.50 tok/sEstimated	29GB (have 48GB)
Qwen/Qwen3-Embedding-8B	FP16	Fits comfortably	59.44 tok/sEstimated	17GB (have 48GB)
Qwen/Qwen3-14B	FP16	Fits comfortably	40.14 tok/sEstimated	29GB (have 48GB)
Qwen/Qwen2.5-0.5B	Q4	Fits comfortably	150.50 tok/sEstimated	3GB (have 48GB)
Qwen/Qwen2.5-0.5B	Q8	Fits comfortably	109.53 tok/sEstimated	5GB (have 48GB)
Qwen/Qwen2.5-0.5B	FP16	Fits comfortably	55.04 tok/sEstimated	11GB (have 48GB)
meta-llama/Llama-3.1-70B-Instruct	Q4	Fits comfortably	52.25 tok/sEstimated	34GB (have 48GB)
meta-llama/Llama-3.1-70B-Instruct	Q8	Not supported	35.51 tok/sEstimated	68GB (have 48GB)
meta-llama/Llama-3.1-70B-Instruct	FP16	Not supported	19.18 tok/sEstimated	137GB (have 48GB)
microsoft/phi-2	Q4	Fits comfortably	147.62 tok/sEstimated	4GB (have 48GB)
microsoft/phi-2	Q8	Fits comfortably	106.25 tok/s (estimated)	7GB (have 48GB)
microsoft/phi-2	FP16	Fits comfortably	57.96 tok/s (estimated)	15GB (have 48GB)
deepseek-ai/DeepSeek-R1-Distill-Qwen-7B	Q4	Fits comfortably	150.98 tok/s (estimated)	4GB (have 48GB)
deepseek-ai/DeepSeek-R1-Distill-Qwen-7B	Q8	Fits comfortably	92.59 tok/s (estimated)	7GB (have 48GB)
deepseek-ai/DeepSeek-R1-Distill-Qwen-7B	FP16	Fits comfortably	52.50 tok/s (estimated)	15GB (have 48GB)
meta-llama/Llama-2-7b-hf	Q4	Fits comfortably	130.97 tok/s (estimated)	4GB (have 48GB)
meta-llama/Llama-2-7b-hf	Q8	Fits comfortably	93.61 tok/s (estimated)	7GB (have 48GB)
meta-llama/Llama-2-7b-hf	FP16	Fits comfortably	53.05 tok/s (estimated)	15GB (have 48GB)
deepseek-ai/DeepSeek-R1-Distill-Llama-8B	Q4	Fits comfortably	133.85 tok/s (estimated)	4GB (have 48GB)
deepseek-ai/DeepSeek-R1-Distill-Llama-8B	Q8	Fits comfortably	111.13 tok/s (estimated)	9GB (have 48GB)
deepseek-ai/DeepSeek-R1-Distill-Llama-8B	FP16	Fits comfortably	50.77 tok/s (estimated)	17GB (have 48GB)
deepseek-ai/DeepSeek-V3.1	FP16	Fits comfortably	60.29 tok/s (estimated)	15GB (have 48GB)
Qwen/Qwen2.5-Math-1.5B	Q4	Fits comfortably	150.18 tok/s (estimated)	3GB (have 48GB)
Qwen/Qwen2.5-Math-1.5B	Q8	Fits comfortably	93.45 tok/s (estimated)	5GB (have 48GB)
Qwen/Qwen2.5-Math-1.5B	FP16	Fits comfortably	51.42 tok/s (estimated)	11GB (have 48GB)
trl-internal-testing/tiny-random-LlamaForCausalLM	Q4	Fits comfortably	137.31 tok/s (estimated)	4GB (have 48GB)
trl-internal-testing/tiny-random-LlamaForCausalLM	Q8	Fits comfortably	96.44 tok/s (estimated)	7GB (have 48GB)
trl-internal-testing/tiny-random-LlamaForCausalLM	FP16	Fits comfortably	59.01 tok/s (estimated)	15GB (have 48GB)
openai-community/gpt2-medium	Q4	Fits comfortably	138.47 tok/s (estimated)	4GB (have 48GB)
openai-community/gpt2-medium	Q8	Fits comfortably	95.97 tok/s (estimated)	7GB (have 48GB)
openai-community/gpt2-medium	FP16	Fits comfortably	59.10 tok/s (estimated)	15GB (have 48GB)
Qwen/Qwen3-0.6B-Base	Q4	Fits comfortably	134.77 tok/s (estimated)	3GB (have 48GB)
Qwen/Qwen3-0.6B-Base	FP16	Fits comfortably	60.26 tok/s (estimated)	13GB (have 48GB)
Qwen/Qwen3-4B-Base	Q8	Fits comfortably	94.48 tok/s (estimated)	4GB (have 48GB)
unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit	Q8	Fits comfortably	34.51 tok/s (estimated)	33GB (have 48GB)
EleutherAI/gpt-neo-125m	Q8	Fits comfortably	100.18 tok/s (estimated)	7GB (have 48GB)
EleutherAI/gpt-neo-125m	FP16	Fits comfortably	60.37 tok/s (estimated)	15GB (have 48GB)
meta-llama/Llama-3.2-3B	Q4	Fits comfortably	179.90 tok/s (estimated)	2GB (have 48GB)
meta-llama/Llama-3.2-3B	FP16	Fits comfortably	61.97 tok/s (estimated)	6GB (have 48GB)
lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit	Q4	Fits comfortably	152.80 tok/s (estimated)	2GB (have 48GB)
lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit	Q8	Fits comfortably	91.65 tok/s (estimated)	4GB (have 48GB)
lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit	FP16	Fits comfortably	51.61 tok/s (estimated)	9GB (have 48GB)
google/gemma-2-27b-it	Q4	Fits comfortably	87.66 tok/s (estimated)	14GB (have 48GB)

Note: Performance figures are calculated estimates, not measured benchmarks. Real results may vary.
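
The fit labels on this page appear to follow a simple comparison between the VRAM a model needs and the 48GB this card provides: entries at or below 48GB read "Fits comfortably", and entries above it read "Not supported". The sketch below reproduces that labeling. The `fit_status` function, the `Row` type, and the threshold rule are hypothetical, inferred from the listed entries rather than taken from the site's methodology; the sample figures are copied from entries on this page.

```python
from dataclasses import dataclass

GPU_VRAM_GB = 48  # Radeon Pro W7900 VRAM, per the specs snapshot on this page


@dataclass
class Row:
    model: str
    quant: str
    required_gb: int


def fit_status(required_gb: int, available_gb: int = GPU_VRAM_GB) -> str:
    # Assumed rule, inferred from the listed entries (not the site's published method):
    # anything that fits within available VRAM is labeled "Fits comfortably".
    if required_gb <= available_gb:
        return "Fits comfortably"
    return "Not supported"


# Sample entries copied from this page's estimates.
rows = [
    Row("google/gemma-2-27b-it", "Q4", 14),
    Row("meta-llama/Llama-3.1-70B-Instruct", "Q4", 34),
    Row("Qwen/Qwen3-32B", "FP16", 66),
    Row("meta-llama/Llama-3.1-70B-Instruct", "Q8", 68),
]

for row in rows:
    status = fit_status(row.required_gb)
    print(f"{row.model}\t{row.quant}\t{status}\t{row.required_gb}GB (have {GPU_VRAM_GB}GB)")
```

A real deployment would likely want headroom for the KV cache and runtime overhead, so treating a model that needs nearly all 48GB as a comfortable fit is an optimistic reading of this rule.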

Alternative GPUs

RTX 5070 · 12GB

RTX 4060 Ti 16GB · 16GB

RX 6800 XT · 16GB

RTX 4070 Super · 12GB

RTX 3080 · 10GB