Quick Answer: The RX 7700 XT offers 12GB of VRAM and starts around $399.99. Its estimated throughput is approximately 90 tokens/sec on deepseek-ai/DeepSeek-OCR at Q4 quantization, and it draws up to its rated 245W TDP under load.

RX 7700 XT

By AMD · Released 2023-09 · MSRP $449.00

This GPU offers reliable throughput for local AI workloads. Pair it with the right model quantization to hit your desired tokens/sec, and monitor prices below to catch the best deal.

Specs snapshot

Key hardware metrics for AI workloads.

VRAM: 12GB

Cores: 3,456

TDP: 245W

Architecture: RDNA 3
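
As a rough guide to choosing a quantization for the 12GB of VRAM listed above, the weights of a model occupy about (parameter count × bits per weight ÷ 8) bytes. The sketch below is an illustration, not the method this page uses for its estimates. The function name and the flat 4/8/16 bits per weight are assumptions; real quantized formats often use slightly more bits per weight, and the KV cache and runtime buffers need additional memory on top.

```python
# Weights-only memory estimate. Assumption: exactly 4, 8 and 16 bits per
# weight for Q4, Q8 and FP16; KV cache and runtime overhead are NOT included.
BITS_PER_WEIGHT = {"Q4": 4, "Q8": 8, "FP16": 16}


def weight_gb(params_billion: float, quant: str) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params_billion * BITS_PER_WEIGHT[quant] / 8


if __name__ == "__main__":
    vram_gb = 12  # RX 7700 XT
    for quant in BITS_PER_WEIGHT:
        size = weight_gb(8, quant)  # a hypothetical 8B-parameter model
        fits = "fits" if size <= vram_gb else "does not fit"
        print(f"8B at {quant}: ~{size:.0f} GB of weights, {fits} in {vram_gb} GB")
```

Under these assumptions an 8B model needs roughly 4 GB at Q4, 8 GB at Q8 and 16 GB at FP16, so only the FP16 variant exceeds this card's VRAM.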

Where to Buy

Buy directly on Amazon with fast shipping and reliable customer service.

Amazon: $399.99

💡 Not ready to buy? Try cloud GPUs first

Test RX 7700 XT performance in the cloud before investing in hardware. Pay by the hour with no commitment.

Vast.ai: from $0.20/hr · RunPod: from $0.30/hr · Lambda Labs: enterprise-grade

AI benchmarks

Model	Quantization	Tokens/sec (estimated, auto-generated)	VRAM used
deepseek-ai/DeepSeek-OCR	Q4	89.90 tok/s	2GB
ibm-research/PowerMoE-3b	Q4	87.60 tok/s	2GB
google-t5/t5-3b	Q4	87.46 tok/s	2GB
Qwen/Qwen2.5-3B	Q4	87.31 tok/s	2GB
ibm-granite/granite-3.3-2b-instruct	Q4	86.63 tok/s	1GB
meta-llama/Llama-3.2-3B	Q4	86.55 tok/s	2GB
facebook/sam3	Q4	85.73 tok/s	1GB
google/gemma-3-1b-it	Q4	85.24 tok/s	1GB
unsloth/Llama-3.2-1B-Instruct	Q4	84.38 tok/s	1GB
context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16	Q4	84.35 tok/s	2GB
allenai/OLMo-2-0425-1B	Q4	83.68 tok/s	1GB
meta-llama/Llama-3.2-1B-Instruct	Q4	83.65 tok/s	1GB
google/gemma-2-2b-it	Q4	82.36 tok/s	1GB
meta-llama/Llama-3.2-3B-Instruct	Q4	82.29 tok/s	2GB
apple/OpenELM-1_1B-Instruct	Q4	82.19 tok/s	1GB
deepseek-ai/deepseek-coder-1.3b-instruct	Q4	81.98 tok/s	2GB
nari-labs/Dia2-2B	Q4	80.86 tok/s	2GB
Qwen/Qwen2.5-3B-Instruct	Q4	79.75 tok/s	2GB
google-bert/bert-base-uncased	Q4	79.64 tok/s	1GB
meta-llama/Llama-Guard-3-1B	Q4	78.44 tok/s	1GB
unsloth/Llama-3.2-3B-Instruct	Q4	78.13 tok/s	2GB
LiquidAI/LFM2-1.2B	Q4	77.50 tok/s	1GB
inference-net/Schematron-3B	Q4	77.24 tok/s	2GB
unsloth/gemma-3-1b-it	Q4	76.25 tok/s	1GB
TinyLlama/TinyLlama-1.1B-Chat-v1.0	Q4	75.82 tok/s	1GB
meta-llama/Llama-3.2-1B	Q4	75.51 tok/s	1GB
tencent/HunyuanOCR	Q4	75.50 tok/s	1GB
bigcode/starcoder2-3b	Q4	75.29 tok/s	2GB
WeiboAI/VibeThinker-1.5B	Q4	74.86 tok/s	1GB
google/gemma-2b	Q4	74.50 tok/s	1GB
google/embeddinggemma-300m	Q4	73.98 tok/s	1GB
Qwen/Qwen3-8B-FP8	Q4	73.96 tok/s	4GB
Qwen/Qwen3-0.6B-Base	Q4	73.80 tok/s	3GB
Qwen/Qwen3-1.7B	Q4	73.76 tok/s	4GB
microsoft/Phi-3.5-vision-instruct	Q4	73.64 tok/s	4GB
HuggingFaceH4/zephyr-7b-beta	Q4	73.62 tok/s	4GB
deepseek-ai/DeepSeek-R1	Q4	73.33 tok/s	4GB
vikhyatk/moondream2	Q4	73.11 tok/s	4GB
Qwen/Qwen2.5-7B-Instruct	Q4	73.02 tok/s	4GB
numind/NuExtract-1.5	Q4	72.80 tok/s	4GB
swiss-ai/Apertus-8B-Instruct-2509	Q4	72.70 tok/s	4GB
skt/kogpt2-base-v2	Q4	72.64 tok/s	4GB
Qwen/Qwen2-7B-Instruct	Q4	72.63 tok/s	4GB
GSAI-ML/LLaDA-8B-Base	Q4	72.54 tok/s	4GB
Qwen/Qwen2-0.5B-Instruct	Q4	72.53 tok/s	3GB
openai-community/gpt2	Q4	72.35 tok/s	4GB
Qwen/Qwen3-8B-Base	Q4	72.33 tok/s	4GB
trl-internal-testing/tiny-LlamaForCausalLM-3.2	Q4	72.29 tok/s	4GB
Qwen/Qwen-Image-Edit-2509	Q4	72.08 tok/s	4GB
Qwen/Qwen3-Reranker-0.6B	Q4	71.73 tok/s	3GB

Note: Performance figures are calculated estimates, not measured results. Real results may vary.
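
To put a tokens/sec figure in practical terms, it can be converted into the time needed for a reply of a given length and, together with the 245W TDP, into an upper bound on energy per token. The sketch below uses the estimated 89.90 tok/s from the first table row. The function names and the 500-token reply length are illustrative assumptions, and treating TDP as the actual power draw is a simplification.

```python
# Convert an estimated throughput into latency and an energy upper bound.
# Assumptions: steady-state generation, and power draw equal to the rated TDP.

def seconds_to_generate(num_tokens: int, tokens_per_sec: float) -> float:
    """Time to generate num_tokens at a constant token rate."""
    return num_tokens / tokens_per_sec


def joules_per_token(power_watts: float, tokens_per_sec: float) -> float:
    """Energy per token: watts are joules per second, divided by tokens per second."""
    return power_watts / tokens_per_sec


if __name__ == "__main__":
    est_tok_per_sec = 89.90  # estimated, deepseek-ai/DeepSeek-OCR at Q4
    tdp_watts = 245          # RX 7700 XT rated TDP
    reply_tokens = 500       # hypothetical reply length

    t = seconds_to_generate(reply_tokens, est_tok_per_sec)
    e = joules_per_token(tdp_watts, est_tok_per_sec)
    print(f"{reply_tokens} tokens take about {t:.1f} s")
    print(f"At full TDP, each token costs at most about {e:.2f} J")
```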

Model compatibility

Model	Quantization	Verdict	Estimated speed	VRAM needed (12GB available)
context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16	Q4	Fits comfortably	84.35 tok/s	2GB
distilbert/distilgpt2	Q4	Fits comfortably	71.00 tok/s	4GB
distilbert/distilgpt2	Q8	Fits comfortably	44.55 tok/s	7GB
meta-llama/Meta-Llama-3-8B-Instruct	Q4	Fits comfortably	69.70 tok/s	4GB
meta-llama/Meta-Llama-3-8B-Instruct	Q8	Fits comfortably	42.57 tok/s	9GB
meta-llama/Meta-Llama-3-8B-Instruct	FP16	Not supported	23.25 tok/s	17GB
Qwen/Qwen2.5-1.5B	Q4	Fits comfortably	64.14 tok/s	3GB
Qwen/Qwen2.5-1.5B	Q8	Fits comfortably	45.67 tok/s	5GB
unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit	Q4	Fits comfortably	69.59 tok/s	4GB
zai-org/GLM-4.6-FP8	Q8	Fits comfortably	44.46 tok/s	7GB
Qwen/Qwen2.5-32B-Instruct	Q8	Not supported	15.36 tok/s	33GB
Qwen/Qwen2.5-32B-Instruct	FP16	Not supported	8.22 tok/s	66GB
mistralai/Mistral-7B-v0.1	Q4	Fits comfortably	62.48 tok/s	4GB
HuggingFaceTB/SmolLM-135M	Q8	Fits comfortably	48.75 tok/s	7GB
HuggingFaceTB/SmolLM-135M	FP16	Not supported	23.14 tok/s	15GB
trl-internal-testing/tiny-random-LlamaForCausalLM	FP16	Not supported	23.73 tok/s	15GB
Qwen/Qwen3-14B-Base	Q8	Not supported	37.35 tok/s	14GB
Qwen/Qwen3-14B-Base	FP16	Not supported	20.49 tok/s	29GB
meta-llama/Llama-2-7b-chat-hf	Q4	Fits comfortably	65.29 tok/s	4GB
unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit	FP16	Not supported	8.39 tok/s	66GB
RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16	FP16	Not supported	9.07 tok/s	137GB
OpenPipe/Qwen3-14B-Instruct	Q4	Fits comfortably	49.80 tok/s	7GB
numind/NuExtract-1.5	FP16	Not supported	25.51 tok/s	15GB
Qwen/Qwen2.5-Coder-7B-Instruct	Q4	Fits comfortably	70.25 tok/s	4GB
Qwen/Qwen2.5-Coder-7B-Instruct	FP16	Not supported	27.31 tok/s	15GB
Qwen/Qwen3-4B-Thinking-2507	Q4	Fits comfortably	63.43 tok/s	2GB
ibm-granite/granite-docling-258M	Q8	Fits comfortably	45.40 tok/s	7GB
ibm-granite/granite-docling-258M	FP16	Not supported	26.45 tok/s	15GB
bigcode/starcoder2-3b	FP16	Fits comfortably	31.91 tok/s	6GB
unsloth/gemma-3-1b-it	Q8	Fits comfortably	53.13 tok/s	1GB
unsloth/gemma-3-1b-it	FP16	Fits comfortably	33.37 tok/s	2GB
nvidia/NVIDIA-Nemotron-Nano-9B-v2	FP16	Not supported	20.72 tok/s	19GB
openai/gpt-oss-safeguard-20b	Q8	Not supported	26.07 tok/s	22GB
openai/gpt-oss-safeguard-20b	FP16	Not supported	15.13 tok/s	44GB
MiniMaxAI/MiniMax-M1-40k	Q4	Not supported	8.88 tok/s	255GB
MiniMaxAI/MiniMax-M1-40k	Q8	Not supported	6.21 tok/s	510GB
mistralai/Mistral-Large-3-675B-Instruct-2512	Q4	Not supported	7.46 tok/s	378GB
mistralai/Mistral-Large-3-675B-Instruct-2512	Q8	Not supported	5.43 tok/s	755GB
mistralai/Mistral-Large-3-675B-Instruct-2512	FP16	Not supported	2.86 tok/s	1509GB
EssentialAI/rnj-1	Q4	Fits comfortably	49.63 tok/s	5GB
EssentialAI/rnj-1	Q8	Fits comfortably	38.56 tok/s	10GB
EssentialAI/rnj-1	FP16	Not supported	19.77 tok/s	19GB
Qwen/Qwen2.5-3B-Instruct	FP16	Fits comfortably	33.74 tok/s	6GB
bigscience/bloomz-560m	Q4	Fits comfortably	67.53 tok/s	4GB
bigscience/bloomz-560m	Q8	Fits comfortably	48.82 tok/s	7GB
context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16	FP16	Fits comfortably	30.24 tok/s	6GB
mistralai/Mistral-7B-Instruct-v0.2	Q4	Fits comfortably	63.53 tok/s	4GB
mistralai/Mistral-7B-Instruct-v0.2	Q8	Fits comfortably	50.28 tok/s	7GB
mistralai/Mistral-7B-Instruct-v0.2	FP16	Not supported	26.66 tok/s	15GB
bigscience/bloomz-560m	FP16	Not supported	27.40 tok/s	15GB

Note: Performance figures are calculated estimates, not measured results. Real results may vary.
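
The verdicts in the compatibility table are consistent with a simple comparison of the VRAM a model needs against the 12GB available. The page does not state its rule, so the sketch below is an assumption inferred from the rows shown: the function name, the less-than-or-equal threshold and the sample rows chosen are illustrative only.

```python
# Reproduce the table's verdict column from "VRAM needed" and the card's VRAM.
# Assumption: a model "fits" when its requirement does not exceed 12 GB.
GPU_VRAM_GB = 12  # RX 7700 XT


def verdict(vram_needed_gb: float, vram_available_gb: float = GPU_VRAM_GB) -> str:
    """Return the table's verdict label for a given VRAM requirement."""
    if vram_needed_gb <= vram_available_gb:
        return "Fits comfortably"
    return "Not supported"


if __name__ == "__main__":
    # (model, quantization, VRAM needed in GB), copied from the table above
    sample_rows = [
        ("meta-llama/Meta-Llama-3-8B-Instruct", "Q4", 4),
        ("meta-llama/Meta-Llama-3-8B-Instruct", "Q8", 9),
        ("meta-llama/Meta-Llama-3-8B-Instruct", "FP16", 17),
        ("EssentialAI/rnj-1", "Q8", 10),
        ("Qwen/Qwen3-14B-Base", "Q8", 14),
    ]
    for model, quant, needed in sample_rows:
        print(f"{model} {quant}: {needed}GB needed, {verdict(needed)}")
```

A requirement close to the 12GB limit leaves little room for the KV cache at long context lengths, so a row that passes this check can still run out of memory in practice.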

Alternative GPUs

RTX 5070: 12GB

RTX 4060 Ti 16GB: 16GB

RX 6800 XT: 16GB

RTX 4070 Super: 12GB

RTX 3080: 10GB

Quick Answer: RX 7700 XT offers 12GB VRAM and starts around $399.99. It delivers approximately 90 tokens/sec on deepseek-ai/DeepSeek-OCR. It typically draws 245W under load.

RX 7700 XT

Unknown

By AMDReleased 2023-09MSRP $449.00

This GPU offers reliable throughput for local AI workloads. Pair it with the right model quantization to hit your desired tokens/sec, and monitor prices below to catch the best deal.

Buy on Amazon - $399.99 View Benchmarks

Specs snapshot

Key hardware metrics for AI workloads.

VRAM12GB

Cores3,456

TDP245W

ArchitectureRDNA 3

Where to Buy

Buy directly on Amazon with fast shipping and reliable customer service.

AmazonUnknown

$399.99

Buy on Amazon

More Amazon options

Rotate out primary variants whenever validation flags an issue.

💡 Not ready to buy? Try cloud GPUs first

Test RX 7700 XT performance in the cloud before investing in hardware. Pay by the hour with no commitment.

Vast.aifrom $0.20/hr RunPodfrom $0.30/hr Lambda Labsenterprise-grade

AI benchmarks

Model	Quantization	Tokens/sec	VRAM used
deepseek-ai/DeepSeek-OCR	Q4	89.90 tok/sEstimated Auto-generated benchmark	2GB
ibm-research/PowerMoE-3b	Q4	87.60 tok/sEstimated Auto-generated benchmark	2GB
google-t5/t5-3b	Q4	87.46 tok/sEstimated Auto-generated benchmark	2GB
Qwen/Qwen2.5-3B	Q4	87.31 tok/sEstimated Auto-generated benchmark	2GB
ibm-granite/granite-3.3-2b-instruct	Q4	86.63 tok/sEstimated Auto-generated benchmark	1GB
meta-llama/Llama-3.2-3B	Q4	86.55 tok/sEstimated Auto-generated benchmark	2GB
facebook/sam3	Q4	85.73 tok/sEstimated Auto-generated benchmark	1GB
google/gemma-3-1b-it	Q4	85.24 tok/sEstimated Auto-generated benchmark	1GB
unsloth/Llama-3.2-1B-Instruct	Q4	84.38 tok/sEstimated Auto-generated benchmark	1GB
context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16	Q4	84.35 tok/sEstimated Auto-generated benchmark	2GB
allenai/OLMo-2-0425-1B	Q4	83.68 tok/sEstimated Auto-generated benchmark	1GB
meta-llama/Llama-3.2-1B-Instruct	Q4	83.65 tok/sEstimated Auto-generated benchmark	1GB
google/gemma-2-2b-it	Q4	82.36 tok/sEstimated Auto-generated benchmark	1GB
meta-llama/Llama-3.2-3B-Instruct	Q4	82.29 tok/sEstimated Auto-generated benchmark	2GB
apple/OpenELM-1_1B-Instruct	Q4	82.19 tok/sEstimated Auto-generated benchmark	1GB
deepseek-ai/deepseek-coder-1.3b-instruct	Q4	81.98 tok/sEstimated Auto-generated benchmark	2GB
nari-labs/Dia2-2B	Q4	80.86 tok/sEstimated Auto-generated benchmark	2GB
Qwen/Qwen2.5-3B-Instruct	Q4	79.75 tok/sEstimated Auto-generated benchmark	2GB
google-bert/bert-base-uncased	Q4	79.64 tok/sEstimated Auto-generated benchmark	1GB
meta-llama/Llama-Guard-3-1B	Q4	78.44 tok/sEstimated Auto-generated benchmark	1GB
unsloth/Llama-3.2-3B-Instruct	Q4	78.13 tok/sEstimated Auto-generated benchmark	2GB
LiquidAI/LFM2-1.2B	Q4	77.50 tok/sEstimated Auto-generated benchmark	1GB
inference-net/Schematron-3B	Q4	77.24 tok/sEstimated Auto-generated benchmark	2GB
unsloth/gemma-3-1b-it	Q4	76.25 tok/sEstimated Auto-generated benchmark	1GB
TinyLlama/TinyLlama-1.1B-Chat-v1.0	Q4	75.82 tok/sEstimated Auto-generated benchmark	1GB
meta-llama/Llama-3.2-1B	Q4	75.51 tok/sEstimated Auto-generated benchmark	1GB
tencent/HunyuanOCR	Q4	75.50 tok/sEstimated Auto-generated benchmark	1GB
bigcode/starcoder2-3b	Q4	75.29 tok/sEstimated Auto-generated benchmark	2GB
WeiboAI/VibeThinker-1.5B	Q4	74.86 tok/sEstimated Auto-generated benchmark	1GB
google/gemma-2b	Q4	74.50 tok/sEstimated Auto-generated benchmark	1GB
google/embeddinggemma-300m	Q4	73.98 tok/sEstimated Auto-generated benchmark	1GB
Qwen/Qwen3-8B-FP8	Q4	73.96 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen3-0.6B-Base	Q4	73.80 tok/sEstimated Auto-generated benchmark	3GB
Qwen/Qwen3-1.7B	Q4	73.76 tok/sEstimated Auto-generated benchmark	4GB
microsoft/Phi-3.5-vision-instruct	Q4	73.64 tok/sEstimated Auto-generated benchmark	4GB
HuggingFaceH4/zephyr-7b-beta	Q4	73.62 tok/sEstimated Auto-generated benchmark	4GB
deepseek-ai/DeepSeek-R1	Q4	73.33 tok/sEstimated Auto-generated benchmark	4GB
vikhyatk/moondream2	Q4	73.11 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen2.5-7B-Instruct	Q4	73.02 tok/sEstimated Auto-generated benchmark	4GB
numind/NuExtract-1.5	Q4	72.80 tok/sEstimated Auto-generated benchmark	4GB
swiss-ai/Apertus-8B-Instruct-2509	Q4	72.70 tok/sEstimated Auto-generated benchmark	4GB
skt/kogpt2-base-v2	Q4	72.64 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen2-7B-Instruct	Q4	72.63 tok/sEstimated Auto-generated benchmark	4GB
GSAI-ML/LLaDA-8B-Base	Q4	72.54 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen2-0.5B-Instruct	Q4	72.53 tok/sEstimated Auto-generated benchmark	3GB
openai-community/gpt2	Q4	72.35 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen3-8B-Base	Q4	72.33 tok/sEstimated Auto-generated benchmark	4GB
trl-internal-testing/tiny-LlamaForCausalLM-3.2	Q4	72.29 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen-Image-Edit-2509	Q4	72.08 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen3-Reranker-0.6B	Q4	71.73 tok/sEstimated Auto-generated benchmark	3GB

deepseek-ai/DeepSeek-OCR

2GB

89.90 tok/sEstimated

Auto-generated benchmark

ibm-research/PowerMoE-3b

2GB

87.60 tok/sEstimated

Auto-generated benchmark

google-t5/t5-3b

2GB

87.46 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-3B

2GB

87.31 tok/sEstimated

Auto-generated benchmark

ibm-granite/granite-3.3-2b-instruct

1GB

86.63 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.2-3B

2GB

86.55 tok/sEstimated

Auto-generated benchmark

facebook/sam3

1GB

85.73 tok/sEstimated

Auto-generated benchmark

google/gemma-3-1b-it

1GB

85.24 tok/sEstimated

Auto-generated benchmark

unsloth/Llama-3.2-1B-Instruct

1GB

84.38 tok/sEstimated

Auto-generated benchmark

context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16

2GB

84.35 tok/sEstimated

Auto-generated benchmark

allenai/OLMo-2-0425-1B

1GB

83.68 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.2-1B-Instruct

1GB

83.65 tok/sEstimated

Auto-generated benchmark

google/gemma-2-2b-it

1GB

82.36 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.2-3B-Instruct

2GB

82.29 tok/sEstimated

Auto-generated benchmark

apple/OpenELM-1_1B-Instruct

1GB

82.19 tok/sEstimated

Auto-generated benchmark

deepseek-ai/deepseek-coder-1.3b-instruct

2GB

81.98 tok/sEstimated

Auto-generated benchmark

nari-labs/Dia2-2B

2GB

80.86 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-3B-Instruct

2GB

79.75 tok/sEstimated

Auto-generated benchmark

google-bert/bert-base-uncased

1GB

79.64 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-Guard-3-1B

1GB

78.44 tok/sEstimated

Auto-generated benchmark

unsloth/Llama-3.2-3B-Instruct

2GB

78.13 tok/sEstimated

Auto-generated benchmark

LiquidAI/LFM2-1.2B

1GB

77.50 tok/sEstimated

Auto-generated benchmark

inference-net/Schematron-3B

2GB

77.24 tok/sEstimated

Auto-generated benchmark

unsloth/gemma-3-1b-it

1GB

76.25 tok/sEstimated

Auto-generated benchmark

TinyLlama/TinyLlama-1.1B-Chat-v1.0

1GB

75.82 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.2-1B

1GB

75.51 tok/sEstimated

Auto-generated benchmark

tencent/HunyuanOCR

1GB

75.50 tok/sEstimated

Auto-generated benchmark

bigcode/starcoder2-3b

2GB

75.29 tok/sEstimated

Auto-generated benchmark

WeiboAI/VibeThinker-1.5B

1GB

74.86 tok/sEstimated

Auto-generated benchmark

google/gemma-2b

1GB

74.50 tok/sEstimated

Auto-generated benchmark

google/embeddinggemma-300m

1GB

73.98 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen3-8B-FP8

4GB

73.96 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen3-0.6B-Base

3GB

73.80 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen3-1.7B

4GB

73.76 tok/sEstimated

Auto-generated benchmark

microsoft/Phi-3.5-vision-instruct

4GB

73.64 tok/sEstimated

Auto-generated benchmark

HuggingFaceH4/zephyr-7b-beta

4GB

73.62 tok/sEstimated

Auto-generated benchmark

deepseek-ai/DeepSeek-R1

4GB

73.33 tok/sEstimated

Auto-generated benchmark

vikhyatk/moondream2

4GB

73.11 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-7B-Instruct

4GB

73.02 tok/sEstimated

Auto-generated benchmark

numind/NuExtract-1.5

4GB

72.80 tok/sEstimated

Auto-generated benchmark

swiss-ai/Apertus-8B-Instruct-2509

4GB

72.70 tok/sEstimated

Auto-generated benchmark

skt/kogpt2-base-v2

4GB

72.64 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2-7B-Instruct

4GB

72.63 tok/sEstimated

Auto-generated benchmark

GSAI-ML/LLaDA-8B-Base

4GB

72.54 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2-0.5B-Instruct

3GB

72.53 tok/sEstimated

Auto-generated benchmark

openai-community/gpt2

4GB

72.35 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen3-8B-Base

4GB

72.33 tok/sEstimated

Auto-generated benchmark

trl-internal-testing/tiny-LlamaForCausalLM-3.2

4GB

72.29 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen-Image-Edit-2509

4GB

72.08 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen3-Reranker-0.6B

3GB

71.73 tok/sEstimated

Auto-generated benchmark

Note: Performance estimates are calculated. Real results may vary. Methodology · Submit real data

Model compatibility

Model	Quantization	Verdict	Estimated speed	VRAM needed
context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16	Q4	Fits comfortably	84.35 tok/sEstimated	2GB (have 12GB)
distilbert/distilgpt2	Q4	Fits comfortably	71.00 tok/sEstimated	4GB (have 12GB)
distilbert/distilgpt2	Q8	Fits comfortably	44.55 tok/sEstimated	7GB (have 12GB)
meta-llama/Meta-Llama-3-8B-Instruct	Q4	Fits comfortably	69.70 tok/sEstimated	4GB (have 12GB)
meta-llama/Meta-Llama-3-8B-Instruct	Q8	Fits comfortably	42.57 tok/sEstimated	9GB (have 12GB)
meta-llama/Meta-Llama-3-8B-Instruct	FP16	Not supported	23.25 tok/sEstimated	17GB (have 12GB)
Qwen/Qwen2.5-1.5B	Q4	Fits comfortably	64.14 tok/sEstimated	3GB (have 12GB)
Qwen/Qwen2.5-1.5B	Q8	Fits comfortably	45.67 tok/sEstimated	5GB (have 12GB)
unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit	Q4	Fits comfortably	69.59 tok/sEstimated	4GB (have 12GB)
zai-org/GLM-4.6-FP8	Q8	Fits comfortably	44.46 tok/sEstimated	7GB (have 12GB)
Qwen/Qwen2.5-32B-Instruct	Q8	Not supported	15.36 tok/sEstimated	33GB (have 12GB)
Qwen/Qwen2.5-32B-Instruct	FP16	Not supported	8.22 tok/sEstimated	66GB (have 12GB)
mistralai/Mistral-7B-v0.1	Q4	Fits comfortably	62.48 tok/sEstimated	4GB (have 12GB)
HuggingFaceTB/SmolLM-135M	Q8	Fits comfortably	48.75 tok/sEstimated	7GB (have 12GB)
HuggingFaceTB/SmolLM-135M	FP16	Not supported	23.14 tok/sEstimated	15GB (have 12GB)
trl-internal-testing/tiny-random-LlamaForCausalLM	FP16	Not supported	23.73 tok/sEstimated	15GB (have 12GB)
Qwen/Qwen3-14B-Base	Q8	Not supported	37.35 tok/sEstimated	14GB (have 12GB)
Qwen/Qwen3-14B-Base	FP16	Not supported	20.49 tok/sEstimated	29GB (have 12GB)
meta-llama/Llama-2-7b-chat-hf	Q4	Fits comfortably	65.29 tok/sEstimated	4GB (have 12GB)
unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit	FP16	Not supported	8.39 tok/sEstimated	66GB (have 12GB)
RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16	FP16	Not supported	9.07 tok/s (estimated)	137GB (have 12GB)
OpenPipe/Qwen3-14B-Instruct	Q4	Fits comfortably	49.80 tok/s (estimated)	7GB (have 12GB)
numind/NuExtract-1.5	FP16	Not supported	25.51 tok/s (estimated)	15GB (have 12GB)
Qwen/Qwen2.5-Coder-7B-Instruct	Q4	Fits comfortably	70.25 tok/s (estimated)	4GB (have 12GB)
Qwen/Qwen2.5-Coder-7B-Instruct	FP16	Not supported	27.31 tok/s (estimated)	15GB (have 12GB)
Qwen/Qwen3-4B-Thinking-2507	Q4	Fits comfortably	63.43 tok/s (estimated)	2GB (have 12GB)
ibm-granite/granite-docling-258M	Q8	Fits comfortably	45.40 tok/s (estimated)	7GB (have 12GB)
ibm-granite/granite-docling-258M	FP16	Not supported	26.45 tok/s (estimated)	15GB (have 12GB)
bigcode/starcoder2-3b	FP16	Fits comfortably	31.91 tok/s (estimated)	6GB (have 12GB)
unsloth/gemma-3-1b-it	Q8	Fits comfortably	53.13 tok/s (estimated)	1GB (have 12GB)
unsloth/gemma-3-1b-it	FP16	Fits comfortably	33.37 tok/s (estimated)	2GB (have 12GB)
nvidia/NVIDIA-Nemotron-Nano-9B-v2	FP16	Not supported	20.72 tok/s (estimated)	19GB (have 12GB)
openai/gpt-oss-safeguard-20b	Q8	Not supported	26.07 tok/s (estimated)	22GB (have 12GB)
openai/gpt-oss-safeguard-20b	FP16	Not supported	15.13 tok/s (estimated)	44GB (have 12GB)
MiniMaxAI/MiniMax-M1-40k	Q4	Not supported	8.88 tok/s (estimated)	255GB (have 12GB)
MiniMaxAI/MiniMax-M1-40k	Q8	Not supported	6.21 tok/s (estimated)	510GB (have 12GB)
mistralai/Mistral-Large-3-675B-Instruct-2512	Q4	Not supported	7.46 tok/s (estimated)	378GB (have 12GB)
mistralai/Mistral-Large-3-675B-Instruct-2512	Q8	Not supported	5.43 tok/s (estimated)	755GB (have 12GB)
mistralai/Mistral-Large-3-675B-Instruct-2512	FP16	Not supported	2.86 tok/s (estimated)	1509GB (have 12GB)
EssentialAI/rnj-1	Q4	Fits comfortably	49.63 tok/s (estimated)	5GB (have 12GB)
EssentialAI/rnj-1	Q8	Fits comfortably	38.56 tok/s (estimated)	10GB (have 12GB)
EssentialAI/rnj-1	FP16	Not supported	19.77 tok/s (estimated)	19GB (have 12GB)
Qwen/Qwen2.5-3B-Instruct	FP16	Fits comfortably	33.74 tok/s (estimated)	6GB (have 12GB)
bigscience/bloomz-560m	Q4	Fits comfortably	67.53 tok/s (estimated)	4GB (have 12GB)
bigscience/bloomz-560m	Q8	Fits comfortably	48.82 tok/s (estimated)	7GB (have 12GB)
context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16	FP16	Fits comfortably	30.24 tok/s (estimated)	6GB (have 12GB)
mistralai/Mistral-7B-Instruct-v0.2	Q4	Fits comfortably	63.53 tok/s (estimated)	4GB (have 12GB)
mistralai/Mistral-7B-Instruct-v0.2	Q8	Fits comfortably	50.28 tok/s (estimated)	7GB (have 12GB)
mistralai/Mistral-7B-Instruct-v0.2	FP16	Not supported	26.66 tok/s (estimated)	15GB (have 12GB)
bigscience/bloomz-560m	FP16	Not supported	27.40 tok/s (estimated)	15GB (have 12GB)

context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16	Q4	Fits comfortably	84.35 tok/s (estimated)	2GB (have 12GB)
distilbert/distilgpt2	Q4	Fits comfortably	71.00 tok/s (estimated)	4GB (have 12GB)
distilbert/distilgpt2	Q8	Fits comfortably	44.55 tok/s (estimated)	7GB (have 12GB)
meta-llama/Meta-Llama-3-8B-Instruct	Q4	Fits comfortably	69.70 tok/s (estimated)	4GB (have 12GB)
meta-llama/Meta-Llama-3-8B-Instruct	Q8	Fits comfortably	42.57 tok/s (estimated)	9GB (have 12GB)
meta-llama/Meta-Llama-3-8B-Instruct	FP16	Not supported	23.25 tok/s (estimated)	17GB (have 12GB)
Qwen/Qwen2.5-1.5B	Q4	Fits comfortably	64.14 tok/s (estimated)	3GB (have 12GB)
Qwen/Qwen2.5-1.5B	Q8	Fits comfortably	45.67 tok/s (estimated)	5GB (have 12GB)
unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit	Q4	Fits comfortably	69.59 tok/s (estimated)	4GB (have 12GB)
zai-org/GLM-4.6-FP8	Q8	Fits comfortably	44.46 tok/s (estimated)	7GB (have 12GB)
Qwen/Qwen2.5-32B-Instruct	Q8	Not supported	15.36 tok/s (estimated)	33GB (have 12GB)
Qwen/Qwen2.5-32B-Instruct	FP16	Not supported	8.22 tok/s (estimated)	66GB (have 12GB)
mistralai/Mistral-7B-v0.1	Q4	Fits comfortably	62.48 tok/s (estimated)	4GB (have 12GB)
HuggingFaceTB/SmolLM-135M	Q8	Fits comfortably	48.75 tok/s (estimated)	7GB (have 12GB)
HuggingFaceTB/SmolLM-135M	FP16	Not supported	23.14 tok/s (estimated)	15GB (have 12GB)
trl-internal-testing/tiny-random-LlamaForCausalLM	FP16	Not supported	23.73 tok/s (estimated)	15GB (have 12GB)
Qwen/Qwen3-14B-Base	Q8	Not supported	37.35 tok/s (estimated)	14GB (have 12GB)
Qwen/Qwen3-14B-Base	FP16	Not supported	20.49 tok/s (estimated)	29GB (have 12GB)
meta-llama/Llama-2-7b-chat-hf	Q4	Fits comfortably	65.29 tok/s (estimated)	4GB (have 12GB)
unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit	FP16	Not supported	8.39 tok/s (estimated)	66GB (have 12GB)

Note: These performance figures are calculated estimates, not measured benchmarks; real-world results may vary.
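To shortlist models for this card, the compatibility rows above can be filtered by VRAM and ranked by estimated speed. The following is a minimal Python sketch, not the site's methodology: the `Row` type, `parse_row` helper, and the "required VRAM must not exceed available VRAM" rule are assumptions made for illustration, and the sample rows are copied from the table.

```python
import re
from typing import NamedTuple


class Row(NamedTuple):
    model: str
    quant: str
    tok_per_s: float
    required_gb: int
    available_gb: int

    @property
    def fits(self) -> bool:
        # Assumption: a model fits when its estimated VRAM requirement does
        # not exceed the card's VRAM. The page does not state its exact rule.
        return self.required_gb <= self.available_gb


def parse_row(line: str) -> Row:
    """Parse one tab-separated row: model, quantization, status, speed, VRAM."""
    model, quant, _status, speed, vram = line.split("\t")
    # Only the leading number before "tok/s" is needed from the speed column.
    tok = float(re.match(r"([\d.]+)\s*tok/s", speed).group(1))
    req, have = map(int, re.match(r"(\d+)GB \(have (\d+)GB\)", vram).groups())
    return Row(model, quant, tok, req, have)


# Sample rows taken from the table above.
SAMPLE = [
    "OpenPipe/Qwen3-14B-Instruct\tQ4\tFits comfortably\t49.80 tok/s (estimated)\t7GB (have 12GB)",
    "Qwen/Qwen2.5-Coder-7B-Instruct\tQ4\tFits comfortably\t70.25 tok/s (estimated)\t4GB (have 12GB)",
    "Qwen/Qwen2.5-Coder-7B-Instruct\tFP16\tNot supported\t27.31 tok/s (estimated)\t15GB (have 12GB)",
    "EssentialAI/rnj-1\tQ8\tFits comfortably\t38.56 tok/s (estimated)\t10GB (have 12GB)",
]

rows = [parse_row(line) for line in SAMPLE]

# Keep only rows that fit in VRAM, fastest estimated throughput first.
usable = sorted((r for r in rows if r.fits), key=lambda r: r.tok_per_s, reverse=True)

for r in usable:
    print(f"{r.model} {r.quant}: {r.tok_per_s:.2f} tok/s, {r.required_gb}GB required")
```

Because the throughput values are estimates, the ranking is best treated as a starting point for choosing which quantizations to test on real hardware.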

Alternative GPUs

RTX 5070	12GB
RTX 4060 Ti 16GB	16GB
RX 6800 XT	16GB
RTX 4070 Super	12GB
RTX 3080	10GB