Quick Answer: RX 6900 XT offers 16GB VRAM and starts around current market pricing. It delivers approximately 109 tokens/sec on meta-llama/Llama-3.2-1B-Instruct. It typically draws 300W under load.

RX 6900 XT

Unknown

By AMDReleased 2020-12MSRP $999.00

This GPU offers reliable throughput for local AI workloads. Pair it with the right model quantization to hit your desired tokens/sec, and monitor prices below to catch the best deal.

Check Price on Amazon View Benchmarks

Specs snapshot

Key hardware metrics for AI workloads.

VRAM16GB

Cores5,120

TDP300W

ArchitectureRDNA 2

Where to Buy

Buy directly on Amazon with fast shipping and reliable customer service.

AmazonUnknown

See price on Amazon

Buy on Amazon

💡 Not ready to buy? Try cloud GPUs first

Test RX 6900 XT performance in the cloud before investing in hardware. Pay by the hour with no commitment.

Vast.aifrom $0.20/hr RunPodfrom $0.30/hr Lambda Labsenterprise-grade

AI benchmarks

Model	Quantization	Tokens/sec	VRAM used
meta-llama/Llama-3.2-1B-Instruct	Q4	109.00 tok/sEstimated Auto-generated benchmark	1GB
ibm-granite/granite-3.3-2b-instruct	Q4	108.96 tok/sEstimated Auto-generated benchmark	1GB
WeiboAI/VibeThinker-1.5B	Q4	108.77 tok/sEstimated Auto-generated benchmark	1GB
unsloth/gemma-3-1b-it	Q4	108.16 tok/sEstimated Auto-generated benchmark	1GB
bigcode/starcoder2-3b	Q4	107.89 tok/sEstimated Auto-generated benchmark	2GB
deepseek-ai/DeepSeek-OCR	Q4	106.71 tok/sEstimated Auto-generated benchmark	2GB
unsloth/Llama-3.2-1B-Instruct	Q4	106.23 tok/sEstimated Auto-generated benchmark	1GB
allenai/OLMo-2-0425-1B	Q4	104.92 tok/sEstimated Auto-generated benchmark	1GB
meta-llama/Llama-3.2-1B	Q4	103.83 tok/sEstimated Auto-generated benchmark	1GB
google/embeddinggemma-300m	Q4	103.74 tok/sEstimated Auto-generated benchmark	1GB
context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16	Q4	103.33 tok/sEstimated Auto-generated benchmark	2GB
google/gemma-2-2b-it	Q4	101.59 tok/sEstimated Auto-generated benchmark	1GB
deepseek-ai/deepseek-coder-1.3b-instruct	Q4	101.33 tok/sEstimated Auto-generated benchmark	2GB
apple/OpenELM-1_1B-Instruct	Q4	101.15 tok/sEstimated Auto-generated benchmark	1GB
Qwen/Qwen2.5-3B-Instruct	Q4	100.69 tok/sEstimated Auto-generated benchmark	2GB
meta-llama/Llama-3.2-3B-Instruct	Q4	100.48 tok/sEstimated Auto-generated benchmark	2GB
Qwen/Qwen2.5-3B	Q4	100.41 tok/sEstimated Auto-generated benchmark	2GB
nari-labs/Dia2-2B	Q4	99.11 tok/sEstimated Auto-generated benchmark	2GB
google-bert/bert-base-uncased	Q4	97.43 tok/sEstimated Auto-generated benchmark	1GB
meta-llama/Llama-3.2-3B	Q4	96.97 tok/sEstimated Auto-generated benchmark	2GB
unsloth/Llama-3.2-3B-Instruct	Q4	96.79 tok/sEstimated Auto-generated benchmark	2GB
LiquidAI/LFM2-1.2B	Q4	96.38 tok/sEstimated Auto-generated benchmark	1GB
google-t5/t5-3b	Q4	95.46 tok/sEstimated Auto-generated benchmark	2GB
ibm-research/PowerMoE-3b	Q4	95.38 tok/sEstimated Auto-generated benchmark	2GB
tencent/HunyuanOCR	Q4	93.42 tok/sEstimated Auto-generated benchmark	1GB
google/gemma-3-1b-it	Q4	91.38 tok/sEstimated Auto-generated benchmark	1GB
google/gemma-2b	Q4	90.94 tok/sEstimated Auto-generated benchmark	1GB
Qwen/Qwen2.5-1.5B	Q4	90.78 tok/sEstimated Auto-generated benchmark	3GB
Qwen/Qwen3-4B-Base	Q4	90.65 tok/sEstimated Auto-generated benchmark	2GB
Qwen/Qwen3-Embedding-4B	Q4	90.40 tok/sEstimated Auto-generated benchmark	2GB
BSC-LT/salamandraTA-7b-instruct	Q4	90.40 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen2.5-0.5B-Instruct	Q4	90.35 tok/sEstimated Auto-generated benchmark	3GB
meta-llama/Llama-2-7b-chat-hf	Q4	90.30 tok/sEstimated Auto-generated benchmark	4GB
TinyLlama/TinyLlama-1.1B-Chat-v1.0	Q4	90.30 tok/sEstimated Auto-generated benchmark	1GB
microsoft/VibeVoice-1.5B	Q4	90.28 tok/sEstimated Auto-generated benchmark	3GB
black-forest-labs/FLUX.1-dev	Q4	90.20 tok/sEstimated Auto-generated benchmark	4GB
kaitchup/Phi-3-mini-4k-instruct-gptq-4bit	Q4	89.95 tok/sEstimated Auto-generated benchmark	2GB
microsoft/Phi-3.5-mini-instruct	Q4	89.81 tok/sEstimated Auto-generated benchmark	4GB
inference-net/Schematron-3B	Q4	89.81 tok/sEstimated Auto-generated benchmark	2GB
black-forest-labs/FLUX.2-dev	Q4	89.72 tok/sEstimated Auto-generated benchmark	4GB
meta-llama/Llama-Guard-3-1B	Q4	89.71 tok/sEstimated Auto-generated benchmark	1GB
facebook/sam3	Q4	89.68 tok/sEstimated Auto-generated benchmark	1GB
Qwen/Qwen3-8B-FP8	Q4	89.54 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen2.5-7B-Instruct	Q4	89.42 tok/sEstimated Auto-generated benchmark	4GB
bigscience/bloomz-560m	Q4	89.39 tok/sEstimated Auto-generated benchmark	4GB
google/gemma-3-270m-it	Q4	89.30 tok/sEstimated Auto-generated benchmark	4GB
deepseek-ai/DeepSeek-R1-0528	Q4	89.09 tok/sEstimated Auto-generated benchmark	4GB
meta-llama/Meta-Llama-3-8B	Q4	89.01 tok/sEstimated Auto-generated benchmark	4GB
microsoft/phi-2	Q4	88.84 tok/sEstimated Auto-generated benchmark	4GB
EleutherAI/pythia-70m-deduped	Q4	88.79 tok/sEstimated Auto-generated benchmark	4GB

meta-llama/Llama-3.2-1B-Instruct

1GB

109.00 tok/sEstimated

Auto-generated benchmark

ibm-granite/granite-3.3-2b-instruct

1GB

108.96 tok/sEstimated

Auto-generated benchmark

WeiboAI/VibeThinker-1.5B

1GB

108.77 tok/sEstimated

Auto-generated benchmark

unsloth/gemma-3-1b-it

1GB

108.16 tok/sEstimated

Auto-generated benchmark

bigcode/starcoder2-3b

2GB

107.89 tok/sEstimated

Auto-generated benchmark

deepseek-ai/DeepSeek-OCR

2GB

106.71 tok/sEstimated

Auto-generated benchmark

unsloth/Llama-3.2-1B-Instruct

1GB

106.23 tok/sEstimated

Auto-generated benchmark

allenai/OLMo-2-0425-1B

1GB

104.92 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.2-1B

1GB

103.83 tok/sEstimated

Auto-generated benchmark

google/embeddinggemma-300m

1GB

103.74 tok/sEstimated

Auto-generated benchmark

context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16

2GB

103.33 tok/sEstimated

Auto-generated benchmark

google/gemma-2-2b-it

1GB

101.59 tok/sEstimated

Auto-generated benchmark

deepseek-ai/deepseek-coder-1.3b-instruct

2GB

101.33 tok/sEstimated

Auto-generated benchmark

apple/OpenELM-1_1B-Instruct

1GB

101.15 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-3B-Instruct

2GB

100.69 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.2-3B-Instruct

2GB

100.48 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-3B

2GB

100.41 tok/sEstimated

Auto-generated benchmark

nari-labs/Dia2-2B

2GB

99.11 tok/sEstimated

Auto-generated benchmark

google-bert/bert-base-uncased

1GB

97.43 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.2-3B

2GB

96.97 tok/sEstimated

Auto-generated benchmark

unsloth/Llama-3.2-3B-Instruct

2GB

96.79 tok/sEstimated

Auto-generated benchmark

LiquidAI/LFM2-1.2B

1GB

96.38 tok/sEstimated

Auto-generated benchmark

google-t5/t5-3b

2GB

95.46 tok/sEstimated

Auto-generated benchmark

ibm-research/PowerMoE-3b

2GB

95.38 tok/sEstimated

Auto-generated benchmark

tencent/HunyuanOCR

1GB

93.42 tok/sEstimated

Auto-generated benchmark

google/gemma-3-1b-it

1GB

91.38 tok/sEstimated

Auto-generated benchmark

google/gemma-2b

1GB

90.94 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-1.5B

3GB

90.78 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen3-4B-Base

2GB

90.65 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen3-Embedding-4B

2GB

90.40 tok/sEstimated

Auto-generated benchmark

BSC-LT/salamandraTA-7b-instruct

4GB

90.40 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-0.5B-Instruct

3GB

90.35 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-2-7b-chat-hf

4GB

90.30 tok/sEstimated

Auto-generated benchmark

TinyLlama/TinyLlama-1.1B-Chat-v1.0

1GB

90.30 tok/sEstimated

Auto-generated benchmark

microsoft/VibeVoice-1.5B

3GB

90.28 tok/sEstimated

Auto-generated benchmark

black-forest-labs/FLUX.1-dev

4GB

90.20 tok/sEstimated

Auto-generated benchmark

kaitchup/Phi-3-mini-4k-instruct-gptq-4bit

2GB

89.95 tok/sEstimated

Auto-generated benchmark

microsoft/Phi-3.5-mini-instruct

4GB

89.81 tok/sEstimated

Auto-generated benchmark

inference-net/Schematron-3B

2GB

89.81 tok/sEstimated

Auto-generated benchmark

black-forest-labs/FLUX.2-dev

4GB

89.72 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-Guard-3-1B

1GB

89.71 tok/sEstimated

Auto-generated benchmark

facebook/sam3

1GB

89.68 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen3-8B-FP8

4GB

89.54 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-7B-Instruct

4GB

89.42 tok/sEstimated

Auto-generated benchmark

bigscience/bloomz-560m

4GB

89.39 tok/sEstimated

Auto-generated benchmark

google/gemma-3-270m-it

4GB

89.30 tok/sEstimated

Auto-generated benchmark

deepseek-ai/DeepSeek-R1-0528

4GB

89.09 tok/sEstimated

Auto-generated benchmark

meta-llama/Meta-Llama-3-8B

4GB

89.01 tok/sEstimated

Auto-generated benchmark

microsoft/phi-2

4GB

88.84 tok/sEstimated

Auto-generated benchmark

EleutherAI/pythia-70m-deduped

4GB

88.79 tok/sEstimated

Auto-generated benchmark

Note: Performance estimates are calculated. Real results may vary. Methodology · Submit real data

Model compatibility

Model	Quantization	Verdict	Estimated speed	VRAM needed
Gensyn/Qwen2.5-0.5B-Instruct	FP16	Fits comfortably	33.60 tok/sEstimated	11GB (have 16GB)
meta-llama/Llama-3.1-8B-Instruct	Q4	Fits comfortably	85.83 tok/sEstimated	4GB (have 16GB)
meta-llama/Llama-3.1-8B-Instruct	Q8	Fits comfortably	56.91 tok/sEstimated	9GB (have 16GB)
meta-llama/Llama-3.1-8B-Instruct	FP16	Not supported	31.11 tok/sEstimated	17GB (have 16GB)
openai/gpt-oss-20b	Q4	Fits comfortably	41.30 tok/sEstimated	10GB (have 16GB)
openai/gpt-oss-20b	Q8	Not supported	32.62 tok/sEstimated	20GB (have 16GB)
openai/gpt-oss-20b	FP16	Not supported	18.72 tok/sEstimated	41GB (have 16GB)
google/gemma-3-1b-it	Q4	Fits comfortably	91.38 tok/sEstimated	1GB (have 16GB)
Qwen/Qwen3-Embedding-0.6B	FP16	Fits comfortably	32.43 tok/sEstimated	13GB (have 16GB)
Qwen/Qwen2.5-1.5B-Instruct	Q4	Fits comfortably	79.99 tok/sEstimated	3GB (have 16GB)
Qwen/Qwen2.5-1.5B-Instruct	Q8	Fits comfortably	61.02 tok/sEstimated	5GB (have 16GB)
Qwen/Qwen2.5-1.5B-Instruct	FP16	Fits comfortably	28.48 tok/sEstimated	11GB (have 16GB)
facebook/opt-125m	Q4	Fits comfortably	76.26 tok/sEstimated	4GB (have 16GB)
facebook/opt-125m	Q8	Fits comfortably	63.22 tok/sEstimated	7GB (have 16GB)
facebook/opt-125m	FP16	Fits (tight)	33.95 tok/sEstimated	15GB (have 16GB)
TinyLlama/TinyLlama-1.1B-Chat-v1.0	Q8	Fits comfortably	71.37 tok/sEstimated	1GB (have 16GB)
TinyLlama/TinyLlama-1.1B-Chat-v1.0	FP16	Fits comfortably	39.99 tok/sEstimated	2GB (have 16GB)
trl-internal-testing/tiny-Qwen2ForCausalLM-2.5	Q4	Fits comfortably	75.37 tok/sEstimated	4GB (have 16GB)
Qwen/Qwen2.5-3B-Instruct	FP16	Fits comfortably	35.93 tok/sEstimated	6GB (have 16GB)
openai-community/gpt2	Q4	Fits comfortably	77.69 tok/sEstimated	4GB (have 16GB)
openai-community/gpt2	Q8	Fits comfortably	62.81 tok/sEstimated	7GB (have 16GB)
openai-community/gpt2	FP16	Fits (tight)	30.63 tok/sEstimated	15GB (have 16GB)
Qwen/Qwen2.5-7B-Instruct	Q4	Fits comfortably	84.87 tok/sEstimated	4GB (have 16GB)
google/gemma-3-1b-it	Q8	Fits comfortably	68.97 tok/sEstimated	1GB (have 16GB)
google/gemma-3-1b-it	FP16	Fits comfortably	38.43 tok/sEstimated	2GB (have 16GB)
trl-internal-testing/tiny-Qwen2ForCausalLM-2.5	Q8	Fits comfortably	57.23 tok/sEstimated	7GB (have 16GB)
trl-internal-testing/tiny-Qwen2ForCausalLM-2.5	FP16	Fits (tight)	32.51 tok/sEstimated	15GB (have 16GB)
Qwen/Qwen3-4B-Instruct-2507	Q4	Fits comfortably	77.51 tok/sEstimated	2GB (have 16GB)
Qwen/Qwen3-4B-Instruct-2507	Q8	Fits comfortably	54.48 tok/sEstimated	4GB (have 16GB)
Qwen/Qwen3-4B-Instruct-2507	FP16	Fits comfortably	28.51 tok/sEstimated	9GB (have 16GB)
meta-llama/Llama-3.2-1B-Instruct	Q4	Fits comfortably	109.00 tok/sEstimated	1GB (have 16GB)
openai/gpt-oss-120b	FP16	Not supported	6.60 tok/sEstimated	235GB (have 16GB)
mistralai/Mistral-Large-3-675B-Instruct-2512	Q4	Not supported	9.17 tok/sEstimated	378GB (have 16GB)
mistralai/Mistral-Large-3-675B-Instruct-2512	Q8	Not supported	7.22 tok/sEstimated	755GB (have 16GB)
mistralai/Mistral-Large-3-675B-Instruct-2512	FP16	Not supported	3.48 tok/sEstimated	1509GB (have 16GB)
EssentialAI/rnj-1	Q4	Fits comfortably	61.88 tok/sEstimated	5GB (have 16GB)
EssentialAI/rnj-1	Q8	Fits comfortably	44.08 tok/sEstimated	10GB (have 16GB)
Qwen/Qwen2.5-3B-Instruct	Q4	Fits comfortably	100.69 tok/sEstimated	2GB (have 16GB)
RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic	Q8	Not supported	19.46 tok/sEstimated	68GB (have 16GB)
RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic	FP16	Not supported	11.44 tok/sEstimated	137GB (have 16GB)
EssentialAI/rnj-1	FP16	Not supported	21.45 tok/sEstimated	19GB (have 16GB)
meta-llama/Llama-3.2-3B-Instruct	Q4	Fits comfortably	100.48 tok/sEstimated	2GB (have 16GB)
meta-llama/Llama-3.2-3B-Instruct	Q8	Fits comfortably	66.94 tok/sEstimated	3GB (have 16GB)
Qwen/Qwen3-32B	Q4	Fits (tight)	28.18 tok/sEstimated	16GB (have 16GB)
meta-llama/Llama-3.2-3B-Instruct	FP16	Fits comfortably	36.79 tok/sEstimated	6GB (have 16GB)
vikhyatk/moondream2	FP16	Fits (tight)	34.24 tok/sEstimated	15GB (have 16GB)
meta-llama/Llama-3.2-1B	Q4	Fits comfortably	103.83 tok/sEstimated	1GB (have 16GB)
meta-llama/Llama-3.2-1B	Q8	Fits comfortably	67.88 tok/sEstimated	1GB (have 16GB)
meta-llama/Llama-3.2-1B	FP16	Fits comfortably	36.51 tok/sEstimated	2GB (have 16GB)
Gensyn/Qwen2.5-0.5B-Instruct	Q8	Fits comfortably	60.59 tok/sEstimated	5GB (have 16GB)

Gensyn/Qwen2.5-0.5B-InstructFP16

Fits comfortably11GB required · 16GB available

33.60 tok/sEstimated

meta-llama/Llama-3.1-8B-InstructQ4

Fits comfortably4GB required · 16GB available

85.83 tok/sEstimated

meta-llama/Llama-3.1-8B-InstructQ8

Fits comfortably9GB required · 16GB available

56.91 tok/sEstimated

meta-llama/Llama-3.1-8B-InstructFP16

Not supported17GB required · 16GB available

31.11 tok/sEstimated

openai/gpt-oss-20bQ4

Fits comfortably10GB required · 16GB available

41.30 tok/sEstimated

openai/gpt-oss-20bQ8

Not supported20GB required · 16GB available

32.62 tok/sEstimated

openai/gpt-oss-20bFP16

Not supported41GB required · 16GB available

18.72 tok/sEstimated

google/gemma-3-1b-itQ4

Fits comfortably1GB required · 16GB available

91.38 tok/sEstimated

Qwen/Qwen3-Embedding-0.6BFP16

Fits comfortably13GB required · 16GB available

32.43 tok/sEstimated

Qwen/Qwen2.5-1.5B-InstructQ4

Fits comfortably3GB required · 16GB available

79.99 tok/sEstimated

Qwen/Qwen2.5-1.5B-InstructQ8

Fits comfortably5GB required · 16GB available

61.02 tok/sEstimated

Qwen/Qwen2.5-1.5B-InstructFP16

Fits comfortably11GB required · 16GB available

28.48 tok/sEstimated

facebook/opt-125mQ4

Fits comfortably4GB required · 16GB available

76.26 tok/sEstimated

facebook/opt-125mQ8

Fits comfortably7GB required · 16GB available

63.22 tok/sEstimated

facebook/opt-125mFP16

Fits (tight)15GB required · 16GB available

33.95 tok/sEstimated

TinyLlama/TinyLlama-1.1B-Chat-v1.0Q8

Fits comfortably1GB required · 16GB available

71.37 tok/sEstimated

TinyLlama/TinyLlama-1.1B-Chat-v1.0FP16

Fits comfortably2GB required · 16GB available

39.99 tok/sEstimated

trl-internal-testing/tiny-Qwen2ForCausalLM-2.5Q4

Fits comfortably4GB required · 16GB available

75.37 tok/sEstimated

Qwen/Qwen2.5-3B-InstructFP16

Fits comfortably6GB required · 16GB available

35.93 tok/sEstimated

openai-community/gpt2Q4

Fits comfortably4GB required · 16GB available

77.69 tok/sEstimated

openai-community/gpt2Q8

Fits comfortably7GB required · 16GB available

62.81 tok/sEstimated

openai-community/gpt2FP16

Fits (tight)15GB required · 16GB available

30.63 tok/sEstimated

Qwen/Qwen2.5-7B-InstructQ4

Fits comfortably4GB required · 16GB available

84.87 tok/sEstimated

google/gemma-3-1b-itQ8

Fits comfortably1GB required · 16GB available

68.97 tok/sEstimated

google/gemma-3-1b-itFP16

Fits comfortably2GB required · 16GB available

38.43 tok/sEstimated

trl-internal-testing/tiny-Qwen2ForCausalLM-2.5Q8

Fits comfortably7GB required · 16GB available

57.23 tok/sEstimated

trl-internal-testing/tiny-Qwen2ForCausalLM-2.5FP16

Fits (tight)15GB required · 16GB available

32.51 tok/sEstimated

Qwen/Qwen3-4B-Instruct-2507Q4

Fits comfortably2GB required · 16GB available

77.51 tok/sEstimated

Qwen/Qwen3-4B-Instruct-2507Q8

Fits comfortably4GB required · 16GB available

54.48 tok/sEstimated

Qwen/Qwen3-4B-Instruct-2507FP16

Fits comfortably9GB required · 16GB available

28.51 tok/sEstimated

meta-llama/Llama-3.2-1B-InstructQ4

Fits comfortably1GB required · 16GB available

109.00 tok/sEstimated

openai/gpt-oss-120bFP16

Not supported235GB required · 16GB available

6.60 tok/sEstimated

mistralai/Mistral-Large-3-675B-Instruct-2512Q4

Not supported378GB required · 16GB available

9.17 tok/sEstimated

mistralai/Mistral-Large-3-675B-Instruct-2512Q8

Not supported755GB required · 16GB available

7.22 tok/sEstimated

mistralai/Mistral-Large-3-675B-Instruct-2512FP16

Not supported1509GB required · 16GB available

3.48 tok/sEstimated

EssentialAI/rnj-1Q4

Fits comfortably5GB required · 16GB available

61.88 tok/sEstimated

EssentialAI/rnj-1Q8

Fits comfortably10GB required · 16GB available

44.08 tok/sEstimated

Qwen/Qwen2.5-3B-InstructQ4

Fits comfortably2GB required · 16GB available

100.69 tok/sEstimated

RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamicQ8

Not supported68GB required · 16GB available

19.46 tok/sEstimated

RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamicFP16

Not supported137GB required · 16GB available

11.44 tok/sEstimated

EssentialAI/rnj-1FP16

Not supported19GB required · 16GB available

21.45 tok/sEstimated

meta-llama/Llama-3.2-3B-InstructQ4

Fits comfortably2GB required · 16GB available

100.48 tok/sEstimated

meta-llama/Llama-3.2-3B-InstructQ8

Fits comfortably3GB required · 16GB available

66.94 tok/sEstimated

Qwen/Qwen3-32BQ4

Fits (tight)16GB required · 16GB available

28.18 tok/sEstimated

meta-llama/Llama-3.2-3B-InstructFP16

Fits comfortably6GB required · 16GB available

36.79 tok/sEstimated

vikhyatk/moondream2FP16

Fits (tight)15GB required · 16GB available

34.24 tok/sEstimated

meta-llama/Llama-3.2-1BQ4

Fits comfortably1GB required · 16GB available

103.83 tok/sEstimated

meta-llama/Llama-3.2-1BQ8

Fits comfortably1GB required · 16GB available

67.88 tok/sEstimated

meta-llama/Llama-3.2-1BFP16

Fits comfortably2GB required · 16GB available

36.51 tok/sEstimated

Gensyn/Qwen2.5-0.5B-InstructQ8

Fits comfortably5GB required · 16GB available

60.59 tok/sEstimated

Note: Performance estimates are calculated. Real results may vary. Methodology · Submit real data

Alternative GPUs

RX 7900 XT

20GB

Explore how RX 7900 XT stacks up for local inference workloads.

RX 7900 XTX

24GB

Explore how RX 7900 XTX stacks up for local inference workloads.

RTX 3080

10GB

Explore how RTX 3080 stacks up for local inference workloads.

RTX 3090

24GB

Explore how RTX 3090 stacks up for local inference workloads.

RTX 4070

12GB

Explore how RTX 4070 stacks up for local inference workloads.

Compare RX 6900 XT

RX 6900 XT vs RTX 3080

Side-by-side VRAM, throughput, efficiency, and pricing benchmarks for both GPUs.

RX 6900 XT vs RTX 3090

Side-by-side VRAM, throughput, efficiency, and pricing benchmarks for both GPUs.

RX 6900 XT vs RX 7900 XT

Side-by-side VRAM, throughput, efficiency, and pricing benchmarks for both GPUs.

RX 6900 XT

Unknown

By AMDReleased 2020-12MSRP $999.00

This GPU offers reliable throughput for local AI workloads. Pair it with the right model quantization to hit your desired tokens/sec, and monitor prices below to catch the best deal.

Check Price on Amazon View Benchmarks

Specs snapshot

Key hardware metrics for AI workloads.

VRAM16GB

Cores5,120

TDP300W

ArchitectureRDNA 2

Where to Buy

Buy directly on Amazon with fast shipping and reliable customer service.

AmazonUnknown

See price on Amazon

Buy on Amazon

💡 Not ready to buy? Try cloud GPUs first

Test RX 6900 XT performance in the cloud before investing in hardware. Pay by the hour with no commitment.

Vast.aifrom $0.20/hr RunPodfrom $0.30/hr Lambda Labsenterprise-grade

AI benchmarks

Model	Quantization	Tokens/sec	VRAM used
meta-llama/Llama-3.2-1B-Instruct	Q4	109.00 tok/sEstimated Auto-generated benchmark	1GB
ibm-granite/granite-3.3-2b-instruct	Q4	108.96 tok/sEstimated Auto-generated benchmark	1GB
WeiboAI/VibeThinker-1.5B	Q4	108.77 tok/sEstimated Auto-generated benchmark	1GB
unsloth/gemma-3-1b-it	Q4	108.16 tok/sEstimated Auto-generated benchmark	1GB
bigcode/starcoder2-3b	Q4	107.89 tok/sEstimated Auto-generated benchmark	2GB
deepseek-ai/DeepSeek-OCR	Q4	106.71 tok/sEstimated Auto-generated benchmark	2GB
unsloth/Llama-3.2-1B-Instruct	Q4	106.23 tok/sEstimated Auto-generated benchmark	1GB
allenai/OLMo-2-0425-1B	Q4	104.92 tok/sEstimated Auto-generated benchmark	1GB
meta-llama/Llama-3.2-1B	Q4	103.83 tok/sEstimated Auto-generated benchmark	1GB
google/embeddinggemma-300m	Q4	103.74 tok/sEstimated Auto-generated benchmark	1GB
context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16	Q4	103.33 tok/sEstimated Auto-generated benchmark	2GB
google/gemma-2-2b-it	Q4	101.59 tok/sEstimated Auto-generated benchmark	1GB
deepseek-ai/deepseek-coder-1.3b-instruct	Q4	101.33 tok/sEstimated Auto-generated benchmark	2GB
apple/OpenELM-1_1B-Instruct	Q4	101.15 tok/sEstimated Auto-generated benchmark	1GB
Qwen/Qwen2.5-3B-Instruct	Q4	100.69 tok/sEstimated Auto-generated benchmark	2GB
meta-llama/Llama-3.2-3B-Instruct	Q4	100.48 tok/sEstimated Auto-generated benchmark	2GB
Qwen/Qwen2.5-3B	Q4	100.41 tok/sEstimated Auto-generated benchmark	2GB
nari-labs/Dia2-2B	Q4	99.11 tok/sEstimated Auto-generated benchmark	2GB
google-bert/bert-base-uncased	Q4	97.43 tok/sEstimated Auto-generated benchmark	1GB
meta-llama/Llama-3.2-3B	Q4	96.97 tok/sEstimated Auto-generated benchmark	2GB
unsloth/Llama-3.2-3B-Instruct	Q4	96.79 tok/sEstimated Auto-generated benchmark	2GB
LiquidAI/LFM2-1.2B	Q4	96.38 tok/sEstimated Auto-generated benchmark	1GB
google-t5/t5-3b	Q4	95.46 tok/sEstimated Auto-generated benchmark	2GB
ibm-research/PowerMoE-3b	Q4	95.38 tok/sEstimated Auto-generated benchmark	2GB
tencent/HunyuanOCR	Q4	93.42 tok/sEstimated Auto-generated benchmark	1GB
google/gemma-3-1b-it	Q4	91.38 tok/sEstimated Auto-generated benchmark	1GB
google/gemma-2b	Q4	90.94 tok/sEstimated Auto-generated benchmark	1GB
Qwen/Qwen2.5-1.5B	Q4	90.78 tok/sEstimated Auto-generated benchmark	3GB
Qwen/Qwen3-4B-Base	Q4	90.65 tok/sEstimated Auto-generated benchmark	2GB
Qwen/Qwen3-Embedding-4B	Q4	90.40 tok/sEstimated Auto-generated benchmark	2GB
BSC-LT/salamandraTA-7b-instruct	Q4	90.40 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen2.5-0.5B-Instruct	Q4	90.35 tok/sEstimated Auto-generated benchmark	3GB
meta-llama/Llama-2-7b-chat-hf	Q4	90.30 tok/sEstimated Auto-generated benchmark	4GB
TinyLlama/TinyLlama-1.1B-Chat-v1.0	Q4	90.30 tok/sEstimated Auto-generated benchmark	1GB
microsoft/VibeVoice-1.5B	Q4	90.28 tok/sEstimated Auto-generated benchmark	3GB
black-forest-labs/FLUX.1-dev	Q4	90.20 tok/sEstimated Auto-generated benchmark	4GB
kaitchup/Phi-3-mini-4k-instruct-gptq-4bit	Q4	89.95 tok/sEstimated Auto-generated benchmark	2GB
microsoft/Phi-3.5-mini-instruct	Q4	89.81 tok/sEstimated Auto-generated benchmark	4GB
inference-net/Schematron-3B	Q4	89.81 tok/sEstimated Auto-generated benchmark	2GB
black-forest-labs/FLUX.2-dev	Q4	89.72 tok/sEstimated Auto-generated benchmark	4GB
meta-llama/Llama-Guard-3-1B	Q4	89.71 tok/sEstimated Auto-generated benchmark	1GB
facebook/sam3	Q4	89.68 tok/sEstimated Auto-generated benchmark	1GB
Qwen/Qwen3-8B-FP8	Q4	89.54 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen2.5-7B-Instruct	Q4	89.42 tok/sEstimated Auto-generated benchmark	4GB
bigscience/bloomz-560m	Q4	89.39 tok/sEstimated Auto-generated benchmark	4GB
google/gemma-3-270m-it	Q4	89.30 tok/sEstimated Auto-generated benchmark	4GB
deepseek-ai/DeepSeek-R1-0528	Q4	89.09 tok/sEstimated Auto-generated benchmark	4GB
meta-llama/Meta-Llama-3-8B	Q4	89.01 tok/sEstimated Auto-generated benchmark	4GB
microsoft/phi-2	Q4	88.84 tok/sEstimated Auto-generated benchmark	4GB
EleutherAI/pythia-70m-deduped	Q4	88.79 tok/sEstimated Auto-generated benchmark	4GB

meta-llama/Llama-3.2-1B-Instruct

1GB

109.00 tok/sEstimated

Auto-generated benchmark

ibm-granite/granite-3.3-2b-instruct

1GB

108.96 tok/sEstimated

Auto-generated benchmark

WeiboAI/VibeThinker-1.5B

1GB

108.77 tok/sEstimated

Auto-generated benchmark

unsloth/gemma-3-1b-it

1GB

108.16 tok/sEstimated

Auto-generated benchmark

bigcode/starcoder2-3b

2GB

107.89 tok/sEstimated

Auto-generated benchmark

deepseek-ai/DeepSeek-OCR

2GB

106.71 tok/sEstimated

Auto-generated benchmark

unsloth/Llama-3.2-1B-Instruct

1GB

106.23 tok/sEstimated

Auto-generated benchmark

allenai/OLMo-2-0425-1B

1GB

104.92 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.2-1B

1GB

103.83 tok/sEstimated

Auto-generated benchmark

google/embeddinggemma-300m

1GB

103.74 tok/sEstimated

Auto-generated benchmark

context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16

2GB

103.33 tok/sEstimated

Auto-generated benchmark

google/gemma-2-2b-it

1GB

101.59 tok/sEstimated

Auto-generated benchmark

deepseek-ai/deepseek-coder-1.3b-instruct

2GB

101.33 tok/sEstimated

Auto-generated benchmark

apple/OpenELM-1_1B-Instruct

1GB

101.15 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-3B-Instruct

2GB

100.69 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.2-3B-Instruct

2GB

100.48 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-3B

2GB

100.41 tok/sEstimated

Auto-generated benchmark

nari-labs/Dia2-2B

2GB

99.11 tok/sEstimated

Auto-generated benchmark

google-bert/bert-base-uncased

1GB

97.43 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.2-3B

2GB

96.97 tok/sEstimated

Auto-generated benchmark

unsloth/Llama-3.2-3B-Instruct

2GB

96.79 tok/sEstimated

Auto-generated benchmark

LiquidAI/LFM2-1.2B

1GB

96.38 tok/sEstimated

Auto-generated benchmark

google-t5/t5-3b

2GB

95.46 tok/sEstimated

Auto-generated benchmark

ibm-research/PowerMoE-3b

2GB

95.38 tok/sEstimated

Auto-generated benchmark

tencent/HunyuanOCR

1GB

93.42 tok/sEstimated

Auto-generated benchmark

google/gemma-3-1b-it

1GB

91.38 tok/sEstimated

Auto-generated benchmark

google/gemma-2b

1GB

90.94 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-1.5B

3GB

90.78 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen3-4B-Base

2GB

90.65 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen3-Embedding-4B

2GB

90.40 tok/sEstimated

Auto-generated benchmark

BSC-LT/salamandraTA-7b-instruct

4GB

90.40 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-0.5B-Instruct

3GB

90.35 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-2-7b-chat-hf

4GB

90.30 tok/sEstimated

Auto-generated benchmark

TinyLlama/TinyLlama-1.1B-Chat-v1.0

1GB

90.30 tok/sEstimated

Auto-generated benchmark

microsoft/VibeVoice-1.5B

3GB

90.28 tok/sEstimated

Auto-generated benchmark

black-forest-labs/FLUX.1-dev

4GB

90.20 tok/sEstimated

Auto-generated benchmark

kaitchup/Phi-3-mini-4k-instruct-gptq-4bit

2GB

89.95 tok/sEstimated

Auto-generated benchmark

microsoft/Phi-3.5-mini-instruct

4GB

89.81 tok/sEstimated

Auto-generated benchmark

inference-net/Schematron-3B

2GB

89.81 tok/sEstimated

Auto-generated benchmark

black-forest-labs/FLUX.2-dev

4GB

89.72 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-Guard-3-1B

1GB

89.71 tok/sEstimated

Auto-generated benchmark

facebook/sam3

1GB

89.68 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen3-8B-FP8

4GB

89.54 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-7B-Instruct

4GB

89.42 tok/sEstimated

Auto-generated benchmark

bigscience/bloomz-560m

4GB

89.39 tok/sEstimated

Auto-generated benchmark

google/gemma-3-270m-it

4GB

89.30 tok/sEstimated

Auto-generated benchmark

deepseek-ai/DeepSeek-R1-0528

4GB

89.09 tok/sEstimated

Auto-generated benchmark

meta-llama/Meta-Llama-3-8B

4GB

89.01 tok/sEstimated

Auto-generated benchmark

microsoft/phi-2

4GB

88.84 tok/sEstimated

Auto-generated benchmark

EleutherAI/pythia-70m-deduped

4GB

88.79 tok/sEstimated

Auto-generated benchmark

Note: Performance estimates are calculated. Real results may vary. Methodology · Submit real data

Model compatibility

Model	Quantization	Verdict	Estimated speed	VRAM needed
Gensyn/Qwen2.5-0.5B-Instruct	FP16	Fits comfortably	33.60 tok/sEstimated	11GB (have 16GB)
meta-llama/Llama-3.1-8B-Instruct	Q4	Fits comfortably	85.83 tok/sEstimated	4GB (have 16GB)
meta-llama/Llama-3.1-8B-Instruct	Q8	Fits comfortably	56.91 tok/sEstimated	9GB (have 16GB)
meta-llama/Llama-3.1-8B-Instruct	FP16	Not supported	31.11 tok/sEstimated	17GB (have 16GB)
openai/gpt-oss-20b	Q4	Fits comfortably	41.30 tok/sEstimated	10GB (have 16GB)
openai/gpt-oss-20b	Q8	Not supported	32.62 tok/sEstimated	20GB (have 16GB)
openai/gpt-oss-20b	FP16	Not supported	18.72 tok/sEstimated	41GB (have 16GB)
google/gemma-3-1b-it	Q4	Fits comfortably	91.38 tok/sEstimated	1GB (have 16GB)
Qwen/Qwen3-Embedding-0.6B	FP16	Fits comfortably	32.43 tok/sEstimated	13GB (have 16GB)
Qwen/Qwen2.5-1.5B-Instruct	Q4	Fits comfortably	79.99 tok/sEstimated	3GB (have 16GB)
Qwen/Qwen2.5-1.5B-Instruct	Q8	Fits comfortably	61.02 tok/sEstimated	5GB (have 16GB)
Qwen/Qwen2.5-1.5B-Instruct	FP16	Fits comfortably	28.48 tok/sEstimated	11GB (have 16GB)
facebook/opt-125m	Q4	Fits comfortably	76.26 tok/sEstimated	4GB (have 16GB)
facebook/opt-125m	Q8	Fits comfortably	63.22 tok/sEstimated	7GB (have 16GB)
facebook/opt-125m	FP16	Fits (tight)	33.95 tok/sEstimated	15GB (have 16GB)
TinyLlama/TinyLlama-1.1B-Chat-v1.0	Q8	Fits comfortably	71.37 tok/sEstimated	1GB (have 16GB)
TinyLlama/TinyLlama-1.1B-Chat-v1.0	FP16	Fits comfortably	39.99 tok/sEstimated	2GB (have 16GB)
trl-internal-testing/tiny-Qwen2ForCausalLM-2.5	Q4	Fits comfortably	75.37 tok/sEstimated	4GB (have 16GB)
Qwen/Qwen2.5-3B-Instruct	FP16	Fits comfortably	35.93 tok/sEstimated	6GB (have 16GB)
openai-community/gpt2	Q4	Fits comfortably	77.69 tok/sEstimated	4GB (have 16GB)
openai-community/gpt2	Q8	Fits comfortably	62.81 tok/sEstimated	7GB (have 16GB)
openai-community/gpt2	FP16	Fits (tight)	30.63 tok/sEstimated	15GB (have 16GB)
Qwen/Qwen2.5-7B-Instruct	Q4	Fits comfortably	84.87 tok/sEstimated	4GB (have 16GB)
google/gemma-3-1b-it	Q8	Fits comfortably	68.97 tok/sEstimated	1GB (have 16GB)
google/gemma-3-1b-it	FP16	Fits comfortably	38.43 tok/sEstimated	2GB (have 16GB)
trl-internal-testing/tiny-Qwen2ForCausalLM-2.5	Q8	Fits comfortably	57.23 tok/sEstimated	7GB (have 16GB)
trl-internal-testing/tiny-Qwen2ForCausalLM-2.5	FP16	Fits (tight)	32.51 tok/sEstimated	15GB (have 16GB)
Qwen/Qwen3-4B-Instruct-2507	Q4	Fits comfortably	77.51 tok/sEstimated	2GB (have 16GB)
Qwen/Qwen3-4B-Instruct-2507	Q8	Fits comfortably	54.48 tok/sEstimated	4GB (have 16GB)
Qwen/Qwen3-4B-Instruct-2507	FP16	Fits comfortably	28.51 tok/sEstimated	9GB (have 16GB)
meta-llama/Llama-3.2-1B-Instruct	Q4	Fits comfortably	109.00 tok/sEstimated	1GB (have 16GB)
openai/gpt-oss-120b	FP16	Not supported	6.60 tok/sEstimated	235GB (have 16GB)
mistralai/Mistral-Large-3-675B-Instruct-2512	Q4	Not supported	9.17 tok/sEstimated	378GB (have 16GB)
mistralai/Mistral-Large-3-675B-Instruct-2512	Q8	Not supported	7.22 tok/sEstimated	755GB (have 16GB)
mistralai/Mistral-Large-3-675B-Instruct-2512	FP16	Not supported	3.48 tok/sEstimated	1509GB (have 16GB)
EssentialAI/rnj-1	Q4	Fits comfortably	61.88 tok/sEstimated	5GB (have 16GB)
EssentialAI/rnj-1	Q8	Fits comfortably	44.08 tok/sEstimated	10GB (have 16GB)
Qwen/Qwen2.5-3B-Instruct	Q4	Fits comfortably	100.69 tok/sEstimated	2GB (have 16GB)
RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic	Q8	Not supported	19.46 tok/sEstimated	68GB (have 16GB)
RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic	FP16	Not supported	11.44 tok/sEstimated	137GB (have 16GB)
EssentialAI/rnj-1	FP16	Not supported	21.45 tok/sEstimated	19GB (have 16GB)
meta-llama/Llama-3.2-3B-Instruct	Q4	Fits comfortably	100.48 tok/sEstimated	2GB (have 16GB)
meta-llama/Llama-3.2-3B-Instruct	Q8	Fits comfortably	66.94 tok/sEstimated	3GB (have 16GB)
Qwen/Qwen3-32B	Q4	Fits (tight)	28.18 tok/sEstimated	16GB (have 16GB)
meta-llama/Llama-3.2-3B-Instruct	FP16	Fits comfortably	36.79 tok/sEstimated	6GB (have 16GB)
vikhyatk/moondream2	FP16	Fits (tight)	34.24 tok/sEstimated	15GB (have 16GB)
meta-llama/Llama-3.2-1B	Q4	Fits comfortably	103.83 tok/sEstimated	1GB (have 16GB)
meta-llama/Llama-3.2-1B	Q8	Fits comfortably	67.88 tok/sEstimated	1GB (have 16GB)
meta-llama/Llama-3.2-1B	FP16	Fits comfortably	36.51 tok/sEstimated	2GB (have 16GB)
Gensyn/Qwen2.5-0.5B-Instruct	Q8	Fits comfortably	60.59 tok/sEstimated	5GB (have 16GB)

Gensyn/Qwen2.5-0.5B-InstructFP16

Fits comfortably11GB required · 16GB available

33.60 tok/sEstimated

meta-llama/Llama-3.1-8B-InstructQ4

Fits comfortably4GB required · 16GB available

85.83 tok/sEstimated

meta-llama/Llama-3.1-8B-InstructQ8

Fits comfortably9GB required · 16GB available

56.91 tok/sEstimated

meta-llama/Llama-3.1-8B-InstructFP16

Not supported17GB required · 16GB available

31.11 tok/sEstimated

openai/gpt-oss-20bQ4

Fits comfortably10GB required · 16GB available

41.30 tok/sEstimated

openai/gpt-oss-20bQ8

Not supported20GB required · 16GB available

32.62 tok/sEstimated

openai/gpt-oss-20bFP16

Not supported41GB required · 16GB available

18.72 tok/sEstimated

google/gemma-3-1b-itQ4

Fits comfortably1GB required · 16GB available

91.38 tok/sEstimated

Qwen/Qwen3-Embedding-0.6BFP16

Fits comfortably13GB required · 16GB available

32.43 tok/sEstimated

Qwen/Qwen2.5-1.5B-InstructQ4

Fits comfortably3GB required · 16GB available

79.99 tok/sEstimated

Qwen/Qwen2.5-1.5B-InstructQ8

Fits comfortably5GB required · 16GB available

61.02 tok/sEstimated

Qwen/Qwen2.5-1.5B-InstructFP16

Fits comfortably11GB required · 16GB available

28.48 tok/sEstimated

facebook/opt-125mQ4

Fits comfortably4GB required · 16GB available

76.26 tok/sEstimated

facebook/opt-125mQ8

Fits comfortably7GB required · 16GB available

63.22 tok/sEstimated

facebook/opt-125mFP16

Fits (tight)15GB required · 16GB available

33.95 tok/sEstimated

TinyLlama/TinyLlama-1.1B-Chat-v1.0Q8

Fits comfortably1GB required · 16GB available

71.37 tok/sEstimated

TinyLlama/TinyLlama-1.1B-Chat-v1.0FP16

Fits comfortably2GB required · 16GB available

39.99 tok/sEstimated

trl-internal-testing/tiny-Qwen2ForCausalLM-2.5Q4

Fits comfortably4GB required · 16GB available

75.37 tok/sEstimated

Qwen/Qwen2.5-3B-InstructFP16

Fits comfortably6GB required · 16GB available

35.93 tok/sEstimated

openai-community/gpt2Q4

Fits comfortably4GB required · 16GB available

77.69 tok/sEstimated

openai-community/gpt2Q8

Fits comfortably7GB required · 16GB available

62.81 tok/sEstimated

openai-community/gpt2FP16

Fits (tight)15GB required · 16GB available

30.63 tok/sEstimated

Qwen/Qwen2.5-7B-InstructQ4

Fits comfortably4GB required · 16GB available

84.87 tok/sEstimated

google/gemma-3-1b-itQ8

Fits comfortably1GB required · 16GB available

68.97 tok/sEstimated

google/gemma-3-1b-itFP16

Fits comfortably2GB required · 16GB available

38.43 tok/sEstimated

trl-internal-testing/tiny-Qwen2ForCausalLM-2.5Q8

Fits comfortably7GB required · 16GB available

57.23 tok/sEstimated

trl-internal-testing/tiny-Qwen2ForCausalLM-2.5FP16

Fits (tight)15GB required · 16GB available

32.51 tok/sEstimated

Qwen/Qwen3-4B-Instruct-2507Q4

Fits comfortably2GB required · 16GB available

77.51 tok/sEstimated

Qwen/Qwen3-4B-Instruct-2507Q8

Fits comfortably4GB required · 16GB available

54.48 tok/sEstimated

Qwen/Qwen3-4B-Instruct-2507FP16

Fits comfortably9GB required · 16GB available

28.51 tok/sEstimated

meta-llama/Llama-3.2-1B-InstructQ4

Fits comfortably1GB required · 16GB available

109.00 tok/sEstimated

openai/gpt-oss-120bFP16

Not supported235GB required · 16GB available

6.60 tok/sEstimated

mistralai/Mistral-Large-3-675B-Instruct-2512Q4

Not supported378GB required · 16GB available

9.17 tok/sEstimated

mistralai/Mistral-Large-3-675B-Instruct-2512Q8

Not supported755GB required · 16GB available

7.22 tok/sEstimated

mistralai/Mistral-Large-3-675B-Instruct-2512FP16

Not supported1509GB required · 16GB available

3.48 tok/sEstimated

EssentialAI/rnj-1Q4

Fits comfortably5GB required · 16GB available

61.88 tok/sEstimated

EssentialAI/rnj-1Q8

Fits comfortably10GB required · 16GB available

44.08 tok/sEstimated

Qwen/Qwen2.5-3B-InstructQ4

Fits comfortably2GB required · 16GB available

100.69 tok/sEstimated

RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamicQ8

Not supported68GB required · 16GB available

19.46 tok/sEstimated

RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamicFP16

Not supported137GB required · 16GB available

11.44 tok/sEstimated

EssentialAI/rnj-1FP16

Not supported19GB required · 16GB available

21.45 tok/sEstimated

meta-llama/Llama-3.2-3B-InstructQ4

Fits comfortably2GB required · 16GB available

100.48 tok/sEstimated

meta-llama/Llama-3.2-3B-InstructQ8

Fits comfortably3GB required · 16GB available

66.94 tok/sEstimated

Qwen/Qwen3-32BQ4

Fits (tight)16GB required · 16GB available

28.18 tok/sEstimated

meta-llama/Llama-3.2-3B-InstructFP16

Fits comfortably6GB required · 16GB available

36.79 tok/sEstimated

vikhyatk/moondream2FP16

Fits (tight)15GB required · 16GB available

34.24 tok/sEstimated

meta-llama/Llama-3.2-1BQ4

Fits comfortably1GB required · 16GB available

103.83 tok/sEstimated

meta-llama/Llama-3.2-1BQ8

Fits comfortably1GB required · 16GB available

67.88 tok/sEstimated

meta-llama/Llama-3.2-1BFP16

Fits comfortably2GB required · 16GB available

36.51 tok/sEstimated

Gensyn/Qwen2.5-0.5B-InstructQ8

Fits comfortably5GB required · 16GB available

60.59 tok/sEstimated

Note: Performance estimates are calculated. Real results may vary. Methodology · Submit real data

Alternative GPUs

RX 7900 XT

20GB

Explore how RX 7900 XT stacks up for local inference workloads.

RX 7900 XTX

24GB

Explore how RX 7900 XTX stacks up for local inference workloads.

RTX 3080

10GB

Explore how RTX 3080 stacks up for local inference workloads.

RTX 3090

24GB

Explore how RTX 3090 stacks up for local inference workloads.

RTX 4070

12GB

Explore how RTX 4070 stacks up for local inference workloads.

Compare RX 6900 XT

RX 6900 XT vs RTX 3080

Side-by-side VRAM, throughput, efficiency, and pricing benchmarks for both GPUs.

RX 6900 XT vs RTX 3090

Side-by-side VRAM, throughput, efficiency, and pricing benchmarks for both GPUs.

RX 6900 XT vs RX 7900 XT

Side-by-side VRAM, throughput, efficiency, and pricing benchmarks for both GPUs.