Quick Answer: The RX 6800 XT offers 16GB of VRAM and starts around $579. It delivers approximately 50 tokens/sec on apple/OpenELM-1_1B-Instruct at Q4 and typically draws 300W under load.
This GPU delivers reliable throughput for local AI workloads. Pair it with the right model quantization to hit your target tokens/sec, and monitor prices to catch the best deal.
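The tables below follow a simple sizing rule of thumb: roughly one byte per parameter at Q8 and half a byte per parameter at Q4, plus a small allowance for the KV cache and runtime buffers. Here is a minimal sketch of that arithmetic; the overhead constant is an illustrative assumption, not this site's exact methodology:

```python
def estimated_vram_gb(params_b: float, quant: str, overhead_gb: float = 0.5) -> float:
    """Rough VRAM estimate from parameter count and quantization.

    params_b: model size in billions of parameters.
    quant: "Q4" (~0.5 bytes/param) or "Q8" (~1 byte/param).
    overhead_gb: assumed allowance for KV cache and runtime buffers.
    """
    bytes_per_param = {"Q4": 0.5, "Q8": 1.0}[quant]
    return params_b * bytes_per_param + overhead_gb

# An 8B model at Q4 needs roughly 4.5GB, well within 16GB;
# a 32B model at Q8 needs ~32.5GB and will not fit on this card.
print(estimated_vram_gb(8, "Q4"))   # ~4.5
print(estimated_vram_gb(32, "Q8"))  # ~32.5
```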
| Model | Quantization | Tokens/sec (estimated) | VRAM used |
|---|---|---|---|
| apple/OpenELM-1_1B-Instruct | Q4 | 49.85 tok/s | 1GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | 49.50 tok/s | 1GB |
| meta-llama/Llama-Guard-3-1B | Q4 | 49.33 tok/s | 1GB |
| meta-llama/Llama-3.2-1B | Q4 | 49.01 tok/s | 1GB |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | 48.53 tok/s | 1GB |
| unsloth/Llama-3.2-1B-Instruct | Q4 | 47.43 tok/s | 1GB |
| allenai/OLMo-2-0425-1B | Q4 | 46.23 tok/s | 1GB |
| google/gemma-3-1b-it | Q4 | 45.31 tok/s | 1GB |
| unsloth/gemma-3-1b-it | Q4 | 44.13 tok/s | 1GB |
| google/gemma-2-2b-it | Q4 | 37.58 tok/s | 1GB |
| ibm-granite/granite-3.3-2b-instruct | Q4 | 36.26 tok/s | 1GB |
| unsloth/Llama-3.2-1B-Instruct | Q8 | 34.69 tok/s | 1GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | 34.33 tok/s | 1GB |
| meta-llama/Llama-Guard-3-1B | Q8 | 34.20 tok/s | 1GB |
| LiquidAI/LFM2-1.2B | Q4 | 33.90 tok/s | 1GB |
| apple/OpenELM-1_1B-Instruct | Q8 | 33.85 tok/s | 1GB |
| meta-llama/Llama-3.2-1B | Q8 | 33.27 tok/s | 1GB |
| inference-net/Schematron-3B | Q4 | 33.09 tok/s | 2GB |
| google-t5/t5-3b | Q4 | 33.06 tok/s | 2GB |
| google/gemma-2b | Q4 | 31.98 tok/s | 1GB |
| unsloth/gemma-3-1b-it | Q8 | 31.97 tok/s | 1GB |
| unsloth/Llama-3.2-3B-Instruct | Q4 | 31.59 tok/s | 2GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | 31.45 tok/s | 2GB |
| google/gemma-3-1b-it | Q8 | 30.61 tok/s | 1GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q4 | 30.36 tok/s | 2GB |
| Qwen/Qwen2.5-3B | Q4 | 30.26 tok/s | 2GB |
| bigcode/starcoder2-3b | Q4 | 30.15 tok/s | 2GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q4 | 29.98 tok/s | 2GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 29.98 tok/s | 2GB |
| meta-llama/Llama-3.2-3B | Q4 | 29.54 tok/s | 2GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | 29.01 tok/s | 2GB |
| allenai/OLMo-2-0425-1B | Q8 | 28.90 tok/s | 1GB |
| Qwen/Qwen2.5-3B-Instruct | Q4 | 28.80 tok/s | 2GB |
| meta-llama/Llama-3.2-1B-Instruct | Q8 | 28.68 tok/s | 1GB |
| Qwen/Qwen3-4B | Q4 | 28.61 tok/s | 2GB |
| ibm-research/PowerMoE-3b | Q4 | 28.58 tok/s | 2GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | 28.00 tok/s | 2GB |
| Qwen/Qwen2-0.5B | Q4 | 27.85 tok/s | 3GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | 27.80 tok/s | 3GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q4 | 27.79 tok/s | 2GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q4 | 27.51 tok/s | 3GB |
| Qwen/Qwen2-1.5B-Instruct | Q4 | 27.26 tok/s | 3GB |
| Qwen/Qwen2.5-0.5B | Q4 | 26.89 tok/s | 3GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q4 | 26.24 tok/s | 2GB |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | 26.15 tok/s | 3GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q4 | 26.14 tok/s | 2GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q4 | 26.14 tok/s | 2GB |
| Qwen/Qwen2-0.5B-Instruct | Q4 | 26.01 tok/s | 3GB |
| Qwen/Qwen3-4B-Base | Q4 | 25.54 tok/s | 2GB |
| Qwen/Qwen3-Embedding-4B | Q4 | 24.98 tok/s | 2GB |
| Qwen/Qwen2.5-1.5B | Q4 | 24.77 tok/s | 3GB |
| BSC-LT/salamandraTA-7b-instruct | Q4 | 24.75 tok/s | 4GB |
| microsoft/phi-4 | Q4 | 24.66 tok/s | 4GB |
| google/gemma-2-2b-it | Q8 | 24.59 tok/s | 2GB |
| distilbert/distilgpt2 | Q4 | 24.58 tok/s | 4GB |
| openai-community/gpt2 | Q4 | 24.57 tok/s | 4GB |
| HuggingFaceTB/SmolLM-135M | Q4 | 24.55 tok/s | 4GB |
| openai-community/gpt2-medium | Q4 | 24.52 tok/s | 4GB |
| microsoft/VibeVoice-1.5B | Q4 | 24.48 tok/s | 3GB |
| openai-community/gpt2-large | Q4 | 24.43 tok/s | 4GB |
| LiquidAI/LFM2-1.2B | Q8 | 24.37 tok/s | 2GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q4 | 24.29 tok/s | 4GB |
| zai-org/GLM-4.6-FP8 | Q4 | 24.27 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3 | Q4 | 24.26 tok/s | 4GB |
| rinna/japanese-gpt-neox-small | Q4 | 24.25 tok/s | 4GB |
| HuggingFaceTB/SmolLM2-135M | Q4 | 24.14 tok/s | 4GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q4 | 24.05 tok/s | 3GB |
| openai-community/gpt2-xl | Q4 | 23.87 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q4 | 23.87 tok/s | 3GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | 23.81 tok/s | 3GB |
| Qwen/Qwen3-Embedding-0.6B | Q4 | 23.78 tok/s | 3GB |
| microsoft/Phi-3.5-vision-instruct | Q4 | 23.77 tok/s | 4GB |
| petals-team/StableBeluga2 | Q4 | 23.76 tok/s | 4GB |
| Qwen/Qwen3-Reranker-0.6B | Q4 | 23.66 tok/s | 3GB |
| Qwen/Qwen3-Embedding-8B | Q4 | 23.55 tok/s | 4GB |
| GSAI-ML/LLaDA-8B-Instruct | Q4 | 23.53 tok/s | 4GB |
| Qwen/Qwen3-0.6B-Base | Q4 | 23.52 tok/s | 3GB |
| llamafactory/tiny-random-Llama-3 | Q4 | 23.49 tok/s | 4GB |
| google/gemma-3-270m-it | Q4 | 23.43 tok/s | 4GB |
| Qwen/Qwen2.5-Math-1.5B | Q4 | 23.42 tok/s | 3GB |
| ibm-research/PowerMoE-3b | Q8 | 23.39 tok/s | 3GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q4 | 23.37 tok/s | 4GB |
| rednote-hilab/dots.ocr | Q4 | 23.34 tok/s | 4GB |
| GSAI-ML/LLaDA-8B-Base | Q4 | 23.34 tok/s | 4GB |
| google-t5/t5-3b | Q8 | 23.33 tok/s | 3GB |
| Qwen/Qwen3-8B-Base | Q4 | 23.32 tok/s | 4GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q4 | 23.32 tok/s | 4GB |
| microsoft/Phi-3-mini-128k-instruct | Q4 | 23.19 tok/s | 4GB |
| ibm-granite/granite-3.3-2b-instruct | Q8 | 23.12 tok/s | 2GB |
| deepseek-ai/DeepSeek-V3.1 | Q4 | 23.09 tok/s | 4GB |
| EleutherAI/gpt-neo-125m | Q4 | 23.06 tok/s | 4GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | 23.05 tok/s | 4GB |
| EleutherAI/pythia-70m-deduped | Q4 | 23.04 tok/s | 4GB |
| Qwen/Qwen3-0.6B | Q4 | 23.01 tok/s | 3GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q4 | 22.97 tok/s | 4GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q4 | 22.95 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-0528 | Q4 | 22.74 tok/s | 4GB |
| skt/kogpt2-base-v2 | Q4 | 22.72 tok/s | 4GB |
| google/gemma-2b | Q8 | 22.63 tok/s | 2GB |
| ibm-granite/granite-docling-258M | Q4 | 22.60 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1 | Q4 | 22.49 tok/s | 4GB |
| numind/NuExtract-1.5 | Q4 | 22.49 tok/s | 4GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q4 | 22.48 tok/s | 4GB |
| Qwen/Qwen3-1.7B | Q4 | 22.45 tok/s | 4GB |
| unsloth/Llama-3.2-3B-Instruct | Q8 | 22.39 tok/s | 3GB |
| vikhyatk/moondream2 | Q4 | 22.37 tok/s | 4GB |
| meta-llama/Llama-3.1-8B | Q4 | 22.28 tok/s | 4GB |
| ibm-granite/granite-3.3-8b-instruct | Q4 | 22.27 tok/s | 4GB |
| sshleifer/tiny-gpt2 | Q4 | 22.25 tok/s | 4GB |
| Qwen/Qwen2.5-3B-Instruct | Q8 | 22.24 tok/s | 3GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q4 | 22.23 tok/s | 4GB |
| mistralai/Mistral-7B-v0.1 | Q4 | 22.20 tok/s | 4GB |
| microsoft/Phi-4-mini-instruct | Q4 | 22.16 tok/s | 4GB |
| lmsys/vicuna-7b-v1.5 | Q4 | 22.12 tok/s | 4GB |
| HuggingFaceH4/zephyr-7b-beta | Q4 | 22.04 tok/s | 4GB |
| MiniMaxAI/MiniMax-M2 | Q4 | 22.04 tok/s | 4GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q4 | 22.03 tok/s | 4GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q4 | 21.99 tok/s | 4GB |
| zai-org/GLM-4.5-Air | Q4 | 21.96 tok/s | 4GB |
| dicta-il/dictalm2.0-instruct | Q4 | 21.89 tok/s | 4GB |
| hmellor/tiny-random-LlamaForCausalLM | Q4 | 21.83 tok/s | 4GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q4 | 21.64 tok/s | 4GB |
| microsoft/Phi-4-multimodal-instruct | Q4 | 21.59 tok/s | 4GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q8 | 21.54 tok/s | 3GB |
| meta-llama/Llama-2-7b-chat-hf | Q4 | 21.51 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | 21.51 tok/s | 4GB |
| meta-llama/Meta-Llama-3-8B | Q4 | 21.47 tok/s | 4GB |
| microsoft/DialoGPT-small | Q4 | 21.46 tok/s | 4GB |
| microsoft/DialoGPT-medium | Q4 | 21.42 tok/s | 4GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 21.38 tok/s | 4GB |
| parler-tts/parler-tts-large-v1 | Q4 | 21.35 tok/s | 4GB |
| bigscience/bloomz-560m | Q4 | 21.25 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3-0324 | Q4 | 21.21 tok/s | 4GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | 21.09 tok/s | 4GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q4 | 21.03 tok/s | 4GB |
| Qwen/Qwen3-1.7B-Base | Q4 | 20.90 tok/s | 4GB |
| inference-net/Schematron-3B | Q8 | 20.88 tok/s | 3GB |
| microsoft/phi-2 | Q4 | 20.83 tok/s | 4GB |
| meta-llama/Llama-2-7b-hf | Q4 | 20.80 tok/s | 4GB |
| Qwen/Qwen2.5-3B | Q8 | 20.79 tok/s | 3GB |
| liuhaotian/llava-v1.5-7b | Q4 | 20.73 tok/s | 4GB |
| facebook/opt-125m | Q4 | 20.67 tok/s | 4GB |
| Qwen/Qwen2-7B-Instruct | Q4 | 20.63 tok/s | 4GB |
| Qwen/Qwen3-8B-FP8 | Q4 | 20.63 tok/s | 4GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q4 | 20.56 tok/s | 4GB |
| Qwen/Qwen3-8B | Q4 | 20.54 tok/s | 4GB |
| huggyllama/llama-7b | Q4 | 20.51 tok/s | 4GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | 20.50 tok/s | 3GB |
| Qwen/Qwen2.5-7B | Q4 | 20.39 tok/s | 4GB |
| microsoft/Phi-3-mini-4k-instruct | Q4 | 20.38 tok/s | 4GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q4 | 20.33 tok/s | 5GB |
| Qwen/Qwen3-Embedding-4B | Q8 | 20.23 tok/s | 4GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q4 | 20.14 tok/s | 4GB |
| meta-llama/Llama-Guard-3-8B | Q4 | 20.06 tok/s | 4GB |
| meta-llama/Llama-3.2-3B | Q8 | 20.04 tok/s | 3GB |
| bigcode/starcoder2-3b | Q8 | 19.99 tok/s | 3GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q8 | 19.81 tok/s | 4GB |
| IlyaGusev/saiga_llama3_8b | Q4 | 19.72 tok/s | 4GB |
| Qwen/Qwen3-4B | Q8 | 19.57 tok/s | 4GB |
| Qwen/Qwen3-4B-Base | Q8 | 19.53 tok/s | 4GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q4 | 19.52 tok/s | 4GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q4 | 19.41 tok/s | 4GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | 19.41 tok/s | 3GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | 19.39 tok/s | 4GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | 19.33 tok/s | 4GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | 19.29 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q8 | 19.27 tok/s | 4GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q8 | 19.19 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q8 | 19.12 tok/s | 4GB |
| Qwen/Qwen2.5-Coder-1.5B | Q8 | 18.91 tok/s | 5GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q8 | 18.74 tok/s | 4GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q8 | 18.59 tok/s | 5GB |
| Qwen/Qwen2.5-14B | Q4 | 18.10 tok/s | 7GB |
| Qwen/Qwen2.5-Math-1.5B | Q8 | 18.06 tok/s | 5GB |
| meta-llama/Llama-2-13b-chat-hf | Q4 | 18.04 tok/s | 7GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q8 | 17.96 tok/s | 5GB |
| Qwen/Qwen3-Embedding-0.6B | Q8 | 17.96 tok/s | 6GB |
| Qwen/Qwen2-1.5B-Instruct | Q8 | 17.95 tok/s | 5GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q8 | 17.69 tok/s | 5GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q8 | 17.54 tok/s | 4GB |
| Qwen/Qwen2.5-0.5B | Q8 | 17.29 tok/s | 5GB |
| deepseek-ai/DeepSeek-V3 | Q8 | 17.27 tok/s | 7GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | 17.25 tok/s | 5GB |
| microsoft/Phi-3-mini-128k-instruct | Q8 | 17.23 tok/s | 7GB |
| distilbert/distilgpt2 | Q8 | 17.20 tok/s | 7GB |
| Qwen/Qwen2-0.5B | Q8 | 17.13 tok/s | 5GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q8 | 16.90 tok/s | 5GB |
| ibm-granite/granite-docling-258M | Q8 | 16.86 tok/s | 7GB |
| microsoft/Phi-3-mini-4k-instruct | Q8 | 16.78 tok/s | 7GB |
| Qwen/Qwen3-14B | Q4 | 16.76 tok/s | 7GB |
| Qwen/Qwen2.5-1.5B | Q8 | 16.74 tok/s | 5GB |
| meta-llama/Llama-2-7b-hf | Q8 | 16.73 tok/s | 7GB |
| HuggingFaceH4/zephyr-7b-beta | Q8 | 16.73 tok/s | 7GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q8 | 16.71 tok/s | 7GB |
| microsoft/Phi-4-multimodal-instruct | Q8 | 16.68 tok/s | 7GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | 16.63 tok/s | 7GB |
| OpenPipe/Qwen3-14B-Instruct | Q4 | 16.62 tok/s | 7GB |
| meta-llama/Llama-2-7b-chat-hf | Q8 | 16.57 tok/s | 7GB |
| parler-tts/parler-tts-large-v1 | Q8 | 16.56 tok/s | 7GB |
| bigscience/bloomz-560m | Q8 | 16.56 tok/s | 7GB |
| microsoft/DialoGPT-medium | Q8 | 16.55 tok/s | 7GB |
| microsoft/VibeVoice-1.5B | Q8 | 16.54 tok/s | 5GB |
| HuggingFaceTB/SmolLM-135M | Q8 | 16.50 tok/s | 7GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | 16.49 tok/s | 7GB |
| microsoft/phi-2 | Q8 | 16.47 tok/s | 7GB |
| Qwen/Qwen3-1.7B | Q8 | 16.41 tok/s | 7GB |
| lmsys/vicuna-7b-v1.5 | Q8 | 16.40 tok/s | 7GB |
| EleutherAI/gpt-neo-125m | Q8 | 16.36 tok/s | 7GB |
| Qwen/Qwen3-0.6B-Base | Q8 | 16.36 tok/s | 6GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | 16.31 tok/s | 7GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q4 | 16.31 tok/s | 10GB |
| ai-forever/ruGPT-3.5-13B | Q4 | 16.22 tok/s | 7GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | 16.20 tok/s | 8GB |
| hmellor/tiny-random-LlamaForCausalLM | Q8 | 16.20 tok/s | 7GB |
| EleutherAI/pythia-70m-deduped | Q8 | 16.16 tok/s | 7GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q8 | 16.14 tok/s | 8GB |
| Qwen/Qwen2-0.5B-Instruct | Q8 | 16.13 tok/s | 5GB |
| numind/NuExtract-1.5 | Q8 | 16.12 tok/s | 7GB |
| vikhyatk/moondream2 | Q8 | 16.07 tok/s | 7GB |
| Qwen/Qwen3-14B-Base | Q4 | 16.03 tok/s | 7GB |
| petals-team/StableBeluga2 | Q8 | 15.95 tok/s | 7GB |
| Qwen/Qwen3-8B-Base | Q8 | 15.94 tok/s | 8GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q8 | 15.91 tok/s | 7GB |
| microsoft/Phi-3.5-vision-instruct | Q8 | 15.89 tok/s | 7GB |
| Qwen/Qwen3-Reranker-0.6B | Q8 | 15.85 tok/s | 6GB |
| deepseek-ai/DeepSeek-V3-0324 | Q8 | 15.84 tok/s | 7GB |
| meta-llama/Llama-3.1-8B | Q8 | 15.79 tok/s | 8GB |
| openai/gpt-oss-20b | Q4 | 15.78 tok/s | 10GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q8 | 15.76 tok/s | 7GB |
| dicta-il/dictalm2.0-instruct | Q8 | 15.74 tok/s | 7GB |
| google/gemma-3-270m-it | Q8 | 15.73 tok/s | 7GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q8 | 15.73 tok/s | 7GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q8 | 15.70 tok/s | 8GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q8 | 15.70 tok/s | 7GB |
| rednote-hilab/dots.ocr | Q8 | 15.68 tok/s | 7GB |
| sshleifer/tiny-gpt2 | Q8 | 15.62 tok/s | 7GB |
| Qwen/Qwen3-0.6B | Q8 | 15.56 tok/s | 6GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q8 | 15.50 tok/s | 7GB |
| deepseek-ai/DeepSeek-V3.1 | Q8 | 15.42 tok/s | 7GB |
| zai-org/GLM-4.6-FP8 | Q8 | 15.41 tok/s | 7GB |
| GSAI-ML/LLaDA-8B-Instruct | Q8 | 15.41 tok/s | 8GB |
| deepseek-ai/DeepSeek-R1-0528 | Q8 | 15.40 tok/s | 7GB |
| openai-community/gpt2 | Q8 | 15.37 tok/s | 7GB |
| skt/kogpt2-base-v2 | Q8 | 15.22 tok/s | 7GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q8 | 15.22 tok/s | 8GB |
| openai-community/gpt2-xl | Q8 | 15.19 tok/s | 7GB |
| microsoft/DialoGPT-small | Q8 | 15.17 tok/s | 7GB |
| llamafactory/tiny-random-Llama-3 | Q8 | 15.16 tok/s | 7GB |
| ibm-granite/granite-3.3-8b-instruct | Q8 | 15.07 tok/s | 8GB |
| MiniMaxAI/MiniMax-M2 | Q8 | 15.06 tok/s | 7GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q8 | 15.05 tok/s | 8GB |
| Qwen/Qwen2.5-7B | Q8 | 14.99 tok/s | 7GB |
| openai-community/gpt2-medium | Q8 | 14.99 tok/s | 7GB |
| Qwen/Qwen2-7B-Instruct | Q8 | 14.98 tok/s | 7GB |
| Qwen/Qwen3-8B | Q8 | 14.95 tok/s | 8GB |
| GSAI-ML/LLaDA-8B-Base | Q8 | 14.92 tok/s | 8GB |
| openai-community/gpt2-large | Q8 | 14.87 tok/s | 7GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q8 | 14.85 tok/s | 8GB |
| microsoft/Phi-4-mini-instruct | Q8 | 14.84 tok/s | 7GB |
| BSC-LT/salamandraTA-7b-instruct | Q8 | 14.77 tok/s | 7GB |
| rinna/japanese-gpt-neox-small | Q8 | 14.75 tok/s | 7GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | 14.74 tok/s | 8GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q8 | 14.71 tok/s | 7GB |
| zai-org/GLM-4.5-Air | Q8 | 14.70 tok/s | 7GB |
| mistralai/Mistral-7B-v0.1 | Q8 | 14.69 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | 14.67 tok/s | 7GB |
| unsloth/gpt-oss-20b-BF16 | Q4 | 14.66 tok/s | 10GB |
| Qwen/Qwen3-1.7B-Base | Q8 | 14.65 tok/s | 7GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | 14.61 tok/s | 7GB |
| facebook/opt-125m | Q8 | 14.61 tok/s | 7GB |
| microsoft/phi-4 | Q8 | 14.57 tok/s | 7GB |
| liuhaotian/llava-v1.5-7b | Q8 | 14.57 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1 | Q8 | 14.52 tok/s | 7GB |
| meta-llama/Llama-Guard-3-8B | Q8 | 14.41 tok/s | 8GB |
| meta-llama/Meta-Llama-3-8B | Q8 | 14.37 tok/s | 8GB |
| huggyllama/llama-7b | Q8 | 14.30 tok/s | 7GB |
| HuggingFaceTB/SmolLM2-135M | Q8 | 14.30 tok/s | 7GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q8 | 14.22 tok/s | 7GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q4 | 14.16 tok/s | 10GB |
| Qwen/Qwen3-Embedding-8B | Q8 | 14.14 tok/s | 8GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | 14.03 tok/s | 8GB |
| Qwen/Qwen3-8B-FP8 | Q8 | 13.94 tok/s | 8GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q8 | 13.83 tok/s | 8GB |
| IlyaGusev/saiga_llama3_8b | Q8 | 13.78 tok/s | 8GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q4 | 13.75 tok/s | 15GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q8 | 13.74 tok/s | 9GB |
| meta-llama/Llama-2-13b-chat-hf | Q8 | 13.59 tok/s | 13GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q4 | 13.16 tok/s | 15GB |
| Qwen/Qwen3-30B-A3B | Q4 | 13.15 tok/s | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q4 | 13.09 tok/s | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q4 | 12.89 tok/s | 15GB |
| Qwen/Qwen3-14B | Q8 | 12.82 tok/s | 14GB |
| Qwen/Qwen2.5-32B-Instruct | Q4 | 12.79 tok/s | 16GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q4 | 12.69 tok/s | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q4 | 12.66 tok/s | 15GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q4 | 12.66 tok/s | 16GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q4 | 12.63 tok/s | 15GB |
| Qwen/Qwen3-32B | Q4 | 12.60 tok/s | 16GB |
| ai-forever/ruGPT-3.5-13B | Q8 | 12.41 tok/s | 13GB |
| Qwen/Qwen2.5-14B | Q8 | 12.14 tok/s | 14GB |
| Qwen/Qwen3-14B-Base | Q8 | 11.94 tok/s | 14GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q4 | 11.76 tok/s | 15GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q4 | 11.69 tok/s | 16GB |
| Qwen/Qwen2.5-14B-Instruct | Q8 | 11.49 tok/s | 14GB |
| baichuan-inc/Baichuan-M2-32B | Q4 | 11.48 tok/s | 16GB |
| Qwen/Qwen2.5-32B | Q4 | 11.48 tok/s | 16GB |
| OpenPipe/Qwen3-14B-Instruct | Q8 | 11.31 tok/s | 14GB |
Note: These performance figures are calculated estimates, not measured benchmarks; real results may vary.
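The estimates are easy to sanity-check on your own card. Below is a minimal timing sketch using llama-cpp-python, assuming a llama.cpp build with ROCm/HIP or Vulkan support for AMD GPUs; the GGUF file path is a placeholder, not a file this page provides:

```python
import time
from llama_cpp import Llama  # pip install llama-cpp-python (GPU-enabled build)

# Placeholder path: any Q4 GGUF that fits in 16GB VRAM works here.
# n_gpu_layers=-1 offloads all layers to the GPU.
llm = Llama(model_path="llama-3.2-1b-instruct-q4_k_m.gguf", n_gpu_layers=-1)

start = time.perf_counter()
out = llm("Explain GPU VRAM in one paragraph.", max_tokens=128)
elapsed = time.perf_counter() - start

# The completion dict reports token usage in OpenAI-style fields.
generated = out["usage"]["completion_tokens"]
print(f"{generated / elapsed:.2f} tok/s")
```

Run it a few times and discard the first result, since the initial call includes model load and warm-up time.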
| Model | Quantization | Verdict | Estimated speed | VRAM needed |
|---|---|---|---|---|
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q8 | Not supported | — | 80GB (have 16GB) |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q4 | Not supported | — | 40GB (have 16GB) |
| ai-forever/ruGPT-3.5-13B | Q8 | Fits comfortably | 12.41 tok/s | 13GB (have 16GB) |
| ai-forever/ruGPT-3.5-13B | Q4 | Fits comfortably | 16.22 tok/s | 7GB (have 16GB) |
| baichuan-inc/Baichuan-M2-32B | Q8 | Not supported | — | 32GB (have 16GB) |
| baichuan-inc/Baichuan-M2-32B | Q4 | Fits (tight) | 11.48 tok/s | 16GB (have 16GB) |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q8 | Fits comfortably | 15.70 tok/s | 7GB (have 16GB) |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 23.37 tok/s | 4GB (have 16GB) |
| ibm-granite/granite-3.3-8b-instruct | Q8 | Fits comfortably | 15.07 tok/s | 8GB (have 16GB) |
| ibm-granite/granite-3.3-8b-instruct | Q4 | Fits comfortably | 22.27 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-1.7B-Base | Q8 | Fits comfortably | 14.65 tok/s | 7GB (have 16GB) |
| Qwen/Qwen3-1.7B-Base | Q4 | Fits comfortably | 20.90 tok/s | 4GB (have 16GB) |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q8 | Not supported | — | 20GB (have 16GB) |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q4 | Fits comfortably | 14.16 tok/s | 10GB (have 16GB) |
| BSC-LT/salamandraTA-7b-instruct | Q8 | Fits comfortably | 14.77 tok/s | 7GB (have 16GB) |
| BSC-LT/salamandraTA-7b-instruct | Q4 | Fits comfortably | 24.75 tok/s | 4GB (have 16GB) |
| dicta-il/dictalm2.0-instruct | Q8 | Fits comfortably | 15.74 tok/s | 7GB (have 16GB) |
| dicta-il/dictalm2.0-instruct | Q4 | Fits comfortably | 21.89 tok/s | 4GB (have 16GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q8 | Not supported | — | 30GB (have 16GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q4 | Fits (tight) | 12.69 tok/s | 15GB (have 16GB) |
| GSAI-ML/LLaDA-8B-Base | Q8 | Fits comfortably | 14.92 tok/s | 8GB (have 16GB) |
| GSAI-ML/LLaDA-8B-Base | Q4 | Fits comfortably | 23.34 tok/s | 4GB (have 16GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q8 | Not supported | — | 30GB (have 16GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q4 | Fits (tight) | 12.66 tok/s | 15GB (have 16GB) |
| Qwen/Qwen2-0.5B-Instruct | Q8 | Fits comfortably | 16.13 tok/s | 5GB (have 16GB) |
| Qwen/Qwen2-0.5B-Instruct | Q4 | Fits comfortably | 26.01 tok/s | 3GB (have 16GB) |
| deepseek-ai/DeepSeek-V3 | Q8 | Fits comfortably | 17.27 tok/s | 7GB (have 16GB) |
| deepseek-ai/DeepSeek-V3 | Q4 | Fits comfortably | 24.26 tok/s | 4GB (have 16GB) |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q8 | Fits comfortably | 17.96 tok/s | 5GB (have 16GB) |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | Fits comfortably | 27.80 tok/s | 3GB (have 16GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q8 | Not supported | — | 30GB (have 16GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q4 | Fits (tight) | 12.89 tok/s | 15GB (have 16GB) |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q8 | Not supported | — | 30GB (have 16GB) |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q4 | Fits (tight) | 12.63 tok/s | 15GB (have 16GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q8 | Not supported | — | 30GB (have 16GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q4 | Fits (tight) | 13.09 tok/s | 15GB (have 16GB) |
| AI-MO/Kimina-Prover-72B | Q8 | Not supported | — | 72GB (have 16GB) |
| AI-MO/Kimina-Prover-72B | Q4 | Not supported | — | 36GB (have 16GB) |
| apple/OpenELM-1_1B-Instruct | Q8 | Fits comfortably | 33.85 tok/s | 1GB (have 16GB) |
| apple/OpenELM-1_1B-Instruct | Q4 | Fits comfortably | 49.85 tok/s | 1GB (have 16GB) |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 14.85 tok/s | 8GB (have 16GB) |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 21.64 tok/s | 4GB (have 16GB) |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q8 | Fits comfortably | 13.74 tok/s | 9GB (have 16GB) |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q4 | Fits comfortably | 20.33 tok/s | 5GB (have 16GB) |
| Qwen/Qwen2.5-3B | Q8 | Fits comfortably | 20.79 tok/s | 3GB (have 16GB) |
| Qwen/Qwen2.5-3B | Q4 | Fits comfortably | 30.26 tok/s | 2GB (have 16GB) |
| lmsys/vicuna-7b-v1.5 | Q8 | Fits comfortably | 16.40 tok/s | 7GB (have 16GB) |
| lmsys/vicuna-7b-v1.5 | Q4 | Fits comfortably | 22.12 tok/s | 4GB (have 16GB) |
| meta-llama/Llama-2-13b-chat-hf | Q8 | Fits comfortably | 13.59 tok/s | 13GB (have 16GB) |
| meta-llama/Llama-2-13b-chat-hf | Q4 | Fits comfortably | 18.04 tok/s | 7GB (have 16GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q8 | Not supported | — | 80GB (have 16GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q4 | Not supported | — | 40GB (have 16GB) |
| unsloth/gemma-3-1b-it | Q8 | Fits comfortably | 31.97 tok/s | 1GB (have 16GB) |
| unsloth/gemma-3-1b-it | Q4 | Fits comfortably | 44.13 tok/s | 1GB (have 16GB) |
| bigcode/starcoder2-3b | Q8 | Fits comfortably | 19.99 tok/s | 3GB (have 16GB) |
| bigcode/starcoder2-3b | Q4 | Fits comfortably | 30.15 tok/s | 2GB (have 16GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q8 | Not supported | — | 80GB (have 16GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q4 | Not supported | — | 40GB (have 16GB) |
| ibm-granite/granite-docling-258M | Q8 | Fits comfortably | 16.86 tok/s | 7GB (have 16GB) |
| ibm-granite/granite-docling-258M | Q4 | Fits comfortably | 22.60 tok/s | 4GB (have 16GB) |
| skt/kogpt2-base-v2 | Q8 | Fits comfortably | 15.22 tok/s | 7GB (have 16GB) |
| skt/kogpt2-base-v2 | Q4 | Fits comfortably | 22.72 tok/s | 4GB (have 16GB) |
| google/gemma-3-270m-it | Q8 | Fits comfortably | 15.73 tok/s | 7GB (have 16GB) |
| google/gemma-3-270m-it | Q4 | Fits comfortably | 23.43 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q8 | Fits comfortably | 17.54 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | Fits comfortably | 29.01 tok/s | 2GB (have 16GB) |
| Qwen/Qwen2.5-32B | Q8 | Not supported | — | 32GB (have 16GB) |
| Qwen/Qwen2.5-32B | Q4 | Fits (tight) | 11.48 tok/s | 16GB (have 16GB) |
| parler-tts/parler-tts-large-v1 | Q8 | Fits comfortably | 16.56 tok/s | 7GB (have 16GB) |
| parler-tts/parler-tts-large-v1 | Q4 | Fits comfortably | 21.35 tok/s | 4GB (have 16GB) |
| EleutherAI/pythia-70m-deduped | Q8 | Fits comfortably | 16.16 tok/s | 7GB (have 16GB) |
| EleutherAI/pythia-70m-deduped | Q4 | Fits comfortably | 23.04 tok/s | 4GB (have 16GB) |
| microsoft/VibeVoice-1.5B | Q8 | Fits comfortably | 16.54 tok/s | 5GB (have 16GB) |
| microsoft/VibeVoice-1.5B | Q4 | Fits comfortably | 24.48 tok/s | 3GB (have 16GB) |
| ibm-granite/granite-3.3-2b-instruct | Q8 | Fits comfortably | 23.12 tok/s | 2GB (have 16GB) |
| ibm-granite/granite-3.3-2b-instruct | Q4 | Fits comfortably | 36.26 tok/s | 1GB (have 16GB) |
| Qwen/Qwen2.5-72B-Instruct | Q8 | Not supported | — | 72GB (have 16GB) |
| Qwen/Qwen2.5-72B-Instruct | Q4 | Not supported | — | 36GB (have 16GB) |
| liuhaotian/llava-v1.5-7b | Q8 | Fits comfortably | 14.57 tok/s | 7GB (have 16GB) |
| liuhaotian/llava-v1.5-7b | Q4 | Fits comfortably | 20.73 tok/s | 4GB (have 16GB) |
| google/gemma-2b | Q8 | Fits comfortably | 22.63 tok/s | 2GB (have 16GB) |
| google/gemma-2b | Q4 | Fits comfortably | 31.98 tok/s | 1GB (have 16GB) |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q8 | Fits comfortably | 16.71 tok/s | 7GB (have 16GB) |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q4 | Fits comfortably | 22.95 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-235B-A22B | Q8 | Not supported | — | 235GB (have 16GB) |
| Qwen/Qwen3-235B-A22B | Q4 | Not supported | — | 118GB (have 16GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q8 | Fits comfortably | 13.83 tok/s | 8GB (have 16GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q4 | Fits comfortably | 22.03 tok/s | 4GB (have 16GB) |
| microsoft/Phi-4-mini-instruct | Q8 | Fits comfortably | 14.84 tok/s | 7GB (have 16GB) |
| microsoft/Phi-4-mini-instruct | Q4 | Fits comfortably | 22.16 tok/s | 4GB (have 16GB) |
| llamafactory/tiny-random-Llama-3 | Q8 | Fits comfortably | 15.16 tok/s | 7GB (have 16GB) |
| llamafactory/tiny-random-Llama-3 | Q4 | Fits comfortably | 23.49 tok/s | 4GB (have 16GB) |
| HuggingFaceH4/zephyr-7b-beta | Q8 | Fits comfortably | 16.73 tok/s | 7GB (have 16GB) |
| HuggingFaceH4/zephyr-7b-beta | Q4 | Fits comfortably | 22.04 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-4B-Thinking-2507 | Q8 | Fits comfortably | 19.81 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-4B-Thinking-2507 | Q4 | Fits comfortably | 30.36 tok/s | 2GB (have 16GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q8 | Not supported | — | 30GB (have 16GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q4 | Fits (tight) | 13.75 tok/s | 15GB (have 16GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q8 | Fits comfortably | 15.22 tok/s | 8GB (have 16GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q4 | Fits comfortably | 21.99 tok/s | 4GB (have 16GB) |
| unsloth/Llama-3.2-1B-Instruct | Q8 | Fits comfortably | 34.69 tok/s | 1GB (have 16GB) |
| unsloth/Llama-3.2-1B-Instruct | Q4 | Fits comfortably | 47.43 tok/s | 1GB (have 16GB) |
| GSAI-ML/LLaDA-8B-Instruct | Q8 | Fits comfortably | 15.41 tok/s | 8GB (have 16GB) |
| GSAI-ML/LLaDA-8B-Instruct | Q4 | Fits comfortably | 23.53 tok/s | 4GB (have 16GB) |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q8 | Not supported | — | 90GB (have 16GB) |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q4 | Not supported | — | 45GB (have 16GB) |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q8 | Fits comfortably | 15.50 tok/s | 7GB (have 16GB) |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q4 | Fits comfortably | 24.29 tok/s | 4GB (have 16GB) |
| numind/NuExtract-1.5 | Q8 | Fits comfortably | 16.12 tok/s | 7GB (have 16GB) |
| numind/NuExtract-1.5 | Q4 | Fits comfortably | 22.49 tok/s | 4GB (have 16GB) |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q8 | Fits comfortably | 15.91 tok/s | 7GB (have 16GB) |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q4 | Fits comfortably | 21.03 tok/s | 4GB (have 16GB) |
| hmellor/tiny-random-LlamaForCausalLM | Q8 | Fits comfortably | 16.20 tok/s | 7GB (have 16GB) |
| hmellor/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 21.83 tok/s | 4GB (have 16GB) |
| huggyllama/llama-7b | Q8 | Fits comfortably | 14.30 tok/s | 7GB (have 16GB) |
| huggyllama/llama-7b | Q4 | Fits comfortably | 20.51 tok/s | 4GB (have 16GB) |
| deepseek-ai/DeepSeek-V3-0324 | Q8 | Fits comfortably | 15.84 tok/s | 7GB (have 16GB) |
| deepseek-ai/DeepSeek-V3-0324 | Q4 | Fits comfortably | 21.21 tok/s | 4GB (have 16GB) |
| microsoft/Phi-3-mini-128k-instruct | Q8 | Fits comfortably | 17.23 tok/s | 7GB (have 16GB) |
| microsoft/Phi-3-mini-128k-instruct | Q4 | Fits comfortably | 23.19 tok/s | 4GB (have 16GB) |
| sshleifer/tiny-gpt2 | Q8 | Fits comfortably | 15.62 tok/s | 7GB (have 16GB) |
| sshleifer/tiny-gpt2 | Q4 | Fits comfortably | 22.25 tok/s | 4GB (have 16GB) |
| meta-llama/Llama-Guard-3-8B | Q8 | Fits comfortably | 14.41 tok/s | 8GB (have 16GB) |
| meta-llama/Llama-Guard-3-8B | Q4 | Fits comfortably | 20.06 tok/s | 4GB (have 16GB) |
| openai-community/gpt2-xl | Q8 | Fits comfortably | 15.19 tok/s | 7GB (have 16GB) |
| openai-community/gpt2-xl | Q4 | Fits comfortably | 23.87 tok/s | 4GB (have 16GB) |
| OpenPipe/Qwen3-14B-Instruct | Q8 | Fits comfortably | 11.31 tok/s | 14GB (have 16GB) |
| OpenPipe/Qwen3-14B-Instruct | Q4 | Fits comfortably | 16.62 tok/s | 7GB (have 16GB) |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q8 | Not supported | — | 70GB (have 16GB) |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q4 | Not supported | — | 35GB (have 16GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q8 | Fits comfortably | 19.12 tok/s | 4GB (have 16GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q4 | Fits comfortably | 26.14 tok/s | 2GB (have 16GB) |
| ibm-research/PowerMoE-3b | Q8 | Fits comfortably | 23.39 tok/s | 3GB (have 16GB) |
| ibm-research/PowerMoE-3b | Q4 | Fits comfortably | 28.58 tok/s | 2GB (have 16GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q8 | Fits comfortably | 18.74 tok/s | 4GB (have 16GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q4 | Fits comfortably | 26.24 tok/s | 2GB (have 16GB) |
| unsloth/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 22.39 tok/s | 3GB (have 16GB) |
| unsloth/Llama-3.2-3B-Instruct | Q4 | Fits comfortably | 31.59 tok/s | 2GB (have 16GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q8 | Fits comfortably | 19.27 tok/s | 4GB (have 16GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q4 | Fits comfortably | 26.14 tok/s | 2GB (have 16GB) |
| meta-llama/Llama-3.2-3B | Q8 | Fits comfortably | 20.04 tok/s | 3GB (have 16GB) |
| meta-llama/Llama-3.2-3B | Q4 | Fits comfortably | 29.54 tok/s | 2GB (have 16GB) |
| EleutherAI/gpt-neo-125m | Q8 | Fits comfortably | 16.36 tok/s | 7GB (have 16GB) |
| EleutherAI/gpt-neo-125m | Q4 | Fits comfortably | 23.06 tok/s | 4GB (have 16GB) |
| codellama/CodeLlama-34b-hf | Q8 | Not supported | — | 34GB (have 16GB) |
| codellama/CodeLlama-34b-hf | Q4 | Not supported | — | 17GB (have 16GB) |
| meta-llama/Llama-Guard-3-1B | Q8 | Fits comfortably | 34.20 tok/s | 1GB (have 16GB) |
| meta-llama/Llama-Guard-3-1B | Q4 | Fits comfortably | 49.33 tok/s | 1GB (have 16GB) |
| Qwen/Qwen2-1.5B-Instruct | Q8 | Fits comfortably | 17.95 tok/s | 5GB (have 16GB) |
| Qwen/Qwen2-1.5B-Instruct | Q4 | Fits comfortably | 27.26 tok/s | 3GB (have 16GB) |
| google/gemma-2-2b-it | Q8 | Fits comfortably | 24.59 tok/s | 2GB (have 16GB) |
| google/gemma-2-2b-it | Q4 | Fits comfortably | 37.58 tok/s | 1GB (have 16GB) |
| Qwen/Qwen2.5-14B | Q8 | Fits comfortably | 12.14 tok/s | 14GB (have 16GB) |
| Qwen/Qwen2.5-14B | Q4 | Fits comfortably | 18.10 tok/s | 7GB (have 16GB) |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q8 | Not supported | — | 32GB (have 16GB) |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q4 | Fits (tight) | 11.69 tok/s | 16GB (have 16GB) |
| microsoft/Phi-3.5-mini-instruct | Q8 | Fits comfortably | 16.31 tok/s | 7GB (have 16GB) |
| microsoft/Phi-3.5-mini-instruct | Q4 | Fits comfortably | 21.09 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-4B-Base | Q8 | Fits comfortably | 19.53 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-4B-Base | Q4 | Fits comfortably | 25.54 tok/s | 2GB (have 16GB) |
| Qwen/Qwen2-7B-Instruct | Q8 | Fits comfortably | 14.98 tok/s | 7GB (have 16GB) |
| Qwen/Qwen2-7B-Instruct | Q4 | Fits comfortably | 20.63 tok/s | 4GB (have 16GB) |
| meta-llama/Llama-2-7b-chat-hf | Q8 | Fits comfortably | 16.57 tok/s | 7GB (have 16GB) |
| meta-llama/Llama-2-7b-chat-hf | Q4 | Fits comfortably | 21.51 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-14B-Base | Q8 | Fits comfortably | 11.94 tok/s | 14GB (have 16GB) |
| Qwen/Qwen3-14B-Base | Q4 | Fits comfortably | 16.03 tok/s | 7GB (have 16GB) |
| swiss-ai/Apertus-8B-Instruct-2509 | Q8 | Fits comfortably | 15.70 tok/s | 8GB (have 16GB) |
| swiss-ai/Apertus-8B-Instruct-2509 | Q4 | Fits comfortably | 19.52 tok/s | 4GB (have 16GB) |
| microsoft/Phi-3.5-vision-instruct | Q8 | Fits comfortably | 15.89 tok/s | 7GB (have 16GB) |
| microsoft/Phi-3.5-vision-instruct | Q4 | Fits comfortably | 23.77 tok/s | 4GB (have 16GB) |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q8 | Fits comfortably | 15.73 tok/s | 7GB (have 16GB) |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q4 | Fits comfortably | 22.97 tok/s | 4GB (have 16GB) |
| rinna/japanese-gpt-neox-small | Q8 | Fits comfortably | 14.75 tok/s | 7GB (have 16GB) |
| rinna/japanese-gpt-neox-small | Q4 | Fits comfortably | 24.25 tok/s | 4GB (have 16GB) |
| Qwen/Qwen2.5-Coder-1.5B | Q8 | Fits comfortably | 18.91 tok/s | 5GB (have 16GB) |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | Fits comfortably | 26.15 tok/s | 3GB (have 16GB) |
| IlyaGusev/saiga_llama3_8b | Q8 | Fits comfortably | 13.78 tok/s | 8GB (have 16GB) |
| IlyaGusev/saiga_llama3_8b | Q4 | Fits comfortably | 19.72 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-30B-A3B | Q8 | Not supported | — | 30GB (have 16GB) |
| Qwen/Qwen3-30B-A3B | Q4 | Fits (tight) | 13.15 tok/s | 15GB (have 16GB) |
| deepseek-ai/DeepSeek-R1 | Q8 | Fits comfortably | 14.52 tok/s | 7GB (have 16GB) |
| deepseek-ai/DeepSeek-R1 | Q4 | Fits comfortably | 22.49 tok/s | 4GB (have 16GB) |
| microsoft/DialoGPT-small | Q8 | Fits comfortably | 15.17 tok/s | 7GB (have 16GB) |
| microsoft/DialoGPT-small | Q4 | Fits comfortably | 21.46 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-8B-FP8 | Q8 | Fits comfortably | 13.94 tok/s | 8GB (have 16GB) |
| Qwen/Qwen3-8B-FP8 | Q4 | Fits comfortably | 20.63 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q8 | Not supported | — | 30GB (have 16GB) |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q4 | Fits (tight) | 13.16 tok/s | 15GB (have 16GB) |
| Qwen/Qwen3-Embedding-4B | Q8 | Fits comfortably | 20.23 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-Embedding-4B | Q4 | Fits comfortably | 24.98 tok/s | 2GB (have 16GB) |
| microsoft/Phi-4-multimodal-instruct | Q8 | Fits comfortably | 16.68 tok/s | 7GB (have 16GB) |
| microsoft/Phi-4-multimodal-instruct | Q4 | Fits comfortably | 21.59 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-8B-Base | Q8 | Fits comfortably | 15.94 tok/s | 8GB (have 16GB) |
| Qwen/Qwen3-8B-Base | Q4 | Fits comfortably | 23.32 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-0.6B-Base | Q8 | Fits comfortably | 16.36 tok/s | 6GB (have 16GB) |
| Qwen/Qwen3-0.6B-Base | Q4 | Fits comfortably | 23.52 tok/s | 3GB (have 16GB) |
| openai-community/gpt2-medium | Q8 | Fits comfortably | 14.99 tok/s | 7GB (have 16GB) |
| openai-community/gpt2-medium | Q4 | Fits comfortably | 24.52 tok/s | 4GB (have 16GB) |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q8 | Fits comfortably | 14.22 tok/s | 7GB (have 16GB) |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 23.05 tok/s | 4GB (have 16GB) |
| Qwen/Qwen2.5-Math-1.5B | Q8 | Fits comfortably | 18.06 tok/s | 5GB (have 16GB) |
| Qwen/Qwen2.5-Math-1.5B | Q4 | Fits comfortably | 23.42 tok/s | 3GB (have 16GB) |
| HuggingFaceTB/SmolLM-135M | Q8 | Fits comfortably | 16.50 tok/s | 7GB (have 16GB) |
| HuggingFaceTB/SmolLM-135M | Q4 | Fits comfortably | 24.55 tok/s | 4GB (have 16GB) |
| unsloth/gpt-oss-20b-BF16 | Q8 | Not supported | — | 20GB (have 16GB) |
| unsloth/gpt-oss-20b-BF16 | Q4 | Fits comfortably | 14.66 tok/s | 10GB (have 16GB) |
| meta-llama/Meta-Llama-3-70B-Instruct | Q8 | Not supported | — | 70GB (have 16GB) |
| meta-llama/Meta-Llama-3-70B-Instruct | Q4 | Not supported | — | 35GB (have 16GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 16.14 tok/s | 8GB (have 16GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 20.14 tok/s | 4GB (have 16GB) |
| zai-org/GLM-4.5-Air | Q8 | Fits comfortably | 14.70 tok/s | 7GB (have 16GB) |
| zai-org/GLM-4.5-Air | Q4 | Fits comfortably | 21.96 tok/s | 4GB (have 16GB) |
| mistralai/Mistral-7B-Instruct-v0.1 | Q8 | Fits comfortably | 15.76 tok/s | 7GB (have 16GB) |
| mistralai/Mistral-7B-Instruct-v0.1 | Q4 | Fits comfortably | 23.32 tok/s | 4GB (have 16GB) |
| LiquidAI/LFM2-1.2B | Q8 | Fits comfortably | 24.37 tok/s | 2GB (have 16GB) |
| LiquidAI/LFM2-1.2B | Q4 | Fits comfortably | 33.90 tok/s | 1GB (have 16GB) |
| mistralai/Mistral-7B-v0.1 | Q8 | Fits comfortably | 14.69 tok/s | 7GB (have 16GB) |
| mistralai/Mistral-7B-v0.1 | Q4 | Fits comfortably | 22.20 tok/s | 4GB (have 16GB) |
| Qwen/Qwen2.5-32B-Instruct | Q8 | Not supported | — | 32GB (have 16GB) |
| Qwen/Qwen2.5-32B-Instruct | Q4 | Fits (tight) | 12.79 tok/s | 16GB (have 16GB) |
| deepseek-ai/DeepSeek-R1-0528 | Q8 | Fits comfortably | 15.40 tok/s | 7GB (have 16GB) |
| deepseek-ai/DeepSeek-R1-0528 | Q4 | Fits comfortably | 22.74 tok/s | 4GB (have 16GB) |
| meta-llama/Llama-3.1-8B | Q8 | Fits comfortably | 15.79 tok/s | 8GB (have 16GB) |
| meta-llama/Llama-3.1-8B | Q4 | Fits comfortably | 22.28 tok/s | 4GB (have 16GB) |
| deepseek-ai/DeepSeek-V3.1 | Q8 | Fits comfortably | 15.42 tok/s | 7GB (have 16GB) |
| deepseek-ai/DeepSeek-V3.1 | Q4 | Fits comfortably | 23.09 tok/s | 4GB (have 16GB) |
| microsoft/phi-4 | Q8 | Fits comfortably | 14.57 tok/s | 7GB (have 16GB) |
| microsoft/phi-4 | Q4 | Fits comfortably | 24.66 tok/s | 4GB (have 16GB) |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | Fits comfortably | 19.41 tok/s | 3GB (have 16GB) |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | Fits comfortably | 31.45 tok/s | 2GB (have 16GB) |
| Qwen/Qwen2-0.5B | Q8 | Fits comfortably | 17.13 tok/s | 5GB (have 16GB) |
| Qwen/Qwen2-0.5B | Q4 | Fits comfortably | 27.85 tok/s | 3GB (have 16GB) |
| MiniMaxAI/MiniMax-M2 | Q8 | Fits comfortably | 15.06 tok/s | 7GB (have 16GB) |
| MiniMaxAI/MiniMax-M2 | Q4 | Fits comfortably | 22.04 tok/s | 4GB (have 16GB) |
| microsoft/DialoGPT-medium | Q8 | Fits comfortably | 16.55 tok/s | 7GB (have 16GB) |
| microsoft/DialoGPT-medium | Q4 | Fits comfortably | 21.42 tok/s | 4GB (have 16GB) |
| zai-org/GLM-4.6-FP8 | Q8 | Fits comfortably | 15.41 tok/s | 7GB (have 16GB) |
| zai-org/GLM-4.6-FP8 | Q4 | Fits comfortably | 24.27 tok/s | 4GB (have 16GB) |
| HuggingFaceTB/SmolLM2-135M | Q8 | Fits comfortably | 14.30 tok/s | 7GB (have 16GB) |
| HuggingFaceTB/SmolLM2-135M | Q4 | Fits comfortably | 24.14 tok/s | 4GB (have 16GB) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | Fits comfortably | 14.03 tok/s | 8GB (have 16GB) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q4 | Fits comfortably | 22.23 tok/s | 4GB (have 16GB) |
| meta-llama/Llama-2-7b-hf | Q8 | Fits comfortably | 16.73 tok/s | 7GB (have 16GB) |
| meta-llama/Llama-2-7b-hf | Q4 | Fits comfortably | 20.80 tok/s | 4GB (have 16GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | Fits comfortably | 14.67 tok/s | 7GB (have 16GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | Fits comfortably | 21.51 tok/s | 4GB (have 16GB) |
| microsoft/phi-2 | Q8 | Fits comfortably | 16.47 tok/s | 7GB (have 16GB) |
| microsoft/phi-2 | Q4 | Fits comfortably | 20.83 tok/s | 4GB (have 16GB) |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | Not supported | — | 70GB (have 16GB) |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | Not supported | — | 35GB (have 16GB) |
| Qwen/Qwen2.5-0.5B | Q8 | Fits comfortably | 17.29 tok/s | 5GB (have 16GB) |
| Qwen/Qwen2.5-0.5B | Q4 | Fits comfortably | 26.89 tok/s | 3GB (have 16GB) |
| Qwen/Qwen3-14B | Q8 | Fits comfortably | 12.82 tok/s | 14GB (have 16GB) |
| Qwen/Qwen3-14B | Q4 | Fits comfortably | 16.76 tok/s | 7GB (have 16GB) |
| Qwen/Qwen3-Embedding-8B | Q8 | Fits comfortably | 14.14 tok/s | 8GB (have 16GB) |
| Qwen/Qwen3-Embedding-8B | Q4 | Fits comfortably | 23.55 tok/s | 4GB (have 16GB) |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | Not supported | — | 70GB (have 16GB) |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | Not supported | — | 35GB (have 16GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q8 | Fits comfortably | 15.05 tok/s | 8GB (have 16GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q4 | Fits comfortably | 19.41 tok/s | 4GB (have 16GB) |
| Qwen/Qwen2.5-14B-Instruct | Q8 | Fits comfortably | 11.49 tok/s | 14GB (have 16GB) |
| Qwen/Qwen2.5-14B-Instruct | Q4 | Fits comfortably | 16.63 tok/s | 7GB (have 16GB) |
| Qwen/Qwen2.5-1.5B | Q8 | Fits comfortably | 16.74 tok/s | 5GB (have 16GB) |
| Qwen/Qwen2.5-1.5B | Q4 | Fits comfortably | 24.77 tok/s | 3GB (have 16GB) |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | Fits comfortably | 19.29 tok/s | 4GB (have 16GB) |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q4 | Fits comfortably | 27.79 tok/s | 2GB (have 16GB) |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q8 | Not supported | — | 20GB (have 16GB) |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q4 | Fits comfortably | 16.31 tok/s | 10GB (have 16GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q8 | Fits comfortably | 16.90 tok/s | 5GB (have 16GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q4 | Fits comfortably | 23.87 tok/s | 3GB (have 16GB) |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | Fits comfortably | 14.74 tok/s | 8GB (have 16GB) |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | Fits comfortably | 19.39 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-Reranker-0.6B | Q8 | Fits comfortably | 15.85 tok/s | 6GB (have 16GB) |
| Qwen/Qwen3-Reranker-0.6B | Q4 | Fits comfortably | 23.66 tok/s | 3GB (have 16GB) |
| rednote-hilab/dots.ocr | Q8 | Fits comfortably | 15.68 tok/s | 7GB (have 16GB) |
| rednote-hilab/dots.ocr | Q4 | Fits comfortably | 23.34 tok/s | 4GB (have 16GB) |
| google-t5/t5-3b | Q8 | Fits comfortably | 23.33 tok/s | 3GB (have 16GB) |
| google-t5/t5-3b | Q4 | Fits comfortably | 33.06 tok/s | 2GB (have 16GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q8 | Not supported | — | 30GB (have 16GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q4 | Fits (tight) | 11.76 tok/s | 15GB (have 16GB) |
| Qwen/Qwen3-4B | Q8 | Fits comfortably | 19.57 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-4B | Q4 | Fits comfortably | 28.61 tok/s | 2GB (have 16GB) |
| Qwen/Qwen3-1.7B | Q8 | Fits comfortably | 16.41 tok/s | 7GB (have 16GB) |
| Qwen/Qwen3-1.7B | Q4 | Fits comfortably | 22.45 tok/s | 4GB (have 16GB) |
| openai-community/gpt2-large | Q8 | Fits comfortably | 14.87 tok/s | 7GB (have 16GB) |
| openai-community/gpt2-large | Q4 | Fits comfortably | 24.43 tok/s | 4GB (have 16GB) |
| microsoft/Phi-3-mini-4k-instruct | Q8 | Fits comfortably | 16.78 tok/s | 7GB (have 16GB) |
| microsoft/Phi-3-mini-4k-instruct | Q4 | Fits comfortably | 20.38 tok/s | 4GB (have 16GB) |
| allenai/OLMo-2-0425-1B | Q8 | Fits comfortably | 28.90 tok/s | 1GB (have 16GB) |
| allenai/OLMo-2-0425-1B | Q4 | Fits comfortably | 46.23 tok/s | 1GB (have 16GB) |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q8 | Not supported | — | 80GB (have 16GB) |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q4 | Not supported | — | 40GB (have 16GB) |
| Qwen/Qwen3-32B | Q8 | Not supported | — | 32GB (have 16GB) |
| Qwen/Qwen3-32B | Q4 | Fits (tight) | 12.60 tok/s | 16GB (have 16GB) |
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | Fits comfortably | 17.25 tok/s | 5GB (have 16GB) |
| Qwen/Qwen2.5-0.5B-Instruct | Q4 | Fits comfortably | 24.05 tok/s | 3GB (have 16GB) |
| Qwen/Qwen2.5-7B | Q8 | Fits comfortably | 14.99 tok/s | 7GB (have 16GB) |
| Qwen/Qwen2.5-7B | Q4 | Fits comfortably | 20.39 tok/s | 4GB (have 16GB) |
| meta-llama/Meta-Llama-3-8B | Q8 | Fits comfortably | 14.37 tok/s | 8GB (have 16GB) |
| meta-llama/Meta-Llama-3-8B | Q4 | Fits comfortably | 21.47 tok/s | 4GB (have 16GB) |
| meta-llama/Llama-3.2-1B | Q8 | Fits comfortably | 33.27 tok/s | 1GB (have 16GB) |
| meta-llama/Llama-3.2-1B | Q4 | Fits comfortably | 49.01 tok/s | 1GB (have 16GB) |
| petals-team/StableBeluga2 | Q8 | Fits comfortably | 15.95 tok/s | 7GB (have 16GB) |
| petals-team/StableBeluga2 | Q4 | Fits comfortably | 23.76 tok/s | 4GB (have 16GB) |
| vikhyatk/moondream2 | Q8 | Fits comfortably | 16.07 tok/s | 7GB (have 16GB) |
| vikhyatk/moondream2 | Q4 | Fits comfortably | 22.37 tok/s | 4GB (have 16GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 20.50 tok/s | 3GB (have 16GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | Fits comfortably | 29.98 tok/s | 2GB (have 16GB) |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q8 | Not supported | — | 70GB (have 16GB) |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q4 | Not supported | — | 35GB (have 16GB) |
| distilbert/distilgpt2 | Q8 | Fits comfortably | 17.20 tok/s | 7GB (have 16GB) |
| distilbert/distilgpt2 | Q4 | Fits comfortably | 24.58 tok/s | 4GB (have 16GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q8 | Not supported | — | 32GB (have 16GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q4 | Fits (tight) | 12.66 tok/s | 16GB (have 16GB) |
| inference-net/Schematron-3B | Q8 | Fits comfortably | 20.88 tok/s | 3GB (have 16GB) |
| inference-net/Schematron-3B | Q4 | Fits comfortably | 33.09 tok/s | 2GB (have 16GB) |
| Qwen/Qwen3-8B | Q8 | Fits comfortably | 14.95 tok/s | 8GB (have 16GB) |
| Qwen/Qwen3-8B | Q4 | Fits comfortably | 20.54 tok/s | 4GB (have 16GB) |
| mistralai/Mistral-7B-Instruct-v0.2 | Q8 | Fits comfortably | 14.71 tok/s | 7GB (have 16GB) |
| mistralai/Mistral-7B-Instruct-v0.2 | Q4 | Fits comfortably | 22.48 tok/s | 4GB (have 16GB) |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q8 | Fits comfortably | 21.54 tok/s | 3GB (have 16GB) |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | Fits comfortably | 28.00 tok/s | 2GB (have 16GB) |
| bigscience/bloomz-560m | Q8 | Fits comfortably | 16.56 tok/s | 7GB (have 16GB) |
| bigscience/bloomz-560m | Q4 | Fits comfortably | 21.25 tok/s | 4GB (have 16GB) |
| Qwen/Qwen2.5-3B-Instruct | Q8 | Fits comfortably | 22.24 tok/s | 3GB (have 16GB) |
| Qwen/Qwen2.5-3B-Instruct | Q4 | Fits comfortably | 28.80 tok/s | 2GB (have 16GB) |
| openai/gpt-oss-120b | Q8 | Not supported | — | 120GB (have 16GB) |
| openai/gpt-oss-120b | Q4 | Not supported | — | 60GB (have 16GB) |
| meta-llama/Llama-3.2-1B-Instruct | Q8 | Fits comfortably | 28.68 tok/s | 1GB (have 16GB) |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | Fits comfortably | 48.53 tok/s | 1GB (have 16GB) |
| Qwen/Qwen3-4B-Instruct-2507 | Q8 | Fits comfortably | 19.19 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-4B-Instruct-2507 | Q4 | Fits comfortably | 29.98 tok/s | 2GB (have 16GB) |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | Fits comfortably | 16.49 tok/s | 7GB (have 16GB) |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q4 | Fits comfortably | 20.56 tok/s | 4GB (have 16GB) |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | Fits comfortably | 34.33 tok/s | 1GB (have 16GB) |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | Fits comfortably | 49.50 tok/s | 1GB (have 16GB) |
| facebook/opt-125m | Q8 | Fits comfortably | 14.61 tok/s | 7GB (have 16GB) |
| facebook/opt-125m | Q4 | Fits comfortably | 20.67 tok/s | 4GB (have 16GB) |
| Qwen/Qwen2.5-1.5B-Instruct | Q8 | Fits comfortably | 17.69 tok/s | 5GB (have 16GB) |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | Fits comfortably | 23.81 tok/s | 3GB (have 16GB) |
| Qwen/Qwen3-Embedding-0.6B | Q8 | Fits comfortably | 17.96 tok/s | 6GB (have 16GB) |
| Qwen/Qwen3-Embedding-0.6B | Q4 | Fits comfortably | 23.78 tok/s | 3GB (have 16GB) |
| google/gemma-3-1b-it | Q8 | Fits comfortably | 30.61 tok/s | 1GB (have 16GB) |
| google/gemma-3-1b-it | Q4 | Fits comfortably | 45.31 tok/s | 1GB (have 16GB) |
| openai/gpt-oss-20b | Q8 | Not supported | — | 20GB (have 16GB) |
| openai/gpt-oss-20b | Q4 | Fits comfortably | 15.78 tok/s | 10GB (have 16GB) |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q8 | Not supported | — | 34GB (have 16GB) |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q4 | Not supported | — | 17GB (have 16GB) |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 16.20 tok/s | 8GB (have 16GB) |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 19.33 tok/s | 4GB (have 16GB) |
| Gensyn/Qwen2.5-0.5B-Instruct | Q8 | Fits comfortably | 18.59 tok/s | 5GB (have 16GB) |
| Gensyn/Qwen2.5-0.5B-Instruct | Q4 | Fits comfortably | 27.51 tok/s | 3GB (have 16GB) |
| Qwen/Qwen3-0.6B | Q8 | Fits comfortably | 15.56 tok/s | 6GB (have 16GB) |
| Qwen/Qwen3-0.6B | Q4 | Fits comfortably | 23.01 tok/s | 3GB (have 16GB) |
| Qwen/Qwen2.5-7B-Instruct | Q8 | Fits comfortably | 14.61 tok/s | 7GB (have 16GB) |
| Qwen/Qwen2.5-7B-Instruct | Q4 | Fits comfortably | 21.38 tok/s | 4GB (have 16GB) |
| openai-community/gpt2 | Q8 | Fits comfortably | 15.37 tok/s | 7GB (have 16GB) |
| openai-community/gpt2 | Q4 | Fits comfortably | 24.57 tok/s | 4GB (have 16GB) |
Note: These performance figures are calculated estimates, not measured benchmarks; real results may vary.
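The verdicts in this table reduce to a comparison between estimated VRAM needs and the card's 16GB. Here is a sketch of that decision rule, with thresholds inferred from the table itself (15-16GB needed on a 16GB card reads as "Fits (tight)"); the exact cutoffs are assumptions, not the site's published logic:

```python
def fit_verdict(needed_gb: int, have_gb: int = 16) -> str:
    """Classify a model/quant combo against available VRAM.

    Thresholds inferred from the table: needing more than the card has
    is "Not supported"; within ~1GB of capacity is "Fits (tight)".
    """
    if needed_gb > have_gb:
        return "Not supported"
    if needed_gb >= have_gb - 1:  # assumption: within ~1GB of capacity
        return "Fits (tight)"
    return "Fits comfortably"

print(fit_verdict(32))  # Not supported
print(fit_verdict(16))  # Fits (tight)
print(fit_verdict(8))   # Fits comfortably
```

Tight fits leave little headroom for the KV cache, so long contexts or large batch sizes may still fail on combinations marked "Fits (tight)".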
See also: how the RTX 4060 Ti 16GB, RTX 3080, and RTX 3090 stack up for local inference workloads.