
Quick Answer: The RX 7900 XTX offers 24GB of VRAM and sells for around $899 as of November 2025 (MSRP $999). It delivers an estimated 192 tokens/sec on WeiboAI/VibeThinker-1.5B (Q4) and typically draws 355W under load.

RX 7900 XTX

In Stock
By AMD · Released December 2022 · MSRP $999.00

The RX 7900 XTX gives AMD builders a 24GB option with competitive throughput for 7B–13B LLMs and diffusion workloads. Use ROCm-compatible stacks such as llama.cpp or vLLM (ROCm build).
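As a concrete starting point, here is a minimal llama-cpp-python sketch (one of the ROCm-compatible stacks mentioned above). It assumes you have installed a HIP/ROCm build of llama-cpp-python and already have a Q4 GGUF file; the model path below is a placeholder, not a real download.

```python
# Minimal sketch: load a Q4 GGUF fully offloaded to the 7900 XTX.
# Assumes a ROCm (HIP) build of llama-cpp-python; the model path is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-3.1-8b-instruct-q4_k_m.gguf",  # hypothetical local file
    n_gpu_layers=-1,  # offload every layer to the GPU
    n_ctx=4096,       # modest context; 24GB leaves ample headroom for KV cache
    verbose=False,
)

out = llm("Explain VRAM in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```

With n_gpu_layers=-1 every layer lives on the GPU, so a ~4GB Q4 8B model leaves most of the 24GB free for KV cache and longer contexts.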

Buy on Amazon ($899) · View Benchmarks
Specs snapshot

Key hardware metrics for AI workloads.

  • VRAM: 24GB
  • Cores: 6,144
  • TDP: 355W
  • Architecture: RDNA 3

Where to Buy

Buy directly on Amazon with fast shipping and reliable customer service.

Amazon · In Stock · $899
Buy on Amazon


💡 Not ready to buy? Try cloud GPUs first

Test RX 7900 XTX performance in the cloud before investing in hardware. Pay by the hour with no commitment.

  • Vast.ai: from $0.20/hr
  • RunPod: from $0.30/hr
  • Lambda Labs: enterprise-grade

AI benchmarks

All figures below are auto-generated estimates.

| Model | Quantization | Tokens/sec (estimated) | VRAM used |
|---|---|---|---|
| WeiboAI/VibeThinker-1.5B | Q4 | 191.98 | 1GB |
| Qwen/Qwen2.5-3B-Instruct | Q4 | 191.65 | 2GB |
| ibm-research/PowerMoE-3b | Q4 | 191.32 | 2GB |
| allenai/OLMo-2-0425-1B | Q4 | 190.23 | 1GB |
| ibm-granite/granite-3.3-2b-instruct | Q4 | 188.87 | 1GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | 187.96 | 1GB |
| unsloth/Llama-3.2-3B-Instruct | Q4 | 186.92 | 2GB |
| tencent/HunyuanOCR | Q4 | 183.83 | 1GB |
| apple/OpenELM-1_1B-Instruct | Q4 | 182.32 | 1GB |
| unsloth/Llama-3.2-1B-Instruct | Q4 | 182.10 | 1GB |
| inference-net/Schematron-3B | Q4 | 182.08 | 2GB |
| meta-llama/Llama-Guard-3-1B | Q4 | 180.34 | 1GB |
| unsloth/gemma-3-1b-it | Q4 | 179.86 | 1GB |
| facebook/sam3 | Q4 | 178.70 | 1GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 177.84 | 2GB |
| google/gemma-2b | Q4 | 177.84 | 1GB |
| google-bert/bert-base-uncased | Q4 | 176.40 | 1GB |
| deepseek-ai/DeepSeek-OCR | Q4 | 173.09 | 2GB |
| meta-llama/Llama-3.2-3B | Q4 | 171.53 | 2GB |
| meta-llama/Llama-3.2-1B | Q4 | 171.50 | 1GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | 170.23 | 2GB |
| Qwen/Qwen2.5-3B | Q4 | 169.92 | 2GB |
| google/gemma-2-2b-it | Q4 | 169.47 | 1GB |
| bigcode/starcoder2-3b | Q4 | 167.23 | 2GB |
| LiquidAI/LFM2-1.2B | Q4 | 166.42 | 1GB |
| google/embeddinggemma-300m | Q4 | 162.44 | 1GB |
| microsoft/Phi-4-multimodal-instruct | Q4 | 159.91 | 4GB |
| GSAI-ML/LLaDA-8B-Instruct | Q4 | 159.65 | 4GB |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | 159.52 | 1GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | 159.48 | 2GB |
| petals-team/StableBeluga2 | Q4 | 158.43 | 4GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q4 | 158.16 | 4GB |
| google/gemma-3-1b-it | Q4 | 158.14 | 1GB |
| nari-labs/Dia2-2B | Q4 | 157.92 | 2GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q4 | 157.49 | 3GB |
| google-t5/t5-3b | Q4 | 157.37 | 2GB |
| Qwen/Qwen3-0.6B | Q4 | 156.94 | 3GB |
| microsoft/Phi-3.5-vision-instruct | Q4 | 156.93 | 4GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q4 | 156.86 | 2GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q4 | 156.77 | 2GB |
| meta-llama/Llama-2-7b-chat-hf | Q4 | 156.70 | 4GB |
| meta-llama/Llama-3.1-8B | Q4 | 156.43 | 4GB |
| HuggingFaceH4/zephyr-7b-beta | Q4 | 156.01 | 4GB |
| skt/kogpt2-base-v2 | Q4 | 155.91 | 4GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | 155.34 | 2GB |
| IlyaGusev/saiga_llama3_8b | Q4 | 154.62 | 4GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | 154.60 | 2GB |
| Qwen/Qwen2.5-1.5B | Q4 | 154.56 | 3GB |
| meta-llama/Meta-Llama-3-8B | Q4 | 154.46 | 4GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q4 | 154.43 | 4GB |

Note: Performance estimates are calculated. Real results may vary. Methodology · Submit real data
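The page links its methodology above; as a rough cross-check, single-stream decode on a dense model is usually memory-bandwidth bound, so tokens/sec is approximately memory bandwidth divided by the bytes read per generated token (about the quantized model size). The sketch below uses the 7900 XTX's 960 GB/s spec and an assumed efficiency factor; it is an illustration, not the site's actual formula:

```python
# First-order decode estimate: single-stream generation is usually
# memory-bandwidth bound, so tok/s is roughly bandwidth / bytes-per-token
# (about the quantized model size). Assumed numbers, for illustration only.

RX_7900_XTX_BANDWIDTH_GB_S = 960  # GB/s memory bandwidth (spec sheet value)

def estimate_tok_per_s(model_size_gb: float, efficiency: float = 0.6) -> float:
    """Crude upper bound; efficiency is an assumed fudge factor (0.5-0.7)
    for kernel overhead and cache effects, not a measured constant."""
    return RX_7900_XTX_BANDWIDTH_GB_S / model_size_gb * efficiency

# A 7B model at Q4 occupies roughly 4 GB:
print(f"{estimate_tok_per_s(4.0):.0f} tok/s")  # ~144, near the ~156 listed above
```

For a ~4GB Q4 7B model this gives roughly 144 tok/s, in the same ballpark as the ~156 tok/s estimates in the table.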

Model compatibility

All speeds are auto-generated estimates; the card has 24GB of VRAM available.

| Model | Quantization | Verdict | Estimated speed (tok/s) | VRAM needed |
|---|---|---|---|---|
| HuggingFaceH4/zephyr-7b-beta | Q4 | Fits comfortably | 156.01 | 4GB |
| liuhaotian/llava-v1.5-7b | Q4 | Fits comfortably | 150.77 | 4GB |
| Qwen/Qwen2.5-72B-Instruct | Q8 | Not supported | 20.20 | 70GB |
| Qwen/Qwen2.5-72B-Instruct | FP16 | Not supported | 11.56 | 141GB |
| BSC-LT/salamandraTA-7b-instruct | Q8 | Fits comfortably | 104.26 | 7GB |
| deepseek-ai/deepseek-coder-33b-instruct | Q4 | Fits comfortably | 50.76 | 17GB |
| deepseek-ai/deepseek-coder-33b-instruct | Q8 | Not supported | 35.83 | 34GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | Fits comfortably | 141.65 | 2GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 97.68 | 3GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q8 | Not supported | 21.52 | 78GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking | FP16 | Not supported | 10.51 | 156GB |
| meta-llama/Llama-2-13b-chat-hf | Q4 | Fits comfortably | 102.99 | 7GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | FP16 | Not supported | 27.41 | 61GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q8 | Not supported | 58.16 | 31GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | FP16 | Not supported | 33.16 | 61GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | Fits comfortably | 149.48 | 3GB |
| mistralai/Mistral-Small-Instruct-2409 | Q4 | Fits comfortably | 75.77 | 11GB |
| mistralai/Mistral-Small-Instruct-2409 | Q8 | Fits (tight) | 57.28 | 23GB |
| google/gemma-2-27b-it | Q4 | Fits comfortably | 77.91 | 14GB |
| google/gemma-2-27b-it | Q8 | Not supported | 58.13 | 28GB |
| black-forest-labs/FLUX.2-dev | FP16 | Fits comfortably | 60.22 | 16GB |
| OpenPipe/Qwen3-14B-Instruct | Q8 | Fits comfortably | 82.40 | 14GB |
| OpenPipe/Qwen3-14B-Instruct | FP16 | Not supported | 38.62 | 29GB |
| openai/gpt-oss-120b | Q4 | Not supported | 27.67 | 59GB |
| Qwen/Qwen3-4B | Q4 | Fits comfortably | 132.22 | 2GB |
| OpenPipe/Qwen3-14B-Instruct | Q4 | Fits comfortably | 98.86 | 7GB |
| openai-community/gpt2-xl | Q4 | Fits comfortably | 135.22 | 4GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 117.75 | 3GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q4 | Fits comfortably | 144.19 | 4GB |
| facebook/opt-125m | Q8 | Fits comfortably | 100.64 | 7GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q4 | Fits comfortably | 156.86 | 2GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | FP16 | Fits comfortably | 59.08 | 9GB |
| Qwen/Qwen2.5-1.5B | FP16 | Fits comfortably | 54.08 | 11GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | Fits comfortably | 102.01 | 7GB |
| Qwen/Qwen2.5-14B-Instruct | Q8 | Fits comfortably | 70.33 | 14GB |
| meta-llama/Llama-3.3-70B-Instruct | FP16 | Not supported | 20.73 | 137GB |
| Qwen/Qwen3-Embedding-8B | Q4 | Fits comfortably | 150.33 | 4GB |
| Qwen/Qwen3-Embedding-8B | Q8 | Fits comfortably | 102.05 | 9GB |
| Qwen/Qwen3-14B | Q8 | Fits comfortably | 74.21 | 14GB |
| Qwen/Qwen3-14B | FP16 | Not supported | 38.58 | 29GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | Fits comfortably | 152.24 | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | Fits comfortably | 105.47 | 7GB |
| meta-llama/Llama-2-7b-hf | FP16 | Fits comfortably | 57.52 | 15GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q4 | Fits comfortably | 142.19 | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | Fits comfortably | 109.91 | 9GB |
| Qwen/Qwen2-0.5B | Q4 | Fits comfortably | 139.70 | 3GB |
| Qwen/Qwen2-0.5B | Q8 | Fits comfortably | 101.81 | 5GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | Fits comfortably | 159.48 | 2GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | Fits comfortably | 121.88 | 3GB |
| Qwen/Qwen3-4B-Thinking-2507 | FP16 | Fits comfortably | 59.02 | 9GB |

Note: Performance estimates are calculated. Real results may vary. Methodology · Submit real data
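The verdicts above reduce to a simple capacity check. The sketch below reconstructs that logic under stated assumptions (weights = params × bits/8, ~20% overhead for KV cache and activations, "tight" within ~10% of capacity); the thresholds are inferred from the table, not published by the site.

```python
# Reconstruction of the fit logic implied by the table above. The overhead
# factor and thresholds are assumptions inferred from the verdicts shown,
# not the site's published rules.

CARD_VRAM_GB = 24
BITS_PER_WEIGHT = {"Q4": 4, "Q8": 8, "FP16": 16}

def required_vram_gb(params_b: float, quant: str, overhead: float = 1.2) -> float:
    """Weights (params x bits/8) plus ~20% assumed for KV cache/activations."""
    return params_b * BITS_PER_WEIGHT[quant] / 8 * overhead

def verdict(required_gb: float, have_gb: float = CARD_VRAM_GB) -> str:
    if required_gb > have_gb:
        return "Not supported"
    if required_gb > 0.9 * have_gb:  # within ~10% of capacity, like 23GB vs 24GB
        return "Fits (tight)"
    return "Fits comfortably"

print(verdict(required_vram_gb(70, "Q8")))  # Not supported (~84 GB needed)
print(verdict(required_vram_gb(7, "Q4")))   # Fits comfortably (~4 GB needed)
```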

GPU FAQs

Data-backed answers pulled from community benchmarks, manufacturer specs, and live pricing.

How does Vulkan compare to ROCm on RX 7900 XTX?

On qwen3-30B Q4, Vulkan decode hits ~117 tok/sec once a 32K context fills, while ROCm drops to ~12 tok/sec—making Vulkan the faster option for long prompts.

Source: Reddit – /r/LocalLLaMA (mrdpho0)
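If you want to reproduce this kind of long-context measurement on your own card, a rough llama-cpp-python sketch follows. The backend (Vulkan vs ROCm) is fixed when llama.cpp is compiled, and the GGUF filename below is a placeholder:

```python
# Rough sketch for measuring long-context decode speed with llama-cpp-python.
# Assumes a Vulkan or ROCm build; the model path is a placeholder.
import time
from llama_cpp import Llama

llm = Llama(model_path="./qwen3-30b-a3b-q4_k_m.gguf",  # hypothetical local file
            n_gpu_layers=-1, n_ctx=32768, verbose=False)

prompt = "lorem ipsum " * 5000  # crude filler to occupy much of the 32K window

first = last = None
n_tokens = 0
for _ in llm(prompt, max_tokens=256, stream=True):
    last = time.perf_counter()
    if first is None:
        first = last  # first streamed token marks the end of prompt prefill
    n_tokens += 1

# Decode-only rate: tokens after the first, over time after the first
# (assumes more than one token was generated).
print(f"{(n_tokens - 1) / (last - first):.1f} tok/s decode")
```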

What prompt-prefill speeds can Vulkan deliver?

The same benchmarks show Vulkan prompt prefill at ~486 tok/s on Windows drivers versus ~432 tok/s on ROCm, highlighting the driver advantage.

Source: Reddit – /r/LocalLLaMA (mrdpho0)

Can AMD hardware host 70B Q8 models?

Yes. Builders highlight Ryzen AI 395 mini-PCs with RX 7900-class GPUs that can load 70B Q8 models, something 24 GB NVIDIA cards can't do, though throughput is slower.

Source: Reddit – /r/LocalLLaMA (mqupq0a)

Does FlashAttention accelerate the 7900 XTX?

Not yet—FlashAttention under Vulkan falls back to the CPU on 7900 XTX, so enabling it doesn’t improve throughput the way it does on NVIDIA cards.

Source: Reddit – /r/LocalLLaMA (mrdpho0)

What are the specs and price snapshot?

The RX 7900 XTX offers 24 GB of GDDR6 and a 355 W total board power. As of November 2025, Amazon listed it at $899, in stock.

Source: TechPowerUp – Radeon RX 7900 XTX Specs

Alternative GPUs

Explore how these cards stack up for local inference workloads:

  • RX 7900 XT (20GB)
  • RX 6900 XT (16GB)
  • RTX 4090 (24GB)
  • RTX 4080 (16GB)
  • RTX 4070 Ti (12GB)

Compare RX 7900 XTX

Side-by-side VRAM, throughput, efficiency, and pricing benchmarks:

  • RX 7900 XTX vs RTX 4090
  • RX 7900 XTX vs RTX 4080
  • RX 7900 XTX vs RX 7900 XT