Quick Answer: The Apple M2 Pro pairs its GPU with up to 32GB of unified memory, all of it addressable as VRAM, and typically draws around 25W under load. It delivers approximately 35 tokens/sec on google-t5/t5-3b at Q4 (estimated). Pricing depends on the Mac configuration it ships in, so check current market listings.
This GPU offers reliable throughput for local AI workloads. Pair it with the right model quantization to hit your target tokens/sec: at roughly 35 tok/s, a 500-token response takes about 14 seconds. The table below lists estimated throughput and memory use for popular models at Q4, Q8, and FP16.
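Before scanning the table, it can help to estimate whether a model fits in memory at a given quantization. The following is a minimal sketch, not a definitive formula: it assumes the common bytes-per-parameter heuristic (Q4 ≈ 0.5 bytes, Q8 ≈ 1, FP16 ≈ 2) and an assumed ~20% overhead factor for KV cache and runtime buffers. Actual usage varies with context length, batch size, and runtime.

```python
# Rough VRAM estimate for a model at a given quantization.
# Heuristic only: bytes per parameter plus an assumed ~20% overhead
# for KV cache and activations; real usage depends on the runtime.

BYTES_PER_PARAM = {"Q4": 0.5, "Q8": 1.0, "FP16": 2.0}

def estimate_vram_gb(params_billion: float, quant: str, overhead: float = 1.2) -> float:
    """Return an approximate memory footprint in GB."""
    return params_billion * BYTES_PER_PARAM[quant] * overhead

if __name__ == "__main__":
    # An 8B model at Q4 fits comfortably in 32GB unified memory;
    # at FP16 it already needs roughly 19GB.
    for quant in ("Q4", "Q8", "FP16"):
        print(f"8B @ {quant}: ~{estimate_vram_gb(8, quant):.1f} GB")
```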
All throughput and VRAM figures below are auto-generated estimates, not measured results.

| Model | Quantization | Tokens/sec (estimated) | VRAM used |
|---|---|---|---|
| google-t5/t5-3b | Q4 | 35.25 tok/s | 2GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | 35.23 tok/s | 2GB |
| google/gemma-3-1b-it | Q4 | 35.20 tok/s | 1GB |
| deepseek-ai/DeepSeek-OCR | Q4 | 34.63 tok/s | 2GB |
| facebook/sam3 | Q4 | 34.43 tok/s | 1GB |
| LiquidAI/LFM2-1.2B | Q4 | 34.40 tok/s | 1GB |
| google/gemma-2b | Q4 | 34.28 tok/s | 1GB |
| apple/OpenELM-1_1B-Instruct | Q4 | 33.75 tok/s | 1GB |
| unsloth/gemma-3-1b-it | Q4 | 33.55 tok/s | 1GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 33.51 tok/s | 2GB |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | 33.46 tok/s | 1GB |
| tencent/HunyuanOCR | Q4 | 33.22 tok/s | 1GB |
| bigcode/starcoder2-3b | Q4 | 32.68 tok/s | 2GB |
| ibm-granite/granite-3.3-2b-instruct | Q4 | 32.51 tok/s | 1GB |
| ibm-research/PowerMoE-3b | Q4 | 32.44 tok/s | 2GB |
| meta-llama/Llama-3.2-1B | Q4 | 32.35 tok/s | 1GB |
| nari-labs/Dia2-2B | Q4 | 32.34 tok/s | 2GB |
| google-bert/bert-base-uncased | Q4 | 32.32 tok/s | 1GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | 31.44 tok/s | 2GB |
| Qwen/Qwen2.5-3B-Instruct | Q4 | 31.42 tok/s | 2GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | 30.85 tok/s | 1GB |
| Qwen/Qwen2.5-3B | Q4 | 30.52 tok/s | 2GB |
| WeiboAI/VibeThinker-1.5B | Q4 | 30.43 tok/s | 1GB |
| meta-llama/Llama-Guard-3-1B | Q4 | 30.41 tok/s | 1GB |
| allenai/OLMo-2-0425-1B | Q4 | 30.37 tok/s | 1GB |
| google/gemma-2-2b-it | Q4 | 30.23 tok/s | 1GB |
| unsloth/Llama-3.2-1B-Instruct | Q4 | 30.14 tok/s | 1GB |
| unsloth/Llama-3.2-3B-Instruct | Q4 | 29.96 tok/s | 2GB |
| google/embeddinggemma-300m | Q4 | 29.90 tok/s | 1GB |
| deepseek-ai/DeepSeek-V3.1 | Q4 | 29.43 tok/s | 4GB |
| GSAI-ML/LLaDA-8B-Base | Q4 | 29.43 tok/s | 4GB |
| microsoft/Phi-4-multimodal-instruct | Q4 | 29.36 tok/s | 4GB |
| meta-llama/Llama-3.2-3B | Q4 | 29.32 tok/s | 2GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q4 | 29.28 tok/s | 4GB |
| distilbert/distilgpt2 | Q4 | 29.20 tok/s | 4GB |
| inference-net/Schematron-3B | Q4 | 29.18 tok/s | 2GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | 29.17 tok/s | 4GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q4 | 29.00 tok/s | 4GB |
| zai-org/GLM-4.5-Air | Q4 | 28.96 tok/s | 4GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | 28.88 tok/s | 4GB |
| microsoft/phi-2 | Q4 | 28.86 tok/s | 4GB |
| Qwen/Qwen3-4B-Base | Q4 | 28.85 tok/s | 2GB |
| EleutherAI/gpt-neo-125m | Q4 | 28.82 tok/s | 4GB |
| Qwen/Qwen3-Reranker-0.6B | Q4 | 28.82 tok/s | 3GB |
| Qwen/Qwen3-8B | Q4 | 28.74 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1 | Q4 | 28.67 tok/s | 4GB |
| HuggingFaceTB/SmolLM-135M | Q4 | 28.64 tok/s | 4GB |
| openai-community/gpt2-xl | Q4 | 28.60 tok/s | 4GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | 28.58 tok/s | 2GB |
| microsoft/VibeVoice-1.5B | Q4 | 28.52 tok/s | 3GB |
| deepseek-ai/DeepSeek-R1-0528 | Q4 | 28.45 tok/s | 4GB |
| Qwen/Qwen-Image-Edit-2509 | Q4 | 28.45 tok/s | 4GB |
| meta-llama/Llama-2-7b-hf | Q4 | 28.36 tok/s | 4GB |
| bigscience/bloomz-560m | Q4 | 28.28 tok/s | 4GB |
| Qwen/Qwen2-7B-Instruct | Q4 | 28.22 tok/s | 4GB |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | 28.11 tok/s | 3GB |
| hmellor/tiny-random-LlamaForCausalLM | Q4 | 28.08 tok/s | 4GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | 28.01 tok/s | 2GB |
| Qwen/Qwen2.5-Math-1.5B | Q4 | 27.97 tok/s | 3GB |
| vikhyatk/moondream2 | Q4 | 27.97 tok/s | 4GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q4 | 27.96 tok/s | 3GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q4 | 27.89 tok/s | 4GB |
| dicta-il/dictalm2.0-instruct | Q4 | 27.86 tok/s | 4GB |
| google/gemma-3-270m-it | Q4 | 27.82 tok/s | 4GB |
| microsoft/Phi-3.5-vision-instruct | Q4 | 27.70 tok/s | 4GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 27.64 tok/s | 4GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q4 | 27.61 tok/s | 4GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q4 | 27.58 tok/s | 4GB |
| Qwen/Qwen3-0.6B | Q4 | 27.57 tok/s | 3GB |
| Qwen/Qwen2-1.5B-Instruct | Q4 | 27.56 tok/s | 3GB |
| Qwen/Qwen3-1.7B | Q4 | 27.55 tok/s | 4GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | 27.54 tok/s | 3GB |
| rinna/japanese-gpt-neox-small | Q4 | 27.51 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q4 | 27.45 tok/s | 2GB |
| Qwen/Qwen3-0.6B-Base | Q4 | 27.44 tok/s | 3GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q4 | 27.44 tok/s | 3GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q4 | 27.36 tok/s | 4GB |
| IlyaGusev/saiga_llama3_8b | Q4 | 27.33 tok/s | 4GB |
| openai-community/gpt2-large | Q4 | 27.32 tok/s | 4GB |
| microsoft/Phi-3-mini-4k-instruct | Q4 | 27.21 tok/s | 4GB |
| sshleifer/tiny-gpt2 | Q4 | 27.16 tok/s | 4GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 27.11 tok/s | 4GB |
| Qwen/Qwen3-8B-FP8 | Q4 | 27.05 tok/s | 4GB |
| microsoft/DialoGPT-medium | Q4 | 27.04 tok/s | 4GB |
| skt/kogpt2-base-v2 | Q4 | 27.01 tok/s | 4GB |
| facebook/opt-125m | Q4 | 27.01 tok/s | 4GB |
| EleutherAI/pythia-70m-deduped | Q4 | 27.01 tok/s | 4GB |
| numind/NuExtract-1.5 | Q4 | 26.99 tok/s | 4GB |
| Qwen/Qwen2.5-0.5B | Q4 | 26.97 tok/s | 3GB |
| MiniMaxAI/MiniMax-M2 | Q4 | 26.93 tok/s | 4GB |
| black-forest-labs/FLUX.1-dev | Q4 | 26.84 tok/s | 4GB |
| meta-llama/Llama-Guard-3-8B | Q4 | 26.80 tok/s | 4GB |
| liuhaotian/llava-v1.5-7b | Q4 | 26.75 tok/s | 4GB |
| ibm-granite/granite-docling-258M | Q4 | 26.75 tok/s | 4GB |
| Qwen/Qwen3-Embedding-4B | Q4 | 26.60 tok/s | 2GB |
| BSC-LT/salamandraTA-7b-instruct | Q4 | 26.59 tok/s | 4GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q4 | 26.58 tok/s | 4GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | 26.57 tok/s | 4GB |
| GSAI-ML/LLaDA-8B-Instruct | Q4 | 26.55 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q4 | 26.49 tok/s | 2GB |
| microsoft/Phi-4-mini-instruct | Q4 | 26.47 tok/s | 4GB |
| Qwen/Qwen2-0.5B-Instruct | Q4 | 26.47 tok/s | 3GB |
| Qwen/Qwen2.5-1.5B | Q4 | 26.45 tok/s | 3GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q4 | 26.42 tok/s | 4GB |
| meta-llama/Llama-2-7b-chat-hf | Q4 | 26.42 tok/s | 4GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q4 | 26.41 tok/s | 4GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q4 | 26.41 tok/s | 3GB |
| Qwen/Qwen3-Embedding-0.6B | Q4 | 26.35 tok/s | 3GB |
| openai-community/gpt2-medium | Q4 | 26.32 tok/s | 4GB |
| allenai/Olmo-3-7B-Think | Q4 | 26.28 tok/s | 4GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q4 | 26.23 tok/s | 4GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q4 | 26.22 tok/s | 2GB |
| Qwen/Qwen2-0.5B | Q4 | 26.18 tok/s | 3GB |
| parler-tts/parler-tts-large-v1 | Q4 | 26.13 tok/s | 4GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | 26.01 tok/s | 4GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q4 | 25.99 tok/s | 2GB |
| Qwen/Qwen3-1.7B-Base | Q4 | 25.96 tok/s | 4GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 25.92 tok/s | 2GB |
| petals-team/StableBeluga2 | Q4 | 25.92 tok/s | 4GB |
| black-forest-labs/FLUX.2-dev | Q4 | 25.86 tok/s | 4GB |
| tencent/HunyuanVideo-1.5 | Q4 | 25.80 tok/s | 4GB |
| lmsys/vicuna-7b-v1.5 | Q4 | 25.76 tok/s | 4GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q4 | 25.75 tok/s | 4GB |
| microsoft/Phi-3-mini-128k-instruct | Q4 | 25.55 tok/s | 4GB |
| deepseek-ai/DeepSeek-OCR | Q8 | 25.42 tok/s | 4GB |
| HuggingFaceTB/SmolLM2-135M | Q4 | 25.31 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3 | Q4 | 25.28 tok/s | 4GB |
| mistralai/Mistral-7B-v0.1 | Q4 | 25.18 tok/s | 4GB |
| microsoft/DialoGPT-small | Q4 | 25.05 tok/s | 4GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q4 | 25.04 tok/s | 4GB |
| llamafactory/tiny-random-Llama-3 | Q4 | 24.98 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3-0324 | Q4 | 24.87 tok/s | 4GB |
| openai-community/gpt2 | Q4 | 24.84 tok/s | 4GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | 24.82 tok/s | 3GB |
| google-t5/t5-3b | Q8 | 24.78 tok/s | 3GB |
| Qwen/Qwen3-8B-Base | Q4 | 24.73 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | 24.72 tok/s | 4GB |
| HuggingFaceH4/zephyr-7b-beta | Q4 | 24.68 tok/s | 4GB |
| Qwen/Qwen3-4B | Q4 | 24.68 tok/s | 2GB |
| Tongyi-MAI/Z-Image-Turbo | Q4 | 24.63 tok/s | 4GB |
| meta-llama/Llama-3.1-8B | Q4 | 24.61 tok/s | 4GB |
| huggyllama/llama-7b | Q4 | 24.59 tok/s | 4GB |
| Qwen/Qwen3-Embedding-8B | Q4 | 24.54 tok/s | 4GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q8 | 24.49 tok/s | 3GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q4 | 24.49 tok/s | 4GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q4 | 24.49 tok/s | 4GB |
| unsloth/Llama-3.2-3B-Instruct | Q8 | 24.43 tok/s | 3GB |
| rednote-hilab/dots.ocr | Q4 | 24.41 tok/s | 4GB |
| microsoft/phi-4 | Q4 | 24.38 tok/s | 4GB |
| meta-llama/Meta-Llama-3-8B | Q4 | 24.33 tok/s | 4GB |
| zai-org/GLM-4.6-FP8 | Q4 | 24.32 tok/s | 4GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q4 | 24.26 tok/s | 2GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q4 | 24.25 tok/s | 4GB |
| ibm-granite/granite-3.3-8b-instruct | Q4 | 24.24 tok/s | 4GB |
| Qwen/Qwen2.5-7B | Q4 | 24.22 tok/s | 4GB |
| inference-net/Schematron-3B | Q8 | 24.20 tok/s | 3GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q4 | 24.18 tok/s | 2GB |
| nari-labs/Dia2-2B | Q8 | 24.14 tok/s | 3GB |
| WeiboAI/VibeThinker-1.5B | Q8 | 24.02 tok/s | 2GB |
| meta-llama/Llama-3.2-3B | Q8 | 23.99 tok/s | 3GB |
| meta-llama/Llama-3.2-1B-Instruct | Q8 | 23.58 tok/s | 1GB |
| google-bert/bert-base-uncased | Q8 | 23.49 tok/s | 1GB |
| meta-llama/Llama-3.2-1B | Q8 | 23.29 tok/s | 1GB |
| meta-llama/Llama-Guard-3-1B | Q8 | 23.14 tok/s | 1GB |
| google/gemma-2-2b-it | Q8 | 23.12 tok/s | 2GB |
| Qwen/Qwen2.5-3B-Instruct | Q8 | 22.81 tok/s | 3GB |
| bigcode/starcoder2-3b | Q8 | 22.67 tok/s | 3GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | 22.65 tok/s | 3GB |
| google/embeddinggemma-300m | Q8 | 22.63 tok/s | 1GB |
| ibm-granite/granite-3.3-2b-instruct | Q8 | 22.54 tok/s | 2GB |
| unsloth/Llama-3.2-1B-Instruct | Q8 | 22.53 tok/s | 1GB |
| facebook/sam3 | Q8 | 22.45 tok/s | 1GB |
| ibm-research/PowerMoE-3b | Q8 | 22.32 tok/s | 3GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q4 | 22.04 tok/s | 5GB |
| microsoft/Phi-3-medium-128k-instruct | Q4 | 22.03 tok/s | 7GB |
| Qwen/Qwen3-14B-Base | Q4 | 22.01 tok/s | 7GB |
| Qwen/Qwen3-14B | Q4 | 21.90 tok/s | 7GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | 21.87 tok/s | 4GB |
| mistralai/Ministral-3-14B-Instruct-2512 | Q4 | 21.75 tok/s | 8GB |
| allenai/OLMo-2-0425-1B | Q8 | 21.75 tok/s | 1GB |
| google/gemma-3-1b-it | Q8 | 21.73 tok/s | 1GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | 21.53 tok/s | 7GB |
| Qwen/Qwen2.5-3B | Q8 | 21.51 tok/s | 3GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | 21.46 tok/s | 3GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | 21.37 tok/s | 1GB |
| apple/OpenELM-1_1B-Instruct | Q8 | 21.24 tok/s | 1GB |
| tencent/HunyuanOCR | Q8 | 21.02 tok/s | 2GB |
| unsloth/gemma-3-1b-it | Q8 | 20.85 tok/s | 1GB |
| LiquidAI/LFM2-1.2B | Q8 | 20.82 tok/s | 2GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | 20.80 tok/s | 8GB |
| google/gemma-2b | Q8 | 20.77 tok/s | 2GB |
| openai-community/gpt2-large | Q8 | 20.58 tok/s | 7GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q8 | 20.55 tok/s | 7GB |
| microsoft/Phi-3.5-vision-instruct | Q8 | 20.55 tok/s | 7GB |
| Qwen/Qwen2.5-14B | Q4 | 20.48 tok/s | 7GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q8 | 20.46 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | 20.44 tok/s | 9GB |
| meta-llama/Llama-2-7b-hf | Q8 | 20.43 tok/s | 7GB |
| EssentialAI/rnj-1 | Q4 | 20.43 tok/s | 5GB |
| microsoft/Phi-4-mini-instruct | Q8 | 20.41 tok/s | 7GB |
| openai-community/gpt2 | Q8 | 20.40 tok/s | 7GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q8 | 20.39 tok/s | 5GB |
| Qwen/Qwen3-Embedding-0.6B | Q8 | 20.36 tok/s | 6GB |
| openai-community/gpt2-medium | Q8 | 20.33 tok/s | 7GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q8 | 20.30 tok/s | 7GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q8 | 20.22 tok/s | 4GB |
| HuggingFaceTB/SmolLM2-135M | Q8 | 20.17 tok/s | 7GB |
| microsoft/phi-4 | Q8 | 20.17 tok/s | 7GB |
| Qwen/Qwen2.5-Coder-1.5B | Q8 | 20.14 tok/s | 5GB |
| Qwen/Qwen3-0.6B-Base | Q8 | 20.06 tok/s | 6GB |
| Qwen/Qwen2.5-1.5B | Q8 | 20.05 tok/s | 5GB |
| numind/NuExtract-1.5 | Q8 | 20.03 tok/s | 7GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q8 | 20.01 tok/s | 5GB |
| meta-llama/Llama-2-13b-chat-hf | Q4 | 20.00 tok/s | 7GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q8 | 19.92 tok/s | 4GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q8 | 19.84 tok/s | 7GB |
| deepseek-ai/DeepSeek-V3 | Q8 | 19.84 tok/s | 7GB |
| deepseek-ai/DeepSeek-V3-0324 | Q8 | 19.81 tok/s | 7GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | 19.76 tok/s | 8GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | 19.71 tok/s | 9GB |
| microsoft/VibeVoice-1.5B | Q8 | 19.57 tok/s | 5GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | 19.54 tok/s | 4GB |
| microsoft/phi-2 | Q8 | 19.51 tok/s | 7GB |
| MiniMaxAI/MiniMax-M2 | Q8 | 19.44 tok/s | 7GB |
| BSC-LT/salamandraTA-7b-instruct | Q8 | 19.43 tok/s | 7GB |
| facebook/opt-125m | Q8 | 19.42 tok/s | 7GB |
| zai-org/GLM-4.6-FP8 | Q8 | 19.39 tok/s | 7GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | 19.39 tok/s | 7GB |
| Qwen/Qwen-Image-Edit-2509 | Q8 | 19.38 tok/s | 8GB |
| ibm-granite/granite-3.3-8b-instruct | Q8 | 19.32 tok/s | 9GB |
| mistralai/Mistral-7B-v0.1 | Q8 | 19.28 tok/s | 7GB |
| Qwen/Qwen3-8B-Base | Q8 | 19.28 tok/s | 9GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | 19.27 tok/s | 5GB |
| openai-community/gpt2-xl | Q8 | 19.24 tok/s | 7GB |
| google/gemma-3-270m-it | Q8 | 19.23 tok/s | 7GB |
| rinna/japanese-gpt-neox-small | Q8 | 19.22 tok/s | 7GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q8 | 19.22 tok/s | 9GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q8 | 19.21 tok/s | 7GB |
| meta-llama/Llama-Guard-3-8B | Q8 | 19.10 tok/s | 9GB |
| rednote-hilab/dots.ocr | Q8 | 19.09 tok/s | 7GB |
| OpenPipe/Qwen3-14B-Instruct | Q4 | 19.08 tok/s | 7GB |
| microsoft/Phi-3-mini-128k-instruct | Q8 | 19.06 tok/s | 7GB |
| microsoft/DialoGPT-medium | Q8 | 19.04 tok/s | 7GB |
| HuggingFaceTB/SmolLM-135M | Q8 | 19.00 tok/s | 7GB |
| Qwen/Qwen3-1.7B | Q8 | 18.92 tok/s | 7GB |
| Qwen/Qwen3-4B-Base | Q8 | 18.92 tok/s | 4GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q8 | 18.91 tok/s | 9GB |
| huggyllama/llama-7b | Q8 | 18.90 tok/s | 7GB |
| ibm-granite/granite-docling-258M | Q8 | 18.87 tok/s | 7GB |
| llamafactory/tiny-random-Llama-3 | Q8 | 18.82 tok/s | 7GB |
| deepseek-ai/DeepSeek-V3.1 | Q8 | 18.82 tok/s | 7GB |
| google/gemma-2-9b-it | Q4 | 18.81 tok/s | 5GB |
| meta-llama/Llama-2-7b-chat-hf | Q8 | 18.77 tok/s | 7GB |
| NousResearch/Hermes-3-Llama-3.1-8B | Q4 | 18.75 tok/s | 4GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | 18.75 tok/s | 9GB |
| Qwen/Qwen3-8B | Q8 | 18.74 tok/s | 9GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q8 | 18.72 tok/s | 7GB |
| skt/kogpt2-base-v2 | Q8 | 18.71 tok/s | 7GB |
| EleutherAI/pythia-70m-deduped | Q8 | 18.67 tok/s | 7GB |
| lmsys/vicuna-7b-v1.5 | Q8 | 18.64 tok/s | 7GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q8 | 18.63 tok/s | 7GB |
| Tongyi-MAI/Z-Image-Turbo | Q8 | 18.61 tok/s | 8GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q8 | 18.60 tok/s | 4GB |
| microsoft/Phi-3-mini-4k-instruct | Q8 | 18.60 tok/s | 7GB |
| ai-forever/ruGPT-3.5-13B | Q4 | 18.55 tok/s | 7GB |
| IlyaGusev/saiga_llama3_8b | Q8 | 18.52 tok/s | 9GB |
| Qwen/Qwen3-Reranker-0.6B | Q8 | 18.52 tok/s | 6GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | 18.49 tok/s | 7GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | 18.49 tok/s | 3GB |
| microsoft/DialoGPT-small | Q8 | 18.48 tok/s | 7GB |
| HuggingFaceH4/zephyr-7b-beta | Q8 | 18.36 tok/s | 7GB |
| distilbert/distilgpt2 | Q8 | 18.33 tok/s | 7GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q8 | 18.33 tok/s | 9GB |
| Qwen/Qwen3-8B-FP8 | Q8 | 18.31 tok/s | 9GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | 18.29 tok/s | 7GB |
| sshleifer/tiny-gpt2 | Q8 | 18.25 tok/s | 7GB |
| meta-llama/Meta-Llama-3-8B | Q8 | 18.25 tok/s | 9GB |
| hmellor/tiny-random-LlamaForCausalLM | Q8 | 18.20 tok/s | 7GB |
| Qwen/Qwen2-0.5B | Q8 | 18.17 tok/s | 5GB |
| Qwen/Qwen2.5-7B | Q8 | 18.15 tok/s | 7GB |
| allenai/Olmo-3-7B-Think | Q8 | 18.15 tok/s | 8GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q8 | 18.11 tok/s | 7GB |
| Qwen/Qwen3-Embedding-8B | Q8 | 18.10 tok/s | 9GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q8 | 18.09 tok/s | 9GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q8 | 18.07 tok/s | 9GB |
| deepseek-ai/DeepSeek-R1 | Q8 | 18.03 tok/s | 7GB |
| zai-org/GLM-4.5-Air | Q8 | 17.98 tok/s | 7GB |
| parler-tts/parler-tts-large-v1 | Q8 | 17.96 tok/s | 7GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | 17.93 tok/s | 7GB |
| Qwen/Qwen2.5-Math-1.5B | Q8 | 17.89 tok/s | 5GB |
| meta-llama/Llama-3.1-8B | Q8 | 17.85 tok/s | 9GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q8 | 17.85 tok/s | 5GB |
| black-forest-labs/FLUX.2-dev | Q8 | 17.84 tok/s | 8GB |
| EleutherAI/gpt-neo-125m | Q8 | 17.81 tok/s | 7GB |
| Qwen/Qwen2-1.5B-Instruct | Q8 | 17.78 tok/s | 5GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q8 | 17.76 tok/s | 9GB |
| black-forest-labs/FLUX.1-dev | Q8 | 17.71 tok/s | 8GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q8 | 17.70 tok/s | 4GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q8 | 17.70 tok/s | 4GB |
| GSAI-ML/LLaDA-8B-Base | Q8 | 17.65 tok/s | 9GB |
| microsoft/Phi-4-multimodal-instruct | Q8 | 17.60 tok/s | 7GB |
| petals-team/StableBeluga2 | Q8 | 17.57 tok/s | 7GB |
| bigscience/bloomz-560m | Q8 | 17.53 tok/s | 7GB |
| Qwen/Qwen3-Embedding-4B | Q8 | 17.49 tok/s | 4GB |
| vikhyatk/moondream2 | Q8 | 17.42 tok/s | 7GB |
| Qwen/Qwen2-0.5B-Instruct | Q8 | 17.41 tok/s | 5GB |
| Qwen/Qwen2.5-0.5B | Q8 | 17.37 tok/s | 5GB |
| GSAI-ML/LLaDA-8B-Instruct | Q8 | 17.37 tok/s | 9GB |
| Qwen/Qwen3-0.6B | Q8 | 17.36 tok/s | 6GB |
| Qwen/Qwen2-7B-Instruct | Q8 | 17.34 tok/s | 7GB |
| dicta-il/dictalm2.0-instruct | Q8 | 17.27 tok/s | 7GB |
| tencent/HunyuanVideo-1.5 | Q8 | 17.26 tok/s | 8GB |
| Qwen/Qwen3-1.7B-Base | Q8 | 17.23 tok/s | 7GB |
| liuhaotian/llava-v1.5-7b | Q8 | 17.21 tok/s | 7GB |
| Qwen/Qwen3-4B | Q8 | 17.14 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q8 | 17.11 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-0528 | Q8 | 17.03 tok/s | 7GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | 16.94 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q8 | 16.92 tok/s | 5GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q4 | 16.22 tok/s | 15GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q4 | 16.21 tok/s | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q4 | 15.60 tok/s | 15GB |
| openai/gpt-oss-20b | Q4 | 15.55 tok/s | 10GB |
| EssentialAI/rnj-1 | Q8 | 15.41 tok/s | 10GB |
| mistralai/Ministral-3-14B-Instruct-2512 | Q8 | 15.37 tok/s | 16GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q4 | 15.33 tok/s | 15GB |
| Qwen/Qwen3-30B-A3B | Q4 | 15.06 tok/s | 15GB |
| NousResearch/Hermes-3-Llama-3.1-8B | Q8 | 14.98 tok/s | 9GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q4 | 14.87 tok/s | 15GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q4 | 14.86 tok/s | 10GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q4 | 14.81 tok/s | 15GB |
| Qwen/Qwen3-14B | Q8 | 14.67 tok/s | 14GB |
| unsloth/gpt-oss-20b-BF16 | Q4 | 14.64 tok/s | 10GB |
| openai/gpt-oss-safeguard-20b | Q4 | 14.53 tok/s | 11GB |
| Qwen/Qwen2.5-14B | Q8 | 14.31 tok/s | 14GB |
| mistralai/Mistral-Small-Instruct-2409 | Q4 | 14.28 tok/s | 11GB |
| ai-forever/ruGPT-3.5-13B | Q8 | 14.16 tok/s | 13GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q4 | 14.09 tok/s | 15GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | 14.05 tok/s | 9GB |
| Qwen/Qwen2.5-14B-Instruct | Q8 | 13.97 tok/s | 15GB |
| google/gemma-2-27b-it | Q4 | 13.94 tok/s | 14GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q8 | 13.92 tok/s | 10GB |
| google/gemma-2-9b-it | Q8 | 13.79 tok/s | 10GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q4 | 13.77 tok/s | 15GB |
| OpenPipe/Qwen3-14B-Instruct | Q8 | 13.63 tok/s | 14GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q4 | 13.59 tok/s | 10GB |
| LiquidAI/LFM2-1.2B | FP16 | 13.44 tok/s | 4GB |
| meta-llama/Llama-2-13b-chat-hf | Q8 | 13.37 tok/s | 13GB |
| Qwen/Qwen3-14B-Base | Q8 | 13.34 tok/s | 14GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | FP16 | 13.27 tok/s | 6GB |
| google-t5/t5-3b | FP16 | 13.26 tok/s | 6GB |
| google/gemma-2-2b-it | FP16 | 13.25 tok/s | 4GB |
| meta-llama/Llama-3.2-1B | FP16 | 13.16 tok/s | 2GB |
| microsoft/Phi-3-medium-128k-instruct | Q8 | 13.14 tok/s | 14GB |
| Qwen/Qwen2.5-14B-Instruct | Q8 | 13.07 tok/s | 14GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | FP16 | 13.02 tok/s | 2GB |
| WeiboAI/VibeThinker-1.5B | FP16 | 12.97 tok/s | 4GB |
| Qwen/Qwen2.5-3B-Instruct | FP16 | 12.96 tok/s | 6GB |
| apple/OpenELM-1_1B-Instruct | FP16 | 12.75 tok/s | 2GB |
| ibm-granite/granite-3.3-2b-instruct | FP16 | 12.73 tok/s | 4GB |
| meta-llama/Llama-3.2-3B-Instruct | FP16 | 12.64 tok/s | 6GB |
| nari-labs/Dia2-2B | FP16 | 12.57 tok/s | 5GB |
| Qwen/Qwen2.5-3B | FP16 | 12.43 tok/s | 6GB |
| inference-net/Schematron-3B | FP16 | 12.41 tok/s | 6GB |
| deepseek-ai/DeepSeek-OCR | FP16 | 12.28 tok/s | 7GB |
| google-bert/bert-base-uncased | FP16 | 12.27 tok/s | 1GB |
| unsloth/Llama-3.2-1B-Instruct | FP16 | 12.16 tok/s | 2GB |
| unsloth/gemma-3-1b-it | FP16 | 12.06 tok/s | 2GB |
| ibm-research/PowerMoE-3b | FP16 | 11.91 tok/s | 6GB |
| meta-llama/Llama-Guard-3-1B | FP16 | 11.90 tok/s | 2GB |
| meta-llama/Llama-3.2-1B-Instruct | FP16 | 11.84 tok/s | 2GB |
| allenai/OLMo-2-0425-1B | FP16 | 11.67 tok/s | 2GB |
| google/gemma-3-1b-it | FP16 | 11.67 tok/s | 2GB |
| google/embeddinggemma-300m | FP16 | 11.67 tok/s | 1GB |
| tencent/HunyuanOCR | FP16 | 11.49 tok/s | 3GB |
| unsloth/Llama-3.2-3B-Instruct | FP16 | 11.40 tok/s | 6GB |
| bigcode/starcoder2-3b | FP16 | 11.39 tok/s | 6GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | FP16 | 11.27 tok/s | 6GB |
| meta-llama/Llama-3.2-3B | FP16 | 11.24 tok/s | 6GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | FP16 | 11.18 tok/s | 9GB |
| liuhaotian/llava-v1.5-7b | FP16 | 11.17 tok/s | 15GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | FP16 | 11.15 tok/s | 15GB |
| mistralai/Mistral-Small-Instruct-2409 | Q8 | 11.15 tok/s | 23GB |
| Qwen/Qwen2-7B-Instruct | FP16 | 11.13 tok/s | 15GB |
| distilbert/distilgpt2 | FP16 | 11.10 tok/s | 15GB |
| allenai/Olmo-3-7B-Think | FP16 | 11.09 tok/s | 16GB |
| Tongyi-MAI/Z-Image-Turbo | FP16 | 11.07 tok/s | 16GB |
| google/gemma-2b | FP16 | 11.05 tok/s | 4GB |
| google/gemma-2-27b-it | Q8 | 11.04 tok/s | 28GB |
| facebook/sam3 | FP16 | 11.03 tok/s | 2GB |
| Qwen/Qwen2-1.5B-Instruct | FP16 | 10.99 tok/s | 11GB |
| deepseek-ai/DeepSeek-R1-0528 | FP16 | 10.99 tok/s | 15GB |
| vikhyatk/moondream2 | FP16 | 10.98 tok/s | 15GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | FP16 | 10.98 tok/s | 9GB |
| HuggingFaceH4/zephyr-7b-beta | FP16 | 10.96 tok/s | 15GB |
| GSAI-ML/LLaDA-8B-Base | FP16 | 10.95 tok/s | 17GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q8 | 10.94 tok/s | 31GB |
| mistralai/Mistral-7B-Instruct-v0.2 | FP16 | 10.93 tok/s | 15GB |
| zai-org/GLM-4.5-Air | FP16 | 10.90 tok/s | 15GB |
| microsoft/Phi-3-mini-128k-instruct | FP16 | 10.90 tok/s | 15GB |
| microsoft/DialoGPT-medium | FP16 | 10.90 tok/s | 15GB |
| Qwen/Qwen3-1.7B | FP16 | 10.87 tok/s | 15GB |
| Qwen/Qwen3-8B-Base | FP16 | 10.86 tok/s | 17GB |
| facebook/opt-125m | FP16 | 10.84 tok/s | 15GB |
| google/gemma-3-270m-it | FP16 | 10.83 tok/s | 15GB |
| HuggingFaceTB/SmolLM2-135M | FP16 | 10.83 tok/s | 15GB |
| Qwen/Qwen3-Reranker-0.6B | FP16 | 10.77 tok/s | 13GB |
| meta-llama/Llama-3.1-8B-Instruct | FP16 | 10.75 tok/s | 17GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | FP16 | 10.75 tok/s | 17GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q8 | 10.72 tok/s | 31GB |
| Gensyn/Qwen2.5-0.5B-Instruct | FP16 | 10.71 tok/s | 11GB |
| swiss-ai/Apertus-8B-Instruct-2509 | FP16 | 10.69 tok/s | 17GB |
| meta-llama/Meta-Llama-3-8B | FP16 | 10.68 tok/s | 17GB |
| deepseek-ai/DeepSeek-R1 | FP16 | 10.67 tok/s | 15GB |
| Qwen/Qwen2-0.5B | FP16 | 10.66 tok/s | 11GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q8 | 10.64 tok/s | 31GB |
| deepseek-ai/DeepSeek-V3.1 | FP16 | 10.64 tok/s | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q8 | 10.62 tok/s | 31GB |
| openai/gpt-oss-20b | Q8 | 10.60 tok/s | 20GB |
| bigscience/bloomz-560m | FP16 | 10.59 tok/s | 15GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | FP16 | 10.58 tok/s | 9GB |
| skt/kogpt2-base-v2 | FP16 | 10.58 tok/s | 15GB |
| Qwen/Qwen2.5-1.5B | FP16 | 10.58 tok/s | 11GB |
| Qwen/Qwen3-4B-Thinking-2507 | FP16 | 10.57 tok/s | 9GB |
| Qwen/Qwen2.5-Coder-1.5B | FP16 | 10.55 tok/s | 11GB |
| IlyaGusev/saiga_llama3_8b | FP16 | 10.55 tok/s | 17GB |
| numind/NuExtract-1.5 | FP16 | 10.53 tok/s | 15GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | FP16 | 10.52 tok/s | 15GB |
| EleutherAI/pythia-70m-deduped | FP16 | 10.52 tok/s | 15GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | FP16 | 10.51 tok/s | 9GB |
| microsoft/VibeVoice-1.5B | FP16 | 10.51 tok/s | 11GB |
| MiniMaxAI/MiniMax-M2 | FP16 | 10.48 tok/s | 15GB |
| Qwen/Qwen3-Embedding-4B | FP16 | 10.44 tok/s | 9GB |
| mistralai/Mistral-7B-Instruct-v0.1 | FP16 | 10.43 tok/s | 15GB |
| HuggingFaceTB/SmolLM-135M | FP16 | 10.42 tok/s | 15GB |
| Qwen/Qwen3-4B | FP16 | 10.41 tok/s | 9GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | FP16 | 10.37 tok/s | 9GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | FP16 | 10.36 tok/s | 11GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q8 | 10.36 tok/s | 31GB |
| Qwen/Qwen2.5-1.5B-Instruct | FP16 | 10.34 tok/s | 11GB |
| petals-team/StableBeluga2 | FP16 | 10.33 tok/s | 15GB |
| openai-community/gpt2-medium | FP16 | 10.31 tok/s | 15GB |
| microsoft/Phi-3.5-mini-instruct | FP16 | 10.31 tok/s | 8GB |
| moonshotai/Kimi-K2-Thinking | Q4 | 10.31 tok/s | 489GB |
| 01-ai/Yi-1.5-34B-Chat | Q4 | 10.30 tok/s | 18GB |
| Qwen/Qwen3-32B | Q4 | 10.29 tok/s | 16GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | FP16 | 10.28 tok/s | 17GB |
| deepseek-ai/DeepSeek-V2.5 | Q4 | 10.27 tok/s | 328GB |
| microsoft/Phi-4-mini-instruct | FP16 | 10.26 tok/s | 15GB |
| Qwen/Qwen2.5-7B | FP16 | 10.26 tok/s | 15GB |
| Qwen/Qwen2.5-0.5B | FP16 | 10.25 tok/s | 11GB |
| Qwen/Qwen2.5-7B-Instruct | FP16 | 10.24 tok/s | 16GB |
| Qwen/Qwen2.5-Math-1.5B | FP16 | 10.24 tok/s | 11GB |
| Qwen/Qwen3-30B-A3B | Q8 | 10.21 tok/s | 31GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q8 | 10.17 tok/s | 20GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | FP16 | 10.16 tok/s | 15GB |
| parler-tts/parler-tts-large-v1 | FP16 | 10.14 tok/s | 15GB |
| Qwen/Qwen2.5-32B | Q4 | 10.13 tok/s | 16GB |
| meta-llama/Llama-3.1-8B | FP16 | 10.12 tok/s | 17GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | FP16 | 10.12 tok/s | 11GB |
| BSC-LT/salamandraTA-7b-instruct | FP16 | 10.12 tok/s | 15GB |
| EleutherAI/gpt-neo-125m | FP16 | 10.10 tok/s | 15GB |
| black-forest-labs/FLUX.2-dev | FP16 | 10.07 tok/s | 16GB |
| microsoft/Phi-3-mini-4k-instruct | FP16 | 10.06 tok/s | 15GB |
| openai-community/gpt2 | FP16 | 10.05 tok/s | 15GB |
| codellama/CodeLlama-34b-hf | Q4 | 10.04 tok/s | 17GB |
| rinna/japanese-gpt-neox-small | FP16 | 10.03 tok/s | 15GB |
| Qwen/Qwen3-0.6B-Base | FP16 | 10.03 tok/s | 13GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | FP16 | 10.03 tok/s | 17GB |
| meta-llama/Llama-2-7b-hf | FP16 | 10.00 tok/s | 15GB |
| lmsys/vicuna-7b-v1.5 | FP16 | 9.99 tok/s | 15GB |
| huggyllama/llama-7b | FP16 | 9.98 tok/s | 15GB |
| sshleifer/tiny-gpt2 | FP16 | 9.96 tok/s | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q8 | 9.96 tok/s | 31GB |
| deepseek-ai/DeepSeek-V3 | FP16 | 9.96 tok/s | 15GB |
| openai-community/gpt2-xl | FP16 | 9.95 tok/s | 15GB |
| Qwen/Qwen3-4B-Instruct-2507 | FP16 | 9.93 tok/s | 9GB |
| microsoft/Phi-3.5-vision-instruct | FP16 | 9.92 tok/s | 15GB |
| GSAI-ML/LLaDA-8B-Instruct | FP16 | 9.91 tok/s | 17GB |
| openai-community/gpt2-large | FP16 | 9.90 tok/s | 15GB |
| microsoft/DialoGPT-small | FP16 | 9.90 tok/s | 15GB |
| Qwen/Qwen3-1.7B-Base | FP16 | 9.89 tok/s | 15GB |
| deepseek-ai/DeepSeek-V3-0324 | FP16 | 9.89 tok/s | 15GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | FP16 | 9.87 tok/s | 15GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | FP16 | 9.86 tok/s | 15GB |
| Qwen/Qwen2.5-0.5B-Instruct | FP16 | 9.86 tok/s | 11GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q8 | 9.84 tok/s | 20GB |
| tencent/HunyuanVideo-1.5 | FP16 | 9.78 tok/s | 16GB |
| unsloth/gpt-oss-20b-BF16 | Q8 | 9.78 tok/s | 20GB |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q4 | 9.78 tok/s | 34GB |
| Qwen/Qwen3-Embedding-0.6B | FP16 | 9.76 tok/s | 13GB |
| ibm-granite/granite-3.3-8b-instruct | FP16 | 9.76 tok/s | 17GB |
| Qwen/Qwen3-Embedding-8B | FP16 | 9.75 tok/s | 17GB |
| Qwen/Qwen3-0.6B | FP16 | 9.74 tok/s | 13GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | FP16 | 9.72 tok/s | 17GB |
| Qwen/Qwen2.5-Coder-32B-Instruct | Q4 | 9.69 tok/s | 17GB |
| zai-org/GLM-4.6-FP8 | FP16 | 9.68 tok/s | 15GB |
| Qwen/Qwen2.5-32B-Instruct | Q4 | 9.65 tok/s | 17GB |
| Qwen/Qwen2-0.5B-Instruct | FP16 | 9.63 tok/s | 11GB |
| dicta-il/dictalm2.0-instruct | FP16 | 9.60 tok/s | 15GB |
| Qwen/Qwen3-4B-Base | FP16 | 9.59 tok/s | 9GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | FP16 | 9.58 tok/s | 17GB |
| meta-llama/Meta-Llama-3-8B-Instruct | FP16 | 9.55 tok/s | 17GB |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | 9.54 tok/s | 34GB |
| Qwen/Qwen-Image-Edit-2509 | FP16 | 9.53 tok/s | 16GB |
| openai/gpt-oss-safeguard-20b | Q8 | 9.53 tok/s | 22GB |
| Qwen/Qwen2.5-7B-Instruct | FP16 | 9.50 tok/s | 15GB |
| black-forest-labs/FLUX.1-dev | FP16 | 9.49 tok/s | 16GB |
| meta-llama/Llama-Guard-3-8B | FP16 | 9.48 tok/s | 17GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | FP16 | 9.47 tok/s | 17GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q8 | 9.45 tok/s | 31GB |
| hmellor/tiny-random-LlamaForCausalLM | FP16 | 9.45 tok/s | 15GB |
| llamafactory/tiny-random-Llama-3 | FP16 | 9.44 tok/s | 15GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | FP16 | 9.41 tok/s | 15GB |
| Qwen/Qwen3-8B | FP16 | 9.39 tok/s | 17GB |
| microsoft/Phi-4-multimodal-instruct | FP16 | 9.38 tok/s | 15GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q8 | 9.36 tok/s | 31GB |
| microsoft/Phi-3.5-mini-instruct | FP16 | 9.36 tok/s | 15GB |
| microsoft/phi-2 | FP16 | 9.35 tok/s | 15GB |
| microsoft/phi-4 | FP16 | 9.34 tok/s | 15GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q4 | 9.32 tok/s | 16GB |
| meta-llama/Llama-2-7b-chat-hf | FP16 | 9.32 tok/s | 15GB |
| meta-llama/Meta-Llama-3-70B-Instruct | Q4 | 9.32 tok/s | 34GB |
| Qwen/Qwen3-8B-FP8 | FP16 | 9.31 tok/s | 17GB |
| rednote-hilab/dots.ocr | FP16 | 9.30 tok/s | 15GB |
| meta-llama/Llama-3.2-3B-Instruct | FP16 | 9.28 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | FP16 | 9.26 tok/s | 15GB |
| ibm-granite/granite-docling-258M | FP16 | 9.25 tok/s | 15GB |
| Qwen/Qwen2.5-32B-Instruct | Q4 | 9.22 tok/s | 16GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | FP16 | 9.21 tok/s | 15GB |
| mistralai/Mistral-7B-v0.1 | FP16 | 9.21 tok/s | 15GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q4 | 9.07 tok/s | 16GB |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q4 | 9.05 tok/s | 34GB |
| moonshotai/Kimi-Linear-48B-A3B-Instruct | Q4 | 9.04 tok/s | 25GB |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | 8.92 tok/s | 34GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q4 | 8.82 tok/s | 17GB |
| Qwen/QwQ-32B-Preview | Q4 | 8.72 tok/s | 17GB |
| deepseek-ai/deepseek-coder-33b-instruct | Q4 | 8.69 tok/s | 17GB |
| baichuan-inc/Baichuan-M2-32B | Q4 | 8.67 tok/s | 16GB |
| mistralai/Ministral-3-14B-Instruct-2512 | FP16 | 8.41 tok/s | 32GB |
| EssentialAI/rnj-1 | FP16 | 8.40 tok/s | 19GB |
| google/gemma-2-9b-it | FP16 | 8.19 tok/s | 20GB |
| OpenPipe/Qwen3-14B-Instruct | FP16 | 8.16 tok/s | 29GB |
| meta-llama/Llama-2-13b-chat-hf | FP16 | 8.02 tok/s | 27GB |
| Qwen/Qwen2.5-14B-Instruct | FP16 | 7.93 tok/s | 30GB |
| Qwen/Qwen2.5-14B | FP16 | 7.93 tok/s | 29GB |
| Qwen/Qwen3-14B | FP16 | 7.82 tok/s | 29GB |
| Qwen/Qwen3-14B-Base | FP16 | 7.56 tok/s | 29GB |
| meta-llama/Llama-3.1-8B-Instruct | FP16 | 7.54 tok/s | 17GB |
| microsoft/Phi-3-medium-128k-instruct | FP16 | 7.39 tok/s | 29GB |
| Qwen/Qwen2.5-14B-Instruct | FP16 | 7.37 tok/s | 29GB |
| NousResearch/Hermes-3-Llama-3.1-8B | FP16 | 7.36 tok/s | 17GB |
| ai-forever/ruGPT-3.5-13B | FP16 | 7.32 tok/s | 27GB |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | Q4 | 7.22 tok/s | 69GB |
| baichuan-inc/Baichuan-M2-32B | Q8 | 7.19 tok/s | 33GB |
| Qwen/Qwen2.5-32B-Instruct | Q8 | 7.17 tok/s | 33GB |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q8 | 7.14 tok/s | 68GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | FP16 | 7.08 tok/s | 19GB |
| Qwen/Qwen2.5-32B | Q8 | 7.00 tok/s | 33GB |
| Qwen/QwQ-32B-Preview | Q8 | 6.94 tok/s | 34GB |
| 01-ai/Yi-1.5-34B-Chat | Q8 | 6.91 tok/s | 35GB |
| meta-llama/Meta-Llama-3-70B-Instruct | Q8 | 6.75 tok/s | 68GB |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | 6.73 tok/s | 68GB |
| Qwen/Qwen2.5-Coder-32B-Instruct | Q8 | 6.72 tok/s | 34GB |
| Qwen/Qwen2.5-32B-Instruct | Q8 | 6.53 tok/s | 34GB |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q8 | 6.52 tok/s | 68GB |
| deepseek-ai/DeepSeek-V2.5 | Q8 | 6.52 tok/s | 656GB |
| moonshotai/Kimi-Linear-48B-A3B-Instruct | Q8 | 6.41 tok/s | 50GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q8 | 6.41 tok/s | 33GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q8 | 6.38 tok/s | 35GB |
| moonshotai/Kimi-K2-Thinking | Q8 | 6.37 tok/s | 978GB |
| codellama/CodeLlama-34b-hf | Q8 | 6.34 tok/s | 35GB |
| deepseek-ai/deepseek-coder-33b-instruct | Q8 | 6.28 tok/s | 34GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q8 | 6.15 tok/s | 33GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | FP16 | 6.11 tok/s | 61GB |
| mistralai/Mistral-Small-Instruct-2409 | FP16 | 6.11 tok/s | 46GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | FP16 | 6.11 tok/s | 61GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | FP16 | 6.09 tok/s | 61GB |
| Qwen/Qwen3-32B | Q8 | 5.99 tok/s | 33GB |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | 5.98 tok/s | 68GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | FP16 | 5.92 tok/s | 61GB |
| openai/gpt-oss-20b | FP16 | 5.90 tok/s | 41GB |
| mistralai/Mistral-Large-Instruct-2411 | Q4 | 5.86 tok/s | 60GB |
| Qwen/Qwen3-30B-A3B | FP16 | 5.84 tok/s | 61GB |
| Qwen/Qwen2.5-72B-Instruct | Q4 | 5.83 tok/s | 35GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q4 | 5.79 tok/s | 39GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | FP16 | 5.70 tok/s | 61GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q4 | 5.70 tok/s | 39GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | FP16 | 5.67 tok/s | 61GB |
| google/gemma-2-27b-it | FP16 | 5.66 tok/s | 56GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q4 | 5.63 tok/s | 39GB |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | 5.57 tok/s | 34GB |
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | Q4 | 5.53 tok/s | 34GB |
| NousResearch/Hermes-3-Llama-3.1-70B | Q4 | 5.50 tok/s | 34GB |
| unsloth/gpt-oss-20b-BF16 | FP16 | 5.49 tok/s | 41GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | FP16 | 5.44 tok/s | 61GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q4 | 5.41 tok/s | 44GB |
| openai/gpt-oss-safeguard-20b | FP16 | 5.38 tok/s | 44GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | FP16 | 5.28 tok/s | 41GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | FP16 | 5.26 tok/s | 41GB |
| Qwen/Qwen2.5-72B-Instruct | Q4 | 5.17 tok/s | 36GB |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | 5.12 tok/s | 34GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | FP16 | 5.09 tok/s | 61GB |
| Qwen/Qwen2.5-Math-72B-Instruct | Q4 | 5.06 tok/s | 36GB |
| openai/gpt-oss-120b | Q4 | 5.04 tok/s | 59GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q4 | 5.02 tok/s | 39GB |
| AI-MO/Kimina-Prover-72B | Q4 | 5.00 tok/s | 35GB |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | Q8 | 4.87 tok/s | 138GB |
| deepseek-ai/DeepSeek-Math-V2 | Q4 | 4.25 tok/s | 383GB |
| mistralai/Mistral-Large-Instruct-2411 | Q8 | 4.10 tok/s | 120GB |
| Qwen/Qwen2.5-72B-Instruct | Q8 | 4.07 tok/s | 70GB |
| NousResearch/Hermes-3-Llama-3.1-70B | Q8 | 4.00 tok/s | 69GB |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | 3.94 tok/s | 69GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q8 | 3.94 tok/s | 78GB |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | Q4 | 3.93 tok/s | 115GB |
| 01-ai/Yi-1.5-34B-Chat | FP16 | 3.91 tok/s | 70GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | FP16 | 3.90 tok/s | 70GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | FP16 | 3.89 tok/s | 66GB |
| meta-llama/Llama-3.1-70B-Instruct | FP16 | 3.89 tok/s | 137GB |
| moonshotai/Kimi-K2-Thinking | FP16 | 3.86 tok/s | 1956GB |
| Qwen/Qwen2.5-Math-72B-Instruct | Q8 | 3.84 tok/s | 71GB |
| baichuan-inc/Baichuan-M2-32B | FP16 | 3.81 tok/s | 66GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | FP16 | 3.79 tok/s | 66GB |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | 3.77 tok/s | 69GB |
| meta-llama/Llama-3.3-70B-Instruct | FP16 | 3.73 tok/s | 137GB |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | FP16 | 3.71 tok/s | 137GB |
| Qwen/Qwen2.5-32B | FP16 | 3.71 tok/s | 66GB |
| Qwen/Qwen2.5-32B-Instruct | FP16 | 3.69 tok/s | 66GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q8 | 3.67 tok/s | 78GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q8 | 3.64 tok/s | 88GB |
| meta-llama/Meta-Llama-3-70B-Instruct | FP16 | 3.62 tok/s | 137GB |
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | Q8 | 3.60 tok/s | 69GB |
| deepseek-ai/DeepSeek-V2.5 | FP16 | 3.58 tok/s | 1312GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q8 | 3.55 tok/s | 78GB |
| MiniMaxAI/MiniMax-M1-40k | Q4 | 3.54 tok/s | 255GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q8 | 3.53 tok/s | 78GB |
| codellama/CodeLlama-34b-hf | FP16 | 3.52 tok/s | 70GB |
| openai/gpt-oss-120b | Q8 | 3.49 tok/s | 117GB |
| deepseek-ai/deepseek-coder-33b-instruct | FP16 | 3.48 tok/s | 68GB |
| moonshotai/Kimi-Linear-48B-A3B-Instruct | FP16 | 3.45 tok/s | 101GB |
| AI-MO/Kimina-Prover-72B | Q8 | 3.45 tok/s | 70GB |
| Qwen/QwQ-32B-Preview | FP16 | 3.40 tok/s | 67GB |
| Qwen/Qwen3-32B | FP16 | 3.39 tok/s | 66GB |
| Qwen/Qwen2.5-Coder-32B-Instruct | FP16 | 3.39 tok/sEstimated Auto-generated benchmark | 67GB |
| Qwen/Qwen2.5-72B-Instruct | Q8 | 3.39 tok/sEstimated Auto-generated benchmark | 71GB |
| Qwen/Qwen2.5-32B-Instruct | FP16 | 3.38 tok/sEstimated Auto-generated benchmark | 67GB |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | FP16 | 3.29 tok/sEstimated Auto-generated benchmark | 137GB |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q4 | 3.04 tok/sEstimated Auto-generated benchmark | 378GB |
| MiniMaxAI/MiniMax-VL-01 | Q4 | 3.02 tok/sEstimated Auto-generated benchmark | 256GB |
| Qwen/Qwen3-235B-A22B | Q4 | 3.02 tok/sEstimated Auto-generated benchmark | 115GB |
| deepseek-ai/DeepSeek-Math-V2 | Q8 | 2.93 tok/sEstimated Auto-generated benchmark | 766GB |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | Q8 | 2.81 tok/sEstimated Auto-generated benchmark | 231GB |
| MiniMaxAI/MiniMax-M1-40k | Q8 | 2.42 tok/sEstimated Auto-generated benchmark | 510GB |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | FP16 | 2.39 tok/sEstimated Auto-generated benchmark | 275GB |
| MiniMaxAI/MiniMax-VL-01 | Q8 | 2.34 tok/sEstimated Auto-generated benchmark | 511GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct | FP16 | 2.23 tok/sEstimated Auto-generated benchmark | 156GB |
| Qwen/Qwen2.5-72B-Instruct | FP16 | 2.20 tok/sEstimated Auto-generated benchmark | 141GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | FP16 | 2.19 tok/sEstimated Auto-generated benchmark | 176GB |
| openai/gpt-oss-120b | FP16 | 2.18 tok/sEstimated Auto-generated benchmark | 235GB |
| Qwen/Qwen2.5-72B-Instruct | FP16 | 2.15 tok/sEstimated Auto-generated benchmark | 142GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | FP16 | 2.13 tok/sEstimated Auto-generated benchmark | 156GB |
| mistralai/Mistral-Large-Instruct-2411 | FP16 | 2.12 tok/sEstimated Auto-generated benchmark | 240GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | FP16 | 2.11 tok/sEstimated Auto-generated benchmark | 156GB |
| Qwen/Qwen3-235B-A22B | Q8 | 2.10 tok/sEstimated Auto-generated benchmark | 230GB |
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | FP16 | 2.10 tok/sEstimated Auto-generated benchmark | 138GB |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q8 | 2.06 tok/sEstimated Auto-generated benchmark | 755GB |
| NousResearch/Hermes-3-Llama-3.1-70B | FP16 | 2.00 tok/sEstimated Auto-generated benchmark | 138GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking | FP16 | 1.99 tok/sEstimated Auto-generated benchmark | 156GB |
| meta-llama/Llama-3.1-70B-Instruct | FP16 | 1.93 tok/sEstimated Auto-generated benchmark | 138GB |
| AI-MO/Kimina-Prover-72B | FP16 | 1.90 tok/sEstimated Auto-generated benchmark | 141GB |
| Qwen/Qwen2.5-Math-72B-Instruct | FP16 | 1.89 tok/sEstimated Auto-generated benchmark | 142GB |
| meta-llama/Llama-3.3-70B-Instruct | FP16 | 1.85 tok/sEstimated Auto-generated benchmark | 138GB |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | FP16 | 1.68 tok/sEstimated Auto-generated benchmark | 461GB |
| deepseek-ai/DeepSeek-Math-V2 | FP16 | 1.58 tok/sEstimated Auto-generated benchmark | 1532GB |
| Qwen/Qwen3-235B-A22B | FP16 | 1.27 tok/sEstimated Auto-generated benchmark | 460GB |
| MiniMaxAI/MiniMax-M1-40k | FP16 | 1.21 tok/sEstimated Auto-generated benchmark | 1020GB |
| MiniMaxAI/MiniMax-VL-01 | FP16 | 1.15 tok/sEstimated Auto-generated benchmark | 1021GB |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | FP16 | 1.12 tok/sEstimated Auto-generated benchmark | 1509GB |
Note: All speeds above are calculated estimates, not measured benchmarks; real-world results may vary.
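The site's exact calculator is not reproduced here, but single-stream decoding on Apple silicon is typically memory-bandwidth-bound, so a first-order estimate follows from how many weight bytes must be streamed per generated token. The sketch below is a minimal reconstruction under that assumption; the 200 GB/s figure is the M2 Pro's published memory bandwidth, while the efficiency and overhead factors are guesses, not the page's methodology.

```python
# First-order decode-speed and VRAM model for a dense LLM on a
# bandwidth-bound chip. Assumptions (NOT the site's published method):
#   - every weight is read once per generated token
#   - M2 Pro memory bandwidth: 200 GB/s, ~60% usable in practice (guess)

BYTES_PER_WEIGHT = {"Q4": 0.5, "Q8": 1.0, "FP16": 2.0}

def estimate_tok_per_sec(params_billions: float, quant: str,
                         bandwidth_gb_s: float = 200.0,
                         efficiency: float = 0.6) -> float:
    """tokens/sec ~= usable bandwidth / weight bytes streamed per token."""
    bytes_per_token = params_billions * 1e9 * BYTES_PER_WEIGHT[quant]
    return bandwidth_gb_s * 1e9 * efficiency / bytes_per_token

def estimate_vram_gb(params_billions: float, quant: str,
                     overhead: float = 1.2) -> float:
    """Weights plus an assumed ~20% for KV cache and runtime buffers."""
    return params_billions * BYTES_PER_WEIGHT[quant] * overhead

# e.g. a 70B model at Q8: ~1.7 tok/s and ~84GB -- the same order of
# magnitude as the 70B rows above, though not an exact match.
print(f"{estimate_tok_per_sec(70, 'Q8'):.2f} tok/s, "
      f"{estimate_vram_gb(70, 'Q8'):.0f}GB")
```

The next table restates these models with a fit verdict against the M2 Pro's 32GB of unified memory.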
| Model | Quantization | Verdict | Estimated speed | VRAM needed |
|---|---|---|---|---|
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q8 | Not supported | 2.06 tok/s | 755GB (have 32GB) |
| EssentialAI/rnj-1 | FP16 | Fits comfortably | 8.40 tok/s | 19GB (have 32GB) |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q4 | Not supported | 3.04 tok/s | 378GB (have 32GB) |
| EssentialAI/rnj-1 | Q8 | Fits comfortably | 15.41 tok/s | 10GB (have 32GB) |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | FP16 | Not supported | 1.12 tok/s | 1509GB (have 32GB) |
| EssentialAI/rnj-1 | Q4 | Fits comfortably | 20.43 tok/s | 5GB (have 32GB) |
| meta-llama/Llama-3.2-1B-Instruct | FP16 | Fits comfortably | 11.84 tok/s | 2GB (have 32GB) |
| meta-llama/Llama-3.2-1B-Instruct | Q8 | Fits comfortably | 23.58 tok/s | 1GB (have 32GB) |
| openai/gpt-oss-20b | Q8 | Fits comfortably | 10.60 tok/s | 20GB (have 32GB) |
| openai/gpt-oss-20b | Q4 | Fits comfortably | 15.55 tok/s | 10GB (have 32GB) |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | FP16 | Not supported | 3.89 tok/s | 66GB (have 32GB) |
| Qwen/Qwen2.5-14B | Q8 | Fits comfortably | 14.31 tok/s | 14GB (have 32GB) |
| google/gemma-2-2b-it | Q4 | Fits comfortably | 30.23 tok/s | 1GB (have 32GB) |
| openai/gpt-oss-120b | Q4 | Not supported | 5.04 tok/s | 59GB (have 32GB) |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | FP16 | Fits comfortably | 10.37 tok/s | 9GB (have 32GB) |
| dphn/dolphin-2.9.1-yi-1.5-34b | FP16 | Not supported | 3.90 tok/s | 70GB (have 32GB) |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q4 | Fits comfortably | 13.59 tok/s | 10GB (have 32GB) |
| Qwen/Qwen2.5-32B | Q8 | Not supported | 7.00 tok/s | 33GB (have 32GB) |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | FP16 | Fits comfortably | 10.58 tok/s | 9GB (have 32GB) |
| Qwen/Qwen2.5-1.5B | Q4 | Fits comfortably | 26.45 tok/s | 3GB (have 32GB) |
| Qwen/Qwen3-Embedding-4B | Q8 | Fits comfortably | 17.49 tok/s | 4GB (have 32GB) |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q8 | Not supported | 6.15 tok/s | 33GB (have 32GB) |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | Q4 | Not supported | 3.93 tok/s | 115GB (have 32GB) |
| Qwen/Qwen2.5-14B | Q4 | Fits comfortably | 20.48 tok/s | 7GB (have 32GB) |
| meta-llama/Llama-3.1-70B-Instruct | FP16 | Not supported | 1.93 tok/s | 138GB (have 32GB) |
| Qwen/Qwen2.5-14B | FP16 | Fits comfortably | 7.93 tok/s | 29GB (have 32GB) |
| google-t5/t5-3b | Q4 | Fits comfortably | 35.25 tok/s | 2GB (have 32GB) |
| meta-llama/Llama-Guard-3-1B | FP16 | Fits comfortably | 11.90 tok/s | 2GB (have 32GB) |
| codellama/CodeLlama-34b-hf | Q4 | Fits comfortably | 10.04 tok/s | 17GB (have 32GB) |
| codellama/CodeLlama-34b-hf | Q8 | Not supported | 6.34 tok/s | 35GB (have 32GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | FP16 | Fits comfortably | 10.36 tok/s | 11GB (have 32GB) |
| meta-llama/Llama-Guard-3-8B | Q4 | Fits comfortably | 26.80 tok/s | 4GB (have 32GB) |
| openai-community/gpt2-xl | FP16 | Fits comfortably | 9.95 tok/s | 15GB (have 32GB) |
| Qwen/Qwen2.5-3B-Instruct | Q4 | Fits comfortably | 31.42 tok/s | 2GB (have 32GB) |
| meta-llama/Llama-2-7b-chat-hf | Q4 | Fits comfortably | 26.42 tok/s | 4GB (have 32GB) |
| Qwen/Qwen2.5-32B | FP16 | Not supported | 3.71 tok/s | 66GB (have 32GB) |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | Fits comfortably | 28.58 tok/s | 2GB (have 32GB) |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q8 | Fits comfortably | 18.60 tok/s | 4GB (have 32GB) |
| Qwen/Qwen3-4B-Base | FP16 | Fits comfortably | 9.59 tok/s | 9GB (have 32GB) |
| skt/kogpt2-base-v2 | Q8 | Fits comfortably | 18.71 tok/s | 7GB (have 32GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q8 | Not supported | 3.67 tok/s | 78GB (have 32GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | FP16 | Not supported | 2.13 tok/s | 156GB (have 32GB) |
| bigcode/starcoder2-3b | Q4 | Fits comfortably | 32.68 tok/s | 2GB (have 32GB) |
| bigcode/starcoder2-3b | Q8 | Fits comfortably | 22.67 tok/s | 3GB (have 32GB) |
| lmsys/vicuna-7b-v1.5 | Q4 | Fits comfortably | 25.76 tok/s | 4GB (have 32GB) |
| Qwen/Qwen2.5-Math-72B-Instruct | FP16 | Not supported | 1.89 tok/s | 142GB (have 32GB) |
| microsoft/Phi-3-medium-128k-instruct | FP16 | Fits comfortably | 7.39 tok/s | 29GB (have 32GB) |
| meta-llama/Llama-3.3-70B-Instruct | FP16 | Not supported | 1.85 tok/s | 138GB (have 32GB) |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | Not supported | 5.57 tok/s | 34GB (have 32GB) |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | Not supported | 3.77 tok/s | 69GB (have 32GB) |
| Qwen/Qwen3-8B-FP8 | Q4 | Fits comfortably | 27.05 tok/s | 4GB (have 32GB) |
| Qwen/Qwen2.5-0.5B-Instruct | Q4 | Fits comfortably | 26.41 tok/s | 3GB (have 32GB) |
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | Fits comfortably | 19.27 tok/s | 5GB (have 32GB) |
| Qwen/Qwen2.5-0.5B-Instruct | FP16 | Fits comfortably | 9.86 tok/s | 11GB (have 32GB) |
| openai-community/gpt2-xl | Q8 | Fits comfortably | 19.24 tok/s | 7GB (have 32GB) |
| google-t5/t5-3b | Q8 | Fits comfortably | 24.78 tok/s | 3GB (have 32GB) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | Fits comfortably | 20.44 tok/s | 9GB (have 32GB) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | FP16 | Fits comfortably | 10.03 tok/s | 17GB (have 32GB) |
| Qwen/Qwen2.5-Math-1.5B | FP16 | Fits comfortably | 10.24 tok/s | 11GB (have 32GB) |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 26.57 tok/s | 4GB (have 32GB) |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q8 | Fits comfortably | 18.63 tok/s | 7GB (have 32GB) |
| trl-internal-testing/tiny-random-LlamaForCausalLM | FP16 | Fits comfortably | 10.16 tok/s | 15GB (have 32GB) |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | FP16 | Not supported | 5.09 tok/s | 61GB (have 32GB) |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q8 | Fits (tight) | 10.94 tok/s | 31GB (have 32GB) |
| codellama/CodeLlama-34b-hf | FP16 | Not supported | 3.52 tok/s | 70GB (have 32GB) |
| Qwen/Qwen3-8B-FP8 | Q8 | Fits comfortably | 18.31 tok/s | 9GB (have 32GB) |
| Qwen/Qwen3-8B-FP8 | FP16 | Fits comfortably | 9.31 tok/s | 17GB (have 32GB) |
| rinna/japanese-gpt-neox-small | Q8 | Fits comfortably | 19.22 tok/s | 7GB (have 32GB) |
| rinna/japanese-gpt-neox-small | FP16 | Fits comfortably | 10.03 tok/s | 15GB (have 32GB) |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q4 | Fits comfortably | 26.41 tok/s | 4GB (have 32GB) |
| openai/gpt-oss-120b | Q8 | Not supported | 3.49 tok/s | 117GB (have 32GB) |
| meta-llama/Llama-2-7b-chat-hf | Q8 | Fits comfortably | 18.77 tok/s | 7GB (have 32GB) |
| meta-llama/Llama-2-7b-chat-hf | FP16 | Fits comfortably | 9.32 tok/s | 15GB (have 32GB) |
| Qwen/Qwen2-7B-Instruct | Q4 | Fits comfortably | 28.22 tok/s | 4GB (have 32GB) |
| Qwen/Qwen2-7B-Instruct | Q8 | Fits comfortably | 17.34 tok/s | 7GB (have 32GB) |
| Qwen/Qwen2-7B-Instruct | FP16 | Fits comfortably | 11.13 tok/s | 15GB (have 32GB) |
| Qwen/Qwen3-4B-Base | Q4 | Fits comfortably | 28.85 tok/s | 2GB (have 32GB) |
| Qwen/Qwen3-4B-Base | Q8 | Fits comfortably | 18.92 tok/s | 4GB (have 32GB) |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q8 | Not supported | 6.38 tok/s | 35GB (have 32GB) |
| microsoft/Phi-3.5-mini-instruct | Q4 | Fits comfortably | 29.17 tok/s | 4GB (have 32GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | FP16 | Not supported | 6.09 tok/s | 61GB (have 32GB) |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q4 | Fits comfortably | 15.33 tok/s | 15GB (have 32GB) |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q8 | Fits (tight) | 9.45 tok/s | 31GB (have 32GB) |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | FP16 | Not supported | 6.11 tok/s | 61GB (have 32GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q4 | Fits comfortably | 16.22 tok/s | 15GB (have 32GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q8 | Fits (tight) | 10.72 tok/s | 31GB (have 32GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | FP16 | Not supported | 5.44 tok/s | 61GB (have 32GB) |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | Fits comfortably | 27.54 tok/s | 3GB (have 32GB) |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | Fits comfortably | 21.37 tok/s | 1GB (have 32GB) |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | FP16 | Fits comfortably | 13.02 tok/s | 2GB (have 32GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | Fits comfortably | 25.92 tok/s | 2GB (have 32GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 18.49 tok/s | 3GB (have 32GB) |
| microsoft/Phi-3-medium-128k-instruct | Q4 | Fits comfortably | 22.03 tok/s | 7GB (have 32GB) |
| microsoft/Phi-3-medium-128k-instruct | Q8 | Fits comfortably | 13.14 tok/s | 14GB (have 32GB) |
| google/gemma-2-9b-it | Q4 | Fits comfortably | 18.81 tok/s | 5GB (have 32GB) |
| NousResearch/Hermes-3-Llama-3.1-70B | Q4 | Not supported | 5.50 tok/s | 34GB (have 32GB) |
| NousResearch/Hermes-3-Llama-3.1-70B | Q8 | Not supported | 4.00 tok/s | 69GB (have 32GB) |
| NousResearch/Hermes-3-Llama-3.1-70B | FP16 | Not supported | 2.00 tok/s | 138GB (have 32GB) |
| NousResearch/Hermes-3-Llama-3.1-8B | Q4 | Fits comfortably | 18.75 tok/s | 4GB (have 32GB) |
| Qwen/Qwen2.5-0.5B | FP16 | Fits comfortably | 10.25 tok/s | 11GB (have 32GB) |
Note: All speeds above are calculated estimates, not measured benchmarks; real-world results may vary.
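The verdict column simply compares each row's VRAM requirement against the M2 Pro's 32GB of unified memory. The page does not publish the exact rule, so the sketch below is a hypothetical reconstruction whose thresholds happen to reproduce the rows shown (31GB reads as tight, 33GB and up as not supported).

```python
# Hypothetical reconstruction of the fit-verdict rule above.
# The thresholds are guesses; only the 32GB capacity comes from the page.
AVAILABLE_VRAM_GB = 32   # Apple M2 Pro unified memory
HEADROOM_GB = 2          # assumed margin before a fit counts as "tight"

def fit_verdict(required_gb: float) -> str:
    if required_gb > AVAILABLE_VRAM_GB:
        return "Not supported"
    if required_gb > AVAILABLE_VRAM_GB - HEADROOM_GB:
        return "Fits (tight)"
    return "Fits comfortably"

print(fit_verdict(31))  # Fits (tight)     -- matches the 31GB Qwen3-Coder Q8 row
print(fit_verdict(33))  # Not supported    -- matches the 33GB Qwen2.5-32B Q8 row
print(fit_verdict(10))  # Fits comfortably -- matches the small-model rows
```

In practice you would also want headroom for the KV cache at your target context length, which grows with context and is not reflected in the weight-only VRAM figures above.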
Related GPU guides for local inference: RTX 5070 · RTX 4060 Ti 16GB · RX 6800 XT · RTX 4070 Super · RTX 3080.