Quick Answer: Apple M4 Pro offers 64GB of unified memory (listed on this page as VRAM) and carries an MSRP of $1,999.00. It delivers an estimated 48 tokens/sec on bigcode/starcoder2-3b at Q4. It typically draws 30W under load.

Apple M4 Pro

By Apple · Released 2024-11 · MSRP $1,999.00

The M4 Pro's GPU offers steady throughput for local AI workloads. Choose a model quantization that meets your target tokens/sec; the estimates below show how Q4, Q8, and FP16 trade speed for memory.

Specs snapshot

Key hardware metrics for AI workloads.

VRAM: 64GB

Cores: 20

TDP: 30W

Architecture: Apple Silicon M4

Where to Buy

Buy directly on Amazon with fast shipping and reliable customer service.

💡 Not ready to buy? Try cloud GPUs first

Test Apple M4 Pro performance in the cloud before investing in hardware. Pay by the hour with no commitment.

Vast.ai: from $0.20/hr · RunPod: from $0.30/hr · Lambda Labs: enterprise-grade

AI benchmarks

Model	Quantization	Tokens/sec (estimated, auto-generated)	VRAM used
bigcode/starcoder2-3b	Q4	48.24 tok/s	2GB
facebook/sam3	Q4	48.18 tok/s	1GB
ibm-research/PowerMoE-3b	Q4	47.97 tok/s	2GB
unsloth/Llama-3.2-1B-Instruct	Q4	47.67 tok/s	1GB
meta-llama/Llama-3.2-3B	Q4	47.51 tok/s	2GB
nari-labs/Dia2-2B	Q4	47.47 tok/s	2GB
google/gemma-3-1b-it	Q4	47.46 tok/s	1GB
google/gemma-2-2b-it	Q4	46.59 tok/s	1GB
meta-llama/Llama-3.2-1B	Q4	46.31 tok/s	1GB
deepseek-ai/DeepSeek-OCR	Q4	45.30 tok/s	2GB
meta-llama/Llama-3.2-3B-Instruct	Q4	45.20 tok/s	2GB
context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16	Q4	45.04 tok/s	2GB
google/gemma-2b	Q4	44.79 tok/s	1GB
ibm-granite/granite-3.3-2b-instruct	Q4	44.78 tok/s	1GB
TinyLlama/TinyLlama-1.1B-Chat-v1.0	Q4	44.71 tok/s	1GB
WeiboAI/VibeThinker-1.5B	Q4	44.58 tok/s	1GB
LiquidAI/LFM2-1.2B	Q4	44.58 tok/s	1GB
allenai/OLMo-2-0425-1B	Q4	44.29 tok/s	1GB
meta-llama/Llama-Guard-3-1B	Q4	43.77 tok/s	1GB
deepseek-ai/deepseek-coder-1.3b-instruct	Q4	43.77 tok/s	2GB
unsloth/gemma-3-1b-it	Q4	43.07 tok/s	1GB
google-bert/bert-base-uncased	Q4	41.84 tok/s	1GB
meta-llama/Llama-3.2-1B-Instruct	Q4	41.51 tok/s	1GB
google-t5/t5-3b	Q4	41.50 tok/s	2GB
apple/OpenELM-1_1B-Instruct	Q4	41.45 tok/s	1GB
inference-net/Schematron-3B	Q4	41.13 tok/s	2GB
Qwen/Qwen2.5-3B-Instruct	Q4	41.08 tok/s	2GB
unsloth/Llama-3.2-3B-Instruct	Q4	40.97 tok/s	2GB
google/embeddinggemma-300m	Q4	40.36 tok/s	1GB
numind/NuExtract-1.5	Q4	40.27 tok/s	4GB
meta-llama/Meta-Llama-3-8B	Q4	40.23 tok/s	4GB
openai-community/gpt2-large	Q4	40.12 tok/s	4GB
ibm-granite/granite-3.3-8b-instruct	Q4	40.11 tok/s	4GB
Qwen/Qwen2.5-3B	Q4	40.04 tok/s	2GB
deepseek-ai/DeepSeek-R1-0528	Q4	40.02 tok/s	4GB
tencent/HunyuanOCR	Q4	40.01 tok/s	1GB
deepseek-ai/DeepSeek-V3-0324	Q4	39.97 tok/s	4GB
BSC-LT/salamandraTA-7b-instruct	Q4	39.77 tok/s	4GB
parler-tts/parler-tts-large-v1	Q4	39.70 tok/s	4GB
Qwen/Qwen2-0.5B	Q4	39.60 tok/s	3GB
lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit	Q4	39.50 tok/s	4GB
deepseek-ai/DeepSeek-R1	Q4	39.46 tok/s	4GB
Qwen/Qwen2.5-Coder-1.5B	Q4	39.44 tok/s	3GB
Qwen/Qwen2.5-0.5B	Q4	39.41 tok/s	3GB
unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit	Q4	39.39 tok/s	4GB
microsoft/DialoGPT-small	Q4	39.16 tok/s	4GB
meta-llama/Llama-2-7b-hf	Q4	38.66 tok/s	4GB
trl-internal-testing/tiny-LlamaForCausalLM-3.2	Q4	38.59 tok/s	4GB
Qwen/Qwen2.5-7B-Instruct	Q4	38.58 tok/s	4GB
bigscience/bloomz-560m	Q4	38.46 tok/s	4GB

Note: Performance figures are calculated estimates, not measured results. Real results may vary.
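
The page does not say how its estimates are calculated. As a rough illustration only, a common back-of-envelope bound for single-stream decoding assumes every weight is read from memory once per generated token, so speed is limited by memory bandwidth divided by the size of the weights. The sketch below is that rule of thumb, not this page's methodology; the function name, the `efficiency` factor, and the 200 GB/s figure in the usage line are hypothetical placeholders rather than M4 Pro specifications.

```python
def estimate_decode_tok_per_s(
    params_billion: float,
    bits_per_weight: float,
    bandwidth_gb_s: float,
    efficiency: float = 1.0,
) -> float:
    """Bandwidth-bound ceiling on decode speed (tokens/sec).

    Assumes each generated token requires reading all weights once.
    Ignores KV-cache traffic, compute limits, and runtime overhead,
    so real throughput is normally lower.
    """
    if params_billion <= 0 or bits_per_weight <= 0 or bandwidth_gb_s <= 0:
        raise ValueError("inputs must be positive")
    # 1e9 parameters * (bits / 8) bytes each = size in GB (decimal).
    weight_gb = params_billion * bits_per_weight / 8
    return efficiency * bandwidth_gb_s / weight_gb


# Hypothetical inputs: an 8B model at 4 bits on a 200 GB/s memory system.
ceiling = estimate_decode_tok_per_s(8, 4, bandwidth_gb_s=200.0)
```

Halving the bits per weight doubles this ceiling, which is the general reason Q4 rows in the table are faster than Q8 or FP16 rows for the same model.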

Model compatibility

Model	Quantization	Verdict	Estimated speed	VRAM needed (64GB available)
facebook/opt-125m	Q8	Fits comfortably	25.17 tok/s	7GB
meta-llama/Llama-3.2-1B-Instruct	Q8	Fits comfortably	31.49 tok/s	1GB
meta-llama/Llama-3.2-1B-Instruct	FP16	Fits comfortably	18.02 tok/s	2GB
openai/gpt-oss-120b	Q4	Fits comfortably	7.51 tok/s	59GB
Qwen/Qwen3-8B	Q4	Fits comfortably	37.32 tok/s	4GB
google-t5/t5-3b	Q8	Fits comfortably	31.25 tok/s	3GB
google-t5/t5-3b	FP16	Fits comfortably	17.93 tok/s	6GB
meta-llama/Meta-Llama-3-8B-Instruct	Q8	Fits comfortably	27.45 tok/s	9GB
meta-llama/Meta-Llama-3-8B-Instruct	FP16	Fits comfortably	13.64 tok/s	17GB
Qwen/Qwen2.5-14B-Instruct	Q8	Fits comfortably	17.48 tok/s	14GB
Qwen/Qwen2.5-14B-Instruct	FP16	Fits comfortably	10.08 tok/s	29GB
unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit	Q4	Fits comfortably	39.39 tok/s	4GB
meta-llama/Llama-3.1-70B-Instruct	FP16	Not supported	5.29 tok/s	137GB
deepseek-ai/DeepSeek-V3.1	FP16	Fits comfortably	14.89 tok/s	15GB
meta-llama/Llama-3.1-8B	Q4	Fits comfortably	38.16 tok/s	4GB
meta-llama/Llama-3.1-8B	Q8	Fits comfortably	27.38 tok/s	9GB
Qwen/Qwen2.5-32B-Instruct	FP16	Not supported	4.95 tok/s	66GB
mistralai/Mistral-7B-v0.1	Q4	Fits comfortably	34.54 tok/s	4GB
mistralai/Mistral-7B-v0.1	Q8	Fits comfortably	27.45 tok/s	7GB
mistralai/Mistral-7B-v0.1	FP16	Fits comfortably	12.84 tok/s	15GB
unsloth/Meta-Llama-3.1-8B-Instruct	Q8	Fits comfortably	24.23 tok/s	9GB
unsloth/Meta-Llama-3.1-8B-Instruct	FP16	Fits comfortably	13.36 tok/s	17GB
meta-llama/Meta-Llama-3-70B-Instruct	Q4	Fits comfortably	12.89 tok/s	34GB
meta-llama/Meta-Llama-3-70B-Instruct	Q8	Not supported	8.89 tok/s	68GB
Qwen/Qwen3-0.6B-Base	Q4	Fits comfortably	38.01 tok/s	3GB
microsoft/DialoGPT-small	Q4	Fits comfortably	39.16 tok/s	4GB
microsoft/DialoGPT-small	Q8	Fits comfortably	25.78 tok/s	7GB
microsoft/DialoGPT-small	FP16	Fits comfortably	13.07 tok/s	15GB
Qwen/Qwen2-1.5B-Instruct	Q4	Fits comfortably	33.31 tok/s	3GB
Qwen/Qwen2-1.5B-Instruct	Q8	Fits comfortably	27.26 tok/s	5GB
Qwen/Qwen2-1.5B-Instruct	FP16	Fits comfortably	13.05 tok/s	11GB
meta-llama/Llama-Guard-3-1B	Q4	Fits comfortably	43.77 tok/s	1GB
EleutherAI/gpt-neo-125m	FP16	Fits comfortably	14.17 tok/s	15GB
meta-llama/Llama-3.2-3B	Q4	Fits comfortably	47.51 tok/s	2GB
meta-llama/Llama-3.2-3B	Q8	Fits comfortably	29.12 tok/s	3GB
meta-llama/Llama-3.2-3B	FP16	Fits comfortably	16.63 tok/s	6GB
deepseek-ai/DeepSeek-V3-0324	FP16	Fits comfortably	13.86 tok/s	15GB
huggyllama/llama-7b	Q4	Fits comfortably	36.81 tok/s	4GB
huggyllama/llama-7b	Q8	Fits comfortably	23.93 tok/s	7GB
huggyllama/llama-7b	FP16	Fits comfortably	14.45 tok/s	15GB
deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct	Q4	Fits comfortably	37.28 tok/s	4GB
microsoft/Phi-4-mini-instruct	Q8	Fits comfortably	24.08 tok/s	7GB
microsoft/Phi-4-mini-instruct	FP16	Fits comfortably	15.29 tok/s	15GB
lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit	Q4	Fits comfortably	39.50 tok/s	4GB
NousResearch/Meta-Llama-3.1-8B-Instruct	Q4	Fits comfortably	37.04 tok/s	4GB
NousResearch/Meta-Llama-3.1-8B-Instruct	Q8	Fits comfortably	25.96 tok/s	9GB
NousResearch/Meta-Llama-3.1-8B-Instruct	FP16	Fits comfortably	13.73 tok/s	17GB
apple/OpenELM-1_1B-Instruct	Q4	Fits comfortably	41.45 tok/s	1GB
lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit	Q8	Fits comfortably	14.52 tok/s	31GB
facebook/opt-125m	Q4	Fits comfortably	37.50 tok/s	4GB
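
The verdicts in the table come down to comparing the memory a model needs against the 64GB available. A minimal sketch of that comparison follows, assuming weight memory is simply parameter count times bits per weight. This is a simplification and not the page's own formula: for a 70B model it gives 35GB, 70GB, and 140GB at Q4, Q8, and FP16, close to but not identical with the 34GB, 68GB, and 137GB listed above. The names `BITS_PER_WEIGHT`, `weight_memory_gb`, `fits`, and the `overhead_gb` parameter are all hypothetical.

```python
# Nominal bits per weight for each quantization label used on this page.
# Real quantized formats carry extra scale/metadata, so these are approximations.
BITS_PER_WEIGHT = {"Q4": 4, "Q8": 8, "FP16": 16}


def weight_memory_gb(params_billion: float, quant: str) -> float:
    """Approximate memory for model weights alone, in decimal GB."""
    return params_billion * BITS_PER_WEIGHT[quant] / 8


def fits(
    params_billion: float,
    quant: str,
    available_gb: float = 64.0,
    overhead_gb: float = 0.0,
) -> bool:
    """True if weights plus a caller-chosen overhead fit in available memory.

    overhead_gb stands in for KV cache, activations, and the OS share of
    unified memory; 0.0 is an optimistic default.
    """
    return weight_memory_gb(params_billion, quant) + overhead_gb <= available_gb


for quant in ("Q4", "Q8", "FP16"):
    print(quant, weight_memory_gb(70, quant), fits(70, quant))
```

On a unified-memory machine the operating system and other applications share the same 64GB, so a non-zero `overhead_gb` gives a more cautious answer for models near the limit.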

Note: Performance figures are calculated estimates, not measured results. Real results may vary.
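
Because the figures above are estimates, measuring on your own machine is the only way to get a real number. The sketch below times a generation callable over a few runs and reports tokens per second. It is runtime-agnostic: `generate` is a hypothetical callable you would wrap around whatever inference library you use, and it is assumed to return the number of tokens it actually produced.

```python
import time
from typing import Callable


def measure_tok_per_s(
    generate: Callable[[str, int], int],
    prompt: str,
    max_new_tokens: int = 128,
    runs: int = 3,
) -> float:
    """Average generated tokens per second over several runs.

    generate(prompt, max_new_tokens) must return the count of tokens
    produced. Timing covers the whole call, so prompt processing is
    included along with decoding.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    total_tokens = 0
    total_seconds = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        produced = generate(prompt, max_new_tokens)
        total_seconds += time.perf_counter() - start
        total_tokens += produced
    if total_seconds <= 0:
        raise RuntimeError("elapsed time was zero; cannot compute a rate")
    return total_tokens / total_seconds
```

Running one untimed warm-up call first is usually worthwhile, since the first call often includes model loading and cache setup that would otherwise skew the average.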

Alternative GPUs

RTX 5070: 12GB

RTX 4060 Ti 16GB: 16GB

RX 6800 XT: 16GB

RTX 4070 Super: 12GB

RTX 3080: 10GB

Quick Answer: Apple M4 Pro offers 64GB VRAM and starts around current market pricing. It delivers approximately 48 tokens/sec on bigcode/starcoder2-3b. It typically draws 30W under load.

Apple M4 Pro

Unknown

By AppleReleased 2024-11MSRP $1,999.00

This GPU offers reliable throughput for local AI workloads. Pair it with the right model quantization to hit your desired tokens/sec, and monitor prices below to catch the best deal.

Check Price on Amazon View Benchmarks

Specs snapshot

Key hardware metrics for AI workloads.

VRAM64GB

Cores20

TDP30W

ArchitectureApple Silicon M4

Where to Buy

Buy directly on Amazon with fast shipping and reliable customer service.

AmazonUnknown

See price on Amazon

Buy on Amazon

More Amazon options

Rotate out primary variants whenever validation flags an issue.

💡 Not ready to buy? Try cloud GPUs first

Test Apple M4 Pro performance in the cloud before investing in hardware. Pay by the hour with no commitment.

Vast.aifrom $0.20/hr RunPodfrom $0.30/hr Lambda Labsenterprise-grade

AI benchmarks

Model	Quantization	Tokens/sec	VRAM used
bigcode/starcoder2-3b	Q4	48.24 tok/sEstimated Auto-generated benchmark	2GB
facebook/sam3	Q4	48.18 tok/sEstimated Auto-generated benchmark	1GB
ibm-research/PowerMoE-3b	Q4	47.97 tok/sEstimated Auto-generated benchmark	2GB
unsloth/Llama-3.2-1B-Instruct	Q4	47.67 tok/sEstimated Auto-generated benchmark	1GB
meta-llama/Llama-3.2-3B	Q4	47.51 tok/sEstimated Auto-generated benchmark	2GB
nari-labs/Dia2-2B	Q4	47.47 tok/sEstimated Auto-generated benchmark	2GB
google/gemma-3-1b-it	Q4	47.46 tok/sEstimated Auto-generated benchmark	1GB
google/gemma-2-2b-it	Q4	46.59 tok/sEstimated Auto-generated benchmark	1GB
meta-llama/Llama-3.2-1B	Q4	46.31 tok/sEstimated Auto-generated benchmark	1GB
deepseek-ai/DeepSeek-OCR	Q4	45.30 tok/sEstimated Auto-generated benchmark	2GB
meta-llama/Llama-3.2-3B-Instruct	Q4	45.20 tok/sEstimated Auto-generated benchmark	2GB
context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16	Q4	45.04 tok/sEstimated Auto-generated benchmark	2GB
google/gemma-2b	Q4	44.79 tok/sEstimated Auto-generated benchmark	1GB
ibm-granite/granite-3.3-2b-instruct	Q4	44.78 tok/sEstimated Auto-generated benchmark	1GB
TinyLlama/TinyLlama-1.1B-Chat-v1.0	Q4	44.71 tok/sEstimated Auto-generated benchmark	1GB
WeiboAI/VibeThinker-1.5B	Q4	44.58 tok/sEstimated Auto-generated benchmark	1GB
LiquidAI/LFM2-1.2B	Q4	44.58 tok/sEstimated Auto-generated benchmark	1GB
allenai/OLMo-2-0425-1B	Q4	44.29 tok/sEstimated Auto-generated benchmark	1GB
meta-llama/Llama-Guard-3-1B	Q4	43.77 tok/sEstimated Auto-generated benchmark	1GB
deepseek-ai/deepseek-coder-1.3b-instruct	Q4	43.77 tok/sEstimated Auto-generated benchmark	2GB
unsloth/gemma-3-1b-it	Q4	43.07 tok/sEstimated Auto-generated benchmark	1GB
google-bert/bert-base-uncased	Q4	41.84 tok/sEstimated Auto-generated benchmark	1GB
meta-llama/Llama-3.2-1B-Instruct	Q4	41.51 tok/sEstimated Auto-generated benchmark	1GB
google-t5/t5-3b	Q4	41.50 tok/sEstimated Auto-generated benchmark	2GB
apple/OpenELM-1_1B-Instruct	Q4	41.45 tok/sEstimated Auto-generated benchmark	1GB
inference-net/Schematron-3B	Q4	41.13 tok/sEstimated Auto-generated benchmark	2GB
Qwen/Qwen2.5-3B-Instruct	Q4	41.08 tok/sEstimated Auto-generated benchmark	2GB
unsloth/Llama-3.2-3B-Instruct	Q4	40.97 tok/sEstimated Auto-generated benchmark	2GB
google/embeddinggemma-300m	Q4	40.36 tok/sEstimated Auto-generated benchmark	1GB
numind/NuExtract-1.5	Q4	40.27 tok/sEstimated Auto-generated benchmark	4GB
meta-llama/Meta-Llama-3-8B	Q4	40.23 tok/sEstimated Auto-generated benchmark	4GB
openai-community/gpt2-large	Q4	40.12 tok/sEstimated Auto-generated benchmark	4GB
ibm-granite/granite-3.3-8b-instruct	Q4	40.11 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen2.5-3B	Q4	40.04 tok/sEstimated Auto-generated benchmark	2GB
deepseek-ai/DeepSeek-R1-0528	Q4	40.02 tok/sEstimated Auto-generated benchmark	4GB
tencent/HunyuanOCR	Q4	40.01 tok/sEstimated Auto-generated benchmark	1GB
deepseek-ai/DeepSeek-V3-0324	Q4	39.97 tok/sEstimated Auto-generated benchmark	4GB
BSC-LT/salamandraTA-7b-instruct	Q4	39.77 tok/sEstimated Auto-generated benchmark	4GB
parler-tts/parler-tts-large-v1	Q4	39.70 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen2-0.5B	Q4	39.60 tok/sEstimated Auto-generated benchmark	3GB
lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit	Q4	39.50 tok/sEstimated Auto-generated benchmark	4GB
deepseek-ai/DeepSeek-R1	Q4	39.46 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen2.5-Coder-1.5B	Q4	39.44 tok/sEstimated Auto-generated benchmark	3GB
Qwen/Qwen2.5-0.5B	Q4	39.41 tok/sEstimated Auto-generated benchmark	3GB
unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit	Q4	39.39 tok/sEstimated Auto-generated benchmark	4GB
microsoft/DialoGPT-small	Q4	39.16 tok/sEstimated Auto-generated benchmark	4GB
meta-llama/Llama-2-7b-hf	Q4	38.66 tok/sEstimated Auto-generated benchmark	4GB
trl-internal-testing/tiny-LlamaForCausalLM-3.2	Q4	38.59 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen2.5-7B-Instruct	Q4	38.58 tok/sEstimated Auto-generated benchmark	4GB
bigscience/bloomz-560m	Q4	38.46 tok/sEstimated Auto-generated benchmark	4GB

bigcode/starcoder2-3b

2GB

48.24 tok/sEstimated

Auto-generated benchmark

facebook/sam3

1GB

48.18 tok/sEstimated

Auto-generated benchmark

ibm-research/PowerMoE-3b

2GB

47.97 tok/sEstimated

Auto-generated benchmark

unsloth/Llama-3.2-1B-Instruct

1GB

47.67 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.2-3B

2GB

47.51 tok/sEstimated

Auto-generated benchmark

nari-labs/Dia2-2B

2GB

47.47 tok/sEstimated

Auto-generated benchmark

google/gemma-3-1b-it

1GB

47.46 tok/sEstimated

Auto-generated benchmark

google/gemma-2-2b-it

1GB

46.59 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.2-1B

1GB

46.31 tok/sEstimated

Auto-generated benchmark

deepseek-ai/DeepSeek-OCR

2GB

45.30 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.2-3B-Instruct

2GB

45.20 tok/sEstimated

Auto-generated benchmark

context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16

2GB

45.04 tok/sEstimated

Auto-generated benchmark

google/gemma-2b

1GB

44.79 tok/sEstimated

Auto-generated benchmark

ibm-granite/granite-3.3-2b-instruct

1GB

44.78 tok/sEstimated

Auto-generated benchmark

TinyLlama/TinyLlama-1.1B-Chat-v1.0

1GB

44.71 tok/sEstimated

Auto-generated benchmark

WeiboAI/VibeThinker-1.5B

1GB

44.58 tok/sEstimated

Auto-generated benchmark

LiquidAI/LFM2-1.2B

1GB

44.58 tok/sEstimated

Auto-generated benchmark

allenai/OLMo-2-0425-1B

1GB

44.29 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-Guard-3-1B

1GB

43.77 tok/sEstimated

Auto-generated benchmark

deepseek-ai/deepseek-coder-1.3b-instruct

2GB

43.77 tok/sEstimated

Auto-generated benchmark

unsloth/gemma-3-1b-it

1GB

43.07 tok/sEstimated

Auto-generated benchmark

google-bert/bert-base-uncased

1GB

41.84 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.2-1B-Instruct

1GB

41.51 tok/sEstimated

Auto-generated benchmark

google-t5/t5-3b

2GB

41.50 tok/sEstimated

Auto-generated benchmark

apple/OpenELM-1_1B-Instruct

1GB

41.45 tok/sEstimated

Auto-generated benchmark

inference-net/Schematron-3B

2GB

41.13 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-3B-Instruct

2GB

41.08 tok/sEstimated

Auto-generated benchmark

unsloth/Llama-3.2-3B-Instruct

2GB

40.97 tok/sEstimated

Auto-generated benchmark

google/embeddinggemma-300m

1GB

40.36 tok/sEstimated

Auto-generated benchmark

numind/NuExtract-1.5

4GB

40.27 tok/sEstimated

Auto-generated benchmark

meta-llama/Meta-Llama-3-8B

4GB

40.23 tok/sEstimated

Auto-generated benchmark

openai-community/gpt2-large

4GB

40.12 tok/sEstimated

Auto-generated benchmark

ibm-granite/granite-3.3-8b-instruct

4GB

40.11 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-3B

2GB

40.04 tok/sEstimated

Auto-generated benchmark

deepseek-ai/DeepSeek-R1-0528

4GB

40.02 tok/sEstimated

Auto-generated benchmark

tencent/HunyuanOCR

1GB

40.01 tok/sEstimated

Auto-generated benchmark

deepseek-ai/DeepSeek-V3-0324

4GB

39.97 tok/sEstimated

Auto-generated benchmark

BSC-LT/salamandraTA-7b-instruct

4GB

39.77 tok/sEstimated

Auto-generated benchmark

parler-tts/parler-tts-large-v1

4GB

39.70 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2-0.5B

3GB

39.60 tok/sEstimated

Auto-generated benchmark

lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit

4GB

39.50 tok/sEstimated

Auto-generated benchmark

deepseek-ai/DeepSeek-R1

4GB

39.46 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-Coder-1.5B

3GB

39.44 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-0.5B

3GB

39.41 tok/sEstimated

Auto-generated benchmark

unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit

4GB

39.39 tok/sEstimated

Auto-generated benchmark

microsoft/DialoGPT-small

4GB

39.16 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-2-7b-hf

4GB

38.66 tok/sEstimated

Auto-generated benchmark

trl-internal-testing/tiny-LlamaForCausalLM-3.2

4GB

38.59 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-7B-Instruct

4GB

38.58 tok/sEstimated

Auto-generated benchmark

bigscience/bloomz-560m

4GB

38.46 tok/sEstimated

Auto-generated benchmark

Note: Performance estimates are calculated. Real results may vary. Methodology · Submit real data

Model compatibility

Model	Quantization	Verdict	Estimated speed	VRAM needed
facebook/opt-125m	Q8	Fits comfortably	25.17 tok/sEstimated	7GB (have 64GB)
meta-llama/Llama-3.2-1B-Instruct	Q8	Fits comfortably	31.49 tok/sEstimated	1GB (have 64GB)
meta-llama/Llama-3.2-1B-Instruct	FP16	Fits comfortably	18.02 tok/sEstimated	2GB (have 64GB)
openai/gpt-oss-120b	Q4	Fits comfortably	7.51 tok/sEstimated	59GB (have 64GB)
Qwen/Qwen3-8B	Q4	Fits comfortably	37.32 tok/sEstimated	4GB (have 64GB)
google-t5/t5-3b	Q8	Fits comfortably	31.25 tok/sEstimated	3GB (have 64GB)
google-t5/t5-3b	FP16	Fits comfortably	17.93 tok/sEstimated	6GB (have 64GB)
meta-llama/Meta-Llama-3-8B-Instruct	Q8	Fits comfortably	27.45 tok/sEstimated	9GB (have 64GB)
meta-llama/Meta-Llama-3-8B-Instruct	FP16	Fits comfortably	13.64 tok/sEstimated	17GB (have 64GB)
Qwen/Qwen2.5-14B-Instruct	Q8	Fits comfortably	17.48 tok/sEstimated	14GB (have 64GB)
Qwen/Qwen2.5-14B-Instruct	FP16	Fits comfortably	10.08 tok/sEstimated	29GB (have 64GB)
unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit	Q4	Fits comfortably	39.39 tok/sEstimated	4GB (have 64GB)
meta-llama/Llama-3.1-70B-Instruct	FP16	Not supported	5.29 tok/sEstimated	137GB (have 64GB)
deepseek-ai/DeepSeek-V3.1	FP16	Fits comfortably	14.89 tok/sEstimated	15GB (have 64GB)
meta-llama/Llama-3.1-8B	Q4	Fits comfortably	38.16 tok/sEstimated	4GB (have 64GB)
meta-llama/Llama-3.1-8B	Q8	Fits comfortably	27.38 tok/sEstimated	9GB (have 64GB)
Qwen/Qwen2.5-32B-Instruct	FP16	Not supported	4.95 tok/s (estimated)	66GB (have 64GB)
mistralai/Mistral-7B-v0.1	Q4	Fits comfortably	34.54 tok/s (estimated)	4GB (have 64GB)
mistralai/Mistral-7B-v0.1	Q8	Fits comfortably	27.45 tok/s (estimated)	7GB (have 64GB)
mistralai/Mistral-7B-v0.1	FP16	Fits comfortably	12.84 tok/s (estimated)	15GB (have 64GB)
unsloth/Meta-Llama-3.1-8B-Instruct	Q8	Fits comfortably	24.23 tok/s (estimated)	9GB (have 64GB)
unsloth/Meta-Llama-3.1-8B-Instruct	FP16	Fits comfortably	13.36 tok/s (estimated)	17GB (have 64GB)
meta-llama/Meta-Llama-3-70B-Instruct	Q4	Fits comfortably	12.89 tok/s (estimated)	34GB (have 64GB)
meta-llama/Meta-Llama-3-70B-Instruct	Q8	Not supported	8.89 tok/s (estimated)	68GB (have 64GB)
Qwen/Qwen3-0.6B-Base	Q4	Fits comfortably	38.01 tok/s (estimated)	3GB (have 64GB)
microsoft/DialoGPT-small	Q4	Fits comfortably	39.16 tok/s (estimated)	4GB (have 64GB)
microsoft/DialoGPT-small	Q8	Fits comfortably	25.78 tok/s (estimated)	7GB (have 64GB)
microsoft/DialoGPT-small	FP16	Fits comfortably	13.07 tok/s (estimated)	15GB (have 64GB)
Qwen/Qwen2-1.5B-Instruct	Q4	Fits comfortably	33.31 tok/s (estimated)	3GB (have 64GB)
Qwen/Qwen2-1.5B-Instruct	Q8	Fits comfortably	27.26 tok/s (estimated)	5GB (have 64GB)
Qwen/Qwen2-1.5B-Instruct	FP16	Fits comfortably	13.05 tok/s (estimated)	11GB (have 64GB)
meta-llama/Llama-Guard-3-1B	Q4	Fits comfortably	43.77 tok/s (estimated)	1GB (have 64GB)
EleutherAI/gpt-neo-125m	FP16	Fits comfortably	14.17 tok/s (estimated)	15GB (have 64GB)
meta-llama/Llama-3.2-3B	Q4	Fits comfortably	47.51 tok/s (estimated)	2GB (have 64GB)
meta-llama/Llama-3.2-3B	Q8	Fits comfortably	29.12 tok/s (estimated)	3GB (have 64GB)
meta-llama/Llama-3.2-3B	FP16	Fits comfortably	16.63 tok/s (estimated)	6GB (have 64GB)
deepseek-ai/DeepSeek-V3-0324	FP16	Fits comfortably	13.86 tok/s (estimated)	15GB (have 64GB)
huggyllama/llama-7b	Q4	Fits comfortably	36.81 tok/s (estimated)	4GB (have 64GB)
huggyllama/llama-7b	Q8	Fits comfortably	23.93 tok/s (estimated)	7GB (have 64GB)
huggyllama/llama-7b	FP16	Fits comfortably	14.45 tok/s (estimated)	15GB (have 64GB)
deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct	Q4	Fits comfortably	37.28 tok/s (estimated)	4GB (have 64GB)
microsoft/Phi-4-mini-instruct	Q8	Fits comfortably	24.08 tok/s (estimated)	7GB (have 64GB)
microsoft/Phi-4-mini-instruct	FP16	Fits comfortably	15.29 tok/s (estimated)	15GB (have 64GB)
lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit	Q4	Fits comfortably	39.50 tok/s (estimated)	4GB (have 64GB)
NousResearch/Meta-Llama-3.1-8B-Instruct	Q4	Fits comfortably	37.04 tok/s (estimated)	4GB (have 64GB)
NousResearch/Meta-Llama-3.1-8B-Instruct	Q8	Fits comfortably	25.96 tok/s (estimated)	9GB (have 64GB)
NousResearch/Meta-Llama-3.1-8B-Instruct	FP16	Fits comfortably	13.73 tok/s (estimated)	17GB (have 64GB)
apple/OpenELM-1_1B-Instruct	Q4	Fits comfortably	41.45 tok/s (estimated)	1GB (have 64GB)
lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit	Q8	Fits comfortably	14.52 tok/s (estimated)	31GB (have 64GB)
facebook/opt-125m	Q4	Fits comfortably	37.50 tok/s (estimated)	4GB (have 64GB)

facebook/opt-125m	Q8	Fits comfortably	25.17 tok/s (estimated)	7GB (have 64GB)
meta-llama/Llama-3.2-1B-Instruct	Q8	Fits comfortably	31.49 tok/s (estimated)	1GB (have 64GB)
meta-llama/Llama-3.2-1B-Instruct	FP16	Fits comfortably	18.02 tok/s (estimated)	2GB (have 64GB)
openai/gpt-oss-120b	Q4	Fits comfortably	7.51 tok/s (estimated)	59GB (have 64GB)
Qwen/Qwen3-8B	Q4	Fits comfortably	37.32 tok/s (estimated)	4GB (have 64GB)
google-t5/t5-3b	Q8	Fits comfortably	31.25 tok/s (estimated)	3GB (have 64GB)
google-t5/t5-3b	FP16	Fits comfortably	17.93 tok/s (estimated)	6GB (have 64GB)
meta-llama/Meta-Llama-3-8B-Instruct	Q8	Fits comfortably	27.45 tok/s (estimated)	9GB (have 64GB)
meta-llama/Meta-Llama-3-8B-Instruct	FP16	Fits comfortably	13.64 tok/s (estimated)	17GB (have 64GB)
Qwen/Qwen2.5-14B-Instruct	Q8	Fits comfortably	17.48 tok/s (estimated)	14GB (have 64GB)
Qwen/Qwen2.5-14B-Instruct	FP16	Fits comfortably	10.08 tok/s (estimated)	29GB (have 64GB)
unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit	Q4	Fits comfortably	39.39 tok/s (estimated)	4GB (have 64GB)
meta-llama/Llama-3.1-70B-Instruct	FP16	Not supported	5.29 tok/s (estimated)	137GB (have 64GB)
deepseek-ai/DeepSeek-V3.1	FP16	Fits comfortably	14.89 tok/s (estimated)	15GB (have 64GB)
meta-llama/Llama-3.1-8B	Q4	Fits comfortably	38.16 tok/s (estimated)	4GB (have 64GB)
meta-llama/Llama-3.1-8B	Q8	Fits comfortably	27.38 tok/s (estimated)	9GB (have 64GB)

Note: The performance figures above are calculated estimates, not measured results. Real results may vary.
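As a rough illustration of how figures like these can be sanity-checked, the sketch below estimates weight-only memory from parameter count and quantization width, then compares it with the 64GB listed for this configuration. The function names and the simple "required <= available" rule are assumptions made for this example; the rule merely reproduces the labels shown in the table (59GB fits, 66GB does not) and is not the site's documented methodology.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in GB (10^9 bytes).

    Ignores KV cache, activations, and runtime overhead, so real usage is higher.
    """
    return params_billions * bits_per_weight / 8


def fit_label(required_gb: float, available_gb: float) -> str:
    """Hypothetical rule: a model fits if its requirement does not exceed memory."""
    return "Fits comfortably" if required_gb <= available_gb else "Not supported"


AVAILABLE_GB = 64  # memory listed for this configuration

# Q4, Q8, and FP16 correspond to 4, 8, and 16 bits per weight.
for name, params_b, bits in [
    ("70B at Q4", 70, 4),
    ("70B at Q8", 70, 8),
    ("70B at FP16", 70, 16),
]:
    need = weight_memory_gb(params_b, bits)
    print(f"{name}: ~{need:.0f}GB -> {fit_label(need, AVAILABLE_GB)}")
```

For a 70B model this weight-only estimate gives 35GB, 70GB, and 140GB, close to the 34GB, 68GB, and 137GB listed above, so only the Q4 variant fits in 64GB. The small differences suggest the page uses a slightly different parameter count or rounding.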

Alternative GPUs

RTX 5070

12GB

Explore how RTX 5070 stacks up for local inference workloads.

RTX 4060 Ti 16GB

16GB

Explore how RTX 4060 Ti 16GB stacks up for local inference workloads.

RX 6800 XT

16GB

Explore how RX 6800 XT stacks up for local inference workloads.

RTX 4070 Super

12GB

Explore how RTX 4070 Super stacks up for local inference workloads.

RTX 3080

10GB

Explore how RTX 3080 stacks up for local inference workloads.