Quick Answer: Apple M3 Max offers 128GB VRAM and starts around current market pricing. It delivers approximately 70 tokens/sec on deepseek-ai/DeepSeek-OCR. It typically draws 40W under load.

Apple M3 Max

Unknown

By AppleReleased 2023-10MSRP $3,999.00

This GPU offers reliable throughput for local AI workloads. Pair it with the right model quantization to hit your desired tokens/sec, and monitor prices below to catch the best deal.

Check Price on Amazon View Benchmarks

Specs snapshot

Key hardware metrics for AI workloads.

VRAM128GB

Cores40

TDP40W

ArchitectureApple Silicon

Where to Buy

Buy directly on Amazon with fast shipping and reliable customer service.

AmazonUnknown

See price on Amazon

Buy on Amazon

💡 Not ready to buy? Try cloud GPUs first

Test Apple M3 Max performance in the cloud before investing in hardware. Pay by the hour with no commitment.

Vast.aifrom $0.20/hr RunPodfrom $0.30/hr Lambda Labsenterprise-grade

AI benchmarks

Model	Quantization	Tokens/sec	VRAM used
deepseek-ai/DeepSeek-OCR	Q4	70.03 tok/sEstimated Auto-generated benchmark	2GB
google/gemma-3-1b-it	Q4	69.09 tok/sEstimated Auto-generated benchmark	1GB
meta-llama/Llama-3.2-1B	Q4	67.81 tok/sEstimated Auto-generated benchmark	1GB
google/gemma-2b	Q4	67.67 tok/sEstimated Auto-generated benchmark	1GB
apple/OpenELM-1_1B-Instruct	Q4	67.33 tok/sEstimated Auto-generated benchmark	1GB
ibm-granite/granite-3.3-2b-instruct	Q4	67.23 tok/sEstimated Auto-generated benchmark	1GB
WeiboAI/VibeThinker-1.5B	Q4	66.35 tok/sEstimated Auto-generated benchmark	1GB
Qwen/Qwen2.5-3B-Instruct	Q4	66.23 tok/sEstimated Auto-generated benchmark	2GB
unsloth/Llama-3.2-3B-Instruct	Q4	65.27 tok/sEstimated Auto-generated benchmark	2GB
deepseek-ai/deepseek-coder-1.3b-instruct	Q4	65.04 tok/sEstimated Auto-generated benchmark	2GB
meta-llama/Llama-Guard-3-1B	Q4	64.99 tok/sEstimated Auto-generated benchmark	1GB
meta-llama/Llama-3.2-1B-Instruct	Q4	64.92 tok/sEstimated Auto-generated benchmark	1GB
nari-labs/Dia2-2B	Q4	64.71 tok/sEstimated Auto-generated benchmark	2GB
context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16	Q4	64.62 tok/sEstimated Auto-generated benchmark	2GB
LiquidAI/LFM2-1.2B	Q4	63.71 tok/sEstimated Auto-generated benchmark	1GB
ibm-research/PowerMoE-3b	Q4	63.48 tok/sEstimated Auto-generated benchmark	2GB
inference-net/Schematron-3B	Q4	63.33 tok/sEstimated Auto-generated benchmark	2GB
tencent/HunyuanOCR	Q4	62.96 tok/sEstimated Auto-generated benchmark	1GB
Qwen/Qwen2.5-3B	Q4	62.89 tok/sEstimated Auto-generated benchmark	2GB
meta-llama/Llama-3.2-3B-Instruct	Q4	62.32 tok/sEstimated Auto-generated benchmark	2GB
google/gemma-2-2b-it	Q4	61.31 tok/sEstimated Auto-generated benchmark	1GB
TinyLlama/TinyLlama-1.1B-Chat-v1.0	Q4	61.15 tok/sEstimated Auto-generated benchmark	1GB
facebook/sam3	Q4	61.06 tok/sEstimated Auto-generated benchmark	1GB
unsloth/Llama-3.2-1B-Instruct	Q4	60.59 tok/sEstimated Auto-generated benchmark	1GB
bigcode/starcoder2-3b	Q4	59.87 tok/sEstimated Auto-generated benchmark	2GB
allenai/OLMo-2-0425-1B	Q4	59.36 tok/sEstimated Auto-generated benchmark	1GB
HuggingFaceH4/zephyr-7b-beta	Q4	59.01 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen2-1.5B-Instruct	Q4	58.97 tok/sEstimated Auto-generated benchmark	3GB
unsloth/gemma-3-1b-it	Q4	58.93 tok/sEstimated Auto-generated benchmark	1GB
google-t5/t5-3b	Q4	58.85 tok/sEstimated Auto-generated benchmark	2GB
Qwen/Qwen3-0.6B	Q4	58.82 tok/sEstimated Auto-generated benchmark	3GB
lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit	Q4	58.66 tok/sEstimated Auto-generated benchmark	2GB
GSAI-ML/LLaDA-8B-Instruct	Q4	58.56 tok/sEstimated Auto-generated benchmark	4GB
microsoft/Phi-3.5-vision-instruct	Q4	58.56 tok/sEstimated Auto-generated benchmark	4GB
petals-team/StableBeluga2	Q4	58.53 tok/sEstimated Auto-generated benchmark	4GB
meta-llama/Llama-3.2-3B	Q4	58.48 tok/sEstimated Auto-generated benchmark	2GB
google/embeddinggemma-300m	Q4	58.30 tok/sEstimated Auto-generated benchmark	1GB
lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit	Q4	58.12 tok/sEstimated Auto-generated benchmark	2GB
meta-llama/Llama-2-7b-chat-hf	Q4	58.09 tok/sEstimated Auto-generated benchmark	4GB
meta-llama/Meta-Llama-3-8B	Q4	58.06 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen2.5-1.5B	Q4	58.02 tok/sEstimated Auto-generated benchmark	3GB
google-bert/bert-base-uncased	Q4	57.98 tok/sEstimated Auto-generated benchmark	1GB
Qwen/Qwen2.5-0.5B-Instruct	Q4	57.95 tok/sEstimated Auto-generated benchmark	3GB
Qwen/Qwen3-8B-FP8	Q4	57.95 tok/sEstimated Auto-generated benchmark	4GB
hmellor/tiny-random-LlamaForCausalLM	Q4	57.93 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen2.5-0.5B	Q4	57.88 tok/sEstimated Auto-generated benchmark	3GB
Qwen/Qwen3-0.6B-Base	Q4	57.84 tok/sEstimated Auto-generated benchmark	3GB
deepseek-ai/DeepSeek-V3-0324	Q4	57.84 tok/sEstimated Auto-generated benchmark	4GB
microsoft/Phi-3.5-mini-instruct	Q4	57.78 tok/sEstimated Auto-generated benchmark	2GB
unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit	Q4	57.66 tok/sEstimated Auto-generated benchmark	4GB

deepseek-ai/DeepSeek-OCR

2GB

70.03 tok/sEstimated

Auto-generated benchmark

google/gemma-3-1b-it

1GB

69.09 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.2-1B

1GB

67.81 tok/sEstimated

Auto-generated benchmark

google/gemma-2b

1GB

67.67 tok/sEstimated

Auto-generated benchmark

apple/OpenELM-1_1B-Instruct

1GB

67.33 tok/sEstimated

Auto-generated benchmark

ibm-granite/granite-3.3-2b-instruct

1GB

67.23 tok/sEstimated

Auto-generated benchmark

WeiboAI/VibeThinker-1.5B

1GB

66.35 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-3B-Instruct

2GB

66.23 tok/sEstimated

Auto-generated benchmark

unsloth/Llama-3.2-3B-Instruct

2GB

65.27 tok/sEstimated

Auto-generated benchmark

deepseek-ai/deepseek-coder-1.3b-instruct

2GB

65.04 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-Guard-3-1B

1GB

64.99 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.2-1B-Instruct

1GB

64.92 tok/sEstimated

Auto-generated benchmark

nari-labs/Dia2-2B

2GB

64.71 tok/sEstimated

Auto-generated benchmark

context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16

2GB

64.62 tok/sEstimated

Auto-generated benchmark

LiquidAI/LFM2-1.2B

1GB

63.71 tok/sEstimated

Auto-generated benchmark

ibm-research/PowerMoE-3b

2GB

63.48 tok/sEstimated

Auto-generated benchmark

inference-net/Schematron-3B

2GB

63.33 tok/sEstimated

Auto-generated benchmark

tencent/HunyuanOCR

1GB

62.96 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-3B

2GB

62.89 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.2-3B-Instruct

2GB

62.32 tok/sEstimated

Auto-generated benchmark

google/gemma-2-2b-it

1GB

61.31 tok/sEstimated

Auto-generated benchmark

TinyLlama/TinyLlama-1.1B-Chat-v1.0

1GB

61.15 tok/sEstimated

Auto-generated benchmark

facebook/sam3

1GB

61.06 tok/sEstimated

Auto-generated benchmark

unsloth/Llama-3.2-1B-Instruct

1GB

60.59 tok/sEstimated

Auto-generated benchmark

bigcode/starcoder2-3b

2GB

59.87 tok/sEstimated

Auto-generated benchmark

allenai/OLMo-2-0425-1B

1GB

59.36 tok/sEstimated

Auto-generated benchmark

HuggingFaceH4/zephyr-7b-beta

4GB

59.01 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2-1.5B-Instruct

3GB

58.97 tok/sEstimated

Auto-generated benchmark

unsloth/gemma-3-1b-it

1GB

58.93 tok/sEstimated

Auto-generated benchmark

google-t5/t5-3b

2GB

58.85 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen3-0.6B

3GB

58.82 tok/sEstimated

Auto-generated benchmark

lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit

2GB

58.66 tok/sEstimated

Auto-generated benchmark

GSAI-ML/LLaDA-8B-Instruct

4GB

58.56 tok/sEstimated

Auto-generated benchmark

microsoft/Phi-3.5-vision-instruct

4GB

58.56 tok/sEstimated

Auto-generated benchmark

petals-team/StableBeluga2

4GB

58.53 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.2-3B

2GB

58.48 tok/sEstimated

Auto-generated benchmark

google/embeddinggemma-300m

1GB

58.30 tok/sEstimated

Auto-generated benchmark

lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit

2GB

58.12 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-2-7b-chat-hf

4GB

58.09 tok/sEstimated

Auto-generated benchmark

meta-llama/Meta-Llama-3-8B

4GB

58.06 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-1.5B

3GB

58.02 tok/sEstimated

Auto-generated benchmark

google-bert/bert-base-uncased

1GB

57.98 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-0.5B-Instruct

3GB

57.95 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen3-8B-FP8

4GB

57.95 tok/sEstimated

Auto-generated benchmark

hmellor/tiny-random-LlamaForCausalLM

4GB

57.93 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-0.5B

3GB

57.88 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen3-0.6B-Base

3GB

57.84 tok/sEstimated

Auto-generated benchmark

deepseek-ai/DeepSeek-V3-0324

4GB

57.84 tok/sEstimated

Auto-generated benchmark

microsoft/Phi-3.5-mini-instruct

2GB

57.78 tok/sEstimated

Auto-generated benchmark

unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit

4GB

57.66 tok/sEstimated

Auto-generated benchmark

Note: Performance estimates are calculated. Real results may vary. Methodology · Submit real data

Model compatibility

Model	Quantization	Verdict	Estimated speed	VRAM needed
dphn/dolphin-2.9.1-yi-1.5-34b	Q8	Fits comfortably	12.60 tok/sEstimated	35GB (have 128GB)
Qwen/Qwen2.5-7B	FP16	Fits comfortably	18.80 tok/sEstimated	15GB (have 128GB)
Qwen/Qwen2.5-0.5B-Instruct	Q4	Fits comfortably	57.95 tok/sEstimated	3GB (have 128GB)
Qwen/Qwen2.5-0.5B-Instruct	FP16	Fits comfortably	20.40 tok/sEstimated	11GB (have 128GB)
Qwen/Qwen3-30B-A3B-Instruct-2507	Q4	Fits comfortably	30.16 tok/sEstimated	15GB (have 128GB)
deepseek-ai/deepseek-coder-1.3b-instruct	Q8	Fits comfortably	43.28 tok/sEstimated	3GB (have 128GB)
Qwen/Qwen3-30B-A3B	Q4	Fits comfortably	27.71 tok/sEstimated	15GB (have 128GB)
Qwen/Qwen3-30B-A3B	Q8	Fits comfortably	21.92 tok/sEstimated	31GB (have 128GB)
Qwen/Qwen2.5-Coder-7B-Instruct	Q8	Fits comfortably	38.24 tok/sEstimated	7GB (have 128GB)
lmsys/vicuna-7b-v1.5	Q4	Fits comfortably	53.76 tok/sEstimated	4GB (have 128GB)
Qwen/Qwen2.5-3B	FP16	Fits comfortably	25.07 tok/sEstimated	6GB (have 128GB)
nvidia/NVIDIA-Nemotron-Nano-9B-v2	Q4	Fits comfortably	43.49 tok/sEstimated	5GB (have 128GB)
nvidia/NVIDIA-Nemotron-Nano-9B-v2	Q8	Fits comfortably	27.04 tok/sEstimated	10GB (have 128GB)
dicta-il/dictalm2.0-instruct	FP16	Fits comfortably	21.39 tok/sEstimated	15GB (have 128GB)
BSC-LT/salamandraTA-7b-instruct	Q4	Fits comfortably	55.16 tok/sEstimated	4GB (have 128GB)
BSC-LT/salamandraTA-7b-instruct	Q8	Fits comfortably	35.89 tok/sEstimated	7GB (have 128GB)
microsoft/Phi-3-medium-128k-instruct	FP16	Fits comfortably	15.52 tok/sEstimated	29GB (have 128GB)
deepseek-ai/DeepSeek-OCR	Q8	Fits comfortably	46.07 tok/sEstimated	4GB (have 128GB)
deepseek-ai/DeepSeek-V2.5	Q4	Not supported	17.39 tok/sEstimated	328GB (have 128GB)
deepseek-ai/DeepSeek-V2.5	Q8	Not supported	14.19 tok/sEstimated	656GB (have 128GB)
deepseek-ai/DeepSeek-V2.5	FP16	Not supported	7.84 tok/sEstimated	1312GB (have 128GB)
RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic	FP16	Not supported	4.30 tok/sEstimated	176GB (have 128GB)
mistralai/Mistral-Large-3-675B-Instruct-2512	Q4	Not supported	6.80 tok/sEstimated	378GB (have 128GB)
mistralai/Mistral-Large-3-675B-Instruct-2512	Q8	Not supported	4.64 tok/sEstimated	755GB (have 128GB)
mistralai/Mistral-Large-3-675B-Instruct-2512	FP16	Not supported	2.42 tok/sEstimated	1509GB (have 128GB)
EssentialAI/rnj-1	Q4	Fits comfortably	39.40 tok/sEstimated	5GB (have 128GB)
EssentialAI/rnj-1	Q8	Fits comfortably	29.76 tok/sEstimated	10GB (have 128GB)
EssentialAI/rnj-1	FP16	Fits comfortably	15.13 tok/sEstimated	19GB (have 128GB)
dphn/dolphin-2.9.1-yi-1.5-34b	Q4	Fits comfortably	18.00 tok/sEstimated	17GB (have 128GB)
dphn/dolphin-2.9.1-yi-1.5-34b	FP16	Fits comfortably	6.56 tok/sEstimated	70GB (have 128GB)
openai/gpt-oss-20b	Q4	Fits comfortably	31.51 tok/sEstimated	10GB (have 128GB)
openai/gpt-oss-20b	Q8	Fits comfortably	19.54 tok/sEstimated	20GB (have 128GB)
openai/gpt-oss-20b	FP16	Fits comfortably	10.12 tok/sEstimated	41GB (have 128GB)
google/gemma-3-1b-it	Q4	Fits comfortably	69.09 tok/sEstimated	1GB (have 128GB)
Qwen/Qwen3-Embedding-0.6B	FP16	Fits comfortably	20.58 tok/sEstimated	13GB (have 128GB)
Qwen/Qwen2.5-1.5B-Instruct	Q4	Fits comfortably	49.74 tok/sEstimated	3GB (have 128GB)
facebook/opt-125m	FP16	Fits comfortably	19.20 tok/sEstimated	15GB (have 128GB)
TinyLlama/TinyLlama-1.1B-Chat-v1.0	Q4	Fits comfortably	61.15 tok/sEstimated	1GB (have 128GB)
TinyLlama/TinyLlama-1.1B-Chat-v1.0	Q8	Fits comfortably	41.31 tok/sEstimated	1GB (have 128GB)
TinyLlama/TinyLlama-1.1B-Chat-v1.0	FP16	Fits comfortably	23.39 tok/sEstimated	2GB (have 128GB)
Qwen/Qwen3-4B-Instruct-2507	Q4	Fits comfortably	55.54 tok/sEstimated	2GB (have 128GB)
Qwen/Qwen3-8B	Q4	Fits comfortably	53.85 tok/sEstimated	4GB (have 128GB)
Qwen/Qwen3-8B	Q8	Fits comfortably	34.13 tok/sEstimated	9GB (have 128GB)
Qwen/Qwen3-8B	FP16	Fits comfortably	19.48 tok/sEstimated	17GB (have 128GB)
RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic	Q4	Fits comfortably	18.36 tok/sEstimated	34GB (have 128GB)
RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic	Q8	Fits comfortably	13.90 tok/sEstimated	68GB (have 128GB)
meta-llama/Llama-3.1-70B-Instruct	FP16	Not supported	7.50 tok/sEstimated	137GB (have 128GB)
microsoft/phi-2	Q4	Fits comfortably	54.01 tok/sEstimated	4GB (have 128GB)
microsoft/phi-2	Q8	Fits comfortably	35.94 tok/sEstimated	7GB (have 128GB)
openai-community/gpt2	Q4	Fits comfortably	49.39 tok/sEstimated	4GB (have 128GB)

dphn/dolphin-2.9.1-yi-1.5-34bQ8

Fits comfortably35GB required · 128GB available

12.60 tok/sEstimated

Qwen/Qwen2.5-7BFP16

Fits comfortably15GB required · 128GB available

18.80 tok/sEstimated

Qwen/Qwen2.5-0.5B-InstructQ4

Fits comfortably3GB required · 128GB available

57.95 tok/sEstimated

Qwen/Qwen2.5-0.5B-InstructFP16

Fits comfortably11GB required · 128GB available

20.40 tok/sEstimated

Qwen/Qwen3-30B-A3B-Instruct-2507Q4

Fits comfortably15GB required · 128GB available

30.16 tok/sEstimated

deepseek-ai/deepseek-coder-1.3b-instructQ8

Fits comfortably3GB required · 128GB available

43.28 tok/sEstimated

Qwen/Qwen3-30B-A3BQ4

Fits comfortably15GB required · 128GB available

27.71 tok/sEstimated

Qwen/Qwen3-30B-A3BQ8

Fits comfortably31GB required · 128GB available

21.92 tok/sEstimated

Qwen/Qwen2.5-Coder-7B-InstructQ8

Fits comfortably7GB required · 128GB available

38.24 tok/sEstimated

lmsys/vicuna-7b-v1.5Q4

Fits comfortably4GB required · 128GB available

53.76 tok/sEstimated

Qwen/Qwen2.5-3BFP16

Fits comfortably6GB required · 128GB available

25.07 tok/sEstimated

nvidia/NVIDIA-Nemotron-Nano-9B-v2Q4

Fits comfortably5GB required · 128GB available

43.49 tok/sEstimated

nvidia/NVIDIA-Nemotron-Nano-9B-v2Q8

Fits comfortably10GB required · 128GB available

27.04 tok/sEstimated

dicta-il/dictalm2.0-instructFP16

Fits comfortably15GB required · 128GB available

21.39 tok/sEstimated

BSC-LT/salamandraTA-7b-instructQ4

Fits comfortably4GB required · 128GB available

55.16 tok/sEstimated

BSC-LT/salamandraTA-7b-instructQ8

Fits comfortably7GB required · 128GB available

35.89 tok/sEstimated

microsoft/Phi-3-medium-128k-instructFP16

Fits comfortably29GB required · 128GB available

15.52 tok/sEstimated

deepseek-ai/DeepSeek-OCRQ8

Fits comfortably4GB required · 128GB available

46.07 tok/sEstimated

deepseek-ai/DeepSeek-V2.5Q4

Not supported328GB required · 128GB available

17.39 tok/sEstimated

deepseek-ai/DeepSeek-V2.5Q8

Not supported656GB required · 128GB available

14.19 tok/sEstimated

deepseek-ai/DeepSeek-V2.5FP16

Not supported1312GB required · 128GB available

7.84 tok/sEstimated

RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamicFP16

Not supported176GB required · 128GB available

4.30 tok/sEstimated

mistralai/Mistral-Large-3-675B-Instruct-2512Q4

Not supported378GB required · 128GB available

6.80 tok/sEstimated

mistralai/Mistral-Large-3-675B-Instruct-2512Q8

Not supported755GB required · 128GB available

4.64 tok/sEstimated

mistralai/Mistral-Large-3-675B-Instruct-2512FP16

Not supported1509GB required · 128GB available

2.42 tok/sEstimated

EssentialAI/rnj-1Q4

Fits comfortably5GB required · 128GB available

39.40 tok/sEstimated

EssentialAI/rnj-1Q8

Fits comfortably10GB required · 128GB available

29.76 tok/sEstimated

EssentialAI/rnj-1FP16

Fits comfortably19GB required · 128GB available

15.13 tok/sEstimated

dphn/dolphin-2.9.1-yi-1.5-34bQ4

Fits comfortably17GB required · 128GB available

18.00 tok/sEstimated

dphn/dolphin-2.9.1-yi-1.5-34bFP16

Fits comfortably70GB required · 128GB available

6.56 tok/sEstimated

openai/gpt-oss-20bQ4

Fits comfortably10GB required · 128GB available

31.51 tok/sEstimated

openai/gpt-oss-20bQ8

Fits comfortably20GB required · 128GB available

19.54 tok/sEstimated

openai/gpt-oss-20bFP16

Fits comfortably41GB required · 128GB available

10.12 tok/sEstimated

google/gemma-3-1b-itQ4

Fits comfortably1GB required · 128GB available

69.09 tok/sEstimated

Qwen/Qwen3-Embedding-0.6BFP16

Fits comfortably13GB required · 128GB available

20.58 tok/sEstimated

Qwen/Qwen2.5-1.5B-InstructQ4

Fits comfortably3GB required · 128GB available

49.74 tok/sEstimated

facebook/opt-125mFP16

Fits comfortably15GB required · 128GB available

19.20 tok/sEstimated

TinyLlama/TinyLlama-1.1B-Chat-v1.0Q4

Fits comfortably1GB required · 128GB available

61.15 tok/sEstimated

TinyLlama/TinyLlama-1.1B-Chat-v1.0Q8

Fits comfortably1GB required · 128GB available

41.31 tok/sEstimated

TinyLlama/TinyLlama-1.1B-Chat-v1.0FP16

Fits comfortably2GB required · 128GB available

23.39 tok/sEstimated

Qwen/Qwen3-4B-Instruct-2507Q4

Fits comfortably2GB required · 128GB available

55.54 tok/sEstimated

Qwen/Qwen3-8BQ4

Fits comfortably4GB required · 128GB available

53.85 tok/sEstimated

Qwen/Qwen3-8BQ8

Fits comfortably9GB required · 128GB available

34.13 tok/sEstimated

Qwen/Qwen3-8BFP16

Fits comfortably17GB required · 128GB available

19.48 tok/sEstimated

RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamicQ4

Fits comfortably34GB required · 128GB available

18.36 tok/sEstimated

RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamicQ8

Fits comfortably68GB required · 128GB available

13.90 tok/sEstimated

meta-llama/Llama-3.1-70B-InstructFP16

Not supported137GB required · 128GB available

7.50 tok/sEstimated

microsoft/phi-2Q4

Fits comfortably4GB required · 128GB available

54.01 tok/sEstimated

microsoft/phi-2Q8

Fits comfortably7GB required · 128GB available

35.94 tok/sEstimated

openai-community/gpt2Q4

Fits comfortably4GB required · 128GB available

49.39 tok/sEstimated

Note: Performance estimates are calculated. Real results may vary. Methodology · Submit real data

Alternative GPUs

Apple M2 Ultra

192GB

Explore how Apple M2 Ultra stacks up for local inference workloads.

RTX 4090

24GB

Explore how RTX 4090 stacks up for local inference workloads.

RTX 4080

16GB

Explore how RTX 4080 stacks up for local inference workloads.

RX 7900 XTX

24GB

Explore how RX 7900 XTX stacks up for local inference workloads.

NVIDIA RTX 6000 Ada

48GB

Explore how NVIDIA RTX 6000 Ada stacks up for local inference workloads.

Compare Apple M3 Max

Apple M3 Max vs Apple M2 Ultra

Side-by-side VRAM, throughput, efficiency, and pricing benchmarks for both GPUs.

Apple M3 Max vs RTX 4090

Side-by-side VRAM, throughput, efficiency, and pricing benchmarks for both GPUs.

Apple M3 Max vs NVIDIA RTX 6000 Ada

Side-by-side VRAM, throughput, efficiency, and pricing benchmarks for both GPUs.

Quick Answer: Apple M3 Max offers 128GB VRAM and starts around current market pricing. It delivers approximately 70 tokens/sec on deepseek-ai/DeepSeek-OCR. It typically draws 40W under load.

Apple M3 Max

Unknown

By AppleReleased 2023-10MSRP $3,999.00

This GPU offers reliable throughput for local AI workloads. Pair it with the right model quantization to hit your desired tokens/sec, and monitor prices below to catch the best deal.

Check Price on Amazon View Benchmarks

Specs snapshot

Key hardware metrics for AI workloads.

VRAM128GB

Cores40

TDP40W

ArchitectureApple Silicon

Where to Buy

Buy directly on Amazon with fast shipping and reliable customer service.

AmazonUnknown

See price on Amazon

Buy on Amazon

💡 Not ready to buy? Try cloud GPUs first

Test Apple M3 Max performance in the cloud before investing in hardware. Pay by the hour with no commitment.

Vast.aifrom $0.20/hr RunPodfrom $0.30/hr Lambda Labsenterprise-grade

AI benchmarks

Model	Quantization	Tokens/sec	VRAM used
deepseek-ai/DeepSeek-OCR	Q4	70.03 tok/sEstimated Auto-generated benchmark	2GB
google/gemma-3-1b-it	Q4	69.09 tok/sEstimated Auto-generated benchmark	1GB
meta-llama/Llama-3.2-1B	Q4	67.81 tok/sEstimated Auto-generated benchmark	1GB
google/gemma-2b	Q4	67.67 tok/sEstimated Auto-generated benchmark	1GB
apple/OpenELM-1_1B-Instruct	Q4	67.33 tok/sEstimated Auto-generated benchmark	1GB
ibm-granite/granite-3.3-2b-instruct	Q4	67.23 tok/sEstimated Auto-generated benchmark	1GB
WeiboAI/VibeThinker-1.5B	Q4	66.35 tok/sEstimated Auto-generated benchmark	1GB
Qwen/Qwen2.5-3B-Instruct	Q4	66.23 tok/sEstimated Auto-generated benchmark	2GB
unsloth/Llama-3.2-3B-Instruct	Q4	65.27 tok/sEstimated Auto-generated benchmark	2GB
deepseek-ai/deepseek-coder-1.3b-instruct	Q4	65.04 tok/sEstimated Auto-generated benchmark	2GB
meta-llama/Llama-Guard-3-1B	Q4	64.99 tok/sEstimated Auto-generated benchmark	1GB
meta-llama/Llama-3.2-1B-Instruct	Q4	64.92 tok/sEstimated Auto-generated benchmark	1GB
nari-labs/Dia2-2B	Q4	64.71 tok/sEstimated Auto-generated benchmark	2GB
context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16	Q4	64.62 tok/sEstimated Auto-generated benchmark	2GB
LiquidAI/LFM2-1.2B	Q4	63.71 tok/sEstimated Auto-generated benchmark	1GB
ibm-research/PowerMoE-3b	Q4	63.48 tok/sEstimated Auto-generated benchmark	2GB
inference-net/Schematron-3B	Q4	63.33 tok/sEstimated Auto-generated benchmark	2GB
tencent/HunyuanOCR	Q4	62.96 tok/sEstimated Auto-generated benchmark	1GB
Qwen/Qwen2.5-3B	Q4	62.89 tok/sEstimated Auto-generated benchmark	2GB
meta-llama/Llama-3.2-3B-Instruct	Q4	62.32 tok/sEstimated Auto-generated benchmark	2GB
google/gemma-2-2b-it	Q4	61.31 tok/sEstimated Auto-generated benchmark	1GB
TinyLlama/TinyLlama-1.1B-Chat-v1.0	Q4	61.15 tok/sEstimated Auto-generated benchmark	1GB
facebook/sam3	Q4	61.06 tok/sEstimated Auto-generated benchmark	1GB
unsloth/Llama-3.2-1B-Instruct	Q4	60.59 tok/sEstimated Auto-generated benchmark	1GB
bigcode/starcoder2-3b	Q4	59.87 tok/sEstimated Auto-generated benchmark	2GB
allenai/OLMo-2-0425-1B	Q4	59.36 tok/sEstimated Auto-generated benchmark	1GB
HuggingFaceH4/zephyr-7b-beta	Q4	59.01 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen2-1.5B-Instruct	Q4	58.97 tok/sEstimated Auto-generated benchmark	3GB
unsloth/gemma-3-1b-it	Q4	58.93 tok/sEstimated Auto-generated benchmark	1GB
google-t5/t5-3b	Q4	58.85 tok/sEstimated Auto-generated benchmark	2GB
Qwen/Qwen3-0.6B	Q4	58.82 tok/sEstimated Auto-generated benchmark	3GB
lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit	Q4	58.66 tok/sEstimated Auto-generated benchmark	2GB
GSAI-ML/LLaDA-8B-Instruct	Q4	58.56 tok/sEstimated Auto-generated benchmark	4GB
microsoft/Phi-3.5-vision-instruct	Q4	58.56 tok/sEstimated Auto-generated benchmark	4GB
petals-team/StableBeluga2	Q4	58.53 tok/sEstimated Auto-generated benchmark	4GB
meta-llama/Llama-3.2-3B	Q4	58.48 tok/sEstimated Auto-generated benchmark	2GB
google/embeddinggemma-300m	Q4	58.30 tok/sEstimated Auto-generated benchmark	1GB
lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit	Q4	58.12 tok/sEstimated Auto-generated benchmark	2GB
meta-llama/Llama-2-7b-chat-hf	Q4	58.09 tok/sEstimated Auto-generated benchmark	4GB
meta-llama/Meta-Llama-3-8B	Q4	58.06 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen2.5-1.5B	Q4	58.02 tok/sEstimated Auto-generated benchmark	3GB
google-bert/bert-base-uncased	Q4	57.98 tok/sEstimated Auto-generated benchmark	1GB
Qwen/Qwen2.5-0.5B-Instruct	Q4	57.95 tok/sEstimated Auto-generated benchmark	3GB
Qwen/Qwen3-8B-FP8	Q4	57.95 tok/sEstimated Auto-generated benchmark	4GB
hmellor/tiny-random-LlamaForCausalLM	Q4	57.93 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen2.5-0.5B	Q4	57.88 tok/sEstimated Auto-generated benchmark	3GB
Qwen/Qwen3-0.6B-Base	Q4	57.84 tok/sEstimated Auto-generated benchmark	3GB
deepseek-ai/DeepSeek-V3-0324	Q4	57.84 tok/sEstimated Auto-generated benchmark	4GB
microsoft/Phi-3.5-mini-instruct	Q4	57.78 tok/sEstimated Auto-generated benchmark	2GB
unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit	Q4	57.66 tok/sEstimated Auto-generated benchmark	4GB

deepseek-ai/DeepSeek-OCR

2GB

70.03 tok/sEstimated

Auto-generated benchmark

google/gemma-3-1b-it

1GB

69.09 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.2-1B

1GB

67.81 tok/sEstimated

Auto-generated benchmark

google/gemma-2b

1GB

67.67 tok/sEstimated

Auto-generated benchmark

apple/OpenELM-1_1B-Instruct

1GB

67.33 tok/sEstimated

Auto-generated benchmark

ibm-granite/granite-3.3-2b-instruct

1GB

67.23 tok/sEstimated

Auto-generated benchmark

WeiboAI/VibeThinker-1.5B

1GB

66.35 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-3B-Instruct

2GB

66.23 tok/sEstimated

Auto-generated benchmark

unsloth/Llama-3.2-3B-Instruct

2GB

65.27 tok/sEstimated

Auto-generated benchmark

deepseek-ai/deepseek-coder-1.3b-instruct

2GB

65.04 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-Guard-3-1B

1GB

64.99 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.2-1B-Instruct

1GB

64.92 tok/sEstimated

Auto-generated benchmark

nari-labs/Dia2-2B

2GB

64.71 tok/sEstimated

Auto-generated benchmark

context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16

2GB

64.62 tok/sEstimated

Auto-generated benchmark

LiquidAI/LFM2-1.2B

1GB

63.71 tok/sEstimated

Auto-generated benchmark

ibm-research/PowerMoE-3b

2GB

63.48 tok/sEstimated

Auto-generated benchmark

inference-net/Schematron-3B

2GB

63.33 tok/sEstimated

Auto-generated benchmark

tencent/HunyuanOCR

1GB

62.96 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-3B

2GB

62.89 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.2-3B-Instruct

2GB

62.32 tok/sEstimated

Auto-generated benchmark

google/gemma-2-2b-it

1GB

61.31 tok/sEstimated

Auto-generated benchmark

TinyLlama/TinyLlama-1.1B-Chat-v1.0

1GB

61.15 tok/sEstimated

Auto-generated benchmark

facebook/sam3

1GB

61.06 tok/sEstimated

Auto-generated benchmark

unsloth/Llama-3.2-1B-Instruct

1GB

60.59 tok/sEstimated

Auto-generated benchmark

bigcode/starcoder2-3b

2GB

59.87 tok/sEstimated

Auto-generated benchmark

allenai/OLMo-2-0425-1B

1GB

59.36 tok/sEstimated

Auto-generated benchmark

HuggingFaceH4/zephyr-7b-beta

4GB

59.01 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2-1.5B-Instruct

3GB

58.97 tok/sEstimated

Auto-generated benchmark

unsloth/gemma-3-1b-it

1GB

58.93 tok/sEstimated

Auto-generated benchmark

google-t5/t5-3b

2GB

58.85 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen3-0.6B

3GB

58.82 tok/sEstimated

Auto-generated benchmark

lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit

2GB

58.66 tok/sEstimated

Auto-generated benchmark

GSAI-ML/LLaDA-8B-Instruct

4GB

58.56 tok/sEstimated

Auto-generated benchmark

microsoft/Phi-3.5-vision-instruct

4GB

58.56 tok/sEstimated

Auto-generated benchmark

petals-team/StableBeluga2

4GB

58.53 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.2-3B

2GB

58.48 tok/sEstimated

Auto-generated benchmark

google/embeddinggemma-300m

1GB

58.30 tok/sEstimated

Auto-generated benchmark

lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit

2GB

58.12 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-2-7b-chat-hf

4GB

58.09 tok/sEstimated

Auto-generated benchmark

meta-llama/Meta-Llama-3-8B

4GB

58.06 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-1.5B

3GB

58.02 tok/sEstimated

Auto-generated benchmark

google-bert/bert-base-uncased

1GB

57.98 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-0.5B-Instruct

3GB

57.95 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen3-8B-FP8

4GB

57.95 tok/sEstimated

Auto-generated benchmark

hmellor/tiny-random-LlamaForCausalLM

4GB

57.93 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-0.5B

3GB

57.88 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen3-0.6B-Base

3GB

57.84 tok/sEstimated

Auto-generated benchmark

deepseek-ai/DeepSeek-V3-0324

4GB

57.84 tok/sEstimated

Auto-generated benchmark

microsoft/Phi-3.5-mini-instruct

2GB

57.78 tok/sEstimated

Auto-generated benchmark

unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit

4GB

57.66 tok/sEstimated

Auto-generated benchmark

Note: Performance estimates are calculated. Real results may vary. Methodology · Submit real data

Model compatibility

Model	Quantization	Verdict	Estimated speed	VRAM needed
dphn/dolphin-2.9.1-yi-1.5-34b	Q8	Fits comfortably	12.60 tok/sEstimated	35GB (have 128GB)
Qwen/Qwen2.5-7B	FP16	Fits comfortably	18.80 tok/sEstimated	15GB (have 128GB)
Qwen/Qwen2.5-0.5B-Instruct	Q4	Fits comfortably	57.95 tok/sEstimated	3GB (have 128GB)
Qwen/Qwen2.5-0.5B-Instruct	FP16	Fits comfortably	20.40 tok/sEstimated	11GB (have 128GB)
Qwen/Qwen3-30B-A3B-Instruct-2507	Q4	Fits comfortably	30.16 tok/sEstimated	15GB (have 128GB)
deepseek-ai/deepseek-coder-1.3b-instruct	Q8	Fits comfortably	43.28 tok/sEstimated	3GB (have 128GB)
Qwen/Qwen3-30B-A3B	Q4	Fits comfortably	27.71 tok/sEstimated	15GB (have 128GB)
Qwen/Qwen3-30B-A3B	Q8	Fits comfortably	21.92 tok/sEstimated	31GB (have 128GB)
Qwen/Qwen2.5-Coder-7B-Instruct	Q8	Fits comfortably	38.24 tok/sEstimated	7GB (have 128GB)
lmsys/vicuna-7b-v1.5	Q4	Fits comfortably	53.76 tok/sEstimated	4GB (have 128GB)
Qwen/Qwen2.5-3B	FP16	Fits comfortably	25.07 tok/sEstimated	6GB (have 128GB)
nvidia/NVIDIA-Nemotron-Nano-9B-v2	Q4	Fits comfortably	43.49 tok/sEstimated	5GB (have 128GB)
nvidia/NVIDIA-Nemotron-Nano-9B-v2	Q8	Fits comfortably	27.04 tok/sEstimated	10GB (have 128GB)
dicta-il/dictalm2.0-instruct	FP16	Fits comfortably	21.39 tok/sEstimated	15GB (have 128GB)
BSC-LT/salamandraTA-7b-instruct	Q4	Fits comfortably	55.16 tok/sEstimated	4GB (have 128GB)
BSC-LT/salamandraTA-7b-instruct	Q8	Fits comfortably	35.89 tok/sEstimated	7GB (have 128GB)
microsoft/Phi-3-medium-128k-instruct	FP16	Fits comfortably	15.52 tok/sEstimated	29GB (have 128GB)
deepseek-ai/DeepSeek-OCR	Q8	Fits comfortably	46.07 tok/sEstimated	4GB (have 128GB)
deepseek-ai/DeepSeek-V2.5	Q4	Not supported	17.39 tok/sEstimated	328GB (have 128GB)
deepseek-ai/DeepSeek-V2.5	Q8	Not supported	14.19 tok/sEstimated	656GB (have 128GB)
deepseek-ai/DeepSeek-V2.5	FP16	Not supported	7.84 tok/sEstimated	1312GB (have 128GB)
RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic	FP16	Not supported	4.30 tok/sEstimated	176GB (have 128GB)
mistralai/Mistral-Large-3-675B-Instruct-2512	Q4	Not supported	6.80 tok/sEstimated	378GB (have 128GB)
mistralai/Mistral-Large-3-675B-Instruct-2512	Q8	Not supported	4.64 tok/sEstimated	755GB (have 128GB)
mistralai/Mistral-Large-3-675B-Instruct-2512	FP16	Not supported	2.42 tok/sEstimated	1509GB (have 128GB)
EssentialAI/rnj-1	Q4	Fits comfortably	39.40 tok/sEstimated	5GB (have 128GB)
EssentialAI/rnj-1	Q8	Fits comfortably	29.76 tok/sEstimated	10GB (have 128GB)
EssentialAI/rnj-1	FP16	Fits comfortably	15.13 tok/sEstimated	19GB (have 128GB)
dphn/dolphin-2.9.1-yi-1.5-34b	Q4	Fits comfortably	18.00 tok/sEstimated	17GB (have 128GB)
dphn/dolphin-2.9.1-yi-1.5-34b	FP16	Fits comfortably	6.56 tok/sEstimated	70GB (have 128GB)
openai/gpt-oss-20b	Q4	Fits comfortably	31.51 tok/sEstimated	10GB (have 128GB)
openai/gpt-oss-20b	Q8	Fits comfortably	19.54 tok/sEstimated	20GB (have 128GB)
openai/gpt-oss-20b	FP16	Fits comfortably	10.12 tok/sEstimated	41GB (have 128GB)
google/gemma-3-1b-it	Q4	Fits comfortably	69.09 tok/sEstimated	1GB (have 128GB)
Qwen/Qwen3-Embedding-0.6B	FP16	Fits comfortably	20.58 tok/sEstimated	13GB (have 128GB)
Qwen/Qwen2.5-1.5B-Instruct	Q4	Fits comfortably	49.74 tok/sEstimated	3GB (have 128GB)
facebook/opt-125m	FP16	Fits comfortably	19.20 tok/sEstimated	15GB (have 128GB)
TinyLlama/TinyLlama-1.1B-Chat-v1.0	Q4	Fits comfortably	61.15 tok/sEstimated	1GB (have 128GB)
TinyLlama/TinyLlama-1.1B-Chat-v1.0	Q8	Fits comfortably	41.31 tok/sEstimated	1GB (have 128GB)
TinyLlama/TinyLlama-1.1B-Chat-v1.0	FP16	Fits comfortably	23.39 tok/sEstimated	2GB (have 128GB)
Qwen/Qwen3-4B-Instruct-2507	Q4	Fits comfortably	55.54 tok/sEstimated	2GB (have 128GB)
Qwen/Qwen3-8B	Q4	Fits comfortably	53.85 tok/sEstimated	4GB (have 128GB)
Qwen/Qwen3-8B	Q8	Fits comfortably	34.13 tok/sEstimated	9GB (have 128GB)
Qwen/Qwen3-8B	FP16	Fits comfortably	19.48 tok/sEstimated	17GB (have 128GB)
RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic	Q4	Fits comfortably	18.36 tok/sEstimated	34GB (have 128GB)
RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic	Q8	Fits comfortably	13.90 tok/sEstimated	68GB (have 128GB)
meta-llama/Llama-3.1-70B-Instruct	FP16	Not supported	7.50 tok/sEstimated	137GB (have 128GB)
microsoft/phi-2	Q4	Fits comfortably	54.01 tok/sEstimated	4GB (have 128GB)
microsoft/phi-2	Q8	Fits comfortably	35.94 tok/sEstimated	7GB (have 128GB)
openai-community/gpt2	Q4	Fits comfortably	49.39 tok/sEstimated	4GB (have 128GB)

dphn/dolphin-2.9.1-yi-1.5-34bQ8

Fits comfortably35GB required · 128GB available

12.60 tok/sEstimated

Qwen/Qwen2.5-7BFP16

Fits comfortably15GB required · 128GB available

18.80 tok/sEstimated

Qwen/Qwen2.5-0.5B-InstructQ4

Fits comfortably3GB required · 128GB available

57.95 tok/sEstimated

Qwen/Qwen2.5-0.5B-InstructFP16

Fits comfortably11GB required · 128GB available

20.40 tok/sEstimated

Qwen/Qwen3-30B-A3B-Instruct-2507Q4

Fits comfortably15GB required · 128GB available

30.16 tok/sEstimated

deepseek-ai/deepseek-coder-1.3b-instructQ8

Fits comfortably3GB required · 128GB available

43.28 tok/sEstimated

Qwen/Qwen3-30B-A3BQ4

Fits comfortably15GB required · 128GB available

27.71 tok/sEstimated

Qwen/Qwen3-30B-A3BQ8

Fits comfortably31GB required · 128GB available

21.92 tok/sEstimated

Qwen/Qwen2.5-Coder-7B-InstructQ8

Fits comfortably7GB required · 128GB available

38.24 tok/sEstimated

lmsys/vicuna-7b-v1.5Q4

Fits comfortably4GB required · 128GB available

53.76 tok/sEstimated

Qwen/Qwen2.5-3BFP16

Fits comfortably6GB required · 128GB available

25.07 tok/sEstimated

nvidia/NVIDIA-Nemotron-Nano-9B-v2Q4

Fits comfortably5GB required · 128GB available

43.49 tok/sEstimated

nvidia/NVIDIA-Nemotron-Nano-9B-v2Q8

Fits comfortably10GB required · 128GB available

27.04 tok/sEstimated

dicta-il/dictalm2.0-instructFP16

Fits comfortably15GB required · 128GB available

21.39 tok/sEstimated

BSC-LT/salamandraTA-7b-instructQ4

Fits comfortably4GB required · 128GB available

55.16 tok/sEstimated

BSC-LT/salamandraTA-7b-instructQ8

Fits comfortably7GB required · 128GB available

35.89 tok/sEstimated

microsoft/Phi-3-medium-128k-instructFP16

Fits comfortably29GB required · 128GB available

15.52 tok/sEstimated

deepseek-ai/DeepSeek-OCRQ8

Fits comfortably4GB required · 128GB available

46.07 tok/sEstimated

deepseek-ai/DeepSeek-V2.5Q4

Not supported328GB required · 128GB available

17.39 tok/sEstimated

deepseek-ai/DeepSeek-V2.5Q8

Not supported656GB required · 128GB available

14.19 tok/sEstimated

deepseek-ai/DeepSeek-V2.5FP16

Not supported1312GB required · 128GB available

7.84 tok/sEstimated

RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamicFP16

Not supported176GB required · 128GB available

4.30 tok/sEstimated

mistralai/Mistral-Large-3-675B-Instruct-2512Q4

Not supported378GB required · 128GB available

6.80 tok/sEstimated

mistralai/Mistral-Large-3-675B-Instruct-2512Q8

Not supported755GB required · 128GB available

4.64 tok/sEstimated

mistralai/Mistral-Large-3-675B-Instruct-2512FP16

Not supported1509GB required · 128GB available

2.42 tok/sEstimated

EssentialAI/rnj-1Q4

Fits comfortably5GB required · 128GB available

39.40 tok/sEstimated

EssentialAI/rnj-1Q8

Fits comfortably10GB required · 128GB available

29.76 tok/sEstimated

EssentialAI/rnj-1FP16

Fits comfortably19GB required · 128GB available

15.13 tok/sEstimated

dphn/dolphin-2.9.1-yi-1.5-34bQ4

Fits comfortably17GB required · 128GB available

18.00 tok/sEstimated

dphn/dolphin-2.9.1-yi-1.5-34bFP16

Fits comfortably70GB required · 128GB available

6.56 tok/sEstimated

openai/gpt-oss-20bQ4

Fits comfortably10GB required · 128GB available

31.51 tok/sEstimated

openai/gpt-oss-20bQ8

Fits comfortably20GB required · 128GB available

19.54 tok/sEstimated

openai/gpt-oss-20bFP16

Fits comfortably41GB required · 128GB available

10.12 tok/sEstimated

google/gemma-3-1b-itQ4

Fits comfortably1GB required · 128GB available

69.09 tok/sEstimated

Qwen/Qwen3-Embedding-0.6BFP16

Fits comfortably13GB required · 128GB available

20.58 tok/sEstimated

Qwen/Qwen2.5-1.5B-InstructQ4

Fits comfortably3GB required · 128GB available

49.74 tok/sEstimated

facebook/opt-125mFP16

Fits comfortably15GB required · 128GB available

19.20 tok/sEstimated

TinyLlama/TinyLlama-1.1B-Chat-v1.0Q4

Fits comfortably1GB required · 128GB available

61.15 tok/sEstimated

TinyLlama/TinyLlama-1.1B-Chat-v1.0Q8

Fits comfortably1GB required · 128GB available

41.31 tok/sEstimated

TinyLlama/TinyLlama-1.1B-Chat-v1.0FP16

Fits comfortably2GB required · 128GB available

23.39 tok/sEstimated

Qwen/Qwen3-4B-Instruct-2507Q4

Fits comfortably2GB required · 128GB available

55.54 tok/sEstimated

Qwen/Qwen3-8BQ4

Fits comfortably4GB required · 128GB available

53.85 tok/sEstimated

Qwen/Qwen3-8BQ8

Fits comfortably9GB required · 128GB available

34.13 tok/sEstimated

Qwen/Qwen3-8BFP16

Fits comfortably17GB required · 128GB available

19.48 tok/sEstimated

RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamicQ4

Fits comfortably34GB required · 128GB available

18.36 tok/sEstimated

RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamicQ8

Fits comfortably68GB required · 128GB available

13.90 tok/sEstimated

meta-llama/Llama-3.1-70B-InstructFP16

Not supported137GB required · 128GB available

7.50 tok/sEstimated

microsoft/phi-2Q4

Fits comfortably4GB required · 128GB available

54.01 tok/sEstimated

microsoft/phi-2Q8

Fits comfortably7GB required · 128GB available

35.94 tok/sEstimated

openai-community/gpt2Q4

Fits comfortably4GB required · 128GB available

49.39 tok/sEstimated

Note: Performance estimates are calculated. Real results may vary. Methodology · Submit real data

Alternative GPUs

Apple M2 Ultra

192GB

Explore how Apple M2 Ultra stacks up for local inference workloads.

RTX 4090

24GB

Explore how RTX 4090 stacks up for local inference workloads.

RTX 4080

16GB

Explore how RTX 4080 stacks up for local inference workloads.

RX 7900 XTX

24GB

Explore how RX 7900 XTX stacks up for local inference workloads.

NVIDIA RTX 6000 Ada

48GB

Explore how NVIDIA RTX 6000 Ada stacks up for local inference workloads.

Compare Apple M3 Max

Apple M3 Max vs Apple M2 Ultra

Side-by-side VRAM, throughput, efficiency, and pricing benchmarks for both GPUs.

Apple M3 Max vs RTX 4090

Side-by-side VRAM, throughput, efficiency, and pricing benchmarks for both GPUs.

Apple M3 Max vs NVIDIA RTX 6000 Ada

Side-by-side VRAM, throughput, efficiency, and pricing benchmarks for both GPUs.