Quick Answer: The Apple M2 Ultra offers up to 192GB of unified memory, all of it addressable as VRAM, and ships in systems starting around $3,999. It delivers approximately 142 tokens/sec on deepseek-ai/DeepSeek-OCR at Q4 (estimated), and typically draws around 60W under sustained load.
The large unified memory pool makes this chip a strong fit for local AI workloads: even 70B-class models at Q4 fit comfortably in memory. Pair it with the right model quantization to hit your target tokens/sec; a minimal throughput-measurement sketch follows.
The M2 Ultra is not sold as a standalone part; it ships in the Mac Studio and Mac Pro, so compare prices on complete systems.
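To sanity-check figures like the ones below on your own machine, a runner with a Metal backend such as llama.cpp keeps the whole model in unified memory. Here is a minimal sketch using the llama-cpp-python bindings; the GGUF path and prompt are hypothetical placeholders, not part of the benchmark setup behind this table.

```python
# Minimal sketch: measure decode throughput on Apple Silicon with llama-cpp-python.
# Assumes `pip install llama-cpp-python` built with Metal support; the model path
# below is a hypothetical placeholder -- point it at any Q4 GGUF you have locally.
import time

from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-3.2-3b-instruct.Q4_K_M.gguf",  # hypothetical path
    n_gpu_layers=-1,  # offload all layers to the Metal GPU
    n_ctx=2048,
    verbose=False,
)

start = time.perf_counter()
out = llm("Explain unified memory in one paragraph.", max_tokens=128)
elapsed = time.perf_counter() - start

n_tokens = out["usage"]["completion_tokens"]
print(f"{n_tokens} tokens in {elapsed:.2f}s -> {n_tokens / elapsed:.1f} tok/s")
```

Generation speed varies with prompt length, context size, and sampling settings, so expect your measured tok/s to drift around the estimates in the table.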
All throughput figures below are auto-generated estimates derived from model size and quantization, not measured benchmarks.

| Model | Quantization | Tokens/sec (estimated) | VRAM used |
|---|---|---|---|
| deepseek-ai/DeepSeek-OCR | Q4 | 142.31 | 2GB |
| bigcode/starcoder2-3b | Q4 | 140.13 | 2GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 140.07 | 2GB |
| google/embeddinggemma-300m | Q4 | 139.84 | 1GB |
| google/gemma-2-2b-it | Q4 | 139.33 | 1GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | 139.27 | 2GB |
| apple/OpenELM-1_1B-Instruct | Q4 | 137.85 | 1GB |
| google-t5/t5-3b | Q4 | 137.07 | 2GB |
| LiquidAI/LFM2-1.2B | Q4 | 136.28 | 1GB |
| meta-llama/Llama-3.2-1B | Q4 | 135.85 | 1GB |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | 133.61 | 1GB |
| google/gemma-3-1b-it | Q4 | 132.98 | 1GB |
| google/gemma-2b | Q4 | 132.92 | 1GB |
| ibm-granite/granite-3.3-2b-instruct | Q4 | 131.49 | 1GB |
| tencent/HunyuanOCR | Q4 | 129.98 | 1GB |
| nari-labs/Dia2-2B | Q4 | 126.85 | 2GB |
| unsloth/Llama-3.2-1B-Instruct | Q4 | 126.73 | 1GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | 126.36 | 2GB |
| unsloth/gemma-3-1b-it | Q4 | 125.77 | 1GB |
| inference-net/Schematron-3B | Q4 | 125.20 | 2GB |
| Qwen/Qwen2.5-3B | Q4 | 124.55 | 2GB |
| ibm-research/PowerMoE-3b | Q4 | 123.10 | 2GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | 121.32 | 1GB |
| Qwen/Qwen2.5-3B-Instruct | Q4 | 120.03 | 2GB |
| meta-llama/Llama-Guard-3-1B | Q4 | 119.96 | 1GB |
| WeiboAI/VibeThinker-1.5B | Q4 | 119.78 | 1GB |
| facebook/sam3 | Q4 | 118.68 | 1GB |
| meta-llama/Llama-3.2-3B | Q4 | 118.40 | 2GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q4 | 118.01 | 4GB |
| Qwen/Qwen3-Embedding-4B | Q4 | 117.84 | 2GB |
| liuhaotian/llava-v1.5-7b | Q4 | 117.80 | 4GB |
| unsloth/Llama-3.2-3B-Instruct | Q4 | 117.74 | 2GB |
| HuggingFaceH4/zephyr-7b-beta | Q4 | 117.71 | 4GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q4 | 117.36 | 4GB |
| google-bert/bert-base-uncased | Q4 | 117.34 | 1GB |
| microsoft/DialoGPT-medium | Q4 | 117.29 | 4GB |
| microsoft/phi-2 | Q4 | 117.18 | 4GB |
| rednote-hilab/dots.ocr | Q4 | 116.65 | 4GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 116.62 | 2GB |
| allenai/OLMo-2-0425-1B | Q4 | 116.58 | 1GB |
| openai-community/gpt2-large | Q4 | 116.51 | 4GB |
| Qwen/Qwen3-0.6B | Q4 | 116.49 | 3GB |
| meta-llama/Llama-2-7b-chat-hf | Q4 | 116.45 | 4GB |
| Qwen/Qwen3-Reranker-0.6B | Q4 | 116.42 | 3GB |
| ibm-granite/granite-docling-258M | Q4 | 116.20 | 4GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | 115.93 | 4GB |
| Qwen/Qwen2.5-0.5B | Q4 | 115.82 | 3GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q4 | 115.81 | 4GB |
| sshleifer/tiny-gpt2 | Q4 | 115.41 | 4GB |
| ibm-granite/granite-3.3-8b-instruct | Q4 | 115.41 | 4GB |
| dicta-il/dictalm2.0-instruct | Q4 | 114.65 | 4GB |
| distilbert/distilgpt2 | Q4 | 114.58 | 4GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | 114.22 | 2GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | 114.18 | 4GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q4 | 114.10 | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q4 | 113.59 | 3GB |
| llamafactory/tiny-random-Llama-3 | Q4 | 113.06 | 4GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | 113.05 | 3GB |
| microsoft/Phi-4-mini-instruct | Q4 | 113.01 | 4GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q4 | 112.93 | 4GB |
| parler-tts/parler-tts-large-v1 | Q4 | 112.56 | 4GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | 112.41 | 4GB |
| skt/kogpt2-base-v2 | Q4 | 111.88 | 4GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 111.82 | 4GB |
| tencent/HunyuanVideo-1.5 | Q4 | 111.49 | 4GB |
| deepseek-ai/DeepSeek-R1 | Q4 | 111.36 | 4GB |
| Qwen/Qwen3-1.7B | Q4 | 111.29 | 4GB |
| bigscience/bloomz-560m | Q4 | 111.21 | 4GB |
| HuggingFaceTB/SmolLM2-135M | Q4 | 111.08 | 4GB |
| Qwen/Qwen-Image-Edit-2509 | Q4 | 111.05 | 4GB |
| google/gemma-3-270m-it | Q4 | 110.99 | 4GB |
| vikhyatk/moondream2 | Q4 | 110.95 | 4GB |
| Qwen/Qwen2-1.5B-Instruct | Q4 | 110.62 | 3GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | 110.58 | 4GB |
| Qwen/Qwen2.5-Math-1.5B | Q4 | 110.47 | 3GB |
| Qwen/Qwen3-4B | Q4 | 110.32 | 2GB |
| microsoft/Phi-3-mini-4k-instruct | Q4 | 110.27 | 4GB |
| GSAI-ML/LLaDA-8B-Base | Q4 | 110.03 | 4GB |
| Qwen/Qwen2-0.5B-Instruct | Q4 | 110.01 | 3GB |
| BSC-LT/salamandraTA-7b-instruct | Q4 | 109.81 | 4GB |
| black-forest-labs/FLUX.2-dev | Q4 | 109.65 | 4GB |
| openai-community/gpt2-xl | Q4 | 109.46 | 4GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q4 | 109.40 | 2GB |
| Qwen/Qwen3-8B-Base | Q4 | 108.74 | 4GB |
| zai-org/GLM-4.6-FP8 | Q4 | 108.61 | 4GB |
| deepseek-ai/DeepSeek-V3.1 | Q4 | 108.54 | 4GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q4 | 108.23 | 3GB |
| Qwen/Qwen3-Embedding-8B | Q4 | 107.88 | 4GB |
| Qwen/Qwen3-8B-FP8 | Q4 | 107.83 | 4GB |
| deepseek-ai/DeepSeek-R1-0528 | Q4 | 107.54 | 4GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q4 | 106.71 | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | 106.56 | 4GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q4 | 106.53 | 4GB |
| Tongyi-MAI/Z-Image-Turbo | Q4 | 106.49 | 4GB |
| microsoft/DialoGPT-small | Q4 | 106.37 | 4GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q4 | 106.00 | 3GB |
| Qwen/Qwen3-4B-Base | Q4 | 105.92 | 2GB |
| deepseek-ai/DeepSeek-V3 | Q4 | 105.88 | 4GB |
| Qwen/Qwen2.5-7B | Q4 | 105.80 | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q4 | 105.69 | 2GB |
| lmsys/vicuna-7b-v1.5 | Q4 | 105.68 | 4GB |
| MiniMaxAI/MiniMax-M2 | Q4 | 105.61 | 4GB |
| IlyaGusev/saiga_llama3_8b | Q4 | 104.89 | 4GB |
| Qwen/Qwen3-1.7B-Base | Q4 | 104.84 | 4GB |
| Qwen/Qwen2-7B-Instruct | Q4 | 104.82 | 4GB |
| allenai/Olmo-3-7B-Think | Q4 | 104.52 | 4GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q4 | 104.45 | 4GB |
| rinna/japanese-gpt-neox-small | Q4 | 104.19 | 4GB |
| microsoft/Phi-3-mini-128k-instruct | Q4 | 103.63 | 4GB |
| black-forest-labs/FLUX.1-dev | Q4 | 103.25 | 4GB |
| zai-org/GLM-4.5-Air | Q4 | 103.06 | 4GB |
| Qwen/Qwen3-Embedding-0.6B | Q4 | 102.79 | 3GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | 102.75 | 2GB |
| Qwen/Qwen2.5-1.5B | Q4 | 102.57 | 3GB |
| meta-llama/Llama-2-7b-hf | Q4 | 102.45 | 4GB |
| petals-team/StableBeluga2 | Q4 | 102.28 | 4GB |
| meta-llama/Llama-3.1-8B | Q4 | 102.05 | 4GB |
| EleutherAI/gpt-neo-125m | Q4 | 102.00 | 4GB |
| meta-llama/Llama-Guard-3-8B | Q4 | 101.69 | 4GB |
| hmellor/tiny-random-LlamaForCausalLM | Q4 | 101.65 | 4GB |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | 101.65 | 3GB |
| facebook/opt-125m | Q4 | 101.61 | 4GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q4 | 101.54 | 4GB |
| microsoft/VibeVoice-1.5B | Q4 | 101.51 | 3GB |
| Qwen/Qwen3-8B | Q4 | 101.30 | 4GB |
| meta-llama/Meta-Llama-3-8B | Q4 | 101.23 | 4GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q4 | 101.18 | 2GB |
| huggyllama/llama-7b | Q4 | 101.08 | 4GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q4 | 100.56 | 2GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q4 | 100.40 | 2GB |
| Qwen/Qwen3-0.6B-Base | Q4 | 100.34 | 3GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q4 | 100.15 | 4GB |
| HuggingFaceTB/SmolLM-135M | Q4 | 100.10 | 4GB |
| deepseek-ai/DeepSeek-V3-0324 | Q4 | 99.99 | 4GB |
| microsoft/Phi-3.5-vision-instruct | Q4 | 99.79 | 4GB |
| openai-community/gpt2 | Q4 | 99.74 | 4GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q4 | 99.63 | 4GB |
| EleutherAI/pythia-70m-deduped | Q4 | 99.57 | 4GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q4 | 99.49 | 4GB |
| microsoft/Phi-4-multimodal-instruct | Q4 | 99.47 | 4GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q4 | 99.45 | 4GB |
| mistralai/Mistral-7B-v0.1 | Q4 | 99.30 | 4GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q4 | 99.13 | 4GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | 99.02 | 3GB |
| Qwen/Qwen2-0.5B | Q4 | 98.95 | 3GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q4 | 98.87 | 2GB |
| meta-llama/Llama-3.2-1B | Q8 | 98.54 | 1GB |
| openai-community/gpt2-medium | Q4 | 98.13 | 4GB |
| microsoft/phi-4 | Q4 | 98.07 | 4GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q4 | 97.88 | 4GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | 97.73 | 3GB |
| GSAI-ML/LLaDA-8B-Instruct | Q4 | 97.61 | 4GB |
| facebook/sam3 | Q8 | 97.18 | 1GB |
| numind/NuExtract-1.5 | Q4 | 96.76 | 4GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 96.75 | 4GB |
| allenai/OLMo-2-0425-1B | Q8 | 96.50 | 1GB |
| google/gemma-2b | Q8 | 94.76 | 2GB |
| unsloth/Llama-3.2-3B-Instruct | Q8 | 94.70 | 3GB |
| google/embeddinggemma-300m | Q8 | 94.26 | 1GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | 93.88 | 1GB |
| google/gemma-2-2b-it | Q8 | 93.87 | 2GB |
| google-bert/bert-base-uncased | Q8 | 93.31 | 1GB |
| unsloth/gemma-3-1b-it | Q8 | 92.26 | 1GB |
| bigcode/starcoder2-3b | Q8 | 91.74 | 3GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | 91.59 | 3GB |
| unsloth/Llama-3.2-1B-Instruct | Q8 | 91.53 | 1GB |
| nari-labs/Dia2-2B | Q8 | 91.22 | 3GB |
| ibm-granite/granite-3.3-2b-instruct | Q8 | 91.01 | 2GB |
| ibm-research/PowerMoE-3b | Q8 | 90.75 | 3GB |
| tencent/HunyuanOCR | Q8 | 90.68 | 2GB |
| meta-llama/Llama-3.2-1B-Instruct | Q8 | 90.55 | 1GB |
| meta-llama/Llama-3.2-3B | Q8 | 88.55 | 3GB |
| google/gemma-3-1b-it | Q8 | 88.53 | 1GB |
| microsoft/Phi-3-medium-128k-instruct | Q4 | 88.50 | 7GB |
| Qwen/Qwen3-14B-Base | Q4 | 87.85 | 7GB |
| mistralai/Ministral-3-14B-Instruct-2512 | Q4 | 87.07 | 8GB |
| EssentialAI/rnj-1 | Q4 | 86.68 | 5GB |
| google-t5/t5-3b | Q8 | 86.26 | 3GB |
| google/gemma-2-9b-it | Q4 | 86.20 | 5GB |
| deepseek-ai/DeepSeek-OCR | Q8 | 86.12 | 4GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q4 | 86.05 | 5GB |
| Qwen/Qwen2.5-3B | Q8 | 85.92 | 3GB |
| meta-llama/Llama-Guard-3-1B | Q8 | 85.64 | 1GB |
| Qwen/Qwen2.5-14B | Q4 | 84.20 | 7GB |
| apple/OpenELM-1_1B-Instruct | Q8 | 83.87 | 1GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q8 | 82.67 | 3GB |
| Qwen/Qwen3-Embedding-8B | Q8 | 82.60 | 9GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q8 | 82.56 | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q8 | 82.41 | 5GB |
| facebook/opt-125m | Q8 | 82.37 | 7GB |
| LiquidAI/LFM2-1.2B | Q8 | 82.25 | 2GB |
| ibm-granite/granite-3.3-8b-instruct | Q8 | 82.23 | 9GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | 82.18 | 9GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q8 | 82.13 | 9GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q8 | 82.12 | 7GB |
| microsoft/Phi-4-mini-instruct | Q8 | 81.95 | 7GB |
| OpenPipe/Qwen3-14B-Instruct | Q4 | 81.81 | 7GB |
| inference-net/Schematron-3B | Q8 | 81.75 | 3GB |
| Qwen/Qwen2.5-3B-Instruct | Q8 | 81.50 | 3GB |
| WeiboAI/VibeThinker-1.5B | Q8 | 81.33 | 2GB |
| microsoft/DialoGPT-small | Q8 | 81.16 | 7GB |
| Qwen/Qwen3-14B | Q4 | 81.11 | 7GB |
| Qwen/Qwen2-0.5B-Instruct | Q8 | 81.11 | 5GB |
| meta-llama/Llama-Guard-3-8B | Q8 | 81.01 | 9GB |
| distilbert/distilgpt2 | Q8 | 80.83 | 7GB |
| ibm-granite/granite-docling-258M | Q8 | 80.62 | 7GB |
| meta-llama/Llama-2-7b-hf | Q8 | 80.55 | 7GB |
| meta-llama/Llama-2-13b-chat-hf | Q4 | 80.54 | 7GB |
| Qwen/Qwen2.5-Coder-1.5B | Q8 | 80.51 | 5GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q8 | 80.51 | 9GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q8 | 80.44 | 4GB |
| deepseek-ai/DeepSeek-R1 | Q8 | 80.36 | 7GB |
| google/gemma-3-270m-it | Q8 | 80.21 | 7GB |
| deepseek-ai/DeepSeek-V3.1 | Q8 | 80.19 | 7GB |
| Qwen/Qwen2.5-7B | Q8 | 79.79 | 7GB |
| microsoft/Phi-3.5-vision-instruct | Q8 | 79.72 | 7GB |
| Qwen/Qwen2.5-1.5B | Q8 | 79.35 | 5GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q8 | 79.17 | 7GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | 79.16 | 7GB |
| HuggingFaceH4/zephyr-7b-beta | Q8 | 78.86 | 7GB |
| microsoft/VibeVoice-1.5B | Q8 | 78.79 | 5GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | 78.73 | 4GB |
| microsoft/phi-4 | Q8 | 78.63 | 7GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | 78.42 | 7GB |
| black-forest-labs/FLUX.1-dev | Q8 | 78.39 | 8GB |
| MiniMaxAI/MiniMax-M2 | Q8 | 78.30 | 7GB |
| Qwen/Qwen2.5-0.5B | Q8 | 78.15 | 5GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | 78.04 | 9GB |
| Qwen/Qwen2-1.5B-Instruct | Q8 | 78.01 | 5GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q8 | 77.88 | 4GB |
| deepseek-ai/DeepSeek-V3 | Q8 | 77.75 | 7GB |
| Tongyi-MAI/Z-Image-Turbo | Q8 | 77.71 | 8GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q8 | 77.49 | 7GB |
| Qwen/Qwen3-4B | Q8 | 77.45 | 4GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | 77.29 | 4GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | 77.20 | 9GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | 77.14 | 5GB |
| numind/NuExtract-1.5 | Q8 | 77.14 | 7GB |
| deepseek-ai/DeepSeek-R1-0528 | Q8 | 77.13 | 7GB |
| microsoft/Phi-3-mini-128k-instruct | Q8 | 77.11 | 7GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q8 | 77.07 | 9GB |
| dicta-il/dictalm2.0-instruct | Q8 | 77.03 | 7GB |
| openai-community/gpt2-medium | Q8 | 76.80 | 7GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | 76.80 | 8GB |
| deepseek-ai/DeepSeek-V3-0324 | Q8 | 76.60 | 7GB |
| petals-team/StableBeluga2 | Q8 | 76.52 | 7GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q8 | 76.34 | 4GB |
| Qwen/Qwen2-7B-Instruct | Q8 | 76.24 | 7GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q8 | 76.11 | 5GB |
| Qwen/Qwen3-1.7B-Base | Q8 | 76.04 | 7GB |
| Qwen/Qwen3-8B-FP8 | Q8 | 76.00 | 9GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q8 | 75.92 | 5GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | 75.68 | 3GB |
| lmsys/vicuna-7b-v1.5 | Q8 | 75.54 | 7GB |
| huggyllama/llama-7b | Q8 | 75.14 | 7GB |
| microsoft/Phi-3-mini-4k-instruct | Q8 | 74.96 | 7GB |
| black-forest-labs/FLUX.2-dev | Q8 | 74.91 | 8GB |
| microsoft/phi-2 | Q8 | 74.89 | 7GB |
| Qwen/Qwen3-8B-Base | Q8 | 74.74 | 9GB |
| openai-community/gpt2-xl | Q8 | 74.58 | 7GB |
| tencent/HunyuanVideo-1.5 | Q8 | 74.46 | 8GB |
| Qwen/Qwen3-0.6B-Base | Q8 | 74.38 | 6GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | 74.33 | 7GB |
| meta-llama/Llama-3.1-8B | Q8 | 74.30 | 9GB |
| zai-org/GLM-4.6-FP8 | Q8 | 74.13 | 7GB |
| GSAI-ML/LLaDA-8B-Instruct | Q8 | 74.13 | 9GB |
| microsoft/DialoGPT-medium | Q8 | 74.10 | 7GB |
| meta-llama/Llama-2-7b-chat-hf | Q8 | 74.09 | 7GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | 73.71 | 8GB |
| Qwen/Qwen3-Reranker-0.6B | Q8 | 73.62 | 6GB |
| meta-llama/Meta-Llama-3-8B | Q8 | 73.60 | 9GB |
| IlyaGusev/saiga_llama3_8b | Q8 | 73.52 | 9GB |
| sshleifer/tiny-gpt2 | Q8 | 73.51 | 7GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q8 | 73.32 | 4GB |
| HuggingFaceTB/SmolLM2-135M | Q8 | 73.31 | 7GB |
| EleutherAI/gpt-neo-125m | Q8 | 73.25 | 7GB |
| parler-tts/parler-tts-large-v1 | Q8 | 73.24 | 7GB |
| Qwen/Qwen3-1.7B | Q8 | 73.14 | 7GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q8 | 73.14 | 7GB |
| allenai/Olmo-3-7B-Think | Q8 | 73.05 | 8GB |
| mistralai/Mistral-7B-v0.1 | Q8 | 73.05 | 7GB |
| hmellor/tiny-random-LlamaForCausalLM | Q8 | 72.81 | 7GB |
| bigscience/bloomz-560m | Q8 | 72.79 | 7GB |
| NousResearch/Hermes-3-Llama-3.1-8B | Q4 | 72.78 | 4GB |
| Qwen/Qwen3-Embedding-0.6B | Q8 | 72.77 | 6GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | 72.73 | 7GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q8 | 72.73 | 7GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | 72.71 | 4GB |
| ai-forever/ruGPT-3.5-13B | Q4 | 72.68 | 7GB |
| llamafactory/tiny-random-Llama-3 | Q8 | 72.49 | 7GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q8 | 72.46 | 7GB |
| Qwen/Qwen2.5-Math-1.5B | Q8 | 72.36 | 5GB |
| Qwen/Qwen-Image-Edit-2509 | Q8 | 72.12 | 8GB |
| Qwen/Qwen3-0.6B | Q8 | 72.12 | 6GB |
| EleutherAI/pythia-70m-deduped | Q8 | 71.62 | 7GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q8 | 71.55 | 7GB |
| GSAI-ML/LLaDA-8B-Base | Q8 | 71.48 | 9GB |
| Qwen/Qwen3-4B-Base | Q8 | 71.27 | 4GB |
| rinna/japanese-gpt-neox-small | Q8 | 70.99 | 7GB |
| liuhaotian/llava-v1.5-7b | Q8 | 70.92 | 7GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | 70.78 | 7GB |
| openai-community/gpt2 | Q8 | 70.74 | 7GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q8 | 70.69 | 4GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q8 | 70.48 | 4GB |
| microsoft/Phi-4-multimodal-instruct | Q8 | 70.43 | 7GB |
| openai-community/gpt2-large | Q8 | 70.36 | 7GB |
| BSC-LT/salamandraTA-7b-instruct | Q8 | 69.97 | 7GB |
| Qwen/Qwen3-Embedding-4B | Q8 | 69.89 | 4GB |
| HuggingFaceTB/SmolLM-135M | Q8 | 69.13 | 7GB |
| rednote-hilab/dots.ocr | Q8 | 69.09 | 7GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q8 | 69.03 | 9GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q8 | 68.77 | 5GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q8 | 68.75 | 9GB |
| Qwen/Qwen3-8B | Q8 | 68.51 | 9GB |
| skt/kogpt2-base-v2 | Q8 | 68.44 | 7GB |
| Qwen/Qwen2-0.5B | Q8 | 68.08 | 5GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q8 | 68.01 | 9GB |
| zai-org/GLM-4.5-Air | Q8 | 67.76 | 7GB |
| vikhyatk/moondream2 | Q8 | 67.70 | 7GB |
| openai/gpt-oss-safeguard-20b | Q4 | 64.93 | 11GB |
| Qwen/Qwen3-30B-A3B | Q4 | 64.10 | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q4 | 64.07 | 15GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q4 | 62.65 | 10GB |
| openai/gpt-oss-20b | Q4 | 62.45 | 10GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q4 | 62.44 | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q4 | 62.06 | 15GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q4 | 61.70 | 15GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | 61.27 | 9GB |
| google/gemma-2-27b-it | Q4 | 61.01 | 14GB |
| microsoft/Phi-3-medium-128k-instruct | Q8 | 60.45 | 14GB |
| meta-llama/Llama-2-13b-chat-hf | Q8 | 60.31 | 13GB |
| unsloth/gpt-oss-20b-BF16 | Q4 | 60.26 | 10GB |
| Qwen/Qwen2.5-14B | Q8 | 60.24 | 14GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q4 | 59.84 | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q4 | 59.54 | 15GB |
| mistralai/Mistral-Small-Instruct-2409 | Q4 | 58.36 | 11GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q4 | 57.80 | 15GB |
| OpenPipe/Qwen3-14B-Instruct | Q8 | 56.85 | 14GB |
| google/gemma-2-9b-it | Q8 | 56.33 | 10GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q4 | 54.08 | 10GB |
| NousResearch/Hermes-3-Llama-3.1-8B | Q8 | 54.02 | 9GB |
| ibm-research/PowerMoE-3b | FP16 | 53.83 | 6GB |
| bigcode/starcoder2-3b | FP16 | 53.79 | 6GB |
| Qwen/Qwen3-14B-Base | Q8 | 53.74 | 14GB |
| Qwen/Qwen2.5-3B | FP16 | 53.63 | 6GB |
| mistralai/Ministral-3-14B-Instruct-2512 | Q8 | 53.56 | 16GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | FP16 | 53.49 | 2GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q4 | 53.38 | 15GB |
| ai-forever/ruGPT-3.5-13B | Q8 | 53.34 | 13GB |
| Qwen/Qwen2.5-14B-Instruct | Q8 | 53.12 | 15GB |
| Qwen/Qwen2.5-3B-Instruct | FP16 | 53.07 | 6GB |
| facebook/sam3 | FP16 | 52.90 | 2GB |
| google/gemma-3-1b-it | FP16 | 52.80 | 2GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q8 | 52.69 | 10GB |
| Qwen/Qwen3-14B | Q8 | 52.13 | 14GB |
| EssentialAI/rnj-1 | Q8 | 51.89 | 10GB |
| LiquidAI/LFM2-1.2B | FP16 | 51.87 | 4GB |
| deepseek-ai/DeepSeek-OCR | FP16 | 51.73 | 7GB |
| meta-llama/Llama-3.2-3B | FP16 | 51.41 | 6GB |
| Qwen/Qwen2.5-14B-Instruct | Q8 | 51.19 | 14GB |
| nari-labs/Dia2-2B | FP16 | 51.16 | 5GB |
| meta-llama/Llama-3.2-3B-Instruct | FP16 | 51.07 | 6GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | FP16 | 51.02 | 6GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | FP16 | 50.43 | 6GB |
| apple/OpenELM-1_1B-Instruct | FP16 | 50.30 | 2GB |
| meta-llama/Llama-3.2-1B | FP16 | 49.41 | 2GB |
| meta-llama/Llama-3.2-1B-Instruct | FP16 | 49.19 | 2GB |
| unsloth/Llama-3.2-3B-Instruct | FP16 | 49.18 | 6GB |
| inference-net/Schematron-3B | FP16 | 48.84 | 6GB |
| google-t5/t5-3b | FP16 | 48.67 | 6GB |
| WeiboAI/VibeThinker-1.5B | FP16 | 48.37 | 4GB |
| google/gemma-2b | FP16 | 48.19 | 4GB |
| google-bert/bert-base-uncased | FP16 | 47.53 | 1GB |
| google/gemma-2-2b-it | FP16 | 47.18 | 4GB |
| meta-llama/Llama-Guard-3-1B | FP16 | 46.75 | 2GB |
| unsloth/gemma-3-1b-it | FP16 | 46.48 | 2GB |
| allenai/OLMo-2-0425-1B | FP16 | 45.88 | 2GB |
| ibm-granite/granite-3.3-2b-instruct | FP16 | 45.50 | 4GB |
| google/embeddinggemma-300m | FP16 | 45.48 | 1GB |
| tencent/HunyuanOCR | FP16 | 45.47 | 3GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | FP16 | 44.83 | 9GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q8 | 44.82 | 31GB |
| mistralai/Mistral-7B-Instruct-v0.1 | FP16 | 44.75 | 15GB |
| unsloth/gpt-oss-20b-BF16 | Q8 | 44.71 | 20GB |
| HuggingFaceH4/zephyr-7b-beta | FP16 | 44.70 | 15GB |
| Qwen/Qwen3-0.6B-Base | FP16 | 44.69 | 13GB |
| meta-llama/Llama-3.1-8B-Instruct | FP16 | 44.69 | 17GB |
| allenai/Olmo-3-7B-Think | FP16 | 44.66 | 16GB |
| microsoft/VibeVoice-1.5B | FP16 | 44.65 | 11GB |
| skt/kogpt2-base-v2 | FP16 | 44.60 | 15GB |
| microsoft/Phi-3-mini-4k-instruct | FP16 | 44.54 | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q8 | 44.41 | 31GB |
| unsloth/Llama-3.2-1B-Instruct | FP16 | 44.22 | 2GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | FP16 | 44.19 | 15GB |
| meta-llama/Llama-3.1-8B | FP16 | 44.14 | 17GB |
| Qwen/Qwen3-Embedding-4B | FP16 | 44.09 | 9GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | FP16 | 44.01 | 15GB |
| zai-org/GLM-4.5-Air | FP16 | 44.00 | 15GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | FP16 | 43.87 | 9GB |
| Qwen/Qwen2-1.5B-Instruct | FP16 | 43.85 | 11GB |
| mistralai/Mistral-7B-v0.1 | FP16 | 43.66 | 15GB |
| lmsys/vicuna-7b-v1.5 | FP16 | 43.61 | 15GB |
| microsoft/phi-2 | FP16 | 43.58 | 15GB |
| meta-llama/Llama-3.2-3B-Instruct | FP16 | 43.51 | 7GB |
| deepseek-ai/DeepSeek-V3 | FP16 | 43.44 | 15GB |
| Qwen/Qwen3-Reranker-0.6B | FP16 | 43.42 | 13GB |
| Qwen/Qwen3-4B-Instruct-2507 | FP16 | 43.21 | 9GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q8 | 43.18 | 31GB |
| openai/gpt-oss-safeguard-20b | Q8 | 43.15 | 22GB |
| microsoft/Phi-3.5-mini-instruct | FP16 | 43.14 | 8GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q8 | 43.09 | 31GB |
| Qwen/Qwen2-0.5B-Instruct | FP16 | 42.96 | 11GB |
| Qwen/Qwen2.5-Math-1.5B | FP16 | 42.94 | 11GB |
| huggyllama/llama-7b | FP16 | 42.93 | 15GB |
| microsoft/DialoGPT-small | FP16 | 42.91 | 15GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | FP16 | 42.79 | 11GB |
| deepseek-ai/DeepSeek-R1-0528 | FP16 | 42.77 | 15GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q8 | 42.77 | 31GB |
| Qwen/Qwen2.5-7B-Instruct | FP16 | 42.75 | 15GB |
| Gensyn/Qwen2.5-0.5B-Instruct | FP16 | 42.70 | 11GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q8 | 42.69 | 20GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | FP16 | 42.68 | 11GB |
| Qwen/Qwen3-8B-FP8 | FP16 | 42.62 | 17GB |
| dicta-il/dictalm2.0-instruct | FP16 | 42.51 | 15GB |
| meta-llama/Meta-Llama-3-8B-Instruct | FP16 | 42.47 | 17GB |
| rinna/japanese-gpt-neox-small | FP16 | 42.45 | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q8 | 42.43 | 31GB |
| BSC-LT/salamandraTA-7b-instruct | FP16 | 42.39 | 15GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q8 | 42.38 | 20GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | FP16 | 42.36 | 17GB |
| Qwen/Qwen3-4B | FP16 | 42.36 | 9GB |
| microsoft/Phi-4-mini-instruct | FP16 | 42.27 | 15GB |
| GSAI-ML/LLaDA-8B-Base | FP16 | 42.23 | 17GB |
| Qwen/Qwen3-Embedding-8B | FP16 | 42.16 | 17GB |
| hmellor/tiny-random-LlamaForCausalLM | FP16 | 42.13 | 15GB |
| microsoft/DialoGPT-medium | FP16 | 42.05 | 15GB |
| HuggingFaceTB/SmolLM-135M | FP16 | 42.01 | 15GB |
| meta-llama/Llama-Guard-3-8B | FP16 | 41.98 | 17GB |
| llamafactory/tiny-random-Llama-3 | FP16 | 41.98 | 15GB |
| black-forest-labs/FLUX.1-dev | FP16 | 41.91 | 16GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | FP16 | 41.87 | 17GB |
| Qwen/Qwen2.5-1.5B | FP16 | 41.79 | 11GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | FP16 | 41.73 | 17GB |
| black-forest-labs/FLUX.2-dev | FP16 | 41.52 | 16GB |
| microsoft/Phi-3.5-mini-instruct | FP16 | 41.33 | 15GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | FP16 | 41.29 | 15GB |
| microsoft/Phi-4-multimodal-instruct | FP16 | 41.20 | 15GB |
| mistralai/Mistral-Small-Instruct-2409 | Q8 | 41.18 | 23GB |
| Qwen/Qwen2.5-0.5B-Instruct | FP16 | 41.16 | 11GB |
| Qwen/Qwen2.5-32B | Q4 | 41.16 | 16GB |
| Qwen/Qwen3-4B-Thinking-2507 | FP16 | 41.15 | 9GB |
| petals-team/StableBeluga2 | FP16 | 41.14 | 15GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q8 | 41.12 | 31GB |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q4 | 40.93 | 34GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | FP16 | 40.91 | 15GB |
| codellama/CodeLlama-34b-hf | Q4 | 40.89 | 17GB |
| Qwen/Qwen3-32B | Q4 | 40.70 | 16GB |
| meta-llama/Llama-2-7b-chat-hf | FP16 | 40.63 | 15GB |
| Qwen/Qwen2.5-7B-Instruct | FP16 | 40.45 | 16GB |
| ibm-granite/granite-3.3-8b-instruct | FP16 | 40.40 | 17GB |
| Qwen/Qwen2.5-32B-Instruct | Q4 | 40.39 | 17GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q4 | 40.21 | 16GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | FP16 | 40.16 | 17GB |
| parler-tts/parler-tts-large-v1 | FP16 | 40.15 | 15GB |
| deepseek-ai/DeepSeek-V3-0324 | FP16 | 40.14 | 15GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | FP16 | 40.10 | 9GB |
| deepseek-ai/DeepSeek-R1 | FP16 | 40.07 | 15GB |
| openai-community/gpt2-xl | FP16 | 40.06 | 15GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | FP16 | 40.02 | 15GB |
| Qwen/Qwen3-30B-A3B | Q8 | 40.00 | 31GB |
| Qwen/Qwen2.5-0.5B | FP16 | 39.97 | 11GB |
| Qwen/Qwen3-1.7B | FP16 | 39.87 | 15GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | FP16 | 39.83 | 15GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | FP16 | 39.73 | 15GB |
| Tongyi-MAI/Z-Image-Turbo | FP16 | 39.72 | 16GB |
| rednote-hilab/dots.ocr | FP16 | 39.62 | 15GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | FP16 | 39.49 | 9GB |
| Qwen/Qwen2-7B-Instruct | FP16 | 39.45 | 15GB |
| GSAI-ML/LLaDA-8B-Instruct | FP16 | 39.36 | 17GB |
| meta-llama/Llama-2-7b-hf | FP16 | 39.25 | 15GB |
| swiss-ai/Apertus-8B-Instruct-2509 | FP16 | 39.23 | 17GB |
| HuggingFaceTB/SmolLM2-135M | FP16 | 39.21 | 15GB |
| Qwen/Qwen3-Embedding-0.6B | FP16 | 39.19 | 13GB |
| vikhyatk/moondream2 | FP16 | 39.09 | 15GB |
| openai/gpt-oss-20b | Q8 | 39.08 | 20GB |
| Qwen/Qwen-Image-Edit-2509 | FP16 | 39.01 | 16GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | FP16 | 38.99 | 17GB |
| Qwen/Qwen3-0.6B | FP16 | 38.95 | 13GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | FP16 | 38.92 | 9GB |
| zai-org/GLM-4.6-FP8 | FP16 | 38.92 | 15GB |
| Qwen/Qwen2.5-7B | FP16 | 38.87 | 15GB |
| sshleifer/tiny-gpt2 | FP16 | 38.80 | 15GB |
| 01-ai/Yi-1.5-34B-Chat | Q4 | 38.79 | 18GB |
| Qwen/Qwen3-4B-Base | FP16 | 38.63 | 9GB |
| meta-llama/Meta-Llama-3-70B-Instruct | Q4 | 38.55 | 34GB |
| moonshotai/Kimi-K2-Thinking | Q4 | 38.50 | 489GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | FP16 | 38.37 | 17GB |
| meta-llama/Meta-Llama-3-8B | FP16 | 38.36 | 17GB |
| ibm-granite/granite-docling-258M | FP16 | 38.36 | 15GB |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | 38.35 | 34GB |
| microsoft/Phi-3.5-vision-instruct | FP16 | 38.31 | 15GB |
| distilbert/distilgpt2 | FP16 | 38.30 | 15GB |
| microsoft/Phi-3-mini-128k-instruct | FP16 | 38.28 | 15GB |
| openai-community/gpt2-large | FP16 | 38.12 | 15GB |
| google/gemma-2-27b-it | Q8 | 38.11 | 28GB |
| Qwen/Qwen2.5-1.5B-Instruct | FP16 | 38.11 | 11GB |
| facebook/opt-125m | FP16 | 38.06 | 15GB |
| microsoft/phi-4 | FP16 | 37.85 | 15GB |
| deepseek-ai/DeepSeek-V2.5 | Q4 | 37.85 | 328GB |
| Qwen/QwQ-32B-Preview | Q4 | 37.84 | 17GB |
| EleutherAI/gpt-neo-125m | FP16 | 37.76 | 15GB |
| Qwen/Qwen3-8B | FP16 | 37.73 | 17GB |
| Qwen/Qwen2-0.5B | FP16 | 37.73 | 11GB |
| EleutherAI/pythia-70m-deduped | FP16 | 37.73 | 15GB |
| deepseek-ai/deepseek-coder-33b-instruct | Q4 | 37.46 | 17GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q8 | 37.42 | 31GB |
| bigscience/bloomz-560m | FP16 | 37.38 | 15GB |
| Qwen/Qwen3-8B-Base | FP16 | 37.38 | 17GB |
| mistralai/Mistral-7B-Instruct-v0.2 | FP16 | 37.24 | 15GB |
| IlyaGusev/saiga_llama3_8b | FP16 | 37.19 | 17GB |
| Qwen/Qwen2.5-Coder-1.5B | FP16 | 37.17 | 11GB |
| deepseek-ai/DeepSeek-V3.1 | FP16 | 37.13 | 15GB |
| liuhaotian/llava-v1.5-7b | FP16 | 37.12 | 15GB |
| tencent/HunyuanVideo-1.5 | FP16 | 37.09 | 16GB |
| openai-community/gpt2-medium | FP16 | 36.96 | 15GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | FP16 | 36.95 | 15GB |
| Qwen/Qwen3-1.7B-Base | FP16 | 36.93 | 15GB |
| numind/NuExtract-1.5 | FP16 | 36.85 | 15GB |
| google/gemma-3-270m-it | FP16 | 36.84 | 15GB |
| openai-community/gpt2 | FP16 | 36.80 | 15GB |
| MiniMaxAI/MiniMax-M2 | FP16 | 36.79 | 15GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q4 | 36.68 | 17GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q4 | 36.67 | 16GB |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | 35.65 | 34GB |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q4 | 35.11 | 34GB |
| Qwen/Qwen2.5-32B-Instruct | Q4 | 34.89 | 16GB |
| moonshotai/Kimi-Linear-48B-A3B-Instruct | Q4 | 34.70 | 25GB |
| Qwen/Qwen2.5-Coder-32B-Instruct | Q4 | 34.43 | 17GB |
| baichuan-inc/Baichuan-M2-32B | Q4 | 34.36 | 16GB |
| Qwen/Qwen3-14B | FP16 | 33.18 | 29GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | FP16 | 32.58 | 19GB |
| mistralai/Ministral-3-14B-Instruct-2512 | FP16 | 31.81 | 32GB |
| meta-llama/Llama-3.1-8B-Instruct | FP16 | 31.52 | 17GB |
| meta-llama/Llama-2-13b-chat-hf | FP16 | 31.25 | 27GB |
| EssentialAI/rnj-1 | FP16 | 29.62 | 19GB |
| Qwen/Qwen3-14B-Base | FP16 | 29.40 | 29GB |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | Q4 | 29.34 | 69GB |
| NousResearch/Hermes-3-Llama-3.1-8B | FP16 | 29.22 | 17GB |
| OpenPipe/Qwen3-14B-Instruct | FP16 | 28.86 | 29GB |
| microsoft/Phi-3-medium-128k-instruct | FP16 | 28.80 | 29GB |
| ai-forever/ruGPT-3.5-13B | FP16 | 28.62 | 27GB |
| Qwen/Qwen2.5-14B-Instruct | FP16 | 28.55 | 29GB |
| Qwen/Qwen2.5-14B | FP16 | 28.50 | 29GB |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | 28.43 | 68GB |
| 01-ai/Yi-1.5-34B-Chat | Q8 | 28.25 | 35GB |
| codellama/CodeLlama-34b-hf | Q8 | 28.13 | 35GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q8 | 28.01 | 33GB |
| moonshotai/Kimi-K2-Thinking | Q8 | 27.94 | 978GB |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q8 | 27.89 | 68GB |
| Qwen/Qwen2.5-14B-Instruct | FP16 | 27.68 | 30GB |
| google/gemma-2-9b-it | FP16 | 27.57 | 20GB |
| baichuan-inc/Baichuan-M2-32B | Q8 | 27.51 | 33GB |
| Qwen/Qwen2.5-32B-Instruct | Q8 | 27.44 | 34GB |
| Qwen/Qwen2.5-Coder-32B-Instruct | Q8 | 26.79 | 34GB |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | 26.66 | 68GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q8 | 26.60 | 35GB |
| moonshotai/Kimi-Linear-48B-A3B-Instruct | Q8 | 26.43 | 50GB |
| meta-llama/Meta-Llama-3-70B-Instruct | Q8 | 26.23 | 68GB |
| Qwen/Qwen2.5-32B | Q8 | 25.96 | 33GB |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q8 | 25.95 | 68GB |
| Qwen/Qwen2.5-32B-Instruct | Q8 | 25.52 | 33GB |
| Qwen/QwQ-32B-Preview | Q8 | 24.74 | 34GB |
| deepseek-ai/DeepSeek-V2.5 | Q8 | 24.59 | 656GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q8 | 24.14 | 33GB |
| deepseek-ai/deepseek-coder-33b-instruct | Q8 | 24.10 | 34GB |
| openai/gpt-oss-20b | FP16 | 23.84 | 41GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | FP16 | 23.83 | 41GB |
| Qwen/Qwen3-32B | Q8 | 23.79 | 33GB |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | 23.59 | 34GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | FP16 | 23.54 | 61GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q4 | 23.48 | 39GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | FP16 | 23.39 | 41GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | FP16 | 23.31 | 61GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | FP16 | 23.08 | 61GB |
| google/gemma-2-27b-it | FP16 | 22.94 | 56GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q4 | 22.72 | 39GB |
| mistralai/Mistral-Small-Instruct-2409 | FP16 | 22.45 | 46GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | FP16 | 21.92 | 61GB |
| AI-MO/Kimina-Prover-72B | Q4 | 21.71 | 35GB |
| openai/gpt-oss-safeguard-20b | FP16 | 21.70 | 44GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | FP16 | 21.62 | 61GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | FP16 | 21.39 | 61GB |
| unsloth/gpt-oss-20b-BF16 | FP16 | 21.36 | 41GB |
| Qwen/Qwen3-30B-A3B | FP16 | 21.36 | 61GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q4 | 21.26 | 44GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q4 | 21.02 | 39GB |
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | Q4 | 20.94 | 34GB |
| openai/gpt-oss-120b | Q4 | 20.58 | 59GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | FP16 | 20.46 | 61GB |
| Qwen/Qwen2.5-Math-72B-Instruct | Q4 | 20.32 | 36GB |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | Q8 | 20.30 | 138GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | FP16 | 20.22 | 61GB |
| Qwen/Qwen2.5-72B-Instruct | Q4 | 20.19 | 36GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q4 | 20.10 | 39GB |
| NousResearch/Hermes-3-Llama-3.1-70B | Q4 | 19.88 | 34GB |
| mistralai/Mistral-Large-Instruct-2411 | Q4 | 19.85 | 60GB |
| Qwen/Qwen2.5-72B-Instruct | Q4 | 19.70 | 35GB |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | 19.67 | 34GB |
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | Q8 | 16.49 | 69GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q8 | 16.08 | 78GB |
| Qwen/Qwen2.5-72B-Instruct | Q8 | 15.92 | 70GB |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | Q4 | 15.84 | 115GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q8 | 15.81 | 78GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q8 | 15.78 | 78GB |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | FP16 | 15.57 | 137GB |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | FP16 | 15.54 | 137GB |
| moonshotai/Kimi-Linear-48B-A3B-Instruct | FP16 | 15.26 | 101GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | FP16 | 15.19 | 70GB |
| Qwen/Qwen2.5-32B-Instruct | FP16 | 15.17 | 67GB |
| deepseek-ai/DeepSeek-Math-V2 | Q4 | 15.04 | 383GB |
| AI-MO/Kimina-Prover-72B | Q8 | 15.02 | 70GB |
| deepseek-ai/deepseek-coder-33b-instruct | FP16 | 14.95 | 68GB |
| Qwen/Qwen2.5-32B | FP16 | 14.79 | 66GB |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | 14.79 | 69GB |
| Qwen/QwQ-32B-Preview | FP16 | 14.68 | 67GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | FP16 | 14.67 | 66GB |
| Qwen/Qwen2.5-72B-Instruct | Q8 | 14.58 | 71GB |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | 14.53 | 69GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q8 | 14.39 | 78GB |
| openai/gpt-oss-120b | Q8 | 14.38 | 117GB |
| Qwen/Qwen2.5-Math-72B-Instruct | Q8 | 14.31 | 71GB |
| Qwen/Qwen2.5-Coder-32B-Instruct | FP16 | 14.23 | 67GB |
| mistralai/Mistral-Large-Instruct-2411 | Q8 | 14.20 | 120GB |
| moonshotai/Kimi-K2-Thinking | FP16 | 14.19 | 1956GB |
| codellama/CodeLlama-34b-hf | FP16 | 13.93 | 70GB |
| meta-llama/Llama-3.3-70B-Instruct | FP16 | 13.91 | 137GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q8 | 13.64 | 88GB |
| Qwen/Qwen3-235B-A22B | Q4 | 13.47 | 115GB |
| 01-ai/Yi-1.5-34B-Chat | FP16 | 13.36 | 70GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | FP16 | 13.29 | 66GB |
| Qwen/Qwen2.5-32B-Instruct | FP16 | 13.26 tok/sEstimated Auto-generated benchmark | 66GB |
| Qwen/Qwen3-32B | FP16 | 13.25 tok/sEstimated Auto-generated benchmark | 66GB |
| meta-llama/Meta-Llama-3-70B-Instruct | FP16 | 13.22 tok/sEstimated Auto-generated benchmark | 137GB |
| deepseek-ai/DeepSeek-V2.5 | FP16 | 13.20 tok/sEstimated Auto-generated benchmark | 1312GB |
| baichuan-inc/Baichuan-M2-32B | FP16 | 13.04 tok/sEstimated Auto-generated benchmark | 66GB |
| meta-llama/Llama-3.1-70B-Instruct | FP16 | 12.89 tok/sEstimated Auto-generated benchmark | 137GB |
| MiniMaxAI/MiniMax-M1-40k | Q4 | 12.35 tok/sEstimated Auto-generated benchmark | 255GB |
| MiniMaxAI/MiniMax-VL-01 | Q4 | 12.31 tok/sEstimated Auto-generated benchmark | 256GB |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q4 | 12.29 tok/sEstimated Auto-generated benchmark | 378GB |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | Q8 | 12.24 tok/sEstimated Auto-generated benchmark | 231GB |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | FP16 | 11.33 tok/sEstimated Auto-generated benchmark | 275GB |
| deepseek-ai/DeepSeek-Math-V2 | Q8 | 10.66 tok/sEstimated Auto-generated benchmark | 766GB |
| MiniMaxAI/MiniMax-M1-40k | Q8 | 9.87 tok/sEstimated Auto-generated benchmark | 510GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct | FP16 | 8.92 tok/sEstimated Auto-generated benchmark | 156GB |
| openai/gpt-oss-120b | FP16 | 8.86 tok/sEstimated Auto-generated benchmark | 235GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | FP16 | 8.85 tok/sEstimated Auto-generated benchmark | 156GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking | FP16 | 8.58 tok/sEstimated Auto-generated benchmark | 156GB |
| Qwen/Qwen3-235B-A22B | Q8 | 8.57 tok/sEstimated Auto-generated benchmark | 230GB |
| NousResearch/Hermes-3-Llama-3.1-70B | FP16 | 8.54 tok/sEstimated Auto-generated benchmark | 138GB |
| MiniMaxAI/MiniMax-VL-01 | Q8 | 8.54 tok/sEstimated Auto-generated benchmark | 511GB |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q8 | 8.45 tok/sEstimated Auto-generated benchmark | 755GB |
| Qwen/Qwen2.5-72B-Instruct | FP16 | 8.38 tok/sEstimated Auto-generated benchmark | 141GB |
| Qwen/Qwen2.5-72B-Instruct | FP16 | 8.24 tok/sEstimated Auto-generated benchmark | 142GB |
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | FP16 | 8.17 tok/sEstimated Auto-generated benchmark | 138GB |
| meta-llama/Llama-3.1-70B-Instruct | FP16 | 8.17 tok/sEstimated Auto-generated benchmark | 138GB |
| Qwen/Qwen2.5-Math-72B-Instruct | FP16 | 8.14 tok/sEstimated Auto-generated benchmark | 142GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | FP16 | 7.96 tok/sEstimated Auto-generated benchmark | 156GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | FP16 | 7.67 tok/sEstimated Auto-generated benchmark | 176GB |
| mistralai/Mistral-Large-Instruct-2411 | FP16 | 7.43 tok/sEstimated Auto-generated benchmark | 240GB |
| AI-MO/Kimina-Prover-72B | FP16 | 7.39 tok/sEstimated Auto-generated benchmark | 141GB |
| meta-llama/Llama-3.3-70B-Instruct | FP16 | 7.36 tok/sEstimated Auto-generated benchmark | 138GB |
| deepseek-ai/DeepSeek-Math-V2 | FP16 | 5.61 tok/sEstimated Auto-generated benchmark | 1532GB |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | FP16 | 5.53 tok/sEstimated Auto-generated benchmark | 461GB |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | FP16 | 5.36 tok/sEstimated Auto-generated benchmark | 1509GB |
| MiniMaxAI/MiniMax-VL-01 | FP16 | 5.30 tok/sEstimated Auto-generated benchmark | 1021GB |
| Qwen/Qwen3-235B-A22B | FP16 | 5.14 tok/sEstimated Auto-generated benchmark | 460GB |
| MiniMaxAI/MiniMax-M1-40k | FP16 | 4.70 tok/sEstimated Auto-generated benchmark | 1020GB |
Note: All throughput figures above are calculated estimates, not measured benchmarks; real-world results may vary.
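These estimates are consistent with a simple memory-bandwidth argument: during autoregressive decoding, each generated token streams the model's full weight footprint through memory, so throughput is roughly bandwidth divided by model size, until per-token overhead caps small models. The sketch below illustrates that heuristic, assuming the M2 Ultra's 800 GB/s memory bandwidth; the efficiency factor and small-model ceiling are illustrative assumptions, not this site's actual methodology.

```python
# Decode-throughput heuristic: autoregressive decoding is typically
# memory-bandwidth-bound, so each generated token streams the full
# weight footprint. Constants below are illustrative assumptions.

M2_ULTRA_BANDWIDTH_GBPS = 800  # Apple's published M2 Ultra memory bandwidth
EFFICIENCY = 0.85              # assumed fraction of peak bandwidth achieved
SMALL_MODEL_CEILING = 140      # assumed tok/s cap where per-token overhead dominates

def estimate_tok_per_sec(footprint_gb: float) -> float:
    """Estimate decode speed from the weight footprint in GB."""
    bandwidth_bound = EFFICIENCY * M2_ULTRA_BANDWIDTH_GBPS / footprint_gb
    return min(bandwidth_bound, SMALL_MODEL_CEILING)

print(f"{estimate_tok_per_sec(33):.1f} tok/s")  # ~20.6 for a 33GB Q8 32B model,
                                                # the same ballpark as the table
```

The heuristic is weakest for mixture-of-experts models, which read only their active experts per token. The compatibility table below applies the same per-model estimates against the M2 Ultra's 192GB of unified memory and flags snapshots whose footprint exceeds it.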
| Model | Quantization | Verdict | Estimated speed | VRAM needed (192GB available) |
|---|---|---|---|---|
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q8 | Not supported | 8.45 tok/s | 755GB |
| EssentialAI/rnj-1 | FP16 | Fits comfortably | 29.62 tok/s | 19GB |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q4 | Not supported | 12.29 tok/s | 378GB |
| EssentialAI/rnj-1 | Q8 | Fits comfortably | 51.89 tok/s | 10GB |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | FP16 | Not supported | 5.36 tok/s | 1509GB |
| EssentialAI/rnj-1 | Q4 | Fits comfortably | 86.68 tok/s | 5GB |
| meta-llama/Meta-Llama-3-8B-Instruct | FP16 | Fits comfortably | 42.47 tok/s | 17GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | Fits comfortably | 82.18 tok/s | 9GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | Fits comfortably | 112.41 tok/s | 4GB |
| Qwen/Qwen3-Embedding-0.6B | Q4 | Fits comfortably | 102.79 tok/s | 3GB |
| Qwen/Qwen2.5-1.5B | Q4 | Fits comfortably | 102.57 tok/s | 3GB |
| meta-llama/Llama-3.1-70B-Instruct | FP16 | Fits comfortably | 12.89 tok/s | 137GB |
| microsoft/phi-2 | Q8 | Fits comfortably | 74.89 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q4 | Fits comfortably | 113.59 tok/s | 3GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | FP16 | Fits comfortably | 23.39 tok/s | 41GB |
| google/gemma-3-1b-it | FP16 | Fits comfortably | 52.80 tok/s | 2GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q8 | Fits comfortably | 42.38 tok/s | 20GB |
| Qwen/Qwen2-0.5B | Q8 | Fits comfortably | 68.08 tok/s | 5GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | FP16 | Fits comfortably | 50.43 tok/s | 6GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q4 | Fits comfortably | 109.40 tok/s | 2GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | Fits comfortably | 77.29 tok/s | 4GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | FP16 | Fits comfortably | 38.92 tok/s | 9GB |
| meta-llama/Llama-3.1-8B | Q8 | Fits comfortably | 74.30 tok/s | 9GB |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | Fits comfortably | 26.66 tok/s | 68GB |
| meta-llama/Llama-3.2-1B | Q4 | Fits comfortably | 135.85 tok/s | 1GB |
| microsoft/phi-2 | Q4 | Fits comfortably | 117.18 tok/s | 4GB |
| Qwen/Qwen3-4B | Q8 | Fits comfortably | 77.45 tok/s | 4GB |
| microsoft/phi-2 | FP16 | Fits comfortably | 43.58 tok/s | 15GB |
| meta-llama/Llama-2-7b-hf | FP16 | Fits comfortably | 39.25 tok/s | 15GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q4 | Fits comfortably | 100.15 tok/s | 4GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q4 | Fits comfortably | 54.08 tok/s | 10GB |
| Qwen/Qwen2-0.5B | Q4 | Fits comfortably | 98.95 tok/s | 3GB |
| MiniMaxAI/MiniMax-M2 | FP16 | Fits comfortably | 36.79 tok/s | 15GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | FP16 | Fits comfortably | 42.79 tok/s | 11GB |
| huggyllama/llama-7b | FP16 | Fits comfortably | 42.93 tok/s | 15GB |
| Qwen/Qwen2-0.5B | FP16 | Fits comfortably | 37.73 tok/s | 11GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | Fits comfortably | 126.36 tok/s | 2GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | Fits comfortably | 91.59 tok/s | 3GB |
| ibm-granite/granite-3.3-2b-instruct | FP16 | Fits comfortably | 45.50 tok/s | 4GB |
| microsoft/phi-4 | Q4 | Fits comfortably | 98.07 tok/s | 4GB |
| microsoft/phi-4 | Q8 | Fits comfortably | 78.63 tok/s | 7GB |
| microsoft/phi-4 | FP16 | Fits comfortably | 37.85 tok/s | 15GB |
| deepseek-ai/DeepSeek-V3.1 | Q4 | Fits comfortably | 108.54 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3.1 | Q8 | Fits comfortably | 80.19 tok/s | 7GB |
| deepseek-ai/DeepSeek-V3.1 | FP16 | Fits comfortably | 37.13 tok/s | 15GB |
| meta-llama/Llama-3.1-8B | Q4 | Fits comfortably | 102.05 tok/s | 4GB |
| lmsys/vicuna-7b-v1.5 | FP16 | Fits comfortably | 43.61 tok/s | 15GB |
| Qwen/Qwen2.5-32B-Instruct | Q4 | Fits comfortably | 34.89 tok/s | 16GB |
| Qwen/Qwen2.5-32B-Instruct | Q8 | Fits comfortably | 25.52 tok/s | 33GB |
| Qwen/Qwen2.5-32B-Instruct | FP16 | Fits comfortably | 13.26 tok/s | 66GB |
| sshleifer/tiny-gpt2 | Q8 | Fits comfortably | 73.51 tok/s | 7GB |
| meta-llama/Llama-3.2-1B | Q8 | Fits comfortably | 98.54 tok/s | 1GB |
| meta-llama/Llama-3.2-1B | FP16 | Fits comfortably | 49.41 tok/s | 2GB |
| meta-llama/Meta-Llama-3-8B | Q4 | Fits comfortably | 101.23 tok/s | 4GB |
| microsoft/DialoGPT-medium | Q4 | Fits comfortably | 117.29 tok/s | 4GB |
| IlyaGusev/saiga_llama3_8b | FP16 | Fits comfortably | 37.19 tok/s | 17GB |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | Fits comfortably | 101.65 tok/s | 3GB |
| Qwen/Qwen3-4B | FP16 | Fits comfortably | 42.36 tok/s | 9GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q4 | Fits comfortably | 53.38 tok/s | 15GB |
| Qwen/Qwen2.5-14B | FP16 | Fits comfortably | 28.50 tok/s | 29GB |
| google/gemma-2-2b-it | Q4 | Fits comfortably | 139.33 tok/s | 1GB |
| google/gemma-2-2b-it | Q8 | Fits comfortably | 93.87 tok/s | 2GB |
| meta-llama/Llama-3.2-3B | Q4 | Fits comfortably | 118.40 tok/s | 2GB |
| google/gemma-2-2b-it | FP16 | Fits comfortably | 47.18 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | Fits comfortably | 78.04 tok/s | 9GB |
| sshleifer/tiny-gpt2 | FP16 | Fits comfortably | 38.80 tok/s | 15GB |
| meta-llama/Llama-3.2-3B | Q8 | Fits comfortably | 88.55 tok/s | 3GB |
| meta-llama/Llama-3.2-3B | FP16 | Fits comfortably | 51.41 tok/s | 6GB |
| huggyllama/llama-7b | Q4 | Fits comfortably | 101.08 tok/s | 4GB |
| huggyllama/llama-7b | Q8 | Fits comfortably | 75.14 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q8 | Fits comfortably | 82.41 tok/s | 5GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q8 | Fits comfortably | 79.17 tok/s | 7GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | FP16 | Fits comfortably | 44.19 tok/s | 15GB |
| numind/NuExtract-1.5 | Q4 | Fits comfortably | 96.76 tok/s | 4GB |
| numind/NuExtract-1.5 | Q8 | Fits comfortably | 77.14 tok/s | 7GB |
| Qwen/Qwen2.5-72B-Instruct | Q4 | Fits comfortably | 19.70 tok/s | 35GB |
| ibm-granite/granite-3.3-2b-instruct | Q4 | Fits comfortably | 131.49 tok/s | 1GB |
| ibm-granite/granite-3.3-2b-instruct | Q8 | Fits comfortably | 91.01 tok/s | 2GB |
| google/gemma-3-1b-it | Q8 | Fits comfortably | 88.53 tok/s | 1GB |
| Qwen/Qwen2.5-Coder-32B-Instruct | Q8 | Fits comfortably | 26.79 tok/s | 34GB |
| 01-ai/Yi-1.5-34B-Chat | FP16 | Fits comfortably | 13.36 tok/s | 70GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q4 | Fits comfortably | 98.87 tok/s | 2GB |
| microsoft/Phi-3-mini-128k-instruct | Q4 | Fits comfortably | 103.63 tok/s | 4GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | FP16 | Fits comfortably | 7.67 tok/s | 176GB |
| GSAI-ML/LLaDA-8B-Instruct | Q4 | Fits comfortably | 97.61 tok/s | 4GB |
| Qwen/Qwen2.5-72B-Instruct | Q8 | Fits comfortably | 15.92 tok/s | 70GB |
| Qwen/Qwen2.5-72B-Instruct | FP16 | Fits comfortably | 8.38 tok/s | 141GB |
| parler-tts/parler-tts-large-v1 | Q8 | Fits comfortably | 73.24 tok/s | 7GB |
| parler-tts/parler-tts-large-v1 | FP16 | Fits comfortably | 40.15 tok/s | 15GB |
| Qwen/Qwen2.5-32B | Q4 | Fits comfortably | 41.16 tok/s | 16GB |
| Qwen/Qwen2.5-32B | Q8 | Fits comfortably | 25.96 tok/s | 33GB |
| Qwen/Qwen2.5-32B | FP16 | Fits comfortably | 14.79 tok/s | 66GB |
| lmsys/vicuna-7b-v1.5 | Q4 | Fits comfortably | 105.68 tok/s | 4GB |
| lmsys/vicuna-7b-v1.5 | Q8 | Fits comfortably | 75.54 tok/s | 7GB |
| google/gemma-3-1b-it | Q4 | Fits comfortably | 132.98 tok/s | 1GB |
| Qwen/Qwen2.5-3B | Q4 | Fits comfortably | 124.55 tok/s | 2GB |
| Qwen/Qwen2.5-3B | Q8 | Fits comfortably | 85.92 tok/s | 3GB |
| Qwen/Qwen2.5-3B | FP16 | Fits comfortably | 53.63 tok/s | 6GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q4 | Fits comfortably | 86.05 tok/s | 5GB |
| GSAI-ML/LLaDA-8B-Base | FP16 | Fits comfortably | 42.23 tok/s | 17GB |
Note: All throughput figures above are calculated estimates, not measured benchmarks; real-world results may vary.
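The verdict column reduces to comparing an estimated footprint against the 192GB pool. A common rule of thumb puts weights at roughly 0.5 bytes per parameter at Q4, 1 at Q8, and 2 at FP16, plus headroom for the KV cache and runtime. The sketch below is a hypothetical reconstruction under those assumptions; the overhead factor and tight-fit threshold are invented for illustration, not this site's exact formula.

```python
# Hypothetical reconstruction of the fit verdict. Bytes-per-parameter
# values and the 1.2x KV-cache/runtime overhead are illustrative
# assumptions, not the site's exact methodology.

BYTES_PER_PARAM = {"Q4": 0.5, "Q8": 1.0, "FP16": 2.0}
AVAILABLE_GB = 192  # M2 Ultra unified memory
OVERHEAD = 1.2      # assumed allowance for KV cache and runtime buffers

def fit_verdict(params_billions: float, quant: str) -> str:
    """Classify whether a model snapshot fits in available memory."""
    need_gb = params_billions * BYTES_PER_PARAM[quant] * OVERHEAD
    if need_gb > AVAILABLE_GB:
        return f"Not supported ({need_gb:.0f}GB needed, have {AVAILABLE_GB}GB)"
    if need_gb > 0.9 * AVAILABLE_GB:
        return f"Tight fit ({need_gb:.0f}GB needed)"
    return f"Fits comfortably ({need_gb:.0f}GB needed)"

print(fit_verdict(70, "FP16"))  # ~168GB: fits within 192GB
print(fit_verdict(675, "Q4"))   # ~405GB: not supported, as the table shows
```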
Related comparisons for local inference workloads: Apple M3 Max, RTX 4090, NVIDIA RTX 6000 Ada, NVIDIA A6000, and RX 7900 XTX.