Quick Answer: The NVIDIA H200 SXM 141GB has 141GB of VRAM and an MSRP of $35,000. Its highest estimated throughput on this page is about 918 tokens/sec (deepseek-ai/DeepSeek-OCR at Q4, an auto-generated estimate rather than a measurement). Its rated TDP is 700W.

NVIDIA H200 SXM 141GB


By NVIDIA · Released 2023-11 · MSRP $35,000.00

This data-center GPU offers high throughput for AI inference. Choose a model quantization level (Q4, Q8, or FP16) that balances memory use against your target tokens/sec.


Specs snapshot

Key hardware metrics for AI workloads.

VRAM: 141GB

Cores: 16,896

TDP: 700W

Architecture: Hopper
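The spec list pairs a 700W TDP with throughput estimates, so a rough efficiency figure is tokens per joule: tokens/sec divided by watts (1 W = 1 J/s). The minimal sketch below assumes the card runs at its full rated TDP; real draw depends on the workload, so treat the result as a conservative ballpark rather than a measurement. The function names are illustrative.

```python
def tokens_per_joule(tokens_per_sec: float, watts: float) -> float:
    """Tokens generated per joule of energy (1 W = 1 J/s)."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return tokens_per_sec / watts


def joules_per_token(tokens_per_sec: float, watts: float) -> float:
    """Inverse metric: energy spent on each generated token."""
    return 1.0 / tokens_per_joule(tokens_per_sec, watts)


if __name__ == "__main__":
    TDP_WATTS = 700.0   # rated TDP from the spec list, assumed as the actual draw
    est_tps = 918.04    # estimated Q4 figure for deepseek-ai/DeepSeek-OCR
    print(f"{tokens_per_joule(est_tps, TDP_WATTS):.2f} tokens/J")
    print(f"{joules_per_token(est_tps, TDP_WATTS):.3f} J/token")
```

Because both inputs are estimates (TDP is a ceiling, tokens/sec is calculated), the output is only useful for comparing cards on the same basis.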


💡 Not ready to buy? Try cloud GPUs first

Test NVIDIA H200 SXM 141GB performance in the cloud before investing in hardware. Pay by the hour with no commitment.

Vast.ai: from $0.20/hr · RunPod: from $0.30/hr · Lambda Labs: enterprise-grade (listed starting rates are not H200-specific; H200 instances cost more)
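Hourly rental rates are easier to weigh against buying once they are expressed as cost per million generated tokens. The sketch below assumes a steady throughput taken from this page's estimates and a hypothetical $3.00/hr rate, which is a placeholder and not a quote from any provider listed above; substitute the H200 rate you are actually offered.

```python
def cost_per_million_tokens(hourly_rate_usd: float, tokens_per_sec: float) -> float:
    """USD spent per 1,000,000 tokens at a constant throughput."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    tokens_per_hour = tokens_per_sec * 3600
    return hourly_rate_usd / tokens_per_hour * 1_000_000


def breakeven_hours(purchase_price_usd: float, hourly_rate_usd: float) -> float:
    """Rental hours whose total cost equals the purchase price."""
    return purchase_price_usd / hourly_rate_usd


if __name__ == "__main__":
    HYPOTHETICAL_RATE = 3.00  # USD/hr, placeholder only
    est_tps = 747.62          # estimated Q4 figure for meta-llama/Llama-3.1-8B
    print(f"${cost_per_million_tokens(HYPOTHETICAL_RATE, est_tps):.2f} per 1M tokens")
    print(f"{breakeven_hours(35_000, HYPOTHETICAL_RATE):,.0f} rental hours to match the MSRP")
```

The break-even figure compares rental against the GPU's MSRP alone; it ignores electricity, hosting, and the SXM-compatible server the module has to be installed in.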

AI benchmarks

Model	Quantization	Tokens/sec (auto-generated estimate)	VRAM used
deepseek-ai/DeepSeek-OCR	Q4	918.04	2GB
ibm-research/PowerMoE-3b	Q4	899.99	2GB
google/embeddinggemma-300m	Q4	892.88	1GB
google-bert/bert-base-uncased	Q4	890.49	1GB
google/gemma-2b	Q4	890.18	1GB
unsloth/Llama-3.2-1B-Instruct	Q4	887.95	1GB
google/gemma-3-1b-it	Q4	882.07	1GB
apple/OpenELM-1_1B-Instruct	Q4	880.62	1GB
allenai/OLMo-2-0425-1B	Q4	879.27	1GB
Qwen/Qwen2.5-3B	Q4	874.57	2GB
facebook/sam3	Q4	854.80	1GB
ibm-granite/granite-3.3-2b-instruct	Q4	845.59	1GB
inference-net/Schematron-3B	Q4	844.62	2GB
meta-llama/Llama-3.2-1B-Instruct	Q4	831.79	1GB
deepseek-ai/deepseek-coder-1.3b-instruct	Q4	824.89	2GB
TinyLlama/TinyLlama-1.1B-Chat-v1.0	Q4	822.25	1GB
LiquidAI/LFM2-1.2B	Q4	818.45	1GB
meta-llama/Llama-3.2-1B	Q4	814.74	1GB
bigcode/starcoder2-3b	Q4	807.54	2GB
WeiboAI/VibeThinker-1.5B	Q4	806.27	1GB
google/gemma-2-2b-it	Q4	801.78	1GB
meta-llama/Llama-3.2-3B-Instruct	Q4	794.66	2GB
meta-llama/Llama-Guard-3-1B	Q4	793.70	1GB
Qwen/Qwen2.5-3B-Instruct	Q4	790.53	2GB
meta-llama/Llama-3.2-3B	Q4	783.01	2GB
unsloth/gemma-3-1b-it	Q4	775.54	1GB
unsloth/Llama-3.2-3B-Instruct	Q4	764.26	2GB
google-t5/t5-3b	Q4	760.55	2GB
tencent/HunyuanOCR	Q4	760.33	1GB
black-forest-labs/FLUX.2-dev	Q4	757.49	4GB
unsloth/Meta-Llama-3.1-8B-Instruct	Q4	756.55	4GB
context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16	Q4	752.49	2GB
zai-org/GLM-4.6-FP8	Q4	751.25	4GB
zai-org/GLM-4.5-Air	Q4	750.94	4GB
llamafactory/tiny-random-Llama-3	Q4	750.21	4GB
Qwen/Qwen3-4B-Thinking-2507-FP8	Q4	749.40	2GB
deepseek-ai/DeepSeek-R1-Distill-Qwen-7B	Q4	749.00	4GB
microsoft/Phi-4-multimodal-instruct	Q4	747.88	4GB
meta-llama/Llama-3.1-8B	Q4	747.62	4GB
Qwen/Qwen2.5-1.5B	Q4	746.78	3GB
Qwen/Qwen2.5-Math-1.5B	Q4	746.03	3GB
deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct	Q4	745.77	4GB
Qwen/Qwen-Image-Edit-2509	Q4	744.63	4GB
nari-labs/Dia2-2B	Q4	744.44	2GB
distilbert/distilgpt2	Q4	744.14	4GB
Qwen/Qwen2.5-Coder-1.5B	Q4	744.01	3GB
Qwen/Qwen3-Embedding-0.6B	Q4	743.37	3GB
mistralai/Mistral-7B-Instruct-v0.1	Q4	743.33	4GB
numind/NuExtract-1.5	Q4	741.76	4GB
unsloth/mistral-7b-v0.3-bnb-4bit	Q4	741.46	4GB
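The tokens/sec column converts directly into wall-clock time for a response: seconds = tokens ÷ throughput. The sketch below assumes the figures are sustained generation rates for a single request and ignores prompt-processing (prefill) time; the page does not state either, and the 500-token response length is a hypothetical example.

```python
def generation_seconds(num_tokens: int, tokens_per_sec: float) -> float:
    """Seconds to emit num_tokens at a steady tokens_per_sec (decode only)."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return num_tokens / tokens_per_sec


def ms_per_token(tokens_per_sec: float) -> float:
    """Average milliseconds between consecutive tokens."""
    return 1000.0 / tokens_per_sec


# A few rows from the benchmark table (model -> estimated Q4 tokens/sec)
ESTIMATES = {
    "deepseek-ai/DeepSeek-OCR": 918.04,
    "meta-llama/Llama-3.1-8B": 747.62,
    "unsloth/mistral-7b-v0.3-bnb-4bit": 741.46,
}

if __name__ == "__main__":
    RESPONSE_TOKENS = 500  # hypothetical response length
    for model, tps in ESTIMATES.items():
        print(f"{model}: {ms_per_token(tps):.2f} ms/token, "
              f"{generation_seconds(RESPONSE_TOKENS, tps):.2f} s for {RESPONSE_TOKENS} tokens")
```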


Note: Performance figures are calculated estimates, not measurements; real results may vary.

Model compatibility

Model	Quantization	Verdict	Tokens/sec (estimate)	VRAM needed (141GB available)
codellama/CodeLlama-34b-hf	Q4	Fits comfortably	262.21	17GB
google/gemma-3-1b-it	FP16	Fits comfortably	315.12	2GB
Qwen/Qwen3-Embedding-0.6B	Q4	Fits comfortably	743.37	3GB
Qwen/Qwen3-0.6B	FP16	Fits comfortably	277.30	13GB
Gensyn/Qwen2.5-0.5B-Instruct	Q4	Fits comfortably	628.06	3GB
Qwen/Qwen3-Embedding-0.6B	Q8	Fits comfortably	451.95	6GB
trl-internal-testing/tiny-Qwen2ForCausalLM-2.5	Q4	Fits comfortably	716.31	4GB
trl-internal-testing/tiny-Qwen2ForCausalLM-2.5	Q8	Fits comfortably	438.05	7GB
Qwen/Qwen2.5-0.5B-Instruct	FP16	Fits comfortably	245.18	11GB
Qwen/Qwen3-Next-80B-A3B-Instruct	Q4	Fits comfortably	125.93	39GB
Qwen/Qwen3-Next-80B-A3B-Instruct	Q8	Fits comfortably	102.14	78GB
Qwen/Qwen3-Next-80B-A3B-Instruct	FP16	Does not fit	56.23	156GB
allenai/OLMo-2-0425-1B	Q4	Fits comfortably	879.27	1GB
openai-community/gpt2-large	FP16	Fits comfortably	242.81	15GB
Qwen/Qwen3-1.7B	Q4	Fits comfortably	632.44	4GB
Qwen/Qwen3-1.7B	Q8	Fits comfortably	505.75	7GB
Qwen/Qwen3-4B	Q8	Fits comfortably	436.11	4GB
Qwen/Qwen3-4B	FP16	Fits comfortably	277.19	9GB
Qwen/Qwen3-30B-A3B-Instruct-2507	Q4	Fits comfortably	343.71	15GB
google-t5/t5-3b	Q8	Fits comfortably	543.51	3GB
google-t5/t5-3b	FP16	Fits comfortably	343.50	6GB
meta-llama/Meta-Llama-3-8B-Instruct	FP16	Fits comfortably	237.97	17GB
deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B	Q4	Fits comfortably	661.64	3GB
deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B	Q8	Fits comfortably	508.74	5GB
Qwen/Qwen2.5-1.5B	Q8	Fits comfortably	479.10	5GB
Qwen/Qwen2.5-1.5B	FP16	Fits comfortably	254.57	11GB
Qwen/Qwen2.5-14B-Instruct	Q4	Fits comfortably	567.14	7GB
Qwen/Qwen2.5-14B-Instruct	Q8	Fits comfortably	332.17	14GB
Qwen/Qwen2.5-0.5B	Q8	Fits comfortably	444.56	5GB
Qwen/Qwen2.5-0.5B	FP16	Fits comfortably	261.43	11GB
meta-llama/Llama-3.1-70B-Instruct	Q4	Fits comfortably	238.34	34GB
zai-org/GLM-4.6-FP8	Q8	Fits comfortably	516.40	7GB
zai-org/GLM-4.6-FP8	FP16	Fits comfortably	256.62	15GB
deepseek-ai/DeepSeek-V3.1	FP16	Fits comfortably	280.41	15GB
meta-llama/Llama-3.1-8B	Q4	Fits comfortably	747.62	4GB
meta-llama/Llama-3.1-8B	Q8	Fits comfortably	494.06	9GB
LiquidAI/LFM2-1.2B	Q8	Fits comfortably	587.46	2GB
unsloth/Meta-Llama-3.1-8B-Instruct	Q8	Fits comfortably	457.50	9GB
unsloth/Meta-Llama-3.1-8B-Instruct	FP16	Fits comfortably	261.98	17GB
meta-llama/Meta-Llama-3-70B-Instruct	Q4	Fits comfortably	237.83	34GB
meta-llama/Meta-Llama-3-70B-Instruct	Q8	Fits comfortably	185.61	68GB
Qwen/Qwen2.5-Math-1.5B	FP16	Fits comfortably	285.10	11GB
trl-internal-testing/tiny-random-LlamaForCausalLM	Q4	Fits comfortably	718.80	4GB
Qwen/Qwen3-Embedding-4B	Q4	Fits comfortably	662.33	2GB
Qwen/Qwen3-Embedding-4B	Q8	Fits comfortably	477.16	4GB
Qwen/Qwen3-Embedding-4B	FP16	Fits comfortably	265.14	9GB
unsloth/mistral-7b-v0.3-bnb-4bit	FP16	Fits comfortably	241.52	15GB
Qwen/Qwen2.5-14B	Q4	Fits comfortably	491.17	7GB
Qwen/Qwen2.5-14B	Q8	Fits comfortably	340.33	14GB
Qwen/Qwen2-1.5B-Instruct	Q8	Fits comfortably	473.49	5GB


Note: Speeds and VRAM requirements above are calculated estimates; real results may vary.

Alternative GPUs

GPU	VRAM
RTX 5070	12GB
RTX 4060 Ti 16GB	16GB
RX 6800 XT	16GB
RTX 4070 Super	12GB
RTX 3080	10GB
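The alternatives differ mainly in VRAM, which caps the largest model each card can hold. A minimal sketch that filters the cards on this page by a required VRAM figure, for example one taken from the compatibility table's "VRAM needed" column. It checks memory only, not speed, and the helper name is illustrative.

```python
# VRAM capacities (GB) as listed on this page.
GPUS: dict[str, int] = {
    "NVIDIA H200 SXM 141GB": 141,
    "RTX 5070": 12,
    "RTX 4060 Ti 16GB": 16,
    "RX 6800 XT": 16,
    "RTX 4070 Super": 12,
    "RTX 3080": 10,
}


def gpus_that_fit(required_gb: float, gpus: dict[str, int] = GPUS) -> list[str]:
    """Names of GPUs with at least required_gb of VRAM, largest first.

    sorted() is stable, so cards with equal VRAM keep their listed order.
    """
    candidates = [name for name, vram in gpus.items() if vram >= required_gb]
    return sorted(candidates, key=lambda name: gpus[name], reverse=True)


if __name__ == "__main__":
    # codellama/CodeLlama-34b-hf at Q4 is listed as needing 17GB.
    print(gpus_that_fit(17))
```

With the 17GB requirement from the CodeLlama-34b Q4 row, only the H200 qualifies; none of the 10 to 16GB cards listed here can hold that model.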

NVIDIA H200 SXM 141GB

Check availability

By NVIDIAReleased 2023-11MSRP $35,000.00

This GPU offers reliable throughput for local AI workloads. Pair it with the right model quantization to hit your desired tokens/sec, and monitor prices below to catch the best deal.

Search on Amazon View Benchmarks

Specs snapshot

Key hardware metrics for AI workloads.

VRAM141GB

Cores16,896

TDP700W

ArchitectureHopper

Where to Buy

Buy directly on Amazon with fast shipping and reliable customer service.

No purchase links available yet. Try the Amazon search results to find this GPU.

💡 Not ready to buy? Try cloud GPUs first

Test NVIDIA H200 SXM 141GB performance in the cloud before investing in hardware. Pay by the hour with no commitment.

Vast.aifrom $0.20/hr RunPodfrom $0.30/hr Lambda Labsenterprise-grade

AI benchmarks

Model	Quantization	Tokens/sec	VRAM used
deepseek-ai/DeepSeek-OCR	Q4	918.04 tok/sEstimated Auto-generated benchmark	2GB
ibm-research/PowerMoE-3b	Q4	899.99 tok/sEstimated Auto-generated benchmark	2GB
google/embeddinggemma-300m	Q4	892.88 tok/sEstimated Auto-generated benchmark	1GB
google-bert/bert-base-uncased	Q4	890.49 tok/sEstimated Auto-generated benchmark	1GB
google/gemma-2b	Q4	890.18 tok/sEstimated Auto-generated benchmark	1GB
unsloth/Llama-3.2-1B-Instruct	Q4	887.95 tok/sEstimated Auto-generated benchmark	1GB
google/gemma-3-1b-it	Q4	882.07 tok/sEstimated Auto-generated benchmark	1GB
apple/OpenELM-1_1B-Instruct	Q4	880.62 tok/sEstimated Auto-generated benchmark	1GB
allenai/OLMo-2-0425-1B	Q4	879.27 tok/sEstimated Auto-generated benchmark	1GB
Qwen/Qwen2.5-3B	Q4	874.57 tok/sEstimated Auto-generated benchmark	2GB
facebook/sam3	Q4	854.80 tok/sEstimated Auto-generated benchmark	1GB
ibm-granite/granite-3.3-2b-instruct	Q4	845.59 tok/sEstimated Auto-generated benchmark	1GB
inference-net/Schematron-3B	Q4	844.62 tok/sEstimated Auto-generated benchmark	2GB
meta-llama/Llama-3.2-1B-Instruct	Q4	831.79 tok/sEstimated Auto-generated benchmark	1GB
deepseek-ai/deepseek-coder-1.3b-instruct	Q4	824.89 tok/sEstimated Auto-generated benchmark	2GB
TinyLlama/TinyLlama-1.1B-Chat-v1.0	Q4	822.25 tok/sEstimated Auto-generated benchmark	1GB
LiquidAI/LFM2-1.2B	Q4	818.45 tok/sEstimated Auto-generated benchmark	1GB
meta-llama/Llama-3.2-1B	Q4	814.74 tok/sEstimated Auto-generated benchmark	1GB
bigcode/starcoder2-3b	Q4	807.54 tok/sEstimated Auto-generated benchmark	2GB
WeiboAI/VibeThinker-1.5B	Q4	806.27 tok/sEstimated Auto-generated benchmark	1GB
google/gemma-2-2b-it	Q4	801.78 tok/sEstimated Auto-generated benchmark	1GB
meta-llama/Llama-3.2-3B-Instruct	Q4	794.66 tok/sEstimated Auto-generated benchmark	2GB
meta-llama/Llama-Guard-3-1B	Q4	793.70 tok/sEstimated Auto-generated benchmark	1GB
Qwen/Qwen2.5-3B-Instruct	Q4	790.53 tok/sEstimated Auto-generated benchmark	2GB
meta-llama/Llama-3.2-3B	Q4	783.01 tok/sEstimated Auto-generated benchmark	2GB
unsloth/gemma-3-1b-it	Q4	775.54 tok/sEstimated Auto-generated benchmark	1GB
unsloth/Llama-3.2-3B-Instruct	Q4	764.26 tok/sEstimated Auto-generated benchmark	2GB
google-t5/t5-3b	Q4	760.55 tok/sEstimated Auto-generated benchmark	2GB
tencent/HunyuanOCR	Q4	760.33 tok/sEstimated Auto-generated benchmark	1GB
black-forest-labs/FLUX.2-dev	Q4	757.49 tok/sEstimated Auto-generated benchmark	4GB
unsloth/Meta-Llama-3.1-8B-Instruct	Q4	756.55 tok/sEstimated Auto-generated benchmark	4GB
context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16	Q4	752.49 tok/sEstimated Auto-generated benchmark	2GB
zai-org/GLM-4.6-FP8	Q4	751.25 tok/sEstimated Auto-generated benchmark	4GB
zai-org/GLM-4.5-Air	Q4	750.94 tok/sEstimated Auto-generated benchmark	4GB
llamafactory/tiny-random-Llama-3	Q4	750.21 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen3-4B-Thinking-2507-FP8	Q4	749.40 tok/sEstimated Auto-generated benchmark	2GB
deepseek-ai/DeepSeek-R1-Distill-Qwen-7B	Q4	749.00 tok/sEstimated Auto-generated benchmark	4GB
microsoft/Phi-4-multimodal-instruct	Q4	747.88 tok/sEstimated Auto-generated benchmark	4GB
meta-llama/Llama-3.1-8B	Q4	747.62 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen2.5-1.5B	Q4	746.78 tok/sEstimated Auto-generated benchmark	3GB
Qwen/Qwen2.5-Math-1.5B	Q4	746.03 tok/sEstimated Auto-generated benchmark	3GB
deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct	Q4	745.77 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen-Image-Edit-2509	Q4	744.63 tok/sEstimated Auto-generated benchmark	4GB
nari-labs/Dia2-2B	Q4	744.44 tok/sEstimated Auto-generated benchmark	2GB
distilbert/distilgpt2	Q4	744.14 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen2.5-Coder-1.5B	Q4	744.01 tok/sEstimated Auto-generated benchmark	3GB
Qwen/Qwen3-Embedding-0.6B	Q4	743.37 tok/sEstimated Auto-generated benchmark	3GB
mistralai/Mistral-7B-Instruct-v0.1	Q4	743.33 tok/sEstimated Auto-generated benchmark	4GB
numind/NuExtract-1.5	Q4	741.76 tok/sEstimated Auto-generated benchmark	4GB
unsloth/mistral-7b-v0.3-bnb-4bit	Q4	741.46 tok/sEstimated Auto-generated benchmark	4GB

deepseek-ai/DeepSeek-OCR

2GB

918.04 tok/sEstimated

Auto-generated benchmark

ibm-research/PowerMoE-3b

2GB

899.99 tok/sEstimated

Auto-generated benchmark

google/embeddinggemma-300m

1GB

892.88 tok/sEstimated

Auto-generated benchmark

google-bert/bert-base-uncased

1GB

890.49 tok/sEstimated

Auto-generated benchmark

google/gemma-2b

1GB

890.18 tok/sEstimated

Auto-generated benchmark

unsloth/Llama-3.2-1B-Instruct

1GB

887.95 tok/sEstimated

Auto-generated benchmark

google/gemma-3-1b-it

1GB

882.07 tok/sEstimated

Auto-generated benchmark

apple/OpenELM-1_1B-Instruct

1GB

880.62 tok/sEstimated

Auto-generated benchmark

allenai/OLMo-2-0425-1B

1GB

879.27 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-3B

2GB

874.57 tok/sEstimated

Auto-generated benchmark

facebook/sam3

1GB

854.80 tok/sEstimated

Auto-generated benchmark

ibm-granite/granite-3.3-2b-instruct

1GB

845.59 tok/sEstimated

Auto-generated benchmark

inference-net/Schematron-3B

2GB

844.62 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.2-1B-Instruct

1GB

831.79 tok/sEstimated

Auto-generated benchmark

deepseek-ai/deepseek-coder-1.3b-instruct

2GB

824.89 tok/sEstimated

Auto-generated benchmark

TinyLlama/TinyLlama-1.1B-Chat-v1.0

1GB

822.25 tok/sEstimated

Auto-generated benchmark

LiquidAI/LFM2-1.2B

1GB

818.45 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.2-1B

1GB

814.74 tok/sEstimated

Auto-generated benchmark

bigcode/starcoder2-3b

2GB

807.54 tok/sEstimated

Auto-generated benchmark

WeiboAI/VibeThinker-1.5B

1GB

806.27 tok/sEstimated

Auto-generated benchmark

google/gemma-2-2b-it

1GB

801.78 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.2-3B-Instruct

2GB

794.66 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-Guard-3-1B

1GB

793.70 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-3B-Instruct

2GB

790.53 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.2-3B

2GB

783.01 tok/sEstimated

Auto-generated benchmark

unsloth/gemma-3-1b-it

1GB

775.54 tok/sEstimated

Auto-generated benchmark

unsloth/Llama-3.2-3B-Instruct

2GB

764.26 tok/sEstimated

Auto-generated benchmark

google-t5/t5-3b

2GB

760.55 tok/sEstimated

Auto-generated benchmark

tencent/HunyuanOCR

1GB

760.33 tok/sEstimated

Auto-generated benchmark

black-forest-labs/FLUX.2-dev

4GB

757.49 tok/sEstimated

Auto-generated benchmark

unsloth/Meta-Llama-3.1-8B-Instruct

4GB

756.55 tok/sEstimated

Auto-generated benchmark

context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16

2GB

752.49 tok/sEstimated

Auto-generated benchmark

zai-org/GLM-4.6-FP8

4GB

751.25 tok/sEstimated

Auto-generated benchmark

zai-org/GLM-4.5-Air

4GB

750.94 tok/sEstimated

Auto-generated benchmark

llamafactory/tiny-random-Llama-3

4GB

750.21 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen3-4B-Thinking-2507-FP8

2GB

749.40 tok/sEstimated

Auto-generated benchmark

deepseek-ai/DeepSeek-R1-Distill-Qwen-7B

4GB

749.00 tok/sEstimated

Auto-generated benchmark

microsoft/Phi-4-multimodal-instruct

4GB

747.88 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.1-8B

4GB

747.62 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-1.5B

3GB

746.78 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-Math-1.5B

3GB

746.03 tok/sEstimated

Auto-generated benchmark

deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct

4GB

745.77 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen-Image-Edit-2509

4GB

744.63 tok/sEstimated

Auto-generated benchmark

nari-labs/Dia2-2B

2GB

744.44 tok/sEstimated

Auto-generated benchmark

distilbert/distilgpt2

4GB

744.14 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-Coder-1.5B

3GB

744.01 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen3-Embedding-0.6B

3GB

743.37 tok/sEstimated

Auto-generated benchmark

mistralai/Mistral-7B-Instruct-v0.1

4GB

743.33 tok/sEstimated

Auto-generated benchmark

numind/NuExtract-1.5

4GB

741.76 tok/sEstimated

Auto-generated benchmark

unsloth/mistral-7b-v0.3-bnb-4bit

4GB

741.46 tok/sEstimated

Auto-generated benchmark

Note: Performance estimates are calculated. Real results may vary. Methodology · Submit real data

Model compatibility

Model	Quantization	Verdict	Estimated speed	VRAM needed
codellama/CodeLlama-34b-hf	Q4	Fits comfortably	262.21 tok/sEstimated	17GB (have 141GB)
google/gemma-3-1b-it	FP16	Fits comfortably	315.12 tok/sEstimated	2GB (have 141GB)
Qwen/Qwen3-Embedding-0.6B	Q4	Fits comfortably	743.37 tok/sEstimated	3GB (have 141GB)
Qwen/Qwen3-0.6B	FP16	Fits comfortably	277.30 tok/sEstimated	13GB (have 141GB)
Gensyn/Qwen2.5-0.5B-Instruct	Q4	Fits comfortably	628.06 tok/sEstimated	3GB (have 141GB)
Qwen/Qwen3-Embedding-0.6B	Q8	Fits comfortably	451.95 tok/sEstimated	6GB (have 141GB)
trl-internal-testing/tiny-Qwen2ForCausalLM-2.5	Q4	Fits comfortably	716.31 tok/sEstimated	4GB (have 141GB)
trl-internal-testing/tiny-Qwen2ForCausalLM-2.5	Q8	Fits comfortably	438.05 tok/sEstimated	7GB (have 141GB)
Qwen/Qwen2.5-0.5B-Instruct	FP16	Fits comfortably	245.18 tok/sEstimated	11GB (have 141GB)
Qwen/Qwen3-Next-80B-A3B-Instruct	Q4	Fits comfortably	125.93 tok/sEstimated	39GB (have 141GB)
Qwen/Qwen3-Next-80B-A3B-Instruct	Q8	Fits comfortably	102.14 tok/sEstimated	78GB (have 141GB)
Qwen/Qwen3-Next-80B-A3B-Instruct	FP16	Not supported	56.23 tok/sEstimated	156GB (have 141GB)
allenai/OLMo-2-0425-1B	Q4	Fits comfortably	879.27 tok/sEstimated	1GB (have 141GB)
openai-community/gpt2-large	FP16	Fits comfortably	242.81 tok/sEstimated	15GB (have 141GB)
Qwen/Qwen3-1.7B	Q4	Fits comfortably	632.44 tok/sEstimated	4GB (have 141GB)
Qwen/Qwen3-1.7B	Q8	Fits comfortably	505.75 tok/sEstimated	7GB (have 141GB)
Qwen/Qwen3-4B	Q8	Fits comfortably	436.11 tok/sEstimated	4GB (have 141GB)
Qwen/Qwen3-4B	FP16	Fits comfortably	277.19 tok/sEstimated	9GB (have 141GB)
Qwen/Qwen3-30B-A3B-Instruct-2507	Q4	Fits comfortably	343.71 tok/s (estimated)	15GB (have 141GB)
google-t5/t5-3b	Q8	Fits comfortably	543.51 tok/s (estimated)	3GB (have 141GB)
google-t5/t5-3b	FP16	Fits comfortably	343.50 tok/s (estimated)	6GB (have 141GB)
meta-llama/Meta-Llama-3-8B-Instruct	FP16	Fits comfortably	237.97 tok/s (estimated)	17GB (have 141GB)
deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B	Q4	Fits comfortably	661.64 tok/s (estimated)	3GB (have 141GB)
deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B	Q8	Fits comfortably	508.74 tok/s (estimated)	5GB (have 141GB)
Qwen/Qwen2.5-1.5B	Q8	Fits comfortably	479.10 tok/s (estimated)	5GB (have 141GB)
Qwen/Qwen2.5-1.5B	FP16	Fits comfortably	254.57 tok/s (estimated)	11GB (have 141GB)
Qwen/Qwen2.5-14B-Instruct	Q4	Fits comfortably	567.14 tok/s (estimated)	7GB (have 141GB)
Qwen/Qwen2.5-14B-Instruct	Q8	Fits comfortably	332.17 tok/s (estimated)	14GB (have 141GB)
Qwen/Qwen2.5-0.5B	Q8	Fits comfortably	444.56 tok/s (estimated)	5GB (have 141GB)
Qwen/Qwen2.5-0.5B	FP16	Fits comfortably	261.43 tok/s (estimated)	11GB (have 141GB)
meta-llama/Llama-3.1-70B-Instruct	Q4	Fits comfortably	238.34 tok/s (estimated)	34GB (have 141GB)
zai-org/GLM-4.6-FP8	Q8	Fits comfortably	516.40 tok/s (estimated)	7GB (have 141GB)
zai-org/GLM-4.6-FP8	FP16	Fits comfortably	256.62 tok/s (estimated)	15GB (have 141GB)
deepseek-ai/DeepSeek-V3.1	FP16	Fits comfortably	280.41 tok/s (estimated)	15GB (have 141GB)
meta-llama/Llama-3.1-8B	Q4	Fits comfortably	747.62 tok/s (estimated)	4GB (have 141GB)
meta-llama/Llama-3.1-8B	Q8	Fits comfortably	494.06 tok/s (estimated)	9GB (have 141GB)
LiquidAI/LFM2-1.2B	Q8	Fits comfortably	587.46 tok/s (estimated)	2GB (have 141GB)
unsloth/Meta-Llama-3.1-8B-Instruct	Q8	Fits comfortably	457.50 tok/s (estimated)	9GB (have 141GB)
unsloth/Meta-Llama-3.1-8B-Instruct	FP16	Fits comfortably	261.98 tok/s (estimated)	17GB (have 141GB)
meta-llama/Meta-Llama-3-70B-Instruct	Q4	Fits comfortably	237.83 tok/s (estimated)	34GB (have 141GB)
meta-llama/Meta-Llama-3-70B-Instruct	Q8	Fits comfortably	185.61 tok/s (estimated)	68GB (have 141GB)
Qwen/Qwen2.5-Math-1.5B	FP16	Fits comfortably	285.10 tok/s (estimated)	11GB (have 141GB)
trl-internal-testing/tiny-random-LlamaForCausalLM	Q4	Fits comfortably	718.80 tok/s (estimated)	4GB (have 141GB)
Qwen/Qwen3-Embedding-4B	Q4	Fits comfortably	662.33 tok/s (estimated)	2GB (have 141GB)
Qwen/Qwen3-Embedding-4B	Q8	Fits comfortably	477.16 tok/s (estimated)	4GB (have 141GB)
Qwen/Qwen3-Embedding-4B	FP16	Fits comfortably	265.14 tok/s (estimated)	9GB (have 141GB)
unsloth/mistral-7b-v0.3-bnb-4bit	FP16	Fits comfortably	241.52 tok/s (estimated)	15GB (have 141GB)
Qwen/Qwen2.5-14B	Q4	Fits comfortably	491.17 tok/s (estimated)	7GB (have 141GB)
Qwen/Qwen2.5-14B	Q8	Fits comfortably	340.33 tok/s (estimated)	14GB (have 141GB)
Qwen/Qwen2-1.5B-Instruct	Q8	Fits comfortably	473.49 tok/s (estimated)	5GB (have 141GB)
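The VRAM column roughly tracks parameter count times bits per weight. A minimal sketch, assuming nominal widths of 4, 8 and 16 bits for Q4, Q8 and FP16 and counting weights only (no KV cache, activations or runtime overhead); the function names are illustrative, not part of any tool. For a 70B model it gives about 35GB at Q4 and 70GB at Q8, close to the 34GB and 68GB listed for the Llama 70B rows above.

```python
# Rough weight-memory estimate: parameters x bits per weight / 8.
# Assumption: weights only. KV cache, activations and runtime overhead
# are ignored, so treat the result as a lower bound on VRAM needed.

BITS_PER_WEIGHT = {"Q4": 4, "Q8": 8, "FP16": 16}  # nominal widths (assumption)


def weight_gb(params_billions: float, quant: str) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billions * BITS_PER_WEIGHT[quant] / 8


def fits(params_billions: float, quant: str, vram_gb: float = 141.0) -> bool:
    """True if the weights alone fit in vram_gb (default: the H200's 141GB)."""
    return weight_gb(params_billions, quant) <= vram_gb


# Hypothetical 80B-parameter model at each quantization level.
for quant in ("Q4", "Q8", "FP16"):
    need = weight_gb(80, quant)
    print(f"80B {quant}: ~{need:.0f} GB, fits in 141GB: {fits(80, quant)}")
```

Real Q4 formats usually store slightly more than 4 bits per weight once quantization scales are included (GGUF Q4_0, for instance, works out to 4.5 bits), and long contexts add KV-cache memory on top, so leave headroom beyond this figure.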

Note: All performance figures above are calculated estimates, not measured results; real-world throughput may vary.
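For single-stream generation, a common rule of thumb treats decoding as memory-bandwidth bound: each new token reads every weight from memory once, so tokens/sec cannot exceed bandwidth divided by weight size. A sketch, assuming batch size 1, a dense model, and the H200 SXM's 4.8 TB/s of HBM3e bandwidth; it is not necessarily the methodology behind the estimates on this page, and the function name is illustrative.

```python
# Rough batch-1 decode ceiling: every generated token streams all weights
# from HBM once, so tok/s <= memory bandwidth / weight bytes.
# Assumptions: dense model, batch size 1, weights dominate memory traffic,
# and perfect bandwidth utilisation (real kernels reach less).

H200_BANDWIDTH_GBPS = 4800.0  # H200 SXM: 4.8 TB/s HBM3e, expressed in GB/s


def decode_ceiling_tok_s(weight_gb: float, bandwidth_gbps: float = H200_BANDWIDTH_GBPS) -> float:
    """Upper bound on batch-1 tokens/sec for weights occupying weight_gb."""
    if weight_gb <= 0:
        raise ValueError("weight_gb must be positive")
    return bandwidth_gbps / weight_gb


print(f"8B at Q4 (~4GB): {decode_ceiling_tok_s(4):.0f} tok/s")
print(f"70B at Q4 (~34GB): {decode_ceiling_tok_s(34):.0f} tok/s")
```

Batched serving shares each weight read across many sequences, so aggregate throughput can exceed this ceiling; for mixture-of-experts models, only the active experts' weights are read per token, which raises the bound.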

Alternative GPUs

Compare how these cards stack up for local inference workloads:

RTX 5070 · 12GB VRAM
RTX 4060 Ti 16GB · 16GB VRAM
RX 6800 XT · 16GB VRAM
RTX 4070 Super · 12GB VRAM
RTX 3080 · 10GB VRAM
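To check whether a model from the benchmark table would fit on one of these cheaper cards, compare its required VRAM with each card's capacity. A minimal sketch using the VRAM figures listed on this page; the helper name is illustrative, and it treats the table's requirement as the whole budget, ignoring extra memory for longer contexts.

```python
# VRAM of the alternative cards listed above, in GB.
ALTERNATIVES = {
    "RTX 5070": 12,
    "RTX 4060 Ti 16GB": 16,
    "RX 6800 XT": 16,
    "RTX 4070 Super": 12,
    "RTX 3080": 10,
}


def cards_that_fit(required_gb: float, cards: dict[str, int] = ALTERNATIVES) -> list[str]:
    """Return the cards whose VRAM is at least required_gb, largest first.

    sorted() is stable, so cards with equal VRAM keep their listed order.
    """
    fitting = [name for name, vram in cards.items() if vram >= required_gb]
    return sorted(fitting, key=lambda name: cards[name], reverse=True)


print(cards_that_fit(15))  # e.g. Qwen3-30B-A3B-Instruct-2507 at Q4
print(cards_that_fit(34))  # e.g. Llama-3.1-70B-Instruct at Q4
```

With the table's figures, a 9GB requirement (Llama-3.1-8B at Q8) fits all five cards, while a 34GB requirement (Llama-3.1-70B-Instruct at Q4) fits none of them, which is where the H200's 141GB matters.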