© 2025 localai.computer. Hardware recommendations for running AI models locally.

ℹ️ We earn from qualifying purchases through affiliate links at no extra cost to you. This supports our free content and research.


Quick Answer: The NVIDIA H100 PCIe 80GB offers 80GB of VRAM and delivers an estimated 414 tokens/sec on Qwen/Qwen2.5-3B-Instruct at Q4 quantization. It typically draws 350W under load; see below for current market pricing.

NVIDIA H100 PCIe 80GB

By NVIDIA · Released 2023-03 · MSRP $25,000.00

This GPU offers reliable throughput for local AI workloads. Pair it with the right model quantization to hit your desired tokens/sec, and monitor prices below to catch the best deal.
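A model's weight footprint scales roughly with parameter count times bits per weight, which is why quantization choice matters so much. The helper below is a minimal sketch of that estimate; the 1.2 overhead factor for KV cache and runtime buffers is an assumption, not a measured value:

```python
def est_vram_gb(params_b: float, bits: int, overhead: float = 1.2) -> float:
    """Rough VRAM estimate: parameter count (billions) times bytes per weight,
    scaled by an assumed overhead factor for KV cache and runtime buffers."""
    weight_gb = params_b * 1e9 * (bits / 8) / 1024**3
    return round(weight_gb * overhead, 1)

print(est_vram_gb(8, 4))   # ~4.5 GB for an 8B model at Q4
print(est_vram_gb(8, 16))  # FP16 needs roughly 4x the Q4 footprint
```

At Q4, even 30B-class models fit in a fraction of this card's 80GB, leaving headroom for long contexts or batched serving.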

Specs snapshot

Key hardware metrics for AI workloads:

  • VRAM: 80GB
  • Cores: 16,896
  • TDP: 350W
  • Architecture: Hopper

Where to Buy

No purchase links are available yet. Try searching Amazon to find this GPU; buying there gets you fast shipping and reliable customer service.

💡 Not ready to buy? Try cloud GPUs first

Test NVIDIA H100 PCIe 80GB performance in the cloud before investing in hardware. Pay by the hour with no commitment.

  • Vast.ai: from $0.20/hr
  • RunPod: from $0.30/hr
  • Lambda Labs: enterprise-grade

AI benchmarks

All figures below are auto-generated estimates.

| Model | Quantization | Tokens/sec | VRAM used |
|---|---|---|---|
| Qwen/Qwen2.5-3B-Instruct | Q4 | 414.49 | 2GB |
| WeiboAI/VibeThinker-1.5B | Q4 | 408.61 | 1GB |
| google/gemma-2-2b-it | Q4 | 405.35 | 1GB |
| apple/OpenELM-1_1B-Instruct | Q4 | 405.06 | 1GB |
| LiquidAI/LFM2-1.2B | Q4 | 401.13 | 1GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 399.11 | 2GB |
| google-bert/bert-base-uncased | Q4 | 398.51 | 1GB |
| allenai/OLMo-2-0425-1B | Q4 | 394.28 | 1GB |
| unsloth/gemma-3-1b-it | Q4 | 394.22 | 1GB |
| inference-net/Schematron-3B | Q4 | 393.70 | 2GB |
| bigcode/starcoder2-3b | Q4 | 393.29 | 2GB |
| meta-llama/Llama-3.2-3B | Q4 | 392.49 | 2GB |
| meta-llama/Llama-3.2-1B | Q4 | 391.10 | 1GB |
| google-t5/t5-3b | Q4 | 386.78 | 2GB |
| unsloth/Llama-3.2-1B-Instruct | Q4 | 384.08 | 1GB |
| deepseek-ai/DeepSeek-OCR | Q4 | 378.12 | 2GB |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | 377.46 | 1GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | 377.27 | 2GB |
| tencent/HunyuanOCR | Q4 | 374.47 | 1GB |
| meta-llama/Llama-Guard-3-1B | Q4 | 372.47 | 1GB |
| ibm-research/PowerMoE-3b | Q4 | 372.34 | 2GB |
| google/gemma-2b | Q4 | 370.82 | 1GB |
| google/embeddinggemma-300m | Q4 | 364.43 | 1GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | 359.22 | 2GB |
| nari-labs/Dia2-2B | Q4 | 350.22 | 2GB |
| Qwen/Qwen2.5-3B | Q4 | 350.13 | 2GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | 348.34 | 1GB |
| google/gemma-3-1b-it | Q4 | 346.03 | 1GB |
| facebook/sam3 | Q4 | 345.77 | 1GB |
| unsloth/Llama-3.2-3B-Instruct | Q4 | 344.29 | 2GB |
| distilbert/distilgpt2 | Q4 | 343.84 | 4GB |
| Qwen/Qwen2.5-1.5B | Q4 | 343.81 | 3GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | 343.31 | 3GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | 342.39 | 3GB |
| openai-community/gpt2-medium | Q4 | 342.13 | 4GB |
| deepseek-ai/DeepSeek-V3-0324 | Q4 | 341.97 | 4GB |
| ibm-granite/granite-3.3-2b-instruct | Q4 | 341.57 | 1GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q4 | 341.51 | 4GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | 341.03 | 4GB |
| meta-llama/Llama-3.1-8B | Q4 | 340.84 | 4GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q4 | 339.74 | 4GB |
| rednote-hilab/dots.ocr | Q4 | 339.60 | 4GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 339.43 | 4GB |
| rinna/japanese-gpt-neox-small | Q4 | 338.98 | 4GB |
| deepseek-ai/DeepSeek-V3.1 | Q4 | 337.19 | 4GB |
| HuggingFaceTB/SmolLM-135M | Q4 | 336.83 | 4GB |
| Tongyi-MAI/Z-Image-Turbo | Q4 | 336.09 | 4GB |
| Qwen/Qwen2.5-0.5B | Q4 | 335.88 | 3GB |
| Qwen/Qwen3-1.7B-Base | Q4 | 334.53 | 4GB |
| Qwen/Qwen2-1.5B-Instruct | Q4 | 334.26 | 3GB |

Note: Performance estimates are calculated. Real results may vary. Methodology · Submit real data
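These estimates follow the pattern you'd expect for memory-bandwidth-bound decoding: generating each token streams the full weight set, so tokens/sec is capped at roughly memory bandwidth divided by model size. The sketch below assumes purely bandwidth-bound decode (a simplification) and the H100 PCIe's roughly 2 TB/s of HBM bandwidth:

```python
def decode_tokens_per_sec(params_b: float, bits: int,
                          bandwidth_gb_s: float = 2000.0) -> float:
    """Upper-bound decode speed, assuming each token streams all weights once.
    bandwidth_gb_s defaults to ~2 TB/s (H100 PCIe HBM, approximate)."""
    model_gb = params_b * (bits / 8)  # weight bytes in GB (decimal)
    return bandwidth_gb_s / model_gb

print(round(decode_tokens_per_sec(8, 4)))  # 500
```

An 8B model at Q4 has a ceiling near 500 tok/s on this card, consistent with the ~341 tok/s estimate above once kernel and scheduling overheads are accounted for.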

Model compatibility

All speeds are estimates; available VRAM is 80GB.

| Model | Quantization | Verdict | Estimated speed | VRAM needed |
|---|---|---|---|---|
| mlx-community/gpt-oss-20b-MXFP4-Q8 | FP16 | Fits comfortably | 62.03 tok/s | 41GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | FP16 | Fits comfortably | 121.86 tok/s | 9GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q4 | Fits comfortably | 312.27 tok/s | 4GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q8 | Fits comfortably | 110.31 tok/s | 20GB |
| Qwen/QwQ-32B-Preview | Q4 | Fits comfortably | 103.33 tok/s | 17GB |
| Qwen/Qwen2.5-32B-Instruct | Q4 | Fits comfortably | 101.90 tok/s | 17GB |
| openai/gpt-oss-safeguard-20b | FP16 | Fits comfortably | 64.08 tok/s | 44GB |
| moonshotai/Kimi-K2-Thinking | Q8 | Not supported | 81.64 tok/s | 978GB |
| black-forest-labs/FLUX.1-dev | FP16 | Fits comfortably | 118.74 tok/s | 16GB |
| google/embeddinggemma-300m | Q4 | Fits comfortably | 364.43 tok/s | 1GB |
| WeiboAI/VibeThinker-1.5B | Q4 | Fits comfortably | 408.61 tok/s | 1GB |
| Qwen/Qwen2.5-7B-Instruct | FP16 | Fits comfortably | 109.05 tok/s | 15GB |
| Qwen/Qwen3-0.6B | Q4 | Fits comfortably | 288.51 tok/s | 3GB |
| Qwen/Qwen3-0.6B | Q8 | Fits comfortably | 210.96 tok/s | 6GB |
| bigscience/bloomz-560m | FP16 | Fits comfortably | 125.43 tok/s | 15GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q8 | Fits comfortably | 287.71 tok/s | 3GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | FP16 | Fits comfortably | 151.30 tok/s | 6GB |
| openai-community/gpt2 | Q4 | Fits comfortably | 310.90 tok/s | 4GB |
| openai-community/gpt2 | Q8 | Fits comfortably | 233.70 tok/s | 7GB |
| openai-community/gpt2 | FP16 | Fits comfortably | 119.66 tok/s | 15GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | Fits comfortably | 201.73 tok/s | 7GB |
| Qwen/Qwen3-0.6B | FP16 | Fits comfortably | 115.77 tok/s | 13GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 226.88 tok/s | 9GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | Fits comfortably | 341.03 tok/s | 4GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | Fits comfortably | 226.54 tok/s | 9GB |
| meta-llama/Meta-Llama-3-8B-Instruct | FP16 | Fits comfortably | 114.89 tok/s | 17GB |
| Qwen/Qwen3-Embedding-0.6B | Q8 | Fits comfortably | 241.43 tok/s | 6GB |
| Qwen/Qwen3-Embedding-0.6B | FP16 | Fits comfortably | 110.48 tok/s | 13GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | Fits comfortably | 343.31 tok/s | 3GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q8 | Fits comfortably | 210.08 tok/s | 5GB |
| Qwen/Qwen2.5-1.5B-Instruct | FP16 | Fits comfortably | 118.35 tok/s | 11GB |
| facebook/opt-125m | Q4 | Fits comfortably | 307.69 tok/s | 4GB |
| facebook/opt-125m | Q8 | Fits comfortably | 210.56 tok/s | 7GB |
| facebook/opt-125m | FP16 | Fits comfortably | 112.80 tok/s | 15GB |
| Qwen/Qwen3-4B-Instruct-2507 | FP16 | Fits comfortably | 129.28 tok/s | 9GB |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | Fits comfortably | 377.46 tok/s | 1GB |
| openai/gpt-oss-120b | Q4 | Fits comfortably | 59.12 tok/s | 59GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | Fits comfortably | 377.27 tok/s | 2GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q4 | Fits comfortably | 294.20 tok/s | 4GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q8 | Fits comfortably | 205.23 tok/s | 7GB |
| mistralai/Mistral-7B-Instruct-v0.2 | FP16 | Fits comfortably | 112.13 tok/s | 15GB |
| Qwen/Qwen3-8B | Q4 | Fits comfortably | 322.19 tok/s | 4GB |
| Qwen/Qwen3-8B | Q8 | Fits comfortably | 238.17 tok/s | 9GB |
| meta-llama/Llama-3.2-1B | Q4 | Fits comfortably | 391.10 tok/s | 1GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | Fits comfortably | 214.29 tok/s | 5GB |
| Qwen/Qwen2.5-0.5B-Instruct | FP16 | Fits comfortably | 119.33 tok/s | 11GB |
| Qwen/Qwen3-32B | Q4 | Fits comfortably | 103.00 tok/s | 16GB |
| Qwen/Qwen3-32B | Q8 | Fits comfortably | 72.37 tok/s | 33GB |
| allenai/OLMo-2-0425-1B | Q4 | Fits comfortably | 394.28 tok/s | 1GB |
| meta-llama/Llama-3.2-1B | FP16 | Fits comfortably | 138.71 tok/s | 2GB |

Note: Performance estimates are calculated. Real results may vary. Methodology · Submit real data
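The verdicts above boil down to comparing required versus available VRAM. A minimal sketch of that check; the verdict strings match the table, but the 90% "tight fit" threshold is an illustrative assumption, not the site's exact rule:

```python
def fit_verdict(required_gb: float, available_gb: float = 80.0) -> str:
    """Classify whether a model's VRAM requirement fits on this GPU."""
    if required_gb > available_gb:
        return "Not supported"
    if required_gb > 0.9 * available_gb:  # illustrative threshold
        return "Tight fit"
    return "Fits comfortably"

print(fit_verdict(41))    # Fits comfortably
print(fit_verdict(978))   # Not supported
```

Even the 59GB openai/gpt-oss-120b Q4 entry clears the comfortable threshold here, which is the main argument for an 80GB card over consumer parts.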

Alternative GPUs

Explore how these cards stack up for local inference workloads:

  • RTX 5070 (12GB)
  • RTX 4060 Ti 16GB (16GB)
  • RX 6800 XT (16GB)
  • RTX 4070 Super (12GB)
  • RTX 3080 (10GB)