© 2025 localai.computer. Hardware recommendations for running AI models locally.

ℹ️ We earn from qualifying purchases through affiliate links at no extra cost to you. This supports our free content and research.


Quick Answer: The NVIDIA H100 PCIe 80GB offers 80GB of VRAM and delivers an estimated 414 tokens/sec on Qwen/Qwen2.5-3B-Instruct at Q4 quantization. It typically draws 350W under load; see below for current market pricing.

NVIDIA H100 PCIe 80GB

By NVIDIA · Released 2023-03 · MSRP $25,000.00

This GPU offers reliable throughput for local AI workloads. Pair it with the right model quantization to hit your desired tokens/sec, and monitor prices below to catch the best deal.
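A model's weight footprint scales roughly with parameter count times bits per weight, which is why quantization choice matters so much. The helper below is a minimal sketch of that estimate; the 1.2 overhead factor for KV cache and runtime buffers is an assumption, not a measured value:

```python
def est_vram_gb(params_b: float, bits: int, overhead: float = 1.2) -> float:
    """Rough VRAM estimate: parameter count (billions) times bytes per weight,
    scaled by an assumed overhead factor for KV cache and runtime buffers."""
    weight_gb = params_b * 1e9 * (bits / 8) / 1024**3
    return round(weight_gb * overhead, 1)

print(est_vram_gb(8, 4))   # ~4.5 GB for an 8B model at Q4
print(est_vram_gb(8, 16))  # FP16 needs roughly 4x the Q4 footprint
```

At Q4, even 30B-class models fit in a fraction of this card's 80GB, leaving headroom for long contexts or batched serving.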

Specs snapshot

Key hardware metrics for AI workloads:

  • VRAM: 80GB
  • Cores: 16,896
  • TDP: 350W
  • Architecture: Hopper

Where to Buy

No purchase links are available yet. Try searching Amazon to find this GPU; buying there gets you fast shipping and reliable customer service.

💡 Not ready to buy? Try cloud GPUs first

Test NVIDIA H100 PCIe 80GB performance in the cloud before investing in hardware. Pay by the hour with no commitment.

  • Vast.ai: from $0.20/hr
  • RunPod: from $0.30/hr
  • Lambda Labs: enterprise-grade

AI benchmarks

All figures below are auto-generated estimates.

| Model | Quantization | Tokens/sec | VRAM used |
|---|---|---|---|
| Qwen/Qwen2.5-3B-Instruct | Q4 | 414.49 | 2GB |
| WeiboAI/VibeThinker-1.5B | Q4 | 408.61 | 1GB |
| google/gemma-2-2b-it | Q4 | 405.35 | 1GB |
| apple/OpenELM-1_1B-Instruct | Q4 | 405.06 | 1GB |
| LiquidAI/LFM2-1.2B | Q4 | 401.13 | 1GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 399.11 | 2GB |
| google-bert/bert-base-uncased | Q4 | 398.51 | 1GB |
| allenai/OLMo-2-0425-1B | Q4 | 394.28 | 1GB |
| unsloth/gemma-3-1b-it | Q4 | 394.22 | 1GB |
| inference-net/Schematron-3B | Q4 | 393.70 | 2GB |
| bigcode/starcoder2-3b | Q4 | 393.29 | 2GB |
| meta-llama/Llama-3.2-3B | Q4 | 392.49 | 2GB |
| meta-llama/Llama-3.2-1B | Q4 | 391.10 | 1GB |
| google-t5/t5-3b | Q4 | 386.78 | 2GB |
| unsloth/Llama-3.2-1B-Instruct | Q4 | 384.08 | 1GB |
| deepseek-ai/DeepSeek-OCR | Q4 | 378.12 | 2GB |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | 377.46 | 1GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | 377.27 | 2GB |
| tencent/HunyuanOCR | Q4 | 374.47 | 1GB |
| meta-llama/Llama-Guard-3-1B | Q4 | 372.47 | 1GB |
| ibm-research/PowerMoE-3b | Q4 | 372.34 | 2GB |
| google/gemma-2b | Q4 | 370.82 | 1GB |
| google/embeddinggemma-300m | Q4 | 364.43 | 1GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | 359.22 | 2GB |
| nari-labs/Dia2-2B | Q4 | 350.22 | 2GB |
| Qwen/Qwen2.5-3B | Q4 | 350.13 | 2GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | 348.34 | 1GB |
| google/gemma-3-1b-it | Q4 | 346.03 | 1GB |
| facebook/sam3 | Q4 | 345.77 | 1GB |
| unsloth/Llama-3.2-3B-Instruct | Q4 | 344.29 | 2GB |
| distilbert/distilgpt2 | Q4 | 343.84 | 4GB |
| Qwen/Qwen2.5-1.5B | Q4 | 343.81 | 3GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | 343.31 | 3GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | 342.39 | 3GB |
| openai-community/gpt2-medium | Q4 | 342.13 | 4GB |
| deepseek-ai/DeepSeek-V3-0324 | Q4 | 341.97 | 4GB |
| ibm-granite/granite-3.3-2b-instruct | Q4 | 341.57 | 1GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q4 | 341.51 | 4GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | 341.03 | 4GB |
| meta-llama/Llama-3.1-8B | Q4 | 340.84 | 4GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q4 | 339.74 | 4GB |
| rednote-hilab/dots.ocr | Q4 | 339.60 | 4GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 339.43 | 4GB |
| rinna/japanese-gpt-neox-small | Q4 | 338.98 | 4GB |
| deepseek-ai/DeepSeek-V3.1 | Q4 | 337.19 | 4GB |
| HuggingFaceTB/SmolLM-135M | Q4 | 336.83 | 4GB |
| Tongyi-MAI/Z-Image-Turbo | Q4 | 336.09 | 4GB |
| Qwen/Qwen2.5-0.5B | Q4 | 335.88 | 3GB |
| Qwen/Qwen3-1.7B-Base | Q4 | 334.53 | 4GB |
| Qwen/Qwen2-1.5B-Instruct | Q4 | 334.26 | 3GB |

Note: Performance estimates are calculated. Real results may vary. Methodology · Submit real data
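These estimates follow the pattern you'd expect for memory-bandwidth-bound decoding: generating each token streams the full weight set, so tokens/sec is capped at roughly memory bandwidth divided by model size. The sketch below assumes purely bandwidth-bound decode (a simplification) and the H100 PCIe's roughly 2 TB/s of HBM bandwidth:

```python
def decode_tokens_per_sec(params_b: float, bits: int,
                          bandwidth_gb_s: float = 2000.0) -> float:
    """Upper-bound decode speed, assuming each token streams all weights once.
    bandwidth_gb_s defaults to ~2 TB/s (H100 PCIe HBM, approximate)."""
    model_gb = params_b * (bits / 8)  # weight bytes in GB (decimal)
    return bandwidth_gb_s / model_gb

print(round(decode_tokens_per_sec(8, 4)))  # 500
```

An 8B model at Q4 has a ceiling near 500 tok/s on this card, consistent with the ~341 tok/s estimate above once kernel and scheduling overheads are accounted for.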

Model compatibility

All speeds are estimates; available VRAM is 80GB.

| Model | Quantization | Verdict | Estimated speed | VRAM needed |
|---|---|---|---|---|
| mlx-community/gpt-oss-20b-MXFP4-Q8 | FP16 | Fits comfortably | 62.03 tok/s | 41GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | FP16 | Fits comfortably | 121.86 tok/s | 9GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q4 | Fits comfortably | 312.27 tok/s | 4GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q8 | Fits comfortably | 110.31 tok/s | 20GB |
| Qwen/QwQ-32B-Preview | Q4 | Fits comfortably | 103.33 tok/s | 17GB |
| Qwen/Qwen2.5-32B-Instruct | Q4 | Fits comfortably | 101.90 tok/s | 17GB |
| openai/gpt-oss-safeguard-20b | FP16 | Fits comfortably | 64.08 tok/s | 44GB |
| moonshotai/Kimi-K2-Thinking | Q8 | Not supported | 81.64 tok/s | 978GB |
| black-forest-labs/FLUX.1-dev | FP16 | Fits comfortably | 118.74 tok/s | 16GB |
| google/embeddinggemma-300m | Q4 | Fits comfortably | 364.43 tok/s | 1GB |
| WeiboAI/VibeThinker-1.5B | Q4 | Fits comfortably | 408.61 tok/s | 1GB |
| Qwen/Qwen2.5-7B-Instruct | FP16 | Fits comfortably | 109.05 tok/s | 15GB |
| Qwen/Qwen3-0.6B | Q4 | Fits comfortably | 288.51 tok/s | 3GB |
| Qwen/Qwen3-0.6B | Q8 | Fits comfortably | 210.96 tok/s | 6GB |
| bigscience/bloomz-560m | FP16 | Fits comfortably | 125.43 tok/s | 15GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q8 | Fits comfortably | 287.71 tok/s | 3GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | FP16 | Fits comfortably | 151.30 tok/s | 6GB |
| openai-community/gpt2 | Q4 | Fits comfortably | 310.90 tok/s | 4GB |
| openai-community/gpt2 | Q8 | Fits comfortably | 233.70 tok/s | 7GB |
| openai-community/gpt2 | FP16 | Fits comfortably | 119.66 tok/s | 15GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | Fits comfortably | 201.73 tok/s | 7GB |
| Qwen/Qwen3-0.6B | FP16 | Fits comfortably | 115.77 tok/s | 13GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 226.88 tok/s | 9GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | Fits comfortably | 341.03 tok/s | 4GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | Fits comfortably | 226.54 tok/s | 9GB |
| meta-llama/Meta-Llama-3-8B-Instruct | FP16 | Fits comfortably | 114.89 tok/s | 17GB |
| Qwen/Qwen3-Embedding-0.6B | Q8 | Fits comfortably | 241.43 tok/s | 6GB |
| Qwen/Qwen3-Embedding-0.6B | FP16 | Fits comfortably | 110.48 tok/s | 13GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | Fits comfortably | 343.31 tok/s | 3GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q8 | Fits comfortably | 210.08 tok/s | 5GB |
| Qwen/Qwen2.5-1.5B-Instruct | FP16 | Fits comfortably | 118.35 tok/s | 11GB |
| facebook/opt-125m | Q4 | Fits comfortably | 307.69 tok/s | 4GB |
| facebook/opt-125m | Q8 | Fits comfortably | 210.56 tok/s | 7GB |
| facebook/opt-125m | FP16 | Fits comfortably | 112.80 tok/s | 15GB |
| Qwen/Qwen3-4B-Instruct-2507 | FP16 | Fits comfortably | 129.28 tok/s | 9GB |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | Fits comfortably | 377.46 tok/s | 1GB |
| openai/gpt-oss-120b | Q4 | Fits comfortably | 59.12 tok/s | 59GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | Fits comfortably | 377.27 tok/s | 2GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q4 | Fits comfortably | 294.20 tok/s | 4GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q8 | Fits comfortably | 205.23 tok/s | 7GB |
| mistralai/Mistral-7B-Instruct-v0.2 | FP16 | Fits comfortably | 112.13 tok/s | 15GB |
| Qwen/Qwen3-8B | Q4 | Fits comfortably | 322.19 tok/s | 4GB |
| Qwen/Qwen3-8B | Q8 | Fits comfortably | 238.17 tok/s | 9GB |
| meta-llama/Llama-3.2-1B | Q4 | Fits comfortably | 391.10 tok/s | 1GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | Fits comfortably | 214.29 tok/s | 5GB |
| Qwen/Qwen2.5-0.5B-Instruct | FP16 | Fits comfortably | 119.33 tok/s | 11GB |
| Qwen/Qwen3-32B | Q4 | Fits comfortably | 103.00 tok/s | 16GB |
| Qwen/Qwen3-32B | Q8 | Fits comfortably | 72.37 tok/s | 33GB |
| allenai/OLMo-2-0425-1B | Q4 | Fits comfortably | 394.28 tok/s | 1GB |
| meta-llama/Llama-3.2-1B | FP16 | Fits comfortably | 138.71 tok/s | 2GB |

Note: Performance estimates are calculated. Real results may vary. Methodology · Submit real data
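The verdicts above boil down to comparing required versus available VRAM. A minimal sketch of that check; the verdict strings match the table, but the 90% "tight fit" threshold is an illustrative assumption, not the site's exact rule:

```python
def fit_verdict(required_gb: float, available_gb: float = 80.0) -> str:
    """Classify whether a model's VRAM requirement fits on this GPU."""
    if required_gb > available_gb:
        return "Not supported"
    if required_gb > 0.9 * available_gb:  # illustrative threshold
        return "Tight fit"
    return "Fits comfortably"

print(fit_verdict(41))    # Fits comfortably
print(fit_verdict(978))   # Not supported
```

Even the 59GB openai/gpt-oss-120b Q4 entry clears the comfortable threshold here, which is the main argument for an 80GB card over consumer parts.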

Alternative GPUs

Explore how these cards stack up for local inference workloads:

  • RTX 5070 (12GB)
  • RTX 4060 Ti 16GB (16GB)
  • RX 6800 XT (16GB)
  • RTX 4070 Super (12GB)
  • RTX 3080 (10GB)