localai.computer
© 2025 localai.computer. Hardware recommendations for running AI models locally.


Quick Answer: The NVIDIA H200 SXM 141GB offers 141GB of VRAM, delivers an estimated 918 tokens/sec on deepseek-ai/DeepSeek-OCR (Q4), and typically draws 700W under load. MSRP is $35,000; street pricing varies.

NVIDIA H200 SXM 141GB

By NVIDIA · Released 2023-11 · MSRP $35,000.00

This GPU offers reliable throughput for local AI workloads. Pair it with the right model quantization to reach your target tokens/sec, and watch the listings below to catch the best deal.
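As a rough guide to what quantization buys you, a model's weight footprint is about parameter count × bits per weight ÷ 8, plus runtime overhead for the KV cache and activations. A minimal sketch of that rule of thumb (the 20% overhead factor is an assumption for illustration, not this site's methodology):

```python
def estimate_vram_gb(params_billions: float, bits: int, overhead: float = 1.2) -> float:
    """Rough VRAM estimate: weights at the given bit width plus ~20% overhead."""
    weight_gb = params_billions * bits / 8  # 1B params at 8 bits is roughly 1 GB
    return round(weight_gb * overhead, 1)

# An 8B model at Q4 needs on the order of 5 GB; at FP16, closer to 19 GB.
print(estimate_vram_gb(8, 4))   # 4.8
print(estimate_vram_gb(8, 16))  # 19.2
```

This is why the benchmark tables below show 7B-8B models fitting in around 4GB at Q4: dropping from 16-bit to 4-bit weights cuts the footprint roughly fourfold.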

Specs snapshot

Key hardware metrics for AI workloads.

  • VRAM: 141GB
  • Cores: 16,896
  • TDP: 700W
  • Architecture: Hopper

Where to Buy

No direct purchase links are available yet. Try the Amazon search results to find this GPU.

💡 Not ready to buy? Try cloud GPUs first

Test NVIDIA H200 SXM 141GB performance in the cloud before investing in hardware. Pay by the hour with no commitment.

  • Vast.ai: from $0.20/hr
  • RunPod: from $0.30/hr
  • Lambda Labs: enterprise-grade
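For perspective on rent-vs-buy, divide the purchase price by the hourly rate to get the break-even point. The rates above are entry-level listings; an actual H200 rental will cost considerably more, so the $2.50/hr figure below is a hypothetical placeholder, not a quoted price:

```python
def breakeven_hours(msrp: float, hourly_rate: float) -> float:
    """Hours of cloud rental whose total cost equals buying the card outright."""
    return msrp / hourly_rate

# At a hypothetical $2.50/hr H200 rental, the $35,000 MSRP buys 14,000 cloud
# hours (about 19 months of 24/7 use) before owning breaks even.
print(breakeven_hours(35_000, 2.50))  # 14000.0
```

Unless you expect sustained, near-continuous utilization, renting by the hour is usually the cheaper way to evaluate this class of hardware.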

AI benchmarks

All figures below are auto-generated estimates at Q4 quantization.

| Model | Quantization | Tokens/sec (est.) | VRAM used |
| --- | --- | --- | --- |
| deepseek-ai/DeepSeek-OCR | Q4 | 918.04 | 2GB |
| ibm-research/PowerMoE-3b | Q4 | 899.99 | 2GB |
| google/embeddinggemma-300m | Q4 | 892.88 | 1GB |
| google-bert/bert-base-uncased | Q4 | 890.49 | 1GB |
| google/gemma-2b | Q4 | 890.18 | 1GB |
| unsloth/Llama-3.2-1B-Instruct | Q4 | 887.95 | 1GB |
| google/gemma-3-1b-it | Q4 | 882.07 | 1GB |
| apple/OpenELM-1_1B-Instruct | Q4 | 880.62 | 1GB |
| allenai/OLMo-2-0425-1B | Q4 | 879.27 | 1GB |
| Qwen/Qwen2.5-3B | Q4 | 874.57 | 2GB |
| facebook/sam3 | Q4 | 854.80 | 1GB |
| ibm-granite/granite-3.3-2b-instruct | Q4 | 845.59 | 1GB |
| inference-net/Schematron-3B | Q4 | 844.62 | 2GB |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | 831.79 | 1GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | 824.89 | 2GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | 822.25 | 1GB |
| LiquidAI/LFM2-1.2B | Q4 | 818.45 | 1GB |
| meta-llama/Llama-3.2-1B | Q4 | 814.74 | 1GB |
| bigcode/starcoder2-3b | Q4 | 807.54 | 2GB |
| WeiboAI/VibeThinker-1.5B | Q4 | 806.27 | 1GB |
| google/gemma-2-2b-it | Q4 | 801.78 | 1GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 794.66 | 2GB |
| meta-llama/Llama-Guard-3-1B | Q4 | 793.70 | 1GB |
| Qwen/Qwen2.5-3B-Instruct | Q4 | 790.53 | 2GB |
| meta-llama/Llama-3.2-3B | Q4 | 783.01 | 2GB |
| unsloth/gemma-3-1b-it | Q4 | 775.54 | 1GB |
| unsloth/Llama-3.2-3B-Instruct | Q4 | 764.26 | 2GB |
| google-t5/t5-3b | Q4 | 760.55 | 2GB |
| tencent/HunyuanOCR | Q4 | 760.33 | 1GB |
| black-forest-labs/FLUX.2-dev | Q4 | 757.49 | 4GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q4 | 756.55 | 4GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | 752.49 | 2GB |
| zai-org/GLM-4.6-FP8 | Q4 | 751.25 | 4GB |
| zai-org/GLM-4.5-Air | Q4 | 750.94 | 4GB |
| llamafactory/tiny-random-Llama-3 | Q4 | 750.21 | 4GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | 749.40 | 2GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | 749.00 | 4GB |
| microsoft/Phi-4-multimodal-instruct | Q4 | 747.88 | 4GB |
| meta-llama/Llama-3.1-8B | Q4 | 747.62 | 4GB |
| Qwen/Qwen2.5-1.5B | Q4 | 746.78 | 3GB |
| Qwen/Qwen2.5-Math-1.5B | Q4 | 746.03 | 3GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q4 | 745.77 | 4GB |
| Qwen/Qwen-Image-Edit-2509 | Q4 | 744.63 | 4GB |
| nari-labs/Dia2-2B | Q4 | 744.44 | 2GB |
| distilbert/distilgpt2 | Q4 | 744.14 | 4GB |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | 744.01 | 3GB |
| Qwen/Qwen3-Embedding-0.6B | Q4 | 743.37 | 3GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q4 | 743.33 | 4GB |
| numind/NuExtract-1.5 | Q4 | 741.76 | 4GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q4 | 741.46 | 4GB |

Note: Performance estimates are calculated. Real results may vary. Methodology · Submit real data
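Since the figures above are calculated estimates, the most reliable check is to time a real generation run and divide tokens produced by elapsed time. A minimal harness, where the `generate` callable is a hypothetical stand-in for whatever inference call you actually use (llama.cpp, vLLM, etc.):

```python
import time

def tokens_per_sec(generate, prompt: str) -> float:
    """Time one generation call and return throughput in tokens/sec."""
    start = time.perf_counter()
    tokens = generate(prompt)  # assumed to return the list of generated tokens
    elapsed = time.perf_counter() - start
    return len(tokens) / elapsed

# Stand-in generator for demonstration; swap in your real inference call.
def fake_generate(prompt: str) -> list:
    return prompt.split() * 100

rate = tokens_per_sec(fake_generate, "benchmark this prompt please")
print(f"{rate:.0f} tok/s")
```

Averaging several runs after a warm-up pass gives a steadier number, since the first call often pays one-time model-loading and kernel-compilation costs.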

Model compatibility

All speeds are auto-generated estimates; VRAM needed is compared against the H200's 141GB.

| Model | Quantization | Verdict | Est. speed (tok/s) | VRAM needed |
| --- | --- | --- | --- | --- |
| codellama/CodeLlama-34b-hf | Q4 | Fits comfortably | 262.21 | 17GB |
| google/gemma-3-1b-it | FP16 | Fits comfortably | 315.12 | 2GB |
| Qwen/Qwen3-Embedding-0.6B | Q4 | Fits comfortably | 743.37 | 3GB |
| Qwen/Qwen3-0.6B | FP16 | Fits comfortably | 277.30 | 13GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q4 | Fits comfortably | 628.06 | 3GB |
| Qwen/Qwen3-Embedding-0.6B | Q8 | Fits comfortably | 451.95 | 6GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q4 | Fits comfortably | 716.31 | 4GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | Fits comfortably | 438.05 | 7GB |
| Qwen/Qwen2.5-0.5B-Instruct | FP16 | Fits comfortably | 245.18 | 11GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q4 | Fits comfortably | 125.93 | 39GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q8 | Fits comfortably | 102.14 | 78GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct | FP16 | Not supported | 56.23 | 156GB |
| allenai/OLMo-2-0425-1B | Q4 | Fits comfortably | 879.27 | 1GB |
| openai-community/gpt2-large | FP16 | Fits comfortably | 242.81 | 15GB |
| Qwen/Qwen3-1.7B | Q4 | Fits comfortably | 632.44 | 4GB |
| Qwen/Qwen3-1.7B | Q8 | Fits comfortably | 505.75 | 7GB |
| Qwen/Qwen3-4B | Q8 | Fits comfortably | 436.11 | 4GB |
| Qwen/Qwen3-4B | FP16 | Fits comfortably | 277.19 | 9GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q4 | Fits comfortably | 343.71 | 15GB |
| google-t5/t5-3b | Q8 | Fits comfortably | 543.51 | 3GB |
| google-t5/t5-3b | FP16 | Fits comfortably | 343.50 | 6GB |
| meta-llama/Meta-Llama-3-8B-Instruct | FP16 | Fits comfortably | 237.97 | 17GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q4 | Fits comfortably | 661.64 | 3GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q8 | Fits comfortably | 508.74 | 5GB |
| Qwen/Qwen2.5-1.5B | Q8 | Fits comfortably | 479.10 | 5GB |
| Qwen/Qwen2.5-1.5B | FP16 | Fits comfortably | 254.57 | 11GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | Fits comfortably | 567.14 | 7GB |
| Qwen/Qwen2.5-14B-Instruct | Q8 | Fits comfortably | 332.17 | 14GB |
| Qwen/Qwen2.5-0.5B | Q8 | Fits comfortably | 444.56 | 5GB |
| Qwen/Qwen2.5-0.5B | FP16 | Fits comfortably | 261.43 | 11GB |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | Fits comfortably | 238.34 | 34GB |
| zai-org/GLM-4.6-FP8 | Q8 | Fits comfortably | 516.40 | 7GB |
| zai-org/GLM-4.6-FP8 | FP16 | Fits comfortably | 256.62 | 15GB |
| deepseek-ai/DeepSeek-V3.1 | FP16 | Fits comfortably | 280.41 | 15GB |
| meta-llama/Llama-3.1-8B | Q4 | Fits comfortably | 747.62 | 4GB |
| meta-llama/Llama-3.1-8B | Q8 | Fits comfortably | 494.06 | 9GB |
| LiquidAI/LFM2-1.2B | Q8 | Fits comfortably | 587.46 | 2GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 457.50 | 9GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | FP16 | Fits comfortably | 261.98 | 17GB |
| meta-llama/Meta-Llama-3-70B-Instruct | Q4 | Fits comfortably | 237.83 | 34GB |
| meta-llama/Meta-Llama-3-70B-Instruct | Q8 | Fits comfortably | 185.61 | 68GB |
| Qwen/Qwen2.5-Math-1.5B | FP16 | Fits comfortably | 285.10 | 11GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 718.80 | 4GB |
| Qwen/Qwen3-Embedding-4B | Q4 | Fits comfortably | 662.33 | 2GB |
| Qwen/Qwen3-Embedding-4B | Q8 | Fits comfortably | 477.16 | 4GB |
| Qwen/Qwen3-Embedding-4B | FP16 | Fits comfortably | 265.14 | 9GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | FP16 | Fits comfortably | 241.52 | 15GB |
| Qwen/Qwen2.5-14B | Q4 | Fits comfortably | 491.17 | 7GB |
| Qwen/Qwen2.5-14B | Q8 | Fits comfortably | 340.33 | 14GB |
| Qwen/Qwen2-1.5B-Instruct | Q8 | Fits comfortably | 473.49 | 5GB |

Note: Performance estimates are calculated. Real results may vary. Methodology · Submit real data
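The verdicts above come down to comparing a model's VRAM requirement against the card's capacity. A sketch of that check (the 90% "tight fit" threshold is an assumption for illustration, not the site's exact methodology):

```python
def fit_verdict(vram_needed_gb: float, vram_available_gb: float = 141) -> str:
    """Classify whether a model's VRAM requirement fits on the GPU."""
    if vram_needed_gb > vram_available_gb:
        return "Not supported"
    if vram_needed_gb > 0.9 * vram_available_gb:
        return "Tight fit"  # assumed intermediate bucket, for illustration
    return "Fits comfortably"

print(fit_verdict(39))   # Fits comfortably (Qwen3-Next-80B at Q4)
print(fit_verdict(156))  # Not supported (same model at FP16)
```

This matches the table: the only "Not supported" entry is the one whose requirement (156GB) exceeds the H200's 141GB.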

Alternative GPUs

Explore how these cards stack up for local inference workloads:

  • RTX 5070 (12GB)
  • RTX 4060 Ti (16GB)
  • RX 6800 XT (16GB)
  • RTX 4070 Super (12GB)
  • RTX 3080 (10GB)