Quick Answer: The NVIDIA A100 40GB PCIe offers 40GB of VRAM at an MSRP of $9,000; street prices vary with the market. Its estimated throughput is about 305 tokens/sec on deepseek-ai/DeepSeek-OCR at Q4 quantization, and its TDP is 250W.

NVIDIA A100 40GB PCIe

By NVIDIA · Released 2020-05 · MSRP $9,000.00

This GPU offers reliable throughput for local AI workloads. Choose a quantization level (Q4, Q8, or FP16) that balances the tokens/sec you want against the VRAM the model needs.

Specs snapshot

Key hardware metrics for AI workloads.

VRAM: 40GB
CUDA cores: 6,912
TDP: 250W
Architecture: Ampere

💡 Not ready to buy? Try cloud GPUs first

Test NVIDIA A100 40GB PCIe performance in the cloud before investing in hardware. Pay by the hour with no commitment.

Vast.ai: from $0.20/hr · RunPod: from $0.30/hr · Lambda Labs: enterprise-grade
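As a rough way to weigh renting against buying, the sketch below divides the $9,000 MSRP by the hourly "from" rates quoted on this page. It assumes those rates apply to an A100 40GB instance, which the page does not confirm, and it ignores electricity, hosting, and resale value. The function name `break_even_hours` is illustrative only.

```python
def break_even_hours(purchase_price_usd: float, hourly_rate_usd: float) -> float:
    """Hours of cloud rental whose total cost equals the purchase price."""
    if hourly_rate_usd <= 0:
        raise ValueError("hourly rate must be positive")
    return purchase_price_usd / hourly_rate_usd


MSRP_USD = 9000.00  # MSRP quoted on this page
# "From" prices quoted on this page; assumed (not confirmed) to be A100 rates.
CLOUD_RATES_USD_PER_HOUR = {"Vast.ai": 0.20, "RunPod": 0.30}

for provider, rate in CLOUD_RATES_USD_PER_HOUR.items():
    hours = break_even_hours(MSRP_USD, rate)
    years = hours / 24 / 365  # continuous, round-the-clock use
    print(f"{provider}: {hours:,.0f} hours (about {years:.1f} years of continuous use)")
```

At these rates the break-even point is roughly 45,000 hours for Vast.ai and 30,000 hours for RunPod, so short experiments are likely cheaper in the cloud.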

AI benchmarks

Model	Quantization	Tokens/sec (estimated, auto-generated benchmark)	VRAM used
deepseek-ai/DeepSeek-OCR	Q4	305.06 tok/s	2GB
google/gemma-2b	Q4	299.56 tok/s	1GB
google-t5/t5-3b	Q4	292.06 tok/s	2GB
unsloth/Llama-3.2-1B-Instruct	Q4	292.02 tok/s	1GB
meta-llama/Llama-3.2-1B	Q4	291.66 tok/s	1GB
nari-labs/Dia2-2B	Q4	291.41 tok/s	2GB
unsloth/Llama-3.2-3B-Instruct	Q4	288.71 tok/s	2GB
meta-llama/Llama-3.2-3B-Instruct	Q4	288.46 tok/s	2GB
ibm-research/PowerMoE-3b	Q4	286.95 tok/s	2GB
bigcode/starcoder2-3b	Q4	286.26 tok/s	2GB
google/gemma-3-1b-it	Q4	286.00 tok/s	1GB
facebook/sam3	Q4	283.67 tok/s	1GB
google-bert/bert-base-uncased	Q4	279.10 tok/s	1GB
inference-net/Schematron-3B	Q4	278.91 tok/s	2GB
google/gemma-2-2b-it	Q4	278.86 tok/s	1GB
unsloth/gemma-3-1b-it	Q4	272.47 tok/s	1GB
meta-llama/Llama-3.2-3B	Q4	267.77 tok/s	2GB
WeiboAI/VibeThinker-1.5B	Q4	266.87 tok/s	1GB
allenai/OLMo-2-0425-1B	Q4	265.74 tok/s	1GB
context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16	Q4	263.70 tok/s	2GB
meta-llama/Llama-3.2-1B-Instruct	Q4	261.73 tok/s	1GB
Qwen/Qwen2.5-3B	Q4	261.26 tok/s	2GB
meta-llama/Llama-Guard-3-1B	Q4	258.34 tok/s	1GB
deepseek-ai/deepseek-coder-1.3b-instruct	Q4	257.39 tok/s	2GB
LiquidAI/LFM2-1.2B	Q4	255.31 tok/s	1GB
apple/OpenELM-1_1B-Instruct	Q4	253.25 tok/s	1GB
google/embeddinggemma-300m	Q4	253.20 tok/s	1GB
TinyLlama/TinyLlama-1.1B-Chat-v1.0	Q4	253.12 tok/s	1GB
ibm-granite/granite-3.3-2b-instruct	Q4	252.58 tok/s	1GB
IlyaGusev/saiga_llama3_8b	Q4	249.78 tok/s	4GB
deepseek-ai/DeepSeek-R1-0528	Q4	249.76 tok/s	4GB
microsoft/Phi-3.5-mini-instruct	Q4	249.38 tok/s	4GB
deepseek-ai/DeepSeek-R1-Distill-Llama-8B	Q4	249.03 tok/s	4GB
mistralai/Mistral-7B-v0.1	Q4	247.89 tok/s	4GB
trl-internal-testing/tiny-Qwen2ForCausalLM-2.5	Q4	247.60 tok/s	4GB
tencent/HunyuanOCR	Q4	247.59 tok/s	1GB
meta-llama/Llama-Guard-3-8B	Q4	247.34 tok/s	4GB
GSAI-ML/LLaDA-8B-Instruct	Q4	247.20 tok/s	4GB
Qwen/Qwen2.5-7B	Q4	246.36 tok/s	4GB
Qwen/Qwen2.5-3B-Instruct	Q4	245.70 tok/s	2GB
Qwen/Qwen3-4B	Q4	245.62 tok/s	2GB
trl-internal-testing/tiny-LlamaForCausalLM-3.2	Q4	245.56 tok/s	4GB
Qwen/Qwen3-8B	Q4	245.44 tok/s	4GB
Qwen/Qwen3-Reranker-0.6B	Q4	245.36 tok/s	3GB
Qwen/Qwen3-Embedding-4B	Q4	245.00 tok/s	2GB
HuggingFaceTB/SmolLM-135M	Q4	244.51 tok/s	4GB
meta-llama/Meta-Llama-3-8B	Q4	244.31 tok/s	4GB
black-forest-labs/FLUX.2-dev	Q4	244.26 tok/s	4GB
microsoft/DialoGPT-small	Q4	243.04 tok/s	4GB
allenai/Olmo-3-7B-Think	Q4	242.76 tok/s	4GB

Note: Performance figures are calculated estimates, not measurements. Real results may vary.
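Because the figures above are calculated rather than measured, it can help to time your own setup. The following is a minimal sketch of a timing harness, not the methodology behind this page. `generate` is a hypothetical callable that you would wrap around your own inference runtime; it is assumed to return the sequence of generated tokens for a prompt.

```python
import time
from typing import Callable, Sequence


def measure_tokens_per_sec(
    generate: Callable[[str], Sequence],
    prompt: str,
    runs: int = 3,
) -> float:
    """Return total generated tokens divided by total wall-clock seconds.

    `generate` is a hypothetical wrapper around your inference runtime.
    The timing covers the whole call, so prompt processing is included.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    generate(prompt)  # warm-up call, excluded from the timing
    total_tokens = 0
    total_seconds = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        tokens = generate(prompt)
        total_seconds += time.perf_counter() - start
        total_tokens += len(tokens)
    return total_tokens / total_seconds
```

Using the same prompt, output length, and quantization for every run keeps the numbers comparable with each other.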

Model compatibility

Model	Quantization	Verdict	Estimated speed	VRAM needed
openai-community/gpt2	FP16	Fits comfortably	82.14 tok/s	15GB (have 40GB)
Qwen/Qwen2.5-7B-Instruct	Q4	Fits comfortably	212.12 tok/s	4GB (have 40GB)
Qwen/Qwen2.5-7B-Instruct	Q8	Fits comfortably	157.17 tok/s	7GB (have 40GB)
Qwen/Qwen2.5-7B-Instruct	FP16	Fits comfortably	93.89 tok/s	15GB (have 40GB)
Qwen/Qwen3-0.6B	FP16	Fits comfortably	86.25 tok/s	13GB (have 40GB)
meta-llama/Llama-3.1-8B-Instruct	Q8	Fits comfortably	157.58 tok/s	9GB (have 40GB)
meta-llama/Llama-3.1-8B-Instruct	FP16	Fits comfortably	80.80 tok/s	17GB (have 40GB)
dphn/dolphin-2.9.1-yi-1.5-34b	Q4	Fits comfortably	82.01 tok/s	17GB (have 40GB)
dphn/dolphin-2.9.1-yi-1.5-34b	Q8	Fits comfortably	58.89 tok/s	35GB (have 40GB)
openai/gpt-oss-20b	FP16	Not supported	46.10 tok/s	41GB (have 40GB)
google/gemma-3-1b-it	Q4	Fits comfortably	286.00 tok/s	1GB (have 40GB)
google/gemma-3-1b-it	Q8	Fits comfortably	175.67 tok/s	1GB (have 40GB)
google/gemma-3-1b-it	FP16	Fits comfortably	96.54 tok/s	2GB (have 40GB)
Qwen/Qwen3-Embedding-0.6B	Q8	Fits comfortably	166.01 tok/s	6GB (have 40GB)
facebook/opt-125m	FP16	Fits comfortably	83.94 tok/s	15GB (have 40GB)
TinyLlama/TinyLlama-1.1B-Chat-v1.0	Q4	Fits comfortably	253.12 tok/s	1GB (have 40GB)
TinyLlama/TinyLlama-1.1B-Chat-v1.0	Q8	Fits comfortably	194.52 tok/s	1GB (have 40GB)
Qwen/Qwen3-4B-Instruct-2507	Q8	Fits comfortably	171.15 tok/s	4GB (have 40GB)
Qwen/Qwen3-4B-Instruct-2507	FP16	Fits comfortably	87.85 tok/s	9GB (have 40GB)
meta-llama/Llama-3.2-1B-Instruct	Q4	Fits comfortably	261.73 tok/s	1GB (have 40GB)
meta-llama/Llama-3.2-1B-Instruct	Q8	Fits comfortably	182.83 tok/s	1GB (have 40GB)
meta-llama/Llama-3.2-1B-Instruct	FP16	Fits comfortably	107.24 tok/s	2GB (have 40GB)
openai/gpt-oss-120b	Q4	Not supported	44.76 tok/s	59GB (have 40GB)
openai/gpt-oss-120b	Q8	Not supported	34.13 tok/s	117GB (have 40GB)
context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16	Q4	Fits comfortably	263.70 tok/s	2GB (have 40GB)
context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16	Q8	Fits comfortably	189.76 tok/s	3GB (have 40GB)
context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16	FP16	Fits comfortably	99.35 tok/s	6GB (have 40GB)
mistralai/Mistral-7B-Instruct-v0.2	Q4	Fits comfortably	230.33 tok/s	4GB (have 40GB)
mistralai/Mistral-7B-Instruct-v0.2	Q8	Fits comfortably	143.52 tok/s	7GB (have 40GB)
mistralai/Mistral-7B-Instruct-v0.2	FP16	Fits comfortably	86.44 tok/s	15GB (have 40GB)
Qwen/Qwen3-8B	Q8	Fits comfortably	160.82 tok/s	9GB (have 40GB)
Qwen/Qwen3-8B	FP16	Fits comfortably	78.21 tok/s	17GB (have 40GB)
inference-net/Schematron-3B	Q4	Fits comfortably	278.91 tok/s	2GB (have 40GB)
inference-net/Schematron-3B	Q8	Fits comfortably	204.68 tok/s	3GB (have 40GB)
inference-net/Schematron-3B	FP16	Fits comfortably	109.19 tok/s	6GB (have 40GB)
deepseek-ai/DeepSeek-R1-Distill-Qwen-32B	Q4	Fits comfortably	71.99 tok/s	16GB (have 40GB)
deepseek-ai/DeepSeek-R1-Distill-Qwen-32B	Q8	Fits comfortably	57.95 tok/s	33GB (have 40GB)
deepseek-ai/DeepSeek-R1-Distill-Qwen-32B	FP16	Not supported	30.91 tok/s	66GB (have 40GB)
Qwen/Qwen2.5-7B	FP16	Fits comfortably	92.88 tok/s	15GB (have 40GB)
Qwen/Qwen3-Next-80B-A3B-Instruct	Q4	Fits (tight)	44.99 tok/s	39GB (have 40GB)
allenai/OLMo-2-0425-1B	FP16	Fits comfortably	109.38 tok/s	2GB (have 40GB)
microsoft/Phi-3-mini-4k-instruct	Q4	Fits comfortably	239.34 tok/s	4GB (have 40GB)
microsoft/Phi-3-mini-4k-instruct	Q8	Fits comfortably	159.70 tok/s	7GB (have 40GB)
microsoft/Phi-3-mini-4k-instruct	FP16	Fits comfortably	82.87 tok/s	15GB (have 40GB)
openai-community/gpt2-large	Q8	Fits comfortably	144.36 tok/s	7GB (have 40GB)
openai-community/gpt2-large	FP16	Fits comfortably	93.75 tok/s	15GB (have 40GB)
Qwen/Qwen3-1.7B	Q4	Fits comfortably	214.33 tok/s	4GB (have 40GB)
Qwen/Qwen3-1.7B	Q8	Fits comfortably	148.42 tok/s	7GB (have 40GB)
Qwen/Qwen3-1.7B	FP16	Fits comfortably	79.76 tok/s	15GB (have 40GB)
openai-community/gpt2	Q8	Fits comfortably	158.89 tok/s	7GB (have 40GB)

Note: Performance figures are calculated estimates, not measurements. Real results may vary.
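The page does not state how "VRAM needed" or the verdicts are derived. A common rule of thumb, sketched below, is a weights-only estimate of parameter count times bytes per parameter (0.5 for Q4, 1 for Q8, 2 for FP16). Both function names and the 90% "tight" threshold are assumptions for illustration; the threshold was chosen only because it reproduces the 35GB, 39GB, and 41GB rows above. Real usage is higher once the KV cache and runtime overhead are counted.

```python
# Nominal bytes per parameter for each quantization level.
BYTES_PER_PARAM = {"Q4": 0.5, "Q8": 1.0, "FP16": 2.0}


def estimate_weight_gb(params_billion: float, quant: str) -> float:
    """Weights-only footprint in GB (1 GB = 1e9 bytes).

    Ignores KV cache, activations, and runtime overhead.
    """
    return params_billion * BYTES_PER_PARAM[quant]


def verdict(required_gb: float, available_gb: float = 40.0,
            tight_fraction: float = 0.9) -> str:
    """Classify fit; `tight_fraction` is an assumed threshold, not the page's."""
    if required_gb > available_gb:
        return "Not supported"
    if required_gb > tight_fraction * available_gb:
        return "Fits (tight)"
    return "Fits comfortably"


for quant in ("Q4", "Q8", "FP16"):
    needed = estimate_weight_gb(32, quant)  # a 32B-parameter model
    print(f"32B at {quant}: about {needed:.0f}GB of weights, {verdict(needed)}")
```

For a 32B model this gives 16GB, 32GB, and 64GB of weights, in line with the 16GB, 33GB, and 66GB listed for deepseek-ai/DeepSeek-R1-Distill-Qwen-32B.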

Alternative GPUs

RTX 5070: 12GB
RTX 4060 Ti 16GB: 16GB
RX 6800 XT: 16GB
RTX 4070 Super: 12GB
RTX 3080: 10GB

NVIDIA A100 40GB PCIe

Check availability

By NVIDIAReleased 2020-05MSRP $9,000.00

This GPU offers reliable throughput for local AI workloads. Pair it with the right model quantization to hit your desired tokens/sec, and monitor prices below to catch the best deal.

Search on Amazon View Benchmarks

Specs snapshot

Key hardware metrics for AI workloads.

VRAM40GB

Cores6,912

TDP250W

ArchitectureAmpere

Where to Buy

Buy directly on Amazon with fast shipping and reliable customer service.

No purchase links available yet. Try the Amazon search results to find this GPU.

💡 Not ready to buy? Try cloud GPUs first

Test NVIDIA A100 40GB PCIe performance in the cloud before investing in hardware. Pay by the hour with no commitment.

Vast.aifrom $0.20/hr RunPodfrom $0.30/hr Lambda Labsenterprise-grade

AI benchmarks

Model	Quantization	Tokens/sec	VRAM used
deepseek-ai/DeepSeek-OCR	Q4	305.06 tok/sEstimated Auto-generated benchmark	2GB
google/gemma-2b	Q4	299.56 tok/sEstimated Auto-generated benchmark	1GB
google-t5/t5-3b	Q4	292.06 tok/sEstimated Auto-generated benchmark	2GB
unsloth/Llama-3.2-1B-Instruct	Q4	292.02 tok/sEstimated Auto-generated benchmark	1GB
meta-llama/Llama-3.2-1B	Q4	291.66 tok/sEstimated Auto-generated benchmark	1GB
nari-labs/Dia2-2B	Q4	291.41 tok/sEstimated Auto-generated benchmark	2GB
unsloth/Llama-3.2-3B-Instruct	Q4	288.71 tok/sEstimated Auto-generated benchmark	2GB
meta-llama/Llama-3.2-3B-Instruct	Q4	288.46 tok/sEstimated Auto-generated benchmark	2GB
ibm-research/PowerMoE-3b	Q4	286.95 tok/sEstimated Auto-generated benchmark	2GB
bigcode/starcoder2-3b	Q4	286.26 tok/sEstimated Auto-generated benchmark	2GB
google/gemma-3-1b-it	Q4	286.00 tok/sEstimated Auto-generated benchmark	1GB
facebook/sam3	Q4	283.67 tok/sEstimated Auto-generated benchmark	1GB
google-bert/bert-base-uncased	Q4	279.10 tok/sEstimated Auto-generated benchmark	1GB
inference-net/Schematron-3B	Q4	278.91 tok/sEstimated Auto-generated benchmark	2GB
google/gemma-2-2b-it	Q4	278.86 tok/sEstimated Auto-generated benchmark	1GB
unsloth/gemma-3-1b-it	Q4	272.47 tok/sEstimated Auto-generated benchmark	1GB
meta-llama/Llama-3.2-3B	Q4	267.77 tok/sEstimated Auto-generated benchmark	2GB
WeiboAI/VibeThinker-1.5B	Q4	266.87 tok/sEstimated Auto-generated benchmark	1GB
allenai/OLMo-2-0425-1B	Q4	265.74 tok/sEstimated Auto-generated benchmark	1GB
context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16	Q4	263.70 tok/sEstimated Auto-generated benchmark	2GB
meta-llama/Llama-3.2-1B-Instruct	Q4	261.73 tok/sEstimated Auto-generated benchmark	1GB
Qwen/Qwen2.5-3B	Q4	261.26 tok/sEstimated Auto-generated benchmark	2GB
meta-llama/Llama-Guard-3-1B	Q4	258.34 tok/sEstimated Auto-generated benchmark	1GB
deepseek-ai/deepseek-coder-1.3b-instruct	Q4	257.39 tok/sEstimated Auto-generated benchmark	2GB
LiquidAI/LFM2-1.2B	Q4	255.31 tok/sEstimated Auto-generated benchmark	1GB
apple/OpenELM-1_1B-Instruct	Q4	253.25 tok/sEstimated Auto-generated benchmark	1GB
google/embeddinggemma-300m	Q4	253.20 tok/sEstimated Auto-generated benchmark	1GB
TinyLlama/TinyLlama-1.1B-Chat-v1.0	Q4	253.12 tok/sEstimated Auto-generated benchmark	1GB
ibm-granite/granite-3.3-2b-instruct	Q4	252.58 tok/sEstimated Auto-generated benchmark	1GB
IlyaGusev/saiga_llama3_8b	Q4	249.78 tok/sEstimated Auto-generated benchmark	4GB
deepseek-ai/DeepSeek-R1-0528	Q4	249.76 tok/sEstimated Auto-generated benchmark	4GB
microsoft/Phi-3.5-mini-instruct	Q4	249.38 tok/sEstimated Auto-generated benchmark	4GB
deepseek-ai/DeepSeek-R1-Distill-Llama-8B	Q4	249.03 tok/sEstimated Auto-generated benchmark	4GB
mistralai/Mistral-7B-v0.1	Q4	247.89 tok/sEstimated Auto-generated benchmark	4GB
trl-internal-testing/tiny-Qwen2ForCausalLM-2.5	Q4	247.60 tok/sEstimated Auto-generated benchmark	4GB
tencent/HunyuanOCR	Q4	247.59 tok/sEstimated Auto-generated benchmark	1GB
meta-llama/Llama-Guard-3-8B	Q4	247.34 tok/sEstimated Auto-generated benchmark	4GB
GSAI-ML/LLaDA-8B-Instruct	Q4	247.20 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen2.5-7B	Q4	246.36 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen2.5-3B-Instruct	Q4	245.70 tok/sEstimated Auto-generated benchmark	2GB
Qwen/Qwen3-4B	Q4	245.62 tok/sEstimated Auto-generated benchmark	2GB
trl-internal-testing/tiny-LlamaForCausalLM-3.2	Q4	245.56 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen3-8B	Q4	245.44 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen3-Reranker-0.6B	Q4	245.36 tok/sEstimated Auto-generated benchmark	3GB
Qwen/Qwen3-Embedding-4B	Q4	245.00 tok/sEstimated Auto-generated benchmark	2GB
HuggingFaceTB/SmolLM-135M	Q4	244.51 tok/sEstimated Auto-generated benchmark	4GB
meta-llama/Meta-Llama-3-8B	Q4	244.31 tok/sEstimated Auto-generated benchmark	4GB
black-forest-labs/FLUX.2-dev	Q4	244.26 tok/sEstimated Auto-generated benchmark	4GB
microsoft/DialoGPT-small	Q4	243.04 tok/sEstimated Auto-generated benchmark	4GB
allenai/Olmo-3-7B-Think	Q4	242.76 tok/sEstimated Auto-generated benchmark	4GB

deepseek-ai/DeepSeek-OCR

2GB

305.06 tok/sEstimated

Auto-generated benchmark

google/gemma-2b

1GB

299.56 tok/sEstimated

Auto-generated benchmark

google-t5/t5-3b

2GB

292.06 tok/sEstimated

Auto-generated benchmark

unsloth/Llama-3.2-1B-Instruct

1GB

292.02 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.2-1B

1GB

291.66 tok/sEstimated

Auto-generated benchmark

nari-labs/Dia2-2B

2GB

291.41 tok/sEstimated

Auto-generated benchmark

unsloth/Llama-3.2-3B-Instruct

2GB

288.71 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.2-3B-Instruct

2GB

288.46 tok/sEstimated

Auto-generated benchmark

ibm-research/PowerMoE-3b

2GB

286.95 tok/sEstimated

Auto-generated benchmark

bigcode/starcoder2-3b

2GB

286.26 tok/sEstimated

Auto-generated benchmark

google/gemma-3-1b-it

1GB

286.00 tok/sEstimated

Auto-generated benchmark

facebook/sam3

1GB

283.67 tok/sEstimated

Auto-generated benchmark

google-bert/bert-base-uncased

1GB

279.10 tok/sEstimated

Auto-generated benchmark

inference-net/Schematron-3B

2GB

278.91 tok/sEstimated

Auto-generated benchmark

google/gemma-2-2b-it

1GB

278.86 tok/sEstimated

Auto-generated benchmark

unsloth/gemma-3-1b-it

1GB

272.47 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.2-3B

2GB

267.77 tok/sEstimated

Auto-generated benchmark

WeiboAI/VibeThinker-1.5B

1GB

266.87 tok/sEstimated

Auto-generated benchmark

allenai/OLMo-2-0425-1B

1GB

265.74 tok/sEstimated

Auto-generated benchmark

context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16

2GB

263.70 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.2-1B-Instruct

1GB

261.73 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-3B

2GB

261.26 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-Guard-3-1B

1GB

258.34 tok/sEstimated

Auto-generated benchmark

deepseek-ai/deepseek-coder-1.3b-instruct

2GB

257.39 tok/sEstimated

Auto-generated benchmark

LiquidAI/LFM2-1.2B

1GB

255.31 tok/sEstimated

Auto-generated benchmark

apple/OpenELM-1_1B-Instruct

1GB

253.25 tok/sEstimated

Auto-generated benchmark

google/embeddinggemma-300m

1GB

253.20 tok/sEstimated

Auto-generated benchmark

TinyLlama/TinyLlama-1.1B-Chat-v1.0

1GB

253.12 tok/sEstimated

Auto-generated benchmark

ibm-granite/granite-3.3-2b-instruct

1GB

252.58 tok/sEstimated

Auto-generated benchmark

IlyaGusev/saiga_llama3_8b

4GB

249.78 tok/sEstimated

Auto-generated benchmark

deepseek-ai/DeepSeek-R1-0528

4GB

249.76 tok/sEstimated

Auto-generated benchmark

microsoft/Phi-3.5-mini-instruct

4GB

249.38 tok/sEstimated

Auto-generated benchmark

deepseek-ai/DeepSeek-R1-Distill-Llama-8B

4GB

249.03 tok/sEstimated

Auto-generated benchmark

mistralai/Mistral-7B-v0.1

4GB

247.89 tok/sEstimated

Auto-generated benchmark

trl-internal-testing/tiny-Qwen2ForCausalLM-2.5

4GB

247.60 tok/sEstimated

Auto-generated benchmark

tencent/HunyuanOCR

1GB

247.59 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-Guard-3-8B

4GB

247.34 tok/sEstimated

Auto-generated benchmark

GSAI-ML/LLaDA-8B-Instruct

4GB

247.20 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-7B

4GB

246.36 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-3B-Instruct

2GB

245.70 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen3-4B

2GB

245.62 tok/sEstimated

Auto-generated benchmark

trl-internal-testing/tiny-LlamaForCausalLM-3.2

4GB

245.56 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen3-8B

4GB

245.44 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen3-Reranker-0.6B

3GB

245.36 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen3-Embedding-4B

2GB

245.00 tok/sEstimated

Auto-generated benchmark

HuggingFaceTB/SmolLM-135M

4GB

244.51 tok/sEstimated

Auto-generated benchmark

meta-llama/Meta-Llama-3-8B

4GB

244.31 tok/sEstimated

Auto-generated benchmark

black-forest-labs/FLUX.2-dev

4GB

244.26 tok/sEstimated

Auto-generated benchmark

microsoft/DialoGPT-small

4GB

243.04 tok/sEstimated

Auto-generated benchmark

allenai/Olmo-3-7B-Think

4GB

242.76 tok/sEstimated

Auto-generated benchmark

Note: Performance estimates are calculated. Real results may vary. Methodology · Submit real data

Model compatibility

Model	Quantization	Verdict	Estimated speed	VRAM needed
openai-community/gpt2	FP16	Fits comfortably	82.14 tok/sEstimated	15GB (have 40GB)
Qwen/Qwen2.5-7B-Instruct	Q4	Fits comfortably	212.12 tok/sEstimated	4GB (have 40GB)
Qwen/Qwen2.5-7B-Instruct	Q8	Fits comfortably	157.17 tok/sEstimated	7GB (have 40GB)
Qwen/Qwen2.5-7B-Instruct	FP16	Fits comfortably	93.89 tok/sEstimated	15GB (have 40GB)
Qwen/Qwen3-0.6B	FP16	Fits comfortably	86.25 tok/sEstimated	13GB (have 40GB)
meta-llama/Llama-3.1-8B-Instruct	Q8	Fits comfortably	157.58 tok/sEstimated	9GB (have 40GB)
meta-llama/Llama-3.1-8B-Instruct	FP16	Fits comfortably	80.80 tok/sEstimated	17GB (have 40GB)
dphn/dolphin-2.9.1-yi-1.5-34b	Q4	Fits comfortably	82.01 tok/sEstimated	17GB (have 40GB)
dphn/dolphin-2.9.1-yi-1.5-34b	Q8	Fits comfortably	58.89 tok/sEstimated	35GB (have 40GB)
openai/gpt-oss-20b	FP16	Not supported	46.10 tok/sEstimated	41GB (have 40GB)
google/gemma-3-1b-it	Q4	Fits comfortably	286.00 tok/sEstimated	1GB (have 40GB)
google/gemma-3-1b-it	Q8	Fits comfortably	175.67 tok/sEstimated	1GB (have 40GB)
google/gemma-3-1b-it	FP16	Fits comfortably	96.54 tok/sEstimated	2GB (have 40GB)
Qwen/Qwen3-Embedding-0.6B	Q8	Fits comfortably	166.01 tok/sEstimated	6GB (have 40GB)
facebook/opt-125m	FP16	Fits comfortably	83.94 tok/sEstimated	15GB (have 40GB)
TinyLlama/TinyLlama-1.1B-Chat-v1.0	Q4	Fits comfortably	253.12 tok/sEstimated	1GB (have 40GB)
TinyLlama/TinyLlama-1.1B-Chat-v1.0	Q8	Fits comfortably	194.52 tok/sEstimated	1GB (have 40GB)
Qwen/Qwen3-4B-Instruct-2507	Q8	Fits comfortably	171.15 tok/sEstimated	4GB (have 40GB)
Qwen/Qwen3-4B-Instruct-2507	FP16	Fits comfortably	87.85 tok/s (estimated)	9GB (have 40GB)
meta-llama/Llama-3.2-1B-Instruct	Q4	Fits comfortably	261.73 tok/s (estimated)	1GB (have 40GB)
meta-llama/Llama-3.2-1B-Instruct	Q8	Fits comfortably	182.83 tok/s (estimated)	1GB (have 40GB)
meta-llama/Llama-3.2-1B-Instruct	FP16	Fits comfortably	107.24 tok/s (estimated)	2GB (have 40GB)
openai/gpt-oss-120b	Q4	Not supported	44.76 tok/s (estimated)	59GB (have 40GB)
openai/gpt-oss-120b	Q8	Not supported	34.13 tok/s (estimated)	117GB (have 40GB)
context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16	Q4	Fits comfortably	263.70 tok/s (estimated)	2GB (have 40GB)
context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16	Q8	Fits comfortably	189.76 tok/s (estimated)	3GB (have 40GB)
context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16	FP16	Fits comfortably	99.35 tok/s (estimated)	6GB (have 40GB)
mistralai/Mistral-7B-Instruct-v0.2	Q4	Fits comfortably	230.33 tok/s (estimated)	4GB (have 40GB)
mistralai/Mistral-7B-Instruct-v0.2	Q8	Fits comfortably	143.52 tok/s (estimated)	7GB (have 40GB)
mistralai/Mistral-7B-Instruct-v0.2	FP16	Fits comfortably	86.44 tok/s (estimated)	15GB (have 40GB)
Qwen/Qwen3-8B	Q8	Fits comfortably	160.82 tok/s (estimated)	9GB (have 40GB)
Qwen/Qwen3-8B	FP16	Fits comfortably	78.21 tok/s (estimated)	17GB (have 40GB)
inference-net/Schematron-3B	Q4	Fits comfortably	278.91 tok/s (estimated)	2GB (have 40GB)
inference-net/Schematron-3B	Q8	Fits comfortably	204.68 tok/s (estimated)	3GB (have 40GB)
inference-net/Schematron-3B	FP16	Fits comfortably	109.19 tok/s (estimated)	6GB (have 40GB)
deepseek-ai/DeepSeek-R1-Distill-Qwen-32B	Q4	Fits comfortably	71.99 tok/s (estimated)	16GB (have 40GB)
deepseek-ai/DeepSeek-R1-Distill-Qwen-32B	Q8	Fits comfortably	57.95 tok/s (estimated)	33GB (have 40GB)
deepseek-ai/DeepSeek-R1-Distill-Qwen-32B	FP16	Not supported	30.91 tok/s (estimated)	66GB (have 40GB)
Qwen/Qwen2.5-7B	FP16	Fits comfortably	92.88 tok/s (estimated)	15GB (have 40GB)
Qwen/Qwen3-Next-80B-A3B-Instruct	Q4	Fits (tight)	44.99 tok/s (estimated)	39GB (have 40GB)
allenai/OLMo-2-0425-1B	FP16	Fits comfortably	109.38 tok/s (estimated)	2GB (have 40GB)
microsoft/Phi-3-mini-4k-instruct	Q4	Fits comfortably	239.34 tok/s (estimated)	4GB (have 40GB)
microsoft/Phi-3-mini-4k-instruct	Q8	Fits comfortably	159.70 tok/s (estimated)	7GB (have 40GB)
microsoft/Phi-3-mini-4k-instruct	FP16	Fits comfortably	82.87 tok/s (estimated)	15GB (have 40GB)
openai-community/gpt2-large	Q8	Fits comfortably	144.36 tok/s (estimated)	7GB (have 40GB)
openai-community/gpt2-large	FP16	Fits comfortably	93.75 tok/s (estimated)	15GB (have 40GB)
Qwen/Qwen3-1.7B	Q4	Fits comfortably	214.33 tok/s (estimated)	4GB (have 40GB)
Qwen/Qwen3-1.7B	Q8	Fits comfortably	148.42 tok/s (estimated)	7GB (have 40GB)
Qwen/Qwen3-1.7B	FP16	Fits comfortably	79.76 tok/s (estimated)	15GB (have 40GB)
openai-community/gpt2	Q8	Fits comfortably	158.89 tok/s (estimated)	7GB (have 40GB)
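To compare quantization levels using the rows above, the table can be read programmatically. The following is a minimal sketch, assuming each row is tab-separated with five cells (model, quantization, fit status, tokens/sec, VRAM). The `Row` type and `parse_row` helper are hypothetical names introduced for illustration, and the speed pattern tolerates the status label being fused directly to "tok/s".

```python
import re
from typing import NamedTuple, Optional


class Row(NamedTuple):
    model: str
    quant: str
    status: str
    tokens_per_sec: float
    vram_required_gb: int
    vram_available_gb: int


# Matches "230.33 tok/s" whether or not a label such as "Estimated" follows.
_SPEED = re.compile(r"([\d.]+)\s*tok/s")
# Matches "4GB (have 40GB)".
_VRAM = re.compile(r"(\d+)GB \(have (\d+)GB\)")


def parse_row(line: str) -> Optional[Row]:
    """Parse one tab-separated table row; return None if it does not match."""
    cells = [c.strip() for c in line.rstrip("\n").split("\t")]
    if len(cells) != 5:
        return None
    model, quant, status, speed_cell, vram_cell = cells
    speed = _SPEED.search(speed_cell)
    vram = _VRAM.search(vram_cell)
    if speed is None or vram is None:
        return None
    return Row(model, quant, status, float(speed.group(1)),
               int(vram.group(1)), int(vram.group(2)))


# Three rows copied from the table above.
sample = [
    "mistralai/Mistral-7B-Instruct-v0.2\tQ4\tFits comfortably\t230.33 tok/sEstimated\t4GB (have 40GB)",
    "mistralai/Mistral-7B-Instruct-v0.2\tFP16\tFits comfortably\t86.44 tok/sEstimated\t15GB (have 40GB)",
    "openai/gpt-oss-120b\tQ4\tNot supported\t44.76 tok/sEstimated\t59GB (have 40GB)",
]
parsed = [r for r in map(parse_row, sample) if r is not None]

# Keep only rows whose required VRAM fits on the card.
runnable = [r for r in parsed if r.vram_required_gb <= r.vram_available_gb]

# Estimated throughput gain of Q4 over FP16 for the same model.
speedup = parsed[0].tokens_per_sec / parsed[1].tokens_per_sec
print(f"{parsed[0].model}: Q4 is about {speedup:.2f}x the FP16 estimate")
```

Filtering on the VRAM columns rather than the status text matters here because rows marked "Not supported" still carry a tokens/sec estimate, which should not be read as achievable throughput on this card.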

Note: Performance figures are calculated estimates rather than measured benchmarks, so real results may vary.
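The page does not publish how its fit labels or VRAM figures are derived, so the following is only a sketch of one plausible approach. It assumes a weights-only memory estimate (parameters times bits per weight, ignoring KV cache and runtime overhead) and a hypothetical `tight_fraction` threshold of 0.95. Both `weights_gb` and `fit_status` are illustrative names, not part of any tool referenced here. Any threshold between roughly 0.88 and 0.97 would reproduce the labels shown for this 40GB card.

```python
def weights_gb(params_billion: float, bits_per_weight: int) -> float:
    """Weights-only memory estimate in GB (assumption: no KV cache or overhead).

    Assumes Q4 = 4 bits, Q8 = 8 bits, FP16 = 16 bits per weight; real
    quantization formats usually need slightly more.
    """
    return params_billion * bits_per_weight / 8


def fit_status(required_gb: float, available_gb: float,
               tight_fraction: float = 0.95) -> str:
    """Classify a model against available VRAM.

    tight_fraction is an assumed cutoff, chosen only so that the rows
    listed on this page are labeled the same way.
    """
    if required_gb > available_gb:
        return "Not supported"
    if required_gb >= tight_fraction * available_gb:
        return "Fits (tight)"
    return "Fits comfortably"


# Required-VRAM figures taken from rows on this page (40GB card).
for required in (16, 35, 39, 41):
    print(f"{required}GB required: {fit_status(required, 40)}")

# Rough weights-only estimate for a 32B-parameter model at 4 bits per weight.
print(f"32B at Q4: about {weights_gb(32, 4):.0f}GB of weights")
```

A weights-only figure is a lower bound; long contexts and large batch sizes add KV-cache memory on top, so a model labeled "Fits (tight)" may still run out of memory in practice.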

Alternative GPUs

RTX 5070 (12GB): Explore how RTX 5070 stacks up for local inference workloads.

RTX 4060 Ti 16GB (16GB): Explore how RTX 4060 Ti 16GB stacks up for local inference workloads.

RX 6800 XT (16GB): Explore how RX 6800 XT stacks up for local inference workloads.

RTX 4070 Super (12GB): Explore how RTX 4070 Super stacks up for local inference workloads.

RTX 3080 (10GB): Explore how RTX 3080 stacks up for local inference workloads.