Quick Answer: NVIDIA A4000 offers 16GB VRAM and starts around $21.99. It delivers approximately 101 tokens/sec on Qwen/Qwen2.5-3B. It typically draws 140W under load.

NVIDIA A4000

By NVIDIA · Released 2021-04 · MSRP $999.00

This GPU offers reliable throughput for local AI workloads. Pair it with the right model quantization to hit your desired tokens/sec, and monitor prices below to catch the best deal.

Specs snapshot

Key hardware metrics for AI workloads.

VRAM: 16GB

Cores: 6,144

TDP: 140W

Architecture: Ampere

Where to Buy

Buy directly on Amazon with fast shipping and reliable customer service.

Amazon · Unknown · $21.99

💡 Not ready to buy? Try cloud GPUs first

Test NVIDIA A4000 performance in the cloud before investing in hardware. Pay by the hour with no commitment.

Vast.ai: from $0.20/hr · RunPod: from $0.30/hr · Lambda Labs: enterprise-grade
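
To weigh renting against buying, a rough break-even calculation can help. The sketch below divides a purchase price by an hourly cloud rate. It uses the MSRP and the hourly rates quoted on this page; treating the MSRP as the purchase price is an assumption, and the helper name `break_even_hours` is hypothetical. Electricity, resale value, and cloud storage fees are ignored.

```python
def break_even_hours(purchase_price_usd: float, cloud_rate_usd_per_hr: float) -> float:
    """Hours of cloud rental that cost as much as buying the card outright."""
    if cloud_rate_usd_per_hr <= 0:
        raise ValueError("cloud rate must be positive")
    return purchase_price_usd / cloud_rate_usd_per_hr


# Figures quoted on this page. Using the MSRP as the purchase price is an assumption.
MSRP_USD = 999.00
CLOUD_RATES_USD_PER_HR = {"Vast.ai": 0.20, "RunPod": 0.30}

for provider, rate in CLOUD_RATES_USD_PER_HR.items():
    hours = break_even_hours(MSRP_USD, rate)
    print(f"{provider}: about {hours:,.0f} rental hours match the MSRP")
```

If your expected usage is well below the break-even figure, hourly rental is likely the cheaper way to evaluate the card.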

AI benchmarks

Model	Quantization	Tokens/sec (estimated, auto-generated)	VRAM used
Qwen/Qwen2.5-3B	Q4	101.43 tok/s	2GB
allenai/OLMo-2-0425-1B	Q4	100.22 tok/s	1GB
LiquidAI/LFM2-1.2B	Q4	100.11 tok/s	1GB
deepseek-ai/DeepSeek-OCR	Q4	99.96 tok/s	2GB
google-t5/t5-3b	Q4	99.32 tok/s	2GB
bigcode/starcoder2-3b	Q4	99.02 tok/s	2GB
google/gemma-2-2b-it	Q4	98.78 tok/s	1GB
google/gemma-2b	Q4	97.97 tok/s	1GB
google/gemma-3-1b-it	Q4	96.76 tok/s	1GB
meta-llama/Llama-Guard-3-1B	Q4	96.67 tok/s	1GB
meta-llama/Llama-3.2-1B-Instruct	Q4	96.14 tok/s	1GB
google/embeddinggemma-300m	Q4	95.54 tok/s	1GB
unsloth/Llama-3.2-1B-Instruct	Q4	93.81 tok/s	1GB
inference-net/Schematron-3B	Q4	93.21 tok/s	2GB
ibm-research/PowerMoE-3b	Q4	93.03 tok/s	2GB
tencent/HunyuanOCR	Q4	92.59 tok/s	1GB
context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16	Q4	91.82 tok/s	2GB
meta-llama/Llama-3.2-1B	Q4	91.12 tok/s	1GB
apple/OpenELM-1_1B-Instruct	Q4	91.12 tok/s	1GB
meta-llama/Llama-3.2-3B	Q4	89.93 tok/s	2GB
deepseek-ai/deepseek-coder-1.3b-instruct	Q4	89.71 tok/s	2GB
WeiboAI/VibeThinker-1.5B	Q4	89.41 tok/s	1GB
facebook/sam3	Q4	89.24 tok/s	1GB
google-bert/bert-base-uncased	Q4	88.91 tok/s	1GB
unsloth/Llama-3.2-3B-Instruct	Q4	87.99 tok/s	2GB
unsloth/gemma-3-1b-it	Q4	86.96 tok/s	1GB
nari-labs/Dia2-2B	Q4	86.92 tok/s	2GB
Qwen/Qwen2.5-3B-Instruct	Q4	85.94 tok/s	2GB
Qwen/Qwen2.5-Coder-1.5B	Q4	84.34 tok/s	3GB
trl-internal-testing/tiny-LlamaForCausalLM-3.2	Q4	84.34 tok/s	4GB
allenai/Olmo-3-7B-Think	Q4	84.19 tok/s	4GB
microsoft/Phi-3-mini-4k-instruct	Q4	84.16 tok/s	4GB
ibm-granite/granite-3.3-2b-instruct	Q4	84.08 tok/s	1GB
meta-llama/Llama-3.2-3B-Instruct	Q4	83.93 tok/s	2GB
HuggingFaceH4/zephyr-7b-beta	Q4	83.76 tok/s	4GB
Qwen/Qwen3-4B-Base	Q4	83.72 tok/s	2GB
Qwen/Qwen3-8B	Q4	83.66 tok/s	4GB
TinyLlama/TinyLlama-1.1B-Chat-v1.0	Q4	83.60 tok/s	1GB
mistralai/Mistral-7B-v0.1	Q4	83.60 tok/s	4GB
Qwen/Qwen2.5-0.5B-Instruct	Q4	82.85 tok/s	3GB
dicta-il/dictalm2.0-instruct	Q4	82.80 tok/s	4GB
EleutherAI/gpt-neo-125m	Q4	82.71 tok/s	4GB
bigscience/bloomz-560m	Q4	82.70 tok/s	4GB
lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit	Q4	82.56 tok/s	2GB
meta-llama/Meta-Llama-3-8B	Q4	82.11 tok/s	4GB
meta-llama/Llama-2-7b-chat-hf	Q4	81.90 tok/s	4GB
Qwen/Qwen2-0.5B	Q4	81.86 tok/s	3GB
swiss-ai/Apertus-8B-Instruct-2509	Q4	81.64 tok/s	4GB
trl-internal-testing/tiny-random-LlamaForCausalLM	Q4	81.43 tok/s	4GB
Alibaba-NLP/gte-Qwen2-1.5B-instruct	Q4	81.38 tok/s	3GB

Note: Performance figures are calculated estimates, not measurements. Real results may vary.
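
To make a tokens/sec figure more tangible, it can be turned into an approximate wait time for a response of a given length. The sketch below does this for two rows from the table above. It assumes a steady decode rate and ignores prompt processing time, so it is a rough lower bound; the function name `generation_seconds` is hypothetical.

```python
def generation_seconds(num_tokens: int, tokens_per_sec: float) -> float:
    """Approximate time to generate num_tokens at a constant decode rate."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return num_tokens / tokens_per_sec


# Estimated Q4 rates taken from the benchmark table above.
ESTIMATED_TOK_PER_SEC = {
    "Qwen/Qwen2.5-3B": 101.43,
    "meta-llama/Meta-Llama-3-8B": 82.11,
}

RESPONSE_TOKENS = 500  # assumed response length, for illustration only

for model, rate in ESTIMATED_TOK_PER_SEC.items():
    seconds = generation_seconds(RESPONSE_TOKENS, rate)
    print(f"{model}: about {seconds:.1f} s for {RESPONSE_TOKENS} tokens")
```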

Model compatibility

Model	Quantization	Verdict	Estimated speed	VRAM needed
dphn/dolphin-2.9.1-yi-1.5-34b	Q8	Not supported	19.26 tok/s	35GB (have 16GB)
dphn/dolphin-2.9.1-yi-1.5-34b	FP16	Not supported	9.99 tok/s	70GB (have 16GB)
openai/gpt-oss-20b	Q4	Fits comfortably	38.59 tok/s	10GB (have 16GB)
openai/gpt-oss-20b	Q8	Not supported	27.86 tok/s	20GB (have 16GB)
openai/gpt-oss-20b	FP16	Not supported	16.95 tok/s	41GB (have 16GB)
google/gemma-3-1b-it	Q4	Fits comfortably	96.76 tok/s	1GB (have 16GB)
google/gemma-3-1b-it	FP16	Fits comfortably	36.87 tok/s	2GB (have 16GB)
Qwen/Qwen3-Embedding-0.6B	Q4	Fits comfortably	74.17 tok/s	3GB (have 16GB)
Qwen/Qwen3-Embedding-0.6B	Q8	Fits comfortably	57.22 tok/s	6GB (have 16GB)
Qwen/Qwen3-Embedding-0.6B	FP16	Fits comfortably	28.60 tok/s	13GB (have 16GB)
meta-llama/Llama-3.2-1B-Instruct	Q8	Fits comfortably	70.70 tok/s	1GB (have 16GB)
meta-llama/Llama-3.2-1B	Q8	Fits comfortably	70.78 tok/s	1GB (have 16GB)
Qwen/Qwen3-4B	FP16	Fits comfortably	27.90 tok/s	9GB (have 16GB)
Qwen/Qwen3-30B-A3B-Instruct-2507	Q4	Fits (tight)	46.18 tok/s	15GB (have 16GB)
meta-llama/Meta-Llama-3-8B-Instruct	Q4	Fits comfortably	79.06 tok/s	4GB (have 16GB)
deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B	Q4	Fits comfortably	76.74 tok/s	3GB (have 16GB)
deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B	Q8	Fits comfortably	51.65 tok/s	5GB (have 16GB)
Qwen/Qwen2.5-1.5B	FP16	Fits comfortably	27.62 tok/s	11GB (have 16GB)
Qwen/Qwen2.5-14B-Instruct	Q4	Fits comfortably	62.68 tok/s	7GB (have 16GB)
Qwen/Qwen3-Embedding-8B	Q8	Fits comfortably	54.58 tok/s	9GB (have 16GB)
Qwen/Qwen3-Embedding-8B	FP16	Not supported	31.39 tok/s	17GB (have 16GB)
Qwen/Qwen2-0.5B	FP16	Fits comfortably	31.18 tok/s	11GB (have 16GB)
deepseek-ai/deepseek-coder-1.3b-instruct	Q4	Fits comfortably	89.71 tok/s	2GB (have 16GB)
deepseek-ai/deepseek-coder-1.3b-instruct	Q8	Fits comfortably	66.51 tok/s	3GB (have 16GB)
deepseek-ai/deepseek-coder-1.3b-instruct	FP16	Fits comfortably	32.68 tok/s	6GB (have 16GB)
microsoft/phi-4	Q4	Fits comfortably	78.34 tok/s	4GB (have 16GB)
microsoft/phi-4	Q8	Fits comfortably	50.77 tok/s	7GB (have 16GB)
microsoft/phi-4	FP16	Fits (tight)	26.76 tok/s	15GB (have 16GB)
deepseek-ai/DeepSeek-V3.1	Q4	Fits comfortably	79.19 tok/s	4GB (have 16GB)
deepseek-ai/DeepSeek-V3.1	Q8	Fits comfortably	53.78 tok/s	7GB (have 16GB)
deepseek-ai/DeepSeek-V3.1	FP16	Fits (tight)	30.68 tok/s	15GB (have 16GB)
meta-llama/Llama-3.1-8B	Q4	Fits comfortably	78.38 tok/s	4GB (have 16GB)
meta-llama/Llama-3.1-8B	Q8	Fits comfortably	58.68 tok/s	9GB (have 16GB)
meta-llama/Llama-3.1-8B	FP16	Not supported	29.36 tok/s	17GB (have 16GB)
LiquidAI/LFM2-1.2B	Q4	Fits comfortably	100.11 tok/s	1GB (have 16GB)
LiquidAI/LFM2-1.2B	Q8	Fits comfortably	65.32 tok/s	2GB (have 16GB)
LiquidAI/LFM2-1.2B	FP16	Fits comfortably	31.57 tok/s	4GB (have 16GB)
meta-llama/Meta-Llama-3-70B-Instruct	Q4	Not supported	28.07 tok/s	34GB (have 16GB)
meta-llama/Meta-Llama-3-70B-Instruct	Q8	Not supported	17.91 tok/s	68GB (have 16GB)
meta-llama/Meta-Llama-3-70B-Instruct	FP16	Not supported	10.04 tok/s	137GB (have 16GB)
unsloth/gpt-oss-20b-BF16	Q4	Fits comfortably	46.37 tok/s	10GB (have 16GB)
unsloth/gpt-oss-20b-BF16	FP16	Not supported	15.11 tok/s	41GB (have 16GB)
HuggingFaceTB/SmolLM-135M	Q4	Fits comfortably	72.01 tok/s	4GB (have 16GB)
HuggingFaceTB/SmolLM-135M	Q8	Fits comfortably	52.47 tok/s	7GB (have 16GB)
HuggingFaceTB/SmolLM-135M	FP16	Fits (tight)	26.92 tok/s	15GB (have 16GB)
Qwen/Qwen2.5-Math-1.5B	Q4	Fits comfortably	76.90 tok/s	3GB (have 16GB)
trl-internal-testing/tiny-random-LlamaForCausalLM	Q4	Fits comfortably	81.43 tok/s	4GB (have 16GB)
trl-internal-testing/tiny-random-LlamaForCausalLM	Q8	Fits comfortably	52.13 tok/s	7GB (have 16GB)
trl-internal-testing/tiny-random-LlamaForCausalLM	FP16	Fits (tight)	27.74 tok/s	15GB (have 16GB)
openai-community/gpt2	Q4	Fits comfortably	70.73 tok/s	4GB (have 16GB)

Note: Performance figures are calculated estimates, not measurements. Real results may vary.
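
The verdicts in the compatibility table can be reproduced with a simple comparison of required and available VRAM, and the same rows can be used to pick a quantization for a target speed. The sketch below is an illustration only: the 90% cutoff for "Fits (tight)" is an assumption chosen to agree with the rows shown (13GB is comfortable, 15GB is tight), not a documented rule, and the names `verdict` and `pick_quantization` are hypothetical.

```python
from typing import List, Optional, Tuple

AVAILABLE_VRAM_GB = 16  # NVIDIA A4000, per the specs above
TIGHT_FRACTION = 0.9    # assumption: cutoff for "Fits (tight)" is not stated on this page


def verdict(required_gb: float, available_gb: float = AVAILABLE_VRAM_GB) -> str:
    """Classify a model/quantization pair by how much VRAM it needs."""
    if required_gb > available_gb:
        return "Not supported"
    if required_gb >= TIGHT_FRACTION * available_gb:
        return "Fits (tight)"
    return "Fits comfortably"


# (quantization, estimated tok/s, VRAM needed in GB) for microsoft/phi-4, from the table above.
PHI4_ROWS: List[Tuple[str, float, float]] = [
    ("Q4", 78.34, 4),
    ("Q8", 50.77, 7),
    ("FP16", 26.76, 15),
]

PRECISION_ORDER = ["FP16", "Q8", "Q4"]  # highest precision first


def pick_quantization(rows: List[Tuple[str, float, float]], min_tok_per_sec: float) -> Optional[str]:
    """Return the highest-precision quantization that fits and meets the speed target."""
    usable = {
        quant
        for quant, tok_per_sec, required_gb in rows
        if tok_per_sec >= min_tok_per_sec and verdict(required_gb) != "Not supported"
    }
    for quant in PRECISION_ORDER:
        if quant in usable:
            return quant
    return None


print(verdict(15))                        # 15GB needed on a 16GB card
print(pick_quantization(PHI4_ROWS, 50))   # target of at least 50 tok/s
```

Preferring the highest precision that still meets the speed target is one reasonable policy; you might instead prefer the fastest option or leave more VRAM headroom for long contexts.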

Alternative GPUs

NVIDIA A5000

24GB

Explore how NVIDIA A5000 stacks up for local inference workloads.

NVIDIA A6000

48GB

Explore how NVIDIA A6000 stacks up for local inference workloads.

RTX 4080

16GB

Explore how RTX 4080 stacks up for local inference workloads.

RTX 4070

12GB

Explore how RTX 4070 stacks up for local inference workloads.

RTX 3060 12GB

12GB

Explore how RTX 3060 12GB stacks up for local inference workloads.

Quick Answer: NVIDIA A4000 offers 16GB VRAM and starts around $21.99. It delivers approximately 101 tokens/sec on Qwen/Qwen2.5-3B. It typically draws 140W under load.

NVIDIA A4000

Unknown

By NVIDIAReleased 2021-04MSRP $999.00

This GPU offers reliable throughput for local AI workloads. Pair it with the right model quantization to hit your desired tokens/sec, and monitor prices below to catch the best deal.

Buy on Amazon - $21.99 View Benchmarks

Specs snapshot

Key hardware metrics for AI workloads.

VRAM16GB

Cores6,144

TDP140W

ArchitectureAmpere

Where to Buy

Buy directly on Amazon with fast shipping and reliable customer service.

AmazonUnknown

$21.99

Buy on Amazon

💡 Not ready to buy? Try cloud GPUs first

Test NVIDIA A4000 performance in the cloud before investing in hardware. Pay by the hour with no commitment.

Vast.aifrom $0.20/hr RunPodfrom $0.30/hr Lambda Labsenterprise-grade

AI benchmarks

Model	Quantization	Tokens/sec	VRAM used
Qwen/Qwen2.5-3B	Q4	101.43 tok/sEstimated Auto-generated benchmark	2GB
allenai/OLMo-2-0425-1B	Q4	100.22 tok/sEstimated Auto-generated benchmark	1GB
LiquidAI/LFM2-1.2B	Q4	100.11 tok/sEstimated Auto-generated benchmark	1GB
deepseek-ai/DeepSeek-OCR	Q4	99.96 tok/sEstimated Auto-generated benchmark	2GB
google-t5/t5-3b	Q4	99.32 tok/sEstimated Auto-generated benchmark	2GB
bigcode/starcoder2-3b	Q4	99.02 tok/sEstimated Auto-generated benchmark	2GB
google/gemma-2-2b-it	Q4	98.78 tok/sEstimated Auto-generated benchmark	1GB
google/gemma-2b	Q4	97.97 tok/sEstimated Auto-generated benchmark	1GB
google/gemma-3-1b-it	Q4	96.76 tok/sEstimated Auto-generated benchmark	1GB
meta-llama/Llama-Guard-3-1B	Q4	96.67 tok/sEstimated Auto-generated benchmark	1GB
meta-llama/Llama-3.2-1B-Instruct	Q4	96.14 tok/sEstimated Auto-generated benchmark	1GB
google/embeddinggemma-300m	Q4	95.54 tok/sEstimated Auto-generated benchmark	1GB
unsloth/Llama-3.2-1B-Instruct	Q4	93.81 tok/sEstimated Auto-generated benchmark	1GB
inference-net/Schematron-3B	Q4	93.21 tok/sEstimated Auto-generated benchmark	2GB
ibm-research/PowerMoE-3b	Q4	93.03 tok/sEstimated Auto-generated benchmark	2GB
tencent/HunyuanOCR	Q4	92.59 tok/sEstimated Auto-generated benchmark	1GB
context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16	Q4	91.82 tok/sEstimated Auto-generated benchmark	2GB
meta-llama/Llama-3.2-1B	Q4	91.12 tok/sEstimated Auto-generated benchmark	1GB
apple/OpenELM-1_1B-Instruct	Q4	91.12 tok/sEstimated Auto-generated benchmark	1GB
meta-llama/Llama-3.2-3B	Q4	89.93 tok/sEstimated Auto-generated benchmark	2GB
deepseek-ai/deepseek-coder-1.3b-instruct	Q4	89.71 tok/sEstimated Auto-generated benchmark	2GB
WeiboAI/VibeThinker-1.5B	Q4	89.41 tok/sEstimated Auto-generated benchmark	1GB
facebook/sam3	Q4	89.24 tok/sEstimated Auto-generated benchmark	1GB
google-bert/bert-base-uncased	Q4	88.91 tok/sEstimated Auto-generated benchmark	1GB
unsloth/Llama-3.2-3B-Instruct	Q4	87.99 tok/sEstimated Auto-generated benchmark	2GB
unsloth/gemma-3-1b-it	Q4	86.96 tok/sEstimated Auto-generated benchmark	1GB
nari-labs/Dia2-2B	Q4	86.92 tok/sEstimated Auto-generated benchmark	2GB
Qwen/Qwen2.5-3B-Instruct	Q4	85.94 tok/sEstimated Auto-generated benchmark	2GB
Qwen/Qwen2.5-Coder-1.5B	Q4	84.34 tok/sEstimated Auto-generated benchmark	3GB
trl-internal-testing/tiny-LlamaForCausalLM-3.2	Q4	84.34 tok/sEstimated Auto-generated benchmark	4GB
allenai/Olmo-3-7B-Think	Q4	84.19 tok/sEstimated Auto-generated benchmark	4GB
microsoft/Phi-3-mini-4k-instruct	Q4	84.16 tok/sEstimated Auto-generated benchmark	4GB
ibm-granite/granite-3.3-2b-instruct	Q4	84.08 tok/sEstimated Auto-generated benchmark	1GB
meta-llama/Llama-3.2-3B-Instruct	Q4	83.93 tok/sEstimated Auto-generated benchmark	2GB
HuggingFaceH4/zephyr-7b-beta	Q4	83.76 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen3-4B-Base	Q4	83.72 tok/sEstimated Auto-generated benchmark	2GB
Qwen/Qwen3-8B	Q4	83.66 tok/sEstimated Auto-generated benchmark	4GB
TinyLlama/TinyLlama-1.1B-Chat-v1.0	Q4	83.60 tok/sEstimated Auto-generated benchmark	1GB
mistralai/Mistral-7B-v0.1	Q4	83.60 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen2.5-0.5B-Instruct	Q4	82.85 tok/sEstimated Auto-generated benchmark	3GB
dicta-il/dictalm2.0-instruct	Q4	82.80 tok/sEstimated Auto-generated benchmark	4GB
EleutherAI/gpt-neo-125m	Q4	82.71 tok/sEstimated Auto-generated benchmark	4GB
bigscience/bloomz-560m	Q4	82.70 tok/sEstimated Auto-generated benchmark	4GB
lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit	Q4	82.56 tok/sEstimated Auto-generated benchmark	2GB
meta-llama/Meta-Llama-3-8B	Q4	82.11 tok/sEstimated Auto-generated benchmark	4GB
meta-llama/Llama-2-7b-chat-hf	Q4	81.90 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen2-0.5B	Q4	81.86 tok/sEstimated Auto-generated benchmark	3GB
swiss-ai/Apertus-8B-Instruct-2509	Q4	81.64 tok/sEstimated Auto-generated benchmark	4GB
trl-internal-testing/tiny-random-LlamaForCausalLM	Q4	81.43 tok/sEstimated Auto-generated benchmark	4GB
Alibaba-NLP/gte-Qwen2-1.5B-instruct	Q4	81.38 tok/sEstimated Auto-generated benchmark	3GB

Qwen/Qwen2.5-3B

2GB

101.43 tok/sEstimated

Auto-generated benchmark

allenai/OLMo-2-0425-1B

1GB

100.22 tok/sEstimated

Auto-generated benchmark

LiquidAI/LFM2-1.2B

1GB

100.11 tok/sEstimated

Auto-generated benchmark

deepseek-ai/DeepSeek-OCR

2GB

99.96 tok/sEstimated

Auto-generated benchmark

google-t5/t5-3b

2GB

99.32 tok/sEstimated

Auto-generated benchmark

bigcode/starcoder2-3b

2GB

99.02 tok/sEstimated

Auto-generated benchmark

google/gemma-2-2b-it

1GB

98.78 tok/sEstimated

Auto-generated benchmark

google/gemma-2b

1GB

97.97 tok/sEstimated

Auto-generated benchmark

google/gemma-3-1b-it

1GB

96.76 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-Guard-3-1B

1GB

96.67 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.2-1B-Instruct

1GB

96.14 tok/sEstimated

Auto-generated benchmark

google/embeddinggemma-300m

1GB

95.54 tok/sEstimated

Auto-generated benchmark

unsloth/Llama-3.2-1B-Instruct

1GB

93.81 tok/sEstimated

Auto-generated benchmark

inference-net/Schematron-3B

2GB

93.21 tok/sEstimated

Auto-generated benchmark

ibm-research/PowerMoE-3b

2GB

93.03 tok/sEstimated

Auto-generated benchmark

tencent/HunyuanOCR

1GB

92.59 tok/sEstimated

Auto-generated benchmark

context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16

2GB

91.82 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.2-1B

1GB

91.12 tok/sEstimated

Auto-generated benchmark

apple/OpenELM-1_1B-Instruct

1GB

91.12 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.2-3B

2GB

89.93 tok/sEstimated

Auto-generated benchmark

deepseek-ai/deepseek-coder-1.3b-instruct

2GB

89.71 tok/sEstimated

Auto-generated benchmark

WeiboAI/VibeThinker-1.5B

1GB

89.41 tok/sEstimated

Auto-generated benchmark

facebook/sam3

1GB

89.24 tok/sEstimated

Auto-generated benchmark

google-bert/bert-base-uncased

1GB

88.91 tok/sEstimated

Auto-generated benchmark

unsloth/Llama-3.2-3B-Instruct

2GB

87.99 tok/sEstimated

Auto-generated benchmark

unsloth/gemma-3-1b-it

1GB

86.96 tok/sEstimated

Auto-generated benchmark

nari-labs/Dia2-2B

2GB

86.92 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-3B-Instruct

2GB

85.94 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-Coder-1.5B

3GB

84.34 tok/sEstimated

Auto-generated benchmark

trl-internal-testing/tiny-LlamaForCausalLM-3.2

4GB

84.34 tok/sEstimated

Auto-generated benchmark

allenai/Olmo-3-7B-Think

4GB

84.19 tok/sEstimated

Auto-generated benchmark

microsoft/Phi-3-mini-4k-instruct

4GB

84.16 tok/sEstimated

Auto-generated benchmark

ibm-granite/granite-3.3-2b-instruct

1GB

84.08 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.2-3B-Instruct

2GB

83.93 tok/sEstimated

Auto-generated benchmark

HuggingFaceH4/zephyr-7b-beta

4GB

83.76 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen3-4B-Base

2GB

83.72 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen3-8B

4GB

83.66 tok/sEstimated

Auto-generated benchmark

TinyLlama/TinyLlama-1.1B-Chat-v1.0

1GB

83.60 tok/sEstimated

Auto-generated benchmark

mistralai/Mistral-7B-v0.1

4GB

83.60 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-0.5B-Instruct

3GB

82.85 tok/sEstimated

Auto-generated benchmark

dicta-il/dictalm2.0-instruct

4GB

82.80 tok/sEstimated

Auto-generated benchmark

EleutherAI/gpt-neo-125m

4GB

82.71 tok/sEstimated

Auto-generated benchmark

bigscience/bloomz-560m

4GB

82.70 tok/sEstimated

Auto-generated benchmark

lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit

2GB

82.56 tok/sEstimated

Auto-generated benchmark

meta-llama/Meta-Llama-3-8B

4GB

82.11 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-2-7b-chat-hf

4GB

81.90 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2-0.5B

3GB

81.86 tok/sEstimated

Auto-generated benchmark

swiss-ai/Apertus-8B-Instruct-2509

4GB

81.64 tok/sEstimated

Auto-generated benchmark

trl-internal-testing/tiny-random-LlamaForCausalLM

4GB

81.43 tok/sEstimated

Auto-generated benchmark

Alibaba-NLP/gte-Qwen2-1.5B-instruct

3GB

81.38 tok/sEstimated

Auto-generated benchmark

Note: Performance estimates are calculated. Real results may vary. Methodology · Submit real data

Model compatibility

Model	Quantization	Verdict	Estimated speed	VRAM needed
dphn/dolphin-2.9.1-yi-1.5-34b	Q8	Not supported	19.26 tok/sEstimated	35GB (have 16GB)
dphn/dolphin-2.9.1-yi-1.5-34b	FP16	Not supported	9.99 tok/sEstimated	70GB (have 16GB)
openai/gpt-oss-20b	Q4	Fits comfortably	38.59 tok/sEstimated	10GB (have 16GB)
openai/gpt-oss-20b	Q8	Not supported	27.86 tok/sEstimated	20GB (have 16GB)
openai/gpt-oss-20b	FP16	Not supported	16.95 tok/sEstimated	41GB (have 16GB)
google/gemma-3-1b-it	Q4	Fits comfortably	96.76 tok/sEstimated	1GB (have 16GB)
google/gemma-3-1b-it	FP16	Fits comfortably	36.87 tok/sEstimated	2GB (have 16GB)
Qwen/Qwen3-Embedding-0.6B	Q4	Fits comfortably	74.17 tok/sEstimated	3GB (have 16GB)
Qwen/Qwen3-Embedding-0.6B	Q8	Fits comfortably	57.22 tok/sEstimated	6GB (have 16GB)
Qwen/Qwen3-Embedding-0.6B	FP16	Fits comfortably	28.60 tok/sEstimated	13GB (have 16GB)
meta-llama/Llama-3.2-1B-Instruct	Q8	Fits comfortably	70.70 tok/sEstimated	1GB (have 16GB)
meta-llama/Llama-3.2-1B	Q8	Fits comfortably	70.78 tok/sEstimated	1GB (have 16GB)
Qwen/Qwen3-4B	FP16	Fits comfortably	27.90 tok/sEstimated	9GB (have 16GB)
Qwen/Qwen3-30B-A3B-Instruct-2507	Q4	Fits (tight)	46.18 tok/sEstimated	15GB (have 16GB)
meta-llama/Meta-Llama-3-8B-Instruct	Q4	Fits comfortably	79.06 tok/sEstimated	4GB (have 16GB)
deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B	Q4	Fits comfortably	76.74 tok/sEstimated	3GB (have 16GB)
deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B	Q8	Fits comfortably	51.65 tok/sEstimated	5GB (have 16GB)
Qwen/Qwen2.5-1.5B	FP16	Fits comfortably	27.62 tok/sEstimated	11GB (have 16GB)
Qwen/Qwen2.5-14B-Instruct	Q4	Fits comfortably	62.68 tok/sEstimated	7GB (have 16GB)
Qwen/Qwen3-Embedding-8B	Q8	Fits comfortably	54.58 tok/sEstimated	9GB (have 16GB)
Qwen/Qwen3-Embedding-8B	FP16	Not supported	31.39 tok/sEstimated	17GB (have 16GB)
Qwen/Qwen2-0.5B	FP16	Fits comfortably	31.18 tok/sEstimated	11GB (have 16GB)
deepseek-ai/deepseek-coder-1.3b-instruct	Q4	Fits comfortably	89.71 tok/s (estimated)	2GB (have 16GB)
deepseek-ai/deepseek-coder-1.3b-instruct	Q8	Fits comfortably	66.51 tok/s (estimated)	3GB (have 16GB)
deepseek-ai/deepseek-coder-1.3b-instruct	FP16	Fits comfortably	32.68 tok/s (estimated)	6GB (have 16GB)
microsoft/phi-4	Q4	Fits comfortably	78.34 tok/s (estimated)	4GB (have 16GB)
microsoft/phi-4	Q8	Fits comfortably	50.77 tok/s (estimated)	7GB (have 16GB)
microsoft/phi-4	FP16	Fits (tight)	26.76 tok/s (estimated)	15GB (have 16GB)
deepseek-ai/DeepSeek-V3.1	Q4	Fits comfortably	79.19 tok/s (estimated)	4GB (have 16GB)
deepseek-ai/DeepSeek-V3.1	Q8	Fits comfortably	53.78 tok/s (estimated)	7GB (have 16GB)
deepseek-ai/DeepSeek-V3.1	FP16	Fits (tight)	30.68 tok/s (estimated)	15GB (have 16GB)
meta-llama/Llama-3.1-8B	Q4	Fits comfortably	78.38 tok/s (estimated)	4GB (have 16GB)
meta-llama/Llama-3.1-8B	Q8	Fits comfortably	58.68 tok/s (estimated)	9GB (have 16GB)
meta-llama/Llama-3.1-8B	FP16	Not supported	29.36 tok/s (estimated)	17GB (have 16GB)
LiquidAI/LFM2-1.2B	Q4	Fits comfortably	100.11 tok/s (estimated)	1GB (have 16GB)
LiquidAI/LFM2-1.2B	Q8	Fits comfortably	65.32 tok/s (estimated)	2GB (have 16GB)
LiquidAI/LFM2-1.2B	FP16	Fits comfortably	31.57 tok/s (estimated)	4GB (have 16GB)
meta-llama/Meta-Llama-3-70B-Instruct	Q4	Not supported	28.07 tok/s (estimated)	34GB (have 16GB)
meta-llama/Meta-Llama-3-70B-Instruct	Q8	Not supported	17.91 tok/s (estimated)	68GB (have 16GB)
meta-llama/Meta-Llama-3-70B-Instruct	FP16	Not supported	10.04 tok/s (estimated)	137GB (have 16GB)
unsloth/gpt-oss-20b-BF16	Q4	Fits comfortably	46.37 tok/s (estimated)	10GB (have 16GB)
unsloth/gpt-oss-20b-BF16	FP16	Not supported	15.11 tok/s (estimated)	41GB (have 16GB)
HuggingFaceTB/SmolLM-135M	Q4	Fits comfortably	72.01 tok/s (estimated)	4GB (have 16GB)
HuggingFaceTB/SmolLM-135M	Q8	Fits comfortably	52.47 tok/s (estimated)	7GB (have 16GB)
HuggingFaceTB/SmolLM-135M	FP16	Fits (tight)	26.92 tok/s (estimated)	15GB (have 16GB)
Qwen/Qwen2.5-Math-1.5B	Q4	Fits comfortably	76.90 tok/s (estimated)	3GB (have 16GB)
trl-internal-testing/tiny-random-LlamaForCausalLM	Q4	Fits comfortably	81.43 tok/s (estimated)	4GB (have 16GB)
trl-internal-testing/tiny-random-LlamaForCausalLM	Q8	Fits comfortably	52.13 tok/s (estimated)	7GB (have 16GB)
trl-internal-testing/tiny-random-LlamaForCausalLM	FP16	Fits (tight)	27.74 tok/s (estimated)	15GB (have 16GB)
openai-community/gpt2	Q4	Fits comfortably	70.73 tok/s (estimated)	4GB (have 16GB)

Model	Quantization	Fit	Tokens/sec	VRAM required
dphn/dolphin-2.9.1-yi-1.5-34b	Q8	Not supported	19.26 tok/s (estimated)	35GB (have 16GB)
dphn/dolphin-2.9.1-yi-1.5-34b	FP16	Not supported	9.99 tok/s (estimated)	70GB (have 16GB)
openai/gpt-oss-20b	Q4	Fits comfortably	38.59 tok/s (estimated)	10GB (have 16GB)
openai/gpt-oss-20b	Q8	Not supported	27.86 tok/s (estimated)	20GB (have 16GB)
openai/gpt-oss-20b	FP16	Not supported	16.95 tok/s (estimated)	41GB (have 16GB)
google/gemma-3-1b-it	Q4	Fits comfortably	96.76 tok/s (estimated)	1GB (have 16GB)
google/gemma-3-1b-it	FP16	Fits comfortably	36.87 tok/s (estimated)	2GB (have 16GB)
Qwen/Qwen3-Embedding-0.6B	Q4	Fits comfortably	74.17 tok/s (estimated)	3GB (have 16GB)
Qwen/Qwen3-Embedding-0.6B	Q8	Fits comfortably	57.22 tok/s (estimated)	6GB (have 16GB)
Qwen/Qwen3-Embedding-0.6B	FP16	Fits comfortably	28.60 tok/s (estimated)	13GB (have 16GB)
meta-llama/Llama-3.2-1B-Instruct	Q8	Fits comfortably	70.70 tok/s (estimated)	1GB (have 16GB)
meta-llama/Llama-3.2-1B	Q8	Fits comfortably	70.78 tok/s (estimated)	1GB (have 16GB)
Qwen/Qwen3-4B	FP16	Fits comfortably	27.90 tok/s (estimated)	9GB (have 16GB)
Qwen/Qwen3-30B-A3B-Instruct-2507	Q4	Fits (tight)	46.18 tok/s (estimated)	15GB (have 16GB)
meta-llama/Meta-Llama-3-8B-Instruct	Q4	Fits comfortably	79.06 tok/s (estimated)	4GB (have 16GB)
deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B	Q4	Fits comfortably	76.74 tok/s (estimated)	3GB (have 16GB)
deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B	Q8	Fits comfortably	51.65 tok/s (estimated)	5GB (have 16GB)
Qwen/Qwen2.5-1.5B	FP16	Fits comfortably	27.62 tok/s (estimated)	11GB (have 16GB)
Qwen/Qwen2.5-14B-Instruct	Q4	Fits comfortably	62.68 tok/s (estimated)	7GB (have 16GB)
Qwen/Qwen3-Embedding-8B	Q8	Fits comfortably	54.58 tok/s (estimated)	9GB (have 16GB)
Qwen/Qwen3-Embedding-8B	FP16	Not supported	31.39 tok/s (estimated)	17GB (have 16GB)
Qwen/Qwen2-0.5B	FP16	Fits comfortably	31.18 tok/s (estimated)	11GB (have 16GB)

Note: Performance figures are calculated estimates, not measured benchmarks. Real results may vary.
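
The fit labels in the table come down to comparing a model's required VRAM against the card's 16GB. The sketch below shows one way to reproduce them. It is an illustration, not the page's actual method: the function names and the 0.9 "tight" threshold are assumptions chosen to be consistent with the rows shown here (13GB is listed as comfortable, 15GB as tight, 17GB as not supported). The weights-only estimate is a common rule of thumb and a lower bound, since it ignores KV cache and runtime overhead.

```python
def weights_gb(params_billion: float, bits_per_weight: int) -> float:
    """Weights-only footprint in GB (10^9 bytes).

    Rule of thumb: parameters x bits / 8. This ignores KV cache,
    activations, and runtime overhead, so real usage is higher.
    """
    return params_billion * bits_per_weight / 8


def classify_fit(required_gb: float, available_gb: float,
                 tight_ratio: float = 0.9) -> str:
    """Label a model against available VRAM.

    tight_ratio is a hypothetical threshold, not a published value.
    """
    if required_gb > available_gb:
        return "Not supported"
    if required_gb >= tight_ratio * available_gb:
        return "Fits (tight)"
    return "Fits comfortably"


if __name__ == "__main__":
    # Required-VRAM values taken from rows in the table (16GB card).
    for required in (9, 15, 17):
        print(f"{required}GB on 16GB: {classify_fit(required, 16)}")
```

Treat a "Fits (tight)" result as a prompt to test with a realistic context length, since long prompts grow the KV cache beyond the weights-only figure.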

Alternative GPUs

NVIDIA A5000

24GB

Explore how NVIDIA A5000 stacks up for local inference workloads.

NVIDIA A6000

48GB

Explore how NVIDIA A6000 stacks up for local inference workloads.

RTX 4080

16GB

Explore how RTX 4080 stacks up for local inference workloads.

RTX 4070

12GB

Explore how RTX 4070 stacks up for local inference workloads.

RTX 3060 12GB

12GB

Explore how RTX 3060 12GB stacks up for local inference workloads.