Quick Answer: The RTX 3060 12GB offers 12GB of VRAM and starts around $309.99. It delivers an estimated 77 tokens/sec on bigcode/starcoder2-3b at Q4 and typically draws 170W under load.
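To put the 170W figure in context, here is a back-of-envelope sketch of electricity cost per generated token using the headline numbers above; the electricity rate is an assumed example, not a figure from this page:

```python
# Back-of-envelope: energy cost per token on this card, using the
# headline numbers above (170 W under load, ~77 tok/s estimated).
POWER_W = 170.0
TOKENS_PER_SEC = 77.0
PRICE_PER_KWH = 0.15  # assumption: substitute your local electricity rate

# joules per token -> kWh per token (1 kWh = 3.6e6 J)
kwh_per_token = (POWER_W / TOKENS_PER_SEC) / 3.6e6
cost_per_million = kwh_per_token * 1e6 * PRICE_PER_KWH
print(f"~${cost_per_million:.2f} per million tokens")  # ~$0.09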
This GPU offers reliable throughput for local AI workloads. Pair it with the right model quantization to hit your desired tokens/sec, and monitor prices below to catch the best deal.
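As a rough guide to picking that quantization, here is a minimal sketch of the usual weights-plus-overhead VRAM heuristic. The bits-per-weight mapping and the flat 1.5GB overhead are simplifying assumptions (real usage varies with context length and runtime), but it tracks the general pattern in the table below:

```python
# Minimal sketch: weights (params * bits / 8) plus a flat pad for the
# KV cache and runtime. Heuristic only; not a measured figure.
QUANT_BITS = {"Q4": 4, "Q8": 8, "FP16": 16}

def estimate_vram_gb(params_billion: float, quant: str,
                     overhead_gb: float = 1.5) -> float:
    """Estimated VRAM footprint in GB for a dense model."""
    return params_billion * QUANT_BITS[quant] / 8 + overhead_gb

def fits_rtx_3060(params_billion: float, quant: str) -> bool:
    return estimate_vram_gb(params_billion, quant) <= 12.0

for q in ("Q4", "Q8", "FP16"):
    print(q, "7B fits:", fits_rtx_3060(7, q))  # Q4/Q8 fit, FP16 does not
```

By this estimate a 7B model fits comfortably at Q4 (~5GB) and Q8 (~8.5GB) but not at FP16 (~15.5GB), which matches the 7B rows in the benchmark table.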
Buy directly on Amazon with fast shipping and reliable customer service.
All throughput figures below are estimated, auto-generated benchmarks rather than measured results.

| Model | Quantization | Tokens/sec (est.) | VRAM used |
|---|---|---|---|
| bigcode/starcoder2-3b | Q4 | 76.55 tok/s | 2GB |
| deepseek-ai/DeepSeek-OCR | Q4 | 76.46 tok/s | 2GB |
| ibm-granite/granite-3.3-2b-instruct | Q4 | 76.41 tok/s | 1GB |
| Qwen/Qwen2.5-3B-Instruct | Q4 | 75.82 tok/s | 2GB |
| google-t5/t5-3b | Q4 | 75.21 tok/s | 2GB |
| google/embeddinggemma-300m | Q4 | 75.15 tok/s | 1GB |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | 74.63 tok/s | 1GB |
| allenai/OLMo-2-0425-1B | Q4 | 74.57 tok/s | 1GB |
| google/gemma-2-2b-it | Q4 | 74.23 tok/s | 1GB |
| apple/OpenELM-1_1B-Instruct | Q4 | 73.77 tok/s | 1GB |
| google/gemma-3-1b-it | Q4 | 73.16 tok/s | 1GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | 72.98 tok/s | 1GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | 72.92 tok/s | 2GB |
| tencent/HunyuanOCR | Q4 | 71.80 tok/s | 1GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | 71.65 tok/s | 2GB |
| unsloth/Llama-3.2-1B-Instruct | Q4 | 71.01 tok/s | 1GB |
| unsloth/Llama-3.2-3B-Instruct | Q4 | 70.51 tok/s | 2GB |
| ibm-research/PowerMoE-3b | Q4 | 69.81 tok/s | 2GB |
| meta-llama/Llama-3.2-3B | Q4 | 69.52 tok/s | 2GB |
| nari-labs/Dia2-2B | Q4 | 68.73 tok/s | 2GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 68.38 tok/s | 2GB |
| meta-llama/Llama-3.2-1B | Q4 | 67.49 tok/s | 1GB |
| inference-net/Schematron-3B | Q4 | 67.47 tok/s | 2GB |
| WeiboAI/VibeThinker-1.5B | Q4 | 67.22 tok/s | 1GB |
| Qwen/Qwen2.5-3B | Q4 | 67.20 tok/s | 2GB |
| unsloth/gemma-3-1b-it | Q4 | 65.68 tok/s | 1GB |
| google-bert/bert-base-uncased | Q4 | 65.16 tok/s | 1GB |
| LiquidAI/LFM2-1.2B | Q4 | 64.71 tok/s | 1GB |
| EleutherAI/gpt-neo-125m | Q4 | 63.79 tok/s | 4GB |
| Qwen/Qwen2-1.5B-Instruct | Q4 | 63.76 tok/s | 3GB |
| facebook/opt-125m | Q4 | 63.72 tok/s | 4GB |
| GSAI-ML/LLaDA-8B-Base | Q4 | 63.72 tok/s | 4GB |
| black-forest-labs/FLUX.2-dev | Q4 | 63.71 tok/s | 4GB |
| Tongyi-MAI/Z-Image-Turbo | Q4 | 63.70 tok/s | 4GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | 63.48 tok/s | 2GB |
| facebook/sam3 | Q4 | 63.40 tok/s | 1GB |
| google/gemma-2b | Q4 | 63.31 tok/s | 1GB |
| ibm-granite/granite-3.3-8b-instruct | Q4 | 63.20 tok/s | 4GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q4 | 63.20 tok/s | 4GB |
| microsoft/phi-2 | Q4 | 63.11 tok/s | 4GB |
| meta-llama/Llama-Guard-3-1B | Q4 | 63.09 tok/s | 1GB |
| HuggingFaceTB/SmolLM2-135M | Q4 | 63.07 tok/s | 4GB |
| allenai/Olmo-3-7B-Think | Q4 | 63.06 tok/s | 4GB |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | 62.80 tok/s | 3GB |
| Qwen/Qwen3-Embedding-8B | Q4 | 62.73 tok/s | 4GB |
| zai-org/GLM-4.6-FP8 | Q4 | 62.56 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1 | Q4 | 62.47 tok/s | 4GB |
| microsoft/Phi-4-mini-instruct | Q4 | 62.43 tok/s | 4GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q4 | 62.42 tok/s | 4GB |
| Qwen/Qwen2-7B-Instruct | Q4 | 62.38 tok/s | 4GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q4 | 62.26 tok/s | 3GB |
| microsoft/Phi-3.5-vision-instruct | Q4 | 62.15 tok/s | 4GB |
| zai-org/GLM-4.5-Air | Q4 | 61.87 tok/s | 4GB |
| microsoft/VibeVoice-1.5B | Q4 | 61.71 tok/s | 3GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q4 | 61.68 tok/s | 2GB |
| microsoft/phi-4 | Q4 | 61.65 tok/s | 4GB |
| Qwen/Qwen3-8B-Base | Q4 | 61.58 tok/s | 4GB |
| openai-community/gpt2-xl | Q4 | 61.46 tok/s | 4GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | 61.46 tok/s | 4GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q4 | 61.02 tok/s | 4GB |
| IlyaGusev/saiga_llama3_8b | Q4 | 60.99 tok/s | 4GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q4 | 60.83 tok/s | 4GB |
| hmellor/tiny-random-LlamaForCausalLM | Q4 | 60.75 tok/s | 4GB |
| meta-llama/Llama-Guard-3-8B | Q4 | 60.65 tok/s | 4GB |
| GSAI-ML/LLaDA-8B-Instruct | Q4 | 60.65 tok/s | 4GB |
| microsoft/DialoGPT-medium | Q4 | 60.60 tok/s | 4GB |
| meta-llama/Llama-3.1-8B | Q4 | 60.59 tok/s | 4GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | 60.53 tok/s | 3GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q4 | 60.43 tok/s | 4GB |
| sshleifer/tiny-gpt2 | Q4 | 60.21 tok/s | 4GB |
| MiniMaxAI/MiniMax-M2 | Q4 | 60.15 tok/s | 4GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 60.10 tok/s | 4GB |
| bigscience/bloomz-560m | Q4 | 60.03 tok/s | 4GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q4 | 59.98 tok/s | 4GB |
| Qwen/Qwen-Image-Edit-2509 | Q4 | 59.92 tok/s | 4GB |
| black-forest-labs/FLUX.1-dev | Q4 | 59.88 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q4 | 59.88 tok/s | 2GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q4 | 59.68 tok/s | 4GB |
| Qwen/Qwen3-Embedding-4B | Q4 | 59.66 tok/s | 2GB |
| microsoft/DialoGPT-small | Q4 | 59.64 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q4 | 59.63 tok/s | 2GB |
| google/gemma-3-270m-it | Q4 | 59.31 tok/s | 4GB |
| petals-team/StableBeluga2 | Q4 | 59.28 tok/s | 4GB |
| Qwen/Qwen2-0.5B | Q4 | 58.95 tok/s | 3GB |
| vikhyatk/moondream2 | Q4 | 58.88 tok/s | 4GB |
| Qwen/Qwen3-1.7B-Base | Q4 | 58.80 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q4 | 58.66 tok/s | 4GB |
| parler-tts/parler-tts-large-v1 | Q4 | 58.58 tok/s | 4GB |
| skt/kogpt2-base-v2 | Q4 | 58.56 tok/s | 4GB |
| microsoft/Phi-3-mini-128k-instruct | Q4 | 58.55 tok/s | 4GB |
| ibm-granite/granite-docling-258M | Q4 | 58.40 tok/s | 4GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | 58.38 tok/s | 4GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | 58.36 tok/s | 3GB |
| Qwen/Qwen2.5-1.5B | Q4 | 58.07 tok/s | 3GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | 58.05 tok/s | 2GB |
| mistralai/Mistral-7B-v0.1 | Q4 | 57.96 tok/s | 4GB |
| openai-community/gpt2 | Q4 | 57.96 tok/s | 4GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | 57.70 tok/s | 4GB |
| Qwen/Qwen3-Reranker-0.6B | Q4 | 57.65 tok/s | 3GB |
| Qwen/Qwen2-0.5B-Instruct | Q4 | 57.56 tok/s | 3GB |
| lmsys/vicuna-7b-v1.5 | Q4 | 57.56 tok/s | 4GB |
| Qwen/Qwen3-8B | Q4 | 57.35 tok/s | 4GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q4 | 57.10 tok/s | 4GB |
| meta-llama/Llama-2-7b-hf | Q4 | 56.98 tok/s | 4GB |
| distilbert/distilgpt2 | Q4 | 56.83 tok/s | 4GB |
| Qwen/Qwen3-1.7B | Q4 | 56.62 tok/s | 4GB |
| BSC-LT/salamandraTA-7b-instruct | Q4 | 56.61 tok/s | 4GB |
| huggyllama/llama-7b | Q4 | 56.47 tok/s | 4GB |
| microsoft/Phi-3-mini-4k-instruct | Q4 | 56.37 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3.1 | Q4 | 56.34 tok/s | 4GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q4 | 56.33 tok/s | 2GB |
| HuggingFaceH4/zephyr-7b-beta | Q4 | 56.28 tok/s | 4GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | 56.28 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-0528 | Q4 | 56.23 tok/s | 4GB |
| Qwen/Qwen2.5-7B | Q4 | 56.19 tok/s | 4GB |
| tencent/HunyuanVideo-1.5 | Q4 | 56.10 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3-0324 | Q4 | 56.06 tok/s | 4GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q4 | 56.03 tok/s | 4GB |
| dicta-il/dictalm2.0-instruct | Q4 | 56.00 tok/s | 4GB |
| Qwen/Qwen3-Embedding-0.6B | Q4 | 55.97 tok/s | 3GB |
| Qwen/Qwen3-4B | Q4 | 55.93 tok/s | 2GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q4 | 55.75 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3 | Q4 | 55.64 tok/s | 4GB |
| rednote-hilab/dots.ocr | Q4 | 55.63 tok/s | 4GB |
| Qwen/Qwen3-0.6B-Base | Q4 | 55.62 tok/s | 3GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 55.41 tok/s | 4GB |
| Qwen/Qwen2.5-0.5B | Q4 | 55.39 tok/s | 3GB |
| microsoft/Phi-4-multimodal-instruct | Q4 | 55.14 tok/s | 4GB |
| Qwen/Qwen3-0.6B | Q4 | 55.09 tok/s | 3GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q4 | 55.02 tok/s | 2GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q4 | 54.98 tok/s | 2GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | 54.77 tok/s | 4GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 54.71 tok/s | 2GB |
| meta-llama/Llama-2-7b-chat-hf | Q4 | 54.68 tok/s | 4GB |
| numind/NuExtract-1.5 | Q4 | 54.38 tok/s | 4GB |
| meta-llama/Meta-Llama-3-8B | Q4 | 54.29 tok/s | 4GB |
| openai-community/gpt2-medium | Q4 | 54.14 tok/s | 4GB |
| rinna/japanese-gpt-neox-small | Q4 | 54.10 tok/s | 4GB |
| Qwen/Qwen3-8B-FP8 | Q4 | 53.81 tok/s | 4GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q4 | 53.49 tok/s | 4GB |
| Qwen/Qwen3-4B-Base | Q4 | 53.47 tok/s | 2GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q4 | 53.47 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q4 | 53.12 tok/s | 3GB |
| deepseek-ai/DeepSeek-OCR | Q8 | 52.86 tok/s | 4GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q4 | 52.86 tok/s | 3GB |
| bigcode/starcoder2-3b | Q8 | 52.81 tok/s | 3GB |
| llamafactory/tiny-random-Llama-3 | Q4 | 52.80 tok/s | 4GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | 52.78 tok/s | 3GB |
| LiquidAI/LFM2-1.2B | Q8 | 52.75 tok/s | 2GB |
| EleutherAI/pythia-70m-deduped | Q4 | 52.61 tok/s | 4GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q4 | 52.61 tok/s | 4GB |
| HuggingFaceTB/SmolLM-135M | Q4 | 52.55 tok/s | 4GB |
| inference-net/Schematron-3B | Q8 | 52.52 tok/s | 3GB |
| liuhaotian/llava-v1.5-7b | Q4 | 52.50 tok/s | 4GB |
| openai-community/gpt2-large | Q4 | 52.42 tok/s | 4GB |
| unsloth/Llama-3.2-1B-Instruct | Q8 | 52.37 tok/s | 1GB |
| Qwen/Qwen2.5-Math-1.5B | Q4 | 52.36 tok/s | 3GB |
| ibm-research/PowerMoE-3b | Q8 | 52.29 tok/s | 3GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q4 | 52.29 tok/s | 4GB |
| WeiboAI/VibeThinker-1.5B | Q8 | 51.75 tok/s | 2GB |
| apple/OpenELM-1_1B-Instruct | Q8 | 50.75 tok/s | 1GB |
| Qwen/Qwen2.5-3B-Instruct | Q8 | 50.22 tok/s | 3GB |
| google-t5/t5-3b | Q8 | 49.98 tok/s | 3GB |
| unsloth/Llama-3.2-3B-Instruct | Q8 | 49.66 tok/s | 3GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q8 | 49.64 tok/s | 3GB |
| facebook/sam3 | Q8 | 49.48 tok/s | 1GB |
| meta-llama/Llama-3.2-1B-Instruct | Q8 | 49.42 tok/s | 1GB |
| allenai/OLMo-2-0425-1B | Q8 | 49.26 tok/s | 1GB |
| meta-llama/Llama-Guard-3-1B | Q8 | 49.05 tok/s | 1GB |
| tencent/HunyuanOCR | Q8 | 48.81 tok/s | 2GB |
| google-bert/bert-base-uncased | Q8 | 48.62 tok/s | 1GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | 48.02 tok/s | 3GB |
| google/gemma-2-2b-it | Q8 | 47.89 tok/s | 2GB |
| meta-llama/Llama-3.2-3B | Q8 | 47.54 tok/s | 3GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | 47.45 tok/s | 7GB |
| meta-llama/Llama-3.2-1B | Q8 | 47.34 tok/s | 1GB |
| google/gemma-3-1b-it | Q8 | 47.12 tok/s | 1GB |
| google/embeddinggemma-300m | Q8 | 46.79 tok/s | 1GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | 46.45 tok/s | 1GB |
| ibm-granite/granite-3.3-2b-instruct | Q8 | 46.24 tok/s | 2GB |
| Qwen/Qwen2.5-14B | Q4 | 46.05 tok/s | 7GB |
| unsloth/gemma-3-1b-it | Q8 | 45.73 tok/s | 1GB |
| google/gemma-2b | Q8 | 45.59 tok/s | 2GB |
| nari-labs/Dia2-2B | Q8 | 45.16 tok/s | 3GB |
| OpenPipe/Qwen3-14B-Instruct | Q4 | 45.10 tok/s | 7GB |
| Qwen/Qwen2.5-3B | Q8 | 44.96 tok/s | 3GB |
| microsoft/DialoGPT-small | Q8 | 44.63 tok/s | 7GB |
| NousResearch/Hermes-3-Llama-3.1-8B | Q4 | 44.62 tok/s | 4GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q8 | 44.60 tok/s | 7GB |
| Qwen/Qwen2.5-Coder-1.5B | Q8 | 44.50 tok/s | 5GB |
| HuggingFaceTB/SmolLM-135M | Q8 | 44.49 tok/s | 7GB |
| Qwen/Qwen3-Embedding-0.6B | Q8 | 44.39 tok/s | 6GB |
| Qwen/Qwen3-4B | Q8 | 44.34 tok/s | 4GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q8 | 44.28 tok/s | 7GB |
| lmsys/vicuna-7b-v1.5 | Q8 | 44.27 tok/s | 7GB |
| allenai/Olmo-3-7B-Think | Q8 | 44.13 tok/s | 8GB |
| parler-tts/parler-tts-large-v1 | Q8 | 44.02 tok/s | 7GB |
| zai-org/GLM-4.5-Air | Q8 | 44.00 tok/s | 7GB |
| deepseek-ai/DeepSeek-V3-0324 | Q8 | 43.96 tok/s | 7GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q8 | 43.94 tok/s | 5GB |
| meta-llama/Llama-2-13b-chat-hf | Q4 | 43.92 tok/s | 7GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q8 | 43.86 tok/s | 7GB |
| meta-llama/Llama-3.1-8B | Q8 | 43.81 tok/s | 9GB |
| Qwen/Qwen3-0.6B | Q8 | 43.79 tok/s | 6GB |
| distilbert/distilgpt2 | Q8 | 43.69 tok/s | 7GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q8 | 43.69 tok/s | 4GB |
| Qwen/Qwen3-0.6B-Base | Q8 | 43.61 tok/s | 6GB |
| HuggingFaceTB/SmolLM2-135M | Q8 | 43.58 tok/s | 7GB |
| Qwen/Qwen3-8B | Q8 | 43.58 tok/s | 9GB |
| microsoft/Phi-3.5-vision-instruct | Q8 | 43.54 tok/s | 7GB |
| bigscience/bloomz-560m | Q8 | 43.49 tok/s | 7GB |
| mistralai/Mistral-7B-v0.1 | Q8 | 43.49 tok/s | 7GB |
| ibm-granite/granite-docling-258M | Q8 | 43.48 tok/s | 7GB |
| sshleifer/tiny-gpt2 | Q8 | 43.47 tok/s | 7GB |
| Tongyi-MAI/Z-Image-Turbo | Q8 | 43.39 tok/s | 8GB |
| Qwen/Qwen2.5-Math-1.5B | Q8 | 43.35 tok/s | 5GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | 43.24 tok/s | 7GB |
| petals-team/StableBeluga2 | Q8 | 43.12 tok/s | 7GB |
| Qwen/Qwen3-14B-Base | Q4 | 43.08 tok/s | 7GB |
| llamafactory/tiny-random-Llama-3 | Q8 | 43.00 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | 42.82 tok/s | 9GB |
| Qwen/Qwen2.5-1.5B | Q8 | 42.81 tok/s | 5GB |
| Qwen/Qwen3-Reranker-0.6B | Q8 | 42.80 tok/s | 6GB |
| microsoft/VibeVoice-1.5B | Q8 | 42.80 tok/s | 5GB |
| deepseek-ai/DeepSeek-V3 | Q8 | 42.50 tok/s | 7GB |
| EssentialAI/rnj-1 | Q4 | 42.44 tok/s | 5GB |
| zai-org/GLM-4.6-FP8 | Q8 | 42.29 tok/s | 7GB |
| ai-forever/ruGPT-3.5-13B | Q4 | 42.28 tok/s | 7GB |
| microsoft/Phi-3-mini-128k-instruct | Q8 | 42.28 tok/s | 7GB |
| EleutherAI/gpt-neo-125m | Q8 | 42.21 tok/s | 7GB |
| ibm-granite/granite-3.3-8b-instruct | Q8 | 42.18 tok/s | 9GB |
| openai-community/gpt2-xl | Q8 | 42.14 tok/s | 7GB |
| Qwen/Qwen3-14B | Q4 | 42.09 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1 | Q8 | 42.04 tok/s | 7GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | 41.98 tok/s | 9GB |
| rednote-hilab/dots.ocr | Q8 | 41.92 tok/s | 7GB |
| mistralai/Ministral-3-14B-Instruct-2512 | Q4 | 41.82 tok/s | 8GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q8 | 41.80 tok/s | 9GB |
| Qwen/Qwen3-8B-FP8 | Q8 | 41.78 tok/s | 9GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | 41.77 tok/s | 7GB |
| facebook/opt-125m | Q8 | 41.75 tok/s | 7GB |
| meta-llama/Meta-Llama-3-8B | Q8 | 41.73 tok/s | 9GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | 41.59 tok/s | 8GB |
| microsoft/Phi-4-mini-instruct | Q8 | 41.38 tok/s | 7GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q8 | 41.32 tok/s | 7GB |
| microsoft/Phi-3-medium-128k-instruct | Q4 | 41.32 tok/s | 7GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q8 | 41.18 tok/s | 9GB |
| meta-llama/Llama-2-7b-hf | Q8 | 41.09 tok/s | 7GB |
| dicta-il/dictalm2.0-instruct | Q8 | 41.01 tok/s | 7GB |
| openai-community/gpt2-medium | Q8 | 40.92 tok/s | 7GB |
| Qwen/Qwen3-Embedding-4B | Q8 | 40.81 tok/s | 4GB |
| microsoft/Phi-4-multimodal-instruct | Q8 | 40.80 tok/s | 7GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q8 | 40.62 tok/s | 4GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q4 | 40.59 tok/s | 5GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q8 | 40.58 tok/s | 7GB |
| google/gemma-3-270m-it | Q8 | 40.48 tok/s | 7GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q8 | 40.48 tok/s | 5GB |
| Qwen/Qwen-Image-Edit-2509 | Q8 | 40.40 tok/s | 8GB |
| black-forest-labs/FLUX.2-dev | Q8 | 40.37 tok/s | 8GB |
| Qwen/Qwen3-1.7B-Base | Q8 | 40.28 tok/s | 7GB |
| google/gemma-2-9b-it | Q4 | 40.17 tok/s | 5GB |
| EleutherAI/pythia-70m-deduped | Q8 | 40.17 tok/s | 7GB |
| openai-community/gpt2-large | Q8 | 40.12 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q8 | 40.07 tok/s | 5GB |
| tencent/HunyuanVideo-1.5 | Q8 | 39.96 tok/s | 8GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | 39.82 tok/s | 4GB |
| Qwen/Qwen3-1.7B | Q8 | 39.77 tok/s | 7GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q8 | 39.76 tok/s | 4GB |
| Qwen/Qwen2-0.5B-Instruct | Q8 | 39.68 tok/s | 5GB |
| Qwen/Qwen2-0.5B | Q8 | 39.68 tok/s | 5GB |
| skt/kogpt2-base-v2 | Q8 | 39.68 tok/s | 7GB |
| Qwen/Qwen2.5-0.5B | Q8 | 39.67 tok/s | 5GB |
| liuhaotian/llava-v1.5-7b | Q8 | 39.53 tok/s | 7GB |
| openai-community/gpt2 | Q8 | 39.48 tok/s | 7GB |
| hmellor/tiny-random-LlamaForCausalLM | Q8 | 39.38 tok/s | 7GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q8 | 39.38 tok/s | 7GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q8 | 39.33 tok/s | 4GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | 39.29 tok/s | 7GB |
| microsoft/DialoGPT-medium | Q8 | 39.12 tok/s | 7GB |
| meta-llama/Llama-2-7b-chat-hf | Q8 | 39.12 tok/s | 7GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q8 | 38.99 tok/s | 4GB |
| vikhyatk/moondream2 | Q8 | 38.93 tok/s | 7GB |
| meta-llama/Llama-Guard-3-8B | Q8 | 38.68 tok/s | 9GB |
| black-forest-labs/FLUX.1-dev | Q8 | 38.58 tok/s | 8GB |
| MiniMaxAI/MiniMax-M2 | Q8 | 38.47 tok/s | 7GB |
| IlyaGusev/saiga_llama3_8b | Q8 | 38.39 tok/s | 9GB |
| BSC-LT/salamandraTA-7b-instruct | Q8 | 38.37 tok/s | 7GB |
| GSAI-ML/LLaDA-8B-Instruct | Q8 | 38.35 tok/s | 9GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q8 | 38.29 tok/s | 7GB |
| HuggingFaceH4/zephyr-7b-beta | Q8 | 38.28 tok/s | 7GB |
| numind/NuExtract-1.5 | Q8 | 38.08 tok/s | 7GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | 38.03 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-0528 | Q8 | 37.99 tok/s | 7GB |
| Qwen/Qwen2.5-7B | Q8 | 37.85 tok/s | 7GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q8 | 37.80 tok/s | 4GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | 37.74 tok/s | 7GB |
| microsoft/phi-2 | Q8 | 37.73 tok/s | 7GB |
| microsoft/Phi-3-mini-4k-instruct | Q8 | 37.71 tok/s | 7GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q8 | 37.61 tok/s | 9GB |
| huggyllama/llama-7b | Q8 | 37.57 tok/s | 7GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q8 | 37.53 tok/s | 9GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q8 | 37.47 tok/s | 5GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | 37.42 tok/s | 9GB |
| Qwen/Qwen2-7B-Instruct | Q8 | 37.37 tok/s | 7GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | 37.34 tok/s | 3GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q8 | 37.29 tok/s | 9GB |
| Qwen/Qwen3-4B-Base | Q8 | 37.28 tok/s | 4GB |
| Qwen/Qwen3-8B-Base | Q8 | 37.26 tok/s | 9GB |
| deepseek-ai/DeepSeek-V3.1 | Q8 | 37.25 tok/s | 7GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q8 | 37.20 tok/s | 7GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q8 | 37.13 tok/s | 9GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | 37.05 tok/s | 5GB |
| Qwen/Qwen3-Embedding-8B | Q8 | 36.97 tok/s | 9GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | 36.93 tok/s | 8GB |
| Qwen/Qwen2-1.5B-Instruct | Q8 | 36.92 tok/s | 5GB |
| rinna/japanese-gpt-neox-small | Q8 | 36.75 tok/s | 7GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | 36.72 tok/s | 4GB |
| microsoft/phi-4 | Q8 | 36.67 tok/s | 7GB |
| GSAI-ML/LLaDA-8B-Base | Q8 | 36.58 tok/s | 9GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q4 | 34.82 tok/s | 15GB |
| openai/gpt-oss-20b | Q4 | 33.46 tok/s | 10GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q4 | 33.42 tok/s | 15GB |
| google/gemma-2-27b-it | Q4 | 33.41 tok/s | 14GB |
| Qwen/Qwen3-14B | Q8 | 33.41 tok/s | 14GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q4 | 33.32 tok/s | 10GB |
| Qwen/Qwen2.5-14B-Instruct | Q8 | 32.93 tok/s | 15GB |
| unsloth/gpt-oss-20b-BF16 | Q4 | 32.92 tok/s | 10GB |
| google/gemma-2-9b-it | Q8 | 32.58 tok/s | 10GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q4 | 32.56 tok/s | 10GB |
| mistralai/Mistral-Small-Instruct-2409 | Q4 | 32.09 tok/s | 11GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q4 | 31.92 tok/s | 15GB |
| microsoft/Phi-3-medium-128k-instruct | Q8 | 31.85 tok/s | 14GB |
| Qwen/Qwen3-30B-A3B | Q4 | 31.81 tok/s | 15GB |
| Qwen/Qwen2.5-14B-Instruct | Q8 | 31.75 tok/s | 14GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q4 | 31.52 tok/s | 15GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | 31.44 tok/s | 9GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q8 | 31.42 tok/s | 10GB |
| Qwen/Qwen2.5-14B | Q8 | 31.41 tok/s | 14GB |
| EssentialAI/rnj-1 | Q8 | 31.27 tok/s | 10GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q4 | 30.66 tok/s | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q4 | 30.57 tok/s | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q4 | 30.40 tok/s | 15GB |
| Qwen/Qwen3-14B-Base | Q8 | 29.98 tok/s | 14GB |
| deepseek-ai/DeepSeek-OCR | FP16 | 29.78 tok/s | 7GB |
| meta-llama/Llama-2-13b-chat-hf | Q8 | 29.65 tok/s | 13GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q4 | 29.62 tok/s | 15GB |
| openai/gpt-oss-safeguard-20b | Q4 | 29.47 tok/s | 11GB |
| NousResearch/Hermes-3-Llama-3.1-8B | Q8 | 28.85 tok/s | 9GB |
| nari-labs/Dia2-2B | FP16 | 28.74 tok/s | 5GB |
| google-bert/bert-base-uncased | FP16 | 28.73 tok/s | 1GB |
| OpenPipe/Qwen3-14B-Instruct | Q8 | 28.72 tok/s | 14GB |
| meta-llama/Llama-3.2-3B-Instruct | FP16 | 28.53 tok/s | 6GB |
| inference-net/Schematron-3B | FP16 | 28.47 tok/s | 6GB |
| ai-forever/ruGPT-3.5-13B | Q8 | 28.38 tok/s | 13GB |
| google/gemma-2b | FP16 | 28.25 tok/s | 4GB |
| tencent/HunyuanOCR | FP16 | 28.08 tok/s | 3GB |
| mistralai/Ministral-3-14B-Instruct-2512 | Q8 | 28.07 tok/s | 16GB |
| meta-llama/Llama-3.2-1B-Instruct | FP16 | 27.85 tok/s | 2GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | FP16 | 27.41 tok/s | 6GB |
| unsloth/gemma-3-1b-it | FP16 | 27.33 tok/s | 2GB |
| unsloth/Llama-3.2-3B-Instruct | FP16 | 27.30 tok/s | 6GB |
| bigcode/starcoder2-3b | FP16 | 27.29 tok/s | 6GB |
| meta-llama/Llama-Guard-3-1B | FP16 | 27.11 tok/s | 2GB |
| google/embeddinggemma-300m | FP16 | 27.02 tok/s | 1GB |
| facebook/sam3 | FP16 | 27.00 tok/s | 2GB |
| meta-llama/Llama-3.2-3B | FP16 | 26.96 tok/s | 6GB |
| ibm-granite/granite-3.3-2b-instruct | FP16 | 26.76 tok/s | 4GB |
| LiquidAI/LFM2-1.2B | FP16 | 26.55 tok/s | 4GB |
| Qwen/Qwen2.5-3B | FP16 | 26.50 tok/s | 6GB |
| apple/OpenELM-1_1B-Instruct | FP16 | 26.31 tok/s | 2GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | FP16 | 25.90 tok/s | 6GB |
| google-t5/t5-3b | FP16 | 25.81 tok/s | 6GB |
| allenai/OLMo-2-0425-1B | FP16 | 25.51 tok/s | 2GB |
| Qwen/Qwen2.5-3B-Instruct | FP16 | 25.11 tok/s | 6GB |
| meta-llama/Llama-3.2-1B | FP16 | 25.04 tok/s | 2GB |
| ibm-research/PowerMoE-3b | FP16 | 24.42 tok/s | 6GB |
| google/gemma-2-2b-it | FP16 | 24.32 tok/s | 4GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q8 | 24.28 tok/s | 20GB |
| rednote-hilab/dots.ocr | FP16 | 24.26 tok/s | 15GB |
| huggyllama/llama-7b | FP16 | 24.24 tok/s | 15GB |
| ibm-granite/granite-3.3-8b-instruct | FP16 | 24.21 tok/s | 17GB |
| Qwen/Qwen3-Embedding-0.6B | FP16 | 24.15 tok/s | 13GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | FP16 | 24.15 tok/s | 17GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q8 | 24.12 tok/s | 20GB |
| unsloth/Llama-3.2-1B-Instruct | FP16 | 24.09 tok/s | 2GB |
| meta-llama/Llama-Guard-3-8B | FP16 | 24.07 tok/s | 17GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | FP16 | 24.05 tok/s | 2GB |
| microsoft/Phi-3.5-vision-instruct | FP16 | 24.04 tok/s | 15GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | FP16 | 24.02 tok/s | 15GB |
| parler-tts/parler-tts-large-v1 | FP16 | 23.97 tok/s | 15GB |
| WeiboAI/VibeThinker-1.5B | FP16 | 23.89 tok/s | 4GB |
| google/gemma-3-1b-it | FP16 | 23.88 tok/s | 2GB |
| black-forest-labs/FLUX.1-dev | FP16 | 23.87 tok/s | 16GB |
| vikhyatk/moondream2 | FP16 | 23.82 tok/s | 15GB |
| openai-community/gpt2-xl | FP16 | 23.76 tok/s | 15GB |
| microsoft/phi-4 | FP16 | 23.70 tok/s | 15GB |
| Qwen/Qwen3-8B-Base | FP16 | 23.69 tok/s | 17GB |
| openai/gpt-oss-20b | Q8 | 23.59 tok/s | 20GB |
| unsloth/gpt-oss-20b-BF16 | Q8 | 23.58 tok/s | 20GB |
| Qwen/Qwen3-4B-Thinking-2507 | FP16 | 23.53 tok/s | 9GB |
| numind/NuExtract-1.5 | FP16 | 23.50 tok/s | 15GB |
| deepseek-ai/DeepSeek-R1 | FP16 | 23.44 tok/s | 15GB |
| HuggingFaceTB/SmolLM-135M | FP16 | 23.42 tok/s | 15GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | FP16 | 23.37 tok/s | 15GB |
| deepseek-ai/DeepSeek-R1-0528 | FP16 | 23.34 tok/s | 15GB |
| distilbert/distilgpt2 | FP16 | 23.32 tok/s | 15GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | FP16 | 23.32 tok/s | 9GB |
| Qwen/Qwen-Image-Edit-2509 | FP16 | 23.31 tok/s | 16GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | FP16 | 23.29 tok/s | 17GB |
| microsoft/Phi-3.5-mini-instruct | FP16 | 23.27 tok/s | 8GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | FP16 | 23.24 tok/s | 15GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | FP16 | 23.23 tok/s | 9GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q8 | 23.16 tok/s | 31GB |
| meta-llama/Llama-2-7b-chat-hf | FP16 | 23.15 tok/s | 15GB |
| meta-llama/Llama-3.2-3B-Instruct | FP16 | 23.13 tok/s | 7GB |
| Qwen/Qwen2.5-7B | FP16 | 22.99 tok/s | 15GB |
| allenai/Olmo-3-7B-Think | FP16 | 22.99 tok/s | 16GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | FP16 | 22.99 tok/s | 11GB |
| MiniMaxAI/MiniMax-M2 | FP16 | 22.97 tok/s | 15GB |
| Qwen/Qwen2-0.5B | FP16 | 22.96 tok/s | 11GB |
| llamafactory/tiny-random-Llama-3 | FP16 | 22.95 tok/s | 15GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q8 | 22.94 tok/s | 31GB |
| Qwen/Qwen3-Embedding-4B | FP16 | 22.92 tok/s | 9GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | FP16 | 22.91 tok/s | 17GB |
| Qwen/Qwen2.5-1.5B | FP16 | 22.87 tok/s | 11GB |
| Qwen/Qwen2-1.5B-Instruct | FP16 | 22.86 tok/s | 11GB |
| meta-llama/Meta-Llama-3-8B | FP16 | 22.86 tok/s | 17GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q8 | 22.77 tok/s | 31GB |
| Qwen/Qwen2.5-7B-Instruct | FP16 | 22.76 tok/s | 15GB |
| Qwen/Qwen2-0.5B-Instruct | FP16 | 22.72 tok/s | 11GB |
| Qwen/Qwen3-1.7B-Base | FP16 | 22.70 tok/s | 15GB |
| deepseek-ai/DeepSeek-V3-0324 | FP16 | 22.69 tok/s | 15GB |
| mistralai/Mistral-Small-Instruct-2409 | Q8 | 22.58 tok/s | 23GB |
| Qwen/Qwen2.5-0.5B | FP16 | 22.50 tok/s | 11GB |
| lmsys/vicuna-7b-v1.5 | FP16 | 22.45 tok/s | 15GB |
| openai-community/gpt2-large | FP16 | 22.44 tok/s | 15GB |
| deepseek-ai/DeepSeek-V3.1 | FP16 | 22.42 tok/s | 15GB |
| Gensyn/Qwen2.5-0.5B-Instruct | FP16 | 22.37 tok/s | 11GB |
| meta-llama/Meta-Llama-3-8B-Instruct | FP16 | 22.29 tok/s | 17GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q8 | 22.27 tok/s | 31GB |
| IlyaGusev/saiga_llama3_8b | FP16 | 22.26 tok/s | 17GB |
| baichuan-inc/Baichuan-M2-32B | Q4 | 22.25 tok/s | 16GB |
| BSC-LT/salamandraTA-7b-instruct | FP16 | 22.24 tok/s | 15GB |
| zai-org/GLM-4.6-FP8 | FP16 | 22.18 tok/s | 15GB |
| petals-team/StableBeluga2 | FP16 | 22.17 tok/s | 15GB |
| microsoft/Phi-3-mini-128k-instruct | FP16 | 22.17 tok/s | 15GB |
| Qwen/Qwen3-4B-Instruct-2507 | FP16 | 22.14 tok/s | 9GB |
| EleutherAI/gpt-neo-125m | FP16 | 22.12 tok/s | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q8 | 22.09 tok/s | 31GB |
| Qwen/Qwen3-4B-Base | FP16 | 22.06 tok/s | 9GB |
| mistralai/Mistral-7B-v0.1 | FP16 | 22.04 tok/s | 15GB |
| Qwen/Qwen3-1.7B | FP16 | 22.02 tok/s | 15GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | FP16 | 22.00 tok/s | 17GB |
| deepseek-ai/deepseek-coder-33b-instruct | Q4 | 22.00 tok/s | 17GB |
| Qwen/Qwen3-8B-FP8 | FP16 | 21.99 tok/s | 17GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | FP16 | 21.97 tok/s | 15GB |
| Qwen/Qwen3-30B-A3B | Q8 | 21.95 tok/s | 31GB |
| Qwen/Qwen2.5-0.5B-Instruct | FP16 | 21.93 tok/s | 11GB |
| Qwen/Qwen3-Reranker-0.6B | FP16 | 21.92 tok/s | 13GB |
| HuggingFaceTB/SmolLM2-135M | FP16 | 21.91 tok/s | 15GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | FP16 | 21.91 tok/s | 9GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q8 | 21.91 tok/s | 31GB |
| facebook/opt-125m | FP16 | 21.91 tok/s | 15GB |
| microsoft/DialoGPT-small | FP16 | 21.90 tok/s | 15GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | FP16 | 21.89 tok/s | 9GB |
| meta-llama/Meta-Llama-3-70B-Instruct | Q4 | 21.88 tok/s | 34GB |
| Qwen/Qwen2.5-Coder-1.5B | FP16 | 21.83 tok/s | 11GB |
| hmellor/tiny-random-LlamaForCausalLM | FP16 | 21.81 tok/s | 15GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | FP16 | 21.76 tok/s | 9GB |
| HuggingFaceH4/zephyr-7b-beta | FP16 | 21.75 tok/s | 15GB |
| Qwen/Qwen2.5-1.5B-Instruct | FP16 | 21.73 tok/s | 11GB |
| microsoft/Phi-3.5-mini-instruct | FP16 | 21.72 tok/s | 15GB |
| Tongyi-MAI/Z-Image-Turbo | FP16 | 21.72 tok/s | 16GB |
| deepseek-ai/DeepSeek-V3 | FP16 | 21.69 tok/s | 15GB |
| ibm-granite/granite-docling-258M | FP16 | 21.68 tok/s | 15GB |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | 21.64 tok/s | 34GB |
| openai-community/gpt2 | FP16 | 21.49 tok/s | 15GB |
| microsoft/DialoGPT-medium | FP16 | 21.38 tok/s | 15GB |
| google/gemma-3-270m-it | FP16 | 21.37 tok/s | 15GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | FP16 | 21.33 tok/s | 15GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q4 | 21.27 tok/s | 16GB |
| Qwen/Qwen3-Embedding-8B | FP16 | 21.20 tok/s | 17GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q8 | 21.19 tok/s | 31GB |
| microsoft/phi-2 | FP16 | 21.16 tok/s | 15GB |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q4 | 21.14 tok/s | 34GB |
| Qwen/Qwen3-8B | FP16 | 21.13 tok/s | 17GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | FP16 | 21.12 tok/s | 15GB |
| meta-llama/Llama-3.1-8B | FP16 | 21.06 tok/s | 17GB |
| liuhaotian/llava-v1.5-7b | FP16 | 21.06 tok/s | 15GB |
| mistralai/Mistral-7B-Instruct-v0.1 | FP16 | 21.04 tok/s | 15GB |
| sshleifer/tiny-gpt2 | FP16 | 21.03 tok/s | 15GB |
| Qwen/Qwen3-0.6B | FP16 | 21.02 tok/s | 13GB |
| dicta-il/dictalm2.0-instruct | FP16 | 21.02 tok/s | 15GB |
| microsoft/Phi-3-mini-4k-instruct | FP16 | 20.99 tok/s | 15GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | FP16 | 20.95 tok/s | 15GB |
| black-forest-labs/FLUX.2-dev | FP16 | 20.91 tok/s | 16GB |
| Qwen/Qwen2.5-7B-Instruct | FP16 | 20.91 tok/s | 16GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | FP16 | 20.90 tok/s | 17GB |
| Qwen/Qwen2.5-32B | Q4 | 20.87 tok/s | 16GB |
| google/gemma-2-27b-it | Q8 | 20.83 tok/s | 28GB |
| bigscience/bloomz-560m | FP16 | 20.82 tok/s | 15GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | FP16 | 20.80 tok/s | 11GB |
| Qwen/Qwen2.5-32B-Instruct | Q4 | 20.75 tok/s | 17GB |
| openai/gpt-oss-safeguard-20b | Q8 | 20.69 tok/s | 22GB |
| codellama/CodeLlama-34b-hf | Q4 | 20.69 tok/s | 17GB |
| mistralai/Mistral-7B-Instruct-v0.2 | FP16 | 20.66 tok/s | 15GB |
| meta-llama/Llama-3.1-8B-Instruct | FP16 | 20.66 tok/s | 17GB |
| swiss-ai/Apertus-8B-Instruct-2509 | FP16 | 20.51 tok/s | 17GB |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q4 | 20.42 tok/s | 34GB |
| GSAI-ML/LLaDA-8B-Instruct | FP16 | 20.38 tok/s | 17GB |
| moonshotai/Kimi-Linear-48B-A3B-Instruct | Q4 | 20.37 tok/s | 25GB |
| Qwen/Qwen3-32B | Q4 | 20.36 tok/s | 16GB |
| skt/kogpt2-base-v2 | FP16 | 20.31 tok/s | 15GB |
| microsoft/Phi-4-multimodal-instruct | FP16 | 20.29 tok/s | 15GB |
| EleutherAI/pythia-70m-deduped | FP16 | 20.28 tok/s | 15GB |
| microsoft/VibeVoice-1.5B | FP16 | 20.27 tok/s | 11GB |
| Qwen/Qwen2.5-Math-1.5B | FP16 | 20.14 tok/s | 11GB |
| zai-org/GLM-4.5-Air | FP16 | 20.14 tok/s | 15GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | FP16 | 20.13 tok/s | 17GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q8 | 20.12 tok/s | 31GB |
| Qwen/Qwen3-0.6B-Base | FP16 | 20.09 tok/s | 13GB |
| Qwen/Qwen2-7B-Instruct | FP16 | 20.09 tok/s | 15GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | FP16 | 20.08 tok/s | 15GB |
| Qwen/Qwen3-4B | FP16 | 20.07 tok/s | 9GB |
| meta-llama/Llama-2-7b-hf | FP16 | 20.03 tok/s | 15GB |
| Qwen/QwQ-32B-Preview | Q4 | 20.00 tok/s | 17GB |
| microsoft/Phi-4-mini-instruct | FP16 | 19.98 tok/s | 15GB |
| GSAI-ML/LLaDA-8B-Base | FP16 | 19.97 tok/s | 17GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q4 | 19.96 tok/s | 16GB |
| tencent/HunyuanVideo-1.5 | FP16 | 19.96 tok/s | 16GB |
| rinna/japanese-gpt-neox-small | FP16 | 19.93 tok/s | 15GB |
| 01-ai/Yi-1.5-34B-Chat | Q4 | 19.93 tok/s | 18GB |
| openai-community/gpt2-medium | FP16 | 19.92 tok/s | 15GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q4 | 19.39 tok/s | 17GB |
| Qwen/Qwen2.5-32B-Instruct | Q4 | 19.38 tok/s | 16GB |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | 18.99 tok/s | 34GB |
| deepseek-ai/DeepSeek-V2.5 | Q4 | 18.98 tok/s | 328GB |
| Qwen/Qwen2.5-Coder-32B-Instruct | Q4 | 18.96 tok/s | 17GB |
| moonshotai/Kimi-K2-Thinking | Q4 | 18.66 tok/s | 489GB |
| Qwen/Qwen2.5-14B | FP16 | 17.98 tok/s | 29GB |
| NousResearch/Hermes-3-Llama-3.1-8B | FP16 | 17.90 tok/s | 17GB |
| EssentialAI/rnj-1 | FP16 | 17.87 tok/s | 19GB |
| Qwen/Qwen2.5-14B-Instruct | FP16 | 17.85 tok/s | 29GB |
| Qwen/Qwen2.5-14B-Instruct | FP16 | 17.44 tok/s | 30GB |
| meta-llama/Llama-2-13b-chat-hf | FP16 | 17.26 tok/s | 27GB |
| OpenPipe/Qwen3-14B-Instruct | FP16 | 17.17 tok/s | 29GB |
| Qwen/Qwen3-14B | FP16 | 16.67 tok/s | 29GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | FP16 | 16.64 tok/s | 19GB |
| google/gemma-2-9b-it | FP16 | 16.57 tok/s | 20GB |
| ai-forever/ruGPT-3.5-13B | FP16 | 15.99 tok/s | 27GB |
| moonshotai/Kimi-Linear-48B-A3B-Instruct | Q8 | 15.64 tok/s | 50GB |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q8 | 15.64 tok/s | 68GB |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q8 | 15.47 tok/s | 68GB |
| microsoft/Phi-3-medium-128k-instruct | FP16 | 15.37 tok/s | 29GB |
| Qwen/Qwen3-14B-Base | FP16 | 15.34 tok/s | 29GB |
| 01-ai/Yi-1.5-34B-Chat | Q8 | 15.32 tok/s | 35GB |
| meta-llama/Llama-3.1-8B-Instruct | FP16 | 15.30 tok/s | 17GB |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | Q4 | 15.21 tok/s | 69GB |
| mistralai/Ministral-3-14B-Instruct-2512 | FP16 | 15.15 tok/s | 32GB |
| deepseek-ai/deepseek-coder-33b-instruct | Q8 | 14.98 tok/s | 34GB |
| baichuan-inc/Baichuan-M2-32B | Q8 | 14.89 tok/s | 33GB |
| Qwen/QwQ-32B-Preview | Q8 | 14.84 tok/s | 34GB |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | 14.76 tok/s | 68GB |
| deepseek-ai/DeepSeek-V2.5 | Q8 | 14.74 tok/s | 656GB |
| Qwen/Qwen2.5-32B | Q8 | 14.72 tok/s | 33GB |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | 14.35 tok/s | 68GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q8 | 14.06 tok/s | 33GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q8 | 13.93 tok/s | 35GB |
| meta-llama/Meta-Llama-3-70B-Instruct | Q8 | 13.75 tok/s | 68GB |
| Qwen/Qwen2.5-Coder-32B-Instruct | Q8 | 13.55 tok/s | 34GB |
| Qwen/Qwen2.5-32B-Instruct | Q8 | 13.27 tok/s | 33GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q8 | 13.23 tok/s | 33GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | FP16 | 13.04 tok/s | 41GB |
| codellama/CodeLlama-34b-hf | Q8 | 12.92 tok/s | 35GB |
| Qwen/Qwen2.5-32B-Instruct | Q8 | 12.90 tok/s | 34GB |
| moonshotai/Kimi-K2-Thinking | Q8 | 12.90 tok/s | 978GB |
| Qwen/Qwen3-32B | Q8 | 12.86 tok/s | 33GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | FP16 | 12.82 tok/s | 61GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | FP16 | 12.77 tok/s | 61GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | FP16 | 12.73 tok/s | 61GB |
| Qwen/Qwen2.5-72B-Instruct | Q4 | 12.72 tok/s | 36GB |
| openai/gpt-oss-safeguard-20b | FP16 | 12.71 tok/s | 44GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q4 | 12.71 tok/s | 39GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | FP16 | 12.62 tok/s | 61GB |
| Qwen/Qwen2.5-Math-72B-Instruct | Q4 | 12.54 tok/s | 36GB |
| mistralai/Mistral-Small-Instruct-2409 | FP16 | 12.43 tok/s | 46GB |
| openai/gpt-oss-120b | Q4 | 12.10 tok/s | 59GB |
| openai/gpt-oss-20b | FP16 | 12.01 tok/s | 41GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | FP16 | 12.01 tok/s | 41GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q4 | 11.97 tok/s | 44GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q4 | 11.89 tok/s | 39GB |
| Qwen/Qwen3-30B-A3B | FP16 | 11.89 tok/s | 61GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q4 | 11.80 tok/s | 39GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | FP16 | 11.80 tok/s | 61GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q4 | 11.66 tok/s | 39GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | FP16 | 11.66 tok/s | 61GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | FP16 | 11.64 tok/s | 61GB |
| NousResearch/Hermes-3-Llama-3.1-70B | Q4 | 11.62 tok/s | 34GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | FP16 | 11.53 tok/s | 61GB |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | 11.51 tok/s | 34GB |
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | Q4 | 11.42 tok/s | 34GB |
| Qwen/Qwen2.5-72B-Instruct | Q4 | 11.37 tok/s | 35GB |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | 11.29 tok/s | 34GB |
| google/gemma-2-27b-it | FP16 | 11.16 tok/s | 56GB |
| unsloth/gpt-oss-20b-BF16 | FP16 | 11.05 tok/s | 41GB |
| mistralai/Mistral-Large-Instruct-2411 | Q4 | 10.98 tok/s | 60GB |
| AI-MO/Kimina-Prover-72B | Q4 | 10.49 tok/s | 35GB |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | Q8 | 9.72 tok/s | 138GB |
| deepseek-ai/DeepSeek-Math-V2 | Q4 | 9.10 tok/s | 383GB |
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | Q8 | 8.82 tok/s | 69GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q8 | 8.75 tok/s | 78GB |
| Qwen/Qwen2.5-72B-Instruct | Q8 | 8.73 tok/s | 71GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q8 | 8.70 tok/s | 88GB |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | Q4 | 8.55 tok/s | 115GB |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | FP16 | 8.48 tok/s | 137GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q8 | 8.40 tok/s | 78GB |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | 8.27 tok/s | 69GB |
| baichuan-inc/Baichuan-M2-32B | FP16 | 8.23 tok/s | 66GB |
| NousResearch/Hermes-3-Llama-3.1-70B | Q8 | 8.18 tok/s | 69GB |
| Qwen/Qwen2.5-Coder-32B-Instruct | FP16 | 8.14 tok/s | 67GB |
| Qwen/Qwen2.5-Math-72B-Instruct | Q8 | 8.10 tok/s | 71GB |
| Qwen/Qwen2.5-32B-Instruct | FP16 | 8.10 tok/s | 66GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | FP16 | 8.05 tok/s | 70GB |
| AI-MO/Kimina-Prover-72B | Q8 | 8.04 tok/s | 70GB |
| Qwen/Qwen2.5-32B-Instruct | FP16 | 8.01 tok/s | 67GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q8 | 8.01 tok/s | 78GB |
| mistralai/Mistral-Large-Instruct-2411 | Q8 | 7.89 tok/s | 120GB |
| moonshotai/Kimi-Linear-48B-A3B-Instruct | FP16 | 7.88 tok/s | 101GB |
| meta-llama/Llama-3.1-70B-Instruct | FP16 | 7.79 tok/s | 137GB |
| Qwen/QwQ-32B-Preview | FP16 | 7.76 tok/s | 67GB |
| meta-llama/Llama-3.3-70B-Instruct | FP16 | 7.71 tok/s | 137GB |
| openai/gpt-oss-120b | Q8 | 7.69 tok/s | 117GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q8 | 7.68 tok/s | 78GB |
| Qwen/Qwen2.5-72B-Instruct | Q8 | 7.67 tok/s | 70GB |
| 01-ai/Yi-1.5-34B-Chat | FP16 | 7.58 tok/s | 70GB |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | FP16 | 7.57 tok/s | 137GB |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | 7.55 tok/s | 69GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | FP16 | 7.55 tok/s | 66GB |
| meta-llama/Meta-Llama-3-70B-Instruct | FP16 | 7.53 tok/s | 137GB |
| deepseek-ai/DeepSeek-V2.5 | FP16 | 7.49 tok/s | 1312GB |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q4 | 7.40 tok/s | 378GB |
| moonshotai/Kimi-K2-Thinking | FP16 | 7.39 tok/s | 1956GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | FP16 | 7.38 tok/sEstimated Auto-generated benchmark | 66GB |
| Qwen/Qwen3-235B-A22B | Q4 | 7.37 tok/sEstimated Auto-generated benchmark | 115GB |
| deepseek-ai/deepseek-coder-33b-instruct | FP16 | 7.36 tok/sEstimated Auto-generated benchmark | 68GB |
| codellama/CodeLlama-34b-hf | FP16 | 7.32 tok/sEstimated Auto-generated benchmark | 70GB |
| Qwen/Qwen2.5-32B | FP16 | 7.18 tok/sEstimated Auto-generated benchmark | 66GB |
| Qwen/Qwen3-32B | FP16 | 7.00 tok/sEstimated Auto-generated benchmark | 66GB |
| MiniMaxAI/MiniMax-M1-40k | Q4 | 6.73 tok/sEstimated Auto-generated benchmark | 255GB |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | Q8 | 6.52 tok/sEstimated Auto-generated benchmark | 231GB |
| MiniMaxAI/MiniMax-VL-01 | Q4 | 6.50 tok/sEstimated Auto-generated benchmark | 256GB |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | FP16 | 5.65 tok/sEstimated Auto-generated benchmark | 275GB |
| deepseek-ai/DeepSeek-Math-V2 | Q8 | 5.53 tok/sEstimated Auto-generated benchmark | 766GB |
| MiniMaxAI/MiniMax-VL-01 | Q8 | 4.97 tok/sEstimated Auto-generated benchmark | 511GB |
| Qwen/Qwen3-235B-A22B | Q8 | 4.94 tok/sEstimated Auto-generated benchmark | 230GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | FP16 | 4.84 tok/sEstimated Auto-generated benchmark | 176GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | FP16 | 4.80 tok/sEstimated Auto-generated benchmark | 156GB |
| meta-llama/Llama-3.1-70B-Instruct | FP16 | 4.79 tok/sEstimated Auto-generated benchmark | 138GB |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q8 | 4.78 tok/sEstimated Auto-generated benchmark | 755GB |
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | FP16 | 4.78 tok/sEstimated Auto-generated benchmark | 138GB |
| Qwen/Qwen2.5-72B-Instruct | FP16 | 4.59 tok/sEstimated Auto-generated benchmark | 142GB |
| MiniMaxAI/MiniMax-M1-40k | Q8 | 4.39 tok/sEstimated Auto-generated benchmark | 510GB |
| Qwen/Qwen2.5-72B-Instruct | FP16 | 4.37 tok/sEstimated Auto-generated benchmark | 141GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | FP16 | 4.36 tok/sEstimated Auto-generated benchmark | 156GB |
| mistralai/Mistral-Large-Instruct-2411 | FP16 | 4.31 tok/sEstimated Auto-generated benchmark | 240GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking | FP16 | 4.22 tok/sEstimated Auto-generated benchmark | 156GB |
| AI-MO/Kimina-Prover-72B | FP16 | 4.21 tok/sEstimated Auto-generated benchmark | 141GB |
| meta-llama/Llama-3.3-70B-Instruct | FP16 | 4.19 tok/sEstimated Auto-generated benchmark | 138GB |
| NousResearch/Hermes-3-Llama-3.1-70B | FP16 | 4.09 tok/sEstimated Auto-generated benchmark | 138GB |
| openai/gpt-oss-120b | FP16 | 4.02 tok/sEstimated Auto-generated benchmark | 235GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct | FP16 | 4.00 tok/sEstimated Auto-generated benchmark | 156GB |
| Qwen/Qwen2.5-Math-72B-Instruct | FP16 | 3.98 tok/sEstimated Auto-generated benchmark | 142GB |
| deepseek-ai/DeepSeek-Math-V2 | FP16 | 3.28 tok/sEstimated Auto-generated benchmark | 1532GB |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | FP16 | 3.24 tok/sEstimated Auto-generated benchmark | 461GB |
| MiniMaxAI/MiniMax-VL-01 | FP16 | 2.79 tok/sEstimated Auto-generated benchmark | 1021GB |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | FP16 | 2.74 tok/sEstimated Auto-generated benchmark | 1509GB |
| MiniMaxAI/MiniMax-M1-40k | FP16 | 2.70 tok/sEstimated Auto-generated benchmark | 1020GB |
| Qwen/Qwen3-235B-A22B | FP16 | 2.69 tok/sEstimated Auto-generated benchmark | 460GB |
Note: All throughput figures above are auto-generated performance estimates, not measured results. Real-world numbers may vary.
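These VRAM figures track a simple rule of thumb: model weights take roughly 0.5 bytes per parameter at Q4, 1 byte at Q8, and 2 bytes at FP16, plus a little runtime overhead. The sketch below reproduces that arithmetic; the 5% overhead factor is our assumption, not a figure taken from the benchmark data.

```python
# Back-of-the-envelope VRAM estimator mirroring the table's pattern.
# Assumed bytes-per-parameter for each quantization level; the 1.05
# overhead factor (KV cache, runtime buffers) is an assumption.
BYTES_PER_PARAM = {"Q4": 0.5, "Q8": 1.0, "FP16": 2.0}

def estimate_vram_gb(params_billion: float, quant: str, overhead: float = 1.05) -> float:
    """Weights-only footprint scaled by a small overhead factor."""
    return params_billion * BYTES_PER_PARAM[quant] * overhead

# Sanity check: a 70B model comes out in the mid-30 GB range at Q4 and
# around 140-150 GB at FP16, in line with the 70B rows above.
for quant in ("Q4", "Q8", "FP16"):
    print(f"70B @ {quant}: ~{estimate_vram_gb(70, quant):.0f} GB")
```

The compatibility table below runs the same estimates against this card's 12 GB budget.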
| Model | Quantization | Verdict | Estimated speed | VRAM needed |
|---|---|---|---|---|
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q4 | Not supported | 7.40 tok/s | 378GB (have 12GB) |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q8 | Not supported | 4.78 tok/s | 755GB (have 12GB) |
| EssentialAI/rnj-1 | FP16 | Not supported | 17.87 tok/s | 19GB (have 12GB) |
| EssentialAI/rnj-1 | Q8 | Fits comfortably | 31.27 tok/s | 10GB (have 12GB) |
| EssentialAI/rnj-1 | Q4 | Fits comfortably | 42.44 tok/s | 5GB (have 12GB) |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | FP16 | Not supported | 2.74 tok/s | 1509GB (have 12GB) |
| openai/gpt-oss-20b | FP16 | Not supported | 11.70 tok/s | 41GB (have 12GB) |
| openai/gpt-oss-20b | Q8 | Not supported | 22.05 tok/s | 20GB (have 12GB) |
| openai-community/gpt2 | FP16 | Not supported | 21.47 tok/s | 15GB (have 12GB) |
| Qwen/Qwen3-Embedding-0.6B | FP16 | Not supported | 19.97 tok/s | 13GB (have 12GB) |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | Fits comfortably | 63.16 tok/s | 3GB (have 12GB) |
| Qwen/Qwen2.5-1.5B-Instruct | FP16 | Fits (tight) | 21.46 tok/s | 11GB (have 12GB) |
| facebook/opt-125m | Q4 | Fits comfortably | 55.65 tok/s | 4GB (have 12GB) |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | Fits comfortably | 44.83 tok/s | 1GB (have 12GB) |
| Qwen/Qwen3-Embedding-0.6B | Q8 | Fits comfortably | 37.70 tok/s | 6GB (have 12GB) |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 41.88 tok/s | 9GB (have 12GB) |
| Qwen/Qwen3-4B-Instruct-2507 | Q4 | Fits comfortably | 54.30 tok/s | 2GB (have 12GB) |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | FP16 | Not supported | 20.66 tok/s | 15GB (have 12GB) |
| Qwen/Qwen3-4B-Instruct-2507 | Q8 | Fits comfortably | 43.66 tok/s | 4GB (have 12GB) |
| Qwen/Qwen3-4B-Instruct-2507 | FP16 | Fits comfortably | 22.81 tok/s | 9GB (have 12GB) |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | Fits comfortably | 68.96 tok/s | 1GB (have 12GB) |
| openai/gpt-oss-120b | Q8 | Not supported | 7.37 tok/s | 117GB (have 12GB) |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q8 | Fits comfortably | 52.11 tok/s | 3GB (have 12GB) |
| mistralai/Mistral-7B-Instruct-v0.2 | FP16 | Not supported | 19.93 tok/s | 15GB (have 12GB) |
| Qwen/Qwen3-8B | Q4 | Fits comfortably | 61.73 tok/s | 4GB (have 12GB) |
| inference-net/Schematron-3B | Q8 | Fits comfortably | 47.51 tok/s | 3GB (have 12GB) |
| inference-net/Schematron-3B | FP16 | Fits comfortably | 28.57 tok/s | 6GB (have 12GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q4 | Not supported | 19.90 tok/s | 16GB (have 12GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q8 | Not supported | 13.89 tok/s | 33GB (have 12GB) |
| petals-team/StableBeluga2 | FP16 | Not supported | 20.58 tok/s | 15GB (have 12GB) |
| google/gemma-3-1b-it | Q8 | Fits comfortably | 46.12 tok/s | 1GB (have 12GB) |
| meta-llama/Meta-Llama-3-8B | FP16 | Not supported | 22.06 tok/s | 17GB (have 12GB) |
| meta-llama/Meta-Llama-3-8B | Q8 | Fits comfortably | 39.90 tok/s | 9GB (have 12GB) |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | Fits comfortably | 42.32 tok/s | 7GB (have 12GB) |
| Qwen/Qwen2.5-7B | Q4 | Fits comfortably | 57.62 tok/s | 4GB (have 12GB) |
| Qwen/Qwen2.5-7B | Q8 | Fits comfortably | 41.27 tok/s | 7GB (have 12GB) |
| allenai/OLMo-2-0425-1B | Q4 | Fits comfortably | 70.55 tok/s | 1GB (have 12GB) |
| allenai/OLMo-2-0425-1B | Q8 | Fits comfortably | 47.76 tok/s | 1GB (have 12GB) |
| openai-community/gpt2-large | Q8 | Fits comfortably | 39.01 tok/s | 7GB (have 12GB) |
| openai-community/gpt2-large | FP16 | Not supported | 23.91 tok/s | 15GB (have 12GB) |
| Qwen/Qwen3-1.7B | Q4 | Fits comfortably | 63.01 tok/s | 4GB (have 12GB) |
| Qwen/Qwen3-Reranker-0.6B | Q4 | Fits comfortably | 56.53 tok/s | 3GB (have 12GB) |
| Qwen/Qwen3-Reranker-0.6B | Q8 | Fits comfortably | 44.47 tok/s | 6GB (have 12GB) |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q4 | Not supported | 20.20 tok/s | 17GB (have 12GB) |
| Qwen/Qwen3-Reranker-0.6B | FP16 | Not supported | 23.28 tok/s | 13GB (have 12GB) |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | Fits comfortably | 38.88 tok/s | 9GB (have 12GB) |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q8 | Not supported | 14.66 tok/s | 35GB (have 12GB) |
| dphn/dolphin-2.9.1-yi-1.5-34b | FP16 | Not supported | 7.73 tok/s | 70GB (have 12GB) |
| openai/gpt-oss-20b | Q4 | Fits comfortably | 32.31 tok/s | 10GB (have 12GB) |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | Fits comfortably | 42.47 tok/s | 4GB (have 12GB) |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | FP16 | Fits comfortably | 22.35 tok/s | 9GB (have 12GB) |
| Qwen/Qwen2.5-1.5B | Q4 | Fits comfortably | 58.31 tok/s | 3GB (have 12GB) |
| Qwen/Qwen2.5-1.5B | FP16 | Fits (tight) | 23.09 tok/s | 11GB (have 12GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q8 | Fits comfortably | 43.49 tok/s | 9GB (have 12GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | FP16 | Not supported | 22.81 tok/s | 17GB (have 12GB) |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | Not supported | 13.72 tok/s | 68GB (have 12GB) |
| meta-llama/Llama-3.3-70B-Instruct | FP16 | Not supported | 7.57 tok/s | 137GB (have 12GB) |
| Qwen/Qwen3-Embedding-8B | Q4 | Fits comfortably | 63.06 tok/s | 4GB (have 12GB) |
| Qwen/Qwen3-14B | FP16 | Not supported | 16.95 tok/s | 29GB (have 12GB) |
| Qwen/Qwen2.5-0.5B | Q4 | Fits comfortably | 60.14 tok/s | 3GB (have 12GB) |
| Qwen/Qwen2.5-0.5B | Q8 | Fits comfortably | 40.97 tok/s | 5GB (have 12GB) |
| Qwen/Qwen2.5-3B-Instruct | Q4 | Fits comfortably | 72.84 tok/s | 2GB (have 12GB) |
| microsoft/phi-2 | FP16 | Not supported | 21.01 tok/s | 15GB (have 12GB) |
| microsoft/phi-2 | Q8 | Fits comfortably | 39.28 tok/s | 7GB (have 12GB) |
| meta-llama/Llama-3.2-1B | Q8 | Fits comfortably | 47.73 tok/s | 1GB (have 12GB) |
| microsoft/phi-2 | Q4 | Fits comfortably | 59.21 tok/s | 4GB (have 12GB) |
| Qwen/Qwen2.5-3B-Instruct | Q8 | Fits comfortably | 44.34 tok/s | 3GB (have 12GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | Fits comfortably | 63.09 tok/s | 4GB (have 12GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | Fits comfortably | 40.39 tok/s | 7GB (have 12GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | FP16 | Not supported | 21.18 tok/s | 15GB (have 12GB) |
| HuggingFaceTB/SmolLM2-135M | Q4 | Fits comfortably | 53.54 tok/s | 4GB (have 12GB) |
| HuggingFaceTB/SmolLM2-135M | Q8 | Fits comfortably | 43.94 tok/s | 7GB (have 12GB) |
| zai-org/GLM-4.6-FP8 | Q4 | Fits comfortably | 54.59 tok/s | 4GB (have 12GB) |
| zai-org/GLM-4.6-FP8 | Q8 | Fits comfortably | 38.90 tok/s | 7GB (have 12GB) |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | Fits comfortably | 47.28 tok/s | 3GB (have 12GB) |
| microsoft/phi-4 | Q4 | Fits comfortably | 58.52 tok/s | 4GB (have 12GB) |
| Qwen/Qwen2.5-7B | FP16 | Not supported | 23.44 tok/s | 15GB (have 12GB) |
| meta-llama/Meta-Llama-3-8B-Instruct | FP16 | Not supported | 23.55 tok/s | 17GB (have 12GB) |
| microsoft/phi-4 | Q8 | Fits comfortably | 43.34 tok/s | 7GB (have 12GB) |
| microsoft/phi-4 | FP16 | Not supported | 22.94 tok/s | 15GB (have 12GB) |
| deepseek-ai/DeepSeek-V3.1 | Q4 | Fits comfortably | 62.65 tok/s | 4GB (have 12GB) |
| deepseek-ai/DeepSeek-V3.1 | Q8 | Fits comfortably | 37.30 tok/s | 7GB (have 12GB) |
| deepseek-ai/DeepSeek-V3.1 | FP16 | Not supported | 21.89 tok/s | 15GB (have 12GB) |
| mistralai/Mistral-7B-Instruct-v0.1 | Q4 | Fits comfortably | 59.18 tok/s | 4GB (have 12GB) |
| mistralai/Mistral-7B-Instruct-v0.1 | Q8 | Fits comfortably | 39.23 tok/s | 7GB (have 12GB) |
| mistralai/Mistral-7B-Instruct-v0.1 | FP16 | Not supported | 21.37 tok/s | 15GB (have 12GB) |
| meta-llama/Meta-Llama-3-70B-Instruct | FP16 | Not supported | 8.15 tok/s | 137GB (have 12GB) |
| unsloth/gpt-oss-20b-BF16 | Q4 | Fits comfortably | 33.41 tok/s | 10GB (have 12GB) |
| unsloth/gpt-oss-20b-BF16 | Q8 | Not supported | 22.27 tok/s | 20GB (have 12GB) |
| unsloth/gpt-oss-20b-BF16 | FP16 | Not supported | 11.86 tok/s | 41GB (have 12GB) |
| HuggingFaceTB/SmolLM-135M | FP16 | Not supported | 23.44 tok/s | 15GB (have 12GB) |
| Qwen/Qwen2.5-Math-1.5B | Q4 | Fits comfortably | 60.43 tok/s | 3GB (have 12GB) |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q8 | Fits comfortably | 43.35 tok/s | 7GB (have 12GB) |
| Qwen/Qwen3-8B-Base | Q4 | Fits comfortably | 57.96 tok/s | 4GB (have 12GB) |
| Qwen/Qwen3-8B-Base | FP16 | Not supported | 22.57 tok/s | 17GB (have 12GB) |
| microsoft/Phi-4-multimodal-instruct | Q4 | Fits comfortably | 56.79 tok/s | 4GB (have 12GB) |
| Qwen/Qwen3-30B-A3B | FP16 | Not supported | 12.13 tok/s | 61GB (have 12GB) |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q8 | Fits comfortably | 38.92 tok/s | 7GB (have 12GB) |
| Qwen/Qwen2.5-14B-Instruct | Q4 | Fits comfortably | 39.83 tok/s | 8GB (have 12GB) |
| Qwen/Qwen2.5-0.5B | FP16 | Fits (tight) | 24.11 tok/s | 11GB (have 12GB) |
Note: All speeds above are auto-generated performance estimates, not measured results. Real-world numbers may vary.
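The verdicts follow mechanically from comparing each row's VRAM estimate against the card's 12 GB. Below is a minimal sketch of that decision rule; the 90% "tight" threshold is an assumption inferred from the table (11 GB needed on a 12 GB card shows as "Fits (tight)"), and the site's actual cutoff may differ.

```python
# A minimal sketch of the fit verdicts used in the table above.
# The 0.9 "tight" threshold is an inferred assumption, not the
# site's documented logic.

def fit_verdict(vram_needed_gb: float, vram_have_gb: float = 12.0) -> str:
    if vram_needed_gb > vram_have_gb:
        return "Not supported"
    if vram_needed_gb / vram_have_gb > 0.9:
        return "Fits (tight)"
    return "Fits comfortably"

print(fit_verdict(5))   # Fits comfortably (EssentialAI/rnj-1 @ Q4)
print(fit_verdict(11))  # Fits (tight)     (Qwen2.5-1.5B @ FP16)
print(fit_verdict(41))  # Not supported    (gpt-oss-20b @ FP16)
```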
Data-backed answers pulled from community benchmarks, manufacturer specs, and live pricing.
Even with modest specs, a 12 GB RTX 3060 can drive Q8-quantized 7B models at over 60 tokens/sec, fast enough for iterative coding and agent workloads.
Source: Reddit – /r/LocalLLaMA (l6nfptd)
One builder running three RTX 3060 cards reports Gemma 3 27B Q4 at ~15 tok/sec, Mistral 24B Q4 at ~18 tok/sec, and DeepSeek R1 32B Q4 at ~20 tok/sec via Ollama.
Source: Reddit – /r/LocalLLaMA (mo6ttds)
Doubling up on GPUs does not double throughput: 2× RTX 3060 was projected to hit ~29 tok/sec on DeepSeek R1 32B (16K context), but real benchmarks landed closer to 14 tok/sec.
Source: Reddit – /r/LocalLLaMA (mq781cj)
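Putting numbers on that gap: the projection assumed near-linear scaling, while the measured result realized roughly half of it. A toy calculation using only the figures from that report; the 50% planning discount is our rule of thumb, not a universal constant.

```python
# Quantifying the gap in the report above: linear scaling predicted
# ~29 tok/s for 2x RTX 3060 on DeepSeek R1 32B Q4; reality was ~14.
projected_tps = 29.0                      # naive linear-scaling estimate
observed_tps = 14.0                       # community benchmark result
efficiency = observed_tps / projected_tps
print(f"Realized {efficiency:.0%} of projection")  # ~48%

# A cautious planning rule (our assumption, not a measured constant):
# discount multi-GPU throughput projections by about half.
def plan_multi_gpu_tps(projected: float, discount: float = 0.5) -> float:
    return projected * discount
```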
A dual-Xeon workstation without GPU offload managed only ~1.68 tok/sec on DeepSeek R1 Q4, which shows why even a single 3060 is a major upgrade.
Source: Reddit – /r/LocalLLaMA (mm9ladj)
The RTX 3060 12 GB draws 170 W under load, uses an 8-pin PCIe power connector, and NVIDIA recommends a 550 W PSU. As of November 2025, the card sold for around $329 on Amazon.
Source: TechPowerUp – RTX 3060 Specs
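The 550 W recommendation is easy to sanity-check: add the GPU's board power to a typical platform budget and leave headroom for transients. A rough sketch; the 200 W platform figure and 30% margin are our assumptions for a mainstream desktop, not TechPowerUp numbers.

```python
# Rough PSU sizing check for a single-3060 build. The platform budget
# and transient margin below are assumptions, not figures from the source.
gpu_load_w = 170                 # RTX 3060 board power under load
platform_w = 200                 # assumed CPU, board, drives, fans
margin = 1.3                     # headroom for power spikes
suggested = (gpu_load_w + platform_w) * margin
print(f"Suggested PSU: ~{suggested:.0f} W")  # ~481 W, so the 550 W tier fits
```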
Explore how RTX 3070 stacks up for local inference workloads.
Explore how RTX 4060 Ti 16GB stacks up for local inference workloads.
Explore how RX 6800 XT stacks up for local inference workloads.
Explore how RTX 3080 stacks up for local inference workloads.
Explore how RTX 4070 stacks up for local inference workloads.