Quick Answer: The RTX 3080 offers 10GB of VRAM, starts around $520.59, and typically draws 320W under load. It delivers an estimated 90 tokens/sec on allenai/OLMo-2-0425-1B.
This GPU offers reliable throughput for local AI workloads. Pair it with the right model quantization to hit your target tokens/sec, and monitor prices below to catch the best deal.
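As a rough rule of thumb (an assumption for illustration, not this site's exact methodology), the VRAM a quantized model needs scales with parameter count times bytes per weight, plus some fixed overhead for the KV cache and activations:

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: int,
                     overhead_gb: float = 0.75) -> float:
    """Rough VRAM estimate for inference: quantized weights plus a fixed
    overhead for KV cache and activations. Both the formula and the
    0.75GB overhead are illustrative assumptions."""
    weight_gb = params_billion * bits_per_weight / 8  # 1B params at 8 bits ~= 1GB
    return weight_gb + overhead_gb

# e.g. an 8B model at Q4 (~4 bits/weight)
print(round(estimate_vram_gb(8, 4), 2))  # 4.75
```

Estimates like this explain why Q4 roughly halves the footprint of Q8 in the tables below, though real loaders add context-length-dependent overhead.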
| Model | Quantization | Tokens/sec | VRAM used |
|---|---|---|---|
| allenai/OLMo-2-0425-1B | Q4 | 90.42 tok/s | 1GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | 89.56 tok/s | 1GB |
| unsloth/gemma-3-1b-it | Q4 | 86.19 tok/s | 1GB |
| unsloth/Llama-3.2-1B-Instruct | Q4 | 84.89 tok/s | 1GB |
| meta-llama/Llama-3.2-1B | Q4 | 84.59 tok/s | 1GB |
| meta-llama/Llama-Guard-3-1B | Q4 | 84.47 tok/s | 1GB |
| google/gemma-3-1b-it | Q4 | 82.79 tok/s | 1GB |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | 82.64 tok/s | 1GB |
| apple/OpenELM-1_1B-Instruct | Q4 | 80.70 tok/s | 1GB |
| google/gemma-2-2b-it | Q4 | 70.61 tok/s | 1GB |
| ibm-granite/granite-3.3-2b-instruct | Q4 | 68.60 tok/s | 1GB |
| google/gemma-2b | Q4 | 68.47 tok/s | 1GB |
| apple/OpenELM-1_1B-Instruct | Q8 | 65.86 tok/s | 1GB |
| unsloth/gemma-3-1b-it | Q8 | 64.94 tok/s | 1GB |
| unsloth/Llama-3.2-1B-Instruct | Q8 | 64.77 tok/s | 1GB |
| ibm-research/PowerMoE-3b | Q4 | 63.55 tok/s | 2GB |
| meta-llama/Llama-3.2-1B | Q8 | 63.22 tok/s | 1GB |
| LiquidAI/LFM2-1.2B | Q4 | 62.85 tok/s | 1GB |
| google-t5/t5-3b | Q4 | 61.32 tok/s | 2GB |
| meta-llama/Llama-3.2-1B-Instruct | Q8 | 58.52 tok/s | 1GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 58.32 tok/s | 2GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | 57.87 tok/s | 2GB |
| unsloth/Llama-3.2-3B-Instruct | Q4 | 57.19 tok/s | 2GB |
| Qwen/Qwen2.5-3B-Instruct | Q4 | 56.44 tok/s | 2GB |
| meta-llama/Llama-3.2-3B | Q4 | 56.36 tok/s | 2GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q4 | 56.23 tok/s | 2GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | 56.05 tok/s | 1GB |
| Qwen/Qwen3-4B-Base | Q4 | 55.70 tok/s | 2GB |
| allenai/OLMo-2-0425-1B | Q8 | 55.70 tok/s | 1GB |
| bigcode/starcoder2-3b | Q4 | 55.55 tok/s | 2GB |
| Qwen/Qwen2.5-3B | Q4 | 55.34 tok/s | 2GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q4 | 55.16 tok/s | 2GB |
| meta-llama/Llama-Guard-3-1B | Q8 | 55.13 tok/s | 1GB |
| google/gemma-3-1b-it | Q8 | 54.36 tok/s | 1GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | 53.55 tok/s | 2GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q4 | 53.47 tok/s | 2GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | 53.09 tok/s | 2GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q4 | 52.79 tok/s | 2GB |
| inference-net/Schematron-3B | Q4 | 52.73 tok/s | 2GB |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | 52.52 tok/s | 3GB |
| Qwen/Qwen2-0.5B-Instruct | Q4 | 52.18 tok/s | 3GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q4 | 52.15 tok/s | 3GB |
| Qwen/Qwen3-Embedding-4B | Q4 | 52.00 tok/s | 2GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q4 | 51.72 tok/s | 2GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q4 | 51.48 tok/s | 3GB |
| Qwen/Qwen2-1.5B-Instruct | Q4 | 50.94 tok/s | 3GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q4 | 50.79 tok/s | 2GB |
| Qwen/Qwen2.5-1.5B | Q4 | 50.18 tok/s | 3GB |
| microsoft/VibeVoice-1.5B | Q4 | 49.45 tok/s | 3GB |
| LiquidAI/LFM2-1.2B | Q8 | 49.42 tok/s | 2GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q4 | 49.28 tok/s | 3GB |
| Qwen/Qwen3-4B | Q4 | 49.13 tok/s | 2GB |
| Qwen/Qwen2.5-0.5B | Q4 | 48.69 tok/s | 3GB |
| Qwen/Qwen3-0.6B | Q4 | 46.85 tok/s | 3GB |
| Qwen/Qwen2.5-Math-1.5B | Q4 | 46.77 tok/s | 3GB |
| Qwen/Qwen2-7B-Instruct | Q4 | 46.72 tok/s | 4GB |
| openai-community/gpt2-large | Q4 | 46.59 tok/s | 4GB |
| Qwen/Qwen2-0.5B | Q4 | 46.46 tok/s | 3GB |
| deepseek-ai/DeepSeek-V3-0324 | Q4 | 46.44 tok/s | 4GB |
| microsoft/Phi-3.5-vision-instruct | Q4 | 46.43 tok/s | 4GB |
| ibm-granite/granite-3.3-2b-instruct | Q8 | 46.33 tok/s | 2GB |
| lmsys/vicuna-7b-v1.5 | Q4 | 46.19 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1 | Q4 | 46.14 tok/s | 4GB |
| openai-community/gpt2-xl | Q4 | 46.05 tok/s | 4GB |
| microsoft/DialoGPT-medium | Q4 | 45.92 tok/s | 4GB |
| vikhyatk/moondream2 | Q4 | 45.88 tok/s | 4GB |
| microsoft/DialoGPT-small | Q4 | 45.72 tok/s | 4GB |
| petals-team/StableBeluga2 | Q4 | 45.66 tok/s | 4GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q4 | 45.59 tok/s | 4GB |
| Qwen/Qwen3-1.7B-Base | Q4 | 45.52 tok/s | 4GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | 45.39 tok/s | 3GB |
| sshleifer/tiny-gpt2 | Q4 | 45.16 tok/s | 4GB |
| google/gemma-2-2b-it | Q8 | 45.04 tok/s | 2GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | 44.85 tok/s | 3GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q4 | 44.82 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3 | Q4 | 44.77 tok/s | 4GB |
| hmellor/tiny-random-LlamaForCausalLM | Q4 | 44.71 tok/s | 4GB |
| mistralai/Mistral-7B-v0.1 | Q4 | 44.63 tok/s | 4GB |
| meta-llama/Llama-2-7b-hf | Q4 | 44.56 tok/s | 4GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | 44.53 tok/s | 4GB |
| parler-tts/parler-tts-large-v1 | Q4 | 44.48 tok/s | 4GB |
| HuggingFaceTB/SmolLM-135M | Q4 | 44.47 tok/s | 4GB |
| openai-community/gpt2-medium | Q4 | 44.25 tok/s | 4GB |
| llamafactory/tiny-random-Llama-3 | Q4 | 44.20 tok/s | 4GB |
| Qwen/Qwen3-Embedding-0.6B | Q4 | 44.15 tok/s | 3GB |
| meta-llama/Llama-Guard-3-8B | Q4 | 44.11 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-0528 | Q4 | 44.07 tok/s | 4GB |
| facebook/opt-125m | Q4 | 44.04 tok/s | 4GB |
| meta-llama/Llama-3.2-3B | Q8 | 43.99 tok/s | 3GB |
| Qwen/Qwen2.5-7B | Q4 | 43.99 tok/s | 4GB |
| meta-llama/Llama-2-7b-chat-hf | Q4 | 43.86 tok/s | 4GB |
| ibm-research/PowerMoE-3b | Q8 | 43.73 tok/s | 3GB |
| Qwen/Qwen3-Embedding-8B | Q4 | 43.70 tok/s | 4GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q4 | 43.69 tok/s | 4GB |
| Qwen/Qwen3-0.6B-Base | Q4 | 43.68 tok/s | 3GB |
| meta-llama/Llama-3.1-8B | Q4 | 43.55 tok/s | 4GB |
| zai-org/GLM-4.5-Air | Q4 | 43.50 tok/s | 4GB |
| microsoft/Phi-4-multimodal-instruct | Q4 | 43.50 tok/s | 4GB |
| unsloth/Llama-3.2-3B-Instruct | Q8 | 43.37 tok/s | 3GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | 43.36 tok/s | 4GB |
| Qwen/Qwen2.5-3B-Instruct | Q8 | 43.34 tok/s | 3GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q4 | 43.22 tok/s | 4GB |
| IlyaGusev/saiga_llama3_8b | Q4 | 42.91 tok/s | 4GB |
| microsoft/Phi-4-mini-instruct | Q4 | 42.90 tok/s | 4GB |
| EleutherAI/gpt-neo-125m | Q4 | 42.88 tok/s | 4GB |
| google/gemma-2b | Q8 | 42.79 tok/s | 2GB |
| rednote-hilab/dots.ocr | Q4 | 42.75 tok/s | 4GB |
| ibm-granite/granite-3.3-8b-instruct | Q4 | 42.68 tok/s | 4GB |
| zai-org/GLM-4.6-FP8 | Q4 | 42.64 tok/s | 4GB |
| bigscience/bloomz-560m | Q4 | 42.40 tok/s | 4GB |
| BSC-LT/salamandraTA-7b-instruct | Q4 | 42.39 tok/s | 4GB |
| Qwen/Qwen3-Reranker-0.6B | Q4 | 42.38 tok/s | 3GB |
| meta-llama/Meta-Llama-3-8B | Q4 | 42.32 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3.1 | Q4 | 42.17 tok/s | 4GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 42.11 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q4 | 42.09 tok/s | 4GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | 42.06 tok/s | 4GB |
| dicta-il/dictalm2.0-instruct | Q4 | 41.95 tok/s | 4GB |
| Qwen/Qwen3-1.7B | Q4 | 41.88 tok/s | 4GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | 41.80 tok/s | 4GB |
| Qwen/Qwen3-8B | Q4 | 41.73 tok/s | 4GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q4 | 41.72 tok/s | 4GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q4 | 41.54 tok/s | 4GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q4 | 41.45 tok/s | 4GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q4 | 41.44 tok/s | 4GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | 41.40 tok/s | 3GB |
| microsoft/phi-4 | Q4 | 41.37 tok/s | 4GB |
| ibm-granite/granite-docling-258M | Q4 | 41.28 tok/s | 4GB |
| HuggingFaceTB/SmolLM2-135M | Q4 | 41.18 tok/s | 4GB |
| liuhaotian/llava-v1.5-7b | Q4 | 40.96 tok/s | 4GB |
| distilbert/distilgpt2 | Q4 | 40.79 tok/s | 4GB |
| GSAI-ML/LLaDA-8B-Instruct | Q4 | 40.78 tok/s | 4GB |
| Qwen/Qwen3-8B-FP8 | Q4 | 40.67 tok/s | 4GB |
| skt/kogpt2-base-v2 | Q4 | 40.55 tok/s | 4GB |
| microsoft/Phi-3-mini-4k-instruct | Q4 | 40.22 tok/s | 4GB |
| EleutherAI/pythia-70m-deduped | Q4 | 40.15 tok/s | 4GB |
| bigcode/starcoder2-3b | Q8 | 40.13 tok/s | 3GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q4 | 40.05 tok/s | 4GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q8 | 40.00 tok/s | 4GB |
| microsoft/phi-2 | Q4 | 39.89 tok/s | 4GB |
| rinna/japanese-gpt-neox-small | Q4 | 39.80 tok/s | 4GB |
| Qwen/Qwen3-4B-Base | Q8 | 39.68 tok/s | 4GB |
| HuggingFaceH4/zephyr-7b-beta | Q4 | 39.62 tok/s | 4GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | 39.44 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q8 | 39.39 tok/s | 4GB |
| numind/NuExtract-1.5 | Q4 | 39.36 tok/s | 4GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q4 | 39.29 tok/s | 5GB |
| inference-net/Schematron-3B | Q8 | 39.28 tok/s | 3GB |
| microsoft/Phi-3-mini-128k-instruct | Q4 | 39.11 tok/s | 4GB |
| Qwen/Qwen3-8B-Base | Q4 | 39.08 tok/s | 4GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q4 | 39.08 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | 39.04 tok/s | 4GB |
| openai-community/gpt2 | Q4 | 38.91 tok/s | 4GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q4 | 38.82 tok/s | 4GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q8 | 38.78 tok/s | 4GB |
| google/gemma-3-270m-it | Q4 | 38.70 tok/s | 4GB |
| MiniMaxAI/MiniMax-M2 | Q4 | 38.61 tok/s | 4GB |
| huggyllama/llama-7b | Q4 | 38.54 tok/s | 4GB |
| Qwen/Qwen2.5-3B | Q8 | 38.52 tok/s | 3GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q8 | 38.32 tok/s | 3GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q8 | 37.98 tok/s | 4GB |
| GSAI-ML/LLaDA-8B-Base | Q4 | 37.62 tok/s | 4GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q4 | 37.11 tok/s | 4GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q4 | 37.00 tok/s | 4GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q4 | 36.95 tok/s | 4GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q8 | 36.89 tok/s | 5GB |
| google-t5/t5-3b | Q8 | 36.88 tok/s | 3GB |
| Qwen/Qwen3-Embedding-4B | Q8 | 36.82 tok/s | 4GB |
| Qwen/Qwen2-0.5B-Instruct | Q8 | 36.69 tok/s | 5GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | 36.58 tok/s | 3GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q8 | 36.49 tok/s | 4GB |
| Qwen/Qwen2.5-Math-1.5B | Q8 | 35.80 tok/s | 5GB |
| Qwen/Qwen2.5-14B | Q4 | 35.76 tok/s | 7GB |
| ai-forever/ruGPT-3.5-13B | Q4 | 35.25 tok/s | 7GB |
| Qwen/Qwen2.5-1.5B | Q8 | 34.76 tok/s | 5GB |
| Qwen/Qwen2.5-Coder-1.5B | Q8 | 34.34 tok/s | 5GB |
| Qwen/Qwen3-0.6B-Base | Q8 | 34.07 tok/s | 6GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | 34.00 tok/s | 5GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q8 | 33.92 tok/s | 5GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | 33.92 tok/s | 7GB |
| Qwen/Qwen3-4B | Q8 | 33.59 tok/s | 4GB |
| OpenPipe/Qwen3-14B-Instruct | Q4 | 33.15 tok/s | 7GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q8 | 33.04 tok/s | 4GB |
| Qwen/Qwen2.5-7B | Q8 | 32.79 tok/s | 7GB |
| Qwen/Qwen2-1.5B-Instruct | Q8 | 32.76 tok/s | 5GB |
| dicta-il/dictalm2.0-instruct | Q8 | 32.69 tok/s | 7GB |
| llamafactory/tiny-random-Llama-3 | Q8 | 32.54 tok/s | 7GB |
| microsoft/DialoGPT-small | Q8 | 32.45 tok/s | 7GB |
| EleutherAI/pythia-70m-deduped | Q8 | 32.34 tok/s | 7GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q8 | 32.24 tok/s | 7GB |
| openai-community/gpt2-large | Q8 | 32.21 tok/s | 7GB |
| zai-org/GLM-4.5-Air | Q8 | 32.06 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q8 | 32.05 tok/s | 5GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q8 | 31.96 tok/s | 7GB |
| rinna/japanese-gpt-neox-small | Q8 | 31.90 tok/s | 7GB |
| microsoft/Phi-4-mini-instruct | Q8 | 31.78 tok/s | 7GB |
| numind/NuExtract-1.5 | Q8 | 31.76 tok/s | 7GB |
| openai-community/gpt2-xl | Q8 | 31.64 tok/s | 7GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | 31.60 tok/s | 7GB |
| meta-llama/Llama-2-7b-chat-hf | Q8 | 31.49 tok/s | 7GB |
| ibm-granite/granite-docling-258M | Q8 | 31.48 tok/s | 7GB |
| parler-tts/parler-tts-large-v1 | Q8 | 31.29 tok/s | 7GB |
| microsoft/VibeVoice-1.5B | Q8 | 31.26 tok/s | 5GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | 31.19 tok/s | 7GB |
| Qwen/Qwen2.5-0.5B | Q8 | 31.19 tok/s | 5GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q8 | 31.17 tok/s | 7GB |
| Qwen/Qwen3-1.7B | Q8 | 31.15 tok/s | 7GB |
| Qwen/Qwen3-14B-Base | Q4 | 30.98 tok/s | 7GB |
| vikhyatk/moondream2 | Q8 | 30.94 tok/s | 7GB |
| openai-community/gpt2 | Q8 | 30.92 tok/s | 7GB |
| Qwen/Qwen3-14B | Q4 | 30.92 tok/s | 7GB |
| skt/kogpt2-base-v2 | Q8 | 30.91 tok/s | 7GB |
| Qwen/Qwen2-0.5B | Q8 | 30.91 tok/s | 5GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q8 | 30.87 tok/s | 7GB |
| meta-llama/Llama-2-7b-hf | Q8 | 30.84 tok/s | 7GB |
| meta-llama/Llama-2-13b-chat-hf | Q4 | 30.76 tok/s | 7GB |
| HuggingFaceTB/SmolLM-135M | Q8 | 30.75 tok/s | 7GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q8 | 30.52 tok/s | 5GB |
| Qwen/Qwen3-8B-FP8 | Q8 | 30.50 tok/s | 8GB |
| meta-llama/Llama-Guard-3-8B | Q8 | 30.46 tok/s | 8GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | 30.36 tok/s | 7GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q8 | 30.35 tok/s | 8GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q8 | 30.34 tok/s | 7GB |
| Qwen/Qwen3-8B | Q8 | 30.26 tok/s | 8GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q8 | 30.26 tok/s | 8GB |
| openai-community/gpt2-medium | Q8 | 30.15 tok/s | 7GB |
| BSC-LT/salamandraTA-7b-instruct | Q8 | 29.92 tok/s | 7GB |
| microsoft/Phi-3-mini-4k-instruct | Q8 | 29.92 tok/s | 7GB |
| facebook/opt-125m | Q8 | 29.88 tok/s | 7GB |
| openai/gpt-oss-20b | Q4 | 29.80 tok/s | 10GB |
| microsoft/phi-4 | Q8 | 29.75 tok/s | 7GB |
| Qwen/Qwen3-0.6B | Q8 | 29.60 tok/s | 6GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q8 | 29.45 tok/s | 8GB |
| Qwen/Qwen2-7B-Instruct | Q8 | 29.35 tok/s | 7GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q8 | 29.33 tok/s | 7GB |
| petals-team/StableBeluga2 | Q8 | 29.32 tok/s | 7GB |
| Qwen/Qwen3-Reranker-0.6B | Q8 | 29.31 tok/s | 6GB |
| hmellor/tiny-random-LlamaForCausalLM | Q8 | 29.27 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | 29.08 tok/s | 7GB |
| microsoft/Phi-4-multimodal-instruct | Q8 | 29.08 tok/s | 7GB |
| HuggingFaceH4/zephyr-7b-beta | Q8 | 29.05 tok/s | 7GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | 28.77 tok/s | 8GB |
| mistralai/Mistral-7B-v0.1 | Q8 | 28.76 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1 | Q8 | 28.53 tok/s | 7GB |
| Qwen/Qwen3-Embedding-0.6B | Q8 | 28.46 tok/s | 6GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q8 | 28.44 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | 28.40 tok/s | 8GB |
| deepseek-ai/DeepSeek-V3.1 | Q8 | 28.38 tok/s | 7GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q8 | 28.09 tok/s | 8GB |
| microsoft/phi-2 | Q8 | 27.84 tok/s | 7GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q4 | 27.83 tok/s | 10GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q8 | 27.73 tok/s | 7GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q8 | 27.62 tok/s | 8GB |
| EleutherAI/gpt-neo-125m | Q8 | 27.62 tok/s | 7GB |
| sshleifer/tiny-gpt2 | Q8 | 27.61 tok/s | 7GB |
| microsoft/Phi-3.5-vision-instruct | Q8 | 27.61 tok/s | 7GB |
| deepseek-ai/DeepSeek-V3-0324 | Q8 | 27.57 tok/s | 7GB |
| google/gemma-3-270m-it | Q8 | 27.53 tok/s | 7GB |
| Qwen/Qwen3-1.7B-Base | Q8 | 27.49 tok/s | 7GB |
| huggyllama/llama-7b | Q8 | 27.48 tok/s | 7GB |
| distilbert/distilgpt2 | Q8 | 27.29 tok/s | 7GB |
| MiniMaxAI/MiniMax-M2 | Q8 | 27.29 tok/s | 7GB |
| deepseek-ai/DeepSeek-V3 | Q8 | 27.29 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-0528 | Q8 | 27.27 tok/s | 7GB |
| bigscience/bloomz-560m | Q8 | 27.26 tok/s | 7GB |
| ibm-granite/granite-3.3-8b-instruct | Q8 | 27.19 tok/s | 8GB |
| rednote-hilab/dots.ocr | Q8 | 27.15 tok/s | 7GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | 27.11 tok/s | 8GB |
| lmsys/vicuna-7b-v1.5 | Q8 | 27.03 tok/s | 7GB |
| microsoft/DialoGPT-medium | Q8 | 26.99 tok/s | 7GB |
| zai-org/GLM-4.6-FP8 | Q8 | 26.95 tok/s | 7GB |
| liuhaotian/llava-v1.5-7b | Q8 | 26.95 tok/s | 7GB |
| Qwen/Qwen3-Embedding-8B | Q8 | 26.93 tok/s | 8GB |
| HuggingFaceTB/SmolLM2-135M | Q8 | 26.86 tok/s | 7GB |
| microsoft/Phi-3-mini-128k-instruct | Q8 | 26.85 tok/s | 7GB |
| IlyaGusev/saiga_llama3_8b | Q8 | 26.54 tok/s | 8GB |
| unsloth/gpt-oss-20b-BF16 | Q4 | 26.53 tok/s | 10GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q8 | 26.11 tok/s | 8GB |
| meta-llama/Llama-3.1-8B | Q8 | 26.00 tok/s | 8GB |
| GSAI-ML/LLaDA-8B-Instruct | Q8 | 25.95 tok/s | 8GB |
| Qwen/Qwen3-8B-Base | Q8 | 25.81 tok/s | 8GB |
| GSAI-ML/LLaDA-8B-Base | Q8 | 25.81 tok/s | 8GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q4 | 25.59 tok/s | 10GB |
| meta-llama/Meta-Llama-3-8B | Q8 | 25.59 tok/s | 8GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q8 | 24.98 tok/s | 9GB |
Note: All speeds above are calculated estimates, not measured benchmarks. Real results may vary.
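The compatibility table that follows classifies each model/quantization pair against the RTX 3080's 10GB of VRAM. A minimal sketch of that verdict logic, with thresholds inferred from the table rather than documented:

```python
def fit_verdict(vram_needed_gb: float, vram_available_gb: float = 10.0) -> str:
    """Classify a model's fit on a GPU. Thresholds are an assumed reading
    of the compatibility table: over budget is unsupported, exactly at
    budget is tight, anything under is comfortable."""
    if vram_needed_gb > vram_available_gb:
        return "Not supported"
    if vram_needed_gb == vram_available_gb:
        return "Fits (tight)"
    return "Fits comfortably"

print(fit_verdict(8))   # an 8GB Q8 model on a 10GB card
print(fit_verdict(10))  # a 10GB Q4 model: no headroom for context
print(fit_verdict(20))  # a 20GB model: will not load
```

In practice a model that exactly fills VRAM leaves no room for the KV cache, so "Fits (tight)" usually means short contexts only.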
| Model | Quantization | Verdict | Estimated speed | VRAM needed |
|---|---|---|---|---|
| Gensyn/Qwen2.5-0.5B-Instruct | Q8 | Fits comfortably | 36.89 tok/s | 5GB (have 10GB) |
| Qwen/Qwen3-8B-Base | Q4 | Fits comfortably | 39.08 tok/s | 4GB (have 10GB) |
| Gensyn/Qwen2.5-0.5B-Instruct | Q4 | Fits comfortably | 52.15 tok/s | 3GB (have 10GB) |
| Qwen/Qwen3-0.6B-Base | Q8 | Fits comfortably | 34.07 tok/s | 6GB (have 10GB) |
| Qwen/Qwen3-0.6B | Q8 | Fits comfortably | 29.60 tok/s | 6GB (have 10GB) |
| Qwen/Qwen3-0.6B-Base | Q4 | Fits comfortably | 43.68 tok/s | 3GB (have 10GB) |
| Qwen/Qwen3-0.6B | Q4 | Fits comfortably | 46.85 tok/s | 3GB (have 10GB) |
| openai-community/gpt2-medium | Q8 | Fits comfortably | 30.15 tok/s | 7GB (have 10GB) |
| Qwen/Qwen2.5-7B-Instruct | Q8 | Fits comfortably | 31.19 tok/s | 7GB (have 10GB) |
| openai-community/gpt2-medium | Q4 | Fits comfortably | 44.25 tok/s | 4GB (have 10GB) |
| Qwen/Qwen2.5-7B-Instruct | Q4 | Fits comfortably | 42.11 tok/s | 4GB (have 10GB) |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q8 | Fits comfortably | 32.24 tok/s | 7GB (have 10GB) |
| openai-community/gpt2 | Q8 | Fits comfortably | 30.92 tok/s | 7GB (have 10GB) |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 44.53 tok/s | 4GB (have 10GB) |
| openai-community/gpt2 | Q4 | Fits comfortably | 38.91 tok/s | 4GB (have 10GB) |
| Qwen/Qwen2.5-Math-1.5B | Q8 | Fits comfortably | 35.80 tok/s | 5GB (have 10GB) |
| Qwen/Qwen2.5-Math-1.5B | Q4 | Fits comfortably | 46.77 tok/s | 3GB (have 10GB) |
| HuggingFaceTB/SmolLM-135M | Q8 | Fits comfortably | 30.75 tok/s | 7GB (have 10GB) |
| HuggingFaceTB/SmolLM-135M | Q4 | Fits comfortably | 44.47 tok/s | 4GB (have 10GB) |
| unsloth/gpt-oss-20b-BF16 | Q8 | Not supported | — | 20GB (have 10GB) |
| unsloth/gpt-oss-20b-BF16 | Q4 | Fits (tight) | 26.53 tok/s | 10GB (have 10GB) |
| meta-llama/Meta-Llama-3-70B-Instruct | Q8 | Not supported | — | 70GB (have 10GB) |
| meta-llama/Meta-Llama-3-70B-Instruct | Q4 | Not supported | — | 35GB (have 10GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 29.45 tok/s | 8GB (have 10GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 37.00 tok/s | 4GB (have 10GB) |
| zai-org/GLM-4.5-Air | Q8 | Fits comfortably | 32.06 tok/s | 7GB (have 10GB) |
| zai-org/GLM-4.5-Air | Q4 | Fits comfortably | 43.50 tok/s | 4GB (have 10GB) |
| mistralai/Mistral-7B-Instruct-v0.1 | Q8 | Fits comfortably | 27.73 tok/s | 7GB (have 10GB) |
| mistralai/Mistral-7B-Instruct-v0.1 | Q4 | Fits comfortably | 44.82 tok/s | 4GB (have 10GB) |
| LiquidAI/LFM2-1.2B | Q8 | Fits comfortably | 49.42 tok/s | 2GB (have 10GB) |
| LiquidAI/LFM2-1.2B | Q4 | Fits comfortably | 62.85 tok/s | 1GB (have 10GB) |
| mistralai/Mistral-7B-v0.1 | Q8 | Fits comfortably | 28.76 tok/s | 7GB (have 10GB) |
| mistralai/Mistral-7B-v0.1 | Q4 | Fits comfortably | 44.63 tok/s | 4GB (have 10GB) |
| Qwen/Qwen2.5-32B-Instruct | Q8 | Not supported | — | 32GB (have 10GB) |
| Qwen/Qwen2.5-32B-Instruct | Q4 | Not supported | — | 16GB (have 10GB) |
| deepseek-ai/DeepSeek-R1-0528 | Q8 | Fits comfortably | 27.27 tok/s | 7GB (have 10GB) |
| deepseek-ai/DeepSeek-R1-0528 | Q4 | Fits comfortably | 44.07 tok/s | 4GB (have 10GB) |
| meta-llama/Llama-3.1-8B | Q8 | Fits comfortably | 26.00 tok/s | 8GB (have 10GB) |
| meta-llama/Llama-3.1-8B | Q4 | Fits comfortably | 43.55 tok/s | 4GB (have 10GB) |
| deepseek-ai/DeepSeek-V3.1 | Q8 | Fits comfortably | 28.38 tok/s | 7GB (have 10GB) |
| deepseek-ai/DeepSeek-V3.1 | Q4 | Fits comfortably | 42.17 tok/s | 4GB (have 10GB) |
| microsoft/phi-4 | Q8 | Fits comfortably | 29.75 tok/s | 7GB (have 10GB) |
| microsoft/phi-4 | Q4 | Fits comfortably | 41.37 tok/s | 4GB (have 10GB) |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | Fits comfortably | 36.58 tok/s | 3GB (have 10GB) |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | Fits comfortably | 53.55 tok/s | 2GB (have 10GB) |
| Qwen/Qwen2-0.5B | Q8 | Fits comfortably | 30.91 tok/s | 5GB (have 10GB) |
| Qwen/Qwen2-0.5B | Q4 | Fits comfortably | 46.46 tok/s | 3GB (have 10GB) |
| MiniMaxAI/MiniMax-M2 | Q8 | Fits comfortably | 27.29 tok/s | 7GB (have 10GB) |
| MiniMaxAI/MiniMax-M2 | Q4 | Fits comfortably | 38.61 tok/s | 4GB (have 10GB) |
| microsoft/DialoGPT-medium | Q8 | Fits comfortably | 26.99 tok/s | 7GB (have 10GB) |
| microsoft/DialoGPT-medium | Q4 | Fits comfortably | 45.92 tok/s | 4GB (have 10GB) |
| zai-org/GLM-4.6-FP8 | Q8 | Fits comfortably | 26.95 tok/s | 7GB (have 10GB) |
| zai-org/GLM-4.6-FP8 | Q4 | Fits comfortably | 42.64 tok/s | 4GB (have 10GB) |
| HuggingFaceTB/SmolLM2-135M | Q8 | Fits comfortably | 26.86 tok/s | 7GB (have 10GB) |
| HuggingFaceTB/SmolLM2-135M | Q4 | Fits comfortably | 41.18 tok/s | 4GB (have 10GB) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | Fits comfortably | 28.40 tok/s | 8GB (have 10GB) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q4 | Fits comfortably | 42.09 tok/s | 4GB (have 10GB) |
| meta-llama/Llama-2-7b-hf | Q8 | Fits comfortably | 30.84 tok/s | 7GB (have 10GB) |
| meta-llama/Llama-2-7b-hf | Q4 | Fits comfortably | 44.56 tok/s | 4GB (have 10GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | Fits comfortably | 29.08 tok/s | 7GB (have 10GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | Fits comfortably | 39.04 tok/s | 4GB (have 10GB) |
| microsoft/phi-2 | Q8 | Fits comfortably | 27.84 tok/s | 7GB (have 10GB) |
| microsoft/phi-2 | Q4 | Fits comfortably | 39.89 tok/s | 4GB (have 10GB) |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | Not supported | — | 70GB (have 10GB) |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | Not supported | — | 35GB (have 10GB) |
| Qwen/Qwen2.5-0.5B | Q8 | Fits comfortably | 31.19 tok/s | 5GB (have 10GB) |
| Qwen/Qwen2.5-0.5B | Q4 | Fits comfortably | 48.69 tok/s | 3GB (have 10GB) |
| Qwen/Qwen3-14B | Q8 | Not supported | — | 14GB (have 10GB) |
| Qwen/Qwen3-14B | Q4 | Fits comfortably | 30.92 tok/s | 7GB (have 10GB) |
| Qwen/Qwen3-Embedding-8B | Q8 | Fits comfortably | 26.93 tok/s | 8GB (have 10GB) |
| Qwen/Qwen3-Embedding-8B | Q4 | Fits comfortably | 43.70 tok/s | 4GB (have 10GB) |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | Not supported | — | 70GB (have 10GB) |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | Not supported | — | 35GB (have 10GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q8 | Fits comfortably | 30.26 tok/s | 8GB (have 10GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q4 | Fits comfortably | 36.95 tok/s | 4GB (have 10GB) |
| Qwen/Qwen2.5-14B-Instruct | Q8 | Not supported | — | 14GB (have 10GB) |
| Qwen/Qwen2.5-14B-Instruct | Q4 | Fits comfortably | 33.92 tok/s | 7GB (have 10GB) |
| Qwen/Qwen2.5-1.5B | Q8 | Fits comfortably | 34.76 tok/s | 5GB (have 10GB) |
| Qwen/Qwen2.5-1.5B | Q4 | Fits comfortably | 50.18 tok/s | 3GB (have 10GB) |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | Fits comfortably | 39.44 tok/s | 4GB (have 10GB) |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q4 | Fits comfortably | 52.79 tok/sEstimated | 2GB (have 10GB) |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q8 | Not supported | — | 20GB (have 10GB) |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q4 | Fits (tight) | 27.83 tok/sEstimated | 10GB (have 10GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q8 | Fits comfortably | 32.05 tok/sEstimated | 5GB (have 10GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q4 | Fits comfortably | 49.28 tok/sEstimated | 3GB (have 10GB) |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | Fits comfortably | 27.11 tok/sEstimated | 8GB (have 10GB) |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | Fits comfortably | 42.06 tok/sEstimated | 4GB (have 10GB) |
| Qwen/Qwen3-Reranker-0.6B | Q8 | Fits comfortably | 29.31 tok/sEstimated | 6GB (have 10GB) |
| Qwen/Qwen3-Reranker-0.6B | Q4 | Fits comfortably | 42.38 tok/sEstimated | 3GB (have 10GB) |
| rednote-hilab/dots.ocr | Q8 | Fits comfortably | 27.15 tok/sEstimated | 7GB (have 10GB) |
| rednote-hilab/dots.ocr | Q4 | Fits comfortably | 42.75 tok/sEstimated | 4GB (have 10GB) |
| google-t5/t5-3b | Q8 | Fits comfortably | 36.88 tok/sEstimated | 3GB (have 10GB) |
| google-t5/t5-3b | Q4 | Fits comfortably | 61.32 tok/sEstimated | 2GB (have 10GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q8 | Not supported | — | 30GB (have 10GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q4 | Not supported | — | 15GB (have 10GB) |
| Qwen/Qwen3-4B | Q8 | Fits comfortably | 33.59 tok/sEstimated | 4GB (have 10GB) |
| Qwen/Qwen3-4B | Q4 | Fits comfortably | 49.13 tok/sEstimated | 2GB (have 10GB) |
| Qwen/Qwen3-1.7B | Q8 | Fits comfortably | 31.15 tok/sEstimated | 7GB (have 10GB) |
| Qwen/Qwen3-1.7B | Q4 | Fits comfortably | 41.88 tok/sEstimated | 4GB (have 10GB) |
| openai-community/gpt2-large | Q8 | Fits comfortably | 32.21 tok/sEstimated | 7GB (have 10GB) |
| openai-community/gpt2-large | Q4 | Fits comfortably | 46.59 tok/sEstimated | 4GB (have 10GB) |
| microsoft/Phi-3-mini-4k-instruct | Q8 | Fits comfortably | 29.92 tok/sEstimated | 7GB (have 10GB) |
| microsoft/Phi-3-mini-4k-instruct | Q4 | Fits comfortably | 40.22 tok/sEstimated | 4GB (have 10GB) |
| allenai/OLMo-2-0425-1B | Q8 | Fits comfortably | 55.70 tok/sEstimated | 1GB (have 10GB) |
| allenai/OLMo-2-0425-1B | Q4 | Fits comfortably | 90.42 tok/sEstimated | 1GB (have 10GB) |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q8 | Not supported | — | 80GB (have 10GB) |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q4 | Not supported | — | 40GB (have 10GB) |
| Qwen/Qwen3-32B | Q8 | Not supported | — | 32GB (have 10GB) |
| Qwen/Qwen3-32B | Q4 | Not supported | — | 16GB (have 10GB) |
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | Fits comfortably | 34.00 tok/sEstimated | 5GB (have 10GB) |
| Qwen/Qwen2.5-0.5B-Instruct | Q4 | Fits comfortably | 51.48 tok/sEstimated | 3GB (have 10GB) |
| Qwen/Qwen2.5-7B | Q8 | Fits comfortably | 32.79 tok/sEstimated | 7GB (have 10GB) |
| Qwen/Qwen2.5-7B | Q4 | Fits comfortably | 43.99 tok/sEstimated | 4GB (have 10GB) |
| meta-llama/Meta-Llama-3-8B | Q8 | Fits comfortably | 25.59 tok/sEstimated | 8GB (have 10GB) |
| meta-llama/Meta-Llama-3-8B | Q4 | Fits comfortably | 42.32 tok/sEstimated | 4GB (have 10GB) |
| meta-llama/Llama-3.2-1B | Q8 | Fits comfortably | 63.22 tok/sEstimated | 1GB (have 10GB) |
| meta-llama/Llama-3.2-1B | Q4 | Fits comfortably | 84.59 tok/sEstimated | 1GB (have 10GB) |
| petals-team/StableBeluga2 | Q8 | Fits comfortably | 29.32 tok/sEstimated | 7GB (have 10GB) |
| petals-team/StableBeluga2 | Q4 | Fits comfortably | 45.66 tok/sEstimated | 4GB (have 10GB) |
| vikhyatk/moondream2 | Q8 | Fits comfortably | 30.94 tok/sEstimated | 7GB (have 10GB) |
| vikhyatk/moondream2 | Q4 | Fits comfortably | 45.88 tok/sEstimated | 4GB (have 10GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 41.40 tok/sEstimated | 3GB (have 10GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | Fits comfortably | 58.32 tok/sEstimated | 2GB (have 10GB) |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q8 | Not supported | — | 70GB (have 10GB) |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q4 | Not supported | — | 35GB (have 10GB) |
| distilbert/distilgpt2 | Q8 | Fits comfortably | 27.29 tok/sEstimated | 7GB (have 10GB) |
| distilbert/distilgpt2 | Q4 | Fits comfortably | 40.79 tok/sEstimated | 4GB (have 10GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q8 | Not supported | — | 32GB (have 10GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q4 | Not supported | — | 16GB (have 10GB) |
| inference-net/Schematron-3B | Q8 | Fits comfortably | 39.28 tok/sEstimated | 3GB (have 10GB) |
| inference-net/Schematron-3B | Q4 | Fits comfortably | 52.73 tok/sEstimated | 2GB (have 10GB) |
| Qwen/Qwen3-8B | Q8 | Fits comfortably | 30.26 tok/sEstimated | 8GB (have 10GB) |
| Qwen/Qwen3-8B | Q4 | Fits comfortably | 41.73 tok/sEstimated | 4GB (have 10GB) |
| mistralai/Mistral-7B-Instruct-v0.2 | Q8 | Fits comfortably | 31.96 tok/sEstimated | 7GB (have 10GB) |
| mistralai/Mistral-7B-Instruct-v0.2 | Q4 | Fits comfortably | 41.45 tok/sEstimated | 4GB (have 10GB) |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q8 | Fits comfortably | 38.32 tok/sEstimated | 3GB (have 10GB) |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | Fits comfortably | 57.87 tok/sEstimated | 2GB (have 10GB) |
| bigscience/bloomz-560m | Q8 | Fits comfortably | 27.26 tok/sEstimated | 7GB (have 10GB) |
| bigscience/bloomz-560m | Q4 | Fits comfortably | 42.40 tok/sEstimated | 4GB (have 10GB) |
| Qwen/Qwen2.5-3B-Instruct | Q8 | Fits comfortably | 43.34 tok/sEstimated | 3GB (have 10GB) |
| Qwen/Qwen2.5-3B-Instruct | Q4 | Fits comfortably | 56.44 tok/sEstimated | 2GB (have 10GB) |
| openai/gpt-oss-120b | Q8 | Not supported | — | 120GB (have 10GB) |
| openai/gpt-oss-120b | Q4 | Not supported | — | 60GB (have 10GB) |
| meta-llama/Llama-3.2-1B-Instruct | Q8 | Fits comfortably | 58.52 tok/sEstimated | 1GB (have 10GB) |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | Fits comfortably | 82.64 tok/sEstimated | 1GB (have 10GB) |
| Qwen/Qwen3-4B-Instruct-2507 | Q8 | Fits comfortably | 33.04 tok/sEstimated | 4GB (have 10GB) |
| Qwen/Qwen3-4B-Instruct-2507 | Q4 | Fits comfortably | 51.72 tok/sEstimated | 2GB (have 10GB) |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | Fits comfortably | 30.36 tok/sEstimated | 7GB (have 10GB) |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q4 | Fits comfortably | 45.59 tok/sEstimated | 4GB (have 10GB) |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | Fits comfortably | 56.05 tok/sEstimated | 1GB (have 10GB) |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | Fits comfortably | 89.56 tok/sEstimated | 1GB (have 10GB) |
| facebook/opt-125m | Q8 | Fits comfortably | 29.88 tok/sEstimated | 7GB (have 10GB) |
| facebook/opt-125m | Q4 | Fits comfortably | 44.04 tok/sEstimated | 4GB (have 10GB) |
| Qwen/Qwen2.5-1.5B-Instruct | Q8 | Fits comfortably | 30.52 tok/sEstimated | 5GB (have 10GB) |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | Fits comfortably | 44.85 tok/sEstimated | 3GB (have 10GB) |
| Qwen/Qwen3-Embedding-0.6B | Q8 | Fits comfortably | 28.46 tok/sEstimated | 6GB (have 10GB) |
| Qwen/Qwen3-Embedding-0.6B | Q4 | Fits comfortably | 44.15 tok/sEstimated | 3GB (have 10GB) |
| google/gemma-3-1b-it | Q8 | Fits comfortably | 54.36 tok/sEstimated | 1GB (have 10GB) |
| google/gemma-3-1b-it | Q4 | Fits comfortably | 82.79 tok/sEstimated | 1GB (have 10GB) |
| openai/gpt-oss-20b | Q8 | Not supported | — | 20GB (have 10GB) |
| openai/gpt-oss-20b | Q4 | Fits (tight) | 29.80 tok/sEstimated | 10GB (have 10GB) |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q8 | Not supported | — | 34GB (have 10GB) |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q4 | Not supported | — | 17GB (have 10GB) |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 28.77 tok/sEstimated | 8GB (have 10GB) |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 43.36 tok/sEstimated | 4GB (have 10GB) |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q8 | Not supported | — | 80GB (have 10GB) |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q4 | Not supported | — | 40GB (have 10GB) |
| ai-forever/ruGPT-3.5-13B | Q8 | Not supported | — | 13GB (have 10GB) |
| ai-forever/ruGPT-3.5-13B | Q4 | Fits comfortably | 35.25 tok/sEstimated | 7GB (have 10GB) |
| baichuan-inc/Baichuan-M2-32B | Q8 | Not supported | — | 32GB (have 10GB) |
| baichuan-inc/Baichuan-M2-32B | Q4 | Not supported | — | 16GB (have 10GB) |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q8 | Fits comfortably | 28.44 tok/sEstimated | 7GB (have 10GB) |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 41.72 tok/sEstimated | 4GB (have 10GB) |
| ibm-granite/granite-3.3-8b-instruct | Q8 | Fits comfortably | 27.19 tok/sEstimated | 8GB (have 10GB) |
| ibm-granite/granite-3.3-8b-instruct | Q4 | Fits comfortably | 42.68 tok/sEstimated | 4GB (have 10GB) |
| Qwen/Qwen3-1.7B-Base | Q8 | Fits comfortably | 27.49 tok/sEstimated | 7GB (have 10GB) |
| Qwen/Qwen3-1.7B-Base | Q4 | Fits comfortably | 45.52 tok/sEstimated | 4GB (have 10GB) |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q8 | Not supported | — | 20GB (have 10GB) |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q4 | Fits (tight) | 25.59 tok/sEstimated | 10GB (have 10GB) |
| BSC-LT/salamandraTA-7b-instruct | Q8 | Fits comfortably | 29.92 tok/sEstimated | 7GB (have 10GB) |
| BSC-LT/salamandraTA-7b-instruct | Q4 | Fits comfortably | 42.39 tok/sEstimated | 4GB (have 10GB) |
| dicta-il/dictalm2.0-instruct | Q8 | Fits comfortably | 32.69 tok/sEstimated | 7GB (have 10GB) |
| dicta-il/dictalm2.0-instruct | Q4 | Fits comfortably | 41.95 tok/sEstimated | 4GB (have 10GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q8 | Not supported | — | 30GB (have 10GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q4 | Not supported | — | 15GB (have 10GB) |
| GSAI-ML/LLaDA-8B-Base | Q8 | Fits comfortably | 25.81 tok/sEstimated | 8GB (have 10GB) |
| GSAI-ML/LLaDA-8B-Base | Q4 | Fits comfortably | 37.62 tok/sEstimated | 4GB (have 10GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q8 | Not supported | — | 30GB (have 10GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q4 | Not supported | — | 15GB (have 10GB) |
| Qwen/Qwen2-0.5B-Instruct | Q8 | Fits comfortably | 36.69 tok/sEstimated | 5GB (have 10GB) |
| Qwen/Qwen2-0.5B-Instruct | Q4 | Fits comfortably | 52.18 tok/sEstimated | 3GB (have 10GB) |
| deepseek-ai/DeepSeek-V3 | Q8 | Fits comfortably | 27.29 tok/sEstimated | 7GB (have 10GB) |
| deepseek-ai/DeepSeek-V3 | Q4 | Fits comfortably | 44.77 tok/sEstimated | 4GB (have 10GB) |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q8 | Fits comfortably | 33.92 tok/sEstimated | 5GB (have 10GB) |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | Fits comfortably | 45.39 tok/sEstimated | 3GB (have 10GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q8 | Not supported | — | 30GB (have 10GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q4 | Not supported | — | 15GB (have 10GB) |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q8 | Not supported | — | 30GB (have 10GB) |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q4 | Not supported | — | 15GB (have 10GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q8 | Not supported | — | 30GB (have 10GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q4 | Not supported | — | 15GB (have 10GB) |
| AI-MO/Kimina-Prover-72B | Q8 | Not supported | — | 72GB (have 10GB) |
| AI-MO/Kimina-Prover-72B | Q4 | Not supported | — | 36GB (have 10GB) |
| apple/OpenELM-1_1B-Instruct | Q8 | Fits comfortably | 65.86 tok/sEstimated | 1GB (have 10GB) |
| apple/OpenELM-1_1B-Instruct | Q4 | Fits comfortably | 80.70 tok/sEstimated | 1GB (have 10GB) |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 28.09 tok/sEstimated | 8GB (have 10GB) |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 41.54 tok/sEstimated | 4GB (have 10GB) |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q8 | Fits (tight) | 24.98 tok/sEstimated | 9GB (have 10GB) |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q4 | Fits comfortably | 39.29 tok/sEstimated | 5GB (have 10GB) |
| Qwen/Qwen2.5-3B | Q8 | Fits comfortably | 38.52 tok/sEstimated | 3GB (have 10GB) |
| Qwen/Qwen2.5-3B | Q4 | Fits comfortably | 55.34 tok/sEstimated | 2GB (have 10GB) |
| lmsys/vicuna-7b-v1.5 | Q8 | Fits comfortably | 27.03 tok/sEstimated | 7GB (have 10GB) |
| lmsys/vicuna-7b-v1.5 | Q4 | Fits comfortably | 46.19 tok/sEstimated | 4GB (have 10GB) |
| meta-llama/Llama-2-13b-chat-hf | Q8 | Not supported | — | 13GB (have 10GB) |
| meta-llama/Llama-2-13b-chat-hf | Q4 | Fits comfortably | 30.76 tok/sEstimated | 7GB (have 10GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q8 | Not supported | — | 80GB (have 10GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q4 | Not supported | — | 40GB (have 10GB) |
| unsloth/gemma-3-1b-it | Q8 | Fits comfortably | 64.94 tok/sEstimated | 1GB (have 10GB) |
| unsloth/gemma-3-1b-it | Q4 | Fits comfortably | 86.19 tok/sEstimated | 1GB (have 10GB) |
| bigcode/starcoder2-3b | Q8 | Fits comfortably | 40.13 tok/sEstimated | 3GB (have 10GB) |
| bigcode/starcoder2-3b | Q4 | Fits comfortably | 55.55 tok/sEstimated | 2GB (have 10GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q8 | Not supported | — | 80GB (have 10GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q4 | Not supported | — | 40GB (have 10GB) |
| ibm-granite/granite-docling-258M | Q8 | Fits comfortably | 31.48 tok/sEstimated | 7GB (have 10GB) |
| ibm-granite/granite-docling-258M | Q4 | Fits comfortably | 41.28 tok/sEstimated | 4GB (have 10GB) |
| skt/kogpt2-base-v2 | Q8 | Fits comfortably | 30.91 tok/sEstimated | 7GB (have 10GB) |
| skt/kogpt2-base-v2 | Q4 | Fits comfortably | 40.55 tok/sEstimated | 4GB (have 10GB) |
| google/gemma-3-270m-it | Q8 | Fits comfortably | 27.53 tok/sEstimated | 7GB (have 10GB) |
| google/gemma-3-270m-it | Q4 | Fits comfortably | 38.70 tok/sEstimated | 4GB (have 10GB) |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q8 | Fits comfortably | 40.00 tok/sEstimated | 4GB (have 10GB) |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | Fits comfortably | 53.09 tok/sEstimated | 2GB (have 10GB) |
| Qwen/Qwen2.5-32B | Q8 | Not supported | — | 32GB (have 10GB) |
| Qwen/Qwen2.5-32B | Q4 | Not supported | — | 16GB (have 10GB) |
| parler-tts/parler-tts-large-v1 | Q8 | Fits comfortably | 31.29 tok/sEstimated | 7GB (have 10GB) |
| parler-tts/parler-tts-large-v1 | Q4 | Fits comfortably | 44.48 tok/sEstimated | 4GB (have 10GB) |
| EleutherAI/pythia-70m-deduped | Q8 | Fits comfortably | 32.34 tok/sEstimated | 7GB (have 10GB) |
| EleutherAI/pythia-70m-deduped | Q4 | Fits comfortably | 40.15 tok/sEstimated | 4GB (have 10GB) |
| microsoft/VibeVoice-1.5B | Q8 | Fits comfortably | 31.26 tok/sEstimated | 5GB (have 10GB) |
| microsoft/VibeVoice-1.5B | Q4 | Fits comfortably | 49.45 tok/sEstimated | 3GB (have 10GB) |
| ibm-granite/granite-3.3-2b-instruct | Q8 | Fits comfortably | 46.33 tok/sEstimated | 2GB (have 10GB) |
| ibm-granite/granite-3.3-2b-instruct | Q4 | Fits comfortably | 68.60 tok/sEstimated | 1GB (have 10GB) |
| Qwen/Qwen2.5-72B-Instruct | Q8 | Not supported | — | 72GB (have 10GB) |
| Qwen/Qwen2.5-72B-Instruct | Q4 | Not supported | — | 36GB (have 10GB) |
| liuhaotian/llava-v1.5-7b | Q8 | Fits comfortably | 26.95 tok/sEstimated | 7GB (have 10GB) |
| liuhaotian/llava-v1.5-7b | Q4 | Fits comfortably | 40.96 tok/sEstimated | 4GB (have 10GB) |
| google/gemma-2b | Q8 | Fits comfortably | 42.79 tok/sEstimated | 2GB (have 10GB) |
| google/gemma-2b | Q4 | Fits comfortably | 68.47 tok/sEstimated | 1GB (have 10GB) |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q8 | Fits comfortably | 31.17 tok/sEstimated | 7GB (have 10GB) |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q4 | Fits comfortably | 41.44 tok/sEstimated | 4GB (have 10GB) |
| Qwen/Qwen3-235B-A22B | Q8 | Not supported | — | 235GB (have 10GB) |
| Qwen/Qwen3-235B-A22B | Q4 | Not supported | — | 118GB (have 10GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q8 | Fits comfortably | 30.35 tok/sEstimated | 8GB (have 10GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q4 | Fits comfortably | 40.05 tok/sEstimated | 4GB (have 10GB) |
| microsoft/Phi-4-mini-instruct | Q8 | Fits comfortably | 31.78 tok/sEstimated | 7GB (have 10GB) |
| microsoft/Phi-4-mini-instruct | Q4 | Fits comfortably | 42.90 tok/sEstimated | 4GB (have 10GB) |
| llamafactory/tiny-random-Llama-3 | Q8 | Fits comfortably | 32.54 tok/sEstimated | 7GB (have 10GB) |
| llamafactory/tiny-random-Llama-3 | Q4 | Fits comfortably | 44.20 tok/sEstimated | 4GB (have 10GB) |
| HuggingFaceH4/zephyr-7b-beta | Q8 | Fits comfortably | 29.05 tok/sEstimated | 7GB (have 10GB) |
| HuggingFaceH4/zephyr-7b-beta | Q4 | Fits comfortably | 39.62 tok/sEstimated | 4GB (have 10GB) |
| Qwen/Qwen3-4B-Thinking-2507 | Q8 | Fits comfortably | 38.78 tok/sEstimated | 4GB (have 10GB) |
| Qwen/Qwen3-4B-Thinking-2507 | Q4 | Fits comfortably | 53.47 tok/sEstimated | 2GB (have 10GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q8 | Not supported | — | 30GB (have 10GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q4 | Not supported | — | 15GB (have 10GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q8 | Fits comfortably | 27.62 tok/sEstimated | 8GB (have 10GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q4 | Fits comfortably | 43.69 tok/sEstimated | 4GB (have 10GB) |
| unsloth/Llama-3.2-1B-Instruct | Q8 | Fits comfortably | 64.77 tok/sEstimated | 1GB (have 10GB) |
| unsloth/Llama-3.2-1B-Instruct | Q4 | Fits comfortably | 84.89 tok/sEstimated | 1GB (have 10GB) |
| GSAI-ML/LLaDA-8B-Instruct | Q8 | Fits comfortably | 25.95 tok/sEstimated | 8GB (have 10GB) |
| GSAI-ML/LLaDA-8B-Instruct | Q4 | Fits comfortably | 40.78 tok/sEstimated | 4GB (have 10GB) |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q8 | Not supported | — | 90GB (have 10GB) |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q4 | Not supported | — | 45GB (have 10GB) |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q8 | Fits comfortably | 29.33 tok/sEstimated | 7GB (have 10GB) |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q4 | Fits comfortably | 38.82 tok/sEstimated | 4GB (have 10GB) |
| numind/NuExtract-1.5 | Q8 | Fits comfortably | 31.76 tok/sEstimated | 7GB (have 10GB) |
| numind/NuExtract-1.5 | Q4 | Fits comfortably | 39.36 tok/sEstimated | 4GB (have 10GB) |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q8 | Fits comfortably | 30.87 tok/sEstimated | 7GB (have 10GB) |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q4 | Fits comfortably | 43.22 tok/sEstimated | 4GB (have 10GB) |
| hmellor/tiny-random-LlamaForCausalLM | Q8 | Fits comfortably | 29.27 tok/sEstimated | 7GB (have 10GB) |
| hmellor/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 44.71 tok/sEstimated | 4GB (have 10GB) |
| huggyllama/llama-7b | Q8 | Fits comfortably | 27.48 tok/sEstimated | 7GB (have 10GB) |
| huggyllama/llama-7b | Q4 | Fits comfortably | 38.54 tok/sEstimated | 4GB (have 10GB) |
| deepseek-ai/DeepSeek-V3-0324 | Q8 | Fits comfortably | 27.57 tok/sEstimated | 7GB (have 10GB) |
| deepseek-ai/DeepSeek-V3-0324 | Q4 | Fits comfortably | 46.44 tok/sEstimated | 4GB (have 10GB) |
| microsoft/Phi-3-mini-128k-instruct | Q8 | Fits comfortably | 26.85 tok/sEstimated | 7GB (have 10GB) |
| microsoft/Phi-3-mini-128k-instruct | Q4 | Fits comfortably | 39.11 tok/sEstimated | 4GB (have 10GB) |
| sshleifer/tiny-gpt2 | Q8 | Fits comfortably | 27.61 tok/sEstimated | 7GB (have 10GB) |
| sshleifer/tiny-gpt2 | Q4 | Fits comfortably | 45.16 tok/sEstimated | 4GB (have 10GB) |
| meta-llama/Llama-Guard-3-8B | Q8 | Fits comfortably | 30.46 tok/sEstimated | 8GB (have 10GB) |
| meta-llama/Llama-Guard-3-8B | Q4 | Fits comfortably | 44.11 tok/sEstimated | 4GB (have 10GB) |
| openai-community/gpt2-xl | Q8 | Fits comfortably | 31.64 tok/sEstimated | 7GB (have 10GB) |
| openai-community/gpt2-xl | Q4 | Fits comfortably | 46.05 tok/sEstimated | 4GB (have 10GB) |
| OpenPipe/Qwen3-14B-Instruct | Q8 | Not supported | — | 14GB (have 10GB) |
| OpenPipe/Qwen3-14B-Instruct | Q4 | Fits comfortably | 33.15 tok/sEstimated | 7GB (have 10GB) |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q8 | Not supported | — | 70GB (have 10GB) |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q4 | Not supported | — | 35GB (have 10GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q8 | Fits comfortably | 36.49 tok/sEstimated | 4GB (have 10GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q4 | Fits comfortably | 50.79 tok/sEstimated | 2GB (have 10GB) |
| ibm-research/PowerMoE-3b | Q8 | Fits comfortably | 43.73 tok/sEstimated | 3GB (have 10GB) |
| ibm-research/PowerMoE-3b | Q4 | Fits comfortably | 63.55 tok/sEstimated | 2GB (have 10GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q8 | Fits comfortably | 37.98 tok/sEstimated | 4GB (have 10GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q4 | Fits comfortably | 56.23 tok/sEstimated | 2GB (have 10GB) |
| unsloth/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 43.37 tok/sEstimated | 3GB (have 10GB) |
| unsloth/Llama-3.2-3B-Instruct | Q4 | Fits comfortably | 57.19 tok/sEstimated | 2GB (have 10GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q8 | Fits comfortably | 39.39 tok/sEstimated | 4GB (have 10GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q4 | Fits comfortably | 55.16 tok/sEstimated | 2GB (have 10GB) |
| meta-llama/Llama-3.2-3B | Q8 | Fits comfortably | 43.99 tok/sEstimated | 3GB (have 10GB) |
| meta-llama/Llama-3.2-3B | Q4 | Fits comfortably | 56.36 tok/sEstimated | 2GB (have 10GB) |
| EleutherAI/gpt-neo-125m | Q8 | Fits comfortably | 27.62 tok/sEstimated | 7GB (have 10GB) |
| EleutherAI/gpt-neo-125m | Q4 | Fits comfortably | 42.88 tok/sEstimated | 4GB (have 10GB) |
| codellama/CodeLlama-34b-hf | Q8 | Not supported | — | 34GB (have 10GB) |
| codellama/CodeLlama-34b-hf | Q4 | Not supported | — | 17GB (have 10GB) |
| meta-llama/Llama-Guard-3-1B | Q8 | Fits comfortably | 55.13 tok/sEstimated | 1GB (have 10GB) |
| meta-llama/Llama-Guard-3-1B | Q4 | Fits comfortably | 84.47 tok/sEstimated | 1GB (have 10GB) |
| Qwen/Qwen2-1.5B-Instruct | Q8 | Fits comfortably | 32.76 tok/sEstimated | 5GB (have 10GB) |
| Qwen/Qwen2-1.5B-Instruct | Q4 | Fits comfortably | 50.94 tok/sEstimated | 3GB (have 10GB) |
| google/gemma-2-2b-it | Q8 | Fits comfortably | 45.04 tok/sEstimated | 2GB (have 10GB) |
| google/gemma-2-2b-it | Q4 | Fits comfortably | 70.61 tok/sEstimated | 1GB (have 10GB) |
| Qwen/Qwen2.5-14B | Q8 | Not supported | — | 14GB (have 10GB) |
| Qwen/Qwen2.5-14B | Q4 | Fits comfortably | 35.76 tok/sEstimated | 7GB (have 10GB) |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q8 | Not supported | — | 32GB (have 10GB) |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q4 | Not supported | — | 16GB (have 10GB) |
| microsoft/Phi-3.5-mini-instruct | Q8 | Fits comfortably | 31.60 tok/sEstimated | 7GB (have 10GB) |
| microsoft/Phi-3.5-mini-instruct | Q4 | Fits comfortably | 41.80 tok/sEstimated | 4GB (have 10GB) |
| Qwen/Qwen3-4B-Base | Q8 | Fits comfortably | 39.68 tok/sEstimated | 4GB (have 10GB) |
| Qwen/Qwen3-4B-Base | Q4 | Fits comfortably | 55.70 tok/sEstimated | 2GB (have 10GB) |
| Qwen/Qwen2-7B-Instruct | Q8 | Fits comfortably | 29.35 tok/sEstimated | 7GB (have 10GB) |
| Qwen/Qwen2-7B-Instruct | Q4 | Fits comfortably | 46.72 tok/sEstimated | 4GB (have 10GB) |
| meta-llama/Llama-2-7b-chat-hf | Q8 | Fits comfortably | 31.49 tok/sEstimated | 7GB (have 10GB) |
| meta-llama/Llama-2-7b-chat-hf | Q4 | Fits comfortably | 43.86 tok/sEstimated | 4GB (have 10GB) |
| Qwen/Qwen3-14B-Base | Q8 | Not supported | — | 14GB (have 10GB) |
| Qwen/Qwen3-14B-Base | Q4 | Fits comfortably | 30.98 tok/sEstimated | 7GB (have 10GB) |
| swiss-ai/Apertus-8B-Instruct-2509 | Q8 | Fits comfortably | 26.11 tok/sEstimated | 8GB (have 10GB) |
| swiss-ai/Apertus-8B-Instruct-2509 | Q4 | Fits comfortably | 37.11 tok/sEstimated | 4GB (have 10GB) |
| microsoft/Phi-3.5-vision-instruct | Q8 | Fits comfortably | 27.61 tok/sEstimated | 7GB (have 10GB) |
| microsoft/Phi-3.5-vision-instruct | Q4 | Fits comfortably | 46.43 tok/sEstimated | 4GB (have 10GB) |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q8 | Fits comfortably | 30.34 tok/sEstimated | 7GB (have 10GB) |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q4 | Fits comfortably | 39.08 tok/sEstimated | 4GB (have 10GB) |
| rinna/japanese-gpt-neox-small | Q8 | Fits comfortably | 31.90 tok/sEstimated | 7GB (have 10GB) |
| rinna/japanese-gpt-neox-small | Q4 | Fits comfortably | 39.80 tok/sEstimated | 4GB (have 10GB) |
| Qwen/Qwen2.5-Coder-1.5B | Q8 | Fits comfortably | 34.34 tok/sEstimated | 5GB (have 10GB) |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | Fits comfortably | 52.52 tok/sEstimated | 3GB (have 10GB) |
| IlyaGusev/saiga_llama3_8b | Q8 | Fits comfortably | 26.54 tok/sEstimated | 8GB (have 10GB) |
| IlyaGusev/saiga_llama3_8b | Q4 | Fits comfortably | 42.91 tok/sEstimated | 4GB (have 10GB) |
| Qwen/Qwen3-30B-A3B | Q8 | Not supported | — | 30GB (have 10GB) |
| Qwen/Qwen3-30B-A3B | Q4 | Not supported | — | 15GB (have 10GB) |
| deepseek-ai/DeepSeek-R1 | Q8 | Fits comfortably | 28.53 tok/sEstimated | 7GB (have 10GB) |
| deepseek-ai/DeepSeek-R1 | Q4 | Fits comfortably | 46.14 tok/sEstimated | 4GB (have 10GB) |
| microsoft/DialoGPT-small | Q8 | Fits comfortably | 32.45 tok/sEstimated | 7GB (have 10GB) |
| microsoft/DialoGPT-small | Q4 | Fits comfortably | 45.72 tok/sEstimated | 4GB (have 10GB) |
| Qwen/Qwen3-8B-FP8 | Q8 | Fits comfortably | 30.50 tok/sEstimated | 8GB (have 10GB) |
| Qwen/Qwen3-8B-FP8 | Q4 | Fits comfortably | 40.67 tok/sEstimated | 4GB (have 10GB) |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q8 | Not supported | — | 30GB (have 10GB) |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q4 | Not supported | — | 15GB (have 10GB) |
| Qwen/Qwen3-Embedding-4B | Q8 | Fits comfortably | 36.82 tok/sEstimated | 4GB (have 10GB) |
| Qwen/Qwen3-Embedding-4B | Q4 | Fits comfortably | 52.00 tok/sEstimated | 2GB (have 10GB) |
| microsoft/Phi-4-multimodal-instruct | Q8 | Fits comfortably | 29.08 tok/sEstimated | 7GB (have 10GB) |
| microsoft/Phi-4-multimodal-instruct | Q4 | Fits comfortably | 43.50 tok/sEstimated | 4GB (have 10GB) |
| Qwen/Qwen3-8B-Base | Q8 | Fits comfortably | 28.12 tok/sEstimated | 8GB (have 10GB) |
Note: Performance estimates are calculated, not measured; real-world results may vary.
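The fit verdicts above appear to follow a simple sizing heuristic: roughly 1 GB of VRAM per billion parameters at Q8 and 0.5 GB at Q4, for the weights alone. Here is a minimal sketch of that heuristic in Python; the 90% "tight" threshold and the 1 GB floor are assumptions inferred from the table, not a published formula.

```python
# Rough VRAM-fit check mirroring the table's apparent heuristic:
# ~1 GB per billion parameters at Q8, ~0.5 GB at Q4 (weights only;
# KV cache and framework overhead add more in practice).
BYTES_PER_PARAM_GB = {"Q8": 1.0, "Q4": 0.5}

def fits(params_b: float, quant: str, vram_gb: float = 10.0) -> str:
    """Classify a model the way the table does, given an RTX 3080's 10 GB."""
    need = max(1.0, params_b * BYTES_PER_PARAM_GB[quant])
    if need > vram_gb:
        return "Not supported"
    # Within ~90% of total VRAM counts as a tight fit (assumed threshold).
    return "Fits (tight)" if need >= vram_gb * 0.9 else "Fits comfortably"

print(fits(8, "Q8"))   # Llama-3.1-8B at Q8 -> Fits comfortably
print(fits(20, "Q4"))  # gpt-oss-20b at Q4 -> Fits (tight)
print(fits(70, "Q4"))  # Llama-3.1-70B at Q4 -> Not supported
```

The three example calls reproduce the verdicts the table gives for those models, which suggests the heuristic is close even if the exact thresholds differ.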
Data-backed answers pulled from community benchmarks, manufacturer specs, and live pricing.
Owners running Qwen3-30B-A3B on a 10 GB RTX 3080 report roughly 15 tokens/sec after tuning, keeping interactive coding prompts responsive.
Source: Reddit – /r/LocalLLaMA (mquvxwc)
Some spec sheets assume higher VRAM ceilings, but real-world users report already reaching ~10 tok/s on a 10 GB 3080, showing that careful tuning matters more than blanket hardware requirements.
Source: Reddit – /r/LocalLLaMA (mj408ke)
With larger context windows, Ollama users report 40% of layers spilling to system RAM even on 12B models, which is why gpu_layers needs tuning on 10 GB cards.
Source: Reddit – /r/LocalLLaMA (mnspe0d)
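Picking a gpu_layers value comes down to dividing the VRAM left after the KV cache by the per-layer weight size. The sketch below is a back-of-envelope estimator; the layer count, weight size, and reserve figures are illustrative assumptions, not measured values for any specific model.

```python
# Estimate how many transformer layers fit on the GPU, reserving some
# VRAM for KV cache and activations. All inputs are rough assumptions.
def gpu_layers(model_gb: float, n_layers: int, vram_gb: float = 10.0,
               reserve_gb: float = 1.5) -> int:
    """Layers to offload: spend (vram - reserve) on equal-sized layers."""
    per_layer_gb = model_gb / n_layers
    budget_gb = max(0.0, vram_gb - reserve_gb)
    return min(n_layers, int(budget_gb / per_layer_gb))

# Hypothetical 12B model at Q4 (~7 GB of weights, 40 layers):
print(gpu_layers(7.0, 40))                   # small context -> all 40 on GPU
print(gpu_layers(7.0, 40, reserve_gb=4.0))   # big context -> 34, rest to RAM
```

Growing the context window grows the reserve, which is exactly the spill-to-RAM effect the note above describes; the computed value maps onto llama.cpp's `-ngl` flag or Ollama's `num_gpu` parameter.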
The RTX 3080 Founders Edition includes 10 GB of GDDR6X, a 320 W board power rating, a single 12-pin power connector (partner cards typically use dual or triple 8-pin), and NVIDIA recommends a 750 W PSU.
Source: TechPowerUp – RTX 3080 Specs
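That 320 W board power translates directly into running cost. A quick sketch of the arithmetic, assuming a $0.15/kWh electricity rate for illustration (rates vary widely by region):

```python
# Energy cost of sustained inference at the 3080's 320 W board power.
# The $0.15/kWh rate is an assumed example, not a quoted price.
def energy_cost(watts: float, hours: float, usd_per_kwh: float = 0.15) -> float:
    """Cost in USD: convert watts to kW, multiply by hours and rate."""
    return watts / 1000 * hours * usd_per_kwh

print(round(energy_cost(320, 8), 2))    # an 8-hour session -> $0.38
print(round(energy_cost(320, 720), 2))  # a full month at load -> $34.56
```

Idle and partial-load draw is far lower, so these figures are a worst-case ceiling for continuous full-load inference.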
Latest snapshot (3 Nov 2025): Amazon at $699 out of stock, Newegg at $729 in stock, Best Buy at $699 out of stock.
Source: Supabase price tracker snapshot – 2025-11-03
Explore how RTX 4060 Ti 16GB stacks up for local inference workloads.
Explore how RX 6800 XT stacks up for local inference workloads.
Explore how RTX 3090 stacks up for local inference workloads.