Quick Answer: The NVIDIA RTX 6000 Ada offers 48GB of VRAM and starts around $4,999. It delivers an estimated 192 tokens/sec on apple/OpenELM-1_1B-Instruct (its fastest Q4 result below) and typically draws 300W under load.
This GPU offers reliable throughput for local AI workloads. Pair it with the right model quantization to hit your target tokens/sec, and monitor prices below to catch the best deal.
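If you want to sanity-check these numbers yourself, the snippet below is a minimal sketch of loading one of the listed models at 4-bit (Q4-equivalent) precision with Hugging Face Transformers and bitsandbytes. It assumes a CUDA build of PyTorch plus the `transformers`, `accelerate`, and `bitsandbytes` packages are installed; swap `model_id` for any table entry that fits your VRAM.

```python
# Minimal sketch: run a table entry in 4-bit on the RTX 6000 Ada.
# Assumes: torch (CUDA build), transformers, accelerate, bitsandbytes.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-3.2-1B-Instruct"  # any model from the table below

quant = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=quant, device_map="auto"
)

inputs = tokenizer("The RTX 6000 Ada is", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```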
| Model | Quantization | Tokens/sec | VRAM used |
|---|---|---|---|
| apple/OpenELM-1_1B-Instruct | Q4 | 192.20 tok/s | 1GB |
| allenai/OLMo-2-0425-1B | Q4 | 189.90 tok/s | 1GB |
| google/gemma-3-1b-it | Q4 | 186.76 tok/s | 1GB |
| unsloth/Llama-3.2-1B-Instruct | Q4 | 180.86 tok/s | 1GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | 176.80 tok/s | 1GB |
| meta-llama/Llama-Guard-3-1B | Q4 | 175.85 tok/s | 1GB |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | 170.05 tok/s | 1GB |
| meta-llama/Llama-3.2-1B | Q4 | 167.62 tok/s | 1GB |
| unsloth/gemma-3-1b-it | Q4 | 166.50 tok/s | 1GB |
| LiquidAI/LFM2-1.2B | Q4 | 151.65 tok/s | 1GB |
| google/gemma-2-2b-it | Q4 | 148.05 tok/s | 1GB |
| google/gemma-2b | Q4 | 137.77 tok/s | 1GB |
| meta-llama/Llama-3.2-1B | Q8 | 137.23 tok/s | 1GB |
| meta-llama/Llama-Guard-3-1B | Q8 | 136.33 tok/s | 1GB |
| ibm-granite/granite-3.3-2b-instruct | Q4 | 134.23 tok/s | 1GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | 133.24 tok/s | 1GB |
| unsloth/Llama-3.2-1B-Instruct | Q8 | 132.18 tok/s | 1GB |
| google-t5/t5-3b | Q4 | 131.80 tok/s | 2GB |
| bigcode/starcoder2-3b | Q4 | 131.45 tok/s | 2GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 131.00 tok/s | 2GB |
| meta-llama/Llama-3.2-3B | Q4 | 127.97 tok/s | 2GB |
| Qwen/Qwen2.5-3B | Q4 | 125.26 tok/s | 2GB |
| meta-llama/Llama-3.2-1B-Instruct | Q8 | 124.33 tok/s | 1GB |
| Qwen/Qwen2.5-3B-Instruct | Q4 | 122.40 tok/s | 2GB |
| ibm-research/PowerMoE-3b | Q4 | 122.17 tok/s | 2GB |
| allenai/OLMo-2-0425-1B | Q8 | 120.42 tok/s | 1GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q4 | 116.15 tok/s | 2GB |
| apple/OpenELM-1_1B-Instruct | Q8 | 116.07 tok/s | 1GB |
| google/gemma-3-1b-it | Q8 | 115.42 tok/s | 1GB |
| unsloth/gemma-3-1b-it | Q8 | 114.24 tok/s | 1GB |
| inference-net/Schematron-3B | Q4 | 112.92 tok/s | 2GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q4 | 112.45 tok/s | 2GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | 112.02 tok/s | 2GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | 111.60 tok/s | 2GB |
| Qwen/Qwen2.5-1.5B | Q4 | 110.10 tok/s | 3GB |
| unsloth/Llama-3.2-3B-Instruct | Q4 | 109.75 tok/s | 2GB |
| Qwen/Qwen3-4B-Base | Q4 | 108.82 tok/s | 2GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | 108.13 tok/s | 3GB |
| LiquidAI/LFM2-1.2B | Q8 | 105.86 tok/s | 2GB |
| Qwen/Qwen2.5-Math-1.5B | Q4 | 105.48 tok/s | 3GB |
| ibm-granite/granite-3.3-2b-instruct | Q8 | 105.30 tok/s | 2GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q4 | 103.87 tok/s | 2GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q4 | 103.62 tok/s | 2GB |
| Qwen/Qwen2-1.5B-Instruct | Q4 | 102.84 tok/s | 3GB |
| google/gemma-2b | Q8 | 102.71 tok/s | 2GB |
| Qwen/Qwen2.5-0.5B | Q4 | 100.27 tok/s | 3GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | 99.77 tok/s | 2GB |
| Qwen/Qwen3-Embedding-4B | Q4 | 99.75 tok/s | 2GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q4 | 99.46 tok/s | 2GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q4 | 98.98 tok/s | 2GB |
| Qwen/Qwen3-Embedding-0.6B | Q4 | 98.75 tok/s | 3GB |
| Qwen/Qwen3-4B | Q4 | 98.58 tok/s | 2GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q4 | 98.10 tok/s | 3GB |
| Qwen/Qwen2-0.5B-Instruct | Q4 | 97.76 tok/s | 3GB |
| mistralai/Mistral-7B-v0.1 | Q4 | 97.44 tok/s | 4GB |
| huggyllama/llama-7b | Q4 | 97.44 tok/s | 4GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | 96.19 tok/s | 3GB |
| EleutherAI/gpt-neo-125m | Q4 | 95.94 tok/s | 4GB |
| llamafactory/tiny-random-Llama-3 | Q4 | 95.92 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q4 | 95.86 tok/s | 3GB |
| rednote-hilab/dots.ocr | Q4 | 95.72 tok/s | 4GB |
| zai-org/GLM-4.5-Air | Q4 | 95.38 tok/s | 4GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q4 | 94.86 tok/s | 4GB |
| google/gemma-2-2b-it | Q8 | 94.59 tok/s | 2GB |
| sshleifer/tiny-gpt2 | Q4 | 94.30 tok/s | 4GB |
| microsoft/DialoGPT-medium | Q4 | 94.10 tok/s | 4GB |
| Qwen/Qwen3-1.7B | Q4 | 94.07 tok/s | 4GB |
| zai-org/GLM-4.6-FP8 | Q4 | 94.03 tok/s | 4GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q4 | 93.80 tok/s | 4GB |
| Qwen/Qwen3-Reranker-0.6B | Q4 | 93.33 tok/s | 3GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q4 | 93.33 tok/s | 4GB |
| hmellor/tiny-random-LlamaForCausalLM | Q4 | 93.19 tok/s | 4GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | 92.82 tok/s | 3GB |
| meta-llama/Llama-2-7b-chat-hf | Q4 | 92.60 tok/s | 4GB |
| Qwen/Qwen3-8B-Base | Q4 | 92.55 tok/s | 4GB |
| meta-llama/Llama-2-7b-hf | Q4 | 92.44 tok/s | 4GB |
| google/gemma-3-270m-it | Q4 | 92.30 tok/s | 4GB |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | 92.29 tok/s | 3GB |
| meta-llama/Llama-Guard-3-8B | Q4 | 92.21 tok/s | 4GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q4 | 92.12 tok/s | 3GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | 91.77 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3-0324 | Q4 | 91.37 tok/s | 4GB |
| Qwen/Qwen3-0.6B | Q4 | 91.08 tok/s | 3GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | 90.91 tok/s | 4GB |
| Qwen/Qwen3-8B | Q4 | 90.82 tok/s | 4GB |
| Qwen/Qwen2-0.5B | Q4 | 90.71 tok/s | 3GB |
| microsoft/VibeVoice-1.5B | Q4 | 90.70 tok/s | 3GB |
| GSAI-ML/LLaDA-8B-Base | Q4 | 90.57 tok/s | 4GB |
| dicta-il/dictalm2.0-instruct | Q4 | 90.53 tok/s | 4GB |
| lmsys/vicuna-7b-v1.5 | Q4 | 90.42 tok/s | 4GB |
| vikhyatk/moondream2 | Q4 | 90.40 tok/s | 4GB |
| Qwen/Qwen3-1.7B-Base | Q4 | 90.19 tok/s | 4GB |
| numind/NuExtract-1.5 | Q4 | 90.07 tok/s | 4GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q4 | 89.90 tok/s | 4GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q4 | 89.78 tok/s | 4GB |
| microsoft/DialoGPT-small | Q4 | 89.59 tok/s | 4GB |
| ibm-research/PowerMoE-3b | Q8 | 89.44 tok/s | 3GB |
| HuggingFaceH4/zephyr-7b-beta | Q4 | 89.33 tok/s | 4GB |
| distilbert/distilgpt2 | Q4 | 89.29 tok/s | 4GB |
| MiniMaxAI/MiniMax-M2 | Q4 | 89.26 tok/s | 4GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q4 | 88.88 tok/s | 4GB |
| ibm-granite/granite-docling-258M | Q4 | 88.75 tok/s | 4GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q4 | 88.47 tok/s | 4GB |
| microsoft/Phi-4-mini-instruct | Q4 | 88.41 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-0528 | Q4 | 88.31 tok/s | 4GB |
| microsoft/Phi-3-mini-4k-instruct | Q4 | 88.04 tok/s | 4GB |
| EleutherAI/pythia-70m-deduped | Q4 | 88.00 tok/s | 4GB |
| openai-community/gpt2-xl | Q4 | 87.94 tok/s | 4GB |
| GSAI-ML/LLaDA-8B-Instruct | Q4 | 87.59 tok/s | 4GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 87.28 tok/s | 4GB |
| meta-llama/Llama-3.2-3B | Q8 | 87.02 tok/s | 3GB |
| meta-llama/Llama-3.1-8B | Q4 | 86.99 tok/s | 4GB |
| HuggingFaceTB/SmolLM2-135M | Q4 | 86.52 tok/s | 4GB |
| Qwen/Qwen2-7B-Instruct | Q4 | 86.46 tok/s | 4GB |
| Qwen/Qwen3-0.6B-Base | Q4 | 86.19 tok/s | 3GB |
| openai-community/gpt2 | Q4 | 86.16 tok/s | 4GB |
| IlyaGusev/saiga_llama3_8b | Q4 | 85.81 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3 | Q4 | 85.74 tok/s | 4GB |
| microsoft/phi-2 | Q4 | 85.36 tok/s | 4GB |
| rinna/japanese-gpt-neox-small | Q4 | 85.19 tok/s | 4GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | 85.04 tok/s | 4GB |
| openai-community/gpt2-medium | Q4 | 85.04 tok/s | 4GB |
| BSC-LT/salamandraTA-7b-instruct | Q4 | 84.77 tok/s | 4GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q4 | 84.32 tok/s | 4GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q4 | 84.31 tok/s | 4GB |
| microsoft/phi-4 | Q4 | 84.26 tok/s | 4GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q4 | 84.10 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q4 | 84.08 tok/s | 4GB |
| ibm-granite/granite-3.3-8b-instruct | Q4 | 84.05 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3.1 | Q4 | 83.99 tok/s | 4GB |
| Qwen/Qwen2.5-3B-Instruct | Q8 | 83.86 tok/s | 3GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q4 | 83.71 tok/s | 4GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | 83.67 tok/s | 4GB |
| microsoft/Phi-4-multimodal-instruct | Q4 | 83.48 tok/s | 4GB |
| skt/kogpt2-base-v2 | Q4 | 82.89 tok/s | 4GB |
| bigcode/starcoder2-3b | Q8 | 82.76 tok/s | 3GB |
| openai-community/gpt2-large | Q4 | 82.69 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1 | Q4 | 82.35 tok/s | 4GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q4 | 82.32 tok/s | 4GB |
| Qwen/Qwen2.5-3B | Q8 | 82.26 tok/s | 3GB |
| bigscience/bloomz-560m | Q4 | 82.15 tok/s | 4GB |
| microsoft/Phi-3.5-vision-instruct | Q4 | 81.99 tok/s | 4GB |
| Qwen/Qwen3-8B-FP8 | Q4 | 81.98 tok/s | 4GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q4 | 81.65 tok/s | 5GB |
| google-t5/t5-3b | Q8 | 81.58 tok/s | 3GB |
| HuggingFaceTB/SmolLM-135M | Q4 | 81.42 tok/s | 4GB |
| liuhaotian/llava-v1.5-7b | Q4 | 81.41 tok/s | 4GB |
| facebook/opt-125m | Q4 | 81.34 tok/s | 4GB |
| petals-team/StableBeluga2 | Q4 | 81.34 tok/s | 4GB |
| microsoft/Phi-3-mini-128k-instruct | Q4 | 81.25 tok/s | 4GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q4 | 81.07 tok/s | 4GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q8 | 80.92 tok/s | 3GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | 80.77 tok/s | 3GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q8 | 80.64 tok/s | 4GB |
| Qwen/Qwen3-4B | Q8 | 80.54 tok/s | 4GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | 80.36 tok/s | 4GB |
| inference-net/Schematron-3B | Q8 | 80.28 tok/s | 3GB |
| parler-tts/parler-tts-large-v1 | Q4 | 80.23 tok/s | 4GB |
| Qwen/Qwen2.5-7B | Q4 | 80.12 tok/s | 4GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | 79.82 tok/s | 4GB |
| Qwen/Qwen3-Embedding-8B | Q4 | 79.69 tok/s | 4GB |
| meta-llama/Meta-Llama-3-8B | Q4 | 79.69 tok/s | 4GB |
| Qwen/Qwen3-4B-Base | Q8 | 78.94 tok/s | 4GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q4 | 78.59 tok/s | 4GB |
| unsloth/Llama-3.2-3B-Instruct | Q8 | 78.21 tok/s | 3GB |
| Qwen/Qwen2-0.5B | Q8 | 77.15 tok/s | 5GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q8 | 76.59 tok/s | 4GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q8 | 76.44 tok/s | 4GB |
| Qwen/Qwen2.5-Math-1.5B | Q8 | 75.45 tok/s | 5GB |
| Qwen/Qwen2-1.5B-Instruct | Q8 | 75.38 tok/s | 5GB |
| Qwen/Qwen2-0.5B-Instruct | Q8 | 75.31 tok/s | 5GB |
| Qwen/Qwen2.5-14B | Q4 | 75.07 tok/s | 7GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q8 | 74.70 tok/s | 5GB |
| Qwen/Qwen3-14B-Base | Q4 | 74.07 tok/s | 7GB |
| ai-forever/ruGPT-3.5-13B | Q4 | 73.76 tok/s | 7GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | 73.56 tok/s | 7GB |
| Qwen/Qwen3-14B | Q4 | 72.68 tok/s | 7GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q8 | 72.50 tok/s | 5GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q8 | 71.84 tok/s | 4GB |
| Qwen/Qwen2.5-Coder-1.5B | Q8 | 71.75 tok/s | 5GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | 71.50 tok/s | 5GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q8 | 71.09 tok/s | 4GB |
| Qwen/Qwen2.5-1.5B | Q8 | 70.98 tok/s | 5GB |
| meta-llama/Llama-2-13b-chat-hf | Q4 | 70.91 tok/s | 7GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q8 | 70.34 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q8 | 69.76 tok/s | 5GB |
| Qwen/Qwen3-Embedding-4B | Q8 | 69.70 tok/s | 4GB |
| OpenPipe/Qwen3-14B-Instruct | Q4 | 69.21 tok/s | 7GB |
| Qwen/Qwen3-Reranker-0.6B | Q8 | 69.14 tok/s | 6GB |
| Qwen/Qwen3-0.6B | Q8 | 68.27 tok/s | 6GB |
| mistralai/Mistral-7B-v0.1 | Q8 | 68.26 tok/s | 7GB |
| HuggingFaceTB/SmolLM2-135M | Q8 | 68.17 tok/s | 7GB |
| Qwen/Qwen2.5-0.5B | Q8 | 68.13 tok/s | 5GB |
| openai-community/gpt2-medium | Q8 | 67.96 tok/s | 7GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q8 | 67.75 tok/s | 7GB |
| huggyllama/llama-7b | Q8 | 67.65 tok/s | 7GB |
| EleutherAI/pythia-70m-deduped | Q8 | 67.12 tok/s | 7GB |
| lmsys/vicuna-7b-v1.5 | Q8 | 67.01 tok/s | 7GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | 66.89 tok/s | 7GB |
| openai-community/gpt2-large | Q8 | 66.58 tok/s | 7GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q8 | 66.54 tok/s | 7GB |
| microsoft/VibeVoice-1.5B | Q8 | 66.52 tok/s | 5GB |
| dicta-il/dictalm2.0-instruct | Q8 | 66.48 tok/s | 7GB |
| Qwen/Qwen3-Embedding-0.6B | Q8 | 66.47 tok/s | 6GB |
| deepseek-ai/DeepSeek-R1-0528 | Q8 | 66.24 tok/s | 7GB |
| openai-community/gpt2 | Q8 | 66.02 tok/s | 7GB |
| meta-llama/Llama-2-7b-hf | Q8 | 65.30 tok/s | 7GB |
| EleutherAI/gpt-neo-125m | Q8 | 65.17 tok/s | 7GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q8 | 65.17 tok/s | 7GB |
| distilbert/distilgpt2 | Q8 | 64.83 tok/s | 7GB |
| meta-llama/Llama-2-7b-chat-hf | Q8 | 64.51 tok/s | 7GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q8 | 64.44 tok/s | 8GB |
| microsoft/DialoGPT-small | Q8 | 64.40 tok/s | 7GB |
| Qwen/Qwen3-1.7B-Base | Q8 | 64.16 tok/s | 7GB |
| microsoft/Phi-4-multimodal-instruct | Q8 | 64.09 tok/s | 7GB |
| HuggingFaceTB/SmolLM-135M | Q8 | 64.05 tok/s | 7GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q8 | 63.93 tok/s | 7GB |
| liuhaotian/llava-v1.5-7b | Q8 | 63.91 tok/s | 7GB |
| Qwen/Qwen2-7B-Instruct | Q8 | 63.73 tok/s | 7GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q8 | 63.70 tok/s | 5GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q8 | 63.50 tok/s | 7GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | 63.43 tok/s | 8GB |
| zai-org/GLM-4.5-Air | Q8 | 63.33 tok/s | 7GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q8 | 63.23 tok/s | 8GB |
| Qwen/Qwen3-8B | Q8 | 62.86 tok/s | 8GB |
| GSAI-ML/LLaDA-8B-Instruct | Q8 | 62.77 tok/s | 8GB |
| llamafactory/tiny-random-Llama-3 | Q8 | 62.67 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1 | Q8 | 62.64 tok/s | 7GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q8 | 62.38 tok/s | 7GB |
| BSC-LT/salamandraTA-7b-instruct | Q8 | 62.26 tok/s | 7GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q8 | 62.24 tok/s | 7GB |
| Qwen/Qwen2.5-7B | Q8 | 61.84 tok/s | 7GB |
| rednote-hilab/dots.ocr | Q8 | 61.75 tok/s | 7GB |
| facebook/opt-125m | Q8 | 61.55 tok/s | 7GB |
| hmellor/tiny-random-LlamaForCausalLM | Q8 | 61.54 tok/s | 7GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q8 | 61.49 tok/s | 8GB |
| ibm-granite/granite-docling-258M | Q8 | 61.34 tok/s | 7GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | 61.32 tok/s | 7GB |
| openai-community/gpt2-xl | Q8 | 61.23 tok/s | 7GB |
| microsoft/phi-4 | Q8 | 60.83 tok/s | 7GB |
| IlyaGusev/saiga_llama3_8b | Q8 | 60.83 tok/s | 8GB |
| GSAI-ML/LLaDA-8B-Base | Q8 | 60.67 tok/s | 8GB |
| microsoft/Phi-3-mini-4k-instruct | Q8 | 60.64 tok/s | 7GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q8 | 60.61 tok/s | 7GB |
| microsoft/phi-2 | Q8 | 60.41 tok/s | 7GB |
| deepseek-ai/DeepSeek-V3 | Q8 | 60.22 tok/s | 7GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | 60.12 tok/s | 8GB |
| google/gemma-3-270m-it | Q8 | 59.94 tok/s | 7GB |
| Qwen/Qwen3-0.6B-Base | Q8 | 59.76 tok/s | 6GB |
| deepseek-ai/DeepSeek-V3.1 | Q8 | 59.52 tok/s | 7GB |
| rinna/japanese-gpt-neox-small | Q8 | 59.16 tok/s | 7GB |
| bigscience/bloomz-560m | Q8 | 59.09 tok/s | 7GB |
| microsoft/Phi-3.5-vision-instruct | Q8 | 59.09 tok/s | 7GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q8 | 59.08 tok/s | 8GB |
| numind/NuExtract-1.5 | Q8 | 59.01 tok/s | 7GB |
| parler-tts/parler-tts-large-v1 | Q8 | 58.87 tok/s | 7GB |
| sshleifer/tiny-gpt2 | Q8 | 58.82 tok/s | 7GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q8 | 58.61 tok/s | 8GB |
| deepseek-ai/DeepSeek-V3-0324 | Q8 | 58.41 tok/s | 7GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q8 | 58.18 tok/s | 8GB |
| ibm-granite/granite-3.3-8b-instruct | Q8 | 58.13 tok/s | 8GB |
| microsoft/Phi-3-mini-128k-instruct | Q8 | 58.11 tok/s | 7GB |
| unsloth/gpt-oss-20b-BF16 | Q4 | 58.10 tok/s | 10GB |
| petals-team/StableBeluga2 | Q8 | 57.97 tok/s | 7GB |
| meta-llama/Meta-Llama-3-8B | Q8 | 57.69 tok/s | 8GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | 57.54 tok/s | 7GB |
| zai-org/GLM-4.6-FP8 | Q8 | 57.53 tok/s | 7GB |
| microsoft/DialoGPT-medium | Q8 | 57.50 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | 57.42 tok/s | 7GB |
| MiniMaxAI/MiniMax-M2 | Q8 | 57.23 tok/s | 7GB |
| openai/gpt-oss-20b | Q4 | 57.06 tok/s | 10GB |
| microsoft/Phi-4-mini-instruct | Q8 | 56.88 tok/s | 7GB |
| HuggingFaceH4/zephyr-7b-beta | Q8 | 56.81 tok/s | 7GB |
| skt/kogpt2-base-v2 | Q8 | 56.62 tok/s | 7GB |
| Qwen/Qwen3-1.7B | Q8 | 56.60 tok/s | 7GB |
| meta-llama/Llama-Guard-3-8B | Q8 | 56.60 tok/s | 8GB |
| vikhyatk/moondream2 | Q8 | 56.39 tok/s | 7GB |
| Qwen/Qwen3-8B-FP8 | Q8 | 56.22 tok/s | 8GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | 56.15 tok/s | 8GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q4 | 54.37 tok/s | 15GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q4 | 54.32 tok/s | 10GB |
| Qwen/Qwen3-8B-Base | Q8 | 53.92 tok/s | 8GB |
| meta-llama/Llama-3.1-8B | Q8 | 53.76 tok/s | 8GB |
| Qwen/Qwen3-Embedding-8B | Q8 | 53.49 tok/s | 8GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q4 | 53.39 tok/s | 10GB |
| meta-llama/Llama-2-13b-chat-hf | Q8 | 53.32 tok/s | 13GB |
| Qwen/Qwen2.5-32B-Instruct | Q4 | 52.96 tok/s | 16GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q8 | 52.93 tok/s | 9GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q4 | 52.14 tok/s | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q4 | 52.06 tok/s | 15GB |
| Qwen/Qwen2.5-14B | Q8 | 51.69 tok/s | 14GB |
| Qwen/Qwen3-30B-A3B | Q4 | 51.35 tok/s | 15GB |
| Qwen/Qwen3-14B-Base | Q8 | 51.12 tok/s | 14GB |
| codellama/CodeLlama-34b-hf | Q4 | 49.89 tok/s | 17GB |
| baichuan-inc/Baichuan-M2-32B | Q4 | 49.34 tok/s | 16GB |
| Qwen/Qwen3-14B | Q8 | 49.15 tok/s | 14GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q4 | 48.99 tok/s | 15GB |
| OpenPipe/Qwen3-14B-Instruct | Q8 | 48.68 tok/s | 14GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q4 | 48.12 tok/s | 15GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q4 | 47.71 tok/s | 15GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q4 | 47.41 tok/s | 15GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q4 | 47.35 tok/s | 15GB |
| Qwen/Qwen2.5-14B-Instruct | Q8 | 46.98 tok/s | 14GB |
| Qwen/Qwen2.5-32B | Q4 | 46.23 tok/s | 16GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q4 | 46.02 tok/s | 16GB |
| ai-forever/ruGPT-3.5-13B | Q8 | 45.74 tok/s | 13GB |
| openai/gpt-oss-20b | Q8 | 45.36 tok/s | 20GB |
| Qwen/Qwen3-32B | Q4 | 44.83 tok/s | 16GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q4 | 44.61 tok/s | 16GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q8 | 44.31 tok/s | 20GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q4 | 43.39 tok/s | 17GB |
| unsloth/gpt-oss-20b-BF16 | Q8 | 41.98 tok/s | 20GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q8 | 41.36 tok/s | 20GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q8 | 36.82 tok/s | 30GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q8 | 36.62 tok/s | 30GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q8 | 36.02 tok/s | 30GB |
| baichuan-inc/Baichuan-M2-32B | Q8 | 35.49 tok/s | 32GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q8 | 35.47 tok/s | 34GB |
| Qwen/Qwen2.5-32B | Q8 | 35.21 tok/s | 32GB |
| Qwen/Qwen3-32B | Q8 | 34.42 tok/s | 32GB |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q4 | 34.29 tok/s | 35GB |
| Qwen/Qwen3-30B-A3B | Q8 | 33.94 tok/s | 30GB |
| codellama/CodeLlama-34b-hf | Q8 | 33.36 tok/s | 34GB |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | 33.33 tok/s | 35GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q8 | 33.28 tok/s | 30GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q8 | 32.97 tok/s | 30GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q8 | 32.96 tok/s | 30GB |
| Qwen/Qwen2.5-72B-Instruct | Q4 | 32.93 tok/s | 36GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q8 | 32.65 tok/s | 32GB |
| Qwen/Qwen2.5-32B-Instruct | Q8 | 32.55 tok/s | 32GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q8 | 32.43 tok/s | 30GB |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q4 | 31.88 tok/s | 35GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q8 | 31.76 tok/s | 32GB |
| meta-llama/Meta-Llama-3-70B-Instruct | Q4 | 31.69 tok/s | 35GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q8 | 31.37 tok/s | 30GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q4 | 30.05 tok/s | 40GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q4 | 30.00 tok/s | 40GB |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | 29.70 tok/s | 35GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q4 | 28.99 tok/s | 40GB |
| AI-MO/Kimina-Prover-72B | Q4 | 28.65 tok/s | 36GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q4 | 28.22 tok/s | 40GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q4 | 26.38 tok/s | 45GB |
Note: Performance estimates are calculated, not measured. Real results may vary.
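The page does not publish its estimation formula, but the "VRAM used" column above is consistent with simple weights-only math: bytes ≈ parameter count × (bits per weight ÷ 8). The sketch below encodes that rule of thumb (my assumption, not the site's stated methodology); real deployments need extra headroom for the KV cache and activations.

```python
# Weights-only VRAM rule of thumb, consistent with the table above.
# Assumption: GB ≈ params (billions) × bits / 8; KV cache and activations add more.
def est_vram_gb(params_billion: float, bits: int) -> float:
    return params_billion * bits / 8

# Approximate parameter counts, for illustration only.
for name, params_b in [("Llama-3.2-1B", 1.2), ("Mistral-7B", 7.2), ("Llama-3.3-70B", 70.6)]:
    for bits in (4, 8):
        print(f"{name} Q{bits}: ~{est_vram_gb(params_b, bits):.1f} GB")
```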
| Model | Quantization | Verdict | Estimated speed | VRAM needed |
|---|---|---|---|---|
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q8 | Not supported | — | 80GB (have 48GB) |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q4 | Fits comfortably | 30.05 tok/s | 40GB (have 48GB) |
| ai-forever/ruGPT-3.5-13B | Q8 | Fits comfortably | 45.74 tok/s | 13GB (have 48GB) |
| ai-forever/ruGPT-3.5-13B | Q4 | Fits comfortably | 73.76 tok/s | 7GB (have 48GB) |
| baichuan-inc/Baichuan-M2-32B | Q8 | Fits comfortably | 35.49 tok/s | 32GB (have 48GB) |
| baichuan-inc/Baichuan-M2-32B | Q4 | Fits comfortably | 49.34 tok/s | 16GB (have 48GB) |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q8 | Fits comfortably | 62.24 tok/s | 7GB (have 48GB) |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 93.33 tok/s | 4GB (have 48GB) |
| ibm-granite/granite-3.3-8b-instruct | Q8 | Fits comfortably | 58.13 tok/s | 8GB (have 48GB) |
| ibm-granite/granite-3.3-8b-instruct | Q4 | Fits comfortably | 84.05 tok/s | 4GB (have 48GB) |
| Qwen/Qwen3-1.7B-Base | Q8 | Fits comfortably | 64.16 tok/s | 7GB (have 48GB) |
| Qwen/Qwen3-1.7B-Base | Q4 | Fits comfortably | 90.19 tok/s | 4GB (have 48GB) |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q8 | Fits comfortably | 41.36 tok/s | 20GB (have 48GB) |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q4 | Fits comfortably | 54.32 tok/s | 10GB (have 48GB) |
| BSC-LT/salamandraTA-7b-instruct | Q8 | Fits comfortably | 62.26 tok/s | 7GB (have 48GB) |
| BSC-LT/salamandraTA-7b-instruct | Q4 | Fits comfortably | 84.77 tok/s | 4GB (have 48GB) |
| dicta-il/dictalm2.0-instruct | Q8 | Fits comfortably | 66.48 tok/s | 7GB (have 48GB) |
| dicta-il/dictalm2.0-instruct | Q4 | Fits comfortably | 90.53 tok/s | 4GB (have 48GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q8 | Fits comfortably | 36.62 tok/s | 30GB (have 48GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q4 | Fits comfortably | 52.06 tok/s | 15GB (have 48GB) |
| GSAI-ML/LLaDA-8B-Base | Q8 | Fits comfortably | 60.67 tok/s | 8GB (have 48GB) |
| GSAI-ML/LLaDA-8B-Base | Q4 | Fits comfortably | 90.57 tok/s | 4GB (have 48GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q8 | Fits comfortably | 32.43 tok/s | 30GB (have 48GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q4 | Fits comfortably | 48.12 tok/s | 15GB (have 48GB) |
| Qwen/Qwen2-0.5B-Instruct | Q8 | Fits comfortably | 75.31 tok/s | 5GB (have 48GB) |
| Qwen/Qwen2-0.5B-Instruct | Q4 | Fits comfortably | 97.76 tok/s | 3GB (have 48GB) |
| deepseek-ai/DeepSeek-V3 | Q8 | Fits comfortably | 60.22 tok/s | 7GB (have 48GB) |
| deepseek-ai/DeepSeek-V3 | Q4 | Fits comfortably | 85.74 tok/s | 4GB (have 48GB) |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q8 | Fits comfortably | 74.70 tok/s | 5GB (have 48GB) |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | Fits comfortably | 108.13 tok/s | 3GB (have 48GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q8 | Fits comfortably | 36.82 tok/s | 30GB (have 48GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q4 | Fits comfortably | 54.37 tok/s | 15GB (have 48GB) |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q8 | Fits comfortably | 36.02 tok/s | 30GB (have 48GB) |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q4 | Fits comfortably | 47.71 tok/s | 15GB (have 48GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q8 | Fits comfortably | 31.37 tok/s | 30GB (have 48GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q4 | Fits comfortably | 48.99 tok/s | 15GB (have 48GB) |
| AI-MO/Kimina-Prover-72B | Q8 | Not supported | — | 72GB (have 48GB) |
| AI-MO/Kimina-Prover-72B | Q4 | Fits comfortably | 28.65 tok/s | 36GB (have 48GB) |
| apple/OpenELM-1_1B-Instruct | Q8 | Fits comfortably | 116.07 tok/s | 1GB (have 48GB) |
| apple/OpenELM-1_1B-Instruct | Q4 | Fits comfortably | 192.20 tok/s | 1GB (have 48GB) |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 59.08 tok/s | 8GB (have 48GB) |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 84.10 tok/s | 4GB (have 48GB) |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q8 | Fits comfortably | 52.93 tok/s | 9GB (have 48GB) |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q4 | Fits comfortably | 81.65 tok/s | 5GB (have 48GB) |
| Qwen/Qwen2.5-3B | Q8 | Fits comfortably | 82.26 tok/s | 3GB (have 48GB) |
| Qwen/Qwen2.5-3B | Q4 | Fits comfortably | 125.26 tok/s | 2GB (have 48GB) |
| lmsys/vicuna-7b-v1.5 | Q8 | Fits comfortably | 67.01 tok/s | 7GB (have 48GB) |
| lmsys/vicuna-7b-v1.5 | Q4 | Fits comfortably | 90.42 tok/s | 4GB (have 48GB) |
| meta-llama/Llama-2-13b-chat-hf | Q8 | Fits comfortably | 53.32 tok/s | 13GB (have 48GB) |
| meta-llama/Llama-2-13b-chat-hf | Q4 | Fits comfortably | 70.91 tok/s | 7GB (have 48GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q8 | Not supported | — | 80GB (have 48GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q4 | Fits comfortably | 28.22 tok/s | 40GB (have 48GB) |
| unsloth/gemma-3-1b-it | Q8 | Fits comfortably | 114.24 tok/s | 1GB (have 48GB) |
| unsloth/gemma-3-1b-it | Q4 | Fits comfortably | 166.50 tok/s | 1GB (have 48GB) |
| bigcode/starcoder2-3b | Q8 | Fits comfortably | 82.76 tok/s | 3GB (have 48GB) |
| bigcode/starcoder2-3b | Q4 | Fits comfortably | 131.45 tok/s | 2GB (have 48GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q8 | Not supported | — | 80GB (have 48GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q4 | Fits comfortably | 28.99 tok/s | 40GB (have 48GB) |
| ibm-granite/granite-docling-258M | Q8 | Fits comfortably | 61.34 tok/s | 7GB (have 48GB) |
| ibm-granite/granite-docling-258M | Q4 | Fits comfortably | 88.75 tok/s | 4GB (have 48GB) |
| skt/kogpt2-base-v2 | Q8 | Fits comfortably | 56.62 tok/s | 7GB (have 48GB) |
| skt/kogpt2-base-v2 | Q4 | Fits comfortably | 82.89 tok/s | 4GB (have 48GB) |
| google/gemma-3-270m-it | Q8 | Fits comfortably | 59.94 tok/s | 7GB (have 48GB) |
| google/gemma-3-270m-it | Q4 | Fits comfortably | 92.30 tok/s | 4GB (have 48GB) |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q8 | Fits comfortably | 76.44 tok/s | 4GB (have 48GB) |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | Fits comfortably | 99.77 tok/s | 2GB (have 48GB) |
| Qwen/Qwen2.5-32B | Q8 | Fits comfortably | 35.21 tok/s | 32GB (have 48GB) |
| Qwen/Qwen2.5-32B | Q4 | Fits comfortably | 46.23 tok/s | 16GB (have 48GB) |
| parler-tts/parler-tts-large-v1 | Q8 | Fits comfortably | 58.87 tok/s | 7GB (have 48GB) |
| parler-tts/parler-tts-large-v1 | Q4 | Fits comfortably | 80.23 tok/s | 4GB (have 48GB) |
| EleutherAI/pythia-70m-deduped | Q8 | Fits comfortably | 67.12 tok/s | 7GB (have 48GB) |
| EleutherAI/pythia-70m-deduped | Q4 | Fits comfortably | 88.00 tok/s | 4GB (have 48GB) |
| microsoft/VibeVoice-1.5B | Q8 | Fits comfortably | 66.52 tok/s | 5GB (have 48GB) |
| microsoft/VibeVoice-1.5B | Q4 | Fits comfortably | 90.70 tok/s | 3GB (have 48GB) |
| ibm-granite/granite-3.3-2b-instruct | Q8 | Fits comfortably | 105.30 tok/s | 2GB (have 48GB) |
| ibm-granite/granite-3.3-2b-instruct | Q4 | Fits comfortably | 134.23 tok/s | 1GB (have 48GB) |
| Qwen/Qwen2.5-72B-Instruct | Q8 | Not supported | — | 72GB (have 48GB) |
| Qwen/Qwen2.5-72B-Instruct | Q4 | Fits comfortably | 32.93 tok/s | 36GB (have 48GB) |
| liuhaotian/llava-v1.5-7b | Q8 | Fits comfortably | 63.91 tok/s | 7GB (have 48GB) |
| liuhaotian/llava-v1.5-7b | Q4 | Fits comfortably | 81.41 tok/s | 4GB (have 48GB) |
| google/gemma-2b | Q8 | Fits comfortably | 102.71 tok/s | 2GB (have 48GB) |
| google/gemma-2b | Q4 | Fits comfortably | 137.77 tok/s | 1GB (have 48GB) |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q8 | Fits comfortably | 62.38 tok/s | 7GB (have 48GB) |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q4 | Fits comfortably | 93.80 tok/s | 4GB (have 48GB) |
| Qwen/Qwen3-235B-A22B | Q8 | Not supported | — | 235GB (have 48GB) |
| Qwen/Qwen3-235B-A22B | Q4 | Not supported | — | 118GB (have 48GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q8 | Fits comfortably | 64.44 tok/s | 8GB (have 48GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q4 | Fits comfortably | 78.59 tok/s | 4GB (have 48GB) |
| microsoft/Phi-4-mini-instruct | Q8 | Fits comfortably | 56.88 tok/s | 7GB (have 48GB) |
| microsoft/Phi-4-mini-instruct | Q4 | Fits comfortably | 88.41 tok/s | 4GB (have 48GB) |
| llamafactory/tiny-random-Llama-3 | Q8 | Fits comfortably | 62.67 tok/s | 7GB (have 48GB) |
| llamafactory/tiny-random-Llama-3 | Q4 | Fits comfortably | 95.92 tok/s | 4GB (have 48GB) |
| HuggingFaceH4/zephyr-7b-beta | Q8 | Fits comfortably | 56.81 tok/s | 7GB (have 48GB) |
| HuggingFaceH4/zephyr-7b-beta | Q4 | Fits comfortably | 89.33 tok/s | 4GB (have 48GB) |
| Qwen/Qwen3-4B-Thinking-2507 | Q8 | Fits comfortably | 71.84 tok/s | 4GB (have 48GB) |
| Qwen/Qwen3-4B-Thinking-2507 | Q4 | Fits comfortably | 99.46 tok/s | 2GB (have 48GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q8 | Fits comfortably | 33.28 tok/s | 30GB (have 48GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q4 | Fits comfortably | 52.14 tok/s | 15GB (have 48GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q8 | Fits comfortably | 63.23 tok/s | 8GB (have 48GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q4 | Fits comfortably | 89.90 tok/s | 4GB (have 48GB) |
| unsloth/Llama-3.2-1B-Instruct | Q8 | Fits comfortably | 132.18 tok/s | 1GB (have 48GB) |
| unsloth/Llama-3.2-1B-Instruct | Q4 | Fits comfortably | 180.86 tok/s | 1GB (have 48GB) |
| GSAI-ML/LLaDA-8B-Instruct | Q8 | Fits comfortably | 62.77 tok/s | 8GB (have 48GB) |
| GSAI-ML/LLaDA-8B-Instruct | Q4 | Fits comfortably | 87.59 tok/s | 4GB (have 48GB) |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q8 | Not supported | — | 90GB (have 48GB) |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q4 | Fits comfortably | 26.38 tok/s | 45GB (have 48GB) |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q8 | Fits comfortably | 63.50 tok/s | 7GB (have 48GB) |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q4 | Fits comfortably | 94.86 tok/s | 4GB (have 48GB) |
| numind/NuExtract-1.5 | Q8 | Fits comfortably | 59.01 tok/s | 7GB (have 48GB) |
| numind/NuExtract-1.5 | Q4 | Fits comfortably | 90.07 tok/s | 4GB (have 48GB) |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q8 | Fits comfortably | 66.54 tok/s | 7GB (have 48GB) |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q4 | Fits comfortably | 88.47 tok/s | 4GB (have 48GB) |
| hmellor/tiny-random-LlamaForCausalLM | Q8 | Fits comfortably | 61.54 tok/s | 7GB (have 48GB) |
| hmellor/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 93.19 tok/s | 4GB (have 48GB) |
| huggyllama/llama-7b | Q8 | Fits comfortably | 67.65 tok/s | 7GB (have 48GB) |
| huggyllama/llama-7b | Q4 | Fits comfortably | 97.44 tok/s | 4GB (have 48GB) |
| deepseek-ai/DeepSeek-V3-0324 | Q8 | Fits comfortably | 58.41 tok/s | 7GB (have 48GB) |
| deepseek-ai/DeepSeek-V3-0324 | Q4 | Fits comfortably | 91.37 tok/s | 4GB (have 48GB) |
| microsoft/Phi-3-mini-128k-instruct | Q8 | Fits comfortably | 58.11 tok/s | 7GB (have 48GB) |
| microsoft/Phi-3-mini-128k-instruct | Q4 | Fits comfortably | 81.25 tok/s | 4GB (have 48GB) |
| sshleifer/tiny-gpt2 | Q8 | Fits comfortably | 58.82 tok/s | 7GB (have 48GB) |
| sshleifer/tiny-gpt2 | Q4 | Fits comfortably | 94.30 tok/s | 4GB (have 48GB) |
| meta-llama/Llama-Guard-3-8B | Q8 | Fits comfortably | 56.60 tok/s | 8GB (have 48GB) |
| meta-llama/Llama-Guard-3-8B | Q4 | Fits comfortably | 92.21 tok/s | 4GB (have 48GB) |
| openai-community/gpt2-xl | Q8 | Fits comfortably | 61.23 tok/s | 7GB (have 48GB) |
| openai-community/gpt2-xl | Q4 | Fits comfortably | 87.94 tok/s | 4GB (have 48GB) |
| OpenPipe/Qwen3-14B-Instruct | Q8 | Fits comfortably | 48.68 tok/s | 14GB (have 48GB) |
| OpenPipe/Qwen3-14B-Instruct | Q4 | Fits comfortably | 69.21 tok/s | 7GB (have 48GB) |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q8 | Not supported | — | 70GB (have 48GB) |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q4 | Fits comfortably | 31.88 tok/s | 35GB (have 48GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q8 | Fits comfortably | 70.34 tok/s | 4GB (have 48GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q4 | Fits comfortably | 103.87 tok/s | 2GB (have 48GB) |
| ibm-research/PowerMoE-3b | Q8 | Fits comfortably | 89.44 tok/s | 3GB (have 48GB) |
| ibm-research/PowerMoE-3b | Q4 | Fits comfortably | 122.17 tok/s | 2GB (have 48GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q8 | Fits comfortably | 71.09 tok/s | 4GB (have 48GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q4 | Fits comfortably | 116.15 tok/s | 2GB (have 48GB) |
| unsloth/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 78.21 tok/s | 3GB (have 48GB) |
| unsloth/Llama-3.2-3B-Instruct | Q4 | Fits comfortably | 109.75 tok/s | 2GB (have 48GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q8 | Fits comfortably | 80.64 tok/s | 4GB (have 48GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q4 | Fits comfortably | 112.45 tok/s | 2GB (have 48GB) |
| meta-llama/Llama-3.2-3B | Q8 | Fits comfortably | 87.02 tok/s | 3GB (have 48GB) |
| meta-llama/Llama-3.2-3B | Q4 | Fits comfortably | 127.97 tok/s | 2GB (have 48GB) |
| EleutherAI/gpt-neo-125m | Q8 | Fits comfortably | 65.17 tok/s | 7GB (have 48GB) |
| EleutherAI/gpt-neo-125m | Q4 | Fits comfortably | 95.94 tok/s | 4GB (have 48GB) |
| codellama/CodeLlama-34b-hf | Q8 | Fits comfortably | 33.36 tok/s | 34GB (have 48GB) |
| codellama/CodeLlama-34b-hf | Q4 | Fits comfortably | 49.89 tok/s | 17GB (have 48GB) |
| meta-llama/Llama-Guard-3-1B | Q8 | Fits comfortably | 136.33 tok/s | 1GB (have 48GB) |
| meta-llama/Llama-Guard-3-1B | Q4 | Fits comfortably | 175.85 tok/s | 1GB (have 48GB) |
| Qwen/Qwen2-1.5B-Instruct | Q8 | Fits comfortably | 75.38 tok/s | 5GB (have 48GB) |
| Qwen/Qwen2-1.5B-Instruct | Q4 | Fits comfortably | 102.84 tok/s | 3GB (have 48GB) |
| google/gemma-2-2b-it | Q8 | Fits comfortably | 94.59 tok/s | 2GB (have 48GB) |
| google/gemma-2-2b-it | Q4 | Fits comfortably | 148.05 tok/s | 1GB (have 48GB) |
| Qwen/Qwen2.5-14B | Q8 | Fits comfortably | 51.69 tok/s | 14GB (have 48GB) |
| Qwen/Qwen2.5-14B | Q4 | Fits comfortably | 75.07 tok/s | 7GB (have 48GB) |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q8 | Fits comfortably | 32.65 tok/s | 32GB (have 48GB) |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q4 | Fits comfortably | 46.02 tok/s | 16GB (have 48GB) |
| microsoft/Phi-3.5-mini-instruct | Q8 | Fits comfortably | 66.89 tok/s | 7GB (have 48GB) |
| microsoft/Phi-3.5-mini-instruct | Q4 | Fits comfortably | 80.36 tok/s | 4GB (have 48GB) |
| Qwen/Qwen3-4B-Base | Q8 | Fits comfortably | 78.94 tok/s | 4GB (have 48GB) |
| Qwen/Qwen3-4B-Base | Q4 | Fits comfortably | 108.82 tok/s | 2GB (have 48GB) |
| Qwen/Qwen2-7B-Instruct | Q8 | Fits comfortably | 63.73 tok/s | 7GB (have 48GB) |
| Qwen/Qwen2-7B-Instruct | Q4 | Fits comfortably | 86.46 tok/s | 4GB (have 48GB) |
| meta-llama/Llama-2-7b-chat-hf | Q8 | Fits comfortably | 64.51 tok/s | 7GB (have 48GB) |
| meta-llama/Llama-2-7b-chat-hf | Q4 | Fits comfortably | 92.60 tok/s | 4GB (have 48GB) |
| Qwen/Qwen3-14B-Base | Q8 | Fits comfortably | 51.12 tok/s | 14GB (have 48GB) |
| Qwen/Qwen3-14B-Base | Q4 | Fits comfortably | 74.07 tok/s | 7GB (have 48GB) |
| swiss-ai/Apertus-8B-Instruct-2509 | Q8 | Fits comfortably | 58.61 tok/s | 8GB (have 48GB) |
| swiss-ai/Apertus-8B-Instruct-2509 | Q4 | Fits comfortably | 84.31 tok/s | 4GB (have 48GB) |
| microsoft/Phi-3.5-vision-instruct | Q8 | Fits comfortably | 59.09 tok/s | 7GB (have 48GB) |
| microsoft/Phi-3.5-vision-instruct | Q4 | Fits comfortably | 81.99 tok/s | 4GB (have 48GB) |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q8 | Fits comfortably | 60.61 tok/s | 7GB (have 48GB) |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q4 | Fits comfortably | 89.78 tok/s | 4GB (have 48GB) |
| rinna/japanese-gpt-neox-small | Q8 | Fits comfortably | 59.16 tok/s | 7GB (have 48GB) |
| rinna/japanese-gpt-neox-small | Q4 | Fits comfortably | 85.19 tok/s | 4GB (have 48GB) |
| Qwen/Qwen2.5-Coder-1.5B | Q8 | Fits comfortably | 71.75 tok/s | 5GB (have 48GB) |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | Fits comfortably | 92.29 tok/s | 3GB (have 48GB) |
| IlyaGusev/saiga_llama3_8b | Q8 | Fits comfortably | 60.83 tok/s | 8GB (have 48GB) |
| IlyaGusev/saiga_llama3_8b | Q4 | Fits comfortably | 85.81 tok/s | 4GB (have 48GB) |
| Qwen/Qwen3-30B-A3B | Q8 | Fits comfortably | 33.94 tok/s | 30GB (have 48GB) |
| Qwen/Qwen3-30B-A3B | Q4 | Fits comfortably | 51.35 tok/s | 15GB (have 48GB) |
| deepseek-ai/DeepSeek-R1 | Q8 | Fits comfortably | 62.64 tok/s | 7GB (have 48GB) |
| deepseek-ai/DeepSeek-R1 | Q4 | Fits comfortably | 82.35 tok/s | 4GB (have 48GB) |
| microsoft/DialoGPT-small | Q8 | Fits comfortably | 64.40 tok/s | 7GB (have 48GB) |
| microsoft/DialoGPT-small | Q4 | Fits comfortably | 89.59 tok/s | 4GB (have 48GB) |
| Qwen/Qwen3-8B-FP8 | Q8 | Fits comfortably | 56.22 tok/s | 8GB (have 48GB) |
| Qwen/Qwen3-8B-FP8 | Q4 | Fits comfortably | 81.98 tok/s | 4GB (have 48GB) |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q8 | Fits comfortably | 32.96 tok/s | 30GB (have 48GB) |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q4 | Fits comfortably | 47.41 tok/s | 15GB (have 48GB) |
| Qwen/Qwen3-Embedding-4B | Q8 | Fits comfortably | 69.70 tok/s | 4GB (have 48GB) |
| Qwen/Qwen3-Embedding-4B | Q4 | Fits comfortably | 99.75 tok/s | 2GB (have 48GB) |
| microsoft/Phi-4-multimodal-instruct | Q8 | Fits comfortably | 64.09 tok/s | 7GB (have 48GB) |
| microsoft/Phi-4-multimodal-instruct | Q4 | Fits comfortably | 83.48 tok/s | 4GB (have 48GB) |
| Qwen/Qwen3-8B-Base | Q8 | Fits comfortably | 53.92 tok/s | 8GB (have 48GB) |
| Qwen/Qwen3-8B-Base | Q4 | Fits comfortably | 92.55 tok/s | 4GB (have 48GB) |
| Qwen/Qwen3-0.6B-Base | Q8 | Fits comfortably | 59.76 tok/s | 6GB (have 48GB) |
| Qwen/Qwen3-0.6B-Base | Q4 | Fits comfortably | 86.19 tok/s | 3GB (have 48GB) |
| openai-community/gpt2-medium | Q8 | Fits comfortably | 67.96 tok/s | 7GB (have 48GB) |
| openai-community/gpt2-medium | Q4 | Fits comfortably | 85.04 tok/s | 4GB (have 48GB) |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q8 | Fits comfortably | 63.93 tok/s | 7GB (have 48GB) |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 91.77 tok/s | 4GB (have 48GB) |
| Qwen/Qwen2.5-Math-1.5B | Q8 | Fits comfortably | 75.45 tok/s | 5GB (have 48GB) |
| Qwen/Qwen2.5-Math-1.5B | Q4 | Fits comfortably | 105.48 tok/s | 3GB (have 48GB) |
| HuggingFaceTB/SmolLM-135M | Q8 | Fits comfortably | 64.05 tok/s | 7GB (have 48GB) |
| HuggingFaceTB/SmolLM-135M | Q4 | Fits comfortably | 81.42 tok/s | 4GB (have 48GB) |
| unsloth/gpt-oss-20b-BF16 | Q8 | Fits comfortably | 41.98 tok/s | 20GB (have 48GB) |
| unsloth/gpt-oss-20b-BF16 | Q4 | Fits comfortably | 58.10 tok/s | 10GB (have 48GB) |
| meta-llama/Meta-Llama-3-70B-Instruct | Q8 | Not supported | — | 70GB (have 48GB) |
| meta-llama/Meta-Llama-3-70B-Instruct | Q4 | Fits comfortably | 31.69 tok/s | 35GB (have 48GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 58.18 tok/s | 8GB (have 48GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 81.07 tok/s | 4GB (have 48GB) |
| zai-org/GLM-4.5-Air | Q8 | Fits comfortably | 63.33 tok/s | 7GB (have 48GB) |
| zai-org/GLM-4.5-Air | Q4 | Fits comfortably | 95.38 tok/s | 4GB (have 48GB) |
| mistralai/Mistral-7B-Instruct-v0.1 | Q8 | Fits comfortably | 67.75 tok/s | 7GB (have 48GB) |
| mistralai/Mistral-7B-Instruct-v0.1 | Q4 | Fits comfortably | 84.32 tok/s | 4GB (have 48GB) |
| LiquidAI/LFM2-1.2B | Q8 | Fits comfortably | 105.86 tok/s | 2GB (have 48GB) |
| LiquidAI/LFM2-1.2B | Q4 | Fits comfortably | 151.65 tok/s | 1GB (have 48GB) |
| mistralai/Mistral-7B-v0.1 | Q8 | Fits comfortably | 68.26 tok/s | 7GB (have 48GB) |
| mistralai/Mistral-7B-v0.1 | Q4 | Fits comfortably | 97.44 tok/s | 4GB (have 48GB) |
| Qwen/Qwen2.5-32B-Instruct | Q8 | Fits comfortably | 32.55 tok/s | 32GB (have 48GB) |
| Qwen/Qwen2.5-32B-Instruct | Q4 | Fits comfortably | 52.96 tok/s | 16GB (have 48GB) |
| deepseek-ai/DeepSeek-R1-0528 | Q8 | Fits comfortably | 66.24 tok/s | 7GB (have 48GB) |
| deepseek-ai/DeepSeek-R1-0528 | Q4 | Fits comfortably | 88.31 tok/s | 4GB (have 48GB) |
| meta-llama/Llama-3.1-8B | Q8 | Fits comfortably | 53.76 tok/s | 8GB (have 48GB) |
| meta-llama/Llama-3.1-8B | Q4 | Fits comfortably | 86.99 tok/s | 4GB (have 48GB) |
| deepseek-ai/DeepSeek-V3.1 | Q8 | Fits comfortably | 59.52 tok/s | 7GB (have 48GB) |
| deepseek-ai/DeepSeek-V3.1 | Q4 | Fits comfortably | 83.99 tok/s | 4GB (have 48GB) |
| microsoft/phi-4 | Q8 | Fits comfortably | 60.83 tok/s | 7GB (have 48GB) |
| microsoft/phi-4 | Q4 | Fits comfortably | 84.26 tok/s | 4GB (have 48GB) |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | Fits comfortably | 92.82 tok/s | 3GB (have 48GB) |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | Fits comfortably | 111.60 tok/s | 2GB (have 48GB) |
| Qwen/Qwen2-0.5B | Q8 | Fits comfortably | 77.15 tok/s | 5GB (have 48GB) |
| Qwen/Qwen2-0.5B | Q4 | Fits comfortably | 90.71 tok/s | 3GB (have 48GB) |
| MiniMaxAI/MiniMax-M2 | Q8 | Fits comfortably | 57.23 tok/s | 7GB (have 48GB) |
| MiniMaxAI/MiniMax-M2 | Q4 | Fits comfortably | 89.26 tok/s | 4GB (have 48GB) |
| microsoft/DialoGPT-medium | Q8 | Fits comfortably | 57.50 tok/s | 7GB (have 48GB) |
| microsoft/DialoGPT-medium | Q4 | Fits comfortably | 94.10 tok/s | 4GB (have 48GB) |
| zai-org/GLM-4.6-FP8 | Q8 | Fits comfortably | 57.53 tok/s | 7GB (have 48GB) |
| zai-org/GLM-4.6-FP8 | Q4 | Fits comfortably | 94.03 tok/s | 4GB (have 48GB) |
| HuggingFaceTB/SmolLM2-135M | Q8 | Fits comfortably | 68.17 tok/s | 7GB (have 48GB) |
| HuggingFaceTB/SmolLM2-135M | Q4 | Fits comfortably | 86.52 tok/s | 4GB (have 48GB) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | Fits comfortably | 56.15 tok/s | 8GB (have 48GB) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q4 | Fits comfortably | 84.08 tok/s | 4GB (have 48GB) |
| meta-llama/Llama-2-7b-hf | Q8 | Fits comfortably | 65.30 tok/s | 7GB (have 48GB) |
| meta-llama/Llama-2-7b-hf | Q4 | Fits comfortably | 92.44 tok/s | 4GB (have 48GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | Fits comfortably | 57.42 tok/s | 7GB (have 48GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | Fits comfortably | 90.91 tok/s | 4GB (have 48GB) |
| microsoft/phi-2 | Q8 | Fits comfortably | 60.41 tok/s | 7GB (have 48GB) |
| microsoft/phi-2 | Q4 | Fits comfortably | 85.36 tok/s | 4GB (have 48GB) |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | Not supported | — | 70GB (have 48GB) |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | Fits comfortably | 29.70 tok/s | 35GB (have 48GB) |
| Qwen/Qwen2.5-0.5B | Q8 | Fits comfortably | 68.13 tok/s | 5GB (have 48GB) |
| Qwen/Qwen2.5-0.5B | Q4 | Fits comfortably | 100.27 tok/s | 3GB (have 48GB) |
| Qwen/Qwen3-14B | Q8 | Fits comfortably | 49.15 tok/s | 14GB (have 48GB) |
| Qwen/Qwen3-14B | Q4 | Fits comfortably | 72.68 tok/s | 7GB (have 48GB) |
| Qwen/Qwen3-Embedding-8B | Q8 | Fits comfortably | 53.49 tok/s | 8GB (have 48GB) |
| Qwen/Qwen3-Embedding-8B | Q4 | Fits comfortably | 79.69 tok/s | 4GB (have 48GB) |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | Not supported | — | 70GB (have 48GB) |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | Fits comfortably | 33.33 tok/s | 35GB (have 48GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q8 | Fits comfortably | 61.49 tok/s | 8GB (have 48GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q4 | Fits comfortably | 82.32 tok/s | 4GB (have 48GB) |
| Qwen/Qwen2.5-14B-Instruct | Q8 | Fits comfortably | 46.98 tok/s | 14GB (have 48GB) |
| Qwen/Qwen2.5-14B-Instruct | Q4 | Fits comfortably | 73.56 tok/s | 7GB (have 48GB) |
| Qwen/Qwen2.5-1.5B | Q8 | Fits comfortably | 70.98 tok/s | 5GB (have 48GB) |
| Qwen/Qwen2.5-1.5B | Q4 | Fits comfortably | 110.10 tok/s | 3GB (have 48GB) |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | Fits comfortably | 79.82 tok/s | 4GB (have 48GB) |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q4 | Fits comfortably | 103.62 tok/s | 2GB (have 48GB) |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q8 | Fits comfortably | 44.31 tok/s | 20GB (have 48GB) |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q4 | Fits comfortably | 53.39 tok/s | 10GB (have 48GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q8 | Fits comfortably | 69.76 tok/s | 5GB (have 48GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q4 | Fits comfortably | 95.86 tok/s | 3GB (have 48GB) |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | Fits comfortably | 60.12 tok/s | 8GB (have 48GB) |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | Fits comfortably | 85.04 tok/s | 4GB (have 48GB) |
| Qwen/Qwen3-Reranker-0.6B | Q8 | Fits comfortably | 69.14 tok/s | 6GB (have 48GB) |
| Qwen/Qwen3-Reranker-0.6B | Q4 | Fits comfortably | 93.33 tok/s | 3GB (have 48GB) |
| rednote-hilab/dots.ocr | Q8 | Fits comfortably | 61.75 tok/s | 7GB (have 48GB) |
| rednote-hilab/dots.ocr | Q4 | Fits comfortably | 95.72 tok/s | 4GB (have 48GB) |
| google-t5/t5-3b | Q8 | Fits comfortably | 81.58 tok/s | 3GB (have 48GB) |
| google-t5/t5-3b | Q4 | Fits comfortably | 131.80 tok/s | 2GB (have 48GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q8 | Fits comfortably | 32.97 tok/s | 30GB (have 48GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q4 | Fits comfortably | 47.35 tok/s | 15GB (have 48GB) |
| Qwen/Qwen3-4B | Q8 | Fits comfortably | 80.54 tok/s | 4GB (have 48GB) |
| Qwen/Qwen3-4B | Q4 | Fits comfortably | 98.58 tok/s | 2GB (have 48GB) |
| Qwen/Qwen3-1.7B | Q8 | Fits comfortably | 56.60 tok/s | 7GB (have 48GB) |
| Qwen/Qwen3-1.7B | Q4 | Fits comfortably | 94.07 tok/s | 4GB (have 48GB) |
| openai-community/gpt2-large | Q8 | Fits comfortably | 66.58 tok/s | 7GB (have 48GB) |
| openai-community/gpt2-large | Q4 | Fits comfortably | 82.69 tok/s | 4GB (have 48GB) |
| microsoft/Phi-3-mini-4k-instruct | Q8 | Fits comfortably | 60.64 tok/s | 7GB (have 48GB) |
| microsoft/Phi-3-mini-4k-instruct | Q4 | Fits comfortably | 88.04 tok/s | 4GB (have 48GB) |
| allenai/OLMo-2-0425-1B | Q8 | Fits comfortably | 120.42 tok/sEstimated | 1GB (have 48GB) |
| allenai/OLMo-2-0425-1B | Q4 | Fits comfortably | 189.90 tok/sEstimated | 1GB (have 48GB) |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q8 | Not supported | — | 80GB (have 48GB) |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q4 | Fits comfortably | 30.00 tok/sEstimated | 40GB (have 48GB) |
| Qwen/Qwen3-32B | Q8 | Fits comfortably | 34.42 tok/sEstimated | 32GB (have 48GB) |
| Qwen/Qwen3-32B | Q4 | Fits comfortably | 44.83 tok/sEstimated | 16GB (have 48GB) |
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | Fits comfortably | 71.50 tok/sEstimated | 5GB (have 48GB) |
| Qwen/Qwen2.5-0.5B-Instruct | Q4 | Fits comfortably | 92.12 tok/sEstimated | 3GB (have 48GB) |
| Qwen/Qwen2.5-7B | Q8 | Fits comfortably | 61.84 tok/sEstimated | 7GB (have 48GB) |
| Qwen/Qwen2.5-7B | Q4 | Fits comfortably | 80.12 tok/sEstimated | 4GB (have 48GB) |
| meta-llama/Meta-Llama-3-8B | Q8 | Fits comfortably | 57.69 tok/sEstimated | 8GB (have 48GB) |
| meta-llama/Meta-Llama-3-8B | Q4 | Fits comfortably | 79.69 tok/sEstimated | 4GB (have 48GB) |
| meta-llama/Llama-3.2-1B | Q8 | Fits comfortably | 137.23 tok/sEstimated | 1GB (have 48GB) |
| meta-llama/Llama-3.2-1B | Q4 | Fits comfortably | 167.62 tok/sEstimated | 1GB (have 48GB) |
| petals-team/StableBeluga2 | Q8 | Fits comfortably | 57.97 tok/sEstimated | 7GB (have 48GB) |
| petals-team/StableBeluga2 | Q4 | Fits comfortably | 81.34 tok/sEstimated | 4GB (have 48GB) |
| vikhyatk/moondream2 | Q8 | Fits comfortably | 56.39 tok/sEstimated | 7GB (have 48GB) |
| vikhyatk/moondream2 | Q4 | Fits comfortably | 90.40 tok/sEstimated | 4GB (have 48GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 80.77 tok/sEstimated | 3GB (have 48GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | Fits comfortably | 131.00 tok/sEstimated | 2GB (have 48GB) |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q8 | Not supported | — | 70GB (have 48GB) |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q4 | Fits comfortably | 34.29 tok/sEstimated | 35GB (have 48GB) |
| distilbert/distilgpt2 | Q8 | Fits comfortably | 64.83 tok/sEstimated | 7GB (have 48GB) |
| distilbert/distilgpt2 | Q4 | Fits comfortably | 89.29 tok/sEstimated | 4GB (have 48GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q8 | Fits comfortably | 31.76 tok/sEstimated | 32GB (have 48GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q4 | Fits comfortably | 44.61 tok/sEstimated | 16GB (have 48GB) |
| inference-net/Schematron-3B | Q8 | Fits comfortably | 80.28 tok/sEstimated | 3GB (have 48GB) |
| inference-net/Schematron-3B | Q4 | Fits comfortably | 112.92 tok/sEstimated | 2GB (have 48GB) |
| Qwen/Qwen3-8B | Q8 | Fits comfortably | 62.86 tok/sEstimated | 8GB (have 48GB) |
| Qwen/Qwen3-8B | Q4 | Fits comfortably | 90.82 tok/sEstimated | 4GB (have 48GB) |
| mistralai/Mistral-7B-Instruct-v0.2 | Q8 | Fits comfortably | 65.17 tok/sEstimated | 7GB (have 48GB) |
| mistralai/Mistral-7B-Instruct-v0.2 | Q4 | Fits comfortably | 83.71 tok/sEstimated | 4GB (have 48GB) |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q8 | Fits comfortably | 80.92 tok/sEstimated | 3GB (have 48GB) |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | Fits comfortably | 112.02 tok/sEstimated | 2GB (have 48GB) |
| bigscience/bloomz-560m | Q8 | Fits comfortably | 59.09 tok/sEstimated | 7GB (have 48GB) |
| bigscience/bloomz-560m | Q4 | Fits comfortably | 82.15 tok/sEstimated | 4GB (have 48GB) |
| Qwen/Qwen2.5-3B-Instruct | Q8 | Fits comfortably | 83.86 tok/sEstimated | 3GB (have 48GB) |
| Qwen/Qwen2.5-3B-Instruct | Q4 | Fits comfortably | 122.40 tok/sEstimated | 2GB (have 48GB) |
| openai/gpt-oss-120b | Q8 | Not supported | — | 120GB (have 48GB) |
| openai/gpt-oss-120b | Q4 | Not supported | — | 60GB (have 48GB) |
| meta-llama/Llama-3.2-1B-Instruct | Q8 | Fits comfortably | 124.33 tok/sEstimated | 1GB (have 48GB) |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | Fits comfortably | 170.05 tok/sEstimated | 1GB (have 48GB) |
| Qwen/Qwen3-4B-Instruct-2507 | Q8 | Fits comfortably | 76.59 tok/sEstimated | 4GB (have 48GB) |
| Qwen/Qwen3-4B-Instruct-2507 | Q4 | Fits comfortably | 98.98 tok/sEstimated | 2GB (have 48GB) |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | Fits comfortably | 61.32 tok/sEstimated | 7GB (have 48GB) |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q4 | Fits comfortably | 88.88 tok/sEstimated | 4GB (have 48GB) |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | Fits comfortably | 133.24 tok/sEstimated | 1GB (have 48GB) |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | Fits comfortably | 176.80 tok/sEstimated | 1GB (have 48GB) |
| facebook/opt-125m | Q8 | Fits comfortably | 61.55 tok/sEstimated | 7GB (have 48GB) |
| facebook/opt-125m | Q4 | Fits comfortably | 81.34 tok/sEstimated | 4GB (have 48GB) |
| Qwen/Qwen2.5-1.5B-Instruct | Q8 | Fits comfortably | 72.50 tok/sEstimated | 5GB (have 48GB) |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | Fits comfortably | 96.19 tok/sEstimated | 3GB (have 48GB) |
| Qwen/Qwen3-Embedding-0.6B | Q8 | Fits comfortably | 66.47 tok/sEstimated | 6GB (have 48GB) |
| Qwen/Qwen3-Embedding-0.6B | Q4 | Fits comfortably | 98.75 tok/sEstimated | 3GB (have 48GB) |
| google/gemma-3-1b-it | Q8 | Fits comfortably | 115.42 tok/sEstimated | 1GB (have 48GB) |
| google/gemma-3-1b-it | Q4 | Fits comfortably | 186.76 tok/sEstimated | 1GB (have 48GB) |
| openai/gpt-oss-20b | Q8 | Fits comfortably | 45.36 tok/sEstimated | 20GB (have 48GB) |
| openai/gpt-oss-20b | Q4 | Fits comfortably | 57.06 tok/sEstimated | 10GB (have 48GB) |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q8 | Fits comfortably | 35.47 tok/sEstimated | 34GB (have 48GB) |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q4 | Fits comfortably | 43.39 tok/sEstimated | 17GB (have 48GB) |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 63.43 tok/sEstimated | 8GB (have 48GB) |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 83.67 tok/sEstimated | 4GB (have 48GB) |
| Gensyn/Qwen2.5-0.5B-Instruct | Q8 | Fits comfortably | 63.70 tok/sEstimated | 5GB (have 48GB) |
| Gensyn/Qwen2.5-0.5B-Instruct | Q4 | Fits comfortably | 98.10 tok/sEstimated | 3GB (have 48GB) |
| Qwen/Qwen3-0.6B | Q8 | Fits comfortably | 68.27 tok/sEstimated | 6GB (have 48GB) |
| Qwen/Qwen3-0.6B | Q4 | Fits comfortably | 91.08 tok/sEstimated | 3GB (have 48GB) |
| Qwen/Qwen2.5-7B-Instruct | Q8 | Fits comfortably | 57.54 tok/sEstimated | 7GB (have 48GB) |
| Qwen/Qwen2.5-7B-Instruct | Q4 | Fits comfortably | 87.28 tok/sEstimated | 4GB (have 48GB) |
| openai-community/gpt2 | Q8 | Fits comfortably | 66.02 tok/sEstimated | 7GB (have 48GB) |
| openai-community/gpt2 | Q4 | Fits comfortably | 86.16 tok/sEstimated | 4GB (have 48GB) |
Note: The performance figures above are calculated estimates, not measured results; real-world throughput will vary with runtime, context length, and batch size.
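If you want to sanity-check the fit column yourself, the numbers above are consistent with a simple weights-only sizing rule. Here is a minimal Python sketch of that rule, assuming Q4 ≈ 0.5 bytes and Q8 ≈ 1 byte per parameter; this is our reading of the table, not a published methodology:

```python
import math

# Weights-only sizing rule the table appears to follow:
# VRAM ~= parameter count x bytes per weight, rounded up.
# Q4 ~= 0.5 bytes/param, Q8 ~= 1 byte/param (assumed values).
BYTES_PER_WEIGHT = {"Q4": 0.5, "Q8": 1.0}

def weight_vram_gb(params_billions: float, quant: str) -> int:
    """Estimated VRAM needed for the weights alone, in whole GB."""
    return math.ceil(params_billions * BYTES_PER_WEIGHT[quant])

def verdict(params_billions: float, quant: str, card_gb: int = 48) -> str:
    need = weight_vram_gb(params_billions, quant)
    status = "Fits comfortably" if need <= card_gb else "Not supported"
    return f"{quant}: {need}GB (have {card_gb}GB) -> {status}"

print(verdict(70, "Q8"))  # 70GB (have 48GB) -> Not supported
print(verdict(70, "Q4"))  # 35GB (have 48GB) -> Fits comfortably
print(verdict(7, "Q4"))   # 4GB (have 48GB) -> Fits comfortably
```

Note that this rule ignores KV cache and runtime buffers, which is why a model that "fits" on paper can still run out of memory at long context.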
The reports below are drawn from community benchmarks, manufacturer specs, and live pricing.
LM Studio users fully offloading Qwen 3 30B Q4 with FlashAttention report about 33 tokens/sec at a 32K context window on the RTX 6000 Ada.
Source: Reddit – /r/LocalLLaMA (mpya1gb)
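That 32K-context figure is plausible because grouped-query attention keeps the KV cache small relative to the weights. A rough Python sketch, with layer and head counts assumed for a Qwen3-30B-A3B-class model (48 layers, 4 KV heads, head dim 128; illustrative values, not taken from the report):

```python
# KV cache size for a grouped-query-attention model: 2x (keys + values)
# per layer, per KV head, in FP16 by default. The layer/head values
# below are assumptions, not figures from the report above.
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                ctx_len: int, bytes_per_elem: int = 2) -> float:
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem / 1024**3

weights_gb = 15  # ~30B params at Q4, per the compatibility table
kv_gb = kv_cache_gb(n_layers=48, n_kv_heads=4, head_dim=128, ctx_len=32_768)
print(f"weights ~{weights_gb}GB + KV ~{kv_gb:.1f}GB = {weights_gb + kv_gb:.1f}GB of 48GB")
# -> weights ~15GB + KV ~3.0GB = 18.0GB of 48GB
```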
Professionals cite turnkey RTX 6000 Ada boxes at roughly $6,000, arguing they are already fast and private enough to replace API workflows for many coding teams.
Source: Reddit – /r/LocalLLaMA (mr6x6wu)
One ProLiant DL380 Gen10 setup pairs a single RTX 6000 Ada with three RTX 4090s, virtualized under Proxmox to expose 120 GB of total VRAM for AI workloads.
Source: Reddit – /r/LocalLLaMA (mqubm2s)
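A pool like that is usually carved up in proportion to each card's VRAM. The sketch below computes those ratios; llama.cpp exposes them directly through its --tensor-split option, though the exact knob varies by backend:

```python
# Proportional split of one model across mismatched cards: each GPU
# holds a share of the layers equal to its share of the pooled VRAM.
gpus = {"RTX 6000 Ada": 48, "RTX 4090 #1": 24, "RTX 4090 #2": 24, "RTX 4090 #3": 24}

total_gb = sum(gpus.values())  # 120 GB pooled, as in the setup above
for name, gb in gpus.items():
    print(f"{name}: {gb / total_gb:.2f} of the model ({gb}GB of {total_gb}GB)")
# RTX 6000 Ada carries 0.40 of the layers; each 4090 carries 0.20
```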
Some buyers note the RTX 6000 Ada’s ~$7k price rivals the cost of three RTX 5090 cards, so the workstation route only makes sense when ECC VRAM and professional drivers are required.
Source: Reddit – /r/LocalLLaMA (mqsk1ah)
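The arithmetic behind that trade-off is easy to check. In this sketch the RTX 6000 Ada price matches the Newegg listing cited below; the RTX 5090 price and its 32 GB VRAM figure are assumptions for illustration:

```python
# Dollars per GB of VRAM, pro card vs. a hypothetical 3x RTX 5090 build.
# The 5090 figures here are assumed, not quoted from the thread above.
ada_price, ada_vram = 6_999, 48
gamer_price, gamer_vram = 2_300, 32   # assumed per-card RTX 5090 figures

print(f"RTX 6000 Ada: ${ada_price / ada_vram:.0f}/GB of VRAM")        # ~$146/GB
print(f"3x RTX 5090:  ${3 * gamer_price / (3 * gamer_vram):.0f}/GB")  # ~$72/GB
# The pro card costs roughly twice as much per GB; the trade is ECC,
# pro drivers, and a single 300W slot instead of three hot gaming cards.
```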
The RTX 6000 Ada ships with 48 GB of ECC GDDR6, a 300 W TDP, and a PCIe 4.0 x16 interface. As of 3 Nov 2025 it was listed at $6,999 (Newegg), $7,199 (Amazon), and $7,299 (Best Buy, out of stock).
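That 300 W TDP also sets a ceiling on energy cost per token. A quick worked example using the estimated Qwen3-32B Q4 throughput from the table above, assuming the card sustains full TDP during generation (real draw is usually lower):

```python
# Energy per generated token at full TDP, using the table's estimated
# Qwen3-32B Q4 throughput. Assumes a sustained 300W; real draw varies.
tdp_watts = 300
tok_per_sec = 44.83  # Qwen/Qwen3-32B Q4, estimated (table above)

joules_per_token = tdp_watts / tok_per_sec
kwh_per_million_tokens = joules_per_token * 1_000_000 / 3.6e6  # 1 kWh = 3.6 MJ

print(f"{joules_per_token:.1f} J/token, ~{kwh_per_million_tokens:.2f} kWh per 1M tokens")
# -> 6.7 J/token, ~1.86 kWh per 1M tokens
```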
Related comparisons: explore how the RTX 4060 Ti 16GB, the RX 6800 XT, and the RTX 3080 stack up for local inference workloads.