Quick Answer: The NVIDIA A6000 offers 48GB of VRAM, typically draws 300W under load, and starts around $11.79. It delivers an estimated 180 tokens/sec on deepseek-ai/DeepSeek-OCR at Q4 quantization.
This GPU offers reliable throughput for local AI workloads. Pair it with the right model quantization to hit your target tokens/sec, and monitor the prices below to catch the best deal.
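The VRAM column in the table below roughly tracks parameter count times bytes per weight, plus runtime overhead for the KV cache and activations. Here is a minimal sketch of that estimate in Python; the 1.2 overhead factor is an illustrative assumption, not a measured constant:

```python
# Rough VRAM estimate for an LLM at a given quantization level.
# The 1.2 overhead factor (KV cache, activations) is an assumption;
# real usage varies with context length and inference runtime.
def estimate_vram_gb(params_billions: float, bits_per_weight: float,
                     overhead: float = 1.2) -> float:
    """VRAM (GB) ~= parameters * bytes-per-weight * overhead."""
    return params_billions * (bits_per_weight / 8) * overhead

# Q4 ~= 4 bits, Q8 ~= 8 bits, FP16 = 16 bits per weight.
print(f"8B  @ Q4:   {estimate_vram_gb(8, 4):.1f} GB")   # ~4.8 GB, trivial on a 48GB card
print(f"8B  @ FP16: {estimate_vram_gb(8, 16):.1f} GB")  # ~19 GB
print(f"70B @ Q4:   {estimate_vram_gb(70, 4):.1f} GB")  # ~42 GB, tight but fits in 48GB
```

By this arithmetic, a 48GB card like the A6000 comfortably runs 30B-class models at Q8 and 70B-class models at Q4, which is consistent with the VRAM column below.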
Buy directly on Amazon with fast shipping and reliable customer service.
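Since every throughput figure in the table below is an auto-generated estimate, it is worth sanity-checking the numbers on your own card. A minimal timing loop using Hugging Face transformers is sketched here; the model ID is illustrative, so substitute any entry from the table that fits your VRAM:

```python
# Minimal tokens/sec measurement, assuming a CUDA GPU with the
# torch and transformers packages installed. Model choice is illustrative.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-1.5B-Instruct"  # any model from the table
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="cuda"
)

inputs = tok("The quick brown fox", return_tensors="pt").to("cuda")
torch.cuda.synchronize()
start = time.perf_counter()
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

new_tokens = out.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/sec")
```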
All tokens/sec figures below are auto-generated estimates, not measured benchmarks.

| Model | Quantization | Tokens/sec (estimated) | VRAM used |
|---|---|---|---|
| deepseek-ai/DeepSeek-OCR | Q4 | 180.00 tok/s | 2GB |
| bigcode/starcoder2-3b | Q4 | 174.39 tok/s | 2GB |
| google/gemma-2b | Q4 | 173.21 tok/s | 1GB |
| Qwen/Qwen2.5-3B | Q4 | 173.02 tok/s | 2GB |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | 172.90 tok/s | 1GB |
| unsloth/Llama-3.2-1B-Instruct | Q4 | 172.80 tok/s | 1GB |
| meta-llama/Llama-Guard-3-1B | Q4 | 171.65 tok/s | 1GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | 171.24 tok/s | 1GB |
| meta-llama/Llama-3.2-1B | Q4 | 170.32 tok/s | 1GB |
| google/gemma-2-2b-it | Q4 | 169.81 tok/s | 1GB |
| Qwen/Qwen2.5-3B-Instruct | Q4 | 168.29 tok/s | 2GB |
| google-t5/t5-3b | Q4 | 166.68 tok/s | 2GB |
| inference-net/Schematron-3B | Q4 | 162.46 tok/s | 2GB |
| google-bert/bert-base-uncased | Q4 | 160.52 tok/s | 1GB |
| ibm-research/PowerMoE-3b | Q4 | 159.60 tok/s | 2GB |
| allenai/OLMo-2-0425-1B | Q4 | 159.30 tok/s | 1GB |
| unsloth/gemma-3-1b-it | Q4 | 159.20 tok/s | 1GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | 158.66 tok/s | 2GB |
| google/gemma-3-1b-it | Q4 | 156.13 tok/s | 1GB |
| unsloth/Llama-3.2-3B-Instruct | Q4 | 156.03 tok/s | 2GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | 156.01 tok/s | 2GB |
| nari-labs/Dia2-2B | Q4 | 154.76 tok/s | 2GB |
| tencent/HunyuanOCR | Q4 | 154.23 tok/s | 1GB |
| ibm-granite/granite-3.3-2b-instruct | Q4 | 154.20 tok/s | 1GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 153.90 tok/s | 2GB |
| meta-llama/Llama-3.2-3B | Q4 | 153.79 tok/s | 2GB |
| google/embeddinggemma-300m | Q4 | 152.92 tok/s | 1GB |
| WeiboAI/VibeThinker-1.5B | Q4 | 151.82 tok/s | 1GB |
| LiquidAI/LFM2-1.2B | Q4 | 148.12 tok/s | 1GB |
| HuggingFaceTB/SmolLM2-135M | Q4 | 145.54 tok/s | 4GB |
| openai-community/gpt2-large | Q4 | 145.44 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q4 | 145.40 tok/s | 3GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q4 | 145.37 tok/s | 4GB |
| microsoft/Phi-3-mini-4k-instruct | Q4 | 145.08 tok/s | 4GB |
| microsoft/DialoGPT-small | Q4 | 144.70 tok/s | 4GB |
| sshleifer/tiny-gpt2 | Q4 | 144.62 tok/s | 4GB |
| microsoft/Phi-3-mini-128k-instruct | Q4 | 144.07 tok/s | 4GB |
| GSAI-ML/LLaDA-8B-Instruct | Q4 | 143.99 tok/s | 4GB |
| skt/kogpt2-base-v2 | Q4 | 143.75 tok/s | 4GB |
| HuggingFaceH4/zephyr-7b-beta | Q4 | 143.59 tok/s | 4GB |
| apple/OpenELM-1_1B-Instruct | Q4 | 143.54 tok/s | 1GB |
| Qwen/Qwen3-0.6B | Q4 | 143.48 tok/s | 3GB |
| facebook/sam3 | Q4 | 143.34 tok/s | 1GB |
| rinna/japanese-gpt-neox-small | Q4 | 143.23 tok/s | 4GB |
| rednote-hilab/dots.ocr | Q4 | 143.01 tok/s | 4GB |
| meta-llama/Llama-3.1-8B | Q4 | 142.69 tok/s | 4GB |
| microsoft/phi-2 | Q4 | 142.68 tok/s | 4GB |
| meta-llama/Llama-2-7b-chat-hf | Q4 | 142.66 tok/s | 4GB |
| meta-llama/Meta-Llama-3-8B | Q4 | 142.45 tok/s | 4GB |
| meta-llama/Llama-2-7b-hf | Q4 | 142.03 tok/s | 4GB |
| Qwen/Qwen3-0.6B-Base | Q4 | 142.00 tok/s | 3GB |
| GSAI-ML/LLaDA-8B-Base | Q4 | 141.61 tok/s | 4GB |
| microsoft/VibeVoice-1.5B | Q4 | 141.50 tok/s | 3GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 141.29 tok/s | 2GB |
| mistralai/Mistral-7B-v0.1 | Q4 | 141.07 tok/s | 4GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | 141.02 tok/s | 3GB |
| microsoft/Phi-3.5-vision-instruct | Q4 | 140.96 tok/s | 4GB |
| zai-org/GLM-4.5-Air | Q4 | 140.96 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q4 | 140.49 tok/s | 2GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q4 | 140.48 tok/s | 4GB |
| zai-org/GLM-4.6-FP8 | Q4 | 140.47 tok/s | 4GB |
| Qwen/Qwen3-1.7B-Base | Q4 | 140.41 tok/s | 4GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q4 | 139.79 tok/s | 2GB |
| Qwen/Qwen2-7B-Instruct | Q4 | 139.32 tok/s | 4GB |
| black-forest-labs/FLUX.2-dev | Q4 | 139.09 tok/s | 4GB |
| hmellor/tiny-random-LlamaForCausalLM | Q4 | 138.55 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3 | Q4 | 138.50 tok/s | 4GB |
| openai-community/gpt2-medium | Q4 | 138.46 tok/s | 4GB |
| EleutherAI/pythia-70m-deduped | Q4 | 138.32 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | 138.06 tok/s | 4GB |
| HuggingFaceTB/SmolLM-135M | Q4 | 137.24 tok/s | 4GB |
| ibm-granite/granite-3.3-8b-instruct | Q4 | 137.04 tok/s | 4GB |
| allenai/Olmo-3-7B-Think | Q4 | 136.69 tok/s | 4GB |
| Tongyi-MAI/Z-Image-Turbo | Q4 | 136.61 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-0528 | Q4 | 136.55 tok/s | 4GB |
| huggyllama/llama-7b | Q4 | 136.53 tok/s | 4GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | 136.51 tok/s | 2GB |
| EleutherAI/gpt-neo-125m | Q4 | 136.38 tok/s | 4GB |
| openai-community/gpt2 | Q4 | 136.32 tok/s | 4GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | 136.19 tok/s | 4GB |
| microsoft/Phi-4-mini-instruct | Q4 | 135.72 tok/s | 4GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 135.39 tok/s | 4GB |
| microsoft/DialoGPT-medium | Q4 | 135.04 tok/s | 4GB |
| dicta-il/dictalm2.0-instruct | Q4 | 134.81 tok/s | 4GB |
| lmsys/vicuna-7b-v1.5 | Q4 | 134.39 tok/s | 4GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | 134.20 tok/s | 2GB |
| IlyaGusev/saiga_llama3_8b | Q4 | 134.15 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3.1 | Q4 | 134.08 tok/s | 4GB |
| google/gemma-3-270m-it | Q4 | 134.06 tok/s | 4GB |
| Qwen/Qwen3-1.7B | Q4 | 134.03 tok/s | 4GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q4 | 133.62 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q4 | 133.40 tok/s | 2GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | 133.26 tok/s | 4GB |
| Qwen/Qwen3-4B | Q4 | 133.02 tok/s | 2GB |
| ibm-granite/granite-docling-258M | Q4 | 132.77 tok/s | 4GB |
| distilbert/distilgpt2 | Q4 | 132.19 tok/s | 4GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q4 | 132.10 tok/s | 4GB |
| MiniMaxAI/MiniMax-M2 | Q4 | 131.76 tok/s | 4GB |
| Qwen/Qwen-Image-Edit-2509 | Q4 | 131.64 tok/s | 4GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q4 | 131.37 tok/s | 3GB |
| tencent/HunyuanVideo-1.5 | Q4 | 131.22 tok/s | 4GB |
| openai-community/gpt2-xl | Q4 | 131.00 tok/s | 4GB |
| Qwen/Qwen2-0.5B-Instruct | Q4 | 130.87 tok/s | 3GB |
| bigscience/bloomz-560m | Q4 | 130.79 tok/s | 4GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q4 | 130.62 tok/s | 4GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q4 | 129.87 tok/s | 4GB |
| Qwen/Qwen3-8B-FP8 | Q4 | 129.76 tok/s | 4GB |
| Qwen/Qwen2.5-7B | Q4 | 129.70 tok/s | 4GB |
| llamafactory/tiny-random-Llama-3 | Q4 | 129.61 tok/s | 4GB |
| petals-team/StableBeluga2 | Q4 | 129.37 tok/s | 4GB |
| Qwen/Qwen2.5-0.5B | Q4 | 129.17 tok/s | 3GB |
| Qwen/Qwen2-0.5B | Q4 | 128.87 tok/s | 3GB |
| BSC-LT/salamandraTA-7b-instruct | Q4 | 128.19 tok/s | 4GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | 127.85 tok/s | 4GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q4 | 127.77 tok/s | 4GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q4 | 127.76 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q4 | 127.19 tok/s | 2GB |
| liuhaotian/llava-v1.5-7b | Q4 | 126.87 tok/s | 4GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 126.73 tok/s | 4GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | 126.64 tok/s | 3GB |
| Qwen/Qwen3-4B-Base | Q4 | 125.92 tok/s | 2GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q4 | 125.92 tok/s | 4GB |
| Qwen/Qwen2.5-Math-1.5B | Q4 | 125.49 tok/s | 3GB |
| deepseek-ai/DeepSeek-R1 | Q4 | 125.32 tok/s | 4GB |
| Qwen/Qwen2-1.5B-Instruct | Q4 | 125.22 tok/s | 3GB |
| Qwen/Qwen3-Embedding-8B | Q4 | 125.20 tok/s | 4GB |
| Qwen/Qwen3-Reranker-0.6B | Q4 | 124.96 tok/s | 3GB |
| parler-tts/parler-tts-large-v1 | Q4 | 124.96 tok/s | 4GB |
| Qwen/Qwen3-8B | Q4 | 124.95 tok/s | 4GB |
| numind/NuExtract-1.5 | Q4 | 124.76 tok/s | 4GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q4 | 124.18 tok/s | 4GB |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | 123.95 tok/s | 3GB |
| deepseek-ai/DeepSeek-V3-0324 | Q4 | 123.64 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q4 | 123.58 tok/s | 4GB |
| Qwen/Qwen3-8B-Base | Q4 | 123.09 tok/s | 4GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q4 | 122.02 tok/s | 4GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q4 | 121.93 tok/s | 2GB |
| facebook/opt-125m | Q4 | 121.82 tok/s | 4GB |
| vikhyatk/moondream2 | Q4 | 121.62 tok/s | 4GB |
| microsoft/phi-4 | Q4 | 121.48 tok/s | 4GB |
| Qwen/Qwen2.5-1.5B | Q4 | 121.41 tok/s | 3GB |
| meta-llama/Llama-Guard-3-8B | Q4 | 121.41 tok/s | 4GB |
| microsoft/Phi-4-multimodal-instruct | Q4 | 121.31 tok/s | 4GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q4 | 120.97 tok/s | 4GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q4 | 120.93 tok/s | 3GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | 120.91 tok/s | 4GB |
| meta-llama/Llama-Guard-3-1B | Q8 | 120.82 tok/s | 1GB |
| Qwen/Qwen3-Embedding-4B | Q4 | 120.23 tok/s | 2GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q4 | 120.20 tok/s | 4GB |
| Qwen/Qwen3-Embedding-0.6B | Q4 | 119.86 tok/s | 3GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q4 | 119.72 tok/s | 2GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q4 | 119.51 tok/s | 4GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | 119.43 tok/s | 1GB |
| black-forest-labs/FLUX.1-dev | Q4 | 119.38 tok/s | 4GB |
| google/gemma-2-2b-it | Q8 | 118.43 tok/s | 2GB |
| google/embeddinggemma-300m | Q8 | 118.30 tok/s | 1GB |
| LiquidAI/LFM2-1.2B | Q8 | 116.40 tok/s | 2GB |
| google-bert/bert-base-uncased | Q8 | 116.22 tok/s | 1GB |
| google/gemma-3-1b-it | Q8 | 115.78 tok/s | 1GB |
| Qwen/Qwen2.5-3B-Instruct | Q8 | 115.50 tok/s | 3GB |
| nari-labs/Dia2-2B | Q8 | 115.47 tok/s | 3GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | 115.06 tok/s | 3GB |
| unsloth/gemma-3-1b-it | Q8 | 114.08 tok/s | 1GB |
| unsloth/Llama-3.2-1B-Instruct | Q8 | 113.24 tok/s | 1GB |
| inference-net/Schematron-3B | Q8 | 112.47 tok/s | 3GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | 111.87 tok/s | 3GB |
| meta-llama/Llama-3.2-3B | Q8 | 111.25 tok/s | 3GB |
| WeiboAI/VibeThinker-1.5B | Q8 | 110.97 tok/s | 2GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q8 | 110.61 tok/s | 3GB |
| meta-llama/Llama-3.2-1B | Q8 | 107.95 tok/s | 1GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | 107.31 tok/s | 8GB |
| ibm-research/PowerMoE-3b | Q8 | 107.22 tok/s | 3GB |
| EssentialAI/rnj-1 | Q4 | 105.99 tok/s | 5GB |
| deepseek-ai/DeepSeek-OCR | Q8 | 105.51 tok/s | 4GB |
| tencent/HunyuanOCR | Q8 | 105.33 tok/s | 2GB |
| ibm-granite/granite-3.3-2b-instruct | Q8 | 105.26 tok/s | 2GB |
| unsloth/Llama-3.2-3B-Instruct | Q8 | 104.79 tok/s | 3GB |
| google-t5/t5-3b | Q8 | 104.19 tok/s | 3GB |
| facebook/sam3 | Q8 | 103.85 tok/s | 1GB |
| google/gemma-2b | Q8 | 103.24 tok/s | 2GB |
| meta-llama/Llama-3.2-1B-Instruct | Q8 | 102.94 tok/s | 1GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q4 | 102.78 tok/s | 5GB |
| apple/OpenELM-1_1B-Instruct | Q8 | 102.68 tok/s | 1GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | 102.61 tok/s | 7GB |
| bigcode/starcoder2-3b | Q8 | 102.48 tok/s | 3GB |
| Qwen/Qwen2.5-3B | Q8 | 102.41 tok/s | 3GB |
| allenai/OLMo-2-0425-1B | Q8 | 102.26 tok/s | 1GB |
| meta-llama/Llama-Guard-3-8B | Q8 | 101.83 tok/s | 9GB |
| Qwen/Qwen3-14B | Q4 | 101.77 tok/s | 7GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q8 | 101.62 tok/s | 7GB |
| Qwen/Qwen2.5-0.5B | Q8 | 101.47 tok/s | 5GB |
| liuhaotian/llava-v1.5-7b | Q8 | 101.38 tok/s | 7GB |
| Qwen/Qwen3-Reranker-0.6B | Q8 | 100.66 tok/s | 6GB |
| black-forest-labs/FLUX.2-dev | Q8 | 100.47 tok/s | 8GB |
| Qwen/Qwen3-8B-FP8 | Q8 | 100.43 tok/s | 9GB |
| NousResearch/Hermes-3-Llama-3.1-8B | Q4 | 100.21 tok/s | 4GB |
| google/gemma-3-270m-it | Q8 | 100.17 tok/s | 7GB |
| openai-community/gpt2-xl | Q8 | 100.14 tok/s | 7GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q8 | 100.07 tok/s | 4GB |
| GSAI-ML/LLaDA-8B-Base | Q8 | 99.99 tok/s | 9GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q8 | 99.80 tok/s | 5GB |
| Tongyi-MAI/Z-Image-Turbo | Q8 | 99.69 tok/s | 8GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q8 | 99.56 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q8 | 99.24 tok/s | 4GB |
| Qwen/Qwen3-0.6B | Q8 | 99.15 tok/s | 6GB |
| vikhyatk/moondream2 | Q8 | 99.15 tok/s | 7GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | 98.96 tok/s | 4GB |
| BSC-LT/salamandraTA-7b-instruct | Q8 | 98.81 tok/s | 7GB |
| microsoft/Phi-3-medium-128k-instruct | Q4 | 98.74 tok/s | 7GB |
| microsoft/Phi-3.5-vision-instruct | Q8 | 98.06 tok/s | 7GB |
| zai-org/GLM-4.5-Air | Q8 | 97.98 tok/s | 7GB |
| openai-community/gpt2 | Q8 | 97.94 tok/s | 7GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q8 | 97.86 tok/s | 7GB |
| EleutherAI/gpt-neo-125m | Q8 | 97.80 tok/s | 7GB |
| rednote-hilab/dots.ocr | Q8 | 97.44 tok/s | 7GB |
| HuggingFaceH4/zephyr-7b-beta | Q8 | 97.36 tok/s | 7GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q8 | 97.34 tok/s | 9GB |
| Qwen/Qwen3-14B-Base | Q4 | 97.20 tok/s | 7GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q8 | 97.05 tok/s | 9GB |
| google/gemma-2-9b-it | Q4 | 96.79 tok/s | 5GB |
| Qwen/Qwen2.5-1.5B | Q8 | 96.63 tok/s | 5GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q8 | 96.53 tok/s | 7GB |
| microsoft/Phi-4-mini-instruct | Q8 | 96.36 tok/s | 7GB |
| Qwen/Qwen3-8B | Q8 | 96.25 tok/s | 9GB |
| deepseek-ai/DeepSeek-R1 | Q8 | 96.12 tok/s | 7GB |
| tencent/HunyuanVideo-1.5 | Q8 | 95.99 tok/s | 8GB |
| ibm-granite/granite-docling-258M | Q8 | 95.73 tok/s | 7GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | 95.71 tok/s | 5GB |
| microsoft/DialoGPT-medium | Q8 | 95.64 tok/s | 7GB |
| GSAI-ML/LLaDA-8B-Instruct | Q8 | 95.54 tok/s | 9GB |
| parler-tts/parler-tts-large-v1 | Q8 | 95.54 tok/s | 7GB |
| Qwen/Qwen2-0.5B-Instruct | Q8 | 95.51 tok/s | 5GB |
| Qwen/Qwen3-1.7B-Base | Q8 | 95.49 tok/s | 7GB |
| Qwen/Qwen2.5-14B | Q4 | 95.43 tok/s | 7GB |
| mistralai/Mistral-7B-v0.1 | Q8 | 95.12 tok/s | 7GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | 95.06 tok/s | 9GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | 94.91 tok/s | 7GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q8 | 94.42 tok/s | 9GB |
| ai-forever/ruGPT-3.5-13B | Q4 | 94.29 tok/s | 7GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q8 | 94.17 tok/s | 5GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | 94.13 tok/s | 9GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q8 | 93.89 tok/s | 5GB |
| OpenPipe/Qwen3-14B-Instruct | Q4 | 93.69 tok/s | 7GB |
| mistralai/Ministral-3-14B-Instruct-2512 | Q4 | 93.54 tok/s | 8GB |
| hmellor/tiny-random-LlamaForCausalLM | Q8 | 93.41 tok/s | 7GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | 93.31 tok/s | 7GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | 93.20 tok/s | 4GB |
| allenai/Olmo-3-7B-Think | Q8 | 93.12 tok/s | 8GB |
| numind/NuExtract-1.5 | Q8 | 92.92 tok/s | 7GB |
| openai-community/gpt2-medium | Q8 | 92.55 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | 92.31 tok/s | 7GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q8 | 92.21 tok/s | 7GB |
| distilbert/distilgpt2 | Q8 | 92.14 tok/s | 7GB |
| Qwen/Qwen3-Embedding-0.6B | Q8 | 92.10 tok/s | 6GB |
| skt/kogpt2-base-v2 | Q8 | 92.08 tok/s | 7GB |
| microsoft/VibeVoice-1.5B | Q8 | 91.95 tok/s | 5GB |
| llamafactory/tiny-random-Llama-3 | Q8 | 91.83 tok/s | 7GB |
| microsoft/Phi-3-mini-128k-instruct | Q8 | 91.83 tok/s | 7GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q8 | 91.80 tok/s | 9GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q8 | 91.69 tok/s | 7GB |
| Qwen/Qwen2.5-Math-1.5B | Q8 | 91.56 tok/s | 5GB |
| Qwen/Qwen3-Embedding-4B | Q8 | 91.48 tok/s | 4GB |
| Qwen/Qwen3-0.6B-Base | Q8 | 91.46 tok/s | 6GB |
| Qwen/Qwen2.5-7B | Q8 | 91.43 tok/s | 7GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | 91.25 tok/s | 7GB |
| microsoft/phi-4 | Q8 | 91.17 tok/s | 7GB |
| meta-llama/Llama-2-13b-chat-hf | Q4 | 91.14 tok/s | 7GB |
| sshleifer/tiny-gpt2 | Q8 | 91.07 tok/s | 7GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | 91.04 tok/s | 4GB |
| microsoft/Phi-4-multimodal-instruct | Q8 | 91.04 tok/s | 7GB |
| HuggingFaceTB/SmolLM-135M | Q8 | 90.51 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-0528 | Q8 | 90.43 tok/s | 7GB |
| huggyllama/llama-7b | Q8 | 90.35 tok/s | 7GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q8 | 90.16 tok/s | 4GB |
| zai-org/GLM-4.6-FP8 | Q8 | 90.11 tok/s | 7GB |
| meta-llama/Llama-2-7b-hf | Q8 | 90.07 tok/s | 7GB |
| rinna/japanese-gpt-neox-small | Q8 | 89.97 tok/s | 7GB |
| Qwen/Qwen2-1.5B-Instruct | Q8 | 89.95 tok/s | 5GB |
| HuggingFaceTB/SmolLM2-135M | Q8 | 89.71 tok/s | 7GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q8 | 89.41 tok/s | 5GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q8 | 89.32 tok/s | 7GB |
| microsoft/phi-2 | Q8 | 89.06 tok/s | 7GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q8 | 88.62 tok/s | 7GB |
| meta-llama/Llama-3.1-8B | Q8 | 88.31 tok/s | 9GB |
| Qwen/Qwen3-8B-Base | Q8 | 88.28 tok/s | 9GB |
| deepseek-ai/DeepSeek-V3.1 | Q8 | 88.17 tok/s | 7GB |
| openai-community/gpt2-large | Q8 | 88.04 tok/s | 7GB |
| EleutherAI/pythia-70m-deduped | Q8 | 87.94 tok/s | 7GB |
| petals-team/StableBeluga2 | Q8 | 87.88 tok/s | 7GB |
| Qwen/Qwen3-Embedding-8B | Q8 | 87.77 tok/s | 9GB |
| deepseek-ai/DeepSeek-V3 | Q8 | 87.77 tok/s | 7GB |
| Qwen/Qwen2.5-Coder-1.5B | Q8 | 87.45 tok/s | 5GB |
| facebook/opt-125m | Q8 | 87.29 tok/s | 7GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | 87.18 tok/s | 3GB |
| black-forest-labs/FLUX.1-dev | Q8 | 86.56 tok/s | 8GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | 86.46 tok/s | 8GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q8 | 86.34 tok/s | 9GB |
| deepseek-ai/DeepSeek-V3-0324 | Q8 | 86.30 tok/s | 7GB |
| ibm-granite/granite-3.3-8b-instruct | Q8 | 86.27 tok/s | 9GB |
| Qwen/Qwen3-4B | Q8 | 86.21 tok/s | 4GB |
| meta-llama/Llama-2-7b-chat-hf | Q8 | 86.07 tok/s | 7GB |
| lmsys/vicuna-7b-v1.5 | Q8 | 85.81 tok/s | 7GB |
| bigscience/bloomz-560m | Q8 | 85.62 tok/s | 7GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q8 | 85.32 tok/s | 7GB |
| Qwen/Qwen3-4B-Base | Q8 | 85.15 tok/s | 4GB |
| Qwen/Qwen2-0.5B | Q8 | 84.62 tok/s | 5GB |
| microsoft/Phi-3-mini-4k-instruct | Q8 | 84.41 tok/s | 7GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q8 | 84.40 tok/s | 4GB |
| dicta-il/dictalm2.0-instruct | Q8 | 84.31 tok/s | 7GB |
| IlyaGusev/saiga_llama3_8b | Q8 | 84.23 tok/s | 9GB |
| Qwen/Qwen3-1.7B | Q8 | 84.23 tok/s | 7GB |
| MiniMaxAI/MiniMax-M2 | Q8 | 84.20 tok/s | 7GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q8 | 84.10 tok/s | 4GB |
| Qwen/Qwen-Image-Edit-2509 | Q8 | 84.07 tok/s | 8GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q8 | 84.02 tok/s | 9GB |
| Qwen/Qwen2-7B-Instruct | Q8 | 83.92 tok/s | 7GB |
| microsoft/DialoGPT-small | Q8 | 83.67 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | 83.51 tok/s | 9GB |
| meta-llama/Meta-Llama-3-8B | Q8 | 83.49 tok/s | 9GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q4 | 78.88 tok/s | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q4 | 78.82 tok/s | 15GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q4 | 78.55 tok/s | 10GB |
| openai/gpt-oss-safeguard-20b | Q4 | 77.90 tok/s | 11GB |
| google/gemma-2-27b-it | Q4 | 77.39 tok/s | 14GB |
| openai/gpt-oss-20b | Q4 | 76.94 tok/s | 10GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q4 | 76.79 tok/s | 15GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q4 | 76.13 tok/s | 10GB |
| Qwen/Qwen3-30B-A3B | Q4 | 75.43 tok/s | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q4 | 75.33 tok/s | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q4 | 74.68 tok/s | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q4 | 74.62 tok/s | 15GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q8 | 73.78 tok/s | 10GB |
| Qwen/Qwen3-14B-Base | Q8 | 73.37 tok/s | 14GB |
| google/gemma-2-9b-it | Q8 | 73.26 tok/s | 10GB |
| microsoft/Phi-3-medium-128k-instruct | Q8 | 72.77 tok/s | 14GB |
| NousResearch/Hermes-3-Llama-3.1-8B | Q8 | 72.73 tok/s | 9GB |
| mistralai/Mistral-Small-Instruct-2409 | Q4 | 72.06 tok/s | 11GB |
| Qwen/Qwen2.5-14B-Instruct | Q8 | 71.19 tok/s | 15GB |
| meta-llama/Llama-2-13b-chat-hf | Q8 | 70.65 tok/s | 13GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q4 | 70.64 tok/s | 15GB |
| ai-forever/ruGPT-3.5-13B | Q8 | 70.59 tok/s | 13GB |
| OpenPipe/Qwen3-14B-Instruct | Q8 | 70.03 tok/s | 14GB |
| Qwen/Qwen2.5-14B-Instruct | Q8 | 69.08 tok/s | 14GB |
| mistralai/Ministral-3-14B-Instruct-2512 | Q8 | 67.40 tok/s | 16GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q4 | 67.17 tok/s | 15GB |
| unsloth/Llama-3.2-1B-Instruct | FP16 | 65.89 tok/s | 2GB |
| unsloth/gpt-oss-20b-BF16 | Q4 | 65.76 tok/s | 10GB |
| unsloth/gemma-3-1b-it | FP16 | 65.40 tok/s | 2GB |
| EssentialAI/rnj-1 | Q8 | 65.30 tok/s | 10GB |
| allenai/OLMo-2-0425-1B | FP16 | 65.23 tok/s | 2GB |
| google-t5/t5-3b | FP16 | 65.23 tok/s | 6GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | 65.16 tok/s | 9GB |
| Qwen/Qwen2.5-14B | Q8 | 65.04 tok/s | 14GB |
| Qwen/Qwen2.5-3B | FP16 | 64.84 tok/s | 6GB |
| Qwen/Qwen3-14B | Q8 | 64.83 tok/s | 14GB |
| meta-llama/Llama-3.2-3B-Instruct | FP16 | 64.75 tok/s | 6GB |
| Qwen/Qwen2.5-3B-Instruct | FP16 | 64.36 tok/s | 6GB |
| bigcode/starcoder2-3b | FP16 | 63.87 tok/s | 6GB |
| google-bert/bert-base-uncased | FP16 | 63.30 tok/s | 1GB |
| meta-llama/Llama-3.2-3B | FP16 | 62.10 tok/s | 6GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | FP16 | 60.79 tok/s | 6GB |
| nari-labs/Dia2-2B | FP16 | 60.73 tok/s | 5GB |
| ibm-granite/granite-3.3-2b-instruct | FP16 | 60.18 tok/s | 4GB |
| tencent/HunyuanOCR | FP16 | 59.68 tok/s | 3GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | FP16 | 58.97 tok/s | 2GB |
| deepseek-ai/DeepSeek-OCR | FP16 | 58.78 tok/s | 7GB |
| google/gemma-2-2b-it | FP16 | 58.69 tok/s | 4GB |
| meta-llama/Llama-Guard-3-1B | FP16 | 58.40 tok/s | 2GB |
| apple/OpenELM-1_1B-Instruct | FP16 | 57.72 tok/s | 2GB |
| LiquidAI/LFM2-1.2B | FP16 | 57.61 tok/s | 4GB |
| unsloth/Llama-3.2-3B-Instruct | FP16 | 57.41 tok/s | 6GB |
| ibm-research/PowerMoE-3b | FP16 | 57.05 tok/s | 6GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | FP16 | 56.64 tok/s | 6GB |
| facebook/sam3 | FP16 | 56.50 tok/s | 2GB |
| meta-llama/Llama-3.2-1B-Instruct | FP16 | 56.21 tok/s | 2GB |
| inference-net/Schematron-3B | FP16 | 56.06 tok/s | 6GB |
| google/gemma-3-1b-it | FP16 | 55.91 tok/s | 2GB |
| meta-llama/Llama-3.2-1B | FP16 | 55.87 tok/s | 2GB |
| google/gemma-2-27b-it | Q8 | 55.49 tok/s | 28GB |
| WeiboAI/VibeThinker-1.5B | FP16 | 55.39 tok/s | 4GB |
| google/embeddinggemma-300m | FP16 | 55.38 tok/s | 1GB |
| google/gemma-2b | FP16 | 55.30 tok/s | 4GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | FP16 | 55.24 tok/s | 15GB |
| Qwen/Qwen2.5-1.5B | FP16 | 55.15 tok/s | 11GB |
| Qwen/Qwen3-4B-Thinking-2507 | FP16 | 55.14 tok/s | 9GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | FP16 | 55.09 tok/s | 17GB |
| meta-llama/Llama-2-7b-hf | FP16 | 54.82 tok/s | 15GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | FP16 | 54.77 tok/s | 17GB |
| microsoft/Phi-3-mini-4k-instruct | FP16 | 54.72 tok/s | 15GB |
| distilbert/distilgpt2 | FP16 | 54.65 tok/s | 15GB |
| microsoft/phi-2 | FP16 | 54.64 tok/s | 15GB |
| swiss-ai/Apertus-8B-Instruct-2509 | FP16 | 54.43 tok/s | 17GB |
| black-forest-labs/FLUX.1-dev | FP16 | 54.38 tok/s | 16GB |
| microsoft/Phi-3.5-mini-instruct | FP16 | 54.34 tok/s | 8GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | FP16 | 54.33 tok/s | 9GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | FP16 | 54.23 tok/s | 15GB |
| Qwen/Qwen3-Embedding-8B | FP16 | 54.21 tok/s | 17GB |
| deepseek-ai/DeepSeek-V3-0324 | FP16 | 54.21 tok/s | 15GB |
| openai-community/gpt2-xl | FP16 | 54.13 tok/s | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q8 | 54.11 tok/s | 31GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q8 | 54.07 tok/s | 31GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | FP16 | 54.02 tok/s | 17GB |
| microsoft/phi-4 | FP16 | 53.92 tok/s | 15GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q8 | 53.77 tok/s | 20GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q8 | 53.60 tok/s | 31GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | FP16 | 53.52 tok/s | 9GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q8 | 53.49 tok/s | 31GB |
| microsoft/Phi-4-mini-instruct | FP16 | 53.36 tok/s | 15GB |
| Qwen/Qwen2-0.5B-Instruct | FP16 | 53.35 tok/s | 11GB |
| Qwen/Qwen3-Embedding-0.6B | FP16 | 53.31 tok/s | 13GB |
| Qwen/Qwen3-8B-FP8 | FP16 | 53.30 tok/s | 17GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | FP16 | 53.29 tok/s | 15GB |
| bigscience/bloomz-560m | FP16 | 53.24 tok/s | 15GB |
| deepseek-ai/DeepSeek-R1 | FP16 | 53.18 tok/s | 15GB |
| Qwen/Qwen3-4B-Base | FP16 | 53.07 tok/s | 9GB |
| zai-org/GLM-4.6-FP8 | FP16 | 53.06 tok/s | 15GB |
| openai-community/gpt2 | FP16 | 52.99 tok/s | 15GB |
| HuggingFaceTB/SmolLM-135M | FP16 | 52.99 tok/s | 15GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | FP16 | 52.96 tok/s | 15GB |
| Qwen/Qwen3-Reranker-0.6B | FP16 | 52.93 tok/s | 13GB |
| Qwen/Qwen3-4B-Instruct-2507 | FP16 | 52.93 tok/s | 9GB |
| Qwen/Qwen3-0.6B-Base | FP16 | 52.88 tok/s | 13GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q8 | 52.88 tok/s | 20GB |
| meta-llama/Llama-3.1-8B-Instruct | FP16 | 52.86 tok/s | 17GB |
| google/gemma-3-270m-it | FP16 | 52.65 tok/s | 15GB |
| facebook/opt-125m | FP16 | 52.50 tok/s | 15GB |
| microsoft/Phi-3.5-vision-instruct | FP16 | 52.49 tok/s | 15GB |
| openai/gpt-oss-safeguard-20b | Q8 | 52.41 tok/s | 22GB |
| Qwen/Qwen2.5-Math-1.5B | FP16 | 52.28 tok/s | 11GB |
| meta-llama/Llama-2-7b-chat-hf | FP16 | 52.25 tok/s | 15GB |
| meta-llama/Meta-Llama-3-8B | FP16 | 52.16 tok/s | 17GB |
| liuhaotian/llava-v1.5-7b | FP16 | 52.04 tok/s | 15GB |
| skt/kogpt2-base-v2 | FP16 | 51.93 tok/s | 15GB |
| meta-llama/Meta-Llama-3-8B-Instruct | FP16 | 51.92 tok/s | 17GB |
| mistralai/Mistral-7B-Instruct-v0.2 | FP16 | 51.83 tok/s | 15GB |
| HuggingFaceTB/SmolLM2-135M | FP16 | 51.77 tok/s | 15GB |
| Qwen/Qwen2.5-0.5B | FP16 | 51.76 tok/s | 11GB |
| Qwen/Qwen3-0.6B | FP16 | 51.57 tok/s | 13GB |
| microsoft/VibeVoice-1.5B | FP16 | 51.51 tok/s | 11GB |
| Qwen/Qwen-Image-Edit-2509 | FP16 | 51.47 tok/s | 16GB |
| sshleifer/tiny-gpt2 | FP16 | 51.45 tok/s | 15GB |
| Qwen/Qwen3-8B | FP16 | 51.43 tok/s | 17GB |
| Tongyi-MAI/Z-Image-Turbo | FP16 | 51.41 tok/s | 16GB |
| tencent/HunyuanVideo-1.5 | FP16 | 51.39 tok/s | 16GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q8 | 51.36 tok/s | 31GB |
| parler-tts/parler-tts-large-v1 | FP16 | 51.34 tok/s | 15GB |
| Qwen/Qwen2.5-7B-Instruct | FP16 | 51.34 tok/s | 15GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | FP16 | 51.31 tok/s | 17GB |
| dicta-il/dictalm2.0-instruct | FP16 | 51.25 tok/s | 15GB |
| llamafactory/tiny-random-Llama-3 | FP16 | 51.22 tok/s | 15GB |
| GSAI-ML/LLaDA-8B-Instruct | FP16 | 51.22 tok/s | 17GB |
| IlyaGusev/saiga_llama3_8b | FP16 | 51.07 tok/s | 17GB |
| ibm-granite/granite-docling-258M | FP16 | 50.99 tok/s | 15GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q8 | 50.91 tok/s | 31GB |
| petals-team/StableBeluga2 | FP16 | 50.86 tok/s | 15GB |
| hmellor/tiny-random-LlamaForCausalLM | FP16 | 50.86 tok/s | 15GB |
| openai-community/gpt2-large | FP16 | 50.81 tok/s | 15GB |
| meta-llama/Meta-Llama-3-70B-Instruct | Q4 | 50.80 tok/s | 34GB |
| numind/NuExtract-1.5 | FP16 | 50.79 tok/s | 15GB |
| Qwen/Qwen2.5-Coder-1.5B | FP16 | 50.65 tok/s | 11GB |
| mistralai/Mistral-Small-Instruct-2409 | Q8 | 50.32 tok/s | 23GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q4 | 50.01 tok/s | 17GB |
| Qwen/Qwen3-1.7B | FP16 | 49.97 tok/s | 15GB |
| Qwen/Qwen3-4B | FP16 | 49.91 tok/s | 9GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q4 | 49.73 tok/s | 16GB |
| mistralai/Mistral-7B-Instruct-v0.1 | FP16 | 49.67 tok/s | 15GB |
| unsloth/gpt-oss-20b-BF16 | Q8 | 49.54 tok/s | 20GB |
| Qwen/Qwen3-30B-A3B | Q8 | 49.52 tok/s | 31GB |
| rednote-hilab/dots.ocr | FP16 | 49.49 tok/s | 15GB |
| Qwen/Qwen3-1.7B-Base | FP16 | 49.44 tok/s | 15GB |
| Qwen/Qwen2.5-7B | FP16 | 49.40 tok/s | 15GB |
| baichuan-inc/Baichuan-M2-32B | Q4 | 49.40 tok/s | 16GB |
| rinna/japanese-gpt-neox-small | FP16 | 49.39 tok/s | 15GB |
| huggyllama/llama-7b | FP16 | 49.36 tok/s | 15GB |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q4 | 49.17 tok/s | 34GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | FP16 | 49.14 tok/s | 9GB |
| microsoft/Phi-3.5-mini-instruct | FP16 | 48.87 tok/s | 15GB |
| Qwen/Qwen3-32B | Q4 | 48.81 tok/s | 16GB |
| moonshotai/Kimi-K2-Thinking | Q4 | 48.75 tok/s | 489GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | FP16 | 48.75 tok/s | 15GB |
| deepseek-ai/DeepSeek-V3 | FP16 | 48.74 tok/s | 15GB |
| vikhyatk/moondream2 | FP16 | 48.69 tok/s | 15GB |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | 48.56 tok/s | 34GB |
| meta-llama/Llama-3.1-8B | FP16 | 48.52 tok/s | 17GB |
| Qwen/Qwen2-1.5B-Instruct | FP16 | 48.46 tok/s | 11GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | FP16 | 48.40 tok/s | 9GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | FP16 | 48.40 tok/s | 17GB |
| HuggingFaceH4/zephyr-7b-beta | FP16 | 48.34 tok/s | 15GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | FP16 | 48.19 tok/s | 15GB |
| microsoft/DialoGPT-medium | FP16 | 48.13 tok/s | 15GB |
| Qwen/Qwen2.5-1.5B-Instruct | FP16 | 48.05 tok/s | 11GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | FP16 | 48.04 tok/s | 15GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | FP16 | 47.98 tok/s | 11GB |
| EleutherAI/gpt-neo-125m | FP16 | 47.97 tok/s | 15GB |
| ibm-granite/granite-3.3-8b-instruct | FP16 | 47.96 tok/s | 17GB |
| 01-ai/Yi-1.5-34B-Chat | Q4 | 47.83 tok/s | 18GB |
| microsoft/DialoGPT-small | FP16 | 47.82 tok/s | 15GB |
| Qwen/Qwen2.5-0.5B-Instruct | FP16 | 47.78 tok/s | 11GB |
| microsoft/Phi-3-mini-128k-instruct | FP16 | 47.77 tok/s | 15GB |
| Qwen/Qwen2-0.5B | FP16 | 47.73 tok/s | 11GB |
| microsoft/Phi-4-multimodal-instruct | FP16 | 47.71 tok/s | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q8 | 47.64 tok/s | 31GB |
| Qwen/Qwen2-7B-Instruct | FP16 | 47.55 tok/s | 15GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | FP16 | 47.53 tok/s | 17GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q8 | 47.46 tok/s | 31GB |
| deepseek-ai/DeepSeek-V2.5 | Q4 | 47.31 tok/s | 328GB |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q4 | 47.31 tok/s | 34GB |
| Qwen/Qwen2.5-7B-Instruct | FP16 | 47.29 tok/s | 16GB |
| codellama/CodeLlama-34b-hf | Q4 | 47.19 tok/s | 17GB |
| Qwen/QwQ-32B-Preview | Q4 | 47.12 tok/s | 17GB |
| openai/gpt-oss-20b | Q8 | 47.08 tok/s | 20GB |
| deepseek-ai/DeepSeek-R1-0528 | FP16 | 46.88 tok/s | 15GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q4 | 46.84 tok/s | 16GB |
| openai-community/gpt2-medium | FP16 | 46.82 tok/s | 15GB |
| mistralai/Mistral-7B-v0.1 | FP16 | 46.76 tok/s | 15GB |
| zai-org/GLM-4.5-Air | FP16 | 46.75 tok/s | 15GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | FP16 | 46.66 tok/s | 9GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | FP16 | 46.43 tok/s | 15GB |
| allenai/Olmo-3-7B-Think | FP16 | 46.16 tok/s | 16GB |
| MiniMaxAI/MiniMax-M2 | FP16 | 46.12 tok/s | 15GB |
| BSC-LT/salamandraTA-7b-instruct | FP16 | 46.11 tok/s | 15GB |
| EleutherAI/pythia-70m-deduped | FP16 | 46.00 tok/s | 15GB |
| GSAI-ML/LLaDA-8B-Base | FP16 | 45.96 tok/s | 17GB |
| Qwen/Qwen3-8B-Base | FP16 | 45.90 tok/s | 17GB |
| Qwen/Qwen3-Embedding-4B | FP16 | 45.80 tok/s | 9GB |
| Gensyn/Qwen2.5-0.5B-Instruct | FP16 | 45.74 tok/s | 11GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | FP16 | 45.68 tok/s | 11GB |
| meta-llama/Llama-Guard-3-8B | FP16 | 45.64 tok/s | 17GB |
| deepseek-ai/DeepSeek-V3.1 | FP16 | 45.54 tok/s | 15GB |
| lmsys/vicuna-7b-v1.5 | FP16 | 45.53 tok/s | 15GB |
| moonshotai/Kimi-Linear-48B-A3B-Instruct | Q4 | 45.34 tok/s | 25GB |
| meta-llama/Llama-3.2-3B-Instruct | FP16 | 45.32 tok/s | 7GB |
| black-forest-labs/FLUX.2-dev | FP16 | 45.32 tok/s | 16GB |
| Qwen/Qwen2.5-32B | Q4 | 44.67 tok/s | 16GB |
| deepseek-ai/deepseek-coder-33b-instruct | Q4 | 44.04 tok/s | 17GB |
| Qwen/Qwen2.5-32B-Instruct | Q4 | 43.18 tok/s | 16GB |
| Qwen/Qwen2.5-32B-Instruct | Q4 | 42.97 tok/s | 17GB |
| Qwen/Qwen2.5-Coder-32B-Instruct | Q4 | 42.72 tok/s | 17GB |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | 42.53 tok/s | 34GB |
| Qwen/Qwen2.5-14B-Instruct | FP16 | 41.14 tok/s | 30GB |
| OpenPipe/Qwen3-14B-Instruct | FP16 | 40.92 tok/s | 29GB |
| Qwen/Qwen2.5-14B-Instruct | FP16 | 40.83 tok/s | 29GB |
| ai-forever/ruGPT-3.5-13B | FP16 | 39.83 tok/s | 27GB |
| meta-llama/Llama-2-13b-chat-hf | FP16 | 39.35 tok/s | 27GB |
| Qwen/Qwen2.5-14B | FP16 | 39.04 tok/s | 29GB |
| microsoft/Phi-3-medium-128k-instruct | FP16 | 38.95 tok/s | 29GB |
| EssentialAI/rnj-1 | FP16 | 37.97 tok/s | 19GB |
| meta-llama/Llama-3.1-8B-Instruct | FP16 | 37.26 tok/s | 17GB |
| Qwen/Qwen3-14B | FP16 | 36.11 tok/s | 29GB |
| deepseek-ai/DeepSeek-V2.5 | Q8 | 35.65 tok/s | 656GB |
| Qwen/QwQ-32B-Preview | Q8 | 35.26 tok/s | 34GB |
| google/gemma-2-9b-it | FP16 | 35.16 tok/s | 20GB |
| NousResearch/Hermes-3-Llama-3.1-8B | FP16 | 35.12 tok/s | 17GB |
| mistralai/Ministral-3-14B-Instruct-2512 | FP16 | 34.66 tok/s | 32GB |
| Qwen/Qwen3-14B-Base | FP16 | 34.64 tok/s | 29GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | FP16 | 34.63 tok/s | 19GB |
| moonshotai/Kimi-K2-Thinking | Q8 | 34.54 tok/s | 978GB |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | 34.50 tok/s | 68GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q8 | 34.09 tok/s | 33GB |
| 01-ai/Yi-1.5-34B-Chat | Q8 | 34.08 tok/s | 35GB |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q8 | 33.83 tok/s | 68GB |
| Qwen/Qwen2.5-32B-Instruct | Q8 | 33.50 tok/s | 33GB |
| baichuan-inc/Baichuan-M2-32B | Q8 | 33.04 tok/s | 33GB |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | Q4 | 32.98 tok/s | 69GB |
| codellama/CodeLlama-34b-hf | Q8 | 32.81 tok/s | 35GB |
| meta-llama/Meta-Llama-3-70B-Instruct | Q8 | 32.56 tok/s | 68GB |
| Qwen/Qwen2.5-32B | Q8 | 32.52 tok/s | 33GB |
| deepseek-ai/deepseek-coder-33b-instruct | Q8 | 32.05 tok/s | 34GB |
| Qwen/Qwen2.5-32B-Instruct | Q8 | 32.03 tok/s | 34GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q8 | 31.70 tok/s | 35GB |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | 30.81 tok/s | 68GB |
| Qwen/Qwen3-32B | Q8 | 30.77 tok/s | 33GB |
| Qwen/Qwen2.5-Coder-32B-Instruct | Q8 | 30.53 tok/s | 34GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q8 | 30.35 tok/s | 33GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | FP16 | 29.97 tok/s | 61GB |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q8 | 29.96 tok/s | 68GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | FP16 | 29.85 tok/s | 61GB |
| moonshotai/Kimi-Linear-48B-A3B-Instruct | Q8 | 29.44 tok/s | 50GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | FP16 | 28.77 tok/s | 61GB |
| Qwen/Qwen2.5-Math-72B-Instruct | Q4 | 28.46 tok/s | 36GB |
| Qwen/Qwen3-30B-A3B | FP16 | 28.37 tok/s | 61GB |
| openai/gpt-oss-safeguard-20b | FP16 | 27.82 tok/s | 44GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | FP16 | 27.56 tok/s | 61GB |
| openai/gpt-oss-120b | Q4 | 27.54 tok/s | 59GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | FP16 | 27.32 tok/s | 61GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | FP16 | 26.77 tok/s | 41GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | FP16 | 26.77 tok/s | 61GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q4 | 26.58 tok/s | 39GB |
| mistralai/Mistral-Small-Instruct-2409 | FP16 | 26.45 tok/s | 46GB |
| openai/gpt-oss-20b | FP16 | 26.35 tok/s | 41GB |
| AI-MO/Kimina-Prover-72B | Q4 | 26.27 tok/s | 35GB |
| google/gemma-2-27b-it | FP16 | 26.23 tok/s | 56GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q4 | 26.01 tok/s | 39GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | FP16 | 25.94 tok/s | 61GB |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | 25.88 tok/s | 34GB |
| NousResearch/Hermes-3-Llama-3.1-70B | Q4 | 25.78 tok/s | 34GB |
| Qwen/Qwen2.5-72B-Instruct | Q4 | 25.62 tok/s | 36GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q4 | 25.58 tok/s | 44GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q4 | 25.50 tok/s | 39GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | FP16 | 25.41 tok/s | 61GB |
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | Q4 | 25.40 tok/s | 34GB |
| Qwen/Qwen2.5-72B-Instruct | Q4 | 25.34 tok/s | 35GB |
| unsloth/gpt-oss-20b-BF16 | FP16 | 25.22 tok/s | 41GB |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | 25.21 tok/s | 34GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | FP16 | 25.01 tok/s | 41GB |
| mistralai/Mistral-Large-Instruct-2411 | Q4 | 24.55 tok/s | 60GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q4 | 24.39 tok/s | 39GB |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | Q8 | 22.93 tok/s | 138GB |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | Q4 | 21.12 tok/s | 115GB |
| deepseek-ai/DeepSeek-Math-V2 | Q4 | 20.58 tok/s | 383GB |
| Qwen/Qwen2.5-72B-Instruct | Q8 | 20.36 tok/s | 70GB |
| Qwen/Qwen2.5-Math-72B-Instruct | Q8 | 20.35 tok/s | 71GB |
| openai/gpt-oss-120b | Q8 | 19.96 tok/s | 117GB |
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | Q8 | 19.89 tok/s | 69GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q8 | 19.68 tok/s | 78GB |
| AI-MO/Kimina-Prover-72B | Q8 | 19.46 tok/s | 70GB |
| baichuan-inc/Baichuan-M2-32B | FP16 | 19.24 tok/s | 66GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q8 | 19.20 tok/s | 78GB |
| Qwen/Qwen2.5-72B-Instruct | Q8 | 18.94 tok/s | 71GB |
| deepseek-ai/DeepSeek-V2.5 | FP16 | 18.94 tok/s | 1312GB |
| NousResearch/Hermes-3-Llama-3.1-70B | Q8 | 18.82 tok/s | 69GB |
| meta-llama/Llama-3.1-70B-Instruct | FP16 | 18.77 tok/s | 137GB |
| 01-ai/Yi-1.5-34B-Chat | FP16 | 18.69 tok/s | 70GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q8 | 18.69 tok/s | 78GB |
| Qwen/Qwen3-32B | FP16 | 18.69 tok/s | 66GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q8 | 18.61 tok/s | 78GB |
| Qwen/Qwen2.5-32B-Instruct | FP16 | 18.55 tok/s | 67GB |
| moonshotai/Kimi-Linear-48B-A3B-Instruct | FP16 | 18.46 tok/s | 101GB |
| Qwen/QwQ-32B-Preview | FP16 | 18.42 tok/s | 67GB |
| Qwen/Qwen2.5-32B-Instruct | FP16 | 18.35 tok/s | 66GB |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | 18.14 tok/s | 69GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | FP16 | 17.93 tok/s | 66GB |
| codellama/CodeLlama-34b-hf | FP16 | 17.93 tok/s | 70GB |
| Qwen/Qwen2.5-Coder-32B-Instruct | FP16 | 17.91 tok/s | 67GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q8 | 17.79 tok/s | 88GB |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | 17.49 tok/s | 69GB |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | FP16 | 17.41 tok/s | 137GB |
| mistralai/Mistral-Large-Instruct-2411 | Q8 | 17.36 tok/s | 120GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | FP16 | 17.18 tok/s | 70GB |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | FP16 | 16.97 tok/s | 137GB |
| moonshotai/Kimi-K2-Thinking | FP16 | 16.96 tok/sEstimated Auto-generated benchmark | 1956GB |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q4 | 16.78 tok/sEstimated Auto-generated benchmark | 378GB |
| Qwen/Qwen2.5-32B | FP16 | 16.78 tok/sEstimated Auto-generated benchmark | 66GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | FP16 | 16.35 tok/sEstimated Auto-generated benchmark | 66GB |
| meta-llama/Llama-3.3-70B-Instruct | FP16 | 16.34 tok/sEstimated Auto-generated benchmark | 137GB |
| meta-llama/Meta-Llama-3-70B-Instruct | FP16 | 16.29 tok/sEstimated Auto-generated benchmark | 137GB |
| deepseek-ai/deepseek-coder-33b-instruct | FP16 | 15.94 tok/sEstimated Auto-generated benchmark | 68GB |
| MiniMaxAI/MiniMax-M1-40k | Q4 | 15.78 tok/sEstimated Auto-generated benchmark | 255GB |
| Qwen/Qwen3-235B-A22B | Q4 | 15.19 tok/sEstimated Auto-generated benchmark | 115GB |
| MiniMaxAI/MiniMax-VL-01 | Q4 | 14.95 tok/sEstimated Auto-generated benchmark | 256GB |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | FP16 | 14.38 tok/sEstimated Auto-generated benchmark | 275GB |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | Q8 | 14.31 tok/sEstimated Auto-generated benchmark | 231GB |
| deepseek-ai/DeepSeek-Math-V2 | Q8 | 13.38 tok/sEstimated Auto-generated benchmark | 766GB |
| Qwen/Qwen3-235B-A22B | Q8 | 11.42 tok/sEstimated Auto-generated benchmark | 230GB |
| Qwen/Qwen2.5-72B-Instruct | FP16 | 11.04 tok/sEstimated Auto-generated benchmark | 141GB |
| Qwen/Qwen2.5-72B-Instruct | FP16 | 10.99 tok/sEstimated Auto-generated benchmark | 142GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | FP16 | 10.84 tok/sEstimated Auto-generated benchmark | 156GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | FP16 | 10.77 tok/sEstimated Auto-generated benchmark | 176GB |
| openai/gpt-oss-120b | FP16 | 10.72 tok/sEstimated Auto-generated benchmark | 235GB |
| MiniMaxAI/MiniMax-M1-40k | Q8 | 10.56 tok/sEstimated Auto-generated benchmark | 510GB |
| NousResearch/Hermes-3-Llama-3.1-70B | FP16 | 10.51 tok/sEstimated Auto-generated benchmark | 138GB |
| mistralai/Mistral-Large-Instruct-2411 | FP16 | 10.49 tok/sEstimated Auto-generated benchmark | 240GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | FP16 | 10.46 tok/sEstimated Auto-generated benchmark | 156GB |
| AI-MO/Kimina-Prover-72B | FP16 | 10.41 tok/sEstimated Auto-generated benchmark | 141GB |
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | FP16 | 10.40 tok/sEstimated Auto-generated benchmark | 138GB |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q8 | 10.28 tok/sEstimated Auto-generated benchmark | 755GB |
| MiniMaxAI/MiniMax-VL-01 | Q8 | 10.15 tok/sEstimated Auto-generated benchmark | 511GB |
| meta-llama/Llama-3.1-70B-Instruct | FP16 | 10.15 tok/sEstimated Auto-generated benchmark | 138GB |
| meta-llama/Llama-3.3-70B-Instruct | FP16 | 9.91 tok/sEstimated Auto-generated benchmark | 138GB |
| Qwen/Qwen2.5-Math-72B-Instruct | FP16 | 9.85 tok/sEstimated Auto-generated benchmark | 142GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct | FP16 | 9.35 tok/sEstimated Auto-generated benchmark | 156GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking | FP16 | 9.22 tok/sEstimated Auto-generated benchmark | 156GB |
| deepseek-ai/DeepSeek-Math-V2 | FP16 | 8.04 tok/sEstimated Auto-generated benchmark | 1532GB |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | FP16 | 7.60 tok/sEstimated Auto-generated benchmark | 461GB |
| MiniMaxAI/MiniMax-M1-40k | FP16 | 6.56 tok/sEstimated Auto-generated benchmark | 1020GB |
| MiniMaxAI/MiniMax-VL-01 | FP16 | 6.49 tok/sEstimated Auto-generated benchmark | 1021GB |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | FP16 | 6.37 tok/sEstimated Auto-generated benchmark | 1509GB |
| Qwen/Qwen3-235B-A22B | FP16 | 5.95 tok/sEstimated Auto-generated benchmark | 460GB |
Note: Performance estimates are calculated; real results may vary.
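The VRAM column tracks a simple bytes-per-parameter rule fairly closely: the Q4 rows land near 0.5 bytes/param, Q8 near 1, and FP16 near 2 (a 70B model shows roughly 34GB, 69GB, and 137GB respectively). Below is a minimal Python sketch of that rule; the bytes-per-param constants are assumptions read off the table, and the estimate ignores KV cache and runtime overhead, so treat it as a lower bound rather than the site's methodology.

```python
# Rough weight-memory estimator consistent with the table above.
# Assumption: weights dominate; Q4 ~0.5 bytes/param, Q8 ~1, FP16 ~2.
# KV cache and framework overhead are ignored (lower bound only).

BYTES_PER_PARAM = {"Q4": 0.5, "Q8": 1.0, "FP16": 2.0}

def estimate_vram_gb(params_billion: float, quant: str) -> float:
    """Approximate weight memory in GB for a dense model."""
    return params_billion * BYTES_PER_PARAM[quant]

def fits(params_billion: float, quant: str, vram_gb: float = 48.0) -> bool:
    """Crude fit check against a 48GB card like the A6000."""
    return estimate_vram_gb(params_billion, quant) <= vram_gb

if __name__ == "__main__":
    for name, b in [("Llama-3.3-70B", 70.0), ("Qwen2.5-32B", 32.8)]:
        for q in ("Q4", "Q8", "FP16"):
            gb = estimate_vram_gb(b, q)
            verdict = "fits" if fits(b, q) else "needs offload/multi-GPU"
            print(f"{name} {q}: ~{gb:.0f}GB -> {verdict}")
```

On a 48GB card this reproduces the table's broad verdicts: 70B fits only at Q4, and 32B-class models fit at Q4 or Q8 but not FP16.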
| Model | Quantization | Verdict | Estimated speed | VRAM needed |
|---|---|---|---|---|
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q4 | Not supported | 16.78 tok/s | 378GB (have 48GB) |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q8 | Not supported | 10.28 tok/s | 755GB (have 48GB) |
| EssentialAI/rnj-1 | FP16 | Fits comfortably | 37.97 tok/s | 19GB (have 48GB) |
| EssentialAI/rnj-1 | Q8 | Fits comfortably | 65.30 tok/s | 10GB (have 48GB) |
| EssentialAI/rnj-1 | Q4 | Fits comfortably | 105.99 tok/s | 5GB (have 48GB) |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | FP16 | Not supported | 6.37 tok/s | 1509GB (have 48GB) |
| Qwen/Qwen2.5-3B-Instruct | Q8 | Fits comfortably | 115.50 tok/s | 3GB (have 48GB) |
| Qwen/Qwen2.5-3B-Instruct | Q4 | Fits comfortably | 168.29 tok/s | 2GB (have 48GB) |
| openai/gpt-oss-120b | Q4 | Not supported | 27.54 tok/s | 59GB (have 48GB) |
| Qwen/Qwen2.5-1.5B | Q4 | Fits comfortably | 121.41 tok/s | 3GB (have 48GB) |
| Qwen/Qwen2.5-1.5B | Q8 | Fits comfortably | 96.63 tok/s | 5GB (have 48GB) |
| Qwen/Qwen2.5-1.5B | FP16 | Fits comfortably | 55.15 tok/s | 11GB (have 48GB) |
| Qwen/Qwen2.5-14B-Instruct | Q4 | Fits comfortably | 102.61 tok/s | 7GB (have 48GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | FP16 | Fits comfortably | 48.40 tok/s | 17GB (have 48GB) |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q4 | Fits comfortably | 76.13 tok/s | 10GB (have 48GB) |
| openai/gpt-oss-120b | FP16 | Not supported | 10.72 tok/s | 235GB (have 48GB) |
| unsloth/gpt-oss-20b-BF16 | FP16 | Fits comfortably | 25.22 tok/s | 41GB (have 48GB) |
| unsloth/gpt-oss-20b-BF16 | Q4 | Fits comfortably | 65.76 tok/s | 10GB (have 48GB) |
| HuggingFaceTB/SmolLM-135M | Q4 | Fits comfortably | 137.24 tok/s | 4GB (have 48GB) |
| Qwen/Qwen3-8B-Base | Q8 | Fits comfortably | 88.28 tok/s | 9GB (have 48GB) |
| Qwen/Qwen3-8B-Base | FP16 | Fits comfortably | 45.90 tok/s | 17GB (have 48GB) |
| microsoft/Phi-4-multimodal-instruct | Q4 | Fits comfortably | 121.31 tok/s | 4GB (have 48GB) |
| microsoft/Phi-4-multimodal-instruct | Q8 | Fits comfortably | 91.04 tok/s | 7GB (have 48GB) |
| Qwen/Qwen3-30B-A3B | Q8 | Fits comfortably | 49.52 tok/s | 31GB (have 48GB) |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | Fits comfortably | 123.95 tok/s | 3GB (have 48GB) |
| Qwen/Qwen2.5-Coder-1.5B | Q8 | Fits comfortably | 87.45 tok/s | 5GB (have 48GB) |
| Qwen/Qwen2.5-Coder-1.5B | FP16 | Fits comfortably | 50.65 tok/s | 11GB (have 48GB) |
| rinna/japanese-gpt-neox-small | Q4 | Fits comfortably | 143.23 tok/s | 4GB (have 48GB) |
| rinna/japanese-gpt-neox-small | Q8 | Fits comfortably | 89.97 tok/s | 7GB (have 48GB) |
| unsloth/mistral-7b-v0.3-bnb-4bit | FP16 | Fits comfortably | 48.19 tok/s | 15GB (have 48GB) |
| meta-llama/Meta-Llama-3-8B-Instruct | FP16 | Fits comfortably | 51.92 tok/s | 17GB (have 48GB) |
| Qwen/Qwen2.5-72B-Instruct | FP16 | Not supported | 11.04 tok/s | 141GB (have 48GB) |
| Qwen/Qwen2.5-72B-Instruct | Q8 | Not supported | 20.36 tok/s | 70GB (have 48GB) |
| microsoft/DialoGPT-medium | FP16 | Fits comfortably | 48.13 tok/s | 15GB (have 48GB) |
| ibm-granite/granite-3.3-2b-instruct | Q4 | Fits comfortably | 154.20 tok/s | 1GB (have 48GB) |
| parler-tts/parler-tts-large-v1 | Q4 | Fits comfortably | 124.96 tok/s | 4GB (have 48GB) |
| meta-llama/Llama-2-13b-chat-hf | Q8 | Fits comfortably | 70.65 tok/s | 13GB (have 48GB) |
| Qwen/Qwen2.5-3B | Q8 | Fits comfortably | 102.41 tok/s | 3GB (have 48GB) |
| Qwen/Qwen2.5-3B | FP16 | Fits comfortably | 64.84 tok/s | 6GB (have 48GB) |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 119.51 tok/s | 4GB (have 48GB) |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 84.02 tok/s | 9GB (have 48GB) |
| NousResearch/Meta-Llama-3.1-8B-Instruct | FP16 | Fits comfortably | 51.31 tok/s | 17GB (have 48GB) |
| apple/OpenELM-1_1B-Instruct | Q4 | Fits comfortably | 143.54 tok/s | 1GB (have 48GB) |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | FP16 | Not supported | 10.84 tok/s | 156GB (have 48GB) |
| Qwen/Qwen2.5-72B-Instruct | Q4 | Fits comfortably | 25.62 tok/s | 36GB (have 48GB) |
| Qwen/Qwen2.5-Coder-32B-Instruct | Q4 | Fits comfortably | 42.72 tok/s | 17GB (have 48GB) |
| Qwen/Qwen2.5-Coder-32B-Instruct | Q8 | Fits comfortably | 30.53 tok/s | 34GB (have 48GB) |
| Qwen/Qwen2.5-Coder-32B-Instruct | FP16 | Not supported | 17.91 tok/s | 67GB (have 48GB) |
| meta-llama/Llama-3.3-70B-Instruct | FP16 | Not supported | 9.91 tok/s | 138GB (have 48GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | Fits comfortably | 141.29 tok/s | 2GB (have 48GB) |
| google/gemma-2-27b-it | Q4 | Fits comfortably | 77.39 tok/s | 14GB (have 48GB) |
| google/gemma-2-27b-it | Q8 | Fits comfortably | 55.49 tok/s | 28GB (have 48GB) |
| google/gemma-2-27b-it | FP16 | Not supported | 26.23 tok/s | 56GB (have 48GB) |
| google/gemma-2-9b-it | Q4 | Fits comfortably | 96.79 tok/s | 5GB (have 48GB) |
| microsoft/Phi-3-medium-128k-instruct | Q8 | Fits comfortably | 72.77 tok/s | 14GB (have 48GB) |
| microsoft/Phi-3-medium-128k-instruct | FP16 | Fits comfortably | 38.95 tok/s | 29GB (have 48GB) |
| moonshotai/Kimi-K2-Thinking | FP16 | Not supported | 16.96 tok/s | 1956GB (have 48GB) |
| deepseek-ai/DeepSeek-Math-V2 | Q4 | Not supported | 20.58 tok/s | 383GB (have 48GB) |
| deepseek-ai/DeepSeek-Math-V2 | Q8 | Not supported | 13.38 tok/s | 766GB (have 48GB) |
| deepseek-ai/DeepSeek-Math-V2 | FP16 | Not supported | 8.04 tok/s | 1532GB (have 48GB) |
| Tongyi-MAI/Z-Image-Turbo | Q4 | Fits comfortably | 136.61 tok/s | 4GB (have 48GB) |
| Tongyi-MAI/Z-Image-Turbo | Q8 | Fits comfortably | 99.69 tok/s | 8GB (have 48GB) |
| WeiboAI/VibeThinker-1.5B | FP16 | Fits comfortably | 55.39 tok/s | 4GB (have 48GB) |
| WeiboAI/VibeThinker-1.5B | Q8 | Fits comfortably | 110.97 tok/s | 2GB (have 48GB) |
| llamafactory/tiny-random-Llama-3 | FP16 | Fits comfortably | 51.22 tok/s | 15GB (have 48GB) |
| WeiboAI/VibeThinker-1.5B | Q4 | Fits comfortably | 151.82 tok/s | 1GB (have 48GB) |
| openai-community/gpt2-large | Q4 | Fits comfortably | 145.44 tok/s | 4GB (have 48GB) |
| openai-community/gpt2-large | Q8 | Fits comfortably | 88.04 tok/s | 7GB (have 48GB) |
| meta-llama/Meta-Llama-3-70B-Instruct | Q8 | Not supported | 32.56 tok/s | 68GB (have 48GB) |
| Qwen/Qwen2.5-14B | Q4 | Fits comfortably | 95.43 tok/s | 7GB (have 48GB) |
| Qwen/Qwen2.5-14B | Q8 | Fits comfortably | 65.04 tok/s | 14GB (have 48GB) |
| GSAI-ML/LLaDA-8B-Base | Q4 | Fits comfortably | 141.61 tok/s | 4GB (have 48GB) |
| GSAI-ML/LLaDA-8B-Base | Q8 | Fits comfortably | 99.99 tok/s | 9GB (have 48GB) |
| GSAI-ML/LLaDA-8B-Base | FP16 | Fits comfortably | 45.96 tok/s | 17GB (have 48GB) |
| openai-community/gpt2 | Q4 | Fits comfortably | 136.32 tok/s | 4GB (have 48GB) |
| Qwen/Qwen3-1.7B-Base | FP16 | Fits comfortably | 49.44 tok/s | 15GB (have 48GB) |
| openai/gpt-oss-20b | Q8 | Fits comfortably | 47.08 tok/s | 20GB (have 48GB) |
| inference-net/Schematron-3B | Q8 | Fits comfortably | 112.47 tok/s | 3GB (have 48GB) |
| inference-net/Schematron-3B | FP16 | Fits comfortably | 56.06 tok/s | 6GB (have 48GB) |
| deepseek-ai/deepseek-coder-33b-instruct | Q4 | Fits comfortably | 44.04 tok/s | 17GB (have 48GB) |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | FP16 | Not supported | 17.41 tok/s | 137GB (have 48GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | Fits comfortably | 153.90 tok/s | 2GB (have 48GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 111.87 tok/s | 3GB (have 48GB) |
| meta-llama/Llama-3.2-3B-Instruct | FP16 | Fits comfortably | 64.75 tok/s | 6GB (have 48GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 87.18 tok/s | 3GB (have 48GB) |
| meta-llama/Llama-3.2-3B-Instruct | FP16 | Fits comfortably | 45.32 tok/s | 7GB (have 48GB) |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | Q4 | Not supported | 32.98 tok/s | 69GB (have 48GB) |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | Q8 | Not supported | 22.93 tok/s | 138GB (have 48GB) |
| meta-llama/Llama-3.1-70B-Instruct | FP16 | Not supported | 18.77 tok/s | 137GB (have 48GB) |
| microsoft/phi-2 | Q4 | Fits comfortably | 142.68 tok/s | 4GB (have 48GB) |
| HuggingFaceTB/SmolLM2-135M | Q4 | Fits comfortably | 145.54 tok/s | 4GB (have 48GB) |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | FP16 | Not supported | 14.38 tok/s | 275GB (have 48GB) |
| HuggingFaceTB/SmolLM2-135M | Q8 | Fits comfortably | 89.71 tok/s | 7GB (have 48GB) |
| HuggingFaceTB/SmolLM2-135M | FP16 | Fits comfortably | 51.77 tok/s | 15GB (have 48GB) |
| zai-org/GLM-4.6-FP8 | Q4 | Fits comfortably | 140.47 tok/s | 4GB (have 48GB) |
| mistralai/Mistral-Small-Instruct-2409 | Q4 | Fits comfortably | 72.06 tok/s | 11GB (have 48GB) |
| mistralai/Mistral-Small-Instruct-2409 | Q8 | Fits comfortably | 50.32 tok/s | 23GB (have 48GB) |
| mistralai/Mistral-Small-Instruct-2409 | FP16 | Fits comfortably | 26.45 tok/s | 46GB (have 48GB) |
| microsoft/DialoGPT-medium | Q4 | Fits comfortably | 135.04 tok/s | 4GB (have 48GB) |
| google-bert/bert-base-uncased | Q4 | Fits comfortably | 160.52 tok/s | 1GB (have 48GB) |
Note: Performance estimates are calculated; real results may vary.
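For any row marked "Fits comfortably", a common way to run the model locally at Q4-like precision is 4-bit loading through Hugging Face transformers with bitsandbytes. This is a minimal sketch, not the harness behind the numbers above; the model choice and prompt are illustrative, and measured tok/s will differ from the estimates.

```python
# Minimal sketch: load a small "fits comfortably" model in 4-bit on a
# 48GB card. Assumes transformers, accelerate, and bitsandbytes are
# installed and a CUDA GPU is visible.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Qwen/Qwen2.5-3B-Instruct"  # a Q4 "fits comfortably" entry above

bnb = BitsAndBytesConfig(
    load_in_4bit=True,                  # roughly comparable to the Q4 rows
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb, device_map="auto"
)

inputs = tokenizer("Explain KV caching in one sentence.", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```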
Data-backed answers pulled from community benchmarks, manufacturer specs, and live pricing.
Operators running dual RTX A6000/RTX 8000 cards in oobabooga report roughly 6–7 tokens/sec on 70B IQ4 Miqu workloads, adequate for shared inference queues.
Source: Reddit – /r/LocalLLaMA (lnv0ww3)
Enthusiasts caution that consumer boards seldom provide x16/x16 for two A6000s; dropping to x8/x4 starves llama.cpp workloads and erodes throughput.
Source: Reddit – /r/LocalLLaMA (mqpg0wp)
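You can check whether your board is actually giving each card a full link by querying nvidia-smi's standard PCIe fields, as in the sketch below; note that links can downtrain to save power at idle, so sample while a workload is running.

```python
# Query current PCIe link generation and width per GPU via nvidia-smi.
# The query fields used here are standard nvidia-smi properties; an
# x8 or x4 width on a second card would corroborate the slowdown
# reported above for multi-GPU llama.cpp setups.
import subprocess

out = subprocess.run(
    ["nvidia-smi",
     "--query-gpu=index,name,pcie.link.gen.current,pcie.link.width.current",
     "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
).stdout
for line in out.strip().splitlines():
    print(line)  # e.g. "0, NVIDIA RTX A6000, 4, 16"
```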
Even 2020-era RTX A6000 cards still list near $5,000, and the community expects scalpers to follow new workstation launches, a sign that demand remains high.
Source: Reddit – /r/LocalLLaMA (movlqi2)
Some builders weigh 48 GB 4090s instead: they keep the full VRAM for inference but drop to 24 GB for PCIe peer-to-peer training, so the trade-off is workload-dependent.
Source: Reddit – /r/LocalLLaMA (mqoerg0)
RTX A6000 ships with 48 GB of GDDR6 ECC memory and a 300 W TDP. As of Nov 2025, pricing on Amazon was around $4,899.
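To see how close the card runs to that 300 W limit, and how much of the 48 GB is in use during inference, nvidia-smi's query mode can be polled. A minimal sketch using standard nvidia-smi query fields:

```python
# Sample power draw, memory use, and utilization once per second while
# a model is generating, to compare against the 300 W TDP / 48 GB specs.
import subprocess
import time

FIELDS = "power.draw,memory.used,memory.total,utilization.gpu"

for _ in range(10):  # ten one-second samples; loop forever if preferred
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={FIELDS}", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    print(out)  # e.g. "287.45 W, 41213 MiB, 49140 MiB, 98 %"
    time.sleep(1)
```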
Explore how related cards stack up for local inference workloads: RTX 4090, NVIDIA RTX 6000 Ada, NVIDIA L40, NVIDIA A5000, and NVIDIA A4000.
Side-by-side VRAM, throughput, efficiency, and pricing benchmarks are available for each pairing.