Quick Answer: The Apple M2 Pro pairs its GPU with up to 32GB of unified memory, all of it addressable as VRAM, and typically draws around 25W under load. It delivers approximately 35 tokens/sec on google-t5/t5-3b at Q4 (estimated). Pricing depends on the Mac configuration it ships in, so check current market listings.
This GPU offers reliable throughput for local AI workloads. Pair it with the right model quantization to hit your target tokens/sec: at roughly 35 tok/s, a 500-token response takes about 14 seconds. The table below lists estimated throughput and memory use for popular models at Q4, Q8, and FP16.
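Before scanning the table, it can help to estimate whether a model fits in memory at a given quantization. The following is a minimal sketch, not a definitive formula: it assumes the common bytes-per-parameter heuristic (Q4 ≈ 0.5 bytes, Q8 ≈ 1, FP16 ≈ 2) and an assumed ~20% overhead factor for KV cache and runtime buffers. Actual usage varies with context length, batch size, and runtime.

```python
# Rough VRAM estimate for a model at a given quantization.
# Heuristic only: bytes per parameter plus an assumed ~20% overhead
# for KV cache and activations; real usage depends on the runtime.

BYTES_PER_PARAM = {"Q4": 0.5, "Q8": 1.0, "FP16": 2.0}

def estimate_vram_gb(params_billion: float, quant: str, overhead: float = 1.2) -> float:
    """Return an approximate memory footprint in GB."""
    return params_billion * BYTES_PER_PARAM[quant] * overhead

if __name__ == "__main__":
    # An 8B model at Q4 fits comfortably in 32GB unified memory;
    # at FP16 it already needs roughly 19GB.
    for quant in ("Q4", "Q8", "FP16"):
        print(f"8B @ {quant}: ~{estimate_vram_gb(8, quant):.1f} GB")
```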
All throughput and VRAM figures below are auto-generated estimates, not measured results.

| Model | Quantization | Tokens/sec (estimated) | VRAM used |
|---|---|---|---|
| google-t5/t5-3b | Q4 | 35.25 tok/s | 2GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | 35.23 tok/s | 2GB |
| google/gemma-3-1b-it | Q4 | 35.20 tok/s | 1GB |
| deepseek-ai/DeepSeek-OCR | Q4 | 34.63 tok/s | 2GB |
| facebook/sam3 | Q4 | 34.43 tok/s | 1GB |
| LiquidAI/LFM2-1.2B | Q4 | 34.40 tok/s | 1GB |
| google/gemma-2b | Q4 | 34.28 tok/s | 1GB |
| apple/OpenELM-1_1B-Instruct | Q4 | 33.75 tok/s | 1GB |
| unsloth/gemma-3-1b-it | Q4 | 33.55 tok/s | 1GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 33.51 tok/s | 2GB |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | 33.46 tok/s | 1GB |
| tencent/HunyuanOCR | Q4 | 33.22 tok/s | 1GB |
| bigcode/starcoder2-3b | Q4 | 32.68 tok/s | 2GB |
| ibm-granite/granite-3.3-2b-instruct | Q4 | 32.51 tok/s | 1GB |
| ibm-research/PowerMoE-3b | Q4 | 32.44 tok/s | 2GB |
| meta-llama/Llama-3.2-1B | Q4 | 32.35 tok/s | 1GB |
| nari-labs/Dia2-2B | Q4 | 32.34 tok/s | 2GB |
| google-bert/bert-base-uncased | Q4 | 32.32 tok/s | 1GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | 31.44 tok/s | 2GB |
| Qwen/Qwen2.5-3B-Instruct | Q4 | 31.42 tok/s | 2GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | 30.85 tok/s | 1GB |
| Qwen/Qwen2.5-3B | Q4 | 30.52 tok/s | 2GB |
| WeiboAI/VibeThinker-1.5B | Q4 | 30.43 tok/s | 1GB |
| meta-llama/Llama-Guard-3-1B | Q4 | 30.41 tok/s | 1GB |
| allenai/OLMo-2-0425-1B | Q4 | 30.37 tok/s | 1GB |
| google/gemma-2-2b-it | Q4 | 30.23 tok/s | 1GB |
| unsloth/Llama-3.2-1B-Instruct | Q4 | 30.14 tok/s | 1GB |
| unsloth/Llama-3.2-3B-Instruct | Q4 | 29.96 tok/s | 2GB |
| google/embeddinggemma-300m | Q4 | 29.90 tok/s | 1GB |
| deepseek-ai/DeepSeek-V3.1 | Q4 | 29.43 tok/s | 4GB |
| GSAI-ML/LLaDA-8B-Base | Q4 | 29.43 tok/s | 4GB |
| microsoft/Phi-4-multimodal-instruct | Q4 | 29.36 tok/s | 4GB |
| meta-llama/Llama-3.2-3B | Q4 | 29.32 tok/s | 2GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q4 | 29.28 tok/s | 4GB |
| distilbert/distilgpt2 | Q4 | 29.20 tok/s | 4GB |
| inference-net/Schematron-3B | Q4 | 29.18 tok/s | 2GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | 29.17 tok/s | 4GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q4 | 29.00 tok/s | 4GB |
| zai-org/GLM-4.5-Air | Q4 | 28.96 tok/s | 4GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | 28.88 tok/s | 4GB |
| microsoft/phi-2 | Q4 | 28.86 tok/s | 4GB |
| Qwen/Qwen3-4B-Base | Q4 | 28.85 tok/s | 2GB |
| EleutherAI/gpt-neo-125m | Q4 | 28.82 tok/s | 4GB |
| Qwen/Qwen3-Reranker-0.6B | Q4 | 28.82 tok/s | 3GB |
| Qwen/Qwen3-8B | Q4 | 28.74 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1 | Q4 | 28.67 tok/s | 4GB |
| HuggingFaceTB/SmolLM-135M | Q4 | 28.64 tok/s | 4GB |
| openai-community/gpt2-xl | Q4 | 28.60 tok/s | 4GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | 28.58 tok/s | 2GB |
| microsoft/VibeVoice-1.5B | Q4 | 28.52 tok/s | 3GB |
| deepseek-ai/DeepSeek-R1-0528 | Q4 | 28.45 tok/s | 4GB |
| Qwen/Qwen-Image-Edit-2509 | Q4 | 28.45 tok/s | 4GB |
| meta-llama/Llama-2-7b-hf | Q4 | 28.36 tok/s | 4GB |
| bigscience/bloomz-560m | Q4 | 28.28 tok/s | 4GB |
| Qwen/Qwen2-7B-Instruct | Q4 | 28.22 tok/s | 4GB |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | 28.11 tok/s | 3GB |
| hmellor/tiny-random-LlamaForCausalLM | Q4 | 28.08 tok/s | 4GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | 28.01 tok/s | 2GB |
| Qwen/Qwen2.5-Math-1.5B | Q4 | 27.97 tok/s | 3GB |
| vikhyatk/moondream2 | Q4 | 27.97 tok/s | 4GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q4 | 27.96 tok/s | 3GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q4 | 27.89 tok/s | 4GB |
| dicta-il/dictalm2.0-instruct | Q4 | 27.86 tok/s | 4GB |
| google/gemma-3-270m-it | Q4 | 27.82 tok/s | 4GB |
| microsoft/Phi-3.5-vision-instruct | Q4 | 27.70 tok/s | 4GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 27.64 tok/s | 4GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q4 | 27.61 tok/s | 4GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q4 | 27.58 tok/s | 4GB |
| Qwen/Qwen3-0.6B | Q4 | 27.57 tok/s | 3GB |
| Qwen/Qwen2-1.5B-Instruct | Q4 | 27.56 tok/s | 3GB |
| Qwen/Qwen3-1.7B | Q4 | 27.55 tok/s | 4GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | 27.54 tok/s | 3GB |
| rinna/japanese-gpt-neox-small | Q4 | 27.51 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q4 | 27.45 tok/s | 2GB |
| Qwen/Qwen3-0.6B-Base | Q4 | 27.44 tok/s | 3GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q4 | 27.44 tok/s | 3GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q4 | 27.36 tok/s | 4GB |
| IlyaGusev/saiga_llama3_8b | Q4 | 27.33 tok/s | 4GB |
| openai-community/gpt2-large | Q4 | 27.32 tok/s | 4GB |
| microsoft/Phi-3-mini-4k-instruct | Q4 | 27.21 tok/s | 4GB |
| sshleifer/tiny-gpt2 | Q4 | 27.16 tok/s | 4GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 27.11 tok/s | 4GB |
| Qwen/Qwen3-8B-FP8 | Q4 | 27.05 tok/s | 4GB |
| microsoft/DialoGPT-medium | Q4 | 27.04 tok/s | 4GB |
| skt/kogpt2-base-v2 | Q4 | 27.01 tok/s | 4GB |
| facebook/opt-125m | Q4 | 27.01 tok/s | 4GB |
| EleutherAI/pythia-70m-deduped | Q4 | 27.01 tok/s | 4GB |
| numind/NuExtract-1.5 | Q4 | 26.99 tok/s | 4GB |
| Qwen/Qwen2.5-0.5B | Q4 | 26.97 tok/s | 3GB |
| MiniMaxAI/MiniMax-M2 | Q4 | 26.93 tok/s | 4GB |
| black-forest-labs/FLUX.1-dev | Q4 | 26.84 tok/s | 4GB |
| meta-llama/Llama-Guard-3-8B | Q4 | 26.80 tok/s | 4GB |
| liuhaotian/llava-v1.5-7b | Q4 | 26.75 tok/s | 4GB |
| ibm-granite/granite-docling-258M | Q4 | 26.75 tok/s | 4GB |
| Qwen/Qwen3-Embedding-4B | Q4 | 26.60 tok/s | 2GB |
| BSC-LT/salamandraTA-7b-instruct | Q4 | 26.59 tok/s | 4GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q4 | 26.58 tok/s | 4GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | 26.57 tok/s | 4GB |
| GSAI-ML/LLaDA-8B-Instruct | Q4 | 26.55 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q4 | 26.49 tok/s | 2GB |
| microsoft/Phi-4-mini-instruct | Q4 | 26.47 tok/s | 4GB |
| Qwen/Qwen2-0.5B-Instruct | Q4 | 26.47 tok/s | 3GB |
| Qwen/Qwen2.5-1.5B | Q4 | 26.45 tok/s | 3GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q4 | 26.42 tok/s | 4GB |
| meta-llama/Llama-2-7b-chat-hf | Q4 | 26.42 tok/s | 4GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q4 | 26.41 tok/s | 4GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q4 | 26.41 tok/s | 3GB |
| Qwen/Qwen3-Embedding-0.6B | Q4 | 26.35 tok/s | 3GB |
| openai-community/gpt2-medium | Q4 | 26.32 tok/s | 4GB |
| allenai/Olmo-3-7B-Think | Q4 | 26.28 tok/s | 4GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q4 | 26.23 tok/s | 4GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q4 | 26.22 tok/s | 2GB |
| Qwen/Qwen2-0.5B | Q4 | 26.18 tok/s | 3GB |
| parler-tts/parler-tts-large-v1 | Q4 | 26.13 tok/s | 4GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | 26.01 tok/s | 4GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q4 | 25.99 tok/s | 2GB |
| Qwen/Qwen3-1.7B-Base | Q4 | 25.96 tok/s | 4GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 25.92 tok/s | 2GB |
| petals-team/StableBeluga2 | Q4 | 25.92 tok/s | 4GB |
| black-forest-labs/FLUX.2-dev | Q4 | 25.86 tok/s | 4GB |
| tencent/HunyuanVideo-1.5 | Q4 | 25.80 tok/s | 4GB |
| lmsys/vicuna-7b-v1.5 | Q4 | 25.76 tok/s | 4GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q4 | 25.75 tok/s | 4GB |
| microsoft/Phi-3-mini-128k-instruct | Q4 | 25.55 tok/s | 4GB |
| deepseek-ai/DeepSeek-OCR | Q8 | 25.42 tok/s | 4GB |
| HuggingFaceTB/SmolLM2-135M | Q4 | 25.31 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3 | Q4 | 25.28 tok/s | 4GB |
| mistralai/Mistral-7B-v0.1 | Q4 | 25.18 tok/s | 4GB |
| microsoft/DialoGPT-small | Q4 | 25.05 tok/s | 4GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q4 | 25.04 tok/s | 4GB |
| llamafactory/tiny-random-Llama-3 | Q4 | 24.98 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3-0324 | Q4 | 24.87 tok/s | 4GB |
| openai-community/gpt2 | Q4 | 24.84 tok/s | 4GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | 24.82 tok/s | 3GB |
| google-t5/t5-3b | Q8 | 24.78 tok/s | 3GB |
| Qwen/Qwen3-8B-Base | Q4 | 24.73 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | 24.72 tok/s | 4GB |
| HuggingFaceH4/zephyr-7b-beta | Q4 | 24.68 tok/s | 4GB |
| Qwen/Qwen3-4B | Q4 | 24.68 tok/s | 2GB |
| Tongyi-MAI/Z-Image-Turbo | Q4 | 24.63 tok/s | 4GB |
| meta-llama/Llama-3.1-8B | Q4 | 24.61 tok/s | 4GB |
| huggyllama/llama-7b | Q4 | 24.59 tok/s | 4GB |
| Qwen/Qwen3-Embedding-8B | Q4 | 24.54 tok/s | 4GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q8 | 24.49 tok/s | 3GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q4 | 24.49 tok/s | 4GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q4 | 24.49 tok/s | 4GB |
| unsloth/Llama-3.2-3B-Instruct | Q8 | 24.43 tok/s | 3GB |
| rednote-hilab/dots.ocr | Q4 | 24.41 tok/s | 4GB |
| microsoft/phi-4 | Q4 | 24.38 tok/s | 4GB |
| meta-llama/Meta-Llama-3-8B | Q4 | 24.33 tok/s | 4GB |
| zai-org/GLM-4.6-FP8 | Q4 | 24.32 tok/s | 4GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q4 | 24.26 tok/s | 2GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q4 | 24.25 tok/s | 4GB |
| ibm-granite/granite-3.3-8b-instruct | Q4 | 24.24 tok/s | 4GB |
| Qwen/Qwen2.5-7B | Q4 | 24.22 tok/s | 4GB |
| inference-net/Schematron-3B | Q8 | 24.20 tok/s | 3GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q4 | 24.18 tok/s | 2GB |
| nari-labs/Dia2-2B | Q8 | 24.14 tok/s | 3GB |
| WeiboAI/VibeThinker-1.5B | Q8 | 24.02 tok/s | 2GB |
| meta-llama/Llama-3.2-3B | Q8 | 23.99 tok/s | 3GB |
| meta-llama/Llama-3.2-1B-Instruct | Q8 | 23.58 tok/s | 1GB |
| google-bert/bert-base-uncased | Q8 | 23.49 tok/s | 1GB |
| meta-llama/Llama-3.2-1B | Q8 | 23.29 tok/s | 1GB |
| meta-llama/Llama-Guard-3-1B | Q8 | 23.14 tok/s | 1GB |
| google/gemma-2-2b-it | Q8 | 23.12 tok/s | 2GB |
| Qwen/Qwen2.5-3B-Instruct | Q8 | 22.81 tok/s | 3GB |
| bigcode/starcoder2-3b | Q8 | 22.67 tok/s | 3GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | 22.65 tok/s | 3GB |
| google/embeddinggemma-300m | Q8 | 22.63 tok/s | 1GB |
| ibm-granite/granite-3.3-2b-instruct | Q8 | 22.54 tok/s | 2GB |
| unsloth/Llama-3.2-1B-Instruct | Q8 | 22.53 tok/s | 1GB |
| facebook/sam3 | Q8 | 22.45 tok/s | 1GB |
| ibm-research/PowerMoE-3b | Q8 | 22.32 tok/s | 3GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q4 | 22.04 tok/s | 5GB |
| microsoft/Phi-3-medium-128k-instruct | Q4 | 22.03 tok/s | 7GB |
| Qwen/Qwen3-14B-Base | Q4 | 22.01 tok/s | 7GB |
| Qwen/Qwen3-14B | Q4 | 21.90 tok/s | 7GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | 21.87 tok/s | 4GB |
| mistralai/Ministral-3-14B-Instruct-2512 | Q4 | 21.75 tok/s | 8GB |
| allenai/OLMo-2-0425-1B | Q8 | 21.75 tok/s | 1GB |
| google/gemma-3-1b-it | Q8 | 21.73 tok/s | 1GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | 21.53 tok/s | 7GB |
| Qwen/Qwen2.5-3B | Q8 | 21.51 tok/s | 3GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | 21.46 tok/s | 3GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | 21.37 tok/s | 1GB |
| apple/OpenELM-1_1B-Instruct | Q8 | 21.24 tok/s | 1GB |
| tencent/HunyuanOCR | Q8 | 21.02 tok/s | 2GB |
| unsloth/gemma-3-1b-it | Q8 | 20.85 tok/s | 1GB |
| LiquidAI/LFM2-1.2B | Q8 | 20.82 tok/s | 2GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | 20.80 tok/s | 8GB |
| google/gemma-2b | Q8 | 20.77 tok/s | 2GB |
| openai-community/gpt2-large | Q8 | 20.58 tok/s | 7GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q8 | 20.55 tok/s | 7GB |
| microsoft/Phi-3.5-vision-instruct | Q8 | 20.55 tok/s | 7GB |
| Qwen/Qwen2.5-14B | Q4 | 20.48 tok/s | 7GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q8 | 20.46 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | 20.44 tok/s | 9GB |
| meta-llama/Llama-2-7b-hf | Q8 | 20.43 tok/s | 7GB |
| EssentialAI/rnj-1 | Q4 | 20.43 tok/s | 5GB |
| microsoft/Phi-4-mini-instruct | Q8 | 20.41 tok/s | 7GB |
| openai-community/gpt2 | Q8 | 20.40 tok/s | 7GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q8 | 20.39 tok/s | 5GB |
| Qwen/Qwen3-Embedding-0.6B | Q8 | 20.36 tok/s | 6GB |
| openai-community/gpt2-medium | Q8 | 20.33 tok/s | 7GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q8 | 20.30 tok/s | 7GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q8 | 20.22 tok/s | 4GB |
| HuggingFaceTB/SmolLM2-135M | Q8 | 20.17 tok/s | 7GB |
| microsoft/phi-4 | Q8 | 20.17 tok/s | 7GB |
| Qwen/Qwen2.5-Coder-1.5B | Q8 | 20.14 tok/s | 5GB |
| Qwen/Qwen3-0.6B-Base | Q8 | 20.06 tok/s | 6GB |
| Qwen/Qwen2.5-1.5B | Q8 | 20.05 tok/s | 5GB |
| numind/NuExtract-1.5 | Q8 | 20.03 tok/s | 7GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q8 | 20.01 tok/s | 5GB |
| meta-llama/Llama-2-13b-chat-hf | Q4 | 20.00 tok/s | 7GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q8 | 19.92 tok/s | 4GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q8 | 19.84 tok/s | 7GB |
| deepseek-ai/DeepSeek-V3 | Q8 | 19.84 tok/s | 7GB |
| deepseek-ai/DeepSeek-V3-0324 | Q8 | 19.81 tok/s | 7GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | 19.76 tok/s | 8GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | 19.71 tok/s | 9GB |
| microsoft/VibeVoice-1.5B | Q8 | 19.57 tok/s | 5GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | 19.54 tok/s | 4GB |
| microsoft/phi-2 | Q8 | 19.51 tok/s | 7GB |
| MiniMaxAI/MiniMax-M2 | Q8 | 19.44 tok/s | 7GB |
| BSC-LT/salamandraTA-7b-instruct | Q8 | 19.43 tok/s | 7GB |
| facebook/opt-125m | Q8 | 19.42 tok/s | 7GB |
| zai-org/GLM-4.6-FP8 | Q8 | 19.39 tok/s | 7GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | 19.39 tok/s | 7GB |
| Qwen/Qwen-Image-Edit-2509 | Q8 | 19.38 tok/s | 8GB |
| ibm-granite/granite-3.3-8b-instruct | Q8 | 19.32 tok/s | 9GB |
| mistralai/Mistral-7B-v0.1 | Q8 | 19.28 tok/s | 7GB |
| Qwen/Qwen3-8B-Base | Q8 | 19.28 tok/s | 9GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | 19.27 tok/s | 5GB |
| openai-community/gpt2-xl | Q8 | 19.24 tok/s | 7GB |
| google/gemma-3-270m-it | Q8 | 19.23 tok/s | 7GB |
| rinna/japanese-gpt-neox-small | Q8 | 19.22 tok/s | 7GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q8 | 19.22 tok/s | 9GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q8 | 19.21 tok/s | 7GB |
| meta-llama/Llama-Guard-3-8B | Q8 | 19.10 tok/s | 9GB |
| rednote-hilab/dots.ocr | Q8 | 19.09 tok/s | 7GB |
| OpenPipe/Qwen3-14B-Instruct | Q4 | 19.08 tok/s | 7GB |
| microsoft/Phi-3-mini-128k-instruct | Q8 | 19.06 tok/s | 7GB |
| microsoft/DialoGPT-medium | Q8 | 19.04 tok/s | 7GB |
| HuggingFaceTB/SmolLM-135M | Q8 | 19.00 tok/s | 7GB |
| Qwen/Qwen3-1.7B | Q8 | 18.92 tok/s | 7GB |
| Qwen/Qwen3-4B-Base | Q8 | 18.92 tok/s | 4GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q8 | 18.91 tok/s | 9GB |
| huggyllama/llama-7b | Q8 | 18.90 tok/s | 7GB |
| ibm-granite/granite-docling-258M | Q8 | 18.87 tok/s | 7GB |
| llamafactory/tiny-random-Llama-3 | Q8 | 18.82 tok/s | 7GB |
| deepseek-ai/DeepSeek-V3.1 | Q8 | 18.82 tok/s | 7GB |
| google/gemma-2-9b-it | Q4 | 18.81 tok/s | 5GB |
| meta-llama/Llama-2-7b-chat-hf | Q8 | 18.77 tok/s | 7GB |
| NousResearch/Hermes-3-Llama-3.1-8B | Q4 | 18.75 tok/s | 4GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | 18.75 tok/s | 9GB |
| Qwen/Qwen3-8B | Q8 | 18.74 tok/s | 9GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q8 | 18.72 tok/s | 7GB |
| skt/kogpt2-base-v2 | Q8 | 18.71 tok/s | 7GB |
| EleutherAI/pythia-70m-deduped | Q8 | 18.67 tok/s | 7GB |
| lmsys/vicuna-7b-v1.5 | Q8 | 18.64 tok/s | 7GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q8 | 18.63 tok/s | 7GB |
| Tongyi-MAI/Z-Image-Turbo | Q8 | 18.61 tok/s | 8GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q8 | 18.60 tok/s | 4GB |
| microsoft/Phi-3-mini-4k-instruct | Q8 | 18.60 tok/s | 7GB |
| ai-forever/ruGPT-3.5-13B | Q4 | 18.55 tok/s | 7GB |
| IlyaGusev/saiga_llama3_8b | Q8 | 18.52 tok/s | 9GB |
| Qwen/Qwen3-Reranker-0.6B | Q8 | 18.52 tok/s | 6GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | 18.49 tok/s | 7GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | 18.49 tok/s | 3GB |
| microsoft/DialoGPT-small | Q8 | 18.48 tok/s | 7GB |
| HuggingFaceH4/zephyr-7b-beta | Q8 | 18.36 tok/s | 7GB |
| distilbert/distilgpt2 | Q8 | 18.33 tok/s | 7GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q8 | 18.33 tok/s | 9GB |
| Qwen/Qwen3-8B-FP8 | Q8 | 18.31 tok/s | 9GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | 18.29 tok/s | 7GB |
| sshleifer/tiny-gpt2 | Q8 | 18.25 tok/s | 7GB |
| meta-llama/Meta-Llama-3-8B | Q8 | 18.25 tok/s | 9GB |
| hmellor/tiny-random-LlamaForCausalLM | Q8 | 18.20 tok/s | 7GB |
| Qwen/Qwen2-0.5B | Q8 | 18.17 tok/s | 5GB |
| Qwen/Qwen2.5-7B | Q8 | 18.15 tok/s | 7GB |
| allenai/Olmo-3-7B-Think | Q8 | 18.15 tok/s | 8GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q8 | 18.11 tok/s | 7GB |
| Qwen/Qwen3-Embedding-8B | Q8 | 18.10 tok/s | 9GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q8 | 18.09 tok/s | 9GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q8 | 18.07 tok/s | 9GB |
| deepseek-ai/DeepSeek-R1 | Q8 | 18.03 tok/s | 7GB |
| zai-org/GLM-4.5-Air | Q8 | 17.98 tok/s | 7GB |
| parler-tts/parler-tts-large-v1 | Q8 | 17.96 tok/s | 7GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | 17.93 tok/s | 7GB |
| Qwen/Qwen2.5-Math-1.5B | Q8 | 17.89 tok/s | 5GB |
| meta-llama/Llama-3.1-8B | Q8 | 17.85 tok/s | 9GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q8 | 17.85 tok/s | 5GB |
| black-forest-labs/FLUX.2-dev | Q8 | 17.84 tok/s | 8GB |
| EleutherAI/gpt-neo-125m | Q8 | 17.81 tok/s | 7GB |
| Qwen/Qwen2-1.5B-Instruct | Q8 | 17.78 tok/s | 5GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q8 | 17.76 tok/s | 9GB |
| black-forest-labs/FLUX.1-dev | Q8 | 17.71 tok/s | 8GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q8 | 17.70 tok/s | 4GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q8 | 17.70 tok/s | 4GB |
| GSAI-ML/LLaDA-8B-Base | Q8 | 17.65 tok/s | 9GB |
| microsoft/Phi-4-multimodal-instruct | Q8 | 17.60 tok/s | 7GB |
| petals-team/StableBeluga2 | Q8 | 17.57 tok/s | 7GB |
| bigscience/bloomz-560m | Q8 | 17.53 tok/s | 7GB |
| Qwen/Qwen3-Embedding-4B | Q8 | 17.49 tok/s | 4GB |
| vikhyatk/moondream2 | Q8 | 17.42 tok/s | 7GB |
| Qwen/Qwen2-0.5B-Instruct | Q8 | 17.41 tok/s | 5GB |
| Qwen/Qwen2.5-0.5B | Q8 | 17.37 tok/s | 5GB |
| GSAI-ML/LLaDA-8B-Instruct | Q8 | 17.37 tok/s | 9GB |
| Qwen/Qwen3-0.6B | Q8 | 17.36 tok/s | 6GB |
| Qwen/Qwen2-7B-Instruct | Q8 | 17.34 tok/s | 7GB |
| dicta-il/dictalm2.0-instruct | Q8 | 17.27 tok/s | 7GB |
| tencent/HunyuanVideo-1.5 | Q8 | 17.26 tok/s | 8GB |
| Qwen/Qwen3-1.7B-Base | Q8 | 17.23 tok/s | 7GB |
| liuhaotian/llava-v1.5-7b | Q8 | 17.21 tok/s | 7GB |
| Qwen/Qwen3-4B | Q8 | 17.14 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q8 | 17.11 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-0528 | Q8 | 17.03 tok/s | 7GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | 16.94 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q8 | 16.92 tok/s | 5GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q4 | 16.22 tok/s | 15GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q4 | 16.21 tok/s | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q4 | 15.60 tok/s | 15GB |
| openai/gpt-oss-20b | Q4 | 15.55 tok/s | 10GB |
| EssentialAI/rnj-1 | Q8 | 15.41 tok/s | 10GB |
| mistralai/Ministral-3-14B-Instruct-2512 | Q8 | 15.37 tok/s | 16GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q4 | 15.33 tok/s | 15GB |
| Qwen/Qwen3-30B-A3B | Q4 | 15.06 tok/s | 15GB |
| NousResearch/Hermes-3-Llama-3.1-8B | Q8 | 14.98 tok/s | 9GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q4 | 14.87 tok/s | 15GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q4 | 14.86 tok/s | 10GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q4 | 14.81 tok/s | 15GB |
| Qwen/Qwen3-14B | Q8 | 14.67 tok/s | 14GB |
| unsloth/gpt-oss-20b-BF16 | Q4 | 14.64 tok/s | 10GB |
| openai/gpt-oss-safeguard-20b | Q4 | 14.53 tok/s | 11GB |
| Qwen/Qwen2.5-14B | Q8 | 14.31 tok/s | 14GB |
| mistralai/Mistral-Small-Instruct-2409 | Q4 | 14.28 tok/s | 11GB |
| ai-forever/ruGPT-3.5-13B | Q8 | 14.16 tok/s | 13GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q4 | 14.09 tok/s | 15GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | 14.05 tok/s | 9GB |
| Qwen/Qwen2.5-14B-Instruct | Q8 | 13.97 tok/s | 15GB |
| google/gemma-2-27b-it | Q4 | 13.94 tok/s | 14GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q8 | 13.92 tok/s | 10GB |
| google/gemma-2-9b-it | Q8 | 13.79 tok/s | 10GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q4 | 13.77 tok/s | 15GB |
| OpenPipe/Qwen3-14B-Instruct | Q8 | 13.63 tok/s | 14GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q4 | 13.59 tok/s | 10GB |
| LiquidAI/LFM2-1.2B | FP16 | 13.44 tok/s | 4GB |
| meta-llama/Llama-2-13b-chat-hf | Q8 | 13.37 tok/s | 13GB |
| Qwen/Qwen3-14B-Base | Q8 | 13.34 tok/s | 14GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | FP16 | 13.27 tok/s | 6GB |
| google-t5/t5-3b | FP16 | 13.26 tok/s | 6GB |
| google/gemma-2-2b-it | FP16 | 13.25 tok/s | 4GB |
| meta-llama/Llama-3.2-1B | FP16 | 13.16 tok/s | 2GB |
| microsoft/Phi-3-medium-128k-instruct | Q8 | 13.14 tok/s | 14GB |
| Qwen/Qwen2.5-14B-Instruct | Q8 | 13.07 tok/s | 14GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | FP16 | 13.02 tok/s | 2GB |
| WeiboAI/VibeThinker-1.5B | FP16 | 12.97 tok/s | 4GB |
| Qwen/Qwen2.5-3B-Instruct | FP16 | 12.96 tok/s | 6GB |
| apple/OpenELM-1_1B-Instruct | FP16 | 12.75 tok/s | 2GB |
| ibm-granite/granite-3.3-2b-instruct | FP16 | 12.73 tok/s | 4GB |
| meta-llama/Llama-3.2-3B-Instruct | FP16 | 12.64 tok/s | 6GB |
| nari-labs/Dia2-2B | FP16 | 12.57 tok/s | 5GB |
| Qwen/Qwen2.5-3B | FP16 | 12.43 tok/s | 6GB |
| inference-net/Schematron-3B | FP16 | 12.41 tok/s | 6GB |
| deepseek-ai/DeepSeek-OCR | FP16 | 12.28 tok/s | 7GB |
| google-bert/bert-base-uncased | FP16 | 12.27 tok/s | 1GB |
| unsloth/Llama-3.2-1B-Instruct | FP16 | 12.16 tok/s | 2GB |
| unsloth/gemma-3-1b-it | FP16 | 12.06 tok/s | 2GB |
| ibm-research/PowerMoE-3b | FP16 | 11.91 tok/s | 6GB |
| meta-llama/Llama-Guard-3-1B | FP16 | 11.90 tok/s | 2GB |
| meta-llama/Llama-3.2-1B-Instruct | FP16 | 11.84 tok/s | 2GB |
| allenai/OLMo-2-0425-1B | FP16 | 11.67 tok/s | 2GB |
| google/gemma-3-1b-it | FP16 | 11.67 tok/s | 2GB |
| google/embeddinggemma-300m | FP16 | 11.67 tok/s | 1GB |
| tencent/HunyuanOCR | FP16 | 11.49 tok/s | 3GB |
| unsloth/Llama-3.2-3B-Instruct | FP16 | 11.40 tok/s | 6GB |
| bigcode/starcoder2-3b | FP16 | 11.39 tok/s | 6GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | FP16 | 11.27 tok/s | 6GB |
| meta-llama/Llama-3.2-3B | FP16 | 11.24 tok/s | 6GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | FP16 | 11.18 tok/s | 9GB |
| liuhaotian/llava-v1.5-7b | FP16 | 11.17 tok/s | 15GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | FP16 | 11.15 tok/s | 15GB |
| mistralai/Mistral-Small-Instruct-2409 | Q8 | 11.15 tok/s | 23GB |
| Qwen/Qwen2-7B-Instruct | FP16 | 11.13 tok/s | 15GB |
| distilbert/distilgpt2 | FP16 | 11.10 tok/s | 15GB |
| allenai/Olmo-3-7B-Think | FP16 | 11.09 tok/s | 16GB |
| Tongyi-MAI/Z-Image-Turbo | FP16 | 11.07 tok/s | 16GB |
| google/gemma-2b | FP16 | 11.05 tok/s | 4GB |
| google/gemma-2-27b-it | Q8 | 11.04 tok/s | 28GB |
| facebook/sam3 | FP16 | 11.03 tok/s | 2GB |
| Qwen/Qwen2-1.5B-Instruct | FP16 | 10.99 tok/s | 11GB |
| deepseek-ai/DeepSeek-R1-0528 | FP16 | 10.99 tok/s | 15GB |
| vikhyatk/moondream2 | FP16 | 10.98 tok/s | 15GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | FP16 | 10.98 tok/s | 9GB |
| HuggingFaceH4/zephyr-7b-beta | FP16 | 10.96 tok/s | 15GB |
| GSAI-ML/LLaDA-8B-Base | FP16 | 10.95 tok/s | 17GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q8 | 10.94 tok/s | 31GB |
| mistralai/Mistral-7B-Instruct-v0.2 | FP16 | 10.93 tok/s | 15GB |
| zai-org/GLM-4.5-Air | FP16 | 10.90 tok/s | 15GB |
| microsoft/Phi-3-mini-128k-instruct | FP16 | 10.90 tok/s | 15GB |
| microsoft/DialoGPT-medium | FP16 | 10.90 tok/s | 15GB |
| Qwen/Qwen3-1.7B | FP16 | 10.87 tok/s | 15GB |
| Qwen/Qwen3-8B-Base | FP16 | 10.86 tok/s | 17GB |
| facebook/opt-125m | FP16 | 10.84 tok/s | 15GB |
| google/gemma-3-270m-it | FP16 | 10.83 tok/s | 15GB |
| HuggingFaceTB/SmolLM2-135M | FP16 | 10.83 tok/s | 15GB |
| Qwen/Qwen3-Reranker-0.6B | FP16 | 10.77 tok/s | 13GB |
| meta-llama/Llama-3.1-8B-Instruct | FP16 | 10.75 tok/s | 17GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | FP16 | 10.75 tok/s | 17GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q8 | 10.72 tok/s | 31GB |
| Gensyn/Qwen2.5-0.5B-Instruct | FP16 | 10.71 tok/s | 11GB |
| swiss-ai/Apertus-8B-Instruct-2509 | FP16 | 10.69 tok/s | 17GB |
| meta-llama/Meta-Llama-3-8B | FP16 | 10.68 tok/s | 17GB |
| deepseek-ai/DeepSeek-R1 | FP16 | 10.67 tok/s | 15GB |
| Qwen/Qwen2-0.5B | FP16 | 10.66 tok/s | 11GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q8 | 10.64 tok/s | 31GB |
| deepseek-ai/DeepSeek-V3.1 | FP16 | 10.64 tok/s | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q8 | 10.62 tok/s | 31GB |
| openai/gpt-oss-20b | Q8 | 10.60 tok/s | 20GB |
| bigscience/bloomz-560m | FP16 | 10.59 tok/s | 15GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | FP16 | 10.58 tok/s | 9GB |
| skt/kogpt2-base-v2 | FP16 | 10.58 tok/s | 15GB |
| Qwen/Qwen2.5-1.5B | FP16 | 10.58 tok/s | 11GB |
| Qwen/Qwen3-4B-Thinking-2507 | FP16 | 10.57 tok/s | 9GB |
| Qwen/Qwen2.5-Coder-1.5B | FP16 | 10.55 tok/s | 11GB |
| IlyaGusev/saiga_llama3_8b | FP16 | 10.55 tok/s | 17GB |
| numind/NuExtract-1.5 | FP16 | 10.53 tok/s | 15GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | FP16 | 10.52 tok/s | 15GB |
| EleutherAI/pythia-70m-deduped | FP16 | 10.52 tok/s | 15GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | FP16 | 10.51 tok/s | 9GB |
| microsoft/VibeVoice-1.5B | FP16 | 10.51 tok/s | 11GB |
| MiniMaxAI/MiniMax-M2 | FP16 | 10.48 tok/s | 15GB |
| Qwen/Qwen3-Embedding-4B | FP16 | 10.44 tok/s | 9GB |
| mistralai/Mistral-7B-Instruct-v0.1 | FP16 | 10.43 tok/s | 15GB |
| HuggingFaceTB/SmolLM-135M | FP16 | 10.42 tok/s | 15GB |
| Qwen/Qwen3-4B | FP16 | 10.41 tok/s | 9GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | FP16 | 10.37 tok/s | 9GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | FP16 | 10.36 tok/s | 11GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q8 | 10.36 tok/s | 31GB |
| Qwen/Qwen2.5-1.5B-Instruct | FP16 | 10.34 tok/s | 11GB |
| petals-team/StableBeluga2 | FP16 | 10.33 tok/s | 15GB |
| openai-community/gpt2-medium | FP16 | 10.31 tok/s | 15GB |
| microsoft/Phi-3.5-mini-instruct | FP16 | 10.31 tok/s | 8GB |
| moonshotai/Kimi-K2-Thinking | Q4 | 10.31 tok/s | 489GB |
| 01-ai/Yi-1.5-34B-Chat | Q4 | 10.30 tok/s | 18GB |
| Qwen/Qwen3-32B | Q4 | 10.29 tok/s | 16GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | FP16 | 10.28 tok/s | 17GB |
| deepseek-ai/DeepSeek-V2.5 | Q4 | 10.27 tok/s | 328GB |
| microsoft/Phi-4-mini-instruct | FP16 | 10.26 tok/s | 15GB |
| Qwen/Qwen2.5-7B | FP16 | 10.26 tok/s | 15GB |
| Qwen/Qwen2.5-0.5B | FP16 | 10.25 tok/s | 11GB |
| Qwen/Qwen2.5-7B-Instruct | FP16 | 10.24 tok/s | 16GB |
| Qwen/Qwen2.5-Math-1.5B | FP16 | 10.24 tok/s | 11GB |
| Qwen/Qwen3-30B-A3B | Q8 | 10.21 tok/s | 31GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q8 | 10.17 tok/s | 20GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | FP16 | 10.16 tok/s | 15GB |
| parler-tts/parler-tts-large-v1 | FP16 | 10.14 tok/s | 15GB |
| Qwen/Qwen2.5-32B | Q4 | 10.13 tok/s | 16GB |
| meta-llama/Llama-3.1-8B | FP16 | 10.12 tok/s | 17GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | FP16 | 10.12 tok/s | 11GB |
| BSC-LT/salamandraTA-7b-instruct | FP16 | 10.12 tok/s | 15GB |
| EleutherAI/gpt-neo-125m | FP16 | 10.10 tok/s | 15GB |
| black-forest-labs/FLUX.2-dev | FP16 | 10.07 tok/s | 16GB |
| microsoft/Phi-3-mini-4k-instruct | FP16 | 10.06 tok/s | 15GB |
| openai-community/gpt2 | FP16 | 10.05 tok/s | 15GB |
| codellama/CodeLlama-34b-hf | Q4 | 10.04 tok/s | 17GB |
| rinna/japanese-gpt-neox-small | FP16 | 10.03 tok/s | 15GB |
| Qwen/Qwen3-0.6B-Base | FP16 | 10.03 tok/s | 13GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | FP16 | 10.03 tok/s | 17GB |
| meta-llama/Llama-2-7b-hf | FP16 | 10.00 tok/s | 15GB |
| lmsys/vicuna-7b-v1.5 | FP16 | 9.99 tok/s | 15GB |
| huggyllama/llama-7b | FP16 | 9.98 tok/s | 15GB |
| sshleifer/tiny-gpt2 | FP16 | 9.96 tok/s | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q8 | 9.96 tok/s | 31GB |
| deepseek-ai/DeepSeek-V3 | FP16 | 9.96 tok/s | 15GB |
| openai-community/gpt2-xl | FP16 | 9.95 tok/s | 15GB |
| Qwen/Qwen3-4B-Instruct-2507 | FP16 | 9.93 tok/s | 9GB |
| microsoft/Phi-3.5-vision-instruct | FP16 | 9.92 tok/s | 15GB |
| GSAI-ML/LLaDA-8B-Instruct | FP16 | 9.91 tok/s | 17GB |
| openai-community/gpt2-large | FP16 | 9.90 tok/s | 15GB |
| microsoft/DialoGPT-small | FP16 | 9.90 tok/s | 15GB |
| Qwen/Qwen3-1.7B-Base | FP16 | 9.89 tok/s | 15GB |
| deepseek-ai/DeepSeek-V3-0324 | FP16 | 9.89 tok/s | 15GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | FP16 | 9.87 tok/s | 15GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | FP16 | 9.86 tok/s | 15GB |
| Qwen/Qwen2.5-0.5B-Instruct | FP16 | 9.86 tok/s | 11GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q8 | 9.84 tok/s | 20GB |
| tencent/HunyuanVideo-1.5 | FP16 | 9.78 tok/s | 16GB |
| unsloth/gpt-oss-20b-BF16 | Q8 | 9.78 tok/s | 20GB |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q4 | 9.78 tok/s | 34GB |
| Qwen/Qwen3-Embedding-0.6B | FP16 | 9.76 tok/s | 13GB |
| ibm-granite/granite-3.3-8b-instruct | FP16 | 9.76 tok/s | 17GB |
| Qwen/Qwen3-Embedding-8B | FP16 | 9.75 tok/s | 17GB |
| Qwen/Qwen3-0.6B | FP16 | 9.74 tok/s | 13GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | FP16 | 9.72 tok/s | 17GB |
| Qwen/Qwen2.5-Coder-32B-Instruct | Q4 | 9.69 tok/s | 17GB |
| zai-org/GLM-4.6-FP8 | FP16 | 9.68 tok/s | 15GB |
| Qwen/Qwen2.5-32B-Instruct | Q4 | 9.65 tok/s | 17GB |
| Qwen/Qwen2-0.5B-Instruct | FP16 | 9.63 tok/s | 11GB |
| dicta-il/dictalm2.0-instruct | FP16 | 9.60 tok/s | 15GB |
| Qwen/Qwen3-4B-Base | FP16 | 9.59 tok/s | 9GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | FP16 | 9.58 tok/s | 17GB |
| meta-llama/Meta-Llama-3-8B-Instruct | FP16 | 9.55 tok/s | 17GB |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | 9.54 tok/s | 34GB |
| Qwen/Qwen-Image-Edit-2509 | FP16 | 9.53 tok/s | 16GB |
| openai/gpt-oss-safeguard-20b | Q8 | 9.53 tok/s | 22GB |
| Qwen/Qwen2.5-7B-Instruct | FP16 | 9.50 tok/s | 15GB |
| black-forest-labs/FLUX.1-dev | FP16 | 9.49 tok/s | 16GB |
| meta-llama/Llama-Guard-3-8B | FP16 | 9.48 tok/s | 17GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | FP16 | 9.47 tok/s | 17GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q8 | 9.45 tok/s | 31GB |
| hmellor/tiny-random-LlamaForCausalLM | FP16 | 9.45 tok/s | 15GB |
| llamafactory/tiny-random-Llama-3 | FP16 | 9.44 tok/s | 15GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | FP16 | 9.41 tok/s | 15GB |
| Qwen/Qwen3-8B | FP16 | 9.39 tok/s | 17GB |
| microsoft/Phi-4-multimodal-instruct | FP16 | 9.38 tok/s | 15GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q8 | 9.36 tok/s | 31GB |
| microsoft/Phi-3.5-mini-instruct | FP16 | 9.36 tok/s | 15GB |
| microsoft/phi-2 | FP16 | 9.35 tok/s | 15GB |
| microsoft/phi-4 | FP16 | 9.34 tok/s | 15GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q4 | 9.32 tok/s | 16GB |
| meta-llama/Llama-2-7b-chat-hf | FP16 | 9.32 tok/s | 15GB |
| meta-llama/Meta-Llama-3-70B-Instruct | Q4 | 9.32 tok/s | 34GB |
| Qwen/Qwen3-8B-FP8 | FP16 | 9.31 tok/s | 17GB |
| rednote-hilab/dots.ocr | FP16 | 9.30 tok/s | 15GB |
| meta-llama/Llama-3.2-3B-Instruct | FP16 | 9.28 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | FP16 | 9.26 tok/s | 15GB |
| ibm-granite/granite-docling-258M | FP16 | 9.25 tok/s | 15GB |
| Qwen/Qwen2.5-32B-Instruct | Q4 | 9.22 tok/s | 16GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | FP16 | 9.21 tok/s | 15GB |
| mistralai/Mistral-7B-v0.1 | FP16 | 9.21 tok/s | 15GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q4 | 9.07 tok/s | 16GB |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q4 | 9.05 tok/s | 34GB |
| moonshotai/Kimi-Linear-48B-A3B-Instruct | Q4 | 9.04 tok/s | 25GB |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | 8.92 tok/s | 34GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q4 | 8.82 tok/s | 17GB |
| Qwen/QwQ-32B-Preview | Q4 | 8.72 tok/s | 17GB |
| deepseek-ai/deepseek-coder-33b-instruct | Q4 | 8.69 tok/s | 17GB |
| baichuan-inc/Baichuan-M2-32B | Q4 | 8.67 tok/s | 16GB |
| mistralai/Ministral-3-14B-Instruct-2512 | FP16 | 8.41 tok/s | 32GB |
| EssentialAI/rnj-1 | FP16 | 8.40 tok/s | 19GB |
| google/gemma-2-9b-it | FP16 | 8.19 tok/s | 20GB |
| OpenPipe/Qwen3-14B-Instruct | FP16 | 8.16 tok/s | 29GB |
| meta-llama/Llama-2-13b-chat-hf | FP16 | 8.02 tok/s | 27GB |
| Qwen/Qwen2.5-14B-Instruct | FP16 | 7.93 tok/s | 30GB |
| Qwen/Qwen2.5-14B | FP16 | 7.93 tok/s | 29GB |
| Qwen/Qwen3-14B | FP16 | 7.82 tok/s | 29GB |
| Qwen/Qwen3-14B-Base | FP16 | 7.56 tok/s | 29GB |
| meta-llama/Llama-3.1-8B-Instruct | FP16 | 7.54 tok/s | 17GB |
| microsoft/Phi-3-medium-128k-instruct | FP16 | 7.39 tok/s | 29GB |
| Qwen/Qwen2.5-14B-Instruct | FP16 | 7.37 tok/s | 29GB |
| NousResearch/Hermes-3-Llama-3.1-8B | FP16 | 7.36 tok/s | 17GB |
| ai-forever/ruGPT-3.5-13B | FP16 | 7.32 tok/s | 27GB |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | Q4 | 7.22 tok/s | 69GB |
| baichuan-inc/Baichuan-M2-32B | Q8 | 7.19 tok/s | 33GB |
| Qwen/Qwen2.5-32B-Instruct | Q8 | 7.17 tok/s | 33GB |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q8 | 7.14 tok/s | 68GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | FP16 | 7.08 tok/s | 19GB |
| Qwen/Qwen2.5-32B | Q8 | 7.00 tok/s | 33GB |
| Qwen/QwQ-32B-Preview | Q8 | 6.94 tok/s | 34GB |
| 01-ai/Yi-1.5-34B-Chat | Q8 | 6.91 tok/s | 35GB |
| meta-llama/Meta-Llama-3-70B-Instruct | Q8 | 6.75 tok/s | 68GB |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | 6.73 tok/s | 68GB |
| Qwen/Qwen2.5-Coder-32B-Instruct | Q8 | 6.72 tok/s | 34GB |
| Qwen/Qwen2.5-32B-Instruct | Q8 | 6.53 tok/s | 34GB |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q8 | 6.52 tok/s | 68GB |
| deepseek-ai/DeepSeek-V2.5 | Q8 | 6.52 tok/s | 656GB |
| moonshotai/Kimi-Linear-48B-A3B-Instruct | Q8 | 6.41 tok/s | 50GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q8 | 6.41 tok/s | 33GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q8 | 6.38 tok/s | 35GB |
| moonshotai/Kimi-K2-Thinking | Q8 | 6.37 tok/s | 978GB |
| codellama/CodeLlama-34b-hf | Q8 | 6.34 tok/s | 35GB |
| deepseek-ai/deepseek-coder-33b-instruct | Q8 | 6.28 tok/s | 34GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q8 | 6.15 tok/s | 33GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | FP16 | 6.11 tok/s | 61GB |
| mistralai/Mistral-Small-Instruct-2409 | FP16 | 6.11 tok/s | 46GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | FP16 | 6.11 tok/s | 61GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | FP16 | 6.09 tok/s | 61GB |
| Qwen/Qwen3-32B | Q8 | 5.99 tok/s | 33GB |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | 5.98 tok/s | 68GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | FP16 | 5.92 tok/s | 61GB |
| openai/gpt-oss-20b | FP16 | 5.90 tok/s | 41GB |
| mistralai/Mistral-Large-Instruct-2411 | Q4 | 5.86 tok/s | 60GB |
| Qwen/Qwen3-30B-A3B | FP16 | 5.84 tok/s | 61GB |
| Qwen/Qwen2.5-72B-Instruct | Q4 | 5.83 tok/s | 35GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q4 | 5.79 tok/s | 39GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | FP16 | 5.70 tok/s | 61GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q4 | 5.70 tok/s | 39GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | FP16 | 5.67 tok/s | 61GB |
| google/gemma-2-27b-it | FP16 | 5.66 tok/s | 56GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q4 | 5.63 tok/s | 39GB |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | 5.57 tok/s | 34GB |
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | Q4 | 5.53 tok/s | 34GB |
| NousResearch/Hermes-3-Llama-3.1-70B | Q4 | 5.50 tok/s | 34GB |
| unsloth/gpt-oss-20b-BF16 | FP16 | 5.49 tok/s | 41GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | FP16 | 5.44 tok/s | 61GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q4 | 5.41 tok/s | 44GB |
| openai/gpt-oss-safeguard-20b | FP16 | 5.38 tok/s | 44GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | FP16 | 5.28 tok/s | 41GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | FP16 | 5.26 tok/s | 41GB |
| Qwen/Qwen2.5-72B-Instruct | Q4 | 5.17 tok/s | 36GB |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | 5.12 tok/s | 34GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | FP16 | 5.09 tok/s | 61GB |
| Qwen/Qwen2.5-Math-72B-Instruct | Q4 | 5.06 tok/s | 36GB |
| openai/gpt-oss-120b | Q4 | 5.04 tok/s | 59GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q4 | 5.02 tok/s | 39GB |
| AI-MO/Kimina-Prover-72B | Q4 | 5.00 tok/s | 35GB |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | Q8 | 4.87 tok/s | 138GB |
| deepseek-ai/DeepSeek-Math-V2 | Q4 | 4.25 tok/s | 383GB |
| mistralai/Mistral-Large-Instruct-2411 | Q8 | 4.10 tok/s | 120GB |
| Qwen/Qwen2.5-72B-Instruct | Q8 | 4.07 tok/s | 70GB |
| NousResearch/Hermes-3-Llama-3.1-70B | Q8 | 4.00 tok/s | 69GB |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | 3.94 tok/s | 69GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q8 | 3.94 tok/s | 78GB |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | Q4 | 3.93 tok/s | 115GB |
| 01-ai/Yi-1.5-34B-Chat | FP16 | 3.91 tok/s | 70GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | FP16 | 3.90 tok/s | 70GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | FP16 | 3.89 tok/s | 66GB |
| meta-llama/Llama-3.1-70B-Instruct | FP16 | 3.89 tok/s | 137GB |
| moonshotai/Kimi-K2-Thinking | FP16 | 3.86 tok/s | 1956GB |
| Qwen/Qwen2.5-Math-72B-Instruct | Q8 | 3.84 tok/s | 71GB |
| baichuan-inc/Baichuan-M2-32B | FP16 | 3.81 tok/s | 66GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | FP16 | 3.79 tok/s | 66GB |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | 3.77 tok/s | 69GB |
| meta-llama/Llama-3.3-70B-Instruct | FP16 | 3.73 tok/s | 137GB |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | FP16 | 3.71 tok/s | 137GB |
| Qwen/Qwen2.5-32B | FP16 | 3.71 tok/s | 66GB |
| Qwen/Qwen2.5-32B-Instruct | FP16 | 3.69 tok/s | 66GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q8 | 3.67 tok/s | 78GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q8 | 3.64 tok/s | 88GB |
| meta-llama/Meta-Llama-3-70B-Instruct | FP16 | 3.62 tok/s | 137GB |
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | Q8 | 3.60 tok/s | 69GB |
| deepseek-ai/DeepSeek-V2.5 | FP16 | 3.58 tok/s | 1312GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q8 | 3.55 tok/s | 78GB |
| MiniMaxAI/MiniMax-M1-40k | Q4 | 3.54 tok/s | 255GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q8 | 3.53 tok/s | 78GB |
| codellama/CodeLlama-34b-hf | FP16 | 3.52 tok/s | 70GB |
| openai/gpt-oss-120b | Q8 | 3.49 tok/s | 117GB |
| deepseek-ai/deepseek-coder-33b-instruct | FP16 | 3.48 tok/s | 68GB |
| moonshotai/Kimi-Linear-48B-A3B-Instruct | FP16 | 3.45 tok/s | 101GB |
| AI-MO/Kimina-Prover-72B | Q8 | 3.45 tok/s | 70GB |
| Qwen/QwQ-32B-Preview | FP16 | 3.40 tok/s | 67GB |
| Qwen/Qwen3-32B | FP16 | 3.39 tok/s | 66GB |
| Qwen/Qwen2.5-Coder-32B-Instruct | FP16 | 3.39 tok/sEstimated Auto-generated benchmark | 67GB |
| Qwen/Qwen2.5-72B-Instruct | Q8 | 3.39 tok/sEstimated Auto-generated benchmark | 71GB |
| Qwen/Qwen2.5-32B-Instruct | FP16 | 3.38 tok/sEstimated Auto-generated benchmark | 67GB |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | FP16 | 3.29 tok/sEstimated Auto-generated benchmark | 137GB |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q4 | 3.04 tok/sEstimated Auto-generated benchmark | 378GB |
| MiniMaxAI/MiniMax-VL-01 | Q4 | 3.02 tok/sEstimated Auto-generated benchmark | 256GB |
| Qwen/Qwen3-235B-A22B | Q4 | 3.02 tok/sEstimated Auto-generated benchmark | 115GB |
| deepseek-ai/DeepSeek-Math-V2 | Q8 | 2.93 tok/sEstimated Auto-generated benchmark | 766GB |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | Q8 | 2.81 tok/sEstimated Auto-generated benchmark | 231GB |
| MiniMaxAI/MiniMax-M1-40k | Q8 | 2.42 tok/sEstimated Auto-generated benchmark | 510GB |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | FP16 | 2.39 tok/sEstimated Auto-generated benchmark | 275GB |
| MiniMaxAI/MiniMax-VL-01 | Q8 | 2.34 tok/sEstimated Auto-generated benchmark | 511GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct | FP16 | 2.23 tok/sEstimated Auto-generated benchmark | 156GB |
| Qwen/Qwen2.5-72B-Instruct | FP16 | 2.20 tok/sEstimated Auto-generated benchmark | 141GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | FP16 | 2.19 tok/sEstimated Auto-generated benchmark | 176GB |
| openai/gpt-oss-120b | FP16 | 2.18 tok/sEstimated Auto-generated benchmark | 235GB |
| Qwen/Qwen2.5-72B-Instruct | FP16 | 2.15 tok/sEstimated Auto-generated benchmark | 142GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | FP16 | 2.13 tok/sEstimated Auto-generated benchmark | 156GB |
| mistralai/Mistral-Large-Instruct-2411 | FP16 | 2.12 tok/sEstimated Auto-generated benchmark | 240GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | FP16 | 2.11 tok/sEstimated Auto-generated benchmark | 156GB |
| Qwen/Qwen3-235B-A22B | Q8 | 2.10 tok/sEstimated Auto-generated benchmark | 230GB |
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | FP16 | 2.10 tok/sEstimated Auto-generated benchmark | 138GB |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q8 | 2.06 tok/sEstimated Auto-generated benchmark | 755GB |
| NousResearch/Hermes-3-Llama-3.1-70B | FP16 | 2.00 tok/sEstimated Auto-generated benchmark | 138GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking | FP16 | 1.99 tok/sEstimated Auto-generated benchmark | 156GB |
| meta-llama/Llama-3.1-70B-Instruct | FP16 | 1.93 tok/sEstimated Auto-generated benchmark | 138GB |
| AI-MO/Kimina-Prover-72B | FP16 | 1.90 tok/sEstimated Auto-generated benchmark | 141GB |
| Qwen/Qwen2.5-Math-72B-Instruct | FP16 | 1.89 tok/sEstimated Auto-generated benchmark | 142GB |
| meta-llama/Llama-3.3-70B-Instruct | FP16 | 1.85 tok/sEstimated Auto-generated benchmark | 138GB |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | FP16 | 1.68 tok/sEstimated Auto-generated benchmark | 461GB |
| deepseek-ai/DeepSeek-Math-V2 | FP16 | 1.58 tok/sEstimated Auto-generated benchmark | 1532GB |
| Qwen/Qwen3-235B-A22B | FP16 | 1.27 tok/sEstimated Auto-generated benchmark | 460GB |
| MiniMaxAI/MiniMax-M1-40k | FP16 | 1.21 tok/sEstimated Auto-generated benchmark | 1020GB |
| MiniMaxAI/MiniMax-VL-01 | FP16 | 1.15 tok/sEstimated Auto-generated benchmark | 1021GB |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | FP16 | 1.12 tok/sEstimated Auto-generated benchmark | 1509GB |
Note: All speeds above are calculated estimates, not measured benchmarks; real-world results may vary.
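The site's exact calculator is not reproduced here, but single-stream decoding on Apple silicon is typically memory-bandwidth-bound, so a first-order estimate follows from how many weight bytes must be streamed per generated token. The sketch below is a minimal reconstruction under that assumption; the 200 GB/s figure is the M2 Pro's published memory bandwidth, while the efficiency and overhead factors are guesses, not the page's methodology.

```python
# First-order decode-speed and VRAM model for a dense LLM on a
# bandwidth-bound chip. Assumptions (NOT the site's published method):
#   - every weight is read once per generated token
#   - M2 Pro memory bandwidth: 200 GB/s, ~60% usable in practice (guess)

BYTES_PER_WEIGHT = {"Q4": 0.5, "Q8": 1.0, "FP16": 2.0}

def estimate_tok_per_sec(params_billions: float, quant: str,
                         bandwidth_gb_s: float = 200.0,
                         efficiency: float = 0.6) -> float:
    """tokens/sec ~= usable bandwidth / weight bytes streamed per token."""
    bytes_per_token = params_billions * 1e9 * BYTES_PER_WEIGHT[quant]
    return bandwidth_gb_s * 1e9 * efficiency / bytes_per_token

def estimate_vram_gb(params_billions: float, quant: str,
                     overhead: float = 1.2) -> float:
    """Weights plus an assumed ~20% for KV cache and runtime buffers."""
    return params_billions * BYTES_PER_WEIGHT[quant] * overhead

# e.g. a 70B model at Q8: ~1.7 tok/s and ~84GB -- the same order of
# magnitude as the 70B rows above, though not an exact match.
print(f"{estimate_tok_per_sec(70, 'Q8'):.2f} tok/s, "
      f"{estimate_vram_gb(70, 'Q8'):.0f}GB")
```

The next table restates these models with a fit verdict against the M2 Pro's 32GB of unified memory.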
| Model | Quantization | Verdict | Estimated speed | VRAM needed |
|---|---|---|---|---|
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q8 | Not supported | 2.06 tok/s | 755GB (have 32GB) |
| EssentialAI/rnj-1 | FP16 | Fits comfortably | 8.40 tok/s | 19GB (have 32GB) |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q4 | Not supported | 3.04 tok/s | 378GB (have 32GB) |
| EssentialAI/rnj-1 | Q8 | Fits comfortably | 15.41 tok/s | 10GB (have 32GB) |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | FP16 | Not supported | 1.12 tok/s | 1509GB (have 32GB) |
| EssentialAI/rnj-1 | Q4 | Fits comfortably | 20.43 tok/s | 5GB (have 32GB) |
| meta-llama/Llama-3.2-1B-Instruct | FP16 | Fits comfortably | 11.84 tok/s | 2GB (have 32GB) |
| meta-llama/Llama-3.2-1B-Instruct | Q8 | Fits comfortably | 23.58 tok/s | 1GB (have 32GB) |
| openai/gpt-oss-20b | Q8 | Fits comfortably | 10.60 tok/s | 20GB (have 32GB) |
| openai/gpt-oss-20b | Q4 | Fits comfortably | 15.55 tok/s | 10GB (have 32GB) |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | FP16 | Not supported | 3.89 tok/s | 66GB (have 32GB) |
| Qwen/Qwen2.5-14B | Q8 | Fits comfortably | 14.31 tok/s | 14GB (have 32GB) |
| google/gemma-2-2b-it | Q4 | Fits comfortably | 30.23 tok/s | 1GB (have 32GB) |
| openai/gpt-oss-120b | Q4 | Not supported | 5.04 tok/s | 59GB (have 32GB) |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | FP16 | Fits comfortably | 10.37 tok/s | 9GB (have 32GB) |
| dphn/dolphin-2.9.1-yi-1.5-34b | FP16 | Not supported | 3.90 tok/s | 70GB (have 32GB) |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q4 | Fits comfortably | 13.59 tok/s | 10GB (have 32GB) |
| Qwen/Qwen2.5-32B | Q8 | Not supported | 7.00 tok/s | 33GB (have 32GB) |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | FP16 | Fits comfortably | 10.58 tok/s | 9GB (have 32GB) |
| Qwen/Qwen2.5-1.5B | Q4 | Fits comfortably | 26.45 tok/s | 3GB (have 32GB) |
| Qwen/Qwen3-Embedding-4B | Q8 | Fits comfortably | 17.49 tok/s | 4GB (have 32GB) |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q8 | Not supported | 6.15 tok/s | 33GB (have 32GB) |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | Q4 | Not supported | 3.93 tok/s | 115GB (have 32GB) |
| Qwen/Qwen2.5-14B | Q4 | Fits comfortably | 20.48 tok/s | 7GB (have 32GB) |
| meta-llama/Llama-3.1-70B-Instruct | FP16 | Not supported | 1.93 tok/s | 138GB (have 32GB) |
| Qwen/Qwen2.5-14B | FP16 | Fits comfortably | 7.93 tok/s | 29GB (have 32GB) |
| google-t5/t5-3b | Q4 | Fits comfortably | 35.25 tok/s | 2GB (have 32GB) |
| meta-llama/Llama-Guard-3-1B | FP16 | Fits comfortably | 11.90 tok/s | 2GB (have 32GB) |
| codellama/CodeLlama-34b-hf | Q4 | Fits comfortably | 10.04 tok/s | 17GB (have 32GB) |
| codellama/CodeLlama-34b-hf | Q8 | Not supported | 6.34 tok/s | 35GB (have 32GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | FP16 | Fits comfortably | 10.36 tok/s | 11GB (have 32GB) |
| meta-llama/Llama-Guard-3-8B | Q4 | Fits comfortably | 26.80 tok/s | 4GB (have 32GB) |
| openai-community/gpt2-xl | FP16 | Fits comfortably | 9.95 tok/s | 15GB (have 32GB) |
| Qwen/Qwen2.5-3B-Instruct | Q4 | Fits comfortably | 31.42 tok/s | 2GB (have 32GB) |
| meta-llama/Llama-2-7b-chat-hf | Q4 | Fits comfortably | 26.42 tok/s | 4GB (have 32GB) |
| Qwen/Qwen2.5-32B | FP16 | Not supported | 3.71 tok/s | 66GB (have 32GB) |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | Fits comfortably | 28.58 tok/s | 2GB (have 32GB) |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q8 | Fits comfortably | 18.60 tok/s | 4GB (have 32GB) |
| Qwen/Qwen3-4B-Base | FP16 | Fits comfortably | 9.59 tok/s | 9GB (have 32GB) |
| skt/kogpt2-base-v2 | Q8 | Fits comfortably | 18.71 tok/s | 7GB (have 32GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q8 | Not supported | 3.67 tok/s | 78GB (have 32GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | FP16 | Not supported | 2.13 tok/s | 156GB (have 32GB) |
| bigcode/starcoder2-3b | Q4 | Fits comfortably | 32.68 tok/s | 2GB (have 32GB) |
| bigcode/starcoder2-3b | Q8 | Fits comfortably | 22.67 tok/s | 3GB (have 32GB) |
| lmsys/vicuna-7b-v1.5 | Q4 | Fits comfortably | 25.76 tok/s | 4GB (have 32GB) |
| Qwen/Qwen2.5-Math-72B-Instruct | FP16 | Not supported | 1.89 tok/s | 142GB (have 32GB) |
| microsoft/Phi-3-medium-128k-instruct | FP16 | Fits comfortably | 7.39 tok/s | 29GB (have 32GB) |
| meta-llama/Llama-3.3-70B-Instruct | FP16 | Not supported | 1.85 tok/s | 138GB (have 32GB) |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | Not supported | 5.57 tok/s | 34GB (have 32GB) |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | Not supported | 3.77 tok/s | 69GB (have 32GB) |
| Qwen/Qwen3-8B-FP8 | Q4 | Fits comfortably | 27.05 tok/s | 4GB (have 32GB) |
| Qwen/Qwen2.5-0.5B-Instruct | Q4 | Fits comfortably | 26.41 tok/s | 3GB (have 32GB) |
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | Fits comfortably | 19.27 tok/s | 5GB (have 32GB) |
| Qwen/Qwen2.5-0.5B-Instruct | FP16 | Fits comfortably | 9.86 tok/s | 11GB (have 32GB) |
| openai-community/gpt2-xl | Q8 | Fits comfortably | 19.24 tok/s | 7GB (have 32GB) |
| google-t5/t5-3b | Q8 | Fits comfortably | 24.78 tok/s | 3GB (have 32GB) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | Fits comfortably | 20.44 tok/s | 9GB (have 32GB) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | FP16 | Fits comfortably | 10.03 tok/s | 17GB (have 32GB) |
| Qwen/Qwen2.5-Math-1.5B | FP16 | Fits comfortably | 10.24 tok/s | 11GB (have 32GB) |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 26.57 tok/s | 4GB (have 32GB) |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q8 | Fits comfortably | 18.63 tok/s | 7GB (have 32GB) |
| trl-internal-testing/tiny-random-LlamaForCausalLM | FP16 | Fits comfortably | 10.16 tok/s | 15GB (have 32GB) |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | FP16 | Not supported | 5.09 tok/s | 61GB (have 32GB) |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q8 | Fits (tight) | 10.94 tok/s | 31GB (have 32GB) |
| codellama/CodeLlama-34b-hf | FP16 | Not supported | 3.52 tok/s | 70GB (have 32GB) |
| Qwen/Qwen3-8B-FP8 | Q8 | Fits comfortably | 18.31 tok/s | 9GB (have 32GB) |
| Qwen/Qwen3-8B-FP8 | FP16 | Fits comfortably | 9.31 tok/s | 17GB (have 32GB) |
| rinna/japanese-gpt-neox-small | Q8 | Fits comfortably | 19.22 tok/s | 7GB (have 32GB) |
| rinna/japanese-gpt-neox-small | FP16 | Fits comfortably | 10.03 tok/s | 15GB (have 32GB) |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q4 | Fits comfortably | 26.41 tok/s | 4GB (have 32GB) |
| openai/gpt-oss-120b | Q8 | Not supported | 3.49 tok/s | 117GB (have 32GB) |
| meta-llama/Llama-2-7b-chat-hf | Q8 | Fits comfortably | 18.77 tok/s | 7GB (have 32GB) |
| meta-llama/Llama-2-7b-chat-hf | FP16 | Fits comfortably | 9.32 tok/s | 15GB (have 32GB) |
| Qwen/Qwen2-7B-Instruct | Q4 | Fits comfortably | 28.22 tok/s | 4GB (have 32GB) |
| Qwen/Qwen2-7B-Instruct | Q8 | Fits comfortably | 17.34 tok/s | 7GB (have 32GB) |
| Qwen/Qwen2-7B-Instruct | FP16 | Fits comfortably | 11.13 tok/s | 15GB (have 32GB) |
| Qwen/Qwen3-4B-Base | Q4 | Fits comfortably | 28.85 tok/s | 2GB (have 32GB) |
| Qwen/Qwen3-4B-Base | Q8 | Fits comfortably | 18.92 tok/s | 4GB (have 32GB) |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q8 | Not supported | 6.38 tok/s | 35GB (have 32GB) |
| microsoft/Phi-3.5-mini-instruct | Q4 | Fits comfortably | 29.17 tok/s | 4GB (have 32GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | FP16 | Not supported | 6.09 tok/s | 61GB (have 32GB) |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q4 | Fits comfortably | 15.33 tok/s | 15GB (have 32GB) |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q8 | Fits (tight) | 9.45 tok/s | 31GB (have 32GB) |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | FP16 | Not supported | 6.11 tok/s | 61GB (have 32GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q4 | Fits comfortably | 16.22 tok/s | 15GB (have 32GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q8 | Fits (tight) | 10.72 tok/s | 31GB (have 32GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | FP16 | Not supported | 5.44 tok/s | 61GB (have 32GB) |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | Fits comfortably | 27.54 tok/s | 3GB (have 32GB) |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | Fits comfortably | 21.37 tok/s | 1GB (have 32GB) |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | FP16 | Fits comfortably | 13.02 tok/s | 2GB (have 32GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | Fits comfortably | 25.92 tok/s | 2GB (have 32GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 18.49 tok/s | 3GB (have 32GB) |
| microsoft/Phi-3-medium-128k-instruct | Q4 | Fits comfortably | 22.03 tok/s | 7GB (have 32GB) |
| microsoft/Phi-3-medium-128k-instruct | Q8 | Fits comfortably | 13.14 tok/s | 14GB (have 32GB) |
| google/gemma-2-9b-it | Q4 | Fits comfortably | 18.81 tok/s | 5GB (have 32GB) |
| NousResearch/Hermes-3-Llama-3.1-70B | Q4 | Not supported | 5.50 tok/s | 34GB (have 32GB) |
| NousResearch/Hermes-3-Llama-3.1-70B | Q8 | Not supported | 4.00 tok/s | 69GB (have 32GB) |
| NousResearch/Hermes-3-Llama-3.1-70B | FP16 | Not supported | 2.00 tok/s | 138GB (have 32GB) |
| NousResearch/Hermes-3-Llama-3.1-8B | Q4 | Fits comfortably | 18.75 tok/s | 4GB (have 32GB) |
| Qwen/Qwen2.5-0.5B | FP16 | Fits comfortably | 10.25 tok/s | 11GB (have 32GB) |
Note: All speeds above are calculated estimates, not measured benchmarks; real-world results may vary.
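The verdict column simply compares each row's VRAM requirement against the M2 Pro's 32GB of unified memory. The page does not publish the exact rule, so the sketch below is a hypothetical reconstruction whose thresholds happen to reproduce the rows shown (31GB reads as tight, 33GB and up as not supported).

```python
# Hypothetical reconstruction of the fit-verdict rule above.
# The thresholds are guesses; only the 32GB capacity comes from the page.
AVAILABLE_VRAM_GB = 32   # Apple M2 Pro unified memory
HEADROOM_GB = 2          # assumed margin before a fit counts as "tight"

def fit_verdict(required_gb: float) -> str:
    if required_gb > AVAILABLE_VRAM_GB:
        return "Not supported"
    if required_gb > AVAILABLE_VRAM_GB - HEADROOM_GB:
        return "Fits (tight)"
    return "Fits comfortably"

print(fit_verdict(31))  # Fits (tight)     -- matches the 31GB Qwen3-Coder Q8 row
print(fit_verdict(33))  # Not supported    -- matches the 33GB Qwen2.5-32B Q8 row
print(fit_verdict(10))  # Fits comfortably -- matches the small-model rows
```

In practice you would also want headroom for the KV cache at your target context length, which grows with context and is not reflected in the weight-only VRAM figures above.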
Related GPU guides for local inference: RTX 5070 · RTX 4060 Ti 16GB · RX 6800 XT · RTX 4070 Super · RTX 3080.