Quick Answer: The Apple M2 Ultra offers up to 192GB of unified memory, all of it addressable as VRAM, and ships in systems starting around $3,999. It delivers approximately 142 tokens/sec on deepseek-ai/DeepSeek-OCR at Q4 (estimated), and typically draws around 60W under sustained load.
The large unified memory pool makes this chip a strong fit for local AI workloads: even 70B-class models at Q4 fit comfortably in memory. Pair it with the right model quantization to hit your target tokens/sec; a minimal throughput-measurement sketch follows.
The M2 Ultra is not sold as a standalone part; it ships in the Mac Studio and Mac Pro, so compare prices on complete systems.
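To sanity-check figures like the ones below on your own machine, a runner with a Metal backend such as llama.cpp keeps the whole model in unified memory. Here is a minimal sketch using the llama-cpp-python bindings; the GGUF path and prompt are hypothetical placeholders, not part of the benchmark setup behind this table.

```python
# Minimal sketch: measure decode throughput on Apple Silicon with llama-cpp-python.
# Assumes `pip install llama-cpp-python` built with Metal support; the model path
# below is a hypothetical placeholder -- point it at any Q4 GGUF you have locally.
import time

from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-3.2-3b-instruct.Q4_K_M.gguf",  # hypothetical path
    n_gpu_layers=-1,  # offload all layers to the Metal GPU
    n_ctx=2048,
    verbose=False,
)

start = time.perf_counter()
out = llm("Explain unified memory in one paragraph.", max_tokens=128)
elapsed = time.perf_counter() - start

n_tokens = out["usage"]["completion_tokens"]
print(f"{n_tokens} tokens in {elapsed:.2f}s -> {n_tokens / elapsed:.1f} tok/s")
```

Generation speed varies with prompt length, context size, and sampling settings, so expect your measured tok/s to drift around the estimates in the table.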
All throughput figures below are auto-generated estimates derived from model size and quantization, not measured benchmarks.

| Model | Quantization | Tokens/sec (estimated) | VRAM used |
|---|---|---|---|
| deepseek-ai/DeepSeek-OCR | Q4 | 142.31 | 2GB |
| bigcode/starcoder2-3b | Q4 | 140.13 | 2GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 140.07 | 2GB |
| google/embeddinggemma-300m | Q4 | 139.84 | 1GB |
| google/gemma-2-2b-it | Q4 | 139.33 | 1GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | 139.27 | 2GB |
| apple/OpenELM-1_1B-Instruct | Q4 | 137.85 | 1GB |
| google-t5/t5-3b | Q4 | 137.07 | 2GB |
| LiquidAI/LFM2-1.2B | Q4 | 136.28 | 1GB |
| meta-llama/Llama-3.2-1B | Q4 | 135.85 | 1GB |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | 133.61 | 1GB |
| google/gemma-3-1b-it | Q4 | 132.98 | 1GB |
| google/gemma-2b | Q4 | 132.92 | 1GB |
| ibm-granite/granite-3.3-2b-instruct | Q4 | 131.49 | 1GB |
| tencent/HunyuanOCR | Q4 | 129.98 | 1GB |
| nari-labs/Dia2-2B | Q4 | 126.85 | 2GB |
| unsloth/Llama-3.2-1B-Instruct | Q4 | 126.73 | 1GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | 126.36 | 2GB |
| unsloth/gemma-3-1b-it | Q4 | 125.77 | 1GB |
| inference-net/Schematron-3B | Q4 | 125.20 | 2GB |
| Qwen/Qwen2.5-3B | Q4 | 124.55 | 2GB |
| ibm-research/PowerMoE-3b | Q4 | 123.10 | 2GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | 121.32 | 1GB |
| Qwen/Qwen2.5-3B-Instruct | Q4 | 120.03 | 2GB |
| meta-llama/Llama-Guard-3-1B | Q4 | 119.96 | 1GB |
| WeiboAI/VibeThinker-1.5B | Q4 | 119.78 | 1GB |
| facebook/sam3 | Q4 | 118.68 | 1GB |
| meta-llama/Llama-3.2-3B | Q4 | 118.40 | 2GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q4 | 118.01 | 4GB |
| Qwen/Qwen3-Embedding-4B | Q4 | 117.84 | 2GB |
| liuhaotian/llava-v1.5-7b | Q4 | 117.80 | 4GB |
| unsloth/Llama-3.2-3B-Instruct | Q4 | 117.74 | 2GB |
| HuggingFaceH4/zephyr-7b-beta | Q4 | 117.71 | 4GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q4 | 117.36 | 4GB |
| google-bert/bert-base-uncased | Q4 | 117.34 | 1GB |
| microsoft/DialoGPT-medium | Q4 | 117.29 | 4GB |
| microsoft/phi-2 | Q4 | 117.18 | 4GB |
| rednote-hilab/dots.ocr | Q4 | 116.65 | 4GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 116.62 | 2GB |
| allenai/OLMo-2-0425-1B | Q4 | 116.58 | 1GB |
| openai-community/gpt2-large | Q4 | 116.51 | 4GB |
| Qwen/Qwen3-0.6B | Q4 | 116.49 | 3GB |
| meta-llama/Llama-2-7b-chat-hf | Q4 | 116.45 | 4GB |
| Qwen/Qwen3-Reranker-0.6B | Q4 | 116.42 | 3GB |
| ibm-granite/granite-docling-258M | Q4 | 116.20 | 4GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | 115.93 | 4GB |
| Qwen/Qwen2.5-0.5B | Q4 | 115.82 | 3GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q4 | 115.81 | 4GB |
| sshleifer/tiny-gpt2 | Q4 | 115.41 | 4GB |
| ibm-granite/granite-3.3-8b-instruct | Q4 | 115.41 | 4GB |
| dicta-il/dictalm2.0-instruct | Q4 | 114.65 | 4GB |
| distilbert/distilgpt2 | Q4 | 114.58 | 4GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | 114.22 | 2GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | 114.18 | 4GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q4 | 114.10 | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q4 | 113.59 | 3GB |
| llamafactory/tiny-random-Llama-3 | Q4 | 113.06 | 4GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | 113.05 | 3GB |
| microsoft/Phi-4-mini-instruct | Q4 | 113.01 | 4GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q4 | 112.93 | 4GB |
| parler-tts/parler-tts-large-v1 | Q4 | 112.56 | 4GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | 112.41 | 4GB |
| skt/kogpt2-base-v2 | Q4 | 111.88 | 4GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 111.82 | 4GB |
| tencent/HunyuanVideo-1.5 | Q4 | 111.49 | 4GB |
| deepseek-ai/DeepSeek-R1 | Q4 | 111.36 | 4GB |
| Qwen/Qwen3-1.7B | Q4 | 111.29 | 4GB |
| bigscience/bloomz-560m | Q4 | 111.21 | 4GB |
| HuggingFaceTB/SmolLM2-135M | Q4 | 111.08 | 4GB |
| Qwen/Qwen-Image-Edit-2509 | Q4 | 111.05 | 4GB |
| google/gemma-3-270m-it | Q4 | 110.99 | 4GB |
| vikhyatk/moondream2 | Q4 | 110.95 | 4GB |
| Qwen/Qwen2-1.5B-Instruct | Q4 | 110.62 | 3GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | 110.58 | 4GB |
| Qwen/Qwen2.5-Math-1.5B | Q4 | 110.47 | 3GB |
| Qwen/Qwen3-4B | Q4 | 110.32 | 2GB |
| microsoft/Phi-3-mini-4k-instruct | Q4 | 110.27 | 4GB |
| GSAI-ML/LLaDA-8B-Base | Q4 | 110.03 | 4GB |
| Qwen/Qwen2-0.5B-Instruct | Q4 | 110.01 | 3GB |
| BSC-LT/salamandraTA-7b-instruct | Q4 | 109.81 | 4GB |
| black-forest-labs/FLUX.2-dev | Q4 | 109.65 | 4GB |
| openai-community/gpt2-xl | Q4 | 109.46 | 4GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q4 | 109.40 | 2GB |
| Qwen/Qwen3-8B-Base | Q4 | 108.74 | 4GB |
| zai-org/GLM-4.6-FP8 | Q4 | 108.61 | 4GB |
| deepseek-ai/DeepSeek-V3.1 | Q4 | 108.54 | 4GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q4 | 108.23 | 3GB |
| Qwen/Qwen3-Embedding-8B | Q4 | 107.88 | 4GB |
| Qwen/Qwen3-8B-FP8 | Q4 | 107.83 | 4GB |
| deepseek-ai/DeepSeek-R1-0528 | Q4 | 107.54 | 4GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q4 | 106.71 | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | 106.56 | 4GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q4 | 106.53 | 4GB |
| Tongyi-MAI/Z-Image-Turbo | Q4 | 106.49 | 4GB |
| microsoft/DialoGPT-small | Q4 | 106.37 | 4GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q4 | 106.00 | 3GB |
| Qwen/Qwen3-4B-Base | Q4 | 105.92 | 2GB |
| deepseek-ai/DeepSeek-V3 | Q4 | 105.88 | 4GB |
| Qwen/Qwen2.5-7B | Q4 | 105.80 | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q4 | 105.69 | 2GB |
| lmsys/vicuna-7b-v1.5 | Q4 | 105.68 | 4GB |
| MiniMaxAI/MiniMax-M2 | Q4 | 105.61 | 4GB |
| IlyaGusev/saiga_llama3_8b | Q4 | 104.89 | 4GB |
| Qwen/Qwen3-1.7B-Base | Q4 | 104.84 | 4GB |
| Qwen/Qwen2-7B-Instruct | Q4 | 104.82 | 4GB |
| allenai/Olmo-3-7B-Think | Q4 | 104.52 | 4GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q4 | 104.45 | 4GB |
| rinna/japanese-gpt-neox-small | Q4 | 104.19 | 4GB |
| microsoft/Phi-3-mini-128k-instruct | Q4 | 103.63 | 4GB |
| black-forest-labs/FLUX.1-dev | Q4 | 103.25 | 4GB |
| zai-org/GLM-4.5-Air | Q4 | 103.06 | 4GB |
| Qwen/Qwen3-Embedding-0.6B | Q4 | 102.79 | 3GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | 102.75 | 2GB |
| Qwen/Qwen2.5-1.5B | Q4 | 102.57 | 3GB |
| meta-llama/Llama-2-7b-hf | Q4 | 102.45 | 4GB |
| petals-team/StableBeluga2 | Q4 | 102.28 | 4GB |
| meta-llama/Llama-3.1-8B | Q4 | 102.05 | 4GB |
| EleutherAI/gpt-neo-125m | Q4 | 102.00 | 4GB |
| meta-llama/Llama-Guard-3-8B | Q4 | 101.69 | 4GB |
| hmellor/tiny-random-LlamaForCausalLM | Q4 | 101.65 | 4GB |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | 101.65 | 3GB |
| facebook/opt-125m | Q4 | 101.61 | 4GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q4 | 101.54 | 4GB |
| microsoft/VibeVoice-1.5B | Q4 | 101.51 | 3GB |
| Qwen/Qwen3-8B | Q4 | 101.30 | 4GB |
| meta-llama/Meta-Llama-3-8B | Q4 | 101.23 | 4GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q4 | 101.18 | 2GB |
| huggyllama/llama-7b | Q4 | 101.08 | 4GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q4 | 100.56 | 2GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q4 | 100.40 | 2GB |
| Qwen/Qwen3-0.6B-Base | Q4 | 100.34 | 3GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q4 | 100.15 | 4GB |
| HuggingFaceTB/SmolLM-135M | Q4 | 100.10 | 4GB |
| deepseek-ai/DeepSeek-V3-0324 | Q4 | 99.99 | 4GB |
| microsoft/Phi-3.5-vision-instruct | Q4 | 99.79 | 4GB |
| openai-community/gpt2 | Q4 | 99.74 | 4GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q4 | 99.63 | 4GB |
| EleutherAI/pythia-70m-deduped | Q4 | 99.57 | 4GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q4 | 99.49 | 4GB |
| microsoft/Phi-4-multimodal-instruct | Q4 | 99.47 | 4GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q4 | 99.45 | 4GB |
| mistralai/Mistral-7B-v0.1 | Q4 | 99.30 | 4GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q4 | 99.13 | 4GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | 99.02 | 3GB |
| Qwen/Qwen2-0.5B | Q4 | 98.95 | 3GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q4 | 98.87 | 2GB |
| meta-llama/Llama-3.2-1B | Q8 | 98.54 | 1GB |
| openai-community/gpt2-medium | Q4 | 98.13 | 4GB |
| microsoft/phi-4 | Q4 | 98.07 | 4GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q4 | 97.88 | 4GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | 97.73 | 3GB |
| GSAI-ML/LLaDA-8B-Instruct | Q4 | 97.61 | 4GB |
| facebook/sam3 | Q8 | 97.18 | 1GB |
| numind/NuExtract-1.5 | Q4 | 96.76 | 4GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 96.75 | 4GB |
| allenai/OLMo-2-0425-1B | Q8 | 96.50 | 1GB |
| google/gemma-2b | Q8 | 94.76 | 2GB |
| unsloth/Llama-3.2-3B-Instruct | Q8 | 94.70 | 3GB |
| google/embeddinggemma-300m | Q8 | 94.26 | 1GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | 93.88 | 1GB |
| google/gemma-2-2b-it | Q8 | 93.87 | 2GB |
| google-bert/bert-base-uncased | Q8 | 93.31 | 1GB |
| unsloth/gemma-3-1b-it | Q8 | 92.26 | 1GB |
| bigcode/starcoder2-3b | Q8 | 91.74 | 3GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | 91.59 | 3GB |
| unsloth/Llama-3.2-1B-Instruct | Q8 | 91.53 | 1GB |
| nari-labs/Dia2-2B | Q8 | 91.22 | 3GB |
| ibm-granite/granite-3.3-2b-instruct | Q8 | 91.01 | 2GB |
| ibm-research/PowerMoE-3b | Q8 | 90.75 | 3GB |
| tencent/HunyuanOCR | Q8 | 90.68 | 2GB |
| meta-llama/Llama-3.2-1B-Instruct | Q8 | 90.55 | 1GB |
| meta-llama/Llama-3.2-3B | Q8 | 88.55 | 3GB |
| google/gemma-3-1b-it | Q8 | 88.53 | 1GB |
| microsoft/Phi-3-medium-128k-instruct | Q4 | 88.50 | 7GB |
| Qwen/Qwen3-14B-Base | Q4 | 87.85 | 7GB |
| mistralai/Ministral-3-14B-Instruct-2512 | Q4 | 87.07 | 8GB |
| EssentialAI/rnj-1 | Q4 | 86.68 | 5GB |
| google-t5/t5-3b | Q8 | 86.26 | 3GB |
| google/gemma-2-9b-it | Q4 | 86.20 | 5GB |
| deepseek-ai/DeepSeek-OCR | Q8 | 86.12 | 4GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q4 | 86.05 | 5GB |
| Qwen/Qwen2.5-3B | Q8 | 85.92 | 3GB |
| meta-llama/Llama-Guard-3-1B | Q8 | 85.64 | 1GB |
| Qwen/Qwen2.5-14B | Q4 | 84.20 | 7GB |
| apple/OpenELM-1_1B-Instruct | Q8 | 83.87 | 1GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q8 | 82.67 | 3GB |
| Qwen/Qwen3-Embedding-8B | Q8 | 82.60 | 9GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q8 | 82.56 | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q8 | 82.41 | 5GB |
| facebook/opt-125m | Q8 | 82.37 | 7GB |
| LiquidAI/LFM2-1.2B | Q8 | 82.25 | 2GB |
| ibm-granite/granite-3.3-8b-instruct | Q8 | 82.23 | 9GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | 82.18 | 9GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q8 | 82.13 | 9GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q8 | 82.12 | 7GB |
| microsoft/Phi-4-mini-instruct | Q8 | 81.95 | 7GB |
| OpenPipe/Qwen3-14B-Instruct | Q4 | 81.81 | 7GB |
| inference-net/Schematron-3B | Q8 | 81.75 | 3GB |
| Qwen/Qwen2.5-3B-Instruct | Q8 | 81.50 | 3GB |
| WeiboAI/VibeThinker-1.5B | Q8 | 81.33 | 2GB |
| microsoft/DialoGPT-small | Q8 | 81.16 | 7GB |
| Qwen/Qwen3-14B | Q4 | 81.11 | 7GB |
| Qwen/Qwen2-0.5B-Instruct | Q8 | 81.11 | 5GB |
| meta-llama/Llama-Guard-3-8B | Q8 | 81.01 | 9GB |
| distilbert/distilgpt2 | Q8 | 80.83 | 7GB |
| ibm-granite/granite-docling-258M | Q8 | 80.62 | 7GB |
| meta-llama/Llama-2-7b-hf | Q8 | 80.55 | 7GB |
| meta-llama/Llama-2-13b-chat-hf | Q4 | 80.54 | 7GB |
| Qwen/Qwen2.5-Coder-1.5B | Q8 | 80.51 | 5GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q8 | 80.51 | 9GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q8 | 80.44 | 4GB |
| deepseek-ai/DeepSeek-R1 | Q8 | 80.36 | 7GB |
| google/gemma-3-270m-it | Q8 | 80.21 | 7GB |
| deepseek-ai/DeepSeek-V3.1 | Q8 | 80.19 | 7GB |
| Qwen/Qwen2.5-7B | Q8 | 79.79 | 7GB |
| microsoft/Phi-3.5-vision-instruct | Q8 | 79.72 | 7GB |
| Qwen/Qwen2.5-1.5B | Q8 | 79.35 | 5GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q8 | 79.17 | 7GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | 79.16 | 7GB |
| HuggingFaceH4/zephyr-7b-beta | Q8 | 78.86 | 7GB |
| microsoft/VibeVoice-1.5B | Q8 | 78.79 | 5GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | 78.73 | 4GB |
| microsoft/phi-4 | Q8 | 78.63 | 7GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | 78.42 | 7GB |
| black-forest-labs/FLUX.1-dev | Q8 | 78.39 | 8GB |
| MiniMaxAI/MiniMax-M2 | Q8 | 78.30 | 7GB |
| Qwen/Qwen2.5-0.5B | Q8 | 78.15 | 5GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | 78.04 | 9GB |
| Qwen/Qwen2-1.5B-Instruct | Q8 | 78.01 | 5GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q8 | 77.88 | 4GB |
| deepseek-ai/DeepSeek-V3 | Q8 | 77.75 | 7GB |
| Tongyi-MAI/Z-Image-Turbo | Q8 | 77.71 | 8GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q8 | 77.49 | 7GB |
| Qwen/Qwen3-4B | Q8 | 77.45 | 4GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | 77.29 | 4GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | 77.20 | 9GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | 77.14 | 5GB |
| numind/NuExtract-1.5 | Q8 | 77.14 | 7GB |
| deepseek-ai/DeepSeek-R1-0528 | Q8 | 77.13 | 7GB |
| microsoft/Phi-3-mini-128k-instruct | Q8 | 77.11 | 7GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q8 | 77.07 | 9GB |
| dicta-il/dictalm2.0-instruct | Q8 | 77.03 | 7GB |
| openai-community/gpt2-medium | Q8 | 76.80 | 7GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | 76.80 | 8GB |
| deepseek-ai/DeepSeek-V3-0324 | Q8 | 76.60 | 7GB |
| petals-team/StableBeluga2 | Q8 | 76.52 | 7GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q8 | 76.34 | 4GB |
| Qwen/Qwen2-7B-Instruct | Q8 | 76.24 | 7GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q8 | 76.11 | 5GB |
| Qwen/Qwen3-1.7B-Base | Q8 | 76.04 | 7GB |
| Qwen/Qwen3-8B-FP8 | Q8 | 76.00 | 9GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q8 | 75.92 | 5GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | 75.68 | 3GB |
| lmsys/vicuna-7b-v1.5 | Q8 | 75.54 | 7GB |
| huggyllama/llama-7b | Q8 | 75.14 | 7GB |
| microsoft/Phi-3-mini-4k-instruct | Q8 | 74.96 | 7GB |
| black-forest-labs/FLUX.2-dev | Q8 | 74.91 | 8GB |
| microsoft/phi-2 | Q8 | 74.89 | 7GB |
| Qwen/Qwen3-8B-Base | Q8 | 74.74 | 9GB |
| openai-community/gpt2-xl | Q8 | 74.58 | 7GB |
| tencent/HunyuanVideo-1.5 | Q8 | 74.46 | 8GB |
| Qwen/Qwen3-0.6B-Base | Q8 | 74.38 | 6GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | 74.33 | 7GB |
| meta-llama/Llama-3.1-8B | Q8 | 74.30 | 9GB |
| zai-org/GLM-4.6-FP8 | Q8 | 74.13 | 7GB |
| GSAI-ML/LLaDA-8B-Instruct | Q8 | 74.13 | 9GB |
| microsoft/DialoGPT-medium | Q8 | 74.10 | 7GB |
| meta-llama/Llama-2-7b-chat-hf | Q8 | 74.09 | 7GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | 73.71 | 8GB |
| Qwen/Qwen3-Reranker-0.6B | Q8 | 73.62 | 6GB |
| meta-llama/Meta-Llama-3-8B | Q8 | 73.60 | 9GB |
| IlyaGusev/saiga_llama3_8b | Q8 | 73.52 | 9GB |
| sshleifer/tiny-gpt2 | Q8 | 73.51 | 7GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q8 | 73.32 | 4GB |
| HuggingFaceTB/SmolLM2-135M | Q8 | 73.31 | 7GB |
| EleutherAI/gpt-neo-125m | Q8 | 73.25 | 7GB |
| parler-tts/parler-tts-large-v1 | Q8 | 73.24 | 7GB |
| Qwen/Qwen3-1.7B | Q8 | 73.14 | 7GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q8 | 73.14 | 7GB |
| allenai/Olmo-3-7B-Think | Q8 | 73.05 | 8GB |
| mistralai/Mistral-7B-v0.1 | Q8 | 73.05 | 7GB |
| hmellor/tiny-random-LlamaForCausalLM | Q8 | 72.81 | 7GB |
| bigscience/bloomz-560m | Q8 | 72.79 | 7GB |
| NousResearch/Hermes-3-Llama-3.1-8B | Q4 | 72.78 | 4GB |
| Qwen/Qwen3-Embedding-0.6B | Q8 | 72.77 | 6GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | 72.73 | 7GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q8 | 72.73 | 7GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | 72.71 | 4GB |
| ai-forever/ruGPT-3.5-13B | Q4 | 72.68 | 7GB |
| llamafactory/tiny-random-Llama-3 | Q8 | 72.49 | 7GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q8 | 72.46 | 7GB |
| Qwen/Qwen2.5-Math-1.5B | Q8 | 72.36 | 5GB |
| Qwen/Qwen-Image-Edit-2509 | Q8 | 72.12 | 8GB |
| Qwen/Qwen3-0.6B | Q8 | 72.12 | 6GB |
| EleutherAI/pythia-70m-deduped | Q8 | 71.62 | 7GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q8 | 71.55 | 7GB |
| GSAI-ML/LLaDA-8B-Base | Q8 | 71.48 | 9GB |
| Qwen/Qwen3-4B-Base | Q8 | 71.27 | 4GB |
| rinna/japanese-gpt-neox-small | Q8 | 70.99 | 7GB |
| liuhaotian/llava-v1.5-7b | Q8 | 70.92 | 7GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | 70.78 | 7GB |
| openai-community/gpt2 | Q8 | 70.74 | 7GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q8 | 70.69 | 4GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q8 | 70.48 | 4GB |
| microsoft/Phi-4-multimodal-instruct | Q8 | 70.43 | 7GB |
| openai-community/gpt2-large | Q8 | 70.36 | 7GB |
| BSC-LT/salamandraTA-7b-instruct | Q8 | 69.97 | 7GB |
| Qwen/Qwen3-Embedding-4B | Q8 | 69.89 | 4GB |
| HuggingFaceTB/SmolLM-135M | Q8 | 69.13 | 7GB |
| rednote-hilab/dots.ocr | Q8 | 69.09 | 7GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q8 | 69.03 | 9GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q8 | 68.77 | 5GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q8 | 68.75 | 9GB |
| Qwen/Qwen3-8B | Q8 | 68.51 | 9GB |
| skt/kogpt2-base-v2 | Q8 | 68.44 | 7GB |
| Qwen/Qwen2-0.5B | Q8 | 68.08 | 5GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q8 | 68.01 | 9GB |
| zai-org/GLM-4.5-Air | Q8 | 67.76 | 7GB |
| vikhyatk/moondream2 | Q8 | 67.70 | 7GB |
| openai/gpt-oss-safeguard-20b | Q4 | 64.93 | 11GB |
| Qwen/Qwen3-30B-A3B | Q4 | 64.10 | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q4 | 64.07 | 15GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q4 | 62.65 | 10GB |
| openai/gpt-oss-20b | Q4 | 62.45 | 10GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q4 | 62.44 | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q4 | 62.06 | 15GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q4 | 61.70 | 15GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | 61.27 | 9GB |
| google/gemma-2-27b-it | Q4 | 61.01 | 14GB |
| microsoft/Phi-3-medium-128k-instruct | Q8 | 60.45 | 14GB |
| meta-llama/Llama-2-13b-chat-hf | Q8 | 60.31 | 13GB |
| unsloth/gpt-oss-20b-BF16 | Q4 | 60.26 | 10GB |
| Qwen/Qwen2.5-14B | Q8 | 60.24 | 14GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q4 | 59.84 | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q4 | 59.54 | 15GB |
| mistralai/Mistral-Small-Instruct-2409 | Q4 | 58.36 | 11GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q4 | 57.80 | 15GB |
| OpenPipe/Qwen3-14B-Instruct | Q8 | 56.85 | 14GB |
| google/gemma-2-9b-it | Q8 | 56.33 | 10GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q4 | 54.08 | 10GB |
| NousResearch/Hermes-3-Llama-3.1-8B | Q8 | 54.02 | 9GB |
| ibm-research/PowerMoE-3b | FP16 | 53.83 | 6GB |
| bigcode/starcoder2-3b | FP16 | 53.79 | 6GB |
| Qwen/Qwen3-14B-Base | Q8 | 53.74 | 14GB |
| Qwen/Qwen2.5-3B | FP16 | 53.63 | 6GB |
| mistralai/Ministral-3-14B-Instruct-2512 | Q8 | 53.56 | 16GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | FP16 | 53.49 | 2GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q4 | 53.38 | 15GB |
| ai-forever/ruGPT-3.5-13B | Q8 | 53.34 | 13GB |
| Qwen/Qwen2.5-14B-Instruct | Q8 | 53.12 | 15GB |
| Qwen/Qwen2.5-3B-Instruct | FP16 | 53.07 | 6GB |
| facebook/sam3 | FP16 | 52.90 | 2GB |
| google/gemma-3-1b-it | FP16 | 52.80 | 2GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q8 | 52.69 | 10GB |
| Qwen/Qwen3-14B | Q8 | 52.13 | 14GB |
| EssentialAI/rnj-1 | Q8 | 51.89 | 10GB |
| LiquidAI/LFM2-1.2B | FP16 | 51.87 | 4GB |
| deepseek-ai/DeepSeek-OCR | FP16 | 51.73 | 7GB |
| meta-llama/Llama-3.2-3B | FP16 | 51.41 | 6GB |
| Qwen/Qwen2.5-14B-Instruct | Q8 | 51.19 | 14GB |
| nari-labs/Dia2-2B | FP16 | 51.16 | 5GB |
| meta-llama/Llama-3.2-3B-Instruct | FP16 | 51.07 | 6GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | FP16 | 51.02 | 6GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | FP16 | 50.43 | 6GB |
| apple/OpenELM-1_1B-Instruct | FP16 | 50.30 | 2GB |
| meta-llama/Llama-3.2-1B | FP16 | 49.41 | 2GB |
| meta-llama/Llama-3.2-1B-Instruct | FP16 | 49.19 | 2GB |
| unsloth/Llama-3.2-3B-Instruct | FP16 | 49.18 | 6GB |
| inference-net/Schematron-3B | FP16 | 48.84 | 6GB |
| google-t5/t5-3b | FP16 | 48.67 | 6GB |
| WeiboAI/VibeThinker-1.5B | FP16 | 48.37 | 4GB |
| google/gemma-2b | FP16 | 48.19 | 4GB |
| google-bert/bert-base-uncased | FP16 | 47.53 | 1GB |
| google/gemma-2-2b-it | FP16 | 47.18 | 4GB |
| meta-llama/Llama-Guard-3-1B | FP16 | 46.75 | 2GB |
| unsloth/gemma-3-1b-it | FP16 | 46.48 | 2GB |
| allenai/OLMo-2-0425-1B | FP16 | 45.88 | 2GB |
| ibm-granite/granite-3.3-2b-instruct | FP16 | 45.50 | 4GB |
| google/embeddinggemma-300m | FP16 | 45.48 | 1GB |
| tencent/HunyuanOCR | FP16 | 45.47 | 3GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | FP16 | 44.83 | 9GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q8 | 44.82 | 31GB |
| mistralai/Mistral-7B-Instruct-v0.1 | FP16 | 44.75 | 15GB |
| unsloth/gpt-oss-20b-BF16 | Q8 | 44.71 | 20GB |
| HuggingFaceH4/zephyr-7b-beta | FP16 | 44.70 | 15GB |
| Qwen/Qwen3-0.6B-Base | FP16 | 44.69 | 13GB |
| meta-llama/Llama-3.1-8B-Instruct | FP16 | 44.69 | 17GB |
| allenai/Olmo-3-7B-Think | FP16 | 44.66 | 16GB |
| microsoft/VibeVoice-1.5B | FP16 | 44.65 | 11GB |
| skt/kogpt2-base-v2 | FP16 | 44.60 | 15GB |
| microsoft/Phi-3-mini-4k-instruct | FP16 | 44.54 | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q8 | 44.41 | 31GB |
| unsloth/Llama-3.2-1B-Instruct | FP16 | 44.22 | 2GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | FP16 | 44.19 | 15GB |
| meta-llama/Llama-3.1-8B | FP16 | 44.14 | 17GB |
| Qwen/Qwen3-Embedding-4B | FP16 | 44.09 | 9GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | FP16 | 44.01 | 15GB |
| zai-org/GLM-4.5-Air | FP16 | 44.00 | 15GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | FP16 | 43.87 | 9GB |
| Qwen/Qwen2-1.5B-Instruct | FP16 | 43.85 | 11GB |
| mistralai/Mistral-7B-v0.1 | FP16 | 43.66 | 15GB |
| lmsys/vicuna-7b-v1.5 | FP16 | 43.61 | 15GB |
| microsoft/phi-2 | FP16 | 43.58 | 15GB |
| meta-llama/Llama-3.2-3B-Instruct | FP16 | 43.51 | 7GB |
| deepseek-ai/DeepSeek-V3 | FP16 | 43.44 | 15GB |
| Qwen/Qwen3-Reranker-0.6B | FP16 | 43.42 | 13GB |
| Qwen/Qwen3-4B-Instruct-2507 | FP16 | 43.21 | 9GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q8 | 43.18 | 31GB |
| openai/gpt-oss-safeguard-20b | Q8 | 43.15 | 22GB |
| microsoft/Phi-3.5-mini-instruct | FP16 | 43.14 | 8GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q8 | 43.09 | 31GB |
| Qwen/Qwen2-0.5B-Instruct | FP16 | 42.96 | 11GB |
| Qwen/Qwen2.5-Math-1.5B | FP16 | 42.94 | 11GB |
| huggyllama/llama-7b | FP16 | 42.93 | 15GB |
| microsoft/DialoGPT-small | FP16 | 42.91 | 15GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | FP16 | 42.79 | 11GB |
| deepseek-ai/DeepSeek-R1-0528 | FP16 | 42.77 | 15GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q8 | 42.77 | 31GB |
| Qwen/Qwen2.5-7B-Instruct | FP16 | 42.75 | 15GB |
| Gensyn/Qwen2.5-0.5B-Instruct | FP16 | 42.70 | 11GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q8 | 42.69 | 20GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | FP16 | 42.68 | 11GB |
| Qwen/Qwen3-8B-FP8 | FP16 | 42.62 | 17GB |
| dicta-il/dictalm2.0-instruct | FP16 | 42.51 | 15GB |
| meta-llama/Meta-Llama-3-8B-Instruct | FP16 | 42.47 | 17GB |
| rinna/japanese-gpt-neox-small | FP16 | 42.45 | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q8 | 42.43 | 31GB |
| BSC-LT/salamandraTA-7b-instruct | FP16 | 42.39 | 15GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q8 | 42.38 | 20GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | FP16 | 42.36 | 17GB |
| Qwen/Qwen3-4B | FP16 | 42.36 | 9GB |
| microsoft/Phi-4-mini-instruct | FP16 | 42.27 | 15GB |
| GSAI-ML/LLaDA-8B-Base | FP16 | 42.23 | 17GB |
| Qwen/Qwen3-Embedding-8B | FP16 | 42.16 | 17GB |
| hmellor/tiny-random-LlamaForCausalLM | FP16 | 42.13 | 15GB |
| microsoft/DialoGPT-medium | FP16 | 42.05 | 15GB |
| HuggingFaceTB/SmolLM-135M | FP16 | 42.01 | 15GB |
| meta-llama/Llama-Guard-3-8B | FP16 | 41.98 | 17GB |
| llamafactory/tiny-random-Llama-3 | FP16 | 41.98 | 15GB |
| black-forest-labs/FLUX.1-dev | FP16 | 41.91 | 16GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | FP16 | 41.87 | 17GB |
| Qwen/Qwen2.5-1.5B | FP16 | 41.79 | 11GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | FP16 | 41.73 | 17GB |
| black-forest-labs/FLUX.2-dev | FP16 | 41.52 | 16GB |
| microsoft/Phi-3.5-mini-instruct | FP16 | 41.33 | 15GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | FP16 | 41.29 | 15GB |
| microsoft/Phi-4-multimodal-instruct | FP16 | 41.20 | 15GB |
| mistralai/Mistral-Small-Instruct-2409 | Q8 | 41.18 | 23GB |
| Qwen/Qwen2.5-0.5B-Instruct | FP16 | 41.16 | 11GB |
| Qwen/Qwen2.5-32B | Q4 | 41.16 | 16GB |
| Qwen/Qwen3-4B-Thinking-2507 | FP16 | 41.15 | 9GB |
| petals-team/StableBeluga2 | FP16 | 41.14 | 15GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q8 | 41.12 | 31GB |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q4 | 40.93 | 34GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | FP16 | 40.91 | 15GB |
| codellama/CodeLlama-34b-hf | Q4 | 40.89 | 17GB |
| Qwen/Qwen3-32B | Q4 | 40.70 | 16GB |
| meta-llama/Llama-2-7b-chat-hf | FP16 | 40.63 | 15GB |
| Qwen/Qwen2.5-7B-Instruct | FP16 | 40.45 | 16GB |
| ibm-granite/granite-3.3-8b-instruct | FP16 | 40.40 | 17GB |
| Qwen/Qwen2.5-32B-Instruct | Q4 | 40.39 | 17GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q4 | 40.21 | 16GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | FP16 | 40.16 | 17GB |
| parler-tts/parler-tts-large-v1 | FP16 | 40.15 | 15GB |
| deepseek-ai/DeepSeek-V3-0324 | FP16 | 40.14 | 15GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | FP16 | 40.10 | 9GB |
| deepseek-ai/DeepSeek-R1 | FP16 | 40.07 | 15GB |
| openai-community/gpt2-xl | FP16 | 40.06 | 15GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | FP16 | 40.02 | 15GB |
| Qwen/Qwen3-30B-A3B | Q8 | 40.00 | 31GB |
| Qwen/Qwen2.5-0.5B | FP16 | 39.97 | 11GB |
| Qwen/Qwen3-1.7B | FP16 | 39.87 | 15GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | FP16 | 39.83 | 15GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | FP16 | 39.73 | 15GB |
| Tongyi-MAI/Z-Image-Turbo | FP16 | 39.72 | 16GB |
| rednote-hilab/dots.ocr | FP16 | 39.62 | 15GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | FP16 | 39.49 | 9GB |
| Qwen/Qwen2-7B-Instruct | FP16 | 39.45 | 15GB |
| GSAI-ML/LLaDA-8B-Instruct | FP16 | 39.36 | 17GB |
| meta-llama/Llama-2-7b-hf | FP16 | 39.25 | 15GB |
| swiss-ai/Apertus-8B-Instruct-2509 | FP16 | 39.23 | 17GB |
| HuggingFaceTB/SmolLM2-135M | FP16 | 39.21 | 15GB |
| Qwen/Qwen3-Embedding-0.6B | FP16 | 39.19 | 13GB |
| vikhyatk/moondream2 | FP16 | 39.09 | 15GB |
| openai/gpt-oss-20b | Q8 | 39.08 | 20GB |
| Qwen/Qwen-Image-Edit-2509 | FP16 | 39.01 | 16GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | FP16 | 38.99 | 17GB |
| Qwen/Qwen3-0.6B | FP16 | 38.95 | 13GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | FP16 | 38.92 | 9GB |
| zai-org/GLM-4.6-FP8 | FP16 | 38.92 | 15GB |
| Qwen/Qwen2.5-7B | FP16 | 38.87 | 15GB |
| sshleifer/tiny-gpt2 | FP16 | 38.80 | 15GB |
| 01-ai/Yi-1.5-34B-Chat | Q4 | 38.79 | 18GB |
| Qwen/Qwen3-4B-Base | FP16 | 38.63 | 9GB |
| meta-llama/Meta-Llama-3-70B-Instruct | Q4 | 38.55 | 34GB |
| moonshotai/Kimi-K2-Thinking | Q4 | 38.50 | 489GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | FP16 | 38.37 | 17GB |
| meta-llama/Meta-Llama-3-8B | FP16 | 38.36 | 17GB |
| ibm-granite/granite-docling-258M | FP16 | 38.36 | 15GB |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | 38.35 | 34GB |
| microsoft/Phi-3.5-vision-instruct | FP16 | 38.31 | 15GB |
| distilbert/distilgpt2 | FP16 | 38.30 | 15GB |
| microsoft/Phi-3-mini-128k-instruct | FP16 | 38.28 | 15GB |
| openai-community/gpt2-large | FP16 | 38.12 | 15GB |
| google/gemma-2-27b-it | Q8 | 38.11 | 28GB |
| Qwen/Qwen2.5-1.5B-Instruct | FP16 | 38.11 | 11GB |
| facebook/opt-125m | FP16 | 38.06 | 15GB |
| microsoft/phi-4 | FP16 | 37.85 | 15GB |
| deepseek-ai/DeepSeek-V2.5 | Q4 | 37.85 | 328GB |
| Qwen/QwQ-32B-Preview | Q4 | 37.84 | 17GB |
| EleutherAI/gpt-neo-125m | FP16 | 37.76 | 15GB |
| Qwen/Qwen3-8B | FP16 | 37.73 | 17GB |
| Qwen/Qwen2-0.5B | FP16 | 37.73 | 11GB |
| EleutherAI/pythia-70m-deduped | FP16 | 37.73 | 15GB |
| deepseek-ai/deepseek-coder-33b-instruct | Q4 | 37.46 | 17GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q8 | 37.42 | 31GB |
| bigscience/bloomz-560m | FP16 | 37.38 | 15GB |
| Qwen/Qwen3-8B-Base | FP16 | 37.38 | 17GB |
| mistralai/Mistral-7B-Instruct-v0.2 | FP16 | 37.24 | 15GB |
| IlyaGusev/saiga_llama3_8b | FP16 | 37.19 | 17GB |
| Qwen/Qwen2.5-Coder-1.5B | FP16 | 37.17 | 11GB |
| deepseek-ai/DeepSeek-V3.1 | FP16 | 37.13 | 15GB |
| liuhaotian/llava-v1.5-7b | FP16 | 37.12 | 15GB |
| tencent/HunyuanVideo-1.5 | FP16 | 37.09 | 16GB |
| openai-community/gpt2-medium | FP16 | 36.96 | 15GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | FP16 | 36.95 | 15GB |
| Qwen/Qwen3-1.7B-Base | FP16 | 36.93 | 15GB |
| numind/NuExtract-1.5 | FP16 | 36.85 | 15GB |
| google/gemma-3-270m-it | FP16 | 36.84 | 15GB |
| openai-community/gpt2 | FP16 | 36.80 | 15GB |
| MiniMaxAI/MiniMax-M2 | FP16 | 36.79 | 15GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q4 | 36.68 | 17GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q4 | 36.67 | 16GB |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | 35.65 | 34GB |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q4 | 35.11 | 34GB |
| Qwen/Qwen2.5-32B-Instruct | Q4 | 34.89 | 16GB |
| moonshotai/Kimi-Linear-48B-A3B-Instruct | Q4 | 34.70 | 25GB |
| Qwen/Qwen2.5-Coder-32B-Instruct | Q4 | 34.43 | 17GB |
| baichuan-inc/Baichuan-M2-32B | Q4 | 34.36 | 16GB |
| Qwen/Qwen3-14B | FP16 | 33.18 | 29GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | FP16 | 32.58 | 19GB |
| mistralai/Ministral-3-14B-Instruct-2512 | FP16 | 31.81 | 32GB |
| meta-llama/Llama-3.1-8B-Instruct | FP16 | 31.52 | 17GB |
| meta-llama/Llama-2-13b-chat-hf | FP16 | 31.25 | 27GB |
| EssentialAI/rnj-1 | FP16 | 29.62 | 19GB |
| Qwen/Qwen3-14B-Base | FP16 | 29.40 | 29GB |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | Q4 | 29.34 | 69GB |
| NousResearch/Hermes-3-Llama-3.1-8B | FP16 | 29.22 | 17GB |
| OpenPipe/Qwen3-14B-Instruct | FP16 | 28.86 | 29GB |
| microsoft/Phi-3-medium-128k-instruct | FP16 | 28.80 | 29GB |
| ai-forever/ruGPT-3.5-13B | FP16 | 28.62 | 27GB |
| Qwen/Qwen2.5-14B-Instruct | FP16 | 28.55 | 29GB |
| Qwen/Qwen2.5-14B | FP16 | 28.50 | 29GB |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | 28.43 | 68GB |
| 01-ai/Yi-1.5-34B-Chat | Q8 | 28.25 | 35GB |
| codellama/CodeLlama-34b-hf | Q8 | 28.13 | 35GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q8 | 28.01 | 33GB |
| moonshotai/Kimi-K2-Thinking | Q8 | 27.94 | 978GB |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q8 | 27.89 | 68GB |
| Qwen/Qwen2.5-14B-Instruct | FP16 | 27.68 | 30GB |
| google/gemma-2-9b-it | FP16 | 27.57 | 20GB |
| baichuan-inc/Baichuan-M2-32B | Q8 | 27.51 | 33GB |
| Qwen/Qwen2.5-32B-Instruct | Q8 | 27.44 | 34GB |
| Qwen/Qwen2.5-Coder-32B-Instruct | Q8 | 26.79 | 34GB |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | 26.66 | 68GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q8 | 26.60 | 35GB |
| moonshotai/Kimi-Linear-48B-A3B-Instruct | Q8 | 26.43 | 50GB |
| meta-llama/Meta-Llama-3-70B-Instruct | Q8 | 26.23 | 68GB |
| Qwen/Qwen2.5-32B | Q8 | 25.96 | 33GB |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q8 | 25.95 | 68GB |
| Qwen/Qwen2.5-32B-Instruct | Q8 | 25.52 | 33GB |
| Qwen/QwQ-32B-Preview | Q8 | 24.74 | 34GB |
| deepseek-ai/DeepSeek-V2.5 | Q8 | 24.59 | 656GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q8 | 24.14 | 33GB |
| deepseek-ai/deepseek-coder-33b-instruct | Q8 | 24.10 | 34GB |
| openai/gpt-oss-20b | FP16 | 23.84 | 41GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | FP16 | 23.83 | 41GB |
| Qwen/Qwen3-32B | Q8 | 23.79 | 33GB |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | 23.59 | 34GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | FP16 | 23.54 | 61GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q4 | 23.48 | 39GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | FP16 | 23.39 | 41GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | FP16 | 23.31 | 61GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | FP16 | 23.08 | 61GB |
| google/gemma-2-27b-it | FP16 | 22.94 | 56GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q4 | 22.72 | 39GB |
| mistralai/Mistral-Small-Instruct-2409 | FP16 | 22.45 | 46GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | FP16 | 21.92 | 61GB |
| AI-MO/Kimina-Prover-72B | Q4 | 21.71 | 35GB |
| openai/gpt-oss-safeguard-20b | FP16 | 21.70 | 44GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | FP16 | 21.62 | 61GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | FP16 | 21.39 | 61GB |
| unsloth/gpt-oss-20b-BF16 | FP16 | 21.36 | 41GB |
| Qwen/Qwen3-30B-A3B | FP16 | 21.36 | 61GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q4 | 21.26 | 44GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q4 | 21.02 | 39GB |
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | Q4 | 20.94 | 34GB |
| openai/gpt-oss-120b | Q4 | 20.58 | 59GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | FP16 | 20.46 | 61GB |
| Qwen/Qwen2.5-Math-72B-Instruct | Q4 | 20.32 | 36GB |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | Q8 | 20.30 | 138GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | FP16 | 20.22 | 61GB |
| Qwen/Qwen2.5-72B-Instruct | Q4 | 20.19 | 36GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q4 | 20.10 | 39GB |
| NousResearch/Hermes-3-Llama-3.1-70B | Q4 | 19.88 | 34GB |
| mistralai/Mistral-Large-Instruct-2411 | Q4 | 19.85 | 60GB |
| Qwen/Qwen2.5-72B-Instruct | Q4 | 19.70 | 35GB |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | 19.67 | 34GB |
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | Q8 | 16.49 | 69GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q8 | 16.08 | 78GB |
| Qwen/Qwen2.5-72B-Instruct | Q8 | 15.92 | 70GB |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | Q4 | 15.84 | 115GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q8 | 15.81 | 78GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q8 | 15.78 | 78GB |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | FP16 | 15.57 | 137GB |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | FP16 | 15.54 | 137GB |
| moonshotai/Kimi-Linear-48B-A3B-Instruct | FP16 | 15.26 | 101GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | FP16 | 15.19 | 70GB |
| Qwen/Qwen2.5-32B-Instruct | FP16 | 15.17 | 67GB |
| deepseek-ai/DeepSeek-Math-V2 | Q4 | 15.04 | 383GB |
| AI-MO/Kimina-Prover-72B | Q8 | 15.02 | 70GB |
| deepseek-ai/deepseek-coder-33b-instruct | FP16 | 14.95 | 68GB |
| Qwen/Qwen2.5-32B | FP16 | 14.79 | 66GB |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | 14.79 | 69GB |
| Qwen/QwQ-32B-Preview | FP16 | 14.68 | 67GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | FP16 | 14.67 | 66GB |
| Qwen/Qwen2.5-72B-Instruct | Q8 | 14.58 | 71GB |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | 14.53 | 69GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q8 | 14.39 | 78GB |
| openai/gpt-oss-120b | Q8 | 14.38 | 117GB |
| Qwen/Qwen2.5-Math-72B-Instruct | Q8 | 14.31 | 71GB |
| Qwen/Qwen2.5-Coder-32B-Instruct | FP16 | 14.23 | 67GB |
| mistralai/Mistral-Large-Instruct-2411 | Q8 | 14.20 | 120GB |
| moonshotai/Kimi-K2-Thinking | FP16 | 14.19 | 1956GB |
| codellama/CodeLlama-34b-hf | FP16 | 13.93 | 70GB |
| meta-llama/Llama-3.3-70B-Instruct | FP16 | 13.91 | 137GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q8 | 13.64 | 88GB |
| Qwen/Qwen3-235B-A22B | Q4 | 13.47 | 115GB |
| 01-ai/Yi-1.5-34B-Chat | FP16 | 13.36 | 70GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | FP16 | 13.29 | 66GB |
| Qwen/Qwen2.5-32B-Instruct | FP16 | 13.26 tok/sEstimated Auto-generated benchmark | 66GB |
| Qwen/Qwen3-32B | FP16 | 13.25 tok/sEstimated Auto-generated benchmark | 66GB |
| meta-llama/Meta-Llama-3-70B-Instruct | FP16 | 13.22 tok/sEstimated Auto-generated benchmark | 137GB |
| deepseek-ai/DeepSeek-V2.5 | FP16 | 13.20 tok/sEstimated Auto-generated benchmark | 1312GB |
| baichuan-inc/Baichuan-M2-32B | FP16 | 13.04 tok/sEstimated Auto-generated benchmark | 66GB |
| meta-llama/Llama-3.1-70B-Instruct | FP16 | 12.89 tok/sEstimated Auto-generated benchmark | 137GB |
| MiniMaxAI/MiniMax-M1-40k | Q4 | 12.35 tok/sEstimated Auto-generated benchmark | 255GB |
| MiniMaxAI/MiniMax-VL-01 | Q4 | 12.31 tok/sEstimated Auto-generated benchmark | 256GB |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q4 | 12.29 tok/sEstimated Auto-generated benchmark | 378GB |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | Q8 | 12.24 tok/sEstimated Auto-generated benchmark | 231GB |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | FP16 | 11.33 tok/sEstimated Auto-generated benchmark | 275GB |
| deepseek-ai/DeepSeek-Math-V2 | Q8 | 10.66 tok/sEstimated Auto-generated benchmark | 766GB |
| MiniMaxAI/MiniMax-M1-40k | Q8 | 9.87 tok/sEstimated Auto-generated benchmark | 510GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct | FP16 | 8.92 tok/sEstimated Auto-generated benchmark | 156GB |
| openai/gpt-oss-120b | FP16 | 8.86 tok/sEstimated Auto-generated benchmark | 235GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | FP16 | 8.85 tok/sEstimated Auto-generated benchmark | 156GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking | FP16 | 8.58 tok/sEstimated Auto-generated benchmark | 156GB |
| Qwen/Qwen3-235B-A22B | Q8 | 8.57 tok/sEstimated Auto-generated benchmark | 230GB |
| NousResearch/Hermes-3-Llama-3.1-70B | FP16 | 8.54 tok/sEstimated Auto-generated benchmark | 138GB |
| MiniMaxAI/MiniMax-VL-01 | Q8 | 8.54 tok/sEstimated Auto-generated benchmark | 511GB |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q8 | 8.45 tok/sEstimated Auto-generated benchmark | 755GB |
| Qwen/Qwen2.5-72B-Instruct | FP16 | 8.38 tok/sEstimated Auto-generated benchmark | 141GB |
| Qwen/Qwen2.5-72B-Instruct | FP16 | 8.24 tok/sEstimated Auto-generated benchmark | 142GB |
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | FP16 | 8.17 tok/sEstimated Auto-generated benchmark | 138GB |
| meta-llama/Llama-3.1-70B-Instruct | FP16 | 8.17 tok/sEstimated Auto-generated benchmark | 138GB |
| Qwen/Qwen2.5-Math-72B-Instruct | FP16 | 8.14 tok/sEstimated Auto-generated benchmark | 142GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | FP16 | 7.96 tok/sEstimated Auto-generated benchmark | 156GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | FP16 | 7.67 tok/sEstimated Auto-generated benchmark | 176GB |
| mistralai/Mistral-Large-Instruct-2411 | FP16 | 7.43 tok/sEstimated Auto-generated benchmark | 240GB |
| AI-MO/Kimina-Prover-72B | FP16 | 7.39 tok/sEstimated Auto-generated benchmark | 141GB |
| meta-llama/Llama-3.3-70B-Instruct | FP16 | 7.36 tok/sEstimated Auto-generated benchmark | 138GB |
| deepseek-ai/DeepSeek-Math-V2 | FP16 | 5.61 tok/sEstimated Auto-generated benchmark | 1532GB |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | FP16 | 5.53 tok/sEstimated Auto-generated benchmark | 461GB |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | FP16 | 5.36 tok/sEstimated Auto-generated benchmark | 1509GB |
| MiniMaxAI/MiniMax-VL-01 | FP16 | 5.30 tok/sEstimated Auto-generated benchmark | 1021GB |
| Qwen/Qwen3-235B-A22B | FP16 | 5.14 tok/sEstimated Auto-generated benchmark | 460GB |
| MiniMaxAI/MiniMax-M1-40k | FP16 | 4.70 tok/sEstimated Auto-generated benchmark | 1020GB |
Note: All throughput figures above are calculated estimates, not measured benchmarks; real-world results may vary.
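These estimates are consistent with a simple memory-bandwidth argument: during autoregressive decoding, each generated token streams the model's full weight footprint through memory, so throughput is roughly bandwidth divided by model size, until per-token overhead caps small models. The sketch below illustrates that heuristic, assuming the M2 Ultra's 800 GB/s memory bandwidth; the efficiency factor and small-model ceiling are illustrative assumptions, not this site's actual methodology.

```python
# Decode-throughput heuristic: autoregressive decoding is typically
# memory-bandwidth-bound, so each generated token streams the full
# weight footprint. Constants below are illustrative assumptions.

M2_ULTRA_BANDWIDTH_GBPS = 800  # Apple's published M2 Ultra memory bandwidth
EFFICIENCY = 0.85              # assumed fraction of peak bandwidth achieved
SMALL_MODEL_CEILING = 140      # assumed tok/s cap where per-token overhead dominates

def estimate_tok_per_sec(footprint_gb: float) -> float:
    """Estimate decode speed from the weight footprint in GB."""
    bandwidth_bound = EFFICIENCY * M2_ULTRA_BANDWIDTH_GBPS / footprint_gb
    return min(bandwidth_bound, SMALL_MODEL_CEILING)

print(f"{estimate_tok_per_sec(33):.1f} tok/s")  # ~20.6 for a 33GB Q8 32B model,
                                                # the same ballpark as the table
```

The heuristic is weakest for mixture-of-experts models, which read only their active experts per token. The compatibility table below applies the same per-model estimates against the M2 Ultra's 192GB of unified memory and flags snapshots whose footprint exceeds it.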
| Model | Quantization | Verdict | Estimated speed | VRAM needed (192GB available) |
|---|---|---|---|---|
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q8 | Not supported | 8.45 tok/s | 755GB |
| EssentialAI/rnj-1 | FP16 | Fits comfortably | 29.62 tok/s | 19GB |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q4 | Not supported | 12.29 tok/s | 378GB |
| EssentialAI/rnj-1 | Q8 | Fits comfortably | 51.89 tok/s | 10GB |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | FP16 | Not supported | 5.36 tok/s | 1509GB |
| EssentialAI/rnj-1 | Q4 | Fits comfortably | 86.68 tok/s | 5GB |
| meta-llama/Meta-Llama-3-8B-Instruct | FP16 | Fits comfortably | 42.47 tok/s | 17GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | Fits comfortably | 82.18 tok/s | 9GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | Fits comfortably | 112.41 tok/s | 4GB |
| Qwen/Qwen3-Embedding-0.6B | Q4 | Fits comfortably | 102.79 tok/s | 3GB |
| Qwen/Qwen2.5-1.5B | Q4 | Fits comfortably | 102.57 tok/s | 3GB |
| meta-llama/Llama-3.1-70B-Instruct | FP16 | Fits comfortably | 12.89 tok/s | 137GB |
| microsoft/phi-2 | Q8 | Fits comfortably | 74.89 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q4 | Fits comfortably | 113.59 tok/s | 3GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | FP16 | Fits comfortably | 23.39 tok/s | 41GB |
| google/gemma-3-1b-it | FP16 | Fits comfortably | 52.80 tok/s | 2GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q8 | Fits comfortably | 42.38 tok/s | 20GB |
| Qwen/Qwen2-0.5B | Q8 | Fits comfortably | 68.08 tok/s | 5GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | FP16 | Fits comfortably | 50.43 tok/s | 6GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q4 | Fits comfortably | 109.40 tok/s | 2GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | Fits comfortably | 77.29 tok/s | 4GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | FP16 | Fits comfortably | 38.92 tok/s | 9GB |
| meta-llama/Llama-3.1-8B | Q8 | Fits comfortably | 74.30 tok/s | 9GB |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | Fits comfortably | 26.66 tok/s | 68GB |
| meta-llama/Llama-3.2-1B | Q4 | Fits comfortably | 135.85 tok/s | 1GB |
| microsoft/phi-2 | Q4 | Fits comfortably | 117.18 tok/s | 4GB |
| Qwen/Qwen3-4B | Q8 | Fits comfortably | 77.45 tok/s | 4GB |
| microsoft/phi-2 | FP16 | Fits comfortably | 43.58 tok/s | 15GB |
| meta-llama/Llama-2-7b-hf | FP16 | Fits comfortably | 39.25 tok/s | 15GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q4 | Fits comfortably | 100.15 tok/s | 4GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q4 | Fits comfortably | 54.08 tok/s | 10GB |
| Qwen/Qwen2-0.5B | Q4 | Fits comfortably | 98.95 tok/s | 3GB |
| MiniMaxAI/MiniMax-M2 | FP16 | Fits comfortably | 36.79 tok/s | 15GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | FP16 | Fits comfortably | 42.79 tok/s | 11GB |
| huggyllama/llama-7b | FP16 | Fits comfortably | 42.93 tok/s | 15GB |
| Qwen/Qwen2-0.5B | FP16 | Fits comfortably | 37.73 tok/s | 11GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | Fits comfortably | 126.36 tok/s | 2GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | Fits comfortably | 91.59 tok/s | 3GB |
| ibm-granite/granite-3.3-2b-instruct | FP16 | Fits comfortably | 45.50 tok/s | 4GB |
| microsoft/phi-4 | Q4 | Fits comfortably | 98.07 tok/s | 4GB |
| microsoft/phi-4 | Q8 | Fits comfortably | 78.63 tok/s | 7GB |
| microsoft/phi-4 | FP16 | Fits comfortably | 37.85 tok/s | 15GB |
| deepseek-ai/DeepSeek-V3.1 | Q4 | Fits comfortably | 108.54 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3.1 | Q8 | Fits comfortably | 80.19 tok/s | 7GB |
| deepseek-ai/DeepSeek-V3.1 | FP16 | Fits comfortably | 37.13 tok/s | 15GB |
| meta-llama/Llama-3.1-8B | Q4 | Fits comfortably | 102.05 tok/s | 4GB |
| lmsys/vicuna-7b-v1.5 | FP16 | Fits comfortably | 43.61 tok/s | 15GB |
| Qwen/Qwen2.5-32B-Instruct | Q4 | Fits comfortably | 34.89 tok/s | 16GB |
| Qwen/Qwen2.5-32B-Instruct | Q8 | Fits comfortably | 25.52 tok/s | 33GB |
| Qwen/Qwen2.5-32B-Instruct | FP16 | Fits comfortably | 13.26 tok/s | 66GB |
| sshleifer/tiny-gpt2 | Q8 | Fits comfortably | 73.51 tok/s | 7GB |
| meta-llama/Llama-3.2-1B | Q8 | Fits comfortably | 98.54 tok/s | 1GB |
| meta-llama/Llama-3.2-1B | FP16 | Fits comfortably | 49.41 tok/s | 2GB |
| meta-llama/Meta-Llama-3-8B | Q4 | Fits comfortably | 101.23 tok/s | 4GB |
| microsoft/DialoGPT-medium | Q4 | Fits comfortably | 117.29 tok/s | 4GB |
| IlyaGusev/saiga_llama3_8b | FP16 | Fits comfortably | 37.19 tok/s | 17GB |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | Fits comfortably | 101.65 tok/s | 3GB |
| Qwen/Qwen3-4B | FP16 | Fits comfortably | 42.36 tok/s | 9GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q4 | Fits comfortably | 53.38 tok/s | 15GB |
| Qwen/Qwen2.5-14B | FP16 | Fits comfortably | 28.50 tok/s | 29GB |
| google/gemma-2-2b-it | Q4 | Fits comfortably | 139.33 tok/s | 1GB |
| google/gemma-2-2b-it | Q8 | Fits comfortably | 93.87 tok/s | 2GB |
| meta-llama/Llama-3.2-3B | Q4 | Fits comfortably | 118.40 tok/s | 2GB |
| google/gemma-2-2b-it | FP16 | Fits comfortably | 47.18 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | Fits comfortably | 78.04 tok/s | 9GB |
| sshleifer/tiny-gpt2 | FP16 | Fits comfortably | 38.80 tok/s | 15GB |
| meta-llama/Llama-3.2-3B | Q8 | Fits comfortably | 88.55 tok/s | 3GB |
| meta-llama/Llama-3.2-3B | FP16 | Fits comfortably | 51.41 tok/s | 6GB |
| huggyllama/llama-7b | Q4 | Fits comfortably | 101.08 tok/s | 4GB |
| huggyllama/llama-7b | Q8 | Fits comfortably | 75.14 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q8 | Fits comfortably | 82.41 tok/s | 5GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q8 | Fits comfortably | 79.17 tok/s | 7GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | FP16 | Fits comfortably | 44.19 tok/s | 15GB |
| numind/NuExtract-1.5 | Q4 | Fits comfortably | 96.76 tok/s | 4GB |
| numind/NuExtract-1.5 | Q8 | Fits comfortably | 77.14 tok/s | 7GB |
| Qwen/Qwen2.5-72B-Instruct | Q4 | Fits comfortably | 19.70 tok/s | 35GB |
| ibm-granite/granite-3.3-2b-instruct | Q4 | Fits comfortably | 131.49 tok/s | 1GB |
| ibm-granite/granite-3.3-2b-instruct | Q8 | Fits comfortably | 91.01 tok/s | 2GB |
| google/gemma-3-1b-it | Q8 | Fits comfortably | 88.53 tok/s | 1GB |
| Qwen/Qwen2.5-Coder-32B-Instruct | Q8 | Fits comfortably | 26.79 tok/s | 34GB |
| 01-ai/Yi-1.5-34B-Chat | FP16 | Fits comfortably | 13.36 tok/s | 70GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q4 | Fits comfortably | 98.87 tok/s | 2GB |
| microsoft/Phi-3-mini-128k-instruct | Q4 | Fits comfortably | 103.63 tok/s | 4GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | FP16 | Fits comfortably | 7.67 tok/s | 176GB |
| GSAI-ML/LLaDA-8B-Instruct | Q4 | Fits comfortably | 97.61 tok/s | 4GB |
| Qwen/Qwen2.5-72B-Instruct | Q8 | Fits comfortably | 15.92 tok/s | 70GB |
| Qwen/Qwen2.5-72B-Instruct | FP16 | Fits comfortably | 8.38 tok/s | 141GB |
| parler-tts/parler-tts-large-v1 | Q8 | Fits comfortably | 73.24 tok/s | 7GB |
| parler-tts/parler-tts-large-v1 | FP16 | Fits comfortably | 40.15 tok/s | 15GB |
| Qwen/Qwen2.5-32B | Q4 | Fits comfortably | 41.16 tok/s | 16GB |
| Qwen/Qwen2.5-32B | Q8 | Fits comfortably | 25.96 tok/s | 33GB |
| Qwen/Qwen2.5-32B | FP16 | Fits comfortably | 14.79 tok/s | 66GB |
| lmsys/vicuna-7b-v1.5 | Q4 | Fits comfortably | 105.68 tok/s | 4GB |
| lmsys/vicuna-7b-v1.5 | Q8 | Fits comfortably | 75.54 tok/s | 7GB |
| google/gemma-3-1b-it | Q4 | Fits comfortably | 132.98 tok/s | 1GB |
| Qwen/Qwen2.5-3B | Q4 | Fits comfortably | 124.55 tok/s | 2GB |
| Qwen/Qwen2.5-3B | Q8 | Fits comfortably | 85.92 tok/s | 3GB |
| Qwen/Qwen2.5-3B | FP16 | Fits comfortably | 53.63 tok/s | 6GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q4 | Fits comfortably | 86.05 tok/s | 5GB |
| GSAI-ML/LLaDA-8B-Base | FP16 | Fits comfortably | 42.23 tok/s | 17GB |
Note: All throughput figures above are calculated estimates, not measured benchmarks; real-world results may vary.
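The verdict column reduces to comparing an estimated footprint against the 192GB pool. A common rule of thumb puts weights at roughly 0.5 bytes per parameter at Q4, 1 at Q8, and 2 at FP16, plus headroom for the KV cache and runtime. The sketch below is a hypothetical reconstruction under those assumptions; the overhead factor and tight-fit threshold are invented for illustration, not this site's exact formula.

```python
# Hypothetical reconstruction of the fit verdict. Bytes-per-parameter
# values and the 1.2x KV-cache/runtime overhead are illustrative
# assumptions, not the site's exact methodology.

BYTES_PER_PARAM = {"Q4": 0.5, "Q8": 1.0, "FP16": 2.0}
AVAILABLE_GB = 192  # M2 Ultra unified memory
OVERHEAD = 1.2      # assumed allowance for KV cache and runtime buffers

def fit_verdict(params_billions: float, quant: str) -> str:
    """Classify whether a model snapshot fits in available memory."""
    need_gb = params_billions * BYTES_PER_PARAM[quant] * OVERHEAD
    if need_gb > AVAILABLE_GB:
        return f"Not supported ({need_gb:.0f}GB needed, have {AVAILABLE_GB}GB)"
    if need_gb > 0.9 * AVAILABLE_GB:
        return f"Tight fit ({need_gb:.0f}GB needed)"
    return f"Fits comfortably ({need_gb:.0f}GB needed)"

print(fit_verdict(70, "FP16"))  # ~168GB: fits within 192GB
print(fit_verdict(675, "Q4"))   # ~405GB: not supported, as the table shows
```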
Related comparisons for local inference workloads: Apple M3 Max, RTX 4090, NVIDIA RTX 6000 Ada, NVIDIA A6000, and RX 7900 XTX.