Quick Answer: The RTX 3060 12GB offers 12GB of VRAM and starts around $309.99. It delivers an estimated 77 tokens/sec on bigcode/starcoder2-3b at Q4 and typically draws 170W under load.
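To put the 170W figure in context, here is a back-of-envelope sketch of electricity cost per generated token using the headline numbers above; the electricity rate is an assumed example, not a figure from this page:

```python
# Back-of-envelope: energy cost per token on this card, using the
# headline numbers above (170 W under load, ~77 tok/s estimated).
POWER_W = 170.0
TOKENS_PER_SEC = 77.0
PRICE_PER_KWH = 0.15  # assumption: substitute your local electricity rate

# joules per token -> kWh per token (1 kWh = 3.6e6 J)
kwh_per_token = (POWER_W / TOKENS_PER_SEC) / 3.6e6
cost_per_million = kwh_per_token * 1e6 * PRICE_PER_KWH
print(f"~${cost_per_million:.2f} per million tokens")  # ~$0.09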
This GPU offers reliable throughput for local AI workloads. Pair it with the right model quantization to hit your desired tokens/sec, and monitor prices below to catch the best deal.
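As a rough guide to picking that quantization, here is a minimal sketch of the usual weights-plus-overhead VRAM heuristic. The bits-per-weight mapping and the flat 1.5GB overhead are simplifying assumptions (real usage varies with context length and runtime), but it tracks the general pattern in the table below:

```python
# Minimal sketch: weights (params * bits / 8) plus a flat pad for the
# KV cache and runtime. Heuristic only; not a measured figure.
QUANT_BITS = {"Q4": 4, "Q8": 8, "FP16": 16}

def estimate_vram_gb(params_billion: float, quant: str,
                     overhead_gb: float = 1.5) -> float:
    """Estimated VRAM footprint in GB for a dense model."""
    return params_billion * QUANT_BITS[quant] / 8 + overhead_gb

def fits_rtx_3060(params_billion: float, quant: str) -> bool:
    return estimate_vram_gb(params_billion, quant) <= 12.0

for q in ("Q4", "Q8", "FP16"):
    print(q, "7B fits:", fits_rtx_3060(7, q))  # Q4/Q8 fit, FP16 does not
```

By this estimate a 7B model fits comfortably at Q4 (~5GB) and Q8 (~8.5GB) but not at FP16 (~15.5GB), which matches the 7B rows in the benchmark table.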
Buy directly on Amazon with fast shipping and reliable customer service.
All throughput figures below are estimated, auto-generated benchmarks rather than measured results.

| Model | Quantization | Tokens/sec (est.) | VRAM used |
|---|---|---|---|
| bigcode/starcoder2-3b | Q4 | 76.55 tok/s | 2GB |
| deepseek-ai/DeepSeek-OCR | Q4 | 76.46 tok/s | 2GB |
| ibm-granite/granite-3.3-2b-instruct | Q4 | 76.41 tok/s | 1GB |
| Qwen/Qwen2.5-3B-Instruct | Q4 | 75.82 tok/s | 2GB |
| google-t5/t5-3b | Q4 | 75.21 tok/s | 2GB |
| google/embeddinggemma-300m | Q4 | 75.15 tok/s | 1GB |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | 74.63 tok/s | 1GB |
| allenai/OLMo-2-0425-1B | Q4 | 74.57 tok/s | 1GB |
| google/gemma-2-2b-it | Q4 | 74.23 tok/s | 1GB |
| apple/OpenELM-1_1B-Instruct | Q4 | 73.77 tok/s | 1GB |
| google/gemma-3-1b-it | Q4 | 73.16 tok/s | 1GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | 72.98 tok/s | 1GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | 72.92 tok/s | 2GB |
| tencent/HunyuanOCR | Q4 | 71.80 tok/s | 1GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | 71.65 tok/s | 2GB |
| unsloth/Llama-3.2-1B-Instruct | Q4 | 71.01 tok/s | 1GB |
| unsloth/Llama-3.2-3B-Instruct | Q4 | 70.51 tok/s | 2GB |
| ibm-research/PowerMoE-3b | Q4 | 69.81 tok/s | 2GB |
| meta-llama/Llama-3.2-3B | Q4 | 69.52 tok/s | 2GB |
| nari-labs/Dia2-2B | Q4 | 68.73 tok/s | 2GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 68.38 tok/s | 2GB |
| meta-llama/Llama-3.2-1B | Q4 | 67.49 tok/s | 1GB |
| inference-net/Schematron-3B | Q4 | 67.47 tok/s | 2GB |
| WeiboAI/VibeThinker-1.5B | Q4 | 67.22 tok/s | 1GB |
| Qwen/Qwen2.5-3B | Q4 | 67.20 tok/s | 2GB |
| unsloth/gemma-3-1b-it | Q4 | 65.68 tok/s | 1GB |
| google-bert/bert-base-uncased | Q4 | 65.16 tok/s | 1GB |
| LiquidAI/LFM2-1.2B | Q4 | 64.71 tok/s | 1GB |
| EleutherAI/gpt-neo-125m | Q4 | 63.79 tok/s | 4GB |
| Qwen/Qwen2-1.5B-Instruct | Q4 | 63.76 tok/s | 3GB |
| facebook/opt-125m | Q4 | 63.72 tok/s | 4GB |
| GSAI-ML/LLaDA-8B-Base | Q4 | 63.72 tok/s | 4GB |
| black-forest-labs/FLUX.2-dev | Q4 | 63.71 tok/s | 4GB |
| Tongyi-MAI/Z-Image-Turbo | Q4 | 63.70 tok/s | 4GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | 63.48 tok/s | 2GB |
| facebook/sam3 | Q4 | 63.40 tok/s | 1GB |
| google/gemma-2b | Q4 | 63.31 tok/s | 1GB |
| ibm-granite/granite-3.3-8b-instruct | Q4 | 63.20 tok/s | 4GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q4 | 63.20 tok/s | 4GB |
| microsoft/phi-2 | Q4 | 63.11 tok/s | 4GB |
| meta-llama/Llama-Guard-3-1B | Q4 | 63.09 tok/s | 1GB |
| HuggingFaceTB/SmolLM2-135M | Q4 | 63.07 tok/s | 4GB |
| allenai/Olmo-3-7B-Think | Q4 | 63.06 tok/s | 4GB |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | 62.80 tok/s | 3GB |
| Qwen/Qwen3-Embedding-8B | Q4 | 62.73 tok/s | 4GB |
| zai-org/GLM-4.6-FP8 | Q4 | 62.56 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1 | Q4 | 62.47 tok/s | 4GB |
| microsoft/Phi-4-mini-instruct | Q4 | 62.43 tok/s | 4GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q4 | 62.42 tok/s | 4GB |
| Qwen/Qwen2-7B-Instruct | Q4 | 62.38 tok/s | 4GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q4 | 62.26 tok/s | 3GB |
| microsoft/Phi-3.5-vision-instruct | Q4 | 62.15 tok/s | 4GB |
| zai-org/GLM-4.5-Air | Q4 | 61.87 tok/s | 4GB |
| microsoft/VibeVoice-1.5B | Q4 | 61.71 tok/s | 3GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q4 | 61.68 tok/s | 2GB |
| microsoft/phi-4 | Q4 | 61.65 tok/s | 4GB |
| Qwen/Qwen3-8B-Base | Q4 | 61.58 tok/s | 4GB |
| openai-community/gpt2-xl | Q4 | 61.46 tok/s | 4GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | 61.46 tok/s | 4GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q4 | 61.02 tok/s | 4GB |
| IlyaGusev/saiga_llama3_8b | Q4 | 60.99 tok/s | 4GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q4 | 60.83 tok/s | 4GB |
| hmellor/tiny-random-LlamaForCausalLM | Q4 | 60.75 tok/s | 4GB |
| meta-llama/Llama-Guard-3-8B | Q4 | 60.65 tok/s | 4GB |
| GSAI-ML/LLaDA-8B-Instruct | Q4 | 60.65 tok/s | 4GB |
| microsoft/DialoGPT-medium | Q4 | 60.60 tok/s | 4GB |
| meta-llama/Llama-3.1-8B | Q4 | 60.59 tok/s | 4GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | 60.53 tok/s | 3GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q4 | 60.43 tok/s | 4GB |
| sshleifer/tiny-gpt2 | Q4 | 60.21 tok/s | 4GB |
| MiniMaxAI/MiniMax-M2 | Q4 | 60.15 tok/s | 4GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 60.10 tok/s | 4GB |
| bigscience/bloomz-560m | Q4 | 60.03 tok/s | 4GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q4 | 59.98 tok/s | 4GB |
| Qwen/Qwen-Image-Edit-2509 | Q4 | 59.92 tok/s | 4GB |
| black-forest-labs/FLUX.1-dev | Q4 | 59.88 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q4 | 59.88 tok/s | 2GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q4 | 59.68 tok/s | 4GB |
| Qwen/Qwen3-Embedding-4B | Q4 | 59.66 tok/s | 2GB |
| microsoft/DialoGPT-small | Q4 | 59.64 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q4 | 59.63 tok/s | 2GB |
| google/gemma-3-270m-it | Q4 | 59.31 tok/s | 4GB |
| petals-team/StableBeluga2 | Q4 | 59.28 tok/s | 4GB |
| Qwen/Qwen2-0.5B | Q4 | 58.95 tok/s | 3GB |
| vikhyatk/moondream2 | Q4 | 58.88 tok/s | 4GB |
| Qwen/Qwen3-1.7B-Base | Q4 | 58.80 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q4 | 58.66 tok/s | 4GB |
| parler-tts/parler-tts-large-v1 | Q4 | 58.58 tok/s | 4GB |
| skt/kogpt2-base-v2 | Q4 | 58.56 tok/s | 4GB |
| microsoft/Phi-3-mini-128k-instruct | Q4 | 58.55 tok/s | 4GB |
| ibm-granite/granite-docling-258M | Q4 | 58.40 tok/s | 4GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | 58.38 tok/s | 4GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | 58.36 tok/s | 3GB |
| Qwen/Qwen2.5-1.5B | Q4 | 58.07 tok/s | 3GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | 58.05 tok/s | 2GB |
| mistralai/Mistral-7B-v0.1 | Q4 | 57.96 tok/s | 4GB |
| openai-community/gpt2 | Q4 | 57.96 tok/s | 4GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | 57.70 tok/s | 4GB |
| Qwen/Qwen3-Reranker-0.6B | Q4 | 57.65 tok/s | 3GB |
| Qwen/Qwen2-0.5B-Instruct | Q4 | 57.56 tok/s | 3GB |
| lmsys/vicuna-7b-v1.5 | Q4 | 57.56 tok/s | 4GB |
| Qwen/Qwen3-8B | Q4 | 57.35 tok/s | 4GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q4 | 57.10 tok/s | 4GB |
| meta-llama/Llama-2-7b-hf | Q4 | 56.98 tok/s | 4GB |
| distilbert/distilgpt2 | Q4 | 56.83 tok/s | 4GB |
| Qwen/Qwen3-1.7B | Q4 | 56.62 tok/s | 4GB |
| BSC-LT/salamandraTA-7b-instruct | Q4 | 56.61 tok/s | 4GB |
| huggyllama/llama-7b | Q4 | 56.47 tok/s | 4GB |
| microsoft/Phi-3-mini-4k-instruct | Q4 | 56.37 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3.1 | Q4 | 56.34 tok/s | 4GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q4 | 56.33 tok/s | 2GB |
| HuggingFaceH4/zephyr-7b-beta | Q4 | 56.28 tok/s | 4GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | 56.28 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-0528 | Q4 | 56.23 tok/s | 4GB |
| Qwen/Qwen2.5-7B | Q4 | 56.19 tok/s | 4GB |
| tencent/HunyuanVideo-1.5 | Q4 | 56.10 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3-0324 | Q4 | 56.06 tok/s | 4GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q4 | 56.03 tok/s | 4GB |
| dicta-il/dictalm2.0-instruct | Q4 | 56.00 tok/s | 4GB |
| Qwen/Qwen3-Embedding-0.6B | Q4 | 55.97 tok/s | 3GB |
| Qwen/Qwen3-4B | Q4 | 55.93 tok/s | 2GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q4 | 55.75 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3 | Q4 | 55.64 tok/s | 4GB |
| rednote-hilab/dots.ocr | Q4 | 55.63 tok/s | 4GB |
| Qwen/Qwen3-0.6B-Base | Q4 | 55.62 tok/s | 3GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 55.41 tok/s | 4GB |
| Qwen/Qwen2.5-0.5B | Q4 | 55.39 tok/s | 3GB |
| microsoft/Phi-4-multimodal-instruct | Q4 | 55.14 tok/s | 4GB |
| Qwen/Qwen3-0.6B | Q4 | 55.09 tok/s | 3GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q4 | 55.02 tok/s | 2GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q4 | 54.98 tok/s | 2GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | 54.77 tok/s | 4GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 54.71 tok/s | 2GB |
| meta-llama/Llama-2-7b-chat-hf | Q4 | 54.68 tok/s | 4GB |
| numind/NuExtract-1.5 | Q4 | 54.38 tok/s | 4GB |
| meta-llama/Meta-Llama-3-8B | Q4 | 54.29 tok/s | 4GB |
| openai-community/gpt2-medium | Q4 | 54.14 tok/s | 4GB |
| rinna/japanese-gpt-neox-small | Q4 | 54.10 tok/s | 4GB |
| Qwen/Qwen3-8B-FP8 | Q4 | 53.81 tok/s | 4GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q4 | 53.49 tok/s | 4GB |
| Qwen/Qwen3-4B-Base | Q4 | 53.47 tok/s | 2GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q4 | 53.47 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q4 | 53.12 tok/s | 3GB |
| deepseek-ai/DeepSeek-OCR | Q8 | 52.86 tok/s | 4GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q4 | 52.86 tok/s | 3GB |
| bigcode/starcoder2-3b | Q8 | 52.81 tok/s | 3GB |
| llamafactory/tiny-random-Llama-3 | Q4 | 52.80 tok/s | 4GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | 52.78 tok/s | 3GB |
| LiquidAI/LFM2-1.2B | Q8 | 52.75 tok/s | 2GB |
| EleutherAI/pythia-70m-deduped | Q4 | 52.61 tok/s | 4GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q4 | 52.61 tok/s | 4GB |
| HuggingFaceTB/SmolLM-135M | Q4 | 52.55 tok/s | 4GB |
| inference-net/Schematron-3B | Q8 | 52.52 tok/s | 3GB |
| liuhaotian/llava-v1.5-7b | Q4 | 52.50 tok/s | 4GB |
| openai-community/gpt2-large | Q4 | 52.42 tok/s | 4GB |
| unsloth/Llama-3.2-1B-Instruct | Q8 | 52.37 tok/s | 1GB |
| Qwen/Qwen2.5-Math-1.5B | Q4 | 52.36 tok/s | 3GB |
| ibm-research/PowerMoE-3b | Q8 | 52.29 tok/s | 3GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q4 | 52.29 tok/s | 4GB |
| WeiboAI/VibeThinker-1.5B | Q8 | 51.75 tok/s | 2GB |
| apple/OpenELM-1_1B-Instruct | Q8 | 50.75 tok/s | 1GB |
| Qwen/Qwen2.5-3B-Instruct | Q8 | 50.22 tok/s | 3GB |
| google-t5/t5-3b | Q8 | 49.98 tok/s | 3GB |
| unsloth/Llama-3.2-3B-Instruct | Q8 | 49.66 tok/s | 3GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q8 | 49.64 tok/s | 3GB |
| facebook/sam3 | Q8 | 49.48 tok/s | 1GB |
| meta-llama/Llama-3.2-1B-Instruct | Q8 | 49.42 tok/s | 1GB |
| allenai/OLMo-2-0425-1B | Q8 | 49.26 tok/s | 1GB |
| meta-llama/Llama-Guard-3-1B | Q8 | 49.05 tok/s | 1GB |
| tencent/HunyuanOCR | Q8 | 48.81 tok/s | 2GB |
| google-bert/bert-base-uncased | Q8 | 48.62 tok/s | 1GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | 48.02 tok/s | 3GB |
| google/gemma-2-2b-it | Q8 | 47.89 tok/s | 2GB |
| meta-llama/Llama-3.2-3B | Q8 | 47.54 tok/s | 3GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | 47.45 tok/s | 7GB |
| meta-llama/Llama-3.2-1B | Q8 | 47.34 tok/s | 1GB |
| google/gemma-3-1b-it | Q8 | 47.12 tok/s | 1GB |
| google/embeddinggemma-300m | Q8 | 46.79 tok/s | 1GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | 46.45 tok/s | 1GB |
| ibm-granite/granite-3.3-2b-instruct | Q8 | 46.24 tok/s | 2GB |
| Qwen/Qwen2.5-14B | Q4 | 46.05 tok/s | 7GB |
| unsloth/gemma-3-1b-it | Q8 | 45.73 tok/s | 1GB |
| google/gemma-2b | Q8 | 45.59 tok/s | 2GB |
| nari-labs/Dia2-2B | Q8 | 45.16 tok/s | 3GB |
| OpenPipe/Qwen3-14B-Instruct | Q4 | 45.10 tok/s | 7GB |
| Qwen/Qwen2.5-3B | Q8 | 44.96 tok/s | 3GB |
| microsoft/DialoGPT-small | Q8 | 44.63 tok/s | 7GB |
| NousResearch/Hermes-3-Llama-3.1-8B | Q4 | 44.62 tok/s | 4GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q8 | 44.60 tok/s | 7GB |
| Qwen/Qwen2.5-Coder-1.5B | Q8 | 44.50 tok/s | 5GB |
| HuggingFaceTB/SmolLM-135M | Q8 | 44.49 tok/s | 7GB |
| Qwen/Qwen3-Embedding-0.6B | Q8 | 44.39 tok/s | 6GB |
| Qwen/Qwen3-4B | Q8 | 44.34 tok/s | 4GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q8 | 44.28 tok/s | 7GB |
| lmsys/vicuna-7b-v1.5 | Q8 | 44.27 tok/s | 7GB |
| allenai/Olmo-3-7B-Think | Q8 | 44.13 tok/s | 8GB |
| parler-tts/parler-tts-large-v1 | Q8 | 44.02 tok/s | 7GB |
| zai-org/GLM-4.5-Air | Q8 | 44.00 tok/s | 7GB |
| deepseek-ai/DeepSeek-V3-0324 | Q8 | 43.96 tok/s | 7GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q8 | 43.94 tok/s | 5GB |
| meta-llama/Llama-2-13b-chat-hf | Q4 | 43.92 tok/s | 7GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q8 | 43.86 tok/s | 7GB |
| meta-llama/Llama-3.1-8B | Q8 | 43.81 tok/s | 9GB |
| Qwen/Qwen3-0.6B | Q8 | 43.79 tok/s | 6GB |
| distilbert/distilgpt2 | Q8 | 43.69 tok/s | 7GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q8 | 43.69 tok/s | 4GB |
| Qwen/Qwen3-0.6B-Base | Q8 | 43.61 tok/s | 6GB |
| HuggingFaceTB/SmolLM2-135M | Q8 | 43.58 tok/s | 7GB |
| Qwen/Qwen3-8B | Q8 | 43.58 tok/s | 9GB |
| microsoft/Phi-3.5-vision-instruct | Q8 | 43.54 tok/s | 7GB |
| bigscience/bloomz-560m | Q8 | 43.49 tok/s | 7GB |
| mistralai/Mistral-7B-v0.1 | Q8 | 43.49 tok/s | 7GB |
| ibm-granite/granite-docling-258M | Q8 | 43.48 tok/s | 7GB |
| sshleifer/tiny-gpt2 | Q8 | 43.47 tok/s | 7GB |
| Tongyi-MAI/Z-Image-Turbo | Q8 | 43.39 tok/s | 8GB |
| Qwen/Qwen2.5-Math-1.5B | Q8 | 43.35 tok/s | 5GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | 43.24 tok/s | 7GB |
| petals-team/StableBeluga2 | Q8 | 43.12 tok/s | 7GB |
| Qwen/Qwen3-14B-Base | Q4 | 43.08 tok/s | 7GB |
| llamafactory/tiny-random-Llama-3 | Q8 | 43.00 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | 42.82 tok/s | 9GB |
| Qwen/Qwen2.5-1.5B | Q8 | 42.81 tok/s | 5GB |
| Qwen/Qwen3-Reranker-0.6B | Q8 | 42.80 tok/s | 6GB |
| microsoft/VibeVoice-1.5B | Q8 | 42.80 tok/s | 5GB |
| deepseek-ai/DeepSeek-V3 | Q8 | 42.50 tok/s | 7GB |
| EssentialAI/rnj-1 | Q4 | 42.44 tok/s | 5GB |
| zai-org/GLM-4.6-FP8 | Q8 | 42.29 tok/s | 7GB |
| ai-forever/ruGPT-3.5-13B | Q4 | 42.28 tok/s | 7GB |
| microsoft/Phi-3-mini-128k-instruct | Q8 | 42.28 tok/s | 7GB |
| EleutherAI/gpt-neo-125m | Q8 | 42.21 tok/s | 7GB |
| ibm-granite/granite-3.3-8b-instruct | Q8 | 42.18 tok/s | 9GB |
| openai-community/gpt2-xl | Q8 | 42.14 tok/s | 7GB |
| Qwen/Qwen3-14B | Q4 | 42.09 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1 | Q8 | 42.04 tok/s | 7GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | 41.98 tok/s | 9GB |
| rednote-hilab/dots.ocr | Q8 | 41.92 tok/s | 7GB |
| mistralai/Ministral-3-14B-Instruct-2512 | Q4 | 41.82 tok/s | 8GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q8 | 41.80 tok/s | 9GB |
| Qwen/Qwen3-8B-FP8 | Q8 | 41.78 tok/s | 9GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | 41.77 tok/s | 7GB |
| facebook/opt-125m | Q8 | 41.75 tok/s | 7GB |
| meta-llama/Meta-Llama-3-8B | Q8 | 41.73 tok/s | 9GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | 41.59 tok/s | 8GB |
| microsoft/Phi-4-mini-instruct | Q8 | 41.38 tok/s | 7GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q8 | 41.32 tok/s | 7GB |
| microsoft/Phi-3-medium-128k-instruct | Q4 | 41.32 tok/s | 7GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q8 | 41.18 tok/s | 9GB |
| meta-llama/Llama-2-7b-hf | Q8 | 41.09 tok/s | 7GB |
| dicta-il/dictalm2.0-instruct | Q8 | 41.01 tok/s | 7GB |
| openai-community/gpt2-medium | Q8 | 40.92 tok/s | 7GB |
| Qwen/Qwen3-Embedding-4B | Q8 | 40.81 tok/s | 4GB |
| microsoft/Phi-4-multimodal-instruct | Q8 | 40.80 tok/s | 7GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q8 | 40.62 tok/s | 4GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q4 | 40.59 tok/s | 5GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q8 | 40.58 tok/s | 7GB |
| google/gemma-3-270m-it | Q8 | 40.48 tok/s | 7GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q8 | 40.48 tok/s | 5GB |
| Qwen/Qwen-Image-Edit-2509 | Q8 | 40.40 tok/s | 8GB |
| black-forest-labs/FLUX.2-dev | Q8 | 40.37 tok/s | 8GB |
| Qwen/Qwen3-1.7B-Base | Q8 | 40.28 tok/s | 7GB |
| google/gemma-2-9b-it | Q4 | 40.17 tok/s | 5GB |
| EleutherAI/pythia-70m-deduped | Q8 | 40.17 tok/s | 7GB |
| openai-community/gpt2-large | Q8 | 40.12 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q8 | 40.07 tok/s | 5GB |
| tencent/HunyuanVideo-1.5 | Q8 | 39.96 tok/s | 8GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | 39.82 tok/s | 4GB |
| Qwen/Qwen3-1.7B | Q8 | 39.77 tok/s | 7GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q8 | 39.76 tok/s | 4GB |
| Qwen/Qwen2-0.5B-Instruct | Q8 | 39.68 tok/s | 5GB |
| Qwen/Qwen2-0.5B | Q8 | 39.68 tok/s | 5GB |
| skt/kogpt2-base-v2 | Q8 | 39.68 tok/s | 7GB |
| Qwen/Qwen2.5-0.5B | Q8 | 39.67 tok/s | 5GB |
| liuhaotian/llava-v1.5-7b | Q8 | 39.53 tok/s | 7GB |
| openai-community/gpt2 | Q8 | 39.48 tok/s | 7GB |
| hmellor/tiny-random-LlamaForCausalLM | Q8 | 39.38 tok/s | 7GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q8 | 39.38 tok/s | 7GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q8 | 39.33 tok/s | 4GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | 39.29 tok/s | 7GB |
| microsoft/DialoGPT-medium | Q8 | 39.12 tok/s | 7GB |
| meta-llama/Llama-2-7b-chat-hf | Q8 | 39.12 tok/s | 7GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q8 | 38.99 tok/s | 4GB |
| vikhyatk/moondream2 | Q8 | 38.93 tok/s | 7GB |
| meta-llama/Llama-Guard-3-8B | Q8 | 38.68 tok/s | 9GB |
| black-forest-labs/FLUX.1-dev | Q8 | 38.58 tok/s | 8GB |
| MiniMaxAI/MiniMax-M2 | Q8 | 38.47 tok/s | 7GB |
| IlyaGusev/saiga_llama3_8b | Q8 | 38.39 tok/s | 9GB |
| BSC-LT/salamandraTA-7b-instruct | Q8 | 38.37 tok/s | 7GB |
| GSAI-ML/LLaDA-8B-Instruct | Q8 | 38.35 tok/s | 9GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q8 | 38.29 tok/s | 7GB |
| HuggingFaceH4/zephyr-7b-beta | Q8 | 38.28 tok/s | 7GB |
| numind/NuExtract-1.5 | Q8 | 38.08 tok/s | 7GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | 38.03 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-0528 | Q8 | 37.99 tok/s | 7GB |
| Qwen/Qwen2.5-7B | Q8 | 37.85 tok/s | 7GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q8 | 37.80 tok/s | 4GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | 37.74 tok/s | 7GB |
| microsoft/phi-2 | Q8 | 37.73 tok/s | 7GB |
| microsoft/Phi-3-mini-4k-instruct | Q8 | 37.71 tok/s | 7GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q8 | 37.61 tok/s | 9GB |
| huggyllama/llama-7b | Q8 | 37.57 tok/s | 7GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q8 | 37.53 tok/s | 9GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q8 | 37.47 tok/s | 5GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | 37.42 tok/s | 9GB |
| Qwen/Qwen2-7B-Instruct | Q8 | 37.37 tok/s | 7GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | 37.34 tok/s | 3GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q8 | 37.29 tok/s | 9GB |
| Qwen/Qwen3-4B-Base | Q8 | 37.28 tok/s | 4GB |
| Qwen/Qwen3-8B-Base | Q8 | 37.26 tok/s | 9GB |
| deepseek-ai/DeepSeek-V3.1 | Q8 | 37.25 tok/s | 7GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q8 | 37.20 tok/s | 7GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q8 | 37.13 tok/s | 9GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | 37.05 tok/s | 5GB |
| Qwen/Qwen3-Embedding-8B | Q8 | 36.97 tok/s | 9GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | 36.93 tok/s | 8GB |
| Qwen/Qwen2-1.5B-Instruct | Q8 | 36.92 tok/s | 5GB |
| rinna/japanese-gpt-neox-small | Q8 | 36.75 tok/s | 7GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | 36.72 tok/s | 4GB |
| microsoft/phi-4 | Q8 | 36.67 tok/s | 7GB |
| GSAI-ML/LLaDA-8B-Base | Q8 | 36.58 tok/s | 9GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q4 | 34.82 tok/s | 15GB |
| openai/gpt-oss-20b | Q4 | 33.46 tok/s | 10GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q4 | 33.42 tok/s | 15GB |
| google/gemma-2-27b-it | Q4 | 33.41 tok/s | 14GB |
| Qwen/Qwen3-14B | Q8 | 33.41 tok/s | 14GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q4 | 33.32 tok/s | 10GB |
| Qwen/Qwen2.5-14B-Instruct | Q8 | 32.93 tok/s | 15GB |
| unsloth/gpt-oss-20b-BF16 | Q4 | 32.92 tok/s | 10GB |
| google/gemma-2-9b-it | Q8 | 32.58 tok/s | 10GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q4 | 32.56 tok/s | 10GB |
| mistralai/Mistral-Small-Instruct-2409 | Q4 | 32.09 tok/s | 11GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q4 | 31.92 tok/s | 15GB |
| microsoft/Phi-3-medium-128k-instruct | Q8 | 31.85 tok/s | 14GB |
| Qwen/Qwen3-30B-A3B | Q4 | 31.81 tok/s | 15GB |
| Qwen/Qwen2.5-14B-Instruct | Q8 | 31.75 tok/s | 14GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q4 | 31.52 tok/s | 15GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | 31.44 tok/s | 9GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q8 | 31.42 tok/s | 10GB |
| Qwen/Qwen2.5-14B | Q8 | 31.41 tok/s | 14GB |
| EssentialAI/rnj-1 | Q8 | 31.27 tok/s | 10GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q4 | 30.66 tok/s | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q4 | 30.57 tok/s | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q4 | 30.40 tok/s | 15GB |
| Qwen/Qwen3-14B-Base | Q8 | 29.98 tok/s | 14GB |
| deepseek-ai/DeepSeek-OCR | FP16 | 29.78 tok/s | 7GB |
| meta-llama/Llama-2-13b-chat-hf | Q8 | 29.65 tok/s | 13GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q4 | 29.62 tok/s | 15GB |
| openai/gpt-oss-safeguard-20b | Q4 | 29.47 tok/s | 11GB |
| NousResearch/Hermes-3-Llama-3.1-8B | Q8 | 28.85 tok/s | 9GB |
| nari-labs/Dia2-2B | FP16 | 28.74 tok/s | 5GB |
| google-bert/bert-base-uncased | FP16 | 28.73 tok/s | 1GB |
| OpenPipe/Qwen3-14B-Instruct | Q8 | 28.72 tok/s | 14GB |
| meta-llama/Llama-3.2-3B-Instruct | FP16 | 28.53 tok/s | 6GB |
| inference-net/Schematron-3B | FP16 | 28.47 tok/s | 6GB |
| ai-forever/ruGPT-3.5-13B | Q8 | 28.38 tok/s | 13GB |
| google/gemma-2b | FP16 | 28.25 tok/s | 4GB |
| tencent/HunyuanOCR | FP16 | 28.08 tok/s | 3GB |
| mistralai/Ministral-3-14B-Instruct-2512 | Q8 | 28.07 tok/s | 16GB |
| meta-llama/Llama-3.2-1B-Instruct | FP16 | 27.85 tok/s | 2GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | FP16 | 27.41 tok/s | 6GB |
| unsloth/gemma-3-1b-it | FP16 | 27.33 tok/s | 2GB |
| unsloth/Llama-3.2-3B-Instruct | FP16 | 27.30 tok/s | 6GB |
| bigcode/starcoder2-3b | FP16 | 27.29 tok/s | 6GB |
| meta-llama/Llama-Guard-3-1B | FP16 | 27.11 tok/s | 2GB |
| google/embeddinggemma-300m | FP16 | 27.02 tok/s | 1GB |
| facebook/sam3 | FP16 | 27.00 tok/s | 2GB |
| meta-llama/Llama-3.2-3B | FP16 | 26.96 tok/s | 6GB |
| ibm-granite/granite-3.3-2b-instruct | FP16 | 26.76 tok/s | 4GB |
| LiquidAI/LFM2-1.2B | FP16 | 26.55 tok/s | 4GB |
| Qwen/Qwen2.5-3B | FP16 | 26.50 tok/s | 6GB |
| apple/OpenELM-1_1B-Instruct | FP16 | 26.31 tok/s | 2GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | FP16 | 25.90 tok/s | 6GB |
| google-t5/t5-3b | FP16 | 25.81 tok/s | 6GB |
| allenai/OLMo-2-0425-1B | FP16 | 25.51 tok/s | 2GB |
| Qwen/Qwen2.5-3B-Instruct | FP16 | 25.11 tok/s | 6GB |
| meta-llama/Llama-3.2-1B | FP16 | 25.04 tok/s | 2GB |
| ibm-research/PowerMoE-3b | FP16 | 24.42 tok/s | 6GB |
| google/gemma-2-2b-it | FP16 | 24.32 tok/s | 4GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q8 | 24.28 tok/s | 20GB |
| rednote-hilab/dots.ocr | FP16 | 24.26 tok/s | 15GB |
| huggyllama/llama-7b | FP16 | 24.24 tok/s | 15GB |
| ibm-granite/granite-3.3-8b-instruct | FP16 | 24.21 tok/s | 17GB |
| Qwen/Qwen3-Embedding-0.6B | FP16 | 24.15 tok/s | 13GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | FP16 | 24.15 tok/s | 17GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q8 | 24.12 tok/s | 20GB |
| unsloth/Llama-3.2-1B-Instruct | FP16 | 24.09 tok/s | 2GB |
| meta-llama/Llama-Guard-3-8B | FP16 | 24.07 tok/s | 17GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | FP16 | 24.05 tok/s | 2GB |
| microsoft/Phi-3.5-vision-instruct | FP16 | 24.04 tok/s | 15GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | FP16 | 24.02 tok/s | 15GB |
| parler-tts/parler-tts-large-v1 | FP16 | 23.97 tok/s | 15GB |
| WeiboAI/VibeThinker-1.5B | FP16 | 23.89 tok/s | 4GB |
| google/gemma-3-1b-it | FP16 | 23.88 tok/s | 2GB |
| black-forest-labs/FLUX.1-dev | FP16 | 23.87 tok/s | 16GB |
| vikhyatk/moondream2 | FP16 | 23.82 tok/s | 15GB |
| openai-community/gpt2-xl | FP16 | 23.76 tok/s | 15GB |
| microsoft/phi-4 | FP16 | 23.70 tok/s | 15GB |
| Qwen/Qwen3-8B-Base | FP16 | 23.69 tok/s | 17GB |
| openai/gpt-oss-20b | Q8 | 23.59 tok/s | 20GB |
| unsloth/gpt-oss-20b-BF16 | Q8 | 23.58 tok/s | 20GB |
| Qwen/Qwen3-4B-Thinking-2507 | FP16 | 23.53 tok/s | 9GB |
| numind/NuExtract-1.5 | FP16 | 23.50 tok/s | 15GB |
| deepseek-ai/DeepSeek-R1 | FP16 | 23.44 tok/s | 15GB |
| HuggingFaceTB/SmolLM-135M | FP16 | 23.42 tok/s | 15GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | FP16 | 23.37 tok/s | 15GB |
| deepseek-ai/DeepSeek-R1-0528 | FP16 | 23.34 tok/s | 15GB |
| distilbert/distilgpt2 | FP16 | 23.32 tok/s | 15GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | FP16 | 23.32 tok/s | 9GB |
| Qwen/Qwen-Image-Edit-2509 | FP16 | 23.31 tok/s | 16GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | FP16 | 23.29 tok/s | 17GB |
| microsoft/Phi-3.5-mini-instruct | FP16 | 23.27 tok/s | 8GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | FP16 | 23.24 tok/s | 15GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | FP16 | 23.23 tok/s | 9GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q8 | 23.16 tok/s | 31GB |
| meta-llama/Llama-2-7b-chat-hf | FP16 | 23.15 tok/s | 15GB |
| meta-llama/Llama-3.2-3B-Instruct | FP16 | 23.13 tok/s | 7GB |
| Qwen/Qwen2.5-7B | FP16 | 22.99 tok/s | 15GB |
| allenai/Olmo-3-7B-Think | FP16 | 22.99 tok/s | 16GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | FP16 | 22.99 tok/s | 11GB |
| MiniMaxAI/MiniMax-M2 | FP16 | 22.97 tok/s | 15GB |
| Qwen/Qwen2-0.5B | FP16 | 22.96 tok/s | 11GB |
| llamafactory/tiny-random-Llama-3 | FP16 | 22.95 tok/s | 15GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q8 | 22.94 tok/s | 31GB |
| Qwen/Qwen3-Embedding-4B | FP16 | 22.92 tok/s | 9GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | FP16 | 22.91 tok/s | 17GB |
| Qwen/Qwen2.5-1.5B | FP16 | 22.87 tok/s | 11GB |
| Qwen/Qwen2-1.5B-Instruct | FP16 | 22.86 tok/s | 11GB |
| meta-llama/Meta-Llama-3-8B | FP16 | 22.86 tok/s | 17GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q8 | 22.77 tok/s | 31GB |
| Qwen/Qwen2.5-7B-Instruct | FP16 | 22.76 tok/s | 15GB |
| Qwen/Qwen2-0.5B-Instruct | FP16 | 22.72 tok/s | 11GB |
| Qwen/Qwen3-1.7B-Base | FP16 | 22.70 tok/s | 15GB |
| deepseek-ai/DeepSeek-V3-0324 | FP16 | 22.69 tok/s | 15GB |
| mistralai/Mistral-Small-Instruct-2409 | Q8 | 22.58 tok/s | 23GB |
| Qwen/Qwen2.5-0.5B | FP16 | 22.50 tok/s | 11GB |
| lmsys/vicuna-7b-v1.5 | FP16 | 22.45 tok/s | 15GB |
| openai-community/gpt2-large | FP16 | 22.44 tok/s | 15GB |
| deepseek-ai/DeepSeek-V3.1 | FP16 | 22.42 tok/s | 15GB |
| Gensyn/Qwen2.5-0.5B-Instruct | FP16 | 22.37 tok/s | 11GB |
| meta-llama/Meta-Llama-3-8B-Instruct | FP16 | 22.29 tok/s | 17GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q8 | 22.27 tok/s | 31GB |
| IlyaGusev/saiga_llama3_8b | FP16 | 22.26 tok/s | 17GB |
| baichuan-inc/Baichuan-M2-32B | Q4 | 22.25 tok/s | 16GB |
| BSC-LT/salamandraTA-7b-instruct | FP16 | 22.24 tok/s | 15GB |
| zai-org/GLM-4.6-FP8 | FP16 | 22.18 tok/s | 15GB |
| petals-team/StableBeluga2 | FP16 | 22.17 tok/s | 15GB |
| microsoft/Phi-3-mini-128k-instruct | FP16 | 22.17 tok/s | 15GB |
| Qwen/Qwen3-4B-Instruct-2507 | FP16 | 22.14 tok/s | 9GB |
| EleutherAI/gpt-neo-125m | FP16 | 22.12 tok/s | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q8 | 22.09 tok/s | 31GB |
| Qwen/Qwen3-4B-Base | FP16 | 22.06 tok/s | 9GB |
| mistralai/Mistral-7B-v0.1 | FP16 | 22.04 tok/s | 15GB |
| Qwen/Qwen3-1.7B | FP16 | 22.02 tok/s | 15GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | FP16 | 22.00 tok/s | 17GB |
| deepseek-ai/deepseek-coder-33b-instruct | Q4 | 22.00 tok/s | 17GB |
| Qwen/Qwen3-8B-FP8 | FP16 | 21.99 tok/s | 17GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | FP16 | 21.97 tok/s | 15GB |
| Qwen/Qwen3-30B-A3B | Q8 | 21.95 tok/s | 31GB |
| Qwen/Qwen2.5-0.5B-Instruct | FP16 | 21.93 tok/s | 11GB |
| Qwen/Qwen3-Reranker-0.6B | FP16 | 21.92 tok/s | 13GB |
| HuggingFaceTB/SmolLM2-135M | FP16 | 21.91 tok/s | 15GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | FP16 | 21.91 tok/s | 9GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q8 | 21.91 tok/s | 31GB |
| facebook/opt-125m | FP16 | 21.91 tok/s | 15GB |
| microsoft/DialoGPT-small | FP16 | 21.90 tok/s | 15GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | FP16 | 21.89 tok/s | 9GB |
| meta-llama/Meta-Llama-3-70B-Instruct | Q4 | 21.88 tok/s | 34GB |
| Qwen/Qwen2.5-Coder-1.5B | FP16 | 21.83 tok/s | 11GB |
| hmellor/tiny-random-LlamaForCausalLM | FP16 | 21.81 tok/s | 15GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | FP16 | 21.76 tok/s | 9GB |
| HuggingFaceH4/zephyr-7b-beta | FP16 | 21.75 tok/s | 15GB |
| Qwen/Qwen2.5-1.5B-Instruct | FP16 | 21.73 tok/s | 11GB |
| microsoft/Phi-3.5-mini-instruct | FP16 | 21.72 tok/s | 15GB |
| Tongyi-MAI/Z-Image-Turbo | FP16 | 21.72 tok/s | 16GB |
| deepseek-ai/DeepSeek-V3 | FP16 | 21.69 tok/s | 15GB |
| ibm-granite/granite-docling-258M | FP16 | 21.68 tok/s | 15GB |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | 21.64 tok/s | 34GB |
| openai-community/gpt2 | FP16 | 21.49 tok/s | 15GB |
| microsoft/DialoGPT-medium | FP16 | 21.38 tok/s | 15GB |
| google/gemma-3-270m-it | FP16 | 21.37 tok/s | 15GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | FP16 | 21.33 tok/s | 15GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q4 | 21.27 tok/s | 16GB |
| Qwen/Qwen3-Embedding-8B | FP16 | 21.20 tok/s | 17GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q8 | 21.19 tok/s | 31GB |
| microsoft/phi-2 | FP16 | 21.16 tok/s | 15GB |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q4 | 21.14 tok/s | 34GB |
| Qwen/Qwen3-8B | FP16 | 21.13 tok/s | 17GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | FP16 | 21.12 tok/s | 15GB |
| meta-llama/Llama-3.1-8B | FP16 | 21.06 tok/s | 17GB |
| liuhaotian/llava-v1.5-7b | FP16 | 21.06 tok/s | 15GB |
| mistralai/Mistral-7B-Instruct-v0.1 | FP16 | 21.04 tok/s | 15GB |
| sshleifer/tiny-gpt2 | FP16 | 21.03 tok/s | 15GB |
| Qwen/Qwen3-0.6B | FP16 | 21.02 tok/s | 13GB |
| dicta-il/dictalm2.0-instruct | FP16 | 21.02 tok/s | 15GB |
| microsoft/Phi-3-mini-4k-instruct | FP16 | 20.99 tok/s | 15GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | FP16 | 20.95 tok/s | 15GB |
| black-forest-labs/FLUX.2-dev | FP16 | 20.91 tok/s | 16GB |
| Qwen/Qwen2.5-7B-Instruct | FP16 | 20.91 tok/s | 16GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | FP16 | 20.90 tok/s | 17GB |
| Qwen/Qwen2.5-32B | Q4 | 20.87 tok/s | 16GB |
| google/gemma-2-27b-it | Q8 | 20.83 tok/s | 28GB |
| bigscience/bloomz-560m | FP16 | 20.82 tok/s | 15GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | FP16 | 20.80 tok/s | 11GB |
| Qwen/Qwen2.5-32B-Instruct | Q4 | 20.75 tok/s | 17GB |
| openai/gpt-oss-safeguard-20b | Q8 | 20.69 tok/s | 22GB |
| codellama/CodeLlama-34b-hf | Q4 | 20.69 tok/s | 17GB |
| mistralai/Mistral-7B-Instruct-v0.2 | FP16 | 20.66 tok/s | 15GB |
| meta-llama/Llama-3.1-8B-Instruct | FP16 | 20.66 tok/s | 17GB |
| swiss-ai/Apertus-8B-Instruct-2509 | FP16 | 20.51 tok/s | 17GB |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q4 | 20.42 tok/s | 34GB |
| GSAI-ML/LLaDA-8B-Instruct | FP16 | 20.38 tok/s | 17GB |
| moonshotai/Kimi-Linear-48B-A3B-Instruct | Q4 | 20.37 tok/s | 25GB |
| Qwen/Qwen3-32B | Q4 | 20.36 tok/s | 16GB |
| skt/kogpt2-base-v2 | FP16 | 20.31 tok/s | 15GB |
| microsoft/Phi-4-multimodal-instruct | FP16 | 20.29 tok/s | 15GB |
| EleutherAI/pythia-70m-deduped | FP16 | 20.28 tok/s | 15GB |
| microsoft/VibeVoice-1.5B | FP16 | 20.27 tok/s | 11GB |
| Qwen/Qwen2.5-Math-1.5B | FP16 | 20.14 tok/s | 11GB |
| zai-org/GLM-4.5-Air | FP16 | 20.14 tok/s | 15GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | FP16 | 20.13 tok/s | 17GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q8 | 20.12 tok/s | 31GB |
| Qwen/Qwen3-0.6B-Base | FP16 | 20.09 tok/s | 13GB |
| Qwen/Qwen2-7B-Instruct | FP16 | 20.09 tok/s | 15GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | FP16 | 20.08 tok/s | 15GB |
| Qwen/Qwen3-4B | FP16 | 20.07 tok/s | 9GB |
| meta-llama/Llama-2-7b-hf | FP16 | 20.03 tok/s | 15GB |
| Qwen/QwQ-32B-Preview | Q4 | 20.00 tok/s | 17GB |
| microsoft/Phi-4-mini-instruct | FP16 | 19.98 tok/s | 15GB |
| GSAI-ML/LLaDA-8B-Base | FP16 | 19.97 tok/s | 17GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q4 | 19.96 tok/s | 16GB |
| tencent/HunyuanVideo-1.5 | FP16 | 19.96 tok/s | 16GB |
| rinna/japanese-gpt-neox-small | FP16 | 19.93 tok/s | 15GB |
| 01-ai/Yi-1.5-34B-Chat | Q4 | 19.93 tok/s | 18GB |
| openai-community/gpt2-medium | FP16 | 19.92 tok/s | 15GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q4 | 19.39 tok/s | 17GB |
| Qwen/Qwen2.5-32B-Instruct | Q4 | 19.38 tok/s | 16GB |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | 18.99 tok/s | 34GB |
| deepseek-ai/DeepSeek-V2.5 | Q4 | 18.98 tok/s | 328GB |
| Qwen/Qwen2.5-Coder-32B-Instruct | Q4 | 18.96 tok/s | 17GB |
| moonshotai/Kimi-K2-Thinking | Q4 | 18.66 tok/s | 489GB |
| Qwen/Qwen2.5-14B | FP16 | 17.98 tok/s | 29GB |
| NousResearch/Hermes-3-Llama-3.1-8B | FP16 | 17.90 tok/s | 17GB |
| EssentialAI/rnj-1 | FP16 | 17.87 tok/s | 19GB |
| Qwen/Qwen2.5-14B-Instruct | FP16 | 17.85 tok/s | 29GB |
| Qwen/Qwen2.5-14B-Instruct | FP16 | 17.44 tok/s | 30GB |
| meta-llama/Llama-2-13b-chat-hf | FP16 | 17.26 tok/s | 27GB |
| OpenPipe/Qwen3-14B-Instruct | FP16 | 17.17 tok/s | 29GB |
| Qwen/Qwen3-14B | FP16 | 16.67 tok/s | 29GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | FP16 | 16.64 tok/s | 19GB |
| google/gemma-2-9b-it | FP16 | 16.57 tok/s | 20GB |
| ai-forever/ruGPT-3.5-13B | FP16 | 15.99 tok/s | 27GB |
| moonshotai/Kimi-Linear-48B-A3B-Instruct | Q8 | 15.64 tok/s | 50GB |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q8 | 15.64 tok/s | 68GB |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q8 | 15.47 tok/s | 68GB |
| microsoft/Phi-3-medium-128k-instruct | FP16 | 15.37 tok/s | 29GB |
| Qwen/Qwen3-14B-Base | FP16 | 15.34 tok/s | 29GB |
| 01-ai/Yi-1.5-34B-Chat | Q8 | 15.32 tok/s | 35GB |
| meta-llama/Llama-3.1-8B-Instruct | FP16 | 15.30 tok/s | 17GB |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | Q4 | 15.21 tok/s | 69GB |
| mistralai/Ministral-3-14B-Instruct-2512 | FP16 | 15.15 tok/s | 32GB |
| deepseek-ai/deepseek-coder-33b-instruct | Q8 | 14.98 tok/s | 34GB |
| baichuan-inc/Baichuan-M2-32B | Q8 | 14.89 tok/s | 33GB |
| Qwen/QwQ-32B-Preview | Q8 | 14.84 tok/s | 34GB |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | 14.76 tok/s | 68GB |
| deepseek-ai/DeepSeek-V2.5 | Q8 | 14.74 tok/s | 656GB |
| Qwen/Qwen2.5-32B | Q8 | 14.72 tok/s | 33GB |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | 14.35 tok/s | 68GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q8 | 14.06 tok/s | 33GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q8 | 13.93 tok/s | 35GB |
| meta-llama/Meta-Llama-3-70B-Instruct | Q8 | 13.75 tok/s | 68GB |
| Qwen/Qwen2.5-Coder-32B-Instruct | Q8 | 13.55 tok/s | 34GB |
| Qwen/Qwen2.5-32B-Instruct | Q8 | 13.27 tok/s | 33GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q8 | 13.23 tok/s | 33GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | FP16 | 13.04 tok/s | 41GB |
| codellama/CodeLlama-34b-hf | Q8 | 12.92 tok/s | 35GB |
| Qwen/Qwen2.5-32B-Instruct | Q8 | 12.90 tok/s | 34GB |
| moonshotai/Kimi-K2-Thinking | Q8 | 12.90 tok/s | 978GB |
| Qwen/Qwen3-32B | Q8 | 12.86 tok/s | 33GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | FP16 | 12.82 tok/s | 61GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | FP16 | 12.77 tok/s | 61GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | FP16 | 12.73 tok/s | 61GB |
| Qwen/Qwen2.5-72B-Instruct | Q4 | 12.72 tok/s | 36GB |
| openai/gpt-oss-safeguard-20b | FP16 | 12.71 tok/s | 44GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q4 | 12.71 tok/s | 39GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | FP16 | 12.62 tok/s | 61GB |
| Qwen/Qwen2.5-Math-72B-Instruct | Q4 | 12.54 tok/s | 36GB |
| mistralai/Mistral-Small-Instruct-2409 | FP16 | 12.43 tok/s | 46GB |
| openai/gpt-oss-120b | Q4 | 12.10 tok/s | 59GB |
| openai/gpt-oss-20b | FP16 | 12.01 tok/s | 41GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | FP16 | 12.01 tok/s | 41GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q4 | 11.97 tok/s | 44GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q4 | 11.89 tok/s | 39GB |
| Qwen/Qwen3-30B-A3B | FP16 | 11.89 tok/s | 61GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q4 | 11.80 tok/s | 39GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | FP16 | 11.80 tok/s | 61GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q4 | 11.66 tok/s | 39GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | FP16 | 11.66 tok/s | 61GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | FP16 | 11.64 tok/s | 61GB |
| NousResearch/Hermes-3-Llama-3.1-70B | Q4 | 11.62 tok/s | 34GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | FP16 | 11.53 tok/s | 61GB |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | 11.51 tok/s | 34GB |
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | Q4 | 11.42 tok/s | 34GB |
| Qwen/Qwen2.5-72B-Instruct | Q4 | 11.37 tok/s | 35GB |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | 11.29 tok/s | 34GB |
| google/gemma-2-27b-it | FP16 | 11.16 tok/s | 56GB |
| unsloth/gpt-oss-20b-BF16 | FP16 | 11.05 tok/s | 41GB |
| mistralai/Mistral-Large-Instruct-2411 | Q4 | 10.98 tok/s | 60GB |
| AI-MO/Kimina-Prover-72B | Q4 | 10.49 tok/s | 35GB |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | Q8 | 9.72 tok/s | 138GB |
| deepseek-ai/DeepSeek-Math-V2 | Q4 | 9.10 tok/s | 383GB |
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | Q8 | 8.82 tok/s | 69GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q8 | 8.75 tok/s | 78GB |
| Qwen/Qwen2.5-72B-Instruct | Q8 | 8.73 tok/s | 71GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q8 | 8.70 tok/s | 88GB |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | Q4 | 8.55 tok/s | 115GB |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | FP16 | 8.48 tok/s | 137GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q8 | 8.40 tok/s | 78GB |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | 8.27 tok/s | 69GB |
| baichuan-inc/Baichuan-M2-32B | FP16 | 8.23 tok/s | 66GB |
| NousResearch/Hermes-3-Llama-3.1-70B | Q8 | 8.18 tok/s | 69GB |
| Qwen/Qwen2.5-Coder-32B-Instruct | FP16 | 8.14 tok/s | 67GB |
| Qwen/Qwen2.5-Math-72B-Instruct | Q8 | 8.10 tok/s | 71GB |
| Qwen/Qwen2.5-32B-Instruct | FP16 | 8.10 tok/s | 66GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | FP16 | 8.05 tok/s | 70GB |
| AI-MO/Kimina-Prover-72B | Q8 | 8.04 tok/s | 70GB |
| Qwen/Qwen2.5-32B-Instruct | FP16 | 8.01 tok/s | 67GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q8 | 8.01 tok/s | 78GB |
| mistralai/Mistral-Large-Instruct-2411 | Q8 | 7.89 tok/s | 120GB |
| moonshotai/Kimi-Linear-48B-A3B-Instruct | FP16 | 7.88 tok/s | 101GB |
| meta-llama/Llama-3.1-70B-Instruct | FP16 | 7.79 tok/s | 137GB |
| Qwen/QwQ-32B-Preview | FP16 | 7.76 tok/s | 67GB |
| meta-llama/Llama-3.3-70B-Instruct | FP16 | 7.71 tok/s | 137GB |
| openai/gpt-oss-120b | Q8 | 7.69 tok/s | 117GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q8 | 7.68 tok/s | 78GB |
| Qwen/Qwen2.5-72B-Instruct | Q8 | 7.67 tok/s | 70GB |
| 01-ai/Yi-1.5-34B-Chat | FP16 | 7.58 tok/s | 70GB |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | FP16 | 7.57 tok/s | 137GB |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | 7.55 tok/s | 69GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | FP16 | 7.55 tok/s | 66GB |
| meta-llama/Meta-Llama-3-70B-Instruct | FP16 | 7.53 tok/s | 137GB |
| deepseek-ai/DeepSeek-V2.5 | FP16 | 7.49 tok/s | 1312GB |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q4 | 7.40 tok/s | 378GB |
| moonshotai/Kimi-K2-Thinking | FP16 | 7.39 tok/s | 1956GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | FP16 | 7.38 tok/sEstimated Auto-generated benchmark | 66GB |
| Qwen/Qwen3-235B-A22B | Q4 | 7.37 tok/sEstimated Auto-generated benchmark | 115GB |
| deepseek-ai/deepseek-coder-33b-instruct | FP16 | 7.36 tok/sEstimated Auto-generated benchmark | 68GB |
| codellama/CodeLlama-34b-hf | FP16 | 7.32 tok/sEstimated Auto-generated benchmark | 70GB |
| Qwen/Qwen2.5-32B | FP16 | 7.18 tok/sEstimated Auto-generated benchmark | 66GB |
| Qwen/Qwen3-32B | FP16 | 7.00 tok/sEstimated Auto-generated benchmark | 66GB |
| MiniMaxAI/MiniMax-M1-40k | Q4 | 6.73 tok/sEstimated Auto-generated benchmark | 255GB |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | Q8 | 6.52 tok/sEstimated Auto-generated benchmark | 231GB |
| MiniMaxAI/MiniMax-VL-01 | Q4 | 6.50 tok/sEstimated Auto-generated benchmark | 256GB |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | FP16 | 5.65 tok/sEstimated Auto-generated benchmark | 275GB |
| deepseek-ai/DeepSeek-Math-V2 | Q8 | 5.53 tok/sEstimated Auto-generated benchmark | 766GB |
| MiniMaxAI/MiniMax-VL-01 | Q8 | 4.97 tok/sEstimated Auto-generated benchmark | 511GB |
| Qwen/Qwen3-235B-A22B | Q8 | 4.94 tok/sEstimated Auto-generated benchmark | 230GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | FP16 | 4.84 tok/sEstimated Auto-generated benchmark | 176GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | FP16 | 4.80 tok/sEstimated Auto-generated benchmark | 156GB |
| meta-llama/Llama-3.1-70B-Instruct | FP16 | 4.79 tok/sEstimated Auto-generated benchmark | 138GB |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q8 | 4.78 tok/sEstimated Auto-generated benchmark | 755GB |
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | FP16 | 4.78 tok/sEstimated Auto-generated benchmark | 138GB |
| Qwen/Qwen2.5-72B-Instruct | FP16 | 4.59 tok/sEstimated Auto-generated benchmark | 142GB |
| MiniMaxAI/MiniMax-M1-40k | Q8 | 4.39 tok/sEstimated Auto-generated benchmark | 510GB |
| Qwen/Qwen2.5-72B-Instruct | FP16 | 4.37 tok/sEstimated Auto-generated benchmark | 141GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | FP16 | 4.36 tok/sEstimated Auto-generated benchmark | 156GB |
| mistralai/Mistral-Large-Instruct-2411 | FP16 | 4.31 tok/sEstimated Auto-generated benchmark | 240GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking | FP16 | 4.22 tok/sEstimated Auto-generated benchmark | 156GB |
| AI-MO/Kimina-Prover-72B | FP16 | 4.21 tok/sEstimated Auto-generated benchmark | 141GB |
| meta-llama/Llama-3.3-70B-Instruct | FP16 | 4.19 tok/sEstimated Auto-generated benchmark | 138GB |
| NousResearch/Hermes-3-Llama-3.1-70B | FP16 | 4.09 tok/sEstimated Auto-generated benchmark | 138GB |
| openai/gpt-oss-120b | FP16 | 4.02 tok/sEstimated Auto-generated benchmark | 235GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct | FP16 | 4.00 tok/sEstimated Auto-generated benchmark | 156GB |
| Qwen/Qwen2.5-Math-72B-Instruct | FP16 | 3.98 tok/sEstimated Auto-generated benchmark | 142GB |
| deepseek-ai/DeepSeek-Math-V2 | FP16 | 3.28 tok/sEstimated Auto-generated benchmark | 1532GB |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | FP16 | 3.24 tok/sEstimated Auto-generated benchmark | 461GB |
| MiniMaxAI/MiniMax-VL-01 | FP16 | 2.79 tok/sEstimated Auto-generated benchmark | 1021GB |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | FP16 | 2.74 tok/sEstimated Auto-generated benchmark | 1509GB |
| MiniMaxAI/MiniMax-M1-40k | FP16 | 2.70 tok/sEstimated Auto-generated benchmark | 1020GB |
| Qwen/Qwen3-235B-A22B | FP16 | 2.69 tok/sEstimated Auto-generated benchmark | 460GB |
Note: All throughput figures above are auto-generated performance estimates, not measured results. Real-world numbers may vary.
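These VRAM figures track a simple rule of thumb: model weights take roughly 0.5 bytes per parameter at Q4, 1 byte at Q8, and 2 bytes at FP16, plus a little runtime overhead. The sketch below reproduces that arithmetic; the 5% overhead factor is our assumption, not a figure taken from the benchmark data.

```python
# Back-of-the-envelope VRAM estimator mirroring the table's pattern.
# Assumed bytes-per-parameter for each quantization level; the 1.05
# overhead factor (KV cache, runtime buffers) is an assumption.
BYTES_PER_PARAM = {"Q4": 0.5, "Q8": 1.0, "FP16": 2.0}

def estimate_vram_gb(params_billion: float, quant: str, overhead: float = 1.05) -> float:
    """Weights-only footprint scaled by a small overhead factor."""
    return params_billion * BYTES_PER_PARAM[quant] * overhead

# Sanity check: a 70B model comes out in the mid-30 GB range at Q4 and
# around 140-150 GB at FP16, in line with the 70B rows above.
for quant in ("Q4", "Q8", "FP16"):
    print(f"70B @ {quant}: ~{estimate_vram_gb(70, quant):.0f} GB")
```

The compatibility table below runs the same estimates against this card's 12 GB budget.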
| Model | Quantization | Verdict | Estimated speed | VRAM needed |
|---|---|---|---|---|
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q4 | Not supported | 7.40 tok/s | 378GB (have 12GB) |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q8 | Not supported | 4.78 tok/s | 755GB (have 12GB) |
| EssentialAI/rnj-1 | FP16 | Not supported | 17.87 tok/s | 19GB (have 12GB) |
| EssentialAI/rnj-1 | Q8 | Fits comfortably | 31.27 tok/s | 10GB (have 12GB) |
| EssentialAI/rnj-1 | Q4 | Fits comfortably | 42.44 tok/s | 5GB (have 12GB) |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | FP16 | Not supported | 2.74 tok/s | 1509GB (have 12GB) |
| openai/gpt-oss-20b | FP16 | Not supported | 11.70 tok/s | 41GB (have 12GB) |
| openai/gpt-oss-20b | Q8 | Not supported | 22.05 tok/s | 20GB (have 12GB) |
| openai-community/gpt2 | FP16 | Not supported | 21.47 tok/s | 15GB (have 12GB) |
| Qwen/Qwen3-Embedding-0.6B | FP16 | Not supported | 19.97 tok/s | 13GB (have 12GB) |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | Fits comfortably | 63.16 tok/s | 3GB (have 12GB) |
| Qwen/Qwen2.5-1.5B-Instruct | FP16 | Fits (tight) | 21.46 tok/s | 11GB (have 12GB) |
| facebook/opt-125m | Q4 | Fits comfortably | 55.65 tok/s | 4GB (have 12GB) |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | Fits comfortably | 44.83 tok/s | 1GB (have 12GB) |
| Qwen/Qwen3-Embedding-0.6B | Q8 | Fits comfortably | 37.70 tok/s | 6GB (have 12GB) |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 41.88 tok/s | 9GB (have 12GB) |
| Qwen/Qwen3-4B-Instruct-2507 | Q4 | Fits comfortably | 54.30 tok/s | 2GB (have 12GB) |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | FP16 | Not supported | 20.66 tok/s | 15GB (have 12GB) |
| Qwen/Qwen3-4B-Instruct-2507 | Q8 | Fits comfortably | 43.66 tok/s | 4GB (have 12GB) |
| Qwen/Qwen3-4B-Instruct-2507 | FP16 | Fits comfortably | 22.81 tok/s | 9GB (have 12GB) |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | Fits comfortably | 68.96 tok/s | 1GB (have 12GB) |
| openai/gpt-oss-120b | Q8 | Not supported | 7.37 tok/s | 117GB (have 12GB) |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q8 | Fits comfortably | 52.11 tok/s | 3GB (have 12GB) |
| mistralai/Mistral-7B-Instruct-v0.2 | FP16 | Not supported | 19.93 tok/s | 15GB (have 12GB) |
| Qwen/Qwen3-8B | Q4 | Fits comfortably | 61.73 tok/s | 4GB (have 12GB) |
| inference-net/Schematron-3B | Q8 | Fits comfortably | 47.51 tok/s | 3GB (have 12GB) |
| inference-net/Schematron-3B | FP16 | Fits comfortably | 28.57 tok/s | 6GB (have 12GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q4 | Not supported | 19.90 tok/s | 16GB (have 12GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q8 | Not supported | 13.89 tok/s | 33GB (have 12GB) |
| petals-team/StableBeluga2 | FP16 | Not supported | 20.58 tok/s | 15GB (have 12GB) |
| google/gemma-3-1b-it | Q8 | Fits comfortably | 46.12 tok/s | 1GB (have 12GB) |
| meta-llama/Meta-Llama-3-8B | FP16 | Not supported | 22.06 tok/s | 17GB (have 12GB) |
| meta-llama/Meta-Llama-3-8B | Q8 | Fits comfortably | 39.90 tok/s | 9GB (have 12GB) |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | Fits comfortably | 42.32 tok/s | 7GB (have 12GB) |
| Qwen/Qwen2.5-7B | Q4 | Fits comfortably | 57.62 tok/s | 4GB (have 12GB) |
| Qwen/Qwen2.5-7B | Q8 | Fits comfortably | 41.27 tok/s | 7GB (have 12GB) |
| allenai/OLMo-2-0425-1B | Q4 | Fits comfortably | 70.55 tok/s | 1GB (have 12GB) |
| allenai/OLMo-2-0425-1B | Q8 | Fits comfortably | 47.76 tok/s | 1GB (have 12GB) |
| openai-community/gpt2-large | Q8 | Fits comfortably | 39.01 tok/s | 7GB (have 12GB) |
| openai-community/gpt2-large | FP16 | Not supported | 23.91 tok/s | 15GB (have 12GB) |
| Qwen/Qwen3-1.7B | Q4 | Fits comfortably | 63.01 tok/s | 4GB (have 12GB) |
| Qwen/Qwen3-Reranker-0.6B | Q4 | Fits comfortably | 56.53 tok/s | 3GB (have 12GB) |
| Qwen/Qwen3-Reranker-0.6B | Q8 | Fits comfortably | 44.47 tok/s | 6GB (have 12GB) |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q4 | Not supported | 20.20 tok/s | 17GB (have 12GB) |
| Qwen/Qwen3-Reranker-0.6B | FP16 | Not supported | 23.28 tok/s | 13GB (have 12GB) |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | Fits comfortably | 38.88 tok/s | 9GB (have 12GB) |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q8 | Not supported | 14.66 tok/s | 35GB (have 12GB) |
| dphn/dolphin-2.9.1-yi-1.5-34b | FP16 | Not supported | 7.73 tok/s | 70GB (have 12GB) |
| openai/gpt-oss-20b | Q4 | Fits comfortably | 32.31 tok/s | 10GB (have 12GB) |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | Fits comfortably | 42.47 tok/s | 4GB (have 12GB) |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | FP16 | Fits comfortably | 22.35 tok/s | 9GB (have 12GB) |
| Qwen/Qwen2.5-1.5B | Q4 | Fits comfortably | 58.31 tok/s | 3GB (have 12GB) |
| Qwen/Qwen2.5-1.5B | FP16 | Fits (tight) | 23.09 tok/s | 11GB (have 12GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q8 | Fits comfortably | 43.49 tok/s | 9GB (have 12GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | FP16 | Not supported | 22.81 tok/s | 17GB (have 12GB) |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | Not supported | 13.72 tok/s | 68GB (have 12GB) |
| meta-llama/Llama-3.3-70B-Instruct | FP16 | Not supported | 7.57 tok/s | 137GB (have 12GB) |
| Qwen/Qwen3-Embedding-8B | Q4 | Fits comfortably | 63.06 tok/s | 4GB (have 12GB) |
| Qwen/Qwen3-14B | FP16 | Not supported | 16.95 tok/s | 29GB (have 12GB) |
| Qwen/Qwen2.5-0.5B | Q4 | Fits comfortably | 60.14 tok/s | 3GB (have 12GB) |
| Qwen/Qwen2.5-0.5B | Q8 | Fits comfortably | 40.97 tok/s | 5GB (have 12GB) |
| Qwen/Qwen2.5-3B-Instruct | Q4 | Fits comfortably | 72.84 tok/s | 2GB (have 12GB) |
| microsoft/phi-2 | FP16 | Not supported | 21.01 tok/s | 15GB (have 12GB) |
| microsoft/phi-2 | Q8 | Fits comfortably | 39.28 tok/s | 7GB (have 12GB) |
| meta-llama/Llama-3.2-1B | Q8 | Fits comfortably | 47.73 tok/s | 1GB (have 12GB) |
| microsoft/phi-2 | Q4 | Fits comfortably | 59.21 tok/s | 4GB (have 12GB) |
| Qwen/Qwen2.5-3B-Instruct | Q8 | Fits comfortably | 44.34 tok/s | 3GB (have 12GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | Fits comfortably | 63.09 tok/s | 4GB (have 12GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | Fits comfortably | 40.39 tok/s | 7GB (have 12GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | FP16 | Not supported | 21.18 tok/s | 15GB (have 12GB) |
| HuggingFaceTB/SmolLM2-135M | Q4 | Fits comfortably | 53.54 tok/s | 4GB (have 12GB) |
| HuggingFaceTB/SmolLM2-135M | Q8 | Fits comfortably | 43.94 tok/s | 7GB (have 12GB) |
| zai-org/GLM-4.6-FP8 | Q4 | Fits comfortably | 54.59 tok/s | 4GB (have 12GB) |
| zai-org/GLM-4.6-FP8 | Q8 | Fits comfortably | 38.90 tok/s | 7GB (have 12GB) |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | Fits comfortably | 47.28 tok/s | 3GB (have 12GB) |
| microsoft/phi-4 | Q4 | Fits comfortably | 58.52 tok/s | 4GB (have 12GB) |
| Qwen/Qwen2.5-7B | FP16 | Not supported | 23.44 tok/s | 15GB (have 12GB) |
| meta-llama/Meta-Llama-3-8B-Instruct | FP16 | Not supported | 23.55 tok/s | 17GB (have 12GB) |
| microsoft/phi-4 | Q8 | Fits comfortably | 43.34 tok/s | 7GB (have 12GB) |
| microsoft/phi-4 | FP16 | Not supported | 22.94 tok/s | 15GB (have 12GB) |
| deepseek-ai/DeepSeek-V3.1 | Q4 | Fits comfortably | 62.65 tok/s | 4GB (have 12GB) |
| deepseek-ai/DeepSeek-V3.1 | Q8 | Fits comfortably | 37.30 tok/s | 7GB (have 12GB) |
| deepseek-ai/DeepSeek-V3.1 | FP16 | Not supported | 21.89 tok/s | 15GB (have 12GB) |
| mistralai/Mistral-7B-Instruct-v0.1 | Q4 | Fits comfortably | 59.18 tok/s | 4GB (have 12GB) |
| mistralai/Mistral-7B-Instruct-v0.1 | Q8 | Fits comfortably | 39.23 tok/s | 7GB (have 12GB) |
| mistralai/Mistral-7B-Instruct-v0.1 | FP16 | Not supported | 21.37 tok/s | 15GB (have 12GB) |
| meta-llama/Meta-Llama-3-70B-Instruct | FP16 | Not supported | 8.15 tok/s | 137GB (have 12GB) |
| unsloth/gpt-oss-20b-BF16 | Q4 | Fits comfortably | 33.41 tok/s | 10GB (have 12GB) |
| unsloth/gpt-oss-20b-BF16 | Q8 | Not supported | 22.27 tok/s | 20GB (have 12GB) |
| unsloth/gpt-oss-20b-BF16 | FP16 | Not supported | 11.86 tok/s | 41GB (have 12GB) |
| HuggingFaceTB/SmolLM-135M | FP16 | Not supported | 23.44 tok/s | 15GB (have 12GB) |
| Qwen/Qwen2.5-Math-1.5B | Q4 | Fits comfortably | 60.43 tok/s | 3GB (have 12GB) |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q8 | Fits comfortably | 43.35 tok/s | 7GB (have 12GB) |
| Qwen/Qwen3-8B-Base | Q4 | Fits comfortably | 57.96 tok/s | 4GB (have 12GB) |
| Qwen/Qwen3-8B-Base | FP16 | Not supported | 22.57 tok/s | 17GB (have 12GB) |
| microsoft/Phi-4-multimodal-instruct | Q4 | Fits comfortably | 56.79 tok/s | 4GB (have 12GB) |
| Qwen/Qwen3-30B-A3B | FP16 | Not supported | 12.13 tok/s | 61GB (have 12GB) |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q8 | Fits comfortably | 38.92 tok/s | 7GB (have 12GB) |
| Qwen/Qwen2.5-14B-Instruct | Q4 | Fits comfortably | 39.83 tok/s | 8GB (have 12GB) |
| Qwen/Qwen2.5-0.5B | FP16 | Fits (tight) | 24.11 tok/s | 11GB (have 12GB) |
Note: All speeds above are auto-generated performance estimates, not measured results. Real-world numbers may vary.
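The verdicts follow mechanically from comparing each row's VRAM estimate against the card's 12 GB. Below is a minimal sketch of that decision rule; the 90% "tight" threshold is an assumption inferred from the table (11 GB needed on a 12 GB card shows as "Fits (tight)"), and the site's actual cutoff may differ.

```python
# A minimal sketch of the fit verdicts used in the table above.
# The 0.9 "tight" threshold is an inferred assumption, not the
# site's documented logic.

def fit_verdict(vram_needed_gb: float, vram_have_gb: float = 12.0) -> str:
    if vram_needed_gb > vram_have_gb:
        return "Not supported"
    if vram_needed_gb / vram_have_gb > 0.9:
        return "Fits (tight)"
    return "Fits comfortably"

print(fit_verdict(5))   # Fits comfortably (EssentialAI/rnj-1 @ Q4)
print(fit_verdict(11))  # Fits (tight)     (Qwen2.5-1.5B @ FP16)
print(fit_verdict(41))  # Not supported    (gpt-oss-20b @ FP16)
```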
Data-backed answers pulled from community benchmarks, manufacturer specs, and live pricing.
Even with modest specs, a 12 GB RTX 3060 can drive Q8-quantized 7B models at over 60 tokens/sec, fast enough for iterative coding and agent workloads.
Source: Reddit – /r/LocalLLaMA (l6nfptd)
One builder running three RTX 3060 cards reports Gemma 3 27B Q4 at ~15 tok/sec, Mistral 24B Q4 at ~18 tok/sec, and DeepSeek R1 32B Q4 at ~20 tok/sec via Ollama.
Source: Reddit – /r/LocalLLaMA (mo6ttds)
Doubling up on GPUs does not double throughput: 2× RTX 3060 was projected to hit ~29 tok/sec on DeepSeek R1 32B (16K context), but real benchmarks landed closer to 14 tok/sec.
Source: Reddit – /r/LocalLLaMA (mq781cj)
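Putting numbers on that gap: the projection assumed near-linear scaling, while the measured result realized roughly half of it. A toy calculation using only the figures from that report; the 50% planning discount is our rule of thumb, not a universal constant.

```python
# Quantifying the gap in the report above: linear scaling predicted
# ~29 tok/s for 2x RTX 3060 on DeepSeek R1 32B Q4; reality was ~14.
projected_tps = 29.0                      # naive linear-scaling estimate
observed_tps = 14.0                       # community benchmark result
efficiency = observed_tps / projected_tps
print(f"Realized {efficiency:.0%} of projection")  # ~48%

# A cautious planning rule (our assumption, not a measured constant):
# discount multi-GPU throughput projections by about half.
def plan_multi_gpu_tps(projected: float, discount: float = 0.5) -> float:
    return projected * discount
```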
A dual-Xeon workstation without GPU offload managed only ~1.68 tok/sec on DeepSeek R1 Q4, which shows why even a single 3060 is a major upgrade.
Source: Reddit – /r/LocalLLaMA (mm9ladj)
The RTX 3060 12 GB draws 170 W under load, uses an 8-pin PCIe power connector, and NVIDIA recommends a 550 W PSU. As of November 2025, the card sold for around $329 on Amazon.
Source: TechPowerUp – RTX 3060 Specs
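The 550 W recommendation is easy to sanity-check: add the GPU's board power to a typical platform budget and leave headroom for transients. A rough sketch; the 200 W platform figure and 30% margin are our assumptions for a mainstream desktop, not TechPowerUp numbers.

```python
# Rough PSU sizing check for a single-3060 build. The platform budget
# and transient margin below are assumptions, not figures from the source.
gpu_load_w = 170                 # RTX 3060 board power under load
platform_w = 200                 # assumed CPU, board, drives, fans
margin = 1.3                     # headroom for power spikes
suggested = (gpu_load_w + platform_w) * margin
print(f"Suggested PSU: ~{suggested:.0f} W")  # ~481 W, so the 550 W tier fits
```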
Explore how RTX 3070 stacks up for local inference workloads.
Explore how RTX 4060 Ti 16GB stacks up for local inference workloads.
Explore how RX 6800 XT stacks up for local inference workloads.
Explore how RTX 3080 stacks up for local inference workloads.
Explore how RTX 4070 stacks up for local inference workloads.