Quick Answer: The RTX 3080 offers 10GB of VRAM and starts around $449.99. It delivers an estimated 165 tokens/sec on TinyLlama/TinyLlama-1.1B-Chat-v1.0 at Q4, and typically draws 320W under load.
This GPU offers reliable throughput for local AI workloads. Pair it with a quantization that fits your target model into 10GB of VRAM (see the sketch below), and monitor prices below to catch the best deal.
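To gauge what fits before downloading anything, you can approximate a model's VRAM footprint from its parameter count and quantization width. Below is a minimal Python sketch; the `estimate_vram_gb` helper and its 1.2× overhead factor for KV cache and activations are illustrative assumptions, not the estimator used for the table on this page.

```python
# Rule of thumb: quantized weights take ~params_billions * bits / 8 GB,
# plus headroom for KV cache and activations (the 1.2x factor is an assumption).
def estimate_vram_gb(params_billions: float, bits: int, overhead: float = 1.2) -> float:
    weight_gb = params_billions * bits / 8  # 1B params at 8-bit ~= 1 GB of weights
    return weight_gb * overhead

RTX_3080_VRAM_GB = 10

for name, params_b in [("Llama-3.2-3B", 3.0), ("Llama-3.1-8B", 8.0), ("Qwen2.5-14B", 14.0)]:
    for bits, label in [(4, "Q4"), (8, "Q8"), (16, "FP16")]:
        need = estimate_vram_gb(params_b, bits)
        verdict = "fits" if need <= RTX_3080_VRAM_GB else "too big"
        print(f"{name} {label}: ~{need:.1f} GB -> {verdict} on a 10GB RTX 3080")
```

By this rule of thumb an 8B model fits at Q4 or Q8 but not at FP16, which matches the pattern in the table below.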
Buy directly on Amazon with fast shipping and reliable customer service.
All throughput figures below are estimates from an auto-generated benchmark, not measured results.

| Model | Quantization | Tokens/sec (estimated) | VRAM used |
|---|---|---|---|
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | 164.74 tok/s | 1GB |
| google/embeddinggemma-300m | Q4 | 163.14 tok/s | 1GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 161.87 tok/s | 2GB |
| allenai/OLMo-2-0425-1B | Q4 | 161.13 tok/s | 1GB |
| google/gemma-2b | Q4 | 159.35 tok/s | 1GB |
| unsloth/Llama-3.2-1B-Instruct | Q4 | 158.97 tok/s | 1GB |
| deepseek-ai/DeepSeek-OCR | Q4 | 157.43 tok/s | 2GB |
| facebook/sam3 | Q4 | 156.75 tok/s | 1GB |
| meta-llama/Llama-Guard-3-1B | Q4 | 156.10 tok/s | 1GB |
| apple/OpenELM-1_1B-Instruct | Q4 | 156.05 tok/s | 1GB |
| nari-labs/Dia2-2B | Q4 | 155.41 tok/s | 2GB |
| ibm-research/PowerMoE-3b | Q4 | 153.89 tok/s | 2GB |
| unsloth/gemma-3-1b-it | Q4 | 152.31 tok/s | 1GB |
| LiquidAI/LFM2-1.2B | Q4 | 152.25 tok/s | 1GB |
| ibm-granite/granite-3.3-2b-instruct | Q4 | 151.80 tok/s | 1GB |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | 151.63 tok/s | 1GB |
| bigcode/starcoder2-3b | Q4 | 151.03 tok/s | 2GB |
| WeiboAI/VibeThinker-1.5B | Q4 | 150.00 tok/s | 1GB |
| meta-llama/Llama-3.2-3B | Q4 | 148.72 tok/s | 2GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | 147.96 tok/s | 2GB |
| tencent/HunyuanOCR | Q4 | 147.50 tok/s | 1GB |
| google/gemma-3-1b-it | Q4 | 146.75 tok/s | 1GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | 145.28 tok/s | 2GB |
| google-bert/bert-base-uncased | Q4 | 145.28 tok/s | 1GB |
| Qwen/Qwen2.5-3B-Instruct | Q4 | 145.00 tok/s | 2GB |
| meta-llama/Llama-3.2-1B | Q4 | 143.40 tok/s | 1GB |
| google-t5/t5-3b | Q4 | 142.55 tok/s | 2GB |
| inference-net/Schematron-3B | Q4 | 141.26 tok/s | 2GB |
| google/gemma-2-2b-it | Q4 | 140.20 tok/s | 1GB |
| Qwen/Qwen2.5-3B | Q4 | 139.13 tok/s | 2GB |
| deepseek-ai/DeepSeek-R1-0528 | Q4 | 138.18 tok/s | 4GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 138.06 tok/s | 2GB |
| sshleifer/tiny-gpt2 | Q4 | 137.73 tok/s | 4GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | 137.64 tok/s | 4GB |
| Qwen/Qwen3-4B-Base | Q4 | 137.47 tok/s | 2GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q4 | 137.47 tok/s | 2GB |
| llamafactory/tiny-random-Llama-3 | Q4 | 137.31 tok/s | 4GB |
| allenai/Olmo-3-7B-Think | Q4 | 137.01 tok/s | 4GB |
| mistralai/Mistral-7B-v0.1 | Q4 | 136.96 tok/s | 4GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | 136.29 tok/s | 2GB |
| unsloth/Llama-3.2-3B-Instruct | Q4 | 136.09 tok/s | 2GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q4 | 136.02 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | 135.99 tok/s | 4GB |
| hmellor/tiny-random-LlamaForCausalLM | Q4 | 135.90 tok/s | 4GB |
| dicta-il/dictalm2.0-instruct | Q4 | 135.72 tok/s | 4GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q4 | 135.16 tok/s | 2GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q4 | 134.94 tok/s | 4GB |
| black-forest-labs/FLUX.2-dev | Q4 | 134.82 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3-0324 | Q4 | 134.68 tok/s | 4GB |
| vikhyatk/moondream2 | Q4 | 134.59 tok/s | 4GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q4 | 134.35 tok/s | 4GB |
| Qwen/Qwen3-Embedding-8B | Q4 | 133.94 tok/s | 4GB |
| Qwen/Qwen-Image-Edit-2509 | Q4 | 133.78 tok/s | 4GB |
| tencent/HunyuanVideo-1.5 | Q4 | 133.55 tok/s | 4GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | 133.32 tok/s | 3GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q4 | 133.31 tok/s | 4GB |
| zai-org/GLM-4.6-FP8 | Q4 | 133.22 tok/s | 4GB |
| zai-org/GLM-4.5-Air | Q4 | 132.49 tok/s | 4GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q4 | 132.45 tok/s | 4GB |
| Qwen/Qwen2.5-1.5B | Q4 | 132.37 tok/s | 3GB |
| Qwen/Qwen3-8B-Base | Q4 | 132.21 tok/s | 4GB |
| microsoft/Phi-4-multimodal-instruct | Q4 | 132.14 tok/s | 4GB |
| skt/kogpt2-base-v2 | Q4 | 132.11 tok/s | 4GB |
| meta-llama/Llama-Guard-3-8B | Q4 | 131.62 tok/s | 4GB |
| rinna/japanese-gpt-neox-small | Q4 | 131.57 tok/s | 4GB |
| numind/NuExtract-1.5 | Q4 | 131.14 tok/s | 4GB |
| huggyllama/llama-7b | Q4 | 130.62 tok/s | 4GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q4 | 130.21 tok/s | 4GB |
| Qwen/Qwen2-7B-Instruct | Q4 | 129.89 tok/s | 4GB |
| IlyaGusev/saiga_llama3_8b | Q4 | 129.76 tok/s | 4GB |
| meta-llama/Llama-2-7b-chat-hf | Q4 | 129.47 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q4 | 129.39 tok/s | 3GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | 129.25 tok/s | 2GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 129.20 tok/s | 4GB |
| Qwen/Qwen3-1.7B-Base | Q4 | 129.12 tok/s | 4GB |
| Qwen/Qwen3-8B | Q4 | 129.09 tok/s | 4GB |
| EleutherAI/gpt-neo-125m | Q4 | 129.01 tok/s | 4GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q4 | 128.79 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3 | Q4 | 128.48 tok/s | 4GB |
| Qwen/Qwen3-Reranker-0.6B | Q4 | 128.39 tok/s | 3GB |
| Qwen/Qwen3-Embedding-4B | Q4 | 127.74 tok/s | 2GB |
| openai-community/gpt2 | Q4 | 127.74 tok/s | 4GB |
| microsoft/Phi-3.5-vision-instruct | Q4 | 127.55 tok/s | 4GB |
| GSAI-ML/LLaDA-8B-Instruct | Q4 | 127.23 tok/s | 4GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q4 | 127.12 tok/s | 4GB |
| lmsys/vicuna-7b-v1.5 | Q4 | 126.94 tok/s | 4GB |
| HuggingFaceH4/zephyr-7b-beta | Q4 | 126.93 tok/s | 4GB |
| microsoft/VibeVoice-1.5B | Q4 | 126.83 tok/s | 3GB |
| microsoft/Phi-3-mini-4k-instruct | Q4 | 126.77 tok/s | 4GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q4 | 126.47 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1 | Q4 | 126.35 tok/s | 4GB |
| ibm-granite/granite-docling-258M | Q4 | 126.29 tok/s | 4GB |
| meta-llama/Meta-Llama-3-8B | Q4 | 126.11 tok/s | 4GB |
| distilbert/distilgpt2 | Q4 | 125.72 tok/s | 4GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q4 | 125.61 tok/s | 2GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q4 | 125.43 tok/s | 2GB |
| meta-llama/Llama-3.1-8B | Q4 | 125.33 tok/s | 4GB |
| Qwen/Qwen3-Embedding-0.6B | Q4 | 125.20 tok/s | 3GB |
| microsoft/phi-2 | Q4 | 125.11 tok/s | 4GB |
| ibm-granite/granite-3.3-8b-instruct | Q4 | 125.09 tok/s | 4GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q4 | 125.07 tok/s | 3GB |
| deepseek-ai/DeepSeek-V3.1 | Q4 | 124.77 tok/s | 4GB |
| microsoft/DialoGPT-small | Q4 | 124.63 tok/s | 4GB |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | 124.49 tok/s | 3GB |
| EleutherAI/pythia-70m-deduped | Q4 | 123.92 tok/s | 4GB |
| parler-tts/parler-tts-large-v1 | Q4 | 123.89 tok/s | 4GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | 123.87 tok/s | 4GB |
| meta-llama/Llama-2-7b-hf | Q4 | 123.83 tok/s | 4GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q4 | 123.55 tok/s | 4GB |
| liuhaotian/llava-v1.5-7b | Q4 | 123.37 tok/s | 4GB |
| Qwen/Qwen2.5-0.5B | Q4 | 122.78 tok/s | 3GB |
| bigscience/bloomz-560m | Q4 | 122.77 tok/s | 4GB |
| Tongyi-MAI/Z-Image-Turbo | Q4 | 122.61 tok/s | 4GB |
| Qwen/Qwen3-4B | Q4 | 122.03 tok/s | 2GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 121.84 tok/s | 4GB |
| openai-community/gpt2-medium | Q4 | 121.78 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q4 | 121.38 tok/s | 2GB |
| openai-community/gpt2-xl | Q4 | 121.13 tok/s | 4GB |
| google/gemma-3-270m-it | Q4 | 120.90 tok/s | 4GB |
| Qwen/Qwen3-0.6B-Base | Q4 | 120.83 tok/s | 3GB |
| openai-community/gpt2-large | Q4 | 120.52 tok/s | 4GB |
| BSC-LT/salamandraTA-7b-instruct | Q4 | 119.97 tok/s | 4GB |
| facebook/opt-125m | Q4 | 119.87 tok/s | 4GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | 119.86 tok/s | 3GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q4 | 119.59 tok/s | 3GB |
| microsoft/Phi-3-mini-128k-instruct | Q4 | 119.26 tok/s | 4GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | 118.79 tok/s | 4GB |
| microsoft/phi-4 | Q4 | 118.63 tok/s | 4GB |
| Qwen/Qwen2-0.5B | Q4 | 118.38 tok/s | 3GB |
| Qwen/Qwen2.5-7B | Q4 | 118.21 tok/s | 4GB |
| HuggingFaceTB/SmolLM2-135M | Q4 | 118.20 tok/s | 4GB |
| petals-team/StableBeluga2 | Q4 | 117.94 tok/s | 4GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q4 | 117.89 tok/s | 4GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | 117.54 tok/s | 4GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q4 | 117.44 tok/s | 4GB |
| HuggingFaceTB/SmolLM-135M | Q4 | 117.22 tok/s | 4GB |
| Qwen/Qwen2-1.5B-Instruct | Q4 | 116.70 tok/s | 3GB |
| MiniMaxAI/MiniMax-M2 | Q4 | 116.39 tok/s | 4GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q4 | 116.25 tok/s | 2GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q4 | 115.97 tok/s | 4GB |
| ibm-research/PowerMoE-3b | Q8 | 115.89 tok/s | 3GB |
| Qwen/Qwen3-8B-FP8 | Q4 | 115.87 tok/s | 4GB |
| microsoft/DialoGPT-medium | Q4 | 115.70 tok/s | 4GB |
| Qwen/Qwen2.5-Math-1.5B | Q4 | 115.33 tok/s | 3GB |
| black-forest-labs/FLUX.1-dev | Q4 | 114.73 tok/s | 4GB |
| Qwen/Qwen3-0.6B | Q4 | 114.68 tok/s | 3GB |
| rednote-hilab/dots.ocr | Q4 | 114.34 tok/s | 4GB |
| Qwen/Qwen2-0.5B-Instruct | Q4 | 114.21 tok/s | 3GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q4 | 113.99 tok/s | 4GB |
| Qwen/Qwen3-1.7B | Q4 | 113.61 tok/s | 4GB |
| microsoft/Phi-4-mini-instruct | Q4 | 113.57 tok/s | 4GB |
| meta-llama/Llama-Guard-3-1B | Q8 | 113.31 tok/s | 1GB |
| GSAI-ML/LLaDA-8B-Base | Q4 | 113.19 tok/s | 4GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q4 | 113.18 tok/s | 4GB |
| meta-llama/Llama-3.2-1B | Q8 | 113.02 tok/s | 1GB |
| unsloth/Llama-3.2-3B-Instruct | Q8 | 112.56 tok/s | 3GB |
| deepseek-ai/DeepSeek-OCR | Q8 | 112.14 tok/s | 4GB |
| ibm-granite/granite-3.3-2b-instruct | Q8 | 111.81 tok/s | 2GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | 111.11 tok/s | 3GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | 110.01 tok/s | 3GB |
| tencent/HunyuanOCR | Q8 | 109.24 tok/s | 2GB |
| google-bert/bert-base-uncased | Q8 | 108.44 tok/s | 1GB |
| unsloth/Llama-3.2-1B-Instruct | Q8 | 108.32 tok/s | 1GB |
| google/embeddinggemma-300m | Q8 | 107.99 tok/s | 1GB |
| facebook/sam3 | Q8 | 107.48 tok/s | 1GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | 107.47 tok/s | 1GB |
| meta-llama/Llama-3.2-1B-Instruct | Q8 | 107.36 tok/s | 1GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q8 | 106.01 tok/s | 3GB |
| apple/OpenELM-1_1B-Instruct | Q8 | 104.94 tok/s | 1GB |
| google/gemma-2-2b-it | Q8 | 104.92 tok/s | 2GB |
| allenai/OLMo-2-0425-1B | Q8 | 104.61 tok/s | 1GB |
| microsoft/Phi-3-medium-128k-instruct | Q4 | 103.51 tok/s | 7GB |
| inference-net/Schematron-3B | Q8 | 103.13 tok/s | 3GB |
| EssentialAI/rnj-1 | Q4 | 101.59 tok/s | 5GB |
| Qwen/Qwen2.5-3B | Q8 | 100.88 tok/s | 3GB |
| ai-forever/ruGPT-3.5-13B | Q4 | 100.69 tok/s | 7GB |
| Qwen/Qwen2.5-3B-Instruct | Q8 | 100.36 tok/s | 3GB |
| meta-llama/Llama-3.2-3B | Q8 | 100.01 tok/s | 3GB |
| NousResearch/Hermes-3-Llama-3.1-8B | Q4 | 99.38 tok/s | 4GB |
| google/gemma-3-1b-it | Q8 | 99.16 tok/s | 1GB |
| unsloth/gemma-3-1b-it | Q8 | 99.06 tok/s | 1GB |
| google-t5/t5-3b | Q8 | 98.94 tok/s | 3GB |
| meta-llama/Llama-2-13b-chat-hf | Q4 | 98.93 tok/s | 7GB |
| LiquidAI/LFM2-1.2B | Q8 | 98.60 tok/s | 2GB |
| Qwen/Qwen3-14B-Base | Q4 | 98.21 tok/s | 7GB |
| Qwen/Qwen3-14B | Q4 | 97.47 tok/s | 7GB |
| dicta-il/dictalm2.0-instruct | Q8 | 96.76 tok/s | 7GB |
| bigcode/starcoder2-3b | Q8 | 96.52 tok/s | 3GB |
| openai-community/gpt2-medium | Q8 | 96.51 tok/s | 7GB |
| Qwen/Qwen2.5-Math-1.5B | Q8 | 96.35 tok/s | 5GB |
| google/gemma-2b | Q8 | 96.31 tok/s | 2GB |
| meta-llama/Llama-Guard-3-8B | Q8 | 96.28 tok/s | 9GB |
| nari-labs/Dia2-2B | Q8 | 96.27 tok/s | 3GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q8 | 96.26 tok/s | 7GB |
| microsoft/Phi-3.5-vision-instruct | Q8 | 95.99 tok/s | 7GB |
| Qwen/Qwen2.5-Coder-1.5B | Q8 | 95.85 tok/s | 5GB |
| deepseek-ai/DeepSeek-V3 | Q8 | 95.77 tok/s | 7GB |
| WeiboAI/VibeThinker-1.5B | Q8 | 95.71 tok/s | 2GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q8 | 95.17 tok/s | 9GB |
| mistralai/Mistral-7B-v0.1 | Q8 | 94.79 tok/s | 7GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q8 | 94.76 tok/s | 9GB |
| BSC-LT/salamandraTA-7b-instruct | Q8 | 94.61 tok/s | 7GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q8 | 94.43 tok/s | 4GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | 94.32 tok/s | 3GB |
| Qwen/Qwen3-8B-FP8 | Q8 | 93.98 tok/s | 9GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q8 | 93.63 tok/s | 4GB |
| google/gemma-2-9b-it | Q4 | 93.63 tok/s | 5GB |
| microsoft/Phi-3-mini-4k-instruct | Q8 | 93.60 tok/s | 7GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q8 | 93.35 tok/s | 4GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q8 | 93.21 tok/s | 7GB |
| Qwen/Qwen3-Embedding-0.6B | Q8 | 93.17 tok/s | 6GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q8 | 93.11 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3.1 | Q8 | 93.08 tok/s | 7GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q4 | 93.03 tok/s | 5GB |
| Qwen/Qwen2-7B-Instruct | Q8 | 92.80 tok/s | 7GB |
| meta-llama/Meta-Llama-3-8B | Q8 | 92.74 tok/s | 9GB |
| meta-llama/Llama-2-7b-chat-hf | Q8 | 92.54 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | 92.41 tok/s | 9GB |
| HuggingFaceTB/SmolLM2-135M | Q8 | 91.86 tok/s | 7GB |
| ibm-granite/granite-docling-258M | Q8 | 91.64 tok/s | 7GB |
| ibm-granite/granite-3.3-8b-instruct | Q8 | 91.58 tok/s | 9GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q8 | 91.43 tok/s | 4GB |
| OpenPipe/Qwen3-14B-Instruct | Q4 | 91.40 tok/s | 7GB |
| mistralai/Ministral-3-14B-Instruct-2512 | Q4 | 91.35 tok/s | 8GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | 91.10 tok/s | 4GB |
| petals-team/StableBeluga2 | Q8 | 91.03 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q8 | 90.97 tok/s | 5GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | 90.95 tok/s | 4GB |
| Qwen/Qwen-Image-Edit-2509 | Q8 | 90.85 tok/s | 8GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | 90.81 tok/s | 9GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | 90.81 tok/s | 8GB |
| skt/kogpt2-base-v2 | Q8 | 90.75 tok/s | 7GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q8 | 90.75 tok/s | 7GB |
| Qwen/Qwen3-8B-Base | Q8 | 90.48 tok/s | 9GB |
| Qwen/Qwen3-4B-Base | Q8 | 90.45 tok/s | 4GB |
| Qwen/Qwen3-1.7B-Base | Q8 | 90.41 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-0528 | Q8 | 90.23 tok/s | 7GB |
| GSAI-ML/LLaDA-8B-Base | Q8 | 90.09 tok/s | 9GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | 90.06 tok/s | 4GB |
| microsoft/DialoGPT-small | Q8 | 90.02 tok/s | 7GB |
| Qwen/Qwen2-0.5B-Instruct | Q8 | 89.80 tok/s | 5GB |
| Qwen/Qwen3-4B | Q8 | 89.68 tok/s | 4GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q8 | 89.64 tok/s | 7GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q8 | 89.59 tok/s | 4GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | 89.56 tok/s | 5GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q8 | 89.44 tok/s | 5GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q8 | 89.42 tok/s | 5GB |
| distilbert/distilgpt2 | Q8 | 89.31 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | 89.18 tok/s | 7GB |
| parler-tts/parler-tts-large-v1 | Q8 | 89.15 tok/s | 7GB |
| allenai/Olmo-3-7B-Think | Q8 | 88.65 tok/s | 8GB |
| hmellor/tiny-random-LlamaForCausalLM | Q8 | 88.48 tok/s | 7GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q8 | 88.47 tok/s | 5GB |
| microsoft/phi-4 | Q8 | 88.33 tok/s | 7GB |
| openai-community/gpt2 | Q8 | 88.29 tok/s | 7GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | 88.20 tok/s | 8GB |
| microsoft/DialoGPT-medium | Q8 | 88.02 tok/s | 7GB |
| rinna/japanese-gpt-neox-small | Q8 | 87.98 tok/s | 7GB |
| meta-llama/Llama-3.1-8B | Q8 | 87.76 tok/s | 9GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q8 | 87.61 tok/s | 7GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q8 | 87.61 tok/s | 9GB |
| microsoft/VibeVoice-1.5B | Q8 | 87.43 tok/s | 5GB |
| huggyllama/llama-7b | Q8 | 87.41 tok/s | 7GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q8 | 87.35 tok/s | 9GB |
| Tongyi-MAI/Z-Image-Turbo | Q8 | 86.98 tok/s | 8GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | 86.66 tok/s | 9GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q8 | 86.37 tok/s | 9GB |
| GSAI-ML/LLaDA-8B-Instruct | Q8 | 86.35 tok/s | 9GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q8 | 86.29 tok/s | 7GB |
| facebook/opt-125m | Q8 | 86.18 tok/s | 7GB |
| zai-org/GLM-4.5-Air | Q8 | 86.08 tok/s | 7GB |
| sshleifer/tiny-gpt2 | Q8 | 85.96 tok/s | 7GB |
| HuggingFaceH4/zephyr-7b-beta | Q8 | 85.95 tok/s | 7GB |
| microsoft/Phi-4-mini-instruct | Q8 | 85.77 tok/s | 7GB |
| Qwen/Qwen3-Embedding-8B | Q8 | 85.56 tok/s | 9GB |
| zai-org/GLM-4.6-FP8 | Q8 | 85.38 tok/s | 7GB |
| Qwen/Qwen2.5-14B | Q4 | 85.29 tok/s | 7GB |
| llamafactory/tiny-random-Llama-3 | Q8 | 85.25 tok/s | 7GB |
| meta-llama/Llama-2-7b-hf | Q8 | 85.16 tok/s | 7GB |
| Qwen/Qwen3-Embedding-4B | Q8 | 85.05 tok/s | 4GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | 84.98 tok/s | 7GB |
| HuggingFaceTB/SmolLM-135M | Q8 | 84.91 tok/s | 7GB |
| Qwen/Qwen2.5-7B | Q8 | 84.37 tok/s | 7GB |
| Qwen/Qwen3-8B | Q8 | 84.36 tok/s | 9GB |
| Qwen/Qwen2-1.5B-Instruct | Q8 | 84.25 tok/s | 5GB |
| numind/NuExtract-1.5 | Q8 | 84.24 tok/s | 7GB |
| Qwen/Qwen3-Reranker-0.6B | Q8 | 84.08 tok/s | 6GB |
| black-forest-labs/FLUX.2-dev | Q8 | 84.01 tok/s | 8GB |
| tencent/HunyuanVideo-1.5 | Q8 | 83.86 tok/s | 8GB |
| google/gemma-3-270m-it | Q8 | 83.82 tok/s | 7GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q8 | 83.82 tok/s | 9GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q8 | 83.82 tok/s | 7GB |
| MiniMaxAI/MiniMax-M2 | Q8 | 83.79 tok/s | 7GB |
| Qwen/Qwen2-0.5B | Q8 | 83.28 tok/s | 5GB |
| openai-community/gpt2-large | Q8 | 83.14 tok/s | 7GB |
| bigscience/bloomz-560m | Q8 | 83.06 tok/s | 7GB |
| Qwen/Qwen3-0.6B | Q8 | 83.04 tok/s | 6GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | 83.02 tok/s | 7GB |
| IlyaGusev/saiga_llama3_8b | Q8 | 82.97 tok/s | 9GB |
| liuhaotian/llava-v1.5-7b | Q8 | 82.96 tok/s | 7GB |
| Qwen/Qwen3-0.6B-Base | Q8 | 82.51 tok/s | 6GB |
| EleutherAI/pythia-70m-deduped | Q8 | 82.46 tok/s | 7GB |
| EleutherAI/gpt-neo-125m | Q8 | 82.19 tok/s | 7GB |
| black-forest-labs/FLUX.1-dev | Q8 | 82.06 tok/s | 8GB |
| deepseek-ai/DeepSeek-R1 | Q8 | 82.04 tok/s | 7GB |
| vikhyatk/moondream2 | Q8 | 81.87 tok/s | 7GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | 81.60 tok/s | 7GB |
| openai-community/gpt2-xl | Q8 | 81.34 tok/s | 7GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | 81.17 tok/s | 7GB |
| microsoft/Phi-3-mini-128k-instruct | Q8 | 81.07 tok/s | 7GB |
| microsoft/phi-2 | Q8 | 81.06 tok/s | 7GB |
| microsoft/Phi-4-multimodal-instruct | Q8 | 80.73 tok/s | 7GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q8 | 80.59 tok/s | 7GB |
| deepseek-ai/DeepSeek-V3-0324 | Q8 | 80.58 tok/s | 7GB |
| rednote-hilab/dots.ocr | Q8 | 79.87 tok/s | 7GB |
| Qwen/Qwen3-1.7B | Q8 | 79.86 tok/s | 7GB |
| Qwen/Qwen2.5-1.5B | Q8 | 79.74 tok/s | 5GB |
| lmsys/vicuna-7b-v1.5 | Q8 | 79.70 tok/s | 7GB |
| Qwen/Qwen2.5-0.5B | Q8 | 79.35 tok/s | 5GB |
| Qwen/Qwen3-30B-A3B | Q4 | 75.44 tok/s | 15GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q4 | 75.10 tok/s | 10GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q4 | 74.73 tok/s | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q4 | 74.64 tok/s | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q4 | 72.84 tok/s | 15GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q4 | 72.74 tok/s | 15GB |
| Qwen/Qwen3-14B-Base | Q8 | 72.35 tok/s | 14GB |
| meta-llama/Llama-2-13b-chat-hf | Q8 | 72.11 tok/s | 13GB |
| Qwen/Qwen2.5-14B | Q8 | 71.91 tok/s | 14GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q4 | 71.83 tok/s | 15GB |
| mistralai/Ministral-3-14B-Instruct-2512 | Q8 | 71.69 tok/s | 16GB |
| microsoft/Phi-3-medium-128k-instruct | Q8 | 71.66 tok/s | 14GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q8 | 71.48 tok/s | 10GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q4 | 71.18 tok/s | 15GB |
| Qwen/Qwen3-14B | Q8 | 71.09 tok/s | 14GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q4 | 70.81 tok/s | 10GB |
| openai/gpt-oss-safeguard-20b | Q4 | 69.60 tok/s | 11GB |
| google/gemma-2-27b-it | Q4 | 69.22 tok/s | 14GB |
| Qwen/Qwen2.5-14B-Instruct | Q8 | 69.11 tok/s | 14GB |
| google/gemma-2-9b-it | Q8 | 68.96 tok/s | 10GB |
| EssentialAI/rnj-1 | Q8 | 68.13 tok/s | 10GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | 67.08 tok/s | 9GB |
| ai-forever/ruGPT-3.5-13B | Q8 | 66.42 tok/s | 13GB |
| unsloth/gpt-oss-20b-BF16 | Q4 | 64.57 tok/s | 10GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q4 | 64.13 tok/s | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q4 | 63.67 tok/s | 15GB |
| openai/gpt-oss-20b | Q4 | 63.43 tok/s | 10GB |
| mistralai/Mistral-Small-Instruct-2409 | Q4 | 62.43 tok/s | 11GB |
| tencent/HunyuanOCR | FP16 | 62.29 tok/s | 3GB |
| meta-llama/Llama-3.2-3B-Instruct | FP16 | 62.13 tok/s | 6GB |
| LiquidAI/LFM2-1.2B | FP16 | 62.11 tok/s | 4GB |
| NousResearch/Hermes-3-Llama-3.1-8B | Q8 | 61.94 tok/s | 9GB |
| meta-llama/Llama-3.2-1B-Instruct | FP16 | 61.41 tok/s | 2GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | FP16 | 60.88 tok/s | 6GB |
| google-bert/bert-base-uncased | FP16 | 60.56 tok/s | 1GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | FP16 | 60.47 tok/s | 6GB |
| deepseek-ai/DeepSeek-OCR | FP16 | 60.15 tok/s | 7GB |
| Qwen/Qwen2.5-14B-Instruct | Q8 | 60.06 tok/s | 15GB |
| OpenPipe/Qwen3-14B-Instruct | Q8 | 59.79 tok/s | 14GB |
| meta-llama/Llama-Guard-3-1B | FP16 | 59.67 tok/s | 2GB |
| google/gemma-2b | FP16 | 59.16 tok/s | 4GB |
| meta-llama/Llama-3.2-3B | FP16 | 59.07 tok/s | 6GB |
| unsloth/Llama-3.2-3B-Instruct | FP16 | 57.83 tok/s | 6GB |
| google/gemma-3-1b-it | FP16 | 57.68 tok/s | 2GB |
| unsloth/Llama-3.2-1B-Instruct | FP16 | 57.48 tok/s | 2GB |
| apple/OpenELM-1_1B-Instruct | FP16 | 57.40 tok/s | 2GB |
| Qwen/Qwen2.5-3B-Instruct | FP16 | 57.35 tok/s | 6GB |
| Qwen/Qwen2.5-3B | FP16 | 57.25 tok/s | 6GB |
| google/gemma-2-2b-it | FP16 | 56.71 tok/s | 4GB |
| facebook/sam3 | FP16 | 55.82 tok/s | 2GB |
| WeiboAI/VibeThinker-1.5B | FP16 | 55.56 tok/s | 4GB |
| ibm-research/PowerMoE-3b | FP16 | 55.44 tok/s | 6GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | FP16 | 54.59 tok/s | 2GB |
| meta-llama/Llama-3.2-1B | FP16 | 54.50 tok/s | 2GB |
| google/embeddinggemma-300m | FP16 | 54.38 tok/s | 1GB |
| unsloth/gemma-3-1b-it | FP16 | 54.22 tok/s | 2GB |
| google-t5/t5-3b | FP16 | 54.11 tok/s | 6GB |
| inference-net/Schematron-3B | FP16 | 53.46 tok/s | 6GB |
| ibm-granite/granite-3.3-2b-instruct | FP16 | 53.27 tok/s | 4GB |
| openai/gpt-oss-safeguard-20b | Q8 | 53.18 tok/s | 22GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q8 | 52.62 tok/s | 31GB |
| nari-labs/Dia2-2B | FP16 | 52.61 tok/s | 5GB |
| bigcode/starcoder2-3b | FP16 | 52.53 tok/s | 6GB |
| microsoft/phi-2 | FP16 | 52.52 tok/s | 15GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | FP16 | 52.51 tok/s | 9GB |
| EleutherAI/pythia-70m-deduped | FP16 | 52.45 tok/s | 15GB |
| openai/gpt-oss-20b | Q8 | 52.40 tok/s | 20GB |
| Qwen/Qwen3-8B | FP16 | 52.30 tok/s | 17GB |
| ibm-granite/granite-3.3-8b-instruct | FP16 | 52.23 tok/s | 17GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q8 | 52.20 tok/s | 31GB |
| black-forest-labs/FLUX.2-dev | FP16 | 52.13 tok/s | 16GB |
| EleutherAI/gpt-neo-125m | FP16 | 52.07 tok/s | 15GB |
| Qwen/Qwen2.5-1.5B-Instruct | FP16 | 51.97 tok/s | 11GB |
| meta-llama/Llama-2-7b-chat-hf | FP16 | 51.80 tok/s | 15GB |
| allenai/OLMo-2-0425-1B | FP16 | 51.76 tok/s | 2GB |
| google/gemma-3-270m-it | FP16 | 51.66 tok/s | 15GB |
| mistralai/Mistral-7B-v0.1 | FP16 | 51.62 tok/s | 15GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | FP16 | 51.56 tok/s | 9GB |
| Qwen/Qwen3-1.7B-Base | FP16 | 51.52 tok/s | 15GB |
| zai-org/GLM-4.6-FP8 | FP16 | 51.47 tok/s | 15GB |
| meta-llama/Llama-2-7b-hf | FP16 | 51.17 tok/s | 15GB |
| GSAI-ML/LLaDA-8B-Instruct | FP16 | 51.12 tok/s | 17GB |
| llamafactory/tiny-random-Llama-3 | FP16 | 51.08 tok/s | 15GB |
| ibm-granite/granite-docling-258M | FP16 | 51.06 tok/s | 15GB |
| deepseek-ai/DeepSeek-V3 | FP16 | 50.91 tok/s | 15GB |
| HuggingFaceH4/zephyr-7b-beta | FP16 | 50.81 tok/s | 15GB |
| Qwen/Qwen3-30B-A3B | Q8 | 50.80 tok/s | 31GB |
| Qwen/Qwen3-4B | FP16 | 50.74 tok/s | 9GB |
| Qwen/Qwen2.5-0.5B-Instruct | FP16 | 50.67 tok/s | 11GB |
| openai-community/gpt2-xl | FP16 | 50.66 tok/s | 15GB |
| rednote-hilab/dots.ocr | FP16 | 50.53 tok/s | 15GB |
| facebook/opt-125m | FP16 | 50.52 tok/s | 15GB |
| Qwen/Qwen3-8B-FP8 | FP16 | 50.49 tok/s | 17GB |
| BSC-LT/salamandraTA-7b-instruct | FP16 | 50.14 tok/s | 15GB |
| Qwen/Qwen2-0.5B | FP16 | 50.07 tok/s | 11GB |
| Qwen/Qwen3-0.6B | FP16 | 49.98 tok/s | 13GB |
| Qwen/Qwen2.5-Coder-1.5B | FP16 | 49.79 tok/s | 11GB |
| deepseek-ai/DeepSeek-R1-0528 | FP16 | 49.76 tok/s | 15GB |
| Qwen/Qwen3-0.6B-Base | FP16 | 49.75 tok/s | 13GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q8 | 49.68 tok/s | 31GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | FP16 | 49.50 tok/s | 17GB |
| microsoft/DialoGPT-medium | FP16 | 49.45 tok/s | 15GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | FP16 | 49.32 tok/s | 15GB |
| microsoft/DialoGPT-small | FP16 | 49.27 tok/s | 15GB |
| distilbert/distilgpt2 | FP16 | 49.26 tok/s | 15GB |
| microsoft/Phi-3.5-mini-instruct | FP16 | 49.26 tok/s | 15GB |
| microsoft/VibeVoice-1.5B | FP16 | 49.19 tok/s | 11GB |
| Qwen/Qwen3-Embedding-8B | FP16 | 49.12 tok/s | 17GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | FP16 | 48.78 tok/s | 17GB |
| Qwen/Qwen2-0.5B-Instruct | FP16 | 48.78 tok/s | 11GB |
| microsoft/Phi-3.5-vision-instruct | FP16 | 48.65 tok/s | 15GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q8 | 48.60 tok/s | 20GB |
| allenai/Olmo-3-7B-Think | FP16 | 48.59 tok/s | 16GB |
| dicta-il/dictalm2.0-instruct | FP16 | 48.54 tok/s | 15GB |
| vikhyatk/moondream2 | FP16 | 48.50 tok/s | 15GB |
| Tongyi-MAI/Z-Image-Turbo | FP16 | 48.50 tok/s | 16GB |
| deepseek-ai/DeepSeek-V3-0324 | FP16 | 48.46 tok/s | 15GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | FP16 | 48.39 tok/s | 9GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q8 | 48.36 tok/s | 31GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | FP16 | 48.32 tok/s | 9GB |
| tencent/HunyuanVideo-1.5 | FP16 | 48.30 tok/s | 16GB |
| meta-llama/Meta-Llama-3-8B | FP16 | 48.30 tok/s | 17GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | FP16 | 48.30 tok/s | 9GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | FP16 | 48.28 tok/s | 15GB |
| huggyllama/llama-7b | FP16 | 48.24 tok/s | 15GB |
| parler-tts/parler-tts-large-v1 | FP16 | 48.17 tok/s | 15GB |
| google/gemma-2-27b-it | Q8 | 48.07 tok/s | 28GB |
| Qwen/Qwen2.5-7B-Instruct | FP16 | 48.05 tok/s | 15GB |
| sshleifer/tiny-gpt2 | FP16 | 47.95 tok/s | 15GB |
| microsoft/Phi-3-mini-128k-instruct | FP16 | 47.92 tok/s | 15GB |
| Qwen/Qwen3-8B-Base | FP16 | 47.79 tok/s | 17GB |
| moonshotai/Kimi-K2-Thinking | Q4 | 47.72 tok/s | 489GB |
| openai-community/gpt2-large | FP16 | 47.72 tok/s | 15GB |
| 01-ai/Yi-1.5-34B-Chat | Q4 | 47.70 tok/s | 18GB |
| rinna/japanese-gpt-neox-small | FP16 | 47.65 tok/s | 15GB |
| Qwen/Qwen2.5-Coder-32B-Instruct | Q4 | 47.62 tok/s | 17GB |
| Qwen/QwQ-32B-Preview | Q4 | 47.58 tok/s | 17GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q8 | 47.55 tok/s | 31GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | FP16 | 47.47 tok/s | 11GB |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | 47.43 tok/s | 34GB |
| numind/NuExtract-1.5 | FP16 | 47.43 tok/s | 15GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | FP16 | 47.35 tok/s | 17GB |
| swiss-ai/Apertus-8B-Instruct-2509 | FP16 | 47.26 tok/s | 17GB |
| Qwen/Qwen3-Reranker-0.6B | FP16 | 47.22 tok/s | 13GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q8 | 47.17 tok/s | 31GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q8 | 47.14 tok/s | 31GB |
| GSAI-ML/LLaDA-8B-Base | FP16 | 47.09 tok/s | 17GB |
| mistralai/Mistral-7B-Instruct-v0.1 | FP16 | 47.00 tok/s | 15GB |
| microsoft/Phi-3-mini-4k-instruct | FP16 | 46.95 tok/s | 15GB |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q4 | 46.93 tok/s | 34GB |
| MiniMaxAI/MiniMax-M2 | FP16 | 46.93 tok/s | 15GB |
| deepseek-ai/DeepSeek-R1 | FP16 | 46.90 tok/s | 15GB |
| meta-llama/Llama-3.1-8B | FP16 | 46.90 tok/s | 17GB |
| Qwen/Qwen2.5-7B | FP16 | 46.86 tok/s | 15GB |
| microsoft/phi-4 | FP16 | 46.85 tok/s | 15GB |
| deepseek-ai/DeepSeek-V3.1 | FP16 | 46.60 tok/s | 15GB |
| Qwen/Qwen2-1.5B-Instruct | FP16 | 46.57 tok/s | 11GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | FP16 | 46.34 tok/s | 17GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | FP16 | 46.24 tok/s | 11GB |
| Qwen/Qwen3-Embedding-0.6B | FP16 | 46.23 tok/s | 13GB |
| mistralai/Mistral-7B-Instruct-v0.2 | FP16 | 46.21 tok/s | 15GB |
| black-forest-labs/FLUX.1-dev | FP16 | 46.15 tok/s | 16GB |
| meta-llama/Llama-3.1-8B-Instruct | FP16 | 46.12 tok/s | 17GB |
| Qwen/Qwen-Image-Edit-2509 | FP16 | 46.08 tok/s | 16GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | FP16 | 45.94 tok/s | 15GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q4 | 45.89 tok/s | 17GB |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | 45.88 tok/s | 34GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | FP16 | 45.87 tok/s | 15GB |
| Qwen/Qwen2.5-1.5B | FP16 | 45.74 tok/s | 11GB |
| unsloth/gpt-oss-20b-BF16 | Q8 | 45.65 tok/s | 20GB |
| Qwen/Qwen2-7B-Instruct | FP16 | 45.61 tok/s | 15GB |
| openai-community/gpt2 | FP16 | 45.47 tok/s | 15GB |
| hmellor/tiny-random-LlamaForCausalLM | FP16 | 45.46 tok/s | 15GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | FP16 | 45.46 tok/s | 15GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q4 | 45.34 tok/s | 16GB |
| skt/kogpt2-base-v2 | FP16 | 45.32 tok/s | 15GB |
| mistralai/Mistral-Small-Instruct-2409 | Q8 | 45.29 tok/s | 23GB |
| Qwen/Qwen3-4B-Instruct-2507 | FP16 | 45.24 tok/s | 9GB |
| Qwen/Qwen2.5-7B-Instruct | FP16 | 45.23 tok/s | 16GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | FP16 | 45.22 tok/s | 17GB |
| meta-llama/Llama-3.2-3B-Instruct | FP16 | 45.20 tok/s | 7GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | FP16 | 45.18 tok/s | 15GB |
| meta-llama/Meta-Llama-3-8B-Instruct | FP16 | 45.04 tok/s | 17GB |
| IlyaGusev/saiga_llama3_8b | FP16 | 44.60 tok/s | 17GB |
| Qwen/Qwen2.5-0.5B | FP16 | 44.57 tok/s | 11GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | FP16 | 44.48 tok/s | 15GB |
| zai-org/GLM-4.5-Air | FP16 | 44.48 tok/s | 15GB |
| openai-community/gpt2-medium | FP16 | 44.44 tok/s | 15GB |
| HuggingFaceTB/SmolLM-135M | FP16 | 44.37 tok/s | 15GB |
| Qwen/Qwen3-Embedding-4B | FP16 | 44.34 tok/s | 9GB |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q4 | 44.25 tok/s | 34GB |
| deepseek-ai/deepseek-coder-33b-instruct | Q4 | 44.22 tok/s | 17GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q8 | 44.21 tok/s | 31GB |
| HuggingFaceTB/SmolLM2-135M | FP16 | 44.19 tok/s | 15GB |
| Gensyn/Qwen2.5-0.5B-Instruct | FP16 | 44.15 tok/s | 11GB |
| Qwen/Qwen3-4B-Base | FP16 | 44.07 tok/s | 9GB |
| meta-llama/Llama-Guard-3-8B | FP16 | 44.00 tok/s | 17GB |
| Qwen/Qwen2.5-Math-1.5B | FP16 | 43.99 tok/s | 11GB |
| codellama/CodeLlama-34b-hf | Q4 | 43.97 tok/s | 17GB |
| bigscience/bloomz-560m | FP16 | 43.92 tok/s | 15GB |
| petals-team/StableBeluga2 | FP16 | 43.88 tok/s | 15GB |
| Qwen/Qwen3-32B | Q4 | 43.81 tok/s | 16GB |
| liuhaotian/llava-v1.5-7b | FP16 | 43.78 tok/s | 15GB |
| baichuan-inc/Baichuan-M2-32B | Q4 | 43.73 tok/s | 16GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q8 | 43.69 tok/s | 20GB |
| meta-llama/Meta-Llama-3-70B-Instruct | Q4 | 43.63 tok/s | 34GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | FP16 | 43.50 tok/s | 15GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | FP16 | 43.23 tok/s | 17GB |
| lmsys/vicuna-7b-v1.5 | FP16 | 43.22 tok/s | 15GB |
| microsoft/Phi-4-multimodal-instruct | FP16 | 43.16 tok/s | 15GB |
| Qwen/Qwen3-4B-Thinking-2507 | FP16 | 43.11 tok/s | 9GB |
| Qwen/Qwen3-1.7B | FP16 | 43.07 tok/s | 15GB |
| microsoft/Phi-4-mini-instruct | FP16 | 43.03 tok/s | 15GB |
| microsoft/Phi-3.5-mini-instruct | FP16 | 43.01 tok/s | 8GB |
| moonshotai/Kimi-Linear-48B-A3B-Instruct | Q4 | 42.62 tok/s | 25GB |
| Qwen/Qwen2.5-32B | Q4 | 41.62 tok/s | 16GB |
| Qwen/Qwen2.5-32B-Instruct | Q4 | 41.06 tok/s | 16GB |
| Qwen/Qwen2.5-32B-Instruct | Q4 | 40.95 tok/s | 17GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q4 | 40.56 tok/s | 16GB |
| deepseek-ai/DeepSeek-V2.5 | Q4 | 40.01 tok/s | 328GB |
| Qwen/Qwen3-14B | FP16 | 38.91 tok/s | 29GB |
| meta-llama/Llama-3.1-8B-Instruct | FP16 | 38.10 tok/s | 17GB |
| mistralai/Ministral-3-14B-Instruct-2512 | FP16 | 37.44 tok/s | 32GB |
| Qwen/Qwen2.5-14B-Instruct | FP16 | 37.21 tok/s | 30GB |
| ai-forever/ruGPT-3.5-13B | FP16 | 36.79 tok/s | 27GB |
| microsoft/Phi-3-medium-128k-instruct | FP16 | 36.69 tok/s | 29GB |
| Qwen/Qwen3-14B-Base | FP16 | 36.68 tok/s | 29GB |
| Qwen/Qwen2.5-14B-Instruct | FP16 | 36.57 tok/s | 29GB |
| NousResearch/Hermes-3-Llama-3.1-8B | FP16 | 35.88 tok/s | 17GB |
| Qwen/Qwen2.5-14B | FP16 | 34.60 tok/s | 29GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | FP16 | 34.59 tok/s | 19GB |
| OpenPipe/Qwen3-14B-Instruct | FP16 | 34.59 tok/s | 29GB |
| google/gemma-2-9b-it | FP16 | 33.85 tok/s | 20GB |
| Qwen/Qwen2.5-Coder-32B-Instruct | Q8 | 33.49 tok/s | 34GB |
| meta-llama/Llama-2-13b-chat-hf | FP16 | 33.21 tok/s | 27GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q8 | 33.16 tok/s | 33GB |
| Qwen/Qwen2.5-32B-Instruct | Q8 | 33.13 tok/s | 34GB |
| codellama/CodeLlama-34b-hf | Q8 | 32.86 tok/s | 35GB |
| EssentialAI/rnj-1 | FP16 | 32.70 tok/s | 19GB |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | 32.55 tok/s | 68GB |
| deepseek-ai/deepseek-coder-33b-instruct | Q8 | 32.32 tok/s | 34GB |
| baichuan-inc/Baichuan-M2-32B | Q8 | 31.97 tok/s | 33GB |
| deepseek-ai/DeepSeek-V2.5 | Q8 | 31.26 tok/s | 656GB |
| meta-llama/Meta-Llama-3-70B-Instruct | Q8 | 30.99 tok/s | 68GB |
| Qwen/Qwen3-32B | Q8 | 30.98 tok/s | 33GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q8 | 30.52 tok/s | 35GB |
| Qwen/Qwen2.5-32B-Instruct | Q8 | 30.44 tok/s | 33GB |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q8 | 30.36 tok/s | 68GB |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | Q4 | 29.62 tok/s | 69GB |
| Qwen/Qwen2.5-32B | Q8 | 29.26 tok/s | 33GB |
| moonshotai/Kimi-Linear-48B-A3B-Instruct | Q8 | 29.14 tok/s | 50GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q8 | 28.96 tok/s | 33GB |
| moonshotai/Kimi-K2-Thinking | Q8 | 28.91 tok/s | 978GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | FP16 | 28.61 tok/s | 61GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | FP16 | 28.33 tok/s | 41GB |
| 01-ai/Yi-1.5-34B-Chat | Q8 | 28.30 tok/s | 35GB |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q8 | 28.18 tok/s | 68GB |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | 27.96 tok/s | 68GB |
| Qwen/QwQ-32B-Preview | Q8 | 27.80 tok/s | 34GB |
| google/gemma-2-27b-it | FP16 | 27.63 tok/s | 56GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q4 | 27.51 tok/s | 39GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q4 | 27.47 tok/s | 39GB |
| Qwen/Qwen3-30B-A3B | FP16 | 27.10 tok/s | 61GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | FP16 | 26.88 tok/s | 61GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | FP16 | 26.09 tok/s | 61GB |
| openai/gpt-oss-20b | FP16 | 25.97 tok/s | 41GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q4 | 25.85 tok/s | 39GB |
| Qwen/Qwen2.5-72B-Instruct | Q4 | 25.80 tok/s | 36GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q4 | 25.70 tok/s | 44GB |
| NousResearch/Hermes-3-Llama-3.1-70B | Q4 | 25.46 tok/s | 34GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | FP16 | 25.43 tok/s | 61GB |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | 25.12 tok/s | 34GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | FP16 | 24.94 tok/s | 61GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | FP16 | 24.86 tok/s | 61GB |
| AI-MO/Kimina-Prover-72B | Q4 | 24.69 tok/s | 35GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | FP16 | 24.57 tok/s | 61GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | FP16 | 24.39 tok/s | 41GB |
| openai/gpt-oss-safeguard-20b | FP16 | 24.16 tok/s | 44GB |
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | Q4 | 24.16 tok/s | 34GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | FP16 | 23.99 tok/s | 61GB |
| Qwen/Qwen2.5-72B-Instruct | Q4 | 23.97 tok/s | 35GB |
| mistralai/Mistral-Small-Instruct-2409 | FP16 | 23.97 tok/s | 46GB |
| unsloth/gpt-oss-20b-BF16 | FP16 | 23.90 tok/s | 41GB |
| mistralai/Mistral-Large-Instruct-2411 | Q4 | 23.75 tok/s | 60GB |
| Qwen/Qwen2.5-Math-72B-Instruct | Q4 | 23.65 tok/s | 36GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q4 | 23.49 tok/s | 39GB |
| openai/gpt-oss-120b | Q4 | 23.24 tok/s | 59GB |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | 22.79 tok/s | 34GB |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | Q8 | 22.00 tok/s | 138GB |
| deepseek-ai/DeepSeek-Math-V2 | Q4 | 20.41 tok/s | 383GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q8 | 18.95 tok/s | 88GB |
| AI-MO/Kimina-Prover-72B | Q8 | 18.59 tok/s | 70GB |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | Q4 | 18.51 tok/s | 115GB |
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | Q8 | 18.22 tok/s | 69GB |
| openai/gpt-oss-120b | Q8 | 18.19 tok/s | 117GB |
| 01-ai/Yi-1.5-34B-Chat | FP16 | 18.09 tok/s | 70GB |
| meta-llama/Llama-3.3-70B-Instruct | FP16 | 17.82 tok/s | 137GB |
| baichuan-inc/Baichuan-M2-32B | FP16 | 17.80 tok/s | 66GB |
| Qwen/Qwen2.5-72B-Instruct | Q8 | 17.76 tok/s | 71GB |
| meta-llama/Llama-3.1-70B-Instruct | FP16 | 17.35 tok/s | 137GB |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | FP16 | 17.33 tok/s | 137GB |
| deepseek-ai/DeepSeek-V2.5 | FP16 | 17.32 tok/s | 1312GB |
| Qwen/Qwen2.5-72B-Instruct | Q8 | 17.23 tok/s | 70GB |
| Qwen/Qwen2.5-32B | FP16 | 17.10 tok/s | 66GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q8 | 17.02 tok/s | 78GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q8 | 16.87 tok/s | 78GB |
| Qwen/Qwen2.5-Coder-32B-Instruct | FP16 | 16.86 tok/s | 67GB |
| Qwen/Qwen2.5-Math-72B-Instruct | Q8 | 16.83 tok/s | 71GB |
| NousResearch/Hermes-3-Llama-3.1-70B | Q8 | 16.80 tok/s | 69GB |
| moonshotai/Kimi-K2-Thinking | FP16 | 16.61 tok/s | 1956GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | FP16 | 16.52 tok/s | 66GB |
| Qwen/Qwen2.5-32B-Instruct | FP16 | 16.51 tok/s | 67GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q8 | 16.45 tok/s | 78GB |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | 16.34 tok/s | 69GB |
| meta-llama/Meta-Llama-3-70B-Instruct | FP16 | 16.22 tok/s | 137GB |
| moonshotai/Kimi-Linear-48B-A3B-Instruct | FP16 | 16.22 tok/s | 101GB |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | 16.19 tok/s | 69GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | FP16 | 16.16 tok/s | 66GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q8 | 16.07 tok/s | 78GB |
| Qwen/Qwen3-235B-A22B | Q4 | 16.07 tok/s | 115GB |
| Qwen/Qwen2.5-32B-Instruct | FP16 | 16.04 tok/s | 66GB |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | FP16 | 15.98 tok/sEstimated Auto-generated benchmark | 137GB |
| mistralai/Mistral-Large-Instruct-2411 | Q8 | 15.94 tok/sEstimated Auto-generated benchmark | 120GB |
| Qwen/QwQ-32B-Preview | FP16 | 15.85 tok/sEstimated Auto-generated benchmark | 67GB |
| codellama/CodeLlama-34b-hf | FP16 | 15.80 tok/sEstimated Auto-generated benchmark | 70GB |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q4 | 15.77 tok/sEstimated Auto-generated benchmark | 378GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | FP16 | 15.60 tok/sEstimated Auto-generated benchmark | 70GB |
| deepseek-ai/deepseek-coder-33b-instruct | FP16 | 15.36 tok/sEstimated Auto-generated benchmark | 68GB |
| Qwen/Qwen3-32B | FP16 | 15.05 tok/sEstimated Auto-generated benchmark | 66GB |
| MiniMaxAI/MiniMax-VL-01 | Q4 | 15.00 tok/sEstimated Auto-generated benchmark | 256GB |
| MiniMaxAI/MiniMax-M1-40k | Q4 | 13.58 tok/sEstimated Auto-generated benchmark | 255GB |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | Q8 | 13.51 tok/sEstimated Auto-generated benchmark | 231GB |
| deepseek-ai/DeepSeek-Math-V2 | Q8 | 12.86 tok/sEstimated Auto-generated benchmark | 766GB |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | FP16 | 11.46 tok/sEstimated Auto-generated benchmark | 275GB |
| MiniMaxAI/MiniMax-M1-40k | Q8 | 11.37 tok/sEstimated Auto-generated benchmark | 510GB |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q8 | 11.22 tok/sEstimated Auto-generated benchmark | 755GB |
| Qwen/Qwen3-235B-A22B | Q8 | 10.50 tok/sEstimated Auto-generated benchmark | 230GB |
| Qwen/Qwen2.5-72B-Instruct | FP16 | 10.38 tok/sEstimated Auto-generated benchmark | 141GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | FP16 | 10.35 tok/sEstimated Auto-generated benchmark | 156GB |
| Qwen/Qwen2.5-72B-Instruct | FP16 | 10.28 tok/sEstimated Auto-generated benchmark | 142GB |
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | FP16 | 10.17 tok/sEstimated Auto-generated benchmark | 138GB |
| MiniMaxAI/MiniMax-VL-01 | Q8 | 10.15 tok/sEstimated Auto-generated benchmark | 511GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct | FP16 | 10.15 tok/sEstimated Auto-generated benchmark | 156GB |
| NousResearch/Hermes-3-Llama-3.1-70B | FP16 | 10.11 tok/sEstimated Auto-generated benchmark | 138GB |
| Qwen/Qwen2.5-Math-72B-Instruct | FP16 | 10.08 tok/sEstimated Auto-generated benchmark | 142GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | FP16 | 9.83 tok/sEstimated Auto-generated benchmark | 156GB |
| meta-llama/Llama-3.1-70B-Instruct | FP16 | 9.51 tok/sEstimated Auto-generated benchmark | 138GB |
| openai/gpt-oss-120b | FP16 | 9.36 tok/sEstimated Auto-generated benchmark | 235GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking | FP16 | 9.33 tok/sEstimated Auto-generated benchmark | 156GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | FP16 | 9.10 tok/sEstimated Auto-generated benchmark | 176GB |
| AI-MO/Kimina-Prover-72B | FP16 | 9.07 tok/sEstimated Auto-generated benchmark | 141GB |
| meta-llama/Llama-3.3-70B-Instruct | FP16 | 9.06 tok/sEstimated Auto-generated benchmark | 138GB |
| mistralai/Mistral-Large-Instruct-2411 | FP16 | 8.65 tok/sEstimated Auto-generated benchmark | 240GB |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | FP16 | 7.28 tok/sEstimated Auto-generated benchmark | 461GB |
| deepseek-ai/DeepSeek-Math-V2 | FP16 | 6.98 tok/sEstimated Auto-generated benchmark | 1532GB |
| MiniMaxAI/MiniMax-VL-01 | FP16 | 5.96 tok/sEstimated Auto-generated benchmark | 1021GB |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | FP16 | 5.71 tok/sEstimated Auto-generated benchmark | 1509GB |
| Qwen/Qwen3-235B-A22B | FP16 | 5.18 tok/sEstimated Auto-generated benchmark | 460GB |
| MiniMaxAI/MiniMax-M1-40k | FP16 | 5.18 tok/sEstimated Auto-generated benchmark | 1020GB |
Note: these performance figures are calculated estimates, not measured benchmarks; real-world results may vary.
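For intuition about where numbers like these come from, here is a minimal back-of-the-envelope sketch, not the site's actual methodology: weight memory scales with parameter count times bytes per parameter, and single-stream decode speed on a memory-bandwidth-bound GPU is roughly bandwidth divided by the bytes streamed per token. The 760 GB/s value is the RTX 3080's rated memory bandwidth; the 20% overhead factor is an assumption.

```python
# Back-of-the-envelope VRAM and decode-speed estimates (illustrative only).
# Assumptions: weights dominate memory, decoding is memory-bandwidth-bound,
# and a flat 20% overhead covers KV cache and runtime buffers.

BYTES_PER_PARAM = {"Q4": 0.5, "Q8": 1.0, "FP16": 2.0}
RTX_3080_BANDWIDTH_GBS = 760  # rated GDDR6X bandwidth of the 10GB RTX 3080
OVERHEAD = 1.2                # assumed allowance for KV cache and buffers

def estimate(params_billion: float, quant: str) -> tuple[float, float]:
    """Return (estimated VRAM in GB, rough tok/s) for a dense model."""
    weight_gb = params_billion * BYTES_PER_PARAM[quant]
    vram_gb = weight_gb * OVERHEAD
    # Each generated token streams (roughly) all weights through the GPU once.
    tok_per_s = RTX_3080_BANDWIDTH_GBS / weight_gb
    return vram_gb, tok_per_s

for quant in ("Q4", "Q8", "FP16"):
    vram, tps = estimate(32, quant)  # e.g. a 32B dense model
    print(f"32B @ {quant}: ~{vram:.0f}GB VRAM, ~{tps:.0f} tok/s if it fit in VRAM")
```

Run against the table, the sketch lands in the right ballpark (a 32B model at Q8 works out to roughly 30 tok/s and ~35GB), which is why quantization roughly doubles throughput per step from FP16 to Q8 to Q4.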
| Model | Quantization | Verdict | Estimated speed | VRAM needed (card has 10GB) |
|---|---|---|---|---|
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q4 | Not supported | 15.77 tok/s | 378GB |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q8 | Not supported | 11.22 tok/s | 755GB |
| EssentialAI/rnj-1 | FP16 | Not supported | 32.70 tok/s | 19GB |
| EssentialAI/rnj-1 | Q8 | Fits (tight) | 68.13 tok/s | 10GB |
| EssentialAI/rnj-1 | Q4 | Fits comfortably | 101.59 tok/s | 5GB |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | FP16 | Not supported | 5.71 tok/s | 1509GB |
| Qwen/Qwen2.5-7B | Q4 | Fits comfortably | 118.21 tok/s | 4GB |
| meta-llama/Meta-Llama-3-8B | FP16 | Not supported | 48.30 tok/s | 17GB |
| Qwen/Qwen2.5-1.5B | Q4 | Fits comfortably | 132.37 tok/s | 3GB |
| meta-llama/Llama-3.1-8B | Q4 | Fits comfortably | 125.33 tok/s | 4GB |
| Qwen/Qwen3-8B-FP8 | Q8 | Fits (tight) | 93.98 tok/s | 9GB |
| Qwen/Qwen3-8B-FP8 | FP16 | Not supported | 50.49 tok/s | 17GB |
| microsoft/Phi-3.5-vision-instruct | FP16 | Not supported | 48.65 tok/s | 15GB |
| swiss-ai/Apertus-8B-Instruct-2509 | FP16 | Not supported | 47.26 tok/s | 17GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | Fits (tight) | 92.41 tok/s | 9GB |
| meta-llama/Meta-Llama-3-8B | Q8 | Fits (tight) | 92.74 tok/s | 9GB |
| meta-llama/Llama-2-7b-chat-hf | Q4 | Fits comfortably | 129.47 tok/s | 4GB |
| Qwen/Qwen3-14B-Base | FP16 | Not supported | 36.68 tok/s | 29GB |
| ibm-research/PowerMoE-3b | Q8 | Fits comfortably | 115.89 tok/s | 3GB |
| ibm-research/PowerMoE-3b | FP16 | Fits comfortably | 55.44 tok/s | 6GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q4 | Fits comfortably | 137.47 tok/s | 2GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q8 | Fits comfortably | 89.59 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | FP16 | Fits (tight) | 52.51 tok/s | 9GB |
| Qwen/Qwen3-1.7B-Base | FP16 | Not supported | 51.52 tok/s | 15GB |
| baichuan-inc/Baichuan-M2-32B | Q8 | Not supported | 31.97 tok/s | 33GB |
| baichuan-inc/Baichuan-M2-32B | FP16 | Not supported | 17.80 tok/s | 66GB |
| ai-forever/ruGPT-3.5-13B | Q4 | Fits comfortably | 100.69 tok/s | 7GB |
| tencent/HunyuanOCR | Q4 | Fits comfortably | 147.50 tok/s | 1GB |
| tencent/HunyuanOCR | Q8 | Fits comfortably | 109.24 tok/s | 2GB |
| tencent/HunyuanOCR | FP16 | Fits comfortably | 62.29 tok/s | 3GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | FP16 | Not supported | 49.32 tok/s | 15GB |
| Qwen/Qwen-Image-Edit-2509 | Q4 | Fits comfortably | 133.78 tok/s | 4GB |
| facebook/sam3 | FP16 | Fits comfortably | 55.82 tok/s | 2GB |
| Qwen/Qwen3-14B-Base | Q8 | Not supported | 72.35 tok/s | 14GB |
| Qwen/Qwen-Image-Edit-2509 | Q8 | Fits comfortably | 90.85 tok/s | 8GB |
| Qwen/Qwen-Image-Edit-2509 | FP16 | Not supported | 46.08 tok/s | 16GB |
| black-forest-labs/FLUX.1-dev | Q4 | Fits comfortably | 114.73 tok/s | 4GB |
| black-forest-labs/FLUX.1-dev | Q8 | Fits comfortably | 82.06 tok/s | 8GB |
| google-bert/bert-base-uncased | Q8 | Fits comfortably | 108.44 tok/s | 1GB |
| google-bert/bert-base-uncased | FP16 | Fits comfortably | 60.56 tok/s | 1GB |
| MiniMaxAI/MiniMax-VL-01 | Q4 | Not supported | 15.00 tok/s | 256GB |
| MiniMaxAI/MiniMax-VL-01 | Q8 | Not supported | 10.15 tok/s | 511GB |
| MiniMaxAI/MiniMax-VL-01 | FP16 | Not supported | 5.96 tok/s | 1021GB |
| WeiboAI/VibeThinker-1.5B | Q4 | Fits comfortably | 150.00 tok/s | 1GB |
| WeiboAI/VibeThinker-1.5B | Q8 | Fits comfortably | 95.71 tok/s | 2GB |
| baichuan-inc/Baichuan-M2-32B | Q4 | Not supported | 43.73 tok/s | 16GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q4 | Not supported | 74.64 tok/s | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q8 | Not supported | 48.36 tok/s | 31GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q8 | Not supported | 47.17 tok/s | 31GB |
| deepseek-ai/DeepSeek-Math-V2 | Q4 | Not supported | 20.41 tok/s | 383GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q4 | Fits comfortably | 136.02 tok/s | 4GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q8 | Fits comfortably | 86.29 tok/s | 7GB |
| Qwen/Qwen2-7B-Instruct | Q8 | Fits comfortably | 92.80 tok/s | 7GB |
| deepseek-ai/DeepSeek-V2.5 | Q8 | Not supported | 31.26 tok/s | 656GB |
| Qwen/Qwen2.5-7B-Instruct | FP16 | Not supported | 48.05 tok/s | 15GB |
| Qwen/Qwen3-0.6B | Q4 | Fits comfortably | 114.68 tok/s | 3GB |
| Qwen/Qwen2.5-3B-Instruct | Q4 | Fits comfortably | 145.00 tok/s | 2GB |
| Qwen/Qwen2.5-3B-Instruct | Q8 | Fits comfortably | 100.36 tok/s | 3GB |
| Qwen/Qwen2.5-3B-Instruct | FP16 | Fits comfortably | 57.35 tok/s | 6GB |
| vikhyatk/moondream2 | Q4 | Fits comfortably | 134.59 tok/s | 4GB |
| vikhyatk/moondream2 | Q8 | Fits comfortably | 81.87 tok/s | 7GB |
| vikhyatk/moondream2 | FP16 | Not supported | 48.50 tok/s | 15GB |
| microsoft/Phi-4-mini-instruct | FP16 | Not supported | 43.03 tok/s | 15GB |
| microsoft/Phi-4-mini-instruct | Q8 | Fits comfortably | 85.77 tok/s | 7GB |
| facebook/sam3 | Q8 | Fits comfortably | 107.48 tok/s | 1GB |
| Qwen/Qwen3-0.6B-Base | Q4 | Fits comfortably | 120.83 tok/s | 3GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q8 | Not supported | 52.62 tok/s | 31GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | FP16 | Not supported | 24.57 tok/s | 61GB |
| moonshotai/Kimi-Linear-48B-A3B-Instruct | Q4 | Not supported | 42.62 tok/s | 25GB |
| moonshotai/Kimi-Linear-48B-A3B-Instruct | Q8 | Not supported | 29.14 tok/s | 50GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q4 | Fits comfortably | 132.45 tok/s | 4GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q8 | Fits (tight) | 83.82 tok/s | 9GB |
| llamafactory/tiny-random-Llama-3 | FP16 | Not supported | 51.08 tok/s | 15GB |
| ibm-granite/granite-3.3-2b-instruct | FP16 | Fits comfortably | 53.27 tok/s | 4GB |
| microsoft/VibeVoice-1.5B | Q4 | Fits comfortably | 126.83 tok/s | 3GB |
| microsoft/VibeVoice-1.5B | Q8 | Fits comfortably | 87.43 tok/s | 5GB |
| microsoft/VibeVoice-1.5B | FP16 | Not supported | 49.19 tok/s | 11GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | FP16 | Not supported | 26.09 tok/s | 61GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | FP16 | Not supported | 9.83 tok/s | 156GB |
| Qwen/Qwen2.5-72B-Instruct | Q4 | Not supported | 25.80 tok/s | 36GB |
| Qwen/Qwen2.5-72B-Instruct | Q8 | Not supported | 17.76 tok/s | 71GB |
| Qwen/Qwen2.5-72B-Instruct | FP16 | Not supported | 10.28 tok/s | 142GB |
| Qwen/QwQ-32B-Preview | Q4 | Not supported | 47.58 tok/s | 17GB |
| NousResearch/Hermes-3-Llama-3.1-70B | Q4 | Not supported | 25.46 tok/s | 34GB |
| NousResearch/Hermes-3-Llama-3.1-70B | Q8 | Not supported | 16.80 tok/s | 69GB |
| Qwen/Qwen2.5-1.5B | Q8 | Fits comfortably | 79.74 tok/s | 5GB |
| Qwen/Qwen2.5-14B-Instruct | Q8 | Not supported | 69.11 tok/s | 14GB |
| mistralai/Mistral-Small-Instruct-2409 | Q8 | Not supported | 45.29 tok/s | 23GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | Fits comfortably | 107.47 tok/s | 1GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | FP16 | Fits comfortably | 54.59 tok/s | 2GB |
| microsoft/Phi-3.5-vision-instruct | Q4 | Fits comfortably | 127.55 tok/s | 4GB |
| microsoft/Phi-3.5-vision-instruct | Q8 | Fits comfortably | 95.99 tok/s | 7GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | FP16 | Fits (tight) | 48.30 tok/s | 9GB |
| unsloth/Llama-3.2-3B-Instruct | Q4 | Fits comfortably | 136.09 tok/s | 2GB |
| unsloth/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 112.56 tok/s | 3GB |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | FP16 | Not supported | 17.33 tok/s | 137GB |
| unsloth/gemma-3-1b-it | Q4 | Fits comfortably | 152.31 tok/s | 1GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q4 | Fits comfortably | 93.03 tok/s | 5GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q8 | Fits (tight) | 71.48 tok/s | 10GB |
| petals-team/StableBeluga2 | Q4 | Fits comfortably | 117.94 tok/s | 4GB |
Note: these performance figures are calculated estimates, not measured benchmarks; real-world results may vary.
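The verdict column appears to follow a simple thresholding rule against the card's 10GB. The sketch below reproduces the pattern visible in the table; the ~90% "tight" cutoff is inferred from the rows shown, not documented anywhere on this page.

```python
# Inferred fit-verdict rule for a 10GB card, reconstructed from the table above:
# >10GB -> "Not supported", 9-10GB -> "Fits (tight)", below that -> comfortable.
# The 90% threshold is an assumption that happens to match every row shown.

def fit_verdict(required_gb: float, available_gb: float = 10.0) -> str:
    if required_gb > available_gb:
        return "Not supported"
    if required_gb >= 0.9 * available_gb:
        return "Fits (tight)"
    return "Fits comfortably"

assert fit_verdict(9) == "Fits (tight)"        # e.g. Qwen3-8B-FP8 @ Q8
assert fit_verdict(5) == "Fits comfortably"    # e.g. EssentialAI/rnj-1 @ Q4
assert fit_verdict(17) == "Not supported"      # e.g. Meta-Llama-3-8B @ FP16
```

"Fits (tight)" is worth taking seriously: at 9-10GB of weights there is little room left for the KV cache, so long prompts can still spill to system RAM even when the verdict says the model fits.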
Data-backed answers pulled from community benchmarks, manufacturer specs, and live pricing.
Owners running Qwen3-30B-A3B on a 10 GB RTX 3080 report roughly 15 tokens/sec after tuning, keeping interactive coding prompts responsive.
Source: Reddit – /r/LocalLLaMA (mquvxwc)
Some spec sheets assume higher VRAM ceilings, but users report already reaching ~10 tok/sec on a 10 GB 3080, evidence that careful tuning matters more than blanket hardware requirements.
Source: Reddit – /r/LocalLLaMA (mj408ke)
With larger context windows, Ollama users report 40% of layers spilling to system RAM even on 12B models, which is why the GPU layer count (gpu_layers) needs tuning on 10 GB cards.
Source: Reddit – /r/LocalLLaMA (mnspe0d)
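To see why layer offload behaves this way, here is a rough sketch of choosing a GPU layer count under a VRAM budget. The function, the equal-layer-size assumption, and the 1GB reserve are illustrative, not Ollama's internal accounting (Ollama exposes the knob as num_gpu in a Modelfile; llama.cpp calls it n_gpu_layers).

```python
# Rough sketch: how many transformer layers fit on the GPU once the KV cache
# and a driver/runtime reserve are carved out of a 10GB budget.
# All sizes here are illustrative assumptions.

def layers_on_gpu(n_layers: int, model_gb: float, kv_cache_gb: float,
                  vram_gb: float = 10.0, reserve_gb: float = 1.0) -> int:
    per_layer_gb = model_gb / n_layers           # assume equal-sized layers
    budget = vram_gb - reserve_gb - kv_cache_gb  # KV cache grows with context
    return max(0, min(n_layers, int(budget / per_layer_gb)))

# A ~9GB quantized 12B model (40 layers) with a 3GB KV cache at long context:
# only 26 of 40 layers stay on the GPU, i.e. roughly a third spill to system
# RAM, in line with the report above.
print(layers_on_gpu(n_layers=40, model_gb=9.0, kv_cache_gb=3.0))  # -> 26
```

The takeaway is that the spill is driven by context length, not model size alone: the same model at a short context keeps every layer on the GPU.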
The RTX 3080 Founders Edition includes 10 GB of GDDR6X, a 320 W board power rating, and a single 12-pin power connector (a dual 8-pin adapter is included); NVIDIA recommends a 750 W PSU.
Source: TechPowerUp – RTX 3080 Specs
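The 750 W guidance is easy to sanity-check with whole-system arithmetic; the rest-of-system draw and headroom factor below are assumed figures, not NVIDIA's published math.

```python
# Sanity check on the 750W PSU recommendation (illustrative numbers).
gpu_w = 320              # RTX 3080 board power from the spec above
rest_of_system_w = 200   # assumed CPU + motherboard + drives + fans under load
headroom = 1.4           # assumed margin for transient spikes and PSU efficiency
print((gpu_w + rest_of_system_w) * headroom)  # ~728W -> a 750W unit fits
```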
Latest snapshot (Nov 2025): Amazon at $699 (check current availability).
Source: Supabase price tracker snapshot – 2025-11-03
Explore how RTX 3090 stacks up for local inference workloads.
Explore how RTX 3070 stacks up for local inference workloads.
Explore how RTX 4070 stacks up for local inference workloads.
Explore how RX 6800 XT stacks up for local inference workloads.
Explore how RTX 4090 stacks up for local inference workloads.
Each comparison page shows side-by-side VRAM, throughput, efficiency, and pricing benchmarks for the two GPUs.