Quick Answer: The RTX 3070 offers 8GB of VRAM and starts around $319.99. It delivers approximately 99 tokens/sec on meta-llama/Llama-3.2-1B (Q4, estimated) and typically draws 220W under load.
This GPU offers reliable throughput for local AI workloads. Pair it with the right model quantization to hit your target tokens/sec (a rough sizing sketch follows below), and monitor prices below to catch the best deal.
Buy directly on Amazon with fast shipping and reliable customer service.
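Which quantization a model needs comes down mostly to weight memory: roughly parameters × bits per weight ÷ 8, plus runtime overhead. The sketch below is illustrative only; the 1.2× overhead factor (KV cache, activations, runtime buffers) is an assumption, not a measured constant. It shows why an 8B model fits the 3070's 8GB budget at Q4 but not at Q8 or FP16, which matches the VRAM column in the table.

```python
# Rough VRAM estimate for running a model at a given quantization.
# The 1.2x overhead multiplier is an illustrative assumption.
def est_vram_gb(params_b: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    weights_gb = params_b * bits_per_weight / 8  # weight memory alone, in GB
    return weights_gb * overhead

for bits, label in [(4, "Q4"), (8, "Q8"), (16, "FP16")]:
    needed = est_vram_gb(8.0, bits)  # an 8B-parameter model
    verdict = "fits" if needed <= 8.0 else "exceeds"
    print(f"8B @ {label}: ~{needed:.1f} GB ({verdict} the RTX 3070's 8GB)")
```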
All throughput figures below are estimated, auto-generated benchmarks rather than measured results.

| Model | Quantization | Tokens/sec | VRAM used |
|---|---|---|---|
| meta-llama/Llama-3.2-1B | Q4 | 99.24 tok/s | 1GB |
| Qwen/Qwen2.5-3B-Instruct | Q4 | 99.23 tok/s | 2GB |
| bigcode/starcoder2-3b | Q4 | 98.55 tok/s | 2GB |
| google/gemma-2-2b-it | Q4 | 98.45 tok/s | 1GB |
| meta-llama/Llama-Guard-3-1B | Q4 | 98.41 tok/s | 1GB |
| LiquidAI/LFM2-1.2B | Q4 | 98.33 tok/s | 1GB |
| google-bert/bert-base-uncased | Q4 | 97.36 tok/s | 1GB |
| google/embeddinggemma-300m | Q4 | 97.17 tok/s | 1GB |
| deepseek-ai/DeepSeek-OCR | Q4 | 94.89 tok/s | 2GB |
| apple/OpenELM-1_1B-Instruct | Q4 | 94.84 tok/s | 1GB |
| ibm-research/PowerMoE-3b | Q4 | 94.57 tok/s | 2GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | 93.10 tok/s | 2GB |
| google-t5/t5-3b | Q4 | 93.08 tok/s | 2GB |
| unsloth/gemma-3-1b-it | Q4 | 93.04 tok/s | 1GB |
| ibm-granite/granite-3.3-2b-instruct | Q4 | 92.24 tok/s | 1GB |
| google/gemma-3-1b-it | Q4 | 92.15 tok/s | 1GB |
| tencent/HunyuanOCR | Q4 | 91.25 tok/s | 1GB |
| nari-labs/Dia2-2B | Q4 | 90.44 tok/s | 2GB |
| allenai/OLMo-2-0425-1B | Q4 | 90.25 tok/s | 1GB |
| inference-net/Schematron-3B | Q4 | 88.97 tok/s | 2GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | 87.82 tok/s | 2GB |
| facebook/sam3 | Q4 | 87.65 tok/s | 1GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 87.49 tok/s | 2GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | 86.83 tok/s | 1GB |
| unsloth/Llama-3.2-3B-Instruct | Q4 | 85.63 tok/s | 2GB |
| meta-llama/Llama-3.2-3B | Q4 | 85.38 tok/s | 2GB |
| unsloth/Llama-3.2-1B-Instruct | Q4 | 84.65 tok/s | 1GB |
| Qwen/Qwen2.5-3B | Q4 | 84.19 tok/s | 2GB |
| Qwen/Qwen3-8B | Q4 | 83.76 tok/s | 4GB |
| google/gemma-2b | Q4 | 83.74 tok/s | 1GB |
| HuggingFaceTB/SmolLM2-135M | Q4 | 83.72 tok/s | 4GB |
| Qwen/Qwen3-1.7B-Base | Q4 | 83.51 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q4 | 83.37 tok/s | 2GB |
| meta-llama/Llama-2-7b-chat-hf | Q4 | 83.32 tok/s | 4GB |
| Qwen/Qwen3-Reranker-0.6B | Q4 | 83.13 tok/s | 3GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q4 | 83.07 tok/s | 4GB |
| WeiboAI/VibeThinker-1.5B | Q4 | 83.00 tok/s | 1GB |
| Qwen/Qwen3-1.7B | Q4 | 82.83 tok/s | 4GB |
| microsoft/Phi-4-mini-instruct | Q4 | 82.52 tok/s | 4GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q4 | 82.48 tok/s | 2GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 82.46 tok/s | 4GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q4 | 82.37 tok/s | 4GB |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | 82.34 tok/s | 1GB |
| Qwen/Qwen2-1.5B-Instruct | Q4 | 82.14 tok/s | 3GB |
| dicta-il/dictalm2.0-instruct | Q4 | 82.00 tok/s | 4GB |
| Qwen/Qwen2.5-7B | Q4 | 82.00 tok/s | 4GB |
| meta-llama/Meta-Llama-3-8B | Q4 | 82.00 tok/s | 4GB |
| rinna/japanese-gpt-neox-small | Q4 | 81.97 tok/s | 4GB |
| vikhyatk/moondream2 | Q4 | 81.89 tok/s | 4GB |
| Qwen/Qwen3-4B-Base | Q4 | 81.73 tok/s | 2GB |
| zai-org/GLM-4.6-FP8 | Q4 | 81.41 tok/s | 4GB |
| black-forest-labs/FLUX.1-dev | Q4 | 81.41 tok/s | 4GB |
| zai-org/GLM-4.5-Air | Q4 | 81.39 tok/s | 4GB |
| llamafactory/tiny-random-Llama-3 | Q4 | 81.12 tok/s | 4GB |
| MiniMaxAI/MiniMax-M2 | Q4 | 80.58 tok/s | 4GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 80.43 tok/s | 4GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q4 | 80.31 tok/s | 4GB |
| HuggingFaceTB/SmolLM-135M | Q4 | 80.23 tok/s | 4GB |
| microsoft/Phi-3-mini-128k-instruct | Q4 | 80.20 tok/s | 4GB |
| meta-llama/Llama-3.1-8B | Q4 | 80.13 tok/s | 4GB |
| IlyaGusev/saiga_llama3_8b | Q4 | 79.93 tok/s | 4GB |
| BSC-LT/salamandraTA-7b-instruct | Q4 | 79.89 tok/s | 4GB |
| Qwen/Qwen3-Embedding-0.6B | Q4 | 79.82 tok/s | 3GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | 79.82 tok/s | 4GB |
| HuggingFaceH4/zephyr-7b-beta | Q4 | 79.70 tok/s | 4GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q4 | 79.68 tok/s | 4GB |
| petals-team/StableBeluga2 | Q4 | 79.65 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3-0324 | Q4 | 79.63 tok/s | 4GB |
| rednote-hilab/dots.ocr | Q4 | 79.56 tok/s | 4GB |
| ibm-granite/granite-docling-258M | Q4 | 79.50 tok/s | 4GB |
| GSAI-ML/LLaDA-8B-Instruct | Q4 | 79.34 tok/s | 4GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q4 | 79.34 tok/s | 4GB |
| huggyllama/llama-7b | Q4 | 79.30 tok/s | 4GB |
| microsoft/DialoGPT-medium | Q4 | 79.27 tok/s | 4GB |
| EleutherAI/gpt-neo-125m | Q4 | 79.18 tok/s | 4GB |
| microsoft/phi-4 | Q4 | 79.17 tok/s | 4GB |
| Qwen/Qwen2-7B-Instruct | Q4 | 79.07 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3.1 | Q4 | 78.98 tok/s | 4GB |
| facebook/opt-125m | Q4 | 78.88 tok/s | 4GB |
| openai-community/gpt2-large | Q4 | 78.83 tok/s | 4GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q4 | 78.80 tok/s | 2GB |
| deepseek-ai/DeepSeek-R1-0528 | Q4 | 78.75 tok/s | 4GB |
| bigscience/bloomz-560m | Q4 | 78.64 tok/s | 4GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q4 | 78.60 tok/s | 3GB |
| Tongyi-MAI/Z-Image-Turbo | Q4 | 78.54 tok/s | 4GB |
| GSAI-ML/LLaDA-8B-Base | Q4 | 78.11 tok/s | 4GB |
| skt/kogpt2-base-v2 | Q4 | 77.97 tok/s | 4GB |
| parler-tts/parler-tts-large-v1 | Q4 | 77.34 tok/s | 4GB |
| google/gemma-3-270m-it | Q4 | 77.34 tok/s | 4GB |
| Qwen/Qwen3-Embedding-4B | Q4 | 77.18 tok/s | 2GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q4 | 77.01 tok/s | 4GB |
| black-forest-labs/FLUX.2-dev | Q4 | 76.91 tok/s | 4GB |
| Qwen/Qwen3-8B-FP8 | Q4 | 76.86 tok/s | 4GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q4 | 76.52 tok/s | 4GB |
| openai-community/gpt2-xl | Q4 | 76.50 tok/s | 4GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | 76.43 tok/s | 3GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q4 | 76.28 tok/s | 4GB |
| mistralai/Mistral-7B-v0.1 | Q4 | 76.27 tok/s | 4GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | 76.21 tok/s | 4GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q4 | 75.88 tok/s | 4GB |
| allenai/Olmo-3-7B-Think | Q4 | 75.87 tok/s | 4GB |
| hmellor/tiny-random-LlamaForCausalLM | Q4 | 75.85 tok/s | 4GB |
| microsoft/VibeVoice-1.5B | Q4 | 75.73 tok/s | 3GB |
| Qwen/Qwen2-0.5B | Q4 | 75.70 tok/s | 3GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q4 | 75.58 tok/s | 4GB |
| Qwen/Qwen2.5-0.5B | Q4 | 75.54 tok/s | 3GB |
| microsoft/Phi-3-mini-4k-instruct | Q4 | 75.52 tok/s | 4GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | 75.48 tok/s | 3GB |
| Qwen/Qwen2.5-Math-1.5B | Q4 | 75.40 tok/s | 3GB |
| lmsys/vicuna-7b-v1.5 | Q4 | 75.10 tok/s | 4GB |
| liuhaotian/llava-v1.5-7b | Q4 | 74.96 tok/s | 4GB |
| Qwen/Qwen-Image-Edit-2509 | Q4 | 74.77 tok/s | 4GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q4 | 74.62 tok/s | 4GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q4 | 74.60 tok/s | 4GB |
| microsoft/Phi-3.5-vision-instruct | Q4 | 74.54 tok/s | 4GB |
| Qwen/Qwen3-Embedding-8B | Q4 | 74.28 tok/s | 4GB |
| meta-llama/Llama-Guard-3-8B | Q4 | 74.23 tok/s | 4GB |
| distilbert/distilgpt2 | Q4 | 74.19 tok/s | 4GB |
| microsoft/phi-2 | Q4 | 73.59 tok/s | 4GB |
| microsoft/DialoGPT-small | Q4 | 73.38 tok/s | 4GB |
| numind/NuExtract-1.5 | Q4 | 72.71 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q4 | 72.61 tok/s | 3GB |
| openai-community/gpt2-medium | Q4 | 72.50 tok/s | 4GB |
| Qwen/Qwen3-0.6B | Q4 | 72.23 tok/s | 3GB |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | 72.22 tok/s | 3GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | 72.07 tok/s | 2GB |
| meta-llama/Llama-2-7b-hf | Q4 | 72.02 tok/s | 4GB |
| Qwen/Qwen3-4B | Q4 | 72.01 tok/s | 2GB |
| Qwen/Qwen2-0.5B-Instruct | Q4 | 71.82 tok/s | 3GB |
| Qwen/Qwen2.5-1.5B | Q4 | 71.67 tok/s | 3GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 71.54 tok/s | 2GB |
| EleutherAI/pythia-70m-deduped | Q4 | 71.20 tok/s | 4GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | 71.17 tok/s | 4GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q4 | 71.14 tok/s | 3GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q4 | 70.75 tok/s | 4GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | 70.73 tok/s | 2GB |
| ibm-granite/granite-3.3-8b-instruct | Q4 | 70.58 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3 | Q4 | 70.27 tok/s | 4GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q4 | 70.20 tok/s | 2GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q4 | 70.15 tok/s | 4GB |
| tencent/HunyuanVideo-1.5 | Q4 | 70.07 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q4 | 69.94 tok/s | 2GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q4 | 69.87 tok/s | 2GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q4 | 69.85 tok/s | 4GB |
| sshleifer/tiny-gpt2 | Q4 | 69.80 tok/s | 4GB |
| Qwen/Qwen3-8B-Base | Q4 | 69.63 tok/s | 4GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | 69.58 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1 | Q4 | 69.40 tok/s | 4GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | 69.30 tok/s | 4GB |
| Qwen/Qwen3-0.6B-Base | Q4 | 69.24 tok/s | 3GB |
| google/gemma-2-2b-it | Q8 | 69.23 tok/s | 2GB |
| WeiboAI/VibeThinker-1.5B | Q8 | 69.15 tok/s | 2GB |
| microsoft/Phi-4-multimodal-instruct | Q4 | 68.87 tok/s | 4GB |
| apple/OpenELM-1_1B-Instruct | Q8 | 68.81 tok/s | 1GB |
| facebook/sam3 | Q8 | 68.76 tok/s | 1GB |
| unsloth/gemma-3-1b-it | Q8 | 68.72 tok/s | 1GB |
| LiquidAI/LFM2-1.2B | Q8 | 68.68 tok/s | 2GB |
| openai-community/gpt2 | Q4 | 68.56 tok/s | 4GB |
| unsloth/Llama-3.2-3B-Instruct | Q8 | 68.48 tok/s | 3GB |
| meta-llama/Llama-Guard-3-1B | Q8 | 68.29 tok/s | 1GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q8 | 67.37 tok/s | 3GB |
| Qwen/Qwen2.5-3B-Instruct | Q8 | 66.51 tok/s | 3GB |
| allenai/OLMo-2-0425-1B | Q8 | 65.66 tok/s | 1GB |
| google-bert/bert-base-uncased | Q8 | 64.84 tok/s | 1GB |
| meta-llama/Llama-3.2-1B-Instruct | Q8 | 64.28 tok/s | 1GB |
| unsloth/Llama-3.2-1B-Instruct | Q8 | 63.85 tok/s | 1GB |
| google/gemma-3-1b-it | Q8 | 63.30 tok/s | 1GB |
| nari-labs/Dia2-2B | Q8 | 63.00 tok/s | 3GB |
| ibm-granite/granite-3.3-2b-instruct | Q8 | 62.93 tok/s | 2GB |
| Qwen/Qwen3-14B-Base | Q4 | 62.18 tok/s | 7GB |
| meta-llama/Llama-3.2-1B | Q8 | 62.12 tok/s | 1GB |
| bigcode/starcoder2-3b | Q8 | 62.08 tok/s | 3GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | 62.00 tok/s | 3GB |
| deepseek-ai/DeepSeek-OCR | Q8 | 61.58 tok/s | 4GB |
| google/embeddinggemma-300m | Q8 | 61.10 tok/s | 1GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | 60.83 tok/s | 1GB |
| inference-net/Schematron-3B | Q8 | 60.60 tok/s | 3GB |
| google/gemma-2b | Q8 | 60.51 tok/s | 2GB |
| mistralai/Ministral-3-14B-Instruct-2512 | Q4 | 60.36 tok/s | 8GB |
| Qwen/Qwen2.5-14B | Q4 | 60.29 tok/s | 7GB |
| EssentialAI/rnj-1 | Q4 | 60.09 tok/s | 5GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | 59.98 tok/s | 8GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q4 | 59.69 tok/s | 5GB |
| ibm-research/PowerMoE-3b | Q8 | 59.43 tok/s | 3GB |
| meta-llama/Llama-2-13b-chat-hf | Q4 | 59.38 tok/s | 7GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | 59.14 tok/s | 4GB |
| NousResearch/Hermes-3-Llama-3.1-8B | Q4 | 59.04 tok/s | 4GB |
| meta-llama/Llama-3.2-3B | Q8 | 58.99 tok/s | 3GB |
| Qwen/Qwen2.5-3B | Q8 | 58.88 tok/s | 3GB |
| Qwen/Qwen2.5-0.5B | Q8 | 58.58 tok/s | 5GB |
| microsoft/Phi-3.5-vision-instruct | Q8 | 58.54 tok/s | 7GB |
| llamafactory/tiny-random-Llama-3 | Q8 | 58.50 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-0528 | Q8 | 58.44 tok/s | 7GB |
| Qwen/Qwen-Image-Edit-2509 | Q8 | 58.43 tok/s | 8GB |
| tencent/HunyuanOCR | Q8 | 58.33 tok/s | 2GB |
| black-forest-labs/FLUX.2-dev | Q8 | 58.16 tok/s | 8GB |
| Qwen/Qwen3-8B-Base | Q8 | 58.09 tok/s | 9GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | 58.06 tok/s | 3GB |
| google-t5/t5-3b | Q8 | 58.05 tok/s | 3GB |
| Qwen/Qwen3-0.6B-Base | Q8 | 58.01 tok/s | 6GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q8 | 57.96 tok/s | 4GB |
| Qwen/Qwen3-4B-Base | Q8 | 57.80 tok/s | 4GB |
| zai-org/GLM-4.6-FP8 | Q8 | 57.75 tok/s | 7GB |
| vikhyatk/moondream2 | Q8 | 57.67 tok/s | 7GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q8 | 57.66 tok/s | 9GB |
| parler-tts/parler-tts-large-v1 | Q8 | 57.59 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | 57.47 tok/s | 7GB |
| Qwen/Qwen2-0.5B-Instruct | Q8 | 57.30 tok/s | 5GB |
| Qwen/Qwen3-Embedding-8B | Q8 | 57.07 tok/s | 9GB |
| google/gemma-2-9b-it | Q4 | 57.07 tok/s | 5GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | 56.95 tok/s | 4GB |
| EleutherAI/pythia-70m-deduped | Q8 | 56.93 tok/s | 7GB |
| numind/NuExtract-1.5 | Q8 | 56.90 tok/s | 7GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | 56.53 tok/s | 3GB |
| Qwen/Qwen2-7B-Instruct | Q8 | 56.38 tok/s | 7GB |
| HuggingFaceTB/SmolLM2-135M | Q8 | 56.34 tok/s | 7GB |
| rinna/japanese-gpt-neox-small | Q8 | 56.25 tok/s | 7GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q8 | 55.87 tok/s | 7GB |
| microsoft/Phi-3-medium-128k-instruct | Q4 | 55.86 tok/s | 7GB |
| ibm-granite/granite-3.3-8b-instruct | Q8 | 55.86 tok/s | 9GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | 55.76 tok/s | 7GB |
| microsoft/DialoGPT-small | Q8 | 55.68 tok/s | 7GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | 55.63 tok/s | 7GB |
| liuhaotian/llava-v1.5-7b | Q8 | 55.60 tok/s | 7GB |
| openai-community/gpt2-large | Q8 | 55.58 tok/s | 7GB |
| Qwen/Qwen2.5-7B | Q8 | 55.55 tok/s | 7GB |
| google/gemma-3-270m-it | Q8 | 55.50 tok/s | 7GB |
| meta-llama/Llama-Guard-3-8B | Q8 | 55.35 tok/s | 9GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q8 | 55.32 tok/s | 9GB |
| Qwen/Qwen3-14B | Q4 | 55.28 tok/s | 7GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | 55.05 tok/s | 7GB |
| deepseek-ai/DeepSeek-V3 | Q8 | 54.98 tok/s | 7GB |
| ai-forever/ruGPT-3.5-13B | Q4 | 54.97 tok/s | 7GB |
| Qwen/Qwen3-Embedding-0.6B | Q8 | 54.95 tok/s | 6GB |
| microsoft/phi-2 | Q8 | 54.95 tok/s | 7GB |
| deepseek-ai/DeepSeek-V3.1 | Q8 | 54.70 tok/s | 7GB |
| meta-llama/Llama-3.1-8B | Q8 | 54.68 tok/s | 9GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q8 | 54.66 tok/s | 4GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q8 | 54.66 tok/s | 5GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q8 | 54.65 tok/s | 4GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q8 | 54.43 tok/s | 7GB |
| HuggingFaceTB/SmolLM-135M | Q8 | 54.37 tok/s | 7GB |
| microsoft/Phi-4-multimodal-instruct | Q8 | 54.00 tok/s | 7GB |
| mistralai/Mistral-7B-v0.1 | Q8 | 53.87 tok/s | 7GB |
| Qwen/Qwen2-1.5B-Instruct | Q8 | 53.83 tok/s | 5GB |
| facebook/opt-125m | Q8 | 53.82 tok/s | 7GB |
| ibm-granite/granite-docling-258M | Q8 | 53.77 tok/s | 7GB |
| Qwen/Qwen3-Embedding-4B | Q8 | 53.62 tok/s | 4GB |
| Tongyi-MAI/Z-Image-Turbo | Q8 | 53.22 tok/s | 8GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q8 | 53.01 tok/s | 7GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q8 | 52.91 tok/s | 7GB |
| BSC-LT/salamandraTA-7b-instruct | Q8 | 52.87 tok/s | 7GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q8 | 52.83 tok/s | 9GB |
| petals-team/StableBeluga2 | Q8 | 52.81 tok/s | 7GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q8 | 52.74 tok/s | 7GB |
| Qwen/Qwen3-0.6B | Q8 | 52.65 tok/s | 6GB |
| Qwen/Qwen2.5-Math-1.5B | Q8 | 52.43 tok/s | 5GB |
| OpenPipe/Qwen3-14B-Instruct | Q4 | 52.39 tok/s | 7GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q8 | 52.32 tok/s | 9GB |
| Qwen/Qwen3-8B | Q8 | 52.26 tok/s | 9GB |
| microsoft/phi-4 | Q8 | 52.24 tok/s | 7GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q8 | 52.07 tok/s | 5GB |
| lmsys/vicuna-7b-v1.5 | Q8 | 52.06 tok/s | 7GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q8 | 51.98 tok/s | 7GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q8 | 51.97 tok/s | 7GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q8 | 51.97 tok/s | 5GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q8 | 51.96 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q8 | 51.95 tok/s | 5GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q8 | 51.91 tok/s | 4GB |
| meta-llama/Meta-Llama-3-8B | Q8 | 51.85 tok/s | 9GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q8 | 51.82 tok/s | 9GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q8 | 51.76 tok/s | 9GB |
| rednote-hilab/dots.ocr | Q8 | 51.51 tok/s | 7GB |
| microsoft/Phi-4-mini-instruct | Q8 | 51.51 tok/s | 7GB |
| microsoft/VibeVoice-1.5B | Q8 | 51.49 tok/s | 5GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | 51.45 tok/s | 7GB |
| Qwen/Qwen3-Reranker-0.6B | Q8 | 51.43 tok/s | 6GB |
| microsoft/Phi-3-mini-4k-instruct | Q8 | 51.42 tok/s | 7GB |
| allenai/Olmo-3-7B-Think | Q8 | 51.33 tok/s | 8GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q8 | 51.27 tok/s | 7GB |
| openai-community/gpt2 | Q8 | 51.21 tok/s | 7GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | 51.06 tok/s | 5GB |
| Qwen/Qwen3-8B-FP8 | Q8 | 51.02 tok/s | 9GB |
| dicta-il/dictalm2.0-instruct | Q8 | 50.94 tok/s | 7GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | 50.94 tok/s | 9GB |
| sshleifer/tiny-gpt2 | Q8 | 50.81 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | 50.75 tok/s | 9GB |
| Qwen/Qwen2.5-Coder-1.5B | Q8 | 50.73 tok/s | 5GB |
| hmellor/tiny-random-LlamaForCausalLM | Q8 | 50.67 tok/s | 7GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | 50.58 tok/s | 8GB |
| microsoft/Phi-3-mini-128k-instruct | Q8 | 50.22 tok/s | 7GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | 50.14 tok/s | 4GB |
| Qwen/Qwen2.5-1.5B | Q8 | 50.08 tok/s | 5GB |
| GSAI-ML/LLaDA-8B-Instruct | Q8 | 50.07 tok/s | 9GB |
| meta-llama/Llama-2-7b-hf | Q8 | 50.06 tok/s | 7GB |
| HuggingFaceH4/zephyr-7b-beta | Q8 | 50.04 tok/s | 7GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q8 | 49.87 tok/s | 4GB |
| openai-community/gpt2-xl | Q8 | 49.83 tok/s | 7GB |
| black-forest-labs/FLUX.1-dev | Q8 | 49.82 tok/s | 8GB |
| deepseek-ai/DeepSeek-V3-0324 | Q8 | 49.76 tok/s | 7GB |
| MiniMaxAI/MiniMax-M2 | Q8 | 49.60 tok/s | 7GB |
| GSAI-ML/LLaDA-8B-Base | Q8 | 49.45 tok/s | 9GB |
| openai-community/gpt2-medium | Q8 | 49.43 tok/s | 7GB |
| tencent/HunyuanVideo-1.5 | Q8 | 49.38 tok/s | 8GB |
| deepseek-ai/DeepSeek-R1 | Q8 | 49.33 tok/s | 7GB |
| huggyllama/llama-7b | Q8 | 49.20 tok/s | 7GB |
| Qwen/Qwen3-1.7B-Base | Q8 | 49.09 tok/s | 7GB |
| skt/kogpt2-base-v2 | Q8 | 48.99 tok/s | 7GB |
| Qwen/Qwen3-1.7B | Q8 | 48.95 tok/s | 7GB |
| microsoft/DialoGPT-medium | Q8 | 48.85 tok/s | 7GB |
| zai-org/GLM-4.5-Air | Q8 | 48.80 tok/s | 7GB |
| IlyaGusev/saiga_llama3_8b | Q8 | 48.64 tok/s | 9GB |
| meta-llama/Llama-2-7b-chat-hf | Q8 | 48.48 tok/s | 7GB |
| distilbert/distilgpt2 | Q8 | 48.47 tok/s | 7GB |
| EleutherAI/gpt-neo-125m | Q8 | 48.31 tok/s | 7GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | 48.27 tok/s | 9GB |
| Qwen/Qwen3-4B | Q8 | 48.23 tok/s | 4GB |
| bigscience/bloomz-560m | Q8 | 48.07 tok/s | 7GB |
| Qwen/Qwen2-0.5B | Q8 | 47.99 tok/s | 5GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q4 | 45.03 tok/s | 15GB |
| google/gemma-2-27b-it | Q4 | 44.88 tok/s | 14GB |
| openai/gpt-oss-20b | Q4 | 44.50 tok/s | 10GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q4 | 44.08 tok/s | 10GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q4 | 43.89 tok/s | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q4 | 43.76 tok/s | 15GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q4 | 43.66 tok/s | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q4 | 43.31 tok/s | 15GB |
| google/gemma-2-9b-it | Q8 | 43.05 tok/s | 10GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q4 | 42.88 tok/s | 15GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q4 | 42.80 tok/s | 15GB |
| openai/gpt-oss-safeguard-20b | Q4 | 42.75 tok/s | 11GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q8 | 42.66 tok/s | 10GB |
| ai-forever/ruGPT-3.5-13B | Q8 | 42.53 tok/s | 13GB |
| mistralai/Mistral-Small-Instruct-2409 | Q4 | 41.97 tok/s | 11GB |
| Qwen/Qwen2.5-14B | Q8 | 41.78 tok/s | 14GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | 41.66 tok/s | 9GB |
| microsoft/Phi-3-medium-128k-instruct | Q8 | 40.66 tok/s | 14GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q4 | 39.86 tok/s | 15GB |
| Qwen/Qwen3-30B-A3B | Q4 | 39.84 tok/s | 15GB |
| unsloth/gpt-oss-20b-BF16 | Q4 | 39.84 tok/s | 10GB |
| mistralai/Ministral-3-14B-Instruct-2512 | Q8 | 39.78 tok/s | 16GB |
| Qwen/Qwen3-14B | Q8 | 38.88 tok/s | 14GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q4 | 38.58 tok/s | 10GB |
| google/embeddinggemma-300m | FP16 | 38.15 tok/s | 1GB |
| meta-llama/Llama-2-13b-chat-hf | Q8 | 38.02 tok/s | 13GB |
| google-bert/bert-base-uncased | FP16 | 37.98 tok/s | 1GB |
| google/gemma-3-1b-it | FP16 | 37.96 tok/s | 2GB |
| Qwen/Qwen2.5-14B-Instruct | Q8 | 37.78 tok/s | 14GB |
| Qwen/Qwen2.5-14B-Instruct | Q8 | 37.72 tok/s | 15GB |
| OpenPipe/Qwen3-14B-Instruct | Q8 | 37.61 tok/s | 14GB |
| Qwen/Qwen3-14B-Base | Q8 | 37.40 tok/s | 14GB |
| WeiboAI/VibeThinker-1.5B | FP16 | 37.25 tok/s | 4GB |
| allenai/OLMo-2-0425-1B | FP16 | 37.19 tok/s | 2GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | FP16 | 36.88 tok/s | 2GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | FP16 | 36.78 tok/s | 6GB |
| meta-llama/Llama-3.2-3B-Instruct | FP16 | 36.65 tok/s | 6GB |
| Qwen/Qwen2.5-3B-Instruct | FP16 | 36.63 tok/s | 6GB |
| tencent/HunyuanOCR | FP16 | 36.56 tok/s | 3GB |
| inference-net/Schematron-3B | FP16 | 36.53 tok/s | 6GB |
| NousResearch/Hermes-3-Llama-3.1-8B | Q8 | 36.51 tok/s | 9GB |
| EssentialAI/rnj-1 | Q8 | 36.51 tok/s | 10GB |
| meta-llama/Llama-3.2-1B-Instruct | FP16 | 36.37 tok/s | 2GB |
| ibm-research/PowerMoE-3b | FP16 | 36.22 tok/s | 6GB |
| google/gemma-2b | FP16 | 35.65 tok/s | 4GB |
| facebook/sam3 | FP16 | 35.16 tok/s | 2GB |
| meta-llama/Llama-3.2-1B | FP16 | 35.04 tok/s | 2GB |
| google-t5/t5-3b | FP16 | 34.61 tok/s | 6GB |
| ibm-granite/granite-3.3-2b-instruct | FP16 | 34.25 tok/s | 4GB |
| unsloth/Llama-3.2-1B-Instruct | FP16 | 34.23 tok/s | 2GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | FP16 | 34.06 tok/s | 6GB |
| apple/OpenELM-1_1B-Instruct | FP16 | 34.03 tok/s | 2GB |
| nari-labs/Dia2-2B | FP16 | 33.08 tok/s | 5GB |
| unsloth/Llama-3.2-3B-Instruct | FP16 | 33.05 tok/s | 6GB |
| deepseek-ai/DeepSeek-OCR | FP16 | 32.86 tok/s | 7GB |
| google/gemma-2-2b-it | FP16 | 32.79 tok/s | 4GB |
| LiquidAI/LFM2-1.2B | FP16 | 32.76 tok/s | 4GB |
| meta-llama/Llama-Guard-3-1B | FP16 | 32.54 tok/s | 2GB |
| bigcode/starcoder2-3b | FP16 | 32.40 tok/s | 6GB |
| meta-llama/Llama-3.2-3B | FP16 | 31.90 tok/s | 6GB |
| openai-community/gpt2-medium | FP16 | 31.81 tok/s | 15GB |
| rinna/japanese-gpt-neox-small | FP16 | 31.75 tok/s | 15GB |
| Qwen/Qwen2-1.5B-Instruct | FP16 | 31.72 tok/s | 11GB |
| meta-llama/Llama-3.2-3B-Instruct | FP16 | 31.68 tok/s | 7GB |
| Qwen/Qwen2.5-0.5B | FP16 | 31.58 tok/s | 11GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | FP16 | 31.57 tok/s | 17GB |
| Qwen/Qwen2.5-7B | FP16 | 31.57 tok/s | 15GB |
| GSAI-ML/LLaDA-8B-Instruct | FP16 | 31.57 tok/s | 17GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q8 | 31.56 tok/s | 31GB |
| BSC-LT/salamandraTA-7b-instruct | FP16 | 31.54 tok/s | 15GB |
| HuggingFaceTB/SmolLM-135M | FP16 | 31.53 tok/s | 15GB |
| Qwen/Qwen2.5-3B | FP16 | 31.45 tok/s | 6GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | FP16 | 31.44 tok/s | 15GB |
| unsloth/gemma-3-1b-it | FP16 | 31.43 tok/s | 2GB |
| openai-community/gpt2-xl | FP16 | 31.32 tok/s | 15GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | FP16 | 31.31 tok/s | 17GB |
| Qwen/Qwen3-4B-Base | FP16 | 31.23 tok/s | 9GB |
| microsoft/Phi-3.5-mini-instruct | FP16 | 31.16 tok/s | 15GB |
| Qwen/Qwen3-8B | FP16 | 31.08 tok/s | 17GB |
| deepseek-ai/DeepSeek-V3 | FP16 | 31.08 tok/s | 15GB |
| hmellor/tiny-random-LlamaForCausalLM | FP16 | 31.07 tok/s | 15GB |
| sshleifer/tiny-gpt2 | FP16 | 31.04 tok/s | 15GB |
| meta-llama/Llama-3.1-8B | FP16 | 31.00 tok/s | 17GB |
| microsoft/Phi-3.5-mini-instruct | FP16 | 30.98 tok/s | 8GB |
| microsoft/Phi-4-multimodal-instruct | FP16 | 30.96 tok/s | 15GB |
| Qwen/Qwen3-4B-Instruct-2507 | FP16 | 30.94 tok/s | 9GB |
| MiniMaxAI/MiniMax-M2 | FP16 | 30.94 tok/s | 15GB |
| google/gemma-2-27b-it | Q8 | 30.93 tok/s | 28GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q8 | 30.90 tok/s | 20GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | FP16 | 30.87 tok/s | 17GB |
| lmsys/vicuna-7b-v1.5 | FP16 | 30.84 tok/s | 15GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q8 | 30.80 tok/s | 31GB |
| EleutherAI/gpt-neo-125m | FP16 | 30.78 tok/s | 15GB |
| Qwen/Qwen2.5-1.5B | FP16 | 30.71 tok/s | 11GB |
| Qwen/Qwen2-0.5B-Instruct | FP16 | 30.63 tok/s | 11GB |
| Qwen/Qwen3-1.7B-Base | FP16 | 30.54 tok/s | 15GB |
| Gensyn/Qwen2.5-0.5B-Instruct | FP16 | 30.53 tok/s | 11GB |
| Qwen/Qwen-Image-Edit-2509 | FP16 | 30.49 tok/s | 16GB |
| openai/gpt-oss-20b | Q8 | 30.46 tok/s | 20GB |
| meta-llama/Meta-Llama-3-8B-Instruct | FP16 | 30.46 tok/s | 17GB |
| IlyaGusev/saiga_llama3_8b | FP16 | 30.46 tok/s | 17GB |
| Qwen/Qwen3-4B-Thinking-2507 | FP16 | 30.45 tok/s | 9GB |
| Qwen/Qwen2.5-1.5B-Instruct | FP16 | 30.42 tok/s | 11GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q8 | 30.41 tok/s | 31GB |
| numind/NuExtract-1.5 | FP16 | 30.30 tok/s | 15GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | FP16 | 30.29 tok/s | 15GB |
| swiss-ai/Apertus-8B-Instruct-2509 | FP16 | 30.24 tok/s | 17GB |
| ibm-granite/granite-3.3-8b-instruct | FP16 | 30.24 tok/s | 17GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q8 | 30.20 tok/s | 31GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q8 | 30.14 tok/s | 31GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | FP16 | 30.10 tok/s | 9GB |
| llamafactory/tiny-random-Llama-3 | FP16 | 30.02 tok/s | 15GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | FP16 | 30.00 tok/s | 15GB |
| dicta-il/dictalm2.0-instruct | FP16 | 29.98 tok/s | 15GB |
| Qwen/Qwen2-0.5B | FP16 | 29.91 tok/s | 11GB |
| facebook/opt-125m | FP16 | 29.84 tok/s | 15GB |
| mistralai/Mistral-7B-Instruct-v0.2 | FP16 | 29.84 tok/s | 15GB |
| Qwen/Qwen2.5-0.5B-Instruct | FP16 | 29.73 tok/s | 11GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | FP16 | 29.70 tok/s | 9GB |
| openai-community/gpt2-large | FP16 | 29.56 tok/s | 15GB |
| meta-llama/Llama-Guard-3-8B | FP16 | 29.55 tok/s | 17GB |
| mistralai/Mistral-Small-Instruct-2409 | Q8 | 29.50 tok/s | 23GB |
| HuggingFaceH4/zephyr-7b-beta | FP16 | 29.39 tok/s | 15GB |
| openai/gpt-oss-safeguard-20b | Q8 | 29.36 tok/s | 22GB |
| allenai/Olmo-3-7B-Think | FP16 | 29.36 tok/s | 16GB |
| huggyllama/llama-7b | FP16 | 29.34 tok/s | 15GB |
| Qwen/Qwen3-0.6B | FP16 | 29.30 tok/s | 13GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | FP16 | 29.28 tok/s | 15GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | FP16 | 29.14 tok/s | 9GB |
| moonshotai/Kimi-Linear-48B-A3B-Instruct | Q4 | 29.09 tok/s | 25GB |
| Qwen/Qwen3-1.7B | FP16 | 29.05 tok/s | 15GB |
| microsoft/VibeVoice-1.5B | FP16 | 29.04 tok/s | 11GB |
| Qwen/Qwen3-0.6B-Base | FP16 | 29.02 tok/s | 13GB |
| codellama/CodeLlama-34b-hf | Q4 | 28.89 tok/s | 17GB |
| liuhaotian/llava-v1.5-7b | FP16 | 28.87 tok/s | 15GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | FP16 | 28.86 tok/s | 9GB |
| meta-llama/Meta-Llama-3-8B | FP16 | 28.84 tok/s | 17GB |
| Qwen/Qwen3-Embedding-8B | FP16 | 28.84 tok/s | 17GB |
| meta-llama/Llama-2-7b-chat-hf | FP16 | 28.81 tok/s | 15GB |
| Qwen/Qwen3-Embedding-4B | FP16 | 28.80 tok/s | 9GB |
| ibm-granite/granite-docling-258M | FP16 | 28.67 tok/s | 15GB |
| petals-team/StableBeluga2 | FP16 | 28.66 tok/s | 15GB |
| vikhyatk/moondream2 | FP16 | 28.65 tok/s | 15GB |
| Qwen/Qwen3-32B | Q4 | 28.62 tok/s | 16GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | FP16 | 28.59 tok/s | 9GB |
| GSAI-ML/LLaDA-8B-Base | FP16 | 28.57 tok/s | 17GB |
| microsoft/Phi-3-mini-128k-instruct | FP16 | 28.54 tok/s | 15GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | FP16 | 28.48 tok/s | 17GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q8 | 28.48 tok/s | 31GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q8 | 28.43 tok/s | 31GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | FP16 | 28.42 tok/s | 17GB |
| microsoft/Phi-3.5-vision-instruct | FP16 | 28.42 tok/s | 15GB |
| microsoft/phi-2 | FP16 | 28.40 tok/s | 15GB |
| deepseek-ai/DeepSeek-V3-0324 | FP16 | 28.38 tok/s | 15GB |
| Qwen/Qwen3-8B-Base | FP16 | 28.32 tok/s | 17GB |
| Qwen/Qwen2.5-Math-1.5B | FP16 | 28.29 tok/s | 11GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | FP16 | 28.29 tok/s | 15GB |
| mistralai/Mistral-7B-v0.1 | FP16 | 28.22 tok/s | 15GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | FP16 | 28.22 tok/s | 15GB |
| Qwen/Qwen3-4B | FP16 | 28.20 tok/s | 9GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q4 | 28.09 tok/s | 16GB |
| distilbert/distilgpt2 | FP16 | 28.02 tok/s | 15GB |
| Qwen/Qwen2.5-7B-Instruct | FP16 | 27.94 tok/s | 16GB |
| 01-ai/Yi-1.5-34B-Chat | Q4 | 27.92 tok/s | 18GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q8 | 27.85 tok/s | 20GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | FP16 | 27.85 tok/s | 11GB |
| meta-llama/Llama-2-7b-hf | FP16 | 27.84 tok/s | 15GB |
| Qwen/Qwen3-Reranker-0.6B | FP16 | 27.80 tok/s | 13GB |
| microsoft/Phi-3-mini-4k-instruct | FP16 | 27.73 tok/s | 15GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | FP16 | 27.70 tok/s | 11GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q8 | 27.69 tok/s | 31GB |
| Qwen/Qwen2.5-7B-Instruct | FP16 | 27.69 tok/s | 15GB |
| black-forest-labs/FLUX.1-dev | FP16 | 27.69 tok/s | 16GB |
| parler-tts/parler-tts-large-v1 | FP16 | 27.67 tok/s | 15GB |
| microsoft/DialoGPT-small | FP16 | 27.63 tok/s | 15GB |
| skt/kogpt2-base-v2 | FP16 | 27.51 tok/s | 15GB |
| moonshotai/Kimi-K2-Thinking | Q4 | 27.39 tok/s | 489GB |
| meta-llama/Llama-3.1-8B-Instruct | FP16 | 27.32 tok/s | 17GB |
| black-forest-labs/FLUX.2-dev | FP16 | 27.30 tok/s | 16GB |
| Qwen/Qwen3-8B-FP8 | FP16 | 27.25 tok/s | 17GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | FP16 | 27.21 tok/s | 15GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | FP16 | 27.18 tok/s | 17GB |
| Qwen/Qwen3-Embedding-0.6B | FP16 | 27.18 tok/s | 13GB |
| microsoft/phi-4 | FP16 | 27.18 tok/s | 15GB |
| deepseek-ai/DeepSeek-V3.1 | FP16 | 27.17 tok/s | 15GB |
| Qwen/QwQ-32B-Preview | Q4 | 27.17 tok/s | 17GB |
| rednote-hilab/dots.ocr | FP16 | 27.09 tok/s | 15GB |
| Qwen/Qwen3-30B-A3B | Q8 | 27.06 tok/s | 31GB |
| Qwen/Qwen2.5-32B-Instruct | Q4 | 27.03 tok/s | 17GB |
| deepseek-ai/DeepSeek-R1-0528 | FP16 | 26.97 tok/s | 15GB |
| zai-org/GLM-4.5-Air | FP16 | 26.92 tok/s | 15GB |
| Qwen/Qwen2-7B-Instruct | FP16 | 26.80 tok/s | 15GB |
| unsloth/gpt-oss-20b-BF16 | Q8 | 26.76 tok/s | 20GB |
| openai-community/gpt2 | FP16 | 26.73 tok/s | 15GB |
| Qwen/Qwen2.5-Coder-1.5B | FP16 | 26.73 tok/s | 11GB |
| microsoft/DialoGPT-medium | FP16 | 26.71 tok/s | 15GB |
| google/gemma-3-270m-it | FP16 | 26.64 tok/s | 15GB |
| mistralai/Mistral-7B-Instruct-v0.1 | FP16 | 26.64 tok/s | 15GB |
| zai-org/GLM-4.6-FP8 | FP16 | 26.63 tok/s | 15GB |
| bigscience/bloomz-560m | FP16 | 26.60 tok/s | 15GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | FP16 | 26.57 tok/s | 15GB |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | 26.53 tok/s | 34GB |
| tencent/HunyuanVideo-1.5 | FP16 | 26.50 tok/s | 16GB |
| EleutherAI/pythia-70m-deduped | FP16 | 26.50 tok/s | 15GB |
| HuggingFaceTB/SmolLM2-135M | FP16 | 26.47 tok/s | 15GB |
| deepseek-ai/DeepSeek-R1 | FP16 | 26.35 tok/s | 15GB |
| microsoft/Phi-4-mini-instruct | FP16 | 26.26 tok/s | 15GB |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | 26.16 tok/s | 34GB |
| Tongyi-MAI/Z-Image-Turbo | FP16 | 26.15 tok/s | 16GB |
| deepseek-ai/deepseek-coder-33b-instruct | Q4 | 26.05 tok/s | 17GB |
| meta-llama/Meta-Llama-3-70B-Instruct | Q4 | 25.79 tok/s | 34GB |
| Qwen/Qwen2.5-Coder-32B-Instruct | Q4 | 25.74 tok/s | 17GB |
| Qwen/Qwen2.5-32B | Q4 | 25.72 tok/s | 16GB |
| Qwen/Qwen2.5-32B-Instruct | Q4 | 25.56 tok/s | 16GB |
| baichuan-inc/Baichuan-M2-32B | Q4 | 25.53 tok/s | 16GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q4 | 25.51 tok/s | 17GB |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q4 | 25.15 tok/s | 34GB |
| deepseek-ai/DeepSeek-V2.5 | Q4 | 24.82 tok/s | 328GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q4 | 24.29 tok/s | 16GB |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q4 | 24.09 tok/s | 34GB |
| Qwen/Qwen2.5-14B-Instruct | FP16 | 23.42 tok/s | 30GB |
| ai-forever/ruGPT-3.5-13B | FP16 | 23.28 tok/s | 27GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | FP16 | 22.85 tok/s | 19GB |
| EssentialAI/rnj-1 | FP16 | 22.45 tok/s | 19GB |
| mistralai/Ministral-3-14B-Instruct-2512 | FP16 | 22.28 tok/s | 32GB |
| google/gemma-2-9b-it | FP16 | 22.23 tok/s | 20GB |
| NousResearch/Hermes-3-Llama-3.1-8B | FP16 | 21.93 tok/s | 17GB |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | Q4 | 21.77 tok/s | 69GB |
| Qwen/Qwen2.5-14B | FP16 | 21.25 tok/s | 29GB |
| Qwen/Qwen2.5-14B-Instruct | FP16 | 20.82 tok/s | 29GB |
| Qwen/Qwen3-14B-Base | FP16 | 20.74 tok/s | 29GB |
| Qwen/Qwen2.5-32B-Instruct | Q8 | 20.51 tok/s | 33GB |
| microsoft/Phi-3-medium-128k-instruct | FP16 | 20.43 tok/s | 29GB |
| OpenPipe/Qwen3-14B-Instruct | FP16 | 20.16 tok/s | 29GB |
| baichuan-inc/Baichuan-M2-32B | Q8 | 20.13 tok/s | 33GB |
| meta-llama/Llama-3.1-8B-Instruct | FP16 | 20.02 tok/s | 17GB |
| meta-llama/Llama-2-13b-chat-hf | FP16 | 19.90 tok/s | 27GB |
| codellama/CodeLlama-34b-hf | Q8 | 19.86 tok/s | 35GB |
| Qwen/Qwen3-14B | FP16 | 19.74 tok/s | 29GB |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q8 | 19.66 tok/s | 68GB |
| Qwen/QwQ-32B-Preview | Q8 | 19.57 tok/s | 34GB |
| Qwen/Qwen2.5-32B | Q8 | 19.42 tok/s | 33GB |
| Qwen/Qwen2.5-Coder-32B-Instruct | Q8 | 19.37 tok/s | 34GB |
| moonshotai/Kimi-K2-Thinking | Q8 | 18.95 tok/s | 978GB |
| deepseek-ai/DeepSeek-V2.5 | Q8 | 18.56 tok/s | 656GB |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | 18.44 tok/s | 68GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q8 | 18.33 tok/s | 33GB |
| Qwen/Qwen3-32B | Q8 | 18.15 tok/s | 33GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q8 | 18.10 tok/s | 33GB |
| meta-llama/Meta-Llama-3-70B-Instruct | Q8 | 17.79 tok/s | 68GB |
| 01-ai/Yi-1.5-34B-Chat | Q8 | 17.53 tok/s | 35GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | FP16 | 17.49 tok/s | 61GB |
| mistralai/Mistral-Small-Instruct-2409 | FP16 | 17.45 tok/s | 46GB |
| deepseek-ai/deepseek-coder-33b-instruct | Q8 | 17.31 tok/s | 34GB |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q8 | 17.09 tok/s | 68GB |
| moonshotai/Kimi-Linear-48B-A3B-Instruct | Q8 | 17.06 tok/s | 50GB |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | 17.01 tok/s | 68GB |
| Qwen/Qwen2.5-32B-Instruct | Q8 | 16.95 tok/s | 34GB |
| openai/gpt-oss-safeguard-20b | FP16 | 16.91 tok/s | 44GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | FP16 | 16.89 tok/s | 61GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q8 | 16.86 tok/s | 35GB |
| google/gemma-2-27b-it | FP16 | 16.77 tok/s | 56GB |
| Qwen/Qwen3-30B-A3B | FP16 | 16.60 tok/s | 61GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q4 | 16.56 tok/s | 39GB |
| openai/gpt-oss-120b | Q4 | 16.46 tok/s | 59GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | FP16 | 16.39 tok/s | 41GB |
| unsloth/gpt-oss-20b-BF16 | FP16 | 16.35 tok/s | 41GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q4 | 16.23 tok/s | 39GB |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | 16.08 tok/s | 34GB |
| NousResearch/Hermes-3-Llama-3.1-70B | Q4 | 16.06 tok/s | 34GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | FP16 | 15.98 tok/s | 41GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | FP16 | 15.59 tok/s | 61GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | FP16 | 15.57 tok/s | 61GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q4 | 15.48 tok/s | 39GB |
| Qwen/Qwen2.5-Math-72B-Instruct | Q4 | 15.43 tok/s | 36GB |
| mistralai/Mistral-Large-Instruct-2411 | Q4 | 15.34 tok/s | 60GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | FP16 | 15.34 tok/s | 61GB |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | Q8 | 15.17 tok/s | 138GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | FP16 | 15.14 tok/s | 61GB |
| openai/gpt-oss-20b | FP16 | 15.09 tok/s | 41GB |
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | Q4 | 15.06 tok/s | 34GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | FP16 | 15.01 tok/s | 61GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | FP16 | 14.77 tok/s | 61GB |
| AI-MO/Kimina-Prover-72B | Q4 | 14.44 tok/s | 35GB |
| Qwen/Qwen2.5-72B-Instruct | Q4 | 14.20 tok/s | 36GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q4 | 14.17 tok/s | 39GB |
| Qwen/Qwen2.5-72B-Instruct | Q4 | 13.90 tok/s | 35GB |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | 13.83 tok/s | 34GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q4 | 13.81 tok/s | 44GB |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | Q4 | 12.41 tok/s | 115GB |
| NousResearch/Hermes-3-Llama-3.1-70B | Q8 | 11.71 tok/s | 69GB |
| AI-MO/Kimina-Prover-72B | Q8 | 11.55 tok/s | 70GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q8 | 11.25 tok/s | 78GB |
| Qwen/Qwen3-32B | FP16 | 11.12 tok/s | 66GB |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | 11.08 tok/s | 69GB |
| deepseek-ai/DeepSeek-V2.5 | FP16 | 11.08 tok/s | 1312GB |
| 01-ai/Yi-1.5-34B-Chat | FP16 | 10.84 tok/s | 70GB |
| moonshotai/Kimi-K2-Thinking | FP16 | 10.81 tok/s | 1956GB |
| Qwen/QwQ-32B-Preview | FP16 | 10.77 tok/s | 67GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q8 | 10.76 tok/s | 88GB |
| deepseek-ai/deepseek-coder-33b-instruct | FP16 | 10.66 tok/s | 68GB |
| Qwen/Qwen2.5-32B-Instruct | FP16 | 10.66 tok/s | 66GB |
| Qwen/Qwen2.5-32B-Instruct | FP16 | 10.65 tok/s | 67GB |
| meta-llama/Meta-Llama-3-70B-Instruct | FP16 | 10.46 tok/s | 137GB |
| Qwen/Qwen2.5-32B | FP16 | 10.41 tok/s | 66GB |
| baichuan-inc/Baichuan-M2-32B | FP16 | 10.40 tok/s | 66GB |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | 10.37 tok/s | 69GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q8 | 10.36 tok/s | 78GB |
| openai/gpt-oss-120b | Q8 | 10.34 tok/s | 117GB |
| deepseek-ai/DeepSeek-Math-V2 | Q4 | 10.31 tok/s | 383GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q8 | 10.29 tok/s | 78GB |
| Qwen/Qwen2.5-72B-Instruct | Q8 | 10.29 tok/s | 71GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | FP16 | 10.23 tok/s | 66GB |
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | Q8 | 10.21 tok/s | 69GB |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | FP16 | 10.18 tok/s | 137GB |
| Qwen/Qwen2.5-Math-72B-Instruct | Q8 | 10.02 tok/s | 71GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | FP16 | 9.97 tok/s | 66GB |
| Qwen/Qwen2.5-72B-Instruct | Q8 | 9.95 tok/s | 70GB |
| meta-llama/Llama-3.1-70B-Instruct | FP16 | 9.94 tok/s | 137GB |
| mistralai/Mistral-Large-Instruct-2411 | Q8 | 9.87 tok/s | 120GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q8 | 9.86 tok/s | 78GB |
| Qwen/Qwen3-235B-A22B | Q4 | 9.78 tok/s | 115GB |
| MiniMaxAI/MiniMax-M1-40k | Q4 | 9.65 tok/s | 255GB |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | FP16 | 9.58 tok/sEstimated Auto-generated benchmark | 137GB |
| meta-llama/Llama-3.3-70B-Instruct | FP16 | 9.50 tok/sEstimated Auto-generated benchmark | 137GB |
| MiniMaxAI/MiniMax-VL-01 | Q4 | 9.46 tok/sEstimated Auto-generated benchmark | 256GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | FP16 | 9.43 tok/sEstimated Auto-generated benchmark | 70GB |
| codellama/CodeLlama-34b-hf | FP16 | 9.41 tok/sEstimated Auto-generated benchmark | 70GB |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q4 | 9.37 tok/sEstimated Auto-generated benchmark | 378GB |
| Qwen/Qwen2.5-Coder-32B-Instruct | FP16 | 9.15 tok/sEstimated Auto-generated benchmark | 67GB |
| moonshotai/Kimi-Linear-48B-A3B-Instruct | FP16 | 9.13 tok/sEstimated Auto-generated benchmark | 101GB |
| deepseek-ai/DeepSeek-Math-V2 | Q8 | 8.50 tok/sEstimated Auto-generated benchmark | 766GB |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | Q8 | 8.13 tok/sEstimated Auto-generated benchmark | 231GB |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | FP16 | 7.91 tok/sEstimated Auto-generated benchmark | 275GB |
| MiniMaxAI/MiniMax-VL-01 | Q8 | 6.84 tok/sEstimated Auto-generated benchmark | 511GB |
| Qwen/Qwen3-235B-A22B | Q8 | 6.54 tok/sEstimated Auto-generated benchmark | 230GB |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q8 | 6.21 tok/sEstimated Auto-generated benchmark | 755GB |
| mistralai/Mistral-Large-Instruct-2411 | FP16 | 6.11 tok/sEstimated Auto-generated benchmark | 240GB |
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | FP16 | 6.05 tok/sEstimated Auto-generated benchmark | 138GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking | FP16 | 6.03 tok/sEstimated Auto-generated benchmark | 156GB |
| meta-llama/Llama-3.1-70B-Instruct | FP16 | 5.99 tok/sEstimated Auto-generated benchmark | 138GB |
| AI-MO/Kimina-Prover-72B | FP16 | 5.90 tok/sEstimated Auto-generated benchmark | 141GB |
| MiniMaxAI/MiniMax-M1-40k | Q8 | 5.89 tok/sEstimated Auto-generated benchmark | 510GB |
| meta-llama/Llama-3.3-70B-Instruct | FP16 | 5.85 tok/sEstimated Auto-generated benchmark | 138GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | FP16 | 5.75 tok/sEstimated Auto-generated benchmark | 156GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct | FP16 | 5.75 tok/sEstimated Auto-generated benchmark | 156GB |
| openai/gpt-oss-120b | FP16 | 5.75 tok/sEstimated Auto-generated benchmark | 235GB |
| Qwen/Qwen2.5-72B-Instruct | FP16 | 5.74 tok/sEstimated Auto-generated benchmark | 141GB |
| NousResearch/Hermes-3-Llama-3.1-70B | FP16 | 5.64 tok/sEstimated Auto-generated benchmark | 138GB |
| Qwen/Qwen2.5-72B-Instruct | FP16 | 5.57 tok/sEstimated Auto-generated benchmark | 142GB |
| Qwen/Qwen2.5-Math-72B-Instruct | FP16 | 5.57 tok/sEstimated Auto-generated benchmark | 142GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | FP16 | 5.48 tok/sEstimated Auto-generated benchmark | 176GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | FP16 | 5.34 tok/sEstimated Auto-generated benchmark | 156GB |
| deepseek-ai/DeepSeek-Math-V2 | FP16 | 3.98 tok/sEstimated Auto-generated benchmark | 1532GB |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | FP16 | 3.94 tok/sEstimated Auto-generated benchmark | 461GB |
| MiniMaxAI/MiniMax-VL-01 | FP16 | 3.77 tok/sEstimated Auto-generated benchmark | 1021GB |
| Qwen/Qwen3-235B-A22B | FP16 | 3.67 tok/sEstimated Auto-generated benchmark | 460GB |
| MiniMaxAI/MiniMax-M1-40k | FP16 | 3.21 tok/sEstimated Auto-generated benchmark | 1020GB |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | FP16 | 3.15 tok/sEstimated Auto-generated benchmark | 1509GB |
Note: the throughput figures above are calculated estimates, not measured benchmarks; real-world results will vary.
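The VRAM column tracks the usual bytes-per-weight arithmetic: roughly 0.5 bytes per parameter at Q4, 1 byte at Q8, and 2 bytes at FP16, with weights dominating the footprint. A minimal sketch of that heuristic (the function is illustrative, not the site's actual methodology, and it ignores KV cache and activation memory):

```python
# Rough weight-memory estimate behind tables like the one above.
# Assumption: weights dominate; KV cache and activations add more on top.
BYTES_PER_PARAM = {"Q4": 0.5, "Q8": 1.0, "FP16": 2.0}

def estimate_vram_gb(params_billion: float, quant: str) -> float:
    """First-order VRAM for model weights, in decimal GB."""
    return params_billion * BYTES_PER_PARAM[quant]

# A 70B model at Q4 lands near the ~34-35GB shown above;
# at FP16 it balloons to ~140GB, matching the 137-138GB rows.
print(estimate_vram_gb(70, "Q4"))    # 35.0
print(estimate_vram_gb(70, "FP16"))  # 140.0
```

None of these larger configurations come close to fitting in the 3070's 8GB, which is why the compatibility table below matters more than raw throughput.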
| Model | Quantization | Verdict | Estimated speed | VRAM needed |
|---|---|---|---|---|
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q4 | Not supported | 9.37 tok/s | 378GB (have 8GB) |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q8 | Not supported | 6.21 tok/s | 755GB (have 8GB) |
| EssentialAI/rnj-1 | FP16 | Not supported | 22.45 tok/s | 19GB (have 8GB) |
| EssentialAI/rnj-1 | Q8 | Not supported | 36.51 tok/s | 10GB (have 8GB) |
| EssentialAI/rnj-1 | Q4 | Fits comfortably | 60.09 tok/s | 5GB (have 8GB) |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | FP16 | Not supported | 3.15 tok/s | 1509GB (have 8GB) |
| google/gemma-3-1b-it | Q4 | Fits comfortably | 92.15 tok/s | 1GB (have 8GB) |
| openai/gpt-oss-20b | FP16 | Not supported | 15.09 tok/s | 41GB (have 8GB) |
| Gensyn/Qwen2.5-0.5B-Instruct | FP16 | Not supported | 30.53 tok/s | 11GB (have 8GB) |
| meta-llama/Llama-3.2-1B-Instruct | FP16 | Fits comfortably | 36.37 tok/s | 2GB (have 8GB) |
| openai/gpt-oss-120b | Q8 | Not supported | 10.34 tok/s | 117GB (have 8GB) |
| openai/gpt-oss-120b | FP16 | Not supported | 5.75 tok/s | 235GB (have 8GB) |
| Qwen/Qwen2.5-3B-Instruct | Q4 | Fits comfortably | 99.23 tok/s | 2GB (have 8GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q8 | Not supported | 18.10 tok/s | 33GB (have 8GB) |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | Fits comfortably | 60.83 tok/s | 1GB (have 8GB) |
| openai/gpt-oss-20b | Q8 | Not supported | 30.46 tok/s | 20GB (have 8GB) |
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | Fits comfortably | 51.06 tok/s | 5GB (have 8GB) |
| Qwen/Qwen2.5-0.5B-Instruct | Q4 | Fits comfortably | 78.60 tok/s | 3GB (have 8GB) |
| Qwen/Qwen2.5-0.5B-Instruct | FP16 | Not supported | 29.73 tok/s | 11GB (have 8GB) |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q4 | Not supported | 14.17 tok/s | 39GB (have 8GB) |
| microsoft/Phi-3-mini-4k-instruct | Q4 | Fits comfortably | 75.52 tok/s | 4GB (have 8GB) |
| microsoft/Phi-3-mini-4k-instruct | Q8 | Fits (tight) | 51.42 tok/s | 7GB (have 8GB) |
| microsoft/Phi-3-mini-4k-instruct | FP16 | Not supported | 27.73 tok/s | 15GB (have 8GB) |
| openai-community/gpt2-large | Q4 | Fits comfortably | 78.83 tok/s | 4GB (have 8GB) |
| openai-community/gpt2-large | Q8 | Fits (tight) | 55.58 tok/s | 7GB (have 8GB) |
| openai-community/gpt2-large | FP16 | Not supported | 29.56 tok/s | 15GB (have 8GB) |
| Qwen/Qwen3-1.7B | Q4 | Fits comfortably | 82.83 tok/s | 4GB (have 8GB) |
| Qwen/Qwen3-1.7B | Q8 | Fits (tight) | 48.95 tok/s | 7GB (have 8GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q4 | Not supported | 45.03 tok/s | 15GB (have 8GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q8 | Not supported | 30.41 tok/s | 31GB (have 8GB) |
| facebook/opt-125m | Q8 | Fits (tight) | 53.82 tok/s | 7GB (have 8GB) |
| Qwen/Qwen3-30B-A3B | Q8 | Not supported | 27.06 tok/s | 31GB (have 8GB) |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | FP16 | Not supported | 28.59 tok/s | 9GB (have 8GB) |
| Qwen/Qwen2.5-7B | FP16 | Not supported | 31.57 tok/s | 15GB (have 8GB) |
| microsoft/Phi-3.5-mini-instruct | Q4 | Fits comfortably | 69.58 tok/s | 4GB (have 8GB) |
| microsoft/Phi-3.5-mini-instruct | Q8 | Fits (tight) | 55.76 tok/s | 7GB (have 8GB) |
| microsoft/Phi-3.5-mini-instruct | FP16 | Not supported | 31.16 tok/s | 15GB (have 8GB) |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q4 | Not supported | 24.29 tok/s | 16GB (have 8GB) |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q8 | Not supported | 18.33 tok/s | 33GB (have 8GB) |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | FP16 | Not supported | 10.23 tok/s | 66GB (have 8GB) |
| google/gemma-2-2b-it | Q4 | Fits comfortably | 98.45 tok/s | 1GB (have 8GB) |
| Qwen/Qwen2-1.5B-Instruct | Q8 | Fits comfortably | 53.83 tok/s | 5GB (have 8GB) |
| unsloth/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 68.48 tok/s | 3GB (have 8GB) |
| unsloth/Llama-3.2-3B-Instruct | FP16 | Fits comfortably | 33.05 tok/s | 6GB (have 8GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | FP16 | Not supported | 29.14 tok/s | 9GB (have 8GB) |
| ibm-research/PowerMoE-3b | Q4 | Fits comfortably | 94.57 tok/s | 2GB (have 8GB) |
| OpenPipe/Qwen3-14B-Instruct | Q4 | Fits (tight) | 52.39 tok/s | 7GB (have 8GB) |
| OpenPipe/Qwen3-14B-Instruct | Q8 | Not supported | 37.61 tok/s | 14GB (have 8GB) |
| OpenPipe/Qwen3-14B-Instruct | FP16 | Not supported | 20.16 tok/s | 29GB (have 8GB) |
| openai-community/gpt2-xl | Q4 | Fits comfortably | 76.50 tok/s | 4GB (have 8GB) |
| openai-community/gpt2-xl | Q8 | Fits (tight) | 49.83 tok/s | 7GB (have 8GB) |
| openai-community/gpt2-xl | FP16 | Not supported | 31.32 tok/s | 15GB (have 8GB) |
| meta-llama/Llama-Guard-3-8B | Q4 | Fits comfortably | 74.23 tok/s | 4GB (have 8GB) |
| meta-llama/Llama-Guard-3-8B | FP16 | Not supported | 29.55 tok/s | 17GB (have 8GB) |
| sshleifer/tiny-gpt2 | Q4 | Fits comfortably | 69.80 tok/s | 4GB (have 8GB) |
| sshleifer/tiny-gpt2 | Q8 | Fits (tight) | 50.81 tok/s | 7GB (have 8GB) |
| sshleifer/tiny-gpt2 | FP16 | Not supported | 31.04 tok/s | 15GB (have 8GB) |
| microsoft/Phi-3.5-mini-instruct | Q4 | Fits comfortably | 72.07 tok/s | 2GB (have 8GB) |
| microsoft/Phi-3.5-mini-instruct | Q8 | Fits comfortably | 50.14 tok/s | 4GB (have 8GB) |
| WeiboAI/VibeThinker-1.5B | Q4 | Fits comfortably | 83.00 tok/s | 1GB (have 8GB) |
| WeiboAI/VibeThinker-1.5B | Q8 | Fits comfortably | 69.15 tok/s | 2GB (have 8GB) |
| Qwen/Qwen2.5-1.5B-Instruct | Q8 | Fits comfortably | 54.66 tok/s | 5GB (have 8GB) |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | Not supported | 26.16 tok/s | 34GB (have 8GB) |
| meta-llama/Meta-Llama-3-8B | Q8 | Not supported | 51.85 tok/s | 9GB (have 8GB) |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | Fits comfortably | 56.95 tok/s | 4GB (have 8GB) |
| facebook/opt-125m | Q4 | Fits comfortably | 78.88 tok/s | 4GB (have 8GB) |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | Not supported | 18.44 tok/s | 68GB (have 8GB) |
| deepseek-ai/DeepSeek-R1-0528 | Q4 | Fits comfortably | 78.75 tok/s | 4GB (have 8GB) |
| deepseek-ai/DeepSeek-R1-0528 | Q8 | Fits (tight) | 58.44 tok/s | 7GB (have 8GB) |
| deepseek-ai/DeepSeek-R1-0528 | FP16 | Not supported | 26.97 tok/s | 15GB (have 8GB) |
| Qwen/Qwen2.5-32B-Instruct | Q4 | Not supported | 25.56 tok/s | 16GB (have 8GB) |
| HuggingFaceTB/SmolLM-135M | FP16 | Not supported | 31.53 tok/s | 15GB (have 8GB) |
| Qwen/Qwen2.5-Math-1.5B | Q4 | Fits comfortably | 75.40 tok/s | 3GB (have 8GB) |
| Qwen/Qwen2.5-Math-1.5B | FP16 | Not supported | 28.29 tok/s | 11GB (have 8GB) |
| rinna/japanese-gpt-neox-small | Q8 | Fits (tight) | 56.25 tok/s | 7GB (have 8GB) |
| google/gemma-2-2b-it | Q8 | Fits comfortably | 69.23 tok/s | 2GB (have 8GB) |
| google/gemma-2-2b-it | FP16 | Fits comfortably | 32.79 tok/s | 4GB (have 8GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q8 | Fits comfortably | 51.96 tok/s | 4GB (have 8GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | FP16 | Not supported | 29.70 tok/s | 9GB (have 8GB) |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q4 | Not supported | 24.09 tok/s | 34GB (have 8GB) |
| llamafactory/tiny-random-Llama-3 | FP16 | Not supported | 30.02 tok/s | 15GB (have 8GB) |
| liuhaotian/llava-v1.5-7b | Q4 | Fits comfortably | 74.96 tok/s | 4GB (have 8GB) |
| liuhaotian/llava-v1.5-7b | Q8 | Fits (tight) | 55.60 tok/s | 7GB (have 8GB) |
| liuhaotian/llava-v1.5-7b | FP16 | Not supported | 28.87 tok/s | 15GB (have 8GB) |
| Qwen/Qwen2.5-72B-Instruct | Q4 | Not supported | 13.90 tok/s | 35GB (have 8GB) |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q8 | Fits comfortably | 54.65 tok/s | 4GB (have 8GB) |
| bigcode/starcoder2-3b | Q4 | Fits comfortably | 98.55 tok/s | 2GB (have 8GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q8 | Not supported | 27.69 tok/s | 31GB (have 8GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | FP16 | Not supported | 15.59 tok/s | 61GB (have 8GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q4 | Not supported | 43.76 tok/s | 15GB (have 8GB) |
| google/gemma-2-9b-it | Q8 | Not supported | 43.05 tok/s | 10GB (have 8GB) |
| google/gemma-2-9b-it | FP16 | Not supported | 22.23 tok/s | 20GB (have 8GB) |
| Qwen/Qwen2.5-7B | Q4 | Fits comfortably | 82.00 tok/s | 4GB (have 8GB) |
| Qwen/Qwen3-32B | FP16 | Not supported | 11.12 tok/s | 66GB (have 8GB) |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | Fits comfortably | 76.21 tok/s | 4GB (have 8GB) |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | Not supported | 48.27 tok/s | 9GB (have 8GB) |
| meta-llama/Meta-Llama-3-8B-Instruct | FP16 | Not supported | 30.46 tok/s | 17GB (have 8GB) |
| openai-community/gpt2-medium | Q8 | Fits (tight) | 49.43 tok/s | 7GB (have 8GB) |
| openai-community/gpt2-medium | FP16 | Not supported | 31.81 tok/s | 15GB (have 8GB) |
| Qwen/Qwen2.5-1.5B-Instruct | FP16 | Not supported | 30.42 tok/s | 11GB (have 8GB) |
Note: the speeds and verdicts above are calculated estimates; real-world results will vary.
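The verdicts appear to follow a simple cutoff against the 3070's 8GB: anything needing more VRAM than the card has is marked unsupported, anything within about a gigabyte of the ceiling is flagged as tight, and the rest fits comfortably. A sketch of that rule as inferred from the rows (the thresholds are read off the table, not taken from the site's published methodology):

```python
def verdict(needed_gb: float, have_gb: float = 8.0, margin_gb: float = 1.0) -> str:
    """Reconstruct the table's fit verdict from VRAM needed vs. available.

    Thresholds inferred from the rows above: 9GB and up -> "Not supported",
    7GB -> "Fits (tight)", 6GB and below -> "Fits comfortably".
    """
    if needed_gb > have_gb:
        return "Not supported"
    if have_gb - needed_gb <= margin_gb:
        return "Fits (tight)"
    return "Fits comfortably"

print(verdict(7))   # Fits (tight)
print(verdict(41))  # Not supported
```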
The answers below are pulled from community benchmarks, manufacturer specs, and live pricing data.
A Windows builder running LLaMA 2 13B Q6 in Kobold with system-memory fallback enabled sees about 8 tokens/sec, while disabling fallback drops throughput toward 5 tok/s.
Source: Reddit – /r/LocalLLaMA (1beu2vh)
For models beyond 13B, not really: community buyers warn that the 3070's 8GB ceiling forces you to chain multiple cards or accept heavy offload, making higher-VRAM GPUs a safer bet.
Source: Reddit – /r/LocalLLaMA (ndp8799)
Keep NVIDIA’s sysmem fallback enabled when you stretch past 7B models: users who disable it report token speeds collapsing, since overflow layers can no longer spill gracefully into host RAM when they exceed the GPU.
Source: Reddit – /r/LocalLLaMA (kuyjopm)
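If you would rather control the spill yourself than lean on the driver, llama.cpp-based runners expose a per-layer offload knob. A minimal sketch using llama-cpp-python, assuming a local 13B Q6 GGUF file (the path and layer count are illustrative; tune n_gpu_layers down until VRAM stops overflowing):

```python
from llama_cpp import Llama

# Offload only as many layers as fit in the 3070's 8GB; the rest run on CPU.
llm = Llama(
    model_path="llama-2-13b.Q6_K.gguf",  # hypothetical local file
    n_gpu_layers=28,  # partial offload; -1 would try to offload every layer
    n_ctx=4096,
)

out = llm("Explain KV cache in one sentence:", max_tokens=32)
print(out["choices"][0]["text"])
```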
The RTX 3070 ships with 8 GB of GDDR6, draws 220 W under load, and takes a single 8-pin PCIe power connector on the reference design (the Founders Edition adapts it to NVIDIA's 12-pin); NVIDIA recommends a 650 W PSU.
Source: TechPowerUp – RTX 3070 Specs
As of November 2025, the RTX 3070 was listed around $499 on Amazon and in stock.
Source: Supabase price tracker snapshot – 2025-11-03
Explore how RTX 3080 stacks up for local inference workloads.
Explore how RTX 4070 stacks up for local inference workloads.
Explore how RTX 3060 12GB stacks up for local inference workloads.
Explore how RX 6800 XT stacks up for local inference workloads.
Explore how RTX 3090 stacks up for local inference workloads.