Quick Answer: The NVIDIA A6000 offers 48GB of VRAM, typically draws 300W under load, and starts around $11.79. It delivers an estimated 180 tokens/sec on deepseek-ai/DeepSeek-OCR at Q4 quantization.
This GPU offers reliable throughput for local AI workloads. Pair it with the right model quantization to hit your target tokens/sec, and monitor the prices below to catch the best deal.
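The VRAM column in the table below roughly tracks parameter count times bytes per weight, plus runtime overhead for the KV cache and activations. Here is a minimal sketch of that estimate in Python; the 1.2 overhead factor is an illustrative assumption, not a measured constant:

```python
# Rough VRAM estimate for an LLM at a given quantization level.
# The 1.2 overhead factor (KV cache, activations) is an assumption;
# real usage varies with context length and inference runtime.
def estimate_vram_gb(params_billions: float, bits_per_weight: float,
                     overhead: float = 1.2) -> float:
    """VRAM (GB) ~= parameters * bytes-per-weight * overhead."""
    return params_billions * (bits_per_weight / 8) * overhead

# Q4 ~= 4 bits, Q8 ~= 8 bits, FP16 = 16 bits per weight.
print(f"8B  @ Q4:   {estimate_vram_gb(8, 4):.1f} GB")   # ~4.8 GB, trivial on a 48GB card
print(f"8B  @ FP16: {estimate_vram_gb(8, 16):.1f} GB")  # ~19 GB
print(f"70B @ Q4:   {estimate_vram_gb(70, 4):.1f} GB")  # ~42 GB, tight but fits in 48GB
```

By this arithmetic, a 48GB card like the A6000 comfortably runs 30B-class models at Q8 and 70B-class models at Q4, which is consistent with the VRAM column below.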
Buy directly on Amazon with fast shipping and reliable customer service.
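Since every throughput figure in the table below is an auto-generated estimate, it is worth sanity-checking the numbers on your own card. A minimal timing loop using Hugging Face transformers is sketched here; the model ID is illustrative, so substitute any entry from the table that fits your VRAM:

```python
# Minimal tokens/sec measurement, assuming a CUDA GPU with the
# torch and transformers packages installed. Model choice is illustrative.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-1.5B-Instruct"  # any model from the table
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="cuda"
)

inputs = tok("The quick brown fox", return_tensors="pt").to("cuda")
torch.cuda.synchronize()
start = time.perf_counter()
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

new_tokens = out.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/sec")
```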
All tokens/sec figures below are auto-generated estimates, not measured benchmarks.

| Model | Quantization | Tokens/sec (estimated) | VRAM used |
|---|---|---|---|
| deepseek-ai/DeepSeek-OCR | Q4 | 180.00 tok/s | 2GB |
| bigcode/starcoder2-3b | Q4 | 174.39 tok/s | 2GB |
| google/gemma-2b | Q4 | 173.21 tok/s | 1GB |
| Qwen/Qwen2.5-3B | Q4 | 173.02 tok/s | 2GB |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | 172.90 tok/s | 1GB |
| unsloth/Llama-3.2-1B-Instruct | Q4 | 172.80 tok/s | 1GB |
| meta-llama/Llama-Guard-3-1B | Q4 | 171.65 tok/s | 1GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | 171.24 tok/s | 1GB |
| meta-llama/Llama-3.2-1B | Q4 | 170.32 tok/s | 1GB |
| google/gemma-2-2b-it | Q4 | 169.81 tok/s | 1GB |
| Qwen/Qwen2.5-3B-Instruct | Q4 | 168.29 tok/s | 2GB |
| google-t5/t5-3b | Q4 | 166.68 tok/s | 2GB |
| inference-net/Schematron-3B | Q4 | 162.46 tok/s | 2GB |
| google-bert/bert-base-uncased | Q4 | 160.52 tok/s | 1GB |
| ibm-research/PowerMoE-3b | Q4 | 159.60 tok/s | 2GB |
| allenai/OLMo-2-0425-1B | Q4 | 159.30 tok/s | 1GB |
| unsloth/gemma-3-1b-it | Q4 | 159.20 tok/s | 1GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | 158.66 tok/s | 2GB |
| google/gemma-3-1b-it | Q4 | 156.13 tok/s | 1GB |
| unsloth/Llama-3.2-3B-Instruct | Q4 | 156.03 tok/s | 2GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | 156.01 tok/s | 2GB |
| nari-labs/Dia2-2B | Q4 | 154.76 tok/s | 2GB |
| tencent/HunyuanOCR | Q4 | 154.23 tok/s | 1GB |
| ibm-granite/granite-3.3-2b-instruct | Q4 | 154.20 tok/s | 1GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 153.90 tok/s | 2GB |
| meta-llama/Llama-3.2-3B | Q4 | 153.79 tok/s | 2GB |
| google/embeddinggemma-300m | Q4 | 152.92 tok/s | 1GB |
| WeiboAI/VibeThinker-1.5B | Q4 | 151.82 tok/s | 1GB |
| LiquidAI/LFM2-1.2B | Q4 | 148.12 tok/s | 1GB |
| HuggingFaceTB/SmolLM2-135M | Q4 | 145.54 tok/s | 4GB |
| openai-community/gpt2-large | Q4 | 145.44 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q4 | 145.40 tok/s | 3GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q4 | 145.37 tok/s | 4GB |
| microsoft/Phi-3-mini-4k-instruct | Q4 | 145.08 tok/s | 4GB |
| microsoft/DialoGPT-small | Q4 | 144.70 tok/s | 4GB |
| sshleifer/tiny-gpt2 | Q4 | 144.62 tok/s | 4GB |
| microsoft/Phi-3-mini-128k-instruct | Q4 | 144.07 tok/s | 4GB |
| GSAI-ML/LLaDA-8B-Instruct | Q4 | 143.99 tok/s | 4GB |
| skt/kogpt2-base-v2 | Q4 | 143.75 tok/s | 4GB |
| HuggingFaceH4/zephyr-7b-beta | Q4 | 143.59 tok/s | 4GB |
| apple/OpenELM-1_1B-Instruct | Q4 | 143.54 tok/s | 1GB |
| Qwen/Qwen3-0.6B | Q4 | 143.48 tok/s | 3GB |
| facebook/sam3 | Q4 | 143.34 tok/s | 1GB |
| rinna/japanese-gpt-neox-small | Q4 | 143.23 tok/s | 4GB |
| rednote-hilab/dots.ocr | Q4 | 143.01 tok/s | 4GB |
| meta-llama/Llama-3.1-8B | Q4 | 142.69 tok/s | 4GB |
| microsoft/phi-2 | Q4 | 142.68 tok/s | 4GB |
| meta-llama/Llama-2-7b-chat-hf | Q4 | 142.66 tok/s | 4GB |
| meta-llama/Meta-Llama-3-8B | Q4 | 142.45 tok/s | 4GB |
| meta-llama/Llama-2-7b-hf | Q4 | 142.03 tok/s | 4GB |
| Qwen/Qwen3-0.6B-Base | Q4 | 142.00 tok/s | 3GB |
| GSAI-ML/LLaDA-8B-Base | Q4 | 141.61 tok/s | 4GB |
| microsoft/VibeVoice-1.5B | Q4 | 141.50 tok/s | 3GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 141.29 tok/s | 2GB |
| mistralai/Mistral-7B-v0.1 | Q4 | 141.07 tok/s | 4GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | 141.02 tok/s | 3GB |
| microsoft/Phi-3.5-vision-instruct | Q4 | 140.96 tok/s | 4GB |
| zai-org/GLM-4.5-Air | Q4 | 140.96 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q4 | 140.49 tok/s | 2GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q4 | 140.48 tok/s | 4GB |
| zai-org/GLM-4.6-FP8 | Q4 | 140.47 tok/s | 4GB |
| Qwen/Qwen3-1.7B-Base | Q4 | 140.41 tok/s | 4GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q4 | 139.79 tok/s | 2GB |
| Qwen/Qwen2-7B-Instruct | Q4 | 139.32 tok/s | 4GB |
| black-forest-labs/FLUX.2-dev | Q4 | 139.09 tok/s | 4GB |
| hmellor/tiny-random-LlamaForCausalLM | Q4 | 138.55 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3 | Q4 | 138.50 tok/s | 4GB |
| openai-community/gpt2-medium | Q4 | 138.46 tok/s | 4GB |
| EleutherAI/pythia-70m-deduped | Q4 | 138.32 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | 138.06 tok/s | 4GB |
| HuggingFaceTB/SmolLM-135M | Q4 | 137.24 tok/s | 4GB |
| ibm-granite/granite-3.3-8b-instruct | Q4 | 137.04 tok/s | 4GB |
| allenai/Olmo-3-7B-Think | Q4 | 136.69 tok/s | 4GB |
| Tongyi-MAI/Z-Image-Turbo | Q4 | 136.61 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-0528 | Q4 | 136.55 tok/s | 4GB |
| huggyllama/llama-7b | Q4 | 136.53 tok/s | 4GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | 136.51 tok/s | 2GB |
| EleutherAI/gpt-neo-125m | Q4 | 136.38 tok/s | 4GB |
| openai-community/gpt2 | Q4 | 136.32 tok/s | 4GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | 136.19 tok/s | 4GB |
| microsoft/Phi-4-mini-instruct | Q4 | 135.72 tok/s | 4GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 135.39 tok/s | 4GB |
| microsoft/DialoGPT-medium | Q4 | 135.04 tok/s | 4GB |
| dicta-il/dictalm2.0-instruct | Q4 | 134.81 tok/s | 4GB |
| lmsys/vicuna-7b-v1.5 | Q4 | 134.39 tok/s | 4GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | 134.20 tok/s | 2GB |
| IlyaGusev/saiga_llama3_8b | Q4 | 134.15 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3.1 | Q4 | 134.08 tok/s | 4GB |
| google/gemma-3-270m-it | Q4 | 134.06 tok/s | 4GB |
| Qwen/Qwen3-1.7B | Q4 | 134.03 tok/s | 4GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q4 | 133.62 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q4 | 133.40 tok/s | 2GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | 133.26 tok/s | 4GB |
| Qwen/Qwen3-4B | Q4 | 133.02 tok/s | 2GB |
| ibm-granite/granite-docling-258M | Q4 | 132.77 tok/s | 4GB |
| distilbert/distilgpt2 | Q4 | 132.19 tok/s | 4GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q4 | 132.10 tok/s | 4GB |
| MiniMaxAI/MiniMax-M2 | Q4 | 131.76 tok/s | 4GB |
| Qwen/Qwen-Image-Edit-2509 | Q4 | 131.64 tok/s | 4GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q4 | 131.37 tok/s | 3GB |
| tencent/HunyuanVideo-1.5 | Q4 | 131.22 tok/s | 4GB |
| openai-community/gpt2-xl | Q4 | 131.00 tok/s | 4GB |
| Qwen/Qwen2-0.5B-Instruct | Q4 | 130.87 tok/s | 3GB |
| bigscience/bloomz-560m | Q4 | 130.79 tok/s | 4GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q4 | 130.62 tok/s | 4GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q4 | 129.87 tok/s | 4GB |
| Qwen/Qwen3-8B-FP8 | Q4 | 129.76 tok/s | 4GB |
| Qwen/Qwen2.5-7B | Q4 | 129.70 tok/s | 4GB |
| llamafactory/tiny-random-Llama-3 | Q4 | 129.61 tok/s | 4GB |
| petals-team/StableBeluga2 | Q4 | 129.37 tok/s | 4GB |
| Qwen/Qwen2.5-0.5B | Q4 | 129.17 tok/s | 3GB |
| Qwen/Qwen2-0.5B | Q4 | 128.87 tok/s | 3GB |
| BSC-LT/salamandraTA-7b-instruct | Q4 | 128.19 tok/s | 4GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | 127.85 tok/s | 4GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q4 | 127.77 tok/s | 4GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q4 | 127.76 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q4 | 127.19 tok/s | 2GB |
| liuhaotian/llava-v1.5-7b | Q4 | 126.87 tok/s | 4GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 126.73 tok/s | 4GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | 126.64 tok/s | 3GB |
| Qwen/Qwen3-4B-Base | Q4 | 125.92 tok/s | 2GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q4 | 125.92 tok/s | 4GB |
| Qwen/Qwen2.5-Math-1.5B | Q4 | 125.49 tok/s | 3GB |
| deepseek-ai/DeepSeek-R1 | Q4 | 125.32 tok/s | 4GB |
| Qwen/Qwen2-1.5B-Instruct | Q4 | 125.22 tok/s | 3GB |
| Qwen/Qwen3-Embedding-8B | Q4 | 125.20 tok/s | 4GB |
| Qwen/Qwen3-Reranker-0.6B | Q4 | 124.96 tok/s | 3GB |
| parler-tts/parler-tts-large-v1 | Q4 | 124.96 tok/s | 4GB |
| Qwen/Qwen3-8B | Q4 | 124.95 tok/s | 4GB |
| numind/NuExtract-1.5 | Q4 | 124.76 tok/s | 4GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q4 | 124.18 tok/s | 4GB |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | 123.95 tok/s | 3GB |
| deepseek-ai/DeepSeek-V3-0324 | Q4 | 123.64 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q4 | 123.58 tok/s | 4GB |
| Qwen/Qwen3-8B-Base | Q4 | 123.09 tok/s | 4GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q4 | 122.02 tok/s | 4GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q4 | 121.93 tok/s | 2GB |
| facebook/opt-125m | Q4 | 121.82 tok/s | 4GB |
| vikhyatk/moondream2 | Q4 | 121.62 tok/s | 4GB |
| microsoft/phi-4 | Q4 | 121.48 tok/s | 4GB |
| Qwen/Qwen2.5-1.5B | Q4 | 121.41 tok/s | 3GB |
| meta-llama/Llama-Guard-3-8B | Q4 | 121.41 tok/s | 4GB |
| microsoft/Phi-4-multimodal-instruct | Q4 | 121.31 tok/s | 4GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q4 | 120.97 tok/s | 4GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q4 | 120.93 tok/s | 3GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | 120.91 tok/s | 4GB |
| meta-llama/Llama-Guard-3-1B | Q8 | 120.82 tok/s | 1GB |
| Qwen/Qwen3-Embedding-4B | Q4 | 120.23 tok/s | 2GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q4 | 120.20 tok/s | 4GB |
| Qwen/Qwen3-Embedding-0.6B | Q4 | 119.86 tok/s | 3GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q4 | 119.72 tok/s | 2GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q4 | 119.51 tok/s | 4GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | 119.43 tok/s | 1GB |
| black-forest-labs/FLUX.1-dev | Q4 | 119.38 tok/s | 4GB |
| google/gemma-2-2b-it | Q8 | 118.43 tok/s | 2GB |
| google/embeddinggemma-300m | Q8 | 118.30 tok/s | 1GB |
| LiquidAI/LFM2-1.2B | Q8 | 116.40 tok/s | 2GB |
| google-bert/bert-base-uncased | Q8 | 116.22 tok/s | 1GB |
| google/gemma-3-1b-it | Q8 | 115.78 tok/s | 1GB |
| Qwen/Qwen2.5-3B-Instruct | Q8 | 115.50 tok/s | 3GB |
| nari-labs/Dia2-2B | Q8 | 115.47 tok/s | 3GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | 115.06 tok/s | 3GB |
| unsloth/gemma-3-1b-it | Q8 | 114.08 tok/s | 1GB |
| unsloth/Llama-3.2-1B-Instruct | Q8 | 113.24 tok/s | 1GB |
| inference-net/Schematron-3B | Q8 | 112.47 tok/s | 3GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | 111.87 tok/s | 3GB |
| meta-llama/Llama-3.2-3B | Q8 | 111.25 tok/s | 3GB |
| WeiboAI/VibeThinker-1.5B | Q8 | 110.97 tok/s | 2GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q8 | 110.61 tok/s | 3GB |
| meta-llama/Llama-3.2-1B | Q8 | 107.95 tok/s | 1GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | 107.31 tok/s | 8GB |
| ibm-research/PowerMoE-3b | Q8 | 107.22 tok/s | 3GB |
| EssentialAI/rnj-1 | Q4 | 105.99 tok/s | 5GB |
| deepseek-ai/DeepSeek-OCR | Q8 | 105.51 tok/s | 4GB |
| tencent/HunyuanOCR | Q8 | 105.33 tok/s | 2GB |
| ibm-granite/granite-3.3-2b-instruct | Q8 | 105.26 tok/s | 2GB |
| unsloth/Llama-3.2-3B-Instruct | Q8 | 104.79 tok/s | 3GB |
| google-t5/t5-3b | Q8 | 104.19 tok/s | 3GB |
| facebook/sam3 | Q8 | 103.85 tok/s | 1GB |
| google/gemma-2b | Q8 | 103.24 tok/s | 2GB |
| meta-llama/Llama-3.2-1B-Instruct | Q8 | 102.94 tok/s | 1GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q4 | 102.78 tok/s | 5GB |
| apple/OpenELM-1_1B-Instruct | Q8 | 102.68 tok/s | 1GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | 102.61 tok/s | 7GB |
| bigcode/starcoder2-3b | Q8 | 102.48 tok/s | 3GB |
| Qwen/Qwen2.5-3B | Q8 | 102.41 tok/s | 3GB |
| allenai/OLMo-2-0425-1B | Q8 | 102.26 tok/s | 1GB |
| meta-llama/Llama-Guard-3-8B | Q8 | 101.83 tok/s | 9GB |
| Qwen/Qwen3-14B | Q4 | 101.77 tok/s | 7GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q8 | 101.62 tok/s | 7GB |
| Qwen/Qwen2.5-0.5B | Q8 | 101.47 tok/s | 5GB |
| liuhaotian/llava-v1.5-7b | Q8 | 101.38 tok/s | 7GB |
| Qwen/Qwen3-Reranker-0.6B | Q8 | 100.66 tok/s | 6GB |
| black-forest-labs/FLUX.2-dev | Q8 | 100.47 tok/s | 8GB |
| Qwen/Qwen3-8B-FP8 | Q8 | 100.43 tok/s | 9GB |
| NousResearch/Hermes-3-Llama-3.1-8B | Q4 | 100.21 tok/s | 4GB |
| google/gemma-3-270m-it | Q8 | 100.17 tok/s | 7GB |
| openai-community/gpt2-xl | Q8 | 100.14 tok/s | 7GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q8 | 100.07 tok/s | 4GB |
| GSAI-ML/LLaDA-8B-Base | Q8 | 99.99 tok/s | 9GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q8 | 99.80 tok/s | 5GB |
| Tongyi-MAI/Z-Image-Turbo | Q8 | 99.69 tok/s | 8GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q8 | 99.56 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q8 | 99.24 tok/s | 4GB |
| Qwen/Qwen3-0.6B | Q8 | 99.15 tok/s | 6GB |
| vikhyatk/moondream2 | Q8 | 99.15 tok/s | 7GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | 98.96 tok/s | 4GB |
| BSC-LT/salamandraTA-7b-instruct | Q8 | 98.81 tok/s | 7GB |
| microsoft/Phi-3-medium-128k-instruct | Q4 | 98.74 tok/s | 7GB |
| microsoft/Phi-3.5-vision-instruct | Q8 | 98.06 tok/s | 7GB |
| zai-org/GLM-4.5-Air | Q8 | 97.98 tok/s | 7GB |
| openai-community/gpt2 | Q8 | 97.94 tok/s | 7GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q8 | 97.86 tok/s | 7GB |
| EleutherAI/gpt-neo-125m | Q8 | 97.80 tok/s | 7GB |
| rednote-hilab/dots.ocr | Q8 | 97.44 tok/s | 7GB |
| HuggingFaceH4/zephyr-7b-beta | Q8 | 97.36 tok/s | 7GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q8 | 97.34 tok/s | 9GB |
| Qwen/Qwen3-14B-Base | Q4 | 97.20 tok/s | 7GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q8 | 97.05 tok/s | 9GB |
| google/gemma-2-9b-it | Q4 | 96.79 tok/s | 5GB |
| Qwen/Qwen2.5-1.5B | Q8 | 96.63 tok/s | 5GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q8 | 96.53 tok/s | 7GB |
| microsoft/Phi-4-mini-instruct | Q8 | 96.36 tok/s | 7GB |
| Qwen/Qwen3-8B | Q8 | 96.25 tok/s | 9GB |
| deepseek-ai/DeepSeek-R1 | Q8 | 96.12 tok/s | 7GB |
| tencent/HunyuanVideo-1.5 | Q8 | 95.99 tok/s | 8GB |
| ibm-granite/granite-docling-258M | Q8 | 95.73 tok/s | 7GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | 95.71 tok/s | 5GB |
| microsoft/DialoGPT-medium | Q8 | 95.64 tok/s | 7GB |
| GSAI-ML/LLaDA-8B-Instruct | Q8 | 95.54 tok/s | 9GB |
| parler-tts/parler-tts-large-v1 | Q8 | 95.54 tok/s | 7GB |
| Qwen/Qwen2-0.5B-Instruct | Q8 | 95.51 tok/s | 5GB |
| Qwen/Qwen3-1.7B-Base | Q8 | 95.49 tok/s | 7GB |
| Qwen/Qwen2.5-14B | Q4 | 95.43 tok/s | 7GB |
| mistralai/Mistral-7B-v0.1 | Q8 | 95.12 tok/s | 7GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | 95.06 tok/s | 9GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | 94.91 tok/s | 7GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q8 | 94.42 tok/s | 9GB |
| ai-forever/ruGPT-3.5-13B | Q4 | 94.29 tok/s | 7GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q8 | 94.17 tok/s | 5GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | 94.13 tok/s | 9GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q8 | 93.89 tok/s | 5GB |
| OpenPipe/Qwen3-14B-Instruct | Q4 | 93.69 tok/s | 7GB |
| mistralai/Ministral-3-14B-Instruct-2512 | Q4 | 93.54 tok/s | 8GB |
| hmellor/tiny-random-LlamaForCausalLM | Q8 | 93.41 tok/s | 7GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | 93.31 tok/s | 7GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | 93.20 tok/s | 4GB |
| allenai/Olmo-3-7B-Think | Q8 | 93.12 tok/s | 8GB |
| numind/NuExtract-1.5 | Q8 | 92.92 tok/s | 7GB |
| openai-community/gpt2-medium | Q8 | 92.55 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | 92.31 tok/s | 7GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q8 | 92.21 tok/s | 7GB |
| distilbert/distilgpt2 | Q8 | 92.14 tok/s | 7GB |
| Qwen/Qwen3-Embedding-0.6B | Q8 | 92.10 tok/s | 6GB |
| skt/kogpt2-base-v2 | Q8 | 92.08 tok/s | 7GB |
| microsoft/VibeVoice-1.5B | Q8 | 91.95 tok/s | 5GB |
| llamafactory/tiny-random-Llama-3 | Q8 | 91.83 tok/s | 7GB |
| microsoft/Phi-3-mini-128k-instruct | Q8 | 91.83 tok/s | 7GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q8 | 91.80 tok/s | 9GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q8 | 91.69 tok/s | 7GB |
| Qwen/Qwen2.5-Math-1.5B | Q8 | 91.56 tok/s | 5GB |
| Qwen/Qwen3-Embedding-4B | Q8 | 91.48 tok/s | 4GB |
| Qwen/Qwen3-0.6B-Base | Q8 | 91.46 tok/s | 6GB |
| Qwen/Qwen2.5-7B | Q8 | 91.43 tok/s | 7GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | 91.25 tok/s | 7GB |
| microsoft/phi-4 | Q8 | 91.17 tok/s | 7GB |
| meta-llama/Llama-2-13b-chat-hf | Q4 | 91.14 tok/s | 7GB |
| sshleifer/tiny-gpt2 | Q8 | 91.07 tok/s | 7GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | 91.04 tok/s | 4GB |
| microsoft/Phi-4-multimodal-instruct | Q8 | 91.04 tok/s | 7GB |
| HuggingFaceTB/SmolLM-135M | Q8 | 90.51 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-0528 | Q8 | 90.43 tok/s | 7GB |
| huggyllama/llama-7b | Q8 | 90.35 tok/s | 7GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q8 | 90.16 tok/s | 4GB |
| zai-org/GLM-4.6-FP8 | Q8 | 90.11 tok/s | 7GB |
| meta-llama/Llama-2-7b-hf | Q8 | 90.07 tok/s | 7GB |
| rinna/japanese-gpt-neox-small | Q8 | 89.97 tok/s | 7GB |
| Qwen/Qwen2-1.5B-Instruct | Q8 | 89.95 tok/s | 5GB |
| HuggingFaceTB/SmolLM2-135M | Q8 | 89.71 tok/s | 7GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q8 | 89.41 tok/s | 5GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q8 | 89.32 tok/s | 7GB |
| microsoft/phi-2 | Q8 | 89.06 tok/s | 7GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q8 | 88.62 tok/s | 7GB |
| meta-llama/Llama-3.1-8B | Q8 | 88.31 tok/s | 9GB |
| Qwen/Qwen3-8B-Base | Q8 | 88.28 tok/s | 9GB |
| deepseek-ai/DeepSeek-V3.1 | Q8 | 88.17 tok/s | 7GB |
| openai-community/gpt2-large | Q8 | 88.04 tok/s | 7GB |
| EleutherAI/pythia-70m-deduped | Q8 | 87.94 tok/s | 7GB |
| petals-team/StableBeluga2 | Q8 | 87.88 tok/s | 7GB |
| Qwen/Qwen3-Embedding-8B | Q8 | 87.77 tok/s | 9GB |
| deepseek-ai/DeepSeek-V3 | Q8 | 87.77 tok/s | 7GB |
| Qwen/Qwen2.5-Coder-1.5B | Q8 | 87.45 tok/s | 5GB |
| facebook/opt-125m | Q8 | 87.29 tok/s | 7GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | 87.18 tok/s | 3GB |
| black-forest-labs/FLUX.1-dev | Q8 | 86.56 tok/s | 8GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | 86.46 tok/s | 8GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q8 | 86.34 tok/s | 9GB |
| deepseek-ai/DeepSeek-V3-0324 | Q8 | 86.30 tok/s | 7GB |
| ibm-granite/granite-3.3-8b-instruct | Q8 | 86.27 tok/s | 9GB |
| Qwen/Qwen3-4B | Q8 | 86.21 tok/s | 4GB |
| meta-llama/Llama-2-7b-chat-hf | Q8 | 86.07 tok/s | 7GB |
| lmsys/vicuna-7b-v1.5 | Q8 | 85.81 tok/s | 7GB |
| bigscience/bloomz-560m | Q8 | 85.62 tok/s | 7GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q8 | 85.32 tok/s | 7GB |
| Qwen/Qwen3-4B-Base | Q8 | 85.15 tok/s | 4GB |
| Qwen/Qwen2-0.5B | Q8 | 84.62 tok/s | 5GB |
| microsoft/Phi-3-mini-4k-instruct | Q8 | 84.41 tok/s | 7GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q8 | 84.40 tok/s | 4GB |
| dicta-il/dictalm2.0-instruct | Q8 | 84.31 tok/s | 7GB |
| IlyaGusev/saiga_llama3_8b | Q8 | 84.23 tok/s | 9GB |
| Qwen/Qwen3-1.7B | Q8 | 84.23 tok/s | 7GB |
| MiniMaxAI/MiniMax-M2 | Q8 | 84.20 tok/s | 7GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q8 | 84.10 tok/s | 4GB |
| Qwen/Qwen-Image-Edit-2509 | Q8 | 84.07 tok/s | 8GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q8 | 84.02 tok/s | 9GB |
| Qwen/Qwen2-7B-Instruct | Q8 | 83.92 tok/s | 7GB |
| microsoft/DialoGPT-small | Q8 | 83.67 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | 83.51 tok/s | 9GB |
| meta-llama/Meta-Llama-3-8B | Q8 | 83.49 tok/s | 9GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q4 | 78.88 tok/s | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q4 | 78.82 tok/s | 15GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q4 | 78.55 tok/s | 10GB |
| openai/gpt-oss-safeguard-20b | Q4 | 77.90 tok/s | 11GB |
| google/gemma-2-27b-it | Q4 | 77.39 tok/s | 14GB |
| openai/gpt-oss-20b | Q4 | 76.94 tok/s | 10GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q4 | 76.79 tok/s | 15GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q4 | 76.13 tok/s | 10GB |
| Qwen/Qwen3-30B-A3B | Q4 | 75.43 tok/s | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q4 | 75.33 tok/s | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q4 | 74.68 tok/s | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q4 | 74.62 tok/s | 15GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q8 | 73.78 tok/s | 10GB |
| Qwen/Qwen3-14B-Base | Q8 | 73.37 tok/s | 14GB |
| google/gemma-2-9b-it | Q8 | 73.26 tok/s | 10GB |
| microsoft/Phi-3-medium-128k-instruct | Q8 | 72.77 tok/s | 14GB |
| NousResearch/Hermes-3-Llama-3.1-8B | Q8 | 72.73 tok/s | 9GB |
| mistralai/Mistral-Small-Instruct-2409 | Q4 | 72.06 tok/s | 11GB |
| Qwen/Qwen2.5-14B-Instruct | Q8 | 71.19 tok/s | 15GB |
| meta-llama/Llama-2-13b-chat-hf | Q8 | 70.65 tok/s | 13GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q4 | 70.64 tok/s | 15GB |
| ai-forever/ruGPT-3.5-13B | Q8 | 70.59 tok/s | 13GB |
| OpenPipe/Qwen3-14B-Instruct | Q8 | 70.03 tok/s | 14GB |
| Qwen/Qwen2.5-14B-Instruct | Q8 | 69.08 tok/s | 14GB |
| mistralai/Ministral-3-14B-Instruct-2512 | Q8 | 67.40 tok/s | 16GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q4 | 67.17 tok/s | 15GB |
| unsloth/Llama-3.2-1B-Instruct | FP16 | 65.89 tok/s | 2GB |
| unsloth/gpt-oss-20b-BF16 | Q4 | 65.76 tok/s | 10GB |
| unsloth/gemma-3-1b-it | FP16 | 65.40 tok/s | 2GB |
| EssentialAI/rnj-1 | Q8 | 65.30 tok/s | 10GB |
| allenai/OLMo-2-0425-1B | FP16 | 65.23 tok/s | 2GB |
| google-t5/t5-3b | FP16 | 65.23 tok/s | 6GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | 65.16 tok/s | 9GB |
| Qwen/Qwen2.5-14B | Q8 | 65.04 tok/s | 14GB |
| Qwen/Qwen2.5-3B | FP16 | 64.84 tok/s | 6GB |
| Qwen/Qwen3-14B | Q8 | 64.83 tok/s | 14GB |
| meta-llama/Llama-3.2-3B-Instruct | FP16 | 64.75 tok/s | 6GB |
| Qwen/Qwen2.5-3B-Instruct | FP16 | 64.36 tok/s | 6GB |
| bigcode/starcoder2-3b | FP16 | 63.87 tok/s | 6GB |
| google-bert/bert-base-uncased | FP16 | 63.30 tok/s | 1GB |
| meta-llama/Llama-3.2-3B | FP16 | 62.10 tok/s | 6GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | FP16 | 60.79 tok/s | 6GB |
| nari-labs/Dia2-2B | FP16 | 60.73 tok/s | 5GB |
| ibm-granite/granite-3.3-2b-instruct | FP16 | 60.18 tok/s | 4GB |
| tencent/HunyuanOCR | FP16 | 59.68 tok/s | 3GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | FP16 | 58.97 tok/s | 2GB |
| deepseek-ai/DeepSeek-OCR | FP16 | 58.78 tok/s | 7GB |
| google/gemma-2-2b-it | FP16 | 58.69 tok/s | 4GB |
| meta-llama/Llama-Guard-3-1B | FP16 | 58.40 tok/s | 2GB |
| apple/OpenELM-1_1B-Instruct | FP16 | 57.72 tok/s | 2GB |
| LiquidAI/LFM2-1.2B | FP16 | 57.61 tok/s | 4GB |
| unsloth/Llama-3.2-3B-Instruct | FP16 | 57.41 tok/s | 6GB |
| ibm-research/PowerMoE-3b | FP16 | 57.05 tok/s | 6GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | FP16 | 56.64 tok/s | 6GB |
| facebook/sam3 | FP16 | 56.50 tok/s | 2GB |
| meta-llama/Llama-3.2-1B-Instruct | FP16 | 56.21 tok/s | 2GB |
| inference-net/Schematron-3B | FP16 | 56.06 tok/s | 6GB |
| google/gemma-3-1b-it | FP16 | 55.91 tok/s | 2GB |
| meta-llama/Llama-3.2-1B | FP16 | 55.87 tok/s | 2GB |
| google/gemma-2-27b-it | Q8 | 55.49 tok/s | 28GB |
| WeiboAI/VibeThinker-1.5B | FP16 | 55.39 tok/s | 4GB |
| google/embeddinggemma-300m | FP16 | 55.38 tok/s | 1GB |
| google/gemma-2b | FP16 | 55.30 tok/s | 4GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | FP16 | 55.24 tok/s | 15GB |
| Qwen/Qwen2.5-1.5B | FP16 | 55.15 tok/s | 11GB |
| Qwen/Qwen3-4B-Thinking-2507 | FP16 | 55.14 tok/s | 9GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | FP16 | 55.09 tok/s | 17GB |
| meta-llama/Llama-2-7b-hf | FP16 | 54.82 tok/s | 15GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | FP16 | 54.77 tok/s | 17GB |
| microsoft/Phi-3-mini-4k-instruct | FP16 | 54.72 tok/s | 15GB |
| distilbert/distilgpt2 | FP16 | 54.65 tok/s | 15GB |
| microsoft/phi-2 | FP16 | 54.64 tok/s | 15GB |
| swiss-ai/Apertus-8B-Instruct-2509 | FP16 | 54.43 tok/s | 17GB |
| black-forest-labs/FLUX.1-dev | FP16 | 54.38 tok/s | 16GB |
| microsoft/Phi-3.5-mini-instruct | FP16 | 54.34 tok/s | 8GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | FP16 | 54.33 tok/s | 9GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | FP16 | 54.23 tok/s | 15GB |
| Qwen/Qwen3-Embedding-8B | FP16 | 54.21 tok/s | 17GB |
| deepseek-ai/DeepSeek-V3-0324 | FP16 | 54.21 tok/s | 15GB |
| openai-community/gpt2-xl | FP16 | 54.13 tok/s | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q8 | 54.11 tok/s | 31GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q8 | 54.07 tok/s | 31GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | FP16 | 54.02 tok/s | 17GB |
| microsoft/phi-4 | FP16 | 53.92 tok/s | 15GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q8 | 53.77 tok/s | 20GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q8 | 53.60 tok/s | 31GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | FP16 | 53.52 tok/s | 9GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q8 | 53.49 tok/s | 31GB |
| microsoft/Phi-4-mini-instruct | FP16 | 53.36 tok/s | 15GB |
| Qwen/Qwen2-0.5B-Instruct | FP16 | 53.35 tok/s | 11GB |
| Qwen/Qwen3-Embedding-0.6B | FP16 | 53.31 tok/s | 13GB |
| Qwen/Qwen3-8B-FP8 | FP16 | 53.30 tok/s | 17GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | FP16 | 53.29 tok/s | 15GB |
| bigscience/bloomz-560m | FP16 | 53.24 tok/s | 15GB |
| deepseek-ai/DeepSeek-R1 | FP16 | 53.18 tok/s | 15GB |
| Qwen/Qwen3-4B-Base | FP16 | 53.07 tok/s | 9GB |
| zai-org/GLM-4.6-FP8 | FP16 | 53.06 tok/s | 15GB |
| openai-community/gpt2 | FP16 | 52.99 tok/s | 15GB |
| HuggingFaceTB/SmolLM-135M | FP16 | 52.99 tok/s | 15GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | FP16 | 52.96 tok/s | 15GB |
| Qwen/Qwen3-Reranker-0.6B | FP16 | 52.93 tok/s | 13GB |
| Qwen/Qwen3-4B-Instruct-2507 | FP16 | 52.93 tok/s | 9GB |
| Qwen/Qwen3-0.6B-Base | FP16 | 52.88 tok/s | 13GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q8 | 52.88 tok/s | 20GB |
| meta-llama/Llama-3.1-8B-Instruct | FP16 | 52.86 tok/s | 17GB |
| google/gemma-3-270m-it | FP16 | 52.65 tok/s | 15GB |
| facebook/opt-125m | FP16 | 52.50 tok/s | 15GB |
| microsoft/Phi-3.5-vision-instruct | FP16 | 52.49 tok/s | 15GB |
| openai/gpt-oss-safeguard-20b | Q8 | 52.41 tok/s | 22GB |
| Qwen/Qwen2.5-Math-1.5B | FP16 | 52.28 tok/s | 11GB |
| meta-llama/Llama-2-7b-chat-hf | FP16 | 52.25 tok/s | 15GB |
| meta-llama/Meta-Llama-3-8B | FP16 | 52.16 tok/s | 17GB |
| liuhaotian/llava-v1.5-7b | FP16 | 52.04 tok/s | 15GB |
| skt/kogpt2-base-v2 | FP16 | 51.93 tok/s | 15GB |
| meta-llama/Meta-Llama-3-8B-Instruct | FP16 | 51.92 tok/s | 17GB |
| mistralai/Mistral-7B-Instruct-v0.2 | FP16 | 51.83 tok/s | 15GB |
| HuggingFaceTB/SmolLM2-135M | FP16 | 51.77 tok/s | 15GB |
| Qwen/Qwen2.5-0.5B | FP16 | 51.76 tok/s | 11GB |
| Qwen/Qwen3-0.6B | FP16 | 51.57 tok/s | 13GB |
| microsoft/VibeVoice-1.5B | FP16 | 51.51 tok/s | 11GB |
| Qwen/Qwen-Image-Edit-2509 | FP16 | 51.47 tok/s | 16GB |
| sshleifer/tiny-gpt2 | FP16 | 51.45 tok/s | 15GB |
| Qwen/Qwen3-8B | FP16 | 51.43 tok/s | 17GB |
| Tongyi-MAI/Z-Image-Turbo | FP16 | 51.41 tok/s | 16GB |
| tencent/HunyuanVideo-1.5 | FP16 | 51.39 tok/s | 16GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q8 | 51.36 tok/s | 31GB |
| parler-tts/parler-tts-large-v1 | FP16 | 51.34 tok/s | 15GB |
| Qwen/Qwen2.5-7B-Instruct | FP16 | 51.34 tok/s | 15GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | FP16 | 51.31 tok/s | 17GB |
| dicta-il/dictalm2.0-instruct | FP16 | 51.25 tok/s | 15GB |
| llamafactory/tiny-random-Llama-3 | FP16 | 51.22 tok/s | 15GB |
| GSAI-ML/LLaDA-8B-Instruct | FP16 | 51.22 tok/s | 17GB |
| IlyaGusev/saiga_llama3_8b | FP16 | 51.07 tok/s | 17GB |
| ibm-granite/granite-docling-258M | FP16 | 50.99 tok/s | 15GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q8 | 50.91 tok/s | 31GB |
| petals-team/StableBeluga2 | FP16 | 50.86 tok/s | 15GB |
| hmellor/tiny-random-LlamaForCausalLM | FP16 | 50.86 tok/s | 15GB |
| openai-community/gpt2-large | FP16 | 50.81 tok/s | 15GB |
| meta-llama/Meta-Llama-3-70B-Instruct | Q4 | 50.80 tok/s | 34GB |
| numind/NuExtract-1.5 | FP16 | 50.79 tok/s | 15GB |
| Qwen/Qwen2.5-Coder-1.5B | FP16 | 50.65 tok/s | 11GB |
| mistralai/Mistral-Small-Instruct-2409 | Q8 | 50.32 tok/s | 23GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q4 | 50.01 tok/s | 17GB |
| Qwen/Qwen3-1.7B | FP16 | 49.97 tok/s | 15GB |
| Qwen/Qwen3-4B | FP16 | 49.91 tok/s | 9GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q4 | 49.73 tok/s | 16GB |
| mistralai/Mistral-7B-Instruct-v0.1 | FP16 | 49.67 tok/s | 15GB |
| unsloth/gpt-oss-20b-BF16 | Q8 | 49.54 tok/s | 20GB |
| Qwen/Qwen3-30B-A3B | Q8 | 49.52 tok/s | 31GB |
| rednote-hilab/dots.ocr | FP16 | 49.49 tok/s | 15GB |
| Qwen/Qwen3-1.7B-Base | FP16 | 49.44 tok/s | 15GB |
| Qwen/Qwen2.5-7B | FP16 | 49.40 tok/s | 15GB |
| baichuan-inc/Baichuan-M2-32B | Q4 | 49.40 tok/s | 16GB |
| rinna/japanese-gpt-neox-small | FP16 | 49.39 tok/s | 15GB |
| huggyllama/llama-7b | FP16 | 49.36 tok/s | 15GB |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q4 | 49.17 tok/s | 34GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | FP16 | 49.14 tok/s | 9GB |
| microsoft/Phi-3.5-mini-instruct | FP16 | 48.87 tok/s | 15GB |
| Qwen/Qwen3-32B | Q4 | 48.81 tok/s | 16GB |
| moonshotai/Kimi-K2-Thinking | Q4 | 48.75 tok/s | 489GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | FP16 | 48.75 tok/s | 15GB |
| deepseek-ai/DeepSeek-V3 | FP16 | 48.74 tok/s | 15GB |
| vikhyatk/moondream2 | FP16 | 48.69 tok/s | 15GB |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | 48.56 tok/s | 34GB |
| meta-llama/Llama-3.1-8B | FP16 | 48.52 tok/s | 17GB |
| Qwen/Qwen2-1.5B-Instruct | FP16 | 48.46 tok/s | 11GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | FP16 | 48.40 tok/s | 9GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | FP16 | 48.40 tok/s | 17GB |
| HuggingFaceH4/zephyr-7b-beta | FP16 | 48.34 tok/s | 15GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | FP16 | 48.19 tok/s | 15GB |
| microsoft/DialoGPT-medium | FP16 | 48.13 tok/s | 15GB |
| Qwen/Qwen2.5-1.5B-Instruct | FP16 | 48.05 tok/s | 11GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | FP16 | 48.04 tok/s | 15GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | FP16 | 47.98 tok/s | 11GB |
| EleutherAI/gpt-neo-125m | FP16 | 47.97 tok/s | 15GB |
| ibm-granite/granite-3.3-8b-instruct | FP16 | 47.96 tok/s | 17GB |
| 01-ai/Yi-1.5-34B-Chat | Q4 | 47.83 tok/s | 18GB |
| microsoft/DialoGPT-small | FP16 | 47.82 tok/s | 15GB |
| Qwen/Qwen2.5-0.5B-Instruct | FP16 | 47.78 tok/s | 11GB |
| microsoft/Phi-3-mini-128k-instruct | FP16 | 47.77 tok/s | 15GB |
| Qwen/Qwen2-0.5B | FP16 | 47.73 tok/s | 11GB |
| microsoft/Phi-4-multimodal-instruct | FP16 | 47.71 tok/s | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q8 | 47.64 tok/s | 31GB |
| Qwen/Qwen2-7B-Instruct | FP16 | 47.55 tok/s | 15GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | FP16 | 47.53 tok/s | 17GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q8 | 47.46 tok/s | 31GB |
| deepseek-ai/DeepSeek-V2.5 | Q4 | 47.31 tok/s | 328GB |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q4 | 47.31 tok/s | 34GB |
| Qwen/Qwen2.5-7B-Instruct | FP16 | 47.29 tok/s | 16GB |
| codellama/CodeLlama-34b-hf | Q4 | 47.19 tok/s | 17GB |
| Qwen/QwQ-32B-Preview | Q4 | 47.12 tok/s | 17GB |
| openai/gpt-oss-20b | Q8 | 47.08 tok/s | 20GB |
| deepseek-ai/DeepSeek-R1-0528 | FP16 | 46.88 tok/s | 15GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q4 | 46.84 tok/s | 16GB |
| openai-community/gpt2-medium | FP16 | 46.82 tok/s | 15GB |
| mistralai/Mistral-7B-v0.1 | FP16 | 46.76 tok/s | 15GB |
| zai-org/GLM-4.5-Air | FP16 | 46.75 tok/s | 15GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | FP16 | 46.66 tok/s | 9GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | FP16 | 46.43 tok/s | 15GB |
| allenai/Olmo-3-7B-Think | FP16 | 46.16 tok/s | 16GB |
| MiniMaxAI/MiniMax-M2 | FP16 | 46.12 tok/s | 15GB |
| BSC-LT/salamandraTA-7b-instruct | FP16 | 46.11 tok/s | 15GB |
| EleutherAI/pythia-70m-deduped | FP16 | 46.00 tok/s | 15GB |
| GSAI-ML/LLaDA-8B-Base | FP16 | 45.96 tok/s | 17GB |
| Qwen/Qwen3-8B-Base | FP16 | 45.90 tok/s | 17GB |
| Qwen/Qwen3-Embedding-4B | FP16 | 45.80 tok/s | 9GB |
| Gensyn/Qwen2.5-0.5B-Instruct | FP16 | 45.74 tok/s | 11GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | FP16 | 45.68 tok/s | 11GB |
| meta-llama/Llama-Guard-3-8B | FP16 | 45.64 tok/s | 17GB |
| deepseek-ai/DeepSeek-V3.1 | FP16 | 45.54 tok/s | 15GB |
| lmsys/vicuna-7b-v1.5 | FP16 | 45.53 tok/s | 15GB |
| moonshotai/Kimi-Linear-48B-A3B-Instruct | Q4 | 45.34 tok/s | 25GB |
| meta-llama/Llama-3.2-3B-Instruct | FP16 | 45.32 tok/s | 7GB |
| black-forest-labs/FLUX.2-dev | FP16 | 45.32 tok/s | 16GB |
| Qwen/Qwen2.5-32B | Q4 | 44.67 tok/s | 16GB |
| deepseek-ai/deepseek-coder-33b-instruct | Q4 | 44.04 tok/s | 17GB |
| Qwen/Qwen2.5-32B-Instruct | Q4 | 43.18 tok/s | 16GB |
| Qwen/Qwen2.5-32B-Instruct | Q4 | 42.97 tok/s | 17GB |
| Qwen/Qwen2.5-Coder-32B-Instruct | Q4 | 42.72 tok/s | 17GB |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | 42.53 tok/s | 34GB |
| Qwen/Qwen2.5-14B-Instruct | FP16 | 41.14 tok/s | 30GB |
| OpenPipe/Qwen3-14B-Instruct | FP16 | 40.92 tok/s | 29GB |
| Qwen/Qwen2.5-14B-Instruct | FP16 | 40.83 tok/s | 29GB |
| ai-forever/ruGPT-3.5-13B | FP16 | 39.83 tok/s | 27GB |
| meta-llama/Llama-2-13b-chat-hf | FP16 | 39.35 tok/s | 27GB |
| Qwen/Qwen2.5-14B | FP16 | 39.04 tok/s | 29GB |
| microsoft/Phi-3-medium-128k-instruct | FP16 | 38.95 tok/s | 29GB |
| EssentialAI/rnj-1 | FP16 | 37.97 tok/s | 19GB |
| meta-llama/Llama-3.1-8B-Instruct | FP16 | 37.26 tok/s | 17GB |
| Qwen/Qwen3-14B | FP16 | 36.11 tok/s | 29GB |
| deepseek-ai/DeepSeek-V2.5 | Q8 | 35.65 tok/s | 656GB |
| Qwen/QwQ-32B-Preview | Q8 | 35.26 tok/s | 34GB |
| google/gemma-2-9b-it | FP16 | 35.16 tok/s | 20GB |
| NousResearch/Hermes-3-Llama-3.1-8B | FP16 | 35.12 tok/s | 17GB |
| mistralai/Ministral-3-14B-Instruct-2512 | FP16 | 34.66 tok/s | 32GB |
| Qwen/Qwen3-14B-Base | FP16 | 34.64 tok/s | 29GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | FP16 | 34.63 tok/s | 19GB |
| moonshotai/Kimi-K2-Thinking | Q8 | 34.54 tok/s | 978GB |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | 34.50 tok/s | 68GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q8 | 34.09 tok/s | 33GB |
| 01-ai/Yi-1.5-34B-Chat | Q8 | 34.08 tok/s | 35GB |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q8 | 33.83 tok/s | 68GB |
| Qwen/Qwen2.5-32B-Instruct | Q8 | 33.50 tok/s | 33GB |
| baichuan-inc/Baichuan-M2-32B | Q8 | 33.04 tok/s | 33GB |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | Q4 | 32.98 tok/s | 69GB |
| codellama/CodeLlama-34b-hf | Q8 | 32.81 tok/s | 35GB |
| meta-llama/Meta-Llama-3-70B-Instruct | Q8 | 32.56 tok/s | 68GB |
| Qwen/Qwen2.5-32B | Q8 | 32.52 tok/s | 33GB |
| deepseek-ai/deepseek-coder-33b-instruct | Q8 | 32.05 tok/s | 34GB |
| Qwen/Qwen2.5-32B-Instruct | Q8 | 32.03 tok/s | 34GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q8 | 31.70 tok/s | 35GB |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | 30.81 tok/s | 68GB |
| Qwen/Qwen3-32B | Q8 | 30.77 tok/s | 33GB |
| Qwen/Qwen2.5-Coder-32B-Instruct | Q8 | 30.53 tok/s | 34GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q8 | 30.35 tok/s | 33GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | FP16 | 29.97 tok/s | 61GB |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q8 | 29.96 tok/s | 68GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | FP16 | 29.85 tok/s | 61GB |
| moonshotai/Kimi-Linear-48B-A3B-Instruct | Q8 | 29.44 tok/s | 50GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | FP16 | 28.77 tok/s | 61GB |
| Qwen/Qwen2.5-Math-72B-Instruct | Q4 | 28.46 tok/s | 36GB |
| Qwen/Qwen3-30B-A3B | FP16 | 28.37 tok/s | 61GB |
| openai/gpt-oss-safeguard-20b | FP16 | 27.82 tok/s | 44GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | FP16 | 27.56 tok/s | 61GB |
| openai/gpt-oss-120b | Q4 | 27.54 tok/s | 59GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | FP16 | 27.32 tok/s | 61GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | FP16 | 26.77 tok/s | 41GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | FP16 | 26.77 tok/s | 61GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q4 | 26.58 tok/s | 39GB |
| mistralai/Mistral-Small-Instruct-2409 | FP16 | 26.45 tok/s | 46GB |
| openai/gpt-oss-20b | FP16 | 26.35 tok/s | 41GB |
| AI-MO/Kimina-Prover-72B | Q4 | 26.27 tok/s | 35GB |
| google/gemma-2-27b-it | FP16 | 26.23 tok/s | 56GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q4 | 26.01 tok/s | 39GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | FP16 | 25.94 tok/s | 61GB |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | 25.88 tok/s | 34GB |
| NousResearch/Hermes-3-Llama-3.1-70B | Q4 | 25.78 tok/s | 34GB |
| Qwen/Qwen2.5-72B-Instruct | Q4 | 25.62 tok/s | 36GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q4 | 25.58 tok/s | 44GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q4 | 25.50 tok/s | 39GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | FP16 | 25.41 tok/s | 61GB |
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | Q4 | 25.40 tok/s | 34GB |
| Qwen/Qwen2.5-72B-Instruct | Q4 | 25.34 tok/s | 35GB |
| unsloth/gpt-oss-20b-BF16 | FP16 | 25.22 tok/s | 41GB |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | 25.21 tok/s | 34GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | FP16 | 25.01 tok/s | 41GB |
| mistralai/Mistral-Large-Instruct-2411 | Q4 | 24.55 tok/s | 60GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q4 | 24.39 tok/s | 39GB |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | Q8 | 22.93 tok/s | 138GB |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | Q4 | 21.12 tok/s | 115GB |
| deepseek-ai/DeepSeek-Math-V2 | Q4 | 20.58 tok/s | 383GB |
| Qwen/Qwen2.5-72B-Instruct | Q8 | 20.36 tok/s | 70GB |
| Qwen/Qwen2.5-Math-72B-Instruct | Q8 | 20.35 tok/s | 71GB |
| openai/gpt-oss-120b | Q8 | 19.96 tok/s | 117GB |
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | Q8 | 19.89 tok/s | 69GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q8 | 19.68 tok/s | 78GB |
| AI-MO/Kimina-Prover-72B | Q8 | 19.46 tok/s | 70GB |
| baichuan-inc/Baichuan-M2-32B | FP16 | 19.24 tok/s | 66GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q8 | 19.20 tok/s | 78GB |
| Qwen/Qwen2.5-72B-Instruct | Q8 | 18.94 tok/s | 71GB |
| deepseek-ai/DeepSeek-V2.5 | FP16 | 18.94 tok/s | 1312GB |
| NousResearch/Hermes-3-Llama-3.1-70B | Q8 | 18.82 tok/s | 69GB |
| meta-llama/Llama-3.1-70B-Instruct | FP16 | 18.77 tok/s | 137GB |
| 01-ai/Yi-1.5-34B-Chat | FP16 | 18.69 tok/s | 70GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q8 | 18.69 tok/s | 78GB |
| Qwen/Qwen3-32B | FP16 | 18.69 tok/s | 66GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q8 | 18.61 tok/s | 78GB |
| Qwen/Qwen2.5-32B-Instruct | FP16 | 18.55 tok/s | 67GB |
| moonshotai/Kimi-Linear-48B-A3B-Instruct | FP16 | 18.46 tok/s | 101GB |
| Qwen/QwQ-32B-Preview | FP16 | 18.42 tok/s | 67GB |
| Qwen/Qwen2.5-32B-Instruct | FP16 | 18.35 tok/s | 66GB |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | 18.14 tok/s | 69GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | FP16 | 17.93 tok/s | 66GB |
| codellama/CodeLlama-34b-hf | FP16 | 17.93 tok/s | 70GB |
| Qwen/Qwen2.5-Coder-32B-Instruct | FP16 | 17.91 tok/s | 67GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q8 | 17.79 tok/s | 88GB |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | 17.49 tok/s | 69GB |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | FP16 | 17.41 tok/s | 137GB |
| mistralai/Mistral-Large-Instruct-2411 | Q8 | 17.36 tok/s | 120GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | FP16 | 17.18 tok/s | 70GB |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | FP16 | 16.97 tok/s | 137GB |
| moonshotai/Kimi-K2-Thinking | FP16 | 16.96 tok/sEstimated Auto-generated benchmark | 1956GB |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q4 | 16.78 tok/sEstimated Auto-generated benchmark | 378GB |
| Qwen/Qwen2.5-32B | FP16 | 16.78 tok/sEstimated Auto-generated benchmark | 66GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | FP16 | 16.35 tok/sEstimated Auto-generated benchmark | 66GB |
| meta-llama/Llama-3.3-70B-Instruct | FP16 | 16.34 tok/sEstimated Auto-generated benchmark | 137GB |
| meta-llama/Meta-Llama-3-70B-Instruct | FP16 | 16.29 tok/sEstimated Auto-generated benchmark | 137GB |
| deepseek-ai/deepseek-coder-33b-instruct | FP16 | 15.94 tok/sEstimated Auto-generated benchmark | 68GB |
| MiniMaxAI/MiniMax-M1-40k | Q4 | 15.78 tok/sEstimated Auto-generated benchmark | 255GB |
| Qwen/Qwen3-235B-A22B | Q4 | 15.19 tok/sEstimated Auto-generated benchmark | 115GB |
| MiniMaxAI/MiniMax-VL-01 | Q4 | 14.95 tok/sEstimated Auto-generated benchmark | 256GB |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | FP16 | 14.38 tok/sEstimated Auto-generated benchmark | 275GB |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | Q8 | 14.31 tok/sEstimated Auto-generated benchmark | 231GB |
| deepseek-ai/DeepSeek-Math-V2 | Q8 | 13.38 tok/sEstimated Auto-generated benchmark | 766GB |
| Qwen/Qwen3-235B-A22B | Q8 | 11.42 tok/sEstimated Auto-generated benchmark | 230GB |
| Qwen/Qwen2.5-72B-Instruct | FP16 | 11.04 tok/sEstimated Auto-generated benchmark | 141GB |
| Qwen/Qwen2.5-72B-Instruct | FP16 | 10.99 tok/sEstimated Auto-generated benchmark | 142GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | FP16 | 10.84 tok/sEstimated Auto-generated benchmark | 156GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | FP16 | 10.77 tok/sEstimated Auto-generated benchmark | 176GB |
| openai/gpt-oss-120b | FP16 | 10.72 tok/sEstimated Auto-generated benchmark | 235GB |
| MiniMaxAI/MiniMax-M1-40k | Q8 | 10.56 tok/sEstimated Auto-generated benchmark | 510GB |
| NousResearch/Hermes-3-Llama-3.1-70B | FP16 | 10.51 tok/sEstimated Auto-generated benchmark | 138GB |
| mistralai/Mistral-Large-Instruct-2411 | FP16 | 10.49 tok/sEstimated Auto-generated benchmark | 240GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | FP16 | 10.46 tok/sEstimated Auto-generated benchmark | 156GB |
| AI-MO/Kimina-Prover-72B | FP16 | 10.41 tok/sEstimated Auto-generated benchmark | 141GB |
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | FP16 | 10.40 tok/sEstimated Auto-generated benchmark | 138GB |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q8 | 10.28 tok/sEstimated Auto-generated benchmark | 755GB |
| MiniMaxAI/MiniMax-VL-01 | Q8 | 10.15 tok/sEstimated Auto-generated benchmark | 511GB |
| meta-llama/Llama-3.1-70B-Instruct | FP16 | 10.15 tok/sEstimated Auto-generated benchmark | 138GB |
| meta-llama/Llama-3.3-70B-Instruct | FP16 | 9.91 tok/sEstimated Auto-generated benchmark | 138GB |
| Qwen/Qwen2.5-Math-72B-Instruct | FP16 | 9.85 tok/sEstimated Auto-generated benchmark | 142GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct | FP16 | 9.35 tok/sEstimated Auto-generated benchmark | 156GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking | FP16 | 9.22 tok/sEstimated Auto-generated benchmark | 156GB |
| deepseek-ai/DeepSeek-Math-V2 | FP16 | 8.04 tok/sEstimated Auto-generated benchmark | 1532GB |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | FP16 | 7.60 tok/sEstimated Auto-generated benchmark | 461GB |
| MiniMaxAI/MiniMax-M1-40k | FP16 | 6.56 tok/sEstimated Auto-generated benchmark | 1020GB |
| MiniMaxAI/MiniMax-VL-01 | FP16 | 6.49 tok/sEstimated Auto-generated benchmark | 1021GB |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | FP16 | 6.37 tok/sEstimated Auto-generated benchmark | 1509GB |
| Qwen/Qwen3-235B-A22B | FP16 | 5.95 tok/sEstimated Auto-generated benchmark | 460GB |
Note: Performance estimates are calculated; real results may vary.
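The VRAM column tracks a simple bytes-per-parameter rule fairly closely: the Q4 rows land near 0.5 bytes/param, Q8 near 1, and FP16 near 2 (a 70B model shows roughly 34GB, 69GB, and 137GB respectively). Below is a minimal Python sketch of that rule; the bytes-per-param constants are assumptions read off the table, and the estimate ignores KV cache and runtime overhead, so treat it as a lower bound rather than the site's methodology.

```python
# Rough weight-memory estimator consistent with the table above.
# Assumption: weights dominate; Q4 ~0.5 bytes/param, Q8 ~1, FP16 ~2.
# KV cache and framework overhead are ignored (lower bound only).

BYTES_PER_PARAM = {"Q4": 0.5, "Q8": 1.0, "FP16": 2.0}

def estimate_vram_gb(params_billion: float, quant: str) -> float:
    """Approximate weight memory in GB for a dense model."""
    return params_billion * BYTES_PER_PARAM[quant]

def fits(params_billion: float, quant: str, vram_gb: float = 48.0) -> bool:
    """Crude fit check against a 48GB card like the A6000."""
    return estimate_vram_gb(params_billion, quant) <= vram_gb

if __name__ == "__main__":
    for name, b in [("Llama-3.3-70B", 70.0), ("Qwen2.5-32B", 32.8)]:
        for q in ("Q4", "Q8", "FP16"):
            gb = estimate_vram_gb(b, q)
            verdict = "fits" if fits(b, q) else "needs offload/multi-GPU"
            print(f"{name} {q}: ~{gb:.0f}GB -> {verdict}")
```

On a 48GB card this reproduces the table's broad verdicts: 70B fits only at Q4, and 32B-class models fit at Q4 or Q8 but not FP16.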
| Model | Quantization | Verdict | Estimated speed | VRAM needed |
|---|---|---|---|---|
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q4 | Not supported | 16.78 tok/s | 378GB (have 48GB) |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q8 | Not supported | 10.28 tok/s | 755GB (have 48GB) |
| EssentialAI/rnj-1 | FP16 | Fits comfortably | 37.97 tok/s | 19GB (have 48GB) |
| EssentialAI/rnj-1 | Q8 | Fits comfortably | 65.30 tok/s | 10GB (have 48GB) |
| EssentialAI/rnj-1 | Q4 | Fits comfortably | 105.99 tok/s | 5GB (have 48GB) |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | FP16 | Not supported | 6.37 tok/s | 1509GB (have 48GB) |
| Qwen/Qwen2.5-3B-Instruct | Q8 | Fits comfortably | 115.50 tok/s | 3GB (have 48GB) |
| Qwen/Qwen2.5-3B-Instruct | Q4 | Fits comfortably | 168.29 tok/s | 2GB (have 48GB) |
| openai/gpt-oss-120b | Q4 | Not supported | 27.54 tok/s | 59GB (have 48GB) |
| Qwen/Qwen2.5-1.5B | Q4 | Fits comfortably | 121.41 tok/s | 3GB (have 48GB) |
| Qwen/Qwen2.5-1.5B | Q8 | Fits comfortably | 96.63 tok/s | 5GB (have 48GB) |
| Qwen/Qwen2.5-1.5B | FP16 | Fits comfortably | 55.15 tok/s | 11GB (have 48GB) |
| Qwen/Qwen2.5-14B-Instruct | Q4 | Fits comfortably | 102.61 tok/s | 7GB (have 48GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | FP16 | Fits comfortably | 48.40 tok/s | 17GB (have 48GB) |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q4 | Fits comfortably | 76.13 tok/s | 10GB (have 48GB) |
| openai/gpt-oss-120b | FP16 | Not supported | 10.72 tok/s | 235GB (have 48GB) |
| unsloth/gpt-oss-20b-BF16 | FP16 | Fits comfortably | 25.22 tok/s | 41GB (have 48GB) |
| unsloth/gpt-oss-20b-BF16 | Q4 | Fits comfortably | 65.76 tok/s | 10GB (have 48GB) |
| HuggingFaceTB/SmolLM-135M | Q4 | Fits comfortably | 137.24 tok/s | 4GB (have 48GB) |
| Qwen/Qwen3-8B-Base | Q8 | Fits comfortably | 88.28 tok/s | 9GB (have 48GB) |
| Qwen/Qwen3-8B-Base | FP16 | Fits comfortably | 45.90 tok/s | 17GB (have 48GB) |
| microsoft/Phi-4-multimodal-instruct | Q4 | Fits comfortably | 121.31 tok/s | 4GB (have 48GB) |
| microsoft/Phi-4-multimodal-instruct | Q8 | Fits comfortably | 91.04 tok/s | 7GB (have 48GB) |
| Qwen/Qwen3-30B-A3B | Q8 | Fits comfortably | 49.52 tok/s | 31GB (have 48GB) |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | Fits comfortably | 123.95 tok/s | 3GB (have 48GB) |
| Qwen/Qwen2.5-Coder-1.5B | Q8 | Fits comfortably | 87.45 tok/s | 5GB (have 48GB) |
| Qwen/Qwen2.5-Coder-1.5B | FP16 | Fits comfortably | 50.65 tok/s | 11GB (have 48GB) |
| rinna/japanese-gpt-neox-small | Q4 | Fits comfortably | 143.23 tok/s | 4GB (have 48GB) |
| rinna/japanese-gpt-neox-small | Q8 | Fits comfortably | 89.97 tok/s | 7GB (have 48GB) |
| unsloth/mistral-7b-v0.3-bnb-4bit | FP16 | Fits comfortably | 48.19 tok/s | 15GB (have 48GB) |
| meta-llama/Meta-Llama-3-8B-Instruct | FP16 | Fits comfortably | 51.92 tok/s | 17GB (have 48GB) |
| Qwen/Qwen2.5-72B-Instruct | FP16 | Not supported | 11.04 tok/s | 141GB (have 48GB) |
| Qwen/Qwen2.5-72B-Instruct | Q8 | Not supported | 20.36 tok/s | 70GB (have 48GB) |
| microsoft/DialoGPT-medium | FP16 | Fits comfortably | 48.13 tok/s | 15GB (have 48GB) |
| ibm-granite/granite-3.3-2b-instruct | Q4 | Fits comfortably | 154.20 tok/s | 1GB (have 48GB) |
| parler-tts/parler-tts-large-v1 | Q4 | Fits comfortably | 124.96 tok/s | 4GB (have 48GB) |
| meta-llama/Llama-2-13b-chat-hf | Q8 | Fits comfortably | 70.65 tok/s | 13GB (have 48GB) |
| Qwen/Qwen2.5-3B | Q8 | Fits comfortably | 102.41 tok/s | 3GB (have 48GB) |
| Qwen/Qwen2.5-3B | FP16 | Fits comfortably | 64.84 tok/s | 6GB (have 48GB) |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 119.51 tok/s | 4GB (have 48GB) |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 84.02 tok/s | 9GB (have 48GB) |
| NousResearch/Meta-Llama-3.1-8B-Instruct | FP16 | Fits comfortably | 51.31 tok/s | 17GB (have 48GB) |
| apple/OpenELM-1_1B-Instruct | Q4 | Fits comfortably | 143.54 tok/s | 1GB (have 48GB) |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | FP16 | Not supported | 10.84 tok/s | 156GB (have 48GB) |
| Qwen/Qwen2.5-72B-Instruct | Q4 | Fits comfortably | 25.62 tok/s | 36GB (have 48GB) |
| Qwen/Qwen2.5-Coder-32B-Instruct | Q4 | Fits comfortably | 42.72 tok/s | 17GB (have 48GB) |
| Qwen/Qwen2.5-Coder-32B-Instruct | Q8 | Fits comfortably | 30.53 tok/s | 34GB (have 48GB) |
| Qwen/Qwen2.5-Coder-32B-Instruct | FP16 | Not supported | 17.91 tok/s | 67GB (have 48GB) |
| meta-llama/Llama-3.3-70B-Instruct | FP16 | Not supported | 9.91 tok/s | 138GB (have 48GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | Fits comfortably | 141.29 tok/s | 2GB (have 48GB) |
| google/gemma-2-27b-it | Q4 | Fits comfortably | 77.39 tok/s | 14GB (have 48GB) |
| google/gemma-2-27b-it | Q8 | Fits comfortably | 55.49 tok/s | 28GB (have 48GB) |
| google/gemma-2-27b-it | FP16 | Not supported | 26.23 tok/s | 56GB (have 48GB) |
| google/gemma-2-9b-it | Q4 | Fits comfortably | 96.79 tok/s | 5GB (have 48GB) |
| microsoft/Phi-3-medium-128k-instruct | Q8 | Fits comfortably | 72.77 tok/s | 14GB (have 48GB) |
| microsoft/Phi-3-medium-128k-instruct | FP16 | Fits comfortably | 38.95 tok/s | 29GB (have 48GB) |
| moonshotai/Kimi-K2-Thinking | FP16 | Not supported | 16.96 tok/s | 1956GB (have 48GB) |
| deepseek-ai/DeepSeek-Math-V2 | Q4 | Not supported | 20.58 tok/s | 383GB (have 48GB) |
| deepseek-ai/DeepSeek-Math-V2 | Q8 | Not supported | 13.38 tok/s | 766GB (have 48GB) |
| deepseek-ai/DeepSeek-Math-V2 | FP16 | Not supported | 8.04 tok/s | 1532GB (have 48GB) |
| Tongyi-MAI/Z-Image-Turbo | Q4 | Fits comfortably | 136.61 tok/s | 4GB (have 48GB) |
| Tongyi-MAI/Z-Image-Turbo | Q8 | Fits comfortably | 99.69 tok/s | 8GB (have 48GB) |
| WeiboAI/VibeThinker-1.5B | FP16 | Fits comfortably | 55.39 tok/s | 4GB (have 48GB) |
| WeiboAI/VibeThinker-1.5B | Q8 | Fits comfortably | 110.97 tok/s | 2GB (have 48GB) |
| llamafactory/tiny-random-Llama-3 | FP16 | Fits comfortably | 51.22 tok/s | 15GB (have 48GB) |
| WeiboAI/VibeThinker-1.5B | Q4 | Fits comfortably | 151.82 tok/s | 1GB (have 48GB) |
| openai-community/gpt2-large | Q4 | Fits comfortably | 145.44 tok/s | 4GB (have 48GB) |
| openai-community/gpt2-large | Q8 | Fits comfortably | 88.04 tok/s | 7GB (have 48GB) |
| meta-llama/Meta-Llama-3-70B-Instruct | Q8 | Not supported | 32.56 tok/s | 68GB (have 48GB) |
| Qwen/Qwen2.5-14B | Q4 | Fits comfortably | 95.43 tok/s | 7GB (have 48GB) |
| Qwen/Qwen2.5-14B | Q8 | Fits comfortably | 65.04 tok/s | 14GB (have 48GB) |
| GSAI-ML/LLaDA-8B-Base | Q4 | Fits comfortably | 141.61 tok/s | 4GB (have 48GB) |
| GSAI-ML/LLaDA-8B-Base | Q8 | Fits comfortably | 99.99 tok/s | 9GB (have 48GB) |
| GSAI-ML/LLaDA-8B-Base | FP16 | Fits comfortably | 45.96 tok/s | 17GB (have 48GB) |
| openai-community/gpt2 | Q4 | Fits comfortably | 136.32 tok/s | 4GB (have 48GB) |
| Qwen/Qwen3-1.7B-Base | FP16 | Fits comfortably | 49.44 tok/s | 15GB (have 48GB) |
| openai/gpt-oss-20b | Q8 | Fits comfortably | 47.08 tok/s | 20GB (have 48GB) |
| inference-net/Schematron-3B | Q8 | Fits comfortably | 112.47 tok/s | 3GB (have 48GB) |
| inference-net/Schematron-3B | FP16 | Fits comfortably | 56.06 tok/s | 6GB (have 48GB) |
| deepseek-ai/deepseek-coder-33b-instruct | Q4 | Fits comfortably | 44.04 tok/s | 17GB (have 48GB) |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | FP16 | Not supported | 17.41 tok/s | 137GB (have 48GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | Fits comfortably | 153.90 tok/s | 2GB (have 48GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 111.87 tok/s | 3GB (have 48GB) |
| meta-llama/Llama-3.2-3B-Instruct | FP16 | Fits comfortably | 64.75 tok/s | 6GB (have 48GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 87.18 tok/s | 3GB (have 48GB) |
| meta-llama/Llama-3.2-3B-Instruct | FP16 | Fits comfortably | 45.32 tok/s | 7GB (have 48GB) |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | Q4 | Not supported | 32.98 tok/s | 69GB (have 48GB) |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | Q8 | Not supported | 22.93 tok/s | 138GB (have 48GB) |
| meta-llama/Llama-3.1-70B-Instruct | FP16 | Not supported | 18.77 tok/s | 137GB (have 48GB) |
| microsoft/phi-2 | Q4 | Fits comfortably | 142.68 tok/s | 4GB (have 48GB) |
| HuggingFaceTB/SmolLM2-135M | Q4 | Fits comfortably | 145.54 tok/s | 4GB (have 48GB) |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | FP16 | Not supported | 14.38 tok/s | 275GB (have 48GB) |
| HuggingFaceTB/SmolLM2-135M | Q8 | Fits comfortably | 89.71 tok/s | 7GB (have 48GB) |
| HuggingFaceTB/SmolLM2-135M | FP16 | Fits comfortably | 51.77 tok/s | 15GB (have 48GB) |
| zai-org/GLM-4.6-FP8 | Q4 | Fits comfortably | 140.47 tok/s | 4GB (have 48GB) |
| mistralai/Mistral-Small-Instruct-2409 | Q4 | Fits comfortably | 72.06 tok/s | 11GB (have 48GB) |
| mistralai/Mistral-Small-Instruct-2409 | Q8 | Fits comfortably | 50.32 tok/s | 23GB (have 48GB) |
| mistralai/Mistral-Small-Instruct-2409 | FP16 | Fits comfortably | 26.45 tok/s | 46GB (have 48GB) |
| microsoft/DialoGPT-medium | Q4 | Fits comfortably | 135.04 tok/s | 4GB (have 48GB) |
| google-bert/bert-base-uncased | Q4 | Fits comfortably | 160.52 tok/s | 1GB (have 48GB) |
Note: Performance estimates are calculated; real results may vary.
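For any row marked "Fits comfortably", a common way to run the model locally at Q4-like precision is 4-bit loading through Hugging Face transformers with bitsandbytes. This is a minimal sketch, not the harness behind the numbers above; the model choice and prompt are illustrative, and measured tok/s will differ from the estimates.

```python
# Minimal sketch: load a small "fits comfortably" model in 4-bit on a
# 48GB card. Assumes transformers, accelerate, and bitsandbytes are
# installed and a CUDA GPU is visible.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Qwen/Qwen2.5-3B-Instruct"  # a Q4 "fits comfortably" entry above

bnb = BitsAndBytesConfig(
    load_in_4bit=True,                  # roughly comparable to the Q4 rows
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb, device_map="auto"
)

inputs = tokenizer("Explain KV caching in one sentence.", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```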
Data-backed answers pulled from community benchmarks, manufacturer specs, and live pricing.
Operators running dual RTX A6000/RTX 8000 cards in oobabooga report roughly 6–7 tokens/sec on 70B IQ4 Miqu workloads, adequate for shared inference queues.
Source: Reddit – /r/LocalLLaMA (lnv0ww3)
Enthusiasts caution that consumer boards seldom provide x16/x16 for two A6000s; dropping to x8/x4 starves llama.cpp workloads and erodes throughput.
Source: Reddit – /r/LocalLLaMA (mqpg0wp)
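You can check whether your board is actually giving each card a full link by querying nvidia-smi's standard PCIe fields, as in the sketch below; note that links can downtrain to save power at idle, so sample while a workload is running.

```python
# Query current PCIe link generation and width per GPU via nvidia-smi.
# The query fields used here are standard nvidia-smi properties; an
# x8 or x4 width on a second card would corroborate the slowdown
# reported above for multi-GPU llama.cpp setups.
import subprocess

out = subprocess.run(
    ["nvidia-smi",
     "--query-gpu=index,name,pcie.link.gen.current,pcie.link.width.current",
     "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
).stdout
for line in out.strip().splitlines():
    print(line)  # e.g. "0, NVIDIA RTX A6000, 4, 16"
```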
Even 2020-era RTX A6000 cards still list near $5,000, and the community expects scalpers to follow new workstation launches, a sign that demand remains high.
Source: Reddit – /r/LocalLLaMA (movlqi2)
Some builders weigh 48 GB 4090s instead: they keep the full VRAM for inference but drop to 24 GB for PCIe peer-to-peer training, so the trade-off is workload-dependent.
Source: Reddit – /r/LocalLLaMA (mqoerg0)
RTX A6000 ships with 48 GB of GDDR6 ECC memory and a 300 W TDP. As of Nov 2025, pricing on Amazon was around $4,899.
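To see how close the card runs to that 300 W limit, and how much of the 48 GB is in use during inference, nvidia-smi's query mode can be polled. A minimal sketch using standard nvidia-smi query fields:

```python
# Sample power draw, memory use, and utilization once per second while
# a model is generating, to compare against the 300 W TDP / 48 GB specs.
import subprocess
import time

FIELDS = "power.draw,memory.used,memory.total,utilization.gpu"

for _ in range(10):  # ten one-second samples; loop forever if preferred
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={FIELDS}", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    print(out)  # e.g. "287.45 W, 41213 MiB, 49140 MiB, 98 %"
    time.sleep(1)
```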
Explore how related cards stack up for local inference workloads: RTX 4090, NVIDIA RTX 6000 Ada, NVIDIA L40, NVIDIA A5000, and NVIDIA A4000.
Side-by-side VRAM, throughput, efficiency, and pricing benchmarks are available for each pairing.