Quick Answer: The RX 6800 XT offers 16GB of VRAM and starts around $579. It delivers approximately 50 tokens/sec on apple/OpenELM-1_1B-Instruct at Q4 and typically draws 300W under load.
This GPU delivers reliable throughput for local AI workloads. Pair it with the right model quantization to hit your target tokens/sec, and monitor prices to catch the best deal.
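The tables below follow a simple sizing rule of thumb: roughly one byte per parameter at Q8 and half a byte per parameter at Q4, plus a small allowance for the KV cache and runtime buffers. Here is a minimal sketch of that arithmetic; the overhead constant is an illustrative assumption, not this site's exact methodology:

```python
def estimated_vram_gb(params_b: float, quant: str, overhead_gb: float = 0.5) -> float:
    """Rough VRAM estimate from parameter count and quantization.

    params_b: model size in billions of parameters.
    quant: "Q4" (~0.5 bytes/param) or "Q8" (~1 byte/param).
    overhead_gb: assumed allowance for KV cache and runtime buffers.
    """
    bytes_per_param = {"Q4": 0.5, "Q8": 1.0}[quant]
    return params_b * bytes_per_param + overhead_gb

# An 8B model at Q4 needs roughly 4.5GB, well within 16GB;
# a 32B model at Q8 needs ~32.5GB and will not fit on this card.
print(estimated_vram_gb(8, "Q4"))   # ~4.5
print(estimated_vram_gb(32, "Q8"))  # ~32.5
```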
| Model | Quantization | Tokens/sec (estimated) | VRAM used |
|---|---|---|---|
| apple/OpenELM-1_1B-Instruct | Q4 | 49.85 tok/s | 1GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | 49.50 tok/s | 1GB |
| meta-llama/Llama-Guard-3-1B | Q4 | 49.33 tok/s | 1GB |
| meta-llama/Llama-3.2-1B | Q4 | 49.01 tok/s | 1GB |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | 48.53 tok/s | 1GB |
| unsloth/Llama-3.2-1B-Instruct | Q4 | 47.43 tok/s | 1GB |
| allenai/OLMo-2-0425-1B | Q4 | 46.23 tok/s | 1GB |
| google/gemma-3-1b-it | Q4 | 45.31 tok/s | 1GB |
| unsloth/gemma-3-1b-it | Q4 | 44.13 tok/s | 1GB |
| google/gemma-2-2b-it | Q4 | 37.58 tok/s | 1GB |
| ibm-granite/granite-3.3-2b-instruct | Q4 | 36.26 tok/s | 1GB |
| unsloth/Llama-3.2-1B-Instruct | Q8 | 34.69 tok/s | 1GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | 34.33 tok/s | 1GB |
| meta-llama/Llama-Guard-3-1B | Q8 | 34.20 tok/s | 1GB |
| LiquidAI/LFM2-1.2B | Q4 | 33.90 tok/s | 1GB |
| apple/OpenELM-1_1B-Instruct | Q8 | 33.85 tok/s | 1GB |
| meta-llama/Llama-3.2-1B | Q8 | 33.27 tok/s | 1GB |
| inference-net/Schematron-3B | Q4 | 33.09 tok/s | 2GB |
| google-t5/t5-3b | Q4 | 33.06 tok/s | 2GB |
| google/gemma-2b | Q4 | 31.98 tok/s | 1GB |
| unsloth/gemma-3-1b-it | Q8 | 31.97 tok/s | 1GB |
| unsloth/Llama-3.2-3B-Instruct | Q4 | 31.59 tok/s | 2GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | 31.45 tok/s | 2GB |
| google/gemma-3-1b-it | Q8 | 30.61 tok/s | 1GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q4 | 30.36 tok/s | 2GB |
| Qwen/Qwen2.5-3B | Q4 | 30.26 tok/s | 2GB |
| bigcode/starcoder2-3b | Q4 | 30.15 tok/s | 2GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q4 | 29.98 tok/s | 2GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 29.98 tok/s | 2GB |
| meta-llama/Llama-3.2-3B | Q4 | 29.54 tok/s | 2GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | 29.01 tok/s | 2GB |
| allenai/OLMo-2-0425-1B | Q8 | 28.90 tok/s | 1GB |
| Qwen/Qwen2.5-3B-Instruct | Q4 | 28.80 tok/s | 2GB |
| meta-llama/Llama-3.2-1B-Instruct | Q8 | 28.68 tok/s | 1GB |
| Qwen/Qwen3-4B | Q4 | 28.61 tok/s | 2GB |
| ibm-research/PowerMoE-3b | Q4 | 28.58 tok/s | 2GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | 28.00 tok/s | 2GB |
| Qwen/Qwen2-0.5B | Q4 | 27.85 tok/s | 3GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | 27.80 tok/s | 3GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q4 | 27.79 tok/s | 2GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q4 | 27.51 tok/s | 3GB |
| Qwen/Qwen2-1.5B-Instruct | Q4 | 27.26 tok/s | 3GB |
| Qwen/Qwen2.5-0.5B | Q4 | 26.89 tok/s | 3GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q4 | 26.24 tok/s | 2GB |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | 26.15 tok/s | 3GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q4 | 26.14 tok/s | 2GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q4 | 26.14 tok/s | 2GB |
| Qwen/Qwen2-0.5B-Instruct | Q4 | 26.01 tok/s | 3GB |
| Qwen/Qwen3-4B-Base | Q4 | 25.54 tok/s | 2GB |
| Qwen/Qwen3-Embedding-4B | Q4 | 24.98 tok/s | 2GB |
| Qwen/Qwen2.5-1.5B | Q4 | 24.77 tok/s | 3GB |
| BSC-LT/salamandraTA-7b-instruct | Q4 | 24.75 tok/s | 4GB |
| microsoft/phi-4 | Q4 | 24.66 tok/s | 4GB |
| google/gemma-2-2b-it | Q8 | 24.59 tok/s | 2GB |
| distilbert/distilgpt2 | Q4 | 24.58 tok/s | 4GB |
| openai-community/gpt2 | Q4 | 24.57 tok/s | 4GB |
| HuggingFaceTB/SmolLM-135M | Q4 | 24.55 tok/s | 4GB |
| openai-community/gpt2-medium | Q4 | 24.52 tok/s | 4GB |
| microsoft/VibeVoice-1.5B | Q4 | 24.48 tok/s | 3GB |
| openai-community/gpt2-large | Q4 | 24.43 tok/s | 4GB |
| LiquidAI/LFM2-1.2B | Q8 | 24.37 tok/s | 2GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q4 | 24.29 tok/s | 4GB |
| zai-org/GLM-4.6-FP8 | Q4 | 24.27 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3 | Q4 | 24.26 tok/s | 4GB |
| rinna/japanese-gpt-neox-small | Q4 | 24.25 tok/s | 4GB |
| HuggingFaceTB/SmolLM2-135M | Q4 | 24.14 tok/s | 4GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q4 | 24.05 tok/s | 3GB |
| openai-community/gpt2-xl | Q4 | 23.87 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q4 | 23.87 tok/s | 3GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | 23.81 tok/s | 3GB |
| Qwen/Qwen3-Embedding-0.6B | Q4 | 23.78 tok/s | 3GB |
| microsoft/Phi-3.5-vision-instruct | Q4 | 23.77 tok/s | 4GB |
| petals-team/StableBeluga2 | Q4 | 23.76 tok/s | 4GB |
| Qwen/Qwen3-Reranker-0.6B | Q4 | 23.66 tok/s | 3GB |
| Qwen/Qwen3-Embedding-8B | Q4 | 23.55 tok/s | 4GB |
| GSAI-ML/LLaDA-8B-Instruct | Q4 | 23.53 tok/s | 4GB |
| Qwen/Qwen3-0.6B-Base | Q4 | 23.52 tok/s | 3GB |
| llamafactory/tiny-random-Llama-3 | Q4 | 23.49 tok/s | 4GB |
| google/gemma-3-270m-it | Q4 | 23.43 tok/s | 4GB |
| Qwen/Qwen2.5-Math-1.5B | Q4 | 23.42 tok/s | 3GB |
| ibm-research/PowerMoE-3b | Q8 | 23.39 tok/s | 3GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q4 | 23.37 tok/s | 4GB |
| rednote-hilab/dots.ocr | Q4 | 23.34 tok/s | 4GB |
| GSAI-ML/LLaDA-8B-Base | Q4 | 23.34 tok/s | 4GB |
| google-t5/t5-3b | Q8 | 23.33 tok/s | 3GB |
| Qwen/Qwen3-8B-Base | Q4 | 23.32 tok/s | 4GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q4 | 23.32 tok/s | 4GB |
| microsoft/Phi-3-mini-128k-instruct | Q4 | 23.19 tok/s | 4GB |
| ibm-granite/granite-3.3-2b-instruct | Q8 | 23.12 tok/s | 2GB |
| deepseek-ai/DeepSeek-V3.1 | Q4 | 23.09 tok/s | 4GB |
| EleutherAI/gpt-neo-125m | Q4 | 23.06 tok/s | 4GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | 23.05 tok/s | 4GB |
| EleutherAI/pythia-70m-deduped | Q4 | 23.04 tok/s | 4GB |
| Qwen/Qwen3-0.6B | Q4 | 23.01 tok/s | 3GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q4 | 22.97 tok/s | 4GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q4 | 22.95 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-0528 | Q4 | 22.74 tok/s | 4GB |
| skt/kogpt2-base-v2 | Q4 | 22.72 tok/s | 4GB |
| google/gemma-2b | Q8 | 22.63 tok/s | 2GB |
| ibm-granite/granite-docling-258M | Q4 | 22.60 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1 | Q4 | 22.49 tok/s | 4GB |
| numind/NuExtract-1.5 | Q4 | 22.49 tok/s | 4GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q4 | 22.48 tok/s | 4GB |
| Qwen/Qwen3-1.7B | Q4 | 22.45 tok/s | 4GB |
| unsloth/Llama-3.2-3B-Instruct | Q8 | 22.39 tok/s | 3GB |
| vikhyatk/moondream2 | Q4 | 22.37 tok/s | 4GB |
| meta-llama/Llama-3.1-8B | Q4 | 22.28 tok/s | 4GB |
| ibm-granite/granite-3.3-8b-instruct | Q4 | 22.27 tok/s | 4GB |
| sshleifer/tiny-gpt2 | Q4 | 22.25 tok/s | 4GB |
| Qwen/Qwen2.5-3B-Instruct | Q8 | 22.24 tok/s | 3GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q4 | 22.23 tok/s | 4GB |
| mistralai/Mistral-7B-v0.1 | Q4 | 22.20 tok/s | 4GB |
| microsoft/Phi-4-mini-instruct | Q4 | 22.16 tok/s | 4GB |
| lmsys/vicuna-7b-v1.5 | Q4 | 22.12 tok/s | 4GB |
| HuggingFaceH4/zephyr-7b-beta | Q4 | 22.04 tok/s | 4GB |
| MiniMaxAI/MiniMax-M2 | Q4 | 22.04 tok/s | 4GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q4 | 22.03 tok/s | 4GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q4 | 21.99 tok/s | 4GB |
| zai-org/GLM-4.5-Air | Q4 | 21.96 tok/s | 4GB |
| dicta-il/dictalm2.0-instruct | Q4 | 21.89 tok/s | 4GB |
| hmellor/tiny-random-LlamaForCausalLM | Q4 | 21.83 tok/s | 4GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q4 | 21.64 tok/s | 4GB |
| microsoft/Phi-4-multimodal-instruct | Q4 | 21.59 tok/s | 4GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q8 | 21.54 tok/s | 3GB |
| meta-llama/Llama-2-7b-chat-hf | Q4 | 21.51 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | 21.51 tok/s | 4GB |
| meta-llama/Meta-Llama-3-8B | Q4 | 21.47 tok/s | 4GB |
| microsoft/DialoGPT-small | Q4 | 21.46 tok/s | 4GB |
| microsoft/DialoGPT-medium | Q4 | 21.42 tok/s | 4GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 21.38 tok/s | 4GB |
| parler-tts/parler-tts-large-v1 | Q4 | 21.35 tok/s | 4GB |
| bigscience/bloomz-560m | Q4 | 21.25 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3-0324 | Q4 | 21.21 tok/s | 4GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | 21.09 tok/s | 4GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q4 | 21.03 tok/s | 4GB |
| Qwen/Qwen3-1.7B-Base | Q4 | 20.90 tok/s | 4GB |
| inference-net/Schematron-3B | Q8 | 20.88 tok/s | 3GB |
| microsoft/phi-2 | Q4 | 20.83 tok/s | 4GB |
| meta-llama/Llama-2-7b-hf | Q4 | 20.80 tok/s | 4GB |
| Qwen/Qwen2.5-3B | Q8 | 20.79 tok/s | 3GB |
| liuhaotian/llava-v1.5-7b | Q4 | 20.73 tok/s | 4GB |
| facebook/opt-125m | Q4 | 20.67 tok/s | 4GB |
| Qwen/Qwen2-7B-Instruct | Q4 | 20.63 tok/s | 4GB |
| Qwen/Qwen3-8B-FP8 | Q4 | 20.63 tok/s | 4GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q4 | 20.56 tok/s | 4GB |
| Qwen/Qwen3-8B | Q4 | 20.54 tok/s | 4GB |
| huggyllama/llama-7b | Q4 | 20.51 tok/s | 4GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | 20.50 tok/s | 3GB |
| Qwen/Qwen2.5-7B | Q4 | 20.39 tok/s | 4GB |
| microsoft/Phi-3-mini-4k-instruct | Q4 | 20.38 tok/s | 4GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q4 | 20.33 tok/s | 5GB |
| Qwen/Qwen3-Embedding-4B | Q8 | 20.23 tok/s | 4GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q4 | 20.14 tok/s | 4GB |
| meta-llama/Llama-Guard-3-8B | Q4 | 20.06 tok/s | 4GB |
| meta-llama/Llama-3.2-3B | Q8 | 20.04 tok/s | 3GB |
| bigcode/starcoder2-3b | Q8 | 19.99 tok/s | 3GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q8 | 19.81 tok/s | 4GB |
| IlyaGusev/saiga_llama3_8b | Q4 | 19.72 tok/s | 4GB |
| Qwen/Qwen3-4B | Q8 | 19.57 tok/s | 4GB |
| Qwen/Qwen3-4B-Base | Q8 | 19.53 tok/s | 4GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q4 | 19.52 tok/s | 4GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q4 | 19.41 tok/s | 4GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | 19.41 tok/s | 3GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | 19.39 tok/s | 4GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | 19.33 tok/s | 4GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | 19.29 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q8 | 19.27 tok/s | 4GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q8 | 19.19 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q8 | 19.12 tok/s | 4GB |
| Qwen/Qwen2.5-Coder-1.5B | Q8 | 18.91 tok/s | 5GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q8 | 18.74 tok/s | 4GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q8 | 18.59 tok/s | 5GB |
| Qwen/Qwen2.5-14B | Q4 | 18.10 tok/s | 7GB |
| Qwen/Qwen2.5-Math-1.5B | Q8 | 18.06 tok/s | 5GB |
| meta-llama/Llama-2-13b-chat-hf | Q4 | 18.04 tok/s | 7GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q8 | 17.96 tok/s | 5GB |
| Qwen/Qwen3-Embedding-0.6B | Q8 | 17.96 tok/s | 6GB |
| Qwen/Qwen2-1.5B-Instruct | Q8 | 17.95 tok/s | 5GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q8 | 17.69 tok/s | 5GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q8 | 17.54 tok/s | 4GB |
| Qwen/Qwen2.5-0.5B | Q8 | 17.29 tok/s | 5GB |
| deepseek-ai/DeepSeek-V3 | Q8 | 17.27 tok/s | 7GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | 17.25 tok/s | 5GB |
| microsoft/Phi-3-mini-128k-instruct | Q8 | 17.23 tok/s | 7GB |
| distilbert/distilgpt2 | Q8 | 17.20 tok/s | 7GB |
| Qwen/Qwen2-0.5B | Q8 | 17.13 tok/s | 5GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q8 | 16.90 tok/s | 5GB |
| ibm-granite/granite-docling-258M | Q8 | 16.86 tok/s | 7GB |
| microsoft/Phi-3-mini-4k-instruct | Q8 | 16.78 tok/s | 7GB |
| Qwen/Qwen3-14B | Q4 | 16.76 tok/s | 7GB |
| Qwen/Qwen2.5-1.5B | Q8 | 16.74 tok/s | 5GB |
| meta-llama/Llama-2-7b-hf | Q8 | 16.73 tok/s | 7GB |
| HuggingFaceH4/zephyr-7b-beta | Q8 | 16.73 tok/s | 7GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q8 | 16.71 tok/s | 7GB |
| microsoft/Phi-4-multimodal-instruct | Q8 | 16.68 tok/s | 7GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | 16.63 tok/s | 7GB |
| OpenPipe/Qwen3-14B-Instruct | Q4 | 16.62 tok/s | 7GB |
| meta-llama/Llama-2-7b-chat-hf | Q8 | 16.57 tok/s | 7GB |
| parler-tts/parler-tts-large-v1 | Q8 | 16.56 tok/s | 7GB |
| bigscience/bloomz-560m | Q8 | 16.56 tok/s | 7GB |
| microsoft/DialoGPT-medium | Q8 | 16.55 tok/s | 7GB |
| microsoft/VibeVoice-1.5B | Q8 | 16.54 tok/s | 5GB |
| HuggingFaceTB/SmolLM-135M | Q8 | 16.50 tok/s | 7GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | 16.49 tok/s | 7GB |
| microsoft/phi-2 | Q8 | 16.47 tok/s | 7GB |
| Qwen/Qwen3-1.7B | Q8 | 16.41 tok/s | 7GB |
| lmsys/vicuna-7b-v1.5 | Q8 | 16.40 tok/s | 7GB |
| EleutherAI/gpt-neo-125m | Q8 | 16.36 tok/s | 7GB |
| Qwen/Qwen3-0.6B-Base | Q8 | 16.36 tok/s | 6GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | 16.31 tok/s | 7GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q4 | 16.31 tok/s | 10GB |
| ai-forever/ruGPT-3.5-13B | Q4 | 16.22 tok/s | 7GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | 16.20 tok/s | 8GB |
| hmellor/tiny-random-LlamaForCausalLM | Q8 | 16.20 tok/s | 7GB |
| EleutherAI/pythia-70m-deduped | Q8 | 16.16 tok/s | 7GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q8 | 16.14 tok/s | 8GB |
| Qwen/Qwen2-0.5B-Instruct | Q8 | 16.13 tok/s | 5GB |
| numind/NuExtract-1.5 | Q8 | 16.12 tok/s | 7GB |
| vikhyatk/moondream2 | Q8 | 16.07 tok/s | 7GB |
| Qwen/Qwen3-14B-Base | Q4 | 16.03 tok/s | 7GB |
| petals-team/StableBeluga2 | Q8 | 15.95 tok/s | 7GB |
| Qwen/Qwen3-8B-Base | Q8 | 15.94 tok/s | 8GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q8 | 15.91 tok/s | 7GB |
| microsoft/Phi-3.5-vision-instruct | Q8 | 15.89 tok/s | 7GB |
| Qwen/Qwen3-Reranker-0.6B | Q8 | 15.85 tok/s | 6GB |
| deepseek-ai/DeepSeek-V3-0324 | Q8 | 15.84 tok/s | 7GB |
| meta-llama/Llama-3.1-8B | Q8 | 15.79 tok/s | 8GB |
| openai/gpt-oss-20b | Q4 | 15.78 tok/s | 10GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q8 | 15.76 tok/s | 7GB |
| dicta-il/dictalm2.0-instruct | Q8 | 15.74 tok/s | 7GB |
| google/gemma-3-270m-it | Q8 | 15.73 tok/s | 7GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q8 | 15.73 tok/s | 7GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q8 | 15.70 tok/s | 8GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q8 | 15.70 tok/s | 7GB |
| rednote-hilab/dots.ocr | Q8 | 15.68 tok/s | 7GB |
| sshleifer/tiny-gpt2 | Q8 | 15.62 tok/s | 7GB |
| Qwen/Qwen3-0.6B | Q8 | 15.56 tok/s | 6GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q8 | 15.50 tok/s | 7GB |
| deepseek-ai/DeepSeek-V3.1 | Q8 | 15.42 tok/s | 7GB |
| zai-org/GLM-4.6-FP8 | Q8 | 15.41 tok/s | 7GB |
| GSAI-ML/LLaDA-8B-Instruct | Q8 | 15.41 tok/s | 8GB |
| deepseek-ai/DeepSeek-R1-0528 | Q8 | 15.40 tok/s | 7GB |
| openai-community/gpt2 | Q8 | 15.37 tok/s | 7GB |
| skt/kogpt2-base-v2 | Q8 | 15.22 tok/s | 7GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q8 | 15.22 tok/s | 8GB |
| openai-community/gpt2-xl | Q8 | 15.19 tok/s | 7GB |
| microsoft/DialoGPT-small | Q8 | 15.17 tok/s | 7GB |
| llamafactory/tiny-random-Llama-3 | Q8 | 15.16 tok/s | 7GB |
| ibm-granite/granite-3.3-8b-instruct | Q8 | 15.07 tok/s | 8GB |
| MiniMaxAI/MiniMax-M2 | Q8 | 15.06 tok/s | 7GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q8 | 15.05 tok/s | 8GB |
| Qwen/Qwen2.5-7B | Q8 | 14.99 tok/s | 7GB |
| openai-community/gpt2-medium | Q8 | 14.99 tok/s | 7GB |
| Qwen/Qwen2-7B-Instruct | Q8 | 14.98 tok/s | 7GB |
| Qwen/Qwen3-8B | Q8 | 14.95 tok/s | 8GB |
| GSAI-ML/LLaDA-8B-Base | Q8 | 14.92 tok/s | 8GB |
| openai-community/gpt2-large | Q8 | 14.87 tok/s | 7GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q8 | 14.85 tok/s | 8GB |
| microsoft/Phi-4-mini-instruct | Q8 | 14.84 tok/s | 7GB |
| BSC-LT/salamandraTA-7b-instruct | Q8 | 14.77 tok/s | 7GB |
| rinna/japanese-gpt-neox-small | Q8 | 14.75 tok/s | 7GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | 14.74 tok/s | 8GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q8 | 14.71 tok/s | 7GB |
| zai-org/GLM-4.5-Air | Q8 | 14.70 tok/s | 7GB |
| mistralai/Mistral-7B-v0.1 | Q8 | 14.69 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | 14.67 tok/s | 7GB |
| unsloth/gpt-oss-20b-BF16 | Q4 | 14.66 tok/s | 10GB |
| Qwen/Qwen3-1.7B-Base | Q8 | 14.65 tok/s | 7GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | 14.61 tok/s | 7GB |
| facebook/opt-125m | Q8 | 14.61 tok/s | 7GB |
| microsoft/phi-4 | Q8 | 14.57 tok/s | 7GB |
| liuhaotian/llava-v1.5-7b | Q8 | 14.57 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1 | Q8 | 14.52 tok/s | 7GB |
| meta-llama/Llama-Guard-3-8B | Q8 | 14.41 tok/s | 8GB |
| meta-llama/Meta-Llama-3-8B | Q8 | 14.37 tok/s | 8GB |
| huggyllama/llama-7b | Q8 | 14.30 tok/s | 7GB |
| HuggingFaceTB/SmolLM2-135M | Q8 | 14.30 tok/s | 7GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q8 | 14.22 tok/s | 7GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q4 | 14.16 tok/s | 10GB |
| Qwen/Qwen3-Embedding-8B | Q8 | 14.14 tok/s | 8GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | 14.03 tok/s | 8GB |
| Qwen/Qwen3-8B-FP8 | Q8 | 13.94 tok/s | 8GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q8 | 13.83 tok/s | 8GB |
| IlyaGusev/saiga_llama3_8b | Q8 | 13.78 tok/s | 8GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q4 | 13.75 tok/s | 15GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q8 | 13.74 tok/s | 9GB |
| meta-llama/Llama-2-13b-chat-hf | Q8 | 13.59 tok/s | 13GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q4 | 13.16 tok/s | 15GB |
| Qwen/Qwen3-30B-A3B | Q4 | 13.15 tok/s | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q4 | 13.09 tok/s | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q4 | 12.89 tok/s | 15GB |
| Qwen/Qwen3-14B | Q8 | 12.82 tok/s | 14GB |
| Qwen/Qwen2.5-32B-Instruct | Q4 | 12.79 tok/s | 16GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q4 | 12.69 tok/s | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q4 | 12.66 tok/s | 15GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q4 | 12.66 tok/s | 16GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q4 | 12.63 tok/s | 15GB |
| Qwen/Qwen3-32B | Q4 | 12.60 tok/s | 16GB |
| ai-forever/ruGPT-3.5-13B | Q8 | 12.41 tok/s | 13GB |
| Qwen/Qwen2.5-14B | Q8 | 12.14 tok/s | 14GB |
| Qwen/Qwen3-14B-Base | Q8 | 11.94 tok/s | 14GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q4 | 11.76 tok/s | 15GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q4 | 11.69 tok/s | 16GB |
| Qwen/Qwen2.5-14B-Instruct | Q8 | 11.49 tok/s | 14GB |
| baichuan-inc/Baichuan-M2-32B | Q4 | 11.48 tok/s | 16GB |
| Qwen/Qwen2.5-32B | Q4 | 11.48 tok/s | 16GB |
| OpenPipe/Qwen3-14B-Instruct | Q8 | 11.31 tok/s | 14GB |
Note: These performance figures are calculated estimates, not measured benchmarks; real results may vary.
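The estimates are easy to sanity-check on your own card. Below is a minimal timing sketch using llama-cpp-python, assuming a llama.cpp build with ROCm/HIP or Vulkan support for AMD GPUs; the GGUF file path is a placeholder, not a file this page provides:

```python
import time
from llama_cpp import Llama  # pip install llama-cpp-python (GPU-enabled build)

# Placeholder path: any Q4 GGUF that fits in 16GB VRAM works here.
# n_gpu_layers=-1 offloads all layers to the GPU.
llm = Llama(model_path="llama-3.2-1b-instruct-q4_k_m.gguf", n_gpu_layers=-1)

start = time.perf_counter()
out = llm("Explain GPU VRAM in one paragraph.", max_tokens=128)
elapsed = time.perf_counter() - start

# The completion dict reports token usage in OpenAI-style fields.
generated = out["usage"]["completion_tokens"]
print(f"{generated / elapsed:.2f} tok/s")
```

Run it a few times and discard the first result, since the initial call includes model load and warm-up time.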
| Model | Quantization | Verdict | Estimated speed | VRAM needed |
|---|---|---|---|---|
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q8 | Not supported | — | 80GB (have 16GB) |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q4 | Not supported | — | 40GB (have 16GB) |
| ai-forever/ruGPT-3.5-13B | Q8 | Fits comfortably | 12.41 tok/s | 13GB (have 16GB) |
| ai-forever/ruGPT-3.5-13B | Q4 | Fits comfortably | 16.22 tok/s | 7GB (have 16GB) |
| baichuan-inc/Baichuan-M2-32B | Q8 | Not supported | — | 32GB (have 16GB) |
| baichuan-inc/Baichuan-M2-32B | Q4 | Fits (tight) | 11.48 tok/s | 16GB (have 16GB) |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q8 | Fits comfortably | 15.70 tok/s | 7GB (have 16GB) |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 23.37 tok/s | 4GB (have 16GB) |
| ibm-granite/granite-3.3-8b-instruct | Q8 | Fits comfortably | 15.07 tok/s | 8GB (have 16GB) |
| ibm-granite/granite-3.3-8b-instruct | Q4 | Fits comfortably | 22.27 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-1.7B-Base | Q8 | Fits comfortably | 14.65 tok/s | 7GB (have 16GB) |
| Qwen/Qwen3-1.7B-Base | Q4 | Fits comfortably | 20.90 tok/s | 4GB (have 16GB) |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q8 | Not supported | — | 20GB (have 16GB) |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q4 | Fits comfortably | 14.16 tok/s | 10GB (have 16GB) |
| BSC-LT/salamandraTA-7b-instruct | Q8 | Fits comfortably | 14.77 tok/s | 7GB (have 16GB) |
| BSC-LT/salamandraTA-7b-instruct | Q4 | Fits comfortably | 24.75 tok/s | 4GB (have 16GB) |
| dicta-il/dictalm2.0-instruct | Q8 | Fits comfortably | 15.74 tok/s | 7GB (have 16GB) |
| dicta-il/dictalm2.0-instruct | Q4 | Fits comfortably | 21.89 tok/s | 4GB (have 16GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q8 | Not supported | — | 30GB (have 16GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q4 | Fits (tight) | 12.69 tok/s | 15GB (have 16GB) |
| GSAI-ML/LLaDA-8B-Base | Q8 | Fits comfortably | 14.92 tok/s | 8GB (have 16GB) |
| GSAI-ML/LLaDA-8B-Base | Q4 | Fits comfortably | 23.34 tok/s | 4GB (have 16GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q8 | Not supported | — | 30GB (have 16GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q4 | Fits (tight) | 12.66 tok/s | 15GB (have 16GB) |
| Qwen/Qwen2-0.5B-Instruct | Q8 | Fits comfortably | 16.13 tok/s | 5GB (have 16GB) |
| Qwen/Qwen2-0.5B-Instruct | Q4 | Fits comfortably | 26.01 tok/s | 3GB (have 16GB) |
| deepseek-ai/DeepSeek-V3 | Q8 | Fits comfortably | 17.27 tok/s | 7GB (have 16GB) |
| deepseek-ai/DeepSeek-V3 | Q4 | Fits comfortably | 24.26 tok/s | 4GB (have 16GB) |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q8 | Fits comfortably | 17.96 tok/s | 5GB (have 16GB) |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | Fits comfortably | 27.80 tok/s | 3GB (have 16GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q8 | Not supported | — | 30GB (have 16GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q4 | Fits (tight) | 12.89 tok/s | 15GB (have 16GB) |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q8 | Not supported | — | 30GB (have 16GB) |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q4 | Fits (tight) | 12.63 tok/s | 15GB (have 16GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q8 | Not supported | — | 30GB (have 16GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q4 | Fits (tight) | 13.09 tok/s | 15GB (have 16GB) |
| AI-MO/Kimina-Prover-72B | Q8 | Not supported | — | 72GB (have 16GB) |
| AI-MO/Kimina-Prover-72B | Q4 | Not supported | — | 36GB (have 16GB) |
| apple/OpenELM-1_1B-Instruct | Q8 | Fits comfortably | 33.85 tok/s | 1GB (have 16GB) |
| apple/OpenELM-1_1B-Instruct | Q4 | Fits comfortably | 49.85 tok/s | 1GB (have 16GB) |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 14.85 tok/s | 8GB (have 16GB) |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 21.64 tok/s | 4GB (have 16GB) |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q8 | Fits comfortably | 13.74 tok/s | 9GB (have 16GB) |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q4 | Fits comfortably | 20.33 tok/s | 5GB (have 16GB) |
| Qwen/Qwen2.5-3B | Q8 | Fits comfortably | 20.79 tok/s | 3GB (have 16GB) |
| Qwen/Qwen2.5-3B | Q4 | Fits comfortably | 30.26 tok/s | 2GB (have 16GB) |
| lmsys/vicuna-7b-v1.5 | Q8 | Fits comfortably | 16.40 tok/s | 7GB (have 16GB) |
| lmsys/vicuna-7b-v1.5 | Q4 | Fits comfortably | 22.12 tok/s | 4GB (have 16GB) |
| meta-llama/Llama-2-13b-chat-hf | Q8 | Fits comfortably | 13.59 tok/s | 13GB (have 16GB) |
| meta-llama/Llama-2-13b-chat-hf | Q4 | Fits comfortably | 18.04 tok/s | 7GB (have 16GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q8 | Not supported | — | 80GB (have 16GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q4 | Not supported | — | 40GB (have 16GB) |
| unsloth/gemma-3-1b-it | Q8 | Fits comfortably | 31.97 tok/s | 1GB (have 16GB) |
| unsloth/gemma-3-1b-it | Q4 | Fits comfortably | 44.13 tok/s | 1GB (have 16GB) |
| bigcode/starcoder2-3b | Q8 | Fits comfortably | 19.99 tok/s | 3GB (have 16GB) |
| bigcode/starcoder2-3b | Q4 | Fits comfortably | 30.15 tok/s | 2GB (have 16GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q8 | Not supported | — | 80GB (have 16GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q4 | Not supported | — | 40GB (have 16GB) |
| ibm-granite/granite-docling-258M | Q8 | Fits comfortably | 16.86 tok/s | 7GB (have 16GB) |
| ibm-granite/granite-docling-258M | Q4 | Fits comfortably | 22.60 tok/s | 4GB (have 16GB) |
| skt/kogpt2-base-v2 | Q8 | Fits comfortably | 15.22 tok/s | 7GB (have 16GB) |
| skt/kogpt2-base-v2 | Q4 | Fits comfortably | 22.72 tok/s | 4GB (have 16GB) |
| google/gemma-3-270m-it | Q8 | Fits comfortably | 15.73 tok/s | 7GB (have 16GB) |
| google/gemma-3-270m-it | Q4 | Fits comfortably | 23.43 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q8 | Fits comfortably | 17.54 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | Fits comfortably | 29.01 tok/s | 2GB (have 16GB) |
| Qwen/Qwen2.5-32B | Q8 | Not supported | — | 32GB (have 16GB) |
| Qwen/Qwen2.5-32B | Q4 | Fits (tight) | 11.48 tok/s | 16GB (have 16GB) |
| parler-tts/parler-tts-large-v1 | Q8 | Fits comfortably | 16.56 tok/s | 7GB (have 16GB) |
| parler-tts/parler-tts-large-v1 | Q4 | Fits comfortably | 21.35 tok/s | 4GB (have 16GB) |
| EleutherAI/pythia-70m-deduped | Q8 | Fits comfortably | 16.16 tok/s | 7GB (have 16GB) |
| EleutherAI/pythia-70m-deduped | Q4 | Fits comfortably | 23.04 tok/s | 4GB (have 16GB) |
| microsoft/VibeVoice-1.5B | Q8 | Fits comfortably | 16.54 tok/s | 5GB (have 16GB) |
| microsoft/VibeVoice-1.5B | Q4 | Fits comfortably | 24.48 tok/s | 3GB (have 16GB) |
| ibm-granite/granite-3.3-2b-instruct | Q8 | Fits comfortably | 23.12 tok/s | 2GB (have 16GB) |
| ibm-granite/granite-3.3-2b-instruct | Q4 | Fits comfortably | 36.26 tok/s | 1GB (have 16GB) |
| Qwen/Qwen2.5-72B-Instruct | Q8 | Not supported | — | 72GB (have 16GB) |
| Qwen/Qwen2.5-72B-Instruct | Q4 | Not supported | — | 36GB (have 16GB) |
| liuhaotian/llava-v1.5-7b | Q8 | Fits comfortably | 14.57 tok/s | 7GB (have 16GB) |
| liuhaotian/llava-v1.5-7b | Q4 | Fits comfortably | 20.73 tok/s | 4GB (have 16GB) |
| google/gemma-2b | Q8 | Fits comfortably | 22.63 tok/s | 2GB (have 16GB) |
| google/gemma-2b | Q4 | Fits comfortably | 31.98 tok/s | 1GB (have 16GB) |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q8 | Fits comfortably | 16.71 tok/s | 7GB (have 16GB) |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q4 | Fits comfortably | 22.95 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-235B-A22B | Q8 | Not supported | — | 235GB (have 16GB) |
| Qwen/Qwen3-235B-A22B | Q4 | Not supported | — | 118GB (have 16GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q8 | Fits comfortably | 13.83 tok/s | 8GB (have 16GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q4 | Fits comfortably | 22.03 tok/s | 4GB (have 16GB) |
| microsoft/Phi-4-mini-instruct | Q8 | Fits comfortably | 14.84 tok/s | 7GB (have 16GB) |
| microsoft/Phi-4-mini-instruct | Q4 | Fits comfortably | 22.16 tok/s | 4GB (have 16GB) |
| llamafactory/tiny-random-Llama-3 | Q8 | Fits comfortably | 15.16 tok/s | 7GB (have 16GB) |
| llamafactory/tiny-random-Llama-3 | Q4 | Fits comfortably | 23.49 tok/s | 4GB (have 16GB) |
| HuggingFaceH4/zephyr-7b-beta | Q8 | Fits comfortably | 16.73 tok/s | 7GB (have 16GB) |
| HuggingFaceH4/zephyr-7b-beta | Q4 | Fits comfortably | 22.04 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-4B-Thinking-2507 | Q8 | Fits comfortably | 19.81 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-4B-Thinking-2507 | Q4 | Fits comfortably | 30.36 tok/s | 2GB (have 16GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q8 | Not supported | — | 30GB (have 16GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q4 | Fits (tight) | 13.75 tok/s | 15GB (have 16GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q8 | Fits comfortably | 15.22 tok/s | 8GB (have 16GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q4 | Fits comfortably | 21.99 tok/s | 4GB (have 16GB) |
| unsloth/Llama-3.2-1B-Instruct | Q8 | Fits comfortably | 34.69 tok/s | 1GB (have 16GB) |
| unsloth/Llama-3.2-1B-Instruct | Q4 | Fits comfortably | 47.43 tok/s | 1GB (have 16GB) |
| GSAI-ML/LLaDA-8B-Instruct | Q8 | Fits comfortably | 15.41 tok/s | 8GB (have 16GB) |
| GSAI-ML/LLaDA-8B-Instruct | Q4 | Fits comfortably | 23.53 tok/s | 4GB (have 16GB) |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q8 | Not supported | — | 90GB (have 16GB) |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q4 | Not supported | — | 45GB (have 16GB) |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q8 | Fits comfortably | 15.50 tok/s | 7GB (have 16GB) |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q4 | Fits comfortably | 24.29 tok/s | 4GB (have 16GB) |
| numind/NuExtract-1.5 | Q8 | Fits comfortably | 16.12 tok/s | 7GB (have 16GB) |
| numind/NuExtract-1.5 | Q4 | Fits comfortably | 22.49 tok/s | 4GB (have 16GB) |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q8 | Fits comfortably | 15.91 tok/s | 7GB (have 16GB) |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q4 | Fits comfortably | 21.03 tok/s | 4GB (have 16GB) |
| hmellor/tiny-random-LlamaForCausalLM | Q8 | Fits comfortably | 16.20 tok/s | 7GB (have 16GB) |
| hmellor/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 21.83 tok/s | 4GB (have 16GB) |
| huggyllama/llama-7b | Q8 | Fits comfortably | 14.30 tok/s | 7GB (have 16GB) |
| huggyllama/llama-7b | Q4 | Fits comfortably | 20.51 tok/s | 4GB (have 16GB) |
| deepseek-ai/DeepSeek-V3-0324 | Q8 | Fits comfortably | 15.84 tok/s | 7GB (have 16GB) |
| deepseek-ai/DeepSeek-V3-0324 | Q4 | Fits comfortably | 21.21 tok/s | 4GB (have 16GB) |
| microsoft/Phi-3-mini-128k-instruct | Q8 | Fits comfortably | 17.23 tok/s | 7GB (have 16GB) |
| microsoft/Phi-3-mini-128k-instruct | Q4 | Fits comfortably | 23.19 tok/s | 4GB (have 16GB) |
| sshleifer/tiny-gpt2 | Q8 | Fits comfortably | 15.62 tok/s | 7GB (have 16GB) |
| sshleifer/tiny-gpt2 | Q4 | Fits comfortably | 22.25 tok/s | 4GB (have 16GB) |
| meta-llama/Llama-Guard-3-8B | Q8 | Fits comfortably | 14.41 tok/s | 8GB (have 16GB) |
| meta-llama/Llama-Guard-3-8B | Q4 | Fits comfortably | 20.06 tok/s | 4GB (have 16GB) |
| openai-community/gpt2-xl | Q8 | Fits comfortably | 15.19 tok/s | 7GB (have 16GB) |
| openai-community/gpt2-xl | Q4 | Fits comfortably | 23.87 tok/s | 4GB (have 16GB) |
| OpenPipe/Qwen3-14B-Instruct | Q8 | Fits comfortably | 11.31 tok/s | 14GB (have 16GB) |
| OpenPipe/Qwen3-14B-Instruct | Q4 | Fits comfortably | 16.62 tok/s | 7GB (have 16GB) |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q8 | Not supported | — | 70GB (have 16GB) |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q4 | Not supported | — | 35GB (have 16GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q8 | Fits comfortably | 19.12 tok/s | 4GB (have 16GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q4 | Fits comfortably | 26.14 tok/s | 2GB (have 16GB) |
| ibm-research/PowerMoE-3b | Q8 | Fits comfortably | 23.39 tok/s | 3GB (have 16GB) |
| ibm-research/PowerMoE-3b | Q4 | Fits comfortably | 28.58 tok/s | 2GB (have 16GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q8 | Fits comfortably | 18.74 tok/s | 4GB (have 16GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q4 | Fits comfortably | 26.24 tok/s | 2GB (have 16GB) |
| unsloth/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 22.39 tok/s | 3GB (have 16GB) |
| unsloth/Llama-3.2-3B-Instruct | Q4 | Fits comfortably | 31.59 tok/s | 2GB (have 16GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q8 | Fits comfortably | 19.27 tok/s | 4GB (have 16GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q4 | Fits comfortably | 26.14 tok/s | 2GB (have 16GB) |
| meta-llama/Llama-3.2-3B | Q8 | Fits comfortably | 20.04 tok/s | 3GB (have 16GB) |
| meta-llama/Llama-3.2-3B | Q4 | Fits comfortably | 29.54 tok/s | 2GB (have 16GB) |
| EleutherAI/gpt-neo-125m | Q8 | Fits comfortably | 16.36 tok/s | 7GB (have 16GB) |
| EleutherAI/gpt-neo-125m | Q4 | Fits comfortably | 23.06 tok/s | 4GB (have 16GB) |
| codellama/CodeLlama-34b-hf | Q8 | Not supported | — | 34GB (have 16GB) |
| codellama/CodeLlama-34b-hf | Q4 | Not supported | — | 17GB (have 16GB) |
| meta-llama/Llama-Guard-3-1B | Q8 | Fits comfortably | 34.20 tok/s | 1GB (have 16GB) |
| meta-llama/Llama-Guard-3-1B | Q4 | Fits comfortably | 49.33 tok/s | 1GB (have 16GB) |
| Qwen/Qwen2-1.5B-Instruct | Q8 | Fits comfortably | 17.95 tok/s | 5GB (have 16GB) |
| Qwen/Qwen2-1.5B-Instruct | Q4 | Fits comfortably | 27.26 tok/s | 3GB (have 16GB) |
| google/gemma-2-2b-it | Q8 | Fits comfortably | 24.59 tok/s | 2GB (have 16GB) |
| google/gemma-2-2b-it | Q4 | Fits comfortably | 37.58 tok/s | 1GB (have 16GB) |
| Qwen/Qwen2.5-14B | Q8 | Fits comfortably | 12.14 tok/s | 14GB (have 16GB) |
| Qwen/Qwen2.5-14B | Q4 | Fits comfortably | 18.10 tok/s | 7GB (have 16GB) |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q8 | Not supported | — | 32GB (have 16GB) |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q4 | Fits (tight) | 11.69 tok/s | 16GB (have 16GB) |
| microsoft/Phi-3.5-mini-instruct | Q8 | Fits comfortably | 16.31 tok/s | 7GB (have 16GB) |
| microsoft/Phi-3.5-mini-instruct | Q4 | Fits comfortably | 21.09 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-4B-Base | Q8 | Fits comfortably | 19.53 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-4B-Base | Q4 | Fits comfortably | 25.54 tok/s | 2GB (have 16GB) |
| Qwen/Qwen2-7B-Instruct | Q8 | Fits comfortably | 14.98 tok/s | 7GB (have 16GB) |
| Qwen/Qwen2-7B-Instruct | Q4 | Fits comfortably | 20.63 tok/s | 4GB (have 16GB) |
| meta-llama/Llama-2-7b-chat-hf | Q8 | Fits comfortably | 16.57 tok/s | 7GB (have 16GB) |
| meta-llama/Llama-2-7b-chat-hf | Q4 | Fits comfortably | 21.51 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-14B-Base | Q8 | Fits comfortably | 11.94 tok/s | 14GB (have 16GB) |
| Qwen/Qwen3-14B-Base | Q4 | Fits comfortably | 16.03 tok/s | 7GB (have 16GB) |
| swiss-ai/Apertus-8B-Instruct-2509 | Q8 | Fits comfortably | 15.70 tok/s | 8GB (have 16GB) |
| swiss-ai/Apertus-8B-Instruct-2509 | Q4 | Fits comfortably | 19.52 tok/s | 4GB (have 16GB) |
| microsoft/Phi-3.5-vision-instruct | Q8 | Fits comfortably | 15.89 tok/s | 7GB (have 16GB) |
| microsoft/Phi-3.5-vision-instruct | Q4 | Fits comfortably | 23.77 tok/s | 4GB (have 16GB) |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q8 | Fits comfortably | 15.73 tok/s | 7GB (have 16GB) |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q4 | Fits comfortably | 22.97 tok/s | 4GB (have 16GB) |
| rinna/japanese-gpt-neox-small | Q8 | Fits comfortably | 14.75 tok/s | 7GB (have 16GB) |
| rinna/japanese-gpt-neox-small | Q4 | Fits comfortably | 24.25 tok/s | 4GB (have 16GB) |
| Qwen/Qwen2.5-Coder-1.5B | Q8 | Fits comfortably | 18.91 tok/s | 5GB (have 16GB) |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | Fits comfortably | 26.15 tok/s | 3GB (have 16GB) |
| IlyaGusev/saiga_llama3_8b | Q8 | Fits comfortably | 13.78 tok/s | 8GB (have 16GB) |
| IlyaGusev/saiga_llama3_8b | Q4 | Fits comfortably | 19.72 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-30B-A3B | Q8 | Not supported | — | 30GB (have 16GB) |
| Qwen/Qwen3-30B-A3B | Q4 | Fits (tight) | 13.15 tok/s | 15GB (have 16GB) |
| deepseek-ai/DeepSeek-R1 | Q8 | Fits comfortably | 14.52 tok/s | 7GB (have 16GB) |
| deepseek-ai/DeepSeek-R1 | Q4 | Fits comfortably | 22.49 tok/s | 4GB (have 16GB) |
| microsoft/DialoGPT-small | Q8 | Fits comfortably | 15.17 tok/s | 7GB (have 16GB) |
| microsoft/DialoGPT-small | Q4 | Fits comfortably | 21.46 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-8B-FP8 | Q8 | Fits comfortably | 13.94 tok/s | 8GB (have 16GB) |
| Qwen/Qwen3-8B-FP8 | Q4 | Fits comfortably | 20.63 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q8 | Not supported | — | 30GB (have 16GB) |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q4 | Fits (tight) | 13.16 tok/s | 15GB (have 16GB) |
| Qwen/Qwen3-Embedding-4B | Q8 | Fits comfortably | 20.23 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-Embedding-4B | Q4 | Fits comfortably | 24.98 tok/s | 2GB (have 16GB) |
| microsoft/Phi-4-multimodal-instruct | Q8 | Fits comfortably | 16.68 tok/s | 7GB (have 16GB) |
| microsoft/Phi-4-multimodal-instruct | Q4 | Fits comfortably | 21.59 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-8B-Base | Q8 | Fits comfortably | 15.94 tok/s | 8GB (have 16GB) |
| Qwen/Qwen3-8B-Base | Q4 | Fits comfortably | 23.32 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-0.6B-Base | Q8 | Fits comfortably | 16.36 tok/s | 6GB (have 16GB) |
| Qwen/Qwen3-0.6B-Base | Q4 | Fits comfortably | 23.52 tok/s | 3GB (have 16GB) |
| openai-community/gpt2-medium | Q8 | Fits comfortably | 14.99 tok/s | 7GB (have 16GB) |
| openai-community/gpt2-medium | Q4 | Fits comfortably | 24.52 tok/s | 4GB (have 16GB) |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q8 | Fits comfortably | 14.22 tok/s | 7GB (have 16GB) |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 23.05 tok/s | 4GB (have 16GB) |
| Qwen/Qwen2.5-Math-1.5B | Q8 | Fits comfortably | 18.06 tok/s | 5GB (have 16GB) |
| Qwen/Qwen2.5-Math-1.5B | Q4 | Fits comfortably | 23.42 tok/s | 3GB (have 16GB) |
| HuggingFaceTB/SmolLM-135M | Q8 | Fits comfortably | 16.50 tok/s | 7GB (have 16GB) |
| HuggingFaceTB/SmolLM-135M | Q4 | Fits comfortably | 24.55 tok/s | 4GB (have 16GB) |
| unsloth/gpt-oss-20b-BF16 | Q8 | Not supported | — | 20GB (have 16GB) |
| unsloth/gpt-oss-20b-BF16 | Q4 | Fits comfortably | 14.66 tok/s | 10GB (have 16GB) |
| meta-llama/Meta-Llama-3-70B-Instruct | Q8 | Not supported | — | 70GB (have 16GB) |
| meta-llama/Meta-Llama-3-70B-Instruct | Q4 | Not supported | — | 35GB (have 16GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 16.14 tok/s | 8GB (have 16GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 20.14 tok/s | 4GB (have 16GB) |
| zai-org/GLM-4.5-Air | Q8 | Fits comfortably | 14.70 tok/s | 7GB (have 16GB) |
| zai-org/GLM-4.5-Air | Q4 | Fits comfortably | 21.96 tok/s | 4GB (have 16GB) |
| mistralai/Mistral-7B-Instruct-v0.1 | Q8 | Fits comfortably | 15.76 tok/s | 7GB (have 16GB) |
| mistralai/Mistral-7B-Instruct-v0.1 | Q4 | Fits comfortably | 23.32 tok/s | 4GB (have 16GB) |
| LiquidAI/LFM2-1.2B | Q8 | Fits comfortably | 24.37 tok/s | 2GB (have 16GB) |
| LiquidAI/LFM2-1.2B | Q4 | Fits comfortably | 33.90 tok/s | 1GB (have 16GB) |
| mistralai/Mistral-7B-v0.1 | Q8 | Fits comfortably | 14.69 tok/s | 7GB (have 16GB) |
| mistralai/Mistral-7B-v0.1 | Q4 | Fits comfortably | 22.20 tok/s | 4GB (have 16GB) |
| Qwen/Qwen2.5-32B-Instruct | Q8 | Not supported | — | 32GB (have 16GB) |
| Qwen/Qwen2.5-32B-Instruct | Q4 | Fits (tight) | 12.79 tok/s | 16GB (have 16GB) |
| deepseek-ai/DeepSeek-R1-0528 | Q8 | Fits comfortably | 15.40 tok/s | 7GB (have 16GB) |
| deepseek-ai/DeepSeek-R1-0528 | Q4 | Fits comfortably | 22.74 tok/s | 4GB (have 16GB) |
| meta-llama/Llama-3.1-8B | Q8 | Fits comfortably | 15.79 tok/s | 8GB (have 16GB) |
| meta-llama/Llama-3.1-8B | Q4 | Fits comfortably | 22.28 tok/s | 4GB (have 16GB) |
| deepseek-ai/DeepSeek-V3.1 | Q8 | Fits comfortably | 15.42 tok/s | 7GB (have 16GB) |
| deepseek-ai/DeepSeek-V3.1 | Q4 | Fits comfortably | 23.09 tok/s | 4GB (have 16GB) |
| microsoft/phi-4 | Q8 | Fits comfortably | 14.57 tok/s | 7GB (have 16GB) |
| microsoft/phi-4 | Q4 | Fits comfortably | 24.66 tok/s | 4GB (have 16GB) |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | Fits comfortably | 19.41 tok/s | 3GB (have 16GB) |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | Fits comfortably | 31.45 tok/s | 2GB (have 16GB) |
| Qwen/Qwen2-0.5B | Q8 | Fits comfortably | 17.13 tok/s | 5GB (have 16GB) |
| Qwen/Qwen2-0.5B | Q4 | Fits comfortably | 27.85 tok/s | 3GB (have 16GB) |
| MiniMaxAI/MiniMax-M2 | Q8 | Fits comfortably | 15.06 tok/s | 7GB (have 16GB) |
| MiniMaxAI/MiniMax-M2 | Q4 | Fits comfortably | 22.04 tok/s | 4GB (have 16GB) |
| microsoft/DialoGPT-medium | Q8 | Fits comfortably | 16.55 tok/s | 7GB (have 16GB) |
| microsoft/DialoGPT-medium | Q4 | Fits comfortably | 21.42 tok/s | 4GB (have 16GB) |
| zai-org/GLM-4.6-FP8 | Q8 | Fits comfortably | 15.41 tok/s | 7GB (have 16GB) |
| zai-org/GLM-4.6-FP8 | Q4 | Fits comfortably | 24.27 tok/s | 4GB (have 16GB) |
| HuggingFaceTB/SmolLM2-135M | Q8 | Fits comfortably | 14.30 tok/s | 7GB (have 16GB) |
| HuggingFaceTB/SmolLM2-135M | Q4 | Fits comfortably | 24.14 tok/s | 4GB (have 16GB) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | Fits comfortably | 14.03 tok/s | 8GB (have 16GB) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q4 | Fits comfortably | 22.23 tok/s | 4GB (have 16GB) |
| meta-llama/Llama-2-7b-hf | Q8 | Fits comfortably | 16.73 tok/s | 7GB (have 16GB) |
| meta-llama/Llama-2-7b-hf | Q4 | Fits comfortably | 20.80 tok/s | 4GB (have 16GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | Fits comfortably | 14.67 tok/s | 7GB (have 16GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | Fits comfortably | 21.51 tok/s | 4GB (have 16GB) |
| microsoft/phi-2 | Q8 | Fits comfortably | 16.47 tok/s | 7GB (have 16GB) |
| microsoft/phi-2 | Q4 | Fits comfortably | 20.83 tok/s | 4GB (have 16GB) |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | Not supported | — | 70GB (have 16GB) |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | Not supported | — | 35GB (have 16GB) |
| Qwen/Qwen2.5-0.5B | Q8 | Fits comfortably | 17.29 tok/s | 5GB (have 16GB) |
| Qwen/Qwen2.5-0.5B | Q4 | Fits comfortably | 26.89 tok/s | 3GB (have 16GB) |
| Qwen/Qwen3-14B | Q8 | Fits comfortably | 12.82 tok/s | 14GB (have 16GB) |
| Qwen/Qwen3-14B | Q4 | Fits comfortably | 16.76 tok/s | 7GB (have 16GB) |
| Qwen/Qwen3-Embedding-8B | Q8 | Fits comfortably | 14.14 tok/s | 8GB (have 16GB) |
| Qwen/Qwen3-Embedding-8B | Q4 | Fits comfortably | 23.55 tok/s | 4GB (have 16GB) |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | Not supported | — | 70GB (have 16GB) |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | Not supported | — | 35GB (have 16GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q8 | Fits comfortably | 15.05 tok/s | 8GB (have 16GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q4 | Fits comfortably | 19.41 tok/s | 4GB (have 16GB) |
| Qwen/Qwen2.5-14B-Instruct | Q8 | Fits comfortably | 11.49 tok/s | 14GB (have 16GB) |
| Qwen/Qwen2.5-14B-Instruct | Q4 | Fits comfortably | 16.63 tok/s | 7GB (have 16GB) |
| Qwen/Qwen2.5-1.5B | Q8 | Fits comfortably | 16.74 tok/s | 5GB (have 16GB) |
| Qwen/Qwen2.5-1.5B | Q4 | Fits comfortably | 24.77 tok/s | 3GB (have 16GB) |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | Fits comfortably | 19.29 tok/s | 4GB (have 16GB) |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q4 | Fits comfortably | 27.79 tok/s | 2GB (have 16GB) |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q8 | Not supported | — | 20GB (have 16GB) |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q4 | Fits comfortably | 16.31 tok/s | 10GB (have 16GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q8 | Fits comfortably | 16.90 tok/s | 5GB (have 16GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q4 | Fits comfortably | 23.87 tok/s | 3GB (have 16GB) |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | Fits comfortably | 14.74 tok/s | 8GB (have 16GB) |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | Fits comfortably | 19.39 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-Reranker-0.6B | Q8 | Fits comfortably | 15.85 tok/s | 6GB (have 16GB) |
| Qwen/Qwen3-Reranker-0.6B | Q4 | Fits comfortably | 23.66 tok/s | 3GB (have 16GB) |
| rednote-hilab/dots.ocr | Q8 | Fits comfortably | 15.68 tok/s | 7GB (have 16GB) |
| rednote-hilab/dots.ocr | Q4 | Fits comfortably | 23.34 tok/s | 4GB (have 16GB) |
| google-t5/t5-3b | Q8 | Fits comfortably | 23.33 tok/s | 3GB (have 16GB) |
| google-t5/t5-3b | Q4 | Fits comfortably | 33.06 tok/s | 2GB (have 16GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q8 | Not supported | — | 30GB (have 16GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q4 | Fits (tight) | 11.76 tok/s | 15GB (have 16GB) |
| Qwen/Qwen3-4B | Q8 | Fits comfortably | 19.57 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-4B | Q4 | Fits comfortably | 28.61 tok/s | 2GB (have 16GB) |
| Qwen/Qwen3-1.7B | Q8 | Fits comfortably | 16.41 tok/s | 7GB (have 16GB) |
| Qwen/Qwen3-1.7B | Q4 | Fits comfortably | 22.45 tok/s | 4GB (have 16GB) |
| openai-community/gpt2-large | Q8 | Fits comfortably | 14.87 tok/s | 7GB (have 16GB) |
| openai-community/gpt2-large | Q4 | Fits comfortably | 24.43 tok/s | 4GB (have 16GB) |
| microsoft/Phi-3-mini-4k-instruct | Q8 | Fits comfortably | 16.78 tok/s | 7GB (have 16GB) |
| microsoft/Phi-3-mini-4k-instruct | Q4 | Fits comfortably | 20.38 tok/s | 4GB (have 16GB) |
| allenai/OLMo-2-0425-1B | Q8 | Fits comfortably | 28.90 tok/s | 1GB (have 16GB) |
| allenai/OLMo-2-0425-1B | Q4 | Fits comfortably | 46.23 tok/s | 1GB (have 16GB) |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q8 | Not supported | — | 80GB (have 16GB) |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q4 | Not supported | — | 40GB (have 16GB) |
| Qwen/Qwen3-32B | Q8 | Not supported | — | 32GB (have 16GB) |
| Qwen/Qwen3-32B | Q4 | Fits (tight) | 12.60 tok/s | 16GB (have 16GB) |
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | Fits comfortably | 17.25 tok/s | 5GB (have 16GB) |
| Qwen/Qwen2.5-0.5B-Instruct | Q4 | Fits comfortably | 24.05 tok/s | 3GB (have 16GB) |
| Qwen/Qwen2.5-7B | Q8 | Fits comfortably | 14.99 tok/s | 7GB (have 16GB) |
| Qwen/Qwen2.5-7B | Q4 | Fits comfortably | 20.39 tok/s | 4GB (have 16GB) |
| meta-llama/Meta-Llama-3-8B | Q8 | Fits comfortably | 14.37 tok/s | 8GB (have 16GB) |
| meta-llama/Meta-Llama-3-8B | Q4 | Fits comfortably | 21.47 tok/s | 4GB (have 16GB) |
| meta-llama/Llama-3.2-1B | Q8 | Fits comfortably | 33.27 tok/s | 1GB (have 16GB) |
| meta-llama/Llama-3.2-1B | Q4 | Fits comfortably | 49.01 tok/s | 1GB (have 16GB) |
| petals-team/StableBeluga2 | Q8 | Fits comfortably | 15.95 tok/s | 7GB (have 16GB) |
| petals-team/StableBeluga2 | Q4 | Fits comfortably | 23.76 tok/s | 4GB (have 16GB) |
| vikhyatk/moondream2 | Q8 | Fits comfortably | 16.07 tok/s | 7GB (have 16GB) |
| vikhyatk/moondream2 | Q4 | Fits comfortably | 22.37 tok/s | 4GB (have 16GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 20.50 tok/s | 3GB (have 16GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | Fits comfortably | 29.98 tok/s | 2GB (have 16GB) |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q8 | Not supported | — | 70GB (have 16GB) |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q4 | Not supported | — | 35GB (have 16GB) |
| distilbert/distilgpt2 | Q8 | Fits comfortably | 17.20 tok/s | 7GB (have 16GB) |
| distilbert/distilgpt2 | Q4 | Fits comfortably | 24.58 tok/s | 4GB (have 16GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q8 | Not supported | — | 32GB (have 16GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q4 | Fits (tight) | 12.66 tok/s | 16GB (have 16GB) |
| inference-net/Schematron-3B | Q8 | Fits comfortably | 20.88 tok/s | 3GB (have 16GB) |
| inference-net/Schematron-3B | Q4 | Fits comfortably | 33.09 tok/s | 2GB (have 16GB) |
| Qwen/Qwen3-8B | Q8 | Fits comfortably | 14.95 tok/s | 8GB (have 16GB) |
| Qwen/Qwen3-8B | Q4 | Fits comfortably | 20.54 tok/s | 4GB (have 16GB) |
| mistralai/Mistral-7B-Instruct-v0.2 | Q8 | Fits comfortably | 14.71 tok/s | 7GB (have 16GB) |
| mistralai/Mistral-7B-Instruct-v0.2 | Q4 | Fits comfortably | 22.48 tok/s | 4GB (have 16GB) |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q8 | Fits comfortably | 21.54 tok/s | 3GB (have 16GB) |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | Fits comfortably | 28.00 tok/s | 2GB (have 16GB) |
| bigscience/bloomz-560m | Q8 | Fits comfortably | 16.56 tok/s | 7GB (have 16GB) |
| bigscience/bloomz-560m | Q4 | Fits comfortably | 21.25 tok/s | 4GB (have 16GB) |
| Qwen/Qwen2.5-3B-Instruct | Q8 | Fits comfortably | 22.24 tok/s | 3GB (have 16GB) |
| Qwen/Qwen2.5-3B-Instruct | Q4 | Fits comfortably | 28.80 tok/s | 2GB (have 16GB) |
| openai/gpt-oss-120b | Q8 | Not supported | — | 120GB (have 16GB) |
| openai/gpt-oss-120b | Q4 | Not supported | — | 60GB (have 16GB) |
| meta-llama/Llama-3.2-1B-Instruct | Q8 | Fits comfortably | 28.68 tok/s | 1GB (have 16GB) |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | Fits comfortably | 48.53 tok/s | 1GB (have 16GB) |
| Qwen/Qwen3-4B-Instruct-2507 | Q8 | Fits comfortably | 19.19 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-4B-Instruct-2507 | Q4 | Fits comfortably | 29.98 tok/s | 2GB (have 16GB) |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | Fits comfortably | 16.49 tok/s | 7GB (have 16GB) |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q4 | Fits comfortably | 20.56 tok/s | 4GB (have 16GB) |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | Fits comfortably | 34.33 tok/s | 1GB (have 16GB) |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | Fits comfortably | 49.50 tok/s | 1GB (have 16GB) |
| facebook/opt-125m | Q8 | Fits comfortably | 14.61 tok/s | 7GB (have 16GB) |
| facebook/opt-125m | Q4 | Fits comfortably | 20.67 tok/s | 4GB (have 16GB) |
| Qwen/Qwen2.5-1.5B-Instruct | Q8 | Fits comfortably | 17.69 tok/s | 5GB (have 16GB) |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | Fits comfortably | 23.81 tok/s | 3GB (have 16GB) |
| Qwen/Qwen3-Embedding-0.6B | Q8 | Fits comfortably | 17.96 tok/s | 6GB (have 16GB) |
| Qwen/Qwen3-Embedding-0.6B | Q4 | Fits comfortably | 23.78 tok/s | 3GB (have 16GB) |
| google/gemma-3-1b-it | Q8 | Fits comfortably | 30.61 tok/s | 1GB (have 16GB) |
| google/gemma-3-1b-it | Q4 | Fits comfortably | 45.31 tok/s | 1GB (have 16GB) |
| openai/gpt-oss-20b | Q8 | Not supported | — | 20GB (have 16GB) |
| openai/gpt-oss-20b | Q4 | Fits comfortably | 15.78 tok/s | 10GB (have 16GB) |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q8 | Not supported | — | 34GB (have 16GB) |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q4 | Not supported | — | 17GB (have 16GB) |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 16.20 tok/s | 8GB (have 16GB) |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 19.33 tok/s | 4GB (have 16GB) |
| Gensyn/Qwen2.5-0.5B-Instruct | Q8 | Fits comfortably | 18.59 tok/s | 5GB (have 16GB) |
| Gensyn/Qwen2.5-0.5B-Instruct | Q4 | Fits comfortably | 27.51 tok/s | 3GB (have 16GB) |
| Qwen/Qwen3-0.6B | Q8 | Fits comfortably | 15.56 tok/s | 6GB (have 16GB) |
| Qwen/Qwen3-0.6B | Q4 | Fits comfortably | 23.01 tok/s | 3GB (have 16GB) |
| Qwen/Qwen2.5-7B-Instruct | Q8 | Fits comfortably | 14.61 tok/s | 7GB (have 16GB) |
| Qwen/Qwen2.5-7B-Instruct | Q4 | Fits comfortably | 21.38 tok/s | 4GB (have 16GB) |
| openai-community/gpt2 | Q8 | Fits comfortably | 15.37 tok/s | 7GB (have 16GB) |
| openai-community/gpt2 | Q4 | Fits comfortably | 24.57 tok/s | 4GB (have 16GB) |
Note: These performance figures are calculated estimates, not measured benchmarks; real results may vary.
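The verdicts in this table reduce to a comparison between estimated VRAM needs and the card's 16GB. Here is a sketch of that decision rule, with thresholds inferred from the table itself (15-16GB needed on a 16GB card reads as "Fits (tight)"); the exact cutoffs are assumptions, not the site's published logic:

```python
def fit_verdict(needed_gb: int, have_gb: int = 16) -> str:
    """Classify a model/quant combo against available VRAM.

    Thresholds inferred from the table: needing more than the card has
    is "Not supported"; within ~1GB of capacity is "Fits (tight)".
    """
    if needed_gb > have_gb:
        return "Not supported"
    if needed_gb >= have_gb - 1:  # assumption: within ~1GB of capacity
        return "Fits (tight)"
    return "Fits comfortably"

print(fit_verdict(32))  # Not supported
print(fit_verdict(16))  # Fits (tight)
print(fit_verdict(8))   # Fits comfortably
```

Tight fits leave little headroom for the KV cache, so long contexts or large batch sizes may still fail on combinations marked "Fits (tight)".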
See also: how the RTX 4060 Ti 16GB, RTX 3080, and RTX 3090 stack up for local inference workloads.