Quick Answer: The RTX 4080 offers 16GB of VRAM and starts around $1,199. It delivers an estimated 105 tokens/sec on unsloth/gemma-3-1b-it at Q4 and typically draws 320W under load.
The RTX 4080 balances throughput and efficiency. It handles 8B–13B models comfortably, stretches to 30B-class models with 4-bit quantization, and stays manageable in power and thermals. 70B models exceed its 16GB even at Q4 (roughly 40GB needed), so they are out of reach without offloading to system RAM.
| Model | Quantization | Tokens/sec (estimated) | VRAM used |
|---|---|---|---|
| unsloth/gemma-3-1b-it | Q4 | 105.18 tok/s | 1GB |
| allenai/OLMo-2-0425-1B | Q4 | 104.81 tok/s | 1GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | 104.74 tok/s | 1GB |
| meta-llama/Llama-3.2-1B | Q4 | 102.82 tok/s | 1GB |
| apple/OpenELM-1_1B-Instruct | Q4 | 101.29 tok/s | 1GB |
| google/gemma-3-1b-it | Q4 | 93.27 tok/s | 1GB |
| meta-llama/Llama-Guard-3-1B | Q4 | 91.88 tok/s | 1GB |
| unsloth/Llama-3.2-1B-Instruct | Q4 | 90.38 tok/s | 1GB |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | 86.41 tok/s | 1GB |
| google/gemma-2-2b-it | Q4 | 82.37 tok/s | 1GB |
| LiquidAI/LFM2-1.2B | Q4 | 81.45 tok/s | 1GB |
| ibm-granite/granite-3.3-2b-instruct | Q4 | 80.99 tok/s | 1GB |
| apple/OpenELM-1_1B-Instruct | Q8 | 72.45 tok/s | 1GB |
| unsloth/Llama-3.2-3B-Instruct | Q4 | 70.28 tok/s | 2GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | 69.33 tok/s | 1GB |
| google/gemma-2b | Q4 | 69.25 tok/s | 1GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 69.15 tok/s | 2GB |
| bigcode/starcoder2-3b | Q4 | 68.96 tok/s | 2GB |
| meta-llama/Llama-3.2-1B | Q8 | 68.77 tok/s | 1GB |
| google/gemma-3-1b-it | Q8 | 68.40 tok/s | 1GB |
| allenai/OLMo-2-0425-1B | Q8 | 68.34 tok/s | 1GB |
| unsloth/gemma-3-1b-it | Q8 | 68.27 tok/s | 1GB |
| meta-llama/Llama-3.2-3B | Q4 | 68.01 tok/s | 2GB |
| inference-net/Schematron-3B | Q4 | 67.83 tok/s | 2GB |
| Qwen/Qwen2.5-3B-Instruct | Q4 | 67.59 tok/s | 2GB |
| meta-llama/Llama-3.2-1B-Instruct | Q8 | 66.54 tok/s | 1GB |
| ibm-research/PowerMoE-3b | Q4 | 65.30 tok/s | 2GB |
| unsloth/Llama-3.2-1B-Instruct | Q8 | 64.84 tok/s | 1GB |
| meta-llama/Llama-Guard-3-1B | Q8 | 64.55 tok/s | 1GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 64.48 tok/s | 2GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | 61.71 tok/s | 3GB |
| Qwen/Qwen3-4B-Base | Q4 | 61.58 tok/s | 2GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | 61.08 tok/s | 2GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | 60.72 tok/s | 2GB |
| google-t5/t5-3b | Q4 | 60.49 tok/s | 2GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | 60.36 tok/s | 2GB |
| Qwen/Qwen3-Embedding-4B | Q4 | 60.03 tok/s | 2GB |
| Qwen/Qwen2.5-3B | Q4 | 58.72 tok/s | 2GB |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | 58.65 tok/s | 3GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q4 | 58.43 tok/s | 2GB |
| Qwen/Qwen2-0.5B | Q4 | 58.08 tok/s | 3GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q4 | 57.60 tok/s | 2GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | 57.14 tok/s | 3GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q4 | 57.12 tok/s | 3GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q4 | 56.69 tok/s | 2GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q4 | 56.54 tok/s | 2GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q4 | 55.85 tok/s | 2GB |
| Qwen/Qwen2.5-1.5B | Q4 | 55.06 tok/s | 3GB |
| Qwen/Qwen2-1.5B-Instruct | Q4 | 54.08 tok/s | 3GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | 54.08 tok/s | 3GB |
| google/gemma-2-2b-it | Q8 | 53.78 tok/s | 2GB |
| Qwen/Qwen2.5-Math-1.5B | Q4 | 53.71 tok/s | 3GB |
| Qwen/Qwen3-4B | Q4 | 53.39 tok/s | 2GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q4 | 53.23 tok/s | 2GB |
| google/gemma-2b | Q8 | 52.93 tok/s | 2GB |
| Qwen/Qwen3-0.6B | Q4 | 52.82 tok/s | 3GB |
| google/gemma-3-270m-it | Q4 | 52.36 tok/s | 4GB |
| Qwen/Qwen3-1.7B-Base | Q4 | 52.21 tok/s | 4GB |
| lmsys/vicuna-7b-v1.5 | Q4 | 52.18 tok/s | 4GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q4 | 52.14 tok/s | 4GB |
| mistralai/Mistral-7B-v0.1 | Q4 | 52.12 tok/s | 4GB |
| HuggingFaceH4/zephyr-7b-beta | Q4 | 51.86 tok/s | 4GB |
| microsoft/Phi-4-mini-instruct | Q4 | 51.73 tok/s | 4GB |
| microsoft/VibeVoice-1.5B | Q4 | 51.55 tok/s | 3GB |
| MiniMaxAI/MiniMax-M2 | Q4 | 51.49 tok/s | 4GB |
| vikhyatk/moondream2 | Q4 | 51.43 tok/s | 4GB |
| HuggingFaceTB/SmolLM-135M | Q4 | 51.27 tok/s | 4GB |
| parler-tts/parler-tts-large-v1 | Q4 | 51.12 tok/s | 4GB |
| meta-llama/Llama-2-7b-chat-hf | Q4 | 50.84 tok/s | 4GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q4 | 50.74 tok/s | 3GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q4 | 50.63 tok/s | 3GB |
| HuggingFaceTB/SmolLM2-135M | Q4 | 50.50 tok/s | 4GB |
| LiquidAI/LFM2-1.2B | Q8 | 50.45 tok/s | 2GB |
| EleutherAI/gpt-neo-125m | Q4 | 50.42 tok/s | 4GB |
| microsoft/DialoGPT-medium | Q4 | 50.40 tok/s | 4GB |
| meta-llama/Llama-2-7b-hf | Q4 | 50.29 tok/s | 4GB |
| openai-community/gpt2-medium | Q4 | 50.26 tok/s | 4GB |
| sshleifer/tiny-gpt2 | Q4 | 50.26 tok/s | 4GB |
| microsoft/phi-4 | Q4 | 50.25 tok/s | 4GB |
| hmellor/tiny-random-LlamaForCausalLM | Q4 | 50.14 tok/s | 4GB |
| BSC-LT/salamandraTA-7b-instruct | Q4 | 49.99 tok/s | 4GB |
| Qwen/Qwen3-Reranker-0.6B | Q4 | 49.95 tok/s | 3GB |
| petals-team/StableBeluga2 | Q4 | 49.72 tok/s | 4GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q4 | 49.60 tok/s | 4GB |
| microsoft/Phi-4-multimodal-instruct | Q4 | 49.55 tok/s | 4GB |
| meta-llama/Llama-3.1-8B | Q4 | 49.52 tok/s | 4GB |
| Qwen/Qwen2-0.5B-Instruct | Q4 | 49.39 tok/s | 3GB |
| ibm-granite/granite-docling-258M | Q4 | 49.36 tok/s | 4GB |
| llamafactory/tiny-random-Llama-3 | Q4 | 49.36 tok/s | 4GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | 49.33 tok/s | 4GB |
| ibm-granite/granite-3.3-2b-instruct | Q8 | 49.32 tok/s | 2GB |
| deepseek-ai/DeepSeek-R1 | Q4 | 49.05 tok/s | 4GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q4 | 48.86 tok/s | 4GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q8 | 48.81 tok/s | 3GB |
| Qwen/Qwen3-Embedding-0.6B | Q4 | 48.76 tok/s | 3GB |
| Qwen/Qwen2.5-0.5B | Q4 | 48.66 tok/s | 3GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | 48.60 tok/s | 4GB |
| zai-org/GLM-4.6-FP8 | Q4 | 48.55 tok/s | 4GB |
| openai-community/gpt2-large | Q4 | 48.53 tok/s | 4GB |
| Qwen/Qwen3-0.6B-Base | Q4 | 48.48 tok/s | 3GB |
| zai-org/GLM-4.5-Air | Q4 | 48.48 tok/s | 4GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | 48.47 tok/s | 3GB |
| inference-net/Schematron-3B | Q8 | 48.44 tok/s | 3GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | 48.44 tok/s | 4GB |
| dicta-il/dictalm2.0-instruct | Q4 | 48.31 tok/s | 4GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q4 | 48.26 tok/s | 4GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | 48.04 tok/s | 5GB |
| skt/kogpt2-base-v2 | Q4 | 47.91 tok/s | 4GB |
| microsoft/Phi-3-mini-4k-instruct | Q4 | 47.68 tok/s | 4GB |
| openai-community/gpt2 | Q4 | 47.46 tok/s | 4GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | 47.35 tok/s | 3GB |
| microsoft/Phi-3-mini-128k-instruct | Q4 | 47.20 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | 47.01 tok/s | 4GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q4 | 46.99 tok/s | 4GB |
| unsloth/Llama-3.2-3B-Instruct | Q8 | 46.89 tok/s | 3GB |
| deepseek-ai/DeepSeek-V3-0324 | Q4 | 46.77 tok/s | 4GB |
| Qwen/Qwen3-8B-FP8 | Q4 | 46.72 tok/s | 4GB |
| bigscience/bloomz-560m | Q4 | 46.70 tok/s | 4GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q4 | 46.61 tok/s | 4GB |
| google-t5/t5-3b | Q8 | 46.60 tok/s | 3GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q4 | 46.55 tok/s | 4GB |
| Qwen/Qwen3-1.7B | Q4 | 46.46 tok/s | 4GB |
| ibm-research/PowerMoE-3b | Q8 | 46.42 tok/s | 3GB |
| GSAI-ML/LLaDA-8B-Base | Q4 | 46.28 tok/s | 4GB |
| rednote-hilab/dots.ocr | Q4 | 46.23 tok/s | 4GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q4 | 46.15 tok/s | 4GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q4 | 46.15 tok/s | 4GB |
| Qwen/Qwen2.5-7B | Q4 | 46.14 tok/s | 4GB |
| rinna/japanese-gpt-neox-small | Q4 | 46.05 tok/s | 4GB |
| microsoft/phi-2 | Q4 | 46.03 tok/s | 4GB |
| facebook/opt-125m | Q4 | 45.71 tok/s | 4GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 45.64 tok/s | 4GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q4 | 45.45 tok/s | 4GB |
| microsoft/DialoGPT-small | Q4 | 45.39 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3.1 | Q4 | 45.38 tok/s | 4GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 45.36 tok/s | 5GB |
| NousResearch/Hermes-3-Llama-3.1-8B | Q4 | 45.18 tok/s | 5GB |
| microsoft/Phi-3.5-vision-instruct | Q4 | 44.97 tok/s | 4GB |
| Qwen/Qwen2-7B-Instruct | Q4 | 44.83 tok/s | 4GB |
| Qwen/Qwen3-Embedding-8B | Q4 | 44.81 tok/s | 4GB |
| Qwen/Qwen2.5-3B-Instruct | Q8 | 44.66 tok/s | 3GB |
| ibm-granite/granite-3.3-8b-instruct | Q4 | 44.64 tok/s | 4GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q4 | 44.43 tok/s | 4GB |
| distilbert/distilgpt2 | Q4 | 44.38 tok/s | 4GB |
| numind/NuExtract-1.5 | Q4 | 44.31 tok/s | 4GB |
| EleutherAI/pythia-70m-deduped | Q4 | 44.26 tok/s | 4GB |
| Qwen/Qwen3-4B | Q8 | 44.23 tok/s | 4GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q4 | 44.19 tok/s | 5GB |
| liuhaotian/llava-v1.5-7b | Q4 | 44.12 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3 | Q4 | 44.11 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q4 | 44.08 tok/s | 4GB |
| openai-community/gpt2-xl | Q4 | 44.01 tok/s | 4GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q4 | 43.57 tok/s | 4GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | 43.57 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-0528 | Q4 | 43.29 tok/s | 4GB |
| IlyaGusev/saiga_llama3_8b | Q4 | 42.98 tok/s | 4GB |
| huggyllama/llama-7b | Q4 | 42.88 tok/s | 4GB |
| meta-llama/Llama-3.2-3B | Q8 | 42.84 tok/s | 3GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q4 | 42.76 tok/s | 4GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q8 | 42.68 tok/s | 4GB |
| meta-llama/Llama-Guard-3-8B | Q4 | 42.59 tok/s | 4GB |
| Qwen/Qwen3-8B-Base | Q4 | 42.46 tok/s | 4GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | 41.92 tok/s | 4GB |
| google/gemma-2-9b-it | Q4 | 41.48 tok/s | 6GB |
| Qwen/Qwen3-8B | Q4 | 41.45 tok/s | 4GB |
| bigcode/starcoder2-3b | Q8 | 41.41 tok/s | 3GB |
| Qwen/Qwen2.5-Math-1.5B | Q8 | 41.17 tok/s | 5GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q8 | 41.13 tok/s | 5GB |
| Qwen/Qwen2.5-3B | Q8 | 41.07 tok/s | 3GB |
| meta-llama/Meta-Llama-3-8B | Q4 | 41.06 tok/s | 4GB |
| GSAI-ML/LLaDA-8B-Instruct | Q4 | 41.03 tok/s | 4GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q4 | 41.01 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q8 | 40.99 tok/s | 4GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q8 | 40.92 tok/s | 5GB |
| Qwen/Qwen2-1.5B-Instruct | Q8 | 40.75 tok/s | 5GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | 40.25 tok/s | 5GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q8 | 40.05 tok/s | 4GB |
| Qwen/Qwen2.5-0.5B | Q8 | 39.96 tok/s | 5GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q8 | 39.84 tok/s | 4GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q8 | 39.75 tok/s | 4GB |
| Qwen/Qwen3-14B | Q4 | 39.32 tok/s | 7GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | 39.18 tok/s | 5GB |
| Qwen/Qwen3-14B-Base | Q4 | 39.00 tok/s | 7GB |
| Qwen/Qwen3-4B-Base | Q8 | 38.92 tok/s | 4GB |
| Qwen/Qwen3-0.6B | Q8 | 38.78 tok/s | 6GB |
| Qwen/Qwen3-Embedding-4B | Q8 | 38.37 tok/s | 4GB |
| Qwen/Qwen3-0.6B-Base | Q8 | 38.36 tok/s | 6GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q8 | 37.97 tok/s | 4GB |
| meta-llama/Llama-2-13b-chat-hf | Q4 | 37.27 tok/s | 7GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q8 | 37.22 tok/s | 5GB |
| ai-forever/ruGPT-3.5-13B | Q4 | 37.11 tok/s | 7GB |
| microsoft/VibeVoice-1.5B | Q8 | 36.93 tok/s | 5GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | 36.86 tok/s | 4GB |
| microsoft/Phi-3-mini-4k-instruct | Q8 | 36.65 tok/s | 7GB |
| llamafactory/tiny-random-Llama-3 | Q8 | 36.56 tok/s | 7GB |
| MiniMaxAI/MiniMax-M2 | Q8 | 36.54 tok/s | 7GB |
| facebook/opt-125m | Q8 | 36.42 tok/s | 7GB |
| zai-org/GLM-4.6-FP8 | Q8 | 36.30 tok/s | 7GB |
| Qwen/Qwen3-Reranker-0.6B | Q8 | 36.14 tok/s | 6GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | 36.13 tok/s | 7GB |
| microsoft/Phi-4-multimodal-instruct | Q8 | 36.07 tok/s | 7GB |
| Qwen/Qwen2.5-Coder-1.5B | Q8 | 35.97 tok/s | 5GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q8 | 35.79 tok/s | 5GB |
| HuggingFaceH4/zephyr-7b-beta | Q8 | 35.76 tok/s | 7GB |
| deepseek-ai/DeepSeek-V3 | Q8 | 35.74 tok/s | 7GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q8 | 35.65 tok/s | 7GB |
| openai-community/gpt2-medium | Q8 | 35.41 tok/s | 7GB |
| Qwen/Qwen3-1.7B | Q8 | 35.39 tok/s | 7GB |
| microsoft/Phi-3-medium-128k-instruct | Q4 | 35.36 tok/s | 8GB |
| Qwen/Qwen2.5-1.5B | Q8 | 35.36 tok/s | 5GB |
| Qwen/Qwen2-0.5B | Q8 | 35.23 tok/s | 5GB |
| Qwen/Qwen2-0.5B-Instruct | Q8 | 35.08 tok/s | 5GB |
| meta-llama/Llama-2-7b-hf | Q8 | 35.08 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1 | Q8 | 35.07 tok/s | 7GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q8 | 34.95 tok/s | 7GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | 34.88 tok/s | 7GB |
| parler-tts/parler-tts-large-v1 | Q8 | 34.87 tok/s | 7GB |
| OpenPipe/Qwen3-14B-Instruct | Q4 | 34.87 tok/s | 7GB |
| microsoft/DialoGPT-small | Q8 | 34.84 tok/s | 7GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | 34.79 tok/s | 7GB |
| liuhaotian/llava-v1.5-7b | Q8 | 34.76 tok/s | 7GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q8 | 34.75 tok/s | 7GB |
| petals-team/StableBeluga2 | Q8 | 34.74 tok/s | 7GB |
| lmsys/vicuna-7b-v1.5 | Q8 | 34.71 tok/s | 7GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | 34.65 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-0528 | Q8 | 34.59 tok/s | 7GB |
| meta-llama/Llama-2-7b-chat-hf | Q8 | 34.44 tok/s | 7GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | 34.39 tok/s | 9GB |
| Qwen/Qwen3-Embedding-0.6B | Q8 | 34.34 tok/s | 6GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | 34.34 tok/s | 7GB |
| NousResearch/Hermes-3-Llama-3.1-8B | Q8 | 34.28 tok/s | 9GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q4 | 34.16 tok/s | 10GB |
| distilbert/distilgpt2 | Q8 | 34.16 tok/s | 7GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q8 | 34.15 tok/s | 7GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q8 | 33.98 tok/s | 8GB |
| google/gemma-3-270m-it | Q8 | 33.95 tok/s | 7GB |
| HuggingFaceTB/SmolLM2-135M | Q8 | 33.80 tok/s | 7GB |
| mistralai/Mistral-7B-v0.1 | Q8 | 33.76 tok/s | 7GB |
| sshleifer/tiny-gpt2 | Q8 | 33.65 tok/s | 7GB |
| vikhyatk/moondream2 | Q8 | 33.63 tok/s | 7GB |
| microsoft/phi-2 | Q8 | 33.61 tok/s | 7GB |
| EleutherAI/gpt-neo-125m | Q8 | 33.58 tok/s | 7GB |
| dicta-il/dictalm2.0-instruct | Q8 | 33.57 tok/s | 7GB |
| numind/NuExtract-1.5 | Q8 | 33.53 tok/s | 7GB |
| ibm-granite/granite-docling-258M | Q8 | 33.46 tok/s | 7GB |
| Qwen/Qwen2.5-14B | Q4 | 33.34 tok/s | 7GB |
| microsoft/Phi-3-mini-128k-instruct | Q8 | 33.11 tok/s | 7GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q8 | 33.07 tok/s | 8GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q8 | 32.87 tok/s | 8GB |
| huggyllama/llama-7b | Q8 | 32.82 tok/s | 7GB |
| skt/kogpt2-base-v2 | Q8 | 32.79 tok/s | 7GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q8 | 32.75 tok/s | 7GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q8 | 32.53 tok/s | 7GB |
| meta-llama/Meta-Llama-3-8B | Q8 | 32.49 tok/s | 8GB |
| microsoft/Phi-3.5-vision-instruct | Q8 | 32.42 tok/s | 7GB |
| zai-org/GLM-4.5-Air | Q8 | 32.31 tok/s | 7GB |
| meta-llama/Llama-Guard-3-8B | Q8 | 32.00 tok/s | 8GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q8 | 31.87 tok/s | 7GB |
| meta-llama/Llama-3.1-8B | Q8 | 31.87 tok/s | 8GB |
| microsoft/DialoGPT-medium | Q8 | 31.74 tok/s | 7GB |
| BSC-LT/salamandraTA-7b-instruct | Q8 | 31.62 tok/s | 7GB |
| GSAI-ML/LLaDA-8B-Instruct | Q8 | 31.61 tok/s | 8GB |
| unsloth/gpt-oss-20b-BF16 | Q4 | 31.51 tok/s | 10GB |
| GSAI-ML/LLaDA-8B-Base | Q8 | 31.48 tok/s | 8GB |
| Qwen/Qwen3-8B | Q8 | 31.43 tok/s | 8GB |
| Qwen/Qwen3-1.7B-Base | Q8 | 31.41 tok/s | 7GB |
| openai-community/gpt2-large | Q8 | 31.40 tok/s | 7GB |
| deepseek-ai/DeepSeek-V3.1 | Q8 | 31.36 tok/s | 7GB |
| openai-community/gpt2 | Q8 | 31.36 tok/s | 7GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q8 | 31.31 tok/s | 8GB |
| Qwen/Qwen2-7B-Instruct | Q8 | 31.27 tok/s | 7GB |
| Qwen/Qwen2.5-7B | Q8 | 31.06 tok/s | 7GB |
| EleutherAI/pythia-70m-deduped | Q8 | 31.00 tok/s | 7GB |
| bigscience/bloomz-560m | Q8 | 30.90 tok/s | 7GB |
| openai-community/gpt2-xl | Q8 | 30.89 tok/s | 7GB |
| HuggingFaceTB/SmolLM-135M | Q8 | 30.87 tok/s | 7GB |
| deepseek-ai/DeepSeek-V3-0324 | Q8 | 30.83 tok/s | 7GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q8 | 30.81 tok/s | 8GB |
| rednote-hilab/dots.ocr | Q8 | 30.73 tok/s | 7GB |
| hmellor/tiny-random-LlamaForCausalLM | Q8 | 30.71 tok/s | 7GB |
| microsoft/Phi-4-mini-instruct | Q8 | 30.54 tok/s | 7GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q4 | 30.54 tok/s | 10GB |
| ibm-granite/granite-3.3-8b-instruct | Q8 | 30.44 tok/s | 8GB |
| Qwen/Qwen3-8B-Base | Q8 | 30.41 tok/s | 8GB |
| IlyaGusev/saiga_llama3_8b | Q8 | 30.34 tok/s | 8GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q8 | 30.11 tok/s | 7GB |
| microsoft/phi-4 | Q8 | 30.11 tok/s | 7GB |
| rinna/japanese-gpt-neox-small | Q8 | 30.09 tok/s | 7GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | 30.08 tok/s | 9GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | 29.46 tok/s | 9GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | 29.40 tok/s | 8GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | 29.40 tok/s | 8GB |
| mistralai/Mistral-Small-Instruct-2409 | Q4 | 29.38 tok/s | 13GB |
| openai/gpt-oss-20b | Q4 | 29.35 tok/s | 10GB |
| google/gemma-2-27b-it | Q4 | 29.22 tok/s | 16GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q4 | 29.22 tok/s | 15GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q8 | 28.96 tok/s | 8GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | 28.94 tok/s | 8GB |
| google/gemma-2-9b-it | Q8 | 28.83 tok/s | 11GB |
| Qwen/Qwen3-Embedding-8B | Q8 | 28.76 tok/s | 8GB |
| ai-forever/ruGPT-3.5-13B | Q8 | 28.69 tok/s | 13GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q4 | 28.57 tok/s | 15GB |
| Qwen/Qwen3-8B-FP8 | Q8 | 28.55 tok/s | 8GB |
| Qwen/Qwen3-32B | Q4 | 28.31 tok/s | 16GB |
| Qwen/Qwen2.5-32B | Q4 | 27.70 tok/s | 16GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q8 | 27.31 tok/s | 9GB |
| Qwen/Qwen3-14B-Base | Q8 | 27.11 tok/s | 14GB |
| microsoft/Phi-3-medium-128k-instruct | Q8 | 26.91 tok/s | 16GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q4 | 26.23 tok/s | 16GB |
| Qwen/Qwen2.5-32B-Instruct | Q4 | 26.10 tok/s | 16GB |
| Qwen/Qwen2.5-14B | Q8 | 25.81 tok/s | 14GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q4 | 25.74 tok/s | 15GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q4 | 25.68 tok/s | 15GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q4 | 25.57 tok/s | 16GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q4 | 25.53 tok/s | 15GB |
| Qwen/Qwen2.5-14B-Instruct | Q8 | 25.48 tok/s | 14GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q4 | 24.87 tok/s | 15GB |
| Qwen/Qwen3-14B | Q8 | 24.66 tok/s | 14GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q4 | 24.59 tok/s | 15GB |
| Qwen/Qwen3-30B-A3B | Q4 | 24.51 tok/s | 15GB |
| baichuan-inc/Baichuan-M2-32B | Q4 | 24.18 tok/s | 16GB |
| meta-llama/Llama-2-13b-chat-hf | Q8 | 24.17 tok/s | 13GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q4 | 24.04 tok/s | 15GB |
| OpenPipe/Qwen3-14B-Instruct | Q8 | 23.54 tok/s | 14GB |
Note: All figures above are auto-generated estimates, not measured benchmarks; real-world results may vary.
| Model | Quantization | Verdict | Estimated speed | VRAM needed |
|---|---|---|---|---|
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | Q8 | Not supported | — | 79GB (have 16GB) |
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | Q4 | Not supported | — | 40GB (have 16GB) |
| 01-ai/Yi-1.5-34B-Chat | Q8 | Not supported | — | 39GB (have 16GB) |
| 01-ai/Yi-1.5-34B-Chat | Q4 | Not supported | — | 20GB (have 16GB) |
| NousResearch/Hermes-3-Llama-3.1-8B | Q8 | Fits comfortably | 34.28 tok/s | 9GB (have 16GB) |
| NousResearch/Hermes-3-Llama-3.1-8B | Q4 | Fits comfortably | 45.18 tok/s | 5GB (have 16GB) |
| NousResearch/Hermes-3-Llama-3.1-70B | Q8 | Not supported | — | 79GB (have 16GB) |
| NousResearch/Hermes-3-Llama-3.1-70B | Q4 | Not supported | — | 40GB (have 16GB) |
| microsoft/Phi-3-medium-128k-instruct | Q8 | Fits (tight) | 26.91 tok/s | 16GB (have 16GB) |
| microsoft/Phi-3-medium-128k-instruct | Q4 | Fits comfortably | 35.36 tok/s | 8GB (have 16GB) |
| microsoft/Phi-3.5-mini-instruct | Q8 | Fits comfortably | 40.25 tok/s | 5GB (have 16GB) |
| microsoft/Phi-3.5-mini-instruct | Q4 | Fits comfortably | 61.71 tok/s | 3GB (have 16GB) |
| google/gemma-2-9b-it | Q8 | Fits comfortably | 28.83 tok/s | 11GB (have 16GB) |
| google/gemma-2-9b-it | Q4 | Fits comfortably | 41.48 tok/s | 6GB (have 16GB) |
| google/gemma-2-27b-it | Q8 | Not supported | — | 31GB (have 16GB) |
| google/gemma-2-27b-it | Q4 | Fits (tight) | 29.22 tok/s | 16GB (have 16GB) |
| mistralai/Mistral-Small-Instruct-2409 | Q8 | Not supported | — | 25GB (have 16GB) |
| mistralai/Mistral-Small-Instruct-2409 | Q4 | Fits comfortably | 29.38 tok/s | 13GB (have 16GB) |
| mistralai/Mistral-Large-Instruct-2411 | Q8 | Not supported | — | 138GB (have 16GB) |
| mistralai/Mistral-Large-Instruct-2411 | Q4 | Not supported | — | 69GB (have 16GB) |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | Q8 | Not supported | — | 158GB (have 16GB) |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | Q4 | Not supported | — | 79GB (have 16GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 48.60 tok/s | 4GB (have 16GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | Fits comfortably | 64.48 tok/s | 2GB (have 16GB) |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 29.46 tok/s | 9GB (have 16GB) |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 48.04 tok/s | 5GB (have 16GB) |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | Not supported | — | 79GB (have 16GB) |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | Not supported | — | 40GB (have 16GB) |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | Not supported | — | 79GB (have 16GB) |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | Not supported | — | 40GB (have 16GB) |
| deepseek-ai/deepseek-coder-33b-instruct | Q8 | Not supported | — | 38GB (have 16GB) |
| deepseek-ai/deepseek-coder-33b-instruct | Q4 | Not supported | — | 19GB (have 16GB) |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | Q8 | Not supported | — | 264GB (have 16GB) |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | Q4 | Not supported | — | 132GB (have 16GB) |
| deepseek-ai/DeepSeek-V2.5 | Q8 | Not supported | — | 264GB (have 16GB) |
| deepseek-ai/DeepSeek-V2.5 | Q4 | Not supported | — | 132GB (have 16GB) |
| Qwen/Qwen2.5-Math-72B-Instruct | Q8 | Not supported | — | 82GB (have 16GB) |
| Qwen/Qwen2.5-Math-72B-Instruct | Q4 | Not supported | — | 41GB (have 16GB) |
| Qwen/Qwen2.5-7B-Instruct | Q8 | Fits comfortably | 30.08 tok/s | 9GB (have 16GB) |
| Qwen/Qwen2.5-7B-Instruct | Q4 | Fits comfortably | 45.36 tok/s | 5GB (have 16GB) |
| Qwen/Qwen2.5-14B-Instruct | Q8 | Not supported | — | 17GB (have 16GB) |
| Qwen/Qwen2.5-14B-Instruct | Q4 | Fits comfortably | 34.39 tok/s | 9GB (have 16GB) |
| Qwen/Qwen2.5-32B-Instruct | Q8 | Not supported | — | 37GB (have 16GB) |
| Qwen/Qwen2.5-32B-Instruct | Q4 | Not supported | — | 19GB (have 16GB) |
| Qwen/Qwen2.5-Coder-32B-Instruct | Q8 | Not supported | — | 37GB (have 16GB) |
| Qwen/Qwen2.5-Coder-32B-Instruct | Q4 | Not supported | — | 19GB (have 16GB) |
| Qwen/QwQ-32B-Preview | Q8 | Not supported | — | 37GB (have 16GB) |
| Qwen/QwQ-32B-Preview | Q4 | Not supported | — | 19GB (have 16GB) |
| Qwen/Qwen2.5-72B-Instruct | Q8 | Not supported | — | 82GB (have 16GB) |
| Qwen/Qwen2.5-72B-Instruct | Q4 | Not supported | — | 41GB (have 16GB) |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q8 | Not supported | — | 80GB (have 16GB) |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q4 | Not supported | — | 40GB (have 16GB) |
| ai-forever/ruGPT-3.5-13B | Q8 | Fits comfortably | 28.69 tok/s | 13GB (have 16GB) |
| ai-forever/ruGPT-3.5-13B | Q4 | Fits comfortably | 37.11 tok/s | 7GB (have 16GB) |
| baichuan-inc/Baichuan-M2-32B | Q8 | Not supported | — | 32GB (have 16GB) |
| baichuan-inc/Baichuan-M2-32B | Q4 | Fits (tight) | 24.18 tok/s | 16GB (have 16GB) |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q8 | Fits comfortably | 35.65 tok/s | 7GB (have 16GB) |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 43.57 tok/s | 4GB (have 16GB) |
| ibm-granite/granite-3.3-8b-instruct | Q8 | Fits comfortably | 30.44 tok/s | 8GB (have 16GB) |
| ibm-granite/granite-3.3-8b-instruct | Q4 | Fits comfortably | 44.64 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-1.7B-Base | Q8 | Fits comfortably | 31.41 tok/s | 7GB (have 16GB) |
| Qwen/Qwen3-1.7B-Base | Q4 | Fits comfortably | 52.21 tok/s | 4GB (have 16GB) |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q8 | Not supported | — | 20GB (have 16GB) |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q4 | Fits comfortably | 30.54 tok/s | 10GB (have 16GB) |
| BSC-LT/salamandraTA-7b-instruct | Q8 | Fits comfortably | 31.62 tok/s | 7GB (have 16GB) |
| BSC-LT/salamandraTA-7b-instruct | Q4 | Fits comfortably | 49.99 tok/s | 4GB (have 16GB) |
| dicta-il/dictalm2.0-instruct | Q8 | Fits comfortably | 33.57 tok/s | 7GB (have 16GB) |
| dicta-il/dictalm2.0-instruct | Q4 | Fits comfortably | 48.31 tok/s | 4GB (have 16GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q8 | Not supported | — | 30GB (have 16GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q4 | Fits (tight) | 25.53 tok/s | 15GB (have 16GB) |
| GSAI-ML/LLaDA-8B-Base | Q8 | Fits comfortably | 31.48 tok/s | 8GB (have 16GB) |
| GSAI-ML/LLaDA-8B-Base | Q4 | Fits comfortably | 46.28 tok/s | 4GB (have 16GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q8 | Not supported | — | 30GB (have 16GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q4 | Fits (tight) | 25.74 tok/s | 15GB (have 16GB) |
| Qwen/Qwen2-0.5B-Instruct | Q8 | Fits comfortably | 35.08 tok/s | 5GB (have 16GB) |
| Qwen/Qwen2-0.5B-Instruct | Q4 | Fits comfortably | 49.39 tok/s | 3GB (have 16GB) |
| deepseek-ai/DeepSeek-V3 | Q8 | Fits comfortably | 35.74 tok/s | 7GB (have 16GB) |
| deepseek-ai/DeepSeek-V3 | Q4 | Fits comfortably | 44.11 tok/s | 4GB (have 16GB) |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q8 | Fits comfortably | 35.79 tok/s | 5GB (have 16GB) |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | Fits comfortably | 54.08 tok/s | 3GB (have 16GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q8 | Not supported | — | 30GB (have 16GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q4 | Fits (tight) | 29.22 tok/s | 15GB (have 16GB) |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q8 | Not supported | — | 30GB (have 16GB) |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q4 | Fits (tight) | 28.57 tok/s | 15GB (have 16GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q8 | Not supported | — | 30GB (have 16GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q4 | Fits (tight) | 24.04 tok/s | 15GB (have 16GB) |
| AI-MO/Kimina-Prover-72B | Q8 | Not supported | — | 72GB (have 16GB) |
| AI-MO/Kimina-Prover-72B | Q4 | Not supported | — | 36GB (have 16GB) |
| apple/OpenELM-1_1B-Instruct | Q8 | Fits comfortably | 72.45 tok/s | 1GB (have 16GB) |
| apple/OpenELM-1_1B-Instruct | Q4 | Fits comfortably | 101.29 tok/s | 1GB (have 16GB) |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 28.96 tok/s | 8GB (have 16GB) |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 46.99 tok/s | 4GB (have 16GB) |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q8 | Fits comfortably | 27.31 tok/s | 9GB (have 16GB) |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q4 | Fits comfortably | 44.19 tok/s | 5GB (have 16GB) |
| Qwen/Qwen2.5-3B | Q8 | Fits comfortably | 41.07 tok/s | 3GB (have 16GB) |
| Qwen/Qwen2.5-3B | Q4 | Fits comfortably | 58.72 tok/s | 2GB (have 16GB) |
| lmsys/vicuna-7b-v1.5 | Q8 | Fits comfortably | 34.71 tok/s | 7GB (have 16GB) |
| lmsys/vicuna-7b-v1.5 | Q4 | Fits comfortably | 52.18 tok/s | 4GB (have 16GB) |
| meta-llama/Llama-2-13b-chat-hf | Q8 | Fits comfortably | 24.17 tok/s | 13GB (have 16GB) |
| meta-llama/Llama-2-13b-chat-hf | Q4 | Fits comfortably | 37.27 tok/s | 7GB (have 16GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q8 | Not supported | — | 80GB (have 16GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q4 | Not supported | — | 40GB (have 16GB) |
| unsloth/gemma-3-1b-it | Q8 | Fits comfortably | 68.27 tok/s | 1GB (have 16GB) |
| unsloth/gemma-3-1b-it | Q4 | Fits comfortably | 105.18 tok/s | 1GB (have 16GB) |
| bigcode/starcoder2-3b | Q8 | Fits comfortably | 41.41 tok/s | 3GB (have 16GB) |
| bigcode/starcoder2-3b | Q4 | Fits comfortably | 68.96 tok/s | 2GB (have 16GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q8 | Not supported | — | 80GB (have 16GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q4 | Not supported | — | 40GB (have 16GB) |
| ibm-granite/granite-docling-258M | Q8 | Fits comfortably | 33.46 tok/s | 7GB (have 16GB) |
| ibm-granite/granite-docling-258M | Q4 | Fits comfortably | 49.36 tok/s | 4GB (have 16GB) |
| skt/kogpt2-base-v2 | Q8 | Fits comfortably | 32.79 tok/s | 7GB (have 16GB) |
| skt/kogpt2-base-v2 | Q4 | Fits comfortably | 47.91 tok/s | 4GB (have 16GB) |
| google/gemma-3-270m-it | Q8 | Fits comfortably | 33.95 tok/s | 7GB (have 16GB) |
| google/gemma-3-270m-it | Q4 | Fits comfortably | 52.36 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q8 | Fits comfortably | 42.68 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | Fits comfortably | 60.72 tok/s | 2GB (have 16GB) |
| Qwen/Qwen2.5-32B | Q8 | Not supported | — | 32GB (have 16GB) |
| Qwen/Qwen2.5-32B | Q4 | Fits (tight) | 27.70 tok/s | 16GB (have 16GB) |
| parler-tts/parler-tts-large-v1 | Q8 | Fits comfortably | 34.87 tok/s | 7GB (have 16GB) |
| parler-tts/parler-tts-large-v1 | Q4 | Fits comfortably | 51.12 tok/s | 4GB (have 16GB) |
| EleutherAI/pythia-70m-deduped | Q8 | Fits comfortably | 31.00 tok/s | 7GB (have 16GB) |
| EleutherAI/pythia-70m-deduped | Q4 | Fits comfortably | 44.26 tok/s | 4GB (have 16GB) |
| microsoft/VibeVoice-1.5B | Q8 | Fits comfortably | 36.93 tok/s | 5GB (have 16GB) |
| microsoft/VibeVoice-1.5B | Q4 | Fits comfortably | 51.55 tok/s | 3GB (have 16GB) |
| ibm-granite/granite-3.3-2b-instruct | Q8 | Fits comfortably | 49.32 tok/s | 2GB (have 16GB) |
| ibm-granite/granite-3.3-2b-instruct | Q4 | Fits comfortably | 80.99 tok/s | 1GB (have 16GB) |
| Qwen/Qwen2.5-72B-Instruct | Q8 | Not supported | — | 72GB (have 16GB) |
| Qwen/Qwen2.5-72B-Instruct | Q4 | Not supported | — | 36GB (have 16GB) |
| liuhaotian/llava-v1.5-7b | Q8 | Fits comfortably | 34.76 tok/s | 7GB (have 16GB) |
| liuhaotian/llava-v1.5-7b | Q4 | Fits comfortably | 44.12 tok/s | 4GB (have 16GB) |
| google/gemma-2b | Q8 | Fits comfortably | 52.93 tok/s | 2GB (have 16GB) |
| google/gemma-2b | Q4 | Fits comfortably | 69.25 tok/s | 1GB (have 16GB) |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q8 | Fits comfortably | 32.75 tok/s | 7GB (have 16GB) |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q4 | Fits comfortably | 48.26 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-235B-A22B | Q8 | Not supported | — | 235GB (have 16GB) |
| Qwen/Qwen3-235B-A22B | Q4 | Not supported | — | 118GB (have 16GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q8 | Fits comfortably | 30.81 tok/s | 8GB (have 16GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q4 | Fits comfortably | 45.45 tok/s | 4GB (have 16GB) |
| microsoft/Phi-4-mini-instruct | Q8 | Fits comfortably | 30.54 tok/s | 7GB (have 16GB) |
| microsoft/Phi-4-mini-instruct | Q4 | Fits comfortably | 51.73 tok/s | 4GB (have 16GB) |
| llamafactory/tiny-random-Llama-3 | Q8 | Fits comfortably | 36.56 tok/s | 7GB (have 16GB) |
| llamafactory/tiny-random-Llama-3 | Q4 | Fits comfortably | 49.36 tok/s | 4GB (have 16GB) |
| HuggingFaceH4/zephyr-7b-beta | Q8 | Fits comfortably | 35.76 tok/s | 7GB (have 16GB) |
| HuggingFaceH4/zephyr-7b-beta | Q4 | Fits comfortably | 51.86 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-4B-Thinking-2507 | Q8 | Fits comfortably | 39.75 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-4B-Thinking-2507 | Q4 | Fits comfortably | 58.43 tok/s | 2GB (have 16GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q8 | Not supported | — | 30GB (have 16GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q4 | Fits (tight) | 24.59 tok/s | 15GB (have 16GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q8 | Fits comfortably | 31.31 tok/s | 8GB (have 16GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q4 | Fits comfortably | 42.76 tok/s | 4GB (have 16GB) |
| unsloth/Llama-3.2-1B-Instruct | Q8 | Fits comfortably | 64.84 tok/s | 1GB (have 16GB) |
| unsloth/Llama-3.2-1B-Instruct | Q4 | Fits comfortably | 90.38 tok/s | 1GB (have 16GB) |
| GSAI-ML/LLaDA-8B-Instruct | Q8 | Fits comfortably | 31.61 tok/s | 8GB (have 16GB) |
| GSAI-ML/LLaDA-8B-Instruct | Q4 | Fits comfortably | 41.03 tok/s | 4GB (have 16GB) |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q8 | Not supported | — | 90GB (have 16GB) |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q4 | Not supported | — | 45GB (have 16GB) |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q8 | Fits comfortably | 32.53 tok/s | 7GB (have 16GB) |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q4 | Fits comfortably | 46.15 tok/s | 4GB (have 16GB) |
| numind/NuExtract-1.5 | Q8 | Fits comfortably | 33.53 tok/s | 7GB (have 16GB) |
| numind/NuExtract-1.5 | Q4 | Fits comfortably | 44.31 tok/s | 4GB (have 16GB) |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q8 | Fits comfortably | 34.95 tok/s | 7GB (have 16GB) |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q4 | Fits comfortably | 46.61 tok/s | 4GB (have 16GB) |
| hmellor/tiny-random-LlamaForCausalLM | Q8 | Fits comfortably | 30.71 tok/s | 7GB (have 16GB) |
| hmellor/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 50.14 tok/s | 4GB (have 16GB) |
| huggyllama/llama-7b | Q8 | Fits comfortably | 32.82 tok/s | 7GB (have 16GB) |
| huggyllama/llama-7b | Q4 | Fits comfortably | 42.88 tok/s | 4GB (have 16GB) |
| deepseek-ai/DeepSeek-V3-0324 | Q8 | Fits comfortably | 30.83 tok/s | 7GB (have 16GB) |
| deepseek-ai/DeepSeek-V3-0324 | Q4 | Fits comfortably | 46.77 tok/s | 4GB (have 16GB) |
| microsoft/Phi-3-mini-128k-instruct | Q8 | Fits comfortably | 33.11 tok/s | 7GB (have 16GB) |
| microsoft/Phi-3-mini-128k-instruct | Q4 | Fits comfortably | 47.20 tok/s | 4GB (have 16GB) |
| sshleifer/tiny-gpt2 | Q8 | Fits comfortably | 33.65 tok/s | 7GB (have 16GB) |
| sshleifer/tiny-gpt2 | Q4 | Fits comfortably | 50.26 tok/s | 4GB (have 16GB) |
| meta-llama/Llama-Guard-3-8B | Q8 | Fits comfortably | 32.00 tok/s | 8GB (have 16GB) |
| meta-llama/Llama-Guard-3-8B | Q4 | Fits comfortably | 42.59 tok/s | 4GB (have 16GB) |
| openai-community/gpt2-xl | Q8 | Fits comfortably | 30.89 tok/s | 7GB (have 16GB) |
| openai-community/gpt2-xl | Q4 | Fits comfortably | 44.01 tok/s | 4GB (have 16GB) |
| OpenPipe/Qwen3-14B-Instruct | Q8 | Fits comfortably | 23.54 tok/s | 14GB (have 16GB) |
| OpenPipe/Qwen3-14B-Instruct | Q4 | Fits comfortably | 34.87 tok/s | 7GB (have 16GB) |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q8 | Not supported | — | 70GB (have 16GB) |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q4 | Not supported | — | 35GB (have 16GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q8 | Fits comfortably | 40.99 tok/s | 4GB (have 16GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q4 | Fits comfortably | 53.23 tok/s | 2GB (have 16GB) |
| ibm-research/PowerMoE-3b | Q8 | Fits comfortably | 46.42 tok/s | 3GB (have 16GB) |
| ibm-research/PowerMoE-3b | Q4 | Fits comfortably | 65.30 tok/s | 2GB (have 16GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q8 | Fits comfortably | 39.84 tok/s | 4GB (have 16GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q4 | Fits comfortably | 55.85 tok/s | 2GB (have 16GB) |
| unsloth/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 46.89 tok/s | 3GB (have 16GB) |
| unsloth/Llama-3.2-3B-Instruct | Q4 | Fits comfortably | 70.28 tok/s | 2GB (have 16GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q8 | Fits comfortably | 40.05 tok/s | 4GB (have 16GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q4 | Fits comfortably | 56.54 tok/s | 2GB (have 16GB) |
| meta-llama/Llama-3.2-3B | Q8 | Fits comfortably | 42.84 tok/s | 3GB (have 16GB) |
| meta-llama/Llama-3.2-3B | Q4 | Fits comfortably | 68.01 tok/s | 2GB (have 16GB) |
| EleutherAI/gpt-neo-125m | Q8 | Fits comfortably | 33.58 tok/s | 7GB (have 16GB) |
| EleutherAI/gpt-neo-125m | Q4 | Fits comfortably | 50.42 tok/s | 4GB (have 16GB) |
| codellama/CodeLlama-34b-hf | Q8 | Not supported | — | 34GB (have 16GB) |
| codellama/CodeLlama-34b-hf | Q4 | Not supported | — | 17GB (have 16GB) |
| meta-llama/Llama-Guard-3-1B | Q8 | Fits comfortably | 64.55 tok/s | 1GB (have 16GB) |
| meta-llama/Llama-Guard-3-1B | Q4 | Fits comfortably | 91.88 tok/s | 1GB (have 16GB) |
| Qwen/Qwen2-1.5B-Instruct | Q8 | Fits comfortably | 40.75 tok/s | 5GB (have 16GB) |
| Qwen/Qwen2-1.5B-Instruct | Q4 | Fits comfortably | 54.08 tok/s | 3GB (have 16GB) |
| google/gemma-2-2b-it | Q8 | Fits comfortably | 53.78 tok/s | 2GB (have 16GB) |
| google/gemma-2-2b-it | Q4 | Fits comfortably | 82.37 tok/s | 1GB (have 16GB) |
| Qwen/Qwen2.5-14B | Q8 | Fits comfortably | 25.81 tok/s | 14GB (have 16GB) |
| Qwen/Qwen2.5-14B | Q4 | Fits comfortably | 33.34 tok/s | 7GB (have 16GB) |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q8 | Not supported | — | 32GB (have 16GB) |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q4 | Fits (tight) | 26.23 tok/s | 16GB (have 16GB) |
| microsoft/Phi-3.5-mini-instruct | Q8 | Fits comfortably | 34.88 tok/s | 7GB (have 16GB) |
| microsoft/Phi-3.5-mini-instruct | Q4 | Fits comfortably | 43.57 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-4B-Base | Q8 | Fits comfortably | 38.92 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-4B-Base | Q4 | Fits comfortably | 61.58 tok/s | 2GB (have 16GB) |
| Qwen/Qwen2-7B-Instruct | Q8 | Fits comfortably | 31.27 tok/s | 7GB (have 16GB) |
| Qwen/Qwen2-7B-Instruct | Q4 | Fits comfortably | 44.83 tok/s | 4GB (have 16GB) |
| meta-llama/Llama-2-7b-chat-hf | Q8 | Fits comfortably | 34.44 tok/s | 7GB (have 16GB) |
| meta-llama/Llama-2-7b-chat-hf | Q4 | Fits comfortably | 50.84 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-14B-Base | Q8 | Fits comfortably | 27.11 tok/s | 14GB (have 16GB) |
| Qwen/Qwen3-14B-Base | Q4 | Fits comfortably | 39.00 tok/s | 7GB (have 16GB) |
| swiss-ai/Apertus-8B-Instruct-2509 | Q8 | Fits comfortably | 33.98 tok/s | 8GB (have 16GB) |
| swiss-ai/Apertus-8B-Instruct-2509 | Q4 | Fits comfortably | 48.86 tok/s | 4GB (have 16GB) |
| microsoft/Phi-3.5-vision-instruct | Q8 | Fits comfortably | 32.42 tok/s | 7GB (have 16GB) |
| microsoft/Phi-3.5-vision-instruct | Q4 | Fits comfortably | 44.97 tok/s | 4GB (have 16GB) |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q8 | Fits comfortably | 30.11 tok/s | 7GB (have 16GB) |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q4 | Fits comfortably | 49.60 tok/s | 4GB (have 16GB) |
| rinna/japanese-gpt-neox-small | Q8 | Fits comfortably | 30.09 tok/s | 7GB (have 16GB) |
| rinna/japanese-gpt-neox-small | Q4 | Fits comfortably | 46.05 tok/s | 4GB (have 16GB) |
| Qwen/Qwen2.5-Coder-1.5B | Q8 | Fits comfortably | 35.97 tok/s | 5GB (have 16GB) |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | Fits comfortably | 58.65 tok/s | 3GB (have 16GB) |
| IlyaGusev/saiga_llama3_8b | Q8 | Fits comfortably | 30.34 tok/s | 8GB (have 16GB) |
| IlyaGusev/saiga_llama3_8b | Q4 | Fits comfortably | 42.98 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-30B-A3B | Q8 | Not supported | — | 30GB (have 16GB) |
| Qwen/Qwen3-30B-A3B | Q4 | Fits (tight) | 24.51 tok/s | 15GB (have 16GB) |
| deepseek-ai/DeepSeek-R1 | Q8 | Fits comfortably | 35.07 tok/s | 7GB (have 16GB) |
| deepseek-ai/DeepSeek-R1 | Q4 | Fits comfortably | 49.05 tok/s | 4GB (have 16GB) |
| microsoft/DialoGPT-small | Q8 | Fits comfortably | 34.84 tok/s | 7GB (have 16GB) |
| microsoft/DialoGPT-small | Q4 | Fits comfortably | 45.39 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-8B-FP8 | Q8 | Fits comfortably | 28.55 tok/s | 8GB (have 16GB) |
| Qwen/Qwen3-8B-FP8 | Q4 | Fits comfortably | 46.72 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q8 | Not supported | — | 30GB (have 16GB) |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q4 | Fits (tight) | 25.68 tok/s | 15GB (have 16GB) |
| Qwen/Qwen3-Embedding-4B | Q8 | Fits comfortably | 38.37 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-Embedding-4B | Q4 | Fits comfortably | 60.03 tok/s | 2GB (have 16GB) |
| microsoft/Phi-4-multimodal-instruct | Q8 | Fits comfortably | 36.07 tok/s | 7GB (have 16GB) |
| microsoft/Phi-4-multimodal-instruct | Q4 | Fits comfortably | 49.55 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-8B-Base | Q8 | Fits comfortably | 30.41 tok/s | 8GB (have 16GB) |
| Qwen/Qwen3-8B-Base | Q4 | Fits comfortably | 42.46 tok/s | 4GB (have 16GB) |
| Qwen/Qwen3-0.6B-Base | Q8 | Fits comfortably | 38.36 tok/s | 6GB (have 16GB) |
| Qwen/Qwen3-0.6B-Base | Q4 | Fits comfortably | 48.48 tok/s | 3GB (have 16GB) |
| openai-community/gpt2-medium | Q8 | Fits comfortably | 35.41 tok/s | 7GB (have 16GB) |
| openai-community/gpt2-medium | Q4 | Fits comfortably | 50.26 tok/s | 4GB (have 16GB) |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q8 | Fits comfortably | 34.75 tok/s | 7GB (have 16GB) |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 49.33 tok/s | 4GB (have 16GB) |
| Qwen/Qwen2.5-Math-1.5B | Q8 | Fits comfortably | 41.17 tok/s | 5GB (have 16GB) |
| Qwen/Qwen2.5-Math-1.5B | Q4 | Fits comfortably | 53.71 tok/s | 3GB (have 16GB) |
| HuggingFaceTB/SmolLM-135M | Q8 | Fits comfortably | 30.87 tok/s | 7GB (have 16GB) |
| HuggingFaceTB/SmolLM-135M | Q4 | Fits comfortably | 51.27 tok/s | 4GB (have 16GB) |
| unsloth/gpt-oss-20b-BF16 | Q8 | Not supported | — | 20GB (have 16GB) |
| unsloth/gpt-oss-20b-BF16 | Q4 | Fits comfortably | 31.51 tok/s | 10GB (have 16GB) |
| meta-llama/Meta-Llama-3-70B-Instruct | Q8 | Not supported | — | 70GB (have 16GB) |
| meta-llama/Meta-Llama-3-70B-Instruct | Q4 | Not supported | — | 35GB (have 16GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 33.07 tok/s | 8GB (have 16GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 46.15 tok/s | 4GB (have 16GB) |
| zai-org/GLM-4.5-Air | Q8 | Fits comfortably | 32.31 tok/s | 7GB (have 16GB) |
| zai-org/GLM-4.5-Air | Q4 | Fits comfortably | 48.48 tok/s | 4GB (have 16GB) |
| mistralai/Mistral-7B-Instruct-v0.1 | Q8 | Fits comfortably | 34.15 tok/s | 7GB (have 16GB) |
| mistralai/Mistral-7B-Instruct-v0.1 | Q4 | Fits comfortably | 52.14 tok/s | 4GB (have 16GB) |
| LiquidAI/LFM2-1.2B | Q8 | Fits comfortably | 50.45 tok/s | 2GB (have 16GB) |
| LiquidAI/LFM2-1.2B | Q4 | Fits comfortably | 81.45 tok/s | 1GB (have 16GB) |
| mistralai/Mistral-7B-v0.1 | Q8 | Fits comfortably | 33.76 tok/s | 7GB (have 16GB) |
| mistralai/Mistral-7B-v0.1 | Q4 | Fits comfortably | 52.12 tok/s | 4GB (have 16GB) |
| Qwen/Qwen2.5-32B-Instruct | Q8 | Not supported | — | 32GB (have 16GB) |
| Qwen/Qwen2.5-32B-Instruct | Q4 | Fits (tight) | 26.10 tok/s | 16GB (have 16GB) |
| deepseek-ai/DeepSeek-R1-0528 | Q8 | Fits comfortably | 34.59 tok/s | 7GB (have 16GB) |
| deepseek-ai/DeepSeek-R1-0528 | Q4 | Fits comfortably | 43.29 tok/s | 4GB (have 16GB) |
| meta-llama/Llama-3.1-8B | Q8 | Fits comfortably | 31.87 tok/s | 8GB (have 16GB) |
| meta-llama/Llama-3.1-8B | Q4 | Fits comfortably | 49.52 tok/s | 4GB (have 16GB) |
| deepseek-ai/DeepSeek-V3.1 | Q8 | Fits comfortably | 31.36 tok/s | 7GB (have 16GB) |
| deepseek-ai/DeepSeek-V3.1 | Q4 | Fits comfortably | 45.38 tok/s | 4GB (have 16GB) |
| microsoft/phi-4 | Q8 | Fits comfortably | 30.11 tok/s | 7GB (have 16GB) |
| microsoft/phi-4 | Q4 | Fits comfortably | 50.25 tok/s | 4GB (have 16GB) |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | Fits comfortably | 48.47 tok/s | 3GB (have 16GB) |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | Fits comfortably | 61.08 tok/s | 2GB (have 16GB) |
| Qwen/Qwen2-0.5B | Q8 | Fits comfortably | 35.23 tok/s | 5GB (have 16GB) |
| Qwen/Qwen2-0.5B | Q4 | Fits comfortably | 58.08 tok/s | 3GB (have 16GB) |
| MiniMaxAI/MiniMax-M2 | Q8 | Fits comfortably | 36.54 tok/s | 7GB (have 16GB) |
| MiniMaxAI/MiniMax-M2 | Q4 | Fits comfortably | 51.49 tok/s | 4GB (have 16GB) |
| microsoft/DialoGPT-medium | Q8 | Fits comfortably | 31.74 tok/s | 7GB (have 16GB) |
| microsoft/DialoGPT-medium | Q4 | Fits comfortably | 50.40 tok/s | 4GB (have 16GB) |
| zai-org/GLM-4.6-FP8 | Q8 | Fits comfortably | 36.30 tok/s | 7GB (have 16GB) |
| zai-org/GLM-4.6-FP8 | Q4 | Fits comfortably | 48.55 tok/s | 4GB (have 16GB) |
| HuggingFaceTB/SmolLM2-135M | Q8 | Fits comfortably | 33.80 tok/s | 7GB (have 16GB) |
| HuggingFaceTB/SmolLM2-135M | Q4 | Fits comfortably | 50.50 tok/s | 4GB (have 16GB) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | Fits comfortably | 28.94 tok/s | 8GB (have 16GB) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q4 | Fits comfortably | 44.08 tok/s | 4GB (have 16GB) |
| meta-llama/Llama-2-7b-hf | Q8 | Fits comfortably | 35.08 tok/s | 7GB (have 16GB) |
| meta-llama/Llama-2-7b-hf | Q4 | Fits comfortably | 50.29 tok/s | 4GB (have 16GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | Fits comfortably | 36.13 tok/s | 7GB (have 16GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | Fits comfortably | 47.01 tok/s | 4GB (have 16GB) |
| microsoft/phi-2 | Q8 | Fits comfortably | 33.61 tok/s | 7GB (have 16GB) |
| microsoft/phi-2 | Q4 | Fits comfortably | 46.03 tok/s | 4GB (have 16GB) |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | Not supported | — | 70GB (have 16GB) |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | Not supported | — | 35GB (have 16GB) |
| Qwen/Qwen2.5-0.5B | Q8 | Fits comfortably | 39.96 tok/s | 5GB (have 16GB) |
| Qwen/Qwen2.5-0.5B | Q4 | Fits comfortably | 48.66 tok/s | 3GB (have 16GB) |
| Qwen/Qwen3-14B | Q8 | Fits comfortably | 24.66 tok/s | 14GB (have 16GB) |
| Qwen/Qwen3-14B | Q4 | Fits comfortably | 39.32 tok/s | 7GB (have 16GB) |
| Qwen/Qwen3-Embedding-8B | Q8 | Fits comfortably | 28.76 tok/s | 8GB (have 16GB) |
| Qwen/Qwen3-Embedding-8B | Q4 | Fits comfortably | 44.81 tok/s | 4GB (have 16GB) |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | Not supported | — | 70GB (have 16GB) | 
| meta-llama/Llama-3.3-70B-Instruct | Q4 | Not supported | — | 35GB (have 16GB) | 
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q8 | Fits comfortably | 32.87 tok/sEstimated  | 8GB (have 16GB) | 
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q4 | Fits comfortably | 41.01 tok/sEstimated  | 4GB (have 16GB) | 
| Qwen/Qwen2.5-14B-Instruct | Q8 | Fits comfortably | 25.48 tok/sEstimated  | 14GB (have 16GB) | 
| Qwen/Qwen2.5-14B-Instruct | Q4 | Fits comfortably | 34.79 tok/sEstimated  | 7GB (have 16GB) | 
| Qwen/Qwen2.5-1.5B | Q8 | Fits comfortably | 35.36 tok/sEstimated  | 5GB (have 16GB) | 
| Qwen/Qwen2.5-1.5B | Q4 | Fits comfortably | 55.06 tok/sEstimated  | 3GB (have 16GB) | 
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | Fits comfortably | 36.86 tok/sEstimated  | 4GB (have 16GB) | 
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q4 | Fits comfortably | 57.60 tok/sEstimated  | 2GB (have 16GB) | 
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q8 | Not supported | — | 20GB (have 16GB) | 
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q4 | Fits comfortably | 34.16 tok/sEstimated  | 10GB (have 16GB) | 
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q8 | Fits comfortably | 41.13 tok/sEstimated  | 5GB (have 16GB) | 
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q4 | Fits comfortably | 50.63 tok/sEstimated  | 3GB (have 16GB) | 
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | Fits comfortably | 29.40 tok/sEstimated  | 8GB (have 16GB) | 
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | Fits comfortably | 41.92 tok/sEstimated  | 4GB (have 16GB) | 
| Qwen/Qwen3-Reranker-0.6B | Q8 | Fits comfortably | 36.14 tok/sEstimated  | 6GB (have 16GB) | 
| Qwen/Qwen3-Reranker-0.6B | Q4 | Fits comfortably | 49.95 tok/sEstimated  | 3GB (have 16GB) | 
| rednote-hilab/dots.ocr | Q8 | Fits comfortably | 30.73 tok/sEstimated  | 7GB (have 16GB) | 
| rednote-hilab/dots.ocr | Q4 | Fits comfortably | 46.23 tok/sEstimated  | 4GB (have 16GB) | 
| google-t5/t5-3b | Q8 | Fits comfortably | 46.60 tok/sEstimated  | 3GB (have 16GB) | 
| google-t5/t5-3b | Q4 | Fits comfortably | 60.49 tok/sEstimated  | 2GB (have 16GB) | 
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q8 | Not supported | — | 30GB (have 16GB) | 
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q4 | Fits (tight) | 24.87 tok/sEstimated  | 15GB (have 16GB) | 
| Qwen/Qwen3-4B | Q8 | Fits comfortably | 44.23 tok/sEstimated  | 4GB (have 16GB) | 
| Qwen/Qwen3-4B | Q4 | Fits comfortably | 53.39 tok/sEstimated  | 2GB (have 16GB) | 
| Qwen/Qwen3-1.7B | Q8 | Fits comfortably | 35.39 tok/sEstimated  | 7GB (have 16GB) | 
| Qwen/Qwen3-1.7B | Q4 | Fits comfortably | 46.46 tok/sEstimated  | 4GB (have 16GB) | 
| openai-community/gpt2-large | Q8 | Fits comfortably | 31.40 tok/sEstimated  | 7GB (have 16GB) | 
| openai-community/gpt2-large | Q4 | Fits comfortably | 48.53 tok/sEstimated  | 4GB (have 16GB) | 
| microsoft/Phi-3-mini-4k-instruct | Q8 | Fits comfortably | 36.65 tok/sEstimated  | 7GB (have 16GB) | 
| microsoft/Phi-3-mini-4k-instruct | Q4 | Fits comfortably | 47.68 tok/sEstimated  | 4GB (have 16GB) | 
| allenai/OLMo-2-0425-1B | Q8 | Fits comfortably | 68.34 tok/sEstimated  | 1GB (have 16GB) | 
| allenai/OLMo-2-0425-1B | Q4 | Fits comfortably | 104.81 tok/sEstimated  | 1GB (have 16GB) | 
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q8 | Not supported | — | 80GB (have 16GB) | 
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q4 | Not supported | — | 40GB (have 16GB) | 
| Qwen/Qwen3-32B | Q8 | Not supported | — | 32GB (have 16GB) | 
| Qwen/Qwen3-32B | Q4 | Fits (tight) | 28.31 tok/sEstimated  | 16GB (have 16GB) | 
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | Fits comfortably | 39.18 tok/sEstimated  | 5GB (have 16GB) | 
| Qwen/Qwen2.5-0.5B-Instruct | Q4 | Fits comfortably | 50.74 tok/sEstimated  | 3GB (have 16GB) | 
| Qwen/Qwen2.5-7B | Q8 | Fits comfortably | 31.06 tok/sEstimated  | 7GB (have 16GB) | 
| Qwen/Qwen2.5-7B | Q4 | Fits comfortably | 46.14 tok/sEstimated  | 4GB (have 16GB) | 
| meta-llama/Meta-Llama-3-8B | Q8 | Fits comfortably | 32.49 tok/sEstimated  | 8GB (have 16GB) | 
| meta-llama/Meta-Llama-3-8B | Q4 | Fits comfortably | 41.06 tok/sEstimated  | 4GB (have 16GB) | 
| meta-llama/Llama-3.2-1B | Q8 | Fits comfortably | 68.77 tok/sEstimated  | 1GB (have 16GB) | 
| meta-llama/Llama-3.2-1B | Q4 | Fits comfortably | 102.82 tok/sEstimated  | 1GB (have 16GB) | 
| petals-team/StableBeluga2 | Q8 | Fits comfortably | 34.74 tok/sEstimated  | 7GB (have 16GB) | 
| petals-team/StableBeluga2 | Q4 | Fits comfortably | 49.72 tok/sEstimated  | 4GB (have 16GB) | 
| vikhyatk/moondream2 | Q8 | Fits comfortably | 33.63 tok/sEstimated  | 7GB (have 16GB) | 
| vikhyatk/moondream2 | Q4 | Fits comfortably | 51.43 tok/sEstimated  | 4GB (have 16GB) | 
| meta-llama/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 47.35 tok/sEstimated  | 3GB (have 16GB) | 
| meta-llama/Llama-3.2-3B-Instruct | Q4 | Fits comfortably | 69.15 tok/sEstimated  | 2GB (have 16GB) | 
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q8 | Not supported | — | 70GB (have 16GB) | 
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q4 | Not supported | — | 35GB (have 16GB) | 
| distilbert/distilgpt2 | Q8 | Fits comfortably | 34.16 tok/sEstimated  | 7GB (have 16GB) | 
| distilbert/distilgpt2 | Q4 | Fits comfortably | 44.38 tok/sEstimated  | 4GB (have 16GB) | 
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q8 | Not supported | — | 32GB (have 16GB) | 
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q4 | Fits (tight) | 25.57 tok/sEstimated  | 16GB (have 16GB) | 
| inference-net/Schematron-3B | Q8 | Fits comfortably | 48.44 tok/sEstimated  | 3GB (have 16GB) | 
| inference-net/Schematron-3B | Q4 | Fits comfortably | 67.83 tok/sEstimated  | 2GB (have 16GB) | 
| Qwen/Qwen3-8B | Q8 | Fits comfortably | 31.43 tok/sEstimated  | 8GB (have 16GB) | 
| Qwen/Qwen3-8B | Q4 | Fits comfortably | 41.45 tok/sEstimated  | 4GB (have 16GB) | 
| mistralai/Mistral-7B-Instruct-v0.2 | Q8 | Fits comfortably | 31.87 tok/sEstimated  | 7GB (have 16GB) | 
| mistralai/Mistral-7B-Instruct-v0.2 | Q4 | Fits comfortably | 46.55 tok/sEstimated  | 4GB (have 16GB) | 
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q8 | Fits comfortably | 48.81 tok/sEstimated  | 3GB (have 16GB) | 
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | Fits comfortably | 60.36 tok/sEstimated  | 2GB (have 16GB) | 
| bigscience/bloomz-560m | Q8 | Fits comfortably | 30.90 tok/sEstimated  | 7GB (have 16GB) | 
| bigscience/bloomz-560m | Q4 | Fits comfortably | 46.70 tok/sEstimated  | 4GB (have 16GB) | 
| Qwen/Qwen2.5-3B-Instruct | Q8 | Fits comfortably | 44.66 tok/sEstimated  | 3GB (have 16GB) | 
| Qwen/Qwen2.5-3B-Instruct | Q4 | Fits comfortably | 67.59 tok/sEstimated  | 2GB (have 16GB) | 
| openai/gpt-oss-120b | Q8 | Not supported | — | 120GB (have 16GB) | 
| openai/gpt-oss-120b | Q4 | Not supported | — | 60GB (have 16GB) | 
| meta-llama/Llama-3.2-1B-Instruct | Q8 | Fits comfortably | 66.54 tok/sEstimated  | 1GB (have 16GB) | 
| meta-llama/Llama-3.2-1B-Instruct | Q4 | Fits comfortably | 86.41 tok/sEstimated  | 1GB (have 16GB) | 
| Qwen/Qwen3-4B-Instruct-2507 | Q8 | Fits comfortably | 37.97 tok/sEstimated  | 4GB (have 16GB) | 
| Qwen/Qwen3-4B-Instruct-2507 | Q4 | Fits comfortably | 56.69 tok/sEstimated  | 2GB (have 16GB) | 
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | Fits comfortably | 34.65 tok/sEstimated  | 7GB (have 16GB) | 
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q4 | Fits comfortably | 44.43 tok/sEstimated  | 4GB (have 16GB) | 
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | Fits comfortably | 69.33 tok/sEstimated  | 1GB (have 16GB) | 
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | Fits comfortably | 104.74 tok/sEstimated  | 1GB (have 16GB) | 
| facebook/opt-125m | Q8 | Fits comfortably | 36.42 tok/sEstimated  | 7GB (have 16GB) | 
| facebook/opt-125m | Q4 | Fits comfortably | 45.71 tok/sEstimated  | 4GB (have 16GB) | 
| Qwen/Qwen2.5-1.5B-Instruct | Q8 | Fits comfortably | 37.22 tok/sEstimated  | 5GB (have 16GB) | 
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | Fits comfortably | 57.14 tok/sEstimated  | 3GB (have 16GB) | 
| Qwen/Qwen3-Embedding-0.6B | Q8 | Fits comfortably | 34.34 tok/sEstimated  | 6GB (have 16GB) | 
| Qwen/Qwen3-Embedding-0.6B | Q4 | Fits comfortably | 48.76 tok/sEstimated  | 3GB (have 16GB) | 
| google/gemma-3-1b-it | Q8 | Fits comfortably | 68.40 tok/sEstimated  | 1GB (have 16GB) | 
| google/gemma-3-1b-it | Q4 | Fits comfortably | 93.27 tok/sEstimated  | 1GB (have 16GB) | 
| openai/gpt-oss-20b | Q8 | Not supported | — | 20GB (have 16GB) | 
| openai/gpt-oss-20b | Q4 | Fits comfortably | 29.35 tok/sEstimated  | 10GB (have 16GB) | 
| dphn/dolphin-2.9.1-yi-1.5-34b | Q8 | Not supported | — | 34GB (have 16GB) | 
| dphn/dolphin-2.9.1-yi-1.5-34b | Q4 | Not supported | — | 17GB (have 16GB) | 
| meta-llama/Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 29.40 tok/sEstimated  | 8GB (have 16GB) | 
| meta-llama/Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 48.44 tok/sEstimated  | 4GB (have 16GB) | 
| Gensyn/Qwen2.5-0.5B-Instruct | Q8 | Fits comfortably | 40.92 tok/sEstimated  | 5GB (have 16GB) | 
| Gensyn/Qwen2.5-0.5B-Instruct | Q4 | Fits comfortably | 57.12 tok/sEstimated  | 3GB (have 16GB) | 
| Qwen/Qwen3-0.6B | Q8 | Fits comfortably | 38.78 tok/sEstimated  | 6GB (have 16GB) | 
| Qwen/Qwen3-0.6B | Q4 | Fits comfortably | 52.82 tok/sEstimated  | 3GB (have 16GB) | 
| Qwen/Qwen2.5-7B-Instruct | Q8 | Fits comfortably | 34.34 tok/sEstimated  | 7GB (have 16GB) | 
| Qwen/Qwen2.5-7B-Instruct | Q4 | Fits comfortably | 45.64 tok/sEstimated  | 4GB (have 16GB) | 
| openai-community/gpt2 | Q8 | Fits comfortably | 31.36 tok/sEstimated  | 7GB (have 16GB) | 
| openai-community/gpt2 | Q4 | Fits comfortably | 47.46 tok/sEstimated  | 4GB (have 16GB) | 
Note: figures marked "(estimated)" are calculated, not measured; real-world results may vary.
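If you want a feel for where those VRAM columns come from, they track a standard rule of thumb: one billion parameters costs roughly one gigabyte of weights at 8-bit and half that at 4-bit, plus some runtime overhead for KV cache and activations. Here is a minimal sketch of that arithmetic (the function name and the 20% overhead factor are our assumptions, not the site's exact methodology):

```python
def estimate_vram_gb(params_billions: float, bits: int, overhead: float = 1.2) -> float:
    """Rule-of-thumb VRAM estimate: quantized weights plus ~20% runtime overhead."""
    weight_gb = params_billions * bits / 8  # 1B params at 8 bits is ~1 GB of weights
    return weight_gb * overhead

# An 8B model: ~4 GB of weights at Q4, ~8 GB at Q8 (matching the table's
# weights-only rows), or ~4.8 / 9.6 GB once overhead is included.
print(f"{estimate_vram_gb(8, 4):.1f} GB (Q4), {estimate_vram_gb(8, 8):.1f} GB (Q8)")
```

Whether a model "fits comfortably" or only "fits (tight)" then falls out of comparing that estimate against the card's 16 GB.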
Data-backed answers pulled from community benchmarks, manufacturer specs, and live pricing.
Umbrella’s CUDA build runs the 16 GB chat preset for Llama 3.3 70B at roughly 10 tokens/sec on a stock RTX 4080—around 20× faster than older GGUF pipelines on the same card.
Source: Reddit – /r/LocalLLaMA (m7daipg)
Does the software stack change 70B throughput? Yes. One builder logged Llama 3.3 70B Q3_s at ~15 tok/s on Windows with Ollama, then jumped to ~30 tok/s after switching to Linux with ExLlama and performance-tuned CUDA kernels.
Source: Reddit – /r/LocalLLaMA (mi1gu0s)
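To reproduce that kind of comparison on your own machine, Ollama's generate endpoint reports the token count and decode time for each request, which you can turn into a tok/s figure. A quick sketch (the model tag is an assumption; substitute whatever `ollama list` shows on your system):

```python
import requests

# Ask a local Ollama server (default port 11434) for one non-streamed response.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.3:70b",  # assumed tag; use your installed model
        "prompt": "Explain KV caching in two sentences.",
        "stream": False,
    },
    timeout=600,
)
stats = resp.json()

# Ollama returns eval_count (generated tokens) and eval_duration (nanoseconds).
tok_per_s = stats["eval_count"] / stats["eval_duration"] * 1e9
print(f"{tok_per_s:.1f} tok/s")
```

Running the same script before and after a backend or OS change gives you an apples-to-apples number instead of eyeballing the console output.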
RTX 4080 carries a 320 W board power rating, ships with 16 GB of GDDR6X, and uses the 16-pin 12VHPWR connector. NVIDIA recommends at least a 750 W PSU.
Source: TechPowerUp – RTX 4080 Specs
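The 750 W guidance follows from simple headroom math: board power plus the rest of the system, with margin for transient spikes. A tiny worked example (the 200 W rest-of-system draw and 40% transient margin are assumed values, not NVIDIA's published formula):

```python
# Rough PSU sizing for an RTX 4080 build. Only the 320 W TGP comes from
# the spec sheet; the other inputs are typical assumed values.
gpu_w, rest_of_system_w, transient_margin = 320, 200, 1.4
suggested = (gpu_w + rest_of_system_w) * transient_margin
print(f"Suggested PSU: {round(suggested)} W")  # ~728 W, i.e. the 750 W tier
```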
Can the RTX 4080 run 70B models at all? Only with heavy offloading. Users experimenting with fast DDR5 system memory and PCIe offload confirm that 70B models can run, but bandwidth limits keep throughput well below what 24 GB cards manage.
Source: Reddit – /r/LocalLLaMA (m76rp0l)
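In practice that offloading usually means splitting the model's layers between VRAM and system RAM. A minimal sketch with llama-cpp-python, which exposes exactly this split via `n_gpu_layers` (the file path and layer count are assumptions; lower `n_gpu_layers` if the load runs out of memory):

```python
from llama_cpp import Llama

# Partial offload: keep as many of the 80 decoder layers as fit in the
# 16 GB card, and stream the rest from system RAM over PCIe.
llm = Llama(
    model_path="./llama-3.3-70b-instruct.Q3_K_S.gguf",  # assumed local file
    n_gpu_layers=40,  # roughly half the layers on-GPU; tune to your VRAM
    n_ctx=4096,
)

out = llm("Q: What limits 70B throughput on a 16GB card? A:", max_tokens=64)
print(out["choices"][0]["text"])
```

The CPU-resident layers are the bottleneck: every token must cross the PCIe bus, which is why throughput stays well below a card that holds the whole model.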
Price snapshot from 3 Nov 2025: $1,199 at Amazon, $1,249 at Newegg, and $1,199 at Best Buy, all in stock.
Source: Supabase price tracker snapshot – 2025-11-03
Related comparisons: explore how the RTX 4060 Ti 16GB, RX 6800 XT, and RTX 3080 stack up for local inference workloads.