Quick Answer: The RTX 4060 Ti 16GB offers 16GB of VRAM and starts around $499. It delivers approximately 46 tokens/sec on unsloth/Llama-3.2-1B-Instruct (Q4) and typically draws 165W under load.
This GPU delivers dependable throughput for local AI workloads. Pair it with the right model quantization to hit your target tokens/sec (a rough sizing sketch follows below), and monitor prices below to catch the best deal.
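A quick way to sanity-check the tables below: a model's weight footprint is roughly its parameter count times the bytes per weight of the chosen quantization (Q4 ≈ 0.5 bytes/weight, Q8 ≈ 1 byte/weight), plus headroom for the KV cache and runtime. Here is a minimal sketch of that rule of thumb in Python; the function name and the 1.2x overhead factor are illustrative assumptions, not this site's exact methodology:

```python
def fits_in_vram(params_billion: float, bits_per_weight: int,
                 vram_gb: float = 16.0, overhead: float = 1.2) -> bool:
    """Rough fit check: weights ~= params * bits/8 GB, padded ~20%
    for KV cache and runtime. Heuristic only, not measured."""
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb * overhead <= vram_gb

# A 7B model at Q4 (~3.5GB of weights) fits a 16GB card easily;
# a 32B model at Q8 (~32GB) does not.
print(fits_in_vram(7, 4))    # True
print(fits_in_vram(32, 8))   # False
```

The VRAM columns in both tables line up with this arithmetic: roughly 4GB for a 7B model at Q4, and about 7GB at Q8.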
| Model | Quantization | Tokens/sec (estimated) | VRAM used |
|---|---|---|---|
| unsloth/Llama-3.2-1B-Instruct | Q4 | 46.26 tok/s | 1GB |
| meta-llama/Llama-3.2-1B | Q4 | 46.00 tok/s | 1GB |
| allenai/OLMo-2-0425-1B | Q4 | 44.61 tok/s | 1GB |
| apple/OpenELM-1_1B-Instruct | Q4 | 42.74 tok/s | 1GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | 42.50 tok/s | 1GB |
| meta-llama/Llama-Guard-3-1B | Q4 | 41.12 tok/s | 1GB |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | 40.82 tok/s | 1GB |
| google/gemma-3-1b-it | Q4 | 39.20 tok/s | 1GB |
| unsloth/gemma-3-1b-it | Q4 | 39.17 tok/s | 1GB |
| LiquidAI/LFM2-1.2B | Q4 | 32.80 tok/s | 1GB |
| unsloth/gemma-3-1b-it | Q8 | 32.35 tok/s | 1GB |
| allenai/OLMo-2-0425-1B | Q8 | 32.24 tok/s | 1GB |
| meta-llama/Llama-3.2-1B | Q8 | 31.94 tok/s | 1GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | 31.85 tok/s | 1GB |
| google/gemma-2-2b-it | Q4 | 31.08 tok/s | 1GB |
| google/gemma-2b | Q4 | 31.07 tok/s | 1GB |
| ibm-granite/granite-3.3-2b-instruct | Q4 | 31.04 tok/s | 1GB |
| Qwen/Qwen2.5-3B | Q4 | 30.48 tok/s | 2GB |
| unsloth/Llama-3.2-1B-Instruct | Q8 | 30.02 tok/s | 1GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 30.01 tok/s | 2GB |
| bigcode/starcoder2-3b | Q4 | 29.93 tok/s | 2GB |
| google/gemma-3-1b-it | Q8 | 29.10 tok/s | 1GB |
| meta-llama/Llama-Guard-3-1B | Q8 | 29.05 tok/s | 1GB |
| ibm-research/PowerMoE-3b | Q4 | 28.51 tok/s | 2GB |
| Qwen/Qwen2.5-3B-Instruct | Q4 | 28.43 tok/s | 2GB |
| google-t5/t5-3b | Q4 | 28.23 tok/s | 2GB |
| apple/OpenELM-1_1B-Instruct | Q8 | 28.04 tok/s | 1GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | 27.94 tok/s | 2GB |
| unsloth/Llama-3.2-3B-Instruct | Q4 | 27.56 tok/s | 2GB |
| meta-llama/Llama-3.2-3B | Q4 | 27.52 tok/s | 2GB |
| Qwen/Qwen3-4B | Q4 | 27.35 tok/s | 2GB |
| meta-llama/Llama-3.2-1B-Instruct | Q8 | 27.14 tok/s | 1GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | 26.94 tok/s | 2GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 26.88 tok/s | 2GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q4 | 26.72 tok/s | 2GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q4 | 26.46 tok/s | 3GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q4 | 26.44 tok/s | 2GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | 26.44 tok/s | 2GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | 26.43 tok/s | 3GB |
| inference-net/Schematron-3B | Q4 | 26.19 tok/s | 2GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | 26.00 tok/s | 3GB |
| microsoft/VibeVoice-1.5B | Q4 | 25.96 tok/s | 3GB |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | 25.96 tok/s | 3GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q4 | 25.37 tok/s | 2GB |
| Qwen/Qwen2.5-Math-1.5B | Q4 | 25.32 tok/s | 3GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | 25.24 tok/s | 3GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q4 | 25.23 tok/s | 2GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q4 | 25.03 tok/s | 2GB |
| Qwen/Qwen2-0.5B | Q4 | 24.74 tok/s | 3GB |
| Qwen/Qwen3-Embedding-4B | Q4 | 24.44 tok/s | 2GB |
| Qwen/Qwen2.5-1.5B | Q4 | 24.37 tok/s | 3GB |
| Qwen/Qwen3-Reranker-0.6B | Q4 | 23.95 tok/s | 3GB |
| Qwen/Qwen3-4B-Base | Q4 | 23.92 tok/s | 2GB |
| Qwen/Qwen2.5-0.5B | Q4 | 23.81 tok/s | 3GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q4 | 23.66 tok/s | 2GB |
| petals-team/StableBeluga2 | Q4 | 23.42 tok/s | 4GB |
| google/gemma-3-270m-it | Q4 | 23.41 tok/s | 4GB |
| dicta-il/dictalm2.0-instruct | Q4 | 23.39 tok/s | 4GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q4 | 23.34 tok/s | 4GB |
| Qwen/Qwen2-0.5B-Instruct | Q4 | 23.33 tok/s | 3GB |
| hmellor/tiny-random-LlamaForCausalLM | Q4 | 23.10 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | 23.10 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1 | Q4 | 23.05 tok/s | 4GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q4 | 23.04 tok/s | 4GB |
| Qwen/Qwen3-0.6B-Base | Q4 | 22.98 tok/s | 3GB |
| distilbert/distilgpt2 | Q4 | 22.84 tok/s | 4GB |
| ibm-granite/granite-docling-258M | Q4 | 22.76 tok/s | 4GB |
| microsoft/Phi-4-multimodal-instruct | Q4 | 22.63 tok/s | 4GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q4 | 22.60 tok/s | 3GB |
| microsoft/DialoGPT-small | Q4 | 22.56 tok/s | 4GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | 22.56 tok/s | 4GB |
| sshleifer/tiny-gpt2 | Q4 | 22.53 tok/s | 4GB |
| HuggingFaceH4/zephyr-7b-beta | Q4 | 22.43 tok/s | 4GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q4 | 22.40 tok/s | 4GB |
| zai-org/GLM-4.6-FP8 | Q4 | 22.40 tok/s | 4GB |
| google/gemma-2-2b-it | Q8 | 22.34 tok/s | 2GB |
| microsoft/Phi-3.5-vision-instruct | Q4 | 22.32 tok/s | 4GB |
| Qwen/Qwen2-1.5B-Instruct | Q4 | 22.28 tok/s | 3GB |
| LiquidAI/LFM2-1.2B | Q8 | 22.27 tok/s | 2GB |
| Qwen/Qwen3-0.6B | Q4 | 22.27 tok/s | 3GB |
| Qwen/Qwen2.5-7B | Q4 | 22.24 tok/s | 4GB |
| google-t5/t5-3b | Q8 | 22.22 tok/s | 3GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 22.19 tok/s | 4GB |
| inference-net/Schematron-3B | Q8 | 22.16 tok/s | 3GB |
| meta-llama/Llama-2-7b-hf | Q4 | 22.13 tok/s | 4GB |
| microsoft/phi-4 | Q4 | 22.09 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q4 | 22.07 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q4 | 22.07 tok/s | 3GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q4 | 22.07 tok/s | 4GB |
| unsloth/Llama-3.2-3B-Instruct | Q8 | 22.06 tok/s | 3GB |
| Qwen/Qwen2.5-3B | Q8 | 22.04 tok/s | 3GB |
| liuhaotian/llava-v1.5-7b | Q4 | 22.01 tok/s | 4GB |
| parler-tts/parler-tts-large-v1 | Q4 | 21.95 tok/s | 4GB |
| Qwen/Qwen3-1.7B-Base | Q4 | 21.93 tok/s | 4GB |
| EleutherAI/gpt-neo-125m | Q4 | 21.93 tok/s | 4GB |
| meta-llama/Llama-Guard-3-8B | Q4 | 21.89 tok/s | 4GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q4 | 21.88 tok/s | 4GB |
| mistralai/Mistral-7B-v0.1 | Q4 | 21.86 tok/s | 4GB |
| Qwen/Qwen2-7B-Instruct | Q4 | 21.83 tok/s | 4GB |
| openai-community/gpt2-medium | Q4 | 21.77 tok/s | 4GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q4 | 21.76 tok/s | 4GB |
| meta-llama/Llama-3.2-3B | Q8 | 21.71 tok/s | 3GB |
| rednote-hilab/dots.ocr | Q4 | 21.62 tok/s | 4GB |
| rinna/japanese-gpt-neox-small | Q4 | 21.62 tok/s | 4GB |
| GSAI-ML/LLaDA-8B-Base | Q4 | 21.53 tok/s | 4GB |
| IlyaGusev/saiga_llama3_8b | Q4 | 21.44 tok/s | 4GB |
| microsoft/Phi-3-mini-128k-instruct | Q4 | 21.42 tok/s | 4GB |
| bigcode/starcoder2-3b | Q8 | 21.36 tok/s | 3GB |
| google/gemma-2b | Q8 | 21.35 tok/s | 2GB |
| ibm-granite/granite-3.3-2b-instruct | Q8 | 21.29 tok/s | 2GB |
| GSAI-ML/LLaDA-8B-Instruct | Q4 | 21.29 tok/s | 4GB |
| microsoft/Phi-3-mini-4k-instruct | Q4 | 21.28 tok/s | 4GB |
| HuggingFaceTB/SmolLM-135M | Q4 | 21.23 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-0528 | Q4 | 21.22 tok/s | 4GB |
| openai-community/gpt2 | Q4 | 21.20 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3-0324 | Q4 | 21.17 tok/s | 4GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q4 | 21.10 tok/s | 4GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q4 | 21.09 tok/s | 4GB |
| bigscience/bloomz-560m | Q4 | 21.07 tok/s | 4GB |
| Qwen/Qwen3-8B-FP8 | Q4 | 21.06 tok/s | 4GB |
| skt/kogpt2-base-v2 | Q4 | 21.00 tok/s | 4GB |
| MiniMaxAI/MiniMax-M2 | Q4 | 20.91 tok/s | 4GB |
| openai-community/gpt2-large | Q4 | 20.83 tok/s | 4GB |
| zai-org/GLM-4.5-Air | Q4 | 20.75 tok/s | 4GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | 20.75 tok/s | 3GB |
| numind/NuExtract-1.5 | Q4 | 20.75 tok/s | 4GB |
| Qwen/Qwen3-1.7B | Q4 | 20.71 tok/s | 4GB |
| meta-llama/Llama-2-7b-chat-hf | Q4 | 20.71 tok/s | 4GB |
| llamafactory/tiny-random-Llama-3 | Q4 | 20.70 tok/s | 4GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q8 | 20.68 tok/s | 3GB |
| vikhyatk/moondream2 | Q4 | 20.53 tok/s | 4GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q4 | 20.51 tok/s | 4GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | 20.49 tok/s | 4GB |
| BSC-LT/salamandraTA-7b-instruct | Q4 | 20.45 tok/s | 4GB |
| EleutherAI/pythia-70m-deduped | Q4 | 20.43 tok/s | 4GB |
| Qwen/Qwen3-Embedding-0.6B | Q4 | 20.40 tok/s | 3GB |
| openai-community/gpt2-xl | Q4 | 20.40 tok/s | 4GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q4 | 20.38 tok/s | 4GB |
| huggyllama/llama-7b | Q4 | 20.36 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3.1 | Q4 | 20.33 tok/s | 4GB |
| facebook/opt-125m | Q4 | 20.23 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3 | Q4 | 20.20 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q8 | 20.10 tok/s | 4GB |
| Qwen/Qwen3-Embedding-4B | Q8 | 19.98 tok/s | 4GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q4 | 19.94 tok/s | 4GB |
| Qwen/Qwen2.5-3B-Instruct | Q8 | 19.91 tok/s | 3GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | 19.88 tok/s | 5GB |
| lmsys/vicuna-7b-v1.5 | Q4 | 19.78 tok/s | 4GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | 19.74 tok/s | 4GB |
| microsoft/DialoGPT-medium | Q4 | 19.63 tok/s | 4GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q4 | 19.54 tok/s | 4GB |
| Qwen/Qwen3-8B | Q4 | 19.48 tok/s | 4GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 19.47 tok/s | 5GB |
| ibm-granite/granite-3.3-8b-instruct | Q4 | 19.46 tok/s | 4GB |
| HuggingFaceTB/SmolLM2-135M | Q4 | 19.43 tok/s | 4GB |
| microsoft/Phi-4-mini-instruct | Q4 | 19.37 tok/s | 4GB |
| microsoft/phi-2 | Q4 | 19.36 tok/s | 4GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q4 | 19.30 tok/s | 4GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q4 | 19.28 tok/s | 4GB |
| ibm-research/PowerMoE-3b | Q8 | 19.16 tok/s | 3GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q8 | 19.04 tok/s | 4GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | 19.02 tok/s | 4GB |
| meta-llama/Llama-3.1-8B | Q4 | 18.87 tok/s | 4GB |
| meta-llama/Meta-Llama-3-8B | Q4 | 18.86 tok/s | 4GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | 18.84 tok/s | 5GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q8 | 18.80 tok/s | 4GB |
| Qwen/Qwen3-8B-Base | Q4 | 18.79 tok/s | 4GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | 18.72 tok/s | 3GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | 18.69 tok/s | 4GB |
| Qwen/Qwen3-4B-Base | Q8 | 18.65 tok/s | 4GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q8 | 18.50 tok/s | 5GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q8 | 18.45 tok/s | 4GB |
| Qwen/Qwen2-0.5B-Instruct | Q8 | 18.35 tok/s | 5GB |
| Qwen/Qwen2.5-Math-1.5B | Q8 | 18.35 tok/s | 5GB |
| Qwen/Qwen3-Embedding-8B | Q4 | 18.29 tok/s | 4GB |
| NousResearch/Hermes-3-Llama-3.1-8B | Q4 | 18.29 tok/s | 5GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q4 | 18.20 tok/s | 5GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q8 | 18.14 tok/s | 5GB |
| google/gemma-2-9b-it | Q4 | 17.96 tok/s | 6GB |
| Qwen/Qwen2.5-1.5B | Q8 | 17.88 tok/s | 5GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | 17.55 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q8 | 17.55 tok/s | 5GB |
| Qwen/Qwen2.5-14B | Q4 | 17.06 tok/s | 7GB |
| OpenPipe/Qwen3-14B-Instruct | Q4 | 16.86 tok/s | 7GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | 16.82 tok/s | 9GB |
| Qwen/Qwen3-4B | Q8 | 16.72 tok/s | 4GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q8 | 16.71 tok/s | 4GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | 16.66 tok/s | 7GB |
| Qwen/Qwen2.5-Coder-1.5B | Q8 | 16.55 tok/s | 5GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q8 | 16.54 tok/s | 5GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q8 | 16.51 tok/s | 4GB |
| ai-forever/ruGPT-3.5-13B | Q4 | 16.49 tok/s | 7GB |
| Qwen/Qwen3-Embedding-0.6B | Q8 | 16.43 tok/s | 6GB |
| Qwen/Qwen3-0.6B | Q8 | 16.42 tok/s | 6GB |
| zai-org/GLM-4.6-FP8 | Q8 | 16.39 tok/s | 7GB |
| distilbert/distilgpt2 | Q8 | 16.39 tok/s | 7GB |
| BSC-LT/salamandraTA-7b-instruct | Q8 | 16.39 tok/s | 7GB |
| EleutherAI/gpt-neo-125m | Q8 | 16.39 tok/s | 7GB |
| openai-community/gpt2-medium | Q8 | 16.38 tok/s | 7GB |
| microsoft/VibeVoice-1.5B | Q8 | 16.36 tok/s | 5GB |
| mistralai/Mistral-7B-v0.1 | Q8 | 16.34 tok/s | 7GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q8 | 16.31 tok/s | 7GB |
| Qwen/Qwen2-7B-Instruct | Q8 | 16.30 tok/s | 7GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | 16.26 tok/s | 7GB |
| zai-org/GLM-4.5-Air | Q8 | 16.22 tok/s | 7GB |
| HuggingFaceTB/SmolLM-135M | Q8 | 16.18 tok/s | 7GB |
| meta-llama/Llama-2-13b-chat-hf | Q4 | 16.08 tok/s | 7GB |
| lmsys/vicuna-7b-v1.5 | Q8 | 16.08 tok/s | 7GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q8 | 16.02 tok/s | 7GB |
| Qwen/Qwen2-1.5B-Instruct | Q8 | 16.01 tok/s | 5GB |
| meta-llama/Llama-2-7b-hf | Q8 | 15.98 tok/s | 7GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q8 | 15.95 tok/s | 7GB |
| huggyllama/llama-7b | Q8 | 15.93 tok/s | 7GB |
| facebook/opt-125m | Q8 | 15.85 tok/s | 7GB |
| microsoft/Phi-4-multimodal-instruct | Q8 | 15.84 tok/s | 7GB |
| Qwen/Qwen2.5-7B | Q8 | 15.83 tok/s | 7GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | 15.76 tok/s | 5GB |
| openai-community/gpt2-xl | Q8 | 15.75 tok/s | 7GB |
| Qwen/Qwen2-0.5B | Q8 | 15.74 tok/s | 5GB |
| deepseek-ai/DeepSeek-R1 | Q8 | 15.73 tok/s | 7GB |
| Qwen/Qwen3-1.7B | Q8 | 15.72 tok/s | 7GB |
| Qwen/Qwen3-1.7B-Base | Q8 | 15.69 tok/s | 7GB |
| Qwen/Qwen2.5-0.5B | Q8 | 15.68 tok/s | 5GB |
| openai-community/gpt2-large | Q8 | 15.67 tok/s | 7GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | 15.58 tok/s | 8GB |
| Qwen/Qwen3-14B-Base | Q4 | 15.57 tok/s | 7GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q8 | 15.55 tok/s | 7GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q8 | 15.53 tok/s | 7GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q8 | 15.52 tok/s | 8GB |
| Qwen/Qwen3-8B | Q8 | 15.45 tok/s | 8GB |
| unsloth/gpt-oss-20b-BF16 | Q4 | 15.40 tok/s | 10GB |
| meta-llama/Llama-2-7b-chat-hf | Q8 | 15.36 tok/s | 7GB |
| meta-llama/Llama-3.1-8B | Q8 | 15.31 tok/s | 8GB |
| microsoft/Phi-3.5-vision-instruct | Q8 | 15.28 tok/s | 7GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | 15.23 tok/s | 9GB |
| microsoft/Phi-3-medium-128k-instruct | Q4 | 15.21 tok/s | 8GB |
| microsoft/DialoGPT-small | Q8 | 15.19 tok/s | 7GB |
| HuggingFaceH4/zephyr-7b-beta | Q8 | 15.15 tok/s | 7GB |
| parler-tts/parler-tts-large-v1 | Q8 | 15.09 tok/s | 7GB |
| numind/NuExtract-1.5 | Q8 | 15.08 tok/s | 7GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q8 | 15.06 tok/s | 8GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q8 | 15.05 tok/s | 7GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q8 | 15.02 tok/s | 8GB |
| sshleifer/tiny-gpt2 | Q8 | 15.02 tok/s | 7GB |
| deepseek-ai/DeepSeek-V3.1 | Q8 | 14.97 tok/s | 7GB |
| google/gemma-3-270m-it | Q8 | 14.97 tok/s | 7GB |
| Qwen/Qwen3-0.6B-Base | Q8 | 14.93 tok/s | 6GB |
| Qwen/Qwen3-14B | Q4 | 14.91 tok/s | 7GB |
| microsoft/Phi-3-mini-4k-instruct | Q8 | 14.90 tok/s | 7GB |
| microsoft/Phi-3-mini-128k-instruct | Q8 | 14.90 tok/s | 7GB |
| hmellor/tiny-random-LlamaForCausalLM | Q8 | 14.89 tok/s | 7GB |
| GSAI-ML/LLaDA-8B-Base | Q8 | 14.84 tok/s | 8GB |
| deepseek-ai/DeepSeek-V3 | Q8 | 14.82 tok/s | 7GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q8 | 14.82 tok/s | 8GB |
| meta-llama/Llama-Guard-3-8B | Q8 | 14.77 tok/s | 8GB |
| dicta-il/dictalm2.0-instruct | Q8 | 14.75 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | 14.75 tok/s | 7GB |
| MiniMaxAI/MiniMax-M2 | Q8 | 14.67 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | 14.64 tok/s | 8GB |
| deepseek-ai/DeepSeek-R1-0528 | Q8 | 14.61 tok/s | 7GB |
| vikhyatk/moondream2 | Q8 | 14.59 tok/s | 7GB |
| Qwen/Qwen3-Reranker-0.6B | Q8 | 14.49 tok/s | 6GB |
| liuhaotian/llava-v1.5-7b | Q8 | 14.45 tok/s | 7GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q8 | 14.41 tok/s | 7GB |
| openai/gpt-oss-20b | Q4 | 14.39 tok/s | 10GB |
| rinna/japanese-gpt-neox-small | Q8 | 14.36 tok/s | 7GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | 14.36 tok/s | 7GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q4 | 14.34 tok/s | 10GB |
| ibm-granite/granite-docling-258M | Q8 | 14.30 tok/s | 7GB |
| EleutherAI/pythia-70m-deduped | Q8 | 14.29 tok/s | 7GB |
| microsoft/phi-2 | Q8 | 14.25 tok/s | 7GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q4 | 14.24 tok/s | 10GB |
| ibm-granite/granite-3.3-8b-instruct | Q8 | 14.23 tok/s | 8GB |
| meta-llama/Meta-Llama-3-8B | Q8 | 14.23 tok/s | 8GB |
| mistralai/Mistral-Small-Instruct-2409 | Q4 | 14.19 tok/s | 13GB |
| openai-community/gpt2 | Q8 | 14.17 tok/s | 7GB |
| rednote-hilab/dots.ocr | Q8 | 14.14 tok/s | 7GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | 14.14 tok/s | 9GB |
| deepseek-ai/DeepSeek-V3-0324 | Q8 | 14.00 tok/s | 7GB |
| llamafactory/tiny-random-Llama-3 | Q8 | 13.92 tok/s | 7GB |
| skt/kogpt2-base-v2 | Q8 | 13.84 tok/s | 7GB |
| petals-team/StableBeluga2 | Q8 | 13.83 tok/s | 7GB |
| microsoft/DialoGPT-medium | Q8 | 13.82 tok/s | 7GB |
| HuggingFaceTB/SmolLM2-135M | Q8 | 13.80 tok/s | 7GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q8 | 13.75 tok/s | 7GB |
| Qwen/Qwen3-8B-Base | Q8 | 13.70 tok/s | 8GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | 13.69 tok/s | 7GB |
| microsoft/Phi-4-mini-instruct | Q8 | 13.62 tok/s | 7GB |
| bigscience/bloomz-560m | Q8 | 13.60 tok/s | 7GB |
| microsoft/phi-4 | Q8 | 13.49 tok/s | 7GB |
| Qwen/Qwen3-Embedding-8B | Q8 | 13.40 tok/s | 8GB |
| NousResearch/Hermes-3-Llama-3.1-8B | Q8 | 13.28 tok/s | 9GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q8 | 13.26 tok/s | 8GB |
| Qwen/Qwen3-8B-FP8 | Q8 | 13.21 tok/s | 8GB |
| IlyaGusev/saiga_llama3_8b | Q8 | 13.02 tok/s | 8GB |
| Qwen/Qwen3-30B-A3B | Q4 | 12.98 tok/s | 15GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q8 | 12.96 tok/s | 9GB |
| meta-llama/Llama-2-13b-chat-hf | Q8 | 12.95 tok/s | 13GB |
| GSAI-ML/LLaDA-8B-Instruct | Q8 | 12.93 tok/s | 8GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q4 | 12.92 tok/s | 15GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | 12.92 tok/s | 8GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q8 | 12.82 tok/s | 8GB |
| google/gemma-2-27b-it | Q4 | 12.80 tok/s | 16GB |
| google/gemma-2-9b-it | Q8 | 12.80 tok/s | 11GB |
| Qwen/Qwen3-32B | Q4 | 12.66 tok/s | 16GB |
| Qwen/Qwen3-14B-Base | Q8 | 12.42 tok/s | 14GB |
| Qwen/Qwen2.5-32B | Q4 | 11.76 tok/s | 16GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q4 | 11.74 tok/s | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q4 | 11.72 tok/s | 15GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q4 | 11.65 tok/s | 15GB |
| ai-forever/ruGPT-3.5-13B | Q8 | 11.64 tok/s | 13GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q4 | 11.62 tok/s | 16GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q4 | 11.55 tok/s | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q4 | 11.48 tok/s | 15GB |
| OpenPipe/Qwen3-14B-Instruct | Q8 | 11.39 tok/s | 14GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q4 | 11.38 tok/s | 15GB |
| Qwen/Qwen3-14B | Q8 | 11.22 tok/s | 14GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q4 | 11.04 tok/s | 15GB |
| Qwen/Qwen2.5-32B-Instruct | Q4 | 10.91 tok/s | 16GB |
| baichuan-inc/Baichuan-M2-32B | Q4 | 10.80 tok/s | 16GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q4 | 10.76 tok/s | 16GB |
| Qwen/Qwen2.5-14B-Instruct | Q8 | 10.61 tok/s | 14GB |
| Qwen/Qwen2.5-14B | Q8 | 10.58 tok/s | 14GB |
| microsoft/Phi-3-medium-128k-instruct | Q8 | 10.54 tok/s | 16GB |
Note: All speeds above are calculated, auto-generated estimates, not measured benchmarks; real-world results may vary.
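Since these numbers are estimates, it is worth measuring real throughput on your own card before committing to a model. A minimal sketch using llama-cpp-python (the GGUF filename is a placeholder; any model from the table works):

```python
import time
from llama_cpp import Llama  # pip install llama-cpp-python

# Placeholder path: substitute any Q4/Q8 GGUF from the table above.
llm = Llama(model_path="Llama-3.2-1B-Instruct-Q4_K_M.gguf",
            n_gpu_layers=-1)  # offload all layers to the GPU

start = time.perf_counter()
out = llm("Explain VRAM in one paragraph.", max_tokens=256)
elapsed = time.perf_counter() - start

tokens = out["usage"]["completion_tokens"]
print(f"{tokens / elapsed:.1f} tok/s")
```

The compatibility table below applies the same VRAM arithmetic to popular models and flags which quantizations fit on this card's 16GB.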
| Model | Quantization | Verdict | Estimated speed | VRAM needed (of 16GB available) |
|---|---|---|---|---|
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | Q8 | Not supported | — | 79GB |
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | Q4 | Not supported | — | 40GB |
| 01-ai/Yi-1.5-34B-Chat | Q8 | Not supported | — | 39GB |
| 01-ai/Yi-1.5-34B-Chat | Q4 | Not supported | — | 20GB |
| NousResearch/Hermes-3-Llama-3.1-8B | Q8 | Fits comfortably | 13.28 tok/s | 9GB |
| NousResearch/Hermes-3-Llama-3.1-8B | Q4 | Fits comfortably | 18.29 tok/s | 5GB |
| NousResearch/Hermes-3-Llama-3.1-70B | Q8 | Not supported | — | 79GB |
| NousResearch/Hermes-3-Llama-3.1-70B | Q4 | Not supported | — | 40GB |
| microsoft/Phi-3-medium-128k-instruct | Q8 | Fits (tight) | 10.54 tok/s | 16GB |
| microsoft/Phi-3-medium-128k-instruct | Q4 | Fits comfortably | 15.21 tok/s | 8GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | Fits comfortably | 19.88 tok/s | 5GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | Fits comfortably | 25.24 tok/s | 3GB |
| google/gemma-2-9b-it | Q8 | Fits comfortably | 12.80 tok/s | 11GB |
| google/gemma-2-9b-it | Q4 | Fits comfortably | 17.96 tok/s | 6GB |
| google/gemma-2-27b-it | Q8 | Not supported | — | 31GB |
| google/gemma-2-27b-it | Q4 | Fits (tight) | 12.80 tok/s | 16GB |
| mistralai/Mistral-Small-Instruct-2409 | Q8 | Not supported | — | 25GB |
| mistralai/Mistral-Small-Instruct-2409 | Q4 | Fits comfortably | 14.19 tok/s | 13GB |
| mistralai/Mistral-Large-Instruct-2411 | Q8 | Not supported | — | 138GB |
| mistralai/Mistral-Large-Instruct-2411 | Q4 | Not supported | — | 69GB |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | Q8 | Not supported | — | 158GB |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | Q4 | Not supported | — | 79GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 18.69 tok/s | 4GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | Fits comfortably | 26.88 tok/s | 2GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 15.23 tok/s | 9GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 18.84 tok/s | 5GB |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | Not supported | — | 79GB |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | Not supported | — | 40GB |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | Not supported | — | 79GB |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | Not supported | — | 40GB |
| deepseek-ai/deepseek-coder-33b-instruct | Q8 | Not supported | — | 38GB |
| deepseek-ai/deepseek-coder-33b-instruct | Q4 | Not supported | — | 19GB |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | Q8 | Not supported | — | 264GB |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | Q4 | Not supported | — | 132GB |
| deepseek-ai/DeepSeek-V2.5 | Q8 | Not supported | — | 264GB |
| deepseek-ai/DeepSeek-V2.5 | Q4 | Not supported | — | 132GB |
| Qwen/Qwen2.5-Math-72B-Instruct | Q8 | Not supported | — | 82GB |
| Qwen/Qwen2.5-Math-72B-Instruct | Q4 | Not supported | — | 41GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | Fits comfortably | 14.14 tok/s | 9GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | Fits comfortably | 19.47 tok/s | 5GB |
| Qwen/Qwen2.5-14B-Instruct | Q8 | Not supported | — | 17GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | Fits comfortably | 16.82 tok/s | 9GB |
| Qwen/Qwen2.5-32B-Instruct | Q8 | Not supported | — | 37GB |
| Qwen/Qwen2.5-32B-Instruct | Q4 | Not supported | — | 19GB |
| Qwen/Qwen2.5-Coder-32B-Instruct | Q8 | Not supported | — | 37GB |
| Qwen/Qwen2.5-Coder-32B-Instruct | Q4 | Not supported | — | 19GB |
| Qwen/QwQ-32B-Preview | Q8 | Not supported | — | 37GB |
| Qwen/QwQ-32B-Preview | Q4 | Not supported | — | 19GB |
| Qwen/Qwen2.5-72B-Instruct | Q8 | Not supported | — | 82GB |
| Qwen/Qwen2.5-72B-Instruct | Q4 | Not supported | — | 41GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q8 | Not supported | — | 80GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q4 | Not supported | — | 40GB |
| ai-forever/ruGPT-3.5-13B | Q8 | Fits comfortably | 11.64 tok/s | 13GB |
| ai-forever/ruGPT-3.5-13B | Q4 | Fits comfortably | 16.49 tok/s | 7GB |
| baichuan-inc/Baichuan-M2-32B | Q8 | Not supported | — | 32GB |
| baichuan-inc/Baichuan-M2-32B | Q4 | Fits (tight) | 10.80 tok/s | 16GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q8 | Fits comfortably | 15.55 tok/s | 7GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 19.28 tok/s | 4GB |
| ibm-granite/granite-3.3-8b-instruct | Q8 | Fits comfortably | 14.23 tok/s | 8GB |
| ibm-granite/granite-3.3-8b-instruct | Q4 | Fits comfortably | 19.46 tok/s | 4GB |
| Qwen/Qwen3-1.7B-Base | Q8 | Fits comfortably | 15.69 tok/s | 7GB |
| Qwen/Qwen3-1.7B-Base | Q4 | Fits comfortably | 21.93 tok/s | 4GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q8 | Not supported | — | 20GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q4 | Fits comfortably | 14.24 tok/s | 10GB |
| BSC-LT/salamandraTA-7b-instruct | Q8 | Fits comfortably | 16.39 tok/s | 7GB |
| BSC-LT/salamandraTA-7b-instruct | Q4 | Fits comfortably | 20.45 tok/s | 4GB |
| dicta-il/dictalm2.0-instruct | Q8 | Fits comfortably | 14.75 tok/s | 7GB |
| dicta-il/dictalm2.0-instruct | Q4 | Fits comfortably | 23.39 tok/s | 4GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q8 | Not supported | — | 30GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q4 | Fits (tight) | 11.04 tok/s | 15GB |
| GSAI-ML/LLaDA-8B-Base | Q8 | Fits comfortably | 14.84 tok/s | 8GB |
| GSAI-ML/LLaDA-8B-Base | Q4 | Fits comfortably | 21.53 tok/s | 4GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q8 | Not supported | — | 30GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q4 | Fits (tight) | 11.72 tok/s | 15GB |
| Qwen/Qwen2-0.5B-Instruct | Q8 | Fits comfortably | 18.35 tok/s | 5GB |
| Qwen/Qwen2-0.5B-Instruct | Q4 | Fits comfortably | 23.33 tok/s | 3GB |
| deepseek-ai/DeepSeek-V3 | Q8 | Fits comfortably | 14.82 tok/s | 7GB |
| deepseek-ai/DeepSeek-V3 | Q4 | Fits comfortably | 20.20 tok/s | 4GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q8 | Fits comfortably | 16.54 tok/s | 5GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | Fits comfortably | 26.43 tok/s | 3GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q8 | Not supported | — | 30GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q4 | Fits (tight) | 11.38 tok/s | 15GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q8 | Not supported | — | 30GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q4 | Fits (tight) | 11.65 tok/s | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q8 | Not supported | — | 30GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q4 | Fits (tight) | 11.48 tok/s | 15GB |
| AI-MO/Kimina-Prover-72B | Q8 | Not supported | — | 72GB |
| AI-MO/Kimina-Prover-72B | Q4 | Not supported | — | 36GB |
| apple/OpenELM-1_1B-Instruct | Q8 | Fits comfortably | 28.04 tok/s | 1GB |
| apple/OpenELM-1_1B-Instruct | Q4 | Fits comfortably | 42.74 tok/s | 1GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 15.02 tok/s | 8GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 19.94 tok/s | 4GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q8 | Fits comfortably | 12.96 tok/s | 9GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q4 | Fits comfortably | 18.20 tok/s | 5GB |
| Qwen/Qwen2.5-3B | Q8 | Fits comfortably | 22.04 tok/s | 3GB |
| Qwen/Qwen2.5-3B | Q4 | Fits comfortably | 30.48 tok/s | 2GB |
| lmsys/vicuna-7b-v1.5 | Q8 | Fits comfortably | 16.08 tok/s | 7GB |
| lmsys/vicuna-7b-v1.5 | Q4 | Fits comfortably | 19.78 tok/s | 4GB |
| meta-llama/Llama-2-13b-chat-hf | Q8 | Fits comfortably | 12.95 tok/s | 13GB |
| meta-llama/Llama-2-13b-chat-hf | Q4 | Fits comfortably | 16.08 tok/s | 7GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q8 | Not supported | — | 80GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q4 | Not supported | — | 40GB |
| unsloth/gemma-3-1b-it | Q8 | Fits comfortably | 32.35 tok/s | 1GB |
| unsloth/gemma-3-1b-it | Q4 | Fits comfortably | 39.17 tok/s | 1GB |
| bigcode/starcoder2-3b | Q8 | Fits comfortably | 21.36 tok/s | 3GB |
| bigcode/starcoder2-3b | Q4 | Fits comfortably | 29.93 tok/s | 2GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q8 | Not supported | — | 80GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q4 | Not supported | — | 40GB |
| ibm-granite/granite-docling-258M | Q8 | Fits comfortably | 14.30 tok/s | 7GB |
| ibm-granite/granite-docling-258M | Q4 | Fits comfortably | 22.76 tok/s | 4GB |
| skt/kogpt2-base-v2 | Q8 | Fits comfortably | 13.84 tok/s | 7GB |
| skt/kogpt2-base-v2 | Q4 | Fits comfortably | 21.00 tok/s | 4GB |
| google/gemma-3-270m-it | Q8 | Fits comfortably | 14.97 tok/s | 7GB |
| google/gemma-3-270m-it | Q4 | Fits comfortably | 23.41 tok/s | 4GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q8 | Fits comfortably | 16.71 tok/s | 4GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | Fits comfortably | 26.44 tok/s | 2GB |
| Qwen/Qwen2.5-32B | Q8 | Not supported | — | 32GB |
| Qwen/Qwen2.5-32B | Q4 | Fits (tight) | 11.76 tok/s | 16GB |
| parler-tts/parler-tts-large-v1 | Q8 | Fits comfortably | 15.09 tok/s | 7GB |
| parler-tts/parler-tts-large-v1 | Q4 | Fits comfortably | 21.95 tok/s | 4GB |
| EleutherAI/pythia-70m-deduped | Q8 | Fits comfortably | 14.29 tok/s | 7GB |
| EleutherAI/pythia-70m-deduped | Q4 | Fits comfortably | 20.43 tok/s | 4GB |
| microsoft/VibeVoice-1.5B | Q8 | Fits comfortably | 16.36 tok/s | 5GB |
| microsoft/VibeVoice-1.5B | Q4 | Fits comfortably | 25.96 tok/s | 3GB |
| ibm-granite/granite-3.3-2b-instruct | Q8 | Fits comfortably | 21.29 tok/s | 2GB |
| ibm-granite/granite-3.3-2b-instruct | Q4 | Fits comfortably | 31.04 tok/s | 1GB |
| Qwen/Qwen2.5-72B-Instruct | Q8 | Not supported | — | 72GB |
| Qwen/Qwen2.5-72B-Instruct | Q4 | Not supported | — | 36GB |
| liuhaotian/llava-v1.5-7b | Q8 | Fits comfortably | 14.45 tok/s | 7GB |
| liuhaotian/llava-v1.5-7b | Q4 | Fits comfortably | 22.01 tok/s | 4GB |
| google/gemma-2b | Q8 | Fits comfortably | 21.35 tok/s | 2GB |
| google/gemma-2b | Q4 | Fits comfortably | 31.07 tok/s | 1GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q8 | Fits comfortably | 15.05 tok/s | 7GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q4 | Fits comfortably | 20.51 tok/s | 4GB |
| Qwen/Qwen3-235B-A22B | Q8 | Not supported | — | 235GB |
| Qwen/Qwen3-235B-A22B | Q4 | Not supported | — | 118GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q8 | Fits comfortably | 13.26 tok/s | 8GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q4 | Fits comfortably | 19.30 tok/s | 4GB |
| microsoft/Phi-4-mini-instruct | Q8 | Fits comfortably | 13.62 tok/s | 7GB |
| microsoft/Phi-4-mini-instruct | Q4 | Fits comfortably | 19.37 tok/s | 4GB |
| llamafactory/tiny-random-Llama-3 | Q8 | Fits comfortably | 13.92 tok/s | 7GB |
| llamafactory/tiny-random-Llama-3 | Q4 | Fits comfortably | 20.70 tok/s | 4GB |
| HuggingFaceH4/zephyr-7b-beta | Q8 | Fits comfortably | 15.15 tok/s | 7GB |
| HuggingFaceH4/zephyr-7b-beta | Q4 | Fits comfortably | 22.43 tok/s | 4GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q8 | Fits comfortably | 19.04 tok/s | 4GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q4 | Fits comfortably | 25.37 tok/s | 2GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q8 | Not supported | — | 30GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q4 | Fits (tight) | 11.74 tok/s | 15GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q8 | Fits comfortably | 15.06 tok/s | 8GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q4 | Fits comfortably | 19.54 tok/s | 4GB |
| unsloth/Llama-3.2-1B-Instruct | Q8 | Fits comfortably | 30.02 tok/s | 1GB |
| unsloth/Llama-3.2-1B-Instruct | Q4 | Fits comfortably | 46.26 tok/s | 1GB |
| GSAI-ML/LLaDA-8B-Instruct | Q8 | Fits comfortably | 12.93 tok/s | 8GB |
| GSAI-ML/LLaDA-8B-Instruct | Q4 | Fits comfortably | 21.29 tok/s | 4GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q8 | Not supported | — | 90GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q4 | Not supported | — | 45GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q8 | Fits comfortably | 13.75 tok/s | 7GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q4 | Fits comfortably | 21.88 tok/s | 4GB |
| numind/NuExtract-1.5 | Q8 | Fits comfortably | 15.08 tok/s | 7GB |
| numind/NuExtract-1.5 | Q4 | Fits comfortably | 20.75 tok/s | 4GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q8 | Fits comfortably | 14.41 tok/s | 7GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q4 | Fits comfortably | 20.38 tok/s | 4GB |
| hmellor/tiny-random-LlamaForCausalLM | Q8 | Fits comfortably | 14.89 tok/s | 7GB |
| hmellor/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 23.10 tok/s | 4GB |
| huggyllama/llama-7b | Q8 | Fits comfortably | 15.93 tok/s | 7GB |
| huggyllama/llama-7b | Q4 | Fits comfortably | 20.36 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3-0324 | Q8 | Fits comfortably | 14.00 tok/s | 7GB |
| deepseek-ai/DeepSeek-V3-0324 | Q4 | Fits comfortably | 21.17 tok/s | 4GB |
| microsoft/Phi-3-mini-128k-instruct | Q8 | Fits comfortably | 14.90 tok/s | 7GB |
| microsoft/Phi-3-mini-128k-instruct | Q4 | Fits comfortably | 21.42 tok/s | 4GB |
| sshleifer/tiny-gpt2 | Q8 | Fits comfortably | 15.02 tok/s | 7GB |
| sshleifer/tiny-gpt2 | Q4 | Fits comfortably | 22.53 tok/s | 4GB |
| meta-llama/Llama-Guard-3-8B | Q8 | Fits comfortably | 14.77 tok/s | 8GB |
| meta-llama/Llama-Guard-3-8B | Q4 | Fits comfortably | 21.89 tok/s | 4GB |
| openai-community/gpt2-xl | Q8 | Fits comfortably | 15.75 tok/s | 7GB |
| openai-community/gpt2-xl | Q4 | Fits comfortably | 20.40 tok/s | 4GB |
| OpenPipe/Qwen3-14B-Instruct | Q8 | Fits comfortably | 11.39 tok/s | 14GB |
| OpenPipe/Qwen3-14B-Instruct | Q4 | Fits comfortably | 16.86 tok/s | 7GB |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q8 | Not supported | — | 70GB |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q4 | Not supported | — | 35GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q8 | Fits comfortably | 18.80 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q4 | Fits comfortably | 25.03 tok/s | 2GB |
| ibm-research/PowerMoE-3b | Q8 | Fits comfortably | 19.16 tok/s | 3GB |
| ibm-research/PowerMoE-3b | Q4 | Fits comfortably | 28.51 tok/s | 2GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q8 | Fits comfortably | 18.45 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q4 | Fits comfortably | 26.72 tok/s | 2GB |
| unsloth/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 22.06 tok/s | 3GB |
| unsloth/Llama-3.2-3B-Instruct | Q4 | Fits comfortably | 27.56 tok/s | 2GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q8 | Fits comfortably | 20.10 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q4 | Fits comfortably | 26.44 tok/s | 2GB |
| meta-llama/Llama-3.2-3B | Q8 | Fits comfortably | 21.71 tok/s | 3GB |
| meta-llama/Llama-3.2-3B | Q4 | Fits comfortably | 27.52 tok/s | 2GB |
| EleutherAI/gpt-neo-125m | Q8 | Fits comfortably | 16.39 tok/s | 7GB |
| EleutherAI/gpt-neo-125m | Q4 | Fits comfortably | 21.93 tok/s | 4GB |
| codellama/CodeLlama-34b-hf | Q8 | Not supported | — | 34GB |
| codellama/CodeLlama-34b-hf | Q4 | Not supported | — | 17GB |
| meta-llama/Llama-Guard-3-1B | Q8 | Fits comfortably | 29.05 tok/s | 1GB |
| meta-llama/Llama-Guard-3-1B | Q4 | Fits comfortably | 41.12 tok/s | 1GB |
| Qwen/Qwen2-1.5B-Instruct | Q8 | Fits comfortably | 16.01 tok/s | 5GB |
| Qwen/Qwen2-1.5B-Instruct | Q4 | Fits comfortably | 22.28 tok/s | 3GB |
| google/gemma-2-2b-it | Q8 | Fits comfortably | 22.34 tok/s | 2GB |
| google/gemma-2-2b-it | Q4 | Fits comfortably | 31.08 tok/s | 1GB |
| Qwen/Qwen2.5-14B | Q8 | Fits comfortably | 10.58 tok/s | 14GB |
| Qwen/Qwen2.5-14B | Q4 | Fits comfortably | 17.06 tok/s | 7GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q8 | Not supported | — | 32GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q4 | Fits (tight) | 10.76 tok/s | 16GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | Fits comfortably | 14.36 tok/s | 7GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | Fits comfortably | 22.56 tok/s | 4GB |
| Qwen/Qwen3-4B-Base | Q8 | Fits comfortably | 18.65 tok/s | 4GB |
| Qwen/Qwen3-4B-Base | Q4 | Fits comfortably | 23.92 tok/s | 2GB |
| Qwen/Qwen2-7B-Instruct | Q8 | Fits comfortably | 16.30 tok/s | 7GB |
| Qwen/Qwen2-7B-Instruct | Q4 | Fits comfortably | 21.83 tok/s | 4GB |
| meta-llama/Llama-2-7b-chat-hf | Q8 | Fits comfortably | 15.36 tok/s | 7GB |
| meta-llama/Llama-2-7b-chat-hf | Q4 | Fits comfortably | 20.71 tok/s | 4GB |
| Qwen/Qwen3-14B-Base | Q8 | Fits comfortably | 12.42 tok/s | 14GB |
| Qwen/Qwen3-14B-Base | Q4 | Fits comfortably | 15.57 tok/s | 7GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q8 | Fits comfortably | 15.52 tok/s | 8GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q4 | Fits comfortably | 22.07 tok/s | 4GB |
| microsoft/Phi-3.5-vision-instruct | Q8 | Fits comfortably | 15.28 tok/s | 7GB |
| microsoft/Phi-3.5-vision-instruct | Q4 | Fits comfortably | 22.32 tok/s | 4GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q8 | Fits comfortably | 15.95 tok/s | 7GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q4 | Fits comfortably | 22.40 tok/s | 4GB |
| rinna/japanese-gpt-neox-small | Q8 | Fits comfortably | 14.36 tok/s | 7GB |
| rinna/japanese-gpt-neox-small | Q4 | Fits comfortably | 21.62 tok/s | 4GB |
| Qwen/Qwen2.5-Coder-1.5B | Q8 | Fits comfortably | 16.55 tok/s | 5GB |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | Fits comfortably | 25.96 tok/s | 3GB |
| IlyaGusev/saiga_llama3_8b | Q8 | Fits comfortably | 13.02 tok/s | 8GB |
| IlyaGusev/saiga_llama3_8b | Q4 | Fits comfortably | 21.44 tok/s | 4GB |
| Qwen/Qwen3-30B-A3B | Q8 | Not supported | — | 30GB |
| Qwen/Qwen3-30B-A3B | Q4 | Fits (tight) | 12.98 tok/s | 15GB |
| deepseek-ai/DeepSeek-R1 | Q8 | Fits comfortably | 15.73 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1 | Q4 | Fits comfortably | 23.05 tok/s | 4GB |
| microsoft/DialoGPT-small | Q8 | Fits comfortably | 15.19 tok/s | 7GB |
| microsoft/DialoGPT-small | Q4 | Fits comfortably | 22.56 tok/s | 4GB |
| Qwen/Qwen3-8B-FP8 | Q8 | Fits comfortably | 13.21 tok/s | 8GB |
| Qwen/Qwen3-8B-FP8 | Q4 | Fits comfortably | 21.06 tok/s | 4GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q8 | Not supported | — | 30GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q4 | Fits (tight) | 11.55 tok/s | 15GB |
| Qwen/Qwen3-Embedding-4B | Q8 | Fits comfortably | 19.98 tok/s | 4GB |
| Qwen/Qwen3-Embedding-4B | Q4 | Fits comfortably | 24.44 tok/s | 2GB |
| microsoft/Phi-4-multimodal-instruct | Q8 | Fits comfortably | 15.84 tok/s | 7GB |
| microsoft/Phi-4-multimodal-instruct | Q4 | Fits comfortably | 22.63 tok/s | 4GB |
| Qwen/Qwen3-8B-Base | Q8 | Fits comfortably | 13.70 tok/s | 8GB |
| Qwen/Qwen3-8B-Base | Q4 | Fits comfortably | 18.79 tok/s | 4GB |
| Qwen/Qwen3-0.6B-Base | Q8 | Fits comfortably | 14.93 tok/s | 6GB |
| Qwen/Qwen3-0.6B-Base | Q4 | Fits comfortably | 22.98 tok/s | 3GB |
| openai-community/gpt2-medium | Q8 | Fits comfortably | 16.38 tok/s | 7GB |
| openai-community/gpt2-medium | Q4 | Fits comfortably | 21.77 tok/s | 4GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q8 | Fits comfortably | 15.53 tok/s | 7GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 19.74 tok/s | 4GB |
| Qwen/Qwen2.5-Math-1.5B | Q8 | Fits comfortably | 18.35 tok/s | 5GB |
| Qwen/Qwen2.5-Math-1.5B | Q4 | Fits comfortably | 25.32 tok/s | 3GB |
| HuggingFaceTB/SmolLM-135M | Q8 | Fits comfortably | 16.18 tok/s | 7GB |
| HuggingFaceTB/SmolLM-135M | Q4 | Fits comfortably | 21.23 tok/s | 4GB |
| unsloth/gpt-oss-20b-BF16 | Q8 | Not supported | — | 20GB |
| unsloth/gpt-oss-20b-BF16 | Q4 | Fits comfortably | 15.40 tok/s | 10GB |
| meta-llama/Meta-Llama-3-70B-Instruct | Q8 | Not supported | — | 70GB |
| meta-llama/Meta-Llama-3-70B-Instruct | Q4 | Not supported | — | 35GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 12.82 tok/s | 8GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 21.09 tok/s | 4GB |
| zai-org/GLM-4.5-Air | Q8 | Fits comfortably | 16.22 tok/s | 7GB |
| zai-org/GLM-4.5-Air | Q4 | Fits comfortably | 20.75 tok/s | 4GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q8 | Fits comfortably | 16.31 tok/s | 7GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q4 | Fits comfortably | 23.34 tok/s | 4GB |
| LiquidAI/LFM2-1.2B | Q8 | Fits comfortably | 22.27 tok/s | 2GB |
| LiquidAI/LFM2-1.2B | Q4 | Fits comfortably | 32.80 tok/s | 1GB |
| mistralai/Mistral-7B-v0.1 | Q8 | Fits comfortably | 16.34 tok/s | 7GB |
| mistralai/Mistral-7B-v0.1 | Q4 | Fits comfortably | 21.86 tok/s | 4GB |
| Qwen/Qwen2.5-32B-Instruct | Q8 | Not supported | — | 32GB |
| Qwen/Qwen2.5-32B-Instruct | Q4 | Fits (tight) | 10.91 tok/s | 16GB |
| deepseek-ai/DeepSeek-R1-0528 | Q8 | Fits comfortably | 14.61 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-0528 | Q4 | Fits comfortably | 21.22 tok/s | 4GB |
| meta-llama/Llama-3.1-8B | Q8 | Fits comfortably | 15.31 tok/s | 8GB |
| meta-llama/Llama-3.1-8B | Q4 | Fits comfortably | 18.87 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3.1 | Q8 | Fits comfortably | 14.97 tok/s | 7GB |
| deepseek-ai/DeepSeek-V3.1 | Q4 | Fits comfortably | 20.33 tok/s | 4GB |
| microsoft/phi-4 | Q8 | Fits comfortably | 13.49 tok/s | 7GB |
| microsoft/phi-4 | Q4 | Fits comfortably | 22.09 tok/s | 4GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | Fits comfortably | 18.72 tok/s | 3GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | Fits comfortably | 26.94 tok/s | 2GB |
| Qwen/Qwen2-0.5B | Q8 | Fits comfortably | 15.74 tok/s | 5GB |
| Qwen/Qwen2-0.5B | Q4 | Fits comfortably | 24.74 tok/s | 3GB |
| MiniMaxAI/MiniMax-M2 | Q8 | Fits comfortably | 14.67 tok/s | 7GB |
| MiniMaxAI/MiniMax-M2 | Q4 | Fits comfortably | 20.91 tok/s | 4GB |
| microsoft/DialoGPT-medium | Q8 | Fits comfortably | 13.82 tok/s | 7GB |
| microsoft/DialoGPT-medium | Q4 | Fits comfortably | 19.63 tok/s | 4GB |
| zai-org/GLM-4.6-FP8 | Q8 | Fits comfortably | 16.39 tok/s | 7GB |
| zai-org/GLM-4.6-FP8 | Q4 | Fits comfortably | 22.40 tok/s | 4GB |
| HuggingFaceTB/SmolLM2-135M | Q8 | Fits comfortably | 13.80 tok/s | 7GB |
| HuggingFaceTB/SmolLM2-135M | Q4 | Fits comfortably | 19.43 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | Fits comfortably | 14.64 tok/s | 8GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q4 | Fits comfortably | 22.07 tok/s | 4GB |
| meta-llama/Llama-2-7b-hf | Q8 | Fits comfortably | 15.98 tok/s | 7GB |
| meta-llama/Llama-2-7b-hf | Q4 | Fits comfortably | 22.13 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | Fits comfortably | 14.75 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | Fits comfortably | 23.10 tok/s | 4GB |
| microsoft/phi-2 | Q8 | Fits comfortably | 14.25 tok/s | 7GB |
| microsoft/phi-2 | Q4 | Fits comfortably | 19.36 tok/s | 4GB |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | Not supported | — | 70GB |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | Not supported | — | 35GB |
| Qwen/Qwen2.5-0.5B | Q8 | Fits comfortably | 15.68 tok/s | 5GB |
| Qwen/Qwen2.5-0.5B | Q4 | Fits comfortably | 23.81 tok/s | 3GB |
| Qwen/Qwen3-14B | Q8 | Fits comfortably | 11.22 tok/s | 14GB |
| Qwen/Qwen3-14B | Q4 | Fits comfortably | 14.91 tok/s | 7GB |
| Qwen/Qwen3-Embedding-8B | Q8 | Fits comfortably | 13.40 tok/s | 8GB |
| Qwen/Qwen3-Embedding-8B | Q4 | Fits comfortably | 18.29 tok/s | 4GB |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | Not supported | — | 70GB |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | Not supported | — | 35GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q8 | Fits comfortably | 14.82 tok/s | 8GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q4 | Fits comfortably | 21.76 tok/s | 4GB |
| Qwen/Qwen2.5-14B-Instruct | Q8 | Fits comfortably | 10.61 tok/s | 14GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | Fits comfortably | 16.66 tok/s | 7GB |
| Qwen/Qwen2.5-1.5B | Q8 | Fits comfortably | 17.88 tok/s | 5GB |
| Qwen/Qwen2.5-1.5B | Q4 | Fits comfortably | 24.37 tok/s | 3GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | Fits comfortably | 17.55 tok/s | 4GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q4 | Fits comfortably | 25.23 tok/s | 2GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q8 | Not supported | — | 20GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q4 | Fits comfortably | 14.34 tok/s | 10GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q8 | Fits comfortably | 17.55 tok/sEstimated | 5GB (have 16GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q4 | Fits comfortably | 22.07 tok/sEstimated | 3GB (have 16GB) |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | Fits comfortably | 12.92 tok/sEstimated | 8GB (have 16GB) |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | Fits comfortably | 20.49 tok/sEstimated | 4GB (have 16GB) |
| Qwen/Qwen3-Reranker-0.6B | Q8 | Fits comfortably | 14.49 tok/sEstimated | 6GB (have 16GB) |
| Qwen/Qwen3-Reranker-0.6B | Q4 | Fits comfortably | 23.95 tok/sEstimated | 3GB (have 16GB) |
| rednote-hilab/dots.ocr | Q8 | Fits comfortably | 14.14 tok/sEstimated | 7GB (have 16GB) |
| rednote-hilab/dots.ocr | Q4 | Fits comfortably | 21.62 tok/sEstimated | 4GB (have 16GB) |
| google-t5/t5-3b | Q8 | Fits comfortably | 22.22 tok/sEstimated | 3GB (have 16GB) |
| google-t5/t5-3b | Q4 | Fits comfortably | 28.23 tok/sEstimated | 2GB (have 16GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q8 | Not supported | — | 30GB (have 16GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q4 | Fits (tight) | 12.92 tok/sEstimated | 15GB (have 16GB) |
| Qwen/Qwen3-4B | Q8 | Fits comfortably | 16.72 tok/sEstimated | 4GB (have 16GB) |
| Qwen/Qwen3-4B | Q4 | Fits comfortably | 27.35 tok/sEstimated | 2GB (have 16GB) |
| Qwen/Qwen3-1.7B | Q8 | Fits comfortably | 15.72 tok/sEstimated | 7GB (have 16GB) |
| Qwen/Qwen3-1.7B | Q4 | Fits comfortably | 20.71 tok/sEstimated | 4GB (have 16GB) |
| openai-community/gpt2-large | Q8 | Fits comfortably | 15.67 tok/sEstimated | 7GB (have 16GB) |
| openai-community/gpt2-large | Q4 | Fits comfortably | 20.83 tok/sEstimated | 4GB (have 16GB) |
| microsoft/Phi-3-mini-4k-instruct | Q8 | Fits comfortably | 14.90 tok/sEstimated | 7GB (have 16GB) |
| microsoft/Phi-3-mini-4k-instruct | Q4 | Fits comfortably | 21.28 tok/sEstimated | 4GB (have 16GB) |
| allenai/OLMo-2-0425-1B | Q8 | Fits comfortably | 32.24 tok/sEstimated | 1GB (have 16GB) |
| allenai/OLMo-2-0425-1B | Q4 | Fits comfortably | 44.61 tok/sEstimated | 1GB (have 16GB) |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q8 | Not supported | — | 80GB (have 16GB) |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q4 | Not supported | — | 40GB (have 16GB) |
| Qwen/Qwen3-32B | Q8 | Not supported | — | 32GB (have 16GB) |
| Qwen/Qwen3-32B | Q4 | Fits (tight) | 12.66 tok/sEstimated | 16GB (have 16GB) |
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | Fits comfortably | 15.76 tok/sEstimated | 5GB (have 16GB) |
| Qwen/Qwen2.5-0.5B-Instruct | Q4 | Fits comfortably | 22.60 tok/sEstimated | 3GB (have 16GB) |
| Qwen/Qwen2.5-7B | Q8 | Fits comfortably | 15.83 tok/sEstimated | 7GB (have 16GB) |
| Qwen/Qwen2.5-7B | Q4 | Fits comfortably | 22.24 tok/sEstimated | 4GB (have 16GB) |
| meta-llama/Meta-Llama-3-8B | Q8 | Fits comfortably | 14.23 tok/sEstimated | 8GB (have 16GB) |
| meta-llama/Meta-Llama-3-8B | Q4 | Fits comfortably | 18.86 tok/sEstimated | 4GB (have 16GB) |
| meta-llama/Llama-3.2-1B | Q8 | Fits comfortably | 31.94 tok/sEstimated | 1GB (have 16GB) |
| meta-llama/Llama-3.2-1B | Q4 | Fits comfortably | 46.00 tok/sEstimated | 1GB (have 16GB) |
| petals-team/StableBeluga2 | Q8 | Fits comfortably | 13.83 tok/sEstimated | 7GB (have 16GB) |
| petals-team/StableBeluga2 | Q4 | Fits comfortably | 23.42 tok/sEstimated | 4GB (have 16GB) |
| vikhyatk/moondream2 | Q8 | Fits comfortably | 14.59 tok/sEstimated | 7GB (have 16GB) |
| vikhyatk/moondream2 | Q4 | Fits comfortably | 20.53 tok/sEstimated | 4GB (have 16GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 20.75 tok/sEstimated | 3GB (have 16GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | Fits comfortably | 30.01 tok/sEstimated | 2GB (have 16GB) |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q8 | Not supported | — | 70GB (have 16GB) |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q4 | Not supported | — | 35GB (have 16GB) |
| distilbert/distilgpt2 | Q8 | Fits comfortably | 16.39 tok/sEstimated | 7GB (have 16GB) |
| distilbert/distilgpt2 | Q4 | Fits comfortably | 22.84 tok/sEstimated | 4GB (have 16GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q8 | Not supported | — | 32GB (have 16GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q4 | Fits (tight) | 11.62 tok/sEstimated | 16GB (have 16GB) |
| inference-net/Schematron-3B | Q8 | Fits comfortably | 22.16 tok/sEstimated | 3GB (have 16GB) |
| inference-net/Schematron-3B | Q4 | Fits comfortably | 26.19 tok/sEstimated | 2GB (have 16GB) |
| Qwen/Qwen3-8B | Q8 | Fits comfortably | 15.45 tok/sEstimated | 8GB (have 16GB) |
| Qwen/Qwen3-8B | Q4 | Fits comfortably | 19.48 tok/sEstimated | 4GB (have 16GB) |
| mistralai/Mistral-7B-Instruct-v0.2 | Q8 | Fits comfortably | 16.02 tok/sEstimated | 7GB (have 16GB) |
| mistralai/Mistral-7B-Instruct-v0.2 | Q4 | Fits comfortably | 23.04 tok/sEstimated | 4GB (have 16GB) |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q8 | Fits comfortably | 20.68 tok/sEstimated | 3GB (have 16GB) |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | Fits comfortably | 27.94 tok/sEstimated | 2GB (have 16GB) |
| bigscience/bloomz-560m | Q8 | Fits comfortably | 13.60 tok/sEstimated | 7GB (have 16GB) |
| bigscience/bloomz-560m | Q4 | Fits comfortably | 21.07 tok/sEstimated | 4GB (have 16GB) |
| Qwen/Qwen2.5-3B-Instruct | Q8 | Fits comfortably | 19.91 tok/sEstimated | 3GB (have 16GB) |
| Qwen/Qwen2.5-3B-Instruct | Q4 | Fits comfortably | 28.43 tok/sEstimated | 2GB (have 16GB) |
| openai/gpt-oss-120b | Q8 | Not supported | — | 120GB (have 16GB) |
| openai/gpt-oss-120b | Q4 | Not supported | — | 60GB (have 16GB) |
| meta-llama/Llama-3.2-1B-Instruct | Q8 | Fits comfortably | 27.14 tok/sEstimated | 1GB (have 16GB) |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | Fits comfortably | 40.82 tok/sEstimated | 1GB (have 16GB) |
| Qwen/Qwen3-4B-Instruct-2507 | Q8 | Fits comfortably | 16.51 tok/sEstimated | 4GB (have 16GB) |
| Qwen/Qwen3-4B-Instruct-2507 | Q4 | Fits comfortably | 23.66 tok/sEstimated | 2GB (have 16GB) |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | Fits comfortably | 16.26 tok/sEstimated | 7GB (have 16GB) |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q4 | Fits comfortably | 21.10 tok/sEstimated | 4GB (have 16GB) |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | Fits comfortably | 31.85 tok/sEstimated | 1GB (have 16GB) |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | Fits comfortably | 42.50 tok/sEstimated | 1GB (have 16GB) |
| facebook/opt-125m | Q8 | Fits comfortably | 15.85 tok/sEstimated | 7GB (have 16GB) |
| facebook/opt-125m | Q4 | Fits comfortably | 20.23 tok/sEstimated | 4GB (have 16GB) |
| Qwen/Qwen2.5-1.5B-Instruct | Q8 | Fits comfortably | 18.50 tok/sEstimated | 5GB (have 16GB) |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | Fits comfortably | 26.00 tok/sEstimated | 3GB (have 16GB) |
| Qwen/Qwen3-Embedding-0.6B | Q8 | Fits comfortably | 16.43 tok/sEstimated | 6GB (have 16GB) |
| Qwen/Qwen3-Embedding-0.6B | Q4 | Fits comfortably | 20.40 tok/sEstimated | 3GB (have 16GB) |
| google/gemma-3-1b-it | Q8 | Fits comfortably | 29.10 tok/sEstimated | 1GB (have 16GB) |
| google/gemma-3-1b-it | Q4 | Fits comfortably | 39.20 tok/sEstimated | 1GB (have 16GB) |
| openai/gpt-oss-20b | Q8 | Not supported | — | 20GB (have 16GB) |
| openai/gpt-oss-20b | Q4 | Fits comfortably | 14.39 tok/sEstimated | 10GB (have 16GB) |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q8 | Not supported | — | 34GB (have 16GB) |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q4 | Not supported | — | 17GB (have 16GB) |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 15.58 tok/sEstimated | 8GB (have 16GB) |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 19.02 tok/sEstimated | 4GB (have 16GB) |
| Gensyn/Qwen2.5-0.5B-Instruct | Q8 | Fits comfortably | 18.14 tok/sEstimated | 5GB (have 16GB) |
| Gensyn/Qwen2.5-0.5B-Instruct | Q4 | Fits comfortably | 26.46 tok/sEstimated | 3GB (have 16GB) |
| Qwen/Qwen3-0.6B | Q8 | Fits comfortably | 16.42 tok/sEstimated | 6GB (have 16GB) |
| Qwen/Qwen3-0.6B | Q4 | Fits comfortably | 22.27 tok/sEstimated | 3GB (have 16GB) |
| Qwen/Qwen2.5-7B-Instruct | Q8 | Fits comfortably | 13.69 tok/sEstimated | 7GB (have 16GB) |
| Qwen/Qwen2.5-7B-Instruct | Q4 | Fits comfortably | 22.19 tok/sEstimated | 4GB (have 16GB) |
| openai-community/gpt2 | Q8 | Fits comfortably | 14.17 tok/sEstimated | 7GB (have 16GB) |
| openai-community/gpt2 | Q4 | Fits comfortably | 21.20 tok/sEstimated | 4GB (have 16GB) |
Note: The performance and fit figures above are calculated estimates; real-world results may vary.
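The fit labels follow from simple weight-size arithmetic: a model's weights need roughly params × bits / 8 bytes, so an 8B model at Q4 is about 4GB. Below is a minimal sketch that reproduces the labels from the table, assuming a weights-only estimate and a 90%-of-VRAM "tight" threshold (both are assumptions for illustration, not the site's exact methodology):

```python
# Reproduces the fit labels above from weights-only arithmetic.
# needed_gb = params_billions * bits / 8. The 0.9 "tight" threshold is an
# assumption, and real usage adds KV cache on top of the weights.

def weights_vram_gb(params_billions: float, bits: int) -> float:
    """A model's weights take roughly params * bits / 8 bytes."""
    return params_billions * bits / 8

def fit_label(needed_gb: float, have_gb: float = 16.0) -> str:
    if needed_gb > have_gb:
        return "Not supported"
    if needed_gb >= 0.9 * have_gb:
        return "Fits (tight)"
    return "Fits comfortably"

for name, params, bits in [
    ("Qwen/Qwen3-32B", 32, 4),            # 16 GB -> Fits (tight)
    ("meta-llama/Llama-3.3-70B", 70, 4),  # 35 GB -> Not supported
    ("meta-llama/Llama-3.1-8B", 8, 4),    # 4 GB  -> Fits comfortably
]:
    needed = weights_vram_gb(params, bits)
    print(f"{name} Q{bits}: ~{needed:.0f} GB -> {fit_label(needed)}")
```

Because the table counts weights only, a "tight" fit can still overflow in practice: the KV cache grows with context length, so long prompts push a 16GB-on-16GB model into shared memory or out-of-memory errors.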
Data-backed answers pulled from community benchmarks, manufacturer specs, and live pricing.
A user benchmarking Llama 2 13B Q4 under Ollama and Open WebUI logged roughly 52 tokens/sec on the 16GB RTX 4060 Ti, about half the speed of an RTX 3090 but smooth enough for interactive use.
Source: Reddit – /r/LocalLLaMA (kt7t5xj)
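To reproduce a figure like that on your own card, Ollama's generate endpoint reports eval_count (tokens generated) and eval_duration (nanoseconds) with each non-streaming response, and decode throughput is simply their ratio. A minimal sketch, assuming a local Ollama server and a model tag you have already pulled:

```python
# Measures decode throughput via Ollama's /api/generate endpoint.
# eval_count and eval_duration are fields Ollama returns per response;
# the model tag below is a placeholder for whatever you have pulled.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama2:13b-chat-q4_0",  # placeholder model tag
        "prompt": "Explain PCIe lanes in one paragraph.",
        "stream": False,
    },
    timeout=300,
)
data = resp.json()
tok_per_sec = data["eval_count"] / data["eval_duration"] * 1e9  # duration is in ns
print(f"{tok_per_sec:.1f} tok/s over {data['eval_count']} generated tokens")
```

Run it two or three times and discard the first result, since the initial request includes model load time.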
Yes, with partial CPU offload. Builders report that a 16GB 4060 Ti can run 30B models at 4 bits per weight provided enough system RAM is available, whereas the 8GB card quickly stalls.
Source: Reddit – /r/LocalLLaMA (kjyvc7a)
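The usual mechanism for this is layer-wise offload: keep as many transformer layers in the 16GB of VRAM as fit, and run the remainder from system RAM on the CPU. A sketch with llama-cpp-python, where the model path and layer count are placeholders to tune for your setup:

```python
# Partial CPU offload with llama-cpp-python: n_gpu_layers controls how many
# transformer layers live in VRAM; the rest execute from system RAM.
# Path and layer count below are illustrative, not fixed recommendations.
from llama_cpp import Llama

llm = Llama(
    model_path="models/30b-q4_k_m.gguf",  # placeholder path to a ~4bpw GGUF
    n_gpu_layers=40,  # tune upward until VRAM is nearly full, then back off
    n_ctx=4096,
)
out = llm("Summarize the tradeoffs of CPU offload.", max_tokens=128)
print(out["choices"][0]["text"])
```

Each layer moved onto the GPU raises throughput, so the practical workflow is to increase n_gpu_layers while watching VRAM usage in nvidia-smi.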
Make sure the card runs in a PCIe 4.0 slot: owners note that its 8-lane interface becomes a major bottleneck on older PCIe 3.0 boards, slashing throughput during large-context runs.
Source: Reddit – /r/LocalLLaMA (kt8pk13)
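You can verify the negotiated link without opening the case: nvidia-smi exposes the current PCIe generation and width as query fields. A small sketch:

```python
# Queries the negotiated PCIe link via nvidia-smi. On a 4060 Ti you want
# gen 4 at x8; gen 3 at x8 halves host-to-GPU transfer bandwidth.
import subprocess

out = subprocess.run(
    ["nvidia-smi",
     "--query-gpu=pcie.link.gen.current,pcie.link.width.current",
     "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
).stdout.strip()
gen, width = [s.strip() for s in out.split(",")]
print(f"PCIe gen {gen} x{width}")
```

Note that the link can downshift at idle; query it while the GPU is under load to see the real negotiated speed.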
The 16GB RTX 4060 Ti carries a 165 W board power rating, uses a single 8-pin PCIe connector, and NVIDIA recommends pairing it with a 550 W PSU.
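To confirm the card actually stays near that 165 W rating during inference, nvidia-smi's power.draw query field can be polled while a model is generating. A quick sketch:

```python
# Polls GPU power draw once per second for 10 samples via nvidia-smi.
# power.draw is a standard query field; values are reported in watts.
import subprocess, time

for _ in range(10):
    watts = subprocess.run(
        ["nvidia-smi", "--query-gpu=power.draw", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    print(f"{watts} W")
    time.sleep(1)
```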
On 3 Nov 2025 our tracker showed the RTX 4060 Ti 16GB at $499 (Amazon, in stock), $519 (Newegg, in stock), and $499 (Best Buy, in stock).
Source: Supabase price tracker snapshot – 2025-11-03
Related comparisons: explore how the RX 6800 XT, RTX 3080, and RTX 3090 stack up for local inference workloads.