Quick Answer: The RX 7900 XT offers 20GB of VRAM and starts around $899. It delivers approximately 58 tokens/sec on apple/OpenELM-1_1B-Instruct and typically draws 315W under load.
This GPU offers reliable throughput for local AI workloads. Pair it with the right model quantization to hit your desired tokens/sec, and monitor prices below to catch the best deal.
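The VRAM figures in the table below roughly track a weights-only estimate: parameter count times bits per weight, divided by 8, rounded up to the nearest GB. A minimal Python sketch of that assumption (`est_vram_gb` is a hypothetical helper; real usage adds KV cache and runtime overhead on top):

```python
import math

def est_vram_gb(params_b: float, bits: int) -> int:
    """Weights-only VRAM estimate in GB, rounded up.

    params_b: parameter count in billions (assumed known).
    bits: quantization width (4 for Q4, 8 for Q8).
    Ignores KV cache and runtime overhead -- an assumption,
    so treat results as a lower bound.
    """
    return max(1, math.ceil(params_b * bits / 8))

print(est_vram_gb(8, 4))    # 8B model at Q4 -> 4 GB
print(est_vram_gb(8, 8))    # 8B model at Q8 -> 8 GB
print(est_vram_gb(1.1, 4))  # 1.1B model at Q4 -> 1 GB
```

This matches the pattern in the table: an 8B model lands at 4GB under Q4 and 8GB under Q8, so context length and batch size decide how much of the remaining 20GB budget you can spend.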
| Model | Quantization | Tokens/sec (estimated) | VRAM used |
|---|---|---|---|
| apple/OpenELM-1_1B-Instruct | Q4 | 57.92 tok/s | 1GB |
| unsloth/Llama-3.2-1B-Instruct | Q4 | 55.74 tok/s | 1GB |
| google/gemma-3-1b-it | Q4 | 55.31 tok/s | 1GB |
| meta-llama/Llama-Guard-3-1B | Q4 | 55.06 tok/s | 1GB |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | 54.52 tok/s | 1GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | 51.75 tok/s | 1GB |
| allenai/OLMo-2-0425-1B | Q4 | 51.41 tok/s | 1GB |
| meta-llama/Llama-3.2-1B | Q4 | 48.97 tok/s | 1GB |
| unsloth/gemma-3-1b-it | Q4 | 48.65 tok/s | 1GB |
| LiquidAI/LFM2-1.2B | Q4 | 42.36 tok/s | 1GB |
| ibm-granite/granite-3.3-2b-instruct | Q4 | 42.34 tok/s | 1GB |
| meta-llama/Llama-3.2-1B-Instruct | Q8 | 40.46 tok/s | 1GB |
| google/gemma-2-2b-it | Q4 | 40.39 tok/s | 1GB |
| apple/OpenELM-1_1B-Instruct | Q8 | 40.19 tok/s | 1GB |
| google/gemma-3-1b-it | Q8 | 39.69 tok/s | 1GB |
| google/gemma-2b | Q4 | 39.15 tok/s | 1GB |
| meta-llama/Llama-3.2-3B | Q4 | 39.08 tok/s | 2GB |
| unsloth/gemma-3-1b-it | Q8 | 38.85 tok/s | 1GB |
| unsloth/Llama-3.2-1B-Instruct | Q8 | 38.64 tok/s | 1GB |
| allenai/OLMo-2-0425-1B | Q8 | 38.60 tok/s | 1GB |
| meta-llama/Llama-Guard-3-1B | Q8 | 37.81 tok/s | 1GB |
| Qwen/Qwen2.5-3B | Q4 | 37.72 tok/s | 2GB |
| unsloth/Llama-3.2-3B-Instruct | Q4 | 36.10 tok/s | 2GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q4 | 35.27 tok/s | 2GB |
| Qwen/Qwen2.5-3B-Instruct | Q4 | 35.18 tok/s | 2GB |
| meta-llama/Llama-3.2-1B | Q8 | 34.70 tok/s | 1GB |
| google-t5/t5-3b | Q4 | 34.65 tok/s | 2GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q4 | 34.51 tok/s | 2GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q4 | 33.99 tok/s | 2GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q4 | 33.74 tok/s | 2GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | 33.68 tok/s | 1GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | 33.62 tok/s | 2GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | 33.44 tok/s | 2GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 33.32 tok/s | 2GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | 33.25 tok/s | 2GB |
| bigcode/starcoder2-3b | Q4 | 33.24 tok/s | 2GB |
| inference-net/Schematron-3B | Q4 | 33.11 tok/s | 2GB |
| ibm-research/PowerMoE-3b | Q4 | 32.54 tok/s | 2GB |
| Qwen/Qwen2.5-1.5B | Q4 | 32.37 tok/s | 3GB |
| Qwen/Qwen2-1.5B-Instruct | Q4 | 31.87 tok/s | 3GB |
| Qwen/Qwen3-4B-Base | Q4 | 31.82 tok/s | 2GB |
| Qwen/Qwen2-0.5B-Instruct | Q4 | 31.39 tok/s | 3GB |
| microsoft/VibeVoice-1.5B | Q4 | 31.32 tok/s | 3GB |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | 31.21 tok/s | 3GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q4 | 31.17 tok/s | 3GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q4 | 30.76 tok/s | 3GB |
| Qwen/Qwen2.5-0.5B | Q4 | 30.70 tok/s | 3GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q4 | 30.48 tok/s | 2GB |
| google/gemma-2b | Q8 | 29.87 tok/s | 2GB |
| ibm-granite/granite-3.3-2b-instruct | Q8 | 29.60 tok/s | 2GB |
| Qwen/Qwen3-0.6B | Q4 | 29.57 tok/s | 3GB |
| Qwen/Qwen3-Embedding-4B | Q4 | 29.53 tok/s | 2GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q4 | 29.45 tok/s | 2GB |
| Qwen/Qwen3-4B | Q4 | 29.33 tok/s | 2GB |
| LiquidAI/LFM2-1.2B | Q8 | 29.25 tok/s | 2GB |
| Qwen/Qwen3-1.7B-Base | Q4 | 28.94 tok/s | 4GB |
| EleutherAI/pythia-70m-deduped | Q4 | 28.93 tok/s | 4GB |
| Qwen/Qwen3-Embedding-0.6B | Q4 | 28.91 tok/s | 3GB |
| deepseek-ai/DeepSeek-V3.1 | Q4 | 28.89 tok/s | 4GB |
| microsoft/Phi-3-mini-128k-instruct | Q4 | 28.72 tok/s | 4GB |
| Qwen/Qwen3-Reranker-0.6B | Q4 | 28.58 tok/s | 3GB |
| lmsys/vicuna-7b-v1.5 | Q4 | 28.49 tok/s | 4GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q4 | 28.48 tok/s | 4GB |
| rednote-hilab/dots.ocr | Q4 | 28.39 tok/s | 4GB |
| microsoft/DialoGPT-small | Q4 | 28.10 tok/s | 4GB |
| zai-org/GLM-4.6-FP8 | Q4 | 28.10 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3-0324 | Q4 | 28.04 tok/s | 4GB |
| sshleifer/tiny-gpt2 | Q4 | 28.01 tok/s | 4GB |
| Qwen/Qwen2.5-Math-1.5B | Q4 | 27.98 tok/s | 3GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | 27.94 tok/s | 4GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q4 | 27.87 tok/s | 4GB |
| openai-community/gpt2-medium | Q4 | 27.87 tok/s | 4GB |
| openai-community/gpt2-large | Q4 | 27.82 tok/s | 4GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q4 | 27.79 tok/s | 3GB |
| parler-tts/parler-tts-large-v1 | Q4 | 27.75 tok/s | 4GB |
| rinna/japanese-gpt-neox-small | Q4 | 27.50 tok/s | 4GB |
| IlyaGusev/saiga_llama3_8b | Q4 | 27.50 tok/s | 4GB |
| Qwen/Qwen3-0.6B-Base | Q4 | 27.39 tok/s | 3GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 27.36 tok/s | 4GB |
| HuggingFaceH4/zephyr-7b-beta | Q4 | 27.35 tok/s | 4GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | 27.29 tok/s | 3GB |
| openai-community/gpt2 | Q4 | 27.20 tok/s | 4GB |
| Qwen/Qwen2-7B-Instruct | Q4 | 27.10 tok/s | 4GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q4 | 27.04 tok/s | 4GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | 26.99 tok/s | 3GB |
| HuggingFaceTB/SmolLM2-135M | Q4 | 26.96 tok/s | 4GB |
| google/gemma-3-270m-it | Q4 | 26.93 tok/s | 4GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | 26.87 tok/s | 4GB |
| Qwen/Qwen2-0.5B | Q4 | 26.86 tok/s | 3GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | 26.85 tok/s | 3GB |
| ibm-research/PowerMoE-3b | Q8 | 26.82 tok/s | 3GB |
| GSAI-ML/LLaDA-8B-Base | Q4 | 26.79 tok/s | 4GB |
| liuhaotian/llava-v1.5-7b | Q4 | 26.77 tok/s | 4GB |
| microsoft/Phi-4-mini-instruct | Q4 | 26.58 tok/s | 4GB |
| inference-net/Schematron-3B | Q8 | 26.52 tok/s | 3GB |
| hmellor/tiny-random-LlamaForCausalLM | Q4 | 26.50 tok/s | 4GB |
| EleutherAI/gpt-neo-125m | Q4 | 26.50 tok/s | 4GB |
| Qwen/Qwen3-8B-Base | Q4 | 26.46 tok/s | 4GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | 26.43 tok/s | 4GB |
| mistralai/Mistral-7B-v0.1 | Q4 | 26.35 tok/s | 4GB |
| Qwen/Qwen3-8B | Q4 | 26.32 tok/s | 4GB |
| bigscience/bloomz-560m | Q4 | 26.31 tok/s | 4GB |
| meta-llama/Llama-3.2-3B | Q8 | 26.22 tok/s | 3GB |
| dicta-il/dictalm2.0-instruct | Q4 | 26.17 tok/s | 4GB |
| meta-llama/Llama-Guard-3-8B | Q4 | 26.16 tok/s | 4GB |
| MiniMaxAI/MiniMax-M2 | Q4 | 26.15 tok/s | 4GB |
| google/gemma-2-2b-it | Q8 | 26.13 tok/s | 2GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | 26.03 tok/s | 4GB |
| Qwen/Qwen3-1.7B | Q4 | 25.96 tok/s | 4GB |
| ibm-granite/granite-docling-258M | Q4 | 25.88 tok/s | 4GB |
| meta-llama/Llama-2-7b-chat-hf | Q4 | 25.72 tok/s | 4GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q4 | 25.70 tok/s | 4GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q4 | 25.68 tok/s | 4GB |
| GSAI-ML/LLaDA-8B-Instruct | Q4 | 25.68 tok/s | 4GB |
| microsoft/Phi-4-multimodal-instruct | Q4 | 25.61 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-0528 | Q4 | 25.55 tok/s | 4GB |
| microsoft/Phi-3.5-vision-instruct | Q4 | 25.52 tok/s | 4GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q4 | 25.48 tok/s | 4GB |
| facebook/opt-125m | Q4 | 25.41 tok/s | 4GB |
| bigcode/starcoder2-3b | Q8 | 25.37 tok/s | 3GB |
| meta-llama/Llama-2-7b-hf | Q4 | 25.37 tok/s | 4GB |
| llamafactory/tiny-random-Llama-3 | Q4 | 25.31 tok/s | 4GB |
| microsoft/DialoGPT-medium | Q4 | 25.31 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q4 | 25.29 tok/s | 4GB |
| distilbert/distilgpt2 | Q4 | 25.29 tok/s | 4GB |
| HuggingFaceTB/SmolLM-135M | Q4 | 25.16 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3 | Q4 | 25.09 tok/s | 4GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | 24.96 tok/s | 3GB |
| Qwen/Qwen2.5-7B | Q4 | 24.93 tok/s | 4GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q4 | 24.93 tok/s | 4GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q4 | 24.90 tok/s | 4GB |
| Qwen/Qwen3-Embedding-8B | Q4 | 24.90 tok/s | 4GB |
| skt/kogpt2-base-v2 | Q4 | 24.63 tok/s | 4GB |
| Qwen/Qwen3-Embedding-4B | Q8 | 24.57 tok/s | 4GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q8 | 24.54 tok/s | 3GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q4 | 24.46 tok/s | 4GB |
| google-t5/t5-3b | Q8 | 24.39 tok/s | 3GB |
| microsoft/Phi-3-mini-4k-instruct | Q4 | 24.34 tok/s | 4GB |
| petals-team/StableBeluga2 | Q4 | 24.27 tok/s | 4GB |
| microsoft/phi-2 | Q4 | 24.16 tok/s | 4GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | 24.14 tok/s | 4GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | 24.07 tok/s | 4GB |
| huggyllama/llama-7b | Q4 | 24.06 tok/s | 4GB |
| zai-org/GLM-4.5-Air | Q4 | 24.05 tok/s | 4GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q4 | 24.02 tok/s | 4GB |
| openai-community/gpt2-xl | Q4 | 24.01 tok/s | 4GB |
| microsoft/phi-4 | Q4 | 24.00 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1 | Q4 | 23.98 tok/s | 4GB |
| numind/NuExtract-1.5 | Q4 | 23.93 tok/s | 4GB |
| vikhyatk/moondream2 | Q4 | 23.80 tok/s | 4GB |
| BSC-LT/salamandraTA-7b-instruct | Q4 | 23.79 tok/s | 4GB |
| meta-llama/Meta-Llama-3-8B | Q4 | 23.72 tok/s | 4GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q4 | 23.69 tok/s | 4GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q4 | 23.68 tok/s | 4GB |
| ibm-granite/granite-3.3-8b-instruct | Q4 | 23.48 tok/s | 4GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q4 | 23.44 tok/s | 5GB |
| Qwen/Qwen2.5-3B | Q8 | 23.38 tok/s | 3GB |
| meta-llama/Llama-3.1-8B | Q4 | 23.34 tok/s | 4GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q8 | 23.34 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q8 | 23.34 tok/s | 4GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q4 | 23.26 tok/s | 4GB |
| Qwen/Qwen3-4B | Q8 | 23.21 tok/s | 4GB |
| unsloth/Llama-3.2-3B-Instruct | Q8 | 23.02 tok/s | 3GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q4 | 22.87 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q8 | 22.81 tok/s | 5GB |
| Qwen/Qwen3-8B-FP8 | Q4 | 22.78 tok/s | 4GB |
| Qwen/Qwen2.5-3B-Instruct | Q8 | 22.74 tok/s | 3GB |
| Qwen/Qwen2.5-0.5B | Q8 | 22.63 tok/s | 5GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q8 | 22.60 tok/s | 4GB |
| meta-llama/Llama-2-13b-chat-hf | Q4 | 22.24 tok/s | 7GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | 22.16 tok/s | 5GB |
| Qwen/Qwen2.5-14B | Q4 | 22.10 tok/s | 7GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q8 | 21.82 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q8 | 21.82 tok/s | 4GB |
| Qwen/Qwen2-1.5B-Instruct | Q8 | 21.77 tok/s | 5GB |
| Qwen/Qwen3-4B-Base | Q8 | 21.60 tok/s | 4GB |
| Qwen/Qwen2.5-1.5B | Q8 | 21.53 tok/s | 5GB |
| Qwen/Qwen2-0.5B-Instruct | Q8 | 21.51 tok/s | 5GB |
| Qwen/Qwen3-14B | Q4 | 21.35 tok/s | 7GB |
| Qwen/Qwen2-0.5B | Q8 | 21.10 tok/s | 5GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q8 | 21.06 tok/s | 4GB |
| microsoft/VibeVoice-1.5B | Q8 | 20.77 tok/s | 5GB |
| Qwen/Qwen3-14B-Base | Q4 | 20.57 tok/s | 7GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q8 | 20.56 tok/s | 5GB |
| Qwen/Qwen3-0.6B-Base | Q8 | 20.27 tok/s | 6GB |
| Qwen/Qwen3-1.7B | Q8 | 20.26 tok/s | 7GB |
| Qwen/Qwen3-Embedding-0.6B | Q8 | 20.24 tok/s | 6GB |
| google/gemma-3-270m-it | Q8 | 20.23 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | 20.22 tok/s | 7GB |
| sshleifer/tiny-gpt2 | Q8 | 20.14 tok/s | 7GB |
| microsoft/phi-4 | Q8 | 20.14 tok/s | 7GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | 20.12 tok/s | 7GB |
| microsoft/DialoGPT-small | Q8 | 20.06 tok/s | 7GB |
| parler-tts/parler-tts-large-v1 | Q8 | 20.06 tok/s | 7GB |
| Qwen/Qwen3-0.6B | Q8 | 20.06 tok/s | 6GB |
| llamafactory/tiny-random-Llama-3 | Q8 | 20.02 tok/s | 7GB |
| ibm-granite/granite-docling-258M | Q8 | 19.96 tok/s | 7GB |
| Qwen/Qwen3-1.7B-Base | Q8 | 19.91 tok/s | 7GB |
| meta-llama/Llama-2-7b-hf | Q8 | 19.90 tok/s | 7GB |
| microsoft/Phi-3-mini-128k-instruct | Q8 | 19.89 tok/s | 7GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q8 | 19.75 tok/s | 7GB |
| Qwen/Qwen2.5-Coder-1.5B | Q8 | 19.74 tok/s | 5GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | 19.70 tok/s | 7GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q8 | 19.67 tok/s | 5GB |
| microsoft/phi-2 | Q8 | 19.65 tok/s | 7GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | 19.64 tok/s | 7GB |
| ai-forever/ruGPT-3.5-13B | Q4 | 19.56 tok/s | 7GB |
| EleutherAI/pythia-70m-deduped | Q8 | 19.53 tok/s | 7GB |
| HuggingFaceH4/zephyr-7b-beta | Q8 | 19.37 tok/s | 7GB |
| petals-team/StableBeluga2 | Q8 | 19.35 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-0528 | Q8 | 19.28 tok/s | 7GB |
| Qwen/Qwen2.5-Math-1.5B | Q8 | 19.27 tok/s | 5GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q8 | 19.27 tok/s | 8GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | 19.25 tok/s | 8GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q8 | 19.24 tok/s | 8GB |
| bigscience/bloomz-560m | Q8 | 19.23 tok/s | 7GB |
| deepseek-ai/DeepSeek-V3-0324 | Q8 | 19.21 tok/s | 7GB |
| GSAI-ML/LLaDA-8B-Base | Q8 | 19.21 tok/s | 8GB |
| meta-llama/Meta-Llama-3-8B | Q8 | 19.20 tok/s | 8GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q8 | 19.16 tok/s | 8GB |
| openai-community/gpt2 | Q8 | 19.09 tok/s | 7GB |
| dicta-il/dictalm2.0-instruct | Q8 | 18.95 tok/s | 7GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | 18.92 tok/s | 7GB |
| ibm-granite/granite-3.3-8b-instruct | Q8 | 18.87 tok/s | 8GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q8 | 18.87 tok/s | 5GB |
| Qwen/Qwen2.5-7B | Q8 | 18.84 tok/s | 7GB |
| openai-community/gpt2-medium | Q8 | 18.84 tok/s | 7GB |
| huggyllama/llama-7b | Q8 | 18.83 tok/s | 7GB |
| unsloth/gpt-oss-20b-BF16 | Q4 | 18.78 tok/s | 10GB |
| openai-community/gpt2-large | Q8 | 18.74 tok/s | 7GB |
| deepseek-ai/DeepSeek-V3.1 | Q8 | 18.74 tok/s | 7GB |
| meta-llama/Llama-2-7b-chat-hf | Q8 | 18.73 tok/s | 7GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | 18.65 tok/s | 8GB |
| liuhaotian/llava-v1.5-7b | Q8 | 18.59 tok/s | 7GB |
| EleutherAI/gpt-neo-125m | Q8 | 18.55 tok/s | 7GB |
| OpenPipe/Qwen3-14B-Instruct | Q4 | 18.49 tok/s | 7GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q8 | 18.44 tok/s | 8GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q8 | 18.43 tok/s | 7GB |
| openai/gpt-oss-20b | Q4 | 18.39 tok/s | 10GB |
| rinna/japanese-gpt-neox-small | Q8 | 18.29 tok/s | 7GB |
| lmsys/vicuna-7b-v1.5 | Q8 | 18.20 tok/s | 7GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q8 | 18.19 tok/s | 9GB |
| distilbert/distilgpt2 | Q8 | 18.16 tok/s | 7GB |
| Qwen/Qwen3-8B | Q8 | 18.12 tok/s | 8GB |
| rednote-hilab/dots.ocr | Q8 | 18.04 tok/s | 7GB |
| Qwen/Qwen3-Embedding-8B | Q8 | 18.00 tok/s | 8GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q8 | 17.99 tok/s | 7GB |
| Qwen/Qwen3-Reranker-0.6B | Q8 | 17.98 tok/s | 6GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q8 | 17.94 tok/s | 7GB |
| MiniMaxAI/MiniMax-M2 | Q8 | 17.88 tok/s | 7GB |
| deepseek-ai/DeepSeek-V3 | Q8 | 17.87 tok/s | 7GB |
| BSC-LT/salamandraTA-7b-instruct | Q8 | 17.86 tok/s | 7GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q8 | 17.86 tok/s | 8GB |
| deepseek-ai/DeepSeek-R1 | Q8 | 17.82 tok/s | 7GB |
| Qwen/Qwen3-8B-FP8 | Q8 | 17.77 tok/s | 8GB |
| hmellor/tiny-random-LlamaForCausalLM | Q8 | 17.75 tok/s | 7GB |
| microsoft/Phi-3-mini-4k-instruct | Q8 | 17.72 tok/s | 7GB |
| openai-community/gpt2-xl | Q8 | 17.62 tok/s | 7GB |
| vikhyatk/moondream2 | Q8 | 17.61 tok/s | 7GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q8 | 17.53 tok/s | 7GB |
| GSAI-ML/LLaDA-8B-Instruct | Q8 | 17.47 tok/s | 8GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q8 | 17.47 tok/s | 7GB |
| skt/kogpt2-base-v2 | Q8 | 17.45 tok/s | 7GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q8 | 17.44 tok/s | 7GB |
| meta-llama/Llama-3.1-8B | Q8 | 17.42 tok/s | 8GB |
| HuggingFaceTB/SmolLM2-135M | Q8 | 17.40 tok/s | 7GB |
| facebook/opt-125m | Q8 | 17.38 tok/s | 7GB |
| microsoft/Phi-3.5-vision-instruct | Q8 | 17.35 tok/s | 7GB |
| Qwen/Qwen3-8B-Base | Q8 | 17.30 tok/s | 8GB |
| zai-org/GLM-4.6-FP8 | Q8 | 17.22 tok/s | 7GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q4 | 17.08 tok/s | 10GB |
| microsoft/Phi-4-multimodal-instruct | Q8 | 17.02 tok/s | 7GB |
| Qwen/Qwen2-7B-Instruct | Q8 | 17.00 tok/s | 7GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q8 | 16.83 tok/s | 8GB |
| zai-org/GLM-4.5-Air | Q8 | 16.76 tok/s | 7GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q8 | 16.75 tok/s | 7GB |
| mistralai/Mistral-7B-v0.1 | Q8 | 16.75 tok/s | 7GB |
| microsoft/Phi-4-mini-instruct | Q8 | 16.75 tok/s | 7GB |
| HuggingFaceTB/SmolLM-135M | Q8 | 16.66 tok/s | 7GB |
| microsoft/DialoGPT-medium | Q8 | 16.63 tok/s | 7GB |
| numind/NuExtract-1.5 | Q8 | 16.63 tok/s | 7GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q4 | 16.53 tok/s | 10GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | 16.51 tok/s | 8GB |
| meta-llama/Llama-Guard-3-8B | Q8 | 16.47 tok/s | 8GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q4 | 16.01 tok/s | 15GB |
| ai-forever/ruGPT-3.5-13B | Q8 | 15.97 tok/s | 13GB |
| IlyaGusev/saiga_llama3_8b | Q8 | 15.81 tok/s | 8GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q4 | 15.67 tok/s | 15GB |
| Qwen/Qwen3-32B | Q4 | 15.33 tok/s | 16GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q4 | 15.30 tok/s | 15GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q4 | 15.14 tok/s | 15GB |
| OpenPipe/Qwen3-14B-Instruct | Q8 | 15.12 tok/s | 14GB |
| Qwen/Qwen2.5-32B-Instruct | Q4 | 14.83 tok/s | 16GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q4 | 14.76 tok/s | 15GB |
| Qwen/Qwen3-30B-A3B | Q4 | 14.67 tok/s | 15GB |
| codellama/CodeLlama-34b-hf | Q4 | 14.66 tok/s | 17GB |
| meta-llama/Llama-2-13b-chat-hf | Q8 | 14.50 tok/s | 13GB |
| Qwen/Qwen2.5-32B | Q4 | 14.35 tok/s | 16GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q4 | 13.79 tok/s | 15GB |
| Qwen/Qwen2.5-14B-Instruct | Q8 | 13.69 tok/s | 14GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q4 | 13.69 tok/s | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q4 | 13.58 tok/s | 15GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q4 | 13.55 tok/s | 16GB |
| baichuan-inc/Baichuan-M2-32B | Q4 | 13.35 tok/s | 16GB |
| Qwen/Qwen3-14B | Q8 | 13.17 tok/s | 14GB |
| Qwen/Qwen3-14B-Base | Q8 | 13.16 tok/s | 14GB |
| Qwen/Qwen2.5-14B | Q8 | 13.10 tok/s | 14GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q4 | 12.87 tok/s | 16GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q4 | 12.75 tok/s | 17GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q8 | 12.34 tok/s | 20GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q8 | 12.31 tok/s | 20GB |
| openai/gpt-oss-20b | Q8 | 12.31 tok/s | 20GB |
| unsloth/gpt-oss-20b-BF16 | Q8 | 11.67 tok/s | 20GB |
Note: Performance figures are calculated estimates, not measured benchmarks; real-world results may vary.
The compatibility check below tests specific models against the RX 7900 XT's 20GB VRAM budget:

| Model | Quantization | Verdict | Estimated speed | VRAM needed |
|---|---|---|---|---|
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q8 | Not supported | — | 80GB (have 20GB) |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q4 | Not supported | — | 40GB (have 20GB) |
| ai-forever/ruGPT-3.5-13B | Q8 | Fits comfortably | 15.97 tok/s | 13GB (have 20GB) |
| ai-forever/ruGPT-3.5-13B | Q4 | Fits comfortably | 19.56 tok/s | 7GB (have 20GB) |
| baichuan-inc/Baichuan-M2-32B | Q8 | Not supported | — | 32GB (have 20GB) |
| baichuan-inc/Baichuan-M2-32B | Q4 | Fits comfortably | 13.35 tok/s | 16GB (have 20GB) |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q8 | Fits comfortably | 17.44 tok/s | 7GB (have 20GB) |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 24.46 tok/s | 4GB (have 20GB) |
| ibm-granite/granite-3.3-8b-instruct | Q8 | Fits comfortably | 18.87 tok/s | 8GB (have 20GB) |
| ibm-granite/granite-3.3-8b-instruct | Q4 | Fits comfortably | 23.48 tok/s | 4GB (have 20GB) |
| Qwen/Qwen3-1.7B-Base | Q8 | Fits comfortably | 19.91 tok/s | 7GB (have 20GB) |
| Qwen/Qwen3-1.7B-Base | Q4 | Fits comfortably | 28.94 tok/s | 4GB (have 20GB) |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q8 | Fits (tight) | 12.34 tok/s | 20GB (have 20GB) |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q4 | Fits comfortably | 16.53 tok/s | 10GB (have 20GB) |
| BSC-LT/salamandraTA-7b-instruct | Q8 | Fits comfortably | 17.86 tok/s | 7GB (have 20GB) |
| BSC-LT/salamandraTA-7b-instruct | Q4 | Fits comfortably | 23.79 tok/s | 4GB (have 20GB) |
| dicta-il/dictalm2.0-instruct | Q8 | Fits comfortably | 18.95 tok/s | 7GB (have 20GB) |
| dicta-il/dictalm2.0-instruct | Q4 | Fits comfortably | 26.17 tok/s | 4GB (have 20GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q8 | Not supported | — | 30GB (have 20GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q4 | Fits comfortably | 14.76 tok/s | 15GB (have 20GB) |
| GSAI-ML/LLaDA-8B-Base | Q8 | Fits comfortably | 19.21 tok/s | 8GB (have 20GB) |
| GSAI-ML/LLaDA-8B-Base | Q4 | Fits comfortably | 26.79 tok/s | 4GB (have 20GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q8 | Not supported | — | 30GB (have 20GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q4 | Fits comfortably | 15.30 tok/s | 15GB (have 20GB) |
| Qwen/Qwen2-0.5B-Instruct | Q8 | Fits comfortably | 21.51 tok/s | 5GB (have 20GB) |
| Qwen/Qwen2-0.5B-Instruct | Q4 | Fits comfortably | 31.39 tok/s | 3GB (have 20GB) |
| deepseek-ai/DeepSeek-V3 | Q8 | Fits comfortably | 17.87 tok/s | 7GB (have 20GB) |
| deepseek-ai/DeepSeek-V3 | Q4 | Fits comfortably | 25.09 tok/s | 4GB (have 20GB) |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q8 | Fits comfortably | 18.87 tok/s | 5GB (have 20GB) |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | Fits comfortably | 26.85 tok/s | 3GB (have 20GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q8 | Not supported | — | 30GB (have 20GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q4 | Fits comfortably | 16.01 tok/s | 15GB (have 20GB) |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q8 | Not supported | — | 30GB (have 20GB) |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q4 | Fits comfortably | 15.67 tok/s | 15GB (have 20GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q8 | Not supported | — | 30GB (have 20GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q4 | Fits comfortably | 13.58 tok/s | 15GB (have 20GB) |
| AI-MO/Kimina-Prover-72B | Q8 | Not supported | — | 72GB (have 20GB) |
| AI-MO/Kimina-Prover-72B | Q4 | Not supported | — | 36GB (have 20GB) |
| apple/OpenELM-1_1B-Instruct | Q8 | Fits comfortably | 40.19 tok/s | 1GB (have 20GB) |
| apple/OpenELM-1_1B-Instruct | Q4 | Fits comfortably | 57.92 tok/s | 1GB (have 20GB) |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 17.86 tok/s | 8GB (have 20GB) |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 25.70 tok/s | 4GB (have 20GB) |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q8 | Fits comfortably | 18.19 tok/s | 9GB (have 20GB) |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q4 | Fits comfortably | 23.44 tok/s | 5GB (have 20GB) |
| Qwen/Qwen2.5-3B | Q8 | Fits comfortably | 23.38 tok/s | 3GB (have 20GB) |
| Qwen/Qwen2.5-3B | Q4 | Fits comfortably | 37.72 tok/s | 2GB (have 20GB) |
| lmsys/vicuna-7b-v1.5 | Q8 | Fits comfortably | 18.20 tok/s | 7GB (have 20GB) |
| lmsys/vicuna-7b-v1.5 | Q4 | Fits comfortably | 28.49 tok/s | 4GB (have 20GB) |
| meta-llama/Llama-2-13b-chat-hf | Q8 | Fits comfortably | 14.50 tok/sEstimated | 13GB (have 20GB) |
| meta-llama/Llama-2-13b-chat-hf | Q4 | Fits comfortably | 22.24 tok/sEstimated | 7GB (have 20GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q8 | Not supported | — | 80GB (have 20GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q4 | Not supported | — | 40GB (have 20GB) |
| unsloth/gemma-3-1b-it | Q8 | Fits comfortably | 38.85 tok/sEstimated | 1GB (have 20GB) |
| unsloth/gemma-3-1b-it | Q4 | Fits comfortably | 48.65 tok/sEstimated | 1GB (have 20GB) |
| bigcode/starcoder2-3b | Q8 | Fits comfortably | 25.37 tok/sEstimated | 3GB (have 20GB) |
| bigcode/starcoder2-3b | Q4 | Fits comfortably | 33.24 tok/sEstimated | 2GB (have 20GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q8 | Not supported | — | 80GB (have 20GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q4 | Not supported | — | 40GB (have 20GB) |
| ibm-granite/granite-docling-258M | Q8 | Fits comfortably | 19.96 tok/sEstimated | 7GB (have 20GB) |
| ibm-granite/granite-docling-258M | Q4 | Fits comfortably | 25.88 tok/sEstimated | 4GB (have 20GB) |
| skt/kogpt2-base-v2 | Q8 | Fits comfortably | 17.45 tok/sEstimated | 7GB (have 20GB) |
| skt/kogpt2-base-v2 | Q4 | Fits comfortably | 24.63 tok/sEstimated | 4GB (have 20GB) |
| google/gemma-3-270m-it | Q8 | Fits comfortably | 20.23 tok/sEstimated | 7GB (have 20GB) |
| google/gemma-3-270m-it | Q4 | Fits comfortably | 26.93 tok/sEstimated | 4GB (have 20GB) |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q8 | Fits comfortably | 21.82 tok/sEstimated | 4GB (have 20GB) |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | Fits comfortably | 33.44 tok/sEstimated | 2GB (have 20GB) |
| Qwen/Qwen2.5-32B | Q8 | Not supported | — | 32GB (have 20GB) |
| Qwen/Qwen2.5-32B | Q4 | Fits comfortably | 14.35 tok/sEstimated | 16GB (have 20GB) |
| parler-tts/parler-tts-large-v1 | Q8 | Fits comfortably | 20.06 tok/sEstimated | 7GB (have 20GB) |
| parler-tts/parler-tts-large-v1 | Q4 | Fits comfortably | 27.75 tok/sEstimated | 4GB (have 20GB) |
| EleutherAI/pythia-70m-deduped | Q8 | Fits comfortably | 19.53 tok/sEstimated | 7GB (have 20GB) |
| EleutherAI/pythia-70m-deduped | Q4 | Fits comfortably | 28.93 tok/sEstimated | 4GB (have 20GB) |
| microsoft/VibeVoice-1.5B | Q8 | Fits comfortably | 20.77 tok/sEstimated | 5GB (have 20GB) |
| microsoft/VibeVoice-1.5B | Q4 | Fits comfortably | 31.32 tok/sEstimated | 3GB (have 20GB) |
| ibm-granite/granite-3.3-2b-instruct | Q8 | Fits comfortably | 29.60 tok/sEstimated | 2GB (have 20GB) |
| ibm-granite/granite-3.3-2b-instruct | Q4 | Fits comfortably | 42.34 tok/sEstimated | 1GB (have 20GB) |
| Qwen/Qwen2.5-72B-Instruct | Q8 | Not supported | — | 72GB (have 20GB) |
| Qwen/Qwen2.5-72B-Instruct | Q4 | Not supported | — | 36GB (have 20GB) |
| liuhaotian/llava-v1.5-7b | Q8 | Fits comfortably | 18.59 tok/sEstimated | 7GB (have 20GB) |
| liuhaotian/llava-v1.5-7b | Q4 | Fits comfortably | 26.77 tok/sEstimated | 4GB (have 20GB) |
| google/gemma-2b | Q8 | Fits comfortably | 29.87 tok/sEstimated | 2GB (have 20GB) |
| google/gemma-2b | Q4 | Fits comfortably | 39.15 tok/sEstimated | 1GB (have 20GB) |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q8 | Fits comfortably | 17.99 tok/sEstimated | 7GB (have 20GB) |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q4 | Fits comfortably | 25.48 tok/sEstimated | 4GB (have 20GB) |
| Qwen/Qwen3-235B-A22B | Q8 | Not supported | — | 235GB (have 20GB) |
| Qwen/Qwen3-235B-A22B | Q4 | Not supported | — | 118GB (have 20GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q8 | Fits comfortably | 16.83 tok/sEstimated | 8GB (have 20GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q4 | Fits comfortably | 22.87 tok/sEstimated | 4GB (have 20GB) |
| microsoft/Phi-4-mini-instruct | Q8 | Fits comfortably | 16.75 tok/sEstimated | 7GB (have 20GB) |
| microsoft/Phi-4-mini-instruct | Q4 | Fits comfortably | 26.58 tok/sEstimated | 4GB (have 20GB) |
| llamafactory/tiny-random-Llama-3 | Q8 | Fits comfortably | 20.02 tok/sEstimated | 7GB (have 20GB) |
| llamafactory/tiny-random-Llama-3 | Q4 | Fits comfortably | 25.31 tok/sEstimated | 4GB (have 20GB) |
| HuggingFaceH4/zephyr-7b-beta | Q8 | Fits comfortably | 19.37 tok/sEstimated | 7GB (have 20GB) |
| HuggingFaceH4/zephyr-7b-beta | Q4 | Fits comfortably | 27.35 tok/sEstimated | 4GB (have 20GB) |
| Qwen/Qwen3-4B-Thinking-2507 | Q8 | Fits comfortably | 23.34 tok/sEstimated | 4GB (have 20GB) |
| Qwen/Qwen3-4B-Thinking-2507 | Q4 | Fits comfortably | 34.51 tok/sEstimated | 2GB (have 20GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q8 | Not supported | — | 30GB (have 20GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q4 | Fits comfortably | 15.14 tok/sEstimated | 15GB (have 20GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q8 | Fits comfortably | 19.16 tok/sEstimated | 8GB (have 20GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q4 | Fits comfortably | 23.68 tok/sEstimated | 4GB (have 20GB) |
| unsloth/Llama-3.2-1B-Instruct | Q8 | Fits comfortably | 38.64 tok/sEstimated | 1GB (have 20GB) |
| unsloth/Llama-3.2-1B-Instruct | Q4 | Fits comfortably | 55.74 tok/sEstimated | 1GB (have 20GB) |
| GSAI-ML/LLaDA-8B-Instruct | Q8 | Fits comfortably | 17.47 tok/sEstimated | 8GB (have 20GB) |
| GSAI-ML/LLaDA-8B-Instruct | Q4 | Fits comfortably | 25.68 tok/sEstimated | 4GB (have 20GB) |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q8 | Not supported | — | 90GB (have 20GB) |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q4 | Not supported | — | 45GB (have 20GB) |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q8 | Fits comfortably | 17.53 tok/sEstimated | 7GB (have 20GB) |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q4 | Fits comfortably | 24.02 tok/sEstimated | 4GB (have 20GB) |
| numind/NuExtract-1.5 | Q8 | Fits comfortably | 16.63 tok/sEstimated | 7GB (have 20GB) |
| numind/NuExtract-1.5 | Q4 | Fits comfortably | 23.93 tok/sEstimated | 4GB (have 20GB) |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q8 | Fits comfortably | 19.75 tok/sEstimated | 7GB (have 20GB) |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q4 | Fits comfortably | 24.93 tok/sEstimated | 4GB (have 20GB) |
| hmellor/tiny-random-LlamaForCausalLM | Q8 | Fits comfortably | 17.75 tok/sEstimated | 7GB (have 20GB) |
| hmellor/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 26.50 tok/sEstimated | 4GB (have 20GB) |
| huggyllama/llama-7b | Q8 | Fits comfortably | 18.83 tok/sEstimated | 7GB (have 20GB) |
| huggyllama/llama-7b | Q4 | Fits comfortably | 24.06 tok/sEstimated | 4GB (have 20GB) |
| deepseek-ai/DeepSeek-V3-0324 | Q8 | Fits comfortably | 19.21 tok/sEstimated | 7GB (have 20GB) |
| deepseek-ai/DeepSeek-V3-0324 | Q4 | Fits comfortably | 28.04 tok/sEstimated | 4GB (have 20GB) |
| microsoft/Phi-3-mini-128k-instruct | Q8 | Fits comfortably | 19.89 tok/sEstimated | 7GB (have 20GB) |
| microsoft/Phi-3-mini-128k-instruct | Q4 | Fits comfortably | 28.72 tok/sEstimated | 4GB (have 20GB) |
| sshleifer/tiny-gpt2 | Q8 | Fits comfortably | 20.14 tok/sEstimated | 7GB (have 20GB) |
| sshleifer/tiny-gpt2 | Q4 | Fits comfortably | 28.01 tok/sEstimated | 4GB (have 20GB) |
| meta-llama/Llama-Guard-3-8B | Q8 | Fits comfortably | 16.47 tok/sEstimated | 8GB (have 20GB) |
| meta-llama/Llama-Guard-3-8B | Q4 | Fits comfortably | 26.16 tok/sEstimated | 4GB (have 20GB) |
| openai-community/gpt2-xl | Q8 | Fits comfortably | 17.62 tok/sEstimated | 7GB (have 20GB) |
| openai-community/gpt2-xl | Q4 | Fits comfortably | 24.01 tok/sEstimated | 4GB (have 20GB) |
| OpenPipe/Qwen3-14B-Instruct | Q8 | Fits comfortably | 15.12 tok/sEstimated | 14GB (have 20GB) |
| OpenPipe/Qwen3-14B-Instruct | Q4 | Fits comfortably | 18.49 tok/sEstimated | 7GB (have 20GB) |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q8 | Not supported | — | 70GB (have 20GB) |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q4 | Not supported | — | 35GB (have 20GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q8 | Fits comfortably | 23.34 tok/sEstimated | 4GB (have 20GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q4 | Fits comfortably | 29.45 tok/sEstimated | 2GB (have 20GB) |
| ibm-research/PowerMoE-3b | Q8 | Fits comfortably | 26.82 tok/sEstimated | 3GB (have 20GB) |
| ibm-research/PowerMoE-3b | Q4 | Fits comfortably | 32.54 tok/sEstimated | 2GB (have 20GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q8 | Fits comfortably | 22.60 tok/sEstimated | 4GB (have 20GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q4 | Fits comfortably | 33.99 tok/sEstimated | 2GB (have 20GB) |
| unsloth/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 23.02 tok/sEstimated | 3GB (have 20GB) |
| unsloth/Llama-3.2-3B-Instruct | Q4 | Fits comfortably | 36.10 tok/sEstimated | 2GB (have 20GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q8 | Fits comfortably | 21.82 tok/sEstimated | 4GB (have 20GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q4 | Fits comfortably | 33.74 tok/sEstimated | 2GB (have 20GB) |
| meta-llama/Llama-3.2-3B | Q8 | Fits comfortably | 26.22 tok/sEstimated | 3GB (have 20GB) |
| meta-llama/Llama-3.2-3B | Q4 | Fits comfortably | 39.08 tok/sEstimated | 2GB (have 20GB) |
| EleutherAI/gpt-neo-125m | Q8 | Fits comfortably | 18.55 tok/sEstimated | 7GB (have 20GB) |
| EleutherAI/gpt-neo-125m | Q4 | Fits comfortably | 26.50 tok/sEstimated | 4GB (have 20GB) |
| codellama/CodeLlama-34b-hf | Q8 | Not supported | — | 34GB (have 20GB) |
| codellama/CodeLlama-34b-hf | Q4 | Fits comfortably | 14.66 tok/sEstimated | 17GB (have 20GB) |
| meta-llama/Llama-Guard-3-1B | Q8 | Fits comfortably | 37.81 tok/sEstimated | 1GB (have 20GB) |
| meta-llama/Llama-Guard-3-1B | Q4 | Fits comfortably | 55.06 tok/sEstimated | 1GB (have 20GB) |
| Qwen/Qwen2-1.5B-Instruct | Q8 | Fits comfortably | 21.77 tok/sEstimated | 5GB (have 20GB) |
| Qwen/Qwen2-1.5B-Instruct | Q4 | Fits comfortably | 31.87 tok/sEstimated | 3GB (have 20GB) |
| google/gemma-2-2b-it | Q8 | Fits comfortably | 26.13 tok/sEstimated | 2GB (have 20GB) |
| google/gemma-2-2b-it | Q4 | Fits comfortably | 40.39 tok/sEstimated | 1GB (have 20GB) |
| Qwen/Qwen2.5-14B | Q8 | Fits comfortably | 13.10 tok/sEstimated | 14GB (have 20GB) |
| Qwen/Qwen2.5-14B | Q4 | Fits comfortably | 22.10 tok/sEstimated | 7GB (have 20GB) |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q8 | Not supported | — | 32GB (have 20GB) |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q4 | Fits comfortably | 12.87 tok/sEstimated | 16GB (have 20GB) |
| microsoft/Phi-3.5-mini-instruct | Q8 | Fits comfortably | 18.92 tok/sEstimated | 7GB (have 20GB) |
| microsoft/Phi-3.5-mini-instruct | Q4 | Fits comfortably | 27.94 tok/sEstimated | 4GB (have 20GB) |
| Qwen/Qwen3-4B-Base | Q8 | Fits comfortably | 21.60 tok/sEstimated | 4GB (have 20GB) |
| Qwen/Qwen3-4B-Base | Q4 | Fits comfortably | 31.82 tok/sEstimated | 2GB (have 20GB) |
| Qwen/Qwen2-7B-Instruct | Q8 | Fits comfortably | 17.00 tok/sEstimated | 7GB (have 20GB) |
| Qwen/Qwen2-7B-Instruct | Q4 | Fits comfortably | 27.10 tok/sEstimated | 4GB (have 20GB) |
| meta-llama/Llama-2-7b-chat-hf | Q8 | Fits comfortably | 18.73 tok/sEstimated | 7GB (have 20GB) |
| meta-llama/Llama-2-7b-chat-hf | Q4 | Fits comfortably | 25.72 tok/sEstimated | 4GB (have 20GB) |
| Qwen/Qwen3-14B-Base | Q8 | Fits comfortably | 13.16 tok/sEstimated | 14GB (have 20GB) |
| Qwen/Qwen3-14B-Base | Q4 | Fits comfortably | 20.57 tok/sEstimated | 7GB (have 20GB) |
| swiss-ai/Apertus-8B-Instruct-2509 | Q8 | Fits comfortably | 19.27 tok/sEstimated | 8GB (have 20GB) |
| swiss-ai/Apertus-8B-Instruct-2509 | Q4 | Fits comfortably | 25.68 tok/sEstimated | 4GB (have 20GB) |
| microsoft/Phi-3.5-vision-instruct | Q8 | Fits comfortably | 17.35 tok/sEstimated | 7GB (have 20GB) |
| microsoft/Phi-3.5-vision-instruct | Q4 | Fits comfortably | 25.52 tok/sEstimated | 4GB (have 20GB) |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q8 | Fits comfortably | 16.75 tok/sEstimated | 7GB (have 20GB) |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q4 | Fits comfortably | 27.87 tok/sEstimated | 4GB (have 20GB) |
| rinna/japanese-gpt-neox-small | Q8 | Fits comfortably | 18.29 tok/sEstimated | 7GB (have 20GB) |
| rinna/japanese-gpt-neox-small | Q4 | Fits comfortably | 27.50 tok/sEstimated | 4GB (have 20GB) |
| Qwen/Qwen2.5-Coder-1.5B | Q8 | Fits comfortably | 19.74 tok/sEstimated | 5GB (have 20GB) |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | Fits comfortably | 31.21 tok/sEstimated | 3GB (have 20GB) |
| IlyaGusev/saiga_llama3_8b | Q8 | Fits comfortably | 15.81 tok/sEstimated | 8GB (have 20GB) |
| IlyaGusev/saiga_llama3_8b | Q4 | Fits comfortably | 27.50 tok/sEstimated | 4GB (have 20GB) |
| Qwen/Qwen3-30B-A3B | Q8 | Not supported | — | 30GB (have 20GB) |
| Qwen/Qwen3-30B-A3B | Q4 | Fits comfortably | 14.67 tok/sEstimated | 15GB (have 20GB) |
| deepseek-ai/DeepSeek-R1 | Q8 | Fits comfortably | 17.82 tok/sEstimated | 7GB (have 20GB) |
| deepseek-ai/DeepSeek-R1 | Q4 | Fits comfortably | 23.98 tok/sEstimated | 4GB (have 20GB) |
| microsoft/DialoGPT-small | Q8 | Fits comfortably | 20.06 tok/sEstimated | 7GB (have 20GB) |
| microsoft/DialoGPT-small | Q4 | Fits comfortably | 28.10 tok/sEstimated | 4GB (have 20GB) |
| Qwen/Qwen3-8B-FP8 | Q8 | Fits comfortably | 17.77 tok/sEstimated | 8GB (have 20GB) |
| Qwen/Qwen3-8B-FP8 | Q4 | Fits comfortably | 22.78 tok/sEstimated | 4GB (have 20GB) |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q8 | Not supported | — | 30GB (have 20GB) |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q4 | Fits comfortably | 13.79 tok/sEstimated | 15GB (have 20GB) |
| Qwen/Qwen3-Embedding-4B | Q8 | Fits comfortably | 24.57 tok/sEstimated | 4GB (have 20GB) |
| Qwen/Qwen3-Embedding-4B | Q4 | Fits comfortably | 29.53 tok/sEstimated | 2GB (have 20GB) |
| microsoft/Phi-4-multimodal-instruct | Q8 | Fits comfortably | 17.02 tok/sEstimated | 7GB (have 20GB) |
| microsoft/Phi-4-multimodal-instruct | Q4 | Fits comfortably | 25.61 tok/sEstimated | 4GB (have 20GB) |
| Qwen/Qwen3-8B-Base | Q8 | Fits comfortably | 17.30 tok/sEstimated | 8GB (have 20GB) |
| Qwen/Qwen3-8B-Base | Q4 | Fits comfortably | 26.46 tok/sEstimated | 4GB (have 20GB) |
| Qwen/Qwen3-0.6B-Base | Q8 | Fits comfortably | 20.27 tok/sEstimated | 6GB (have 20GB) |
| Qwen/Qwen3-0.6B-Base | Q4 | Fits comfortably | 27.39 tok/sEstimated | 3GB (have 20GB) |
| openai-community/gpt2-medium | Q8 | Fits comfortably | 18.84 tok/sEstimated | 7GB (have 20GB) |
| openai-community/gpt2-medium | Q4 | Fits comfortably | 27.87 tok/sEstimated | 4GB (have 20GB) |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q8 | Fits comfortably | 18.43 tok/sEstimated | 7GB (have 20GB) |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 26.87 tok/sEstimated | 4GB (have 20GB) |
| Qwen/Qwen2.5-Math-1.5B | Q8 | Fits comfortably | 19.27 tok/sEstimated | 5GB (have 20GB) |
| Qwen/Qwen2.5-Math-1.5B | Q4 | Fits comfortably | 27.98 tok/sEstimated | 3GB (have 20GB) |
| HuggingFaceTB/SmolLM-135M | Q8 | Fits comfortably | 16.66 tok/sEstimated | 7GB (have 20GB) |
| HuggingFaceTB/SmolLM-135M | Q4 | Fits comfortably | 25.16 tok/sEstimated | 4GB (have 20GB) |
| unsloth/gpt-oss-20b-BF16 | Q8 | Fits (tight) | 11.67 tok/sEstimated | 20GB (have 20GB) |
| unsloth/gpt-oss-20b-BF16 | Q4 | Fits comfortably | 18.78 tok/sEstimated | 10GB (have 20GB) |
| meta-llama/Meta-Llama-3-70B-Instruct | Q8 | Not supported | — | 70GB (have 20GB) |
| meta-llama/Meta-Llama-3-70B-Instruct | Q4 | Not supported | — | 35GB (have 20GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 19.24 tok/sEstimated | 8GB (have 20GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 23.26 tok/sEstimated | 4GB (have 20GB) |
| zai-org/GLM-4.5-Air | Q8 | Fits comfortably | 16.76 tok/sEstimated | 7GB (have 20GB) |
| zai-org/GLM-4.5-Air | Q4 | Fits comfortably | 24.05 tok/sEstimated | 4GB (have 20GB) |
| mistralai/Mistral-7B-Instruct-v0.1 | Q8 | Fits comfortably | 17.94 tok/sEstimated | 7GB (have 20GB) |
| mistralai/Mistral-7B-Instruct-v0.1 | Q4 | Fits comfortably | 27.04 tok/sEstimated | 4GB (have 20GB) |
| LiquidAI/LFM2-1.2B | Q8 | Fits comfortably | 29.25 tok/sEstimated | 2GB (have 20GB) |
| LiquidAI/LFM2-1.2B | Q4 | Fits comfortably | 42.36 tok/sEstimated | 1GB (have 20GB) |
| mistralai/Mistral-7B-v0.1 | Q8 | Fits comfortably | 16.75 tok/sEstimated | 7GB (have 20GB) |
| mistralai/Mistral-7B-v0.1 | Q4 | Fits comfortably | 26.35 tok/sEstimated | 4GB (have 20GB) |
| Qwen/Qwen2.5-32B-Instruct | Q8 | Not supported | — | 32GB (have 20GB) |
| Qwen/Qwen2.5-32B-Instruct | Q4 | Fits comfortably | 14.83 tok/sEstimated | 16GB (have 20GB) |
| deepseek-ai/DeepSeek-R1-0528 | Q8 | Fits comfortably | 19.28 tok/sEstimated | 7GB (have 20GB) |
| deepseek-ai/DeepSeek-R1-0528 | Q4 | Fits comfortably | 25.55 tok/sEstimated | 4GB (have 20GB) |
| meta-llama/Llama-3.1-8B | Q8 | Fits comfortably | 17.42 tok/sEstimated | 8GB (have 20GB) |
| meta-llama/Llama-3.1-8B | Q4 | Fits comfortably | 23.34 tok/sEstimated | 4GB (have 20GB) |
| deepseek-ai/DeepSeek-V3.1 | Q8 | Fits comfortably | 18.74 tok/sEstimated | 7GB (have 20GB) |
| deepseek-ai/DeepSeek-V3.1 | Q4 | Fits comfortably | 28.89 tok/sEstimated | 4GB (have 20GB) |
| microsoft/phi-4 | Q8 | Fits comfortably | 20.14 tok/sEstimated | 7GB (have 20GB) |
| microsoft/phi-4 | Q4 | Fits comfortably | 24.00 tok/sEstimated | 4GB (have 20GB) |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | Fits comfortably | 26.99 tok/sEstimated | 3GB (have 20GB) |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | Fits comfortably | 33.62 tok/sEstimated | 2GB (have 20GB) |
| Qwen/Qwen2-0.5B | Q8 | Fits comfortably | 21.10 tok/sEstimated | 5GB (have 20GB) |
| Qwen/Qwen2-0.5B | Q4 | Fits comfortably | 26.86 tok/sEstimated | 3GB (have 20GB) |
| MiniMaxAI/MiniMax-M2 | Q8 | Fits comfortably | 17.88 tok/sEstimated | 7GB (have 20GB) |
| MiniMaxAI/MiniMax-M2 | Q4 | Fits comfortably | 26.15 tok/sEstimated | 4GB (have 20GB) |
| microsoft/DialoGPT-medium | Q8 | Fits comfortably | 16.63 tok/sEstimated | 7GB (have 20GB) |
| microsoft/DialoGPT-medium | Q4 | Fits comfortably | 25.31 tok/sEstimated | 4GB (have 20GB) |
| zai-org/GLM-4.6-FP8 | Q8 | Fits comfortably | 17.22 tok/sEstimated | 7GB (have 20GB) |
| zai-org/GLM-4.6-FP8 | Q4 | Fits comfortably | 28.10 tok/sEstimated | 4GB (have 20GB) |
| HuggingFaceTB/SmolLM2-135M | Q8 | Fits comfortably | 17.40 tok/sEstimated | 7GB (have 20GB) |
| HuggingFaceTB/SmolLM2-135M | Q4 | Fits comfortably | 26.96 tok/sEstimated | 4GB (have 20GB) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | Fits comfortably | 16.51 tok/sEstimated | 8GB (have 20GB) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q4 | Fits comfortably | 25.29 tok/sEstimated | 4GB (have 20GB) |
| meta-llama/Llama-2-7b-hf | Q8 | Fits comfortably | 19.90 tok/sEstimated | 7GB (have 20GB) |
| meta-llama/Llama-2-7b-hf | Q4 | Fits comfortably | 25.37 tok/sEstimated | 4GB (have 20GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | Fits comfortably | 20.22 tok/sEstimated | 7GB (have 20GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | Fits comfortably | 26.03 tok/sEstimated | 4GB (have 20GB) |
| microsoft/phi-2 | Q8 | Fits comfortably | 19.65 tok/sEstimated | 7GB (have 20GB) |
| microsoft/phi-2 | Q4 | Fits comfortably | 24.16 tok/sEstimated | 4GB (have 20GB) |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | Not supported | — | 70GB (have 20GB) |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | Not supported | — | 35GB (have 20GB) |
| Qwen/Qwen2.5-0.5B | Q8 | Fits comfortably | 22.63 tok/sEstimated | 5GB (have 20GB) |
| Qwen/Qwen2.5-0.5B | Q4 | Fits comfortably | 30.70 tok/sEstimated | 3GB (have 20GB) |
| Qwen/Qwen3-14B | Q8 | Fits comfortably | 13.17 tok/sEstimated | 14GB (have 20GB) |
| Qwen/Qwen3-14B | Q4 | Fits comfortably | 21.35 tok/sEstimated | 7GB (have 20GB) |
| Qwen/Qwen3-Embedding-8B | Q8 | Fits comfortably | 18.00 tok/sEstimated | 8GB (have 20GB) |
| Qwen/Qwen3-Embedding-8B | Q4 | Fits comfortably | 24.90 tok/sEstimated | 4GB (have 20GB) |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | Not supported | — | 70GB (have 20GB) |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | Not supported | — | 35GB (have 20GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q8 | Fits comfortably | 18.44 tok/sEstimated | 8GB (have 20GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q4 | Fits comfortably | 24.90 tok/sEstimated | 4GB (have 20GB) |
| Qwen/Qwen2.5-14B-Instruct | Q8 | Fits comfortably | 13.69 tok/sEstimated | 14GB (have 20GB) |
| Qwen/Qwen2.5-14B-Instruct | Q4 | Fits comfortably | 19.70 tok/sEstimated | 7GB (have 20GB) |
| Qwen/Qwen2.5-1.5B | Q8 | Fits comfortably | 21.53 tok/sEstimated | 5GB (have 20GB) |
| Qwen/Qwen2.5-1.5B | Q4 | Fits comfortably | 32.37 tok/sEstimated | 3GB (have 20GB) |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | Fits comfortably | 24.14 tok/sEstimated | 4GB (have 20GB) |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q4 | Fits comfortably | 30.48 tok/sEstimated | 2GB (have 20GB) |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q8 | Fits (tight) | 12.31 tok/sEstimated | 20GB (have 20GB) |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q4 | Fits comfortably | 17.08 tok/sEstimated | 10GB (have 20GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q8 | Fits comfortably | 22.81 tok/sEstimated | 5GB (have 20GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q4 | Fits comfortably | 30.76 tok/sEstimated | 3GB (have 20GB) |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | Fits comfortably | 18.65 tok/sEstimated | 8GB (have 20GB) |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | Fits comfortably | 26.43 tok/sEstimated | 4GB (have 20GB) |
| Qwen/Qwen3-Reranker-0.6B | Q8 | Fits comfortably | 17.98 tok/sEstimated | 6GB (have 20GB) |
| Qwen/Qwen3-Reranker-0.6B | Q4 | Fits comfortably | 28.58 tok/sEstimated | 3GB (have 20GB) |
| rednote-hilab/dots.ocr | Q8 | Fits comfortably | 18.04 tok/sEstimated | 7GB (have 20GB) |
| rednote-hilab/dots.ocr | Q4 | Fits comfortably | 28.39 tok/sEstimated | 4GB (have 20GB) |
| google-t5/t5-3b | Q8 | Fits comfortably | 24.39 tok/sEstimated | 3GB (have 20GB) |
| google-t5/t5-3b | Q4 | Fits comfortably | 34.65 tok/sEstimated | 2GB (have 20GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q8 | Not supported | — | 30GB (have 20GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q4 | Fits comfortably | 13.69 tok/sEstimated | 15GB (have 20GB) |
| Qwen/Qwen3-4B | Q8 | Fits comfortably | 23.21 tok/sEstimated | 4GB (have 20GB) |
| Qwen/Qwen3-4B | Q4 | Fits comfortably | 29.33 tok/sEstimated | 2GB (have 20GB) |
| Qwen/Qwen3-1.7B | Q8 | Fits comfortably | 20.26 tok/sEstimated | 7GB (have 20GB) |
| Qwen/Qwen3-1.7B | Q4 | Fits comfortably | 25.96 tok/sEstimated | 4GB (have 20GB) |
| openai-community/gpt2-large | Q8 | Fits comfortably | 18.74 tok/sEstimated | 7GB (have 20GB) |
| openai-community/gpt2-large | Q4 | Fits comfortably | 27.82 tok/sEstimated | 4GB (have 20GB) |
| microsoft/Phi-3-mini-4k-instruct | Q8 | Fits comfortably | 17.72 tok/sEstimated | 7GB (have 20GB) |
| microsoft/Phi-3-mini-4k-instruct | Q4 | Fits comfortably | 24.34 tok/sEstimated | 4GB (have 20GB) |
| allenai/OLMo-2-0425-1B | Q8 | Fits comfortably | 38.60 tok/sEstimated | 1GB (have 20GB) |
| allenai/OLMo-2-0425-1B | Q4 | Fits comfortably | 51.41 tok/sEstimated | 1GB (have 20GB) |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q8 | Not supported | — | 80GB (have 20GB) |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q4 | Not supported | — | 40GB (have 20GB) |
| Qwen/Qwen3-32B | Q8 | Not supported | — | 32GB (have 20GB) |
| Qwen/Qwen3-32B | Q4 | Fits comfortably | 15.33 tok/sEstimated | 16GB (have 20GB) |
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | Fits comfortably | 22.16 tok/sEstimated | 5GB (have 20GB) |
| Qwen/Qwen2.5-0.5B-Instruct | Q4 | Fits comfortably | 27.79 tok/sEstimated | 3GB (have 20GB) |
| Qwen/Qwen2.5-7B | Q8 | Fits comfortably | 18.84 tok/sEstimated | 7GB (have 20GB) |
| Qwen/Qwen2.5-7B | Q4 | Fits comfortably | 24.93 tok/sEstimated | 4GB (have 20GB) |
| meta-llama/Meta-Llama-3-8B | Q8 | Fits comfortably | 19.20 tok/sEstimated | 8GB (have 20GB) |
| meta-llama/Meta-Llama-3-8B | Q4 | Fits comfortably | 23.72 tok/sEstimated | 4GB (have 20GB) |
| meta-llama/Llama-3.2-1B | Q8 | Fits comfortably | 34.70 tok/sEstimated | 1GB (have 20GB) |
| meta-llama/Llama-3.2-1B | Q4 | Fits comfortably | 48.97 tok/sEstimated | 1GB (have 20GB) |
| petals-team/StableBeluga2 | Q8 | Fits comfortably | 19.35 tok/sEstimated | 7GB (have 20GB) |
| petals-team/StableBeluga2 | Q4 | Fits comfortably | 24.27 tok/sEstimated | 4GB (have 20GB) |
| vikhyatk/moondream2 | Q8 | Fits comfortably | 17.61 tok/sEstimated | 7GB (have 20GB) |
| vikhyatk/moondream2 | Q4 | Fits comfortably | 23.80 tok/sEstimated | 4GB (have 20GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 24.96 tok/sEstimated | 3GB (have 20GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | Fits comfortably | 33.32 tok/sEstimated | 2GB (have 20GB) |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q8 | Not supported | — | 70GB (have 20GB) |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q4 | Not supported | — | 35GB (have 20GB) |
| distilbert/distilgpt2 | Q8 | Fits comfortably | 18.16 tok/sEstimated | 7GB (have 20GB) |
| distilbert/distilgpt2 | Q4 | Fits comfortably | 25.29 tok/sEstimated | 4GB (have 20GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q8 | Not supported | — | 32GB (have 20GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q4 | Fits comfortably | 13.55 tok/sEstimated | 16GB (have 20GB) |
| inference-net/Schematron-3B | Q8 | Fits comfortably | 26.52 tok/sEstimated | 3GB (have 20GB) |
| inference-net/Schematron-3B | Q4 | Fits comfortably | 33.11 tok/sEstimated | 2GB (have 20GB) |
| Qwen/Qwen3-8B | Q8 | Fits comfortably | 18.12 tok/sEstimated | 8GB (have 20GB) |
| Qwen/Qwen3-8B | Q4 | Fits comfortably | 26.32 tok/sEstimated | 4GB (have 20GB) |
| mistralai/Mistral-7B-Instruct-v0.2 | Q8 | Fits comfortably | 17.47 tok/sEstimated | 7GB (have 20GB) |
| mistralai/Mistral-7B-Instruct-v0.2 | Q4 | Fits comfortably | 23.69 tok/sEstimated | 4GB (have 20GB) |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q8 | Fits comfortably | 24.54 tok/sEstimated | 3GB (have 20GB) |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | Fits comfortably | 33.25 tok/sEstimated | 2GB (have 20GB) |
| bigscience/bloomz-560m | Q8 | Fits comfortably | 19.23 tok/sEstimated | 7GB (have 20GB) |
| bigscience/bloomz-560m | Q4 | Fits comfortably | 26.31 tok/sEstimated | 4GB (have 20GB) |
| Qwen/Qwen2.5-3B-Instruct | Q8 | Fits comfortably | 22.74 tok/sEstimated | 3GB (have 20GB) |
| Qwen/Qwen2.5-3B-Instruct | Q4 | Fits comfortably | 35.18 tok/sEstimated | 2GB (have 20GB) |
| openai/gpt-oss-120b | Q8 | Not supported | — | 120GB (have 20GB) |
| openai/gpt-oss-120b | Q4 | Not supported | — | 60GB (have 20GB) |
| meta-llama/Llama-3.2-1B-Instruct | Q8 | Fits comfortably | 40.46 tok/sEstimated | 1GB (have 20GB) |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | Fits comfortably | 54.52 tok/sEstimated | 1GB (have 20GB) |
| Qwen/Qwen3-4B-Instruct-2507 | Q8 | Fits comfortably | 21.06 tok/sEstimated | 4GB (have 20GB) |
| Qwen/Qwen3-4B-Instruct-2507 | Q4 | Fits comfortably | 35.27 tok/sEstimated | 2GB (have 20GB) |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | Fits comfortably | 19.64 tok/sEstimated | 7GB (have 20GB) |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q4 | Fits comfortably | 28.48 tok/sEstimated | 4GB (have 20GB) |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | Fits comfortably | 33.68 tok/sEstimated | 1GB (have 20GB) |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | Fits comfortably | 51.75 tok/sEstimated | 1GB (have 20GB) |
| facebook/opt-125m | Q8 | Fits comfortably | 17.38 tok/sEstimated | 7GB (have 20GB) |
| facebook/opt-125m | Q4 | Fits comfortably | 25.41 tok/sEstimated | 4GB (have 20GB) |
| Qwen/Qwen2.5-1.5B-Instruct | Q8 | Fits comfortably | 20.56 tok/sEstimated | 5GB (have 20GB) |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | Fits comfortably | 27.29 tok/sEstimated | 3GB (have 20GB) |
| Qwen/Qwen3-Embedding-0.6B | Q8 | Fits comfortably | 20.24 tok/sEstimated | 6GB (have 20GB) |
| Qwen/Qwen3-Embedding-0.6B | Q4 | Fits comfortably | 28.91 tok/sEstimated | 3GB (have 20GB) |
| google/gemma-3-1b-it | Q8 | Fits comfortably | 39.69 tok/sEstimated | 1GB (have 20GB) |
| google/gemma-3-1b-it | Q4 | Fits comfortably | 55.31 tok/sEstimated | 1GB (have 20GB) |
| openai/gpt-oss-20b | Q8 | Fits (tight) | 12.31 tok/sEstimated | 20GB (have 20GB) |
| openai/gpt-oss-20b | Q4 | Fits comfortably | 18.39 tok/sEstimated | 10GB (have 20GB) |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q8 | Not supported | — | 34GB (have 20GB) |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q4 | Fits comfortably | 12.75 tok/sEstimated | 17GB (have 20GB) |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 19.25 tok/sEstimated | 8GB (have 20GB) |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 24.07 tok/sEstimated | 4GB (have 20GB) |
| Gensyn/Qwen2.5-0.5B-Instruct | Q8 | Fits comfortably | 19.67 tok/sEstimated | 5GB (have 20GB) |
| Gensyn/Qwen2.5-0.5B-Instruct | Q4 | Fits comfortably | 31.17 tok/sEstimated | 3GB (have 20GB) |
| Qwen/Qwen3-0.6B | Q8 | Fits comfortably | 20.06 tok/sEstimated | 6GB (have 20GB) |
| Qwen/Qwen3-0.6B | Q4 | Fits comfortably | 29.57 tok/sEstimated | 3GB (have 20GB) |
| Qwen/Qwen2.5-7B-Instruct | Q8 | Fits comfortably | 20.12 tok/sEstimated | 7GB (have 20GB) |
| Qwen/Qwen2.5-7B-Instruct | Q4 | Fits comfortably | 27.36 tok/sEstimated | 4GB (have 20GB) |
| openai-community/gpt2 | Q8 | Fits comfortably | 19.09 tok/sEstimated | 7GB (have 20GB) |
| openai-community/gpt2 | Q4 | Fits comfortably | 27.20 tok/sEstimated | 4GB (have 20GB) |
Note: Performance figures marked "Estimated" are calculated, not measured; real-world results may vary.
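The VRAM column appears to follow a simple parameters-times-bytes-per-weight rule: roughly 1 byte per parameter at Q8 and 0.5 bytes at Q4, rounded up to the nearest gigabyte. A minimal sketch of that estimate (my reading of the table's rounding, not the site's published methodology):

```python
import math

def estimated_vram_gb(params_billion: float, quant: str) -> int:
    # Approximate weight storage: ~1 byte/param at Q8, ~0.5 at Q4,
    # rounded up to whole gigabytes as the table does
    bytes_per_param = {"Q8": 1.0, "Q4": 0.5}[quant]
    return math.ceil(params_billion * bytes_per_param)

# Reproduces rows such as Llama-3.1-8B (8GB at Q8, 4GB at Q4)
# and Llama-2-13b-chat (13GB at Q8, 7GB at Q4)
print(estimated_vram_gb(8, "Q8"), estimated_vram_gb(13, "Q4"))  # → 8 7
```

Rows for very small models (e.g. the 258M and 270M entries showing 7GB) clearly fall back to a default rather than this rule, so treat the column as an upper-bound heuristic, not a measurement.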
Data-backed answers pulled from community benchmarks, manufacturer specs, and live pricing.
An upgrader running LM Studio with ROCm on Windows measured ~112 tokens/sec on Qwen3-30B Q3_K_L, with GPU VRAM usage around 17 GB and no stability issues.
Source: Reddit – /r/LocalLLaMA (n791e2t)
That same test showed the 20 GB card leaving a couple of gigabytes free while running Qwen3-30B, confirming the XT has enough room for 30B Q3/Q4 workloads.
Source: Reddit – /r/LocalLLaMA (n791e2t)
RX 7900 XT owners recommend sticking with ROCm builds on Windows or Linux—community comparisons show ROCm outpacing Vulkan on this card for Qwen workloads.
Source: Reddit – /r/LocalLLaMA (n791e2t)
The RX 7900 XT has a 315 W total board power rating, draws through dual 8-pin power connectors, and AMD recommends at least a 750 W PSU.
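That 750 W guidance lines up with a common PSU sizing rule of thumb: sum the sustained component draw and keep it near 75% of the supply's rating. A hedged sketch of the arithmetic (the CPU figure, 100 W platform overhead, and 75% load target are assumptions, not AMD's published method):

```python
def recommended_psu_watts(gpu_board_power_w: int, cpu_tdp_w: int,
                          other_components_w: int = 100,
                          load_target: float = 0.75) -> int:
    # Sustained system draw under combined GPU + CPU load
    sustained = gpu_board_power_w + cpu_tdp_w + other_components_w
    # Size the PSU so sustained draw sits at ~75% of its rating,
    # then round up to the nearest 50 W (a common retail increment)
    raw = sustained / load_target
    return -(-int(raw) // 50) * 50

# A 315 W RX 7900 XT plus a 125 W CPU lands on 750 W,
# consistent with AMD's recommendation
print(recommended_psu_watts(315, 125))  # → 750
```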
On 3 Nov 2025 the RX 7900 XT was listed at $749 (Amazon, in stock), $779 (Newegg, in stock), and $769 (Best Buy, in stock).
Source: Supabase price tracker snapshot – 2025-11-03
Explore how RTX 4060 Ti 16GB stacks up for local inference workloads.
Explore how RX 6800 XT stacks up for local inference workloads.
Explore how RTX 3080 stacks up for local inference workloads.