Quick Answer: The RTX 3090 offers 24GB of VRAM and starts around $999. It delivers approximately 113 tokens/sec on meta-llama/Llama-3.2-1B-Instruct at Q4 (estimated) and typically draws 350W under load.
The RTX 3090 still delivers strong results for local large language models thanks to its 24GB of VRAM, making this Ampere-generation card a solid pick for enthusiasts building budget workstations.
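The VRAM figures in the tables below follow the usual back-of-envelope rule: model weights take roughly parameter count × bits per weight ÷ 8 bytes, plus headroom for the KV cache and runtime buffers. A minimal sketch of that estimate in Python (the 1.2 overhead factor is an illustrative assumption, not a measured constant):

```python
def estimate_vram_gb(params_billion: float, bits: int, overhead: float = 1.2) -> float:
    """Rough VRAM estimate for running a quantized LLM.

    params_billion: model size in billions of parameters (e.g. 8 for an 8B model)
    bits: quantization width (4 for Q4, 8 for Q8, 16 for FP16)
    overhead: multiplier for KV cache / activations / buffers (assumed, not measured)
    """
    weight_gb = params_billion * bits / 8  # 1B params at 8 bits is ~1 GB of weights
    return weight_gb * overhead

# Example: an 8B model at Q4 needs ~4.8 GB, comfortably inside the 3090's 24 GB
print(f"{estimate_vram_gb(8, 4):.1f} GB")
```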
| Model | Quantization | Tokens/sec (estimated) | VRAM used |
|---|---|---|---|
| meta-llama/Llama-3.2-1B-Instruct | Q4 | 112.88 tok/s | 1GB |
| allenai/OLMo-2-0425-1B | Q4 | 112.28 tok/s | 1GB |
| meta-llama/Llama-3.2-1B | Q4 | 111.22 tok/s | 1GB |
| meta-llama/Llama-Guard-3-1B | Q4 | 109.90 tok/s | 1GB |
| unsloth/Llama-3.2-1B-Instruct | Q4 | 107.37 tok/s | 1GB |
| apple/OpenELM-1_1B-Instruct | Q4 | 106.60 tok/s | 1GB |
| unsloth/gemma-3-1b-it | Q4 | 105.80 tok/s | 1GB |
| google/gemma-3-1b-it | Q4 | 95.71 tok/s | 1GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | 95.11 tok/s | 1GB |
| LiquidAI/LFM2-1.2B | Q4 | 86.02 tok/s | 1GB |
| google/gemma-2b | Q4 | 83.95 tok/s | 1GB |
| ibm-granite/granite-3.3-2b-instruct | Q4 | 79.74 tok/s | 1GB |
| meta-llama/Llama-3.2-1B | Q8 | 76.86 tok/s | 1GB |
| google/gemma-2-2b-it | Q4 | 76.52 tok/s | 1GB |
| Qwen/Qwen2.5-3B-Instruct | Q4 | 76.45 tok/s | 2GB |
| meta-llama/Llama-3.2-1B-Instruct | Q8 | 76.12 tok/s | 1GB |
| unsloth/gemma-3-1b-it | Q8 | 75.64 tok/s | 1GB |
| ibm-research/PowerMoE-3b | Q4 | 73.56 tok/s | 2GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | 72.93 tok/s | 2GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | 72.18 tok/s | 1GB |
| meta-llama/Llama-Guard-3-1B | Q8 | 72.10 tok/s | 1GB |
| Qwen/Qwen2.5-3B | Q4 | 72.01 tok/s | 2GB |
| google-t5/t5-3b | Q4 | 71.07 tok/s | 2GB |
| allenai/OLMo-2-0425-1B | Q8 | 68.95 tok/s | 1GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q4 | 67.75 tok/s | 2GB |
| bigcode/starcoder2-3b | Q4 | 67.23 tok/s | 2GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q4 | 67.08 tok/s | 2GB |
| unsloth/Llama-3.2-3B-Instruct | Q4 | 66.63 tok/s | 2GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q4 | 66.62 tok/s | 2GB |
| Qwen/Qwen3-Embedding-4B | Q4 | 66.53 tok/s | 2GB |
| Qwen/Qwen3-4B-Base | Q4 | 66.19 tok/s | 2GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q4 | 66.17 tok/s | 2GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 65.51 tok/s | 2GB |
| apple/OpenELM-1_1B-Instruct | Q8 | 65.47 tok/s | 1GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 65.33 tok/s | 2GB |
| google/gemma-3-1b-it | Q8 | 65.31 tok/s | 1GB |
| unsloth/Llama-3.2-1B-Instruct | Q8 | 65.30 tok/s | 1GB |
| meta-llama/Llama-3.2-3B | Q4 | 64.98 tok/s | 2GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | 64.22 tok/s | 2GB |
| Qwen/Qwen2.5-Math-1.5B | Q4 | 63.13 tok/s | 3GB |
| inference-net/Schematron-3B | Q4 | 63.11 tok/s | 2GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | 62.37 tok/s | 3GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | 62.09 tok/s | 3GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | 61.91 tok/s | 3GB |
| google/gemma-2-2b-it | Q8 | 61.30 tok/s | 2GB |
| microsoft/VibeVoice-1.5B | Q4 | 60.74 tok/s | 3GB |
| Qwen/Qwen2-1.5B-Instruct | Q4 | 60.55 tok/s | 3GB |
| google/gemma-2b | Q8 | 60.29 tok/s | 2GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q4 | 60.07 tok/s | 2GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q4 | 59.80 tok/s | 2GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q4 | 59.08 tok/s | 3GB |
| Qwen/Qwen2-0.5B-Instruct | Q4 | 58.89 tok/s | 3GB |
| Qwen/Qwen2.5-0.5B | Q4 | 58.61 tok/s | 3GB |
| Qwen/Qwen3-4B | Q4 | 58.31 tok/s | 2GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q4 | 58.15 tok/s | 3GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | 57.07 tok/s | 2GB |
| Qwen/Qwen3-Reranker-0.6B | Q4 | 56.98 tok/s | 3GB |
| Qwen/Qwen2.5-1.5B | Q4 | 56.86 tok/s | 3GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | 56.45 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3 | Q4 | 56.42 tok/s | 4GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q4 | 56.25 tok/s | 4GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q4 | 56.19 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-0528 | Q4 | 56.18 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3.1 | Q4 | 56.03 tok/s | 4GB |
| zai-org/GLM-4.6-FP8 | Q4 | 56.02 tok/s | 4GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q4 | 56.02 tok/s | 4GB |
| ibm-granite/granite-docling-258M | Q4 | 55.91 tok/s | 4GB |
| sshleifer/tiny-gpt2 | Q4 | 55.86 tok/s | 4GB |
| LiquidAI/LFM2-1.2B | Q8 | 55.67 tok/s | 2GB |
| meta-llama/Llama-2-7b-hf | Q4 | 55.46 tok/s | 4GB |
| Qwen/Qwen3-Embedding-0.6B | Q4 | 55.41 tok/s | 3GB |
| Qwen/Qwen2-7B-Instruct | Q4 | 55.28 tok/s | 4GB |
| liuhaotian/llava-v1.5-7b | Q4 | 55.24 tok/s | 4GB |
| EleutherAI/pythia-70m-deduped | Q4 | 55.21 tok/s | 4GB |
| google/gemma-3-270m-it | Q4 | 55.14 tok/s | 4GB |
| Qwen/Qwen3-0.6B-Base | Q4 | 54.72 tok/s | 3GB |
| HuggingFaceTB/SmolLM2-135M | Q4 | 54.65 tok/s | 4GB |
| vikhyatk/moondream2 | Q4 | 54.40 tok/s | 4GB |
| microsoft/Phi-3.5-vision-instruct | Q4 | 54.36 tok/s | 4GB |
| Qwen/Qwen2-0.5B | Q4 | 54.32 tok/s | 3GB |
| Qwen/Qwen2.5-7B | Q4 | 54.32 tok/s | 4GB |
| EleutherAI/gpt-neo-125m | Q4 | 54.31 tok/s | 4GB |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | 54.24 tok/s | 3GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 54.13 tok/s | 5GB |
| meta-llama/Llama-2-7b-chat-hf | Q4 | 54.13 tok/s | 4GB |
| rednote-hilab/dots.ocr | Q4 | 54.07 tok/s | 4GB |
| microsoft/Phi-3-mini-4k-instruct | Q4 | 53.95 tok/s | 4GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q4 | 53.95 tok/s | 4GB |
| hmellor/tiny-random-LlamaForCausalLM | Q4 | 53.94 tok/s | 4GB |
| IlyaGusev/saiga_llama3_8b | Q4 | 53.64 tok/s | 4GB |
| huggyllama/llama-7b | Q4 | 53.59 tok/s | 4GB |
| MiniMaxAI/MiniMax-M2 | Q4 | 53.24 tok/s | 4GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | 53.06 tok/s | 3GB |
| Qwen/Qwen3-1.7B | Q4 | 52.62 tok/s | 4GB |
| google-t5/t5-3b | Q8 | 52.58 tok/s | 3GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q4 | 52.45 tok/s | 3GB |
| petals-team/StableBeluga2 | Q4 | 52.39 tok/s | 4GB |
| microsoft/DialoGPT-small | Q4 | 52.38 tok/s | 4GB |
| bigcode/starcoder2-3b | Q8 | 52.36 tok/s | 3GB |
| unsloth/Llama-3.2-3B-Instruct | Q8 | 52.36 tok/s | 3GB |
| Qwen/Qwen3-Embedding-8B | Q4 | 52.07 tok/s | 4GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q4 | 52.03 tok/s | 4GB |
| openai-community/gpt2 | Q4 | 51.75 tok/s | 4GB |
| GSAI-ML/LLaDA-8B-Base | Q4 | 51.73 tok/s | 4GB |
| rinna/japanese-gpt-neox-small | Q4 | 51.64 tok/s | 4GB |
| openai-community/gpt2-large | Q4 | 51.62 tok/s | 4GB |
| GSAI-ML/LLaDA-8B-Instruct | Q4 | 51.61 tok/s | 4GB |
| microsoft/DialoGPT-medium | Q4 | 51.54 tok/s | 4GB |
| ibm-granite/granite-3.3-2b-instruct | Q8 | 51.39 tok/s | 2GB |
| inference-net/Schematron-3B | Q8 | 51.37 tok/s | 3GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q4 | 51.36 tok/s | 4GB |
| Qwen/Qwen3-0.6B | Q4 | 51.22 tok/s | 3GB |
| deepseek-ai/DeepSeek-V3-0324 | Q4 | 51.21 tok/s | 4GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | 51.03 tok/s | 3GB |
| Qwen/Qwen3-8B | Q4 | 50.98 tok/s | 4GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | 50.62 tok/s | 4GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | 50.54 tok/s | 4GB |
| ibm-research/PowerMoE-3b | Q8 | 50.18 tok/s | 3GB |
| openai-community/gpt2-medium | Q4 | 50.17 tok/s | 4GB |
| numind/NuExtract-1.5 | Q4 | 50.16 tok/s | 4GB |
| Qwen/Qwen2.5-3B | Q8 | 50.00 tok/s | 3GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | 49.99 tok/s | 4GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q4 | 49.96 tok/s | 4GB |
| parler-tts/parler-tts-large-v1 | Q4 | 49.90 tok/s | 4GB |
| bigscience/bloomz-560m | Q4 | 49.85 tok/s | 4GB |
| HuggingFaceH4/zephyr-7b-beta | Q4 | 49.83 tok/s | 4GB |
| microsoft/Phi-4-mini-instruct | Q4 | 49.82 tok/s | 4GB |
| microsoft/Phi-4-multimodal-instruct | Q4 | 49.77 tok/s | 4GB |
| lmsys/vicuna-7b-v1.5 | Q4 | 49.60 tok/s | 4GB |
| meta-llama/Llama-Guard-3-8B | Q4 | 49.37 tok/s | 4GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q4 | 49.09 tok/s | 4GB |
| openai-community/gpt2-xl | Q4 | 49.04 tok/s | 4GB |
| mistralai/Mistral-7B-v0.1 | Q4 | 49.03 tok/s | 4GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q4 | 49.01 tok/s | 4GB |
| NousResearch/Hermes-3-Llama-3.1-8B | Q4 | 48.91 tok/s | 5GB |
| zai-org/GLM-4.5-Air | Q4 | 48.89 tok/s | 4GB |
| BSC-LT/salamandraTA-7b-instruct | Q4 | 48.87 tok/s | 4GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | 48.45 tok/s | 5GB |
| Qwen/Qwen2.5-3B-Instruct | Q8 | 48.44 tok/s | 3GB |
| dicta-il/dictalm2.0-instruct | Q4 | 48.32 tok/s | 4GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q4 | 48.25 tok/s | 4GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q4 | 48.15 tok/s | 4GB |
| microsoft/phi-4 | Q4 | 48.10 tok/s | 4GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | 47.99 tok/s | 4GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | 47.99 tok/s | 5GB |
| facebook/opt-125m | Q4 | 47.92 tok/s | 4GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q4 | 47.77 tok/s | 4GB |
| skt/kogpt2-base-v2 | Q4 | 47.73 tok/s | 4GB |
| Qwen/Qwen3-8B-Base | Q4 | 47.71 tok/s | 4GB |
| HuggingFaceTB/SmolLM-135M | Q4 | 47.62 tok/s | 4GB |
| llamafactory/tiny-random-Llama-3 | Q4 | 47.62 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1 | Q4 | 47.57 tok/s | 4GB |
| meta-llama/Llama-3.2-3B | Q8 | 47.44 tok/s | 3GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q4 | 47.10 tok/s | 4GB |
| Qwen/Qwen3-1.7B-Base | Q4 | 46.98 tok/s | 4GB |
| microsoft/phi-2 | Q4 | 46.98 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | 46.88 tok/s | 4GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q8 | 46.80 tok/s | 4GB |
| meta-llama/Meta-Llama-3-8B | Q4 | 46.74 tok/s | 4GB |
| distilbert/distilgpt2 | Q4 | 46.59 tok/s | 4GB |
| microsoft/Phi-3-mini-128k-instruct | Q4 | 46.38 tok/s | 4GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 46.32 tok/s | 4GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q4 | 46.12 tok/s | 4GB |
| Qwen/Qwen3-8B-FP8 | Q4 | 45.86 tok/s | 4GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q4 | 45.76 tok/s | 4GB |
| meta-llama/Llama-3.1-8B | Q4 | 45.53 tok/s | 4GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q8 | 45.26 tok/s | 4GB |
| ibm-granite/granite-3.3-8b-instruct | Q4 | 45.21 tok/s | 4GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | 45.19 tok/s | 4GB |
| Qwen/Qwen2.5-Coder-1.5B | Q8 | 44.41 tok/s | 5GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q8 | 44.26 tok/s | 3GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q8 | 43.95 tok/s | 5GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q8 | 43.92 tok/s | 4GB |
| Qwen/Qwen2.5-0.5B | Q8 | 43.33 tok/s | 5GB |
| Qwen/Qwen3-14B-Base | Q4 | 43.32 tok/s | 7GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q8 | 43.29 tok/s | 4GB |
| Qwen/Qwen3-4B-Base | Q8 | 43.28 tok/s | 4GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q4 | 43.02 tok/s | 5GB |
| ai-forever/ruGPT-3.5-13B | Q4 | 43.02 tok/s | 7GB |
| google/gemma-2-9b-it | Q4 | 42.85 tok/s | 6GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q8 | 42.68 tok/s | 4GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q8 | 42.65 tok/s | 5GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | 42.53 tok/s | 5GB |
| Qwen/Qwen2.5-14B | Q4 | 42.24 tok/s | 7GB |
| meta-llama/Llama-2-13b-chat-hf | Q4 | 42.20 tok/s | 7GB |
| Qwen/Qwen3-Embedding-4B | Q8 | 41.91 tok/s | 4GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q8 | 41.86 tok/s | 5GB |
| microsoft/VibeVoice-1.5B | Q8 | 41.61 tok/s | 5GB |
| Qwen/Qwen3-0.6B | Q8 | 41.48 tok/s | 6GB |
| Qwen/Qwen3-4B | Q8 | 41.39 tok/s | 4GB |
| Qwen/Qwen3-0.6B-Base | Q8 | 40.71 tok/s | 6GB |
| Qwen/Qwen3-Reranker-0.6B | Q8 | 40.65 tok/s | 6GB |
| OpenPipe/Qwen3-14B-Instruct | Q4 | 40.43 tok/s | 7GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | 40.35 tok/s | 9GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q8 | 40.22 tok/s | 4GB |
| Qwen/Qwen2.5-Math-1.5B | Q8 | 40.07 tok/s | 5GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | 39.78 tok/s | 7GB |
| Qwen/Qwen2.5-1.5B | Q8 | 39.74 tok/s | 5GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q8 | 39.39 tok/s | 7GB |
| microsoft/DialoGPT-small | Q8 | 39.35 tok/s | 7GB |
| HuggingFaceH4/zephyr-7b-beta | Q8 | 39.27 tok/s | 7GB |
| openai-community/gpt2-large | Q8 | 39.20 tok/s | 7GB |
| microsoft/Phi-4-mini-instruct | Q8 | 39.15 tok/s | 7GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q8 | 39.09 tok/s | 7GB |
| Qwen/Qwen2.5-7B | Q8 | 38.96 tok/s | 7GB |
| vikhyatk/moondream2 | Q8 | 38.63 tok/s | 7GB |
| zai-org/GLM-4.6-FP8 | Q8 | 38.61 tok/s | 7GB |
| Qwen/Qwen2-1.5B-Instruct | Q8 | 38.43 tok/s | 5GB |
| Qwen/Qwen2-0.5B-Instruct | Q8 | 38.28 tok/s | 5GB |
| deepseek-ai/DeepSeek-V3-0324 | Q8 | 38.05 tok/s | 7GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q8 | 37.95 tok/s | 7GB |
| deepseek-ai/DeepSeek-V3.1 | Q8 | 37.72 tok/s | 7GB |
| Qwen/Qwen3-Embedding-0.6B | Q8 | 37.70 tok/s | 6GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q8 | 37.55 tok/s | 5GB |
| google/gemma-3-270m-it | Q8 | 37.49 tok/s | 7GB |
| petals-team/StableBeluga2 | Q8 | 37.45 tok/s | 7GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q8 | 37.42 tok/s | 8GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | 37.40 tok/s | 8GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q8 | 37.39 tok/s | 7GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q8 | 37.33 tok/s | 7GB |
| microsoft/Phi-3-medium-128k-instruct | Q4 | 37.31 tok/s | 8GB |
| microsoft/Phi-3.5-vision-instruct | Q8 | 37.29 tok/s | 7GB |
| microsoft/DialoGPT-medium | Q8 | 37.21 tok/s | 7GB |
| Qwen/Qwen3-14B | Q4 | 37.17 tok/s | 7GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | 37.16 tok/s | 7GB |
| MiniMaxAI/MiniMax-M2 | Q8 | 36.99 tok/s | 7GB |
| Qwen/Qwen2-0.5B | Q8 | 36.88 tok/s | 5GB |
| microsoft/Phi-4-multimodal-instruct | Q8 | 36.86 tok/s | 7GB |
| EleutherAI/gpt-neo-125m | Q8 | 36.83 tok/s | 7GB |
| rinna/japanese-gpt-neox-small | Q8 | 36.56 tok/s | 7GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | 36.54 tok/s | 8GB |
| mistralai/Mistral-7B-v0.1 | Q8 | 36.53 tok/s | 7GB |
| sshleifer/tiny-gpt2 | Q8 | 36.52 tok/s | 7GB |
| BSC-LT/salamandraTA-7b-instruct | Q8 | 36.41 tok/s | 7GB |
| Qwen/Qwen3-1.7B-Base | Q8 | 36.32 tok/s | 7GB |
| deepseek-ai/DeepSeek-V3 | Q8 | 36.32 tok/s | 7GB |
| dicta-il/dictalm2.0-instruct | Q8 | 36.29 tok/s | 7GB |
| microsoft/phi-4 | Q8 | 36.26 tok/s | 7GB |
| NousResearch/Hermes-3-Llama-3.1-8B | Q8 | 36.24 tok/s | 9GB |
| lmsys/vicuna-7b-v1.5 | Q8 | 36.24 tok/s | 7GB |
| openai-community/gpt2 | Q8 | 36.23 tok/s | 7GB |
| HuggingFaceTB/SmolLM-135M | Q8 | 36.15 tok/s | 7GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q4 | 36.11 tok/s | 10GB |
| microsoft/Phi-3-mini-128k-instruct | Q8 | 36.10 tok/s | 7GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q8 | 35.99 tok/s | 8GB |
| distilbert/distilgpt2 | Q8 | 35.93 tok/s | 7GB |
| microsoft/phi-2 | Q8 | 35.91 tok/s | 7GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q8 | 35.84 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1 | Q8 | 35.70 tok/s | 7GB |
| unsloth/gpt-oss-20b-BF16 | Q4 | 35.69 tok/s | 10GB |
| parler-tts/parler-tts-large-v1 | Q8 | 35.65 tok/s | 7GB |
| google/gemma-2-9b-it | Q8 | 35.38 tok/s | 11GB |
| microsoft/Phi-3-mini-4k-instruct | Q8 | 35.36 tok/s | 7GB |
| llamafactory/tiny-random-Llama-3 | Q8 | 35.34 tok/s | 7GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q8 | 35.32 tok/s | 7GB |
| GSAI-ML/LLaDA-8B-Instruct | Q8 | 35.31 tok/s | 8GB |
| numind/NuExtract-1.5 | Q8 | 35.17 tok/s | 7GB |
| liuhaotian/llava-v1.5-7b | Q8 | 35.06 tok/s | 7GB |
| Qwen/Qwen3-8B-FP8 | Q8 | 34.98 tok/s | 8GB |
| EleutherAI/pythia-70m-deduped | Q8 | 34.83 tok/s | 7GB |
| Qwen/Qwen2-7B-Instruct | Q8 | 34.70 tok/s | 7GB |
| meta-llama/Llama-Guard-3-8B | Q8 | 34.07 tok/s | 8GB |
| meta-llama/Llama-2-7b-chat-hf | Q8 | 34.05 tok/s | 7GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q4 | 34.03 tok/s | 10GB |
| rednote-hilab/dots.ocr | Q8 | 33.95 tok/s | 7GB |
| zai-org/GLM-4.5-Air | Q8 | 33.83 tok/s | 7GB |
| huggyllama/llama-7b | Q8 | 33.70 tok/s | 7GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q8 | 33.64 tok/s | 8GB |
| bigscience/bloomz-560m | Q8 | 33.63 tok/s | 7GB |
| hmellor/tiny-random-LlamaForCausalLM | Q8 | 33.57 tok/s | 7GB |
| skt/kogpt2-base-v2 | Q8 | 33.43 tok/s | 7GB |
| meta-llama/Llama-3.1-8B | Q8 | 33.23 tok/s | 8GB |
| GSAI-ML/LLaDA-8B-Base | Q8 | 33.22 tok/s | 8GB |
| deepseek-ai/DeepSeek-R1-0528 | Q8 | 33.13 tok/s | 7GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | 33.11 tok/s | 7GB |
| ibm-granite/granite-docling-258M | Q8 | 32.91 tok/s | 7GB |
| Qwen/Qwen3-8B-Base | Q8 | 32.89 tok/s | 8GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q8 | 32.87 tok/s | 7GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | 32.86 tok/s | 9GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | 32.85 tok/s | 7GB |
| Qwen/Qwen3-1.7B | Q8 | 32.76 tok/s | 7GB |
| openai-community/gpt2-xl | Q8 | 32.75 tok/s | 7GB |
| facebook/opt-125m | Q8 | 32.71 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | 32.68 tok/s | 7GB |
| meta-llama/Llama-2-7b-hf | Q8 | 32.68 tok/s | 7GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | 32.62 tok/s | 8GB |
| HuggingFaceTB/SmolLM2-135M | Q8 | 32.59 tok/s | 7GB |
| openai-community/gpt2-medium | Q8 | 32.57 tok/s | 7GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q8 | 32.57 tok/s | 8GB |
| google/gemma-2-27b-it | Q4 | 32.55 tok/s | 16GB |
| mistralai/Mistral-Small-Instruct-2409 | Q4 | 32.46 tok/s | 13GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q8 | 32.45 tok/s | 8GB |
| Qwen/Qwen3-8B | Q8 | 31.82 tok/s | 8GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q8 | 31.73 tok/s | 8GB |
| ibm-granite/granite-3.3-8b-instruct | Q8 | 31.66 tok/s | 8GB |
| openai/gpt-oss-20b | Q4 | 31.46 tok/s | 10GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q4 | 31.41 tok/s | 15GB |
| Qwen/Qwen3-30B-A3B | Q4 | 31.39 tok/s | 15GB |
| meta-llama/Meta-Llama-3-8B | Q8 | 31.33 tok/s | 8GB |
| Qwen/Qwen3-Embedding-8B | Q8 | 31.16 tok/s | 8GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q4 | 31.06 tok/s | 15GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q4 | 31.03 tok/s | 15GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q8 | 31.03 tok/s | 9GB |
| IlyaGusev/saiga_llama3_8b | Q8 | 30.85 tok/s | 8GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | 30.85 tok/s | 9GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q4 | 30.63 tok/s | 15GB |
| Qwen/Qwen2.5-32B | Q4 | 30.47 tok/s | 16GB |
| baichuan-inc/Baichuan-M2-32B | Q4 | 30.35 tok/s | 16GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q4 | 30.16 tok/s | 16GB |
| OpenPipe/Qwen3-14B-Instruct | Q8 | 30.02 tok/s | 14GB |
| Qwen/Qwen2.5-14B-Instruct | Q8 | 29.90 tok/s | 14GB |
| Qwen/Qwen2.5-Coder-32B-Instruct | Q4 | 29.78 tok/s | 19GB |
| deepseek-ai/deepseek-coder-33b-instruct | Q4 | 29.49 tok/s | 19GB |
| Qwen/Qwen2.5-32B-Instruct | Q4 | 29.31 tok/s | 16GB |
| Qwen/Qwen2.5-32B-Instruct | Q4 | 29.30 tok/s | 19GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q4 | 29.14 tok/s | 15GB |
| Qwen/QwQ-32B-Preview | Q4 | 28.75 tok/s | 19GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q4 | 28.27 tok/s | 17GB |
| Qwen/Qwen2.5-14B | Q8 | 28.27 tok/s | 14GB |
| Qwen/Qwen3-14B-Base | Q8 | 28.03 tok/s | 14GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q4 | 27.77 tok/s | 15GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q4 | 27.67 tok/s | 16GB |
| 01-ai/Yi-1.5-34B-Chat | Q4 | 27.59 tok/s | 20GB |
| Qwen/Qwen3-32B | Q4 | 27.27 tok/s | 16GB |
| microsoft/Phi-3-medium-128k-instruct | Q8 | 26.83 tok/s | 16GB |
| Qwen/Qwen3-14B | Q8 | 26.58 tok/s | 14GB |
| ai-forever/ruGPT-3.5-13B | Q8 | 26.30 tok/s | 13GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q4 | 26.30 tok/s | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q4 | 26.24 tok/s | 15GB |
| meta-llama/Llama-2-13b-chat-hf | Q8 | 26.01 tok/s | 13GB |
| codellama/CodeLlama-34b-hf | Q4 | 25.79 tok/s | 17GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q8 | 25.32 tok/s | 20GB |
| Qwen/Qwen2.5-14B-Instruct | Q8 | 25.26 tok/s | 17GB |
| openai/gpt-oss-20b | Q8 | 23.36 tok/s | 20GB |
| unsloth/gpt-oss-20b-BF16 | Q8 | 23.35 tok/s | 20GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q8 | 22.19 tok/s | 20GB |
Note: All performance figures above are calculated estimates, not measured benchmarks; real-world results may vary.
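To sanity-check one of the Q4 rows yourself, load the model with 4-bit quantization and time a generation pass. Below is a minimal sketch using Hugging Face transformers with bitsandbytes, assuming `transformers`, `accelerate`, and `bitsandbytes` are installed (gated models such as the Llama family also require a Hugging Face access token); actual throughput depends on drivers, context length, and backend, which is why the figures above are labelled estimates.

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-3.2-1B-Instruct"  # any Q4 row from the table above
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

# Time a single generation pass and report decode throughput
inputs = tokenizer("Explain VRAM in one sentence.", return_tensors="pt").to(model.device)
start = time.time()
output = model.generate(**inputs, max_new_tokens=128)
elapsed = time.time() - start

new_tokens = output.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens / elapsed:.1f} tok/s")  # compare against the table's estimate
```

The compatibility table below applies the same VRAM estimates against the card's 24GB to produce a fits / not-supported verdict for each model.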
| Model | Quantization | Verdict | Estimated speed | VRAM needed |
|---|---|---|---|---|
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | Q8 | Not supported | — | 79GB (have 24GB) |
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | Q4 | Not supported | — | 40GB (have 24GB) |
| 01-ai/Yi-1.5-34B-Chat | Q8 | Not supported | — | 39GB (have 24GB) |
| 01-ai/Yi-1.5-34B-Chat | Q4 | Fits comfortably | 27.59 tok/s | 20GB (have 24GB) |
| NousResearch/Hermes-3-Llama-3.1-8B | Q8 | Fits comfortably | 36.24 tok/s | 9GB (have 24GB) |
| NousResearch/Hermes-3-Llama-3.1-8B | Q4 | Fits comfortably | 48.91 tok/s | 5GB (have 24GB) |
| NousResearch/Hermes-3-Llama-3.1-70B | Q8 | Not supported | — | 79GB (have 24GB) |
| NousResearch/Hermes-3-Llama-3.1-70B | Q4 | Not supported | — | 40GB (have 24GB) |
| microsoft/Phi-3-medium-128k-instruct | Q8 | Fits comfortably | 26.83 tok/s | 16GB (have 24GB) |
| microsoft/Phi-3-medium-128k-instruct | Q4 | Fits comfortably | 37.31 tok/s | 8GB (have 24GB) |
| microsoft/Phi-3.5-mini-instruct | Q8 | Fits comfortably | 47.99 tok/s | 5GB (have 24GB) |
| microsoft/Phi-3.5-mini-instruct | Q4 | Fits comfortably | 61.91 tok/s | 3GB (have 24GB) |
| google/gemma-2-9b-it | Q8 | Fits comfortably | 35.38 tok/s | 11GB (have 24GB) |
| google/gemma-2-9b-it | Q4 | Fits comfortably | 42.85 tok/s | 6GB (have 24GB) |
| google/gemma-2-27b-it | Q8 | Not supported | — | 31GB (have 24GB) |
| google/gemma-2-27b-it | Q4 | Fits comfortably | 32.55 tok/s | 16GB (have 24GB) |
| mistralai/Mistral-Small-Instruct-2409 | Q8 | Not supported | — | 25GB (have 24GB) |
| mistralai/Mistral-Small-Instruct-2409 | Q4 | Fits comfortably | 32.46 tok/s | 13GB (have 24GB) |
| mistralai/Mistral-Large-Instruct-2411 | Q8 | Not supported | — | 138GB (have 24GB) |
| mistralai/Mistral-Large-Instruct-2411 | Q4 | Not supported | — | 69GB (have 24GB) |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | Q8 | Not supported | — | 158GB (have 24GB) |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | Q4 | Not supported | — | 79GB (have 24GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 45.19 tok/s | 4GB (have 24GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | Fits comfortably | 65.51 tok/s | 2GB (have 24GB) |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 30.85 tok/s | 9GB (have 24GB) |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 48.45 tok/s | 5GB (have 24GB) |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | Not supported | — | 79GB (have 24GB) |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | Not supported | — | 40GB (have 24GB) |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | Not supported | — | 79GB (have 24GB) |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | Not supported | — | 40GB (have 24GB) |
| deepseek-ai/deepseek-coder-33b-instruct | Q8 | Not supported | — | 38GB (have 24GB) |
| deepseek-ai/deepseek-coder-33b-instruct | Q4 | Fits comfortably | 29.49 tok/s | 19GB (have 24GB) |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | Q8 | Not supported | — | 264GB (have 24GB) |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | Q4 | Not supported | — | 132GB (have 24GB) |
| deepseek-ai/DeepSeek-V2.5 | Q8 | Not supported | — | 264GB (have 24GB) |
| deepseek-ai/DeepSeek-V2.5 | Q4 | Not supported | — | 132GB (have 24GB) |
| Qwen/Qwen2.5-Math-72B-Instruct | Q8 | Not supported | — | 82GB (have 24GB) |
| Qwen/Qwen2.5-Math-72B-Instruct | Q4 | Not supported | — | 41GB (have 24GB) |
| Qwen/Qwen2.5-7B-Instruct | Q8 | Fits comfortably | 32.86 tok/s | 9GB (have 24GB) |
| Qwen/Qwen2.5-7B-Instruct | Q4 | Fits comfortably | 54.13 tok/s | 5GB (have 24GB) |
| Qwen/Qwen2.5-14B-Instruct | Q8 | Fits comfortably | 25.26 tok/s | 17GB (have 24GB) |
| Qwen/Qwen2.5-14B-Instruct | Q4 | Fits comfortably | 40.35 tok/s | 9GB (have 24GB) |
| Qwen/Qwen2.5-32B-Instruct | Q8 | Not supported | — | 37GB (have 24GB) |
| Qwen/Qwen2.5-32B-Instruct | Q4 | Fits comfortably | 29.30 tok/s | 19GB (have 24GB) |
| Qwen/Qwen2.5-Coder-32B-Instruct | Q8 | Not supported | — | 37GB (have 24GB) |
| Qwen/Qwen2.5-Coder-32B-Instruct | Q4 | Fits comfortably | 29.78 tok/s | 19GB (have 24GB) |
| Qwen/QwQ-32B-Preview | Q8 | Not supported | — | 37GB (have 24GB) |
| Qwen/QwQ-32B-Preview | Q4 | Fits comfortably | 28.75 tok/s | 19GB (have 24GB) |
| Qwen/Qwen2.5-72B-Instruct | Q8 | Not supported | — | 82GB (have 24GB) |
| Qwen/Qwen2.5-72B-Instruct | Q4 | Not supported | — | 41GB (have 24GB) |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q8 | Not supported | — | 80GB (have 24GB) |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q4 | Not supported | — | 40GB (have 24GB) |
| ai-forever/ruGPT-3.5-13B | Q8 | Fits comfortably | 26.30 tok/s | 13GB (have 24GB) |
| ai-forever/ruGPT-3.5-13B | Q4 | Fits comfortably | 43.02 tok/s | 7GB (have 24GB) |
| baichuan-inc/Baichuan-M2-32B | Q8 | Not supported | — | 32GB (have 24GB) |
| baichuan-inc/Baichuan-M2-32B | Q4 | Fits comfortably | 30.35 tok/s | 16GB (have 24GB) |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q8 | Fits comfortably | 39.09 tok/s | 7GB (have 24GB) |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 51.36 tok/s | 4GB (have 24GB) |
| ibm-granite/granite-3.3-8b-instruct | Q8 | Fits comfortably | 31.66 tok/s | 8GB (have 24GB) |
| ibm-granite/granite-3.3-8b-instruct | Q4 | Fits comfortably | 45.21 tok/s | 4GB (have 24GB) |
| Qwen/Qwen3-1.7B-Base | Q8 | Fits comfortably | 36.32 tok/s | 7GB (have 24GB) |
| Qwen/Qwen3-1.7B-Base | Q4 | Fits comfortably | 46.98 tok/s | 4GB (have 24GB) |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q8 | Fits comfortably | 22.19 tok/s | 20GB (have 24GB) |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q4 | Fits comfortably | 36.11 tok/s | 10GB (have 24GB) |
| BSC-LT/salamandraTA-7b-instruct | Q8 | Fits comfortably | 36.41 tok/s | 7GB (have 24GB) |
| BSC-LT/salamandraTA-7b-instruct | Q4 | Fits comfortably | 48.87 tok/s | 4GB (have 24GB) |
| dicta-il/dictalm2.0-instruct | Q8 | Fits comfortably | 36.29 tok/s | 7GB (have 24GB) |
| dicta-il/dictalm2.0-instruct | Q4 | Fits comfortably | 48.32 tok/s | 4GB (have 24GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q8 | Not supported | — | 30GB (have 24GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q4 | Fits comfortably | 26.30 tok/s | 15GB (have 24GB) |
| GSAI-ML/LLaDA-8B-Base | Q8 | Fits comfortably | 33.22 tok/s | 8GB (have 24GB) |
| GSAI-ML/LLaDA-8B-Base | Q4 | Fits comfortably | 51.73 tok/s | 4GB (have 24GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q8 | Not supported | — | 30GB (have 24GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q4 | Fits comfortably | 27.77 tok/s | 15GB (have 24GB) |
| Qwen/Qwen2-0.5B-Instruct | Q8 | Fits comfortably | 38.28 tok/s | 5GB (have 24GB) |
| Qwen/Qwen2-0.5B-Instruct | Q4 | Fits comfortably | 58.89 tok/s | 3GB (have 24GB) |
| deepseek-ai/DeepSeek-V3 | Q8 | Fits comfortably | 36.32 tok/s | 7GB (have 24GB) |
| deepseek-ai/DeepSeek-V3 | Q4 | Fits comfortably | 56.42 tok/s | 4GB (have 24GB) |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q8 | Fits comfortably | 43.95 tok/s | 5GB (have 24GB) |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | Fits comfortably | 62.09 tok/s | 3GB (have 24GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q8 | Not supported | — | 30GB (have 24GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q4 | Fits comfortably | 26.24 tok/s | 15GB (have 24GB) |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q8 | Not supported | — | 30GB (have 24GB) |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q4 | Fits comfortably | 31.03 tok/s | 15GB (have 24GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q8 | Not supported | — | 30GB (have 24GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q4 | Fits comfortably | 31.41 tok/s | 15GB (have 24GB) |
| AI-MO/Kimina-Prover-72B | Q8 | Not supported | — | 72GB (have 24GB) |
| AI-MO/Kimina-Prover-72B | Q4 | Not supported | — | 36GB (have 24GB) |
| apple/OpenELM-1_1B-Instruct | Q8 | Fits comfortably | 65.47 tok/s | 1GB (have 24GB) |
| apple/OpenELM-1_1B-Instruct | Q4 | Fits comfortably | 106.60 tok/s | 1GB (have 24GB) |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 31.73 tok/s | 8GB (have 24GB) |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 46.12 tok/s | 4GB (have 24GB) |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q8 | Fits comfortably | 31.03 tok/s | 9GB (have 24GB) |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q4 | Fits comfortably | 43.02 tok/s | 5GB (have 24GB) |
| Qwen/Qwen2.5-3B | Q8 | Fits comfortably | 50.00 tok/s | 3GB (have 24GB) |
| Qwen/Qwen2.5-3B | Q4 | Fits comfortably | 72.01 tok/s | 2GB (have 24GB) |
| lmsys/vicuna-7b-v1.5 | Q8 | Fits comfortably | 36.24 tok/s | 7GB (have 24GB) |
| lmsys/vicuna-7b-v1.5 | Q4 | Fits comfortably | 49.60 tok/s | 4GB (have 24GB) |
| meta-llama/Llama-2-13b-chat-hf | Q8 | Fits comfortably | 26.01 tok/s | 13GB (have 24GB) |
| meta-llama/Llama-2-13b-chat-hf | Q4 | Fits comfortably | 42.20 tok/s | 7GB (have 24GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q8 | Not supported | — | 80GB (have 24GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q4 | Not supported | — | 40GB (have 24GB) |
| unsloth/gemma-3-1b-it | Q8 | Fits comfortably | 75.64 tok/s | 1GB (have 24GB) |
| unsloth/gemma-3-1b-it | Q4 | Fits comfortably | 105.80 tok/s | 1GB (have 24GB) |
| bigcode/starcoder2-3b | Q8 | Fits comfortably | 52.36 tok/s | 3GB (have 24GB) |
| bigcode/starcoder2-3b | Q4 | Fits comfortably | 67.23 tok/s | 2GB (have 24GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q8 | Not supported | — | 80GB (have 24GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q4 | Not supported | — | 40GB (have 24GB) |
| ibm-granite/granite-docling-258M | Q8 | Fits comfortably | 32.91 tok/s | 7GB (have 24GB) |
| ibm-granite/granite-docling-258M | Q4 | Fits comfortably | 55.91 tok/s | 4GB (have 24GB) |
| skt/kogpt2-base-v2 | Q8 | Fits comfortably | 33.43 tok/s | 7GB (have 24GB) |
| skt/kogpt2-base-v2 | Q4 | Fits comfortably | 47.73 tok/s | 4GB (have 24GB) |
| google/gemma-3-270m-it | Q8 | Fits comfortably | 37.49 tok/s | 7GB (have 24GB) |
| google/gemma-3-270m-it | Q4 | Fits comfortably | 55.14 tok/s | 4GB (have 24GB) |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q8 | Fits comfortably | 43.92 tok/s | 4GB (have 24GB) |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | Fits comfortably | 57.07 tok/s | 2GB (have 24GB) |
| Qwen/Qwen2.5-32B | Q8 | Not supported | — | 32GB (have 24GB) |
| Qwen/Qwen2.5-32B | Q4 | Fits comfortably | 30.47 tok/s | 16GB (have 24GB) |
| parler-tts/parler-tts-large-v1 | Q8 | Fits comfortably | 35.65 tok/s | 7GB (have 24GB) |
| parler-tts/parler-tts-large-v1 | Q4 | Fits comfortably | 49.90 tok/s | 4GB (have 24GB) |
| EleutherAI/pythia-70m-deduped | Q8 | Fits comfortably | 34.83 tok/s | 7GB (have 24GB) |
| EleutherAI/pythia-70m-deduped | Q4 | Fits comfortably | 55.21 tok/s | 4GB (have 24GB) |
| microsoft/VibeVoice-1.5B | Q8 | Fits comfortably | 41.61 tok/s | 5GB (have 24GB) |
| microsoft/VibeVoice-1.5B | Q4 | Fits comfortably | 60.74 tok/s | 3GB (have 24GB) |
| ibm-granite/granite-3.3-2b-instruct | Q8 | Fits comfortably | 51.39 tok/s | 2GB (have 24GB) |
| ibm-granite/granite-3.3-2b-instruct | Q4 | Fits comfortably | 79.74 tok/s | 1GB (have 24GB) |
| Qwen/Qwen2.5-72B-Instruct | Q8 | Not supported | — | 72GB (have 24GB) |
| Qwen/Qwen2.5-72B-Instruct | Q4 | Not supported | — | 36GB (have 24GB) |
| liuhaotian/llava-v1.5-7b | Q8 | Fits comfortably | 35.06 tok/s | 7GB (have 24GB) |
| liuhaotian/llava-v1.5-7b | Q4 | Fits comfortably | 55.24 tok/s | 4GB (have 24GB) |
| google/gemma-2b | Q8 | Fits comfortably | 60.29 tok/s | 2GB (have 24GB) |
| google/gemma-2b | Q4 | Fits comfortably | 83.95 tok/s | 1GB (have 24GB) |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q8 | Fits comfortably | 32.87 tok/s | 7GB (have 24GB) |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q4 | Fits comfortably | 48.25 tok/s | 4GB (have 24GB) |
| Qwen/Qwen3-235B-A22B | Q8 | Not supported | — | 235GB (have 24GB) |
| Qwen/Qwen3-235B-A22B | Q4 | Not supported | — | 118GB (have 24GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q8 | Fits comfortably | 37.42 tok/s | 8GB (have 24GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q4 | Fits comfortably | 49.09 tok/s | 4GB (have 24GB) |
| microsoft/Phi-4-mini-instruct | Q8 | Fits comfortably | 39.15 tok/s | 7GB (have 24GB) |
| microsoft/Phi-4-mini-instruct | Q4 | Fits comfortably | 49.82 tok/s | 4GB (have 24GB) |
| llamafactory/tiny-random-Llama-3 | Q8 | Fits comfortably | 35.34 tok/s | 7GB (have 24GB) |
| llamafactory/tiny-random-Llama-3 | Q4 | Fits comfortably | 47.62 tok/s | 4GB (have 24GB) |
| HuggingFaceH4/zephyr-7b-beta | Q8 | Fits comfortably | 39.27 tok/s | 7GB (have 24GB) |
| HuggingFaceH4/zephyr-7b-beta | Q4 | Fits comfortably | 49.83 tok/s | 4GB (have 24GB) |
| Qwen/Qwen3-4B-Thinking-2507 | Q8 | Fits comfortably | 46.80 tok/s | 4GB (have 24GB) |
| Qwen/Qwen3-4B-Thinking-2507 | Q4 | Fits comfortably | 66.17 tok/s | 2GB (have 24GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q8 | Not supported | — | 30GB (have 24GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q4 | Fits comfortably | 29.14 tok/s | 15GB (have 24GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q8 | Fits comfortably | 35.99 tok/s | 8GB (have 24GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q4 | Fits comfortably | 49.96 tok/s | 4GB (have 24GB) |
| unsloth/Llama-3.2-1B-Instruct | Q8 | Fits comfortably | 65.30 tok/s | 1GB (have 24GB) |
| unsloth/Llama-3.2-1B-Instruct | Q4 | Fits comfortably | 107.37 tok/s | 1GB (have 24GB) |
| GSAI-ML/LLaDA-8B-Instruct | Q8 | Fits comfortably | 35.31 tok/s | 8GB (have 24GB) |
| GSAI-ML/LLaDA-8B-Instruct | Q4 | Fits comfortably | 51.61 tok/s | 4GB (have 24GB) |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q8 | Not supported | — | 90GB (have 24GB) |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q4 | Not supported | — | 45GB (have 24GB) |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q8 | Fits comfortably | 37.95 tok/s | 7GB (have 24GB) |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q4 | Fits comfortably | 48.15 tok/s | 4GB (have 24GB) |
| numind/NuExtract-1.5 | Q8 | Fits comfortably | 35.17 tok/s | 7GB (have 24GB) |
| numind/NuExtract-1.5 | Q4 | Fits comfortably | 50.16 tok/s | 4GB (have 24GB) |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q8 | Fits comfortably | 37.39 tok/s | 7GB (have 24GB) |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q4 | Fits comfortably | 56.02 tok/s | 4GB (have 24GB) |
| hmellor/tiny-random-LlamaForCausalLM | Q8 | Fits comfortably | 33.57 tok/s | 7GB (have 24GB) |
| hmellor/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 53.94 tok/s | 4GB (have 24GB) |
| huggyllama/llama-7b | Q8 | Fits comfortably | 33.70 tok/s | 7GB (have 24GB) |
| huggyllama/llama-7b | Q4 | Fits comfortably | 53.59 tok/s | 4GB (have 24GB) |
| deepseek-ai/DeepSeek-V3-0324 | Q8 | Fits comfortably | 38.05 tok/s | 7GB (have 24GB) |
| deepseek-ai/DeepSeek-V3-0324 | Q4 | Fits comfortably | 51.21 tok/s | 4GB (have 24GB) |
| microsoft/Phi-3-mini-128k-instruct | Q8 | Fits comfortably | 36.10 tok/s | 7GB (have 24GB) |
| microsoft/Phi-3-mini-128k-instruct | Q4 | Fits comfortably | 46.38 tok/s | 4GB (have 24GB) |
| sshleifer/tiny-gpt2 | Q8 | Fits comfortably | 36.52 tok/s | 7GB (have 24GB) |
| sshleifer/tiny-gpt2 | Q4 | Fits comfortably | 55.86 tok/s | 4GB (have 24GB) |
| meta-llama/Llama-Guard-3-8B | Q8 | Fits comfortably | 34.07 tok/s | 8GB (have 24GB) |
| meta-llama/Llama-Guard-3-8B | Q4 | Fits comfortably | 49.37 tok/s | 4GB (have 24GB) |
| openai-community/gpt2-xl | Q8 | Fits comfortably | 32.75 tok/s | 7GB (have 24GB) |
| openai-community/gpt2-xl | Q4 | Fits comfortably | 49.04 tok/s | 4GB (have 24GB) |
| OpenPipe/Qwen3-14B-Instruct | Q8 | Fits comfortably | 30.02 tok/s | 14GB (have 24GB) |
| OpenPipe/Qwen3-14B-Instruct | Q4 | Fits comfortably | 40.43 tok/s | 7GB (have 24GB) |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q8 | Not supported | — | 70GB (have 24GB) |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q4 | Not supported | — | 35GB (have 24GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q8 | Fits comfortably | 40.22 tok/s | 4GB (have 24GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q4 | Fits comfortably | 60.07 tok/s | 2GB (have 24GB) |
| ibm-research/PowerMoE-3b | Q8 | Fits comfortably | 50.18 tok/s | 3GB (have 24GB) |
| ibm-research/PowerMoE-3b | Q4 | Fits comfortably | 73.56 tok/s | 2GB (have 24GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q8 | Fits comfortably | 43.29 tok/s | 4GB (have 24GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q4 | Fits comfortably | 67.75 tok/s | 2GB (have 24GB) |
| unsloth/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 52.36 tok/s | 3GB (have 24GB) |
| unsloth/Llama-3.2-3B-Instruct | Q4 | Fits comfortably | 66.63 tok/s | 2GB (have 24GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q8 | Fits comfortably | 42.68 tok/s | 4GB (have 24GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q4 | Fits comfortably | 66.62 tok/s | 2GB (have 24GB) |
| meta-llama/Llama-3.2-3B | Q8 | Fits comfortably | 47.44 tok/s | 3GB (have 24GB) |
| meta-llama/Llama-3.2-3B | Q4 | Fits comfortably | 64.98 tok/s | 2GB (have 24GB) |
| EleutherAI/gpt-neo-125m | Q8 | Fits comfortably | 36.83 tok/s | 7GB (have 24GB) |
| EleutherAI/gpt-neo-125m | Q4 | Fits comfortably | 54.31 tok/s | 4GB (have 24GB) |
| codellama/CodeLlama-34b-hf | Q8 | Not supported | — | 34GB (have 24GB) |
| codellama/CodeLlama-34b-hf | Q4 | Fits comfortably | 25.79 tok/s | 17GB (have 24GB) |
| meta-llama/Llama-Guard-3-1B | Q8 | Fits comfortably | 72.10 tok/s | 1GB (have 24GB) |
| meta-llama/Llama-Guard-3-1B | Q4 | Fits comfortably | 109.90 tok/s | 1GB (have 24GB) |
| Qwen/Qwen2-1.5B-Instruct | Q8 | Fits comfortably | 38.43 tok/s | 5GB (have 24GB) |
| Qwen/Qwen2-1.5B-Instruct | Q4 | Fits comfortably | 60.55 tok/s | 3GB (have 24GB) |
| google/gemma-2-2b-it | Q8 | Fits comfortably | 61.30 tok/s | 2GB (have 24GB) |
| google/gemma-2-2b-it | Q4 | Fits comfortably | 76.52 tok/s | 1GB (have 24GB) |
| Qwen/Qwen2.5-14B | Q8 | Fits comfortably | 28.27 tok/s | 14GB (have 24GB) |
| Qwen/Qwen2.5-14B | Q4 | Fits comfortably | 42.24 tok/s | 7GB (have 24GB) |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q8 | Not supported | — | 32GB (have 24GB) |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q4 | Fits comfortably | 30.16 tok/s | 16GB (have 24GB) |
| microsoft/Phi-3.5-mini-instruct | Q8 | Fits comfortably | 33.11 tok/s | 7GB (have 24GB) |
| microsoft/Phi-3.5-mini-instruct | Q4 | Fits comfortably | 56.45 tok/s | 4GB (have 24GB) |
| Qwen/Qwen3-4B-Base | Q8 | Fits comfortably | 43.28 tok/s | 4GB (have 24GB) |
| Qwen/Qwen3-4B-Base | Q4 | Fits comfortably | 66.19 tok/s | 2GB (have 24GB) |
| Qwen/Qwen2-7B-Instruct | Q8 | Fits comfortably | 34.70 tok/s | 7GB (have 24GB) |
| Qwen/Qwen2-7B-Instruct | Q4 | Fits comfortably | 55.28 tok/s | 4GB (have 24GB) |
| meta-llama/Llama-2-7b-chat-hf | Q8 | Fits comfortably | 34.05 tok/s | 7GB (have 24GB) |
| meta-llama/Llama-2-7b-chat-hf | Q4 | Fits comfortably | 54.13 tok/s | 4GB (have 24GB) |
| Qwen/Qwen3-14B-Base | Q8 | Fits comfortably | 28.03 tok/s | 14GB (have 24GB) |
| Qwen/Qwen3-14B-Base | Q4 | Fits comfortably | 43.32 tok/s | 7GB (have 24GB) |
| swiss-ai/Apertus-8B-Instruct-2509 | Q8 | Fits comfortably | 32.45 tok/s | 8GB (have 24GB) |
| swiss-ai/Apertus-8B-Instruct-2509 | Q4 | Fits comfortably | 45.76 tok/s | 4GB (have 24GB) |
| microsoft/Phi-3.5-vision-instruct | Q8 | Fits comfortably | 37.29 tok/s | 7GB (have 24GB) |
| microsoft/Phi-3.5-vision-instruct | Q4 | Fits comfortably | 54.36 tok/s | 4GB (have 24GB) |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q8 | Fits comfortably | 37.33 tok/s | 7GB (have 24GB) |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q4 | Fits comfortably | 56.25 tok/s | 4GB (have 24GB) |
| rinna/japanese-gpt-neox-small | Q8 | Fits comfortably | 36.56 tok/s | 7GB (have 24GB) |
| rinna/japanese-gpt-neox-small | Q4 | Fits comfortably | 51.64 tok/s | 4GB (have 24GB) |
| Qwen/Qwen2.5-Coder-1.5B | Q8 | Fits comfortably | 44.41 tok/s | 5GB (have 24GB) |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | Fits comfortably | 54.24 tok/s | 3GB (have 24GB) |
| IlyaGusev/saiga_llama3_8b | Q8 | Fits comfortably | 30.85 tok/s | 8GB (have 24GB) |
| IlyaGusev/saiga_llama3_8b | Q4 | Fits comfortably | 53.64 tok/s | 4GB (have 24GB) |
| Qwen/Qwen3-30B-A3B | Q8 | Not supported | — | 30GB (have 24GB) |
| Qwen/Qwen3-30B-A3B | Q4 | Fits comfortably | 31.39 tok/s | 15GB (have 24GB) |
| deepseek-ai/DeepSeek-R1 | Q8 | Fits comfortably | 35.70 tok/s | 7GB (have 24GB) |
| deepseek-ai/DeepSeek-R1 | Q4 | Fits comfortably | 47.57 tok/s | 4GB (have 24GB) |
| microsoft/DialoGPT-small | Q8 | Fits comfortably | 39.35 tok/s | 7GB (have 24GB) |
| microsoft/DialoGPT-small | Q4 | Fits comfortably | 52.38 tok/s | 4GB (have 24GB) |
| Qwen/Qwen3-8B-FP8 | Q8 | Fits comfortably | 34.98 tok/s | 8GB (have 24GB) |
| Qwen/Qwen3-8B-FP8 | Q4 | Fits comfortably | 45.86 tok/s | 4GB (have 24GB) |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q8 | Not supported | — | 30GB (have 24GB) |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q4 | Fits comfortably | 31.06 tok/s | 15GB (have 24GB) |
| Qwen/Qwen3-Embedding-4B | Q8 | Fits comfortably | 41.91 tok/s | 4GB (have 24GB) |
| Qwen/Qwen3-Embedding-4B | Q4 | Fits comfortably | 66.53 tok/s | 2GB (have 24GB) |
| microsoft/Phi-4-multimodal-instruct | Q8 | Fits comfortably | 36.86 tok/s | 7GB (have 24GB) |
| microsoft/Phi-4-multimodal-instruct | Q4 | Fits comfortably | 49.77 tok/s | 4GB (have 24GB) |
| Qwen/Qwen3-8B-Base | Q8 | Fits comfortably | 32.89 tok/s | 8GB (have 24GB) |
| Qwen/Qwen3-8B-Base | Q4 | Fits comfortably | 47.71 tok/s | 4GB (have 24GB) |
| Qwen/Qwen3-0.6B-Base | Q8 | Fits comfortably | 40.71 tok/s | 6GB (have 24GB) |
| Qwen/Qwen3-0.6B-Base | Q4 | Fits comfortably | 54.72 tok/s | 3GB (have 24GB) |
| openai-community/gpt2-medium | Q8 | Fits comfortably | 32.57 tok/s | 7GB (have 24GB) |
| openai-community/gpt2-medium | Q4 | Fits comfortably | 50.17 tok/s | 4GB (have 24GB) |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q8 | Fits comfortably | 39.39 tok/s | 7GB (have 24GB) |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 50.62 tok/s | 4GB (have 24GB) |
| Qwen/Qwen2.5-Math-1.5B | Q8 | Fits comfortably | 40.07 tok/s | 5GB (have 24GB) |
| Qwen/Qwen2.5-Math-1.5B | Q4 | Fits comfortably | 63.13 tok/s | 3GB (have 24GB) |
| HuggingFaceTB/SmolLM-135M | Q8 | Fits comfortably | 36.15 tok/s | 7GB (have 24GB) |
| HuggingFaceTB/SmolLM-135M | Q4 | Fits comfortably | 47.62 tok/s | 4GB (have 24GB) |
| unsloth/gpt-oss-20b-BF16 | Q8 | Fits comfortably | 23.35 tok/s | 20GB (have 24GB) |
| unsloth/gpt-oss-20b-BF16 | Q4 | Fits comfortably | 35.69 tok/s | 10GB (have 24GB) |
| meta-llama/Meta-Llama-3-70B-Instruct | Q8 | Not supported | — | 70GB (have 24GB) |
| meta-llama/Meta-Llama-3-70B-Instruct | Q4 | Not supported | — | 35GB (have 24GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 32.57 tok/s | 8GB (have 24GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 52.03 tok/s | 4GB (have 24GB) |
| zai-org/GLM-4.5-Air | Q8 | Fits comfortably | 33.83 tok/s | 7GB (have 24GB) |
| zai-org/GLM-4.5-Air | Q4 | Fits comfortably | 48.89 tok/s | 4GB (have 24GB) |
| mistralai/Mistral-7B-Instruct-v0.1 | Q8 | Fits comfortably | 35.32 tok/s | 7GB (have 24GB) |
| mistralai/Mistral-7B-Instruct-v0.1 | Q4 | Fits comfortably | 47.77 tok/s | 4GB (have 24GB) |
| LiquidAI/LFM2-1.2B | Q8 | Fits comfortably | 55.67 tok/s | 2GB (have 24GB) |
| LiquidAI/LFM2-1.2B | Q4 | Fits comfortably | 86.02 tok/s | 1GB (have 24GB) |
| mistralai/Mistral-7B-v0.1 | Q8 | Fits comfortably | 36.53 tok/s | 7GB (have 24GB) |
| mistralai/Mistral-7B-v0.1 | Q4 | Fits comfortably | 49.03 tok/s | 4GB (have 24GB) |
| Qwen/Qwen2.5-32B-Instruct | Q8 | Not supported | — | 32GB (have 24GB) |
| Qwen/Qwen2.5-32B-Instruct | Q4 | Fits comfortably | 29.31 tok/s | 16GB (have 24GB) |
| deepseek-ai/DeepSeek-R1-0528 | Q8 | Fits comfortably | 33.13 tok/s | 7GB (have 24GB) |
| deepseek-ai/DeepSeek-R1-0528 | Q4 | Fits comfortably | 56.18 tok/s | 4GB (have 24GB) |
| meta-llama/Llama-3.1-8B | Q8 | Fits comfortably | 33.23 tok/s | 8GB (have 24GB) |
| meta-llama/Llama-3.1-8B | Q4 | Fits comfortably | 45.53 tok/s | 4GB (have 24GB) |
| deepseek-ai/DeepSeek-V3.1 | Q8 | Fits comfortably | 37.72 tok/s | 7GB (have 24GB) |
| deepseek-ai/DeepSeek-V3.1 | Q4 | Fits comfortably | 56.03 tok/s | 4GB (have 24GB) |
| microsoft/phi-4 | Q8 | Fits comfortably | 36.26 tok/s | 7GB (have 24GB) |
| microsoft/phi-4 | Q4 | Fits comfortably | 48.10 tok/s | 4GB (have 24GB) |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | Fits comfortably | 53.06 tok/s | 3GB (have 24GB) |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | Fits comfortably | 72.93 tok/s | 2GB (have 24GB) |
| Qwen/Qwen2-0.5B | Q8 | Fits comfortably | 36.88 tok/s | 5GB (have 24GB) |
| Qwen/Qwen2-0.5B | Q4 | Fits comfortably | 54.32 tok/s | 3GB (have 24GB) |
| MiniMaxAI/MiniMax-M2 | Q8 | Fits comfortably | 36.99 tok/s | 7GB (have 24GB) |
| MiniMaxAI/MiniMax-M2 | Q4 | Fits comfortably | 53.24 tok/s | 4GB (have 24GB) |
| microsoft/DialoGPT-medium | Q8 | Fits comfortably | 37.21 tok/s | 7GB (have 24GB) |
| microsoft/DialoGPT-medium | Q4 | Fits comfortably | 51.54 tok/s | 4GB (have 24GB) |
| zai-org/GLM-4.6-FP8 | Q8 | Fits comfortably | 38.61 tok/s | 7GB (have 24GB) |
| zai-org/GLM-4.6-FP8 | Q4 | Fits comfortably | 56.02 tok/s | 4GB (have 24GB) |
| HuggingFaceTB/SmolLM2-135M | Q8 | Fits comfortably | 32.59 tok/s | 7GB (have 24GB) |
| HuggingFaceTB/SmolLM2-135M | Q4 | Fits comfortably | 54.65 tok/s | 4GB (have 24GB) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | Fits comfortably | 37.40 tok/s | 8GB (have 24GB) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q4 | Fits comfortably | 47.10 tok/s | 4GB (have 24GB) |
| meta-llama/Llama-2-7b-hf | Q8 | Fits comfortably | 32.68 tok/s | 7GB (have 24GB) |
| meta-llama/Llama-2-7b-hf | Q4 | Fits comfortably | 55.46 tok/s | 4GB (have 24GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | Fits comfortably | 32.68 tok/s | 7GB (have 24GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | Fits comfortably | 46.88 tok/s | 4GB (have 24GB) |
| microsoft/phi-2 | Q8 | Fits comfortably | 35.91 tok/s | 7GB (have 24GB) |
| microsoft/phi-2 | Q4 | Fits comfortably | 46.98 tok/s | 4GB (have 24GB) |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | Not supported | — | 70GB (have 24GB) |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | Not supported | — | 35GB (have 24GB) |
| Qwen/Qwen2.5-0.5B | Q8 | Fits comfortably | 43.33 tok/s | 5GB (have 24GB) |
| Qwen/Qwen2.5-0.5B | Q4 | Fits comfortably | 58.61 tok/s | 3GB (have 24GB) |
| Qwen/Qwen3-14B | Q8 | Fits comfortably | 26.58 tok/s | 14GB (have 24GB) |
| Qwen/Qwen3-14B | Q4 | Fits comfortably | 37.17 tok/s | 7GB (have 24GB) |
| Qwen/Qwen3-Embedding-8B | Q8 | Fits comfortably | 31.16 tok/sEstimated | 8GB (have 24GB) |
| Qwen/Qwen3-Embedding-8B | Q4 | Fits comfortably | 52.07 tok/sEstimated | 4GB (have 24GB) |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | Not supported | — | 70GB (have 24GB) |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | Not supported | — | 35GB (have 24GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q8 | Fits comfortably | 33.64 tok/sEstimated | 8GB (have 24GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q4 | Fits comfortably | 49.01 tok/sEstimated | 4GB (have 24GB) |
| Qwen/Qwen2.5-14B-Instruct | Q8 | Fits comfortably | 29.90 tok/sEstimated | 14GB (have 24GB) |
| Qwen/Qwen2.5-14B-Instruct | Q4 | Fits comfortably | 39.78 tok/sEstimated | 7GB (have 24GB) |
| Qwen/Qwen2.5-1.5B | Q8 | Fits comfortably | 39.74 tok/sEstimated | 5GB (have 24GB) |
| Qwen/Qwen2.5-1.5B | Q4 | Fits comfortably | 56.86 tok/sEstimated | 3GB (have 24GB) |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | Fits comfortably | 47.99 tok/sEstimated | 4GB (have 24GB) |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q4 | Fits comfortably | 59.80 tok/sEstimated | 2GB (have 24GB) |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q8 | Fits comfortably | 25.32 tok/sEstimated | 20GB (have 24GB) |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q4 | Fits comfortably | 34.03 tok/sEstimated | 10GB (have 24GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q8 | Fits comfortably | 37.55 tok/sEstimated | 5GB (have 24GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q4 | Fits comfortably | 59.08 tok/sEstimated | 3GB (have 24GB) |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | Fits comfortably | 32.62 tok/sEstimated | 8GB (have 24GB) |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | Fits comfortably | 50.54 tok/sEstimated | 4GB (have 24GB) |
| Qwen/Qwen3-Reranker-0.6B | Q8 | Fits comfortably | 40.65 tok/sEstimated | 6GB (have 24GB) |
| Qwen/Qwen3-Reranker-0.6B | Q4 | Fits comfortably | 56.98 tok/sEstimated | 3GB (have 24GB) |
| rednote-hilab/dots.ocr | Q8 | Fits comfortably | 33.95 tok/sEstimated | 7GB (have 24GB) |
| rednote-hilab/dots.ocr | Q4 | Fits comfortably | 54.07 tok/sEstimated | 4GB (have 24GB) |
| google-t5/t5-3b | Q8 | Fits comfortably | 52.58 tok/sEstimated | 3GB (have 24GB) |
| google-t5/t5-3b | Q4 | Fits comfortably | 71.07 tok/sEstimated | 2GB (have 24GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q8 | Not supported | — | 30GB (have 24GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q4 | Fits comfortably | 30.63 tok/sEstimated | 15GB (have 24GB) |
| Qwen/Qwen3-4B | Q8 | Fits comfortably | 41.39 tok/sEstimated | 4GB (have 24GB) |
| Qwen/Qwen3-4B | Q4 | Fits comfortably | 58.31 tok/sEstimated | 2GB (have 24GB) |
| Qwen/Qwen3-1.7B | Q8 | Fits comfortably | 32.76 tok/sEstimated | 7GB (have 24GB) |
| Qwen/Qwen3-1.7B | Q4 | Fits comfortably | 52.62 tok/sEstimated | 4GB (have 24GB) |
| openai-community/gpt2-large | Q8 | Fits comfortably | 39.20 tok/sEstimated | 7GB (have 24GB) |
| openai-community/gpt2-large | Q4 | Fits comfortably | 51.62 tok/sEstimated | 4GB (have 24GB) |
| microsoft/Phi-3-mini-4k-instruct | Q8 | Fits comfortably | 35.36 tok/sEstimated | 7GB (have 24GB) |
| microsoft/Phi-3-mini-4k-instruct | Q4 | Fits comfortably | 53.95 tok/sEstimated | 4GB (have 24GB) |
| allenai/OLMo-2-0425-1B | Q8 | Fits comfortably | 68.95 tok/sEstimated | 1GB (have 24GB) |
| allenai/OLMo-2-0425-1B | Q4 | Fits comfortably | 112.28 tok/sEstimated | 1GB (have 24GB) |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q8 | Not supported | — | 80GB (have 24GB) |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q4 | Not supported | — | 40GB (have 24GB) |
| Qwen/Qwen3-32B | Q8 | Not supported | — | 32GB (have 24GB) |
| Qwen/Qwen3-32B | Q4 | Fits comfortably | 27.27 tok/sEstimated | 16GB (have 24GB) |
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | Fits comfortably | 42.53 tok/sEstimated | 5GB (have 24GB) |
| Qwen/Qwen2.5-0.5B-Instruct | Q4 | Fits comfortably | 58.15 tok/sEstimated | 3GB (have 24GB) |
| Qwen/Qwen2.5-7B | Q8 | Fits comfortably | 38.96 tok/sEstimated | 7GB (have 24GB) |
| Qwen/Qwen2.5-7B | Q4 | Fits comfortably | 54.32 tok/sEstimated | 4GB (have 24GB) |
| meta-llama/Meta-Llama-3-8B | Q8 | Fits comfortably | 31.33 tok/sEstimated | 8GB (have 24GB) |
| meta-llama/Meta-Llama-3-8B | Q4 | Fits comfortably | 46.74 tok/sEstimated | 4GB (have 24GB) |
| meta-llama/Llama-3.2-1B | Q8 | Fits comfortably | 76.86 tok/sEstimated | 1GB (have 24GB) |
| meta-llama/Llama-3.2-1B | Q4 | Fits comfortably | 111.22 tok/sEstimated | 1GB (have 24GB) |
| petals-team/StableBeluga2 | Q8 | Fits comfortably | 37.45 tok/sEstimated | 7GB (have 24GB) |
| petals-team/StableBeluga2 | Q4 | Fits comfortably | 52.39 tok/sEstimated | 4GB (have 24GB) |
| vikhyatk/moondream2 | Q8 | Fits comfortably | 38.63 tok/sEstimated | 7GB (have 24GB) |
| vikhyatk/moondream2 | Q4 | Fits comfortably | 54.40 tok/sEstimated | 4GB (have 24GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 51.03 tok/sEstimated | 3GB (have 24GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | Fits comfortably | 65.33 tok/sEstimated | 2GB (have 24GB) |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q8 | Not supported | — | 70GB (have 24GB) |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q4 | Not supported | — | 35GB (have 24GB) |
| distilbert/distilgpt2 | Q8 | Fits comfortably | 35.93 tok/sEstimated | 7GB (have 24GB) |
| distilbert/distilgpt2 | Q4 | Fits comfortably | 46.59 tok/sEstimated | 4GB (have 24GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q8 | Not supported | — | 32GB (have 24GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q4 | Fits comfortably | 27.67 tok/sEstimated | 16GB (have 24GB) |
| inference-net/Schematron-3B | Q8 | Fits comfortably | 51.37 tok/sEstimated | 3GB (have 24GB) |
| inference-net/Schematron-3B | Q4 | Fits comfortably | 63.11 tok/sEstimated | 2GB (have 24GB) |
| Qwen/Qwen3-8B | Q8 | Fits comfortably | 31.82 tok/sEstimated | 8GB (have 24GB) |
| Qwen/Qwen3-8B | Q4 | Fits comfortably | 50.98 tok/sEstimated | 4GB (have 24GB) |
| mistralai/Mistral-7B-Instruct-v0.2 | Q8 | Fits comfortably | 35.84 tok/sEstimated | 7GB (have 24GB) |
| mistralai/Mistral-7B-Instruct-v0.2 | Q4 | Fits comfortably | 56.19 tok/sEstimated | 4GB (have 24GB) |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q8 | Fits comfortably | 44.26 tok/sEstimated | 3GB (have 24GB) |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | Fits comfortably | 64.22 tok/sEstimated | 2GB (have 24GB) |
| bigscience/bloomz-560m | Q8 | Fits comfortably | 33.63 tok/sEstimated | 7GB (have 24GB) |
| bigscience/bloomz-560m | Q4 | Fits comfortably | 49.85 tok/sEstimated | 4GB (have 24GB) |
| Qwen/Qwen2.5-3B-Instruct | Q8 | Fits comfortably | 48.44 tok/sEstimated | 3GB (have 24GB) |
| Qwen/Qwen2.5-3B-Instruct | Q4 | Fits comfortably | 76.45 tok/sEstimated | 2GB (have 24GB) |
| openai/gpt-oss-120b | Q8 | Not supported | — | 120GB (have 24GB) |
| openai/gpt-oss-120b | Q4 | Not supported | — | 60GB (have 24GB) |
| meta-llama/Llama-3.2-1B-Instruct | Q8 | Fits comfortably | 76.12 tok/sEstimated | 1GB (have 24GB) |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | Fits comfortably | 112.88 tok/sEstimated | 1GB (have 24GB) |
| Qwen/Qwen3-4B-Instruct-2507 | Q8 | Fits comfortably | 45.26 tok/sEstimated | 4GB (have 24GB) |
| Qwen/Qwen3-4B-Instruct-2507 | Q4 | Fits comfortably | 67.08 tok/sEstimated | 2GB (have 24GB) |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | Fits comfortably | 37.16 tok/sEstimated | 7GB (have 24GB) |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q4 | Fits comfortably | 53.95 tok/sEstimated | 4GB (have 24GB) |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | Fits comfortably | 72.18 tok/sEstimated | 1GB (have 24GB) |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | Fits comfortably | 95.11 tok/sEstimated | 1GB (have 24GB) |
| facebook/opt-125m | Q8 | Fits comfortably | 32.71 tok/sEstimated | 7GB (have 24GB) |
| facebook/opt-125m | Q4 | Fits comfortably | 47.92 tok/sEstimated | 4GB (have 24GB) |
| Qwen/Qwen2.5-1.5B-Instruct | Q8 | Fits comfortably | 41.86 tok/sEstimated | 5GB (have 24GB) |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | Fits comfortably | 62.37 tok/sEstimated | 3GB (have 24GB) |
| Qwen/Qwen3-Embedding-0.6B | Q8 | Fits comfortably | 37.70 tok/sEstimated | 6GB (have 24GB) |
| Qwen/Qwen3-Embedding-0.6B | Q4 | Fits comfortably | 55.41 tok/sEstimated | 3GB (have 24GB) |
| google/gemma-3-1b-it | Q8 | Fits comfortably | 65.31 tok/sEstimated | 1GB (have 24GB) |
| google/gemma-3-1b-it | Q4 | Fits comfortably | 95.71 tok/sEstimated | 1GB (have 24GB) |
| openai/gpt-oss-20b | Q8 | Fits comfortably | 23.36 tok/sEstimated | 20GB (have 24GB) |
| openai/gpt-oss-20b | Q4 | Fits comfortably | 31.46 tok/sEstimated | 10GB (have 24GB) |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q8 | Not supported | — | 34GB (have 24GB) |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q4 | Fits comfortably | 28.27 tok/sEstimated | 17GB (have 24GB) |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 36.54 tok/sEstimated | 8GB (have 24GB) |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 49.99 tok/sEstimated | 4GB (have 24GB) |
| Gensyn/Qwen2.5-0.5B-Instruct | Q8 | Fits comfortably | 42.65 tok/sEstimated | 5GB (have 24GB) |
| Gensyn/Qwen2.5-0.5B-Instruct | Q4 | Fits comfortably | 52.45 tok/sEstimated | 3GB (have 24GB) |
| Qwen/Qwen3-0.6B | Q8 | Fits comfortably | 41.48 tok/sEstimated | 6GB (have 24GB) |
| Qwen/Qwen3-0.6B | Q4 | Fits comfortably | 51.22 tok/sEstimated | 3GB (have 24GB) |
| Qwen/Qwen2.5-7B-Instruct | Q8 | Fits comfortably | 32.85 tok/sEstimated | 7GB (have 24GB) |
| Qwen/Qwen2.5-7B-Instruct | Q4 | Fits comfortably | 46.32 tok/sEstimated | 4GB (have 24GB) |
| openai-community/gpt2 | Q8 | Fits comfortably | 36.23 tok/sEstimated | 7GB (have 24GB) |
| openai-community/gpt2 | Q4 | Fits comfortably | 51.75 tok/sEstimated | 4GB (have 24GB) |
Note: figures marked "estimated" are calculated rather than measured; real-world results will vary with runtime, context length, and batch size.
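The fit verdicts above can be reproduced with a simple rule of thumb: roughly one byte per parameter at Q8 and half a byte at Q4, plus headroom for KV cache and runtime buffers. A minimal Python sketch follows; the 2GB headroom value is an assumption for illustration, not this page's exact methodology.

```python
# Rough VRAM-fit check mirroring the rule of thumb behind the table above.
import math

GPU_VRAM_GB = 24  # RTX 3090

BYTES_PER_PARAM = {"Q4": 0.5, "Q8": 1.0, "FP16": 2.0}

def fits(params_billion: float, quant: str, headroom_gb: float = 2.0) -> str:
    """Return a fit verdict for a model of the given size and quantization."""
    need_gb = math.ceil(params_billion * BYTES_PER_PARAM[quant])
    if need_gb + headroom_gb <= GPU_VRAM_GB:
        return f"Fits comfortably ({need_gb}GB of {GPU_VRAM_GB}GB)"
    if need_gb <= GPU_VRAM_GB:
        return f"Tight fit ({need_gb}GB of {GPU_VRAM_GB}GB)"
    return f"Not supported ({need_gb}GB > {GPU_VRAM_GB}GB)"

print(fits(70, "Q4"))  # 35GB -> Not supported, matching the Llama 70B rows
print(fits(8, "Q8"))   # 8GB  -> Fits comfortably, matching the 8B rows
```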
Data-backed answers pulled from community benchmarks, manufacturer specs, and live pricing.
An Ampere builder reports ~18 tok/s on Llama 3 70B Q4 with a single 3090, and ~36 tok/s after adding tensor parallelism across four cards.
Source: Reddit – /r/LocalLLaMA (mqzh3yo)
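The post does not name a serving stack. One common way to get that four-way split is vLLM's tensor parallelism; the sketch below is a hedged illustration where the local model path and AWQ quantization are placeholder assumptions, not details from the report.

```python
# Hypothetical vLLM setup splitting a 4-bit 70B model across four 3090s.
from vllm import LLM, SamplingParams

llm = LLM(
    model="/models/llama-3-70b-instruct-awq",  # placeholder local path
    quantization="awq",           # assumed 4-bit scheme
    tensor_parallel_size=4,       # shard weights across 4 GPUs
    gpu_memory_utilization=0.90,  # leave a little VRAM slack per card
)
out = llm.generate(
    ["Explain tensor parallelism in one sentence."],
    SamplingParams(max_tokens=64),
)
print(out[0].outputs[0].text)
```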
Enthusiasts routinely see ~100 tokens/sec on Qwen 30B-A3B when tuned on a single RTX 3090, making it a budget-friendly coding workhorse.
Source: Reddit – /r/LocalLLaMA (mqs2r45)
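At Q4 the 30B-A3B MoE weighs roughly 15GB (per the table above), so every layer fits on the 3090. A minimal llama-cpp-python sketch, assuming a local GGUF file and a flash-attention-enabled build, both of which are placeholders:

```python
# Full GPU offload of a ~15GB Q4 GGUF on a 24GB card.
from llama_cpp import Llama

llm = Llama(
    model_path="qwen3-30b-a3b-q4_k_m.gguf",  # placeholder filename
    n_gpu_layers=-1,   # offload every layer to the GPU
    n_ctx=8192,        # context window; larger contexts eat into the 24GB
    flash_attn=True,   # assumption: build compiled with flash attention
)
print(llm("Write a Python hello world.", max_tokens=64)["choices"][0]["text"])
```

The ~100 tok/s figure depends on tuning; batch size, context length, and the specific quant all shift the result.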
Builders using x1 risers for dual 3080/3090 rigs measured no meaningful tokens/sec loss; the main penalty is slower model swaps, not slower inference.
Source: Reddit – /r/LocalLLaMA (mr10ib4)
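The arithmetic behind that finding: weights cross the PCIe bus once at load time, while per-token traffic during generation is tiny. A quick back-of-envelope using nominal PCIe 3.0 per-lane throughput:

```python
# Why x1 risers mostly hurt load time, not inference speed.
LINK_GBPS = {"PCIe3 x1": 0.985, "PCIe3 x16": 15.75}  # nominal usable GB/s

model_gb = 15.0  # e.g. Qwen3-30B-A3B at Q4, per the table
for link, gbps in LINK_GBPS.items():
    print(f"{link}: load {model_gb}GB in ~{model_gb / gbps:.1f}s")

# Only a few MB of activations cross the bus per token, so even x1
# (~985 MB/s) is far from the bottleneck once the weights are resident.
```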
RTX 3090 provides 24 GB GDDR6X, draws 350 W, and uses triple 8-pin PCIe power connectors. NVIDIA recommends a 750 W PSU.
Source: TechPowerUp – RTX 3090 Specs
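The 750 W guidance follows roughly from a whole-system budget with headroom for Ampere's well-documented transient spikes. The non-GPU numbers in this sketch are illustrative assumptions, not measured values:

```python
# Rough PSU budget behind NVIDIA's 750W recommendation.
gpu_w     = 350   # RTX 3090 board power (per the spec line above)
cpu_w     = 150   # assumed high-end desktop CPU under load
system_w  = 75    # assumed fans, drives, RAM, motherboard
transient = 1.25  # margin for short GPU power spikes

peak_w = (gpu_w + cpu_w + system_w) * transient
print(f"Estimated peak draw: ~{peak_w:.0f}W -> a 750W unit leaves headroom")
```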
3 Nov 2025 snapshot: Newegg at $999 in stock, Amazon at $1,099 out of stock, and Best Buy at $1,499 out of stock.
Source: Supabase price tracker snapshot – 2025-11-03
Related GPUs for local inference workloads: RTX 4060 Ti 16GB, RX 6800 XT, and RTX 3080.