Quick Answer: The RTX 3080 offers 10GB of VRAM, starts around $520.59, and typically draws 320W under load. It delivers an estimated 90 tokens/sec on allenai/OLMo-2-0425-1B.
This GPU offers reliable throughput for local AI workloads. Pair it with the right model quantization to hit your target tokens/sec, and monitor prices below to catch the best deal.
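As a rough rule of thumb (an assumption for illustration, not this site's exact methodology), the VRAM a quantized model needs scales with parameter count times bytes per weight, plus some fixed overhead for the KV cache and activations:

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: int,
                     overhead_gb: float = 0.75) -> float:
    """Rough VRAM estimate for inference: quantized weights plus a fixed
    overhead for KV cache and activations. Both the formula and the
    0.75GB overhead are illustrative assumptions."""
    weight_gb = params_billion * bits_per_weight / 8  # 1B params at 8 bits ~= 1GB
    return weight_gb + overhead_gb

# e.g. an 8B model at Q4 (~4 bits/weight)
print(round(estimate_vram_gb(8, 4), 2))  # 4.75
```

Estimates like this explain why Q4 roughly halves the footprint of Q8 in the tables below, though real loaders add context-length-dependent overhead.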
| Model | Quantization | Tokens/sec | VRAM used |
|---|---|---|---|
| allenai/OLMo-2-0425-1B | Q4 | 90.42 tok/s | 1GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | 89.56 tok/s | 1GB |
| unsloth/gemma-3-1b-it | Q4 | 86.19 tok/s | 1GB |
| unsloth/Llama-3.2-1B-Instruct | Q4 | 84.89 tok/s | 1GB |
| meta-llama/Llama-3.2-1B | Q4 | 84.59 tok/s | 1GB |
| meta-llama/Llama-Guard-3-1B | Q4 | 84.47 tok/s | 1GB |
| google/gemma-3-1b-it | Q4 | 82.79 tok/s | 1GB |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | 82.64 tok/s | 1GB |
| apple/OpenELM-1_1B-Instruct | Q4 | 80.70 tok/s | 1GB |
| google/gemma-2-2b-it | Q4 | 70.61 tok/s | 1GB |
| ibm-granite/granite-3.3-2b-instruct | Q4 | 68.60 tok/s | 1GB |
| google/gemma-2b | Q4 | 68.47 tok/s | 1GB |
| apple/OpenELM-1_1B-Instruct | Q8 | 65.86 tok/s | 1GB |
| unsloth/gemma-3-1b-it | Q8 | 64.94 tok/s | 1GB |
| unsloth/Llama-3.2-1B-Instruct | Q8 | 64.77 tok/s | 1GB |
| ibm-research/PowerMoE-3b | Q4 | 63.55 tok/s | 2GB |
| meta-llama/Llama-3.2-1B | Q8 | 63.22 tok/s | 1GB |
| LiquidAI/LFM2-1.2B | Q4 | 62.85 tok/s | 1GB |
| google-t5/t5-3b | Q4 | 61.32 tok/s | 2GB |
| meta-llama/Llama-3.2-1B-Instruct | Q8 | 58.52 tok/s | 1GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 58.32 tok/s | 2GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | 57.87 tok/s | 2GB |
| unsloth/Llama-3.2-3B-Instruct | Q4 | 57.19 tok/s | 2GB |
| Qwen/Qwen2.5-3B-Instruct | Q4 | 56.44 tok/s | 2GB |
| meta-llama/Llama-3.2-3B | Q4 | 56.36 tok/s | 2GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q4 | 56.23 tok/s | 2GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | 56.05 tok/s | 1GB |
| Qwen/Qwen3-4B-Base | Q4 | 55.70 tok/s | 2GB |
| allenai/OLMo-2-0425-1B | Q8 | 55.70 tok/s | 1GB |
| bigcode/starcoder2-3b | Q4 | 55.55 tok/s | 2GB |
| Qwen/Qwen2.5-3B | Q4 | 55.34 tok/s | 2GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q4 | 55.16 tok/s | 2GB |
| meta-llama/Llama-Guard-3-1B | Q8 | 55.13 tok/s | 1GB |
| google/gemma-3-1b-it | Q8 | 54.36 tok/s | 1GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | 53.55 tok/s | 2GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q4 | 53.47 tok/s | 2GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | 53.09 tok/s | 2GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q4 | 52.79 tok/s | 2GB |
| inference-net/Schematron-3B | Q4 | 52.73 tok/s | 2GB |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | 52.52 tok/s | 3GB |
| Qwen/Qwen2-0.5B-Instruct | Q4 | 52.18 tok/s | 3GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q4 | 52.15 tok/s | 3GB |
| Qwen/Qwen3-Embedding-4B | Q4 | 52.00 tok/s | 2GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q4 | 51.72 tok/s | 2GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q4 | 51.48 tok/s | 3GB |
| Qwen/Qwen2-1.5B-Instruct | Q4 | 50.94 tok/s | 3GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q4 | 50.79 tok/s | 2GB |
| Qwen/Qwen2.5-1.5B | Q4 | 50.18 tok/s | 3GB |
| microsoft/VibeVoice-1.5B | Q4 | 49.45 tok/s | 3GB |
| LiquidAI/LFM2-1.2B | Q8 | 49.42 tok/s | 2GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q4 | 49.28 tok/s | 3GB |
| Qwen/Qwen3-4B | Q4 | 49.13 tok/s | 2GB |
| Qwen/Qwen2.5-0.5B | Q4 | 48.69 tok/s | 3GB |
| Qwen/Qwen3-0.6B | Q4 | 46.85 tok/s | 3GB |
| Qwen/Qwen2.5-Math-1.5B | Q4 | 46.77 tok/s | 3GB |
| Qwen/Qwen2-7B-Instruct | Q4 | 46.72 tok/s | 4GB |
| openai-community/gpt2-large | Q4 | 46.59 tok/s | 4GB |
| Qwen/Qwen2-0.5B | Q4 | 46.46 tok/s | 3GB |
| deepseek-ai/DeepSeek-V3-0324 | Q4 | 46.44 tok/s | 4GB |
| microsoft/Phi-3.5-vision-instruct | Q4 | 46.43 tok/s | 4GB |
| ibm-granite/granite-3.3-2b-instruct | Q8 | 46.33 tok/s | 2GB |
| lmsys/vicuna-7b-v1.5 | Q4 | 46.19 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1 | Q4 | 46.14 tok/s | 4GB |
| openai-community/gpt2-xl | Q4 | 46.05 tok/s | 4GB |
| microsoft/DialoGPT-medium | Q4 | 45.92 tok/s | 4GB |
| vikhyatk/moondream2 | Q4 | 45.88 tok/s | 4GB |
| microsoft/DialoGPT-small | Q4 | 45.72 tok/s | 4GB |
| petals-team/StableBeluga2 | Q4 | 45.66 tok/s | 4GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q4 | 45.59 tok/s | 4GB |
| Qwen/Qwen3-1.7B-Base | Q4 | 45.52 tok/s | 4GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | 45.39 tok/s | 3GB |
| sshleifer/tiny-gpt2 | Q4 | 45.16 tok/s | 4GB |
| google/gemma-2-2b-it | Q8 | 45.04 tok/s | 2GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | 44.85 tok/s | 3GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q4 | 44.82 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3 | Q4 | 44.77 tok/s | 4GB |
| hmellor/tiny-random-LlamaForCausalLM | Q4 | 44.71 tok/s | 4GB |
| mistralai/Mistral-7B-v0.1 | Q4 | 44.63 tok/s | 4GB |
| meta-llama/Llama-2-7b-hf | Q4 | 44.56 tok/s | 4GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | 44.53 tok/s | 4GB |
| parler-tts/parler-tts-large-v1 | Q4 | 44.48 tok/s | 4GB |
| HuggingFaceTB/SmolLM-135M | Q4 | 44.47 tok/s | 4GB |
| openai-community/gpt2-medium | Q4 | 44.25 tok/s | 4GB |
| llamafactory/tiny-random-Llama-3 | Q4 | 44.20 tok/s | 4GB |
| Qwen/Qwen3-Embedding-0.6B | Q4 | 44.15 tok/s | 3GB |
| meta-llama/Llama-Guard-3-8B | Q4 | 44.11 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-0528 | Q4 | 44.07 tok/s | 4GB |
| facebook/opt-125m | Q4 | 44.04 tok/s | 4GB |
| meta-llama/Llama-3.2-3B | Q8 | 43.99 tok/s | 3GB |
| Qwen/Qwen2.5-7B | Q4 | 43.99 tok/s | 4GB |
| meta-llama/Llama-2-7b-chat-hf | Q4 | 43.86 tok/s | 4GB |
| ibm-research/PowerMoE-3b | Q8 | 43.73 tok/s | 3GB |
| Qwen/Qwen3-Embedding-8B | Q4 | 43.70 tok/s | 4GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q4 | 43.69 tok/s | 4GB |
| Qwen/Qwen3-0.6B-Base | Q4 | 43.68 tok/s | 3GB |
| meta-llama/Llama-3.1-8B | Q4 | 43.55 tok/s | 4GB |
| zai-org/GLM-4.5-Air | Q4 | 43.50 tok/s | 4GB |
| microsoft/Phi-4-multimodal-instruct | Q4 | 43.50 tok/s | 4GB |
| unsloth/Llama-3.2-3B-Instruct | Q8 | 43.37 tok/s | 3GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | 43.36 tok/s | 4GB |
| Qwen/Qwen2.5-3B-Instruct | Q8 | 43.34 tok/s | 3GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q4 | 43.22 tok/s | 4GB |
| IlyaGusev/saiga_llama3_8b | Q4 | 42.91 tok/s | 4GB |
| microsoft/Phi-4-mini-instruct | Q4 | 42.90 tok/s | 4GB |
| EleutherAI/gpt-neo-125m | Q4 | 42.88 tok/s | 4GB |
| google/gemma-2b | Q8 | 42.79 tok/s | 2GB |
| rednote-hilab/dots.ocr | Q4 | 42.75 tok/s | 4GB |
| ibm-granite/granite-3.3-8b-instruct | Q4 | 42.68 tok/s | 4GB |
| zai-org/GLM-4.6-FP8 | Q4 | 42.64 tok/s | 4GB |
| bigscience/bloomz-560m | Q4 | 42.40 tok/s | 4GB |
| BSC-LT/salamandraTA-7b-instruct | Q4 | 42.39 tok/s | 4GB |
| Qwen/Qwen3-Reranker-0.6B | Q4 | 42.38 tok/s | 3GB |
| meta-llama/Meta-Llama-3-8B | Q4 | 42.32 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3.1 | Q4 | 42.17 tok/s | 4GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 42.11 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q4 | 42.09 tok/s | 4GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | 42.06 tok/s | 4GB |
| dicta-il/dictalm2.0-instruct | Q4 | 41.95 tok/s | 4GB |
| Qwen/Qwen3-1.7B | Q4 | 41.88 tok/s | 4GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | 41.80 tok/s | 4GB |
| Qwen/Qwen3-8B | Q4 | 41.73 tok/s | 4GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q4 | 41.72 tok/s | 4GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q4 | 41.54 tok/s | 4GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q4 | 41.45 tok/s | 4GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q4 | 41.44 tok/s | 4GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | 41.40 tok/s | 3GB |
| microsoft/phi-4 | Q4 | 41.37 tok/s | 4GB |
| ibm-granite/granite-docling-258M | Q4 | 41.28 tok/s | 4GB |
| HuggingFaceTB/SmolLM2-135M | Q4 | 41.18 tok/s | 4GB |
| liuhaotian/llava-v1.5-7b | Q4 | 40.96 tok/s | 4GB |
| distilbert/distilgpt2 | Q4 | 40.79 tok/s | 4GB |
| GSAI-ML/LLaDA-8B-Instruct | Q4 | 40.78 tok/s | 4GB |
| Qwen/Qwen3-8B-FP8 | Q4 | 40.67 tok/s | 4GB |
| skt/kogpt2-base-v2 | Q4 | 40.55 tok/s | 4GB |
| microsoft/Phi-3-mini-4k-instruct | Q4 | 40.22 tok/s | 4GB |
| EleutherAI/pythia-70m-deduped | Q4 | 40.15 tok/s | 4GB |
| bigcode/starcoder2-3b | Q8 | 40.13 tok/s | 3GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q4 | 40.05 tok/s | 4GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q8 | 40.00 tok/s | 4GB |
| microsoft/phi-2 | Q4 | 39.89 tok/s | 4GB |
| rinna/japanese-gpt-neox-small | Q4 | 39.80 tok/s | 4GB |
| Qwen/Qwen3-4B-Base | Q8 | 39.68 tok/s | 4GB |
| HuggingFaceH4/zephyr-7b-beta | Q4 | 39.62 tok/s | 4GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | 39.44 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q8 | 39.39 tok/s | 4GB |
| numind/NuExtract-1.5 | Q4 | 39.36 tok/s | 4GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q4 | 39.29 tok/s | 5GB |
| inference-net/Schematron-3B | Q8 | 39.28 tok/s | 3GB |
| microsoft/Phi-3-mini-128k-instruct | Q4 | 39.11 tok/s | 4GB |
| Qwen/Qwen3-8B-Base | Q4 | 39.08 tok/s | 4GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q4 | 39.08 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | 39.04 tok/s | 4GB |
| openai-community/gpt2 | Q4 | 38.91 tok/s | 4GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q4 | 38.82 tok/s | 4GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q8 | 38.78 tok/s | 4GB |
| google/gemma-3-270m-it | Q4 | 38.70 tok/s | 4GB |
| MiniMaxAI/MiniMax-M2 | Q4 | 38.61 tok/s | 4GB |
| huggyllama/llama-7b | Q4 | 38.54 tok/s | 4GB |
| Qwen/Qwen2.5-3B | Q8 | 38.52 tok/s | 3GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q8 | 38.32 tok/s | 3GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q8 | 37.98 tok/s | 4GB |
| GSAI-ML/LLaDA-8B-Base | Q4 | 37.62 tok/s | 4GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q4 | 37.11 tok/s | 4GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q4 | 37.00 tok/s | 4GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q4 | 36.95 tok/s | 4GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q8 | 36.89 tok/s | 5GB |
| google-t5/t5-3b | Q8 | 36.88 tok/s | 3GB |
| Qwen/Qwen3-Embedding-4B | Q8 | 36.82 tok/s | 4GB |
| Qwen/Qwen2-0.5B-Instruct | Q8 | 36.69 tok/s | 5GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | 36.58 tok/s | 3GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q8 | 36.49 tok/s | 4GB |
| Qwen/Qwen2.5-Math-1.5B | Q8 | 35.80 tok/s | 5GB |
| Qwen/Qwen2.5-14B | Q4 | 35.76 tok/s | 7GB |
| ai-forever/ruGPT-3.5-13B | Q4 | 35.25 tok/s | 7GB |
| Qwen/Qwen2.5-1.5B | Q8 | 34.76 tok/s | 5GB |
| Qwen/Qwen2.5-Coder-1.5B | Q8 | 34.34 tok/s | 5GB |
| Qwen/Qwen3-0.6B-Base | Q8 | 34.07 tok/s | 6GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | 34.00 tok/s | 5GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q8 | 33.92 tok/s | 5GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | 33.92 tok/s | 7GB |
| Qwen/Qwen3-4B | Q8 | 33.59 tok/s | 4GB |
| OpenPipe/Qwen3-14B-Instruct | Q4 | 33.15 tok/s | 7GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q8 | 33.04 tok/s | 4GB |
| Qwen/Qwen2.5-7B | Q8 | 32.79 tok/s | 7GB |
| Qwen/Qwen2-1.5B-Instruct | Q8 | 32.76 tok/s | 5GB |
| dicta-il/dictalm2.0-instruct | Q8 | 32.69 tok/s | 7GB |
| llamafactory/tiny-random-Llama-3 | Q8 | 32.54 tok/s | 7GB |
| microsoft/DialoGPT-small | Q8 | 32.45 tok/s | 7GB |
| EleutherAI/pythia-70m-deduped | Q8 | 32.34 tok/s | 7GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q8 | 32.24 tok/s | 7GB |
| openai-community/gpt2-large | Q8 | 32.21 tok/s | 7GB |
| zai-org/GLM-4.5-Air | Q8 | 32.06 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q8 | 32.05 tok/s | 5GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q8 | 31.96 tok/s | 7GB |
| rinna/japanese-gpt-neox-small | Q8 | 31.90 tok/s | 7GB |
| microsoft/Phi-4-mini-instruct | Q8 | 31.78 tok/s | 7GB |
| numind/NuExtract-1.5 | Q8 | 31.76 tok/s | 7GB |
| openai-community/gpt2-xl | Q8 | 31.64 tok/s | 7GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | 31.60 tok/s | 7GB |
| meta-llama/Llama-2-7b-chat-hf | Q8 | 31.49 tok/s | 7GB |
| ibm-granite/granite-docling-258M | Q8 | 31.48 tok/s | 7GB |
| parler-tts/parler-tts-large-v1 | Q8 | 31.29 tok/s | 7GB |
| microsoft/VibeVoice-1.5B | Q8 | 31.26 tok/s | 5GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | 31.19 tok/s | 7GB |
| Qwen/Qwen2.5-0.5B | Q8 | 31.19 tok/s | 5GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q8 | 31.17 tok/s | 7GB |
| Qwen/Qwen3-1.7B | Q8 | 31.15 tok/s | 7GB |
| Qwen/Qwen3-14B-Base | Q4 | 30.98 tok/s | 7GB |
| vikhyatk/moondream2 | Q8 | 30.94 tok/s | 7GB |
| openai-community/gpt2 | Q8 | 30.92 tok/s | 7GB |
| Qwen/Qwen3-14B | Q4 | 30.92 tok/s | 7GB |
| skt/kogpt2-base-v2 | Q8 | 30.91 tok/s | 7GB |
| Qwen/Qwen2-0.5B | Q8 | 30.91 tok/s | 5GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q8 | 30.87 tok/s | 7GB |
| meta-llama/Llama-2-7b-hf | Q8 | 30.84 tok/s | 7GB |
| meta-llama/Llama-2-13b-chat-hf | Q4 | 30.76 tok/s | 7GB |
| HuggingFaceTB/SmolLM-135M | Q8 | 30.75 tok/s | 7GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q8 | 30.52 tok/s | 5GB |
| Qwen/Qwen3-8B-FP8 | Q8 | 30.50 tok/s | 8GB |
| meta-llama/Llama-Guard-3-8B | Q8 | 30.46 tok/s | 8GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | 30.36 tok/s | 7GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q8 | 30.35 tok/s | 8GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q8 | 30.34 tok/s | 7GB |
| Qwen/Qwen3-8B | Q8 | 30.26 tok/s | 8GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q8 | 30.26 tok/s | 8GB |
| openai-community/gpt2-medium | Q8 | 30.15 tok/s | 7GB |
| BSC-LT/salamandraTA-7b-instruct | Q8 | 29.92 tok/s | 7GB |
| microsoft/Phi-3-mini-4k-instruct | Q8 | 29.92 tok/s | 7GB |
| facebook/opt-125m | Q8 | 29.88 tok/s | 7GB |
| openai/gpt-oss-20b | Q4 | 29.80 tok/s | 10GB |
| microsoft/phi-4 | Q8 | 29.75 tok/s | 7GB |
| Qwen/Qwen3-0.6B | Q8 | 29.60 tok/s | 6GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q8 | 29.45 tok/s | 8GB |
| Qwen/Qwen2-7B-Instruct | Q8 | 29.35 tok/s | 7GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q8 | 29.33 tok/s | 7GB |
| petals-team/StableBeluga2 | Q8 | 29.32 tok/s | 7GB |
| Qwen/Qwen3-Reranker-0.6B | Q8 | 29.31 tok/s | 6GB |
| hmellor/tiny-random-LlamaForCausalLM | Q8 | 29.27 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | 29.08 tok/s | 7GB |
| microsoft/Phi-4-multimodal-instruct | Q8 | 29.08 tok/s | 7GB |
| HuggingFaceH4/zephyr-7b-beta | Q8 | 29.05 tok/s | 7GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | 28.77 tok/s | 8GB |
| mistralai/Mistral-7B-v0.1 | Q8 | 28.76 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1 | Q8 | 28.53 tok/s | 7GB |
| Qwen/Qwen3-Embedding-0.6B | Q8 | 28.46 tok/s | 6GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q8 | 28.44 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | 28.40 tok/s | 8GB |
| deepseek-ai/DeepSeek-V3.1 | Q8 | 28.38 tok/s | 7GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q8 | 28.09 tok/s | 8GB |
| microsoft/phi-2 | Q8 | 27.84 tok/s | 7GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q4 | 27.83 tok/s | 10GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q8 | 27.73 tok/s | 7GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q8 | 27.62 tok/s | 8GB |
| EleutherAI/gpt-neo-125m | Q8 | 27.62 tok/s | 7GB |
| sshleifer/tiny-gpt2 | Q8 | 27.61 tok/s | 7GB |
| microsoft/Phi-3.5-vision-instruct | Q8 | 27.61 tok/s | 7GB |
| deepseek-ai/DeepSeek-V3-0324 | Q8 | 27.57 tok/s | 7GB |
| google/gemma-3-270m-it | Q8 | 27.53 tok/s | 7GB |
| Qwen/Qwen3-1.7B-Base | Q8 | 27.49 tok/s | 7GB |
| huggyllama/llama-7b | Q8 | 27.48 tok/s | 7GB |
| distilbert/distilgpt2 | Q8 | 27.29 tok/s | 7GB |
| MiniMaxAI/MiniMax-M2 | Q8 | 27.29 tok/s | 7GB |
| deepseek-ai/DeepSeek-V3 | Q8 | 27.29 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-0528 | Q8 | 27.27 tok/s | 7GB |
| bigscience/bloomz-560m | Q8 | 27.26 tok/s | 7GB |
| ibm-granite/granite-3.3-8b-instruct | Q8 | 27.19 tok/s | 8GB |
| rednote-hilab/dots.ocr | Q8 | 27.15 tok/s | 7GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | 27.11 tok/s | 8GB |
| lmsys/vicuna-7b-v1.5 | Q8 | 27.03 tok/s | 7GB |
| microsoft/DialoGPT-medium | Q8 | 26.99 tok/s | 7GB |
| zai-org/GLM-4.6-FP8 | Q8 | 26.95 tok/s | 7GB |
| liuhaotian/llava-v1.5-7b | Q8 | 26.95 tok/s | 7GB |
| Qwen/Qwen3-Embedding-8B | Q8 | 26.93 tok/s | 8GB |
| HuggingFaceTB/SmolLM2-135M | Q8 | 26.86 tok/s | 7GB |
| microsoft/Phi-3-mini-128k-instruct | Q8 | 26.85 tok/s | 7GB |
| IlyaGusev/saiga_llama3_8b | Q8 | 26.54 tok/s | 8GB |
| unsloth/gpt-oss-20b-BF16 | Q4 | 26.53 tok/s | 10GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q8 | 26.11 tok/s | 8GB |
| meta-llama/Llama-3.1-8B | Q8 | 26.00 tok/s | 8GB |
| GSAI-ML/LLaDA-8B-Instruct | Q8 | 25.95 tok/s | 8GB |
| Qwen/Qwen3-8B-Base | Q8 | 25.81 tok/s | 8GB |
| GSAI-ML/LLaDA-8B-Base | Q8 | 25.81 tok/s | 8GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q4 | 25.59 tok/s | 10GB |
| meta-llama/Meta-Llama-3-8B | Q8 | 25.59 tok/s | 8GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q8 | 24.98 tok/s | 9GB |
Note: All speeds above are calculated estimates, not measured benchmarks. Real results may vary.
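The compatibility table that follows classifies each model/quantization pair against the RTX 3080's 10GB of VRAM. A minimal sketch of that verdict logic, with thresholds inferred from the table rather than documented:

```python
def fit_verdict(vram_needed_gb: float, vram_available_gb: float = 10.0) -> str:
    """Classify a model's fit on a GPU. Thresholds are an assumed reading
    of the compatibility table: over budget is unsupported, exactly at
    budget is tight, anything under is comfortable."""
    if vram_needed_gb > vram_available_gb:
        return "Not supported"
    if vram_needed_gb == vram_available_gb:
        return "Fits (tight)"
    return "Fits comfortably"

print(fit_verdict(8))   # an 8GB Q8 model on a 10GB card
print(fit_verdict(10))  # a 10GB Q4 model: no headroom for context
print(fit_verdict(20))  # a 20GB model: will not load
```

In practice a model that exactly fills VRAM leaves no room for the KV cache, so "Fits (tight)" usually means short contexts only.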
| Model | Quantization | Verdict | Estimated speed | VRAM needed |
|---|---|---|---|---|
| Gensyn/Qwen2.5-0.5B-Instruct | Q8 | Fits comfortably | 36.89 tok/s | 5GB (have 10GB) |
| Qwen/Qwen3-8B-Base | Q4 | Fits comfortably | 39.08 tok/s | 4GB (have 10GB) |
| Gensyn/Qwen2.5-0.5B-Instruct | Q4 | Fits comfortably | 52.15 tok/s | 3GB (have 10GB) |
| Qwen/Qwen3-0.6B-Base | Q8 | Fits comfortably | 34.07 tok/s | 6GB (have 10GB) |
| Qwen/Qwen3-0.6B | Q8 | Fits comfortably | 29.60 tok/s | 6GB (have 10GB) |
| Qwen/Qwen3-0.6B-Base | Q4 | Fits comfortably | 43.68 tok/s | 3GB (have 10GB) |
| Qwen/Qwen3-0.6B | Q4 | Fits comfortably | 46.85 tok/s | 3GB (have 10GB) |
| openai-community/gpt2-medium | Q8 | Fits comfortably | 30.15 tok/s | 7GB (have 10GB) |
| Qwen/Qwen2.5-7B-Instruct | Q8 | Fits comfortably | 31.19 tok/s | 7GB (have 10GB) |
| openai-community/gpt2-medium | Q4 | Fits comfortably | 44.25 tok/s | 4GB (have 10GB) |
| Qwen/Qwen2.5-7B-Instruct | Q4 | Fits comfortably | 42.11 tok/s | 4GB (have 10GB) |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q8 | Fits comfortably | 32.24 tok/s | 7GB (have 10GB) |
| openai-community/gpt2 | Q8 | Fits comfortably | 30.92 tok/s | 7GB (have 10GB) |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 44.53 tok/s | 4GB (have 10GB) |
| openai-community/gpt2 | Q4 | Fits comfortably | 38.91 tok/s | 4GB (have 10GB) |
| Qwen/Qwen2.5-Math-1.5B | Q8 | Fits comfortably | 35.80 tok/s | 5GB (have 10GB) |
| Qwen/Qwen2.5-Math-1.5B | Q4 | Fits comfortably | 46.77 tok/s | 3GB (have 10GB) |
| HuggingFaceTB/SmolLM-135M | Q8 | Fits comfortably | 30.75 tok/s | 7GB (have 10GB) |
| HuggingFaceTB/SmolLM-135M | Q4 | Fits comfortably | 44.47 tok/s | 4GB (have 10GB) |
| unsloth/gpt-oss-20b-BF16 | Q8 | Not supported | — | 20GB (have 10GB) |
| unsloth/gpt-oss-20b-BF16 | Q4 | Fits (tight) | 26.53 tok/s | 10GB (have 10GB) |
| meta-llama/Meta-Llama-3-70B-Instruct | Q8 | Not supported | — | 70GB (have 10GB) |
| meta-llama/Meta-Llama-3-70B-Instruct | Q4 | Not supported | — | 35GB (have 10GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 29.45 tok/s | 8GB (have 10GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 37.00 tok/s | 4GB (have 10GB) |
| zai-org/GLM-4.5-Air | Q8 | Fits comfortably | 32.06 tok/s | 7GB (have 10GB) |
| zai-org/GLM-4.5-Air | Q4 | Fits comfortably | 43.50 tok/s | 4GB (have 10GB) |
| mistralai/Mistral-7B-Instruct-v0.1 | Q8 | Fits comfortably | 27.73 tok/s | 7GB (have 10GB) |
| mistralai/Mistral-7B-Instruct-v0.1 | Q4 | Fits comfortably | 44.82 tok/s | 4GB (have 10GB) |
| LiquidAI/LFM2-1.2B | Q8 | Fits comfortably | 49.42 tok/s | 2GB (have 10GB) |
| LiquidAI/LFM2-1.2B | Q4 | Fits comfortably | 62.85 tok/s | 1GB (have 10GB) |
| mistralai/Mistral-7B-v0.1 | Q8 | Fits comfortably | 28.76 tok/s | 7GB (have 10GB) |
| mistralai/Mistral-7B-v0.1 | Q4 | Fits comfortably | 44.63 tok/s | 4GB (have 10GB) |
| Qwen/Qwen2.5-32B-Instruct | Q8 | Not supported | — | 32GB (have 10GB) |
| Qwen/Qwen2.5-32B-Instruct | Q4 | Not supported | — | 16GB (have 10GB) |
| deepseek-ai/DeepSeek-R1-0528 | Q8 | Fits comfortably | 27.27 tok/s | 7GB (have 10GB) |
| deepseek-ai/DeepSeek-R1-0528 | Q4 | Fits comfortably | 44.07 tok/s | 4GB (have 10GB) |
| meta-llama/Llama-3.1-8B | Q8 | Fits comfortably | 26.00 tok/s | 8GB (have 10GB) |
| meta-llama/Llama-3.1-8B | Q4 | Fits comfortably | 43.55 tok/s | 4GB (have 10GB) |
| deepseek-ai/DeepSeek-V3.1 | Q8 | Fits comfortably | 28.38 tok/s | 7GB (have 10GB) |
| deepseek-ai/DeepSeek-V3.1 | Q4 | Fits comfortably | 42.17 tok/s | 4GB (have 10GB) |
| microsoft/phi-4 | Q8 | Fits comfortably | 29.75 tok/s | 7GB (have 10GB) |
| microsoft/phi-4 | Q4 | Fits comfortably | 41.37 tok/s | 4GB (have 10GB) |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | Fits comfortably | 36.58 tok/s | 3GB (have 10GB) |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | Fits comfortably | 53.55 tok/s | 2GB (have 10GB) |
| Qwen/Qwen2-0.5B | Q8 | Fits comfortably | 30.91 tok/s | 5GB (have 10GB) |
| Qwen/Qwen2-0.5B | Q4 | Fits comfortably | 46.46 tok/s | 3GB (have 10GB) |
| MiniMaxAI/MiniMax-M2 | Q8 | Fits comfortably | 27.29 tok/s | 7GB (have 10GB) |
| MiniMaxAI/MiniMax-M2 | Q4 | Fits comfortably | 38.61 tok/s | 4GB (have 10GB) |
| microsoft/DialoGPT-medium | Q8 | Fits comfortably | 26.99 tok/s | 7GB (have 10GB) |
| microsoft/DialoGPT-medium | Q4 | Fits comfortably | 45.92 tok/s | 4GB (have 10GB) |
| zai-org/GLM-4.6-FP8 | Q8 | Fits comfortably | 26.95 tok/s | 7GB (have 10GB) |
| zai-org/GLM-4.6-FP8 | Q4 | Fits comfortably | 42.64 tok/s | 4GB (have 10GB) |
| HuggingFaceTB/SmolLM2-135M | Q8 | Fits comfortably | 26.86 tok/s | 7GB (have 10GB) |
| HuggingFaceTB/SmolLM2-135M | Q4 | Fits comfortably | 41.18 tok/s | 4GB (have 10GB) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | Fits comfortably | 28.40 tok/s | 8GB (have 10GB) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q4 | Fits comfortably | 42.09 tok/s | 4GB (have 10GB) |
| meta-llama/Llama-2-7b-hf | Q8 | Fits comfortably | 30.84 tok/s | 7GB (have 10GB) |
| meta-llama/Llama-2-7b-hf | Q4 | Fits comfortably | 44.56 tok/s | 4GB (have 10GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | Fits comfortably | 29.08 tok/s | 7GB (have 10GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | Fits comfortably | 39.04 tok/s | 4GB (have 10GB) |
| microsoft/phi-2 | Q8 | Fits comfortably | 27.84 tok/s | 7GB (have 10GB) |
| microsoft/phi-2 | Q4 | Fits comfortably | 39.89 tok/s | 4GB (have 10GB) |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | Not supported | — | 70GB (have 10GB) |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | Not supported | — | 35GB (have 10GB) |
| Qwen/Qwen2.5-0.5B | Q8 | Fits comfortably | 31.19 tok/s | 5GB (have 10GB) |
| Qwen/Qwen2.5-0.5B | Q4 | Fits comfortably | 48.69 tok/s | 3GB (have 10GB) |
| Qwen/Qwen3-14B | Q8 | Not supported | — | 14GB (have 10GB) |
| Qwen/Qwen3-14B | Q4 | Fits comfortably | 30.92 tok/s | 7GB (have 10GB) |
| Qwen/Qwen3-Embedding-8B | Q8 | Fits comfortably | 26.93 tok/s | 8GB (have 10GB) |
| Qwen/Qwen3-Embedding-8B | Q4 | Fits comfortably | 43.70 tok/s | 4GB (have 10GB) |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | Not supported | — | 70GB (have 10GB) |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | Not supported | — | 35GB (have 10GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q8 | Fits comfortably | 30.26 tok/s | 8GB (have 10GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q4 | Fits comfortably | 36.95 tok/s | 4GB (have 10GB) |
| Qwen/Qwen2.5-14B-Instruct | Q8 | Not supported | — | 14GB (have 10GB) |
| Qwen/Qwen2.5-14B-Instruct | Q4 | Fits comfortably | 33.92 tok/s | 7GB (have 10GB) |
| Qwen/Qwen2.5-1.5B | Q8 | Fits comfortably | 34.76 tok/s | 5GB (have 10GB) |
| Qwen/Qwen2.5-1.5B | Q4 | Fits comfortably | 50.18 tok/s | 3GB (have 10GB) |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | Fits comfortably | 39.44 tok/s | 4GB (have 10GB) |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q4 | Fits comfortably | 52.79 tok/sEstimated | 2GB (have 10GB) |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q8 | Not supported | — | 20GB (have 10GB) |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q4 | Fits (tight) | 27.83 tok/sEstimated | 10GB (have 10GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q8 | Fits comfortably | 32.05 tok/sEstimated | 5GB (have 10GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q4 | Fits comfortably | 49.28 tok/sEstimated | 3GB (have 10GB) |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | Fits comfortably | 27.11 tok/sEstimated | 8GB (have 10GB) |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | Fits comfortably | 42.06 tok/sEstimated | 4GB (have 10GB) |
| Qwen/Qwen3-Reranker-0.6B | Q8 | Fits comfortably | 29.31 tok/sEstimated | 6GB (have 10GB) |
| Qwen/Qwen3-Reranker-0.6B | Q4 | Fits comfortably | 42.38 tok/sEstimated | 3GB (have 10GB) |
| rednote-hilab/dots.ocr | Q8 | Fits comfortably | 27.15 tok/sEstimated | 7GB (have 10GB) |
| rednote-hilab/dots.ocr | Q4 | Fits comfortably | 42.75 tok/sEstimated | 4GB (have 10GB) |
| google-t5/t5-3b | Q8 | Fits comfortably | 36.88 tok/sEstimated | 3GB (have 10GB) |
| google-t5/t5-3b | Q4 | Fits comfortably | 61.32 tok/sEstimated | 2GB (have 10GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q8 | Not supported | — | 30GB (have 10GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q4 | Not supported | — | 15GB (have 10GB) |
| Qwen/Qwen3-4B | Q8 | Fits comfortably | 33.59 tok/sEstimated | 4GB (have 10GB) |
| Qwen/Qwen3-4B | Q4 | Fits comfortably | 49.13 tok/sEstimated | 2GB (have 10GB) |
| Qwen/Qwen3-1.7B | Q8 | Fits comfortably | 31.15 tok/sEstimated | 7GB (have 10GB) |
| Qwen/Qwen3-1.7B | Q4 | Fits comfortably | 41.88 tok/sEstimated | 4GB (have 10GB) |
| openai-community/gpt2-large | Q8 | Fits comfortably | 32.21 tok/sEstimated | 7GB (have 10GB) |
| openai-community/gpt2-large | Q4 | Fits comfortably | 46.59 tok/sEstimated | 4GB (have 10GB) |
| microsoft/Phi-3-mini-4k-instruct | Q8 | Fits comfortably | 29.92 tok/sEstimated | 7GB (have 10GB) |
| microsoft/Phi-3-mini-4k-instruct | Q4 | Fits comfortably | 40.22 tok/sEstimated | 4GB (have 10GB) |
| allenai/OLMo-2-0425-1B | Q8 | Fits comfortably | 55.70 tok/sEstimated | 1GB (have 10GB) |
| allenai/OLMo-2-0425-1B | Q4 | Fits comfortably | 90.42 tok/sEstimated | 1GB (have 10GB) |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q8 | Not supported | — | 80GB (have 10GB) |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q4 | Not supported | — | 40GB (have 10GB) |
| Qwen/Qwen3-32B | Q8 | Not supported | — | 32GB (have 10GB) |
| Qwen/Qwen3-32B | Q4 | Not supported | — | 16GB (have 10GB) |
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | Fits comfortably | 34.00 tok/sEstimated | 5GB (have 10GB) |
| Qwen/Qwen2.5-0.5B-Instruct | Q4 | Fits comfortably | 51.48 tok/sEstimated | 3GB (have 10GB) |
| Qwen/Qwen2.5-7B | Q8 | Fits comfortably | 32.79 tok/sEstimated | 7GB (have 10GB) |
| Qwen/Qwen2.5-7B | Q4 | Fits comfortably | 43.99 tok/sEstimated | 4GB (have 10GB) |
| meta-llama/Meta-Llama-3-8B | Q8 | Fits comfortably | 25.59 tok/sEstimated | 8GB (have 10GB) |
| meta-llama/Meta-Llama-3-8B | Q4 | Fits comfortably | 42.32 tok/sEstimated | 4GB (have 10GB) |
| meta-llama/Llama-3.2-1B | Q8 | Fits comfortably | 63.22 tok/sEstimated | 1GB (have 10GB) |
| meta-llama/Llama-3.2-1B | Q4 | Fits comfortably | 84.59 tok/sEstimated | 1GB (have 10GB) |
| petals-team/StableBeluga2 | Q8 | Fits comfortably | 29.32 tok/sEstimated | 7GB (have 10GB) |
| petals-team/StableBeluga2 | Q4 | Fits comfortably | 45.66 tok/sEstimated | 4GB (have 10GB) |
| vikhyatk/moondream2 | Q8 | Fits comfortably | 30.94 tok/sEstimated | 7GB (have 10GB) |
| vikhyatk/moondream2 | Q4 | Fits comfortably | 45.88 tok/sEstimated | 4GB (have 10GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 41.40 tok/sEstimated | 3GB (have 10GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | Fits comfortably | 58.32 tok/sEstimated | 2GB (have 10GB) |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q8 | Not supported | — | 70GB (have 10GB) |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q4 | Not supported | — | 35GB (have 10GB) |
| distilbert/distilgpt2 | Q8 | Fits comfortably | 27.29 tok/sEstimated | 7GB (have 10GB) |
| distilbert/distilgpt2 | Q4 | Fits comfortably | 40.79 tok/sEstimated | 4GB (have 10GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q8 | Not supported | — | 32GB (have 10GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q4 | Not supported | — | 16GB (have 10GB) |
| inference-net/Schematron-3B | Q8 | Fits comfortably | 39.28 tok/sEstimated | 3GB (have 10GB) |
| inference-net/Schematron-3B | Q4 | Fits comfortably | 52.73 tok/sEstimated | 2GB (have 10GB) |
| Qwen/Qwen3-8B | Q8 | Fits comfortably | 30.26 tok/sEstimated | 8GB (have 10GB) |
| Qwen/Qwen3-8B | Q4 | Fits comfortably | 41.73 tok/sEstimated | 4GB (have 10GB) |
| mistralai/Mistral-7B-Instruct-v0.2 | Q8 | Fits comfortably | 31.96 tok/sEstimated | 7GB (have 10GB) |
| mistralai/Mistral-7B-Instruct-v0.2 | Q4 | Fits comfortably | 41.45 tok/sEstimated | 4GB (have 10GB) |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q8 | Fits comfortably | 38.32 tok/sEstimated | 3GB (have 10GB) |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | Fits comfortably | 57.87 tok/sEstimated | 2GB (have 10GB) |
| bigscience/bloomz-560m | Q8 | Fits comfortably | 27.26 tok/sEstimated | 7GB (have 10GB) |
| bigscience/bloomz-560m | Q4 | Fits comfortably | 42.40 tok/sEstimated | 4GB (have 10GB) |
| Qwen/Qwen2.5-3B-Instruct | Q8 | Fits comfortably | 43.34 tok/sEstimated | 3GB (have 10GB) |
| Qwen/Qwen2.5-3B-Instruct | Q4 | Fits comfortably | 56.44 tok/sEstimated | 2GB (have 10GB) |
| openai/gpt-oss-120b | Q8 | Not supported | — | 120GB (have 10GB) |
| openai/gpt-oss-120b | Q4 | Not supported | — | 60GB (have 10GB) |
| meta-llama/Llama-3.2-1B-Instruct | Q8 | Fits comfortably | 58.52 tok/sEstimated | 1GB (have 10GB) |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | Fits comfortably | 82.64 tok/sEstimated | 1GB (have 10GB) |
| Qwen/Qwen3-4B-Instruct-2507 | Q8 | Fits comfortably | 33.04 tok/sEstimated | 4GB (have 10GB) |
| Qwen/Qwen3-4B-Instruct-2507 | Q4 | Fits comfortably | 51.72 tok/sEstimated | 2GB (have 10GB) |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | Fits comfortably | 30.36 tok/sEstimated | 7GB (have 10GB) |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q4 | Fits comfortably | 45.59 tok/sEstimated | 4GB (have 10GB) |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | Fits comfortably | 56.05 tok/sEstimated | 1GB (have 10GB) |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | Fits comfortably | 89.56 tok/sEstimated | 1GB (have 10GB) |
| facebook/opt-125m | Q8 | Fits comfortably | 29.88 tok/sEstimated | 7GB (have 10GB) |
| facebook/opt-125m | Q4 | Fits comfortably | 44.04 tok/sEstimated | 4GB (have 10GB) |
| Qwen/Qwen2.5-1.5B-Instruct | Q8 | Fits comfortably | 30.52 tok/sEstimated | 5GB (have 10GB) |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | Fits comfortably | 44.85 tok/sEstimated | 3GB (have 10GB) |
| Qwen/Qwen3-Embedding-0.6B | Q8 | Fits comfortably | 28.46 tok/sEstimated | 6GB (have 10GB) |
| Qwen/Qwen3-Embedding-0.6B | Q4 | Fits comfortably | 44.15 tok/sEstimated | 3GB (have 10GB) |
| google/gemma-3-1b-it | Q8 | Fits comfortably | 54.36 tok/sEstimated | 1GB (have 10GB) |
| google/gemma-3-1b-it | Q4 | Fits comfortably | 82.79 tok/sEstimated | 1GB (have 10GB) |
| openai/gpt-oss-20b | Q8 | Not supported | — | 20GB (have 10GB) |
| openai/gpt-oss-20b | Q4 | Fits (tight) | 29.80 tok/sEstimated | 10GB (have 10GB) |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q8 | Not supported | — | 34GB (have 10GB) |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q4 | Not supported | — | 17GB (have 10GB) |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 28.77 tok/sEstimated | 8GB (have 10GB) |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 43.36 tok/sEstimated | 4GB (have 10GB) |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q8 | Not supported | — | 80GB (have 10GB) |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q4 | Not supported | — | 40GB (have 10GB) |
| ai-forever/ruGPT-3.5-13B | Q8 | Not supported | — | 13GB (have 10GB) |
| ai-forever/ruGPT-3.5-13B | Q4 | Fits comfortably | 35.25 tok/sEstimated | 7GB (have 10GB) |
| baichuan-inc/Baichuan-M2-32B | Q8 | Not supported | — | 32GB (have 10GB) |
| baichuan-inc/Baichuan-M2-32B | Q4 | Not supported | — | 16GB (have 10GB) |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q8 | Fits comfortably | 28.44 tok/sEstimated | 7GB (have 10GB) |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 41.72 tok/sEstimated | 4GB (have 10GB) |
| ibm-granite/granite-3.3-8b-instruct | Q8 | Fits comfortably | 27.19 tok/sEstimated | 8GB (have 10GB) |
| ibm-granite/granite-3.3-8b-instruct | Q4 | Fits comfortably | 42.68 tok/sEstimated | 4GB (have 10GB) |
| Qwen/Qwen3-1.7B-Base | Q8 | Fits comfortably | 27.49 tok/sEstimated | 7GB (have 10GB) |
| Qwen/Qwen3-1.7B-Base | Q4 | Fits comfortably | 45.52 tok/sEstimated | 4GB (have 10GB) |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q8 | Not supported | — | 20GB (have 10GB) |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q4 | Fits (tight) | 25.59 tok/sEstimated | 10GB (have 10GB) |
| BSC-LT/salamandraTA-7b-instruct | Q8 | Fits comfortably | 29.92 tok/sEstimated | 7GB (have 10GB) |
| BSC-LT/salamandraTA-7b-instruct | Q4 | Fits comfortably | 42.39 tok/sEstimated | 4GB (have 10GB) |
| dicta-il/dictalm2.0-instruct | Q8 | Fits comfortably | 32.69 tok/sEstimated | 7GB (have 10GB) |
| dicta-il/dictalm2.0-instruct | Q4 | Fits comfortably | 41.95 tok/sEstimated | 4GB (have 10GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q8 | Not supported | — | 30GB (have 10GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q4 | Not supported | — | 15GB (have 10GB) |
| GSAI-ML/LLaDA-8B-Base | Q8 | Fits comfortably | 25.81 tok/sEstimated | 8GB (have 10GB) |
| GSAI-ML/LLaDA-8B-Base | Q4 | Fits comfortably | 37.62 tok/sEstimated | 4GB (have 10GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q8 | Not supported | — | 30GB (have 10GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q4 | Not supported | — | 15GB (have 10GB) |
| Qwen/Qwen2-0.5B-Instruct | Q8 | Fits comfortably | 36.69 tok/sEstimated | 5GB (have 10GB) |
| Qwen/Qwen2-0.5B-Instruct | Q4 | Fits comfortably | 52.18 tok/sEstimated | 3GB (have 10GB) |
| deepseek-ai/DeepSeek-V3 | Q8 | Fits comfortably | 27.29 tok/sEstimated | 7GB (have 10GB) |
| deepseek-ai/DeepSeek-V3 | Q4 | Fits comfortably | 44.77 tok/sEstimated | 4GB (have 10GB) |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q8 | Fits comfortably | 33.92 tok/sEstimated | 5GB (have 10GB) |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | Fits comfortably | 45.39 tok/sEstimated | 3GB (have 10GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q8 | Not supported | — | 30GB (have 10GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q4 | Not supported | — | 15GB (have 10GB) |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q8 | Not supported | — | 30GB (have 10GB) |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q4 | Not supported | — | 15GB (have 10GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q8 | Not supported | — | 30GB (have 10GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q4 | Not supported | — | 15GB (have 10GB) |
| AI-MO/Kimina-Prover-72B | Q8 | Not supported | — | 72GB (have 10GB) |
| AI-MO/Kimina-Prover-72B | Q4 | Not supported | — | 36GB (have 10GB) |
| apple/OpenELM-1_1B-Instruct | Q8 | Fits comfortably | 65.86 tok/sEstimated | 1GB (have 10GB) |
| apple/OpenELM-1_1B-Instruct | Q4 | Fits comfortably | 80.70 tok/sEstimated | 1GB (have 10GB) |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 28.09 tok/sEstimated | 8GB (have 10GB) |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 41.54 tok/sEstimated | 4GB (have 10GB) |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q8 | Fits (tight) | 24.98 tok/sEstimated | 9GB (have 10GB) |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q4 | Fits comfortably | 39.29 tok/sEstimated | 5GB (have 10GB) |
| Qwen/Qwen2.5-3B | Q8 | Fits comfortably | 38.52 tok/sEstimated | 3GB (have 10GB) |
| Qwen/Qwen2.5-3B | Q4 | Fits comfortably | 55.34 tok/sEstimated | 2GB (have 10GB) |
| lmsys/vicuna-7b-v1.5 | Q8 | Fits comfortably | 27.03 tok/sEstimated | 7GB (have 10GB) |
| lmsys/vicuna-7b-v1.5 | Q4 | Fits comfortably | 46.19 tok/sEstimated | 4GB (have 10GB) |
| meta-llama/Llama-2-13b-chat-hf | Q8 | Not supported | — | 13GB (have 10GB) |
| meta-llama/Llama-2-13b-chat-hf | Q4 | Fits comfortably | 30.76 tok/sEstimated | 7GB (have 10GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q8 | Not supported | — | 80GB (have 10GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q4 | Not supported | — | 40GB (have 10GB) |
| unsloth/gemma-3-1b-it | Q8 | Fits comfortably | 64.94 tok/sEstimated | 1GB (have 10GB) |
| unsloth/gemma-3-1b-it | Q4 | Fits comfortably | 86.19 tok/sEstimated | 1GB (have 10GB) |
| bigcode/starcoder2-3b | Q8 | Fits comfortably | 40.13 tok/sEstimated | 3GB (have 10GB) |
| bigcode/starcoder2-3b | Q4 | Fits comfortably | 55.55 tok/sEstimated | 2GB (have 10GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q8 | Not supported | — | 80GB (have 10GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q4 | Not supported | — | 40GB (have 10GB) |
| ibm-granite/granite-docling-258M | Q8 | Fits comfortably | 31.48 tok/sEstimated | 7GB (have 10GB) |
| ibm-granite/granite-docling-258M | Q4 | Fits comfortably | 41.28 tok/sEstimated | 4GB (have 10GB) |
| skt/kogpt2-base-v2 | Q8 | Fits comfortably | 30.91 tok/sEstimated | 7GB (have 10GB) |
| skt/kogpt2-base-v2 | Q4 | Fits comfortably | 40.55 tok/sEstimated | 4GB (have 10GB) |
| google/gemma-3-270m-it | Q8 | Fits comfortably | 27.53 tok/sEstimated | 7GB (have 10GB) |
| google/gemma-3-270m-it | Q4 | Fits comfortably | 38.70 tok/sEstimated | 4GB (have 10GB) |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q8 | Fits comfortably | 40.00 tok/sEstimated | 4GB (have 10GB) |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | Fits comfortably | 53.09 tok/sEstimated | 2GB (have 10GB) |
| Qwen/Qwen2.5-32B | Q8 | Not supported | — | 32GB (have 10GB) |
| Qwen/Qwen2.5-32B | Q4 | Not supported | — | 16GB (have 10GB) |
| parler-tts/parler-tts-large-v1 | Q8 | Fits comfortably | 31.29 tok/sEstimated | 7GB (have 10GB) |
| parler-tts/parler-tts-large-v1 | Q4 | Fits comfortably | 44.48 tok/sEstimated | 4GB (have 10GB) |
| EleutherAI/pythia-70m-deduped | Q8 | Fits comfortably | 32.34 tok/sEstimated | 7GB (have 10GB) |
| EleutherAI/pythia-70m-deduped | Q4 | Fits comfortably | 40.15 tok/sEstimated | 4GB (have 10GB) |
| microsoft/VibeVoice-1.5B | Q8 | Fits comfortably | 31.26 tok/sEstimated | 5GB (have 10GB) |
| microsoft/VibeVoice-1.5B | Q4 | Fits comfortably | 49.45 tok/sEstimated | 3GB (have 10GB) |
| ibm-granite/granite-3.3-2b-instruct | Q8 | Fits comfortably | 46.33 tok/sEstimated | 2GB (have 10GB) |
| ibm-granite/granite-3.3-2b-instruct | Q4 | Fits comfortably | 68.60 tok/sEstimated | 1GB (have 10GB) |
| Qwen/Qwen2.5-72B-Instruct | Q8 | Not supported | — | 72GB (have 10GB) |
| Qwen/Qwen2.5-72B-Instruct | Q4 | Not supported | — | 36GB (have 10GB) |
| liuhaotian/llava-v1.5-7b | Q8 | Fits comfortably | 26.95 tok/sEstimated | 7GB (have 10GB) |
| liuhaotian/llava-v1.5-7b | Q4 | Fits comfortably | 40.96 tok/sEstimated | 4GB (have 10GB) |
| google/gemma-2b | Q8 | Fits comfortably | 42.79 tok/sEstimated | 2GB (have 10GB) |
| google/gemma-2b | Q4 | Fits comfortably | 68.47 tok/sEstimated | 1GB (have 10GB) |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q8 | Fits comfortably | 31.17 tok/sEstimated | 7GB (have 10GB) |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q4 | Fits comfortably | 41.44 tok/sEstimated | 4GB (have 10GB) |
| Qwen/Qwen3-235B-A22B | Q8 | Not supported | — | 235GB (have 10GB) |
| Qwen/Qwen3-235B-A22B | Q4 | Not supported | — | 118GB (have 10GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q8 | Fits comfortably | 30.35 tok/sEstimated | 8GB (have 10GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q4 | Fits comfortably | 40.05 tok/sEstimated | 4GB (have 10GB) |
| microsoft/Phi-4-mini-instruct | Q8 | Fits comfortably | 31.78 tok/sEstimated | 7GB (have 10GB) |
| microsoft/Phi-4-mini-instruct | Q4 | Fits comfortably | 42.90 tok/sEstimated | 4GB (have 10GB) |
| llamafactory/tiny-random-Llama-3 | Q8 | Fits comfortably | 32.54 tok/sEstimated | 7GB (have 10GB) |
| llamafactory/tiny-random-Llama-3 | Q4 | Fits comfortably | 44.20 tok/sEstimated | 4GB (have 10GB) |
| HuggingFaceH4/zephyr-7b-beta | Q8 | Fits comfortably | 29.05 tok/sEstimated | 7GB (have 10GB) |
| HuggingFaceH4/zephyr-7b-beta | Q4 | Fits comfortably | 39.62 tok/sEstimated | 4GB (have 10GB) |
| Qwen/Qwen3-4B-Thinking-2507 | Q8 | Fits comfortably | 38.78 tok/sEstimated | 4GB (have 10GB) |
| Qwen/Qwen3-4B-Thinking-2507 | Q4 | Fits comfortably | 53.47 tok/sEstimated | 2GB (have 10GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q8 | Not supported | — | 30GB (have 10GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q4 | Not supported | — | 15GB (have 10GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q8 | Fits comfortably | 27.62 tok/sEstimated | 8GB (have 10GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q4 | Fits comfortably | 43.69 tok/sEstimated | 4GB (have 10GB) |
| unsloth/Llama-3.2-1B-Instruct | Q8 | Fits comfortably | 64.77 tok/sEstimated | 1GB (have 10GB) |
| unsloth/Llama-3.2-1B-Instruct | Q4 | Fits comfortably | 84.89 tok/sEstimated | 1GB (have 10GB) |
| GSAI-ML/LLaDA-8B-Instruct | Q8 | Fits comfortably | 25.95 tok/sEstimated | 8GB (have 10GB) |
| GSAI-ML/LLaDA-8B-Instruct | Q4 | Fits comfortably | 40.78 tok/sEstimated | 4GB (have 10GB) |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q8 | Not supported | — | 90GB (have 10GB) |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q4 | Not supported | — | 45GB (have 10GB) |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q8 | Fits comfortably | 29.33 tok/sEstimated | 7GB (have 10GB) |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q4 | Fits comfortably | 38.82 tok/sEstimated | 4GB (have 10GB) |
| numind/NuExtract-1.5 | Q8 | Fits comfortably | 31.76 tok/sEstimated | 7GB (have 10GB) |
| numind/NuExtract-1.5 | Q4 | Fits comfortably | 39.36 tok/sEstimated | 4GB (have 10GB) |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q8 | Fits comfortably | 30.87 tok/sEstimated | 7GB (have 10GB) |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q4 | Fits comfortably | 43.22 tok/sEstimated | 4GB (have 10GB) |
| hmellor/tiny-random-LlamaForCausalLM | Q8 | Fits comfortably | 29.27 tok/sEstimated | 7GB (have 10GB) |
| hmellor/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 44.71 tok/sEstimated | 4GB (have 10GB) |
| huggyllama/llama-7b | Q8 | Fits comfortably | 27.48 tok/sEstimated | 7GB (have 10GB) |
| huggyllama/llama-7b | Q4 | Fits comfortably | 38.54 tok/sEstimated | 4GB (have 10GB) |
| deepseek-ai/DeepSeek-V3-0324 | Q8 | Fits comfortably | 27.57 tok/sEstimated | 7GB (have 10GB) |
| deepseek-ai/DeepSeek-V3-0324 | Q4 | Fits comfortably | 46.44 tok/sEstimated | 4GB (have 10GB) |
| microsoft/Phi-3-mini-128k-instruct | Q8 | Fits comfortably | 26.85 tok/sEstimated | 7GB (have 10GB) |
| microsoft/Phi-3-mini-128k-instruct | Q4 | Fits comfortably | 39.11 tok/sEstimated | 4GB (have 10GB) |
| sshleifer/tiny-gpt2 | Q8 | Fits comfortably | 27.61 tok/sEstimated | 7GB (have 10GB) |
| sshleifer/tiny-gpt2 | Q4 | Fits comfortably | 45.16 tok/sEstimated | 4GB (have 10GB) |
| meta-llama/Llama-Guard-3-8B | Q8 | Fits comfortably | 30.46 tok/sEstimated | 8GB (have 10GB) |
| meta-llama/Llama-Guard-3-8B | Q4 | Fits comfortably | 44.11 tok/sEstimated | 4GB (have 10GB) |
| openai-community/gpt2-xl | Q8 | Fits comfortably | 31.64 tok/sEstimated | 7GB (have 10GB) |
| openai-community/gpt2-xl | Q4 | Fits comfortably | 46.05 tok/sEstimated | 4GB (have 10GB) |
| OpenPipe/Qwen3-14B-Instruct | Q8 | Not supported | — | 14GB (have 10GB) |
| OpenPipe/Qwen3-14B-Instruct | Q4 | Fits comfortably | 33.15 tok/sEstimated | 7GB (have 10GB) |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q8 | Not supported | — | 70GB (have 10GB) |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q4 | Not supported | — | 35GB (have 10GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q8 | Fits comfortably | 36.49 tok/sEstimated | 4GB (have 10GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q4 | Fits comfortably | 50.79 tok/sEstimated | 2GB (have 10GB) |
| ibm-research/PowerMoE-3b | Q8 | Fits comfortably | 43.73 tok/sEstimated | 3GB (have 10GB) |
| ibm-research/PowerMoE-3b | Q4 | Fits comfortably | 63.55 tok/sEstimated | 2GB (have 10GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q8 | Fits comfortably | 37.98 tok/sEstimated | 4GB (have 10GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q4 | Fits comfortably | 56.23 tok/sEstimated | 2GB (have 10GB) |
| unsloth/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 43.37 tok/sEstimated | 3GB (have 10GB) |
| unsloth/Llama-3.2-3B-Instruct | Q4 | Fits comfortably | 57.19 tok/sEstimated | 2GB (have 10GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q8 | Fits comfortably | 39.39 tok/sEstimated | 4GB (have 10GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q4 | Fits comfortably | 55.16 tok/sEstimated | 2GB (have 10GB) |
| meta-llama/Llama-3.2-3B | Q8 | Fits comfortably | 43.99 tok/sEstimated | 3GB (have 10GB) |
| meta-llama/Llama-3.2-3B | Q4 | Fits comfortably | 56.36 tok/sEstimated | 2GB (have 10GB) |
| EleutherAI/gpt-neo-125m | Q8 | Fits comfortably | 27.62 tok/sEstimated | 7GB (have 10GB) |
| EleutherAI/gpt-neo-125m | Q4 | Fits comfortably | 42.88 tok/sEstimated | 4GB (have 10GB) |
| codellama/CodeLlama-34b-hf | Q8 | Not supported | — | 34GB (have 10GB) |
| codellama/CodeLlama-34b-hf | Q4 | Not supported | — | 17GB (have 10GB) |
| meta-llama/Llama-Guard-3-1B | Q8 | Fits comfortably | 55.13 tok/sEstimated | 1GB (have 10GB) |
| meta-llama/Llama-Guard-3-1B | Q4 | Fits comfortably | 84.47 tok/sEstimated | 1GB (have 10GB) |
| Qwen/Qwen2-1.5B-Instruct | Q8 | Fits comfortably | 32.76 tok/sEstimated | 5GB (have 10GB) |
| Qwen/Qwen2-1.5B-Instruct | Q4 | Fits comfortably | 50.94 tok/sEstimated | 3GB (have 10GB) |
| google/gemma-2-2b-it | Q8 | Fits comfortably | 45.04 tok/sEstimated | 2GB (have 10GB) |
| google/gemma-2-2b-it | Q4 | Fits comfortably | 70.61 tok/sEstimated | 1GB (have 10GB) |
| Qwen/Qwen2.5-14B | Q8 | Not supported | — | 14GB (have 10GB) |
| Qwen/Qwen2.5-14B | Q4 | Fits comfortably | 35.76 tok/sEstimated | 7GB (have 10GB) |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q8 | Not supported | — | 32GB (have 10GB) |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q4 | Not supported | — | 16GB (have 10GB) |
| microsoft/Phi-3.5-mini-instruct | Q8 | Fits comfortably | 31.60 tok/sEstimated | 7GB (have 10GB) |
| microsoft/Phi-3.5-mini-instruct | Q4 | Fits comfortably | 41.80 tok/sEstimated | 4GB (have 10GB) |
| Qwen/Qwen3-4B-Base | Q8 | Fits comfortably | 39.68 tok/sEstimated | 4GB (have 10GB) |
| Qwen/Qwen3-4B-Base | Q4 | Fits comfortably | 55.70 tok/sEstimated | 2GB (have 10GB) |
| Qwen/Qwen2-7B-Instruct | Q8 | Fits comfortably | 29.35 tok/sEstimated | 7GB (have 10GB) |
| Qwen/Qwen2-7B-Instruct | Q4 | Fits comfortably | 46.72 tok/sEstimated | 4GB (have 10GB) |
| meta-llama/Llama-2-7b-chat-hf | Q8 | Fits comfortably | 31.49 tok/sEstimated | 7GB (have 10GB) |
| meta-llama/Llama-2-7b-chat-hf | Q4 | Fits comfortably | 43.86 tok/sEstimated | 4GB (have 10GB) |
| Qwen/Qwen3-14B-Base | Q8 | Not supported | — | 14GB (have 10GB) |
| Qwen/Qwen3-14B-Base | Q4 | Fits comfortably | 30.98 tok/sEstimated | 7GB (have 10GB) |
| swiss-ai/Apertus-8B-Instruct-2509 | Q8 | Fits comfortably | 26.11 tok/sEstimated | 8GB (have 10GB) |
| swiss-ai/Apertus-8B-Instruct-2509 | Q4 | Fits comfortably | 37.11 tok/sEstimated | 4GB (have 10GB) |
| microsoft/Phi-3.5-vision-instruct | Q8 | Fits comfortably | 27.61 tok/sEstimated | 7GB (have 10GB) |
| microsoft/Phi-3.5-vision-instruct | Q4 | Fits comfortably | 46.43 tok/sEstimated | 4GB (have 10GB) |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q8 | Fits comfortably | 30.34 tok/sEstimated | 7GB (have 10GB) |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q4 | Fits comfortably | 39.08 tok/sEstimated | 4GB (have 10GB) |
| rinna/japanese-gpt-neox-small | Q8 | Fits comfortably | 31.90 tok/sEstimated | 7GB (have 10GB) |
| rinna/japanese-gpt-neox-small | Q4 | Fits comfortably | 39.80 tok/sEstimated | 4GB (have 10GB) |
| Qwen/Qwen2.5-Coder-1.5B | Q8 | Fits comfortably | 34.34 tok/sEstimated | 5GB (have 10GB) |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | Fits comfortably | 52.52 tok/sEstimated | 3GB (have 10GB) |
| IlyaGusev/saiga_llama3_8b | Q8 | Fits comfortably | 26.54 tok/sEstimated | 8GB (have 10GB) |
| IlyaGusev/saiga_llama3_8b | Q4 | Fits comfortably | 42.91 tok/sEstimated | 4GB (have 10GB) |
| Qwen/Qwen3-30B-A3B | Q8 | Not supported | — | 30GB (have 10GB) |
| Qwen/Qwen3-30B-A3B | Q4 | Not supported | — | 15GB (have 10GB) |
| deepseek-ai/DeepSeek-R1 | Q8 | Fits comfortably | 28.53 tok/sEstimated | 7GB (have 10GB) |
| deepseek-ai/DeepSeek-R1 | Q4 | Fits comfortably | 46.14 tok/sEstimated | 4GB (have 10GB) |
| microsoft/DialoGPT-small | Q8 | Fits comfortably | 32.45 tok/sEstimated | 7GB (have 10GB) |
| microsoft/DialoGPT-small | Q4 | Fits comfortably | 45.72 tok/sEstimated | 4GB (have 10GB) |
| Qwen/Qwen3-8B-FP8 | Q8 | Fits comfortably | 30.50 tok/sEstimated | 8GB (have 10GB) |
| Qwen/Qwen3-8B-FP8 | Q4 | Fits comfortably | 40.67 tok/sEstimated | 4GB (have 10GB) |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q8 | Not supported | — | 30GB (have 10GB) |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q4 | Not supported | — | 15GB (have 10GB) |
| Qwen/Qwen3-Embedding-4B | Q8 | Fits comfortably | 36.82 tok/sEstimated | 4GB (have 10GB) |
| Qwen/Qwen3-Embedding-4B | Q4 | Fits comfortably | 52.00 tok/sEstimated | 2GB (have 10GB) |
| microsoft/Phi-4-multimodal-instruct | Q8 | Fits comfortably | 29.08 tok/sEstimated | 7GB (have 10GB) |
| microsoft/Phi-4-multimodal-instruct | Q4 | Fits comfortably | 43.50 tok/sEstimated | 4GB (have 10GB) |
| Qwen/Qwen3-8B-Base | Q8 | Fits comfortably | 28.12 tok/sEstimated | 8GB (have 10GB) |
Note: Performance estimates are calculated, not measured; real-world results may vary.
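The fit verdicts above appear to follow a simple sizing heuristic: roughly 1 GB of VRAM per billion parameters at Q8 and 0.5 GB at Q4, for the weights alone. Here is a minimal sketch of that heuristic in Python; the 90% "tight" threshold and the 1 GB floor are assumptions inferred from the table, not a published formula.

```python
# Rough VRAM-fit check mirroring the table's apparent heuristic:
# ~1 GB per billion parameters at Q8, ~0.5 GB at Q4 (weights only;
# KV cache and framework overhead add more in practice).
BYTES_PER_PARAM_GB = {"Q8": 1.0, "Q4": 0.5}

def fits(params_b: float, quant: str, vram_gb: float = 10.0) -> str:
    """Classify a model the way the table does, given an RTX 3080's 10 GB."""
    need = max(1.0, params_b * BYTES_PER_PARAM_GB[quant])
    if need > vram_gb:
        return "Not supported"
    # Within ~90% of total VRAM counts as a tight fit (assumed threshold).
    return "Fits (tight)" if need >= vram_gb * 0.9 else "Fits comfortably"

print(fits(8, "Q8"))   # Llama-3.1-8B at Q8 -> Fits comfortably
print(fits(20, "Q4"))  # gpt-oss-20b at Q4 -> Fits (tight)
print(fits(70, "Q4"))  # Llama-3.1-70B at Q4 -> Not supported
```

The three example calls reproduce the verdicts the table gives for those models, which suggests the heuristic is close even if the exact thresholds differ.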
Data-backed answers pulled from community benchmarks, manufacturer specs, and live pricing.
Owners running Qwen3-30B-A3B on a 10 GB RTX 3080 report roughly 15 tokens/sec after tuning, keeping interactive coding prompts responsive.
Source: Reddit – /r/LocalLLaMA (mquvxwc)
Some spec sheets assume higher VRAM ceilings, but real-world users report already reaching ~10 tok/s on a 10 GB 3080, showing that careful tuning matters more than blanket hardware requirements.
Source: Reddit – /r/LocalLLaMA (mj408ke)
With larger context windows, Ollama users report 40% of layers spilling to system RAM even on 12B models, which is why gpu_layers needs tuning on 10 GB cards.
Source: Reddit – /r/LocalLLaMA (mnspe0d)
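Picking a gpu_layers value comes down to dividing the VRAM left after the KV cache by the per-layer weight size. The sketch below is a back-of-envelope estimator; the layer count, weight size, and reserve figures are illustrative assumptions, not measured values for any specific model.

```python
# Estimate how many transformer layers fit on the GPU, reserving some
# VRAM for KV cache and activations. All inputs are rough assumptions.
def gpu_layers(model_gb: float, n_layers: int, vram_gb: float = 10.0,
               reserve_gb: float = 1.5) -> int:
    """Layers to offload: spend (vram - reserve) on equal-sized layers."""
    per_layer_gb = model_gb / n_layers
    budget_gb = max(0.0, vram_gb - reserve_gb)
    return min(n_layers, int(budget_gb / per_layer_gb))

# Hypothetical 12B model at Q4 (~7 GB of weights, 40 layers):
print(gpu_layers(7.0, 40))                   # small context -> all 40 on GPU
print(gpu_layers(7.0, 40, reserve_gb=4.0))   # big context -> 34, rest to RAM
```

Growing the context window grows the reserve, which is exactly the spill-to-RAM effect the note above describes; the computed value maps onto llama.cpp's `-ngl` flag or Ollama's `num_gpu` parameter.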
The RTX 3080 Founders Edition includes 10 GB of GDDR6X, a 320 W board power rating, a single 12-pin power connector (partner cards typically use dual or triple 8-pin), and NVIDIA recommends a 750 W PSU.
Source: TechPowerUp – RTX 3080 Specs
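That 320 W board power translates directly into running cost. A quick sketch of the arithmetic, assuming a $0.15/kWh electricity rate for illustration (rates vary widely by region):

```python
# Energy cost of sustained inference at the 3080's 320 W board power.
# The $0.15/kWh rate is an assumed example, not a quoted price.
def energy_cost(watts: float, hours: float, usd_per_kwh: float = 0.15) -> float:
    """Cost in USD: convert watts to kW, multiply by hours and rate."""
    return watts / 1000 * hours * usd_per_kwh

print(round(energy_cost(320, 8), 2))    # an 8-hour session -> $0.38
print(round(energy_cost(320, 720), 2))  # a full month at load -> $34.56
```

Idle and partial-load draw is far lower, so these figures are a worst-case ceiling for continuous full-load inference.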
Latest snapshot (3 Nov 2025): Amazon at $699 out of stock, Newegg at $729 in stock, Best Buy at $699 out of stock.
Source: Supabase price tracker snapshot – 2025-11-03
Explore how RTX 4060 Ti 16GB stacks up for local inference workloads.
Explore how RX 6800 XT stacks up for local inference workloads.
Explore how RTX 3090 stacks up for local inference workloads.