Quick Answer: The RTX 3070 offers 8GB of VRAM and starts around $429.49. It delivers an estimated 63 tokens/sec on unsloth/gemma-3-1b-it at Q4, and it typically draws 220W under load.
This GPU offers reliable throughput for local AI workloads. Pair it with the right model quantization to hit your desired tokens/sec, and monitor prices below to catch the best deal.
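Before scanning the table, it helps to know roughly where the VRAM column comes from: weight memory scales with parameter count times bits per weight, plus runtime overhead for the KV cache and buffers. A minimal sketch of that arithmetic, assuming a flat 20% overhead factor for illustration (not the site's exact formula):

```python
def estimate_vram_gb(params_billions: float, bits_per_weight: int, overhead: float = 1.2) -> float:
    """Rough VRAM estimate: weights take params * bits/8 bytes, plus ~20%
    for KV cache and runtime buffers. The overhead factor is an assumption."""
    weight_gb = params_billions * bits_per_weight / 8
    return weight_gb * overhead

# Example: a 7B model at Q4 (~4 bits/weight) vs Q8 (~8 bits/weight)
print(f"7B @ Q4: ~{estimate_vram_gb(7, 4):.1f} GB")  # ~4.2 GB -> fits easily in 8 GB
print(f"7B @ Q8: ~{estimate_vram_gb(7, 8):.1f} GB")  # ~8.4 GB -> tight or over on 8 GB
```

Those two examples line up with the pattern in the table below: 7B-class models land around 4GB at Q4 and push against the 3070's 8GB ceiling at Q8.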
| Model | Quantization | Tokens/sec (estimated) | VRAM used |
|---|---|---|---|
| unsloth/gemma-3-1b-it | Q4 | 63.30 tok/s | 1GB |
| unsloth/Llama-3.2-1B-Instruct | Q4 | 62.13 tok/s | 1GB |
| meta-llama/Llama-Guard-3-1B | Q4 | 60.22 tok/s | 1GB |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | 59.90 tok/s | 1GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | 59.38 tok/s | 1GB |
| meta-llama/Llama-3.2-1B | Q4 | 57.84 tok/s | 1GB |
| google/gemma-3-1b-it | Q4 | 57.33 tok/s | 1GB |
| allenai/OLMo-2-0425-1B | Q4 | 55.02 tok/s | 1GB |
| apple/OpenELM-1_1B-Instruct | Q4 | 54.16 tok/s | 1GB |
| LiquidAI/LFM2-1.2B | Q4 | 48.24 tok/s | 1GB |
| google/gemma-2-2b-it | Q4 | 46.15 tok/s | 1GB |
| ibm-granite/granite-3.3-2b-instruct | Q4 | 45.86 tok/s | 1GB |
| apple/OpenELM-1_1B-Instruct | Q8 | 44.15 tok/s | 1GB |
| meta-llama/Llama-3.2-3B | Q4 | 42.93 tok/s | 2GB |
| meta-llama/Llama-Guard-3-1B | Q8 | 41.91 tok/s | 1GB |
| unsloth/Llama-3.2-3B-Instruct | Q4 | 41.74 tok/s | 2GB |
| google/gemma-2b | Q4 | 41.70 tok/s | 1GB |
| meta-llama/Llama-3.2-1B-Instruct | Q8 | 41.70 tok/s | 1GB |
| inference-net/Schematron-3B | Q4 | 41.65 tok/s | 2GB |
| unsloth/gemma-3-1b-it | Q8 | 41.54 tok/s | 1GB |
| Qwen/Qwen2.5-3B-Instruct | Q4 | 41.49 tok/s | 2GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | 40.50 tok/s | 1GB |
| google-t5/t5-3b | Q4 | 39.95 tok/s | 2GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | 39.69 tok/s | 2GB |
| ibm-research/PowerMoE-3b | Q4 | 39.61 tok/s | 2GB |
| google/gemma-3-1b-it | Q8 | 39.36 tok/s | 1GB |
| Qwen/Qwen3-4B-Base | Q4 | 38.53 tok/s | 2GB |
| bigcode/starcoder2-3b | Q4 | 38.31 tok/s | 2GB |
| unsloth/Llama-3.2-1B-Instruct | Q8 | 38.13 tok/s | 1GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | 38.03 tok/s | 2GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | 37.97 tok/s | 2GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 37.93 tok/s | 2GB |
| allenai/OLMo-2-0425-1B | Q8 | 37.83 tok/s | 1GB |
| Qwen/Qwen2.5-3B | Q4 | 37.82 tok/s | 2GB |
| meta-llama/Llama-3.2-1B | Q8 | 37.08 tok/s | 1GB |
| Qwen/Qwen3-Embedding-4B | Q4 | 36.39 tok/s | 2GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | 35.83 tok/s | 3GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q4 | 35.77 tok/s | 2GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q4 | 35.49 tok/s | 2GB |
| microsoft/VibeVoice-1.5B | Q4 | 34.98 tok/s | 3GB |
| Qwen/Qwen3-4B | Q4 | 34.80 tok/s | 2GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q4 | 34.71 tok/s | 3GB |
| ibm-granite/granite-3.3-2b-instruct | Q8 | 34.26 tok/s | 2GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q4 | 34.24 tok/s | 2GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q4 | 34.02 tok/s | 2GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q4 | 33.87 tok/s | 3GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q4 | 33.81 tok/s | 2GB |
| Qwen/Qwen3-0.6B | Q4 | 33.48 tok/s | 3GB |
| Qwen/Qwen3-Embedding-0.6B | Q4 | 33.04 tok/s | 3GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q4 | 32.82 tok/s | 2GB |
| Qwen/Qwen2-1.5B-Instruct | Q4 | 32.75 tok/s | 3GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q4 | 32.70 tok/s | 3GB |
| Qwen/Qwen3-0.6B-Base | Q4 | 32.68 tok/s | 3GB |
| Qwen/Qwen2.5-0.5B | Q4 | 32.51 tok/s | 3GB |
| google/gemma-2b | Q8 | 32.45 tok/s | 2GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | 32.12 tok/s | 3GB |
| Qwen/Qwen2-0.5B-Instruct | Q4 | 32.04 tok/s | 3GB |
| Qwen/Qwen2.5-Math-1.5B | Q4 | 31.91 tok/s | 3GB |
| parler-tts/parler-tts-large-v1 | Q4 | 31.59 tok/s | 4GB |
| Qwen/Qwen2-0.5B | Q4 | 31.48 tok/s | 3GB |
| huggyllama/llama-7b | Q4 | 31.44 tok/s | 4GB |
| liuhaotian/llava-v1.5-7b | Q4 | 31.39 tok/s | 4GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q4 | 31.30 tok/s | 4GB |
| HuggingFaceTB/SmolLM2-135M | Q4 | 31.29 tok/s | 4GB |
| bigscience/bloomz-560m | Q4 | 31.27 tok/s | 4GB |
| openai-community/gpt2-xl | Q4 | 31.19 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-0528 | Q4 | 31.15 tok/s | 4GB |
| Qwen/Qwen3-1.7B-Base | Q4 | 31.08 tok/s | 4GB |
| microsoft/phi-2 | Q4 | 31.02 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1 | Q4 | 30.78 tok/s | 4GB |
| BSC-LT/salamandraTA-7b-instruct | Q4 | 30.57 tok/s | 4GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q4 | 30.44 tok/s | 4GB |
| mistralai/Mistral-7B-v0.1 | Q4 | 30.36 tok/s | 4GB |
| llamafactory/tiny-random-Llama-3 | Q4 | 30.36 tok/s | 4GB |
| google/gemma-2-2b-it | Q8 | 30.35 tok/s | 2GB |
| distilbert/distilgpt2 | Q4 | 30.31 tok/s | 4GB |
| microsoft/Phi-3.5-vision-instruct | Q4 | 30.30 tok/s | 4GB |
| microsoft/phi-4 | Q4 | 30.27 tok/s | 4GB |
| LiquidAI/LFM2-1.2B | Q8 | 30.16 tok/s | 2GB |
| numind/NuExtract-1.5 | Q4 | 30.14 tok/s | 4GB |
| openai-community/gpt2-medium | Q4 | 30.02 tok/s | 4GB |
| Qwen/Qwen2.5-1.5B | Q4 | 30.00 tok/s | 3GB |
| skt/kogpt2-base-v2 | Q4 | 29.96 tok/s | 4GB |
| EleutherAI/pythia-70m-deduped | Q4 | 29.91 tok/s | 4GB |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | 29.90 tok/s | 3GB |
| Qwen/Qwen2.5-7B | Q4 | 29.86 tok/s | 4GB |
| zai-org/GLM-4.6-FP8 | Q4 | 29.69 tok/s | 4GB |
| vikhyatk/moondream2 | Q4 | 29.67 tok/s | 4GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q4 | 29.65 tok/s | 4GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 29.57 tok/s | 4GB |
| HuggingFaceH4/zephyr-7b-beta | Q4 | 29.31 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3 | Q4 | 29.27 tok/s | 4GB |
| HuggingFaceTB/SmolLM-135M | Q4 | 29.24 tok/s | 4GB |
| ibm-granite/granite-docling-258M | Q4 | 29.24 tok/s | 4GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q4 | 29.12 tok/s | 4GB |
| GSAI-ML/LLaDA-8B-Base | Q4 | 29.11 tok/s | 4GB |
| Qwen/Qwen3-Embedding-8B | Q4 | 29.09 tok/s | 4GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q4 | 29.08 tok/s | 4GB |
| lmsys/vicuna-7b-v1.5 | Q4 | 29.07 tok/s | 4GB |
| meta-llama/Llama-3.1-8B | Q4 | 29.05 tok/s | 4GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | 29.00 tok/s | 4GB |
| petals-team/StableBeluga2 | Q4 | 28.98 tok/s | 4GB |
| MiniMaxAI/MiniMax-M2 | Q4 | 28.96 tok/s | 4GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q4 | 28.79 tok/s | 4GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | 28.74 tok/s | 4GB |
| Qwen/Qwen3-8B | Q4 | 28.72 tok/s | 4GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q4 | 28.71 tok/s | 5GB |
| meta-llama/Llama-Guard-3-8B | Q4 | 28.68 tok/s | 4GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q4 | 28.65 tok/s | 4GB |
| microsoft/DialoGPT-medium | Q4 | 28.57 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3-0324 | Q4 | 28.45 tok/s | 4GB |
| EleutherAI/gpt-neo-125m | Q4 | 28.42 tok/s | 4GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q4 | 28.42 tok/s | 4GB |
| meta-llama/Llama-2-7b-hf | Q4 | 28.36 tok/s | 4GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | 28.32 tok/s | 3GB |
| rinna/japanese-gpt-neox-small | Q4 | 28.06 tok/s | 4GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q4 | 28.04 tok/s | 4GB |
| hmellor/tiny-random-LlamaForCausalLM | Q4 | 28.01 tok/s | 4GB |
| bigcode/starcoder2-3b | Q8 | 28.00 tok/s | 3GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q4 | 27.94 tok/s | 4GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q8 | 27.93 tok/s | 3GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q4 | 27.91 tok/s | 4GB |
| dicta-il/dictalm2.0-instruct | Q4 | 27.91 tok/s | 4GB |
| Qwen/Qwen2-7B-Instruct | Q4 | 27.88 tok/s | 4GB |
| microsoft/Phi-4-multimodal-instruct | Q4 | 27.87 tok/s | 4GB |
| google/gemma-3-270m-it | Q4 | 27.86 tok/s | 4GB |
| Qwen/Qwen3-Reranker-0.6B | Q4 | 27.85 tok/s | 3GB |
| sshleifer/tiny-gpt2 | Q4 | 27.58 tok/s | 4GB |
| microsoft/Phi-3-mini-128k-instruct | Q4 | 27.51 tok/s | 4GB |
| Qwen/Qwen2.5-3B | Q8 | 27.44 tok/s | 3GB |
| microsoft/DialoGPT-small | Q4 | 27.41 tok/s | 4GB |
| zai-org/GLM-4.5-Air | Q4 | 27.30 tok/s | 4GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | 27.19 tok/s | 3GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q8 | 27.16 tok/s | 4GB |
| unsloth/Llama-3.2-3B-Instruct | Q8 | 27.13 tok/s | 3GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q8 | 27.03 tok/s | 4GB |
| IlyaGusev/saiga_llama3_8b | Q4 | 27.00 tok/s | 4GB |
| Qwen/Qwen3-8B-Base | Q4 | 26.88 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q4 | 26.86 tok/s | 4GB |
| google-t5/t5-3b | Q8 | 26.85 tok/s | 3GB |
| meta-llama/Llama-2-7b-chat-hf | Q4 | 26.81 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | 26.79 tok/s | 4GB |
| Qwen/Qwen3-Embedding-4B | Q8 | 26.77 tok/s | 4GB |
| Qwen/Qwen3-1.7B | Q4 | 26.76 tok/s | 4GB |
| rednote-hilab/dots.ocr | Q4 | 26.68 tok/s | 4GB |
| inference-net/Schematron-3B | Q8 | 26.66 tok/s | 3GB |
| microsoft/Phi-3-mini-4k-instruct | Q4 | 26.53 tok/s | 4GB |
| openai-community/gpt2-large | Q4 | 26.44 tok/s | 4GB |
| microsoft/Phi-4-mini-instruct | Q4 | 26.43 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3.1 | Q4 | 26.41 tok/s | 4GB |
| facebook/opt-125m | Q4 | 26.24 tok/s | 4GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q4 | 26.23 tok/s | 4GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q4 | 26.22 tok/s | 4GB |
| ibm-research/PowerMoE-3b | Q8 | 26.17 tok/s | 3GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q8 | 26.14 tok/s | 4GB |
| meta-llama/Meta-Llama-3-8B | Q4 | 26.11 tok/s | 4GB |
| openai-community/gpt2 | Q4 | 25.99 tok/s | 4GB |
| Qwen/Qwen3-8B-FP8 | Q4 | 25.98 tok/s | 4GB |
| meta-llama/Llama-3.2-3B | Q8 | 25.93 tok/s | 3GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q8 | 25.65 tok/s | 4GB |
| Qwen/Qwen2.5-3B-Instruct | Q8 | 25.58 tok/s | 3GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | 25.53 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q8 | 25.40 tok/s | 4GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | 25.31 tok/s | 4GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q4 | 25.27 tok/s | 4GB |
| GSAI-ML/LLaDA-8B-Instruct | Q4 | 25.08 tok/s | 4GB |
| ibm-granite/granite-3.3-8b-instruct | Q4 | 24.93 tok/s | 4GB |
| Qwen/Qwen2-1.5B-Instruct | Q8 | 24.70 tok/s | 5GB |
| Qwen/Qwen3-4B-Base | Q8 | 24.45 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q8 | 24.07 tok/s | 4GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | 24.00 tok/s | 7GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | 23.91 tok/s | 4GB |
| Qwen/Qwen3-14B-Base | Q4 | 23.66 tok/s | 7GB |
| Qwen/Qwen2.5-Coder-1.5B | Q8 | 23.46 tok/s | 5GB |
| Qwen/Qwen2-0.5B-Instruct | Q8 | 23.40 tok/s | 5GB |
| microsoft/VibeVoice-1.5B | Q8 | 23.00 tok/s | 5GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | 22.91 tok/s | 5GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q8 | 22.90 tok/s | 5GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q8 | 22.84 tok/s | 5GB |
| Qwen/Qwen2-0.5B | Q8 | 22.56 tok/s | 5GB |
| Qwen/Qwen3-4B | Q8 | 22.33 tok/s | 4GB |
| Qwen/Qwen2.5-Math-1.5B | Q8 | 22.30 tok/s | 5GB |
| microsoft/Phi-4-multimodal-instruct | Q8 | 22.11 tok/s | 7GB |
| distilbert/distilgpt2 | Q8 | 22.08 tok/s | 7GB |
| ai-forever/ruGPT-3.5-13B | Q4 | 21.94 tok/s | 7GB |
| dicta-il/dictalm2.0-instruct | Q8 | 21.84 tok/s | 7GB |
| Qwen/Qwen3-Reranker-0.6B | Q8 | 21.78 tok/s | 6GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q8 | 21.72 tok/s | 7GB |
| microsoft/Phi-3-mini-4k-instruct | Q8 | 21.71 tok/s | 7GB |
| BSC-LT/salamandraTA-7b-instruct | Q8 | 21.69 tok/s | 7GB |
| zai-org/GLM-4.5-Air | Q8 | 21.55 tok/s | 7GB |
| zai-org/GLM-4.6-FP8 | Q8 | 21.49 tok/s | 7GB |
| huggyllama/llama-7b | Q8 | 21.47 tok/s | 7GB |
| facebook/opt-125m | Q8 | 21.47 tok/s | 7GB |
| microsoft/Phi-3-mini-128k-instruct | Q8 | 21.47 tok/s | 7GB |
| numind/NuExtract-1.5 | Q8 | 21.44 tok/s | 7GB |
| deepseek-ai/DeepSeek-V3 | Q8 | 21.39 tok/s | 7GB |
| Qwen/Qwen3-Embedding-0.6B | Q8 | 21.36 tok/s | 6GB |
| Qwen/Qwen2.5-0.5B | Q8 | 21.36 tok/s | 5GB |
| deepseek-ai/DeepSeek-V3.1 | Q8 | 21.31 tok/s | 7GB |
| Qwen/Qwen2-7B-Instruct | Q8 | 21.24 tok/s | 7GB |
| HuggingFaceTB/SmolLM2-135M | Q8 | 21.15 tok/s | 7GB |
| Qwen/Qwen3-1.7B-Base | Q8 | 21.15 tok/s | 7GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | 21.02 tok/s | 7GB |
| Qwen/Qwen2.5-14B | Q4 | 21.00 tok/s | 7GB |
| hmellor/tiny-random-LlamaForCausalLM | Q8 | 20.93 tok/s | 7GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q8 | 20.91 tok/s | 7GB |
| microsoft/Phi-4-mini-instruct | Q8 | 20.91 tok/s | 7GB |
| rinna/japanese-gpt-neox-small | Q8 | 20.90 tok/s | 7GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q8 | 20.90 tok/s | 5GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | 20.86 tok/s | 8GB |
| Qwen/Qwen2.5-1.5B | Q8 | 20.76 tok/s | 5GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | 20.73 tok/s | 7GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q8 | 20.70 tok/s | 8GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q8 | 20.68 tok/s | 7GB |
| Qwen/Qwen3-0.6B-Base | Q8 | 20.66 tok/s | 6GB |
| vikhyatk/moondream2 | Q8 | 20.66 tok/s | 7GB |
| meta-llama/Llama-2-13b-chat-hf | Q4 | 20.60 tok/s | 7GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q8 | 20.58 tok/s | 5GB |
| IlyaGusev/saiga_llama3_8b | Q8 | 20.53 tok/s | 8GB |
| parler-tts/parler-tts-large-v1 | Q8 | 20.51 tok/s | 7GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q8 | 20.33 tok/s | 8GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | 20.28 tok/s | 7GB |
| rednote-hilab/dots.ocr | Q8 | 20.23 tok/s | 7GB |
| meta-llama/Llama-2-7b-hf | Q8 | 20.18 tok/s | 7GB |
| sshleifer/tiny-gpt2 | Q8 | 20.14 tok/s | 7GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | 20.09 tok/s | 8GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | 20.05 tok/s | 7GB |
| openai-community/gpt2-medium | Q8 | 20.02 tok/s | 7GB |
| Qwen/Qwen3-14B | Q4 | 20.00 tok/s | 7GB |
| meta-llama/Meta-Llama-3-8B | Q8 | 19.99 tok/s | 8GB |
| OpenPipe/Qwen3-14B-Instruct | Q4 | 19.95 tok/s | 7GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q8 | 19.81 tok/s | 7GB |
| mistralai/Mistral-7B-v0.1 | Q8 | 19.75 tok/s | 7GB |
| meta-llama/Llama-3.1-8B | Q8 | 19.63 tok/s | 8GB |
| bigscience/bloomz-560m | Q8 | 19.63 tok/s | 7GB |
| Qwen/Qwen2.5-7B | Q8 | 19.60 tok/s | 7GB |
| skt/kogpt2-base-v2 | Q8 | 19.57 tok/s | 7GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q8 | 19.56 tok/s | 7GB |
| HuggingFaceH4/zephyr-7b-beta | Q8 | 19.52 tok/s | 7GB |
| Qwen/Qwen3-8B-Base | Q8 | 19.52 tok/s | 8GB |
| meta-llama/Llama-2-7b-chat-hf | Q8 | 19.49 tok/s | 7GB |
| HuggingFaceTB/SmolLM-135M | Q8 | 19.45 tok/s | 7GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q8 | 19.44 tok/s | 7GB |
| GSAI-ML/LLaDA-8B-Base | Q8 | 19.42 tok/s | 8GB |
| openai-community/gpt2-xl | Q8 | 19.38 tok/s | 7GB |
| Qwen/Qwen3-0.6B | Q8 | 19.37 tok/s | 6GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q8 | 19.33 tok/s | 8GB |
| microsoft/DialoGPT-small | Q8 | 19.28 tok/s | 7GB |
| microsoft/Phi-3.5-vision-instruct | Q8 | 19.25 tok/s | 7GB |
| petals-team/StableBeluga2 | Q8 | 19.18 tok/s | 7GB |
| google/gemma-3-270m-it | Q8 | 19.18 tok/s | 7GB |
| EleutherAI/pythia-70m-deduped | Q8 | 19.15 tok/s | 7GB |
| liuhaotian/llava-v1.5-7b | Q8 | 19.12 tok/s | 7GB |
| meta-llama/Llama-Guard-3-8B | Q8 | 19.11 tok/s | 8GB |
| deepseek-ai/DeepSeek-R1 | Q8 | 19.04 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-0528 | Q8 | 19.03 tok/s | 7GB |
| microsoft/phi-4 | Q8 | 19.03 tok/s | 7GB |
| GSAI-ML/LLaDA-8B-Instruct | Q8 | 19.02 tok/s | 8GB |
| Qwen/Qwen3-8B | Q8 | 18.99 tok/s | 8GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q8 | 18.94 tok/s | 8GB |
| openai-community/gpt2 | Q8 | 18.93 tok/s | 7GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q8 | 18.90 tok/s | 8GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | 18.76 tok/s | 8GB |
| llamafactory/tiny-random-Llama-3 | Q8 | 18.72 tok/s | 7GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q8 | 18.72 tok/s | 7GB |
| ibm-granite/granite-docling-258M | Q8 | 18.68 tok/s | 7GB |
| microsoft/phi-2 | Q8 | 18.65 tok/s | 7GB |
| lmsys/vicuna-7b-v1.5 | Q8 | 18.55 tok/s | 7GB |
| EleutherAI/gpt-neo-125m | Q8 | 18.51 tok/s | 7GB |
| microsoft/DialoGPT-medium | Q8 | 18.39 tok/s | 7GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q8 | 18.29 tok/s | 7GB |
| openai-community/gpt2-large | Q8 | 18.29 tok/s | 7GB |
| Qwen/Qwen3-1.7B | Q8 | 18.21 tok/s | 7GB |
| deepseek-ai/DeepSeek-V3-0324 | Q8 | 18.19 tok/s | 7GB |
| MiniMaxAI/MiniMax-M2 | Q8 | 18.17 tok/s | 7GB |
| Qwen/Qwen3-8B-FP8 | Q8 | 17.96 tok/s | 8GB |
| Qwen/Qwen3-Embedding-8B | Q8 | 17.88 tok/s | 8GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q8 | 17.82 tok/s | 8GB |
| ibm-granite/granite-3.3-8b-instruct | Q8 | 17.45 tok/s | 8GB |
Note: All figures above are auto-generated estimates, not measured benchmarks; real-world results may vary. Methodology · Submit real data
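If you want to sanity-check an entry yourself, one common route is llama-cpp-python with every layer offloaded to the GPU. A minimal sketch, assuming a CUDA build of the library and a local Q4 GGUF file (the file path is a placeholder, and this crude wall-clock timing is not the site's benchmark methodology):

```python
import time
from llama_cpp import Llama  # pip install llama-cpp-python (CUDA build)

# Placeholder path: any Q4 GGUF that fits in 8 GB, e.g. a Llama 3.2 1B quant.
llm = Llama(
    model_path="./llama-3.2-1b-instruct-q4_k_m.gguf",
    n_gpu_layers=-1,  # offload all layers to the RTX 3070
    n_ctx=2048,
)

prompt = "Explain KV caching in one paragraph."
start = time.perf_counter()
out = llm(prompt, max_tokens=256)
elapsed = time.perf_counter() - start

n_tokens = out["usage"]["completion_tokens"]
print(f"{n_tokens} tokens in {elapsed:.1f}s -> {n_tokens / elapsed:.1f} tok/s")
```

Measured numbers will shift with context length, batch size, and the specific quant file, so treat any single run as a spot check rather than a definitive figure.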
| Model | Quantization | Verdict | Estimated speed | VRAM needed |
|---|---|---|---|---|
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q8 | Not supported | — | 80GB (have 8GB) |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q4 | Not supported | — | 40GB (have 8GB) |
| ai-forever/ruGPT-3.5-13B | Q8 | Not supported | — | 13GB (have 8GB) |
| ai-forever/ruGPT-3.5-13B | Q4 | Fits (tight) | 21.94 tok/s | 7GB (have 8GB) |
| baichuan-inc/Baichuan-M2-32B | Q8 | Not supported | — | 32GB (have 8GB) |
| baichuan-inc/Baichuan-M2-32B | Q4 | Not supported | — | 16GB (have 8GB) |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q8 | Fits (tight) | 20.91 tok/s | 7GB (have 8GB) |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 28.04 tok/s | 4GB (have 8GB) |
| ibm-granite/granite-3.3-8b-instruct | Q8 | Fits (tight) | 17.45 tok/s | 8GB (have 8GB) |
| ibm-granite/granite-3.3-8b-instruct | Q4 | Fits comfortably | 24.93 tok/s | 4GB (have 8GB) |
| Qwen/Qwen3-1.7B-Base | Q8 | Fits (tight) | 21.15 tok/s | 7GB (have 8GB) |
| Qwen/Qwen3-1.7B-Base | Q4 | Fits comfortably | 31.08 tok/s | 4GB (have 8GB) |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q8 | Not supported | — | 20GB (have 8GB) |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q4 | Not supported | — | 10GB (have 8GB) |
| BSC-LT/salamandraTA-7b-instruct | Q8 | Fits (tight) | 21.69 tok/s | 7GB (have 8GB) |
| BSC-LT/salamandraTA-7b-instruct | Q4 | Fits comfortably | 30.57 tok/s | 4GB (have 8GB) |
| dicta-il/dictalm2.0-instruct | Q8 | Fits (tight) | 21.84 tok/s | 7GB (have 8GB) |
| dicta-il/dictalm2.0-instruct | Q4 | Fits comfortably | 27.91 tok/s | 4GB (have 8GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q8 | Not supported | — | 30GB (have 8GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q4 | Not supported | — | 15GB (have 8GB) |
| GSAI-ML/LLaDA-8B-Base | Q8 | Fits (tight) | 19.42 tok/s | 8GB (have 8GB) |
| GSAI-ML/LLaDA-8B-Base | Q4 | Fits comfortably | 29.11 tok/s | 4GB (have 8GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q8 | Not supported | — | 30GB (have 8GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q4 | Not supported | — | 15GB (have 8GB) |
| Qwen/Qwen2-0.5B-Instruct | Q8 | Fits comfortably | 23.40 tok/s | 5GB (have 8GB) |
| Qwen/Qwen2-0.5B-Instruct | Q4 | Fits comfortably | 32.04 tok/s | 3GB (have 8GB) |
| deepseek-ai/DeepSeek-V3 | Q8 | Fits (tight) | 21.39 tok/s | 7GB (have 8GB) |
| deepseek-ai/DeepSeek-V3 | Q4 | Fits comfortably | 29.27 tok/s | 4GB (have 8GB) |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q8 | Fits comfortably | 22.90 tok/s | 5GB (have 8GB) |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | Fits comfortably | 35.83 tok/s | 3GB (have 8GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q8 | Not supported | — | 30GB (have 8GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q4 | Not supported | — | 15GB (have 8GB) |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q8 | Not supported | — | 30GB (have 8GB) |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q4 | Not supported | — | 15GB (have 8GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q8 | Not supported | — | 30GB (have 8GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q4 | Not supported | — | 15GB (have 8GB) |
| AI-MO/Kimina-Prover-72B | Q8 | Not supported | — | 72GB (have 8GB) |
| AI-MO/Kimina-Prover-72B | Q4 | Not supported | — | 36GB (have 8GB) |
| apple/OpenELM-1_1B-Instruct | Q8 | Fits comfortably | 44.15 tok/s | 1GB (have 8GB) |
| apple/OpenELM-1_1B-Instruct | Q4 | Fits comfortably | 54.16 tok/s | 1GB (have 8GB) |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q8 | Fits (tight) | 18.90 tok/s | 8GB (have 8GB) |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 28.65 tok/s | 4GB (have 8GB) |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q8 | Not supported | — | 9GB (have 8GB) |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q4 | Fits comfortably | 28.71 tok/s | 5GB (have 8GB) |
| Qwen/Qwen2.5-3B | Q8 | Fits comfortably | 27.44 tok/s | 3GB (have 8GB) |
| Qwen/Qwen2.5-3B | Q4 | Fits comfortably | 37.82 tok/s | 2GB (have 8GB) |
| lmsys/vicuna-7b-v1.5 | Q8 | Fits (tight) | 18.55 tok/s | 7GB (have 8GB) |
| lmsys/vicuna-7b-v1.5 | Q4 | Fits comfortably | 29.07 tok/s | 4GB (have 8GB) |
| meta-llama/Llama-2-13b-chat-hf | Q8 | Not supported | — | 13GB (have 8GB) |
| meta-llama/Llama-2-13b-chat-hf | Q4 | Fits (tight) | 20.60 tok/s | 7GB (have 8GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q8 | Not supported | — | 80GB (have 8GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q4 | Not supported | — | 40GB (have 8GB) |
| unsloth/gemma-3-1b-it | Q8 | Fits comfortably | 41.54 tok/s | 1GB (have 8GB) |
| unsloth/gemma-3-1b-it | Q4 | Fits comfortably | 63.30 tok/s | 1GB (have 8GB) |
| bigcode/starcoder2-3b | Q8 | Fits comfortably | 28.00 tok/s | 3GB (have 8GB) |
| bigcode/starcoder2-3b | Q4 | Fits comfortably | 38.31 tok/s | 2GB (have 8GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q8 | Not supported | — | 80GB (have 8GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q4 | Not supported | — | 40GB (have 8GB) |
| ibm-granite/granite-docling-258M | Q8 | Fits (tight) | 18.68 tok/s | 7GB (have 8GB) |
| ibm-granite/granite-docling-258M | Q4 | Fits comfortably | 29.24 tok/s | 4GB (have 8GB) |
| skt/kogpt2-base-v2 | Q8 | Fits (tight) | 19.57 tok/s | 7GB (have 8GB) |
| skt/kogpt2-base-v2 | Q4 | Fits comfortably | 29.96 tok/s | 4GB (have 8GB) |
| google/gemma-3-270m-it | Q8 | Fits (tight) | 19.18 tok/s | 7GB (have 8GB) |
| google/gemma-3-270m-it | Q4 | Fits comfortably | 27.86 tok/s | 4GB (have 8GB) |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q8 | Fits comfortably | 26.14 tok/s | 4GB (have 8GB) |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | Fits comfortably | 38.03 tok/s | 2GB (have 8GB) |
| Qwen/Qwen2.5-32B | Q8 | Not supported | — | 32GB (have 8GB) |
| Qwen/Qwen2.5-32B | Q4 | Not supported | — | 16GB (have 8GB) |
| parler-tts/parler-tts-large-v1 | Q8 | Fits (tight) | 20.51 tok/s | 7GB (have 8GB) |
| parler-tts/parler-tts-large-v1 | Q4 | Fits comfortably | 31.59 tok/s | 4GB (have 8GB) |
| EleutherAI/pythia-70m-deduped | Q8 | Fits (tight) | 19.15 tok/s | 7GB (have 8GB) |
| EleutherAI/pythia-70m-deduped | Q4 | Fits comfortably | 29.91 tok/s | 4GB (have 8GB) |
| microsoft/VibeVoice-1.5B | Q8 | Fits comfortably | 23.00 tok/s | 5GB (have 8GB) |
| microsoft/VibeVoice-1.5B | Q4 | Fits comfortably | 34.98 tok/s | 3GB (have 8GB) |
| ibm-granite/granite-3.3-2b-instruct | Q8 | Fits comfortably | 34.26 tok/s | 2GB (have 8GB) |
| ibm-granite/granite-3.3-2b-instruct | Q4 | Fits comfortably | 45.86 tok/s | 1GB (have 8GB) |
| Qwen/Qwen2.5-72B-Instruct | Q8 | Not supported | — | 72GB (have 8GB) |
| Qwen/Qwen2.5-72B-Instruct | Q4 | Not supported | — | 36GB (have 8GB) |
| liuhaotian/llava-v1.5-7b | Q8 | Fits (tight) | 19.12 tok/s | 7GB (have 8GB) |
| liuhaotian/llava-v1.5-7b | Q4 | Fits comfortably | 31.39 tok/s | 4GB (have 8GB) |
| google/gemma-2b | Q8 | Fits comfortably | 32.45 tok/s | 2GB (have 8GB) |
| google/gemma-2b | Q4 | Fits comfortably | 41.70 tok/s | 1GB (have 8GB) |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q8 | Fits (tight) | 18.72 tok/s | 7GB (have 8GB) |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q4 | Fits comfortably | 29.08 tok/s | 4GB (have 8GB) |
| Qwen/Qwen3-235B-A22B | Q8 | Not supported | — | 235GB (have 8GB) |
| Qwen/Qwen3-235B-A22B | Q4 | Not supported | — | 118GB (have 8GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q8 | Fits (tight) | 20.70 tok/s | 8GB (have 8GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q4 | Fits comfortably | 25.27 tok/s | 4GB (have 8GB) |
| microsoft/Phi-4-mini-instruct | Q8 | Fits (tight) | 20.91 tok/s | 7GB (have 8GB) |
| microsoft/Phi-4-mini-instruct | Q4 | Fits comfortably | 26.43 tok/s | 4GB (have 8GB) |
| llamafactory/tiny-random-Llama-3 | Q8 | Fits (tight) | 18.72 tok/s | 7GB (have 8GB) |
| llamafactory/tiny-random-Llama-3 | Q4 | Fits comfortably | 30.36 tok/s | 4GB (have 8GB) |
| HuggingFaceH4/zephyr-7b-beta | Q8 | Fits (tight) | 19.52 tok/s | 7GB (have 8GB) |
| HuggingFaceH4/zephyr-7b-beta | Q4 | Fits comfortably | 29.31 tok/s | 4GB (have 8GB) |
| Qwen/Qwen3-4B-Thinking-2507 | Q8 | Fits comfortably | 27.03 tok/s | 4GB (have 8GB) |
| Qwen/Qwen3-4B-Thinking-2507 | Q4 | Fits comfortably | 33.81 tok/s | 2GB (have 8GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q8 | Not supported | — | 30GB (have 8GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q4 | Not supported | — | 15GB (have 8GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q8 | Fits (tight) | 19.33 tok/s | 8GB (have 8GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q4 | Fits comfortably | 28.42 tok/s | 4GB (have 8GB) |
| unsloth/Llama-3.2-1B-Instruct | Q8 | Fits comfortably | 38.13 tok/s | 1GB (have 8GB) |
| unsloth/Llama-3.2-1B-Instruct | Q4 | Fits comfortably | 62.13 tok/s | 1GB (have 8GB) |
| GSAI-ML/LLaDA-8B-Instruct | Q8 | Fits (tight) | 19.02 tok/s | 8GB (have 8GB) |
| GSAI-ML/LLaDA-8B-Instruct | Q4 | Fits comfortably | 25.08 tok/s | 4GB (have 8GB) |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q8 | Not supported | — | 90GB (have 8GB) |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q4 | Not supported | — | 45GB (have 8GB) |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q8 | Fits (tight) | 18.29 tok/s | 7GB (have 8GB) |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q4 | Fits comfortably | 31.30 tok/s | 4GB (have 8GB) |
| numind/NuExtract-1.5 | Q8 | Fits (tight) | 21.44 tok/s | 7GB (have 8GB) |
| numind/NuExtract-1.5 | Q4 | Fits comfortably | 30.14 tok/s | 4GB (have 8GB) |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q8 | Fits (tight) | 19.81 tok/s | 7GB (have 8GB) |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q4 | Fits comfortably | 27.91 tok/s | 4GB (have 8GB) |
| hmellor/tiny-random-LlamaForCausalLM | Q8 | Fits (tight) | 20.93 tok/s | 7GB (have 8GB) |
| hmellor/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 28.01 tok/s | 4GB (have 8GB) |
| huggyllama/llama-7b | Q8 | Fits (tight) | 21.47 tok/s | 7GB (have 8GB) |
| huggyllama/llama-7b | Q4 | Fits comfortably | 31.44 tok/s | 4GB (have 8GB) |
| deepseek-ai/DeepSeek-V3-0324 | Q8 | Fits (tight) | 18.19 tok/s | 7GB (have 8GB) |
| deepseek-ai/DeepSeek-V3-0324 | Q4 | Fits comfortably | 28.45 tok/s | 4GB (have 8GB) |
| microsoft/Phi-3-mini-128k-instruct | Q8 | Fits (tight) | 21.47 tok/s | 7GB (have 8GB) |
| microsoft/Phi-3-mini-128k-instruct | Q4 | Fits comfortably | 27.51 tok/s | 4GB (have 8GB) |
| sshleifer/tiny-gpt2 | Q8 | Fits (tight) | 20.14 tok/s | 7GB (have 8GB) |
| sshleifer/tiny-gpt2 | Q4 | Fits comfortably | 27.58 tok/s | 4GB (have 8GB) |
| meta-llama/Llama-Guard-3-8B | Q8 | Fits (tight) | 19.11 tok/s | 8GB (have 8GB) |
| meta-llama/Llama-Guard-3-8B | Q4 | Fits comfortably | 28.68 tok/s | 4GB (have 8GB) |
| openai-community/gpt2-xl | Q8 | Fits (tight) | 19.38 tok/s | 7GB (have 8GB) |
| openai-community/gpt2-xl | Q4 | Fits comfortably | 31.19 tok/s | 4GB (have 8GB) |
| OpenPipe/Qwen3-14B-Instruct | Q8 | Not supported | — | 14GB (have 8GB) |
| OpenPipe/Qwen3-14B-Instruct | Q4 | Fits (tight) | 19.95 tok/s | 7GB (have 8GB) |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q8 | Not supported | — | 70GB (have 8GB) |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q4 | Not supported | — | 35GB (have 8GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q8 | Fits comfortably | 24.07 tok/s | 4GB (have 8GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q4 | Fits comfortably | 34.24 tok/s | 2GB (have 8GB) |
| ibm-research/PowerMoE-3b | Q8 | Fits comfortably | 26.17 tok/s | 3GB (have 8GB) |
| ibm-research/PowerMoE-3b | Q4 | Fits comfortably | 39.61 tok/s | 2GB (have 8GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q8 | Fits comfortably | 25.65 tok/s | 4GB (have 8GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q4 | Fits comfortably | 35.49 tok/s | 2GB (have 8GB) |
| unsloth/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 27.13 tok/s | 3GB (have 8GB) |
| unsloth/Llama-3.2-3B-Instruct | Q4 | Fits comfortably | 41.74 tok/s | 2GB (have 8GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q8 | Fits comfortably | 25.40 tok/s | 4GB (have 8GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q4 | Fits comfortably | 35.77 tok/s | 2GB (have 8GB) |
| meta-llama/Llama-3.2-3B | Q8 | Fits comfortably | 25.93 tok/s | 3GB (have 8GB) |
| meta-llama/Llama-3.2-3B | Q4 | Fits comfortably | 42.93 tok/s | 2GB (have 8GB) |
| EleutherAI/gpt-neo-125m | Q8 | Fits (tight) | 18.51 tok/s | 7GB (have 8GB) |
| EleutherAI/gpt-neo-125m | Q4 | Fits comfortably | 28.42 tok/s | 4GB (have 8GB) |
| codellama/CodeLlama-34b-hf | Q8 | Not supported | — | 34GB (have 8GB) |
| codellama/CodeLlama-34b-hf | Q4 | Not supported | — | 17GB (have 8GB) |
| meta-llama/Llama-Guard-3-1B | Q8 | Fits comfortably | 41.91 tok/s | 1GB (have 8GB) |
| meta-llama/Llama-Guard-3-1B | Q4 | Fits comfortably | 60.22 tok/s | 1GB (have 8GB) |
| Qwen/Qwen2-1.5B-Instruct | Q8 | Fits comfortably | 24.70 tok/s | 5GB (have 8GB) |
| Qwen/Qwen2-1.5B-Instruct | Q4 | Fits comfortably | 32.75 tok/s | 3GB (have 8GB) |
| google/gemma-2-2b-it | Q8 | Fits comfortably | 30.35 tok/s | 2GB (have 8GB) |
| google/gemma-2-2b-it | Q4 | Fits comfortably | 46.15 tok/s | 1GB (have 8GB) |
| Qwen/Qwen2.5-14B | Q8 | Not supported | — | 14GB (have 8GB) |
| Qwen/Qwen2.5-14B | Q4 | Fits (tight) | 21.00 tok/s | 7GB (have 8GB) |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q8 | Not supported | — | 32GB (have 8GB) |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q4 | Not supported | — | 16GB (have 8GB) |
| microsoft/Phi-3.5-mini-instruct | Q8 | Fits (tight) | 20.73 tok/s | 7GB (have 8GB) |
| microsoft/Phi-3.5-mini-instruct | Q4 | Fits comfortably | 29.00 tok/s | 4GB (have 8GB) |
| Qwen/Qwen3-4B-Base | Q8 | Fits comfortably | 24.45 tok/s | 4GB (have 8GB) |
| Qwen/Qwen3-4B-Base | Q4 | Fits comfortably | 38.53 tok/s | 2GB (have 8GB) |
| Qwen/Qwen2-7B-Instruct | Q8 | Fits (tight) | 21.24 tok/s | 7GB (have 8GB) |
| Qwen/Qwen2-7B-Instruct | Q4 | Fits comfortably | 27.88 tok/s | 4GB (have 8GB) |
| meta-llama/Llama-2-7b-chat-hf | Q8 | Fits (tight) | 19.49 tok/s | 7GB (have 8GB) |
| meta-llama/Llama-2-7b-chat-hf | Q4 | Fits comfortably | 26.81 tok/s | 4GB (have 8GB) |
| Qwen/Qwen3-14B-Base | Q8 | Not supported | — | 14GB (have 8GB) |
| Qwen/Qwen3-14B-Base | Q4 | Fits (tight) | 23.66 tok/s | 7GB (have 8GB) |
| swiss-ai/Apertus-8B-Instruct-2509 | Q8 | Fits (tight) | 20.33 tok/s | 8GB (have 8GB) |
| swiss-ai/Apertus-8B-Instruct-2509 | Q4 | Fits comfortably | 29.12 tok/s | 4GB (have 8GB) |
| microsoft/Phi-3.5-vision-instruct | Q8 | Fits (tight) | 19.25 tok/s | 7GB (have 8GB) |
| microsoft/Phi-3.5-vision-instruct | Q4 | Fits comfortably | 30.30 tok/s | 4GB (have 8GB) |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q8 | Fits (tight) | 19.44 tok/s | 7GB (have 8GB) |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q4 | Fits comfortably | 29.65 tok/s | 4GB (have 8GB) |
| rinna/japanese-gpt-neox-small | Q8 | Fits (tight) | 20.90 tok/s | 7GB (have 8GB) |
| rinna/japanese-gpt-neox-small | Q4 | Fits comfortably | 28.06 tok/s | 4GB (have 8GB) |
| Qwen/Qwen2.5-Coder-1.5B | Q8 | Fits comfortably | 23.46 tok/s | 5GB (have 8GB) |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | Fits comfortably | 29.90 tok/s | 3GB (have 8GB) |
| IlyaGusev/saiga_llama3_8b | Q8 | Fits (tight) | 20.53 tok/s | 8GB (have 8GB) |
| IlyaGusev/saiga_llama3_8b | Q4 | Fits comfortably | 27.00 tok/s | 4GB (have 8GB) |
| Qwen/Qwen3-30B-A3B | Q8 | Not supported | — | 30GB (have 8GB) |
| Qwen/Qwen3-30B-A3B | Q4 | Not supported | — | 15GB (have 8GB) |
| deepseek-ai/DeepSeek-R1 | Q8 | Fits (tight) | 19.04 tok/s | 7GB (have 8GB) |
| deepseek-ai/DeepSeek-R1 | Q4 | Fits comfortably | 30.78 tok/s | 4GB (have 8GB) |
| microsoft/DialoGPT-small | Q8 | Fits (tight) | 19.28 tok/s | 7GB (have 8GB) |
| microsoft/DialoGPT-small | Q4 | Fits comfortably | 27.41 tok/s | 4GB (have 8GB) |
| Qwen/Qwen3-8B-FP8 | Q8 | Fits (tight) | 17.96 tok/s | 8GB (have 8GB) |
| Qwen/Qwen3-8B-FP8 | Q4 | Fits comfortably | 25.98 tok/s | 4GB (have 8GB) |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q8 | Not supported | — | 30GB (have 8GB) |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q4 | Not supported | — | 15GB (have 8GB) |
| Qwen/Qwen3-Embedding-4B | Q8 | Fits comfortably | 26.77 tok/s | 4GB (have 8GB) |
| Qwen/Qwen3-Embedding-4B | Q4 | Fits comfortably | 36.39 tok/s | 2GB (have 8GB) |
| microsoft/Phi-4-multimodal-instruct | Q8 | Fits (tight) | 22.11 tok/s | 7GB (have 8GB) |
| microsoft/Phi-4-multimodal-instruct | Q4 | Fits comfortably | 27.87 tok/s | 4GB (have 8GB) |
| Qwen/Qwen3-8B-Base | Q8 | Fits (tight) | 19.52 tok/s | 8GB (have 8GB) |
| Qwen/Qwen3-8B-Base | Q4 | Fits comfortably | 26.88 tok/s | 4GB (have 8GB) |
| Qwen/Qwen3-0.6B-Base | Q8 | Fits comfortably | 20.66 tok/s | 6GB (have 8GB) |
| Qwen/Qwen3-0.6B-Base | Q4 | Fits comfortably | 32.68 tok/s | 3GB (have 8GB) |
| openai-community/gpt2-medium | Q8 | Fits (tight) | 20.02 tok/s | 7GB (have 8GB) |
| openai-community/gpt2-medium | Q4 | Fits comfortably | 30.02 tok/s | 4GB (have 8GB) |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q8 | Fits (tight) | 21.72 tok/s | 7GB (have 8GB) |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 28.74 tok/s | 4GB (have 8GB) |
| Qwen/Qwen2.5-Math-1.5B | Q8 | Fits comfortably | 22.30 tok/s | 5GB (have 8GB) |
| Qwen/Qwen2.5-Math-1.5B | Q4 | Fits comfortably | 31.91 tok/s | 3GB (have 8GB) |
| HuggingFaceTB/SmolLM-135M | Q8 | Fits (tight) | 19.45 tok/s | 7GB (have 8GB) |
| HuggingFaceTB/SmolLM-135M | Q4 | Fits comfortably | 29.24 tok/s | 4GB (have 8GB) |
| unsloth/gpt-oss-20b-BF16 | Q8 | Not supported | — | 20GB (have 8GB) |
| unsloth/gpt-oss-20b-BF16 | Q4 | Not supported | — | 10GB (have 8GB) |
| meta-llama/Meta-Llama-3-70B-Instruct | Q8 | Not supported | — | 70GB (have 8GB) |
| meta-llama/Meta-Llama-3-70B-Instruct | Q4 | Not supported | — | 35GB (have 8GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q8 | Fits (tight) | 18.94 tok/s | 8GB (have 8GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 27.94 tok/s | 4GB (have 8GB) |
| zai-org/GLM-4.5-Air | Q8 | Fits (tight) | 21.55 tok/s | 7GB (have 8GB) |
| zai-org/GLM-4.5-Air | Q4 | Fits comfortably | 27.30 tok/s | 4GB (have 8GB) |
| mistralai/Mistral-7B-Instruct-v0.1 | Q8 | Fits (tight) | 20.68 tok/s | 7GB (have 8GB) |
| mistralai/Mistral-7B-Instruct-v0.1 | Q4 | Fits comfortably | 30.44 tok/s | 4GB (have 8GB) |
| LiquidAI/LFM2-1.2B | Q8 | Fits comfortably | 30.16 tok/s | 2GB (have 8GB) |
| LiquidAI/LFM2-1.2B | Q4 | Fits comfortably | 48.24 tok/s | 1GB (have 8GB) |
| mistralai/Mistral-7B-v0.1 | Q8 | Fits (tight) | 19.75 tok/s | 7GB (have 8GB) |
| mistralai/Mistral-7B-v0.1 | Q4 | Fits comfortably | 30.36 tok/s | 4GB (have 8GB) |
| Qwen/Qwen2.5-32B-Instruct | Q8 | Not supported | — | 32GB (have 8GB) |
| Qwen/Qwen2.5-32B-Instruct | Q4 | Not supported | — | 16GB (have 8GB) |
| deepseek-ai/DeepSeek-R1-0528 | Q8 | Fits (tight) | 19.03 tok/s | 7GB (have 8GB) |
| deepseek-ai/DeepSeek-R1-0528 | Q4 | Fits comfortably | 31.15 tok/s | 4GB (have 8GB) |
| meta-llama/Llama-3.1-8B | Q8 | Fits (tight) | 19.63 tok/s | 8GB (have 8GB) |
| meta-llama/Llama-3.1-8B | Q4 | Fits comfortably | 29.05 tok/s | 4GB (have 8GB) |
| deepseek-ai/DeepSeek-V3.1 | Q8 | Fits (tight) | 21.31 tok/s | 7GB (have 8GB) |
| deepseek-ai/DeepSeek-V3.1 | Q4 | Fits comfortably | 26.41 tok/s | 4GB (have 8GB) |
| microsoft/phi-4 | Q8 | Fits (tight) | 19.03 tok/s | 7GB (have 8GB) |
| microsoft/phi-4 | Q4 | Fits comfortably | 30.27 tok/s | 4GB (have 8GB) |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | Fits comfortably | 28.32 tok/s | 3GB (have 8GB) |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | Fits comfortably | 39.69 tok/s | 2GB (have 8GB) |
| Qwen/Qwen2-0.5B | Q8 | Fits comfortably | 22.56 tok/s | 5GB (have 8GB) |
| Qwen/Qwen2-0.5B | Q4 | Fits comfortably | 31.48 tok/s | 3GB (have 8GB) |
| MiniMaxAI/MiniMax-M2 | Q8 | Fits (tight) | 18.17 tok/s | 7GB (have 8GB) |
| MiniMaxAI/MiniMax-M2 | Q4 | Fits comfortably | 28.96 tok/s | 4GB (have 8GB) |
| microsoft/DialoGPT-medium | Q8 | Fits (tight) | 18.39 tok/s | 7GB (have 8GB) |
| microsoft/DialoGPT-medium | Q4 | Fits comfortably | 28.57 tok/s | 4GB (have 8GB) |
| zai-org/GLM-4.6-FP8 | Q8 | Fits (tight) | 21.49 tok/s | 7GB (have 8GB) |
| zai-org/GLM-4.6-FP8 | Q4 | Fits comfortably | 29.69 tok/s | 4GB (have 8GB) |
| HuggingFaceTB/SmolLM2-135M | Q8 | Fits (tight) | 21.15 tok/s | 7GB (have 8GB) |
| HuggingFaceTB/SmolLM2-135M | Q4 | Fits comfortably | 31.29 tok/s | 4GB (have 8GB) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | Fits (tight) | 20.86 tok/s | 8GB (have 8GB) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q4 | Fits comfortably | 26.86 tok/s | 4GB (have 8GB) |
| meta-llama/Llama-2-7b-hf | Q8 | Fits (tight) | 20.18 tok/s | 7GB (have 8GB) |
| meta-llama/Llama-2-7b-hf | Q4 | Fits comfortably | 28.36 tok/s | 4GB (have 8GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | Fits (tight) | 20.05 tok/s | 7GB (have 8GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | Fits comfortably | 26.79 tok/s | 4GB (have 8GB) |
| microsoft/phi-2 | Q8 | Fits (tight) | 18.65 tok/s | 7GB (have 8GB) |
| microsoft/phi-2 | Q4 | Fits comfortably | 31.02 tok/s | 4GB (have 8GB) |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | Not supported | — | 70GB (have 8GB) |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | Not supported | — | 35GB (have 8GB) |
| Qwen/Qwen2.5-0.5B | Q8 | Fits comfortably | 21.36 tok/s | 5GB (have 8GB) |
| Qwen/Qwen2.5-0.5B | Q4 | Fits comfortably | 32.51 tok/s | 3GB (have 8GB) |
| Qwen/Qwen3-14B | Q8 | Not supported | — | 14GB (have 8GB) |
| Qwen/Qwen3-14B | Q4 | Fits (tight) | 20.00 tok/s | 7GB (have 8GB) |
| Qwen/Qwen3-Embedding-8B | Q8 | Fits (tight) | 17.88 tok/s | 8GB (have 8GB) |
| Qwen/Qwen3-Embedding-8B | Q4 | Fits comfortably | 29.09 tok/s | 4GB (have 8GB) |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | Not supported | — | 70GB (have 8GB) |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | Not supported | — | 35GB (have 8GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q8 | Fits (tight) | 17.82 tok/s | 8GB (have 8GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q4 | Fits comfortably | 26.22 tok/s | 4GB (have 8GB) |
| Qwen/Qwen2.5-14B-Instruct | Q8 | Not supported | — | 14GB (have 8GB) |
| Qwen/Qwen2.5-14B-Instruct | Q4 | Fits (tight) | 24.00 tok/s | 7GB (have 8GB) |
| Qwen/Qwen2.5-1.5B | Q8 | Fits comfortably | 20.76 tok/s | 5GB (have 8GB) |
| Qwen/Qwen2.5-1.5B | Q4 | Fits comfortably | 30.00 tok/s | 3GB (have 8GB) |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | Fits comfortably | 23.91 tok/s | 4GB (have 8GB) |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q4 | Fits comfortably | 32.82 tok/s | 2GB (have 8GB) |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q8 | Not supported | — | 20GB (have 8GB) |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q4 | Not supported | — | 10GB (have 8GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q8 | Fits comfortably | 22.84 tok/s | 5GB (have 8GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q4 | Fits comfortably | 32.70 tok/s | 3GB (have 8GB) |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | Fits (tight) | 18.76 tok/s | 8GB (have 8GB) |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | Fits comfortably | 25.53 tok/s | 4GB (have 8GB) |
| Qwen/Qwen3-Reranker-0.6B | Q8 | Fits comfortably | 21.78 tok/s | 6GB (have 8GB) |
| Qwen/Qwen3-Reranker-0.6B | Q4 | Fits comfortably | 27.85 tok/s | 3GB (have 8GB) |
| rednote-hilab/dots.ocr | Q8 | Fits (tight) | 20.23 tok/s | 7GB (have 8GB) |
| rednote-hilab/dots.ocr | Q4 | Fits comfortably | 26.68 tok/s | 4GB (have 8GB) |
| google-t5/t5-3b | Q8 | Fits comfortably | 26.85 tok/s | 3GB (have 8GB) |
| google-t5/t5-3b | Q4 | Fits comfortably | 39.95 tok/s | 2GB (have 8GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q8 | Not supported | — | 30GB (have 8GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q4 | Not supported | — | 15GB (have 8GB) |
| Qwen/Qwen3-4B | Q8 | Fits comfortably | 22.33 tok/s | 4GB (have 8GB) |
| Qwen/Qwen3-4B | Q4 | Fits comfortably | 34.80 tok/s | 2GB (have 8GB) |
| Qwen/Qwen3-1.7B | Q8 | Fits (tight) | 18.21 tok/s | 7GB (have 8GB) |
| Qwen/Qwen3-1.7B | Q4 | Fits comfortably | 26.76 tok/s | 4GB (have 8GB) |
| openai-community/gpt2-large | Q8 | Fits (tight) | 18.29 tok/s | 7GB (have 8GB) |
| openai-community/gpt2-large | Q4 | Fits comfortably | 26.44 tok/s | 4GB (have 8GB) |
| microsoft/Phi-3-mini-4k-instruct | Q8 | Fits (tight) | 21.71 tok/s | 7GB (have 8GB) |
| microsoft/Phi-3-mini-4k-instruct | Q4 | Fits comfortably | 26.53 tok/s | 4GB (have 8GB) |
| allenai/OLMo-2-0425-1B | Q8 | Fits comfortably | 37.83 tok/s | 1GB (have 8GB) |
| allenai/OLMo-2-0425-1B | Q4 | Fits comfortably | 55.02 tok/s | 1GB (have 8GB) |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q8 | Not supported | — | 80GB (have 8GB) |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q4 | Not supported | — | 40GB (have 8GB) |
| Qwen/Qwen3-32B | Q8 | Not supported | — | 32GB (have 8GB) |
| Qwen/Qwen3-32B | Q4 | Not supported | — | 16GB (have 8GB) |
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | Fits comfortably | 22.91 tok/s | 5GB (have 8GB) |
| Qwen/Qwen2.5-0.5B-Instruct | Q4 | Fits comfortably | 33.87 tok/s | 3GB (have 8GB) |
| Qwen/Qwen2.5-7B | Q8 | Fits (tight) | 19.60 tok/s | 7GB (have 8GB) |
| Qwen/Qwen2.5-7B | Q4 | Fits comfortably | 29.86 tok/s | 4GB (have 8GB) |
| meta-llama/Meta-Llama-3-8B | Q8 | Fits (tight) | 19.99 tok/s | 8GB (have 8GB) |
| meta-llama/Meta-Llama-3-8B | Q4 | Fits comfortably | 26.11 tok/s | 4GB (have 8GB) |
| meta-llama/Llama-3.2-1B | Q8 | Fits comfortably | 37.08 tok/s | 1GB (have 8GB) |
| meta-llama/Llama-3.2-1B | Q4 | Fits comfortably | 57.84 tok/s | 1GB (have 8GB) |
| petals-team/StableBeluga2 | Q8 | Fits (tight) | 19.18 tok/s | 7GB (have 8GB) |
| petals-team/StableBeluga2 | Q4 | Fits comfortably | 28.98 tok/s | 4GB (have 8GB) |
| vikhyatk/moondream2 | Q8 | Fits (tight) | 20.66 tok/s | 7GB (have 8GB) |
| vikhyatk/moondream2 | Q4 | Fits comfortably | 29.67 tok/s | 4GB (have 8GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 27.19 tok/s | 3GB (have 8GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | Fits comfortably | 37.93 tok/s | 2GB (have 8GB) |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q8 | Not supported | — | 70GB (have 8GB) |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q4 | Not supported | — | 35GB (have 8GB) |
| distilbert/distilgpt2 | Q8 | Fits (tight) | 22.08 tok/s | 7GB (have 8GB) |
| distilbert/distilgpt2 | Q4 | Fits comfortably | 30.31 tok/s | 4GB (have 8GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q8 | Not supported | — | 32GB (have 8GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q4 | Not supported | — | 16GB (have 8GB) |
| inference-net/Schematron-3B | Q8 | Fits comfortably | 26.66 tok/s | 3GB (have 8GB) |
| inference-net/Schematron-3B | Q4 | Fits comfortably | 41.65 tok/s | 2GB (have 8GB) |
| Qwen/Qwen3-8B | Q8 | Fits (tight) | 18.99 tok/s | 8GB (have 8GB) |
| Qwen/Qwen3-8B | Q4 | Fits comfortably | 28.72 tok/s | 4GB (have 8GB) |
| mistralai/Mistral-7B-Instruct-v0.2 | Q8 | Fits (tight) | 19.56 tok/s | 7GB (have 8GB) |
| mistralai/Mistral-7B-Instruct-v0.2 | Q4 | Fits comfortably | 26.23 tok/s | 4GB (have 8GB) |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q8 | Fits comfortably | 27.93 tok/s | 3GB (have 8GB) |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | Fits comfortably | 37.97 tok/s | 2GB (have 8GB) |
| bigscience/bloomz-560m | Q8 | Fits (tight) | 19.63 tok/s | 7GB (have 8GB) |
| bigscience/bloomz-560m | Q4 | Fits comfortably | 31.27 tok/s | 4GB (have 8GB) |
| Qwen/Qwen2.5-3B-Instruct | Q8 | Fits comfortably | 25.58 tok/s | 3GB (have 8GB) |
| Qwen/Qwen2.5-3B-Instruct | Q4 | Fits comfortably | 41.49 tok/s | 2GB (have 8GB) |
| openai/gpt-oss-120b | Q8 | Not supported | — | 120GB (have 8GB) |
| openai/gpt-oss-120b | Q4 | Not supported | — | 60GB (have 8GB) |
| meta-llama/Llama-3.2-1B-Instruct | Q8 | Fits comfortably | 41.70 tok/s | 1GB (have 8GB) |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | Fits comfortably | 59.90 tok/s | 1GB (have 8GB) |
| Qwen/Qwen3-4B-Instruct-2507 | Q8 | Fits comfortably | 27.16 tok/s | 4GB (have 8GB) |
| Qwen/Qwen3-4B-Instruct-2507 | Q4 | Fits comfortably | 34.02 tok/s | 2GB (have 8GB) |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | Fits (tight) | 21.02 tok/s | 7GB (have 8GB) |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q4 | Fits comfortably | 28.79 tok/s | 4GB (have 8GB) |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | Fits comfortably | 40.50 tok/s | 1GB (have 8GB) |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | Fits comfortably | 59.38 tok/s | 1GB (have 8GB) |
| facebook/opt-125m | Q8 | Fits (tight) | 21.47 tok/s | 7GB (have 8GB) |
| facebook/opt-125m | Q4 | Fits comfortably | 26.24 tok/s | 4GB (have 8GB) |
| Qwen/Qwen2.5-1.5B-Instruct | Q8 | Fits comfortably | 20.90 tok/s | 5GB (have 8GB) |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | Fits comfortably | 32.12 tok/s | 3GB (have 8GB) |
| Qwen/Qwen3-Embedding-0.6B | Q8 | Fits comfortably | 21.36 tok/s | 6GB (have 8GB) |
| Qwen/Qwen3-Embedding-0.6B | Q4 | Fits comfortably | 33.04 tok/s | 3GB (have 8GB) |
| google/gemma-3-1b-it | Q8 | Fits comfortably | 39.36 tok/s | 1GB (have 8GB) |
| google/gemma-3-1b-it | Q4 | Fits comfortably | 57.33 tok/s | 1GB (have 8GB) |
| openai/gpt-oss-20b | Q8 | Not supported | — | 20GB (have 8GB) |
| openai/gpt-oss-20b | Q4 | Not supported | — | 10GB (have 8GB) |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q8 | Not supported | — | 34GB (have 8GB) |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q4 | Not supported | — | 17GB (have 8GB) |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | Fits (tight) | 20.09 tok/s | 8GB (have 8GB) |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 25.31 tok/s | 4GB (have 8GB) |
| Gensyn/Qwen2.5-0.5B-Instruct | Q8 | Fits comfortably | 20.58 tok/s | 5GB (have 8GB) |
| Gensyn/Qwen2.5-0.5B-Instruct | Q4 | Fits comfortably | 34.71 tok/s | 3GB (have 8GB) |
| Qwen/Qwen3-0.6B | Q8 | Fits comfortably | 19.37 tok/s | 6GB (have 8GB) |
| Qwen/Qwen3-0.6B | Q4 | Fits comfortably | 33.48 tok/s | 3GB (have 8GB) |
| Qwen/Qwen2.5-7B-Instruct | Q8 | Fits (tight) | 20.28 tok/s | 7GB (have 8GB) |
| Qwen/Qwen2.5-7B-Instruct | Q4 | Fits comfortably | 29.57 tok/s | 4GB (have 8GB) |
| openai-community/gpt2 | Q8 | Fits (tight) | 18.93 tok/s | 7GB (have 8GB) |
| openai-community/gpt2 | Q4 | Fits comfortably | 25.99 tok/s | 4GB (have 8GB) |
Note: All figures above are auto-generated estimates, not measured benchmarks; real-world results may vary. Methodology · Submit real data
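The verdicts in the table appear to reduce to a simple comparison between a model's estimated VRAM need and the card's 8GB. A sketch of that rule as inferred from the rows above (the "tight" threshold is a guess, not the site's actual code):

```python
def fit_verdict(required_gb: int, available_gb: int = 8) -> str:
    """Inferred from the table: anything over available VRAM is unsupported,
    anything within ~1 GB of the ceiling is 'tight', and the rest is comfortable."""
    if required_gb > available_gb:
        return "Not supported"
    if required_gb >= available_gb - 1:  # 7-8 GB needed on an 8 GB card
        return "Fits (tight)"
    return "Fits comfortably"

print(fit_verdict(13))  # Not supported, e.g. a 13B model at Q8
print(fit_verdict(7))   # Fits (tight)
print(fit_verdict(4))   # Fits comfortably
```

The practical takeaway for the "tight" rows is that there is little headroom left for the KV cache, so long contexts or larger batch sizes may still spill to system RAM.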
Data-backed answers pulled from community benchmarks, manufacturer specs, and live pricing.
A Windows builder running LLaMA 2 13B Q6 in Kobold with system-memory fallback enabled sees about 8 tokens/sec, while disabling fallback drops throughput toward 5 tok/s.
Source: Reddit – /r/LocalLLaMA (1beu2vh)
For larger models, not really: community buyers warn that the 3070's 8GB ceiling struggles beyond 13B unless you chain multiple cards or accept heavy offload, making higher-VRAM GPUs a safer bet.
Source: Reddit – /r/LocalLLaMA (ndp8799)
Keep NVIDIA’s sysmem fallback enabled when you stretch past 7B models—users disabling it see token speeds collapse as layers spill to host RAM instead of the GPU.
Source: Reddit – /r/LocalLLaMA (kuyjopm)
The RTX 3070 ships with 8 GB of GDDR6, draws 220 W, and uses dual 8-pin PCIe power connectors; NVIDIA recommends a 650 W PSU.
Source: TechPowerUp – RTX 3070 Specs
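That 650 W guidance is roughly what falls out of adding typical system draw to the GPU's 220 W and leaving transient headroom. A back-of-envelope check, where the CPU, peripheral, and headroom figures are assumptions rather than sourced numbers:

```python
gpu_w = 220      # RTX 3070 board power (TechPowerUp)
cpu_w = 150      # assumed mid-range CPU under load
rest_w = 50      # assumed fans, drives, RAM, peripherals
headroom = 1.5   # assumed margin for transients and PSU efficiency sweet spot

print(f"Recommended PSU: ~{(gpu_w + cpu_w + rest_w) * headroom:.0f} W")
# ~630 W, close to NVIDIA's 650 W recommendation
```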
As of 3 Nov 2025, the RTX 3070 hovered around $469 at Amazon (in stock), $489 at Newegg (in stock), and $479 at Best Buy (limited stock).
Source: Supabase price tracker snapshot – 2025-11-03
Explore how RTX 4060 Ti 16GB stacks up for local inference workloads.
Explore how RX 6800 XT stacks up for local inference workloads.
Explore how RTX 3080 stacks up for local inference workloads.