Quick Answer: The RTX 4070 offers 12GB of VRAM and starts around $599. It delivers approximately 64 tokens/sec on unsloth/Llama-3.2-1B-Instruct (Q4, estimated) and typically draws 200W under load.
This GPU offers reliable throughput for local AI workloads. Pair it with the right model quantization to hit your target tokens/sec, and monitor prices to catch the best deal.
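A quick way to sanity-check whether a model and quantization will fit in the 4070's 12GB is the usual bytes-per-parameter heuristic. The sketch below is illustrative only: the per-quantization byte counts and the fixed overhead for KV cache and runtime buffers are assumptions, not values from this page's methodology.

```python
# Rough VRAM estimate for a quantized model. This is a heuristic sketch:
# the bytes-per-parameter values and the fixed overhead are assumptions
# chosen for illustration, not figures from this page's methodology.

BYTES_PER_PARAM = {"Q4": 0.5, "Q8": 1.0, "FP16": 2.0}  # assumed packing densities
OVERHEAD_GB = 1.5  # assumed headroom for KV cache, activations, runtime buffers

def estimated_vram_gb(params_billions: float, quant: str) -> float:
    """Estimate the VRAM in GB needed to serve a model at a given quantization."""
    return params_billions * BYTES_PER_PARAM[quant] + OVERHEAD_GB

def fits_12gb(params_billions: float, quant: str) -> bool:
    """Check the estimate against a 12GB card like the RTX 4070."""
    return estimated_vram_gb(params_billions, quant) <= 12.0

if __name__ == "__main__":
    for name, size_b in [("Llama-3.2-1B", 1.2), ("Llama-3.1-8B", 8.0), ("Qwen2.5-14B", 14.8)]:
        for quant in ("Q4", "Q8"):
            est = estimated_vram_gb(size_b, quant)
            print(f"{name} {quant}: ~{est:.1f}GB needed, fits 12GB: {fits_12gb(size_b, quant)}")
```

At Q4 this puts an 8B model around 5.5GB, which lines up roughly with the 4GB to 5GB figures in the table below.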
| Model | Quantization | Tokens/sec (estimated) | VRAM used |
|---|---|---|---|
| unsloth/Llama-3.2-1B-Instruct | Q4 | 63.66 tok/s | 1GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | 62.49 tok/s | 1GB |
| meta-llama/Llama-Guard-3-1B | Q4 | 61.89 tok/s | 1GB |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | 61.43 tok/s | 1GB |
| apple/OpenELM-1_1B-Instruct | Q4 | 59.60 tok/s | 1GB |
| allenai/OLMo-2-0425-1B | Q4 | 58.81 tok/s | 1GB |
| google/gemma-3-1b-it | Q4 | 58.31 tok/s | 1GB |
| meta-llama/Llama-3.2-1B | Q4 | 54.18 tok/s | 1GB |
| unsloth/gemma-3-1b-it | Q4 | 54.12 tok/s | 1GB |
| ibm-granite/granite-3.3-2b-instruct | Q4 | 48.93 tok/s | 1GB |
| google/gemma-2-2b-it | Q4 | 46.22 tok/s | 1GB |
| LiquidAI/LFM2-1.2B | Q4 | 45.57 tok/s | 1GB |
| meta-llama/Llama-3.2-1B-Instruct | Q8 | 44.63 tok/s | 1GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 42.77 tok/s | 2GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | 42.52 tok/s | 1GB |
| inference-net/Schematron-3B | Q4 | 42.23 tok/s | 2GB |
| google/gemma-2b | Q4 | 41.16 tok/s | 1GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 40.57 tok/s | 2GB |
| Qwen/Qwen2.5-3B-Instruct | Q4 | 39.82 tok/s | 2GB |
| google/gemma-3-1b-it | Q8 | 39.79 tok/s | 1GB |
| Qwen/Qwen2.5-3B | Q4 | 39.61 tok/s | 2GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | 39.52 tok/s | 2GB |
| apple/OpenELM-1_1B-Instruct | Q8 | 39.12 tok/s | 1GB |
| meta-llama/Llama-Guard-3-1B | Q8 | 38.97 tok/s | 1GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q4 | 38.86 tok/s | 2GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | 38.77 tok/s | 2GB |
| unsloth/Llama-3.2-3B-Instruct | Q4 | 38.77 tok/s | 2GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q4 | 38.56 tok/s | 2GB |
| meta-llama/Llama-3.2-3B | Q4 | 38.53 tok/s | 2GB |
| Qwen/Qwen3-4B | Q4 | 38.29 tok/s | 2GB |
| allenai/OLMo-2-0425-1B | Q8 | 38.29 tok/s | 1GB |
| unsloth/gemma-3-1b-it | Q8 | 38.05 tok/s | 1GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | 37.93 tok/s | 3GB |
| bigcode/starcoder2-3b | Q4 | 37.91 tok/s | 2GB |
| meta-llama/Llama-3.2-1B | Q8 | 37.87 tok/s | 1GB |
| google-t5/t5-3b | Q4 | 37.78 tok/s | 2GB |
| unsloth/Llama-3.2-1B-Instruct | Q8 | 37.71 tok/s | 1GB |
| Qwen/Qwen3-4B-Base | Q4 | 37.27 tok/s | 2GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q4 | 37.12 tok/s | 2GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q4 | 36.73 tok/s | 2GB |
| ibm-research/PowerMoE-3b | Q4 | 36.71 tok/s | 2GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | 35.96 tok/s | 2GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | 35.70 tok/s | 3GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | 34.40 tok/s | 3GB |
| google/gemma-2-2b-it | Q8 | 34.14 tok/s | 2GB |
| ibm-granite/granite-3.3-2b-instruct | Q8 | 33.89 tok/s | 2GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q4 | 33.14 tok/s | 2GB |
| microsoft/VibeVoice-1.5B | Q4 | 33.09 tok/s | 3GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q4 | 32.96 tok/s | 3GB |
| google/gemma-2b | Q8 | 32.26 tok/s | 2GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q4 | 32.07 tok/s | 2GB |
| Qwen/Qwen3-Embedding-4B | Q4 | 32.02 tok/s | 2GB |
| Qwen/Qwen2-1.5B-Instruct | Q4 | 31.94 tok/s | 3GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q4 | 31.93 tok/s | 3GB |
| Qwen/Qwen2.5-Math-1.5B | Q4 | 31.84 tok/s | 3GB |
| meta-llama/Llama-2-7b-chat-hf | Q4 | 31.56 tok/s | 4GB |
| Qwen/Qwen3-Reranker-0.6B | Q4 | 31.43 tok/s | 3GB |
| lmsys/vicuna-7b-v1.5 | Q4 | 31.36 tok/s | 4GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q4 | 31.32 tok/s | 4GB |
| microsoft/phi-4 | Q4 | 31.29 tok/s | 4GB |
| mistralai/Mistral-7B-v0.1 | Q4 | 31.26 tok/s | 4GB |
| rednote-hilab/dots.ocr | Q4 | 31.25 tok/s | 4GB |
| Qwen/Qwen3-0.6B-Base | Q4 | 31.18 tok/s | 3GB |
| skt/kogpt2-base-v2 | Q4 | 31.05 tok/s | 4GB |
| HuggingFaceH4/zephyr-7b-beta | Q4 | 30.95 tok/s | 4GB |
| HuggingFaceTB/SmolLM2-135M | Q4 | 30.94 tok/s | 4GB |
| Qwen/Qwen3-1.7B | Q4 | 30.89 tok/s | 4GB |
| EleutherAI/pythia-70m-deduped | Q4 | 30.83 tok/s | 4GB |
| Qwen/Qwen2-0.5B-Instruct | Q4 | 30.82 tok/s | 3GB |
| MiniMaxAI/MiniMax-M2 | Q4 | 30.80 tok/s | 4GB |
| Qwen/Qwen3-0.6B | Q4 | 30.60 tok/s | 3GB |
| petals-team/StableBeluga2 | Q4 | 30.59 tok/s | 4GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q4 | 30.54 tok/s | 4GB |
| microsoft/Phi-4-mini-instruct | Q4 | 30.48 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3 | Q4 | 30.46 tok/s | 4GB |
| ibm-granite/granite-docling-258M | Q4 | 30.42 tok/s | 4GB |
| Qwen/Qwen2-0.5B | Q4 | 30.19 tok/s | 3GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | 30.19 tok/s | 3GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | 30.16 tok/s | 4GB |
| microsoft/phi-2 | Q4 | 30.15 tok/s | 4GB |
| google/gemma-3-270m-it | Q4 | 30.07 tok/s | 4GB |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | 30.02 tok/s | 3GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q4 | 29.95 tok/s | 4GB |
| microsoft/Phi-3-mini-128k-instruct | Q4 | 29.90 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q4 | 29.85 tok/s | 4GB |
| microsoft/DialoGPT-small | Q4 | 29.67 tok/s | 4GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q4 | 29.63 tok/s | 4GB |
| huggyllama/llama-7b | Q4 | 29.61 tok/s | 4GB |
| liuhaotian/llava-v1.5-7b | Q4 | 29.61 tok/s | 4GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | 29.60 tok/s | 5GB |
| ibm-granite/granite-3.3-8b-instruct | Q4 | 29.58 tok/s | 4GB |
| Qwen/Qwen2.5-1.5B | Q4 | 29.57 tok/s | 3GB |
| Qwen/Qwen2.5-0.5B | Q4 | 29.53 tok/s | 3GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q4 | 29.52 tok/s | 3GB |
| EleutherAI/gpt-neo-125m | Q4 | 29.51 tok/s | 4GB |
| unsloth/Llama-3.2-3B-Instruct | Q8 | 29.30 tok/s | 3GB |
| openai-community/gpt2-large | Q4 | 29.28 tok/s | 4GB |
| zai-org/GLM-4.5-Air | Q4 | 29.22 tok/s | 4GB |
| hmellor/tiny-random-LlamaForCausalLM | Q4 | 29.20 tok/s | 4GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q4 | 29.19 tok/s | 4GB |
| Qwen/Qwen3-Embedding-0.6B | Q4 | 29.03 tok/s | 3GB |
| LiquidAI/LFM2-1.2B | Q8 | 29.01 tok/s | 2GB |
| Qwen/Qwen2-7B-Instruct | Q4 | 29.00 tok/s | 4GB |
| zai-org/GLM-4.6-FP8 | Q4 | 29.00 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3.1 | Q4 | 28.99 tok/s | 4GB |
| google-t5/t5-3b | Q8 | 28.93 tok/s | 3GB |
| openai-community/gpt2-medium | Q4 | 28.82 tok/s | 4GB |
| distilbert/distilgpt2 | Q4 | 28.62 tok/s | 4GB |
| Qwen/Qwen3-Embedding-8B | Q4 | 28.61 tok/s | 4GB |
| microsoft/DialoGPT-medium | Q4 | 28.38 tok/s | 4GB |
| GSAI-ML/LLaDA-8B-Base | Q4 | 28.38 tok/s | 4GB |
| facebook/opt-125m | Q4 | 28.35 tok/s | 4GB |
| meta-llama/Llama-2-7b-hf | Q4 | 28.14 tok/s | 4GB |
| BSC-LT/salamandraTA-7b-instruct | Q4 | 28.10 tok/s | 4GB |
| meta-llama/Llama-Guard-3-8B | Q4 | 28.10 tok/s | 4GB |
| Qwen/Qwen3-8B | Q4 | 28.09 tok/s | 4GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q4 | 27.97 tok/s | 4GB |
| IlyaGusev/saiga_llama3_8b | Q4 | 27.96 tok/s | 4GB |
| microsoft/Phi-3.5-vision-instruct | Q4 | 27.96 tok/s | 4GB |
| ibm-research/PowerMoE-3b | Q8 | 27.92 tok/s | 3GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q4 | 27.89 tok/s | 5GB |
| numind/NuExtract-1.5 | Q4 | 27.83 tok/s | 4GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | 27.75 tok/s | 4GB |
| inference-net/Schematron-3B | Q8 | 27.74 tok/s | 3GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | 27.73 tok/s | 3GB |
| bigscience/bloomz-560m | Q4 | 27.72 tok/s | 4GB |
| Qwen/Qwen3-8B-FP8 | Q4 | 27.67 tok/s | 4GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 27.66 tok/s | 4GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q4 | 27.59 tok/s | 4GB |
| vikhyatk/moondream2 | Q4 | 27.57 tok/s | 4GB |
| llamafactory/tiny-random-Llama-3 | Q4 | 27.51 tok/s | 4GB |
| bigcode/starcoder2-3b | Q8 | 27.49 tok/s | 3GB |
| parler-tts/parler-tts-large-v1 | Q4 | 27.32 tok/s | 4GB |
| google/gemma-2-9b-it | Q4 | 27.23 tok/s | 6GB |
| rinna/japanese-gpt-neox-small | Q4 | 27.11 tok/s | 4GB |
| openai-community/gpt2-xl | Q4 | 27.08 tok/s | 4GB |
| microsoft/Phi-3-mini-4k-instruct | Q4 | 27.01 tok/s | 4GB |
| NousResearch/Hermes-3-Llama-3.1-8B | Q4 | 27.00 tok/s | 5GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q8 | 26.84 tok/s | 4GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q4 | 26.80 tok/s | 4GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | 26.79 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-0528 | Q4 | 26.77 tok/s | 4GB |
| dicta-il/dictalm2.0-instruct | Q4 | 26.76 tok/s | 4GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q4 | 26.72 tok/s | 4GB |
| Qwen/Qwen3-8B-Base | Q4 | 26.72 tok/s | 4GB |
| Qwen/Qwen2.5-3B-Instruct | Q8 | 26.70 tok/s | 3GB |
| HuggingFaceTB/SmolLM-135M | Q4 | 26.61 tok/s | 4GB |
| sshleifer/tiny-gpt2 | Q4 | 26.61 tok/s | 4GB |
| Qwen/Qwen3-4B-Base | Q8 | 26.60 tok/s | 4GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | 26.59 tok/s | 5GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q8 | 26.50 tok/s | 4GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | 26.46 tok/s | 4GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q4 | 26.44 tok/s | 4GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | 26.43 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3-0324 | Q4 | 26.34 tok/s | 4GB |
| openai-community/gpt2 | Q4 | 26.24 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1 | Q4 | 26.22 tok/s | 4GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q4 | 26.19 tok/s | 4GB |
| microsoft/Phi-4-multimodal-instruct | Q4 | 26.06 tok/s | 4GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q4 | 26.03 tok/s | 4GB |
| Qwen/Qwen2.5-3B | Q8 | 26.03 tok/s | 3GB |
| Qwen/Qwen3-1.7B-Base | Q4 | 26.00 tok/s | 4GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q4 | 25.97 tok/s | 4GB |
| Qwen/Qwen2.5-7B | Q4 | 25.95 tok/s | 4GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q4 | 25.83 tok/s | 4GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q8 | 25.58 tok/s | 3GB |
| meta-llama/Llama-3.1-8B | Q4 | 25.56 tok/s | 4GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 25.54 tok/s | 5GB |
| GSAI-ML/LLaDA-8B-Instruct | Q4 | 25.53 tok/s | 4GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | 25.44 tok/s | 4GB |
| meta-llama/Llama-3.2-3B | Q8 | 25.21 tok/s | 3GB |
| meta-llama/Meta-Llama-3-8B | Q4 | 25.07 tok/s | 4GB |
| Qwen/Qwen2.5-0.5B | Q8 | 25.02 tok/s | 5GB |
| Qwen/Qwen2.5-Math-1.5B | Q8 | 24.81 tok/s | 5GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q8 | 24.77 tok/s | 5GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | 24.37 tok/s | 5GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q8 | 24.33 tok/s | 4GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | 24.28 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q8 | 23.97 tok/s | 4GB |
| Qwen/Qwen3-14B-Base | Q4 | 23.88 tok/s | 7GB |
| Qwen/Qwen3-4B | Q8 | 23.81 tok/s | 4GB |
| microsoft/VibeVoice-1.5B | Q8 | 23.47 tok/s | 5GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q8 | 23.46 tok/s | 4GB |
| Qwen/Qwen2.5-Coder-1.5B | Q8 | 23.28 tok/s | 5GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | 23.18 tok/s | 7GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q8 | 23.15 tok/s | 5GB |
| Qwen/Qwen3-Embedding-4B | Q8 | 23.11 tok/s | 4GB |
| microsoft/Phi-3-medium-128k-instruct | Q4 | 22.96 tok/s | 8GB |
| Qwen/Qwen3-Embedding-0.6B | Q8 | 22.50 tok/s | 6GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q8 | 22.37 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1 | Q8 | 22.18 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q8 | 22.14 tok/s | 5GB |
| Qwen/Qwen2-1.5B-Instruct | Q8 | 22.05 tok/s | 5GB |
| Qwen/Qwen3-0.6B-Base | Q8 | 21.98 tok/s | 6GB |
| llamafactory/tiny-random-Llama-3 | Q8 | 21.94 tok/s | 7GB |
| Qwen/Qwen3-14B | Q4 | 21.93 tok/s | 7GB |
| facebook/opt-125m | Q8 | 21.93 tok/s | 7GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | 21.90 tok/s | 9GB |
| Qwen/Qwen2-0.5B-Instruct | Q8 | 21.80 tok/s | 5GB |
| zai-org/GLM-4.5-Air | Q8 | 21.79 tok/s | 7GB |
| Qwen/Qwen2.5-1.5B | Q8 | 21.76 tok/s | 5GB |
| OpenPipe/Qwen3-14B-Instruct | Q4 | 21.75 tok/s | 7GB |
| Qwen/Qwen2-0.5B | Q8 | 21.70 tok/s | 5GB |
| microsoft/Phi-4-mini-instruct | Q8 | 21.66 tok/s | 7GB |
| ibm-granite/granite-docling-258M | Q8 | 21.63 tok/s | 7GB |
| ai-forever/ruGPT-3.5-13B | Q4 | 21.59 tok/s | 7GB |
| vikhyatk/moondream2 | Q8 | 21.32 tok/s | 7GB |
| Qwen/Qwen2.5-7B | Q8 | 21.30 tok/s | 7GB |
| meta-llama/Meta-Llama-3-8B | Q8 | 21.11 tok/s | 8GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | 21.02 tok/s | 7GB |
| skt/kogpt2-base-v2 | Q8 | 21.01 tok/s | 7GB |
| microsoft/phi-2 | Q8 | 21.01 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | 20.95 tok/s | 8GB |
| deepseek-ai/DeepSeek-V3-0324 | Q8 | 20.93 tok/s | 7GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q8 | 20.89 tok/s | 7GB |
| Qwen/Qwen3-8B | Q8 | 20.74 tok/s | 8GB |
| meta-llama/Llama-2-13b-chat-hf | Q4 | 20.73 tok/s | 7GB |
| HuggingFaceH4/zephyr-7b-beta | Q8 | 20.73 tok/s | 7GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q8 | 20.68 tok/s | 5GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | 20.68 tok/s | 7GB |
| EleutherAI/gpt-neo-125m | Q8 | 20.67 tok/s | 7GB |
| rednote-hilab/dots.ocr | Q8 | 20.62 tok/s | 7GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q8 | 20.62 tok/s | 8GB |
| Qwen/Qwen3-Reranker-0.6B | Q8 | 20.59 tok/s | 6GB |
| Qwen/Qwen3-1.7B | Q8 | 20.59 tok/s | 7GB |
| mistralai/Mistral-7B-v0.1 | Q8 | 20.59 tok/s | 7GB |
| BSC-LT/salamandraTA-7b-instruct | Q8 | 20.45 tok/s | 7GB |
| HuggingFaceTB/SmolLM-135M | Q8 | 20.44 tok/s | 7GB |
| google/gemma-3-270m-it | Q8 | 20.36 tok/s | 7GB |
| liuhaotian/llava-v1.5-7b | Q8 | 20.31 tok/s | 7GB |
| Qwen/Qwen3-8B-FP8 | Q8 | 20.26 tok/s | 8GB |
| NousResearch/Hermes-3-Llama-3.1-8B | Q8 | 20.23 tok/s | 9GB |
| microsoft/Phi-3-mini-4k-instruct | Q8 | 20.21 tok/s | 7GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q4 | 20.18 tok/s | 10GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q8 | 20.17 tok/s | 8GB |
| Qwen/Qwen2.5-14B | Q4 | 20.12 tok/s | 7GB |
| MiniMaxAI/MiniMax-M2 | Q8 | 20.07 tok/s | 7GB |
| microsoft/DialoGPT-small | Q8 | 20.05 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-0528 | Q8 | 20.05 tok/s | 7GB |
| hmellor/tiny-random-LlamaForCausalLM | Q8 | 20.02 tok/s | 7GB |
| deepseek-ai/DeepSeek-V3.1 | Q8 | 19.98 tok/s | 7GB |
| ibm-granite/granite-3.3-8b-instruct | Q8 | 19.97 tok/s | 8GB |
| numind/NuExtract-1.5 | Q8 | 19.97 tok/s | 7GB |
| Qwen/Qwen3-8B-Base | Q8 | 19.86 tok/s | 8GB |
| HuggingFaceTB/SmolLM2-135M | Q8 | 19.85 tok/s | 7GB |
| deepseek-ai/DeepSeek-V3 | Q8 | 19.82 tok/s | 7GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | 19.76 tok/s | 7GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | 19.73 tok/s | 8GB |
| dicta-il/dictalm2.0-instruct | Q8 | 19.73 tok/s | 7GB |
| Qwen/Qwen3-0.6B | Q8 | 19.69 tok/s | 6GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q8 | 19.69 tok/s | 8GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q8 | 19.62 tok/s | 7GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q8 | 19.58 tok/s | 8GB |
| meta-llama/Llama-2-7b-chat-hf | Q8 | 19.47 tok/s | 7GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | 19.45 tok/s | 7GB |
| EleutherAI/pythia-70m-deduped | Q8 | 19.45 tok/s | 7GB |
| meta-llama/Llama-2-7b-hf | Q8 | 19.41 tok/s | 7GB |
| GSAI-ML/LLaDA-8B-Base | Q8 | 19.37 tok/s | 8GB |
| Qwen/Qwen2-7B-Instruct | Q8 | 19.37 tok/s | 7GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q8 | 19.24 tok/s | 7GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q8 | 19.21 tok/s | 8GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q8 | 19.21 tok/s | 9GB |
| petals-team/StableBeluga2 | Q8 | 19.18 tok/s | 7GB |
| Qwen/Qwen3-Embedding-8B | Q8 | 19.16 tok/s | 8GB |
| huggyllama/llama-7b | Q8 | 19.02 tok/s | 7GB |
| openai-community/gpt2-xl | Q8 | 19.00 tok/s | 7GB |
| microsoft/phi-4 | Q8 | 18.99 tok/s | 7GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | 18.99 tok/s | 9GB |
| microsoft/DialoGPT-medium | Q8 | 18.98 tok/s | 7GB |
| distilbert/distilgpt2 | Q8 | 18.94 tok/s | 7GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q8 | 18.86 tok/s | 7GB |
| openai-community/gpt2 | Q8 | 18.80 tok/s | 7GB |
| IlyaGusev/saiga_llama3_8b | Q8 | 18.74 tok/s | 8GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q8 | 18.72 tok/s | 7GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q8 | 18.69 tok/s | 7GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q8 | 18.62 tok/s | 7GB |
| parler-tts/parler-tts-large-v1 | Q8 | 18.58 tok/s | 7GB |
| openai-community/gpt2-medium | Q8 | 18.58 tok/s | 7GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q8 | 18.55 tok/s | 8GB |
| rinna/japanese-gpt-neox-small | Q8 | 18.49 tok/s | 7GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q8 | 18.45 tok/s | 7GB |
| microsoft/Phi-3.5-vision-instruct | Q8 | 18.44 tok/s | 7GB |
| openai-community/gpt2-large | Q8 | 18.43 tok/s | 7GB |
| sshleifer/tiny-gpt2 | Q8 | 18.41 tok/s | 7GB |
| microsoft/Phi-4-multimodal-instruct | Q8 | 18.41 tok/s | 7GB |
| lmsys/vicuna-7b-v1.5 | Q8 | 18.39 tok/s | 7GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | 18.38 tok/s | 9GB |
| unsloth/gpt-oss-20b-BF16 | Q4 | 18.37 tok/s | 10GB |
| Qwen/Qwen3-1.7B-Base | Q8 | 18.29 tok/s | 7GB |
| GSAI-ML/LLaDA-8B-Instruct | Q8 | 18.28 tok/s | 8GB |
| zai-org/GLM-4.6-FP8 | Q8 | 18.23 tok/s | 7GB |
| bigscience/bloomz-560m | Q8 | 18.20 tok/s | 7GB |
| microsoft/Phi-3-mini-128k-instruct | Q8 | 18.20 tok/s | 7GB |
| google/gemma-2-9b-it | Q8 | 18.01 tok/s | 11GB |
| openai/gpt-oss-20b | Q4 | 17.74 tok/s | 10GB |
| meta-llama/Llama-Guard-3-8B | Q8 | 17.65 tok/s | 8GB |
| meta-llama/Llama-3.1-8B | Q8 | 17.58 tok/s | 8GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | 17.46 tok/s | 8GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q4 | 17.25 tok/s | 10GB |
Note: All speeds above are calculated estimates from an auto-generated benchmark, not measured results; real-world performance may vary. Methodology · Submit real data
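Because every speed above is an estimate, it is worth measuring real throughput on your own card before settling on a model. Below is a minimal sketch using llama-cpp-python (not this site's benchmark harness); the model path and prompt are placeholders, and any local GGUF file works:

```python
# Measure real decode throughput locally with llama-cpp-python.
# A sketch, not this site's methodology. Assumes `pip install llama-cpp-python`
# built with CUDA support and a local GGUF file; the path below is a placeholder.
import time

from llama_cpp import Llama

MODEL_PATH = "models/Llama-3.2-1B-Instruct.Q4_K_M.gguf"  # placeholder path

llm = Llama(model_path=MODEL_PATH, n_gpu_layers=-1, verbose=False)  # offload all layers

start = time.perf_counter()
out = llm("Explain what VRAM is in one paragraph.", max_tokens=256)
elapsed = time.perf_counter() - start

completion_tokens = out["usage"]["completion_tokens"]  # tokens actually generated
# Note: elapsed includes prompt processing, so this slightly understates decode speed.
print(f"{completion_tokens} tokens in {elapsed:.2f}s -> {completion_tokens / elapsed:.1f} tok/s")
```

The compatibility table below pairs the same estimated speeds with a fit verdict against the card's 12GB.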
| Model | Quantization | Verdict | Estimated speed | VRAM needed (have 12GB) |
|---|---|---|---|---|
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q8 | Fits comfortably | 19.21 tok/s | 8GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q4 | Fits comfortably | 25.83 tok/s | 4GB |
| unsloth/Llama-3.2-1B-Instruct | Q8 | Fits comfortably | 37.71 tok/s | 1GB |
| unsloth/Llama-3.2-1B-Instruct | Q4 | Fits comfortably | 63.66 tok/s | 1GB |
| GSAI-ML/LLaDA-8B-Instruct | Q8 | Fits comfortably | 18.28 tok/s | 8GB |
| GSAI-ML/LLaDA-8B-Instruct | Q4 | Fits comfortably | 25.53 tok/s | 4GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q8 | Not supported | — | 90GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q4 | Not supported | — | 45GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q8 | Fits comfortably | 18.45 tok/s | 7GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q4 | Fits comfortably | 27.97 tok/s | 4GB |
| numind/NuExtract-1.5 | Q8 | Fits comfortably | 19.97 tok/s | 7GB |
| numind/NuExtract-1.5 | Q4 | Fits comfortably | 27.83 tok/s | 4GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q8 | Fits comfortably | 19.24 tok/s | 7GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q4 | Fits comfortably | 30.54 tok/s | 4GB |
| hmellor/tiny-random-LlamaForCausalLM | Q8 | Fits comfortably | 20.02 tok/s | 7GB |
| hmellor/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 29.20 tok/s | 4GB |
| huggyllama/llama-7b | Q8 | Fits comfortably | 19.02 tok/s | 7GB |
| huggyllama/llama-7b | Q4 | Fits comfortably | 29.61 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3-0324 | Q8 | Fits comfortably | 20.93 tok/s | 7GB |
| deepseek-ai/DeepSeek-V3-0324 | Q4 | Fits comfortably | 26.34 tok/s | 4GB |
| microsoft/Phi-3-mini-128k-instruct | Q8 | Fits comfortably | 18.20 tok/s | 7GB |
| microsoft/Phi-3-mini-128k-instruct | Q4 | Fits comfortably | 29.90 tok/s | 4GB |
| sshleifer/tiny-gpt2 | Q8 | Fits comfortably | 18.41 tok/s | 7GB |
| sshleifer/tiny-gpt2 | Q4 | Fits comfortably | 26.61 tok/s | 4GB |
| meta-llama/Llama-Guard-3-8B | Q8 | Fits comfortably | 17.65 tok/s | 8GB |
| meta-llama/Llama-Guard-3-8B | Q4 | Fits comfortably | 28.10 tok/s | 4GB |
| openai-community/gpt2-xl | Q8 | Fits comfortably | 19.00 tok/s | 7GB |
| openai-community/gpt2-xl | Q4 | Fits comfortably | 27.08 tok/s | 4GB |
| OpenPipe/Qwen3-14B-Instruct | Q8 | Not supported | — | 14GB |
| OpenPipe/Qwen3-14B-Instruct | Q4 | Fits comfortably | 21.75 tok/s | 7GB |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q8 | Not supported | — | 70GB |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q4 | Not supported | — | 35GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q8 | Fits comfortably | 26.50 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q4 | Fits comfortably | 37.12 tok/s | 2GB |
| ibm-research/PowerMoE-3b | Q8 | Fits comfortably | 27.92 tok/s | 3GB |
| ibm-research/PowerMoE-3b | Q4 | Fits comfortably | 36.71 tok/s | 2GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q8 | Fits comfortably | 23.97 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q4 | Fits comfortably | 36.73 tok/s | 2GB |
| unsloth/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 29.30 tok/s | 3GB |
| unsloth/Llama-3.2-3B-Instruct | Q4 | Fits comfortably | 38.77 tok/s | 2GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q8 | Fits comfortably | 24.33 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q4 | Fits comfortably | 33.14 tok/s | 2GB |
| meta-llama/Llama-3.2-3B | Q8 | Fits comfortably | 25.21 tok/s | 3GB |
| meta-llama/Llama-3.2-3B | Q4 | Fits comfortably | 38.53 tok/s | 2GB |
| EleutherAI/gpt-neo-125m | Q8 | Fits comfortably | 20.67 tok/s | 7GB |
| EleutherAI/gpt-neo-125m | Q4 | Fits comfortably | 29.51 tok/s | 4GB |
| codellama/CodeLlama-34b-hf | Q8 | Not supported | — | 34GB |
| codellama/CodeLlama-34b-hf | Q4 | Not supported | — | 17GB |
| meta-llama/Llama-Guard-3-1B | Q8 | Fits comfortably | 38.97 tok/s | 1GB |
| meta-llama/Llama-Guard-3-1B | Q4 | Fits comfortably | 61.89 tok/s | 1GB |
| Qwen/Qwen2-1.5B-Instruct | Q8 | Fits comfortably | 22.05 tok/s | 5GB |
| Qwen/Qwen2-1.5B-Instruct | Q4 | Fits comfortably | 31.94 tok/s | 3GB |
| google/gemma-2-2b-it | Q8 | Fits comfortably | 34.14 tok/s | 2GB |
| google/gemma-2-2b-it | Q4 | Fits comfortably | 46.22 tok/s | 1GB |
| Qwen/Qwen2.5-14B | Q8 | Not supported | — | 14GB |
| Qwen/Qwen2.5-14B | Q4 | Fits comfortably | 20.12 tok/s | 7GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q8 | Not supported | — | 32GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q4 | Not supported | — | 16GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | Fits comfortably | 19.45 tok/s | 7GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | Fits comfortably | 26.46 tok/s | 4GB |
| Qwen/Qwen3-4B-Base | Q8 | Fits comfortably | 26.60 tok/s | 4GB |
| Qwen/Qwen3-4B-Base | Q4 | Fits comfortably | 37.27 tok/s | 2GB |
| Qwen/Qwen2-7B-Instruct | Q8 | Fits comfortably | 19.37 tok/s | 7GB |
| Qwen/Qwen2-7B-Instruct | Q4 | Fits comfortably | 29.00 tok/s | 4GB |
| meta-llama/Llama-2-7b-chat-hf | Q8 | Fits comfortably | 19.47 tok/s | 7GB |
| meta-llama/Llama-2-7b-chat-hf | Q4 | Fits comfortably | 31.56 tok/s | 4GB |
| Qwen/Qwen3-14B-Base | Q8 | Not supported | — | 14GB |
| Qwen/Qwen3-14B-Base | Q4 | Fits comfortably | 23.88 tok/s | 7GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q8 | Fits comfortably | 18.55 tok/s | 8GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q4 | Fits comfortably | 29.95 tok/s | 4GB |
| microsoft/Phi-3.5-vision-instruct | Q8 | Fits comfortably | 18.44 tok/s | 7GB |
| microsoft/Phi-3.5-vision-instruct | Q4 | Fits comfortably | 27.96 tok/s | 4GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q8 | Fits comfortably | 18.69 tok/s | 7GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q4 | Fits comfortably | 25.97 tok/s | 4GB |
| rinna/japanese-gpt-neox-small | Q8 | Fits comfortably | 18.49 tok/s | 7GB |
| rinna/japanese-gpt-neox-small | Q4 | Fits comfortably | 27.11 tok/s | 4GB |
| Qwen/Qwen2.5-Coder-1.5B | Q8 | Fits comfortably | 23.28 tok/s | 5GB |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | Fits comfortably | 30.02 tok/s | 3GB |
| IlyaGusev/saiga_llama3_8b | Q8 | Fits comfortably | 18.74 tok/s | 8GB |
| IlyaGusev/saiga_llama3_8b | Q4 | Fits comfortably | 27.96 tok/s | 4GB |
| Qwen/Qwen3-30B-A3B | Q8 | Not supported | — | 30GB |
| Qwen/Qwen3-30B-A3B | Q4 | Not supported | — | 15GB |
| deepseek-ai/DeepSeek-R1 | Q8 | Fits comfortably | 22.18 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1 | Q4 | Fits comfortably | 26.22 tok/s | 4GB |
| microsoft/DialoGPT-small | Q8 | Fits comfortably | 20.05 tok/s | 7GB |
| microsoft/DialoGPT-small | Q4 | Fits comfortably | 29.67 tok/s | 4GB |
| Qwen/Qwen3-8B-FP8 | Q8 | Fits comfortably | 20.26 tok/s | 8GB |
| Qwen/Qwen3-8B-FP8 | Q4 | Fits comfortably | 27.67 tok/s | 4GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q8 | Not supported | — | 30GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q4 | Not supported | — | 15GB |
| Qwen/Qwen3-Embedding-4B | Q8 | Fits comfortably | 23.11 tok/s | 4GB |
| Qwen/Qwen3-Embedding-4B | Q4 | Fits comfortably | 32.02 tok/s | 2GB |
| microsoft/Phi-4-multimodal-instruct | Q8 | Fits comfortably | 18.41 tok/s | 7GB |
| microsoft/Phi-4-multimodal-instruct | Q4 | Fits comfortably | 26.06 tok/s | 4GB |
| Qwen/Qwen3-8B-Base | Q8 | Fits comfortably | 19.86 tok/s | 8GB |
| Qwen/Qwen3-8B-Base | Q4 | Fits comfortably | 26.72 tok/s | 4GB |
| Qwen/Qwen3-0.6B-Base | Q8 | Fits comfortably | 21.98 tok/s | 6GB |
| Qwen/Qwen3-0.6B-Base | Q4 | Fits comfortably | 31.18 tok/s | 3GB |
| openai-community/gpt2-medium | Q8 | Fits comfortably | 18.58 tok/s | 7GB |
| openai-community/gpt2-medium | Q4 | Fits comfortably | 28.82 tok/s | 4GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q8 | Fits comfortably | 18.62 tok/s | 7GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 26.43 tok/s | 4GB |
| Qwen/Qwen2.5-Math-1.5B | Q8 | Fits comfortably | 24.81 tok/s | 5GB |
| Qwen/Qwen2.5-Math-1.5B | Q4 | Fits comfortably | 31.84 tok/s | 3GB |
| HuggingFaceTB/SmolLM-135M | Q8 | Fits comfortably | 20.44 tok/s | 7GB |
| HuggingFaceTB/SmolLM-135M | Q4 | Fits comfortably | 26.61 tok/s | 4GB |
| unsloth/gpt-oss-20b-BF16 | Q8 | Not supported | — | 20GB |
| unsloth/gpt-oss-20b-BF16 | Q4 | Fits comfortably | 18.37 tok/s | 10GB |
| meta-llama/Meta-Llama-3-70B-Instruct | Q8 | Not supported | — | 70GB |
| meta-llama/Meta-Llama-3-70B-Instruct | Q4 | Not supported | — | 35GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 19.69 tok/s | 8GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 29.19 tok/s | 4GB |
| zai-org/GLM-4.5-Air | Q8 | Fits comfortably | 21.79 tok/s | 7GB |
| zai-org/GLM-4.5-Air | Q4 | Fits comfortably | 29.22 tok/s | 4GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q8 | Fits comfortably | 18.86 tok/s | 7GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q4 | Fits comfortably | 29.63 tok/s | 4GB |
| LiquidAI/LFM2-1.2B | Q8 | Fits comfortably | 29.01 tok/s | 2GB |
| LiquidAI/LFM2-1.2B | Q4 | Fits comfortably | 45.57 tok/s | 1GB |
| mistralai/Mistral-7B-v0.1 | Q8 | Fits comfortably | 20.59 tok/s | 7GB |
| mistralai/Mistral-7B-v0.1 | Q4 | Fits comfortably | 31.26 tok/s | 4GB |
| Qwen/Qwen2.5-32B-Instruct | Q8 | Not supported | — | 32GB |
| Qwen/Qwen2.5-32B-Instruct | Q4 | Not supported | — | 16GB |
| deepseek-ai/DeepSeek-R1-0528 | Q8 | Fits comfortably | 20.05 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-0528 | Q4 | Fits comfortably | 26.77 tok/s | 4GB |
| meta-llama/Llama-3.1-8B | Q8 | Fits comfortably | 17.58 tok/s | 8GB |
| meta-llama/Llama-3.1-8B | Q4 | Fits comfortably | 25.56 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3.1 | Q8 | Fits comfortably | 19.98 tok/s | 7GB |
| deepseek-ai/DeepSeek-V3.1 | Q4 | Fits comfortably | 28.99 tok/s | 4GB |
| microsoft/phi-4 | Q8 | Fits comfortably | 18.99 tok/s | 7GB |
| microsoft/phi-4 | Q4 | Fits comfortably | 31.29 tok/s | 4GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | Fits comfortably | 27.73 tok/s | 3GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | Fits comfortably | 39.52 tok/s | 2GB |
| Qwen/Qwen2-0.5B | Q8 | Fits comfortably | 21.70 tok/s | 5GB |
| Qwen/Qwen2-0.5B | Q4 | Fits comfortably | 30.19 tok/s | 3GB |
| MiniMaxAI/MiniMax-M2 | Q8 | Fits comfortably | 20.07 tok/s | 7GB |
| MiniMaxAI/MiniMax-M2 | Q4 | Fits comfortably | 30.80 tok/s | 4GB |
| microsoft/DialoGPT-medium | Q8 | Fits comfortably | 18.98 tok/s | 7GB |
| microsoft/DialoGPT-medium | Q4 | Fits comfortably | 28.38 tok/s | 4GB |
| zai-org/GLM-4.6-FP8 | Q8 | Fits comfortably | 18.23 tok/s | 7GB |
| zai-org/GLM-4.6-FP8 | Q4 | Fits comfortably | 29.00 tok/s | 4GB |
| HuggingFaceTB/SmolLM2-135M | Q8 | Fits comfortably | 19.85 tok/s | 7GB |
| HuggingFaceTB/SmolLM2-135M | Q4 | Fits comfortably | 30.94 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | Fits comfortably | 20.95 tok/s | 8GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q4 | Fits comfortably | 29.85 tok/s | 4GB |
| meta-llama/Llama-2-7b-hf | Q8 | Fits comfortably | 19.41 tok/s | 7GB |
| meta-llama/Llama-2-7b-hf | Q4 | Fits comfortably | 28.14 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | Fits comfortably | 21.02 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | Fits comfortably | 30.16 tok/s | 4GB |
| microsoft/phi-2 | Q8 | Fits comfortably | 21.01 tok/s | 7GB |
| microsoft/phi-2 | Q4 | Fits comfortably | 30.15 tok/s | 4GB |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | Not supported | — | 70GB |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | Not supported | — | 35GB |
| Qwen/Qwen2.5-0.5B | Q8 | Fits comfortably | 25.02 tok/s | 5GB |
| Qwen/Qwen2.5-0.5B | Q4 | Fits comfortably | 29.53 tok/s | 3GB |
| Qwen/Qwen3-14B | Q8 | Not supported | — | 14GB |
| Qwen/Qwen3-14B | Q4 | Fits comfortably | 21.93 tok/s | 7GB |
| Qwen/Qwen3-Embedding-8B | Q8 | Fits comfortably | 19.16 tok/s | 8GB |
| Qwen/Qwen3-Embedding-8B | Q4 | Fits comfortably | 28.61 tok/s | 4GB |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | Not supported | — | 70GB |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | Not supported | — | 35GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q8 | Fits comfortably | 20.17 tok/s | 8GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q4 | Fits comfortably | 26.44 tok/s | 4GB |
| Qwen/Qwen2.5-14B-Instruct | Q8 | Not supported | — | 14GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | Fits comfortably | 23.18 tok/s | 7GB |
| Qwen/Qwen2.5-1.5B | Q8 | Fits comfortably | 21.76 tok/s | 5GB |
| Qwen/Qwen2.5-1.5B | Q4 | Fits comfortably | 29.57 tok/s | 3GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | Fits comfortably | 24.28 tok/s | 4GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q4 | Fits comfortably | 38.86 tok/s | 2GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q8 | Not supported | — | 20GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q4 | Fits comfortably | 17.25 tok/s | 10GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q8 | Fits comfortably | 22.14 tok/s | 5GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q4 | Fits comfortably | 31.93 tok/s | 3GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | Fits comfortably | 19.73 tok/s | 8GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | Fits comfortably | 26.79 tok/s | 4GB |
| Qwen/Qwen3-Reranker-0.6B | Q8 | Fits comfortably | 20.59 tok/s | 6GB |
| Qwen/Qwen3-Reranker-0.6B | Q4 | Fits comfortably | 31.43 tok/s | 3GB |
| rednote-hilab/dots.ocr | Q8 | Fits comfortably | 20.62 tok/s | 7GB |
| rednote-hilab/dots.ocr | Q4 | Fits comfortably | 31.25 tok/s | 4GB |
| google-t5/t5-3b | Q8 | Fits comfortably | 28.93 tok/s | 3GB |
| google-t5/t5-3b | Q4 | Fits comfortably | 37.78 tok/s | 2GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q8 | Not supported | — | 30GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q4 | Not supported | — | 15GB |
| Qwen/Qwen3-4B | Q8 | Fits comfortably | 23.81 tok/s | 4GB |
| Qwen/Qwen3-4B | Q4 | Fits comfortably | 38.29 tok/s | 2GB |
| Qwen/Qwen3-1.7B | Q8 | Fits comfortably | 20.59 tok/s | 7GB |
| Qwen/Qwen3-1.7B | Q4 | Fits comfortably | 30.89 tok/s | 4GB |
| openai-community/gpt2-large | Q8 | Fits comfortably | 18.43 tok/s | 7GB |
| openai-community/gpt2-large | Q4 | Fits comfortably | 29.28 tok/s | 4GB |
| microsoft/Phi-3-mini-4k-instruct | Q8 | Fits comfortably | 20.21 tok/s | 7GB |
| microsoft/Phi-3-mini-4k-instruct | Q4 | Fits comfortably | 27.01 tok/s | 4GB |
| allenai/OLMo-2-0425-1B | Q8 | Fits comfortably | 38.29 tok/s | 1GB |
| allenai/OLMo-2-0425-1B | Q4 | Fits comfortably | 58.81 tok/s | 1GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q8 | Not supported | — | 80GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q4 | Not supported | — | 40GB |
| Qwen/Qwen3-32B | Q8 | Not supported | — | 32GB |
| Qwen/Qwen3-32B | Q4 | Not supported | — | 16GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | Fits comfortably | 24.37 tok/s | 5GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q4 | Fits comfortably | 32.96 tok/s | 3GB |
| Qwen/Qwen2.5-7B | Q8 | Fits comfortably | 21.30 tok/s | 7GB |
| Qwen/Qwen2.5-7B | Q4 | Fits comfortably | 25.95 tok/s | 4GB |
| meta-llama/Meta-Llama-3-8B | Q8 | Fits comfortably | 21.11 tok/s | 8GB |
| meta-llama/Meta-Llama-3-8B | Q4 | Fits comfortably | 25.07 tok/s | 4GB |
| meta-llama/Llama-3.2-1B | Q8 | Fits comfortably | 37.87 tok/s | 1GB |
| meta-llama/Llama-3.2-1B | Q4 | Fits comfortably | 54.18 tok/s | 1GB |
| petals-team/StableBeluga2 | Q8 | Fits comfortably | 19.18 tok/s | 7GB |
| petals-team/StableBeluga2 | Q4 | Fits comfortably | 30.59 tok/s | 4GB |
| vikhyatk/moondream2 | Q8 | Fits comfortably | 21.32 tok/s | 7GB |
| vikhyatk/moondream2 | Q4 | Fits comfortably | 27.57 tok/s | 4GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 30.19 tok/s | 3GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | Fits comfortably | 42.77 tok/s | 2GB |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q8 | Not supported | — | 70GB |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q4 | Not supported | — | 35GB |
| distilbert/distilgpt2 | Q8 | Fits comfortably | 18.94 tok/s | 7GB |
| distilbert/distilgpt2 | Q4 | Fits comfortably | 28.62 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q8 | Not supported | — | 32GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q4 | Not supported | — | 16GB |
| inference-net/Schematron-3B | Q8 | Fits comfortably | 27.74 tok/s | 3GB |
| inference-net/Schematron-3B | Q4 | Fits comfortably | 42.23 tok/s | 2GB |
| Qwen/Qwen3-8B | Q8 | Fits comfortably | 20.74 tok/s | 8GB |
| Qwen/Qwen3-8B | Q4 | Fits comfortably | 28.09 tok/s | 4GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q8 | Fits comfortably | 18.72 tok/s | 7GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q4 | Fits comfortably | 26.03 tok/s | 4GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q8 | Fits comfortably | 25.58 tok/s | 3GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | Fits comfortably | 38.77 tok/s | 2GB |
| bigscience/bloomz-560m | Q8 | Fits comfortably | 18.20 tok/s | 7GB |
| bigscience/bloomz-560m | Q4 | Fits comfortably | 27.72 tok/s | 4GB |
| Qwen/Qwen2.5-3B-Instruct | Q8 | Fits comfortably | 26.70 tok/s | 3GB |
| Qwen/Qwen2.5-3B-Instruct | Q4 | Fits comfortably | 39.82 tok/s | 2GB |
| openai/gpt-oss-120b | Q8 | Not supported | — | 120GB |
| openai/gpt-oss-120b | Q4 | Not supported | — | 60GB |
| meta-llama/Llama-3.2-1B-Instruct | Q8 | Fits comfortably | 44.63 tok/s | 1GB |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | Fits comfortably | 61.43 tok/s | 1GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q8 | Fits comfortably | 22.37 tok/s | 4GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q4 | Fits comfortably | 38.56 tok/s | 2GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | Fits comfortably | 20.68 tok/s | 7GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q4 | Fits comfortably | 31.32 tok/s | 4GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | Fits comfortably | 42.52 tok/s | 1GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | Fits comfortably | 62.49 tok/s | 1GB |
| facebook/opt-125m | Q8 | Fits comfortably | 21.93 tok/s | 7GB |
| facebook/opt-125m | Q4 | Fits comfortably | 28.35 tok/s | 4GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q8 | Fits comfortably | 20.68 tok/s | 5GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | Fits comfortably | 35.70 tok/s | 3GB |
| Qwen/Qwen3-Embedding-0.6B | Q8 | Fits comfortably | 22.50 tok/s | 6GB |
| Qwen/Qwen3-Embedding-0.6B | Q4 | Fits comfortably | 29.03 tok/s | 3GB |
| google/gemma-3-1b-it | Q8 | Fits comfortably | 39.79 tok/s | 1GB |
| google/gemma-3-1b-it | Q4 | Fits comfortably | 58.31 tok/s | 1GB |
| openai/gpt-oss-20b | Q8 | Not supported | — | 20GB |
| openai/gpt-oss-20b | Q4 | Fits comfortably | 17.74 tok/s | 10GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q8 | Not supported | — | 34GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q4 | Not supported | — | 17GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 17.46 tok/s | 8GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 25.44 tok/s | 4GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q8 | Fits comfortably | 23.15 tok/s | 5GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q4 | Fits comfortably | 29.52 tok/s | 3GB |
| Qwen/Qwen3-0.6B | Q8 | Fits comfortably | 19.69 tok/s | 6GB |
| Qwen/Qwen3-0.6B | Q4 | Fits comfortably | 30.60 tok/s | 3GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | Fits comfortably | 19.76 tok/s | 7GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | Fits comfortably | 27.66 tok/s | 4GB |
| openai-community/gpt2 | Q8 | Fits comfortably | 18.80 tok/s | 7GB |
| openai-community/gpt2 | Q4 | Fits comfortably | 26.24 tok/s | 4GB |
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | Q8 | Not supported | — | 79GB |
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | Q4 | Not supported | — | 40GB |
| 01-ai/Yi-1.5-34B-Chat | Q8 | Not supported | — | 39GB |
| 01-ai/Yi-1.5-34B-Chat | Q4 | Not supported | — | 20GB |
| NousResearch/Hermes-3-Llama-3.1-8B | Q8 | Fits comfortably | 20.23 tok/s | 9GB |
| NousResearch/Hermes-3-Llama-3.1-8B | Q4 | Fits comfortably | 27.00 tok/s | 5GB |
| NousResearch/Hermes-3-Llama-3.1-70B | Q8 | Not supported | — | 79GB |
| NousResearch/Hermes-3-Llama-3.1-70B | Q4 | Not supported | — | 40GB |
| microsoft/Phi-3-medium-128k-instruct | Q8 | Not supported | — | 16GB |
| microsoft/Phi-3-medium-128k-instruct | Q4 | Fits comfortably | 22.96 tok/s | 8GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | Fits comfortably | 26.59 tok/s | 5GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | Fits comfortably | 37.93 tok/s | 3GB |
| google/gemma-2-9b-it | Q8 | Fits (tight) | 18.01 tok/s | 11GB |
| google/gemma-2-9b-it | Q4 | Fits comfortably | 27.23 tok/s | 6GB |
| google/gemma-2-27b-it | Q8 | Not supported | — | 31GB |
| google/gemma-2-27b-it | Q4 | Not supported | — | 16GB |
| mistralai/Mistral-Small-Instruct-2409 | Q8 | Not supported | — | 25GB |
| mistralai/Mistral-Small-Instruct-2409 | Q4 | Not supported | — | 13GB |
| mistralai/Mistral-Large-Instruct-2411 | Q8 | Not supported | — | 138GB |
| mistralai/Mistral-Large-Instruct-2411 | Q4 | Not supported | — | 69GB |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | Q8 | Not supported | — | 158GB |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | Q4 | Not supported | — | 79GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 27.75 tok/s | 4GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | Fits comfortably | 40.57 tok/s | 2GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 18.38 tok/s | 9GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 29.60 tok/s | 5GB |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | Not supported | — | 79GB |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | Not supported | — | 40GB |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | Not supported | — | 79GB |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | Not supported | — | 40GB |
| deepseek-ai/deepseek-coder-33b-instruct | Q8 | Not supported | — | 38GB |
| deepseek-ai/deepseek-coder-33b-instruct | Q4 | Not supported | — | 19GB |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | Q8 | Not supported | — | 264GB |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | Q4 | Not supported | — | 132GB |
| deepseek-ai/DeepSeek-V2.5 | Q8 | Not supported | — | 264GB |
| deepseek-ai/DeepSeek-V2.5 | Q4 | Not supported | — | 132GB |
| Qwen/Qwen2.5-Math-72B-Instruct | Q8 | Not supported | — | 82GB |
| Qwen/Qwen2.5-Math-72B-Instruct | Q4 | Not supported | — | 41GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | Fits comfortably | 18.99 tok/s | 9GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | Fits comfortably | 25.54 tok/s | 5GB |
| Qwen/Qwen2.5-14B-Instruct | Q8 | Not supported | — | 17GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | Fits comfortably | 21.90 tok/s | 9GB |
| Qwen/Qwen2.5-32B-Instruct | Q8 | Not supported | — | 37GB |
| Qwen/Qwen2.5-32B-Instruct | Q4 | Not supported | — | 19GB |
| Qwen/Qwen2.5-Coder-32B-Instruct | Q8 | Not supported | — | 37GB |
| Qwen/Qwen2.5-Coder-32B-Instruct | Q4 | Not supported | — | 19GB |
| Qwen/QwQ-32B-Preview | Q8 | Not supported | — | 37GB |
| Qwen/QwQ-32B-Preview | Q4 | Not supported | — | 19GB |
| Qwen/Qwen2.5-72B-Instruct | Q8 | Not supported | — | 82GB |
| Qwen/Qwen2.5-72B-Instruct | Q4 | Not supported | — | 41GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q8 | Not supported | — | 80GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q4 | Not supported | — | 40GB |
| ai-forever/ruGPT-3.5-13B | Q8 | Not supported | — | 13GB |
| ai-forever/ruGPT-3.5-13B | Q4 | Fits comfortably | 21.59 tok/s | 7GB |
| baichuan-inc/Baichuan-M2-32B | Q8 | Not supported | — | 32GB |
| baichuan-inc/Baichuan-M2-32B | Q4 | Not supported | — | 16GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q8 | Fits comfortably | 20.89 tok/s | 7GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 27.59 tok/s | 4GB |
| ibm-granite/granite-3.3-8b-instruct | Q8 | Fits comfortably | 19.97 tok/s | 8GB |
| ibm-granite/granite-3.3-8b-instruct | Q4 | Fits comfortably | 29.58 tok/s | 4GB |
| Qwen/Qwen3-1.7B-Base | Q8 | Fits comfortably | 18.29 tok/s | 7GB |
| Qwen/Qwen3-1.7B-Base | Q4 | Fits comfortably | 26.00 tok/s | 4GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q8 | Not supported | — | 20GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q4 | Fits comfortably | 20.18 tok/s | 10GB |
| BSC-LT/salamandraTA-7b-instruct | Q8 | Fits comfortably | 20.45 tok/s | 7GB |
| BSC-LT/salamandraTA-7b-instruct | Q4 | Fits comfortably | 28.10 tok/s | 4GB |
| dicta-il/dictalm2.0-instruct | Q8 | Fits comfortably | 19.73 tok/s | 7GB |
| dicta-il/dictalm2.0-instruct | Q4 | Fits comfortably | 26.76 tok/s | 4GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q8 | Not supported | — | 30GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q4 | Not supported | — | 15GB |
| GSAI-ML/LLaDA-8B-Base | Q8 | Fits comfortably | 19.37 tok/s | 8GB |
| GSAI-ML/LLaDA-8B-Base | Q4 | Fits comfortably | 28.38 tok/s | 4GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q8 | Not supported | — | 30GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q4 | Not supported | — | 15GB |
| Qwen/Qwen2-0.5B-Instruct | Q8 | Fits comfortably | 21.80 tok/s | 5GB |
| Qwen/Qwen2-0.5B-Instruct | Q4 | Fits comfortably | 30.82 tok/s | 3GB |
| deepseek-ai/DeepSeek-V3 | Q8 | Fits comfortably | 19.82 tok/s | 7GB |
| deepseek-ai/DeepSeek-V3 | Q4 | Fits comfortably | 30.46 tok/s | 4GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q8 | Fits comfortably | 24.77 tok/s | 5GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | Fits comfortably | 34.40 tok/s | 3GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q8 | Not supported | — | 30GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q4 | Not supported | — | 15GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q8 | Not supported | — | 30GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q4 | Not supported | — | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q8 | Not supported | — | 30GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q4 | Not supported | — | 15GB |
| AI-MO/Kimina-Prover-72B | Q8 | Not supported | — | 72GB |
| AI-MO/Kimina-Prover-72B | Q4 | Not supported | — | 36GB |
| apple/OpenELM-1_1B-Instruct | Q8 | Fits comfortably | 39.12 tok/s | 1GB |
| apple/OpenELM-1_1B-Instruct | Q4 | Fits comfortably | 59.60 tok/s | 1GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 20.62 tok/sEstimated | 8GB (have 12GB) |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 26.72 tok/sEstimated | 4GB (have 12GB) |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q8 | Fits comfortably | 19.21 tok/sEstimated | 9GB (have 12GB) |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q4 | Fits comfortably | 27.89 tok/sEstimated | 5GB (have 12GB) |
| Qwen/Qwen2.5-3B | Q8 | Fits comfortably | 26.03 tok/sEstimated | 3GB (have 12GB) |
| Qwen/Qwen2.5-3B | Q4 | Fits comfortably | 39.61 tok/sEstimated | 2GB (have 12GB) |
| lmsys/vicuna-7b-v1.5 | Q8 | Fits comfortably | 18.39 tok/sEstimated | 7GB (have 12GB) |
| lmsys/vicuna-7b-v1.5 | Q4 | Fits comfortably | 31.36 tok/sEstimated | 4GB (have 12GB) |
| meta-llama/Llama-2-13b-chat-hf | Q8 | Not supported | — | 13GB (have 12GB) |
| meta-llama/Llama-2-13b-chat-hf | Q4 | Fits comfortably | 20.73 tok/sEstimated | 7GB (have 12GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q8 | Not supported | — | 80GB (have 12GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q4 | Not supported | — | 40GB (have 12GB) |
| unsloth/gemma-3-1b-it | Q8 | Fits comfortably | 38.05 tok/sEstimated | 1GB (have 12GB) |
| unsloth/gemma-3-1b-it | Q4 | Fits comfortably | 54.12 tok/sEstimated | 1GB (have 12GB) |
| bigcode/starcoder2-3b | Q8 | Fits comfortably | 27.49 tok/sEstimated | 3GB (have 12GB) |
| bigcode/starcoder2-3b | Q4 | Fits comfortably | 37.91 tok/sEstimated | 2GB (have 12GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q8 | Not supported | — | 80GB (have 12GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q4 | Not supported | — | 40GB (have 12GB) |
| ibm-granite/granite-docling-258M | Q8 | Fits comfortably | 21.63 tok/sEstimated | 7GB (have 12GB) |
| ibm-granite/granite-docling-258M | Q4 | Fits comfortably | 30.42 tok/sEstimated | 4GB (have 12GB) |
| skt/kogpt2-base-v2 | Q8 | Fits comfortably | 21.01 tok/sEstimated | 7GB (have 12GB) |
| skt/kogpt2-base-v2 | Q4 | Fits comfortably | 31.05 tok/sEstimated | 4GB (have 12GB) |
| google/gemma-3-270m-it | Q8 | Fits comfortably | 20.36 tok/sEstimated | 7GB (have 12GB) |
| google/gemma-3-270m-it | Q4 | Fits comfortably | 30.07 tok/sEstimated | 4GB (have 12GB) |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q8 | Fits comfortably | 23.46 tok/sEstimated | 4GB (have 12GB) |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | Fits comfortably | 35.96 tok/sEstimated | 2GB (have 12GB) |
| Qwen/Qwen2.5-32B | Q8 | Not supported | — | 32GB (have 12GB) |
| Qwen/Qwen2.5-32B | Q4 | Not supported | — | 16GB (have 12GB) |
| parler-tts/parler-tts-large-v1 | Q8 | Fits comfortably | 18.58 tok/sEstimated | 7GB (have 12GB) |
| parler-tts/parler-tts-large-v1 | Q4 | Fits comfortably | 27.32 tok/sEstimated | 4GB (have 12GB) |
| EleutherAI/pythia-70m-deduped | Q8 | Fits comfortably | 19.45 tok/sEstimated | 7GB (have 12GB) |
| EleutherAI/pythia-70m-deduped | Q4 | Fits comfortably | 30.83 tok/sEstimated | 4GB (have 12GB) |
| microsoft/VibeVoice-1.5B | Q8 | Fits comfortably | 23.47 tok/sEstimated | 5GB (have 12GB) |
| microsoft/VibeVoice-1.5B | Q4 | Fits comfortably | 33.09 tok/sEstimated | 3GB (have 12GB) |
| ibm-granite/granite-3.3-2b-instruct | Q8 | Fits comfortably | 33.89 tok/sEstimated | 2GB (have 12GB) |
| ibm-granite/granite-3.3-2b-instruct | Q4 | Fits comfortably | 48.93 tok/sEstimated | 1GB (have 12GB) |
| Qwen/Qwen2.5-72B-Instruct | Q8 | Not supported | — | 72GB (have 12GB) |
| Qwen/Qwen2.5-72B-Instruct | Q4 | Not supported | — | 36GB (have 12GB) |
| liuhaotian/llava-v1.5-7b | Q8 | Fits comfortably | 20.31 tok/sEstimated | 7GB (have 12GB) |
| liuhaotian/llava-v1.5-7b | Q4 | Fits comfortably | 29.61 tok/sEstimated | 4GB (have 12GB) |
| google/gemma-2b | Q8 | Fits comfortably | 32.26 tok/sEstimated | 2GB (have 12GB) |
| google/gemma-2b | Q4 | Fits comfortably | 41.16 tok/sEstimated | 1GB (have 12GB) |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q8 | Fits comfortably | 19.62 tok/sEstimated | 7GB (have 12GB) |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q4 | Fits comfortably | 26.80 tok/sEstimated | 4GB (have 12GB) |
| Qwen/Qwen3-235B-A22B | Q8 | Not supported | — | 235GB (have 12GB) |
| Qwen/Qwen3-235B-A22B | Q4 | Not supported | — | 118GB (have 12GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q8 | Fits comfortably | 19.58 tok/sEstimated | 8GB (have 12GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q4 | Fits comfortably | 26.19 tok/sEstimated | 4GB (have 12GB) |
| microsoft/Phi-4-mini-instruct | Q8 | Fits comfortably | 21.66 tok/sEstimated | 7GB (have 12GB) |
| microsoft/Phi-4-mini-instruct | Q4 | Fits comfortably | 30.48 tok/sEstimated | 4GB (have 12GB) |
| llamafactory/tiny-random-Llama-3 | Q8 | Fits comfortably | 21.94 tok/sEstimated | 7GB (have 12GB) |
| llamafactory/tiny-random-Llama-3 | Q4 | Fits comfortably | 27.51 tok/sEstimated | 4GB (have 12GB) |
| HuggingFaceH4/zephyr-7b-beta | Q8 | Fits comfortably | 20.73 tok/sEstimated | 7GB (have 12GB) |
| HuggingFaceH4/zephyr-7b-beta | Q4 | Fits comfortably | 30.95 tok/sEstimated | 4GB (have 12GB) |
| Qwen/Qwen3-4B-Thinking-2507 | Q8 | Fits comfortably | 26.84 tok/sEstimated | 4GB (have 12GB) |
| Qwen/Qwen3-4B-Thinking-2507 | Q4 | Fits comfortably | 32.07 tok/sEstimated | 2GB (have 12GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q8 | Not supported | — | 30GB (have 12GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q4 | Not supported | — | 15GB (have 12GB) |
Note: performance figures above are calculated estimates, not measured benchmarks; real results may vary.
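For intuition, the fit verdicts in the table follow from simple arithmetic: bytes per weight (quantization bits / 8) times parameter count, plus headroom for runtime buffers. The sketch below reproduces the pattern; the ~13% overhead factor and the 1 GB "tight" threshold are assumptions chosen to match the rows above, not the site's published methodology.

```python
import math

def estimated_vram_gb(params_b: float, bits: int, overhead: float = 1.13) -> int:
    """Weights at `bits` per parameter, inflated ~13% for runtime buffers.

    The overhead factor is an assumption tuned to match the table above,
    not the published methodology.
    """
    return math.ceil(params_b * (bits / 8) * overhead)

def fit_verdict(needed_gb: int, vram_gb: int = 12) -> str:
    if needed_gb > vram_gb:
        return "Not supported"
    return "Fits (tight)" if vram_gb - needed_gb <= 1 else "Fits comfortably"

# Llama-3.1-8B at Q4 -> ~5 GB, "Fits comfortably" on a 12 GB card;
# gemma-2-9b at Q8 -> ~11 GB, "Fits (tight)"; a 70B at Q4 -> 40 GB, "Not supported".
for params, bits in [(8, 4), (9, 8), (70, 4)]:
    need = estimated_vram_gb(params, bits)
    print(f"{params}B @ Q{bits}: {need}GB -> {fit_verdict(need)}")
```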
Data-backed answers pulled from community benchmarks, manufacturer specs, and live pricing.
Bandwidth math from community benchmarking places the 12 GB RTX 4070 at about 10 tokens/sec on Llama 3 70B Q4. That figure is a theoretical ceiling: it assumes every layer could be held in VRAM and that kernels sustain ~70% of peak bandwidth, whereas a 70B Q4 model actually needs ~40 GB and must be partially offloaded on this card.
Source: Reddit – /r/LocalLLaMA (meaafcw)
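The arithmetic behind that estimate is worth spelling out: at decode time each generated token streams the full weight set through the memory bus once, so throughput is bounded by bandwidth divided by model size. A quick sketch, assuming the 4070's ~504 GB/s spec and ~40 GB of Q4 weights:

```python
bandwidth_gb_s = 504   # RTX 4070 GDDR6X peak bandwidth (TechPowerUp spec)
weights_gb = 40        # Llama 3 70B at Q4, per the table above
efficiency = 0.70      # assumed fraction of peak that kernels sustain

# Each decoded token reads every weight once, so:
tok_per_s = bandwidth_gb_s * efficiency / weights_gb
print(f"~{tok_per_s:.1f} tok/s")  # ~8.8 tok/s, i.e. on the order of 10
```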
Misconfigured offload is common: one RTX 4070 laptop user reported 1.7 tok/s until shifting more layers from system RAM onto the GPU in llama.cpp.
Source: Reddit – /r/LocalLLaMA (l2it43q)
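If you see similarly low numbers, check layer offload first. Here is a minimal sketch using the llama-cpp-python bindings (the CLI equivalent is llama.cpp's `-ngl`/`--n-gpu-layers` flag); the model path is a placeholder:

```python
from llama_cpp import Llama  # pip install llama-cpp-python (CUDA build)

# n_gpu_layers controls how many transformer layers live in VRAM.
# Left at the default of 0, the whole model runs from system RAM --
# the 1.7 tok/s failure mode described above. -1 offloads every layer.
llm = Llama(
    model_path="models/llama-3.1-8b-instruct-q4_k_m.gguf",  # placeholder path
    n_gpu_layers=-1,
    n_ctx=4096,
)
out = llm("Q: What is 2 + 2? A:", max_tokens=16)
print(out["choices"][0]["text"])
```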
More system RAM also helps: notebook owners upgrading from 32 GB to 64 GB report that larger context windows and higher quants stop thrashing once the extra memory is installed.
Source: Reddit – /r/LocalLLaMA (lrqupbc)
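The thrashing comes largely from the KV cache, which grows linearly with context length and spills into system RAM once VRAM is full. A back-of-envelope sketch, assuming Llama-3-8B-style geometry (32 layers, 8 KV heads, head dimension 128, fp16 cache):

```python
def kv_cache_gb(n_ctx: int, n_layers: int = 32, n_kv_heads: int = 8,
                head_dim: int = 128, bytes_per_elt: int = 2) -> float:
    # Both keys and values are cached: 2 tensors per layer per token.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elt
    return n_ctx * per_token / 1024**3

for ctx in (4096, 8192, 32768):
    print(f"{ctx:>6} ctx -> {kv_cache_gb(ctx):.2f} GB of KV cache")
# ~0.5 GB at 4k, ~1 GB at 8k, ~4 GB at 32k -- on top of the weights.
```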
The RTX 4070 carries a 200 W board power, provides 12 GB GDDR6X, and uses the 16-pin 12VHPWR plug. NVIDIA advises pairing it with a 650 W PSU.
Source: TechPowerUp – RTX 4070 Specs
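To check the draw on your own card rather than trusting the spec sheet, NVML exposes live power readings. A small sketch using the nvidia-ml-py bindings (values are reported in milliwatts):

```python
import pynvml  # pip install nvidia-ml-py

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # assumes the 4070 is GPU 0
watts = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000
limit = pynvml.nvmlDeviceGetEnforcedPowerLimit(handle) / 1000
print(f"{watts:.0f} W of {limit:.0f} W limit")  # expect ~200 W under load
pynvml.nvmlShutdown()
```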
As of 3 Nov 2025, Amazon listed RTX 4070 cards at $599 (in stock), Newegg at $619, and Best Buy at $599.
Source: Supabase price tracker snapshot – 2025-11-03
Related GPUs to explore for local inference workloads: RTX 4060 Ti 16GB, RX 6800 XT, and RTX 3080.