Quick Answer: The RTX 4070 Ti offers 12GB of VRAM and starts around $799. It delivers an estimated 81 tokens/sec on meta-llama/Llama-Guard-3-1B at Q4 and typically draws 285W under load.
The RTX 4070 Ti is the sweet spot for 7B–13B inference: it delivers solid tokens/sec without the power draw or price tag of the higher-end Ada cards.
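To make that concrete, here is a minimal sketch of running a Q4-quantized 7B model on a 12GB card with llama-cpp-python. The GGUF path is a placeholder for whichever Q4 quant you download; the library calls are real, but the settings are illustrative rather than tuned.

```python
# Minimal sketch: a Q4-quantized 7B model on a 12GB card via llama-cpp-python.
# The model path below is a placeholder; install with CUDA support, e.g.:
#   CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama(
    model_path="./mistral-7b-instruct-v0.2.Q4_K_M.gguf",  # placeholder GGUF file
    n_gpu_layers=-1,  # offload every layer; a Q4 7B (~4GB of weights) fits easily in 12GB
    n_ctx=4096,       # context window; longer contexts grow the KV cache in VRAM
)

out = llm("Summarize why quantization reduces VRAM use.", max_tokens=64)
print(out["choices"][0]["text"])
```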
| Model | Quantization | Tokens/sec (estimated) | VRAM used |
|---|---|---|---|
| meta-llama/Llama-Guard-3-1B | Q4 | 80.81 tok/s | 1GB |
| google/gemma-3-1b-it | Q4 | 80.74 tok/s | 1GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | 78.05 tok/s | 1GB |
| unsloth/gemma-3-1b-it | Q4 | 77.14 tok/s | 1GB |
| unsloth/Llama-3.2-1B-Instruct | Q4 | 72.56 tok/s | 1GB |
| allenai/OLMo-2-0425-1B | Q4 | 71.60 tok/s | 1GB |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | 70.02 tok/s | 1GB |
| meta-llama/Llama-3.2-1B | Q4 | 69.70 tok/s | 1GB |
| apple/OpenELM-1_1B-Instruct | Q4 | 69.43 tok/s | 1GB |
| ibm-granite/granite-3.3-2b-instruct | Q4 | 61.19 tok/s | 1GB |
| google/gemma-3-1b-it | Q8 | 57.58 tok/s | 1GB |
| google/gemma-2-2b-it | Q4 | 57.23 tok/s | 1GB |
| google/gemma-2b | Q4 | 57.20 tok/s | 1GB |
| meta-llama/Llama-3.2-1B-Instruct | Q8 | 56.87 tok/s | 1GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | 55.92 tok/s | 2GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | 55.17 tok/s | 1GB |
| LiquidAI/LFM2-1.2B | Q4 | 55.14 tok/s | 1GB |
| unsloth/Llama-3.2-3B-Instruct | Q4 | 55.06 tok/s | 2GB |
| ibm-research/PowerMoE-3b | Q4 | 54.57 tok/s | 2GB |
| allenai/OLMo-2-0425-1B | Q8 | 54.55 tok/s | 1GB |
| inference-net/Schematron-3B | Q4 | 54.32 tok/s | 2GB |
| bigcode/starcoder2-3b | Q4 | 54.12 tok/s | 2GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 54.03 tok/s | 2GB |
| meta-llama/Llama-Guard-3-1B | Q8 | 54.02 tok/s | 1GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | 52.96 tok/s | 2GB |
| meta-llama/Llama-3.2-3B | Q4 | 52.54 tok/s | 2GB |
| Qwen/Qwen2.5-3B-Instruct | Q4 | 51.87 tok/s | 2GB |
| apple/OpenELM-1_1B-Instruct | Q8 | 51.03 tok/s | 1GB |
| unsloth/gemma-3-1b-it | Q8 | 50.49 tok/s | 1GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q4 | 50.34 tok/s | 2GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q4 | 49.12 tok/s | 2GB |
| meta-llama/Llama-3.2-1B | Q8 | 48.79 tok/s | 1GB |
| unsloth/Llama-3.2-1B-Instruct | Q8 | 48.18 tok/s | 1GB |
| Qwen/Qwen3-Embedding-4B | Q4 | 47.92 tok/s | 2GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 47.57 tok/s | 2GB |
| Qwen/Qwen3-4B | Q4 | 47.16 tok/s | 2GB |
| google-t5/t5-3b | Q4 | 46.85 tok/s | 2GB |
| Qwen/Qwen2.5-3B | Q4 | 46.68 tok/s | 2GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | 46.40 tok/s | 3GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | 45.78 tok/s | 2GB |
| Qwen/Qwen2-0.5B-Instruct | Q4 | 44.71 tok/s | 3GB |
| Qwen/Qwen2.5-0.5B | Q4 | 44.67 tok/s | 3GB |
| google/gemma-2b | Q8 | 44.08 tok/s | 2GB |
| Qwen/Qwen3-4B-Base | Q4 | 43.57 tok/s | 2GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q4 | 43.41 tok/s | 2GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q4 | 43.24 tok/s | 2GB |
| Qwen/Qwen3-0.6B | Q4 | 42.29 tok/s | 3GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q4 | 42.19 tok/s | 2GB |
| Qwen/Qwen2.5-Math-1.5B | Q4 | 41.89 tok/s | 3GB |
| ibm-granite/granite-3.3-2b-instruct | Q8 | 41.80 tok/s | 2GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q4 | 41.63 tok/s | 2GB |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | 41.42 tok/s | 3GB |
| openai-community/gpt2 | Q4 | 41.30 tok/s | 4GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q4 | 41.28 tok/s | 3GB |
| zai-org/GLM-4.5-Air | Q4 | 41.19 tok/s | 4GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | 41.06 tok/s | 3GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | 41.04 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q4 | 40.98 tok/s | 3GB |
| bigscience/bloomz-560m | Q4 | 40.93 tok/s | 4GB |
| dicta-il/dictalm2.0-instruct | Q4 | 40.92 tok/s | 4GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 40.86 tok/s | 4GB |
| Qwen/Qwen2-1.5B-Instruct | Q4 | 40.79 tok/s | 3GB |
| liuhaotian/llava-v1.5-7b | Q4 | 40.62 tok/s | 4GB |
| LiquidAI/LFM2-1.2B | Q8 | 40.57 tok/s | 2GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | 40.46 tok/s | 3GB |
| google/gemma-3-270m-it | Q4 | 40.39 tok/s | 4GB |
| microsoft/Phi-3.5-vision-instruct | Q4 | 40.14 tok/s | 4GB |
| Qwen/Qwen3-1.7B-Base | Q4 | 40.04 tok/s | 4GB |
| microsoft/Phi-3-mini-4k-instruct | Q4 | 40.01 tok/s | 4GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q4 | 39.96 tok/s | 3GB |
| openai-community/gpt2-large | Q4 | 39.95 tok/s | 4GB |
| HuggingFaceH4/zephyr-7b-beta | Q4 | 39.87 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3.1 | Q4 | 39.83 tok/s | 4GB |
| ibm-granite/granite-docling-258M | Q4 | 39.69 tok/s | 4GB |
| Qwen/Qwen2-7B-Instruct | Q4 | 39.55 tok/s | 4GB |
| zai-org/GLM-4.6-FP8 | Q4 | 39.53 tok/s | 4GB |
| rinna/japanese-gpt-neox-small | Q4 | 39.46 tok/s | 4GB |
| petals-team/StableBeluga2 | Q4 | 39.38 tok/s | 4GB |
| Qwen/Qwen2.5-1.5B | Q4 | 39.28 tok/s | 3GB |
| meta-llama/Llama-2-7b-hf | Q4 | 39.27 tok/s | 4GB |
| Qwen/Qwen2-0.5B | Q4 | 39.26 tok/s | 3GB |
| microsoft/VibeVoice-1.5B | Q4 | 39.21 tok/s | 3GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | 39.20 tok/s | 4GB |
| microsoft/Phi-4-multimodal-instruct | Q4 | 39.02 tok/s | 4GB |
| meta-llama/Llama-3.1-8B | Q4 | 38.96 tok/s | 4GB |
| parler-tts/parler-tts-large-v1 | Q4 | 38.96 tok/s | 4GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q4 | 38.86 tok/s | 4GB |
| microsoft/phi-2 | Q4 | 38.81 tok/s | 4GB |
| microsoft/DialoGPT-medium | Q4 | 38.80 tok/s | 4GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q4 | 38.75 tok/s | 4GB |
| IlyaGusev/saiga_llama3_8b | Q4 | 38.69 tok/s | 4GB |
| openai-community/gpt2-medium | Q4 | 38.69 tok/s | 4GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q4 | 38.58 tok/s | 4GB |
| google/gemma-2-2b-it | Q8 | 38.50 tok/s | 2GB |
| GSAI-ML/LLaDA-8B-Instruct | Q4 | 38.48 tok/s | 4GB |
| meta-llama/Llama-Guard-3-8B | Q4 | 38.33 tok/s | 4GB |
| microsoft/Phi-4-mini-instruct | Q4 | 38.24 tok/s | 4GB |
| skt/kogpt2-base-v2 | Q4 | 38.19 tok/s | 4GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q4 | 38.12 tok/s | 4GB |
| huggyllama/llama-7b | Q4 | 38.09 tok/s | 4GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | 37.98 tok/s | 4GB |
| meta-llama/Meta-Llama-3-8B | Q4 | 37.79 tok/s | 4GB |
| facebook/opt-125m | Q4 | 37.74 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q4 | 37.71 tok/s | 4GB |
| Qwen/Qwen3-8B-Base | Q4 | 37.66 tok/s | 4GB |
| Qwen/Qwen3-8B | Q4 | 37.65 tok/s | 4GB |
| mistralai/Mistral-7B-v0.1 | Q4 | 37.56 tok/s | 4GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q8 | 37.55 tok/s | 3GB |
| Qwen/Qwen3-Embedding-8B | Q4 | 37.51 tok/s | 4GB |
| Qwen/Qwen3-0.6B-Base | Q4 | 37.45 tok/s | 3GB |
| deepseek-ai/DeepSeek-R1 | Q4 | 37.43 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3-0324 | Q4 | 37.38 tok/s | 4GB |
| openai-community/gpt2-xl | Q4 | 37.36 tok/s | 4GB |
| ibm-research/PowerMoE-3b | Q8 | 37.32 tok/s | 3GB |
| microsoft/phi-4 | Q4 | 37.11 tok/s | 4GB |
| numind/NuExtract-1.5 | Q4 | 37.06 tok/s | 4GB |
| HuggingFaceTB/SmolLM2-135M | Q4 | 36.96 tok/s | 4GB |
| sshleifer/tiny-gpt2 | Q4 | 36.84 tok/s | 4GB |
| Qwen/Qwen2.5-7B | Q4 | 36.79 tok/s | 4GB |
| ibm-granite/granite-3.3-8b-instruct | Q4 | 36.58 tok/s | 4GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 36.41 tok/s | 5GB |
| Qwen/Qwen2.5-3B-Instruct | Q8 | 36.39 tok/s | 3GB |
| Qwen/Qwen3-Reranker-0.6B | Q4 | 36.38 tok/s | 3GB |
| EleutherAI/pythia-70m-deduped | Q4 | 36.37 tok/s | 4GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q4 | 36.29 tok/s | 4GB |
| microsoft/DialoGPT-small | Q4 | 36.20 tok/s | 4GB |
| Qwen/Qwen3-Embedding-0.6B | Q4 | 36.06 tok/s | 3GB |
| llamafactory/tiny-random-Llama-3 | Q4 | 35.78 tok/s | 4GB |
| HuggingFaceTB/SmolLM-135M | Q4 | 35.74 tok/s | 4GB |
| distilbert/distilgpt2 | Q4 | 35.69 tok/s | 4GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q4 | 35.67 tok/s | 4GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | 35.66 tok/s | 4GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q4 | 35.65 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3 | Q4 | 35.52 tok/s | 4GB |
| meta-llama/Llama-3.2-3B | Q8 | 35.52 tok/s | 3GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q8 | 35.51 tok/s | 4GB |
| lmsys/vicuna-7b-v1.5 | Q4 | 35.32 tok/s | 4GB |
| hmellor/tiny-random-LlamaForCausalLM | Q4 | 35.16 tok/s | 4GB |
| Qwen/Qwen3-Embedding-4B | Q8 | 35.11 tok/s | 4GB |
| vikhyatk/moondream2 | Q4 | 35.09 tok/s | 4GB |
| Qwen/Qwen3-1.7B | Q4 | 35.08 tok/s | 4GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q4 | 35.04 tok/s | 5GB |
| BSC-LT/salamandraTA-7b-instruct | Q4 | 35.04 tok/s | 4GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q4 | 35.00 tok/s | 4GB |
| meta-llama/Llama-2-7b-chat-hf | Q4 | 34.80 tok/s | 4GB |
| google-t5/t5-3b | Q8 | 34.66 tok/s | 3GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | 34.62 tok/s | 3GB |
| EleutherAI/gpt-neo-125m | Q4 | 34.52 tok/s | 4GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q4 | 34.51 tok/s | 4GB |
| MiniMaxAI/MiniMax-M2 | Q4 | 34.35 tok/s | 4GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q4 | 34.33 tok/s | 4GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | 34.33 tok/s | 3GB |
| rednote-hilab/dots.ocr | Q4 | 34.31 tok/s | 4GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | 34.22 tok/s | 4GB |
| Qwen/Qwen2.5-3B | Q8 | 34.10 tok/s | 3GB |
| deepseek-ai/DeepSeek-R1-0528 | Q4 | 34.04 tok/s | 4GB |
| microsoft/Phi-3-mini-128k-instruct | Q4 | 34.04 tok/s | 4GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q4 | 33.92 tok/s | 4GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q4 | 33.89 tok/s | 4GB |
| Qwen/Qwen3-8B-FP8 | Q4 | 33.78 tok/s | 4GB |
| Qwen/Qwen3-4B | Q8 | 33.60 tok/s | 4GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q4 | 33.57 tok/s | 4GB |
| inference-net/Schematron-3B | Q8 | 33.54 tok/s | 3GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q8 | 33.52 tok/s | 4GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | 33.42 tok/s | 5GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q8 | 33.37 tok/s | 4GB |
| NousResearch/Hermes-3-Llama-3.1-8B | Q4 | 33.32 tok/s | 5GB |
| GSAI-ML/LLaDA-8B-Base | Q4 | 33.10 tok/s | 4GB |
| unsloth/Llama-3.2-3B-Instruct | Q8 | 33.00 tok/s | 3GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q4 | 32.80 tok/s | 4GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | 32.66 tok/s | 4GB |
| Qwen/Qwen2.5-1.5B | Q8 | 32.41 tok/s | 5GB |
| bigcode/starcoder2-3b | Q8 | 32.39 tok/s | 3GB |
| meta-llama/Llama-2-13b-chat-hf | Q4 | 32.37 tok/s | 7GB |
| Qwen/Qwen2.5-Coder-1.5B | Q8 | 32.17 tok/s | 5GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q8 | 32.13 tok/s | 4GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | 32.00 tok/s | 5GB |
| google/gemma-2-9b-it | Q4 | 31.99 tok/s | 6GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q8 | 31.96 tok/s | 5GB |
| Qwen/Qwen2-1.5B-Instruct | Q8 | 31.84 tok/s | 5GB |
| Qwen/Qwen2-0.5B | Q8 | 31.61 tok/s | 5GB |
| Qwen/Qwen2.5-0.5B | Q8 | 31.39 tok/s | 5GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q8 | 31.34 tok/s | 5GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q8 | 30.84 tok/s | 4GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | 30.60 tok/s | 4GB |
| Qwen/Qwen3-4B-Base | Q8 | 30.35 tok/s | 4GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | 30.32 tok/s | 5GB |
| Qwen/Qwen3-0.6B-Base | Q8 | 30.22 tok/s | 6GB |
| Qwen/Qwen3-0.6B | Q8 | 29.94 tok/s | 6GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q8 | 29.71 tok/s | 5GB |
| Qwen/Qwen2.5-14B | Q4 | 29.68 tok/s | 7GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q8 | 29.39 tok/s | 4GB |
| ai-forever/ruGPT-3.5-13B | Q4 | 29.08 tok/s | 7GB |
| Qwen/Qwen2.5-Math-1.5B | Q8 | 29.00 tok/s | 5GB |
| meta-llama/Llama-2-7b-hf | Q8 | 28.91 tok/s | 7GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q8 | 28.88 tok/s | 7GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q8 | 28.83 tok/s | 7GB |
| openai-community/gpt2-xl | Q8 | 28.77 tok/s | 7GB |
| mistralai/Mistral-7B-v0.1 | Q8 | 28.73 tok/s | 7GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q8 | 28.71 tok/s | 7GB |
| Qwen/Qwen2-0.5B-Instruct | Q8 | 28.68 tok/s | 5GB |
| Qwen/Qwen3-14B | Q4 | 28.52 tok/s | 7GB |
| rinna/japanese-gpt-neox-small | Q8 | 28.51 tok/s | 7GB |
| facebook/opt-125m | Q8 | 28.37 tok/s | 7GB |
| Qwen/Qwen3-14B-Base | Q4 | 28.32 tok/s | 7GB |
| openai-community/gpt2-large | Q8 | 28.30 tok/s | 7GB |
| Qwen/Qwen3-Reranker-0.6B | Q8 | 28.17 tok/s | 6GB |
| huggyllama/llama-7b | Q8 | 28.11 tok/s | 7GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q8 | 28.10 tok/s | 7GB |
| microsoft/Phi-3-mini-128k-instruct | Q8 | 28.07 tok/s | 7GB |
| microsoft/VibeVoice-1.5B | Q8 | 27.97 tok/s | 5GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q8 | 27.94 tok/s | 7GB |
| microsoft/Phi-4-multimodal-instruct | Q8 | 27.82 tok/s | 7GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q8 | 27.75 tok/s | 5GB |
| distilbert/distilgpt2 | Q8 | 27.66 tok/s | 7GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | 27.43 tok/s | 7GB |
| BSC-LT/salamandraTA-7b-instruct | Q8 | 27.32 tok/s | 7GB |
| Qwen/Qwen2-7B-Instruct | Q8 | 27.32 tok/s | 7GB |
| meta-llama/Llama-2-7b-chat-hf | Q8 | 27.19 tok/s | 7GB |
| numind/NuExtract-1.5 | Q8 | 27.09 tok/s | 7GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q8 | 27.06 tok/s | 8GB |
| meta-llama/Llama-3.1-8B | Q8 | 27.04 tok/s | 8GB |
| OpenPipe/Qwen3-14B-Instruct | Q4 | 27.04 tok/s | 7GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | 26.97 tok/s | 9GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q8 | 26.87 tok/s | 8GB |
| Qwen/Qwen2.5-7B | Q8 | 26.86 tok/s | 7GB |
| deepseek-ai/DeepSeek-V3.1 | Q8 | 26.85 tok/s | 7GB |
| microsoft/Phi-3-medium-128k-instruct | Q4 | 26.58 tok/s | 8GB |
| Qwen/Qwen3-Embedding-0.6B | Q8 | 26.57 tok/s | 6GB |
| deepseek-ai/DeepSeek-R1-0528 | Q8 | 26.48 tok/s | 7GB |
| petals-team/StableBeluga2 | Q8 | 26.47 tok/s | 7GB |
| IlyaGusev/saiga_llama3_8b | Q8 | 26.47 tok/s | 8GB |
| openai-community/gpt2 | Q8 | 26.40 tok/s | 7GB |
| Qwen/Qwen3-1.7B-Base | Q8 | 26.39 tok/s | 7GB |
| HuggingFaceTB/SmolLM-135M | Q8 | 26.34 tok/s | 7GB |
| Qwen/Qwen3-8B-FP8 | Q8 | 26.34 tok/s | 8GB |
| sshleifer/tiny-gpt2 | Q8 | 26.21 tok/s | 7GB |
| ibm-granite/granite-docling-258M | Q8 | 26.17 tok/s | 7GB |
| EleutherAI/pythia-70m-deduped | Q8 | 26.15 tok/s | 7GB |
| llamafactory/tiny-random-Llama-3 | Q8 | 26.05 tok/s | 7GB |
| Qwen/Qwen3-1.7B | Q8 | 25.98 tok/s | 7GB |
| dicta-il/dictalm2.0-instruct | Q8 | 25.95 tok/s | 7GB |
| unsloth/gpt-oss-20b-BF16 | Q4 | 25.91 tok/s | 10GB |
| meta-llama/Llama-Guard-3-8B | Q8 | 25.89 tok/s | 8GB |
| deepseek-ai/DeepSeek-R1 | Q8 | 25.85 tok/s | 7GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | 25.81 tok/s | 8GB |
| HuggingFaceTB/SmolLM2-135M | Q8 | 25.78 tok/s | 7GB |
| microsoft/DialoGPT-small | Q8 | 25.73 tok/s | 7GB |
| microsoft/DialoGPT-medium | Q8 | 25.70 tok/s | 7GB |
| parler-tts/parler-tts-large-v1 | Q8 | 25.70 tok/s | 7GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q8 | 25.66 tok/s | 8GB |
| Qwen/Qwen3-8B-Base | Q8 | 25.65 tok/s | 8GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | 25.64 tok/s | 7GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | 25.64 tok/s | 7GB |
| microsoft/Phi-3-mini-4k-instruct | Q8 | 25.62 tok/s | 7GB |
| EleutherAI/gpt-neo-125m | Q8 | 25.60 tok/s | 7GB |
| liuhaotian/llava-v1.5-7b | Q8 | 25.60 tok/s | 7GB |
| lmsys/vicuna-7b-v1.5 | Q8 | 25.54 tok/s | 7GB |
| deepseek-ai/DeepSeek-V3 | Q8 | 25.45 tok/s | 7GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q8 | 25.44 tok/s | 7GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | 25.40 tok/s | 7GB |
| microsoft/phi-4 | Q8 | 25.38 tok/s | 7GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | 25.17 tok/s | 8GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q4 | 25.17 tok/s | 10GB |
| meta-llama/Meta-Llama-3-8B | Q8 | 25.16 tok/s | 8GB |
| bigscience/bloomz-560m | Q8 | 25.10 tok/s | 7GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q8 | 25.09 tok/s | 7GB |
| MiniMaxAI/MiniMax-M2 | Q8 | 24.94 tok/s | 7GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | 24.80 tok/s | 9GB |
| google/gemma-3-270m-it | Q8 | 24.74 tok/s | 7GB |
| microsoft/Phi-4-mini-instruct | Q8 | 24.61 tok/s | 7GB |
| microsoft/phi-2 | Q8 | 24.50 tok/s | 7GB |
| rednote-hilab/dots.ocr | Q8 | 24.45 tok/s | 7GB |
| openai-community/gpt2-medium | Q8 | 24.44 tok/s | 7GB |
| zai-org/GLM-4.5-Air | Q8 | 24.41 tok/s | 7GB |
| GSAI-ML/LLaDA-8B-Base | Q8 | 24.37 tok/s | 8GB |
| vikhyatk/moondream2 | Q8 | 24.28 tok/s | 7GB |
| zai-org/GLM-4.6-FP8 | Q8 | 24.28 tok/s | 7GB |
| microsoft/Phi-3.5-vision-instruct | Q8 | 24.23 tok/s | 7GB |
| openai/gpt-oss-20b | Q4 | 24.22 tok/s | 10GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | 24.16 tok/s | 7GB |
| skt/kogpt2-base-v2 | Q8 | 24.15 tok/s | 7GB |
| ibm-granite/granite-3.3-8b-instruct | Q8 | 24.11 tok/s | 8GB |
| Qwen/Qwen3-Embedding-8B | Q8 | 24.01 tok/s | 8GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q4 | 23.99 tok/s | 10GB |
| deepseek-ai/DeepSeek-V3-0324 | Q8 | 23.97 tok/s | 7GB |
| hmellor/tiny-random-LlamaForCausalLM | Q8 | 23.93 tok/s | 7GB |
| GSAI-ML/LLaDA-8B-Instruct | Q8 | 23.85 tok/s | 8GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q8 | 23.83 tok/s | 8GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q8 | 23.73 tok/s | 7GB |
| HuggingFaceH4/zephyr-7b-beta | Q8 | 23.70 tok/s | 7GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | 23.62 tok/s | 9GB |
| google/gemma-2-9b-it | Q8 | 23.37 tok/s | 11GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q8 | 23.33 tok/s | 8GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q8 | 23.26 tok/s | 9GB |
| Qwen/Qwen3-8B | Q8 | 23.13 tok/s | 8GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q8 | 22.99 tok/s | 8GB |
| NousResearch/Hermes-3-Llama-3.1-8B | Q8 | 22.96 tok/s | 9GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | 22.68 tok/s | 8GB |
Note: Performance figures above are calculated estimates, not measurements; real results may vary.
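Because the figures are estimates, the quickest sanity check is to measure decode throughput on your own card. The sketch below is a rough approach, assuming a CUDA build of PyTorch and transformers; the model ID is one of the small entries from the table, loaded in FP16 rather than Q4/Q8, so compare like for like.

```python
# Rough throughput measurement: time a fixed greedy decode and report tok/s.
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B-Instruct"  # any small model from the table
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="cuda"
)

inputs = tok("The RTX 4070 Ti is", return_tensors="pt").to("cuda")
torch.cuda.synchronize()                      # make sure timing starts clean
start = time.perf_counter()
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
torch.cuda.synchronize()                      # wait for the GPU to finish
elapsed = time.perf_counter() - start

new_tokens = out.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tok/s")
```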
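The fit table below follows a simple capacity rule: weights take roughly one byte per parameter at Q8 and half that at Q4, plus overhead for the KV cache and activations. Here is a back-of-the-envelope check in the same spirit; the exact overhead the table's generator applies is not documented, so the 1GB margin is an assumption and the verdicts are estimates only.

```python
# Rough VRAM fit check: Q8 ~1 byte/param, Q4 ~0.5 byte/param for weights,
# plus an assumed ~1GB margin for KV cache and activations. The table's own
# numbers include varying overhead, so treat this as a heuristic, not a spec.
def vram_needed_gb(params_billions: float, quant: str) -> float:
    bytes_per_param = {"Q8": 1.0, "Q4": 0.5}[quant]
    return params_billions * bytes_per_param

def fits(params_billions: float, quant: str, vram_gb: float = 12.0) -> bool:
    return vram_needed_gb(params_billions, quant) + 1.0 <= vram_gb  # 1GB headroom (assumption)

for name, size in [("Llama-3.1-8B", 8), ("Qwen2.5-14B", 14), ("Llama-3.1-70B", 70)]:
    for quant in ("Q8", "Q4"):
        verdict = "fits" if fits(size, quant) else "not supported"
        print(f"{name} {quant}: ~{vram_needed_gb(size, quant):.0f}GB -> {verdict} on 12GB")
```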
| Model | Quantization | Verdict | Estimated speed | VRAM needed |
|---|---|---|---|---|
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | Q8 | Not supported | — | 79GB (have 12GB) |
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | Q4 | Not supported | — | 40GB (have 12GB) |
| 01-ai/Yi-1.5-34B-Chat | Q8 | Not supported | — | 39GB (have 12GB) |
| 01-ai/Yi-1.5-34B-Chat | Q4 | Not supported | — | 20GB (have 12GB) |
| NousResearch/Hermes-3-Llama-3.1-8B | Q8 | Fits comfortably | 22.96 tok/s | 9GB (have 12GB) |
| NousResearch/Hermes-3-Llama-3.1-8B | Q4 | Fits comfortably | 33.32 tok/s | 5GB (have 12GB) |
| NousResearch/Hermes-3-Llama-3.1-70B | Q8 | Not supported | — | 79GB (have 12GB) |
| NousResearch/Hermes-3-Llama-3.1-70B | Q4 | Not supported | — | 40GB (have 12GB) |
| microsoft/Phi-3-medium-128k-instruct | Q8 | Not supported | — | 16GB (have 12GB) |
| microsoft/Phi-3-medium-128k-instruct | Q4 | Fits comfortably | 26.58 tok/s | 8GB (have 12GB) |
| microsoft/Phi-3.5-mini-instruct | Q8 | Fits comfortably | 30.32 tok/s | 5GB (have 12GB) |
| microsoft/Phi-3.5-mini-instruct | Q4 | Fits comfortably | 46.40 tok/s | 3GB (have 12GB) |
| google/gemma-2-9b-it | Q8 | Fits (tight) | 23.37 tok/s | 11GB (have 12GB) |
| google/gemma-2-9b-it | Q4 | Fits comfortably | 31.99 tok/s | 6GB (have 12GB) |
| google/gemma-2-27b-it | Q8 | Not supported | — | 31GB (have 12GB) |
| google/gemma-2-27b-it | Q4 | Not supported | — | 16GB (have 12GB) |
| mistralai/Mistral-Small-Instruct-2409 | Q8 | Not supported | — | 25GB (have 12GB) |
| mistralai/Mistral-Small-Instruct-2409 | Q4 | Not supported | — | 13GB (have 12GB) |
| mistralai/Mistral-Large-Instruct-2411 | Q8 | Not supported | — | 138GB (have 12GB) |
| mistralai/Mistral-Large-Instruct-2411 | Q4 | Not supported | — | 69GB (have 12GB) |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | Q8 | Not supported | — | 158GB (have 12GB) |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | Q4 | Not supported | — | 79GB (have 12GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 37.98 tok/s | 4GB (have 12GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | Fits comfortably | 54.03 tok/s | 2GB (have 12GB) |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 23.62 tok/s | 9GB (have 12GB) |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 33.42 tok/s | 5GB (have 12GB) |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | Not supported | — | 79GB (have 12GB) |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | Not supported | — | 40GB (have 12GB) |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | Not supported | — | 79GB (have 12GB) |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | Not supported | — | 40GB (have 12GB) |
| deepseek-ai/deepseek-coder-33b-instruct | Q8 | Not supported | — | 38GB (have 12GB) |
| deepseek-ai/deepseek-coder-33b-instruct | Q4 | Not supported | — | 19GB (have 12GB) |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | Q8 | Not supported | — | 264GB (have 12GB) |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | Q4 | Not supported | — | 132GB (have 12GB) |
| deepseek-ai/DeepSeek-V2.5 | Q8 | Not supported | — | 264GB (have 12GB) |
| deepseek-ai/DeepSeek-V2.5 | Q4 | Not supported | — | 132GB (have 12GB) |
| Qwen/Qwen2.5-Math-72B-Instruct | Q8 | Not supported | — | 82GB (have 12GB) |
| Qwen/Qwen2.5-Math-72B-Instruct | Q4 | Not supported | — | 41GB (have 12GB) |
| Qwen/Qwen2.5-7B-Instruct | Q8 | Fits comfortably | 24.80 tok/s | 9GB (have 12GB) |
| Qwen/Qwen2.5-7B-Instruct | Q4 | Fits comfortably | 36.41 tok/s | 5GB (have 12GB) |
| Qwen/Qwen2.5-14B-Instruct | Q8 | Not supported | — | 17GB (have 12GB) |
| Qwen/Qwen2.5-14B-Instruct | Q4 | Fits comfortably | 26.97 tok/s | 9GB (have 12GB) |
| Qwen/Qwen2.5-32B-Instruct | Q8 | Not supported | — | 37GB (have 12GB) |
| Qwen/Qwen2.5-32B-Instruct | Q4 | Not supported | — | 19GB (have 12GB) |
| Qwen/Qwen2.5-Coder-32B-Instruct | Q8 | Not supported | — | 37GB (have 12GB) |
| Qwen/Qwen2.5-Coder-32B-Instruct | Q4 | Not supported | — | 19GB (have 12GB) |
| Qwen/QwQ-32B-Preview | Q8 | Not supported | — | 37GB (have 12GB) |
| Qwen/QwQ-32B-Preview | Q4 | Not supported | — | 19GB (have 12GB) |
| Qwen/Qwen2.5-72B-Instruct | Q8 | Not supported | — | 82GB (have 12GB) |
| Qwen/Qwen2.5-72B-Instruct | Q4 | Not supported | — | 41GB (have 12GB) |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q8 | Not supported | — | 80GB (have 12GB) |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q4 | Not supported | — | 40GB (have 12GB) |
| ai-forever/ruGPT-3.5-13B | Q8 | Not supported | — | 13GB (have 12GB) |
| ai-forever/ruGPT-3.5-13B | Q4 | Fits comfortably | 29.08 tok/s | 7GB (have 12GB) |
| baichuan-inc/Baichuan-M2-32B | Q8 | Not supported | — | 32GB (have 12GB) |
| baichuan-inc/Baichuan-M2-32B | Q4 | Not supported | — | 16GB (have 12GB) |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q8 | Fits comfortably | 25.09 tok/s | 7GB (have 12GB) |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 34.51 tok/s | 4GB (have 12GB) |
| ibm-granite/granite-3.3-8b-instruct | Q8 | Fits comfortably | 24.11 tok/s | 8GB (have 12GB) |
| ibm-granite/granite-3.3-8b-instruct | Q4 | Fits comfortably | 36.58 tok/s | 4GB (have 12GB) |
| Qwen/Qwen3-1.7B-Base | Q8 | Fits comfortably | 26.39 tok/s | 7GB (have 12GB) |
| Qwen/Qwen3-1.7B-Base | Q4 | Fits comfortably | 40.04 tok/s | 4GB (have 12GB) |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q8 | Not supported | — | 20GB (have 12GB) |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q4 | Fits comfortably | 25.17 tok/s | 10GB (have 12GB) |
| BSC-LT/salamandraTA-7b-instruct | Q8 | Fits comfortably | 27.32 tok/s | 7GB (have 12GB) |
| BSC-LT/salamandraTA-7b-instruct | Q4 | Fits comfortably | 35.04 tok/s | 4GB (have 12GB) |
| dicta-il/dictalm2.0-instruct | Q8 | Fits comfortably | 25.95 tok/s | 7GB (have 12GB) |
| dicta-il/dictalm2.0-instruct | Q4 | Fits comfortably | 40.92 tok/s | 4GB (have 12GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q8 | Not supported | — | 30GB (have 12GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q4 | Not supported | — | 15GB (have 12GB) |
| GSAI-ML/LLaDA-8B-Base | Q8 | Fits comfortably | 24.37 tok/s | 8GB (have 12GB) |
| GSAI-ML/LLaDA-8B-Base | Q4 | Fits comfortably | 33.10 tok/s | 4GB (have 12GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q8 | Not supported | — | 30GB (have 12GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q4 | Not supported | — | 15GB (have 12GB) |
| Qwen/Qwen2-0.5B-Instruct | Q8 | Fits comfortably | 28.68 tok/s | 5GB (have 12GB) |
| Qwen/Qwen2-0.5B-Instruct | Q4 | Fits comfortably | 44.71 tok/s | 3GB (have 12GB) |
| deepseek-ai/DeepSeek-V3 | Q8 | Fits comfortably | 25.45 tok/s | 7GB (have 12GB) |
| deepseek-ai/DeepSeek-V3 | Q4 | Fits comfortably | 35.52 tok/s | 4GB (have 12GB) |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q8 | Fits comfortably | 31.34 tok/s | 5GB (have 12GB) |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | Fits comfortably | 41.06 tok/s | 3GB (have 12GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q8 | Not supported | — | 30GB (have 12GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q4 | Not supported | — | 15GB (have 12GB) |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q8 | Not supported | — | 30GB (have 12GB) |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q4 | Not supported | — | 15GB (have 12GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q8 | Not supported | — | 30GB (have 12GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q4 | Not supported | — | 15GB (have 12GB) |
| AI-MO/Kimina-Prover-72B | Q8 | Not supported | — | 72GB (have 12GB) |
| AI-MO/Kimina-Prover-72B | Q4 | Not supported | — | 36GB (have 12GB) |
| apple/OpenELM-1_1B-Instruct | Q8 | Fits comfortably | 51.03 tok/s | 1GB (have 12GB) |
| apple/OpenELM-1_1B-Instruct | Q4 | Fits comfortably | 69.43 tok/s | 1GB (have 12GB) |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 23.33 tok/s | 8GB (have 12GB) |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 33.92 tok/s | 4GB (have 12GB) |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q8 | Fits comfortably | 23.26 tok/s | 9GB (have 12GB) |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q4 | Fits comfortably | 35.04 tok/s | 5GB (have 12GB) |
| Qwen/Qwen2.5-3B | Q8 | Fits comfortably | 34.10 tok/s | 3GB (have 12GB) |
| Qwen/Qwen2.5-3B | Q4 | Fits comfortably | 46.68 tok/s | 2GB (have 12GB) |
| lmsys/vicuna-7b-v1.5 | Q8 | Fits comfortably | 25.54 tok/s | 7GB (have 12GB) |
| lmsys/vicuna-7b-v1.5 | Q4 | Fits comfortably | 35.32 tok/s | 4GB (have 12GB) |
| meta-llama/Llama-2-13b-chat-hf | Q8 | Not supported | — | 13GB (have 12GB) |
| meta-llama/Llama-2-13b-chat-hf | Q4 | Fits comfortably | 32.37 tok/s | 7GB (have 12GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q8 | Not supported | — | 80GB (have 12GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q4 | Not supported | — | 40GB (have 12GB) |
| unsloth/gemma-3-1b-it | Q8 | Fits comfortably | 50.49 tok/s | 1GB (have 12GB) |
| unsloth/gemma-3-1b-it | Q4 | Fits comfortably | 77.14 tok/s | 1GB (have 12GB) |
| bigcode/starcoder2-3b | Q8 | Fits comfortably | 32.39 tok/s | 3GB (have 12GB) |
| bigcode/starcoder2-3b | Q4 | Fits comfortably | 54.12 tok/s | 2GB (have 12GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q8 | Not supported | — | 80GB (have 12GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q4 | Not supported | — | 40GB (have 12GB) |
| ibm-granite/granite-docling-258M | Q8 | Fits comfortably | 26.17 tok/s | 7GB (have 12GB) |
| ibm-granite/granite-docling-258M | Q4 | Fits comfortably | 39.69 tok/s | 4GB (have 12GB) |
| skt/kogpt2-base-v2 | Q8 | Fits comfortably | 24.15 tok/s | 7GB (have 12GB) |
| skt/kogpt2-base-v2 | Q4 | Fits comfortably | 38.19 tok/s | 4GB (have 12GB) |
| google/gemma-3-270m-it | Q8 | Fits comfortably | 24.74 tok/s | 7GB (have 12GB) |
| google/gemma-3-270m-it | Q4 | Fits comfortably | 40.39 tok/s | 4GB (have 12GB) |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q8 | Fits comfortably | 32.13 tok/s | 4GB (have 12GB) |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | Fits comfortably | 45.78 tok/s | 2GB (have 12GB) |
| Qwen/Qwen2.5-32B | Q8 | Not supported | — | 32GB (have 12GB) |
| Qwen/Qwen2.5-32B | Q4 | Not supported | — | 16GB (have 12GB) |
| parler-tts/parler-tts-large-v1 | Q8 | Fits comfortably | 25.70 tok/s | 7GB (have 12GB) |
| parler-tts/parler-tts-large-v1 | Q4 | Fits comfortably | 38.96 tok/s | 4GB (have 12GB) |
| EleutherAI/pythia-70m-deduped | Q8 | Fits comfortably | 26.15 tok/s | 7GB (have 12GB) |
| EleutherAI/pythia-70m-deduped | Q4 | Fits comfortably | 36.37 tok/s | 4GB (have 12GB) |
| microsoft/VibeVoice-1.5B | Q8 | Fits comfortably | 27.97 tok/s | 5GB (have 12GB) |
| microsoft/VibeVoice-1.5B | Q4 | Fits comfortably | 39.21 tok/s | 3GB (have 12GB) |
| ibm-granite/granite-3.3-2b-instruct | Q8 | Fits comfortably | 41.80 tok/s | 2GB (have 12GB) |
| ibm-granite/granite-3.3-2b-instruct | Q4 | Fits comfortably | 61.19 tok/s | 1GB (have 12GB) |
| Qwen/Qwen2.5-72B-Instruct | Q8 | Not supported | — | 72GB (have 12GB) |
| Qwen/Qwen2.5-72B-Instruct | Q4 | Not supported | — | 36GB (have 12GB) |
| liuhaotian/llava-v1.5-7b | Q8 | Fits comfortably | 25.60 tok/s | 7GB (have 12GB) |
| liuhaotian/llava-v1.5-7b | Q4 | Fits comfortably | 40.62 tok/s | 4GB (have 12GB) |
| google/gemma-2b | Q8 | Fits comfortably | 44.08 tok/s | 2GB (have 12GB) |
| google/gemma-2b | Q4 | Fits comfortably | 57.20 tok/s | 1GB (have 12GB) |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q8 | Fits comfortably | 28.88 tok/s | 7GB (have 12GB) |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q4 | Fits comfortably | 38.12 tok/s | 4GB (have 12GB) |
| Qwen/Qwen3-235B-A22B | Q8 | Not supported | — | 235GB (have 12GB) |
| Qwen/Qwen3-235B-A22B | Q4 | Not supported | — | 118GB (have 12GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q8 | Fits comfortably | 26.87 tok/s | 8GB (have 12GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q4 | Fits comfortably | 32.80 tok/s | 4GB (have 12GB) |
| microsoft/Phi-4-mini-instruct | Q8 | Fits comfortably | 24.61 tok/s | 7GB (have 12GB) |
| microsoft/Phi-4-mini-instruct | Q4 | Fits comfortably | 38.24 tok/s | 4GB (have 12GB) |
| llamafactory/tiny-random-Llama-3 | Q8 | Fits comfortably | 26.05 tok/s | 7GB (have 12GB) |
| llamafactory/tiny-random-Llama-3 | Q4 | Fits comfortably | 35.78 tok/s | 4GB (have 12GB) |
| HuggingFaceH4/zephyr-7b-beta | Q8 | Fits comfortably | 23.70 tok/s | 7GB (have 12GB) |
| HuggingFaceH4/zephyr-7b-beta | Q4 | Fits comfortably | 39.87 tok/s | 4GB (have 12GB) |
| Qwen/Qwen3-4B-Thinking-2507 | Q8 | Fits comfortably | 33.37 tok/s | 4GB (have 12GB) |
| Qwen/Qwen3-4B-Thinking-2507 | Q4 | Fits comfortably | 49.12 tok/s | 2GB (have 12GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q8 | Not supported | — | 30GB (have 12GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q4 | Not supported | — | 15GB (have 12GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q8 | Fits comfortably | 22.99 tok/s | 8GB (have 12GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q4 | Fits comfortably | 33.89 tok/s | 4GB (have 12GB) |
| unsloth/Llama-3.2-1B-Instruct | Q8 | Fits comfortably | 48.18 tok/s | 1GB (have 12GB) |
| unsloth/Llama-3.2-1B-Instruct | Q4 | Fits comfortably | 72.56 tok/s | 1GB (have 12GB) |
| GSAI-ML/LLaDA-8B-Instruct | Q8 | Fits comfortably | 23.85 tok/s | 8GB (have 12GB) |
| GSAI-ML/LLaDA-8B-Instruct | Q4 | Fits comfortably | 38.48 tok/s | 4GB (have 12GB) |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q8 | Not supported | — | 90GB (have 12GB) |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q4 | Not supported | — | 45GB (have 12GB) |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q8 | Fits comfortably | 27.94 tok/s | 7GB (have 12GB) |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q4 | Fits comfortably | 35.67 tok/s | 4GB (have 12GB) |
| numind/NuExtract-1.5 | Q8 | Fits comfortably | 27.09 tok/s | 7GB (have 12GB) |
| numind/NuExtract-1.5 | Q4 | Fits comfortably | 37.06 tok/s | 4GB (have 12GB) |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q8 | Fits comfortably | 28.71 tok/s | 7GB (have 12GB) |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q4 | Fits comfortably | 35.65 tok/s | 4GB (have 12GB) |
| hmellor/tiny-random-LlamaForCausalLM | Q8 | Fits comfortably | 23.93 tok/s | 7GB (have 12GB) |
| hmellor/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 35.16 tok/s | 4GB (have 12GB) |
| huggyllama/llama-7b | Q8 | Fits comfortably | 28.11 tok/s | 7GB (have 12GB) |
| huggyllama/llama-7b | Q4 | Fits comfortably | 38.09 tok/s | 4GB (have 12GB) |
| deepseek-ai/DeepSeek-V3-0324 | Q8 | Fits comfortably | 23.97 tok/s | 7GB (have 12GB) |
| deepseek-ai/DeepSeek-V3-0324 | Q4 | Fits comfortably | 37.38 tok/s | 4GB (have 12GB) |
| microsoft/Phi-3-mini-128k-instruct | Q8 | Fits comfortably | 28.07 tok/s | 7GB (have 12GB) |
| microsoft/Phi-3-mini-128k-instruct | Q4 | Fits comfortably | 34.04 tok/s | 4GB (have 12GB) |
| sshleifer/tiny-gpt2 | Q8 | Fits comfortably | 26.21 tok/s | 7GB (have 12GB) |
| sshleifer/tiny-gpt2 | Q4 | Fits comfortably | 36.84 tok/s | 4GB (have 12GB) |
| meta-llama/Llama-Guard-3-8B | Q8 | Fits comfortably | 25.89 tok/s | 8GB (have 12GB) |
| meta-llama/Llama-Guard-3-8B | Q4 | Fits comfortably | 38.33 tok/s | 4GB (have 12GB) |
| openai-community/gpt2-xl | Q8 | Fits comfortably | 28.77 tok/s | 7GB (have 12GB) |
| openai-community/gpt2-xl | Q4 | Fits comfortably | 37.36 tok/s | 4GB (have 12GB) |
| OpenPipe/Qwen3-14B-Instruct | Q8 | Not supported | — | 14GB (have 12GB) |
| OpenPipe/Qwen3-14B-Instruct | Q4 | Fits comfortably | 27.04 tok/s | 7GB (have 12GB) |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q8 | Not supported | — | 70GB (have 12GB) |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q4 | Not supported | — | 35GB (have 12GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q8 | Fits comfortably | 30.84 tok/s | 4GB (have 12GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q4 | Fits comfortably | 41.63 tok/s | 2GB (have 12GB) |
| ibm-research/PowerMoE-3b | Q8 | Fits comfortably | 37.32 tok/s | 3GB (have 12GB) |
| ibm-research/PowerMoE-3b | Q4 | Fits comfortably | 54.57 tok/s | 2GB (have 12GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q8 | Fits comfortably | 35.51 tok/s | 4GB (have 12GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q4 | Fits comfortably | 42.19 tok/s | 2GB (have 12GB) |
| unsloth/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 33.00 tok/s | 3GB (have 12GB) |
| unsloth/Llama-3.2-3B-Instruct | Q4 | Fits comfortably | 55.06 tok/s | 2GB (have 12GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q8 | Fits comfortably | 33.52 tok/s | 4GB (have 12GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q4 | Fits comfortably | 43.24 tok/s | 2GB (have 12GB) |
| meta-llama/Llama-3.2-3B | Q8 | Fits comfortably | 35.52 tok/s | 3GB (have 12GB) |
| meta-llama/Llama-3.2-3B | Q4 | Fits comfortably | 52.54 tok/s | 2GB (have 12GB) |
| EleutherAI/gpt-neo-125m | Q8 | Fits comfortably | 25.60 tok/s | 7GB (have 12GB) |
| EleutherAI/gpt-neo-125m | Q4 | Fits comfortably | 34.52 tok/s | 4GB (have 12GB) |
| codellama/CodeLlama-34b-hf | Q8 | Not supported | — | 34GB (have 12GB) |
| codellama/CodeLlama-34b-hf | Q4 | Not supported | — | 17GB (have 12GB) |
| meta-llama/Llama-Guard-3-1B | Q8 | Fits comfortably | 54.02 tok/s | 1GB (have 12GB) |
| meta-llama/Llama-Guard-3-1B | Q4 | Fits comfortably | 80.81 tok/s | 1GB (have 12GB) |
| Qwen/Qwen2-1.5B-Instruct | Q8 | Fits comfortably | 31.84 tok/s | 5GB (have 12GB) |
| Qwen/Qwen2-1.5B-Instruct | Q4 | Fits comfortably | 40.79 tok/s | 3GB (have 12GB) |
| google/gemma-2-2b-it | Q8 | Fits comfortably | 38.50 tok/s | 2GB (have 12GB) |
| google/gemma-2-2b-it | Q4 | Fits comfortably | 57.23 tok/s | 1GB (have 12GB) |
| Qwen/Qwen2.5-14B | Q8 | Not supported | — | 14GB (have 12GB) |
| Qwen/Qwen2.5-14B | Q4 | Fits comfortably | 29.68 tok/s | 7GB (have 12GB) |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q8 | Not supported | — | 32GB (have 12GB) |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q4 | Not supported | — | 16GB (have 12GB) |
| microsoft/Phi-3.5-mini-instruct | Q8 | Fits comfortably | 25.64 tok/s | 7GB (have 12GB) |
| microsoft/Phi-3.5-mini-instruct | Q4 | Fits comfortably | 39.20 tok/s | 4GB (have 12GB) |
| Qwen/Qwen3-4B-Base | Q8 | Fits comfortably | 30.35 tok/s | 4GB (have 12GB) |
| Qwen/Qwen3-4B-Base | Q4 | Fits comfortably | 43.57 tok/s | 2GB (have 12GB) |
| Qwen/Qwen2-7B-Instruct | Q8 | Fits comfortably | 27.32 tok/s | 7GB (have 12GB) |
| Qwen/Qwen2-7B-Instruct | Q4 | Fits comfortably | 39.55 tok/s | 4GB (have 12GB) |
| meta-llama/Llama-2-7b-chat-hf | Q8 | Fits comfortably | 27.19 tok/s | 7GB (have 12GB) |
| meta-llama/Llama-2-7b-chat-hf | Q4 | Fits comfortably | 34.80 tok/s | 4GB (have 12GB) |
| Qwen/Qwen3-14B-Base | Q8 | Not supported | — | 14GB (have 12GB) |
| Qwen/Qwen3-14B-Base | Q4 | Fits comfortably | 28.32 tok/s | 7GB (have 12GB) |
| swiss-ai/Apertus-8B-Instruct-2509 | Q8 | Fits comfortably | 27.06 tok/s | 8GB (have 12GB) |
| swiss-ai/Apertus-8B-Instruct-2509 | Q4 | Fits comfortably | 34.33 tok/s | 4GB (have 12GB) |
| microsoft/Phi-3.5-vision-instruct | Q8 | Fits comfortably | 24.23 tok/s | 7GB (have 12GB) |
| microsoft/Phi-3.5-vision-instruct | Q4 | Fits comfortably | 40.14 tok/s | 4GB (have 12GB) |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q8 | Fits comfortably | 28.10 tok/s | 7GB (have 12GB) |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q4 | Fits comfortably | 38.75 tok/s | 4GB (have 12GB) |
| rinna/japanese-gpt-neox-small | Q8 | Fits comfortably | 28.51 tok/s | 7GB (have 12GB) |
| rinna/japanese-gpt-neox-small | Q4 | Fits comfortably | 39.46 tok/s | 4GB (have 12GB) |
| Qwen/Qwen2.5-Coder-1.5B | Q8 | Fits comfortably | 32.17 tok/s | 5GB (have 12GB) |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | Fits comfortably | 41.42 tok/s | 3GB (have 12GB) |
| IlyaGusev/saiga_llama3_8b | Q8 | Fits comfortably | 26.47 tok/s | 8GB (have 12GB) |
| IlyaGusev/saiga_llama3_8b | Q4 | Fits comfortably | 38.69 tok/s | 4GB (have 12GB) |
| Qwen/Qwen3-30B-A3B | Q8 | Not supported | — | 30GB (have 12GB) |
| Qwen/Qwen3-30B-A3B | Q4 | Not supported | — | 15GB (have 12GB) |
| deepseek-ai/DeepSeek-R1 | Q8 | Fits comfortably | 25.85 tok/s | 7GB (have 12GB) |
| deepseek-ai/DeepSeek-R1 | Q4 | Fits comfortably | 37.43 tok/s | 4GB (have 12GB) |
| microsoft/DialoGPT-small | Q8 | Fits comfortably | 25.73 tok/s | 7GB (have 12GB) |
| microsoft/DialoGPT-small | Q4 | Fits comfortably | 36.20 tok/s | 4GB (have 12GB) |
| Qwen/Qwen3-8B-FP8 | Q8 | Fits comfortably | 26.34 tok/s | 8GB (have 12GB) |
| Qwen/Qwen3-8B-FP8 | Q4 | Fits comfortably | 33.78 tok/s | 4GB (have 12GB) |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q8 | Not supported | — | 30GB (have 12GB) |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q4 | Not supported | — | 15GB (have 12GB) |
| Qwen/Qwen3-Embedding-4B | Q8 | Fits comfortably | 35.11 tok/s | 4GB (have 12GB) |
| Qwen/Qwen3-Embedding-4B | Q4 | Fits comfortably | 47.92 tok/s | 2GB (have 12GB) |
| microsoft/Phi-4-multimodal-instruct | Q8 | Fits comfortably | 27.82 tok/s | 7GB (have 12GB) |
| microsoft/Phi-4-multimodal-instruct | Q4 | Fits comfortably | 39.02 tok/s | 4GB (have 12GB) |
| Qwen/Qwen3-8B-Base | Q8 | Fits comfortably | 25.65 tok/s | 8GB (have 12GB) |
| Qwen/Qwen3-8B-Base | Q4 | Fits comfortably | 37.66 tok/s | 4GB (have 12GB) |
| Qwen/Qwen3-0.6B-Base | Q8 | Fits comfortably | 30.22 tok/s | 6GB (have 12GB) |
| Qwen/Qwen3-0.6B-Base | Q4 | Fits comfortably | 37.45 tok/s | 3GB (have 12GB) |
| openai-community/gpt2-medium | Q8 | Fits comfortably | 24.44 tok/s | 7GB (have 12GB) |
| openai-community/gpt2-medium | Q4 | Fits comfortably | 38.69 tok/s | 4GB (have 12GB) |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q8 | Fits comfortably | 28.83 tok/s | 7GB (have 12GB) |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 34.22 tok/s | 4GB (have 12GB) |
| Qwen/Qwen2.5-Math-1.5B | Q8 | Fits comfortably | 29.00 tok/s | 5GB (have 12GB) |
| Qwen/Qwen2.5-Math-1.5B | Q4 | Fits comfortably | 41.89 tok/s | 3GB (have 12GB) |
| HuggingFaceTB/SmolLM-135M | Q8 | Fits comfortably | 26.34 tok/s | 7GB (have 12GB) |
| HuggingFaceTB/SmolLM-135M | Q4 | Fits comfortably | 35.74 tok/s | 4GB (have 12GB) |
| unsloth/gpt-oss-20b-BF16 | Q8 | Not supported | — | 20GB (have 12GB) |
| unsloth/gpt-oss-20b-BF16 | Q4 | Fits comfortably | 25.91 tok/s | 10GB (have 12GB) |
| meta-llama/Meta-Llama-3-70B-Instruct | Q8 | Not supported | — | 70GB (have 12GB) |
| meta-llama/Meta-Llama-3-70B-Instruct | Q4 | Not supported | — | 35GB (have 12GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 23.83 tok/s | 8GB (have 12GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 35.00 tok/s | 4GB (have 12GB) |
| zai-org/GLM-4.5-Air | Q8 | Fits comfortably | 24.41 tok/s | 7GB (have 12GB) |
| zai-org/GLM-4.5-Air | Q4 | Fits comfortably | 41.19 tok/s | 4GB (have 12GB) |
| mistralai/Mistral-7B-Instruct-v0.1 | Q8 | Fits comfortably | 23.73 tok/s | 7GB (have 12GB) |
| mistralai/Mistral-7B-Instruct-v0.1 | Q4 | Fits comfortably | 38.58 tok/s | 4GB (have 12GB) |
| LiquidAI/LFM2-1.2B | Q8 | Fits comfortably | 40.57 tok/s | 2GB (have 12GB) |
| LiquidAI/LFM2-1.2B | Q4 | Fits comfortably | 55.14 tok/s | 1GB (have 12GB) |
| mistralai/Mistral-7B-v0.1 | Q8 | Fits comfortably | 28.73 tok/s | 7GB (have 12GB) |
| mistralai/Mistral-7B-v0.1 | Q4 | Fits comfortably | 37.56 tok/s | 4GB (have 12GB) |
| Qwen/Qwen2.5-32B-Instruct | Q8 | Not supported | — | 32GB (have 12GB) |
| Qwen/Qwen2.5-32B-Instruct | Q4 | Not supported | — | 16GB (have 12GB) |
| deepseek-ai/DeepSeek-R1-0528 | Q8 | Fits comfortably | 26.48 tok/s | 7GB (have 12GB) |
| deepseek-ai/DeepSeek-R1-0528 | Q4 | Fits comfortably | 34.04 tok/s | 4GB (have 12GB) |
| meta-llama/Llama-3.1-8B | Q8 | Fits comfortably | 27.04 tok/s | 8GB (have 12GB) |
| meta-llama/Llama-3.1-8B | Q4 | Fits comfortably | 38.96 tok/s | 4GB (have 12GB) |
| deepseek-ai/DeepSeek-V3.1 | Q8 | Fits comfortably | 26.85 tok/s | 7GB (have 12GB) |
| deepseek-ai/DeepSeek-V3.1 | Q4 | Fits comfortably | 39.83 tok/s | 4GB (have 12GB) |
| microsoft/phi-4 | Q8 | Fits comfortably | 25.38 tok/s | 7GB (have 12GB) |
| microsoft/phi-4 | Q4 | Fits comfortably | 37.11 tok/s | 4GB (have 12GB) |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | Fits comfortably | 34.33 tok/s | 3GB (have 12GB) |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | Fits comfortably | 52.96 tok/s | 2GB (have 12GB) |
| Qwen/Qwen2-0.5B | Q8 | Fits comfortably | 31.61 tok/s | 5GB (have 12GB) |
| Qwen/Qwen2-0.5B | Q4 | Fits comfortably | 39.26 tok/s | 3GB (have 12GB) |
| MiniMaxAI/MiniMax-M2 | Q8 | Fits comfortably | 24.94 tok/s | 7GB (have 12GB) |
| MiniMaxAI/MiniMax-M2 | Q4 | Fits comfortably | 34.35 tok/s | 4GB (have 12GB) |
| microsoft/DialoGPT-medium | Q8 | Fits comfortably | 25.70 tok/s | 7GB (have 12GB) |
| microsoft/DialoGPT-medium | Q4 | Fits comfortably | 38.80 tok/s | 4GB (have 12GB) |
| zai-org/GLM-4.6-FP8 | Q8 | Fits comfortably | 24.28 tok/s | 7GB (have 12GB) |
| zai-org/GLM-4.6-FP8 | Q4 | Fits comfortably | 39.53 tok/s | 4GB (have 12GB) |
| HuggingFaceTB/SmolLM2-135M | Q8 | Fits comfortably | 25.78 tok/s | 7GB (have 12GB) |
| HuggingFaceTB/SmolLM2-135M | Q4 | Fits comfortably | 36.96 tok/s | 4GB (have 12GB) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | Fits comfortably | 22.68 tok/s | 8GB (have 12GB) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q4 | Fits comfortably | 37.71 tok/s | 4GB (have 12GB) |
| meta-llama/Llama-2-7b-hf | Q8 | Fits comfortably | 28.91 tok/s | 7GB (have 12GB) |
| meta-llama/Llama-2-7b-hf | Q4 | Fits comfortably | 39.27 tok/s | 4GB (have 12GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | Fits comfortably | 24.16 tok/s | 7GB (have 12GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | Fits comfortably | 41.04 tok/s | 4GB (have 12GB) |
| microsoft/phi-2 | Q8 | Fits comfortably | 24.50 tok/s | 7GB (have 12GB) |
| microsoft/phi-2 | Q4 | Fits comfortably | 38.81 tok/s | 4GB (have 12GB) |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | Not supported | — | 70GB (have 12GB) |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | Not supported | — | 35GB (have 12GB) |
| Qwen/Qwen2.5-0.5B | Q8 | Fits comfortably | 31.39 tok/s | 5GB (have 12GB) |
| Qwen/Qwen2.5-0.5B | Q4 | Fits comfortably | 44.67 tok/s | 3GB (have 12GB) |
| Qwen/Qwen3-14B | Q8 | Not supported | — | 14GB (have 12GB) |
| Qwen/Qwen3-14B | Q4 | Fits comfortably | 28.52 tok/s | 7GB (have 12GB) |
| Qwen/Qwen3-Embedding-8B | Q8 | Fits comfortably | 24.01 tok/s | 8GB (have 12GB) |
| Qwen/Qwen3-Embedding-8B | Q4 | Fits comfortably | 37.51 tok/s | 4GB (have 12GB) |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | Not supported | — | 70GB (have 12GB) |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | Not supported | — | 35GB (have 12GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q8 | Fits comfortably | 25.66 tok/s | 8GB (have 12GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q4 | Fits comfortably | 33.57 tok/s | 4GB (have 12GB) |
| Qwen/Qwen2.5-14B-Instruct | Q8 | Not supported | — | 14GB (have 12GB) |
| Qwen/Qwen2.5-14B-Instruct | Q4 | Fits comfortably | 27.43 tok/s | 7GB (have 12GB) |
| Qwen/Qwen2.5-1.5B | Q8 | Fits comfortably | 32.41 tok/s | 5GB (have 12GB) |
| Qwen/Qwen2.5-1.5B | Q4 | Fits comfortably | 39.28 tok/s | 3GB (have 12GB) |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | Fits comfortably | 30.60 tok/s | 4GB (have 12GB) |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q4 | Fits comfortably | 43.41 tok/s | 2GB (have 12GB) |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q8 | Not supported | — | 20GB (have 12GB) |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q4 | Fits comfortably | 23.99 tok/s | 10GB (have 12GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q8 | Fits comfortably | 31.96 tok/s | 5GB (have 12GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q4 | Fits comfortably | 40.98 tok/s | 3GB (have 12GB) |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | Fits comfortably | 25.17 tok/s | 8GB (have 12GB) |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | Fits comfortably | 35.66 tok/s | 4GB (have 12GB) |
| Qwen/Qwen3-Reranker-0.6B | Q8 | Fits comfortably | 28.17 tok/s | 6GB (have 12GB) |
| Qwen/Qwen3-Reranker-0.6B | Q4 | Fits comfortably | 36.38 tok/s | 3GB (have 12GB) |
| rednote-hilab/dots.ocr | Q8 | Fits comfortably | 24.45 tok/s | 7GB (have 12GB) |
| rednote-hilab/dots.ocr | Q4 | Fits comfortably | 34.31 tok/s | 4GB (have 12GB) |
| google-t5/t5-3b | Q8 | Fits comfortably | 34.66 tok/s | 3GB (have 12GB) |
| google-t5/t5-3b | Q4 | Fits comfortably | 46.85 tok/s | 2GB (have 12GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q8 | Not supported | — | 30GB (have 12GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q4 | Not supported | — | 15GB (have 12GB) |
| Qwen/Qwen3-4B | Q8 | Fits comfortably | 33.60 tok/s | 4GB (have 12GB) |
| Qwen/Qwen3-4B | Q4 | Fits comfortably | 47.16 tok/s | 2GB (have 12GB) |
| Qwen/Qwen3-1.7B | Q8 | Fits comfortably | 25.98 tok/s | 7GB (have 12GB) |
| Qwen/Qwen3-1.7B | Q4 | Fits comfortably | 35.08 tok/s | 4GB (have 12GB) |
| openai-community/gpt2-large | Q8 | Fits comfortably | 28.30 tok/s | 7GB (have 12GB) |
| openai-community/gpt2-large | Q4 | Fits comfortably | 39.95 tok/s | 4GB (have 12GB) |
| microsoft/Phi-3-mini-4k-instruct | Q8 | Fits comfortably | 25.62 tok/s | 7GB (have 12GB) |
| microsoft/Phi-3-mini-4k-instruct | Q4 | Fits comfortably | 40.01 tok/sEstimated  | 4GB (have 12GB) | 
| allenai/OLMo-2-0425-1B | Q8 | Fits comfortably | 54.55 tok/sEstimated  | 1GB (have 12GB) | 
| allenai/OLMo-2-0425-1B | Q4 | Fits comfortably | 71.60 tok/sEstimated  | 1GB (have 12GB) | 
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q8 | Not supported | — | 80GB (have 12GB) | 
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q4 | Not supported | — | 40GB (have 12GB) | 
| Qwen/Qwen3-32B | Q8 | Not supported | — | 32GB (have 12GB) | 
| Qwen/Qwen3-32B | Q4 | Not supported | — | 16GB (have 12GB) | 
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | Fits comfortably | 32.00 tok/sEstimated  | 5GB (have 12GB) | 
| Qwen/Qwen2.5-0.5B-Instruct | Q4 | Fits comfortably | 41.28 tok/sEstimated  | 3GB (have 12GB) | 
| Qwen/Qwen2.5-7B | Q8 | Fits comfortably | 26.86 tok/sEstimated  | 7GB (have 12GB) | 
| Qwen/Qwen2.5-7B | Q4 | Fits comfortably | 36.79 tok/sEstimated  | 4GB (have 12GB) | 
| meta-llama/Meta-Llama-3-8B | Q8 | Fits comfortably | 25.16 tok/sEstimated  | 8GB (have 12GB) | 
| meta-llama/Meta-Llama-3-8B | Q4 | Fits comfortably | 37.79 tok/sEstimated  | 4GB (have 12GB) | 
| meta-llama/Llama-3.2-1B | Q8 | Fits comfortably | 48.79 tok/sEstimated  | 1GB (have 12GB) | 
| meta-llama/Llama-3.2-1B | Q4 | Fits comfortably | 69.70 tok/sEstimated  | 1GB (have 12GB) | 
| petals-team/StableBeluga2 | Q8 | Fits comfortably | 26.47 tok/sEstimated  | 7GB (have 12GB) | 
| petals-team/StableBeluga2 | Q4 | Fits comfortably | 39.38 tok/sEstimated  | 4GB (have 12GB) | 
| vikhyatk/moondream2 | Q8 | Fits comfortably | 24.28 tok/sEstimated  | 7GB (have 12GB) | 
| vikhyatk/moondream2 | Q4 | Fits comfortably | 35.09 tok/sEstimated  | 4GB (have 12GB) | 
| meta-llama/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 34.62 tok/sEstimated  | 3GB (have 12GB) | 
| meta-llama/Llama-3.2-3B-Instruct | Q4 | Fits comfortably | 47.57 tok/sEstimated  | 2GB (have 12GB) | 
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q8 | Not supported | — | 70GB (have 12GB) | 
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q4 | Not supported | — | 35GB (have 12GB) | 
| distilbert/distilgpt2 | Q8 | Fits comfortably | 27.66 tok/sEstimated  | 7GB (have 12GB) | 
| distilbert/distilgpt2 | Q4 | Fits comfortably | 35.69 tok/sEstimated  | 4GB (have 12GB) | 
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q8 | Not supported | — | 32GB (have 12GB) | 
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q4 | Not supported | — | 16GB (have 12GB) | 
| inference-net/Schematron-3B | Q8 | Fits comfortably | 33.54 tok/sEstimated  | 3GB (have 12GB) | 
| inference-net/Schematron-3B | Q4 | Fits comfortably | 54.32 tok/sEstimated  | 2GB (have 12GB) | 
| Qwen/Qwen3-8B | Q8 | Fits comfortably | 23.13 tok/sEstimated  | 8GB (have 12GB) | 
| Qwen/Qwen3-8B | Q4 | Fits comfortably | 37.65 tok/sEstimated  | 4GB (have 12GB) | 
| mistralai/Mistral-7B-Instruct-v0.2 | Q8 | Fits comfortably | 25.44 tok/sEstimated  | 7GB (have 12GB) | 
| mistralai/Mistral-7B-Instruct-v0.2 | Q4 | Fits comfortably | 36.29 tok/sEstimated  | 4GB (have 12GB) | 
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q8 | Fits comfortably | 37.55 tok/sEstimated  | 3GB (have 12GB) | 
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | Fits comfortably | 55.92 tok/sEstimated  | 2GB (have 12GB) | 
| bigscience/bloomz-560m | Q8 | Fits comfortably | 25.10 tok/sEstimated  | 7GB (have 12GB) | 
| bigscience/bloomz-560m | Q4 | Fits comfortably | 40.93 tok/sEstimated  | 4GB (have 12GB) | 
| Qwen/Qwen2.5-3B-Instruct | Q8 | Fits comfortably | 36.39 tok/sEstimated  | 3GB (have 12GB) | 
| Qwen/Qwen2.5-3B-Instruct | Q4 | Fits comfortably | 51.87 tok/sEstimated  | 2GB (have 12GB) | 
| openai/gpt-oss-120b | Q8 | Not supported | — | 120GB (have 12GB) | 
| openai/gpt-oss-120b | Q4 | Not supported | — | 60GB (have 12GB) | 
| meta-llama/Llama-3.2-1B-Instruct | Q8 | Fits comfortably | 56.87 tok/sEstimated  | 1GB (have 12GB) | 
| meta-llama/Llama-3.2-1B-Instruct | Q4 | Fits comfortably | 70.02 tok/sEstimated  | 1GB (have 12GB) | 
| Qwen/Qwen3-4B-Instruct-2507 | Q8 | Fits comfortably | 29.39 tok/sEstimated  | 4GB (have 12GB) | 
| Qwen/Qwen3-4B-Instruct-2507 | Q4 | Fits comfortably | 50.34 tok/sEstimated  | 2GB (have 12GB) | 
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | Fits comfortably | 25.64 tok/sEstimated  | 7GB (have 12GB) | 
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q4 | Fits comfortably | 38.86 tok/sEstimated  | 4GB (have 12GB) | 
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | Fits comfortably | 55.17 tok/sEstimated  | 1GB (have 12GB) | 
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | Fits comfortably | 78.05 tok/sEstimated  | 1GB (have 12GB) | 
| facebook/opt-125m | Q8 | Fits comfortably | 28.37 tok/sEstimated  | 7GB (have 12GB) | 
| facebook/opt-125m | Q4 | Fits comfortably | 37.74 tok/sEstimated  | 4GB (have 12GB) | 
| Qwen/Qwen2.5-1.5B-Instruct | Q8 | Fits comfortably | 27.75 tok/sEstimated  | 5GB (have 12GB) | 
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | Fits comfortably | 40.46 tok/sEstimated  | 3GB (have 12GB) | 
| Qwen/Qwen3-Embedding-0.6B | Q8 | Fits comfortably | 26.57 tok/sEstimated  | 6GB (have 12GB) | 
| Qwen/Qwen3-Embedding-0.6B | Q4 | Fits comfortably | 36.06 tok/sEstimated  | 3GB (have 12GB) | 
| google/gemma-3-1b-it | Q8 | Fits comfortably | 57.58 tok/sEstimated  | 1GB (have 12GB) | 
| google/gemma-3-1b-it | Q4 | Fits comfortably | 80.74 tok/sEstimated  | 1GB (have 12GB) | 
| openai/gpt-oss-20b | Q8 | Not supported | — | 20GB (have 12GB) | 
| openai/gpt-oss-20b | Q4 | Fits comfortably | 24.22 tok/sEstimated  | 10GB (have 12GB) | 
| dphn/dolphin-2.9.1-yi-1.5-34b | Q8 | Not supported | — | 34GB (have 12GB) | 
| dphn/dolphin-2.9.1-yi-1.5-34b | Q4 | Not supported | — | 17GB (have 12GB) | 
| meta-llama/Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 25.81 tok/sEstimated  | 8GB (have 12GB) | 
| meta-llama/Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 32.66 tok/sEstimated  | 4GB (have 12GB) | 
| Gensyn/Qwen2.5-0.5B-Instruct | Q8 | Fits comfortably | 29.71 tok/sEstimated  | 5GB (have 12GB) | 
| Gensyn/Qwen2.5-0.5B-Instruct | Q4 | Fits comfortably | 39.96 tok/sEstimated  | 3GB (have 12GB) | 
| Qwen/Qwen3-0.6B | Q8 | Fits comfortably | 29.94 tok/sEstimated  | 6GB (have 12GB) | 
| Qwen/Qwen3-0.6B | Q4 | Fits comfortably | 42.29 tok/sEstimated  | 3GB (have 12GB) | 
| Qwen/Qwen2.5-7B-Instruct | Q8 | Fits comfortably | 25.40 tok/sEstimated  | 7GB (have 12GB) | 
| Qwen/Qwen2.5-7B-Instruct | Q4 | Fits comfortably | 40.86 tok/sEstimated  | 4GB (have 12GB) | 
| openai-community/gpt2 | Q8 | Fits comfortably | 26.40 tok/sEstimated  | 7GB (have 12GB) | 
| openai-community/gpt2 | Q4 | Fits comfortably | 41.30 tok/sEstimated  | 4GB (have 12GB) | 
Note: figures marked (est.) are calculated estimates, not measured benchmarks; real-world results may vary.
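For intuition, the fit column can be roughly reproduced from parameter count and quantization width alone. Here is a minimal Python sketch of that idea; the weights-only sizing rule and the flat 0.5 GB overhead are our illustrative assumptions, not this page's actual methodology:

```python
# Rough VRAM-fit check in the spirit of the table above.
# Assumption: VRAM ~ parameters * bits-per-weight, plus a flat overhead.
# Real usage also depends on context length, KV cache, and runtime.

def estimate_vram_gb(params_b: float, quant: str, overhead_gb: float = 0.5) -> float:
    """Estimate weights-only VRAM for a model with params_b billion parameters."""
    bits = {"Q4": 4, "Q8": 8, "FP16": 16}[quant]
    return params_b * bits / 8 + overhead_gb  # 1B params at 8 bits is ~1 GB

def fits(params_b: float, quant: str, vram_gb: float = 12.0) -> str:
    need = estimate_vram_gb(params_b, quant)
    verdict = "fits" if need <= vram_gb else "not supported"
    return f"{need:.1f}GB needed on {vram_gb:.0f}GB -> {verdict}"

# Example: a 14B model on the 12 GB RTX 4070 Ti
print("Q8:", fits(14, "Q8"))  # ~14.5GB -> not supported
print("Q4:", fits(14, "Q4"))  # ~7.5GB  -> fits
```

Run against a 14B model, this reproduces the Qwen3-14B rows above: Q8 overflows the 12 GB card while Q4 fits with headroom.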
Data-backed answers pulled from community benchmarks, manufacturer specs, and live pricing.
A CUDA user benchmarking Qwen 2.5 14B Instruct Q4_K on a 4070 Ti Super reported ~72 tokens/sec—roughly 40% faster than the LocalScore baseline for that quant.
Source: Reddit – /r/LocalLLaMA (mlbgc2j)
LM Studio logs ~35 tok/s on gemma2-9b Q8_0 and ~25 tok/s on gemma-2-27b Q4_K_M with dual 4070 Ti Supers, delivering sub-0.2 s time-to-first-token for 9B workloads.
Source: Reddit – /r/LocalLLaMA (mehsra3)
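Throughput and time-to-first-token numbers like the LM Studio report above are straightforward to measure yourself. A minimal probe, assuming llama-cpp-python built with CUDA support; the GGUF filename is a placeholder:

```python
# Time-to-first-token and steady-state throughput probe.
# Assumes llama-cpp-python with CUDA; model path is a placeholder.
import time
from llama_cpp import Llama

llm = Llama(model_path="gemma-2-9b-it-Q8_0.gguf", n_gpu_layers=-1, verbose=False)

start = time.perf_counter()
first_token_at = None
n_tokens = 0
for chunk in llm("Explain KV caching in one paragraph.", max_tokens=256, stream=True):
    if first_token_at is None:
        first_token_at = time.perf_counter()
    n_tokens += 1  # each streamed chunk is roughly one token
elapsed = time.perf_counter() - start

print(f"TTFT: {first_token_at - start:.2f}s")
print(f"throughput: {n_tokens / elapsed:.1f} tok/s")
```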
Users on 12 GB mobile 4070 Ti rigs see QwQ stuck near 3.7 tok/s until they manually offload additional layers to the GPU, underscoring how VRAM ceilings throttle high-context models; a sketch of that offload setting follows below.
Source: Reddit – /r/LocalLLaMA (mjbgky4)
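The knob that report refers to is llama.cpp's partial GPU offload. A hedged sketch via llama-cpp-python; the layer count, context size, and model path are placeholders to tune against your own 12 GB budget:

```python
# Partial GPU offload for a model that doesn't fully fit in 12 GB.
# n_gpu_layers is llama.cpp's offload knob; the values below are
# starting guesses, not recommendations for every rig.
from llama_cpp import Llama

llm = Llama(
    model_path="qwq-32b-Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=40,  # raise until VRAM use sits near (not at) 12 GB
    n_ctx=8192,       # context also consumes VRAM; shrink it if layers won't fit
)
print(llm("Hello", max_tokens=16)["choices"][0]["text"])
```

The trade-off is direct: every layer left on the CPU costs tokens/sec, but overshooting VRAM forces swapping and collapses throughput entirely, which is the 3.7 tok/s wall described above.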
RTX 4070 Ti is rated at 285 W, ships with 12 GB GDDR6X, and relies on the 16-pin 12VHPWR connector. NVIDIA’s PSU guidance is 700 W or higher.
Source: TechPowerUp – RTX 4070 Ti Specs
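To sanity-check the 285 W rating on your own card while a model is generating, nvidia-smi's query interface can sample power draw and memory use. A small sketch, assuming the NVIDIA driver's nvidia-smi is on PATH:

```python
# Sample GPU power draw and VRAM use every 2 seconds during inference.
import subprocess
import time

for _ in range(5):
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=power.draw,memory.used",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    print(out)  # e.g. "271.3 W, 11042 MiB"
    time.sleep(2)
```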
On 3 Nov 2025, Amazon listed the 4070 Ti at $799 (in stock), Newegg at $849 (in stock), and Best Buy at $799 (out of stock).
Source: Supabase price tracker snapshot – 2025-11-03
Explore how these alternatives stack up for local inference workloads:

- RTX 4060 Ti 16GB
- RX 6800 XT
- RTX 3080