Quick Answer: The NVIDIA RTX 6000 Ada offers 48GB of VRAM and starts around $4,999. It delivers an estimated 192 tokens/sec on apple/OpenELM-1_1B-Instruct (its fastest Q4 result below) and typically draws 300W under load.
This GPU offers reliable throughput for local AI workloads. Pair it with the right model quantization to hit your target tokens/sec, and monitor prices below to catch the best deal.
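If you want to sanity-check these numbers yourself, the snippet below is a minimal sketch of loading one of the listed models at 4-bit (Q4-equivalent) precision with Hugging Face Transformers and bitsandbytes. It assumes a CUDA build of PyTorch plus the `transformers`, `accelerate`, and `bitsandbytes` packages are installed; swap `model_id` for any table entry that fits your VRAM.

```python
# Minimal sketch: run a table entry in 4-bit on the RTX 6000 Ada.
# Assumes: torch (CUDA build), transformers, accelerate, bitsandbytes.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-3.2-1B-Instruct"  # any model from the table below

quant = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=quant, device_map="auto"
)

inputs = tokenizer("The RTX 6000 Ada is", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```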
| Model | Quantization | Tokens/sec | VRAM used |
|---|---|---|---|
| apple/OpenELM-1_1B-Instruct | Q4 | 192.20 tok/s | 1GB |
| allenai/OLMo-2-0425-1B | Q4 | 189.90 tok/s | 1GB |
| google/gemma-3-1b-it | Q4 | 186.76 tok/s | 1GB |
| unsloth/Llama-3.2-1B-Instruct | Q4 | 180.86 tok/s | 1GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | 176.80 tok/s | 1GB |
| meta-llama/Llama-Guard-3-1B | Q4 | 175.85 tok/s | 1GB |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | 170.05 tok/s | 1GB |
| meta-llama/Llama-3.2-1B | Q4 | 167.62 tok/s | 1GB |
| unsloth/gemma-3-1b-it | Q4 | 166.50 tok/s | 1GB |
| LiquidAI/LFM2-1.2B | Q4 | 151.65 tok/s | 1GB |
| google/gemma-2-2b-it | Q4 | 148.05 tok/s | 1GB |
| google/gemma-2b | Q4 | 137.77 tok/s | 1GB |
| meta-llama/Llama-3.2-1B | Q8 | 137.23 tok/s | 1GB |
| meta-llama/Llama-Guard-3-1B | Q8 | 136.33 tok/s | 1GB |
| ibm-granite/granite-3.3-2b-instruct | Q4 | 134.23 tok/s | 1GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | 133.24 tok/s | 1GB |
| unsloth/Llama-3.2-1B-Instruct | Q8 | 132.18 tok/s | 1GB |
| google-t5/t5-3b | Q4 | 131.80 tok/s | 2GB |
| bigcode/starcoder2-3b | Q4 | 131.45 tok/s | 2GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 131.00 tok/s | 2GB |
| meta-llama/Llama-3.2-3B | Q4 | 127.97 tok/s | 2GB |
| Qwen/Qwen2.5-3B | Q4 | 125.26 tok/s | 2GB |
| meta-llama/Llama-3.2-1B-Instruct | Q8 | 124.33 tok/s | 1GB |
| Qwen/Qwen2.5-3B-Instruct | Q4 | 122.40 tok/s | 2GB |
| ibm-research/PowerMoE-3b | Q4 | 122.17 tok/s | 2GB |
| allenai/OLMo-2-0425-1B | Q8 | 120.42 tok/s | 1GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q4 | 116.15 tok/s | 2GB |
| apple/OpenELM-1_1B-Instruct | Q8 | 116.07 tok/s | 1GB |
| google/gemma-3-1b-it | Q8 | 115.42 tok/s | 1GB |
| unsloth/gemma-3-1b-it | Q8 | 114.24 tok/s | 1GB |
| inference-net/Schematron-3B | Q4 | 112.92 tok/s | 2GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q4 | 112.45 tok/s | 2GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | 112.02 tok/s | 2GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | 111.60 tok/s | 2GB |
| Qwen/Qwen2.5-1.5B | Q4 | 110.10 tok/s | 3GB |
| unsloth/Llama-3.2-3B-Instruct | Q4 | 109.75 tok/s | 2GB |
| Qwen/Qwen3-4B-Base | Q4 | 108.82 tok/s | 2GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | 108.13 tok/s | 3GB |
| LiquidAI/LFM2-1.2B | Q8 | 105.86 tok/s | 2GB |
| Qwen/Qwen2.5-Math-1.5B | Q4 | 105.48 tok/s | 3GB |
| ibm-granite/granite-3.3-2b-instruct | Q8 | 105.30 tok/s | 2GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q4 | 103.87 tok/s | 2GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q4 | 103.62 tok/s | 2GB |
| Qwen/Qwen2-1.5B-Instruct | Q4 | 102.84 tok/s | 3GB |
| google/gemma-2b | Q8 | 102.71 tok/s | 2GB |
| Qwen/Qwen2.5-0.5B | Q4 | 100.27 tok/s | 3GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | 99.77 tok/s | 2GB |
| Qwen/Qwen3-Embedding-4B | Q4 | 99.75 tok/s | 2GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q4 | 99.46 tok/s | 2GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q4 | 98.98 tok/s | 2GB |
| Qwen/Qwen3-Embedding-0.6B | Q4 | 98.75 tok/s | 3GB |
| Qwen/Qwen3-4B | Q4 | 98.58 tok/s | 2GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q4 | 98.10 tok/s | 3GB |
| Qwen/Qwen2-0.5B-Instruct | Q4 | 97.76 tok/s | 3GB |
| mistralai/Mistral-7B-v0.1 | Q4 | 97.44 tok/s | 4GB |
| huggyllama/llama-7b | Q4 | 97.44 tok/s | 4GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | 96.19 tok/s | 3GB |
| EleutherAI/gpt-neo-125m | Q4 | 95.94 tok/s | 4GB |
| llamafactory/tiny-random-Llama-3 | Q4 | 95.92 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q4 | 95.86 tok/s | 3GB |
| rednote-hilab/dots.ocr | Q4 | 95.72 tok/s | 4GB |
| zai-org/GLM-4.5-Air | Q4 | 95.38 tok/s | 4GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q4 | 94.86 tok/s | 4GB |
| google/gemma-2-2b-it | Q8 | 94.59 tok/s | 2GB |
| sshleifer/tiny-gpt2 | Q4 | 94.30 tok/s | 4GB |
| microsoft/DialoGPT-medium | Q4 | 94.10 tok/s | 4GB |
| Qwen/Qwen3-1.7B | Q4 | 94.07 tok/s | 4GB |
| zai-org/GLM-4.6-FP8 | Q4 | 94.03 tok/s | 4GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q4 | 93.80 tok/s | 4GB |
| Qwen/Qwen3-Reranker-0.6B | Q4 | 93.33 tok/s | 3GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q4 | 93.33 tok/s | 4GB |
| hmellor/tiny-random-LlamaForCausalLM | Q4 | 93.19 tok/s | 4GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | 92.82 tok/s | 3GB |
| meta-llama/Llama-2-7b-chat-hf | Q4 | 92.60 tok/s | 4GB |
| Qwen/Qwen3-8B-Base | Q4 | 92.55 tok/s | 4GB |
| meta-llama/Llama-2-7b-hf | Q4 | 92.44 tok/s | 4GB |
| google/gemma-3-270m-it | Q4 | 92.30 tok/s | 4GB |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | 92.29 tok/s | 3GB |
| meta-llama/Llama-Guard-3-8B | Q4 | 92.21 tok/s | 4GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q4 | 92.12 tok/s | 3GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | 91.77 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3-0324 | Q4 | 91.37 tok/s | 4GB |
| Qwen/Qwen3-0.6B | Q4 | 91.08 tok/s | 3GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | 90.91 tok/s | 4GB |
| Qwen/Qwen3-8B | Q4 | 90.82 tok/s | 4GB |
| Qwen/Qwen2-0.5B | Q4 | 90.71 tok/s | 3GB |
| microsoft/VibeVoice-1.5B | Q4 | 90.70 tok/s | 3GB |
| GSAI-ML/LLaDA-8B-Base | Q4 | 90.57 tok/s | 4GB |
| dicta-il/dictalm2.0-instruct | Q4 | 90.53 tok/s | 4GB |
| lmsys/vicuna-7b-v1.5 | Q4 | 90.42 tok/s | 4GB |
| vikhyatk/moondream2 | Q4 | 90.40 tok/s | 4GB |
| Qwen/Qwen3-1.7B-Base | Q4 | 90.19 tok/s | 4GB |
| numind/NuExtract-1.5 | Q4 | 90.07 tok/s | 4GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q4 | 89.90 tok/s | 4GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q4 | 89.78 tok/s | 4GB |
| microsoft/DialoGPT-small | Q4 | 89.59 tok/s | 4GB |
| ibm-research/PowerMoE-3b | Q8 | 89.44 tok/s | 3GB |
| HuggingFaceH4/zephyr-7b-beta | Q4 | 89.33 tok/s | 4GB |
| distilbert/distilgpt2 | Q4 | 89.29 tok/s | 4GB |
| MiniMaxAI/MiniMax-M2 | Q4 | 89.26 tok/s | 4GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q4 | 88.88 tok/s | 4GB |
| ibm-granite/granite-docling-258M | Q4 | 88.75 tok/s | 4GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q4 | 88.47 tok/s | 4GB |
| microsoft/Phi-4-mini-instruct | Q4 | 88.41 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-0528 | Q4 | 88.31 tok/s | 4GB |
| microsoft/Phi-3-mini-4k-instruct | Q4 | 88.04 tok/s | 4GB |
| EleutherAI/pythia-70m-deduped | Q4 | 88.00 tok/s | 4GB |
| openai-community/gpt2-xl | Q4 | 87.94 tok/s | 4GB |
| GSAI-ML/LLaDA-8B-Instruct | Q4 | 87.59 tok/s | 4GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 87.28 tok/s | 4GB |
| meta-llama/Llama-3.2-3B | Q8 | 87.02 tok/s | 3GB |
| meta-llama/Llama-3.1-8B | Q4 | 86.99 tok/s | 4GB |
| HuggingFaceTB/SmolLM2-135M | Q4 | 86.52 tok/s | 4GB |
| Qwen/Qwen2-7B-Instruct | Q4 | 86.46 tok/s | 4GB |
| Qwen/Qwen3-0.6B-Base | Q4 | 86.19 tok/s | 3GB |
| openai-community/gpt2 | Q4 | 86.16 tok/s | 4GB |
| IlyaGusev/saiga_llama3_8b | Q4 | 85.81 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3 | Q4 | 85.74 tok/s | 4GB |
| microsoft/phi-2 | Q4 | 85.36 tok/s | 4GB |
| rinna/japanese-gpt-neox-small | Q4 | 85.19 tok/s | 4GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | 85.04 tok/s | 4GB |
| openai-community/gpt2-medium | Q4 | 85.04 tok/s | 4GB |
| BSC-LT/salamandraTA-7b-instruct | Q4 | 84.77 tok/s | 4GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q4 | 84.32 tok/s | 4GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q4 | 84.31 tok/s | 4GB |
| microsoft/phi-4 | Q4 | 84.26 tok/s | 4GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q4 | 84.10 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q4 | 84.08 tok/s | 4GB |
| ibm-granite/granite-3.3-8b-instruct | Q4 | 84.05 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3.1 | Q4 | 83.99 tok/s | 4GB |
| Qwen/Qwen2.5-3B-Instruct | Q8 | 83.86 tok/s | 3GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q4 | 83.71 tok/s | 4GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | 83.67 tok/s | 4GB |
| microsoft/Phi-4-multimodal-instruct | Q4 | 83.48 tok/s | 4GB |
| skt/kogpt2-base-v2 | Q4 | 82.89 tok/s | 4GB |
| bigcode/starcoder2-3b | Q8 | 82.76 tok/s | 3GB |
| openai-community/gpt2-large | Q4 | 82.69 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1 | Q4 | 82.35 tok/s | 4GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q4 | 82.32 tok/s | 4GB |
| Qwen/Qwen2.5-3B | Q8 | 82.26 tok/s | 3GB |
| bigscience/bloomz-560m | Q4 | 82.15 tok/s | 4GB |
| microsoft/Phi-3.5-vision-instruct | Q4 | 81.99 tok/s | 4GB |
| Qwen/Qwen3-8B-FP8 | Q4 | 81.98 tok/s | 4GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q4 | 81.65 tok/s | 5GB |
| google-t5/t5-3b | Q8 | 81.58 tok/s | 3GB |
| HuggingFaceTB/SmolLM-135M | Q4 | 81.42 tok/s | 4GB |
| liuhaotian/llava-v1.5-7b | Q4 | 81.41 tok/s | 4GB |
| facebook/opt-125m | Q4 | 81.34 tok/s | 4GB |
| petals-team/StableBeluga2 | Q4 | 81.34 tok/s | 4GB |
| microsoft/Phi-3-mini-128k-instruct | Q4 | 81.25 tok/s | 4GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q4 | 81.07 tok/s | 4GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q8 | 80.92 tok/s | 3GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | 80.77 tok/s | 3GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q8 | 80.64 tok/s | 4GB |
| Qwen/Qwen3-4B | Q8 | 80.54 tok/s | 4GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | 80.36 tok/s | 4GB |
| inference-net/Schematron-3B | Q8 | 80.28 tok/s | 3GB |
| parler-tts/parler-tts-large-v1 | Q4 | 80.23 tok/s | 4GB |
| Qwen/Qwen2.5-7B | Q4 | 80.12 tok/s | 4GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | 79.82 tok/s | 4GB |
| Qwen/Qwen3-Embedding-8B | Q4 | 79.69 tok/s | 4GB |
| meta-llama/Meta-Llama-3-8B | Q4 | 79.69 tok/s | 4GB |
| Qwen/Qwen3-4B-Base | Q8 | 78.94 tok/s | 4GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q4 | 78.59 tok/s | 4GB |
| unsloth/Llama-3.2-3B-Instruct | Q8 | 78.21 tok/s | 3GB |
| Qwen/Qwen2-0.5B | Q8 | 77.15 tok/s | 5GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q8 | 76.59 tok/s | 4GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q8 | 76.44 tok/s | 4GB |
| Qwen/Qwen2.5-Math-1.5B | Q8 | 75.45 tok/s | 5GB |
| Qwen/Qwen2-1.5B-Instruct | Q8 | 75.38 tok/s | 5GB |
| Qwen/Qwen2-0.5B-Instruct | Q8 | 75.31 tok/s | 5GB |
| Qwen/Qwen2.5-14B | Q4 | 75.07 tok/s | 7GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q8 | 74.70 tok/s | 5GB |
| Qwen/Qwen3-14B-Base | Q4 | 74.07 tok/s | 7GB |
| ai-forever/ruGPT-3.5-13B | Q4 | 73.76 tok/s | 7GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | 73.56 tok/s | 7GB |
| Qwen/Qwen3-14B | Q4 | 72.68 tok/s | 7GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q8 | 72.50 tok/s | 5GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q8 | 71.84 tok/s | 4GB |
| Qwen/Qwen2.5-Coder-1.5B | Q8 | 71.75 tok/s | 5GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | 71.50 tok/s | 5GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q8 | 71.09 tok/s | 4GB |
| Qwen/Qwen2.5-1.5B | Q8 | 70.98 tok/s | 5GB |
| meta-llama/Llama-2-13b-chat-hf | Q4 | 70.91 tok/s | 7GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q8 | 70.34 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q8 | 69.76 tok/s | 5GB |
| Qwen/Qwen3-Embedding-4B | Q8 | 69.70 tok/s | 4GB |
| OpenPipe/Qwen3-14B-Instruct | Q4 | 69.21 tok/s | 7GB |
| Qwen/Qwen3-Reranker-0.6B | Q8 | 69.14 tok/s | 6GB |
| Qwen/Qwen3-0.6B | Q8 | 68.27 tok/s | 6GB |
| mistralai/Mistral-7B-v0.1 | Q8 | 68.26 tok/s | 7GB |
| HuggingFaceTB/SmolLM2-135M | Q8 | 68.17 tok/s | 7GB |
| Qwen/Qwen2.5-0.5B | Q8 | 68.13 tok/s | 5GB |
| openai-community/gpt2-medium | Q8 | 67.96 tok/s | 7GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q8 | 67.75 tok/s | 7GB |
| huggyllama/llama-7b | Q8 | 67.65 tok/s | 7GB |
| EleutherAI/pythia-70m-deduped | Q8 | 67.12 tok/s | 7GB |
| lmsys/vicuna-7b-v1.5 | Q8 | 67.01 tok/s | 7GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | 66.89 tok/s | 7GB |
| openai-community/gpt2-large | Q8 | 66.58 tok/s | 7GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q8 | 66.54 tok/s | 7GB |
| microsoft/VibeVoice-1.5B | Q8 | 66.52 tok/s | 5GB |
| dicta-il/dictalm2.0-instruct | Q8 | 66.48 tok/s | 7GB |
| Qwen/Qwen3-Embedding-0.6B | Q8 | 66.47 tok/s | 6GB |
| deepseek-ai/DeepSeek-R1-0528 | Q8 | 66.24 tok/s | 7GB |
| openai-community/gpt2 | Q8 | 66.02 tok/s | 7GB |
| meta-llama/Llama-2-7b-hf | Q8 | 65.30 tok/s | 7GB |
| EleutherAI/gpt-neo-125m | Q8 | 65.17 tok/s | 7GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q8 | 65.17 tok/s | 7GB |
| distilbert/distilgpt2 | Q8 | 64.83 tok/s | 7GB |
| meta-llama/Llama-2-7b-chat-hf | Q8 | 64.51 tok/s | 7GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q8 | 64.44 tok/s | 8GB |
| microsoft/DialoGPT-small | Q8 | 64.40 tok/s | 7GB |
| Qwen/Qwen3-1.7B-Base | Q8 | 64.16 tok/s | 7GB |
| microsoft/Phi-4-multimodal-instruct | Q8 | 64.09 tok/s | 7GB |
| HuggingFaceTB/SmolLM-135M | Q8 | 64.05 tok/s | 7GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q8 | 63.93 tok/s | 7GB |
| liuhaotian/llava-v1.5-7b | Q8 | 63.91 tok/s | 7GB |
| Qwen/Qwen2-7B-Instruct | Q8 | 63.73 tok/s | 7GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q8 | 63.70 tok/s | 5GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q8 | 63.50 tok/s | 7GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | 63.43 tok/s | 8GB |
| zai-org/GLM-4.5-Air | Q8 | 63.33 tok/s | 7GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q8 | 63.23 tok/s | 8GB |
| Qwen/Qwen3-8B | Q8 | 62.86 tok/s | 8GB |
| GSAI-ML/LLaDA-8B-Instruct | Q8 | 62.77 tok/s | 8GB |
| llamafactory/tiny-random-Llama-3 | Q8 | 62.67 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1 | Q8 | 62.64 tok/s | 7GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q8 | 62.38 tok/s | 7GB |
| BSC-LT/salamandraTA-7b-instruct | Q8 | 62.26 tok/s | 7GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q8 | 62.24 tok/s | 7GB |
| Qwen/Qwen2.5-7B | Q8 | 61.84 tok/s | 7GB |
| rednote-hilab/dots.ocr | Q8 | 61.75 tok/s | 7GB |
| facebook/opt-125m | Q8 | 61.55 tok/s | 7GB |
| hmellor/tiny-random-LlamaForCausalLM | Q8 | 61.54 tok/s | 7GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q8 | 61.49 tok/s | 8GB |
| ibm-granite/granite-docling-258M | Q8 | 61.34 tok/s | 7GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | 61.32 tok/s | 7GB |
| openai-community/gpt2-xl | Q8 | 61.23 tok/s | 7GB |
| microsoft/phi-4 | Q8 | 60.83 tok/s | 7GB |
| IlyaGusev/saiga_llama3_8b | Q8 | 60.83 tok/s | 8GB |
| GSAI-ML/LLaDA-8B-Base | Q8 | 60.67 tok/s | 8GB |
| microsoft/Phi-3-mini-4k-instruct | Q8 | 60.64 tok/s | 7GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q8 | 60.61 tok/s | 7GB |
| microsoft/phi-2 | Q8 | 60.41 tok/s | 7GB |
| deepseek-ai/DeepSeek-V3 | Q8 | 60.22 tok/s | 7GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | 60.12 tok/s | 8GB |
| google/gemma-3-270m-it | Q8 | 59.94 tok/s | 7GB |
| Qwen/Qwen3-0.6B-Base | Q8 | 59.76 tok/s | 6GB |
| deepseek-ai/DeepSeek-V3.1 | Q8 | 59.52 tok/s | 7GB |
| rinna/japanese-gpt-neox-small | Q8 | 59.16 tok/s | 7GB |
| bigscience/bloomz-560m | Q8 | 59.09 tok/s | 7GB |
| microsoft/Phi-3.5-vision-instruct | Q8 | 59.09 tok/s | 7GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q8 | 59.08 tok/s | 8GB |
| numind/NuExtract-1.5 | Q8 | 59.01 tok/s | 7GB |
| parler-tts/parler-tts-large-v1 | Q8 | 58.87 tok/s | 7GB |
| sshleifer/tiny-gpt2 | Q8 | 58.82 tok/s | 7GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q8 | 58.61 tok/s | 8GB |
| deepseek-ai/DeepSeek-V3-0324 | Q8 | 58.41 tok/s | 7GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q8 | 58.18 tok/s | 8GB |
| ibm-granite/granite-3.3-8b-instruct | Q8 | 58.13 tok/s | 8GB |
| microsoft/Phi-3-mini-128k-instruct | Q8 | 58.11 tok/s | 7GB |
| unsloth/gpt-oss-20b-BF16 | Q4 | 58.10 tok/s | 10GB |
| petals-team/StableBeluga2 | Q8 | 57.97 tok/s | 7GB |
| meta-llama/Meta-Llama-3-8B | Q8 | 57.69 tok/s | 8GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | 57.54 tok/s | 7GB |
| zai-org/GLM-4.6-FP8 | Q8 | 57.53 tok/s | 7GB |
| microsoft/DialoGPT-medium | Q8 | 57.50 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | 57.42 tok/s | 7GB |
| MiniMaxAI/MiniMax-M2 | Q8 | 57.23 tok/s | 7GB |
| openai/gpt-oss-20b | Q4 | 57.06 tok/s | 10GB |
| microsoft/Phi-4-mini-instruct | Q8 | 56.88 tok/s | 7GB |
| HuggingFaceH4/zephyr-7b-beta | Q8 | 56.81 tok/s | 7GB |
| skt/kogpt2-base-v2 | Q8 | 56.62 tok/s | 7GB |
| Qwen/Qwen3-1.7B | Q8 | 56.60 tok/s | 7GB |
| meta-llama/Llama-Guard-3-8B | Q8 | 56.60 tok/s | 8GB |
| vikhyatk/moondream2 | Q8 | 56.39 tok/s | 7GB |
| Qwen/Qwen3-8B-FP8 | Q8 | 56.22 tok/s | 8GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | 56.15 tok/s | 8GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q4 | 54.37 tok/s | 15GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q4 | 54.32 tok/s | 10GB |
| Qwen/Qwen3-8B-Base | Q8 | 53.92 tok/s | 8GB |
| meta-llama/Llama-3.1-8B | Q8 | 53.76 tok/s | 8GB |
| Qwen/Qwen3-Embedding-8B | Q8 | 53.49 tok/s | 8GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q4 | 53.39 tok/s | 10GB |
| meta-llama/Llama-2-13b-chat-hf | Q8 | 53.32 tok/s | 13GB |
| Qwen/Qwen2.5-32B-Instruct | Q4 | 52.96 tok/s | 16GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q8 | 52.93 tok/s | 9GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q4 | 52.14 tok/s | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q4 | 52.06 tok/s | 15GB |
| Qwen/Qwen2.5-14B | Q8 | 51.69 tok/s | 14GB |
| Qwen/Qwen3-30B-A3B | Q4 | 51.35 tok/s | 15GB |
| Qwen/Qwen3-14B-Base | Q8 | 51.12 tok/s | 14GB |
| codellama/CodeLlama-34b-hf | Q4 | 49.89 tok/s | 17GB |
| baichuan-inc/Baichuan-M2-32B | Q4 | 49.34 tok/s | 16GB |
| Qwen/Qwen3-14B | Q8 | 49.15 tok/s | 14GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q4 | 48.99 tok/s | 15GB |
| OpenPipe/Qwen3-14B-Instruct | Q8 | 48.68 tok/s | 14GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q4 | 48.12 tok/s | 15GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q4 | 47.71 tok/s | 15GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q4 | 47.41 tok/s | 15GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q4 | 47.35 tok/s | 15GB |
| Qwen/Qwen2.5-14B-Instruct | Q8 | 46.98 tok/s | 14GB |
| Qwen/Qwen2.5-32B | Q4 | 46.23 tok/s | 16GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q4 | 46.02 tok/s | 16GB |
| ai-forever/ruGPT-3.5-13B | Q8 | 45.74 tok/s | 13GB |
| openai/gpt-oss-20b | Q8 | 45.36 tok/s | 20GB |
| Qwen/Qwen3-32B | Q4 | 44.83 tok/s | 16GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q4 | 44.61 tok/s | 16GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q8 | 44.31 tok/s | 20GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q4 | 43.39 tok/s | 17GB |
| unsloth/gpt-oss-20b-BF16 | Q8 | 41.98 tok/s | 20GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q8 | 41.36 tok/s | 20GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q8 | 36.82 tok/s | 30GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q8 | 36.62 tok/s | 30GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q8 | 36.02 tok/s | 30GB |
| baichuan-inc/Baichuan-M2-32B | Q8 | 35.49 tok/s | 32GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q8 | 35.47 tok/s | 34GB |
| Qwen/Qwen2.5-32B | Q8 | 35.21 tok/s | 32GB |
| Qwen/Qwen3-32B | Q8 | 34.42 tok/s | 32GB |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q4 | 34.29 tok/s | 35GB |
| Qwen/Qwen3-30B-A3B | Q8 | 33.94 tok/s | 30GB |
| codellama/CodeLlama-34b-hf | Q8 | 33.36 tok/s | 34GB |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | 33.33 tok/s | 35GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q8 | 33.28 tok/s | 30GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q8 | 32.97 tok/s | 30GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q8 | 32.96 tok/s | 30GB |
| Qwen/Qwen2.5-72B-Instruct | Q4 | 32.93 tok/s | 36GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q8 | 32.65 tok/s | 32GB |
| Qwen/Qwen2.5-32B-Instruct | Q8 | 32.55 tok/s | 32GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q8 | 32.43 tok/s | 30GB |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q4 | 31.88 tok/s | 35GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q8 | 31.76 tok/s | 32GB |
| meta-llama/Meta-Llama-3-70B-Instruct | Q4 | 31.69 tok/s | 35GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q8 | 31.37 tok/s | 30GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q4 | 30.05 tok/s | 40GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q4 | 30.00 tok/s | 40GB |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | 29.70 tok/s | 35GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q4 | 28.99 tok/s | 40GB |
| AI-MO/Kimina-Prover-72B | Q4 | 28.65 tok/s | 36GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q4 | 28.22 tok/s | 40GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q4 | 26.38 tok/s | 45GB |
Note: Performance estimates are calculated, not measured. Real results may vary.
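The page does not publish its estimation formula, but the "VRAM used" column above is consistent with simple weights-only math: bytes ≈ parameter count × (bits per weight ÷ 8). The sketch below encodes that rule of thumb (my assumption, not the site's stated methodology); real deployments need extra headroom for the KV cache and activations.

```python
# Weights-only VRAM rule of thumb, consistent with the table above.
# Assumption: GB ≈ params (billions) × bits / 8; KV cache and activations add more.
def est_vram_gb(params_billion: float, bits: int) -> float:
    return params_billion * bits / 8

# Approximate parameter counts, for illustration only.
for name, params_b in [("Llama-3.2-1B", 1.2), ("Mistral-7B", 7.2), ("Llama-3.3-70B", 70.6)]:
    for bits in (4, 8):
        print(f"{name} Q{bits}: ~{est_vram_gb(params_b, bits):.1f} GB")
```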
| Model | Quantization | Verdict | Estimated speed | VRAM needed |
|---|---|---|---|---|
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q8 | Not supported | — | 80GB (have 48GB) |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q4 | Fits comfortably | 30.05 tok/s | 40GB (have 48GB) |
| ai-forever/ruGPT-3.5-13B | Q8 | Fits comfortably | 45.74 tok/s | 13GB (have 48GB) |
| ai-forever/ruGPT-3.5-13B | Q4 | Fits comfortably | 73.76 tok/s | 7GB (have 48GB) |
| baichuan-inc/Baichuan-M2-32B | Q8 | Fits comfortably | 35.49 tok/s | 32GB (have 48GB) |
| baichuan-inc/Baichuan-M2-32B | Q4 | Fits comfortably | 49.34 tok/s | 16GB (have 48GB) |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q8 | Fits comfortably | 62.24 tok/s | 7GB (have 48GB) |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 93.33 tok/s | 4GB (have 48GB) |
| ibm-granite/granite-3.3-8b-instruct | Q8 | Fits comfortably | 58.13 tok/s | 8GB (have 48GB) |
| ibm-granite/granite-3.3-8b-instruct | Q4 | Fits comfortably | 84.05 tok/s | 4GB (have 48GB) |
| Qwen/Qwen3-1.7B-Base | Q8 | Fits comfortably | 64.16 tok/s | 7GB (have 48GB) |
| Qwen/Qwen3-1.7B-Base | Q4 | Fits comfortably | 90.19 tok/s | 4GB (have 48GB) |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q8 | Fits comfortably | 41.36 tok/s | 20GB (have 48GB) |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q4 | Fits comfortably | 54.32 tok/s | 10GB (have 48GB) |
| BSC-LT/salamandraTA-7b-instruct | Q8 | Fits comfortably | 62.26 tok/s | 7GB (have 48GB) |
| BSC-LT/salamandraTA-7b-instruct | Q4 | Fits comfortably | 84.77 tok/s | 4GB (have 48GB) |
| dicta-il/dictalm2.0-instruct | Q8 | Fits comfortably | 66.48 tok/s | 7GB (have 48GB) |
| dicta-il/dictalm2.0-instruct | Q4 | Fits comfortably | 90.53 tok/s | 4GB (have 48GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q8 | Fits comfortably | 36.62 tok/s | 30GB (have 48GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q4 | Fits comfortably | 52.06 tok/s | 15GB (have 48GB) |
| GSAI-ML/LLaDA-8B-Base | Q8 | Fits comfortably | 60.67 tok/s | 8GB (have 48GB) |
| GSAI-ML/LLaDA-8B-Base | Q4 | Fits comfortably | 90.57 tok/s | 4GB (have 48GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q8 | Fits comfortably | 32.43 tok/s | 30GB (have 48GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q4 | Fits comfortably | 48.12 tok/s | 15GB (have 48GB) |
| Qwen/Qwen2-0.5B-Instruct | Q8 | Fits comfortably | 75.31 tok/s | 5GB (have 48GB) |
| Qwen/Qwen2-0.5B-Instruct | Q4 | Fits comfortably | 97.76 tok/s | 3GB (have 48GB) |
| deepseek-ai/DeepSeek-V3 | Q8 | Fits comfortably | 60.22 tok/s | 7GB (have 48GB) |
| deepseek-ai/DeepSeek-V3 | Q4 | Fits comfortably | 85.74 tok/s | 4GB (have 48GB) |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q8 | Fits comfortably | 74.70 tok/s | 5GB (have 48GB) |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | Fits comfortably | 108.13 tok/s | 3GB (have 48GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q8 | Fits comfortably | 36.82 tok/s | 30GB (have 48GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q4 | Fits comfortably | 54.37 tok/s | 15GB (have 48GB) |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q8 | Fits comfortably | 36.02 tok/s | 30GB (have 48GB) |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q4 | Fits comfortably | 47.71 tok/s | 15GB (have 48GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q8 | Fits comfortably | 31.37 tok/s | 30GB (have 48GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q4 | Fits comfortably | 48.99 tok/s | 15GB (have 48GB) |
| AI-MO/Kimina-Prover-72B | Q8 | Not supported | — | 72GB (have 48GB) |
| AI-MO/Kimina-Prover-72B | Q4 | Fits comfortably | 28.65 tok/s | 36GB (have 48GB) |
| apple/OpenELM-1_1B-Instruct | Q8 | Fits comfortably | 116.07 tok/s | 1GB (have 48GB) |
| apple/OpenELM-1_1B-Instruct | Q4 | Fits comfortably | 192.20 tok/s | 1GB (have 48GB) |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 59.08 tok/s | 8GB (have 48GB) |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 84.10 tok/s | 4GB (have 48GB) |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q8 | Fits comfortably | 52.93 tok/s | 9GB (have 48GB) |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q4 | Fits comfortably | 81.65 tok/s | 5GB (have 48GB) |
| Qwen/Qwen2.5-3B | Q8 | Fits comfortably | 82.26 tok/s | 3GB (have 48GB) |
| Qwen/Qwen2.5-3B | Q4 | Fits comfortably | 125.26 tok/s | 2GB (have 48GB) |
| lmsys/vicuna-7b-v1.5 | Q8 | Fits comfortably | 67.01 tok/s | 7GB (have 48GB) |
| lmsys/vicuna-7b-v1.5 | Q4 | Fits comfortably | 90.42 tok/s | 4GB (have 48GB) |
| meta-llama/Llama-2-13b-chat-hf | Q8 | Fits comfortably | 53.32 tok/s | 13GB (have 48GB) |
| meta-llama/Llama-2-13b-chat-hf | Q4 | Fits comfortably | 70.91 tok/s | 7GB (have 48GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q8 | Not supported | — | 80GB (have 48GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q4 | Fits comfortably | 28.22 tok/s | 40GB (have 48GB) |
| unsloth/gemma-3-1b-it | Q8 | Fits comfortably | 114.24 tok/s | 1GB (have 48GB) |
| unsloth/gemma-3-1b-it | Q4 | Fits comfortably | 166.50 tok/s | 1GB (have 48GB) |
| bigcode/starcoder2-3b | Q8 | Fits comfortably | 82.76 tok/s | 3GB (have 48GB) |
| bigcode/starcoder2-3b | Q4 | Fits comfortably | 131.45 tok/s | 2GB (have 48GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q8 | Not supported | — | 80GB (have 48GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q4 | Fits comfortably | 28.99 tok/s | 40GB (have 48GB) |
| ibm-granite/granite-docling-258M | Q8 | Fits comfortably | 61.34 tok/s | 7GB (have 48GB) |
| ibm-granite/granite-docling-258M | Q4 | Fits comfortably | 88.75 tok/s | 4GB (have 48GB) |
| skt/kogpt2-base-v2 | Q8 | Fits comfortably | 56.62 tok/s | 7GB (have 48GB) |
| skt/kogpt2-base-v2 | Q4 | Fits comfortably | 82.89 tok/s | 4GB (have 48GB) |
| google/gemma-3-270m-it | Q8 | Fits comfortably | 59.94 tok/s | 7GB (have 48GB) |
| google/gemma-3-270m-it | Q4 | Fits comfortably | 92.30 tok/s | 4GB (have 48GB) |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q8 | Fits comfortably | 76.44 tok/s | 4GB (have 48GB) |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | Fits comfortably | 99.77 tok/s | 2GB (have 48GB) |
| Qwen/Qwen2.5-32B | Q8 | Fits comfortably | 35.21 tok/s | 32GB (have 48GB) |
| Qwen/Qwen2.5-32B | Q4 | Fits comfortably | 46.23 tok/s | 16GB (have 48GB) |
| parler-tts/parler-tts-large-v1 | Q8 | Fits comfortably | 58.87 tok/s | 7GB (have 48GB) |
| parler-tts/parler-tts-large-v1 | Q4 | Fits comfortably | 80.23 tok/s | 4GB (have 48GB) |
| EleutherAI/pythia-70m-deduped | Q8 | Fits comfortably | 67.12 tok/s | 7GB (have 48GB) |
| EleutherAI/pythia-70m-deduped | Q4 | Fits comfortably | 88.00 tok/s | 4GB (have 48GB) |
| microsoft/VibeVoice-1.5B | Q8 | Fits comfortably | 66.52 tok/s | 5GB (have 48GB) |
| microsoft/VibeVoice-1.5B | Q4 | Fits comfortably | 90.70 tok/s | 3GB (have 48GB) |
| ibm-granite/granite-3.3-2b-instruct | Q8 | Fits comfortably | 105.30 tok/s | 2GB (have 48GB) |
| ibm-granite/granite-3.3-2b-instruct | Q4 | Fits comfortably | 134.23 tok/s | 1GB (have 48GB) |
| Qwen/Qwen2.5-72B-Instruct | Q8 | Not supported | — | 72GB (have 48GB) |
| Qwen/Qwen2.5-72B-Instruct | Q4 | Fits comfortably | 32.93 tok/s | 36GB (have 48GB) |
| liuhaotian/llava-v1.5-7b | Q8 | Fits comfortably | 63.91 tok/s | 7GB (have 48GB) |
| liuhaotian/llava-v1.5-7b | Q4 | Fits comfortably | 81.41 tok/s | 4GB (have 48GB) |
| google/gemma-2b | Q8 | Fits comfortably | 102.71 tok/s | 2GB (have 48GB) |
| google/gemma-2b | Q4 | Fits comfortably | 137.77 tok/s | 1GB (have 48GB) |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q8 | Fits comfortably | 62.38 tok/s | 7GB (have 48GB) |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q4 | Fits comfortably | 93.80 tok/s | 4GB (have 48GB) |
| Qwen/Qwen3-235B-A22B | Q8 | Not supported | — | 235GB (have 48GB) |
| Qwen/Qwen3-235B-A22B | Q4 | Not supported | — | 118GB (have 48GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q8 | Fits comfortably | 64.44 tok/s | 8GB (have 48GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q4 | Fits comfortably | 78.59 tok/s | 4GB (have 48GB) |
| microsoft/Phi-4-mini-instruct | Q8 | Fits comfortably | 56.88 tok/s | 7GB (have 48GB) |
| microsoft/Phi-4-mini-instruct | Q4 | Fits comfortably | 88.41 tok/s | 4GB (have 48GB) |
| llamafactory/tiny-random-Llama-3 | Q8 | Fits comfortably | 62.67 tok/s | 7GB (have 48GB) |
| llamafactory/tiny-random-Llama-3 | Q4 | Fits comfortably | 95.92 tok/s | 4GB (have 48GB) |
| HuggingFaceH4/zephyr-7b-beta | Q8 | Fits comfortably | 56.81 tok/s | 7GB (have 48GB) |
| HuggingFaceH4/zephyr-7b-beta | Q4 | Fits comfortably | 89.33 tok/s | 4GB (have 48GB) |
| Qwen/Qwen3-4B-Thinking-2507 | Q8 | Fits comfortably | 71.84 tok/s | 4GB (have 48GB) |
| Qwen/Qwen3-4B-Thinking-2507 | Q4 | Fits comfortably | 99.46 tok/s | 2GB (have 48GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q8 | Fits comfortably | 33.28 tok/s | 30GB (have 48GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q4 | Fits comfortably | 52.14 tok/s | 15GB (have 48GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q8 | Fits comfortably | 63.23 tok/s | 8GB (have 48GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q4 | Fits comfortably | 89.90 tok/s | 4GB (have 48GB) |
| unsloth/Llama-3.2-1B-Instruct | Q8 | Fits comfortably | 132.18 tok/s | 1GB (have 48GB) |
| unsloth/Llama-3.2-1B-Instruct | Q4 | Fits comfortably | 180.86 tok/s | 1GB (have 48GB) |
| GSAI-ML/LLaDA-8B-Instruct | Q8 | Fits comfortably | 62.77 tok/s | 8GB (have 48GB) |
| GSAI-ML/LLaDA-8B-Instruct | Q4 | Fits comfortably | 87.59 tok/s | 4GB (have 48GB) |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q8 | Not supported | — | 90GB (have 48GB) |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q4 | Fits comfortably | 26.38 tok/s | 45GB (have 48GB) |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q8 | Fits comfortably | 63.50 tok/s | 7GB (have 48GB) |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q4 | Fits comfortably | 94.86 tok/s | 4GB (have 48GB) |
| numind/NuExtract-1.5 | Q8 | Fits comfortably | 59.01 tok/s | 7GB (have 48GB) |
| numind/NuExtract-1.5 | Q4 | Fits comfortably | 90.07 tok/s | 4GB (have 48GB) |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q8 | Fits comfortably | 66.54 tok/s | 7GB (have 48GB) |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q4 | Fits comfortably | 88.47 tok/s | 4GB (have 48GB) |
| hmellor/tiny-random-LlamaForCausalLM | Q8 | Fits comfortably | 61.54 tok/s | 7GB (have 48GB) |
| hmellor/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 93.19 tok/s | 4GB (have 48GB) |
| huggyllama/llama-7b | Q8 | Fits comfortably | 67.65 tok/s | 7GB (have 48GB) |
| huggyllama/llama-7b | Q4 | Fits comfortably | 97.44 tok/s | 4GB (have 48GB) |
| deepseek-ai/DeepSeek-V3-0324 | Q8 | Fits comfortably | 58.41 tok/s | 7GB (have 48GB) |
| deepseek-ai/DeepSeek-V3-0324 | Q4 | Fits comfortably | 91.37 tok/s | 4GB (have 48GB) |
| microsoft/Phi-3-mini-128k-instruct | Q8 | Fits comfortably | 58.11 tok/s | 7GB (have 48GB) |
| microsoft/Phi-3-mini-128k-instruct | Q4 | Fits comfortably | 81.25 tok/s | 4GB (have 48GB) |
| sshleifer/tiny-gpt2 | Q8 | Fits comfortably | 58.82 tok/s | 7GB (have 48GB) |
| sshleifer/tiny-gpt2 | Q4 | Fits comfortably | 94.30 tok/s | 4GB (have 48GB) |
| meta-llama/Llama-Guard-3-8B | Q8 | Fits comfortably | 56.60 tok/s | 8GB (have 48GB) |
| meta-llama/Llama-Guard-3-8B | Q4 | Fits comfortably | 92.21 tok/s | 4GB (have 48GB) |
| openai-community/gpt2-xl | Q8 | Fits comfortably | 61.23 tok/s | 7GB (have 48GB) |
| openai-community/gpt2-xl | Q4 | Fits comfortably | 87.94 tok/s | 4GB (have 48GB) |
| OpenPipe/Qwen3-14B-Instruct | Q8 | Fits comfortably | 48.68 tok/s | 14GB (have 48GB) |
| OpenPipe/Qwen3-14B-Instruct | Q4 | Fits comfortably | 69.21 tok/s | 7GB (have 48GB) |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q8 | Not supported | — | 70GB (have 48GB) |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q4 | Fits comfortably | 31.88 tok/s | 35GB (have 48GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q8 | Fits comfortably | 70.34 tok/s | 4GB (have 48GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q4 | Fits comfortably | 103.87 tok/s | 2GB (have 48GB) |
| ibm-research/PowerMoE-3b | Q8 | Fits comfortably | 89.44 tok/s | 3GB (have 48GB) |
| ibm-research/PowerMoE-3b | Q4 | Fits comfortably | 122.17 tok/s | 2GB (have 48GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q8 | Fits comfortably | 71.09 tok/s | 4GB (have 48GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q4 | Fits comfortably | 116.15 tok/s | 2GB (have 48GB) |
| unsloth/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 78.21 tok/s | 3GB (have 48GB) |
| unsloth/Llama-3.2-3B-Instruct | Q4 | Fits comfortably | 109.75 tok/s | 2GB (have 48GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q8 | Fits comfortably | 80.64 tok/s | 4GB (have 48GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q4 | Fits comfortably | 112.45 tok/s | 2GB (have 48GB) |
| meta-llama/Llama-3.2-3B | Q8 | Fits comfortably | 87.02 tok/s | 3GB (have 48GB) |
| meta-llama/Llama-3.2-3B | Q4 | Fits comfortably | 127.97 tok/s | 2GB (have 48GB) |
| EleutherAI/gpt-neo-125m | Q8 | Fits comfortably | 65.17 tok/s | 7GB (have 48GB) |
| EleutherAI/gpt-neo-125m | Q4 | Fits comfortably | 95.94 tok/s | 4GB (have 48GB) |
| codellama/CodeLlama-34b-hf | Q8 | Fits comfortably | 33.36 tok/s | 34GB (have 48GB) |
| codellama/CodeLlama-34b-hf | Q4 | Fits comfortably | 49.89 tok/s | 17GB (have 48GB) |
| meta-llama/Llama-Guard-3-1B | Q8 | Fits comfortably | 136.33 tok/s | 1GB (have 48GB) |
| meta-llama/Llama-Guard-3-1B | Q4 | Fits comfortably | 175.85 tok/s | 1GB (have 48GB) |
| Qwen/Qwen2-1.5B-Instruct | Q8 | Fits comfortably | 75.38 tok/s | 5GB (have 48GB) |
| Qwen/Qwen2-1.5B-Instruct | Q4 | Fits comfortably | 102.84 tok/s | 3GB (have 48GB) |
| google/gemma-2-2b-it | Q8 | Fits comfortably | 94.59 tok/s | 2GB (have 48GB) |
| google/gemma-2-2b-it | Q4 | Fits comfortably | 148.05 tok/s | 1GB (have 48GB) |
| Qwen/Qwen2.5-14B | Q8 | Fits comfortably | 51.69 tok/s | 14GB (have 48GB) |
| Qwen/Qwen2.5-14B | Q4 | Fits comfortably | 75.07 tok/s | 7GB (have 48GB) |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q8 | Fits comfortably | 32.65 tok/s | 32GB (have 48GB) |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q4 | Fits comfortably | 46.02 tok/s | 16GB (have 48GB) |
| microsoft/Phi-3.5-mini-instruct | Q8 | Fits comfortably | 66.89 tok/s | 7GB (have 48GB) |
| microsoft/Phi-3.5-mini-instruct | Q4 | Fits comfortably | 80.36 tok/s | 4GB (have 48GB) |
| Qwen/Qwen3-4B-Base | Q8 | Fits comfortably | 78.94 tok/s | 4GB (have 48GB) |
| Qwen/Qwen3-4B-Base | Q4 | Fits comfortably | 108.82 tok/s | 2GB (have 48GB) |
| Qwen/Qwen2-7B-Instruct | Q8 | Fits comfortably | 63.73 tok/s | 7GB (have 48GB) |
| Qwen/Qwen2-7B-Instruct | Q4 | Fits comfortably | 86.46 tok/s | 4GB (have 48GB) |
| meta-llama/Llama-2-7b-chat-hf | Q8 | Fits comfortably | 64.51 tok/s | 7GB (have 48GB) |
| meta-llama/Llama-2-7b-chat-hf | Q4 | Fits comfortably | 92.60 tok/s | 4GB (have 48GB) |
| Qwen/Qwen3-14B-Base | Q8 | Fits comfortably | 51.12 tok/s | 14GB (have 48GB) |
| Qwen/Qwen3-14B-Base | Q4 | Fits comfortably | 74.07 tok/s | 7GB (have 48GB) |
| swiss-ai/Apertus-8B-Instruct-2509 | Q8 | Fits comfortably | 58.61 tok/s | 8GB (have 48GB) |
| swiss-ai/Apertus-8B-Instruct-2509 | Q4 | Fits comfortably | 84.31 tok/s | 4GB (have 48GB) |
| microsoft/Phi-3.5-vision-instruct | Q8 | Fits comfortably | 59.09 tok/s | 7GB (have 48GB) |
| microsoft/Phi-3.5-vision-instruct | Q4 | Fits comfortably | 81.99 tok/s | 4GB (have 48GB) |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q8 | Fits comfortably | 60.61 tok/s | 7GB (have 48GB) |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q4 | Fits comfortably | 89.78 tok/s | 4GB (have 48GB) |
| rinna/japanese-gpt-neox-small | Q8 | Fits comfortably | 59.16 tok/s | 7GB (have 48GB) |
| rinna/japanese-gpt-neox-small | Q4 | Fits comfortably | 85.19 tok/s | 4GB (have 48GB) |
| Qwen/Qwen2.5-Coder-1.5B | Q8 | Fits comfortably | 71.75 tok/s | 5GB (have 48GB) |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | Fits comfortably | 92.29 tok/s | 3GB (have 48GB) |
| IlyaGusev/saiga_llama3_8b | Q8 | Fits comfortably | 60.83 tok/s | 8GB (have 48GB) |
| IlyaGusev/saiga_llama3_8b | Q4 | Fits comfortably | 85.81 tok/s | 4GB (have 48GB) |
| Qwen/Qwen3-30B-A3B | Q8 | Fits comfortably | 33.94 tok/s | 30GB (have 48GB) |
| Qwen/Qwen3-30B-A3B | Q4 | Fits comfortably | 51.35 tok/s | 15GB (have 48GB) |
| deepseek-ai/DeepSeek-R1 | Q8 | Fits comfortably | 62.64 tok/s | 7GB (have 48GB) |
| deepseek-ai/DeepSeek-R1 | Q4 | Fits comfortably | 82.35 tok/s | 4GB (have 48GB) |
| microsoft/DialoGPT-small | Q8 | Fits comfortably | 64.40 tok/s | 7GB (have 48GB) |
| microsoft/DialoGPT-small | Q4 | Fits comfortably | 89.59 tok/s | 4GB (have 48GB) |
| Qwen/Qwen3-8B-FP8 | Q8 | Fits comfortably | 56.22 tok/s | 8GB (have 48GB) |
| Qwen/Qwen3-8B-FP8 | Q4 | Fits comfortably | 81.98 tok/s | 4GB (have 48GB) |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q8 | Fits comfortably | 32.96 tok/s | 30GB (have 48GB) |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q4 | Fits comfortably | 47.41 tok/s | 15GB (have 48GB) |
| Qwen/Qwen3-Embedding-4B | Q8 | Fits comfortably | 69.70 tok/s | 4GB (have 48GB) |
| Qwen/Qwen3-Embedding-4B | Q4 | Fits comfortably | 99.75 tok/s | 2GB (have 48GB) |
| microsoft/Phi-4-multimodal-instruct | Q8 | Fits comfortably | 64.09 tok/s | 7GB (have 48GB) |
| microsoft/Phi-4-multimodal-instruct | Q4 | Fits comfortably | 83.48 tok/s | 4GB (have 48GB) |
| Qwen/Qwen3-8B-Base | Q8 | Fits comfortably | 53.92 tok/s | 8GB (have 48GB) |
| Qwen/Qwen3-8B-Base | Q4 | Fits comfortably | 92.55 tok/s | 4GB (have 48GB) |
| Qwen/Qwen3-0.6B-Base | Q8 | Fits comfortably | 59.76 tok/s | 6GB (have 48GB) |
| Qwen/Qwen3-0.6B-Base | Q4 | Fits comfortably | 86.19 tok/s | 3GB (have 48GB) |
| openai-community/gpt2-medium | Q8 | Fits comfortably | 67.96 tok/s | 7GB (have 48GB) |
| openai-community/gpt2-medium | Q4 | Fits comfortably | 85.04 tok/s | 4GB (have 48GB) |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q8 | Fits comfortably | 63.93 tok/s | 7GB (have 48GB) |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 91.77 tok/s | 4GB (have 48GB) |
| Qwen/Qwen2.5-Math-1.5B | Q8 | Fits comfortably | 75.45 tok/s | 5GB (have 48GB) |
| Qwen/Qwen2.5-Math-1.5B | Q4 | Fits comfortably | 105.48 tok/s | 3GB (have 48GB) |
| HuggingFaceTB/SmolLM-135M | Q8 | Fits comfortably | 64.05 tok/s | 7GB (have 48GB) |
| HuggingFaceTB/SmolLM-135M | Q4 | Fits comfortably | 81.42 tok/s | 4GB (have 48GB) |
| unsloth/gpt-oss-20b-BF16 | Q8 | Fits comfortably | 41.98 tok/s | 20GB (have 48GB) |
| unsloth/gpt-oss-20b-BF16 | Q4 | Fits comfortably | 58.10 tok/s | 10GB (have 48GB) |
| meta-llama/Meta-Llama-3-70B-Instruct | Q8 | Not supported | — | 70GB (have 48GB) |
| meta-llama/Meta-Llama-3-70B-Instruct | Q4 | Fits comfortably | 31.69 tok/s | 35GB (have 48GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 58.18 tok/s | 8GB (have 48GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 81.07 tok/s | 4GB (have 48GB) |
| zai-org/GLM-4.5-Air | Q8 | Fits comfortably | 63.33 tok/s | 7GB (have 48GB) |
| zai-org/GLM-4.5-Air | Q4 | Fits comfortably | 95.38 tok/s | 4GB (have 48GB) |
| mistralai/Mistral-7B-Instruct-v0.1 | Q8 | Fits comfortably | 67.75 tok/s | 7GB (have 48GB) |
| mistralai/Mistral-7B-Instruct-v0.1 | Q4 | Fits comfortably | 84.32 tok/s | 4GB (have 48GB) |
| LiquidAI/LFM2-1.2B | Q8 | Fits comfortably | 105.86 tok/s | 2GB (have 48GB) |
| LiquidAI/LFM2-1.2B | Q4 | Fits comfortably | 151.65 tok/s | 1GB (have 48GB) |
| mistralai/Mistral-7B-v0.1 | Q8 | Fits comfortably | 68.26 tok/s | 7GB (have 48GB) |
| mistralai/Mistral-7B-v0.1 | Q4 | Fits comfortably | 97.44 tok/s | 4GB (have 48GB) |
| Qwen/Qwen2.5-32B-Instruct | Q8 | Fits comfortably | 32.55 tok/s | 32GB (have 48GB) |
| Qwen/Qwen2.5-32B-Instruct | Q4 | Fits comfortably | 52.96 tok/s | 16GB (have 48GB) |
| deepseek-ai/DeepSeek-R1-0528 | Q8 | Fits comfortably | 66.24 tok/s | 7GB (have 48GB) |
| deepseek-ai/DeepSeek-R1-0528 | Q4 | Fits comfortably | 88.31 tok/s | 4GB (have 48GB) |
| meta-llama/Llama-3.1-8B | Q8 | Fits comfortably | 53.76 tok/s | 8GB (have 48GB) |
| meta-llama/Llama-3.1-8B | Q4 | Fits comfortably | 86.99 tok/s | 4GB (have 48GB) |
| deepseek-ai/DeepSeek-V3.1 | Q8 | Fits comfortably | 59.52 tok/s | 7GB (have 48GB) |
| deepseek-ai/DeepSeek-V3.1 | Q4 | Fits comfortably | 83.99 tok/s | 4GB (have 48GB) |
| microsoft/phi-4 | Q8 | Fits comfortably | 60.83 tok/s | 7GB (have 48GB) |
| microsoft/phi-4 | Q4 | Fits comfortably | 84.26 tok/s | 4GB (have 48GB) |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | Fits comfortably | 92.82 tok/s | 3GB (have 48GB) |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | Fits comfortably | 111.60 tok/s | 2GB (have 48GB) |
| Qwen/Qwen2-0.5B | Q8 | Fits comfortably | 77.15 tok/s | 5GB (have 48GB) |
| Qwen/Qwen2-0.5B | Q4 | Fits comfortably | 90.71 tok/s | 3GB (have 48GB) |
| MiniMaxAI/MiniMax-M2 | Q8 | Fits comfortably | 57.23 tok/s | 7GB (have 48GB) |
| MiniMaxAI/MiniMax-M2 | Q4 | Fits comfortably | 89.26 tok/s | 4GB (have 48GB) |
| microsoft/DialoGPT-medium | Q8 | Fits comfortably | 57.50 tok/s | 7GB (have 48GB) |
| microsoft/DialoGPT-medium | Q4 | Fits comfortably | 94.10 tok/s | 4GB (have 48GB) |
| zai-org/GLM-4.6-FP8 | Q8 | Fits comfortably | 57.53 tok/s | 7GB (have 48GB) |
| zai-org/GLM-4.6-FP8 | Q4 | Fits comfortably | 94.03 tok/s | 4GB (have 48GB) |
| HuggingFaceTB/SmolLM2-135M | Q8 | Fits comfortably | 68.17 tok/s | 7GB (have 48GB) |
| HuggingFaceTB/SmolLM2-135M | Q4 | Fits comfortably | 86.52 tok/s | 4GB (have 48GB) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | Fits comfortably | 56.15 tok/s | 8GB (have 48GB) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q4 | Fits comfortably | 84.08 tok/s | 4GB (have 48GB) |
| meta-llama/Llama-2-7b-hf | Q8 | Fits comfortably | 65.30 tok/s | 7GB (have 48GB) |
| meta-llama/Llama-2-7b-hf | Q4 | Fits comfortably | 92.44 tok/s | 4GB (have 48GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | Fits comfortably | 57.42 tok/s | 7GB (have 48GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | Fits comfortably | 90.91 tok/s | 4GB (have 48GB) |
| microsoft/phi-2 | Q8 | Fits comfortably | 60.41 tok/s | 7GB (have 48GB) |
| microsoft/phi-2 | Q4 | Fits comfortably | 85.36 tok/s | 4GB (have 48GB) |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | Not supported | — | 70GB (have 48GB) |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | Fits comfortably | 29.70 tok/s | 35GB (have 48GB) |
| Qwen/Qwen2.5-0.5B | Q8 | Fits comfortably | 68.13 tok/s | 5GB (have 48GB) |
| Qwen/Qwen2.5-0.5B | Q4 | Fits comfortably | 100.27 tok/s | 3GB (have 48GB) |
| Qwen/Qwen3-14B | Q8 | Fits comfortably | 49.15 tok/s | 14GB (have 48GB) |
| Qwen/Qwen3-14B | Q4 | Fits comfortably | 72.68 tok/s | 7GB (have 48GB) |
| Qwen/Qwen3-Embedding-8B | Q8 | Fits comfortably | 53.49 tok/s | 8GB (have 48GB) |
| Qwen/Qwen3-Embedding-8B | Q4 | Fits comfortably | 79.69 tok/s | 4GB (have 48GB) |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | Not supported | — | 70GB (have 48GB) |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | Fits comfortably | 33.33 tok/s | 35GB (have 48GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q8 | Fits comfortably | 61.49 tok/s | 8GB (have 48GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q4 | Fits comfortably | 82.32 tok/s | 4GB (have 48GB) |
| Qwen/Qwen2.5-14B-Instruct | Q8 | Fits comfortably | 46.98 tok/s | 14GB (have 48GB) |
| Qwen/Qwen2.5-14B-Instruct | Q4 | Fits comfortably | 73.56 tok/s | 7GB (have 48GB) |
| Qwen/Qwen2.5-1.5B | Q8 | Fits comfortably | 70.98 tok/s | 5GB (have 48GB) |
| Qwen/Qwen2.5-1.5B | Q4 | Fits comfortably | 110.10 tok/s | 3GB (have 48GB) |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | Fits comfortably | 79.82 tok/s | 4GB (have 48GB) |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q4 | Fits comfortably | 103.62 tok/s | 2GB (have 48GB) |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q8 | Fits comfortably | 44.31 tok/s | 20GB (have 48GB) |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q4 | Fits comfortably | 53.39 tok/s | 10GB (have 48GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q8 | Fits comfortably | 69.76 tok/s | 5GB (have 48GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q4 | Fits comfortably | 95.86 tok/s | 3GB (have 48GB) |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | Fits comfortably | 60.12 tok/s | 8GB (have 48GB) |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | Fits comfortably | 85.04 tok/s | 4GB (have 48GB) |
| Qwen/Qwen3-Reranker-0.6B | Q8 | Fits comfortably | 69.14 tok/s | 6GB (have 48GB) |
| Qwen/Qwen3-Reranker-0.6B | Q4 | Fits comfortably | 93.33 tok/s | 3GB (have 48GB) |
| rednote-hilab/dots.ocr | Q8 | Fits comfortably | 61.75 tok/s | 7GB (have 48GB) |
| rednote-hilab/dots.ocr | Q4 | Fits comfortably | 95.72 tok/s | 4GB (have 48GB) |
| google-t5/t5-3b | Q8 | Fits comfortably | 81.58 tok/s | 3GB (have 48GB) |
| google-t5/t5-3b | Q4 | Fits comfortably | 131.80 tok/s | 2GB (have 48GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q8 | Fits comfortably | 32.97 tok/s | 30GB (have 48GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q4 | Fits comfortably | 47.35 tok/s | 15GB (have 48GB) |
| Qwen/Qwen3-4B | Q8 | Fits comfortably | 80.54 tok/s | 4GB (have 48GB) |
| Qwen/Qwen3-4B | Q4 | Fits comfortably | 98.58 tok/s | 2GB (have 48GB) |
| Qwen/Qwen3-1.7B | Q8 | Fits comfortably | 56.60 tok/s | 7GB (have 48GB) |
| Qwen/Qwen3-1.7B | Q4 | Fits comfortably | 94.07 tok/s | 4GB (have 48GB) |
| openai-community/gpt2-large | Q8 | Fits comfortably | 66.58 tok/s | 7GB (have 48GB) |
| openai-community/gpt2-large | Q4 | Fits comfortably | 82.69 tok/s | 4GB (have 48GB) |
| microsoft/Phi-3-mini-4k-instruct | Q8 | Fits comfortably | 60.64 tok/s | 7GB (have 48GB) |
| microsoft/Phi-3-mini-4k-instruct | Q4 | Fits comfortably | 88.04 tok/s | 4GB (have 48GB) |
| allenai/OLMo-2-0425-1B | Q8 | Fits comfortably | 120.42 tok/sEstimated | 1GB (have 48GB) |
| allenai/OLMo-2-0425-1B | Q4 | Fits comfortably | 189.90 tok/sEstimated | 1GB (have 48GB) |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q8 | Not supported | — | 80GB (have 48GB) |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q4 | Fits comfortably | 30.00 tok/sEstimated | 40GB (have 48GB) |
| Qwen/Qwen3-32B | Q8 | Fits comfortably | 34.42 tok/sEstimated | 32GB (have 48GB) |
| Qwen/Qwen3-32B | Q4 | Fits comfortably | 44.83 tok/sEstimated | 16GB (have 48GB) |
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | Fits comfortably | 71.50 tok/sEstimated | 5GB (have 48GB) |
| Qwen/Qwen2.5-0.5B-Instruct | Q4 | Fits comfortably | 92.12 tok/sEstimated | 3GB (have 48GB) |
| Qwen/Qwen2.5-7B | Q8 | Fits comfortably | 61.84 tok/sEstimated | 7GB (have 48GB) |
| Qwen/Qwen2.5-7B | Q4 | Fits comfortably | 80.12 tok/sEstimated | 4GB (have 48GB) |
| meta-llama/Meta-Llama-3-8B | Q8 | Fits comfortably | 57.69 tok/sEstimated | 8GB (have 48GB) |
| meta-llama/Meta-Llama-3-8B | Q4 | Fits comfortably | 79.69 tok/sEstimated | 4GB (have 48GB) |
| meta-llama/Llama-3.2-1B | Q8 | Fits comfortably | 137.23 tok/sEstimated | 1GB (have 48GB) |
| meta-llama/Llama-3.2-1B | Q4 | Fits comfortably | 167.62 tok/sEstimated | 1GB (have 48GB) |
| petals-team/StableBeluga2 | Q8 | Fits comfortably | 57.97 tok/sEstimated | 7GB (have 48GB) |
| petals-team/StableBeluga2 | Q4 | Fits comfortably | 81.34 tok/sEstimated | 4GB (have 48GB) |
| vikhyatk/moondream2 | Q8 | Fits comfortably | 56.39 tok/sEstimated | 7GB (have 48GB) |
| vikhyatk/moondream2 | Q4 | Fits comfortably | 90.40 tok/sEstimated | 4GB (have 48GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 80.77 tok/sEstimated | 3GB (have 48GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | Fits comfortably | 131.00 tok/sEstimated | 2GB (have 48GB) |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q8 | Not supported | — | 70GB (have 48GB) |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q4 | Fits comfortably | 34.29 tok/sEstimated | 35GB (have 48GB) |
| distilbert/distilgpt2 | Q8 | Fits comfortably | 64.83 tok/sEstimated | 7GB (have 48GB) |
| distilbert/distilgpt2 | Q4 | Fits comfortably | 89.29 tok/sEstimated | 4GB (have 48GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q8 | Fits comfortably | 31.76 tok/sEstimated | 32GB (have 48GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q4 | Fits comfortably | 44.61 tok/sEstimated | 16GB (have 48GB) |
| inference-net/Schematron-3B | Q8 | Fits comfortably | 80.28 tok/sEstimated | 3GB (have 48GB) |
| inference-net/Schematron-3B | Q4 | Fits comfortably | 112.92 tok/sEstimated | 2GB (have 48GB) |
| Qwen/Qwen3-8B | Q8 | Fits comfortably | 62.86 tok/sEstimated | 8GB (have 48GB) |
| Qwen/Qwen3-8B | Q4 | Fits comfortably | 90.82 tok/sEstimated | 4GB (have 48GB) |
| mistralai/Mistral-7B-Instruct-v0.2 | Q8 | Fits comfortably | 65.17 tok/sEstimated | 7GB (have 48GB) |
| mistralai/Mistral-7B-Instruct-v0.2 | Q4 | Fits comfortably | 83.71 tok/sEstimated | 4GB (have 48GB) |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q8 | Fits comfortably | 80.92 tok/sEstimated | 3GB (have 48GB) |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | Fits comfortably | 112.02 tok/sEstimated | 2GB (have 48GB) |
| bigscience/bloomz-560m | Q8 | Fits comfortably | 59.09 tok/sEstimated | 7GB (have 48GB) |
| bigscience/bloomz-560m | Q4 | Fits comfortably | 82.15 tok/sEstimated | 4GB (have 48GB) |
| Qwen/Qwen2.5-3B-Instruct | Q8 | Fits comfortably | 83.86 tok/sEstimated | 3GB (have 48GB) |
| Qwen/Qwen2.5-3B-Instruct | Q4 | Fits comfortably | 122.40 tok/sEstimated | 2GB (have 48GB) |
| openai/gpt-oss-120b | Q8 | Not supported | — | 120GB (have 48GB) |
| openai/gpt-oss-120b | Q4 | Not supported | — | 60GB (have 48GB) |
| meta-llama/Llama-3.2-1B-Instruct | Q8 | Fits comfortably | 124.33 tok/sEstimated | 1GB (have 48GB) |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | Fits comfortably | 170.05 tok/sEstimated | 1GB (have 48GB) |
| Qwen/Qwen3-4B-Instruct-2507 | Q8 | Fits comfortably | 76.59 tok/sEstimated | 4GB (have 48GB) |
| Qwen/Qwen3-4B-Instruct-2507 | Q4 | Fits comfortably | 98.98 tok/sEstimated | 2GB (have 48GB) |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | Fits comfortably | 61.32 tok/sEstimated | 7GB (have 48GB) |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q4 | Fits comfortably | 88.88 tok/sEstimated | 4GB (have 48GB) |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | Fits comfortably | 133.24 tok/sEstimated | 1GB (have 48GB) |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | Fits comfortably | 176.80 tok/sEstimated | 1GB (have 48GB) |
| facebook/opt-125m | Q8 | Fits comfortably | 61.55 tok/sEstimated | 7GB (have 48GB) |
| facebook/opt-125m | Q4 | Fits comfortably | 81.34 tok/sEstimated | 4GB (have 48GB) |
| Qwen/Qwen2.5-1.5B-Instruct | Q8 | Fits comfortably | 72.50 tok/sEstimated | 5GB (have 48GB) |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | Fits comfortably | 96.19 tok/sEstimated | 3GB (have 48GB) |
| Qwen/Qwen3-Embedding-0.6B | Q8 | Fits comfortably | 66.47 tok/sEstimated | 6GB (have 48GB) |
| Qwen/Qwen3-Embedding-0.6B | Q4 | Fits comfortably | 98.75 tok/sEstimated | 3GB (have 48GB) |
| google/gemma-3-1b-it | Q8 | Fits comfortably | 115.42 tok/sEstimated | 1GB (have 48GB) |
| google/gemma-3-1b-it | Q4 | Fits comfortably | 186.76 tok/sEstimated | 1GB (have 48GB) |
| openai/gpt-oss-20b | Q8 | Fits comfortably | 45.36 tok/sEstimated | 20GB (have 48GB) |
| openai/gpt-oss-20b | Q4 | Fits comfortably | 57.06 tok/sEstimated | 10GB (have 48GB) |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q8 | Fits comfortably | 35.47 tok/sEstimated | 34GB (have 48GB) |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q4 | Fits comfortably | 43.39 tok/sEstimated | 17GB (have 48GB) |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 63.43 tok/sEstimated | 8GB (have 48GB) |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 83.67 tok/sEstimated | 4GB (have 48GB) |
| Gensyn/Qwen2.5-0.5B-Instruct | Q8 | Fits comfortably | 63.70 tok/sEstimated | 5GB (have 48GB) |
| Gensyn/Qwen2.5-0.5B-Instruct | Q4 | Fits comfortably | 98.10 tok/sEstimated | 3GB (have 48GB) |
| Qwen/Qwen3-0.6B | Q8 | Fits comfortably | 68.27 tok/sEstimated | 6GB (have 48GB) |
| Qwen/Qwen3-0.6B | Q4 | Fits comfortably | 91.08 tok/sEstimated | 3GB (have 48GB) |
| Qwen/Qwen2.5-7B-Instruct | Q8 | Fits comfortably | 57.54 tok/sEstimated | 7GB (have 48GB) |
| Qwen/Qwen2.5-7B-Instruct | Q4 | Fits comfortably | 87.28 tok/sEstimated | 4GB (have 48GB) |
| openai-community/gpt2 | Q8 | Fits comfortably | 66.02 tok/sEstimated | 7GB (have 48GB) |
| openai-community/gpt2 | Q4 | Fits comfortably | 86.16 tok/sEstimated | 4GB (have 48GB) |
Note: The performance figures above are calculated estimates, not measured results; real-world throughput will vary with runtime, context length, and batch size.
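If you want to sanity-check the fit column yourself, the numbers above are consistent with a simple weights-only sizing rule. Here is a minimal Python sketch of that rule, assuming Q4 ≈ 0.5 bytes and Q8 ≈ 1 byte per parameter; this is our reading of the table, not a published methodology:

```python
import math

# Weights-only sizing rule the table appears to follow:
# VRAM ~= parameter count x bytes per weight, rounded up.
# Q4 ~= 0.5 bytes/param, Q8 ~= 1 byte/param (assumed values).
BYTES_PER_WEIGHT = {"Q4": 0.5, "Q8": 1.0}

def weight_vram_gb(params_billions: float, quant: str) -> int:
    """Estimated VRAM needed for the weights alone, in whole GB."""
    return math.ceil(params_billions * BYTES_PER_WEIGHT[quant])

def verdict(params_billions: float, quant: str, card_gb: int = 48) -> str:
    need = weight_vram_gb(params_billions, quant)
    status = "Fits comfortably" if need <= card_gb else "Not supported"
    return f"{quant}: {need}GB (have {card_gb}GB) -> {status}"

print(verdict(70, "Q8"))  # 70GB (have 48GB) -> Not supported
print(verdict(70, "Q4"))  # 35GB (have 48GB) -> Fits comfortably
print(verdict(7, "Q4"))   # 4GB (have 48GB) -> Fits comfortably
```

Note that this rule ignores KV cache and runtime buffers, which is why a model that "fits" on paper can still run out of memory at long context.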
The reports below are drawn from community benchmarks, manufacturer specs, and live pricing.
LM Studio users fully offloading Qwen 3 30B Q4 with FlashAttention report about 33 tokens/sec at a 32K context window on the RTX 6000 Ada.
Source: Reddit – /r/LocalLLaMA (mpya1gb)
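That 32K-context figure is plausible because grouped-query attention keeps the KV cache small relative to the weights. A rough Python sketch, with layer and head counts assumed for a Qwen3-30B-A3B-class model (48 layers, 4 KV heads, head dim 128; illustrative values, not taken from the report):

```python
# KV cache size for a grouped-query-attention model: 2x (keys + values)
# per layer, per KV head, in FP16 by default. The layer/head values
# below are assumptions, not figures from the report above.
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                ctx_len: int, bytes_per_elem: int = 2) -> float:
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem / 1024**3

weights_gb = 15  # ~30B params at Q4, per the compatibility table
kv_gb = kv_cache_gb(n_layers=48, n_kv_heads=4, head_dim=128, ctx_len=32_768)
print(f"weights ~{weights_gb}GB + KV ~{kv_gb:.1f}GB = {weights_gb + kv_gb:.1f}GB of 48GB")
# -> weights ~15GB + KV ~3.0GB = 18.0GB of 48GB
```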
Professionals cite turnkey RTX 6000 Ada boxes at roughly $6,000, arguing they are already fast and private enough to replace API workflows for many coding teams.
Source: Reddit – /r/LocalLLaMA (mr6x6wu)
One ProLiant DL380 Gen10 setup pairs a single RTX 6000 Ada with three RTX 4090s, virtualized under Proxmox to expose 120 GB of total VRAM for AI workloads.
Source: Reddit – /r/LocalLLaMA (mqubm2s)
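A pool like that is usually carved up in proportion to each card's VRAM. The sketch below computes those ratios; llama.cpp exposes them directly through its --tensor-split option, though the exact knob varies by backend:

```python
# Proportional split of one model across mismatched cards: each GPU
# holds a share of the layers equal to its share of the pooled VRAM.
gpus = {"RTX 6000 Ada": 48, "RTX 4090 #1": 24, "RTX 4090 #2": 24, "RTX 4090 #3": 24}

total_gb = sum(gpus.values())  # 120 GB pooled, as in the setup above
for name, gb in gpus.items():
    print(f"{name}: {gb / total_gb:.2f} of the model ({gb}GB of {total_gb}GB)")
# RTX 6000 Ada carries 0.40 of the layers; each 4090 carries 0.20
```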
Some buyers note the RTX 6000 Ada’s ~$7k price rivals the cost of three RTX 5090 cards, so the workstation route only makes sense when ECC VRAM and professional drivers are required.
Source: Reddit – /r/LocalLLaMA (mqsk1ah)
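The arithmetic behind that trade-off is easy to check. In this sketch the RTX 6000 Ada price matches the Newegg listing cited below; the RTX 5090 price and its 32 GB VRAM figure are assumptions for illustration:

```python
# Dollars per GB of VRAM, pro card vs. a hypothetical 3x RTX 5090 build.
# The 5090 figures here are assumed, not quoted from the thread above.
ada_price, ada_vram = 6_999, 48
gamer_price, gamer_vram = 2_300, 32   # assumed per-card RTX 5090 figures

print(f"RTX 6000 Ada: ${ada_price / ada_vram:.0f}/GB of VRAM")        # ~$146/GB
print(f"3x RTX 5090:  ${3 * gamer_price / (3 * gamer_vram):.0f}/GB")  # ~$72/GB
# The pro card costs roughly twice as much per GB; the trade is ECC,
# pro drivers, and a single 300W slot instead of three hot gaming cards.
```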
The RTX 6000 Ada ships with 48 GB of ECC GDDR6, a 300 W TDP, and a PCIe 4.0 x16 interface. As of 3 Nov 2025 it was listed at $6,999 (Newegg), $7,199 (Amazon), and $7,299 (Best Buy, out of stock).
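That 300 W TDP also sets a ceiling on energy cost per token. A quick worked example using the estimated Qwen3-32B Q4 throughput from the table above, assuming the card sustains full TDP during generation (real draw is usually lower):

```python
# Energy per generated token at full TDP, using the table's estimated
# Qwen3-32B Q4 throughput. Assumes a sustained 300W; real draw varies.
tdp_watts = 300
tok_per_sec = 44.83  # Qwen/Qwen3-32B Q4, estimated (table above)

joules_per_token = tdp_watts / tok_per_sec
kwh_per_million_tokens = joules_per_token * 1_000_000 / 3.6e6  # 1 kWh = 3.6 MJ

print(f"{joules_per_token:.1f} J/token, ~{kwh_per_million_tokens:.2f} kWh per 1M tokens")
# -> 6.7 J/token, ~1.86 kWh per 1M tokens
```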
Related comparisons: explore how the RTX 4060 Ti 16GB, the RX 6800 XT, and the RTX 3080 stack up for local inference workloads.