Run Google's open-weight LLM on your hardware
Gemma is Google's family of open-weight language models. Gemma 2 offers excellent performance for its size, rivaling larger models. This guide shows you how to run Gemma locally using Jan.
Jan is a free desktop app with a built-in Model Hub featuring Gemma.
# Download from: https://jan.ai/download
# Available for Windows, macOS, Linux
# Install and launch the app

Tip: Jan auto-detects your GPU and configures optimal settings.
Open Jan and click Model Hub. Search for 'Gemma'.
Available Gemma models:
• Gemma 2 2B - Ultra fast, runs on any GPU
• Gemma 2 9B - Great balance of speed and quality
• Gemma 2 27B - Best quality, needs 16GB+ VRAM

Click download on your chosen model, then start a new chat.
Gemma 2 9B download: ~5GB
Gemma 2 27B download: ~15GB
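As a rough sanity check on these numbers: the download is essentially the quantized weights, and running the model needs that much VRAM plus some headroom for context. Here is a back-of-envelope sketch, assuming ~4.5 bits per weight (a Q4-style quantization; the exact format Jan ships is an assumption) and ~1.5 GB of runtime overhead:

```python
# Back-of-envelope sizing for quantized Gemma 2 models.
# Assumptions (not from this guide): ~4.5 bits per weight,
# ~1.5 GB of runtime overhead for KV cache and buffers.
BITS_PER_WEIGHT = 4.5
OVERHEAD_GB = 1.5

def weights_gb(params_billions):
    """Approximate on-disk size of the quantized weights."""
    return params_billions * 1e9 * BITS_PER_WEIGHT / 8 / 1e9

def vram_gb(params_billions):
    """Approximate VRAM needed to run the model."""
    return weights_gb(params_billions) + OVERHEAD_GB

for name, params in [("Gemma 2 2B", 2), ("Gemma 2 9B", 9), ("Gemma 2 27B", 27)]:
    print(f"{name}: ~{weights_gb(params):.0f} GB download, ~{vram_gb(params):.0f} GB VRAM")
```

Under these assumptions, 9B works out to about 5 GB and 27B to about 15 GB, which lines up with the download sizes above and the 16GB+ VRAM note for 27B.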
Once downloaded, click the model to start chatting!

Problem: Model runs on CPU instead of GPU
Fix: Go to Settings > Advanced and verify GPU acceleration is enabled. Update your GPU drivers.

Problem: Gemma 27B is slow
Fix: Gemma 27B needs 16GB+ VRAM for good speeds. Try Gemma 9B for faster responses on 12GB cards.
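To make that guidance concrete, here is a small helper that picks the largest Gemma 2 variant for a given amount of VRAM. The 16 GB cutoff for 27B comes from this guide; the 8 GB cutoff for 9B is my assumption (roughly 5 GB of weights plus context):

```python
# Pick the largest Gemma 2 variant that should fit in VRAM.
# 16 GB cutoff for 27B: from the guide. 8 GB cutoff for 9B: assumed
# (~5 GB quantized weights plus room for context).
def pick_gemma(vram_gb):
    if vram_gb >= 16:
        return "Gemma 2 27B"
    if vram_gb >= 8:
        return "Gemma 2 9B"
    return "Gemma 2 2B"

print(pick_gemma(24))  # Gemma 2 27B
print(pick_gemma(12))  # Gemma 2 9B
print(pick_gemma(6))   # Gemma 2 2B
```

So on a 12 GB card you land on Gemma 2 9B, matching the fix above.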