This page covers q5_k_m quantization requirements for xgen-universe/Capybara, with explicit calculations drawn from our model requirement dataset and compatibility speed table.
The q5_k_m requirement is estimated from the Q4 and Q8 requirement bounds using midpoint interpolation.
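As a rough illustration of midpoint interpolation, the sketch below averages a Q4 bound and a Q8 bound to get an intermediate estimate. The function name `midpoint_estimate` and the 4.0 GB and 8.0 GB inputs are hypothetical placeholders, not values from the requirement dataset for this model.

```python
def midpoint_estimate(q4_gb: float, q8_gb: float) -> float:
    """Return the midpoint of the Q4 and Q8 requirement bounds, in GB."""
    if q4_gb < 0 or q8_gb < 0:
        raise ValueError("requirement bounds must be non-negative")
    if q4_gb > q8_gb:
        raise ValueError("expected the Q4 bound to be no larger than the Q8 bound")
    return (q4_gb + q8_gb) / 2.0


# Hypothetical bounds for illustration only (not taken from the dataset).
q4_vram_gb = 4.0
q8_vram_gb = 8.0

estimate_gb = midpoint_estimate(q4_vram_gb, q8_vram_gb)
print(f"Estimated q5_k_m requirement: {estimate_gb:.1f} GB")
```

A midpoint is a simple approximation: it treats the requirement as lying halfway between the two bounds, so the result should be read as an estimate rather than a measurement.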
Throughput data below uses available compatibility measurements/estimates and is sorted by tokens per second for this model.
| GPU | VRAM | Quantization | Speed | Compatibility |
|---|---|---|---|---|
| AMD Instinct MI300X | 192GB | Q4 | 817 tok/s | View full compatibility |
| NVIDIA H200 SXM 141GB | 141GB | Q4 | 703 tok/s | View full compatibility |
| AMD Instinct MI250X | 128GB | Q4 | 497 tok/s | View full compatibility |
| NVIDIA H100 SXM5 80GB | 80GB | Q4 | 463 tok/s | View full compatibility |
| RTX 5090 | 32GB | Q4 | 323 tok/s | View full compatibility |
| NVIDIA H100 PCIe 80GB | 80GB | Q4 | 318 tok/s | View full compatibility |
| NVIDIA A100 80GB SXM4 | 80GB | Q4 | 269 tok/s | View full compatibility |
| AMD Instinct MI210 | 64GB | Q4 | 227 tok/s | View full compatibility |
| NVIDIA A100 40GB PCIe | 40GB | Q4 | 221 tok/s | View full compatibility |
| RTX 4090 | 24GB | Q4 | 196 tok/s | View full compatibility |
| NVIDIA L40S | 48GB | Q4 | 180 tok/s | View full compatibility |
| NVIDIA L40 | 48GB | Q4 | 175 tok/s | View full compatibility |
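For readers who want to work with the table programmatically, here is a minimal sketch that parses rows in the format shown above and ranks them by tokens per second. The `GpuRow` type and `parse_row` helper are hypothetical names introduced for this example, and the sample uses only four rows copied from the table, deliberately out of order.

```python
import re
from typing import NamedTuple


class GpuRow(NamedTuple):
    gpu: str
    vram_gb: int
    quant: str
    tok_per_s: int


def parse_row(line: str) -> GpuRow:
    """Parse one pipe-delimited table row into a typed record."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    gpu, vram, quant, speed = cells[:4]
    # Keep only the digits from values such as "192GB" and "817 tok/s".
    return GpuRow(gpu, int(re.sub(r"\D", "", vram)), quant, int(re.sub(r"\D", "", speed)))


# Four rows copied from the table above, intentionally unsorted.
TABLE = """\
| RTX 4090 | 24GB | Q4 | 196 tok/s | View full compatibility |
| AMD Instinct MI300X | 192GB | Q4 | 817 tok/s | View full compatibility |
| NVIDIA L40S | 48GB | Q4 | 180 tok/s | View full compatibility |
| RTX 5090 | 32GB | Q4 | 323 tok/s | View full compatibility |"""

rows = [parse_row(line) for line in TABLE.splitlines()]

# Rank by throughput, fastest first, matching the ordering used on this page.
ranked = sorted(rows, key=lambda r: r.tok_per_s, reverse=True)

for r in ranked:
    print(f"{r.gpu:<22} {r.vram_gb:>4} GB  {r.tok_per_s:>4} tok/s")
```

Parsing into a typed record makes it easy to add further filters, such as keeping only GPUs at or above a chosen VRAM threshold, before sorting.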