
2-Bit VPTQ: 6.5x Smaller LLMs While Preserving 95% Accuracy
Very accurate 2-bit quantization for running 70B LLMs on a 24 GB GPU

Benjamin Marie · Towards Data Science

Image generated with ChatGPT

Recent developments in low-bit quantization for LLMs, such as AQLM and AutoRound, now show acceptable levels of accuracy degradation on downstream tasks, especially for large models. That said, 2-bit quantization still introduces a noticeable accuracy loss in most cases. One promising