Between quantization-aware training and post-training quantization
There are many quantization methods for reducing the size of large language models (LLMs). Recently, more accurate low-bit quantization methods have been proposed. For instance, AQLM (Additive Quantization of Language Models) achieves 2-bit quantization while preserving most of the model's accuracy.
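As a quick illustration, here is a minimal sketch of running an AQLM-quantized model with Hugging Face Transformers. It assumes the `aqlm` package is installed and uses a hypothetical 2-bit checkpoint id as a placeholder; any AQLM model from the Hub would work the same way:

```python
# pip install aqlm[gpu] transformers accelerate
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder id for a 2-bit AQLM checkpoint; substitute a real AQLM repo from the Hub.
model_id = "ISTA-DASLab/Llama-2-7b-AQLM-2Bit-1x16-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the dtype stored in the checkpoint
    device_map="auto",    # dispatch layers across available devices
)

# Quick generation check with the quantized model
inputs = tokenizer("Quantization reduces model size by", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that inference code is unchanged compared to a full-precision model: the quantized weights are dequantized on the fly inside the AQLM kernels, which is what makes such checkpoints drop-in replacements.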