Llama 2 vs. Llama 3 vs. Mistral 7B, quantized with GPTQ and Bitsandbytes
With quantization, we can reduce the size of large language models (LLMs). Quantization effectively serves as a compression method: by storing model weights at lower numerical precision, it shrinks the memory footprint, making quantized LLMs easier to run on GPUs with less memory.
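To make this concrete, here is a minimal sketch of loading one of the models above in 4-bit precision with bitsandbytes through Hugging Face transformers. The model ID and the specific quantization settings are illustrative assumptions, not necessarily the exact configuration used in the comparison below.

```python
# Minimal sketch: load Mistral 7B in 4-bit with bitsandbytes (illustrative settings).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-v0.1"  # assumed model ID, for illustration

# 4-bit NF4 quantization: weights are stored in 4 bits and
# dequantized to bfloat16 on the fly during computation.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # place layers on the available GPU(s)
)

# Quick sanity check: generate a few tokens with the quantized model.
prompt = "Quantization reduces the memory footprint of LLMs by"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Loaded this way, a 7B-parameter model occupies roughly a quarter of the memory of its 16-bit counterpart, which is what makes it feasible on consumer GPUs.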