Torch Compile: 2x Faster Llama 3.2 with Low Effort
But it will depend on your GPU

Benjamin Marie · Published in Towards Data Science · 12 hours ago

Image generated with ChatGPT

Torch Compile (torch.compile) was first introduced with PyTorch 2.0, but it took several updates and optimizations before it could reliably support most large language models (LLMs). When it comes to inference, torch.compile can genuinely speed up decoding with only a small increase in memory usage. In