Can Quantum Computing help improve our ability to train the Large Neural Networks that encode language models (LLMs)?
What is “training”?
In the lingo of Artificial Intelligence (AI) studies, “training” means optimizing a statistical model, often implemented as a neural network, to make predictions from some input data, guided by a measure of how good those predictions are (the “cost” or “loss” function). There are three main paradigms in which such a procedure can take place: supervised, unsupervised (often autoregressive), and reinforcement learning. In supervised learning, each data point is labelled, so the model’s predictions can be compared directly with the true values (e.g. is this the image of a cat or of a dog?). In unsupervised learning, there are no explicit labels; instead, the comparison is carried out against features extracted from the data itself (e.g. predicting the next word in a sentence). Finally, reinforcement learning optimizes the long-term return of a sequence of decisions (predictions) made as the statistical model interacts with its environment (should the car slow down or speed up at a yellow traffic light?).
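To make the idea of “optimizing a model against a loss function” concrete, here is a minimal sketch in plain Python/NumPy of a single supervised training loop: a tiny linear model, a mean-squared-error loss, and a gradient-descent parameter update. The data, parameter names, and learning rate are purely illustrative assumptions, not taken from any particular framework or from the text above.

```python
import numpy as np

# Toy supervised data: inputs x and labelled targets y (illustrative only).
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))                  # 100 data points, 3 features each
true_w = np.array([1.5, -2.0, 0.5])            # "ground truth" used to generate labels
y = x @ true_w + 0.1 * rng.normal(size=100)    # labels with a little noise

# Model parameters to be "trained" (optimized).
w = np.zeros(3)
learning_rate = 0.1

for step in range(200):
    predictions = x @ w                          # model output
    loss = np.mean((predictions - y) ** 2)       # loss: how good are the predictions?
    grad = 2 * x.T @ (predictions - y) / len(y)  # gradient of the loss w.r.t. the parameters
    w -= learning_rate * grad                    # gradient-descent update

print("learned parameters:", w)  # should approach true_w as training proceeds
```

Real neural networks repeat this same predict / score / update cycle, only with millions or billions of parameters instead of three, which is precisely why the process becomes so expensive.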
In all these cases, the optimization of the parameters of the model is a lengthy process which requires a…