It’s neither practical nor possible to discuss the training and application of large language models, or the way LLMs improve their word prediction and decision-making skills, without first discussing the building blocks of language learning itself.
LLMs do not actually “learn,” at least not in the traditional sense. While they use techniques such as memory, self-play, and step-by-step iteration to improve their speed and accuracy, they do not maintain intrinsic knowledge or understanding. The model is simply predicting the most likely next token (a token can be a word, a part of a word, or a single character) in a sequence in order to produce a better response.
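To make next-token prediction concrete, here is a minimal sketch in Python. The five-word vocabulary and the logit scores are invented for illustration; a real LLM computes scores like these with a neural network over a vocabulary of tens of thousands of tokens.

```python
import math

# Hypothetical vocabulary and raw scores (logits) a model might assign
# to each candidate after seeing the prompt "The cat sat on the".
# These numbers are made up purely for illustration.
vocab = ["mat", "roof", "moon", "table", "run"]
logits = [4.2, 2.1, 0.3, 1.8, -1.0]

def softmax(scores):
    """Convert raw scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax(logits)
for token, p in sorted(zip(vocab, probs), key=lambda pair: -pair[1]):
    print(f"{token!r}: {p:.3f}")

# Greedy decoding: pick the single most probable next token.
best = max(zip(vocab, probs), key=lambda pair: pair[1])[0]
print("Predicted next token:", best)
```

The model’s entire “decision” is this ranking: whichever token scores highest is the most plausible continuation, with no understanding of cats or mats involved.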
What does “better” mean?
In LLM training, we evaluate how well the model generates responses according to the provided tone parameters, temperature settings, and dataset(s).
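Temperature is the easiest of these to see in code. The sketch below, reusing the illustrative vocabulary and logits from the earlier example, shows how dividing the logits by a temperature value reshapes the next-token distribution: low temperatures sharpen it toward the top candidate, while high temperatures flatten it so less likely tokens become easier to sample.

```python
import math

def softmax_with_temperature(scores, temperature):
    """Scale logits by 1/temperature, then normalize to probabilities.

    temperature < 1.0 sharpens the distribution (more deterministic);
    temperature > 1.0 flattens it (more varied, "creative" output).
    """
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Same illustrative values as before.
vocab = ["mat", "roof", "moon", "table", "run"]
logits = [4.2, 2.1, 0.3, 1.8, -1.0]

for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    top_token, top_p = max(zip(vocab, probs), key=lambda pair: pair[1])
    print(f"temperature={t}: top token {top_token!r} with p={top_p:.3f}")
```

At temperature 0.2 the top token’s probability approaches 1.0, so the model answers almost identically every time; at 2.0 the probabilities spread out, and sampling will surface different continuations from run to run.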