It’s neither practical nor possible to discuss the training and application of large language models, or the way LLMs improve their word prediction and decision-making skills, without first discussing the building blocks of language learning itself.
LLMs do not actually “learn,” at least not in the traditional sense. While they use techniques such as memory, self-play, and step-by-step iteration to improve their speed and accuracy, they do not maintain intrinsic knowledge or understanding. The model is simply predicting the most likely next token (a token can be a word, a part of a word, or a single character) in a sequence in order to produce a better response.
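To make next-token prediction concrete, here is a minimal sketch in Python. The five-word vocabulary and the logit scores are invented for illustration; a real LLM computes scores like these with a neural network over a vocabulary of tens of thousands of tokens.

```python
import math

# Hypothetical vocabulary and raw scores (logits) a model might assign
# to each candidate after seeing the prompt "The cat sat on the".
# These numbers are made up purely for illustration.
vocab = ["mat", "roof", "moon", "table", "run"]
logits = [4.2, 2.1, 0.3, 1.8, -1.0]

def softmax(scores):
    """Convert raw scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax(logits)
for token, p in sorted(zip(vocab, probs), key=lambda pair: -pair[1]):
    print(f"{token!r}: {p:.3f}")

# Greedy decoding: pick the single most probable next token.
best = max(zip(vocab, probs), key=lambda pair: pair[1])[0]
print("Predicted next token:", best)
```

The model’s entire “decision” is this ranking: whichever token scores highest is the most plausible continuation, with no understanding of cats or mats involved.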
What does “better” mean?
In LLM training, we evaluate how well the model generates responses according to the provided tone parameters, temperature settings, and dataset(s).
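Temperature is the easiest of these to see in code. The sketch below, reusing the illustrative vocabulary and logits from the earlier example, shows how dividing the logits by a temperature value reshapes the next-token distribution: low temperatures sharpen it toward the top candidate, while high temperatures flatten it so less likely tokens become easier to sample.

```python
import math

def softmax_with_temperature(scores, temperature):
    """Scale logits by 1/temperature, then normalize to probabilities.

    temperature < 1.0 sharpens the distribution (more deterministic);
    temperature > 1.0 flattens it (more varied, "creative" output).
    """
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Same illustrative values as before.
vocab = ["mat", "roof", "moon", "table", "run"]
logits = [4.2, 2.1, 0.3, 1.8, -1.0]

for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    top_token, top_p = max(zip(vocab, probs), key=lambda pair: pair[1])
    print(f"temperature={t}: top token {top_token!r} with p={top_p:.3f}")
```

At temperature 0.2 the top token’s probability approaches 1.0, so the model answers almost identically every time; at 2.0 the probabilities spread out, and sampling will surface different continuations from run to run.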