What’s a large language model and how does it work?

A large language model is essentially two files: a parameters file (the weights of the neural network) and a small file of code that runs those parameters.

LLM training

The best way to understand training is as a lossy compression of a chunk of the internet: lots of GPUs crunch the text and produce something like a .zip file, the parameters. Keep in mind that the parameters don’t store a literal copy of the data; the compression is lossy.

The neural network’s task is to predict which word is most likely to come next in a sentence, assigning a probability to each candidate.
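As a toy illustration, here is a minimal next-word predictor built from bigram counts. The corpus and numbers are made up; a real LLM replaces the counting with a deep neural network, but the interface is the same: given context, output a probability for every candidate next word.

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows each other word.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def next_word_probs(word):
    """Probability of each candidate next word after `word`."""
    counts = following[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_probs("the"))  # {'cat': 0.5, 'mat': 0.25, 'fish': 0.25}
print(next_word_probs("cat"))  # {'sat': 0.5, 'ate': 0.5}
```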

LLM dreams

We can think of the neural network as producing “web page dreams”: it was trained on web pages, so when we give it a query it generates the text that would most plausibly follow, imitating what it saw during training.
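To make the dreaming concrete, here is a sketch of the generation loop: sample a next word from the model’s probabilities, append it, and repeat. The hardcoded probability table below stands in for a real network and is purely illustrative.

```python
import random

# Illustrative next-word probabilities (a real LLM computes these
# with a neural network at every step).
probs = {
    "the":  {"cat": 0.5, "mat": 0.25, "fish": 0.25},
    "cat":  {"sat": 0.5, "ate": 0.5},
    "sat":  {"on": 1.0},
    "on":   {"the": 1.0},
    "ate":  {"the": 1.0},
    "mat":  {"the": 1.0},
    "fish": {"on": 1.0},
}

def dream(start="the", length=10):
    """Generate text by repeatedly sampling the next word."""
    words = [start]
    for _ in range(length):
        nxt = probs.get(words[-1])
        if not nxt:
            break
        choices, weights = zip(*nxt.items())
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(dream())  # e.g. "the cat sat on the mat the fish on the mat"
```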

We call the incorrect answers hallucinations.

How do they work?

We can measure that the parameters are getting better at predicting the next word, but we don’t know how they do it: the billions of parameters are mostly inscrutable.
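As a sketch of what “working better” means: the standard measurement is the average negative log-probability the model assigns to the true next words, the cross-entropy loss. Lower is better, even though the loss tells us nothing about how the parameters achieve it. The probabilities below are invented for illustration.

```python
import math

# Hypothetical probabilities the model assigned to the actual next words.
probs_before = [0.10, 0.05, 0.20]  # early in training
probs_after  = [0.60, 0.40, 0.70]  # later in training

def cross_entropy(probs):
    """Average negative log-probability of the true next words."""
    return -sum(math.log(p) for p in probs) / len(probs)

print(cross_entropy(probs_before))  # ~2.30 (worse)
print(cross_entropy(probs_after))   # ~0.59 (better)
```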

The knowledge is also one-dimensional: the network can’t access a fact from every angle, and it may answer differently depending on how the same question is phrased. For example, a model may correctly answer “Who is Tom Cruise’s mother?” (Mary Lee Pfeiffer) yet fail on “Who is Mary Lee Pfeiffer’s son?”.

Fine-tuning to an assistant

We reproduce the same learning process, but now on a dataset written by humans instead of scraped from the internet. Humans are given the task of writing questions and ideal answers for the network to learn from. It’s worth knowing that nowadays this is a human-machine process, so machines also help draft the answers.
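A minimal sketch of what one such human-written training example might look like, assuming a simple question/answer schema; real labeling formats vary by lab.

```python
# One human-written fine-tuning example (contents are illustrative).
qa_example = {
    "question": "Can you explain what a large language model is?",
    "answer": (
        "Sure! A large language model is a neural network trained on "
        "large amounts of text to predict the next word."
    ),
}

# Fine-tuning runs the same next-word-prediction training, but on text
# built from these Q&A pairs instead of raw internet documents.
training_text = f"Question: {qa_example['question']}\nAnswer: {qa_example['answer']}"
print(training_text)
```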

In this stage we align the network toward answering questions the way an assistant should, while it still relies on the knowledge gained from pre-training on the whole internet.

When we encounter a misbehavior, a person writes the correct response to that question, and that Q&A pair is inserted into the training data for the next fine-tuning round.
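A minimal sketch of that correction loop, assuming the fine-tuning set is a JSON-lines file of Q&A pairs; the file name and schema are illustrative, not any specific lab’s setup.

```python
import json

def add_correction(question, correct_answer, path="finetune_data.jsonl"):
    """Append a human-written correction so the next fine-tuning run
    learns the desired behavior."""
    record = {"question": question, "answer": correct_answer}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

add_correction(
    "What is the capital of Australia?",
    "The capital of Australia is Canberra, not Sydney.",
)
```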