Pretraining and Prior Weights

This page connects classic ML intuition to modern LLMs. The short version: pretraining fixes the model weights, prompting supplies the inputs, and inference returns the output.

From ML to LLM in one line

Traditional ML: you choose features X, the model f applies weights W, and you get predictions Y = f(X; W).

LLMs: pretraining sets W, your prompt is X, and the response is Y. Prompting is just choosing which inputs and context the model sees.
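The parallel can be sketched in a few lines of toy code. This is purely illustrative: `predict` is a hypothetical stand-in for any trained model, not a real LLM API.

```python
# Both classic ML and LLM inference follow the same shape:
# fixed weights + chosen inputs -> output.

def predict(W, X):
    # Classic ML: weights W are frozen after training; you pick features X.
    return sum(w * x for w, x in zip(W, X))

W = [0.5, -2.0]    # fixed by training (analogous to pretraining)
X = [3.0, 1.0]     # chosen at inference time (analogous to the prompt)
Y = predict(W, X)  # the prediction (analogous to the response)
```

The only lever you control at inference time is X, which is exactly why prompting matters: it is the input-selection step.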

Want the prompt view? See the Prompt Engineering Guide.

Why prior weights matter

Pretraining is a giant optimization pass that bakes in patterns from the training data. Those patterns are the model's starting beliefs, just like priors in a Bayesian model.

This is the same intuition behind supplying prior predictions as a starting point in gradient boosting: if you start closer to reality, you need fewer updates to reach good predictions.
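The "start closer, update less" claim can be demonstrated with a minimal sketch. This is a hypothetical one-dimensional gradient descent example, not the XGBoost mechanism itself; the function names and numbers are illustrative.

```python
# Count gradient-descent updates needed to minimize (w - target)^2,
# starting from different initial guesses.

def steps_to_converge(w_init, target=3.0, lr=0.1, tol=1e-3):
    w, steps = w_init, 0
    while abs(w - target) > tol:
        w -= lr * 2 * (w - target)  # gradient of the squared loss
        steps += 1
    return steps

cold_start = steps_to_converge(0.0)  # uninformative starting point
warm_start = steps_to_converge(2.5)  # prior already near the truth
# warm_start < cold_start: an informative prior needs fewer updates.
```

The same logic scales up: a pretrained LLM's weights are a very informative starting point, so comparatively little additional signal (fine-tuning or in-context examples) is needed to get useful behavior.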

Video walkthrough

Full article: Prior weights in XGBoost

Here is the original writeup. It is long-form and technical, but it provides the foundation for the pretraining intuition.


Book bonus: Predictive Analytics: A Guide for Modern Science is included with any software purchase, with signed copies mailed when print runs are available.