Notes from Andrej Karpathy's 1-Hour Lecture on LLMs

Before we begin, a short excerpt from the book The Defining Decade:

The right time to take action is at the sweet spot—waiting too long or acting too early can lead to missed opportunities.

Echoing this idea, I decided to learn about the Generative AI landscape. And who better to learn it from than the expert Andrej Karpathy?

Please note that these are merely my notes for reference, not a comprehensive tutorial. If you want to follow along with the notes, here’s the video

A busy person’s guide to understanding LLMs

Large Language Models (LLMs) are at the heart of many modern AI applications, from chatbots to content generators.

Understanding Number Representation

Floating point numbers in binary can be represented as float16, float32, or float64, each affecting precision and storage:

float16 (16-bit): 1 bit for the sign, 5 bits for the exponent, and 10 bits for the significand.
float32 and float64 offer higher precision and range with more bits.

LLaMA 2 Model Overview

LLaMA 2 (70B): The second iteration of the LLaMA model released by Meta AI, containing 70 billion parameters. There are also models with 7 billion and 14 billion parameters.

When downloading the LLaMA-2-70B model locally, you'll find a parameter file and an executable.
Each parameter is stored as a float16 number, which takes up 2 bytes. Therefore, the total storage required is:
- 70 billion×2 bytes=140 billion bytes=140 GB

Training Stages

a. Pre-training

LLMs utilize lossy compression to distill text from the internet into 140GB of parameters.
During pre-training, we evaluate the LLM on the next word prediction problem. Why? We know mathematically that prediction is related to compression.
The next word prediction gets better if we have more parameters/ data.
Predicting the next word is a strong objective since it forces you to learn about the world (stored in the NN’s parameters). The model generates predictions not directly present in the input but represents a new formulation of the training dataset information.
Internally, LLMs are based on a transformer architecture. We can measure its accuracy, but we don’t know how exactly billions of parameters collaborate to do it.
The transformer network builds and maintains a knowledge database, but it might not be accurate.

b. Fine-tuning

The model learns to produce quality responses in the style of an assistant, based on high-quality answers from skilled individuals.

c. Comparison stage

In this phase, human feedback is used to improve the model's responses. Humans evaluate generated responses, and the feedback is used to refine the model through reinforcement learning techniques.

Scaling laws in LLMs

As LLMs grow in size (more parameters and data), their ability to predict the next word improves.
This improvement is directly correlated with better performance on tasks we care about, such as text generation or question answering.
LLMs continue to scale, integrating capabilities such as multimodality (handling text, audio, and images), which expands their potential applications.

Thinking Patterns in LLMs: Autopilot vs. Conscious Thinking

The book Thinking fast and slow describes two modes of thinking: autopilot and conscious mode.
LLMs operate more like an "autopilot" mode of human thinking, instantly generating responses without deep deliberation. This rapid output mimics intuitive thinking but does not involve conscious reasoning.
In contrast, AlphaGo was able to beat the best human player by playing with itself, self-improvement.

Security Concerns in LLMs

As LLMs become integral to various applications, they also present new security challenges, akin to operating system security. Some key concerns include:

Jailbreak: Refers to techniques used to bypass built-in restrictions or content filters, allowing the model to generate responses it would normally be prohibited from producing
Prompt Injection Attacks: Attackers can manipulate prompts to override the model’s instructions, potentially exposing sensitive information or causing unintended behaviours.

Data Poisoning: Introducing malicious or biased data during training or fine-tuning can influence the model’s responses. By embedding specific trigger phrases, attackers can cause the model to behave in undesired ways.
Information Leakage: There is a risk of leaking personal or sensitive data when integrating LLMs with web services or other tools. For instance, prompting a model to fetch data from a service like Google Docs could expose private information.

Similarity to Operating Systems

LLMs are increasingly becoming a platform for various AI applications, similar to an operating system with proprietary and open-source versions.
As they evolve, questions about LLM security, modularity, and extensibility will become more prominent.
Ensuring the models are secure, reliable, and adaptable to different applications will be essential in their continued development.

Conclusion

The talk delivers a solid foundation on LLMs, covering everything from number representation to the training process and scaling challenges. It provides valuable insights into the workings and security considerations of these models, making complex topics more accessible. For a practical and engaging starting point in understanding the landscape of Generative AI, this video is well worth your time.

Notes from Andrej Karpathy's 1-Hour Lecture on LLMs

A busy person’s guide to understanding LLMs

Understanding Number Representation

LLaMA 2 Model Overview

Training Stages

Scaling laws in LLMs

Thinking Patterns in LLMs: Autopilot vs. Conscious Thinking

Security Concerns in LLMs

Similarity to Operating Systems

Conclusion

Comments

More from this blog

20 essential PySpark operations

Demystifying Setup.py: The Blueprint Behind Python Packages 📦

A Beginner's Guide to Spark: Insights from an MLE

My Journey as a Machine Learning Engineer

Command Palette

A busy person’s guide to understanding LLMs

Understanding Number Representation

LLaMA 2 Model Overview

Training Stages

Scaling laws in LLMs

Thinking Patterns in LLMs: Autopilot vs. Conscious Thinking

Security Concerns in LLMs

Similarity to Operating Systems

Conclusion

Comments

More from this blog