Fast Tracking the Course of AI
1/23/26
Introduction
- field is moving really really fast
- understand how modern AI engineering is done
- tools, use cases, agents, skills - just a subset of the buzzwords; a lot of other things are present.
- ask lots of intuitive questions during interviews, and learn about the interview process as well
- understand how LLMs actually work
- research also involves implementing a lot of stuff that you work with daily
- things to do
- basic python
- have a colab account ready
- look at different models available in the market today and explore them
- lots of models in the market -> also try open-source models like Qwen, DeepSeek, etc.
- see how different they are from other models.
- free stuff we can try:
- CLI tools people are getting a lot out of:
- opencode
- gemini cli
- building things and being good with the fundamentals is more important than doing a PhD now.
- applied ai
- using existing tools, making things that are better to use, and finding new applications that can be built on top of them.
- what are some good projects ?
- dealing with lots of data
- lots of context, lots of data
- how we pass the data to the llm
- memory and the need for it -> mix and match of these
- context engineering
- when to use which model
- what are the best models
- how to switch contexts etc.
- agentic tasks, data pipelines, and building web apps on top of an agentic skill are definitely a good way to learn about this
- go on colab and learn and type everything
- learn the theory
- and understand what direction we are going towards.
- coding simple transformations and simple blocks of attention
- go out and try Qwen, DeepSeek, Llama on OpenRouter; find examples of where these models differ, their different guardrails, and try jailbreaking these open-access models.
- what do AI engineers do ?
- new “backend” engineer
- pace we are moving is the same thing
- every company needs someone who can do this; it has become a completely new field, and it's high time to learn what these tools are
- lot of people are vibe coding things and moving fast.
- can work on 2-3 projects at a time and deliver more than asked for; using these agents it is much easier to build and deploy faster.
- OpenRouter and opencode to play around with different models (a quick sketch follows).
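- A minimal sketch of trying different models through OpenRouter (it exposes an OpenAI-compatible API, so the standard `openai` Python client works). The model slug and the `OPENROUTER_API_KEY` environment variable name are illustrative assumptions; check OpenRouter's docs and model list for the exact values.

```python
# pip install openai
import os
from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible endpoint, so the standard client works.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],  # assumed env var name
)

# Swap the model slug to compare models (exact slugs: see openrouter.ai/models).
response = client.chat.completions.create(
    model="deepseek/deepseek-chat",  # illustrative slug
    messages=[{"role": "user", "content": "Explain word embeddings in two sentences."}],
)
print(response.choices[0].message.content)
```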
Fast Tracking the Course of AI:
- intelligence
- ability to achieve goals in a wide range of situations.
History of AI - why is language hard for computers?
- Natural Language Processing
- 4 different attempts at NLP
- Put every possible word into a dictionary-style approach -> n-gram models
- Statistical Pattern based techniques -> Pattern Matching != Understanding
- Machine could predict the next word based on frequency, but it had no concept of what the words actually meant (toy sketch below).
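- A toy sketch of that frequency-based idea: count which word follows which in a tiny made-up corpus, then "predict" the most frequent follower. Pure counting, no understanding.

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows each other word (a bigram model).
follower_counts = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    follower_counts[current_word][next_word] += 1

def predict_next(word):
    # Pick the most frequent follower seen in the corpus.
    return follower_counts[word].most_common(1)[0][0]

print(predict_next("the"))   # "cat" - purely from frequency, no meaning involved
```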
- Breakthrough -> Words as Numbers (computers can understand numbers); raw words don't make any implicit sense to a computer.
- Challenge -> how do we turn a word into a list of numbers that captures its meaning ?
- Apple -> [0.92, -0.14, 0.05, ...] - Hint: these are called embeddings, and that is what we are creating.
- Word Embeddings -> “Secret Sauce”
- How do we turn a word like “Apple” into a number that captures its meaning ?
- Capture the meaning not in 1d space but in a higher dimensional space
- Instead of one number, we give each word a list of numbers(a vector).
- Each number represents a specific “dimension” of meaning.
- a very big breakthrough, as the number of dimensions per word can be very high.
- Eg: king and queen have similar numbers for royalty, but vastly different values for gender. This is how the machines represent concepts as data.
- ![[Screenshot 2026-01-23 at 10.55.13 AM.png]]
- Words as Positions in Space
- If a word is a list of numbers, it's a point on a map. Similar words are close together, while different words are far apart.
- ![[Screenshot 2026-01-23 at 10.57.45 AM.png]]
- So , similar words are placed close together in a multi-dimensional space. Proximity indicates shared meaning.
Magic of Embeddings
- word math (see the toy sketch at the end of this section)
- king - man + woman = queen
- paris - france + italy = rome
- walking - walk + swim = swimming
- can embeddings solve our earlier problems?
- in basic word embeddings, each word has exactly ONE position in space.
- “I went to the bank to deposit cash”
- “I sat on the bank of the river”
- the same single vector for “bank” has to cover both meanings, so basic embeddings can't tell them apart.
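- A toy numpy sketch of the word math above, using made-up 3-dimensional vectors (real embeddings are learned from data and have hundreds of dimensions).

```python
import numpy as np

# Hand-made toy embeddings: dimensions roughly mean [royalty, gender, person-ness].
emb = {
    "king":  np.array([0.9,  0.9, 0.8]),
    "queen": np.array([0.9, -0.9, 0.8]),
    "man":   np.array([0.1,  0.9, 0.9]),
    "woman": np.array([0.1, -0.9, 0.9]),
}

def closest(vector, exclude=()):
    # Find the word whose embedding has the highest cosine similarity to `vector`.
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max((w for w in emb if w not in exclude), key=lambda w: cos(emb[w], vector))

result = emb["king"] - emb["man"] + emb["woman"]
print(closest(result, exclude={"king", "man", "woman"}))   # queen
```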
Sequence Models (RNNs)
- okay, we can make embeddings now, but what do we make of that embedding exactly?
- on their own, we are not yet able to make use of the embeddings in any way.
- Idea: Process the sentence one word at a time, building up understanding as we go.
- Model maintains a “memory” of what it has read so far.
- e.g. other models like GRUs, LSTMs, etc., and other divergent directions like Mamba
- In fact, Transformers are the best version built on top of sequence models.
- It's an architecture that deals with the sequence, retaining some memory from each word.
- the context and the upcoming words can change the meaning of our word completely.
- Problem : Forgetting
- Storage and memory are finite.
- And it has to carry information back across long stretches of vectors.
- the meaning gets completely disrupted once it has to look back over a lot of tokens.
- RNNs struggle with long sentences because they process information linearly and have limited memory capacity (a toy RNN loop is sketched below).
- LSTMs came after RNNs; they added gates that let the model choose what to add to its memory, so the information it needs to retain is much less.
- GRUs came after that
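- A toy numpy loop showing the RNN idea: read one embedding at a time and fold it into a single fixed-size hidden state (the "memory"). Sizes and weights are made up for illustration.

```python
import numpy as np

# Toy RNN cell: all sizes and weights are made up for illustration.
rng = np.random.default_rng(0)
d_in, d_hidden = 4, 8                       # embedding size, hidden-state size
W_xh = rng.normal(scale=0.1, size=(d_in, d_hidden))
W_hh = rng.normal(scale=0.1, size=(d_hidden, d_hidden))
b_h = np.zeros(d_hidden)

sentence = rng.normal(size=(6, d_in))       # 6 fake word embeddings

h = np.zeros(d_hidden)                      # the "memory" starts empty
for x in sentence:                          # read one word at a time
    h = np.tanh(x @ W_xh + h @ W_hh + b_h)  # new memory mixes word + old memory

print(h.shape)  # (8,) - one fixed-size vector has to summarize the whole sentence
```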
Transformers
- stage is set around 2017
- what we have till now
- word embeddings
- sequence models(RNNs)
- Tons of internet data
- Powerful GPUs
- We were just lacking a powerful way to make all of these work together.
- (old idea) process words one by one, like reading a book from start to end.
- The idea of transformer
- Simultaneous
- Look at every word in the sentence at the same time.
- Attention
- model “attends” to the most relevant words, no matter how far apart they are.
- Two things were needed:
- make the models more selective, since otherwise they waste a lot of the information they take in.
- and all the words we are processing should be handled in parallel.
- Attention enables the model to focus only on the most relevant words.
Attention in Action
- when the model looks at the focus word, the attention mechanism tells it which other words to pay the most attention to.
- ![[Screenshot 2026-01-23 at 12.58.46 PM.png]]
- how it happens depends on the maths behind it
- The model “knows” what the pronoun refers to by building a context-aware representation of every word.
- Why Attention is powerful :
- More memory -> every word in a sentence “sees” every other word simultaneously , no matter how far apart they are
- Parallel Processing -> Unlike RNNs that read word by word, Transformers process all words at once, making training incredibly fast.
- Deep Understanding -> The model builds a mathematical map of how every word relates to every other word in the specific context.
- ![[Screenshot 2026-01-23 at 7.38.54 PM.png]]
- By changing just one word (tired vs wide), the model’s attention shifts, correctly identifying what “it” refers to in each context.
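- A minimal numpy sketch of the attention math (single-head, scaled dot-product self-attention): every word scores every other word, the scores become weights, and each word's output is a weighted mix of all the others. All numbers here are random placeholders.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n_words, d_model = 5, 16                     # 5 tokens, toy model width

X = rng.normal(size=(n_words, d_model))      # embeddings for the sentence
W_q = rng.normal(scale=0.1, size=(d_model, d_model))
W_k = rng.normal(scale=0.1, size=(d_model, d_model))
W_v = rng.normal(scale=0.1, size=(d_model, d_model))

Q, K, V = X @ W_q, X @ W_k, X @ W_v          # queries, keys, values

scores = Q @ K.T / np.sqrt(d_model)          # how much each word "attends" to every other word
weights = softmax(scores, axis=-1)           # each row sums to 1: one attention distribution per word
out = weights @ V                            # context-aware representation of every word

print(weights.shape, out.shape)              # (5, 5) (5, 16)
```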
- Whole Flow (of a transformer based language model):
- ![[Screenshot 2026-01-23 at 7.49.03 PM.png]]
- takes the sentence, outputs a ranked list of candidate next words, adds the chosen one to the sentence, and basically repeats until the MAX_TOKEN_LENGTH it is allowed.
- Why “Predict the Next Word” Creates Understanding
- Grammar and Syntax
- Predicting the next word requires the model to internalize language structure and rules.
- Factual Knowledge
- Correct predictions rely on learned factual relations between concepts.
- Logic and Understanding
- Following a chain of reasoning lets the model anticipate the appropriate continuation.
- Generation of Text
- ![[Screenshot 2026-01-23 at 8.04.31 PM.png]]
- Generation is an iterative loop (sketched below).
- Each predicted word is added back to the input, and the model predicts the next word based on the updated context.
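- A schematic of that generation loop. `predict_next_token` is a stand-in, not a real model API; a real LLM would score its whole vocabulary given the context and return the most likely (or a sampled) token.

```python
import random

MAX_TOKEN_LENGTH = 20

def predict_next_token(tokens):
    # Placeholder for a real model call: here we just pick randomly from a toy vocabulary.
    vocab = ["the", "cat", "sat", "on", "mat", "."]
    return random.choice(vocab)

tokens = ["The", "cat"]                        # the prompt
while len(tokens) < MAX_TOKEN_LENGTH:
    next_token = predict_next_token(tokens)    # predict from the updated context
    tokens.append(next_token)                  # feed the prediction back into the input
    if next_token == ".":                      # a simple stop condition
        break

print(" ".join(tokens))
```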
Foundation Models
- “Foundation” models are models pre-trained on a huge corpus of data with a long training loop.
- It’s kind of like the base knowledge.
- We no longer build tools from scratch, we build on top of these giants.
- General Purpose AI
Current Directions (2024-25)
- Multimodality
- AI is no longer just text. It can see images, hear voices, and speak back in real time.
- Reasoning
- How models are designed to “think” before they speak, solving complex math and logic problems.
- Agents
- shift from chatbots to agents that can use tools, browse the web, and complete multi-step tasks
- different applications that we see on the internet use some sort of LLM
- take input from the user, use some LLM, and build on top of that.