Fast Tracking the Course of AI
1/23/26
Introduction
- field is moving really really fast
- understand how modern AI engineering is done
- tools, use cases, agents, skills - just a subset of the buzzwords; a lot of other things are present.
- ask lots of intuitive questions during interviews, and learn about the interview process as well
- understand how LLMs actually work
- research also involves implementing a lot of stuff that you work with daily
- things to do
- basic python
- have a colab account ready
- look at different models available in the market today and explore them
- lots of models in the market -> also try open-source models like Qwen, DeepSeek, etc.
- see how different they are from other models.
- free stuff we can try:
- CLI tools people are getting a lot out of:
- opencode
- gemini cli
- building things and being good with the fundamentals is more important than doing a PhD now.
- applied ai
- using existing tools, making things that are better to use, and finding new applications that can be built on top of them.
- what are some good projects ?
- dealing with lots of data
- lots of context, lots of data
- how we pass the data to the llm
- memory and the need for it -> mix and match of these
- context engineering
- when to use which model
- what are the best models
- how to switch contexts etc.
- agentic tasks, data pipelines, and building web apps on top of an agentic skill are definitely a good way to learn about this
- go on colab and learn and type everything
- learn the theory
- and understand what direction we are going towards.
- coding simple transformations and simple blocks of attention
- go out and try Qwen, DeepSeek, Llama on OpenRouter; find examples of where these models differ, their different guardrails, and try jailbreaking these open-access models.
- what do AI engineers do ?
- new “backend” engineer
- pace we are moving is the same thing
- every company needs someone who can do this; it has become a completely new field, and it's high time to learn what these tools are
- lot of people are vibe coding things and moving fast.
- can work on 2-3 projects at a time and deliver more than asked for; using these agents it is much easier to build and deploy faster.
- OpenRouter and opencode to play around with different models (a quick sketch follows).
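- A minimal sketch of trying different models through OpenRouter (it exposes an OpenAI-compatible API, so the standard `openai` Python client works). The model slug and the `OPENROUTER_API_KEY` environment variable name are illustrative assumptions; check OpenRouter's docs and model list for the exact values.

```python
# pip install openai
import os
from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible endpoint, so the standard client works.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],  # assumed env var name
)

# Swap the model slug to compare models (exact slugs: see openrouter.ai/models).
response = client.chat.completions.create(
    model="deepseek/deepseek-chat",  # illustrative slug
    messages=[{"role": "user", "content": "Explain word embeddings in two sentences."}],
)
print(response.choices[0].message.content)
```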
Fast Tracking the Course of AI:
- intelligence
- ability to achieve goals in a wide range of situations.
History of AI - why is language hard for computers?
- Natural Language Processing
- 4 different attempts at NLP
- Put every possible word into a dictionary-style approach -> n-gram models
- Statistical Pattern based techniques -> Pattern Matching != Understanding
- Machine could predict the next word based on frequency, but it had no concept of what the words actually meant (toy sketch below).
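- A toy sketch of that frequency-based idea: count which word follows which in a tiny made-up corpus, then "predict" the most frequent follower. Pure counting, no understanding.

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows each other word (a bigram model).
follower_counts = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    follower_counts[current_word][next_word] += 1

def predict_next(word):
    # Pick the most frequent follower seen in the corpus.
    return follower_counts[word].most_common(1)[0][0]

print(predict_next("the"))   # "cat" - purely from frequency, no meaning involved
```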
- Breakthrough -> Words as Numbers (computers can understand numbers); raw words don't make any implicit sense to a computer.
- Challenge -> how do we turn a word into a list of numbers that captures its meaning ?
- Apple -> [0.92, -0.14, 0.05, ...] - Hint: these are called embeddings, and that is what we are creating.
- Word Embeddings -> “Secret Sauce”
- How do we turn a word like “Apple” into a number that captures its meaning ?
- Capture the meaning not in 1d space but in a higher dimensional space
- Instead of one number, we give each word a list of numbers(a vector).
- Each number represents a specific “dimension” of meaning.
- a very big breakthrough, as the number of dimensions per word can be very high.
- Eg: king and queen have similar numbers for royalty, but vastly different values for gender. This is how the machines represent concepts as data.
- ![[Screenshot 2026-01-23 at 10.55.13 AM.png]]
- Words as Positions in Space
- If a word is a list of numbers, it's a point on a map. Similar words are close together, while different words are far apart.
- ![[Screenshot 2026-01-23 at 10.57.45 AM.png]]
- So , similar words are placed close together in a multi-dimensional space. Proximity indicates shared meaning.
Magic of Embeddings
- word math (see the toy sketch at the end of this section)
- king - man + woman = queen
- paris - france + italy = rome
- walking - walk + swim = swimming
- can embeddings solve our earlier problems?
- in basic word embeddings, each word has exactly ONE position in space.
- “I went to the bank to deposit cash”
- “I sat on the bank of the river”
- the same single vector for “bank” has to cover both meanings, so basic embeddings can't tell them apart.
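- A toy numpy sketch of the word math above, using made-up 3-dimensional vectors (real embeddings are learned from data and have hundreds of dimensions).

```python
import numpy as np

# Hand-made toy embeddings: dimensions roughly mean [royalty, gender, person-ness].
emb = {
    "king":  np.array([0.9,  0.9, 0.8]),
    "queen": np.array([0.9, -0.9, 0.8]),
    "man":   np.array([0.1,  0.9, 0.9]),
    "woman": np.array([0.1, -0.9, 0.9]),
}

def closest(vector, exclude=()):
    # Find the word whose embedding has the highest cosine similarity to `vector`.
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max((w for w in emb if w not in exclude), key=lambda w: cos(emb[w], vector))

result = emb["king"] - emb["man"] + emb["woman"]
print(closest(result, exclude={"king", "man", "woman"}))   # queen
```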
Sequence Models (RNNs)
- okay, we can make embeddings now, but what do we make of that embedding exactly?
- on their own, we are not yet able to make use of the embeddings in any way.
- Idea: Process the sentence one word at a time, building up understanding as we go.
- Model maintains a “memory” of what it has read so far.
- e.g. other models like GRUs, LSTMs, etc., and other divergent directions like Mamba
- In fact, Transformers are the best version built on top of sequence models.
- It's an architecture that deals with the sequence, retaining some memory from each word.
- the context and the upcoming words can change the meaning of our word completely.
- Problem : Forgetting
- Storage and memory are finite.
- And it has to carry information back across long stretches of vectors.
- the meaning gets completely disrupted once it has to look back over a lot of tokens.
- RNNs struggle with long sentences because they process information linearly and have limited memory capacity (a toy RNN loop is sketched below).
- LSTMs came after RNNs; they added gates that let the model choose what to add to its memory, so the information it needs to retain is much less.
- GRUs came after that
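- A toy numpy loop showing the RNN idea: read one embedding at a time and fold it into a single fixed-size hidden state (the "memory"). Sizes and weights are made up for illustration.

```python
import numpy as np

# Toy RNN cell: all sizes and weights are made up for illustration.
rng = np.random.default_rng(0)
d_in, d_hidden = 4, 8                       # embedding size, hidden-state size
W_xh = rng.normal(scale=0.1, size=(d_in, d_hidden))
W_hh = rng.normal(scale=0.1, size=(d_hidden, d_hidden))
b_h = np.zeros(d_hidden)

sentence = rng.normal(size=(6, d_in))       # 6 fake word embeddings

h = np.zeros(d_hidden)                      # the "memory" starts empty
for x in sentence:                          # read one word at a time
    h = np.tanh(x @ W_xh + h @ W_hh + b_h)  # new memory mixes word + old memory

print(h.shape)  # (8,) - one fixed-size vector has to summarize the whole sentence
```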
Transformers
- stage is set around 2017
- what we have till now
- word embeddings
- sequence models(RNNs)
- Tons of internet data
- Powerful GPUs
- We were just lacking a powerful way to make all of these work together.
- (old idea) process words one by one, like reading a book from start to end.
- The idea of transformer
- Simultaneous
- Look at every word in the sentence at the same time.
- Attention
- model “attends” to the most relevant words, no matter how far apart they are.
- Two things were needed:
- make the models more selective, since otherwise they waste a lot of the information they take in.
- and all the words we are processing should be handled in parallel.
- Attention enables the model to focus only on the most relevant words.
Attention in Action
- when the model looks at the focus word, the attention mechanism tells it which other words to pay the most attention to.
- ![[Screenshot 2026-01-23 at 12.58.46 PM.png]]
- how it happens depends on the maths behind it
- The model “knows” what the pronoun refers to by building a context-aware representation of every word.
- Why Attention is powerful :
- More memory -> every word in a sentence “sees” every other word simultaneously , no matter how far apart they are
- Parallel Processing -> Unlike RNNs that read word by word, Transformers process all words at once, making training incredibly fast.
- Deep Understanding -> The model builds a mathematical map of how every word relates to every other word in the specific context.
- ![[Screenshot 2026-01-23 at 7.38.54 PM.png]]
- By changing just one word (tired vs wide), the model’s attention shifts, correctly identifying what “it” refers to in each context.
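- A minimal numpy sketch of the attention math (single-head, scaled dot-product self-attention): every word scores every other word, the scores become weights, and each word's output is a weighted mix of all the others. All numbers here are random placeholders.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n_words, d_model = 5, 16                     # 5 tokens, toy model width

X = rng.normal(size=(n_words, d_model))      # embeddings for the sentence
W_q = rng.normal(scale=0.1, size=(d_model, d_model))
W_k = rng.normal(scale=0.1, size=(d_model, d_model))
W_v = rng.normal(scale=0.1, size=(d_model, d_model))

Q, K, V = X @ W_q, X @ W_k, X @ W_v          # queries, keys, values

scores = Q @ K.T / np.sqrt(d_model)          # how much each word "attends" to every other word
weights = softmax(scores, axis=-1)           # each row sums to 1: one attention distribution per word
out = weights @ V                            # context-aware representation of every word

print(weights.shape, out.shape)              # (5, 5) (5, 16)
```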
- Whole Flow (of a transformer based language model):
- ![[Screenshot 2026-01-23 at 7.49.03 PM.png]]
- takes the sentence, outputs a ranked list of candidate next words, adds the chosen one to the sentence, and basically repeats until the MAX_TOKEN_LENGTH it is allowed.
- Why “Predict the Next Word” Creates Understanding
- Grammar and Syntax
- Predicting the next word requires the model to internalize language structure and rules.
- Factual Knowledge
- Correct predictions rely on learned factual relations between concepts.
- Logic and Understanding
- Following a chain of reasoning lets the model anticipate the appropriate continuation.
- Generation of Text
- ![[Screenshot 2026-01-23 at 8.04.31 PM.png]]
- Generation is an iterative loop (sketched below).
- Each predicted word is added back to the input, and the model predicts the next word based on the updated context.
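- A schematic of that generation loop. `predict_next_token` is a stand-in, not a real model API; a real LLM would score its whole vocabulary given the context and return the most likely (or a sampled) token.

```python
import random

MAX_TOKEN_LENGTH = 20

def predict_next_token(tokens):
    # Placeholder for a real model call: here we just pick randomly from a toy vocabulary.
    vocab = ["the", "cat", "sat", "on", "mat", "."]
    return random.choice(vocab)

tokens = ["The", "cat"]                        # the prompt
while len(tokens) < MAX_TOKEN_LENGTH:
    next_token = predict_next_token(tokens)    # predict from the updated context
    tokens.append(next_token)                  # feed the prediction back into the input
    if next_token == ".":                      # a simple stop condition
        break

print(" ".join(tokens))
```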
Foundation Models
- “Foundation” models are models pre-trained on a huge corpus of data with a long training loop.
- It’s kind of like the base knowledge.
- We no longer build tools from scratch, we build on top of these giants.
- General Purpose AI
Current Directions (2024-25)
- Multimodality
- AI is no longer just text. It can see images, hear voices, and speak back in real time.
- Reasoning
- How models are designed to “think” before they speak, solving complex math and logic problems.
- Agents
- shift from chatbots to agents that can use tools, browse the web, and complete multi-step tasks
- different applications that we see on the internet use some sort of LLM
- take input from the user, use some LLM, and build on top of that.