Fast Tracking the Course of AI

1/23/26

Introduction

  • field is moving really really fast
  • understand how modern AI engineering is done
  • tools, use cases, agents, skills - only a subset of the buzzwords; a lot of other things are out there.
  • ask lots of intuitive questions during interviews and learn about the interview process itself
  • understand how LLMs actually work
  • research also involves implementing a lot of stuff that you work with daily
  • things to do
    • basic python
    • have a colab account ready
    • look at different models available in the market today and explore them
    • lots of models in the market -> also look at open-source models like Qwen, DeepSeek, etc.
      • how different they are from other models.
  • free stuff we can try:
    • CLI tools and what people are getting out of them:
    • opencode
    • gemini cli
  • building things and being good with the fundamentals matters more than doing a PhD now.
  • applied ai
    • using existing tools, making things that are better to use, and finding new applications that can be built on top of them.
  • what are some good projects?
    • dealing with lots of data
    • lots of context, lots of data
    • how we pass the data to the llm
    • memory and the need for it -> mix and match of these
  • context engineering
    • when to use which model
    • what are the best models
    • how to switch contexts etc.
  • agentic tasks, data pipelining, and building web apps on top of an agentic skill are definitely a good way to learn about this
  • go on colab and learn and type everything
    • theory and learn about things
    • and what direction we are going towards.
  • coding simple transformations and simple blocks of attention
  • go out and try Qwen, DeepSeek, Llama on OpenRouter and find examples of where these models differ, their different guardrails, and jailbreaking these open-access models.
  • what do AI engineers do?
    • new “backend” engineer
    • the pace we are moving at is part of the same shift
    • every company needs someone who can do this; it has become a completely new field, and it's high time to learn what these tools are
    • lots of people are vibe coding things and moving fast.
    • you can work on 2-3 projects at a time and deliver more than asked for; with these agents it is much easier to build and deploy faster.
  • use OpenRouter and opencode to play around with different models (see the sketch below).
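
A minimal sketch of comparing models through OpenRouter's OpenAI-compatible API; the model IDs below are illustrative, so check openrouter.ai for the current catalogue:

```python
# Querying open-access models through OpenRouter's OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_API_KEY",  # placeholder
)

def ask(model: str, prompt: str) -> str:
    """Send the same prompt to a given model and return its reply."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Compare how different open-access models answer the same question.
for model in ["qwen/qwen-2.5-72b-instruct", "deepseek/deepseek-chat"]:
    print(model, "->", ask(model, "Explain attention in one sentence."))
```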

Fast Tracking the Course of AI:

  • intelligence
    • ability to achieve goals in a wide range of situations.
  • history of AI and other background
  • why language is hard for computers?
    • Natural Language Processing
    • 4 different attempts at NLP
      • Put every possible word into a dictionary -> n-gram models
      • Statistical Pattern based techniques -> Pattern Matching != Understanding
        • Machine could predict the next word based on frequency, but it had no concept of what the words actually meant.
        • Breakthrough -> Words as Numbers (computers understand numbers); raw words don't carry any implicit meaning for computers.
        • Challenge -> how do we turn a word into a list of numbers that captures its meaning?
        • Apple -> [0.92,-0.14,0.05,....]
        • Hint: these are called embeddings and we are creating embeddings.
      • Word Embeddings -> “Secret Sauce”
        • How do we turn a word like “Apple” into a number that captures its meaning?
        • Capture the meaning not in 1d space but in a higher dimensional space
        • Instead of one number, we give each word a list of numbers(a vector).
        • Each number represents a specific “dimension” of meaning.
        • a very big breakthrough, as the number of dimensions for every word can be very high.
      • Eg: king and queen have similar numbers for royalty, but vastly different values for gender. This is how the machines represent concepts as data.
      • ![[Screenshot 2026-01-23 at 10.55.13 AM.png]]
      • Words as Positions in Space
        • If a word is a list of numbers, it's a point on a map. Similar words are close together, while different words are far apart.
      • ![[Screenshot 2026-01-23 at 10.57.45 AM.png]]
      • So, similar words are placed close together in a multi-dimensional space. Proximity indicates shared meaning.
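
A toy sketch of “words as positions in space”: made-up 4-dimensional vectors (real embeddings use hundreds of dimensions), with cosine similarity as the measure of how close two words sit:

```python
# Toy embeddings: closer vectors (higher cosine similarity) mean more related words.
import numpy as np

embeddings = {
    "king":  np.array([0.90, 0.80, 0.10, 0.30]),
    "queen": np.array([0.90, 0.20, 0.10, 0.35]),
    "apple": np.array([0.10, 0.40, 0.90, 0.05]),
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: close to 1.0 = same direction, close to 0.0 = unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(embeddings["king"], embeddings["queen"]))  # high: related meaning
print(cosine(embeddings["king"], embeddings["apple"]))  # lower: unrelated
```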

Magic of Embeddings

  • word math
    • king - man + woman = queen
    • paris - france + italy = rome
    • walking - walk + swim = swimming
  • can embeddings solve our earlier problems?
    • in basic word embeddings, each word has exactly ONE position in space.
    • “I went to the bank to deposit cash”
    • “I sat on the bank of the river”
      • “bank” gets the same single vector in both sentences, so one fixed position can't capture both meanings.
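
A small sketch of the word math above using pretrained GloVe vectors from gensim's downloader (the model name comes from gensim's catalogue; it downloads a small file on first use):

```python
# Word math with pretrained GloVe vectors loaded through gensim's downloader.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")  # small GloVe model, downloaded on first run

# king - man + woman ≈ queen
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))

# paris - france + italy ≈ rome
print(vectors.most_similar(positive=["paris", "italy"], negative=["france"], topn=1))
```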

Sequence Models (RNNs)

  • okay, we can make embeddings now, but what do we actually do with those embeddings?
  • on their own, we don't yet have a way to make use of them.
  • Idea: Process the sentence one word at a time, building up understanding as we go.
    • Model maintains a “memory” of what it has read so far.
  • Eg: other models like GRUs, LSTMs, etc., and other divergent directions like Mamba
  • In fact, Transformers are the best version built on top of sequence models.
  • It's an architecture that deals with the sequence, retaining some memory from each word.
  • the context and the upcoming words can change the meaning of our word completely.
  • Problem : Forgetting
    • the model's storage and memory are finite.
    • and it has to carry information back across a long chain of vectors.
    • the meaning gets completely disrupted as it has to reach back across a lot of tokens.
  • RNNs struggle with long sentences because they process information linearly and have limited memory capacity.
    • LSTMs came after RNNs and added gates that let the model choose what to add to its memory, so the information it needs to retain is much smaller.
    • GRUs came after that
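
A minimal sketch of the RNN idea with toy numpy weights: one fixed-size hidden state is updated word by word, so everything read so far has to be squeezed into a single vector, which is why long-range information gets forgotten:

```python
import numpy as np

rng = np.random.default_rng(0)
embed_dim, hidden_dim = 8, 16

W_xh = rng.normal(scale=0.1, size=(hidden_dim, embed_dim))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # hidden -> hidden

def rnn_step(h, x):
    """Fold one word embedding x into the running memory h."""
    return np.tanh(W_hh @ h + W_xh @ x)

h = np.zeros(hidden_dim)                    # empty memory before reading anything
sentence = rng.normal(size=(5, embed_dim))  # 5 toy word embeddings
for x in sentence:
    h = rnn_step(h, x)                      # memory updated after each word

print(h.shape)  # (16,) -- the whole sentence compressed into one fixed-size vector
```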

Transformers

  • stage is set around 2017
  • what we have till now
    • word embeddings
    • sequence models(RNNs)
    • Tons of internet data
    • Powerful GPUs
  • We were just lacking a powerful way to make all of these work together.
  • (old idea) process words one by one, like reading a book from start to end.
  • The idea of transformer
    • Simultaneous
      • Look at every word in the sentence at the same time.
    • Attention
      • model “attends” to the most relevant words, no matter how far apart they are.
  • We have to make models more selective; they were wasting a lot of their information.
  • And the second thing we need: all the words we are processing should be handled in parallel.
    • Attention enables the model to focus only on the most relevant words (a small sketch of the computation follows this list).
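
A small numpy sketch of scaled dot-product self-attention, the core Transformer operation; real Transformers also learn separate query/key/value projections, omitted here for brevity:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Q, K, V: (num_words, dim). Returns context-aware vectors and attention weights."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # every word scores every other word at once
    weights = softmax(scores, axis=-1)        # each row sums to 1: who attends to whom
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                   # 4 toy word embeddings
out, weights = attention(X, X, X)             # self-attention over the sentence
print(weights.round(2))                       # attention weights between all word pairs
```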

Attention in Action

  • when the model looks at the focus word, the attention mechanism tells it which other words to pay the most attention to.
  • ![[Screenshot 2026-01-23 at 12.58.46 PM.png]]
  • how it happens depends on the maths behind it
  • The model “knows” what the pronoun refers to, building a context-aware representation of every word.
  • Why Attention is powerful :
    • More memory -> every word in a sentence “sees” every other word simultaneously, no matter how far apart they are
    • Parallel Processing -> Unlike RNNs that read word by word, Transformers process all words at once, making training incredibly fast.
    • Deep Understanding -> The model builds a mathematical map of how every word relates to every other word in the specific context.
  • ![[Screenshot 2026-01-23 at 7.38.54 PM.png]]
  • By changing just one word (tired vs wide), the model’s attention shifts, correctly identifying what “it” refers to in each context.
  • Whole Flow (of a transformer based language model):
    • ![[Screenshot 2026-01-23 at 7.49.03 PM.png]]
    • takes the sentence, outputs a list of candidate next words, adds the chosen word to the sentence, and basically repeats until the MAX_TOKEN_LENGTH it's allowed.
  • Why “Predict the Next Word” Creates Understanding
    • Grammar and Syntax
      • Predicting the next word requires the model to internalize language structure and rules.
    • Factual Knowledge
      • Correct predictions rely on learned factual relations between concepts.
    • Logic and Understanding
      • Following a chain of reasoning lets the model anticipate the appropriate continuation.
  • Generation of Text
    • ![[Screenshot 2026-01-23 at 8.04.31 PM.png]]
    • Generation is an iterative loop.
    • Each predicted word is added back to the input and the model predicts the next word based on the updated context.
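
A minimal sketch of this generation loop using the Hugging Face transformers library, with gpt2 as an example model; greedy argmax decoding is used here, while real systems usually sample:

```python
# The generation loop: predict the next token, append it to the input, repeat.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

MAX_NEW_TOKENS = 20
ids = tokenizer("The transformer architecture is", return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(MAX_NEW_TOKENS):
        logits = model(ids).logits                          # scores over the whole vocabulary
        next_id = logits[0, -1].argmax()                    # greedy: take the most likely token
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)   # feed it back into the context

print(tokenizer.decode(ids[0]))
```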

Foundation Models

  • “Foundation” models are pre-trained on a huge corpus of data through a massive training loop.
  • It’s kind of like the base knowledge.
  • We no longer build tools from scratch, we build on top of these giants.
  • General Purpose AI

Current Directions (2024-25)

  • Multimodality
    • AI is no longer just text. It can see images, hear voices, and speak back in real time.
  • Reasoning
    • How models are designed to “think” before they speak, solving complex math and logic problems.
  • Agents
    • shift from chatbots to agents that can use tools, browse the web, and complete multi-step tasks
    • different applications that we see on the internet use some sort of LLM
      • take input from the user, call some LLM, and build on top of that.
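
A highly simplified sketch of that agent loop; everything here (the canned fake_llm replies, the tool names, the "TOOL:" convention) is made up for illustration:

```python
# Agent pattern: the LLM picks a tool, the tool result is fed back into the
# context, and the loop repeats until the model answers the user directly.
REPLIES = iter([
    "TOOL: calculator 12*7",
    "The answer is 84.",
])

def fake_llm(context: str) -> str:
    """Stand-in for a real chat-completion call (e.g. via OpenRouter)."""
    return next(REPLIES)

TOOLS = {
    "calculator": lambda expr: str(eval(expr)),  # toy only: never eval untrusted input
    "search": lambda query: f"(pretend search results for {query!r})",
}

def run_agent(task: str, max_steps: int = 5) -> str:
    context = f"Task: {task}\nUse 'TOOL: <name> <input>' or answer directly."
    for _ in range(max_steps):
        reply = fake_llm(context)
        if reply.startswith("TOOL:"):
            _, name, arg = reply.split(" ", 2)                     # parse the tool call
            context += f"\n{reply}\nResult: {TOOLS[name](arg)}"    # feed the result back
        else:
            return reply                                           # model answered directly
    return "(stopped after max_steps)"

print(run_agent("What is 12 * 7?"))  # -> The answer is 84.
```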