Building a Small AI Agent from Scratch - Kai

1/27/26

ref: https://github.com/sagnikc395/kai/
Objective: building an app that can help us build other apps.
What does an AI agent do?
- The program we are building is a CLI tool that:
  - accepts a coding task
  - chooses from a set of predefined functions to work on the task, such as:
    - scan the files in a directory
    - read a file’s contents
    - overwrite a file’s contents
    - run the Python interpreter on a file
  - repeats step 2 until the task is complete (or until it fails, which can also happen)
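
The loop described above can be sketched roughly as follows. This is a minimal sketch, not the code from the kai repo: `fake_model`, `TOOLS`, `run_agent`, and the tool stubs are hypothetical names, and the model is replaced by a scripted stand-in so the example runs without an API key.

```python
from typing import Callable

# Hypothetical tool stubs; the real tools would touch the filesystem.
def scan_files(directory: str) -> str:
    return f"(listing of {directory})"

def read_file(path: str) -> str:
    return f"(contents of {path})"

# Hypothetical registry mapping tool names to functions.
TOOLS: dict[str, Callable[[str], str]] = {
    "scan_files": scan_files,
    "read_file": read_file,
}

def fake_model(history: list[str]) -> dict:
    """Scripted stand-in for the LLM: picks a tool, or declares the task done."""
    if len(history) == 1:
        return {"tool": "scan_files", "arg": "."}
    if len(history) == 2:
        return {"tool": "read_file", "arg": "main.py"}
    return {"done": "task complete"}

def run_agent(task: str, max_steps: int = 10) -> str:
    history = [task]                      # step 1: accept a coding task
    for _ in range(max_steps):            # step 2, repeated
        decision = fake_model(history)
        if "done" in decision:
            return decision["done"]
        tool = TOOLS.get(decision["tool"])
        if tool is None:
            history.append(f"Error: unknown tool {decision['tool']}")
            continue
        history.append(tool(decision["arg"]))  # feed the result back to the model
    return "Error: step limit reached"    # the "or it fails" case
```

The step limit is an assumption added here so that a confused model cannot loop forever.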
Goals of the project:
- Understand how AI tools work under the hood
- Write a CLI tool in Python
- Use a pre-trained LLM to build an agent from scratch
- Learn how to give developers a better experience (DX) with these tools, and how to use them well
Gemini is an LLM.
- Given a prompt, it returns the text that it believes is the answer.
Tokens:
- Tokens are the currency of LLMs.
- They are how LLMs measure the amount of text they have to process.
- One token is roughly 4 characters of English text for most models.
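
A back-of-the-envelope version of the 4-characters rule, as a sketch only. `estimate_tokens` is a hypothetical helper; real tokenizers vary by model and language, so treat the result as a rough budget check rather than an exact count.

```python
def estimate_tokens(text: str, chars_per_token: int = 4) -> int:
    """Rough token estimate using the ~4 characters per token rule of thumb."""
    if not text:
        return 0
    # Any non-empty text costs at least one token.
    return max(1, len(text) // chars_per_token)
```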
LLM APIs aren’t typically used in a “one-shot” manner, because we need to keep the context of the ongoing conversation.
- When we talk to ChatGPT, the conversation has a history. If we keep track of that history and send it along with each new prompt, the model can see the entire conversation and respond with that larger context.
Most importantly, each message in the conversation has a “role”.
- e.g. the user role vs. the model (agent) role
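
A small sketch of keeping history with roles. The `Message` class, `send`, and `fake_model` are hypothetical names used for illustration; `fake_model` stands in for the real API call and simply reports how many user turns it was shown.

```python
from dataclasses import dataclass

@dataclass
class Message:
    role: str   # "user" or "model"
    text: str

def fake_model(history: list[Message]) -> str:
    """Stand-in for the LLM call; it receives the whole conversation."""
    user_turns = sum(1 for m in history if m.role == "user")
    return f"reply to user turn {user_turns}"

def send(history: list[Message], prompt: str) -> str:
    history.append(Message("user", prompt))    # record the new prompt
    reply = fake_model(history)                # model sees every prior message
    history.append(Message("model", reply))    # record the reply for next time
    return reply
```

Because `history` grows with every turn, the number of tokens sent per request grows too.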
To test end to end that the agent works properly while building it, added a calculator app as the project it works on.
Capabilities of the agent:
- ability to read the contents of files
  - more specifically, only files inside the guarded directory
- built a guardrail so that file reads stay inside the guarded directory and the LLM doesn’t run amok.
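
One way such a guardrail could look, as a sketch under assumptions: `is_inside` is a hypothetical helper, and the approach (resolve both paths, then compare their common prefix) is a common technique rather than necessarily what the repo does.

```python
import os

def is_inside(working_directory: str, path: str) -> bool:
    """Return True only if `path` resolves to a location within the working directory."""
    base = os.path.realpath(working_directory)
    # realpath collapses ".." segments and follows symlinks before the comparison.
    target = os.path.realpath(os.path.join(base, path))
    try:
        return os.path.commonpath([base, target]) == base
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        return False
```

Resolving the path before checking it is what blocks inputs like `../secrets.txt` or an absolute path such as `/etc/passwd`.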
Getting the contents of a file:
- return the file contents as a string, or an error string if something went wrong.
- as always, safely scope this to the specific working directory.
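
A sketch of what this tool could look like. The name `get_file_content`, the error wording, and the `MAX_CHARS` cap are assumptions for illustration, not taken from the repo. Errors come back as strings rather than exceptions so the model can read them and try something else.

```python
import os

MAX_CHARS = 10_000  # hypothetical cap so one large file cannot flood the context

def get_file_content(working_directory: str, file_path: str) -> str:
    """Return the file's text, or a string starting with "Error:" on failure."""
    base = os.path.realpath(working_directory)
    target = os.path.realpath(os.path.join(base, file_path))
    try:
        inside = os.path.commonpath([base, target]) == base
    except ValueError:  # e.g. different drives on Windows
        inside = False
    if not inside:
        return f'Error: "{file_path}" is outside the working directory'
    if not os.path.isfile(target):
        return f'Error: "{file_path}" is not a regular file'
    try:
        with open(target, "r", encoding="utf-8") as f:
            return f.read(MAX_CHARS)
    except (OSError, UnicodeDecodeError) as exc:
        return f"Error: {exc}"
```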