Skip to content

// projects

Tool-Using Agent

An LLM that plans, calls real tools, remembers context, and stays inside its guardrails.

An agent that decomposes a goal into steps, calls typed tools (search, database, HTTP, calculator) to act on the world, and keeps short- and long-term memory across turns. It runs inside guardrails — schema-validated tool arguments, an action allow-list, and step and token limits — so it cannot wander or loop forever. It is the project that proves you understand the control loop behind agentic AI, not just the buzzword.

AdvancedPythonFastAPIOpenAI function callingPydanticSQLite
01The problem

A single LLM call can answer a question, but it cannot check a live order status, query a database, or take a multi-step action. Naively looping a model over tool calls fails in expensive ways: it hallucinates tool arguments, loops forever, or takes actions no one authorized. The real engineering problem is the control loop — turning a text model into a reliable worker with typed tools, bounded autonomy, memory, and an audit trail of what it did and why.

02Architecture
  1. 01

    Planner & control loop

    An orchestration loop prompts the model with the goal and available tools, receives a structured action (tool name plus arguments), executes it, feeds the observation back, and repeats until the model emits a final answer or hits a limit.

  2. 02

    Typed tool registry

    Each tool declares a Pydantic schema for its inputs; tool-call arguments from the model are validated against that schema before anything runs, so malformed calls are rejected instead of executed.

  3. 03

    Execution sandbox

    Tools (web search, SQL read, HTTP GET, calculator) run behind an allow-list with timeouts; any side-effecting tool requires an explicit confirmation flag before it can act.

  4. 04

    Memory

    Short-term working memory holds the running transcript; long-term memory stores durable facts in SQLite and pulls the relevant ones back into context on later turns.

  5. 05

    Guardrails & limits

    Hard caps on steps, wall-clock time, and token budget stop runaway loops, and a validation layer checks outputs against policy before they are returned.

  6. 06

    Tracing

    Every step — prompt, chosen tool, arguments, observation — is logged as a structured trace so a run can be replayed and debugged.

  7. 07

    API surface

    A FastAPI endpoint accepts a goal and streams the intermediate steps and final result, with the full trace available for inspection.

03Key trade-offs

Function calling with validated schemas over free-text tool parsing

Letting the model emit JSON against a declared schema and validating with Pydantic removes a whole class of brittle string-parsing bugs. You depend on the provider function-calling format, but gain reliability you would otherwise hand-roll.

Explicit step and token limits over open-ended autonomy

Bounded loops occasionally stop one step short of a goal, but they make cost and latency predictable and prevent the pathological infinite-tool-call runs that make agents scary in production.

SQLite for memory instead of a vector database

For a portfolio-scale agent, a simple relational store with a few indexed fields is easier to reason about and inspect than a vector store — you can graduate to embeddings-based recall when memory actually gets large.

Confirmation gates on side-effecting tools

Requiring an explicit flag before a tool can write or send anything trades a little friction for a lot of safety, and shows you think about blast radius.

04How you know it works
  • A suite of task scenarios with known-correct outcomes and expected tool sequences, run repeatedly to measure task success rate.
  • Tool-choice accuracy: did the agent pick the right tool with valid arguments, scored against labeled traces.
  • Guardrail tests that assert the agent refuses out-of-policy actions and stops at the step and token limits instead of looping.
  • Regression runs on every prompt or tool change, tracking success rate and average steps per task so a 'smarter' prompt that doubles cost gets caught.
05Deployment
  • Dockerized service where tools are configured per environment; the allow-list and limits are config, not code.
  • Provider keys and tool credentials come from the secret store, and the agent process runs with least-privilege access to each tool.
  • CI runs the scenario suite and guardrail tests before promotion; a failed guardrail test blocks the deploy.
  • Structured traces and per-run cost and step metrics are exported so you can monitor success rate and spend in production.
06Interview talking points
  1. 01How your control loop turns a text model into a bounded worker, and where you put the limits.
  2. 02How schema-validated function calling removed brittle parsing and made tool calls reliable.
  3. 03How you separated short-term transcript memory from durable long-term memory.
  4. 04The guardrails you would add before letting an agent take real side-effecting actions.

Video walkthrough

Watch it built, end to end

A full video walkthrough — architecture, trade-offs, evals, and deployment — ships with the AI Engineer Interview & Portfolio Kit at launch (August 2026). There is no fake demo here: join the waitlist and you will get it the day it lands.

Production AI Notes

One practical AI engineering email each week

One concept, one architecture, one project idea, and one interview question — written for developers who want to build and ship real AI systems.

No spam. Unsubscribe anytime.