Building Agents from First Principles
- huangpf
- Mar 15
- 6 min read

Context vs Capabilities: The Mental Model Behind Every Agent That Actually Works
Most agents fail not because the model is bad or the code is wrong. Usually the builder got one thing backwards: they gave the agent incredible tools and forgot to tell it what it needed to know.
Ivan, formerly an AI engineer at Manus.ai, spent a Thursday evening at SQ Collective unpacking exactly why. His talk was called "Building Your AI Assistant: From First Principles to Agentic."
Start With the Simplest Definition
Ivan opened with a question: what actually is an agent?
"Fundamentally, if you've let Claude Code or any coding agent run wild, it just keeps burning tokens until the job is done."
That's it. An agent is:
1) a language model in a loop
2) with enough token budget to keep going until
3) it reaches a result.
There's no magic to it. It's just a model making decisions repeatedly, with access to tools that let it act on those decisions.
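That loop can be sketched in a few lines of Python. Everything here is illustrative, not any particular framework's API: `model` stands in for whatever chat-completions call you use, and the tools are plain callables.

```python
def run_agent(task, model, tools, max_steps=20):
    """Run the model in a loop until it stops calling tools or the budget runs out.

    `model` is any callable taking (messages, tools) and returning a dict
    like {"content": str, "tool_call": {"name": ..., "arguments": ...} | None}.
    """
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):                      # the token/step budget
        reply = model(messages, tools)
        call = reply.get("tool_call")
        if call is None:                            # model decided it's done
            return reply["content"]
        result = tools[call["name"]](**call["arguments"])
        messages.append({"role": "tool", "name": call["name"],
                         "content": str(result)})   # feed the result back in
    raise RuntimeError("budget exhausted before the task finished")
```

The whole "agent" is the loop; everything else is what you put in `messages` and `tools`.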
The shift from traditional software to agents is a shift in who encodes the steps. In traditional code, a programmer writes every branch, every conditional, every fallback. In an agent, the model decides what to do next. You define the space. The model navigates it.
That's the entire definition.
The Framework: Context vs Capabilities
Ivan's central mental model divides everything an agent needs into two buckets.
Context is what the agent knows. The current date. The user's name. Their preferences. What happened in the last session. Any background information the model needs to make good decisions. Context lives in the prompt—either directly or via retrieval.
Capabilities are what the agent can do. Web search. File system access. Calendar reads and writes. Code execution. API calls. Each capability is a tool that the model can invoke.
Most production failures, Ivan said, trace back to mismanaging this split. Some scenarios:
You give the agent powerful tools but forget that it has no idea what day it is.
You flood the context with information the model doesn't need, wasting tokens and diluting signal.
You define capabilities in ways that make sense to you but confuse the model.
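The split is easy to make concrete in code. This is a hypothetical sketch of the two buckets, not Manus or OpenClaw internals: context is whatever text ends up in the prompt, capabilities are the callables the model may invoke.

```python
from datetime import date

def build_context(user):
    """Context: what the agent *knows*. This text goes into the prompt."""
    return "\n".join([
        f"Today is {date.today().isoformat()}.",   # the "what day is it" failure, solved
        f"User: {user['name']}.",
        f"Preferences: {user['preferences']}.",
    ])

# Capabilities: what the agent *can do*. Each entry is a tool it may invoke.
# Both implementations here are stubs for illustration.
capabilities = {
    "search_web": lambda query: f"results for {query!r}",
    "get_today": lambda: date.today().isoformat(),
}
```

Everything the agent needs falls into one of the two dictionaries above; debugging usually starts by asking which one is wrong.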
The Manus team wrote a deep technical post on context engineering after building one of the world's most capable autonomous agents. The core lesson maps directly to Ivan's framework: the quality of what you put in the context window determines the quality of what the agent does.
Anthropic's engineering team also reached the same conclusion independently: context compaction and careful curation are what separate reliable agents from unreliable ones.
Get the balance right and your agent becomes robust. Get it wrong and you're soon debugging mysteriously bad decisions that make perfect sense once you realize the model was working from incomplete information.
Tool Calling: The Quiet Revolution
Ivan called tool calling one of the biggest innovations in AI.
Before tool calling, agents were fragile. You'd ask a model to take an action and it would output some JSON you'd have to try to parse. The format would drift. The parsing would break. You'd spend more time wrangling output formatting than actually building capabilities.
Tool calling replaced all of that with a clean contract. The model receives a schema describing available functions: name, parameters, types, descriptions. When it needs to act, it outputs a structured function call. The runtime executes it and returns the result. The model continues. Two reasons why this is powerful:
First, reliability. The model isn't trying to produce parseable text—it's producing a structured object against a defined schema. That's a much easier task. Failure rates drop.
Second, abstraction. The tool definition the model sees is completely decoupled from the implementation behind it. You can describe a `search_web` tool to the model. Under the hood, you can swap between Brave, Perplexity, or your own index without changing anything the model knows. You can mock it in tests. You can version it. You can A/B test implementations without touching the agent logic.
That separation between interface and implementation is standard software engineering. Tool calling brought it to AI agents. It's why serious agent frameworks all converge on this pattern.
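The decoupling is visible in code. A hedged sketch, with hypothetical names; the schema shape below mirrors common function-calling APIs but isn't any vendor's exact format.

```python
# The schema the model sees. It never changes, no matter what runs behind it.
search_web_schema = {
    "name": "search_web",
    "description": "Search the web and return the top results.",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string", "description": "Search terms."}},
        "required": ["query"],
    },
}

def search_with_brave(query):        # one backend (stubbed for illustration)...
    return f"[brave] {query}"

def search_with_local_index(query):  # ...or another; the model never knows
    return f"[local] {query}"

# Swap implementations without touching the schema or the agent logic.
implementations = {"search_web": search_with_brave}
```

Mocking in tests, versioning, and A/B testing all fall out of the same move: the agent binds to the schema, the runtime binds to the implementation.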
Memory That Actually Works
Ivan used OpenClaw as a live case study: an example of what thoughtful agent design looks like in practice.
The memory architecture he described solves one of the hardest problems in agentic systems: what does the agent remember, and in what form?
There are three layers:
1. Raw JSONL transcripts. Every conversation, verbatim, stored as structured logs. Complete fidelity. Expensive to load in full.
2. Periodic summarizations. Date-stamped files that condense recent activity into usable summaries. The agent can load a week of context without loading thousands of raw messages.
3. Always-loaded preferences file. A file of user-specific context that's always present. The things that should always be in scope: name, preferences, recurring tasks, working style.
Most agent memory implementations pick one of these and call it done. They either have raw logs (too verbose) or summaries (lossy) or a static system prompt (too rigid). The three-layer approach gives you fidelity, efficiency, and persistence simultaneously.
The more significant point Ivan made: non-technical people can set this up on a Mac. That's a design constraint, not a limitation. When you build for accessibility, you're forced to remove complexity that wasn't earning its keep. The result is usually a better system.
Removing Structure as Models Get Smarter
As models improve, you should add less scaffolding, not more. Earlier agent frameworks were elaborate: explicit state machines, hardcoded step sequences, retry logic with specific fallback behaviors.
You had to do this because the model couldn't reliably do the work on its own.
With Claude 3.7 Sonnet, the calculus changed. Its hybrid reasoning capability—the ability to pause and think before responding—means you can hand it genuinely ambiguous multi-step tasks and trust that it'll navigate them. The model now handles reasoning that used to require explicit programming.
Ivan's concrete example: newer coding agents don't have a dedicated file-reading tool. They just write a `cat` command, or a Python script. The model figures out what it needs and writes those tools itself. That's not a limitation of the toolset; it's a sign of model capability.
The agent isn't constrained to the tools you enumerate; it can synthesize new approaches on the fly.
The implication for builders: invest in defining the problem space clearly (context + capabilities), not in scripting the solution path. The model will handle the path. Your job is to give it an accurate map.
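One way to read the `cat` example: instead of enumerating a tool per operation, expose a single execution capability and let the model compose the rest. A toy sketch; a real agent would run this inside a sandbox, which is omitted here.

```python
import subprocess

def run_shell(command, timeout=10):
    """A single general capability. The model can synthesize `cat`, `grep`,
    ad-hoc Python scripts, and so on, instead of needing one tool each.
    WARNING: no sandboxing here; never expose this unsandboxed in production."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    return result.stdout if result.returncode == 0 else result.stderr
```

With this one tool registered, "read a file" becomes the model emitting `run_shell("cat notes.txt")` on its own, with no file-reading tool ever defined.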
The Endgame: Software That Extends Itself
Where does this lead? Ivan's answer: self-extending software. Agents that don't just use tools; they write their own.
In the OpenClaw model, skills are just prompts. A new capability is a new instruction file. When the agent encounters something it can't do, it can author a skill to handle it. Hooks let the agent respond to lifecycle events. Meta-prompting—where one model writes the system prompt or tool descriptions for another—now works reliably enough that manually authoring tool schemas is increasingly unnecessary.
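If skills are just prompts, then adding a capability is just writing a file. A hypothetical sketch of that idea, not OpenClaw's actual file layout:

```python
from pathlib import Path

def add_skill(skills_dir, name, instructions):
    """A new capability is a new instruction file. The agent itself can
    call this when it encounters something it can't yet do."""
    path = Path(skills_dir) / f"{name}.md"
    path.write_text(instructions)
    return path

def load_skills(skills_dir):
    """Concatenate every skill file into the prompt at session start."""
    return "\n\n".join(p.read_text() for p in sorted(Path(skills_dir).glob("*.md")))
```

Nothing here requires code generation in the traditional sense: the "program" the agent extends is its own context.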
"The future of agentic applications is software that extends itself."
That's not a research hypothesis. Manus shipped in early 2025 as one of the first agents capable of operating autonomously across complex tasks. OpenClaw runs on consumer hardware today. The patterns Ivan described aren't waiting for better models—they're available now, to anyone willing to think carefully about context and capabilities.
What to Try This Week
If you want to start applying Ivan's framework:
Define context before you define tools. Write down exactly what your agent needs to know to make good decisions. Date, user preferences, task history—be explicit. Then ask yourself what's missing.
Audit your tool list. What can your agent do? Are the tool descriptions accurate and unambiguous? Could the model misinterpret a parameter name? Tool quality matters more than tool quantity.
Remove scaffolding incrementally. If you have explicit step sequences, try handing one to the model without the script. See if it handles it. Newer models often surprise you.
Build memory in layers. Raw logs for fidelity. Summaries for efficiency. A persistent preferences file for identity. Don't pick one—stack all three.
Start with simple loops. Ivan's educational code was intentionally minimal. A model, a loop, a few tools, a clear context. That's enough to build something real.
The gap between "I understand agents" and "I've built an agent" is smaller than it looks. You just need the right framework to close it.
Missed out on this session?
Don't worry, these conversations happen every Friday at SQ Collective.
Usually over laptops. Sometimes over pizza.
You're welcome to join the next one.