Pi is the agent framework that powers OpenClaw. If you've used OpenClaw, you know the experience. You message it on WhatsApp, ask it to do something, and it handles it for you. Book a reservation, look something up, draft a message, whatever. It's a personal assistant, and a very good one. The architecture underneath is beautiful in its simplicity: four tools, a while loop calling an LLM, and an extension system that lets the agent teach itself new tricks. Armin Ronacher wrote about it recently and captured what makes it special better than I could.
But here's where our story diverges.
At Vibework, we're not trying to build a better personal assistant. We're trying to build AI employees. Agents that don't need you sitting there. Agents that run on their own for hours, handle your email while you're in a meeting, build a presentation from a brief overnight, juggle five tasks at once the way a human employee would. When we tried to make that leap, from the "you're there guiding it" model to the "it just goes and does the work" model, we realized pretty quickly that we needed a new foundation.
We needed a new Pi. So we built one.
PiCrust (Pi + Rust, like piecrust, get it?) is that foundation. It's open source.
1. What does an AI employee actually need?
There's a useful distinction between a coding assistant and an AI employee. A coding assistant (think Claude Code, Cursor, or Pi) sits in your terminal, pair-programs with you, and waits for instructions. An AI employee does work on its own. It handles your email while you're in a meeting. It builds a presentation from a brief. It runs a research workflow overnight and has results ready by morning. It juggles multiple tasks concurrently, the way a human employee would.
Most agent frameworks are built for the assistant model. And that makes sense: it's where the market is right now. But if you actually think about what the employee model requires, the architecture looks very different.
Long-running, crash-safe sessions. An employee doesn't lose its memory when the process restarts. It needs append-only persistence that can survive power failures, OOM kills, and deployment restarts without corrupting state.
Parallel task execution. A human employee doesn't finish one task before starting another. They context-switch. An AI employee needs the same, multiple agent loops running concurrently on a single runtime, each with isolated state but sharing compute resources.
Full system access. An employee needs hands. Filesystem access. Desktop app control. Browser automation. Shell execution. The ability to reach out and actually touch things, not just generate text in a sandbox.
Structured permission controls. With great access comes great responsibility. You need hooks, permission layers, and safety guardrails that can intercept dangerous operations before they execute, without crippling the agent's autonomy.
Cross-platform native binaries. If your AI employee runs on each user's machine (or in a dedicated VM per user), you can't afford to ship a Python runtime or Node.js installation with it. You need a single binary that runs anywhere.
Pi nails the first principle: simple loop, minimal core. PiCrust takes that and builds the infrastructure for the other four.
2. What Pi gets right (and where we diverge)
Pi's core insight is that agents don't need complex orchestration. No graphs, no DAGs, no state machines. Just a loop: call the LLM, execute whatever tools it asks for, feed results back, repeat. Claude Code proves this works at scale. LangGraph, with its Pregel-inspired graph topology and node-edge abstractions, is solving a problem most agent workflows don't actually have.
I think of an agent harness the way I think of an operating system. Provide the capability to execute tools, access information, manage sessions, along with the basic guardrails for human-in-the-loop control. Then get out of the way. Pi embodies this beautifully. Four tools. Shortest system prompt of any agent. The intelligence lives in the model, not the scaffolding.
PiCrust takes this same philosophy and makes a few deliberate additions.
Pi has four tools. PiCrust ships with nine: Bash, Read, Write, Edit, Glob, Grep, Todo, PresentFile, and AskUserQuestion, plus a Tool trait that makes adding more a five-minute job. Pi's approach of "four tools and extend yourself" is elegant for a coding agent where the human is in the loop. For an AI employee running autonomously across desktop applications, you need a richer base toolkit from day one.
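Here's roughly what that five-minute job looks like. The exact Tool trait signature below is an assumption rather than the crate's confirmed API, but the shape is the point: a name, a description, a JSON schema for the input, and an execute method.

// Sketch of a custom tool. The Tool trait's exact signature is an
// assumption, not the crate's confirmed API.
struct WordCount;

#[async_trait::async_trait]
impl Tool for WordCount {
    fn name(&self) -> &str { "WordCount" }

    fn description(&self) -> &str {
        "Count the words in a UTF-8 text file"
    }

    fn input_schema(&self) -> serde_json::Value {
        serde_json::json!({
            "type": "object",
            "properties": { "path": { "type": "string" } },
            "required": ["path"]
        })
    }

    async fn execute(&self, input: serde_json::Value) -> anyhow::Result<String> {
        let path = input["path"].as_str().unwrap_or_default();
        let text = tokio::fs::read_to_string(path).await?;
        Ok(format!("{} words", text.split_whitespace().count()))
    }
}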
Pi's extensions are powerful but dynamic. The agent can write code, hot-reload, test in a loop. That's incredible for the self-extending philosophy Pi celebrates. PiCrust takes a different approach: extensions are compiled Rust traits. You lose the hot-reload magic, but you gain compile-time guarantees that your tool definitions, hook implementations, and injection logic are correct before the agent ever runs. For a production agent that handles your files and controls your apps unsupervised, we'll take the compile-time safety every time.
Pi doesn't do MCP, on purpose. Ronacher explains this well: Pi's philosophy is that agents should extend themselves rather than download external tools. Fair enough. PiCrust includes MCP support out of the box via ToolProvider, because an AI employee needs to integrate with whatever tools the user already has. Different use case, different tradeoff.
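As a rough sketch of what that wiring could look like (McpToolProvider and with_tool_provider are illustrative names here, not confirmed API):

// Hypothetical wiring; the constructor and builder method names
// are illustrative, not confirmed API.
let mcp = McpToolProvider::connect("http://localhost:3000/mcp").await?;

// Tools discovered from the server sit alongside the built-in nine,
// addressable by hooks via the "^mcp__" name prefix mentioned below.
let config = AgentConfig::new("You are a helpful assistant")
    .with_tool_provider(mcp);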
Pi runs one agent per process. PiCrust runs many agents on a single async runtime. This is the biggest architectural divergence and honestly the one that matters most. A coding assistant is one conversation. An AI employee is many concurrent tasks. You need the runtime to match.
3. Why Rust is actually the right language for agents
Here's the counterintuitive part. Rust makes agent code easier for both humans and LLMs to understand.
Pi is written in TypeScript, excellent TypeScript by all accounts. But TypeScript (and Python, where most agent frameworks live) carries baggage that matters when LLMs need to read and reason about your agent's code. Inheritance hierarchies. Dynamic typing. Runtime magic. Metaclasses. Properties that compute values on access. An LLM reading a Python agent framework has to hold mental models of class hierarchies, runtime type resolution, and implicit behaviors that aren't visible in the source.
Rust has none of that. No inheritance, just traits and composition. What you see is what you get. Every function signature tells you exactly what goes in and what comes out. No runtime magic, no dynamic dispatch unless you explicitly ask for it with dyn.
When Claude reads PiCrust's source to understand how to build on it, there's zero ambiguity. The types tell the whole story:
pub struct AgentSession {
    pub metadata: SessionMetadata,
    pub messages: Vec<Message>,
    storage: SessionStorage,
}
That's the complete picture. No hidden fields, no inherited state, no dynamically attached attributes. Compare that to a Python session class that might inherit from multiple bases, use __dict__ for dynamic attributes, and have properties that compute values on access. Good luck explaining that to an LLM in a single context window.
But the practical benefits go beyond readability. Rust compiles to a native binary. No runtime. No dependencies beyond the OS. Compile for Mac, compile for Windows, same codebase, both work. No Node.js overhead (sorry, Pi), no Python version conflicts, no dependency hell.
# Compile for Mac
cargo build --release --target aarch64-apple-darwin
# Compile for Windows
cargo build --release --target x86_64-pc-windows-msvc
# Same code. Different targets. Both work.
For an AI employee that ships to user machines or runs in per-user VMs, a single native binary with zero runtime dependencies isn't a nice-to-have. It's table stakes.
4. The loop
At the heart of PiCrust is StandardAgent, a default implementation of the agent loop that handles 95% of real-world workflows. Like Pi, it's just a while loop. Here's the core (simplified from src/agent/standard_loop.rs):
async fn process_turn(&self, internals: &mut AgentInternals, user_input: &str) -> Result<()> {
    internals.session.write().await.add_message(Message::user(user_input))?;

    let mut iterations = 0;
    loop {
        iterations += 1;
        if iterations > self.config.max_tool_iterations {
            internals.send_status("Max tool iterations reached");
            break;
        }

        // Snapshot tools, system prompt, and message history for this call.
        let (tools, system, messages) = self.prepare_llm_input(internals);

        let (content_blocks, _stop_reason) = if self.config.streaming_enabled {
            self.call_llm_streaming(internals, messages, tools, system).await?
        } else {
            self.call_llm_non_streaming(internals, messages, tools, system).await?
        };

        let tool_results = self.execute_tools(internals, &content_blocks).await;

        internals.session.write().await.add_message(
            Message::assistant_with_blocks(content_blocks)
        )?;

        // No tool calls means the model is done with this turn.
        if tool_results.is_empty() {
            break;
        }

        internals.session.write().await.add_message(
            Message::user_with_blocks(tool_results)
        )?;
    }
    Ok(())
}
No hidden layers. No graph abstractions. If you squint, this is the same loop Pi runs: call the LLM, execute tools, feed results back, repeat until done. The difference is everything surrounding it.
Where Pi lets you modify behavior through extensions that hot-reload, PiCrust gives you two mechanisms.
Hooks intercept the loop at five points (PreToolUse, PostToolUse, PostToolUseFailure, UserPromptSubmit, PostAssistantResponse) without touching the loop code. Each hook receives a mutable HookContext with full access to agent internals and can allow, deny, or modify operations. Pattern-based matching lets you target specific tools (regex like "Bash|Shell" or "^mcp__").
struct CommandSafetyHook;

impl Hook for CommandSafetyHook {
    fn hook_type(&self) -> HookType { HookType::PreToolUse }

    fn run(&self, ctx: &mut HookContext) -> HookResult {
        if ctx.tool_name.as_deref() == Some("Bash") {
            let cmd = ctx.tool_input.as_ref()
                .and_then(|v| v.get("command"))
                .and_then(|v| v.as_str());
            if cmd.map_or(false, |c| c.contains("rm -rf")) {
                return HookResult::deny("Blocked dangerous command");
            }
        }
        HookResult::allow()
    }
}
Context injection solves a subtle problem: you want to give the LLM dynamic information (time, working directory, recent state) but you don't want stale timestamps polluting session history. PiCrust injects context before each LLM call, after applying cache control. Fresh every time, never persisted.
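A sketch of what an injector might look like. The ContextInjector trait name and signature here are assumptions, but the property it illustrates is the real one: injected text rides along with the outgoing request and never lands in the persisted history.

use std::time::{SystemTime, UNIX_EPOCH};

// Hypothetical injector; the trait name and signature are assumptions,
// not the crate's confirmed API. Injected text is added to the outgoing
// LLM request only, never written to the session file.
struct ClockInjector;

impl ContextInjector for ClockInjector {
    fn inject(&self) -> String {
        let secs = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_secs())
            .unwrap_or(0);
        format!("Current unix time: {}", secs)
    }
}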
And if hooks and injection aren't enough? The standard loop is ~900 lines. Fork it. That's the 5% escape hatch, and it's deliberate.
5. Parallel agents, single runtime
This is where PiCrust diverges most sharply from Pi, and honestly it's the feature that matters most for the AI employee model.
Pi is one agent, one process. Excellent for pair-programming. But an AI employee juggles tasks. Handling email while preparing a report while running a background research workflow. You need multiple agent loops running concurrently, each with isolated sessions but sharing compute resources efficiently.
PiCrust's AgentRuntime handles this natively. Each agent is a Tokio task, sharing the same event loop but with fully isolated state:
let runtime = AgentRuntime::new();

let handle1 = runtime.spawn(session1, |internals| {
    StandardAgent::new(config1, llm1).run(internals)
}).await;

let handle2 = runtime.spawn(session2, |internals| {
    StandardAgent::new(config2, llm2).run(internals)
}).await;
Each AgentHandle gives you an external interface: send input, subscribe to output streams via broadcast channels, manage permissions, gracefully interrupt. The runtime also supports spawning subagents that link back to a parent session. Delegate a focused subtask, get results back, continue the main workflow.
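A delegation round-trip might look something like this sketch; spawn_subagent and wait_for_result are illustrative names, not confirmed API.

// Hypothetical subagent delegation; method names are illustrative.
let sub = runtime
    .spawn_subagent(&handle1, "Summarize today's unread email")
    .await;

// Parent/child lineage lives in session metadata, so the result flows
// back into the main workflow's history when the subagent finishes.
let summary = sub.wait_for_result().await?;
handle1.send_input(&format!("Email summary ready: {}", summary)).await?;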
In development, run five test agents in one process. In production, run agents per user or per task. The same code works in both modes.
6. The VM-per-agent future
There's a crucial architectural distinction that most framework discussions miss entirely.
Backend agent frameworks like LangGraph and AI SDK assume a shared server model. One server, many users, careful state isolation. This makes sense for chatbots and API-driven assistants where agents live behind HTTP endpoints.
AI employees don't work like that. They need full system access: filesystem, desktop apps, browser, shell. You can't safely give multiple users' agents access to the same machine. The emerging deployment pattern is VM-per-agent: each user or company gets their own isolated environment. The agent runs continuously in that VM with full access to everything, the way an employee has full access to their workstation.
PiCrust is designed for exactly this topology. On a user's desktop, it runs locally with native performance. In production, it runs one agent per container or VM with full isolation. The lightweight binary (no runtime, no interpreter, no garbage-collector pauses) makes this economical at scale. You're not paying for Node.js or Python overhead in every single VM.
Sessions are append-only JSONL files: one line per message, crash-safe. Restart and pick up where you left off. Each session lives at sessions/{session_id}/history.jsonl with a separate metadata.json for identity, timestamps, and parent/child lineage. No database. No schema migrations. Just files.
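The mechanics are simple enough to sketch in full. This standalone example is not PiCrust's actual code, but it shows why the format is crash-safe: each message is one flushed line, so a crash mid-write loses at most that line, and recovery is just replaying the file.

use std::fs::OpenOptions;
use std::io::{BufRead, BufReader, Write};

// Standalone illustration of append-only JSONL sessions, not PiCrust's
// actual implementation.
fn append_message(path: &str, role: &str, content: &str) -> std::io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    let line = serde_json::json!({ "role": role, "content": content });
    // One complete message per line; a partial write can only affect the
    // final line, never earlier history.
    writeln!(file, "{}", line)?;
    file.sync_all() // flush to disk before acknowledging the message
}

fn replay(path: &str) -> std::io::Result<Vec<serde_json::Value>> {
    let reader = BufReader::new(std::fs::File::open(path)?);
    // Crash recovery: parse every complete line, skip a torn final line.
    Ok(reader
        .lines()
        .filter_map(|l| l.ok())
        .filter_map(|l| serde_json::from_str(&l).ok())
        .collect())
}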
7. PiCrust vs. everything else
vs. Pi: PiCrust is Pi's philosophy rewritten in Rust with parallel runtimes, compiled extensions, MCP support, and structured hook/injection systems. Pi is better if you want self-extending agents and hot-reload magic. PiCrust is better if you want type-safe composition, native binaries, and multiple agents on one runtime.
vs. LangGraph: LangGraph makes you think in nodes and edges. PiCrust makes you think in turns and messages. For most agent workflows (chat, LLM, tools, repeat), the loop model is simpler, and that's kind of the point. LangGraph is the right choice if you genuinely have complex branching workflows. Most people don't.
vs. AI SDK: Vercel's AI SDK is focused on streaming and UI integration in the JavaScript ecosystem. PiCrust is focused on long-running, stateful agents that survive crashes and run for hours. Completely different problem spaces.
8. Getting started
PiCrust isn't on crates.io yet. For now:
git clone https://github.com/HourSense/shadow-agent-framework
cd shadow-agent-framework
Minimal working agent:
use std::sync::Arc;

use shadow_agent_framework::{
    agent::{AgentConfig, StandardAgent},
    runtime::AgentRuntime,
    llm::AnthropicProvider,
};
// AgentSession and OutputChunk are also used below; their exact module
// paths depend on the crate layout.

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let llm = Arc::new(AnthropicProvider::new(
        std::env::var("ANTHROPIC_API_KEY")?,
        "claude-sonnet-4-5-20250929",
    ));

    // my_tools: whatever Tool implementations you want the agent to have
    let config = AgentConfig::new("You are a helpful assistant")
        .with_tools(my_tools)
        .with_streaming(true)
        .with_auto_save(true);

    let agent = StandardAgent::new(config, llm);
    let runtime = AgentRuntime::new();

    let session = AgentSession::new("my-agent", "assistant", "My Agent", "A helpful agent")?;
    let handle = runtime.spawn(session, |internals| {
        agent.run(internals)
    }).await;

    handle.send_input("Hello!").await?;

    // Subscribe once, then drain the broadcast channel; recv() returns
    // a Result, not an Option.
    let mut output = handle.subscribe_output();
    while let Ok(chunk) = output.recv().await {
        match chunk {
            OutputChunk::TextDelta(text) => print!("{}", text),
            OutputChunk::Done => break,
            _ => {}
        }
    }

    Ok(())
}
Read the standard loop at src/agent/standard_loop.rs. It's ~900 lines and it's meant to be read. If it works for you, great. If it doesn't, fork it. That's the point.
What's coming
More hooks. Currently five events. Adding: PreLLMCall, PostLLMCall, SessionStart, SessionEnd, ToolError. Hooks at every boundary in the loop.
RAG on conversation history. Semantic search over past conversations, vectors in SQLite, relevant context injected automatically. No external vector DB.
First-class subagents. Isolated context, timeout, result aggregation. Delegate to a specialist, continue when done.
HTTP server. Optional REST API, send messages, query state, manage sessions. Drop-in for web UIs.
All optional. The loop stays simple.
TLDR
Pi proved that the best agent architecture is the simplest one: a while loop with tools. But the next wave of AI isn't coding assistants; it's AI employees: autonomous agents with full system access, running for hours, juggling concurrent tasks, surviving crashes. That model needs things Pi wasn't designed for. Parallel runtimes. Native binaries. Structured permissions. Compiled extensibility.
PiCrust takes Pi's insight and rebuilds it in Rust for the employee model. Type-safe composition that LLMs can actually read. Native binaries that run anywhere. Multiple agents on a single async runtime. Hooks and injection for customization without forking the loop. And append-only sessions that survive anything.
PiCrust is open source. Clone it. Read the loop. Rearrange the pieces. That's the whole idea.
