You built an AI agent. It runs on your laptop. It searches the web, processes documents, maybe manages your calendar. Then you close the lid, and it dies.
This is the gap that separates every demo from every product, and the distance between "runs in a terminal session" and "runs 24/7 in production" is wider than most developers realize.
I built three AI agent systems last year. Each one went through the same painful transition from "works on my machine" to "runs reliably at all hours." Here is what that transition actually involves.
The Five Problems Nobody Warns You About
1. State Persistence
A terminal-based agent holds state in memory. Conversation history, tool results, intermediate reasoning — all gone when the process stops. In production, you need to externalize all of this.
interface AgentSession {
  id: string;
  userId: string;
  messages: Message[];
  toolResults: ToolResult[];
  status: 'active' | 'paused' | 'completed' | 'failed';
  metadata: {
    totalTokens: number;
    totalToolCalls: number;
    startedAt: Date;
    lastActiveAt: Date;
  };
}

// Persist session state after every agent step
async function persistSession(session: AgentSession): Promise<void> {
  await db.query(
    `INSERT INTO agent_sessions (id, user_id, state, updated_at)
     VALUES ($1, $2, $3, now())
     ON CONFLICT (id) DO UPDATE SET state = $3, updated_at = now()`,
    [session.id, session.userId, JSON.stringify(session)]
  );
}
Every step of the agent loop must be checkpointed. If the process crashes between step 3 and step 4, you need to resume from step 3 without repeating steps 1 and 2. This is not free — serializing and deserializing agent state adds latency and complexity.
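The checkpoint-and-resume loop can be sketched as follows. This is a minimal illustration, not a specific framework's API: the `Checkpoint` shape, the in-memory `Map` standing in for the database table, and `runWithCheckpoints` are all assumed names for the pattern described above.

```typescript
// Illustrative checkpoint shape; a real one would mirror AgentSession.
interface Checkpoint {
  sessionId: string;
  stepIndex: number;     // index of the next step to run
  stepResults: string[]; // accumulated results so far
}

// In-memory stand-in for the agent_sessions table.
const store = new Map<string, Checkpoint>();

async function saveCheckpoint(cp: Checkpoint): Promise<void> {
  store.set(cp.sessionId, { ...cp, stepResults: [...cp.stepResults] });
}

async function loadCheckpoint(sessionId: string): Promise<Checkpoint> {
  return store.get(sessionId) ?? { sessionId, stepIndex: 0, stepResults: [] };
}

// Resume from the last completed step; earlier steps are never re-run.
async function runWithCheckpoints(
  sessionId: string,
  steps: Array<() => Promise<string>>
): Promise<string[]> {
  const cp = await loadCheckpoint(sessionId);
  for (let i = cp.stepIndex; i < steps.length; i++) {
    cp.stepResults.push(await steps[i]());
    cp.stepIndex = i + 1;
    await saveCheckpoint(cp); // persist after every step
  }
  return cp.stepResults;
}
```

If the process dies mid-run, the next invocation loads the checkpoint and picks up exactly where the previous one stopped.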
2. Long-Running Task Management
Some agent tasks take minutes or hours. A deep research task might involve 50+ tool calls across multiple sources. You cannot hold an HTTP connection open for 30 minutes.
The pattern is async job processing: the user submits a task, the server returns a job ID immediately, and the agent runs in the background. The user polls for progress or receives a webhook when the task completes.
// Submit a long-running agent task
app.post('/api/agent/tasks', async (req, res) => {
  const task = await createAgentTask({
    userId: req.user.id,
    prompt: req.body.prompt,
    tools: req.body.tools,
    maxDuration: 300000, // 5 minutes max
  });

  // Return immediately with task ID
  res.json({ taskId: task.id, status: 'queued' });

  // Process in background
  processAgentTask(task).catch((error) => {
    updateTaskStatus(task.id, 'failed', error.message);
  });
});

// Poll for status
app.get('/api/agent/tasks/:id', async (req, res) => {
  const task = await getAgentTask(req.params.id);
  res.json({
    status: task.status,
    progress: task.progress,
    result: task.status === 'completed' ? task.result : undefined,
  });
});
3. Error Recovery
In a demo, when something fails, you restart and try again. In production, errors are expected and must be handled gracefully. An agent that crashes on a tool failure, a rate limit, or a network timeout is not production-ready.
async function executeWithRecovery(
  agent: Agent,
  task: AgentTask,
  maxRetries = 3
): Promise<AgentResult> {
  let lastError: Error | null = null;

  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      // Resume from last checkpoint
      const checkpoint = await loadCheckpoint(task.id);
      const result = await agent.run(task, { resumeFrom: checkpoint });
      return result;
    } catch (error) {
      lastError = error as Error;
      if (isRateLimitError(error)) {
        // Wait and retry
        await sleep(getRetryDelay(attempt));
        continue;
      }
      if (isToolError(error)) {
        // Skip the failed tool and continue
        await markToolFailed(task.id, error);
        continue;
      }
      // Unrecoverable error — fail the task
      break;
    }
  }

  return {
    status: 'failed',
    error: lastError?.message ?? 'Unknown error',
    partialResult: await getPartialResult(task.id),
  };
}
The key insight: always return partial results. If the agent completed 7 out of 10 steps before failing, the user should see those 7 results rather than nothing.
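One possible shape for the `getPartialResult` helper referenced above, working from per-step records. The `StepRecord` and `PartialResult` types are assumptions for illustration, not a fixed schema:

```typescript
interface StepRecord {
  index: number;
  status: 'completed' | 'failed' | 'pending';
  output?: string;
}

interface PartialResult {
  completedSteps: number;
  totalSteps: number;
  outputs: string[];
}

// Assemble whatever finished before the failure, in step order,
// so the user sees 7 of 10 results instead of nothing.
function getPartialResult(steps: StepRecord[]): PartialResult {
  const completed = steps
    .filter((s) => s.status === 'completed')
    .sort((a, b) => a.index - b.index);
  return {
    completedSteps: completed.length,
    totalSteps: steps.length,
    outputs: completed.map((s) => s.output ?? ''),
  };
}
```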
4. Resource Management
A demo agent has unlimited tokens and unlimited time. A production agent needs budgets.
interface ResourceBudget {
  maxTokens: number;
  maxToolCalls: number;
  maxDurationMs: number;
  maxCostCents: number;
}

class BudgetedAgent {
  private consumed = { tokens: 0, toolCalls: 0, costCents: 0 };
  private startTime = Date.now();

  constructor(private budget: ResourceBudget) {}

  async step(): Promise<StepResult> {
    // Check every budget dimension before each step
    if (this.consumed.tokens >= this.budget.maxTokens) {
      return { status: 'budget_exceeded', reason: 'token_limit' };
    }
    if (this.consumed.toolCalls >= this.budget.maxToolCalls) {
      return { status: 'budget_exceeded', reason: 'tool_call_limit' };
    }
    if (Date.now() - this.startTime >= this.budget.maxDurationMs) {
      return { status: 'budget_exceeded', reason: 'time_limit' };
    }
    if (this.consumed.costCents >= this.budget.maxCostCents) {
      return { status: 'budget_exceeded', reason: 'cost_limit' };
    }

    const result = await this.executeStep();
    this.consumed.tokens += result.tokensUsed;
    this.consumed.toolCalls++;
    this.consumed.costCents += result.estimatedCost;
    return result;
  }
}
Without budgets, a runaway agent can drain your API credits in minutes. I learned this the hard way when a research agent got stuck in a loop, making 200+ API calls before I noticed. That cost $45 and accomplished nothing.
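Budgets cap total spend, but they do not catch the failure mode in that anecdote early: an agent repeating the same call over and over. A cheap complementary guard is to count repeated identical tool calls and block after a threshold. This is a sketch of the idea, not a technique from a particular library; the class name and threshold are my own:

```typescript
// Block a tool call once the same tool has been invoked with identical
// arguments too many times — a cheap guard against runaway loops.
class LoopGuard {
  private counts = new Map<string, number>();

  constructor(private maxRepeats = 3) {}

  // Returns false when the call should be blocked.
  allow(toolName: string, args: unknown): boolean {
    const key = `${toolName}:${JSON.stringify(args)}`;
    const n = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, n);
    return n <= this.maxRepeats;
  }
}
```

A blocked call can be surfaced to the model as a tool error ("you have already tried this"), which often nudges it out of the loop.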
5. Observability
When an agent runs in your terminal, you see everything — every thought, every tool call, every result. In production, you see nothing unless you build observability.
Log every agent decision point:
- What tool did it choose and why?
- What parameters did it pass?
- What result did it get?
- How long did each step take?
- How many tokens did each step consume?
This is not optional. When a user reports "the agent gave me a wrong answer," you need to trace the exact sequence of decisions that led there. Without observability, debugging production agent issues is impossible.
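One possible shape for a per-step trace record that answers the questions above, emitted as JSON lines so it can be shipped to any log backend. The field names are illustrative, not a standard schema:

```typescript
// One trace record per agent step.
interface AgentStepLog {
  taskId: string;
  stepIndex: number;
  tool: string;                          // what tool did it choose?
  reasoning: string;                     // why? (the model's stated rationale)
  parameters: Record<string, unknown>;   // what parameters did it pass?
  resultSummary: string;                 // what result did it get?
  durationMs: number;                    // how long did the step take?
  tokensUsed: number;                    // how many tokens did it consume?
  timestamp: string;
}

function logStep(entry: AgentStepLog): void {
  // Structured JSON lines: trivially greppable, trivially ingestable.
  console.log(JSON.stringify(entry));
}
```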
The Deployment Architecture
For my agent systems, I use a simple three-tier architecture:
- API layer (Cloudflare Workers): Accepts task submissions, serves status queries, handles authentication. Stateless, fast, globally distributed.
- Agent runtime (long-running compute): Runs the actual agent loops. This cannot run on Workers (CPU time limits). I use a simple Node.js process on Railway or Fly.io, connected to the same PostgreSQL database.
- Storage layer (PostgreSQL): Session state, checkpoints, results, and observability logs. Everything in one database.
The API layer communicates with the agent runtime through the database — task submissions go into a queue table, and the runtime polls for new tasks. This is simpler than a message queue for the volumes I handle, and it keeps the architecture to three components instead of four.
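The database-as-queue pattern needs one piece of care: two runtime workers must never claim the same task. Postgres's `FOR UPDATE SKIP LOCKED` is the standard way to get that. A sketch, with the table name and columns assumed, and the query function injected so the same logic works against a real `pg` pool or a test double:

```typescript
// Minimal query interface matching pg's Pool.query result shape.
type QueryFn = (sql: string, params?: unknown[]) => Promise<{ rows: any[] }>;

// Claim the oldest queued task and mark it running in one statement.
// SKIP LOCKED makes concurrent workers skip rows another worker holds,
// so each task is claimed exactly once.
const CLAIM_SQL = `
  UPDATE agent_tasks
  SET status = 'running', started_at = now()
  WHERE id = (
    SELECT id FROM agent_tasks
    WHERE status = 'queued'
    ORDER BY created_at
    FOR UPDATE SKIP LOCKED
    LIMIT 1
  )
  RETURNING id, payload`;

async function claimNextTask(
  query: QueryFn
): Promise<{ id: string; payload: unknown } | null> {
  const { rows } = await query(CLAIM_SQL);
  return rows[0] ?? null;
}
```

The runtime calls `claimNextTask` in its polling loop and sleeps briefly when it returns `null`.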
When "Always-On" Is Not Worth It
Not every agent needs to run 24/7. If the agent only serves users during business hours, a scheduled approach is cheaper and simpler. If the task is short enough (under 30 seconds), running the agent inline in the API request is fine — no background processing needed.
Always-on infrastructure adds cost and complexity. Use it when the task requires it — long-running research, continuous monitoring, scheduled workflows. For everything else, keep it simple.
The gap between a demo agent and a production agent is not intelligence. It is infrastructure. State persistence, error recovery, resource budgets, and observability — these are not AI problems. They are software engineering problems. And they are the problems that determine whether your agent is a toy or a product.