Running Code in Sandboxes — Agentuity Documentation

Run code in isolated, secure containers with configurable resources

Execute code in isolated Linux containers with configurable resource limits, network controls, and execution timeouts.

Why Sandboxes?

Agents that reason about code need somewhere safe to execute it. Whether generating Python scripts, validating builds, or running user-provided code, you can't let arbitrary execution happen on your infrastructure.

The pattern keeps repeating: spin up a secure environment, run code, tear it down. Without proper isolation, a single bad script could access sensitive data, exhaust resources, or compromise your systems.

Agentuity sandboxes handle this automatically. Every execution runs in an isolated container with its own filesystem, configurable resource limits, and network controls. When execution completes, the environment is destroyed.

What this gives you:

  • Security by default: Network disabled, filesystem isolated, resource limits enforced
  • No infrastructure management: Containers spin up and tear down automatically
  • Multi-language support: Run Python, Node.js, shell scripts, and more
  • Consistent environments: Use snapshots to get the same setup every time, with dependencies pre-installed

Three Ways to Use Sandboxes

Method    Best For
Web App   Visual management, browsing runtimes and snapshots
SDK       Programmatic use in agents and routes (ctx.sandbox)
CLI       Local development, scripting, CI/CD

Key Concepts

Concept    Description
Runtime    A pre-configured base environment (OS + language tools) provided by Agentuity
Sandbox    A running container created from a runtime where you execute commands
Snapshot   A saved sandbox state that can be used to create new sandboxes

These build on each other: Runtime → Sandbox → Snapshot. A typical workflow uses all three:

  1. Pick a runtime (e.g., bun:1 or node:latest)
  2. Create a sandbox from that runtime
  3. Optionally save a snapshot to reuse your configured environment

Runtimes

Runtimes are pre-configured base environments that Agentuity provides. Each includes an operating system, language toolchain, and common utilities.

Language Runtimes

Use these for general code execution:

Runtime         Description
base:latest     Minimal Debian runtime with essential tools (default)
bun:1           Bun 1.x with JavaScript/TypeScript support
node:latest     Node.js latest version
node:lts        Node.js LTS version
python:3.13     Python 3.13
python:3.14     Python 3.14
golang:latest   Go latest version

Agent Runtimes

Pre-configured AI coding assistants:

Runtime              Description
claude-code:latest   Claude Code AI assistant
amp:latest           Amp AI coding assistant
codex:latest         OpenAI Codex CLI agent
gemini-cli:latest    Google Gemini CLI agent
opencode:latest      OpenCode AI coding assistant
agentuity:latest     Agentuity CLI for building and running AI agents

Testing Runtimes

Pre-configured testing runtimes:

Runtime                Description
agent-browser:latest   Headless browser automation CLI for AI agents
playwright:v1          Playwright browser automation runtime (Chrome, Firefox, WebKit)

Runtime Metadata

Each runtime includes metadata for identification and resource planning:

Field              Description
description        What the runtime provides
icon               URL to runtime icon
brandColor         Hex color for UI display
documentationUrl   Link to runtime documentation
tags               Categories like language, ai-agent
requirements       Minimum memory, CPU, disk, and network needs

View runtime details with agentuity cloud sandbox runtime list --json.
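
The metadata fields above could be modeled roughly as follows. This is a minimal sketch: the field names come from the table, but the TypeScript types, the `name` field, the layout of `requirements`, and the `runtimesWithTag` helper are assumptions made for illustration, not the SDK's actual types.

```typescript
// Hypothetical shape for runtime metadata. Field names follow the table above;
// the types and the layout of `requirements` are assumptions.
interface RuntimeRequirements {
  memory?: string;   // e.g. '512Mi' (assumed format)
  cpu?: string;      // e.g. '500m' (assumed format)
  disk?: string;     // e.g. '1Gi' (assumed format)
  network?: boolean; // whether outbound network is needed (assumed)
}

interface RuntimeMetadata {
  name: string; // runtime identifier such as 'python:3.14' (assumed field)
  description: string;
  icon?: string;
  brandColor?: string;
  documentationUrl?: string;
  tags: string[];
  requirements?: RuntimeRequirements;
}

// Return the runtimes that carry a given tag, e.g. 'language' or 'ai-agent'.
function runtimesWithTag(runtimes: RuntimeMetadata[], tag: string): RuntimeMetadata[] {
  return runtimes.filter((runtime) => runtime.tags.includes(tag));
}

// Illustrative data only; this is not real CLI output.
const sampleRuntimes: RuntimeMetadata[] = [
  { name: 'python:3.14', description: 'Python 3.14', tags: ['language'] },
  { name: 'claude-code:latest', description: 'Claude Code AI assistant', tags: ['ai-agent'] },
];

const languageRuntimes = runtimesWithTag(sampleRuntimes, 'language').map((r) => r.name);
```

A filter like this could be applied to the parsed JSON from the CLI command above, provided the real output matches the assumed shape.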

Snapshots

A snapshot captures the filesystem state of a sandbox. You create new sandboxes from a snapshot rather than running it directly.

Snapshots build on top of runtimes. When you create a snapshot, it includes everything from the base runtime plus your installed dependencies and files.

Workflow:

  1. Create a sandbox from a runtime
  2. Install dependencies and configure the environment
  3. Save a snapshot
  4. Create new sandboxes from that snapshot (fast, no reinstallation needed)

See Creating and Using Snapshots for details.

Two Execution Modes

Choose based on your use case:

One-shot (sandbox.run())

Creates a sandbox, runs a single command, then destroys the sandbox. Best for stateless code execution.

import { createAgent } from '@agentuity/runtime';
 
const agent = createAgent('CodeRunner', {
  handler: async (ctx, input) => {
    const result = await ctx.sandbox.run({
      command: { exec: ['python3', '-c', 'print("Hello!")'] },
      resources: { memory: '256Mi', cpu: '500m' },
    });
 
    ctx.logger.info('Output', { stdout: result.stdout, exitCode: result.exitCode });
    return { output: result.stdout, exitCode: result.exitCode };
  },
});

Interactive (sandbox.create())

Creates a persistent sandbox for multiple commands. Best for stateful workflows like dependency installation.

import { createAgent } from '@agentuity/runtime';
 
const agent = createAgent('ProjectBuilder', {
  handler: async (ctx, input) => {
    const sandbox = await ctx.sandbox.create({
      resources: { memory: '1Gi' },
      network: { enabled: true },  // Required for package installation
    });
 
    try {
      await sandbox.execute({ command: ['npm', 'install'] });
      await sandbox.execute({ command: ['npm', 'run', 'build'] });
      return { success: true };
    } finally {
      await sandbox.destroy();
    }
  },
});

Background Jobs

Jobs let you run long-running commands in a sandbox without blocking. Unlike regular execution, jobs:

  • Run in parallel: Multiple jobs can execute simultaneously
  • Don't block: Control returns immediately after creation
  • Persist: Jobs continue even after the creating request completes
  • Capture output: Stdout/stderr are captured to streams for later retrieval

Creating Jobs

import { createAgent } from '@agentuity/runtime';
 
const agent = createAgent('BuildRunner', {
  handler: async (ctx, input) => {
    const sandbox = await ctx.sandbox.create({
      resources: { memory: '2Gi' },
      network: { enabled: true },
    });
 
    // Create a background job
    const job = await sandbox.jobCreate({
      command: ['npm', 'run', 'build'],
    });
 
    ctx.logger.info('Build started', { jobId: job.jobId });
 
    // Check status (right after creation the job may still be pending or running)
    const status = await sandbox.jobGet({ jobId: job.jobId });
    if (status.status === 'completed') {
      ctx.logger.info('Build succeeded', { exitCode: status.exitCode });
    }
 
    return { jobId: job.jobId };
  },
});
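
Because jobs do not block, several can be started side by side. The sketch below assumes `jobCreate` accepts `{ command }` and resolves to an object with a `jobId`, as in the example above. The `JobCreator` interface and the `startJobs` helper are hypothetical names introduced here.

```typescript
// Minimal structural type for the one sandbox method this helper needs.
// Hypothetical: modeled on the jobCreate call shown above.
interface JobCreator {
  jobCreate(params: { command: string[] }): Promise<{ jobId: string }>;
}

// Start one background job per command and return the job IDs
// in the same order as the commands.
async function startJobs(sandbox: JobCreator, commands: string[][]): Promise<string[]> {
  const jobs = await Promise.all(
    commands.map((command) => sandbox.jobCreate({ command })),
  );
  return jobs.map((job) => job.jobId);
}
```

Inside a handler this might be called as `startJobs(sandbox, [['npm', 'run', 'build'], ['npm', 'test']])`, assuming the object returned by `ctx.sandbox.create()` satisfies `JobCreator`.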

Job Lifecycle

Status      Description
pending     Job created, waiting to start
running     Job actively executing
completed   Finished with exit code 0
failed      Finished with non-zero exit code
cancelled   Terminated by user request
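
The lifecycle table suggests a simple polling loop: `completed`, `failed`, and `cancelled` are terminal, while `pending` and `running` mean the job is still in flight. The following is a sketch under the assumption that `jobGet` resolves to an object with `status` and `exitCode`, as in the earlier example. `JobReader`, `waitForJob`, and the interval and attempt defaults are hypothetical.

```typescript
type JobStatus = 'pending' | 'running' | 'completed' | 'failed' | 'cancelled';

interface JobInfo {
  status: JobStatus;
  exitCode?: number;
}

// Minimal structural type for the one sandbox method this helper needs (assumed shape).
interface JobReader {
  jobGet(params: { jobId: string }): Promise<JobInfo>;
}

const TERMINAL_STATUSES: JobStatus[] = ['completed', 'failed', 'cancelled'];

// A job is finished once it reaches any terminal status from the table above.
function isTerminal(status: JobStatus): boolean {
  return TERMINAL_STATUSES.includes(status);
}

// Poll until the job reaches a terminal status, or give up after maxAttempts checks.
async function waitForJob(
  sandbox: JobReader,
  jobId: string,
  intervalMs = 1000,
  maxAttempts = 60,
): Promise<JobInfo> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const info = await sandbox.jobGet({ jobId });
    if (isTerminal(info.status)) return info;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Job ${jobId} did not finish after ${maxAttempts} checks`);
}
```

Bounding the number of attempts keeps a stuck job from holding a handler open indefinitely; the caller decides what to do when the error is thrown.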

Stopping Jobs

// Graceful stop (SIGTERM, then SIGKILL after grace period)
await sandbox.jobStop({ jobId: job.jobId });
 
// Force kill immediately
await sandbox.jobStop({ jobId: job.jobId, force: true });

Use Cases

Use Case             Example
Build processes      Run npm run build in background
Long-running tests   Execute test suites without blocking
Data processing      Process large files asynchronously
Service daemons      Run background services in sandbox

SDK Access

Context   Access
Agents    ctx.sandbox
Routes    c.var.sandbox

The API is identical in both contexts.

Configuration Options

Option              Description                                    Example
runtime             Runtime environment                            'bun:1', 'python:3.14'
resources.memory    Memory limit (Kubernetes-style)                '512Mi', '1Gi'
resources.cpu       CPU limit in millicores                        '500m', '1000m'
resources.disk      Disk space limit                               '1Gi'
network.enabled     Allow outbound network                         true (default: false)
network.port        Port to expose to the internet (1024-65535)    3000
projectId           Associate sandbox with a project               'proj_abc123'
timeout.idle        Idle timeout before cleanup                    '10m', '1h'
timeout.execution   Max execution time per command                 '5m', '30s'
dependencies        Apt packages to install                        ['python3', 'git']
env                 Environment variables                          { NODE_ENV: 'test' }
snapshot            Create from existing snapshot                  'my-env' or 'snp_abc123'
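
Note that the `m` suffix means millicores in `resources.cpu` but minutes in the timeout options. The helpers below sketch how these strings could be interpreted and checked before a sandbox is created. They are hypothetical and not part of the SDK, which may do its own validation; they assume binary memory suffixes (Ki, Mi, Gi) and single-unit durations only.

```typescript
// Convert a Kubernetes-style memory quantity such as '512Mi' or '1Gi' to bytes.
// Only the Ki/Mi/Gi suffixes are handled in this sketch.
function memoryToBytes(quantity: string): number {
  const match = /^(\d+)(Ki|Mi|Gi)$/.exec(quantity);
  if (!match) throw new Error(`Unsupported memory quantity: ${quantity}`);
  const factors: Record<string, number> = { Ki: 1024, Mi: 1024 ** 2, Gi: 1024 ** 3 };
  return Number(match[1]) * factors[match[2]];
}

// Convert a CPU limit in millicores such as '500m' to a number of cores.
function cpuToCores(quantity: string): number {
  const match = /^(\d+)m$/.exec(quantity);
  if (!match) throw new Error(`Unsupported CPU quantity: ${quantity}`);
  return Number(match[1]) / 1000;
}

// Convert a timeout such as '30s', '10m', or '1h' to seconds.
function timeoutToSeconds(duration: string): number {
  const match = /^(\d+)(s|m|h)$/.exec(duration);
  if (!match) throw new Error(`Unsupported duration: ${duration}`);
  const factors: Record<string, number> = { s: 1, m: 60, h: 3600 };
  return Number(match[1]) * factors[match[2]];
}

// The table above lists 1024-65535 as the allowed range for network.port.
function isValidSandboxPort(port: number): boolean {
  return Number.isInteger(port) && port >= 1024 && port <= 65535;
}
```

For example, `memoryToBytes('512Mi')` yields 536870912, `cpuToCores('500m')` yields 0.5, and `timeoutToSeconds('10m')` yields 600.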

Sandbox Events

Every sandbox records lifecycle events as it transitions through states. Use sandboxEventList to retrieve these events for auditing or debugging.

import { sandboxEventList } from '@agentuity/server';
 
// `client` is assumed to be an API client created earlier
const { events } = await sandboxEventList(client, {
  sandboxId: 'sbx_abc123',
  limit: 50,         // optional, default 50
  direction: 'asc',  // optional: 'asc' (oldest first, default) or 'desc'
});

Each event includes:

Field       Description
eventId     Unique identifier for the event
sandboxId   ID of the sandbox
type        Event type (e.g., create, destroy, lifecycle:started)
event       Arbitrary payload data for the event
createdAt   ISO timestamp when the event was recorded

From the CLI, use agentuity cloud sandbox events <sandbox-id> to list events. See CLI Commands for options.
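
Once events are retrieved, they can be summarized for debugging. The sketch below uses the field names from the table; the `SandboxEvent` interface, both helper functions, and the assumption that `create` and `destroy` events bracket a sandbox's lifetime are illustrative rather than confirmed SDK behavior.

```typescript
// Assumed shape, based on the event fields listed above.
interface SandboxEvent {
  eventId: string;
  sandboxId: string;
  type: string;
  event: unknown;
  createdAt: string; // ISO timestamp
}

// Count how many events of each type were recorded.
function countEventsByType(events: SandboxEvent[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const e of events) {
    counts[e.type] = (counts[e.type] ?? 0) + 1;
  }
  return counts;
}

// Milliseconds between the first 'create' and the first 'destroy' event,
// or undefined if either one is missing.
function lifetimeMs(events: SandboxEvent[]): number | undefined {
  const created = events.find((e) => e.type === 'create');
  const destroyed = events.find((e) => e.type === 'destroy');
  if (!created || !destroyed) return undefined;
  return Date.parse(destroyed.createdAt) - Date.parse(created.createdAt);
}

// Illustrative data only; not real API output.
const sampleEvents: SandboxEvent[] = [
  { eventId: 'evt_1', sandboxId: 'sbx_abc123', type: 'create', event: {}, createdAt: '2025-01-01T00:00:00.000Z' },
  { eventId: 'evt_2', sandboxId: 'sbx_abc123', type: 'lifecycle:started', event: {}, createdAt: '2025-01-01T00:00:02.000Z' },
  { eventId: 'evt_3', sandboxId: 'sbx_abc123', type: 'destroy', event: {}, createdAt: '2025-01-01T00:05:00.000Z' },
];

const eventCounts = countEventsByType(sampleEvents);
const sandboxLifetimeMs = lifetimeMs(sampleEvents);
```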

Auto-Resume on Execute

When you call sandbox.execute() on a suspended sandbox, the sandbox automatically resumes before running the command. The response includes autoResumed: true so you can detect and log this behavior.

const execution = await sandbox.execute({ command: ['bun', 'run', 'test'] });
 
if (execution.autoResumed) {
  ctx.logger.info('Sandbox was resumed automatically before execution');
}

When to Use Sandboxes

Use Case                Example
Code execution agents   Run user-provided Python/JavaScript safely
Code validation         Verify generated code compiles and runs
AI coding assistants    Execute code suggested by LLMs
Automated testing       Run tests in clean environments
Build systems           Compile projects in isolated containers

Security

Sandboxes provide isolation through:

  • Network disabled by default: Enable explicitly when needed
  • Resource limits: Prevent resource exhaustion
  • Execution timeouts: Prevent runaway processes
  • Filesystem isolation: Each sandbox has its own workspace
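
As an illustration of combining these controls, the sketch below builds a conservative option set for running untrusted code. The property names follow the Configuration Options table, but the `SandboxOptionsSketch` interface, the `untrustedCodeOptions` helper, and the specific limits chosen are assumptions, not recommended values from Agentuity.

```typescript
// Hypothetical option shape. Property names follow the Configuration Options
// table, but this interface is an assumption, not the SDK's exported type.
interface SandboxOptionsSketch {
  runtime?: string;
  resources?: { memory?: string; cpu?: string; disk?: string };
  network?: { enabled?: boolean; port?: number };
  timeout?: { idle?: string; execution?: string };
}

// Build conservative options for untrusted code: no network,
// small resource limits, and a short execution timeout.
function untrustedCodeOptions(runtime: string): SandboxOptionsSketch {
  return {
    runtime,
    resources: { memory: '256Mi', cpu: '500m', disk: '1Gi' },
    network: { enabled: false }, // already the default; stated explicitly for clarity
    timeout: { idle: '10m', execution: '30s' },
  };
}

const options = untrustedCodeOptions('python:3.14');
```

Keeping these defaults in one helper makes it harder to enable the network or raise limits by accident when new call sites are added.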

Next Steps

  • SDK Usage: Detailed API for file I/O, streaming, and advanced configuration
  • Snapshots: Skip dependency installation with pre-configured environments
  • CLI Commands: Debug sandboxes and create snapshots manually