Running Code in Sandboxes — Agentuity Documentation

Run code in isolated, secure containers with configurable resources

Execute code in isolated Linux containers with configurable resource limits, network controls, and execution timeouts.

Why Sandboxes?

Agents that reason about code need somewhere safe to execute it. Whether generating Python scripts, validating builds, or running user-provided code, you can't let arbitrary execution happen on your infrastructure.

The pattern keeps repeating: spin up a secure environment, run code, tear it down. Without proper isolation, a single bad script could access sensitive data, exhaust resources, or compromise your systems.

Agentuity sandboxes handle the isolation layer. One-shot runs create a sandbox, execute a command, and destroy it. Interactive sandboxes keep their filesystem until you destroy them or the idle timeout reaps them.

What this gives you:

  • Security by default: Network disabled, filesystem isolated, resource limits enforced
  • No infrastructure management: Containers spin up and tear down automatically
  • Multi-language support: Run Python, Node.js, shell scripts, and more
  • Consistent environments: Use snapshots to get the same setup every time, with dependencies pre-installed

Three Ways to Use Sandboxes

| Method | Best For |
| --- | --- |
| Web App | Visual management, browsing runtimes and snapshots |
| SDK | Programmatic use in agents and routes (ctx.sandbox) |
| CLI | Local development, scripting, CI/CD |

Key Concepts

| Concept | Description |
| --- | --- |
| Runtime | A pre-configured base environment (OS + language tools) provided by Agentuity |
| Sandbox | A running container created from a runtime where you execute commands |
| Snapshot | A saved sandbox state that can be used to create new sandboxes |
| Checkpoint | A saved filesystem state for one sandbox, used by pause/resume and restore workflows |

Runtimes, sandboxes, and snapshots build on each other: Runtime → Sandbox → Snapshot. Checkpoints are sandbox-scoped: you restore the same sandbox back to a saved filesystem state instead of creating a reusable base image.

  1. Pick a runtime (e.g., bun:1 or node:latest)
  2. Create a sandbox from that runtime
  3. Optionally save a snapshot to reuse your configured environment

Runtimes

Runtimes are pre-configured base environments that Agentuity provides. Each includes an operating system, language toolchain, and common utilities.

Language Runtimes

Use these for general code execution:

| Runtime | Description |
| --- | --- |
| base:latest | Minimal base runtime with essential tools (default) |
| bun:1 | Bun 1.x with JavaScript/TypeScript support |
| node:latest | Node.js latest version |
| node:lts | Node.js LTS version |
| python:3.13 | Python 3.13 with uv package manager |
| python:3.14 | Python 3.14 with uv package manager |

Agent Runtimes

Pre-configured AI coding assistants:

| Runtime | Description |
| --- | --- |
| claude-code:latest | Claude Code AI assistant |
| amp:latest | Amp AI coding assistant |
| opencode:latest | OpenCode AI coding assistant |

Runtime Metadata

Each runtime includes metadata for identification and resource planning:

| Field | Description |
| --- | --- |
| description | What the runtime provides |
| iconUrl | URL to runtime icon |
| brandColor | Hex color for UI display |
| url | Link to runtime documentation or homepage |
| tags | Categories like language, testing, agent |
| requirements | Minimum memory, CPU, disk, and networkEnabled requirements |

View runtime details with agentuity cloud sandbox runtime list --json.
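As a rough illustration of working with this metadata, the sketch below models the fields from the table as a TypeScript type and filters a list of runtimes by tag. The interface shape is an assumption inferred from the field names above, and the `name` field and both helper functions are hypothetical; the actual JSON emitted by the CLI may be structured differently.

```typescript
// Assumed shape, inferred only from the field names in the table above.
interface RuntimeRequirements {
  memory?: string;
  cpu?: string;
  disk?: string;
  networkEnabled?: boolean;
}

interface RuntimeInfo {
  name: string; // hypothetical field, e.g. 'python:3.13'
  description?: string;
  iconUrl?: string;
  brandColor?: string;
  url?: string;
  tags?: string[];
  requirements?: RuntimeRequirements;
}

// Names of all runtimes that carry the given tag (for example 'language' or 'agent').
function runtimesWithTag(runtimes: RuntimeInfo[], tag: string): string[] {
  return runtimes
    .filter((runtime) => (runtime.tags ?? []).includes(tag))
    .map((runtime) => runtime.name);
}

// True only when the runtime explicitly declares that it needs network access.
function requiresNetwork(runtime: RuntimeInfo): boolean {
  return runtime.requirements?.networkEnabled === true;
}
```

A check like `requiresNetwork` could be used to decide whether to set network.enabled when creating a sandbox from that runtime, since networking is off by default.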

Snapshots

A snapshot captures the filesystem state of a sandbox. You create new sandboxes from a snapshot rather than running it directly.

Snapshots build on top of runtimes. When you create a snapshot, it includes everything from the base runtime plus your installed dependencies and files.

Workflow:

  1. Create a sandbox from a runtime
  2. Install dependencies and configure the environment
  3. Save a snapshot
  4. Create new sandboxes from that snapshot (fast, no reinstallation needed)

See Creating and Using Snapshots for details.
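The workflow above suggests a simple decision when creating a sandbox: start from a snapshot if one exists, otherwise fall back to a runtime and enable networking so dependencies can be installed. The sketch below builds the options object for that choice. It uses the runtime, snapshot, and network.enabled options listed under Configuration Options; the `CreateOptions` type and the `createOptionsFor` helper are hypothetical, and whether snapshot and runtime can be combined in a single call is not covered here.

```typescript
// Minimal local type covering only the options this sketch uses.
interface CreateOptions {
  runtime?: string;
  snapshot?: string;
  network?: { enabled: boolean };
}

// Prefer a snapshot (dependencies already installed, so the network can stay off);
// otherwise start from a runtime with networking enabled for installation.
function createOptionsFor(snapshot: string | undefined, fallbackRuntime: string): CreateOptions {
  if (snapshot) {
    return { snapshot };
  }
  return { runtime: fallbackRuntime, network: { enabled: true } };
}
```

The result would then be passed to ctx.sandbox.create(), for example `createOptionsFor('my-env', 'node:lts')`.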

Two Execution Modes

Choose based on your use case:

One-shot (sandbox.run())

Creates a sandbox, runs a single command, then destroys the sandbox. Best for stateless code execution.

import { createAgent } from '@agentuity/runtime';
 
const agent = createAgent('CodeRunner', {
  handler: async (ctx, input) => {
    const result = await ctx.sandbox.run({
      command: { exec: ['python3', '-c', 'print("Hello!")'] },
      resources: { memory: '256Mi', cpu: '500m' },
    });
 
    ctx.logger.info('Output', { stdout: result.stdout, exitCode: result.exitCode });
    return { output: result.stdout, exitCode: result.exitCode };
  },
});
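The handler above returns stdout and exitCode as-is. A caller will often want to separate success from failure explicitly, so here is a minimal sketch of a helper that does that. It relies only on the two result fields used above; the `RunResult` and `RunOutcome` types and the `toOutcome` function are hypothetical names, and treating stdout as possibly missing is a defensive assumption.

```typescript
// Only the fields the example above reads from the run result.
interface RunResult {
  stdout?: string;
  exitCode: number;
}

type RunOutcome =
  | { ok: true; output: string }
  | { ok: false; exitCode: number; output: string };

// Treat exit code 0 as success and anything else as failure,
// trimming surrounding whitespace from the captured output.
function toOutcome(result: RunResult): RunOutcome {
  const output = (result.stdout ?? '').trim();
  if (result.exitCode === 0) {
    return { ok: true, output };
  }
  return { ok: false, exitCode: result.exitCode, output };
}
```

Inside the handler, this could replace the final return with `return toOutcome(result);`.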

Interactive (sandbox.create())

Creates a persistent sandbox for multiple commands. Best for stateful workflows like dependency installation.

import { createAgent } from '@agentuity/runtime';
 
const agent = createAgent('ProjectBuilder', {
  handler: async (ctx, input) => {
    const sandbox = await ctx.sandbox.create({
      runtime: 'node:lts',
      resources: { memory: '1Gi' },
      network: { enabled: true },  // Required for package installation
    });
 
    try {
      await sandbox.execute({ command: ['npm', 'init', '-y'] });
      await sandbox.execute({ command: ['npm', 'install', 'zod'] });
      const result = await sandbox.execute({
        command: ['node', '-e', 'console.log("ready")'],
      });
 
      return { exitCode: result.exitCode };
    } finally {
      await sandbox.destroy();
    }
  },
});
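The example above ignores the exit codes of the two npm commands, so a failed install would go unnoticed until the final step. One way to handle this is a small helper that runs commands in order and stops at the first failure. This is a sketch, not part of the SDK: `runSteps` and the `CommandRunner` interface are hypothetical, and it assumes execute() resolves with a non-zero exitCode on failure rather than throwing.

```typescript
// Minimal structural view of a sandbox: just the execute() call used above.
interface ExecResult {
  exitCode: number;
}

interface CommandRunner {
  execute(options: { command: string[] }): Promise<ExecResult>;
}

// Run each command in order; stop at the first non-zero exit code.
// Returns how many steps succeeded and the exit code of the last step run.
async function runSteps(
  sandbox: CommandRunner,
  steps: string[][],
): Promise<{ completed: number; exitCode: number }> {
  let completed = 0;
  for (const command of steps) {
    const { exitCode } = await sandbox.execute({ command });
    if (exitCode !== 0) {
      return { completed, exitCode };
    }
    completed += 1;
  }
  return { completed, exitCode: 0 };
}
```

With it, the body of the try block could become `return await runSteps(sandbox, [['npm', 'init', '-y'], ['npm', 'install', 'zod']]);`, with sandbox.destroy() still in the finally block.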

Background Jobs

Jobs let you run long-running commands in a sandbox without blocking. Unlike regular execution, jobs:

  • Run in parallel: Multiple jobs can execute simultaneously
  • Don't block: Control returns immediately after creation
  • Persist: Jobs continue even after the creating request completes
  • Capture output: Stdout/stderr are captured to streams for later retrieval

Creating Jobs

import { createAgent } from '@agentuity/runtime';
 
const agent = createAgent('JobRunner', {
  handler: async (ctx, input) => {
    const sandbox = await ctx.sandbox.create({
      runtime: 'node:lts',
      resources: { memory: '2Gi' },
      network: { enabled: true },
    });
 
    // Create a background job
    const job = await sandbox.createJob({
      command: ['sh', '-c', 'sleep 30 && echo done'],
    });
 
    ctx.logger.info('Job started', { jobId: job.jobId });
 
    // Check status; this job sleeps for 30s, so right after creation it is usually still pending or running
    const status = await sandbox.getJob(job.jobId);
    if (status.status === 'completed') {
      ctx.logger.info('Job completed', { exitCode: status.exitCode });
    }
 
    return { jobId: job.jobId };
  },
});

Job Lifecycle

| Status | Description |
| --- | --- |
| pending | Job created, waiting to start |
| running | Job actively executing |
| completed | Finished with exit code 0 |
| failed | Finished with non-zero exit code |
| cancelled | Terminated by user request |
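Of these statuses, completed, failed, and cancelled are terminal, while pending and running mean the job may still change. A simple way to wait for a job is to poll getJob() until a terminal status appears. The sketch below assumes getJob() returns an object with status and exitCode, as in the example above; the `waitForJob` helper, its interval and attempt defaults, and the local types are hypothetical.

```typescript
type JobStatus = 'pending' | 'running' | 'completed' | 'failed' | 'cancelled';

interface JobInfo {
  status: JobStatus;
  exitCode?: number;
}

// Minimal structural view of a sandbox: just the getJob() call used above.
interface JobReader {
  getJob(jobId: string): Promise<JobInfo>;
}

const TERMINAL_STATUSES: ReadonlySet<JobStatus> = new Set<JobStatus>([
  'completed',
  'failed',
  'cancelled',
]);

function isTerminal(status: JobStatus): boolean {
  return TERMINAL_STATUSES.has(status);
}

// Poll until the job reaches a terminal status, or give up after maxAttempts checks.
async function waitForJob(
  sandbox: JobReader,
  jobId: string,
  intervalMs = 1000,
  maxAttempts = 60,
): Promise<JobInfo> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await sandbox.getJob(jobId);
    if (isTerminal(job.status)) {
      return job;
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Job ${jobId} did not finish after ${maxAttempts} checks`);
}
```

Because jobs persist after the creating request completes, polling like this is optional: a handler can also return the jobId right away and let a later request check on it.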

Stopping Jobs

// Graceful stop (SIGTERM, then SIGKILL after grace period)
await sandbox.stopJob(job.jobId);
 
// Force kill immediately
await sandbox.stopJob(job.jobId, true);

Use Cases

| Use Case | Example |
| --- | --- |
| Build processes | Run npm run build in background |
| Long-running tests | Execute test suites without blocking |
| Data processing | Process large files asynchronously |
| Service daemons | Run background services in sandbox |

SDK Access

| Context | Access |
| --- | --- |
| Agents | ctx.sandbox |
| Routes | c.var.sandbox |

The API is identical in both contexts.

Configuration Options

| Option | Description | Example |
| --- | --- | --- |
| runtime | Runtime environment | 'bun:1', 'python:3.14' |
| resources.memory | Memory limit (Kubernetes-style) | '512Mi', '1Gi' |
| resources.cpu | CPU limit in millicores | '500m', '1000m' |
| resources.disk | Disk space limit | '1Gi' |
| network.enabled | Allow outbound network | true (default: false) |
| network.port | Port to expose to the internet (1024-65535) | 3000 |
| projectId | Associate sandbox with a project | 'proj_abc123' |
| timeout.idle | Idle timeout before cleanup | '10m', '1h' |
| timeout.execution | Max execution time per command | '5m', '30s' |
| dependencies | Apt packages to install | ['python3', 'git'] |
| packages | npm/bun packages to install globally | ['typescript', 'tsx'] |
| env | Environment variables | { NODE_ENV: 'test' } |
| snapshot | Create from existing snapshot | 'my-env' or snp_abc123 |
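To make the resource units concrete, the sketch below converts the Kubernetes-style strings from the table into plain numbers. In Kubernetes notation, Mi and Gi are binary units (2^20 and 2^30 bytes) and the m suffix on CPU means millicores, so '500m' is half a core. The `parseMemory` and `parseCpu` helpers are hypothetical, not part of the SDK, and they cover only the forms shown in the table.

```typescript
// Binary (power-of-two) suffixes used by Kubernetes-style memory quantities.
const BINARY_UNITS: Record<string, number> = {
  Ki: 2 ** 10,
  Mi: 2 ** 20,
  Gi: 2 ** 30,
  Ti: 2 ** 40,
};

// Convert a quantity such as '512Mi' or '1Gi' to bytes.
function parseMemory(quantity: string): number {
  const match = /^(\d+(?:\.\d+)?)(Ki|Mi|Gi|Ti)$/.exec(quantity);
  if (!match) {
    throw new Error(`Unsupported memory quantity: ${quantity}`);
  }
  return Number(match[1]) * BINARY_UNITS[match[2]];
}

// Convert a millicore quantity such as '500m' to a number of cores.
function parseCpu(quantity: string): number {
  const match = /^(\d+)m$/.exec(quantity);
  if (!match) {
    throw new Error(`Unsupported CPU quantity: ${quantity}`);
  }
  return Number(match[1]) / 1000;
}
```

Note that the m suffix means different things in different options: '500m' under resources.cpu is millicores, while '10m' under timeout.idle is a duration in minutes.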

Sandbox Events

Every sandbox records lifecycle events as it transitions through states. Use sandboxEventList to retrieve these events for auditing or debugging.

import { sandboxEventList } from '@agentuity/server';
 
const { events } = await sandboxEventList(client, {
  sandboxId: 'sbx_abc123',
  limit: 50,            // optional, default 50
  direction: 'asc',    // optional: 'asc' (oldest first, default) or 'desc'
});

Each event includes:

| Field | Description |
| --- | --- |
| eventId | Unique identifier for the event |
| sandboxId | ID of the sandbox |
| type | Event type (e.g., create, destroy, lifecycle:started) |
| event | Arbitrary payload data for the event |
| createdAt | ISO timestamp when the event was recorded |

From the CLI, use agentuity cloud sandbox events <sandbox-id> to list events. See CLI Commands for options.
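Once events are retrieved, a quick summary is often enough for debugging. The sketch below counts events per type and picks the most recent one by createdAt. The `SandboxEvent` interface mirrors the fields in the table above but is declared locally as an assumption, and `countByType` and `latestEvent` are hypothetical helpers.

```typescript
// Local type mirroring the event fields listed above.
interface SandboxEvent {
  eventId: string;
  sandboxId: string;
  type: string;
  event?: unknown;
  createdAt: string; // ISO timestamp
}

// Number of events recorded for each event type.
function countByType(events: SandboxEvent[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const item of events) {
    counts[item.type] = (counts[item.type] ?? 0) + 1;
  }
  return counts;
}

// Most recent event by createdAt, regardless of the order the list arrived in.
function latestEvent(events: SandboxEvent[]): SandboxEvent | undefined {
  let latest: SandboxEvent | undefined;
  for (const item of events) {
    if (!latest || Date.parse(item.createdAt) > Date.parse(latest.createdAt)) {
      latest = item;
    }
  }
  return latest;
}
```

Comparing parsed timestamps rather than relying on list position keeps `latestEvent` correct for both 'asc' and 'desc' directions.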

Resume Paused Sandboxes

sandbox.execute() automatically resumes a suspended sandbox and returns autoResumed: true on the execution result. Call sandbox.resume() when you want the sandbox awake before issuing a batch of commands.

await sandbox.resume();
const execution = await sandbox.execute({ command: ['bun', 'run', 'test'] });
 
ctx.logger
  .child({ executionId: execution.executionId, autoResumed: execution.autoResumed })
  .info('Sandbox command completed');

When to Use Sandboxes

| Use Case | Example |
| --- | --- |
| Code execution agents | Run user-provided Python/JavaScript safely |
| Code validation | Verify generated code compiles and runs |
| AI coding assistants | Execute code suggested by LLMs |
| Automated testing | Run tests in clean environments |
| Build systems | Compile projects in isolated containers |

Security

Sandboxes provide isolation through:

  • Network disabled by default: Enable explicitly when needed
  • Resource limits: Prevent resource exhaustion
  • Execution timeouts: Prevent runaway processes
  • Filesystem isolation: Each sandbox has its own workspace

Next Steps

  • SDK Usage: Detailed API for file I/O, streaming, and advanced configuration
  • Snapshots: Skip dependency installation with pre-configured environments
  • CLI Commands: Debug sandboxes and create snapshots manually