Building Secure, Reliable Agents

The best way to learn how to build a production-ready agent is to build a production-ready agent. In the past few posts, I've been walking through exactly that: building agents that scale to thousands of invocations per hour, with a harness that combines a fast, secure filesystem and runtime, tooling you can trust, and decision tracing. In this article I go deeper into how I've been engineering security and reliability into that harness.

If you've built similar systems or see gaps in this approach, I'd welcome your input.

Security by Separation and Encryption

Serverless functions have payload limits and time limits. These constraints are set by the hosting provider, not the orchestration layer, and they vary considerably:

Vercel: 4.5MB payload, 10s to 900s execution timeout depending on plan
AWS Lambda: 6MB to 20MB payload, 15 minute timeout
Google Cloud Functions: 512KB to 32MB payload, 10 to 60 minute timeout
Cloudflare Workers: 100MB to 500MB payload, no fixed timeout
Netlify: 256KB to 6MB payload, 10s to 15 minute timeout
DigitalOcean: 1MB payload, 15 minute timeout

Complex workflows processing large documents, multi-step agent executions with accumulated context, or workflows handling hundreds of concurrent employee records will hit one or both of those ceilings. An agent that needs to wait 2 hours for a downstream service to recover can't hold a function execution open for that duration (more on durable sleep patterns later in this article).

One solution is context externalization: store execution data separately from workflow definitions. The workflow function itself stays lightweight (trigger metadata, step references, routing logic). The actual payloads, documents, intermediate results, and agent context live in an external store that the execution framework references by ID.
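To make the split concrete, here is a minimal sketch of the idea, assuming an in-memory map as a stand-in for the external store. ContextStore, ContextRef, and StepMessage are hypothetical names for illustration, not part of any particular framework.

```typescript
// Reference to a payload held outside the workflow function.
interface ContextRef {
  store: string; // logical store name
  id: string; // opaque key inside that store
}

// Hypothetical in-memory stand-in for an external store (object storage, KV, etc.).
class ContextStore {
  private name: string;
  private blobs = new Map<string, string>();
  private nextId = 0;

  constructor(name: string) {
    this.name = name;
  }

  // Persist a payload and hand back a small reference to it.
  put(payload: unknown): ContextRef {
    const id = `ctx-${++this.nextId}`;
    this.blobs.set(id, JSON.stringify(payload));
    return { store: this.name, id };
  }

  // Resolve a reference back into the payload it points to.
  get<T>(ref: ContextRef): T {
    const raw = this.blobs.get(ref.id);
    if (raw === undefined) throw new Error(`Unknown context ref: ${ref.id}`);
    return JSON.parse(raw) as T;
  }
}

// What travels between steps: metadata and references only.
interface StepMessage {
  workflowId: string;
  step: string;
  inputRef: ContextRef;
}

const store = new ContextStore("execution-context");
const largeDocument = { text: "x".repeat(1_000_000) };

const message: StepMessage = {
  workflowId: "wf-1",
  step: "summarize",
  inputRef: store.put(largeDocument),
};

// The serialized message stays tiny regardless of the document size.
const messageBytes = JSON.stringify(message).length;
```

The step that actually needs the document calls store.get(message.inputRef) at the point of use, so the function payload never carries the document itself.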

This is standard practice for performance, but it unlocks real security benefits too. When context lives in a separate store, you can apply data lifecycle policies independently of the workflow engine: retention schedules, geographic residency, automatic purging of PII after processing. Organizations that need to own their data infrastructure can bring their own cloud (BYOC), keeping execution context in their own S3 buckets or equivalent while the orchestration layer handles routing and step coordination.
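As a rough illustration of what an independent lifecycle policy could look like, the sketch below evaluates a stored context record against retention, residency, and PII rules. The field names and the three outcomes are assumptions made for this example, not a description of any specific product.

```typescript
// Metadata kept alongside each externalized context blob (hypothetical shape).
interface StoredContext {
  id: string;
  region: string;
  containsPii: boolean;
  createdAt: number; // epoch milliseconds
  processedAt: number | null; // null while the execution is still running
}

interface LifecyclePolicy {
  retentionMs: number;
  allowedRegions: string[];
  purgePiiAfterProcessing: boolean;
}

type LifecycleAction = "keep" | "purge" | "residency-violation";

// Decide what to do with one record, independently of the workflow engine.
function evaluate(
  record: StoredContext,
  policy: LifecyclePolicy,
  now: number
): LifecycleAction {
  if (!policy.allowedRegions.includes(record.region)) return "residency-violation";
  if (policy.purgePiiAfterProcessing && record.containsPii && record.processedAt !== null) {
    return "purge";
  }
  if (now - record.createdAt > policy.retentionMs) return "purge";
  return "keep";
}

const euPolicy: LifecyclePolicy = {
  retentionMs: 30 * 24 * 60 * 60 * 1000, // 30 days
  allowedRegions: ["eu-west-1", "eu-central-1"],
  purgePiiAfterProcessing: true,
};
```

A scheduled sweep can run evaluate over the store's metadata without the orchestration layer knowing anything about the rules.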

Per-execution encryption ensures that the agent can only decrypt context during its active runtime. Each execution generates a scoped decryption key that lives for the duration of the run. When the execution completes, the key is discarded. Any context that needs to be referenced later gets saved as sanitized logs or stored as memories in the agent's filesystem (covered in the previous article). The raw execution payload, with all its PII and intermediate reasoning, becomes inaccessible.

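// deriveKey, decryptPayload, and encryptAndStore are harness helpers;
// ExecutionContext stands for whatever type decryptPayload returns.
async function executeWithEncryption(
  orgId: string,
  executionId: string,
  inputRef: string, // ID of the encrypted input in the external store
  handler: (context: ExecutionContext) => Promise<StepOutput>
) {
  // Derive an execution-scoped key from the org master key
  const executionKey = await deriveKey(orgId, executionId);

  try {
    // Decrypt the input context for this execution only
    const context = await decryptPayload(executionKey, inputRef);

    // Run the handler with the decrypted context
    const result = await handler(context);

    // Encrypt the output before it is persisted
    await encryptAndStore(executionKey, result);

    return { ref: result.ref, status: "complete" };
  } finally {
    // Discard the key even if the handler or the storage call throws
    executionKey.destroy();
  }
}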

The implication: a compromised step in one tenant's execution can't expose another tenant's data, and a compromised execution can't expose historical payloads from prior runs.
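One detail matters for the "inaccessible after the run" property: a key derived only from a long-lived master key and the execution ID could be re-derived later. The sketch below is one way to avoid that, assuming Node's built-in crypto module: it mixes a random per-execution salt into an HKDF derivation, encrypts with AES-256-GCM, and zeroes both the key and the salt at the end. The function names are illustrative and are not the harness helpers shown above.

```typescript
import { createCipheriv, createDecipheriv, hkdfSync, randomBytes } from "node:crypto";

interface Sealed {
  iv: Buffer;
  tag: Buffer;
  ciphertext: Buffer;
}

// Derive a 256-bit key bound to one execution. The random salt is the
// ephemeral ingredient: once it is gone, the key cannot be re-derived.
function deriveExecutionKey(masterKey: Buffer, executionId: string, salt: Buffer): Buffer {
  return Buffer.from(hkdfSync("sha256", masterKey, salt, `execution:${executionId}`, 32));
}

// Authenticated encryption of a UTF-8 payload.
function seal(key: Buffer, plaintext: string): Sealed {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { iv, tag: cipher.getAuthTag(), ciphertext };
}

// Decrypts and verifies; throws if the key or the data is wrong.
function open(key: Buffer, sealed: Sealed): string {
  const decipher = createDecipheriv("aes-256-gcm", key, sealed.iv);
  decipher.setAuthTag(sealed.tag);
  return Buffer.concat([decipher.update(sealed.ciphertext), decipher.final()]).toString("utf8");
}

const masterKey = randomBytes(32); // stand-in for the org master key
const salt = randomBytes(16); // lives only as long as the execution
const executionKey = deriveExecutionKey(masterKey, "exec-123", salt);

const payload = JSON.stringify({ employeeRef: "ref_abc", note: "intermediate reasoning" });
const sealed = seal(executionKey, payload);
const roundTrip = open(executionKey, sealed);

// End of run: overwrite key material in memory.
executionKey.fill(0);
salt.fill(0);
```

Zeroing a Buffer is best effort in a garbage-collected runtime, so a production system would more likely keep the key inside a KMS or HSM and delete it there.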

Authorization and Access Control

Multi-tenant agent platforms need RBAC at multiple levels: which users can build workflows, which users can trigger them, which agents can access which data sources, and what scopes those connections carry.

Authentication routes through SSO middleware that acts as an OAuth 2.0 wrapper around the customer's identity provider (SAML or OIDC). The platform doesn't manage user databases directly. It normalizes authentication across identity providers, assigns organization-scoped roles, and enforces permission boundaries. When a user signs in, the platform evaluates their IdP group memberships, maps them to roles, and scopes their session accordingly.
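The group-to-role mapping can be sketched in a few lines. The role names, permission strings, and group names below are invented for the example; a real deployment would load the mapping from per-organization configuration.

```typescript
type Role = "builder" | "operator" | "viewer";
type Permission = "workflow:build" | "workflow:trigger" | "workflow:view";

const ROLE_PERMISSIONS: Record<Role, Permission[]> = {
  builder: ["workflow:build", "workflow:trigger", "workflow:view"],
  operator: ["workflow:trigger", "workflow:view"],
  viewer: ["workflow:view"],
};

// Hypothetical mapping from IdP group names to platform roles.
const GROUP_ROLES: Record<string, Role> = {
  "hr-automation-admins": "builder",
  "hr-business-partners": "operator",
  "all-employees": "viewer",
};

interface Session {
  userId: string;
  orgId: string;
  roles: Role[];
}

// Evaluate IdP group memberships at sign-in and scope the session to one org.
function createSession(userId: string, orgId: string, idpGroups: string[]): Session {
  const roles: Role[] = [];
  for (const group of idpGroups) {
    const role = GROUP_ROLES[group];
    if (role !== undefined && !roles.includes(role)) roles.push(role);
  }
  return { userId, orgId, roles };
}

// A permission check always includes the organization boundary.
function can(session: Session, orgId: string, permission: Permission): boolean {
  if (session.orgId !== orgId) return false;
  return session.roles.some((role) => ROLE_PERMISSIONS[role].includes(permission));
}
```

Unknown groups are ignored rather than mapped to a default role, so a user with no recognized memberships ends up with no permissions.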

For third-party account connections (Slack, Google Workspace, GitHub, Workday), users connect their own accounts through a managed OAuth widget embedded in the application. The platform handles the OAuth flows, token refresh logic, and credential storage on the user's behalf. When an agent needs to make an API call to a connected service, it requests a fresh access token from the platform's connection layer. The agent never sees or stores raw credentials.
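A minimal sketch of that connection layer, assuming the refresh itself is delegated to a function supplied by the platform. ConnectionLayer, TokenGrant, and the 60-second refresh margin are illustrative choices, not a real library's API.

```typescript
interface TokenGrant {
  accessToken: string;
  expiresAt: number; // epoch milliseconds
}

// Performs the OAuth refresh for one connection; holds the refresh token internally.
type Refresher = (connectionId: string) => Promise<TokenGrant>;

class ConnectionLayer {
  private cache = new Map<string, TokenGrant>();
  private refresher: Refresher;
  private now: () => number;

  constructor(refresher: Refresher, now: () => number = () => Date.now()) {
    this.refresher = refresher;
    this.now = now;
  }

  // Agents call this per request and receive only a short-lived access token.
  async getAccessToken(connectionId: string): Promise<string> {
    const cached = this.cache.get(connectionId);
    // Reuse the cached token unless it expires within the next 60 seconds.
    if (cached !== undefined && cached.expiresAt - this.now() > 60_000) {
      return cached.accessToken;
    }
    const grant = await this.refresher(connectionId);
    this.cache.set(connectionId, grant);
    return grant.accessToken;
  }
}
```

The refresh token and client secret stay behind the Refresher, so nothing an agent logs or stores can be replayed once the access token expires.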

Workday's Agent Gateway is worth a specific mention here. Workday's Agent System of Record (ASOR) provides a centralized authentication and authorization layer for registered agents. The Agent Gateway functions as a secure reverse proxy: all API traffic (REST, SOAP, WQL, Graph) routes through us.agent.workday.com, where the gateway validates the agent's token, checks that the agent has access (AuthZ via ASOR), checks that the user has access, and then returns the data. This dual authorization model (agent-level and user-level) is the right pattern for enterprise systems where agents act on behalf of employees. It ensures that an agent registered for travel booking management can't access compensation data, even if the underlying API surface technically supports it.
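The dual check reduces to a simple rule: a request passes only if the scope is granted to the agent and to the user it acts for. The sketch below illustrates that rule in isolation; it is not Workday's API, and the scope strings are made up.

```typescript
interface Decision {
  allowed: boolean;
  deniedBy: "agent" | "user" | null;
}

// Both the agent's registration and the user's own access must cover the scope.
function authorize(
  agentScopes: ReadonlySet<string>,
  userScopes: ReadonlySet<string>,
  requestedScope: string
): Decision {
  if (!agentScopes.has(requestedScope)) return { allowed: false, deniedBy: "agent" };
  if (!userScopes.has(requestedScope)) return { allowed: false, deniedBy: "user" };
  return { allowed: true, deniedBy: null };
}

// An agent registered for travel booking, acting for an HR manager.
const travelAgentScopes = new Set(["travel:read", "travel:write"]);
const hrManagerScopes = new Set(["travel:read", "compensation:read"]);

const travelRead = authorize(travelAgentScopes, hrManagerScopes, "travel:read");
const compensationRead = authorize(travelAgentScopes, hrManagerScopes, "compensation:read");
const travelWrite = authorize(travelAgentScopes, hrManagerScopes, "travel:write");
```

Here compensationRead is denied at the agent level even though the user holds that scope, and travelWrite is denied at the user level even though the agent is registered for it.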

Guardrails

Guardrails execute at three checkpoints in the agent lifecycle: before agent inference (screening user input), before tool execution (validating the agent's intended action), and before agent response (screening output before it reaches the user). Different guardrails apply at each checkpoint.
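A compact way to picture the three checkpoints is a registry keyed by checkpoint, where the first guard that objects stops the flow. The sketch below uses synchronous, regex-based guards for brevity; model-backed guards would be asynchronous, and the specific rules are placeholders.

```typescript
type Checkpoint = "input" | "tool" | "output";

interface Verdict {
  allowed: boolean;
  checkpoint: Checkpoint;
  reason: string | null;
}

// A guard returns a reason to block, or null to let the payload through.
type Guard = (payload: string) => string | null;

function runCheckpoint(
  registry: Record<Checkpoint, Guard[]>,
  checkpoint: Checkpoint,
  payload: string
): Verdict {
  for (const guard of registry[checkpoint]) {
    const reason = guard(payload);
    if (reason !== null) return { allowed: false, checkpoint, reason };
  }
  return { allowed: true, checkpoint, reason: null };
}

// Placeholder guards: one per checkpoint.
const registry: Record<Checkpoint, Guard[]> = {
  input: [
    (text) =>
      /ignore (all )?previous instructions/i.test(text) ? "possible instruction hijacking" : null,
  ],
  tool: [
    (toolId) => (toolId.startsWith("delete-") ? "destructive tool is not enabled" : null),
  ],
  output: [
    (text) => (/\b\d{3}-\d{2}-\d{4}\b/.test(text) ? "output contains an SSN-like pattern" : null),
  ],
};
```

The agent loop calls runCheckpoint with "input" before inference, "tool" before each tool call, and "output" before replying, and stops on the first verdict that is not allowed.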

Content filtering and manipulation detection run at the input boundary. These catch prompt injection attempts, instruction hijacking ("ignore previous instructions and..."), and content that falls outside the agent's defined scope. OSS libraries like superagent provide model-agnostic guardrail infrastructure for this:

import { createClient } from "safety-agent";

const safety = createClient();

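// Result handed back to the caller; fields mirror the guard response used below
type GuardResult = { blocked: boolean; reason?: string; violations?: string[] };

async function screenInput(input: string): Promise<GuardResult> {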
  const result = await safety.guard({
    input,
    model: "google/gemini-3-flash",
    fallbackModel: "superagent-guard-1.7b",
  });

  if (result.classification === "block") {
    return {
      blocked: true,
      reason: result.reasoning,
      violations: result.violation_types,
    };
  }
  return { blocked: false };
}

Bring-your-own-policy guardrails complement the structural checks. These are well-defined policy documents, executed by low-latency, high-TPS models, that encode organization-specific rules. For an agent that handles employee peer feedback submissions, a spam policy might look like this:

## Spam Policy (#SP)

**GOAL:** Identify spam in peer feedback submissions.
Classify each submission as VALID or INVALID.

**Allowed Content (SP0):**
- SP0.a: Constructive performance feedback
- SP0.b: Specific behavioral observations with examples
- SP0.c: Development recommendations tied to role expectations

**Likely Spam (SP2):**
- SP2.a: Generic praise with no specifics ("Great job!")
- SP2.b: Copy-pasted identical feedback across multiple reviews
- SP2.c: Irrelevant content unrelated to the review period

**High-Risk (SP3):**
- SP3.a: Coordinated identical submissions from multiple reviewers
- SP3.b: Feedback that references information outside the
  reviewer's working relationship

**Malicious (SP4):**
- SP4.a: Personally abusive language disguised as feedback
- SP4.b: Deliberately false performance claims
  Output: INVALID + ESCALATE to HR

The policy document lives in the agent's filesystem alongside other configuration. The guardrail model evaluates each submission against the policy at inference time, classifying and routing accordingly. This approach keeps guardrail logic in plain language that HR teams can review and update without touching code.
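The glue around the guardrail model is small: assemble a prompt from the policy and the submission, then parse the answer defensively. The sketch below assumes the model is instructed to reply with a label followed by a rule ID (for example "INVALID SP4.a"); that output format and the fail-closed behavior are choices made for this example.

```typescript
interface PolicyDecision {
  label: "VALID" | "INVALID";
  rule: string | null;
  escalate: boolean;
}

// Combine the plain-language policy with the submission to classify.
function buildPrompt(policy: string, submission: string): string {
  return [
    policy.trim(),
    "",
    "Respond with VALID or INVALID, followed by the matching rule ID (for example: INVALID SP2.a).",
    "",
    "Submission:",
    submission.trim(),
  ].join("\n");
}

// Parse the model's reply. Anything unparseable fails closed and is escalated.
function parseDecision(raw: string): PolicyDecision {
  const match = /^\s*(VALID|INVALID)\b(?:[\s:,-]+(SP\d+\.[a-z]))?/i.exec(raw);
  if (match === null) {
    return { label: "INVALID", rule: null, escalate: true };
  }
  const label = match[1].toUpperCase() === "VALID" ? "VALID" : "INVALID";
  const rule = match[2] ?? null;
  // SP4 rules carry the "ESCALATE to HR" instruction in the policy.
  const escalate = label === "INVALID" && rule !== null && /^SP4\./i.test(rule);
  return { label, rule, escalate };
}
```

Because the policy text is passed in as data, HR can edit the document and the next evaluation picks up the change with no redeploy.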

Deterministic tool enablement is the third layer. Tools don't execute by default just because the agent requests them. For sensitive operations (submitting a performance review, updating a worker record, sending a notification), the tool is gated behind a human-in-the-loop approval step. The agent can reason about whether to call the tool, but the tool's execute function only fires when the approval condition is met:

import { z } from "zod";

// createTool and the workday client are provided by the harness
const submitFeedback = createTool({
  id: "submit-peer-feedback",
  description: "Submit peer feedback for a performance review cycle",
  inputSchema: z.object({
    revieweeId: z.string(),
    feedback: z.string(),
    rating: z.number().min(1).max(5),
  }),
  requireApproval: true,
  execute: async (input) => {
    // Only runs after human approval
    const result = await workday.submitFeedback(input);
    return { submitted: true, confirmationId: result.id };
  },
});

The execution framework surfaces the pending tool call to the approval interface. The reviewer sees the agent's intended action, the arguments it prepared, and the reasoning chain that led to the decision. Approve or decline. The agent resumes or reroutes.
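Under the hood, the gate can be as simple as a queue of pending calls where the reviewer's decision is the only code path that reaches the executor. The following sketch shows that shape with hypothetical names (ApprovalGate, PendingCall); executors are synchronous here only to keep the example short.

```typescript
type ApprovalState = "pending" | "approved" | "declined";

interface PendingCall {
  id: string;
  toolId: string;
  input: unknown;
  reasoning: string; // the agent's reasoning, shown to the reviewer
  state: ApprovalState;
}

type Executor = (input: unknown) => unknown;

class ApprovalGate {
  private executors = new Map<string, Executor>();
  private calls = new Map<string, PendingCall>();
  private counter = 0;

  register(toolId: string, execute: Executor): void {
    this.executors.set(toolId, execute);
  }

  // The agent's request is recorded for review but never executed here.
  request(toolId: string, input: unknown, reasoning: string): PendingCall {
    if (!this.executors.has(toolId)) throw new Error(`Unknown tool: ${toolId}`);
    const call: PendingCall = {
      id: `call-${++this.counter}`,
      toolId,
      input,
      reasoning,
      state: "pending",
    };
    this.calls.set(call.id, call);
    return call;
  }

  // A reviewer's decision is the only path to the executor, and it is one-shot.
  decide(callId: string, approved: boolean): unknown {
    const call = this.calls.get(callId);
    if (call === undefined || call.state !== "pending") {
      throw new Error(`No pending call: ${callId}`);
    }
    call.state = approved ? "approved" : "declined";
    if (!approved) return undefined;
    const execute = this.executors.get(call.toolId) as Executor;
    return execute(call.input);
  }
}
```

Making decide one-shot prevents a replayed approval from firing the same tool call twice.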

Durable Execution

Agent workflows fail. APIs return 503s. LLM providers rate-limit you. Downstream systems go down for maintenance. The question is whether the failure takes down the entire workflow or just the step that failed.

Durable execution wraps every meaningful unit of work in a step function that provides three guarantees: automatic retries (configurable per step), step-level idempotency (a completed step returns its cached result if the workflow restarts), and state that survives failures (the execution framework persists progress after each step).
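Those three guarantees fit in a surprisingly small core. The sketch below is a toy version, assuming a Map as the persisted journal: completed steps are memoized by ID, failures are retried up to a limit, and a restarted run that shares the journal skips finished work. StepRunner is a hypothetical name, and a real engine would add backoff and durable storage.

```typescript
type StepFn<T> = () => Promise<T>;

class StepRunner {
  private journal: Map<string, unknown>;
  private maxAttempts: number;

  // The journal stands in for durable storage that survives restarts.
  constructor(journal: Map<string, unknown>, maxAttempts = 3) {
    this.journal = journal;
    this.maxAttempts = maxAttempts;
  }

  async execute<T>(id: string, fn: StepFn<T>): Promise<T> {
    // Idempotency: a completed step returns its recorded result.
    if (this.journal.has(id)) return this.journal.get(id) as T;

    let lastError: unknown;
    for (let attempt = 1; attempt <= this.maxAttempts; attempt++) {
      try {
        const result = await fn();
        // Persist progress before moving on to the next step.
        this.journal.set(id, result);
        return result;
      } catch (err) {
        lastError = err; // retry until attempts are exhausted
      }
    }
    throw lastError;
  }
}
```

On restart, the workflow function runs again from the top with a new StepRunner over the same journal, and every step that already finished resolves immediately from its recorded result.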

const travelAlertWorkflow = createWorkflow(
  { id: "travel-alert", retries: 3 },
  async ({ event, step }) => {
    // Step 1: completed steps are memoized on restart
    const bookings = await step.execute("fetch-bookings", async () => {
      return await workday.getTravelBookings(event.data.orgId);
    });

    // Step 2: if this fails, step 1 doesn't re-execute
    const advisories = await step.execute("check-advisories", async () => {
      return await fetchAdvisories(bookings.destinations);
    });

    // Step 3: wait for external system recovery
    const serviceStatus = await step.execute("check-comms-status", async () => {
      const status = await checkServiceHealth("messaging-provider");
      if (status.degraded) {
        // Sleep until recovery event from status page webhook
        return await step.sleep({
          event: "messaging-provider-recovered",
          timeout: "2h",
        });
      }
      return { ready: true };
    });

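    // Step 4: send alerts with step-level retry isolation
    for (const booking of bookings.active) {
      try {
        await step.execute(`alert-${booking.employeeId}`, async () => {
          return await sendAlert(booking, advisories, serviceStatus);
        });
      } catch (err) {
        // Retries exhausted for this employee: the failed step stays in the
        // trace, and the loop moves on to the next booking
        continue;
      }
    }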
  }
);

A few patterns worth calling out.

In my harness, step.sleep accepts either a duration string or an event declaration. When passed a duration (step.sleep("30m")), it pauses the execution for that interval, then the next step checks service health again. When passed an event declaration with a timeout (step.sleep({ event: "messaging-provider-recovered", timeout: "2h" })), it pauses until the matching event arrives or the timeout expires. Use event declarations when the downstream service publishes status webhooks. Use duration sleeps when it doesn't. In both cases, no compute is burned while waiting, and the LLM can dynamically adjust intervals: shorter sleeps for high-urgency alerts, longer for routine notifications.
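To illustrate how the two forms might resolve internally, the sketch below turns a sleep declaration into a wake plan and then decides whether a sleeping execution should resume. The "s", "m", and "h" suffixes and the function names are assumptions for this example; the harness's real implementation may differ.

```typescript
type SleepDeclaration = string | { event: string; timeout: string };

interface SleepPlan {
  wakeAt: number; // epoch milliseconds at which the sleep ends regardless
  event: string | null; // event that can end the sleep early, if any
}

const UNIT_MS: Record<string, number> = { s: 1_000, m: 60_000, h: 3_600_000 };

// Parse strings such as "30m" or "2h" into milliseconds.
function parseDuration(text: string): number {
  const match = /^(\d+)(s|m|h)$/.exec(text);
  if (match === null) throw new Error(`Unsupported duration: ${text}`);
  return Number(match[1]) * UNIT_MS[match[2]];
}

// Turn either form of declaration into a plan the scheduler can persist.
function planSleep(declaration: SleepDeclaration, now: number): SleepPlan {
  if (typeof declaration === "string") {
    return { wakeAt: now + parseDuration(declaration), event: null };
  }
  return { wakeAt: now + parseDuration(declaration.timeout), event: declaration.event };
}

// Decide whether to resume: a matching event wins, otherwise the deadline.
function wakeReason(
  plan: SleepPlan,
  now: number,
  incomingEvent: string | null
): "event" | "deadline" | null {
  if (plan.event !== null && incomingEvent === plan.event) return "event";
  if (now >= plan.wakeAt) return "deadline";
  return null;
}
```

Only the plan is persisted while the execution sleeps, which is why no function invocation has to stay open during the wait.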

Step isolation means that if alert-employee-21003 fails because that employee's Slack account is deactivated, the workflow is not blocked for alert-employee-21004. The failure is logged and traceable, but it doesn't cascade.
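The same isolation can be expressed as a concurrent fan-out, which is a variation on the sequential loop in the workflow above rather than a copy of it. This sketch assumes a send function that rejects on failure and uses Promise.allSettled so one rejection never short-circuits the rest; sendAllAlerts and AlertOutcome are hypothetical names.

```typescript
interface AlertOutcome {
  employeeId: string;
  ok: boolean;
  error: string | null;
}

// Send every alert, collecting per-employee outcomes instead of failing fast.
async function sendAllAlerts(
  employeeIds: string[],
  send: (employeeId: string) => Promise<void>
): Promise<AlertOutcome[]> {
  const settled = await Promise.allSettled(employeeIds.map((id) => send(id)));
  return settled.map((result, index) => {
    const employeeId = employeeIds[index];
    if (result.status === "fulfilled") {
      return { employeeId, ok: true, error: null };
    }
    const reason = result.reason;
    const error = reason instanceof Error ? reason.message : String(reason);
    return { employeeId, ok: false, error };
  });
}
```

The returned outcomes give the trace a record of exactly which alerts failed and why, without blocking the ones that succeeded.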

Payload encryption at step boundaries (covered in the security section above) ensures that the persisted state between steps is encrypted. If the execution framework's storage is compromised, the attacker gets encrypted blobs, and for any completed run the execution-scoped key that could decrypt them no longer exists.

Decision Context

Security teams and compliance auditors need to know what an agent did, why it did it, and on whose behalf. Every execution step, tool call, model inference, and guardrail evaluation produces a structured trace. These traces are sanitized: no employee names, no PII, no raw document content. References to subjects are stored as opaque reference IDs that only security administrators with the appropriate RBAC scope can resolve back to identifiable records. The trace captures the structure of the decision (which tool, which policy, which branch, what confidence level) without exposing the subjects of the decision.
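One way to keep subjects out of traces while leaving them resolvable is a small reference vault: the trace stores an opaque ID, and only callers holding a specific scope can map it back. The sketch below assumes Node's crypto module for ID generation; ReferenceVault, the trace fields, and the scope name are all hypothetical.

```typescript
import { randomUUID } from "node:crypto";

const RESOLVE_SCOPE = "trace:resolve-subject"; // hypothetical RBAC scope

class ReferenceVault {
  private bySubject = new Map<string, string>();
  private byRef = new Map<string, string>();

  // Return a stable opaque reference for a subject, minting one if needed.
  refFor(subjectId: string): string {
    let ref = this.bySubject.get(subjectId);
    if (ref === undefined) {
      ref = `ref_${randomUUID()}`;
      this.bySubject.set(subjectId, ref);
      this.byRef.set(ref, subjectId);
    }
    return ref;
  }

  // Only callers with the resolve scope can map a reference back.
  resolve(ref: string, callerScopes: string[]): string {
    if (!callerScopes.includes(RESOLVE_SCOPE)) {
      throw new Error("Caller lacks scope to resolve subject references");
    }
    const subject = this.byRef.get(ref);
    if (subject === undefined) throw new Error(`Unknown reference: ${ref}`);
    return subject;
  }
}

interface TraceEvent {
  executionId: string;
  step: string;
  tool: string | null;
  policy: string | null;
  outcome: string;
  subjectRef: string; // opaque; never the employee ID itself
}

// Build a trace entry that records the decision structure, not the subject.
function traceDecision(
  vault: ReferenceVault,
  executionId: string,
  step: string,
  outcome: string,
  employeeId: string
): TraceEvent {
  return {
    executionId,
    step,
    tool: null,
    policy: "SP",
    outcome,
    subjectRef: vault.refFor(employeeId),
  };
}
```

Random references avoid the dictionary attacks that a plain hash of an employee ID would allow, at the cost of having to protect the vault's mapping.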

This isn't just an observability feature. Comprehensive security context for agents requires capturing not just what was accessed, but the full decision chain: what the agent observed, what it considered, what it chose, and what it discarded. That context is what enables post-incident analysis and continuous improvement of guardrail policies.

The traces feed back into the platform's decision trace infrastructure (more on that in a future post), where they become queryable: "Show me every time an agent escalated a guardrail violation in the last 30 days" or "Which agents triggered the most tool approval requests this quarter?"
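Once traces are structured, those two questions become ordinary filters and aggregations. The sketch below runs them over an in-memory array; the record shape and the "escalated" outcome value are assumptions for illustration, and a real deployment would push the same logic into a query against the trace store.

```typescript
interface TraceRecord {
  agentId: string;
  kind: "guardrail" | "tool_approval" | "inference";
  outcome: string;
  timestamp: number; // epoch milliseconds
}

const DAY_MS = 86_400_000;

// "Show me every time an agent escalated a guardrail violation in the last N days."
function guardrailEscalations(traces: TraceRecord[], now: number, days: number): TraceRecord[] {
  return traces.filter(
    (trace) =>
      trace.kind === "guardrail" &&
      trace.outcome === "escalated" &&
      now - trace.timestamp <= days * DAY_MS
  );
}

// "Which agents triggered the most tool approval requests?" Sorted, busiest first.
function approvalRequestsByAgent(traces: TraceRecord[]): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const trace of traces) {
    if (trace.kind !== "tool_approval") continue;
    counts.set(trace.agentId, (counts.get(trace.agentId) ?? 0) + 1);
  }
  return Array.from(counts.entries()).sort((a, b) => b[1] - a[1]);
}
```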

What This Adds Up To

Security for AI agents isn't a single feature. It's a set of constraints that compound. Externalized payloads with per-execution encryption bound data exposure to a single run. RBAC with dual authorization (platform-level and system-of-record-level) ensures agents can only access what both the platform and the source system authorize. Guardrails at three checkpoints, with bring-your-own-policy support, let organizations encode their own rules without waiting for platform updates. Durable execution with step isolation contains failures and makes recovery automatic.

Each layer is straightforward on its own. Together, they form the posture that enterprise systems require before they'll let an agent touch production data.