AI Agent for Trucking: v1 Proposal

| Field | Value |
|---|---|
| Status | Draft: decisions locked, code not started |
| Owner | Scott Asher |
| Target repos | attunelogic-api, attunelogic-service |
| Industries | Trucking only (v1) |
| LLM provider | Anthropic (Claude) |
| Implementation plan | docs/plans/ai-agent-trucking-v1.md |
| Last updated | 2026-05-16 |

Reading order: start here for the what and why. When you're ready for the how and in what order, jump to the implementation plan, which mirrors the Phase 1 → 2 → 3 build order with checkbox-trackable todos.


Locked decisions (May 2026)

Captured during the 2026-05-16 pre-build review. These are settled β€” code can be written against them without re-litigation. Anything not on this list defers to the section it lives in below.

Scope & posture

  • Phase 1 and Phase 2 will be built; Phase 3 stays deferred until an Anthropic API key is acquired AND the data-policy review is signed off.
  • "Live for development purposes" means: Phase 1+2 code deploys to beta only. AI_AGENT_GLOBAL_ENABLED=true in beta, false in alpha and main. No external tenant gets the flag forced on until the pre-release checklist is green.
  • Code work has not started. Lock the decisions in this doc, then revisit kickoff after the current feature/tenant-roles-permissions work lands and the Anthropic key is in hand.

Names, paths & schemas (lock; renames cost migrations later)

  • Feature flag key: aiAgent.enabled
  • Config block: configs.aiAgent.{ model, monthlyTokenCap, perUserDailyMessageCap, monthlyCostCeilingUsd, allowedTools, defaultUndoWindowMs, dryRun }
  • System-level kill switch doc: Config doc { type: "system", key: "aiAgent.globallyEnabled" }
  • Env vars: AI_AGENT_GLOBAL_ENABLED, ANTHROPIC_API_KEY, AI_AGENT_DEFAULT_MODEL
  • Routes: POST /api/v1/ai/agent/messages, GET /api/v1/ai/agent/sessions/:id, GET /api/v1/system/ai-status (no auth), GET /api/v1/admin/ai-agent/health (super-admin), GET /api/v1/admin/ai-agent/:parentCompanyId/activity, PATCH /api/v1/account/feature-flags
  • Mongo collection: agentsessions (default Mongoose pluralization)
  • Source tree: src/services/ai/{ anthropic.js, agent/{ index.js, systemPrompt.js, addressRedactor.js, addressDetector.js, tools/ } }
  • Additive Job fields: aiCreated: Boolean (default false), aiCreatedAt: Date, pendingAiReview: Boolean (default false). All sparse-indexable, backwards compatible.

Cost ceilings & alerts (dev posture)

  • Internal/dev tenants: no hard auto-disable ceiling. App-side enforcement off for internal tenants.
  • Alerts only: super-admin gets a Sentry alert + email when:
    • Per-tenant daily cost crosses $10 for that tenant
    • Per-tenant MTD cost crosses $50 for that tenant
  • The L2 runtime kill switch and the per-tenant Force Off remain available as manual cutoffs if alerts trip.
  • External pilot ceilings are unchanged from the proposal: $25/tenant/mo, $200/day platform. They become live the moment a non-internal tenant is enabled. A tenant is treated as external when pilot.aiAgent === false (or unset) on the Customer doc; internal/dev tenants explicitly set pilot.aiAgent = "internal" to opt into alert-only mode.

Audit retention

  • AgentSession documents: 90-day TTL on startedAt. Mongo TTL index handles purge.
  • Rationale: balances storage, debugging headroom (3 months of user-reported issue investigation), and audit-exposure surface. Longer-term audit is satisfied by Sentry breadcrumbs + Mongo backups, not by keeping primary agentsessions indefinitely.
  • Revisit before GA if real usage patterns suggest a different number.
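
The 90-day TTL translates into an index option expressed in seconds. A minimal sketch of that arithmetic follows; the constant and variable names are hypothetical, and only the 90-day figure and the startedAt field come from this doc.

```javascript
// Hypothetical names; the 90-day retention and the startedAt field are from this doc.
const AGENT_SESSION_TTL_DAYS = 90;
const SECONDS_PER_DAY = 24 * 60 * 60;

// MongoDB TTL indexes express document lifetime in seconds.
const expireAfterSeconds = AGENT_SESSION_TTL_DAYS * SECONDS_PER_DAY; // 7,776,000

// Index definition in the shape Mongoose's schema.index(fields, options) accepts, e.g.
//   AgentSessionSchema.index(ttlIndexSpec.fields, ttlIndexSpec.options);
const ttlIndexSpec = {
  fields: { startedAt: 1 },
  options: { expireAfterSeconds },
};
```

MongoDB's TTL monitor runs as a periodic background task, so documents are removed shortly after they expire rather than at the exact second.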

Dry-run mode default

  • No global default. aiAgent.dryRun is set per-tenant, super-admin only.
  • New internal tenants are added with dryRun = true until the team is satisfied with tool-call quality on that tenant's actual data, then flipped to false deliberately.
  • This avoids both extremes: never accidentally writing for a tenant we haven't vetted, and never gating internal work behind dry-run forever.

Anthropic provider (Phase 3 only)

  • API key: to be created when Phase 3 starts. Not blocking Phase 1+2.
  • Data-policy review: narrowed scope is "client names + location names + city/state strings only." No addresses, lat/lng, postal codes, phone numbers, or emails leave the API. Sign-off required before Phase 3 begins.
  • Default model: pin to a specific dated Claude version at Phase 3 kickoff (current direction: claude-sonnet-4-...). Read from AI_AGENT_DEFAULT_MODEL env so version bumps don't require code changes.

Cross-references

  • Implementation plan with checkbox-trackable todos: docs/plans/ai-agent-trucking-v1.md
  • Risk callouts and the "lock this now or regret it" list live further down in this doc.

TL;DR

Add an in-app AI assistant on the service web that lets trucking customers create multi-leg Jobs from natural language ("schedule a load with 3 orders for Acme from Dallas to Houston"). The agent runs as an orchestrator on the API using Claude with tool-use, resolves entities through scoped read-only tools, and commits via the existing handleExtractedJobCreate pipeline so created records flow through current validation, tenancy, and audit. Created jobs are flagged as AI-originated and surfaced in the drawer with a 5-minute Undo window.

Two non-negotiable safety properties for launch:

  1. No addresses ever leave the API. The LLM works only with { id, name, city, state }. Street, ZIP, lat/lng, and pre-formatted address strings stay server-side.
  2. Four independent kill-switch layers gate every request, so the agent can be disabled at the env, runtime, tenant, or industry level without touching the others.

Stakeholder review checklist

Use this section when sharing with stakeholders to capture sign-off.

  • Product: confirm scope (trucking-only, create-only, drivers excluded, no net-new locations) is acceptable for v1
  • Security / Privacy: sign off on the no-address PII model and the four-layer kill switch
  • Operations: sign off on cost ceilings ($25/tenant/mo pilot, $200/day platform), runbook, and panic-button procedure
  • Anthropic data-policy review: narrowed scope is "client names + location names + city/state strings only"
  • Engineering: confirm phased build order is feasible and resource it

Architecture

flowchart LR
User["User (trucking tenant)"] -->|prompt| Drawer["AgentDrawer<br/>(service web)"]
Drawer -->|"POST /api/v1/ai/agent/messages"| Route["Express route<br/>L1+L2+L3+L4 gates<br/>+ rate limit"]
Route --> Orchestrator["Agent service<br/>(Claude + tool loop)"]
Orchestrator -->|tool calls| Tools["Scoped tools (parentCompany)"]
Tools --> Read[("Clients,<br/>Locations")]
Orchestrator -->|"createDraftJob tool"| Existing["createJob -> handleExtractedJobCreate<br/>(existing controller)"]
Existing --> DB[("MongoDB")]
Existing -->|"jobId + aiCreatedAt"| Drawer
Drawer -->|"5-min Undo<br/>DELETE /jobs/:id"| Existing

Core principle: the LLM never writes directly. It calls scoped read tools to resolve entities, then a single createDraftJob tool that calls the existing createJob controller path with an extractedData.legs payload. This reuses every guardrail already in attunelogic-api/src/controllers/jobs/create.js.

Critical implementation notes (validated against the codebase)

  • createDraftJob payload shape (deterministic, low-risk). The agent's tool always emits a fully pre-resolved payload (no name strings, no fuzzy fields), so the server-side OCR matching path is short-circuited:
    {
      "client": "<Client._id>",
      "appointmentDate": "<ISO>",
      "extractedData": {
        "legs": [
          { "origin": "<Location._id>", "destination": "<Location._id>", "resolvedLocation": { "id": "<Location._id>" }, "orderNumber": "1" },
          { "origin": "<Location._id>", "destination": "<Location._id>", "resolvedLocation": { "id": "<Location._id>" }, "orderNumber": "2" }
        ]
      },
      "aiCreated": true,
      "pendingAiReview": true
    }
    All location IDs come from searchLocations (which only returns IDs scoped to parentCompany). Client ID comes from searchClients. The LLM never invents IDs.
  • No driver assignment in v1. Setting leg.driver triggers handleUserAssignedToLeg, which sends notifications. Drivers are dispatched by humans after approval.
  • Approval status is NOT a schedule gate. schedule.all for trucking does not filter by approval.status, so a pending job is still visible on dispatch screens. Mitigation: new pendingAiReview flag on Job + an opt-in exclude filter so AI-created jobs stay off dispatch schedules until approved.
  • Search tools reuse existing endpoints rather than creating parallel ones:
    • searchClients → GET /clients?search=true&searchTerm=... (Atlas Search autocomplete on name)
    • searchLocations → GET /locations?search=true&searchTerm=...&clientId=... (regex-with-scoring)
  • Live flag refresh. ConfigProvider currently only initializes configs once; we patch its useEffect to re-sync on configData change so admin toggles propagate to the launcher without a page reload.
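
The pre-resolved payload and the "LLM never invents IDs" rule can be sketched as a small builder that refuses any ID the session's search tools did not return. This is an illustration under stated assumptions: the function name, its arguments, the seenIds set, and deriving orderNumber from leg position are all hypothetical; only the returned field names mirror the payload shape above.

```javascript
// Sketch only. Returned field names mirror the payload shape in this doc;
// buildDraftJobPayload, its arguments, and seenIds are hypothetical.
function buildDraftJobPayload({ clientId, appointmentDate, legs }, seenIds) {
  const referenced = [
    clientId,
    ...legs.flatMap((leg) => [leg.origin, leg.destination, leg.resolvedLocation.id]),
  ];
  // Reject anything that did not come back from searchClients/searchLocations.
  const unknown = referenced.filter((id) => !seenIds.has(id));
  if (unknown.length > 0) {
    throw new Error(`IDs not returned by a search tool in this session: ${unknown.join(", ")}`);
  }
  return {
    client: clientId,
    appointmentDate,
    extractedData: {
      // Copy fields explicitly so extras such as leg.driver can never pass through.
      legs: legs.map((leg, index) => ({
        origin: leg.origin,
        destination: leg.destination,
        resolvedLocation: { id: leg.resolvedLocation.id },
        orderNumber: String(index + 1), // assumption: sequential order numbers
      })),
    },
    aiCreated: true,
    pendingAiReview: true,
  };
}
```

Copying leg fields one by one, rather than spreading the LLM-supplied object, is what keeps driver assignment out of the payload even if the model emits it.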

Data sent to Anthropic (PII minimization)

Hard rule: no addresses ever leave the API. This includes street, postalCode, country, lat/lng, and any pre-formatted "123 Main St, Dallas TX 75001" strings. The LLM works entirely with identifiers + display labels (id + name + city + state).

flowchart LR
User[User prompt] --> Guard["Address detector<br/>(regex on inbound prompt)"]
Guard -->|address-shaped| Reject["400 ADDRESS_DETECTED<br/>friendly nudge in UI<br/>nothing forwarded"]
Guard -->|clean| Orch[Orchestrator]
Orch -->|searchLocations clientId, query| Tool[searchLocations tool]
Tool --> DB[(Locations<br/>full docs)]
DB --> Redact["addressRedactor<br/>→ {id, name, city, state}"]
Redact -->|redacted JSON| Orch
Orch -->|"createDraftJob<br/>legs[].origin={locationId},<br/>legs[].destination={locationId}"| Existing[handleExtractedJobCreate]
Existing --> Resolve["Server resolves IDs<br/>→ full Location docs<br/>(addresses stay server-side)"]

What goes to Anthropic:

  • The user's prompt (after passing the address-detector guard)
  • The system prompt (no tenant data)
  • Tool results in the form { id, name, city, state } only
  • Created job summaries by id + leg names

What never goes to Anthropic:

  • Street addresses, postal codes, country codes
  • Latitude / longitude
  • Pre-formatted address strings
  • Phone numbers, emails, contact names
  • Driver names or IDs (driver assignment is out of scope for v1)
  • Anything from the user's prompt that looks address-shaped (rejected pre-flight)
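
The pre-flight rejection of address-shaped prompts could look roughly like the following. The three pattern categories (street suffixes, ZIP/postal, lat/lng) come from this doc; the specific regexes, the detectAddress name, and the return shape are assumptions for illustration, and real patterns would need tuning against actual prompts.

```javascript
// Hypothetical patterns: this doc names the three categories, not the regexes.
const ADDRESS_PATTERNS = {
  // House number, up to four words, then a common street suffix.
  street: /\b\d{1,6}\s+(?:[A-Za-z0-9.'-]+\s+){0,4}(?:st|street|ave|avenue|blvd|boulevard|rd|road|dr|drive|ln|lane|hwy|highway)\b/i,
  // US ZIP or ZIP+4.
  postal: /\b\d{5}(?:-\d{4})?\b/,
  // Decimal latitude/longitude pair such as "32.7767, -96.7970".
  latLng: /-?\d{1,3}\.\d{3,}\s*,\s*-?\d{1,3}\.\d{3,}/,
};

// Returns which address categories, if any, appear in the prompt.
function detectAddress(prompt) {
  const kinds = Object.keys(ADDRESS_PATTERNS).filter((kind) =>
    ADDRESS_PATTERNS[kind].test(prompt)
  );
  return { detected: kinds.length > 0, kinds };
}
```

A regex guard like this trades precision for safety: a five-digit order number or a name such as "12 St Louis" would also be flagged, which fits a fail-closed posture but is worth measuring before launch.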

How it's enforced (six layers):

  1. Inbound guard. A regex-based addressDetector scans the user's message before it reaches the LLM. Address-shaped tokens (street suffixes like St/Ave/Blvd, ZIP/postal patterns, lat/lng pairs) trigger a friendly 400 with code ADDRESS_DETECTED. The message is never forwarded.
  2. Outbound redaction. Every tool that touches Location data passes its result through addressRedactor before returning to the orchestrator. Deny-list (drops street, postalCode, country, coordinates, formattedAddress) layered with an allow-list (only id, name, city, state survive).
  3. Reference-by-ID createDraftJob. The LLM cannot construct a location; it can only reference IDs returned by a prior searchLocations call in the same session. The server re-resolves the ID against the full DB record (with addresses) when calling handleExtractedJobCreate. The LLM never sees, types, or stores the address.
  4. AgentSession audit storage. Tool result snapshots persisted on the session use the redacted shape. Even if a future bug exposed sessions, no addresses would leak.
  5. Regression test gate. tests/services/ai/no-address-leak.test.js runs every tool against fixtures with recognizable addresses and fails if any address-shaped string appears in the orchestrator-bound payload or the persisted session. Runs in CI on every PR touching src/services/ai/**.
  6. UX nudge. The InputBar tells users up front: "Refer to locations by name - please don't paste addresses." The 400 error renders an inline nudge with a quick link to create a saved Location.
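
The outbound redaction in layer 2 can be illustrated with an allow-list copy. This sketch assumes city and state sit at the top level of the Location document (they may be nested in the real schema); redactLocation and ALLOWED_LOCATION_FIELDS are hypothetical names.

```javascript
// Only these fields may reach the orchestrator (allow-list from this doc).
const ALLOWED_LOCATION_FIELDS = ["id", "name", "city", "state"];

// Assumption: city/state are top-level fields on the Location document.
function redactLocation(doc) {
  const out = {};
  for (const key of ALLOWED_LOCATION_FIELDS) {
    if (key === "id") {
      // Accept either a plain id or a Mongo-style _id and normalise to a string.
      out.id = String(doc.id ?? doc._id);
    } else if (doc[key] !== undefined) {
      out[key] = doc[key];
    }
  }
  return out;
}
```

Because the function builds a new object from named fields, any address-bearing field added to the schema later is dropped by default; the deny-list described above would sit on top as a second check.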

Trade-offs the user should know about:

  • City/state ambiguity. Two locations in the same city share the city/state label. The LLM resolves them by name suffix ("Acme Dallas DC" vs "Acme Dallas Yard"). Search results return both with city/state to help disambiguate.
  • Net-new locations are out of scope for v1. v1 will not let the LLM create a Location, since creating one would require the LLM to handle an address. If the user wants a stop at a location not yet saved, the agent responds: "I can't add a new stop yet - please create the location first, then I can use it." This is a deliberate v1 limit that keeps the no-address rule airtight.
  • Slightly chattier prompts. Users have to refer to locations by name rather than describing them by address. Mitigated by the searchLocations autocomplete being good enough that "Acme Dallas" finds "Acme Dallas DC".

Anthropic data-policy review is now narrowly scoped to: client names + location names + city/state strings, in a context where Anthropic has zero data-retention beyond standard processing. This is a significantly smaller review than the original "names + addresses" scope.


Kill-switch hierarchy (defense in depth)

The agent is gated by four independent layers that are evaluated in order on every request. Any one of them can disable the system without touching the others. This is the core safety story for the launch.

flowchart TD
Req["Incoming agent request"] --> L1{"L1<br/>AI_AGENT_GLOBAL_ENABLED env var"}
L1 -->|false| Block1["503 Service Unavailable<br/>(no DB, no LLM, no logs beyond a counter)"]
L1 -->|true| L2{"L2<br/>System runtime flag<br/>(Config type=system)"}
L2 -->|false| Block2["503 - global runtime kill"]
L2 -->|true| L3{"L3<br/>Tenant flag<br/>featureFlags.aiAgent.enabled"}
L3 -->|false| Block3["403 - tenant has agent off"]
L3 -->|true| L4{"L4<br/>Industry gate<br/>appType === trucking"}
L4 -->|false| Block4["403 - wrong industry"]
L4 -->|true| L5["Run rate limits, cost ceiling,<br/>dry-run check, then handler"]

| Layer | Source of truth | Who controls it | How fast to flip | Use case |
|---|---|---|---|---|
| L1: Env var | AI_AGENT_GLOBAL_ENABLED env on every API instance | Devops (deploy or env update) | Minutes (rolling restart) | Hardest kill. Defaults to false in production; the agent is opt-in per environment. Survives DB outages. |
| L2: System runtime flag | Config doc with type: "system", key aiAgent.globallyEnabled | Super-admin via panic button | Seconds (cached 30s per process; cache busted on toggle) | Operational kill. Use during incidents, billing spikes, or vendor outages. No deploy needed. |
| L3: Tenant flag | Config.configs.featureFlagOverrides.aiAgent.enabled | Super-admin (Force On/Off) or tenant admin (self-serve) | Immediate | Per-customer enablement. The feature for everyday admin work. |
| L4: Industry gate | Customer.appType === "trucking" | N/A (derived from tenant data) | N/A | Belt-and-suspenders. Prevents accidental enablement on a non-trucking tenant. |
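
The evaluation order above can be sketched as a single function that checks the layers in sequence and stops at the first one that is off. The getter-style arguments are a hypothetical device to show that L1 is decided before any lookup runs; in the real middleware those lookups would be asynchronous DB or cache reads.

```javascript
// Sketch of the L1 → L4 order. Argument names are hypothetical; status codes
// are the ones shown in the flowchart.
function evaluateGates({ envEnabled, getRuntimeEnabled, getTenantEnabled, getAppType }) {
  // L1: env var, e.g. process.env.AI_AGENT_GLOBAL_ENABLED === "true". No lookups yet.
  if (!envEnabled) return { allowed: false, status: 503, layer: "L1" };
  // L2: system runtime flag (cached Config doc).
  if (!getRuntimeEnabled()) return { allowed: false, status: 503, layer: "L2" };
  // L3: resolved tenant feature flag.
  if (!getTenantEnabled()) return { allowed: false, status: 403, layer: "L3" };
  // L4: industry gate, independent of any flag.
  if (getAppType() !== "trucking") return { allowed: false, status: 403, layer: "L4" };
  return { allowed: true, status: 200, layer: null };
}
```

Returning the layer that blocked the request makes it straightforward to feed the monitoring counter and to assert on each gate in tests.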

Key properties:

  • L1 short-circuits everything. When the env var is false, the route returns 503 before any DB query, before reading any tenant config, before touching the LLM. No cost, no logs (except a counter for monitoring), no risk.
  • L2 has a 30s per-process cache so it doesn't add a DB hit per request. The panic-button toggle can bust the cache across all instances via a Mongo change stream or pub-sub; the simpler v1 option is to accept a kill window of up to 30s and document that expectation.
  • L4 is hard-coded by industry, not a flag. Even if someone Force On's the tenant flag for a service-repair tenant, the industry gate still blocks them. Cannot be bypassed from the admin UI.
  • The launcher in service web honors the same hierarchy. A new public-ish GET /api/v1/system/ai-status endpoint returns { globalEnabled } with no auth required (just the L1+L2 result), polled every 60s + on window focus by the service web. When global goes off, the launcher disappears from active sessions within 60s without requiring a re-login.
  • Circuit breaker can flip L2 automatically when error rate or platform cost exceeds threshold. Auto-recovery is intentionally manual: a human must verify before re-enabling.
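
The 30s per-process cache for L2 can be illustrated with a small time-based cache that also exposes an explicit bust. This is a sketch under assumptions: createRuntimeFlagCache and loadFlag are hypothetical names, and loadFlag is synchronous here only to keep the example short (the real read would be an async Config lookup).

```javascript
// Sketch of a per-process cache with a 30s lifetime and a manual bust.
// loadFlag is a hypothetical stand-in for reading the system Config doc.
function createRuntimeFlagCache(loadFlag, { ttlMs = 30_000, now = Date.now } = {}) {
  let cached;
  let fetchedAt = -Infinity; // forces a load on first use

  return {
    get() {
      if (now() - fetchedAt >= ttlMs) {
        cached = loadFlag();
        fetchedAt = now();
      }
      return cached;
    },
    // Called when the panic button toggles the flag on this process.
    bust() {
      fetchedAt = -Infinity;
    },
  };
}
```

Without a cross-instance bust, the worst-case delay between a toggle and every process honoring it is one cache lifetime, which is the 30s kill window discussed above.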

Automatic safety circuits

Beyond the manual kill switches, the system protects itself:

  • Error-rate circuit breaker. Sliding 5-minute window of Anthropic API calls. If error rate exceeds 25% over at least 10 calls, automatically flips L2 to off, sends Sentry alert + super-admin email. Manual flip required to re-enable.
  • Platform cost ceiling. Independent of per-tenant ceilings and computed from spend summed across all tenants. If 24h rolling spend exceeds the platform threshold (a configured value, e.g. $200/day to start), flips L2 to off + alerts.
  • Per-tenant cost ceiling. When a tenant's MTD cost hits its per-tenant monthlyCostCeilingUsd, that tenant is auto-disabled (their L3 effectively flips to off via a tenantSuspendedUntil field). Other tenants unaffected. Resets on the 1st of the next month.
  • Message rate limits. perUserDailyMessageCap (per user) and aiAgentLimiter (express-rate-limit, per user and per tenant). Returns 429 when exceeded.
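
The error-rate breaker described above (5-minute sliding window, more than 25% errors over at least 10 calls, manual reset) can be sketched as follows. The class and method names are hypothetical, and onTrip stands in for flipping L2 off and sending the alerts.

```javascript
// Sketch of the error-rate circuit breaker; thresholds are the ones in this doc.
class ErrorRateBreaker {
  constructor({ windowMs = 5 * 60 * 1000, threshold = 0.25, minCalls = 10, onTrip = () => {} } = {}) {
    this.windowMs = windowMs;
    this.threshold = threshold;
    this.minCalls = minCalls;
    this.onTrip = onTrip; // hypothetical hook: flip L2 off, alert super-admin
    this.calls = [];
    this.tripped = false;
  }

  // Record one provider call outcome and re-evaluate the window.
  record(ok, now = Date.now()) {
    this.calls.push({ ok, at: now });
    this.calls = this.calls.filter((call) => now - call.at < this.windowMs);
    const errors = this.calls.filter((call) => !call.ok).length;
    const total = this.calls.length;
    if (!this.tripped && total >= this.minCalls && errors / total > this.threshold) {
      this.tripped = true;
      this.onTrip({ errors, total });
    }
    return this.tripped;
  }

  // Recovery is manual by design: only an explicit reset re-arms the breaker.
  reset() {
    this.tripped = false;
    this.calls = [];
  }
}
```

The minimum-call floor keeps a single failure during a quiet period from tripping the breaker, and the absence of any timer-based recovery matches the "human must verify" rule.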

Dry-run mode

A per-tenant aiAgent.dryRun: true config flag (super-admin only) makes createDraftJob return a preview payload without creating a Job. Lets us:

  • Pilot in production with a real tenant and zero write risk
  • Demo the agent to a prospect without touching their data
  • Test the LLM's tool selection on real prompts before flipping write-mode on

The drawer UI shows a clear "Preview mode - no jobs will be created" banner when dryRun is on.
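
The dry-run branch amounts to returning the would-be payload instead of committing it. A minimal sketch, assuming the write path is injected: commitJob is a hypothetical stand-in for the existing job-creation pipeline, and the return shape is illustrative only.

```javascript
// Sketch of the dry-run branch. commitJob is a hypothetical stand-in for the
// existing create pipeline; it is synchronous here only for brevity.
function createDraftJobTool(payload, { dryRun, commitJob }) {
  if (dryRun) {
    // Preview mode: nothing is written, the payload is echoed back for review.
    return { dryRun: true, jobId: null, preview: payload };
  }
  const job = commitJob(payload);
  return { dryRun: false, jobId: job._id, preview: null };
}
```

Keeping the branch at the very top of the tool means every later step (validation, notifications, schedule updates) is unreachable in preview mode.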


Per-customer on/off (first-class requirement)

Two control surfaces: super-admin (full control) and tenant admin (self-serve for their own org only).

Super-admin (existing UI, just register the flag)

  • SuperAdmin > Feature Flags (/admin/feature-flags) → pick a tenant → the aiAgent.enabled row appears with Inherit / Force On / Force Off. No new admin screen needed.
  • Same UI also surfaces Tier Defaults, beta allowlist, and (new) the AI Activity widget below.

Tenant admin self-serve (new, scoped)

  • New endpoint PATCH /api/v1/account/feature-flags with verifyToken + admin (role) + verifyParent. Body: { featureKey, value } where value ∈ { true, false, null } (null = inherit).
  • Allowlist enforcement: the endpoint only accepts flags whose registry entry has tenantAdminToggleable: true. Any other key returns 403. This means we explicitly opt features in to self-serve: aiAgent.enabled is opted in; sensitive infra flags are not.
  • Writes to the same Config.configs.featureFlagOverrides that super-admin uses, so both surfaces share one source of truth and the resolver doesn't change.
  • Where it appears in the UI: new "AI Assistant" card in the existing tenant settings/account area (admin-visible only), with a single on/off toggle, short description, and a "Learn more" link. Toggle is hidden for non-admins.
  • Live effect: on success, optimistically refresh useConfig() so the launcher shows/hides immediately without a reload.
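
The allowlist rule for the tenant-admin endpoint can be sketched as a pure validation step run before any write. The registry shape follows the feature-flag entry described in this doc; validateFlagToggle, the error codes, and the 400 for a malformed value are assumptions.

```javascript
// Sketch of allowlist enforcement for PATCH /api/v1/account/feature-flags.
// Function name and error codes are hypothetical.
function validateFlagToggle(registry, { featureKey, value }) {
  const entry = registry[featureKey];
  // Unknown keys and keys not opted in to self-serve are both refused.
  if (!entry || entry.tenantAdminToggleable !== true) {
    return { status: 403, error: "FLAG_NOT_TOGGLEABLE" };
  }
  // true = on, false = off, null = inherit.
  if (![true, false, null].includes(value)) {
    return { status: 400, error: "INVALID_VALUE" };
  }
  return { status: 200 };
}
```

Treating an unknown key the same as a non-toggleable one avoids revealing which flag names exist to a tenant admin.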

Rollout knobs available out of the box

  • Default off for everyone via defaultEnabled: false + lifecycle: "beta" in the registry.
  • Force On / Force Off per tenant from super-admin (writes featureFlagOverrides).
  • Tenant admin self-serve via the new account-scoped endpoint.
  • Tier-based defaults (e.g., enable for tier3+) via existing Tier Defaults tab.
  • Beta allowlist (channel = latest) for staged rollouts without a Force On.
  • Kill switch: Force Off any tenant from super-admin to override even the tenant's own preference (Force Off beats tenant admin On per existing resolver precedence).

AI Activity widget (super-admin)

Lives on SuperAdmin > Feature Flags and renders when a tenant is selected and the registry contains aiAgent.enabled. Lets ops monitor adoption and cost per customer alongside the toggle.

  • Endpoint: GET /api/v1/admin/ai-agent/:parentCompanyId/activity (super-admin only).
  • Response shape:
    {
      enabled: boolean,
      tokenUsage: {
        monthToDate: { input: number, output: number, estimatedCostUsd: number },
        perDay: [{ date: "YYYY-MM-DD", input, output }] // last 30 days
      },
      sessionCounts: { mtd: number, today: number },
      jobsCreated: { mtd: number, today: number },
      recentSessions: [
        { _id, user: { _id, name }, startedAt, promptPreview, toolsUsed: [string], jobsCreatedCount, tokenUsage: { input, output }, status }
      ] // last 10
    }
  • Aggregations: computed from AgentSession (filtered by parentCompany). MTD aggregations use a single Mongo aggregation pipeline; recent sessions = find().sort({ startedAt: -1 }).limit(10).
  • Cost estimate: model-pricing table in src/services/ai/pricing.js (input/output $/1M tokens by model name), with values read from config so prices can be updated without a redeploy.
  • UI: new card built with existing shared/Card, shared/Badge, and a small inline sparkline. Shows totals at top, table below, sparkline on the right.
  • Empty state: when enabled === false and no sessions exist, shows "No AI activity for this tenant yet" with a hint to enable the flag.
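
The estimatedCostUsd figure follows directly from a $/1M-token pricing table. In the sketch below the model name and prices are placeholders, not real Anthropic pricing, and estimateCostUsd is a hypothetical helper name.

```javascript
// Placeholder pricing in USD per 1M tokens. These are NOT real prices;
// the real table would live in pricing.js and be read from config.
const PRICING_USD_PER_MTOK = {
  "example-model": { input: 3, output: 15 },
};

// cost = (input tokens / 1M) * input price + (output tokens / 1M) * output price
function estimateCostUsd(model, { input, output }) {
  const price = PRICING_USD_PER_MTOK[model];
  if (!price) throw new Error(`No pricing configured for model: ${model}`);
  return (input / 1e6) * price.input + (output / 1e6) * price.output;
}

// Example: 2,000,000 input + 500,000 output tokens
// = 2 * 3 + 0.5 * 15 = 13.5 USD at the placeholder prices.
const exampleCost = estimateCostUsd("example-model", { input: 2_000_000, output: 500_000 });
```

Throwing on an unknown model, rather than returning 0, keeps a missing pricing entry from silently hiding spend from the cost ceilings.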

API changes (attunelogic-api)

New dependency

  • @anthropic-ai/sdk (latest). API key in env: ANTHROPIC_API_KEY (add to .env.example, config/keys.js, config/index.js).

New files

  • src/services/ai/anthropic.js: singleton client wrapper, model selection (default claude-sonnet-4-...), token usage logger.
  • src/services/ai/agent/systemPrompt.js: tenant-aware system prompt (injects appType, today's date, tenant timezone, rules: "always confirm ambiguous client/location matches before creating", etc.).
  • src/services/ai/agent/index.js: runAgent({ messages, tenantContext }). Implements the Claude tool-use loop with hard cap of N iterations.
  • src/services/ai/agent/tools/index.js: tool registry exporting { name, description, input_schema, handler }.
  • src/services/ai/agent/tools/searchClients.js: fuzzy match on Client.name, scoped by parentCompany from customerConfigStorage. Returns top 5 with _id, name, location count.
  • src/services/ai/agent/tools/searchLocations.js: search by city/state/name, optionally filtered by clientId. Returns top 5, redacted to { id, name, city, state }.
  • src/services/ai/agent/tools/createDraftJob.js: builds extractedData payload and invokes existing createJob (or its inner handleExtractedJobCreate). Sets aiCreated: true, aiCreatedAt: new Date(), pendingAiReview: true on the Job document. Returns the created jobId + summary.
  • src/services/ai/agent/addressRedactor.js: utility that strips address-bearing fields from any object before it reaches the orchestrator.
  • src/services/ai/agent/addressDetector.js: regex guard for inbound user prompts.
  • src/controllers/ai/agent/index.js: messages.create handler. Validates body, attaches tenant + user context, calls runAgent, returns { messages, toolEvents, createdJobId? }. Persists transcript to a new AgentSession model.
  • src/models/AgentSession.js: { parentCompany, user, messages: [{ role, content, toolUses, toolResults, ts }], tokenUsage, createdJobIds }. Tenant-scoped, indexed on parentCompany + user.
  • src/routes/api/v1/ai/agent.js: POST /messages, GET /sessions/:id. Middlewares: verifyToken, admin, verifyParent, requireFeature("aiAgent.enabled"), dedicated aiAgentLimiter.
  • src/routes/api/v1/system/ai-status.js: GET /system/ai-status (no auth) returning { globalEnabled } for L1+L2 only.
  • src/routes/api/v1/admin/ai-agent/health.js: super-admin health endpoint.
  • src/routes/api/v1/account/feature-flags.js: tenant-admin allowlisted PATCH.
  • Wire all routes in src/routes/api/v1/index.js.

Touched files

  • src/models/Job.js: add optional aiCreated: Boolean, aiCreatedAt: Date, pendingAiReview: Boolean (additive, backwards compatible).
  • src/controllers/schedule/index.js: opt-in filter excluding pendingAiReview: true jobs from dispatcher schedule.
  • src/services/config/default-configs/feature-flags.js: register aiAgent.enabled with { lifecycle: "beta", defaultEnabled: false, description: "AI assistant for creating loads from natural language", tenantAdminToggleable: true }.
  • src/services/config/default-configs/index.js: add aiAgent config block: { model, monthlyTokenCap, perUserDailyMessageCap, monthlyCostCeilingUsd, allowedTools, defaultUndoWindowMs, dryRun }.
  • src/middlewares/rateLimiting.js: add aiAgentLimiter (e.g., 30 req / 5 min per user, 500 req / day per tenant; values in config).
  • src/middlewares/featureGates.js (new or existing): add requireAppType("trucking") and requireGlobalAiEnabled middlewares.
  • .env.example, config/keys.js, config/index.js: ANTHROPIC_API_KEY, AI_AGENT_DEFAULT_MODEL, AI_AGENT_GLOBAL_ENABLED.

Tests (tests/controllers/ai/agent/, tests/services/ai/, tests/controllers/account/)

  • Tool-call loop with mocked Anthropic client (single-leg, multi-leg, ambiguous client → asks for clarification).
  • Tenancy: tools cannot return data from other parentCompany values.
  • Feature flag off → 403 on every agent route.
  • L1 env var off → 503 before any DB hit.
  • L2 runtime flag off → 503 within cache window.
  • L4 industry gate → 403 for non-trucking tenants even with flag forced on.
  • Rate limit triggers 429.
  • Created Job carries aiCreated: true, pendingAiReview: true, hidden from dispatcher schedule.
  • Undo: DELETE /jobs/:id within window succeeds.
  • Tenant-admin toggle: PATCH /account/feature-flags as admin enables/disables aiAgent.enabled and the change is reflected in GET /config. Same call with a non-toggleable flag → 403. Same call as a non-admin user → 403. Cross-tenant attempt blocked by verifyParent.
  • AI Activity endpoint: super-admin gets correct aggregates; non-super-admin gets 403; another tenant's data is never returned.
  • tests/services/ai/no-address-leak.test.js: runs every tool against address-bearing fixtures and asserts zero address-shaped strings reach the orchestrator-bound payload or the persisted AgentSession. CI gate for src/services/ai/**.
  • Address detector: positive tests for street suffixes, ZIP/postal, and lat/lng patterns returning 400 ADDRESS_DETECTED.

Service web changes (attunelogic-service)

New files

  • src/redux/services/ai/agentApi.js: RTK Query slice with sendAgentMessage mutation, getAgentSession query, getAiSystemStatus query. Invalidates Jobs/Schedule tags on successful job creation.
  • src/components/AIAgent/AgentLauncher.jsx: FAB. Stacks above ChatLauncher using the existing --right-sidebar-offset CSS var. Hidden when L1/L2/L3/L4 disable the agent.
  • src/components/AIAgent/AgentDrawer.jsx: uses shared/Drawer. Sections: message list, tool-call status pills ("Looking up Acme...", "Checked Dallas locations"), composer, and the Undo banner.
  • src/components/AIAgent/MessageList.jsx, InputBar.jsx, UndoBanner.jsx, ToolCallPill.jsx.
  • Optional v1.1: src/components/AIAgent/JobPreviewCard.jsx: clickable card linking to /jobs/:id after creation.
  • src/pages/SuperAdmin/FeatureFlags/AiSystemStatusPanel.jsx: health endpoint output + panic button.
  • src/components/Settings/AiAssistantCard.jsx: tenant-admin on/off toggle.

Touched files

  • src/layouts/Dashboard/index.jsx: mount <AgentLauncher /> + <AgentDrawer /> next to existing ChatWidget/ChatLauncher.
  • src/hooks/useConfig.tsx: fix ConfigProvider useEffect to re-sync configs whenever configData changes.
  • src/pages/SuperAdmin/FeatureFlags/index.tsx: mount AI Activity widget + AI System Status panel.

Undo flow

  • On agent response containing createdJobId, drawer starts a 5-minute countdown banner with "Undo" → fires existing useDeleteJobMutation and shows a confirmation toast. After expiry, banner converts to a passive "Created job #X - view" link.
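
The countdown in the Undo banner reduces to a remaining-time calculation against the job's aiCreatedAt. A minimal sketch, assuming the window defaults to 5 minutes (the config block's defaultUndoWindowMs would supply the real value); undoRemainingMs is a hypothetical helper name.

```javascript
// Milliseconds left in the Undo window; 0 once the window has expired.
// The windowMs default mirrors the 5-minute window in this doc.
function undoRemainingMs(aiCreatedAt, now = Date.now(), windowMs = 5 * 60 * 1000) {
  const elapsed = now - new Date(aiCreatedAt).getTime();
  return Math.max(0, windowMs - elapsed);
}
```

Deriving the countdown from the server-provided aiCreatedAt, rather than from when the drawer rendered, keeps the banner correct if the user closes and reopens the drawer mid-window.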

No-address UX

  • InputBar shows persistent helper text: "Refer to locations by name (e.g. 'Acme Dallas DC') - please don't paste addresses."
  • If the API returns the 400 ADDRESS_DETECTED error code, show an inline error nudging the user to use a saved location name and offer a quick link to create one.
  • Tool-call pills render only redacted fields (name + city/state).

Phased build order

The live todos for this section are tracked in the implementation plan. This proposal lists the phase goals; the plan lists the per-task checkboxes you tick off as work lands.

The Anthropic data-policy review is a blocker for sending real customer data to the LLM, but it does NOT block any of the surrounding infrastructure. We build in 3 phases and only Phase 3 needs the policy decision.

Phase 1: Infrastructure & kill switches (no LLM, no Anthropic key needed)

Goal: a fully gated, observable, killable system before a single token is spent.

  • L1 env-var gate (AI_AGENT_GLOBAL_ENABLED) wired into config and middleware.
  • L2 system-runtime kill switch (Config doc type: "system") + cache + super-admin endpoint to flip it.
  • L3 register aiAgent.enabled in the feature-flag registry; tenantAdminToggleable: true.
  • L4 industry-gate middleware (requireAppType("trucking")).
  • aiAgent config block (model, monthlyTokenCap, perUserDailyMessageCap, monthlyCostCeilingUsd, allowedTools, defaultUndoWindowMs, dryRun).
  • Job.aiCreated, Job.aiCreatedAt, Job.pendingAiReview fields + schedule filter excluding pending-review jobs by default.
  • Tenant-admin PATCH /account/feature-flags endpoint with allowlist enforcement.
  • GET /api/v1/system/ai-status (no auth; returns { globalEnabled } only).
  • GET /api/v1/admin/ai-agent/health (super-admin): initially returns env/runtime status only.
  • Service: fix ConfigProvider live-refresh; add tenant settings "AI Assistant" toggle card; add Super-Admin "AI System Status" panel with panic button + AI Activity widget shell; launcher polls /system/ai-status every 60s.

Ship value: every kill switch in place and verifiable. The platform can guarantee "AI is off" before any AI exists.

Phase 2: Tools & scaffolding with stubbed LLM (still no real Anthropic call)

  • addressRedactor utility + tests/services/ai/no-address-leak.test.js regression suite.
  • Inbound addressDetector regex guard at the route boundary returning 400 ADDRESS_DETECTED.
  • AgentSession model + audit storage (PII-trimmed tool results β€” redacted shape only).
  • Scoped tools (searchClients, searchLocations, createDraftJob) with full unit tests including cross-tenant isolation tests AND no-address-leak snapshot tests.
  • aiAgentLimiter (express-rate-limit), iteration cap, token cap, dry-run mode plumbing in createDraftJob.
  • Agent route + controller with a stubbed LLM provider (returns canned tool-call sequences). End-to-end test: stubbed LLM emits "call createDraftJob with these IDs" → real Job is created with aiCreated: true, pendingAiReview: true, no driver, hidden from schedule by default.
  • AI agent drawer/launcher UI shell wired to the route, gated behind L1+L2+L3+L4.
  • Undo flow end-to-end (UI banner + DELETE call).
  • Circuit-breaker stub: track call counts and error rates against the stubbed provider so we can unit-test the auto-trip logic.
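
A stubbed provider plus an iteration-capped loop is enough to exercise the Phase 2 plumbing. The sketch below is illustrative only: the reply objects ({ type: "tool" | "text", ... }) are a simplified hypothetical shape, not the Anthropic SDK's message format, and runAgent here takes an injected llm and tools map rather than the tenantContext described earlier.

```javascript
// Sketch of a tool loop with a hard iteration cap and a stubbed LLM.
// Reply shapes are hypothetical and much simpler than a real provider's.
async function runAgent({ messages, llm, tools, maxIterations = 5 }) {
  const transcript = [...messages];
  const toolEvents = [];
  for (let i = 0; i < maxIterations; i++) {
    const reply = await llm(transcript);
    if (reply.type === "text") {
      transcript.push({ role: "assistant", content: reply.text });
      return { messages: transcript, toolEvents, stopped: "done" };
    }
    const handler = tools[reply.name];
    if (!handler) throw new Error(`Unknown tool: ${reply.name}`);
    const result = await handler(reply.input);
    toolEvents.push({ name: reply.name, input: reply.input, result });
    transcript.push({ role: "tool", name: reply.name, content: result });
  }
  // The model kept asking for tools; stop instead of looping forever.
  return { messages: transcript, toolEvents, stopped: "iteration_cap" };
}

// Stub provider: replays canned steps, repeating the last one when exhausted.
function stubLlm(steps) {
  let index = 0;
  return async () => steps[Math.min(index++, steps.length - 1)];
}
```

Because the provider is injected, the same loop can later be pointed at the real client in Phase 3 while the Phase 2 tests keep running against the stub.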

Ship value: entire system testable, killable, observable, and reviewable without any real LLM call.

Phase 3: Wire Anthropic (requires data-policy decision + API key)

  • Implement src/services/ai/anthropic.js (fail-closed when ANTHROPIC_API_KEY missing → 503).
  • Real tool-use loop in runAgent with iteration cap and token accounting.
  • Tenant-aware system prompt with timezone, today's date, and "always confirm ambiguous matches" rule.
  • Cost estimation against pricing.js table; enforce per-tenant monthly $ ceiling and platform 24h ceiling; circuit breaker flips L2 on threshold breach.
  • Wire health endpoint to report anthropicReachable, lastSuccessfulCallAt, real error rates, MTD cost.
  • Internal smoke test in dry-run mode against a test tenant. Then internal employee tenant with real writes. Then 1 friendly external pilot tenant.

Pre-release safety checklist (run before any production tenant is enabled)

Every box must be green. This is not optional.

Kill-switch verification (end-to-end, in alpha)

  • L1 env var = false → API returns 503; service launcher hidden within 60s
  • L2 runtime flag flipped via panic button → API returns 503; cache busts within 30s; launcher hidden
  • L3 tenant flag off → API returns 403; launcher hidden for that tenant only
  • L4 industry gate → service-repair tenant with flag forced on still gets 403
  • Restart all API instances with AI_AGENT_GLOBAL_ENABLED=false → smoke check that no agent traffic succeeds
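
The expected status codes in this checklist can be summarised as a single gate function. This is a sketch under assumptions: resolveAgentGate and its argument names are hypothetical, the industry value "trucking" is assumed, and the evaluation order L1, L2, L3, L4 is one reasonable reading of the layers rather than a confirmed design.

```javascript
// Evaluates the four kill-switch layers in order and reports which one blocked.
// L1/L2 are platform-wide (503); L3/L4 are tenant-specific (403).
function resolveAgentGate({ envEnabled, runtimeEnabled, tenantFlagEnabled, industry }) {
  if (!envEnabled) return { allowed: false, status: 503, layer: "L1" };
  if (!runtimeEnabled) return { allowed: false, status: 503, layer: "L2" };
  if (!tenantFlagEnabled) return { allowed: false, status: 403, layer: "L3" };
  // L4 applies even when the tenant flag is forced on.
  if (industry !== "trucking") return { allowed: false, status: 403, layer: "L4" };
  return { allowed: true, status: 200, layer: null };
}
```

Returning the blocking layer alongside the status would make each checklist item assertable in a test without inspecting logs.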

Cost & rate controls

  • Per-user daily message cap tested (429 after threshold)
  • Per-tenant monthly token cap tested (graceful rejection)
  • Per-tenant monthly $ ceiling tested (auto-disables that tenant only)
  • Platform 24h $ ceiling tested (trips L2 + alert)
  • Error-rate circuit breaker tested (forced 50% error rate trips L2 + alert)
  • Manual recovery from circuit-breaker trip works (super-admin flips L2 back)
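
One way to make the error-rate trip unit-testable is a small rolling-window breaker. The sketch below is illustrative only: createErrorRateBreaker, windowSize, minCalls, errorRateThreshold, and onTrip are hypothetical names, and the actual window and threshold values are not specified in this document.

```javascript
// Rolling-window error-rate breaker. Trips once, then stays tripped until
// reset() is called, mirroring the manual super-admin recovery step.
function createErrorRateBreaker({ windowSize, minCalls, errorRateThreshold, onTrip }) {
  const outcomes = []; // true = error, false = success
  let tripped = false;
  return {
    record(isError) {
      if (tripped) return; // ignore traffic after the trip
      outcomes.push(Boolean(isError));
      if (outcomes.length > windowSize) outcomes.shift();
      // Avoid tripping on a tiny sample (e.g. the first call failing).
      if (outcomes.length < minCalls) return;
      const errors = outcomes.filter(Boolean).length;
      const rate = errors / outcomes.length;
      if (rate >= errorRateThreshold) {
        tripped = true;
        onTrip({ rate, sampleSize: outcomes.length }); // e.g. flip L2 + alert
      }
    },
    isTripped: () => tripped,
    // Manual recovery only: nothing in the breaker re-enables itself.
    reset() {
      tripped = false;
      outcomes.length = 0;
    },
  };
}
```

In this sketch onTrip is where flipping L2 and sending the alert would be wired in, so the trip logic itself stays free of I/O and can run against the stubbed provider.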

Data & write safety

  • Cross-tenant isolation: tools never return data from another parentCompany (covered by tests)
  • LLM cannot invent IDs (createDraftJob rejects IDs not produced by recent search tool calls in same session)
  • AI-created jobs are flagged and hidden from dispatcher schedule by default
  • No leg.driver is ever set by the agent (verified by test)
  • Undo within 5 min cleanly deletes the job; AgentSession records the undo
  • Dry-run mode in a production tenant does NOT create any Job
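
The "LLM cannot invent IDs" rule could be enforced with a per-session ledger of IDs that the search tools actually returned. The following is a sketch under assumptions: createSessionIdLedger, validateDraftJobInput, the clientId and locationIds fields, and the UNKNOWN_ID code are hypothetical, and this version tracks IDs for the whole session rather than applying a recency window.

```javascript
// Records every ID the search tools returned during one agent session.
function createSessionIdLedger() {
  const seen = new Set();
  return {
    // Called with the IDs returned by searchClients / searchLocations.
    recordSearchResults(kind, ids) {
      for (const id of ids) seen.add(`${kind}:${id}`);
    },
    has: (kind, id) => seen.has(`${kind}:${id}`),
  };
}

// Rejects a draft-job request if any referenced ID was never surfaced
// by a search in this session (i.e. the model may have invented it).
function validateDraftJobInput(ledger, input) {
  const unknown = [];
  if (!ledger.has("client", input.clientId)) unknown.push(`client:${input.clientId}`);
  for (const id of input.locationIds) {
    if (!ledger.has("location", id)) unknown.push(`location:${id}`);
  }
  return unknown.length === 0
    ? { ok: true }
    : { ok: false, code: "UNKNOWN_ID", unknown };
}
```

Keying entries by kind keeps a client ID from being accepted where a location ID is expected, even if the two happen to share a value.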

No-address guarantee (PII minimization)

  • no-address-leak.test.js is green and wired into CI for src/services/ai/** paths
  • Manual proxy-trace of a real session in alpha confirms outbound Anthropic payload contains zero address-shaped strings
  • Inbound address detector rejects pasted addresses with 400 ADDRESS_DETECTED (positive test for street, ZIP, and lat/lng patterns)
  • AgentSession docs in alpha verified to contain only redacted location shapes
  • InputBar helper text + inline error nudge render correctly
  • Anthropic data-policy review signed off for the narrowed scope (names + city/state only; no addresses)
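
For the inbound detector item above, a rough sketch of the three pattern families (street, ZIP, lat/lng) might look like this. The regexes, detectAddress, and guardInboundMessage are illustrative assumptions, not the planned addressDetector; the patterns are deliberately over-eager (any bare five-digit number reads as a ZIP) and would need tuning against real prompts.

```javascript
const ADDRESS_PATTERNS = {
  // A number, one to three words, then a common street suffix: "123 Main St".
  street: /\b\d{1,6}\s+(?:[A-Za-z0-9.]+\s+){1,3}(?:st|street|ave|avenue|rd|road|blvd|boulevard|dr|drive|ln|lane|hwy|highway)\b/i,
  // US ZIP or ZIP+4.
  zip: /\b\d{5}(?:-\d{4})?\b/,
  // Decimal coordinate pair such as "35.2271, -80.8431".
  latLng: /-?\d{1,2}\.\d{3,}\s*,\s*-?\d{1,3}\.\d{3,}/,
};

// Returns the first pattern family that matches, if any.
function detectAddress(text) {
  for (const [kind, pattern] of Object.entries(ADDRESS_PATTERNS)) {
    if (pattern.test(text)) return { detected: true, kind };
  }
  return { detected: false, kind: null };
}

// Shapes the result the way a route-boundary guard might respond.
function guardInboundMessage(text) {
  const hit = detectAddress(text);
  return hit.detected
    ? { status: 400, body: { error: "ADDRESS_DETECTED", kind: hit.kind } }
    : { status: 200, body: null };
}
```

A guard of this shape errs toward false positives, which fits a fail-closed posture: a rejected message costs the user a rephrase, while a missed address would reach the provider.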

Observability

  • Sentry receives breadcrumbs for tool loop iterations and provider errors
  • AI Activity widget shows correct MTD usage for a test tenant after a session
  • Logs verified PII-free (no full prompts, no full client lists, no tokens, no addresses at info level)
  • Health endpoint returns sane values from a freshly restarted instance
  • Alert email actually delivers when circuit breaker trips (test in alpha)

Process

  • Runbook published in attunelogic-docs/docs/operations/ai-agent-runbook.md
  • On-call rotation knows about the panic button and how to use it
  • First pilot tenant has been briefed and given the Undo + report-issue paths
  • Per-tenant monthlyCostCeilingUsd set conservatively for pilot (e.g. $25)
  • Platform dailyCostCeilingUsd set (e.g. $200 for pilot, raise as adoption grows)

Pilot rollout plan (gated by checklist above)

  1. Internal alpha (employees only): dry-run mode on a test tenant; verify tool selection quality on 50+ real-world prompts.
  2. Internal beta (employees only): dry-run off, real writes, hard $5/day ceiling, ~1 week.
  3. Friendly external pilot (1 tenant): real writes, $25/month ceiling, daily check-in for first week.
  4. Expanded pilot (3-5 tenants): real writes, monitor AI Activity widget across all of them.
  5. GA flip: lifecycle beta → ga in registry; tier defaults can opt in tier3+ tenants by default. Tenant admins can self-serve from there.

Any pilot tenant can be cut at any moment via L3 (super-admin Force Off). All pilot tenants can be cut simultaneously via L2 (panic button). The whole platform can be cut via L1 (env redeploy).


Out of scope for v1 (called out for follow-ups)

  • Service/Repair industry support (separate tool set + system prompt).
  • Read-only Q&A ("show me this week's loads") and updates/reschedules.
  • Mobile assistant.
  • Streaming responses (v1 returns full reply when tool loop ends; v1.1 can switch to SSE).
  • Document/email ingestion ("create jobs from this attached PDF"): the extractedData pipeline already supports it; we'd add an ingestDocument tool later.
  • Net-new Location creation by the agent (would require LLM to handle addresses; deliberately deferred).
  • Driver assignment by the agent (would trigger notifications; humans dispatch after approval).

Open / pending external decisions

  • Anthropic data-policy review (blocks Phase 3 only): scope narrowed to client names + location names + city/state strings, with no addresses, lat/lng, postal codes, phone, or email. Pending stakeholder sign-off.
  • Anthropic API key: not yet created; build will wire env var and a fail-closed 503 when missing so Phase 1/2 can ship without it.
  • Per-tenant monthly $ ceiling default: starting recommendation is $25/month for pilots, which is easy to raise per tenant via super-admin once we see real usage patterns. Confirm before Phase 3.
  • Platform 24h $ ceiling: starting recommendation $200/day. Confirm before Phase 3.
  • Default model: starting recommendation claude-sonnet-4-... (latest). Confirm before Phase 3.

Branches

  • API: feature/ai-agent-trucking-v1 (already created)
  • Service: feature/ai-agent-trucking-v1 (to create when Phase 2 begins)
  • Promotion: feature/* → beta → alpha → main per 44-release-branch-policy

  • Implementation plan with checkbox-trackable todos: docs/plans/ai-agent-trucking-v1.md
  • Operational runbook (to be authored in Phase 1): docs/operations/ai-agent-runbook.md
  • Local Cursor plan (live todo state, not committed): ~/.cursor/plans/ai_agent_trucking_v1_*.plan.md
  • Existing Job extraction pipeline: attunelogic-api/docs/JOB_EXTRACTION_API.md and attunelogic-api/src/controllers/jobs/create.js#handleExtractedJobCreate
  • Feature flag system: attunelogic-api/src/services/feature-flags/resolveFeatureFlags.js, attunelogic-service/src/pages/SuperAdmin/FeatureFlags/index.tsx
  • Cross-repo branch policy: 44-release-branch-policy