
Session Apr 17 2026 — Monorepo Prod Cutover + Caller-ID Feature Set

Long session. First real prod deploy of the monorepo, plus the full outbound caller-ID priority chain, per-DID routing environments, Super Human agent extension flow, and cleanup of stale test orgs from prod.

High-level summary

  • Prod cut over to monorepo deploys for the first time (api, editor, workflow-engine, pipecat-flow). Zero customer downtime.
  • Prod cleaned up: Acme Corporation, TechStart Inc, AstraPrivate deleted. Only GrandEstancia remains as a live prod tenant. Staging forwarding still works via orphaned DIDs.
  • Staging cleaned up: 30 → 11 DIDs, Dharan org removed, AstraPrivate retained.
  • Live customer bug squashed: GE inbound calls to queue 5003 were falling through to the AI Agent — queue had no Answer() before Queue(). Hot-fixed in live prod, then committed.
  • Complete outbound caller-ID story landed: per-call API > per-user > org default > first assigned > NUC fallback. Works across both editor surfaces + API paths.
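
The priority chain in the last bullet can be read as a first-match lookup. The sketch below is illustrative only and is not the actual resolveCallerId code: the function name `resolve_caller_id`, its parameters, and the rule that every candidate must still be assigned to the org are assumptions made for the example. Returning `None` stands in for "no candidate applied, so the NUC fallback takes over".

```python
from typing import Optional, Sequence


def resolve_caller_id(
    per_call: Optional[str],
    user_outbound_did: Optional[str],
    org_default: Optional[str],
    org_dids: Sequence[str],
) -> Optional[str]:
    """Return the first usable caller ID in priority order.

    org_dids is assumed to list the org's DIDs in assignment order.
    None means nothing matched and the NUC fallback applies.
    """
    assigned = set(org_dids)
    # Priority: per-call API > per-user > org default.
    for candidate in (per_call, user_outbound_did, org_default):
        # Re-validation (assumption): skip a DID the org no longer holds.
        if candidate and candidate in assigned:
            return candidate
    # Next in the chain: the first assigned DID, if the org has any.
    if org_dids:
        return org_dids[0]
    return None
```

With this shape, a user whose outbound_did points at a released DID falls through to the org default rather than presenting a number the org does not hold.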

What was shipped

Core features

| PR | Branch | Feature |
|---|---|---|
| #17/#18 | feat/per-user-outbound-did | users.outbound_did column + Users page dropdown |
| #19/#20 | feat/org-default-did | did_numbers.is_default + Phone Numbers UI "Set as Default Caller ID" + auto-default on first DID assign |
| #21/#22 | fix/did-admin-put-allow-routing-env | Admin UI dropdown for routing_environment was silently ignored: routing_environment missing from allowed-fields list |
| #23/#24 | feat/super-human-agent-extension | Super Human dialog create + callable extension assignment for bots (editor-side orchestration linking bot + user via routing_destination URL) |
| #25/#26 | fix/generator-orphan-staging-dids | Dispatcher generator emits staging-forwarding for orphan DIDs (no owning org needed) |
| #27/#28 | fix/org-creation-flows | request-org was crashing on every signup (missing context_prefix); admin create-org never created owner org_users row |
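
The routing_environment fix (#21/#22) is an instance of a common allow-list pitfall: a PUT handler that copies only listed fields silently drops anything missing from the list. The sketch below shows one way to make that class of bug loud by rejecting unknown fields instead of dropping them. It is not the project's handler; `ALLOWED_DID_FIELDS`, `filter_did_update`, and the `org_id` field are hypothetical names, and only `is_default` and `routing_environment` come from the notes above.

```python
# Hypothetical allow-list for the DID admin PUT endpoint.
ALLOWED_DID_FIELDS = {"org_id", "is_default", "routing_environment"}
# Values of the routing_environment ENUM described in these notes.
VALID_ROUTING_ENVIRONMENTS = {"prod", "staging", "oss"}


def filter_did_update(body: dict) -> dict:
    """Validate a PUT body, failing loudly instead of dropping fields."""
    unknown = set(body) - ALLOWED_DID_FIELDS
    if unknown:
        # A silent drop here is how the dropdown appeared to be ignored.
        raise ValueError(f"unsupported fields: {sorted(unknown)}")
    env = body.get("routing_environment")
    if env is not None and env not in VALID_ROUTING_ENVIRONMENTS:
        raise ValueError(f"invalid routing_environment: {env!r}")
    return dict(body)
```

Whether to reject or merely log unknown fields is a design choice; rejecting surfaces a missing allow-list entry on the first request rather than after someone notices a setting did not stick.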

Infrastructure + data

  • users.outbound_did VARCHAR(20) added to both staging + prod DBs
  • did_numbers.is_default BOOLEAN DEFAULT 0 added to both; backfilled first-assigned DID per org as default
  • did_numbers.routing_environment ENUM('prod','staging','oss') DEFAULT 'prod' added to both
  • Staging cleaned: 19 stray prod-cloned DIDs + Dharan org removed
  • Prod cleaned: Acme, TechStart, AstraPrivate orgs + their users/queues deleted. DIDs released back to pool. 78001/78003 orphaned with routing_environment='staging'.

NUC changes

  • from-cloud range validation extended — Indian local format (0XXXXXXXXX) auto-normalised to 91XXXXXXXXX before the range check. Fixes GE outbound caller ID showing as 78001 instead of GE's 78002.
  • Range preserved: 918065978000 through 918065978029 is still the valid set.
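
The normalise-then-check order described above can be sketched as follows. The real check runs on the NUC and is not written in Python; this only illustrates the logic. The helper names are hypothetical, and the example input assumes the local form is a leading 0 followed by the national number.

```python
# Valid outbound caller-ID range from the notes above (inclusive).
RANGE_START = 918065978000
RANGE_END = 918065978029


def normalise_cid(cid: str) -> str:
    """Rewrite Indian local format (leading 0) to the 91-prefixed form."""
    digits = cid.strip()
    if digits.startswith("0"):
        return "91" + digits[1:]
    return digits


def in_valid_range(cid: str) -> bool:
    """Normalise first, then apply the range check."""
    digits = normalise_cid(cid)
    return digits.isdigit() and RANGE_START <= int(digits) <= RANGE_END
```

Without the normalisation step, a local-format number such as 08065978002 fails the range check even though it names a DID inside the range, which matches the symptom of GE's caller ID being replaced.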

Staging PJSIP

  • tata_gateway_identify now matches both 10.10.10.2 (NUC) and 10.10.10.1 (prod forwarding) so staging-routed DIDs can flow prod → staging.

Bug fixes documented

See Troubleshooting entries 28–35:

  • Error 28 — request-org notNull violation (every self-serve signup broken)
  • Error 29 — admin create-org missing org_users owner (admin-created orgs unusable)
  • Error 30 — duplicate DID records routing to wrong org
  • Error 31 — queue without Answer() → fell through to AI Agent (real customer impact)
  • Error 32 — staging PJSIP identify missing prod IP (staging forwarding auth failed)
  • Error 33 — NUC clobbered Indian local-format outbound CID
  • Error 34 — admin UI routing_environment silently dropped by PUT endpoint
  • Error 35 — dispatcher required an org for every DID

New internal-docs pages

Operational hygiene

  • Auto-sync workflow added: every push to main force-updates staging branch so --delete-branch can't leave us with a missing staging
  • 6 merged feature branches cleaned up from both remote + local clones
  • dev-deploy.sh script (committed earlier) used extensively — cut feedback cycle from ~5 min (CI/CD) to ~5s (api) / ~3-4 min (editor) for staging experiments

Known followups (not urgent)

  • OSS dispatching kept on NUC deliberately for security isolation. If OSS grows real customers, revisit with a WireGuard tunnel prod ↔ OSS (defence in depth via SIP auth + WG private key + IP whitelist)
  • Queue failover destination is hardcoded to ext 1003 (AI Agent) in the generator. Owner accepted this as intentional — "max timeout routes to AI". Could become per-queue configurable later.
  • users.outbound_did column isn't auto-cleaned when a DID is released from the org. Low-impact — the user's CID just becomes stale and falls back to org default via resolveCallerId's re-validation. Could add a trigger or a nightly job.
  • Super Human agent extension suggestion starts at 1099 by convention. No hard partition between "human" and "bot" extension ranges — just avoids accidental collision. Worth documenting in the admin user-creation flow for consistency.
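
For the users.outbound_did followup above, the nightly-job option could look roughly like this. It is a sketch against an in-memory SQLite database, not the production schema: the `users.org_id`, `did_numbers.number`, and `did_numbers.org_id` columns are assumptions, as are the function names, and the production database may need different SQL.

```python
import sqlite3

# Clear outbound_did where the DID is no longer assigned to the user's org.
CLEANUP_SQL = """
UPDATE users
SET outbound_did = NULL
WHERE outbound_did IS NOT NULL
  AND NOT EXISTS (
    SELECT 1 FROM did_numbers d
    WHERE d.number = users.outbound_did
      AND d.org_id = users.org_id
  )
"""


def clear_stale_outbound_dids(conn: sqlite3.Connection) -> int:
    """Run the cleanup and return the number of users updated."""
    cur = conn.execute(CLEANUP_SQL)
    conn.commit()
    return cur.rowcount


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny demo database with one stale outbound_did."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE did_numbers (number TEXT PRIMARY KEY, org_id INTEGER);
        CREATE TABLE users (id INTEGER PRIMARY KEY, org_id INTEGER,
                            outbound_did TEXT);
        -- 918065978007 was released back to the pool (org_id is NULL).
        INSERT INTO did_numbers VALUES ('918065978002', 1),
                                       ('918065978007', NULL);
        INSERT INTO users VALUES (1, 1, '918065978002'),
                                 (2, 1, '918065978007'),
                                 (3, 1, NULL);
        """
    )
    return conn
```

Calling `clear_stale_outbound_dids(make_demo_db())` clears only the second user's value. Since call-time re-validation already covers correctness, a job like this would mainly keep the Users page from showing a number the org no longer holds.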

Rollback artifacts retained

  • /root/cutover-backup-1776420747 on prod — full DB dump + /opt/* tarballs + asterisk config snapshot (from the monorepo cutover)
  • /root/cleanup-backup-1776429885 on prod — full DB dump before the org cleanup (Acme/TechStart/AstraPrivate deletion)
  • /root/staging-cleanup-1776427595 on staging — DB dumps before the Dharan + stray DID removal

Keep all three until at least 2026-05-01.