Session Apr 17 2026: Monorepo Prod Cutover + Caller-ID Feature Set
Long session. First real prod deploy of the monorepo, plus the full outbound caller-ID priority chain, per-DID routing environments, Super Human agent extension flow, and cleanup of stale test orgs from prod.
High-level summary
- Prod cut over to monorepo deploys for the first time (api, editor, workflow-engine, pipecat-flow). Zero customer downtime.
- Prod cleaned up: Acme Corporation, TechStart Inc, AstraPrivate deleted. Only GrandEstancia remains as a live prod tenant. Staging forwarding still works via orphaned DIDs.
- Staging cleaned up: 30 → 11 DIDs, Dharan org removed, AstraPrivate retained.
- Live customer bug squashed: GE inbound calls to queue 5003 were falling through to the AI Agent because the queue had no `Answer()` before `Queue()`. Hot-fixed in live prod, then committed.
- Complete outbound caller-ID story landed: per-call API > per-user > org default > first assigned > NUC fallback. Works across both editor surfaces + API paths.
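
The priority chain in the last bullet can be read as a first-match lookup. The sketch below illustrates that idea and is not the production `resolveCallerId`. The name `resolve_caller_id`, its parameters, and the row shape are assumptions made for this example. It also assumes every explicit candidate is re-validated against the DIDs currently assigned to the org, and that returning `None` means the NUC fallback applies downstream.

```python
from typing import Optional, Sequence


def resolve_caller_id(
    per_call: Optional[str],
    user_outbound_did: Optional[str],
    org_dids: Sequence[dict],
) -> Optional[str]:
    """Pick an outbound caller ID by priority; None defers to the NUC fallback.

    org_dids is a hypothetical list of rows such as
    {"number": "918065978002", "is_default": True}, in assignment order.
    """
    assigned = [d["number"] for d in org_dids]

    # 1. Per-call API override, then 2. per-user setting. Both are checked
    #    against the org's current DIDs, so a stale value is skipped.
    for candidate in (per_call, user_outbound_did):
        if candidate and candidate in assigned:
            return candidate

    # 3. Org default DID.
    for d in org_dids:
        if d.get("is_default"):
            return d["number"]

    # 4. First assigned DID, else 5. None (NUC fallback handles the call).
    return assigned[0] if assigned else None
```

With this shape, a user whose `outbound_did` points at a released DID simply falls through to the org default, which matches the behaviour described in the followups.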
What was shipped
Core features
| PR | Branch | Feature |
|---|---|---|
| #17/#18 | feat/per-user-outbound-did | users.outbound_did column + Users page dropdown |
| #19/#20 | feat/org-default-did | did_numbers.is_default + Phone Numbers UI "Set as Default Caller ID" + auto-default on first DID assign |
| #21/#22 | fix/did-admin-put-allow-routing-env | Admin UI dropdown for routing_environment was silently ignored — routing_environment missing from allowed-fields list |
| #23/#24 | feat/super-human-agent-extension | Super Human dialog create + callable extension assignment for bots (editor-side orchestration linking bot + user via routing_destination URL) |
| #25/#26 | fix/generator-orphan-staging-dids | Dispatcher generator emits staging-forwarding for orphan DIDs (no owning org needed) |
| #27/#28 | fix/org-creation-flows | request-org was crashing on every signup (missing context_prefix); admin create-org never created owner org_users row |
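
The failure mode behind #21/#22 is common in allow-list based update handlers: a field that is absent from the list is dropped without any error. The sketch below shows the pattern and is not the actual endpoint code. `ALLOWED_FIELDS` and `filter_update` are hypothetical names, and the field set is an assumption apart from `routing_environment` and `is_default`.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical allow-list; the fix amounts to including "routing_environment".
ALLOWED_FIELDS: FrozenSet[str] = frozenset(
    {"routing_destination", "is_default", "routing_environment"}
)


def filter_update(
    payload: dict, allowed: FrozenSet[str] = ALLOWED_FIELDS
) -> Tuple[dict, List[str]]:
    """Split a PUT payload into accepted fields and rejected field names."""
    accepted = {k: v for k, v in payload.items() if k in allowed}
    rejected = sorted(k for k in payload if k not in allowed)
    return accepted, rejected
```

Returning the rejected names gives the handler something to log or to turn into a 400 response, so a missing allow-list entry surfaces immediately instead of being silently ignored.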
Infrastructure + data
- `users.outbound_did VARCHAR(20)` added to both staging + prod DBs
- `did_numbers.is_default BOOLEAN DEFAULT 0` added to both; backfilled first-assigned DID per org as default
- `did_numbers.routing_environment ENUM('prod','staging','oss') DEFAULT 'prod'` added to both
- Staging cleaned: 19 stray prod-cloned DIDs + Dharan org removed
- Prod cleaned: Acme, TechStart, AstraPrivate orgs + their users/queues deleted. DIDs released back to pool. 78001/78003 orphaned with routing_environment='staging'.
NUC changes
- `from-cloud` range validation extended: Indian local format (`0XXXXXXXXX`) is auto-normalised to `91XXXXXXXXX` before the range check. Fixes GE outbound caller ID showing as 78001 instead of GE's 78002.
- Range preserved: `918065978000-029` is still the valid set.
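
The normalise-then-check order above can be sketched as follows. This is an illustration only: the real check lives on the NUC and is not shown on this page, and the function names are hypothetical. It assumes `918065978000-029` means the inclusive range 918065978000 to 918065978029, and that normalisation strips a single leading `0` and prepends `91`.

```python
RANGE_START = 918065978000  # first DID in the valid set
RANGE_END = 918065978029    # last DID in the valid set (the "-029" suffix)


def normalise_cid(cid: str) -> str:
    """Rewrite Indian local format (leading 0) to the 91-prefixed form."""
    digits = cid.strip()
    if digits.startswith("0") and digits[1:].isdecimal():
        return "91" + digits[1:]
    return digits


def in_valid_range(cid: str) -> bool:
    """True if the normalised caller ID falls inside the preserved range."""
    normalised = normalise_cid(cid)
    return normalised.isdecimal() and RANGE_START <= int(normalised) <= RANGE_END
```

Under these assumptions `in_valid_range("08065978002")` is true, because the value is compared as `918065978002` instead of being rejected in its local form.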
Staging PJSIP
- `tata_gateway_identify` now matches both `10.10.10.2` (NUC) and `10.10.10.1` (prod forwarding) so staging-routed DIDs can flow prod → staging.
Bug fixes documented
See Troubleshooting entries 28–35:
- Error 28 — request-org notNull violation (every self-serve signup broken)
- Error 29 — admin create-org missing org_users owner (admin-created orgs unusable)
- Error 30 — duplicate DID records routing to wrong org
- Error 31 — queue without Answer() → fell through to AI Agent (real customer impact)
- Error 32 — staging PJSIP identify missing prod IP (staging forwarding auth failed)
- Error 33 — NUC clobbered Indian local-format outbound CID
- Error 34 — admin UI routing_environment silently dropped by PUT endpoint
- Error 35 — dispatcher required an org for every DID
New internal-docs pages
- Outbound Caller ID — full priority chain + NUC normalisation + worked examples
- DID Routing Environments — per-DID routing_environment model, format aliases, OSS isolation rationale, troubleshooting
- Prod Monorepo Cutover Runbook — marked completed, preserved as template
- Staging Environment — post-cutover state added
- DID Management — rewrote around new data model
Operational hygiene
- Auto-sync workflow added: every push to `main` force-updates the `staging` branch so `--delete-branch` can't leave us with a missing staging
- 6 merged feature branches cleaned up from both remote + local clones
- `dev-deploy.sh` script (committed earlier) used extensively: cut feedback cycle from ~5 min (CI/CD) to ~5s (api) / ~3-4 min (editor) for staging experiments
Known followups (not urgent)
- OSS dispatching kept on NUC deliberately for security isolation. If OSS grows real customers, revisit with a WireGuard tunnel prod ↔ OSS (defence in depth via SIP auth + WG private key + IP whitelist)
- Queue failover destination is hardcoded to ext 1003 (AI Agent) in the generator. Owner accepted this as intentional — "max timeout routes to AI". Could become per-queue configurable later.
- `users.outbound_did` column isn't auto-cleaned when a DID is released from the org. Low-impact: the user's CID just becomes stale and falls back to org default via `resolveCallerId`'s re-validation. Could add a trigger or a nightly job.
- Super Human agent extension suggestion starts at 1099 by convention. No hard partition between "human" and "bot" extension ranges; the convention just avoids accidental collision. Worth documenting in the admin user-creation flow for consistency.
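
For the stale `users.outbound_did` followup, a nightly job could be as small as one `UPDATE`. The sketch below uses `sqlite3` only so that it runs standalone; it says nothing about the production database engine. The columns `did_numbers.number`, `did_numbers.org_id`, and `users.org_id` are assumptions for illustration (the real schema may link users to orgs through `org_users` instead).

```python
import sqlite3


def clear_stale_outbound_dids(conn: sqlite3.Connection) -> int:
    """Null out users.outbound_did values that no longer match a DID in the user's org."""
    cur = conn.execute(
        """
        UPDATE users
           SET outbound_did = NULL
         WHERE outbound_did IS NOT NULL
           AND NOT EXISTS (
                 SELECT 1 FROM did_numbers d
                  WHERE d.number = users.outbound_did
                    AND d.org_id = users.org_id
               )
        """
    )
    conn.commit()
    return cur.rowcount  # number of users whose stale CID was cleared
```

A released DID (one whose `org_id` is `NULL` or belongs to another org) fails the `EXISTS` check, so the user's setting is cleared and the org default applies directly.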
Rollback artifacts retained
- `/root/cutover-backup-1776420747` on prod: full DB dump + `/opt/*` tarballs + asterisk config snapshot (from the monorepo cutover)
- `/root/cleanup-backup-1776429885` on prod: full DB dump before the org cleanup (Acme/TechStart/AstraPrivate deletion)
- `/root/staging-cleanup-1776427595` on staging: DB dumps before the Dharan + stray DID removal
Keep all three until at least 2026-05-01.