
Customer Tunnels (WireGuard)

Per-org WireGuard tunnels that give customer PBXes a stable, encrypted path into Astradial's cloud Asterisk — eliminating CGNAT, multi-WAN failover, and dynamic-IP problems at the network layer.

Managed through the editor (editor.astradial.com), not by hand-editing config files. Per-org, version-controlled, auditable.

Why this exists

Customers like V7 sit behind dynamic/CGNAT public IPs and often have multi-WAN failover. From the cloud's perspective, the customer's public IP appears to rotate between several values — sometimes within seconds. This breaks SIP in well-known ways:

  • NAT pinholes die and get re-bound on different ports
  • Cloud Asterisk's stored contact for a phone points at an IP that no longer routes
  • Inbound calls during the recovery window go to the wrong place or nowhere
  • Each phone independently fights the problem

A WireGuard tunnel between the customer's site router and our cloud gives both sides a fixed tunnel IP regardless of what's happening at the public-IP layer. The customer's site router (e.g., Grandstream GWN7002) initiates the tunnel; our cloud accepts it; SIP traffic flows through the tunnel, and the cloud sees the customer at the same tunnel address for as long as the tunnel exists.

WireGuard is identity-based, not IP-based: when the customer's WAN flips or CGNAT rotates the public IP, the tunnel survives because authentication is by cryptographic key, not by IP.

Security model (read this first)

Customer tunnels handle external traffic and must be strictly isolated from Astradial's internal infrastructure.

Layered defenses

                           PUBLIC INTERNET
        ┌────────────────────────┼────────────────────────┐
        │                                                 │
        │  CLOUD VPS (89.116.31.109)                      │
        │                                                 │
        │  ┌──────────────────┐    ┌──────────────────┐   │
        │  │  wg0 (existing)  │    │  wg1 (NEW)       │   │
        │  │  UDP 51820       │    │  UDP 51821       │   │
        │  │  10.10.10.0/24   │    │  10.20.0.0/16    │   │
        │  │                  │    │                  │   │
        │  │  • NUC           │    │  • V7            │   │
        │  │  • Staging       │    │  • Future cust N │   │
        │  └────────┬─────────┘    └────────┬─────────┘   │
        │           │                       │             │
        │           └──── iptables DROP ────┘             │
        │                  (no cross-traffic)              │
        │                                                 │
        │  Cloud services (Asterisk, AstraPBX, etc.)      │
        └─────────────────────────────────────────────────┘

Defense layers

| Layer | Mechanism | Purpose |
|---|---|---|
| 1. Separate interface | wg1 distinct from existing wg0 | Cleanly isolates customer traffic from internal infra |
| 2. Separate UDP port | wg1 listens on 51821 (not 51820) | Per-port firewall rules + audit; fail2ban distinguishes attack surfaces |
| 3. Crypto-tight AllowedIPs | Each peer's AllowedIPs = single /32 of their tunnel IP | WireGuard refuses spoofed source IPs |
| 4. iptables FORWARD drops | wg1↔wg0 and wg1↔wg1 are explicit DROP | Customer can't reach NUC, staging, or other customers, even if forwarding is enabled |
| 5. iptables INPUT scope | wg1 traffic only allowed to SIP/RTP ports | Customer can't reach SSH (22), AstraPBX API (8000), editor (3001), MariaDB (3306) |
| 6. PJSIP endpoint match | V7's PJSIP endpoints match 10.20.7.0/30 only | Customer's tunnel traffic can only reach their own org's extensions |
| 7. API-level RBAC | Tunnel CRUD requires admin role on the org | Customers can't create/modify tunnels themselves |
| 8. Private key never exposed via API | Server's WG private key stays on disk; only public key + PSK returned to UI | Compromise of editor doesn't compromise server identity |
| 9. Audit log | Every tunnel CRUD writes to audit_log table | Forensic trail of who did what |
| 10. Input validation | Pubkey format, allowed-IP CIDR, name regex all validated | Reject malformed input early |
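Layer 10 can be illustrated with a minimal sketch. The function names and the exact name rule below are assumptions made for illustration, not the shipped validators. The one firm fact is that a WireGuard public key is 32 bytes, which base64-encodes to 43 characters plus one `=` pad. Rejecting newlines in the name also matters because the generator interpolates the name into a comment line of wg1.conf.

```javascript
// Hypothetical validators; names and the name rule are illustrative assumptions.

// A WireGuard public key is 32 bytes: 43 base64 characters plus one '=' pad.
const WG_KEY_RE = /^[A-Za-z0-9+/]{43}=$/;

// Assumed naming rule: 1-64 characters of letters, digits, '_' or '-'.
const NAME_RE = /^[A-Za-z0-9_-]{1,64}$/;

function isValidWgPubkey(key) {
  return typeof key === 'string'
    && WG_KEY_RE.test(key)
    && Buffer.from(key, 'base64').length === 32;
}

function isValidIpv4Cidr(cidr) {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\/(\d{1,2})$/.exec(String(cidr));
  if (!m) return false;
  const octetsOk = m.slice(1, 5).every((o) => Number(o) <= 255);
  return octetsOk && Number(m[5]) <= 32;
}

// Returns the list of invalid field names (empty when the input is acceptable).
function validateTunnelInput({ name, customer_pubkey, customer_lan_cidr }) {
  const errors = [];
  if (typeof name !== 'string' || !NAME_RE.test(name)) errors.push('name');
  if (!isValidWgPubkey(customer_pubkey)) errors.push('customer_pubkey');
  if (customer_lan_cidr !== undefined && !isValidIpv4Cidr(customer_lan_cidr)) {
    errors.push('customer_lan_cidr');
  }
  return errors;
}
```

Returning a list of failing fields rather than throwing lets a route handler report every problem in one response.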

What a customer CAN reach via their tunnel

✅ Cloud's wg1 tunnel IP (e.g. 10.20.7.1) — for SIP signaling/RTP only
✅ Cloud Asterisk on 5060/UDP, 5080/UDP, 10000-20000/UDP (RTP)

What a customer CANNOT reach

❌ NUC (10.10.10.2)             — different interface + iptables DROP
❌ Staging (10.10.10.3)         — different interface + iptables DROP
❌ Other customer tunnels       — wg1-to-wg1 iptables DROP
❌ SSH (22)                     — INPUT policy on wg1
❌ AstraPBX API (8000)          — INPUT policy on wg1
❌ Editor (3001)                — INPUT policy on wg1
❌ MariaDB / Postgres / Redis   — bind to localhost only + INPUT policy
❌ Any other org's PJSIP endpoints — PJSIP `match` is per-org /30
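The reachability lists above can be expressed as a small predicate, which is handy for unit-testing the policy independently of iptables. This is a sketch only: the function and constant names are hypothetical, and the port ranges are copied from the INPUT rules described in this document.

```javascript
// Hypothetical model of the wg1 INPUT policy: only UDP to SIP/RTP ports is accepted.
const WG1_ALLOWED_UDP = [
  { from: 5060, to: 5060 },   // SIP
  { from: 5080, to: 5080 },   // SIP (alternate port)
  { from: 10000, to: 20000 }, // RTP
];

function isAllowedOnWg1(proto, port) {
  if (proto !== 'udp') return false; // everything non-UDP hits the catch-all DROP
  return WG1_ALLOWED_UDP.some((r) => port >= r.from && port <= r.to);
}
```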

Architecture

Subnet allocation

| Range | Use | Notes |
|---|---|---|
| 10.10.10.0/24 | Internal infra (wg0) | EXISTING. NUC (.2), Staging (.3). Do not add customers here. |
| 10.20.0.0/16 | Customer tunnels (wg1) | New pool. Allocator picks next free /30 per customer. |
| 10.20.N.0/30 | Per-customer /30 | Cloud peer .1, customer peer .2. (.0 net, .3 broadcast unused.) |

Per-customer /30 layout (example: V7)

10.20.7.0/30:
  10.20.7.0  → network address (unused)
  10.20.7.1  → cloud-side tunnel IP
  10.20.7.2  → customer-side tunnel IP
  10.20.7.3  → broadcast (unused)

Each customer's /30 is independently allocated by the API at tunnel creation time.
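The layout above follows mechanically from the /30 network address. As a sketch (the helper names are hypothetical), the four addresses can be derived like this:

```javascript
// Hypothetical helpers that derive the per-customer layout from a /30 CIDR.
function ipv4ToInt(ip) {
  return ip.split('.').reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

function intToIpv4(n) {
  return [24, 16, 8, 0].map((shift) => Math.floor(n / 2 ** shift) % 256).join('.');
}

function describeSlash30(cidr) {
  const [base, prefix] = cidr.split('/');
  if (prefix !== '30') throw new Error(`expected a /30, got /${prefix}`);
  const network = ipv4ToInt(base);
  if (network % 4 !== 0) throw new Error(`${base} is not aligned to a /30 boundary`);
  return {
    network: intToIpv4(network),               // unused
    cloud_tunnel_ip: intToIpv4(network + 1),
    customer_tunnel_ip: intToIpv4(network + 2),
    broadcast: intToIpv4(network + 3),         // unused
  };
}
```

For example, `describeSlash30('10.20.7.0/30')` yields the V7 layout shown above, and a misaligned input such as `10.20.7.2/30` is rejected.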

Components

| Component | Type | Lives in |
|---|---|---|
| customer_tunnels table | DB (MariaDB) | pbx_api_db |
| CustomerTunnel Sequelize model | Backend | api/src/models/CustomerTunnel.js |
| customer-tunnels.js routes | API | api/src/routes/customer-tunnels.js |
| wireguardGenerator.js service | Backend | api/src/services/asterisk/wireguardGenerator.js |
| wireguardApplier.js service | Backend | api/src/services/asterisk/wireguardApplier.js |
| Subnet allocator | Backend service | api/src/services/network/subnetAllocator.js |
| Network Tunnels UI | Frontend | editor/app/dashboard/[orgId]/settings/page.tsx (new tab) |
| wg1 interface | System (per-VPS) | /etc/wireguard/wg1.conf (generated by wireguardGenerator) |
| iptables rules | System (per-VPS) | /etc/iptables/customer-tunnels.v4 (set once at bootstrap) |

Database schema

// api/database/migrations/YYYYMMDDhhmmss-create-customer-tunnels.js
'use strict';

module.exports = {
  up: async (queryInterface, Sequelize) => {
    await queryInterface.createTable('customer_tunnels', {
      id: {
        type: Sequelize.UUID,
        defaultValue: Sequelize.UUIDV4,
        primaryKey: true,
      },
      org_id: {
        type: Sequelize.UUID,
        allowNull: false,
        references: { model: 'organizations', key: 'id' },
        onDelete: 'CASCADE',
      },
      name: {
        type: Sequelize.STRING(64),
        allowNull: false,
      },
      tunnel_subnet: {
        type: Sequelize.STRING(18),       // CIDR (e.g. "10.20.7.0/30")
        allowNull: false,
        unique: true,
      },
      cloud_tunnel_ip: {
        type: Sequelize.STRING(15),
        allowNull: false,
      },
      customer_tunnel_ip: {
        type: Sequelize.STRING(15),
        allowNull: false,
      },
      customer_pubkey: {
        type: Sequelize.STRING(64),       // base64-encoded 32-byte WG pubkey
        allowNull: false,
      },
      preshared_key: {
        type: Sequelize.STRING(64),
        allowNull: false,
      },
      persistent_keepalive: {
        type: Sequelize.INTEGER,
        defaultValue: 25,
      },
      listen_port: {
        type: Sequelize.INTEGER,
        defaultValue: 51821,
      },
      interface_name: {
        type: Sequelize.STRING(16),
        defaultValue: 'wg1',
      },
      status: {
        type: Sequelize.ENUM('active', 'disabled', 'revoked'),
        defaultValue: 'active',
        allowNull: false,
      },
      notes: {
        type: Sequelize.TEXT,
      },
      created_at: { type: Sequelize.DATE, defaultValue: Sequelize.NOW },
      updated_at: { type: Sequelize.DATE, defaultValue: Sequelize.NOW },
      created_by_user_id: {
        type: Sequelize.UUID,
        references: { model: 'org_users', key: 'id' },
        onDelete: 'SET NULL',
      },
    });

    await queryInterface.addIndex('customer_tunnels', ['org_id']);
    await queryInterface.addIndex('customer_tunnels', ['status']);
    await queryInterface.addIndex('customer_tunnels', ['org_id', 'name'], { unique: true });
  },

  down: async (queryInterface) => {
    await queryInterface.dropTable('customer_tunnels');
  },
};

Notes:

  • customer_pubkey and preshared_key are stored in plaintext, consistent with how sip_trunks.password is stored today (see v7-setup.md and existing models). Plain WireGuard pubkeys are not secrets; PSKs are. TODO: encrypt PSKs at rest as part of a broader DB-secret-encryption initiative.
  • org_id cascades on org deletion (tunnel automatically removed if org is deleted).
  • Unique on (org_id, name) prevents duplicate names within an org; unique on tunnel_subnet prevents subnet collision globally.

API surface

Routes live in api/src/routes/customer-tunnels.js, mounted at /api/v1/customer-tunnels. They follow the same auth + org-scoping pattern as queues.js (JWT middleware → req.user.org_id filter on every query).

| Method | Path | RBAC | Purpose |
|---|---|---|---|
| GET | /api/v1/customer-tunnels | admin | List tunnels for the requesting org |
| GET | /api/v1/customer-tunnels/:id | admin | Get one tunnel, including live status (last handshake, bytes transferred) |
| POST | /api/v1/customer-tunnels | admin | Create. Body: { name, customer_pubkey, notes? }. Server allocates /30, generates PSK, writes wg1.conf, returns full record including the Peer config block for the customer to paste on their side |
| PATCH | /api/v1/customer-tunnels/:id | admin | Update. Body: { status, notes }. Switching to disabled removes peer from wg1 but keeps DB row. |
| DELETE | /api/v1/customer-tunnels/:id | admin | Revoke. Removes peer from wg1, marks status revoked. Subnet is NOT immediately reused (kept reserved for 30 days for audit). |
| GET | /api/v1/customer-tunnels/:id/customer-config | admin | Returns the customer-side [Peer] block as plain text for copy/paste into GDMS |

Every route runs the existing JWT + RBAC middleware (requireRole('admin')), validates input via express-validator, and writes an entry to audit_log on every mutation.
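To make the customer-config route concrete, here is a minimal sketch of rendering the customer-side [Peer] block. The function name and parameter names are assumptions; the field values (endpoint on UDP 51821, a single tunnel-side /32 in AllowedIPs, keepalive of 25) follow what this document describes elsewhere.

```javascript
// Hypothetical renderer for the customer-side [Peer] block returned by
// GET /api/v1/customer-tunnels/:id/customer-config.
function renderCustomerPeerBlock({
  serverPublicKey,
  presharedKey,
  endpointHost,
  listenPort = 51821,
  cloudTunnelIp,
  keepalive = 25,
}) {
  return [
    '[Peer]',
    `PublicKey = ${serverPublicKey}`,
    `PresharedKey = ${presharedKey}`,
    `Endpoint = ${endpointHost}:${listenPort}`,
    // Only the cloud's tunnel-side IP is routed through the tunnel.
    `AllowedIPs = ${cloudTunnelIp}/32`,
    `PersistentKeepalive = ${keepalive}`,
    '',
  ].join('\n');
}
```

Note that this block contains the server's public key and the PSK only; the server's private key never appears in it.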

Subnet allocator (api/src/services/network/subnetAllocator.js)

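const { Op } = require('sequelize');                 // import Op from the package itself
const { CustomerTunnel } = require('../../models');  // assumed relative path to the models registry

const POOL_CIDR = '10.20.0.0/16';
const PREFIX_LENGTH = 30;
const REVOKED_HOLD_MS = 30 * 24 * 60 * 60 * 1000;    // revoked subnets stay reserved for 30 days

async function allocateNextAvailable() {
  // In use = active/disabled tunnels plus tunnels revoked within the hold window.
  // Assumes the model exposes the column as `updated_at`. Because tunnel_subnet is
  // UNIQUE, an older revoked row must be purged before its /30 can be inserted again.
  const holdCutoff = new Date(Date.now() - REVOKED_HOLD_MS);
  const rows = await CustomerTunnel.findAll({
    where: {
      [Op.or]: [
        { status: ['active', 'disabled'] },
        { status: 'revoked', updated_at: { [Op.gt]: holdCutoff } },
      ],
    },
    attributes: ['tunnel_subnet'],
  });
  const used = new Set(rows.map((r) => r.tunnel_subnet));

  // Walk 10.20.0.4/30, 10.20.0.8/30, ... 10.20.0.252/30, 10.20.1.0/30, ...
  // First-fit: return the first /30 in the pool not present in `used`.
  for (let octet3 = 0; octet3 < 256; octet3++) {
    for (let octet4 = 0; octet4 < 256; octet4 += 4) {  // /30 = 4 addresses, 4-aligned
      if (octet3 === 0 && octet4 === 0) continue;      // skip only 10.20.0.0/30 (reserved)
      const candidate = `10.20.${octet3}.${octet4}/${PREFIX_LENGTH}`;
      if (!used.has(candidate)) {
        return {
          subnet: candidate,
          cloud_ip: `10.20.${octet3}.${octet4 + 1}`,
          customer_ip: `10.20.${octet3}.${octet4 + 2}`,
        };
      }
    }
  }
  throw new Error(`Subnet pool exhausted (${POOL_CIDR} fully allocated)`);
}

module.exports = { allocateNextAvailable };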

Capacity: 10.20.0.0/16 has 16,384 /30s. We will not exhaust this in any realistic scenario.
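The capacity figure is simple prefix arithmetic: each extra prefix bit doubles the number of subnets, so a /16 holds 2^(30-16) subnets of size /30. A small sketch (the function name is hypothetical):

```javascript
// Number of subnets of size /subnetPrefix that fit in a pool of size /poolPrefix.
function countSubnets(poolPrefix, subnetPrefix) {
  if (subnetPrefix < poolPrefix || subnetPrefix > 32) {
    throw new RangeError('subnet prefix must be between the pool prefix and 32');
  }
  return 2 ** (subnetPrefix - poolPrefix);
}

const total = countSubnets(16, 30);    // 2^14 = 16384
const perThirdOctet = 256 / 4;         // 64 /30s inside each 10.20.N.0/24
console.log(total, perThirdOctet * 256);
```

Both expressions give 16,384, which matches the figure above. If one /30 is held back as reserved, 16,383 remain for customers.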

WireGuard config generator (api/src/services/asterisk/wireguardGenerator.js)

Mirrors the pattern of dialplanGenerator.js. Reads every active tunnel from the DB (disabled and revoked tunnels are left out, so their peers are omitted) and emits the full wg1.conf from scratch on every regeneration:

async function generateWg1Config() {
  const serverPrivateKey = await readServerPrivateKey();  // /etc/wireguard/wg1.private (root-only)
  const tunnels = await CustomerTunnel.findAll({
    where: { status: 'active' },
    order: [['created_at', 'ASC']],
  });

  let conf = `# AUTO-GENERATED by AstraPBX wireguardGenerator. DO NOT EDIT BY HAND.
# Source of truth: customer_tunnels table in pbx_api_db.
# Generated: ${new Date().toISOString()}

[Interface]
Address = 10.20.0.1/16
ListenPort = 51821
PrivateKey = ${serverPrivateKey}
PostUp = /usr/local/sbin/customer-tunnels-iptables.sh up
PostDown = /usr/local/sbin/customer-tunnels-iptables.sh down

`;

  for (const t of tunnels) {
    conf += `# org=${t.org_id} name=${t.name} created=${t.created_at.toISOString()}
[Peer]
PublicKey = ${t.customer_pubkey}
PresharedKey = ${t.preshared_key}
AllowedIPs = ${t.customer_tunnel_ip}/32
PersistentKeepalive = ${t.persistent_keepalive}

`;
  }

  return conf;
}

Server's WG private key is generated once during the wg1 bootstrap (see Bootstrap procedure) and stays in /etc/wireguard/wg1.private with chmod 600 root:root. Never returned via API. Never logged.
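Because the generated config string contains both the server private key and every PSK, one way to honor the "never logged" rule is to redact secrets before any config text reaches a logger. The helper below is a hypothetical sketch, not part of the described services:

```javascript
// Hypothetical helper: mask secret values in a WireGuard config before logging it.
function redactWgConfig(conf) {
  return conf.replace(
    /^([ \t]*(?:PrivateKey|PresharedKey)[ \t]*=[ \t]*).*$/gm,
    '$1[REDACTED]'
  );
}
```

Public keys and AllowedIPs are left intact so the redacted output is still useful for debugging.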

WireGuard applier (api/src/services/asterisk/wireguardApplier.js)

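const fs = require('fs/promises');
const { promisify } = require('util');
const exec = promisify(require('child_process').exec);
const { generateWg1Config } = require('./wireguardGenerator');

async function applyWg1() {
  const conf = await generateWg1Config();

  // 1. Write the new config to a staging file next to the target
  const tmp = '/etc/wireguard/wg1.conf.new';
  await fs.writeFile(tmp, conf, { mode: 0o600 });

  // 2. Back up the current config (ignored if none exists yet)
  const backup = `/etc/wireguard/wg1.conf.bak-${Date.now()}`;
  await fs.copyFile('/etc/wireguard/wg1.conf', backup).catch(() => {});

  // 3. Move into place; rename() within one filesystem is atomic
  await fs.rename(tmp, '/etc/wireguard/wg1.conf');

  // 4. Hot-reload (no tunnel restart for unchanged peers); <( ) needs bash
  await exec('wg syncconf wg1 <(wg-quick strip wg1)', { shell: '/bin/bash' });

  // 5. Verify wg show succeeds
  const { stdout } = await exec('wg show wg1');

  // Count peers from the rendered config; `tunnels` is not in scope here
  const peerCount = (conf.match(/^\[Peer\]$/gm) || []).length;
  return { applied: true, peer_count: peerCount, wg_status: stdout };
}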

wg syncconf is a hot-reload that only changes peers that differ — existing tunnels are not disrupted when a new peer is added. Backups in /etc/wireguard/wg1.conf.bak-<ts> are retained 30 days then garbage-collected by a cron job.
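The 30-day backup retention can be sketched as a pure selection step that a cron-driven script could call before deleting files. The function name is hypothetical; the filename pattern follows the wg1.conf.bak-<ts> convention above, where <ts> is a millisecond timestamp from Date.now().

```javascript
// Hypothetical selection logic for garbage-collecting old wg1.conf backups.
const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000;

function selectExpiredBackups(fileNames, nowMs = Date.now(), maxAgeMs = THIRTY_DAYS_MS) {
  return fileNames.filter((name) => {
    const m = /^wg1\.conf\.bak-(\d+)$/.exec(name);
    // Only files matching the backup pattern are ever candidates for deletion.
    return m !== null && nowMs - Number(m[1]) > maxAgeMs;
  });
}
```

Matching the full filename keeps wg1.conf, wg1.private, and wg1.public out of the deletion list even if the script is pointed at /etc/wireguard directly.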

Bootstrap procedure (per-VPS, one-time)

This is infrastructure setup, not feature code. Done once on staging during development, then once on prod before the feature ships. Documented here as a runbook.

# 1. Generate server WG keys (once per VPS)
umask 077
wg genkey > /etc/wireguard/wg1.private
wg pubkey < /etc/wireguard/wg1.private > /etc/wireguard/wg1.public
chmod 600 /etc/wireguard/wg1.private
chmod 644 /etc/wireguard/wg1.public

# 2. Write iptables helper script
cat > /usr/local/sbin/customer-tunnels-iptables.sh <<'EOF'
#!/bin/bash
# Applies iptables rules when wg1 comes up; removes them when wg1 goes down.
# Called from PostUp/PostDown in /etc/wireguard/wg1.conf.

set -e

case "$1" in
  up)
    # Block customer→internal infra
    iptables -A FORWARD -i wg1 -o wg0 -j DROP
    iptables -A FORWARD -i wg0 -o wg1 -j DROP
    # Block customer→customer
    iptables -A FORWARD -i wg1 -o wg1 -j DROP
    # Allow only SIP+RTP into the cloud
    iptables -A INPUT -i wg1 -p udp --dport 5060 -j ACCEPT
    iptables -A INPUT -i wg1 -p udp --dport 5080 -j ACCEPT
    iptables -A INPUT -i wg1 -p udp --dport 10000:20000 -j ACCEPT
    # Drop everything else from wg1
    iptables -A INPUT -i wg1 -j DROP
    ;;
  down)
    iptables -D FORWARD -i wg1 -o wg0 -j DROP || true
    iptables -D FORWARD -i wg0 -o wg1 -j DROP || true
    iptables -D FORWARD -i wg1 -o wg1 -j DROP || true
    iptables -D INPUT -i wg1 -p udp --dport 5060 -j ACCEPT || true
    iptables -D INPUT -i wg1 -p udp --dport 5080 -j ACCEPT || true
    iptables -D INPUT -i wg1 -p udp --dport 10000:20000 -j ACCEPT || true
    iptables -D INPUT -i wg1 -j DROP || true
    ;;
esac
EOF
chmod 755 /usr/local/sbin/customer-tunnels-iptables.sh

# 3. Initial empty wg1.conf (no peers yet — AstraPBX will populate via wireguardGenerator)
cat > /etc/wireguard/wg1.conf <<EOF
[Interface]
Address = 10.20.0.1/16
ListenPort = 51821
PrivateKey = $(cat /etc/wireguard/wg1.private)
PostUp = /usr/local/sbin/customer-tunnels-iptables.sh up
PostDown = /usr/local/sbin/customer-tunnels-iptables.sh down
EOF
chmod 600 /etc/wireguard/wg1.conf

# 4. Open UDP 51821 in ufw
ufw allow 51821/udp comment 'WireGuard customer tunnels'

# 5. Enable and start the service
systemctl enable --now wg-quick@wg1

# 6. Verify
wg show wg1
ip link show wg1
iptables -L FORWARD -v | grep wg1

The bootstrap lives in scripts/setup/wg1-bootstrap.sh in the monorepo and is run by the deploy script during initial feature rollout. It has an explicit --dry-run option for safety.

Editor UI

A new tab on the Org Settings page (editor/app/dashboard/[orgId]/settings/page.tsx):

Org Settings
├── Organization (existing)
├── Asterisk Configuration (existing)
├── Session (existing)
└── Network Tunnels (NEW)
     ├── List of active tunnels for this org
     │   └── For each: name, subnet, last handshake, status, [⋮ actions]
     ├── [+ Add Tunnel] button → modal:
     │     - Name (default "astradial-<orgshortname>")
     │     - Customer pubkey (textarea — paste from GDMS)
     │     - Notes (optional)
     │     [Submit]
     └── [View customer config] action → modal:
           - Renders the [Peer] block for the customer to paste in GDMS
           - "Copy to clipboard" button
           - Pre-shared key shown with reveal/hide toggle

UI calls the new API via pbxCustomerTunnels.list(), pbxCustomerTunnels.create(), etc. — added to the existing lib/pbx/client.ts.

Staging-first rollout plan

| Step | What | Where | Reversible? |
|---|---|---|---|
| 1 | Run bootstrap on staging VPS | 94.136.188.221 | Yes: remove wg1 service + uninstall iptables script |
| 2 | Merge feature branch (DB migration + backend) to staging branch | GitHub PR | Yes: revert PR |
| 3 | CI deploys backend to staging VPS | auto | |
| 4 | Run migration on staging DB | npx sequelize-cli db:migrate | Yes: db:migrate:undo |
| 5 | Merge frontend feature branch to staging | GitHub PR | Yes: revert PR |
| 6 | E2E test: create a test tunnel via UI for a test org, configure a test client (e.g., a Linux box with wireguard-tools), verify tunnel establishes | Staging | |
| 7 | E2E test: simulate a SIP registration arriving at staging Asterisk from a tunnel IP, verify endpoint match works | Staging | |
| 8 | Run for 24h, monitor wg show wg1, monitor staging Asterisk logs for unexpected drops | Staging | |
| 9 | Bootstrap prod VPS | 89.116.31.109 | Yes: same removal procedure |
| 10 | Merge staging → main | GitHub PR | Yes: revert PR |
| 11 | CI deploys to prod, runs migration on prod DB | auto | Yes: db:migrate:undo on prod |
| 12 | Onboard V7 as the first real customer via the UI | Editor on prod | Yes: disable tunnel + revert phones to direct cloud registration |

No prod changes until Steps 1–8 are clean on staging.

Production rollout — actual execution log

The order in the table above was theoretical. Actual prod rollout used a slightly different order to avoid a sequelize.sync() race (prod's server.js line 6223 still calls sync() at boot — fixed in PR #138 but not yet on main). Running migrations BEFORE the code merge means tables exist before any sync() runs, avoiding MariaDB 11 FK collision (1061) that bit us in PRs #125 and #131.

Stage What Result
1 Backup prod DB /root/pre-customer-tunnels-2026-05-12T115633Z/pbx_api_db.sql.gz, 290 KB, 30 tables, gzip verified
2 SCP migration files to /opt/astrapbx/database/migrations/, run npx sequelize-cli db:migrate Both migrated cleanly (0.239s + 0.080s). customer_tunnels + tunnel_metrics exist, empty, SequelizeMeta updated. astrapbx pm2 process untouched (no restart).
3 Merge staging → main (PR #140, 28 commits, +7.2k LOC) CI green: Deploy API to production 25s, Check API routes ✓. Editor deploy queued (self-hosted runner backlog, not blocking). pm2 reload astrapbx graceful: PID rotated, uptime fresh, zero dropped requests. WireGuard status poller started (60s interval) confirmed at startup. wg-poller logs [wg-poller] cycle failed: Command failed: wg show wg1 dump once per minute as expected (wg1 absent until Stage 4). CDR poller resumed at last ID 4246 (PR #139 fix verified). AMI + ARI connections re-established. /health 200. /api/v1/customer-tunnels returns 401 unauthorized (route mounted, auth middleware firing). DB unchanged: both tables still empty. CLOUD_PUBLIC_IP set on prod .env before merge so reload picked it up.
4 Run wg1-bootstrap.sh on prod VPS --check and --dry-run clean (port 51821 free, 10.20.0.0/16 unrouted, wg0 present, all 7 pre-flight PASS). Live run exit 0 at 17:45:28 IST: keypair generated, helper installed at /usr/local/sbin/customer-tunnels-iptables.sh (mode 755), wg1.conf written (mode 600, interface block + 0 peers), ufw allow 51821/udp added, wg-quick@wg1 enabled+active. Verification: wg show wg1 returns 0 peers / port 51821, ip a wg1 shows UP at 10.20.0.1/16, iptables FORWARD has 3 DROP rules (wg1↔wg0 + wg1↔wg1), iptables INPUT has 3 ACCEPT (5060/5080/10000-20000) + 1 catch-all DROP, syslog wg1-iptables logged all 7 rule additions + "up: complete". wg-poller transitioned silently — error log mtime frozen at 17:44:52, no new failures after wg1 came up. wg0 untouched (NUC peer still handshaking ~20s fresh). Asterisk untouched (0 channels during run). Server public key: 0Dfkqmj3UFLCN4mmG+Cp2j7VfP4J75iOyA+AZUxKQng= (paste into customer router peer config). Transcript: /var/log/wg1-bootstrap-20260512-174526.log on prod.
5 ~~Set CLOUD_PUBLIC_IP=89.116.31.109 in prod .env~~ DONE (folded into Stage 3 pre-flight)
6 Onboard V7 as first real customer via Editor UI FULLY WORKING after PR #146 + PR #148. Tunnel V7_Tirupathur created (subnet 10.20.0.0/30, customer pubkey ylkqY4S7ahWWAD3L2m10dv2r+TRflPupWxCwfZ51hAI=, customer_lan_cidr 192.168.0.0/24 set via Editor "Edit Tunnel" dialog). Reception phone (ext 09) registers via tunnel from 192.168.0.76 → 10.20.0.1:5080, RTT ~172ms, active call confirmed (Endpoint: org_moijhj2l__09/09 Ringing 1 of inf). Endpoint roamed 103.197.113.158:33252 → 120.60.105.158:51820 mid-session (WG roaming verified ✓). Failover policy "BSNL-Primary-Rail-Backup" configured in GDMS Internet Source. Mac softphone (org_mo8vbv60__1003 + org_moijhj2l__01) registered via public path on UDP transport; TCP transport failed qualify (transport mismatch with V7 endpoints, which use transport-udp).

Architectural issues found during V7 onboarding (all resolved)

| # | Issue | Severity | Resolution |
|---|---|---|---|
| 1 | Per-customer cloud_tunnel_ip varies by /30 but wg1 only binds 10.20.0.1 | P0: would have blocked customer #2 | PR #145: Editor customer-config now always returns 10.20.0.1 regardless of the customer's /30 |
| 2 | Customer-config recommended 89.116.31.109/32 in AllowedIPs causing GWN7002 routing-loop rejection | P1: blocked any Grandstream-router customer | PR #144: cloud_routed_ips defaults to empty; only the tunnel-side IP is added |
| 3 | No SNAT/MASQUERADE on customer side, so server-side WG cryptokey routing rejected packets from customer LAN | P0: blocked all phone registrations via tunnel | PR #146: new customer_lan_cidr field, validated server-side, expanded into server peer's AllowedIPs; Editor "Edit Tunnel" UI to set it post-create |
| 4 | wg syncconf updates AllowedIPs but NOT kernel routing table, so responses to customer LAN went out eth0 instead of wg1 | P0: found mid-session, manually unblocked via ip route add 192.168.0.0/24 dev wg1 | PR #148: syncCustomerLanRoutes in the applier auto-manages kernel routes on every tunnel apply (add/remove with proper diff, idempotent, defense-in-depth against shell injection) |

Softphone gotchas (operator-level)

Discovered during V7 onboarding. Worth checking first when a softphone fails to register or shows "qualify failed":

  1. SIP Transport mismatch. Astradial's PJSIP endpoints (per-org PJSIP confs) are configured with transport=transport-udp. If a softphone client defaults to TCP (Telephone.app on macOS does this on first install), the REGISTER may succeed but qualify (OPTIONS keepalive) fails with no clear error — endpoint shows Unavailable even though the contact is in the AOR. Fix: set SIP Transport to UDP in the softphone's account settings.

  2. fail2ban bans after repeated wrong-auth attempts. Three consecutive failed auths trip asterisk-auth jail and the source IP gets blocked at iptables (BEFORE Asterisk sees subsequent packets). Symptom: registration just goes silent, no Asterisk log. Check: ssh root@89.116.31.109 'fail2ban-client status asterisk-auth'. Unban: fail2ban-client set asterisk-auth unbanip <ip>.

  3. NAT keep-alive for inbound calls. Softphones behind CGNAT/NAT need to send periodic keep-alive packets (every 25-30s) so the NAT mapping stays open and Asterisk's OPTIONS qualify round-trips. Most apps have a "NAT Keep-Alive" setting. Without it: registration succeeds but Contact Status shows NonQual with -nan RTT, and inbound calls never ring.

  4. Per-extension credentials are org-scoped. SIP User ID = <ext> (e.g., 09); Authentication ID = org_<org-prefix>__<ext> (e.g., org_moijhj2l__09). If the softphone uses just 09 as auth, Asterisk's PJSIP can't find a matching endpoint and replies No matching endpoint found (logged as 401 in PJSIP logger, but with empty AOR match).
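The credential convention in item 4 can be captured in a one-line helper, which is useful when generating provisioning notes for operators. The function name is hypothetical; the format org_<org-prefix>__<ext> is the one described above.

```javascript
// Hypothetical helper: derive softphone credentials from org prefix and extension.
// The extension is kept as a string so leading zeros (e.g. "09") survive.
function softphoneCredentials(orgPrefix, ext) {
  return {
    sipUserId: String(ext),
    authenticationId: `org_${orgPrefix}__${ext}`,
  };
}
```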

Operations — route inspection and reboot behavior

After PR #148, kernel routes for customer LANs are auto-managed by the applier. Two things to know:

How routes flow at runtime:

  • Editor → POST/PATCH/DELETE /customer-tunnels → applyWg1Config(), which:
      1. Renders wg1.conf (peers include customer_lan_cidr in AllowedIPs)
      2. Atomically writes the file, then runs wg syncconf wg1 <(wg-quick strip wg1) (cryptokey layer)
      3. Calls syncCustomerLanRoutes(), which diffs desired vs current routes via ip -4 route show dev wg1, then runs ip route add/del to converge
  • The result is returned to the caller as apply.route_sync.{added, removed, unchanged, errors} and surfaced in the Editor UI as a warning toast if any errors occurred.
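The diff step of the route sync can be sketched as two pure functions. This is not the code from PR #148; the names and the return shape are assumptions, and the sketch only computes the plan rather than executing ip route commands.

```javascript
// Extract destination prefixes from `ip -4 route show dev wg1` output.
// Assumption: each non-empty line starts with the destination (e.g. "192.168.0.0/24").
function parseRouteDestinations(ipRouteOutput) {
  return ipRouteOutput
    .split('\n')
    .map((line) => line.trim())
    .filter(Boolean)
    .map((line) => line.split(/\s+/)[0]);
}

// Compare desired customer LAN routes with what the kernel currently has.
// Routes owned by wg-quick (the pool route) are never touched.
function planRouteSync(desired, current, managedByWgQuick = ['10.20.0.0/16']) {
  const want = new Set(desired);
  const have = new Set(current.filter((r) => !managedByWgQuick.includes(r)));
  return {
    toAdd: [...want].filter((r) => !have.has(r)),
    toRemove: [...have].filter((r) => !want.has(r)),
    unchanged: [...want].filter((r) => have.has(r)),
  };
}
```

Running the plan twice against an already-converged table yields empty toAdd and toRemove lists, which is what makes the sync idempotent.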

Inspect routes manually:

ssh root@89.116.31.109 'ip -4 route show dev wg1'
# Expect:
#   10.20.0.0/16 proto kernel scope link src 10.20.0.1   ← wg-quick manages this
#   192.168.0.0/24 scope link                            ← V7 LAN, syncCustomerLanRoutes manages this

Reboot behavior:

  • ip route add is kernel runtime state (not persisted to disk)
  • On reboot, wg-quick@wg1.service starts before astrapbx and reads /etc/wireguard/wg1.conf
  • For each peer, wg-quick installs routes for the AllowedIPs entries, so 192.168.0.0/24 gets re-added automatically
  • Persistence is therefore at the wg1.conf level (which we DO write to disk via the applier's atomic-write protocol); the kernel routing table is rebuilt at boot from that source of truth

Recovery for V7 if route somehow goes missing without a reboot:

# Any tunnel mutation triggers route-sync; simplest is a no-op PATCH:
curl -X PATCH https://devpbx.astradial.com/api/v1/customer-tunnels/<V7-ID> \
  -H "Authorization: Bearer <token>" -H "Content-Type: application/json" \
  -d '{"notes":"trigger route resync"}'
# Or manually:
ssh root@89.116.31.109 'ip route add 192.168.0.0/24 dev wg1'

Diagnostic findings from V7 session

  • WG kernel statistics confirmed cryptokey routing rejection: cat /sys/class/net/wg1/statistics/rx_errors was 48 after V7 phone tried to register, while rx_dropped stayed 0 → confirms encrypted packets arrived but failed source-IP validation.
  • tcpdump on eth0 udp port 51821 showed 656-byte packets from 120.60.105.158 (V7's CGNAT IP) — these are encrypted SIP REGISTER attempts that get dropped post-decryption.
  • tcpdump on wg1 showed 0 packets matching the SIP filter — confirming nothing made it past the WG layer.
  • WG endpoint roaming verified: V7's apparent source IP changed from 103.197.113.158:33252 (CGNAT'd, ephemeral port) to 120.60.105.158:51820 (different CGNAT pool, listen port) mid-session without any tunnel disruption.
  • Asterisk PJSIP transports bind 0.0.0.0:5080 (UDP+TCP) and 0.0.0.0:5060 (UDP) — so destination-side wouldn't be the bottleneck if SNAT were correct.

Bug fixes shipped during V7 onboarding session

| PR | Bug | Fix |
|---|---|---|
| #142 | subnetAllocator.js destructured Sequelize from models registry, but the registry doesn't expose it. POST /customer-tunnels 500'd with TypeError: Cannot destructure property 'Op' of 'Sequelize' as it is undefined. | Import Op directly from sequelize package. Added 4 regression tests exercising DB-aware code paths with real-shape mock. |
| #143 | CustomerTunnel.scope('withSecrets') produced SQL with preshared_key listed twice (default scope's SELECT * already had it, plus the scope's include: ['preshared_key'] added it again). mariadb driver rejected with Error in results, duplicate field name preshared_key. Caused applier to fail mid-create → tunnel marked status=disabled. | Change withSecrets to attributes: { exclude: [] } (clears default exclusion) instead of trying to re-include. |

Interim workaround applied for V7

After PR #143 landed, V7's customer_tunnels row had status=disabled from the earlier failed apply. Recovery:

ssh root@89.116.31.109 'mariadb -uroot pbx_api_db -e "UPDATE customer_tunnels SET status=\"active\" WHERE name=\"V7_Tirupathur\";"'
ssh root@89.116.31.109 'cd /opt/astrapbx && node -e "const { applyWg1Config } = require(\"./src/services/network/wireguardApplier\"); applyWg1Config({ models: require(\"./src/models\") }).then(r => { console.log(JSON.stringify(r, null, 2)); process.exit(0); }).catch(e => { console.error(e.message); process.exit(1); });"'
This manual recovery should be turned into a proper recovery endpoint (e.g., POST /:id/retry-apply) in a future PR.

Migration verification on prod (run after Stage 2):

SHOW TABLES LIKE 'customer_tunnels';     -- expects 1 row
SHOW TABLES LIKE 'tunnel_metrics';       -- expects 1 row
SELECT COUNT(*) FROM customer_tunnels;   -- expects 0
SELECT COUNT(*) FROM tunnel_metrics;     -- expects 0
SELECT name FROM SequelizeMeta
  WHERE name LIKE '2026051212%' OR name LIKE '2026051220%';  -- expects 2 rows

Indexes confirmed on prod: customer_tunnels_org_name_unique (unique compound org_id, name), customer_tunnels_status, tunnel_subnet (unique), tunnel_metrics_tunnel_snapshot (compound tunnel_id, snapshot_at), tunnel_metrics_snapshot_at. FK auto-indexes covered by the compound unique on org_id first column — exactly the pattern that survived the MariaDB 11 gotcha.

V7 onboarding playbook (after feature ships)

1. Ops opens editor.astradial.com → V7 org → Settings → Network Tunnels
2. Click [+ Add Tunnel]
3. Name: "astradial-cloud" (default)
4. Paste V7's WireGuard public key (from GDMS — generated when V7 IT created the WG entry on GWN7002)
5. Submit
   → System allocates 10.20.7.0/30
   → System generates PSK
   → System writes peer block to wg1.conf and reloads
   → System updates V7's PJSIP endpoint match list to include 10.20.7.2/32 (via generator regen)
   → Returns success + customer-config modal
6. Click [View customer config], copy the [Peer] block
7. In GDMS → V7's network → Settings → VPN → WireGuard → Add (or Setup Wizard):
   - Interface: BSNL
   - Local IP: 10.20.7.2/30
   - Paste server config including Endpoint = 89.116.31.109:51821
   - Save & Apply
8. On cloud: `wg show wg1` should show V7's peer with a fresh handshake within ~10s
9. Verify V7's phones now register from 10.20.7.2 (visible in `pjsip show contacts`)

Operations

Daily verification

ssh root@89.116.31.109 'wg show wg1'
Look for: each active customer has a handshake within the last few minutes.
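The daily check can be automated by parsing `wg show wg1 dump`, whose output is tab-separated: the first line describes the interface, and each following line describes a peer with the latest handshake as a Unix timestamp in the fifth field (0 if the peer has never completed a handshake). The sketch below is illustrative only; the function name and the 180-second threshold for "the last few minutes" are assumptions.

```javascript
// Hypothetical stale-peer check over `wg show wg1 dump` output.
// Peer line fields: public-key, preshared-key, endpoint, allowed-ips,
// latest-handshake, transfer-rx, transfer-tx, persistent-keepalive.
function findStalePeers(dumpOutput, nowSec, maxAgeSec = 180) {
  return dumpOutput
    .split('\n')
    .filter((line) => line.length > 0)
    .slice(1) // first line is the interface itself
    .map((line) => {
      const f = line.split('\t');
      return { publicKey: f[0], allowedIps: f[3], latestHandshake: Number(f[4]) };
    })
    .filter((p) => p.latestHandshake === 0 || nowSec - p.latestHandshake > maxAgeSec);
}
```

A caller would pass `Math.floor(Date.now() / 1000)` as nowSec and alert on any non-empty result, matching public keys back to rows in customer_tunnels.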

Rolling key for a customer (compromise scenario)

  1. Editor → V7 → Network Tunnels → Disable existing tunnel
  2. Delete (after operator confirms)
  3. Customer regenerates WG keys on their side (new pubkey)
  4. Operator creates a new tunnel with the new pubkey
  5. The new tunnel comes up. Its subnet is not guaranteed to match the old one: the allocator can hand back the previously revoked slot only after the 30-day reservation has elapsed; otherwise it assigns a fresh /30.

Customer subnet exhaustion

Capacity is 16,384 /30s under 10.20.0.0/16. We're nowhere near this. If reached: expand to 10.21.0.0/16 etc. — generator and allocator support multiple pool CIDRs.

Rollback the feature entirely

Pre-deploy DB snapshot (taken before any prod schema change):

| Item | Value |
|---|---|
| Path on prod | /root/pre-customer-tunnels-2026-05-12T115633Z/pbx_api_db.sql.gz |
| Off-box copy | ~/AstradialBackups/pre-customer-tunnels-2026-05-12T115633Z/ (Hari's MacBook) |
| Size | 290 KB gzipped |
| Tables captured | 30 (matches live information_schema.tables count) |
| Verified | gzip integrity OK; -- Dump completed on 2026-05-12 17:26:33 footer present; 30× CREATE TABLE |
| Flags used | --single-transaction --quick --skip-lock-tables --routines --triggers --events |
| Retain until | 2026-08-12 (3 months) |

Steps

  1. Revert merge to main on astradial-platform (via GitHub PR)
  2. Run npx sequelize-cli db:migrate:undo twice on prod (and staging). Each invocation reverts only the most recent migration: the first run undoes tunnel_metrics, the second undoes customer_tunnels.
  3. systemctl stop wg-quick@wg1 on both VPSes
  4. Remove /etc/wireguard/wg1.conf, /etc/wireguard/wg1.private, /etc/wireguard/wg1.public, /usr/local/sbin/customer-tunnels-iptables.sh
  5. ufw delete allow 51821/udp
  6. Any active customer tunnels revert to direct-cloud registration (their phones already have this as a fallback)

If migration rollback fails — restore from snapshot:

ssh root@89.116.31.109 'pm2 stop astrapbx workflow-engine pipecat-flow editor'
ssh root@89.116.31.109 'zcat /root/pre-customer-tunnels-2026-05-12T115633Z/pbx_api_db.sql.gz | mariadb -uroot pbx_api_db'
ssh root@89.116.31.109 'pm2 restart astrapbx workflow-engine pipecat-flow editor'

RTO estimate: ~30 seconds for restore on a 290 KB dump + ~10 s for pm2 cycle = under 1 minute downtime.