# Customer Tunnels (WireGuard)
Per-org WireGuard tunnels that give customer PBXes a stable, encrypted path into Astradial's cloud Asterisk — eliminating CGNAT, multi-WAN failover, and dynamic-IP problems at the network layer.
Managed through the editor (editor.astradial.com), not by hand-editing config files. Per-org, version-controlled, auditable.
## Why this exists
Customers like V7 sit behind dynamic/CGNAT public IPs and often have multi-WAN failover. From the cloud's perspective, the customer's public IP appears to rotate between several values — sometimes within seconds. This breaks SIP in well-known ways:
- NAT pinholes die and get re-bound on different ports
- Cloud Asterisk's stored contact for a phone points at an IP that no longer routes
- Inbound calls during the recovery window go to the wrong place or nowhere
- Each phone independently fights the problem
A WireGuard tunnel between the customer's site router and our cloud gives both sides a fixed, stable tunnel IP regardless of what's happening at the public-IP layer. The customer's site router (e.g., Grandstream GWN7002) initiates the tunnel; our cloud accepts it; SIP traffic flows through the tunnel and the cloud sees the customer at a stable address forever.
WireGuard is identity-based, not IP-based: when the customer's WAN flips or CGNAT rotates the public IP, the tunnel survives because authentication is by cryptographic key, not by IP.
## Security model (read this first)
Customer tunnels handle external traffic and must be strictly isolated from Astradial's internal infrastructure.
### Layered defenses
```text
                  PUBLIC INTERNET
                         │
                         │
┌────────────────────────┼────────────────────────┐
│                                                 │
│            CLOUD VPS (89.116.31.109)            │
│                                                 │
│  ┌──────────────────┐    ┌──────────────────┐   │
│  │ wg0 (existing)   │    │ wg1 (NEW)        │   │
│  │ UDP 51820        │    │ UDP 51821        │   │
│  │ 10.10.10.0/24    │    │ 10.20.0.0/16     │   │
│  │                  │    │                  │   │
│  │ • NUC            │    │ • V7             │   │
│  │ • Staging        │    │ • Future cust N  │   │
│  └────────┬─────────┘    └────────┬─────────┘   │
│           │                       │             │
│           └──── iptables DROP ────┘             │
│              (no cross-traffic)                 │
│                                                 │
│    Cloud services (Asterisk, AstraPBX, etc.)    │
└─────────────────────────────────────────────────┘
```
### Defense layers
| Layer | Mechanism | Purpose |
|---|---|---|
| 1. Separate interface | wg1 distinct from existing wg0 | Cleanly isolates customer traffic from internal infra |
| 2. Separate UDP port | wg1 listens on 51821 (not 51820) | Per-port firewall rules + audit; fail2ban distinguishes attack surfaces |
| 3. Crypto-tight AllowedIPs | Each peer's AllowedIPs = the /32 of its tunnel IP, plus its customer_lan_cidr when set (PR #146) | WireGuard refuses spoofed source IPs |
| 4. iptables FORWARD drops | wg1↔wg0 and wg1↔wg1 are explicit DROP | Customer can't reach NUC, staging, or other customers — even if forwarding is enabled |
| 5. iptables INPUT scope | wg1 traffic only allowed to SIP/RTP ports | Customer can't reach SSH (22), AstraPBX API (8000), editor (3001), MariaDB (3306) |
| 6. PJSIP endpoint match | V7's PJSIP endpoints match 10.20.7.0/30 only | Customer's tunnel traffic can only reach their own org's extensions |
| 7. API-level RBAC | Tunnel CRUD requires admin role on the org | Customers can't create/modify tunnels themselves |
| 8. Private key never exposed via API | Server's WG private key stays on disk; only public key + PSK returned to UI | Compromise of editor doesn't compromise server identity |
| 9. Audit log | Every tunnel CRUD writes to audit_log table | Forensic trail of who did what |
| 10. Input validation | Pubkey format, allowed-IP CIDR, name regex all validated | Reject malformed input early |
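Layer 10 reduces to a few pure checks. The sketch below is illustrative, not the validator chain in customer-tunnels.js: NAME_RE is a hypothetical pattern (the page only says names are regex-validated, and the schema caps them at 64 characters), and isValidWgPubkey checks only the encoding of a WireGuard key (32 bytes, base64-encoded to 44 characters).

```javascript
// Hypothetical validators for tunnel input (layer 10). NAME_RE is an
// assumption; the real pattern in customer-tunnels.js may differ.
const NAME_RE = /^[A-Za-z0-9_-]{1,64}$/;

// A WireGuard public key is 32 bytes, base64-encoded: 43 chars + one '=' pad.
function isValidWgPubkey(key) {
  if (typeof key !== 'string' || !/^[A-Za-z0-9+\/]{43}=$/.test(key)) return false;
  return Buffer.from(key, 'base64').length === 32;
}

// Strict IPv4 CIDR: no leading zeros, octets 0-255, prefix 0-32, no host bits set.
function isValidIpv4Cidr(cidr) {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\/(\d{1,2})$/.exec(String(cidr));
  if (!m) return false;
  const parts = m.slice(1, 6); // four octets + prefix, still as strings
  const nums = parts.map(Number);
  if (nums.some((n, i) => String(n) !== parts[i])) return false; // e.g. "010"
  if (nums.slice(0, 4).some((o) => o > 255) || nums[4] > 32) return false;
  const [a, b, c, d, prefix] = nums;
  const ip = ((a << 24) | (b << 16) | (c << 8) | d) >>> 0;
  const mask = prefix === 0 ? 0 : (0xffffffff << (32 - prefix)) >>> 0;
  // 192.168.0.0/24 passes; 192.168.0.5/24 has host bits set and fails
  return ((ip & mask) >>> 0) === ip;
}

function isValidTunnelName(name) {
  return typeof name === 'string' && NAME_RE.test(name);
}
```

Rejecting anything that is not a strict CIDR matters beyond tidy input: customer_lan_cidr later ends up both in AllowedIPs and in `ip route` arguments.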
### What a customer CAN reach via their tunnel
✅ Cloud's wg1 tunnel IP (e.g. 10.20.7.1) — for SIP signaling/RTP only
✅ Cloud Asterisk on 5060/UDP, 5080/UDP, 10000-20000/UDP (RTP)
### What a customer CANNOT reach
❌ NUC (10.10.10.2) — different interface + iptables DROP
❌ Staging (10.10.10.3) — different interface + iptables DROP
❌ Other customer tunnels — wg1-to-wg1 iptables DROP
❌ SSH (22) — INPUT policy on wg1
❌ AstraPBX API (8000) — INPUT policy on wg1
❌ Editor (3001) — INPUT policy on wg1
❌ MariaDB / Postgres / Redis — bind to localhost only + INPUT policy
❌ Any other org's PJSIP endpoints — PJSIP `match` is per-org /30
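Layers 3 and 6 both come down to the same test: is a source address inside a given prefix (the peer's /32 for WireGuard, the org's /30 for the PJSIP `match`)? A minimal sketch of that test; ipToInt and cidrContains are illustrative helpers, not functions from the codebase.

```javascript
// Illustrative CIDR-membership helpers (not from the codebase).
function ipToInt(ip) {
  // Fold four dotted-quad octets into an unsigned 32-bit integer
  return ip.split('.').reduce((acc, octet) => ((acc << 8) | Number(octet)) >>> 0, 0);
}

function cidrContains(cidr, ip) {
  const [base, bits] = cidr.split('/');
  const prefix = Number(bits);
  // JS shifts are mod 32, so /0 needs an explicit all-zero mask
  const mask = prefix === 0 ? 0 : (0xffffffff << (32 - prefix)) >>> 0;
  return ((ipToInt(ip) & mask) >>> 0) === ((ipToInt(base) & mask) >>> 0);
}
```

With V7's 10.20.7.0/30, a request sourced from another customer's tunnel IP such as 10.20.8.2 fails the check, which is the isolation the last item above describes.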
## Architecture
### Subnet allocation
| Range | Use | Notes |
|---|---|---|
| 10.10.10.0/24 | Internal infra (wg0) | EXISTING. NUC (.2), Staging (.3). Do not add customers here. |
| 10.20.0.0/16 | Customer tunnels (wg1) | New pool. Allocator picks the next free /30 per customer. |
| 10.20.N.0/30 | Per-customer /30 | Cloud peer .1, customer peer .2 (.0 network and .3 broadcast unused). |
### Per-customer /30 layout (example: V7)
```text
10.20.7.0/30:
  10.20.7.0 → network address (unused)
  10.20.7.1 → cloud-side tunnel IP
  10.20.7.2 → customer-side tunnel IP
  10.20.7.3 → broadcast (unused)
```
Each customer's /30 is independently allocated by the API at tunnel creation time.
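The layout above is plain arithmetic on the 32-bit address. A small sketch (describeSlash30 is a hypothetical helper, not from the codebase) that derives the four roles from a /30 and rejects misaligned input:

```javascript
// Hypothetical helper: derive the per-customer /30 layout from its CIDR.
function intToIp(n) {
  return [24, 16, 8, 0].map((shift) => (n >>> shift) & 255).join('.');
}

function describeSlash30(cidr) {
  const [base, bits] = cidr.split('/');
  if (bits !== '30') throw new Error(`expected a /30, got /${bits}`);
  const net = base.split('.').reduce((acc, o) => ((acc << 8) | Number(o)) >>> 0, 0);
  // A /30 must start on a 4-address boundary (.0, .4, .8, ...)
  if (net % 4 !== 0) throw new Error(`${cidr} is not aligned to a /30 boundary`);
  return {
    network: intToIp(net),       // unused
    cloud: intToIp(net + 1),     // cloud-side tunnel IP
    customer: intToIp(net + 2),  // customer-side tunnel IP
    broadcast: intToIp(net + 3), // unused
  };
}
```

The cloud-side address is nominal in practice: issue #1 in the onboarding log records that wg1 only binds 10.20.0.1, so customer configs point there regardless of their /30.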
### Components
| Component | Type | Lives in |
|---|---|---|
| customer_tunnels table | DB (MariaDB) | pbx_api_db |
| CustomerTunnel Sequelize model | Backend | api/src/models/CustomerTunnel.js |
| customer-tunnels.js routes | API | api/src/routes/customer-tunnels.js |
| wireguardGenerator.js service | Backend | api/src/services/asterisk/wireguardGenerator.js |
| wireguardApplier.js service | Backend | api/src/services/asterisk/wireguardApplier.js |
| Subnet allocator | Backend service | api/src/services/network/subnetAllocator.js |
| Network Tunnels UI | Frontend | editor/app/dashboard/[orgId]/settings/page.tsx (new tab) |
| wg1 interface | System (per-VPS) | /etc/wireguard/wg1.conf (generated by wireguardGenerator) |
| iptables rules | System (per-VPS) | /etc/iptables/customer-tunnels.v4 (set once at bootstrap) |
### Database schema
```javascript
// api/database/migrations/YYYYMMDDhhmmss-create-customer-tunnels.js
'use strict';

module.exports = {
  up: async (queryInterface, Sequelize) => {
    await queryInterface.createTable('customer_tunnels', {
      id: {
        type: Sequelize.UUID,
        defaultValue: Sequelize.UUIDV4,
        primaryKey: true,
      },
      org_id: {
        type: Sequelize.UUID,
        allowNull: false,
        references: { model: 'organizations', key: 'id' },
        onDelete: 'CASCADE',
      },
      name: {
        type: Sequelize.STRING(64),
        allowNull: false,
      },
      tunnel_subnet: {
        type: Sequelize.STRING(18), // CIDR (e.g. "10.20.7.0/30")
        allowNull: false,
        unique: true,
      },
      cloud_tunnel_ip: {
        type: Sequelize.STRING(15),
        allowNull: false,
      },
      customer_tunnel_ip: {
        type: Sequelize.STRING(15),
        allowNull: false,
      },
      customer_pubkey: {
        type: Sequelize.STRING(64), // base64-encoded 32-byte WG pubkey
        allowNull: false,
      },
      preshared_key: {
        type: Sequelize.STRING(64),
        allowNull: false,
      },
      persistent_keepalive: {
        type: Sequelize.INTEGER,
        defaultValue: 25,
      },
      listen_port: {
        type: Sequelize.INTEGER,
        defaultValue: 51821,
      },
      interface_name: {
        type: Sequelize.STRING(16),
        defaultValue: 'wg1',
      },
      status: {
        type: Sequelize.ENUM('active', 'disabled', 'revoked'),
        defaultValue: 'active',
        allowNull: false,
      },
      notes: {
        type: Sequelize.TEXT,
      },
      created_at: { type: Sequelize.DATE, defaultValue: Sequelize.NOW },
      updated_at: { type: Sequelize.DATE, defaultValue: Sequelize.NOW },
      created_by_user_id: {
        type: Sequelize.UUID,
        references: { model: 'org_users', key: 'id' },
        onDelete: 'SET NULL',
      },
    });
    await queryInterface.addIndex('customer_tunnels', ['org_id']);
    await queryInterface.addIndex('customer_tunnels', ['status']);
    await queryInterface.addIndex('customer_tunnels', ['org_id', 'name'], { unique: true });
  },

  down: async (queryInterface) => {
    await queryInterface.dropTable('customer_tunnels');
  },
};
```
Notes:

- customer_pubkey and preshared_key are stored in plaintext, consistent with how sip_trunks.password is stored today (see v7-setup.md and existing models). Plain WireGuard pubkeys are not secrets; PSKs are. TODO: encrypt PSKs at rest as part of a broader DB-secret-encryption initiative.
- org_id cascades on org deletion (the tunnel is removed automatically if its org is deleted).
- The unique index on (org_id, name) prevents duplicate names within an org; the unique constraint on tunnel_subnet prevents subnet collisions globally.
### API surface
Routes live in api/src/routes/customer-tunnels.js, mounted at /api/v1/customer-tunnels. They follow the same auth + org-scoping pattern as queues.js (JWT middleware → req.user.org_id filter on every query).
| Method | Path | RBAC | Purpose |
|---|---|---|---|
| GET | /api/v1/customer-tunnels | admin | List tunnels for the requesting org |
| GET | /api/v1/customer-tunnels/:id | admin | Get one tunnel, including live status (last handshake, bytes transferred) |
| POST | /api/v1/customer-tunnels | admin | Create. Body: { name, customer_pubkey, notes? }. The server allocates a /30, generates a PSK, writes wg1.conf, and returns the full record, including the [Peer] config block for the customer to paste on their side |
| PATCH | /api/v1/customer-tunnels/:id | admin | Update { status, notes }. Switching to disabled removes the peer from wg1 but keeps the DB row. |
| DELETE | /api/v1/customer-tunnels/:id | admin | Revoke: removes the peer from wg1 and marks status revoked. The subnet is NOT immediately reused (kept reserved for 30 days for audit). |
| GET | /api/v1/customer-tunnels/:id/customer-config | admin | Returns the customer-side [Peer] block as plain text for copy/paste into GDMS |
Every route runs the existing JWT + RBAC middleware (requireRole('admin')), validates input via express-validator, and writes an entry to audit_log on every mutation.
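The POST handler's "generates PSK" step and the customer-config endpoint can be sketched as two pure functions. Assumptions, labeled: generatePresharedKey uses Node's crypto.randomBytes to produce the same format as `wg genpsk` (32 random bytes, base64); renderCustomerPeerBlock and its parameter names are hypothetical (cloudPublicIp would presumably come from CLOUD_PUBLIC_IP); AllowedIPs carries only 10.20.0.1/32, following the PR #144/#145 fixes recorded in the onboarding log.

```javascript
const crypto = require('crypto');

// Same format as `wg genpsk`: 32 random bytes, base64-encoded (44 chars).
function generatePresharedKey() {
  return crypto.randomBytes(32).toString('base64');
}

// Hypothetical renderer for GET /:id/customer-config. Per PR #144/#145,
// AllowedIPs holds only the cloud tunnel IP, always 10.20.0.1/32.
function renderCustomerPeerBlock({ serverPublicKey, presharedKey, cloudPublicIp, keepalive = 25 }) {
  return [
    '[Peer]',
    `PublicKey = ${serverPublicKey}`,
    `PresharedKey = ${presharedKey}`,
    `Endpoint = ${cloudPublicIp}:51821`,
    'AllowedIPs = 10.20.0.1/32',
    `PersistentKeepalive = ${keepalive}`, // keeps the customer's NAT/CGNAT mapping open
    '',
  ].join('\n');
}
```

Keepalive belongs on the customer side because the customer router initiates the tunnel from behind NAT; the cloud cannot reopen a mapping that has expired.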
### Subnet allocator (api/src/services/network/subnetAllocator.js)
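```javascript
// api/src/services/network/subnetAllocator.js
const { Op } = require('sequelize'); // from the package, not the models registry (PR #142)
const { CustomerTunnel } = require('../../models'); // registry path assumed

const REVOKED_HOLD_MS = 30 * 24 * 60 * 60 * 1000; // revoked subnets stay reserved for 30 days

async function allocateNextAvailable() {
  // In use: active + disabled tunnels, plus tunnels revoked less than 30 days ago
  // (assumes updated_at is set when a tunnel is revoked). Because tunnel_subnet
  // is UNIQUE, reusing an expired revoked subnet also requires clearing the old
  // row's value first (not shown).
  const rows = await CustomerTunnel.findAll({
    where: {
      [Op.or]: [
        { status: { [Op.in]: ['active', 'disabled'] } },
        { status: 'revoked', updated_at: { [Op.gt]: new Date(Date.now() - REVOKED_HOLD_MS) } },
      ],
    },
    attributes: ['tunnel_subnet'],
  });
  const used = new Set(rows.map((r) => r.tunnel_subnet));

  // First-fit walk over aligned /30s:
  // 10.20.0.4/30, 10.20.0.8/30, ... 10.20.0.252/30, 10.20.1.0/30, ...
  for (let octet3 = 0; octet3 < 256; octet3++) {
    // Skip only 10.20.0.0/30: it contains 10.20.0.1, the wg1 interface address
    for (let octet4 = octet3 === 0 ? 4 : 0; octet4 < 256; octet4 += 4) {
      const candidate = `10.20.${octet3}.${octet4}/30`;
      if (!used.has(candidate)) {
        return {
          subnet: candidate,
          cloud_ip: `10.20.${octet3}.${octet4 + 1}`,
          customer_ip: `10.20.${octet3}.${octet4 + 2}`,
        };
      }
    }
  }
  throw new Error('Subnet pool exhausted (10.20.0.0/16 fully allocated)');
}

module.exports = { allocateNextAvailable };
```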
Capacity: 10.20.0.0/16 has 16,384 /30s. We will not exhaust this in any realistic scenario.
### WireGuard config generator (api/src/services/asterisk/wireguardGenerator.js)
Mirrors the pattern of dialplanGenerator.js. Reads the active tunnels from the DB (disabled and revoked tunnels get no [Peer] block) and emits the full wg1.conf from scratch on every regeneration:
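```javascript
// api/src/services/asterisk/wireguardGenerator.js
const { CustomerTunnel } = require('../../models'); // registry path assumed

async function generateWg1Config() {
  // Helper not shown: reads /etc/wireguard/wg1.private (root-only)
  const serverPrivateKey = await readServerPrivateKey();
  // preshared_key is loaded through the withSecrets scope (see PR #143)
  const tunnels = await CustomerTunnel.scope('withSecrets').findAll({
    where: { status: 'active' }, // disabled/revoked tunnels get no [Peer] block
    order: [['created_at', 'ASC']],
  });
  let conf = `# AUTO-GENERATED by AstraPBX wireguardGenerator. DO NOT EDIT BY HAND.
# Source of truth: customer_tunnels table in pbx_api_db.
# Generated: ${new Date().toISOString()}
[Interface]
Address = 10.20.0.1/16
ListenPort = 51821
PrivateKey = ${serverPrivateKey}
PostUp = /usr/local/sbin/customer-tunnels-iptables.sh up
PostDown = /usr/local/sbin/customer-tunnels-iptables.sh down
`;
  for (const t of tunnels) {
    // Tunnel /32, plus the customer LAN when customer_lan_cidr is set (PR #146)
    const allowedIps = [`${t.customer_tunnel_ip}/32`, t.customer_lan_cidr]
      .filter(Boolean)
      .join(', ');
    conf += `
# org=${t.org_id} name=${t.name} created=${t.created_at.toISOString()}
[Peer]
PublicKey = ${t.customer_pubkey}
PresharedKey = ${t.preshared_key}
AllowedIPs = ${allowedIps}
PersistentKeepalive = ${t.persistent_keepalive}
`;
  }
  return conf;
}

module.exports = { generateWg1Config };
```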
The server's WG private key is generated once during the wg1 bootstrap (see Bootstrap procedure) and stays in /etc/wireguard/wg1.private (mode 600, owned by root:root). It is never returned via the API and never logged.
### WireGuard applier (api/src/services/asterisk/wireguardApplier.js)
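```javascript
// api/src/services/asterisk/wireguardApplier.js
const fs = require('fs/promises');
const { promisify } = require('util');
const exec = promisify(require('child_process').exec);
const { generateWg1Config } = require('./wireguardGenerator');

const WG1_CONF = '/etc/wireguard/wg1.conf';

async function applyWg1() {
  const conf = await generateWg1Config();
  // 1. Write to a staging file in the same directory; the rename in step 3
  //    is what makes the swap atomic (same filesystem)
  const tmp = `${WG1_CONF}.new`;
  await fs.writeFile(tmp, conf, { mode: 0o600 });
  // 2. Back up the current config (ignored if none exists yet)
  const backup = `${WG1_CONF}.bak-${Date.now()}`;
  await fs.copyFile(WG1_CONF, backup).catch(() => {});
  // 3. Move into place atomically
  await fs.rename(tmp, WG1_CONF);
  // 4. Hot-reload (no tunnel restart for unchanged peers); <(...) needs bash
  await exec('wg syncconf wg1 <(wg-quick strip wg1)', { shell: '/bin/bash' });
  // 5. Verify wg show succeeds
  const { stdout } = await exec('wg show wg1');
  // Count the [Peer] sections in the config just applied
  const peerCount = (conf.match(/^\[Peer\]$/gm) || []).length;
  return { applied: true, peer_count: peerCount, wg_status: stdout };
}

module.exports = { applyWg1 };
```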
wg syncconf is a hot-reload that only changes peers that differ — existing tunnels are not disrupted when a new peer is added. Backups in /etc/wireguard/wg1.conf.bak-<ts> are retained 30 days then garbage-collected by a cron job.
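The 30-day backup retention reduces to a filename filter, because the suffix is the Date.now() millisecond timestamp the applier writes. A minimal sketch of the selection a cron job could use; selectExpiredBackups is hypothetical, and the actual cron job is not shown on this page.

```javascript
// Hypothetical selection logic for the backup GC cron job.
// Backups are named wg1.conf.bak-<ms since epoch> by the applier.
const RETENTION_MS = 30 * 24 * 60 * 60 * 1000;

function selectExpiredBackups(filenames, nowMs = Date.now()) {
  return filenames.filter((name) => {
    const m = /^wg1\.conf\.bak-(\d+)$/.exec(name);
    // Anything that is not a timestamped backup (wg1.conf, wg1.private, ...) is kept
    return m !== null && nowMs - Number(m[1]) > RETENTION_MS;
  });
}
```

A real job would feed it the entries of /etc/wireguard and unlink what it returns; matching the full name pattern keeps wg1.conf, wg1.conf.new and the key files out of reach.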
## Bootstrap procedure (per-VPS, one-time)
This is infrastructure setup, not feature code. Done once on staging during development, then once on prod before the feature ships. Documented here as a runbook.
```bash
# 1. Generate server WG keys (once per VPS)
umask 077
wg genkey > /etc/wireguard/wg1.private
wg pubkey < /etc/wireguard/wg1.private > /etc/wireguard/wg1.public
chmod 600 /etc/wireguard/wg1.private
chmod 644 /etc/wireguard/wg1.public

# 2. Write iptables helper script
cat > /usr/local/sbin/customer-tunnels-iptables.sh <<'EOF'
#!/bin/bash
# Applies iptables rules when wg1 comes up; removes them when wg1 goes down.
# Called from PostUp/PostDown in /etc/wireguard/wg1.conf.
#
# Rules are inserted at position 1 (-I CHAIN 1) instead of appended (-A):
# ufw's jump chains already sit in INPUT/FORWARD, so an appended rule is only
# reached after ufw has had a chance to ACCEPT the packet (a global
# "allow 22/tcp" would admit SSH from wg1). Each insert lands above the
# previous one, so the rules below are written in reverse evaluation order.
set -e
case "$1" in
  up)
    # Block customer→internal infra and customer→customer
    iptables -I FORWARD 1 -i wg1 -o wg0 -j DROP
    iptables -I FORWARD 1 -i wg0 -o wg1 -j DROP
    iptables -I FORWARD 1 -i wg1 -o wg1 -j DROP
    # Drop everything else from wg1 (ends up last of the wg1 INPUT rules)
    iptables -I INPUT 1 -i wg1 -j DROP
    # Allow only SIP + RTP into the cloud (ends up above the DROP)
    iptables -I INPUT 1 -i wg1 -p udp --dport 10000:20000 -j ACCEPT
    iptables -I INPUT 1 -i wg1 -p udp --dport 5080 -j ACCEPT
    iptables -I INPUT 1 -i wg1 -p udp --dport 5060 -j ACCEPT
    ;;
  down)
    # -D matches on the rule specification, so position does not matter here
    iptables -D FORWARD -i wg1 -o wg0 -j DROP || true
    iptables -D FORWARD -i wg0 -o wg1 -j DROP || true
    iptables -D FORWARD -i wg1 -o wg1 -j DROP || true
    iptables -D INPUT -i wg1 -p udp --dport 5060 -j ACCEPT || true
    iptables -D INPUT -i wg1 -p udp --dport 5080 -j ACCEPT || true
    iptables -D INPUT -i wg1 -p udp --dport 10000:20000 -j ACCEPT || true
    iptables -D INPUT -i wg1 -j DROP || true
    ;;
esac
EOF
chmod 755 /usr/local/sbin/customer-tunnels-iptables.sh

# 3. Initial empty wg1.conf (no peers yet; AstraPBX populates it via wireguardGenerator)
cat > /etc/wireguard/wg1.conf <<EOF
[Interface]
Address = 10.20.0.1/16
ListenPort = 51821
PrivateKey = $(cat /etc/wireguard/wg1.private)
PostUp = /usr/local/sbin/customer-tunnels-iptables.sh up
PostDown = /usr/local/sbin/customer-tunnels-iptables.sh down
EOF
chmod 600 /etc/wireguard/wg1.conf

# 4. Open UDP 51821 in ufw
ufw allow 51821/udp comment 'WireGuard customer tunnels'

# 5. Enable and start the service
systemctl enable --now wg-quick@wg1

# 6. Verify
wg show wg1
ip link show wg1
iptables -L FORWARD -v -n --line-numbers | head -n 5   # wg1 DROP rules should be rules 1-3
iptables -L INPUT -v -n --line-numbers | head -n 6     # wg1 rules should be 1-4, above the ufw-* jumps
```
In the monorepo, this bootstrap is packaged as scripts/setup/wg1-bootstrap.sh and run by the deploy script during initial feature rollout; it supports --check and --dry-run modes for safety.
## Editor UI
A new tab on the Org Settings page (editor/app/dashboard/[orgId]/settings/page.tsx):
```text
Org Settings
├── Organization (existing)
├── Asterisk Configuration (existing)
├── Session (existing)
└── Network Tunnels (NEW)
    ├── List of active tunnels for this org
    │   └── For each: name, subnet, last handshake, status, [⋮ actions]
    ├── [+ Add Tunnel] button → modal:
    │     - Name (default "astradial-<orgshortname>")
    │     - Customer pubkey (textarea; paste from GDMS)
    │     - Notes (optional)
    │     [Submit]
    └── [View customer config] action → modal:
          - Renders the [Peer] block for the customer to paste in GDMS
          - "Copy to clipboard" button
          - Pre-shared key shown with reveal/hide toggle
```
UI calls the new API via pbxCustomerTunnels.list(), pbxCustomerTunnels.create(), etc. — added to the existing lib/pbx/client.ts.
## Staging-first rollout plan
| Step | What | Where | Reversible? |
|---|---|---|---|
| 1 | Run bootstrap on staging VPS | 94.136.188.221 | Yes — remove wg1 service + uninstall iptables script |
| 2 | Merge feature branch (DB migration + backend) to staging branch | GitHub PR | Yes — revert PR |
| 3 | CI deploys backend to staging VPS | auto | — |
| 4 | Run migration on staging DB | npx sequelize-cli db:migrate | Yes — db:migrate:undo |
| 5 | Merge frontend feature branch to staging | GitHub PR | Yes — revert PR |
| 6 | E2E test: create a test tunnel via UI for a test org, configure a test client (e.g., a Linux box with wireguard-tools), verify tunnel establishes | Staging | — |
| 7 | E2E test: simulate a SIP registration arriving at staging Asterisk from a tunnel IP and verify the endpoint match works | Staging | — |
| 8 | Run for 24h, monitor wg show wg1, monitor staging Asterisk logs for unexpected drops | Staging | — |
| 9 | Bootstrap prod VPS | 89.116.31.109 | Yes — same removal procedure |
| 10 | Merge staging → main | GitHub PR | Yes — revert PR |
| 11 | CI deploys to prod, runs migration on prod DB | auto | Yes — db:migrate:undo on prod |
| 12 | Onboard V7 as the first real customer via the UI | Editor on prod | Yes — disable tunnel + revert phones to direct cloud registration |
No prod changes until Steps 1–8 are clean on staging.
## Production rollout: actual execution log
The order in the table above was theoretical. Actual prod rollout used a slightly different order to avoid a sequelize.sync() race (prod's server.js line 6223 still calls sync() at boot — fixed in PR #138 but not yet on main). Running migrations BEFORE the code merge means tables exist before any sync() runs, avoiding MariaDB 11 FK collision (1061) that bit us in PRs #125 and #131.
| Stage | What | Result |
|---|---|---|
| 1 | Backup prod DB | /root/pre-customer-tunnels-2026-05-12T115633Z/pbx_api_db.sql.gz, 290 KB, 30 tables, gzip verified |
| 2 | SCP migration files to /opt/astrapbx/database/migrations/, run npx sequelize-cli db:migrate | Both migrated cleanly (0.239s + 0.080s). customer_tunnels + tunnel_metrics exist, empty, SequelizeMeta updated. astrapbx pm2 process untouched (no restart). |
| 3 | Merge staging → main (PR #140, 28 commits, +7.2k LOC) | CI green: Deploy API to production 25s, Check API routes ✓. Editor deploy queued (self-hosted runner backlog, not blocking). pm2 reload astrapbx graceful — PID rotated, uptime fresh, zero dropped requests. WireGuard status poller started (60s interval) confirmed at startup. wg-poller logs [wg-poller] cycle failed: Command failed: wg show wg1 dump once per minute as expected (wg1 absent until Stage 4). CDR poller resumed at last ID 4246 (PR #139 fix verified). AMI + ARI connections re-established. /health 200. /api/v1/customer-tunnels returns 401 unauthorized (route mounted, auth middleware firing). DB unchanged: both tables still empty. CLOUD_PUBLIC_IP set on prod .env before merge so reload picked it up. |
| 4 | Run wg1-bootstrap.sh on prod VPS | --check and --dry-run clean (port 51821 free, 10.20.0.0/16 unrouted, wg0 present, all 7 pre-flight PASS). Live run exit 0 at 17:45:28 IST: keypair generated, helper installed at /usr/local/sbin/customer-tunnels-iptables.sh (mode 755), wg1.conf written (mode 600, interface block + 0 peers), ufw allow 51821/udp added, wg-quick@wg1 enabled+active. Verification: wg show wg1 returns 0 peers / port 51821, ip a wg1 shows UP at 10.20.0.1/16, iptables FORWARD has 3 DROP rules (wg1↔wg0 + wg1↔wg1), iptables INPUT has 3 ACCEPT (5060/5080/10000-20000) + 1 catch-all DROP, syslog wg1-iptables logged all 7 rule additions + "up: complete". wg-poller transitioned silently — error log mtime frozen at 17:44:52, no new failures after wg1 came up. wg0 untouched (NUC peer still handshaking ~20s fresh). Asterisk untouched (0 channels during run). Server public key: 0Dfkqmj3UFLCN4mmG+Cp2j7VfP4J75iOyA+AZUxKQng= (paste into customer router peer config). Transcript: /var/log/wg1-bootstrap-20260512-174526.log on prod. |
| 5 | ~~Set CLOUD_PUBLIC_IP=89.116.31.109 in prod .env~~ | DONE (folded into Stage 3 pre-flight) |
| 6 | Onboard V7 as first real customer via Editor UI | FULLY WORKING after PR #146 + PR #148. Tunnel V7_Tirupathur created (subnet 10.20.0.0/30, customer pubkey ylkqY4S7ahWWAD3L2m10dv2r+TRflPupWxCwfZ51hAI=, customer_lan_cidr 192.168.0.0/24 set via Editor "Edit Tunnel" dialog). Reception phone (ext 09) registers via tunnel from 192.168.0.76 → 10.20.0.1:5080, RTT ~172ms, active call confirmed (Endpoint: org_moijhj2l__09/09 Ringing 1 of inf). Endpoint roamed 103.197.113.158:33252 → 120.60.105.158:51820 mid-session (WG roaming verified ✓). Failover policy "BSNL-Primary-Rail-Backup" configured in GDMS Internet Source. Mac softphone (org_mo8vbv60__1003 + org_moijhj2l__01) registered via public path on UDP transport — TCP transport failed qualify (transport mismatch with V7 endpoints which use transport-udp). |
### Architectural issues found during V7 onboarding (all resolved)
| # | Issue | Severity | Resolution |
|---|---|---|---|
| 1 | Per-customer cloud_tunnel_ip varies by /30 but wg1 only binds 10.20.0.1 | P0 — would have blocked customer #2 | PR #145 — Editor customer-config now always returns 10.20.0.1 regardless of the customer's /30 |
| 2 | Customer-config recommended 89.116.31.109/32 in AllowedIPs causing GWN7002 routing-loop rejection | P1 — blocked any Grandstream-router customer | PR #144 — cloud_routed_ips defaults to empty; only the tunnel-side IP is added |
| 3 | No SNAT/MASQUERADE on customer side — server-side WG cryptokey routing rejected packets from customer LAN | P0 — blocked all phone registrations via tunnel | PR #146 — new customer_lan_cidr field, validated server-side, expanded into server peer's AllowedIPs; Editor "Edit Tunnel" UI to set it post-create |
| 4 | wg syncconf updates AllowedIPs but NOT kernel routing table — responses to customer LAN went out eth0 instead of wg1 | P0 — found mid-session, manually unblocked via ip route add 192.168.0.0/24 dev wg1 | PR #148 — syncCustomerLanRoutes in the applier auto-manages kernel routes on every tunnel apply (add/remove with proper diff, idempotent, defense-in-depth against shell injection) |
### Softphone gotchas (operator-level)
Discovered during V7 onboarding. Worth checking first when a softphone fails to register or shows "qualify failed":
- SIP transport mismatch. Astradial's PJSIP endpoints (per-org PJSIP confs) are configured with `transport=transport-udp`. If a softphone client defaults to TCP (Telephone.app on macOS does this on first install), the REGISTER may succeed but `qualify` (OPTIONS keepalive) fails with no clear error: the endpoint shows `Unavailable` even though the contact is in the AOR. Fix: set SIP Transport to UDP in the softphone's account settings.
- fail2ban bans after repeated wrong-auth attempts. Three consecutive failed auths trip the `asterisk-auth` jail, and the source IP gets blocked at iptables (BEFORE Asterisk sees subsequent packets). Symptom: registration just goes silent, with no Asterisk log. Check: `ssh root@89.116.31.109 'fail2ban-client status asterisk-auth'`. Unban: `fail2ban-client set asterisk-auth unbanip <ip>`.
- NAT keep-alive for inbound calls. Softphones behind CGNAT/NAT need to send periodic keep-alive packets (every 25-30 s) so the NAT mapping stays open and Asterisk's OPTIONS qualify round-trips. Most apps have a "NAT Keep-Alive" setting. Without it, registration succeeds but `Contact Status` shows `NonQual` with `-nan` RTT, and inbound calls never ring.
- Per-extension credentials are org-scoped. SIP User ID = `<ext>` (e.g., `09`); Authentication ID = `org_<org-prefix>__<ext>` (e.g., `org_moijhj2l__09`). If the softphone uses just `09` as the auth ID, Asterisk's PJSIP can't find a matching endpoint and replies `No matching endpoint found` (logged as 401 in the PJSIP logger, but with an empty AOR match).
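The credential convention in the last item is mechanical enough to write down. A sketch with sipCredentialsFor as a hypothetical helper (not from the codebase); the org-prefix pattern is an assumption based on the examples on this page (moijhj2l, mo8vbv60).

```javascript
// Hypothetical helper for the convention above:
// SIP User ID = <ext>, Authentication ID = org_<org-prefix>__<ext> (double underscore).
function sipCredentialsFor(orgPrefix, ext) {
  if (!/^[a-z0-9]+$/.test(orgPrefix)) throw new Error(`unexpected org prefix: ${orgPrefix}`);
  // Extensions stay strings so "09" keeps its leading zero
  if (typeof ext !== 'string' || !/^\d+$/.test(ext)) throw new Error(`unexpected extension: ${ext}`);
  return { userId: ext, authId: `org_${orgPrefix}__${ext}` };
}
```

Rejecting a numeric extension is deliberate: 9 and "09" are different extensions, and a number would silently lose the leading zero.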
### Operations: route inspection and reboot behavior
After PR #148, kernel routes for customer LANs are auto-managed by the applier. Two things to know:
How routes flow at runtime:

- Editor → POST/PATCH/DELETE /customer-tunnels → applyWg1Config(), which:
  1. Renders wg1.conf (peers include customer_lan_cidr in AllowedIPs)
  2. Atomically writes it and runs `wg syncconf wg1 <(wg-quick strip wg1)` (cryptokey layer)
  3. Calls syncCustomerLanRoutes(), which diffs desired vs current routes via `ip -4 route show dev wg1`, then runs `ip route add`/`ip route del` to converge
- The result is returned to the caller as apply.route_sync.{added, removed, unchanged, errors} and surfaced in the Editor UI as a warning toast if any errors occurred.
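The diff step of syncCustomerLanRoutes() can be sketched as a pure function over the `ip -4 route show dev wg1` text. This illustrates the described behavior and is not the PR #148 code: the name diffLanRoutes and the rule of leaving `proto kernel` routes (the 10.20.0.0/16 interface route) untouched are assumptions.

```javascript
// Illustrative sketch of the route diff (not the PR #148 code).
// `current` is the text output of `ip -4 route show dev wg1`.
function diffLanRoutes(desiredCidrs, current) {
  const present = new Set();
  for (const line of current.split('\n')) {
    const [dest, ...rest] = line.trim().split(/\s+/);
    if (!dest) continue;
    // Leave the interface's own subnet route (e.g. 10.20.0.0/16) alone
    if (rest.join(' ').includes('proto kernel')) continue;
    // `ip route` prints host routes without "/32"; normalize for comparison
    present.add(dest.includes('/') ? dest : `${dest}/32`);
  }
  const desired = new Set(desiredCidrs);
  return {
    added: [...desired].filter((c) => !present.has(c)),
    removed: [...present].filter((c) => !desired.has(c)),
    unchanged: [...desired].filter((c) => present.has(c)),
  };
}
```

Applying the result with child_process.execFile('ip', ['route', 'add', cidr, 'dev', 'wg1']) (an argument array, no shell) is one way to get the shell-injection defense the issue table mentions; that wiring is not shown.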
Inspect routes manually:
```bash
ssh root@89.116.31.109 'ip -4 route show dev wg1'
# Expect:
# 10.20.0.0/16 proto kernel scope link src 10.20.0.1   ← wg-quick manages this
# 192.168.0.0/24 scope link                            ← V7 LAN, syncCustomerLanRoutes manages this
```
Reboot behavior:

- `ip route add` changes kernel runtime state only; nothing is persisted to disk.
- On reboot, wg-quick@wg1.service starts before astrapbx and reads /etc/wireguard/wg1.conf.
- For each peer, wg-quick installs routes for its AllowedIPs entries, so 192.168.0.0/24 is re-added automatically.
- Persistence therefore lives at the wg1.conf level (which the applier DOES write to disk via its atomic-write protocol); the kernel routing table is rebuilt at boot from that source of truth.
Recovery for V7 if route somehow goes missing without a reboot:
```bash
# Any tunnel mutation triggers route-sync; simplest is a no-op PATCH:
curl -X PATCH https://devpbx.astradial.com/api/v1/customer-tunnels/<V7-ID> \
  -H "Authorization: Bearer <token>" -H "Content-Type: application/json" \
  -d '{"notes":"trigger route resync"}'
# Or manually:
ssh root@89.116.31.109 'ip route add 192.168.0.0/24 dev wg1'
```
### Diagnostic findings from V7 session
- WG kernel statistics confirmed cryptokey-routing rejection: `cat /sys/class/net/wg1/statistics/rx_errors` was `48` after the V7 phone tried to register, while `rx_dropped` stayed `0`. This confirms encrypted packets arrived but failed source-IP validation.
- tcpdump on `eth0 udp port 51821` showed 656-byte packets from `120.60.105.158` (V7's CGNAT IP): encrypted SIP REGISTER attempts that were dropped post-decryption.
- tcpdump on `wg1` showed 0 packets matching the SIP filter, confirming nothing made it past the WG layer.
- WG endpoint roaming verified: V7's apparent source IP changed from `103.197.113.158:33252` (CGNAT'd, ephemeral port) to `120.60.105.158:51820` (different CGNAT pool, listen port) mid-session without any tunnel disruption.
- Asterisk PJSIP transports bind `0.0.0.0:5080` (UDP+TCP) and `0.0.0.0:5060` (UDP), so the destination side would not have been the bottleneck had SNAT been correct.
### Bug fixes shipped during V7 onboarding session
| PR | Bug | Fix |
|---|---|---|
| #142 | subnetAllocator.js destructured Sequelize from models registry — but registry doesn't expose it. POST /customer-tunnels 500'd with TypeError: Cannot destructure property 'Op' of 'Sequelize' as it is undefined. | Import Op directly from sequelize package. Added 4 regression tests exercising DB-aware code paths with real-shape mock. |
| #143 | CustomerTunnel.scope('withSecrets') produced SQL with preshared_key listed twice (default scope's SELECT * already had it, plus the scope's include: ['preshared_key'] added it again). mariadb driver rejected with Error in results, duplicate field name preshared_key. Caused applier to fail mid-create → tunnel marked status=disabled. | Change withSecrets to attributes: { exclude: [] } (clears default exclusion) instead of trying to re-include. |
### Interim workaround applied for V7
After PR #143 landed, V7's customer_tunnels row had status=disabled from the earlier failed apply. Recovery:
```bash
ssh root@89.116.31.109 'mariadb -uroot pbx_api_db -e "UPDATE customer_tunnels SET status=\"active\" WHERE name=\"V7_Tirupathur\";"'
ssh root@89.116.31.109 'cd /opt/astrapbx && node -e "const { applyWg1Config } = require(\"./src/services/network/wireguardApplier\"); applyWg1Config({ models: require(\"./src/models\") }).then(r => { console.log(JSON.stringify(r, null, 2)); process.exit(0); }).catch(e => { console.error(e.message); process.exit(1); });"'
```
A retry endpoint (POST /:id/retry-apply) is planned for a future PR.

Migration verification on prod (run after Stage 2):
```sql
SHOW TABLES LIKE 'customer_tunnels';      -- expects 1 row
SHOW TABLES LIKE 'tunnel_metrics';        -- expects 1 row
SELECT COUNT(*) FROM customer_tunnels;    -- expects 0
SELECT COUNT(*) FROM tunnel_metrics;      -- expects 0
SELECT name FROM SequelizeMeta
WHERE name LIKE '2026051212%' OR name LIKE '2026051220%'; -- expects 2 rows
```
Indexes confirmed on prod: customer_tunnels_org_name_unique (unique compound on org_id, name), customer_tunnels_status, tunnel_subnet (unique), tunnel_metrics_tunnel_snapshot (compound on tunnel_id, snapshot_at), and tunnel_metrics_snapshot_at. The org_id foreign key needs no separate index because org_id is the first column of the compound unique index; this is exactly the pattern that survived the MariaDB 11 gotcha.
## V7 onboarding playbook (after feature ships)
1. Ops opens editor.astradial.com → V7 org → Settings → Network Tunnels
2. Click [+ Add Tunnel]
3. Name: "astradial-cloud" (default)
4. Paste V7's WireGuard public key (from GDMS; generated when V7 IT created the WG entry on the GWN7002)
5. Submit. The system then:
   - allocates 10.20.7.0/30
   - generates a PSK
   - writes the peer block to wg1.conf and reloads
   - updates V7's PJSIP endpoint match list to include 10.20.7.2/32 (via generator regen)
   - returns success and opens the customer-config modal
6. Click [View customer config] and copy the [Peer] block
7. In GDMS → V7's network → Settings → VPN → WireGuard → Add (or Setup Wizard):
   - Interface: BSNL
   - Local IP: 10.20.7.2/30
   - Paste the server config, including Endpoint = 89.116.31.109:51821
   - Save & Apply
8. On cloud: `wg show wg1` should show V7's peer with a fresh handshake within ~10s
9. Verify V7's phones now register via the tunnel (visible in `pjsip show contacts`): from 10.20.7.2, or from their LAN addresses (e.g. 192.168.0.76) when the router does not SNAT and customer_lan_cidr is set (see issue #3)
## Operations
### Daily verification
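A scripted version of the daily handshake check: `wg show wg1 latest-handshakes` prints one `<public key><TAB><unix seconds>` line per peer, with 0 for a peer that has never completed a handshake. The sketch below flags stale peers; findStalePeers is a hypothetical helper, and the 180-second default is an assumption chosen to tolerate the 25 s keepalive plus WireGuard's roughly two-minute rekey interval, not a documented Astradial threshold.

```javascript
// Flags peers whose latest handshake is older than maxAgeSeconds.
// Input: text output of `wg show wg1 latest-handshakes` (pubkey<TAB>unix-seconds per line).
function findStalePeers(output, nowSeconds, maxAgeSeconds = 180) {
  const stale = [];
  for (const line of output.split('\n')) {
    if (!line.trim()) continue;
    const [pubkey, ts] = line.trim().split('\t');
    const last = Number(ts);
    if (last === 0) {
      stale.push({ pubkey, ageSeconds: null }); // never completed a handshake
    } else if (nowSeconds - last > maxAgeSeconds) {
      stale.push({ pubkey, ageSeconds: nowSeconds - last });
    }
  }
  return stale;
}
```

In practice the output would come from child_process.execFile('wg', ['show', 'wg1', 'latest-handshakes']) with nowSeconds = Math.floor(Date.now() / 1000), and each flagged pubkey mapped back to a tunnel name via customer_pubkey.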
Look for: each active customer has a handshake within the last few minutes.

### Rolling key for a customer (compromise scenario)
- Editor → V7 → Network Tunnels → Disable existing tunnel
- Delete (after operator confirms)
- Customer regenerates WG keys on their side (new pubkey)
- Operator creates a new tunnel with the new pubkey
- New tunnel comes up. The allocator is first-fit, so the new tunnel reuses the previously revoked /30 only if that slot's 30-day hold has elapsed and no lower slot is free; otherwise it gets a different /30
### Customer subnet exhaustion
Capacity is 16,384 /30s under 10.20.0.0/16. We're nowhere near this. If reached: expand to 10.21.0.0/16 etc. — generator and allocator support multiple pool CIDRs.
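If a second pool is ever added, first-fit extends by walking pools in order. A sketch under stated assumptions: POOLS and RESERVED are hypothetical (the page only names 10.21.0.0/16 as the next candidate), the used set is passed in rather than queried, and 10.20.0.0/30 is treated as reserved because it holds the wg1 address 10.20.0.1.

```javascript
// Hypothetical multi-pool first-fit allocator (pure; `used` comes from the DB in practice).
const POOLS = ['10.20.0.0/16', '10.21.0.0/16']; // second pool is illustrative
const RESERVED = new Set(['10.20.0.0/30']); // holds the wg1 address 10.20.0.1

function ipToInt(ip) {
  return ip.split('.').reduce((acc, o) => ((acc << 8) | Number(o)) >>> 0, 0);
}

function intToIp(n) {
  return [24, 16, 8, 0].map((s) => (n >>> s) & 255).join('.');
}

function nextFreeSlash30(used, pools = POOLS) {
  for (const pool of pools) {
    const [base, bits] = pool.split('/');
    const start = ipToInt(base);
    const size = 2 ** (32 - Number(bits)); // addresses in the pool
    for (let offset = 0; offset < size; offset += 4) {
      const subnet = `${intToIp(start + offset)}/30`;
      if (!RESERVED.has(subnet) && !used.has(subnet)) return subnet;
    }
  }
  throw new Error('All subnet pools exhausted');
}
```

A 10.20.0.0/16 pool holds 2^16 / 2^2 = 16,384 /30s, so the second pool is only reached once the first is exhausted.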
### Rollback the feature entirely
Pre-deploy DB snapshot (taken before any prod schema change):
| Item | Value |
|---|---|
| Path on prod | /root/pre-customer-tunnels-2026-05-12T115633Z/pbx_api_db.sql.gz |
| Off-box copy | ~/AstradialBackups/pre-customer-tunnels-2026-05-12T115633Z/ (Hari's MacBook) |
| Size | 290 KB gzipped |
| Tables captured | 30 (matches live information_schema.tables count) |
| Verified | gzip integrity OK; -- Dump completed on 2026-05-12 17:26:33 footer present; 30× CREATE TABLE |
| Flags used | --single-transaction --quick --skip-lock-tables --routines --triggers --events |
| Retain until | 2026-08-12 (3 months) |
Steps
1. Revert the merge to main on astradial-platform (via GitHub PR).
2. On prod (and staging), run `npx sequelize-cli db:migrate:undo` twice: each invocation reverts only the most recent migration, so the first undoes tunnel_metrics and the second customer_tunnels.
3. Run `systemctl disable --now wg-quick@wg1` on both VPSes (stops the interface and keeps it from starting at boot once its config is gone).
4. Remove /etc/wireguard/wg1.conf, /etc/wireguard/wg1.private, /etc/wireguard/wg1.public, and /usr/local/sbin/customer-tunnels-iptables.sh.
5. Run `ufw delete allow 51821/udp`.
6. Any active customer tunnels revert to direct-cloud registration (their phones already have this as a fallback).
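If migration rollback fails, restore from the snapshot. The dump predates the new tables, so restoring it leaves customer_tunnels and tunnel_metrics in place; drop them afterwards if a clean schema is required.

```bash
ssh root@89.116.31.109 'pm2 stop astrapbx workflow-engine pipecat-flow editor'
ssh root@89.116.31.109 'zcat /root/pre-customer-tunnels-2026-05-12T115633Z/pbx_api_db.sql.gz | mariadb -uroot pbx_api_db'
ssh root@89.116.31.109 'pm2 restart astrapbx workflow-engine pipecat-flow editor'
```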
RTO estimate: ~30 seconds for restore on a 290 KB dump + ~10 s for pm2 cycle = under 1 minute downtime.
## Related docs
- V7 — Network Architecture & Resilience — the customer scenario that drove this feature
- NUC WireGuard (existing wg0) — the pattern this extends
- Multi-Tenant Architecture — org isolation model
- Network & Security — overall network topology
- Org Management — where this fits in the broader admin UX