V7 — Network Architecture & Resilience Plan
Companion to v7-setup.md. Captures the real network architecture at V7 (different from the original target), the multi-WAN + CGNAT challenges that came out of debugging, and the resilience plan (phone-side + WireGuard tunnel).
Read this before doing anything network-related at V7. The original v7-setup.md describes the architectural target; this doc describes the architectural reality.
Status
| Area | State |
|---|---|
| Multi-WAN architecture identified | ✅ BSNL (port 3) + Rail/RailTel (port 4) via GWN7002 |
| CGNAT confirmed on both ISPs | ✅ V7 WAN IPs are private (192.168.x.x); public IPs are carrier-controlled |
| Phone-side resilience settings | 🟡 Applied on ext 108 (192.168.0.69) — pending rollout to other 17 |
| TCP transport attempt | ❌ Tried, broke calls, reverted to UDP. Root cause likely endpoint config bound to UDP transport |
| Server-side AOR tuning | ⚠️ Deferred — should go via dialplanGenerator + per-org profile, not per-org config edit |
| WireGuard tunnel plan | 📋 Designed, not yet implemented |
| GDMS Cloud remote access | ✅ gdms.cloud/gwn — V7 org accessible remotely |
| BSNL number strategy | ✅ Decided: keep with CFA (*21*08065978007#) forwarding to Tata DID |
| Public number | ✅ +918065978007 (Tata DID, already routes end-to-end) |
| Grandstream + BSNL meeting | 🚫 Cancel — no longer needed under new architecture |
Architectural reality (what's actually deployed)
Network topology
    ISP-A: BSNL fibre
      │  CGNAT: public IP managed by carrier (e.g. 59.93.255.93)
      │  WAN IP (private): 192.168.101.43
      ▼
    GWN7002 router (port 3) ─────────────────┐
                                             ├─→ LAN 192.168.0.0/24
    GWN7002 router (port 4) ─────────────────┘     ├── UCM6301 (.60)
      ▲                                            ├── GRP2636 front desk (.76, ext 09)
      │  WAN IP (private): 192.168.1.33            ├── GHP621 hotel-room phones (.62–.81)
      │  CGNAT: public IP (e.g. 103.197.113.158)   ├── GRP2602P common-area phones
    ISP-B: RailTel (Indian Railways)               ├── 2× GWN7802 switches
                                                   └── 6× WiFi APs (V7 HOTEL + THIRUPATHI BHIMAS SSIDs)
What "IP changing" actually means at V7
No single dynamic IP is rotating. There are two ISPs, each with its own CGNAT public IP, and the GWN7002 fails over between them. When we saw V7 appear at 103.197.113.158 and then at 59.93.255.93:
- 103.197.113.158 ≈ Rail CGNAT public IP
- 59.93.255.93 ≈ BSNL CGNAT public IP
Plus, each carrier may rotate their CGNAT pool IPs independently. So V7's public-facing IP can change for two distinct reasons:
- GWN7002 fails over between BSNL and Rail (multi-WAN behavior)
- The carrier rotates the CGNAT pool (carrier-side, V7 has zero control)
Both can happen multiple times per day. This is why phone-side keepalive alone is necessary but not sufficient.
SIP/PBX architectural drift from v7-setup.md target
The v7-setup.md "Architecture target" section describes the UCM6301 hosting all extensions internally with one SIP trunk to cloud. In reality, almost all hotel phones register directly to cloud, bypassing the UCM as a trunk bridge.
Evidence (live pjsip show endpoints output):
- Multiple extensions (04, 09, 102, 103, 105, 106, 108, 109, 110, 111) register individually to devsip.astradial.com:5080
- Each phone arrives from a different NAT'd port (random high port from carrier CGNAT)
- The UCM trunk (org_moijhj2l_trunk1777732151626) exists and is registered, but is NOT used as the trunk for these phones
- Some extensions had accumulated multiple stale contacts from old CGNAT IPs (zombie contacts)
Result: 18+ phones × WAN problems instead of 1 trunk × WAN problem. Each phone independently fights CGNAT pinholes, multi-WAN failover, and IP rotation.
Phone inventory (from GDMS Zero Config)
24 Grandstream devices discovered, 0 currently registered to UCM (all bypass it):
| Model | Count | Purpose | Firmware |
|---|---|---|---|
| GHP621 | ~14 | Hotel-room hospitality phones (basic, no network-change detection) | 1.0.1.75 |
| GRP2602P | 4 | Reception / restaurant / kitchen / common-area phones | 1.0.5.55, 1.0.7.64 |
| GRP2636 | 1 | Front desk, ext 09 (MAC 000EC4F5E975 at 192.168.0.76) | 1.0.13.31 |
Per-extension IP mapping (V7-confirmed):
| Ext | User | LAN IP |
|---|---|---|
| 01 | Restaurant | (TBD — update needed) |
| 02 | Ocloud CAFE | 192.168.0.81 |
| 03 | Kitchen | 192.168.0.62 |
| 04 | Purchase | 192.168.0.73 |
| 05 | Housekeeping | 192.168.0.78 |
| 07 | MD | 192.168.0.80 |
| 09 | Reception | 192.168.0.76 |
| 101 | Rooms | 192.168.0.63 |
| 102 | Rooms | 192.168.0.77 |
| 103 | Rooms | 192.168.0.64 |
| 104 | Rooms | 192.168.0.65 |
| 105 | Rooms | 192.168.0.66 |
| 106 | Rooms | 192.168.0.67 |
| 107 | Rooms | 192.168.0.68 |
| 108 | Rooms | 192.168.0.69 |
| 109 | Rooms | 192.168.0.70 |
| 110 | Rooms | 192.168.0.71 |
| 111 | Rooms | 192.168.0.72 |
All phones use static IPs (no DHCP) on the LAN. Gateway is 192.168.0.60 (UCM) — but phones connect through it as L3 only; UCM does not handle their SIP registration.
Resilience approach: three layers
Layer 1: Phone-side keepalive (immediate, no prod changes)
Applied on each phone via web admin (or rolled out via UCM Zero Config). The settings target fast recovery from CGNAT rotation / WAN failover by keeping NAT pinholes alive and re-registering quickly.
Account 1 → SIP Settings → General:
| Setting | Value | Why |
|---|---|---|
| REGISTER Expiration | 30 | Re-register every 30s for fast recovery on IP change |
| Re-Register before Expiration | 10 | Proactive refresh 10s before expiry |
| Registration Retry Wait Time | 5 | Retry every 5s on failure (was 20s) |
| Enable OPTIONS Keep-Alive | ✅ Enable | Holds NAT pinhole open + fast failure detection |
| OPTIONS Keep-Alive Interval | 15 | Probes faster than most CGNAT timeouts (~30s) |
| OPTIONS Keep-Alive Max Retries | 3 | Default fine |
| SUBSCRIBE for Registration | OFF | Not needed, adds noise without IP-change benefit |
| SIP Transport | UDP | TCP attempted, calls broke — see below |
Phone Settings → Basic Settings:
| Setting | Value |
|---|---|
| Keep-Alive Interval | 15 |
| STUN Server | stun.l.google.com:19302 |
| Use Random Port | ✅ |
Effect: After any WAN-side IP change, the phone re-registers from the new public IP within ~15–30 seconds. Outbound calls work immediately; inbound calls work after first OPTIONS round-trip completes.
Limitation without server-side complement: stale contacts from prior IP accumulate on cloud (max_contacts > 1 by default). Asterisk may still try the old contact first for ~60s after a flip → intermittent inbound failures during that window.
Layer 2: Server-side AOR tuning (deferred to per-org profile system)
The natural complement to phone-side keepalive would be:
; in V7's AOR sections
minimum_expiration = 30
maximum_expiration = 30
qualify_frequency = 30
max_contacts = 1
remove_existing = yes
Decision: Do NOT apply per-org by direct config-file edit. Instead, build a resilience-profile system in dialplanGenerator.js:
- Profile types: default / dynamic-ip / multi-wan / mobile-friendly
- Per-org assignment stored in DB (or JSON during transition)
- Editor UI to flip profile per org
- Regeneration triggers reload
This avoids the "can't version-control per-org hand edits" problem and lets battery-sensitive orgs opt out. Not built yet. V7 runs without Layer 2 for now; accepts the ~30–60s zombie-contact window after each WAN flip.
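As a rough illustration of the profile idea, the sketch below maps a profile name to the AOR lines it would emit. It is written in shell only to stay self-contained; the real system is planned for dialplanGenerator.js. The function name `render_aor_profile` is hypothetical, the `multi-wan` values are the five AOR options listed above, and treating `default` as "emit nothing" is an assumption. The remaining profile names are left undefined because this document does not specify their values.

```shell
# Hypothetical helper: print the AOR options for a resilience profile.
render_aor_profile() {
    case "$1" in
        multi-wan)
            # Values taken from the Layer 2 AOR snippet in this document.
            printf '%s\n' \
                'minimum_expiration = 30' \
                'maximum_expiration = 30' \
                'qualify_frequency = 30' \
                'max_contacts = 1' \
                'remove_existing = yes'
            ;;
        default)
            # Assumption: the default profile adds nothing and leaves the
            # generator's existing AOR settings untouched.
            :
            ;;
        *)
            echo "profile '$1' is not defined in this sketch" >&2
            return 1
            ;;
    esac
}

render_aor_profile multi-wan
```

Keeping the mapping in one function means a per-org profile change only swaps the name passed in, which is the property the generator-based design is after.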
Layer 3: WireGuard tunnel (planned)
Eliminates CGNAT entirely from the SIP path by giving cloud Asterisk a stable view of V7 at a fixed tunnel IP.
Architecture
    GWN7002 (V7) ──WireGuard tunnel (over BSNL WAN)──→ Astradial Cloud
    tunnel IP: 10.30.7.2                               tunnel IP: 10.30.7.1
    subnet: 10.30.7.0/30                               UDP port: 51820

    Cloud Asterisk sees V7 SIP traffic arriving from 10.30.7.2 ALWAYS,
    regardless of which ISP CGNAT IP V7 is currently behind.
Why this works despite single-WAN binding limitation
GDMS WireGuard config only allows binding to ONE WAN (BSNL or Rail — no "Any"/"Auto"). The tunnel must pick one. But WireGuard is identity-based, not IP-based:
| Failure scenario | Tunnel behavior |
|---|---|
| BSNL's CGNAT public IP rotates | ✅ Tunnel survives — WG only checks crypto identity |
| BSNL has packet loss / slowdown | ✅ Tunnel survives — WG retries |
| BSNL goes briefly down (<10s) | ⚠️ Tunnel pauses, reconnects |
| BSNL completely down for minutes | ❌ Tunnel dies. Need fallback. |
For the BSNL-fully-down case: phones use dual SIP accounts — Account 1 via tunnel, Account 2 direct to cloud over whichever WAN is up. When BSNL dies, Account 2 takes over via Rail.
Tunnel subnet plan
Existing WireGuard subnets at Astradial:
- NUC tunnel: 10.10.10.0/24 (cloud .1, NUC .2), already in use
V7 allocation: 10.30.7.0/30 point-to-point (4 IPs, only 2 usable)
- Cloud peer: 10.30.7.1
- V7 peer: 10.30.7.2
Pattern for future customer tunnels: 10.30.N.0/30 where N is a per-customer identifier.
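The allocation pattern can be sketched as a small helper that derives the subnet and both peer addresses from the customer identifier. The function name `tunnel_plan` and the output format are hypothetical, and limiting N to 0–255 is an assumption that follows from N occupying a single octet.

```shell
# Hypothetical helper: print the /30 tunnel plan for customer identifier N.
tunnel_plan() {
    n="$1"
    # Accept only non-empty strings made of digits.
    case "$n" in
        ''|*[!0-9]*)
            echo "customer id must be an integer between 0 and 255" >&2
            return 1
            ;;
    esac
    # N occupies the third octet, so it cannot exceed 255.
    if [ "$n" -gt 255 ]; then
        echo "customer id must be an integer between 0 and 255" >&2
        return 1
    fi
    # .1 is the cloud peer and .2 the customer peer, as in the V7 allocation.
    printf 'subnet=10.30.%s.0/30 cloud=10.30.%s.1 peer=10.30.%s.2\n' "$n" "$n" "$n"
}

tunnel_plan 7
```

With N = 7 this reproduces the V7 allocation above (10.30.7.0/30, cloud .1, V7 .2).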
Cloud-side setup steps (NOT YET EXECUTED: requires explicit "yes proceed")
Step 0 — Read existing config (read-only)
ssh root@89.116.31.109 'cat /etc/wireguard/wg0.conf'
ssh root@89.116.31.109 'wg show'
ssh root@89.116.31.109 'ss -unlp | grep 51820'
# Kernel WireGuard sockets carry no process name in ss output, so match on the listen port instead
Step 1 — Generate keys for V7 peer
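ssh root@89.116.31.109
cd /etc/wireguard
umask 077   # new key files are created readable by root only
# Pre-shared key: entered on both sides (cloud peer block and GDMS peer form)
wg genpsk > v7_preshared.key
# Key pair for the V7 peer: only needed if the keys are NOT taken from GDMS.
# GDMS pre-generates the router's key pair, and Step 2 uses the public key copied from GDMS.
wg genkey | tee v7_private.key | wg pubkey > v7_public.key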
Step 2 — Backup + add V7 peer
cp /etc/wireguard/wg0.conf /etc/wireguard/wg0.conf.bak-<date>-pre-v7
Add to /etc/wireguard/wg0.conf:
# === V7 (VSEVEN HOTELS) peer ===
[Peer]
PublicKey = <V7_public_key_from_GDMS>
PresharedKey = <generated_psk>
AllowedIPs = 10.30.7.2/32
PersistentKeepalive = 25
# No Endpoint — V7 is behind CGNAT and initiates the connection to us
Step 3 — Hot-reload
wg syncconf wg0 <(wg-quick strip wg0)
Step 4 — Update V7 endpoints in /etc/asterisk/pjsip_vseven_hotels.conf
- Backup first
- Adjust match lines to include 10.30.7.0/30
- pjsip reload
Rollback (if needed)
cp /etc/wireguard/wg0.conf.bak-<date>-pre-v7 /etc/wireguard/wg0.conf
wg syncconf wg0 <(wg-quick strip wg0)
# Similar restore for pjsip config
asterisk -rx "pjsip reload"
Router-side setup (via GDMS)
GDMS → Settings → VPN → WireGuard® → Add (or use Setup Wizard)
Form values:
| Field | Value |
|---|---|
| Name | astradial-cloud |
| Status | Enable (after full setup) |
| Interface | BSNL (only single-WAN binding available) |
| Listening Port | 51820 |
| Local IP / Mask | 10.30.7.2 / 30 |
| Private Key | (pre-generated by GDMS, leave alone) |
| Public Key | Copy this; needed for cloud's peer block |
| MTU | 1420 |
Peer block (separate step after Save):
| Field | Value |
|---|---|
| Peer Public Key | (cloud's WireGuard public key, get from wg show wg0) |
| Pre-Shared Key | (from cloud's v7_preshared.key) |
| Allowed IPs | 10.30.7.1/32, 89.116.31.109/32 |
| Endpoint | 89.116.31.109:51820 |
| Persistent Keepalive | 25 |
Investigation captured during planning
GDMS WireGuard config form was inspected. Key findings:
- Keys auto-generated by GDMS (no need to manually generate router-side)
- Interface (WAN binding) only offers BSNL or Rail, with no "Any" or "Auto" option
- Peer/endpoint configuration NOT visible in initial Add form; it appears in a separate step or via Setup Wizard
- Setup Wizard offers four protocols: OpenVPN, WireGuard, IPSec (Site-to-Site), PPTP
- IPSec Site-to-Site explicitly advertises "auto-rebuild on WAN IP change". Worth considering as an alternative if the WireGuard implementation hits limits, but our cloud already runs WG, so default to WG
Layer 4: UCM bridge architecture (deferred, not chosen for this round)
The "ideal" architecture from v7-setup.md was UCM as the trunk bridge: all 18 phones register locally to UCM (over LAN — no NAT, no CGNAT issues), UCM has one trunk to cloud. This means 1 trunk × WAN problem instead of 18.
Considered and deferred for this iteration because:
- Requires creating extensions on UCM for all 18 phones (currently 0 registered)
- Requires Zero Config template rollout
- Trunk-to-trunk gating issue (from v7-setup.md) needs Grandstream-installer resolution
- Phone-side fix (Layer 1) covers ~80% of the problem at lower effort
Will re-evaluate if WireGuard tunnel + phone-side keepalive isn't enough.
Number strategy (decided)
| Question | Decision | Rationale |
|---|---|---|
| Primary public number | +918065978007 (Tata DID via Astradial) | Already routes end-to-end via cloud; bypasses all BSNL FXO/SIP-trunk complexity |
| What to do with 04175295093 (BSNL printed number) | Keep with CFA: *21*08065978007# activated on BSNL line | Preserves printed-number recognition for guests/banners; forwards to working path |
| Pending decisions | Local Number Portability (port 04175295093 → Tata) — future consideration | Eliminates BSNL dependency over time |
What we stopped doing
The original v7-setup.md had several pending items that became moot with the new architecture:
- ❌ Grandstream installer meeting about UCM trunk-to-trunk gating — no longer needed
- ❌ BSNL engineer meeting about SIP-over-Ethernet upgrade — no longer needed
- ❌ DBC ONT admin password hunt — no longer needed (ONT not in voice path)
- ❌ UCM BSNL-AstraDial inbound route debugging, replaced by CFA + Tata DID flow
Cancel/repurpose external meetings.
TCP transport attempt (failed; keep notes for future)
Switched ext 108 from UDP to TCP. Registration succeeded but outbound calls failed. After revert to UDP, calls work normally.
Root cause not investigated yet, but most likely: V7's PJSIP endpoint config in /etc/asterisk/pjsip_vseven_hotels.conf has explicit transport = transport-udp. When phone registers over TCP, the contact stores ;transport=TCP, but Asterisk uses the endpoint's bound transport for outbound INVITE → transport/contact mismatch → INVITE never reaches phone.
To investigate later if we want TCP:
1. Read endpoint config in /etc/asterisk/pjsip_vseven_hotels.conf
2. Remove explicit transport binding OR add a separate TCP transport endpoint
3. Test with one phone first
Cloud is confirmed to listen on TCP/5080.
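To re-check the TCP listener later, a small filter over `ss -tln` output is enough. This is a sketch: the function name `listens_on_tcp` is hypothetical, and it assumes the usual `ss` column layout in which the local address and port form the fourth column.

```shell
# Read `ss -tln` output on stdin; succeed if some socket listens on the port.
listens_on_tcp() {
    awk -v port="$1" '
        # Column 1 is the socket state, column 4 is "local-address:port".
        $1 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit (found ? 0 : 1) }'
}

# Usage on the cloud host:
#   ss -tln | listens_on_tcp 5080 && echo "TCP/5080 is listening"
```

The port is anchored at the end of the column so that 5080 does not also match a shorter prefix such as 508.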
Remote-access tooling
| Tool | Purpose | URL / access |
|---|---|---|
| GDMS Cloud | Remote GWN7002 / UCM / phone management | https://www.gdms.cloud/gwn — V7 organization |
| GDMS — Devices | View / config router, phones, switches, APs | Same |
| GDMS — Settings → Internet | WAN priority, failover mode, health graphs | Same |
| GDMS — Settings → VPN | WireGuard / IPSec / OpenVPN setup | Same |
| GDMS — Settings → Firewall & Security | SIP ALG (look in Advanced Security Settings) | Same |
| Astradial Cloud Asterisk | SIP server, CDR, recording | ssh root@89.116.31.109 |
Operational notes
Verifying phone registrations
Look for:
- Each ext should have ONE contact (multiple = zombies from prior WAN flips)
- Status should be Avail (not Unavail)
- Transport should be UDP (TCP didn't work)
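The checklist above can be automated with a filter over `pjsip show contacts` output. This is a sketch under assumptions: the function name `check_contacts` and its message labels are hypothetical, and it assumes the standard Asterisk layout in which each data row reads `Contact:  <aor>/<uri>  <hash>  <status>  <rtt>`.

```shell
# Read `pjsip show contacts` output on stdin and report the three conditions
# from the checklist: duplicate contacts, non-Avail status, TCP transport.
check_contacts() {
    awk '
        # The header row has "<Aor/ContactUri...>" in field 2 and is skipped.
        $1 == "Contact:" && $2 !~ /^</ {
            split($2, parts, "/")       # "108/sip:108@1.2.3.4:5060" gives AOR "108"
            aor = parts[1]
            count[aor]++
            if ($4 != "Avail")                  print "NOT-AVAIL " aor " (" $4 ")"
            if (tolower($2) ~ /transport=tcp/)  print "TCP " aor
        }
        END {
            for (aor in count)
                if (count[aor] > 1) print "ZOMBIES " aor " x" count[aor]
        }'
}

# Usage on the cloud host:
#   asterisk -rx "pjsip show contacts" | check_contacts
```

No output means every extension has a single available contact and none registered over TCP.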
After WireGuard goes live, verifying tunnel
ssh root@89.116.31.109 'wg show wg0'
# Should show V7 peer with recent handshake and traffic
ssh root@89.116.31.109 'ping -c 3 10.30.7.2'
# Should succeed once tunnel is up
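"Recent handshake" can be checked mechanically from `wg show wg0 latest-handshakes`, which prints one public key and one Unix timestamp per peer. The sketch below flags peers whose last handshake is older than a threshold. The function name `stale_peers` is hypothetical, and the 180-second default is an assumption chosen to sit above WireGuard's roughly two-minute rekey interval.

```shell
# Read `wg show <iface> latest-handshakes` output on stdin (key, timestamp)
# and print peers that never completed a handshake or whose last one is old.
# $1 = current Unix time, $2 = maximum handshake age in seconds (default 180).
stale_peers() {
    now="$1"
    max_age="${2:-180}"
    awk -v now="$now" -v max_age="$max_age" '
        $2 == 0              { print $1, "never"; next }   # 0 means no handshake yet
        now - $2 > max_age   { print $1, (now - $2) "s" }'
}

# Usage on the cloud host:
#   wg show wg0 latest-handshakes | stale_peers "$(date +%s)"
```

An empty result means every peer, including V7, has completed a handshake within the threshold.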
Watching WAN flips
On GDMS → Settings → Internet — the WAN Health graph shows green/red segments over the last 12h. Frequent red segments on either WAN = unstable line, worth investigating from V7 IT side.
Open questions / pending items
- [ ] Confirm whether GDMS WireGuard form has a separate Peer-configuration step (probably after Save, or via Setup Wizard) — needed to complete the tunnel
- [ ] Identify the more stable WAN between BSNL and Rail from GDMS Internet → WAN Health history (12h+ data)
- [ ] Decide whether to upgrade to dual-tunnel (Layer 3 Option B) if single-tunnel BSNL outages become painful
- [ ] Build the per-org resilience profile system in dialplanGenerator.js (Layer 2)
- [ ] Confirm SIP ALG state on GWN7002: Firewall → Advanced Security Settings (suspected enabled, must disable)
- [ ] Roll Layer 1 phone-side settings from ext 108 to other 17 phones (manual or Zero Config template)
- [ ] Update v7-setup.md status board to reflect the deprecated BSNL-FXO path and new Tata-DID-primary path
Decisions log
| Decision | Outcome | Date |
|---|---|---|
| Drop BSNL FXO integration; use Tata DID +918065978007 as primary number | Done conceptually; CFA activation pending V7 IT | This iteration |
| Keep BSNL line active with CFA forwarding to Tata DID | Preserves printed number with zero work | This iteration |
| Cancel Grandstream + BSNL meeting | No longer needed under new architecture | This iteration |
| Defer UCM bridge architecture; use direct cloud registration | Lower effort; phone-side keepalive + WireGuard cover most issues | This iteration |
| Apply phone-side keepalive resilience settings, starting with ext 108 | In progress | This iteration |
| Don't apply per-org server-side AOR edits; build profile system in generator instead | Avoids "can't version-control per-org hand edits" | This iteration |
| Use WireGuard for V7 tunnel (matches existing infra) over IPSec | WG keys are easier to manage; cloud already has WG service | This iteration |
| Single-tunnel on BSNL with dual-account fallback (not dual-tunnel) | Single tunnel covers ~99% of cases at lower complexity | This iteration |
| TCP transport attempt → revert to UDP | Endpoint config likely UDP-bound; investigation deferred | This iteration |
Related docs
- V7 — Master setup info — Status board, credentials, equipment, contacts (the original master doc; treats V7 as single-WAN)
- V7 — Meeting Prep Messages — Archived; the Grandstream/BSNL meeting is cancelled
- V7 — Meeting Brief (Grandstream + BSNL) — Archived; superseded by Tata-DID-primary architecture
- Fail2Ban Runbook — V7's CIDRs are whitelisted there
- Troubleshooting — Error 55 (customer IP change → fail2ban storm) covers a closely related issue