
V7 — Network Architecture & Resilience Plan

Companion to v7-setup.md. Captures the real network architecture at V7 (different from the original target), the multi-WAN + CGNAT challenges that came out of debugging, and the resilience plan (phone-side + WireGuard tunnel).

Read this before doing anything network-related at V7. The original v7-setup.md describes the architectural target; this doc describes the architectural reality.

Status

| Area | State |
|---|---|
| Multi-WAN architecture identified | ✅ BSNL (port 3) + Rail/RailTel (port 4) via GWN7002 |
| CGNAT confirmed on both ISPs | ✅ V7 WAN IPs are private (192.168.x.x); public IPs are carrier-controlled |
| Phone-side resilience settings | 🟡 Applied on ext 108 (192.168.0.69); rollout to the other 17 pending |
| TCP transport attempt | ❌ Tried, broke calls, reverted to UDP. Root cause likely endpoint config bound to UDP transport |
| Server-side AOR tuning | ⚠️ Deferred: should go via dialplanGenerator + per-org profile, not per-org config edit |
| WireGuard tunnel plan | 📋 Designed, not yet implemented |
| GDMS Cloud remote access | gdms.cloud/gwn (V7 org accessible remotely) |
| BSNL number strategy | ✅ Decided: keep with CFA (*21*08065978007#) forwarding to Tata DID |
| Public number | +918065978007 (Tata DID, already routes end-to-end) |
| Grandstream + BSNL meeting | 🚫 Cancel: no longer needed under new architecture |

Architectural reality (what's actually deployed)

Network topology

  ISP-A: BSNL fibre
     │ CGNAT: public IP managed by carrier (e.g. 59.93.255.93)
     │ WAN IP (private): 192.168.101.43
     ▼
  GWN7002 router (port 3) ──┐
                            ├─→ LAN 192.168.0.0/24
  GWN7002 router (port 4) ──┘
     ▲
     │ WAN IP (private): 192.168.1.33
     │ CGNAT: public IP managed by carrier (e.g. 103.197.113.158)
  ISP-B: RailTel (Indian Railways)

  LAN 192.168.0.0/24
     ├── UCM6301 (.60)
     ├── GRP2636 front desk (.76, ext 09)
     ├── GHP621 hotel-room phones (.62–.81)
     ├── GRP2602P common-area phones
     ├── 2× GWN7802 switches
     └── 6× WiFi APs (V7 HOTEL + THIRUPATHI BHIMAS SSIDs)

What "IP changing" actually means at V7

There is no single dynamic IP rotating. There are two ISPs, each with their own CGNAT public IP, and the GWN7002 fails over between them. When we saw V7 appear at 103.197.113.158 then 59.93.255.93:

  • 103.197.113.158 ≈ Rail CGNAT public IP
  • 59.93.255.93 ≈ BSNL CGNAT public IP

Plus, each carrier may rotate their CGNAT pool IPs independently. So V7's public-facing IP can change for two distinct reasons:

  1. GWN7002 fails over between BSNL and Rail (multi-WAN behavior)
  2. The carrier rotates the CGNAT pool (carrier-side, V7 has zero control)

Both can happen multiple times per day. This is why phone-side keepalive alone is necessary but not sufficient.

SIP/PBX architectural drift from v7-setup.md target

The v7-setup.md "Architecture target" section describes the UCM6301 hosting all extensions internally with one SIP trunk to cloud. In reality, almost all hotel phones register directly to cloud, bypassing the UCM as a trunk bridge.

Evidence (live pjsip show endpoints output):

  • Multiple extensions (04, 09, 102, 103, 105, 106, 108, 109, 110, 111) register individually to devsip.astradial.com:5080
  • Each phone arrives from a different NAT'd port (random high port from carrier CGNAT)
  • The UCM trunk (org_moijhj2l_trunk1777732151626) exists and is registered, but is NOT used as the trunk for these phones
  • Some extensions had accumulated multiple stale contacts from old CGNAT IPs (zombie contacts)

Result: 18+ phones × WAN problems instead of 1 trunk × WAN problem. Each phone independently fights CGNAT pinholes, multi-WAN failover, and IP rotation.

Phone inventory (from GDMS Zero Config)

24 Grandstream devices discovered, 0 currently registered to UCM (all bypass it):

| Model | Count | Purpose | Firmware |
|---|---|---|---|
| GHP621 | ~14 | Hotel-room hospitality phones (basic, no network-change detection) | 1.0.1.75 |
| GRP2602P | 4 | Reception / restaurant / kitchen / common-area phones | 1.0.5.55, 1.0.7.64 |
| GRP2636 | 1 | Front desk, ext 09 (MAC 000EC4F5E975 at 192.168.0.76) | 1.0.13.31 |

Per-extension IP mapping (V7-confirmed):

| Ext | User | LAN IP |
|---|---|---|
| 01 | Restaurant | TBD (update needed) |
| 02 | Ocloud CAFE | 192.168.0.81 |
| 03 | Kitchen | 192.168.0.62 |
| 04 | Purchase | 192.168.0.73 |
| 05 | Housekeeping | 192.168.0.78 |
| 07 | MD | 192.168.0.80 |
| 09 | Reception | 192.168.0.76 |
| 101 | Rooms | 192.168.0.63 |
| 102 | Rooms | 192.168.0.77 |
| 103 | Rooms | 192.168.0.64 |
| 104 | Rooms | 192.168.0.65 |
| 105 | Rooms | 192.168.0.66 |
| 106 | Rooms | 192.168.0.67 |
| 107 | Rooms | 192.168.0.68 |
| 108 | Rooms | 192.168.0.69 |
| 109 | Rooms | 192.168.0.70 |
| 110 | Rooms | 192.168.0.71 |
| 111 | Rooms | 192.168.0.72 |

All phones use static IPs (no DHCP) on the LAN. Gateway is 192.168.0.60 (UCM) — but phones connect through it as L3 only; UCM does not handle their SIP registration.

Resilience approach — three layers

Layer 1: Phone-side keepalive (immediate, no prod changes)

Applied on each phone via web admin (or rolled out via UCM Zero Config). The settings target fast recovery from CGNAT rotation / WAN failover by keeping NAT pinholes alive and re-registering quickly.

Account 1 → SIP Settings → General:

| Setting | Value | Why |
|---|---|---|
| REGISTER Expiration | 30 | Re-register every 30s for fast recovery on IP change |
| Re-Register before Expiration | 10 | Proactive refresh 10s before expiry |
| Registration Retry Wait Time | 5 | Retry every 5s on failure (was 20s) |
| Enable OPTIONS Keep-Alive | ✅ Enable | Holds NAT pinhole open + fast failure detection |
| OPTIONS Keep-Alive Interval | 15 | Probes faster than most CGNAT timeouts (~30s) |
| OPTIONS Keep-Alive Max Retries | 3 | Default fine |
| SUBSCRIBE for Registration | OFF | Not needed, adds noise without IP-change benefit |
| SIP Transport | UDP | TCP attempted, calls broke (see below) |

Phone Settings → Basic Settings:

| Setting | Value |
|---|---|
| Keep-Alive Interval | 15 |
| STUN Server | stun.l.google.com:19302 |
| Use Random Port | |

Effect: After any WAN-side IP change, the phone re-registers from the new public IP within ~15–30 seconds. Outbound calls work immediately; inbound calls work after first OPTIONS round-trip completes.

Limitation without server-side complement: stale contacts from prior IP accumulate on cloud (max_contacts > 1 by default). Asterisk may still try the old contact first for ~60s after a flip → intermittent inbound failures during that window.

Layer 2: Server-side AOR tuning (deferred to per-org profile system)

The natural complement to phone-side keepalive would be:

; in V7's AOR sections
minimum_expiration = 30
maximum_expiration = 30
qualify_frequency = 30
max_contacts = 1
remove_existing = yes
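
As a rough illustration of why the expiry bounds matter, the sketch below models how a registrar might clamp the phone's requested expiry. It assumes the registrar clamps out-of-range values rather than rejecting them, and it assumes stock defaults of minimum_expiration=60 and maximum_expiration=7200 when no AOR tuning is applied. Neither assumption has been verified against V7's generated config, and the function names are illustrative only.

```python
def effective_expiry(requested: int, minimum: int, maximum: int) -> int:
    """Clamp the phone's requested REGISTER expiry into the AOR's allowed range."""
    return max(minimum, min(requested, maximum))


def reregister_interval(expiry: int, refresh_before: int) -> int:
    """Seconds between REGISTERs when the phone refreshes `refresh_before` seconds early."""
    return expiry - refresh_before


# Phone-side values from Layer 1.
REQUESTED, REFRESH_BEFORE = 30, 10

# Assumption: stock AOR defaults (minimum_expiration=60, maximum_expiration=7200).
stock = effective_expiry(REQUESTED, minimum=60, maximum=7200)
# With the AOR settings above (minimum_expiration = maximum_expiration = 30).
tuned = effective_expiry(REQUESTED, minimum=30, maximum=30)

print(stock, reregister_interval(stock, REFRESH_BEFORE))  # 60 50
print(tuned, reregister_interval(tuned, REFRESH_BEFORE))  # 30 20
```

Under these assumptions, the phone's REGISTER Expiration of 30 only takes full effect once the AOR bounds are lowered, which would be consistent with the ~60s stale-contact window noted in Layer 1.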

Decision: Do NOT apply per-org by direct config-file edit. Instead, build a resilience-profile system in dialplanGenerator.js:

  • Profile types: default / dynamic-ip / multi-wan / mobile-friendly
  • Per-org assignment stored in DB (or JSON during transition)
  • Editor UI to flip profile per org
  • Regeneration triggers reload

This avoids the "can't version-control per-org hand edits" problem and lets battery-sensitive orgs opt out. Not built yet. V7 runs without Layer 2 for now; accepts the ~30–60s zombie-contact window after each WAN flip.
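
A minimal sketch of the profile idea, written in Python for illustration only (the real generator is dialplanGenerator.js, and nothing below reflects its actual API). Only the "multi-wan" values come from this document; "default" is assumed to mean no overrides. The other two profile names are left out because no values have been decided for them, and the AOR section name is a made-up example.

```python
from typing import Dict

# Hypothetical profile table. Only "multi-wan" uses values taken from this
# document; "default" is assumed to apply no overrides.
PROFILES: Dict[str, Dict[str, str]] = {
    "default": {},
    "multi-wan": {
        "minimum_expiration": "30",
        "maximum_expiration": "30",
        "qualify_frequency": "30",
        "max_contacts": "1",
        "remove_existing": "yes",
    },
}


def render_aor(name: str, profile: str) -> str:
    """Render one PJSIP AOR section with the overrides of the given profile."""
    if profile not in PROFILES:
        raise KeyError(f"unknown resilience profile: {profile}")
    lines = [f"[{name}]", "type = aor"]
    lines += [f"{key} = {value}" for key, value in PROFILES[profile].items()]
    return "\n".join(lines)


# "example_aor_108" is an illustrative section name, not a real V7 AOR.
print(render_aor("example_aor_108", "multi-wan"))
```

Keeping the overrides in one table means a per-org assignment is a single profile name, which is what makes the generated config reproducible from version-controlled data.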

Layer 3: WireGuard tunnel (planned)

Takes CGNAT out of the SIP path: SIP travels inside the tunnel, so cloud Asterisk always sees V7 at a fixed tunnel IP, even though the tunnel itself still crosses the carrier's CGNAT.

Architecture

GWN7002 (V7) ──WireGuard tunnel (over BSNL WAN)──→ Astradial Cloud
  tunnel IP: 10.30.7.2                              tunnel IP: 10.30.7.1
  subnet:    10.30.7.0/30                           UDP port:  51820

Cloud Asterisk sees V7 SIP traffic arriving from 10.30.7.2 ALWAYS,
regardless of which ISP CGNAT IP V7 is currently behind.

Why this works despite single-WAN binding limitation

GDMS WireGuard config only allows binding to ONE WAN (BSNL or Rail — no "Any"/"Auto"). The tunnel must pick one. But WireGuard is identity-based, not IP-based:

| Failure scenario | Tunnel behavior |
|---|---|
| BSNL's CGNAT public IP rotates | ✅ Tunnel survives: WG only checks crypto identity |
| BSNL has packet loss / slowdown | ✅ Tunnel survives: WG retries |
| BSNL goes briefly down (<10s) | ⚠️ Tunnel pauses, reconnects |
| BSNL completely down for minutes | ❌ Tunnel dies. Need fallback. |

For the BSNL-fully-down case: phones use dual SIP accounts — Account 1 via tunnel, Account 2 direct to cloud over whichever WAN is up. When BSNL dies, Account 2 takes over via Rail.

Tunnel subnet plan

Existing WireGuard subnets at Astradial:

  • NUC tunnel: 10.10.10.0/24 (cloud .1, NUC .2), already in use

V7 allocation: 10.30.7.0/30 point-to-point (4 addresses, only 2 usable):

  • Cloud peer: 10.30.7.1
  • V7 peer: 10.30.7.2

Pattern for future customer tunnels: 10.30.N.0/30 where N is a per-customer identifier.
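
The allocation pattern can be checked with Python's standard ipaddress module. This is a sketch under the assumption that N is a single octet (0–255) and that the cloud side always takes the first usable address, as in the V7 allocation; the function names are illustrative.

```python
import ipaddress


def customer_tunnel(n: int):
    """Return (network, cloud_ip, customer_ip) for customer identifier n."""
    if not 0 <= n <= 255:
        raise ValueError("n must fit in one octet")
    network = ipaddress.ip_network(f"10.30.{n}.0/30")
    cloud_ip, customer_ip = list(network.hosts())  # a /30 has exactly two usable hosts
    return network, cloud_ip, customer_ip


def overlaps_existing(network, existing=("10.10.10.0/24",)):
    """True if the new tunnel subnet collides with a subnet already in use."""
    return any(network.overlaps(ipaddress.ip_network(e)) for e in existing)


net, cloud, v7 = customer_tunnel(7)
print(net, cloud, v7)          # 10.30.7.0/30 10.30.7.1 10.30.7.2
print(overlaps_existing(net))  # False
```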

Cloud-side setup steps (NOT YET EXECUTED — requires explicit "yes proceed")

Step 0 — Read existing config (read-only)

ssh root@89.116.31.109 'cat /etc/wireguard/wg0.conf'
ssh root@89.116.31.109 'wg show'
# Kernel WireGuard sockets have no owning process, so match on the listen port instead
ssh root@89.116.31.109 'ss -uln | grep 51820'

Step 1 — Generate keys for V7 peer

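ssh root@89.116.31.109
cd /etc/wireguard
umask 077   # new key files readable by root only
# Keypair is optional: GDMS auto-generates the router's own keys, and the peer block below uses the GDMS public key
wg genkey | tee v7_private.key | wg pubkey > v7_public.key
# Pre-shared key: goes into the cloud [Peer] block and into the GDMS peer form
wg genpsk > v7_preshared.key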

Step 2 — Backup + add V7 peer

cp /etc/wireguard/wg0.conf /etc/wireguard/wg0.conf.bak-$(date +%F)-pre-v7

Add to /etc/wireguard/wg0.conf:

# === V7 (VSEVEN HOTELS) peer ===
[Peer]
PublicKey = <V7_public_key_from_GDMS>
PresharedKey = <generated_psk>
AllowedIPs = 10.30.7.2/32
PersistentKeepalive = 25
# No Endpoint — V7 is behind CGNAT and initiates the connection to us

Step 3 — Hot-reload

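wg syncconf wg0 <(wg-quick strip wg0)
# syncconf only updates peers: add the tunnel address by hand if wg0 lacks it (and persist it in [Interface] Address)
ip address add 10.30.7.1/30 dev wg0
wg show wg0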

Step 4 — Update V7 endpoints in /etc/asterisk/pjsip_vseven_hotels.conf

  • Back up the file first
  • Adjust match lines to include 10.30.7.0/30
  • pjsip reload

Rollback (if needed)

cp /etc/wireguard/wg0.conf.bak-<date>-pre-v7 /etc/wireguard/wg0.conf
wg syncconf wg0 <(wg-quick strip wg0)
# Similar restore for pjsip config
asterisk -rx "pjsip reload"

Router-side setup (via GDMS)

GDMS → Settings → VPN → WireGuard® → Add (or use Setup Wizard)

Form values:

| Field | Value |
|---|---|
| Name | astradial-cloud |
| Status | Enable (after full setup) |
| Interface | BSNL (only single-WAN binding available) |
| Listening Port | 51820 |
| Local IP / Mask | 10.30.7.2 / 30 |
| Private Key | (pre-generated by GDMS, leave alone) |
| Public Key | Copy this (needed for cloud's peer block) |
| MTU | 1420 |

Peer block (separate step after Save):

| Field | Value |
|---|---|
| Peer Public Key | (cloud's WireGuard public key, get from wg show wg0) |
| Pre-Shared Key | (from cloud's v7_preshared.key) |
| Allowed IPs | 10.30.7.1/32, 89.116.31.109/32 |
| Endpoint | 89.116.31.109:51820 |
| Persistent Keepalive | 25 |

Investigation captured during planning

GDMS WireGuard config form was inspected. Key findings:

  • Keys are auto-generated by GDMS (no need to manually generate router-side)
  • Interface (WAN binding) only offers BSNL or Rail, with no "Any" or "Auto" option
  • Peer/endpoint configuration is NOT visible in the initial Add form; it appears in a separate step or via Setup Wizard
  • Setup Wizard offers four protocols: OpenVPN, WireGuard, IPSec (Site-to-Site), PPTP
  • IPSec Site-to-Site explicitly advertises "auto-rebuild on WAN IP change". Worth considering as an alternative if the WireGuard implementation hits limits, but our cloud already runs WG, so default to WG

Layer 4: UCM bridge architecture (deferred, not chosen for this round)

The "ideal" architecture from v7-setup.md was UCM as the trunk bridge: all 18 phones register locally to UCM (over LAN — no NAT, no CGNAT issues), UCM has one trunk to cloud. This means 1 trunk × WAN problem instead of 18.

Considered and deferred for this iteration because:

  • Requires creating extensions on UCM for all 18 phones (currently 0 registered)
  • Requires Zero Config template rollout
  • Trunk-to-trunk gating issue (from v7-setup.md) needs Grandstream-installer resolution
  • Phone-side fix (Layer 1) covers ~80% of the problem at lower effort

Will re-evaluate if WireGuard tunnel + phone-side keepalive isn't enough.

Number strategy (decided)

| Question | Decision | Rationale |
|---|---|---|
| Primary public number | +918065978007 (Tata DID via Astradial) | Already routes end-to-end via cloud; bypasses all BSNL FXO/SIP-trunk complexity |
| What to do with 04175295093 (BSNL printed number) | Keep with CFA (*21*08065978007#) activated on BSNL line | Preserves printed-number recognition for guests/banners; forwards to working path |
| Pending decisions | Local Number Portability (port 04175295093 → Tata): future consideration | Eliminates BSNL dependency over time |

What we stopped doing

The original v7-setup.md had several pending items that became moot with the new architecture:

  • ❌ Grandstream installer meeting about UCM trunk-to-trunk gating — no longer needed
  • ❌ BSNL engineer meeting about SIP-over-Ethernet upgrade — no longer needed
  • ❌ DBC ONT admin password hunt — no longer needed (ONT not in voice path)
  • ❌ UCM BSNL-AstraDial inbound route debugging — replaced by CFA + Tata DID flow

Cancel/repurpose external meetings.

TCP transport attempt (failed — keep notes for future)

Switched ext 108 from UDP to TCP. Registration succeeded but outbound calls failed. After revert to UDP, calls work normally.

Root cause not investigated yet, but most likely: V7's PJSIP endpoint config in /etc/asterisk/pjsip_vseven_hotels.conf has explicit transport = transport-udp. When phone registers over TCP, the contact stores ;transport=TCP, but Asterisk uses the endpoint's bound transport for outbound INVITE → transport/contact mismatch → INVITE never reaches phone.

To investigate later if we want TCP:

  1. Read endpoint config in /etc/asterisk/pjsip_vseven_hotels.conf
  2. Remove explicit transport binding OR add a separate TCP transport endpoint
  3. Test with one phone first

Cloud confirmed to listen on TCP/5080:

asterisk -rx "pjsip show transports"
# Transport: transport-tcp tcp 0.0.0.0:5080  ✅

Remote-access tooling

| Tool | Purpose | URL / access |
|---|---|---|
| GDMS Cloud | Remote GWN7002 / UCM / phone management | https://www.gdms.cloud/gwn (V7 organization) |
| GDMS: Devices | View / config router, phones, switches, APs | Same |
| GDMS: Settings → Internet | WAN priority, failover mode, health graphs | Same |
| GDMS: Settings → VPN | WireGuard / IPSec / OpenVPN setup | Same |
| GDMS: Settings → Firewall & Security | SIP ALG (look in Advanced Security Settings) | Same |
| Astradial Cloud | Asterisk SIP server, CDR, recording | ssh root@89.116.31.109 |

Operational notes

Verifying phone registrations

ssh root@89.116.31.109 'asterisk -rx "pjsip show contacts" | grep moijhj2l'

Look for:

  • Each ext should have ONE contact (multiple = zombies from prior WAN flips)
  • Status should be Avail (not Unavail)
  • Transport should be UDP (TCP didn't work)
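
The first two checks can be automated by parsing the command output. The sketch below assumes the usual `pjsip show contacts` row layout (`Contact: <aor>/<uri> <hash> <status> <rtt>`); the sample text is illustrative only, with made-up AOR names, ports and hashes, and should be replaced with real output before trusting the result.

```python
from collections import defaultdict

# Illustrative sample only: AOR names, ports and hashes are made up.
SAMPLE = """\
  Contact:  <Aor/ContactUri..............................> <Hash....> <Status> <RTT(ms)..>
==========================================================================================

  Contact:  org_moijhj2l_108/sip:108@59.93.255.93:40312       1a2b3c4d5e Avail        41.2
  Contact:  org_moijhj2l_108/sip:108@103.197.113.158:51877    6f7a8b9c0d Unavail        nan
  Contact:  org_moijhj2l_109/sip:109@59.93.255.93:40977       2b3c4d5e6f Avail        39.8
"""


def contacts_by_aor(output: str):
    """Group (uri, status) pairs by AOR from `pjsip show contacts` text."""
    grouped = defaultdict(list)
    for line in output.splitlines():
        fields = line.split()
        # Data rows look like: Contact: <aor>/<uri> <hash> <status> <rtt>
        if len(fields) < 4 or fields[0] != "Contact:" or fields[1].startswith("<"):
            continue
        aor, _, uri = fields[1].partition("/")
        grouped[aor].append((uri, fields[3]))
    return dict(grouped)


def problems(grouped):
    """Return AORs with more than one contact or with any non-Avail contact."""
    return sorted(
        aor for aor, contacts in grouped.items()
        if len(contacts) > 1 or any(status != "Avail" for _, status in contacts)
    )


print(problems(contacts_by_aor(SAMPLE)))  # ['org_moijhj2l_108']
```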

After WireGuard goes live, verifying tunnel

ssh root@89.116.31.109 'wg show wg0'
# Should show V7 peer with recent handshake and traffic
ssh root@89.116.31.109 'ping -c 3 10.30.7.2'
# Should succeed once tunnel is up
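
"Recent handshake" can be made concrete by parsing `wg show wg0 latest-handshakes`, which prints one public key and one Unix timestamp per peer. The sketch below is hedged: the 180-second threshold is an assumption (WireGuard normally re-handshakes about every two minutes while traffic flows), and the keys and timestamps in the sample are placeholders.

```python
import time


def stale_peers(output: str, max_age: int = 180, now=None):
    """Return public keys whose last handshake is missing or older than max_age seconds.

    `output` is the text of `wg show wg0 latest-handshakes`: one line per peer
    holding the public key and a Unix timestamp (0 means no handshake yet).
    """
    now = time.time() if now is None else now
    stale = []
    for line in output.splitlines():
        if not line.strip():
            continue
        key, stamp = line.split()
        if int(stamp) == 0 or now - int(stamp) > max_age:
            stale.append(key)
    return stale


# Placeholder keys and timestamps for illustration only.
sample = "PEER_A_PUBLIC_KEY=\t1000\nPEER_B_PUBLIC_KEY=\t0\nPEER_C_PUBLIC_KEY=\t700\n"
print(stale_peers(sample, now=1100))  # ['PEER_B_PUBLIC_KEY=', 'PEER_C_PUBLIC_KEY=']
```

An empty list for the V7 peer's key would correspond to the "recent handshake" condition above.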

Watching WAN flips

On GDMS → Settings → Internet — the WAN Health graph shows green/red segments over the last 12h. Frequent red segments on either WAN = unstable line, worth investigating from V7 IT side.

Open questions / pending items

  • [ ] Confirm whether GDMS WireGuard form has a separate Peer-configuration step (probably after Save, or via Setup Wizard) — needed to complete the tunnel
  • [ ] Identify the more stable WAN between BSNL and Rail from GDMS Internet → WAN Health history (12h+ data)
  • [ ] Decide whether to upgrade to dual-tunnel (Layer 3 Option B) if single-tunnel BSNL outages become painful
  • [ ] Build the per-org resilience profile system in dialplanGenerator.js (Layer 2)
  • [ ] Confirm SIP ALG state on GWN7002 — Firewall → Advanced Security Settings (suspected enabled, must disable)
  • [ ] Roll Layer 1 phone-side settings from ext 108 to other 17 phones (manual or Zero Config template)
  • [ ] Update v7-setup.md status board to reflect the deprecated BSNL-FXO path and new Tata-DID-primary path

Decisions log

| Decision | Outcome | Date |
|---|---|---|
| Drop BSNL FXO integration; use Tata DID +918065978007 as primary number | Done conceptually; CFA activation pending V7 IT | This iteration |
| Keep BSNL line active with CFA forwarding to Tata DID | Preserves printed number with zero work | This iteration |
| Cancel Grandstream + BSNL meeting | No longer needed under new architecture | This iteration |
| Defer UCM bridge architecture; use direct cloud registration | Lower effort; phone-side keepalive + WireGuard cover most issues | This iteration |
| Apply phone-side keepalive resilience settings, starting with ext 108 | In progress | This iteration |
| Don't apply per-org server-side AOR edits; build profile system in generator instead | Avoids "can't version-control per-org hand edits" | This iteration |
| Use WireGuard for V7 tunnel (matches existing infra) over IPSec | WG keys are easier to manage; cloud already has WG service | This iteration |
| Single-tunnel on BSNL with dual-account fallback (not dual-tunnel) | Single tunnel covers ~99% of cases at lower complexity | This iteration |
| TCP transport attempt → revert to UDP | Endpoint config likely UDP-bound; investigation deferred | This iteration |