Troubleshooting¶
This page documents all known errors encountered in the Astradial infrastructure, their root causes, and fixes.
Error 1: Asterisk Won't Start -- Bind to NNI IP Fails¶
Symptom¶
Asterisk fails to start with an error about being unable to bind to the NNI interface IP address.
Root Cause¶
The PJSIP transport was configured to bind to the NNI interface IP (e.g., 172.16.x.x). When the NNI interface (enp86s0) is DOWN, for example after a reboot before the interface comes up, that IP is not assigned to any interface, so Asterisk cannot bind to it and refuses to start.
Diagnosis¶
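A minimal diagnostic sketch. It assumes pjsip.conf is at the default /etc/asterisk/pjsip.conf and that transports use IPv4 bind= lines. The script lists every transport bound to a specific address and reports whether a local interface currently holds that address. The function names specific_binds and check_binds are hypothetical helpers, not Asterisk tools.

```shell
#!/bin/sh
# Hypothetical helpers for diagnosing Error 1 (not part of Asterisk).

# Print the host part of every non-wildcard "bind=" line in a pjsip.conf file.
# Wildcard binds (0.0.0.0 or [::]) are skipped: they do not depend on any one interface.
specific_binds() {
    sed -n 's/^[[:space:]]*bind[[:space:]]*=[[:space:]]*//p' "$1" \
        | sed 's/[[:space:]]*;.*$//' \
        | sed 's/:[0-9]*$//' \
        | grep -v -x -F -e '0.0.0.0' -e '[::]'
}

# For each specific bind address, report whether a local interface currently holds it (IPv4).
check_binds() {
    for addr in $(specific_binds "${1:-/etc/asterisk/pjsip.conf}"); do
        if ip -o addr show | grep -q -F " $addr/"; then
            echo "assigned: $addr"
        else
            echo "MISSING:  $addr (Asterisk cannot bind here; is enp86s0 DOWN?)"
        fi
    done
}
```

Running check_binds with no argument checks the default file. A MISSING line for a 172.16.x.x address matches this error exactly.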
Fix¶
Change the PJSIP transport bind address from the specific NNI IP to 0.0.0.0.
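A minimal pjsip.conf sketch of the corrected transport. The section name transport-udp and port 5060 are assumptions; keep the name and port your existing transport uses. A PJSIP reload generally does not pick up transport changes, so restart Asterisk after editing.

```ini
; /etc/asterisk/pjsip.conf (excerpt); section name is illustrative
[transport-udp]
type=transport
protocol=udp
; Listen on all interfaces so startup never depends on enp86s0 being UP
bind=0.0.0.0:5060
```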
Prevention Rule¶
Never bind PJSIP transports to a specific interface IP. Always use 0.0.0.0 to bind on all interfaces.
Error 2: Tata Calls "Number Incorrect"¶
Symptom¶
Outbound or inbound calls via Tata fail with a "number incorrect" message or a similar rejection. Calls may partially work, but SIP responses never reach Tata.
Root Cause¶
Two issues combined:
- Routes for the Tata network were missing from the NUC routing table.
- SIP responses were leaving through the wrong network interface (the default route instead of the NNI interface).
Diagnosis¶
# Check routing table
ip route show
# Check which interface SIP traffic uses
tcpdump -i enp86s0 port 5060
tcpdump -i any port 5060
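To find out which interface the kernel would actually use for a given Tata address, ip route get is more direct than watching tcpdump. The sketch below wraps it. route_dev and check_nni_route are hypothetical helper names, and enp86s0 is assumed to be the NNI interface, as elsewhere on this page.

```shell
#!/bin/sh
# Hypothetical helpers for Error 2; enp86s0 is assumed to be the NNI interface.

# Print the outgoing interface named in one line of `ip route get` output.
route_dev() {
    printf '%s\n' "$1" | awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Check that traffic to a destination (e.g. a Tata SBC address) leaves via the NNI interface.
# Usage: check_nni_route <destination-ip> [expected-interface]
check_nni_route() {
    dest=$1
    want=${2:-enp86s0}
    got=$(route_dev "$(ip route get "$dest" 2>/dev/null | head -n 1)")
    if [ "$got" = "$want" ]; then
        echo "OK: $dest -> $got"
    else
        echo "WRONG: $dest -> ${got:-no route} (expected $want)"
        return 1
    fi
}
```

Responses use the same routing lookup as any other outgoing packet. A WRONG result for a Tata SBC address therefore means replies will also leave via the default route.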
Fix¶
Add explicit routes so that traffic to the Tata network goes through the NNI gateway. Make the routes persistent in /etc/network/interfaces.d/enp86s0 so they survive a reboot.
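A sketch of the route lines. It assumes ifupdown (Debian-style /etc/network/interfaces.d) and an "iface enp86s0 inet static" stanza in that file. 192.0.2.0/24 stands in for the Tata SBC network and 172.16.0.1 for the NNI gateway; both are placeholders, not the real values.

```
    # Inside the "iface enp86s0 inet static" stanza of /etc/network/interfaces.d/enp86s0
    # PLACEHOLDERS: 192.0.2.0/24 = Tata SBC network, 172.16.0.1 = NNI gateway
    post-up ip route add 192.0.2.0/24 via 172.16.0.1 dev enp86s0
    pre-down ip route del 192.0.2.0/24 via 172.16.0.1 dev enp86s0
    # To apply immediately without a reboot, run the same command by hand:
    #   ip route add 192.0.2.0/24 via 172.16.0.1 dev enp86s0
```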
Prevention Rule¶
Always verify routing after network changes. SIP responses must leave through the same interface the request arrived on.
Error 3: tata_gateway Endpoint Shows "Unavailable"¶
Symptom¶
pjsip show endpoints shows the tata_gateway endpoint status as "Unavailable".
Root Cause¶
Asterisk sends SIP OPTIONS to qualify endpoints. Tata's SBC ignores OPTIONS requests and never responds, so Asterisk marks the endpoint as unavailable.
Diagnosis¶
Fix¶
Disable qualify by setting qualify_frequency=0 in the AOR.
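A pjsip.conf sketch of the AOR. The contact URI is a placeholder for Tata's SBC address; only the qualify_frequency=0 line is the actual fix. AOR changes can be applied with asterisk -rx "pjsip reload".

```ini
; pjsip.conf (excerpt); the contact address is a PLACEHOLDER for Tata's SBC
[tata_gateway]
type=aor
contact=sip:192.0.2.10:5060
; 0 disables OPTIONS qualify, since Tata's SBC never answers them
qualify_frequency=0
```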
Prevention Rule¶
This is expected behavior: because Tata's SBC never answers OPTIONS, the status shown by pjsip show endpoints is not a health signal for this trunk. Do not use endpoint status to determine whether the Tata trunk is working; test with actual calls instead.
Error 4: PJSIP_EFAILEDCREDENTIAL¶
Symptom¶
SIP authentication fails with PJSIP_EFAILEDCREDENTIAL in Asterisk logs when the NUC tries to authenticate with the cloud Asterisk.
Root Cause¶
There is an authentication incompatibility between Asterisk 22 and Asterisk 20 (or between certain PJSIP versions). In this deployment, username/password-based SIP authentication between the two versions failed.
Diagnosis¶
Fix¶
Replace username/password authentication with IP-based identification:
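; Cloud Asterisk pjsip.conf: identify the NUC by its WireGuard source IP
[tata_gateway]
type=endpoint
; No auth= line: requests are matched by source IP, so no credentials are challenged
identify_by=ip
[tata_gateway]
type=identify
endpoint=tata_gateway
match=10.10.10.2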
Prevention Rule¶
Use identify_by=ip for machine-to-machine SIP connections where both sides have known, fixed IPs (e.g., over WireGuard).
Error 5: systemctl Shows "active (exited)" but Nothing Running¶
Symptom¶
systemctl status asterisk shows active (exited) but Asterisk is not actually running. No process is found.
Root Cause¶
The default Asterisk package installs an init.d script that systemd wraps. The init.d script has a bug where it reports success (exits 0) even when Asterisk fails to start or is not running.
Diagnosis¶
systemctl status asterisk
# Shows "active (exited)" instead of "active (running)"
ps aux | grep asterisk
# No asterisk process found
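For monitoring scripts, the same check can be made machine-readable. systemctl show prints unit properties as Key=Value lines. The hypothetical helper below classifies them, so active/exited (the symptom above) is distinguished from a supervised, running process.

```shell
#!/bin/sh
# Hypothetical helper: classify the output of
#   systemctl show asterisk -p ActiveState -p SubState
unit_verdict() {
    active=$(printf '%s\n' "$1" | sed -n 's/^ActiveState=//p')
    sub=$(printf '%s\n' "$1" | sed -n 's/^SubState=//p')
    case "$active/$sub" in
        active/running) echo "running" ;;
        # The init.d wrapper symptom: unit "active" but no supervised process
        active/exited)  echo "exited-wrapper" ;;
        *)              echo "not-running ($active/$sub)" ;;
    esac
}
```

Typical use: unit_verdict "$(systemctl show asterisk -p ActiveState -p SubState)". Combine it with pgrep -x asterisk to confirm that a process actually exists.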
Fix¶
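Create a native systemd unit (for example /etc/systemd/system/asterisk.service) that runs Asterisk in the foreground with -f, so systemd manages the process directly instead of relying on the init.d wrapper. Load it with systemctl daemon-reload, then start it with systemctl enable --now asterisk. The unit file: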
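[Unit]
Description=Asterisk PBX
After=network.target
[Service]
# -f keeps Asterisk in the foreground so systemd tracks the real process
Type=simple
ExecStart=/usr/sbin/asterisk -f -C /etc/asterisk/asterisk.conf
ExecReload=/usr/sbin/asterisk -rx "core reload"
# Restart automatically if Asterisk crashes or exits with an error
Restart=on-failure
[Install]
WantedBy=multi-user.target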
Prevention Rule¶
Always use native systemd service files. Do not rely on init.d compatibility wrappers.
Error 6: enp86s0 DOWN After Reboot¶
Symptom¶
After a NUC reboot, the NNI interface (enp86s0) is DOWN. The Tata trunk has no network connectivity.
Root Cause¶
The NNI interface was configured manually (via ip commands) but not persisted in network configuration files. After reboot, the interface remains unconfigured and DOWN.
Diagnosis¶
Fix¶
Create a persistent configuration in /etc/network/interfaces.d/enp86s0.
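A minimal ifupdown stanza. It assumes Debian-style networking where /etc/network/interfaces contains "source /etc/network/interfaces.d/*" (the Debian default). The address and netmask are placeholders for the real NNI values (172.16.x.x in this deployment). There is deliberately no gateway line, so the default route stays on the ISP-facing interface.

```
# /etc/network/interfaces.d/enp86s0
# PLACEHOLDER address/netmask: use the real NNI values
auto enp86s0
iface enp86s0 inet static
    address 172.16.0.2
    netmask 255.255.255.252
```

"auto" brings the interface up at boot. Run ifup enp86s0 to apply the stanza without rebooting.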
Prevention Rule¶
Never configure network interfaces with transient ip commands alone. Always persist configuration in /etc/network/interfaces.d/.
Error 7: First Call After Restart Fails¶
Symptom¶
The first inbound or outbound call after an Asterisk restart fails. Subsequent calls work fine.
Root Cause¶
After Asterisk restarts, SIP registrations and endpoint discovery take time to complete. If a call arrives before the Tata gateway endpoint is fully initialized, which typically takes 30-60 seconds, the call fails.
Diagnosis¶
# Check how long ago Asterisk started
asterisk -rx "core show uptime"
# Check endpoint status
asterisk -rx "pjsip show endpoints"
Fix¶
Wait 30-60 seconds after restarting Asterisk before routing live calls. There is no configuration fix -- this is inherent to the SIP registration process.
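When a script restarts Asterisk (for example during maintenance), the wait can be enforced rather than remembered. This sketch polls asterisk -rx "core show uptime seconds" until the uptime passes a threshold. It assumes that command prints a line of the form "System uptime: <seconds>", and the function names are hypothetical. The script only enforces the delay; a test call is still the real readiness check.

```shell
#!/bin/sh
# Hypothetical helpers; not part of Asterisk.

# Extract the seconds value from "core show uptime seconds" output.
# ASSUMPTION: the relevant line looks like "System uptime: 1234".
uptime_seconds() {
    printf '%s\n' "$1" | sed -n 's/^System uptime:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Wait until Asterisk has been up for at least $1 seconds (default 60).
# Gives up after 60 polls, 5 seconds apart (about 5 minutes).
wait_for_asterisk_warmup() {
    min=${1:-60}
    tries=0
    while [ "$tries" -lt 60 ]; do
        up=$(uptime_seconds "$(asterisk -rx 'core show uptime seconds' 2>/dev/null)")
        if [ -n "$up" ] && [ "$up" -ge "$min" ]; then
            echo "Asterisk up for ${up}s; place a test call next"
            return 0
        fi
        tries=$((tries + 1))
        sleep 5
    done
    echo "Asterisk did not reach ${min}s uptime" >&2
    return 1
}
```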
Prevention Rule¶
After any Asterisk restart, verify the trunk is ready by placing a test call before routing production traffic. Avoid restarting Asterisk during business hours when possible.
Error 8: NUC Public IP Changed -- Cloud Rejected¶
Symptom¶
The NUC's SIP connection to the cloud Asterisk stops working. The cloud Asterisk rejects packets from the NUC because the NUC's public IP has changed (ISP dynamic IP).
Root Cause¶
The NUC's ISP assigns a dynamic public IP. When it changes, the cloud Asterisk's IP-based identification no longer matches, and all traffic from the NUC is rejected.
Diagnosis¶
# On NUC: check current public IP
curl ifconfig.me
# On cloud: check what IP the identify section expects
asterisk -rx "pjsip show identify tata_gateway"
Fix¶
Use a WireGuard VPN tunnel between the NUC and cloud server. WireGuard assigns fixed tunnel IPs (10.10.10.1 for cloud, 10.10.10.2 for NUC) regardless of the underlying public IP.
; Cloud Asterisk uses WireGuard IP for identification
[tata_gateway]
type=identify
endpoint=tata_gateway
match=10.10.10.2
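A sketch of the NUC side of the tunnel in wg-quick format. The keys are placeholders. The Endpoint line assumes the cloud server's public address is 89.116.31.109 (the address used in Error 10) and that WireGuard listens on port 51820 there; check both. PersistentKeepalive keeps the NUC's NAT mapping open from behind its dynamic IP. The cloud side needs no Endpoint for the NUC, because WireGuard learns a peer's current address from its authenticated packets.

```ini
# /etc/wireguard/wg0.conf on the NUC (keys, port, and endpoint are PLACEHOLDERS/assumptions)
[Interface]
Address = 10.10.10.2/24
PrivateKey = <NUC_PRIVATE_KEY>

[Peer]
# Cloud server
PublicKey = <CLOUD_PUBLIC_KEY>
Endpoint = 89.116.31.109:51820
# Only the cloud's tunnel IP is routed through wg0
AllowedIPs = 10.10.10.1/32
# Send a keepalive every 25 s so the NAT mapping stays open
PersistentKeepalive = 25
```

Bring the tunnel up with wg-quick up wg0, or use systemctl enable --now wg-quick@wg0 to also start it at boot.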
Prevention Rule¶
Never rely on public IPs for SIP endpoint identification when either side has a dynamic IP. Always use a VPN tunnel with fixed IPs.
Error 9: Zoiper 401 Unauthorized¶
Symptom¶
Zoiper softphone on the NUC's local network receives 401 Unauthorized when trying to register with the cloud Asterisk, even though credentials are correct.
Root Cause¶
The NUC and Zoiper share the same public IP (both are behind the same NAT). The cloud Asterisk's identify_by=ip matches the source IP to the tata_gateway endpoint before evaluating Zoiper's credentials, causing an identification conflict.
Diagnosis¶
# On cloud Asterisk: attach a console, then enable SIP message logging
asterisk -rvvv
# At the CLI> prompt:
pjsip set logger on
# Observe that Zoiper REGISTER is matched to tata_gateway endpoint
Fix¶
The WireGuard tunnel solves this. The NUC's Asterisk traffic goes through the tunnel (source IP 10.10.10.2), while Zoiper's traffic goes through the public IP. These are now different source IPs, so identification works correctly.
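This separation only holds if the NUC's Asterisk addresses the cloud by its tunnel IP. The kernel then routes that traffic through wg0 with source 10.10.10.2. A NUC-side pjsip.conf sketch follows; the section name cloud_trunk and the context from-cloud are hypothetical.

```ini
; NUC pjsip.conf (excerpt); names are hypothetical
[cloud_trunk]
type=aor
; Tunnel IP of the cloud Asterisk, never its public IP
contact=sip:10.10.10.1:5060

[cloud_trunk]
type=endpoint
aors=cloud_trunk
context=from-cloud

[cloud_trunk]
type=identify
endpoint=cloud_trunk
match=10.10.10.1
```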
Prevention Rule¶
When using identify_by=ip, ensure each endpoint has a unique source IP. Use VPN tunnels to separate traffic from hosts sharing a public IP.
Error 10: ISP Blocks UDP 5060¶
Symptom¶
SIP softphones on certain networks cannot register or make calls. Error messages include "SIP UDP not found" or connection timeouts.
Root Cause¶
Some ISPs block outbound UDP traffic on port 5060 as a measure against SIP abuse or toll fraud.
Diagnosis¶
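# From the affected network
nc -zuv 89.116.31.109 5060
# Timeout or connection refused
# Try alternate port
nc -zuv 89.116.31.109 5080
# Success
# Caveat: for UDP, nc only detects failures that come back as ICMP errors, so a
# silent drop can still print "succeeded". Confirm on the server, e.g.:
#   tcpdump -n -i any udp port 5060 or udp port 5080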
Fix¶
Configure an alternate SIP transport on port 5080:
[transport-udp-alt]
type=transport
protocol=udp
bind=0.0.0.0:5080
external_media_address=89.116.31.109
external_signaling_address=89.116.31.109
Clients behind restrictive ISPs connect to port 5080 instead of 5060.
Prevention Rule¶
Always provide an alternate SIP port (5080) as a standard part of the deployment. Document it for end users experiencing connectivity issues.
Error 11: NUC Crashed During Netdata Build¶
Symptom¶
The NUC became unresponsive and powered off during Netdata compilation. After power-on, no Netdata installation was present.
Root Cause¶
Compiling Netdata from source is CPU-intensive. The NUC's passive cooling was insufficient, causing the CPU to overheat and trigger a thermal shutdown.
Diagnosis¶
Fix¶
Install Netdata with the kickstart script's --static-only flag, which downloads a pre-built static binary instead of compiling from source.
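This is the kickstart-based install as Netdata documented it at the time of writing. The URL and flag may change, so check Netdata's current install documentation before running it.

```shell
# Download Netdata's kickstart installer, then request the pre-built static build
wget -O /tmp/netdata-kickstart.sh https://get.netdata.cloud/kickstart.sh
sh /tmp/netdata-kickstart.sh --static-only
```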
Prevention Rule¶
Never compile large software from source on passively-cooled or low-power hardware. Always use pre-built binaries or static builds.
Error 12: Upptime 404¶
Symptom¶
The Upptime status page at status.astradial.com returns a 404 error. The GitHub Actions workflows run successfully and the site appears to build, but the page is not accessible.
Root Cause¶
The Upptime workflows build the static site but do not deploy it to GitHub Pages. Without a deployment step, the gh-pages branch is never updated and GitHub Pages has nothing to serve.
Diagnosis¶
# Check if gh-pages branch exists and has recent commits
gh api repos/astradial/upptime/branches/gh-pages
Fix¶
Add the peaceiris/actions-gh-pages action to the workflow to deploy the built site to the gh-pages branch:
- uses: peaceiris/actions-gh-pages@v3
  with:
    github_token: ${{ secrets.GITHUB_TOKEN }}
    # Must match the directory the site build step writes to
    publish_dir: ./build
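Two further details commonly decide whether this step actually publishes. First, the job's GITHUB_TOKEN needs write access to repository contents to push gh-pages, and many repositories default to a read-only token. Second, with a custom domain such as status.astradial.com, the action's cname input rewrites the CNAME file on every deploy. A sketch of the job follows; the job name is illustrative and the build steps are elided.

```yaml
jobs:
  deploy:                      # illustrative job name
    runs-on: ubuntu-latest
    permissions:
      contents: write          # lets GITHUB_TOKEN push the gh-pages branch
    steps:
      # ... checkout and site build steps ...
      - uses: peaceiris/actions-gh-pages@v3
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: ./build
          cname: status.astradial.com   # keep the custom domain on every deploy
```

In the repository's Pages settings, set the publishing source to the gh-pages branch.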
Prevention Rule¶
When using GitHub Pages with a build step, always include an explicit deployment action. Verify the gh-pages branch is being updated after workflow runs.