Fail2Ban Runbook¶
Operational reference for the fail2ban configuration protecting the PROD cloud Asterisk. The setup is intentionally split so that real attacks get banned fast while legitimate customer retry storms (IP changes, network blips, softphone session starts) cannot trip the ban threshold.
Why two asterisk jails¶
Asterisk's PJSIP emits "No matching endpoint found" on every legitimate first REGISTER. That is the standard SIP handshake: the request arrives without auth, the server responds 401, and the client retries with auth. Username scanners doing brute-force discovery produce the same log line.
A single jail can't tell them apart by log line alone — it can only distinguish by rate. So we split:
| Jail | Catches | maxretry | findtime | bantime |
|---|---|---|---|---|
| asterisk-auth | Real auth failures: wrong password, ACL violation, security events, "hacking attempt detected" | 3 | 10 min | 24 h |
| asterisk-scan | Endpoint-discovery patterns: "No matching endpoint found", "No matching peer", "Not a local domain", extension-not-found | 50 | 1 h | 1 h |
A real attacker doing username scanning produces hundreds of probes per minute → still banned fast by asterisk-scan. A customer PBX with 10 phones re-registering after an IP change produces ~10–30 such lines in a few seconds → well under 50 in an hour → no ban.
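As a rough illustration, the table could translate into jail.local stanzas like the ones below. This is a sketch, not a copy of the live file: the jail names and thresholds come from the table, while the `enabled`, `filter`, and `logpath` keys are assumptions based on the file layout described in this runbook, and ports and ban actions are assumed to be inherited from `[DEFAULT]`. `write_sample_jails` is a hypothetical helper that only writes the sample to a scratch path.

```shell
# Sketch only: names and thresholds are taken from the jail table;
# every other key is an assumption, not the live configuration.
write_sample_jails() {
    cat > "$1" <<'EOF'
[asterisk-auth]
enabled  = true
filter   = asterisk-auth
logpath  = /var/log/asterisk/messages.log
maxretry = 3
findtime = 600
bantime  = 86400

[asterisk-scan]
enabled  = true
filter   = asterisk-scan
logpath  = /var/log/asterisk/messages.log
maxretry = 50
findtime = 3600
bantime  = 3600
EOF
}

# Example: write the sample next to the live file for a side-by-side diff.
# write_sample_jails /tmp/jail.local.sample
# diff /tmp/jail.local.sample /etc/fail2ban/jail.local
```

The times are written in seconds (600 = 10 min, 3600 = 1 h, 86400 = 24 h), which fail2ban accepts for `findtime` and `bantime`.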
File layout on PROD (89.116.31.109)¶
| Path | Contents |
|---|---|
| /etc/fail2ban/jail.local | Jail definitions for asterisk-auth, asterisk-scan, sshd plus [DEFAULT] ignoreip list |
| /etc/fail2ban/filter.d/asterisk-auth.local | Regex set for real auth failures |
| /etc/fail2ban/filter.d/asterisk-scan.local | Regex set for endpoint-discovery patterns |
| /etc/fail2ban/filter.d/asterisk.conf | Distro-default monolithic filter (disabled, kept on disk for reference) |
| /etc/fail2ban/jail.local.bak-YYYY-MM-DD-* | Pre-change backups |
| /var/log/fail2ban.log | Action log (bans, unbans, reloads) |
| /var/log/asterisk/messages.log | Source the jails monitor |
Customer-PBX whitelist (ignoreip)¶
Customer PBXes are special — they have many endpoints behind one public IP, and when their ISP-assigned IP changes (common in India for non-static-IP customers), all endpoints re-register simultaneously. Even with a 50-strike scan threshold, a one-off catastrophic ban can still happen; the ignoreip list is the belt-and-suspenders.
Current entries (as of 2026-05-11):
| CIDR | Customer | Notes |
|---|---|---|
| 127.0.0.0/8, ::1 | localhost | Always whitelisted |
| 59.93.255.0/24 | VSEVEN HOTELS (V7) | Current uplink |
| 103.197.113.0/24 | VSEVEN HOTELS (V7) | Previous uplink, kept for resilience if their ISP rotates back |
Use /24 (256 IPs), not /16 (65,536 IPs). A /16 is too broad: it would whitelist many unrelated hosts on the same ISP, defeating the purpose.
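The sizes follow from the prefix length: an IPv4 /N block covers 2^(32-N) addresses. A small sketch for sanity-checking a range before whitelisting it (`cidr_size` is a hypothetical helper name):

```shell
# cidr_size PREFIX: number of IPv4 addresses covered by a /PREFIX block.
cidr_size() {
    echo $(( 1 << (32 - $1) ))
}

# Compare typical SMB allocations with the /24 and /16 cases.
for prefix in 29 28 24 16; do
    printf '/%s covers %s addresses\n' "$prefix" "$(cidr_size "$prefix")"
done
```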
Adding a new customer¶
When onboarding a customer whose PBX has a known public IP / range:
- Note the customer's public-IP block (ask their IT; typical SMB has a /29 or /28 from the ISP)
- SSH to PROD: ssh root@89.116.31.109
- Edit /etc/fail2ban/jail.local:
  - Add the CIDR to the ignoreip = … line under [DEFAULT]
  - Add a comment above the line documenting which customer it's for
- Reload: fail2ban-client reload
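The onboarding steps could be scripted roughly as follows. This is a sketch under stated assumptions: `append_ignoreip` is a hypothetical helper (not part of fail2ban), it assumes the `ignoreip` entry sits on a single line, it relies on GNU sed, and `203.0.113.0/29` is a documentation-range placeholder rather than a real customer block. The customer comment still has to be added by hand.

```shell
# append_ignoreip FILE CIDR
# Appends CIDR to the first single-line "ignoreip = ..." entry in FILE.
append_ignoreip() {
    file="$1"
    cidr="$2"

    # Accept only something shaped like an IPv4 CIDR, e.g. 203.0.113.0/29.
    printf '%s\n' "$cidr" |
        grep -Eq '^[0-9]{1,3}(\.[0-9]{1,3}){3}/[0-9]{1,2}$' || return 1

    # Bail out if the file has no ignoreip line to extend.
    grep -Eq '^ignoreip[[:space:]]*=' "$file" || return 1

    # Nothing to do if the CIDR is already listed.
    if grep -E '^ignoreip[[:space:]]*=' "$file" | tr -s ' ,' '\n\n' | grep -Fxq "$cidr"; then
        return 0
    fi

    # Append to the first ignoreip line only (GNU sed "0,/re/" address).
    sed -i "0,/^ignoreip[[:space:]]*=/ s|^\(ignoreip[[:space:]]*=.*\)\$|\1 ${cidr}|" "$file"
}

# On PROD, as root (left commented out so nothing runs by accident):
# cp -a /etc/fail2ban/jail.local "/etc/fail2ban/jail.local.bak-$(date +%F)-add-customer"
# append_ignoreip /etc/fail2ban/jail.local 203.0.113.0/29
# fail2ban-client -t && fail2ban-client reload
```

Running `fail2ban-client -t` before the reload checks that the edited configuration still parses, so a typo does not take the jails down.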
Removing an entry¶
Same flow — edit the line, reload. The change takes effect immediately.
Common operations¶
Check status¶
ssh root@89.116.31.109
fail2ban-client status # list all jails
fail2ban-client status asterisk-auth # counters for strict jail
fail2ban-client status asterisk-scan # counters for lenient jail
fail2ban-client get asterisk-auth ignoreip # current whitelist
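When only the banned addresses are needed (for example to check whether a customer IP is among them), the status output can be filtered. The sketch below assumes the usual `Banned IP list:` label in `fail2ban-client status <jail>` output; `banned_ips` is a hypothetical helper name.

```shell
# banned_ips: read `fail2ban-client status <jail>` output on stdin and
# print the banned addresses one per line.
banned_ips() {
    sed -n 's/.*Banned IP list:[[:space:]]*//p' | tr -s '[:space:]' '\n' | sed '/^$/d'
}

# Usage on PROD:
# fail2ban-client status asterisk-scan | banned_ips
# fail2ban-client status asterisk-auth | banned_ips | grep -Fx '<IP>'
```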
Unban an IP¶
fail2ban-client set asterisk-auth unbanip <IP>
fail2ban-client set asterisk-scan unbanip <IP>
# or — unban an IP from every jail at once:
fail2ban-client unban <IP>
Add a temporary runtime ignoreip (lost on restart)¶
fail2ban-client set asterisk-auth addignoreip <IP>
fail2ban-client set asterisk-scan addignoreip <IP>
# remove:
fail2ban-client set asterisk-auth delignoreip <IP>
fail2ban-client set asterisk-scan delignoreip <IP>
For a permanent entry, edit jail.local instead.
Watch live ban activity¶
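One way to do this is to follow the action log listed in the file layout table and keep only the ban and unban events. The sketch assumes fail2ban's usual log wording (`[jail] Ban <IP>`, `[jail] Unban <IP>`, `[jail] Restore Ban <IP>`); `ban_events` and `watch_bans` are hypothetical helper names.

```shell
# ban_events: keep only Ban / Unban / Restore Ban lines from fail2ban.log
# text on stdin. --line-buffered keeps output flowing when fed by tail.
ban_events() {
    grep --line-buffered -E '\] (Restore )?(Ban|Unban) '
}

# watch_bans: follow the action log live; stop with Ctrl-C.
# tail -F keeps following across log rotation.
watch_bans() {
    tail -F /var/log/fail2ban.log | ban_events
}

# Usage on PROD:
# watch_bans
```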
Test whether a regex matches a log line¶
fail2ban-regex /var/log/asterisk/messages.log /etc/fail2ban/filter.d/asterisk-auth.local
fail2ban-regex /var/log/asterisk/messages.log /etc/fail2ban/filter.d/asterisk-scan.local
Useful when adding a new regex or debugging "why didn't this get banned" / "why did this get banned".
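`fail2ban-regex` also accepts a single log line as its first argument instead of a file, which is handy for checking one suspicious line against one filter. A minimal sketch; `match_line` is a hypothetical wrapper and `<log line>` is a placeholder to be replaced with a line copied from messages.log.

```shell
# match_line FILTER LINE: run one log line through one filter file.
# Returns 2 if the filter file cannot be read; otherwise the exit
# status of fail2ban-regex.
match_line() {
    filter="$1"
    line="$2"
    [ -r "$filter" ] || { echo "filter not readable: $filter" >&2; return 2; }
    fail2ban-regex "$line" "$filter"
}

# Usage on PROD:
# match_line /etc/fail2ban/filter.d/asterisk-scan.local '<log line>'
```

Checking the same line against both filters shows which jail, if either, would count it.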
Tuning guidance¶
If you find legitimate traffic getting banned by asterisk-scan:
- First check whether the customer's CIDR is in ignoreip. If a known production customer is hitting the threshold, whitelist them.
- Don't lower the strict-jail threshold: it's already at 3, and going lower is dangerous.
- Consider raising asterisk-scan maxretry from 50 → 100 if multiple customers have flash crowds. Don't go above 200; at that point the jail is effectively disabled.
If you find a real attacker NOT getting banned fast enough:
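- Check findtime: the scan jail uses 1 h, so a slow scanner doing 10 probes/hour never reaches 50 hits inside the window and evades the ban. Shortening findtime makes that easier, not harder; lengthen it instead (for example to 86400, 24 h) so slow probes accumulate, and make sure known customers are in ignoreip first.
- Add a recidive jail: fail2ban has a built-in [recidive] jail that bans repeat offenders banned by other jails. Worth enabling if scanner abuse rises.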
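A rough way to reason about findtime and maxretry together: fail2ban bans once the number of matches inside one findtime window reaches maxretry. The sketch below estimates that count for a steady prober; it uses integer arithmetic, so treat the numbers as approximate. `hits_in_window` and `would_ban` are hypothetical helper names.

```shell
# hits_in_window RATE_PER_HOUR FINDTIME_SECONDS
# Approximate number of hits a steady prober lands inside one findtime window.
hits_in_window() {
    echo $(( $1 * $2 / 3600 ))
}

# would_ban RATE_PER_HOUR FINDTIME_SECONDS MAXRETRY
# Succeeds when the hits inside the window reach maxretry.
would_ban() {
    [ "$(hits_in_window "$1" "$2")" -ge "$3" ]
}

# A scanner doing 10 probes/hour against maxretry = 50:
for findtime in 600 3600 86400; do
    if would_ban 10 "$findtime" 50; then verdict="ban"; else verdict="no ban"; fi
    printf 'findtime=%s: %s hits, %s\n' "$findtime" "$(hits_in_window 10 "$findtime")" "$verdict"
done
```

With these inputs, only the 86400-second window accumulates enough hits (240) to reach the threshold of 50.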
Rollback to the pre-split single jail¶
If anything goes wrong:
ssh root@89.116.31.109
cp /etc/fail2ban/jail.local.bak-<date>-pre-split /etc/fail2ban/jail.local
rm /etc/fail2ban/filter.d/asterisk-auth.local /etc/fail2ban/filter.d/asterisk-scan.local
fail2ban-client reload
The distro-default [asterisk] jail will resume; it will re-create the false-ban risk for IP-change retry storms, so plan a replacement before rolling back.
History¶
| Date | Change |
|---|---|
| 2026-05-06 | Bumped maxretry from 3 → 10 on the monolithic [asterisk] jail as a stopgap after the PJSIP-reload + fail2ban storm incident. See Error 52. |
| 2026-05-11 | Split into asterisk-auth (strict) + asterisk-scan (lenient) + customer ignoreip whitelist after V7's IP change triggered another false-ban. See Error 55. |
Related¶
- Troubleshooting Error 52 — original PJSIP-reload + fail2ban incident
- Troubleshooting Error 55 — IP-change incident that drove the split-jail design