Fail2Ban Runbook

Operational reference for the fail2ban configuration protecting the PROD cloud Asterisk. The setup is intentionally split so that real attacks get banned fast while legitimate customer retry storms (IP changes, network blips, softphone session starts) cannot trip the ban threshold.

Why two asterisk jails

Asterisk's PJSIP emits "No matching endpoint found" on every legitimate first REGISTER. That is the standard SIP handshake: request without auth, server responds 401, client retries with auth. The same log line is also emitted by username scanners doing brute-force discovery.

A single jail can't tell them apart by log line alone — it can only distinguish by rate. So we split:

| Jail | Catches | maxretry | findtime | bantime |
|---|---|---|---|---|
| `asterisk-auth` | Real auth failures: wrong password, ACL violation, security events, "hacking attempt detected" | 3 | 10 min | 24 h |
| `asterisk-scan` | Endpoint-discovery patterns: "No matching endpoint found", "No matching peer", "Not a local domain", extension-not-found | 50 | 1 h | 1 h |

A real attacker doing username scanning produces hundreds of probes per minute → still banned fast by asterisk-scan. A customer PBX with 10 phones re-registering after an IP change produces ~10–30 such lines in a few seconds → well under 50 in an hour → no ban.
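To see how close each source IP is to the asterisk-scan threshold, the discovery lines can be tallied per IP. The following is a minimal sketch, not part of the deployed setup: the function name `count_scan_hits` is hypothetical, and it assumes the log carries the source as `failed for '<ipv4>:<port>'`. It counts over the whole file rather than a sliding findtime window, so treat the numbers as an upper bound.

```shell
# count_scan_hits LOGFILE
# Prints "<count> <ip>" per source IP, highest count first, for lines
# that contain "No matching endpoint found". Hypothetical helper;
# assumes the source appears as: failed for '<ipv4>:<port>'
count_scan_hits() {
  grep -F 'No matching endpoint found' "$1" \
    | grep -oE "failed for '[0-9]+(\.[0-9]+){3}" \
    | grep -oE '[0-9]+(\.[0-9]+){3}$' \
    | sort | uniq -c | sort -rn \
    | awk '{ print $1, $2 }'
}

# Example: count_scan_hits /var/log/asterisk/messages.log | head
```

An IP showing a few dozen hits is consistent with a re-registering customer PBX; one showing thousands is more likely a scanner.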

File layout on PROD (`147.93.168.216`)

| Path | Contents |
|---|---|
| `/etc/fail2ban/jail.local` | Jail definitions for `asterisk-auth`, `asterisk-scan`, `sshd` plus `[DEFAULT]` ignoreip list |
| `/etc/fail2ban/filter.d/asterisk-auth.local` | Regex set for real auth failures |
| `/etc/fail2ban/filter.d/asterisk-scan.local` | Regex set for endpoint-discovery patterns |
| `/etc/fail2ban/filter.d/asterisk.conf` | Distro-default monolithic filter; disabled, kept on disk for reference |
| `/etc/fail2ban/jail.local.bak-YYYY-MM-DD-*` | Pre-change backups |
| `/var/log/fail2ban.log` | Action log (bans, unbans, reloads) |
| `/var/log/asterisk/messages.log` | Log file the Asterisk jails monitor |
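For orientation, the two Asterisk stanzas in jail.local might look roughly like the fragment printed below. This is a sketch under assumptions, not a copy of the live file: the thresholds and log path come from this runbook, times are written in seconds, and the live stanzas may carry further options (such as the ban action) that are not shown here. The helper name `print_jail_sketch` is hypothetical.

```shell
# print_jail_sketch
# Prints a hypothetical jail.local fragment for the two Asterisk jails.
# Thresholds and log path are taken from this runbook; anything else in
# the live file is unknown and deliberately left out.
print_jail_sketch() {
  cat <<'EOF'
[asterisk-auth]
enabled = true
filter = asterisk-auth
logpath = /var/log/asterisk/messages.log
maxretry = 3
findtime = 600
bantime = 86400

[asterisk-scan]
enabled = true
filter = asterisk-scan
logpath = /var/log/asterisk/messages.log
maxretry = 50
findtime = 3600
bantime = 3600
EOF
}
```

Calling `print_jail_sketch` only writes to stdout; it does not touch anything under /etc/fail2ban.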

Whitelist your own IP before diagnostic probing — bans are whole-IP, all-ports

A ban here DROPs every port for the offending IP (SSH 22, WSS 8089, SIP, everything). Repeated diagnostics (retrying a timing-out SSH session, hammering wss://…:8089 handshake probes, port scans) can ban your own workstation, after which prod looks down or broken. A half-formed ban silently drops upgrade/handshake packets, which leads to a false "WSS is broken" diagnosis.

Before any repeated prod probing, add your current public IP for the session with fail2ban-client set sshd addignoreip <your.ip>, repeated for each jail. If you do get banned, run fail2ban-client unban <your.ip> from a host that is not banned. A sudden all-ports timeout to prod most likely means you are banned, not that prod is down; verify from a second IP first. See AI-agent rule #6 in the repo CLAUDE.md.
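The per-jail whitelisting can be wrapped in a small loop. The following is a minimal sketch to run on PROD; `whitelist_me` is a hypothetical helper, and it assumes the three jails named in this runbook are the only ones active. When no IP is given it falls back to the first field of `$SSH_CLIENT`, which sshd sets to the address the current session came from.

```shell
# whitelist_me [IP]
# Adds IP to the runtime ignoreip of every jail named in this runbook.
# Defaults to the client address of the current SSH session.
# Hypothetical helper; runtime entries are lost on fail2ban restart.
whitelist_me() {
  ip="${1:-${SSH_CLIENT%% *}}"
  if [ -z "$ip" ]; then
    echo "usage: whitelist_me <ip>" >&2
    return 1
  fi
  for jail in sshd asterisk-auth asterisk-scan; do
    fail2ban-client set "$jail" addignoreip "$ip" || return 1
  done
}
```

Run it once at the start of a diagnostic session, before the first probe rather than after the first timeout.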

Customer-PBX whitelist (`ignoreip`)

Customer PBXes are special — they have many endpoints behind one public IP, and when their ISP-assigned IP changes (common in India for non-static-IP customers), all endpoints re-register simultaneously. Even with a 50-strike scan threshold, a one-off catastrophic ban can still happen; the ignoreip list is the belt-and-suspenders.

Current entries (as of 2026-05-11):

ignoreip = 127.0.0.1/8 ::1 59.93.255.0/24 103.197.113.0/24

| CIDR | Customer | Notes |
|---|---|---|
| `127.0.0.0/8`, `::1` | localhost | Always whitelisted |
| `59.93.255.0/24` | VSEVEN HOTELS (V7) | Current uplink |
| `103.197.113.0/24` | VSEVEN HOTELS (V7) | Previous uplink, kept for resilience if their ISP rotates back |

Use /24 (256 IPs), not /16 (65k IPs). /16 ranges are too broad and would whitelist many unrelated hosts on the same ISP, defeating the purpose.
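The address counts follow from the prefix length: an IPv4 /N block covers 2^(32-N) addresses. A one-line helper (the name `cidr_size` is hypothetical) makes the comparison concrete:

```shell
# cidr_size PREFIX
# Prints the number of IPv4 addresses covered by a /PREFIX block.
cidr_size() {
  echo $(( 1 << (32 - $1) ))
}

# cidr_size 24   prints 256
# cidr_size 16   prints 65536
# cidr_size 29   prints 8
```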

Adding a new customer

When onboarding a customer whose PBX has a known public IP / range:

1. Note the customer's public-IP block (ask their IT; a typical SMB has a /29 or /28 from the ISP).

2. SSH to PROD and back up the current config:

ssh root@147.93.168.216
cp /etc/fail2ban/jail.local /etc/fail2ban/jail.local.bak-$(date +%F)

3. Edit /etc/fail2ban/jail.local:
   - Add the CIDR to the ignoreip = … line under [DEFAULT]
   - Add a comment above the line documenting which customer it's for

4. Reload and verify:

fail2ban-client reload
fail2ban-client get asterisk-auth ignoreip   # verify
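Because ignoreip lives under [DEFAULT], every jail should pick up the new entry, and the verify step can be repeated across all of them. The following is a sketch under assumptions: `check_ignoreip` is a hypothetical helper, the jail list is the one from this runbook, and the check relies only on the CIDR string appearing as a whole word in the command's output.

```shell
# check_ignoreip CIDR
# Reports, per jail, whether CIDR appears in the live ignoreip list.
# Returns non-zero if any jail is missing it. Hypothetical helper.
check_ignoreip() {
  cidr="$1"
  rc=0
  for jail in sshd asterisk-auth asterisk-scan; do
    if fail2ban-client get "$jail" ignoreip | grep -qwF -- "$cidr"; then
      echo "$jail: ok"
    else
      echo "$jail: MISSING $cidr"
      rc=1
    fi
  done
  return "$rc"
}

# Example: check_ignoreip 59.93.255.0/24
```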

Removing an entry

Same flow: back up jail.local, edit the ignoreip line, reload. The change takes effect as soon as the reload completes.

Common operations

Check status

ssh root@147.93.168.216

fail2ban-client status                       # list all jails
fail2ban-client status asterisk-auth         # counters for strict jail
fail2ban-client status asterisk-scan         # counters for lenient jail
fail2ban-client get asterisk-auth ignoreip   # current whitelist

Unban an IP

fail2ban-client set asterisk-auth unbanip <IP>
fail2ban-client set asterisk-scan unbanip <IP>
# or — unban an IP from every jail at once:
fail2ban-client unban <IP>

Add a temporary runtime ignoreip (lost on restart)

fail2ban-client set asterisk-auth addignoreip <IP>
fail2ban-client set asterisk-scan addignoreip <IP>
# remove (once per jail it was added to):
fail2ban-client set asterisk-auth delignoreip <IP>
fail2ban-client set asterisk-scan delignoreip <IP>

For a permanent entry, edit jail.local instead.

Watch live ban activity

tail -f /var/log/fail2ban.log

Test whether a regex matches a log line

fail2ban-regex /var/log/asterisk/messages.log /etc/fail2ban/filter.d/asterisk-auth.local
fail2ban-regex /var/log/asterisk/messages.log /etc/fail2ban/filter.d/asterisk-scan.local

Useful when adding a new regex or debugging "why didn't this get banned" / "why did this get banned".
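fail2ban-regex also accepts a single log line as a string in place of a file path, which is quicker when the question is about one specific line. A minimal sketch that runs one line through both split filters (`test_log_line` is a hypothetical helper name):

```shell
# test_log_line 'LOG LINE'
# Runs one log line through both split filters and prints each result.
# Hypothetical helper; paste the line exactly as it appears in the log.
test_log_line() {
  for f in asterisk-auth asterisk-scan; do
    echo "== $f =="
    fail2ban-regex "$1" "/etc/fail2ban/filter.d/$f.local"
  done
}
```

A line that matches in neither filter explains a missing ban; a line that matches in asterisk-auth when it was expected in asterisk-scan explains an unexpectedly fast one.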

Tuning guidance

If you find legitimate traffic getting banned by asterisk-scan:

- First check whether the customer's CIDR is in ignoreip. If a known production customer is hitting the threshold, whitelist them.
- Don't lower the strict-jail (asterisk-auth) threshold. It's already at 3, and going lower is dangerous.
- Consider raising asterisk-scan maxretry from 50 → 100 if multiple customers have flash crowds. Don't go above 200; at that point the jail is effectively disabled.

If you find a real attacker NOT getting banned fast enough:

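- Check findtime. The scan jail counts hits over 1 h (3600), so a slow scanner doing 10 probes/hour never reaches maxretry 50 and evades it. Shortening the window would not help, because the same 50 hits would then have to arrive faster. Lengthen findtime (for example to 86400, 24 h) or lower maxretry instead, and keep customer CIDRs in ignoreip so the wider window doesn't catch their re-registration bursts.
- Add a recidive jail. fail2ban ships a built-in [recidive] jail that bans repeat offenders already banned by other jails. Worth enabling if scanner abuse rises.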
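When weighing a maxretry/findtime pair, it helps to compute roughly what sustained probe rate is needed to trip the jail. A minimal sketch of that arithmetic (`min_rate_per_hour` is a hypothetical helper; the result is rounded down and ignores window edge effects):

```shell
# min_rate_per_hour MAXRETRY FINDTIME_SECONDS
# Prints the approximate sustained hit rate (per hour, rounded down)
# needed to accumulate MAXRETRY hits inside one FINDTIME window.
min_rate_per_hour() {
  echo $(( $1 * 3600 / $2 ))
}

# min_rate_per_hour 50 3600    prints 50
# min_rate_per_hour 50 600     prints 300
# min_rate_per_hour 50 86400   prints 2
```

A source probing below the printed rate never accumulates maxretry hits inside the window, so for a fixed maxretry a longer window catches slower scanners.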

Rollback to the pre-split single jail

If anything goes wrong:

ssh root@147.93.168.216
cp /etc/fail2ban/jail.local.bak-<date>-pre-split /etc/fail2ban/jail.local
rm /etc/fail2ban/filter.d/asterisk-auth.local /etc/fail2ban/filter.d/asterisk-scan.local
fail2ban-client reload

The distro-default [asterisk] jail will resume; it will re-create the false-ban risk for IP-change retry storms, so plan a replacement before rolling back.
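After restoring a backup, or after any jail.local edit, the configuration can be checked before it is loaded. The following is a sketch, assuming a fail2ban release whose client supports the `-t` (test configuration) option, which is the case from 0.10 onward; `safe_reload` is a hypothetical helper name.

```shell
# safe_reload
# Reloads fail2ban only if the on-disk configuration passes the
# client's built-in test. Hypothetical helper.
safe_reload() {
  if fail2ban-client -t; then
    fail2ban-client reload
  else
    echo "config test failed; not reloading" >&2
    return 1
  fi
}
```

If the test fails, the running jails keep their previous configuration, which leaves time to fix the file or restore a different backup.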

History

| Date | Change |
|---|---|
| 2026-05-06 | Bumped `maxretry` from 3 → 10 on the monolithic `[asterisk]` jail as a stopgap after the PJSIP-reload + fail2ban storm incident. See Error 52. |
| 2026-05-11 | Split into `asterisk-auth` (strict) + `asterisk-scan` (lenient) + customer `ignoreip` whitelist after V7's IP change triggered another false ban. See Error 55. |

Troubleshooting Error 52 — original PJSIP-reload + fail2ban incident
Troubleshooting Error 55 — IP-change incident that drove the split-jail design