This script monitors your pfSense services and automatically takes corrective actions when issues are detected:
- WAN Gateway: Monitors connectivity and restarts interface if needed
- Unbound DNS: Detects hung processes and forces reboot if unrecoverable
Create `pfsense_monitor_config.json` in the same directory as your Python script:
```json
{
  "email": {
    "smtp_server": "smtp.gmail.com",
    "smtp_port": 587,
    "sender_email": "your-monitoring-email@gmail.com",
    "sender_password": "your-gmail-app-password",
    "recipient_email": "admin@yourcompany.com"
  },
  "network": {
    "wan_interface": "em0",
    "gateway_override": "",
    "timeout_seconds": 5,
    "retry_attempts": 3,
    "retry_delay": 10,
    "ping_count": 3
  },
  "unbound": {
    "enabled": true,
    "process_name": "unbound",
    "bad_states": ["SBWAIT", "STOP"],
    "restart_attempts": 2,
    "restart_timeout": 30,
    "force_reboot_on_failure": true
  },
  "watchdog": {
    "enabled": false,
    "use_as_last_resort": true,
    "timeout_seconds": 60
  },
  "logging": {
    "log_file": "/var/log/pfsense_monitor.log",
    "max_log_size_mb": 10
  },
  "actions": {
    "restart_interface_delay": 30,
    "reboot_delay": 60
  }
}
```
- Enable 2-factor authentication on your Gmail account
- Generate an App Password:
  - Go to Google Account settings
  - Security → App passwords
  - Generate password for "Mail"
- Use this password in the config file (not your regular Gmail password)
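Before wiring the App Password into the monitor, it can help to confirm that it works on its own. The following is a minimal sketch, not the monitor's actual code: `build_test_message` and `send_test_message` are hypothetical helper names, and the dictionary keys are assumed to match the `email` section of the config file.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender: str, recipient: str) -> EmailMessage:
    """Build a plain-text test alert; no network access happens here."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "pfSense monitor test"
    msg.set_content("Test message sent with a Gmail App Password.")
    return msg


def send_test_message(email_cfg: dict) -> None:
    """Send the test message using the "email" section of the config."""
    msg = build_test_message(email_cfg["sender_email"], email_cfg["recipient_email"])
    with smtplib.SMTP(email_cfg["smtp_server"], email_cfg["smtp_port"], timeout=30) as smtp:
        smtp.starttls()  # port 587 expects STARTTLS before login
        smtp.login(email_cfg["sender_email"], email_cfg["sender_password"])
        smtp.send_message(msg)
```

If `send_test_message` raises `smtplib.SMTPAuthenticationError`, the App Password (or the sender address) is the likely culprit.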
Run these commands on pfSense to identify your WAN interface and gateway:
Find WAN interface:
```bash
ifconfig
```
Common WAN interface names:
- `em0`, `em1` (Intel)
- `igb0`, `igb1` (Intel Gigabit)
- `re0`, `re1` (Realtek)
- `bge0`, `bge1` (Broadcom)
- `vtnet0`, `vtnet1` (VirtIO - virtual environments)

Find default gateway:
```bash
# Method 1: netstat
netstat -rn | grep default
# Method 2: route command (FreeBSD/pfSense)
route -n get default
# Method 3: Check pfSense GUI
# Navigate to Status > Gateways in pfSense web interface
```
The script will auto-detect the gateway, but you can override it in the config if needed.
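To illustrate how auto-detection with an override might work, here is a rough sketch that parses `netstat -rn` output and prefers `gateway_override` when it is set. The function names are hypothetical and the parsing is an assumption about the output layout (destination in the first column, gateway in the second); the actual script may detect the gateway differently.

```python
import ipaddress
from typing import Optional


def parse_default_gateway(netstat_output: str) -> Optional[str]:
    """Return the IPv4 gateway of the first 'default' route, or None."""
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "default":
            try:
                ipaddress.IPv4Address(fields[1])
            except ValueError:
                continue  # skip IPv6 or link-level default routes
            return fields[1]
    return None


def choose_gateway(network_cfg: dict, netstat_output: str) -> Optional[str]:
    """Prefer a non-empty gateway_override, else fall back to detection."""
    override = network_cfg.get("gateway_override", "")
    return override or parse_default_gateway(netstat_output)
```

Leaving `gateway_override` as an empty string keeps auto-detection active, which matches the default in the config file.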
- Create a dedicated directory:
  ```bash
  mkdir -p /usr/local/bin/pfsense_monitor
  cd /usr/local/bin/pfsense_monitor
  ```
- Copy the script and config:
  ```bash
  # Upload via SCP or copy directly
  scp pfsense_monitor.py root@your-pfsense-ip:/usr/local/bin/pfsense_monitor/
  scp pfsense_monitor_config.json root@your-pfsense-ip:/usr/local/bin/pfsense_monitor/
  ```
- Make the script executable:
  ```bash
  chmod +x /usr/local/bin/pfsense_monitor/pfsense_monitor.py
  ```
- Test the script manually:
  ```bash
  cd /usr/local/bin/pfsense_monitor
  python3 pfsense_monitor.py
  ```
  Or test from any directory:
  ```bash
  python3 /usr/local/bin/pfsense_monitor/pfsense_monitor.py
  ```
Add to root's crontab to run every 5 minutes:
```bash
crontab -e
```
Add this line:
```bash
# Check pfSense services every 5 minutes
*/5 * * * * /usr/local/bin/python3 /usr/local/bin/pfsense_monitor/pfsense_monitor.py >> /var/log/pfsense_monitor_cron.log 2>&1
```
Alternative intervals:
```bash
# Every minute (aggressive monitoring)
* * * * * /usr/local/bin/python3 /usr/local/bin/pfsense_monitor/pfsense_monitor.py
# Every 10 minutes (conservative)
*/10 * * * * /usr/local/bin/python3 /usr/local/bin/pfsense_monitor/pfsense_monitor.py
# Every 2 minutes during business hours only
*/2 8-17 * * 1-5 /usr/local/bin/python3 /usr/local/bin/pfsense_monitor/pfsense_monitor.py
```
- smtp_server/smtp_port: Gmail SMTP settings (don't change unless using different provider)
- sender_email: The Gmail account sending alerts
- sender_password: Gmail app password (not regular password)
- recipient_email: Where to send alerts
- fallback_dns_servers: Public DNS servers to use when local DNS (Unbound) is down (default: `["8.8.8.8", "1.1.1.1"]`)
- smtp_server_fallback_ips: Hardcoded Gmail SMTP IPs used as last resort when all DNS fails
- wan_interface: WAN network interface name (find with `ifconfig`)
- gateway_override: Manual gateway IP (leave empty for auto-detection)
- timeout_seconds: How long to wait for ping responses
- retry_attempts: How many times to retry before taking action
- retry_delay: Seconds between retry attempts
- ping_count: Number of ping packets to send per test
- enabled: Enable/disable unbound service monitoring (true/false)
- process_name: Name of the unbound process (default: "unbound")
- bad_states: List of process states that indicate hung process (default: ["SBWAIT", "STOP"])
- restart_attempts: Number of times to attempt restarting unbound before forcing reboot
- restart_timeout: Seconds to wait after restart to verify recovery
- force_reboot_on_failure: If true, forces system reboot when unbound cannot be restarted
- cpu_monitoring: CPU usage monitoring settings
  - enabled: Enable/disable CPU usage monitoring (default: true)
  - threshold_percent: CPU percentage threshold that triggers restart (default: 80.0)
  - sample_count: Number of CPU samples to collect for sustained check (default: 3)
  - sample_interval_seconds: Seconds between each CPU sample (default: 5)
- dns_responsiveness: DNS query responsiveness testing (most reliable detection method)
  - enabled: Enable/disable DNS responsiveness testing (default: true)
  - test_domains: List of domains to query via localhost (default: ["google.com", "cloudflare.com"])
  - timeout_seconds: How long to wait for each DNS query (default: 5)
  - require_all: If true, all domains must resolve; if false, at least one must (default: false)
CPU Monitoring Behavior: The monitor collects multiple CPU samples over time to detect sustained high CPU usage. If at least half of the samples (>=50%) exceed the threshold, the service is restarted. This filters out transient CPU spikes and triggers recovery only for persistent high CPU conditions. Total CPU check time is approximately sample_count × sample_interval_seconds (default: 15 seconds).
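The sustained-CPU rule can be expressed in a few lines. This is a sketch of the decision only, assuming the ">=50% of samples" reading of the rule; `sustained_high_cpu` is a hypothetical name, and how the samples are collected is left out.

```python
from typing import Sequence


def sustained_high_cpu(samples: Sequence[float], threshold_percent: float = 80.0) -> bool:
    """Return True when at least half of the samples exceed the threshold."""
    if not samples:
        return False  # no data: do not trigger a restart
    over = sum(1 for s in samples if s > threshold_percent)
    return over * 2 >= len(samples)
```

With the default `sample_count` of 3, two samples above the threshold are enough to trigger a restart, while a single spike is ignored.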
DNS Responsiveness Behavior: This is the most reliable detection method as it tests whether DNS actually responds to queries. It catches kernel-level deadlocks (e.g., pf lock contention) where the process appears "running" but is stuck in a kernel lock and cannot respond. Uses drill or dig commands to query localhost DNS.
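A responsiveness check along these lines could look like the sketch below. It is an illustration under assumptions, not the script's actual implementation: `query_localhost` and `dns_is_responsive` are hypothetical names, and treating a `NOERROR` status in the tool output as success is an assumption about how a healthy reply is recognized.

```python
import shutil
import subprocess
from typing import Callable, Iterable


def query_localhost(domain: str, timeout_seconds: int = 5) -> bool:
    """Query 127.0.0.1 with drill (or dig); True if a NOERROR reply arrives."""
    tool = shutil.which("drill") or shutil.which("dig")
    if tool is None:
        return False
    try:
        proc = subprocess.run(
            [tool, domain, "@127.0.0.1"],
            capture_output=True, text=True, timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return False  # a hung resolver typically shows up as a timeout
    return proc.returncode == 0 and "NOERROR" in proc.stdout


def dns_is_responsive(
    domains: Iterable[str],
    require_all: bool = False,
    timeout_seconds: int = 5,
    query: Callable[[str, int], bool] = query_localhost,
) -> bool:
    """Apply the require_all rule to the per-domain query results."""
    results = [query(d, timeout_seconds) for d in domains]
    if not results:
        return False
    return all(results) if require_all else any(results)
```

Passing the query function as a parameter keeps the `require_all` logic testable without a running resolver.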
- enabled: Enable/disable hardware watchdog support (default: false)
- use_as_last_resort: Use watchdog as final fallback when all software reboot methods fail
- timeout_seconds: How long to wait for hardware watchdog to trigger reboot (typically 60 seconds)
Note: Watchdog timer requires hardware support and must be enabled in pfSense:
- Navigate to: System > Advanced > Miscellaneous > Watchdog
- Or enable via shell: `service watchdogd enable && service watchdogd start`
- restart_interface_delay: How long to wait after restarting interface
- reboot_delay: How long to wait before rebooting system
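Several options above have documented defaults but do not appear in the sample config file. One way to handle that is to merge the user's file over a table of defaults, as in this sketch. `DEFAULTS`, `merge_defaults`, and `load_config` are hypothetical names, and placing `cpu_monitoring` and `dns_responsiveness` under the `unbound` section is an assumption based on the option list.

```python
import json
from copy import deepcopy

# Documented defaults; the nesting under "unbound" is an assumption.
DEFAULTS = {
    "unbound": {
        "process_name": "unbound",
        "bad_states": ["SBWAIT", "STOP"],
        "cpu_monitoring": {
            "enabled": True,
            "threshold_percent": 80.0,
            "sample_count": 3,
            "sample_interval_seconds": 5,
        },
        "dns_responsiveness": {
            "enabled": True,
            "test_domains": ["google.com", "cloudflare.com"],
            "timeout_seconds": 5,
            "require_all": False,
        },
    },
    "watchdog": {"enabled": False},
}


def merge_defaults(defaults: dict, overrides: dict) -> dict:
    """Recursively overlay overrides on a copy of defaults."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_defaults(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def load_config(path: str) -> dict:
    """Read the JSON config file and fill in any missing defaults."""
    with open(path, encoding="utf-8") as fh:
        return merge_defaults(DEFAULTS, json.load(fh))
```

With this approach, a config file that sets only `threshold_percent` still ends up with the default `sample_count` and `sample_interval_seconds`.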
- Permission denied
  - Make sure script runs as root
  - Check file permissions: `chmod +x pfsense_monitor.py`
- Email not sending
  - Verify Gmail app password (not regular password)
  - Check 2FA is enabled on Gmail account
  - Test with a simple Python email script first
- Interface not found
  - Run `ifconfig` to see available interfaces
  - Update `wan_interface` in config file
  - Check if interface is actually the WAN interface
- Gateway detection issues
  - Check if default route exists: `netstat -rn | grep default`
  - Manually specify gateway in config: set `gateway_override` to your gateway IP
  - Verify WAN interface is correct: run `ifconfig` and check which has external IP
- Script not running from cron
  - Check cron logs: `tail -f /var/log/cron`
  - Verify Python path: `which python3`
  - Add full paths to crontab entry
- Main log: `/var/log/pfsense_monitor.log`
- Cron output: `/var/log/pfsense_monitor_cron.log`
- System cron log: `/var/log/cron`
```bash
# Test gateway detection and connectivity
# (run from /usr/local/bin/pfsense_monitor so the import resolves)
python3 -c "
from pfsense_monitor import PfSenseMonitor
m = PfSenseMonitor()
try:
    gateway = m.get_wan_gateway()
    print(f'Detected gateway: {gateway}')
    result = m.test_gateway_connectivity()
    print(f'Gateway reachable: {result}')
except Exception as e:
    print(f'Error: {e}')
"
# Test email sending
python3 -c "
from pfsense_monitor import PfSenseMonitor
m = PfSenseMonitor()
m.send_email_alert('Test Alert', 'This is a test email from the pfSense monitor')
"
# View WAN interface details
ifconfig em0 # replace em0 with your WAN interface
# Check current gateway
netstat -rn | grep default
# Test manual ping to gateway
ping -c 3 [your-gateway-ip]
# Check if interface can be controlled (this briefly drops the WAN link)
ifconfig em0 down && sleep 2 && ifconfig em0 up
```
- Protect the config file:
  ```bash
  chmod 600 pfsense_monitor_config.json
  ```
- Use a dedicated Gmail account for monitoring alerts
- Consider using firewall rules to restrict which hosts the script can ping
- Monitor the log files for suspicious activity
- Rotate Gmail app passwords periodically
- Gateway Detection: Auto-detects WAN gateway using multiple methods (netstat, route command)
- Connectivity Test: Pings gateway with configurable packet count
- Retry Logic: Retries multiple times with delays before taking action
- Interface Restart: Brings WAN interface down/up if connectivity fails
- Verification: Re-tests connectivity after interface restart
- System Reboot: Last resort if interface restart doesn't restore connectivity
- Email Alerts: Sends notifications at each step for monitoring
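The WAN recovery sequence above can be sketched as a small escalation routine. This is an outline under assumptions rather than the script's real code: `check_with_retries` and `recover_wan` are hypothetical names, the callables stand in for the ping test, interface restart, reboot, and email alert, and `reboot_delay` is omitted for brevity.

```python
import time
from typing import Callable


def check_with_retries(check: Callable[[], bool], retry_attempts: int = 3,
                       retry_delay: float = 10,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Run check up to retry_attempts times, pausing between failures."""
    for attempt in range(retry_attempts):
        if check():
            return True
        if attempt < retry_attempts - 1:
            sleep(retry_delay)
    return False


def recover_wan(check: Callable[[], bool], restart_interface: Callable[[], None],
                reboot: Callable[[], None], alert: Callable[[str], None],
                retry_attempts: int = 3, retry_delay: float = 10,
                restart_interface_delay: float = 30,
                sleep: Callable[[float], None] = time.sleep) -> str:
    """Escalate from retries to interface restart to reboot."""
    if check_with_retries(check, retry_attempts, retry_delay, sleep):
        return "healthy"
    alert("Gateway unreachable, restarting WAN interface")
    restart_interface()
    sleep(restart_interface_delay)  # give the link time to come back
    if check_with_retries(check, retry_attempts, retry_delay, sleep):
        alert("Connectivity restored after interface restart")
        return "recovered"
    alert("Interface restart did not help, requesting reboot")
    reboot()
    return "reboot_requested"
```

Each step runs only after the cheaper one before it has failed, so a single dropped ping never leads directly to an interface restart or reboot.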
- Process State Detection: Checks Unbound process states via `ps aux` output
- CPU Monitoring: Optionally monitors sustained high CPU usage over multiple samples
- Bad State Detection: Identifies hung processes (SBWAIT/STOP states)
- DNS Responsiveness Testing: Actually queries DNS to verify service responds (catches kernel deadlocks)
- Service Restart: Attempts to restart Unbound service when issues detected
- Verification: Re-checks all health indicators after restart
- System Reboot: Forces reboot if service cannot be recovered (when enabled)
- Email Alerts: Sends notifications for all detection and recovery actions
Note on Kernel Deadlocks: Some Unbound failures involve kernel-level lock contention (e.g., in pf packet filter) where the process appears "running" with high CPU but is actually stuck spinning on a lock. The DNS responsiveness check catches these cases by testing if DNS actually responds to queries, regardless of what the process state looks like.
When a reboot is necessary, the script attempts multiple methods in sequence:
- Standard reboot commands (`shutdown -r now`, `reboot`)
- Direct reboot utilities (`/sbin/reboot` with various flags)
- Aggressive kernel-level reboot (`sysctl kern.reboot`)
- Hardware watchdog trigger (if enabled, as last resort)
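The "try each method in sequence" idea can be sketched as follows. This is illustrative only: `REBOOT_COMMANDS`, `run_command`, and `attempt_reboot` are hypothetical names, the list contains only the commands named above without the unspecified flags, and the kernel-level and watchdog steps are left out.

```python
import subprocess
from typing import Callable, List, Sequence

# Commands named in this README; extra flags are not specified here.
REBOOT_COMMANDS: List[List[str]] = [
    ["shutdown", "-r", "now"],
    ["reboot"],
    ["/sbin/reboot"],
]


def run_command(cmd: Sequence[str], timeout: int = 30) -> bool:
    """Run one command; True if it exited with status 0."""
    try:
        return subprocess.run(list(cmd), timeout=timeout).returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        return False


def attempt_reboot(commands: Sequence[Sequence[str]] = REBOOT_COMMANDS,
                   run: Callable[[Sequence[str]], bool] = run_command) -> List[List[str]]:
    """Try each command in order, stopping at the first that succeeds.

    Returns the list of commands that were attempted.
    """
    attempted: List[List[str]] = []
    for cmd in commands:
        attempted.append(list(cmd))
        if run(cmd):
            break
    return attempted
```

Injecting `run` makes it possible to exercise the fallback order with a stub instead of actually rebooting the firewall.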
The script is designed to be conservative - it won't take drastic actions unless failures are genuinely detected and simpler recovery methods have failed.