Spoonman1091/pfsense_monitor

pfSense Monitor Setup Guide

Overview

This script monitors your pfSense services and automatically takes corrective actions when issues are detected:

  • WAN Gateway: Pings the gateway and restarts the WAN interface when connectivity is lost
  • Unbound DNS: Detects a hung Unbound process, restarts the service, and forces a reboot if it cannot be recovered

Configuration File (pfsense_monitor_config.json)

Create this file in the same directory as your Python script:

json

{
    "email": {
        "smtp_server": "smtp.gmail.com",
        "smtp_port": 587,
        "sender_email": "your-monitoring-email@gmail.com",
        "sender_password": "your-gmail-app-password",
        "recipient_email": "admin@yourcompany.com"
    },
    "network": {
        "wan_interface": "em0",
        "gateway_override": "",
        "timeout_seconds": 5,
        "retry_attempts": 3,
        "retry_delay": 10,
        "ping_count": 3
    },
    "unbound": {
        "enabled": true,
        "process_name": "unbound",
        "bad_states": ["SBWAIT", "STOP"],
        "restart_attempts": 2,
        "restart_timeout": 30,
        "force_reboot_on_failure": true
    },
    "watchdog": {
        "enabled": false,
        "use_as_last_resort": true,
        "timeout_seconds": 60
    },
    "logging": {
        "log_file": "/var/log/pfsense_monitor.log",
        "max_log_size_mb": 10
    },
    "actions": {
        "restart_interface_delay": 30,
        "reboot_delay": 60
    }
}
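
As a rough illustration of how a script might consume this file, the sketch below parses the JSON and fails early when a top-level section is missing. The function name `load_config` and the decision to treat all six sections as required are assumptions made for this example, not taken from the actual script.

```python
import json

# Section names come from the sample config above; treating every one of
# them as required is an assumption of this sketch.
REQUIRED_SECTIONS = ("email", "network", "unbound", "watchdog", "logging", "actions")


def load_config(path="pfsense_monitor_config.json"):
    """Parse the JSON config and fail early if a top-level section is missing."""
    with open(path, encoding="utf-8") as fh:
        config = json.load(fh)
    missing = [name for name in REQUIRED_SECTIONS if name not in config]
    if missing:
        raise ValueError("config is missing sections: " + ", ".join(missing))
    return config
```

Failing at load time keeps a typo in the config from surfacing later as a confusing error in the middle of a recovery action.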

Setup Instructions

1. Gmail App Password Setup

  1. Enable 2-factor authentication on your Gmail account
  2. Generate an App Password:
    • Go to Google Account settings
    • Security → App passwords
    • Generate password for "Mail"
    • Use this password in the config file (not your regular Gmail password)

2. Find Your WAN Interface and Gateway

Run these commands on pfSense to identify your WAN interface and gateway:

Find WAN interface:

bash

ifconfig

Common WAN interface names:

  • em0, em1 (Intel)
  • igb0, igb1 (Intel Gigabit)
  • re0, re1 (Realtek)
  • bge0, bge1 (Broadcom)
  • vtnet0, vtnet1 (VirtIO - virtual environments)

Find default gateway:

bash

# Method 1: netstat
netstat -rn | grep default

# Method 2: route command (FreeBSD/pfSense)
route -n get default

# Method 3: Check pfSense GUI
# Navigate to Status > Gateways in pfSense web interface

The script auto-detects the gateway, but you can override it by setting gateway_override in the config if needed.
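
To make the auto-detection idea concrete, here is a minimal sketch that extracts the IPv4 default gateway from `netstat -rn` output and honors an override. The function name `parse_default_gateway` and the column layout it assumes (destination first, gateway second) are assumptions for illustration; the real script may parse the output differently.

```python
import ipaddress


def parse_default_gateway(netstat_output, override=""):
    """Return the override if set, else the first IPv4 default-route gateway."""
    if override:
        return override
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "default":
            try:
                ipaddress.IPv4Address(fields[1])
            except ValueError:
                continue  # skip IPv6 default routes and link#N entries
            return fields[1]
    return None  # no usable default route found
```

Returning `None` instead of raising lets the caller decide whether a missing default route is itself a failure worth alerting on.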

3. Installation on pfSense

  1. Create a dedicated directory:

    bash

    mkdir -p /usr/local/bin/pfsense_monitor
    cd /usr/local/bin/pfsense_monitor
  2. Copy the script and config:

    bash

    # From your workstation: upload via SCP (or copy the files over another way)
    scp pfsense_monitor.py root@your-pfsense-ip:/usr/local/bin/pfsense_monitor/
    scp pfsense_monitor_config.json root@your-pfsense-ip:/usr/local/bin/pfsense_monitor/
  3. Make the script executable:

    bash

    chmod +x /usr/local/bin/pfsense_monitor/pfsense_monitor.py
  4. Test the script manually:

    bash

    cd /usr/local/bin/pfsense_monitor
    python3 pfsense_monitor.py

    Or test from any directory:

    bash

    python3 /usr/local/bin/pfsense_monitor/pfsense_monitor.py

4. Cron Job Setup

Add an entry to root's crontab to run the script every 5 minutes:

bash

crontab -e

Add this line:

bash

# Check pfSense services every 5 minutes
*/5 * * * * /usr/local/bin/python3 /usr/local/bin/pfsense_monitor/pfsense_monitor.py >> /var/log/pfsense_monitor_cron.log 2>&1

Alternative intervals:

bash

# Every minute (aggressive monitoring)
* * * * * /usr/local/bin/python3 /usr/local/bin/pfsense_monitor/pfsense_monitor.py

# Every 10 minutes (conservative)
*/10 * * * * /usr/local/bin/python3 /usr/local/bin/pfsense_monitor/pfsense_monitor.py

# Every 2 minutes during business hours only
*/2 8-17 * * 1-5 /usr/local/bin/python3 /usr/local/bin/pfsense_monitor/pfsense_monitor.py

Configuration Options Explained

Email Settings

  • smtp_server/smtp_port: Gmail SMTP settings (change only if you use a different provider)
  • sender_email: The Gmail account sending alerts
  • sender_password: Gmail app password (not regular password)
  • recipient_email: Where to send alerts
  • fallback_dns_servers: Public DNS servers to use when local DNS (Unbound) is down (default: ["8.8.8.8", "1.1.1.1"])
  • smtp_server_fallback_ips: Hardcoded Gmail SMTP IPs used as last resort when all DNS fails

Network Settings

  • wan_interface: WAN network interface name (find with ifconfig)
  • gateway_override: Manual gateway IP (leave empty for auto-detection)
  • timeout_seconds: How long to wait for ping responses
  • retry_attempts: How many times to retry before taking action
  • retry_delay: Seconds between retry attempts
  • ping_count: Number of ping packets to send per test
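
The interplay of `retry_attempts`, `retry_delay`, `ping_count`, and `timeout_seconds` can be sketched roughly as follows. The helper names `ping_gateway` and `check_with_retries` are hypothetical, and deriving one overall subprocess timeout from `timeout_seconds × ping_count` is an assumption of this sketch rather than documented behavior.

```python
import subprocess
import time


def ping_gateway(gateway, ping_count=3, timeout_seconds=5):
    """Return True when `ping -c <count> <gateway>` exits with status 0."""
    try:
        result = subprocess.run(
            ["ping", "-c", str(ping_count), gateway],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            # Assumption: budget timeout_seconds per packet, plus a margin.
            timeout=timeout_seconds * ping_count + 1,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def check_with_retries(probe, retry_attempts=3, retry_delay=10, sleep=time.sleep):
    """Call probe() up to retry_attempts times; True as soon as one succeeds."""
    for attempt in range(1, retry_attempts + 1):
        if probe():
            return True
        if attempt < retry_attempts:
            sleep(retry_delay)  # pause between attempts, not after the last one
    return False
```

A call such as `check_with_retries(lambda: ping_gateway("192.0.2.1"))` would then only report failure after every attempt has failed, which is what keeps a single dropped ping from triggering an interface restart.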

Unbound DNS Monitoring Settings

  • enabled: Enable/disable unbound service monitoring (true/false)
  • process_name: Name of the unbound process (default: "unbound")
  • bad_states: List of process states that indicate hung process (default: ["SBWAIT", "STOP"])
  • restart_attempts: Number of times to attempt restarting unbound before forcing reboot
  • restart_timeout: Seconds to wait after restart to verify recovery
  • force_reboot_on_failure: If true, forces system reboot when unbound cannot be restarted
  • cpu_monitoring: CPU usage monitoring settings
    • enabled: Enable/disable CPU usage monitoring (default: true)
    • threshold_percent: CPU percentage threshold that triggers restart (default: 80.0)
    • sample_count: Number of CPU samples to collect for sustained check (default: 3)
    • sample_interval_seconds: Seconds between each CPU sample (default: 5)
  • dns_responsiveness: DNS query responsiveness testing (most reliable detection method)
    • enabled: Enable/disable DNS responsiveness testing (default: true)
    • test_domains: List of domains to query via localhost (default: ["google.com", "cloudflare.com"])
    • timeout_seconds: How long to wait for each DNS query (default: 5)
    • require_all: If true, all domains must resolve; if false, at least one must (default: false)

CPU Monitoring Behavior: The monitor collects several CPU samples over time to detect sustained high CPU usage. If at least half of the samples (>=50%) exceed the threshold, the service is restarted. This filters out transient CPU spikes and triggers recovery only for persistent high-CPU conditions. Total CPU check time is approximately sample_count × sample_interval_seconds (default: 3 × 5 = 15 seconds).
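
The sustained-usage rule can be expressed in a few lines. This sketch reads the rule as "at least 50% of the samples are above the threshold"; the function name `is_sustained_high_cpu` is hypothetical, and how the script actually collects samples is not shown here.

```python
def is_sustained_high_cpu(samples, threshold_percent=80.0):
    """True when at least half of the CPU samples exceed the threshold."""
    if not samples:
        return False  # no data is treated as "not high" in this sketch
    over = sum(1 for value in samples if value > threshold_percent)
    return over * 2 >= len(samples)
```

With the default three samples, a single spike (one sample over the threshold) does not trigger a restart, while two or more do.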

DNS Responsiveness Behavior: This is the most reliable detection method because it tests whether DNS actually answers queries. It catches kernel-level deadlocks (e.g., pf lock contention) where the process appears "running" but is stuck in a kernel lock and cannot respond. The check uses the drill or dig command to query the resolver on localhost.
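
A responsiveness check along these lines might look like the sketch below. The function names are hypothetical, and treating an exit status of 0 together with `NOERROR` in the output as success is an assumption of this example; the actual script may judge the result differently.

```python
import shutil
import subprocess


def build_query_command(domain, server="127.0.0.1"):
    """Prefer drill, fall back to dig; None if neither is installed."""
    for tool in ("drill", "dig"):
        path = shutil.which(tool)
        if path:
            return [path, "@" + server, domain]
    return None


def query_ok(domain, timeout_seconds=5):
    """Query one domain via localhost; a hang counts as a failure."""
    command = build_query_command(domain)
    if command is None:
        return False
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_seconds
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    # Assumption: a healthy answer carries the NOERROR response code.
    return result.returncode == 0 and "NOERROR" in result.stdout


def dns_is_responsive(results, require_all=False):
    """results maps domain -> bool; apply the require_all rule."""
    if not results:
        return False
    outcomes = list(results.values())
    return all(outcomes) if require_all else any(outcomes)
```

For example, `dns_is_responsive({d: query_ok(d) for d in ["google.com", "cloudflare.com"]})` passes as long as one domain resolves, which matches the documented default of `require_all: false`.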

Watchdog Timer Settings

  • enabled: Enable/disable hardware watchdog support (default: false)
  • use_as_last_resort: Use watchdog as final fallback when all software reboot methods fail
  • timeout_seconds: How long to wait for hardware watchdog to trigger reboot (typically 60 seconds)

Note: The watchdog timer requires hardware support and must be enabled in pfSense:

  • Navigate to: System > Advanced > Miscellaneous > Watchdog
  • Or enable via shell: service watchdogd enable && service watchdogd start

Action Settings

  • restart_interface_delay: How long to wait after restarting interface
  • reboot_delay: How long to wait before rebooting system

Troubleshooting

Common Issues:

  1. Permission denied
    • Make sure the script runs as root
    • Check file permissions: chmod +x pfsense_monitor.py
  2. Email not sending
    • Verify Gmail app password (not regular password)
    • Check 2FA is enabled on Gmail account
    • Test with a simple Python email script first
  3. Interface not found
    • Run ifconfig to see available interfaces
    • Update wan_interface in config file
    • Check if interface is actually the WAN interface
  4. Gateway detection issues
    • Check if default route exists: netstat -rn | grep default
    • Manually specify gateway in config: set gateway_override to your gateway IP
    • Verify WAN interface is correct: ifconfig and check which has external IP
  5. Script not running from cron
    • Check cron logs: tail -f /var/log/cron
    • Verify Python path: which python3
    • Add full paths to crontab entry

Log Files:

  • Main log: /var/log/pfsense_monitor.log
  • Cron output: /var/log/pfsense_monitor_cron.log
  • System cron log: /var/log/cron

Testing Individual Components:

bash

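# Run the Python tests from the script directory so the import resolves
cd /usr/local/bin/pfsense_monitor

# Test gateway detection and connectivity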
python3 -c "
from pfsense_monitor import PfSenseMonitor
m = PfSenseMonitor()
try:
    gateway = m.get_wan_gateway()
    print(f'Detected gateway: {gateway}')
    result = m.test_gateway_connectivity()
    print(f'Gateway reachable: {result}')
except Exception as e:
    print(f'Error: {e}')
"

# Test email sending
python3 -c "
from pfsense_monitor import PfSenseMonitor
m = PfSenseMonitor()
m.send_email_alert('Test Alert', 'This is a test email from the pfSense monitor')
"

# View WAN interface details
ifconfig em0  # replace em0 with your WAN interface

# Check current gateway
netstat -rn | grep default

# Test manual ping to gateway
ping -c 3 [your-gateway-ip]

# Check if interface can be controlled
# (this briefly drops WAN connectivity; run it from the console or a LAN session)
ifconfig em0 down && sleep 2 && ifconfig em0 up

Security Considerations

  1. Protect the config file:

    bash

    chmod 600 pfsense_monitor_config.json
  2. Use a dedicated Gmail account for monitoring alerts

  3. Consider using firewall rules to restrict which hosts the script can ping

  4. Monitor the log files for suspicious activity

  5. Rotate Gmail app passwords periodically
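
Since the config file holds an app password, it can be worth verifying the permissions programmatically as well. The sketch below checks that neither group nor others have any access, which is what `chmod 600` produces. The function name `config_is_private` is hypothetical and this check is not claimed to be part of the script.

```python
import os
import stat


def config_is_private(path):
    """True when neither group nor others have any permission bits set."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

A monitor could log a warning at startup when this returns `False` instead of refusing to run, so a permissions slip does not silently disable monitoring.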

Script Workflow

WAN Gateway Monitoring

  1. Gateway Detection: Auto-detects WAN gateway using multiple methods (netstat, route command)
  2. Connectivity Test: Pings gateway with configurable packet count
  3. Retry Logic: Retries multiple times with delays before taking action
  4. Interface Restart: Brings WAN interface down/up if connectivity fails
  5. Verification: Re-tests connectivity after interface restart
  6. System Reboot: Last resort if interface restart doesn't restore connectivity
  7. Email Alerts: Sends notifications at each step for monitoring
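
The steps above can be sketched as a small escalation ladder. The function `recover_wan` and its four callable parameters are hypothetical names used for illustration; the real script's structure, retry handling, and alert wording may differ.

```python
def recover_wan(check, restart_interface, reboot, notify):
    """Walk the escalation ladder and return the name of the final state."""
    if check():
        return "healthy"  # gateway reachable, nothing to do
    notify("Gateway unreachable, restarting WAN interface")
    restart_interface()
    if check():  # verification step after the interface restart
        notify("Connectivity restored after interface restart")
        return "recovered"
    notify("Interface restart did not help, rebooting")
    reboot()  # last resort
    return "rebooting"
```

Passing the actions in as callables keeps the ladder testable without touching a real interface: in a test, `check` can be a stub that fails once and then succeeds.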

Unbound DNS Monitoring

  1. Process State Detection: Checks Unbound process states via ps aux output
  2. CPU Monitoring: Optionally monitors sustained high CPU usage over multiple samples
  3. Bad State Detection: Identifies hung processes (SBWAIT/STOP states)
  4. DNS Responsiveness Testing: Actually queries DNS to verify service responds (catches kernel deadlocks)
  5. Service Restart: Attempts to restart Unbound service when issues detected
  6. Verification: Re-checks all health indicators after restart
  7. System Reboot: Forces reboot if service cannot be recovered (when enabled)
  8. Email Alerts: Sends notifications for all detection and recovery actions
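
Bad-state detection amounts to scanning process listing lines for the configured state names. The sketch below is deliberately layout-agnostic: it matches any whitespace-separated field against `bad_states`, case-insensitively, because the exact columns depend on how `ps` is invoked. The function name `find_bad_state_lines` is hypothetical and the script's real parsing is not shown in this guide.

```python
def find_bad_state_lines(ps_output, process_name="unbound",
                         bad_states=("SBWAIT", "STOP")):
    """Return lines for process_name that contain one of bad_states as a field."""
    wanted = {state.lower() for state in bad_states}
    hits = []
    for line in ps_output.splitlines():
        fields = line.split()
        if not any(process_name in field for field in fields):
            continue  # line belongs to some other process
        if wanted & {field.lower() for field in fields}:
            hits.append(line)
    return hits
```

An empty result means no listed Unbound process is in a configured bad state; it does not prove the resolver is healthy, which is why the DNS responsiveness test runs as a separate check.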

Note on Kernel Deadlocks: Some Unbound failures involve kernel-level lock contention (e.g., in pf packet filter) where the process appears "running" with high CPU but is actually stuck spinning on a lock. The DNS responsiveness check catches these cases by testing if DNS actually responds to queries, regardless of what the process state looks like.

Reboot Escalation Strategy

When a reboot is necessary, the script attempts multiple methods in sequence:

  1. Standard reboot commands (shutdown -r now, reboot)
  2. Direct reboot utilities (/sbin/reboot with various flags)
  3. Aggressive kernel-level reboot (sysctl kern.reboot)
  4. Hardware watchdog trigger (if enabled, as last resort)

The script is designed to be conservative: it takes drastic action only when failures are genuinely detected and simpler recovery methods have failed.
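
The "try each method in sequence" idea can be sketched as follows. Only the two commands spelled out in the list (`shutdown -r now` and `reboot`) are included; the later rungs are left out rather than guessed. The names `REBOOT_COMMANDS`, `run_command`, and `escalate_reboot` are hypothetical.

```python
import subprocess

# Only the commands named explicitly in the list above. The reboot flags,
# sysctl, and watchdog rungs are omitted because their details are not given.
REBOOT_COMMANDS = (
    ["shutdown", "-r", "now"],
    ["reboot"],
)


def run_command(command, timeout=30):
    """Return True when the command exits with status 0."""
    try:
        return subprocess.run(command, timeout=timeout).returncode == 0
    except (subprocess.TimeoutExpired, OSError):
        return False


def escalate_reboot(commands=REBOOT_COMMANDS, runner=run_command):
    """Try each command in order; return the first one that reports success."""
    for command in commands:
        if runner(command):
            return command
    return None  # every method failed; caller may fall back to the watchdog
```

Calling `escalate_reboot()` with the default runner really does reboot the machine when run as root, so experiments are better done with a stub, for example `escalate_reboot(runner=lambda cmd: False)`.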
