Skip to content

pcp_promote_node switchover reports success when pg_rewind from follow_primary fails #147

@tallenaz

Description

@tallenaz

Using pgpool 4.7.0:

root@dlss-azanella-db-01:~# pgpool --version
pgpool-II version 4.7.0 (tasukiboshi)

I have a healthy pgpool cluster:

root@dlss-azanella-db-01:~# pcp_node_info -h dlss-azanella-pgpool.stanford.edu -p 9898 -U pgpool -w
dlss-azanella-db-01.stanford.edu 5432 1 0.500000 waiting up primary primary 0 none none 2026-01-30 11:39:31
dlss-azanella-db-02.stanford.edu 5432 1 0.500000 waiting up standby standby 0 streaming async 2026-01-30 11:39:47
root@dlss-azanella-db-01:~# su - postgres -c 'psql -c "show checkpoint_timeout;"'
 checkpoint_timeout 
--------------------
 5min
(1 row)

After the checkpoint_timeout, I perform a switchover:

root@dlss-azanella-db-01:~# date
Fri Jan 30 11:47:32 AM PST 2026
root@dlss-azanella-db-01:~# pcp_promote_node -h dlss-azanella-pgpool.stanford.edu -p 9898 -U pgpool -n 1 --switchover -w
pcp_promote_node -- Command Successful

The follow_primary script succeeds:

2026-01-30 11:47:49.972: follow_child pid 353841: LOG:  === Starting follow primary command for node 0 ===
2026-01-30 11:47:49.972: follow_child pid 353841: LOG:  execute command: /etc/pgpool2/follow_primary.sh  0 dlss-azanella-db-01.stanford.edu  5432 /var/lib/postgresql/16/main  1 dlss-azanella-db-02.stanford.edu 0 0 5432 /var/lib/postgresql/16/main
+ NODE_ID=0
+ NODE_HOST=dlss-azanella-db-01.stanford.edu
+ NODE_PORT=5432
+ NODE_PGDATA=/var/lib/postgresql/16/main
+ NEW_PRIMARY_NODE_ID=1
+ NEW_PRIMARY_NODE_HOST=dlss-azanella-db-02.stanford.edu
+ OLD_MAIN_NODE_ID=0
+ OLD_PRIMARY_NODE_ID=0
+ NEW_PRIMARY_NODE_PORT=5432
+ NEW_PRIMARY_NODE_PGDATA=/var/lib/postgresql/16/main
+ PGHOME=/usr/lib/postgresql/16
+ REPLUSER=repl
+ PCP_USER=pgpool
+ PGPOOL_PATH=/sbin
+ PCP_PORT=9898
++ tr -- -. _
++ echo dlss-azanella-db-01.stanford.edu
+ REPL_SLOT_NAME=dlss_azanella_db_01_stanford_edu
+ POSTGRESQL_STARTUP_USER=postgres
+ SSH_KEY_FILE=id_rsa
+ SSH_OPTIONS='-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ~/.ssh/id_rsa'
+ PGCONFIG_DIR=/etc/postgresql/16/main
+ echo follow_primary.sh: start: Standby node 0
follow_primary.sh: start: Standby node 0
+ /usr/lib/postgresql/16/bin/pg_isready -h dlss-azanella-db-01.stanford.edu -p 5432
2026-01-30 11:47:49.983: sr_check_worker pid 351959: LOG:  worker process received restart request
2026-01-30 11:47:49.984: watchdog pid 175581: LOG:  received the failover indication from Pgpool-II on IPC interface
2026-01-30 11:47:49.984: watchdog pid 175581: LOG:  watchdog is informed of failover end by the main process
2026-01-30 11:47:49.984: main pid 175576: LOG:  === Failover done. shutdown host dlss-azanella-db-01.stanford.edu(5432) ===
+ '[' 0 -ne 0 ']'
+ ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i '~/.ssh/id_rsa' postgres@dlss-azanella-db-02.stanford.edu ls /tmp
Warning: Permanently added 'dlss-azanella-db-02.stanford.edu' (ED25519) to the list of known hosts.
+ '[' 0 -ne 0 ']'
++ /usr/lib/postgresql/16/bin/psql -V
++ awk '{print $3}'
++ sed 's/\..*//'
++ sed 's/\([0-9]*\)[a-zA-Z].*/\1/'
+ PGVERSION=16
+ '[' 16 -ge 12 ']'
+ RECOVERYCONF=/var/lib/postgresql/16/main/myrecovery.conf
+ echo follow_primary.sh: pg_rewind for node 0
follow_primary.sh: pg_rewind for node 0
+ /usr/lib/postgresql/16/bin/psql -h dlss-azanella-db-02.stanford.edu -p 5432 postgres -c 'checkpoint;'
CHECKPOINT
+ /usr/lib/postgresql/16/bin/psql -h dlss-azanella-db-02.stanford.edu -p 5432 postgres -c 'SELECT pg_create_physical_replication_slot('\''dlss_azanella_db_01_stanford_edu'\'');'
+ '[' 0 -ne 0 ']'
++ echo /var/lib/postgresql/16/main/myrecovery.conf
++ sed -e 's/\//\\\//g'
++ echo /var/lib/postgresql/16/main/myrecovery.conf
++ sed -e 's/\//\\\//g'
+ ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i '~/.ssh/id_rsa' postgres@dlss-azanella-db-01.stanford.edu '

    set -o errexit

    /usr/bin/sudo /usr/bin/systemctl stop postgresql@16-main
    # /usr/lib/postgresql/16/bin/pg_ctl -w -m f -D /var/lib/postgresql/16/main stop

    /usr/lib/postgresql/16/bin/pg_rewind -D /var/lib/postgresql/16/main --source-server="user=postgres host=dlss-azanella-db-02.stanford.edu port=5432 dbname=postgres"

    [ -d "/var/lib/postgresql/16/main" ] && rm -rf /var/lib/postgresql/16/main/pg_replslot/*

    cat > /var/lib/postgresql/16/main/myrecovery.conf << EOT
primary_conninfo = '\''host=dlss-azanella-db-02.stanford.edu port=5432 user=repl application_name=dlss-azanella-db-01.stanford.edu passfile='\'''\''/var/lib/postgresql/.pgpass'\'''\'''\''
recovery_target_timeline = '\''latest'\''
primary_slot_name = '\''dlss_azanella_db_01_stanford_edu'\''
EOT

    if [ 16 -ge 12 ]; then
        sed -i -e "\$ainclude_if_exists = '\''\/var\/lib\/postgresql\/16\/main\/myrecovery.conf'\''"                -e "/^include_if_exists = '\''\/var\/lib\/postgresql\/16\/main\/myrecovery.conf'\''/d" /etc/postgresql/16/main/postgresql.conf
        touch /var/lib/postgresql/16/main/standby.signal
    else
        echo "standby_mode = '\''on'\''" >> /var/lib/postgresql/16/main/myrecovery.conf
    fi

    /usr/bin/sudo /usr/bin/systemctl start postgresql@16-main
    # /usr/lib/postgresql/16/bin/pg_ctl -l /dev/null -w -D /var/lib/postgresql/16/main start

'
Warning: Permanently added 'dlss-azanella-db-01.stanford.edu' (ED25519) to the list of known hosts.
pg_rewind: servers diverged at WAL location 0/7E000148 on timeline 73
pg_rewind: rewinding from last common checkpoint at 0/7E000098 on timeline 73
2026-01-30 11:47:50.984: pcp_main pid 351958: LOG:  restart request received in pcp child process
2026-01-30 11:47:50.986: main pid 175576: LOG:  PCP child 351958 exits with status 0 in failover()
2026-01-30 11:47:50.987: main pid 175576: LOG:  fork a new PCP child pid 353904 in failover()
2026-01-30 11:47:50.987: pcp_main pid 353904: LOG:  PCP process: 353904 started
2026-01-30 11:47:50.988: sr_check_worker pid 353905: LOG:  process started
pg_rewind: Done!
+ '[' 0 -eq 0 ']'
+ /sbin/pcp_attach_node -w -h localhost -U pgpool -p 9898 -n 0
2026-01-30 11:47:53.793: pcp_child pid 353914: LOG:  received failback request for node_id: 0 from pid [353914]
2026-01-30 11:47:53.793: watchdog pid 175581: LOG:  watchdog received the failover command from local pgpool-II on IPC interface
2026-01-30 11:47:53.793: watchdog pid 175581: LOG:  watchdog is processing the failover command [FAILBACK_REQUEST] received from local pgpool-II on IPC interface
2026-01-30 11:47:53.793: watchdog pid 175581: LOG:  The failover request does not need quorum to hold
2026-01-30 11:47:53.793: watchdog pid 175581: DETAIL:  proceeding with the failover
2026-01-30 11:47:53.793: watchdog pid 175581: HINT:  REQ_DETAIL_CONFIRMED
2026-01-30 11:47:53.793: pcp_child pid 353914: LOG:  signal_user1_to_parent_with_reason(0)
2026-01-30 11:47:53.793: main pid 175576: LOG:  Pgpool-II parent process received SIGUSR1
2026-01-30 11:47:53.793: main pid 175576: LOG:  Pgpool-II parent process has received failover request
pcp_attach_node -- Command Successful
2026-01-30 11:47:53.793: watchdog pid 175581: LOG:  received the failover indication from Pgpool-II on IPC interface
2026-01-30 11:47:53.793: watchdog pid 175581: LOG:  watchdog is informed of failover start by the main process
+ '[' 0 -ne 0 ']'
2026-01-30 11:47:53.794: main pid 175576: LOG:  === Starting fail back. reconnect host dlss-azanella-db-01.stanford.edu(5432) ===
2026-01-30 11:47:53.794: main pid 175576: LOG:  Node 1 is not down (status: 1)
+ echo follow_primary.sh: end: follow primary command is completed successfully
follow_primary.sh: end: follow primary command is completed successfully
+ exit 0
2026-01-30 11:47:53.794: main pid 175576: LOG:  Do not restart children because we are failing back node id 0 host: dlss-azanella-db-01.stanford.edu port: 5432 and we are in streaming replication mode and not all backends were down
2026-01-30 11:47:53.794: main pid 175576: LOG:  find_primary_node_repeatedly: follow primary is ongoing. return current primary: 1
2026-01-30 11:47:53.794: main pid 175576: LOG:  failover: set new primary node: 1
2026-01-30 11:47:53.794: main pid 175576: LOG:  failover: set new main node: 0
2026-01-30 11:47:53.794: watchdog pid 175581: LOG:  received the failover indication from Pgpool-II on IPC interface
2026-01-30 11:47:53.794: watchdog pid 175581: LOG:  watchdog is informed of failover end by the main process
2026-01-30 11:47:53.794: sr_check_worker pid 353905: LOG:  worker process received restart request
2026-01-30 11:47:53.794: main pid 175576: LOG:  === Failback done. reconnect host dlss-azanella-db-01.stanford.edu(5432) ===
2026-01-30 11:47:53.794: follow_child pid 353841: LOG:  === Follow primary command for node 0 ended ===
root@dlss-azanella-db-01:~# pcp_node_info -h dlss-azanella-pgpool.stanford.edu -p 9898 -U pgpool -w
dlss-azanella-db-01.stanford.edu 5432 1 0.500000 waiting up standby standby 0 streaming async 2026-01-30 11:47:53
dlss-azanella-db-02.stanford.edu 5432 1 0.500000 waiting up primary primary 0 none none 2026-01-30 11:47:49

Also, I can perform a switchover in the other direction right away:

root@dlss-azanella-db-01:~# pcp_promote_node -h dlss-azanella-pgpool.stanford.edu -p 9898 -U pgpool -n 0 --switchover -w
pcp_promote_node -- Command Successful
root@dlss-azanella-db-01:~# pcp_node_info -h dlss-azanella-pgpool.stanford.edu -p 9898 -U pgpool -w
dlss-azanella-db-01.stanford.edu 5432 1 0.500000 waiting up primary primary 0 none none 2026-01-30 11:52:01
dlss-azanella-db-02.stanford.edu 5432 1 0.500000 waiting up standby standby 0 streaming async 2026-01-30 11:52:05

I can also successfully simulate a failover and perform a recovery in both directions:

root@dlss-azanella-db-01:~# systemctl stop postgresql@16-main
root@dlss-azanella-db-01:~# pcp_node_info -h dlss-azanella-pgpool.stanford.edu -p 9898 -U pgpool -w
dlss-azanella-db-01.stanford.edu 5432 3 0.500000 down down standby unknown 0 none none 2026-01-30 11:55:22
dlss-azanella-db-02.stanford.edu 5432 2 0.500000 up up primary primary 0 none none 2026-01-30 11:55:22
root@dlss-azanella-db-01:~# pcp_recovery_node -h dlss-azanella-pgpool.stanford.edu -p 9898 -U pgpool -n 0 -w
pcp_recovery_node -- Command Successful
root@dlss-azanella-db-01:~# pcp_node_info -h dlss-azanella-pgpool.stanford.edu -p 9898 -U pgpool -w
dlss-azanella-db-01.stanford.edu 5432 1 0.500000 waiting up standby standby 0 streaming async 2026-01-30 11:56:03
dlss-azanella-db-02.stanford.edu 5432 2 0.500000 up up primary primary 0 none none 2026-01-30 11:55:22
root@dlss-azanella-db-02:~# systemctl stop postgresql@16-main
root@dlss-azanella-db-01:~# pcp_node_info -h dlss-azanella-pgpool.stanford.edu -p 9898 -U pgpool -w
dlss-azanella-db-01.stanford.edu 5432 1 0.500000 waiting up primary primary 0 none none 2026-01-30 11:57:05
dlss-azanella-db-02.stanford.edu 5432 3 0.500000 down down standby unknown 0 none none 2026-01-30 11:57:05
root@dlss-azanella-db-01:~# pcp_recovery_node -h dlss-azanella-pgpool.stanford.edu -p 9898 -U pgpool -n 1 -w
pcp_recovery_node -- Command Successful
root@dlss-azanella-db-01:~# pcp_node_info -h dlss-azanella-pgpool.stanford.edu -p 9898 -U pgpool -w
dlss-azanella-db-01.stanford.edu 5432 1 0.500000 waiting up primary primary 0 none none 2026-01-30 11:57:05
dlss-azanella-db-02.stanford.edu 5432 1 0.500000 waiting up standby standby 0 streaming async 2026-01-30 11:57:31

But if I do not wait for the next checkpoint_timeout, a pcp_promote_node –switchover says the command succeeds, but follow_primary fails:

root@dlss-azanella-db-01:~# pcp_promote_node -h dlss-azanella-pgpool.stanford.edu -p 9898 -U pgpool -n 1 --switchover -w
pcp_promote_node -- Command Successful
2026-01-30 11:59:10.048: follow_child pid 355683: LOG:  === Starting follow primary command for node 0 ===
2026-01-30 11:59:10.048: follow_child pid 355683: LOG:  execute command: /etc/pgpool2/follow_primary.sh  0 dlss-azanella-db-01.stanford.edu  5432 /var/lib/postgresql/16/main  1 dlss-azanella-db-02.stanford.edu 0 0 5432 /var/lib/postgresql/16/main
+ NODE_ID=0
+ NODE_HOST=dlss-azanella-db-01.stanford.edu
+ NODE_PORT=5432
+ NODE_PGDATA=/var/lib/postgresql/16/main
+ NEW_PRIMARY_NODE_ID=1
+ NEW_PRIMARY_NODE_HOST=dlss-azanella-db-02.stanford.edu
+ OLD_MAIN_NODE_ID=0
+ OLD_PRIMARY_NODE_ID=0
+ NEW_PRIMARY_NODE_PORT=5432
+ NEW_PRIMARY_NODE_PGDATA=/var/lib/postgresql/16/main
+ PGHOME=/usr/lib/postgresql/16
+ REPLUSER=repl
+ PCP_USER=pgpool
+ PGPOOL_PATH=/sbin
+ PCP_PORT=9898
++ echo dlss-azanella-db-01.stanford.edu
++ tr -- -. _
+ REPL_SLOT_NAME=dlss_azanella_db_01_stanford_edu
+ POSTGRESQL_STARTUP_USER=postgres
+ SSH_KEY_FILE=id_rsa
+ SSH_OPTIONS='-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ~/.ssh/id_rsa'
+ PGCONFIG_DIR=/etc/postgresql/16/main
+ echo follow_primary.sh: start: Standby node 0
follow_primary.sh: start: Standby node 0
+ /usr/lib/postgresql/16/bin/pg_isready -h dlss-azanella-db-01.stanford.edu -p 5432
2026-01-30 11:59:10.058: sr_check_worker pid 355388: LOG:  worker process received restart request
2026-01-30 11:59:10.058: watchdog pid 175581: LOG:  received the failover indication from Pgpool-II on IPC interface
2026-01-30 11:59:10.058: watchdog pid 175581: LOG:  watchdog is informed of failover end by the main process
2026-01-30 11:59:10.059: main pid 175576: LOG:  === Failover done. shutdown host dlss-azanella-db-01.stanford.edu(5432) ===
+ '[' 0 -ne 0 ']'
+ ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i '~/.ssh/id_rsa' postgres@dlss-azanella-db-02.stanford.edu ls /tmp
Warning: Permanently added 'dlss-azanella-db-02.stanford.edu' (ED25519) to the list of known hosts.
+ '[' 0 -ne 0 ']'
++ /usr/lib/postgresql/16/bin/psql -V
++ awk '{print $3}'
++ sed 's/\..*//'
++ sed 's/\([0-9]*\)[a-zA-Z].*/\1/'
+ PGVERSION=16
+ '[' 16 -ge 12 ']'
+ RECOVERYCONF=/var/lib/postgresql/16/main/myrecovery.conf
+ echo follow_primary.sh: pg_rewind for node 0
follow_primary.sh: pg_rewind for node 0
+ /usr/lib/postgresql/16/bin/psql -h dlss-azanella-db-02.stanford.edu -p 5432 postgres -c 'checkpoint;'
CHECKPOINT
+ /usr/lib/postgresql/16/bin/psql -h dlss-azanella-db-02.stanford.edu -p 5432 postgres -c 'SELECT pg_create_physical_replication_slot('\''dlss_azanella_db_01_stanford_edu'\'');'
+ '[' 0 -ne 0 ']'
++ echo /var/lib/postgresql/16/main/myrecovery.conf
++ sed -e 's/\//\\\//g'
++ echo /var/lib/postgresql/16/main/myrecovery.conf
++ sed -e 's/\//\\\//g'
+ ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i '~/.ssh/id_rsa' postgres@dlss-azanella-db-01.stanford.edu '

    set -o errexit

    /usr/bin/sudo /usr/bin/systemctl stop postgresql@16-main
    # /usr/lib/postgresql/16/bin/pg_ctl -w -m f -D /var/lib/postgresql/16/main stop

    /usr/lib/postgresql/16/bin/pg_rewind -D /var/lib/postgresql/16/main --source-server="user=postgres host=dlss-azanella-db-02.stanford.edu port=5432 dbname=postgres"

    [ -d "/var/lib/postgresql/16/main" ] && rm -rf /var/lib/postgresql/16/main/pg_replslot/*

    cat > /var/lib/postgresql/16/main/myrecovery.conf << EOT
primary_conninfo = '\''host=dlss-azanella-db-02.stanford.edu port=5432 user=repl application_name=dlss-azanella-db-01.stanford.edu passfile='\'''\''/var/lib/postgresql/.pgpass'\'''\'''\''
recovery_target_timeline = '\''latest'\''
primary_slot_name = '\''dlss_azanella_db_01_stanford_edu'\''
EOT

    if [ 16 -ge 12 ]; then
        sed -i -e "\$ainclude_if_exists = '\''\/var\/lib\/postgresql\/16\/main\/myrecovery.conf'\''"                -e "/^include_if_exists = '\''\/var\/lib\/postgresql\/16\/main\/myrecovery.conf'\''/d" /etc/postgresql/16/main/postgresql.conf
        touch /var/lib/postgresql/16/main/standby.signal
    else
        echo "standby_mode = '\''on'\''" >> /var/lib/postgresql/16/main/myrecovery.conf
    fi

    /usr/bin/sudo /usr/bin/systemctl start postgresql@16-main
    # /usr/lib/postgresql/16/bin/pg_ctl -l /dev/null -w -D /var/lib/postgresql/16/main start

'
Warning: Permanently added 'dlss-azanella-db-01.stanford.edu' (ED25519) to the list of known hosts.
2026-01-30 11:59:11.059: pcp_main pid 355387: LOG:  restart request received in pcp child process
2026-01-30 11:59:11.061: main pid 175576: LOG:  PCP child 355387 exits with status 0 in failover()
2026-01-30 11:59:11.061: main pid 175576: LOG:  fork a new PCP child pid 355743 in failover()
2026-01-30 11:59:11.061: pcp_main pid 355743: LOG:  PCP process: 355743 started
2026-01-30 11:59:11.062: sr_check_worker pid 355744: LOG:  process started
pg_rewind: servers diverged at WAL location 0/84000060 on timeline 77
pg_rewind: error: could not open file "/var/lib/postgresql/16/main/pg_wal/0000004D0000000000000083": No such file or directory
pg_rewind: error: could not find previous WAL record at 0/83000138
+ '[' 1 -eq 0 ']'
+ /usr/lib/postgresql/16/bin/psql -h dlss-azanella-db-02.stanford.edu -p 5432 postgres -c 'SELECT pg_drop_replication_slot('\''dlss_azanella_db_01_stanford_edu'\'');'
+ '[' 0 -ne 0 ']'
+ echo ERROR: follow_primary.sh: end: follow primary command failed
ERROR: follow_primary.sh: end: follow primary command failed
+ exit 1
2026-01-30 11:59:11.133: follow_child pid 355683: LOG:  === Follow primary command for node 0 ended ===
root@dlss-azanella-db-01:~# pcp_node_info -h dlss-azanella-pgpool.stanford.edu -p 9898 -U pgpool -w
dlss-azanella-db-01.stanford.edu 5432 3 0.500000 down down standby unknown 0 none none 2026-01-30 11:59:10
dlss-azanella-db-02.stanford.edu 5432 1 0.500000 waiting up primary primary 0 none none 2026-01-30 11:59:10

Although follow_primary failed, the failover promotion succeeded:

failover.sh: end: new_main_node_id=1 on dlss-azanella-db-02.stanford.edu was successfully promoted to primary

02 is the primary, but there is no replication to 01, even though logs show follow_primary executed the checkpoint. At this point I can recover 01 as the standby and end up with a healthy cluster again:

root@dlss-azanella-db-01:~# pcp_recovery_node -h dlss-azanella-pgpool.stanford.edu -p 9898 -U pgpool -n 0 -w
pcp_recovery_node -- Command Successful
root@dlss-azanella-db-01:~# pcp_node_info -h dlss-azanella-pgpool.stanford.edu -p 9898 -U pgpool -w
dlss-azanella-db-01.stanford.edu 5432 1 0.500000 waiting up standby standby 0 streaming async 2026-01-30 12:07:04
dlss-azanella-db-02.stanford.edu 5432 1 0.500000 waiting up primary primary 0 none none 2026-01-30 11:59:10

And at this point, unless I perform a manual checkpoint, or wait until the next checkpoint_timeout has expired a switchover will do the same thing: pcp_promote_node –switchover will say it succeeded, but logs and pcp_node_info will show a promoted node without replication. If I perform a manual checkpoint or wait until the next checkpoint_timeout, a pcp_promote_node –switchover will say it succeeded, and logs will show follow_primary succeeded, and pcp_node_info will show a promoted node with streaming replication.

I guess pcp_promote_node does succeed, in the sense that I end up with a promoted node. But as the switchover does not succeed, I would have expected pcp_promote_node –switchover to error out when the logs show follow_primary does not succeed, and when pcp_node_info does not show a switchover. Unless that is the expected behavior, or I have something misconfigured?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions