Refactor entrypoint scripts for readability and implement robust signal handling#87

Open

jmtsi wants to merge 4 commits intoCisco-Talos:mainfrom

jmtsi:fix-entrypoint-signal-handling

jmtsi commented Jun 26, 2025 •

edited

Loading

The main problem with the entrypoint scripts was that they used exec without taking into consideration that it will replace the underlying shell process and therefore end the processing of the script. Tini was used as init process in Alpine-images, but as the daemons were started in the background, signals never reached them, and therefore the shutdown wasn't graceful.

There were also some corner cases where the PID 1 was replaced with a process which didn't react to any signals, leaving the container with hanging zombie processes. This caused problems when running in Kubernetes, for example.

This refactor simplifies the entrypoint scripts to make the logic easier to follow, and fixes signal handling and other minor bugs.

Summary of main changes:

Install tini in Debian Dockerfiles
Configure Debian entrypoint scripts to use tini
Ensure that the database is always updated before running clamd
Start the freshclam-daemon only after clamd is up. Otherwise the updated DB is not used (ClamAV does not always start up with the latest database #83)
Listen to TERM-signal and gracefully shut down all processes before exiting
Improve logging by adding process prefixes to entrypoint and daemon logs, and throttling the waiting-messages
Get rid of unnecessary and problematic exec tail -f "/dev/null" and exit cleanly if no processes were started
Replace all entrypoint scripts with refactored versions (they were identical between versions)

Fixed issues:

Future work:

There is a lot of duplication in the repo. All the versions could share the same scripts and distro-specific Dockerfiles instead of duplicating them many times. Fixing this was out of scope for this PR.
Bug: Docker healthcheck starts failing after 6 minutes if clamd is not started and env variable CLAMAV_NO_CLAMD is not defined, as clamdcheck.sh only skips the test if "${CLAMAV_NO_CLAMD:-}" != "false"

Example run & docker stop with the new entrypoint:

clamav-init.mp4

jmtsi added 2 commits

June 26, 2025 19:07


          Use tini as PID 1 for debian too

c1b5f69


          Rewrite entrypoint scripts for readability and better signal handling

c16d815

Author

jmtsi commented Jun 26, 2025

@val-ms Could you maybe check this out?

jmtsi commented

View reviewed changes

clamav/1.0/alpine/scripts/docker-entrypoint-unprivileged.sh

-                 command -v "${1}" > "/dev/null" 2>&1; then
-              	# Ensure healthcheck always passes
-              	CLAMAV_NO_CLAMD="true" exec "${@}"
-              else

Author

jmtsi Jun 26, 2025

This nesting was unnecessary, as using exec exits this script and nothing after it was ever executed


          Ensure that clamd starts with an up-to-date database and start freshc…

051d2b3

…lam daemon after clamd

jmtsi mentioned this pull request

clamd becomes zombie process after some time Cisco-Talos/clamav#330

Open


          Add process prefixes for all log output

c4a21a1

jmtsi commented

View reviewed changes

clamav/1.0/alpine/scripts/docker-entrypoint-unprivileged.sh

Comment on lines -17 to -18

		# Ensure healthcheck always passes
		CLAMAV_NO_CLAMD="true" exec "${@}"

Author

jmtsi Jun 26, 2025

This didn't work, as the clamdcheck.sh couldn't see this variable.

jmtsi commented

View reviewed changes

clamav/1.0/alpine/scripts/docker-entrypoint-unprivileged.sh

Comment on lines +19 to +39

+              child_pids=""
+              terminate_children() {
+                  if [ -n "${child_pids}" ]; then
+                      echo "[${SCRIPT_FILE}] Caught termination signal, stopping children: ${child_pids}"
+                      # Send SIGTERM first, then SIGKILL after a grace period if still running
+                      echo "[${SCRIPT_FILE}] Sending SIGTERM"
+                      kill -TERM ${child_pids} 2>/dev/null || true
+                      sleep 5
+                      # Check if any children are still running
+                      for pid in ${child_pids}; do
+                          if kill -0 "${pid}" 2>/dev/null; then
+                              echo "[${SCRIPT_FILE}] Child ${pid} is still running, sending SIGKILL"
+                              kill -KILL "${pid}" 2>/dev/null || true
+                          fi
+                      done
+                  fi
+                  echo "[${SCRIPT_FILE}] All children terminated, exiting."
+                  exit 0
+              }
+              trap terminate_children INT TERM

Author

jmtsi Jun 26, 2025 •

edited

Loading

Tini can only terminate child-processes, and as the daemons were started in the background, it could never pass the signals to them. This signal handling ensures that we terminate all the processes gracefully during docker stop, ctrl-c interruption or Kubernetes pod termination.

jmtsi commented

View reviewed changes

clamav/1.0/alpine/scripts/docker-entrypoint-unprivileged.sh

Comment on lines +62 to +69

+              # Ensure initial virus database exists, otherwise clamd refuses to start
+              echo "[${SCRIPT_FILE}] Updating initial database"
+              sed -e 's|^\(TestDatabases \)|#\1|' \
+                  -e '$a TestDatabases no' \
+                  -e 's|^\(NotifyClamd \)|#\1|' \
+                  /etc/clamav/freshclam.conf > /tmp/freshclam_initial.conf
+              freshclam --foreground --stdout --config-file=/tmp/freshclam_initial.conf | sed "s/^/[initial-freshclam] /"
+              rm /tmp/freshclam_initial.conf

Author

jmtsi Jun 26, 2025

Maybe this should be skipped if CLAMAV_NO_CLAMD=true?

jmtsi commented

View reviewed changes

clamav/1.0/alpine/scripts/docker-entrypoint-unprivileged.sh

Comment on lines +81 to +93

+                  elapsed=0
+                  until [ -S /tmp/clamd.sock ]; do
+                      if [ "${elapsed}" -ge "${CLAMD_STARTUP_TIMEOUT}" ]; then
+                          echo >&2 "[${SCRIPT_FILE}] Failed to start clamd (socket not found)"
+                          kill -TERM "${clamd_pid}" 2>/dev/null || true
+                          exit 1
+                      fi
+                      [ $((elapsed % 5)) -eq 0 ] && \
+                          printf "[%s] Waiting for clamd socket... (%s/%s)s\n" "${SCRIPT_FILE}" "${elapsed}" "${CLAMD_STARTUP_TIMEOUT}"
+                      sleep 1
+                      elapsed=$((elapsed + 1))
+                  done
+                  echo "[${SCRIPT_FILE}] Socket found after ${elapsed}s, clamd started."

Author

jmtsi Jun 26, 2025

Checks the condition every 1s, but prints the waiting-message only every 5s, and always prints a new line. Fixes #18

oliv3r commented Jul 26, 2025

It's been quite some years since I introduced docker to clamav, and I did not know everything I know today of course. However, for many years now, I've always used tini to handle signals, and I never generally do /usr/bin/env sh so I wonder if this was lost during one of the refactors? Or this predates my usage of tini :D

I did stop from using /bin/tini as shebang however, and instead use it from the dockerfile for two dumb reasons. One, alpine and debian put tini in different locations, by having it in the dockerfile, the entrypoint matches where the OS normally puts it (avoiding having to create symlinks or what not to make the distro's match. And secondly, so that the entrypoint script is a true shell script, and shellcheck is happy with it :) And a third reasons I suppose is that this is probably identical to what happens if you do init: true in docker-compose?

ENTRYPOINT [ "/sbin/tini", "--", "/init" ]

I do wonder what you meant with your original post however, that we do use tini (in the alpine images), but then state that tini cannot reap the background processes. Could the above solve that? As now tini should be PID 1 no?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet