Skip to content

Refactor entrypoint scripts for readability and implement robust signal handling#87

Open
jmtsi wants to merge 4 commits intoCisco-Talos:mainfrom
jmtsi:fix-entrypoint-signal-handling
Open

Refactor entrypoint scripts for readability and implement robust signal handling#87
jmtsi wants to merge 4 commits intoCisco-Talos:mainfrom
jmtsi:fix-entrypoint-signal-handling

Conversation

@jmtsi
Copy link

@jmtsi jmtsi commented Jun 26, 2025

The main problem with the entrypoint scripts was that they used exec without taking into consideration that it will replace the underlying shell process and therefore end the processing of the script. Tini was used as init process in Alpine-images, but as the daemons were started in the background, signals never reached them, and therefore the shutdown wasn't graceful.

There were also some corner cases where the PID 1 was replaced with a process which didn't react to any signals, leaving the container with hanging zombie processes. This caused problems when running in Kubernetes, for example.

This refactor simplifies the entrypoint scripts to make the logic easier to follow, and fixes signal handling and other minor bugs.

Summary of main changes:

  • Install tini in Debian Dockerfiles
  • Configure Debian entrypoint scripts to use tini
  • Ensure that the database is always updated before running clamd
  • Start the freshclam-daemon only after clamd is up. Otherwise the updated DB is not used (ClamAV does not always start up with the latest database #83)
  • Listen to TERM-signal and gracefully shut down all processes before exiting
  • Improve logging by adding process prefixes to entrypoint and daemon logs, and throttling the waiting-messages
  • Get rid of unnecessary and problematic exec tail -f "/dev/null" and exit cleanly if no processes were started
  • Replace all entrypoint scripts with refactored versions (they were identical between versions)

Fixed issues:

Future work:

  • There is a lot of duplication in the repo. All the versions could share the same scripts and distro-specific Dockerfiles instead of duplicating them many times. Fixing this was out of scope for this PR.
  • Bug: Docker healthcheck starts failing after 6 minutes if clamd is not started and env variable CLAMAV_NO_CLAMD is not defined, as clamdcheck.sh only skips the test if "${CLAMAV_NO_CLAMD:-}" != "false"

Example run & docker stop with the new entrypoint:

clamav-init.mp4

@jmtsi
Copy link
Author

jmtsi commented Jun 26, 2025

@val-ms Could you maybe check this out?

command -v "${1}" > "/dev/null" 2>&1; then
# Ensure healthcheck always passes
CLAMAV_NO_CLAMD="true" exec "${@}"
else
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This nesting was unnecessary, as using exec exits this script and nothing after it was ever executed

Comment on lines -17 to -18
# Ensure healthcheck always passes
CLAMAV_NO_CLAMD="true" exec "${@}"
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This didn't work, as the clamdcheck.sh couldn't see this variable.

Comment on lines +19 to +39
child_pids=""

terminate_children() {
if [ -n "${child_pids}" ]; then
echo "[${SCRIPT_FILE}] Caught termination signal, stopping children: ${child_pids}"
# Send SIGTERM first, then SIGKILL after a grace period if still running
echo "[${SCRIPT_FILE}] Sending SIGTERM"
kill -TERM ${child_pids} 2>/dev/null || true
sleep 5
# Check if any children are still running
for pid in ${child_pids}; do
if kill -0 "${pid}" 2>/dev/null; then
echo "[${SCRIPT_FILE}] Child ${pid} is still running, sending SIGKILL"
kill -KILL "${pid}" 2>/dev/null || true
fi
done
fi
echo "[${SCRIPT_FILE}] All children terminated, exiting."
exit 0
}
trap terminate_children INT TERM
Copy link
Author

@jmtsi jmtsi Jun 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tini can only terminate child-processes, and as the daemons were started in the background, it could never pass the signals to them. This signal handling ensures that we terminate all the processes gracefully during docker stop, ctrl-c interruption or Kubernetes pod termination.

Comment on lines +62 to +69
# Ensure initial virus database exists, otherwise clamd refuses to start
echo "[${SCRIPT_FILE}] Updating initial database"
sed -e 's|^\(TestDatabases \)|#\1|' \
-e '$a TestDatabases no' \
-e 's|^\(NotifyClamd \)|#\1|' \
/etc/clamav/freshclam.conf > /tmp/freshclam_initial.conf
freshclam --foreground --stdout --config-file=/tmp/freshclam_initial.conf | sed "s/^/[initial-freshclam] /"
rm /tmp/freshclam_initial.conf
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this should be skipped if CLAMAV_NO_CLAMD=true?

Comment on lines +81 to +93
elapsed=0
until [ -S /tmp/clamd.sock ]; do
if [ "${elapsed}" -ge "${CLAMD_STARTUP_TIMEOUT}" ]; then
echo >&2 "[${SCRIPT_FILE}] Failed to start clamd (socket not found)"
kill -TERM "${clamd_pid}" 2>/dev/null || true
exit 1
fi
[ $((elapsed % 5)) -eq 0 ] && \
printf "[%s] Waiting for clamd socket... (%s/%s)s\n" "${SCRIPT_FILE}" "${elapsed}" "${CLAMD_STARTUP_TIMEOUT}"
sleep 1
elapsed=$((elapsed + 1))
done
echo "[${SCRIPT_FILE}] Socket found after ${elapsed}s, clamd started."
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checks the condition every 1s, but prints the waiting-message only every 5s, and always prints a new line. Fixes #18

@oliv3r
Copy link

oliv3r commented Jul 26, 2025

It's been quite some years since I introduced docker to clamav, and I did not know everything I know today of course. However, for many years now, I've always used tini to handle signals, and I never generally do /usr/bin/env sh so I wonder if this was lost during one of the refactors? Or this predates my usage of tini :D

I did stop from using /bin/tini as shebang however, and instead use it from the dockerfile for two dumb reasons. One, alpine and debian put tini in different locations, by having it in the dockerfile, the entrypoint matches where the OS normally puts it (avoiding having to create symlinks or what not to make the distro's match. And secondly, so that the entrypoint script is a true shell script, and shellcheck is happy with it :) And a third reasons I suppose is that this is probably identical to what happens if you do init: true in docker-compose?

ENTRYPOINT [ "/sbin/tini", "--", "/init" ]

I do wonder what you meant with your original post however, that we do use tini (in the alpine images), but then state that tini cannot reap the background processes. Could the above solve that? As now tini should be PID 1 no?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants