Refactor entrypoint scripts for readability and implement robust signal handling #87
jmtsi wants to merge 4 commits into Cisco-Talos:main from
@val-ms Could you maybe check this out?
command -v "${1}" > "/dev/null" 2>&1; then
    # Ensure healthcheck always passes
    CLAMAV_NO_CLAMD="true" exec "${@}"
else
This nesting was unnecessary: exec replaces the shell process running this script, so nothing after it was ever executed.
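As a minimal illustration of that behaviour (not code from this PR), the sketch below runs a throwaway inner script that calls exec in the middle. The variable name `exec_output` and the echoed strings are made up for the demonstration.

```shell
#!/bin/sh
# The inner script prints one line, then replaces itself with `echo` via exec.
# The third command belongs to a shell process that no longer exists.
exec_output="$(sh -c 'echo "before exec"; exec echo "replaced by exec"; echo "never printed"')"

printf '%s\n' "${exec_output}"
```

Only the first two lines are printed, which is why an `else` branch or any other code placed after an `exec` line is dead code.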
…lam daemon after clamd
# Ensure healthcheck always passes
CLAMAV_NO_CLAMD="true" exec "${@}"
This didn't work, as clamdcheck.sh couldn't see this variable.
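A likely explanation, assuming the health check is launched by the container runtime as a separate process rather than by the entrypoint, is that a variable placed in front of a single command only reaches that command and its descendants. The sketch below shows the effect with two plain `sh -c` invocations standing in for the real processes; the names `seen_by_command` and `seen_by_other` are hypothetical.

```shell
#!/bin/sh
# Start from a clean slate so the demonstration does not depend on the caller.
unset CLAMAV_NO_CLAMD

# The variable is placed only in the environment of this one command.
seen_by_command="$(CLAMAV_NO_CLAMD="true" sh -c 'echo "${CLAMAV_NO_CLAMD:-unset}"')"

# A separately started process (standing in for the health check) never sees it.
seen_by_other="$(sh -c 'echo "${CLAMAV_NO_CLAMD:-unset}"')"

echo "command sees:       ${seen_by_command}"
echo "other process sees: ${seen_by_other}"
```

For the health check to see the value, it would have to be part of the container's configured environment, for example set when the container is created.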
child_pids=""

terminate_children() {
    if [ -n "${child_pids}" ]; then
        echo "[${SCRIPT_FILE}] Caught termination signal, stopping children: ${child_pids}"
        # Send SIGTERM first, then SIGKILL after a grace period if still running
        echo "[${SCRIPT_FILE}] Sending SIGTERM"
        kill -TERM ${child_pids} 2>/dev/null || true
        sleep 5
        # Check if any children are still running
        for pid in ${child_pids}; do
            if kill -0 "${pid}" 2>/dev/null; then
                echo "[${SCRIPT_FILE}] Child ${pid} is still running, sending SIGKILL"
                kill -KILL "${pid}" 2>/dev/null || true
            fi
        done
    fi
    echo "[${SCRIPT_FILE}] All children terminated, exiting."
    exit 0
}
trap terminate_children INT TERM
Tini forwards signals only to its direct child process (the entrypoint script), and because the daemons were started in the background by that script, the signals never reached them. This signal handling ensures that all processes are terminated gracefully during docker stop, a Ctrl-C interruption, or Kubernetes pod termination.
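The forwarding pattern can be tried outside a container. The following is a reduced sketch, not the PR's script: a temporary stand-in entrypoint starts `sleep` in place of a daemon, traps TERM, and passes the signal on. The names `demo_script`, `on_term` and `script_log` are hypothetical, and the outer `kill` plays the role of `docker stop`.

```shell
#!/bin/sh
# Write a tiny stand-in entrypoint to a temporary file (all names hypothetical).
demo_script="$(mktemp)"
cat > "${demo_script}" <<'EOF'
sleep 30 &                           # stand-in for a daemon such as clamd
child_pid=$!
on_term() {
    kill -TERM "${child_pid}" 2>/dev/null || true
    wait "${child_pid}" 2>/dev/null  # reap the child before exiting
    echo "child stopped"
    exit 0
}
trap on_term TERM
wait "${child_pid}"                  # returns early when a trapped signal arrives
EOF

# Run the stand-in in the background and signal it from outside.
sh "${demo_script}" > "${demo_script}.log" &
script_pid=$!
sleep 1                              # give the script time to install its trap
kill -TERM "${script_pid}"
wait "${script_pid}"
script_status=$?

script_log="$(cat "${demo_script}.log")"
rm -f "${demo_script}" "${demo_script}.log"
echo "exit status: ${script_status}, output: ${script_log}"
```

The key detail is that the script blocks in `wait` rather than in a foreground command, so the shell can run the trap as soon as the signal arrives instead of after the foreground command finishes.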
# Ensure initial virus database exists, otherwise clamd refuses to start
echo "[${SCRIPT_FILE}] Updating initial database"
sed -e 's|^\(TestDatabases \)|#\1|' \
    -e '$a TestDatabases no' \
    -e 's|^\(NotifyClamd \)|#\1|' \
    /etc/clamav/freshclam.conf > /tmp/freshclam_initial.conf
freshclam --foreground --stdout --config-file=/tmp/freshclam_initial.conf | sed "s/^/[initial-freshclam] /"
rm /tmp/freshclam_initial.conf
Maybe this should be skipped if CLAMAV_NO_CLAMD=true?
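If that suggestion were adopted, the guard might look roughly like the sketch below. This is only an illustration: the helper name `needs_initial_database` is hypothetical, and treating an unset variable as "false" is an assumption, not something the PR specifies.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the initial freshclam run is still needed.
# Assumption: an unset CLAMAV_NO_CLAMD means clamd will be started.
needs_initial_database() {
    [ "${CLAMAV_NO_CLAMD:-false}" != "true" ]
}

CLAMAV_NO_CLAMD="true"
if needs_initial_database; then decision_no_clamd="update"; else decision_no_clamd="skip"; fi

unset CLAMAV_NO_CLAMD
if needs_initial_database; then decision_default="update"; else decision_default="skip"; fi

echo "CLAMAV_NO_CLAMD=true  -> ${decision_no_clamd}"
echo "CLAMAV_NO_CLAMD unset -> ${decision_default}"
```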
elapsed=0
until [ -S /tmp/clamd.sock ]; do
    if [ "${elapsed}" -ge "${CLAMD_STARTUP_TIMEOUT}" ]; then
        echo >&2 "[${SCRIPT_FILE}] Failed to start clamd (socket not found)"
        kill -TERM "${clamd_pid}" 2>/dev/null || true
        exit 1
    fi
    [ $((elapsed % 5)) -eq 0 ] && \
        printf "[%s] Waiting for clamd socket... (%s/%s)s\n" "${SCRIPT_FILE}" "${elapsed}" "${CLAMD_STARTUP_TIMEOUT}"
    sleep 1
    elapsed=$((elapsed + 1))
done
echo "[${SCRIPT_FILE}] Socket found after ${elapsed}s, clamd started."
Checks the condition every 1s, but prints the waiting message only every 5s, and always ends it with a newline. Fixes #18
It's been quite a few years since I introduced Docker to ClamAV, and of course I did not know then everything I know today. However, for many years now, I've always used I did stop from using
ENTRYPOINT [ "/sbin/tini", "--", "/init" ]
I do wonder what you meant in your original post, however: we do use tini (in the Alpine images), but you then state that tini cannot reap the background processes. Could the above solve that? Now tini should be PID 1, no?
The main problem with the entrypoint scripts was that they used exec without taking into account that it replaces the underlying shell process and therefore ends the processing of the script. Tini was used as the init process in the Alpine images, but because the daemons were started in the background, signals never reached them, so the shutdown wasn't graceful.
There were also some corner cases where PID 1 was replaced with a process that didn't react to any signals, leaving the container with hanging zombie processes. This caused problems when running in Kubernetes, for example.
This refactor simplifies the entrypoint scripts to make the logic easier to follow, and fixes signal handling and other minor bugs.
Summary of main changes:
exec tail -f "/dev/null" and exit cleanly if no processes were started
Fixed issues:
Future work:
CLAMAV_NO_CLAMD is not defined, as clamdcheck.sh only skips the test if "${CLAMAV_NO_CLAMD:-}" != "false"
Example run & docker stop with the new entrypoint: clamav-init.mp4
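To make the quoted clamdcheck.sh condition easier to reason about, the sketch below evaluates it for the three relevant states of the variable. Only the comparison itself is taken from the description above; the wrapper function `skips_check` and the result variables are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around the comparison quoted from clamdcheck.sh.
skips_check() {
    [ "${CLAMAV_NO_CLAMD:-}" != "false" ]
}

unset CLAMAV_NO_CLAMD
if skips_check; then result_unset="skipped"; else result_unset="checked"; fi

CLAMAV_NO_CLAMD="false"
if skips_check; then result_false="skipped"; else result_false="checked"; fi

CLAMAV_NO_CLAMD="true"
if skips_check; then result_true="skipped"; else result_true="checked"; fi

echo "unset -> ${result_unset}"
echo "false -> ${result_false}"
echo "true  -> ${result_true}"
```

Because `${CLAMAV_NO_CLAMD:-}` expands to an empty string when the variable is unset, the comparison treats "unset" the same as "true", and only an explicit "false" leads to clamd actually being checked.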