-
Notifications
You must be signed in to change notification settings - Fork 615
Description
Output of failed builds can be cumbersome to use for debugging
- related: Update Compose reference API to v2.16.0 docs#16702 (comment)
- related: 'docker build' error: "failed to solve with frontend dockerfile.v0" #415
- related: grpc: make sure typed errors handler is installed moby/moby#42329 (comment)
- related: Proposal: allow sending "plain" build output to a file moby/buildkit#1837
Not sure if this would be best files in buildx or in buildkit, but posting it here for now, as buildx provides the "user-facing" side of this.
When debugging a failure in a PR for our documentation, I noticed that the error output of build (and even more for bake) is not very useful. I've also seen users report issues where they have difficulty finding the actual error that caused the build to fail, which resulted in them opening a bug-report in our issue tracker where (after further investigating) the failure was related to the command failing (so not a bug, but an issue with the command executed during the build).
While I was aware that the error can be quite verbose in itself (see #415 (comment)), and I knew that improvements were made to error messages (moby/moby#42329 (comment)), It just now fully dropped in what's missing here.
Ignoring earlier output for a minute (more on that later), this is the final error message presented to the user:
Error: buildx bake failed with: ERROR: failed to solve: process "/bin/sh -c set -eu\n CONFIG_FILES=\"_config.yml\"\n if [ \"${JEKYLL_ENV}\" = \"production\" ]; then\n CONFIG_FILES=\"${CONFIG_FILES},_config_production.yml\"\n elif [ \"${DOCS_URL}\" = \"https://docs-stage.docker.com\" ]; then\n CONFIG_FILES=\"${CONFIG_FILES},_config_stage.yml\"\n fi\n set -x\n bundle exec jekyll build --profile -d ${TARGET} --config ${CONFIG_FILES}\n" did not complete successfully: exit code: 1
That's a single line, with newlines and quotes encoded in them. Let's skip over that part for a bit, and assume that;
- The user understands the "encoded newline" part
- Knows about the convention (due to multiple components involved) to "wrap" (chain) errors ('docker build' error: "failed to solve with frontend dockerfile.v0" #415 (comment)), and is able to split those errors (:) to "unwrap" them;
With the above, the user would be able to format and break up the error (I numbered the (potential) wrapped errors);
1. Error:
2. buildx bake failed with:
3. ERROR:
4. failed to solve:
process
/bin/sh -c set -eu
CONFIG_FILES="_config.yml"
if [ "${JEKYLL_ENV}" = "production" ]; then
CONFIG_FILES="${CONFIG_FILES},_config_production.yml"
elif [ "${DOCS_URL}" = "https://docs-stage.docker.com" ]; then
CONFIG_FILES="${CONFIG_FILES},_config_stage.yml"
fi
set -x
bundle exec jekyll build --profile -d ${TARGET} --config ${CONFIG_FILES}
did not complete successfully:
5. exit code: 1
Again, assuming an experienced user, here's how I would interpret the output;
- An error occurred
- The error was caused by
build bake, which failed (again; experienced user, so I know to interpret this asdocker buildx bake(I useddocker buildx, notbuildx)); - It's en error
- Something failed in a BuildKit build ("failed to solve") (this one keeps returning a lot, but as a user I'm now familiar with that and can skip over that as "something went wrong during a BuildKit build")
- Process
<the command that was executed>did not complete successfully (I recognize this command, so something failed when executing the command) - The exist code of the command was non-zero (
1) which means: the process failed.
Formatting and "wrapping" aside, it occurred to me that this information on its own doesn't help me solve the problem; the information I obtained from this is;
(docker) buildx bakefailed- the step that failed was the
<process>output - the exit code was
1(generic error)
Error repeated multiple times
This failure happened as part of a GitHub action (full output below)
Some of this may be due to that environment, but reading the output, the error described above was repeated four times;
2023-02-09T16:30:24.3127061Z #15 ERROR: process "/bin/sh -c set -eu\n CONFIG_FILES=\"_config.yml\"\n if [ \"${JEKYLL_ENV}\" = \"production\" ]; then\n CONFIG_FILES=\"${CONFIG_FILES},_config_production.yml\"\n elif [ \"${DOCS_URL}\" = \"https://docs-stage.docker.com\" ]; then\n CONFIG_FILES=\"${CONFIG_FILES},_config_stage.yml\"\n fi\n set -x\n bundle exec jekyll build --profile -d ${TARGET} --config ${CONFIG_FILES}\n" did not complete successfully: exit code: 1
2023-02-09T16:30:24.3179019Z ------
2023-02-09T16:30:24.3180211Z > [generate 1/1] RUN --mount=type=bind,target=.,rw --mount=type=cache,target=/tmp/docker-docs-clone --mount=type=cache,target=/src/.jekyll-cache <<EOT (set -eu...):
2023-02-09T16:30:24.3188068Z ------
2023-02-09T16:30:24.3188728Z Dockerfile:52
2023-02-09T16:30:24.3189131Z --------------------
2023-02-09T16:30:24.3189679Z 51 | ENV TARGET=/out
2023-02-09T16:30:24.3190137Z 52 | >>> RUN --mount=type=bind,target=.,rw \
2023-02-09T16:30:24.3190877Z 53 | >>> --mount=type=cache,target=/tmp/docker-docs-clone \
2023-02-09T16:30:24.3191421Z 54 | >>> --mount=type=cache,target=/src/.jekyll-cache <<EOT
2023-02-09T16:30:24.3192058Z 55 | >>> set -eu
2023-02-09T16:30:24.3192424Z 56 | >>> CONFIG_FILES="_config.yml"
2023-02-09T16:30:24.3193044Z 57 | >>> if [ "${JEKYLL_ENV}" = "production" ]; then
2023-02-09T16:30:24.3193484Z 58 | >>> CONFIG_FILES="${CONFIG_FILES},_config_production.yml"
2023-02-09T16:30:24.3194267Z 59 | >>> elif [ "${DOCS_URL}" = "https://docs-stage.docker.com" ]; then
2023-02-09T16:30:24.3194752Z 60 | >>> CONFIG_FILES="${CONFIG_FILES},_config_stage.yml"
2023-02-09T16:30:24.3195334Z 61 | >>> fi
2023-02-09T16:30:24.3195680Z 62 | >>> set -x
2023-02-09T16:30:24.3196407Z 63 | >>> bundle exec jekyll build --profile -d ${TARGET} --config ${CONFIG_FILES}
2023-02-09T16:30:24.3196820Z 64 | >>> EOT
2023-02-09T16:30:24.3197330Z 65 |
2023-02-09T16:30:24.3197708Z --------------------
2023-02-09T16:30:24.3199208Z ERROR: failed to solve: process "/bin/sh -c set -eu\n CONFIG_FILES=\"_config.yml\"\n if [ \"${JEKYLL_ENV}\" = \"production\" ]; then\n CONFIG_FILES=\"${CONFIG_FILES},_config_production.yml\"\n elif [ \"${DOCS_URL}\" = \"https://docs-stage.docker.com\" ]; then\n CONFIG_FILES=\"${CONFIG_FILES},_config_stage.yml\"\n fi\n set -x\n bundle exec jekyll build --profile -d ${TARGET} --config ${CONFIG_FILES}\n" did not complete successfully: exit code: 1
2023-02-09T16:30:24.3758604Z ##[error]buildx bake failed with: ERROR: failed to solve: process "/bin/sh -c set -eu\n CONFIG_FILES=\"_config.yml\"\n if [ \"${JEKYLL_ENV}\" = \"production\" ]; then\n CONFIG_FILES=\"${CONFIG_FILES},_config_production.yml\"\n elif [ \"${DOCS_URL}\" = \"https://docs-stage.docker.com\" ]; then\n CONFIG_FILES=\"${CONFIG_FILES},_config_stage.yml\"\n fi\n set -x\n bundle exec jekyll build --profile -d ${TARGET} --config ${CONFIG_FILES}\n" did not complete successfully: exit code: 1
The above situation is the "happy" path, as this failure happened in CI with --progress=plain, which means I can read back all output. Debugging becomes more problematic when I use (e.g.) docker compose. When I run the build with docker compose, the "pretty" output is used as a default, which starts okay;
Details
docker compose up --build
[+] Building 37.8s (12/17)
=> [internal] load build definition from Dockerfile 0.1s
=> => transferring dockerfile: 3.85kB 0.0s
=> [internal] load .dockerignore 0.1s
=> => transferring context: 196B 0.0s
=> resolve image config for docker.io/docker/dockerfile:1 1.3s
=> docker-image://docker.io/docker/dockerfile:1@sha256:d2d74ff22a0e47b21f4bbde337e2ac4cd0a02a2226ef79264878db3dc7e87df8 1.4s
=> => resolve docker.io/docker/dockerfile:1@sha256:d2d74ff22a0e47b21f4bbde337e2ac4cd0a02a2226ef79264878db3dc7e87df8 0.0s
=> => sha256:d2d74ff22a0e47b21f4bbde337e2ac4cd0a02a2226ef79264878db3dc7e87df8 7.65kB / 7.65kB 0.0s
=> => sha256:6107562e90a8e27b67c589b63fce32010ddf37f48b701600cfd405e296651a5d 482B / 482B 0.0s
=> => sha256:89543fde88d7f1fb9104eb8832cfe0e33bbacf255e98e916338940bbd657754f 2.91kB / 2.91kB 0.0s
=> => sha256:dd092abd7f3683f4e8e7a66e770a1cc279b2132ac7f66d3c11b7d4a0cb529b7d 11.55MB / 11.55MB 0.3s
=> => extracting sha256:dd092abd7f3683f4e8e7a66e770a1cc279b2132ac7f66d3c11b7d4a0cb529b7d 0.9s
=> [internal] load metadata for docker.io/library/nginx:alpine 0.9s
=> [internal] load metadata for docker.io/library/ruby:2.7.6-alpine 1.3s
=> [base 1/3] FROM docker.io/library/ruby:2.7.6-alpine@sha256:61da89ad73dc0e19221ba3fee68e831aff22089e5322e55a0eba09ee80bbc850 11.4s
=> => resolve docker.io/library/ruby:2.7.6-alpine@sha256:61da89ad73dc0e19221ba3fee68e831aff22089e5322e55a0eba09ee80bbc850 0.1s
=> => sha256:8678c5e3f378f4dc40d7100b7c38cb5ced80581afa2906bffd41dde5240b554e 1.36kB / 1.36kB 0.0s
=> => sha256:99b63c4cd90b661ca24bbe1c49c5103a46afa66f0920530d5c35f25aed5a693d 6.27kB / 6.27kB 0.0s
=> => sha256:ca7dd9ec2225f2385955c43b2379305acd51543c28cf1d4e94522b3d94cce3ce 2.81MB / 2.81MB 0.3s
=> => sha256:61da89ad73dc0e19221ba3fee68e831aff22089e5322e55a0eba09ee80bbc850 1.65kB / 1.65kB 0.0s
=> => sha256:fe71bb0ecfee9a49c659caea9c562189308c32261544479dd7babb052d24a6e3 3.85MB / 3.85MB 0.6s
=> => sha256:134ec6373770556ccee80d88fc0a52d205074ceb254c4d68c04595025db9787b 223B / 223B 0.5s
=> => extracting sha256:ca7dd9ec2225f2385955c43b2379305acd51543c28cf1d4e94522b3d94cce3ce 1.4s
=> => sha256:7fb77b9dabe834d23877a6e6e75df43eb733177aae51f99dcac42a5f45e3db24 13.89MB / 13.89MB 2.4s
=> => sha256:a0ee32b8e964a1879aee1936fea2d33cc73d27c232446de6a73da2e8210c1d2d 172B / 172B 0.8s
=> => extracting sha256:fe71bb0ecfee9a49c659caea9c562189308c32261544479dd7babb052d24a6e3 2.8s
=> => extracting sha256:134ec6373770556ccee80d88fc0a52d205074ceb254c4d68c04595025db9787b 0.0s
=> => extracting sha256:7fb77b9dabe834d23877a6e6e75df43eb733177aae51f99dcac42a5f45e3db24 4.3s
=> => extracting sha256:a0ee32b8e964a1879aee1936fea2d33cc73d27c232446de6a73da2e8210c1d2d 0.0s
=> [internal] load build context 31.5s
=> => transferring context: 755.77MB 31.4s
=> CACHED [deploy 1/3] FROM docker.io/library/nginx:alpine@sha256:659610aadb34b7967dea7686926fdcf08d588a71c5121edb094ce0e4cdbc45e6 0.0s
=> [base 2/3] WORKDIR /src 0.2s
=> [base 3/3] RUN apk add --no-cache bash build-base git 12.4s
=> [gem 1/2] COPY Gemfile* . 0.5s
=> [gem 2/2] RUN gem uninstall -aIx bundler && gem install bundler -v 2.3.13 && bundle config set force_ruby_platform true && bundle install 1.2sBut after the build failed, an attempt is made to show the command's failure output (and probably the error about what line in the Dockerfile failed), but either compose (or my terminal) limits history, so now I end up with;
- partial output of the failed command
- no error about the failure location in the Dockerfile, so I don't have proper context where the failure happened (unless I deserialize the 1-line error message at the end)
What has been done so far to improve this?
We made improvements in build output to help localize the part of the Dockerfile that produced the error (moby/moby#42329 (comment)). I think this was a great addition, especially when working with more complex Dockerfiles (which may have multiple sections with similar commands);
2023-02-09T16:30:24.3188068Z ------
2023-02-09T16:30:24.3188728Z Dockerfile:52
2023-02-09T16:30:24.3189131Z --------------------
2023-02-09T16:30:24.3189679Z 51 | ENV TARGET=/out
2023-02-09T16:30:24.3190137Z 52 | >>> RUN --mount=type=bind,target=.,rw \
2023-02-09T16:30:24.3190877Z 53 | >>> --mount=type=cache,target=/tmp/docker-docs-clone \
2023-02-09T16:30:24.3191421Z 54 | >>> --mount=type=cache,target=/src/.jekyll-cache <<EOT
2023-02-09T16:30:24.3192058Z 55 | >>> set -eu
2023-02-09T16:30:24.3192424Z 56 | >>> CONFIG_FILES="_config.yml"
2023-02-09T16:30:24.3193044Z 57 | >>> if [ "${JEKYLL_ENV}" = "production" ]; then
2023-02-09T16:30:24.3193484Z 58 | >>> CONFIG_FILES="${CONFIG_FILES},_config_production.yml"
2023-02-09T16:30:24.3194267Z 59 | >>> elif [ "${DOCS_URL}" = "https://docs-stage.docker.com" ]; then
2023-02-09T16:30:24.3194752Z 60 | >>> CONFIG_FILES="${CONFIG_FILES},_config_stage.yml"
2023-02-09T16:30:24.3195334Z 61 | >>> fi
2023-02-09T16:30:24.3195680Z 62 | >>> set -x
2023-02-09T16:30:24.3196407Z 63 | >>> bundle exec jekyll build --profile -d ${TARGET} --config ${CONFIG_FILES}
2023-02-09T16:30:24.3196820Z 64 | >>> EOT
2023-02-09T16:30:24.3197330Z 65 |
2023-02-09T16:30:24.3197708Z --------------------
However, while that helps me locate the offending part of the Dockerfile, it does not provide me the information to solve the problem. Finding the associated error from the command is cumbersome (especially)
Solutions
I don't have an immediate solution for all of this, but some features may help.
Ability to read back build logs
For debugging, it would help to be able to read back the full output of the build. This would help debugging the failed build, especially in the docker compose example mentioned above, where the output is otherwise truncated;
For a complex Dockerfile, this may still be limiting (perhaps I want to only show the output of the failed buildstep ?), so perhaps a way to filter would be great
Replay the build (interactively?)
I saw some work was done on a "history API":
Gonna be honest; I have no idea what's stored in that API, but if that feature means we keep all the output, then it would be great to expose that through a command, and being able to (interactively?) replay the build (without actually re-running the build). Basically a "debug" option that allows me to "step" through the build to navigate the output (step over stage x, y, step into stage z, and scroll through the output).