Skip to content

Output of failed builds can be cumbersome to use for debugging #1619

@thaJeztah

Description

@thaJeztah

Output of failed builds can be cumbersome to use for debugging

Not sure if this would be best files in buildx or in buildkit, but posting it here for now, as buildx provides the "user-facing" side of this.

When debugging a failure in a PR for our documentation, I noticed that the error output of build (and even more for bake) is not very useful. I've also seen users report issues where they have difficulty finding the actual error that caused the build to fail, which resulted in them opening a bug-report in our issue tracker where (after further investigating) the failure was related to the command failing (so not a bug, but an issue with the command executed during the build).

While I was aware that the error can be quite verbose in itself (see #415 (comment)), and I knew that improvements were made to error messages (moby/moby#42329 (comment)), It just now fully dropped in what's missing here.

Ignoring earlier output for a minute (more on that later), this is the final error message presented to the user:

Error: buildx bake failed with: ERROR: failed to solve: process "/bin/sh -c   set -eu\n  CONFIG_FILES=\"_config.yml\"\n  if [ \"${JEKYLL_ENV}\" = \"production\" ]; then\n    CONFIG_FILES=\"${CONFIG_FILES},_config_production.yml\"\n  elif [ \"${DOCS_URL}\" = \"https://docs-stage.docker.com\" ]; then\n    CONFIG_FILES=\"${CONFIG_FILES},_config_stage.yml\"\n  fi\n  set -x\n  bundle exec jekyll build --profile -d ${TARGET} --config ${CONFIG_FILES}\n" did not complete successfully: exit code: 1

That's a single line, with newlines and quotes encoded in them. Let's skip over that part for a bit, and assume that;

With the above, the user would be able to format and break up the error (I numbered the (potential) wrapped errors);

1. Error:
2.   buildx bake failed with:
3.    ERROR:
4.      failed to solve:
          process 
              /bin/sh -c   set -eu
              CONFIG_FILES="_config.yml"
              if [ "${JEKYLL_ENV}" = "production" ]; then
                  CONFIG_FILES="${CONFIG_FILES},_config_production.yml"
              elif [ "${DOCS_URL}" = "https://docs-stage.docker.com" ]; then
                  CONFIG_FILES="${CONFIG_FILES},_config_stage.yml"
              fi
              set -x
              bundle exec jekyll build --profile -d ${TARGET} --config ${CONFIG_FILES}
          did not complete successfully:
5.          exit code: 1

Again, assuming an experienced user, here's how I would interpret the output;

  1. An error occurred
  2. The error was caused by build bake, which failed (again; experienced user, so I know to interpret this as docker buildx bake (I used docker buildx, not buildx));
  3. It's en error
  4. Something failed in a BuildKit build ("failed to solve") (this one keeps returning a lot, but as a user I'm now familiar with that and can skip over that as "something went wrong during a BuildKit build")
  5. Process <the command that was executed> did not complete successfully (I recognize this command, so something failed when executing the command)
  6. The exist code of the command was non-zero (1) which means: the process failed.

Formatting and "wrapping" aside, it occurred to me that this information on its own doesn't help me solve the problem; the information I obtained from this is;

  • (docker) buildx bake failed
  • the step that failed was the <process> output
  • the exit code was 1 (generic error)

Error repeated multiple times

This failure happened as part of a GitHub action (full output below)

Some of this may be due to that environment, but reading the output, the error described above was repeated four times;

2023-02-09T16:30:24.3127061Z #15 ERROR: process "/bin/sh -c   set -eu\n  CONFIG_FILES=\"_config.yml\"\n  if [ \"${JEKYLL_ENV}\" = \"production\" ]; then\n    CONFIG_FILES=\"${CONFIG_FILES},_config_production.yml\"\n  elif [ \"${DOCS_URL}\" = \"https://docs-stage.docker.com\" ]; then\n    CONFIG_FILES=\"${CONFIG_FILES},_config_stage.yml\"\n  fi\n  set -x\n  bundle exec jekyll build --profile -d ${TARGET} --config ${CONFIG_FILES}\n" did not complete successfully: exit code: 1
2023-02-09T16:30:24.3179019Z ------
2023-02-09T16:30:24.3180211Z  > [generate 1/1] RUN --mount=type=bind,target=.,rw     --mount=type=cache,target=/tmp/docker-docs-clone     --mount=type=cache,target=/src/.jekyll-cache <<EOT (set -eu...):
2023-02-09T16:30:24.3188068Z ------
2023-02-09T16:30:24.3188728Z Dockerfile:52
2023-02-09T16:30:24.3189131Z --------------------
2023-02-09T16:30:24.3189679Z   51 |     ENV TARGET=/out
2023-02-09T16:30:24.3190137Z   52 | >>> RUN --mount=type=bind,target=.,rw \
2023-02-09T16:30:24.3190877Z   53 | >>>     --mount=type=cache,target=/tmp/docker-docs-clone \
2023-02-09T16:30:24.3191421Z   54 | >>>     --mount=type=cache,target=/src/.jekyll-cache <<EOT
2023-02-09T16:30:24.3192058Z   55 | >>>   set -eu
2023-02-09T16:30:24.3192424Z   56 | >>>   CONFIG_FILES="_config.yml"
2023-02-09T16:30:24.3193044Z   57 | >>>   if [ "${JEKYLL_ENV}" = "production" ]; then
2023-02-09T16:30:24.3193484Z   58 | >>>     CONFIG_FILES="${CONFIG_FILES},_config_production.yml"
2023-02-09T16:30:24.3194267Z   59 | >>>   elif [ "${DOCS_URL}" = "https://docs-stage.docker.com" ]; then
2023-02-09T16:30:24.3194752Z   60 | >>>     CONFIG_FILES="${CONFIG_FILES},_config_stage.yml"
2023-02-09T16:30:24.3195334Z   61 | >>>   fi
2023-02-09T16:30:24.3195680Z   62 | >>>   set -x
2023-02-09T16:30:24.3196407Z   63 | >>>   bundle exec jekyll build --profile -d ${TARGET} --config ${CONFIG_FILES}
2023-02-09T16:30:24.3196820Z   64 | >>> EOT
2023-02-09T16:30:24.3197330Z   65 |     
2023-02-09T16:30:24.3197708Z --------------------
2023-02-09T16:30:24.3199208Z ERROR: failed to solve: process "/bin/sh -c   set -eu\n  CONFIG_FILES=\"_config.yml\"\n  if [ \"${JEKYLL_ENV}\" = \"production\" ]; then\n    CONFIG_FILES=\"${CONFIG_FILES},_config_production.yml\"\n  elif [ \"${DOCS_URL}\" = \"https://docs-stage.docker.com\" ]; then\n    CONFIG_FILES=\"${CONFIG_FILES},_config_stage.yml\"\n  fi\n  set -x\n  bundle exec jekyll build --profile -d ${TARGET} --config ${CONFIG_FILES}\n" did not complete successfully: exit code: 1
2023-02-09T16:30:24.3758604Z ##[error]buildx bake failed with: ERROR: failed to solve: process "/bin/sh -c   set -eu\n  CONFIG_FILES=\"_config.yml\"\n  if [ \"${JEKYLL_ENV}\" = \"production\" ]; then\n    CONFIG_FILES=\"${CONFIG_FILES},_config_production.yml\"\n  elif [ \"${DOCS_URL}\" = \"https://docs-stage.docker.com\" ]; then\n    CONFIG_FILES=\"${CONFIG_FILES},_config_stage.yml\"\n  fi\n  set -x\n  bundle exec jekyll build --profile -d ${TARGET} --config ${CONFIG_FILES}\n" did not complete successfully: exit code: 1

The above situation is the "happy" path, as this failure happened in CI with --progress=plain, which means I can read back all output. Debugging becomes more problematic when I use (e.g.) docker compose. When I run the build with docker compose, the "pretty" output is used as a default, which starts okay;

Details
docker compose up --build
[+] Building 37.8s (12/17)
 => [internal] load build definition from Dockerfile                                                                                                 0.1s
 => => transferring dockerfile: 3.85kB                                                                                                               0.0s
 => [internal] load .dockerignore                                                                                                                    0.1s
 => => transferring context: 196B                                                                                                                    0.0s
 => resolve image config for docker.io/docker/dockerfile:1                                                                                           1.3s
 => docker-image://docker.io/docker/dockerfile:1@sha256:d2d74ff22a0e47b21f4bbde337e2ac4cd0a02a2226ef79264878db3dc7e87df8                             1.4s
 => => resolve docker.io/docker/dockerfile:1@sha256:d2d74ff22a0e47b21f4bbde337e2ac4cd0a02a2226ef79264878db3dc7e87df8                                 0.0s
 => => sha256:d2d74ff22a0e47b21f4bbde337e2ac4cd0a02a2226ef79264878db3dc7e87df8 7.65kB / 7.65kB                                                       0.0s
 => => sha256:6107562e90a8e27b67c589b63fce32010ddf37f48b701600cfd405e296651a5d 482B / 482B                                                           0.0s
 => => sha256:89543fde88d7f1fb9104eb8832cfe0e33bbacf255e98e916338940bbd657754f 2.91kB / 2.91kB                                                       0.0s
 => => sha256:dd092abd7f3683f4e8e7a66e770a1cc279b2132ac7f66d3c11b7d4a0cb529b7d 11.55MB / 11.55MB                                                     0.3s
 => => extracting sha256:dd092abd7f3683f4e8e7a66e770a1cc279b2132ac7f66d3c11b7d4a0cb529b7d                                                            0.9s
 => [internal] load metadata for docker.io/library/nginx:alpine                                                                                      0.9s
 => [internal] load metadata for docker.io/library/ruby:2.7.6-alpine                                                                                 1.3s
 => [base 1/3] FROM docker.io/library/ruby:2.7.6-alpine@sha256:61da89ad73dc0e19221ba3fee68e831aff22089e5322e55a0eba09ee80bbc850                     11.4s
 => => resolve docker.io/library/ruby:2.7.6-alpine@sha256:61da89ad73dc0e19221ba3fee68e831aff22089e5322e55a0eba09ee80bbc850                           0.1s
 => => sha256:8678c5e3f378f4dc40d7100b7c38cb5ced80581afa2906bffd41dde5240b554e 1.36kB / 1.36kB                                                       0.0s
 => => sha256:99b63c4cd90b661ca24bbe1c49c5103a46afa66f0920530d5c35f25aed5a693d 6.27kB / 6.27kB                                                       0.0s
 => => sha256:ca7dd9ec2225f2385955c43b2379305acd51543c28cf1d4e94522b3d94cce3ce 2.81MB / 2.81MB                                                       0.3s
 => => sha256:61da89ad73dc0e19221ba3fee68e831aff22089e5322e55a0eba09ee80bbc850 1.65kB / 1.65kB                                                       0.0s
 => => sha256:fe71bb0ecfee9a49c659caea9c562189308c32261544479dd7babb052d24a6e3 3.85MB / 3.85MB                                                       0.6s
 => => sha256:134ec6373770556ccee80d88fc0a52d205074ceb254c4d68c04595025db9787b 223B / 223B                                                           0.5s
 => => extracting sha256:ca7dd9ec2225f2385955c43b2379305acd51543c28cf1d4e94522b3d94cce3ce                                                            1.4s
 => => sha256:7fb77b9dabe834d23877a6e6e75df43eb733177aae51f99dcac42a5f45e3db24 13.89MB / 13.89MB                                                     2.4s
 => => sha256:a0ee32b8e964a1879aee1936fea2d33cc73d27c232446de6a73da2e8210c1d2d 172B / 172B                                                           0.8s
 => => extracting sha256:fe71bb0ecfee9a49c659caea9c562189308c32261544479dd7babb052d24a6e3                                                            2.8s
 => => extracting sha256:134ec6373770556ccee80d88fc0a52d205074ceb254c4d68c04595025db9787b                                                            0.0s
 => => extracting sha256:7fb77b9dabe834d23877a6e6e75df43eb733177aae51f99dcac42a5f45e3db24                                                            4.3s
 => => extracting sha256:a0ee32b8e964a1879aee1936fea2d33cc73d27c232446de6a73da2e8210c1d2d                                                            0.0s
 => [internal] load build context                                                                                                                   31.5s
 => => transferring context: 755.77MB                                                                                                               31.4s
 => CACHED [deploy 1/3] FROM docker.io/library/nginx:alpine@sha256:659610aadb34b7967dea7686926fdcf08d588a71c5121edb094ce0e4cdbc45e6                  0.0s
 => [base 2/3] WORKDIR /src                                                                                                                          0.2s
 => [base 3/3] RUN apk add --no-cache bash build-base git                                                                                           12.4s
 => [gem 1/2] COPY Gemfile* .                                                                                                                        0.5s
 => [gem 2/2] RUN gem uninstall -aIx bundler   && gem install bundler -v 2.3.13   && bundle config set force_ruby_platform true   && bundle install  1.2s

But after the build failed, an attempt is made to show the command's failure output (and probably the error about what line in the Dockerfile failed), but either compose (or my terminal) limits history, so now I end up with;

  • partial output of the failed command
  • no error about the failure location in the Dockerfile, so I don't have proper context where the failure happened (unless I deserialize the 1-line error message at the end)

What has been done so far to improve this?

We made improvements in build output to help localize the part of the Dockerfile that produced the error (moby/moby#42329 (comment)). I think this was a great addition, especially when working with more complex Dockerfiles (which may have multiple sections with similar commands);

2023-02-09T16:30:24.3188068Z ------
2023-02-09T16:30:24.3188728Z Dockerfile:52
2023-02-09T16:30:24.3189131Z --------------------
2023-02-09T16:30:24.3189679Z   51 |     ENV TARGET=/out
2023-02-09T16:30:24.3190137Z   52 | >>> RUN --mount=type=bind,target=.,rw \
2023-02-09T16:30:24.3190877Z   53 | >>>     --mount=type=cache,target=/tmp/docker-docs-clone \
2023-02-09T16:30:24.3191421Z   54 | >>>     --mount=type=cache,target=/src/.jekyll-cache <<EOT
2023-02-09T16:30:24.3192058Z   55 | >>>   set -eu
2023-02-09T16:30:24.3192424Z   56 | >>>   CONFIG_FILES="_config.yml"
2023-02-09T16:30:24.3193044Z   57 | >>>   if [ "${JEKYLL_ENV}" = "production" ]; then
2023-02-09T16:30:24.3193484Z   58 | >>>     CONFIG_FILES="${CONFIG_FILES},_config_production.yml"
2023-02-09T16:30:24.3194267Z   59 | >>>   elif [ "${DOCS_URL}" = "https://docs-stage.docker.com" ]; then
2023-02-09T16:30:24.3194752Z   60 | >>>     CONFIG_FILES="${CONFIG_FILES},_config_stage.yml"
2023-02-09T16:30:24.3195334Z   61 | >>>   fi
2023-02-09T16:30:24.3195680Z   62 | >>>   set -x
2023-02-09T16:30:24.3196407Z   63 | >>>   bundle exec jekyll build --profile -d ${TARGET} --config ${CONFIG_FILES}
2023-02-09T16:30:24.3196820Z   64 | >>> EOT
2023-02-09T16:30:24.3197330Z   65 |     
2023-02-09T16:30:24.3197708Z --------------------

However, while that helps me locate the offending part of the Dockerfile, it does not provide me the information to solve the problem. Finding the associated error from the command is cumbersome (especially)

Solutions

I don't have an immediate solution for all of this, but some features may help.

Ability to read back build logs

For debugging, it would help to be able to read back the full output of the build. This would help debugging the failed build, especially in the docker compose example mentioned above, where the output is otherwise truncated;

For a complex Dockerfile, this may still be limiting (perhaps I want to only show the output of the failed buildstep ?), so perhaps a way to filter would be great

Replay the build (interactively?)

I saw some work was done on a "history API":

Gonna be honest; I have no idea what's stored in that API, but if that feature means we keep all the output, then it would be great to expose that through a command, and being able to (interactively?) replay the build (without actually re-running the build). Basically a "debug" option that allows me to "step" through the build to navigate the output (step over stage x, y, step into stage z, and scroll through the output).

Metadata

Metadata

Assignees

No one assigned

    Labels

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions