
Severe lack of information on how to build or use ZenDNN #12

@teodorkostov

Overview

Unfortunately, it seems that this project is intended for internal AMD use. To use it, I would apparently have to go to the desk of one of the maintainers and have a conversation about how to work with the framework. The build procedure is only conceptually documented: following it is not sufficient to compile the project. Many of the framework's options are not documented at all. There are no links from this repo to the official AMD project pages (https://www.amd.com/en/developer/zendnn.html and https://docs.amd.com/r/en-US/57300-ZenDNN-user-guide), not that those pages provide much valuable information either.

Build

Here is my experience with this framework:

  1. Opened the main README.md and followed the build instructions for v5.0.
  2. Got the AOCL-BLAS library and read its build instructions.

$ ./configure auto
$ make
$ export ZENDNN_BLIS_PATH=$(pwd)

  3. Got the Composable Kernel. A massive amount of warnings and improperly documented dependencies; however, I managed to build it.
  4. Got FBGEMM and built the appropriate version. Managed to build it with some help.
  5. With the paths configured, on to the main build.

export ZENDNN_BLIS_PATH=/build/blis
export DEPEND_ON_CK=1
export FBGEMM_ENABLE=1
export FBGEMM_PATH=/build/fbgemm

  6. It does not compile. Missing bfloat16, missing blis.h, missing cblas.h, missing functions like *_gelu_*, and so on.
  7. I am using a Zen 2 EPYC, so my configuration was auto-detected correctly; however, an additional zen2 path was added. I figured out that the blis paths had to be modified, and that ZENDNN_STANDALONE_BUILD had to be used.
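The exports in steps 2 and 5 are easy to get subtly wrong, and a stale path fails late and confusingly. A small helper that fails fast on a missing dependency checkout would have saved me time. This is just a sketch: the environment variable names come from the steps above, while the function name is my own.

```shell
# Sketch: set the ZenDNN v5.0 build environment, failing fast on bad paths.
# The variable names are from the build steps above; set_zendnn_env is mine.
set_zendnn_env() {
    local blis_dir="$1" fbgemm_dir="$2"
    [ -d "$blis_dir" ]   || { echo "no such directory: $blis_dir" >&2; return 1; }
    [ -d "$fbgemm_dir" ] || { echo "no such directory: $fbgemm_dir" >&2; return 1; }
    export ZENDNN_BLIS_PATH="$blis_dir"
    export FBGEMM_PATH="$fbgemm_dir"
    export DEPEND_ON_CK=1
    export FBGEMM_ENABLE=1
}
```

Usage would be something like `set_zendnn_env /build/blis /build/fbgemm && make`.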

The screenshot below is all the information there is on ZENDNN_STANDALONE_BUILD. By the way, that LP64 is a complete mystery. Never mind, let's go on.
[screenshot: ZENDNN_STANDALONE_BUILD documentation]
  8. Thought the missing cblas.h could be another dependency that is not properly described here. Plausible, judging from the other AMD projects.
  9. Could not get rid of the other errors. There were so many.
  10. Decided to ditch v5.0, go for v4.2, and use only the mandatory blis dependency.
  11. Followed the instructions again. The User Guide PDF is useless.
  12. ZenDNN does not compile.
  13. Disabled most functionality.
  14. Luckily the errors were now more concentrated, and I went searching for one of the undefined functions: aocl_gelu_tanh_f32.
  15. Found it in a PDF (of course).
  16. The PDF describes how to use an addon for blis. Great!
  17. Finally, I figured out that I was building the blis dependency the wrong way. Here is the correct configuration:

$ ./configure --prefix=/build/blis-dist -a aocl_gemm --enable-cblas --enable-threading=openmp auto
$ make
$ make install
$ cp /build/blis-dist/include/blis/* /build/blis-dist/include
  18. The install step is very important: it properly places all artifacts in the target directory.
  19. There is no mention that this project requires blis to be built with the --enable-cblas flag.
  20. There is no mention that the aocl_gemm addon must be enabled. That is almost impossible to figure out from the blis documentation as well: configure only explains how to enable addons, with no information on what the addons are or when to use them. Just folders with a bunch of code.
  -a NAME --enable-addon=NAME

                Enable the code provided by an addon. An addon consists
                of a separate directory of code that provides additional
                APIs, implementations, and/or operations that would
                otherwise not be present within a build of BLIS. This
                option may be used multiple times to specify the inclusion
                of multiple addons. By default, no addons are enabled.
  21. No mention here that threading has to be enabled in the blis library either.
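Since most of the errors above trace back to a BLIS tree built without --enable-cblas or without a proper `make install`, a pre-flight check before touching ZenDNN would have saved me days. A minimal sketch, assuming the /build/blis-dist layout from step 17 above; the function name and the exact checks are my own:

```shell
# Sketch: sanity-check a BLIS install prefix before building ZenDNN against it.
# check_blis_dist and the specific checks are mine, based on the layout above.
check_blis_dist() {
    local prefix="$1" status=0
    # cblas.h only exists when BLIS was configured with --enable-cblas
    [ -f "$prefix/include/cblas.h" ] || { echo "missing cblas.h: rebuild BLIS with --enable-cblas"; status=1; }
    # blis.h must sit directly under include/ (hence the cp from include/blis/)
    [ -f "$prefix/include/blis.h" ]  || { echo "missing blis.h: run 'make install' and copy include/blis/* up"; status=1; }
    # the library itself lands in lib/ only after 'make install'
    ls "$prefix"/lib/libblis.* >/dev/null 2>&1 || { echo "missing libblis: run 'make install'"; status=1; }
    return "$status"
}
```

Something like `check_blis_dist /build/blis-dist || exit 1` before exporting ZENDNN_BLIS_PATH would turn a cascade of cryptic compile errors into one actionable message.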

Usage

So, we have libamdZenDNN.so; what now? Well, not much. The integration seems quite poor. Again, following the User Guide, it seems that a wrapper has to be written for every model:

import torch
import zentorch  # importing the plugin registers the 'zentorch' backend

model = torch.compile(model, backend='zentorch')
with torch.no_grad():
    output = model(input)

Great! But that makes this entire effort practically useless. Maybe there would be value if I were creating a model from scratch.

Even the ONNX instructions are quite conceptual. The user guide mentions a binary release ONNXRT_v1.17.0_ZenDNN_v4.2_Python_v3.8.zip, but it is a mystery where this is to be found. At this point I am giving up.

Target result

There should be an easy-to-follow build guide. ALL tuning parameters and addons should be properly described, both here and in blis. All related pages should be interlinked.

A Dockerfile would be highly welcome: it would document exactly how this framework is built in a clean environment.
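For reference, here is the kind of Dockerfile I have in mind, reconstructed from the steps that finally worked for me above. Treat it as a sketch only: the base image, repository URLs, branch choices, and the final make invocation are my assumptions and would need to be pinned and verified by the maintainers.

```dockerfile
# Sketch only: reconstructed from the build steps in this issue.
# Base image, clone URLs, and the ZenDNN make line are assumptions.
FROM ubuntu:22.04

RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential git python3 ca-certificates && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /build

# BLIS with CBLAS, the aocl_gemm addon, and OpenMP threading (see step 17)
RUN git clone https://github.com/amd/blis.git && cd blis && \
    ./configure --prefix=/build/blis-dist -a aocl_gemm --enable-cblas \
        --enable-threading=openmp auto && \
    make && make install && \
    cp /build/blis-dist/include/blis/* /build/blis-dist/include

# ZenDNN itself, built against the BLIS tree prepared above
RUN git clone https://github.com/amd/ZenDNN.git && cd ZenDNN && \
    ZENDNN_BLIS_PATH=/build/blis-dist ZENDNN_STANDALONE_BUILD=1 make
```

Even if the maintainers never publish an image, keeping a file like this in the repo would serve as executable, always-up-to-date build documentation.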

It should be easier to enable ZenDNN optimizations for models without writing Python wrappers. ONNX is a move in the right direction; however, frameworks like vLLM are also getting a lot of traction.

Conclusion

I hope that this gives you a glimpse into the high complexity of your code base and the poor state of its documentation. Because of that, I have encountered opinions that AMD frameworks (like ROCm and ZenDNN) "do not compile" and that users should "just use NVIDIA". I hope that in the future you will make it easier for new users to get started with your projects, and improve the integration with AI frameworks and tools.
