@mfournial
Owner

⚠️ unexpectedly, this doesn't work at all.

  • clang needs -nocudalib when --target=wasm32, otherwise it won't do anything
  • clang compiles the wasm file correctly, I think, then tries to emit the CUDA stubs in a 4-stage process (.out, then .s, then ptxas, then a call to nvlink). It fails to emit the CUDA code at different stages depending on whether I try 32-bit or 64-bit NVPTX code:

NVPTX64

I think it produces a wrong .s file (the PTX assembly), because the third step, ptxas, complains with:

ptxas /tmp/pi_targ_v4-995fde.s, line 120; fatal   : Greater number of elements in array initializer
ptxas fatal   : Ptx assembly aborted due to errors

I haven't checked what the generated .s file looks like.

NVPTX

In principle, I'm running the following:

clang -O3 -fopenmp -fopenmp-targets=nvptx64 -ffast-math -lfaasmp -lm --sysroot=$SYSROOT -o pi_target_v4.wasm pi_targ_v4.c --target=wasm32 -nocudalib -I/vol/bitbucket/mmf115/local/llvm-7.0.0/lib/clang/7.0.0/include/ -I/usr/include/ -nostdinc

In reality, the sub-commands need to be run by hand, because the nvlink stage needs to be given -m 32 but clang does not pass it.

 "/vol/bitbucket/mmf115/local/llvm-8.0.0/bin/clang-8" -cc1 -triple wasm32 -emit-llvm-bc -emit-llvm-uselists -disable-free -disable-llvm-verifier -discard-value-names -main-file-name pi_targ_v4.c -mrelocation-model static -mthread-model single -menable-no-infs -menable-no-nans -menable-unsafe-fp-math -fno-signed-zeros -mreassociate -freciprocal-math -fno-trapping-math -ffp-contract=fast -ffast-math -ffinite-math-only -masm-verbose -mconstructor-aliases -fuse-init-array -target-cpu generic -fvisibility hidden -dwarf-column-info -debugger-tuning=gdb -momit-leaf-frame-pointer -v -nostdsysteminc -nobuiltininc -resource-dir /vol/bitbucket/mmf115/local/llvm-8.0.0/lib/clang/8.0.0 -I /vol/bitbucket/mmf115/local/llvm-7.0.0/lib/clang/7.0.0/include/ -I /usr/include/ -isysroot /vol/bitbucket/mmf115/local/wasi-sdk-8.0 -O3 -fdebug-compilation-dir /vol/bitbucket/mmf115/omp/playground/cuda/openmp-tutorial/Solutions/wow -ferror-limit 19 -fmessage-length 118 -fopenmp -fobjc-runtime=gnustep -fno-common -fdiagnostics-show-option -fcolor-diagnostics -vectorize-loops -vectorize-slp -o first.bc -x c pi_targ_v4.c -fopenmp-targets=nvptx

"/vol/bitbucket/mmf115/local/llvm-8.0.0/bin/clang-8" -cc1 -triple nvptx -aux-triple wasm32 -S -disable-free -disable-llvm-verifier -discard-value-names -main-file-name pi_targ_v4.c -mrelocation-model static -mthread-model posix -mdisable-fp-elim -menable-no-infs -menable-no-nans -menable-unsafe-fp-math -fno-signed-zeros -mreassociate -freciprocal-math -fno-trapping-math -ffp-contract=fast -ffast-math -ffinite-math-only -no-integrated-as -fuse-init-array -target-cpu sm_52 -dwarf-column-info -debugger-tuning=gdb -v -nostdsysteminc -nobuiltininc -resource-dir /vol/bitbucket/mmf115/local/llvm-8.0.0/lib/clang/8.0.0 -I /vol/bitbucket/mmf115/local/llvm-7.0.0/lib/clang/7.0.0/include/ -I /usr/include/ -isysroot /vol/bitbucket/mmf115/local/wasi-sdk-8.0 -O3 -fno-dwarf-directory-asm -fdebug-compilation-dir /vol/bitbucket/mmf115/omp/playground/cuda/openmp-tutorial/Solutions/wow -ferror-limit 19 -fmessage-length 118 -fopenmp -fobjc-runtime=gcc -fdiagnostics-show-option -fcolor-diagnostics -vectorize-loops -vectorize-slp -o second.s -x c pi_targ_v4.c -fopenmp-is-device -fopenmp-host-ir-file-path first.bc

"/vol/cuda/9.2.148/bin/ptxas" -m32 -O3 -v --gpu-name sm_52 --output-file third.cubin second.s -c

"/vol/cuda/9.2.148/bin/nvlink" -m 32 -o link.out -v -arch sm_52 -L/vol/bitbucket/mmf115/local/llvm-8.0.0/lib -lomptarget-nvptx third.cubin

# nvlink fatal   : Input file '/vol/bitbucket/mmf115/local/llvm-8.0.0/lib/libomptarget-nvptx.a:omptarget-nvptx_generated_cancel.cu.o' size does not match target '-m32'

I got this even after recompiling libomptarget with LIBOMP_ARCH set to X86. Clang has a bug here: its CMake checks are broken, so you need to change the value explicitly in the CMake file (set(LIBOMP_ARCH i386) did it). You then need to have a 32-bit compiler ready, which isn't the case on the lab machines.

Even after recompilation I got the same error. I didn't have a 32-bit compiler on my system (and needed to include all the 64-bit headers), which might be why this didn't work.

TODO

Can be done on any system

  • install a 32-bit compiler
wget https://releases.llvm.org/8.0.0/openmp-8.0.0.src.tar.xz
tar xf openmp-8.0.0.src.tar.xz
# modify cmake file in runtime as described above
mkdir target-8.0.0
mkdir llvm-8.0.0
cd target-8.0.0

cmake -G Ninja -DCMAKE_BUILD_TYPE=Release \
                         -DCMAKE_INSTALL_PREFIX=$(pwd)/../llvm-8.0.0 \
                         -DCMAKE_C_COMPILER=clang \
                         -DCMAKE_CXX_COMPILER=clang++ \
                         -DLIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=52 \
                         -DLIBOMP_ARCH=X86 \
                         ../openmp-8.0.0.src/

For reference, the extra flags used on the lab machine:

-DCMAKE_C_FLAGS="-I/usr/src/linux-headers-5.0.0-31/arch/x86/include/uapi/ -I/usr/lib/gcc/i686-w64-mingw32/7.3-posix/include/c++/i686-w64-mingw32/ -I/usr/src/linux-headers-5.0.0-37-generic/arch/x86/include/generated/uapi/" \
                         -DCMAKE_CXX_FLAGS="-I/usr/src/linux-headers-5.0.0-37-generic/arch/x86/include/generated/uapi/ -I/usr/src/linux-headers-5.0.0-31/arch/x86/include/uapi/ -I/usr/lib/gcc/i686-w64-mingw32/7.3-posix/include/c++/i686-w64-mingw32/" \

When I tried to ignore the nvlink fatal : Input file '/vol/bitbucket/mmf115/local/llvm-8.0.0/lib/libomptarget-nvptx.a:omptarget-nvptx_generated_cancel.cu.o' size does not match target '-m32' error, and simply removed this file to see what would happen, I got:

nvlink error   : Undefined reference to '__kmpc_spmd_kernel_init' in 'third.cubin'
nvlink error   : Undefined reference to '__kmpc_get_team_static_memory' in 'third.cubin'
nvlink error   : Undefined reference to '__kmpc_for_static_init_4' in 'third.cubin'
nvlink error   : Undefined reference to '__kmpc_for_static_fini' in 'third.cubin'
nvlink error   : Undefined reference to '__kmpc_global_thread_num' in 'third.cubin'
nvlink error   : Undefined reference to '__kmpc_nvptx_parallel_reduce_nowait_v2' in 'third.cubin'
nvlink error   : Undefined reference to '__kmpc_nvptx_end_reduce_nowait' in 'third.cubin'
nvlink error   : Undefined reference to '__kmpc_nvptx_teams_reduce_nowait_simple' in 'third.cubin'
nvlink error   : Undefined reference to '__kmpc_nvptx_teams_end_reduce_nowait_simple' in 'third.cubin'
nvlink error   : Undefined reference to '__kmpc_restore_team_static_memory' in 'third.cubin'
nvlink error   : Undefined reference to '__kmpc_spmd_kernel_deinit_v2' in 'third.cubin'
nvlink error   : Undefined reference to '__kmpc_shuffle_int64' in 'third.cubin'
nvlink error   : Undefined reference to '__kmpc_barrier' in 'third.cubin'

This means that if we can make that work and stitch the right things together (this should surely be in the faasm toolchain), the backend to libomptarget could be a multitenant-safe GPU thingy. Pretty cool.
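
For context on the undefined `__kmpc_*` references above, here is a minimal sketch of the kind of kernel that likely produces them. This is not the actual pi_targ_v4.c (its source isn't shown in this thread); the function name `compute_pi` and the loop shape are assumptions. A combined `target teams distribute parallel for` with a `reduction` clause is, as far as I understand, what makes clang emit the SPMD kernel init/deinit, static loop init/fini and parallel/teams reduction calls into the device runtime.

```c
/* Hypothetical stand-in for pi_targ_v4.c (assumption, not the real file):
 * midpoint-rule integration of 4 / (1 + x^2) over [0, 1]. */
double compute_pi(int num_steps)
{
    double step = 1.0 / (double)num_steps;
    double sum = 0.0;

    /* Offload the loop to the device. The reduction on `sum` is the part
     * that needs the runtime's reduce helpers on the GPU side.
     * Without -fopenmp the pragma is ignored and the loop runs serially
     * on the host, which still gives the same result. */
    #pragma omp target teams distribute parallel for \
        reduction(+:sum) map(tofrom:sum)
    for (int i = 0; i < num_steps; i++) {
        double x = ((double)i + 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
    }

    return step * sum;
}
```

Calling `compute_pi(100000)` from a `main` should return a value close to 3.14159. Since every one of those runtime entry points lives in libomptarget-nvptx, removing members from the archive just trades the `-m32` size mismatch for the link errors above.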

Shillaker and others added 12 commits March 23, 2020 19:12
* Removed legacy edge server

* Removed dependency on legacy edge
* Add println function to OMP [skip-ci]

This is temporary, helped the Rust POC

* Add lib-rust

* Remove forgotten declaration in libomp

* Rust notes

* Rust docs

* Small tidy up

* Added script for downloading minimal libs

* Rust test

* Dockerignore

* Remove script

Co-authored-by: Simon Shillaker <mail@simonshillaker.com>
* Consolidating folders

* Tidy up python libs func

* Rename rust lib dir

* Dockerignore
* Version bump

* Docker fixes
* LLVM version bump

* Recompiling the world with LLVM 10

* Updated PRK

* Fixed GOT test

* Recompiled some OMP functions

* Version bump
Might want to look at the Spectre/Meltdown implications of this.