Test OMP + GPU + Wasm #2

mfournial · 2020-03-10T01:33:26Z

⚠️ unexpectedly, this doesn't work at all.

Clang needs -nocudalib when --target=wasm32, otherwise it won't do anything.
Clang compiles the wasm file correctly, I think, and then tries to emit the CUDA stubs in a four-stage process (.out, then .s, then ptxas, then nvlink). Emitting the CUDA code fails at a different stage depending on whether I target 32-bit or 64-bit NVPTX:

NVPTX64

I think it produces a wrong .s file, because the third step (ptxas) complains with:

ptxas /tmp/pi_targ_v4-995fde.s, line 120; fatal   : Greater number of elements in array initializer
ptxas fatal   : Ptx assembly aborted due to errors

I haven't checked what the generated .s file looks like.
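A quick way to check is to print the lines around the position ptxas reports. The following is a minimal sketch, not something from the original run: `show_context` is a hypothetical helper name, and the file path and line number in the usage comment are simply copied from the error message above.

```shell
#!/bin/sh
# Hypothetical helper: print the lines around a given line number,
# prefixed with their line numbers, to inspect a location named in an
# error message.
# Usage: show_context FILE LINE [RADIUS]   (RADIUS defaults to 5)
show_context() {
    file=$1
    line=$2
    radius=${3:-5}
    start=$((line - radius))
    if [ "$start" -lt 1 ]; then
        start=1
    fi
    end=$((line + radius))
    awk -v s="$start" -v e="$end" \
        'NR >= s && NR <= e { printf "%6d  %s\n", NR, $0 }' "$file"
}

# Example (path and line taken from the ptxas message; the temporary
# file name will differ on every run):
# show_context /tmp/pi_targ_v4-995fde.s 120
```

If the ptxas message is accurate, the declaration near that line should carry an initializer with more elements than its declared size.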

NVPTX

In principle, this is the command being run:

clang -O3 -fopenmp -fopenmp-targets=nvptx64 -ffast-math -lfaasmp -lm --sysroot=$SYSROOT -o pi_target_v4.wasm pi_targ_v4.c --target=wasm32 -nocudalib -I/vol/bitbucket/mmf115/local/llvm-7.0.0/lib/clang/7.0.0/include/ -I/usr/include/ -nostdinc

In reality, the sub-commands have to be run by hand, because the nvlink stage needs to be given -m 32 but clang does not pass it.
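Replaying the sub-commands by hand is easier to follow when each stage is named and the first failure is reported. A minimal sketch, assuming a hypothetical helper `run_stage` (it is not part of clang or the CUDA toolkit); the commented usage lines reuse the ptxas and nvlink invocations from this description:

```shell
#!/bin/sh
# Hypothetical helper: echo a command, run it, and report the stage name
# if it fails. Returns the exit status of the command.
# Usage: run_stage NAME COMMAND [ARGS...]
run_stage() {
    stage_name=$1
    shift
    printf '[%s] %s\n' "$stage_name" "$*" >&2
    status=0
    "$@" || status=$?
    if [ "$status" -ne 0 ]; then
        printf 'stage "%s" failed with exit status %d\n' \
            "$stage_name" "$status" >&2
    fi
    return "$status"
}

# Example wiring for the last two stages (commented out because it needs
# the CUDA toolkit and the files produced by the earlier stages):
# run_stage ptxas /vol/cuda/9.2.148/bin/ptxas -m32 -O3 -v --gpu-name sm_52 \
#     --output-file third.cubin second.s -c &&
# run_stage nvlink /vol/cuda/9.2.148/bin/nvlink -m 32 -o link.out -v \
#     -arch sm_52 -L/vol/bitbucket/mmf115/local/llvm-8.0.0/lib \
#     -lomptarget-nvptx third.cubin
```

Chaining the calls with `&&` stops at the first failing stage, so the last message printed names the stage that broke.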

 "/vol/bitbucket/mmf115/local/llvm-8.0.0/bin/clang-8" -cc1 -triple wasm32 -emit-llvm-bc -emit-llvm-uselists -disable-free -disable-llvm-verifier -discard-value-names -main-file-name pi_targ_v4.c -mrelocation-model static -mthread-model single -menable-no-infs -menable-no-nans -menable-unsafe-fp-math -fno-signed-zeros -mreassociate -freciprocal-math -fno-trapping-math -ffp-contract=fast -ffast-math -ffinite-math-only -masm-verbose -mconstructor-aliases -fuse-init-array -target-cpu generic -fvisibility hidden -dwarf-column-info -debugger-tuning=gdb -momit-leaf-frame-pointer -v -nostdsysteminc -nobuiltininc -resource-dir /vol/bitbucket/mmf115/local/llvm-8.0.0/lib/clang/8.0.0 -I /vol/bitbucket/mmf115/local/llvm-7.0.0/lib/clang/7.0.0/include/ -I /usr/include/ -isysroot /vol/bitbucket/mmf115/local/wasi-sdk-8.0 -O3 -fdebug-compilation-dir /vol/bitbucket/mmf115/omp/playground/cuda/openmp-tutorial/Solutions/wow -ferror-limit 19 -fmessage-length 118 -fopenmp -fobjc-runtime=gnustep -fno-common -fdiagnostics-show-option -fcolor-diagnostics -vectorize-loops -vectorize-slp -o first.bc -x c pi_targ_v4.c -fopenmp-targets=nvptx

"/vol/bitbucket/mmf115/local/llvm-8.0.0/bin/clang-8" -cc1 -triple nvptx -aux-triple wasm32 -S -disable-free -disable-llvm-verifier -discard-value-names -main-file-name pi_targ_v4.c -mrelocation-model static -mthread-model posix -mdisable-fp-elim -menable-no-infs -menable-no-nans -menable-unsafe-fp-math -fno-signed-zeros -mreassociate -freciprocal-math -fno-trapping-math -ffp-contract=fast -ffast-math -ffinite-math-only -no-integrated-as -fuse-init-array -target-cpu sm_52 -dwarf-column-info -debugger-tuning=gdb -v -nostdsysteminc -nobuiltininc -resource-dir /vol/bitbucket/mmf115/local/llvm-8.0.0/lib/clang/8.0.0 -I /vol/bitbucket/mmf115/local/llvm-7.0.0/lib/clang/7.0.0/include/ -I /usr/include/ -isysroot /vol/bitbucket/mmf115/local/wasi-sdk-8.0 -O3 -fno-dwarf-directory-asm -fdebug-compilation-dir /vol/bitbucket/mmf115/omp/playground/cuda/openmp-tutorial/Solutions/wow -ferror-limit 19 -fmessage-length 118 -fopenmp -fobjc-runtime=gcc -fdiagnostics-show-option -fcolor-diagnostics -vectorize-loops -vectorize-slp -o second.s -x c pi_targ_v4.c -fopenmp-is-device -fopenmp-host-ir-file-path first.bc

"/vol/cuda/9.2.148/bin/ptxas" -m32 -O3 -v --gpu-name sm_52 --output-file third.cubin second.s -c

"/vol/cuda/9.2.148/bin/nvlink" -m 32 -o link.out -v -arch sm_52 -L/vol/bitbucket/mmf115/local/llvm-8.0.0/lib -lomptarget-nvptx third.cubin

# nvlink fatal   : Input file '/vol/bitbucket/mmf115/local/llvm-8.0.0/lib/libomptarget-nvptx.a:omptarget-nvptx_generated_cancel.cu.o' size does not match target '-m32'

I got this even after recompiling libomptarget with LIBOMP_ARCH set to X86. Clang has a bug here: its checks are broken, so you need to change the value explicitly in the CMake file (set(LIBOMP_ARCH i386) did it), and you then need a 32-bit compiler available, which isn't the case on the lab machines.

Even after recompilation I got the same error. I didn't have a 32-bit compiler on my system (and needed to include all the 64-bit headers), which might be why this didn't work.
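One way to confirm whether the rebuilt archive really contains 32-bit objects is to look at the ELF class byte of each member. This is a sketch under assumptions: it assumes the archive members are ELF files (as objects built on Linux normally are) and that the nvlink message refers to this 32/64-bit class; `elf_class` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: report the ELF class of a file.
# An ELF file starts with the bytes 0x7f 'E' 'L' 'F'; the next byte
# (EI_CLASS, offset 4) is 1 for 32-bit objects and 2 for 64-bit objects.
# Usage: elf_class FILE
elf_class() {
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo "not-elf"
        return 1
    fi
    class=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')
    case $class in
        1) echo "32-bit" ;;
        2) echo "64-bit" ;;
        *) echo "unknown" ;;
    esac
}

# Example: unpack the archive into a scratch directory and check every
# member (commented out because it needs the archive from this build):
# mkdir members && cd members
# ar x /vol/bitbucket/mmf115/local/llvm-8.0.0/lib/libomptarget-nvptx.a
# for f in *.o; do printf '%s: %s\n' "$f" "$(elf_class "$f")"; done
```

If every member still reports 64-bit after the rebuild, the LIBOMP_ARCH change did not reach the device library build.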

TODO

Can be done on any system

Install a 32-bit compiler, then rebuild the OpenMP runtime:

wget https://releases.llvm.org/8.0.0/openmp-8.0.0.src.tar.xz
tar xf openmp-8.0.0.src.tar.xz
# modify cmake file in runtime as described above
mkdir target-8.0.0
mkdir llvm-8.0.0
cd target-8.0.0

cmake -G Ninja -DCMAKE_BUILD_TYPE=Release \
                         -DCMAKE_INSTALL_PREFIX=$(pwd)/../llvm-8.0.0 \
                         -DCMAKE_C_COMPILER=clang \
                         -DCMAKE_CXX_COMPILER=clang++ \
                         -DLIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=52 \
                         -DLIBOMP_ARCH=X86 \
                         ../openmp-8.0.0.src/

For reference, the extra cmake flags used on the lab machine:

-DCMAKE_C_FLAGS="-I/usr/src/linux-headers-5.0.0-31/arch/x86/include/uapi/ -I/usr/lib/gcc/i686-w64-mingw32/7.3-posix/include/c++/i686-w64-mingw32/ -I/usr/src/linux-headers-5.0.0-37-generic/arch/x86/include/generated/uapi/" \
                         -DCMAKE_CXX_FLAGS="-I/usr/src/linux-headers-5.0.0-37-generic/arch/x86/include/generated/uapi/ -I/usr/src/linux-headers-5.0.0-31/arch/x86/include/uapi/ -I/usr/lib/gcc/i686-w64-mingw32/7.3-posix/include/c++/i686-w64-mingw32/" \

When I tried to ignore the nvlink fatal error (Input file '/vol/bitbucket/mmf115/local/llvm-8.0.0/lib/libomptarget-nvptx.a:omptarget-nvptx_generated_cancel.cu.o' size does not match target '-m32') and simply removed this file to see what would happen, I got:

nvlink error   : Undefined reference to '__kmpc_spmd_kernel_init' in 'third.cubin'
nvlink error   : Undefined reference to '__kmpc_get_team_static_memory' in 'third.cubin'
nvlink error   : Undefined reference to '__kmpc_for_static_init_4' in 'third.cubin'
nvlink error   : Undefined reference to '__kmpc_for_static_fini' in 'third.cubin'
nvlink error   : Undefined reference to '__kmpc_global_thread_num' in 'third.cubin'
nvlink error   : Undefined reference to '__kmpc_nvptx_parallel_reduce_nowait_v2' in 'third.cubin'
nvlink error   : Undefined reference to '__kmpc_nvptx_end_reduce_nowait' in 'third.cubin'
nvlink error   : Undefined reference to '__kmpc_nvptx_teams_reduce_nowait_simple' in 'third.cubin'
nvlink error   : Undefined reference to '__kmpc_nvptx_teams_end_reduce_nowait_simple' in 'third.cubin'
nvlink error   : Undefined reference to '__kmpc_restore_team_static_memory' in 'third.cubin'
nvlink error   : Undefined reference to '__kmpc_spmd_kernel_deinit_v2' in 'third.cubin'
nvlink error   : Undefined reference to '__kmpc_shuffle_int64' in 'third.cubin'
nvlink error   : Undefined reference to '__kmpc_barrier' in 'third.cubin'

This means that if we can make that work and stitch the right things together (this should surely be in the faasm toolchain), the backend to libomptarget could be a multi-tenant-safe GPU thingy. Pretty cool.
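The undefined references listed above are the device runtime entry points this particular kernel needs. As a small sketch for turning such a log into a plain symbol list (the helper name `undefined_symbols` is hypothetical, and the pattern only assumes the message format shown in the errors above):

```shell
#!/bin/sh
# Hypothetical helper: read an nvlink log on stdin and print each symbol
# named in an "Undefined reference to '...'" line, sorted and de-duplicated.
undefined_symbols() {
    sed -n "s/.*Undefined reference to '\([^']*\)'.*/\1/p" | sort -u
}

# Example (commented out because it needs the CUDA toolkit):
# /vol/cuda/9.2.148/bin/nvlink -m 32 -o link.out -v -arch sm_52 \
#     third.cubin 2>&1 | undefined_symbols
```

The resulting list gives a lower bound on what a replacement backend would have to implement for this kernel; other kernels may need more entry points.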

* Add println function to OMP [skip-ci] This is temporary, helped the Rust POC
* Add lib-rust
* Remove forgotten declaration in libomp
* Rust notes
* Rust docs
* Small tidy up
* Added script for downloading minimal libs
* Rust test
* Dockerignore
* Remove script

Co-authored-by: Simon Shillaker <mail@simonshillaker.com>

mfournial force-pushed the gpu-test branch from ac3ef57 to 8d2a0be on March 10, 2020 12:26

mfournial mentioned this pull request Mar 11, 2020

WIP: OMP + GPU + WASM faasm/faasm#197

Closed

mfournial force-pushed the gpu-test branch from fc5449b to 8a6b85b on March 23, 2020 17:10

Shillaker and others added 12 commits March 23, 2020 19:12

Remove legacy edge server (faasm#205)

633360e

* Removed legacy edge server
* Removed dependency on legacy edge

Directory consolidation (faasm#206)

0f40b38

* Consolidating folders
* Tidy up python libs func
* Rename rust lib dir
* Dockerignore

Release 0.0.10 (faasm#207)

bda6a44

* Version bump
* Docker fixes

Fix non-gzipped archives in releases (faasm#208)

39b17b2

This had me pulling my hair

Including host_interface.h in sysroot

7ea387e

Version bump (faasm#209)

04b8974

Upgrade to LLVM 10 (faasm#210)

e68df7b

* LLVM version bump
* Recompiling the world with LLVM 10
* Updated PRK
* Fixed GOT test
* Recompiled some OMP functions
* Version bump

Add GPU OMP stubs

846ed39

Implement wtime API

8f9780e

Might want to look at Spectre/Meltdown implication of this

Hack an implemenetation of CPU distribute teams

4cf8a52

Cleanup to allow merge

bbfa5f8

mfournial force-pushed the gpu-test branch from 8a6b85b to bbfa5f8 on March 30, 2020 11:33
