MPI speedup of kinetic integration #97

logan-nc · 2019-03-21T01:08:15Z

This is a direct continuation of #82.
I attempted to get that branch into develop without closing the PR thread, but git was too smart for me. Oh well.

At the opening of this new PR, the branch successfully improved the makefiles and enabled compiling DCON with or without OpenMP. The same treatment, using .F file extensions instead of .f, should probably be extended to STRIDE at some point, but this should not take priority over our main DCON & GPEC development. For convenience, I've directly copied the top-level outline from #82 below.

First, an overview information/computation flowchart:

The important points are,

- A DCON run can take a long time if calculating the kinetic terms.
  - This part is partly parallelized, but still loops serially over 100-500 radial points, filling in an MxM (M~100) complex matrix at each psi before forming it all into a spline in psi.
  - Better speed and/or restart capability would be good in this fourfit_kinetic_matrix subroutine.
- Otherwise, DCON is fast and we don't really need restarts.
- GPEC and DCON are separate for historical reasons and need to stay that way. However, users have expressed a desire for GPEC to (optionally) call DCON as a subroutine, eliminating the need for the slow and disk-space-intensive binary file interface.
- GPEC goes through some fast initial setup (reads euler.bin), and then basically runs a series of subroutines to output various quantities according to what flags the user set. These are mostly independent, and could be done in parallel. Many of them also contain big loops through (independent) flux surfaces, which could be done in parallel. But all use common global splines/variables and various subroutines that evaluate them (their "current" value is the value at the last requested psi, so evaluations cannot be mixed up!). Is there an easy way to send "snapshots" of these to independent parallel calculations so they do not mess each other up?
- I think restarts of GPEC can be easy: just record which subroutines have been completed, and do not repeat them if restarting.
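
The restart idea in the last point could be sketched roughly as follows. This is only a Python illustration of the bookkeeping, not GPEC code: the function names, the JSON record file, and the step names are all hypothetical, and a real implementation would presumably live in the Fortran driver.

```python
import json
import os


def load_done(path):
    """Return the set of step names recorded as finished (empty if no record exists)."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return set(json.load(f))


def run_steps(steps, path):
    """Run each (name, func) pair in order, skipping names already recorded in path.

    Returns the list of step names actually executed in this call.
    """
    done = load_done(path)
    executed = []
    for name, func in steps:
        if name in done:
            continue  # finished in an earlier run, so do not repeat it
        func()
        done.add(name)
        executed.append(name)
        # Rewrite the record after every step so a crash loses at most one step.
        with open(path, "w") as f:
            json.dump(sorted(done), f)
    return executed
```

Calling run_steps a second time with the same record file only runs the steps that did not complete the first time. This assumes each step is independent of in-memory results from skipped steps, which would need checking for the real output subroutines.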

To prioritize:

Make fourfit_kinetic_matrix parallel over psi steps in fourfit.f
Parallelize GPEC subroutines in gpec.f (should be easy)
Enable single exe GPEC, with an all in-memory DCON interface
Parallelize slow GPEC subroutines that loop over many independent psi in gpout.f
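
One way to picture the "snapshot" question for the psi-parallel items above is the toy Python sketch below. Everything in it is an assumption made for illustration: SplineState stands in for a global spline with a stored "current" value, kinetic_row stands in for the per-surface work, and threads are used only to show the pattern (they would not speed up CPU-bound Python). The point is that each psi step works on its own private copy, so evaluations cannot be mixed up.

```python
import copy
from concurrent.futures import ThreadPoolExecutor


class SplineState:
    """Toy stand-in for a global spline whose 'current' value is the last psi evaluated."""

    def __init__(self, coeffs):
        self.coeffs = list(coeffs)
        self.current = None

    def evaluate(self, psi):
        # Horner evaluation of a polynomial; the result is stored on the object,
        # mimicking a module-level variable that later code reads.
        value = 0.0
        for c in reversed(self.coeffs):
            value = value * psi + c
        self.current = value


def kinetic_row(state, psi):
    """Hypothetical per-surface work: evaluate the spline, then read its 'current' value."""
    state.evaluate(psi)
    return psi, 2.0 * state.current


def parallel_over_psi(shared_state, psi_values, workers=4):
    """Give every psi step a private deep copy of the state and gather results in psi order."""
    def task(psi):
        snapshot = copy.deepcopy(shared_state)  # private copy: no cross-talk between steps
        return kinetic_row(snapshot, psi)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(task, psi_values))
```

The shared state is never modified, and results come back in the order of psi_values, so they could be assembled into the spline in psi afterwards. The cost is one copy of the state per step, which may or may not be acceptable for the real splines.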

This is really just a minor commit to see if this branch still lives separately and can maintain its open PR on github.

logan-nc · 2019-04-25T20:22:43Z

@stephethier I have a compiling question for you.
Installing GPEC at ASDEX Upgrade, I found it compiled fine but running the executables failed to find linked libraries. I fixed this by hardcoding the library directories into the LD_LIBRARY_PATH in my gpec modulefile but was told "this is not done anymore".

It was recommended to use something like

-Wl,-rpath=$(NETCDFHOME)/lib

in my makefiles.

Some light googling got me here (https://gcc.gnu.org/ml/gcc-help/2005-12/msg00017.html), where it is claimed that this syntax is only valid for a subset of compilers and that Portland Group, for example, uses an alternate syntax.

Do you know how to properly handle this in a way that maintains the portability you've helped us restore?

stephethier · 2019-04-26T11:58:13Z

Hi Nick,

It should be "-L${NETCDFDIR}/lib" rather than "-Wl,-rpath".

Cheers,

Stephane


krystophny · 2019-04-26T16:08:53Z

Hi!

I believe those are two different settings. -L$LIBPATH is only a link-time setting used when building, while -Wl,-rpath also affects the runtime lookup of the library path, much like setting $LD_LIBRARY_PATH. It could be that the rpath= form does not work with certain versions of gcc; in that case, replace the "=" with another ",":

gcc -Wl,-rpath,$LIBPATH

see also https://stackoverflow.com/questions/6562403/i-dont-understand-wl-rpath-wl

Best,

Chris
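
To make the difference between the two flags concrete, here is a small Python sketch of how a build script might assemble them. The helper link_flags and the path /opt/netcdf/lib are hypothetical and not part of the GPEC makefiles; the flag spellings are the gcc-style ones discussed above, and other compilers may need a different syntax.

```python
def link_flags(lib_dirs, libs, rpath=True):
    """Build linker arguments for a gcc-style compiler driver.

    -L<dir> only tells the linker where to search at build time.
    -Wl,-rpath,<dir> additionally records <dir> in the executable,
    so the runtime loader can find the shared library without LD_LIBRARY_PATH.
    """
    flags = []
    for d in lib_dirs:
        flags.append("-L" + d)
        if rpath:
            flags.append("-Wl,-rpath," + d)
    # Library names come last, after all search directories.
    flags.extend("-l" + name for name in libs)
    return flags


if __name__ == "__main__":
    # Hypothetical install location, for illustration only.
    print(" ".join(link_flags(["/opt/netcdf/lib"], ["netcdf"])))
```

With rpath=False the result contains only the -L and -l entries, which corresponds to the case where the build succeeds but the executable cannot find its libraries at runtime.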

logan-nc · 2019-04-26T16:54:23Z

Yep. @krystophny has the right of it.
We currently use -L$LIBPATH and it works fine on portal, iris (General Atomics), KSTAR, etc.
However, when we do module load intel mkl; make on the tok cluster (ASDEX Upgrade) and then immediately test the exe on an example it bombs at runtime due to unfound libs. That is, their modules do not put the libs in the users path and -L$LIBPATH does not tell the runtime exe where to find them.

I agree with the internet that this is silly and that 1) -L$LIBPATH SHOULD tell the runtime exe where to look and 2) the commands to do so SHOULD be standard... but I don't know what the best practice is given that these 2 things are not true.

Right now, I manually add $LIBPATH to LD_LIBRARY_PATH in the gpec modulefile so anyone who does module load gpec can run our exe that was compiled with -L$LIBPATH. This seems dirty. I welcome suggestions.

logan-nc · 2019-04-26T16:57:05Z

On a different note, has @stephethier made any progress diagnosing the code speed for sticking points?
I have a vague memory of setting up a fast example for you... did I misremember that? Do you need anything from me?

logan-nc · 2020-02-11T21:56:18Z

This branch was never utilized. Ongoing speedup work will be concentrated in #116.

INSTALL - Minor - Adds DEFAULTS.inc instructions for OMPFLAG

53bc470


logan-nc requested a review from stephethier March 21, 2019 01:08

logan-nc self-assigned this Mar 21, 2019

logan-nc added the enhancement label Mar 21, 2019

logan-nc mentioned this pull request Feb 11, 2020

Adds OpenACC for optional GPU utilization #116

Open

logan-nc closed this Feb 11, 2020

logan-nc deleted the cppg_modernization branch February 11, 2020 21:57
