Dynamic Load Balancing with MPI
alexei-matveev/dlb
# DYNAMIC LOAD BALANCING #
The library built in this sub-folder (called libdlb.a) takes care of
dynamic load balancing. The library works solely on task identifiers
(task IDs). It is the responsibility of the calling program to launch
the actual tasks belonging to these identifiers.

The library is written in Fortran and requires a Fortran compiler
and an MPI-2 library. The MPI-2 communication of this library is
designed to work independently of the rest of the program, so it is
not necessary to synchronize it with the main program. Additional
requirements depend on the variant that is chosen.

There are three different cases for the distribution of the tasks and,
in addition, four different variants of the library. The choice of
variant should be guided by the hardware and software on which the
library will be used. Every variant can be combined with every task
distribution.
# HOW DLB WORKS #
DLB keeps track of the available job numbers. On request it hands
several of them to a processor to work on. It makes sure that each
job number is handed out exactly once. If a processor runs out of jobs,
it can, for DLB_VARIANT > 0, ask the other processors to give it some
of theirs.
# CASES OF DISTRIBUTION #
There are three different cases for the task distribution. They
differ in the information available about the tasks or in the initial
distribution.

1. All jobs are equal in name; each processor starts with a
   consecutive range of tasks.
2. Each job is assigned a color at initialization; for efficiency,
   jobs with the same color are expected to have consecutive
   numbers.
3. The jobs are equal in name, but the initial distribution of the
   tasks is such that, for n processors, each of them gets every n'th
   task. This interface was designed for tasks that are ordered
   by size or expected cost, starting with the larger ones.

There are 8 functions altogether: two are common to all cases, and
each of the three cases has its own pair.
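The difference between the initial distributions of cases 1 and 3 can be illustrated with a small stand-alone program. This is only a sketch: it does not call DLB, the even block split is an assumption and need not match what DLB does internally, and the kind `ik` is a stand-in for idlb_kind.

```fortran
program initial_distribution_sketch
  ! Illustration only: lists the task IDs each rank would start with
  ! under a block (case 1) and a round robin (case 3) distribution.
  ! The even block split is an assumption, not taken from the DLB sources.
  implicit none
  integer, parameter :: ik = selected_int_kind(18)  ! stand-in for idlb_kind
  integer(ik), parameter :: njobs = 10, nprocs = 3
  integer(ik) :: rank, first, last, i

  do rank = 0, nprocs - 1
     ! Case 1: consecutive block, the remainder goes to the first ranks.
     first = rank * (njobs / nprocs) + min(rank, mod(njobs, nprocs))
     last = first + njobs / nprocs
     if (rank < mod(njobs, nprocs)) last = last + 1
     print '(a,i0,a,i0,a,i0)', "block, rank ", rank, ": tasks ", first + 1, " to ", last

     ! Case 3: every nprocs'th task, starting at rank + 1.
     write (*, '(a,i0,a)', advance="no") "round robin, rank ", rank, ": tasks"
     do i = rank + 1, njobs, nprocs
        write (*, '(1x,i0)', advance="no") i
     end do
     write (*, *)
  end do
end program initial_distribution_sketch
```

With tasks sorted by decreasing cost, the round robin start gives every rank a mix of expensive and cheap tasks, whereas the block start puts all expensive tasks on the first ranks.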
# DIFFERENT VARIANTS #
Which variant of the DLB implementation is used is selected by
DLB_VARIANT.

DLB_VARIANT = 0: a static distribution without any communication
between the processors. Intended mainly for debugging.

DLB_VARIANT = 1: a remote memory access (RMA) method. When RMA is
truly asynchronous, or done by the hardware, this method can be
very effective.

DLB_VARIANT = 2 and DLB_VARIANT = 3: both methods use threads to
run the DLB algorithm. They need a higher level of thread safety
than plain MPI_INIT() provides, so MPI should be initialized with
MPI_INIT_THREAD: variant 2 needs at least MPI_THREAD_SERIALIZED,
while variant 3 needs MPI_THREAD_MULTIPLE. Both start additional
threads that handle the MPI communication between the processors.
Variant 3 relies on good thread handling for blocking MPI calls.
If the communication thread alternates nicely with the working
thread, this method should perform better than variant 2, where the
alternation is done by hand. However, MPI implementations often
poll aggressively in blocking calls, in which case variant 3 can
easily double the run time. Both variants also require POSIX
Threads.
# INTERFACE #
All functions have to be called on all processors. The two common
functions are dlb_init() and dlb_finalize(). They have to surround
the use of the other functions. MPI has to be initialized before
dlb_init() is called, and dlb_finalize() has to be called before
MPI_FINALIZE(). Unlike their MPI counterparts, DLB may be finalized
and initialized again between uses. The function dlb_init() takes the
argument world, which should be an MPI communicator; DLB makes a
private copy of it. Thus the calculations with DLB should be
surrounded by:

    call dlb_init(world)
    ...
    call dlb_finalize()

The calculation itself is driven by the two functions dlb_setup() and
dlb_give_more(), by their counterparts with colors, dlb_setup_color()
and dlb_give_more_color(), or by the round robin pair dlb_setup_rr()
and dlb_give_more_rr(). Several such calculations may be performed
between the two common functions, and they may use different
interfaces. The only restrictions are that a new calculation must not
be started inside the previous one (one calculation has to be able to
finish before the next one starts) and that interfaces must not be
mixed within one calculation.
The general interface:

dlb_setup() takes the number of available jobs as its argument. It
already performs a static distribution.

dlb_give_more() has two arguments. The first, MAXJOBS, is the
maximum number of job indices requested at once. The second is the
interval jobs, which the processor should work on next: the task IDs
from jobs(1) + 1 to jobs(2), inclusive. dlb_give_more() is a function
that returns true as long as there are jobs and false if there are
none. In the latter case jobs(1) >= jobs(2) holds as well. If it
returns true, the interval contains at least one job. It may contain
fewer than MAXJOBS jobs if the processor itself has only a few left,
regardless of how many jobs remain on all processors together. If it
returns false, all processors have terminated (except for
DLB_VARIANT = 0). Until the termination has been confirmed, the call
waits inside dlb_give_more().
Usage:

    call dlb_setup(NJOBS)
    do while (dlb_give_more (MAXJOBS, jobs))
    ...

Here NJOBS and MAXJOBS are integers of kind idlb_kind, and jobs(2)
is an integer array of kind idlb_kind. DLB provides the specific
integer kind idlb_kind, which is currently an 8-byte integer. As
this may change, declare your integers with idlb_kind so that the
code keeps working with the current setting.
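How the interval returned in jobs is meant to be read can be shown without DLB itself. In the sketch below the interval is made up (in a real program dlb_give_more() would fill it in), and the kind `ik` is a stand-in for idlb_kind.

```fortran
program job_interval_sketch
  ! Illustration only: the interval is hard-coded here; in a real
  ! program it would be set by dlb_give_more().
  implicit none
  integer, parameter :: ik = selected_int_kind(18)  ! stand-in for idlb_kind
  integer(ik) :: jobs(2), i

  jobs = [4_ik, 7_ik]

  ! The batch holds jobs(2) - jobs(1) task IDs ...
  print '(a,i0)', "number of jobs in this batch: ", jobs(2) - jobs(1)

  ! ... namely jobs(1) + 1 up to and including jobs(2).
  do i = jobs(1) + 1, jobs(2)
     print '(a,i0)', "would run task ", i
  end do
end program job_interval_sketch
```

Because the lower bound is exclusive, an empty batch with jobs(1) >= jobs(2) makes the loop body run zero times.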
The variant with colors:

In the color case the color of a job is taken into account as well.
dlb_setup_color() takes a distribution over the colors: an integer
array whose i'th element holds the number of jobs of color i.
dlb_give_more_color() returns the color in addition to the jobs. It
also ensures that all jobs returned by one call have the same color.

Usage:

    call dlb_setup_color(distr)
    do while (dlb_give_more_color (MAXJOBS, color, jobs))
    ...

Here distr is an integer(idlb_kind) array of size (many_colors),
MAXJOBS and color are integers of kind idlb_kind, and jobs(2) is an
integer array of kind idlb_kind.
The variant with round robin start distribution:

The third case uses a round robin initial distribution. Thus every
process gets every n'th task, where n is the number of processes.
There is a significant difference between dlb_give_more() and
dlb_give_more_rr(): the output jobs has a different size and a
different meaning. For dlb_give_more_rr() the slice of task IDs it
describes is jobs(1) + 1 : jobs(2) : jobs(3), where jobs(3) is the
stride between successive job IDs. The array jobs therefore needs
three elements.

Usage:

    call dlb_setup_rr(NJOBS)
    do while (dlb_give_more_rr (MAXJOBS, jobs))
    ...
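The strided slice described for dlb_give_more_rr() can be unpacked as in the following stand-alone sketch. The values in jobs are made up for illustration (a real program would get them from dlb_give_more_rr()), and the kind `ik` is a stand-in for idlb_kind.

```fortran
program strided_slice_sketch
  ! Illustration only: a hard-coded slice in the layout
  ! jobs(1) + 1 : jobs(2) : jobs(3) (lower bound exclusive, stride last).
  implicit none
  integer, parameter :: ik = selected_int_kind(18)  ! stand-in for idlb_kind
  integer(ik) :: jobs(3), i

  jobs = [1_ik, 10_ik, 3_ik]

  ! Visit every jobs(3)'th task ID, starting at jobs(1) + 1 and not
  ! going past jobs(2).
  do i = jobs(1) + 1, jobs(2), jobs(3)
     print '(a,i0)', "would run task ", i
  end do
end program strided_slice_sketch
```

Declaring the array with only two elements, as for dlb_give_more(), would leave no room for the stride.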
Additional functions:

Additionally there is a function for printing statistics. Besides
printing output directly (see OUTPUT LEVEL below), it is also possible
to print the statistics in summarized form with the function
dlb_print_statistics(). The output of this function is independent of
the output level of the DLB module; the amount of output is
controlled by the level argument passed to it. The function needs
some reduce operations, so it can cause some overhead. For
level = 0 this step is omitted as well and dlb_print_statistics()
returns immediately. For the static back-end of DLB only levels 0
and 1 give reasonable results. The statistics function expects a
4-byte integer.

The output selected by a level always includes that of all smaller
levels:

    level | new output
    ------+------------------------------------------------------------
    0     | None
    1     | SUM(time spent in dlb_give_more()) [last call listed separately]
    2     | Approximate time spent waiting for new tasks
    3     | Time spent outside DLB for the last two batches
    4     | Statistics about task length (time between dlb_give_more()
          | calls) + complete time between dlb_setup() and the finish
          | of DLB

Usage:

    call dlb_print_statistics(level)

Here level is a 4-byte integer.
# DLB AND THREAD SAFETY #
Different implementations have different requirements concerning
thread safety, especially as some of the implementations rely
explicitly on threads. MPI defines several thread support levels
(MPI_THREAD_SINGLE, MPI_THREAD_FUNNELED, MPI_THREAD_SERIALIZED and
MPI_THREAD_MULTIPLE), which tell what can be expected in that regard
from the current implementation. Each DLB variant reports the thread
level it requires in the variable DLB_THREAD_REQUIRED, which is also
handed over to the general DLB wrapper. Be aware that this thread
level covers only the usage of DLB itself. It assumes that DLB is the
only part doing message passing while it is active. If the code that
uses DLB has thread level requirements of its own, the higher of the
two levels has to be used, which causes no problems for DLB. If the
parts of the code that use DLB also contain message passing, the
level might even have to be raised further. This does not affect
MPI_THREAD_SINGLE and might not affect MPI_THREAD_FUNNELED, but
MPI_THREAD_SERIALIZED should then be raised to MPI_THREAD_MULTIPLE.
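Since MPI may grant a lower thread level than the one requested, it can be worth checking the result of MPI_INIT_THREAD before DLB is initialized. The following is a minimal sketch using only standard MPI calls; the requested level is an assumption matching DLB_VARIANT = 2, and the program and variable names are chosen for illustration.

```fortran
program thread_level_check
  ! Sketch: request a thread level and verify what MPI actually grants.
  ! MPI_THREAD_SERIALIZED is assumed here (DLB_VARIANT = 2); use
  ! MPI_THREAD_MULTIPLE for DLB_VARIANT = 3.
  use mpi
  implicit none
  integer :: required, provided, ierr

  required = MPI_THREAD_SERIALIZED
  call MPI_INIT_THREAD(required, provided, ierr)

  ! The MPI thread level constants are ordered from MPI_THREAD_SINGLE
  ! up to MPI_THREAD_MULTIPLE, so they can be compared directly.
  if (provided < required) then
     print *, "MPI thread support too low: got", provided, "need", required
     call MPI_ABORT(MPI_COMM_WORLD, 1, ierr)
  end if

  ! dlb_init(), the DLB calculations and dlb_finalize() would go here.

  call MPI_FINALIZE(ierr)
end program thread_level_check
```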
# EXAMPLE #

    program dlb_example
      use mpi
      use dlb
      implicit none
      integer :: ierr, prov
      integer(idlb_kind) :: i
      integer(idlb_kind) :: NJOBS, MAXJOBS, jobs(2), jobs_rr(3)
      integer(idlb_kind) :: distr(4), color

      ! Request the highest thread level; prov holds the level MPI grants
      call MPI_INIT_THREAD(MPI_THREAD_MULTIPLE, prov, ierr)
      call dlb_init(MPI_COMM_WORLD)

      ! General interface: 20 equal jobs, at most 2 per request
      NJOBS = 20
      MAXJOBS = 2
      call dlb_setup(NJOBS)
      do while (dlb_give_more (MAXJOBS, jobs))
         do i = jobs(1) + 1, jobs(2)
            print *, "Doing job", i
         enddo
      enddo

      ! Color interface: distr(c) is the number of jobs of color c
      MAXJOBS = 3
      distr(1) = 10
      distr(2) = 13
      distr(3) = 1
      distr(4) = 9
      call dlb_setup_color(distr)
      do while (dlb_give_more_color (MAXJOBS, color, jobs))
         do i = jobs(1) + 1, jobs(2)
            print *, "Doing job", i, "of color", color
         enddo
      enddo

      ! Round robin interface: the third element of jobs_rr is the stride
      call dlb_setup_rr(NJOBS)
      do while (dlb_give_more_rr (MAXJOBS, jobs_rr))
         do i = jobs_rr(1) + 1, jobs_rr(2), jobs_rr(3)
            print *, "Doing job", i
         enddo
      enddo

      call dlb_finalize()
      call MPI_FINALIZE(ierr)
    end program dlb_example
# COMPILATION #
If DLB is used outside of the ParaGauss repository, set DLB_EXTERNAL
= 1 at the top of the DLB Makefile.
## OUTPUT LEVEL ##
DLB is able to run without producing output. It can also provide
information and statistics about its behavior, up to a very detailed
level. What is printed is selected by the output level, which is
defined by the parameter OUTPUT_BORDER in the Makefile. This output
is independent of the output of dlb_print_statistics(). Be aware
that every process creates its own output independently of the others
(except for OUTPUT_BORDER <= 1), so only use the higher levels if you
know that the I/O routines are thread safe.

OUTPUT_BORDER = 0: no output at all

OUTPUT_BORDER = 1: the DLB variant in use is printed at
initialization

OUTPUT_BORDER = 2: the DLB variants also provide summarized
statistics about the run of DLB

OUTPUT_BORDER > 2: additional output with time stamps appears at
selected places