Multiple processor crash #10

@cianciosa

Joachim Geiger has reported a crash when running with multiple processors. The input files in Cases.zip show the behavior. `input.crashes` uses an extended number of modes and crashes with a heap-buffer-overflow error when run with more than one processor. `input.works` is the same case with a reduced number of modes; this case does not exhibit the behavior. The crash was reported using the ifort compiler; however, I was able to reproduce it by turning on the address-sanitizer flag.

% mpirun -n 4 xvmec input.crashes_3    
 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
  VMEC OUTPUT FILES ALREADY EXIST: OVERWRITING THEM ...
  SEQ =    1 TIME SLICE  0.0000E+00
  PROCESSING INPUT.crashes_3
  THIS IS PARVMEC (PARALLEL VMEC), VERSION 9.0
  Lambda: Full Radial Mesh. L-Force: hybrid full/half.

  COMPUTER: cianciosaimac   OS: Darwin   RELEASE: 19.6.0  DATE = Jan 21,2021  TIME = 12:52:34

  NS =    8 NO. FOURIER MODES =  185 FTOLV =  1.000E-06 NITER =  20000
  PROCESSOR COUNT - RADIAL:    4
 INITIAL JACOBIAN CHANGED SIGN!
 TRYING TO IMPROVE INITIAL MAGNETIC AXIS GUESS
  ---- Improved AXIS Guess ----
      RAXIS_CC =    5.5423259209884730       0.30747882334706500        3.6107777297953697E-002   2.1925887832076173E-002 -0.17127515915757005       0.33995876393572677        2.7194580396712614E-002   8.7619938032124662E-003   2.1641584886036458E-002  -3.0060375964156970E-002   4.0919407891436034E-003   7.2283631622133112E-003  -4.8096045954452264E-003   3.2132317238919464E-003   1.3366337123433408E-003  -5.0218208257885189E-003  -1.0805539441867496E-003   3.8372284158438586E-004   1.2322391511445112E-003   8.2564184559682900E-004   9.0462982158830627E-003
      ZAXIS_CS =   -0.0000000000000000      -0.40364620347171476       -2.6212416249487239E-002   2.5845975128812093E-002  0.15344591155188636      -0.27210128536906603       -2.4819582171628708E-002  -7.6814873421304332E-003  -2.2282872186040290E-002   1.9170323502591072E-002  -1.1569841914854002E-002  -6.1298139436995875E-004  -2.6220827681052326E-003  -5.6155647985143900E-003  -3.0101401187663541E-003  -8.9905949402988867E-003  -4.8346291121438923E-003  -5.7954765825185117E-003   8.0075797167838414E-003  -3.0281697953424324E-003  -3.8957154711619243E-003
  -----------------------------
=================================================================
=================================================================
=================================================================
=================================================================
==55382==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6180000077c0 at pc 0x000102552fed bp 0x7ffeed759dc0 sp 0x7ffeed759db8
==55380==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6180000077c0 at pc 0x00010ec8dfed bp 0x7ffee101edc0 sp 0x7ffee101edb8
READ of size 8 at 0x6180000077c0 thread T0
READ of size 8 at 0x6180000077c0 thread T0
==55383==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6180000077c0 at pc 0x00010c003fed bp 0x7ffee3ca8dc0 sp 0x7ffee3ca8db8
READ of size 8 at 0x6180000077c0 thread T0
==55381==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6180000077c0 at pc 0x00010d992fed bp 0x7ffee2319dc0 sp 0x7ffee2319db8
READ of size 8 at 0x6180000077c0 thread T0
    #0 0x10ec8dfec in __blocktridiagonalsolver_bst_MOD_initialize_bst blocktridiagonalsolver_bst.f90:2005
    #1 0x10f09af06 in runvmec_ runvmec.f:329
    #2 0x10ebdf804 in MAIN__ vmec.f:333
    #3 0x10ebe1818 in main vmec.f:2
    #0 0x10c003fec in __blocktridiagonalsolver_bst_MOD_initialize_bst blocktridiagonalsolver_bst.f90:2005
    #1 0x10c410f06 in runvmec_ runvmec.f:329
    #2 0x10bf55804 in MAIN__ vmec.f:333
    #3 0x10bf57818 in main vmec.f:2
    #4 0x7fff6fbc6cc8 in start (libdyld.dylib:x86_64+0x1acc8)

0x6180000077c0 is located 0 bytes to the right of 832-byte region [0x618000007480,0x6180000077c0)
allocated by thread T0 here:
    #4 0x7fff6fbc6cc8 in start (libdyld.dylib:x86_64+0x1acc8)

0x6180000077c0 is located 0 bytes to the right of 832-byte region [0x618000007480,0x6180000077c0)
allocated by thread T0 here:
    #0 0x10d992fec in __blocktridiagonalsolver_bst_MOD_initialize_bst blocktridiagonalsolver_bst.f90:2005
    #1 0x10dd9ff06 in runvmec_ runvmec.f:329
    #2 0x10d8e4804 in MAIN__ vmec.f:333
    #3 0x10d8e6818 in main vmec.f:2
    #0 0x113a341ad in wrap_malloc (libasan.5.dylib:x86_64+0x6c1ad)
    #1 0x10ec8d21b in __blocktridiagonalsolver_bst_MOD_initialize_bst blocktridiagonalsolver_bst.f90:2002
    #2 0x10f09af06 in runvmec_ runvmec.f:329
    #3 0x10ebdf804 in MAIN__ vmec.f:333
    #4 0x10ebe1818 in main vmec.f:2
    #5 0x7fff6fbc6cc8 in start (libdyld.dylib:x86_64+0x1acc8)

    #4 0x7fff6fbc6cc8 in start (libdyld.dylib:x86_64+0x1acc8)

0x6180000077c0 is located 0 bytes to the right of 832-byte region [0x618000007480,0x6180000077c0)
SUMMARY: AddressSanitizer: heap-buffer-overflow blocktridiagonalsolver_bst.f90:2005 in __blocktridiagonalsolver_bst_MOD_initialize_bst
allocated by thread T0 here:
Shadow bytes around the buggy address:
  0x1c3000000ea0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c3000000eb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c3000000ec0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c3000000ed0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c3000000ee0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x1c3000000ef0: 00 00 00 00 00 00 00 00[fa]fa fa fa fa fa fa fa
  0x1c3000000f00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c3000000f10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c3000000f20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c3000000f30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c3000000f40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==55380==ABORTING

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
#0  0x1136dc72c
#1  0x1136dbad3
#2  0x7fff6fdbf5fc
    #0 0x11128d1ad in wrap_malloc (libasan.5.dylib:x86_64+0x6c1ad)
    #1 0x10c00321b in __blocktridiagonalsolver_bst_MOD_initialize_bst blocktridiagonalsolver_bst.f90:2002
    #2 0x10c410f06 in runvmec_ runvmec.f:329
    #3 0x10bf55804 in MAIN__ vmec.f:333
    #4 0x10bf57818 in main vmec.f:2
    #5 0x7fff6fbc6cc8 in start (libdyld.dylib:x86_64+0x1acc8)

SUMMARY: AddressSanitizer: heap-buffer-overflow blocktridiagonalsolver_bst.f90:2005 in __blocktridiagonalsolver_bst_MOD_initialize_bst
Shadow bytes around the buggy address:
  0x1c3000000ea0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c3000000eb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c3000000ec0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c3000000ed0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c3000000ee0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x1c3000000ef0: 00 00 00 00 00 00 00 00[fa]fa fa fa fa fa fa fa
  0x1c3000000f00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c3000000f10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c3000000f20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c3000000f30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c3000000f40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==55383==ABORTING

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
#0  0x10d8f072c
#1  0x10d8efad3
#2  0x7fff6fdbf5fc
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 55380 on node cianciosaimac exited on signal 6 (Abort trap: 6).
--------------------------------------------------------------------------
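
The numbers in the AddressSanitizer report can be cross-checked with a little arithmetic. The sketch below only uses values copied from the log above. It assumes the array allocated at `blocktridiagonalsolver_bst.f90:2002` holds 8-byte elements (suggested by "READ of size 8", but not confirmed from the source), and that the shadow mapping is `(addr >> 3) + 2**44`, which matches the shadow addresses printed in this report. If both assumptions hold, the read at line 2005 touches the element immediately after the last one of a 104-element allocation, which would point to an off-by-one in the index or the allocated size.

```python
# Values copied verbatim from the AddressSanitizer report.
fault_addr = 0x6180000077C0      # "heap-buffer-overflow on address ..."
region_start = 0x618000007480    # "[0x618000007480,0x6180000077c0)"
region_end = 0x6180000077C0      # exclusive upper bound of the region
read_size = 8                    # "READ of size 8"

region_bytes = region_end - region_start
# Assumption: the array elements are 8 bytes wide, like the faulting read.
elements = region_bytes // read_size
offset_past_end = fault_addr - region_end

# Assumption: shadow = (addr >> 3) + 2**44; one shadow byte covers
# 8 application bytes, as stated in the shadow byte legend.
SHADOW_OFFSET = 1 << 44
shadow_addr = (fault_addr >> 3) + SHADOW_OFFSET

print(f"region size: {region_bytes} bytes = {elements} elements of {read_size} bytes")
print(f"read starts {offset_past_end} bytes past the end of the region")
print(f"shadow byte address: {shadow_addr:#x}")
```

The computed shadow address falls in the row marked `=>` in the shadow dump, at the position of the bracketed `[fa]` byte, so the region size and fault address in the report are consistent with each other.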

Labels: bug
