[Openmd-users] Debugging

Dan Gezelter gezelter at nd.edu
Tue Jul 20 14:41:52 EDT 2010


Brett,

  A random ordering of the objects in the dump file is normal behavior for parallel runs.  Molecules are handed out to the processors using a Monte Carlo algorithm to balance the load, and the objects present on each processor have object indices that are effectively randomized.

When a dump file is written, the processors are polled for their object information, and the file is written in processor order, so the ordering in the StuntDouble section is effectively randomized molecule-by-molecule.  This is why you'll see a short run of sequential object indices (the atoms in a single molecule) that then jumps discontinuously to another range (the atoms of the next molecule on that processor).

The first field on each line of the StuntDouble section is an object index that can be used to sort the information.  The DumpReader code reads and handles the randomly-ordered objects, and it is built in to all of the included analysis programs (StaticProps, DynamicProps, Dump2XYZ).  If you want to use your own analysis code, we'll need to write something that reads an entire frame at once and then re-orders the objects for you based on this index.
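
For illustration, a rough sketch (in Python, not part of OpenMD) of that kind of per-frame re-ordering is below.  It assumes each frame's StuntDouble block is delimited by literal <StuntDoubles> and </StuntDoubles> tags; check your .dump files and adjust those markers if they differ.

import sys

def sort_stuntdoubles(lines):
    # Buffer each StuntDouble block and re-emit it sorted by the first
    # (integer) field, which is the object index; everything outside the
    # block is passed through unchanged.
    out, block, in_block = [], [], False
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("<StuntDoubles>"):
            in_block = True
            out.append(line)
        elif stripped.startswith("</StuntDoubles>"):
            block.sort(key=lambda l: int(l.split()[0]))
            out.extend(block)
            block, in_block = [], False
            out.append(line)
        elif in_block:
            if stripped:              # skip any blank lines inside the block
                block.append(line)
        else:
            out.append(line)
    return out

if __name__ == "__main__":
    with open(sys.argv[1]) as f:
        sys.stdout.writelines(sort_stuntdoubles(f.readlines()))

Saved as, say, sort_dump.py, it would be run as "python sort_dump.py Membrane_Therm_8prc.dump > sorted.dump"; each frame's block is sorted independently, so the rest of the file is left untouched.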

*Missing* objects would be a much more serious problem, since that would point to a failure in the communication between processors.  That said, none of the dump files you sent me appear to have missing objects (except for Membrane_Therm_failed_8prc.dump, which has an incomplete final frame).  Dump2XYZ can process the rest of them just fine, which means the object count in each frame matches the number of objects expected from the MetaData section.

So, this doesn't seem connected to the run-time issues.  I'm trying to run your failed 8-processor job to recreate the run-time problem.  My best suggestion is to build OpenMD with debugging mode turned on and to use this:

mpirun -np 8 xterm -e gdb openmd_MPI

(Eight xterm windows will appear on your screen, one per process.)

In each of the windows you can then do "run Membrane_Therm_8prc.md"

When you hit the error, run "bt" in the debugger to get a backtrace, which will help us figure out what's going on.
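
Putting those steps together, the session in each window will look roughly like:

(gdb) run Membrane_Therm_8prc.md
  ... the run proceeds until it hits the error ...
(gdb) bt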

 --Dan



On Jul 20, 2010, at 10:49 AM, Brett Donovan wrote:

> Hi Dan
> 
> I note that the team here at Soton have sent an email regarding jobs
> that have been "bailing". I have been running a small system of 128
> lipids in both parallel and serial to compare the outcomes. The job
> usually completes without a runtime error for short time spans [I
> have seen larger jobs, both spatially and temporally, bail]. However,
> the output .dump file behaves strangely when more than one processor
> is used. The serial run (i.e. one processor, still using openmd_MPI
> called from mpirun) returns the expected output, but when I scale up
> to several processors with the same call, the .dump file shows
> unexpected output: atoms are missing, and sometimes the ordering is
> scrambled. I have enclosed a number of .dump files to illustrate this
> effect. I have tried up to 256 processors to see if there is any
> correlation, along with a few repeats, which may help uncover some
> issues. I have no idea whether these run-time issues and the output
> are connected; however, from the simulation point of view, some of
> the .stat files appear to be doing the right things.
> 
> Running in verbose mode, I have not discovered run-time errors.
> 
> I have enclosed the .frc and .md files, and the .dump files [which in
> some cases are for 100000 and in others for 10000 timesteps].
> 
> Hopefully you will be notified about this by Humyo.com.
> 
> Best wishes
> 
> Brett

***********************************************
  J. Daniel Gezelter
  Associate Professor of Chemistry
  Director of Graduate Admissions
  Department of Chemistry and Biochemistry
  251 Nieuwland Science Hall
  University of Notre Dame
  Notre Dame, IN 46556-5670

  phone:  +1 (574) 631-7595
  fax:    +1 (574) 631-6652
  e-mail: gezelter at nd.edu
  web:    http://www.nd.edu/~gezelter
************************************************




