Reproducibility in parallel OpenMD simulations

There’s an interesting issue with how OpenMD distributes load on parallel (MPI) architectures. At the very beginning of a parallel simulation, the molecules are distributed among the processors using a short Monte Carlo procedure to divide the labor. This ensures that each processor has an approximately equal number of atoms to work with, and that the row and column distribution of atoms in the force decomposition is roughly equitable. The Monte Carlo procedure uses pseudo-random numbers to make processor assignments. So, if you run the same parallel simulation multiple times, the distribution of atoms on the processors can change from run to run. This shouldn’t be a problem if the MD algorithms are all working as they should, right?

However, one thing that many people forget is a specific limitation of floating point arithmetic. Due to roundoff errors, the associative laws of algebra do not necessarily hold for floating-point numbers. For example, on a computer, the sum

(x+y)+z

can have a different answer than

x+(y+z)

when x = 10³⁰, y = -10³⁰ and z = 1. In the first case, x and y cancel exactly and the answer we get is 1. In the second case, z is far too small to register next to y, so y+z rounds back to -10³⁰ and the answer we get is 0. This addition-ordering roundoff issue can have somewhat subtle effects on a simulation. If you add up the forces on an atom in a different order, the total force might change by a small amount (perhaps 1 part in 10¹⁰). When we use this force to move that atom, we’ll be off by a small amount (perhaps 1 part in 10⁹). These small errors can start to make real differences in the microstate of a simulation (i.e. the configuration of the atoms), but shouldn’t alter the macrostate (i.e. the temperature, pressure, etc.).
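The example above is easy to check for yourself. Here is a minimal Python sketch, assuming ordinary double-precision floats (which is what Python's `float` is on common platforms); the variable names are just for illustration:

```python
# Floating-point addition is not associative: the grouping changes the answer.
x, y, z = 1e30, -1e30, 1.0

left = (x + y) + z   # x + y cancels exactly to 0.0, then 0.0 + 1.0
right = x + (y + z)  # y + z rounds back to -1e30, so z is lost before x is added

print(left)   # 1.0
print(right)  # 0.0
```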

That said, whenever there’s a random element to the order in which quantities are added up, we can get simulations that are not reproducible. And non-reproducibility is, in general, not good. So, how do we get around this issue in OpenMD? We let the user introduce a static seed for the random number generator that ensures that we always start with exactly the same set of pseudo-random numbers. If we seed the random number generator, then on the same number of processors, we’ll always get the same division of atoms, and we’ll get reproducible simulations.

To use this feature, simply add a seed value to the <MetaData> section of your input file:

seed = 8675309;

This seed can be any large positive integer (an unsigned long int).
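To see why a fixed seed pins down the division of labor, here is a toy Python sketch. It is not OpenMD's actual Monte Carlo procedure: `assign_molecules` is a hypothetical helper that just picks a processor uniformly at random for each molecule, which is enough to show the effect of seeding.

```python
import random

def assign_molecules(n_molecules, n_procs, seed=None):
    """Toy stand-in for the load-distribution step (hypothetical, not OpenMD code).

    Returns a list whose i-th entry is the processor assigned to molecule i.
    With seed=None the generator is seeded from system entropy, so the
    assignment can change from run to run.
    """
    rng = random.Random(seed)
    return [rng.randrange(n_procs) for _ in range(n_molecules)]

# Same seed and same processor count: the same division of molecules.
first = assign_molecules(1000, 4, seed=8675309)
second = assign_molecules(1000, 4, seed=8675309)
print(first == second)  # True
```

With the assignment fixed, each processor sees the same atoms every time, so the forces are accumulated in the same order and the roundoff is the same from run to run.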

Once the seed is set, you can run on MPI clusters and be reasonably sure of getting reproducible simulations for runs with the same number of processors. However, if you mix runs of different numbers of processors, then the roundoff issue will reappear.
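The processor-count caveat can be sketched the same way. In this toy Python example (an illustration under simplifying assumptions, not how OpenMD accumulates forces), each "processor" sums its own contiguous chunk of contributions and the partial sums are then combined. The helpers `naive_sum` and `chunked_sum` are hypothetical names, and the explicit loop is there on purpose: newer Python versions use a compensated algorithm in the built-in `sum()` for floats, which would hide the effect.

```python
def naive_sum(values):
    """Plain left-to-right floating-point accumulation."""
    total = 0.0
    for v in values:
        total += v
    return total

def chunked_sum(values, n_procs):
    """Sum contiguous chunks separately, then add up the partial sums.

    Assumes len(values) is divisible by n_procs, to keep the sketch short.
    """
    size = len(values) // n_procs
    partials = [naive_sum(values[i * size:(i + 1) * size]) for i in range(n_procs)]
    return naive_sum(partials)

contributions = [1.0, 1e30, -1e30, 1.0]

one_proc = chunked_sum(contributions, 1)
two_procs = chunked_sum(contributions, 2)

print(one_proc)   # 1.0
print(two_procs)  # 0.0
```

The numbers are deliberately extreme so the difference is obvious. In a real simulation the discrepancy is far smaller, but the mechanism is the same: changing the number of processors changes how the sums are grouped.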

This entry was posted on February 22, 2013, 2:27 pm and is filed under Examples, News.
