trunk/oopsePaper/parallel.tex

\section{\label{sec:Parallel}Parallel Methods and Force Decomposition}

Although processor power is doubling roughly every 18 months according to the famous Moore's Law\cite{moore}, it is still unreasonable to simulate systems of more then a 1000 atoms on a single processor. To facilitate study of larger system sizes or smaller systems on long time scales in a reasonable period of time, parallel methods were developed allowing multiple CPU's to share the simulation workload. Three general categories of parallel decomposition method's have been developed including atomic, spatial and force decomposition methods. 

Algorithmically simplest of the three method's is atomic decomposition where N particles in a simulation are split among P processors for the duration of the simulation. Computational cost scales as an optimal $O(N/P)$ for atomic decomposition. Unfortunately all processors must communicate positions and forces with all other processors leading communication to scale as an unfavorable $O(N)$ independent of the number of processors. This communication bottleneck led to the development of spatial and force decomposition methods in which communication among processors scales much more favorably. Spatial or domain decomposition divides the physical spatial domain into 3D boxes in which each processor is responsible for calculation of forces and positions of particles located in its box. Particles are reassigned to different processors as they move through simulation space. To calculate forces on a given particle, a processor must know the positions of particles within some cutoff radius located on nearby processors instead of the positions of particles on all processors. Both communication between processors and computation scale as $O(N/P)$ in the spatial method. However, spatial decomposition adds algorithmic complexity to the simulation code and is not very efficient for small N since the overall communication scales as the surface to volume ratio $(N/P)^{2/3}$ in three dimensions.

Force decomposition assigns particles to processors based on a block decomposition of the force matrix. Processors are split into a optimally square grid forming row and column processor groups. Forces are calculated on particles in a given row by particles located in that processors column assignment. Force decomposition is less complex to implement then the spatial method but still scales computationally as $O(N/P)$ and scales as $(N/\sqrt{p})$ in communication cost. Plimpton also found that force decompositions scales more favorably then spatial decomposition up to 10,000 atoms and favorably competes with spatial methods for up to 100,000 atoms. 
Revision:	752
Committed:	Mon Sep 8 20:58:12 2003 UTC (21 years, 10 months ago) by chuckv
Content type:	application/x-tex
File size:	2682 byte(s)
Log Message:	* empty log message *
#	User	Rev	Content
1	chuckv	751	\section{\label{sec:Parallel}Parallel Methods and Force Decomposition}
2
3	chuckv	752	Although processor power is doubling roughly every 18 months according to the famous Moore's Law\cite{moore}, it is still unreasonable to simulate systems of more then a 1000 atoms on a single processor. To facilitate study of larger system sizes or smaller systems on long time scales in a reasonable period of time, parallel methods were developed allowing multiple CPU's to share the simulation workload. Three general categories of parallel decomposition method's have been developed including atomic, spatial and force decomposition methods.
4
5			Algorithmically simplest of the three method's is atomic decomposition where N particles in a simulation are split among P processors for the duration of the simulation. Computational cost scales as an optimal $O(N/P)$ for atomic decomposition. Unfortunately all processors must communicate positions and forces with all other processors leading communication to scale as an unfavorable $O(N)$ independent of the number of processors. This communication bottleneck led to the development of spatial and force decomposition methods in which communication among processors scales much more favorably. Spatial or domain decomposition divides the physical spatial domain into 3D boxes in which each processor is responsible for calculation of forces and positions of particles located in its box. Particles are reassigned to different processors as they move through simulation space. To calculate forces on a given particle, a processor must know the positions of particles within some cutoff radius located on nearby processors instead of the positions of particles on all processors. Both communication between processors and computation scale as $O(N/P)$ in the spatial method. However, spatial decomposition adds algorithmic complexity to the simulation code and is not very efficient for small N since the overall communication scales as the surface to volume ratio $(N/P)^{2/3}$ in three dimensions.
6
7			Force decomposition assigns particles to processors based on a block decomposition of the force matrix. Processors are split into a optimally square grid forming row and column processor groups. Forces are calculated on particles in a given row by particles located in that processors column assignment. Force decomposition is less complex to implement then the spatial method but still scales computationally as $O(N/P)$ and scales as $(N/\sqrt{p})$ in communication cost. Plimpton also found that force decompositions scales more favorably then spatial decomposition up to 10,000 atoms and favorably competes with spatial methods for up to 100,000 atoms.