Auburn researchers develop new functionality to improve supercomputer performance
Published: Mar 7, 2018 6:00 AM
By Carol Nelson
A team of researchers in the Auburn University Department of Computer Science and Software Engineering is developing new software functionality to improve the performance of computational science on the world’s fastest supercomputers.
Under the guidance of department faculty, doctoral candidate Bradley Morgan has developed persistent nonblocking collective operations, a new communication technique for large-scale computations within the Message Passing Interface, or MPI.
“MPI is basically software that allows many different processes on a supercomputer to speak to each other,” Morgan said. “If you have a difficult problem to solve, one that you couldn't do with a pen and paper or even a workstation computer, you need a number of processes to take little bits of the work, do their own parts, calculate it all and send it back. When doing these types of calculations, you often find reoccurring patterns of communication that take place among the processes. These patterns are defined in MPI as different collective operations, like a broadcast operation for example, which the program knows is a message from one single process to all the other processes.”
In a traditional collective operation, the MPI request object is re-created on every call. Morgan said the team's research has explored keeping MPI requests in memory so that the request objects, which can be large and costly to create, can be reused. The new communication technique is expected to boost efficiency in parallel computations where communication patterns are fixed across a number of processes.
“What we’ve added is the persistence to the operation,” Morgan said. “If you’re going to perform a collective operation over and over again in a loop, in the past you’ve had to reinitialize the request every time. What persistence does is add a little more efficiency because you can create the request object once and reuse it repeatedly. What it boils down to is the management of the memory in the program. It’s a simple concept, but one that hasn't been available until now.”
Morgan collaborated with researchers at the University of Alabama at Birmingham, the University of Edinburgh in Scotland and Intel Corp. to explore the potential for the new functionality. The team presented its research paper, “Planning for Performance: Persistent Collective Operations for MPI,” at the 24th annual EuroMPI Conference in Chicago this past fall.
Morgan also ran his tests on Auburn University’s supercomputers, “CASIC” and “Hopper,” which are maintained by the Office of Information Technology. He said that OIT offers a unique opportunity for Auburn researchers to perform their work on supercomputers.
“There are so many great researchers who are really good at what they do, but they may not have the best understanding of how to perform that research on a supercomputer. Auburn OIT provides support to our researchers to help them run their calculations on a supercomputer and to learn how to use that to their advantage,” Morgan said.
“In parallel computing, even slight improvements in efficiency can lead to significant impact,” he said. “My hope is that researchers are able to use these operations to find new ways to perform their science more quickly and efficiently.”
Media Contact: chris.anthony@auburn.edu, 334.844.3447