Date:2008-09-23

Title:Performance Analysis and Optimization of Parallel Scientific Applications on Large-scale CMP Cluster Systems
Speaker:Xingfu Wu

Time:10:00 am-12:00 noon, Monday, Sept. 29
Venue:Room 337

 

Abstract:

Chip multiprocessors (CMPs) are widely used for high-performance computing, and they are increasingly configured hierarchically to form the nodes of a cluster system. A major challenge is using such cluster systems efficiently for large-scale scientific applications. In this talk, we quantify the performance gap that results from using different numbers of processors per node; this gap provides a baseline for the amount of optimization needed when using all processors per node on CMP clusters. We conduct detailed performance analysis to identify how applications can be modified to use all processors per node efficiently, focusing on three scientific applications: the Gyrokinetic Toroidal Code (GTC), a 3D particle-in-cell magnetic fusion application; a Lattice Boltzmann Method (LBM) code for simulating fluid dynamics; and GYRO, an advanced Eulerian gyrokinetic-Maxwell equation solver for simulating microturbulent transport in plasma. To optimize these applications, we apply conventional techniques such as loop blocking, loop unrolling, and loop fusion; develop hybrid methods for optimizing MPI_Allreduce and MPI_Reduce; and present a processor partitioning-based performance optimization method. With these optimizations, application performance when using all processors per node improved by up to 18.97% for GTC, 15.77% for LBM, and 12.29% for GYRO on up to 2048 total processors on the CMP clusters.

 

Bio:
Xingfu Wu received his Ph.D. degree in computer science from Beijing University of Aeronautics and Astronautics in 1997. He worked at the National Research Center for Intelligent Computing Systems (NCIC), Institute of Computing Technology, Chinese Academy of Sciences, as a postdoctoral researcher during the academic year 1997-1998, where he led the parallel programming environment group of the Dawning2000 project. He worked in the Department of Computer Science at Louisiana State University as a visiting assistant professor during the academic year 1998-1999, and in the Department of Electrical and Computer Engineering at Northwestern University as a postdoctoral researcher during the academic years 2000-2003. He has been a TEES Research Scientist in the Department of Computer Science at Texas A&M University since July 2003. He is a member of IEEE and ACM. He has served as a session chair and program committee member for several international conferences, and was a guest editor of the IEEE Distributed Systems Online Special Issue on Data-intensive Computing (Vol. 5, Issue 1, 2004). He was the day-to-day manager of the NCSA Alliance Performance Engineering Expedition, and is a Member-at-Large of the NERSC Users’ Group Executive Committee (Feb 2007-Dec 2009). Dr. Wu’s monograph, Performance Evaluation, Prediction and Visualization of Parallel Systems, was published by Kluwer Academic Publishers (ISBN 0-7923-8462-8) in 1999. His interests are web-based performance analysis systems, performance evaluation and modeling, parallel and grid computing, and scientific computing.
For details, see http://faculty.cs.tamu.edu/wuxf/resume.html