MPI Debugging and Profiling

Porting, Debugging and Profiling MPI Applications on Windows Clusters Using HPC Server 2008

Windows HPC Server 2008 enables clusters of x64 and x64+GPU workstations and servers to tackle scientific computing applications in a mainstream Windows environment. Windows HPC Server 2008 provides several fundamental components for effective cluster computing: the MSMPI message-passing interface library, a job scheduler to manage cluster throughput, and a means for launching and monitoring compute jobs on a cluster that is nearly as simple as printing a document from a Windows application.

If you are among the many Windows users moving to cluster computing to break free of the serial performance limits of today's servers, you need a complete set of cluster-capable development tools to effectively port, debug and tune your Fortran, C and C++ applications on HPC Server 2008. The PGI CDK® Cluster Development Kit® for Windows includes a complete suite of compilers and tools for porting existing applications to, or developing new applications on, HPC Server 2008. The PGI CDK includes three key components that dovetail with HPC Server 2008 and Microsoft Visual C++ to enable effective cluster-based computing:

PGI Visual Fortran

PVF® fully integrates the PGI suite of high-performance 64-bit and 32-bit parallel Fortran compilers and tools into Microsoft Visual Studio. Interoperable with Microsoft Visual C++, PVF is an ideal solution for porting computationally intensive science and engineering applications to Windows HPC Server 2008 clusters.

PGI Visual Fortran offers world-class performance and features including auto-parallelization for multicore, OpenMP 3.0, and support for PGI Unified Binary™ technology. The PGI Unified Binary technology streamlines cross-platform support by combining into a single executable file code optimized for x64 processor families from both Intel and AMD. This gives you the assurance that your applications will run correctly and with optimal performance regardless of the type of x64 or x64+GPU processors on which they are deployed.

PVF's state-of-the-art Fortran compiler technologies include SSE/AVX vectorization, auto-parallelization, interprocedural analysis and optimization, memory hierarchy optimizations, function inlining (including library functions), profile-feedback optimization, CPU-specific optimizations and more. PVF is the ideal solution for migrating existing compute-intensive Windows applications from SMP servers and workstations to HPC Server 2008 clusters.

The PGDBG OpenMP/MPI Debugger for HPC Server 2008

Debugging a cluster MPI application can be extremely challenging. The PGDBG debugger provides a comprehensive set of graphical user interface (GUI) elements to assist you in this process. PGDBG provides the ability to separately debug and control OpenMP threads and MSMPI processes on your HPC Server 2008 cluster. Perform Step, Break, Run and Halt actions on threads or processes individually or collectively as a group. PGDBG can even display the state of MPI message queues, enabling you to quickly isolate and resolve message-passing deadlock bugs.

Using a single integrated multi-process debugging window, PGDBG provides precise control and feedback on the state of every MPI process and OpenMP thread simultaneously, with fully integrated capabilities for debugging hybrid parallel programs that use MSMPI message-passing between nodes and OpenMP shared-memory parallelism within a multicore or SMP cluster node.

Tabs in the Main window Source Panel allow you to display source code only, disassembly code showing how the currently executing high-level source code has been compiled into assembly language, or a mix where the assembly code is interleaved with the source code. Assembly language stepping and breakpoint indicators are enabled as well.

PGDBG is interoperable with the Microsoft Visual C++ compiler, and together with PGI Visual Fortran gives you the power to port and debug your OpenMP and MPI applications on Windows HPC Server 2008 clusters using an easy and intuitive graphical user interface.

The PGPROF OpenMP/MPI Profiler for Windows HPC Server 2008

PGPROF® is a powerful and simple-to-use interactive postmortem statistical analyzer for MPI process-parallel and OpenMP thread-parallel programs as well as programs incorporating PGI Accelerator directives and CUDA Fortran on Windows HPC Server 2008 clusters. Use PGPROF to visualize and diagnose the performance of the components of your program. PGPROF associates execution time with the source code and instructions of your program, allowing you to see where and how execution time is spent. Through resource utilization data and compiler feedback information, PGPROF also provides features for helping you to understand why certain parts of your program have high execution times.

PGPROF provides the information required to determine which functions and lines in an application are consuming the most execution time. Combined with the feedback features of the PGI compilers, PGPROF will enable you to maximize vectorization and performance on a single x64 processor core. PGPROF exposes performance bottlenecks in a cluster application by presenting the number of calls, aggregate message size and execution time of individual MPI function calls on a line-by-line basis. On GPUs, PGPROF reports performance-critical information including initialization, data transfer and kernel execution times.

Using PGPROF, you can merge profiles from multiple runs on different numbers of nodes to perform scalability analysis on your MPI or OpenMP application at the application, function or line level. Scalability analysis allows you to quickly see which parts of your application are barriers to scalable performance, and where your parallel tuning efforts should be focused. PGPROF displays information in easy-to-use formats such as bar charts, percentages, counts or seconds, and displays profiles using graphical histograms.
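As a rough illustration of what a scalability comparison between two runs measures, the sketch below computes the standard speedup and parallel efficiency figures from two timings. It is not PGPROF's implementation or file format; the `run_sample` struct and both function names are hypothetical and chosen only for this example.

```c
#include <stddef.h>

/* Hypothetical record for one profiled run: node count and the
 * wall-clock seconds measured for the region of interest
 * (whole application, one function, or one line). */
struct run_sample {
    int nodes;
    double seconds;
};

/* Speedup of `run` relative to the baseline run. */
double speedup(struct run_sample base, struct run_sample run)
{
    return base.seconds / run.seconds;
}

/* Parallel efficiency: measured speedup divided by the ideal
 * speedup implied by the increase in node count. A value near 1.0
 * means the region scales well; a low value marks a scaling barrier. */
double parallel_efficiency(struct run_sample base, struct run_sample run)
{
    double ideal = (double)run.nodes / (double)base.nodes;
    return speedup(base, run) / ideal;
}
```

For example, a region that takes 120 seconds on 1 node and 40 seconds on 4 nodes has a speedup of 3.0 and an efficiency of 0.75. Regions whose efficiency drops fastest as nodes are added are the natural place to focus tuning effort.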

Putting it all Together

While performance of individual x64 processor cores is still improving, the premium on power efficiency has led processor vendors to push aggressively on multi-core technology rather than increased clock speeds. Significant application performance gains in the next few years will depend directly on your ability to exploit multi-core and cluster platforms. The PGI compilers and tools give you the ability to migrate incrementally from serial to auto-parallel or OpenMP parallel algorithms for multi-core processors. When you are ready to take the next step to cluster-enabled applications using MSMPI, the PGDBG debugger and PGPROF profiler provide simple and intuitive interfaces to make porting and tuning of applications to MPI more tractable.
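The incremental step from a serial loop to an OpenMP parallel loop can be sketched in C as follows. This is a minimal, generic example rather than code from the PGI documentation; the function names are hypothetical. The only change between the two versions is the added directive, which a compiler without OpenMP enabled simply ignores.

```c
#include <stddef.h>

/* Serial version: the starting point of the migration. */
double dot_serial(const double *x, const double *y, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += x[i] * y[i];
    return sum;
}

/* OpenMP version: the same loop with one directive added.
 * The reduction clause gives each thread a private partial sum
 * and combines the partial sums when the loop finishes.
 * A signed loop index is used so the loop is also accepted by
 * compilers that implement only older OpenMP versions. */
double dot_openmp(const double *x, const double *y, size_t n)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < (long)n; ++i)
        sum += x[i] * y[i];
    return sum;
}
```

Because the serial and parallel versions share the same loop body, they can be kept side by side and compared while the rest of the application is migrated one loop at a time.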
