Performance analysis and optimization (3 parts)

  • Phil Blood, Christian Feld, PRACE, XSEDE, RIKEN, Compute Canada
  • June 2016

Slide contents

  • Performance Engineering of Parallel Applications
  • Acknowledgment
  • Outline for Performance Sessions
  • Fitting algorithms to hardware…and vice versa
  • Code Development and Optimization Process
  • Performance engineering workflow
  • A little background...
  • Hardware Counters
  • Features of PAPI
  • Measurement Techniques
  • Inclusive and Exclusive Profiles
  • Applying Performance Tools to Improve Parallel Performance of the UNRES MD code
  • Structure of UNRES
  • Performance Engineering: Procedure
  • Is There a Performance Problem?
  • Detecting Performance Problems
  • Use a Sampling Tool for Initial Performance Check
  • UNRES: Serial Performance
  • UNRES: Parallel Performance
  • Performance Engineering: Procedure
  • Which Functions are Important?
  • Contributions of Functions
  • UNRES Function Summary
  • Performance Engineering: Procedure
  • Choose a tool: there are many!
  • TAU: Tuning and Analysis Utilities
  • General Instructions for TAU
  • Using TAU with Makefiles
  • Tiny Routines: High Overhead
  • Reducing Overhead
  • Selective Instrumentation File
  • Selective Instrumentation File
  • Getting a Call Path with TAU
  • Getting Call Path Information
  • Isolate regions of code execution
  • Key UNRES Functions in TAU (with Startup Time)
  • Key UNRES Functions (MD Time Only)
  • Performance Engineering: Procedure
  • Detecting Serial Performance Issues
  • Create a Derived Metric in Paraprof Manager
  • Perf of EELEC (peak is 2)
  • Performance Engineering: Procedure
  • Do compiler optimization first! EELEC – After forcing inlining with compiler
  • Further Info on Serial Optimization
  • Performance Engineering: Procedure
  • TAU Recipe #1: Detecting Serial Bottlenecks
  • Serial Bottleneck Detection in UNRES: Function Scaling
  • TAU Recipe #2: Detecting Parallel Load Imbalance
  • Load Imbalance Detection in UNRES
  • Major Serial Bottleneck and Load Imbalance in UNRES Eliminated
  • Next Iteration of Performance Engineering with Optimized Code
  • Use Call Path Information: MPI Calls
  • Performance Engineering: Procedure
  • Some Take-Home Points
  • International HPC Summer School 2016: Performance analysis and optimization, hands-on
  • Access to Bridges
  • Compiling & job submission
  • Local installation
  • NPB-MZ-MPI suite
  • Building an NPB-MZ-MPI benchmark
  • System topology
  • Building an NPB-MZ-MPI benchmark
  • NPB-MZ-MPI / BT (Block Tridiagonal Solver)
  • Building an NPB-MZ-MPI benchmark
  • NPB-MZ-MPI / BT reference execution
  • Score-P – A Joint Performance Measurement Run-Time Infrastructure for Periscope, Scalasca, TAU, and Vampir
  • Performance engineering workflow
  • Fragmentation of tools landscape
  • Scalasca, TAU, VAMPIR, Paraver
  • Score-P project idea
  • Score-P overview
  • Hands-on: NPB-MZ-MPI / BT