Parallel programming: classic track (MPI & OpenMP) 5

  • David Henty, PRACE, XSEDE, RIKEN, Compute Canada
  • June 2016

Slide contents

  • Rules and Regulations of the 2nd Annual IHPCSS Challenge
  • General Rules
  • Some Specifics
  • Rules For Lawyers
  • Reality Checks
  • Suggested Things to Explore
  • Decision
  • Laplace Exercise
  • Our Foundation Exercise: Laplace Solver
  • Exercise Foundation: Jacobi Iteration
  • Serial Code Implementation
  • Serial C Code (kernel)
  • Serial C Code Subroutines
  • Whole C Code
  • Serial Fortran Code (kernel)
  • Serial Fortran Code Subroutines
  • Whole Fortran Code
  • First Things First: Domain Decomposition
  • Try Again: Domain Decomposition II
  • Simplest: Domain Decomposition III
  • Simplest Decomposition for C Code
  • Simplest Decomposition for C Code
  • Sending Multiple Elements
  • Sending Multiple Elements
  • Sending Multiple Elements
  • Simplest Decomposition for Fortran Code
  • Simplest Decomposition for Fortran Code
  • Sending Multiple Elements in Fortran
  • Main Loop Structure
  • Boundary Conditions
  • Two ways to approach this exercise.
  • MPI Template for C
  • MPI Template for Fortran
  • Some ways we might get fancy…
  • Some ways you might go wrong…
  • How do you know you are correct?
  • How do you know you are correct?
  • All the action is here.
  • Check for yourself.
  • Laplace Exercise
  • Introduction to OpenMP
  • Nested parallelism
  • Nested parallelism (cont)
  • Nested parallelism (cont)
  • NUM_THREADS clause
  • Orphaned directives
  • Orphaned directives (cont)
  • Data scoping rules
  • Binding rules
  • Thread private global variables
  • Data scoping rules
  • Thread private globals (cont)
  • COPYIN clause
  • COPYIN clause
  • Timing routines
  • Using timers
  • Introduction to OpenMP
  • OpenMP tasks
  • task directive
  • Data Sharing
  • At thread barriers (explicit or implicit)
  • Example
  • Parallel pointer chasing
  • Parallel pointer chasing on multiple lists
  • Advanced OpenMP
  • Motivation
  • Clustered architecture
  • Programming clusters
  • Issues
  • Development / maintenance
  • Simplest Decomposition for Fortran Code
  • Simplest: Domain Decomposition III
  • Development / maintenance
  • Portability
  • Thread Safety
  • Performance
  • Replicated data
  • Effect of domain size on halo storage
  • Poorly scaling MPI codes
  • Load balancing
  • Limited MPI process numbers
  • MPI implementation not tuned for SMP clusters
  • Styles of mixed-mode programming
  • OpenMP Master-only
  • OpenMP Funneled
  • OpenMP Serialized
  • OpenMP Multiple
  • MPI_Init_thread
  • MPI_Init_thread
  • MPI_Init_thread
  • OpenMP Funneled
  • OpenMP Serialized
  • MPI_Query_thread()
  • Pitfalls
  • Pitfalls
  • Master-only
  • Example
  • Funneled
  • OpenMP Funneled with overlapping (1)
  • OpenMP Funneled with overlapping (2)
  • Serialized
  • Distinguishing between threads
  • Multiple
  • Distinguishing between threads
  • Multiple
  • End points
  • Performance
  • Consequences
  • Summary
  • Advanced OpenMP
  • OpenMP Master-only