International HPC Summer School 2016
Parallel programming: classic track (MPI & OpenMP) 5
David Henty, PRACE, XSEDE, RIKEN, Compute Canada
June 2016
Slide contents
Rules and Regulations of the 2nd Annual IHPCSS Challenge
General Rules
Some Specifics
Rules For Lawyers
Reality Checks
Suggested Things to Explore
Decision
Laplace Exercise
Our Foundation Exercise: Laplace Solver
Exercise Foundation: Jacobi Iteration
Serial Code Implementation
Serial C Code (kernel)
Serial C Code Subroutines
Whole C Code
Serial Fortran Code (kernel)
Serial Fortran Code Subroutines
Whole Fortran Code
First Things First: Domain Decomposition
Try Again: Domain Decomposition II
Simplest: Domain Decomposition III
Simplest Decomposition for C Code
Simplest Decomposition for C Code
Sending Multiple Elements
Sending Multiple Elements
Sending Multiple Elements
Simplest Decomposition for Fortran Code
Simplest Decomposition for Fortran Code
Sending Multiple Elements in Fortran
Main Loop Structure
Boundary Conditions
Two ways to approach this exercise.
MPI Template for C
MPI Template for Fortran
Some ways we might get fancy…
Some ways you might go wrong…
How do you know you are correct?
How do you know you are correct?
All the action is here.
Check for yourself.
Laplace Exercise
Introduction to OpenMP
Nested parallelism
Nested parallelism (cont)
Nested parallelism (cont)
NUM_THREADS clause
Orphaned directives
Orphaned directives (cont)
Data scoping rules
Binding rules
Thread private global variables
Data scoping rules
Thread private globals (cont)
COPYIN clause
COPYIN clause
Timing routines
Using timers
Introduction to OpenMP
OpenMP tasks
task directive
Data Sharing
At thread barriers (explicit or implicit)
Example
Parallel pointer chasing
Parallel pointer chasing on multiple lists
Advanced OpenMP
Motivation
Clustered architecture
Programming clusters
Issues
Development / maintenance
Simplest Decomposition for Fortran Code
Simplest: Domain Decomposition III
Development / maintenance
Portability
Thread Safety
Performance
Replicated data
Effect of domain size on halo storage
Poorly scaling MPI codes
Load balancing
Limited MPI process numbers
MPI implementation not tuned for SMP clusters
Styles of mixed-mode programming
OpenMP Master-only
OpenMP Funneled
OpenMP Serialized
OpenMP Multiple
MPI_Init_thread
MPI_Init_thread
MPI_Init_thread
OpenMP Funneled
OpenMP Serialized
MPI_Query_thread()
Pitfalls
Pitfalls
Master-only
Example
Funneled
OpenMP Funneled with overlapping (1)
OpenMP Funneled with overlapping (2)
Serialized
Distinguishing between threads
Multiple
Distinguishing between threads
Multiple
End points
Performance
Consequences
Summary
Advanced OpenMP
OpenMP Master-only