Parallel programming: accelerator track 2

  • John Urbanic, PRACE, XSEDE, RIKEN, Compute Canada
  • June 2016

Slide contents

  • Our Workshop Environment
  • Exercise 1 C Solution
  • Exercise 1 Fortran Solution
  • Exercise 1: Compiler output (C)
  • Exercise 1: Performance
  • What’s with the OpenMP?
  • What went wrong?
  • Basic Concept
  • Multiple Times Each Iteration
  • Excessive Data Transfers
  • Data Management
  • First, about that “reduction”
  • Data Construct Syntax and Scope
  • Data Clauses
  • Array Shaping
  • Compiler will (increasingly) often make a good guess…
  • Data Regions Have Real Consequences
  • Data Regions Are Different Than Compute Regions
  • Data Movement Decisions
  • Exercise 2: Use acc data to minimize transfers
  • Exercise 2 C Solution
  • Exercise 2 Fortran Solution
  • Exercise 2: Performance
  • OpenACC or OpenMP?
  • OpenACC or OpenMP on Larger Data?
  • Latest Happenings In Data Management
  • Further speedups
  • General Principles: Finding Parallelism In Code
  • Is OpenACC Living Up To My Claims?
  • In Conclusion…