C++ AMP, Part 2 of 2: Memory Layout and Support
Interactive

LearnNowOnline
Updated Aug 22, 2018

Course description

In this course you’ll learn how accelerator hardware is designed and integrated into the system. With that foundation, the course turns to what you can expect from the system when you use various C++ AMP features: data transfers to and from the accelerator, memory layout and memory accesses from the accelerator, and thread execution and control flow on the accelerator. The course then covers the support that Microsoft’s Visual Studio 2012 provides for C++ AMP.

Each LearnNowOnline training course is made up of Modules (typically an hour in length). Within each module there are Topics (typically 15-30 minutes each) and Subtopics (typically 2-5 minutes each). There is a Post Exam for each Module that must be passed with a score of 70% or higher to successfully and fully complete the course.


Prerequisites

This course assumes that you have a good understanding of core C++ concepts, including classes, objects, containers, and iterators. You should also be familiar with Visual Studio 2012 for Visual C++ development, including compilation, testing, and debugging. Although not required or expected, you may get more out of some parts of the course if you are familiar with multithreaded programming, Visual Studio 2012’s debugging capabilities for multiple threads, and basic computer architecture concepts.


Meet the expert

John Stratton

John Stratton, Ph.D., is a senior architect at Multicoreware Inc. and a visiting lecturer at the University of Illinois at Urbana-Champaign. John has been at the forefront of research and education in heterogeneous computing, reaching hundreds of students through the Virtual School of Computational Science and Engineering’s courses on heterogeneous computing and optimization for scientific applications. John writes papers and articles for leading academic conferences and journals as well as broad-reaching publications such as IEEE Computer. He is also an active participant and presenter at several industry and technology groups and events across the country.

Video Runtime

142 Minutes

Time to complete

395 Minutes

Course Outline

Memory Layout

Memory Layout Overview (25:47)

  • Introduction (00:48)
  • GPU Architecture Overview (08:45)
  • Minimum Scale of Parallelism (07:43)
  • Demo: Scale and Performance (03:20)
  • Demo: Benchmark Results (04:34)
  • Summary (00:35)

Memory Layout and SIMD (31:04)

  • Introduction (01:03)
  • Memory Layout and Accesses (06:37)
  • Good Access Patterns (00:51)
  • Demo: Transpose Operation (05:16)
  • Implicit SIMD Execution (03:57)
  • Divergent Penalties (03:07)
  • Demo: Divergence (04:32)
  • Demo: Divergence Problems (04:42)
  • Summary (00:55)

Data Transfers (17:42)

  • Introduction (00:46)
  • Host-Accelerator Data Transfers (03:30)
  • When Data Transfers Happen (03:02)
  • Demo: Data Transfers (05:48)
  • Demo: Array View (04:01)
  • Summary (00:33)

Support for C++ AMP

Windows Support (14:44)

  • Introduction (00:30)
  • C++ AMP Uses DirectCompute (03:31)
  • Demo: AMP Implementations (04:01)
  • Demo: Multiple Accelerators (05:16)
  • Summary (01:23)

Debugging (20:56)

  • Introduction (00:43)
  • C++ AMP Debugging (02:56)
  • Demo: Debugging C++ AMP (05:24)
  • Demo: Debugging Tools (07:17)
  • Demo: Freezing Threads (02:00)
  • Debugging Parallel Kernel Code (01:52)
  • Summary (00:41)

Tiling (32:32)

  • Introduction (00:52)
  • Tiled Extents and Indexes (01:21)
  • Tiled Accelerator Execution (04:58)
  • Demo: Tiled Extents (05:44)
  • Tiled Accelerator Execution (2) (06:58)
  • Demo: Tile Size (05:50)
  • Demo: Tile Variables (05:40)
  • Summary (01:05)