Contents

Slide 2


General Programming on Graphical Processing Units

Quentin Ochem
October 4th, 2018

Slide 3


What is GPGPU?

GPUs were traditionally dedicated to graphical rendering …
… but their real capability is vectorized computation
Enter General Programming on GPUs (GPGPU)
Slide 4


GPGPU Programming Paradigm

Debug?

Optimize data transfer?

Optimize occupancy?

Avoid data races?

Refactor parallel algorithms?
Slide 5


Why do we care about Ada? (1/2)

Source: https://www.adacore.com/uploads/techPapers/Controlling-Costs-with-Software-Language-Choice-AdaCore-VDC-WP.PDF

Slide 6


Why do we care about Ada? (2/2)

Signal processing
Machine learning
Monte Carlo simulation
Trajectory prediction
Cryptography
Image processing
Physical simulation
… and much more!
Slide 7


Available Hardware

Desktop & Server:
NVIDIA GeForce / Tesla / Quadro
AMD Radeon
Intel HD

Embedded:
NVIDIA Tegra
ARM Mali
Qualcomm Adreno
IMG PowerVR
Freescale Vivante

Slide 8

Ada Support

Slide 9


Three options

Interfacing with existing libraries
“Ada-ing” existing languages
Ada 2020

Slide 10


Interfacing existing libraries

Already possible with straightforward effort
“gcc -fdump-ada-specs” will provide a first binding of C to Ada
We could provide “thick” bindings to e.g. Ada.Numerics matrix operations
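To make the thin-binding option concrete, here is a minimal sketch of what a hand-trimmed binding in the spirit of the -fdump-ada-specs output could look like, assuming the library being bound is the CUDA runtime; the package and Ada subprogram names are invented for this example, and only the C entry points (cudaMalloc, cudaFree) are real.

with System;
with Interfaces.C; use Interfaces.C;

package CUDA_Runtime_Thin is

   -- Thin binding to cudaMalloc: allocates Size bytes of device memory
   -- and returns its address through Dev_Ptr; the result is the status code
   function Cuda_Malloc
     (Dev_Ptr : access System.Address;
      Size    : size_t) return int
     with Import => True, Convention => C,
          External_Name => "cudaMalloc";

   -- Thin binding to cudaFree: releases device memory
   function Cuda_Free (Dev_Ptr : System.Address) return int
     with Import => True, Convention => C,
          External_Name => "cudaFree";

end CUDA_Runtime_Thin;

A “thick” binding layered on top of this would wrap the raw status codes in typed Ada operations, for instance raising an exception when a call fails.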
Slide 11


“Ada-ing” existing languages

CUDA – kernel-based language specific to NVIDIA
OpenCL – portable version of CUDA
OpenACC – integrated language extension for marking parallel loops
Slide 12


CUDA Example (Device code)

procedure Test_Cuda (A : out Float_Array; B, C : Float_Array)
  with Export => True, Convention => C;
pragma CUDA_Kernel (Test_Cuda);

procedure Test_Cuda (A : out Float_Array; B, C : Float_Array) is
begin
   -- Each GPU thread computes exactly one element of the result
   A (CUDA_Get_Thread_X) := B (CUDA_Get_Thread_X) + C (CUDA_Get_Thread_X);
end Test_Cuda;
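The device code above relies on a few declarations that are not shown on the slide; here is a minimal sketch of what they might look like, with the package name and the intrinsic mapping assumed purely for illustration.

-- Hypothetical support package assumed by the device code above
package CUDA_Device_Support is

   type Float_Array is array (Natural range <>) of Float;

   -- Assumed intrinsic giving the X index of the current GPU thread
   -- (the counterpart of threadIdx.x in CUDA C)
   function CUDA_Get_Thread_X return Natural
     with Import => True, Convention => C;

end CUDA_Device_Support;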
Slide 13


CUDA Example (Host code)
A, B, C : Float_Array;
begin
   -- initialization of B and C

   -- CUDA specific setup
   pragma CUDA_Kernel_Call (Grid'(1, 1, 1), Block'(8, 8, 8));
   My_Kernel (A, B, C);

   -- usage of A
Slide 14


OpenCL example
Similar to CUDA in principle
Requires more code on the host side (no call conventions)
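For comparison with the CUDA host code shown earlier, here is a rough sketch of what thin Ada bindings to just two of the host-side OpenCL calls involved in launching a kernel might look like; the package name, Ada subprogram names and type mappings are invented for this illustration, and only the C entry points (clSetKernelArg, clEnqueueNDRangeKernel) come from the OpenCL API.

with System;
with Interfaces.C; use Interfaces.C;

package OpenCL_Thin is

   -- Opaque OpenCL handles, represented as raw addresses in this sketch
   subtype CL_Kernel is System.Address;
   subtype CL_Queue  is System.Address;
   subtype CL_Event  is System.Address;

   -- Thin binding to clSetKernelArg: passes one argument to the kernel
   function Set_Kernel_Arg
     (Kernel    : CL_Kernel;
      Arg_Index : unsigned;
      Arg_Size  : size_t;
      Arg_Value : System.Address) return int
     with Import => True, Convention => C,
          External_Name => "clSetKernelArg";

   -- Thin binding to clEnqueueNDRangeKernel: launches the kernel
   function Enqueue_ND_Range_Kernel
     (Queue              : CL_Queue;
      Kernel             : CL_Kernel;
      Work_Dim           : unsigned;
      Global_Work_Offset : access size_t;
      Global_Work_Size   : access size_t;
      Local_Work_Size    : access size_t;
      Num_Wait_Events    : unsigned;
      Wait_List          : access CL_Event;
      Event              : access CL_Event) return int
     with Import => True, Convention => C,
          External_Name => "clEnqueueNDRangeKernel";

end OpenCL_Thin;

A real host program additionally has to obtain a platform and device, create a context, command queue and buffers, and build the kernel from source (clGetPlatformIDs, clGetDeviceIDs, clCreateContext, clCreateCommandQueue, clCreateBuffer, clCreateProgramWithSource, clBuildProgram, clCreateKernel), none of which appears in the pragma-based CUDA example.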
Slide 15


OpenACC example (Device & Host)

procedure Test_OpenACC is
   A, B, C : Float_Array;
begin
   -- initialization of B and C
   for I in A'Range loop
      pragma Acc_Parallel;
      A (I) := B (I) + C (I);
   end loop;
end Test_OpenACC;
Slide 16


Ada 2020

procedure Test_Ada2020 is
   A, B, C : Float_Array;
begin
   -- initialization of B and C
   parallel for I in A'Range loop
      A (I) := B (I) + C (I);
   end loop;
end Test_Ada2020;
Slide 17


Lots of other language considerations

Identification of memory layout (per thread, per block, global)
Thread allocation specification
Reduction (ability to aggregate results through operators, e.g. sum or concatenation; see the sketch after this list)
Containers
Mutual exclusion
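As a hint of what reduction support could look like at the language level, here is a minimal sketch using the 'Reduce attribute from the same Ada 2022 feature set as the parallel loop shown earlier; it assumes a compiler accepting Ada 2022 syntax and says nothing about how the reduction would be mapped onto GPU threads.

with Ada.Text_IO;

procedure Test_Reduction is
   type Float_Array is array (Positive range <>) of Float;
   A : constant Float_Array (1 .. 1_000) := (others => 1.0);

   -- Aggregate all elements of A through "+", starting from 0.0
   Total : constant Float := A'Reduce ("+", 0.0);
begin
   Ada.Text_IO.Put_Line (Float'Image (Total));
end Test_Reduction;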

Slide 18


A word on SPARK

X_Size : constant := 1000;
Y_Size : constant := 10;
Data   : array (1 .. X_Size * Y_Size) of Integer;
begin
   for X in 1 .. X_Size loop
      for Y in 1 .. Y_Size loop
         Data (X + Y_Size * Y) := Compute (X, Y);
      end loop;
   end loop;

Two distinct iterations write to the same element, which flow analysis can flag before the loops are parallelized:

   {X = 100, Y = 1},  X + Y * Y_Size = 100 + 10  = 110
   {X = 10,  Y = 10}, X + Y * Y_Size = 10  + 100 = 110
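For completeness, one way to make the mapping collision-free is to give each (X, Y) pair its own slot, for example with X + X_Size * (Y - 1), which covers 1 .. X_Size * Y_Size exactly once; Compute is assumed to be declared elsewhere, as in the slide.

-- Collision-free variant of the loop above: each (X, Y) pair
-- maps to a distinct index in 1 .. X_Size * Y_Size
for X in 1 .. X_Size loop
   for Y in 1 .. Y_Size loop
      Data (X + X_Size * (Y - 1)) := Compute (X, Y);
   end loop;
end loop;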