Collision detection on the GPU

Содержание

Слайд 2

Overview Quick Background CPU Methods CULLIDE RCULLIDE QCULLIDE CUDA Methods

Overview

Quick Background
CPU Methods
CULLIDE
RCULLIDE
QCULLIDE
CUDA Methods

Слайд 3

Background Need to find collisions for lots of reasons Physics engines

Background

Need to find collisions for lots of reasons
Physics engines
Seeing if a

projectile hits an object
Ray casting
Game engines
Etc…
Слайд 4

Background Broad phase: Looks at entire scene Looks at proxy geometry

Background

Broad phase:
Looks at entire scene
Looks at proxy geometry (bounding shapes)
Determines if

two objects may intersect
Needs to be very fast
Слайд 5

Background Narrow phase: Looks at pairs of objects flagged by broad

Background

Narrow phase:
Looks at pairs of objects flagged by broad phase
Looks at

the actual geometry of an object
Determines if objects are truly intersecting
Generally slower
Слайд 6

Background Resolution Compute forces according to the contact points returned from

Background

Resolution
Compute forces according to the contact points returned from the narrow

phase
Can be non trivial if there are multiple contact points
Returns resulting forces to be added to each body
Слайд 7

CPU Methods Brute Force Check every object against every other N(N-1)/2

CPU Methods

Brute Force
Check every object against every other
N(N-1)/2 tests O(N²)
Sweep and

Prune
Average case: O(N log N)
Worst case: O(N²)
Spatial Subdivisions
Average case: O(N log N)
Worst case: O(N²)
Слайд 8

Sweep and Prune Bounding volume is projected onto x, y, z

Sweep and Prune

Bounding volume is projected onto x, y, z axis
Determine

collision interval for each object [bi, ei]
Two objects who’s collision intervals do not overlap can not collide

O1

O3

O2

Sorting Axis

B1

B3

E1

B2

E3

E2

Слайд 9

Spatial Subdivisions 1 2 3 4 5 6 7 8 Images

Spatial Subdivisions

1

2

3

4

5

6

7

8

Images from pg 699, 700 GPU Gems III

O1

O2

O3

O4

1

2

3

4

5

6

7

8

Example

Слайд 10

CULLIDE Came out of Dinesh’s group at UNC in 2003 Uses

CULLIDE

Came out of Dinesh’s group at UNC in 2003
Uses graphics hardware

to do a broad-narrow phase hybrid
No shader languages
Слайд 11

Outline Overview Pruning Algorithm Implementation and Results Conclusions and Future Work

Outline

Overview
Pruning Algorithm
Implementation and Results
Conclusions and Future Work

Слайд 12

Outline Overview Pruning Algorithm Implementation and Results Conclusions and Future Work

Outline

Overview
Pruning Algorithm
Implementation and Results
Conclusions and Future Work

Слайд 13

Overview Potentially Colliding Set (PCS) computation Exact collision tests on the PCS

Overview

Potentially Colliding Set (PCS) computation
Exact collision tests on the PCS

Слайд 14

Algorithm Object Level Pruning Sub-object Level Pruning Exact Tests

Algorithm

Object Level Pruning

Sub-object Level Pruning

Exact Tests

Слайд 15

Potentially Colliding Set (PCS)

Potentially Colliding Set (PCS)

Слайд 16

Potentially Colliding Set (PCS) PCS

Potentially Colliding Set (PCS)

PCS

Слайд 17

Outline Problem Overview Overview Pruning Algorithm Implementation and Results Conclusions and Future Work

Outline

Problem Overview
Overview
Pruning Algorithm
Implementation and Results
Conclusions and Future Work

Слайд 18

Algorithm Object Level Pruning Sub-object Level Pruning Exact Tests

Algorithm

Object Level Pruning

Sub-object Level Pruning

Exact Tests

Слайд 19

Visibility Computations Lemma 1: An object O does not collide with

Visibility Computations

Lemma 1: An object O does not collide with

a set of objects S if O is fully visible with respect to S
Utilize visibility for PCS computation
Слайд 20

Collision Detection using Visibility Computations

Collision Detection using Visibility Computations

Слайд 21

PCS Pruning Lemma 2: Given n objects O1,O2,…,On , an object

PCS Pruning

Lemma 2: Given n objects O1,O2,…,On , an object Oi

does not belong to PCS if it does not collide with O1,…,Oi-1,Oi+1,…,On
Prune objects that do not collide
Слайд 22

PCS Pruning O1 O2 … Oi-1 Oi Oi+1 … On-1 On

PCS Pruning

O1 O2 … Oi-1 Oi Oi+1 … On-1 On

O1 O2

… Oi-1 Oi Oi+1 … On-1 On

O1 O2 … Oi-1 Oi Oi+1 … On-1 On

Слайд 23

PCS Pruning O1 O2 … Oi-1 Oi

PCS Pruning

O1 O2 … Oi-1 Oi

Слайд 24

PCS Pruning Oi Oi+1 … On-1 On

PCS Pruning

Oi Oi+1 … On-1 On

Слайд 25

PCS Computation Each object tested against all objects but itself Naive

PCS Computation

Each object tested against all objects but itself
Naive algorithm is

O(n2)
Linear time algorithm
Uses two pass rendering approach
Conservative solution
Слайд 26

PCS Computation: First Pass O1 O2 … Oi-1 Oi Oi+1 … On-1 On

PCS Computation: First Pass

O1 O2 … Oi-1 Oi Oi+1 … On-1

On
Слайд 27

PCS Computation: First Pass O1

PCS Computation: First Pass

O1

Слайд 28

PCS Computation: First Pass O1 O2

PCS Computation: First Pass

O1 O2

Слайд 29

O1 O2 … Oi-1 Oi PCS Computation: First Pass

O1 O2 … Oi-1 Oi

PCS Computation: First Pass

Слайд 30

PCS Computation: First Pass O1 O2 … Oi-1 Oi Oi+1 … On-1 On

PCS Computation: First Pass

O1 O2 … Oi-1 Oi Oi+1 … On-1

On
Слайд 31

PCS Computation: Second Pass O1 O2 … Oi-1 Oi Oi+1 … On-1 On On

PCS Computation: Second Pass

O1 O2 … Oi-1 Oi Oi+1 … On-1

On

On

Слайд 32

PCS Computation: Second Pass On

PCS Computation: Second Pass

On

Слайд 33

PCS Computation: Second Pass On-1 On

PCS Computation: Second Pass

On-1 On

Слайд 34

PCS Computation: Second Pass Oi Oi+1 … On-1 On

PCS Computation: Second Pass

Oi Oi+1 … On-1 On

Слайд 35

PCS Computation: Second Pass Fully Visible? O1 O2 … Oi-1 Oi Oi+1 … On-1 On

PCS Computation: Second Pass

Fully Visible?

O1 O2 … Oi-1 Oi Oi+1 …

On-1 On
Слайд 36

PCS Computation O1 O2 … Oi-1 Oi Oi+1 … On-1 On

PCS Computation

O1 O2 … Oi-1 Oi Oi+1 … On-1 On

Слайд 37

PCS Computation O1 O3 … Oi-1 Oi+1 … On-1

PCS Computation

O1 O3 … Oi-1 Oi+1 … On-1

Слайд 38

Example O1 O2 O3 O4 Scene with 4 objects O1and O2

Example

O1

O2

O3

O4

Scene with 4 objects O1and O2 collide O3, O4 do not collide

Initial PCS

= { O1,O2,O3,O4 }
Слайд 39

O3 O1 First Pass O2 Order of rendering: O1 O4 O4

O3

O1

First Pass

O2

Order of rendering: O1 O4

O4

Слайд 40

Second Pass O1 O3 O2 Order of rendering: O4 O1 O4

Second Pass

O1

O3

O2

Order of rendering: O4 O1

O4

Слайд 41

After two passes O1 O2 O3 O4

After two passes

O1

O2

O3

O4

Слайд 42

Potential Colliding Set O1 O2 PCS ={O1,O2}

Potential Colliding Set

O1

O2

PCS ={O1,O2}

Слайд 43

Algorithm Object Level Pruning Sub-object Level Pruning Exact Tests

Algorithm

Object Level Pruning

Sub-object Level Pruning

Exact Tests

Слайд 44

Overlap Localization Each object is composed of sub-objects We are given

Overlap Localization

Each object is composed of sub-objects
We are given n objects

O1,…,On
Compute sub-objects of an object Oi that overlap with sub-objects of other objects
Слайд 45

Overlap Localization Our solution Test if each sub-object of Oi overlaps

Overlap Localization

Our solution
Test if each sub-object of Oi overlaps with sub-objects

of O1,..Oi-1
Test if each sub-object of Oi overlaps with sub-objects of Oi+1,...,On
Linear time algorithm
Extend the two pass approach
Слайд 46

Overlap Localization Sub-objects

Overlap Localization

Sub-objects

Слайд 47

Overlap Localization: First Pass O1 O2 … Oi-1 Oi Oi+1 … On-1 On

Overlap Localization: First Pass

O1 O2 … Oi-1 Oi Oi+1 … On-1

On
Слайд 48

Overlap Localization: First Pass O1 O2 … Oi-1 Oi Rendered sub-objects

Overlap Localization: First Pass

O1 O2 … Oi-1 Oi

Rendered sub-objects

Слайд 49

Overlap Localization: First Pass O1 O2 … Oi-1 Rendered sub-objects

Overlap Localization: First Pass

O1 O2 … Oi-1

Rendered sub-objects

Слайд 50

Overlap Localization: First Pass O1 O2 … Oi-1 Rendered sub-objects

Overlap Localization: First Pass

O1 O2 … Oi-1

Rendered sub-objects

Слайд 51

Overlap Localization: First Pass Rendered sub-objects O1 O2 … Oi-1

Overlap Localization: First Pass

Rendered sub-objects

O1 O2 … Oi-1

Слайд 52

Overlap Localization: First Pass Rendered sub-objects O1 O2 … Oi-1

Overlap Localization: First Pass

Rendered sub-objects

O1 O2 … Oi-1

Слайд 53

Overlap Localization: First Pass O1 O2 … Oi-1 Oi Rendered sub-objects

Overlap Localization: First Pass

O1 O2 … Oi-1 Oi

Rendered sub-objects

Слайд 54

Overlap Localization: First Pass O1 O2 … Oi-1 Oi Oi+1 … On-1 On Rendered sub-objects

Overlap Localization: First Pass

O1 O2 … Oi-1 Oi Oi+1 … On-1

On

Rendered sub-objects

Слайд 55

Overlap Localization: Second Pass O1 O2 … Oi-1 Oi Oi+1 … On-1 On

Overlap Localization: Second Pass

O1 O2 … Oi-1 Oi Oi+1 … On-1

On
Слайд 56

Overlap Localization O1 O2 … Oi-1 Oi Oi+1 … On-1 On Sub-objects

Overlap Localization

O1 O2 … Oi-1 Oi Oi+1 … On-1 On

Sub-objects

Слайд 57

Potential Colliding Set O1 O2 PCS = {O1,O2}

Potential Colliding Set

O1

O2

PCS = {O1,O2}

Слайд 58

O1 Sub-objects O2 PCS = sub-objects of {O1,O2}

O1

Sub-objects

O2


PCS = sub-objects of {O1,O2}

Слайд 59

First Pass Rendering order: Sub-objects of O1 O2

First Pass

Rendering order: Sub-objects of O1 O2

Слайд 60

First Pass

First Pass

Слайд 61

First Pass

First Pass

Слайд 62

First Pass

First Pass

Слайд 63

First Pass

First Pass

Слайд 64

First Pass

First Pass

Слайд 65

First Pass

First Pass

Слайд 66

Second Pass Rendering order: Sub-objects of O2 O1

Second Pass

Rendering order: Sub-objects of O2 O1

Слайд 67

Second Pass

Second Pass

Слайд 68

Second Pass

Second Pass

Слайд 69

Second Pass

Second Pass

Слайд 70

Fully Visible Second Pass

Fully Visible

Second Pass

Слайд 71

Fully Visible Second Pass

Fully Visible

Second Pass

Слайд 72

After two passes

After two passes

Слайд 73

PCS

PCS

Слайд 74

Algorithm Object Level Pruning Sub-object level Pruning Exact Tests Exact Overlap tests using CPU

Algorithm

Object Level Pruning

Sub-object level Pruning

Exact Tests

Exact Overlap tests using CPU

Слайд 75

Visibility Queries We require a query Tests if a primitive is

Visibility Queries

We require a query
Tests if a primitive is fully visible

or not
Current hardware supports occlusion queries
Test if a primitive is visible or not
Our solution
Change the sign of depth function
Слайд 76

Visibility Queries Depth function GEQUAL LESS All fragments Pass Examples - HP_Occlusion_test, NV_occlusion_query

Visibility Queries

Depth function

GEQUAL

LESS

All fragments

Pass

Examples - HP_Occlusion_test, NV_occlusion_query

Слайд 77

Bandwidth Analysis Read back only integer identifiers Independent of screen resolution

Bandwidth Analysis

Read back only integer identifiers
Independent of screen resolution

Слайд 78

Optimizations First use AABBs as object bounding volume Use orthographic views

Optimizations

First use AABBs as object bounding volume
Use orthographic views for pruning
Prune

using original objects
Слайд 79

Advantages No coherence No assumptions on motion of objects Works on

Advantages

No coherence
No assumptions on motion of objects
Works on generic models
A fast

pruning algorithm
No frame-buffer readbacks
Слайд 80

Limitations No distance or penetration depth information Resolution issues No self-collisions

Limitations

No distance or penetration depth information
Resolution issues
No self-collisions
Culling performance varies with

relative configurations
Слайд 81

Assumptions Makes assumptions that their algorithm will get faster as hardware improves. Luckily they were right

Assumptions

Makes assumptions that their algorithm will get faster as hardware improves.
Luckily

they were right
Слайд 82

RCULLIDE An improvement on CULLIDE in 2004 Resolves issue of screen resolution precision

RCULLIDE

An improvement on CULLIDE in 2004
Resolves issue of screen resolution precision

Слайд 83

Overview A main issue with CULLIDE was the fact that it

Overview

A main issue with CULLIDE was the fact that it wasn’t

reliable
Collisions could easily be missed due to screen resolution
Слайд 84

Overview 3 kinds of error associated with visibility based overlap Perspective

Overview

3 kinds of error associated with visibility based overlap
Perspective error
Strange shapes

from the transformation
Sampling error
Pixel resolution isn’t high enough
Depth buffer precision error
If distance between primitives is less than the depth buffer resolution, we will get incorrect results from our visibility query
Слайд 85

Reliable Queries The three errors cause the following: A fragment to

Reliable Queries

The three errors cause the following:
A fragment to not be

rasterized
A fragment is generated but not sampled where interference occurs
A fragment is generated and sampled where the interference occurs but the precision of the buffer is not sufficient
Слайд 86

Reliable Queries Use “fat” triangles Generate 2 fragments for each pixel

Reliable Queries

Use “fat” triangles
Generate 2 fragments for each pixel touched by

a triangle (no matter how little it is in the pixel)
For each pixel touched by the triangle, the depth of the 2 fragments must bound the depth of all points of the triangle in that pixel
Causes method to become more conservative (read: slower) but much more accurate
Слайд 87

Minkowski Sum Scary name…easy math A = { (1, 0), (0,

Minkowski Sum

Scary name…easy math

A = { (1, 0), (0, 1), (0, −1)}


B = { (0, 0), (1, 1), (1, −1)}

A + B = { (1, 0), (2, 1), (2, −1), (0, 1), (1, 2), (1, 0), (0, −1), (1, 0), (1, −2)}

Слайд 88

Reliable Queries In practice, we use the Minkowski sum of a

Reliable Queries

In practice, we use the Minkowski sum of a bounding

cube B and the triangle T
B = max(2dx, 2dy, 2dz) where dx,y,z are pixel dimensions
If uniform supersampling is known to occur on the card, we can reduce the size of B
We need B to cover at least 1 sampling point for the triangle it bounds
Слайд 89

Reliable Queries Cubes only work for z-axis projections so in practice

Reliable Queries

Cubes only work for z-axis projections so in practice use

a bounding sphere of radius sqrt(3)p/2
Слайд 90

Bounding Offset So far we’ve just dealt with single triangles but

Bounding Offset

So far we’ve just dealt with single triangles but we

need whole objects
This is done using a Union of Object-oriented Bounding Boxes(UOBB)
Слайд 91

Algorithm

Algorithm

Слайд 92

Improvement over CULLIDE

Improvement over CULLIDE

Слайд 93

Performance Still runs faster than CPU implementations 3x slower than CULLIDE

Performance

Still runs faster than CPU implementations
3x slower than CULLIDE due to

bounding box rasterization vs triangle rasterization
Слайд 94

QCULLIDE Extends CULLIDE to handle self collisions in complex meshes All running in real time

QCULLIDE

Extends CULLIDE to handle self collisions in complex meshes
All running in

real time
Слайд 95

Self Collision Culling Note that only intersecting triangles that don’t share

Self Collision Culling

Note that only intersecting triangles that don’t share a

vertex or edge are considered colliding
Слайд 96

Self Collision Culling Algorithm Include all potentially colliding primitives and PCS

Self Collision Culling

Algorithm
Include all potentially colliding primitives and PCS where each

primitive is a triangle
Perform the visibility test to see if a triangle is penetrating any other
If completely visible, the object is not colliding
Слайд 97

Q-CULLIDE Sets BFV – Objects fully visible in both passes and

Q-CULLIDE

Sets
BFV – Objects fully visible in both passes and are pruned

from the PCS
FFV – Fully visible in only the first pass
SFV – Fully visible in only the second pass
NFV – Not fully visible in both passes
Слайд 98

Q-CULLIDE Properties of sets FFV and SFV are collision free No

Q-CULLIDE

Properties of sets
FFV and SFV are collision free
No object in FFV

collides with any other in FFV…same for SFV
If an object is in FFV and is fully visible in the 2nd pass of the algorithm, we can prune it and vice versa
Слайд 99

Algorithm

Algorithm

Слайд 100

Algorithm

Algorithm

Слайд 101

What’s Happening

What’s Happening

Слайд 102

Improvement Over CULLIDE

Improvement Over CULLIDE

Слайд 103

Improvements Over CULLIDE Sends an order of magnitude less collisions to the CPU than CULLIDE

Improvements Over CULLIDE

Sends an order of magnitude less collisions to the

CPU than CULLIDE
Слайд 104

Spatial Subdivision Partition space into uniform grid Grid cell is at

Spatial Subdivision

Partition space into uniform grid
Grid cell is at least as

large as largest object
Each cell contains list of each object whose centroid is in the cell
Collision tests are performed between objects who are in same cell or adjacent cells

1

2

3

4

5

6

7

8

Images from pg 699, 700 GPU Gems III

O1

O2

O3

O4

Implementation:
Create list of object IDs along with hashing of cell IDs in which they reside
Sort list by cell ID
Traverse swaths of identical cell IDs
Perform collision tests on all objects that share same cell ID

1

2

3

4

5

6

7

8

Example

Слайд 105

Parallel Spatial Subdivision Complications: Single object can be involved in multiple

Parallel Spatial Subdivision

Complications:
Single object can be involved in multiple collision tests
Need

to prevent multiple threads updating the state of an object at the same time

Ways to solve this?

Слайд 106

Guaranteed Individual Collision Tests Prove: No two cells updated in parallel

Guaranteed Individual Collision Tests

Prove: No two cells updated in parallel may

contain the same object that is being updated
Constraints
Each cell is as large as the bounding volume of the largest object
Each cell processed in parallel must be separated by each other cell by at least one intervening cell
In 2d this takes _____ number of passes
In 3d this takes _____ number of passes

4

8

Слайд 107

Example of Parallel Spatial Subdivision O1 O2 O3 O4 1 2

Example of Parallel Spatial Subdivision

O1

O2

O3

O4

1

2

1

2

3

4

3

4

O1

O2

O3

O4

1

2

1

2

3

4

3

4

Слайд 108

Avoiding Extra Collision Testing Associate each object a set of control

Avoiding Extra Collision Testing

Associate each object a set of control bits

to test where its centroid resides
Scale the bounding sphere of each object by sqrt(2) to ensure the grid cell is at least 1.5 times larger than the largest object

1

2

1

2

3

4

3

4

Case 1

Case 2

Слайд 109

Implementing in CUDA Store list of object IDs, cell IDs in

Implementing in CUDA

Store list of object IDs, cell IDs in device

memory
Build the list of cell IDs from object’s bounding boxes
Sorting list from previous step
Build an index table to traverse the sorted list
Schedule pairs of objects for narrow phase collision detection
Слайд 110

Initialization Cell ID Array OBJ 1 Cell ID 1 OBJ 1

Initialization

Cell ID Array

OBJ 1 Cell ID 1
OBJ 1 Cell ID 2
OBJ

1 Cell ID 3
OBJ 1 Cell ID 4
OBJ 2 Cell ID 1
OBJ 2 Cell ID 2
OBJ 2 Cell ID 3
OBJ 2 Cell ID 4
.
.
.

Object ID Array

OBJ 1 ID, Control Bits
OBJ 1 ID, Control Bits
OBJ 1 ID, Control Bits
OBJ 1 ID, Control Bits
OBJ 2 ID, Control Bits
OBJ 2 ID, Control Bits
OBJ 2 ID, Control Bits
OBJ 2 ID, Control Bits
.
.
.

Слайд 111

Construct the Cell ID Array Host Cells (H – Cells) Contain

Construct the Cell ID Array

Host Cells (H – Cells)
Contain the centroid

of the object
Phantom Cells (P-Cells)
Overlap with bounding volume but do not contain the centroid

H-Cell Hash = (pos.x / CELLSIZE) << XSHIFT) |
(pos.y / CELLSIZE) << YSHIFT) |
(pos.z / CELLSIZE) << ZSHIFT)

P

P

P

P

H

P

P

P

P

P-Cells – Test the 3d-1 cells surrounding the H cell
There can be as many as 2d-1 P cells

Слайд 112

Sorting the Cell ID Array What we want: Sorted by Cell

Sorting the Cell ID Array

What we want:
Sorted by Cell ID
H cells

of an ID occur before P cells of an ID
Starting with a partial sort
H cells are before P cells, but array is not sorted by Cell ID
Solution:
Radix Sort
Radix Sort ensures identical cell IDs remain in the same order as before sorting.
Слайд 113

Sorting Cell Array 010 0 011 1 111 2 101 3

Sorting Cell Array

010
0

011
1

111
2

101
3

021
4

021
n

020
0

110
2

100
3

011
4

011
n

011
0

100
2

021
n

021
0

000
2

111
n

001
2

022
n

101
2

011
2

010
2

...

Cell ID Array

000
2

011
n

101
3

001
2

020
0

101
2

010
0

021
4

110
2

010
2

021
n

111
2

011
1

021
0

111
n

011
0

022
n

111
n

011
2

100
2

102
n

011
4

100
3

103
3

...

Sorted Cell ID Array

011
1

100
2

Invalid Cell

Home Cell

Phantom

Cell

103
3

Object ID

Cell ID

Legend

Слайд 114

Spatial Subdivision 1 2 3 4 5 6 7 8 Images

Spatial Subdivision

1

2

3

4

5

6

7

8

Images from pg 699, 700 GPU Gems III

O1

O2

O3

O4

1

2

3

4

5

6

7

8

Example

Assign to each

cell the list of bounding volumes whose objects intersect with the cell
Perform Collision test only if both objects are in the cell and one has a centroid in the cell
Слайд 115

Create the Collision Cell List Scan sorted cell ID array for

Create the Collision Cell List

Scan sorted cell ID array for changes

of cell ID
Mark by end of the list of occupants of one cell and beginning of another
Count number of objects each collision cell contains and convert them into offsets using scan
Create entries for each collision cell in new array
Start
Number of H occupants
Number of P occupants
Слайд 116

Create Collision Cell List 000 2 011 n 101 3 001

Create Collision Cell List

000
2

011
n

101
3

001
2

020
0

101
2

010
0

021
4

110
2

010
2

021
n

111
2

011
1

021
0

111
n

011
0

022
n

111
n

011
2

100
2

102
n

011
4

100
3

103
3

...

Sorted Cell ID Array

Cell Index & Size

Array

2
1 1

4
1 4

10
2 1

...

ID
H P

ID = Cell index in sorted Cell ID Array
H = Number of Home Cell IDs
P = Number of Phantom Cell IDs