Advances in Real-Time Rendering in Games

Slide 2

GPU-Driven Rendering Pipelines
Ulrich Haar, Lead Programmer 3D, Ubisoft Montreal
Sebastian Aaltonen, Senior Lead Programmer, RedLynx, a Ubisoft Studio

SIGGRAPH 2015: Advances in Real-Time Rendering in Games

Slide 3

Topics

Motivation
Mesh Cluster Rendering
Rendering Pipeline Overview
Occlusion Depth Generation
Results and future work
SIGGRAPH 2015: Advances in Real-Time Rendering course

Slide 4

GPU-Driven Rendering?

GPU controls what objects are actually rendered
“draw scene” GPU-command
n viewports/frustums
GPU determines (sub-)object visibility
No CPU/GPU roundtrip
Prior work [SBOT08]

Slide 5

Motivation (RedLynx)
Modular construction using in-game level editor
High draw distance. Background built from small objects.
No baked lighting. Lots of draw calls from shadow maps.
CPU used for physics simulation and visual scripting

Slide 6

Motivation (Assassin’s Creed Unity)

Massive amounts of geometry: architecture

Slide 7

Motivation (Assassin’s Creed Unity)

Massive amounts of geometry: seamless interiors

Slide 8

Motivation (Assassin’s Creed Unity)

Massive amounts of geometry: crowds

Slide 9

Motivation (Assassin’s Creed Unity)

Modular construction (partially automated)
~10x instances compared to previous Assassin’s Creed games
CPU is the scarcest resource on consoles

Slide 10

Mesh Cluster Rendering

Fixed topology (64-vertex strip)
Split & rearrange all meshes to fit fixed topology (insert degenerate triangles)
Fetch vertices manually in VS from shared buffer [Riccio13]
DrawInstancedIndirect
GPU culling outputs cluster list & drawcall args
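
The combination of manual vertex fetch and one indirect draw can be emulated on the CPU to show how the pieces fit. This is a minimal sketch, assuming every cluster is stored as exactly 64 consecutive vertices in one shared buffer and that the culling pass outputs a list of visible cluster IDs; `emit_draw_args`, `vertex_shader` and `emulate_draw` are hypothetical names, and the argument record mirrors the four-value layout (vertex count per instance, instance count, start vertex, start instance) that DrawInstancedIndirect reads.

```python
from dataclasses import dataclass

# Hypothetical sketch: names and the shared-buffer layout are illustrative assumptions.
CLUSTER_VERTS = 64  # fixed strip length per cluster ("64 vertex strip")


@dataclass
class DrawInstancedArgs:
    # Same field order as the argument buffer consumed by DrawInstancedIndirect.
    vertex_count_per_instance: int
    instance_count: int
    start_vertex_location: int = 0
    start_instance_location: int = 0


def emit_draw_args(visible_clusters):
    """What the GPU culling pass would write: one instance per surviving cluster."""
    return DrawInstancedArgs(CLUSTER_VERTS, len(visible_clusters))


def vertex_shader(vertex_id, instance_id, visible_clusters, vertex_buffer):
    """Programmable vertex pulling: the instance ID selects a visible cluster,
    the vertex ID selects the vertex inside its fixed-size strip."""
    cluster_id = visible_clusters[instance_id]
    return vertex_buffer[cluster_id * CLUSTER_VERTS + vertex_id]


def emulate_draw(args, visible_clusters, vertex_buffer):
    """Invoke the 'vertex shader' for every (instance, vertex) pair.
    Assumes zero start locations, which is what emit_draw_args produces."""
    return [
        [vertex_shader(v, i, visible_clusters, vertex_buffer)
         for v in range(args.vertex_count_per_instance)]
        for i in range(args.instance_count)
    ]
```

With `visible_clusters = [2, 0]`, the first emulated strip reads vertices 128 to 191 and the second reads 0 to 63; the CPU only issues the draw and never needs to know which clusters survived.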

Slide 11

Mesh Cluster Rendering

Arbitrary number of meshes in a single drawcall
GPU-culled by cluster bounds [Greene93] [SBOT08] [Hill11]
Faster vertex fetch
Cluster depth sorting
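
Culling by cluster bounds usually starts with a frustum test before any occlusion test. A minimal sketch, assuming each cluster stores an axis-aligned bounding box and the frustum is given as inward-facing planes (a point p is inside when n·p + d >= 0); the names are hypothetical and the hierarchical occlusion part ([Greene93], [Hill11]) is not shown here.

```python
from dataclasses import dataclass

# Hypothetical sketch: plane convention and names are illustrative assumptions.


@dataclass
class Plane:
    nx: float
    ny: float
    nz: float
    d: float  # point p is inside when nx*px + ny*py + nz*pz + d >= 0


def aabb_outside(bmin, bmax, plane):
    """True if the whole box lies behind the plane. Only the 'positive vertex',
    the box corner furthest along the plane normal, needs to be tested."""
    px = bmax[0] if plane.nx >= 0 else bmin[0]
    py = bmax[1] if plane.ny >= 0 else bmin[1]
    pz = bmax[2] if plane.nz >= 0 else bmin[2]
    return plane.nx * px + plane.ny * py + plane.nz * pz + plane.d < 0


def cull_clusters(cluster_bounds, planes):
    """Indices of clusters whose AABB is not entirely outside any frustum plane."""
    return [i for i, (bmin, bmax) in enumerate(cluster_bounds)
            if not any(aabb_outside(bmin, bmax, p) for p in planes)]
```

The test is conservative: a box near a frustum corner can pass every plane while still being outside, which only costs some extra work and never drops visible geometry.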

Slide 12

Mesh Cluster Rendering (ACU)

Problems with triangle strips:
Memory increase due to degenerate triangles
Non-deterministic cluster order
MultiDrawIndexedInstancedIndirect:
One (sub-)drawcall per instance
64 triangles per cluster
Requires appending index buffer on the fly
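
The index-buffer variant needs a preprocessing step that cuts each mesh into 64-triangle clusters with their own bounds. A minimal sketch, assuming a plain indexed triangle list; it splits triangles in their existing order, whereas a real tool would likely reorder them first so that each cluster is spatially compact (tighter bounds cull better). The function name and dictionary keys are hypothetical.

```python
# Hypothetical sketch: sequential split, no spatial reordering of triangles.


def build_clusters(indices, positions, tris_per_cluster=64):
    """Split an indexed triangle list into clusters of at most `tris_per_cluster`
    triangles and compute an axis-aligned bounding box for each cluster.

    indices: flat list with 3 entries per triangle; positions: list of (x, y, z).
    """
    clusters = []
    tri_count = len(indices) // 3
    for first in range(0, tri_count, tris_per_cluster):
        last = min(first + tris_per_cluster, tri_count)
        verts = [positions[i] for i in indices[3 * first:3 * last]]
        clusters.append({
            "first_index": 3 * first,          # start in the mesh index buffer
            "index_count": 3 * (last - first),  # last cluster may be partial
            "bmin": tuple(min(v[a] for v in verts) for a in range(3)),
            "bmax": tuple(max(v[a] for v in verts) for a in range(3)),
        })
    return clusters
```

A 130-triangle mesh yields three clusters; the last holds two triangles and simply records a smaller `index_count` instead of padding with degenerate triangles.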

Slide 13

Rendering Pipeline Overview

[Diagram: pipeline stages, color-coded CPU / GPU]
CPU: COARSE FRUSTUM CULLING, BUILD BATCH HASH, UPDATE INSTANCE GPU DATA, BATCH DRAWCALLS
GPU: INSTANCE CULLING (FRUSTUM/OCCLUSION), CLUSTER CHUNK EXPANSION, CLUSTER CULLING (FRUSTUM/OCCLUSION/TRIANGLE BACKFACE), INDEX BUFFER COMPACTION, MULTI-DRAW

Slide 14

Rendering pipeline overview

CPU quad tree culling
Per-instance data:
E.g. transform, LOD factor...
Updated in GPU ring buffer
Persistent for static instances
Drawcall hash built on non-instanced data:
E.g. material, renderstate, …
Drawcalls merged based on hash
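
Merging drawcalls by their non-instanced state amounts to a grouping step. A minimal sketch, assuming a `DrawState` record holds the state that must match (material and render state here; the field names are hypothetical) and that each instance contributes an offset into the GPU instance-data buffer. A Python dict hashes the key and falls back to equality on collisions, so two different states are never merged by accident.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical sketch: the state fields and function name are illustrative.


@dataclass(frozen=True)
class DrawState:
    """Non-instanced state that must be identical for drawcalls to merge."""
    material_id: int
    render_state: int


def batch_drawcalls(instances):
    """instances: iterable of (DrawState, instance_data_offset).
    Returns {DrawState: [offsets, ...]}: one merged drawcall per distinct state,
    whose instance list is the sequence of offsets into the instance buffer."""
    batches = defaultdict(list)
    for state, offset in instances:
        batches[state].append(offset)
    return dict(batches)
```

Each resulting list of offsets corresponds to the per-instance stream that the GPU stages then cull and expand.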

Slide 15

Rendering Pipeline Overview

[Diagram: GPU stages MULTI-DRAW, INSTANCE CULLING (FRUSTUM/OCCLUSION), CLUSTER CHUNK EXPANSION, CLUSTER CULLING (FRUSTUM/OCCLUSION/TRIANGLE BACKFACE), INDEX BUFFER COMPACTION; input stream Instance0 to Instance3, each instance referencing Transform, Bounds and Mesh data]

This stream of instances contains a list of offsets into a GPU-buffer per instance that allows the GPU to access information like transform, instance bounds etc.

Slide 16

Rendering Pipeline Overview

[Diagram: cluster chunk expansion. Instances (Instance0 to Instance3) are expanded into (Instance Idx, Chunk Idx) entries: Chunk1_0, Chunk2_0, Chunk2_1, Chunk2_2]

Slide 17

Rendering Pipeline Overview

[Diagram: chunks (Chunk1_0, Chunk2_0 to Chunk2_2) expand into (Instance Idx, Cluster Idx) entries: Cluster1_0, Cluster1_1, Cluster2_0, Cluster2_1, ..., Cluster2_64]

Slide 18

Rendering Pipeline Overview

[Diagram: cluster culling. Cluster entries (Cluster1_0, Cluster1_1, Cluster2_0, Cluster2_1, ..., Cluster2_64) with a Triangle Mask and Read/Write Offsets; index output: Index1_1, Index2_1, ..., Index2_64]

Slide 19

Rendering Pipeline Overview

[Diagram: index buffer compaction. Index1_1, Index2_1, ..., Index2_64 are written to a compacted index buffer for Instance0, Instance1 and Instance2; entry labels: 0, 1, 0, 1, 0, 1, 2]

Slide 20

Rendering Pipeline Overview

[Diagram: compacted index buffer (entry labels 0, 1, 0, 1, 0, 1, 2) built from Index1_1, Index2_1, ..., Index2_64, annotated with the values 1, 1, 3, 64, 8]

Slide 21

Rendering Pipeline Overview

[Diagram: multi-draw. The compacted index buffer (entry labels 0, 1, 0, 1, 0, 1, 2; values 1, 1, 3, 64, 8) is submitted as Drawcall 0, Drawcall 1 and Drawcall 2]
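
The index buffer compaction walked through on the preceding slides can be sketched as count, scan and scatter. A minimal CPU sketch, assuming each culled cluster carries a 64-bit triangle mask and the offset of its 64 triangles in the source index buffer; `Cluster`, `compact` and `DrawIndexedArgs` are hypothetical names, with the argument record following the field order of an indexed indirect draw such as D3D12_DRAW_INDEXED_ARGUMENTS.

```python
from dataclasses import dataclass

# Hypothetical sketch: data layout and names are illustrative assumptions.
TRIS_PER_CLUSTER = 64


@dataclass
class Cluster:
    drawcall: int       # (sub-)drawcall this cluster belongs to
    first_index: int    # start of the cluster's indices in the source index buffer
    triangle_mask: int  # bit t set => triangle t survived culling


@dataclass
class DrawIndexedArgs:
    # Field order of an indexed indirect draw (as in D3D12_DRAW_INDEXED_ARGUMENTS).
    index_count_per_instance: int = 0
    instance_count: int = 1
    start_index_location: int = 0
    base_vertex_location: int = 0
    start_instance_location: int = 0


def compact(clusters, source_indices, num_drawcalls):
    """Append the surviving triangles of every cluster to one compacted index
    buffer and build one indexed draw per (sub-)drawcall."""
    # Stable sort keeps each drawcall's clusters contiguous and in input order.
    clusters = sorted(clusters, key=lambda c: c.drawcall)
    # 1) Surviving triangles per cluster: popcount of the triangle mask.
    counts = [bin(c.triangle_mask).count("1") for c in clusters]
    # 2) Exclusive prefix sum gives each cluster its write offset (in triangles).
    #    On the GPU this would be a parallel scan; an atomic add is simpler but
    #    does not guarantee a deterministic output order.
    offsets, total = [], 0
    for n in counts:
        offsets.append(total)
        total += n
    # 3) Each cluster copies its surviving triangles to its write offset.
    out = [0] * (3 * total)
    for c, off in zip(clusters, offsets):
        w = off
        for t in range(TRIS_PER_CLUSTER):
            if (c.triangle_mask >> t) & 1:
                src = c.first_index + 3 * t
                out[3 * w:3 * w + 3] = source_indices[src:src + 3]
                w += 1
    # 4) One indexed draw per drawcall covering its contiguous index range.
    args = [DrawIndexedArgs() for _ in range(num_drawcalls)]
    started = set()
    for c, off, n in zip(clusters, offsets, counts):
        a = args[c.drawcall]
        if c.drawcall not in started:
            a.start_index_location = 3 * off
            started.add(c.drawcall)
        a.index_count_per_instance += 3 * n
    return out, args
```

For two clusters, one in drawcall 1 keeping triangles 0 and 2 and one in drawcall 0 keeping triangle 0, the compacted buffer holds three triangles and drawcall 1 starts at index 3 with 6 indices.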

Slide 22

Static Triangle Backface Culling

Bake triangle visibility for pixel frustums of a cluster-centered cubemap
Cubemap lookup based on camera
Fetch 64 bits for visibility of all triangles in cluster

Slide 23

Static Triangle Backface Culling

Slide 24

Static Triangle Backface Culling

Only one pixel per cubemap face (6 bits per triangle)
Pixel frustum is cut at a distance to increase culling efficiency (possible false positives at oblique angles)
10-30% of triangles culled
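
The bake-and-lookup scheme can be sketched as follows, under stated assumptions: triangle positions are relative to the cluster center, front faces are counter-clockwise, cubemap faces are numbered +X, -X, +Y, -Y, +Z, -Z, and "cut at distance" is modeled by a hypothetical `near` parameter measured along the face axis. A triangle's bit is set for a face only if the triangle is back-facing for every camera position in that face's frustum beyond `near`. That region is the near cap plus the rays through its corners, so the corner tests below are sufficient. All names are hypothetical.

```python
# Hypothetical sketch: face numbering, winding and the `near` cut are assumptions.


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def face_of(direction):
    """Cubemap face containing `direction`: 2 * axis + (0 for +, 1 for -)."""
    axis = max(range(3), key=lambda a: abs(direction[a]))
    return 2 * axis + (0 if direction[axis] >= 0 else 1)


def near_cap_corners(face, near):
    """Corners of the face's frustum (apex at the cluster center) at axial distance `near`."""
    axis, negative = divmod(face, 2)
    u, v = [a for a in range(3) if a != axis]
    corners = []
    for su in (-near, near):
        for sv in (-near, near):
            c = [0.0, 0.0, 0.0]
            c[axis] = -near if negative else near
            c[u], c[v] = su, sv
            corners.append(c)
    return corners


def bake_masks(triangles, near):
    """masks[f] bit t = triangle t is back-facing from everywhere in face f's frustum
    beyond `near`: every cap corner lies behind the triangle plane, and no corner
    ray points toward its front side."""
    masks = [0] * 6
    for f in range(6):
        corners = near_cap_corners(f, near)
        for t, (v0, v1, v2) in enumerate(triangles):
            n = cross(sub(v1, v0), sub(v2, v0))
            if all(dot(n, c) < dot(n, v0) and dot(n, c) <= 0 for c in corners):
                masks[f] |= 1 << t
    return masks


def visible_mask(masks, camera_pos, cluster_center, tri_count=64):
    """Runtime lookup: one 64-bit fetch selected by the camera direction."""
    f = face_of(sub(camera_pos, cluster_center))
    return ~masks[f] & ((1 << tri_count) - 1)
```

A camera closer to the cluster than `near` lies outside the baked region, which is one way the false positives mentioned above can arise.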

Slide 25

Occlusion Depth Generation

Slide 26

Occlusion Depth Generation

Hierarchy

Depth pre-pass with best occluders
Rendered in full resolution for High-Z and Early-Z
Downsampled to 512x256
Combined with reprojection of last frame’s depth
Depth hierarchy for GPU culling

Slide 27

Occlusion Depth Generation

Hierarchy

300 best occluders (~600 µs)
Rendered in full resolution for High-Z and Early-Z
Downsampled to 512x256 (100 µs)
Combined with reprojection of last frame’s depth (50 µs)
Depth hierarchy for GPU culling (50 µs)
(*PS4 performance)
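
A depth hierarchy like this can be sketched as a max-reduction pyramid with a constant-cost rectangle test. A minimal sketch, assuming a conventional depth buffer (0 = near, 1 = far) and an object already projected to a screen rectangle plus its nearest depth; each coarser level keeps the farthest depth of its 2x2 footprint so the test stays conservative, and the test picks the level where the rectangle touches at most 2x2 texels (the four values a gather4 would return). Function names are hypothetical.

```python
# Hypothetical sketch: depth convention and names are illustrative assumptions.


def build_max_pyramid(depth):
    """depth: list of rows of floats. Returns [level0, level1, ...] down to 1x1,
    where each texel of a coarser level is the max of its 2x2 footprint."""
    levels = [depth]
    while len(levels[-1]) > 1 or len(levels[-1][0]) > 1:
        src = levels[-1]
        h, w = len(src), len(src[0])
        levels.append([
            [max(src[min(2 * y + dy, h - 1)][min(2 * x + dx, w - 1)]
                 for dy in (0, 1) for dx in (0, 1))
             for x in range((w + 1) // 2)]
            for y in range((h + 1) // 2)
        ])
    return levels


def is_occluded(levels, x0, y0, x1, y1, nearest_depth):
    """Rectangle [x0, x1] x [y0, y1] in level-0 pixels (inclusive). Occluded when
    the object's nearest point is behind the farthest occluder depth it covers."""
    level = 0
    while ((x1 >> level) - (x0 >> level) > 1 or (y1 >> level) - (y0 >> level) > 1) \
            and level < len(levels) - 1:
        level += 1
    lv = levels[level]
    farthest = max(lv[y][x]
                   for y in {y0 >> level, y1 >> level}
                   for x in {x0 >> level, x1 >> level})
    return nearest_depth > farthest
```

Using max rather than min in the reduction is what makes the test safe: a single hole in the occluder (a far depth value) propagates up the pyramid and keeps objects behind it visible.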

Slide 28

Shadow Occlusion Depth Generation

For each cascade:
Camera depth reprojection (~70 µs)
Combine with shadow depth reprojection (10 µs)
Depth hierarchy for GPU culling (30 µs)

Slide 29

Camera Depth Reprojection

Slide 30

Camera Depth Reprojection

Slide 31

Camera Depth Reprojection

Slide 32

Camera Depth Reprojection

Slide 33

Camera Depth Reprojection

Slide 34

Camera Depth Reprojection

Light Space Reprojection

Slide 35

Camera Depth Reprojection

Reprojection “shadow” of the building

Slide 36

Camera Depth Reprojection

Similar to [Silvennoinen12]
But the mask is not effective because of fog:
Cannot use min-depth
Cannot exclude far-plane
64x64 pixel reprojection
Could pre-process depth to remove redundant overdraw
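
One way to read the camera depth reprojection for shadow culling, in the spirit of [Silvennoinen12], is as a receiver bound in light space: a shadow caster only matters if, somewhere it covers, it is closer to the light than a surface the camera can actually see. A minimal sketch, assuming a directional light, light-space coordinates already computed for each camera-visible pixel (x, y in [0, 1), depth increasing away from the light), and casters described by a light-space rectangle plus their nearest depth. It splats points, so it leaves holes that a real implementation would need to fill conservatively, and it does not model the fog issues listed above. All names are hypothetical.

```python
# Hypothetical sketch: light-space conventions and names are illustrative assumptions.


def receiver_max_depth(points_ls, size):
    """points_ls: light-space (x, y, depth) of camera-visible surface points.
    Returns a size x size grid holding the farthest visible receiver per texel;
    texels without receivers stay at 0.0 (nothing visible there can be shadowed)."""
    grid = [[0.0] * size for _ in range(size)]
    for x, y, d in points_ls:
        tx, ty = int(x * size), int(y * size)
        if 0 <= tx < size and 0 <= ty < size:
            grid[ty][tx] = max(grid[ty][tx], d)
    return grid


def caster_needed(grid, x0, y0, x1, y1, caster_min_depth):
    """Conservative test: a caster covering [x0, x1] x [y0, y1] in light space can only
    shadow a visible receiver if it is nearer to the light than the farthest receiver
    in at least one covered texel."""
    size = len(grid)
    tx0, ty0 = max(int(x0 * size), 0), max(int(y0 * size), 0)
    tx1, ty1 = min(int(x1 * size), size - 1), min(int(y1 * size), size - 1)
    return any(caster_min_depth < grid[ty][tx]
               for ty in range(ty0, ty1 + 1)
               for tx in range(tx0, tx1 + 1))
```

Running the same max-reduction pyramid as for camera occlusion over this grid would turn the per-texel test into a constant-cost one per caster.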

Slide 37

Results

CPU:
1-2 orders of magnitude fewer drawcalls
~75% of previous AC, with ~10x objects
GPU:
20-40% of triangles culled (backface + cluster bounds)
Only small overall gain: <10% of geometry rendering
30-80% of shadow triangles culled
Work in progress:
More GPU-driven for static objects
More batch-friendly data

Slide 38

Future

Bindless textures
GPU-driven vs. DX12/Vulkan

Slide 39

RedLynx Topics

Virtual Texturing in GPU-Driven Rendering
Virtual Deferred Texturing
MSAA Trick
Two-Phase Occlusion Culling
Virtual Shadow Mapping

Slide 40

Virtual Texturing

Key idea: Keep only the visible texture data in memory [Hall99]
Virtual 256k² texel atlas
128² texel pages
8k² texture page cache
5-slice texture array: albedo, specular, roughness, normal, etc.
DXT compressed (BC5 / BC3)
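
The sizes above imply a two-level address translation from the virtual atlas to the page cache. A minimal sketch, assuming a single mip level, no page borders for filtering, and a page table given as a dict from virtual page to cache slot; `translate` and the table layout are hypothetical.

```python
# Hypothetical sketch: single-mip page lookup, no page borders; sizes from the slide.
VIRTUAL_SIZE = 256 * 1024   # virtual atlas, texels per side (256k²)
PAGE_SIZE = 128             # page, texels per side (128²)
CACHE_SIZE = 8 * 1024       # physical page cache, texels per side (8k²)
PAGES_PER_SIDE = VIRTUAL_SIZE // PAGE_SIZE        # 2048 virtual pages per side
CACHE_PAGES_PER_SIDE = CACHE_SIZE // PAGE_SIZE    # 64 cache slots per side


def translate(u, v, page_table):
    """Map a virtual UV in [0, 1) to a UV in the physical page cache.

    page_table maps (page_x, page_y) -> (slot_x, slot_y) for resident pages.
    Returns None on a miss, where a real system would request the page and
    fall back to a coarser, resident mip level.
    """
    tx, ty = u * VIRTUAL_SIZE, v * VIRTUAL_SIZE
    page = (int(tx) // PAGE_SIZE, int(ty) // PAGE_SIZE)
    slot = page_table.get(page)
    if slot is None:
        return None
    # Keep the texel offset inside the page; only the page origin moves.
    px = slot[0] * PAGE_SIZE + (tx - page[0] * PAGE_SIZE)
    py = slot[1] * PAGE_SIZE + (ty - page[1] * PAGE_SIZE)
    return px / CACHE_SIZE, py / CACHE_SIZE
```

The numbers show the scale of the indirection: 2048 × 2048 virtual pages map onto only 64 × 64 = 4,096 resident cache slots.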

Slide 41

GPU-Driven Rendering with VT

Virtual texturing is the biggest difference between our renderer and AC Unity’s
Key feature: All texture data is available at once, using just a single texture binding
No need to batch by textures!

Slide 42

Single Draw Call Rendering

Viewport = single draw call (x2)
Dynamic branching for different vertex animation types
Fast on modern GPUs (+2% cost)
Cluster depth sorting provides gain similar to depth prepass
Cheap OIT with inverse sort
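
Sorting clusters by depth on the GPU is typically done with an integer sort. A minimal sketch, assuming each cluster has a positive view-space depth (for example of its bounds center): non-negative IEEE-754 floats order the same way as their bit patterns read as unsigned integers, so a 32-bit key is enough, and the same order reversed gives back-to-front drawing for the cheap transparency. The function names are hypothetical, and Python's `sorted` stands in for a GPU radix sort.

```python
import struct

# Hypothetical sketch: assumes positive view-space depths; names are illustrative.


def depth_key(view_depth):
    """32-bit sort key: the bit pattern of a non-negative float as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", view_depth))[0]


def sort_clusters(cluster_depths, transparent=False):
    """Front-to-back order for opaque clusters (early depth rejection, close to a
    depth prepass); the inverse order for transparent clusters."""
    order = sorted(range(len(cluster_depths)),
                   key=lambda i: depth_key(cluster_depths[i]))
    return order[::-1] if transparent else order
```

Back-to-front order at cluster granularity only approximates order-independent transparency: triangles inside one cluster, and intersecting clusters, can still blend in the wrong order.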

Slide 43

Additional VT Advantages

Complex material blends and decal rendering results are stored in the VT page cache
Data reuse amortizes costs over hundreds of frames
Constant memory footprint, regardless of texture resolution and the number of assets

Slide 44

Virtual Deferred Texturing

Old idea: Store UVs in the G-buffer instead of texels [Aufderheide07]
Key feature: The VT page cache atlas contains all the currently visible texture data
16+16 bit UV into the 8k² texture atlas gives us 8 x 8 subpixel filtering precision

[Diagram labels: height, albedo, roughness, specular, ambient, normal, tangent frame, UV]
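
The precision claim follows from simple arithmetic: 2^16 / 8192 = 8 quantization steps per texel on each axis. A minimal sketch of the 16+16 bit packing, assuming UVs in [0, 1) relative to the page cache atlas and round-to-nearest quantization; the function names are hypothetical.

```python
# Hypothetical sketch: quantization scheme and names are illustrative assumptions.
ATLAS_SIZE = 8192                     # 8k² page cache atlas, texels per side
SUBTEXEL_STEPS = 65536 // ATLAS_SIZE  # 8 quantization steps per texel per axis


def pack_uv(u, v):
    """Quantize atlas UVs in [0, 1) to 16 bits each and pack them into 32 bits."""
    qu = min(int(u * 65536.0 + 0.5), 65535)
    qv = min(int(v * 65536.0 + 0.5), 65535)
    return (qv << 16) | qu


def unpack_uv(packed):
    """Inverse of pack_uv; the error is at most half a step, 1/16 of a texel."""
    return (packed & 0xFFFF) / 65536.0, (packed >> 16) / 65536.0
```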

Slide 45

Gradients and Tangent Frame

Calculate pixel gradients in screen space. UV distance is used to detect neighbors.
No neighbors found → bilinear
Tangent frame stored as a 32-bit quaternion [Frykholm09]
Implicit mip and material ID from VT. Page = UV.xy / 128.

[Diagram labels: height, albedo, roughness, specular, ambient, normal, tangent frame, UV]
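
The neighbor test can be sketched per pixel. A minimal sketch, assuming the G-buffer UVs are read back as a grid of (u, v) pairs and that a neighbor belongs to the same surface when its UV lies within a hypothetical threshold `max_uv_dist` of the center UV (which also rejects neighbors that sit on a different page of the cache). The +1 neighbor is preferred and the -1 neighbor is used negated; if either axis has no valid neighbor, the sketch returns zero gradients, meaning plain bilinear filtering. Names and the fallback rule are assumptions.

```python
# Hypothetical sketch: threshold, neighbor order and fallback rule are assumptions.


def axis_gradient(uv_c, uv_pos, uv_neg, max_uv_dist):
    """UV change per pixel along one screen axis, or None if no neighbor qualifies."""
    for uv_n, sign in ((uv_pos, 1.0), (uv_neg, -1.0)):
        if uv_n is None:
            continue
        du, dv = uv_n[0] - uv_c[0], uv_n[1] - uv_c[1]
        if max(abs(du), abs(dv)) <= max_uv_dist:
            return (sign * du, sign * dv)
    return None


def reconstruct_gradients(uv, x, y, max_uv_dist):
    """uv: rows of (u, v) read from the G-buffer. Returns (ddx, ddy).
    Zero gradients mean 'no neighbors found': fall back to bilinear filtering."""
    h, w = len(uv), len(uv[0])

    def at(xx, yy):
        return uv[yy][xx] if 0 <= xx < w and 0 <= yy < h else None

    c = uv[y][x]
    ddx = axis_gradient(c, at(x + 1, y), at(x - 1, y), max_uv_dist)
    ddy = axis_gradient(c, at(x, y + 1), at(x, y - 1), max_uv_dist)
    if ddx is None or ddy is None:
        return (0.0, 0.0), (0.0, 0.0)
    return ddx, ddy
```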

Slide 46

Recap & Advantages

64 bits. Full fill rate. No MRT.
Overdraw is dirt cheap
Texturing deferred to lighting CS
Quad efficiency less important
Virtual texturing page ID pass is no longer needed

[Diagram labels: height, albedo, roughness, specular, ambient, normal, tangent frame, UV]

Slide 47

Gradient reconstruction quality

[Figure panels: Ground truth | Reconstructed | Difference (x4)]

Slide 48

MSAA Trick

Key Observation: UV and tangent can be interpolated
Idea: Render the scene at 2x2 lower resolution (540p) with an ordered-grid 4xMSAA pattern
Use Texture2DMS.Load() to read each sample separately in the lighting compute shader

Slide 49

1080p Reconstruction

Reconstruct 1080p into LDS
Edge pixels are perfectly reconstructed: MSAA runs the pixel shader for both sides.
Interpolate the inner pixels’ UV and tangent
Quality is excellent. Differences are hard to spot.
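
The reconstruction can be sketched for one attribute channel. With a 2x2 ordered grid, each 1080p pixel maps to one sample of a 540p pixel. Where the four samples differ (an edge), the sample already holds the exact value; where they are all equal, the pixel shader ran once at the pixel center, so the value is interpolated a quarter pixel toward the neighbors. A minimal sketch, assuming the sample index is `2 * (y & 1) + (x & 1)` (the real order depends on the programmed sample pattern), that equal samples identify interior pixels, and that the attribute is linear across neighboring pixels; it runs per pixel instead of in LDS, and all names are hypothetical. Apply it per UV and tangent component.

```python
# Hypothetical sketch: sample ordering and the interior test are assumptions.


def _is_interior(samples):
    """All four samples equal: one triangle covered the whole pixel, so MSAA ran the
    pixel shader once and every sample holds the pixel-center value."""
    return all(s == samples[0] for s in samples)


def _delta_toward(lowres, lx, ly, dx, dy):
    """f(neighbor) - f(center) one low-res pixel toward (dx, dy), mirrored at borders
    and at edge pixels; exact when f is linear across the neighborhood. 0 if unusable."""
    h, w = len(lowres), len(lowres[0])
    center = lowres[ly][lx][0]
    for sign in (1, -1):
        nx, ny = lx + sign * dx, ly + sign * dy
        if 0 <= nx < w and 0 <= ny < h and _is_interior(lowres[ny][nx]):
            return sign * (lowres[ny][nx][0] - center)
    return 0.0


def reconstruct_full_res(lowres, x, y):
    """Value of full-resolution pixel (x, y) from a half-resolution 4xMSAA target.
    lowres[ly][lx] is a list of 4 samples."""
    lx, ly = x // 2, y // 2
    samples = lowres[ly][lx]
    if not _is_interior(samples):
        return samples[2 * (y & 1) + (x & 1)]  # edge pixel: the sample is exact
    dx = 1 if x & 1 else -1  # the sample sits a quarter low-res pixel
    dy = 1 if y & 1 else -1  # away from the pixel center on each axis
    return (samples[0]
            + 0.25 * _delta_toward(lowres, lx, ly, dx, 0)
            + 0.25 * _delta_toward(lowres, lx, ly, 0, dy))
```

For an attribute that varies linearly across the screen, the interior path reproduces the full-resolution value exactly, which is consistent with differences being hard to spot.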

 
Slide 50

8xMSAA Trick Benchmark

128 bpp G-Buffer
One pixel is a 2x2 tile of “2xMSAA pixels”
Xbox One: 1080p + MSAA + 60 fps ☺

Slide 51

Two-Phase Occlusion Culling

No extra occlusion pass with low-poly proxy geometry
Precise WYSIWYG occlusion
Based on depth buffer data
Depth pyramid generated from HTILE min/max buffer
O(1) occlusion test (gather4)

Slide 52

Two-Phase Occlusion Culling

1st phase
Cull objects & clusters using last frame’s depth pyramid
Render visible objects
2nd phase
Refresh depth pyramid
Test culled objects & clusters
Render false negatives

[Diagram: First phase (Object list, Object culling, Cluster culling, Depth sort clusters, Draw), Downsample, Second phase (Occluded objects, Occluded clusters, Obj. occlusion culling, Clu. occlusion culling, Depth sort clusters, Draw)]
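
The two phases can be sketched as control flow around three callbacks. A minimal sketch, assuming hypothetical callbacks for the occlusion test, for drawing (which also writes depth) and for building the pyramid from the current depth buffer; it works on whole objects, while the slide applies the same idea to clusters as well. Objects rejected in phase 1 that pass in phase 2 are the false negatives, and the pyramid returned at the end is what the next frame treats as "last frame's".

```python
# Hypothetical sketch: callback signatures and the toy scene are illustrative.


def two_phase_cull(objects, is_visible, render, build_pyramid, prev_pyramid):
    """is_visible(obj, pyramid) -> bool; render(objs) draws and writes depth;
    build_pyramid() downsamples the current depth buffer into a pyramid."""
    # Phase 1: test against last frame's pyramid and draw what passes.
    phase1, rejected = [], []
    for obj in objects:
        (phase1 if is_visible(obj, prev_pyramid) else rejected).append(obj)
    render(phase1)
    # Phase 2: refresh the pyramid from this frame's depth and retest the rejects.
    pyramid = build_pyramid()
    false_negatives = [obj for obj in rejected if is_visible(obj, pyramid)]
    render(false_negatives)
    # Rebuilt once more so the next frame sees everything drawn this frame.
    return build_pyramid()


def demo():
    """Toy scene: objects are (column, depth); a 'pyramid' maps column -> nearest depth."""
    data = {"A": (0, 5.0), "B": (1, 2.0), "C": (1, 3.0)}
    depth, log = {}, []

    def is_visible(obj, pyramid):
        col, d = data[obj]
        return d <= pyramid.get(col, float("inf"))

    def render(objs):
        log.append(list(objs))
        for obj in objs:
            col, d = data[obj]
            depth[col] = min(depth.get(col, float("inf")), d)

    # Last frame an occluder at depth 1.0 covered column 0; it has since moved away.
    nxt = two_phase_cull(["A", "B", "C"], is_visible, render, lambda: dict(depth), {0: 1.0})
    return log, nxt
```

In the toy scene, object A is wrongly rejected by the stale pyramid in phase 1 and recovered in phase 2, so the frame is still correct even though the occluder moved.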

Slide 53

Benchmark

“Torture” unit test scene
250,000 separate moving objects
1 GB of mesh data (10k+ meshes)
8k² texture cache atlas
DirectX 11 code path
64-vertex clusters (strips)
No ExecuteIndirect / MultiDrawIndirect
Only two DrawInstancedIndirect calls

Slide 54

Benchmark Results

CPU time: 0.2 milliseconds (single Jaguar CPU core)

Xbox One, 1080p

Slide 55

Virtual Shadow Mapping

128k² virtual shadow map
256² texel pages
Identify needed shadow pages from the z-buffer [Fernando01].
Cull shadow pages with the GPU-driven pipeline.
Render all pages at once.
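
Identifying the needed pages from the z-buffer reduces to projecting every on-screen pixel into the shadow map and collecting page coordinates: 128k / 256 gives 512 × 512 virtual pages. A minimal sketch, assuming the light-space UVs of the reconstructed world positions are already available and using a single virtual resolution (an adaptive scheme in the style of [Fernando01] would also select a resolution per pixel); the function name is hypothetical.

```python
# Hypothetical sketch: single-resolution virtual shadow map; sizes from the slide.
SM_SIZE = 128 * 1024      # virtual shadow map, texels per side (128k²)
PAGE = 256                # page, texels per side (256²)
PAGES = SM_SIZE // PAGE   # 512 pages per side


def needed_pages(light_uvs):
    """light_uvs: light-space (u, v) in [0, 1) for every on-screen pixel, i.e. the
    z-buffer reconstructed to world space and projected with the light matrix.
    Returns the set of virtual pages that must be allocated and rendered."""
    pages = set()
    for u, v in light_uvs:
        if 0.0 <= u < 1.0 and 0.0 <= v < 1.0:
            pages.add((int(u * SM_SIZE) // PAGE, int(v * SM_SIZE) // PAGE))
    return pages
```

The resulting page set is what the GPU-driven pipeline would then cull shadow casters against before rendering all pages at once.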

Slide 56

VTSM Quality and Performance

Close to 1:1 shadow-to-screen resolution in all areas
Measured:
Up to 3.5x faster than SDSM [Lauritzen10] in complex “sparse” scenes
Virtual SM slightly slower than SDSM & CSM in simple scenes

Slide 57

GPU-Driven Rendering + DX12

NEW DX12 (PC) FEATURES
ExecuteIndirect
Asynchronous Compute
VS RT index (GS bypass)
Resource management
Explicit multiadapter
Tiled resources + bindless
Conservative raster + ROV

FEATURES IN OTHER APIs
Custom MSAA patterns
GPU side dispatch
SIMD lane swizzles
Ordered atomics
SV_Barycentric to PS
Exposed CSAA/EQAA samples
Shading language with templates

Slide 58

References

[SBOT08] Shopf, J., Barczak, J., Oat, C., Tatarchuk, N. March of the Froblins: simulation and rendering massive crowds of intelligent and detailed creatures on GPU, SIGGRAPH 2008.
[Persson12] Merge-Instancing, SIGGRAPH 2012.
[Greene93] Hierarchical Z-buffer visibility, SIGGRAPH 1993.
[Hill11] Practical, Dynamic Visibility for Games, GPU Pro 2, 2011.
[Decoret05] N-Buffers for efficient depth map query, Computer Graphics Forum, Volume 24, Number 3, 2005.
[Zhang97] Visibility Culling using Hierarchical Occlusion Maps, SIGGRAPH 1997.
[Riccio13] Introducing the Programmable Vertex Pulling Rendering Pipeline, GPU Pro 4, 2013.
[Silvennoinen12] Chasing Shadows, GDMag Feb/2012.
[Hall99] Virtual Textures, Texture Management in Silicon, 1999.
[Aufderheide07] Deferred Texture mapping?, 2007.
[Reed14] Deferred Texturing, 2014.
[Frykholm09] The BitSquid low level animation system, 2009.
[Fernando01] Adaptive Shadow Maps, SIGGRAPH 2001.
[Lauritzen10] Sample Distribution Shadow Maps, SIGGRAPH 2010.

Slide 59

Acknowledgements

Stephen Hill
Roland Kindermann
Jussi Knuuttila
Jalal Eddine El Mansouri
Tiago Rodrigues
Lionel Berenguier
Stephen McAuley
Ivan Nevraev