Common C++ Performance Mistakes in Games

Содержание

Слайд 2

About the Data ATG reviews code to find bottlenecks and make

About the Data

ATG reviews code to find bottlenecks and make perf

recommendations
50 titles per year
96% use C++
1 in 3 use “advanced” features like templates or generics
Слайд 3

Why This Talk Is Important The majority of Xbox games are

Why This Talk Is Important

The majority of Xbox games are CPU

bound
The CPU bottleneck is often a language or C++ library issue
These issues are not usually specific to the platform
Слайд 4

Format Definition of the problem Examples Recommendation For reference A frame

Format

Definition of the problem
Examples
Recommendation
For reference
A frame is 17 or 33 ms

(60fps / 30fps)
Bottlenecks given in ms per frame
Слайд 5

Issue: STL Game using std::list Adding ~20,000 objects every frame Rebuilding

Issue: STL

Game using std::list
Adding ~20,000 objects every frame
Rebuilding the list every

frame
Time spent: 6.5 ms/frame!
~156K overhead (2 pointers per node)
Objects spread all over the heap
Слайд 6

std::set and map Many games use set/map as sorted lists Inserts

std::set and map

Many games use set/map as sorted lists
Inserts are slow

(log(N))
Memory overhead: 3 ptrs + color
Worst case in game: 3.8 ms/frame
Слайд 7

std::vector Hundreds of push_back()s per frame VS7.1 expands vector by 50%

std::vector

Hundreds of push_back()s per frame
VS7.1 expands vector by 50%
Question: How many

reallocations for 100 push_back()s?
Answer: 13! (1,2,3,4,5,7,10,14,20,29,43,64,95)
Слайд 8

Clearly, the STL is Evil

Clearly, the STL is Evil

Слайд 9

Use the Right Tool for the Job The STL is powerful,

Use the Right Tool for the Job

The STL is powerful, but

it’s not free
Filling any container is expensive
Be aware of container overhead
Be aware of heap fragmentation and cache coherency
Prefer vector, vector::reserve()
Слайд 10

The STL is Evil, Sometimes The STL doesn’t solve every problem

The STL is Evil, Sometimes

The STL doesn’t solve every problem
The STL

solves some problems poorly
Sometimes good old C-arrays are the perfect container
Mike Abrash puts it well:
“The best optimizer is between your ears”
Слайд 11

Issue: NIH Syndrome Example: Custom binary tree Sorted list of transparent

Issue: NIH Syndrome

Example: Custom binary tree
Sorted list of transparent objects
Badly unbalanced
1

ms/frame to add only 400 items
Example: Custom dynamic array class
Poorer performance than std::vector
Fewer features
Слайд 12

Optimizations that Aren’t void appMemcpy( void* d, const void* s, size_t

Optimizations that Aren’t

void appMemcpy( void* d, const void* s, size_t b

)
{
// lots of assembly code here ...
}
appMemcpy( pDest, pSrc, 100 ); // bottleneck
appMemcpy was slower than memcpy for anything under 64K
Слайд 13

Invent Only What You Need std::set/map more efficient than the custom

Invent Only What You Need

std::set/map more efficient than the custom tree

by 10X
Tested and proven
Still high overhead
An even better solution
Unsorted vector or array
Sort once
20X improvement
Слайд 14

Profile Run your profiler Rinse. Repeat. Prove the improvement. Don’t rewrite

Profile

Run your profiler
Rinse. Repeat.
Prove the improvement.
Don’t rewrite the C runtime or

STL just because you can. There are more interesting places to spend your time.
Слайд 15

Issue: Tool Knowledge If you’re a programmer, you use C/C++ every

Issue: Tool Knowledge

If you’re a programmer, you use C/C++ every day
C++

is complex
CRT and STL libraries are complex
The complexities matter
Sometimes they really matter
Слайд 16

vector::clear Game reused global vector in frame loop clear() called every

vector::clear

Game reused global vector in frame loop
clear() called every frame to

empty the vector
C++ Standard
clear() erases all elements (size() goes to 0)
No mention of what happens to vector capacity
On VS7.1/Dinkumware, frees the memory
Every frame reallocated memory
Слайд 17

Zero-Initialization struct Array { int x[1000]; }; struct Container { Array

Zero-Initialization

struct Array { int x[1000]; };
struct Container {
Array arr;
Container()

: arr() { }
};
Container x; // bottleneck
Costing 3.5 ms/frame
Removing : arr() speeds this by 20X
Слайд 18

Know Thine Holy Standard Use resize(0) to reduce container size without

Know Thine Holy Standard

Use resize(0) to reduce container size without affecting

capacity
T() means zero-initialize PODs. Don’t use T() unless you mean it.
Get a copy of the C++ Standard. Really.
www.techstreet.com; search on 14882
Only $18 for the PDF
Слайд 19

Issue: C Runtime void BuildScore( char* s, int n ) {

Issue: C Runtime

void BuildScore( char* s, int n )
{
if( n

> 0 )
sprintf( s, “%d”, n );
else
sprintf( s, “” );
}
n was often zero
sprintf was a hotspot
Слайд 20

qsort Sorting is important in games qsort is not an ideal

qsort

Sorting is important in games
qsort is not an ideal sorting function
No

type safety
Comparison function call overhead
No opportunity for compiler inlining
There are faster options
Слайд 21

Clearly, the CRT is Evil

Clearly, the CRT is Evil

Слайд 22

Understand Your Options itoa() can replace sprintf( s, “%d”, n )

Understand Your Options

itoa() can replace sprintf( s, “%d”, n )
*s

= ‘\0’ can replace sprintf( s, “” )
std::sort can replace qsort
Type safe
Comparison can be inlined
Other sorting options can be even faster: partial_sort, partition
Слайд 23

Issue: Function Calls 50,000-100,000 calls/frame is normal At 60Hz, Xbox has

Issue: Function Calls

50,000-100,000 calls/frame is normal
At 60Hz, Xbox has 12.2M cycles/frame
Function

call/return averages 20 cycles
A game calling 61,000 functions/frame spends 10% CPU (1.7 ms/frame) in function call overhead
Слайд 24

Extreme Function-ality 120,000 functions/frame 140,000 functions/frame 130,000 calls to a single

Extreme Function-ality

120,000 functions/frame
140,000 functions/frame
130,000 calls to a single function/frame (ColumnVec<3,float>::operator[])
And the

winner:
340,000 calls per frame!
9 ms/frame of call overhead
Слайд 25

Beware Elegance Elegance → levels of indirection → more functions →

Beware Elegance

Elegance → levels of indirection → more functions → perf

impact
Use algorithmic solutions first
One pass through the world
Better object rejection
Do AI/physics/networking less often than once/frame
Слайд 26

Inline Judiciously Remember: inline is a suggestion Try “inline any suitable”

Inline Judiciously

Remember: inline is a suggestion
Try “inline any suitable” compiler option
15

to 20 fps
68,000 calls down to 47,000
Try __forceinline or similar keyword
Adding to 5 funcs shaved 1.5 ms/frame
Don’t over-inline
Слайд 27

Issue: for loops // Example 1: Copy indices to push buffer

Issue: for loops

// Example 1: Copy indices to push buffer
for( DWORD

i = 0; i < dwIndexCnt; ++i )
*pPushBuffer++ = arrIndices[ i ];
// Example 2: Initialize vector array
for( DWORD i = 0; i < dwMax; ++i )
mVectorArr[i] = XGVECTOR4(0,0,0,0);
// Example 3: Process items in world
for( itr i = c.begin(); i < c.end(); ++i )
Process( *i );
Слайд 28

Watch Out For For Never copy/clear a POD with a for

Watch Out For For

Never copy/clear a POD with a for loop
std::algorithms

are optimized; use them
memcpy( pPushBuffer, arrIndices,
dwIndexCnt * sizeof(DWORD) );
memset( mVectorArr, 0, dwMax * sizeof(XGVECTOR4) );
for_each( c.begin(), c.end(), Process );
Слайд 29

Issue: Exception Handling Most games never throw Most games never catch

Issue: Exception Handling

Most games never throw
Most games never catch
Yet, most games

enable EH
EH adds code to do stack unwinding
A little bit of overhead to a lot of code
10% size increase is common
2 ms/frame in worst case
Слайд 30

Disable Exception Handling Don’t throw or catch exceptions Turn off the

Disable Exception Handling

Don’t throw or catch exceptions
Turn off the C++ EH

compiler option
For Dinkumware STL
Define “_HAS_EXCEPTIONS=0”
Write empty _Throw and _Raise_handler; see stdthrow.cpp and raisehan.cpp in crt folder
Add #pragma warning(disable: 4530)
Слайд 31

Issue: Strings Programmers love strings Love hurts ~7000 calls to stricmp

Issue: Strings

Programmers love strings
Love hurts
~7000 calls to stricmp in frame loop
1.5

ms/frame
Binary search of a string table
2 ms/frame
Слайд 32

Avoid strings String comparisons don’t belong in the frame loop Put

Avoid strings

String comparisons don’t belong in the frame loop
Put strings in

an table and compare indices
At least optimize the comparison
Compare pointers only
Prefer strcmp to stricmp
Слайд 33

Issue: Memory Allocation Memory overhead Xbox granularity/overhead is 16/16 bytes Overhead

Issue: Memory Allocation

Memory overhead
Xbox granularity/overhead is 16/16 bytes
Overhead alone is often

1+ MB
Too many allocations
Games commonly do thousands of allocations per frame
Cost: 1-5 ms/frame
Слайд 34

Hidden Allocations push_back(), insert() and friends typically allocate memory String constructors

Hidden Allocations

push_back(), insert() and friends typically allocate memory
String constructors allocate
Init-style calls

often allocate
Temporary objects, particularly string constants that convert to string objects
Слайд 35

Minimize Per-Frame Allocations Use memory-friendly data structures, e.g. arrays, vectors Reserve

Minimize Per-Frame Allocations

Use memory-friendly data structures, e.g. arrays, vectors
Reserve memory in

advance
Use custom allocators
Pool same-size allocations in a single block of memory to avoid overhead
Use the explicit keyword to avoid hidden temporaries
Avoid strings
Слайд 36

Other Tidbits Compiler settings: experiment dynamic_cast: just say no Constructors: performance

Other Tidbits

Compiler settings: experiment
dynamic_cast: just say no
Constructors: performance killers
Unused static array

space: track this
Loop unrolling: huge wins, sometimes
Suspicious comments: watch out
“Immensely slow matrix multiplication”
Слайд 37

Wrap Up Use the Right Tool for the Job The STL

Wrap Up

Use the Right Tool for the Job
The STL is Evil,

Sometimes
Invent Only What You Need
Profile
Know Thine Holy Standard
Understand Your Options

Beware Elegance
Inline Judiciously
Watch Out For For
Disable Exception Handling
Avoid Strings
Minimize Per-frame Allocations

Слайд 38

Call to Action: Evolve! Pass the rubber chicken Share your C++

Call to Action: Evolve!

Pass the rubber chicken
Share your C++ performance mistakes

with your team
Mentor junior programmers
So they only make new mistakes
Don’t stop learning
You can never know enough C++