
dev performance optimization


Performance Optimization Tips

GDC 2005 presentations

http://developer.nvidia.com/object/gdc_2005_presentations.html

locking & parallel processing

The most obvious way to limit how far the CPU runs ahead of the GPU is to lock the back buffer every frame in Direct3D (analogous to calling glFinish() in OpenGL). The lock forces the GPU to complete all pending graphics commands before the CPU moves on. However, it also removes any potential for asynchronous processing, because the CPU cannot start on the next frame until the current frame has finished rendering.

A better solution is double-buffered texture locking, a generalization of locking the back buffer. At the end of each frame, render a single triangle to a tiny (2x2) texture, then lock that texture and read its contents. So far this is equivalent to locking the back buffer and suffers the same kind of stalls: it ensures that the CPU never gets more than 1 frame ahead of the GPU.

Now generalize it: use two tiny textures, alternating which one you render to and which one you lock:

Render frame 1

Render a triangle to texture 0

Lock and read texture 1

Render frame 2

Render a triangle to texture 1

Lock and read texture 0

Render frame 3

Render a triangle to texture 0

Lock and read texture 1

Render frame 4

Render a triangle to texture 1

Lock and read texture 0

...

Now the GPU is not stalled, and the CPU never gets more than 2 frames ahead of the GPU. Lag is up to one frame, but overall efficiency is higher because the GPU is always busy (if you are GPU-bound). You can generalize further to triple-buffered textures, and you may even be able to insert multiple sync points per frame to get finer control over lag.
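The alternating schedule above can be checked with a toy model. The sketch below is not Direct3D code: SimulatedGpu and maxFramesInFlight are hypothetical names, and the "GPU" is just a counter that only advances when a lock forces it to, which is the worst case for lag. It counts how many frames the CPU can have started that the GPU has not yet finished, for a given number of sync textures.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy stand-in for the GPU: remembers the last frame it has finished.
// It is deliberately lazy and only catches up when the CPU blocks on it.
struct SimulatedGpu {
    int completedFrame = 0;

    // Models "lock and read texture": the CPU blocks until the GPU has
    // finished the frame that last rendered into that texture.
    void waitUntilDone(int frame) {
        completedFrame = std::max(completedFrame, frame);
    }
};

// Returns the largest gap observed between the frame the CPU is building
// and the last frame the GPU has finished, using the alternating
// render/lock schedule with `syncTextureCount` tiny textures.
int maxFramesInFlight(int syncTextureCount, int frameCount) {
    assert(syncTextureCount >= 1);
    SimulatedGpu gpu;
    // Frame number that last rendered a triangle into each sync texture
    // (0 means the texture has never been written).
    std::vector<int> lastWrittenIn(syncTextureCount, 0);
    int worst = 0;

    for (int frame = 1; frame <= frameCount; ++frame) {
        // The CPU starts building this frame now.
        worst = std::max(worst, frame - gpu.completedFrame);

        // "Render a triangle to texture (frame - 1) % N".
        lastWrittenIn[(frame - 1) % syncTextureCount] = frame;

        // "Lock and read texture frame % N".
        gpu.waitUntilDone(lastWrittenIn[frame % syncTextureCount]);
    }
    return worst;
}
```

In this model one texture reproduces the back-buffer lock (the CPU is held to 1 frame), two textures give the 2-frame bound described above, and three textures extend it to 3, at the cost of more lag.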

A second solution is to use DirectX 9's asynchronous query functionality (analogous to using fences in OpenGL). At the end of your frame, insert a D3DQUERYTYPE_EVENT query into your rendering stream. You can then poll with GetData to find out whether the GPU has reached this event yet. As with texture locking, you can thus ensure (by busy-waiting on the CPU) that the CPU never gets more than 2 frames ahead of the GPU, while the GPU is never idle. Similarly, it is conceivable to insert multiple queries per frame to get finer control over lag.
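The polling logic can be sketched independently of the graphics API. In the sketch below, FrameFence and FrameThrottle are hypothetical names, not part of DirectX; the poll callback stands in for the query. In a real Direct3D 9 renderer, one would presumably create an IDirect3DQuery9 with CreateQuery(D3DQUERYTYPE_EVENT, ...), call Issue(D3DISSUE_END) at the end of the frame, and have the callback report whether GetData(NULL, 0, D3DGETDATA_FLUSH) returned S_OK.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical wrapper around one end-of-frame event query.
// `poll` must return true once the GPU has passed the marker.
struct FrameFence {
    std::function<bool()> poll;
};

// Keeps the CPU from running more than `maxFramesAhead` frames ahead.
class FrameThrottle {
public:
    explicit FrameThrottle(std::size_t maxFramesAhead)
        : maxFramesAhead_(maxFramesAhead) {
        assert(maxFramesAhead >= 1);
    }

    // Call once per frame, right after issuing the end-of-frame marker.
    // Returns how many times a fence was polled during this call.
    std::size_t endFrame(FrameFence fence) {
        pending_.push_back(std::move(fence));
        std::size_t polls = 0;
        while (!pending_.empty()) {
            ++polls;
            if (pending_.front().poll()) {
                // The GPU has passed the oldest marker: retire it.
                pending_.pop_front();
            } else if (pending_.size() < maxFramesAhead_) {
                // Still within budget: return without blocking.
                break;
            }
            // Otherwise too many frames are outstanding: keep polling
            // the oldest marker (busy wait).
        }
        return polls;
    }

    // Frames submitted whose markers the GPU has not yet been seen to pass.
    std::size_t framesPending() const { return pending_.size(); }

private:
    std::size_t maxFramesAhead_;
    std::deque<FrameFence> pending_;
};
```

With maxFramesAhead set to 2, endFrame returns with at most one frame still outstanding, so the CPU can build the next frame while the GPU finishes the previous one. Setting it to 1 degenerates into a full sync every frame, like the back-buffer lock.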
