dev performance optimization

Performance Optimization Tips

GDC 2005 presentations

    http://developer.nvidia.com/object/gdc_2005_presentations.html

locking & parallel processing

The most obvious way to keep the CPU and GPU in step is to lock the back buffer every frame in Direct3D (analogous to calling glFinish() in OpenGL). This ensures that the GPU has completed all pending graphics commands before the CPU moves on. However, it also removes any potential for asynchronous processing: the CPU cannot start on the next frame until the current frame has finished rendering.

A better solution is double-buffered texture locking, a generalization of locking the back buffer. At the end of each frame you render a single triangle to a tiny (2x2) texture, then lock that texture and read its contents. With a single texture this is equivalent to locking the back buffer and suffers the same kind of stalls: it ensures that the CPU never gets more than 1 frame ahead of the GPU.

Now generalize it: use two tiny textures and alternately render to them and alternately lock them:

Render frame 1

Render a triangle to texture 0

Lock and read texture 1

Render frame 2

Render a triangle to texture 1

Lock and read texture 0

Render frame 3

Render a triangle to texture 0

Lock and read texture 1

Render frame 4

Render a triangle to texture 1

Lock and read texture 0

...

Now the GPU does not get stalled, and the CPU never gets more than 2 frames ahead of the GPU. Lag is up to one frame, but overall efficiency is higher since the GPU is always busy (if you are GPU bound). You can generalize further to triple-buffered textures, and you may even be able to insert multiple sync points per frame to get finer control over lag.
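The effect of the number of sync textures can be checked with a small timeline model. The following C++ sketch is not Direct3D code: it only simulates a CPU that needs cpu_ms to submit a frame and a GPU that needs gpu_ms to render it, and it assumes that locking a marker texture blocks the CPU until the GPU has finished the frame that wrote it. The names simulate, SyncStats, gpu_idle_ms and max_in_flight are made up for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Result of one simulated run (illustrative names, not part of any API).
struct SyncStats {
    double gpu_idle_ms;  // total time the GPU sat waiting for the CPU between frames
    int max_in_flight;   // most frames submitted but not yet finished by the GPU
};

// buffers = 1 models locking the texture written in the same frame (full sync),
// buffers = 2 models the alternating two-texture scheme, and so on.
SyncStats simulate(int frames, int buffers, double cpu_ms, double gpu_ms) {
    assert(frames >= 1 && buffers >= 1);
    std::vector<double> gpu_done(frames, 0.0);  // GPU completion time per frame
    SyncStats stats{0.0, 0};
    double cpu_time = 0.0;  // CPU clock
    double gpu_free = 0.0;  // time at which the GPU finished its previous frame

    for (int i = 0; i < frames; ++i) {
        // CPU builds and submits frame i, including the marker triangle.
        cpu_time += cpu_ms;

        // GPU starts once it is free and the commands have arrived.
        double gpu_start = std::max(gpu_free, cpu_time);
        if (i > 0) stats.gpu_idle_ms += gpu_start - gpu_free;
        gpu_done[i] = gpu_start + gpu_ms;
        gpu_free = gpu_done[i];

        // Count frames submitted so far that the GPU has not finished yet.
        int in_flight = 0;
        for (int j = 0; j <= i; ++j)
            if (gpu_done[j] > cpu_time) ++in_flight;
        stats.max_in_flight = std::max(stats.max_in_flight, in_flight);

        // Lock the marker texture written (buffers - 1) frames ago;
        // the lock blocks the CPU until the GPU has finished that frame.
        int locked = i - (buffers - 1);
        if (locked >= 0) cpu_time = std::max(cpu_time, gpu_done[locked]);
    }
    return stats;
}
```

In this model, with cpu_ms = 4 and gpu_ms = 10, a single texture leaves the GPU idle for 4 ms before every frame after the first, while two textures keep it continuously busy at the cost of up to two frames in flight.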

A second solution is to use DirectX 9's asynchronous query functionality (analogous to using fences in OpenGL). At the end of your frame, insert a D3DQUERYTYPE_EVENT query into your rendering stream. You can then poll whether the GPU has reached this event by calling GetData. As with texture locking, you can busy-wait on the CPU to ensure that the CPU never gets more than 2 frames ahead of the GPU, while the GPU is never idle. Similarly, you can insert multiple queries per frame to get finer control over lag.
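The polling pattern can be sketched as a small ring of event queries. To keep the example runnable without Direct3D, SimulatedEventQuery below is a hypothetical stand-in that reports "not ready" for a fixed number of polls; FrameLimiter and end_frame are likewise names invented for this sketch. With a real IDirect3DQuery9 created via CreateQuery(D3DQUERYTYPE_EVENT, ...), issue() would correspond to Issue(D3DISSUE_END) and ready() to checking that GetData(nullptr, 0, D3DGETDATA_FLUSH) no longer returns S_FALSE.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an IDirect3DQuery9 of type D3DQUERYTYPE_EVENT.
struct SimulatedEventQuery {
    bool issued = false;
    int polls_left = 0;  // simulated GPU latency, measured in polls

    // Real D3D9 equivalent: query->Issue(D3DISSUE_END)
    void issue(int latency_polls) {
        issued = true;
        polls_left = latency_polls;
    }

    // Real D3D9 equivalent: query->GetData(nullptr, 0, D3DGETDATA_FLUSH) != S_FALSE
    bool ready() {
        if (!issued) return true;  // never issued: nothing to wait for
        if (polls_left > 0) {
            --polls_left;
            return false;
        }
        return true;
    }
};

// Keeps the CPU at most max_frames_ahead frames ahead of the GPU.
class FrameLimiter {
public:
    explicit FrameLimiter(std::size_t max_frames_ahead)
        : queries_(max_frames_ahead) {
        assert(max_frames_ahead >= 1);
    }

    // Call once per frame after all draw calls have been submitted.
    // Returns how many polls the CPU spent busy-waiting.
    int end_frame(int latency_polls) {
        const std::size_t n = queries_.size();
        queries_[frame_ % n].issue(latency_polls);  // event marking the end of this frame

        // The next slot in the ring holds the oldest outstanding event.
        SimulatedEventQuery& oldest = queries_[(frame_ + 1) % n];
        int spins = 0;
        while (!oldest.ready()) ++spins;  // busy wait on the CPU
        ++frame_;
        return spins;
    }

private:
    std::vector<SimulatedEventQuery> queries_;
    std::size_t frame_ = 0;
};
```

A limiter built with FrameLimiter(2) waits on the previous frame's event, mirroring the two-texture scheme; FrameLimiter(1) waits on the event it has just issued, which is the fully synchronous case.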
