In this paper, I describe my implementation of Tile-Based Omnidirectional Shadow Mapping, an efficient technique for generating shadow maps for a large number of point lights at run time, introduced by Hawar Doghramachi in the book GPU Pro 6.
Overview:
The basic idea of this method is to take advantage of compute shaders and indirect draw commands by performing shadow object culling and draw call generation in a compute shader. The method also uses a shadow projection scheme called tetrahedron shadow mapping, which is faster than cube shadow mapping and uses less space than paraboloid shadow mapping.
Indirect Draw Buffer:
Before starting on the main topic, let me briefly introduce the indirect draw command buffer, which is an efficient way of drawing instances of a mesh, especially in newer graphics APIs such as DirectX 12 and Vulkan. In those APIs, draw commands are recorded into command buffers before being sent to the GPU for execution. With a traditional draw call such as vkCmdDrawIndexed(), which takes the instance count as an explicit argument, the command buffer may need to be re-recorded every frame because the number of object instances may change. An indirect draw command such as vkCmdDrawIndexedIndirect() makes it possible to record the command buffer only once at the beginning, because its draw arguments, including the instance count, are stored in a buffer object rather than in the command buffer itself. If the number of instances changes later, we can update the draw call by writing the new count into this buffer object instead of recording the whole command again. In Vulkan, each indexed indirect draw command is laid out in the buffer object as a VkDrawIndexedIndirectCommand structure.
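As a rough illustration of this idea, the sketch below mirrors the five-field layout of VkDrawIndexedIndirectCommand in plain C++ and patches instanceCount in a CPU-side array that stands in for the indirect buffer. The names IndirectCommand and setInstanceCount are hypothetical and only serve the example; a real program would use the Vulkan structure itself and write through mapped or staged buffer memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Same field order and types as VkDrawIndexedIndirectCommand.
struct IndirectCommand {
    uint32_t indexCount;     // number of indices to draw
    uint32_t instanceCount;  // number of instances to draw
    uint32_t firstIndex;     // offset into the index buffer
    int32_t  vertexOffset;   // value added to each index
    uint32_t firstInstance;  // ID of the first instance
};
static_assert(sizeof(IndirectCommand) == 20, "five tightly packed 32-bit fields");

// Hypothetical helper: change only the instance count of one recorded draw.
// The command buffer that references this memory does not need re-recording.
inline void setInstanceCount(std::vector<IndirectCommand>& buffer,
                             std::size_t drawIndex, uint32_t count) {
    assert(drawIndex < buffer.size());
    buffer[drawIndex].instanceCount = count;
}
```

Setting instanceCount to zero effectively disables a draw without touching the command buffer, which is what lets a compute shader decide on the GPU which draws actually do any work.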
Generating Shadow Draw Calls in Compute Shader:
One of the most important steps of Tile-Based Shadow Mapping is culling objects and populating indirect draw commands inside a compute shader. In the original implementation, the author pre-calculates world-space bounding boxes for all scene objects, stores them in a Shader Storage Buffer Object, and passes them to the compute shader together with the point light data and the indirect draw command structure of each mesh, with instanceCount initialized to zero. The compute shader is then dispatched with one work group per bounding box in the scene. Within a work group, the threads iterate over the light data and test the current bounding box against each light. When an intersection occurs, the thread increments the instanceCount of the indirect draw structure associated with the current bounding box by one and stores the light index in a shared array local to the work group (these operations run on multiple threads, so atomicAdd() is needed). After all lights have been traversed and a barrier has been passed, the light indices in the shared array are copied to an output Shader Storage Buffer, and the offset at which they were written is assigned to the indirect draw command as firstInstance. Reserving this offset also requires atomicAdd(), since all work groups write to the same buffer. As a result, draw commands are generated only for bounding boxes and lights that intersect, which greatly reduces the total number of draw calls.
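The culling step can be sketched on the CPU as follows. This is not the shader from the original implementation, only a minimal single-threaded model of it under stated assumptions: lights are treated as spheres, one call to the hypothetical cullBox function stands in for one work group, a std::vector stands in for the shared array, and a std::atomic counter stands in for the atomicAdd() that reserves space in the output buffer. All type and function names are made up for the example.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

struct Aabb     { float min[3]; float max[3]; };
struct Light    { float pos[3]; float radius; };
struct DrawArgs { uint32_t instanceCount = 0; uint32_t firstInstance = 0; };

// Sphere vs. axis-aligned box: compare the squared distance from the sphere
// center to the closest point of the box against the squared radius.
inline bool intersects(const Aabb& b, const Light& l) {
    float d2 = 0.0f;
    for (int i = 0; i < 3; ++i) {
        const float closest = std::max(b.min[i], std::min(l.pos[i], b.max[i]));
        const float d = l.pos[i] - closest;
        d2 += d * d;
    }
    return d2 <= l.radius * l.radius;
}

// One call models one work group, i.e. one bounding box.
inline void cullBox(const Aabb& box, const std::vector<Light>& lights,
                    DrawArgs& args, std::vector<uint32_t>& lightIndexOut,
                    std::atomic<uint32_t>& globalCounter) {
    std::vector<uint32_t> local;  // stands in for the shared array
    for (uint32_t i = 0; i < lights.size(); ++i)
        if (intersects(box, lights[i])) local.push_back(i);

    args.instanceCount = static_cast<uint32_t>(local.size());
    // Reserve a contiguous range in the shared output buffer.
    args.firstInstance = globalCounter.fetch_add(args.instanceCount);
    assert(args.firstInstance + local.size() <= lightIndexOut.size());
    std::copy(local.begin(), local.end(),
              lightIndexOut.begin() + args.firstInstance);
}
```

Because Vulkan's gl_InstanceIndex includes firstInstance, a shadow shader could use it directly as an index into the output buffer to find the light that a given instance is being rendered for.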
Tetrahedron Shadow Mapping and Shadow Atlas:
After the compute shader has executed, the generated indirect draw commands are used for shadow rendering. Here the author introduces an interesting shadow mapping method that I had never heard of before: Tetrahedron Shadow Mapping. In this method, a geometry shader projects scene objects four times, once onto each of the four triangular faces of a tetrahedron centered at the light position. Those four faces are then scaled and moved into a square area before rasterization. Since the pipeline won’t do “triangular” clipping itself, manual clipping is done by filling the gl_ClipDistance array with the distances between the vertex and the three clipping planes of the face. Those values are linearly interpolated across the primitive, and during rasterization, fragments with a negative clip distance are discarded.
Tetrahedron projection
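To make the manual clipping more concrete, here is a small CPU-side sketch of how the three clip distances for one tetrahedron face could be computed. It assumes a particular orientation of the tetrahedron (the four face directions in kFaceDirs), which is an illustrative choice and not necessarily the one used in the original implementation. A direction d belongs to face i when dot(n_i, d) is at least as large as dot(n_j, d) for every other face j, so each clipping plane passes through the light position and has the normal n_i - n_j.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
inline Vec3  sub(const Vec3& a, const Vec3& b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Assumed orientation: directions from the light to the four face centers
// of a regular tetrahedron (left unnormalized, all of equal length).
constexpr Vec3 kFaceDirs[4] = {
    { 1.0f,  1.0f,  1.0f}, { 1.0f, -1.0f, -1.0f},
    {-1.0f,  1.0f, -1.0f}, {-1.0f, -1.0f,  1.0f}};

// Signed distances from a vertex (given relative to the light position) to the
// three planes that separate `face` from its neighbors. All three are
// non-negative exactly when the vertex lies inside the face's region.
inline std::array<float, 3> clipDistances(const Vec3& lightToVertex, int face) {
    assert(face >= 0 && face < 4);
    std::array<float, 3> out{};
    int k = 0;
    for (int other = 0; other < 4; ++other) {
        if (other == face) continue;
        const Vec3 n = sub(kFaceDirs[face], kFaceDirs[other]);
        out[k++] = dot(n, lightToVertex) / std::sqrt(dot(n, n));
    }
    return out;
}
```

In a geometry shader, these three values would be written to gl_ClipDistance[0] through gl_ClipDistance[2] for each emitted vertex.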
To make shadow texture storage more memory friendly, a single large 8192 x 8192 texture atlas is created to store all the shadow data, instead of a separate image object for each light. In the geometry shader, each triangle is scaled and moved by the light-specific shadow offset within the atlas. The same offset is applied when retrieving depth data at shading time.
Shadow atlas in my program
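The atlas lookup can be sketched as below, assuming the atlas is split uniformly into 512 x 512 tiles as in my program and that tiles are assigned to lights in row-major order by light index. The row-major assignment, the AtlasTile structure, and both function names are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Scale and offset of one tile in normalized [0, 1] atlas coordinates.
struct AtlasTile { float scale; float offsetX; float offsetY; };

// Hypothetical helper: map a light index to its tile, row by row.
inline AtlasTile atlasTileForLight(uint32_t lightIndex,
                                   uint32_t atlasSize = 8192,
                                   uint32_t tileSize = 512) {
    assert(tileSize > 0 && atlasSize % tileSize == 0);
    const uint32_t tilesPerRow = atlasSize / tileSize;  // 16 with the defaults
    assert(lightIndex < tilesPerRow * tilesPerRow);     // 256 tiles in total
    const float scale = static_cast<float>(tileSize) / static_cast<float>(atlasSize);
    return {scale,
            (lightIndex % tilesPerRow) * scale,
            (lightIndex / tilesPerRow) * scale};
}

// Remap a tile-local coordinate in [0, 1]^2 into atlas space. The same
// transform applies when rendering the shadow map and when sampling it.
inline void toAtlasUv(const AtlasTile& t, float u, float v,
                      float& outU, float& outV) {
    outU = t.offsetX + u * t.scale;
    outV = t.offsetY + v * t.scale;
}
```

With the default sizes, light 17 lands in column 1 of row 1, so its tile starts at (0.0625, 0.0625) in atlas coordinates.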
Light Shading:
Though this method is called “Tile-Based Shadow Mapping”, it can be used with both tile-based forward lighting and deferred lighting. In forward lighting, shadow maps can be looked up with the light indices of the tile. In my implementation, which uses deferred lighting, shadow depth is looked up using the instance index of the light sphere. The shading process is almost the same as for cube shadow mapping. First, project the vector from the light to the shaded point onto each face direction to determine which face it belongs to. Then project the point onto that face with the face's shadow projection matrix and compare its z value (after the perspective divide) with the sampled depth value.
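The face selection and depth comparison might look like the following on the CPU. The face directions are an assumed orientation of the tetrahedron, the bias value is an arbitrary placeholder, and the comparison assumes a conventional depth range where larger values are farther from the light; none of these are taken from the original implementation.

```cpp
#include <array>
#include <cassert>

struct Vec3 { float x, y, z; };
inline float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Assumed orientation: directions from the light to the four face centers.
// They have equal length, so no normalization is needed to compare dot products.
constexpr std::array<Vec3, 4> kFaceDirs = {{
    { 1.0f,  1.0f,  1.0f}, { 1.0f, -1.0f, -1.0f},
    {-1.0f,  1.0f, -1.0f}, {-1.0f, -1.0f,  1.0f}}};

// Pick the face whose direction is best aligned with the light-to-point vector.
inline int selectFace(const Vec3& lightToPoint) {
    int best = 0;
    float bestDot = dot(kFaceDirs[0], lightToPoint);
    for (int i = 1; i < 4; ++i) {
        const float d = dot(kFaceDirs[i], lightToPoint);
        if (d > bestDot) { bestDot = d; best = i; }
    }
    return best;
}

// Depth test after the perspective divide. The bias is a placeholder value
// that would need tuning per scene.
inline bool inShadow(float fragmentDepth, float storedDepth, float bias = 0.0005f) {
    return fragmentDepth - bias > storedDepth;
}
```

The selected face index would then choose the shadow projection matrix used to transform the point before sampling the atlas.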
Some Improvements:
In the original implementation, the author writes a quad tree manager that manages the shadow atlas and allocates shadow maps of different sizes to point lights based on their screen-space coverage. Since I did not have time to implement such an ingenious manager, I simply divide the shadow atlas uniformly into 256 squares of 512 x 512 each. Another adjustment I made is to support object culling for meshes located in different buffer objects. The original implementation requires all meshes in the scene to be stored in the same buffer object with different vertex and instance offsets, which is hard to fulfill in my case, so I divide object culling into two parts. The first part performs light culling and counts the number of draws for each instance bounding box. The second part assigns the draw count of each instance bounding box to the draw command of its related mesh resource. In addition, each mesh resource's pointer and the offset of its first indirect draw command structure are stored as a pair in an unordered map. At draw time, I traverse that map, binding each mesh resource and issuing its indirect draws with the paired buffer offset.
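The draw-time traversal could be sketched like this. The MeshResource and IndirectDrawCall types are hypothetical stand-ins, and instead of calling Vulkan the function only records what would be issued. The sketch also assumes the map stores the index of each mesh's first command, which is converted to a byte offset using the 20-byte size of VkDrawIndexedIndirectCommand.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical mesh record: how many indirect commands (submeshes) it owns.
struct MeshResource { uint32_t drawCount; };

// What would be passed to one indirect draw after binding `mesh`.
struct IndirectDrawCall {
    const MeshResource* mesh;
    uint64_t byteOffset;  // offset into the indirect buffer
    uint32_t drawCount;
};

// Five 32-bit fields per VkDrawIndexedIndirectCommand.
constexpr uint64_t kCommandStride = 20;

// Walk the map and produce one bind-and-draw entry per mesh resource.
// Iteration order of an unordered_map is unspecified, which is acceptable
// here because each entry carries its own offset.
inline std::vector<IndirectDrawCall> buildDrawList(
        const std::unordered_map<const MeshResource*, uint32_t>& firstCommandIndex) {
    std::vector<IndirectDrawCall> calls;
    calls.reserve(firstCommandIndex.size());
    for (const auto& entry : firstCommandIndex) {
        assert(entry.first != nullptr);
        calls.push_back({entry.first,
                         entry.second * kCommandStride,
                         entry.first->drawCount});
    }
    return calls;
}
```

In a Vulkan renderer, each recorded entry would correspond to binding the mesh's vertex and index buffers followed by one vkCmdDrawIndexedIndirect() call with the given offset, draw count, and stride.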
Description of Video:
In the video there are around 240 moving point lights, all of which cast omnidirectional shadows. The scene contains 63 teapots and a Sponza scene mesh with about 390 submeshes. It is rendered at 1024 x 720 resolution, with the frame rate ranging from 35 to 60 fps. The graphics hardware is an NVIDIA GeForce RTX 2070.