
How I render voxels

### What?

**Lum** is a voxel renderer I made. It is not an extendable general-purpose engine, but rather a fast and simple library with certain design constraints - e.g. it is optimized for a certain number of voxels on screen.

I am not an artist, and Lum embraces that - you are expected to reuse assets (in the form of blocks), program animations and draw some things with shaders.

### Key structures

## Pipeline

Before rendering even starts, “render requests” are collected from user code. A render request is a “command” to the renderer that looks like {draw this block id at this position} or {draw this model with this rotation and this translation} (essentially the same “command buffer” system from Vulkan, repeated at a higher level, for the same purposes).
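
As a rough illustration, a block request can be as small as an id plus a grid position, while a model request carries a transform. This is a minimal sketch with made-up names and fields, not Lum's actual API:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical render requests; layout and names are illustrative only.
struct BlockRequest {
    uint16_t block_id;     // index into the block palette
    int32_t  x, y, z;      // grid-aligned block position
};

struct ModelRequest {
    uint32_t model_id;
    float    rotation[4];  // quaternion
    float    translation[3];
};

struct RenderQueue {
    std::vector<BlockRequest> blocks;
    std::vector<ModelRequest> models;

    void draw_block(uint16_t id, int x, int y, int z) {
        blocks.push_back({id, x, y, z});
    }
    void draw_model(uint32_t id, const float rot[4], const float pos[3]) {
        models.push_back({id,
                          {rot[0], rot[1], rot[2], rot[3]},
                          {pos[0], pos[1], pos[2]}});
    }
};
```
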
Then, the pipeline looks something like this:

## Lightmaps

*(believe me, this boring part is necessary for the interesting one that comes after)*

We start by generating lightmaps for blocks and models. Lightmaps are a classic approach, and for this, we don’t need voxel materials (until I implement transparency one day). We just need the geometry “contour” - vertices that define its silhouette.

To render this contour, we bind its vertex buffer and use push constants to pass in data for each draw call.

Since we only need model-space positions for vertices, models are not expected to be large, and the vertices of a voxel are snapped to its corners, each vertex position can be a tiny u8vec3 - just 3 bytes (in fact, model sizes are limited to 255 so they fit into a u8; we could also remap the u8 range to increments of 2).
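
A sketch of what such a packed vertex could look like on the host side (the struct and the binding/location choices are illustrative; the format name is standard Vulkan):

```cpp
#include <cstddef>
#include <cstdint>
#include <vulkan/vulkan.h>

// Illustrative packed contour vertex: just a model-space corner position.
// 0..255 per axis is enough because model sizes are capped at 255.
struct ContourVertex {
    uint8_t x, y, z;
};

// Matching Vulkan vertex input description (binding 0, location 0 assumed).
static const VkVertexInputBindingDescription kContourBinding {
    0, sizeof(ContourVertex), VK_VERTEX_INPUT_RATE_VERTEX
};
static const VkVertexInputAttributeDescription kContourPosition {
    0, 0, VK_FORMAT_R8G8B8_UINT, offsetof(ContourVertex, x)
};
```
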

Also, for better culling, we don’t actually store the contour as a single triangle mesh. We abuse the fact that voxel faces are grid-aligned and only have 6 possible normals. The mesh is divided into 6 sections, one per normal, and we cull each section separately against the camera direction (by the way, some big games use this too, but it works best for voxels).
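
The per-section test is just a sign check between the section's normal and the view direction; a minimal sketch (names are made up, and for a perspective camera the test would need a small margin):

```cpp
#include <array>

struct Vec3 { float x, y, z; };

static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// The 6 axis-aligned face normals a voxel mesh can have.
static const std::array<Vec3, 6> kFaceNormals = {{
    {+1, 0, 0}, {-1, 0, 0}, {0, +1, 0}, {0, -1, 0}, {0, 0, +1}, {0, 0, -1}
}};

// Record draws only for the sections that can face the camera.
// `view_dir` points from the camera into the scene.
template <typename DrawFn>
void draw_visible_sections(Vec3 view_dir, DrawFn draw_section) {
    for (int i = 0; i < 6; ++i) {
        if (dot(kFaceNormals[i], view_dir) < 0.0f) // back-facing sections are skipped
            draw_section(i);
    }
}
```
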

*if you are wondering why rasterization and not raytracing - rasterization IS raytracing; it is an optimization of a specialized case of raytracing*

At this point, the lightmap command buffers are done (of course, we execute them before shading).

## GBuffer

Lum is a “deferred” renderer, and in this stage, we determine the voxel material and normal for each pixel.

This is where things get interesting. We actually don’t need a second mesh - the contour we used for lightmapping is enough.

Since we render the 6 sides of the contour separately, they all have the same normal, which we just pass in push constants.

What about the material though? In the past, Lum encoded it into the vertex data (the obvious thing to do). This required a lot of vertices (a 16x16 block side with randomly assigned voxel materials generates many of them, because faces with different materials cannot be merged the way the contour's can), and Lum was, as you might have guessed from experience, vertex-bound.

The fix was to move this work into the pixel shader (an individual voxel was 18 vertices and resulted in only a few dozen pixels - not a good ratio, right?). The vertex shader passes the interpolated model-space position into the fragment shader. And as it turns out, that is enough to determine the material! We just fetch the voxel material at that position.
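
In the fragment shader this boils down to a single fetch; the equivalent lookup logic, sketched in C++ with assumed names (the real thing reads a 3D image of the block's voxels):

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical voxel grid of one block/model: a material id per voxel, 0 = air.
struct VoxelGrid {
    int sx, sy, sz;
    const uint8_t* mats; // sx*sy*sz material ids

    uint8_t at(int x, int y, int z) const {
        return mats[(z * sy + y) * sx + x];
    }
};

// What the fragment shader effectively does with the interpolated model-space
// position: floor it to a voxel coordinate and fetch the material there.
uint8_t material_at(const VoxelGrid& grid, float px, float py, float pz) {
    int x = (int)std::floor(px);
    int y = (int)std::floor(py);
    int z = (int)std::floor(pz);
    // A position exactly on a face can floor into the neighbouring (empty)
    // voxel, so a robust version nudges the position slightly inward along
    // the face normal before the fetch.
    return grid.at(x, y, z);
}
```
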

At this point, we have rasterized all blocks and models into our GBuffer.

There are some small visual features that I just really wanted to implement, so here we are:

## Compute

There is not much Lum does in its compute workload - updating the data structures that represent models, radiance-field lighting, and grass & water states.

At this point, all CPU parts of the “copy-on-write” have already happened - we “allocated” (which is just memorizing which indices are free and which are taken) temporary dynamic blocks on the CPU. Now we actually copy data from static blocks to the allocated dynamic ones (we do this for empty (air) blocks too, but for performance we just “clear” (memset) the corresponding subimages to 0). After that, for every requested model draw, we submit a compute dispatch that reads a model-space voxel and writes it to the corresponding world voxel: it determines which block the write lands in (by reading the world data), indexes into the block palette, and writes the actual voxel (earlier we made sure that every block occupied by a model has been allocated, so we are guaranteed to never write into static blocks).
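
The CPU side of that copy-on-write is essentially a free list over dynamic block slots; a minimal sketch under assumed names:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical bookkeeping for temporary dynamic blocks: "allocation" is
// nothing more than handing out free slot indices and taking them all back
// at the start of the next frame.
class DynamicBlockAllocator {
public:
    explicit DynamicBlockAllocator(uint32_t slot_count) { reset(slot_count); }

    // Returns a free dynamic slot; the caller later records a GPU copy from
    // the static block into this slot (or a clear, if the block is air).
    uint32_t allocate() {
        assert(!free_.empty() && "out of dynamic block slots");
        uint32_t slot = free_.back();
        free_.pop_back();
        return slot;
    }

    // Dynamic blocks live for one frame only, so everything is freed at once.
    void reset(uint32_t slot_count) {
        free_.clear();
        for (uint32_t i = slot_count; i > 0; --i) free_.push_back(i - 1);
    }

private:
    std::vector<uint32_t> free_;
};
```
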

Now, with the updated data structure, it’s time to update the radiance field.

Radiance is stored as per-block lighting probes, which describe luminance in every direction with some encoding (currently, for performance tuning, there is no direction at all, LOL). To shade a pixel, we interpolate between multiple adjacent probes to estimate the light coming from all directions to that pixel (note that we don’t really care about it being ReAlIsTiC, we only need it to look good).
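
The interpolation can be as simple as a trilinear blend of the 8 probes surrounding the pixel's position; a sketch with assumed names (bounds clamping omitted):

```cpp
#include <cmath>

struct Radiance { float r, g, b; };

// Hypothetical probe grid: one (currently directionless) probe per block.
struct ProbeGrid {
    int sx, sy, sz;
    const Radiance* probes;
    Radiance at(int x, int y, int z) const { return probes[(z * sy + y) * sx + x]; }
};

static Radiance lerp(Radiance a, Radiance b, float t) {
    return { a.r + (b.r - a.r) * t, a.g + (b.g - a.g) * t, a.b + (b.b - a.b) * t };
}

// Trilinear blend of the 8 probes around a position given in block units.
Radiance sample_radiance(const ProbeGrid& g, float x, float y, float z) {
    int ix = (int)std::floor(x), iy = (int)std::floor(y), iz = (int)std::floor(z);
    float fx = x - ix, fy = y - iy, fz = z - iz;
    Radiance c00 = lerp(g.at(ix, iy,     iz    ), g.at(ix + 1, iy,     iz    ), fx);
    Radiance c10 = lerp(g.at(ix, iy + 1, iz    ), g.at(ix + 1, iy + 1, iz    ), fx);
    Radiance c01 = lerp(g.at(ix, iy,     iz + 1), g.at(ix + 1, iy,     iz + 1), fx);
    Radiance c11 = lerp(g.at(ix, iy + 1, iz + 1), g.at(ix + 1, iy + 1, iz + 1), fx);
    return lerp(lerp(c00, c10, fy), lerp(c01, c11, fy), fz);
}
```
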

So, we submit a compute shader, NxMx1, where N is the number of radiance update requests and M is the number of rays per request. Currently M is 64 (double the warp size, so the scheduler has something to swap to - you know this). In each thread, we send a ray in a random (roughly uniform) direction, raytrace it, and downsample the result to store in the radiance probes (currently they have no direction, so we just average across threads; when I make them 6-directional, results will be mixed into 6 different directions), mixing it with what is currently in the radiance field (accumulating across time).

My raytracing is actually ray marching - taking small steps and checking for collisions. I tried “precise” raymarchers (“based” on the famous paper from 1987), in multiple different variations (branches/masks/partially unrolled), I tried distance fields, I tried bit arrays instead of voxels for checking whether a voxel is non-empty, but the fastest good-looking solution turned out to be simple fixed-step raymarching with a rollback on collision for precise hit processing (and with manual “jumps” over empty (air) blocks).
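
A minimal sketch of that scheme - coarse fixed steps, then one rollback-and-refine pass on a hit (the step sizes and names are made up; the block-skipping part is only marked with a comment):

```cpp
#include <cstdint>

struct Vec3 { float x, y, z; };

struct Hit { bool found; Vec3 pos; uint8_t mat; };

// Fixed-step march: cheap per step, with a rollback to refine the hit point.
// VoxelAt is any callable `uint8_t(Vec3)` returning the material, 0 = air.
template <typename VoxelAt>
Hit raymarch(Vec3 origin, Vec3 dir, float max_dist, VoxelAt voxel_at) {
    const float step      = 0.5f;  // coarse step, in voxels
    const float fine_step = 0.05f; // refinement step after a hit
    Vec3 p = origin;
    for (float t = 0.0f; t < max_dist; t += step) {
        p = { origin.x + dir.x * t, origin.y + dir.y * t, origin.z + dir.z * t };
        uint8_t m = voxel_at(p);
        if (m != 0) {
            // Rollback: go back one coarse step and re-walk in fine steps so
            // the reported position is close to the actual surface.
            for (float tb = t - step; tb <= t; tb += fine_step) {
                Vec3 q = { origin.x + dir.x * tb, origin.y + dir.y * tb, origin.z + dir.z * tb };
                if (uint8_t fm = voxel_at(q)) return { true, q, fm };
            }
            return { true, p, m };
        }
        // A full version also "jumps" t forward over whole empty (air) blocks here.
    }
    return { false, p, 0 };
}
```
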

Details about the actual material light processing don’t really matter - it’s just a visual thing, you can use whatever you want. I like a primitive shading model with roughness, color and emittance.

As an optimization, when tracing probe rays, instead of tracing until a hit or until out-of-bounds, we trace for a fixed distance, and if there is no hit, we “inherit” the light from the end point - I call it light propagation (it is another way to accumulate over time/space). It also creates a cool effect of visible light waves.
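
Reusing the raymarch and probe-sampling sketches from above, a probe ray with this propagation fallback could look like this (the distance budget and block size are assumed values):

```cpp
// Assumed material/lighting evaluation for an actual hit - use whatever model you like.
Radiance shade_hit(const Hit& hit);

// Probe ray with "light propagation": march only a fixed distance; if nothing
// is hit, inherit the radiance already stored where the ray stopped, so light
// effectively travels one extra hop per update.
template <typename VoxelAt>
Radiance trace_probe_ray(const ProbeGrid& probes, Vec3 origin, Vec3 dir, VoxelAt voxel_at) {
    const float kMaxDist   = 32.0f; // fixed budget per ray, in voxels
    const float kBlockSize = 16.0f; // voxels per block
    Hit hit = raymarch(origin, dir, kMaxDist, voxel_at);
    if (hit.found)
        return shade_hit(hit);
    // No hit within the budget: reuse the light already known at the end point.
    return sample_radiance(probes, hit.pos.x / kBlockSize,
                                   hit.pos.y / kBlockSize,
                                   hit.pos.z / kBlockSize);
}
```
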

## Shading

Modern games think that rendering in full-res is slow, so they shade in low-res and then upscale. However, upscaling can be costly (and looks terrible, and then you need neural networks and even more temporal accumulation with complicated algorithms to fix it), while subpasses allow a lot of optimizations*, so this tradeoff is almost never worth it. All of Lum’s shading happens in a single render pass, and that is a key reason why it can run on laptops. The frame image we render to potentially never even leaves the GPU’s on-chip memory.
*Especially if you target the lower end of the spectrum - integrated GPUs, which perform particularly poorly with temporal accumulation and love subpass optimizations.*

The shading render pass order is:

“diffuse” light shading → ambient occlusion → glossy reflections → volumetrics → tonemapping to swapchain.
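
Chaining stages in one Vulkan render pass means each stage reads the previous one's output as a subpass input attachment instead of going through a separate pass. A much-reduced sketch of two adjacent subpasses (attachment descriptions, dependencies and render pass creation omitted; names are illustrative):

```cpp
#include <vulkan/vulkan.h>

// Two adjacent shading stages as subpasses of one render pass, so the
// intermediate "frame" image can stay in tile/on-chip memory.
void describe_two_subpasses(VkSubpassDescription out[2]) {
    static const VkAttachmentReference frame_as_color  {0, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL};
    static const VkAttachmentReference frame_as_input  {0, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL};
    static const VkAttachmentReference swapchain_color {1, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL};

    VkSubpassDescription lighting{};                 // e.g. the "diffuse" stage
    lighting.pipelineBindPoint    = VK_PIPELINE_BIND_POINT_GRAPHICS;
    lighting.colorAttachmentCount = 1;
    lighting.pColorAttachments    = &frame_as_color;

    VkSubpassDescription tonemap{};                  // final stage, writes the swapchain
    tonemap.pipelineBindPoint    = VK_PIPELINE_BIND_POINT_GRAPHICS;
    tonemap.inputAttachmentCount = 1;
    tonemap.pInputAttachments    = &frame_as_input;  // read the previous stage in place
    tonemap.colorAttachmentCount = 1;
    tonemap.pColorAttachments    = &swapchain_color;

    out[0] = lighting;
    out[1] = tonemap;
}
```
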

All em-dashes are AI-generated, harvested by me and brought here.
Feel free to ask any questions, anytime, anywhere. I would genuinely love to answer them.