Optimizations
It soon became essential to optimize some parts of the project. Since critical
parts (like the realtime rendering) were already optimized with LoD bias, the only things we added were frustum culling and an occlusion map. We mostly tried to accelerate the loading phase and the pre-computing phase. Those optimizations are detailed below.
Pre-computing phase optimization
Mesh
After profiling the system, it was made clear that most of the pre-computing time was spent using vectors to hold vertices. Particularly, iteration and array extraction functions looked really slow. The solution was to replace the vectors used in the mesh with simple arrays. Along with better code comprehension, this achieved a 30% performance gain. Despite this performance boost, we kept some vectors when used for small-sized data, or when data size wasn't known in advance (which is the case with heightmaps).
Geomorphing
The second major part of the loading phase was the geomorphing data calculation. This code is separated into two parts:
- a first function (computeGeomorphingVertices), that iterated through all the triangulated mesh, in order to choose the points which need to be computed;
- a second function (computeGeomorphingVertex), called by the first one for each point to compute.
While the first function is mostly made of loops and very few calculations, there was probably not much to gain by optimizing it. Thus, we concentrated our efforts on the second function, and completely rewrote it to avoid int-to-float and float-to-int conversions, since those were clearly the bottleneck in the function. The call count being of millions, the boost appeared clearly.
Lightmap
The last loading part being slow was the lightmap calculation. This one could be optimized by computing all the normal vectors for the whole mesh, then keeping them temporarily, and finally by using them to compute the means. Each normal vector would thus be computed only once, whereas we are calculating the 6 normal vectors for each vertex, even if they were previously calculated for another vertex. This optimization was not implemented yet.
Frustum Culling
To optimize the critical part of the system (the realtime rendering), we implemented frustum culling. This method consists in using bounding boxes for each patch of the mesh, in order to check whether or not the patch is in the view frustum. If the patch is totally out of the view frustum, then we are sure (except for some rare cases) that it won't be visible, and thus we don't draw it. On the contrary, we render it, even if the part of the bounding box being in the view frustum doesn't contain any part of the patch. This optimization improved rendering speed by a factor of 2.
Hidden faces culling (Occlusion Maps)
This optimization was made because of the terrain being rendered, which contained lots of mountains. We realized that being in valleys or facing mountains, a huge part of the terrain being rendered was masked. To improve rendering speed, we decided to implement an algorithm that could be able to detect and ignore those patches, instead of rendering them uselessly as we did.
The algorithm was to detect invisible patches rapidly to be really useful. Actually, there is no point in detecting the patches to ignore if it's slower than rendering them. This is why we chose to use occlusion maps.
This algorithm computes an occlusion map for each patch in the mesh. This map tells us which height we need to reach for a patch of the map to be visible from the current patch (the one we are calculating the map for).
When all occlusion maps are computed, we just have to use the camera position, and positions of the patches we want to test. Then, using the occlusion map associated with the camera position, we can test whether or not the height is greater than the visibility limit, thus finding out if we render or ignore the patch.
(computed for patches marked in green and red)
Memory use
When optimizing, we noticed that most of the allocated memory was not freed. After analyzing those memory leaks, we succeeded in correcting the most important ones. The unfreed memory was thus lowered from hundreds of megabytes to a few dozens.