Introducing MeshOptimizer
For a GPU to render a mesh efficiently, every vertex in the vertex buffer should be unique, with no duplicates. Solving this problem efficiently can be a complicated and computationally intensive task in any modern 3D content pipeline.
MeshOptimizer is an open source C++ library developed by Arseny Kapoulkine, which provides algorithms to help optimize meshes for modern GPU vertex and index processing pipelines. It can reindex an existing index buffer or generate an entirely new set of indices from an unindexed vertex buffer.
Getting ready
We use MeshOptimizer version 0.16. Here is the Bootstrap snippet that you can use to download this version:
{ "name": "meshoptimizer", "source": { "type": "git", "url": "https://github.com/zeux/meshoptimizer", "revision": "v0.16" } }
The complete source code for this recipe can be found in Chapter2/10_MeshOptimizer.
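The code snippets in this recipe assume a small set of headers roughly like the following; this is a minimal sketch, and the exact include paths may vary depending on how your project pulls in the dependencies:

#include <assimp/cimport.h>      // aiImportFile(), aiReleaseImport()
#include <assimp/postprocess.h>  // aiProcess_Triangulate
#include <assimp/scene.h>        // aiScene, aiMesh
#include <meshoptimizer.h>
#include <glm/glm.hpp>
#include <glm/gtc/type_ptr.hpp>  // glm::value_ptr()
#include <vector>

using glm::vec3;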
How to do it...
Let's use MeshOptimizer to optimize the vertex and index buffer layouts of a mesh loaded by the Assimp library. Then, we can generate a simplified model of the mesh:
- First, we load our mesh via Assimp, as shown in the following code snippet. We preserve the existing vertices and indices exactly as they were loaded by Assimp:
const aiScene* scene = aiImportFile(
  "data/rubber_duck/scene.gltf", aiProcess_Triangulate);
const aiMesh* mesh = scene->mMeshes[0];

std::vector<vec3> positions;
std::vector<unsigned int> indices;

for (unsigned i = 0; i != mesh->mNumVertices; i++) {
  const aiVector3D v = mesh->mVertices[i];
  positions.push_back( vec3(v.x, v.z, v.y) );
}
for (unsigned i = 0; i != mesh->mNumFaces; i++) {
  for ( unsigned j = 0; j != 3; j++ )
    indices.push_back(mesh->mFaces[i].mIndices[j]);
}

aiReleaseImport(scene);
- Now we should generate a remap table for our existing vertex and index data:
std::vector<unsigned int> remap( positions.size() );

// The remap table and the vertex count argument refer to the source vertex
// buffer, so we pass positions.size() here rather than indices.size().
const size_t vertexCount = meshopt_generateVertexRemap( remap.data(),
  indices.data(), indices.size(), positions.data(), positions.size(),
  sizeof(vec3) );
The MeshOptimizer documentation (https://github.com/zeux/meshoptimizer) describes the remap table generation in more detail.
- The returned vertexCount value corresponds to the number of unique vertices that remain after remapping. Let's allocate space and generate new vertex and index buffers:

std::vector<unsigned int> remappedIndices( indices.size() );
std::vector<vec3> remappedVertices( vertexCount );

meshopt_remapIndexBuffer( remappedIndices.data(), indices.data(),
  indices.size(), remap.data() );
meshopt_remapVertexBuffer( remappedVertices.data(), positions.data(),
  positions.size(), sizeof(vec3), remap.data() );
Now we can use other MeshOptimizer algorithms to optimize these buffers even further. The official documentation is pretty straightforward. We will adapt the example it provides for the purposes of our demo application.
- When we want to render a mesh, the GPU has to transform each vertex via a vertex shader. GPUs can reuse transformed vertices by means of a small built-in cache, usually holding between 16 and 32 vertices. To use this small cache effectively, we need to reorder the triangles to maximize the locality of vertex references. Here is how to do this with MeshOptimizer; note that only the index data is touched (we can check the effect with the analysis helpers sketched after this list):

meshopt_optimizeVertexCache( remappedIndices.data(),
  remappedIndices.data(), indices.size(), vertexCount );
- Transformed vertices form triangles that are sent to rasterization to generate fragments. Each fragment usually goes through a depth test first, and only the fragments that pass it have the fragment shader executed to compute the final color. As fragment shaders become more and more expensive, it is increasingly important to reduce the number of fragment shader invocations. This can be achieved by reducing pixel overdraw in a mesh, which in general requires view-dependent algorithms. However, MeshOptimizer implements heuristics that reorder the triangles to minimize overdraw from all directions. We can use it as follows:
meshopt_optimizeOverdraw( remappedIndices.data(), remappedIndices.data(),
  indices.size(), glm::value_ptr(remappedVertices[0]),
  vertexCount, sizeof(vec3), 1.05f );
The last parameter, 1.05, is the threshold that determines how much the algorithm can compromise the vertex cache hit ratio. We use the recommended default value from the documentation.
- Once we have optimized the mesh to reduce pixel overdraw, the vertex buffer access pattern can still be optimized for memory efficiency. The GPU has to fetch the specified vertex attributes from the vertex buffer and pass them to the vertex shader. To speed up this fetch, a memory cache is used, so optimizing the locality of vertex buffer accesses is very important. We can use MeshOptimizer to optimize our index and vertex buffers for vertex fetch efficiency, as follows:
meshopt_optimizeVertexFetch( remappedVertices.data(), remappedIndices.data(),
  indices.size(), remappedVertices.data(), vertexCount, sizeof(vec3) );
This function will reorder vertices in the vertex buffer and regenerate indices to match the new contents of the vertex buffer.
- The last thing we will do in this recipe is simplify the mesh. MeshOptimizer can generate a new index buffer that uses existing vertices from the vertex buffer with a reduced number of triangles. This new index buffer can be used to render Level-of-Detail (LOD) meshes. The following code snippet shows you how to do this using the default threshold and target error values:
const float threshold = 0.2f;
const size_t target_index_count =
  size_t( remappedIndices.size() * threshold );
const float target_error = 1e-2f;

std::vector<unsigned int> indicesLod( remappedIndices.size() );
indicesLod.resize( meshopt_simplify(
  indicesLod.data(), remappedIndices.data(), remappedIndices.size(),
  &remappedVertices[0].x, vertexCount, sizeof(vec3),
  target_index_count, target_error) );
Multiple LOD meshes can be generated this way by changing the threshold value; a sketch of generating several LOD levels in a loop follows right after this list.
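Before moving on, it is worth noting that MeshOptimizer also provides analysis functions, meshopt_analyzeVertexCache(), meshopt_analyzeOverdraw(), and meshopt_analyzeVertexFetch(), which let us quantify the effect of the optimizations above. Here is a minimal sketch of how they could be called on our remapped buffers; the cache size, warp size, and primitive-group parameters are illustrative values rather than anything prescribed by this recipe:

// ACMR: average number of transformed vertices per triangle (lower is better).
// The cache size of 16 is illustrative; warp/primgroup sizes of 0 disable
// those parts of the simulation.
const meshopt_VertexCacheStatistics vcs = meshopt_analyzeVertexCache(
  remappedIndices.data(), remappedIndices.size(), vertexCount, 16, 0, 0);

// Average overdraw: 1.0 means each covered pixel is shaded exactly once.
const meshopt_OverdrawStatistics os = meshopt_analyzeOverdraw(
  remappedIndices.data(), remappedIndices.size(),
  glm::value_ptr(remappedVertices[0]), vertexCount, sizeof(vec3));

// Overfetch: how much vertex data is read relative to the ideal amount.
const meshopt_VertexFetchStatistics vfs = meshopt_analyzeVertexFetch(
  remappedIndices.data(), remappedIndices.size(), vertexCount, sizeof(vec3));

printf("ACMR %f, overdraw %f, overfetch %f\n",
  vcs.acmr, os.overdraw, vfs.overfetch);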
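As promised above, here is a sketch of how several LOD index buffers could be generated in a loop by varying the threshold. The lodThresholds values and the lods container are illustrative and not part of the recipe's source code:

// Each LOD keeps a smaller fraction of the original index count.
const float lodThresholds[] = { 0.5f, 0.25f, 0.125f };
const float lodTargetError = 1e-2f;

std::vector<std::vector<unsigned int>> lods;

for (const float t : lodThresholds) {
  std::vector<unsigned int> lod( remappedIndices.size() );
  lod.resize( meshopt_simplify(
    lod.data(), remappedIndices.data(), remappedIndices.size(),
    &remappedVertices[0].x, vertexCount, sizeof(vec3),
    size_t(remappedIndices.size() * t), lodTargetError) );
  lods.push_back( std::move(lod) );
}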
Let's render the optimized and LOD meshes that we created earlier:
- For the simplicity of this demo, we copy the remapped data back into the original vectors as follows:
indices = remappedIndices;
positions = remappedVertices;
- With modern OpenGL, we can store vertex and index data inside a single buffer. You can do this as follows:
const size_t sizeIndices = sizeof(unsigned int) * indices.size();
const size_t sizeIndicesLod = sizeof(unsigned int) * indicesLod.size();
const size_t sizeVertices = sizeof(vec3) * positions.size();

glNamedBufferStorage(meshData, sizeIndices + sizeIndicesLod + sizeVertices,
  nullptr, GL_DYNAMIC_STORAGE_BIT);

glNamedBufferSubData(meshData, 0, sizeIndices, indices.data());
glNamedBufferSubData(meshData, sizeIndices, sizeIndicesLod,
  indicesLod.data());
glNamedBufferSubData(meshData, sizeIndices + sizeIndicesLod, sizeVertices,
  positions.data());
- Now we should tell OpenGL where to read the vertex and index data from. The starting offset of the vertex data is sizeIndices + sizeIndicesLod:

glVertexArrayElementBuffer(VAO, meshData);
glVertexArrayVertexBuffer(VAO, 0, meshData,
  sizeIndices + sizeIndicesLod, sizeof(vec3));
glEnableVertexArrayAttrib(VAO, 0);
glVertexArrayAttribFormat(VAO, 0, 3, GL_FLOAT, GL_FALSE, 0);
glVertexArrayAttribBinding(VAO, 0, 0);
- To render the optimized mesh, we can call glDrawElements(), as follows:

glDrawElements(GL_TRIANGLES, indices.size(), GL_UNSIGNED_INT, nullptr);
- To render the simplified LOD mesh, we use the number of indices in the LOD and an offset to the position where its indices start in the index buffer, skipping sizeIndices bytes:

glDrawElements(GL_TRIANGLES, indicesLod.size(), GL_UNSIGNED_INT,
  (void*)sizeIndices);
The resulting image should look similar to the following screenshot:
There's more...
This recipe uses a slightly different technique for wireframe rendering. Instead of rendering the mesh twice, we use barycentric coordinates to determine how close each fragment is to a triangle edge and change its color accordingly. Here is the geometry shader that generates barycentric coordinates for a triangular mesh:
#version 460 core

layout( triangles ) in;
layout( triangle_strip, max_vertices = 3 ) out;

layout (location=0) in vec3 color[];

layout (location=0) out vec3 colors;
layout (location=1) out vec3 barycoords;

void main()
{
Next, store the values of the barycentric coordinates for each vertex of the triangle:
  const vec3 bc[3] = vec3[](
    vec3(1.0, 0.0, 0.0),
    vec3(0.0, 1.0, 0.0),
    vec3(0.0, 0.0, 1.0)
  );
  for ( int i = 0; i < 3; i++ ) {
    gl_Position = gl_in[i].gl_Position;
    colors = color[i];
    barycoords = bc[i];
    EmitVertex();
  }
  EndPrimitive();
}
The barycentric coordinates can then be used inside the fragment shader to blend the fragment color toward the edge color near triangle edges, in the following way:
#version 460 core

layout (location=0) in vec3 colors;
layout (location=1) in vec3 barycoords;

layout (location=0) out vec4 out_FragColor;

float edgeFactor(float thickness)
{
  vec3 a3 = smoothstep( vec3(0.0), fwidth(barycoords) * thickness, barycoords );
  return min( min(a3.x, a3.y), a3.z );
}

void main()
{
  out_FragColor = vec4( mix(vec3(0.0), colors, edgeFactor(1.0)), 1.0 );
}
The fwidth() function calculates the sum of the absolute values of the derivatives in the x and y screen-space coordinates and is used to determine the thickness of the lines. The smoothstep() function is used for antialiasing.