OpenGL 4 Tessellation

Nov. 07, 2010

I bought a new graphics card that is OpenGL 4.1 compatible, and since tessellation is hot, this article explains how to use it to implement level of detail (LOD) for large terrains easily.

I am going to assume knowledge of the OpenGL pipeline and API, matrix and vector math, and vertex and fragment shader programming. If you want to try it yourself, you will also need OpenGL 4/DX11 compatible hardware.


Overview and video

The terrain in this video is rendered from a 4096x4096 pixel heightmap, normalmap and texture. It consists of 128x128 patches, and each patch edge can be subdivided into up to 64 segments. This gives a virtual resolution of up to 8192x8192 grid cells, or roughly 67 million (each cell being two triangles).

Since the GTX 460 has a throughput of about 600 million triangles per second, putting even 67 million triangles on screen every frame would result in around 10 frames per second at best.
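The arithmetic behind these figures can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function names are mine, the throughput is the rough number quoted above, and the last line shows what happens if each grid cell is counted as two triangles.

```python
# Back-of-the-envelope estimate using the figures quoted in the text.
PATCHES_PER_SIDE = 128
MAX_DIVISIONS_PER_EDGE = 64
TRIANGLES_PER_SECOND = 600e6  # rough GTX 460 throughput, as quoted above


def grid_cells(patches_per_side, divisions_per_edge):
    """Number of quad cells when every patch is fully subdivided."""
    cells_per_side = patches_per_side * divisions_per_edge
    return cells_per_side * cells_per_side


def frames_per_second(primitives, throughput):
    """Upper bound on frame rate if every primitive is drawn each frame."""
    return throughput / primitives


cells = grid_cells(PATCHES_PER_SIDE, MAX_DIVISIONS_PER_EDGE)
print(cells)                                                         # 67108864
print(round(frames_per_second(cells, TRIANGLES_PER_SECOND), 1))      # 8.9
print(round(frames_per_second(2 * cells, TRIANGLES_PER_SECOND), 1))  # 4.5
```

Either way the brute-force frame rate is in the single digits, which is the motivation for the LODing that follows.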

Using tessellation shaders and a LODing technique, I get framerates between 360 and 500 FPS on this scene.

Why GPU LODing?

LODing has a long history, and traditionally it was all done on the CPU and then pushed to the GPU. By and large, the traditional approaches to terrain LODing have these things in common:

  • They are complex and difficult to implement. For instance, Overgrowth by Wolfire eschews CPU LODing in favor of TINs (triangulated irregular networks).
  • They show noticeable temporal and/or spatial artefacts.
  • They are not compatible with TINs.
  • They shift load from the (very powerful) GPU to the CPU.
  • They rely on a lot of preprocessing.

By moving the level of detail implementation onto the graphics card, these issues can be addressed much more satisfactorily.

New OpenGL Functionality

OpenGL 4 introduced three new pipeline stages between vertex shading and geometry shading:

  • Tessellation Control: Is programmable. Decides how often a patch is subdivided.
  • Tessellator: Is configurable. Takes the data from control and produces new primitives.
  • Tessellation Evaluation: Is programmable. Receives the output of the tessellator and can modify each output vertex.

It also introduced a new primitive type, GL_PATCHES. A patch can have from 1 up to GL_MAX_PATCH_VERTICES vertices, and the specification guarantees that this limit is at least 32.

Libraries

I use a texture for the terrain color, a heightmap texture for the terrain elevation and a normalmap texture for normals. The terrain is generated with Lithosphere. For windowing, OpenGL context and input I use Pyglet, and for advanced OpenGL usage an abstraction library: gletools.

Setup

The normalmap and heightmap are baked together into a GL_RGBA32F texture where RGB is the normal and alpha is the terrain height. With the textures and the shader program bound, a plane consisting of 128x128 quad patches is rendered from a vertex buffer object. The plane covers the world coordinates from 0,0,0 to 1,1,0. The Z axis in world space is the terrain elevation.
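To make the patch layout concrete, here is a minimal sketch of how the corner positions for such a plane could be generated on the Python side. It is not the code from the example application: `patch_grid` is a hypothetical helper, and the corner order per patch is an assumption chosen to agree with the interpolation in the evaluation shader shown later in this article.

```python
def patch_grid(patches_per_side=128):
    """Return a flat list of (x, y, z) corners, four per quad patch.

    Assumed corner order per patch: vertex 1 at the patch origin,
    vertex 0 one step along x, vertex 2 one step along y and
    vertex 3 diagonally opposite the origin.
    """
    step = 1.0 / patches_per_side
    vertices = []
    for j in range(patches_per_side):
        for i in range(patches_per_side):
            x0, y0 = i * step, j * step
            x1, y1 = (i + 1) * step, (j + 1) * step
            vertices.extend([
                (x1, y0, 0.0),  # vertex 0
                (x0, y0, 0.0),  # vertex 1
                (x0, y1, 0.0),  # vertex 2
                (x1, y1, 0.0),  # vertex 3
            ])
    return vertices


grid = patch_grid()
print(len(grid))  # 65536 corners = 128 * 128 patches * 4
```

Such a list would be uploaded to a vertex buffer and drawn with the GL_PATCHES primitive after telling OpenGL that each patch has four vertices (glPatchParameteri with GL_PATCH_VERTICES).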

Vertex Shader

For each incoming vertex, a lookup into the terrain texture is made using the x and y position as texture coordinates. The obtained height is written to the z component of the resulting gl_Position.

in vec4 position;
uniform sampler2D terrain;

void main(void){
    vec2 texcoord = position.xy;
    float height = texture(terrain, texcoord).a;
    vec4 displaced = vec4(
        position.x, position.y,
        height, 1.0);
    gl_Position = displaced;
}

Tessellation control shader

The layout declaration states that this shader outputs 4 vertices per patch.

layout(vertices = 4) out;

The shader accepts three uniforms: the screen size in pixels, the modelview/projection matrix and a LOD factor.

uniform vec2 screen_size;
uniform mat4 mvp;
uniform float lod_factor;

A helper function projects a world space vertex to normalized device coordinates:

vec4 project(vec4 vertex){
    vec4 result = mvp * vertex;
    result /= result.w;
    return result;
}

This helper function converts a position in normalized device coordinates to screen space (pixels):

vec2 screen_space(vec4 vertex){
    return (clamp(vertex.xy, -1.3, 1.3)+1) * (screen_size*0.5);
}

The LOD calculation as a function of distance in screen space:

float level(vec2 v0, vec2 v1){
    return clamp(distance(v0, v1)/lod_factor, 1, 64);
}
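To get a feel for the numbers these helpers produce, they can be mirrored on the CPU. The following is only an illustrative Python sketch of the three GLSL functions above, not part of the renderer; the row-major nested-list matrix and the explicit `screen_size` and `lod_factor` parameters are my assumptions (in the shader they are uniforms).

```python
import math


def clamp(x, lo, hi):
    return max(lo, min(hi, x))


def project(mvp, vertex):
    """Transform a 4-component vertex by a row-major 4x4 matrix, then divide by w."""
    clip = [sum(mvp[r][c] * vertex[c] for c in range(4)) for r in range(4)]
    return [component / clip[3] for component in clip]


def screen_space(ndc, screen_size):
    """Map clamped normalized device x/y to pixel coordinates."""
    return tuple(
        (clamp(ndc[i], -1.3, 1.3) + 1.0) * screen_size[i] * 0.5
        for i in range(2)
    )


def level(v0, v1, lod_factor):
    """Tessellation level for an edge: its pixel length divided by lod_factor."""
    pixels = math.hypot(v0[0] - v1[0], v0[1] - v1[1])
    return clamp(pixels / lod_factor, 1.0, 64.0)


identity = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
a = screen_space(project(identity, [-0.5, 0.0, 0.0, 1.0]), (800, 600))
b = screen_space(project(identity, [0.5, 0.0, 0.0, 1.0]), (800, 600))
print(a, b)             # (200.0, 300.0) (600.0, 300.0)
print(level(a, b, 10))  # 40.0
```

An edge that is 400 pixels long with a lod_factor of 10 is thus split into 40 segments of roughly 10 pixels each, and the clamp keeps the result within the 1 to 64 range used by the shader.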

To improve performance, this function tests a vertex in normalized device coordinates against a slightly enlarged view frustum:

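bool offscreen(vec4 vertex){
    if(vertex.z < -0.5){
        return true;
    }
    // lessThan/greaterThan return a bvec2, and || only accepts scalar
    // booleans, so each comparison is reduced with any() first.
    return any(lessThan(vertex.xy, vec2(-1.7))) ||
           any(greaterThan(vertex.xy, vec2(1.7)));
}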
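The same test is easy to mirror in Python, which makes the culling rule explicit. This is a sketch for illustration only; `patch_is_culled` is a hypothetical helper name that combines the per-vertex test over the corners of a patch, in the way the control shader's main function does.

```python
def offscreen(ndc):
    """True if a vertex in normalized device coordinates is outside the enlarged frustum."""
    if ndc[2] < -0.5:
        return True
    return any(c < -1.7 or c > 1.7 for c in ndc[:2])


def patch_is_culled(corners_ndc):
    """A patch is skipped only when every one of its corners is off screen."""
    return all(offscreen(corner) for corner in corners_ndc)


visible = (0.0, 0.0, 0.5, 1.0)
far_right = (2.0, 0.0, 0.5, 1.0)
print(patch_is_culled([far_right] * 4))               # True
print(patch_is_culled([visible] + [far_right] * 3))   # False
```

Because only the corners are tested, the check is an estimate: the 1.7 margin (instead of 1.0) keeps patches that straddle the screen border, at the cost of tessellating some patches that are not actually visible.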

The main function is called once for each output vertex of the patch. gl_InvocationID identifies which vertex is being processed. The tessellation levels apply to the whole patch, so they are computed only by the invocation with ID 0.

An estimate is made whether the patch is on screen. If it is not, all tessellation levels are set to zero, which causes the patch to be skipped.

If the patch is on screen, each edge is subdivided so as to approximate the given lod_factor (in pixels per edge segment).

void main(){
     #define id gl_InvocationID
     gl_out[id].gl_Position = gl_in[id].gl_Position;
     if(id == 0){
         vec4 v0 = project(gl_in[0].gl_Position);
         vec4 v1 = project(gl_in[1].gl_Position);
         vec4 v2 = project(gl_in[2].gl_Position);
         vec4 v3 = project(gl_in[3].gl_Position);

         if(all(bvec4(
             offscreen(v0),
             offscreen(v1),
             offscreen(v2),
             offscreen(v3)
         ))){
             gl_TessLevelInner[0] = 0;
             gl_TessLevelInner[1] = 0;
             gl_TessLevelOuter[0] = 0;
             gl_TessLevelOuter[1] = 0;
             gl_TessLevelOuter[2] = 0;
             gl_TessLevelOuter[3] = 0;
         }
         else{
             vec2 ss0 = screen_space(v0);
             vec2 ss1 = screen_space(v1);
             vec2 ss2 = screen_space(v2);
             vec2 ss3 = screen_space(v3);

             float e0 = level(ss1, ss2);
             float e1 = level(ss0, ss1);
             float e2 = level(ss3, ss0);
             float e3 = level(ss2, ss3);

             gl_TessLevelInner[0] = mix(e1, e2, 0.5);
             gl_TessLevelInner[1] = mix(e0, e3, 0.5);
             gl_TessLevelOuter[0] = e0;
             gl_TessLevelOuter[1] = e1;
             gl_TessLevelOuter[2] = e2;
             gl_TessLevelOuter[3] = e3;
         }
     }
 }

The resulting mesh looks like this:

The tessellation is split into the inner part (gl_TessLevelInner), which governs how often the inside of a patch is divided, and the outer part (gl_TessLevelOuter), which governs how often an edge is divided.

The gl_TessLevel* and gl_InvocationID variables for quad patches correlate in the following way:

  • gl_TessLevelInner[0] orientation corresponds to the edges identified by gl_TessLevelOuter[1] and gl_TessLevelOuter[2]
  • gl_TessLevelInner[1] orientation corresponds to the edges identified by gl_TessLevelOuter[0] and gl_TessLevelOuter[3]
  • gl_TessLevelOuter[0] corresponds to gl_InvocationID 1 and 2
  • gl_TessLevelOuter[1] corresponds to gl_InvocationID 0 and 1
  • gl_TessLevelOuter[2] corresponds to gl_InvocationID 3 and 0
  • gl_TessLevelOuter[3] corresponds to gl_InvocationID 2 and 3
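The edge assignment can also be written down as a small table. The Python sketch below mirrors what the control shader's main function does for one on-screen patch; it is an illustration with hypothetical names (`OUTER_EDGES`, `patch_levels`), and it takes the corner positions already converted to pixels.

```python
import math

# Corner index pairs bounding each outer edge, following the control
# shader's main function (outer level index -> two corner indices).
OUTER_EDGES = {0: (1, 2), 1: (0, 1), 2: (3, 0), 3: (2, 3)}


def edge_level(a, b, lod_factor):
    """Pixel length of an edge divided by lod_factor, clamped to 1..64."""
    pixels = math.hypot(a[0] - b[0], a[1] - b[1])
    return max(1.0, min(64.0, pixels / lod_factor))


def patch_levels(corners_px, lod_factor):
    """Return (outer, inner) tessellation levels the way the shader assigns them."""
    outer = [
        edge_level(corners_px[i], corners_px[j], lod_factor)
        for i, j in (OUTER_EDGES[k] for k in range(4))
    ]
    inner = [
        0.5 * (outer[1] + outer[2]),  # mix(e1, e2, 0.5)
        0.5 * (outer[0] + outer[3]),  # mix(e0, e3, 0.5)
    ]
    return outer, inner


# A patch covering 100x50 pixels, corners given in vertex order 0..3.
corners = [(100.0, 0.0), (0.0, 0.0), (0.0, 50.0), (100.0, 50.0)]
print(patch_levels(corners, 10.0))
# ([5.0, 10.0, 5.0, 10.0], [7.5, 7.5])
```

Since an outer level depends only on the two corners of its edge, neighbouring patches compute the same level for a shared edge, which is what keeps the mesh free of cracks.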

Tessellation Evaluation Shader

The layout declaration configures the tessellator. It tells it to work on quads and to produce divisions that slide smoothly as the tessellation levels change, with the number of segments per edge rounded to an odd count. fractional_even_spacing and equal_spacing are also available.

layout(quads, fractional_odd_spacing, ccw) in;
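How the three spacing modes turn a fractional tessellation level into a segment count can be sketched in Python. This reflects my reading of the rounding rules in the OpenGL 4 specification and is not code from the example application; the maximum level of 64 is the minimum the specification guarantees for GL_MAX_TESS_GEN_LEVEL and is assumed here.

```python
import math

MAX_TESS_GEN_LEVEL = 64  # assumed; the actual limit is implementation dependent


def segment_count(level, spacing):
    """Number of segments an edge is split into for a given level and spacing mode."""
    if spacing == "equal_spacing":
        # Clamp to [1, max], round up to the next integer.
        return math.ceil(max(1, min(MAX_TESS_GEN_LEVEL, level)))
    if spacing == "fractional_even_spacing":
        # Clamp to [2, max], round up to the next even integer.
        n = math.ceil(max(2, min(MAX_TESS_GEN_LEVEL, level)))
        return n + (n % 2)
    if spacing == "fractional_odd_spacing":
        # Clamp to [1, max - 1], round up to the next odd integer.
        n = math.ceil(max(1, min(MAX_TESS_GEN_LEVEL - 1, level)))
        return n + 1 - (n % 2)
    raise ValueError("unknown spacing mode: " + spacing)


for mode in ("equal_spacing", "fractional_even_spacing", "fractional_odd_spacing"):
    print(mode, segment_count(4.2, mode))
# equal_spacing 5
# fractional_even_spacing 6
# fractional_odd_spacing 5
```

With the fractional modes the segments are not all the same length: two of them grow gradually as the level rises, so new vertices slide in instead of popping, which is why this article uses fractional_odd_spacing.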

It produces a texture coordinate and a depth for use in the fragment shader, and it samples the terrain map again.

out vec2 texcoord;
out float depth;

uniform sampler2D terrain;
uniform mat4 mvp;

The evaluation main function is called once for each vertex of the tessellated output. The coordinate is given as a UV vector relative to the positions of the patch's control points.

After the position calculation, the texcoord is used for the lookup into the terrain heightmap. Again, the obtained height is used as the Z component of the resulting gl_Position.

void main(){
    float u = gl_TessCoord.x;
    float v = gl_TessCoord.y;

    vec4 a = mix(gl_in[1].gl_Position, gl_in[0].gl_Position, u);
    vec4 b = mix(gl_in[2].gl_Position, gl_in[3].gl_Position, u);
    vec4 position = mix(a, b, v);
    texcoord = position.xy;
    float height = texture(terrain, texcoord).a;
    gl_Position = mvp * vec4(texcoord, height, 1.0);
    depth = gl_Position.z;
}
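The position calculation is a plain bilinear interpolation between the four patch corners. Here is a small Python sketch of the same three mix calls, for illustration only; the corner coordinates in the example are made up.

```python
def mix(a, b, t):
    """Component-wise linear interpolation, like GLSL mix()."""
    return tuple(x * (1.0 - t) + y * t for x, y in zip(a, b))


def patch_point(corners, u, v):
    """Interpolate the four corners (in vertex order 0..3) at tessellation coordinate (u, v)."""
    a = mix(corners[1], corners[0], u)
    b = mix(corners[2], corners[3], u)
    return mix(a, b, v)


# Hypothetical unit patch: vertex 1 at the origin, vertex 3 diagonally opposite.
corners = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
print(patch_point(corners, 0.0, 0.0))  # (0.0, 0.0, 0.0), i.e. vertex 1
print(patch_point(corners, 1.0, 1.0))  # (1.0, 1.0, 0.0), i.e. vertex 3
print(patch_point(corners, 0.5, 0.5))  # (0.5, 0.5, 0.0), the patch centre
```

Only x and y of the interpolated position are kept by the shader; the height always comes from a fresh texture lookup, so newly created vertices pick up detail from the heightmap rather than from the coarse patch corners.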

A geometry shader stage is not required, and the fragment shader is business as usual. You can look it up in the full source.

Source Code

This terrain LOD implementation is available in the gletools examples, or you can view the application and shader source directly. I use a small preprocessor to split the shader source into its respective (vertex, control, evaluator, fragment) components and add the version tag for each.

I cannot share the terrain data at this point because it is 400 MB in size.
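The idea of such a preprocessor can be sketched in a few lines of Python. This is not the actual implementation: the label syntax (a line such as `vertex:` starting each section), the stage names and the `split_shader` function are all hypothetical and chosen only to illustrate the approach.

```python
# Hypothetical section labels; the real preprocessor may use different ones.
STAGES = ("vertex", "control", "eval", "fragment")


def split_shader(source, version="400"):
    """Split one combined source text into per-stage sources.

    Each stage starts at a line consisting of '<stage>:' and gets a
    '#version' directive prepended as its first line.
    """
    stages = {}
    current = None
    for line in source.splitlines():
        label = line.strip()
        if label.endswith(":") and label[:-1] in STAGES:
            current = label[:-1]
            stages[current] = ["#version " + version]
        elif current is not None:
            stages[current].append(line)
    return {name: "\n".join(lines) for name, lines in stages.items()}


combined = "vertex:\nvoid main(){}\nfragment:\nout vec4 color;\nvoid main(){}\n"
parts = split_shader(combined)
print(sorted(parts))     # ['fragment', 'vertex']
print(parts["vertex"])   # '#version 400' followed by 'void main(){}'
```

Keeping all stages in one file makes it easier to see how the outputs of one stage line up with the inputs of the next.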

Advantages

  • The implementation is reasonably easy, with about 120 lines of shading code.
  • A good quality of tessellation can be achieved.
  • Instead of regularly divided meshes, TINs could just as easily be used.
  • All the load stays on the GPU.
  • No preprocessing is required.

Issues and Limitations

The current algorithm for selecting the LOD factor tends to select too few divisions on an edge when the view vector is collinear with a patch's edge and the patch is near the viewpoint. This leads to odd (but small) artifacts.

A patch edge can be divided into at most 64 segments. Very large terrains would therefore require a large number of input patches. This would lead to faraway patches being smaller than the desired primitive size, and near patches would be partially over-tessellated.

Loading larger terrains into VRAM is not feasible with the current approach.

I do not have an ATI card, so I cannot say whether this code runs on one. It is also largely unknown how much of a gaming or creative professional audience has OpenGL 4 compatible hardware and drivers right now.

Further Work

  • A better measure for the LOD factor could be developed that preserves even division in screen space but also takes into account edges with too few divisions.
  • Multi-stage GPU tessellation using transform feedback buffers could be used. This would allow fine-grained control over tessellation without producing massive amounts of patches.
  • The technique of texture clipmapping/megatexturing could be used to hold texture data many times larger than the available VRAM.

Acknowledgements

The Rastergrid blog has some very good entries on new OpenGL functionality and rendering techniques.

These two entries by The Little Grasshopper were instrumental for me in understanding tessellation shading.

The Redbook and the Orangebook were valuable sources of reference information, and I'm looking forward to seeing these books updated for OpenGL 4.1.