wgpu/docs/api-specs/mesh_shading.md
Inner Daemons a05c70cef7
WGSL parsing for mesh shaders (#8370)
Co-authored-by: Jim Blandy <jimb@red-bean.com>
Co-authored-by: Erich Gubler <erichdongubler@gmail.com>
Co-authored-by: Connor Fitzgerald <connorwadefitzgerald@gmail.com>
Co-authored-by: SupaMaggie70Incorporated <85136135+SupaMaggie70Incorporated@users.noreply.github.com>
2025-11-12 21:06:20 -05:00

18 KiB

Mesh Shader Extensions

🧪Experimental🧪

wgpu supports an experimental version of mesh shading when Features::EXPERIMENTAL_MESH_SHADER is enabled. The status of the implementation is documented in the mesh-shading issue.

Note: The features documented here may have major bugs in them and are expected to be subject to breaking changes. Suggestions for the API exposed by this should be posted on the issue above.

Mesh shaders overview

What are mesh shaders?

Mesh shaders are a new kind of rasterization pipeline intended to address some of the shortfalls with the vertex shader pipeline. The core idea of mesh shaders is that the GPU decides how to render the many small parts of a scene instead of the CPU issuing a draw call for every small part or issuing an inefficient monolithic draw call for a large part of the scene.

Mesh shaders are specifically designed to be used with meshlet rendering, a technique where every object is split into many subobjects called meshlets that are each rendered with their own parameters. With the standard vertex pipeline, each draw call specifies an exact number of primitives to render and the same parameters for all vertex shaders on an entire object (or even multiple objects). This doesn't leave room for different LODs for different parts of an object, for example a closer part having more detail, nor does it allow culling smaller sections (or primitives) of objects. With mesh shaders, each task workgroup might get assigned to a single object. It can then analyze the different meshlets(sections) of that object, determine which are visible and should actually be rendered, and for those meshlets determine what LOD to use based on the distance from the camera. It can then dispatch a mesh workgroup for each meshlet, with each mesh workgroup then reading the data for that specific LOD of its meshlet, determining which and how many vertices and primitives to output, determining which remaining primitives need to be culled, and passing the resulting primitives to the rasterizer.

Mesh shaders are most effective in scenes with many polygons. They can allow skipping processing of entire groups of primitives that are facing away from the camera or otherwise occluded, which reduces the number of primitives that need to be processed by more than half in most cases, and they can reduce the number of primitives that need to be processed for more distant objects. Scenes that are not bottlenecked by geometry (perhaps instead by fragment processing or post processing) will not see much benefit from using them.

Mesh shaders were first shown off in NVIDIA's asteroids demo. Now, they form the basis for Unreal Engine's Nanite.

Mesh shader pipeline

With the current pipeline set to a mesh pipeline, a draw command like render_pass.draw_mesh_tasks(x, y, z) takes the following steps:

  • If the pipeline has a task shader stage:

    • Dispatch a grid of task shader workgroups, where x, y, and z give the number of workgroups along each axis of the grid. Each task shader workgroup produces a mesh shader workgroup grid size (mx, my, mz) and a task payload value mp.

    • For each task shader workgroup, dispatch a grid of mesh shader workgroups, where mx, my, and mz give the number of workgroups along each axis of the grid. Pass mp to each of these workgroup's mesh shader invocations.

  • Alternatively, if the pipeline does not have a task shader stage:

    • Dispatch a single grid of mesh shader workgroups, where x, y, and z give the number of workgroups along each axis of the grid. These mesh shaders receive no task payload value.
  • Each mesh shader workgroup produces a list of output vertices, and a list of primitives built from those vertices. The workgroup can supply per-primitive values as well, if needed. Each primitive selects its vertices by index, like an indexed draw call, from among the vertices generated by this workgroup.

    Unlike a grid of ordinary compute shader workgroups collaborating to build vertex and index data in common storage buffers, the vertices and primitives produced by a mesh shader workgroup are entirely private to that workgroup, and are not accessible by other workgroups.

  • Primitives produced by a mesh shader workgroup can have a culling flag. If a primitive's culling flag is false, it is skipped during rasterization.

  • The primitives produced by all mesh shader workgroups are then rasterized in the usual way, with each fragment shader invocation handling one pixel.

    Attributes from the vertices produced by the mesh shader workgroup are provided to the fragment shader with interpolation applied as appropriate.

    If the mesh shader workgroup supplied per-primitive values, these are available to each primitive's fragment shader invocations. Per-primitive values are never interpolated; fragment shaders simply receive the values the mesh shader workgroup associated with their primitive.

wgpu API

New wgpu functions

Device::create_mesh_pipeline - Creates a mesh shader pipeline. This is very similar to creating a standard render pipeline, except that it takes a mesh shader state and optional task shader state instead of a vertex state. If the task state is omitted, during rendering the number of workgroups is passed directly from the draw call to the mesh shader state, with an empty payload.

RenderPass::draw_mesh_tasks - Dispatches the mesh shader pipeline. This ignores render pipeline specific information, such as vertex buffer bindings and index buffer bindings. The dispatch size must adhere to the limits described below.

RenderPass::draw_mesh_tasks_indirect, RenderPass::multi_draw_mesh_tasks_indirect and RenderPass::multi_draw_mesh_tasks_indirect_count - Dispatches the mesh shader pipeline with dispatch size taken from a buffer. This ignores render pipeline specific information, such as vertex buffer bindings and index buffer bindings. The dispatch size must adhere to the limits described below. Analogous to draw_indirect, multi_draw_indirect and multi_draw_indirect_count. Requires the corresponding indirect feature to be enabled.

An example of using mesh shaders to render a single triangle can be seen here.

Features

  • Using mesh shaders requires enabling Features::EXPERIMENTAL_MESH_SHADER.
  • Using mesh shaders with multiview requires enabling Features::EXPERIMENTAL_MESH_SHADER_MULTIVIEW.
  • Using mesh shaders with point primitives requires enabling Features::EXPERIMENTAL_MESH_SHADER_POINTS.
  • Queries are unsupported
  • Primitive index support will be added once support lands in for them in general.

Limits

Note

: More limits will be added when support is added to naga.

  • Limits::max_task_workgroup_total_count - the maximum total number of workgroups from a draw_mesh_tasks command or similar. The dimensions passed must be less than or equal to this limit when multiplied together.
  • Limits::max_task_workgroups_per_dimension - the maximum for each of the 3 workgroup dimensions in a draw_mesh_tasks command. Each dimension passed must be less than or equal to this limit.
  • max_mesh_multiview_count - The maximum number of views used when multiview rendering with a mesh shader pipeline.
  • max_mesh_output_layers - the maximum number of output layers for a mesh shader pipeline.

Naga implementation

Supported frontends

  • 🛠️ WGSL
  • SPIR-V
  • 🚫 GLSL

Supported backends

  • 🛠️ SPIR-V
  • 🛠️ HLSL
  • MSL
  • 🚫 GLSL
  • 🚫 WGSL

✔️ = Complete 🛠️ = In progress = Planned 🚫 = Unplanned/impossible

WGSL extension specification

The majority of changes relating to mesh shaders will be in WGSL and naga.

Using any of these features in a wgsl program will require adding the enable wgpu_mesh_shader; directive to the top of a program.

Two new shader stages will be added to WGSL. Fragment shaders are also modified slightly. Both task shaders and mesh shaders are allowed to use any compute-available functionality, including subgroup operations.

Task shader

A function with the @task attribute is a task shader entry point. A mesh shader pipeline may optionally specify a task shader entry point, and if it does, mesh draw commands using that pipeline dispatch a task shader grid of workgroups running the task shader entry point. Like compute shader dispatches, the three-component size passed to draw_mesh_tasks, or drawn from the indirect buffer for its indirect variants, specifies the size of the task shader grid as the number of workgroups along each of the grid's three axes.

A task shader entry point must have a @workgroup_size attribute, meeting the same requirements as one appearing on a compute shader entry point.

A task shader entry point must also have a @payload(G) property, where G is the name of a global variable in the task_payload address space. Each task shader workgroup has its own instance of this variable, visible to all invocations in the workgroup. Whatever value the workgroup collectively stores in that global variable becomes the task payload, and is provided to all invocations in the mesh shader grid dispatched for the workgroup. A task payload variable must be at least 4 bytes in size.

A task shader entry point must return a vec3<u32> value. The return value of each workgroup's first invocation (that is, the one whose local_invocation_index is 0) is taken as the size of a mesh shader grid to dispatch, measured in workgroups. (If the task shader entry point returns vec3(0, 0, 0), then no mesh shaders are dispatched.) Mesh shader grids are described in the next section.

Each task shader workgroup dispatches an independent mesh shader grid: in mesh shader invocations, @builtin values like workgroup_id and global_invocation_id describe the position of the workgroup and invocation within that grid; and @builtin(num_workgroups) matches the task shader workgroup's return value. Mesh shaders dispatched for other task shader workgroups are not included in the count. If it is necessary for a mesh shader to know which task shader workgroup dispatched it, the task shader can include its own workgroup id in the task payload.

Task shaders must return a value of type vec3<u32> decorated with @builtin(mesh_task_size).

Task shaders can use compute and subgroup builtin inputs, in addition to view_index and draw_id.

Mesh shader

A function with the @mesh attribute is a mesh shader entry point. Mesh shaders must not return anything.

Like compute shaders, mesh shaders are invoked in a grid of workgroups, called a mesh shader grid. If the mesh shader pipeline has a task shader, then each task shader workgroup determines the size of a mesh shader grid to be dispatched, as described above. Otherwise, the three-component size passed to draw_mesh_tasks, or drawn from the indirect buffer for its indirect variants, specifies the size of the mesh shader grid directly, as the number of workgroups along each of the grid's three axes.

If the mesh shader pipeline has a task shader entry point, then the pipeline's mesh shader entry point must also have a @payload(G) attribute, and the sizes of the variables must match. Mesh shader invocations can read from, but not write to, this variable, which is initialized to whatever value was written to it by the task shader workgroup that dispatched this mesh shader grid.

If the mesh shader pipeline does not have a task shader entry point, then the mesh shader entry point must not have any @payload attribute.

A mesh shader entry point must have the following attributes:

  • @workgroup_size: this has the same meaning as when it appears on a compute shader entry point.

  • @mesh(VAR): Here, VAR represents a workgroup variable storing the output information.

All mesh shader outputs are per-workgroup, and taken from the workgroup variable specified above. The type must have exactly 4 fields:

  • A field decorated with @builtin(vertex_count), with type u32: this field represents the number of vertices that will be drawn
  • A field decorated with @builtin(primitive_count), with type u32: this field represents the number of primitives that will be drawn
  • A field decorated with @builtin(vertices), typed as an array of V, where V is the vertex output type as specified below
  • A field decorated with @builtin(primitives), typed as an array of P, where P is the primitive output type as specified below

For a vertex count NV, the first NV elements of the vertex array above are outputted. Therefore, NV must be less than or equal to the size of the vertex array. The same is true for primitives with NP.

The vertex output type V must meet the same requirements as a struct type returned by a @vertex entry point: all members must have either @builtin or @location attributes, there must be a @builtin(position), and so on.

The primitive output type P must be a struct type, every member of which either has a @location or @builtin attribute. All members decorated with @location must also be decorated with @per_primitive, as must the corresponding fragment input. The @per_primitive decoration may only be applied to members decorated with @location. The following @builtin attributes are allowed:

  • triangle_indices, line_indices, or point_index: The annotated member must be of type vec3<u32>, vec2<u32>, or u32.

    The member's components are indices (or, its value is an index) into the list of vertices generated by this workgroup, identifying the vertices of the primitive to be drawn. These indices must be less than the value of numVertices passed to setMeshOutputs.

    The type P must contain exactly one member with one of these attributes, determining what sort of primitives the mesh shader generates.

  • cull_primitive: The annotated member must be of type bool. If it is true, then the primitive is skipped during rendering.

The @location attributes of P and V must not overlap, since they are merged to produce the user-defined inputs to the fragment shader.

Mesh shaders may write to the primitive_index builtin. This is treated just like a field decorated with @location, so if the mesh shader outputs primitive_index the fragment shader must input it, and if the fragment shader inputs it, the mesh shader must write it (unlike vertex shader pipelines).

Mesh shaders can use compute and mesh shader builtin inputs, in addition to view_index, and if no task shader is present, draw_id.

Fragment shader

Fragment shaders can access vertex output data as if it is from a vertex shader. They can also access primitive output data, provided the input is decorated with @per_primitive. The @per_primitive decoration may only be applied to inputs or struct members decorated with @location.

The primitive state is part of the fragment input and must match the output of the mesh shader in the pipeline. Using @per_primitive also requires enabling the mesh shader extension. Additionally, the locations of vertex and primitive input cannot overlap.

Full example

The following is a full example of WGSL shaders that could be used to create a mesh shader pipeline, showing off many of the features.

enable wgpu_mesh_shader;

const positions = array(
    vec4(0., 1., 0., 1.),
    vec4(-1., -1., 0., 1.),
    vec4(1., -1., 0., 1.)
);
const colors = array(
    vec4(0., 1., 0., 1.),
    vec4(0., 0., 1., 1.),
    vec4(1., 0., 0., 1.)
);

struct TaskPayload {
    colorMask: vec4<f32>,
    visible: bool,
}
struct VertexOutput {
    @builtin(position) position: vec4<f32>,
    @location(0) color: vec4<f32>,
}
struct PrimitiveOutput {
    @builtin(triangle_indices) indices: vec3<u32>,
    @builtin(cull_primitive) cull: bool,
    @per_primitive @location(1) colorMask: vec4<f32>,
}
struct PrimitiveInput {
    @per_primitive @location(1) colorMask: vec4<f32>,
}

var<task_payload> taskPayload: TaskPayload;
var<workgroup> workgroupData: f32;

@task
@payload(taskPayload)
@workgroup_size(1)
fn ts_main() -> @builtin(mesh_task_size) vec3<u32> {
    workgroupData = 1.0;
    taskPayload.colorMask = vec4(1.0, 1.0, 0.0, 1.0);
    taskPayload.visible = true;
    return vec3(1, 1, 1);
}

struct MeshOutput {
    @builtin(vertices) vertices: array<VertexOutput, 3>,
    @builtin(primitives) primitives: array<PrimitiveOutput, 1>,
    @builtin(vertex_count) vertex_count: u32,
    @builtin(primitive_count) primitive_count: u32,
}
var<workgroup> mesh_output: MeshOutput;

@mesh(mesh_output)
@payload(taskPayload)
@workgroup_size(1)
fn ms_main(@builtin(local_invocation_index) index: u32, @builtin(global_invocation_id) id: vec3<u32>) {
    mesh_output.vertex_count = 3;
    mesh_output.primitive_count = 1;
    workgroupData = 2.0;

    mesh_output.vertices[0].position = positions[0];
    mesh_output.vertices[0].color = colors[0] * taskPayload.colorMask;

    mesh_output.vertices[1].position = positions[1];
    mesh_output.vertices[1].color = colors[1] * taskPayload.colorMask;

    mesh_output.vertices[2].position = positions[2];
    mesh_output.vertices[2].color = colors[2] * taskPayload.colorMask;

    mesh_output.primitives[0].indices = vec3<u32>(0, 1, 2);
    mesh_output.primitives[0].cull = !taskPayload.visible;
    mesh_output.primitives[0].colorMask = vec4<f32>(1.0, 0.0, 1.0, 1.0);
}

@fragment
fn fs_main(vertex: VertexOutput, primitive: PrimitiveInput) -> @location(0) vec4<f32> {
    return vertex.color * primitive.colorMask;
}