WebGPU: The Future of Web Graphics
Pipelines, bind groups, timestamp queries—and a small triangle example to keep everything concrete.
WebGPU is a modern, low‑overhead graphics and compute API for the web. It exposes explicit control over pipelines, bind groups, command encoders, and compute workloads, bringing browser rendering closer to Vulkan, Metal, and Direct3D 12 in terms of mental model. Hidden driver behavior does not disappear, but far fewer surprises leak into frame times.
In practical terms, WebGPU favors deliberate setup over ad‑hoc state changes. Pipeline objects are created ahead of time and reused; resource bindings are grouped and treated as data; command buffers are constructed explicitly and submitted in batches. The trade‑off is extra boilerplate upfront in exchange for predictable performance and clear ownership of GPU work.
Some of the core shifts compared to WebGL:
- Explicit pipelines: pipeline state is compiled once and kept around, rather than being inferred from mutable global state.
- Bind groups: textures, buffers, and samplers are clustered into bind groups so that whole sets of resources can be swapped with a single call.
- Timestamp queries: GPUQuerySet makes it possible to measure the cost of individual passes instead of guessing from wall‑clock timings.
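As a sketch of how timestamp queries wire up (assuming the device was requested with the optional 'timestamp-query' feature; the helper name makePassTimer and the resolveBuf variable are illustrative, not part of the standard API):

```javascript
// Sketch: measure a pass with a two-slot timestamp query set.
// Requires the optional 'timestamp-query' feature on the device.
function makePassTimer(device) {
  const querySet = device.createQuerySet({ type: 'timestamp', count: 2 });
  const resolveBuf = device.createBuffer({
    size: 16, // two 64-bit timestamps
    usage: GPUBufferUsage.QUERY_RESOLVE | GPUBufferUsage.COPY_SRC,
  });
  return {
    // Spread into a render pass descriptor to record begin/end times:
    //   encoder.beginRenderPass({ colorAttachments, timestampWrites })
    timestampWrites: { querySet, beginningOfPassWriteIndex: 0, endOfPassWriteIndex: 1 },
    // After the pass: encoder.resolveQuerySet(querySet, 0, 2, resolveBuf, 0),
    // then copy resolveBuf into a MAP_READ buffer and diff the two u64 values.
    querySet,
    resolveBuf,
  };
}
```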
The minimal triangle example below shows the end‑to‑end setup without helpers or frameworks. It covers adapter selection, device creation, canvas configuration, and a single render pass that clears the screen and draws three vertices:
// 1) Init
const canvas = document.querySelector('canvas');
const adapter = await navigator.gpu?.requestAdapter();
if (!adapter) throw new Error('WebGPU is not supported in this browser');
const device = await adapter.requestDevice();
const context = canvas.getContext('webgpu');
const format = navigator.gpu.getPreferredCanvasFormat();
context.configure({ device, format });
// 2) Shaders (WGSL)
const shader = device.createShaderModule({ code: `
@vertex fn v_main(@builtin(vertex_index) vi: u32) -> @builtin(position) vec4f {
var p = array<vec2f,3>(
vec2f(0.0, 0.6), vec2f(-0.6, -0.6), vec2f(0.6, -0.6)
);
return vec4f(p[vi], 0.0, 1.0);
}
@fragment fn f_main() -> @location(0) vec4f {
return vec4f(0.9, 0.3, 0.2, 1.0);
}
`});
// 3) Pipeline
const pipeline = device.createRenderPipeline({
layout: 'auto',
vertex: { module: shader, entryPoint: 'v_main' },
fragment: { module: shader, entryPoint: 'f_main', targets: [{ format }] },
primitive: { topology: 'triangle-list' }
});
// 4) Draw
function frame(){
const encoder = device.createCommandEncoder();
const view = context.getCurrentTexture().createView();
const pass = encoder.beginRenderPass({
colorAttachments: [{ view, loadOp: 'clear', storeOp: 'store', clearValue: {r:0.08,g:0.09,b:0.1,a:1} }]
});
pass.setPipeline(pipeline);
pass.draw(3);
pass.end();
device.queue.submit([encoder.finish()]);
requestAnimationFrame(frame);
}
frame();
Even in this small example, the WebGPU style is visible: pipeline and shaders are created once, while each animation frame focuses on encoding commands and submitting work. That separation scales cleanly to scenes with multiple passes and frame‑graph‑style orchestration.
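One way that multi-pass shape can look, sketched as a generic per-frame encoder (the passes array, its descriptor/pipeline/draws fields, and encodeFrame itself are illustrative names, not part of the WebGPU API):

```javascript
// Sketch: one command encoder per frame, several passes, one submit.
// Each entry supplies a prebuilt pass descriptor, pipeline, and draw list.
function encodeFrame(device, passes) {
  const encoder = device.createCommandEncoder();
  for (const { descriptor, pipeline, draws } of passes) {
    const pass = encoder.beginRenderPass(descriptor);
    pass.setPipeline(pipeline);
    for (const d of draws) {
      pass.setBindGroup(0, d.bindGroup); // swap a whole resource set in one call
      pass.draw(d.vertexCount);
    }
    pass.end();
  }
  device.queue.submit([encoder.finish()]); // one submission for the whole frame
}
```

The pipelines and bind groups are built once at load time; only the cheap encoding loop runs per frame.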
WebGPU is not limited to rasterization. Compute pipelines share the same device, queue, and resource model, which makes it straightforward to interleave rendering passes with general‑purpose GPU work. The following snippet performs a prefix sum over a small buffer as a compact demonstration of dispatching compute work and reading the results back to the CPU:
const n = 256;
const input = new Uint32Array(n).map((_,i) => i);
const inBuf = device.createBuffer({ size: input.byteLength, usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST });
const outBuf = device.createBuffer({ size: input.byteLength, usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC });
const readBuf = device.createBuffer({ size: input.byteLength, usage: GPUBufferUsage.MAP_READ | GPUBufferUsage.COPY_DST });
device.queue.writeBuffer(inBuf, 0, input);
const cshader = device.createShaderModule({ code: `
@group(0) @binding(0) var<storage, read> In: array<u32>;
@group(0) @binding(1) var<storage, read_write> Out: array<u32>;
@compute @workgroup_size(64) fn main(@builtin(global_invocation_id) gid: vec3u) {
  let i = gid.x;
  if (i >= arrayLength(&In)) { return; }
  // Naive inclusive scan: each invocation independently sums In[0..i].
  // WGSL has no ternary operator, and reading Out[i-1] written by another
  // invocation would be a data race, so each lane computes its own total.
  var acc = 0u;
  for (var j = 0u; j <= i; j = j + 1u) { acc = acc + In[j]; }
  Out[i] = acc;
}
`});
const cpipe = device.createComputePipeline({ layout: 'auto', compute: { module: cshader, entryPoint:'main' } });
const group = device.createBindGroup({ layout: cpipe.getBindGroupLayout(0), entries:[
{ binding:0, resource:{ buffer: inBuf } },
{ binding:1, resource:{ buffer: outBuf } }
]});
const enc = device.createCommandEncoder();
const passC = enc.beginComputePass();
passC.setPipeline(cpipe); passC.setBindGroup(0, group); passC.dispatchWorkgroups(Math.ceil(n/64));
passC.end(); enc.copyBufferToBuffer(outBuf,0,readBuf,0,input.byteLength);
device.queue.submit([enc.finish()]);
await readBuf.mapAsync(GPUMapMode.READ);
const result = new Uint32Array(readBuf.getMappedRange().slice(0)); // copy before unmap detaches the range
readBuf.unmap();
console.log(result);
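A CPU reference scan is useful for validating the readback; a plain JavaScript sketch (prefixSumCPU is a helper for checking, not part of the GPU example above):

```javascript
// Inclusive prefix sum on the CPU, keeping u32 wraparound semantics so the
// result can be compared element-for-element against the GPU readback.
function prefixSumCPU(input) {
  const out = new Uint32Array(input.length);
  let acc = 0;
  for (let i = 0; i < input.length; i++) {
    acc = (acc + input[i]) >>> 0; // mirror 32-bit unsigned arithmetic
    out[i] = acc;
  }
  return out;
}

// For input 0..255, the last element is 255 * 256 / 2 = 32640.
const ref = prefixSumCPU(Uint32Array.from({ length: 256 }, (_, i) => i));
console.log(ref[0], ref[1], ref[255]); // 0 1 32640
```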
In production workloads, compute stages often handle tasks such as culling, clustering, particle updates, or preprocessing data for later rendering passes. Common patterns include designing pipeline layouts early, minimizing transient buffer allocations, and batching updates through queue.writeBuffer or staging buffers. Timings from GPUQuerySet help track regressions and keep each pass honest.
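One way the writeBuffer batching pattern can look, as a sketch (makeUniformBatcher, its capacity parameter, and the CPU-side staging array are illustrative; only device.createBuffer and queue.writeBuffer are standard API):

```javascript
// Sketch: coalesce many small per-frame uniform updates into a single
// writeBuffer call instead of issuing one tiny upload per object.
function makeUniformBatcher(device, capacityBytes) {
  const gpuBuf = device.createBuffer({
    size: capacityBytes,
    usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST,
  });
  const staging = new ArrayBuffer(capacityBytes); // CPU-side shadow copy
  const view = new Float32Array(staging);
  return {
    buffer: gpuBuf,
    // Accumulate updates into the staging array during the frame...
    set(offsetFloats, values) { view.set(values, offsetFloats); },
    // ...then upload everything in one call before submitting.
    flush() { device.queue.writeBuffer(gpuBuf, 0, staging); },
  };
}
```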