GTAO Pass — Horizon-Based Ambient Occlusion with Spatial and Temporal Denoising
The GTAO pipeline is a three-stage compute-post-processing system that produces high-quality screen-space ambient occlusion from a depth buffer and world-space normal buffer. Ground-Truth Ambient Occlusion (Jimenez et al. 2016) replaces classic ray-marched SSAO with an analytic horizon-search approach: for each pixel, the shader sweeps multiple directions across the depth buffer, finds the occlusion horizon on each side, and integrates the cosine-weighted visibility integral in closed form. The raw output — noisy due to limited sample counts — then passes through a 5×5 edge-preserving bilateral blur and a reprojection-based temporal accumulator with three-layer rejection, producing a temporally stable AO signal with a side product: a bent normal vector representing the mean unoccluded direction per pixel.
The pipeline's three passes are orchestrated by the renderer in strict sequential order — GTAO → Spatial → Temporal — within the render graph, which automatically inserts memory barriers between them. The final ao_filtered image is consumed by the Forward Pass for diffuse AO and specular occlusion (GTSO cone intersection), and also serves as temporal history for the next frame.
Sources: gtao_pass.cpp, ao_spatial_pass.cpp, ao_temporal_pass.cpp, renderer_rasterization.cpp
Pipeline Architecture and Data Flow
The AO pipeline operates entirely in compute shaders dispatched at full resolution (one 8×8 workgroup thread per pixel), reading from the resolved depth and normal buffers produced by the Depth Prepass. No shared memory is used — each thread independently processes its pixel.
D32Sfloat] N[Normal Buffer
R10G10B10A2] DP[Depth Prev
D32Sfloat] AH[AO History
RGBA8] end subgraph Pass 1 — GTAO G[GTAO
Compute 8×8
Horizon Search +
Analytic Integration] end subgraph Pass 2 — Spatial S[Bilateral Blur
Compute 8×8
5×5 Edge-Aware
Gaussian + Depth] end subgraph Pass 3 — Temporal T[Temporal Filter
Compute 8×8
Reprojection +
3-Layer Rejection] end subgraph Outputs AF[ao_filtered
RGBA8] FWD[Forward Pass
Diffuse AO + GTSO] end D --> G N --> G G -->|ao_noisy
RGBA8| S D --> S S -->|ao_blurred
RGBA8| T D --> T DP --> T AH --> T T -->|ao_filtered
RGBA8| AF AF -->|Next frame
history| AH AF --> FWD
The render graph manages four intermediate resources — ao_noisy, ao_blurred, ao_filtered, and ao_history — all in R8G8B8A8Unorm format. The ao_filtered image is created with history support (true parameter to create_managed_image), enabling the render graph to retain the previous frame's result for temporal reprojection. Each pass declares its resource accesses (read/write) to the render graph, which automatically inserts layout transitions and memory barriers between passes.
Sources: renderer_init.cpp, renderer_rasterization.cpp, frame_context.h
GTAO Compute Shader — Horizon Search and Analytic Integration
The gtao.comp shader implements the core GTAO algorithm. For each pixel, it reconstructs the view-space position from depth, rotates the world-space normal into view-space, then performs a structured horizon search across multiple directions in screen-space.
View-Space Reconstruction and Noise
The shader reconstructs view-space position from the depth buffer using inv_projection, accounting for the Y-flipped viewport used by scene passes (negative viewport height convention). The world-space normal stored in R10G10B10A2_UNORM (encoded as n * 0.5 + 0.5) is decoded and rotated to view-space via the upper-left 3×3 of the view matrix.
Temporal noise decorrelation uses interleaved gradient noise (Jorge Jimenez) modulated by the golden ratio fractional part, producing quasi-random per-pixel offsets that vary smoothly across frames:
base_noise = fract(IGN(pixel) + frame_index % 64 * φ)
dir_noise = base_noise // direction jitter
step_noise = fract(base_noise * 0.7 + 0.15) // step jitter within directionThis decorrelation is critical for temporal accumulation — diverse noise patterns across frames allow the temporal filter to effectively multiply the sample count.
Sources: gtao.comp, noise.glsl, normal.glsl
Horizon Search with Distance Falloff
For each of directions search directions, the shader projects a tangent vector in screen-space and marches steps_per_dir samples along both positive and negative directions. Sample positions follow a quadratic power curve (t = linear_t²) that clusters samples near the pixel center where crevice and corner detail matters most, with sparse samples at the periphery for large-scale occlusion.
The distance falloff uses a flat-inner/steep-outer profile with kFalloffRange = 0.615 (matching XeGTAO's auto-tuned optimal): samples within radius × (1 − 0.615) receive full weight, beyond which weight linearly decays to zero at the radius boundary. This preserves near-surface detail while aggressively cutting distant contributions.
| Parameter | Description | Typical Range |
|---|---|---|
radius |
World-space sampling radius (meters) | 0.1 – 5.0 |
directions |
Number of search directions | 2 / 4 / 8 |
steps_per_dir |
Ray march steps per direction | 2 / 4 / 8 |
thin_compensation |
Blend toward tangent plane (0 = off) | 0.0 – 0.7 |
intensity |
AO power curve exponent | 0.5 – 3.0 |
frame_index |
Frame counter for temporal noise | monotonically increasing |
Sources: gtao.comp, GTAOPushConstants, AOConfig
Analytic Integration and Bent Normal
The key insight of GTAO is that once the horizon angles h1 and h2 (positive and negative sides) are found for a given direction, the cosine-weighted visibility integral can be evaluated analytically rather than stochastically:
inner_integral(h, n) = 0.25 × (−cos(2h − n) + cos(n) + 2h × sin(n))
slice_visibility(h1, h2, n) = inner_integral(h1, n) + inner_integral(h2, −n)The bent normal computation follows Algorithm 2 from the GTAO paper — the per-slice cosine-weighted mean unoccluded direction. All compound-angle trigonometric expressions are expanded using identities (triple-angle: sin(3x) = 3s − 4s³, sum/difference products) so the entire bent normal integral requires zero additional sin(), cos(), or acos() calls beyond what the AO integral already computed. This is a significant optimization on GPU architectures where transcendental functions are expensive.
The final output is packed as RGBA8 where RGB = bent normal (encoded as n × 0.5 + 0.5) and A = diffuse AO value. If the accumulated bent normal is degenerate (near-zero length), it falls back to the surface normal.
Sources: gtao.comp, gtao.comp bent normal, gtao.comp output
Spatial Denoising — 5×5 Edge-Aware Bilateral Blur
The raw GTAO output is noisy at low sample counts (e.g., 4 directions × 4 steps = 16 samples per pixel). The ao_spatial.comp shader applies a 5×5 bilateral blur that combines Gaussian spatial weights (σ = 1.5) with depth-based edge weights to prevent AO bleeding across geometric boundaries.
Path-Accumulated Edge Weights
The critical design decision is the edge weight computation. Rather than computing a simple pairwise weight between the center pixel and each tap, the shader accumulates edge weights along a grid path from center to target — horizontal leg first, then vertical leg at the target column. Each leg multiplies the adjacent-pixel depth edge weights encountered along the way.
This path-accumulation approach prevents "tunneling": even if two pixels share identical depths, an intervening depth discontinuity (a thin wall, a one-pixel edge between two surfaces at the same distance) drives the accumulated path weight to zero. A naïve bilateral filter would blur across such boundaries; the path approach correctly blocks them.
The edge weight function uses relative depth comparison for scene-scale independence:
depth_edge_weight(z1, z2) = clamp(1 − |z1 − z2| / (max(z1, z2) × 0.05), 0, 1)Linearized depth values (via projection[3][2] / (d + projection[2][2])) ensure uniform rejection sensitivity at all camera distances, unlike raw Reverse-Z values which compress non-linearly.
Sources: ao_spatial.comp, depth.glsl
Temporal Denoising — Reprojection with Three-Layer Rejection
The ao_temporal.comp shader performs exponential moving average temporal accumulation by reprojecting the current pixel back to the previous frame's screen-space, sampling the history buffer, and blending with the spatially denoised current-frame result. The blend factor (temporal_blend) is a runtime parameter (0.0–0.98), set to 0.0 on the first frame or when history is invalidated (resize, first frame).
Reprojection Pipeline
The reprojection follows: screen UV + depth → NDC → clip → world space (via inv_view_projection) → previous frame clip (via prev_view_projection) → previous NDC → previous UV. The Y-flip conventions are handled symmetrically in both directions.
Three-Layer Rejection
History rejection prevents ghosting artifacts from temporal accumulation of stale data. The shader applies three progressively stricter tests:
| Layer | Test | Rejection Condition |
|---|---|---|
| 1 — UV Validity | Is reprojected UV within [0,1]? | Off-screen pixels reject entirely |
| 2 — Depth Consistency | Linearized depth comparison | |expected − stored| / expected > 5% |
| 3 — Neighborhood Clamp | 3×3 min/max from current frame | History AO clamped to [min, max] range |
Layers 1 and 2 produce hard rejection (history discarded entirely, output = current frame). Layer 3 produces soft clamping — the history AO value is retained but clamped into the 3×3 neighborhood min/max range, which handles slow-moving disocclusions gracefully.
Bent normal (RGB) is NOT neighborhood-clamped — min/max of encoded direction vectors has no physical meaning. The bent normal shares the rejection flag (zero-blended on hard rejection) but otherwise blends directly.
After blending, the output bent normal is decoded, re-normalized, and re-encoded to prevent drift from unit length caused by averaging encoded values.
Sources: ao_temporal.comp, AOTemporalPass::record, frame_context.h
Pass Infrastructure — Descriptor Layout and Dispatch Pattern
All three passes share an identical architectural pattern: each owns a Vulkan compute pipeline, a custom Set 3 push descriptor layout for per-dispatch I/O, and binds the global Sets 0–2 (per-frame uniforms, bindless textures, render targets). Passes are registered with the render graph via record(), which declares resource usage (read/write/compute stage) and provides a lambda callback for command buffer recording.
| Pass | Set 3 Bindings | Push Constants | Workgroup |
|---|---|---|---|
| GTAO | binding 0: storage_image (ao_noisy) |
GTAOPushConstants (24 bytes) |
8×8 |
| AO Spatial | binding 0: storage_image (ao_blurred), binding 1: combined_image_sampler (ao_noisy) |
None | 8×8 |
| AO Temporal | binding 0: storage_image (ao_filtered), binding 1: sampler (ao_blurred), binding 2: sampler (ao_history), binding 3: sampler (depth_prev) |
AOTemporalPushConstants (4 bytes) |
8×8 |
The dispatch dimensions are computed as ceil(width / 8) × ceil(height / 8), with the shader's bounds check (pixel >= size → return) handling the fringe case where the resolution isn't a multiple of 8. The temporal pass also reads depth from Set 2 (binding 1) for reprojection, and the render graph updates the Set 2 binding 3 (rt_ao_texture) each frame with the ao_filtered backing image so the Forward Pass can sample it.
Sources: gtao_pass.h, ao_spatial_pass.h, ao_temporal_pass.h, renderer_rasterization.cpp
Consumption in the Forward Pass
The Forward Pass samples rt_ao_texture (Set 2, binding 3 — the temporally filtered ao_filtered image) to extract both diffuse AO (A channel) and the bent normal (RGB, decoded and transformed back to world-space). Two specular occlusion modes are available at runtime via ao_so_mode:
- Lagarde approximation (
AO_SO_LAGARDE): Estimates specular occlusion from diffuse AO, view angle, and roughness — no bent normal required. A useful baseline comparison. - GTSO cone intersection (
AO_SO_GTSO): Computes the overlap between a visibility cone (derived from bent normal + AO) and a specular lobe cone (reflection direction + roughness), withao²grazing-angle compensation to eliminate false darkening.
Diffuse AO is combined with material-baked occlusion (occlusion_tex × occlusion_strength) and then passed through multi-bounce AO color compensation (Jimenez 2016), which prevents over-darkening on light-colored surfaces by modeling inter-bounce light scattering as a per-channel polynomial correction.
Sources: forward.frag, bindings.glsl, scene_data.h
Runtime Configuration and Debug Visualization
All AO parameters are exposed through the Debug UI for real-time tuning. The AOConfig struct lives in application space and is passed to passes via FrameContext::ao_config, which is dereferenced into push constants each frame. The render graph's history validity tracking (is_history_valid()) automatically resets temporal accumulation on first frame and resize events.
| Control | Field | Range | Notes |
|---|---|---|---|
| Radius | radius |
0.1 – 5.0 m | World-space sampling extent |
| Directions | directions |
2 / 4 / 8 | Quality/performance tradeoff |
| Steps/Dir | steps_per_dir |
2 / 4 / 8 | Per-direction sample density |
| Thin Compensation | thin_compensation |
0.0 – 0.7 | XeGTAO quality at 0.7 |
| Intensity | intensity |
0.5 – 3.0 | Power curve exponent |
| Temporal Blend | temporal_blend |
0.0 – 0.98 | History weight (higher = smoother) |
| GTSO | use_gtso |
on/off | Bent normal specular occlusion |
The debug render mode DEBUG_MODE_AO (mode 7) visualizes the raw AO value as a grayscale passthrough, and DEBUG_MODE_AO_SSAO (mode 9) provides a comparison view. Both are selected via debug_render_mode in the GlobalUBO.
Sources: debug_ui.cpp, forward.frag debug modes, bindings.glsl
See Also
- Forward Pass — Cook-Torrance PBR, IBL Split-Sum, and Multi-Bounce AO — consumption of
ao_filteredfor diffuse/specular occlusion - Depth Prepass — Z-Fill for Zero-Overdraw Forward Rendering — produces the depth and normal buffers that GTAO reads
- Render Graph — Automatic Barrier Insertion and Pass Orchestration — managed image lifecycle and history tracking
- GLSL Shader Architecture — Shared Bindings, BRDF Library, and Feature Flags — Set 0–2 layout and
FEATURE_AOflag