**Ray Tracing: GPU Edition**
[Arman Uguray][]
Draft
!!! WARNING
This is a living document for a work in progress. Please bear in mind that the contents will
change frequently and go through many edits before the final version.
Introduction
====================================================================================================
_Ray Tracing_ is a rendering method in Computer Graphics that simulates the flow of light. It can
faithfully recreate a variety of optical phenomena and can be used to render photorealistic images.
_Path tracing_ is an application of this approach used to compute _Global Illumination_. Its
core idea is to repeatedly trace millions of random rays through the scene and bounce them off
objects based on surface properties. The algorithm is remarkably simple and relatively easy
to implement when applied to a small number of material and geometry types. Peter
Shirley's [_Ray Tracing In One Weekend_][RTIOW] (RTIOW) is a great introduction to building the
foundation for a hobby renderer.
A challenge with path tracing is its high computational cost. Rendering takes a long time, and
this gets worse as scenes grow more complex. This has historically made path
tracing unsuitable for real-time applications. Fortunately -- like many problems in Computer
Graphics -- the algorithm lends itself very well to parallelism. It is possible to achieve a
significant speedup by distributing the work across many processor cores.
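To make the idea concrete, here is a minimal CPU-only sketch of that kind of work distribution. It assumes a made-up `shade` function standing in for the per-pixel ray tracing work, and it splits the image into bands of rows that are handed to separate threads. The names `shade`, `render_parallel`, and `render_serial` are hypothetical and are not part of this book's renderer.

```rust
use std::thread;

const WIDTH: usize = 64;
const HEIGHT: usize = 48;

/// Hypothetical stand-in for tracing a ray through pixel (x, y): a simple gradient.
fn shade(x: usize, y: usize) -> f32 {
    (x as f32 / WIDTH as f32 + y as f32 / HEIGHT as f32) * 0.5
}

/// Computes every pixel on the calling thread.
fn render_serial() -> Vec<f32> {
    (0..WIDTH * HEIGHT).map(|i| shade(i % WIDTH, i / WIDTH)).collect()
}

/// Computes the same image by giving each worker thread a contiguous band of rows.
/// Assumes `num_threads` is at least 1.
fn render_parallel(num_threads: usize) -> Vec<f32> {
    let mut pixels = vec![0.0f32; WIDTH * HEIGHT];
    let rows_per_band = (HEIGHT + num_threads - 1) / num_threads;
    thread::scope(|scope| {
        for (band, chunk) in pixels.chunks_mut(rows_per_band * WIDTH).enumerate() {
            scope.spawn(move || {
                for (i, pixel) in chunk.iter_mut().enumerate() {
                    let x = i % WIDTH;
                    let y = band * rows_per_band + i / WIDTH;
                    *pixel = shade(x, y);
                }
            });
        }
    });
    pixels
}

fn main() {
    let image = render_parallel(4);
    // Every pixel is independent, so the parallel result matches the serial one.
    assert_eq!(image, render_serial());
    println!("rendered {} pixels", image.len());
}
```

No pixel depends on any other pixel, which is why the bands can be processed in any order and on any number of threads without changing the result.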
The GPU (Graphics Processing Unit) is a type of processor designed to run the same set of operations
over large amounts of data in parallel. This parallelism has been instrumental to achieving
realistic visuals in real-time applications like video games. GPUs have been traditionally used to
accelerate scanline rasterization but have since become programmable and capable of running
a variety of parallel workloads. Notably, modern GPUs are now equipped with hardware cores dedicated
to ray tracing.
GPUs aren't without limitations. Programming a GPU requires a different approach than a typical CPU
program. Taking full advantage of a GPU often involves careful tuning based on its architecture and
capabilities which can vary widely across vendors and models. Rendering fully path-traced scenes
at real-time rates remains elusive even on the most high-end GPUs. This is an active and vibrant
area of Computer Graphics research.
This book is an introduction to GPU programming by building a simple GPU accelerated path tracer.
We'll focus on building a renderer that can produce high quality and correct images using a fairly
simple design. It won't be full-featured and its performance will be limited, but it will expose
you to several fundamental GPU programming concepts. By the end, the renderer you'll have built can
serve as a great starting point for extensions and experiments with more advanced GPU techniques. We will
avoid most optimizations in favor of simplicity but the renderer will be able to achieve interactive
frame rates on a decent GPU when targeting simple scenes.[^ch1] The accompanying code intentionally
avoids hardware ray tracing APIs that are present on newer GPU models, instead focusing on
implementing the same functionality on a programmable GPU unit using a shading language.
This book follows a similar progression to [_Ray Tracing In One Weekend_][RTIOW]. It covers some of
the same material but I highly recommend completing _RTIOW_ before embarking on building
the GPU version. Doing so will teach you the path tracing algorithm in a much more approachable
way and it will make you appreciate both the advantages and challenges of moving to a GPU-based
architecture.
If you run into any problems with your implementation, have general questions or corrections, or
would like to share your own ideas or work, check out [the GitHub Discussions forum][discussions].
[^ch1]: A BVH-accelerated implementation can render a version of the RTIOW cover scene with ~32,000
spheres, 16 ray bounces per pixel, and a resolution of 2048x1536 on a 2022 _Apple M1 Max_ in 15
milliseconds. The same renderer performs very poorly on a 2019 _Intel UHD Graphics 630_ which takes
more than 200ms to render a single sample.
GPU APIs
--------
Interfacing with a GPU and writing programs for it typically requires the use of a special API. This
interface depends on your operating system and GPU vendor. You often have various options depending
on the capabilities you want. For example, an application that wants to get the most juice out of an
NVIDIA GPU for general-purpose computations may choose to target CUDA. A developer who prefers
broad hardware compatibility for a graphical mobile game may choose OpenGL ES or Vulkan. Direct3D
(D3D) is the main graphics API on Microsoft platforms while Metal is the preferred framework on
Apple systems. Vulkan, D3D12, and Metal all support an API specifically to accelerate ray
tracing.
You can implement this book using any API or framework that you prefer, though I generally assume
you are working with a graphics API. In my examples I use an API based on [WebGPU][webgpu],
which I think maps well to all modern graphics APIs. The code
examples should be easy to adapt to those libraries. I avoid using ray tracing APIs (such as
[DXR][dxr] or [Vulkan Ray Tracing][vkrt]) to show you how to implement similar functionality on
your own.
If you're looking to implement this in CUDA, you may also be interested in Roger Allen's
[blog post][rtiow-cuda] titled _Accelerated Ray Tracing in One Weekend in CUDA_.
Example Code
------------
As in _RTIOW_, you'll find code examples throughout the book. I use [Rust][] as
the implementation language but you can choose any language that supports your GPU API of choice. I avoid
most esoteric aspects of Rust to keep the code easily understandable to a large audience. On the few
occasions where I had to resort to a potentially unfamiliar Rust-ism, I provide a C example to add
clarity.
I provide the finished source code for this book on [GitHub][gt-project] as a reference but I
encourage you to type in your own code. I decided to also provide a minimal source template that you
can use as a starting point if you want to follow along in Rust. The template provides a small
amount of setup code for the windowing logic to help get you started.
### A note on Rust, Libraries, and APIs
I chose Rust for this project because of its ease of use and portability. It is also the language
that I tend to be most productive in.
An important aspect of Rust is that a lot of common functionality is provided by libraries outside
its standard library. I tried to avoid external dependencies as much as possible except for the
following:
* I use *[wgpu][]* to interact with the GPU. This is a native graphics API based on
WebGPU. It's portable and allows the example code to run on Vulkan, Metal, Direct3D 11/12, OpenGL
ES 3.1, as well as WebGPU and WebGL via WebAssembly.
wgpu also has [native bindings in other languages](https://github.com/gfx-rs/wgpu-native).
* I use [*winit*](https://docs.rs/winit/latest/winit/) which is a portable windowing library. It's
used to display the rendered image in real-time and to make the example code interactive.
* For ease of Rust development I use [*anyhow*](https://docs.rs/anyhow/latest/anyhow/) and
[*bytemuck*](https://docs.rs/bytemuck/latest/bytemuck/). *anyhow* is a popular error handling
utility and integrates seamlessly. *bytemuck* provides a safe abstraction for the equivalent of
`reinterpret_cast` in C++, which normally requires [`unsafe`][rust-unsafe] Rust. It's used to
bridge CPU data types with their GPU equivalents.
* Lastly, I use [*pollster*](https://docs.rs/pollster/latest/pollster/) to execute asynchronous
wgpu API functions (which are only called from a single line).
[wgpu][] is the most important dependency as it defines how the example code interacts with the
GPU. Every GPU API is different but their abstractions for the general concepts used in this book
are fairly similar. I will highlight these differences occasionally where they matter.
A large portion of the example code runs on the GPU. Every graphics API defines a programming
language -- a so-called **shading language** -- for authoring GPU programs. wgpu is based on WebGPU,
so my GPU code examples are written in *WebGPU Shading Language* (WGSL)[^ch1.2.1].
I also recommend keeping the following references handy while you're developing:
* wgpu API documentation (version 0.19.1): https://docs.rs/wgpu/0.19.1/wgpu
* WebGPU specification: https://www.w3.org/TR/webgpu
* WGSL specification: https://www.w3.org/TR/WGSL
With all of that out of the way, let's get started!
[^ch1.2.1]: wgpu also supports shaders in the
[SPIR-V](https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html) binary format. You could
in theory write your shaders in a shading language that can compile to SPIR-V (such as OpenGL's GLSL
and Direct3D's HLSL) as long as you avoid any language features that can't be expressed in WGSL.
Windowing and GPU Setup
====================================================================================================
The first thing to decide is how you want to view your image. One option is to write the output from
the GPU to a file. I think a more fun option is to display the image inside an application window.
I prefer this approach because it allows you to see your rendering as it resolves over time and it
will allow you to make your application interactive later on. The downside is that it requires a
little bit of wiring.
First, your program needs a way to interact with your operating system to create and manage a
window. Next, you need a way to coordinate your GPU workloads to output a sequence of images at the
right time for your OS to be able to composite it inside the window and send it to your display.
Every operating system with a graphical UI provides a native *windowing API* for this purpose.
Graphics APIs typically define some way to integrate with a windowing system. You'll have various
libraries to choose from depending on your OS and programming language. You mainly need to make sure
that the windowing API or UI toolkit you choose can integrate with your graphics API.
In my examples I use *winit* which is a Rust framework that integrates smoothly with wgpu. I put
together a [project template][gt-template] that sets up the library boilerplate for the window
handling. You're welcome to use it as a starting point.
The setup code isn't a lot, so I'll briefly go over the important pieces in this chapter.
The Event Loop
--------------
The first thing the template does is create a window and associate it with an *event loop*. The OS
sends a message to the application during important "events" that the application should act on,
such as a mouse click or when the window gets resized. Your application can wait for these events
and handle them as they arrive by looping indefinitely:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
use {
anyhow::{Context, Result},
winit::{
event::{Event, WindowEvent},
event_loop::{ControlFlow, EventLoop},
window::{Window, WindowBuilder},
},
};
const WIDTH: u32 = 800;
const HEIGHT: u32 = 600;
fn main() -> Result<()> {
let event_loop = EventLoop::new()?;
let window_size = winit::dpi::PhysicalSize::new(WIDTH, HEIGHT);
let window = WindowBuilder::new()
.with_inner_size(window_size)
.with_resizable(false)
.with_title("GPU Path Tracer".to_string())
.build(&event_loop)?;
// TODO: initialize renderer
event_loop.run(|event, control_handle| {
control_handle.set_control_flow(ControlFlow::Poll);
match event {
Event::WindowEvent { event, .. } => match event {
WindowEvent::CloseRequested => control_handle.exit(),
WindowEvent::RedrawRequested => {
// TODO: draw frame
window.request_redraw();
}
_ => (),
},
_ => (),
}
})?;
Ok(())
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [main-initial]: [main.rs] Creating a window and handling window events]
This code creates a window titled "GPU Path Tracer" and kicks off an event loop.
`event_loop.run()` internally waits for window events and notifies your application by calling the
lambda function that it gets passed as an argument.
The lambda function only handles a few events for now. The most important one is `RedrawRequested`
which is the signal to render and present a new frame. We call `window.request_redraw()` at the end
of that handler to draw repeatedly -- this triggers a new `RedrawRequested` event, which requests
another redraw, and so on until someone closes the window. Closing the window sends
`CloseRequested`, which we handle by exiting the event loop.
Running this code should bring up an empty window like this:
![Figure [empty-window]: Empty Window](../images/img-01-empty-window.png)
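If the redraw cycle feels a bit abstract, the following sketch models it without any windowing library. It is a toy under stated assumptions: the `Event` enum, the `run_event_loop` function, and the `close_after` parameter are all hypothetical, and a plain queue stands in for the messages that the OS would deliver through winit.

```rust
use std::collections::VecDeque;

/// Hypothetical events, loosely named after the winit events handled above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Event {
    RedrawRequested,
    CloseRequested,
}

/// Runs a toy event loop and returns the number of frames that were "drawn".
/// `close_after` simulates the user closing the window after that many frames.
fn run_event_loop(close_after: u32) -> u32 {
    // The first redraw request gets the cycle started.
    let mut pending = VecDeque::from([Event::RedrawRequested]);
    let mut frames = 0;
    while let Some(event) = pending.pop_front() {
        match event {
            Event::CloseRequested => break,
            Event::RedrawRequested => {
                frames += 1; // This is where a frame would be drawn.
                if frames >= close_after {
                    pending.push_back(Event::CloseRequested);
                } else {
                    // Toy equivalent of `window.request_redraw()`.
                    pending.push_back(Event::RedrawRequested);
                }
            }
        }
    }
    frames
}

fn main() {
    println!("drew {} frames before closing", run_event_loop(3));
}
```

Each handled redraw queues the next one, so the loop keeps producing frames until a close event arrives.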
GPU and Surface Initialization
------------------------------
The next thing the template does is establish a connection to the GPU and configure a surface. The
surface manages a set of *textures* that allow the GPU to render inside the window.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
async fn connect_to_gpu(window: &Window) -> Result<(wgpu::Device, wgpu::Queue, wgpu::Surface)> {
use wgpu::TextureFormat::{Bgra8Unorm, Rgba8Unorm};
// Create an "instance" of wgpu. This is the entry-point to the API.
let instance = wgpu::Instance::default();
// Create a drawable "surface" that is associated with the window.
let surface = instance.create_surface(window)?;
// Request a GPU that is compatible with the surface. If the system has multiple GPUs then
// pick the high performance one.
let adapter = instance
.request_adapter(&wgpu::RequestAdapterOptions {
power_preference: wgpu::PowerPreference::HighPerformance,
force_fallback_adapter: false,
compatible_surface: Some(&surface),
})
.await
.context("failed to find a compatible adapter")?;
// Connect to the GPU. "device" represents the connection to the GPU and allows us to create
// resources like buffers, textures, and pipelines. "queue" represents the command queue that
// we use to submit commands to the GPU.
let (device, queue) = adapter
.request_device(&wgpu::DeviceDescriptor::default(), None)
.await
.context("failed to connect to the GPU")?;
// Configure the texture memory that backs the surface. Our renderer will draw to a surface texture
// every frame.
let caps = surface.get_capabilities(&adapter);
let format = caps
.formats
.into_iter()
.find(|it| matches!(it, Rgba8Unorm | Bgra8Unorm))
.context("could not find preferred texture format (Rgba8Unorm or Bgra8Unorm)")?;
let size = window.inner_size();
let config = wgpu::SurfaceConfiguration {
usage: wgpu::TextureUsages::RENDER_ATTACHMENT,
format,
width: size.width,
height: size.height,
present_mode: wgpu::PresentMode::AutoVsync,
alpha_mode: caps.alpha_modes[0],
view_formats: vec![],
desired_maximum_frame_latency: 3,
};
surface.configure(&device, &config);
Ok((device, queue, surface))
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [connect-to-gpu]: [main.rs] The connect_to_gpu function]
The code that sets this all up is a bit wordy. I'll quickly go over the important bits:
1. The first ~20 lines request a connection to a GPU that is compatible with the
window. The bit about `wgpu::PowerPreference::HighPerformance` is a hint to the API that we want
the higher-powered GPU if the current system has more than one available.
2. The rest of the function configures the dimensions, pixel format, and presentation mode of the
surface. `Rgba8Unorm` and `Bgra8Unorm` are common pixel formats that store each color component
(red, green, blue, and alpha) as an 8-bit unsigned integer. The "unorm" part stands for "unsigned
normalized", which means that our rendering code can represent the component values as a real
number in the range `[0.0, 1.0]`. We set the size to simply span the entire window.
The bit about `wgpu::PresentMode::AutoVsync` tells the surface to synchronize the presentation of
each frame with the display's refresh rate. The surface will manage an internal queue of textures
for us and we will render to them as they become available. This prevents a visual artifact known
as "tearing" (which can happen when frames get presented faster than the display refresh rate) by
setting up the renderer to be *v-sync locked*. We will discuss some of the implications of this
later on.
The last bit that I'll highlight here is `wgpu::TextureUsages::RENDER_ATTACHMENT`. This just
indicates that we are going to use the GPU's rendering function to draw directly into the surface
textures.
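To illustrate what "unsigned normalized" means in practice, here is a small CPU-side sketch of the conversion between a real number in `[0.0, 1.0]` and an 8-bit component, along with the component order of the two formats. The helper names are hypothetical, and the round-to-nearest behavior is an assumption made for this example rather than a statement about what any particular GPU does.

```rust
/// Converts a value in [0.0, 1.0] to an 8-bit unsigned normalized integer.
/// Out-of-range inputs are clamped. Rounding to nearest is an assumption of this sketch.
fn to_unorm8(value: f32) -> u8 {
    (value.clamp(0.0, 1.0) * 255.0).round() as u8
}

/// Converts an 8-bit unsigned normalized integer back to a value in [0.0, 1.0].
fn from_unorm8(value: u8) -> f32 {
    value as f32 / 255.0
}

/// Packs an (r, g, b, a) color in the byte order used by `Rgba8Unorm`.
fn pack_rgba8(color: [f32; 4]) -> [u8; 4] {
    color.map(to_unorm8)
}

/// Packs the same color in the byte order used by `Bgra8Unorm`: red and blue swap places.
fn pack_bgra8(color: [f32; 4]) -> [u8; 4] {
    let [r, g, b, a] = color.map(to_unorm8);
    [b, g, r, a]
}

fn main() {
    let opaque_red = [1.0, 0.0, 0.0, 1.0];
    println!("Rgba8Unorm bytes: {:?}", pack_rgba8(opaque_red));
    println!("Bgra8Unorm bytes: {:?}", pack_bgra8(opaque_red));
    println!("255 as a real number: {}", from_unorm8(255));
}
```

Our rendering code only ever deals with the real-valued colors, which is why it can stay the same regardless of which of the two formats the surface ends up with.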
After setting all this up the function returns 3 objects: A `wgpu::Device` that represents the
connection to the GPU, a `wgpu::Queue` which we'll use to issue commands to the GPU, and a
`wgpu::Surface` that we'll use to present frames to the window. We will talk a lot about the first
two when we start putting together our renderer in the next chapter.
You may have noticed that the function declaration begins with `async`. This marks the function as
*asynchronous* which means that it doesn't return its result immediately. This is only necessary
because the API functions that we invoke (`wgpu::Instance::request_adapter` and
`wgpu::Adapter::request_device`) are asynchronous functions. The `.await` keyword is syntactic sugar
that makes the asynchronous calls appear like regular (synchronous) function calls. What happens
under the hood is somewhat complex but I wouldn't worry about this too much since this is the one
and only bit of asynchronous code that we will encounter. If you want to learn more about it, I
recommend checking out the [Rust Async Book](https://rust-lang.github.io/async-book/).
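For the curious, running an asynchronous function from regular code boils down to polling it until it reports that its result is ready. The sketch below shows the idea using only the standard library. The names `block_on`, `ThreadWaker`, and `connect_to_fake_gpu` are hypothetical, and this is not meant to describe how any particular library implements it.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Wakes up the thread that is blocked on the future.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Polls `future` on the current thread until it completes and returns its output.
fn block_on<F: Future>(future: F) -> F::Output {
    let mut future = pin!(future);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut context = Context::from_waker(&waker);
    loop {
        match future.as_mut().poll(&mut context) {
            Poll::Ready(output) => return output,
            // Not ready yet: sleep until the waker signals that progress can be made.
            Poll::Pending => thread::park(),
        }
    }
}

/// Hypothetical async function standing in for something like `connect_to_gpu`.
async fn connect_to_fake_gpu() -> Result<&'static str, String> {
    Ok("fake device")
}

fn main() {
    let device = block_on(connect_to_fake_gpu()).expect("failed to connect");
    println!("connected to: {device}");
}
```

The function being awaited decides when it is ready. The caller's only job is to keep polling and to go to sleep in between.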
### Completing Setup
Finally, the `main` function needs a couple updates: first we make it `async` so that we can
"await" on `connect_to_gpu`. Technically the `main` function of a program cannot be async and
running an async function requires some additional utilities. There are various alternatives but I
chose to use a library called `pollster`. The library provides a special macro (called `main`) that
takes care of everything. Again, this is the only asynchronous code that we'll encounter so don't
worry about what it does.
The second change to the main function is where it handles the `RedrawRequested` event. For every
new frame, we first request the next available texture from the surface that we just created. The
queue has a limited number of textures available. If the CPU outpaces the GPU (i.e. the GPU takes
longer than a display refresh cycle to finish its tasks), then calling
`surface.get_current_texture()` can block until a texture becomes available.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
#[pollster::main]
async fn main() -> Result<()> {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
let event_loop = EventLoop::new()?;
let window_size = winit::dpi::PhysicalSize::new(WIDTH, HEIGHT);
let window = WindowBuilder::new()
.with_inner_size(window_size)
.with_resizable(false)
.with_title("GPU Path Tracer".to_string())
.build(&event_loop)?;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
let (device, queue, surface) = connect_to_gpu(&window).await?;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
// TODO: initialize renderer
event_loop.run(|event, control_handle| {
control_handle.set_control_flow(ControlFlow::Poll);
match event {
Event::WindowEvent { event, .. } => match event {
WindowEvent::CloseRequested => control_handle.exit(),
WindowEvent::RedrawRequested => {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
// Wait for the next available frame buffer.
let frame: wgpu::SurfaceTexture = surface
.get_current_texture()
.expect("failed to get current texture");
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
// TODO: draw frame
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
frame.present();
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
window.request_redraw();
}
_ => (),
},
_ => (),
}
})?;
Ok(())
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [main-setup-complete]: [main.rs] Putting together the initial main function]
Once a frame texture becomes available, the example issues a request to display it as soon as
possible by calling `frame.present()`. All of our rendering work will be scheduled before this call.
That was a lot of boilerplate -- this is sometimes necessary to interact with OS resources. With all
of this in place, we can start building a real-time renderer.
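The blocking behavior of `surface.get_current_texture()` is easier to picture with a toy model. The sketch below is a heavy simplification under my own assumptions: a fixed pool of texture indices circulates between a render loop and a "display" thread over two channels. The function `run_frames` and its parameters are hypothetical and do not reflect how wgpu or any OS compositor manages surface textures internally.

```rust
use std::sync::mpsc;
use std::thread;

/// Pushes `frame_count` frames through a pool of `pool_size` textures and returns the
/// texture index that was displayed for each frame, in order.
/// Assumes `pool_size` is at least 1.
fn run_frames(pool_size: usize, frame_count: usize) -> Vec<usize> {
    // Textures that are free to be rendered into.
    let (free_tx, free_rx) = mpsc::channel::<usize>();
    // Rendered textures that are waiting to be displayed.
    let (present_tx, present_rx) = mpsc::channel::<usize>();
    for texture in 0..pool_size {
        free_tx.send(texture).unwrap();
    }

    // The "display" shows each presented texture and then returns it to the pool.
    let display = thread::spawn(move || {
        let mut shown = Vec::new();
        for texture in present_rx {
            shown.push(texture);
            // The render loop may have finished already, so a failed send is fine.
            let _ = free_tx.send(texture);
        }
        shown
    });

    for _ in 0..frame_count {
        // Stand-in for `get_current_texture()`: blocks while every texture is in flight.
        let texture = free_rx.recv().unwrap();
        // ...rendering into `texture` would happen here...
        // Stand-in for `frame.present()`.
        present_tx.send(texture).unwrap();
    }
    drop(present_tx); // No more frames: lets the display thread finish.
    display.join().unwrap()
}

fn main() {
    println!("displayed textures: {:?}", run_frames(3, 7));
}
```

With a pool of three textures, the render loop can get at most three frames ahead of the display before it has to wait for one to come back.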
### A note on error handling in Rust
If you're new to Rust, some of the patterns above may look unfamiliar. One of these is error
handling using the `Result` type. I use this pattern frequently enough that it's worth a quick
explainer.
A `Result` is a variant type that can hold either a success (`Ok`) value or an error (`Err`) value.
The types of the `Ok` and `Err` variants are generic: in `Result<T, E>`, both
`T` and `E` can be any type. It's common for a library to define its own error types to represent
various error conditions.
The idea is that a function returns a `Result` if it has a failure mode. A caller must check the
status of the `Result` to unpack the return value or recover from an error.
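As a small illustration, here is a function with a failure mode and a caller that checks the returned `Result` with a `match` expression. The `DimensionError` type and the `parse_dimension` function are hypothetical and exist only for this example.

```rust
/// A custom error type describing the ways in which parsing can fail.
#[derive(Debug, PartialEq)]
enum DimensionError {
    Empty,
    NotANumber,
}

/// Parses a window dimension from text. `T` is `u32` and `E` is `DimensionError`.
fn parse_dimension(text: &str) -> Result<u32, DimensionError> {
    if text.is_empty() {
        return Err(DimensionError::Empty);
    }
    text.parse::<u32>().map_err(|_| DimensionError::NotANumber)
}

fn main() {
    for input in ["800", "", "wide"] {
        // The caller has to look inside the `Result` before it can use the value.
        match parse_dimension(input) {
            Ok(value) => println!("parsed {value}"),
            Err(error) => println!("could not parse {input:?}: {error:?}"),
        }
    }
}
```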
In a C program, a common way to handle an error is to return early from the calling function
and perhaps return an entirely new error. For example:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ C
bool function_with_result(Foo* out_result);
int main() {
Foo foo;
if (!function_with_result(&foo)) {
return -1;
}
// ...do something with `foo`...
return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Rust provides the `?` operator to automatically unpack a `Result` and return early if it holds an
error. Suppose a Rust function named `caller` invokes `function_with_result()?`, mirroring the C
program above. If `function_with_result()` returns an error, the `?` operator will cause `caller` to
return and propagate the error value. This works as long as `caller` and `function_with_result`
either return the same error type or types with a known conversion. There are various other ways to
handle an error, such as inspecting the `Result` with a `match` expression or substituting a
fallback value.
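Here is a rough sketch of early return with `?` next to one alternative. Everything in it is made up for illustration: `Foo` is a placeholder type, `LowLevelError` and `CallerError` are hypothetical error types, and the `succeed` parameter only exists so that both outcomes can be exercised.

```rust
#[derive(Debug)]
struct Foo(u32);

#[derive(Debug, PartialEq)]
struct LowLevelError;

#[derive(Debug, PartialEq)]
enum CallerError {
    Inner(LowLevelError),
}

// The "known conversion" that lets `?` turn a `LowLevelError` into a `CallerError`.
impl From<LowLevelError> for CallerError {
    fn from(error: LowLevelError) -> Self {
        CallerError::Inner(error)
    }
}

fn function_with_result(succeed: bool) -> Result<Foo, LowLevelError> {
    if succeed {
        Ok(Foo(42))
    } else {
        Err(LowLevelError)
    }
}

/// Returns early with a converted error if `function_with_result` fails.
fn caller(succeed: bool) -> Result<u32, CallerError> {
    let foo: Foo = function_with_result(succeed)?;
    // ...do something with `foo`...
    Ok(foo.0)
}

/// Handles the error on the spot by substituting a fallback value.
fn caller_with_fallback(succeed: bool) -> u32 {
    match function_with_result(succeed) {
        Ok(foo) => foo.0,
        Err(_) => 0,
    }
}

fn main() {
    println!("{:?}", caller(true));
    println!("{:?}", caller(false));
    println!("{}", caller_with_fallback(false));
}
```

The `From` implementation is what makes the `?` in `caller` compile even though the two functions return different error types.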
I like to keep things simple in my code examples and use the `?` operator. Instead of defining
custom error types and conversions, I use a catch-all `Error` type from a library called *anyhow*.
You'll often see the examples include `anyhow::Result` (an alias for `Result<T, anyhow::Error>`)
and `anyhow::Context`. The latter is a useful trait for adding an error message while converting to
an `anyhow::Error`:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
fn caller() -> anyhow::Result<()> {
let foo: Foo = function_with_result().context("failed to get foo")?;
// ...do something with `foo`...
Ok(())
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You can read more about the `Result` type in [its module
documentation](https://doc.rust-lang.org/std/result/index.html).
Drawing Pixels
====================================================================================================
At this stage, we have code that brings up a window, connects to the GPU, and sets up a queue of
textures that is synchronized with the display. In Computer Graphics, the term "texture" is
generally used in the context of *texture mapping*, which is a technique to apply detail to geometry
using data stored in memory. A very common application is to map color data from the pixels of a 2D
image onto the surface of a 3D polygon.
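At its simplest, the lookup half of texture mapping is just an array access: a pair of coordinates in `[0, 1]` selects a pixel (a *texel*) from the image. The sketch below does this on the CPU with nearest-neighbor sampling. The `Texture` type, its `sample` method, and the `checkerboard` helper are hypothetical, and real GPU texture hardware offers far more (filtering, wrapping modes, mip-mapping) than this shows.

```rust
/// A hypothetical CPU-side texture that stores RGB texels in row-major order.
struct Texture {
    width: usize,
    height: usize,
    texels: Vec<[u8; 3]>,
}

impl Texture {
    /// Nearest-neighbor lookup. `u` and `v` are clamped to [0, 1], with (0, 0) mapping to
    /// the first texel of the first row. Assumes the texture is not empty.
    fn sample(&self, u: f32, v: f32) -> [u8; 3] {
        let x = ((u.clamp(0.0, 1.0) * self.width as f32) as usize).min(self.width - 1);
        let y = ((v.clamp(0.0, 1.0) * self.height as f32) as usize).min(self.height - 1);
        self.texels[y * self.width + x]
    }
}

/// Builds a 2x2 black and white checkerboard.
fn checkerboard() -> Texture {
    let white = [255, 255, 255];
    let black = [0, 0, 0];
    Texture {
        width: 2,
        height: 2,
        texels: vec![white, black, black, white],
    }
}

fn main() {
    let texture = checkerboard();
    // A point on a polygon that carries the texture coordinates (0.75, 0.25)
    // picks up the color of the top-right texel.
    println!("{:?}", texture.sample(0.75, 0.25));
}
```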
Texture mapping is so essential to real-time graphics that all modern GPUs are equipped with
specialized hardware to speed up texture operations. It's not uncommon for a modern video game to
use texture assets that take up hundreds of megabytes. Processing all of that data involves a lot
of memory traffic which is a big performance bottleneck for a GPU. This is why GPUs come with
dedicated texture memory caches, sampling hardware, compression schemes and other features to
improve texture data throughput.
We are going to use the texture hardware to store the output of our renderer. In wgpu, a *texture
object* represents texture memory that can be used in three main ways: texture mapping, shader
storage, or as a *render target*[^ch3-cit1]. A surface texture is a special kind of texture that can
only be used as a render target.
Not all native APIs have this restriction. For instance, both Metal and Vulkan allow their version
of a surface texture -- a *frame buffer* (Metal) or *swap chain* (Vulkan) texture -- to be
configured for other usages, though this sometimes comes with a warning about impaired performance
and is not guaranteed to be supported by the hardware.
wgpu doesn't provide any other option so I'm going to start by implementing a render pass. This is
a fundamental and very widely used function of the GPU, so it's worth learning about.
[^ch3-cit1]: See [`wgpu::TextureUsages`](https://docs.rs/wgpu/0.17.0/wgpu/struct.TextureUsages.html).
The render Module
---------------------
I like to separate the rendering code from all the windowing code, so I'll start by creating a file
named `render.rs`. Every Rust file makes up a *module* (with the same name) which serves as a
namespace for all functions and types that are declared in it. Here I'll add a data structure called
`PathTracer`. This will hold all GPU resources and eventually implement our path tracing algorithm:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
pub struct PathTracer {
device: wgpu::Device,
queue: wgpu::Queue,
}
impl PathTracer {
pub fn new(device: wgpu::Device, queue: wgpu::Queue) -> PathTracer {
device.on_uncaptured_error(Box::new(|error| {
panic!("Aborting due to an error: {}", error);
}));
// TODO: initialize GPU resources
PathTracer {
device,
queue,
}
}
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [render-initial]: [render.rs] The PathTracer structure]
We start out with an associated function called `PathTracer::new` which will serve as the
constructor and eventually initialize all GPU resources. The `PathTracer` takes ownership of the
`wgpu::Device` and `wgpu::Queue` that we created earlier and it will hold on to them for the rest of
the application's life.
`wgpu::Device` represents a connection to the GPU. It is responsible for creating resources like
texture, buffer, and pipeline objects. It also defines some methods for error handling.
The first thing I do is set up an "uncaptured error" handler. If you look at the [declarations
](https://docs.rs/wgpu/0.17.0/wgpu/struct.Device.html) of resource creation methods you'll notice
that none of them return a `Result`. This doesn't mean that they always succeed; as a matter of
fact, all of these operations can fail. They don't return a `Result` because wgpu closely mirrors
the WebGPU API, which uses a concept called *error scopes* to detect and respond to errors.
Whenever there's an error that I don't handle using an error scope it will trigger the uncaptured
error handler, which will print out an error message and abort the program[^ch3.1-cit1]. For now,
I won't set up any error scopes in `PathTracer::new` and I'll abort the program if the API fails to
create the initial resources.
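The relationship between error scopes and the uncaptured error handler can be modeled in a few lines. The sketch below is a toy, not wgpu's API: `ToyDevice`, its methods, and the `MAX_BUFFER_SIZE` rule are all hypothetical, and the uncaptured errors are collected in a list instead of aborting the program so that the behavior is easy to observe.

```rust
/// Made-up validation limit for this example.
const MAX_BUFFER_SIZE: u64 = 1024;

/// Toy stand-in for a GPU device that tracks a stack of error scopes.
struct ToyDevice {
    scopes: Vec<Vec<String>>,
    uncaptured: Vec<String>,
}

impl ToyDevice {
    fn new() -> Self {
        ToyDevice { scopes: Vec::new(), uncaptured: Vec::new() }
    }

    /// Opens a new scope. Errors reported from now on are captured by it.
    fn push_error_scope(&mut self) {
        self.scopes.push(Vec::new());
    }

    /// Closes the innermost scope and returns the first error it captured, if any.
    fn pop_error_scope(&mut self) -> Option<String> {
        self.scopes.pop().and_then(|errors| errors.into_iter().next())
    }

    /// Routes an error to the innermost open scope, or to the uncaptured list if
    /// no scope is open.
    fn report_error(&mut self, message: &str) {
        match self.scopes.last_mut() {
            Some(scope) => scope.push(message.to_string()),
            None => self.uncaptured.push(message.to_string()),
        }
    }

    /// A fallible operation that, like the real resource creation methods, does not
    /// return a `Result`.
    fn create_buffer(&mut self, size: u64) {
        if size > MAX_BUFFER_SIZE {
            self.report_error("buffer is too large");
        }
    }
}

fn main() {
    let mut device = ToyDevice::new();

    device.push_error_scope();
    device.create_buffer(4096);
    println!("captured: {:?}", device.pop_error_scope());

    device.create_buffer(4096);
    println!("uncaptured: {:?}", device.uncaptured);
}
```

The same failing call ends up in two different places depending on whether a scope was open when it was made.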
Next, let's declare the `render` module and initialize a `PathTracer` in the `main` function:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...
};
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Rust highlight
mod render;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
const WIDTH: u32 = 800;
const HEIGHT: u32 = 600;
#[pollster::main]
async fn main() -> Result<()> {
let event_loop = EventLoop::new()?;
let window_size = winit::dpi::PhysicalSize::new(WIDTH, HEIGHT);
let window = WindowBuilder::new()
.with_inner_size(window_size)
.with_resizable(false)
.with_title("GPU Path Tracer".to_string())
.build(&event_loop)?;
let (device, queue, surface) = connect_to_gpu(&window).await?;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Rust highlight
let renderer = render::PathTracer::new(device, queue);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
event_loop.run(move |event, control_handle| {
control_handle.set_control_flow(ControlFlow::Poll);
match event {
Event::WindowEvent { event, .. } => match event {
WindowEvent::CloseRequested => control_handle.exit(),
WindowEvent::RedrawRequested => {
// Wait for the next available frame buffer.
let frame: wgpu::SurfaceTexture = surface
.get_current_texture()
.expect("failed to get current texture");
// TODO: draw frame
frame.present();
window.request_redraw();
}
_ => (),
},
_ => (),
}
})?;
Ok(())
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [main-renderer-init]: [main.rs] Initializing a Renderer]
Now that we have the skeleton in place, it's time to paint some pixels on the screen.
[^ch3.1-cit1]: This is actually the default behavior so I didn't really need to call
`on_uncaptured_error`.
Display Pipeline
----------------
Before setting up the render pass let's first talk about how it works. Traditionally, graphics
systems have been modeled after an abstraction called the *graphics pipeline*.[#Hughes13] At a
very high level, the input to the pipeline is a mathematical model that describes what to draw
-- such as geometry, materials, and light -- and the output is a 2D grid of pixels. This
transformation is processed in a series of standard *pipeline stages* which form the basis of the
rendering abstraction provided by GPUs and graphics APIs. wgpu uses the term *render pipeline* which
is what I'll use going forward.
The input to the render pipeline is a polygon stream represented by points in 3D space and their
associated data. The polygons are described in terms of geometric primitives (points, lines, and
triangles) which consist of *vertices*. The *vertex stage* transforms each vertex from the input
stream into a 2D coordinate space that corresponds to the viewport. After some additional processing
(such as clipping and culling) the assembled primitives are passed on to the *rasterizer*.
The rasterizer applies a process called scan conversion to determine the pixels that are covered by
each primitive and breaks them up into per-pixel *fragments*. The output of the vertex
stage (the vertex positions, texture coordinates, vertex colors, etc) gets interpolated between the
vertices of the primitive and the interpolated values get assigned to each fragment. Fragments are
then passed on to the *fragment stage* which computes an output (such as the pixel or sample color)
for each fragment. Shading techniques such as texture mapping and lighting are usually performed
in this stage. The output then goes through several other operations before getting written to the
render target as pixels.[^ch3-footnote1]
![Figure [render-pipeline]: Vertex and Fragment stages of the render pipeline
](../images/fig-01-render-pipeline.svg)
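The rasterizer and fragment stage can be imitated on the CPU to get a feel for what they do. The sketch below tests every pixel center against a single triangle and interpolates the vertex colors across it. It makes several simplifying assumptions: the vertex positions are already in pixel coordinates (as if the vertex stage had run), there is no clipping or depth testing, and pixels that fall exactly on an edge count as covered, which is not the rule that real GPUs apply. The `Vertex` type and the `edge` and `rasterize` functions are hypothetical.

```rust
#[derive(Clone, Copy)]
struct Vertex {
    position: [f32; 2], // In pixel coordinates, i.e. the output of the vertex stage.
    color: [f32; 3],
}

/// Twice the signed area of the triangle (a, b, p).
fn edge(a: [f32; 2], b: [f32; 2], p: [f32; 2]) -> f32 {
    (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
}

/// Rasterizes one triangle into a `width * height` render target that starts out black.
fn rasterize(triangle: [Vertex; 3], width: usize, height: usize) -> Vec<[f32; 3]> {
    let mut target = vec![[0.0; 3]; width * height];
    let [v0, v1, v2] = triangle;
    let area = edge(v0.position, v1.position, v2.position);
    if area == 0.0 {
        return target; // Degenerate triangle: covers nothing.
    }
    for y in 0..height {
        for x in 0..width {
            let p = [x as f32 + 0.5, y as f32 + 0.5]; // Pixel center.
            // Interpolation weights. They sum to 1 and are all non-negative
            // exactly when `p` is inside the triangle.
            let w0 = edge(v1.position, v2.position, p) / area;
            let w1 = edge(v2.position, v0.position, p) / area;
            let w2 = edge(v0.position, v1.position, p) / area;
            if w0 < 0.0 || w1 < 0.0 || w2 < 0.0 {
                continue; // No fragment is generated for this pixel.
            }
            // "Fragment stage": output the interpolated vertex color.
            let mut color = [0.0; 3];
            for c in 0..3 {
                color[c] = w0 * v0.color[c] + w1 * v1.color[c] + w2 * v2.color[c];
            }
            target[y * width + x] = color;
        }
    }
    target
}

fn main() {
    let triangle = [
        Vertex { position: [0.0, 0.0], color: [1.0, 0.0, 0.0] },
        Vertex { position: [4.0, 0.0], color: [0.0, 1.0, 0.0] },
        Vertex { position: [0.0, 4.0], color: [0.0, 0.0, 1.0] },
    ];
    let target = rasterize(triangle, 4, 4);
    let covered = target.iter().filter(|c| **c != [0.0; 3]).count();
    println!("{covered} of {} pixels covered", target.len());
}
```

Note that the two nested loops never share any state between pixels. A GPU exploits exactly this independence by processing many fragments at the same time.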
What I just described is very much a data pipeline: a data stream goes through a series of
transformations in stages. The input to each stage is defined in terms of smaller elements (e.g.
vertices and pixel-fragments) that can be processed in parallel. This is the fundamental principle
behind the GPU.
Early commercial GPUs implemented the graphics pipeline entirely in fixed-function hardware. Modern
GPUs still use fixed-function stages (and at much greater data rates) but virtually all of them
allow you to program the vertex and fragment stages with custom logic using *shader programs*.
[^ch3-footnote1]: I glossed over a few pipeline stages (such as geometry and tessellation) and
important steps like multi-sampling, blending, and the scissor/depth/stencil tests. These play an
important role in many real-time graphics applications but we won't make use of them in our path
tracer.
### Compiling Shaders
Let's put together a render pipeline that draws a red triangle. We'll define a vertex shader that
outputs the 3 corner vertices and a fragment shader that outputs a solid color. We'll write
these shaders in the WebGPU Shading Language (WGSL).
Go ahead and create a file called `shaders.wgsl` to host all of our WGSL code (I put it next to the
Rust files under `src/`). Before we can run this code, we need to compile it into a form that the
GPU can execute. We start by creating a *shader module*:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
pub struct PathTracer {
device: wgpu::Device,
queue: wgpu::Queue,
}
impl PathTracer {
pub fn new(device: wgpu::Device, queue: wgpu::Queue) -> PathTracer {
device.on_uncaptured_error(Box::new(|error| {
panic!("Aborting due to an error: {}", error);
}));
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
let shader_module = compile_shader_module(&device);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust delete
// TODO: initialize GPU resources
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
PathTracer {
device,
queue,
}
}
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
fn compile_shader_module(device: &wgpu::Device) -> wgpu::ShaderModule {
use std::borrow::Cow;
let code = include_str!(concat!(env!("CARGO_MANIFEST_DIR"), "/src/shaders.wgsl"));
device.create_shader_module(wgpu::ShaderModuleDescriptor {
label: None,
source: wgpu::ShaderSource::Wgsl(Cow::Borrowed(code)),
})
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [render-shader-module]: [render.rs] Creating the shader module]
The `compile_shader_module` function loads the file we just created into a string using the
`include_str!` macro. This bundles the contents of `shaders.wgsl` into the program binary at build
time. This is followed by a call to `wgpu::Device::create_shader_module` to compile the WGSL source
code.[^ch3-footnote2]
Let's define the vertex and fragment functions, which I'm calling `display_vs` and `display_fs`:
I'm using the "vs" and "fs" suffixes as shorthand for "vertex stage" and "fragment stage". Together,
these two functions form our "display pipeline" (the "display" part will become clearer later).
The `@vertex` and `@fragment` annotations are WGSL keywords that mark these two functions as entry
points to each pipeline stage program.
Since graphics workloads generally involve a high amount of linear algebra, GPUs natively support
SIMD operations over vectors and matrices. All shading languages define built-in types for vectors
and matrices of up to 4 dimensions (4x4 in the case of matrices). The `vec4f` and `vec2f` types that
are in the code represent 4D and 2D vectors of floating point numbers.
`display_vs` returns the vertex position as a `vec4f`. This position is defined relative to a
coordinate space called the *Normalized Device Coordinate Space*. In NDC, the center of the viewport
marks the origin $(0, 0, 0)$. The $x$-axis spans horizontally from $(-1, 0, 0)$ on the left edge of
the viewport to $(1, 0, 0)$ on the right edge while the $y$-axis spans vertically from $(0,-1,0)$ at
the bottom to $(0,1,0)$ at the top. The $z$-axis is directly perpendicular to the viewport, going
*through* the origin.
![Figure [ndc]: Our triangle in Normalized Device Coordinates](../images/fig-02-ndc.svg)
`display_vs` takes a *vertex index* as its parameter. The vertex function gets invoked for every
input vertex across different GPU threads. `vid` identifies the individual vertex that is assigned
to the *invocation*. The number of vertices and where they exist within the topology of the input
geometry is up to us to define. Since we want to draw a triangle, we'll later issue a *draw call*
with 3 vertices and `display_vs` will get invoked exactly 3 times with vertex indices ranging from
$0$ to $2$.
Since our 2D triangle is viewport-aligned, we can set the $z$ coordinate to $0$. The 4th
coordinate is known as a *homogeneous coordinate* used for projective transformations. Don't worry
about this coordinate for now -- just know that for a vector that represents a *position* we set
this coordinate to $1$. We can declare the $x$ and $y$ coordinates for the 3 vertices as an array
of `vec2f` and simply return the element that corresponds to `vid`. I enumerate the vertices in
counter-clockwise order which matches the winding order we'll specify when we create the pipeline.
`display_fs` takes no inputs and returns a `vec4f` that represents the fragment color. The 4
dimensions represent the red, green, blue, and alpha channels of the destination pixel. `display_fs`
gets invoked for all pixel fragments that result from our triangle and the invocations are executed
in parallel across many GPU threads, just like the vertex function. To paint the triangle solid red,
we simply return `vec4f(1., 0., 0., 1.)` for all fragments.
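The logic of the two entry points can be mimicked on the CPU. The following Rust sketch is only an illustration and not the WGSL shader itself: the vertex positions are hypothetical values chosen to form a counter-clockwise triangle in NDC, plain arrays stand in for `vec2f` and `vec4f`, and `draw_triangle` is a made-up helper that plays the role of the draw call.

```rust
/// Hypothetical triangle corners in NDC, listed in counter-clockwise order.
const VERTICES: [[f32; 2]; 3] = [[0.0, 0.8], [-0.8, -0.8], [0.8, -0.8]];

/// Stand-in for the vertex stage: called once per vertex index.
fn display_vs(vid: u32) -> [f32; 4] {
    let [x, y] = VERTICES[vid as usize];
    // z = 0 because the triangle is viewport-aligned; w = 1 marks a position.
    [x, y, 0.0, 1.0]
}

/// Stand-in for the fragment stage: every fragment is opaque red (RGBA).
fn display_fs() -> [f32; 4] {
    [1.0, 0.0, 0.0, 1.0]
}

/// Emulates a draw call with 3 vertices by invoking the vertex function
/// once for each index in 0..3 (a GPU would run these in parallel).
fn draw_triangle() -> Vec<[f32; 4]> {
    (0..3).map(display_vs).collect()
}
```

Calling `draw_triangle()` yields three positions, one per vertex index, each with $z = 0$ and a homogeneous coordinate of $1$.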
[^ch3-footnote2]: The `Cow::Borrowed` bit is a Rust idiom that creates a "copy-on-write borrow".
This allows the API to take ownership of the WGSL string if necessary. This is not really an
important detail for us.
### Creating the Pipeline Object
Before we can run the shaders, we need to assemble them into a *pipeline state object*. This is
where we specify the data layout of the render pipeline and link the shaders into a runnable binary
program. Let's add a new function called `create_display_pipeline`:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...
fn compile_shader_module(device: &wgpu::Device) -> wgpu::ShaderModule {
use std::borrow::Cow;
let code = include_str!(concat!(env!("CARGO_MANIFEST_DIR"), "/src/shaders.wgsl"));
device.create_shader_module(wgpu::ShaderModuleDescriptor {
label: None,
source: wgpu::ShaderSource::Wgsl(Cow::Borrowed(code)),
})
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
fn create_display_pipeline(
    device: &wgpu::Device,
    shader_module: &wgpu::ShaderModule,
) -> wgpu::RenderPipeline {
    device.create_render_pipeline(&wgpu::RenderPipelineDescriptor {
        label: Some("display"),
        layout: None,
        primitive: wgpu::PrimitiveState {
            topology: wgpu::PrimitiveTopology::TriangleList,
            front_face: wgpu::FrontFace::Ccw,
            polygon_mode: wgpu::PolygonMode::Fill,
            ..Default::default()
        },
        vertex: wgpu::VertexState {
            module: shader_module,
            entry_point: "display_vs",
            buffers: &[],
        },
        fragment: Some(wgpu::FragmentState {
            module: shader_module,
            entry_point: "display_fs",
            targets: &[Some(wgpu::ColorTargetState {
                format: wgpu::TextureFormat::Bgra8Unorm,
                blend: None,
                write_mask: wgpu::ColorWrites::ALL,
            })],
        }),
        depth_stencil: None,
        multisample: wgpu::MultisampleState::default(),
        multiview: None,
    })
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [display-pipeline]: [render.rs] The `create_display_pipeline` function]
This code describes a render pipeline that draws a list of triangle primitives. The vertex winding
order is set to counter-clockwise which defines the orientation of the triangle's *front
face*.[^ch3-footnote3]
We request that the interior of each polygon be completely filled (rather than drawing just the
edges or vertices). We specify that `display_vs` is the main function of the vertex stage and that
we're not providing any vertex data from the CPU (since we declared our vertices in the shader
code). Similarly, we set up a fragment stage with `display_fs` as the entry point and a single
color target.[^ch3-footnote4] I set the pixel format of the render target to `Bgra8Unorm` since
that happens to be widely supported on all of my devices. What's important is that you assign a
pixel format that matches the surface configuration in your windowing setup and that your GPU device
supports this as a *render attachment* format.
Let's instantiate the pipeline and store it in the `PathTracer` object. Pipeline creation is
expensive so we want to create the pipeline state object once and hold on to it. We'll reference it
later when drawing a frame:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
pub struct PathTracer {
device: wgpu::Device,
queue: wgpu::Queue,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
display_pipeline: wgpu::RenderPipeline,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
}
impl PathTracer {
pub fn new(device: wgpu::Device, queue: wgpu::Queue) -> PathTracer {
device.on_uncaptured_error(Box::new(|error| {
panic!("Aborting due to an error: {}", error);
}));
let shader_module = compile_shader_module(&device);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
let display_pipeline = create_display_pipeline(&device, &shader_module);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
PathTracer {
device,
queue,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
display_pipeline,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
}
}
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [display-pipeline-init]: [render.rs] Initializing the display pipeline]
[^ch3-footnote3]: The GPU can automatically discard triangles that are oriented away from the
viewport. This is a feature called *back face culling* which our code doesn't make use of.
[^ch3-footnote4]: The `fragment` field of `wgpu::RenderPipelineDescriptor` is optional
(notice the *Some* in `Some(wgpu::FragmentState {...})` ?). A render pipeline that only outputs to
the depth or stencil buffers doesn't have to specify a fragment shader or any color attachments. An
example of this is *shadow mapping*: a shadow map is a texture that stores the distances between a
light source and geometry samples from the scene; it can be produced by a depth-only render-pass
from the point of view of the light source. The shadow map is later sampled from a render pass from
the camera's point of view to determine whether a rasterized point is visible from the light or in
shadow.
The Render Pass
---------------
We now have the pieces in place to issue a draw command to the GPU. The general abstraction modern
graphics APIs define for this is called a "command buffer" (or "command list" in D3D12). You can
think of the command buffer as a memory location that holds the serialized list of GPU commands
representing the sequence of actions we want the GPU to take. To draw a triangle we'll *encode*
a draw command into the command buffer and then *submit* the command buffer to the GPU for execution.
With wgpu, the encoding is abstracted by an object called `wgpu::CommandEncoder`, which we'll use to
record our draw command. Once we are done, we will call `wgpu::CommandEncoder::finish()` to produce
a finalized `wgpu::CommandBuffer` which we can submit to the GPU via the `wgpu::Queue` that we
created at start up.
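The encode-then-submit flow can be modeled with a few plain Rust types. This is a toy model under simplifying assumptions, not wgpu's implementation: the `Command` enum and the `Toy*` types are hypothetical stand-ins. What it illustrates is that recording only appends to a list, and nothing is executed until the finished buffer is handed to the queue.

```rust
use std::ops::Range;

/// Hypothetical commands; real command buffers use a driver-specific encoding.
#[derive(Debug, Clone, PartialEq)]
enum Command {
    BeginRenderPass { clear_color: [f32; 4] },
    SetPipeline(&'static str),
    Draw { vertices: Range<u32>, instances: Range<u32> },
    EndRenderPass,
}

/// Records commands without executing them.
#[derive(Default)]
struct ToyEncoder {
    commands: Vec<Command>,
}

/// A finished list of commands that is ready to be submitted.
struct ToyCommandBuffer {
    commands: Vec<Command>,
}

impl ToyEncoder {
    fn record(&mut self, command: Command) {
        self.commands.push(command);
    }

    /// Consumes the encoder so that no more commands can be recorded.
    fn finish(self) -> ToyCommandBuffer {
        ToyCommandBuffer { commands: self.commands }
    }
}

/// Stand-in for the GPU queue: "executes" commands in submission order.
#[derive(Default)]
struct ToyQueue {
    executed: Vec<Command>,
}

impl ToyQueue {
    fn submit(&mut self, buffer: ToyCommandBuffer) {
        self.executed.extend(buffer.commands);
    }
}

/// Encodes the commands for drawing one triangle into a cleared target.
fn encode_triangle_frame() -> ToyCommandBuffer {
    let mut encoder = ToyEncoder::default();
    encoder.record(Command::BeginRenderPass { clear_color: [0.0, 0.0, 0.0, 1.0] });
    encoder.record(Command::SetPipeline("display"));
    encoder.record(Command::Draw { vertices: 0..3, instances: 0..1 });
    encoder.record(Command::EndRenderPass);
    encoder.finish()
}
```

Because `finish` takes the encoder by value, the Rust compiler rejects any attempt to record more commands after the buffer has been finalized.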
Let's add a new `PathTracer` function called `render_frame`. This function will take a texture as
its parameter (our *render target*) and tell the GPU to draw to it using the pipeline object we
created earlier:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...
impl PathTracer {
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
    pub fn render_frame(&self, target: &wgpu::TextureView) {
        let mut encoder = self
            .device
            .create_command_encoder(&wgpu::CommandEncoderDescriptor {
                label: Some("render frame"),
            });
        let mut render_pass = encoder.begin_render_pass(&wgpu::RenderPassDescriptor {
            label: Some("display pass"),
            color_attachments: &[Some(wgpu::RenderPassColorAttachment {
                view: target,
                resolve_target: None,
                ops: wgpu::Operations {
                    load: wgpu::LoadOp::Clear(wgpu::Color::BLACK),
                    store: wgpu::StoreOp::Store,
                },
            })],
            ..Default::default()
        });
        render_pass.set_pipeline(&self.display_pipeline);
        // Draw 1 instance of a polygon with 3 vertices.
        render_pass.draw(0..3, 0..1);
        // End the render pass by consuming the object.
        drop(render_pass);
        let command_buffer = encoder.finish();
        self.queue.submit(Some(command_buffer));
    }
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [render_frame-stub]: [render.rs] The `render_frame` function]
`target` here is defined as a `wgpu::TextureView`. wgpu makes the distinction between a texture
resource (represented by `wgpu::Texture`) and how that texture's memory is accessed by a pipeline
(which is represented by the *view* into the texture). When we want to bind a texture we first
create a view with the right properties. In this case we'll assume that the caller already created
a `TextureView` of the render target.
The first thing we do in `render_frame` is create a command encoder. We then tell the encoder to
begin a *render pass*. There are 4 important API calls we make to encode the draw command:
1. Create a `wgpu::RenderPass`. We tell it to store the colors that are output by the render
pipeline to the `target` texture by assigning it as the only color attachment. We also tell it
to clear all pixels of the target to black (i.e. $(0, 0, 0, 1)$ in RGBA) before drawing to it.
2. Assign the render pipeline.
3. Record a single draw with 3 vertices.
4. End the render pass by destroying the `wgpu::RenderPass` object.
We then serialize the command buffer and submit it to the GPU. Finally, let's invoke `render_frame`
from our windowing event loop, using the current surface texture as the render target:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
async fn main() -> Result<()> {
...
event_loop.run(move |event, _, control_flow| {
...
Event::RedrawRequested(_) => {
// Wait for the next available frame buffer.
let frame: wgpu::SurfaceTexture = surface
.get_current_texture()
.expect("failed to get current texture");
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust delete
// TODO: draw frame
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
let render_target = frame
.texture
.create_view(&wgpu::TextureViewDescriptor::default());
renderer.render_frame(&render_target);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
frame.present();
}
...
});
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [render_frame-call]: [main.rs] Rendering to a surface texture]
Running this code should bring up a window that looks like this:
![Figure [first-triangle]: First Triangle](../images/img-02-first-triangle.png)
Finally drawing something! A single triangle may not look that interesting but you can model highly
complex 3D scenes and geometry by putting many of them together. It takes only a few tweaks to the
render pipeline to shape, animate, and render millions of triangles many times per second.
Full-Screen Quad
----------------
The render pipeline that we just put together plays a rather small role in the overall renderer:
its purpose is to display the output of the path-tracer on the window surface.
The output of our renderer is a 2D rectangular image and I would like it to fill the whole window.
We can achieve this by having the render pipeline draw two right triangles that are adjacent at
their hypotenuse. Remember that the viewport coordinates span the range $[-1, 1]$ in NDC, so
setting the 4 corners of the rectangle to $(-1, 1)$, $(1, 1)$, $(1, -1)$, $(-1, -1)$ should cover
the entire viewport regardless of its dimensions.
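One way to lay out the six vertices is sketched below in Rust rather than WGSL, as a hedged illustration: the particular ordering and the `signed_area` helper are assumptions made for this example. Each triangle lists its corners counter-clockwise to match the pipeline's front-face setting, and the two triangles share the diagonal from $(-1, 1)$ to $(1, -1)$.

```rust
/// Two right triangles covering the NDC square, each wound counter-clockwise.
/// This ordering is one possible choice, not necessarily the one in the shader.
const QUAD_VERTICES: [[f32; 2]; 6] = [
    // First triangle: top-left, bottom-left, bottom-right.
    [-1.0, 1.0],
    [-1.0, -1.0],
    [1.0, -1.0],
    // Second triangle: top-left, bottom-right, top-right.
    [-1.0, 1.0],
    [1.0, -1.0],
    [1.0, 1.0],
];

/// Signed area of a triangle; positive when the corners are counter-clockwise
/// in a coordinate system where y points up (as in NDC).
fn signed_area(a: [f32; 2], b: [f32; 2], c: [f32; 2]) -> f32 {
    0.5 * ((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))
}

/// Returns the signed areas of the two triangles in the quad.
fn quad_triangle_areas() -> [f32; 2] {
    let v = QUAD_VERTICES;
    [signed_area(v[0], v[1], v[2]), signed_area(v[3], v[4], v[5])]
}
```

Both areas come out positive (counter-clockwise) and add up to $4$, the area of the $2\times2$ NDC square, which is consistent with the two triangles tiling the whole viewport.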
![Figure [half-screen-quad]: Half-Screen Triangle](../images/img-03-half-screen-quad.png)
That painted only one of the triangles. We also need to update the draw command with the new vertex
count:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...
impl PathTracer {
...
pub fn render_frame(&self, target: &wgpu::TextureView) {
...
render_pass.set_pipeline(&self.display_pipeline);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust delete
// Draw 1 instance of a polygon with 3 vertices.
render_pass.draw(0..3, 0..1);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
// Draw 1 instance of a polygon with 6 vertices.
render_pass.draw(0..6, 0..1);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
// End the render pass by consuming the object.
drop(render_pass);
let command_buffer = encoder.finish();
self.queue.submit(Some(command_buffer));
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [render_frame-quad]: [render.rs] Updating the draw call in `render_frame`]
![Figure [full-screen-quad]: Full-Screen Quad](../images/img-04-full-screen-quad.png)
Viewport Coordinates
--------------------
In this setup, every fragment shader invocation outputs the color of a single pixel. We can identify
that pixel using the built-in `position` input to the pipeline stage.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
@fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
return vec4f(1.0, 0.0, 0.0, 1.0);
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [position-builtin]: [shaders.wgsl] Position Built-In]
The input is defined as a `vec4f`. The $x$ and $y$ coordinates are defined in the _Viewport
Coordinate System_. The origin $(0, 0)$ corresponds to the top-left corner pixel of the viewport.
The $x$-coordinate increases towards the right and the $y$-coordinate increases towards the bottom.
A whole number increment in $x$ or $y$ represents an increment by 1 pixel (and fractional increments
can fall "inside" a pixel). For example, for a viewport with the physical dimensions of
$800\times600$, the coordinate ranges are $0\le x\lt800, 0\le y \lt600$.
![Figure [viewport-coords]: Viewport Coordinate System](../images/fig-03-viewport-coords.svg)
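The relationship between NDC and viewport coordinates can be written down as a small function. This is a sketch of the usual viewport transform, assuming a viewport that covers the whole render target with its origin at the top-left corner; the function name `ndc_to_viewport` is made up for illustration.

```rust
/// Maps an NDC position (x right, y up, both in [-1, 1]) to viewport
/// coordinates (x right, y down, measured in pixels from the top-left corner).
fn ndc_to_viewport(ndc: [f32; 2], width: u32, height: u32) -> [f32; 2] {
    let x = (ndc[0] + 1.0) * 0.5 * width as f32;
    // The y-axis is flipped: NDC +1 is the top edge, which is viewport y = 0.
    let y = (1.0 - ndc[1]) * 0.5 * height as f32;
    [x, y]
}
```

With an $800\times600$ viewport, the NDC origin lands at $(400, 300)$ and the top-left NDC corner $(-1, 1)$ lands at the viewport origin $(0, 0)$.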
Let's assign every pixel fragment a color based on its position in the viewport by mapping the
coordinates to a color channel (red and green). The render target uses a normalized color format
(i.e. the values must be between $0$ and $1$), so we divide each dimension by the largest possible
value to convert it to that range:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
const WIDTH: u32 = 800u;
const HEIGHT: u32 = 600u;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
@fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
let color = pos.xy / vec2f(f32(WIDTH - 1u), f32(HEIGHT - 1u));
return vec4f(color, 0.0, 1.0);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [pos-to-color]: [shaders.wgsl]]
There are two language expressions here that are worth highlighting. `pos.xy` is a so called
_vector swizzle_ that extracts the $x$ and $y$ components and produces a `vec2f` containing only
those. Next, we divide that `vec2f` by another `vec2f`. Here, the division operator performs a
component-wise division of every element of the vector on the left-hand side by the corresponding
element on the right-hand side, so `pos.xy / vec2f(f32(WIDTH - 1u), f32(HEIGHT - 1u))` is equivalent
to `vec2f(pos.x / f32(WIDTH - 1u), pos.y / f32(HEIGHT - 1u))`.
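For readers more comfortable with Rust than WGSL, the same computation can be spelled out on the CPU. This is a sketch with plain arrays standing in for `vec4f` and `vec2f`; the function name `position_to_color` is hypothetical, and it assumes that `width` and `height` are both greater than $1$.

```rust
/// CPU-side equivalent of the fragment function: maps a fragment position
/// to an RGBA color by dividing x and y by `width - 1` and `height - 1`.
/// Assumes `width > 1` and `height > 1`.
fn position_to_color(pos: [f32; 4], width: u32, height: u32) -> [f32; 4] {
    // Equivalent of the swizzle `pos.xy`.
    let xy = [pos[0], pos[1]];
    let max = [(width - 1) as f32, (height - 1) as f32];
    // Component-wise division: each element is divided by its counterpart.
    let color = [xy[0] / max[0], xy[1] / max[1]];
    // Red and green come from the position; blue is 0 and alpha is 1.
    [color[0], color[1], 0.0, 1.0]
}
```

A position of $(399.5, 299.5)$ in an $800\times600$ viewport maps to red and green values of $0.5$ each.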
Now we are able to separately color each individual pixel. Running this should produce a picture
that looks like this:
![Figure [viewport-gradient]: Viewport Coordinates as a color gradient
](../images/img-05-viewport-gradient.png)
Resource Bindings
====================================================================================================
Our program is split across separate runnable parts: the main executable that runs on the CPU and
pipelines that run on the GPU. As we add more features we will want to exchange data between the
different parts. The main way to achieve this is via memory resources.
The CPU side of our program can create and interact with resources by making API calls. On the GPU
side, the shader program can access those via _bindings_. A binding associates a resource with a
slot that the shader can reference, and each slot is identified by a unique index number.
The shader code declares a variable for each binding with a decoration that assigns it a binding
index. The CPU side is responsible for setting up the resources for a GPU pipeline according to its
binding layout.
WebGPU introduces an additional concept around bindings called a _bind group_. A bind group
collects resources that are frequently bound together.[^ch4-footnote1] Like individual
bindings, each bind group is identified by an index number. Our pipelines won't make use of more
than one bind group at a time, so we'll always assign $0$ as the group index.
[^ch4-footnote1]: The bind group concept is similar to "descriptor set" in Vulkan, "descriptor
table" in D3D12, and "argument buffer" in Metal.
Uniform Declaration
-------------------
The first binding we are going to set up is a _uniform buffer_. Uniforms are read-only data that
don't vary across GPU threads. We are going to use a uniform buffer to store certain globals, like
camera parameters.
Our renderer currently assumes a window dimension of $800\times600$ and declares this in two
different places (`shaders.wgsl` and `main.rs`) which must be kept in sync. Let's make `WIDTH` and
`HEIGHT` uniforms and upload their values from the CPU side. We'll first declare a uniform buffer
and assign it to binding index $0$:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
struct Uniforms {
width: u32,
height: u32,
}
@group(0) @binding(0) var<uniform> uniforms: Uniforms;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL delete
const WIDTH: u32 = 800u;
const HEIGHT: u32 = 600u;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
@fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
let color = pos.xy / vec2f(f32(uniforms.width - 1u), f32(uniforms.height - 1u));
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
return vec4f(color, 0.0, 1.0);
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [uniform binding declaration]: [shaders.wgsl] Uniform binding declaration]
The `var<uniform>` declaration tells the compiler that the shader expects a uniform buffer binding.
The type of the binding variable is `Uniforms` which represents the shader's view over the buffer's
memory. Declaring it this way allows the shader to access the contents of the buffer with an
expression like `uniforms.width`.
Bind Group Layout
-----------------
If you run the code now you should get a validation error telling you that the pipeline layout
expects a bind group layout at index $0$. We need to update the display pipeline description with a
layout that includes the new uniform binding. Let's update the `create_display_pipeline` function
to return a `wgpu::BindGroupLayout` alongside the pipeline object:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
impl PathTracer {
pub fn new(device: wgpu::Device, queue: wgpu::Queue) -> PathTracer {
device.on_uncaptured_error(Box::new(|error| {
panic!("Aborting due to an error: {}", error);
}));
let shader_module = compile_shader_module(&device);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
let (display_pipeline, display_layout) =
create_display_pipeline(&device, &shader_module);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...
}
...
}
...
fn create_display_pipeline(
device: &wgpu::Device,
shader_module: &wgpu::ShaderModule,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
) -> (wgpu::RenderPipeline, wgpu::BindGroupLayout) {
    let bind_group_layout = device.create_bind_group_layout(&wgpu::BindGroupLayoutDescriptor {
        label: None,
        entries: &[
            wgpu::BindGroupLayoutEntry {
                binding: 0,
                visibility: wgpu::ShaderStages::FRAGMENT,
                ty: wgpu::BindingType::Buffer {
                    ty: wgpu::BufferBindingType::Uniform,
                    has_dynamic_offset: false,
                    min_binding_size: None,
                },
                count: None,
            },
        ],
    });
    let pipeline = device.create_render_pipeline(&wgpu::RenderPipelineDescriptor {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
label: Some("display"),
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
layout: Some(&device.create_pipeline_layout(&wgpu::PipelineLayoutDescriptor {
bind_group_layouts: &[&bind_group_layout],
..Default::default()
})),
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...
});
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
(pipeline, bind_group_layout)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [display-pipeline-layout]: [render.rs] Display pipeline layout]
This says that the pipeline contains a single bind group, containing a single buffer entry. The
buffer entry has the "uniform" buffer binding type and is visible only to the fragment stage.
Buffer Object
-------------
Let's now create the buffer object that will provide the backing memory for the uniforms. The
size and layout of the memory need to match the `Uniforms` struct that we declared in the WGSL. A
common pattern is to maintain two sets of these declarations (one for the CPU and one for the GPU
side) and keep them in sync. Some frameworks allow you to reuse the same declarations on both sides.
_wgpu_ doesn't provide a utility for this out of the box, so I'm going to redeclare `Uniforms` for
the CPU side:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
use bytemuck::{Pod, Zeroable};
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
pub struct PathTracer {
device: wgpu::Device,
queue: wgpu::Queue,
display_pipeline: wgpu::RenderPipeline,
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
#[derive(Copy, Clone, Pod, Zeroable)]
#[repr(C)]
struct Uniforms {
width: u32,
height: u32,
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
impl PathTracer {
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [uniforms-struct-cpu]: [render.rs] CPU-side `Uniforms` struct]
The `repr(C)` attribute makes the memory layout of the `Uniforms` struct conform to the C language
rules so that the fields have a predictable order, size, and alignment.[^ch4-footnote2] For our
purposes, this should make the memory layout of the struct exactly match the WGSL declaration.
The `derive` attribute automatically implements the listed traits for our type. `Copy` and
`Clone` allow the type to be copied by value (Rust types are move-only by default). This is also the
first time we are using the `bytemuck` crate. The `Pod` and `Zeroable` traits, along with `repr(C)`,
allow us to safely reinterpret the `Uniforms` struct as a sequence of bytes.
For all intents and purposes, these Rust attributes enable the same semantics as the following plain
C/C++ struct:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ C
// If `Uniforms` were declared in C:
struct Uniforms {
uint32_t width;
uint32_t height;
};
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
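To see what reinterpreting the struct as bytes amounts to without pulling in `bytemuck`, here is a std-only sketch. The `uniforms_to_bytes` and `uniforms_layout` helpers are hypothetical; the first copies each field by hand, whereas `bytemuck` exposes the struct's existing memory as a byte slice. For this particular layout (two `u32` fields and no padding) the resulting bytes are the same.

```rust
use std::mem::{align_of, size_of};

#[derive(Copy, Clone)]
#[repr(C)]
struct Uniforms {
    width: u32,
    height: u32,
}

/// Serializes the fields in declaration order using the machine's native
/// byte order, which is how the fields are laid out in memory under repr(C).
fn uniforms_to_bytes(uniforms: &Uniforms) -> [u8; 8] {
    let mut bytes = [0u8; 8];
    bytes[0..4].copy_from_slice(&uniforms.width.to_ne_bytes());
    bytes[4..8].copy_from_slice(&uniforms.height.to_ne_bytes());
    bytes
}

/// Size and alignment of the struct in bytes.
fn uniforms_layout() -> (usize, usize) {
    (size_of::<Uniforms>(), align_of::<Uniforms>())
}
```

The 8-byte size reported by `uniforms_layout` is also the size that the backing buffer needs to have.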
Now, let's allocate the backing buffer object and initialize its contents:
This code allocates a buffer resource that is large enough to store an instance of `Uniforms` and
copies the contents of `uniforms` into it. The buffer is mapped at creation so that its address
space is accessible to the CPU side. We also declare its usage to be `UNIFORM`: this is a hint to
the GPU driver that allows it to perform optimizations based on the buffer access pattern. The usage
is also useful for validating that the bindings we provide conform to the pipeline's layout.
After the data copy, we need to flush and unmap the buffer from CPU memory before we can use it in
GPU commands. We also store both `uniforms` and `uniform_buffer`, since we'll reuse them to modify
some of the uniforms at runtime.
[^ch4-footnote2]: The default Rust layout representation doesn't provide a strong guarantee on the
order of the fields. See the [Rust reference](https://doc.rust-lang.org/reference/type-layout.html#representations).
Bind Group
----------
We need to associate the buffer object with a bind group with the correct layout before it can be
used in a render pass. Let's create and store a bind group and assign it to group index $0$ while
encoding the draw:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
use bytemuck::{Pod, Zeroable};
pub struct PathTracer {
device: wgpu::Device,
queue: wgpu::Queue,
uniforms: Uniforms,
uniform_buffer: wgpu::Buffer,
display_pipeline: wgpu::RenderPipeline,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
display_bind_group: wgpu::BindGroup,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
}
#[derive(Copy, Clone, Pod, Zeroable)]
#[repr(C)]
struct Uniforms {
width: u32,
height: u32,
}
impl PathTracer {
pub fn new(device: wgpu::Device, queue: wgpu::Queue) -> PathTracer {
...
uniform_buffer.unmap();
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
// Create the display pipeline bind group.
let display_bind_group = device.create_bind_group(&wgpu::BindGroupDescriptor {
label: None,
layout: &display_layout,
entries: &[wgpu::BindGroupEntry {
binding: 0,
resource: wgpu::BindingResource::Buffer(wgpu::BufferBinding {
buffer: &uniform_buffer,
offset: 0,
size: None,
}),
}],
});
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
PathTracer {
device,
queue,
uniforms,
uniform_buffer,
display_pipeline,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
display_bind_group,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
}
}
pub fn render_frame(&self, target: &wgpu::TextureView) {
...
render_pass.set_pipeline(&self.display_pipeline);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
render_pass.set_bind_group(0, &self.display_bind_group, &[]);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
// Draw 1 instance of a polygon with 6 vertices
render_pass.draw(0..6, 0..1);
...
}
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [display-bind-group]: [render.rs] Creating and using the display bind group]
Running the program now should bring up the same picture as before. The viewport dimensions are
still hardcoded in two places so let's clean that up by making the viewport width and height
parameters of the `PathTracer` constructor:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
impl PathTracer {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
pub fn new(
device: wgpu::Device,
queue: wgpu::Queue,
width: u32,
height: u32,
) -> PathTracer {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
device.on_uncaptured_error(Box::new(|error| {
panic!("Aborting due to an error: {}", error);
}));
let shader_module = compile_shader_module(&device);
let (display_pipeline, display_layout) =
create_display_pipeline(&device, &shader_module);
// Initialize the uniform buffer.
let uniforms = Uniforms {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
width,
height,
};
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [width-height-parameters]: [render.rs]]
Let's update the main function to pass in the physical window dimensions while creating the
`PathTracer`:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...
const WIDTH: u32 = 800;
const HEIGHT: u32 = 600;
#[pollster::main]
async fn main() -> Result<()> {
let event_loop = EventLoop::new();
let window_size = winit::dpi::PhysicalSize::new(WIDTH, HEIGHT);
let window = WindowBuilder::new()
.with_inner_size(window_size)
.with_resizable(false)
.with_title("GPU Path Tracer".to_string())
.build(&event_loop)?;
let (device, queue, surface) = connect_to_gpu(&window).await?;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
let renderer = render::PathTracer::new(device, queue, WIDTH, HEIGHT);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
event_loop.run(move |event, _, control_flow| {
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [width-height-parameters-main]: [main.rs]]
Now we have a way to pass data between the CPU and GPU sides of the program. We can repeat
this pattern whenever we need to add or modify a bind group layout.
Ray Casting
====================================================================================================
Light flows out of emissive objects (like the sun or a lamp) and scatters off objects as it floods
the environment. When some of that light reaches a camera sensor, the camera can measure the amount
that arrived at each pixel and create a picture. Our virtual camera will compute the same
measurement by tracing the light's path in the reverse direction, starting at the camera and towards
the objects in the scene.
Camera Rays
-----------
The first segment in a path is between the camera and the closest surface that is visible "through
a pixel". To locate that surface, we can plot a ray from the camera and search for the closest
point where the ray intersects the scene.
A ray is a part of a straight line that has a starting point and extends infinitely in one
direction. A ray in 3D space can be represented using two vectors: a point of origin
$\mathbf{P}$ and a direction $\vec{\mathbf{d}}$. All points $\mathbf{R}$ on the ray
are described by the linear equation $\mathbf{R}(t) = \mathbf{P} + t \mathbf{d}$ over the parameter
$t$. $t$ is a real number and its positive values represent points on the ray that are
in front of the ray origin (if we consider the direction $\mathbf{d}$ as _forward_). Negative values
of $t$ represent points behind the origin, and $t=0$ is the same as the origin.
![Figure [ray]: Ray definition](../images/fig-05-ray.svg)
Let's define the data structure to represent a ray:
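As a rough CPU-side illustration in Rust (the renderer itself works with a ray type declared in the shader), the ray equation $\mathbf{R}(t) = \mathbf{P} + t \mathbf{d}$ can be sketched as follows. The array-based vectors are an assumption made for this sketch, and `point_on_ray` simply evaluates the equation for a given $t$.

```rust
/// CPU-side mirror of a ray: a point of origin `P` and a direction `d`.
#[derive(Copy, Clone, Debug, PartialEq)]
struct Ray {
    origin: [f32; 3],
    direction: [f32; 3],
}

/// Evaluates R(t) = P + t * d. Positive `t` lies in front of the origin,
/// negative `t` behind it, and `t == 0` is the origin itself.
fn point_on_ray(ray: &Ray, t: f32) -> [f32; 3] {
    [
        ray.origin[0] + t * ray.direction[0],
        ray.origin[1] + t * ray.direction[1],
        ray.origin[2] + t * ray.direction[2],
    ]
}
```

For a ray starting at $(1, 2, 3)$ and pointing down the $-z$ axis, $t = 2$ yields $(1, 2, 1)$ while $t = -1$ yields $(1, 2, 4)$, a point behind the origin.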
Let's now model a simple pinhole camera. Initially we'll define the eye position (where the
arriving light gets focused) as the camera's origin and this will act as the origin for all camera
rays. The camera has a view direction, and some distance away from the origin along the view
direction sits the 2D viewport framing the rendered image.
We will initially position the camera origin at the coordinate system origin $(0, 0, 0)$ and set the
view direction along the $-z$ axis in a 3-dimensional right-handed Cartesian coordinate
system.[^ch5-footnote1]
![Figure [camera-view-space]: Rays in camera coordinates](../images/fig-04-camera-view-space.svg)
In order to determine the direction for the ray targeting a pixel, we need to convert the
pixel's viewport coordinates to the coordinate system we are going to use when computing ray
intersections. Let's define the $x$ and $y$ coordinate span of the viewport to be the same as NDC
(see _Figure 4_). This would make the viewport a square (with a width and height of $2$) so we need
to adjust it by the aspect ratio of the application window in order to make its shape match the
window frame.
The fragment shader already normalizes the viewport pixel coordinates to the range $[0,1]$ and
returns that as the output color. We can instead apply a simple transformation to convert them
to our new camera coordinate space:
1. Map the range to $[-1, 1]$ by doubling the range and shifting it in the negative direction by
$1$.
2. Scale the $x$ coordinate by the aspect ratio (which we'll define as $\tfrac{width}{height}$).
3. Flip the sign of the $y$ coordinate by multiplying it by $-1$.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
...
@fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL delete
let color = pos.xy / vec2f(f32(uniforms.width - 1u), f32(uniforms.height - 1u));
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
let aspect_ratio = f32(uniforms.width) / f32(uniforms.height);
// Normalize the viewport coordinates.
var uv = pos.xy / vec2f(f32(uniforms.width - 1u), f32(uniforms.height - 1u));
// Map `uv` from y-down (normalized) viewport coordinates to camera coordinates.
uv = (2. * uv - vec2(1.)) * vec2(aspect_ratio, -1.);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
...
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [camera-ray-computation]: [shaders.wgsl] Obtaining the viewport vector]
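If you want to sanity-check the three mapping steps outside of the shader, they can be mirrored on the CPU. This is only a sketch: `pixel_to_camera` is a hypothetical Rust helper, and it takes the pixel position as a plain pair of floats rather than the `@builtin(position)` value.

```rust
/// Maps a y-down pixel position to camera-space viewport coordinates.
fn pixel_to_camera(pos: [f32; 2], width: u32, height: u32) -> [f32; 2] {
    let aspect_ratio = width as f32 / height as f32;
    // Normalize the position to [0, 1], as the shader does.
    let u = pos[0] / (width - 1) as f32;
    let v = pos[1] / (height - 1) as f32;
    // Step 1: map [0, 1] to [-1, 1].
    // Step 2: scale x by the aspect ratio.
    // Step 3: flip the sign of y so that +y points up.
    [(2.0 * u - 1.0) * aspect_ratio, -(2.0 * v - 1.0)]
}
```

For an 800x600 viewport, the position $(0, 0)$ maps to roughly $(-1.33, 1)$ and the position $(799, 599)$ maps to roughly $(1.33, -1)$.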
We now have a vector $\vec{\mathbf{uv}}$ that spans from the center of the viewport to the pixel
$\mathbf{A}$. The ray direction is the vector that points from the origin towards the pixel, which
is given by $\mathbf{A} - \mathbf{O}$. $\mathbf{O}$ is equal to $(0, 0, 0)$, so computing
$\mathbf{A}$ will give us the ray direction.
If we picture the viewport to be positioned away from the origin at distance $f$ along the $-z$
axis then we can obtain $\mathbf{A}$ by computing
$\begin{bmatrix} \vec{\mathbf{uv}} \\ 0 \end{bmatrix} - \begin{bmatrix} 0 \\ 0 \\ f \end{bmatrix}$,
or simply $\begin{bmatrix} \vec{\mathbf{uv}} \\ -f \end{bmatrix}$.
In the code, I'll refer to $f$ as `focus_distance`:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
...
@fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
let origin = vec3(0.);
let focus_distance = 1.;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
let aspect_ratio = f32(uniforms.width) / f32(uniforms.height);
// Normalize the viewport coordinates.
var uv = pos.xy / vec2f(f32(uniforms.width - 1u), f32(uniforms.height - 1u));
// Map `uv` from y-down (normalized) viewport coordinates to camera coordinates.
uv = (2. * uv - vec2(1.)) * vec2(aspect_ratio, -1.);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
let direction = vec3(uv, -focus_distance);
let ray = Ray(origin, direction);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
...
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [camera-ray-direction]: [shaders.wgsl] Deriving the camera ray origin and direction]
We finally have our camera ray. Initially we can make all rays hit the sky which will act as
the light source. We can make the sky appear a little more realistic by painting it with a
gradient that blends from blue to white as the $y$ coordinate of the ray's direction decreases.
We'll first map the $y$ coordinate to the $[0,1]$ range and use that value to linearly interpolate
between the two colors using the blend equation:
$$ \mathit{blendedValue} = (1-a)\cdot\mathit{startValue} + a\cdot\mathit{endValue} $$
Let's introduce a function called `sky_color` to compute this for a given ray and return that as the
fragment color. I used the same colors as RTIOW but you can use different ones:[^ch5-footnote2]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
...
struct Ray {
origin: vec3f,
direction: vec3f,
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
fn sky_color(ray: Ray) -> vec3f {
let t = 0.5 * (normalize(ray.direction).y + 1.);
return (1. - t) * vec3(1.) + t * vec3(0.3, 0.5, 1.);
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
...
@fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f {
let origin = vec3(0.);
let focus_distance = 1.;
let aspect_ratio = f32(uniforms.width) / f32(uniforms.height);
// Map `pos` from y-down viewport coordinates to camera viewport plane coordinates.
var uv = pos.xy / vec2f(f32(uniforms.width - 1u), f32(uniforms.height - 1u));
uv = (2. * uv - vec2(1.)) * vec2(aspect_ratio, -1.);
let direction = vec3(uv, -focus_distance);
let ray = Ray(origin, direction);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
return vec4(sky_color(ray), 1.);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [sky-color]: [shaders.wgsl] Painting the sky with a gradient]
Running the program now should produce an image that looks like this:
![Figure [sky]: Ray tracing the sky](../images/img-06-sky-gradient.png)
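The blend equation is easy to verify on the CPU. Here is a small Rust mirror of `sky_color`, written as a sketch with array-based vectors and a hypothetical `lerp` helper standing in for the blend equation:

```rust
/// The blend equation: (1 - a) * start + a * end.
fn lerp(start: f32, end: f32, a: f32) -> f32 {
    (1.0 - a) * start + a * end
}

/// CPU-side mirror of the shader's `sky_color`, taking the ray direction.
fn sky_color(direction: [f32; 3]) -> [f32; 3] {
    let len = (direction[0] * direction[0]
        + direction[1] * direction[1]
        + direction[2] * direction[2])
        .sqrt();
    // Map the y coordinate of the unit direction from [-1, 1] to [0, 1].
    let t = 0.5 * (direction[1] / len + 1.0);
    let white: [f32; 3] = [1.0, 1.0, 1.0];
    let blue: [f32; 3] = [0.3, 0.5, 1.0];
    [
        lerp(white[0], blue[0], t),
        lerp(white[1], blue[1], t),
        lerp(white[2], blue[2], t),
    ]
}
```

A ray pointing straight up gets the full blue, a ray pointing straight down gets white, and a horizontal ray lands halfway between the two.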
[^ch5-footnote1]: The choice of a right-handed vs left-handed system is really up to you - you can
pick any relative orientation for the major axes that you want, as long as you stay consistent.
[^ch5-footnote2]: Interpolating from blue towards a reddish color instead of pure white can resemble
twilight. Give `vec3(1., 0.5, 0.3)` a try.
Ray-Sphere Intersection
-----------------------
It's time to introduce objects to the scene. We'll start with a sphere since it has a simple
implicit form and querying for intersections between a ray and a sphere is straightforward. I'll
quickly go over the mathematics of the intersection function that we are going to implement:
Let's define a sphere by its center point $\mathbf{C}$ and its radius $r$. Then, any point
$\mathbf{X}$ on the surface of the sphere can be described by the equation[^ch5-footnote3]
$$ (\mathbf{X} - \mathbf{C}) \cdot (\mathbf{X} - \mathbf{C}) = r^2 $$
We want to determine if there is a point along the ray that satisfies this equation. Substituting
our ray equation for $\mathbf{X}$ we get:
$$ (\mathbf{P} + t\mathbf{d} - \mathbf{C}) \cdot (\mathbf{P} + t\mathbf{d} - \mathbf{C}) = r^2 $$
Now we need to solve for $t$. To simplify things, let's substitute $\mathbf{v}$ for $(\mathbf{P} -
\mathbf{C})$. After expanding the dot product and rearranging the terms we get
$$ (\mathbf{d} \cdot \mathbf{d}) t^2 + 2 (\mathbf{v} \cdot \mathbf{d}) t +
(\mathbf{v} \cdot \mathbf{v}) - r^2 = 0 $$
This is now in a canonical form for a quadratic equation: $at^2 + 2bt + c = 0$ and the solutions
for $t$ are given by
$$ t = \dfrac{-b \pm\sqrt{b^2 - ac}}{a} $$
with $a = \mathbf{d}\cdot\mathbf{d}$, $b = (\mathbf{P}-\mathbf{C})\cdot\mathbf{d}$, and
$c = (\mathbf{P}-\mathbf{C})\cdot(\mathbf{P}-\mathbf{C}) - r^2$. The value of the discriminant
$b^2 - ac$ determines the number of solutions. If the discriminant is negative, then there
are no real solutions and thus no intersection. If the discriminant is exactly 0, then there is one
real solution where the ray tangentially intersects the sphere at that point. If the discriminant is
positive, then there are two real solutions and thus two potential intersections that we need to
consider.
![Figure [ray-sphere-solutions]: Different cases of ray-sphere intersection
](../images/fig-06-ray-sphere-solutions.svg)
We are looking for the first visible surface in the ray's "line of sight", so when there are two
possible intersections it makes sense to choose the one that's closer to the ray's origin and lies
in front of it. If the closer result is negative (i.e. it's located _behind_ the origin relative to
the ray direction), we can discard it and choose the other one. If that one is non-negative, then
the ray origin is inside the sphere, so the intersection is valid. If both results are negative,
then the sphere is "behind" the ray.
$t$ is $0$ when the ray origin is on the surface. In general, rays that start exactly on the
surface of an object will be rays that trace the paths of light arriving at that surface. We
generally don't want such a ray to intersect the geometry that the ray originates from, so for
simplicity let's only consider positive values of $t$ as a valid intersection.
Let's define a new function called `intersect_sphere`. This function will return the smaller
positive solution for $t$ if there is an intersection and a non-positive value if the ray misses the
sphere. Let's also define a new type called `Sphere` to represent the object:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
struct Sphere {
center: vec3f,
radius: f32,
}
fn intersect_sphere(ray: Ray, sphere: Sphere) -> f32 {
let v = ray.origin - sphere.center;
let a = dot(ray.direction, ray.direction);
let b = dot(v, ray.direction);
let c = dot(v, v) - sphere.radius * sphere.radius;
let d = b * b - a * c;
if d < 0. {
return -1.;
}
let sqrt_d = sqrt(d);
let recip_a = 1. / a;
let mb = -b;
let t = (mb - sqrt_d) * recip_a;
if t > 0. {
return t;
}
return (mb + sqrt_d) * recip_a;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
struct Ray {
origin: vec3f,
direction: vec3f,
}
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [ray-sphere-intersection]: [shaders.wgsl] The `intersect_sphere` function]
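To check the math with concrete numbers, here is a CPU-side Rust mirror of `intersect_sphere`. It is a sketch under assumptions: vectors are plain arrays, and the ray and sphere are passed as separate arguments rather than structs.

```rust
fn dot(a: [f32; 3], b: [f32; 3]) -> f32 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

/// Returns the smallest positive `t` at which the ray hits the sphere,
/// or a non-positive value if there is no valid intersection.
fn intersect_sphere(
    origin: [f32; 3],
    direction: [f32; 3],
    center: [f32; 3],
    radius: f32,
) -> f32 {
    let v = [
        origin[0] - center[0],
        origin[1] - center[1],
        origin[2] - center[2],
    ];
    let a = dot(direction, direction);
    let b = dot(v, direction);
    let c = dot(v, v) - radius * radius;
    // Discriminant of a*t^2 + 2*b*t + c = 0.
    let d = b * b - a * c;
    if d < 0.0 {
        return -1.0;
    }
    let sqrt_d = d.sqrt();
    // Prefer the nearer root; fall back to the farther one.
    let t = (-b - sqrt_d) / a;
    if t > 0.0 {
        return t;
    }
    (-b + sqrt_d) / a
}
```

For a ray from the origin along $-z$ and a sphere of radius $0.5$ centered at $(0, 0, -1)$, we get $a = 1$, $b = -1$, $c = 0.75$ and a discriminant of $0.25$, so the nearer root is $t = 0.5$. Moving the ray origin to the center of the sphere makes the nearer root negative and the function falls back to the farther root, which is also $0.5$.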
Let's now add a single sphere to the scene. First we'll test the sphere for an intersection with
the camera ray. If there is a hit, then we'll return a solid color for the pixel.
If not, we'll return the color of the sky as before. Let's also make sure that the sphere is far
enough away from the view origin so that the camera doesn't fall inside the sphere:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
...
@fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f {
let origin = vec3(0.);
let focus_distance = 1.;
let aspect_ratio = f32(uniforms.width) / f32(uniforms.height);
// Map `pos` from y-down viewport coordinates to camera viewport plane coordinates.
var uv = pos.xy / vec2f(f32(uniforms.width - 1u), f32(uniforms.height - 1u));
uv = (2. * uv - vec2(1.)) * vec2(aspect_ratio, -1.);
let direction = vec3(uv, -focus_distance);
let ray = Ray(origin, direction);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
let sphere = Sphere(/*center*/ vec3(0., 0., -1), /*radius*/ 0.5);
if intersect_sphere(ray, sphere) > 0. {
return vec4(1., 0.76, 0.03, 1.);
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
return vec4(sky_color(ray), 1.);
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [single-sphere]: [shaders.wgsl] First intersection test]
This should render a solid circle that looks like this:
![Figure [yellow-circle]: A solid circle](../images/img-07-solid-circle.png)
[^ch5-footnote3]: This equation has an intuitive geometric interpretation. $\mathbf{X} - \mathbf{C}$
describes a vector that spans from the center of the sphere to its surface. We know that the
magnitude of this vector must be equal to $r$. The dot product of a vector with itself yields the
square of its magnitude (as $V \cdot V = V_x^2 + V_y^2 + V_z^2)$ which, in this case, must be equal
to $r^2$.
Shading Multiple Spheres
------------------------
Now that our sphere intersection code is working, we'll next generalize the ray casting logic to
look for intersections in a _scene_ containing multiple objects. We can initially represent the
scene as an array of spheres. We'll change the code to test all spheres for a possible hit and use
the closest intersection to color the pixel. As before, we'll use the $t$ parameter to determine the
nearest hit.
Let's declare the scene with a second (large) sphere that serves as the "ground" where our first
sphere will sit. We can declare the array as a private global, like we did for the vertices of the
full-screen quad:
The scene traversal code is straightforward: loop through the scene array and keep track of the
closest $t$ value that results from calling `intersect_sphere` on each element. It makes sense to
initialize $t$ with a value that is larger than all other possible values. Since we're dealing with
floating-point numbers, _infinity_ is a suitable initial value. However, since WGSL doesn't quite
support infinities[^ch5-footnote4], I'll use the largest representable `f32` value as a substitute:
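The bookkeeping described here can be sketched on the CPU side in Rust. This is only an illustration under assumptions: `closest_positive_t` is a hypothetical helper that takes already-computed $t$ values instead of calling `intersect_sphere`, and `f32::MAX` plays the role of the largest representable `f32` value.

```rust
/// Returns the smallest positive value in `ts`, or `f32::MAX` if none of
/// the values is positive (i.e. every object was missed).
fn closest_positive_t(ts: &[f32]) -> f32 {
    // Start with a sentinel that is larger than any valid hit distance.
    let mut closest_t = f32::MAX;
    for &t in ts {
        if t > 0.0 && t < closest_t {
            closest_t = t;
        }
    }
    closest_t
}
```

A result that is still equal to the sentinel after the loop means that the ray missed everything, which is the case where we fall back to the sky color.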
This should result in the following image:
![Figure [yellow-circles]: Two solid circles](../images/img-08-two-solid-circles.png)
Both spheres are visible where we expect them. Since we're painting both objects with the same
solid color, it's not possible to tell if our code works correctly for the bottom half of the top
sphere where the ray intersects both objects.
An easy way to improve this is to assign each object a different solid color and use that to paint
the pixel. I'm going to do something different: I'll scale the color by the value of `closest_t`
such that intersections that are further away from the origin are shaded darker compared to those
that are closer. This will convey the _depth_ of the shaded object with respect to our virtual
camera.
We can achieve this by multiplying the color by a factor of $1 - t$ which will keep the color bright
for smaller values of $t$ (representing closer intersections) and darken it as $t$ grows. I'll use
the [**`saturate`**](https://www.w3.org/TR/WGSL/#saturate-float-builtin) built-in function to clamp
the resulting value to the $[0, 1]$ range so that values of $t$ that are larger than $1$ will be
shaded black:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
...
@fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f {
let origin = vec3(0.);
let focus_distance = 1.;
let aspect_ratio = f32(uniforms.width) / f32(uniforms.height);
// Map `pos` from y-down viewport coordinates to camera viewport plane coordinates.
var uv = pos.xy / vec2f(f32(uniforms.width - 1u), f32(uniforms.height - 1u));
uv = (2. * uv - vec2(1.)) * vec2(aspect_ratio, -1.);
let direction = vec3(uv, -focus_distance);
let ray = Ray(origin, direction);
var closest_t = FLT_MAX;
for (var i = 0u; i < OBJECT_COUNT; i += 1u) {
let t = intersect_sphere(ray, scene[i]);
if t > 0. && t < closest_t {
closest_t = t;
}
}
if closest_t < FLT_MAX {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
return vec4(1., 0.76, 0.03, 1.) * saturate(1. - closest_t);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
}
return vec4(sky_color(ray), 1.);
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [shading-using-depth]: [shaders.wgsl] Shading using depth]
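The effect of the $1 - t$ factor is easy to tabulate on the CPU. Below is a small Rust sketch; `saturate` mirrors the WGSL built-in for scalars, while `shade_by_depth` is a hypothetical name for the expression used in the listing.

```rust
/// Clamps `x` to the [0, 1] range, like WGSL's `saturate`.
fn saturate(x: f32) -> f32 {
    x.max(0.0).min(1.0)
}

/// Scales all four components of `color` by `saturate(1 - t)`, as the
/// shader expression does (note that this includes the alpha channel).
fn shade_by_depth(color: [f32; 4], t: f32) -> [f32; 4] {
    let k = saturate(1.0 - t);
    [color[0] * k, color[1] * k, color[2] * k, color[3] * k]
}
```

An intersection at $t = 0.5$ keeps half of the color's brightness, while any intersection at $t \geq 1$ comes out black.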
This should make the objects' order of visibility and their spherical shape more apparent:
![Figure [depth-shaded-spheres]: Spheres shaded by depth](../images/img-09-depth-shaded-spheres.png)
Both spheres appear quite dark and the bottom sphere fades to black where it meets the one on top.
This makes sense since the center of the top sphere is exactly where $t = 1$. You can play with
different ways to convert `closest_t` to a color. Here is a version that paints the scene gray and
brighter with increasing depth:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
...
if closest_t < FLT_MAX {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
return vec4(saturate(closest_t) * 0.5);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
}
return vec4(sky_color(ray), 1.);
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [shading-using-depth-alt]: [shaders.wgsl] Another way to shade with depth]
![Figure [depth-shaded-spheres-gray]: Spheres shaded by depth (gray)
](../images/img-10-depth-shaded-spheres-gray.png)
[^ch5-footnote4]: [WGSL W3C Working Draft, §14.6. Floating Point Evaluation](https://www.w3.org/TR/WGSL/#floating-point-evaluation)
states that "_Overflow, infinities, and NaNs generated before runtime are errors_" and "_[compiler]
implementations may assume that overflow, infinities, and NaNs are not present at runtime._"
Surface Normals
---------------
Shading using depth can serve as a great debugging tool as well as the basis for various visual
effects. However, we need to know more about the surface geometry in order to color it with a
lighting model. This includes its orientation with respect to our viewing direction and the rest of
the scene, which is given by its _normal vector_.
For any point on a surface, the normal $\vec{\mathbf{N}}$ is the vector that is perpendicular to the
plane tangent to the surface at that point.
![Figure [normal-vector]: The normal vector](../images/fig-07-normal-vector.svg)
The orientation of the normal vector depends on both the type of geometry as well as the specific
point of intersection. First we're going to make some assumptions that will come into play later
when we implement materials:
1. Every surface has a _front_ face and a _back_ face and the direction of the normal vector lines
up with the front face.
2. All normal vectors have a unit length by default.
The normal vector at point $\mathbf{X}$ on the surface of a sphere with center $\mathbf{C}$ and
radius $r$ is simply given by
$$ \vec{\mathbf{N}} = \dfrac{\mathbf{X} - \mathbf{C}}{||\mathbf{X} - \mathbf{C}||}
= \dfrac{\mathbf{X} - \mathbf{C}}{r} $$
![Figure [sphere-normal]: Computing the normal on a sphere
](../images/fig-08-sphere-normal.svg)
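As a quick numeric check of this formula, here is a CPU-side Rust sketch. `sphere_normal` is a hypothetical helper and it assumes that the point it is given really lies on the surface of the sphere; otherwise the result will not have unit length.

```rust
/// Computes the unit normal at `point`, which is assumed to lie on the
/// surface of the sphere, by dividing (X - C) by the radius.
fn sphere_normal(point: [f32; 3], center: [f32; 3], radius: f32) -> [f32; 3] {
    [
        (point[0] - center[0]) / radius,
        (point[1] - center[1]) / radius,
        (point[2] - center[2]) / radius,
    ]
}
```

For a sphere of radius $0.5$ centered at $(0, 0, -1)$, the surface point $(0, 0, -0.5)$ that faces the camera has the normal $(0, 0, 1)$. Dividing by $r$ avoids the square root that a general-purpose normalization would need.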
Now that we know how to compute the normal, let's change `intersect_sphere` to return a normal
vector alongside the $t$ parameter. We'll introduce a struct called `Intersection` that bundles
them together:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
...
@group(0) @binding(0) var<uniform> uniforms: Uniforms;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
struct Intersection {
normal: vec3f,
t: f32,
}
fn no_intersection() -> Intersection {
return Intersection(vec3(0.), -1.);
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
struct Sphere {
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
fn intersect_sphere(ray: Ray, sphere: Sphere) -> Intersection {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
let v = ray.origin - sphere.center;
let a = dot(ray.direction, ray.direction);
let b = dot(v, ray.direction);
let c = dot(v, v) - sphere.radius * sphere.radius;
let d = b * b - a * c;
if d < 0. {
return no_intersection();
}
let sqrt_d = sqrt(d);
let recip_a = 1. / a;
let mb = -b;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
let t1 = (mb - sqrt_d) * recip_a;
let t2 = (mb + sqrt_d) * recip_a;
let t = select(t2, t1, t1 > 0.);
if t <= 0. {
return no_intersection();
}
let p = point_on_ray(ray, t);
let N = (p - sphere.center) / sphere.radius;
return Intersection(N, t);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
}
struct Ray {
origin: vec3f,
direction: vec3f,
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
fn point_on_ray(ray: Ray, t: f32) -> vec3f {
return ray.origin + t * ray.direction;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
fn sky_color(ray: Ray) -> vec3f {
let t = 0.5 * (normalize(ray.direction).y + 1.);
return (1. - t) * vec3(1.) + t * vec3(0.3, 0.5, 1.);
}
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [intersection-struct]: [shaders.wgsl] A struct for intersection data]
Let's talk about some of the changes. We added a helper function called `no_intersection()` that
returns an `Intersection` representing a null result. We also declared a function called
`point_on_ray`, which returns the coordinates of a point along a ray at a known $t$ value.
You may have noticed that the if statement which used to be conditioned on `t > 0.` is now a
function call to _select_. [**`select`**](https://www.w3.org/TR/WGSL/#select-builtin) evaluates to
either its first or second argument depending on the value of the third. The call
`select(t2, t1, t1 > 0.)` is functionally equivalent to `t1 > 0. ? t1 : t2` (a ternary expression)
in C/C++, with one exception: there is no guarantee of short-circuiting, meaning that both `t1` and
`t2` will be evaluated regardless of the conditional. You may be tempted to rewrite this as an if
statement (why needlessly evaluate both branches after all?):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
var t = (mb - sqrt_d) * recip_a;
if t <= 0. {
t = (mb + sqrt_d) * recip_a;
}
if t <= 0. {
return no_intersection();
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [branchy-version]: [shaders.wgsl] Branchy version]
This is perfectly fine and will behave in the same way. In fact, it's possible that this will
compile down to the exact same GPU instructions as the version with `select`. GPUs are generally not
good at handling conditional branches in code without sacrificing some amount of parallelism (though
this depends on several factors). A good shader compiler will often eliminate branches altogether
for simple conditionals like these. Writing efficient GPU code requires a good understanding of how
GPUs deal with divergent control flow -- a topic that we will discuss more later on.
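The argument order of `select` is easy to get backwards, so here is a small CPU-side Rust sketch that places the two variants side by side. The `select` function below is a hand-written stand-in for the WGSL built-in (scalar `f32` only), and the two `nearest_root_*` names are hypothetical.

```rust
/// Stand-in for WGSL's `select(f, t, cond)`: yields `t` when `cond` is
/// true and `f` otherwise. Both `f` and `t` are evaluated by the caller.
fn select(f: f32, t: f32, cond: bool) -> f32 {
    if cond { t } else { f }
}

/// Picks `t1` if it is positive and `t2` otherwise, using `select`.
fn nearest_root_select(t1: f32, t2: f32) -> f32 {
    select(t2, t1, t1 > 0.0)
}

/// The same choice written as a branch.
fn nearest_root_branch(t1: f32, t2: f32) -> f32 {
    let mut t = t1;
    if t <= 0.0 {
        t = t2;
    }
    t
}
```

The two versions agree for all ordinary inputs. They would only differ if `t1` were NaN, which is a case the shader does not expect to encounter.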
Let's update the fragment shader to make use of the new data structure. Let's also change our
shading code to visualize the normal vector by mapping the coordinates (from the $[-1, 1]$ range)
to a color value (in the $[0, 1]$ range):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
...
@fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f {
let origin = vec3(0.);
let focus_distance = 1.;
let aspect_ratio = f32(uniforms.width) / f32(uniforms.height);
// Normalize the viewport coordinates.
var uv = pos.xy / vec2f(f32(uniforms.width - 1u), f32(uniforms.height - 1u));
// Map `uv` from y-down (normalized) viewport coordinates to camera coordinates.
uv = (2. * uv - vec2(1.)) * vec2(aspect_ratio, -1.);
let direction = vec3(uv, -focus_distance);
let ray = Ray(origin, direction);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
var closest_hit = Intersection(vec3(0.), FLT_MAX);
for (var i = 0u; i < OBJECT_COUNT; i += 1u) {
let hit = intersect_sphere(ray, scene[i]);
if hit.t > 0. && hit.t < closest_hit.t {
closest_hit = hit;
}
}
if closest_hit.t < FLT_MAX {
return vec4(0.5 * closest_hit.normal + vec3(0.5), 1.);
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
return vec4(sky_color(ray), 1.);
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [shading-with-normals]: [shaders.wgsl] Shading with normals]
and we get:
![Figure [normal-shaded-spheres]: Visualizing surface normals
](../images/img-11-normal-shaded-spheres.png)
Notice how each color channel maps directly to one of the major axis coordinates, so that normals
pointing towards the $+x$ direction get shaded with a higher _red_ component, normals pointing
straight up towards $+y$ appear green, and so on.
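To see which colors to expect for a few specific normals, the mapping can be reproduced on the CPU. This is a sketch in Rust with a hypothetical `normal_to_color` helper that applies the same $0.5 \cdot n + 0.5$ transformation per component:

```rust
/// Maps each component of a unit normal from [-1, 1] to [0, 1] so that it
/// can be displayed as an RGB color.
fn normal_to_color(normal: [f32; 3]) -> [f32; 3] {
    [
        0.5 * normal[0] + 0.5,
        0.5 * normal[1] + 0.5,
        0.5 * normal[2] + 0.5,
    ]
}
```

A normal along $+x$ maps to $(1, 0.5, 0.5)$ and one along $+y$ maps to $(0.5, 1, 0.5)$. A normal along $-z$ (facing away from the camera) maps to $(0.5, 0.5, 0)$, which has no blue at all.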
Temporal Accumulation
====================================================================================================
Over the next two chapters we are going to focus on two important features of the renderer:
antialiasing and path tracing. These are both sampling problems in essence: they try to estimate
some continuous signal (in this case the light flowing out of the scene into the pixels of our
virtual camera) by repeatedly sampling various discrete light paths. Once a sufficient number of
samples have been collected, we hope that their average will converge to the real signal -- or at
least get close enough.[^ch6-footnote1]
How many samples do we need to collect for each pixel before displaying the result? How can we
structure the code to achieve some amount of interactivity?
The answer to the first question depends highly on the scene but the sample count we are looking at
is possibly in the hundreds if not _thousands_. One option is to add a loop to our fragment shader
that intersects the scene with camera rays thousands of times before returning the final color,
though it will take a long time before we can display a frame. Path tracing is computationally
_very_ expensive, even for a GPU.
I'm going to suggest an alternative approach: spread the sample collection across many frames.
An invocation of the pipeline will output 1 sample per pixel (as it currently does) but rather
than outputting the samples directly to the display surface, we'll accumulate them in a texture over
time.
This approach has the nice benefit that we can present the contents of the texture to the display
as soon as a pipeline invocation completes, allowing us to watch as the image resolves to the final
rendering.
[^ch6-footnote1]: This is referred to as the
[_Law of Large Numbers_](https://en.wikipedia.org/wiki/Law_of_large_numbers) in probability theory.
The Monte Carlo method employed in path tracing is an example of this (and we'll talk more about it
in the next chapter).
Frame Count
-----------
The arithmetic average of a set of samples is simply given by their sum divided by the sample count.
In other words, given $N$ samples $x_1, \dots, x_N$ of a random variable $x$, the average is given by
$$ \dfrac{1}{N}\sum_{i=1}^N x_i $$
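As a small CPU-side illustration of this formula, here is a minimal sketch that keeps a running sum and divides it by the number of samples seen so far. The function names (`accumulate`, `batch_average`) are made up for this example and are not part of the renderer.

```rust
/// Adds one new sample to a running sum and returns the current average.
/// `count` is the number of samples collected so far, including the new one.
fn accumulate(sum: &mut f32, sample: f32, count: u32) -> f32 {
    *sum += sample;
    *sum / count as f32
}

/// Averages a whole slice of samples in one go, for comparison.
/// (An empty slice yields NaN; this sketch does not guard against that.)
fn batch_average(samples: &[f32]) -> f32 {
    samples.iter().sum::<f32>() / samples.len() as f32
}
```

Feeding the samples in one at a time ends up at the same value as averaging them all at once, which is what lets us spread the work over many frames.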
Since we are going to distribute the samples across rendered frames, for any given frame, $N$ is
equal to the number of frames we have rendered up to that point plus $1$. We can represent this as a
simple counter that we increment every time `render_frame` gets called. We'll also define a uniform
variable for the frame count so that our shader program can access it when it needs to compute the
average:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
...
struct Uniforms {
width: u32,
height: u32,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
frame_count: u32,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
}
@group(0) @binding(0) var<uniform> uniforms: Uniforms;
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [frame-count-gpu]: [shaders.wgsl] The `frame_count` uniform declaration]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...
#[derive(Copy, Clone, Pod, Zeroable)]
#[repr(C)]
struct Uniforms {
width: u32,
height: u32,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
frame_count: u32,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
}
impl PathTracer {
pub fn new(device: wgpu::Device, queue: wgpu::Queue) -> PathTracer {
device.on_uncaptured_error(Box::new(|error| {
panic!("Aborting due to an error: {}", error);
}));
let shader_module = compile_shader_module(&device);
let (display_pipeline, display_layout) =
create_display_pipeline(&device, &shader_module);
// Initialize the uniform buffer.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
let uniforms = Uniforms {
width: 800,
height: 600,
frame_count: 0,
};
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
let uniform_buffer = device.create_buffer(&wgpu::BufferDescriptor {
...
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust delete
pub fn render_frame(&self, target: &wgpu::TextureView) {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
pub fn render_frame(&mut self, target: &wgpu::TextureView) {
self.uniforms.frame_count += 1;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [frame-count-cpu]: [render.rs] Initializing the `frame_count` uniform]
We declared `frame_count` as a 32-bit unsigned integer, which is supported by all shading languages.
This will inevitably overflow if you leave the application running for a long time but I'm not too
worried. Consider this: if you have a powerful graphics card that can render frames at 1000 fps, it
will take approximately 50 days for the count to reach the maximum representable `u32` value
($2^{32}-1$). This is not perfect but also not a huge issue for us.[^ch6-footnote2] Note that we
also changed `render_frame` to take a `&mut self` since it now mutates a member of the `PathTracer`
type. We also need to update the call site and declare the `PathTracer` instance as mutable to make
the compiler happy:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...
#[pollster::main]
async fn main() -> Result<()> {
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust delete
let renderer = render::PathTracer::new(device, queue);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
let mut renderer = render::PathTracer::new(device, queue);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...
}
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [render_frame-call]: [main.rs] Rendering to a surface texture]
We are now maintaining a count on the CPU but we still need to make sure that the changes are
mirrored on the GPU side by writing the contents of `self.uniforms` to `self.uniform_buffer`. Since
we are modifying `self.uniforms` every frame, we should also update the contents of the GPU buffer
every frame. This is where things can get a little complicated.
[^ch6-footnote2]: Rust has overflow checks enabled in debug builds, so the program will always panic
(i.e. assert and crash) on overflow. In release builds, the checks are disabled by default and Rust
performs two's complement wrapping (see the
[docs](https://doc.rust-lang.org/book/ch03-02-data-types.html#integer-overflow)). If you don't care
about the runtime cost and want to play it safe, you can use one of the explicit arithmetic methods
provided by the standard library. For example, the following will always panic in the case of an
overflow: `self.uniforms.frame_count = self.uniforms.frame_count.checked_add(1).unwrap()`.
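To make the overflow discussion a bit more concrete, here is a small standalone sketch. It estimates how long a `u32` counter lasts at a given frame rate and shows two of the explicit arithmetic methods from the standard library. The helper names are hypothetical and only exist for this example.

```rust
/// Number of whole days a `u32` frame counter lasts at a given frame rate
/// before reaching `u32::MAX`.
fn days_until_overflow(frames_per_second: u64) -> u64 {
    const SECONDS_PER_DAY: u64 = 24 * 60 * 60;
    u32::MAX as u64 / frames_per_second / SECONDS_PER_DAY
}

/// Increment that wraps back to 0 on overflow in every build profile.
fn next_frame_wrapping(frame_count: u32) -> u32 {
    frame_count.wrapping_add(1)
}

/// Increment that reports overflow as `None` instead of panicking or wrapping.
fn next_frame_checked(frame_count: u32) -> Option<u32> {
    frame_count.checked_add(1)
}
```

At 1000 fps this gives 49 whole days, in line with the estimate above. Which increment to prefer is a judgment call: `wrapping_add` makes the release-build behavior explicit, while `checked_add` lets the caller decide what to do when the counter runs out.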
### Buffer Updates
There are some things to consider when modifying the contents of a GPU buffer. The first
is the type of memory the buffer resides in. GPUs typically come in two flavors: a _discrete_ GPU
(such as a desktop graphics card) has its own dedicated memory and connects to the CPU via a
peripheral bus that facilitates memory transfers between the two processors. In a _unified_
architecture, the GPU and the CPU are integrated into the same die and can share system memory
without an explicit memory transfer.
Before any writes can occur, the CPU side must have access to a region of memory that's mapped to
its address space. How the written data is made available to the GPU side very much depends on the
hardware and the functions provided by the graphics API. For example, both Metal and Vulkan support
buffer types that are backed by shared system memory and can be permanently mapped on a unified
architecture. Similarly, both APIs provide facilities to transfer buffer data to GPU memory when
fast shared memory isn't supported.[^ch6-footnote3]
Another consideration is around synchronization. Suppose that we changed our renderer to allow
multiple frames to be in flight without gating the GPU submissions on v-sync.[^ch6-footnote4] We
would need to avoid making any changes to the uniform buffer while a GPU submission is in progress,
as that could cause a data race. There are different ways to handle this depending on the API,
such as double or triple buffering when using a persistently mapped buffer or using synchronization
primitives like memory fences.
If you're following this book using a native API (like Metal, Vulkan, D3D, CUDA, etc), please
consult its documentation for the best approach for frequent buffer updates on your GPU.
WebGPU tries to provide a common abstraction over these nuances while working within
additional constraints imposed by a web browser environment.[^ch6-footnote5]
As a result, WebGPU imposes some strict limitations on how buffer mapping works:
* A buffer must have the [`MAP_WRITE`](https://www.w3.org/TR/webgpu/#dom-gpubufferusage-map_write)
usage for the CPU side to map and write its contents and this usage can only be combined with the
[`COPY_SRC`](https://www.w3.org/TR/webgpu/#dom-gpubufferusage-copy_src) usage. This means that
a buffer we map for writing cannot be bound as a shader resource (such as a uniform buffer) and
instead serves as a _staging buffer_ for a _copy command_. Updating the contents of a buffer
is only possible by issuing a copy from this intermediate staging buffer.
* Buffers can only be mapped asynchronously and there is no synchronous way to map a buffer
_except_ when first created (using the `mapped_at_creation` field in the buffer descriptor). This
requires some careful coordination so that buffers are mapped and available for writing when we
need to update them.
This immediately rules out shared memory buffers so we have to issue a copy. The easiest way would
be to create a new staging buffer on every update and set `mapped_at_creation` to `true`, but
allocating a new short-lived buffer every frame can be expensive and we should strive to reuse GPU
buffers when we can. A staging buffer has to be unmapped before the GPU can use it as a copy source,
so we need to re-map it before we can write to it again. A buffer can only get re-mapped
asynchronously, so we may need to allocate another staging buffer if `render_frame` ever gets called
before the asynchronous mapping of the first staging buffer has completed.
One possible approach is to maintain a pool of staging buffers. Each of these is a
`wgpu::Buffer` object with the `MAP_WRITE` and `COPY_SRC` usages and mapped at creation.
When it's time to update the uniform buffer, we do the following:
1. Find a large enough staging buffer in the pool (or create a new one if not found). Assume the
buffer is mapped and write its contents.
2. Unmap the buffer and move it to a "pending buffer" list. Then, encode a
["copy buffer to buffer"](https://www.w3.org/TR/webgpu/#dom-gpucommandencoder-copybuffertobuffer)
command with the staging buffer as the source and the uniform buffer as the destination.
3. After submitting the command buffer, call
["map async"](https://www.w3.org/TR/webgpu/#dom-gpubuffer-mapasync) on all buffers in the pending
list. The [wgpu implementation of map async](https://docs.rs/wgpu/latest/wgpu/struct.BufferSlice.html#method.map_async)
reports its completion in a callback (which runs asynchronously), so the callback can be
responsible for removing the buffer from the pending list and adding it back to the mapped
staging buffer pool.
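The three steps above can be sketched as a tiny state machine. This is a CPU-only simulation under stated assumptions: `StagingBuffer` and `StagingPool` are hypothetical stand-ins that only track buffer sizes and states. They make no wgpu calls, and the completion callback is simulated by calling `on_mapped` by hand.

```rust
/// Hypothetical stand-in for a GPU staging buffer; it only tracks a size.
#[derive(Debug, Clone, PartialEq)]
struct StagingBuffer {
    id: usize,
    size: usize,
}

/// Hypothetical pool mirroring the three steps: acquire a mapped buffer,
/// park it as pending once unmapped, and recycle it when re-mapping completes.
#[derive(Default)]
struct StagingPool {
    mapped: Vec<StagingBuffer>,
    pending: Vec<StagingBuffer>,
    next_id: usize,
}

impl StagingPool {
    /// Step 1: find a mapped buffer that is large enough, or create one.
    fn acquire(&mut self, size: usize) -> StagingBuffer {
        if let Some(index) = self.mapped.iter().position(|b| b.size >= size) {
            return self.mapped.swap_remove(index);
        }
        let buffer = StagingBuffer { id: self.next_id, size };
        self.next_id += 1;
        buffer
    }

    /// Step 2: the buffer was written and unmapped; keep it in the pending
    /// list until it has been mapped again.
    fn mark_pending(&mut self, buffer: StagingBuffer) {
        self.pending.push(buffer);
    }

    /// Step 3: called from the (simulated) map-async completion callback.
    fn on_mapped(&mut self, id: usize) {
        if let Some(index) = self.pending.iter().position(|b| b.id == id) {
            let buffer = self.pending.swap_remove(index);
            self.mapped.push(buffer);
        }
    }
}
```

Note how a second buffer gets created whenever `acquire` is called while the first one is still pending. That is the case where `render_frame` runs before the asynchronous mapping has completed.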
This is a relatively simple state machine but fortunately there is a method that boils
all of that down to a single API call:
[`wgpu::Queue::write_buffer`](https://docs.rs/wgpu/latest/wgpu/struct.Queue.html#method.write_buffer)[^ch6-footnote6].
This simplifies the code quite a bit so let's use it instead of implementing a buffer pool.
`write_buffer` achieves the same thing while leaving it up to wgpu to choose the most
efficient way to transfer the data on the host platform.
As for synchronization, everything gets internally handled by wgpu so there isn't anything special
we need to do. As long as we call `write_buffer` before encoding any other GPU commands referencing
the copy destination (i.e. our uniform buffer) on the same queue, the copy is guaranteed
to complete before the shader runs and reads from the buffer:
That's pretty much it. Since the new code always updates the uniform buffer before a GPU submission
we don't really need to initialize it at the start. Let's check that the code works by creating
a visual effect using `frame_count`. The count increases monotonically, so we can use it like a
"timestamp" and drive a simple animation. Here is a simple shader change that makes the spheres
shrink and expand:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
...
@fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f {
...
var closest_hit = Intersection(vec3(0.), FLT_MAX);
for (var i = 0u; i < OBJECT_COUNT; i += 1u) {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
var sphere = scene[i];
sphere.radius += sin(f32(uniforms.frame_count) * 0.02) * 0.2;
let hit = intersect_sphere(ray, sphere);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
if hit.t > 0. && hit.t < closest_hit.t {
closest_hit = hit;
}
}
if closest_hit.t < FLT_MAX {
return vec4(0.5 * closest_hit.normal + vec3(0.5), 1.);
}
return vec4(sky_color(ray), 1.);
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [animated-radius-shader]: [shaders.wgsl] Animating the sphere radii with `frame_count`]
This should create an effect like the one in this video (Figure 21):
![Figure [animated-radius]: (video) Spheres animated with frame count
](../images/vid-01-animated-radius.mp4 autoplay muted loop)
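If you are curious how fast the animation cycles, the shader expression can be mirrored on the CPU. This is only a sketch for checking the numbers. The function names are made up, and the constants `0.02` and `0.2` are copied from the shader change above.

```rust
/// Mirrors the shader expression `sin(f32(frame_count) * 0.02) * 0.2`.
fn radius_offset(frame_count: u32) -> f32 {
    (frame_count as f32 * 0.02).sin() * 0.2
}

/// Number of frames in one full shrink-and-expand cycle.
fn frames_per_cycle() -> f32 {
    2.0 * std::f32::consts::PI / 0.02
}
```

The radius changes by at most $0.2$ in either direction and one full cycle takes roughly 314 frames, which is a little over 5 seconds if your display refreshes at 60 Hz.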
[^ch6-footnote3]: Metal provides a
["managed"](https://developer.apple.com/documentation/metal/resource_fundamentals/synchronizing_a_managed_resource)
storage mode for these situations alongside a "private" storage mode for memory that is meant for
fast GPU-only access. Vulkan's memory abstraction also provides many similar low level
configurations.
[^ch6-footnote4]: This can be desirable on a high-end GPU that can render a single frame much faster
than the display refresh rate.
[^ch6-footnote5]: See [wgpu#1438](https://github.com/gfx-rs/wgpu/discussions/1438) for an
interesting discussion on the motivations behind the async-only buffer mapping API.
[^ch6-footnote6]: See the WebGPU specification for
[GPUQueue.writeBuffer](https://www.w3.org/TR/webgpu/#dom-gpuqueue-writebuffer).
Radiance Texture
----------------
The animation you just rendered is a type of computation that is spread over time (hence the word
_"temporal"_). We can use the same mechanism to compute running averages of per-pixel radiance
samples. _Radiance_ is a radiometric term that refers to the energy carried by light through space,
restricted to an instant in time, emanating from a unit patch of surface towards another. It is a
physical quantity that renderers often emulate to produce realistic stills. Following this model,
we'll pretend that every ray we cast measures some fraction of the radiance along its direction,
and rays will always originate from a surface in the scene and point in the direction of
another. The first rays all originate at a pixel (inside the virtual camera).[^ch6-footnote7]
On each frame, the program will compute one sample per pixel and add it to a per-pixel sum of samples.
When it's time to display the current sample average, we can divide the sum by `frame_count` and
output that to the surface. In order to achieve this, let's set aside a GPU texture to persist the
running sums across frames:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
use bytemuck::{Pod, Zeroable};
pub struct PathTracer {
device: wgpu::Device,
queue: wgpu::Queue,
uniforms: Uniforms,
uniform_buffer: wgpu::Buffer,
display_pipeline: wgpu::RenderPipeline,
}
#[derive(Copy, Clone, Pod, Zeroable)]
#[repr(C)]
struct Uniforms {
width: u32,
height: u32,
frame_count: u32,
}
impl PathTracer {
pub fn new(
device: wgpu::Device,
queue: wgpu::Queue,
width: u32,
height: u32,
) -> PathTracer {
device.on_uncaptured_error(Box::new(|error| {
panic!("Aborting due to an error: {}", error);
}));
let shader_module = compile_shader_module(&device);
let (display_pipeline, display_layout) =
create_display_pipeline(&device, &shader_module);
// Initialize the uniform buffer.
let uniforms = Uniforms {
width,
height,
frame_count: 0,
};
let uniform_buffer = device.create_buffer(&wgpu::BufferDescriptor {
label: Some("uniforms"),
size: std::mem::size_of::<Uniforms>() as u64,
usage: wgpu::BufferUsages::UNIFORM | wgpu::BufferUsages::COPY_DST,
mapped_at_creation: false,
});
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
let radiance_samples = create_sample_texture(&device, width, height);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
PathTracer {
device,
queue,
uniforms,
uniform_buffer,
display_pipeline,
}
}
...
}
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
fn create_sample_texture(device: &wgpu::Device, width: u32, height: u32) -> wgpu::Texture {
device.create_texture(&wgpu::TextureDescriptor {
label: Some("radiance samples"),
size: wgpu::Extent3d {
width,
height,
depth_or_array_layers: 1,
},
mip_level_count: 1,
sample_count: 1,
dimension: wgpu::TextureDimension::D2,
format: wgpu::TextureFormat::Rgba32Float,
usage: wgpu::TextureUsages::TEXTURE_BINDING | wgpu::TextureUsages::STORAGE_BINDING,
view_formats: &[],
})
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [radiance-samples-texture]: [render.rs] Radiance samples texture]
The texture has the same dimensions as the window surface, so that the resolution of the
rendered image matches what gets displayed. (Though, it's not uncommon to render at a lower
resolution and upsample that in order to save on computations.) The texture format is `Rgba32Float`,
which stores every pixel (or "texel") as four 32-bit floating point components (one for each of the
4 RGBA channels). This uses more memory than the 8-bit `Rgba8Unorm` format we used for the display
surface but provides sufficient precision to store very large sums of radiance samples on all color
channels.
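To get a feel for the memory cost, here is a rough back-of-the-envelope sketch. The constants and function names are hypothetical helpers for this example. It also ignores any padding or alignment a real GPU allocation might add.

```rust
/// Bytes per texel for the two formats discussed in the text.
const RGBA32FLOAT_BYTES_PER_TEXEL: u64 = 16; // four 32-bit channels
const RGBA8UNORM_BYTES_PER_TEXEL: u64 = 4; // four 8-bit channels

/// Tightly packed size of a 2D texture in bytes (no padding assumed).
fn texture_bytes(width: u32, height: u32, bytes_per_texel: u64) -> u64 {
    width as u64 * height as u64 * bytes_per_texel
}

/// Largest value up to which `f32` can represent every whole number exactly.
fn max_exact_integer_f32() -> f32 {
    (1u32 << f32::MANTISSA_DIGITS) as f32
}
```

For an 800x600 texture this works out to 7,680,000 bytes for `Rgba32Float` versus 1,920,000 bytes for `Rgba8Unorm`. The second helper is a reminder that 32-bit floats are not unlimited either: they have a 24-bit significand, so adding $1$ to $2^{24}$ no longer changes the stored value.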
The usages (`TEXTURE_BINDING` and `STORAGE_BINDING`) enable the texture to be bound for reading and
writing. wgpu doesn't allow a texture to be bound to the same shader stage simultaneously with both
read and write access (except with an extension feature[^ch6-footnote8]). That feature may
not be supported on all GPUs, so let's avoid depending on specific GPU features for now. Instead of
reading and modifying the same texture in the render pass we can "ping-pong" between two textures.
The pipeline will declare two texture bindings: a read-only binding that contains the previously
accumulated sums, and a second (write-only) storage binding where it will output the updated sums.
We'll also create two texture objects, one for each binding, and alternate their binding assignments
with every frame, repeatedly swapping their roles: the texture that was previously the write
target provides the accumulated sums for the next frame, and vice versa.
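The role swap can be sketched with a bit of index arithmetic. This is only an illustration: `ping_pong_roles` is a made-up helper, and the convention that odd frames read from texture 1 is an assumption of this sketch (the opposite choice would work just as well).

```rust
/// For a 1-based frame number, returns `(read_index, write_index)` into a
/// pair of textures so that the two roles swap on every frame.
fn ping_pong_roles(frame_count: u32) -> (usize, usize) {
    let read = (frame_count % 2) as usize;
    (read, 1 - read)
}
```

The property that matters is that the texture written on one frame is the texture read on the next one, so the sums carry over from frame to frame.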
Start by changing the type of `radiance_samples` to an array of 2 textures:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
use bytemuck::{Pod, Zeroable};
pub struct PathTracer {
device: wgpu::Device,
queue: wgpu::Queue,
uniforms: Uniforms,
uniform_buffer: wgpu::Buffer,
display_pipeline: wgpu::RenderPipeline,
}
#[derive(Copy, Clone, Pod, Zeroable)]
#[repr(C)]
struct Uniforms {
width: u32,
height: u32,
frame_count: u32,
}
impl PathTracer {
pub fn new(
device: wgpu::Device,
queue: wgpu::Queue,
width: u32,
height: u32,
) -> PathTracer {
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
let radiance_samples = create_sample_textures(&device, width, height);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...
}
...
}
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
fn create_sample_textures(
device: &wgpu::Device,
width: u32,
height: u32,
) -> [wgpu::Texture; 2] {
let desc = wgpu::TextureDescriptor {
label: Some("radiance samples"),
size: wgpu::Extent3d {
width,
height,
depth_or_array_layers: 1,
},
mip_level_count: 1,
sample_count: 1,
dimension: wgpu::TextureDimension::D2,
format: wgpu::TextureFormat::Rgba32Float,
usage: wgpu::TextureUsages::TEXTURE_BINDING | wgpu::TextureUsages::STORAGE_BINDING,
view_formats: &[],
};
// Create two textures with the same parameters.
[device.create_texture(&desc), device.create_texture(&desc)]
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [radiance-samples-textures]: [render.rs] Radiance samples textures]
Now, let's add the new bindings to the bind group layout definition, assigning
binding index $1$ to the read-only binding (previous sums) and $2$ to the write-only storage
binding (the updated sums):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...
fn create_display_pipeline(
device: &wgpu::Device,
shader_module: &wgpu::ShaderModule,
) -> (wgpu::RenderPipeline, wgpu::BindGroupLayout) {
let bind_group_layout = device.create_bind_group_layout(&wgpu::BindGroupLayoutDescriptor {
label: None,
entries: &[
wgpu::BindGroupLayoutEntry {
binding: 0,
visibility: wgpu::ShaderStages::FRAGMENT,
ty: wgpu::BindingType::Buffer {
ty: wgpu::BufferBindingType::Uniform,
has_dynamic_offset: false,
min_binding_size: None,
},
count: None,
},
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
wgpu::BindGroupLayoutEntry {
binding: 1,
visibility: wgpu::ShaderStages::FRAGMENT,
ty: wgpu::BindingType::Texture {
sample_type: wgpu::TextureSampleType::Float {
filterable: false,
},
view_dimension: wgpu::TextureViewDimension::D2,
multisampled: false,
},
count: None,
},
wgpu::BindGroupLayoutEntry {
binding: 2,
visibility: wgpu::ShaderStages::FRAGMENT,
ty: wgpu::BindingType::StorageTexture {
access: wgpu::StorageTextureAccess::WriteOnly,
format: wgpu::TextureFormat::Rgba32Float,
view_dimension: wgpu::TextureViewDimension::D2,
},
count: None,
},
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
],
});
let pipeline = device.create_render_pipeline(&wgpu::RenderPipelineDescriptor {
...
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [updated-pipeline-layout]: [render.rs] Updated bind group layout]
Next, we need to change the actual bind group object to match the new layout. We want to alternate
the texture assignments but a bind group cannot be modified once it's created. We could instead
create two bind groups with the textures swapped and alternate those at render time:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...
pub struct PathTracer {
device: wgpu::Device,
queue: wgpu::Queue,
uniforms: Uniforms,
uniform_buffer: wgpu::Buffer,
display_pipeline: wgpu::RenderPipeline,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
display_bind_groups: [wgpu::BindGroup; 2],
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
}
#[derive(Copy, Clone, Pod, Zeroable)]
#[repr(C)]
struct Uniforms {
width: u32,
height: u32,
frame_count: u32,
}
impl PathTracer {
pub fn new(
device: wgpu::Device,
queue: wgpu::Queue,
width: u32,
height: u32,
) -> PathTracer {
device.on_uncaptured_error(Box::new(|error| {
panic!("Aborting due to an error: {}", error);
}));
let shader_module = compile_shader_module(&device);
let (display_pipeline, display_layout) =
create_display_pipeline(&device, &shader_module);
// Initialize the uniform buffer.
let uniforms = Uniforms {
width,
height,
frame_count: 0,
};
let uniform_buffer = device.create_buffer(&wgpu::BufferDescriptor {
label: Some("uniforms"),
size: std::mem::size_of::<Uniforms>() as u64,
usage: wgpu::BufferUsages::UNIFORM | wgpu::BufferUsages::COPY_DST,
mapped_at_creation: false,
});
let radiance_samples = create_sample_textures(&device, width, height);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
let display_bind_groups = create_display_bind_groups(
&device,
&display_layout,
&radiance_samples,
&uniform_buffer,
);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
PathTracer {
device,
queue,
uniforms,
uniform_buffer,
display_pipeline,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
display_bind_groups,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
}
}
...
}
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
fn create_display_bind_groups(
device: &wgpu::Device,
layout: &wgpu::BindGroupLayout,
textures: &[wgpu::Texture; 2],
uniform_buffer: &wgpu::Buffer,
) -> [wgpu::BindGroup; 2] {
let views = [
textures[0].create_view(&wgpu::TextureViewDescriptor::default()),
textures[1].create_view(&wgpu::TextureViewDescriptor::default()),
];
[
// Bind group with view[0] assigned to binding 1 and view[1] assigned to binding 2.
device.create_bind_group(&wgpu::BindGroupDescriptor {
label: None,
layout,
entries: &[
wgpu::BindGroupEntry {
binding: 0,
resource: wgpu::BindingResource::Buffer(wgpu::BufferBinding {
buffer: uniform_buffer,
offset: 0,
size: None,
}),
},
wgpu::BindGroupEntry {
binding: 1,
resource: wgpu::BindingResource::TextureView(&views[0]),
},
wgpu::BindGroupEntry {
binding: 2,
resource: wgpu::BindingResource::TextureView(&views[1]),
},
],
}),
// Bind group with view[1] assigned to binding 1 and view[0] assigned to binding 2.
device.create_bind_group(&wgpu::BindGroupDescriptor {
label: None,
layout,
entries: &[
wgpu::BindGroupEntry {
binding: 0,
resource: wgpu::BindingResource::Buffer(wgpu::BufferBinding {
buffer: uniform_buffer,
offset: 0,
size: None,
}),
},
wgpu::BindGroupEntry {
binding: 1,
resource: wgpu::BindingResource::TextureView(&views[1]),
},
wgpu::BindGroupEntry {
binding: 2,
resource: wgpu::BindingResource::TextureView(&views[0]),
},
],
}),
]
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [textures-bind-groups]: [render.rs] Bind groups with different texture assignments]
Now, let's update the shader:
The intersection test logic remains the same as before but instead of returning the computed
radiance value right away, we first store it in a local variable (`radiance_sample`). Next we fetch
the current tally from the "old" texture (`old_sum`) and compute the updated tally by adding
`radiance_sample` to it. We want to ensure that the accumulation starts at 0, so we set `old_sum`
to `vec3(0)` for the initial frame (when `frame_count` is equal to $1$). Then we simply return
`new_sum / f32(uniforms.frame_count)`, i.e. the current average, in the RGB channels of the output
color.[^ch6-footnote9]
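Here is a CPU-side sketch of that logic for a single pixel and a single color channel. It is not the shader itself. `accumulate_pixel` is a hypothetical function, the two-element array stands in for the two radiance textures, and the read/write indexing is an assumption of this sketch.

```rust
/// One accumulation step for a single pixel, mirroring the logic described
/// above. `textures` stands in for the two radiance textures.
fn accumulate_pixel(textures: &mut [f32; 2], frame_count: u32, radiance_sample: f32) -> f32 {
    let read = (frame_count % 2) as usize;
    let write = 1 - read;
    // Start from zero on the first frame, whatever the texture happens to hold.
    let old_sum = if frame_count == 1 { 0.0 } else { textures[read] };
    let new_sum = old_sum + radiance_sample;
    // Persist the updated sum for the next frame.
    textures[write] = new_sum;
    // The displayed value is the current average.
    new_sum / frame_count as f32
}
```

Resetting `old_sum` on the first frame means the result does not depend on the initial contents of the textures.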
Finally, let's update the bind group assignment in `PathTracer::render_frame` to ping-pong between
the two bind groups we created, using even and odd values of `frame_count` as a toggle:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...
impl PathTracer {
...
pub fn render_frame(&mut self, target: &wgpu::TextureView) {
self.uniforms.frame_count += 1;
self.queue
.write_buffer(&self.uniform_buffer, 0, bytemuck::bytes_of(&self.uniforms));
let mut encoder = self
.device
.create_command_encoder(&wgpu::CommandEncoderDescriptor {
label: Some("render frame"),
});
let mut render_pass = encoder.begin_render_pass(&wgpu::RenderPassDescriptor {
label: Some("display pass"),
color_attachments: &[Some(wgpu::RenderPassColorAttachment {
view: target,
resolve_target: None,
ops: wgpu::Operations {
load: wgpu::LoadOp::Clear(wgpu::Color::BLACK),
store: wgpu::StoreOp::Store,
},
})],
..Default::default()
});
render_pass.set_pipeline(&self.display_pipeline);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
render_pass.set_bind_group(
0,
&self.display_bind_groups[(self.uniforms.frame_count % 2) as usize],
&[],
);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
// Draw 1 instance of a polygon with 6 vertices
render_pass.draw(0..6, 0..1);
// End the render pass by consuming the object.
drop(render_pass);
let command_buffer = encoder.finish();
self.queue.submit(Some(command_buffer));
}
}
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [ping-pong-bind-groups]: [render.rs] Ping-pong bind groups]
When you run the code, you should see the animation from before but the displayed image should
look a bit smeared. You should be able to see the oscillating sphere leave behind a "trail" over the
first few seconds and the image should eventually settle at something like this:
![Figure [temporal-blur-effect]: Temporal Blur Effect](../images/img-12-blurred-animation.png)
I find it fun to watch the rendering of this image. After the program runs for a few seconds the
image seems to reach a steady state. This happens when the renderer has collected enough samples
that adding new ones doesn't perceivably contribute to the average. The sphere radii are oscillating
inside a fixed range, so we observe all possible frame states of the animation rather quickly.
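One way to reason about the steady state: when sample number $n$ arrives, the average moves by $(x_n - \text{average}) / n$, which can never exceed the range of the samples divided by $n$. The sketch below computes that bound. The function name is made up for this example.

```rust
/// Largest possible change in the running average when sample number `n`
/// arrives, for samples confined to the range [min, max].
fn max_average_change(n: u32, min: f32, max: f32) -> f32 {
    (max - min) / n as f32
}
```

For color values in $[0, 1]$ the bound drops below $1/255$ (one step of an 8-bit display channel) after 256 samples. That is only a few seconds of rendering at typical frame rates, which fits what we observe.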
[^ch6-footnote7]: We are making the assumption that light travels along straight lines.
[^ch6-footnote8]: wgpu supports a
[read/write access mode](https://docs.rs/wgpu/0.19.3/wgpu/enum.StorageTextureAccess.html#variant.ReadWrite)
which is hidden behind the adapter feature `TEXTURE_ADAPTER_SPECIFIC_FORMAT_FEATURES`. This isn't
guaranteed to be supported by all GPUs but feel free to use it if yours does.
[^ch6-footnote9]: Note that the value we store in the texture (`vec4(new_sum, 0.)`) has its alpha
component set to $0$. We aren't making use of the alpha values so it doesn't matter what we set
this to.
Antialiasing
------------
Let's undo the animation and bring back the static spheres.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
...
@fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f {
...
let direction = vec3(uv, -focus_distance);
let ray = Ray(origin, direction);
var closest_hit = Intersection(vec3(0.), FLT_MAX);
for (var i = 0u; i < OBJECT_COUNT; i += 1u) {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
let sphere = scene[i];
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL delete
var sphere = scene[i];
sphere.radius += sin(f32(uniforms.frame_count) * 0.02) * 0.2;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
let hit = intersect_sphere(ray, sphere);
if hit.t > 0. && hit.t < closest_hit.t {
closest_hit = hit;
}
}
var radiance_sample: vec3f;
...
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [stop-animating]: [shaders.wgsl] Remove radius animation]
The output should be the same still image from the end of Chapter 5 (Figure 20). The
accumulation logic has no effect because every sample is computing exactly the same value.
Let's zoom in and take a closer look at the edges of the spheres:
![Figure [aliased-boundaries]: Aliased shape boundaries @ 400x300
](../images/img-13-aliased-boundaries.png height="500px" class="pixel")
Here, each pixel is visualized as a square. A discrete pixel can only display a single color but
pixels along shape boundaries overlap multiple (continuous) surfaces. Ideally the pixel color
should receive a contribution from all of those surfaces, in proportion to the "pixel area" covered
by each surface.
Casting a single camera ray returns only a point sample but averaging multiple _sub-pixel_ samples
can give us an approximation of the whole area. Let's try a very simple approach first: subdivide a
pixel into a rectangular grid and on each frame cast the ray towards one of the sub-regions.
The following code change adds a small offset to the ray which cycles through the sub-regions of a
4x4 grid centered at the original ray direction, using `uniforms.frame_count` like an index. The
offsets range within $[-0.5, 0.5]$ in both coordinate directions:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
...
@fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f {
let origin = vec3(0.);
let focus_distance = 1.;
let aspect_ratio = f32(uniforms.width) / f32(uniforms.height);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
// Offset and normalize the viewport coordinates of the ray.
let offset = vec2(
f32(uniforms.frame_count % 4) * 0.25 - 0.5,
f32((uniforms.frame_count % 16) / 4) * 0.25 - 0.5
);
var uv = (pos.xy + offset) / vec2f(f32(uniforms.width - 1u), f32(uniforms.height - 1u));
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
// Map `uv` from y-down (normalized) viewport coordinates to camera coordinates.
uv = (2. * uv - vec2(1.)) * vec2(aspect_ratio, -1.);
let direction = vec3(uv, -focus_distance);
...
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [16-sample-aa]: [shaders.wgsl] 16 grid samples]
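To see which offsets the frame counter actually produces, here is a minimal CPU-side sketch in Rust of the same arithmetic. The function name `grid_offset` is hypothetical and only serves as an illustration; the shader computes the equivalent expression inline.

```rust
/// Sub-pixel offset for a given frame, cycling through a 4x4 grid.
/// Mirrors the arithmetic in the shader: the column advances every frame
/// and the row advances every fourth frame.
fn grid_offset(frame_count: u32) -> (f32, f32) {
    let column = frame_count % 4;
    let row = (frame_count % 16) / 4;
    (column as f32 * 0.25 - 0.5, row as f32 * 0.25 - 0.5)
}
```

With this arithmetic each axis takes the values $-0.5$, $-0.25$, $0$, and $0.25$ (the lower corner of each grid cell), and the pattern repeats every 16 frames.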
![Figure [16-sample-aa-aliased-boundaries]: Anti-aliasing with 16 regularly-spaced samples @ 400x300
](../images/img-14-16-sample-aa.png height="500px" class="pixel")
That's an improvement but we can do better. Instead of subdividing the pixel into 16
regularly-spaced discrete regions (which is prone to the same sampling artifact), let's offset the
ray by a random amount within that range. This should accumulate enough samples from various parts
of the pixel area over time to yield a better estimate of the average color. Plus, why limit
ourselves to only 16 discrete samples when our renderer is already set up for an indefinite
amount?
PRNG
----
Shading languages don't provide a built-in facility to generate random numbers, which means we need
to implement our own.
A class of pseudorandom number generators that is very easy to implement is called _Xorshift
RNGs_.[^marsaglia] Xorshift generators work by repeatedly computing the bitwise exclusive-or of an
initial seed with a bit-shifted version of itself. The result is a deterministic sequence with a
uniform distribution and a long period that suits our needs.[^ch6-footnote10]
We can implement the RNG as a private variable such that each GPU thread gets its own local instance
of the RNG state. We generally want to seed the RNG such that the pseudorandom sequence for a pixel
is different across successive frames since we want to sample a different sub-pixel coordinate
each time. The sequences should also ideally differ across adjacent pixels in a single frame
(instead of repeating the same spatial pattern) in order to improve the sampling distribution. We
can combine `uniforms.frame_count` with the pixel's coordinates using a hash function to obtain a
good initial seed for each thread. I use the _One-at-a-Time Hash_ function from Bob Jenkins'
Dr Dobbs article from 1997[#Jenkins97] but you could use any other hash function as long as it's fast
and has good statistical properties.
The following listing defines the RNG state, the hash function, and the 32-bit xorshift.
`init_rng()` initializes the state with the seed. The RNG state and the generated numbers are 32-bit
unsigned integers. Since we're pretty much only dealing with floating-point numbers, the code
includes a `rand_f32()` function that generates a random `u32` and converts it to an `f32` between
$0$ and $1$.
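The same pieces can be sketched on the CPU in Rust. Treat this as an illustration under assumptions rather than the shader listing itself: the `Rng` struct, the word-sized form of the hash, the way the pixel index and frame count are combined into a seed, the zero-state fallback, and the (13, 17, 5) shift triple (one of Marsaglia's published choices) are all picked here for the example.

```rust
/// Per-thread RNG state (a single 32-bit word), analogous to a private
/// variable in the shader.
struct Rng {
    state: u32,
}

/// Word-sized variant of the One-at-a-Time mixing steps (assumed form).
fn jenkins_hash(input: u32) -> u32 {
    let mut x = input;
    x = x.wrapping_add(x << 10);
    x ^= x >> 6;
    x = x.wrapping_add(x << 3);
    x ^= x >> 11;
    x = x.wrapping_add(x << 15);
    x
}

/// Seeds the generator from the pixel coordinates and the frame count.
fn init_rng(pixel: (u32, u32), width: u32, frame_count: u32) -> Rng {
    let pixel_index = pixel.0.wrapping_add(pixel.1.wrapping_mul(width));
    let seed = pixel_index ^ jenkins_hash(frame_count);
    let state = jenkins_hash(seed);
    // Xorshift never leaves the all-zero state, so fall back to an
    // arbitrary non-zero constant in that case.
    Rng { state: if state == 0 { 0x9e37_79b9 } else { state } }
}

impl Rng {
    /// 32-bit xorshift step.
    fn xorshift32(&mut self) -> u32 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        self.state = x;
        x
    }

    /// Uniform float in [0, 1): fill the 23 mantissa bits of 1.0 with random
    /// bits to get a value in [1, 2), then subtract 1.
    fn rand_f32(&mut self) -> f32 {
        f32::from_bits(0x3f80_0000 | (self.xorshift32() >> 9)) - 1.0
    }
}
```

Seeding with the same pixel and frame always reproduces the same sequence, while a neighboring pixel or a later frame starts from a different state.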
Now to change the offset computation in the fragment shader to pick a random coordinate:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
...
@fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
init_rng(vec2u(pos.xy));
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
let origin = vec3(0.);
let focus_distance = 1.;
let aspect_ratio = f32(uniforms.width) / f32(uniforms.height);
// Offset and normalize the viewport coordinates of the ray.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
let offset = vec2(rand_f32() - 0.5, rand_f32() - 0.5);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
var uv = (pos.xy + offset) / vec2f(f32(uniforms.width - 1u), f32(uniforms.height - 1u));
// Map `uv` from y-down (normalized) viewport coordinates to camera coordinates.
uv = (2. * uv - vec2(1.)) * vec2(aspect_ratio, -1.);
let direction = vec3(uv, -focus_distance);
...
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [randomized-ssaa-code]: [shaders.wgsl] Random sub-pixel offsets]
Now, the anti-aliased edges have a much more gradual transition and look a lot less blocky compared
to our previous 16-sample AA:
![Figure [randomized-ssaa-image]: Randomized sub-pixel supersampling @ 400x300
](../images/img-15-random-subpixel-samples.png height="500px" class="pixel")
[^ch6-footnote10]: Xorshift generators are a kind of _linear-feedback shift register_. The random offsets
generated with xorshift follow a _white noise_ pattern in that they appear to be "purely random":
the sample points may appear clumped together in some places and have large gaps in others. A more
even spatial distribution of points (e.g. using _blue noise_ or the _Sobol sequence_) is generally
more desirable for stochastic methods but the Xorshift PRNG is good enough for our purposes, given
our large number of samples.
Path Tracing
====================================================================================================
What we perceive as color, shadows, transparency, reflections, and many other visual phenomena
result from interactions of light and matter. If we want to achieve some amount of realism, it makes
sense to base our computations on the real-world physics of light. That said, it's not necessary to
fully simulate electromagnetic wave interactions to render a visually pleasing image.
What we mainly care about is how light travels through the scene and what happens when it hits a
surface. We'll adhere to a relatively simple model with the following assumptions:
- Light travels in straight lines represented as rays.
- A ray transports some amount of light energy, called _radiance_.
- Light gets scattered when it hits a surface. The surface absorbs some of the radiance and scatters
the rest towards a new direction, represented by a new ray.
- A sequence of connected rays forms a _light transport path_. All light transport paths originate
at a light source.
![Figure [light-paths-in-a-room]: The various paths that light rays in a room may take before they reach the camera.
](../images/fig-09-light-paths-overview.svg)
There are infinitely many transport paths in a scene. The paths that contribute to the
rendered image are the ones that eventually arrive at the camera, so we trace a light transport path
_backwards_, starting at a camera pixel. When we find an intersection with a surface in the scene,
we cast a new ray in the scattering direction based on the properties of the surface. We repeat the
process until a ray intersects a light source.
Path Tracing Loop
-----------------
Before implementing the path tracing logic let's introduce two subroutines. The first will be a
new function responsible for traversing the scene and finding an intersection, called
`intersect_scene`:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
...
struct Intersection {
normal: vec3f,
t: f32,
}
fn no_intersection() -> Intersection {
return Intersection(vec3(0.), -1.);
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
fn is_intersection_valid(hit: Intersection) -> bool {
return hit.t > 0.;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
fn intersect_scene(ray: Ray) -> Intersection {
var closest_hit = Intersection(vec3(0.), FLT_MAX);
for (var i = 0u; i < OBJECT_COUNT; i += 1u) {
let sphere = scene[i];
let hit = intersect_sphere(ray, sphere);
if hit.t > 0. && hit.t < closest_hit.t {
closest_hit = hit;
}
}
if closest_hit.t < FLT_MAX {
return closest_hit;
}
return no_intersection();
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
struct Ray {
origin: vec3f,
direction: vec3f,
}
...
@fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f {
init_rng(vec2u(pos.xy));
let origin = vec3(0.);
let focus_distance = 1.;
let aspect_ratio = f32(uniforms.width) / f32(uniforms.height);
// Offset and normalize the viewport coordinates of the ray.
let offset = vec2(rand_f32() - 0.5, rand_f32() - 0.5);
var uv = (pos.xy + offset) / vec2f(f32(uniforms.width - 1u), f32(uniforms.height - 1u));
// Map `uv` from y-down (normalized) viewport coordinates to camera coordinates.
uv = (2. * uv - vec2(1.)) * vec2(aspect_ratio, -1.);
let direction = vec3(uv, -focus_distance);
let ray = Ray(origin, direction);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
let hit = intersect_scene(ray);
var radiance_sample: vec3f;
if is_intersection_valid(hit) {
radiance_sample = vec3(0.5 * hit.normal + vec3(0.5));
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
} else {
radiance_sample = sky_color(ray);
}
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [intersect-scene]: [shaders.wgsl] The intersect_scene function]
A second new function, called `scatter`, will be responsible for evaluating the surface material.
For now it returns two values: an attenuation factor that represents the fraction of scattered
radiance and a scattering direction (typically denoted with the lower-case Greek letter $\omega$).
We store the attenuation factor as a `vec3f` since we're computing a separate radiance value for
each color channel.[^ch7-footnote3] Surface materials (which we'll explore in Section [materials])
are represented by various _scattering functions_. A scattering function maps an _incident_ light
direction $\omega_i$ to an _outgoing_ light direction $\omega_o$.
The rays originate from the camera and trace the transport path backwards towards light sources,
so when we call `scatter` we already know the scattering direction $\omega_o$. In that sense
"scatter" is somewhat of a misnomer, since we're using it to compute $\omega_i$. This doesn't really
make a difference, as the incident and outgoing light directions are interchangeable. The surface
scattering functions that we will implement are all going to be _bi-directional_, i.e. work the same
way in either direction. As such, our `scatter` function allows the `input_ray` parameter to be
either an incident or a scattered light direction.
Let's make the scattering function reflect the ray around the normal vector like a perfect mirror.
The direction of a reflected ray given an incident ray direction and a surface normal is given by
the _law of reflection_. Luckily, there is a handy shader intrinsic called `reflect` that can
compute this for us:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
struct Scatter {
attenuation: vec3f,
ray: Ray,
}
fn scatter(input_ray: Ray, hit: Intersection) -> Scatter {
let reflected = reflect(input_ray.direction, hit.normal);
let output_ray = Ray(point_on_ray(input_ray, hit.t), reflected);
let attenuation = vec3(0.4);
return Scatter(attenuation, output_ray);
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
struct Ray {
origin: vec3f,
direction: vec3f,
}
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [scatter]: [shaders.wgsl] The scatter function]
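For reference, the computation behind `reflect` is short enough to write out by hand. The Rust sketch below assumes a unit-length normal and uses a plain array as a stand-in vector type (the `Vec3` alias and `dot` helper are defined here just for the example):

```rust
type Vec3 = [f32; 3];

fn dot(a: Vec3, b: Vec3) -> f32 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

/// Mirror reflection of `incident` about the unit-length `normal`:
/// r = d - 2 (d . n) n
fn reflect(incident: Vec3, normal: Vec3) -> Vec3 {
    let k = 2.0 * dot(incident, normal);
    [
        incident[0] - k * normal[0],
        incident[1] - k * normal[1],
        incident[2] - k * normal[2],
    ]
}
```

A ray travelling down and to the right, `[1, -1, 0]`, that hits a floor with normal `[0, 1, 0]` leaves travelling up and to the right, `[1, 1, 0]`: the component along the normal flips and the rest is unchanged.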
The returned attenuation factor of $0.4$ means that the material absorbs 60% of the incoming
radiance (in all color channels) and scatters the rest. Logically, we compute this by multiplying
the transported radiance by the attenuation factor at every intersection. We don't actually
know the radiance value until we reach light sources but we can compute the total attenuation and
the transported radiance separately.
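The bookkeeping can be sketched in isolation. The Rust snippet below is a simplified single-channel illustration (the function name `path_radiance` is hypothetical): the attenuation factors along a path are multiplied together first, and the emitted radiance is applied only once the light source is known.

```rust
/// Radiance carried back to the camera along one path: the product of the
/// attenuation factors at every bounce, times the radiance emitted by the
/// light source at the end of the path.
fn path_radiance(attenuations: &[f32], emitted: f32) -> f32 {
    let throughput: f32 = attenuations.iter().product();
    throughput * emitted
}
```

Three bounces off a surface with an attenuation factor of $0.4$ leave $0.4^3 = 0.064$ of the emitted radiance, and a path with no bounces at all returns the emitted radiance unchanged.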
We'll write a loop that traces a path, generating rays as it finds intersections. We'll accumulate
the product of attenuation factors in a `throughput` variable and multiply that by the radiance
emitted by any light source that we encounter (which is just the sky for now):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
const MAX_PATH_LENGTH: u32 = 6u;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
...
@fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f {
init_rng(vec2u(pos.xy));
let origin = vec3(0.);
let focus_distance = 1.;
let aspect_ratio = f32(uniforms.width) / f32(uniforms.height);
// Offset and normalize the viewport coordinates of the ray.
let offset = vec2(rand_f32() - 0.5, rand_f32() - 0.5);
var uv = (pos.xy + offset) / vec2f(f32(uniforms.width - 1u), f32(uniforms.height - 1u));
// Map `uv` from y-down (normalized) viewport coordinates to camera coordinates.
uv = (2. * uv - vec2(1.)) * vec2(aspect_ratio, -1.);
let direction = vec3(uv, -focus_distance);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
var ray = Ray(origin, direction);
var throughput = vec3f(1.);
var radiance_sample = vec3(0.);
var path_length = 0u;
while path_length < MAX_PATH_LENGTH {
let hit = intersect_scene(ray);
if !is_intersection_valid(hit) {
// If no intersection was found, return the color of the sky and terminate the path.
radiance_sample += throughput * sky_color(ray);
break;
}
let scattered = scatter(ray, hit);
throughput *= scattered.attenuation;
ray = scattered.ray;
path_length += 1u;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
// Fetch the old sum of samples.
var old_sum: vec3f;
if uniforms.frame_count > 1 {
old_sum = textureLoad(radiance_samples_old, vec2u(pos.xy), 0).xyz;
} else {
old_sum = vec3(0.);
}
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [path-tracing-loop]: [shaders.wgsl] The path tracing loop]
`throughput` starts out as $1$ (meaning no radiance has been absorbed). We also impose an artificial
limit on the length of a path to prevent looping forever if we never encounter a light
source (which can happen with certain types of geometry). Running this program should produce this
image:
![Figure [invalid-scatter-with-shadow-acne]: Validating the path tracing loop (with self-shadowing)
](../images/img-18-mirror-reflection-with-shadow-acne.png)
We can see some reflections but there are some nasty circular bands. This artifact (called
"shadow acne" or "self-shadowing") is caused by the limited (and quantized) precision inherent to
floating point arithmetic. Sometimes the computed intersection point doesn't fall precisely on the
sphere surface, which can cause the new ray (originating from that point) to re-intersect the sphere.
A simple way to deal with this is to reject intersections for values of $t$ that are below a small
offset ($\epsilon$ or _epsilon_):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
const FLT_MAX: f32 = 3.40282346638528859812e+38;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
const EPSILON: f32 = 1e-2;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
...
fn intersect_sphere(ray: Ray, sphere: Sphere) -> Intersection {
let v = ray.origin - sphere.center;
let a = dot(ray.direction, ray.direction);
let b = dot(v, ray.direction);
let c = dot(v, v) - sphere.radius * sphere.radius;
let d = b * b - a * c;
if d < 0. {
return no_intersection();
}
let sqrt_d = sqrt(d);
let recip_a = 1. / a;
let mb = -b;
let t1 = (mb - sqrt_d) * recip_a;
let t2 = (mb + sqrt_d) * recip_a;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
let t = select(t2, t1, t1 > EPSILON);
if t <= EPSILON {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
return no_intersection();
}
let p = point_on_ray(ray, t);
let N = (p - sphere.center) / sphere.radius;
return Intersection(N, t);
}
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [reject-near-intersections]: [shaders.wgsl] Rejecting intersections too close to the ray origin]
The code considers both `t1` and `t2` as `t2` (the farther point) could be a valid intersection if
the ray originated inside the sphere (e.g. for a glass-like material). Following this
change, the rendering should look like this:
![Figure [shadow-acne-fixed]: Validating the path tracing loop
](../images/img-19-mirror-reflection-no-acne.png)
That looks much cleaner. Some reflections are visible and both spheres have acquired a blue tint
where light paths eventually reach the sky. Some light paths bounce back and forth between both
spheres. Each bounce is an "absorption event" that decreases the path throughput. The image looks
darker with more absorptions, which is most apparent where the two spheres meet.
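To make the root selection concrete, here is a CPU-side Rust sketch of the same quadratic solve that returns only the distance $t$. The array-based vector helpers and the name `intersect_sphere_t` are assumptions made for this example; the logic follows the shader's `select` between `t1` and `t2`.

```rust
const EPSILON: f32 = 1e-2;

type Vec3 = [f32; 3];

fn dot(a: Vec3, b: Vec3) -> f32 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

fn sub(a: Vec3, b: Vec3) -> Vec3 {
    [a[0] - b[0], a[1] - b[1], a[2] - b[2]]
}

/// Distance along the ray to the nearest intersection that lies farther
/// than EPSILON from the ray origin, if there is one.
fn intersect_sphere_t(origin: Vec3, direction: Vec3, center: Vec3, radius: f32) -> Option<f32> {
    let v = sub(origin, center);
    let a = dot(direction, direction);
    let b = dot(v, direction);
    let c = dot(v, v) - radius * radius;
    let d = b * b - a * c;
    if d < 0.0 {
        return None;
    }
    let sqrt_d = d.sqrt();
    let t1 = (-b - sqrt_d) / a;
    let t2 = (-b + sqrt_d) / a;
    // Prefer the nearer root; fall back to the farther one when the nearer
    // root is behind or too close to the origin.
    let t = if t1 > EPSILON { t1 } else { t2 };
    if t <= EPSILON { None } else { Some(t) }
}
```

A ray that starts at the sphere's center reports the far root, while a ray that starts on the surface and points away from the sphere reports no hit, which is exactly the case that used to produce the banding.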
Rejecting intersections that are too close to the ray origin may not always prevent self-shadowing.
When the ray direction is nearly tangent to the surface, the required threshold to reject points
that lie on the surface can increase--especially when the ray origin has a large coordinate value
with less floating point precision. For example, try increasing the radius of the lower sphere to
10,000. This may result in some bad self-shadowing artifacts in the distance:
![Figure [shadow-acne-distance]: Self-shadowing in the distance
](../images/img-28-shadow-acne-distance.png)
An alternative way to avoid the issue is to perturb the ray origin away from the surface along
the surface normal by a small amount. Since this moves the "entire ray" away from the object,
self-intersections become less likely with an appropriate epsilon:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
...
fn scatter(input_ray: Ray, hit: Intersection) -> Scatter {
let reflected = reflect(input_ray.direction, hit.normal);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
let output_ray = Ray(point_on_ray(input_ray, hit.t) + hit.normal * EPSILON, reflected);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
let attenuation = vec3(0.4);
return Scatter(attenuation, output_ray);
}
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [offset-ray-origin]: [shaders.wgsl] Moving the ray origin away from the surface]
Gamma Correction
----------------
Right now, this image looks a bit too dark. The perceived brightness (or luminance) of a pixel
should ideally scale linearly with the stored radiance value. In other words, if a material absorbs
50% of the radiance arriving directly from the sky, it should appear half as bright as the sky.
However, the reflections of both spheres become nearly invisible after only three ray bounces.
This is because the surface texture expects pixel values to be _gamma encoded_. Our eyes are more
sensitive to changes in dark tones than they are to similar changes in bright tones. Given that
we only have a fixed range to represent a pixel's luminance ($[0, 1]$), it is more efficient (in
terms of storage) to allocate a bigger numerical range for smaller radiance values. This is how
digital images usually get stored and virtually all displays apply _gamma correction_ while
converting pixel values to light.[^ch7-footnote4]
The formula for gamma ($\gamma$) encoding is $V_{out} = V_{in}^{\frac{1}{\gamma}}$. We can apply
this function in the fragment shader right before outputting the color:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
@fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f {
...
// Compute and store the new sum.
let new_sum = radiance_sample + old_sum;
textureStore(radiance_samples_new, vec2u(pos.xy), vec4(new_sum, 0.));
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
// Display the average after gamma correction (gamma = 2.2)
let color = new_sum / f32(uniforms.frame_count);
return vec4(pow(color, vec3(1. / 2.2)), 1.);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [gamma-2.2]: [shaders.wgsl] Encoding a pixel with $\gamma = 2.2$]
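To get a feel for how much the encoding brightens mid-tones, here is a small Rust sketch of the same power function along with its inverse. The function names are hypothetical and the sketch uses the plain $\gamma = 2.2$ power curve from the formula above.

```rust
const GAMMA: f32 = 2.2;

/// Encodes a linear value in [0, 1]: V_out = V_in^(1 / gamma).
fn gamma_encode(linear: f32) -> f32 {
    linear.powf(1.0 / GAMMA)
}

/// Inverse of `gamma_encode`: V_out = V_in^gamma.
fn gamma_decode(encoded: f32) -> f32 {
    encoded.powf(GAMMA)
}
```

A linear value of $0.5$ encodes to roughly $0.73$, while $0$ and $1$ are left unchanged, so the curve spends more of the output range on dark tones.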
The gamma corrected output should look like this:
![Figure [gamma-correction]: Gamma-correction](../images/img-20-gamma-correction.png)
Some platforms support textures with an _sRGB_ format. Pixels automatically undergo gamma compression
(or decompression) upon writes and reads to sRGB textures. You can try this yourself: instead of
applying gamma correction in the shader, change all instances of `Rgba8Unorm` and `Bgra8Unorm` in
`src/main.rs` and `src/render.rs` to `Rgba8UnormSrgb` and `Bgra8UnormSrgb`. You should see a similar
result if your platform supports sRGB surfaces.
[^ch7-footnote3]: This RGB representation is simple and works well for most cases but cannot
accurately represent effects like diffraction and interference. There are alternative
representations to handle such phenomena, for example by storing a power distribution across a
spectrum of constituent wavelengths.
[^ch7-footnote4]: [_Understanding Gamma Correction_](https://www.cambridgeincolour.com/tutorials/gamma-correction.htm)
(by Cambridge in Colour) is a great short read on the topic.
Path Length
-----------
Let's momentarily set the attenuation factor to 1, so that both spheres reflect 100% of the energy
they receive.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
...
fn scatter(input_ray: Ray, hit: Intersection) -> Scatter {
let reflected = reflect(input_ray.direction, hit.normal);
let output_ray = Ray(point_on_ray(input_ray, hit.t), reflected);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
let attenuation = vec3(1.);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
return Scatter(attenuation, output_ray);
}
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [no-absorption]: [shaders.wgsl] Spheres that reflect all light]
You should get this result:
![Figure [mirrors-showing-bias]: Bias from early termination
](../images/img-21-max-bounces-too-low.png)
The image looks a lot brighter (as expected) but there is a well-defined black circle in between
the spheres. That looks wrong, given the spheres aren't supposed to absorb any light. Luckily there
is an easy explanation: the current upper limit on path length (i.e. `MAX_PATH_LENGTH`) is too low
to fully explore that part of the scene, so the path gets terminated before it can find the light
source.
![Figure [path-sphere-interreflections]: A light transport path with 7 bounces
](../images/fig-11-sphere-interreflections.svg)
Try increasing `MAX_PATH_LENGTH` to 10. There should still be a black circle, only smaller. It turns out
that at least 13 bounces are necessary to eliminate the black circle for this particular scene:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
const MAX_PATH_LENGTH: u32 = 13u;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [increase-path-length]: [shaders.wgsl] Increased path length]
![Figure [mirrors-with-13-bounces]: Infinite mirror with 13 bounces
](../images/img-22-infinite-mirror-with-13-bounces.png)
This raises the question: what is the ideal value for `MAX_PATH_LENGTH`? The answer depends on a
number of factors, but it mainly comes down to the scene and performance expectations. Fewer
bounces means less computation but potentially incorrect images. More bounces means more light
paths get explored but more computation is necessary. It also increases the chances of wasted work
on paths that don't contribute significantly to the final image. We'll revisit this topic later.
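One rough way to reason about the trade-off is to ask how many bounces it takes for a path's throughput to become negligible. The Rust sketch below assumes every surface shares one scalar attenuation factor; the function name and the threshold are hypothetical and chosen only for illustration.

```rust
/// Number of bounces after which the throughput of a path drops below
/// `threshold`, assuming every surface uses the same scalar `attenuation`.
/// Returns `None` if that doesn't happen within `max_bounces`.
fn bounces_until_negligible(attenuation: f32, threshold: f32, max_bounces: u32) -> Option<u32> {
    let mut throughput = 1.0_f32;
    for bounce in 1..=max_bounces {
        throughput *= attenuation;
        if throughput < threshold {
            return Some(bounce);
        }
    }
    None
}
```

With an attenuation factor of $0.4$ a path contributes less than 1% of the emitted radiance after 6 bounces, whereas perfectly reflective surfaces (a factor of $1$) never reach that point, which is why the mirror spheres needed a longer path.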
Colored Spheres
---------------
All real-world objects absorb some amount of light. They also impart a color on the light that they
reflect. It would be nice to assign different colors to the spheres so we can tell them apart. For
now, let's add a field to the `Sphere` structure to hold a shape's color.
The RGB triplet can directly represent the attenuation factor for a given sphere. Let's have the
intersection routine return the color of a sphere and use that color in the scattering function:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
...
struct Intersection {
normal: vec3f,
t: f32,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
color: vec3f,
}
fn no_intersection() -> Intersection {
return Intersection(vec3(0.), -1., vec3(0.));
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
...
fn intersect_sphere(ray: Ray, sphere: Sphere) -> Intersection {
...
let p = point_on_ray(ray, t);
let N = (p - sphere.center) / sphere.radius;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
return Intersection(N, t, sphere.color);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
}
fn intersect_scene(ray: Ray) -> Intersection {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
var closest_hit = no_intersection();
closest_hit.t = FLT_MAX;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
for (var i = 0u; i < OBJECT_COUNT; i += 1u) {
let sphere = scene[i];
let hit = intersect_sphere(ray, sphere);
if hit.t > 0. && hit.t < closest_hit.t {
closest_hit = hit;
}
}
if closest_hit.t < FLT_MAX {
return closest_hit;
}
return no_intersection();
}
...
fn scatter(input_ray: Ray, hit: Intersection) -> Scatter {
let reflected = reflect(input_ray.direction, hit.normal);
let output_ray = Ray(point_on_ray(input_ray, hit.t), reflected);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
let attenuation = hit.color;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
return Scatter(attenuation, output_ray);
}
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [apply-sphere-color]: [shaders.wgsl] Use sphere color to attenuate throughput]
![Figure [colored-spheres]: Spheres with different colors](../images/img-23-colored-spheres.png)
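The effect of a colored attenuation factor is easy to check by hand. The following Rust sketch uses a hypothetical CPU-side mirror of a sphere (the shader's `Sphere` lives in WGSL) and multiplies the throughput component-wise, as `throughput *= scattered.attenuation` does in the path tracing loop:

```rust
type Color = [f32; 3];

/// Hypothetical CPU-side mirror of a sphere that carries an RGB color.
struct Sphere {
    center: [f32; 3],
    radius: f32,
    color: Color,
}

/// Component-wise product of the path throughput and a surface color.
fn attenuate(throughput: Color, color: Color) -> Color {
    [
        throughput[0] * color[0],
        throughput[1] * color[1],
        throughput[2] * color[2],
    ]
}
```

Two bounces off a sphere with color `[0.5, 0.25, 0.25]` reduce a white throughput to `[0.25, 0.0625, 0.0625]`: the green and blue channels fall off faster than red, so the reflected light picks up a red tint.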
Interactive Camera
====================================================================================================
So far, we've been looking at spheres from a fixed position and it would be nice to be able to move
around. In order to reposition the camera with user input, we need a representation of the camera
state that is shared between the CPU and GPU sides of the program.
In our GPU code, we have relied on built-in vector algebra primitives (such as `vec3`) provided by
WGSL. We need similar primitives on the CPU side in order to compute camera parameters (such as
camera position and orientation) in response to input events generated by the windowing system.
To that end, we'll be adding a new `algebra` module for linear algebra utilities:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
mod algebra;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
mod render;
const WIDTH: u32 = 800;
const HEIGHT: u32 = 600;
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [algebra-module-decl]: [main.rs] Declare the `algebra` module]
`algebra.rs` defines a single type: `Vec3`. As its name suggests, this type represents a
3-dimensional vector with three 32-bit floating point components. `Vec3` defines methods for vector
operations and operator overloads for component-wise arithmetic (`+, -, *, /`) and assignment
(`+=, -=, *=, /=`).[^ch8-footnote1]
The memory layout of a `Vec3` consists of three contiguous `f32`'s (taking up 12 bytes) which
exactly matches the layout of the WGSL `vec3f` type.
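As a rough illustration of what such a type might look like, here is a minimal Rust sketch. It is a hypothetical stand-in rather than the contents of `algebra.rs`: it uses named fields, implements only addition, and spells out the by-reference overload by hand instead of using a macro.

```rust
use std::ops::Add;

/// Hypothetical stand-in for `Vec3`: three contiguous `f32`s.
#[repr(C)]
#[derive(Debug, Copy, Clone, PartialEq)]
struct Vec3 {
    x: f32,
    y: f32,
    z: f32,
}

impl Add for Vec3 {
    type Output = Vec3;
    fn add(self, rhs: Vec3) -> Vec3 {
        Vec3 { x: self.x + rhs.x, y: self.y + rhs.y, z: self.z + rhs.z }
    }
}

// The operator traits don't extend to borrows automatically, so `a + &b`
// needs its own implementation.
impl Add<&Vec3> for Vec3 {
    type Output = Vec3;
    fn add(self, rhs: &Vec3) -> Vec3 {
        self + *rhs
    }
}
```

With `#[repr(C)]` the struct occupies 12 bytes with 4-byte alignment on the Rust side. WGSL's `vec3f` is also 12 bytes in size but is aligned to 16 bytes, which becomes relevant once the vector is embedded in a larger structure.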
[^ch8-footnote1]: In Rust, operators get overloaded by implementing traits (`std::ops::Add`,
`std::ops::Sub`, `std::ops::Mul`, `std::ops::Div`, etc). The operator traits are parameterized on
value types (such as `fn add(self, rhs: Rhs) -> Self::Output` in `std::ops::Add`) and don't automatically
extend to invocations on borrows. For example, `a + b`, where `a` and `b` are both `Vec3`, is
different from `a + &b`, `&a + b`, and `&a + &b`. The `impl_binary_op` macro automatically implements
the traits for all of these combinations, for convenience.
Uniforms and Alignment
----------------------
We can use the uniform buffer to make the camera parameters visible to both the CPU and GPU sides of
the program. Let's define a new `CameraUniforms` structure that just stores the camera position:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
...
struct Uniforms {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
camera: CameraUniforms,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
width: u32,
height: u32,
frame_count: u32,
}
@group(0) @binding(0) var<uniform> uniforms: Uniforms;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
struct CameraUniforms {
origin: vec3f,
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
...
@fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f {
init_rng(vec2u(pos.xy));
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
let origin = uniforms.camera.origin;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
let focus_distance = 1.;
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [camera-uniforms-wgsl]: [shaders.wgsl] Camera origin stored in the uniform buffer]
We need to mirror these changes on the CPU side. Let's introduce a new Rust module called `camera`
for all camera related code.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
use bytemuck::{Pod, Zeroable};
use crate::algebra::Vec3;
#[derive(Debug, Copy, Clone, Pod, Zeroable)]
#[repr(C)]
pub struct CameraUniforms {
origin: Vec3,
}
pub struct Camera {
uniforms: CameraUniforms,
}
impl Camera {
pub fn new(origin: Vec3) -> Camera {
Camera {
uniforms: CameraUniforms { origin },
}
}
pub fn uniforms(&self) -> &CameraUniforms {
&self.uniforms
}
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [camera-module]: [camera.rs] The `camera` module]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...
mod algebra;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
mod camera;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
mod render;
const WIDTH: u32 = 800;
const HEIGHT: u32 = 600;
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [camera-module-decl]: [main.rs] Declare the `camera` module]
The module defines two structs: `CameraUniforms` and `Camera`. `CameraUniforms` is going to contain
only the state that will be shared with the GPU, while `Camera` is meant to be a higher level
wrapper that can contain additional variables. For now, the only state is the camera origin so the
type definition is pretty bare bones.
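Let's update our CPU-side `Uniforms` struct to mirror the GPU side by adding a `camera` field of
type `CameraUniforms` as its first member, and initialize it from a `Camera` in `PathTracer::new`.
We'll also reposition the camera origin (to `Vec3::new(0., -0.5, 1.)`) to verify our changes.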
When you run this code, wgpu should emit an API validation error that says "_Buffer is bound with
size 24 where the shader expects 32_." 24 bytes looks correct at first glance: 12 bytes for a `Vec3`
(4 bytes each for 3 `f32`s), and 3 `u32`s for the `width`, `height`, and `frame_count` fields, each
taking up 4 bytes. The error message says the shader declared a 32-byte struct, so where do the 8
missing bytes come from? The answer is _implicit padding_ inserted by WGSL to satisfy alignment
requirements.
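If you'd like to see where the 24 comes from on the Rust side, the following standalone sketch
reproduces the unpadded structs and prints their sizes. The `Vec3` here is a bare-bones stand-in for
the one in our `algebra` module, and the `bytemuck` derives are left out to keep the example free of
dependencies:

```rust
use std::mem::{align_of, size_of};

// Stand-in for the `Vec3` type: three contiguous `f32`s.
#[allow(dead_code)]
#[derive(Clone, Copy)]
#[repr(C)]
struct Vec3 {
    x: f32,
    y: f32,
    z: f32,
}

// The CPU-side structs as they look *before* any padding is added.
#[allow(dead_code)]
#[derive(Clone, Copy)]
#[repr(C)]
struct CameraUniforms {
    origin: Vec3,
}

#[allow(dead_code)]
#[derive(Clone, Copy)]
#[repr(C)]
struct Uniforms {
    camera: CameraUniforms,
    width: u32,
    height: u32,
    frame_count: u32,
}

fn main() {
    // With `repr(C)`, the struct alignment is the largest field alignment, which is 4 here.
    println!("align_of::<Uniforms>()      = {}", align_of::<Uniforms>());
    println!("size_of::<CameraUniforms>() = {}", size_of::<CameraUniforms>());
    println!("size_of::<Uniforms>()       = {}", size_of::<Uniforms>());
}
```

Every field in these structs only needs 4-byte alignment in Rust, so the compiler has no reason to
insert padding and the sizes simply add up to 12 and 24 bytes.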
Computers access memory more efficiently if the memory address of the accessed data is aligned to
certain multiples of the processor word size. WGSL defines specific rules for its scalar and vector
types[^ch8-footnote2] and it expects the memory layout of bound data structures to adhere to those
rules (see Table [scalar-and-vector-alignment]).
Type | Alignment | Size
:----:|:---------:|:----:
**u32, f32** | 4 | 4
**vec2** | 8 | 8
**vec3** | 16 | 12
**vec4** | 16 | 16
[Table [scalar-and-vector-alignment]: Alignment and data sizes for scalar and vector types.]
The alignment of a struct is equal to the largest alignment among its members. Each member is
placed at the next offset that satisfies its own alignment, and the size of the struct is the offset
just past its last member, rounded up to a multiple of the struct's alignment.
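To make the rule concrete, here is a small sketch that applies it to our two structs. The `Member`
type and the `struct_layout` helper are hypothetical names made up for this illustration. The sketch
places each member at the next offset that satisfies that member's alignment, and it ignores the
additional constraints that WGSL imposes on arrays and on the uniform address space:

```rust
/// Alignment and size of one struct member, in bytes.
#[derive(Clone, Copy)]
struct Member {
    align: u32,
    size: u32,
}

/// Rounds `n` up to the next multiple of `align` (`align` must be non-zero).
fn round_up(n: u32, align: u32) -> u32 {
    (n + align - 1) / align * align
}

/// Returns `(alignment, size)` of a struct made of the given members, in order.
fn struct_layout(members: &[Member]) -> (u32, u32) {
    let mut align = 1;
    let mut offset = 0;
    for m in members {
        align = align.max(m.align);
        // Move to an offset that satisfies the member's alignment, then step past it.
        offset = round_up(offset, m.align) + m.size;
    }
    (align, round_up(offset, align))
}

fn main() {
    let word = Member { align: 4, size: 4 }; // u32 or f32
    let vec3f = Member { align: 16, size: 12 };

    // struct CameraUniforms { origin: vec3f }
    let (cam_align, cam_size) = struct_layout(&[vec3f]);
    println!("CameraUniforms: align {}, size {}", cam_align, cam_size);

    // struct Uniforms { camera: CameraUniforms, width: u32, height: u32, frame_count: u32 }
    let camera = Member { align: cam_align, size: cam_size };
    let (align, size) = struct_layout(&[camera, word, word, word]);
    println!("Uniforms: align {}, size {}", align, size);
}
```

With the values from Table [scalar-and-vector-alignment], this reports 16 bytes for `CameraUniforms`
and 32 bytes for `Uniforms`, which is the size quoted in the validation error.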
Before our last change, the `Uniforms` struct had 4-byte alignment and occupied 12 bytes in size.
We introduced the `CameraUniforms` structure, which has a single member of type `vec3f` and
therefore 16-byte alignment. `vec3f` is 12 bytes in size, so the struct is _padded_ with 4 bytes
to bring its size up to 16. While WGSL does this implicitly, we need to explicitly add padding on
the Rust side.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...
#[derive(Debug, Copy, Clone, Pod, Zeroable)]
#[repr(C)]
pub struct CameraUniforms {
origin: Vec3,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
_pad: u32,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
}
impl Camera {
pub fn new(origin: Vec3) -> Camera {
Camera {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
uniforms: CameraUniforms { origin, _pad: 0 },
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
}
}
pub fn uniforms(&self) -> &CameraUniforms {
&self.uniforms
}
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [camera-uniforms-padded]: [camera.rs] `CameraUniforms` explicitly padded]
We also introduced a new member of type `CameraUniforms` to the `Uniforms` struct. That increased the
latter's alignment to 16 and brought its size up to 28 bytes. 28 is not a multiple of the new
alignment and the next closest multiple is 32. Therefore we need to pad `Uniforms` with 4 additional
bytes:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...
#[derive(Copy, Clone, Pod, Zeroable)]
#[repr(C)]
struct Uniforms {
camera: CameraUniforms,
width: u32,
height: u32,
frame_count: u32,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
_pad: u32,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
}
impl PathTracer {
pub fn new(
device: wgpu::Device,
queue: wgpu::Queue,
width: u32,
height: u32,
) -> PathTracer {
...
// Initialize the uniform buffer.
let camera = Camera::new(Vec3::new(0., -0.5, 1.));
let uniforms = Uniforms {
camera: *camera.uniforms(),
width,
height,
frame_count: 0,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
_pad: 0,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
};
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [uniforms-padded]: [render.rs] `Uniforms` explicitly padded]
The padding is currently wasted space but we will make use of it in the future. Running the program
should now pass validation and render an image that looks like this:
![Figure [repositioned-camera-origin]: Camera origin repositioned](../images/img-24-camera-origin-repositioned.png)
[^ch8-footnote2]: The alignment and size requirements for WGSL types are defined at
https://www.w3.org/TR/WGSL/#alignment-and-size.
Rotation
--------
We know how to reposition the camera but the view direction is still fixed towards $-z$.
Remember that we define the camera ray direction for each pixel in terms of a point
on an imaginary viewport (Figure [camera-view-space]). Conceptually, rotating the camera to
change the view direction is much like moving and rotating the viewport around the camera origin.
Let's imagine for a moment that the coordinate system depicted in Figure [camera-view-space] is
distinct from the coordinate space of the scene. This new _camera coordinate space_ has its own
$x$, $y$, and $z$ axes. The viewport is always parallel to the $xy$-plane and sits some distance
away on the $z$-axis. We can even define this coordinate system as left-handed so that the
view direction faces the $+z$-axis instead of $-z$.
Now imagine that the camera coordinate space exists within the scene coordinate space and it can
move around freely. Suppose that the camera coordinate axes can point towards any direction in
scene space as long as they satisfy the definition of our left-handed Cartesian system: **a)** the axes
are always orthogonal to each other (i.e. the angle between any two axes is 90 degrees), and **b)**
from the camera's point of view, $+x$ points towards the _right_, $+y$ points _up_, and $+z$ points
_forward_.
Let's define the scene-space orientation of the camera coordinate axes with 3 unit vectors:
$\vec{\textbf{u}}$ for $+x$, $\vec{\textbf{v}}$ for $+y$, and $\vec{\textbf{w}}$ for $+z$. These
are the _basis vectors_ of the camera coordinate space. For example, our current camera orientation
staring down the $-z$-axis of the scene coordinate space (with $+y$ pointing up), would
have the basis vectors $\vec{\textbf{u}} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$,
$\vec{\textbf{v}} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$,
$\vec{\textbf{w}} = \begin{bmatrix} 0 \\ 0 \\ -1 \end{bmatrix}$.
![Figure [camera-basis-vectors]: Camera basis vectors in relation to camera parameters
](../images/fig-12-camera-basis-vectors.svg)
These vectors establish a relationship between the two coordinate systems. Each basis vector
tells us how to project the corresponding camera-space axis onto the scene-space axes. With this
information, we can transform any vector defined in one space into the other. For example, we can
rotate a ray direction vector defined in camera-space into the appropriate scene-space orientation
by multiplying it by this matrix:
$$
\begin{bmatrix}
\textbf{u}.x & \textbf{v}.x & \textbf{w}.x \\
\textbf{u}.y & \textbf{v}.y & \textbf{w}.y \\
\textbf{u}.z & \textbf{v}.z & \textbf{w}.z
\end{bmatrix}
$$
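Multiplying by this matrix is the same as taking a linear combination of the basis vectors, weighted
by the camera-space coordinates. The following sketch spells that out with plain arrays. The `V3`
alias and the `camera_to_scene` function are hypothetical names used only for this illustration:

```rust
type V3 = [f32; 3];

/// Multiplies the matrix whose columns are `u`, `v`, and `w` by the camera-space vector `p`.
/// This is equivalent to the linear combination `p.x * u + p.y * v + p.z * w`.
fn camera_to_scene(u: V3, v: V3, w: V3, p: V3) -> V3 {
    [
        u[0] * p[0] + v[0] * p[1] + w[0] * p[2],
        u[1] * p[0] + v[1] * p[1] + w[1] * p[2],
        u[2] * p[0] + v[2] * p[1] + w[2] * p[2],
    ]
}

fn main() {
    // Basis of our current camera: right is +x, up is +y, forward is -z in scene space.
    let (u, v, w) = ([1., 0., 0.], [0., 1., 0.], [0., 0., -1.]);
    // A camera-space direction pointing to the right, up, and forward (+z in camera space).
    let direction = camera_to_scene(u, v, w, [0.5, 0.25, 1.]);
    // The forward component ends up along the scene's -z axis.
    println!("{:?}", direction);
}
```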
$\vec{\textbf{u}}$, $\vec{\textbf{v}}$, and $\vec{\textbf{w}}$ must be mutually orthogonal unit
vectors. Instead of specifying them directly, we will compute them from three parameters:
the camera origin, a reference point the camera should "look at", and an "up" direction. The
reference point will always appear at the center of the viewport. The normalized vector pointing
from the origin to this reference point is the view direction $\vec{\textbf{w}}$.
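The cross product of two vectors yields another vector that is orthogonal to the plane formed by the
original two, so once we know $\vec{\textbf{w}}$, we can compute the other two basis vectors using
a series of cross products. The "up" direction doesn't have to be exactly perpendicular to
$\vec{\textbf{w}}$ (it only must not be parallel to it), so we normalize the first product to obtain
a unit vector:
$$
\begin{aligned}
\vec{\textbf{u}} &= \frac{\vec{\textbf{w}} \times \vec{\textbf{up}}}
                         {\lVert \vec{\textbf{w}} \times \vec{\textbf{up}} \rVert} \\
\vec{\textbf{v}} &= \vec{\textbf{u}} \times \vec{\textbf{w}}
\end{aligned}
$$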
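As a quick sanity check, we can plug in our current orientation, with $\vec{\textbf{w}}$ pointing
down the scene's $-z$-axis and the "up" direction along $+y$. The products should reproduce the
basis vectors listed earlier:

$$
\begin{aligned}
\vec{\textbf{u}} &=
  \begin{bmatrix} 0 \\ 0 \\ -1 \end{bmatrix} \times \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} =
  \begin{bmatrix} 0 \cdot 0 - (-1) \cdot 1 \\ (-1) \cdot 0 - 0 \cdot 0 \\ 0 \cdot 1 - 0 \cdot 0 \end{bmatrix} =
  \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \\
\vec{\textbf{v}} &=
  \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \times \begin{bmatrix} 0 \\ 0 \\ -1 \end{bmatrix} =
  \begin{bmatrix} 0 \cdot (-1) - 0 \cdot 0 \\ 0 \cdot 0 - 1 \cdot (-1) \\ 1 \cdot 0 - 0 \cdot 0 \end{bmatrix} =
  \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}
\end{aligned}
$$

Both results already have unit length here because $\vec{\textbf{w}}$ and the "up" direction happen
to be perpendicular unit vectors.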
Let's start with the WGSL and extend the `CameraUniforms` structure to hold the basis
vectors in addition to the origin. We'll construct a 3x3 matrix out of the basis vectors and use
that to transform the ray which we currently compute in camera space. Note that the $z$-coordinate
of the camera ray direction no longer needs to be negative, since it's now defined with respect to
$\vec{\textbf{w}}$.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
...
struct CameraUniforms {
origin: vec3f,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
u: vec3f,
v: vec3f,
w: vec3f,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
}
...
@fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f {
init_rng(vec2u(pos.xy));
let origin = uniforms.camera.origin;
let focus_distance = 1.;
// Offset and normalize the viewport coordinates of the ray.
let offset = vec2(rand_f32() - 0.5, rand_f32() - 0.5);
var uv = (pos.xy + offset) / vec2f(f32(uniforms.width - 1u), f32(uniforms.height - 1u));
// Map `uv` from y-down (normalized) viewport coordinates to camera coordinates.
uv = (2. * uv - vec2(1.)) * vec2(aspect_ratio, -1.);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
// Compute the scene-space ray direction by rotating the camera-space vector into a new
// basis.
let camera_rotation = mat3x3(uniforms.camera.u, uniforms.camera.v, uniforms.camera.w);
let direction = camera_rotation * vec3(uv, focus_distance);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
var ray = Ray(origin, direction);
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [camera-rotation-matrix]: [shaders.wgsl] Ray direction rotated to camera basis]
Once again, we need to pay attention to the required alignment on the CPU side. `u`, `v`, and `w`
are declared as `vec3f` which must be aligned to an offset that's a multiple of 16. Since the size
of `vec3f` is 12, we need to insert padding after each member to fix the alignment of the next
member:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...
#[derive(Debug, Copy, Clone, Pod, Zeroable)]
#[repr(C)]
pub struct CameraUniforms {
origin: Vec3,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
_pad0: u32,
u: Vec3,
_pad1: u32,
v: Vec3,
_pad2: u32,
w: Vec3,
_pad3: u32,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
}
impl Camera {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
pub fn look_at(origin: Vec3, center: Vec3, up: Vec3) -> Camera {
let w = (center - origin).normalized();
let u = w.cross(&up).normalized();
let v = u.cross(&w);
Camera {
uniforms: CameraUniforms {
origin,
_pad0: 0,
u,
_pad1: 0,
v,
_pad2: 0,
w,
_pad3: 0,
},
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
}
pub fn uniforms(&self) -> &CameraUniforms {
&self.uniforms
}
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [camera-basis-vectors-cpu]: [camera.rs] Computing the camera basis vectors]
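It's easy to get this kind of manual padding wrong, so it can be worth checking the layout
mechanically. The sketch below is one way to do that, using a stand-in `Vec3` and a hypothetical
`byte_offset` helper (neither is part of our renderer). It asserts the total size at compile time
and measures the offset of each vector member:

```rust
use std::mem::size_of;

// Stand-in for the `Vec3` type: three contiguous `f32`s.
#[allow(dead_code)]
#[derive(Clone, Copy, Default)]
#[repr(C)]
struct Vec3 {
    x: f32,
    y: f32,
    z: f32,
}

#[allow(dead_code)]
#[derive(Clone, Copy, Default)]
#[repr(C)]
struct CameraUniforms {
    origin: Vec3,
    _pad0: u32,
    u: Vec3,
    _pad1: u32,
    v: Vec3,
    _pad2: u32,
    w: Vec3,
    _pad3: u32,
}

// Compilation fails if the struct is not the 64 bytes that four padded `vec3f`s occupy.
const _: () = assert!(size_of::<CameraUniforms>() == 64);

/// Byte offset of `field` within `base`. `field` must be a reference into `base`.
fn byte_offset<T, F>(base: &T, field: &F) -> usize {
    field as *const F as usize - base as *const T as usize
}

/// Offsets of `origin`, `u`, `v`, and `w`, in that order.
fn field_offsets() -> [usize; 4] {
    let c = CameraUniforms::default();
    [
        byte_offset(&c, &c.origin),
        byte_offset(&c, &c.u),
        byte_offset(&c, &c.v),
        byte_offset(&c, &c.w),
    ]
}

fn main() {
    // Each `vec3f` member should start at a multiple of 16.
    println!("field offsets: {:?}", field_offsets());
    println!("total size: {}", size_of::<CameraUniforms>());
}
```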
Finally, let's update the camera position and orientation to look towards the bottom of the small
sphere from above:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...
impl PathTracer {
pub fn new(
device: wgpu::Device,
queue: wgpu::Queue,
width: u32,
height: u32,
) -> PathTracer {
...
// Initialize the uniform buffer.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
let camera = Camera::look_at(
Vec3::new(0., 0.75, 1.),
Vec3::new(0., -0.5, -1.),
Vec3::new(0., 1., 0.),
);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
let uniforms = Uniforms {
camera: *camera.uniforms(),
width,
height,
frame_count: 0,
_pad: 0,
};
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [camera-new-position]: [render.rs] New camera position]
![Figure [camera-reoriented]: New camera orientation](../images/img-25-camera-look-at.png)
Zoom
----
Now let's start adding controls for camera movement. It is generally useful to be able to bring the
camera closer to (or away from) the object in view without changing the viewing angle. Imagine a
straight line through the `origin` and `center`
parameters of our `Camera::look_at` function. We can achieve a simple _zoom_ effect by moving the
camera forwards or backwards along this line. The basis vector $\vec{\textbf{w}}$ already gives us
the forward-facing direction on this line and it has unit length. Thus, computing the displacement
of the camera origin $\textbf{P}$ along this line by distance $d$ is straightforward:
$$
\begin{aligned}
\textbf{P}_{forward} &= \textbf{P} + d\,\vec{\textbf{w}} \\
\textbf{P}_{backward} &= \textbf{P} - d\,\vec{\textbf{w}}
\end{aligned}
$$
![Figure [orbit-camera-zoom-fig]: Moving the camera origin along the view direction
](../images/fig-13-orbit-camera-distance.svg)
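For example, take a camera positioned at $(0, 0, 1)$ that faces down the scene's $-z$-axis, so that
$\vec{\textbf{w}} = (0, 0, -1)$. These values are picked purely for illustration. A displacement of
$d = 0.25$ then works out to:

$$
\begin{aligned}
\textbf{P}_{forward} &=
  \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} + 0.25 \begin{bmatrix} 0 \\ 0 \\ -1 \end{bmatrix} =
  \begin{bmatrix} 0 \\ 0 \\ 0.75 \end{bmatrix} \\
\textbf{P}_{backward} &=
  \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} - 0.25 \begin{bmatrix} 0 \\ 0 \\ -1 \end{bmatrix} =
  \begin{bmatrix} 0 \\ 0 \\ 1.25 \end{bmatrix}
\end{aligned}
$$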
Let's implement this as a new function called `Camera::zoom`. This will take a single parameter
representing the displacement. Positive values will move the origin forward while negative values
will move it backwards:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...
impl Camera {
...
pub fn uniforms(&self) -> &CameraUniforms {
&self.uniforms
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
pub fn zoom(&mut self, displacement: f32) {
self.uniforms.origin += displacement * self.uniforms.w;
}
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [camera-fn-zoom]: [camera.rs] The `Camera::zoom` function]
The next step is to wire this up to an input method. I personally prefer the scroll wheel on a
mouse (or a scroll gesture on a trackpad) for zooming, so I'll show you how to implement that.
`winit` sends raw input device events in the form of an `Event::DeviceEvent`. This is an enum
type (just like `Event::WindowEvent`) and the specific variant for mouse wheel events is named
`DeviceEvent::MouseWheel`. The event has a parameter called `delta` which we can convert to a
displacement amount. There are two variants of this parameter:
- `MouseScrollDelta::PixelDelta`: represents the delta in "number of pixels", typically generated by
a touch screen or trackpad.
- `MouseScrollDelta::LineDelta`: represents the delta in terms of "lines in a text document",
typically corresponding to the discrete "clicks" of a mouse scroll wheel.
The variant you receive depends on your input device. It usually makes sense to apply a scaling
factor to this delta, since using it directly is likely to result in a very large displacement in
scene coordinates. I used factors of 0.001 and 0.1 for the two variants, respectively, though the
ideal factor is going to depend on your device and system settings. The `delta` value is _signed_,
with positive and negative values corresponding to scrolling up and down, which translates nicely
to our `displacement` parameter.
We are going to handle the mouse scroll event in our main event loop. The event loop code currently
doesn't have direct access to the `Camera` object, as it is internal to the `PathTracer`
constructor. `PathTracer::new` currently discards the camera object, retaining only the uniform data,
as the camera state has so far been static. In addition to retaining the camera state, we also need
a way to update the uniform buffer for changes to take effect before rendering a frame.
I'm going to suggest a simple refactor: let's decouple the `Camera` construction from the
`PathTracer` object and instead pass the camera as an argument to `PathTracer::render_frame`. We'll
simply always update the camera uniforms before rendering an individual frame:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...
use crate::{
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust delete
algebra::Vec3,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
camera::{Camera, CameraUniforms},
};
...
impl PathTracer {
pub fn new(
device: wgpu::Device,
queue: wgpu::Queue,
width: u32,
height: u32,
) -> PathTracer {
...
// Initialize the uniform buffer.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust delete
let camera = Camera::look_at(
Vec3::new(0., 0.75, 1.),
Vec3::new(0., -0.5, -1.),
Vec3::new(0., 1., 0.),
);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
let uniforms = Uniforms {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
camera: CameraUniforms::zeroed(),
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
width,
height,
frame_count: 0,
_pad: 0,
};
...
}
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
pub fn render_frame(&mut self, camera: &Camera, target: &wgpu::TextureView) {
self.uniforms.camera = *camera.uniforms();
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
self.uniforms.frame_count += 1;
self.queue
.write_buffer(&self.uniform_buffer, 0, bytemuck::bytes_of(&self.uniforms));
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [render-frame-with-camera]: [render.rs] `render_frame` with a `Camera` parameter]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
use {
anyhow::{Context, Result},
winit::{
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
event::{DeviceEvent, Event, MouseScrollDelta, WindowEvent},
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
event_loop::{ControlFlow, EventLoop},
window::{Window, WindowBuilder},
},
};
mod algebra;
mod camera;
mod render;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
use crate::{algebra::Vec3, camera::Camera};
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
const WIDTH: u32 = 800;
const HEIGHT: u32 = 600;
#[pollster::main]
async fn main() -> Result<()> {
...
let (device, queue, surface) = connect_to_gpu(&window).await?;
let mut renderer = render::PathTracer::new(device, queue, WIDTH, HEIGHT);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
let mut camera = Camera::look_at(
Vec3::new(0., 0.75, 1.),
Vec3::new(0., -0.5, -1.),
Vec3::new(0., 1., 0.),
);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
event_loop.run(|event, control_handle| {
control_handle.set_control_flow(ControlFlow::Poll);
match event {
Event::WindowEvent { event, .. } => match event {
WindowEvent::CloseRequested => control_handle.exit(),
WindowEvent::RedrawRequested => {
// Wait for the next available frame buffer.
let frame: wgpu::SurfaceTexture = surface
.get_current_texture()
.expect("failed to get current texture");
let render_target = frame
.texture
.create_view(&wgpu::TextureViewDescriptor::default());
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
renderer.render_frame(&camera, &render_target);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
frame.present();
window.request_redraw();
}
_ => (),
},
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
Event::DeviceEvent { event, .. } => match event {
DeviceEvent::MouseWheel { delta } => {
let delta = match delta {
MouseScrollDelta::PixelDelta(delta) => 0.001 * delta.y as f32,
MouseScrollDelta::LineDelta(_, y) => y * 0.1,
};
camera.zoom(delta);
}
_ => (),
},
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
_ => (),
}
})?;
Ok(())
}
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [mouse-wheel-event]: [main.rs] Updating the camera on mouse wheel events]
Now, run this code and use your trackpad or mouse wheel to scroll up and down. You should see some
movement but you should also see some "smudging" or "ghosting". This is what I get if I scroll back
and forth, pausing at different distances for a few seconds:
![Figure [smuged-zoom]: Ghosts of zoom levels past](../images/img-26-smudged-zoom.png)
This is the same effect that we saw in Figure [temporal-blur-effect], which is caused by temporal
accumulation. Moving the camera effectively invalidates all the samples we have collected up to that
point, as our cached radiance values only make sense for a specific camera configuration. The
simplest thing we can do is discard old samples whenever we mutate the camera. Luckily, this is
pretty easy to do: the code we added in Listing [sample-accumulation] already ignores old samples
when `uniforms.frame_count` has its initial value, so all we need to do is reset the frame count:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...
impl PathTracer {
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
pub fn reset_samples(&mut self) {
self.uniforms.frame_count = 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
pub fn render_frame(&mut self, camera: &Camera, target: &wgpu::TextureView) {
self.uniforms.camera = *camera.uniforms();
self.uniforms.frame_count += 1;
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [path-tracer-reset-frame]: [render.rs] `PathTracer::reset_samples`]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...
async fn main() -> Result<()> {
...
event_loop.run(|event, control_handle| {
control_handle.set_control_flow(ControlFlow::Poll);
match event {
Event::WindowEvent { event, .. } => match event {
...
},
Event::DeviceEvent { event, .. } => match event {
DeviceEvent::MouseWheel { delta } => {
let delta = match delta {
MouseScrollDelta::PixelDelta(delta) => 0.001 * delta.y as f32,
MouseScrollDelta::LineDelta(_, y) => y * 0.1,
};
camera.zoom(delta);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
renderer.reset_samples();
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
}
_ => (),
}
_ => (),
}
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [mouse-wheel-reset-samples]: [main.rs] Reset samples on camera zoom]
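To see why resetting the counter is enough, here is a CPU-side sketch of a single pixel. It assumes
that accumulation behaves like a running average which ignores the stored radiance on the first
frame after a reset. The `Accumulator` type is hypothetical and the details of the actual shader
code may differ:

```rust
/// Hypothetical per-pixel accumulator: a running sum of radiance samples and a frame counter.
struct Accumulator {
    sum: f32,
    frame_count: u32,
}

impl Accumulator {
    fn new() -> Self {
        Accumulator { sum: 0.0, frame_count: 0 }
    }

    /// Adds one sample and returns the running average. On the first frame after a
    /// reset the stored sum is discarded, which drops every stale sample.
    fn add_sample(&mut self, sample: f32) -> f32 {
        self.frame_count += 1;
        if self.frame_count == 1 {
            self.sum = 0.0;
        }
        self.sum += sample;
        self.sum / self.frame_count as f32
    }

    fn reset_samples(&mut self) {
        self.frame_count = 0;
    }
}

/// Accumulates 4 frames of a bright view (1.0), "moves the camera", then accumulates
/// 4 frames of a dark view (0.0) and returns the final average.
fn average_after_move(reset_on_move: bool) -> f32 {
    let mut pixel = Accumulator::new();
    for _ in 0..4 {
        pixel.add_sample(1.0);
    }
    if reset_on_move {
        pixel.reset_samples();
    }
    let mut average = 0.0;
    for _ in 0..4 {
        average = pixel.add_sample(0.0);
    }
    average
}

fn main() {
    println!("without reset: {}", average_after_move(false));
    println!("with reset:    {}", average_after_move(true));
}
```

Without the reset, the bright samples from the old view keep contributing to the average, which is
the ghosting we saw. With the reset, only samples from the new view remain.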
You should no longer see any artifacts when you zoom in and out:
![Figure [zooming-in-and-out]: (video) Zooming in and out
](../images/vid-02-zooming-in-and-out.mp4 autoplay muted loop)
Pan
---
The next camera movement type on our list is _pan_, which moves the camera left, right, up, or
down without changing the view direction. We're going to align these 4 directions to the basis
vectors $\vec{\mathbf{u}}$ and $\vec{\mathbf{v}}$ and displace the origin point on the 2D plane
that is perpendicular to the view direction.
![Figure [orbit-camera-pan]: Pan movement on the uv-plane.
](../images/fig-14-orbit-camera-pan.svg)
A new `Camera::pan` function will accept two delta values that represent displacement in two
dimensions ($\vec{\mathbf{u}}$ and $\vec{\mathbf{v}}$). Note that both of these values (`du` and
`dv`) can be negative:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...
impl Camera {
...
pub fn uniforms(&self) -> &CameraUniforms {
&self.uniforms
}
pub fn zoom(&mut self, displacement: f32) {
self.uniforms.origin += displacement * self.uniforms.w;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
pub fn pan(&mut self, du: f32, dv: f32) {
let pan = du * self.uniforms.u + dv * self.uniforms.v;
self.uniforms.origin += pan;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [camera-fn-pan]: [camera.rs] The `Camera::pan` function]
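Because the displacement is built only from $\vec{\mathbf{u}}$ and $\vec{\mathbf{v}}$, it always
lies in the plane perpendicular to the view direction. The sketch below checks this with plain
arrays. The `V3` alias and the `dot` and `pan_offset` functions are hypothetical helpers written for
this illustration:

```rust
type V3 = [f32; 3];

fn dot(a: V3, b: V3) -> f32 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

/// Displacement produced by panning `du` units along `u` and `dv` units along `v`.
fn pan_offset(u: V3, v: V3, du: f32, dv: f32) -> V3 {
    [
        du * u[0] + dv * v[0],
        du * u[1] + dv * v[1],
        du * u[2] + dv * v[2],
    ]
}

fn main() {
    // Basis of a camera that looks down the scene's -z axis.
    let (u, v, w) = ([1., 0., 0.], [0., 1., 0.], [0., 0., -1.]);
    // Pan to the right and slightly down.
    let offset = pan_offset(u, v, 0.3, -0.2);
    // The offset has no component along the view direction `w`.
    println!("offset = {:?}, offset . w = {}", offset, dot(offset, w));
}
```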
Let's continue using the mouse, this time translating its motion to camera movement.
`winit` sends the `DeviceEvent::MouseMotion` event with a 2D `delta` parameter that contains
the mouse displacement in $x$ and $y$ coordinates. Negative and positive values of the $x$ delta
correspond to left and right movement, respectively. Similarly, negative and positive values of the
$y$ delta correspond to movement up and down.
Note that the application will receive the `DeviceEvent::MouseMotion` event even without input focus.
Unless we explicitly control when the camera should and should not move, all mouse movement will
result in camera movement and reset the radiance samples. Bumping into the mouse while waiting for
a slow render to resolve can be annoying, so let's prevent accidents and require that the user hold
down a mouse button during movement. We can use the `DeviceEvent::Button` event to detect when a
mouse button gets pressed and released.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
async fn main() -> Result<()> {
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
let mut mouse_button_pressed = false;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
event_loop.run(|event, control_handle| {
control_handle.set_control_flow(ControlFlow::Poll);
match event {
Event::WindowEvent { event, .. } => match event {
...
},
Event::DeviceEvent { event, .. } => match event {
DeviceEvent::MouseWheel { delta } => {
let delta = match delta {
MouseScrollDelta::PixelDelta(delta) => 0.001 * delta.y as f32,
MouseScrollDelta::LineDelta(_, y) => y * 0.1,
};
camera.zoom(delta);
renderer.reset_samples();
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
DeviceEvent::MouseMotion { delta: (dx, dy) } => {
if mouse_button_pressed {
camera.pan(dx as f32 * 0.01, dy as f32 * -0.01);
renderer.reset_samples();
}
}
DeviceEvent::Button { state, .. } => {
// NOTE: If multiple mouse buttons are pressed, releasing any of them will
// set this to false.
mouse_button_pressed = state == ElementState::Pressed;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
_ => (),
}
_ => (),
}
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [mouse-motion-camera-pan]: [main.rs] Pan camera on click-and-drag]
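We apply a scale factor to adjust the mouse sensitivity and also flip the sign of `dy` so that
moving the mouse upwards pans the camera upwards. Note that `ElementState` needs to be added to the
`winit::event` imports at the top of `main.rs` for this code to compile.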
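As the comment in the listing points out, a single boolean loses track of things when more than one
button is held. If that bothers you, one possible alternative is to count the buttons that are
currently down. This is only a sketch: `ButtonTracker` is a hypothetical helper, and the
`ElementState` enum below is a local stand-in for `winit::event::ElementState` so that the example
compiles on its own:

```rust
/// Local stand-in for `winit::event::ElementState`.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum ElementState {
    Pressed,
    Released,
}

/// Tracks how many buttons are currently held down.
#[derive(Default)]
struct ButtonTracker {
    held: u32,
}

impl ButtonTracker {
    fn on_button(&mut self, state: ElementState) {
        match state {
            ElementState::Pressed => self.held += 1,
            // `saturating_sub` guards against a release whose press we never saw, such as
            // a button that was already down when the program started.
            ElementState::Released => self.held = self.held.saturating_sub(1),
        }
    }

    fn any_pressed(&self) -> bool {
        self.held > 0
    }
}

fn main() {
    let mut buttons = ButtonTracker::default();
    buttons.on_button(ElementState::Pressed); // first button down
    buttons.on_button(ElementState::Pressed); // second button down
    buttons.on_button(ElementState::Released); // one button is still held
    println!("after one release: {}", buttons.any_pressed());
    buttons.on_button(ElementState::Released);
    println!("after both releases: {}", buttons.any_pressed());
}
```

In the event loop, `buttons.any_pressed()` would take the place of `mouse_button_pressed`.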
![Figure [camera-pan]: (video) Pan camera with mouse movement](../images/vid-03-camera-pan.mp4 autoplay muted loop)
Orbit
-----
The zoom and pan controls let us move the `origin` point along the camera basis vectors without
changing the view direction. In order to freely look around objects in the scene, we need a way to
rotate the basis vectors.
The view direction $\vec{\mathbf{w}}$ is parallel to the vector that connects the `origin` and
`center` points. We can effectively re-orient $\vec{\mathbf{w}}$ by simply re-positioning these two
points with respect to each other. Keeping `origin` fixed while moving `center` would result in a
_first-person_ style camera (imagine shifting your gaze around you without moving).
Alternatively, keeping `center` fixed while moving `origin` around would appear as moving around
while facing the same stationary point. Both are valid approaches, though we're going to focus on
the latter.
Let's say that `origin` is allowed to move freely around `center` but we require that the distance
between the two points remain fixed. Now imagine a sphere that is centered at `center`, with a
radius equal to the distance between the two points. All possible positions of `origin` are then
located on the surface of this sphere.
![Figure [orbit-camera-angles]: The spherical coordinates of the camera origin, with azimuth angle $\theta$ and altitude angle $\phi$
](../images/fig-15-orbit-camera-angles.svg)
Given the sphere's `center` and its radius, we can represent any point on the surface of the sphere
using two angles: an _azimuth_ angle and an _altitude_ angle. These two angles help us define
the location of `origin` in terms of rotations around the coordinate axes. This is convenient, since
we can easily map mouse movement to changes in these angles, and use this representation to
move `origin` around `center`. With a little bit of trigonometry, we can compute $\vec{\mathbf{w}}$
from the two angles. If we also know the distance between the camera and `center`, `origin` can be
computed with a simple vector addition.
### Working With Spherical Coordinates
Let's expand the `Camera` struct with 4 new parameters:
- `center`: the point of camera focus, which serves as the center of rotation.
- `azimuth`: the azimuth angle $\theta$, defining rotation around the $y$-axis. Values can range
from $0$ to $2\pi$.
- `altitude`: the altitude angle $\phi$, defining rotation around the basis vector
$\vec{\mathbf{u}}$. We'll allow values to range from $-\frac{\pi}{2}$ to $\frac{\pi}{2}$ such that
$\sin\phi$ yields a $y$-coordinate ranging from $-1$ to $1$.
- `distance`: the distance between `center` and `origin`. This is assumed to be a positive, non-zero
value.
The last three are the _spherical coordinates_ of `origin`: ($\theta$, $\phi$, $d$). We're going to
define the coordinate system such that ($0$, $0$, $d$) corresponds to a view direction aligned with
the $-z$-axis, with the camera $d$ units away from `center`. Similarly, spherical coordinates
($0$, $\frac{\pi}{2}$, $1$) will have the view direction point down the $-y$-axis; with `center` at
the world origin, the camera would be located at the cartesian coordinates ($0$, $1$, $0$).
First, we'll rework the scene so that we can more easily observe rotations (the current scene is
symmetrical around the $y$-axis, so changes in azimuth would be difficult to see). Let's also reset
the camera:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...
async fn main() -> Result<()> {
...
let (device, queue, surface) = connect_to_gpu(&window).await?;
let mut renderer = render::PathTracer::new(device, queue, WIDTH, HEIGHT);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
let mut camera = Camera::look_at(
Vec3::new(0., 0., 1.),
Vec3::new(0., 0.,-1.),
Vec3::new(0., 1., 0.),
);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [camera-default-position]: [main.rs]]
![Figure [four-spheres-orbit]: Four spheres for reference](../images/img-27-four-spheres.png)
For now, we are going to do away with `Camera::look_at` and introduce a new constructor called
`Camera::with_spherical_coords`. This will compute the camera uniforms (i.e. the basis vectors and
the origin) from the new parameters. Since we will need to re-compute the camera uniforms whenever
the spherical coordinates change, let's factor that logic out into a helper called
`Camera::calculate_uniforms`:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...
pub struct Camera {
uniforms: CameraUniforms,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
center: Vec3,
up: Vec3,
distance: f32,
azimuth: f32,
altitude: f32,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
}
...
impl Camera {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust delete
pub fn look_at(origin: Vec3, center: Vec3, up: Vec3) -> Camera {
let w = (center - origin).normalized();
let u = w.cross(&up).normalized();
let v = u.cross(&w);
Camera {
uniforms: CameraUniforms {
origin,
_pad0: 0,
u,
_pad1: 0,
v,
_pad2: 0,
w,
_pad3: 0,
},
}
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
pub fn with_spherical_coords(
center: Vec3,
up: Vec3,
distance: f32,
azimuth: f32,
altitude: f32,
) -> Camera {
let mut camera = Camera {
uniforms: CameraUniforms::zeroed(),
center,
up,
distance,
azimuth,
altitude,
};
camera.calculate_uniforms();
camera
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
pub fn uniforms(&self) -> &CameraUniforms {
&self.uniforms
}
pub fn zoom(&mut self, displacement: f32) {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust delete
self.uniforms.origin += displacement * self.uniforms.w;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
self.distance = (self.distance - displacement).max(0.0); // Prevent negative distance
self.uniforms.origin = self.center - self.distance * self.uniforms.w;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
}
pub fn pan(&mut self, du: f32, dv: f32) {
let pan = du * self.uniforms.u + dv * self.uniforms.v;
self.uniforms.origin += pan;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
fn calculate_uniforms(&mut self) {
// TODO: calculate the correct w.
let w = Vec3::new(0., 0., -1.);
let origin = self.center - self.distance * w;
let u = w.cross(&self.up).normalized();
let v = u.cross(&w);
self.uniforms.origin = origin;
self.uniforms.u = u;
self.uniforms.v = v;
self.uniforms.w = w;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [camera-basis-vectors-cpu]: [camera.rs] `Camera::with_spherical_coords`]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...
async fn main() -> Result<()> {
...
let (device, queue, surface) = connect_to_gpu(&window).await?;
let mut renderer = render::PathTracer::new(device, queue, WIDTH, HEIGHT);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
let mut camera = Camera::with_spherical_coords(
Vec3::new(0., 0., -1.),
Vec3::new(0., 1., 0.),
2.,
0.,
0.,
);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [camera-default-position-spherical]: [main.rs] Default camera position with spherical coordinates]
### Altitude Control
We will use mouse movement to control the altitude and azimuth angles. We are already routing
`DeviceEvent::MouseMotion` events to `Camera::pan`, but we can reserve _left-click_ drag for
orbital movement and _right-click_ drag for pan. The `DeviceEvent::Button` event has a `button`
field that can be used to identify the mouse button that was pressed or released.
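One way to keep track of the held buttons is sketched below. To keep the snippet self-contained,
`ElementState` is a local stand-in for the windowing library's pressed/released state, and the
numeric button IDs are assumptions: the real values are platform-dependent, so check what your
system reports. `MouseState` and `DragMode` are hypothetical names introduced for illustration.

```rust
/// Stand-in for the windowing library's pressed/released state (assumption).
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum ElementState {
    Pressed,
    Released,
}

/// Hypothetical button IDs; the real values are platform-dependent.
pub const LEFT_BUTTON: u32 = 1;
pub const RIGHT_BUTTON: u32 = 3;

/// What a mouse drag should do, given the buttons currently held down.
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum DragMode {
    None,
    Orbit,
    Pan,
}

/// Remembers which mouse buttons are held between events.
#[derive(Default)]
pub struct MouseState {
    left_down: bool,
    right_down: bool,
}

impl MouseState {
    /// Call this for every button event.
    pub fn on_button(&mut self, button: u32, state: ElementState) {
        let down = state == ElementState::Pressed;
        match button {
            LEFT_BUTTON => self.left_down = down,
            RIGHT_BUTTON => self.right_down = down,
            _ => {} // Ignore any other button.
        }
    }

    /// Call this for every motion event to choose between orbit and pan.
    /// The left button takes precedence if both are held.
    pub fn drag_mode(&self) -> DragMode {
        if self.left_down {
            DragMode::Orbit
        } else if self.right_down {
            DragMode::Pan
        } else {
            DragMode::None
        }
    }
}
```

In the event loop, a motion event would then be forwarded to the orbit or pan handler depending on
what `drag_mode()` returns, and ignored when no button is held.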
Next, we'll define `Camera::orbit`. Let's initially ignore the azimuth angle. Vertical mouse
movement will modify the altitude angle while keeping it between $-\frac{\pi}{2}$ and $\frac{\pi}{2}$.
We won't allow the angle to increase or decrease beyond this range, so once the camera moves to one
of these extrema, it will stay there unless it is moved in the opposite direction. This will
disallow turning the scene "upside down" and spinning continuously.
Let's also update `Camera::calculate_uniforms` to compute $\vec{\mathbf{w}}$ using only the
altitude. Consider the unit-length vector pointing from `center` to `origin`, i.e.
$-\vec{\mathbf{w}}$. The $y$-coordinate of this vector is equal to $sin~\phi$. We're ignoring
azimuth, so we can simply assign $0$ to the $x$-coordinate, and $cos~\phi$ to the $z$-coordinate:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
use {
bytemuck::{Pod, Zeroable},
std::f32::consts::FRAC_PI_2,
};
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...
impl Camera {
...
pub fn uniforms(&self) -> &CameraUniforms {
&self.uniforms
}
pub fn zoom(&mut self, displacement: f32) {
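self.distance = (self.distance - displacement).max(0.0); // Prevent negative distance
self.uniforms.origin = self.center - self.distance * self.uniforms.w;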
}
pub fn pan(&mut self, du: f32, dv: f32) {
let pan = du * self.uniforms.u + dv * self.uniforms.v;
self.uniforms.origin += pan;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
pub fn orbit(&mut self, du: f32, dv: f32) {
self.altitude = (self.altitude + dv).clamp(-FRAC_PI_2, FRAC_PI_2);
self.calculate_uniforms();
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
fn calculate_uniforms(&mut self) {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
let w = {
let (y, z) = self.altitude.sin_cos();
-Vec3::new(0., y, z)
};
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
let origin = self.center - self.distance * w;
let u = w.cross(&self.up).normalized();
let v = u.cross(&w);
self.uniforms.origin = origin;
self.uniforms.u = u;
self.uniforms.v = v;
self.uniforms.w = w;
}
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [camera-orbit-altitude]: [camera.rs] `Camera::orbit`, altitude-only]
Now, when you hold down the left mouse button and move the camera around, you should see something
like this:
![Figure [camera-orbit-altitude]: (video) Adjust altitude angle with mouse movement
](../images/vid-04-camera-orbit-altitude.mp4 autoplay muted loop)
If you look carefully, you may notice that the green and yellow spheres swap places when the altitude
angle is at one of the extrema. At those angles (i.e. exactly at $-\frac{\pi}{2}$ and
$\frac{\pi}{2}$) the view vector $\vec{\textbf{w}}$ becomes parallel to the _up_ vector and their
cross product becomes zero. This causes both $\vec{\textbf{u}}$ and $\vec{\textbf{v}}$ to become
degenerate. A simple fix is to truncate the range by a small amount, so that the angle can be
close but never equal to $-\frac{\pi}{2}$ or $\frac{\pi}{2}$. This only works if _up_ is exactly
$(0, 1, 0)$ or $(0, -1, 0)$ and doesn't generalize to other directions:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...
impl Camera {
...
pub fn orbit(&mut self, du: f32, dv: f32) {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
const MAX_ALT: f32 = FRAC_PI_2 - 1e-6;
self.altitude = (self.altitude + dv).clamp(-MAX_ALT, MAX_ALT);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
self.calculate_uniforms();
}
...
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [camera-prevent-degenerate-altitude]: [camera.rs] Apply offset to altitude clamp]
That should fix the issue:
![Figure [camera-fixed-altitude-clamp]: (video) Fixed altitude clamp
](../images/vid-05-fixed-altitude-clamp.mp4 autoplay muted loop)
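To see why the exact extrema are a problem, the sketch below evaluates the un-normalized cross
product of $\vec{\textbf{w}}$ and _up_ for an altitude-only camera. It uses plain arrays rather
than the `Vec3` type, assumes _up_ is $(0, 1, 0)$, and the helper names are hypothetical. Note that
in `f32` arithmetic the result at `FRAC_PI_2` is usually a tiny non-zero vector rather than an exact
zero, and its sign may be flipped, which would be consistent with the spheres appearing to swap
places.

```rust
use std::f32::consts::FRAC_PI_2;

type V3 = [f32; 3];

/// Cross product of two 3D vectors.
pub fn cross(a: V3, b: V3) -> V3 {
    [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]
}

/// Euclidean length of a 3D vector.
pub fn length(a: V3) -> f32 {
    (a[0] * a[0] + a[1] * a[1] + a[2] * a[2]).sqrt()
}

/// Un-normalized `w x up` for an altitude-only camera, with up = (0, 1, 0).
pub fn w_cross_up(altitude: f32) -> V3 {
    let (y, z) = altitude.sin_cos();
    let w = [0.0, -y, -z];
    cross(w, [0.0, 1.0, 0.0])
}

/// The clamp limit used in the listing above.
pub const MAX_ALT: f32 = FRAC_PI_2 - 1e-6;
```

Comparing `length(w_cross_up(FRAC_PI_2))` with `length(w_cross_up(MAX_ALT))` shows the difference:
the former is vanishingly small, while the latter is small but comfortably large enough to
normalize, and its $x$-component keeps the same sign as it has at an altitude of $0$.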
### Azimuth Control
The same way we mapped vertical mouse movement (`dv`) to changes in altitude, we'll use horizontal
movement (`du`) to change the azimuth. We won't restrict the horizontal orbit the way we clamped the
altitude angle and instead permit orbiting in either direction indefinitely. It still makes sense
to keep the magnitude of the value below $2\pi$, since floating-point precision decreases with
large values.[^ch8-footnote3] Instead of clamping the value, though, we'll just let it wrap, so that
an azimuth angle of $3\pi$ results in a value of $\pi$.
In Rust, this is achieved with the arithmetic remainder operators `%` and `%=`. These support
floating-point numbers and retain the sign of the value that's on the left-hand side: for
example, if the azimuth starts at $0$ and the mouse moves left by $-\frac{5}{2}\pi$ (i.e. -450
degrees), the resulting angle will be $-\frac{1}{2}\pi$:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...
impl Camera {
...
pub fn orbit(&mut self, du: f32, dv: f32) {
const MAX_ALT: f32 = FRAC_PI_2 - 1e-6;
self.altitude = (self.altitude + dv).clamp(-MAX_ALT, MAX_ALT);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
self.azimuth += du;
self.azimuth %= 2. * std::f32::consts::PI;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
self.calculate_uniforms();
}
...
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [camera-orbit-azimuth]: [camera.rs] Modifying the azimuth angle]
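The wrapping behavior is easy to check in isolation. The sketch below uses two hypothetical
helpers: `wrap_signed` mirrors the `%` approach from the listing, while `wrap_positive` shows
`f32::rem_euclid` as an alternative (not used in this chapter) for cases where a non-negative
angle is preferred.

```rust
use std::f32::consts::PI;

/// Wraps with `%`: the result keeps the sign of `angle`, so its magnitude
/// stays below 2π.
pub fn wrap_signed(angle: f32) -> f32 {
    angle % (2.0 * PI)
}

/// Alternative: a non-negative result in approximately [0, 2π).
/// Rounding may occasionally produce a value equal to 2π itself.
pub fn wrap_positive(angle: f32) -> f32 {
    angle.rem_euclid(2.0 * PI)
}
```

With these, `wrap_signed(-2.5 * PI)` comes out close to $-\frac{1}{2}\pi$ and `wrap_signed(3.0 * PI)`
close to $\pi$, whereas `wrap_positive(-2.5 * PI)` lands near $\frac{3}{2}\pi$. Both represent the
same direction, so either convention works for the orbit camera.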
We assigned the cosine of the altitude angle (i.e. $cos ~ \phi$) to the $z$-coordinate of
$-\vec{\textbf{w}}$. More generally, this quantity is equal to the length of the vector that results
from projecting $-\vec{\textbf{w}}$ onto the $xz$-plane (see Figure [orbit-camera-angles]). We can
compute the $x$ and $z$ components of this vector from the azimuth angle as
$(sin ~ \theta, 0, cos ~ \theta)$ scaled by the magnitude $cos ~ \phi$. Combining this with
$sin ~ \phi$ for the $y$-coordinate we get:
$$
-\vec{\textbf{w}} =
\begin{bmatrix}
sin~\theta \cdot cos~\phi \\
sin~\phi \\
cos~\theta \cdot cos~\phi
\end{bmatrix}
$$
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...
impl Camera {
...
fn calculate_uniforms(&mut self) {
let w = {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
let (y, xz_scale) = self.altitude.sin_cos();
let (x, z) = self.azimuth.sin_cos();
-Vec3::new(x * xz_scale, y, z * xz_scale)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
};
let origin = self.center - self.distance * w;
let u = w.cross(&self.up).normalized();
let v = u.cross(&w);
self.uniforms.origin = origin;
self.uniforms.u = u;
self.uniforms.v = v;
self.uniforms.w = w;
}
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [camera-orbit-azimuth-w]: [camera.rs] Computing $\vec{\textbf{w}}$ from azimuth and altitude]
We can now rotate the camera horizontally and vertically around the center point:
![Figure [camera-orbit-azimuth]: (video) Horizontal rotation](../images/vid-06-camera-orbit-azimuth.mp4 autoplay muted loop)
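As a quick sanity check of the formula, the following sketch evaluates it with plain arrays in place
of `Vec3`. The function names are hypothetical and exist only for this example.

```rust
/// Unit vector from `center` towards `origin` (i.e. -w), angles in radians.
pub fn neg_w(azimuth: f32, altitude: f32) -> [f32; 3] {
    let (y, xz_scale) = altitude.sin_cos();
    let (x, z) = azimuth.sin_cos();
    [x * xz_scale, y, z * xz_scale]
}

/// Camera origin: `center` offset by `distance` along -w.
pub fn orbit_origin(
    center: [f32; 3],
    distance: f32,
    azimuth: f32,
    altitude: f32,
) -> [f32; 3] {
    let d = neg_w(azimuth, altitude);
    [
        center[0] + distance * d[0],
        center[1] + distance * d[1],
        center[2] + distance * d[2],
    ]
}
```

For the camera configured in `main.rs` (center $(0, 0, -1)$, distance $2$, both angles $0$), this
places the origin at $(0, 0, 1)$, the same position that was passed to `Camera::look_at` before the
switch to spherical coordinates. Setting the azimuth to $\frac{\pi}{2}$ instead moves the origin to
approximately $(2, 0, -1)$, i.e. a quarter turn around `center`.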
[^ch8-footnote3]: As a floating-point number gets larger in magnitude, its absolute precision goes
down: the mantissa has a fixed number of bits, so fewer of them are left to represent the fractional
part. This causes the smallest representable _increments_, also known as ULP or "Unit of Least
Precision", to get larger. If you allow the azimuth angle to get arbitrarily large, you may find
that the same increment in `du` results in a noticeably different (for example, much faster) camera
rotation.
### `Camera::look_at`
The new representation is in terms of spherical angles because this is convenient for computing
movement over a sphere. Often we'll have a particular position in mind for the camera so it's nice
to have a `Camera::look_at` function that takes an explicit camera origin. Let's bring it back and
redefine it using the `with_spherical_coords` function. We can compute the altitude and azimuth
angles from the `origin`, `center`, and `up` parameters:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
impl Camera {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
pub fn look_at(origin: Vec3, center: Vec3, up: Vec3) -> Camera {
let center_to_origin = origin - center;
let distance = center_to_origin.length().max(0.01); // Prevent distance of 0
let neg_w = center_to_origin.normalized();
let azimuth = neg_w.x().atan2(neg_w.z());
let altitude = neg_w.y().asin();
Self::with_spherical_coords(center, up, distance, azimuth, altitude)
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
pub fn with_spherical_coords(
center: Vec3,
up: Vec3,
distance: f32,
azimuth: f32,
altitude: f32,
) -> Camera {
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [camera-look-at-returns]: [camera.rs] Updated `Camera::look_at`]
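A round trip is a simple way to convince yourself that the two representations agree. The sketch
below converts an origin to spherical coordinates with the same `atan2` and `asin` expressions as
above, then rebuilds the origin from them. It uses plain arrays instead of `Vec3`, the function
names are hypothetical, and the `clamp` before `asin` is an extra safeguard added here against
rounding, not something the listing does.

```rust
/// Returns (distance, azimuth, altitude) of `origin` relative to `center`.
/// Assumes the two points are distinct.
pub fn to_spherical(origin: [f32; 3], center: [f32; 3]) -> (f32, f32, f32) {
    let d = [
        origin[0] - center[0],
        origin[1] - center[1],
        origin[2] - center[2],
    ];
    let distance = (d[0] * d[0] + d[1] * d[1] + d[2] * d[2]).sqrt();
    let n = [d[0] / distance, d[1] / distance, d[2] / distance];
    let azimuth = n[0].atan2(n[2]);
    // Guard against rounding pushing |y| slightly above 1, where asin is NaN.
    let altitude = n[1].clamp(-1.0, 1.0).asin();
    (distance, azimuth, altitude)
}

/// Inverse mapping: rebuilds `origin` from the spherical coordinates.
pub fn from_spherical(
    center: [f32; 3],
    distance: f32,
    azimuth: f32,
    altitude: f32,
) -> [f32; 3] {
    let (y, xz_scale) = altitude.sin_cos();
    let (x, z) = azimuth.sin_cos();
    [
        center[0] + distance * x * xz_scale,
        center[1] + distance * y,
        center[2] + distance * z * xz_scale,
    ]
}
```

Feeding the output of `to_spherical` back into `from_spherical` should reproduce the original
`origin` up to floating-point error. For example, an origin of $(0, 0, 1)$ with a center of
$(0, 0, -1)$ maps to a distance of $2$ with both angles equal to $0$.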
(insert acknowledgments.md.html here)
References
==========
[#Marsaglia03]: George Marsaglia, [*Xorshift RNGs*](https://www.jstatsoft.org/article/download/v008i14/916), 2003
[#Jenkins13]: Bob Jenkins, [*A hash function for hash Table lookup*](https://www.burtleburtle.net/bob/hash/doobs.html), 2013
[#Hughes13]: J.F. Hughes, A. van Dam, M. McGuire, D.F. Sklar, J.D. Foley, S.K. Feiner, K. Akeley *Computer Graphics: Principles and Practice, 3rd Edition, Section 1.6*
"Beyond White Noise for Real-Time Rendering", Alan Wolfe (2024) https://youtu.be/tethAU66xaA?si=qIPEwF5XTm8kO3tF
[#Immel86]: David S. Immel, Michael F. Cohen, Donald P. Greenberg *A Radiosity Method For Non-Diffuse Environments*
[#Kajiya86]: James T. Kajiya *The Rendering Equation*, 1986
[#Lambert1760]: Johann Heinrich Lambert, *Photometria sive de mensura et gradibus luminis, colorum et umbrae*, 1760. Courtesy of ETH-Bibliothek Zürich, Switzerland.
[#McGuire2024GraphicsCodex]: Morgan McGuire, *The Graphics Codex*, 2024
[^ericson]: C. Ericson, Real Time Collision Detection
[^mcguire-codex]: https://graphicscodex.courses.nvidia.com/app.html
[Arman Uguray]: https://github.com/armansito
[Steve Hollasch]: https://github.com/hollasch
[Trevor David Black]: https://github.com/trevordblack
[RTIOW]: https://raytracing.github.io/books/RayTracingInOneWeekend.html
[RTTROYL]: https://raytracing.github.io/books/RayTracingTheRestOfYourLife.html
[rt-project]: https://github.com/RayTracing/
[gt-project]: https://github.com/RayTracing/gpu-tracing/
[gt-template]: https://github.com/RayTracing/gpu-tracing/blob/dev/code/template
[discussions]: https://github.com/RayTracing/gpu-tracing/discussions/
[dxr]: https://en.wikipedia.org/wiki/DirectX_Raytracing
[vkrt]: https://www.khronos.org/blog/ray-tracing-in-vulkan
[rtiow-cuda]: https://developer.nvidia.com/blog/accelerated-ray-tracing-cuda/
[webgpu]: https://www.w3.org/TR/webgpu/
[Rust]: https://www.rust-lang.org/
[rust-unsafe]: https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html
[wgpu]: https://wgpu.rs