**Ray Tracing: GPU Edition**
 [Arman Uguray][]
Draft
!!! WARNING
    This is a living document for a work in progress. Please bear in mind that the contents will
    change frequently and go through many edits before the final version.

Introduction
====================================================================================================

_Ray Tracing_ is a rendering method in Computer Graphics that simulates the flow of light. It can
faithfully recreate a variety of optical phenomena and can be used to render photorealistic images.
_Path tracing_ is an application of this approach used to compute _Global Illumination_. Its core
idea is to repeatedly trace millions of random rays through the scene and bounce them off objects
based on surface properties. The algorithm is remarkably simple and relatively easy to implement
when applied to a small number of material and geometry types. Peter Shirley's
[_Ray Tracing In One Weekend_][RTIOW] (RTIOW) is a great introduction to building the foundation
for a hobby renderer.

A challenge with path tracing is its high computational cost. Rendering a scene takes a long time,
and it only gets worse as the scene grows more complex. This has historically made path tracing
unsuitable for real-time applications. Fortunately -- like many problems in Computer Graphics --
the algorithm lends itself very well to parallelism. It is possible to achieve a significant
speedup by distributing the work across many processor cores.

The GPU (Graphics Processing Unit) is a type of processor designed to run the same set of
operations over large amounts of data in parallel. This parallelism has been instrumental to
achieving realistic visuals in real-time applications like video games. GPUs have traditionally
been used to accelerate scanline rasterization but have since become programmable and capable of
running a variety of parallel workloads. Notably, modern GPUs are now equipped with hardware cores
dedicated to ray tracing.

GPUs aren't without limitations. Programming a GPU requires a different approach than a typical CPU
program. Taking full advantage of a GPU often involves careful tuning based on its architecture and
capabilities, which can vary widely across vendors and models. Rendering fully path-traced scenes
at real-time rates remains elusive even on the most high-end GPUs. This is an active and vibrant
area of Computer Graphics research.

This book is an introduction to GPU programming by building a simple GPU-accelerated path tracer.
We'll focus on building a renderer that can produce high quality and correct images using a fairly
simple design. It won't be full-featured and its performance will be limited; however, it will
expose you to several fundamental GPU programming concepts. By the end, the renderer you'll have
built can serve as a great starting point for extensions and experiments with more advanced GPU
techniques. We will avoid most optimizations in favor of simplicity but the renderer will be able
to achieve interactive frame rates on a decent GPU when targeting simple scenes.[^ch1] The
accompanying code intentionally avoids hardware ray tracing APIs that are present on newer GPU
models, instead focusing on implementing the same functionality on a programmable GPU unit using a
shading language.

This book follows a similar progression to [_Ray Tracing In One Weekend_][RTIOW]. It covers some of
the same material but I highly recommend completing _RTIOW_ before embarking on building the GPU
version.
Doing so will teach you the path tracing algorithm in a much more approachable way and it will make
you appreciate both the advantages and challenges of moving to a GPU-based architecture.

If you run into any problems with your implementation, have general questions or corrections, or
would like to share your own ideas or work, check out [the GitHub Discussions forum][discussions].

[^ch1]: A BVH-accelerated implementation can render a version of the RTIOW cover scene with
~32,000 spheres, 16 ray bounces per pixel, and a resolution of 2048x1536 on a 2022 _Apple M1 Max_
in 15 milliseconds. The same renderer performs very poorly on a 2019 _Intel UHD Graphics 630_ which
takes more than 200ms to render a single sample.

GPU APIs
--------

Interfacing with a GPU and writing programs for it typically requires the use of a special API.
This interface depends on your operating system and GPU vendor, and you often have several options
depending on the capabilities you need. For example, an application that wants to get the most
juice out of an NVIDIA GPU for general purpose computations may choose to target CUDA. A developer
who prefers broad hardware compatibility for a graphical mobile game may choose OpenGL ES or
Vulkan. Direct3D (D3D) is the main graphics API on Microsoft platforms while Metal is the preferred
framework on Apple systems. Vulkan, D3D12, and Metal all support an API specifically to accelerate
ray tracing.

You can implement this book using any API or framework that you prefer, though I generally assume
you are working with a graphics API. In my examples I use an API based on [WebGPU][webgpu], which I
think maps well to all modern graphics APIs, so the code examples should be easy to adapt to them.
I avoid using ray tracing APIs (such as [DXR][dxr] or [Vulkan Ray Tracing][vkrt]) to show you how
to implement similar functionality on your own. If you're looking to implement this in CUDA, you
may also be interested in Roger Allen's [blog post][rtiow-cuda] titled _Accelerated Ray Tracing in
One Weekend in CUDA_.

Example Code
------------

Like _RTIOW_, you'll find code examples throughout the book. I use [Rust][] as the implementation
language but you can choose any language that supports your GPU API of choice. I avoid most
esoteric aspects of Rust to keep the code easily understandable to a large audience. On the few
occasions where I had to resort to a potentially unfamiliar Rust-ism, I provide a C example to add
clarity.

I provide the finished source code for this book on [GitHub][gt-project] as a reference but I
encourage you to type in your own code. I also provide a minimal source template that you can use
as a starting point if you want to follow along in Rust. The template provides a small amount of
setup code for the windowing logic to help get you started.

### A note on Rust, Libraries, and APIs

I chose Rust for this project because of its ease of use and portability. It is also the language
that I tend to be most productive in. An important aspect of Rust is that a lot of common
functionality is provided by libraries outside its standard library. I tried to avoid external
dependencies as much as possible except for the following:

* I use *[wgpu][]* to interact with the GPU. This is a native graphics API based on WebGPU. It's
  portable and allows the example code to run on Vulkan, Metal, Direct3D 11/12, OpenGL ES 3.1, as
  well as WebGPU and WebGL via WebAssembly. wgpu also has
  [native bindings in other languages](https://github.com/gfx-rs/wgpu-native).
* I use [*winit*](https://docs.rs/winit/latest/winit/) which is a portable windowing library. It's
  used to display the rendered image in real-time and to make the example code interactive.

* For ease of Rust development I use [*anyhow*](https://docs.rs/anyhow/latest/anyhow/) and
  [*bytemuck*](https://docs.rs/bytemuck/latest/bytemuck/). *anyhow* is a popular error handling
  utility and integrates seamlessly. *bytemuck* provides a safe abstraction for the equivalent of
  `reinterpret_cast` in C++, which normally requires [`unsafe`][rust-unsafe] Rust. It's used to
  bridge CPU data types with their GPU equivalents.

* Lastly, I use [*pollster*](https://docs.rs/pollster/latest/pollster/) to execute asynchronous
  wgpu API functions (it's only needed in a single place).

[wgpu][] is the most important dependency as it defines how the example code interacts with the
GPU. Every GPU API is different but their abstractions for the general concepts used in this book
are fairly similar. I will highlight these differences occasionally where they matter.

A large portion of the example code runs on the GPU. Every graphics API defines a programming
language -- a so-called **shading language** -- for authoring GPU programs. Since wgpu is based on
WebGPU, my GPU code examples are written in the *WebGPU Shading Language* (WGSL)[^ch1.2.1].

I also recommend keeping the following references handy while you're developing:

* wgpu API documentation (version 0.19.1): https://docs.rs/wgpu/0.19.1/wgpu
* WebGPU specification: https://www.w3.org/TR/webgpu
* WGSL specification: https://www.w3.org/TR/WGSL

With all of that out of the way, let's get started!

[^ch1.2.1]: wgpu also supports shaders in the
[SPIR-V](https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html) binary format. You could in
theory write your shaders in a shading language that can compile to SPIR-V (such as OpenGL's GLSL
and Direct3D's HLSL) as long as you avoid any language features that can't be expressed in WGSL.

Windowing and GPU Setup
====================================================================================================

The first thing to decide is how you want to view your image. One option is to write the output
from the GPU to a file. I think a more fun option is to display the image inside an application
window. I prefer this approach because it allows you to see your rendering as it resolves over time
and it will allow you to make your application interactive later on. The downside is that it
requires a little bit of wiring.

First, your program needs a way to interact with your operating system to create and manage a
window. Next, you need a way to coordinate your GPU workloads to output a sequence of images at the
right time for your OS to be able to composite them inside the window and send them to your
display.

Every operating system with a graphical UI provides a native *windowing API* for this purpose.
Graphics APIs typically define some way to integrate with a windowing system. You'll have various
libraries to choose from depending on your OS and programming language. You mainly need to make
sure that the windowing API or UI toolkit you choose can integrate with your graphics API. In my
examples I use *winit* which is a Rust framework that integrates smoothly with wgpu.

I put together a [project template][gt-template] that sets up the library boilerplate for the
window handling. You're welcome to use it as a starting point. The setup code isn't a lot, so I'll
briefly go over the important pieces in this chapter.
The Event Loop
--------------

The first thing the template does is create a window and associate it with an *event loop*. The OS
sends a message to the application during important "events" that the application should act on,
such as a mouse click or when the window gets resized. Your application can wait for these events
and handle them as they arrive by looping indefinitely:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
use {
    anyhow::{Context, Result},
    winit::{
        event::{Event, WindowEvent},
        event_loop::{ControlFlow, EventLoop},
        window::{Window, WindowBuilder},
    },
};

const WIDTH: u32 = 800;
const HEIGHT: u32 = 600;

fn main() -> Result<()> {
    let event_loop = EventLoop::new()?;
    let window_size = winit::dpi::PhysicalSize::new(WIDTH, HEIGHT);
    let window = WindowBuilder::new()
        .with_inner_size(window_size)
        .with_resizable(false)
        .with_title("GPU Path Tracer".to_string())
        .build(&event_loop)?;

    // TODO: initialize renderer

    event_loop.run(|event, control_handle| {
        control_handle.set_control_flow(ControlFlow::Poll);
        match event {
            Event::WindowEvent { event, .. } => match event {
                WindowEvent::CloseRequested => control_handle.exit(),
                WindowEvent::RedrawRequested => {
                    // TODO: draw frame
                    window.request_redraw();
                }
                _ => (),
            },
            _ => (),
        }
    })?;

    Ok(())
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [main-initial]: [main.rs] Creating a window and handling window events]

This code creates a window titled "GPU Path Tracer" and kicks off an event loop.
`event_loop.run()` internally waits for window events and notifies your application by calling the
lambda function that it gets passed as an argument. The lambda function only handles a few events
for now. The most important one is `RedrawRequested`, which is the signal to render and present a
new frame. We call `window.request_redraw()` at the end of that handler, which schedules another
`RedrawRequested` event once the current one has been processed, which requests another redraw,
and so on until someone closes the window. Running this code should bring up an empty window like
this:

![Figure [empty-window]: Empty Window](../images/img-01-empty-window.png)

GPU and Surface Initialization
------------------------------

The next thing the template does is establish a connection to the GPU and configure a surface. The
surface manages a set of *textures* that allow the GPU to render inside the window.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
async fn connect_to_gpu(window: &Window) -> Result<(wgpu::Device, wgpu::Queue, wgpu::Surface)> {
    use wgpu::TextureFormat::{Bgra8Unorm, Rgba8Unorm};

    // Create an "instance" of wgpu. This is the entry-point to the API.
    let instance = wgpu::Instance::default();

    // Create a drawable "surface" that is associated with the window.
    let surface = instance.create_surface(window)?;

    // Request a GPU that is compatible with the surface. If the system has multiple GPUs then
    // pick the high performance one.
    let adapter = instance
        .request_adapter(&wgpu::RequestAdapterOptions {
            power_preference: wgpu::PowerPreference::HighPerformance,
            force_fallback_adapter: false,
            compatible_surface: Some(&surface),
        })
        .await
        .context("failed to find a compatible adapter")?;

    // Connect to the GPU. "device" represents the connection to the GPU and allows us to create
    // resources like buffers, textures, and pipelines. "queue" represents the command queue that
    // we use to submit commands to the GPU.
    let (device, queue) = adapter
        .request_device(&wgpu::DeviceDescriptor::default(), None)
        .await
        .context("failed to connect to the GPU")?;

    // Configure the texture memory that backs the surface. Our renderer will draw to a surface
    // texture every frame.
    let caps = surface.get_capabilities(&adapter);
    let format = caps
        .formats
        .into_iter()
        .find(|it| matches!(it, Rgba8Unorm | Bgra8Unorm))
        .context("could not find preferred texture format (Rgba8Unorm or Bgra8Unorm)")?;
    let size = window.inner_size();
    let config = wgpu::SurfaceConfiguration {
        usage: wgpu::TextureUsages::RENDER_ATTACHMENT,
        format,
        width: size.width,
        height: size.height,
        present_mode: wgpu::PresentMode::AutoVsync,
        alpha_mode: caps.alpha_modes[0],
        view_formats: vec![],
        desired_maximum_frame_latency: 3,
    };
    surface.configure(&device, &config);

    Ok((device, queue, surface))
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [connect-to-gpu]: [main.rs] The connect_to_gpu function]

The code that sets this all up is a bit wordy. I'll quickly go over the important bits:

1. The first ~20 lines request a connection to a GPU that is compatible with the window. The bit
   about `wgpu::PowerPreference::HighPerformance` is a hint to the API that we want the
   higher-powered GPU if the current system has more than one available.

2. The rest of the function configures the dimensions, pixel format, and presentation mode of the
   surface. `Rgba8Unorm` and `Bgra8Unorm` are common pixel formats that store each color component
   (red, green, blue, and alpha) as an 8-bit unsigned integer. The "unorm" part stands for
   "unsigned normalized", which means that our rendering code can represent the component values
   as real numbers in the range `[0.0, 1.0]`. We set the size to simply span the entire window.
   The bit about `wgpu::PresentMode::AutoVsync` tells the surface to synchronize the presentation
   of each frame with the display's refresh rate. The surface will manage an internal queue of
   textures for us and we will render to them as they become available. This prevents a visual
   artifact known as "tearing" (which can happen when frames get presented faster than the display
   refresh rate) by setting up the renderer to be *v-sync locked*. We will discuss some of the
   implications of this later on.

The last bit that I'll highlight here is `wgpu::TextureUsages::RENDER_ATTACHMENT`. This just
indicates that we are going to use the GPU's rendering function to draw directly into the surface
textures.

After setting all this up the function returns 3 objects: a `wgpu::Device` that represents the
connection to the GPU, a `wgpu::Queue` which we'll use to issue commands to the GPU, and a
`wgpu::Surface` that we'll use to present frames to the window. We will talk a lot about the first
two when we start putting together our renderer in the next chapter.

You may have noticed that the function declaration begins with `async`. This marks the function as
*asynchronous* which means that it doesn't return its result immediately. This is only necessary
because the API functions that we invoke (`wgpu::Instance::request_adapter` and
`wgpu::Adapter::request_device`) are asynchronous functions. The `.await` keyword is syntactic
sugar that makes the asynchronous calls appear like regular (synchronous) function calls.
What happens under the hood is somewhat complex but I wouldn't worry about this too much since this
is the one and only bit of asynchronous code that we will encounter. If you want to learn more
about it, I recommend checking out the [Rust Async Book](https://rust-lang.github.io/async-book/).

### Completing Setup

Finally, the `main` function needs a couple updates: first we make it `async` so that we can
"await" on `connect_to_gpu`. Technically the `main` function of a program cannot be async and
running an async function requires some additional utilities. There are various alternatives but I
chose to use a library called `pollster`. The library provides a special macro (called `main`) that
takes care of everything. Again, this is the only asynchronous code that we'll encounter so don't
worry about what it does.

The second change to the main function is where it handles the `RedrawRequested` event. For every
new frame, we first request the next available texture from the surface that we just created. The
queue has a limited number of textures available. If the CPU outpaces the GPU (i.e. the GPU takes
longer than a display refresh cycle to finish its tasks), then calling
`surface.get_current_texture()` can block until a texture becomes available.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
#[pollster::main]
async fn main() -> Result<()> {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
    let event_loop = EventLoop::new()?;
    let window_size = winit::dpi::PhysicalSize::new(WIDTH, HEIGHT);
    let window = WindowBuilder::new()
        .with_inner_size(window_size)
        .with_resizable(false)
        .with_title("GPU Path Tracer".to_string())
        .build(&event_loop)?;

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
    let (device, queue, surface) = connect_to_gpu(&window).await?;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust

    // TODO: initialize renderer

    event_loop.run(|event, control_handle| {
        control_handle.set_control_flow(ControlFlow::Poll);
        match event {
            Event::WindowEvent { event, .. } => match event {
                WindowEvent::CloseRequested => control_handle.exit(),
                WindowEvent::RedrawRequested => {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
                    // Wait for the next available frame buffer.
                    let frame: wgpu::SurfaceTexture = surface
                        .get_current_texture()
                        .expect("failed to get current texture");
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
                    // TODO: draw frame
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
                    frame.present();
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
                    window.request_redraw();
                }
                _ => (),
            },
            _ => (),
        }
    })?;

    Ok(())
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [main-setup-complete]: [main.rs] Putting together the initial main function]

Once a frame texture becomes available, the example issues a request to display it as soon as
possible by calling `frame.present()`. All of our rendering work will be scheduled before this
call.

That was a lot of boilerplate -- this is sometimes necessary to interact with OS resources.
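As an aside, the `#[pollster::main]` attribute is just a convenience. If you prefer to keep `main`
synchronous, you can achieve the same effect by blocking on an async function with
`pollster::block_on`. The sketch below illustrates the idea; the `run` function is a placeholder
name and not part of the example code:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
// Sketch only: what #[pollster::main] does in spirit. `main` stays synchronous and drives an
// async function to completion by blocking the current thread on it.
async fn run() -> Result<()> {
    // ...the window setup and event loop shown above...
    Ok(())
}

fn main() -> Result<()> {
    pollster::block_on(run())
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~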
With all of this in place, we can start building a real-time renderer.

### A note on error handling in Rust

If you're new to Rust, some of the patterns above may look unfamiliar. One of these is error
handling using the `Result` type. I use this pattern frequently enough that it's worth a quick
explainer.

A `Result<T, E>` is a variant type that can hold either a success (`Ok`) value of type `T` or an
error (`Err`) value of type `E`; both type parameters are generic and can be any type. It's common
for a library to define its own error types to represent various error conditions. The idea is
that a function returns a `Result` if it has a failure mode. A caller must check the status of the
`Result` to unpack the return value or recover from an error. In a C program, a common way to
handle an error is to return early from the calling function and perhaps return an entirely new
error. For example:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ C
bool function_with_result(Foo* out_result);

int main() {
    Foo foo;
    if (!function_with_result(&foo)) {
        return -1;
    }
    // ...do something with `foo`...
    return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Rust provides the `?` operator to automatically unpack a `Result` and return early if it holds an
error. In a Rust version of the C program above, if `function_with_result()` returns an error, the
`?` operator will cause the calling function (call it `caller`) to return and propagate the error
value. This works as long as `caller` and `function_with_result` either return the same error type
or types with a known conversion.

There are various other ways to handle an error; I like to keep things simple in my code examples
and use the `?` operator. Instead of defining custom error types and conversions, I use a
catch-all `Error` type from a library called *anyhow*. You'll often see the examples include
`anyhow::Result` (an alias for `Result<T, anyhow::Error>`) and `anyhow::Context`. The latter is a
useful trait for adding an error message while converting to an `anyhow::Error`:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
fn caller() -> anyhow::Result<()> {
    let foo: Foo = function_with_result().context("failed to get foo")?;
    // ...do something with `foo`...
    Ok(())
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can read more about the `Result` type in
[its module documentation](https://doc.rust-lang.org/std/result/index.html).

Drawing Pixels
====================================================================================================

At this stage, we have code that brings up a window, connects to the GPU, and sets up a queue of
textures that is synchronized with the display.

In Computer Graphics, the term "texture" is generally used in the context of *texture mapping*,
which is a technique to apply detail to geometry using data stored in memory. A very common
application is to map color data from the pixels of a 2D image onto the surface of a 3D polygon.
Texture mapping is so essential to real-time graphics that all modern GPUs are equipped with
specialized hardware to speed up texture operations.

It's not uncommon for a modern video game to use texture assets that take up hundreds of
megabytes. Processing all of that data involves a lot of memory traffic which is a big performance
bottleneck for a GPU.
This is why GPUs come with dedicated texture memory caches, sampling hardware, compression schemes
and other features to improve texture data throughput. We are going to use the texture hardware to
store the output of our renderer.

In wgpu, a *texture object* represents texture memory that can be used in three main ways: texture
mapping, shader storage, or as a *render target*[^ch3-cit1]. A surface texture is a special kind
of texture that can only be used as a render target. Not all native APIs have this restriction.
For instance, both Metal and Vulkan allow their version of a surface texture -- a *frame buffer*
(Metal) or *swap chain* (Vulkan) texture -- to be configured for other usages, though this
sometimes comes with a warning about impaired performance and is not guaranteed to be supported by
the hardware. wgpu doesn't provide any other option so I'm going to start by implementing a render
pass. This is a fundamental and very widely used function of the GPU, so it's worth learning
about.

[^ch3-cit1]: See
[`wgpu::TextureUsages`](https://docs.rs/wgpu/0.17.0/wgpu/struct.TextureUsages.html).

The render Module
--------------------

I like to separate the rendering code from all the windowing code, so I'll start by creating a
file named `render.rs`. Every Rust file makes up a *module* (with the same name) which serves as a
namespace for all functions and types that are declared in it. Here I'll add a data structure
called `PathTracer`. This will hold all GPU resources and eventually implement our path tracing
algorithm:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
pub struct PathTracer {
    device: wgpu::Device,
    queue: wgpu::Queue,
}

impl PathTracer {
    pub fn new(device: wgpu::Device, queue: wgpu::Queue) -> PathTracer {
        device.on_uncaptured_error(Box::new(|error| {
            panic!("Aborting due to an error: {}", error);
        }));

        // TODO: initialize GPU resources

        PathTracer {
            device,
            queue,
        }
    }
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [render-initial]: [render.rs] The PathTracer structure]

We start out with an associated function called `PathTracer::new` which will serve as the
constructor and eventually initialize all GPU resources. The `PathTracer` takes ownership of the
`wgpu::Device` and `wgpu::Queue` that we created earlier and it will hold on to them for the rest
of the application's life.

`wgpu::Device` represents a connection to the GPU. It is responsible for creating resources like
texture, buffer, and pipeline objects. It also defines some methods for error handling. The first
thing I do is set up an "uncaptured error" handler. If you look at the
[declarations](https://docs.rs/wgpu/0.17.0/wgpu/struct.Device.html) of resource creation methods
you'll notice that none of them return a `Result`. This doesn't mean that they always succeed; in
fact, all of these operations can fail. This is because wgpu closely mirrors the WebGPU API which
uses a concept called *error scopes* to detect and respond to errors. Whenever there's an error
that I don't handle using an error scope it will trigger the uncaptured error handler, which will
print out an error message and abort the program[^ch3.1-cit1]. For now, I won't set up any error
scopes in `PathTracer::new` and I'll abort the program if the API fails to create the initial
resources.
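We won't actually rely on error scopes in this book, but to make the idea concrete, here is a
rough sketch of what one looks like in wgpu. The buffer being created is just a placeholder for
any fallible resource creation call:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
// Sketch: errors raised by API calls made between push_error_scope and pop_error_scope are
// reported to the scope instead of the uncaptured error handler.
device.push_error_scope(wgpu::ErrorFilter::Validation);
let buffer = device.create_buffer(&wgpu::BufferDescriptor {
    label: Some("example buffer"),
    size: 1024,
    usage: wgpu::BufferUsages::UNIFORM,
    mapped_at_creation: false,
});
if let Some(error) = pollster::block_on(device.pop_error_scope()) {
    eprintln!("failed to create the buffer: {error}");
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~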
Next, let's declare the `render` module and initialize a `PathTracer` in the `main` function:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...
};
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Rust highlight
mod render;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust

const WIDTH: u32 = 800;
const HEIGHT: u32 = 600;

#[pollster::main]
async fn main() -> Result<()> {
    let event_loop = EventLoop::new()?;
    let window_size = winit::dpi::PhysicalSize::new(WIDTH, HEIGHT);
    let window = WindowBuilder::new()
        .with_inner_size(window_size)
        .with_resizable(false)
        .with_title("GPU Path Tracer".to_string())
        .build(&event_loop)?;

    let (device, queue, surface) = connect_to_gpu(&window).await?;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Rust highlight
    let renderer = render::PathTracer::new(device, queue);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust

    event_loop.run(move |event, control_handle| {
        control_handle.set_control_flow(ControlFlow::Poll);
        match event {
            Event::WindowEvent { event, .. } => match event {
                WindowEvent::CloseRequested => control_handle.exit(),
                WindowEvent::RedrawRequested => {
                    // Wait for the next available frame buffer.
                    let frame: wgpu::SurfaceTexture = surface
                        .get_current_texture()
                        .expect("failed to get current texture");

                    // TODO: draw frame

                    frame.present();
                    window.request_redraw();
                }
                _ => (),
            },
            _ => (),
        }
    })?;

    Ok(())
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [main-renderer-init]: [main.rs] Initializing a Renderer]

Now that we have the skeleton in place, it's time to paint some pixels on the screen.

[^ch3.1-cit1]: This is actually the default behavior so I didn't really need to call
`on_uncaptured_error`.

Display Pipeline
----------------

Before setting up the render pass let's first talk about how it works. Traditionally, graphics
systems have been modeled after an abstraction called the *graphics pipeline*.[#Hughes13] At a
very high level, the input to the pipeline is a mathematical model that describes what to draw --
such as geometry, materials, and light -- and the output is a 2D grid of pixels. This
transformation happens in a series of standard *pipeline stages* which form the basis of the
rendering abstraction provided by GPUs and graphics APIs. wgpu uses the term *render pipeline*
which is what I'll use going forward.

The input to the render pipeline is a polygon stream represented by points in 3D space and their
associated data. The polygons are described in terms of geometric primitives (points, lines, and
triangles) which consist of *vertices*. The *vertex stage* transforms each vertex from the input
stream into a 2D coordinate space that corresponds to the viewport. After some additional
processing (such as clipping and culling) the assembled primitives are passed on to the
*rasterizer*. The rasterizer applies a process called scan conversion to determine the pixels that
are covered by each primitive and breaks them up into per-pixel *fragments*. The output of the
vertex stage (the vertex positions, texture coordinates, vertex colors, etc) gets interpolated
between the vertices of the primitive and the interpolated values get assigned to each fragment.
Fragments are then passed on to the *fragment stage* which computes an output (such as the pixel
or sample color) for each fragment.
Shading techniques such as texture mapping and lighting are usually performed in this stage. The
output then goes through several other operations before getting written to the render target as
pixels.[^ch3-footnote1]

![Figure [render-pipeline]: Vertex and Fragment stages of the render pipeline](../images/fig-01-render-pipeline.svg)

What I just described is very much a data pipeline: a data stream goes through a series of
transformations in stages. The input to each stage is defined in terms of smaller elements (e.g.
vertices and pixel-fragments) that can be processed in parallel. This is the fundamental principle
behind the GPU. Early commercial GPUs implemented the graphics pipeline entirely in fixed-function
hardware. Modern GPUs still use fixed-function stages (and at much greater data rates) but
virtually all of them allow you to program the vertex and fragment stages with custom logic using
*shader programs*.

[^ch3-footnote1]: I glossed over a few pipeline stages (such as geometry and tessellation) and
important steps like multi-sampling, blending, and the scissor/depth/stencil tests. These play an
important role in many real-time graphics applications but we won't make use of them in our path
tracer.

### Compiling Shaders

Let's put together a render pipeline that draws a red triangle. We'll define a vertex shader that
outputs the 3 corner vertices and a fragment shader that outputs a solid color. We'll write these
shaders in the WebGPU Shading Language (WGSL). Go ahead and create a file called `shaders.wgsl` to
host all of our WGSL code (I put it next to the Rust files under `src/`).

Before we can run this code we need to compile it into a form that can be executed on the GPU. We
start by creating a *shader module*:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
pub struct PathTracer {
    device: wgpu::Device,
    queue: wgpu::Queue,
}

impl PathTracer {
    pub fn new(device: wgpu::Device, queue: wgpu::Queue) -> PathTracer {
        device.on_uncaptured_error(Box::new(|error| {
            panic!("Aborting due to an error: {}", error);
        }));

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
        let shader_module = compile_shader_module(&device);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust delete
        // TODO: initialize GPU resources
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust

        PathTracer {
            device,
            queue,
        }
    }
}

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
fn compile_shader_module(device: &wgpu::Device) -> wgpu::ShaderModule {
    use std::borrow::Cow;
    let code = include_str!(concat!(env!("CARGO_MANIFEST_DIR"), "/src/shaders.wgsl"));
    device.create_shader_module(wgpu::ShaderModuleDescriptor {
        label: None,
        source: wgpu::ShaderSource::Wgsl(Cow::Borrowed(code)),
    })
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [render-shader-module]: [render.rs] Creating the shader module]

The `compile_shader_module` function loads the file we just created into a string using the
`include_str!` macro. This bundles the contents of `shaders.wgsl` into the program binary at build
time.
This is followed by a call to `wgpu::Device::create_shader_module` to compile the WGSL source
code.[^ch3-footnote2]

Let's define the vertex and fragment functions, which I'm calling `display_vs` and `display_fs` (a
minimal sketch of the two functions appears a little further below). I'm using the "vs" and "fs"
suffixes as shorthand for "vertex stage" and "fragment stage". Together, these two functions form
our "display pipeline" (the "display" part will become more clear later). The `@vertex` and
`@fragment` annotations are WGSL keywords that mark these two functions as entry points to each
pipeline stage program.

Since graphics workloads generally involve a high amount of linear algebra, GPUs natively support
SIMD operations over vectors and matrices. All shading languages define built-in types for vectors
and matrices of up to 4 dimensions (4x4 in the case of matrices). The `vec4f` and `vec2f` types
that are in the code represent 4D and 2D vectors of floating point numbers.

`display_vs` returns the vertex position as a `vec4f`. This position is defined relative to a
coordinate space called the *Normalized Device Coordinate Space*. In NDC, the center of the
viewport marks the origin $(0, 0, 0)$. The $x$-axis spans horizontally from $(-1, 0, 0)$ on the
left edge of the viewport to $(1, 0, 0)$ on the right edge while the $y$-axis spans vertically
from $(0,-1,0)$ at the bottom to $(0,1,0)$ at the top. The $z$-axis is directly perpendicular to
the viewport, going *through* the origin.

![Figure [ndc]: Our triangle in Normalized Device Coordinates](../images/fig-02-ndc.svg)

`display_vs` takes a *vertex index* as its parameter. The vertex function gets invoked for every
input vertex across different GPU threads. `vid` identifies the individual vertex that is assigned
to the *invocation*. The number of vertices and where they exist within the topology of the input
geometry is up to us to define. Since we want to draw a triangle, we'll later issue a *draw call*
with 3 vertices and `display_vs` will get invoked exactly 3 times with vertex indices ranging from
$0$ to $2$.

Since our 2D triangle is viewport-aligned, we can set the $z$ coordinate to $0$. The 4th
coordinate is known as a *homogeneous coordinate* used for projective transformations. Don't worry
about this coordinate for now -- just know that for a vector that represents a *position* we set
this coordinate to $1$. We can declare the $x$ and $y$ coordinates for the 3 vertices as an array
of `vec2f` and simply return the element that corresponds to `vid`. I enumerate the vertices in
counter-clockwise order which matches the winding order we'll specify when we create the pipeline.

`display_fs` takes no inputs and returns a `vec4f` that represents the fragment color. The 4
dimensions represent the red, green, blue, and alpha channels of the destination pixel.
`display_fs` gets invoked for all pixel fragments that result from our triangle and the
invocations are executed in parallel across many GPU threads, just like the vertex function. To
paint the triangle solid red, we simply return `vec4f(1., 0., 0., 1.)` for all fragments.

[^ch3-footnote2]: The `Cow::Borrowed` bit is a Rust idiom that creates a "copy-on-write borrow".
This allows the API to take ownership of the WGSL string if necessary. This is not really an
important detail for us.

### Creating the Pipeline Object

Before we can run the shaders, we need to assemble them into a *pipeline state object*. This is
where we specify the data layout of the render pipeline and link the shaders into a runnable
binary program.
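For reference, a minimal `shaders.wgsl` consistent with the description in the previous section
might look roughly like the sketch below. The exact corner positions are placeholder values rather
than the ones used for the figures:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
// Sketch of the two entry points described above; the triangle corners are placeholders.
@vertex
fn display_vs(@builtin(vertex_index) vid: u32) -> @builtin(position) vec4f {
    // A viewport-aligned triangle in NDC, enumerated in counter-clockwise order.
    var vertices = array<vec2f, 3>(
        vec2f(-0.5, -0.5),
        vec2f(0.5, -0.5),
        vec2f(0.0, 0.5)
    );
    return vec4f(vertices[vid], 0.0, 1.0);
}

@fragment
fn display_fs() -> @location(0) vec4f {
    // Solid red for every fragment.
    return vec4f(1.0, 0.0, 0.0, 1.0);
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~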
Let's add a new function called `create_display_pipeline`:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...

fn compile_shader_module(device: &wgpu::Device) -> wgpu::ShaderModule {
    use std::borrow::Cow;
    let code = include_str!(concat!(env!("CARGO_MANIFEST_DIR"), "/src/shaders.wgsl"));
    device.create_shader_module(wgpu::ShaderModuleDescriptor {
        label: None,
        source: wgpu::ShaderSource::Wgsl(Cow::Borrowed(code)),
    })
}

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
fn create_display_pipeline(
    device: &wgpu::Device,
    shader_module: &wgpu::ShaderModule,
) -> wgpu::RenderPipeline {
    device.create_render_pipeline(&wgpu::RenderPipelineDescriptor {
        label: Some("display"),
        layout: None,
        primitive: wgpu::PrimitiveState {
            topology: wgpu::PrimitiveTopology::TriangleList,
            front_face: wgpu::FrontFace::Ccw,
            polygon_mode: wgpu::PolygonMode::Fill,
            ..Default::default()
        },
        vertex: wgpu::VertexState {
            module: shader_module,
            entry_point: "display_vs",
            buffers: &[],
        },
        fragment: Some(wgpu::FragmentState {
            module: shader_module,
            entry_point: "display_fs",
            targets: &[Some(wgpu::ColorTargetState {
                format: wgpu::TextureFormat::Bgra8Unorm,
                blend: None,
                write_mask: wgpu::ColorWrites::ALL,
            })],
        }),
        depth_stencil: None,
        multisample: wgpu::MultisampleState::default(),
        multiview: None,
    })
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [display-pipeline]: [render.rs] The `create_display_pipeline` function]

This code describes a render pipeline that draws a list of triangle primitives. The vertex winding
order is set to counter-clockwise which defines the orientation of the triangle's *front
face*.[^ch3-footnote3] We request that the interior of each polygon be completely filled (rather
than drawing just the edges or vertices). We specify that `display_vs` is the main function of the
vertex stage and that we're not providing any vertex data from the CPU (since we declared our
vertices in the shader code). Similarly, we set up a fragment stage with `display_fs` as the entry
point and a single color target.[^ch3-footnote4]

I set the pixel format of the render target to `Bgra8Unorm` since that happens to be widely
supported on all of my devices. What's important is that you assign a pixel format that matches
the surface configuration in your windowing setup and that your GPU device supports this as a
*render attachment* format.

Let's instantiate the pipeline and store it in the `PathTracer` object. Pipeline creation is
expensive so we want to create the pipeline state object once and hold on to it.
We'll reference it later when drawing a frame:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
pub struct PathTracer {
    device: wgpu::Device,
    queue: wgpu::Queue,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
    display_pipeline: wgpu::RenderPipeline,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
}

impl PathTracer {
    pub fn new(device: wgpu::Device, queue: wgpu::Queue) -> PathTracer {
        device.on_uncaptured_error(Box::new(|error| {
            panic!("Aborting due to an error: {}", error);
        }));

        let shader_module = compile_shader_module(&device);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
        let display_pipeline = create_display_pipeline(&device, &shader_module);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust

        PathTracer {
            device,
            queue,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
            display_pipeline,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
        }
    }
    ...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [display-pipeline-init]: [render.rs] Initializing the display pipeline]

[^ch3-footnote3]: The GPU can automatically discard triangles that are oriented away from the
viewport. This is a feature called *back face culling* which our code doesn't make use of.

[^ch3-footnote4]: The `fragment` field of `wgpu::RenderPipelineDescriptor` is optional (notice the
*Some* in `Some(wgpu::FragmentState {...})` ?). A render pipeline that only outputs to the depth
or stencil buffers doesn't have to specify a fragment shader or any color attachments. An example
of this is *shadow mapping*: a shadow map is a texture that stores the distances between a light
source and geometry samples from the scene; it can be produced by a depth-only render-pass from
the point of view of the light source. The shadow map is later sampled from a render pass from the
camera's point of view to determine whether a rasterized point is visible from the light or in
shadow.

The Render Pass
---------------

We now have the pieces in place to issue a draw command to the GPU. The general abstraction modern
graphics APIs define for this is called a "command buffer" (or "command list" in D3D12). You can
think of the command buffer as a memory location that holds the serialized list of GPU commands
representing the sequence of actions we want the GPU to take. To draw a triangle we'll *encode* a
draw command into the command buffer and then *submit* the command buffer to the GPU for
execution.

With wgpu, the encoding is abstracted by an object called `wgpu::CommandEncoder`, which we'll use
to record our draw command. Once we are done, we will call `wgpu::CommandEncoder::finish()` to
produce a finalized `wgpu::CommandBuffer` which we can submit to the GPU via the `wgpu::Queue`
that we created at startup.

Let's add a new `PathTracer` function called `render_frame`. This function will take a texture as
its parameter (our *render target*) and tell the GPU to draw to it using the pipeline object we
created earlier:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...

impl PathTracer {
    ...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
    pub fn render_frame(&self, target: &wgpu::TextureView) {
        let mut encoder = self
            .device
            .create_command_encoder(&wgpu::CommandEncoderDescriptor {
                label: Some("render frame"),
            });

        let mut render_pass = encoder.begin_render_pass(&wgpu::RenderPassDescriptor {
            label: Some("display pass"),
            color_attachments: &[Some(wgpu::RenderPassColorAttachment {
                view: target,
                resolve_target: None,
                ops: wgpu::Operations {
                    load: wgpu::LoadOp::Clear(wgpu::Color::BLACK),
                    store: wgpu::StoreOp::Store,
                },
            })],
            ..Default::default()
        });

        render_pass.set_pipeline(&self.display_pipeline);

        // Draw 1 instance of a polygon with 3 vertices.
        render_pass.draw(0..3, 0..1);

        // End the render pass by consuming the object.
        drop(render_pass);

        let command_buffer = encoder.finish();
        self.queue.submit(Some(command_buffer));
    }
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [render_frame-stub]: [render.rs] The `render_frame` function]

`target` here is defined as a `wgpu::TextureView`. wgpu makes the distinction between a texture
resource (represented by `wgpu::Texture`) and how that texture's memory is accessed by a pipeline
(which is represented by the *view* into the texture). When we want to bind a texture we first
create a view with the right properties. In this case we'll assume that the caller already created
a `TextureView` of the render target.

The first thing we do in `render_frame` is create a command encoder. We then tell the encoder to
begin a *render pass*. There are 4 important API calls we make to encode the draw command:

1. Create a `wgpu::RenderPass`. We tell it to store the colors that are output by the render
   pipeline to the `target` texture by assigning it as the only color attachment. We also tell it
   to clear all pixels of the target to black (i.e. $(0, 0, 0, 1)$ in RGBA) before drawing to it.
2. Assign the render pipeline.
3. Record a single draw with 3 vertices.
4. End the render pass by destroying the `wgpu::RenderPass` object.

We then finalize the command buffer and submit it to the GPU.

Finally, let's invoke `render_frame` from our windowing event loop, using the current surface
texture as the render target:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
async fn main() -> Result<()> {
    ...
    event_loop.run(move |event, control_handle| {
        ...
                WindowEvent::RedrawRequested => {
                    // Wait for the next available frame buffer.
                    let frame: wgpu::SurfaceTexture = surface
                        .get_current_texture()
                        .expect("failed to get current texture");
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust delete
                    // TODO: draw frame
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
                    let render_target = frame
                        .texture
                        .create_view(&wgpu::TextureViewDescriptor::default());
                    renderer.render_frame(&render_target);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
                    frame.present();
                }
        ...
    })?;
    Ok(())
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [render_frame-call]: [main.rs] Rendering to a surface texture]

Running this code should bring up a window that looks like this:

![Figure [first-triangle]: First Triangle](../images/img-02-first-triangle.png)

Finally drawing something!
A single triangle may not look that interesting but you can model highly complex 3D scenes and
geometry by putting many of them together. It takes only a few tweaks to the render pipeline to
shape, animate, and render millions of triangles many times per second.

Full-Screen Quad
----------------

The render pipeline that we just put together plays a rather small role in the overall renderer:
its purpose is to display the output of the path-tracer on the window surface. The output of our
renderer is a 2D rectangular image and I would like it to fill the whole window. We can achieve
this by having the render pipeline draw two right triangles that are adjacent at their hypotenuse.
Remember that NDC coordinates span the range $[-1, 1]$ across the viewport, so setting the 4
corners of the rectangle to $(-1, 1)$, $(1, 1)$, $(1, -1)$, $(-1, -1)$ should cover the entire
viewport regardless of its dimensions. Go ahead and extend the vertex array in `display_vs` with
the corners of the two triangles (6 vertices, again enumerated in counter-clockwise order).

![Figure [half-screen-quad]: Half-Screen Triangle](../images/img-03-half-screen-quad.png)

That painted only one of the triangles, since the draw command still requests only 3 vertices. We
also need to update the draw command with the new vertex count:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...

impl PathTracer {
    ...

    pub fn render_frame(&self, target: &wgpu::TextureView) {
        ...
        render_pass.set_pipeline(&self.display_pipeline);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust delete
        // Draw 1 instance of a polygon with 3 vertices.
        render_pass.draw(0..3, 0..1);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
        // Draw 1 instance of a polygon with 6 vertices.
        render_pass.draw(0..6, 0..1);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust

        // End the render pass by consuming the object.
        drop(render_pass);

        let command_buffer = encoder.finish();
        self.queue.submit(Some(command_buffer));
    }
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [render_frame-quad]: [render.rs] Updating the draw command for the full-screen quad]

![Figure [full-screen-quad]: Full-Screen Quad](../images/img-04-full-screen-quad.png)

Viewport Coordinates
--------------------

In this setup, every fragment shader invocation outputs the color of a single pixel. We can
identify that pixel using the built-in `position` input to the pipeline stage.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
@fragment
fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
    return vec4f(1.0, 0.0, 0.0, 1.0);
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [position-builtin]: [shaders.wgsl] Position Built-In]

The input is defined as a `vec4f`. The $x$ and $y$ coordinates are defined in the _Viewport
Coordinate System_. The origin $(0, 0)$ corresponds to the top-left corner pixel of the viewport.
The $x$-coordinate increases towards the right and the $y$-coordinate increases towards the
bottom. A whole number increment in $x$ or $y$ represents an increment by 1 pixel (and fractional
increments can fall "inside" a pixel).
For example, for a viewport with the physical dimensions of $800\times600$, the coordinate ranges
are $0 \le x \lt 800$ and $0 \le y \lt 600$.

![Figure [viewport-coords]: Viewport Coordinate System](../images/fig-03-viewport-coords.svg)

Let's assign every pixel fragment a color based on its position in the viewport by mapping the
coordinates to a color channel (red and green). The render target uses a normalized color format
(i.e. the values must be between $0$ and $1$), so we divide each dimension by the largest possible
value to convert it to that range:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
const WIDTH: u32 = 800u;
const HEIGHT: u32 = 600u;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL

@fragment
fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
    let color = pos.xy / vec2f(f32(WIDTH - 1u), f32(HEIGHT - 1u));
    return vec4f(color, 0.0, 1.0);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [pos-to-color]: [shaders.wgsl]]

There are two language expressions here that are worth highlighting. `pos.xy` is a so-called
_vector swizzle_ that extracts the $x$ and $y$ components and produces a `vec2f` containing only
those. Next, we divide that `vec2f` by another `vec2f`. Here, the division operator performs a
component-wise division of every element of the vector on the left-hand side by the corresponding
element on the right-hand side, so `pos.xy / vec2f(f32(WIDTH - 1u), f32(HEIGHT - 1u))` is
equivalent to `vec2f(pos.x / f32(WIDTH - 1u), pos.y / f32(HEIGHT - 1u))`.

Now we are able to separately color each individual pixel. Running this should produce a picture
that looks like this:

![Figure [viewport-gradient]: Viewport Coordinates as a color gradient](../images/img-05-viewport-gradient.png)

Resource Bindings
====================================================================================================

Our program is split across separate runnable parts: the main executable that runs on the CPU and
pipelines that run on the GPU. As we add more features we will want to exchange data between the
different parts. The main way to achieve this is via memory resources. The CPU side of our program
can create and interact with resources by making API calls. On the GPU side, the shader program
can access those via _bindings_.

A binding associates a resource with a slot that can be referenced by the shader. Each slot is
identified by an index number. The shader code declares a variable for each binding with a
decoration that assigns it a binding index. The CPU side is responsible for setting up the
resources for a GPU pipeline according to its binding layout.

WebGPU introduces an additional concept around bindings called _bind group_. A bind group
associates a group of resources that are frequently bound together.[^ch4-footnote1] Like
individual bindings, each bind group is identified by an index number. Our pipelines won't make
use of more than one bind group at a time, so we'll always assign $0$ as the group index.
[^ch4-footnote1]: The bind group concept is similar to "descriptor set" in Vulkan, "descriptor
table" in D3D12, and "argument buffer" in Metal.

Uniform Declaration
-------------------

The first binding we are going to set up is a _uniform buffer_. Uniforms are read-only data that
don't vary across GPU threads. We are going to use a uniform buffer to store certain globals, like
camera parameters.

Our renderer currently assumes a window dimension of $800\times600$ and declares this in two
different places (`shaders.wgsl` and `main.rs`) which must be kept in sync. Let's make `WIDTH` and
`HEIGHT` uniforms and upload their values from the CPU side. We'll first declare a uniform buffer
and assign it to binding index $0$:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
struct Uniforms {
    width: u32,
    height: u32,
}

@group(0) @binding(0) var<uniform> uniforms: Uniforms;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL delete
const WIDTH: u32 = 800u;
const HEIGHT: u32 = 600u;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL

@fragment
fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
    let color = pos.xy / vec2f(f32(uniforms.width - 1u), f32(uniforms.height - 1u));
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
    return vec4f(color, 0.0, 1.0);
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [uniform binding declaration]: [shaders.wgsl] Uniform binding declaration]

The `var<uniform>` declaration tells the compiler that the shader expects a uniform buffer
binding. The type of the binding variable is `Uniforms` which represents the shader's view over
the buffer's memory. Declaring it this way allows the shader to access the contents of the buffer
with an expression like `uniforms.width`.

Bind Group Layout
-----------------

If you run the code now you should get a validation error telling you that the pipeline layout
expects a bind group layout at index $0$. We need to update the display pipeline description with
a layout that includes the new uniform binding. Let's update the `create_display_pipeline`
function to return a `wgpu::BindGroupLayout` alongside the pipeline object:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
impl PathTracer {
    pub fn new(device: wgpu::Device, queue: wgpu::Queue) -> PathTracer {
        device.on_uncaptured_error(Box::new(|error| {
            panic!("Aborting due to an error: {}", error);
        }));

        let shader_module = compile_shader_module(&device);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Rust highlight
        let (display_pipeline, display_layout) = create_display_pipeline(&device, &shader_module);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
        ...
    }

    ...
}

...
fn create_display_pipeline( device: &wgpu::Device, shader_module: &wgpu::ShaderModule, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight ) -> (wgpu::RenderPipeline, wgpu::BindGroupLayout) { let bind_group_layout = device.create_bind_group_layout(&wgpu::BindGroupLayoutDescriptor { label: None, entries: &[ wgpu::BindGroupLayoutEntry { binding: 0, visibility: wgpu::ShaderStages::FRAGMENT, ty: wgpu::BindingType::Buffer { ty: wgpu::BufferBindingType::Uniform, has_dynamic_offset: false, min_binding_size: None, }, count: None, }, ], }); let pipeline = device.create_render_pipeline(&wgpu::RenderPipelineDescriptor { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust label: Some("display"), ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight layout: Some(&device.create_pipeline_layout(&wgpu::PipelineLayoutDescriptor { bind_group_layouts: &[&bind_group_layout], ..Default::default() })), ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... }); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight (pipeline, bind_group_layout) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [display-pipeline-layout]: [render.rs] Display pipeline layout] This says that the pipeline contains a single bind group, containing a single buffer entry. The buffer entry has the "uniform" buffer binding type and is visible only to the fragment stage. Buffer Object ------------- Let's now create the buffer object that will provide the backing memory for the uniforms. The size and layout of the memory need to match the `Uniforms` struct that we declared in the WGSL. A common pattern is to maintain two sets of these declarations (one for the CPU and one for the GPU side) and keep them in sync. Some frameworks allow you to reuse the same declarations on both sides. _wgpu_ doesn't provide a utility for this out of the box, so I'm going to redeclare `Uniforms` for the CPU side: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight use bytemuck::{Pod, Zeroable}; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust pub struct PathTracer { device: wgpu::Device, queue: wgpu::Queue, display_pipeline: wgpu::RenderPipeline, } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight #[derive(Copy, Clone, Pod, Zeroable)] #[repr(C)] struct Uniforms { width: u32, height: u32, } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust impl PathTracer { ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [uniforms-struct-cpu]: [render.rs] CPU-side `Uniforms` struct] The `repr(C)` attribute makes the memory layout of the `Uniforms` struct conform to the C language rules so that the fields have a predictable order, size, and alignment.[^ch4-footnote2] For our purposes, this should make the memory layout of the struct exactly match the WGSL declaration. The `derive` attribute automatically implements the enumerated traits for our type. `Copy` and `Clone` allow the type to be copied by value (Rust types are move-only by default).
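To see what these traits buy us in practice, here is a small, self-contained sketch -- separate from the renderer -- that reinterprets a `Uniforms` value as raw bytes. This is the kind of conversion the buffer writes in this chapter rely on; the `bytemuck` crate itself is discussed next:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
use bytemuck::{Pod, Zeroable};

// Standalone illustration only; the renderer declares the same struct in render.rs.
#[derive(Copy, Clone, Pod, Zeroable)]
#[repr(C)]
struct Uniforms {
    width: u32,
    height: u32,
}

fn main() {
    let uniforms = Uniforms { width: 800, height: 600 };
    // `bytes_of` views the struct in place as a byte slice; no copy is made.
    let bytes: &[u8] = bytemuck::bytes_of(&uniforms);
    assert_eq!(bytes.len(), std::mem::size_of::<Uniforms>()); // two u32 fields = 8 bytes
    println!("{bytes:?}");
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~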
This is also the first time we are using the `bytemuck` crate. The `Pod` and `Zeroable` traits, along with `repr(C)`, allow us to safely reinterpret the `Uniforms` struct as a sequence of bytes. For all intents and purposes, these Rust attributes enable the same semantics as the following plain C/C++ struct: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ C // If `Uniforms` were declared in C: struct Uniforms { uint32_t width; uint32_t height; }; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Now, let's allocate the backing buffer object and initialize its contents. The allocation creates a buffer resource that is large enough to store an instance of `Uniforms` and copies the contents of `uniforms` into it. The buffer is mapped at creation so that its memory is accessible in the CPU's address space. We also declare its usage to be `UNIFORM`: this is a hint to the GPU driver that allows it to perform optimizations based on the buffer access pattern. The usage is also useful for validating that the bindings we provide conform to the pipeline's layout. After the data copy, we need to flush and unmap the buffer from CPU memory before we can use it in GPU commands. We also store both `uniforms` and `uniform_buffer`, since we'll reuse them to modify some of the uniforms at runtime. [^ch4-footnote2]: The default Rust layout representation doesn't provide a strong guarantee on the order of the fields. See the [Rust reference](https://doc.rust-lang.org/reference/type-layout.html#representations). Bind Group ---------- We need to associate the buffer object with a bind group with the correct layout before it can be used in a render pass. Let's create and store a bind group and assign it to group index $0$ while encoding the draw: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust use bytemuck::{Pod, Zeroable}; pub struct PathTracer { device: wgpu::Device, queue: wgpu::Queue, uniforms: Uniforms, uniform_buffer: wgpu::Buffer, display_pipeline: wgpu::RenderPipeline, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight display_bind_group: wgpu::BindGroup, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust } #[derive(Copy, Clone, Pod, Zeroable)] #[repr(C)] struct Uniforms { width: u32, height: u32, } impl PathTracer { pub fn new(device: wgpu::Device, queue: wgpu::Queue) -> PathTracer { ... uniform_buffer.unmap(); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight // Create the display pipeline bind group. let display_bind_group = device.create_bind_group(&wgpu::BindGroupDescriptor { label: None, layout: &display_layout, entries: &[wgpu::BindGroupEntry { binding: 0, resource: wgpu::BindingResource::Buffer(wgpu::BufferBinding { buffer: &uniform_buffer, offset: 0, size: None, }), }], }); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust PathTracer { device, queue, uniforms, uniform_buffer, display_pipeline, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight display_bind_group, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust } } pub fn render_frame(&self, target: &wgpu::TextureView) { ...
render_pass.set_pipeline(&self.display_pipeline); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight render_pass.set_bind_group(0, &self.display_bind_group, &[]); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust // Draw 1 instance of a polygon with 6 vertices render_pass.draw(0..6, 0..1); ... } } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [display-bind-group]: [render.rs] Creating and using the display bind group] Running the program now should bring up the same picture as before. The viewport dimensions are still hardcoded in two places so let's clean that up by making the viewport width and height parameters of the `PathTracer` constructor: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust impl PathTracer { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight pub fn new( device: wgpu::Device, queue: wgpu::Queue, width: u32, height: u32, ) -> PathTracer { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust device.on_uncaptured_error(Box::new(|error| { panic!("Aborting due to an error: {}", error); })); let shader_module = compile_shader_module(&device); let (display_pipeline, display_layout) = create_display_pipeline(&device, &shader_module); // Initialize the uniform buffer. let uniforms = Uniforms { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight width, height, }; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [width-height-parameters]: [render.rs]] Let's update the main function to pass in the physical window dimensions while creating the `PathTracer`: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... const WIDTH: u32 = 800; const HEIGHT: u32 = 600; #[pollster::main] async fn main() -> Result<()> { let event_loop = EventLoop::new(); let window_size = winit::dpi::PhysicalSize::new(WIDTH, HEIGHT); let window = WindowBuilder::new() .with_inner_size(window_size) .with_resizable(false) .with_title("GPU Path Tracer".to_string()) .build(&event_loop)?; let (device, queue, surface) = connect_to_gpu(&window).await?; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight let renderer = render::PathTracer::new(device, queue, WIDTH, HEIGHT); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust event_loop.run(move |event, _, control_flow| { ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [width-height-parameters-main]: [main.rs]] Now we have a way to pass data between the CPU and GPU sides of the program. We can repeat this pattern whenever we need to add or modify a bind group layout. Ray Casting ==================================================================================================== Light flows out of emissive objects (like the sun or a lamp) and scatters off objects as it floods the environment. When some of that light reaches a camera sensor, the camera can measure the amount that arrived at each pixel and create a picture. 
Our virtual camera will compute the same measurement by tracing the light's path in the reverse direction, starting at the camera and heading towards the objects in the scene. Camera Rays ----------- The first segment in a path is between the camera and the closest surface that is visible "through a pixel". To locate that surface, we can plot a ray from the camera and search for the closest point where the ray intersects the scene. A ray is a part of a straight line that has a starting point and extends infinitely in one direction. A ray in 3D space can be represented using two vectors: a point of origin $\mathbf{P}$ and a direction $\vec{\mathbf{d}}$. All points $\mathbf{R}$ on the ray are described by the linear equation $\mathbf{R}(t) = \mathbf{P} + t \mathbf{d}$ over the parameter $t$. $t$ is a real number and its positive values represent points on the ray that are in front of the ray origin (if we consider the direction $\mathbf{d}$ as _forward_). Negative values of $t$ represent points behind the origin, and $t=0$ is the same as the origin. ![Figure [ray]: Ray definition](../images/fig-05-ray.svg) Let's define the data structure to represent a ray -- a simple struct holding an `origin` and a `direction` (it appears at the top of the listing that introduces `sky_color` below). Let's now model a simple pinhole camera. Initially we'll define the eye position (where the arriving light gets focused) as the camera's origin and this will act as the origin for all camera rays. The camera has a view direction, and some distance away from the origin along the view direction sits the 2D viewport framing the rendered image. We will initially position the camera origin at the coordinate system origin $(0, 0, 0)$ and set the view direction towards the $-z$-axis in a 3-dimensional right-handed Cartesian coordinate system.[^ch5-footnote1] ![Figure [camera-view-space]: Rays in camera coordinates](../images/fig-04-camera-view-space.svg) In order to determine the direction for the ray targeting a pixel, we need to convert the pixel's viewport coordinates to the coordinate system we are going to use when computing ray intersections. Let's define the $x$ and $y$ coordinate span of the viewport to be the same as NDC (see _Figure 4_). This would make the viewport a square (with a width and height of $2$) so we need to adjust it by the aspect ratio of the application window in order to make its shape match the window frame. The fragment shader already normalizes the viewport pixel coordinates to the range $[0,1]$ and returns that as the output color. We can instead apply a simple transformation to convert them to our new camera coordinate space: 1. Map the range to $[-1, 1]$ by doubling the range and shifting it in the negative direction by $1$. 2. Scale the $x$ coordinate by the aspect ratio (which we'll define as $\tfrac{width}{height}$). 3. Flip the sign of the $y$ coordinate by multiplying it by $-1$. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... @fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL delete let color = pos.xy / vec2f(f32(uniforms.width - 1u), f32(uniforms.height - 1u)); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight let aspect_ratio = f32(uniforms.width) / f32(uniforms.height); // Normalize the viewport coordinates. var uv = pos.xy / vec2f(f32(uniforms.width - 1u), f32(uniforms.height - 1u)); // Map `uv` from y-down (normalized) viewport coordinates to camera coordinates. uv = (2.
* uv - vec2(1.)) * vec2(aspect_ratio, -1.); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [camera-ray-computation]: [shaders.wgsl] Obtaining the viewport vector] We now have a vector $\vec{\mathbf{uv}}$ that spans from the center of the viewport to the pixel $\mathbf{A}$. The ray direction is the vector that points from the origin towards the pixel, which is given by $\mathbf{A} - \mathbf{O}$. $\mathbf{O}$ is equal to $(0, 0, 0)$, so computing $\mathbf{A}$ will give us the ray direction. If we picture the viewport to be positioned away from the origin at distance $f$ along the $-z$ axis then we can obtain $\mathbf{A}$ by computing $\begin{bmatrix} \vec{\mathbf{uv}} \\ 0 \end{bmatrix} - \begin{bmatrix} 0 \\ 0 \\ f \end{bmatrix}$, or simply $\begin{bmatrix} \vec{\mathbf{uv}} \\ -f \end{bmatrix}$. In the code, I'll refer to $f$ as `focus_distance`: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... @fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight let origin = vec3(0.); let focus_distance = 1.; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL let aspect_ratio = f32(uniforms.width) / f32(uniforms.height); // Normalize the viewport coordinates. var uv = pos.xy / vec2f(f32(uniforms.width - 1u), f32(uniforms.height - 1u)); // Map `uv` from y-down (normalized) viewport coordinates to camera coordinates. uv = (2. * uv - vec2(1.)) * vec2(aspect_ratio, -1.); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight let direction = vec3(uv, -focus_distance); let ray = Ray(origin, direction); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [camera-ray-computation]: [shaders.wgsl] Deriving the camera ray origin and direction] We finally have our camera ray. Initially we can make all rays hit the sky which will act as the light source. We can make the sky appear a little more realistic by painting it with a gradient that blends from blue to white as the $y$ coordinate of the ray's direction decreases. We'll first map the $y$ coordinate to the $[0,1]$ range and use that value to linearly interpolate between the two colors using the blend equation: $$ \mathit{blendedValue} = (1-a)\cdot\mathit{startValue} + a\cdot\mathit{endValue} $$ Let's introduce a function called `sky_color` to compute this for a given ray and return that as the fragment color. I used the same colors as RTIOW but you can use different ones:[^ch5-footnote2] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... struct Ray { origin: vec3f, direction: vec3f, } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight fn sky_color(ray: Ray) -> vec3f { let t = 0.5 * (normalize(ray.direction).y + 1.); return (1. - t) * vec3(1.) + t * vec3(0.3, 0.5, 1.); } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... 
@fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f { let origin = vec3(0.); let focus_distance = 1.; let aspect_ratio = f32(uniforms.width) / f32(uniforms.height); // Map `pos` from y-down viewport coordinates to camera viewport plane coordinates. var uv = pos.xy / vec2f(f32(uniforms.width - 1u), f32(uniforms.height - 1u)); uv = (2. * uv - vec2(1.)) * vec2(aspect_ratio, -1.); let direction = vec3(uv, -focus_distance); let ray = Ray(origin, direction); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight return vec4(sky_color(ray), 1.); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [camera-ray-computation]: [shaders.wgsl] Deriving the camera ray origin and direction] Running the program now should produce an image that looks like this: ![Figure [sky]: Ray tracing the sky](../images/img-06-sky-gradient.png) [^ch5-footnote1]: The choice of a right-handed vs left-handed system is really up to you - you can pick any relative orientation for the major axes that you want, as long as you stay consistent. [^ch5-footnote2]: Interpolating from blue towards a reddish color instead of pure white can resemble twilight. Give `vec3(1., 0.5, 0.3)`) a try. Ray-Sphere Intersection ----------------------- It's time to introduce objects to the scene. We'll start with a sphere since it has a simple implicit form and querying for intersections between a ray and a sphere is straightforward. I'll quickly go over the mathematics of the intersection function that we are going to implement: Let's define a sphere by its center point $\mathbf{C}$ and its radius $r$. Then, any point $\mathbf{X}$ on the surface of the sphere can be described by the equation[^ch5-footnote3] $$ (\mathbf{X} - \mathbf{C}) \cdot (\mathbf{X} - \mathbf{C}) = r^2 $$ We want to determine if there is a point along the ray that satisfies this equation. Substituting our ray equation for $\mathbf{X}$ we get: $$ (\mathbf{P} + t\mathbf{d} - \mathbf{C}) \cdot (\mathbf{P} + t\mathbf{d} - \mathbf{C}) = r^2 $$ Now we need to solve for $t$. To simplify things, let's substitute $\mathbf{v}$ for $(\mathbf{P} - \mathbf{C})$. After expanding the dot product and rearranging the terms we get $$ (\mathbf{d} \cdot \mathbf{d}) t^2 + 2 (\mathbf{v} \cdot \mathbf{d}) t + (\mathbf{v} \cdot \mathbf{v}) - r^2 = 0 $$ This is now in a canonical form for a quadratic equation: $at^2 + 2bt + c = 0$ and the solutions for $t$ are given by $$ t = \dfrac{-b \pm\sqrt{b^2 - ac}}{a} $$ with $a = \mathbf{d}\cdot\mathbf{d}$, $b = (\mathbf{P}-\mathbf{C})\cdot\mathbf{d}$, and $c = (\mathbf{P}-\mathbf{C})\cdot(\mathbf{P}-\mathbf{C}) - r^2$. The value of the discriminant $b^2 - ac$ determines the number of solutions. If the discriminant is negative, then there are no real solutions and thus no intersection. If the discriminant is exactly 0, then there is one real solution where the ray tangentially intersects the sphere at that point. If the discriminant is positive, then there are two real solutions and thus two potential intersections that we need to consider. 
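As a quick sanity check of the algebra, consider a ray with origin $\mathbf{P} = (0, 0, 0)$ and direction $\mathbf{d} = (0, 0, -1)$, aimed at a sphere with center $\mathbf{C} = (0, 0, -1)$ and radius $r = 0.5$ (the same sphere we are about to add to the scene, as seen through the central pixel):

$$ a = \mathbf{d}\cdot\mathbf{d} = 1, \quad b = (\mathbf{P}-\mathbf{C})\cdot\mathbf{d} = -1, \quad c = (\mathbf{P}-\mathbf{C})\cdot(\mathbf{P}-\mathbf{C}) - r^2 = 1 - 0.25 = 0.75 $$

$$ b^2 - ac = 1 - 0.75 = 0.25 \gt 0, \qquad t = \dfrac{1 \pm \sqrt{0.25}}{1} \in \{0.5,\ 1.5\} $$

Both solutions are positive: $t = 0.5$ is the visible intersection on the front of the sphere (at $(0, 0, -0.5)$), and $t = 1.5$ is where the ray exits through the back.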
![Figure [ray-sphere-solutions]: Different cases of ray-sphere intersection ](../images/fig-06-ray-sphere-solutions.svg) We are looking for the first visible surface in the ray's "line of sight", so when there are two possible intersections it makes sense to choose the one that's closer to the ray's origin and lies in front of it. If the closer result is negative (i.e. it's located _behind_ the origin relative to the ray direction), we can discard it and choose the other one. If that one is non-negative, then the ray origin is inside the sphere, so the intersection is valid. If both results are negative, then the sphere is "behind" the ray. $t$ is $0$ when the ray origin is on the surface. In general, rays that start exactly on the surface of an object will be rays that trace the paths of light arriving at that surface. We generally don't want such a ray to intersect the geometry that the ray originates from, so for simplicity let's only consider positive values of $t$ as a valid intersection. Let's define a new function called `intersect_sphere`. This function will return the smaller positive solution for $t$ if there is an intersection and a non-positive value if the ray misses the sphere. Let's also define a new type called `Sphere` to represent the object: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight struct Sphere { center: vec3f, radius: f32, } fn intersect_sphere(ray: Ray, sphere: Sphere) -> f32 { let v = ray.origin - sphere.center; let a = dot(ray.direction, ray.direction); let b = dot(v, ray.direction); let c = dot(v, v) - sphere.radius * sphere.radius; let d = b * b - a * c; if d < 0. { return -1.; } let sqrt_d = sqrt(d); let recip_a = 1. / a; let mb = -b; let t = (mb - sqrt_d) * recip_a; if t > 0. { return t; } return (mb + sqrt_d) * recip_a; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL struct Ray { origin: vec3f, direction: vec3f, } ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [ray-sphere-intersection]: [shaders.wgsl] The `intersect_sphere` function] Let's now add a single sphere to the scene. First we'll test the sphere for an intersection with the camera ray. If there is a hit, then we'll return a solid color for the pixel. If not, we'll return the color of the sky as before. Let's also make sure that the sphere is far enough away from the view origin so that the camera doesn't fall inside the sphere: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... @fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f { let origin = vec3(0.); let focus_distance = 1.; let aspect_ratio = f32(uniforms.width) / f32(uniforms.height); // Map `pos` from y-down viewport coordinates to camera viewport plane coordinates. var uv = pos.xy / vec2f(f32(uniforms.width - 1u), f32(uniforms.height - 1u)); uv = (2. * uv - vec2(1.)) * vec2(aspect_ratio, -1.); let direction = vec3(uv, -focus_distance); let ray = Ray(origin, direction); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight let sphere = Sphere(/*center*/ vec3(0., 0., -1), /*radius*/ 0.5); if intersect_sphere(ray, sphere) > 0. 
{ return vec4(1., 0.76, 0.03, 1.); } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL return vec4(sky_color(ray), 1.); } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [single-sphere]: [shaders.wgsl] First intersection test] This should render a solid circle that looks like this: ![Figure [yellow-circle]: A solid circle](../images/img-07-solid-circle.png) [^ch5-footnote3]: This equation has an intuitive geometric interpretation. $\mathbf{X} - \mathbf{C}$ describes a vector that spans from the center of the sphere to its surface. We know that the magnitude of this vector must be equal to $r$. The dot product of a vector with itself yields the square of its magnitude (as $V \cdot V = V_x^2 + V_y^2 + V_z^2$), which, in this case, must be equal to $r^2$. Shading Multiple Spheres ------------------------ Now that our sphere intersection code is working, we'll next generalize the ray casting logic to look for intersections in a _scene_ containing multiple objects. We can initially represent the scene as an array of spheres. We'll change the code to test all spheres for a possible hit and use the closest intersection to color the pixel. As before, we'll use the $t$ parameter to determine the nearest hit. Let's declare the scene with a second (large) sphere that serves as the "ground" where our first sphere will sit. We can declare the array as a private global, like we did for the vertices of the full-screen quad (the declaration is sketched at the top of the listing below). The scene traversal code is straightforward: loop through the scene array and keep track of the closest $t$ value that results from calling `intersect_sphere` on each element. It makes sense to initialize $t$ with a value that is larger than all other possible values. Since we're dealing with floating-point numbers, _infinity_ is a suitable initial value. However, since WGSL doesn't quite support infinities[^ch5-footnote4], I'll use the largest representable `f32` value (`FLT_MAX` in the listing below) as a substitute. This should result in the following image: ![Figure [yellow-circles]: Two solid circles](../images/img-08-two-solid-circles.png) Both spheres are visible where we expect them. Since we're painting both objects with the same solid color, it's not possible to tell if our code works correctly for the bottom half of the top sphere where the ray intersects both objects. An easy way to improve this is to assign each object a different solid color and use that to paint the pixel. I'm going to do something different: I'll scale the color by the value of `closest_t` such that intersections that are further away from the origin are shaded darker compared to those that are closer. This will convey the _depth_ of the shaded object with respect to our virtual camera. We can achieve this by multiplying the color by a factor of $1 - t$ which will keep the color bright for smaller values of $t$ (representing closer intersections) and darken it as $t$ grows. I'll use the [**`saturate`**](https://www.w3.org/TR/WGSL/#saturate-float-builtin) built-in function to clamp the resulting value to the $[0, 1]$ range so that values of $t$ that are larger than $1$ will be shaded black: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ...
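// Scene globals used by the loop below -- a sketch with assumed values: `FLT_MAX`
// stands in for infinity, and the large second sphere acts as the ground.
const FLT_MAX: f32 = 3.40282346638528859812e+38;
const OBJECT_COUNT: u32 = 2u;
var<private> scene: array<Sphere, OBJECT_COUNT> = array<Sphere, OBJECT_COUNT>(
    Sphere(/*center*/ vec3(0., 0., -1.), /*radius*/ 0.5),
    Sphere(/*center*/ vec3(0., -100.5, -1.), /*radius*/ 100.),
);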
@fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f { let origin = vec3(0.); let focus_distance = 1.; let aspect_ratio = f32(uniforms.width) / f32(uniforms.height); // Map `pos` from y-down viewport coordinates to camera viewport plane coordinates. var uv = pos.xy / vec2f(f32(uniforms.width - 1u), f32(uniforms.height - 1u)); uv = (2. * uv - vec2(1.)) * vec2(aspect_ratio, -1.); let direction = vec3(uv, -focus_distance); let ray = Ray(origin, direction); var closest_t = FLT_MAX; for (var i = 0u; i < OBJECT_COUNT; i += 1u) { let t = intersect_sphere(ray, scene[i]); if t > 0. && t < closest_t { closest_t = t; } } if closest_t < FLT_MAX { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight return vec4(1., 0.76, 0.03, 1.) * saturate(1. - closest_t); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL } return vec4(sky_color(ray), 1.); } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [shading-using-depth]: [shaders.wgsl] Shading using depth] This should make the objects' order of visibility and their spherical shape more apparent: ![Figure [depth-shaded-spheres]: Spheres shaded by depth](../images/img-09-depth-shaded-spheres.png) Both spheres appear quite dark and the bottom sphere fades to black where it meets the one on top. This makes sense since the center of the top sphere is exactly where $t = 1$. You can play with different ways to convert `closest_t` to a color. Here is a version that paints the scene gray and brighter with increasing depth: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... if closest_t < FLT_MAX { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight return vec4(saturate(closest_t) * 0.5); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL } return vec4(sky_color(ray), 1.); } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [shading-using-depth-alt]: [shaders.wgsl] Another way to shade with depth] ![Figure [depth-shaded-spheres-gray]: Spheres shaded by depth (gray) ](../images/img-10-depth-shaded-spheres-gray.png) [^ch5-footnote4]: [WGSL W3C Working Draft, §14.6. Floating Point Evaluation](https://www.w3.org/TR/WGSL/#floating-point-evaluation) states that "_Overflow, infinities, and NaNs generated before runtime are errors_" and "_[compiler] implementations may assume that overflow, infinities, and NaNs are not present at runtime._" Surface Normals --------------- Shading using depth can serve as a great debugging tool as well as the basis for various visual effects. However, we need to know more about the surface geometry in order to color it with a lighting model. This includes its orientation with respect to our viewing direction and the rest of the scene, which is given by its _normal vector_. For any point on a surface, the normal $\vec{\mathbf{N}}$ is defined by the line that is perpendicular to the plane tangent at that point. ![Figure [normal-vector]: The normal vector](../images/fig-07-normal-vector.svg) The orientation of the normal vector depends on both the type of geometry as well as the specific point of intersection. First we're going to make some assumptions that will come into play later when we implement materials: 1. 
Every surface has a _front_ face and a _back_ face and the direction of the normal vector lines up with the front face. 2. All normal vectors have a unit length by default. The normal vector at point $\mathbf{X}$ on the surface of a sphere with center $\mathbf{C}$ and radius $r$ is simply given by $$ \vec{\mathbf{N}} = \dfrac{\mathbf{X} - \mathbf{C}}{||\mathbf{X} - \mathbf{C}||} = \dfrac{\mathbf{X} - \mathbf{C}}{r} $$ ![Figure [sphere-normal]: Computing the normal on a sphere ](../images/fig-08-sphere-normal.svg) Now that we know how to compute the normal, let's change `intersect_sphere` to return a normal vector alongside the $t$ parameter. We'll introduce a struct called `Intersection` that bundles them together: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... @group(0) @binding(0) var<uniform> uniforms: Uniforms; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight struct Intersection { normal: vec3f, t: f32, } fn no_intersection() -> Intersection { return Intersection(vec3(0.), -1.); } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL struct Sphere { ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight fn intersect_sphere(ray: Ray, sphere: Sphere) -> Intersection { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL let v = ray.origin - sphere.center; let a = dot(ray.direction, ray.direction); let b = dot(v, ray.direction); let c = dot(v, v) - sphere.radius * sphere.radius; let d = b * b - a * c; if d < 0. { return no_intersection(); } let sqrt_d = sqrt(d); let recip_a = 1. / a; let mb = -b; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight let t1 = (mb - sqrt_d) * recip_a; let t2 = (mb + sqrt_d) * recip_a; let t = select(t2, t1, t1 > 0.); if t <= 0. { return no_intersection(); } let p = point_on_ray(ray, t); let N = (p - sphere.center) / sphere.radius; return Intersection(N, t); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL } struct Ray { origin: vec3f, direction: vec3f, } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight fn point_on_ray(ray: Ray, t: f32) -> vec3f { return ray.origin + t * ray.direction; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL fn sky_color(ray: Ray) -> vec3f { let t = 0.5 * (normalize(ray.direction).y + 1.); return (1. - t) * vec3(1.) + t * vec3(0.3, 0.5, 1.); } ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [intersection-struct]: [shaders.wgsl] A struct for intersection data] Let's talk about some of the changes. We added a helper function called `no_intersection()` that returns an `Intersection` representing a null result. We also declared a function called `point_on_ray`, which returns the coordinates of a point along a ray at a known $t$ value. You may have noticed that the if statement which used to be conditioned on `t > 0.` is now a function call to _select_. [**`select`**](https://www.w3.org/TR/WGSL/#select-builtin) evaluates to either its first or second argument depending on the value of the third. The call `select(t2, t1, t1 > 0.)` is functionally equivalent to `t1 > 0. ?
t1 : t2` (a ternary expression) in C/C++, with one exception: there is no guarantee of short-circuiting, meaning that both `t1` and `t2` will be evaluated regardless of the conditional. You may be tempted to rewrite this as an if statement (why needlessly evaluate both branches after all?): ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL var t = (mb - sqrt_d) * recip_a; if t <= 0. { t = (mb + sqrt_d) * recip_a; } if t <= 0. { return no_intersection(); } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [branchy-version]: [shaders.wgsl] Branchy version] This is perfectly fine and will behave in the same way. In fact, it's possible that this will compile down to the exact same GPU instructions as the version with `select`. GPUs are generally not good at handling conditional branches in code without sacrificing some amount of parallelism (though this depends on several factors). A good shader compiler will often eliminate branches altogether for simple conditionals like these. Writing efficient GPU code requires a good understanding of how GPUs deal with divergent control flow -- a topic that we will discuss more later on. Let's update the fragment shader to make use of the new data structure. Let's also change our shading code to visualize the normal vector by mapping the coordinates (from the $[-1, 1]$ range) to a color value (in the $[0, 1]$ range): ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... @fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f { let origin = vec3(0.); let focus_distance = 1.; let aspect_ratio = f32(uniforms.width) / f32(uniforms.height); // Normalize the viewport coordinates. var uv = pos.xy / vec2f(f32(uniforms.width - 1u), f32(uniforms.height - 1u)); // Map `uv` from y-down (normalized) viewport coordinates to camera coordinates. uv = (2. * uv - vec2(1.)) * vec2(aspect_ratio, -1.); let direction = vec3(uv, -focus_distance); let ray = Ray(origin, direction); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight var closest_hit = Intersection(vec3(0.), FLT_MAX); for (var i = 0u; i < OBJECT_COUNT; i += 1u) { let hit = intersect_sphere(ray, scene[i]); if hit.t > 0. && hit.t < closest_hit.t { closest_hit = hit; } } if closest_hit.t < FLT_MAX { return vec4(0.5 * closest_hit.normal + vec3(0.5), 1.); } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL return vec4(sky_color(ray), 1.); } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [shading-with-normals]: [shaders.wgsl] Shading with normals] and we get: ![Figure [normal-shaded-spheres]: Visualizing surface normals ](../images/img-11-normal-shaded-spheres.png) Notice how each color channel maps directly to one of the major axis coordinates, so that normals pointing towards the $+x$ direction get shaded with a higher _red_ component, normals pointing straight up towards $+y$ appear green, and so on. Temporal Accumulation ==================================================================================================== Over the next two chapters we are going to focus on two important features of the renderer: antialiasing and path tracing. 
These are both sampling problems in essence: they try to estimate some continuous signal (in this case the light flowing out of the scene into the pixels of our virtual camera) by repeatedly sampling various discrete light paths. Once a sufficient number of samples have been collected, we hope that their average will converge to the real signal -- or at least get close enough.[^ch6-footnote1] How many samples do we need to collect for each pixel before displaying the result? How can we structure the code to achieve some amount of interactivity? The answer to the first question depends highly on the scene but the sample count we are looking at is possibly in the hundreds if not _thousands_. One option is to add a loop to our fragment shader that intersects the scene with camera rays thousands of times before returning the final color, though it will take a long time before we can display a frame. Path tracing is computationally _very_ expensive, even for a GPU. I'm going to suggest an alternative approach: spread the sample collection across many frames. An invocation of the pipeline will output 1 sample per pixel (as it currently does) but rather than outputting the samples directly to the display surface, we'll accumulate them in a texture over time. This approach has the nice benefit that we can present the contents of the texture to the display as soon as a pipeline invocation completes, allowing us to watch as the image resolves to the final rendering. [^ch6-footnote1]: This is referred to as the [_Law of Large Numbers_](https://en.wikipedia.org/wiki/Law_of_large_numbers) in probability theory. The Monte Carlo method employed in path tracing is an example of this (and we'll talk more about it in the next chapter). Frame Count ----------- The arithmetic average of a set of samples is simply given by their sum divided by the sample count. In other words, given $N$ samples $x_1,...,x_N$ of a random variable $x$, the average is given by $$ \dfrac{1}{N}\sum_{i=1}^N x_i $$ Since we are going to distribute the samples across rendered frames, for any given frame, $N$ is equal to the number of frames we have rendered up to that point plus $1$. We can represent this as a simple counter that we increment every time `render_frame` gets called. We'll also define a uniform variable for the frame count so that our shader program can access it when it needs to compute the average: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... struct Uniforms { width: u32, height: u32, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight frame_count: u32, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL } @group(0) @binding(0) var<uniform> uniforms: Uniforms; ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [frame-count-cpu]: [shaders.wgsl] The `frame_count` uniform declaration] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ...
#[derive(Copy, Clone, Pod, Zeroable)] #[repr(C)] struct Uniforms { width: u32, height: u32, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight frame_count: u32, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust } impl PathTracer { pub fn new( device: wgpu::Device, queue: wgpu::Queue, width: u32, height: u32, ) -> PathTracer { device.on_uncaptured_error(Box::new(|error| { panic!("Aborting due to an error: {}", error); })); let shader_module = compile_shader_module(&device); let (display_pipeline, display_layout) = create_display_pipeline(&device, &shader_module); // Initialize the uniform buffer. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight let uniforms = Uniforms { width, height, frame_count: 0, }; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust let uniform_buffer = device.create_buffer(&wgpu::BufferDescriptor { ... } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust delete pub fn render_frame(&self, target: &wgpu::TextureView) { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight pub fn render_frame(&mut self, target: &wgpu::TextureView) { self.uniforms.frame_count += 1; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [frame-count-cpu]: [render.rs] Initializing the `frame_count` uniform] We declared `frame_count` as a 32-bit unsigned integer, which is supported by all shading languages. This will inevitably overflow if you leave the application running for a long time but I'm not too worried. Consider this: if you have a powerful graphics card that can render frames at 1000 fps, it will take approximately 50 days for the count to reach the maximum representable `u32` value ($2^{32}-1$). This is not perfect but also not a huge issue for us.[^ch6-footnote2] Note that we also changed `render_frame` to take a `&mut self` since it now mutates a member of the `PathTracer` type. We also need to update the call site and declare the `PathTracer` instance as mutable to make the compiler happy: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... #[pollster::main] async fn main() -> Result<()> { ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust delete let renderer = render::PathTracer::new(device, queue, WIDTH, HEIGHT); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight let mut renderer = render::PathTracer::new(device, queue, WIDTH, HEIGHT); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... } ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [render_frame-call]: [main.rs] Rendering to a surface texture] We are now maintaining a count on the CPU but we still need to make sure that the changes are mirrored on the GPU side by writing the contents of `self.uniforms` to `self.uniform_buffer`. Since we are modifying `self.uniforms` every frame, we should also update the contents of the GPU buffer every frame. This is where things can get a little complicated. [^ch6-footnote2]: Rust has overflow checks enabled in debug builds, so the program will always panic (i.e.
assert and crash) on overflow. In release builds, the checks are disabled by default and Rust performs two's complement wrapping (see the [docs](https://doc.rust-lang.org/book/ch03-02-data-types.html#integer-overflow)). If you don't care about the runtime cost and want to play it safe, you can use one of the explicit arithmetic methods provided by the standard library. For example, the following will always panic in the case of an overflow: `self.uniforms.frame_count = self.uniforms.frame_count.checked_add(1).unwrap()`. ### Buffer Updates There are some things to consider when modifying the contents of a GPU buffer. The first is the type of memory the buffer resides in. GPUs typically come in two flavors: a _discrete_ GPU (such as a desktop graphics card) has its own dedicated memory and connects to the CPU via a peripheral bus that facilitates memory transfers between the two processors. In a _unified_ architecture, the GPU and the CPU are integrated into the same die and can share system memory without an explicit memory transfer. Before any writes can occur, the CPU side must have access to a region of memory that's mapped to its address space. How the written data is made available to the GPU side very much depends on the hardware and the functions provided by the graphics API. For example, both Metal and Vulkan support buffer types that are backed by shared system memory and can be permanently mapped on a unified architecture. Similarly, both APIs provide facilities to transfer buffer data to GPU memory when fast shared memory isn't supported.[^ch6-footnote3] Another consideration is around synchronization. Suppose that we changed our renderer to allow multiple frames to be in flight without gating the GPU submissions on v-sync.[^ch6-footnote4] We would need to avoid making any changes to the uniform buffer while a GPU submission is in progress, as that could cause a data race. There are different ways to handle this depending on the API, such as double or triple buffering when using a persistently mapped buffer or using synchronization primitives like memory fences. If you're following this book using a native API (like Metal, Vulkan, D3D, CUDA, etc), please consult its documentation for the best approach for frequent buffer updates on your GPU. WebGPU tries to provide a common abstraction over these nuances while working within additional constraints imposed by a web browser environment.[^ch6-footnote5] As a result, WebGPU imposes some strict limitations on how buffer mapping works: * A buffer must have the [`MAP_WRITE`](https://www.w3.org/TR/webgpu/#dom-gpubufferusage-map_write) usage for the CPU side to map and write its contents and this usage can only be combined with the [`COPY_SRC`](https://www.w3.org/TR/webgpu/#dom-gpubufferusage-copy_src) usage. This means that a buffer we map for writing cannot be bound as a shader resource (such as a uniform buffer) and instead serves as a _staging buffer_ for a _copy command_. Updating the contents of a buffer is only possible by issuing a copy from this intermediate staging buffer. * Buffers can only be mapped asynchronously and there is no synchronous way to map a buffer _except_ when first created (using the `mapped_at_creation` field in the buffer descriptor). This requires some careful coordination so that buffers are mapped and available for writing when we need to update them. This immediately rules out shared memory buffers so we have to issue a copy.
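To make these rules concrete, here is a minimal sketch of what a single staged update would look like. This is not the renderer's code: it assumes the `device`, `queue`, `uniforms`, and `uniform_buffer` values from the surrounding chapter, and it assumes the uniform buffer carries the `COPY_DST` usage so that it can serve as a copy destination:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
// Hypothetical staging-buffer update; the buffer is mapped synchronously at creation.
let staging = device.create_buffer(&wgpu::BufferDescriptor {
    label: Some("uniforms staging"),
    size: std::mem::size_of::<Uniforms>() as u64,
    usage: wgpu::BufferUsages::MAP_WRITE | wgpu::BufferUsages::COPY_SRC,
    mapped_at_creation: true,
});

// Write the CPU-side data into the mapped range, then unmap before any GPU use.
staging
    .slice(..)
    .get_mapped_range_mut()
    .copy_from_slice(bytemuck::bytes_of(&uniforms));
staging.unmap();

// Encode a copy from the staging buffer into the uniform buffer and submit it.
let mut encoder = device.create_command_encoder(&wgpu::CommandEncoderDescriptor::default());
encoder.copy_buffer_to_buffer(
    &staging,
    0,
    &uniform_buffer,
    0,
    std::mem::size_of::<Uniforms>() as u64,
);
queue.submit(Some(encoder.finish()));
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The discussion below walks through why managing such buffers on every frame takes some bookkeeping, and how a single `write_buffer` call can take the place of all of it.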
The easiest way would be to create a new staging buffer on every update and set `mapped_at_creation` to `true` but allocating a new short-lived buffer every frame can be expensive and we should strive to reuse GPU buffers when we can. Buffers have to get unmapped before they can be bound to a shader, so we need to re-map a buffer before we can write to it again. A buffer can only get re-mapped asynchronously, so we may need to allocate another staging buffer if `render_frame` ever gets called before the asynchronous mapping of the first staging buffer has completed. One possible approach is to maintain a pool of staging buffers. Each of these is a `wgpu::Buffer` object with the `MAP_WRITE` and `COPY_SRC` usages and mapped at creation. When it's time to update the uniform buffer, we do the following: 1. Find a large enough staging buffer in the pool (or create a new one if not found). Assume the buffer is mapped and write its contents. 2. Unmap the buffer and move it to a "pending buffer" list. Then, encode a ["copy buffer to buffer"](https://www.w3.org/TR/webgpu/#dom-gpucommandencoder-copybuffertobuffer) command with the staging buffer as the source and the uniform buffer as the destination. 3. After submitting the command buffer, call ["map async"](https://www.w3.org/TR/webgpu/#dom-gpubuffer-mapasync) on all buffers in the pending list. The [wgpu implementation of map async](https://docs.rs/wgpu/latest/wgpu/struct.BufferSlice.html#method.map_async) reports its completion in a callback (which runs asynchronously), so the callback can be responsible for removing the buffer from the pending list and adding it back to the mapped staging buffer pool. This is a relatively simple state machine but fortunately there is a method that boils all of that down to a single API call: [`wgpu::Queue::write_buffer`](https://docs.rs/wgpu/latest/wgpu/struct.Queue.html#method.write_buffer)[^ch6-footnote6]. This simplifies the code quite a bit so let's use it instead of implementing a buffer pool. `write_buffer` achieves the same thing while leaving it up to wgpu to choose the most efficient way to transfer the data on the host platform. As for synchronization, everything gets internally handled by wgpu so there isn't anything special we need to do. As long as we call `write_buffer` before encoding any other GPU commands referencing the copy destination (i.e. our uniform buffer) on the same queue, the copy is guaranteed to complete before the shader runs and reads from the buffer: That's pretty much it. Since the new code always updates the uniform buffer before a GPU submission we don't really need to initialize it at the start. Let's check that the code works by creating a visual effect using `frame_count`. The count increases monotonically, so we can use it like a "timestamp" and drive a simple animation. Here is a simple shader change that makes the spheres shrink and expand: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... @fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f { ... var closest_hit = Intersection(vec3(0.), FLT_MAX); for (var i = 0u; i < OBJECT_COUNT; i += 1u) { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight var sphere = scene[i]; sphere.radius += sin(f32(uniforms.frame_count) * 0.02) * 0.2; let hit = intersect_sphere(ray, sphere); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL if hit.t > 0. 
&& hit.t < closest_hit.t { closest_hit = hit; } } if closest_hit.t < FLT_MAX { return vec4(0.5 * closest_hit.normal + vec3(0.5), 1.); } return vec4(sky_color(ray), 1.); } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [animated-radius-shader]: [shaders.wgsl] Animating the sphere radius with `frame_count`] This should create an effect like the one in this video (Figure 21): ![Figure [animated-radius]: (video) Spheres animated with frame count ](../images/vid-01-animated-radius.mp4 autoplay muted loop) [^ch6-footnote3]: Metal provides a ["managed"](https://developer.apple.com/documentation/metal/resource_fundamentals/synchronizing_a_managed_resource) storage mode for these situations alongside a "private" storage mode for memory that is meant for fast GPU-only access. Vulkan's memory abstraction also provides many similar low level configurations. [^ch6-footnote4]: This can be desirable on a high-end GPU that can render a single frame much faster than the display refresh rate. [^ch6-footnote5]: See [wgpu#1438](https://github.com/gfx-rs/wgpu/discussions/1438) for an interesting discussion on the motivations behind the async-only buffer mapping API. [^ch6-footnote6]: See the WebGPU specification for [GPUQueue.writeBuffer](https://www.w3.org/TR/webgpu/#dom-gpuqueue-writebuffer) Radiance Texture ---------------- The animation you just rendered is a type of computation that is spread over time (hence the word _"temporal"_). We can use the same mechanism to compute running averages of per-pixel radiance samples. _Radiance_ is a radiometric term that refers to the energy carried by light through space, restricted to an instant in time, emanating from a unit patch of surface towards another. It is a physical quantity that renderers often emulate to produce realistic stills. Following this model, we'll pretend that every ray we cast measures some fraction of the radiance along its direction, and rays will always originate from a surface in the scene and point in the direction of another. The first rays all originate at a pixel (inside the virtual camera).[^ch6-footnote7] On each frame, the program will compute one sample per pixel and add it to a per-pixel sum of samples. When it's time to display the current sample average, we can divide the sum by `frame_count` and output that to the surface. In order to achieve this, let's set aside a GPU texture to persist the running sums across frames: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust use bytemuck::{Pod, Zeroable}; pub struct PathTracer { device: wgpu::Device, queue: wgpu::Queue, uniforms: Uniforms, uniform_buffer: wgpu::Buffer, display_pipeline: wgpu::RenderPipeline, } #[derive(Copy, Clone, Pod, Zeroable)] #[repr(C)] struct Uniforms { width: u32, height: u32, frame_count: u32, } impl PathTracer { pub fn new( device: wgpu::Device, queue: wgpu::Queue, width: u32, height: u32, ) -> PathTracer { device.on_uncaptured_error(Box::new(|error| { panic!("Aborting due to an error: {}", error); })); let shader_module = compile_shader_module(&device); let (display_pipeline, display_layout) = create_display_pipeline(&device, &shader_module); // Initialize the uniform buffer.
let uniforms = Uniforms { width, height, frame_count: 0, }; let uniform_buffer = device.create_buffer(&wgpu::BufferDescriptor { label: Some("uniforms"), size: std::mem::size_of::<Uniforms>() as u64, usage: wgpu::BufferUsages::UNIFORM | wgpu::BufferUsages::COPY_DST, mapped_at_creation: false, }); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight let radiance_samples = create_sample_texture(&device, width, height); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust PathTracer { device, queue, uniforms, uniform_buffer, display_pipeline, } } ... } ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight fn create_sample_texture(device: &wgpu::Device, width: u32, height: u32) -> wgpu::Texture { device.create_texture(&wgpu::TextureDescriptor { label: Some("radiance samples"), size: wgpu::Extent3d { width, height, depth_or_array_layers: 1, }, mip_level_count: 1, sample_count: 1, dimension: wgpu::TextureDimension::D2, format: wgpu::TextureFormat::Rgba32Float, usage: wgpu::TextureUsages::TEXTURE_BINDING | wgpu::TextureUsages::STORAGE_BINDING, view_formats: &[], }) } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [radiance-samples-texture]: [render.rs] Radiance samples texture] The texture has the same dimensions as the window surface, so that the resolution of the rendered image matches what gets displayed. (Though, it's not uncommon to render at a lower resolution and upsample that in order to save on computations.) The texture format is `Rgba32Float`, which stores every pixel (or "texel") as four 32-bit floating point components (one for each of the 4 RGBA channels). This uses more memory than the 8-bit `Rgba8Unorm` format we used for the display surface but provides sufficient precision to store very large sums of radiance samples on all color channels. The usages (`TEXTURE_BINDING` and `STORAGE_BINDING`) enable the texture to be bound for reading and writing. wgpu doesn't allow a texture to be bound to the same shader stage simultaneously with both read and write access (except with an extension feature[^ch6-footnote8]). This may not be supported on all GPUs, so let's avoid depending on specific GPU features for now. Instead of reading and modifying the same texture in the render pass we can "ping-pong" between two textures. The pipeline will declare two texture bindings: a read-only binding that contains the previously accumulated sums, and a second (write-only) storage binding where it will output the updated sums. We'll also create two texture objects, one for each binding, and alternate their binding assignments with every frame, repeatedly swapping their roles: the texture that was previously the write target provides the accumulated sums for the next frame, and vice versa. Start by changing the type of `radiance_samples` to an array of 2 textures: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust use bytemuck::{Pod, Zeroable}; pub struct PathTracer { device: wgpu::Device, queue: wgpu::Queue, uniforms: Uniforms, uniform_buffer: wgpu::Buffer, display_pipeline: wgpu::RenderPipeline, } #[derive(Copy, Clone, Pod, Zeroable)] #[repr(C)] struct Uniforms { width: u32, height: u32, frame_count: u32, } impl PathTracer { pub fn new( device: wgpu::Device, queue: wgpu::Queue, width: u32, height: u32, ) -> PathTracer { ...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight let radiance_samples = create_sample_textures(&device, width, height); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... } ... } ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight fn create_sample_textures( device: &wgpu::Device, width: u32, height: u32, ) -> [wgpu::Texture; 2] { let desc = wgpu::TextureDescriptor { label: Some("radiance samples"), size: wgpu::Extent3d { width, height, depth_or_array_layers: 1, }, mip_level_count: 1, sample_count: 1, dimension: wgpu::TextureDimension::D2, format: wgpu::TextureFormat::Rgba32Float, usage: wgpu::TextureUsages::TEXTURE_BINDING | wgpu::TextureUsages::STORAGE_BINDING, view_formats: &[], }; // Create two textures with the same parameters. [device.create_texture(&desc), device.create_texture(&desc)] } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [radiance-samples-textures]: [render.rs] Radiance samples textures] Now, let's add the new bindings to the bind group layout definition, assigning binding index $1$ to the read-only binding (previous sums) and $2$ to the write-only storage binding (the updated sums): ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... fn create_display_pipeline( device: &wgpu::Device, shader_module: &wgpu::ShaderModule, ) -> (wgpu::RenderPipeline, wgpu::BindGroupLayout) { let bind_group_layout = device.create_bind_group_layout(&wgpu::BindGroupLayoutDescriptor { label: None, entries: &[ wgpu::BindGroupLayoutEntry { binding: 0, visibility: wgpu::ShaderStages::FRAGMENT, ty: wgpu::BindingType::Buffer { ty: wgpu::BufferBindingType::Uniform, has_dynamic_offset: false, min_binding_size: None, }, count: None, }, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight wgpu::BindGroupLayoutEntry { binding: 1, visibility: wgpu::ShaderStages::FRAGMENT, ty: wgpu::BindingType::Texture { sample_type: wgpu::TextureSampleType::Float { filterable: false, }, view_dimension: wgpu::TextureViewDimension::D2, multisampled: false, }, count: None, }, wgpu::BindGroupLayoutEntry { binding: 2, visibility: wgpu::ShaderStages::FRAGMENT, ty: wgpu::BindingType::StorageTexture { access: wgpu::StorageTextureAccess::WriteOnly, format: wgpu::TextureFormat::Rgba32Float, view_dimension: wgpu::TextureViewDimension::D2, }, count: None, }, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ], }); let pipeline = device.create_render_pipeline(&wgpu::RenderPipelineDescriptor { ... } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [updated-pipeline-layout]: [render.rs] Updated bind group layout] Next, we need to change the actual bind group object to match the new layout. We want to alternate the texture assignments but a bind group cannot be modified once it's created. We could instead create two bind groups with the textures swapped and alternate those at render time: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ...
pub struct PathTracer { device: wgpu::Device, queue: wgpu::Queue, uniforms: Uniforms, uniform_buffer: wgpu::Buffer, display_pipeline: wgpu::RenderPipeline, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight display_bind_groups: [wgpu::BindGroup; 2], ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust } #[derive(Copy, Clone, Pod, Zeroable)] #[repr(C)] struct Uniforms { width: u32, height: u32, frame_count: u32, } impl PathTracer { pub fn new( device: wgpu::Device, queue: wgpu::Queue, width: u32, height: u32, ) -> PathTracer { device.on_uncaptured_error(Box::new(|error| { panic!("Aborting due to an error: {}", error); })); let shader_module = compile_shader_module(&device); let (display_pipeline, display_layout) = create_display_pipeline(&device, &shader_module); // Initialize the uniform buffer. let uniforms = Uniforms { width, height, frame_count: 0, }; let uniform_buffer = device.create_buffer(&wgpu::BufferDescriptor { label: Some("uniforms"), size: std::mem::size_of::<Uniforms>() as u64, usage: wgpu::BufferUsages::UNIFORM | wgpu::BufferUsages::COPY_DST, mapped_at_creation: false, }); let radiance_samples = create_sample_textures(&device, width, height); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight let display_bind_groups = create_display_bind_groups( &device, &display_layout, &radiance_samples, &uniform_buffer, ); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust PathTracer { device, queue, uniforms, uniform_buffer, display_pipeline, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight display_bind_groups, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust } } ... } ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight fn create_display_bind_groups( device: &wgpu::Device, layout: &wgpu::BindGroupLayout, textures: &[wgpu::Texture; 2], uniform_buffer: &wgpu::Buffer, ) -> [wgpu::BindGroup; 2] { let views = [ textures[0].create_view(&wgpu::TextureViewDescriptor::default()), textures[1].create_view(&wgpu::TextureViewDescriptor::default()), ]; [ // Bind group with view[0] assigned to binding 1 and view[1] assigned to binding 2. device.create_bind_group(&wgpu::BindGroupDescriptor { label: None, layout, entries: &[ wgpu::BindGroupEntry { binding: 0, resource: wgpu::BindingResource::Buffer(wgpu::BufferBinding { buffer: uniform_buffer, offset: 0, size: None, }), }, wgpu::BindGroupEntry { binding: 1, resource: wgpu::BindingResource::TextureView(&views[0]), }, wgpu::BindGroupEntry { binding: 2, resource: wgpu::BindingResource::TextureView(&views[1]), }, ], }), // Bind group with view[1] assigned to binding 1 and view[0] assigned to binding 2.
device.create_bind_group(&wgpu::BindGroupDescriptor { label: None, layout, entries: &[ wgpu::BindGroupEntry { binding: 0, resource: wgpu::BindingResource::Buffer(wgpu::BufferBinding { buffer: uniform_buffer, offset: 0, size: None, }), }, wgpu::BindGroupEntry { binding: 1, resource: wgpu::BindingResource::TextureView(&views[1]), }, wgpu::BindGroupEntry { binding: 2, resource: wgpu::BindingResource::TextureView(&views[0]), }, ], }), ] } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [textures-bind-groups]: [render.rs] Bind groups with different texture assignments] Now, let's update the shader. The intersection test logic remains the same as before, but instead of returning the computed radiance value right away, we first store it in a local variable (`radiance_sample`). Next, we fetch the current tally from the "old" texture binding (`old_sum`), compute the updated tally (`new_sum`) by adding `radiance_sample` to it, and write `new_sum` out to the "new" storage texture with `textureStore`. We want to ensure that the accumulation starts at 0, so we set `old_sum` to `vec3(0)` for the initial frame (when `frame_count` is equal to $1$). Then we simply return `new_sum / f32(uniforms.frame_count)`, i.e. the current average, in the RGB channels of the output color.[^ch6-footnote9] Finally, let's update the bind group assignment in `PathTracer::render_frame` to ping-pong between the two bind groups we created, using even and odd values of `frame_count` as a toggle: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... impl PathTracer { ... pub fn render_frame(&mut self, target: &wgpu::TextureView) { self.uniforms.frame_count += 1; self.queue .write_buffer(&self.uniform_buffer, 0, bytemuck::bytes_of(&self.uniforms)); let mut encoder = self .device .create_command_encoder(&wgpu::CommandEncoderDescriptor { label: Some("render frame"), }); let mut render_pass = encoder.begin_render_pass(&wgpu::RenderPassDescriptor { label: Some("display pass"), color_attachments: &[Some(wgpu::RenderPassColorAttachment { view: target, resolve_target: None, ops: wgpu::Operations { load: wgpu::LoadOp::Clear(wgpu::Color::BLACK), store: wgpu::StoreOp::Store, }, })], ..Default::default() }); render_pass.set_pipeline(&self.display_pipeline); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight render_pass.set_bind_group( 0, &self.display_bind_groups[(self.uniforms.frame_count % 2) as usize], &[], ); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust // Draw 1 instance of a polygon with 6 vertices render_pass.draw(0..6, 0..1); // End the render pass by consuming the object. drop(render_pass); let command_buffer = encoder.finish(); self.queue.submit(Some(command_buffer)); } } ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [ping-pong-bind-groups]: [render.rs] Ping-pong bind groups] When you run the code, you should see the animation from before but the displayed image should look a bit smeared. You should be able to see the oscillating sphere leave behind a "trail" over the first few seconds and the image should eventually settle at something like this: ![Figure [temporal-blur-effect]: Temporal Blur Effect](../images/img-12-blurred-animation.png) I find it fun to watch the rendering of this image. After the program runs for a few seconds the image seems to reach a steady state.
This happens when the renderer has collected enough samples that adding new ones doesn't perceivably contribute to the average (after $N$ frames, one more sample can shift a pixel's average by at most $1/(N+1)$ of that sample's deviation from the current average). The sphere radii are oscillating inside a fixed range, so we observe all possible frame states of the animation rather quickly. [^ch6-footnote7]: We are making the assumption that light travels along straight lines. [^ch6-footnote8]: wgpu supports a [read/write access mode](https://docs.rs/wgpu/0.19.3/wgpu/enum.StorageTextureAccess.html#variant.ReadWrite) which is hidden behind the adapter feature `TEXTURE_ADAPTER_SPECIFIC_FORMAT_FEATURES`. This isn't guaranteed to be supported by all GPUs but feel free to use it if yours does. [^ch6-footnote9]: Note that the value we store in the texture (`vec4(new_sum, 0.)`) has its alpha component set to $0$. We aren't making use of the alpha values so it doesn't matter what we set this to. Antialiasing ------------ Let's undo the animation and bring back the static spheres. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... @fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f { ... let direction = vec3(uv, -focus_distance); let ray = Ray(origin, direction); var closest_hit = Intersection(vec3(0.), FLT_MAX); for (var i = 0u; i < OBJECT_COUNT; i += 1u) { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight let sphere = scene[i]; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL delete var sphere = scene[i]; sphere.radius += sin(f32(uniforms.frame_count) * 0.02) * 0.2; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL let hit = intersect_sphere(ray, sphere); if hit.t > 0. && hit.t < closest_hit.t { closest_hit = hit; } } var radiance_sample: vec3f; ... } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [stop-animating]: [shaders.wgsl] Remove radius animation] The output should be the same still image from the end of Chapter 5 (Figure 20). The accumulation logic has no effect because every sample is computing exactly the same value. Let's zoom in and take a closer look at the edges of the spheres: ![Figure [aliased-boundaries]: Aliased shape boundaries @ 400x300 ](../images/img-13-aliased-boundaries.png height="500px" class="pixel") Here, each pixel is visualized as a square. A discrete pixel can only display a single color but pixels along shape boundaries overlap multiple (continuous) surfaces. Ideally the pixel color should receive a contribution from all of those surfaces, in proportion to the "pixel area" covered by each surface. Casting a single camera ray returns only a point sample but averaging multiple _sub-pixel_ samples can give us an approximation of the average color over the whole pixel area. Let's try a very simple approach first: subdivide a pixel into a rectangular grid and on each frame cast the ray towards one of the sub-regions. The following code change adds a small offset to the ray which cycles through the sub-regions of a 4x4 grid centered at the original ray direction, using `uniforms.frame_count` as an index. The offsets range within $[-0.5, 0.5]$ in both coordinate directions: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ...
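// The offset computed below steps through a 4x4 sub-pixel grid: (frame_count % 4) selects
// the column and (frame_count % 16) / 4 selects the row, each scaled to a 0.25 step
// starting at -0.5.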
@fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f { let origin = vec3(0.); let focus_distance = 1.; let aspect_ratio = f32(uniforms.width) / f32(uniforms.height); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight // Offset and normalize the viewport coordinates of the ray. let offset = vec2( f32(uniforms.frame_count % 4) * 0.25 - 0.5, f32((uniforms.frame_count % 16) / 4) * 0.25 - 0.5 ); var uv = (pos.xy + offset) / vec2f(f32(uniforms.width - 1u), f32(uniforms.height - 1u)); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL // Map `uv` from y-down (normalized) viewport coordinates to camera coordinates. uv = (2. * uv - vec2(1.)) * vec2(aspect_ratio, -1.); let direction = vec3(uv, -focus_distance); ... } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [16-sample-aa]: [shaders.wgsl] 16 grid samples] ![Figure [16-sample-aa-aliased-boundaries]: Anti-aliasing with 16 regularly-spaced samples @ 400x300 ](../images/img-14-16-sample-aa.png height="500px" class="pixel") That's an improvement but we can do better. Instead of subdividing the pixel into 16 regularly-spaced discrete regions (which is prone to the same sampling artifact), let's offset the ray by a random amount within that range. This should accumulate enough samples from various parts of the pixel area over time to yield a better estimate of the average color. Plus, why limit ourselves to only 16 discrete samples when our renderer is already set up for an indefinite number of them? PRNG ---- Shading languages don't provide a built-in facility to generate random numbers, which means we need to implement our own. A class of pseudorandom number generators that is very easy to implement is called _Xorshift RNGs_.[^marsaglia] Xorshift generators work by repeatedly computing the bitwise exclusive-or of an initial seed with a bit-shifted version of itself. The result is a deterministic sequence with a uniform distribution and a long period that suits our needs.[^ch6-footnote10] We can implement the RNG as a private variable such that each GPU thread gets its own local instance of the RNG state. We generally want to seed the RNG such that the pseudorandom sequence for a pixel is different across successive frames since we want to sample a different sub-pixel coordinate each time. The sequences should also ideally differ across adjacent pixels in a single frame (instead of repeating the same spatial pattern) in order to improve the sampling distribution. We can combine `uniforms.frame_count` with the pixel's coordinates using a hash function to obtain a good initial seed for each thread. I use the _One-at-a-Time Hash_ function from Bob Jenkins' Dr. Dobb's article from 1997[#Jenkins97] but you could use any other hash function as long as it's fast and has good statistical properties. The listing below defines the RNG state, the hash function, and the 32-bit xorshift. `init_rng()` initializes the state with the seed. The RNG state and the generated numbers are 32-bit unsigned integers; since we're pretty much only dealing with floating-point numbers, the code also includes a `rand_f32()` function that generates a random `u32` and converts it to a `f32` between $0$ and $1$. The offset computation in the fragment shader then changes to pick a random sub-pixel coordinate: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ...
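// NOTE: a minimal sketch of the RNG described above -- a One-at-a-Time-style integer hash
// used to seed a 32-bit xorshift state. The structure follows the text (a private 32-bit
// state, init_rng(), rand_f32()), but the exact constants and seeding scheme here are one
// reasonable choice rather than a definitive implementation.

// Per-invocation RNG state; every fragment shader thread gets its own copy.
var<private> rng_state: u32;

// Integer hash in the spirit of Bob Jenkins' One-at-a-Time hash, applied to a single word.
fn jenkins_hash(input: u32) -> u32 {
    var x = input;
    x += x << 10u;
    x ^= x >> 6u;
    x += x << 3u;
    x ^= x >> 11u;
    x += x << 15u;
    return x;
}

// Seed the state by hashing the pixel coordinates together with the frame count, so that
// the sequence differs across pixels and across frames.
fn init_rng(pixel: vec2u) {
    let seed = (pixel.x + pixel.y * uniforms.width) ^ jenkins_hash(uniforms.frame_count);
    rng_state = jenkins_hash(seed);
}

// One step of Marsaglia's 32-bit xorshift generator.
fn xorshift32() -> u32 {
    var x = rng_state;
    x ^= x << 13u;
    x ^= x >> 17u;
    x ^= x << 5u;
    rng_state = x;
    return x;
}

// Map a random u32 to a f32 in [0, 1).
fn rand_f32() -> f32 {
    return f32(xorshift32()) * 2.3283064365386963e-10; // 1 / 2^32
}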
@fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight init_rng(vec2u(pos.xy)); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL let origin = vec3(0.); let focus_distance = 1.; let aspect_ratio = f32(uniforms.width) / f32(uniforms.height); // Offset and normalize the viewport coordinates of the ray. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight let offset = vec2(rand_f32() - 0.5, rand_f32() - 0.5); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL var uv = (pos.xy + offset) / vec2f(f32(uniforms.width - 1u), f32(uniforms.height - 1u)); // Map `uv` from y-down (normalized) viewport coordinates to camera coordinates. uv = (2. * uv - vec2(1.)) * vec2(aspect_ratio, -1.); let direction = vec3(uv, -focus_distance); ... } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [randomized-ssaa-code]: [shaders.wgsl] Random sub-pixel sampling] Now, the anti-aliased edges have a much more gradual transition and look a lot less blocky compared to our previous 16-sample AA: ![Figure [randomized-ssaa-image]: Randomized sub-pixel supersampling @ 400x300 ](../images/img-15-random-subpixel-samples.png height="500px" class="pixel") [^ch6-footnote10]: Xorshift is a _linear_ pseudorandom number generator. The random offsets generated with xorshift follow a _white noise_ pattern in that they appear to be "purely random": the sample points may appear clumped together in some places and have large gaps in others. A more even spatial distribution of points (e.g. using _blue noise_ or the _Sobol sequence_) is generally more desirable for stochastic methods but the xorshift PRNG is good enough for our purposes, given our large number of samples. Path Tracing ==================================================================================================== What we perceive as color, shadows, transparency, reflections, and many other visual phenomena are the result of complex interactions of light. If we want to achieve some amount of realism, it makes sense to base our computations on the real-world physics of light. That said, it's not necessary to simulate electromagnetic wave interactions to render a visually pleasing image. What we mainly care about is how light travels and what happens when it falls on surfaces in the scene. We'll adhere to a relatively simple model with the following assumptions: - Light travels in straight lines represented as rays. - A ray transports some amount of light energy, called _radiance_. - Light gets scattered when it hits a surface. The surface absorbs some of the radiance and scatters the rest towards a new direction, represented by a new ray. - A sequence of connected rays forms a _light transport path_. All light transport paths originate at a light source. ![Figure [light-paths-in-a-room]: The various paths that light rays in a room may take before they reach the camera. ](../images/fig-09-light-paths-overview.svg) There are infinitely many transport paths in a scene. The paths that contribute to the rendered image are the ones that eventually arrive at the camera, so we trace a light transport path _backwards_, starting at a camera pixel.
When we find an intersection with a surface in the scene, we cast a new ray in the scattering direction based on the properties of the surface. We repeat the process until a ray intersects a light source. Path Tracing Loop ----------------- Before implementing the path tracing logic let's introduce two subroutines. The first will be a new function responsible for traversing the scene and finding an intersection, called `intersect_scene`: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... struct Intersection { normal: vec3f, t: f32, } fn no_intersection() -> Intersection { return Intersection(vec3(0.), -1.); } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight fn is_intersection_valid(hit: Intersection) -> bool { return hit.t > 0.; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight fn intersect_scene(ray: Ray) -> Intersection { var closest_hit = Intersection(vec3(0.), FLT_MAX); for (var i = 0u; i < OBJECT_COUNT; i += 1u) { let sphere = scene[i]; let hit = intersect_sphere(ray, sphere); if hit.t > 0. && hit.t < closest_hit.t { closest_hit = hit; } } if closest_hit.t < FLT_MAX { return closest_hit; } return no_intersection(); } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL struct Ray { origin: vec3f, direction: vec3f, } ... @fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f { init_rng(vec2u(pos.xy)); let origin = vec3(0.); let focus_distance = 1.; let aspect_ratio = f32(uniforms.width) / f32(uniforms.height); // Offset and normalize the viewport coordinates of the ray. let offset = vec2(rand_f32() - 0.5, rand_f32() - 0.5); var uv = (pos.xy + offset) / vec2f(f32(uniforms.width - 1u), f32(uniforms.height - 1u)); // Map `uv` from y-down (normalized) viewport coordinates to camera coordinates. uv = (2. * uv - vec2(1.)) * vec2(aspect_ratio, -1.); let direction = vec3(uv, -focus_distance); let ray = Ray(origin, direction); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight let hit = intersect_scene(ray); var radiance_sample: vec3f; if is_intersection_valid(hit) { radiance_sample = vec3(0.5 * hit.normal + vec3(0.5)); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL } else { radiance_sample = sky_color(ray); } ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [intersect-scene]: [shaders.wgsl] The intersect_scene function] A second new function, called `scatter`, will be responsible for evaluating the surface material. For now it returns two values: an attenuation factor that represents the fraction of scattered radiance and a scattering direction (typically denoted with the lower-case Greek letter $\omega$). We store the attenuation factor as a `vec3f` since we're computing a separate radiance value for each color channel.[^ch7-footnote3] Surface materials (which we'll explore in Section [materials]) are represented by various _scattering functions_. A scattering function maps an _incident_ light direction $\omega_i$ to an _outgoing_ light direction $\omega_o$. The rays originate from the camera and trace the transport path backwards towards light sources, so when we call `scatter` we already know the scattering direction $\omega_o$. 
In that sense "scatter" is somewhat a misnomer, since we're using it to compute $\omega_i$. This doesn't really make a difference, as the incident and outgoing light directions are interchangeable. The surface scattering functions that we will implement are all going to be _bi-directional_, i.e. work the same way in either direction. As such, our `scatter` function allows the `input_ray` parameter to be either an incident or a scattered light direction. Let's make the scattering function reflect the ray around the normal vector like a perfect mirror. The direction of a reflected ray given an incident ray direction and a surface normal is given by the _law of reflection_. Luckily, there is a handy shader intrinsic called `reflect` that can compute this for us: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight struct Scatter { attenuation: vec3f, ray: Ray, } fn scatter(input_ray: Ray, hit: Intersection) -> Scatter { let reflected = reflect(input_ray.direction, hit.normal); let output_ray = Ray(point_on_ray(input_ray, hit.t), reflected); let attenuation = vec3(0.4); return Scatter(attenuation, output_ray); } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL struct Ray { origin: vec3f, direction: vec3f, } ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [scatter-function]: [shaders.wgsl] The scatter function] The returned attenuation factor of $0.4$ means that the material absorbs 60% of the incoming radiance (in all color channels) and scatters the rest. Logically, we compute this by multiplying the transported radiance by the attenuation factor at every intersection. We don't actually know the radiance value until we reach light sources but we can compute the total attenuation and the transported radiance separately. We'll write a loop that traces a path, generating rays as it finds intersections. We'll accumulate the product of attenuation factors in a `throughput` variable and multiply that by the radiance emitted by any light source that we encounter (which is just the sky for now): ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight const MAX_PATH_LENGTH: u32 = 6u; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... @fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f { init_rng(vec2u(pos.xy)); let origin = vec3(0.); let focus_distance = 1.; let aspect_ratio = f32(uniforms.width) / f32(uniforms.height); // Offset and normalize the viewport coordinates of the ray. let offset = vec2(rand_f32() - 0.5, rand_f32() - 0.5); var uv = (pos.xy + offset) / vec2f(f32(uniforms.width - 1u), f32(uniforms.height - 1u)); // Map `uv` from y-down (normalized) viewport coordinates to camera coordinates. uv = (2.
* uv - vec2(1.)) * vec2(aspect_ratio, -1.); let direction = vec3(uv, -focus_distance); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight var ray = Ray(origin, direction); var throughput = vec3f(1.); var radiance_sample = vec3(0.); var path_length = 0u; while path_length < MAX_PATH_LENGTH { let hit = intersect_scene(ray); if !is_intersection_valid(hit) { // If no intersection was found, return the color of the sky and terminate the path. radiance_sample += throughput * sky_color(ray); break; } let scattered = scatter(ray, hit); throughput *= scattered.attenuation; ray = scattered.ray; path_length += 1u; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL // Fetch the old sum of samples. var old_sum: vec3f; if uniforms.frame_count > 1 { old_sum = textureLoad(radiance_samples_old, vec2u(pos.xy), 0).xyz; } else { old_sum = vec3(0.); } ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [path-tracing-loop]: [shaders.wgsl] The path tracing loop] `throughput` starts out as $1$ (meaning no radiance has been absorbed). We also impose an artificial limit on the length of a path to prevent looping forever if we never encounter a light source (which can happen with certain types of geometry). Running this program should produce this image: ![Figure [invalid-scatter-with-shadow-acne]: Validating the path tracing loop (with self-shadowing) ](../images/img-18-mirror-reflection-with-shadow-acne.png) We can see some reflections but there are some nasty circular bands. This artifact (called "shadow acne" or "self-shadowing") is caused by the limited (and quantized) precision inherent to floating point arithmetic. Sometimes the computed intersection point doesn't fall precisely on the sphere surface, which can cause the new ray (originating from that point) to re-intersect the sphere. An easy way to deal with this is to reject intersections for values of $t$ that are below a small offset ($\epsilon$ or _epsilon_): ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL const FLT_MAX: f32 = 3.40282346638528859812e+38; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight const EPSILON: f32 = 1e-3; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... fn intersect_sphere(ray: Ray, sphere: Sphere) -> Intersection { let v = ray.origin - sphere.center; let a = dot(ray.direction, ray.direction); let b = dot(v, ray.direction); let c = dot(v, v) - sphere.radius * sphere.radius; let d = b * b - a * c; if d < 0. { return no_intersection(); } let sqrt_d = sqrt(d); let recip_a = 1. / a; let mb = -b; let t1 = (mb - sqrt_d) * recip_a; let t2 = (mb + sqrt_d) * recip_a; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight let t = select(t2, t1, t1 > EPSILON); if t <= EPSILON { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL return no_intersection(); } let p = point_on_ray(ray, t); let N = (p - sphere.center) / sphere.radius; return Intersection(N, t); } ...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [epsilon-rejection]: [shaders.wgsl] Rejecting intersections too close to the ray origin] This code considers both `t1` and `t2`, since `t2` (the farther point) can be the valid intersection when the ray originates inside the sphere, for example if the ray resulted from a surface _refraction_ (e.g. for a glass-like material). Following this change, the rendering should look like this: ![Figure [shadow-acne-fixed]: Validating the path tracing loop ](../images/img-19-mirror-reflection-no-acne.png) That looks much cleaner. Some reflections are visible and both spheres have acquired a blue tint where light paths eventually reach the sky. Some light paths bounce back and forth between both spheres. Each bounce is an "absorption event" that decreases the path throughput. The image looks darker with more absorptions, which is most apparent where the two spheres meet. Gamma Correction ---------------- Right now, this image looks a bit too dark. The perceived brightness (or luminance) of a pixel should ideally scale linearly with the stored radiance value. In other words, if a material absorbs 50% of the radiance arriving directly from the sky, it should appear half as bright as the sky. However, the reflections of both spheres become nearly invisible after only three ray bounces. This is because the surface texture expects pixel values to be _gamma encoded_. Our eyes are more sensitive to changes in dark tones than they are to similar changes in bright tones. Given that we only have a fixed range to represent a pixel's luminance ($[0, 1]$), it is more efficient (in terms of storage) to allocate a bigger numerical range for smaller radiance values. This is how digital images usually get stored and virtually all displays apply _gamma correction_ while converting pixel values to light.[^ch7-footnote4] The formula for gamma ($\gamma$) encoding is $V_{out} = V_{in}^{\frac{1}{\gamma}}$. We can apply this function in the fragment shader right before outputting the color: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL @fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f { ... // Compute and store the new sum. let new_sum = radiance_sample + old_sum; textureStore(radiance_samples_new, vec2u(pos.xy), vec4(new_sum, 0.)); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight // Display the average after gamma correction (gamma = 2.2) let color = new_sum / f32(uniforms.frame_count); return vec4(pow(color, vec3(1. / 2.2)), 1.); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [gamma-2.2]: [shaders.wgsl] Encoding a pixel with $\gamma = 2.2$] The gamma-corrected output should look like this: ![Figure [gamma-correction]: Gamma-correction](../images/img-20-gamma-correction.png) Some platforms support textures with an _sRGB_ format. Pixels automatically undergo gamma compression (or decompression) upon writes to and reads from sRGB textures. You can try this yourself: instead of applying gamma correction in the shader, change all instances of `Rgba8Unorm` and `Bgra8Unorm` in `src/main.rs` and `src/render.rs` to `Rgba8UnormSrgb` and `Bgra8UnormSrgb`. You should see a similar result if your platform supports sRGB surfaces.
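To get a feel for what the encoding does, plug a couple of linear values into the formula above: $$ 0.5^{1/2.2} \approx 0.73 \qquad\qquad 0.1^{1/2.2} \approx 0.35 $$ The darker half of the linear range ends up occupying roughly three quarters of the encoded $[0, 1]$ range, which is exactly the extra precision for dark tones described earlier.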
[^ch7-footnote3]: This RGB representation is simple and works well for most cases but cannot accurately represent effects like diffraction and interference. There are alternative representations to handle such phenomena, for example by storing a power distribution across a spectrum of constituent wavelengths. [^ch7-footnote4]: [_Understanding Gamma Correction_](https://www.cambridgeincolour.com/tutorials/gamma-correction.htm) (by Cambridge in Color) is a great short read on the topic. Path Length ----------- Let's momentarily set the attenuation factor to 1, so that both spheres reflect 100% of the energy they receive. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... fn scatter(input_ray: Ray, hit: Intersection) -> Scatter { let reflected = reflect(input_ray.direction, hit.normal); let output_ray = Ray(point_on_ray(input_ray, hit.t), reflected); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight let attenuation = vec3(1.); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL return Scatter(attenuation, output_ray); } ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [no-absorption]: [shaders.wgsl] Spheres that reflect all light] You should get this result: ![Figure [mirrors-showing-bias]: Bias from early termination ](../images/img-21-max-bounces-too-low.png) The image looks a lot brighter (as expected) but there is a well-defined black circle in between the spheres. That looks wrong, given the spheres aren't supposed to absorb any light. Luckily there is an easy explanation: the current upper limit on path length (i.e. `MAX_PATH_LENGTH`) is too low to fully explore that part of the scene, so the path gets terminated before it can find the light source. ![Figure [path-sphere-interreflections]: A light transport path with 7 bounces ](../images/fig-11-sphere-interreflections.svg) Try increasing `MAX_PATH_LENGTH` to 10. There should still be a black circle, but a smaller one. It turns out that at least 13 bounces are necessary to eliminate the black circle for this particular scene: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight const MAX_PATH_LENGTH: u32 = 13u; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [increase-path-length]: [shaders.wgsl] Increased path length] ![Figure [mirrors-with-13-bounces]: Infinite mirror with 13 bounces ](../images/img-22-infinite-mirror-with-13-bounces.png) This raises the question: what is the ideal value for `MAX_PATH_LENGTH`? The answer depends on a number of factors, but it mainly comes down to the scene and performance expectations. Fewer bounces means less computation but potentially incorrect (biased) images. More bounces means more light paths get explored but more computation is necessary. It also increases the chances of wasted work on paths that don't contribute significantly to the final image. We'll revisit this topic later. Colored Spheres --------------- All real-world objects absorb some amount of light. They also impart a color to the light that they reflect. It would be nice to assign different colors to the spheres so we can tell them apart.
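For now, let's add an additional field to the `Sphere` structure so that every shape carries its own color. A minimal sketch of the change is shown below (the field name `color` and the example values are my choice; the existing `center` and `radius` fields come from the earlier chapters, and every `Sphere` entry in the `scene` array needs a color argument added to it):

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
struct Sphere {
    center: vec3f,
    radius: f32,
    color: vec3f,
}

// For example: Sphere(/*center*/ vec3(0., 0., -1.), /*radius*/ 0.5, /*color*/ vec3(0.8, 0.3, 0.3)).
// Once the Intersection struct gains a color field (below), intersect_sphere should also
// pass sphere.color along when it constructs its return value.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~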
The RGB triplet can directly represent the attenuation factor for a given sphere. Let's have the intersection routine return the color of a sphere and use that color in the scattering function: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... struct Intersection { normal: vec3f, t: f32, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight color: vec3f, } fn no_intersection() -> Intersection { return Intersection(vec3(0.), -1., vec3(0.)); } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... fn intersect_scene(ray: Ray) -> Intersection { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight var closest_hit = no_intersection(); closest_hit.t = FLT_MAX; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL for (var i = 0u; i < OBJECT_COUNT; i += 1u) { let sphere = scene[i]; let hit = intersect_sphere(ray, sphere); if hit.t > 0. && hit.t < closest_hit.t { closest_hit = hit; } } if closest_hit.t < FLT_MAX { return closest_hit; } return no_intersection(); } ... fn scatter(input_ray: Ray, hit: Intersection) -> Scatter { let reflected = reflect(input_ray.direction, hit.normal); let output_ray = Ray(point_on_ray(input_ray, hit.t), reflected); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight let attenuation = hit.color; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL return Scatter(attenuation, output_ray); } ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [apply-sphere-color]: [shaders.wgsl] Use sphere color to attenuate throughput] ![Figure [colored-spheres]: Spheres with different colors](../images/img-23-colored-spheres.png) Interactive Camera ==================================================================================================== So far, we've been looking at the spheres from a fixed position and it would be nice to be able to move around. In order to reposition the camera with user input, we need a representation of the camera state that is shared between the CPU and GPU sides of the program. In our GPU code, we have relied on built-in vector algebra primitives (such as `vec3`) provided by WGSL. We need similar primitives on the CPU side in order to compute camera parameters (such as camera position and orientation) in response to input events generated by the windowing system. To that end, we'll be adding a new `algebra` module for linear algebra utilities: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight mod algebra; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust mod render; const WIDTH: u32 = 800; const HEIGHT: u32 = 600; ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [algebra-module-decl]: [main.rs] Declare the `algebra` module] `algebra.rs` defines a single type: `Vec3`. As its name suggests, this type represents a 3-dimensional vector with three 32-bit floating point components.
`Vec3` defines methods for vector operations and operator overloads for component-wise arithmetic (`+, -, *, /`) and assignment (`+=, -=, *=, /=`).[^ch8-footnote1] The memory layout of a `Vec3` consists of three contiguous `f32`s (taking up 12 bytes) which exactly matches the layout of the WGSL `vec3f` type. [^ch8-footnote1]: In Rust, operators get overloaded by implementing traits (`std::ops::Add`, `std::ops::Sub`, `std::ops::Mul`, `std::ops::Div`, etc). The operator traits are parameterized on value types (such as `fn add(self, rhs: RHS) -> Output` in `std::ops::Add`) and don't automatically extend to invocations on borrows. For example, `a + b`, where `a` and `b` are both `Vec3`, is different from `a + &b`, `&a + b`, and `&a + &b`. The `impl_binary_op` macro automatically implements the traits for all of these combinations, for convenience. Uniforms and Alignment ---------------------- We can use the uniform buffer to make the camera parameters visible to both the CPU and GPU sides of the program. Let's define a new `CameraUniforms` structure that just stores the camera position: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... struct Uniforms { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight camera: CameraUniforms, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL width: u32, height: u32, frame_count: u32, } @group(0) @binding(0) var<uniform> uniforms: Uniforms; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight struct CameraUniforms { origin: vec3f, } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... @fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f { init_rng(vec2u(pos.xy)); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight let origin = uniforms.camera.origin; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL let focus_distance = 1.; ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [camera-uniforms-struct]: [shaders.wgsl] The `CameraUniforms` struct] We need to mirror these changes on the CPU side. Let's introduce a new Rust module called `camera` for all camera-related code. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight use bytemuck::{Pod, Zeroable}; use crate::algebra::Vec3; #[derive(Debug, Copy, Clone, Pod, Zeroable)] #[repr(C)] pub struct CameraUniforms { origin: Vec3, } pub struct Camera { uniforms: CameraUniforms, } impl Camera { pub fn new(origin: Vec3) -> Camera { Camera { uniforms: CameraUniforms { origin }, } } pub fn uniforms(&self) -> &CameraUniforms { &self.uniforms } } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [camera-module]: [camera.rs] The `camera` module] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... mod algebra; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight mod camera; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust mod render; const WIDTH: u32 = 800; const HEIGHT: u32 = 600; ...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [camera-module-decl]: [main.rs] Declare the `camera` module] The module defines two structs: `CameraUniforms` and `Camera`. `CameraUniforms` is going to contain only the state that will be shared with the GPU, while `Camera` is meant to be a higher-level wrapper that can contain additional variables. For now, the only state is the camera origin so the type definition is pretty bare bones. Let's update our CPU-side `Uniforms` struct to mirror the GPU side by including `CameraUniforms` as its first field, and reposition the camera origin to verify our changes (the updated struct and its initialization appear in the listing further below, together with the padding fix). When you run this code, wgpu should emit an API validation error that says "_Buffer is bound with size 24 where the shader expects 32_." 24 bytes looks correct at first glance: 12 bytes for a `Vec3` (4 bytes each for 3 `f32`s), and 3 `u32`s for the `width`, `height`, and `frame_count` fields, each taking up 4 bytes. The error message says the shader declared a 32-byte struct, so where do the 8 missing bytes come from? The answer is _implicit padding_ inserted by WGSL to satisfy alignment requirements. Computers access memory more efficiently if the memory address of the accessed data is aligned to certain multiples of the processor word size. WGSL defines specific rules for its scalar and vector types[^ch8-footnote2] and it expects the memory layout of bound data structures to adhere to those rules (see Table [scalar-and-vector-alignment]).

 Type | Alignment | Size
:----:|:---------:|:----:
**u32, f32** | 4 | 4
**vec2** | 8 | 8
**vec3** | 16 | 12
**vec4** | 16 | 16
[Table [scalar-and-vector-alignment]: Alignment and data sizes for scalar and vector types.]

The alignment of a struct is equal to the largest alignment among its members. The size of a struct is defined as the sum of the sizes of its members, rounded up to a multiple of its alignment. Before our last change, the `Uniforms` struct had 4-byte alignment and occupied 12 bytes in size. We introduced the `CameraUniforms` structure, which has a single member of type `vec3f` and therefore 16-byte alignment. `vec3f` is 12 bytes in size, so the struct is _padded_ with 4 bytes to bring its size up to 16. While WGSL does this implicitly, we need to explicitly add padding on the Rust side. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... #[derive(Debug, Copy, Clone, Pod, Zeroable)] #[repr(C)] pub struct CameraUniforms { origin: Vec3, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight _pad: u32, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust } impl Camera { pub fn new(origin: Vec3) -> Camera { Camera { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight uniforms: CameraUniforms { origin, _pad: 0 }, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust } } pub fn uniforms(&self) -> &CameraUniforms { &self.uniforms } } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [camera-uniforms-padded]: [camera.rs] `CameraUniforms` explicitly padded] We also introduced a new member of type `CameraUniforms` to the `Uniforms` struct. That increased the latter's alignment to 16 and brought its size up to 28 bytes. 28 is not a multiple of the new alignment and the next closest multiple is 32.
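Annotated with byte offsets (the offsets are simply the rules above applied to our struct), the shader-side layout looks like this:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL
struct Uniforms {
    camera: CameraUniforms, // offset 0, size 16 (a single vec3f padded out to 16 bytes)
    width: u32,             // offset 16
    height: u32,            // offset 20
    frame_count: u32,       // offset 24
    // 4 bytes of implicit padding round the total size up from 28 to 32,
    // the next multiple of the struct's 16-byte alignment.
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~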
Therefore we need to pad `Uniforms` with 4 additional bytes: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... #[derive(Copy, Clone, Pod, Zeroable)] #[repr(C)] struct Uniforms { camera: CameraUniforms, width: u32, height: u32, frame_count: u32, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight _pad: u32, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust } impl PathTracer { pub fn new( device: wgpu::Device, queue: wgpu::Queue, width: u32, height: u32, ) -> PathTracer { ... // Initialize the uniform buffer. let camera = Camera::new(Vec3::new(0., -0.5, 1.)); let uniforms = Uniforms { camera: *camera.uniforms(), width, height, frame_count: 0, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight _pad: 0, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust }; ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [uniforms-padded]: [render.rs] `Uniforms` explicitly padded] The padding is currently wasted space but we will make use of it in the future. Running the program should now pass validation and render an image that looks like this: ![Figure [repositioned-camera-origin]: Camera origin repositioned](../images/img-24-camera-origin-repositioned.png) [^ch8-footnote2]: The alignment and size requirements for WGSL types are defined at https://www.w3.org/TR/WGSL/#alignment-and-size. Rotation -------- We know how to reposition the camera but the view direction is still fixed towards $-z$. Remember that we define the camera ray direction for each pixel in terms of a point on an imaginary viewport (Figure [camera-view-space]). Conceptually, rotating the camera to change the view direction is much like moving and rotating the viewport around the camera origin. Let's imagine for a moment that the coordinate system depicted in Figure [camera-view-space] is distinct from the coordinate space of the scene. This new _camera coordinate space_ has its own $x$, $y$, and $z$ axes. The viewport is always parallel to the $xy$-plane and sits some distance away on the $z$-axis. We can even define this coordinate system as left-handed so that the view direction faces the $+z$-axis instead of $-z$. Now imagine that the camera coordinate space exists within the scene coordinate space and it can move around freely. Suppose that the camera coordinate axes can point in any direction in scene space as long as they satisfy the definition of our left-handed Cartesian system: **a)** the axes are always orthogonal to each other (i.e. the angle between any two axes is 90 degrees), and **b)** from the camera's point of view, $+x$ points towards the _right_, $+y$ points _up_, and $+z$ points _forward_. Let's define the scene-space orientation of the camera coordinate axes with 3 unit vectors: $\vec{\textbf{u}}$ for $+x$, $\vec{\textbf{v}}$ for $+y$, and $\vec{\textbf{w}}$ for $+z$. These are the _basis vectors_ of the camera coordinate space. For example, our current camera orientation staring down the $-z$-axis of the scene coordinate space (with $+y$ pointing up) would have the basis vectors $\vec{\textbf{u}} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$, $\vec{\textbf{v}} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$, $\vec{\textbf{w}} = \begin{bmatrix} 0 \\ 0 \\ -1 \end{bmatrix}$.
![Figure [camera-basis-vectors]: Camera basis vectors in relation to camera parameters ](../images/fig-12-camera-basis-vectors.svg) These vectors establish a relationship between the two coordinate systems. Each basis vector tells us how to project the corresponding camera-space axis onto the scene-space axes. With this information, we can transform any vector defined in one space into the other. For example, we can rotate a ray direction vector defined in camera-space into the appropriate scene-space orientation by multiplying it by this matrix: $$ \begin{bmatrix} \textbf{u}.x & \textbf{v}.x & \textbf{w}.x \\ \textbf{u}.y & \textbf{v}.y & \textbf{w}.y \\ \textbf{u}.z & \textbf{v}.z & \textbf{w}.z \end{bmatrix} $$ $\vec{\textbf{u}}$, $\vec{\textbf{v}}$, and $\vec{\textbf{w}}$ have to be unit vectors and orthogonal. Instead of specifying them directly, we will compute them from three parameters: the camera origin, a reference point the camera should "look at", and an "up" direction. The reference point will always appear at the center of the viewport. The vector pointing from the origin to this center point is the view direction $\vec{\textbf{w}}$. The cross product of two vectors yields another vector that is orthogonal to the plane formed by the original two, so once we know $\vec{\textbf{w}}$, we can compute the other two basis vectors using a series of cross products: $$ \begin{aligned} \vec{\textbf{u}} &= \vec{\textbf{w}} \times \vec{\textbf{up}} \\ \vec{\textbf{v}} &= \vec{\textbf{u}} \times \vec{\textbf{w}} \end{aligned} $$ Let's start with the WGSL and extend the `CameraUniforms` structure to hold the basis vectors in addition to the origin. We'll construct a 3x3 matrix out of the basis vectors and use that to transform the ray which we currently compute in camera space. Note that the $z$-coordinate of the camera ray direction no longer needs to be negative, since it's now defined with respect to $\vec{\textbf{w}}$. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... struct CameraUniforms { origin: vec3f, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight u: vec3f, v: vec3f, w: vec3f, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL } ... @fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f { init_rng(vec2u(pos.xy)); let origin = uniforms.camera.origin; let focus_distance = 1.; // Offset and normalize the viewport coordinates of the ray. let offset = vec2(rand_f32() - 0.5, rand_f32() - 0.5); var uv = (pos.xy + offset) / vec2f(f32(uniforms.width - 1u), f32(uniforms.height - 1u)); // Map `uv` from y-down (normalized) viewport coordinates to camera coordinates. uv = (2. * uv - vec2(1.)) * vec2(aspect_ratio, -1.); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight // Compute the scene-space ray direction by rotating the camera-space vector into a new // basis. let camera_rotation = mat3x3(uniforms.camera.u, uniforms.camera.v, uniforms.camera.w); let direction = camera_rotation * vec3(uv, focus_distance); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL var ray = Ray(origin, direction); ... 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [camera-rotation-matrix]: [shaders.wgsl] Ray direction rotated to camera basis] Once again, we need to pay attention to the required alignment on the CPU side. `u`, `v`, and `w` are declared as `vec3f` which must be aligned to an offset that's a multiple of 16. Since the size of `vec3f` is 12, we need to insert padding after each member to fix the alignment of the next member: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... #[derive(Debug, Copy, Clone, Pod, Zeroable)] #[repr(C)] pub struct CameraUniforms { origin: Vec3, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight _pad0: u32, u: Vec3, _pad1: u32, v: Vec3, _pad2: u32, w: Vec3, _pad3: u32, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust } impl Camera { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight pub fn look_at(origin: Vec3, center: Vec3, up: Vec3) -> Camera { let w = (center - origin).normalized(); let u = w.cross(&up).normalized(); let v = u.cross(&w); Camera { uniforms: CameraUniforms { origin, _pad0: 0, u, _pad1: 0, v, _pad2: 0, w, _pad3: 0, }, } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust } pub fn uniforms(&self) -> &CameraUniforms { &self.uniforms } } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [camera-basis-vectors-cpu]: [camera.rs] Computing the camera basis vectors] Finally, let's update the camera position and orientation to look towards the bottom of the small sphere from above: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... impl PathTracer { pub fn new( device: wgpu::Device, queue: wgpu::Queue, width: u32, height: u32, ) -> PathTracer { ... // Initialize the uniform buffer. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight let camera = Camera::look_at( Vec3::new(0., 0.75, 1.), Vec3::new(0., -0.5, -1.), Vec3::new(0., 1., 0.), ); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust let uniforms = Uniforms { camera: *camera.uniforms(), width, height, frame_count: 0, _pad: 0, }; ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [camera-new-position]: [render.rs] New camera position] ![Figure [camera-reoriented]: New camera orientation](../images/img-25-camera-look-at.png) Zoom ---- Now let's start adding controls for camera movement. It is generally useful to be able to bring the camera closer to (or farther from) the object in view without changing the viewing angle. Imagine a straight line through the `origin` and `center` parameters of our `Camera::look_at` function. We can achieve a simple _zoom_ effect by moving the camera forwards or backwards along this line. The basis vector $\vec{\textbf{w}}$ already gives us the forward-facing direction on this line and it has unit length.
Thus, computing the displacement of the camera origin $\textbf{P}$ along this line by distance $d$ is straightforward: $$ \begin{aligned} \textbf{P}_{forward} &= \textbf{P} + \vec{\textbf{w}} \cdot d \\ \textbf{P}_{backward} &= \textbf{P} - \vec{\textbf{w}} \cdot d \\ \end{aligned} $$ ![Figure [orbit-camera-zoom-fig]: Moving the camera origin along the view direction ](../images/fig-13-orbit-camera-distance.svg) Let's implement this as a new function called `Camera::zoom`. This will take a single parameter representing the displacement. Positive values will move the origin forward while negative values will move it backwards: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... impl Camera { ... pub fn uniforms(&self) -> &CameraUniforms { &self.uniforms } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight pub fn zoom(&mut self, displacement: f32) { self.uniforms.origin += displacement * self.uniforms.w; } } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [camera-fn-zoom]: [camera.rs] The `Camera::zoom` function] The next step is to wire this up to an input method. I personally prefer the scroll wheel on a mouse (or a scroll gesture on a trackpad) for zooming, so I'll show you how to implement that. `winit` sends raw input device events in the form of an `Event::DeviceEvent`. This is an enum type (just like `Event::WindowEvent`) and the specific variant for mouse wheel events is named `DeviceEvent::MouseWheel`. The event has a parameter called `delta` which we can convert to a displacement amount. There are two variants of this parameter: - `MouseScrollDelta::PixelDelta`: represents the delta in "number of pixels", typically generated by a touch screen or trackpad. - `MouseScrollDelta::LineDelta`: represents the delta in terms of "lines in a text document", typically corresponding to the discrete "clicks" of a mouse scroll wheel. The variant you receive depends on your input device. It usually makes sense to apply a scaling factor to this delta, since using it directly is likely to result in a very large displacement in scene coordinates. I used factors of 0.001 and 0.1 for the two variants respectively, though the ideal factor is going to depend on your device and system settings. The `delta` value is _signed_, with positive and negative values corresponding to scrolling up and down, which translates nicely to our `displacement` parameter. We are going to handle the mouse scroll event in our main event loop. The event loop code currently doesn't have direct access to the `Camera` object, as it is internal to the `PathTracer` constructor. `PathTracer::new` currently discards the camera object, retaining only the uniform data, as the camera state has so far been static. In addition to retaining the camera state, we also need a way to update the uniform buffer for changes to take effect before rendering a frame. I'm going to suggest a simple refactor: let's decouple the `Camera` construction from the `PathTracer` object and instead pass the camera as an argument to `PathTracer::render_frame`. We'll simply always update the camera uniforms before rendering an individual frame: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ...
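// render.rs: `PathTracer` no longer owns a `Camera`. The caller passes one to `render_frame`,
// and its uniforms are copied into the uniform buffer before every frame is rendered.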
use crate::{ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust delete algebra::Vec3, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust camera::{Camera, CameraUniforms}, }; ... impl PathTracer { pub fn new( device: wgpu::Device, queue: wgpu::Queue, width: u32, height: u32, ) -> PathTracer { ... // Initialize the uniform buffer. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust delete let camera = Camera::look_at( Vec3::new(0., 0.75, 1.), Vec3::new(0., -0.5, -1.), Vec3::new(0., 1., 0.), ); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust let uniforms = Uniforms { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight camera: CameraUniforms::zeroed(), ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust width, height, frame_count: 0, _pad: 0, }; ... } ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight pub fn render_frame(&mut self, camera: &Camera, target: &wgpu::TextureView) { self.uniforms.camera = *camera.uniforms(); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust self.uniforms.frame_count += 1; self.queue .write_buffer(&self.uniform_buffer, 0, bytemuck::bytes_of(&self.uniforms)); ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [render-frame-with-camera]: [render.rs] `render_frame` with a `Camera` parameter] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust use { anyhow::{Context, Result}, winit::{ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight event::{DeviceEvent, Event, MouseScrollDelta, WindowEvent}, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust event_loop::{ControlFlow, EventLoop}, window::{Window, WindowBuilder}, }, }; mod algebra; mod camera; mod render; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight use crate::{algebra::Vec3, camera::Camera}; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust const WIDTH: u32 = 800; const HEIGHT: u32 = 600; #[pollster::main] async fn main() -> Result<()> { ... let (device, queue, surface) = connect_to_gpu(&window).await?; let mut renderer = render::PathTracer::new(device, queue, WIDTH, HEIGHT); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight let mut camera = Camera::look_at( Vec3::new(0., 0.75, 1.), Vec3::new(0., -0.5, -1.), Vec3::new(0., 1., 0.), ); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust event_loop.run(|event, control_handle| { control_handle.set_control_flow(ControlFlow::Poll); match event { Event::WindowEvent { event, .. } => match event { WindowEvent::CloseRequested => control_handle.exit(), WindowEvent::RedrawRequested => { // Wait for the next available frame buffer. 
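// Acquire the surface texture and create a view of it to use as this frame's render target.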
let frame: wgpu::SurfaceTexture = surface .get_current_texture() .expect("failed to get current texture"); let render_target = frame .texture .create_view(&wgpu::TextureViewDescriptor::default()); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight renderer.render_frame(&camera, &render_target); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust frame.present(); window.request_redraw(); } _ => (), }, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight Event::DeviceEvent { event, .. } => match event { DeviceEvent::MouseWheel { delta } => { let delta = match delta { MouseScrollDelta::PixelDelta(delta) => 0.001 * delta.y as f32, MouseScrollDelta::LineDelta(_, y) => y * 0.1, }; camera.zoom(delta); } _ => (), }, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust _ => (), } })?; Ok(()) } ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [mouse-wheel-event]: [main.rs] Updating the camera on mouse wheel events] Now, run this code and use your trackpad or mouse wheel to scroll up and down. You should see some movement but you should also see some "smudging" or "ghosting". This is what I get if I scroll back and forth, pausing at different distances for a few seconds: ![Figure [smuged-zoom]: Ghosts of zoom levels past](../images/img-26-smudged-zoom.png) This is the same effect that we saw in Figure [temporal-blur-effect], which is caused by temporal accumulation. Moving the camera effectively invalidates all the samples we have collected up to that point, as our cached radiance values only make sense for a specific camera configuration. The simplest thing we can do is discard old samples whenever we mutate the camera. Luckily, this is pretty easy to do: the code we added in Listing [sample-accumulation] already ignores old samples when `uniforms.frame_count` is at its initial value. So all we need to do is reset the frame count: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... impl PathTracer { ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight pub fn reset_samples(&mut self) { self.uniforms.frame_count = 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust pub fn render_frame(&mut self, camera: &Camera, target: &wgpu::TextureView) { self.uniforms.camera = *camera.uniforms(); self.uniforms.frame_count += 1; ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [path-tracer-reset-frame]: [render.rs] `PathTracer::reset_samples`] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... fn main() -> Result<()> { ... event_loop.run(|event, control_handle| { control_handle.set_control_flow(ControlFlow::Poll); match event { Event::WindowEvent { event, .. } => match event { ... }, Event::DeviceEvent { event, ..
} => match event { DeviceEvent::MouseWheel { delta } => { let delta = match delta { MouseScrollDelta::PixelDelta(delta) => 0.001 * delta.y as f32, MouseScrollDelta::LineDelta(_, y) => y * 0.1, }; camera.zoom(delta); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight renderer.reset_samples(); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust } _ => (), } _ => (), } ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [mouse-wheel-reset-samples]: [main.rs] Reset samples on camera zoom] You should no longer see any artifacts when you zoom in and out: ![Figure [zooming-in-and-out]: (video) Zooming in and out ](../images/vid-02-zooming-in-and-out.mp4 autoplay muted loop) Pan --- The next camera movement type on our list is _pan_, which moves the camera left, right, up, or down without changing the view direction. We're going to align these 4 directions to the basis vectors $\vec{\mathbf{u}}$ and $\vec{\mathbf{v}}$ and displace the origin point on the 2D plane that is perpendicular to the view direction. ![Figure [orbit-camera-pan]: Pan movement on the uv-plane. ](../images/fig-14-orbit-camera-pan.svg) A new `Camera::pan` function will accept two delta values that represent displacement in two dimensions ($\vec{\mathbf{u}}$ and $\vec{\mathbf{v}}$). Note that both of these values (`du` and `dv`) can be negative: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... impl Camera { ... pub fn uniforms(&self) -> &CameraUniforms { &self.uniforms } pub fn zoom(&mut self, displacement: f32) { self.uniforms.origin += displacement * self.uniforms.w; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight pub fn pan(&mut self, du: f32, dv: f32) { let pan = du * self.uniforms.u + dv * self.uniforms.v; self.uniforms.origin += pan; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [camera-fn-pan]: [camera.rs] The `Camera::pan` function] Let's continue using the mouse, this time translating its motion to camera movement. `winit` sends `DeviceEvent::MouseMotion` events with a 2D `delta` parameter that contains the mouse displacement in $x$ and $y$ coordinates. Negative and positive values of the $x$ delta correspond to left and right movement, respectively. Similarly, negative and positive values of the $y$ delta correspond to upward and downward movement. Note that the application will receive `DeviceEvent::MouseMotion` events even when the window doesn't have input focus. Unless we explicitly control when the camera should and should not move, all mouse movement will result in camera movement and reset the radiance samples. Bumping into the mouse while waiting for a slow render to resolve can be annoying, so let's prevent accidents and require that the user hold down a mouse button during movement. We can use the `DeviceEvent::Button` event to detect when a mouse button gets pressed and released. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... fn main() -> Result<()> { ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight let mut mouse_button_pressed = false; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
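// Pan should only apply while a mouse button is held down; this flag is updated by the
// `DeviceEvent::Button` arm below.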
event_loop.run(|event, control_handle| { control_handle.set_control_flow(ControlFlow::Poll); match event { Event::WindowEvent { event, .. } => match event { ... }, Event::DeviceEvent { event, .. } => match event { DeviceEvent::MouseWheel { delta } => { let delta = match delta { MouseScrollDelta::PixelDelta(delta) => 0.001 * delta.y as f32, MouseScrollDelta::LineDelta(_, y) => y * 0.1, }; camera.zoom(delta); renderer.reset_samples(); } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight DeviceEvent::MouseMotion { delta: (dx, dy) } => { if mouse_button_pressed { camera.pan(dx as f32 * 0.01, dy as f32 * -0.01); renderer.reset_samples(); } } DeviceEvent::Button { state, .. } => { // NOTE: If multiple mouse buttons are pressed, releasing any of them will // set this to false. mouse_button_pressed = state == ElementState::Pressed; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust _ => (), } _ => (), } ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [mouse-motion-camera-pan]: [main.rs] Pan camera on click-and-drag] We apply a scale factor to adjust the mouse sensitivity and also flip the sign of `dy` so that moving the mouse upwards pans the camera upwards. ![Figure [camera-pan]: (video) Pan camera with mouse movement](../images/vid-03-camera-pan.mp4 autoplay muted loop) Orbit ----- The zoom and pan controls let us move the `origin` point along the camera basis vectors without changing the view direction. In order to freely look around objects in the scene, we need a way to rotate the basis vectors. The view direction $\vec{\mathbf{w}}$ is parallel to the vector connecting the `origin` and `center` points. We can effectively re-orient $\vec{\mathbf{w}}$ by simply re-positioning these two points with respect to each other. Keeping `origin` fixed while moving `center` would result in a _first-person_ style camera (imagine shifting your gaze around you without moving). Alternatively, keeping `center` fixed while moving `origin` around would feel like moving through the scene while always facing the same stationary point. Both are valid approaches, though we're going to focus on the latter. Let's say that `origin` is allowed to move freely around `center` but we require that the distance between the two points remain fixed. Now imagine a sphere that is centered at `center`, with a radius equal to the distance between the two points. All possible positions of `origin` are then located on the surface of this sphere. ![Figure [orbit-camera-angles]: The spherical coordinates of the camera origin, with azimuth angle $\theta$ and altitude angle $\phi$ ](../images/fig-15-orbit-camera-angles.svg) Given the sphere's `center` and its radius, we can represent any point on the surface of the sphere using polar coordinates: an _azimuth_ angle and an _altitude_ angle. These two angles help us define the location of `origin` in terms of rotations around the coordinate axes. This is convenient, since we can easily map mouse movement to changes in polar coordinates, and use this representation to move `origin` around `center`. With a little bit of trigonometry, we can compute $\vec{\mathbf{w}}$ from the two angles. If we also know the distance between the camera and `center`, `origin` can be computed with a simple vector addition.
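To spell out that last step: $\vec{\textbf{w}}$ points from `origin` towards `center`, so if $r$ is the distance between the two points, $$ \textbf{origin} = \textbf{center} - r \, \vec{\textbf{w}} $$ This is exactly the vector addition that the `Camera::calculate_uniforms` helper introduced below ends up performing.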
### Working With Spherical Coordinates Let's expand the `Camera` struct with 4 new parameters: - `center`: the point of camera focus, which serves as the center of rotation. - `azimuth`: the azimuth angle $\theta$, defining rotation around the $y$-axis. Values can range from $0$ to $2\pi$. - `altitude`: the altitude angle $\phi$, defining rotation around the basis vector $\vec{\mathbf{u}}$. We'll allow values to range from $-\frac{\pi}{2}$ to $\frac{\pi}{2}$ such that $sin~\phi$ yields a $y$-coordinate ranging from $-1$ to $1$. - `distance`: the distance between `center` and `origin`. This is assumed to be a positive, non-zero value. The last three are the _spherical coordinates_ of `origin`: ($\theta$, $\phi$, $r$). We're going to define the coordinate system such that ($0$, $0$, $d$) corresponds to a view direction aligned with the $-z$-axis, $d$ units away from `center`. Similarly, spherical coordinates ($0$, $\frac{\pi}{2}$, $1$) will have the view direction point down the $-y$-axis, with the camera located at the Cartesian coordinates ($0$, $1$, $0$) relative to `center`. First, we'll rework the scene so that we can more easily observe rotations (the current scene is symmetrical around the $y$-axis, so changes in azimuth would be hard to notice). Let's also reset the camera: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... async fn main() -> Result<()> { ... let (device, queue, surface) = connect_to_gpu(&window).await?; let mut renderer = render::PathTracer::new(device, queue, WIDTH, HEIGHT); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight let mut camera = Camera::look_at( Vec3::new(0., 0., 1.), Vec3::new(0., 0., -1.), Vec3::new(0., 1., 0.), ); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [camera-default-position]: [main.rs]] ![Figure [four-spheres-orbit]: Four spheres for reference](../images/img-26-four-spheres.png) For now, we are going to do away with `Camera::look_at` and introduce a new constructor called `Camera::with_spherical_coords`. This will compute the camera uniforms (i.e. the basis vectors and the origin) from the new parameters. Since we are going to need to re-compute the camera uniforms whenever the spherical coordinates change, let's factor that logic out into a helper called `Camera::calculate_uniforms`: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... pub struct Camera { uniforms: CameraUniforms, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight center: Vec3, up: Vec3, distance: f32, azimuth: f32, altitude: f32, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust } ...
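// The spherical parameters (`center`, `up`, `distance`, `azimuth`, `altitude`) are the source of
// truth; the GPU-facing `uniforms` are recomputed from them by `calculate_uniforms` whenever they
// change.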
impl Camera { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust delete pub fn look_at(origin: Vec3, center: Vec3, up: Vec3) -> Camera { let w = (center - origin).normalized(); let u = w.cross(&up).normalized(); let v = u.cross(&w); Camera { uniforms: CameraUniforms { origin, _pad0: 0, u, _pad1: 0, v, _pad2: 0, w, _pad3: 0, }, } } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight pub fn with_spherical_coords( center: Vec3, up: Vec3, distance: f32, azimuth: f32, altitude: f32, ) -> Camera { let mut camera = Camera { uniforms: CameraUniforms::zeroed(), center, up, distance, azimuth, altitude, }; camera.calculate_uniforms(); camera } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust pub fn uniforms(&self) -> &CameraUniforms { &self.uniforms } pub fn zoom(&mut self, displacement: f32) { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust delete self.uniforms.origin += displacement * self.uniforms.w; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight self.distance -= displacement; self.uniforms.origin = self.center - self.distance * self.uniforms.w; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust } pub fn pan(&mut self, du: f32, dv: f32) { let pan = du * self.uniforms.u + dv * self.uniforms.v; self.uniforms.origin += pan; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight fn calculate_uniforms(&mut self) { // TODO: calculate the correct w. let w = Vec3::new(0., 0., -1.); let origin = self.center - self.distance * w; let u = w.cross(&self.up).normalized(); let v = u.cross(&w); self.uniforms.origin = origin; self.uniforms.u = u; self.uniforms.v = v; self.uniforms.w = w; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [camera-with-spherical-coords]: [camera.rs] `Camera::with_spherical_coords`] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... async fn main() -> Result<()> { ... let (device, queue, surface) = connect_to_gpu(&window).await?; let mut renderer = render::PathTracer::new(device, queue, WIDTH, HEIGHT); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight let mut camera = Camera::with_spherical_coords( Vec3::new(0., 0., -1.), Vec3::new(0., 1., 0.), 2., 0., 0., ); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [camera-default-position-spherical]: [main.rs] Default camera position with spherical coordinates] ### Altitude Control We will use mouse movement to control the altitude and azimuth angles. We are already routing `DeviceEvent::MouseMotion` events to `Camera::pan` but we could reserve _left-click_ drag for orbital movement and _right-click_ drag for pan. The `DeviceEvent::Button` event has a `button` field that can be used to identify the mouse button that was pressed or released. Next, we'll define `Camera::orbit`. Let's initially ignore the azimuth angle. Vertical mouse movement will modify the altitude angle while keeping it between $-\frac{\pi}{2}$ and $\frac{\pi}{2}$.
We won't allow the angle to increase or decrease beyond this range, so once the camera moves to one of these extrema, it will stay there unless it is moved in the opposite direction. This will disallow turning the scene "upside down" and spinning continuously. Let's also update `Camera::calculate_uniforms` to compute $\vec{\mathbf{w}}$ using only the altitude. Consider the unit-length vector pointing from `center` to `origin`, i.e. $-\vec{\mathbf{w}}$. The $y$-coordinate of this vector is equal to $sin~\phi$. We're ignoring azimuth, so we can simply assign $0$ to the $x$-coordinate, and $cos~\phi$ to the $z$-coordinate: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight use { bytemuck::{Pod, Zeroable}, std::f32::consts::FRAC_PI_2, }; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... impl Camera { ... pub fn uniforms(&self) -> &CameraUniforms { &self.uniforms } pub fn zoom(&mut self, displacement: f32) { self.distance -= displacement; self.uniforms.origin = self.center - self.distance * self.uniforms.w; } pub fn pan(&mut self, du: f32, dv: f32) { let pan = du * self.uniforms.u + dv * self.uniforms.v; self.uniforms.origin += pan; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight pub fn orbit(&mut self, du: f32, dv: f32) { self.altitude = (self.altitude + dv).clamp(-FRAC_PI_2, FRAC_PI_2); self.calculate_uniforms(); } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust fn calculate_uniforms(&mut self) { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight let w = { let (y, z) = self.altitude.sin_cos(); -Vec3::new(0., y, z) }; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust let origin = self.center - self.distance * w; let u = w.cross(&self.up).normalized(); let v = u.cross(&w); self.uniforms.origin = origin; self.uniforms.u = u; self.uniforms.v = v; self.uniforms.w = w; } } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [camera-orbit-altitude]: [camera.rs] `Camera::orbit`, altitude-only] Now, when you hold down the left mouse button and move the camera around, you should see something like this: ![Figure [camera-orbit-altitude]: (video) Adjust altitude angle with mouse movement ](../images/vid-04-camera-orbit-altitude.mp4 autoplay muted loop) If you look carefully, you may notice that the green and yellow spheres swap places when the altitude angle is at one of the extrema. At those angles (i.e. exactly at $-\frac{\pi}{2}$ and $\frac{\pi}{2}$) the view vector $\vec{\textbf{w}}$ becomes parallel to the _up_ vector and their cross product becomes zero. This causes both $\vec{\textbf{u}}$ and $\vec{\textbf{v}}$ to become degenerate. A simple fix is to truncate the range by a small amount, so that the angle can be close but never equal to $-\frac{\pi}{2}$ or $\frac{\pi}{2}$. This only works if _up_ is exactly $(0, 1, 0)$ or $(0, -1, 0)$ and doesn't generalize to other directions: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... impl Camera { ...
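// Keep the altitude strictly inside (-π/2, π/2) so that `w` never becomes parallel to `up` and
// the cross products stay well-defined.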
pub fn orbit(&mut self, du: f32, dv: f32) { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight const MAX_ALT: f32 = FRAC_PI_2 - 1e-6; self.altitude = (self.altitude + dv).clamp(-MAX_ALT, MAX_ALT); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust self.calculate_uniforms(); } ... } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [camera-prevent-degenerate-altitude]: [camera.rs] Apply offset to altitude clamp] That should fix the issue: ![Figure [camera-fixed-altitude-clamp]: (video) Fixed altitude clamp ](../images/vid-05-fixed-altitude-clamp.mp4 autoplay muted loop) ### Azimuth Control The same way we mapped vertical mouse movement (`dv`) to changes in altitude, we'll use horizontal movement (`du`) to change the azimuth angle. We won't restrict the horizontal orbit the way we clamped the altitude angle and will instead permit orbiting in either direction indefinitely. It still makes sense to keep the magnitude of the value below $2\pi$, since floating-point precision decreases with large values.[^ch8-footnote3] Though instead of clamping the value we'll just let it wrap, so that an azimuth angle of $3\pi$ results in a value of $\pi$. In Rust, this is achieved with the arithmetic remainder operators `%` and `%=` (with `PI` added to our `std::f32::consts` import). These support floating point numbers and retain the sign of the value that's on the left-hand side: for example, if the mouse moves left by $-\frac{5}{2}\pi$ (i.e. -450 degrees) the resulting angle will be $-\frac{1}{2}\pi$: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... impl Camera { ... pub fn orbit(&mut self, du: f32, dv: f32) { const MAX_ALT: f32 = FRAC_PI_2 - 1e-6; self.altitude = (self.altitude + dv).clamp(-MAX_ALT, MAX_ALT); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight self.azimuth += du; self.azimuth %= 2. * PI; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust self.calculate_uniforms(); } ... } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [camera-orbit-azimuth]: [camera.rs] Modifying the azimuth angle] We assigned the cosine of the altitude angle (i.e. $cos ~ \phi$) to the $z$-coordinate of the rotated $\vec{\textbf{w}}$. More generally, this quantity is equal to the length of the vector that results from projecting $\vec{\textbf{w}}$ onto the $xz$-plane (see Figure [orbit-camera-angles]). We can compute the $x$ and $z$ components of this vector from the azimuth angle as $(sin ~ \theta, 0, cos ~ \theta)$ scaled by the magnitude $cos ~ \phi$. Combining this with $sin ~ \phi$ for the $y$-coordinate we get: $$ -\vec{\textbf{w}} = \begin{bmatrix} sin~\theta \cdot cos~\phi \\ sin~\phi \\ cos~\theta \cdot cos~\phi \end{bmatrix} $$ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... impl Camera { ...
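// Spherical to Cartesian: -w = (sin(azimuth)·cos(altitude), sin(altitude), cos(azimuth)·cos(altitude)).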
fn calculate_uniforms(&mut self) { let w = { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight let (y, xz_scale) = self.altitude.sin_cos(); let (x, z) = self.azimuth.sin_cos(); -Vec3::new(x * xz_scale, y, z * xz_scale) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust }; let origin = self.center - self.distance * w; let u = w.cross(&self.up).normalized(); let v = u.cross(&w); self.uniforms.origin = origin; self.uniforms.u = u; self.uniforms.v = v; self.uniforms.w = w; } } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [camera-orbit-azimuth-w]: [camera.rs] Computing $\vec{\textbf{w}}$ from azimuth and altitude] We can now rotate the camera horizontally and vertically around the center point: ![Figure [camera-orbit-azimuth]: (video) Orbiting the camera around the center point ](../images/vid-06-camera-orbit-azimuth.mp4 autoplay muted loop) [^ch8-footnote3]: As a floating point number gets larger, the precision goes down as there are fewer bits to represent the mantissa. This causes the smallest representable _increments_, also known as ULP or "Unit of Least Precision", to get larger. If you allow the azimuth angle to get arbitrarily large, you may find that the same increment in `du` results in a much faster camera rotation. ### `Camera::look_at` The new representation is in terms of spherical angles because this is convenient for computing movement over a sphere. Often we'll have a particular position in mind for the camera, so it's nice to have a `Camera::look_at` function that takes an explicit camera origin. Let's bring it back and redefine it using the `with_spherical_coords` function. We can compute the altitude and azimuth angles from the `origin`, `center`, and `up` parameters: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust impl Camera { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight pub fn look_at(origin: Vec3, center: Vec3, up: Vec3) -> Camera { let center_to_origin = origin - center; let distance = center_to_origin.length().max(0.01); // Prevent distance of 0 let neg_w = center_to_origin.normalized(); let azimuth = neg_w.x().atan2(neg_w.z()); let altitude = neg_w.y().asin(); Self::with_spherical_coords(center, up, distance, azimuth, altitude) } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust pub fn with_spherical_coords( center: Vec3, up: Vec3, distance: f32, azimuth: f32, altitude: f32, ) -> Camera { ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [camera-look-at-returns]: [camera.rs] Updated `Camera::look_at`] (insert acknowledgments.md.html here) References ========== [#Marsaglia03]: George Marsaglia, [*Xorshift RNGs*](https://www.jstatsoft.org/article/download/v008i14/916), 2003 [#Jenkins13]: Bob Jenkins, [*A hash function for hash Table lookup*](https://www.burtleburtle.net/bob/hash/doobs.html), 2013 [#Hughes13]: J.F. Hughes, A. van Dam, M. McGuire, D.F. Sklar, J.D. Foley, S.K. Feiner, K. Akeley, *Computer Graphics: Principles and Practice, 3rd Edition*, Section 1.6 [#Wolfe24]: Alan Wolfe, [*Beyond White Noise for Real-Time Rendering*](https://youtu.be/tethAU66xaA?si=qIPEwF5XTm8kO3tF), 2024 [#Immel86]: David S. Immel, Michael F. Cohen, Donald P. Greenberg, *A Radiosity Method For Non-Diffuse Environments*, 1986 [#Kajiya86]: James T.
Kajiya *The Rendering Equation*, 1986 [#Lambert1760]: Johann Heinrich Lambert, *Photometria sive de mensura et gradibus luminis, colorum et umbrae*, 1760. Courtesy of ETH-Bibliothek Zürich, Switzerland. [#McGuire2024GraphicsCodex]: Morgan McGuire, *The Graphics Codex*, 2024 [^ericson]: C. Ericson, Real Time Collision Detection [^mcguire-codex]: https://graphicscodex.courses.nvidia.com/app.html [Arman Uguray]: https://github.com/armansito [Steve Hollasch]: https://github.com/hollasch [Trevor David Black]: https://github.com/trevordblack [RTIOW]: https://raytracing.github.io/books/RayTracingInOneWeekend.html [RTTROYL]: https://raytracing.github.io/books/RayTracingTheRestOfYourLife.html [rt-project]: https://github.com/RayTracing/ [gt-project]: https://github.com/RayTracing/gpu-tracing/ [gt-template]: https://github.com/RayTracing/gpu-tracing/blob/dev/code/template [discussions]: https://github.com/RayTracing/gpu-tracing/discussions/ [dxr]: https://en.wikipedia.org/wiki/DirectX_Raytracing [vkrt]: https://www.khronos.org/blog/ray-tracing-in-vulkan [rtiow-cuda]: https://developer.nvidia.com/blog/accelerated-ray-tracing-cuda/ [webgpu]: https://www.w3.org/TR/webgpu/ [Rust]: https://www.rust-lang.org/ [rust-unsafe]: https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html [wgpu]: https://wgpu.rs