**Ray Tracing: GPU Edition** [Arman Uguray][]
Draft
!!! WARNING This is a living document for a work in progress. Please bear in mind that the contents will change frequently and go through many edits before the final version. Introduction ==================================================================================================== _Ray Tracing_ is a rendering method in Computer Graphics that simulates the flow of light. It can faithfully recreate a variety of optical phenomena and can be used to render photorealistic images. _Path tracing_ is an application of this approach used to compute _Global Illumination_. Its core idea is to repeatedly trace millions of random rays through the scene and bounce them off objects based on surface properties. The algorithm is remarkably simple and relatively easy to implement when applied to a small number of material and geometry types. Peter Shirley's [_Ray Tracing In One Weekend_][RTIOW] (RTIOW) is a great introduction to building the foundation for a hobby renderer. A challenge with path tracing is its high computational cost. Rendering a complex scene takes a long time and this get worse as the rendered scenes get complex. This has historically made path tracing unsuitable for real-time applications. Fortunately -- like many problems in Computer Graphics -- the algorithm lends itself very well to parallelism. It is possible to achieve a significant speedup by distributing the work across many processor cores. The GPU (Graphics Processing Unit) is a type of processor designed to run the same set of operations over large amounts of data in parallel. This parallelism has been instrumental to achieving realistic visuals in real-time applications like video games. GPUs have been traditionally used to accelerate scanline rasterization but have since become programmable and capable of running a variety of parallel workloads. Notably, modern GPUs are now equipped with hardware cores dedicated to ray tracing. GPUs aren't without limitations. Programming a GPU requires a different approach than a typical CPU program. Taking full advantage of a GPU often involves careful tuning based on its architecture and capabilities which can vary widely across vendors and models. Rendering fully path-traced scenes at real-time rates remains elusive even on the most high-end GPUs. This is an an active and vibrant area of Computer Graphics research. This book is an introduction to GPU programming by building a simple GPU accelerated path tracer. We'll focus on building a renderer that can produce high quality and correct images using a fairly simple design. It won't be full-featured and its performance will be limited, however it will expose you to several fundamental GPU programming concepts. By the end, the renderer you'll have built can serve as a great starting point for extensions and experiments with more advanced GPU techniques. We will avoid most optimizations in favor of simplicity but the renderer will be able to achieve interactive frame rates on a decent GPU when targeting simple scenes.[^ch1] The accompanying code intentionally avoids hardware ray tracing APIs that are present on newer GPU models, instead focusing on implementing the same functionality on a programmable GPU unit using a shading language. This book follows a similar progression to [_Ray Tracing In One Weekend_][RTIOW]. It covers some of the same material but I highly recommend completing _RTIOW_ before embarking on building the GPU version. Doing so will teach you the path tracing algorithm in a much more approachable way and it will make you appreciate both the advantages and challenges of moving to a GPU-based architecture. If you run into any problems with your implementation, have general questions or corrections, or would like to share your own ideas or work, check out [the GitHub Discussions forum][discussions]. [^ch1]: A BVH-accelerated implementation can render a version of the RTIOW cover scene with ~32,000 spheres, 16 ray bounces per pixel, and a resolution of 2048x1536 on a 2022 _Apple M1 Max_ in 15 milliseconds. The same renderer performs very poorly on a 2019 _Intel UHD Graphics 630_ which takes more than 200ms to render a single sample. GPU APIs -------- Interfacing with a GPU and writing programs for it typically requires the use of a special API. This interface depends on your operating system and GPU vendor. You often have various options depending on the capabilities you want. For example, an application that wants to get the most juice out of a NVIDIA GPU for general purpose computations may choose to target CUDA. A developer who prefers broad hardware compatibility for a graphical mobile game may choose OpenGL ES or Vulkan. Direct3D (D3D) is the main graphics API on Microsoft platforms while Metal is the preferred framework on Apple systems. Vulkan, D3D12, and Metal all support an API specifically to accelerate ray tracing. You can implement this book using any API or framework that you prefer, though I generally assume you are working with a graphics API. In my examples I use an API based on [WebGPU][webgpu], which I think maps well to all modern graphics APIs. The code examples should be easy to adapt to those libraries. I avoid using ray tracing APIs (such as [DXR][dxr] or [Vulkan Ray Tracing][vkrt]) to show you how to implement similar functionality on your own. If you're looking to implement this in CUDA, you may also be interested in Roger Allen's [blog post][rtiow-cuda] titled _Accelerated Ray Tracing in One Weekend in CUDA_. Example Code ------------ Like _RTIOW_, you'll find code examples throughout the book. I use [Rust][] as the implementation language but you can choose any language that supports your GPU API of choice. I avoid most esoteric aspects of Rust to keep the code easily understandable to a large audience. On the few occasions where I had to resort to a potentially unfamiliar Rust-ism, I provide a C example to add clarity. I provide the finished source code for this book on [GitHub][gt-project] as a reference but I encourage you to type in your own code. I decided to also provide a minimal source template that you can use as a starting point if you want to follow along in Rust. The template provides a small amount of setup code for the windowing logic to help get you started. ### A note on Rust, Libraries, and APIs I chose Rust for this project because of its ease of use and portability. It is also the language that I tend to be most productive in. An important aspect of Rust is that a lot of common functionality is provided by libraries outside its standard library. I tried to avoid external dependencies as much as possible except for the following: * I use *[wgpu][]* to interact with the GPU. This is a native graphics API based on WebGPU. It's portable and allows the example code to run on Vulkan, Metal, Direct3D 11/12, OpenGL ES 3.1, as well as WebGPU and WebGL via WebAssembly. wgpu also has [native bindings in other languages](https://github.com/gfx-rs/wgpu-native). * I use [*winit*](https://docs.rs/winit/latest/winit/) which is a portable windowing library. It's used to display the rendered image in real-time and to make the example code interactive. * For ease of Rust development I use [*anyhow*](https://docs.rs/anyhow/latest/anyhow/) and [*bytemuck*](https://docs.rs/bytemuck/latest/bytemuck/). *anyhow* is a popular error handling utility and integrates seamlessly. *bytemuck* provides a safe abstraction for the equivalent of `reinterpret_cast` in C++, which normally requires [`unsafe`][rust-unsafe] Rust. It's used to bridge CPU data types with their GPU equivalents. * Lastly, I use [*pollster*](https://docs.rs/pollster/latest/pollster/) to execute asynchronous wgpu API functions (which is only called from a single line). [wgpu][] is the most important dependency as it defines how the example code interacts with the GPU. Every GPU API is different but their abstractions for the general concepts used in this book are fairly similar. I will highlight these differences occasionally where they matter. A large portion of the example code runs on the GPU. Every graphics API defines a programming language -- a so called **shading language** -- for authoring GPU programs. wgpu is based on WebGPU, as such my GPU code examples are written in *WebGPU Shading Language* (WGSL)[^ch1.2.1]. I also recommend keeping the following references handy while you're developing: * wgpu API documentation (version 0.19.1): https://docs.rs/wgpu/0.19.1/wgpu * WebGPU specification: https://www.w3.org/TR/webgpu * WGSL specification: https://www.w3.org/TR/WGSL With all of that out of the way, let's get started! [^ch1.2.1]: wgpu also supports shaders in the [SPIR-V](https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html) binary format. You could in theory write your shaders in a shading language that can compile to SPIR-V (such as OpenGL's GLSL and Direct3D's HLSL) as long as you avoid any language features that can't be expressed in WGSL. Windowing and GPU Setup ==================================================================================================== The first thing to decide is how you want to view your image. One option is to write the output from the GPU to a file. I think a more fun option is to display the image inside an application window. I prefer this approach because it allows you to see your rendering as it resolves over time and it will allow you to make your application interactive later on. The downside is that it requires a little bit of wiring. First, your program needs a way to interact with your operating system to create and manage a window. Next, you need a way to coordinate your GPU workloads to output a sequence of images at the right time for your OS to be able to composite it inside the window and send it to your display. Every operating system with a graphical UI provides a native *windowing API* for this purpose. Graphics APIs typically define some way to integrate with a windowing system. You'll have various libraries to choose from depending on your OS and programming language. You mainly need to make sure that the windowing API or UI toolkit you choose can integrate with your graphics API. In my examples I use *winit* which is a Rust framework that integrates smoothly with wgpu. I put together a [project template][gt-template] that sets up the library boilerplate for the window handling. You're welcome to use it as a starting point. The setup code isn't a lot, so I'll briefly go over the important pieces in this chapter. The Event Loop -------------- The first thing the template does is create a window and associate it with an *event loop*. The OS sends a message to the application during important "events" that the application should act on, such as a mouse click or when the window gets resized. Your application can wait for these events and handle them as they arrive by looping indefinitely: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust use { anyhow::{Context, Result}, winit::{ event::{Event, WindowEvent}, event_loop::{ControlFlow, EventLoop}, window::{Window, WindowBuilder}, }, }; const WIDTH: u32 = 800; const HEIGHT: u32 = 600; fn main() -> Result<()> { let event_loop = EventLoop::new()?; let window_size = winit::dpi::PhysicalSize::new(WIDTH, HEIGHT); let window = WindowBuilder::new() .with_inner_size(window_size) .with_resizable(false) .with_title("GPU Path Tracer".to_string()) .build(&event_loop)?; // TODO: initialize renderer event_loop.run(|event, control_handle| { control_handle.set_control_flow(ControlFlow::Poll); match event { Event::WindowEvent { event, .. } => match event { WindowEvent::CloseRequested => control_handle.exit(), WindowEvent::RedrawRequested => { // TODO: draw frame window.request_redraw(); } _ => (), }, _ => (), } })?; Ok(()) } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [main-initial]: [main.rs] Creating a window and handling window events] This code creates a window titled "GPU Path Tracer" and kicks off an event loop. `event_loop.run()` internally waits for window events and notifies your application by calling the lambda function that it gets passed as an argument. The lambda function only handles a few events for now. The most important one is `RedrawRequested` which is the signal to render and present a new frame. `MainEventsCleared` is simply an event that gets sent when all pending events have been processed. We call `window.request_redraw()` to draw repeatedly -- this triggers a new `RedrawRequested` event which is followed by another `MainEventsCleared`, which requests a redraw, and so on until someone closes the window. Running this code should bring up an empty window like this: ![Figure [empty-window]: Empty Window](../images/img-01-empty-window.png) GPU and Surface Initialization ------------------------------ The next thing the template does is establish a connection to the GPU and configure a surface. The surface manages a set of *textures* that allow the GPU to render inside the window. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust async fn connect_to_gpu(window: &Window) -> Result<(wgpu::Device, wgpu::Queue, wgpu::Surface)> { use wgpu::TextureFormat::{Bgra8Unorm, Rgba8Unorm}; // Create an "instance" of wgpu. This is the entry-point to the API. let instance = wgpu::Instance::default(); // Create a drawable "surface" that is associated with the window. let surface = instance.create_surface(window)?; // Request a GPU that is compatible with the surface. If the system has multiple GPUs then // pick the high performance one. let adapter = instance .request_adapter(&wgpu::RequestAdapterOptions { power_preference: wgpu::PowerPreference::HighPerformance, force_fallback_adapter: false, compatible_surface: Some(&surface), }) .await .context("failed to find a compatible adapter")?; // Connect to the GPU. "device" represents the connection to the GPU and allows us to create // resources like buffers, textures, and pipelines. "queue" represents the command queue that // we use to submit commands to the GPU. let (device, queue) = adapter .request_device(&wgpu::DeviceDescriptor::default(), None) .await .context("failed to connect to the GPU")?; // Configure the texture memory backs the surface. Our renderer will draw to a surface texture // every frame. let caps = surface.get_capabilities(&adapter); let format = caps .formats .into_iter() .find(|it| matches!(it, Rgba8Unorm | Bgra8Unorm)) .context("could not find preferred texture format (Rgba8Unorm or Bgra8Unorm)")?; let size = window.inner_size(); let config = wgpu::SurfaceConfiguration { usage: wgpu::TextureUsages::RENDER_ATTACHMENT, format, width: size.width, height: size.height, present_mode: wgpu::PresentMode::AutoVsync, alpha_mode: caps.alpha_modes[0], view_formats: vec![], desired_maximum_frame_latency: 3, }; surface.configure(&device, &config); Ok((device, queue, surface)) } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [main-initial]: [main.rs] The connect_to_gpu function] The code that sets this all up is a bit wordy. I'll quickly go over the important bits: 1. What the first ~20 lines do is request a connection to a GPU that is compatible with the window. The bit about `wgpu::PowerPreference::HighPerformance` is a hint to the API that we want the higher-powered GPU if the current system has more than one available. 2. The rest of the function configures the dimensions, pixel format, and presentation mode of the surface. `Rgba8Unorm` and `Bgra8Unorm` are common pixel formats that store each color component (red, green, blue, and alpha) as an 8-bit unsigned integer. The "unorm" part stands for "unsigned normalized", which means that our rendering code can represent the component values as a real number in the range `[0.0, 1.0]`. We set the size to simply span the entire window. The bit about `wgpu::PresentMode::AutoVsync` tells the surface to synchronize the presentation of each frame with the display's refresh rate. The surface will manage an internal queue of textures for us and we will render to them as they become available. This prevents a visual artifact known as "tearing" (which can happen when frames get presented faster than the display refresh rate) by setting up the renderer to be *v-sync locked*. We will discuss some of the implications of this later on. The last bit that I'll highlight here is `wgpu::TextureUsage::RENDER_ATTACHMENT`. This just indicates that we are going to use the GPU's rendering function to draw directly into the surface textures. After setting all this up the function returns 3 objects: A `wgpu::Device` that represents the connection to the GPU, a `wgpu::Queue` which we'll use to issue commands to the GPU, and a `wgpu::Surface` that we'll use to present frames to the window. We will talk a lot about the first two when we start putting together our renderer in the next chapter. You may have noticed that the function declaration begins with `async`. This marks the function as *asynchronous* which means that it doesn't return its result immediately. This is only necessary because the API functions that we invoke (`wgpu::Instance::request_adapter` and `wgpu::Adapter::request_device`) are asynchronous functions. The `.await` keyword is syntactic sugar that makes the asynchronous calls appear like regular (synchronous) function calls. What happens under the hood is somewhat complex but I wouldn't worry about this too much since this is the one and only bit of asynchronous code that we will encounter. If you want to learn more about it, I recommend checking out the [Rust Async Book](https://rust-lang.github.io/async-book/). ### Completing Setup Finally, the `main` function needs a couple updates: first we make it `async` so that it we can "await" on `connect_to_gpu`. Technically the `main` function of a program cannot be async and running an async function requires some additional utilities. There are various alternatives but I chose to use a library called `pollster`. The library provides a special macro (called `main`) that takes care of everything. Again, this is the only asynchronous code that we'll encounter so don't worry about what it does. The second change to the main function is where it handles the `RedrawRequested` event. For every new frame, we first request the next available texture from the surface that we just created. The queue has a limited number of textures available. If the CPU outpaces the GPU (i.e. the GPU takes longer than a display refresh cycle to finish its tasks), then calling `surface.get_current_texture()` can block until a texture becomes available. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight #[pollster::main] async fn main() -> Result<()> { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust let event_loop = EventLoop::new()?; let window_size = winit::dpi::PhysicalSize::new(WIDTH, HEIGHT); let window = WindowBuilder::new() .with_inner_size(window_size) .with_resizable(false) .with_title("GPU Path Tracer".to_string()) .build(&event_loop)?; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight let (device, queue, surface) = connect_to_gpu(&window).await?; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust // TODO: initialize renderer event_loop.run(|event, control_handle| { control_handle.set_control_flow(ControlFlow::Poll); match event { Event::WindowEvent { event, .. } => match event { WindowEvent::CloseRequested => control_handle.exit(), WindowEvent::RedrawRequested => { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight // Wait for the next available frame buffer. let frame: wgpu::SurfaceTexture = surface .get_current_texture() .expect("failed to get current texture"); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust // TODO: draw frame ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight frame.present(); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust window.request_redraw(); } _ => (), }, _ => (), } })?; Ok(()) } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [main-setup-complete]: [main.rs] Putting together the initial main function] Once a frame texture becomes available, the example issues a request to display it as soon as possible by calling `frame.present()`. All of our rendering work will be scheduled before this call. That was a lot of boilerplate -- this is sometimes necessary to interact with OS resources. With all of this in place, we can start building a real-time renderer. ### A note on error handling in Rust If you're new to Rust, some of the patterns above may look unfamiliar. One of these is error handling using the `Result` type. I use this pattern frequently enough that it's worth a quick explainer. A `Result` is a variant type that can hold either a success (`Ok`) value or an error (`Err`) value. The types of the `Ok` and `Err` variants are generic: `T` and `E` can be any type. It's common for a library to define its own error types to represent various error conditions. The idea is that a function returns a `Result` if it has a failure mode. A caller must check the status of the `Result` to unpack the return value or recover from an error. In a C program, a common way to handle an error is to return early from the calling function and and perhaps return an entirely new error. For example: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ C bool function_with_a_result(Foo* out_result); int main() { Foo foo; if (!function_with_result(&foo)) { return -1; } // ...do something with `foo`... return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust provides the `?` operator to automatically unpack a `Result` and return early if it holds an error. A Rust version of the C program above could be written like this: If `function_with_result()` returns an error, the `?` operator will cause `caller` to return and propagate the error value. This works as long as `caller` and `function_with_result` either return the same error type or types with a known conversion. There are various other ways to handle an error: I like to keep things simple in my code examples and use the `?` operator. Instead of defining custom error types and conversions, I use a catch all `Error` type from a library called *anyhow*. You'll often see the examples include `anyhow::Result` (an alias for `Result<, anyhow::Error>`) and `anyhow::Context`. The latter is a useful trait for adding an error message while converting to an `anyhow::Error`: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust fn caller() -> anyhow::Result<()> { let foo: Foo = function_with_result().context("failed to get foo")?; // ...do something with `foo`... Ok(()) } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ You can read more about the `Result` type in [its module documentation](https://doc.rust-lang.org/std/result/index.html). Drawing Pixels ==================================================================================================== At this stage, we have code that brings up a window, connects to the GPU, and sets up a queue of textures that is synchronized with the display. In Computer Graphics, the term "texture" is generally used in the context of *texture mapping*, which is a technique to apply detail to geometry using data stored in memory. A very common application is to map color data from the pixels of a 2D image onto the surface of a 3D polygon. Texture mapping is so essential to real-time graphics that all modern GPUs are equipped with specialized hardware to speed up texture operations. It's not uncommon for a modern video game to use texture assets that take up hundreds of megabytes. Processing all of that data involves a lot of memory traffic which is a big performance bottleneck for a GPU. This is why GPUs come with dedicated texture memory caches, sampling hardware, compression schemes and other features to improve texture data throughput. We are going to use the texture hardware to store the output of our renderer. In wgpu, a *texture object* represents texture memory that can be used in three main ways: texture mapping, shader storage, or as a *render target*[^ch3-cit1]. A surface texture is a special kind of texture that can only be used as a render target. Not all native APIs have this restriction. For instance, both Metal and Vulkan allow their version of a surface texture -- a *frame buffer* (Metal) or *swap chain* (Vulkan) texture -- to be configured for other usages, though this sometimes comes with a warning about impaired performance and is not guaranteed to be supported by the hardware. wgpu doesn't provide any other option so I'm going to start by implementing a render pass. This is a fundamental and very widely used function of the GPU, so it's worth learning about. [^ch3-cit1]: See [`wgpu::TextureUsages`](https://docs.rs/wgpu/0.17.0/wgpu/struct.TextureUsages.html). The render Module --------------------- I like to separate the rendering code from all the windowing code, so I'll start by creating a file named `render.rs`. Every Rust file makes up a *module* (with the same name) which serves as a namespace for all functions and types that are declared in it. Here I'll add a data structure called `PathTracer`. This will hold all GPU resources and eventually implement our path tracing algorithm: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust pub struct PathTracer { device: wgpu::Device, queue: wgpu::Queue, } impl PathTracer { pub fn new(device: wgpu::Device, queue: wgpu::Queue) -> PathTracer { device.on_uncaptured_error(Box::new(|error| { panic!("Aborting due to an error: {}", error); })); // TODO: initialize GPU resources PathTracer { device, queue, } } } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [render-initial]: [render.rs] The PathTracer structure] We start out with an associated function called `PathTracer::new` which will serve as the constructor and eventually initialize all GPU resources. The `PathTracer` takes ownership of the `wgpu::Device` and `wgpu::Queue` that we created earlier and it will hold on to them for the rest of the application's life. `wgpu::Device` represents a connection to the GPU. It is responsible for creating resources like texture, buffer, and pipeline objects. It also defines some methods for error handling. The first thing I do is set up an "uncaptured error" handler. If you look at the [declarations ](https://docs.rs/wgpu/0.17.0/wgpu/struct.Device.html) of resource creation methods you'll notice that none of them return a `Result`. This doesn't mean that they always succeed, as a matter of fact all of these operations can fail. This is because wgpu closely mirrors the WebGPU API which uses a concept called *error scopes* to detect and respond to errors. Whenever there's an error that I don't handle using an error scope it will trigger the uncaptured error handler, which will print out an error message and abort the program[^ch3.1-cit1]. For now, I won't set up any error scopes in `PathTracer::new` and I'll abort the program if the API fails to create the initial resources. Next, let's declare the `render` module and initialize a `PathTracer` in the `main` function: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... }; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Rust highlight mod render; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust const WIDTH: u32 = 800; const HEIGHT: u32 = 600; #[pollster::main] async fn main() -> Result<()> { let event_loop = EventLoop::new(); let window_size = winit::dpi::PhysicalSize::new(WIDTH, HEIGHT); let window = WindowBuilder::new() .with_inner_size(window_size) .with_resizable(false) .with_title("GPU Path Tracer".to_string()) .build(&event_loop)?; let (device, queue, surface) = connect_to_gpu(&window).await?; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Rust highlight let renderer = render::PathTracer::new(device, queue); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust event_loop.run(move |event, _, control_flow| { control_handle.set_control_flow(ControlFlow::Poll); match event { Event::WindowEvent { event, .. } => match event { WindowEvent::CloseRequested => control_handle.exit(), WindowEvent::RedrawRequested => { // Wait for the next available frame buffer. let frame: wgpu::SurfaceTexture = surface .get_current_texture() .expect("failed to get current texture"); // TODO: draw frame frame.present(); window.request_redraw(); } _ => (), }, _ => (), } }); } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [main-renderer-init]: [main.rs] Initializing a Renderer] Now that we have the skeleton in place, it's time to paint some pixels on the screen. [^ch3.1-cit1]: This is actually the default behavior so I didn't really need to call `on_uncaptured_error`. Display Pipeline ---------------- Before setting up the render pass let's first talk about how it works. Traditionally, graphics systems have been modeled after an abstraction called the *graphics pipeline*.[#Hughes13] At a very high level, the input to the pipeline is a mathematical model that describes what to draw -- such as geometry, materials, and light -- and the output is a 2D grid of pixels. This transformation is processed in a series of standard *pipeline stages* which form the basis of the rendering abstraction provided by GPUs and graphics APIs. wgpu uses the term *render pipeline* which is what I'll use going forward. The input to the render pipeline is a polygon stream represented by points in 3D space and their associated data. The polygons are described in terms of geometric primitives (points, lines, and triangles) which consist of *vertices*. The *vertex stage* transforms each vertex from the input stream into a 2D coordinate space that corresponds to the viewport. After some additional processing (such as clipping and culling) the assembled primitives are passed on to the *rasterizer*. The rasterizer applies a process called scan conversion to determine the pixels that are covered by each primitive and breaks them up into per-pixel *fragments*. The output of the vertex stage (the vertex positions, texture coordinates, vertex colors, etc) gets interpolated between the vertices of the primitive and the interpolated values get assigned to each fragment. Fragments are then passed on to the *fragment stage* which computes an output (such as the pixel or sample color) for each fragment. Shading techniques such as texture mapping and lighting are usually performed in this stage. The output then goes through several other operations before getting written to the render target as pixels.[^ch3-footnote1] ![Figure [render-pipeline]: Vertex and Fragment stages of the render pipeline ](../images/fig-01-render-pipeline.svg) What I just described is very much a data pipeline: a data stream goes through a series of transformations in stages. The input to each stage is defined in terms of smaller elements (e.g. vertices and pixel-fragments) that can be processed in parallel. This is the fundamental principle behind the GPU. Early commercial GPUs implemented the graphics pipeline entirely in fixed-function hardware. Modern GPUs still use fixed-function stages (and at much greater data rates) but virtually all of them allow you to program the vertex and fragment stages with custom logic using *shader programs*. [^ch3-footnote1]: I glossed over a few pipeline stages (such as geometry and tessellation) and important steps like multi-sampling, blending, and the scissor/depth/stencil tests. These play an important role in many real-time graphics applications but we won't make use of them in our path tracer. ### Compiling Shaders Let's put together a render pipeline that draws a red triangle. We'll define a vertex shader that outputs the 3 corner vertices and a fragment shader that outputs a solid color. We'll write these shaders in the WebGPU Shading Language (WGSL). Go ahead and create a file called `shaders.wgsl` to host all of our WGSL code (I put it next to the Rust files under `src/`). Before we can run this code on the GPU we need to compile it into a form that can be executed on the GPU. We start by creating a *shader module*: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust pub struct PathTracer { device: wgpu::Device, queue: wgpu::Queue, } impl PathTracer { pub fn new(device: wgpu::Device, queue: wgpu::Queue) -> PathTracer { device.on_uncaptured_error(Box::new(|error| { panic!("Aborting due to an error: {}", error); })); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight let shader_module = compile_shader_module(&device); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust delete // TODO: initialize GPU resources ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust PathTracer { device, queue, } } } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight fn compile_shader_module(device: &wgpu::Device) -> wgpu::ShaderModule { use std::borrow::Cow; let code = include_str!(concat!(env!("CARGO_MANIFEST_DIR"), "/src/shaders.wgsl")); device.create_shader_module(wgpu::ShaderModuleDescriptor { label: None, source: wgpu::ShaderSource::Wgsl(Cow::Borrowed(code)), }) } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [render-shader-module]: [render.rs] Creating the shader module] The `compile_shader_module` function loads the file we just created into a string using the `include_str!` macro. This bundles the contents of `shaders.wgsl` into the program binary at build time. This is followed by a call to `wgpu::Device::create_shader_module` to compile the WGSL source code.[^ch3-footnote2] Let's define the vertex and fragment functions, which I'm calling `display_vs` and `display_fs`: I'm using the "vs" and "fs" suffixes as shorthand for "vertex stage" and "fragment stage". Together, these two functions form our "display pipeline" (the "display" part will become more clear later). The `@vertex` and `@fragment` annotations are WGSL keywords that mark these two functions as entry points to each pipeline stage program. Since graphics workloads generally involve a high amount of linear algebra, GPUs natively support SIMD operations over vectors and matrices. All shading languages define built-in types for vectors and matrices of up to 4 dimensions (4x4 in the case of matrices). The `vec4f` and `vec2f` types that are in the code represent 4D and 2D vectors of floating point numbers. `display_vs` returns the vertex position as a `vec4f`. This position is defined relative to a coordinate space called the *Normalized Device Coordinate Space*. In NDC, the center of the viewport marks the origin $(0, 0, 0)$. The $x$-axis spans horizontally from $(-1, 0, 0)$ on the left edge of the viewport to $(1, 0, 0)$ on the right edge while the $y$-axis spans vertically from $(0,-1,0)$ at the bottom to $(0,1,0)$ at the top. The $z$-axis is directly perpendicular to the viewport, going *through* the origin. ![Figure [ndc]: Our triangle in Normalized Device Coordinates](../images/fig-02-ndc.svg) `display_vs` takes a *vertex index* as its parameter. The vertex function gets invoked for every input vertex across different GPU threads. `vid` identifies the individual vertex that is assigned to the *invocation*. The number of vertices and where they exist within the topology of the input geometry is up to us to define. Since we want to draw a triangle, we'll later issue a *draw call* with 3 vertices and `display_vs` will get invoked exactly 3 times with vertex indices ranging from $0$ to $2$. Since our 2D triangle is viewport-aligned, we can set the $z$ coordinate to $0$. The 4th coordinate is known as a *homogeneous coordinate* used for projective transformations. Don't worry about this coordinate for now -- just know that for a vector that represents a *position* we set this coordinate to $1$. We can declare the $x$ and $y$ coordinates for the 3 vertices as an array of `vec2f` and simply return the element that corresponds to `vid`. I enumerate the vertices in counter-clockwise order which matches the winding order we'll specify when we create the pipeline. `display_fs` takes no inputs and returns a `vec4f` that represents the fragment color. The 4 dimensions represent the red, green, blue, and alpha channels of the destination pixel. `display_fs` gets invoked for all pixel fragments that result from our triangle and the invocations are executed in parallel across many GPU threads, just like the vertex function. To paint the triangle solid red, we simply return `vec4f(1., 0., 0., 1.)` for all fragments. [^ch3-footnote2]: The `Cow::Borrowed` bit is a Rust idiom that creates a "copy-on-write borrow". This allows the API to take ownership of the WGSL string if necessary. This is not really an important detail for us. ### Creating the Pipeline Object Before we can run the shaders, we need to assemble them into a *pipeline state object*. This is where we specify the data layout of the render pipeline and link the shaders into a runnable binary program. Let's add a new function called `create_display_pipeline`: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... fn compile_shader_module(device: &wgpu::Device) -> wgpu::ShaderModule { use std::borrow::Cow; let code = include_str!(concat!(env!("CARGO_MANIFEST_DIR"), "/src/shaders.wgsl")); device.create_shader_module(wgpu::ShaderModuleDescriptor { label: None, source: wgpu::ShaderSource::Wgsl(Cow::Borrowed(code)), }) } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight fn create_display_pipeline( device: &wgpu::Device, shader_module: &wgpu::ShaderModule, ) -> wgpu::RenderPipeline { device.create_render_pipeline(&wgpu::RenderPipelineDescriptor { label: Some("display"), layout: None, primitive: wgpu::PrimitiveState { topology: wgpu::PrimitiveTopology::TriangleList, front_face: wgpu::FrontFace::Ccw, polygon_mode: wgpu::PolygonMode::Fill, ..Default::default() }, vertex: wgpu::VertexState { module: shader_module, entry_point: "display_vs", buffers: &[], }, fragment: Some(wgpu::FragmentState { module: shader_module, entry_point: "display_fs", targets: &[Some(wgpu::ColorTargetState { format: wgpu::TextureFormat::Bgra8Unorm, blend: None, write_mask: wgpu::ColorWrites::ALL, })], }), depth_stencil: None, multisample: wgpu::MultisampleState::default(), multiview: None, }) } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [display-pipeline]: [render.rs] The `create_display_pipeline` function] This code describes a render pipeline that draws a list of triangle primitives. The vertex winding order is set to counter-clockwise which defines the orientation of the triangle's *front face*.[^ch3-footnote3] We request that the interior of each polygon be completely filled (rather than drawing just the edges or vertices). We specify that `display_vs` is the main function of the vertex stage and that we're not providing any vertex data from the CPU (since we declared our vertices in the shader code). Similarly, we set up a fragment stage with `display_fs` as the entry point and a single color target.[^ch3-footnote4] I set the pixel format of the render target to `Bgra8Unorm` since that happens to be widely supported on all of my devices. What's important is that you assign a pixel format that matches the surface configuration in your windowing setup and that your GPU device supports this as a *render attachment* format. Let's instantiate the pipeline and store it in the `PathTracer` object. Pipeline creation is expensive so we want to create the pipeline state object once and hold on to it. We'll reference it later when drawing a frame: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust pub struct PathTracer { device: wgpu::Device, queue: wgpu::Queue, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight display_pipeline: wgpu::RenderPipeline, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust } impl PathTracer { pub fn new(device: wgpu::Device, queue: wgpu::Queue) -> PathTracer { device.on_uncaptured_error(Box::new(|error| { panic!("Aborting due to an error: {}", error); })); let shader_module = compile_shader_module(&device); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight let display_pipeline = create_display_pipeline(&device, &shader_module); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust PathTracer { device, queue, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight display_pipeline, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust } } ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [display-pipeline-init]: [render.rs] Initializing the display pipeline] [^ch3-footnote3]: The GPU can automatically discard triangles that are oriented away from the viewport. This is a feature called *back face culling* which our code doesn't make use of. [^ch3-footnote4]: The `fragment` field of `wgpu::RenderPipelineDescriptor` is optional (notice the *Some* in `Some(wgpu::FragmentState {...})` ?). A render pipeline that only outputs to the depth or stencil buffers doesn't have to specify a fragment shader or any color attachments. An example of this is *shadow mapping*: a shadow map is a texture that stores the distances between a light source and geometry samples from the scene; it can be produced by a depth-only render-pass from the point of view of the light source. The shadow map is later sampled from a render pass from the camera's point of view to determine whether a rasterized point is visible from the light or in shadow. The Render Pass --------------- We now have the pieces in place to issue a draw command to the GPU. The general abstraction modern graphics APIs define for this is called a "command buffer" (or "command list" in D3D12). You can think of the command buffer as a memory location that holds the serialized list of GPU commands representing the sequence of actions we want the GPU to take. To draw a triangle we'll *encode* a draw command into the command buffer and then *submit* the command buffer to the GPU for exection. With wgpu, the encoding is abstracted by an object called `wgpu::CommandEncoder`, which we'll use to record our draw command. Once we are done, we will call `wgpu::CommandEncoder::finish()` to produce a finalized `wgpu::CommandBuffer` which we can submit to the GPU via the `wgpu::Queue` that we created at start up. Let's add a new `PathTracer` function called `render_frame`. This function will take a texture as its parameter (our *render target*) and tell the GPU to draw to it using the pipeline object we created earlier: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... impl PathTracer { ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight pub fn render_frame(&self, target: &wgpu::TextureView) { let mut encoder = self .device .create_command_encoder(&wgpu::CommandEncoderDescriptor { label: Some("render frame"), }); let mut render_pass = encoder.begin_render_pass(&wgpu::RenderPassDescriptor { label: Some("display pass"), color_attachments: &[Some(wgpu::RenderPassColorAttachment { view: target, resolve_target: None, ops: wgpu::Operations { load: wgpu::LoadOp::Clear(wgpu::Color::BLACK), store: wgpu::StoreOp::Store, }, })], ..Default::default() }); render_pass.set_pipeline(&self.display_pipeline); // Draw 1 instance of a polygon with 3 vertices. render_pass.draw(0..3, 0..1); // End the render pass by consuming the object. drop(render_pass); let command_buffer = encoder.finish() self.queue.submit(Some(command_buffer)); } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [render_frame-stub]: [render.rs] The `render_frame` function] `target` here is defined as a `wgpu::TextureView`. wgpu makes the distinction between a texture resource (represented by `wgpu::Texture`) and how that texture's memory is accessed by a pipeline (which is represented by the *view* into the texture). When we want to bind a texture we first create a view with the right properties. In this case we'll assume that the caller already created a `TextureView` of the render target. The first thing we do in `render_frame` is create a command encoder. We then tell the encoder to begin a *render pass*. There are 4 important API calls we make to encode the draw command: 1. Create a `wgpu::RenderPass`. We tell it to store the colors that are output by the render pipeline to the `target` texture by assigning it as the only color attachment. We also tell it to clear all pixels of the target to black (i.e. $(0, 0, 0, 1)$ in RGBA) before drawing to it. 2. Assign the render pipeline. 3. Record a single draw with 3 vertices. 4. End the render pass by destroying the `wgpu::RenderPass` object. We then serialize the command buffer and submit it to the GPU. Finally, let's invoke `render_frame` from our windowing event loop, using the current surface texture as the render target: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust async fn main() -> Result<()> { ... event_loop.run(move |event, _, control_flow| { ... Event::RedrawRequested(_) => { // Wait for the next available frame buffer. let frame: wgpu::SurfaceTexture = surface .get_current_texture() .expect("failed to get current texture"); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust delete // TODO: draw frame ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight let render_target = frame .texture .create_view(&wgpu::TextureViewDescriptor::default()); renderer.render_frame(&render_target); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust frame.present(); } ... }); } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [render_frame-call]: [main.rs] Rendering to a surface texture] Running this code should bring up a window that looks like this: ![Figure [first-triangle]: First Triangle](../images/img-02-first-triangle.png) Finally drawing something! A single triangle may not look that interesting but you can model highly complex 3D scenes and geometry by putting many of them together. It takes only a few tweaks to the render pipeline to shape, animate, and render millions of triangles many times per second. Full-Screen Quad ---------------- The render pipeline that we just put together plays a rather small role in the overall renderer: its purpose is to display the output of the path-tracer on the window surface. The output of our renderer is a 2D rectangular image and I would like it to fill the whole window. We can achieve this by having the render pipeline draw two right triangles that are adjacent at their hypothenuse. Remember that the viewport coordinates span the range $[-1, 1]$ in NDC, so setting the 4 corners of the rectangle to $(-1, 1)$, $(1, 1)$, $(1, -1)$, $(-1, -1)$ should cover the entire viewport regardless of its dimensions. ![Figure [half-screen-quad]: Half-Screen Triangle](../images/img-03-half-screen-quad.png) That painted only one of the triangles. We also need to update the draw command with the new vertex count: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... impl PathTracer { ... pub fn render_frame(&self, target: &wgpu::TextureView) { ... render_pass.set_pipeline(&self.display_pipeline); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust delete // Draw 1 instance of a polygon with 3 vertices. render_pass.draw(0..3, 0..1); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight // Draw 1 instance of a polygon with 6 vertices. render_pass.draw(0..6, 0..1); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust // End the render pass by consuming the object. drop(render_pass); let command_buffer = encoder.finish() self.queue.submit(Some(command_buffer)); } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [render_frame-stub]: [render.rs] The `render_frame` function] ![Figure [full-screen-quad]: Full-Screen Quad](../images/img-04-full-screen-quad.png) Viewport Coordinates -------------------- In this setup, every fragment shader invocation outputs the color of a single pixel. We can identify that pixel using the built-in `position` input to the pipeline stage. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight @fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL return vec4f(1.0, 0.0, 0.0, 1.0); } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [position-builtin]: [shaders.wgsl] Position Built-In] The input is defined as a `vec4f`. The $x$ and $y$ coordinates are defined in the _Viewport Coordinate System_. The origin $(0, 0)$ corresponds to the top-left corner pixel of the viewport. The $x$-coordinate increases towards the right and the $y$-coordinate increases towards the bottom. A whole number increment in $x$ or $y$ represents an increment by 1 pixel (and fractional increments can fall "inside" a pixel). For example, for a viewport with the physical dimensions of $800\times600$, the coordinate ranges are $0\le x\lt799, 0\le y \lt599$. ![Figure [viewport-coords]: Viewport Coordinate System](../images/fig-03-viewport-coords.svg) Let's assign every pixel fragment a color based on its position in the viewport by mapping the coordinates to a color channel (red and green). The render target uses a normalized color format (i.e. the values must be between $0$ and $1$), so we divide each dimension by the largest possible value to convert it to that range: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight const WIDTH: u32 = 800u; const HEIGHT: u32 = 600u; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL @fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight let color = pos.xy / vec2f(f32(WIDTH - 1u), f32(HEIGHT - 1u)); return vec4f(color, 0.0, 1.0); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [pos-to-color]: [shaders.wgsl]] There are two language expressions here that are worth highlighting. `pos.xy` is a so called _vector swizzle_ that extracts the $x$ and $y$ components and produces a `vec2f` containing only those. Next, we divide that `vec2f` by another `vec2f`. Here, the division operator performs a component-wise division of every element of the vector on the left-hand side by the corresponding element on the right-hand side, so `pos.xy / vec2f(f32(WIDTH - 1u), f32(HEIGHT - 1u))` is equivalent to `vec2f(pos.x / f32(WIDTH - 1u), pos.y / f32(HEIGHT - 1u))`. Now we are able to separately color each individual pixel. Running this should produce a picture that looks like this: ![Figure [viewport-gradient]: Viewport Coordinates as a color gradient ](../images/img-05-viewport-gradient.png) Resource Bindings ==================================================================================================== Our program is split across separate runnable parts: the main executable that runs on the CPU and pipelines that run on the GPU. As we add more features we will want to exchange data between the different parts. The main way to achieve this is via memory resources. The CPU side of our program can create and interact with resources by making API calls. On the GPU side, the shader program can access those via _bindings_. A binding associates a resource with a unique slot number that can be referenced by the shader. Each slot is identified by an index number. The shader code declares a variable for each binding with a decoration that assigns it a binding index. The CPU side is responsible for setting up the resources for a GPU pipeline according to its binding layout. WebGPU introduces an additional concept around bindings called _bind group_. A bind group associates a group of resources that are frequently bound together.[^ch4-footnote1] Like individual bindings, each bind group is identified by an index number. Our pipelines won't make use of more than one bind group at a time, so we'll always assign $0$ as the group index. [^ch4-footnote1]: The bind group concept is similar to "descriptor set" in Vulkan, "descriptor table" in D3D12, and "argument buffer" in Metal. Uniform Declaration ------------------- The first binding we are going to set up is a _uniform buffer_. Uniforms are read-only data that don't vary across GPU threads. We are going to use a uniform buffer to store certain globals, like camera parameters. Our renderer currently assumes a window dimension of $800\times600$ and declares this in two different places (`shaders.wgsl` and `main.rs`) which must be kept in sync. Let's make `WIDTH` and `HEIGHT` uniforms and upload their values from the CPU side. We'll first declare a uniform buffer and assign it to binding index $0$: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight struct Uniforms { width: u32, height: u32, } @group(0) @binding(0) var uniforms: Uniforms; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL delete const WIDTH: u32 = 800u; const HEIGHT: u32 = 600u; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL @fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight let color = pos.xy / vec2f(f32(uniforms.width - 1u), f32(uniforms.height - 1u)); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL return vec4f(color, 0.0, 1.0); } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [uniform binding declaration]: [shaders.wgsl] Uniform binding declaration] The `var` declaration tells the compiler that the shader expects a uniform buffer binding. The type of the binding variable is `Uniforms` which represents the shader's view over the buffer's memory. Declaring it this way allows the shader to access the contents of the buffer with an expression like `uniforms.width`. Bind Group Layout ----------------- If you run the code now you should get a validation error telling you that the pipeline layout expects a bind group layout at index $0$. We need to update the display pipeline description with a layout that includes the new uniform binding. Let's update the `create_display_pipeline` function to return a `wgpu::BindGroupLayout` alongside the pipeline object: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust impl PathTracer { pub fn new(device: wgpu::Device, queue: wgpu::Queue) -> PathTracer { device.on_uncaptured_error(Box::new(|error| { panic!("Aborting due to an error: {}", error); })); let shader_module = compile_shader_module(&device); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight let (display_pipeline, display_layout) = create_display_pipeline(&device, &shader_module); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... } ... } ... fn create_display_pipeline( device: &wgpu::Device, shader_module: &wgpu::ShaderModule, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight ) -> (wgpu::RenderPipeline, wgpu::BindGroupLayout) { let bind_group_layout = device.create_bind_group_layout(&wgpu::BindGroupLayoutDescriptor { label: None, entries: &[ wgpu::BindGroupLayoutEntry { binding: 0, visibility: wgpu::ShaderStages::FRAGMENT, ty: wgpu::BindingType::Buffer { ty: wgpu::BufferBindingType::Uniform, has_dynamic_offset: false, min_binding_size: None, }, count: None, }, ], }); let pipeline = device.create_render_pipeline(&wgpu::RenderPipelineDescriptor { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust label: Some("display"), ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight layout: Some(&device.create_pipeline_layout(&wgpu::PipelineLayoutDescriptor { bind_group_layouts: &[&bind_group_layout], ..Default::default() })), ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... }); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight (pipeline, bind_group_layout) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [display-pipeline-layout]: [render.rs] Display pipeline layout] This says that the pipeline contains a single bind group, containing a single buffer entry. The buffer entry has the "uniform" buffer binding type and is visible only to the fragment stage. Buffer Object ------------- Let's now create the buffer object that will provide the backing memory for the uniforms. The size and layout of the memory need to match the `Uniforms` struct that we declared in the WGSL. A common pattern is to maintain two sets of these declarations (one for the CPU and one for the GPU side) and keep them in sync. Some frameworks allow you to reuse the same declarations on both sides. _wgpu_ doesn't provide a utility for this out of the box, so I'm going to redeclare `Uniforms` for the CPU side: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight use bytemuck::{Pod, Zeroable}; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust pub struct PathTracer { device: wgpu::Device, queue: wgpu::Queue, display_pipeline: wgpu::RenderPipeline, } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight #[derive(Copy, Clone, Pod, Zeroable)] #[repr(C)] struct Uniforms { width: u32, height: u32, } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust impl PathTracer { ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [uniforms-struct-cpu]: [render.rs] CPU-side `Uniforms` struct] The `repr(C)` attribute makes the memory layout of the `Uniforms` struct conform to the C language rules so that the fields have a predictable order, size, and alignment.[^ch4-footnote2] For our purposes, this should make the memory layout of the struct exactly match the WGSL declaration. The `derive` attribute automatically implements the enumerated traits for our type. `Copy` and `Clone` allow the type be copied by value (Rust types are move-only by default). This is also the first time we are using the `bytemuck` crate. The `Pod` and `Zeroable` traits, along with `repr(C)`, allow us the safely reinterpret the `Uniforms` struct as a sequence of bytes. For all intents and purposes, these Rust attributes enable the same semantics as the following plain C/C++ struct: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ C // If `Uniforms` were declared in C: struct Uniforms { uint32_t width; uint32_t height; }; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Now, let's allocate the backing buffer object and initialize its contents: This code allocates a buffer resource that is large enough to store an instance of `Uniforms` and copies the contents of `uniforms` into it. The buffer is mapped at creation so that its address space accessible to the CPU side. We also declare its usage to be `UNIFORM`: this is a hint to the GPU driver that allows it to perform optimizations based on the buffer access pattern. The usage is also useful for validating that the bindings we provide conform to the pipeline's layout. After the data copy, we need to flush and unmap the buffer from CPU memory before we can use it in GPU commands. We also store both `uniforms` and `uniform_buffer`, since we'll reuse them to modify some of the uniforms at runtime. [^ch4-footnote2]: The default Rust layout representation doesn't provide a strong guarantee on the order of the fields. See the [Rust reference](https://doc.rust-lang.org/reference/type-layout.html#representations). Bind Group ---------- We need to associate the buffer object with a bind group with the correct layout before it can be used in a render pass. Let's create and store a bind group and assign it to group index $0$ while encoding the draw: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust use bytemuck::{Pod, Zeroable}; pub struct PathTracer { device: wgpu::Device, queue: wgpu::Queue, uniforms: Uniforms, uniform_buffer: wgpu::Buffer, display_pipeline: wgpu::RenderPipeline, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight display_bind_group: wgpu::BindGroup, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust } #[derive(Copy, Clone, Pod, Zeroable)] #[repr(C)] struct Uniforms { width: u32, height: u32, } impl PathTracer { pub fn new(device: wgpu::Device, queue: wgpu::Queue) -> PathTracer { ... uniform_buffer.unmap(); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight // Create the display pipeline bind group. let display_bind_group = device.create_bind_group(&wgpu::BindGroupDescriptor { label: None, layout: &display_layout, entries: &[wgpu::BindGroupEntry { binding: 0, resource: wgpu::BindingResource::Buffer(wgpu::BufferBinding { buffer: &uniform_buffer, offset: 0, size: None, }), }], }); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust PathTracer { device, queue, uniforms, uniform_buffer, display_pipeline, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight display_bind_group, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust } } pub fn render_frame(&self, target: &wgpu::TextureView) { ... render_pass.set_pipeline(&self.display_pipeline); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight render_pass.set_bind_group(0, &self.display_bind_group, &[]); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust // Draw 1 instance of a polygon with 6 vertices render_pass.draw(0..6, 0..1); ... } } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [display-bind-group]: [render.rs] Creating and using the display bind group] Running the program now should bring up the same picture as before. The viewport dimensions are still hardcoded in two places so let's clean that up by making the viewport width and height parameters of the `PathTracer` constructor: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust impl PathTracer { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight pub fn new( device: wgpu::Device, queue: wgpu::Queue, width: u32, height: u32, ) -> PathTracer { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust device.on_uncaptured_error(Box::new(|error| { panic!("Aborting due to an error: {}", error); })); let shader_module = compile_shader_module(&device); let (display_pipeline, display_layout) = create_display_pipeline(&device, &shader_module); // Initialize the uniform buffer. let uniforms = Uniforms { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight width, height, }; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [width-height-parameters]: [render.rs]] Let's update the main function to pass in the physical window dimensions while creating the `PathTracer`: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... const WIDTH: u32 = 800; const HEIGHT: u32 = 600; #[pollster::main] async fn main() -> Result<()> { let event_loop = EventLoop::new(); let window_size = winit::dpi::PhysicalSize::new(WIDTH, HEIGHT); let window = WindowBuilder::new() .with_inner_size(window_size) .with_resizable(false) .with_title("GPU Path Tracer".to_string()) .build(&event_loop)?; let (device, queue, surface) = connect_to_gpu(&window).await?; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight let renderer = render::PathTracer::new(device, queue, WIDTH, HEIGHT); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust event_loop.run(move |event, _, control_flow| { ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [width-height-parameters-main]: [main.rs]] Now we have a way to pass data between the CPU and GPU sides of the program. We can repeat this pattern whenever we need to add or modify a bind group layout. Ray Casting ==================================================================================================== Light flows out of emissive objects (like the sun or a lamp) and scatters off objects as it floods the environment. When some of that light reaches a camera sensor, the camera can measure the amount that arrived at each pixel and create a picture. Our virtual camera will compute the same measurement by tracing the light's path in the reverse direction, starting at the camera and towards the objects in the scene. Camera Rays ----------- The first segment in a path is between the camera and the closest surface that is visible "through a pixel". To locate that surface, we can plot a ray from the camera and search for the closest point where the ray intersects the scene. A ray is a part of a straight line that has a starting point and extends infinitely in one direction. A ray in 3D space can be represented using two vectors: a point of origin $\mathbf{P}$ and a direction $\vec{\mathbf{d}}$. All points $\mathbf{R}$ on the ray are described by the linear equation $\mathbf{R}(t) = \mathbf{P} + t \mathbf{d}$ over the parameter $t$. $t$ is a real number and its positive values represent points on the ray that are in front of the ray origin (if we consider the direction $\mathbf{d}$ as _forward_). Negative values of $t$ represent points behind the origin, and $t=0$ is the same as the origin. ![Figure [ray]: Ray definition](../images/fig-05-ray.svg) Let's define the data structure to represent a ray: Let's now model a simple pinhole camera. Initially we'll the define the eye position (where the arriving light gets focused) as the camera's origin and this will act as the origin for all camera rays. The camera has a view direction, and some distance away from the origin along the view direction sits the 2D viewport framing the rendered image. We will initially position the camera origin at the coordinate system origin $(0, 0, 0)$ and set the view direction towards the $-z$-axis in a 3-dimensional right-handed cartesian coordinate system.[^ch5-footnote1] ![Figure [camera-view-space]: Rays in camera coordinates](../images/fig-04-camera-view-space.svg) In order to determine the direction for the ray targeting a pixel, we need to convert the pixel's viewport coordinates to the coordinate system we are going to use when computing ray intersections. Let's define the $x$ and $y$ coordinate span of the viewport to be the same as NDC (see _Figure 4_). This would make the viewport a square (with a width and height of $2$) so we need to adjust it by the aspect ratio of the application window in order to make its shape match the window frame. The fragment shader already normalizes the viewport pixel coordinates to the range $[0,1]$ and returns that as the output color. We can instead apply a simple transformation to convert them to our new camera coordinate space: 1. Map the range to $[-1, 1]$ by doubling the range and shifting it in the negative direction by $1$. 2. Scale the $x$ coordinate by the aspect ratio (which we'll define as $\tfrac{width}{height}$). 3. Flip the sign of the $y$ coordinate by multiplying it by $-1$. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... @fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL delete let color = pos.xy / vec2f(f32(uniforms.width - 1u), f32(uniforms.height - 1u)); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight let aspect_ratio = f32(uniforms.width) / f32(uniforms.height); // Normalize the viewport coordinates. var uv = pos.xy / vec2f(f32(uniforms.width - 1u), f32(uniforms.height - 1u)); // Map `uv` from y-down (normalized) viewport coordinates to camera coordinates. uv = (2. * uv - vec2(1.)) * vec2(aspect_ratio, -1.); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [camera-ray-computation]: [shaders.wgsl] Obtaining the viewport vector] We now have a vector $\vec{\mathbf{uv}}$ that spans from the center of the viewport to the pixel $\mathbf{A}$. The ray direction is the vector that points from the origin towards the pixel, which is given by $\mathbf{A} - \mathbf{O}$. $\mathbf{O}$ is equal to $(0, 0, 0)$, so computing $\mathbf{A}$ will give us the ray direction. If we picture the viewport to be positioned away from the origin at distance $f$ along the $-z$ axis then we can obtain $\mathbf{A}$ by computing $\begin{bmatrix} \vec{\mathbf{uv}} \\ 0 \end{bmatrix} - \begin{bmatrix} 0 \\ 0 \\ f \end{bmatrix}$, or simply $\begin{bmatrix} \vec{\mathbf{uv}} \\ -f \end{bmatrix}$. In the code, I'll refer to $f$ as `focus_distance`: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... @fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight let origin = vec3(0.); let focus_distance = 1.; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL let aspect_ratio = f32(uniforms.width) / f32(uniforms.height); // Normalize the viewport coordinates. var uv = pos.xy / vec2f(f32(uniforms.width - 1u), f32(uniforms.height - 1u)); // Map `uv` from y-down (normalized) viewport coordinates to camera coordinates. uv = (2. * uv - vec2(1.)) * vec2(aspect_ratio, -1.); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight let direction = vec3(uv, -focus_distance); let ray = Ray(origin, direction); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [camera-ray-computation]: [shaders.wgsl] Deriving the camera ray origin and direction] We finally have our camera ray. Initially we can make all rays hit the sky which will act as the light source. We can make the sky appear a little more realistic by painting it with a gradient that blends from blue to white as the $y$ coordinate of the ray's direction decreases. We'll first map the $y$ coordinate to the $[0,1]$ range and use that value to linearly interpolate between the two colors using the blend equation: $$ \mathit{blendedValue} = (1-a)\cdot\mathit{startValue} + a\cdot\mathit{endValue} $$ Let's introduce a function called `sky_color` to compute this for a given ray and return that as the fragment color. I used the same colors as RTIOW but you can use different ones:[^ch5-footnote2] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... struct Ray { origin: vec3f, direction: vec3f, } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight fn sky_color(ray: Ray) -> vec3f { let t = 0.5 * (normalize(ray.direction).y + 1.); return (1. - t) * vec3(1.) + t * vec3(0.3, 0.5, 1.); } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... @fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f { let origin = vec3(0.); let focus_distance = 1.; let aspect_ratio = f32(uniforms.width) / f32(uniforms.height); // Map `pos` from y-down viewport coordinates to camera viewport plane coordinates. var uv = pos.xy / vec2f(f32(uniforms.width - 1u), f32(uniforms.height - 1u)); uv = (2. * uv - vec2(1.)) * vec2(aspect_ratio, -1.); let direction = vec3(uv, -focus_distance); let ray = Ray(origin, direction); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight return vec4(sky_color(ray), 1.); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [camera-ray-computation]: [shaders.wgsl] Deriving the camera ray origin and direction] Running the program now should produce an image that looks like this: ![Figure [sky]: Ray tracing the sky](../images/img-06-sky-gradient.png) [^ch5-footnote1]: The choice of a right-handed vs left-handed system is really up to you - you can pick any relative orientation for the major axes that you want, as long as you stay consistent. [^ch5-footnote2]: Interpolating from blue towards a reddish color instead of pure white can resemble twilight. Give `vec3(1., 0.5, 0.3)`) a try. Ray-Sphere Intersection ----------------------- It's time to introduce objects to the scene. We'll start with a sphere since it has a simple implicit form and querying for intersections between a ray and a sphere is straightforward. I'll quickly go over the mathematics of the intersection function that we are going to implement: Let's define a sphere by its center point $\mathbf{C}$ and its radius $r$. Then, any point $\mathbf{X}$ on the surface of the sphere can be described by the equation[^ch5-footnote3] $$ (\mathbf{X} - \mathbf{C}) \cdot (\mathbf{X} - \mathbf{C}) = r^2 $$ We want to determine if there is a point along the ray that satisfies this equation. Substituting our ray equation for $\mathbf{X}$ we get: $$ (\mathbf{P} + t\mathbf{d} - \mathbf{C}) \cdot (\mathbf{P} + t\mathbf{d} - \mathbf{C}) = r^2 $$ Now we need to solve for $t$. To simplify things, let's substitute $\mathbf{v}$ for $(\mathbf{P} - \mathbf{C})$. After expanding the dot product and rearranging the terms we get $$ (\mathbf{d} \cdot \mathbf{d}) t^2 + 2 (\mathbf{v} \cdot \mathbf{d}) t + (\mathbf{v} \cdot \mathbf{v}) - r^2 = 0 $$ This is now in a canonical form for a quadratic equation: $at^2 + 2bt + c = 0$ and the solutions for $t$ are given by $$ t = \dfrac{-b \pm\sqrt{(b^2 - ac)}}{a} $$ with $a = \mathbf{d}\cdot\mathbf{d}$, $b = (\mathbf{P}-\mathbf{C})\cdot\mathbf{d}$, and $c = (\mathbf{P}-\mathbf{C})\cdot(\mathbf{P}-\mathbf{C}) - r^2$. The value of the discriminant $b^2 - ac$ determines the number of solutions. If the discriminant is negative, then there are no real solutions and thus no intersection. If the discriminant is exactly 0, then there is one real solution where the ray tangentially intersects the sphere at that point. If the discriminant is positive, then there are two real solutions and thus two potential intersections that we need to consider. ![Figure [ray-sphere-solutions]: Different cases of ray-sphere intersection ](../images/fig-06-ray-sphere-solutions.svg) We are looking for the first visible surface in the ray's "line of sight", so when there are two possible intersections it makes sense to choose the one that's closer to the ray's origin and lies in front of it. If the closer result is negative (i.e. it's located _behind_ the origin relative to the ray direction), we can discard it and choose the other one. If that one is non-negative, then the ray origin is inside the sphere, so the intersection is valid. If both results are negative, then the sphere is "behind" the ray. $t$ is $0$ when the ray origin is on the surface. In general, rays that start exactly on the surface of an object will be rays that trace the paths of light arriving at that surface. We generally don't want such a ray to intersect the geometry that the ray originates from, so for simplicity let's only consider positive values of $t$ as a valid intersection. Let's define a new function called `intersect_sphere`. This function will return the smaller positive solution for $t$ if there is an intersection and a non-positive value if the ray misses the sphere. Let's also define a new type called `Sphere` to represent the object: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight struct Sphere { center: vec3f, radius: f32, } fn intersect_sphere(ray: Ray, sphere: Sphere) -> f32 { let v = ray.origin - sphere.center; let a = dot(ray.direction, ray.direction); let b = dot(v, ray.direction); let c = dot(v, v) - sphere.radius * sphere.radius; let d = b * b - a * c; if d < 0. { return -1.; } let sqrt_d = sqrt(d); let recip_a = 1. / a; let mb = -b; let t = (mb - sqrt_d) * recip_a; if t > 0. { return t; } return (mb + sqrt_d) * recip_a; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL struct Ray { origin: vec3f, direction: vec3f, } ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [ray-sphere-intersection]: [shaders.wgsl] The `intersect_sphere` function] Let's now add a single sphere to the scene. First we'll test the sphere for an intersection with the camera ray. If there is a hit, then we'll return a solid color for the pixel. If not, we'll return the color of the sky as before. Let's also make sure that the sphere is far enough away from the view origin so that the camera doesn't fall inside the sphere: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... @fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f { let origin = vec3(0.); let focus_distance = 1.; let aspect_ratio = f32(uniforms.width) / f32(uniforms.height); // Map `pos` from y-down viewport coordinates to camera viewport plane coordinates. var uv = pos.xy / vec2f(f32(uniforms.width - 1u), f32(uniforms.height - 1u)); uv = (2. * uv - vec2(1.)) * vec2(aspect_ratio, -1.); let direction = vec3(uv, -focus_distance); let ray = Ray(origin, direction); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight let sphere = Sphere(/*center*/ vec3(0., 0., -1), /*radius*/ 0.5); if intersect_sphere(ray, sphere) > 0. { return vec4(1., 0.76, 0.03, 1.); } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL return vec4(sky_color(ray), 1.); } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [single-sphere]: [shaders.wgsl] First intersection test] This should render a solid circle that looks like this: ![Figure [yellow-circle]: A solid circle](../images/img-07-solid-circle.png) [^ch5-footnote3]: This equation has an intuitive geometric interpretation. $\mathbf{X} - \mathbf{C}$ describes a vector that spans from the center of the sphere to its surface. We know that the magnitude of this vector must be equal to $r$. The dot product of a vector with itself yields the square of its magnitude (as $V \cdot V = V_x^2 + V_y^2 + V_z^2)$ which, in this case, must be equal to $r^2$. Shading Multiple Spheres ------------------------ Now that our sphere intersection code is working, we'll next generalize the ray casting logic to look for intersections in a _scene_ containing multiple objects. We can initially represent the scene as an array of spheres. We'll change the code to test all spheres for a possible hit and use the closest intersection to color the pixel. As before, we'll use the $t$ parameter to determine the nearest hit. Let's declare the scene with a second (large) sphere that serves as the "ground" where our first sphere will sit. We can declare the array as a private global, like we did for the vertices of the full-screen quad: The scene traversal code is straightforward: loop through the scene array and keep track of the closest $t$ value that results from calling `intersect_sphere` on each element. It makes sense to initialize $t$ with a value that is larger than all other possible values. Since we're dealing with floating-point numbers, _infinity_ is a suitable initial value. However, since WGSL doesn't quite support infinities[^ch5-footnote4], I'll use the largest representable `f32` value as a substitute: This should result in the following image: ![Figure [yellow-circles]: Two solid circles](../images/img-08-two-solid-circles.png) Both spheres are visible where we expect them. Since we're painting both objects with the same solid color, it's not possible to tell if our code works correctly for the bottom half of the top sphere where the ray intersects both objects. An easy way to improve this is to assign each object a different solid color and use that to paint the pixel. I'm going to do something different: I'll scale the color by the value of `closest_t` such that intersections that are further away from the origin are shaded darker compared to those that are closer. This will convey the _depth_ of the shaded object with respect to our virtual camera. We can achieve this by multiplying the color by a factor of $1 - t$ which will keep the color bright for smaller values of $t$ (representing closer intersections) and darken it as $t$ grows. I'll use the [**`saturate`**](https://www.w3.org/TR/WGSL/#saturate-float-builtin) built-in function to clamp the resulting value to the $[0, 1]$ range so that values of $t$ that are larger than $1$ will be shaded black: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... @fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f { let origin = vec3(0.); let focus_distance = 1.; let aspect_ratio = f32(uniforms.width) / f32(uniforms.height); // Map `pos` from y-down viewport coordinates to camera viewport plane coordinates. var uv = pos.xy / vec2f(f32(uniforms.width - 1u), f32(uniforms.height - 1u)); uv = (2. * uv - vec2(1.)) * vec2(aspect_ratio, -1.); let direction = vec3(uv, -focus_distance); let ray = Ray(origin, direction); var closest_t = FLT_MAX; for (var i = 0u; i < OBJECT_COUNT; i += 1u) { let t = intersect_sphere(ray, scene[i]); if t > 0. && t < closest_t { closest_t = t; } } if closest_t < FLT_MAX { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight return vec4(1., 0.76, 0.03, 1.) * saturate(1. - closest_t); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL } return vec4(sky_color(ray), 1.); } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [shading-using-depth]: [shaders.wgsl] Shading using depth] This should make the objects' order of visibility and their spherical shape more apparent: ![Figure [depth-shaded-spheres]: Spheres shaded by depth](../images/img-09-depth-shaded-spheres.png) Both spheres appear quite dark and the bottom sphere fades to black where it meets the one on top. This makes sense since the center of the top sphere is exactly where $t = 1$. You can play with different ways to convert `closest_t` to a color. Here is a version that paints the scene gray and brighter with increasing depth: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... if closest_t < FLT_MAX { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight return vec4(saturate(closest_t) * 0.5); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL } return vec4(sky_color(ray), 1.); } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [shading-using-depth-alt]: [shaders.wgsl] Another way to shade with depth] ![Figure [depth-shaded-spheres-gray]: Spheres shaded by depth (gray) ](../images/img-10-depth-shaded-spheres-gray.png) [^ch5-footnote4]: [WGSL W3C Working Draft, §14.6. Floating Point Evaluation](https://www.w3.org/TR/WGSL/#floating-point-evaluation) states that "_Overflow, infinities, and NaNs generated before runtime are errors_" and "_[compiler] implementations may assume that overflow, infinities, and NaNs are not present at runtime._" Surface Normals --------------- Shading using depth can serve as a great debugging tool as well as the basis for various visual effects. However, we need to know more about the surface geometry in order to color it with a lighting model. This includes its orientation with respect to our viewing direction and the rest of the scene, which is given by its _normal vector_. For any point on a surface, the normal $\vec{\mathbf{N}}$ is defined by the line that is perpendicular to the plane tangent at that point. ![Figure [normal-vector]: The normal vector](../images/fig-07-normal-vector.svg) The orientation of the normal vector depends on both the type of geometry as well as the specific point of intersection. First we're going to make some assumptions that will come into play later when we implement materials: 1. Every surface has a _front_ face and a _back_ face and the direction of the normal vector lines up with the front face. 2. All normal vectors have a unit length by default. The normal vector at point $\mathbf{X}$ on the surface of a sphere with center $\mathbf{C}$ and radius $r$ is simply given by $$ \vec{\mathbf{N}} = \dfrac{\mathbf{X} - \mathbf{C}}{||\mathbf{X} - \mathbf{C}||} = \dfrac{\mathbf{X} - \mathbf{C}}{r} $$ ![Figure [sphere-normal]: Computing the normal on a sphere ](../images/fig-08-sphere-normal.svg) Now that we know how to compute the normal, let's change `intersect_sphere` to return a normal vector alongside the $t$ parameter. We'll introduce a struct called `Intersection` that bundles them together: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... @group(0) @binding(0) var uniforms: Uniforms; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight struct Intersection { normal: vec3f, t: f32, } fn no_intersection() -> Intersection { return Intersection(vec3(0.), -1.); } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL struct Sphere { ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight fn intersect_sphere(ray: Ray, sphere: Sphere) -> Intersection { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL let v = ray.origin - sphere.center; let a = dot(ray.direction, ray.direction); let b = dot(v, ray.direction); let c = dot(v, v) - sphere.radius * sphere.radius; let d = b * b - a * c; if d < 0. { return no_intersection(); } let sqrt_d = sqrt(d); let recip_a = 1. / a; let mb = -b; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight let t1 = (mb - sqrt_d) * recip_a; let t2 = (mb + sqrt_d) * recip_a; let t = select(t2, t1, t1 > 0.); if t <= 0. { return no_intersection(); } let p = point_on_ray(ray, t); let N = (p - sphere.center) / sphere.radius; return Intersection(N, t); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL } struct Ray { origin: vec3f, direction: vec3f, } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight fn point_on_ray(ray: Ray, t: f32) -> vec3 { return ray.origin + t * ray.direction; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL fn sky_color(ray: Ray) -> vec3f { let t = 0.5 * (normalize(ray.direction).y + 1.); return (1. - t) * vec3(1.) + t * vec3(0.3, 0.5, 1.); } ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [intersection-struct]: [shaders.wgsl] A struct for intersection data] Let's talk about some of the changes. We added a helper function called `no_intersection()` that returns an `Intersection` representing a null result. We also declared a function called `point_on_ray`, which returns the coordinates of a point along a ray at a known $t$ value. You may have noticed that the if statement which used to be conditioned on `t > 0.` is now a function call to _select_. [**`select`**](https://www.w3.org/TR/WGSL/#select-builtin) evaluates to either its first or second argument depending on the value of the third. The call `select(t2, t1, t1 > 0.)` is functionally equivalent to `t1 > 0. ? t1 : t2` (a ternary expression) in C/C++, with one exception: there is no guarantee of short-circuiting, meaning that both `t1` and `t2` will be evaluated regardless of the conditional. You may be tempted to rewrite this as an if statement (why needlessly evaluate both branches after all?): ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL var t = (mb - sqrt_d) * recip_a; if t <= 0. { t = (mb + sqrt_d) * recip_a; } if t <= 0. { return no_intersection(); } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [branchy-version]: [shaders.wgsl] Branchy version] This is perfectly fine and will behave in the same way. In fact, it's possible that this will compile down to the exact same GPU instructions as the version with `select`. GPUs are generally not good at handling conditional branches in code without sacrificing some amount of parallelism (though this depends on several factors). A good shader compiler will often eliminate branches altogether for simple conditionals like these. Writing efficient GPU code requires a good understanding of how GPUs deal with divergent control flow -- a topic that we will discuss more later on. Let's update the fragment shader to make use of the new data structure. Let's also change our shading code to visualize the normal vector by mapping the coordinates (from the $[-1, 1]$ range) to a color value (in the $[0, 1]$ range): ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... @fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f { let origin = vec3(0.); let focus_distance = 1.; let aspect_ratio = f32(uniforms.width) / f32(uniforms.height); // Normalize the viewport coordinates. var uv = pos.xy / vec2f(f32(uniforms.width - 1u), f32(uniforms.height - 1u)); // Map `uv` from y-down (normalized) viewport coordinates to camera coordinates. uv = (2. * uv - vec2(1.)) * vec2(aspect_ratio, -1.); let direction = vec3(uv, -focus_distance); let ray = Ray(origin, direction); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight var closest_hit = Intersection(vec3(0.), FLT_MAX); for (var i = 0u; i < OBJECT_COUNT; i += 1u) { let hit = intersect_sphere(ray, scene[i]); if hit.t > 0. && hit.t < closest_hit.t { closest_hit = hit; } } if closest_hit.t < FLT_MAX { return vec4(0.5 * closest_hit.normal + vec3(0.5), 1.); } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL return vec4(sky_color(ray), 1.); } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [shading-with-normals]: [shaders.wgsl] Shading with normals] and we get: ![Figure [normal-shaded-spheres]: Visualizing surface normals ](../images/img-11-normal-shaded-spheres.png) Notice how each color channel maps directly to one of the major axis coordinates, so that normals pointing towards the $+x$ direction get shaded with a higher _red_ component, normals pointing straight up towards $+y$ appear green, and so on. Temporal Accumulation ==================================================================================================== Over the next two chapters we are going to focus on two important features of the renderer: antialiasing and path tracing. These are both sampling problems in essence: they try to estimate some continuous signal (in this case the light flowing out of the scene into the pixels of our virtual camera) by repeatedly sampling various discrete light paths. Once a sufficient number of samples have been collected, we hope that their average will converge to the real signal -- or at least get close enough.[^ch6-footnote1] How many samples do we need to collect for each pixel before displaying the result? How can we structure the code to achieve some amount of interactivity? The answer to the first question depends highly on the scene but the sample count we are looking at is possibly in the hundreds if not _thousands_. One option is to add a loop to our fragment shader that intersects the scene with camera rays thousands of times before returning the final color, though it will take a long time before we can display a frame. Path tracing is computationally _very_ expensive, even for a GPU. I'm going to suggest an alternative approach: spread the sample collection across many frames. An invocation of the pipeline will output 1 sample per pixel (as it currently does) but rather than outputting the samples directly to the display surface, we'll accumulate them in a texture over time. This approach has the nice benefit that we can to present the contents of the texture to the display as soon as a pipeline invocation completes, allowing us to watch as the image resolves to the final rendering. [^ch6-footnote1]: This is referred to as the [_Law of Large Numbers_](https://en.wikipedia.org/wiki/Law_of_large_numbers) in probability theory. The Monte Carlo method employed in path tracing is an example of this (and we'll talk more about it in the next chapter). Frame Count ----------- The arithmetic average of a set of samples is simply given by their sum divided by the sample count. In other words, given $N$ samples of a random variable $x \in x_1,...,x_N$ the average is given by $$ \dfrac{1}{N}\sum_{i=1}^N x_i $$ Since we are going to distribute the samples across rendered frames, for any given frame, $N$ is equal to the number of frames we have rendered up that point plus $1$. We can represent this as a simple counter that we increment every time `render_frame` gets called. We'll also define a uniform variable for the frame count so that our shader program can access it when it needs to compute the average: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... struct Uniforms { width: u32, height: u32, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight frame_count: u32, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL } @group(0) @binding(0) var uniforms: Uniforms; ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [frame-count-cpu]: [shaders.wgsl] The `frame_count` uniform declaration] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... #[derive(Copy, Clone, Pod, Zeroable)] #[repr(C)] struct Uniforms { width: u32, height: u32, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight frame_count: u32, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust } impl PathTracer { pub fn new(device: wgpu::Device, queue: wgpu::Queue) -> PathTracer { device.on_uncaptured_error(Box::new(|error| { panic!("Aborting due to an error: {}", error); })); let shader_module = compile_shader_module(&device); let (display_pipeline, display_layout) = create_display_pipeline(&device, &shader_module); // Initialize the uniform buffer. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight let uniforms = Uniforms { width: 800, height: 600, frame_count: 0, }; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust let uniform_buffer = device.create_buffer(&wgpu::BufferDescriptor { ... } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust delete pub fn render_frame(&self, target: &wgpu::TextureView) { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight pub fn render_frame(&mut self, target: &wgpu::TextureView) { self.uniforms.frame_count += 1; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [frame-count-cpu]: [render.rs] Initializing the `frame_count` uniform] We declared `frame_count` as a 32-bit unsigned integer, which is supported by all shading languages. This will inevitably overflow if you leave the application running for a long time but I'm not too worried. Consider this: if you have a powerful graphics card that can render frames at 1000 fps, it will take approximately 50 days for the count to reach the maximum representable `u32` value ($2^{32}-1$). This is not perfect but also not a huge issue for us.[^ch6-footnote2] Note that we also changed `render_frame` to take a `&mut self` since it now mutates a member of the `PathTracer` type. We also need to update the call site and declare the `PathTracer` instance as mutable to make the compiler happy: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... #[pollster::main] async fn main() -> Result<()> { ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust delete let renderer = render::PathTracer::new(device, queue); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight let mut renderer = render::PathTracer::new(device, queue); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... } ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [render_frame-call]: [main.rs] Rendering to a surface texture] We are now maintaining a count on the CPU but we still need to make sure that the changes are mirrored on the GPU side by writing the contents of `self.uniforms` to `self.uniform_buffer`. Since we are modifying `self.uniforms` every frame, we should also update the contents of the GPU buffer every frame. This is where things can get a little complicated. [^ch6-footnote2]: Rust has overflow checks enabled in debug builds, so the program will always panic (i.e. assert and crash) on overflow. In release builds, the checks are disabled by default and Rust performs two's complement wrapping (see the [docs](https://doc.rust-lang.org/book/ch03-02-data-types.html#integer-overflow)). If you don't care about the runtime cost and want to play it safe, you can use one of the explicit arithmetic methods provided by the standard library. For example, the following will always panic in the case of an overflow: `self.uniforms.frame_count = self.uniforms.frame_count.checked_add(1).unwrap()`. ### Buffer Updates There are some things to consider when modifying the contents of a GPU buffer. The first is the type of memory the buffer resides in. GPUs typically come in two flavors: a _discrete_ GPU (such as a desktop graphics card) has its own dedicated memory and connects to the CPU via a peripheral bus that facilitates memory transfers between the two processors. In a _unified_ architecture, the GPU and the CPU are integrated into the same die and can share system memory without an explicit memory transfer. Before any writes can occur, the CPU side must have access to a region of memory that's mapped to its address space. How the written data is made available to the GPU side very much depends on the hardware and the functions provided by the graphics API. For example, both Metal and Vulkan support buffer types that are backed by shared system memory and can be permanently mapped on a unified architecture. Similarly, both APIs provide facilities to transfer buffer data to GPU memory when fast shared memory isn't supported.[^ch6-footnote3] Another consideration is around synchronization. Suppose that we changed our renderer to allow multiple frames to be in flight without gating the GPU submissions on v-sync.[^ch6-footnote4] We would need to avoid making any changes to the uniform buffer while a GPU submission is in progress, as that could cause a data race. There are different ways to handle this depending on the API, such as double or triple buffering when using a persistently mapped buffer or using synchronization primitives like memory fences. If you're following this book using a native API (like Metal, Vulkan, D3D, CUDA, etc), please consult its documentation for the best approach for frequent buffer updates on your GPU. WebGPU tries to provide a common abstraction over these nuances while working within additional constraints imposed by a web browser environment.[^ch6-footnote5] As a result, WebGPU imposes some strict limitations on how buffer mapping works: * A buffer must have the [`MAP_WRITE`](https://www.w3.org/TR/webgpu/#dom-gpubufferusage-map_write) usage for the CPU side to map and write its contents and this usage can only be combined with the [`COPY_SRC`](https://www.w3.org/TR/webgpu/#dom-gpubufferusage-copy_src) usage. This means that a buffer we map for writing connot be bound as a shader resource (such as a uniform buffer) and instead serves as a _staging buffer_ for a _copy command_. Updating the contents of a buffer is only possible by issuing a copy from this intermediate staging buffer. * Buffers can only be mapped asynchronously and there is no synchronous way to map a buffer _except_ when first created (using the `mapped_at_creation` field in the buffer descriptor). This requires some careful coordination so that buffers are mapped and available for writing when we need to update them. This immediately rules out shared memory buffers so we have to issue a copy. The easiest way would be to create a new staging buffer on every update and set `mapped_at_creation` to `true` but allocating a new short-lived buffer every frame can be expensive and we should strive to reuse GPU buffers when we can. Buffers have to get unmapped before they can be bound to a shader, so we need to re-map a buffer before we can write to it again. A buffer can only get re-mapped asynchronously, so we may need to allocate another staging buffer if `render_frame` ever gets called before the asynchronous mapping of the first staging buffer has completed. One possible approach is to maintain a pool of staging buffers. Each of these is a `wgpu::Buffer` object with the `MAP_WRITE` and `COPY_SRC` usages and mapped at creation. When it's time to update the uniform buffer, we do the following: 1. Find a large enough staging buffer in the pool (or create a new one if not found). Assume the buffer is mapped and write its contents. 2. Unmap the buffer and move it to a "pending buffer" list. Then, encode a ["copy buffer to buffer"](https://www.w3.org/TR/webgpu/#dom-gpucommandencoder-copybuffertobuffer) command with the staging buffer as the source and the uniform buffer as the destination. 3. After submitting the command buffer, call ["map async"](https://www.w3.org/TR/webgpu/#dom-gpubuffer-mapasync) on all buffers in the pending list. The [wgpu implementation of map async](https://docs.rs/wgpu/latest/wgpu/struct.BufferSlice.html#method.map_async) reports its completion in a callback (which runs asynchronously), so the callback can be responsible for removing the buffer from the pending list and adding it back to the mapped staging buffer pool. This is a relatively simple state machine but fortunately there is a method that boils all of that down to a single API call: [`wgpu::Queue::write_buffer`](https://docs.rs/wgpu/latest/wgpu/struct.Queue.html#method.write_buffer)[^ch6-footnote6]. This simplifies the code quite a bit so let's use it instead of implementing a buffer pool. `write_buffer` achieves the same thing while leaving it up to wgpu to choose the most efficient way to transfer the data on the host platform. As for synchronization, everything gets internally handled by wgpu so there isn't anything special we need to do. As long as we call `write_buffer` before encoding any other GPU commands referencing the copy destination (i.e. our uniform buffer) on the same queue, the copy is guaranteed to complete before the shader runs and reads from the buffer: That's pretty much it. Since the new code always updates the uniform buffer before a GPU submission we don't really need to initialize it at the start. Let's check that the code works by creating a visual effect using `frame_count`. The count increases monotonically, so we can use it like a "timestamp" and drive a simple animation. Here is a simple shader change that makes the spheres shrink and expand: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... @fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f { ... var closest_hit = Intersection(vec3(0.), FLT_MAX); for (var i = 0u; i < OBJECT_COUNT; i += 1u) { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight var sphere = scene[i]; sphere.radius += sin(f32(uniforms.frame_count) * 0.02) * 0.2; let hit = intersect_sphere(ray, sphere); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL if hit.t > 0. && hit.t < closest_hit.t { closest_hit = hit; } } if closest_hit.t < FLT_MAX { return vec4(0.5 * closest_hit.normal + vec3(0.5), 1.); } return vec4(sky_color(ray), 1.); } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [shading-with-normals]: [shaders.wgsl] Shading with normals] This should create an effect like the one in this video (Figure 21): ![Figure [animated-radius]: (video) Spheres animated with frame count ](../images/vid-01-animated-radius.mp4 autoplay muted loop) [^ch6-footnote3]: Metal provides a ["managed"](https://developer.apple.com/documentation/metal/resource_fundamentals/synchronizing_a_managed_resource) storage mode for these situations alongside a "private" storage mode for memory that is meant for fast GPU-only access. Vulkan's memory abstraction also provides many similar low level configurations. [^ch6-footnote4]: This can be desirable on a high-end GPU that can render a single frame much faster than the display refresh rate. [^ch6-footnote5]: See [wgpu#1438](https://github.com/gfx-rs/wgpu/discussions/1438) for an interesting discussion on the motivations behind the async-only buffer mapping API. [^ch6-footnote6]: See the WebGPU specification for [GPUQueue.writeBuffer](https://www.w3.org/TR/webgpu/#dom-gpuqueue-writebuffer) Radiance Texture ---------------- The animation you just rendered is a type of computation that is spread over time (hence the word _"temporal"_). We can use the same mechanism to compute running averages of per-pixel radiance samples. _Radiance_ is a radiometric term that refers to the energy carried by light through space, restricted to an instant in time, emanating from a unit patch of surface towards another. It is a physical quantity that renderers often emulate to produce realistic stills. Following this model, we'll pretend that every ray we cast measures some fraction of the radiance along its direction, and rays will always originate from a surface in the scene and point in the direction of another. The first rays all originate at a pixel (inside the virtual camera).[^ch6-footnote7] On each frame, the program will compute one sample per pixel and add to a per-pixel sum of samples When it's time to display the current sample average, we can divide the sum by `frame_count` and output that to the surface. In order to achieve this, let's set aside GPU texture to persist the running sums across frames: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust use bytemuck::{Pod, Zeroable}; pub struct PathTracer { device: wgpu::Device, queue: wgpu::Queue, uniforms: Uniforms, uniform_buffer: wgpu::Buffer, display_pipeline: wgpu::RenderPipeline, } #[derive(Copy, Clone, Pod, Zeroable)] #[repr(C)] struct Uniforms { width: u32, height: u32, frame_count: u32, } impl PathTracer { pub fn new( device: wgpu::Device, queue: wgpu::Queue, width: u32, height: u32, ) -> PathTracer { device.on_uncaptured_error(Box::new(|error| { panic!("Aborting due to an error: {}", error); })); let shader_module = compile_shader_module(&device); let (display_pipeline, display_layout) = create_display_pipeline(&device, &shader_module); // Initialize the uniform buffer. let uniforms = Uniforms { width, height, frame_count: 0, }; let uniform_buffer = device.create_buffer(&wgpu::BufferDescriptor { label: Some("uniforms"), size: std::mem::size_of::() as u64, usage: wgpu::BufferUsages::UNIFORM | wgpu::BufferUsages::COPY_DST, mapped_at_creation: false, }); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight let radiance_samples = create_sample_texture(&device, width, height); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust PathTracer { device, queue, uniforms, uniform_buffer, display_pipeline, } } ... } ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight fn create_sample_texture(device: &wgpu::Device, width: u32, height: u32) -> wgpu::Texture { device.create_texture(&wgpu::TextureDescriptor { label: Some("radiance samples"), size: wgpu::Extent3d { width, height, depth_or_array_layers: 1, }, mip_level_count: 1, sample_count: 1, dimension: wgpu::TextureDimension::D2, format: wgpu::TextureFormat::Rgba32Float, usage: wgpu::TextureUsages::TEXTURE_BINDING | wgpu::TextureUsages::STORAGE_BINDING, view_formats: &[], }) } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [radiance-samples-texture]: [render.rs] Radiance samples texture] The texture has the same dimensions as the window surface, so that the resolution of the rendered image matches what gets displayed. (Though, it's not uncommon to render at a lower resolution and upsample that in order to save on computations.) The texture format is `Rgba32Float`, which stores every pixel (or "texel") as four 32-bit floating point components (one for each of the 4 RGBA channels). This uses more memory than the 8-bit `Rgba8Unorm` format we used for the display surface but provides sufficient precision to store very large sums of radiance samples on all color channels. The usages (`TEXTURE_BINDING` and `STORAGE_BINDING`) enable the texture to be bound for reading and writing. wgpu doesn't allow a texture to be bound to the same shader stage simultaneously with both read and write access (except with an extension feature[^ch6-footnote8]). This may not be supported on all GPUs, so let's avoid depending on specific GPU features for now. Instead of reading and modifying the same texture in the render pass we can "ping-pong" between two textures. The pipeline will declare two texture bindings: a read-only binding that contains the previously accumulated sums, and a second (write-only) storage binding where it will output the updated sums. We'll also create two texture objects for each binding and alternate their binding assignments with every frame, repeatedly swapping their roles: the texture that was previously the write target provides the accumulated sums for the next frame, and vice versa. Start by changing the type of `radiance_samples` to an array of 2 textures: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust use bytemuck::{Pod, Zeroable}; pub struct PathTracer { device: wgpu::Device, queue: wgpu::Queue, uniforms: Uniforms, uniform_buffer: wgpu::Buffer, display_pipeline: wgpu::RenderPipeline, } #[derive(Copy, Clone, Pod, Zeroable)] #[repr(C)] struct Uniforms { width: u32, height: u32, frame_count: u32, } impl PathTracer { pub fn new( device: wgpu::Device, queue: wgpu::Queue, width: u32, height: u32, ) -> PathTracer { ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight let radiance_samples = create_sample_textures(&device, width, height); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... } ... } ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight fn create_sample_textures( device: &wgpu::Device, width: u32, height: u32, ) -> [wgpu::Texture; 2] { let desc = wgpu::TextureDescriptor { label: Some("radiance samples"), size: wgpu::Extent3d { width, height, depth_or_array_layers: 1, }, mip_level_count: 1, sample_count: 1, dimension: wgpu::TextureDimension::D2, format: wgpu::TextureFormat::Rgba32Float, usage: wgpu::TextureUsages::TEXTURE_BINDING | wgpu::TextureUsages::STORAGE_BINDING, view_formats: &[], }; // Create two textures with the same parameters. [device.create_texture(&desc), device.create_texture(&desc)] } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [radiance-samples-textures]: [render.rs] Radiance samples textures] Now, let's add the new bindings to the the bind group layout definition, assigning binding index $1$ to the read-only binding (previous sums) and $2$ to the write-only storage binding (the updated sums): ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... fn create_display_pipeline( device: &wgpu::Device, shader_module: &wgpu::ShaderModule, ) -> (wgpu::RenderPipeline, wgpu::BindGroupLayout) { let bind_group_layout = device.create_bind_group_layout(&wgpu::BindGroupLayoutDescriptor { label: None, entries: &[ wgpu::BindGroupLayoutEntry { binding: 0, visibility: wgpu::ShaderStages::FRAGMENT, ty: wgpu::BindingType::Buffer { ty: wgpu::BufferBindingType::Uniform, has_dynamic_offset: false, min_binding_size: None, }, count: None, }, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight wgpu::BindGroupLayoutEntry { binding: 1, visibility: wgpu::ShaderStages::FRAGMENT, ty: wgpu::BindingType::Texture { sample_type: wgpu::TextureSampleType::Float { filterable: false, }, view_dimension: wgpu::TextureViewDimension::D2, multisampled: false, }, count: None, }, wgpu::BindGroupLayoutEntry { binding: 2, visibility: wgpu::ShaderStages::FRAGMENT, ty: wgpu::BindingType::StorageTexture { access: wgpu::StorageTextureAccess::WriteOnly, format: wgpu::TextureFormat::Rgba32Float, view_dimension: wgpu::TextureViewDimension::D2, }, count: None, }, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ], }); let pipeline = device.create_render_pipeline(&wgpu::RenderPipelineDescriptor { ... } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [updated-pipeline-layout]: [render.rs] Updated bind group layout] Next, we need to change the actual bind group object to match the new layout. We want to alternate the texture assignments but a bind group cannot be modified once it's created. We could instead create two bind groups with the textures swapped and alternate those at render time: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... pub struct PathTracer { device: wgpu::Device, queue: wgpu::Queue, uniforms: Uniforms, uniform_buffer: wgpu::Buffer, display_pipeline: wgpu::RenderPipeline, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight display_bind_groups: [wgpu::BindGroup; 2], ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust } #[derive(Copy, Clone, Pod, Zeroable)] #[repr(C)] struct Uniforms { width: u32, height: u32, frame_count: u32, } impl PathTracer { pub fn new( device: wgpu::Device, queue: wgpu::Queue, width: u32, height: u32, ) -> PathTracer { device.on_uncaptured_error(Box::new(|error| { panic!("Aborting due to an error: {}", error); })); let shader_module = compile_shader_module(&device); let (display_pipeline, display_layout) = create_display_pipeline(&device, &shader_module); // Initialize the uniform buffer. let uniforms = Uniforms { width, height, frame_count: 0, }; let uniform_buffer = device.create_buffer(&wgpu::BufferDescriptor { label: Some("uniforms"), size: std::mem::size_of::() as u64, usage: wgpu::BufferUsages::UNIFORM | wgpu::BufferUsages::COPY_DST, mapped_at_creation: false, }); let radiance_samples = create_sample_textures(&device, width, height); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight let display_bind_groups = create_display_bind_groups( &device, &display_layout, &radiance_samples, &uniform_buffer, ); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust PathTracer { device, queue, uniforms, uniform_buffer, display_pipeline, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight display_bind_groups, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust } } ... } ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight fn create_display_bind_groups( device: &wgpu::Device, layout: &wgpu::BindGroupLayout, textures: &[wgpu::Texture; 2], uniform_buffer: &wgpu::Buffer, ) -> [wgpu::BindGroup; 2] { let views = [ textures[0].create_view(&wgpu::TextureViewDescriptor::default()), textures[1].create_view(&wgpu::TextureViewDescriptor::default()), ]; [ // Bind group with view[0] assigned to binding 1 and view[1] assigned to binding 2. device.create_bind_group(&wgpu::BindGroupDescriptor { label: None, layout, entries: &[ wgpu::BindGroupEntry { binding: 0, resource: wgpu::BindingResource::Buffer(wgpu::BufferBinding { buffer: uniform_buffer, offset: 0, size: None, }), }, wgpu::BindGroupEntry { binding: 1, resource: wgpu::BindingResource::TextureView(&views[0]), }, wgpu::BindGroupEntry { binding: 2, resource: wgpu::BindingResource::TextureView(&views[1]), }, ], }), // Bind group with view[1] assigned to binding 1 and view[0] assigned to binding 2. device.create_bind_group(&wgpu::BindGroupDescriptor { label: None, layout, entries: &[ wgpu::BindGroupEntry { binding: 0, resource: wgpu::BindingResource::Buffer(wgpu::BufferBinding { buffer: uniform_buffer, offset: 0, size: None, }), }, wgpu::BindGroupEntry { binding: 1, resource: wgpu::BindingResource::TextureView(&views[1]), }, wgpu::BindGroupEntry { binding: 2, resource: wgpu::BindingResource::TextureView(&views[0]), }, ], }), ] } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [textures-bind-groups]: [render.rs] Bind groups with different texture assignments] Now, let's update the shader: The intersection test logic remains the same as before but instead of returning the computed radiance value right away, we first store it in a local variable (`radiance_sample`). Next we fetch the current tally from the "old" texture (`old_sum`) and compute the updated tally by adding `radiance_sample` to it. We want to ensure that the accumulation starts as 0, so we set `old_sum` to `vec3(0)` for the initial frame (when `frame_count` is equal to $1$). Then simply return `new_sum / f32(uniform.frame_count)`, i.e. the current average, in the RGB channels of the output color.[^ch6-footnote9] Finally, let's update the bind group assignment in `PathTracer::render_frame` to ping-pong between the two bind groups we created, using even and odd values of `frame_count` as a toggle: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... impl PathTracer { ... pub fn render_frame(&mut self, target: &wgpu::TextureView) { self.uniforms.frame_count += 1; self.queue .write_buffer(&self.uniform_buffer, 0, bytemuck::bytes_of(&self.uniforms)); let mut encoder = self .device .create_command_encoder(&wgpu::CommandEncoderDescriptor { label: Some("render frame"), }); let mut render_pass = encoder.begin_render_pass(&wgpu::RenderPassDescriptor { label: Some("display pass"), color_attachments: &[Some(wgpu::RenderPassColorAttachment { view: target, resolve_target: None, ops: wgpu::Operations { load: wgpu::LoadOp::Clear(wgpu::Color::BLACK), store: wgpu::StoreOp::Store, }, })], ..Default::default() }); render_pass.set_pipeline(&self.display_pipeline); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight render_pass.set_bind_group( 0, &self.display_bind_groups[(self.uniforms.frame_count % 2) as usize], &[], ); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust // Draw 1 instance of a polygon with 6 vertices render_pass.draw(0..6, 0..1); // End the render pass by consuming the object. drop(render_pass); let command_buffer = encoder.finish(); self.queue.submit(Some(command_buffer)); } } ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [ping-pong-bind-groups]: [render.rs] Ping-pong bind groups] When you run the code, you should see the animation from before but the displayed image should look a bit smeared. You should be able to see the oscillating sphere leave behind a "trail" over the first few seconds and the image should eventually settle at something like this: ![Figure [temporal-blur-effect]: Temporal Blur Effect](../images/img-12-blurred-animation.png) I find it fun to watch the rendering of this image. After the program runs for a few seconds the image seems to reach a steady state. This happens when the renderer has collected enough samples that adding new ones doesn't perceivably contribute to the average. The sphere radii are oscillating inside a fixed range, so we observe all possible frame states of the animation rather quickly. [^ch6-footnote7]: We are making the assumption that light travels along straight lines. [^ch6-footnote8]: wgpu supports a [read/write access mode](https://docs.rs/wgpu/0.19.3/wgpu/enum.StorageTextureAccess.html#variant.ReadWrite) which is hidden behind the adapter feature `TEXTURE_ADAPTER_SPECIFIC_FORMAT_FEATURES`. This isn't guaranteed to be supported by all GPUs but feel free use it if yours does. [^ch6-footnote9]: Note that the value we store in the texture (`vec4(new_sum, 0.)`) has its alpha component set to $0$. We aren't making use of the alpha values so it doesn't matter what we set this to. Antialiasing ------------ Let's undo the animation and bring back the static spheres. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... @fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f { ... let direction = vec3(uv, -focus_distance); let ray = Ray(origin, direction); var closest_hit = Intersection(vec3(0.), FLT_MAX); for (var i = 0u; i < OBJECT_COUNT; i += 1u) { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight let sphere = scene[i]; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL delete var sphere = scene[i]; sphere.radius += sin(f32(uniforms.frame_count) * 0.02) * 0.2; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL let hit = intersect_sphere(ray, sphere); if hit.t > 0. && hit.t < closest_hit.t { closest_hit = hit; } } var radiance_sample: vec3f; ... } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [stop-animating]: [shaders.wgsl] Remove radius animation] The output should be the same still image from the end of Chapter 5 (Figure 20). The accumulation logic has no effect because every sample is computing exactly the same value. Let's zoom in and take a closer look at the edges of the spheres: ![Figure [aliased-boundaries]: Aliased shape boundaries @ 400x300 ](../images/img-13-aliased-boundaries.png height="500px" class="pixel") Here, each pixel is visualized as a square. A discrete pixel can only display a single color but pixels along shape boundaries overlap multiple (continuous) surfaces. Ideally the pixel color should receive a contribution from all of those surfaces, in proportion to the "pixel area" covered by each surface. Casting a single camera ray returns only a point sample but averaging multiple _sub-pixel_ samples can give us an approximation of the whole area. Let's try a very simple approach first: subdivide a pixel into a rectangular grid and on each frame cast the ray towards one of the sub-regions. The following code change adds a small offset to the ray which cycles through the sub-regions of a 4x4 grid centered at the original ray direction, using `uniforms.frame_count` like an index. The offsets range within $[-0.5, 0.5]$ in both coordinate directions: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... @fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f { let origin = vec3(0.); let focus_distance = 1.; let aspect_ratio = f32(uniforms.width) / f32(uniforms.height); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight // Offset and normalize the viewport coordinates of the ray. let offset = vec2( f32(uniforms.frame_count % 4) * 0.25 - 0.5, f32((uniforms.frame_count % 16) / 4) * 0.25 - 0.5 ); var uv = (pos.xy + offset) / vec2f(f32(uniforms.width - 1u), f32(uniforms.height - 1u)); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL // Map `uv` from y-down (normalized) viewport coordinates to camera coordinates. uv = (2. * uv - vec2(1.)) * vec2(aspect_ratio, -1.); let direction = vec3(uv, -focus_distance); ... } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [16-sample-aa]: [shaders.wgsl] 16 grid samples] ![Figure [16-sample-aa-aliased-boundaries]: Anti-aliasing with 16 regularly-spaced samples @ 400x300 ](../images/img-14-16-sample-aa.png height="500px" class="pixel") That's an improvement but we can do better. Instead of subdividing the pixel into 16 regularly-spaced discrete regions (which is prone to the same sampling artifact), let's offset the ray by a random amount within that range. This should accumulate enough samples from various parts of the pixel area over time to yield a better estimate of the average color. Plus, why limit ourselves to only 16 discrete samples when our renderer is already set up for an indefinite amount? PRNG ---- Shading languages don't provide a built-in facility to generate random numbers, which means we need to implement our own. A class of pseudorandom number generators that is very easy to implement is called _Xorshift RNGs_.[^marsaglia] Xorshift generators work by repeatedly computing the bitwise exclusive-or of an initial seed with a bit-shifted version of itself. The result is a deterministic sequence with a uniform distribution and a long period that suits our needs.[^ch6-footnote10] We can implement the RNG as a private variable such that each GPU thread gets its own local instance of the RNG state. We generally want to seed the RNG such that the pseudorandom sequence for a pixel is different across successive frames since we want to sample a different sub-pixel coordinate each time. The sequences should also ideally differ across adjacent pixels in a single frame (instead of repeating the same spatial pattern) in order to improve the sampling distribution. We can combine `uniforms.frame_count` with the pixel's coordinates using a hash function to obtain a good initial seed for each thread. I use the _One-at-a-Time Hash_ function from Bob Jenkins' Dr Dobbs article from 1997[#Jenkins97] but you could use any other hash function as long as it's fast and has good statistical properties. The following listing defines the RNG state, the hash function, and the 32-bit xorshift. `init_rng()` initializes the state with the seed. The RNG state and the generated numbers are 32-bit unsigned integers. Since we're pretty much only dealing with floating-point numbers, the code includes a `rand_f32()` function that generates and converts a random `u32` to a `f32` between $0$ and $1$: Now to change the offset computation in the fragment shader to pick a random coordinate: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... @fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight init_rng(vec2u(pos.xy)); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL let origin = vec3(0.); let focus_distance = 1.; let aspect_ratio = f32(uniforms.width) / f32(uniforms.height); // Offset and normalize the viewport coordinates of the ray. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight let offset = vec2(rand_f32() - 0.5, rand_f32() - 0.5); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL var uv = (pos.xy + offset) / vec2f(f32(uniforms.width - 1u), f32(uniforms.height - 1u)); // Map `uv` from y-down (normalized) viewport coordinates to camera coordinates. uv = (2. * uv - vec2(1.)) * vec2(aspect_ratio, -1.); let direction = vec3(uv, -focus_distance); ... } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [randomized-ssaa-code]: [shaders.wgsl] 16 grid samples] Now, the anti-aliased edges have a much more gradual transition and look a lot less blocky compared to our previous 16-sample AA: ![Figure [randomized-ssaa-image]: Randomized sub-pixel supersampling @ 400x300 ](../images/img-15-random-subpixel-samples.png height="500px" class="pixel") [^ch6-footnote10]: Xorshift is a so-called _linear congruential generator_. The random offsets generated with xorshift follow a _white noise_ pattern in that they appear to be "purely random": the sample points may appear clumped together in some places and have large gaps in others. A more even spatial distribution of points (e.g. using _blue noise_ or the _Sobol sequence_) is generally more desirable for stochastic methods but the Xorshift PRNG is good enough for our purposes, given our large number of samples. Path Tracing ==================================================================================================== What we perceive as color, shadows, transparency, reflections, and many other visual phenomena result from interactions of light and matter. If we want to achieve some amount of realism, it makes sense to base our computations on the real-world physics of light. That said, it's not necessary to fully simulate electromagnetic wave interactions to render a visually pleasing image. What we mainly care about is how light travels through the scene and what happens when it hits a surface. We'll adhere to a relatively simple model with the following assumptions: - Light travels in straight lines represented as rays. - A ray transports some amount of light energy, called _radiance_. - Light gets scattered when it hits a surface. The surface absorbs some of the radiance and scatters the rest towards a new direction, represented by a new ray. - A sequence of connected rays form a _light transport path_. All light transport paths originate at a light source. ![Figure [light-paths-in-a-room]: The various paths that light rays in a room may take before they reach the camera. ](../images/fig-09-light-paths-overview.svg) There are infinitely many transport paths in a scene. The paths that contribute to the rendered image are the ones that eventually arrive at the camera, so we trace a light transport path _backwards_, starting at a camera pixel. When we find an intersection with a surface in the scene, we cast a new ray in the scattering direction based on the properties of the surface. We repeat the process until a ray intersects a light source. Path Tracing Loop ----------------- Before implementing the path tracing logic let's introduce two subroutines. The first will be a new function responsible for traversing the scene and finding an intersection, called `intersect_scene`: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... struct Intersection { normal: vec3f, t: f32, } fn no_intersection() -> Intersection { return Intersection(vec3(0.), -1.); } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight fn is_intersection_valid(hit: Intersection) -> bool { return hit.t > 0.; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight fn intersect_scene(ray: Ray) -> Intersection { var closest_hit = Intersection(vec3(0.), FLT_MAX); for (var i = 0u; i < OBJECT_COUNT; i += 1u) { let sphere = scene[i]; let hit = intersect_sphere(ray, sphere); if hit.t > 0. && hit.t < closest_hit.t { closest_hit = hit; } } if closest_hit.t < FLT_MAX { return closest_hit; } return no_intersection(); } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL struct Ray { origin: vec3f, direction: vec3f, } ... @fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f { init_rng(vec2u(pos.xy)); let origin = vec3(0.); let focus_distance = 1.; let aspect_ratio = f32(uniforms.width) / f32(uniforms.height); // Offset and normalize the viewport coordinates of the ray. let offset = vec2(rand_f32() - 0.5, rand_f32() - 0.5); var uv = (pos.xy + offset) / vec2f(f32(uniforms.width - 1u), f32(uniforms.height - 1u)); // Map `uv` from y-down (normalized) viewport coordinates to camera coordinates. uv = (2. * uv - vec2(1.)) * vec2(aspect_ratio, -1.); let direction = vec3(uv, -focus_distance); let ray = Ray(origin, direction); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight let hit = intersect_scene(ray); var radiance_sample: vec3f; if is_intersection_valid(hit) { radiance_sample = vec3(0.5 * hit.normal + vec3(0.5)); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL } else { radiance_sample = sky_color(ray); } ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [intersect-scene]: [shaders.wgsl] The intersect_scene function] A second new function, called `scatter`, will be responsible for evaluating the surface material. For now it returns two values: a scattering direction (typically denoted with the lower-case Greek letter $\omega$) and an attenuation factor that represents the material's reflectance in the direction of the incident ray. We store the attenuation as a `vec3f` since we're computing a separate radiance value for each color channel.[^ch7-footnote3] Surface materials (which we'll explore in Section [materials]) are represented by various _scattering functions_. A scattering function maps an _incident_ light direction $\omega_i$ to an _outgoing_ light direction $\omega_o$. The rays originate from the camera and trace the transport path backwards towards light sources, so when we call `scatter` we already know the scattering direction $\omega_o$. In that sense "scatter" is somewhat a misnomer, since we're using it to compute $\omega_i$. This doesn't really make a difference, as the incident and outgoing light directions are interchangeable. The surface scattering functions that we will implement are all going to be _bi-directional_, i.e. work the same way in either direction. As such, our `scatter` function allows the `input_ray` parameter to be either an incident or a scattered light direction. Let's make the scattering function reflect the ray around the normal vector like a perfect mirror. The law of reflection states that when a ray of light hits a surface, the angle of reflection--the angle between the reflected ray and the surface normal--is equal to the angle of incidence. The direction vector $\vec{R}$ of the reflected ray can be computed from the incident ray direction $\vec{\textbf{I}}$ as $\vec{\textbf{R}} = \vec{\textbf{I}}-2(\vec{\textbf{N}}\cdot\vec{\textbf{I}})\vec{\textbf{N}}$. Luckily, there is a handy shader instrinsic called `reflect` that implements this formula for us: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight struct Scatter { attenuation: vec3f, ray: Ray, } fn scatter(input_ray: Ray, hit: Intersection) -> Scatter { let scattered = reflect(input_ray.direction, hit.normal); let output_ray = Ray(point_on_ray(input_ray, hit.t), scattered); let attenuation = vec3(0.4); return Scatter(attenuation, output_ray); } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL struct Ray { origin: vec3f, direction: vec3f, } ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [intersect-scene]: [shaders.wgsl] The scatter function] The returned attenuation factor of $0.4$ means that the material absorbs 60% of the incoming radiance (in all color channels) and scatters the rest. Logically, we compute this by multiplying the transported radiance by the attenuation factor at every intersection. We don't actually know the radiance value until we reach light sources but we can compute the total attenuation and the transported radiance separately. We'll write a loop that traces a path, generating rays as it finds intersections. We'll accumulate the product of attenuation factors in a `throughput` variable and multiply that by the radiance emitted by any light source that we encounter (which is just the sky for now): ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight const MAX_PATH_LENGTH: u32 = 6u; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... @fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f { init_rng(vec2u(pos.xy)); let origin = vec3(0.); let focus_distance = 1.; let aspect_ratio = f32(uniforms.width) / f32(uniforms.height); // Offset and normalize the viewport coordinates of the ray. let offset = vec2(rand_f32() - 0.5, rand_f32() - 0.5); var uv = (pos.xy + offset) / vec2f(f32(uniforms.width - 1u), f32(uniforms.height - 1u)); // Map `uv` from y-down (normalized) viewport coordinates to camera coordinates. uv = (2. * uv - vec2(1.)) * vec2(aspect_ratio, -1.); let direction = vec3(uv, -focus_distance); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight var ray = Ray(origin, direction); var throughput = vec3f(1.); var radiance_sample = vec3(0.); var path_length = 0u; while path_length < MAX_PATH_LENGTH { let hit = intersect_scene(ray); if !is_intersection_valid(hit) { // If no intersection was found, return the color of the sky and terminate the path. radiance_sample += throughput * sky_color(ray); break; } let scattered = scatter(ray, hit); throughput *= scattered.attenuation; ray = scattered.ray; path_length += 1u; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL // Fetch the old sum of samples. var old_sum: vec3f; if uniforms.frame_count > 1 { old_sum = textureLoad(radiance_samples_old, vec2u(pos.xy), 0).xyz; } else { old_sum = vec3(0.); } ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [intersect-scene]: [shaders.wgsl] The intersect_scene function] `throughput` starts out as $1$ (meaning no radiance has been absorbed). We also impose an artificial limit on the length of a path to prevent looping forever if we never encounter a light source (which can happen with certain types of geometry). Running this program should produce this image: ![Figure [invalid-scatter-with-shadow-acne]: Validating the path tracing loop (with self-shadowing) ](../images/img-18-mirror-reflection-with-shadow-acne.png) We can see some reflections but there are some nasty circular bands. This artifact (called "shadow acne" or "self-shadowing") is caused by the limited (and quantized) precision inherent to floating point arithmetic. Sometimes the computed intersection point doesn't fall precisely on the sphere surface, which can cause the new ray (originating from that point) to re-intersect the sphere. A simple way to deal with this is to reject intersections for values of $t$ that are below a small offset ($\epsilon$ or _epsilon_): ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL const FLT_MAX: f32 = 3.40282346638528859812e+38; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight const EPSILON: f32 = 1e-3; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... fn intersect_sphere(ray: Ray, sphere: Sphere) -> Intersection { let v = ray.origin - sphere.center; let a = dot(ray.direction, ray.direction); let b = dot(v, ray.direction); let c = dot(v, v) - sphere.radius * sphere.radius; let d = b * b - a * c; if d < 0. { return no_intersection(); } let sqrt_d = sqrt(d); let recip_a = 1. / a; let mb = -b; let t1 = (mb - sqrt_d) * recip_a; let t2 = (mb + sqrt_d) * recip_a; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight let t = select(t2, t1, t1 > EPSILON); if t <= EPSILON { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL return no_intersection(); } let p = point_on_ray(ray, t); let N = (p - sphere.center) / sphere.radius; return Intersection(N, t); } ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [intersect-scene]: [shaders.wgsl] Rejecting intersections too close to the ray origin] The code considers both `t1` and `t2` as `t2` (the farther point) could be a valid intersection if the ray originated inside the sphere (e.g. for a glass-like material). Following this change, the rendering should look like this: ![Figure [shadow-acne-fixed]: Validating the path tracing loop ](../images/img-19-mirror-reflection-no-acne.png) That looks much cleaner. Some reflections are visible and both spheres have acquired a blue tint where light paths eventually reach the sky. Some light paths bounce back and forth between both spheres. Each bounce is an "absorption event" that decreases the path throughput. The image looks darker with more absorptions, which is most apparent where the two spheres meet. Gamma Correction ---------------- Right now, this image looks a bit too dark. The perceived brightness (or luminance) of a pixel should ideally scale linearly with the stored radiance value. In other words, if a material absorbs 50% of the radiance arriving directly from the sky, it should appear half as dark as the sky. However, the reflections of both spheres become nearly invisible after only three ray bounces. This is because the surface texture expects pixel values to be _gamma encoded_. Our eyes are more sensitive to changes in dark tones than they are to similar changes in bright tones. Given that we only have a fixed range to represent pixel's luminance ($[0, 1]$), it is more efficient (in terms of storage) to allocate a bigger numerical range for smaller radiance values. This is how digital images usually get stored and virtually all displays apply _gamma correction_ while converting pixel values to light.[^ch7-footnote4] The formula for gamma ($\gamma$) encoding is $V_{out} = V_{in}^{\frac{1}{\gamma}}$. We can apply this function in the fragment shader right before outputting the color: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL @fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f { ... // Compute and store the new sum. let new_sum = radiance_sample + old_sum; textureStore(radiance_samples_new, vec2u(pos.xy), vec4(new_sum, 0.)); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight // Display the average after gamma correction (gamma = 2.2) let color = new_sum / f32(uniforms.frame_count); return vec4(pow(color, vec3(1. / 2.2)), 1.); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [gamma-2.2]: [shaders.wgsl] Encoding a pixel with $\gamma = 2.2$] The gamma corrected output should look like this: ![Figure [gamma-correction]: Gamma-correction](../images/img-20-gamma-correction.png) Some platforms support textures with a _sRGB_ format. Pixels automatically undergo gamma compression (or decompression) upon writes and reads to sRGB textures. You can try this yourself: instead of applying gamma correction in the shader, change all instances of `Rgba8Unorm` and `Bgra8Unorm` in `src/main.rs` and `src/render.rs` to `Rgba8UnormSrgb` and `Bgra8UnormSrgb`. You should see a similar result if your platform supports sRGB surfaces. [^ch7-footnote3]: This RGB representation is simple and works well for most cases but cannot accurately represent effects like diffraction and interference. There are alternative representations to handle such phenomena, for example by storing a power distribution across a spectrum of constituent wavelengths. [^ch7-footnote4]: [_Understanding Gamma Correction_](https://www.cambridgeincolour.com/tutorials/gamma-correction.htm) (by Cambridge in Color) is a great short read on the topic. Path Length ----------- Let's momentarily set the attenuation factor to 1, so that both spheres reflect 100% of the energy they receive. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... fn scatter(input_ray: Ray, hit: Intersection) -> Scatter { let scattered = reflect(input_ray.direction, hit.normal); let output_ray = Ray(point_on_ray(input_ray, hit.t), scattered); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight let attenuation = vec3(1.); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL return Scatter(attenuation, output_ray); } ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [no-absorption]: [shaders.wgsl] Spheres that reflect all light] You should get this result: ![Figure [mirrors-showing-bias]: Bias from early termination ](../images/img-21-max-bounces-too-low.png) The image looks a lot brighter (as expected) but there is a well-defined black circle in between the spheres. That looks wrong, given the spheres aren't supposed to absorb any light. Luckily there is an easy explanation: the current upper limit on path length (i.e. `MAX_PATH_LENGTH`) is too low to fully explore that part of the scene, so the path gets terminated before it can find the light source. ![Figure [path-sphere-interreflections]: A light transport path with 7 bounces ](../images/fig-11-sphere-interreflections.svg) Try increasing `MAX_PATH_LENGTH` to 10. There should be a black circle but smaller. It turns out that at least 13 bounces are necessary to eliminate the black circle for this particular scene: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight const MAX_PATH_LENGTH: u32 = 13u; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [increase-path-length]: [shaders.wgsl] Increased path length] ![Figure [mirrors-with-13-bounces]: Infinite mirror with 13 bounces ](../images/img-22-infinite-mirror-with-13-bounces.png) This begs the question: what is the ideal value for `MAX_PATH_LENGTH`? The answer depends on a number of factors, but it mainly comes down to the scene and performance expectations. Fewer bounces means less computation but potentially incorrect images. More bounces means more light paths get explored but more computation is necessary. It also increases the chances of wasted work on paths that don't contribute significantly to the final image. We'll revisit this topic later. Colored Spheres --------------- All real-world objects absorb some amount of light. They also impart a color on the light that they reflect. It would be nice to assign different colors to the spheres so we can tell them apart. For now, let's add an additional field to the `Sphere` structure to hold a shape's color: The RGB triplet can directly represent the attenuation factor for a given sphere. Let's have the intersection routine return the color of a sphere and use that color in the scattering function: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... struct Intersection { normal: vec3f, t: f32, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight color: vec3f, } fn no_intersection() -> Intersection { return Intersection(vec3(0.), -1., vec3(0.)); } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... fn intersect_sphere(ray: Ray, sphere: Sphere) -> Intersection { ... let p = point_on_ray(ray, t); let N = (p - sphere.center) / sphere.radius; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight return Intersection(N, t, sphere.color); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL } fn intersect_scene(ray: Ray) -> Intersection { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight var closest_hit = no_intersection(); closest_hit.t = FLT_MAX; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL for (var i = 0u; i < OBJECT_COUNT; i += 1u) { let sphere = scene[i]; let hit = intersect_sphere(ray, sphere); if hit.t > 0. && hit.t < closest_hit.t { closest_hit = hit; } } if closest_hit.t < FLT_MAX { return closest_hit; } return no_intersection(); } ... fn scatter(input_ray: Ray, hit: Intersection) -> Scatter { let scattered = reflect(input_ray.direction, hit.normal); let output_ray = Ray(point_on_ray(input_ray, hit.t), scattered); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight let attenuation = hit.color; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL return Scatter(attenuation, output_ray); } ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [apply-sphere-color]: [shaders.wgsl] Use sphere color to attenuate throughput] ![Figure [colored-spheres]: Spheres with different colors](../images/img-23-colored-spheres.png) Interactive Camera ==================================================================================================== So far, we've been looking at spheres from a fixed position and it would be nice to be able to move around. In order to reposition the camera with user input, we need a representation of the camera state that is shared between the CPU and GPU sides of the program. In our GPU code, we have relied on built-in vector algebra primitives (such as `vec3`) provided by WGSL. We need similar primitives on the CPU side in order to compute camera parameters (such as camera position and orientation) in response to input events generated by the windowing system. To that end, we'll be adding a new `algebra` module for linear algebra utilities: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight mod algebra; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust mod render; const WIDTH: u32 = 800; const HEIGHT: u32 = 600; ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [algebra-module-decl]: [main.rs] Declare the `algebra` module] `algebra.rs` defines a single type: `Vec3`. As its name suggests, this type represents a 3-dimensional vector with three 32-bit floating point components. `Vec3` defines methods for vector operations and operator overloads for component-wise arithmetic (`+, -, *, /`) and assignment (`+=, -=, *=, /=`).[^ch8-footnote1] The memory layout of a `Vec3` consists of three contiguous `f32`'s (taking up 12 bytes) which exactly matches the layout of the WGSL `vec3f` type. [^ch8-footnote1]: In Rust, operators get overloaded by implementing traits (`std::ops::Add`, `std::ops::Sub`, `std::ops::Mul`, `std::ops::Div`, etc). The operator traits are parameterized on value types (such as `fn add(self, rhs: RHS) -> Output` in `std::ops::Add`) and don't automatically extend to invocations on borrows. For example, `a + b`, where `a` and `b` are both `Vec3`, is different from `a + &b`, `&a + b`, and `&a + &b`. The `impl_binary_op` macro automatically implements the traits for all of these combinations, for convenience. Uniforms and Alignment ---------------------- We can use the uniform buffer to make the camera parameters visible to both the CPU and GPU sides of the program. Let's define a new `CameraUniforms` structure that just stores the camera position: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... struct Uniforms { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight camera: CameraUniforms, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL width: u32, height: u32, frame_count: u32, } @group(0) @binding(0) var uniforms: Uniforms; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight struct CameraUniforms { origin: vec3f, } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... @fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f { init_rng(vec2u(pos.xy)); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight let origin = uniforms.camera.origin; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL let focus_distance = 1.; ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [apply-sphere-color]: [shaders.wgsl] Use sphere color to attenuate throughput] We need to mirror these changes on the CPU side. Let's introduce a new Rust module called `camera` for all camera related code. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight use bytemuck::{Pod, Zeroable}; use crate::algebra::Vec3; #[derive(Debug, Copy, Clone, Pod, Zeroable)] #[repr(C)] pub struct CameraUniforms { origin: Vec3, } pub struct Camera { uniforms: CameraUniforms, } impl Camera { pub fn new(origin: Vec3) -> Camera { Camera { uniforms: CameraUniforms { origin }, } } pub fn uniforms(&self) -> &CameraUniforms { &self.uniforms } } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [camera-module]: [camera.rs] The `camera` module] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... mod algebra; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight mod camera; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust mod render; const WIDTH: u32 = 800; const HEIGHT: u32 = 600; ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [camera-module-decl]: [main.rs] Declare the `camera` module] The module defines two structs: `CameraUniforms` and `Camera`. `CameraUniforms` is going to contain only the state that will be shared with the GPU, while `Camera` is meant to be a higher level wrapper that can contain additional variables. For now, the only state is the camera origin so the type definition is pretty bare bones. Let's update our CPU-side `Uniforms` struct to mirror the GPU side by including `CameraUniforms`. We'll also reposition the camera origin to verify our changes: When you run this code, wgpu should emit an API validation error that says "_Buffer is bound with size 24 where the shader expects 32_." 24 bytes looks correct at first glance: 12 bytes for a `Vec3` (4 bytes each for 3 `f32`s), and 3 `u32`s for the `width`, `height`, and `frame_count` fields, each taking up 4 bytes. The error message says the shader declared a 32-byte struct, so where do the 8 missing bytes come from? The answer is _implicit padding_ inserted by WGSL to satisfy alignment requirements. Computers access memory more efficiently if the memory address of the accessed data is aligned to certain multiples of the processor word size. WGSL defines specific rules for its scalar and vector types[^ch8-footnote2] and it expects the memory layout of bound data structures to adhere to those rules (see Table [scalar-and-vector-alignment]). Type | Alignment | Size :----:|:---------:|:----: **u32, f32** | 4 | 4 **vec2** | 8 | 8 **vec3** | 16 | 12 **vec4** | 16 | 16 [Table [scalar-and-vector-alignment]: Alignment and data sizes for scalar and vector types.] The alignment of a struct is equal to the largest alignment among its members. The size of a struct is defined as the sum of the sizes of its members, rounded up to a multiple of its alignment. Before our last change, the `Uniforms` struct had 4-byte alignment and occupied 12 bytes in size. We introduced the `CameraUniforms` structure, which has a single member of type `vec3f` and therefore 16-byte alignment. `vec3f` is 12 bytes in size, so the struct is _padded_ with 4 bytes to bring its size up to 16. While WGSL does this implicitly, we need to explicitly add padding on the Rust side. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... #[derive(Debug, Copy, Clone, Pod, Zeroable)] #[repr(C)] pub struct CameraUniforms { origin: Vec3, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight _pad: u32, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust } impl Camera { pub fn new(origin: Vec3) -> Camera { Camera { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight uniforms: CameraUniforms { origin, _pad: 0 }, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust } } pub fn uniforms(&self) -> &CameraUniforms { &self.uniforms } } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [camera-uniforms-padded]: [camera.rs] `CameraUniforms` explicitly padded] We also introduced a new member of type `CameraUniforms` to the `Uniforms` struct. That increased the latter's alignment to 16 and brought its size up to 28 bytes. 28 is not a multiple of the new alignment and the next closest multiple is 32. Therefore we need to pad `Uniforms` with 4 additional bytes: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... #[derive(Copy, Clone, Pod, Zeroable)] #[repr(C)] struct Uniforms { camera: CameraUniforms, width: u32, height: u32, frame_count: u32, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight _pad: u32, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust } impl PathTracer { pub fn new( device: wgpu::Device, queue: wgpu::Queue, width: u32, height: u32, ) -> PathTracer { ... // Initialize the uniform buffer. let camera = Camera::new(Vec3::new(0., -0.5, 1.)); let uniforms = Uniforms { camera: *camera.uniforms(), width, height, frame_count: 0, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight _pad: 0, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust }; ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [uniforms-padded]: [render.rs] `Uniforms` explicitly padded] The padding is currently wasted space but we will make use of it in the future. Running the program should now pass validation and render an image that looks like this: ![Figure [repositioned-camera-origin]: Camera origin repositioned](../images/img-24-camera-origin-repositioned.png) [^ch8-footnote2]: The alignment and size requirements for WGSL types are defined at https://www.w3.org/TR/WGSL/#alignment-and-size. Rotation -------- We know how to reposition the camera but the view direction is still fixed towards $-z$. Remember that we define the camera ray direction for each pixel in terms of a point on an imaginary viewport (Figure [camera-view-space]). Conceptually, rotating the camera to change the view direction is much like moving and rotating the viewport around the camera origin. Let's imagine for a moment that the coordinate system depicted in Figure [camera-view-space] is distinct from the coordinate space of the scene. This new _camera coordinate space_ has its own $x$, $y$, and $z$ axes. The viewport is always parallel to the $xz$-plane and sits some distance away on the $z$-axis. We can even define this coordinate system as left-handed so that the view direction faces the $+z$-axis instead of $-z$. Now imagine that the camera coordinate space exists within the scene coordinate space and it can move around freely. Suppose that the camera coordinate axes can point towards any direction in scene space as long as they satisfy the definition of our left-handed cartesian system: **a)** the axes are always orthogonal to each other (i.e. the angle between any two axes is 90 degrees), and **b)** from the camera's point of view, $+x$ points towards the _right_, $+y$ points _up_, and $+z$ points _forward_. Let's define the scene-space orientation of the camera coordinate axes with 3 unit vectors: $\vec{\textbf{u}}$ for $+x$, $\vec{\textbf{v}}$ for $+y$, and $\vec{\textbf{w}}$ for $+z$. These are the _basis vectors_ of the camera coordinate space. For example, our current camera orientation staring down the $-z$-axis of the scene coordinate space (with $+y$ pointing up), would have the basis vectors $\vec{\textbf{u}} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$, $\vec{\textbf{v}} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$, $\vec{\textbf{w}} = \begin{bmatrix} 0 \\ 0 \\ -1 \end{bmatrix}$. ![Figure [camera-basis-vectors]: Camera basis vectors in relation to camera parameters ](../images/fig-12-camera-basis-vectors.svg) These vectors establish a relationship between the two coordinate systems. Each basis vector tells us how to project the corresponding camera-space axis onto the scene-space axes. With this information, we can transform any vector defined in one space into the other. For example, we can rotate a ray direction vector defined in camera-space into the appropriate scene-space orientation by multiplying it by this matrix: $$ \begin{bmatrix} \textbf{u}.x & \textbf{v}.x & \textbf{w}.x \\ \textbf{u}.y & \textbf{v}.y & \textbf{w}.y \\ \textbf{u}.z & \textbf{v}.z & \textbf{w}.z \end{bmatrix} $$ $\vec{\textbf{u}}$, $\vec{\textbf{v}}$, and $\vec{\textbf{w}}$ have to be unit vectors and orthogonal. Instead of specifying them directly, we will compute them from three parameters: the camera origin, a reference point the camera should "look at", and an "up" direction. The reference point will always appear at the center of the viewport. The vector pointing from the origin to this center point is the view direction $\vec{\textbf{w}}$. The cross product of two vectors yields another vector that is orthogonal to the plane formed by the original two, so once we know $\vec{\textbf{w}}$, we can compute the other two basis vectors using a series of cross products: $$ \begin{aligned} \vec{\textbf{u}} &= \vec{\textbf{w}} \times \vec{\textbf{up}} \\ \vec{\textbf{v}} &= \vec{\textbf{u}} \times \vec{\textbf{w}} \end{aligned} $$ Let's start with the WGSL and extend the `CameraUniforms` structure to hold the basis vectors in addition to the origin. We'll construct a 3x3 matrix out of the basis vectors and use that to transform the ray which we currently compute in camera space. Note that the $z$-coordinate of the camera ray direction no longer needs to be negative, since it's now defined with respect to $\vec{\textbf{w}}$. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... struct CameraUniforms { origin: vec3f, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight u: vec3f, v: vec3f, w: vec3f, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL } ... @fragment fn display_fs(@builtin(position) pos: vec4f) -> @location(0) vec4f { init_rng(vec2u(pos.xy)); let origin = uniforms.camera.origin; let focus_distance = 1.; // Offset and normalize the viewport coordinates of the ray. let offset = vec2(rand_f32() - 0.5, rand_f32() - 0.5); var uv = (pos.xy + offset) / vec2f(f32(uniforms.width - 1u), f32(uniforms.height - 1u)); // Map `uv` from y-down (normalized) viewport coordinates to camera coordinates. uv = (2. * uv - vec2(1.)) * vec2(aspect_ratio, -1.); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight // Compute the scene-space ray direction by rotating the camera-space vector into a new // basis. let camera_rotation = mat3x3(uniforms.camera.u, uniforms.camera.v, uniforms.camera.w); let direction = camera_rotation * vec3(uv, focus_distance); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL var ray = Ray(origin, direction); ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [camera-rotation-matrix]: [shaders.wgsl] Ray direction rotated to camera basis] Once again, we need to pay attention to the required alignment on the CPU side. `u`, `v`, and `w` are declared as `vec3f` which must be aligned to an offset that's a multiple of 16. Since the size of `vec3f` is 12, we need to insert padding after each member to fix the alignment of the next member: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... #[derive(Debug, Copy, Clone, Pod, Zeroable)] #[repr(C)] pub struct CameraUniforms { origin: Vec3, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight _pad0: u32, u: Vec3, _pad1: u32, v: Vec3, _pad2: u32, w: Vec3, _pad3: u32, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust } impl Camera { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight pub fn look_at(origin: Vec3, center: Vec3, up: Vec3) -> Camera { let w = (center - origin).normalized(); let u = w.cross(&up).normalized(); let v = u.cross(&w); Camera { uniforms: CameraUniforms { origin, _pad0: 0, u, _pad1: 0, v, _pad2: 0, w, _pad3: 0, }, } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust } pub fn uniforms(&self) -> &CameraUniforms { &self.uniforms } } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [camera-basis-vectors-cpu]: [camera.rs] Computing the camera basis vectors] Finally, let's update the camera position and orientation to look towards the bottom of the small sphere from above: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... impl PathTracer { pub fn new( device: wgpu::Device, queue: wgpu::Queue, width: u32, height: u32, ) -> PathTracer { ... // Initialize the uniform buffer. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight let camera = Camera::look_at( Vec3::new(0., 0.75, 1.), Vec3::new(0., -0.5, -1.), Vec3::new(0., 1., 0.), ); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust let uniforms = Uniforms { camera: *camera.uniforms(), width, height, frame_count: 0, _pad: 0, }; ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [camera-new-position]: [render.rs] New camera position] ![Figure [camera-reoriented]: New camera orientation](../images/img-25-camera-look-at.png) Zoom ---- Now let's start add controls for camera movement. It is generally useful to be able to bring the camera closer to (or away from) the object in view without changing the viewing angle. Imagine a straight line through the `origin` and `center` parameters of our `Camera::look_at` function. We can achieve a simple _zoom_ effect by moving the camera forwards or backwards along this line. The basis vector $\vec{\textbf{w}}$ already gives us the forward-facing direction on this line and it has unit length. Thus, computing the displacement of the camera origin $\textbf{P}$ along this line by distance $d$ is straightforward: $$ \begin{aligned} \textbf{P}_{forward} &= \textbf{P} + \vec{\textbf{w}} \cdot d \\ \textbf{P}_{backward} &= \textbf{P} - \vec{\textbf{w}} \cdot d \\ \end{aligned} $$ ![Figure [orbit-camera-zoom-fig]: Moving the camera origin along the view direction ](../images/fig-13-orbit-camera-distance.svg) Let's implement this as a new function called `Camera::zoom`. This will take a single parameter representing the displacement. Positive values will move the origin forward while negative values will move it backwards: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... impl Camera { ... pub fn uniforms(&self) -> &CameraUniforms { &self.uniforms } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight pub fn zoom(&mut self, displacement: f32) { self.uniforms.origin += displacement * self.uniforms.w; } } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [camera-fn-zoom]: [camera.rs] The `Camera::zoom` function] The next step is to wire this up to an input method. I personally prefer the scroll wheel on a mouse (or a scroll gesture on a trackpad) for zooming, so I'll show you how to implement that. `winit` sends raw input device events in the form of a `Event::DeviceEvent`. This is an enum type (just like `Event::WindowEvent`) and the specific variant for mouse wheel events is named `DeviceEvent::MouseWheel`. The event has a parameter called `delta` which we can convert to a displacement amount. There are two variants of this parameter: - `MouseScrollDelta::PixelDelta`: represents the delta in "number of pixels", typically generated by a touch screen or trackpad. - `MouseScrollDelta::LineDelta`: represents the delta in terms of "lines in a text document", typically corresponding to the discrete "clicks" of a mouse scroll wheel. The variant you receive depends on your input device. It usually makes sense to apply a scaling factor to this delta, since using it directly is likely to result in a very large displacement in scene coordinates. I used factors of 0.001 and 0.1 for the two events respectively, though the ideal factor is going to depend on your device and system settings. The `delta` value is _signed_, with positive and negative values corresponding to scrolling up and down, which translates nicely to our `displacement` parameter. We are going to handle the mouse scroll event in our main event loop. The event loop code currently doesn't have direct access to the `Camera` object, as it is internal to the `PathTracer` constructor. `PathTracer::new` currently discards the camera object, retaining only the uniform data, as the camera state has so far been static. In addition to retaining the camera state, we also need a way to update the uniforms buffer for changes to take effect before rendering a frame. I'm going to suggest a simple refactor: let's decouple the `Camera` construction from the `PathTracer` object and instead pass the camera as an argument to `PathTracer::render_frame`. We'll simply always update the camera uniforms before rendering an individual frame: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... use crate::{ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust delete algebra::Vec3, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust camera::{Camera, CameraUniforms}, }; ... impl PathTracer { pub fn new( device: wgpu::Device, queue: wgpu::Queue, width: u32, height: u32, ) -> PathTracer { ... // Initialize the uniform buffer. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust delete let camera = Camera::look_at( Vec3::new(0., 0.75, 1.), Vec3::new(0., -0.5, -1.), Vec3::new(0., 1., 0.), ); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust let uniforms = Uniforms { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight camera: CameraUniforms::zeroed(), ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust width, height, frame_count: 0, _pad: 0, }; ... } ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight pub fn render_frame(&mut self, camera: &Camera, target: &wgpu::TextureView) { self.uniforms.camera = *camera.uniforms(); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust self.uniforms.frame_count += 1; self.queue .write_buffer(&self.uniform_buffer, 0, bytemuck::bytes_of(&self.uniforms)); ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [render-frame-with-camera]: [render.rs] `render_frame` with a `Camera` parameter] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust use { anyhow::{Context, Result}, winit::{ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight event::{DeviceEvent, Event, MouseScrollDelta, WindowEvent}, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust event_loop::{ControlFlow, EventLoop}, window::{Window, WindowBuilder}, }, }; mod algebra; mod camera; mod render; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight use crate::{algebra::Vec3, camera::Camera}; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust const WIDTH: u32 = 800; const HEIGHT: u32 = 600; #[pollster::main] async fn main() -> Result<()> { ... let (device, queue, surface) = connect_to_gpu(&window).await?; let mut renderer = render::PathTracer::new(device, queue, WIDTH, HEIGHT); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight let mut camera = Camera::look_at( Vec3::new(0., 0.75, 1.), Vec3::new(0., -0.5, -1.), Vec3::new(0., 1., 0.), ); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust event_loop.run(|event, control_handle| { control_handle.set_control_flow(ControlFlow::Poll); match event { Event::WindowEvent { event, .. } => match event { WindowEvent::CloseRequested => control_handle.exit(), WindowEvent::RedrawRequested => { // Wait for the next available frame buffer. let frame: wgpu::SurfaceTexture = surface .get_current_texture() .expect("failed to get current texture"); let render_target = frame .texture .create_view(&wgpu::TextureViewDescriptor::default()); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight renderer.render_frame(&camera, &render_target); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust frame.present(); window.request_redraw(); } _ => (), }, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight Event::DeviceEvent { event, .. } => match event { DeviceEvent::MouseWheel { delta } => { let delta = match delta { MouseScrollDelta::PixelDelta(delta) => 0.001 * delta.y as f32, MouseScrollDelta::LineDelta(_, y) => y * 0.1, }; camera.zoom(delta); } _ => (), }, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust _ => (), } })?; Ok(()) } ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [mouse-wheel-event]: [main.rs] Updating the camera on mouse wheel events] Now, run this code and use your trackpad or mouse wheel to scroll up and down. You should see some movement but you should also see some "smudging" or "ghosting". This is what I get if I scroll back and forth, pausing at different distances for a few seconds: ![Figure [smuged-zoom]: Ghosts of zoom levels past](../images/img-26-smudged-zoom.png) This is the same effect that we saw in Figure [temporal-blur-effect], which is caused by temporal accumulation. Moving the camera effectively invalidates all the samples we have collected up to that point, as our cached radiance values only make sense for a specific camera configuration. The simplest thing we can do is discard old samples whenever we mutate the camera. Luckily, this is pretty easy to do: the code we added in Listing [sample-accumulation] already ignores old samples for the initial value of `uniforms.frame_count`. So all we need to do is reset the frame count: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... impl PathTracer { ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight pub fn reset_samples(&mut self) { self.uniforms.frame_count = 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust pub fn render_frame(&mut self, camera: &Camera, target: &wgpu::TextureView) { self.uniforms.camera = *camera.uniforms(); self.uniforms.frame_count += 1; ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [path-tracer-reset-frame]: [render.rs] `PathTracer::reset_frame`] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... fn main() -> Result<()> { ... event_loop.run(|event, control_handle| { control_handle.set_control_flow(ControlFlow::Poll); match event { Event::WindowEvent { event, .. } => match event { ... }, Event::DeviceEvent { event, .. } => match event { DeviceEvent::MouseWheel { delta } => { let delta = match delta { MouseScrollDelta::PixelDelta(delta) => 0.001 * delta.y as f32, MouseScrollDelta::LineDelta(_, y) => y * 0.1, }; camera.zoom(delta); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight renderer.reset_samples(); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust } _ => (), } _ => (), } ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [mouse-wheel-reset-samples]: [main.rs] Reset samples on camera zoom] You should no longer see any artifacts when you zoom in and out: ![Figure [zooming-in-and-out]: (video) Zooming in and out ](../images/vid-02-zooming-in-and-out.mp4 autoplay muted loop) Pan --- The next camera movement type on our list is _pan_, which moves the camera left, right, up, or down without changing the view direction. We're going to align these 4 directions to the basis vectors $\vec{\mathbf{u}}$ and $\vec{\mathbf{v}}$ and displace the origin point on the 2D plane that is perpendicular to the view direction. ![Figure [orbit-camera-pan]: Pan movement on the uv-plane. ](../images/fig-14-orbit-camera-pan.svg) A new `Camera::pan` function will accept two delta values that represent displacement in two dimensions ($\vec{\mathbf{u}}$ and $\vec{\mathbf{v}}$). Note that both of these values (`du` and `dv`) can be negative: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... impl Camera { ... pub fn uniforms(&self) -> &CameraUniforms { &self.uniforms } pub fn zoom(&mut self, displacement: f32) { self.uniforms.origin += displacement * self.uniforms.w; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight pub fn pan(&mut self, du: f32, dv: f32) { let pan = du * self.uniforms.u + dv * self.uniforms.v; self.uniforms.origin += pan; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [camera-fn-pan]: [camera.rs] The `Camera::pan` function] Let's continue using the mouse, this time translating its motion to camera movement. `winit` sends the `DeviceEvent::MouseMotion` with a 2D `delta` parameter that contains the mouse displacement in $x$ and $y$ coordinates. Negative and positive values of the $x$ delta corresponds to left and right movement, respectively. Similarly, negative and positive values of the $y$ delta corresponds to movement up and down. Note that the application will receive the `DeviceEvent::MouseMotion` even without input focus. Unless we explicitly control when the camera should and should not move, all mouse movement will result in camera movement and reset the radiance samples. Bumping into the mouse while waiting for a slow render to resolve can be annoying, so let's prevent accidents and require that the user hold down a mouse button during movement. We can use the `DeviceEvent::Button` event to detect when a mouse button gets pressed and released. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust async fn main() -> Result<()> { ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight let mut mouse_button_pressed = false; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust event_loop.run(|event, control_handle| { control_handle.set_control_flow(ControlFlow::Poll); match event { Event::WindowEvent { event, .. } => match event { ... }, Event::DeviceEvent { event, .. } => match event { DeviceEvent::MouseWheel { delta } => { let delta = match delta { MouseScrollDelta::PixelDelta(delta) => 0.001 * delta.y as f32, MouseScrollDelta::LineDelta(_, y) => y * 0.1, }; camera.zoom(delta); renderer.reset_samples(); } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight DeviceEvent::MouseMotion { delta: (dx, dy) } => { if mouse_button_pressed { camera.pan(dx as f32 * 0.01, dy as f32 * -0.01); renderer.reset_samples(); } } DeviceEvent::Button { state, .. } => { // NOTE: If multiple mouse buttons are pressed, releasing any of them will // set this to false. mouse_button_pressed = state == ElementState::Pressed; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust _ => (), } _ => (), } ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [mouse-motion-camera-pan]: [main.rs] Pan camera on click-and-drag] We apply a scale factor to adjust the mouse sensitivity and also flip the sign of `dy` so that moving the mouse upwards pans the camera upwards. ![Figure [camera-pan]: (video) Pan camera with mouse movement](../images/vid-03-camera-pan.mp4 autoplay muted loop) Orbit ----- The zoom and pan controls let us move the `origin` point along the camera basis vectors without changing the view direction. In order to freely look around objects in the scene, we need a way to rotate the basis vectors. The view direction $\vec{\mathbf{w}}$ is parallel to the vector that subtends the `origin` and `center` points. We can effectively re-orient $\vec{\mathbf{w}}$ by simply re-positioning these two points with respect to each other. Keeping `origin` fixed while moving `center` would result in a _first-person_ style camera (imagine shifting your gaze around you without moving) Alternately, keeping `center` fixed while moving `origin` around would appear as moving around while facing the same stationary point. Both are valid approaches, though we're going to focus on the latter. Let's say that `origin` is allowed to move freely around `center` but we require that the distance between the two points remain fixed. Now imagine a sphere that is centered at `center`, with a radius equal to the distance between the two points. All possible positions of `origin` are then located on the surface of this sphere. ![Figure [orbit-camera-angles]: The spherical coordinates of the camera origin, with azimuth angle $\theta$ and altitude angle $\phi$ ](../images/fig-15-orbit-camera-angles.svg) Given the sphere's `center` and its radius, we can represent any point on the surface of the sphere using polar coordinates: an _azimuth_ angle and an _altitude_ angle. These two angles help us define the location of `origin` in terms of rotations around the coordinate axes. This is convenient, since we can easily map mouse movement to changes in polar coordinates, and use this representation to move `origin` around `center`. With a little bit of trigonometry, we can compute $\vec{\mathbf{w}}$ from the two angles. If we also know the distance between the camera and `center`, `origin` can be computed with a simple vector addition. ### Working With Spherical Coordinates Let's expand the `Camera` struct with 4 new parameters: - `center`: the point of camera focus, which serves as the center of rotation. - `azimuth`: the azimuth angle $\theta$, defining rotation around the $y$-axis. Values can range from $0$ to $2\pi$. - `altitude`: the altitude angle $\phi$, defining rotation around the basis vector $\vec{\mathbf{u}}$. We'll allow values to range from $-\frac{\pi}{2}$ to $\frac{\pi}{2}$ such that $sin~\phi$ yields a $y$-coordinate ranging from $-1$ to $1$. - `distance`: the distance between `center` and `origin`. This is assumed to be a positive, non-zero value. The bottom are _spherical coordinates_ for `origin`, ($\theta$, $\phi$, $r$). We're going to define the coordinate system such that ($0$, $0$, $d$) corresponds to a view direction aligned with the $-z$-axis, $d$ units away from `center`. Similarly, spherical coordinates ($0$, $\pi$, $1$) will have the view direction point down the $-y$-axis, with the camera located at the cartesian coordinates ($0$, $1$, $0$). First, we'll rework the scene so that we can more easily observe rotations (the current scene is symmetrical around the $y$-axis, so changes in azimuth would be difficult to tell). Let's also reset the camera: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... async fn main() -> Result<()> { ... let (device, queue, surface) = connect_to_gpu(&window).await?; let mut renderer = render::PathTracer::new(device, queue, WIDTH, HEIGHT); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight let mut camera = Camera::look_at( Vec3::new(0., 0., 1.), Vec3::new(0., 0.,-1.), Vec3::new(0., 1., 0.), ); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [camera-default-position]: [main.rs]] ![Figure [four-spheres-orbit]: Four spheres for reference](../images/img-27-four-spheres.png) For now, we are going to do away with `Camera::look_at` and introduce a new constructor called `Camera::with_spherical_coords`. This will compute the camera uniforms (i.e. the basis vectors and the origin) from the new parameters. Since we are going to need to re-compute the camera uniforms whenever the spherical coordinates change, let's factor out that logic in a helper called `Camera::calculate_uniforms`: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... pub struct Camera { uniforms: CameraUniforms, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight center: Vec3, up: Vec3, distance: f32, azimuth: f32, altitude: f32, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust } ... impl Camera { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust delete pub fn look_at(origin: Vec3, center: Vec3, up: Vec3) -> Camera { let w = (center - origin).normalized(); let u = w.cross(&up).normalized(); let v = u.cross(&w); Camera { uniforms: CameraUniforms { origin, _pad0: 0, u, _pad1: 0, v, _pad2: 0, w, _pad3: 0, }, } } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight pub fn with_spherical_coords( center: Vec3, up: Vec3, distance: f32, azimuth: f32, altitude: f32, ) -> Camera { let mut camera = Camera { uniforms: CameraUniforms::zeroed(), center, up, distance, azimuth, altitude, }; camera.calculate_uniforms(); camera } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust pub fn uniforms(&self) -> &CameraUniforms { &self.uniforms } pub fn zoom(&mut self, displacement: f32) { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust delete self.uniforms.origin += displacement * self.uniforms.w; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight self.distance = (self.distance - displacement).max(0.0); // Prevent negative distance self.uniforms.origin = self.center - self.distance * self.uniforms.w; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust } pub fn pan(&mut self, du: f32, dv: f32) { let pan = du * self.uniforms.u + dv * self.uniforms.v; self.uniforms.origin += pan; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight fn calculate_uniforms(&mut self) { // TODO: calculate the correct w. let w = Vec3::new(0., 0., -1.); let origin = self.center - self.distance * w; let u = w.cross(&self.up).normalized(); let v = u.cross(&w); self.uniforms.origin = origin; self.uniforms.u = u; self.uniforms.v = v; self.uniforms.w = w; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [camera-basis-vectors-cpu]: [camera.rs] `Camera::with_spherical_coords`] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... async fn main() -> Result<()> { ... let (device, queue, surface) = connect_to_gpu(&window).await?; let mut renderer = render::PathTracer::new(device, queue, WIDTH, HEIGHT); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight let mut camera = Camera::with_spherical_coords( Vec3::new(0., 0., -1.), Vec3::new(0., 1., 0.), 2., 0., 0., ); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [camera-default-position-spherical]: [main.rs] Default camera position with spherical coordinates] ### Altitude Control We will use mouse movement to control the altitude and azimuth angles. We are already routing `DeviceEvent::MouseMotion` events to `Camera::pan` but we could reserve _left-click_ drag for orbital movement and _right-click_ drag for pan. The `DeviceEvent::Button` event has a `button` field that can be used to identify the mouse button that was pressed or released: Next, we'll define `Camera::orbit`. Let's initially ignore the azimuth angle. Vertical mouse movement will modify the altitude angle while keeping it between $-\frac{\pi}{2}$ to $\frac{\pi}{2}$. We won't allow the angle to increase or decrease beyond this range, so once the camera moves to one of these extrema, it will stay there unless it is moved in the opposite direction. This will disallow turning the scene "upside down" and spinning continously. Let's also update `Camera::calculate_uniforms` to compute $\vec{\mathbf{w}}$ using only the altitude. Consider the unit-length vector pointing from `center` to `origin`, i.e. $-\vec{\mathbf{w}}$. The $y$-coordinate of this vector is equal to $sin~\phi$. We're ignoring azimuth, so we can simply assign $0$ to the $x$-coordinate, and $cos~\phi$ to the $z$-coordinate: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight use { bytemuck::{Pod, Zeroable}, std::f32::consts::FRAC_PI_2, }; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... impl Camera { ... pub fn uniforms(&self) -> &CameraUniforms { &self.uniforms } pub fn zoom(&mut self, displacement: f32) { self.uniforms.origin += displacement * self.uniforms.w; } pub fn pan(&mut self, du: f32, dv: f32) { let pan = du * self.uniforms.u + dv * self.uniforms.v; self.uniforms.origin += pan; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight pub fn orbit(&mut self, du: f32, dv: f32) { self.altitude = (self.altitude + dv).clamp(-FRAC_PI_2, FRAC_PI_2); self.calculate_uniforms(); } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust fn calculate_uniforms(&mut self) { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight let w = { let (y, z) = self.altitude.sin_cos(); -Vec3::new(0., y, z) }; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust let origin = self.center - self.distance * w; let u = w.cross(&self.up).normalized(); let v = u.cross(&w); self.uniforms.origin = origin; self.uniforms.u = u; self.uniforms.v = v; self.uniforms.w = w; } } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [camera-orbit-altitude]: [camera.rs] `Camera::orbit`, altitude-only] Now, when you hold down the left mouse button and move the camera around, you should see something like this: ![Figure [camera-orbit-altitude]: (video) Adjust altitude angle with mouse movement ](../images/vid-04-camera-orbit-altitude.mp4 autoplay muted loop) If you look carefully, you may notice that the green and yellow spheres swap places when the altitude angle is at one of the extrema. At those angles (i.e. exactly at $-\frac{\pi}{2}$ and $\frac{\pi}{2}$) the view vector $\vec{\textbf{w}}$ becomes parallel to the _up_ vector and their cross product becomes zero. This causes both $\vec{\textbf{u}}$ and $\vec{\textbf{v}}$ to become degenerate. A simple fix is to truncate the range by a small amount, so that the angle can be close but never equal to $-\frac{\pi}{2}$ or $\frac{\pi}{2}$. This only works if _up_ is exactly $(0, 1, 0)$ or $(0, -1, 0)$ and doesn't generalize to other directions: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... impl Camera { ... pub fn orbit(&mut self, du: f32, dv: f32) { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight const MAX_ALT: f32 = FRAC_PI_2 - 1e-6; self.altitude = (self.altitude + dv).clamp(-MAX_ALT, MAX_ALT); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust self.calculate_uniforms(); } ... } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [camera-prevent-degenerate-altitude]: [camera.rs] Apply offset to altitude clamp] That should fix the issue: ![Figure [camera-fixed-altitude-clamp]: (video) Fixed altitude clamp ](../images/vid-05-fixed-altitude-clamp.mp4 autoplay muted loop) ### Azimuth Control The same way we mapped vertical mouse movement (`dv`) to changes in altitude, we'll use horizontal movement `du` to change the control. We won't restrict the horizontal orbit the way we clamped the altitude angle and instead permit orbiting in either direction indefinitely. It still make sense to restrict the value to the $[0, 2\pi]$ range since floating-point precision decreases with large values.[^ch8-footnote3] Though instead of clamping the value we'll just let it wrap, so that an azimuth angle of $3\pi$ results in a value of $\pi$. In Rust, this is achieved with the arithmetic remainder operators `%` and `%=`. These support floating point numbers and retain the sign of the value that's on the left-hand side: for example, if the mouse moves left by $-\frac{5}{2}\pi$ (i.e. -450 degrees) the resulting angle will be $-\frac{1}{2}\pi$: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust use { bytemuck::{Pod, Zeroable}, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust delete std::f32::consts::FRAC_PI_2, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight std::f32::consts::{FRAC_PI_2, PI}, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust }; ... impl Camera { ... pub fn orbit(&mut self, du: f32, dv: f32) { const MAX_ALT: f32 = FRAC_PI_2 - 1e-6; self.altitude = (self.altitude + dv).clamp(-MAX_ALT, MAX_ALT); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight self.azimuth += du; self.azimuth %= 2. * PI; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust self.calculate_uniforms(); } ... } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [camera-orbit-azimuth]: [camera.rs] Modifying the azimuth angle] We assigned the cosine of the altitude angle (i.e. $cos ~ \phi$) to the z-coordinate of the rotated $\vec{\textbf{w}}$. More generally, this quantity is equal to the length of a vector that results from projecting $\vec{\textbf{w}}$ onto the $xz$-plane (see Figure [orbit-camera-angles]). We can compute the $x$ and $z$ components of this vector from the azimuth angle as $(sin ~ \theta, 0, cos ~ \theta)$ scaled by the magnitude $cos ~ \phi$. Combining this with $sin ~ \phi$ for the $y$-coordinate we get: $$ -\vec{\textbf{w}} = \begin{bmatrix} sin~\theta \cdot cos~\phi \\ sin~\phi \\ cos~\theta \cdot cos~\phi \end{bmatrix} $$ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... impl Camera { ... fn calculate_uniforms(&mut self) { let w = { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight let (y, xz_scale) = self.altitude.sin_cos(); let (x, z) = self.azimuth.sin_cos(); -Vec3::new(x * xz_scale, y, z * xz_scale) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust }; let origin = self.center - self.distance * w; let u = w.cross(&self.up).normalized(); let v = u.cross(&w); self.uniforms.origin = origin; self.uniforms.u = u; self.uniforms.v = v; self.uniforms.w = w; } } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [camera-orbit-azimuth-w]: [camera.rs] Computing $\vec{\textbf{w}}$ from azimuth and altitude] We can now rotate the camera horizontally and vertically around the center point: ![Figure [camera-orbit-azimuth]: (video) Horizontal rotation](../images/vid-06-camera-orbit-azimuth.mp4 autoplay muted loop) [^ch8-footnote3]: As a floating point number gets larger, the precision goes down as there are fewer bits to represent the mantissa. This causes the smallest representable _increments_, also known as ULP or "Unit of Least Precision" to get larger. If you allow the azimuth angle to get arbitrarily large, you may find that the same increment in `du` results in a much faster camera rotation. ### `Camera::look_at` The new representation is in terms of spherical angles because this is convenient for computing movement over a sphere. Often we'll have a particular position in mind for the camera so it's nice to have a `Camera::look_at` function that takes an explicit camera origin. Let's bring it back and redefine it using the `with_spherical_coords` function. We can compute the altitude and azimuth angles from the `origin`, `center`, and `up` parameters: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust impl Camera { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight pub fn look_at(origin: Vec3, center: Vec3, up: Vec3) -> Camera { let center_to_origin = origin - center; let distance = center_to_origin.length().max(0.01); // Prevent distance of 0 let neg_w = center_to_origin.normalized(); let azimuth = neg_w.x().atan2(neg_w.z()); let altitude = neg_w.y().asin(); Self::with_spherical_coords(center, up, distance, azimuth, altitude) } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust pub fn with_spherical_coords( center: Vec3, up: Vec3, distance: f32, azimuth: f32, altitude: f32, ) -> Camera { ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [camera-look-at-returns]: [camera.rs] Updated `Camera::look_at`] Materials ========= The `scatter` function that we defined in Section [path tracing] describes what happens when light interacts with a surface: some amount of the energy gets absorbed and the rest scatters off. This is represented in the return type (the `Scatter` struct) as an attenuation factor and a ray direction. The appearance of a real-world object depends on its material composition. This includes surface features--like the overall smoothness of the surface--and molecular properties like electrical conductivity and crystal structure. Generally we can get away with an approximate representation of what makes an object appear like a metal or glass, with parameters that an artist can tune to achieve a particular appearance. We defined the first material parameter in Section [colored spheres]: color. We declared the material color as a field of the `Sphere` structure and used it to compute the attenuation factor in `scatter`. Let's introduce a new data structure `Material` and store the color there instead. We'll declare a separate array to store the `Material` instances and have each `Sphere` reference a material by its array index. Storing the material separately allows multiple `Sphere` instances to reference the same material. This reduces the overall required memory when the number of spheres exceeds the available materials. Another advantage is that it reduces the size of an individual `Sphere` which is better for memory access performance. Note that the `scene` array gets accessed far more frequently during the search for an intersection while the `materials` array is only read from once an intersection has been found. I also rearranged the scene to contain two spheres that use the same material, with a larger sphere acting as the ground and a small one touching the ground at the origin. Let's reposition the camera accordingly: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... async fn main() -> Result<()> { ... let (device, queue, surface) = connect_to_gpu(&window).await?; let mut renderer = render::PathTracer::new(device, queue, WIDTH, HEIGHT); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight let mut camera = Camera::look_at( Vec3::new(0., 0.55, 1.5), Vec3::new(0., 0.5, 0.), Vec3::new(0., 1., 0.), ); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [reposition-camera-for-materials]: [main.rs]] ![Figure [gray-spheres]: Gray mirror spheres](../images/img-29-gray-spheres.png) Diffuse BRDF ------------ The purpose of the `scatter` function is to choose the next segment in the light transport path at the point of intersection. An intersection point in the scene can potentially connect infinitely many transport path edges. This is analogous to the flow of light passing through the point: every path that touches the point yields a sample of the total energy that it contributes to the rendering. At the moment, `scatter` implements a surface that behaves like a perfectly smooth mirror. A perfectly _specular_ (i.e. mirror-like) surface reflects incident light from a given direction always at the same angle, which means we can compute the next transport path segment directly by reflecting the `input_ray` around the normal. A specular surface appears "shiny" because the reflected radiance varies with the viewing direction. In contrast, a matte surface reflects incident light diffusely at many angles, such that the apparent brightness of the surface doesn't vary that much with the viewing direction. An so called "ideal" matte or _diffuse_ surface exhibits _Lambertian reflectance_ in which the reflected radiance is the same in all directions. ![Figure [lambertian-and-mirror-brdf-lobes]: The reflectance lobes for a perfectly specular (left) and a Lambertian diffuse (right) surface. A perfectly specular surface reflects the light beam from incident direction $\omega$ towards a single direction while a Lambertian diffuse surface reflects the beam equally in all directions. The same principle applies in reverse if we treat $\omega$ as the reflected/outgoing light direction. ](../images/fig-16-lambertian-and-mirror-brdf-lobes.svg) To correctly render a diffuse surface we need to consider the entire hemisphere over which the surface receives light. This involves potentially tracing infinitely many rays in all directions and combining their radiance contributions. Tracing infinitely many rays at once to cover every fraction of a continuous hemisphere isn't plausible, however, sampling a large number of directions and taking their average over time works quite well. We already have the mechanism to distribute samples temporally -- all we need to do is pick a random direction (facing outward from the surface) as the next path segment. We can do so by uniformly sampling a point over the unit hemisphere that is aligned with the surface normal. Now we come to the attenuation. Our perfect mirror implementation directly returns the material color. We could do the same for the diffuse material and get a pretty decent result but there are a few things consider if the material should resemble an accurate Lambertian surface. Radiance is a term borrowed from radiometry that is defined as radiant power per unit surface area per unit solid angle[^ch9-footnote1]. The solid angle constrains the measured radiant power (the light energy per unit time) to a small set of directions and the surface area represents a unit patch of the reflecting surface. Intuitively, radiance is defined over a volume through space rather than a dimensionless ray. Hence, when discussing the flow of light energy, the intersection is typically treated as a tiny surface patch with non-zero area rather than a dimensionless point. ![Figure [brdf-hemisphere]: The integration interval $\Omega$ and the differential solid angle $d\omega$. Radiance is defined as radiant flux (or power) arriving at $X$ via $d\omega$. $d\omega$ can be thought of a very constrained subset of directions among infinitely many over a hemosphere. ](../images/fig-10-brdf-hemisphere.svg) A surface patch in 3D space has a different _apparent_ surface area when viewed from different angles. The surface appears largest when the view direction is parallel to the surface normal. When the view direction is exactly tangent to the surface, the patch is no longer visible. Radiance is defined as _per unit surface area_ so as the projected area shrinks at grazing angles, the computed radiance has to increase if the solid angle and the incoming radiant power don't change. In order to keep radiance equal in all directions, the attenuation should scale proportionally to the apparent surface area, decreasing down to zero when the surface is viewed at a 90 degree angle. According to _Lambert's projected area law_, the apparent surface area scales proportionally to the cosine of the angle between the surface normal and the viewing direction. For a sampled incident light direction $\omega$ (which is a unit vector) and surface normal $\vec{\textbf{N}}$ this is given by the absolute value of their dot product: $| \vec{\textbf{N}} \cdot \omega |$. We can simply multiply `material.color` by `dot(N, normalize(output_ray.direction))` and we'll have a Lambertian material. ![Figure [lambert-photometria]: When viewed at an angle, the area $AB$ gets projected onto $AE$, which is smaller. This drawing is from Figure 1 in Johann Heinrich Lambert's "Photometria" [#Lambert1760]. ](../images/img-17-lambert-cosine-law-photometria.png) The material color here is an analog for _albedo_[^ch9-footnote2] which is defined as the ratio of the light reflected by the surface to the total amount of light that it receives.[^ch9-footnote3] Now, suppose that we have a function $I(\omega)$ that gives us the radiance arriving at (or leaving) a surface patch via direction $\omega$, where $\omega$ is a subset of the hemisphere of all light directions ($\Omega$). We can compute the total radiant power arriving at the patch by integrating $I$ over the hemisphere: $$ \int_\Omega I(\omega) ~ | \vec{\mathbf{N}} \cdot \omega | ~ d\omega $$ Suppose now that $\omega_o$ represents a reflected light direction (i.e. `input_ray`) and $\omega_i$ is a single direction from which light arrives at the surface. The reflected radiance $I(\omega_o)$ is given by $$ I(\omega_o) = \int_\Omega f(\omega_o, \omega_i) ~ I(\omega_i) ~ | \vec{\mathbf{N}} \cdot \omega_i | ~ d\omega_i $$ $f$ is a function that returns the reflectance of the surface material for a pair of incident and reflected light direction. The reflectance function for a perfect mirror returns 0 unless the two directions are mirror reflections of each other. The function for a Lambertian surface returns a constant value regardless of the directions. This function is also called a BRDF (for _Bi-directional Reflectance Distribution Function_). Let's ignore $I$ for a moment and only consider the Lambertian surface albedo. Replacing $f$ with the material color, the BRDF becomes a constant and we can pull it out of the integral: $$ albedo = color ~ \int_\Omega | \vec{\mathbf{N}} \cdot \omega | ~ d\omega $$ The integral simply reduces to $\pi$[^ch9-footnote4] which leads to $albedo = \pi ~ color$. The Lambertian BRDF is typically defined as $\frac{color}{\pi}$ in order to make albedo equal to the material color. This is somewhat a matter of choice but we're going to stick with this definition in the rest of this book, such that the Lambertian attenuation for any two given light direction becomes $$ \frac{color}{\pi} ~ | \vec{\mathbf{N}} \cdot \omega_i | $$ Except, we aren't computing the integral analytically but rather by random sampling using _Monte Carlo Integration_[^ch9-footnote5], so we also need to weigh each sample by its probability. Suppose we have some function $g(x)$ that we want to integrate over some domain $A$ by drawing $N$ random samples from some probability distibution. The integral can be approximated as $$ \int_A g(x) ~ dx \approx \dfrac{1}{N}\sum_{\bar{\mathbf{x}}}^N \dfrac{g(\bar{\mathbf{x}})}{\rho(\bar{\mathbf{x}})} $$ where $\rho(\bar{\mathbf{x}})$ is a probability distribution function (PDF) that was used to choose a random sample ($\bar{\mathbf{x}}$). When the sampling PDF is uniform, all samples get equal weight. If we uniformly sample the hemisphere, $\rho$ is a constant $\frac{1}{2\pi}$,[^ch9-footnote6] giving us $$ attenuation = \frac{\frac{color}{\pi} ~ | \vec{\mathbf{N}} \cdot \omega_i |}{\frac{1}{2\pi}} = 2 ~ color ~ | \vec{\mathbf{N}} \cdot \omega_i | $$ This gives us a correct solution and the resulting albedo equals the material color. As it turns out, this isn't the most efficient sampling distribution. Think about the lower parts of the hemisphere where rays are almost tangent to the surface and the dot product is near zero. Radiance computed by tracing paths in these directions will get attenuated so much that they won't significantly contribute to the resulting image. Since these directions are somewhat "unimportant", sampling all directions with equal likelihood ends up wasting computation time. That's not to say that these directions don't contribute any radiance at all. We can sample them less frequently by picking a $\rho$ that is proportional to $| \vec{\mathbf{N}} \cdot \omega_i |$ (i.e. weighted by the cosine of the angle between the two vectors) but we'll need to also weigh them _higher_ to make up for the fact that we sampled those directions less frequently (hence the division by $\rho(\bar{\mathbf{x}})$ in the Monte Carlo formula). ![Uniform hemisphere distribution](../images/fig-18-uniform-dist.png) ![Cosine-weighted hemisphere distribution](../images/fig-18-cosine-weighted-dist.png) Now here is the fun bit: we'll use a cosine-weighted distribution such that $\rho = \frac{|\vec{\mathbf{N}}\cdot\omega_i|}{\pi}.$[^ch9-footnote7] This is a very commonly used distribution for diffuse BRDFs. If we plug this into the formula for our attenuation factor we get: $$ attenuation = \frac{\frac{color}{\pi} ~ | \vec{\mathbf{N}} \cdot \omega_i |}{\frac{|\vec{\mathbf{N}}\cdot\omega_i|}{\pi}} = color $$ This perfectly matches the Lambertian BRDF. Since the factors all cancel out, we can use _color_ as the attenuation and the distribution inherently weighs the samples proportionally. There are a couple ways to sample a cosine-weighted hemisphere. One approach is to uniformly sample a 2D disk on the $xz$-plane and project the sampled point "up" to a hemisphere aligned with the $y-axis$. This produces a sampling direction in the "tangent space" of our surface patch which we'll have to rotate in a way that aligns the $y$-axis with the surface normal in the scene coordinate space. Another approach is to uniformly sample a sphere that sits tangent to the surface. This results in a cosine-weighted distribution when you project the point to the tangent hemisphere. This is also relatively simple to implement: sample the unit sphere to obtain a unit vector and add it to the surface normal. The resulting vector is our sampling direction. Let's implement this approach since a little simpler. ![Figure [cosine-weighted-hemisphere-samples]: The yellow points were uniformly sampled on a sphere. The blue points result from projecting to the yellow points to a hemisphere aligned with the sphere](../images/fig-19-tangent-sphere.svg) We'll start by sampling two numbers between 0 and 1. We'll map one of these numbers to the $[-1, 1]$ range and use it as the $y$-coordinate. We'll map the other number to the $[0, 2]$ range and use it as the spherical azimuth angle on the $xz$-plane. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL const FLT_MAX: f32 = 3.40282346638528859812e+38; const EPSILON: f32 = 1e-3; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight const TWO_PI: f32 = 6.2831853; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... fn rand_f32() -> f32 { return bitcast(0x3f800000u | (xorshift32() >> 9u)) - 1.; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight // Uniformly sample a unit sphere centered at the origin fn sample_sphere() -> vec3f { let r0 = rand_f32(); let r1 = rand_f32(); // Map r0 to [-1, 1] let y = 1. - 2. * r0; // Compute the projected radius on the xz-plane using Pythagorean theorem let xz_r = sqrt(1. - y * y); let phi = TWO_PI * r1; return vec3(xz_r * cos(phi), y, xz_r * sin(phi)); } ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [sample-sphere]: [shaders.wgsl] The `sample_sphere` function] Now let's update `scatter`: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... fn scatter(input_ray: Ray, hit: Intersection, material: Material) -> Scatter { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight let scattered = normalize(hit.normal + (1. - EPSILON) * sample_sphere()); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL let output_ray = Ray(point_on_ray(input_ray, hit.t), scattered); let attenuation = material.color; return Scatter(attenuation, output_ray); } ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [sample-lambertian]: [shaders.wgsl] Sampling the lambertian BRDF using the tangent sphere method] Note that the listing multiplies the sampled direction by `(1. - EPSILON)`. This is to guard against an edge case in which the sampled unit direction is nearly opposite the normal vector. We make the length of the sampled vector slightly shorter than the normal, so that adding the two vectors doesn't result in a zero-length vector due to limited floating point precision. Making this change should give us two Lambertian spheres: ![Figure [lambertian-spheres]: Lambertian spheres](../images/img-30-lambertian-spheres.png) You may see a little bit of noise at first but the image should resolve fast if you have a decent GPU. The noise will be a lot more noticeable when you move the camera since the accumulated samples in the radiance texture get reset on input events. While we did this to avoid ghosting, the noise is just as undesirable in a real-time application. There are strategies to reduce the noise but we are not going to cover them in this book. ![Figure [noise-camera-movement]: (video) We see noise when the accumulation texture gets cleared](../images/vid-08-noise-lambertian.mp4 autoplay muted loop) We now have two material types to choose from. Let's add a new field to the `Material` structure that stores the type of reflectance the material exhibits. For now let's declare this as an integer with "1" representing specular and "0" representing lambertian. We can use this to create a scene with both kinds of material with various colors: ![Figure [lambertian-and-specular-spheres]: Lambertian and specular spheres](../images/img-31-specular-and-lambertian.png) The diffuse surfaces appear matte and featureless but their overall shade still varies based on their surroundings. This applies to their color too: try setting the color of the diffuse sphere on the right to white (i.e. `vec3f(1.)`). It will appear greener in the bottom half where it gets illuminated by the ground and bluer in the upper half where it faces the sky.
[^ch9-footnote1]: Radiance has units Watts per square meter per steradian ($\frac{W}{m^2 \cdot sr}$). "Steradian" is the unit of solid angle. It's analogous to radian but in 3D. [^ch9-footnote2]: [_Ray Tracing In One Weekend_][RTIOW], Chapter 10.3 Modeling Light Scatter and Reflectance. [^ch9-footnote3]: In terms of "radiant power leaving or arriving at a unit surface", albedo is the ratio of radiosity to irradiance. [^ch9-footnote4]: For a complete derivation, see the article ["Deriving Lambertian BRDF from first principles"](https://sakibsaikia.github.io/graphics/2019/09/10/Deriving-Lambertian-BRDF-From-First-Principles.html) by Sakib Saikia. [^ch9-footnote5]: For a refresher on Monte Carlo integration, take a look at [_Ray Tracing: The Rest of Your Life_][RTTROYL], Chapters 3 and 4. [^ch9-footnote6]: This follows from the surface area of a unit hemisphere is equal to $2\pi$. [^ch9-footnote7]: See [Physically Based-Rendering: From Theory To Implementation, 4th Edition, A.5.3 "Cosine-Weighted Hemisphere Sampling"](https://pbr-book.org/4ed/Sampling_Algorithms/Sampling_Multidimensional_Functions#Cosine-WeightedHemisphereSampling). Specular Transmission --------------------- We have so far focused only on the reflected portion of scattered light and assumed the rest gets absorbed into the material. This approach results in surfaces that are opaque, though some materials (such as glass, water, and air) transmit light without fully absorbing it. These materials can appear transparent or translucent according to certain physical properties, including how smooth the surface is and how light interacts with its molecules. The speed of light varies depending on the surrounding medium through which it propagates. For instance, the speed of light in vacuum is 299,792,458 m/s while the speed in glass approximately two-thirds of that (~199,861,638 m/s). When a wave transitions between two media and undergoes a sudden change in its phase velocity (e.g. when when light enters glass from air, or when water waves travel from shallow to deep water) the wavefronts appear to change direction. This phenomenon is called _refraction_. The change in phase velocity is given by the material's _index of refraction_ (IOR)[^ch9-footnote8] and the change in direction is governed by _Snell's Law_. ![Figure [refracting-wavefronts]: Wavefronts undergoing refraction at the transition to a different medium. (Image Credit: Oleg Alexandrov)](../images/Snells_law_wavefronts_by_Oleg_Alexandrov.gif class="small-gif" loop) The IOR for a given medium is defined as the ratio between the speed of light in that medium and $c$, the speed of light in vacuum. For example, glass with a IOR of 1.5 propagates light at $\frac{2}{3}c$. Here are the indices of refraction for some transparent materials (taken from _Physically Based Rendering: From Theory To Implementation_[^ch9-footnote8]): Medium | Index of refraction :------:|:-------------------: Vacuum | 1.0 Air at sea level | 1.00029 Ice | 1.31 Water (20 degrees C) | 1.333 Fused quartz | 1.46 Glass | 1.5-1.6 Sapphire | 1.77 Diamond | 2.42 Snell's Law relates the angle of incidence $\theta_i$ (the angle between the incident light ray and the surface normal) to the angle of refraction $\theta_o$ (the angle between the refracted ray and the negative surface normal) in terms of the ratio of the indices of refraction of the two media: $$ \frac{\eta_o}{\eta_i} = \frac{sin~\theta_i}{sin~\theta_o} $$ ![Figure [snells-law]: Snell's Law](../images/fig-20-snell.svg) In order to correctly implement refraction for a transmissive surface, we need to extend our material definition to specify an index of refraction. For a transmissive material, it makes sense to treat the scattering event at a ray-surface intersection as a transition from one type of medium (e.g. air) into a different type (e.g. glass). The surface of a sphere defines an enclosed volume, so the IOR relates the speed of light inside the sphere to the speed of light outside of it. According to Snell's law, we need the ratio of the IOR values on both sides of the surface, i.e. the _relative_ index of refraction. We have a choice when it comes to the material definition: the material can either store the IOR relative to vacuum or it can store the relative IOR. The former is a little bit more complicated to implement, since we need to keep track of the IOR of the surrounding volume along the light transport path and compute the relative IOR when we intersect a transmissive material. We'll instead go with the second approach: if we want to place a glass sphere inside a volume of water, the material will store $\frac{\eta_{glass}}{\eta_{water}}$. This makes the implementation simpler though at the cost of some flexibility; for instance we can't easily represent a glass sphere that is partially submerged in water and partially exposed to air.[^ch9-footnote9] We can represent the relative IOR as a 32-bit floating point number in a new field of type `f32`: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... struct Material { color: vec3f, specular: u32, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight ior: f32, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL } ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [ior-parameter-one]: [shaders.wgsl] IOR as its own field] Note that the `Material` structure has a 16-byte alignment due to `color`. `ior` itself only uses 4 bytes, so the compiler will add 12 bytes of padding to the end of the struct. This means that every entry in the `materials` array takes up 32 bytes in total. On the other hand, we're not really using all 32-bits that are available in the `specular` field. The two material types that `specular` represents (lambertian and mirror) are both opaque and defining an additional index of refrection isn't particularly useful when the material isn't transmissive. `f32` is a signed number. We're not going to deal with negative IOR values in this book, so the entire negative range of values is unused. With that knowledge, here is an alternative encoding that keeps the footprint of `Material` at 16 bytes: we define a single field of type `f32`. If the value is negative, then the material is transmissive and the absolute value represents the IOR. If the value is $0$, then the material is lambertian. Otherwise, a positive value means the material acts like a mirror. Let's update the material definition with this in mind. I'm choosing the name `specular_or_ior` for this new field, which isn't particularly creative but clearly conveys the intent. Let's also add a new transmissive sphere to the scene, using our new encoding: We now have a 4th material entry with $-1.5$ assigned to the `specular_or_ior` field, so its relative refractive index is 1.5 (approximately representing the air/glass interface). The color is set to `vec3(1.)`, so the material should transmit all light without any absorption. Let's now update the scatter function. Once again we're in luck: just like `reflect`, WGSL defines an intrinsic called `refract` to compute the refracted ray given an incident ray direction, a surface normal, and a relative index of refraction. Let's plug this into our `scatter` function: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... fn scatter(input_ray: Ray, hit: Intersection, material: Material) -> Scatter { var scattered: vec3f; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight if material.specular_or_ior > 0. { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL scattered = reflect(input_ray.direction, hit.normal); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight } else if material.specular_or_ior < 0. { let ior = abs(material.specular_or_ior); scattered = refract(input_ray.direction, hit.normal, ior); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL } else { scattered = sample_lambertian(hit.normal); } let output_ray = Ray(point_on_ray(input_ray, hit.t), scattered); let attenuation = material.color; return Scatter(attenuation, output_ray); } ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [refract-in-scatter]: [shaders.wgsl] Implementing a transmissive material with `refract`] When I run this on my computer, I get a very strange result: ![Figure [nan-propagation]: (video) NaN propagation](../images/vid-09-nan-propagation.mp4 autoplay muted loop) On the 3 GPUs and operating systems[^ch9-footnote10] that I ran this program on, I see a blue ring with a large black circle in the middle instead of a refractive sphere. Don't be alarmed if the result you see is different, as the color and behavior you see can vary depending on your GPU and the graphics API you're using. For me, the blue pixels originate at the ring and gradually spread to its surroundings. They seemingly "radiate" along light transport paths, even casting a "shadow" on the opposite side of the lambertian sphere. The problem has something to do with our use of `refract`, so let's consult the API documentation provided in the WGSL specification:[^ch9-footnote11] > For the incident vector `e1` and surface normal `e2`, and the ratio of indices of refraction `e3`, > `let k = 1.0 - e3 * e3 * (1.0 - dot(e2, e1) * dot(e2, e1))`. If `k` < 0.0, returns the refraction vector 0.0, > otherwise return the refraction vector `e3 * e1 - (e3 * dot(e2, e1) + sqrt(k)) * e2`. The incident vector `e1` and > the normal `e2` should be normalized for desired results according to Snell’s Law; otherwise, the results may not > conform to expected physical behavior. There are two important points being discussed here. Let's first focus on the second half: the input vectors need to be normalized. Our sphere intersection routine already returns a unit length normal vector, so let's generally assume that's always the case and `hit.normal` is already normalized. We can't really assume the same for the incident ray direction, so let's update our code to explicitly normalize it: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... fn scatter(input_ray: Ray, hit: Intersection, material: Material) -> Scatter { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight let incident = normalize(input_ray.direction); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL var scattered: vec3f; if material.specular_or_ior > 0. { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight scattered = reflect(incident, hit.normal); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL } else if material.specular_or_ior < 0. { let ior = abs(material.specular_or_ior); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight scattered = refract(incident, hit.normal, ior); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL } else { scattered = sample_lambertian(hit.normal); } let output_ray = Ray(point_on_ray(input_ray, hit.t), scattered); let attenuation = material.color; return Scatter(attenuation, output_ray); } ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [scatter-normalize-incident-ray]: [shaders.wgsl] Normalizing the input ray direction] Let's run this now: ![Figure [buggy-refract-input-normalized]: Buggy refract with normalized input rays](../images/img-32-buggy-refract-with-normalized-input.png) That's better. The black hole is gone (or at least it's harder to see) but the refraction is still buggy. The API documentation states: _If k < 0.0, returns the refraction vector 0.0._ This means that under certain circumstances, the scattered ray direction can be a null vector. A null vector doesn't represent a meaningful ray direction and is bound to cause numerical issues, so we need to prevent it. ### NaN and INF Let's briefly discuss why a null vector can lead to a bug like this. After finding a valid intersection with the buggy sphere, the null ray direction is used for sphere intersection tests in the next iteration of the path tracing loop. Let's look back at `intersect_sphere`: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... fn intersect_sphere(ray: Ray, sphere: Sphere) -> Intersection { let v = ray.origin - sphere.center; let a = dot(ray.direction, ray.direction); let b = dot(v, ray.direction); let c = dot(v, v) - sphere.radius * sphere.radius; let d = b * b - a * c; if d < 0. { return no_intersection(); } let sqrt_d = sqrt(d); let recip_a = 1. / a; let mb = -b; let t1 = (mb - sqrt_d) * recip_a; let t2 = (mb + sqrt_d) * recip_a; let t = select(t2, t1, t1 > EPSILON); if t <= EPSILON { return no_intersection(); } let p = point_on_ray(ray, t); let N = (p - sphere.center) / sphere.radius; return Intersection(N, t); } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [intersect-sphere-reminder]: [shaders.wgsl] `intersect_sphere`] When `ray.direction` is null, `a` and `b` are both $0$, and consequently so is `d`. The `d < 0.` check evaluates to false and the function moves on to compute `t`. Now we have problem: the line `let recip_a = 1. / a;` divides $1$ by zero. The value of `recip_a` is mathematically undefined but if your GPU follows the standard rules for floating-point arithmetic, the result will be _infinity_. The binary representation and operations over floating point types is governed by the IEEE-754 standard.[^ch9-footnote12] According to the standard, when the result of an operation exceeds the representable range it can produce positive or negative _infinity_ (`±INF`). An example is when a very large number is divided by a very small number -- which surprisingly includes dividing a non-zero number by zero). Another special value is _NaN_ (not a number) which is produced by invalid operations, such as dividing zero by zero and multiplying zero by infinity. When at least one operand in an operation is NaN, the result is defined to be NaN. As such, NaNs are said to "poison" any calculation they are involved in. NaNs are also defined as _unordered_, meaning that boolean comparison operators (i.e. `<`, `<=`, `>=`, `>`, `==`) evaluate to `false` when an operand is NaN.[^ch9-footnote13] Following these rules, we should expect `recip_a` to be `+INF`. Since `d` and `b` are both zero, so are `sqrt_d` and `mb`, which should cause `t1`, `t2`, and `t` to be `NaN`. `t <= EPSILON` would evaluate to false and NaNs would propagate through the rest of the function. Fortunately, `intersect_scene` should treat this as "no intersection" (since the `hit.t > 0. && hit.t < closest_hit.t` check would evaluate to false. Because the procedure doesn't find any intersections, it moves on to compute the sky color with a null ray direction. The first thing that `sky_color` does with the ray direction is normalize it (`normalize(ray.direction).y`) which is implemented by dividing all components of a vector by its scalar magnitude (or "length"). The magnitude of a null vector is $0$, so dividing $0$ by $0$ produces a `t` value of NaN. This means that whenever a light transport path intersects our buggy sphere, our program computes a received radiance of NaN and stores it in the radiance texture, poisoning all future samples for that pixel. Typically NaNs in a framebuffer are treated as $0$, so one might expect the NaN pixels to be displayed as black. I'm seeing blue, i.e. the color of the sphere is exactly the RGB triplet $(0, 0, 1)$. Oddly enough if I change any component of the sky color `vec3(0.3, 0.5, 1.0)` to `1.0`, that component is $1$ in the output pixel and any value that is less than $1$ seems to become $0$. Shouldn't all channels be 0 if the radiance value is `vec3(NaN, NaN, NaN)`? In practice, shader compilers support optimizations for floating-point arithmetic that don't strictly adhere to the standard. For instance, Metal and Vulkan both support _fast math_ modes in which NaN and INF behavior is _undefined_[^ch9-footnote14] (and the WGSL specification explicitly allows them[^ch9-footnote15]). Therefore, `sky_color` is allowed to return `vec3(NaN, NaN, 1.0)` due to an optimization. Your program may produce blue pixels, black pixels, or even a plain white sphere with no NaN propagation whatsoever. It all depends on on your shader compiler configuration and the specific GPU you're running on. ### Total Internal Reflection Let's get back to `refract`. The specification says the result is a null vector if `k` < 0, where `k = 1.0 - e3 * e3 * (1.0 - dot(e2, e1) * dot(e2, e1))`, or equivalently, when `e3 * e3 * (1.0 - dot(e2, e1) * dot(e2, e1))` > 1. This rule is derived from a special case in Snell's Law in which the angle of refraction has no solution. Let's re-arrange the equation from earlier and solve for $\theta_o$: $$ \begin{eqnarray} \frac{\eta_o}{\eta_i} &=& \frac{sin~\theta_i}{sin~\theta_o} \nonumber \\ sin~\theta_o &=& \frac{\eta_i}{\eta_o} sin~\theta_i \nonumber \end{eqnarray} $$ If $\eta_i$ is greater than $\eta_o$, it is possible for the right-hand side of the equation to be greater than one. The range of the sine function is limited to $[-1, 1]$, so no value of $\theta_o$ can satisfy the equation. When this is the case, no refraction occurs and light is completely reflected instead. To handle this case, we'll compute $\frac{\eta_i}{\eta_o} sin~\theta_i$ and reflect the ray if the value is greater than 1. Remember the Pythagorean identity: $$ \begin{eqnarray} sin^2~\theta + cos^2~\theta &=& 1 \nonumber \\ sin~\theta &=& \sqrt{(1 - cos^2~\theta)} \nonumber \end{eqnarray} $$ This involves a square-root, which is a relatively expensive computation. Given the inequality that we're solving for, we can technically avoid it. If we square both sides, we get: $$ \begin{eqnarray} 1 &\ge& \frac{\eta_i}{\eta_o}\sqrt{(1 - cos^2~\theta)} \nonumber \\ 1 &\ge& (\frac{\eta_i}{\eta_o})^2(1 - cos^2~\theta) \nonumber \end{eqnarray} $$ We know how to compute $\cos\theta_i$ using the dot product: `abs(dot(incident, hit.normal))`. The right-hand side of the inequality can then be computed as `ior * ior * (1.0 - abs(dot(incident, hit.normal)))`: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... fn scatter(input_ray: Ray, hit: Intersection, material: Material) -> Scatter { let incident = normalize(input_ray.direction); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight let incident_dot_normal = dot(incident, hit.normal); let cos_theta = abs(incident_dot_normal); // `ior` only has meaning if the material is transmissive. let is_transmissive = material.specular_or_ior < 0.; let is_specular = material.specular_or_ior > 0.; let ior = abs(material.specular_or_ior); let cannot_refract = ior * ior * (1.0 - cos_theta * cos_theta) > 1.; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL var scattered: vec3f; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight if is_specular || (is_transmissive && cannot_refract) { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL scattered = reflect(incident, hit.normal); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight } else if is_transmissive { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL scattered = refract(incident, hit.normal, ior); } else { scattered = sample_lambertian(hit.normal); } let output_ray = Ray(point_on_ray(input_ray, hit.t), scattered); let attenuation = material.color; return Scatter(attenuation, output_ray); } ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [scatter-handle-total-internal-reflection]: [shaders.wgsl] Handling total internal reflection] Let's run this now: ![Figure [broken-refraction-ratio]: Handling total internal reflection](../images/img-33-total-internal-reflection.png) The issue with NaN/INF values is gone and we now see a mirror reflection where the blue ring used to be. The image is still wrong though, as there is now a black circle in the middle again. In the black region, the camera ray refracts into the sphere but the rest of the light transport path gets stuck in a total internal reflection loop. The path tracing loop eventually exits without reaching the sky, hence the black. ### Fixing the refraction ratio The third parameter of `refract` is defined as the _ratio of the indices of refraction_, i.e. $\frac{\eta_i}{\eta_o}$. When a ray arrives from outside the sphere the ratio is $\frac{\eta_{outside}}{\eta_{inside}}$. Conversely, when a ray arrives from the inside the ratio is $\frac{\eta_{inside}}{\eta_{outside}}$. We defined `ior` as $\frac{\eta_{inside}}{\eta_{outside}}$, so we need to take the reciprocal of it when the ray arrives from the outside. How we distinguish the "outside" of a shape from the "inside" is a matter of convention. Typically, the surface normals are defined as facing outward, so we can treat the side facing the normal direction as the "front" face. ![Figure [flipped-normal]: The outward face normal and the flipped normal used for shading when the ray intersects the surface from behind](../images/fig-17-front-facing-normal.svg) The front face is easy enough to detect: if the dot product of the ray direction and the normal vector is negative, then the vectors "oppose" each other and the intersection is on the front face. It is also common practice to flip the normal vector when the intersection is on the back face, so that material evaluation has a consistent frame of reference. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... fn scatter(input_ray: Ray, hit: Intersection, material: Material) -> Scatter { let incident = normalize(input_ray.direction); let incident_dot_normal = dot(incident, hit.normal); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight let is_front_face = incident_dot_normal < 0.; let N = select(-hit.normal, hit.normal, is_front_face); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL let cos_theta = abs(incident_dot_normal); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight // `ior`, `ref_ratio`, and `cannot_refract` only have meaning if the material is transmissive. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL let is_transmissive = material.specular_or_ior < 0.; let is_specular = material.specular_or_ior > 0.; let ior = abs(material.specular_or_ior); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight let ref_ratio = select(ior, 1. / ior, is_front_face); let cannot_refract = ref_ratio * ref_ratio * (1.0 - cos_theta * cos_theta) > 1.; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL var scattered: vec3f; if is_specular || (is_transmissive && cannot_refract) { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight scattered = reflect(incident, N); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL } else if is_transmissive { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight scattered = refract(incident, N, ref_ratio); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL } else { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight scattered = sample_lambertian(N); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL } let output_ray = Ray(point_on_ray(input_ray, hit.t), scattered); let attenuation = material.color; return Scatter(attenuation, output_ray); } ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [fix-refraction-ratio]: [shaders.wgsl] Detecting the front face to the correct refraction ratio] That was the last piece of the puzzle. This should produce the following image: ![Figure [working-specular-transmission]: Working specular transmission](../images/img-34-working-specular-transmission.png) As a quick test, change the index of refraction assigned to the material from `1.5` to `1.0/1.5`. That should trigger the total internal reflection case but the black sphere inside should look different: ![Figure [working-specular-transmission-inverted-ior]: Working specular transmission with inverted IOR](../images/img-35-working-specular-transmission-inverted-ior.png) [^ch9-footnote8]: See [Physically Based Rendering: From Theory To Implementation, 4th Edition, 9.3.2 "Specular Reflection and Transmission: The Index of Refraction"](https://pbr-book.org/4ed/Reflection_Models/Specular_Reflection_and_Transmission#TheIndexofRefraction) [^ch9-footnote9]: Another property that we overlooked is that the index of refraction varies with the wavelength of light (since phase velocity of a wave depends on its wavelength). The visual effect is called _dispersion_. This is the mechanism that creates a rainbow as light disperses through a cloud of water vapor. Our path tracer doesn't represent spectra, as such it can't render this effect. [^ch9-footnote10]: I ran this on Apple silicon running macOS (Metal), an Nvidia RTX GPU on Windows (Vulkan + D3D12), and an Intel iGPU on Linux (Vulkan). [^ch9-footnote11]: WGSL specification, Section 17.5.50: https://www.w3.org/TR/WGSL/#refract-builtin [^ch9-footnote12]: 754-2008 - IEEE Standard for Floating-Point Arithmetic: https://ieeexplore.ieee.org/document/4610935 [^ch9-footnote13]: The standard defines two kinds of NaN: quiet and signaling. Signaling NaNs allow the hardware to generate an exception while quiet NaNs "quietly" propagate through operations. The WGSL and Metal shading language standards explicitly disallow signaling NaNs. [^ch9-footnote14]: See Metal Shading Language Specification, Version 3.2, Section 7.1 "INF, NaN, and Denormalized Numbers; SPIR-V Specification, Version 1.6, Revision 5, Section 3.15 "FP Fast Math Mode". At the time of writing of this book, "fast math" is the default behavior of the Metal shader compiler. [^ch9-footnote15]: See WGSL Specification, Section 15.7.2 "Differences from IEEE-754" (https://www.w3.org/TR/WGSL/#differences-from-ieee754) Dielectric BSDF --------------- Most materials exhibit a mixture of transmission and reflection. Water and glass are both mostly transparent when viewed head on but appear increasingly more mirror-like at grazing angles. The mechanisms that give rise to this effect depend on the physical properties of the surface material as well as the wavelength and polarization of the incident light. Rendering such a surface accurately would require us to compute both the reflected and transmitted radiance and combine them using some view-dependent ratio that is plausible for what we want to simulate. A common method to compute this ratio is to use the Fresnel equations.[^ch9-footnote16] The Fresnel equations relate the surface reflectance (i.e. the reflected portion of the incident beam) to the angle of incidence and the indices of refraction at the surface interface.[^ch9-footnote17] Light is fundamentally a wave of oscillations in the electromagnetic field and the orientation of these oscillations are called _polarizations_. Reflection and transmission are dependent on the incident polarization, so the Fresnel equations come in a pair that define reflectance separately for the two orthogonal linear polarizations relative to the _plane of incidence_[^ch9-footnote18], i.e. the "perpendicular" ($\bot$) and "parallel" ($\parallel$) polarizations: $$ R_{\parallel} = \left(\frac{\eta ~ cos ~ \theta_i - cos ~ \theta_t}{\eta ~ cos ~ \theta_i + cos ~ \theta_t}\right)^2, ~ R_{\bot} = \left(\frac{cos ~ \theta_i - \eta ~ cos ~ \theta_t}{ cos ~ \theta_i + \eta ~ cos ~ \theta_t}\right)^2 $$ where $\eta$ is the relative index of refraction, $\theta_i$ is the angle of the incident ray, and $\theta_t$ is angle of refraction given by Snell's law. We can implement these equations directly for a system that takes polarization into account. We can also take their average to derive a single unpolarized _Fresnel reflectance_ factor.[^ch9-footnote17] Alternately, we can resort to an approximation that looks good enough and save on computation. We defined a valid index of refraction (i.e. a negative valued `specular_or_ior`) in the `Material` type to mean that the surface is transparent and refractive. In reality, opaque materials can have an index of refraction too. Stainless steel -- which visually appears reflective when viewed from any direction -- has a complex-valued refractive index. The real part represents the change in phase velocity and the imaginary part represents how rapidly light gets absorbed by the material, known as the _absorption coefficient_. Electrical conductors (like metals) have a high absorption coefficient for visible light. _Dielectrics_ (which are electrical insulators) generally tend to have a low absorption coefficient for visible wavelengths. The Fresnel equations can be solved for both metals and dielectrics using complex arithmetic but we'll simplify things. We'll assume that an index of refraction is only present for a dielectric and ignore the absorption coefficient (i.e. pretend that its value is $0$). We are also going to ignore polarization. Next, instead of solving the equations above we'll use a very commonly used approximation called _Schlick's Formula_.[^ch9-footnote19] The formula defines Fresnel reflectance $F$ for incident angle $\theta_i$ as: $$ F(\theta_i) = F_0 + (F_{90} - F_0) ~ (1 - cos ~ \theta_i)^5 $$ $F_0$ is reflectance at _normal incidence_ (i.e. 0 degrees) where the surface appears the least reflective. $F_{90}$ is the reflectance at 90 degrees. We'll define $F_{90}$ to be $1$, meaning all materials will exhibit perfect reflectance at a tangent angle (i.e. transmission is 0): $$ F(\theta_i) = F_0 + (1 - F_0) ~ (1 - cos ~ \theta_i)^5 $$ For dielectrics (for which the absorption coefficient is $0$), $F_0$ can be computed from the real-valued index of refraction: $$ F_0 = \left(\frac{\eta - 1}{\eta + 1}\right)^2 $$ Let's quickly translate this to code. If you look closely, the Schlick formula blends between $F_0$ and $1$ using $(1 - cos ~ \theta_i)^5$ as the blend factor. This is a simple enough expression that you can just type it out exactly, or use the [`mix`](https://www.w3.org/TR/WGSL/#mix-builtin) intrinsic: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight fn schlick_fresnel(ior: f32, cos_theta: f32) -> f32 { let u = 1 - cos_theta; let sqrt_f0 = (ior - 1.) / (ior + 1.); let f0 = sqrt_f0 * sqrt_f0; return mix(f0, 1., u * u * u * u * u); } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [schlick-fresnel]: [shaders.wgsl] The `schlick_fresnel` function] Now, if the material is transmissive we need to compute the contributions from both the refracted and reflected paths and blend them with our fresnel factor. This would involve tracing both paths all the way to a light source. This is difficult to do in our current setup in which the loop in the shader's entry point iterates over a single path and invokes `scatter` once for every path segment. Let's imagine the combined reflection and transmission distributions (i.e. the BRDF and BTDF) for a dielectric as a unified _BSDF_ (i.e. a Bi-directional _scattering_ distribution function). The Fresnel factor represents the relative distribution of reflected and transmitted rays, in other words it is the likelihood that a ray gets reflected instead of transmitted.[^ch9-footnote20] As we have done for other distributions, we can sample the BSDF one segment at a time and integrate the result over time. After the check for total internal reflection we are going to randomly decide whether to reflect or refract using the Fresnel reflectance as the probability of reflecting. This choice of probability actually fits well into our Monte Carlo framework, in that we are selecting a distribution that closely matches the BSDF in much the same way as our choice of the cosine-weighted hemisphere distribution matches the Lambertian BRDF. With that, let's reorganize the code to make the flow control more clear. We'll repurpose the `is_specular` variable to represent whether we chose specular reflection over other scattering types (and not just based on `material.specular_or_ior`): ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL ... fn scatter(input_ray: Ray, hit: Intersection, material: Material) -> Scatter { let incident = normalize(input_ray.direction); let incident_dot_normal = dot(incident, hit.normal); let is_front_face = incident_dot_normal < 0.; let N = select(-hit.normal, hit.normal, is_front_face); let cos_theta = abs(incident_dot_normal); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight // `ior` and `ref_ratio` only have meaning if the material is transmissive. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL let is_transmissive = material.specular_or_ior < 0.; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL delete let is_specular = material.specular_or_ior > 0.; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL let ior = abs(material.specular_or_ior); let ref_ratio = select(ior, 1. / ior, is_front_face); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight // Determine whether to use specular reflection. var is_specular: bool; if is_transmissive { let cannot_refract = ref_ratio * ref_ratio * (1.0 - cos_theta * cos_theta) > 1.; is_specular = cannot_refract || schlick_fresnel(ref_ratio, cos_theta) > rand_f32(); } else { is_specular = material.specular_or_ior > 0.; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL var scattered: vec3f; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight if is_specular { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL scattered = reflect(incident, N); } else if is_transmissive { scattered = refract(incident, N, ref_ratio); } else { scattered = sample_lambertian(N); } let output_ray = Ray(point_on_ray(input_ray, hit.t), scattered); let attenuation = material.color; return Scatter(attenuation, output_ray); } ... ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Listing [specular-dielectric-bsdf]: [shaders.wgsl] Implementing a perfect specular dielectric BSDF] Note that due to WGSL's short-circuiting rules, the call to `schlick_fresnel` won't evaluate if `cannot_refract` is true, skipping the computation at angles of total internal reflection. ![Figure [dielectric-bsdf]: Specular Dielectric BSDF](../images/img-36-dielectric-bsdf.png) Here is another look from a different angle. Notice that the reflection of the sky is more prominent on the fringes: ![Figure [dielectric-bsdf]: Specular Dielectric BSDF](../images/img-37-dielectric-bsdf-up-close.png) [^ch9-footnote16]: Augustin-Jean Fresnel was an early-19th-century physicist. His work on diffraction and polarization reinforced the theory that light behaves like a wave. He also invented the "Fresnel Lens" -- a type of lighthouse reflector made of concentric prisms that combine reflection and refraction to focus the beacon. [^ch9-footnote17]: See [Physically Based Rendering: From Theory To Implementation, 4th Edition, 9.3.5 "Specular Reflection and Transmission, The Fresnel Equations"](https://pbr-book.org/4ed/Reflection_Models/Specular_Reflection_and_Transmission#TheFresnelEquations) [^ch9-footnote18]: https://en.wikipedia.org/wiki/Plane_of_incidence [^ch9-footnote19]: Christophe Schlick published this approximation in his paper titled "An Inexpensive BRDF Model for Physically-based Rendering" ([#Schlick1994]), in which he builds on an earlier approximation by Cook and Torrance in their seminal paper "A Reflectance Model for Computer Graphics" ([#CookTorrance1982]). They observed that the complex index of refraction (including the absorption coefficient) is often unknown for all wavelengths in the visible spectrum. Schlick argued that even when the values are known, the precision gained by directly solving the Fresnel equations is not worth the computational cost. [^ch9-footnote20]: See [Physically Based Rendering: From Theory To Implementation, 4th Edition, 9.5 "Dielectric BSDF"](https://pbr-book.org/4ed/Reflection_Models/Dielectric_BSDF) (insert acknowledgments.md.html here) References ========== [#Marsaglia03]: George Marsaglia, [*Xorshift RNGs*](https://www.jstatsoft.org/article/download/v008i14/916), 2003 [#Jenkins13]: Bob Jenkins, [*A hash function for hash Table lookup*](https://www.burtleburtle.net/bob/hash/doobs.html), 2013 [#Hughes13]: J.F. Hughes, A. van Dam, M. McGuire, D.F. Sklar, J.D. Foley, S.K. Feiner, K. Akeley *Computer Graphics: Principles and Practice, 3rd Edition, Section 1.6* [#AlanWolfe2024]: Alan Wolfe [*Beyond White Noise for Real-Time Rendering*](https://youtu.be/tethAU66xaA?si=qIPEwF5XTm8kO3tF) [#Immel86]: David S. Immel, Michael F. Cohen, Donald P. Greenberg *A Radiosity Method For Non-Diffuse Environments* [#Kajiya86]: James T. Kajiya *The Rendering Equation*, 1986 [#Lambert1760]: Johann Heinrich Lambert, *Photometria sive de mensura et gradibus luminis, colorum et umbrae*, 1760. Courtesy of ETH-Bibliothek Zürich, Switzerland. [#McGuire2024GraphicsCodex]: Morgan McGuire, *The Graphics Codex*, 2024 [#Saikia]: Sakib Saikia [*Deriving Lambertian BRDF from first principles*](https://sakibsaikia.github.io/graphics/2019/09/10/Deriving-Lambertian-BRDF-From-First-Principles.html) [#Pharr2023]: Matt Pharr, Wenzel Jakob, and Greg Humphreys, *Physically Based Rendering: From Theory To Implementation, 4th Edition* [#Shirley2019]: Peter Shirley et al, [*Sampling Transformations Zoo*](https://research.nvidia.com/labs/rtr/publication/shirley2019sampling/) [#CookTorrance1982]: Robert L. Cook, Kenneth E. Torrance, *A Reflectance Model for Computer Graphics*, 1982 [#Schlick1994]: Christophe Schlick, *An Inexpensive BRDF Model for Physically-based Rendering*, 1994 [^ericson]: C. Ericson, Real Time Collision Detection [^mcguire-codex]: https://graphicscodex.courses.nvidia.com/app.html [Arman Uguray]: https://github.com/armansito [Steve Hollasch]: https://github.com/hollasch [Trevor David Black]: https://github.com/trevordblack [RTIOW]: https://raytracing.github.io/books/RayTracingInOneWeekend.html [RTTROYL]: https://raytracing.github.io/books/RayTracingTheRestOfYourLife.html [rt-project]: https://github.com/RayTracing/ [gt-project]: https://github.com/RayTracing/gpu-tracing/ [gt-template]: https://github.com/RayTracing/gpu-tracing/blob/dev/code/template [discussions]: https://github.com/RayTracing/gpu-tracing/discussions/ [dxr]: https://en.wikipedia.org/wiki/DirectX_Raytracing [vkrt]: https://www.khronos.org/blog/ray-tracing-in-vulkan [rtiow-cuda]: https://developer.nvidia.com/blog/accelerated-ray-tracing-cuda/ [webgpu]: https://www.w3.org/TR/webgpu/ [Rust]: https://www.rust-lang.org/ [rust-unsafe]: https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html [wgpu]: https://wgpu.rs