Adrian Saldanha

Building a GPU-based bidirectional path tracer from scratch

May 30, 2019

1 Introduction

Computer graphics can sometimes seem like magic. Ultimately, however, it's just a whole bunch of math brought together. The most robust algorithm for simulating the shading you see in films and games is the path tracer.

I am starting an n-part series of articles in which I build a path tracer from scratch. I hope this can be useful to people who are interested in computer graphics. I will include explanations and pseudocode for the basic algorithms of a path tracer, and then we will investigate how to create a GPU-based path tracer and which optimizations and techniques are necessary to build what will hopefully be a real-time, multiple-importance-sampled Monte Carlo path tracer.

2 How a camera produces an image

Path tracers are physically based; that is, they produce an image that should be physically plausible. In order to do this, they need to simulate the propagation of light to the camera.

There are many ways to model light. Light can be thought of as a wave or a particle. Imagine a spherical light source. As a wave: light is emitted from the light source as a spherical wave. This wave has a certain frequency and amplitude, related respectively to color and intensity. As it bounces off objects, the shape of each object influences the shape and intensity of the wave. It finally hits the camera's lens, then the optical sensor, and makes the image we're all familiar with.

In this article it is easiest to think of light as a particle. Individual photons (particles of light) are emitted from the light source in all directions. Each photon has a frequency associated with it. These photons travel in straight lines. When they hit a surface, there is a certain chance of the photon being absorbed, reflected, or refracted. These photons make their way through the scene and eventually hit the lens and then the camera's sensor.

The camera's sensor is actually a grid of sensors, each of which measures a quantity that correlates with the number of photons hitting it over the exposure time granted to the image. Each sensor on this grid has a little filter over it which filters out photons of certain frequencies. An example of one such arrangement is the Bayer filter mosaic.

3 Simulating light propagation with an algorithm

Since photons travel in straight lines, the entire path a photon takes from the light source to the camera sensor can be thought of as a sequence of straight line segments, each connected back-to-back.

We will simulate this process using rays. A ray is specified by an origin point and a direction vector.
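
To make this concrete, here is a minimal sketch of how a ray might be represented in Haskell, the language later articles will use. The type and field names here are my own, not a fixed API:

    -- A 3D vector, used both for points and for directions.
    data Vec3 = Vec3 { vx :: Double, vy :: Double, vz :: Double }
      deriving (Show)

    -- A ray: an origin point and a direction vector.
    data Ray = Ray { rayOrigin :: Vec3, rayDir :: Vec3 }
      deriving (Show)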

Our scene will be composed of three main things: light sources, surfaces, and a camera. We can use an idealized representation of a light source. There are many possible light sources in the real world--spheres, bulbs, cylinders. Really, a light source is just another surface that emits light. However, to keep things simple, we will use a single 3D point in space to represent our light source. Surfaces also have many representations. To keep things simple, we will just render spheres. In the industry, however, objects are usually modeled by artists or laser scanners as meshes of triangles, so eventually we will represent our surfaces with triangles too. Finally, we have our camera.
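
Under those simplifications, the scene might be encoded like this (a sketch building on the Vec3 type above; again, the names are mine):

    -- An idealized light source: a single point in space.
    data Light = PointLight Vec3

    -- A sphere: a center position and a radius.
    data Sphere = Sphere { sphereCenter :: Vec3, sphereRadius :: Double }

    -- A pinhole camera: for now, just the position of the pinhole
    -- (made precise in the next section).
    newtype Camera = Camera { pinhole :: Vec3 }

    -- The scene ties everything together.
    data Scene = Scene { lights :: [Light], spheres :: [Sphere], camera :: Camera }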

4 Representing our camera

There are many representations of a camera. As before, we will start with the simplest version and then move towards a more complex but physically correct version.

Pinhole camera

There is a very easy way to make a camera. If you make a very small hole in one side of a large box, then place a white screen against the inside of the opposite side, the objects in front of the box will be projected onto the screen. This requires the box to completely occlude all light except for what enters through the small hole.

To understand this further: the 3D objects all have light bouncing off them in every direction, with some directions carrying more light than others. Some of that light will pass through the small hole in the box. Light leaving a given 3D point in the scene that is visible from the hole will always strike the screen at one particular position. In this manner, the 3D objects are projected into a 2D image from the perspective of the hole.

We will represent this camera quite simply. Since we are working on a digital computer, we have limited resources, so the screen will be split into a fine grid. For now, we will imagine a square screen. Each element of the grid, also known as a pixel, has a center position.

If you're observant, you will notice that this actually produces an image which is upside down, so we will also have to flip the image to get the final image we are all familiar with.

5 Shooting rays from the camera

We are almost ready to get started. Phew, that was a lot of theory! The code will be relatively short, however. From each pixel's center position, we will create a ray going through the pinhole, whose center we will call \(c\).

As mentioned earlier, a ray can be represented by an origin \(o\) and a direction \(d\). The origin is the position of the pixel on the screen, and the direction is the normalized vector from the pixel's position to the pinhole's position.

The screen can be represented by an xy rectangle; that is, if you were to stand on it and look upwards, you'd be looking in the z direction. We will place the pinhole \(c\) at the origin and, arbitrarily, place the screen 1 unit away from it, making it 1x1 in size. This gives our camera a field of view of about 53 degrees. We will split the screen into a grid of width \(w = 640\) and height \(h = 640\).
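
For reference, the field of view follows from the screen size \(s\) and its distance \(f\) from the pinhole:

$$ \mathrm{fov} = 2 \arctan\left(\frac{s / 2}{f}\right) = 2 \arctan\left(\frac{0.5}{1}\right) \approx 53.13^\circ $$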

Putting that all together, the ray for the pixel at \((p_x, p_y)\), counting from the top left of our screen, is:

$$ o = ((p_x + 0.5) / w - 0.5,\; (p_y + 0.5) / h - 0.5,\; 1) $$

$$ d = \frac{c - o}{\lVert c - o \rVert} $$
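
As a sketch in Haskell, building on the Vec3 and Ray types from earlier (the helper functions are my own, and the pinhole \(c\) sits at the origin as described above):

    -- Basic vector helpers.
    sub :: Vec3 -> Vec3 -> Vec3
    sub (Vec3 ax ay az) (Vec3 bx by bz) = Vec3 (ax - bx) (ay - by) (az - bz)

    normalize :: Vec3 -> Vec3
    normalize (Vec3 x y z) = Vec3 (x / n) (y / n) (z / n)
      where n = sqrt (x*x + y*y + z*z)

    -- Build the ray for the pixel at (px, py) on a w-by-h screen
    -- sitting at z = 1, shooting through the pinhole at the origin.
    pixelRay :: Int -> Int -> Int -> Int -> Ray
    pixelRay w h px py = Ray o d
      where
        o = Vec3 ((fromIntegral px + 0.5) / fromIntegral w - 0.5)
                 ((fromIntegral py + 0.5) / fromIntegral h - 0.5)
                 1
        c = Vec3 0 0 0            -- the pinhole
        d = normalize (c `sub` o) -- from the pixel through the pinhole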

6 Representing our surfaces

We will only render spheres for now. We can then move towards triangles.

We can represent our sphere with a position and a radius. We will then intersect our ray with the spheres to determine the closest sphere and the position of the point in the scene which is to be mapped to our pixel from earlier.

For now we will also ignore the light--we will just render the distance to the sphere.

7 Casting rays out, or finding the 3D point which projects to a pixel

We are looking for the closest point \(s\) which is connected to a pixel along that pixel's ray. Our sphere's surface consists of all the points \(x\) which satisfy the following equation, where \(r\) is the radius of the sphere and \(c\) is its center:

$$ r^2 = ||x - c||^2 $$

Our ray can be mathematically represented as an origin \(o\) and the set of points at distance \(t\) from it along the direction \(d\):

$$ p(t) = o + td $$

We are interested in the points where the ray and the sphere coincide. This means finding the values of \(t\) for which \(p(t)\) also satisfies the sphere equation.

$$ r^2 = ||p(t) - c||^2 $$

Solving for \(t\):

$$ r^2 = (o + td - c) \cdot (o + td - c)$$

Expanding that out and moving all terms to one side:

$$ 0 = (d \cdot d)\,t^2 + 2\,(d \cdot (o - c))\,t + (o - c) \cdot (o - c) - r^2 $$

This is a quadratic equation in \(t\), with:

$$a = d \cdot d$$

$$ b = 2\,(d \cdot (o - c)) $$

$$ c = (o - c) \cdot (o - c) - r^2 $$

(Careful: the \(c\) on the left is a quadratic coefficient, while the \(c\) on the right is still the sphere center.)

Almost done! We can find \(t\) by solving the quadratic equation:

$$ t = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} $$

This equation gives us either two real values of \(t\), one repeated value, or a pair of complex values. If the discriminant \(b^2 - 4ac\) is negative (complex solutions), the ray does not intersect the sphere. Otherwise, we're interested in the smallest non-negative value of \(t\), since negative values lie behind the ray's origin.

Finally, the point we are looking for is given by evaluating the ray equation at that value of \(t\):

$$ s = p(t) = o + td $$
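
Here is how that intersection test might look in Haskell, continuing the sketch from before (the smallest non-negative root is selected, per the discussion above):

    -- Dot product of two vectors.
    dot :: Vec3 -> Vec3 -> Double
    dot (Vec3 ax ay az) (Vec3 bx by bz) = ax*bx + ay*by + az*bz

    -- Evaluate the ray equation p(t) = o + t*d.
    pointAt :: Ray -> Double -> Vec3
    pointAt (Ray o d) t = Vec3 (vx o + t * vx d) (vy o + t * vy d) (vz o + t * vz d)

    -- The smallest non-negative t at which the ray hits the sphere,
    -- or Nothing if it misses entirely.
    intersect :: Ray -> Sphere -> Maybe Double
    intersect (Ray o d) (Sphere ctr r)
      | disc < 0  = Nothing                        -- complex roots: no hit
      | otherwise = case filter (>= 0) [t1, t2] of
          [] -> Nothing                            -- the sphere is behind us
          ts -> Just (minimum ts)
      where
        oc   = o `sub` ctr
        a    = d `dot` d
        b    = 2 * (d `dot` oc)
        c    = oc `dot` oc - r * r
        disc = b * b - 4 * a * c
        t1   = (-b - sqrt disc) / (2 * a)
        t2   = (-b + sqrt disc) / (2 * a)

The hit point \(s\) is then pointAt ray t.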

8 Rendering the image

Now, we make an image which is 640x640 in resolution, with each pixel's color \(c\) a gray value equal to the distance along the ray \(t\) divided by some maximum distance \(t_{max}\), clamped to 1 for anything farther away:

$$ c = t / t_{max} $$
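
Putting everything together, here is a minimal sketch that writes this distance image out as a plain-text PGM file. The choice of PGM, the value of \(t_{max}\), and the sphere's placement (on the negative z axis, in front of the camera) are my own:

    import Data.Maybe (mapMaybe)

    -- Gray value for one pixel: t / tMax, clamped to [0, 1].
    shade :: [Sphere] -> Int -> Int -> Int -> Int -> Double
    shade surfs w h px py =
      case mapMaybe (intersect (pixelRay w h px py)) surfs of
        [] -> 0                         -- the ray missed everything
        ts -> min 1 (minimum ts / tMax)
      where tMax = 10

    -- Render a 640x640 image of a single sphere to sphere.pgm.
    main :: IO ()
    main = writeFile "sphere.pgm" (unlines (header ++ rows))
      where
        (w, h) = (640, 640)
        scene  = [Sphere (Vec3 0 0 (-3)) 1]
        header = ["P2", show w ++ " " ++ show h, "255"]
        rows   = [ unwords [ show (round (255 * shade scene w h px py) :: Int)
                           | px <- [0 .. w - 1] ]
                 | py <- [0 .. h - 1] ]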

9 Next time

You should have an image of a sphere! Unfortunately, it may only be in your head. The next article will include instructions to set up Haskell and produce an image of a sphere for yourself.

In the future we will trace all possible paths from the light to the camera in order to achieve higher quality results.

Finally, we will translate this code into the language of the graphics processor for huge efficiency gains, so that we can get film-quality results in almost real time. By the time that article is released, it will probably be in real time.