Rendering a Sphere on a Quad

Making the Sphere Impostor Feel More Competent

Ben Golus
36 min read · Jan 10, 2021

GPU raytracing is all the rage these days, so let's talk about it! Today we're going to raytrace a single sphere.

Using a fragment shader.

Yeah, I know. Not super fancy. You can do a search on Shadertoy and get hundreds of examples. There are even several great tutorials out there already for doing sphere impostors, which is what this is. So why would I write another article on it? It’s not even the right kind of GPU raytracing!

Well, because the raytracing part isn't really the part I'm going to focus on. This article is more about how to inject opaque ray traced or ray marched objects into a rasterized scene in Unity. But it also goes into some additional tricks for rendering a sphere impostor that aren't always immediately obvious or covered by the other tutorials I've seen. By the end of this we'll have a sphere impostor on a tight quad that supports multiple lights, shadow casting, shadow receiving, and orthographic cameras for the built in forward renderer, and that almost perfectly mimics a high poly mesh. With no extra C# script.

My First Sphere Impostor

As mentioned in the intro, this is a well trodden area. The accurate and efficient math for drawing a sphere is already known. So I’m just going to steal the applicable function from Inigo Quilez to make a basic raytraced sphere shader that we can slap on a cube mesh.

Inigo’s examples are all written in GLSL. So we have to modify that code slightly to work with HLSL. Luckily for this function that really just means a find and replace of vec with float.

float sphIntersect( float3 ro, float3 rd, float4 sph )
{
float3 oc = ro - sph.xyz;
float b = dot( oc, rd );
float c = dot( oc, oc ) - sph.w*sph.w;
float h = b*b - c;
if( h<0.0 ) return -1.0;
h = sqrt( h );
return -b - h;
}

That function takes 3 arguments, the ro (ray origin), rd (normalized ray direction), and sph (sphere position xyz and radius w). It returns the length of the ray from the origin to the sphere surface, or a -1.0 in the case of a miss. Nice and straightforward. So all we need is those three vectors and we've got a nice sphere.

The ray origin is perhaps the easiest point to get. For a Unity shader, that's going to be the camera position, conveniently passed to every shader in the global shader uniform _WorldSpaceCameraPos. For an orthographic camera it's a little more complex, but luckily we don't need to worry about that.

*ominous foreshadowing*

For the sphere position we can use the world space position of the object we're applying the shader to. That can be easily extracted from the object's transform matrix with unity_ObjectToWorld._m03_m13_m23. The radius we can set as some arbitrary value. Let's go with 0.5 for no particular reason.

Lastly is the ray direction. This is just the direction from the camera to the world position of our surrogate mesh. That’s easy enough to get by calculating it in the vertex shader and passing along the vector to the fragment.

float3 worldPos = mul(unity_ObjectToWorld, v.vertex);
float3 rayDir = worldPos - _WorldSpaceCameraPos.xyz;

Note: it is important not to normalize this in the vertex shader. You will need to do that in the fragment shader, as otherwise the interpolated values won't work out. The value we're interpolating is the surface position, not actually the ray direction.

But after all that we’ve got the three values we need to raytrace a sphere.

Now I said the above function returns the ray length. So to get the actual world space position of the sphere's surface, you multiply the normalized ray by the ray length and add the ray origin. You can even get the world normal by subtracting the sphere's position from the surface's position and normalizing. And we pass the ray length to the clip() function to hide anything outside the sphere, since the intersection function returns -1.0 in the case of a miss and clip() discards pixels with negative values.
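Put into code, the core of that fragment shader looks roughly like this. This is a minimal sketch with illustrative variable names; i.rayDir is the un-normalized vector we passed down from the vertex shader.

// minimal sketch of the fragment shader core described above
float3 rayOrigin = _WorldSpaceCameraPos.xyz;
float3 rayDir = normalize(i.rayDir);
float3 spherePos = unity_ObjectToWorld._m03_m13_m23;
// intersect the ray with a radius 0.5 sphere at the object's position
float rayHit = sphIntersect(rayOrigin, rayDir, float4(spherePos, 0.5));
// sphIntersect returns -1.0 on a miss, so clip() discards those pixels
clip(rayHit);
// reconstruct the world space surface position and normal
float3 surfacePos = rayDir * rayHit + rayOrigin;
float3 worldNormal = normalize(surfacePos - spherePos);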

Depth Finder

The last little bit for an effective sphere impostor is the z depth. If we want our sphere to intersect with the world properly, we need to output the sphere's depth from the fragment shader. Otherwise we're stuck using the depth of the mesh we're using to render. This is actually way easier than it sounds. Since we're already calculating the world position in the fragment shader, we can apply the same view and projection matrices that we use in the vertex shader to get the z depth. Unity even includes a handy UnityWorldToClipPos() function to make it even easier. Then it's a matter of having an output argument that uses SV_Depth with the clip space position's z divided by its w.
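In code that's only a couple of lines. A sketch, using the surfacePos from the snippet above and an SV_Depth output argument added to the fragment function:

// the fragment function gains an extra argument:
// out float outDepth : SV_Depth
float4 clipPos = UnityWorldToClipPos(surfacePos);
outDepth = clipPos.z / clipPos.w;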

Put that all together with some basic lighting and you get something like this:

It looks like a sphere, but it’s actually a cube.
A very round cube. Make all the boy cubes go *whaaah!*.

Texturing a Sphere

Well, that’s not too exciting. We should put a texture on it. For that we need UVs, and luckily those are pretty easy for a sphere.

Equirectangular UVs

Let's slap an equirectangular texture on this thing. For that we just need to feed the normal direction into an atan2() and an acos() and we get something like this:

float2 uv = float2(
// atan2 returns a value between -pi and pi
// so we divide by pi * 2 to get -0.5 to 0.5
atan2(normal.z, normal.x) / (UNITY_PI * 2.0),
// acos returns 0.0 at the top, pi at the bottom
// so we flip the y to align with Unity's OpenGL style
// texture UVs so 0.0 is at the bottom
acos(-normal.y) / UNITY_PI
);
fixed4 col = tex2D(_MainTex, uv);
Earth the final frontier.

And look at that we’ve got a perfectly … wait. What’s this!?

Is that the Greenwich Mean Line?

That’s a UV seam! How do we have a UV seam? Well, that comes down to how GPUs calculate mip level for mip maps.

Unseamly

GPUs calculate the mip level by what are known as screen space partial derivatives. Roughly speaking, this is the amount a value changes from one pixel to the one next to it, either above or below. GPUs can calculate this value for each set of 2x2 pixels, so the mip level is determined by how much the UVs change across each of these 2x2 "pixel quads". And when we're calculating the UVs here, the atan2() suddenly jumps from roughly 0.5 to roughly -0.5 between two pixels. That makes the GPU think the entire texture is being displayed between those two pixels. And thus it uses the absolutely smallest mip map it has in response.
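To make that concrete, the hardware's mip selection is roughly equivalent to the following. This is a simplified sketch of the math, not the exact spec, and it assumes a _MainTex_TexelSize vector has been declared so the UVs can be converted into texel units.

// rough approximation of how the GPU picks a mip level
float2 texelCoord = uv * _MainTex_TexelSize.zw;
float2 dx = ddx(texelCoord);
float2 dy = ddy(texelCoord);
// a sudden jump of ~1.0 in UV space makes these derivatives huge,
// so the log2 ends up selecting the smallest mip
float mipLevel = 0.5 * log2(max(dot(dx, dx), dot(dy, dy)));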

So how do we work around this? Why by disabling the mip maps of course!

No no no! We absolutely do not do that. But that’s the usual solution you’ll find to most mip map related issues. (As you may have seen me complain about elsewhere.) Instead a solution was nicely presented by Marco Tarini.

The idea is to use two UV sets with the seams in different places. And for our specific case, the longitudinal UVs calculated by the atan2() are already a -0.5 to 0.5 range, so all we need is a frac() to get them into a 0.0 to 1.0 range. Then use those same partial derivatives to pick the UV set with the least change. The magical function fwidth() gives us how much the value is changing in any screen space direction.

// -0.5 to 0.5 range
float phi = atan2(worldNormal.z, worldNormal.x) / (UNITY_PI * 2.0);
// 0.0 to 1.0 range
float phi_frac = frac(phi);
float2 uv = float2(
// uses a small bias to prefer the first 'UV set'
fwidth(phi) < fwidth(phi_frac) - 0.001 ? phi : phi_frac,
acos(-worldNormal.y) / UNITY_PI
);

And now we have no more seam!

I promise it’s not hiding on the other side.

* edit: It's come to my attention that this technique may only work properly when using Direct3D, integrated Intel GPUs, or (some?) Android OpenGLES devices. The fwidth() function when using OpenGL on desktop may run using higher accuracy derivatives than those used by the GPU to determine mip levels, meaning the seam will still be visible. Metal is guaranteed to always run at a higher accuracy. Vulkan can be forced to run at the lower accuracy by using coarse derivative functions, but as of writing this Unity doesn't seem to properly transpile coarse or fine derivatives. I wrote a follow up with some alternate solutions here:

Alternatively, you could just use a cube map instead. Unity can convert an imported equirectangular texture into a cube map for you. But that means you lose out on anisotropic filtering. The UVW for a cube map texture sample is essentially just the sphere's normal. You do need to flip at least the x or z axis though, because cube maps are assumed to be viewed from the "inside" of a sphere and here we want it to map to the outside.
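Sampling the cube map is then just a normal texCUBE() call with the flipped normal. A sketch, where _CubeMap is a hypothetical texture property name:

// declared alongside _MainTex (hypothetical property name)
samplerCUBE _CubeMap;
// the sphere normal is the cube map UVW, with the x axis flipped so the
// texture maps to the outside of the sphere instead of the inside
fixed4 col = texCUBE(_CubeMap, float3(-worldNormal.x, worldNormal.yz));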

Crunchy Edges (aka Derivatives Strike Again)

At this point if we compare the raytraced sphere shader we have with an actual high poly mesh sphere using the same equirectangular UVs, you may notice something else odd. It looks like there’s an outline around the raytraced sphere that the mesh does not have. A really aliased outline.

Crunchy “outline” on the impostor.

The cause is our pesky derivatives again. There's one more UV seam we missed! On a mesh, derivatives are calculated per pixel quad, per triangle. In fact, if a triangle only touches a single pixel of one of those 2x2 pixel quads, the GPU still runs the fragment shader for all 4 pixels! The advantage of this is it can accurately calculate plausible derivatives, which prevents this problem on a real mesh. But we don't have a good UV outside of the sphere; the intersection function just returns a constant -1.0 on a miss, so we end up with bogus UVs outside of the sphere. We can see this clearly by commenting out the clip() and outDepth lines in the shader.

The Hidden UV Seam

What we want is for the UVs to be something close to the value at the visible edge of the sphere, or maybe just past the edge. That’s surprisingly complicated to calculate. But we can get something reasonably close by finding the closest point on a ray to the sphere center. At the exact sphere edge, this is 100% accurate, but it starts to curve towards the camera slightly as you get further away from the sphere. But this is cheap and good enough to get rid of the problem and is nearly indistinguishable from a fully correct fix.

Even better, we can apply this fix by replacing the ray length with a single dot() when the sphere intersection function returns a -1.0. A super power of the dot product of two vectors is that, if at least one vector is normalized, the output is the magnitude of the other vector along the direction of the normalized vector. This is great for getting how far away something is in a certain direction, like how far the camera is from the sphere's pivot along the view ray.

// same sphere intersection function
float rayHit = sphIntersect(rayOrigin, rayDir, float4(0,0,0,0.5));
// clip if -1.0 to hide sphere on miss
clip(rayHit);
// dot product gets ray length at position closest to sphere
rayHit = rayHit < 0.0 ? dot(rayDir, spherePos - rayOrigin) : rayHit;
No longer seamful.

Object Scale & Rotation

So that's all going well, but what if we want to make a bigger sphere or rotate it? We can move the mesh position around and the sphere tracks with it, but everything else is ignored.

We could change the sphere radius manually, but then you’d have to manually keep the mesh you’re using in sync. So it’d be easier to extract the scale from the object transform itself. And we could apply an arbitrary rotation matrix, but again it’d be way easier if we could just use the object transform.

Or we could do something even easier and do the raytracing in object space! This comes with a few other benefits we’ll get into. But before that we want to add a few lines to our shader code. First we want to use the unity_WorldToObject matrix to transform the ray origin and ray direction into object space in the vertex shader. In the fragment shader, we no longer need to get the world space object position from the transform since the sphere can now just be at the object’s origin.

// vertex shader
float3 worldSpaceRayDir = worldPos - _WorldSpaceCameraPos.xyz;
// only want to rotate and scale the dir vector, so w = 0
o.rayDir = mul(unity_WorldToObject, float4(worldSpaceRayDir, 0.0));
// need to apply full transform to the origin vector
o.rayOrigin = mul(unity_WorldToObject, float4(_WorldSpaceCameraPos.xyz, 1.0));
// fragment shader
float3 spherePos = float3(0,0,0);

With this change alone, you can now rotate and scale the game object and the sphere scales and rotates as you would expect. It even supports non-uniform scaling! Just remember that all of those “world space” positions in the shader are now in object space. So we need to transform the normal and sphere surface position to world space. Just be sure to use the object space normal for the UVs.

// now gets an object space surface position instead of world space
float3 objectSpacePos = rayDir * rayHit + rayOrigin;
// still need to normalize this in object space for the UVs
float3 objectSpaceNormal = normalize(objectSpacePos);
float3 worldNormal = UnityObjectToWorldNormal(objectSpaceNormal);
float3 worldPos = mul(unity_ObjectToWorld, float4(objectSpacePos, 1.0));
Big, little, and terrible sandwich Earths.

Other advantages are better overall precision, as using world space for everything can cause some precision issues when getting far away from the origin. Those are at least partially avoided when using object space. It also means we can remove the usage of spherePos in several places since it’s all zeros, simplifying the code a bit.

Using a Quad

So far we've been using a cube mesh for all of this. There are some minor benefits to using a cube for some use cases, but I promised a quad in the title of this article. And really, there's no good reason to use an entire cube for a sphere. There's a lot of wasted space around the sides where we're paying the cost of rendering the sphere where we know it's not going to be. Especially with the default Unity cube, which has 24 vertices! Why waste calculating the extra 20 vertices?

Billboard Shader

There are several examples of billboard shaders out there. The basic idea for all of them is you ignore the rotation (and scale!) of the object’s transform and instead align the mesh to face the camera in some way.

View Facing Billboard

Probably the most common version of this is a view facing billboard. This is done by transforming the pivot position into view space and adding the vertex offsets to the view space position. This is relatively cheap to do. Just remember to update the ray direction to match.

// get object's world space pivot from the transform matrix
float3 worldSpacePivot = unity_ObjectToWorld._m03_m13_m23;
// transform into view space
float3 viewSpacePivot = mul(UNITY_MATRIX_V, float4(worldSpacePivot, 1.0));
// object space vertex position + view pivot = billboarded quad
float3 viewSpacePos = v.vertex.xyz + viewSpacePivot;
// calculate the object space ray dir from the view space position
o.rayDir = mul(unity_WorldToObject,
mul(UNITY_MATRIX_I_V, float4(viewSpacePos, 0.0))
);
// apply projection matrix to get clip space position
o.pos = mul(UNITY_MATRIX_P, float4(viewSpacePos, 1.0));

However, if we just add the above code to our shader, there’s something not quite right with the sphere. It’s getting clipped on the edges, especially when the sphere is to the side or close to the camera.

Thinking too far outside the box.

This is because the quad is a flat plane, and the sphere is not. A sphere has some depth. Due to perspective the volume of the sphere will cover more of the screen than the quad does!

Artist’s Recreation of the Crime Scene

A solution to this you might use is to scale the billboard up by some arbitrary amount. But this doesn't fully solve the problem, as you have to scale the quad up quite a bit, especially if you can get close to the sphere or have a very wide FOV. And this partially defeats the purpose of using a quad over a cube to begin with. Even with relatively small scale increases, more pixels end up rendering empty space than they did with the cube.

Camera Facing Billboard

Luckily, we can do a lot better. A partial fix is to use a camera facing billboard instead of a view facing billboard and pull the quad towards the camera slightly. The difference between view facing and camera facing billboards is a view facing billboard is aligned with the direction the view is facing. A camera facing billboard is facing the camera’s position. The difference can be subtle, and the code is a bit more complex.

Instead of doing things in view space, we instead need to construct a rotation matrix that rotates a quad towards the camera. This sounds scarier than it is. You just need to get the vector that points from the object position to the camera, the forward vector, and use cross products to get the up and right vectors. Put those three vectors together and you have yourself a rotation matrix.

float3 worldSpacePivot = unity_ObjectToWorld._m03_m13_m23;
// offset between pivot and camera
float3 worldSpacePivotToCamera = _WorldSpaceCameraPos.xyz - worldSpacePivot;
// camera up vector
// used as a somewhat arbitrary starting up orientation
float3 up = UNITY_MATRIX_I_V._m01_m11_m21;
// forward vector is the normalized offset
// this is the direction from the pivot to the camera
float3 forward = normalize(worldSpacePivotToCamera);
// cross product gets a vector perpendicular to the input vectors
float3 right = normalize(cross(forward, up));
// another cross product ensures the up is perpendicular to both
up = cross(right, forward);
// construct the rotation matrix
float3x3 rotMat = float3x3(right, up, forward);
// the above rotate matrix is transposed, meaning the components are
// in the wrong order, but we can work with that by swapping the
// order of the matrix and vector in the mul()
float3 worldPos = mul(v.vertex.xyz, rotMat) + worldSpacePivot;
// ray direction
float3 worldRayDir = worldPos - _WorldSpaceCameraPos.xyz;
o.rayDir = mul(unity_WorldToObject, float4(worldRayDir, 0.0));
// clip space position output
o.pos = UnityWorldToClipPos(worldPos);

This is better, but still not good. The sphere is still clipping the edges of the quad. Actually, all four edges now. At least it's centered. Well, we forgot to move the quad toward the camera! Technically we could also scale the quad by an arbitrary amount too, but let's come back to that point.

float3 worldPos = mul(float3(v.vertex.xy, 0.3), rotMat) + worldSpacePivot;

We’re ignoring the z of the quad and adding a small (arbitrary) offset to pull it towards the camera. The advantage of this option vs an arbitrary scaling is it should stay more closely confined to the bounds of the sphere when further away, and scale when closer just due to the perspective change, just like the sphere itself. It only starts to cover significantly more screen space than needed when really close. I chose 0.3 in the above example because it was a good balance of not covering too much of the screen when close by, while still covering all of the viewable sphere until you’re really, really close.

You know, you could probably figure the exact value to use to pull or scale the quad for a given distance from the sphere with a bit of math …

Perfect Perspective Billboard Scaling

Wait! We can figure out the value using a bit of math! We can get the exact size the quad needs to be at all camera distances from the sphere. Just needs some basic high school math!

We can calculate the angle between the camera to pivot vector and camera to visible edge of the sphere. In fact it’s always a right triangle with the 90 degree corner at the sphere’s surface! Remember your old friend SOHCAHTOA? We know the distance from the camera to the pivot, that’s the hypotenuse. And we know the radius of the sphere. From that we can calculate the base of the right angle triangle formed from projecting that angle to the quad’s plane. With that we can scale the quad instead of modifying the v.vertex.z.

// get the sine of the right triangle with the hypotenuse being the
// sphere pivot distance and the opposite being the sphere radius
// (viewOffset here is the pivot to camera offset vector; only its
// length matters for this line)
float sinAngle = 0.5 / length(viewOffset);
// convert to cosine
float cosAngle = sqrt(1.0 - sinAngle * sinAngle);
// convert to tangent
float tanAngle = sinAngle / cosAngle;
// those previous two lines are the equivalent of this, but faster
// tanAngle = tan(asin(sinAngle));
// get the opposite face of the right triangle with the 90 degree
// angle at the sphere pivot, multiplied by 2 to get the quad size
float quadScale = tanAngle * length(viewOffset) * 2.0;
// scale the quad by the calculated size
float3 worldPos = mul(float3(v.vertex.xy, 0.0) * quadScale, rotMat) + worldSpacePivot;

Accounting for Object Scale

At the beginning of this we converted everything over to object space so we could trivially support rotation and scale. We still support rotation, since the quad's orientation doesn't actually matter. But the quad doesn't scale with the object's transform like the cube did. The easiest fix for this is to extract the scale from the axes of the transform matrix and multiply the radius we're using by the max scale.

// get the object scale
float3 scale = float3(
length(unity_ObjectToWorld._m00_m10_m20),
length(unity_ObjectToWorld._m01_m11_m21),
length(unity_ObjectToWorld._m02_m12_m22)
);
float maxScale = max(abs(scale.x), max(abs(scale.y), abs(scale.z)));
// multiply the sphere radius by the max scale
float maxRadius = maxScale * 0.5;
// update our sine calculation using the new radius
float sinAngle = maxRadius / length(viewOffset);
// do the rest of the scaling code

Now you can uniformly scale the game object and the sphere will still remain perfectly bound by the quad.

Ellipsoid Bounds?

It should also be possible to calculate the exact bounds of an ellipsoid, or non-uniformly scaled sphere. Unfortunately that’s starting to get a bit more difficult. So I’m not going to put the effort into solving that problem now. I’ll leave this as “an exercise for the reader.” (Aka, I have no idea how to do it.)

Frustum Culling

One additional problem with using a quad is Unity's frustum culling. It has no idea that the quad is being rotated in the shader, so if the game object is rotated so it's being viewed edge on, it may get frustum culled while the sphere should still be visible. The fix for this would be to use a custom quad mesh that's had its bounds manually modified from C# code to be a box. Alternatively you can use a quad mesh with one vertex pushed forward and one back by 0.5 on the z axis, since we're already flattening the mesh in the shader by replacing v.vertex.z with 0.0.

Shadow Casting

So now we have a nicely rendered sphere on a quad that is lit, textured, and can be moved, scaled, and rotated around. So let's make it cast shadows! For that we'll need to make a shadow caster pass in our shader. Luckily the same vertex shader can be reused for both passes, since all it does is create a quad and pass along the ray origin and direction. And those of course will be exactly the same for the shadows as they are for the camera, right? Then the fragment shader really just needs to output the depth, so you can delete all that pesky UV and lighting code.

Oh.

The ray origin and direction need to be coming from the light, not the camera. And the value we’re using for the ray origin is always the current camera position, not the light. The good news is that’s not hard to fix. We can replace any usage of _WorldSpaceCameraPos with UNITY_MATRIX_I_V._m03_m13_m23 which gets the current view’s world position from the inverse view matrix. Now as long as the shadows are rendered with perspective projections it should all work!
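That replacement is a one liner, and it shows up again in the combined vertex shader code further down.

// world space position of the current view, camera or shadow casting light
float3 worldSpaceViewPos = UNITY_MATRIX_I_V._m03_m13_m23;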

Oh. Oh, no.

Directional shadows use an orthographic projection.

Orthographic Pain

The nice thing with perspective projection and ray tracing is the ray origin is where the camera is. That’s really easy to get, even for arbitrary views, as shown above. For orthographic projections the ray direction is the forward view vector. That’s easy enough to get from the inverse view matrix again.

// forward in view space is -z, so we want the negative vector
float3 worldSpaceViewForward = -UNITY_MATRIX_I_V._m02_m12_m22;

But how do we get the orthographic ray origin? If you try and search online you’ll probably come across a bunch of examples that use a c# script to get the inverse projection matrix. Or abuse the current unity_OrthoParams which has information about the orthographic projection’s width and height. You can then use the clip space position to reconstruct the near view plane position the ray is originating from. The problem with these approaches is they’re all getting the camera’s orthographic settings, not the current light’s. So instead we have to calculate the inverse matrix in the shader!

float4x4 inverse(float4x4 m) {
float n11 = m[0][0], n12 = m[1][0], n13 = m[2][0], n14 = m[3][0];
float n21 = m[0][1], n22 = m[1][1], n23 = m[2][1], n24 = m[3][1];
float n31 = m[0][2], n32 = m[1][2], n33 = m[2][2], n34 = m[3][2];
float n41 = m[0][3], n42 = m[1][3], n43 = m[2][3], n44 = m[3][3];
float t11 = n23 * n34 * n42 - n24 * n33 * n42 + n24 * n32 * n43 - n22 * n34 * n43 - n23 * n32 * n44 + n22 * n33 * n44;
// ... hold up, how many more lines are there of this?!

Ok, let's not do that. Those are just the first few lines of a >30 line function of increasing length and complexity. There's got to be a better way.

The Nearly View Plane

As it turns out, you don’t need any of that. We don’t actually need the ray origin to be at the near plane. The ray origin really just needs to be the mesh’s position pulled back along the forward view vector. Just far enough to make sure it’s not starting inside the volume of the sphere. At least assuming the camera itself isn’t already inside the sphere. And a “near plane” at the camera’s position instead of the actual near plane totally fits that bill.

We already know the world position of the vertex in the vertex shader. So we can transform the world position into view space, zero out the viewSpacePos.z, and transform back into world space. That results in a usable ray origin for an orthographic projection!

// transform world space vertex position into view space
float4 viewSpacePos = mul(UNITY_MATRIX_V, float4(worldPos, 1.0));
// flatten the view space position to be on the camera plane
viewSpacePos.z = 0.0;
// transform back into world space
float4 worldRayOrigin = mul(UNITY_MATRIX_I_V, viewSpacePos);
// orthographic ray dir
float3 worldRayDir = worldSpaceViewForward;
// and to object space
o.rayDir = mul(unity_WorldToObject, float4(worldRayDir, 0.0));
o.rayOrigin = mul(unity_WorldToObject, worldRayOrigin);

And really we don’t even need to do all that. Remember that super power of the dot() I mentioned above? We just need the camera to vertex position vector and the normalized forward view vector. We already have the camera to vertex position vector, that’s the original perspective world space ray direction. And we know the forward view vector by extracting it from the matrix mentioned above. Conveniently this vector comes already normalized! So we can remove two of the matrix multiplies in the above code and do this instead:

float3 worldSpaceViewPos = UNITY_MATRIX_I_V._m03_m13_m23;
float3 worldSpaceViewForward = -UNITY_MATRIX_I_V._m02_m12_m22;
// originally the perspective ray dir
float3 worldCameraToPos = worldPos - worldSpaceViewPos;
// orthographic ray dir
float3 worldRayDir = worldSpaceViewForward * dot(worldCameraToPos, worldSpaceViewForward);
// orthographic ray origin
float3 worldRayOrigin = worldPos - worldRayDir;
o.rayDir = mul(unity_WorldToObject, float4(worldRayDir, 0.0));
o.rayOrigin = mul(unity_WorldToObject, float4(worldRayOrigin, 1.0));

* There is one minor caveat. This does not work for oblique projections (aka a sheared orthographic projection). For that you really do need the inverse projection matrix. Sheared perspective projections are fine though!

Light Facing Billboard

Remember how we're doing camera facing billboards? And that fancy math to scale the quad to account for the perspective? We don't need any of that for an orthographic projection. We just need to do view facing billboarding and scale the quad by only the object transform's max scale. However, maybe let's not delete all of that code quite yet. We can use the existing rotation matrix construction as is, just change the forward vector to be the negative worldSpaceViewForward vector instead of the worldSpacePivotToCamera vector.

A Point of Perspective

In fact now might be a good time to talk about how the spot lights and point lights use perspective projection. If we want to support directional lights, spot lights, and point light shadows we’re going to need to support both perspective and orthographic in the same shader. Unity also uses this pass to render the camera depth texture. This means we need to detect if the current projection matrix is orthographic or not and choose between the two paths.

Well, we can find out what kind of projection matrix we’re using by checking a specific component of it. The very last component of a projection matrix will be 0.0 if it’s a perspective projection matrix, and will be 1.0 if it’s an orthographic projection matrix.

bool isOrtho = UNITY_MATRIX_P._m33 == 1.0;
// billboard code
float3 forward = isOrtho ? -worldSpaceViewForward : normalize(worldSpacePivotToCamera);
// do the rest of the billboard code
// quad scaling code
float quadScale = maxScale;
if (!isOrtho)
{
// do that perfect scaling code
}
// ray direction and origin code
float3 worldRayOrigin = worldSpaceViewPos;
float3 worldRayDir = worldPos - worldRayOrigin;
if (isOrtho)
{
worldRayDir = worldSpaceViewForward * dot(worldRayDir, worldSpaceViewForward);
worldRayOrigin = worldPos - worldRayDir;
}
o.rayDir = mul(unity_WorldToObject, float4(worldRayDir, 0.0));
o.rayOrigin = mul(unity_WorldToObject, float4(worldRayOrigin, 1.0));
// don't worry, I'll show the whole vertex shader later

And now we have a vertex function that can correctly handle both orthographic and perspective projection! And nothing needs to change in the fragment shader to account for this. Oh, and we really can use the same function for both the shadow caster and forward lit pass. And now you can use an orthographic camera as well!

Shadow Bias

Now if you'd been following along, you'll have a shadow caster pass outputting depth. But we're not calling any of the functions a shadow caster pass usually uses for applying the shadow bias offset. At the moment this isn't obvious since we're not self shadowing yet, but it'll be a problem if we don't fix it.

We’re not going to use the built in TRANSFER_SHADOW_CASTER_NORMALOFFSET(o) macro for the vertex shader for this since we need to do the bias in the fragment shader. Luckily, there’s another benefit to doing the raytracing in object space. The first function that the shadow caster vertex shader macro calls assumes the position being passed to it is in object space! I mean, that makes sense, since it assumes it’s working on the starting object space vertex position. But this means we can use the biasing functions the shadow caster macros call directly using the position we’ve raytraced and they’ll just work!

Yeah, really still just a quad.
Tags { "LightMode" = "ShadowCaster" }

ZWrite On
ZTest LEqual

CGPROGRAM
#pragma vertex vert
#pragma fragment frag_shadow
#pragma multi_compile_shadowcaster

// yes, I know the vertex function is missing
fixed4 frag_shadow (v2f i,
out float outDepth : SV_Depth
) : SV_Target
{
// ray origin
float3 rayOrigin = i.rayOrigin;
// normalize ray vector
float3 rayDir = normalize(i.rayDir);
// ray sphere intersection
float rayHit = sphIntersect(rayOrigin, rayDir, float4(0,0,0,0.5));
// above function returns -1 if there's no intersection
clip(rayHit);
// calculate object space position
float3 objectSpacePos = rayDir * rayHit + rayOrigin;
// output modified depth
// yes, we pass in objectSpacePos as both arguments
// second one is for the object space normal, which in this case
// is the normalized position, but the function transforms it
// into world space and normalizes it so we don't have to
float4 clipPos = UnityClipSpaceShadowCasterPos(objectSpacePos, objectSpacePos);
clipPos = UnityApplyLinearShadowBias(clipPos);
outDepth = clipPos.z / clipPos.w;
return 0;
}
ENDCG

That's it. And this works for every shadow caster variant.* Directional light shadows, spot light shadows, point light shadows, and the camera depth texture! You know, should we ever want to support multiple lights…

* I didn’t add support for GLES 2.0 point light shadows. That requires outputting the distance from the light as the shadow caster pass’s color value instead of just a hard coded 0. It’s not too hard to add, but it makes the shader a bit messier with a few #if and special case data we’d need to calculate. So I didn’t include it.
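For the curious, it would look very roughly like the following. This is an untested sketch using Unity's cube shadow helpers, shown only to give an idea of the shape of it.

#if defined(SHADOWS_CUBE) && !defined(SHADOWS_CUBE_IN_DEPTH_TEX)
// GLES 2.0 style point light shadows want the encoded light to surface
// distance as the output color instead of a hard coded 0
float3 worldSurfacePos = mul(unity_ObjectToWorld, float4(objectSpacePos, 1.0)).xyz;
float3 lightToSurface = worldSurfacePos - _LightPositionRange.xyz;
return UnityEncodeCubeShadowDepth((length(lightToSurface) + unity_LightShadowBias.x) * _LightPositionRange.w);
#endif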

* edit: I forgot one thing for handling depth on OpenGL platforms. Clip space z for OpenGL is a -w to +w range, so you need to do one extra step to convert that into the 0.0 to 1.0 range needed for the fragment shader output depth.

#if !defined(UNITY_REVERSED_Z) // basically only OpenGL
outDepth = outDepth * 0.5 + 0.5;
#endif

Shadow Receiving

So now we have working shadow casting. What about shadow receiving? This is getting into the gritty underbelly of Unity specific stuff. Turn back now if ye be mortal … er, if you don't care about Unity's built in forward rendering path so much. (Or at least skip to the next section about depth.)

Lighting it Up

Early on I posted a shader with a basic diffuse lighting setup. If you’ve managed to keep up with the rest of the article, the lighting code for the forward base pass should look something like this now.

// world space surface normal and position
float3 worldNormal = UnityObjectToWorldNormal(objectSpaceNormal);
float3 worldPos = mul(unity_ObjectToWorld, float4(objectSpacePos, 1.0));
// basic lighting
half3 worldLightDir = UnityWorldSpaceLightDir(worldPos);
half ndotl = saturate(dot(worldNormal, worldLightDir));
half3 lighting = _LightColor0 * ndotl;
// ambient lighting
half3 ambient = ShadeSH9(float4(worldNormal, 1));
lighting += ambient;
// apply lighting
col.rgb *= lighting;

Nothing too fancy. Get your world normal and world position. Get the world light direction. Do a clamped dot product. Multiply the light color by the dot product, add ambient lighting, and multiply the texture by the lighting. This is kind of your starting lit shader tutorial code. But we’re obviously missing shadows.

For a traditional forward base lit shader we'd want to add a handful of macros in a few places and Unity automagically gets us what we need. Add the SHADOW_COORDS(#) to the v2f struct, call TRANSFER_SHADOW(o); in the vertex function, and then call UNITY_LIGHT_ATTENUATION(atten, i, worldPos); in the fragment shader. We could certainly do that, at least for the forward base pass. On desktop and console, Unity's directional light shadows use screen space shadows. That is, the shadow maps are rendered, then they're cast onto the world positions reconstructed from the camera depth texture and saved into a screen space texture beforehand. So the above macros are just passing along the screen space position, which you can cheaply calculate from the clip space position.
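For reference, that traditional macro plumbing looks roughly like this. It's just a sketch of the usual pattern, not something we'll be using as-is:

// the usual forward base pass shadow macros, for reference only
struct v2f
{
float4 pos : SV_POSITION;
SHADOW_COORDS(1)
};
v2f vert (appdata_base v)
{
v2f o;
o.pos = UnityObjectToClipPos(v.vertex);
TRANSFER_SHADOW(o);
return o;
}
// and in the fragment shader
// UNITY_LIGHT_ATTENUATION(atten, i, worldPos);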

Usually this is done by that TRANSFER_SHADOW(o); macro mentioned above, and passed from the vertex to the fragment shader. But we’re already calculating the clip space position in the fragment shader anyway. We can reuse it to calculate the screen space position with the same ComputeScreenPos(clipPos) function the macro calls. Then we can use the final built in macro and have it do the rest of the work for us.

And we do want to use that UNITY_LIGHT_ATTENUATION(atten, i, worldPos); macro. It handles extra stuff like light cookies for us. And for another reason I’ll get to in a moment.

But there's one minor catch. The built in shadow macro expects you to pass in a struct with the screen space position. And our v2f struct doesn't have it, nor do we want to add it to that struct if we don't have to.

Thankfully we don’t, we can make a dummy struct! It just needs the SHADOW_COORDS(0) macro to add the struct element the other macro expects, and then we can set the value it adds ourselves.

// dummy struct
struct shadowInput {
SHADOW_COORDS(0)
};
// world space position and clip space position
float3 worldPos = mul(unity_ObjectToWorld, float4(surfacePos, 1.0));
float4 clipPos = UnityWorldToClipPos(float4(worldPos, 1.0));
#if defined (SHADOWS_SCREEN)
// setup shadow struct for screen space shadows
shadowInput shadowIN;
#if defined(UNITY_NO_SCREENSPACE_SHADOWS)
// mobile shadows
shadowIN._ShadowCoord = mul(unity_WorldToShadow[0], float4(worldPos, 1.0));
#else
// screen space shadows
shadowIN._ShadowCoord = ComputeScreenPos(clipPos);
#endif // UNITY_NO_SCREENSPACE_SHADOWS
#else
float shadowIN = 0;
#endif // SHADOWS_SCREEN
// macro creates a variable named atten with the shadow
UNITY_LIGHT_ATTENUATION(atten, shadowIN, worldPos);
// multiply the directional lighting by atten
half3 lighting = _LightColor0 * ndotl * atten;

And now we can receive directional shadows!

Catching shade.

Multiple Lights

So I mentioned how we really did want to use the UNITY_LIGHT_ATTENUATION macro above. Here's the real reason why. It also handles other light types! Unity's built in forward renderer draws multiple lights by rendering the object again for each light. So we need a forward add pass. And the only thing keeping the one we now have for the forward base pass from working with the forward add pass is the ambient lighting. So you can copy the fragment shader function and delete the two lines of ambient lighting code.

Or you could stick those three lines for the ambient lighting inside of an
#if defined(UNITY_SHOULD_SAMPLE_SH) which will only be true for the base pass. Then you can share the exact same function for both passes.
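Something like this, reusing the ambient lines from earlier:

#if defined(UNITY_SHOULD_SAMPLE_SH)
// ambient lighting, only compiled into the forward base pass
half3 ambient = ShadeSH9(float4(worldNormal, 1));
lighting += ambient;
#endif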

RTX Off!

Fragmented Depth

There’s one big caveat about using SV_Depth. It disables early depth rejection. Basically that means you’re going to be paying the cost of rendering the impostor if it’s anywhere within the view frustum. Even if it’s behind something and not visible. Normally GPUs can use the depth buffer to skip running the fragment shader for meshes behind something else that has already been rendered closer to the camera. But since the GPU doesn’t know what the depth is going to be until after the fragment shader has run, it can’t do that.

“What about SV_DepthLessEqual or SV_DepthGreaterEqual?”

Yes! Excellent question Mr Pettineo. How did you know I was thinking that?

Conservative Depth Output

The SV_DepthLessEqual and SV_DepthGreaterEqual semantics, added in Shader Model 5.0, are replacements for SV_Depth that tell the GPU it can still do early depth rejection. However to use this we have to make sure the mesh is as close or closer to the camera than the sphere we're about to render. To do that we're going to want to pull the mesh towards the camera. Right now the camera facing quad is sitting at the sphere's center.

The thing is we need to move the vertices closer to the camera without modifying their screen space position. We’ve already calculated the perfect bounds for them, so it’d be unfortunate if we end up undoing that.

One option would be to calculate the clip space position for a view plane maxRadius closer to the camera than the sphere's pivot. Then replace the z of the already calculated clip space position. Clip space has a really neat feature that you can change the z of a clip space position without affecting its position on screen or causing problems with interpolation.

// usual clip space at the end of the shader
o.pos = UnityWorldToClipPos(worldPos);
// get a position maxRadius closer from the sphere pivot along the
// forward view vector
float4 nearerClip = UnityWorldToClipPos(worldSpacePivotPos - worldSpaceViewForward * maxRadius);
// apply the "perspective divide" to get the real depth Z
float nearerZ = nearerClip.z / nearerClip.w;
// replace the original clip space z with the new one
o.pos.z = nearerZ * o.pos.w;

But this technique has one big flaw. If you move the camera too close to or try to move past our impostor sphere, it'll just disappear when we should still see it. The problem is the "nearer depth" is getting placed behind the camera. We can try to put some more work into this though. Like trying to limit the z to the near plane. Or rather, clamping to just inside the near clip plane, since sitting exactly on the near clip plane will cause it to still get culled.

// clamp to just inside the near clip plane
o.pos.z = min(o.pos.w - 1.0e-6f, nearerZ * o.pos.w);

But … this doesn’t actually work like you’d expect.

I lied a bit when I said you could change the z of a clip space position without any problems. This fails in exactly one case, and it’s when some vertices of a mesh are behind the camera. The exact case we’re trying to work around. Even with the clamping, the quad is still more clipped than it should be. So this was a bust.

And honestly, I don’t fully understand the problem well enough to explain why.

But there is a cheaper solution that behaves just as well in the general case and doesn't fail in the "some vertices behind the camera" case! We can move the vertices one sphere radius along the ray direction. For orthographic projection this would literally just be the world position - the forward view ✕ the sphere radius. For perspective projection, if we use the normalized ray direction it wouldn't actually pull it far enough. So we need to bring back our friend the dot() again to find out how far we need to offset it to properly pull the quad's surface exactly one sphere radius closer.

// this pushes the vertices towards the camera
// add just before the UnityWorldToClipPos line in the vertex shader
worldPos += worldSpaceRayDir / dot(normalize(viewOffset), worldSpaceRayDir) * maxRadius;
// usual clip space at the end of the shader
o.pos = UnityWorldToClipPos(worldPos);

Now the camera will still near clip with the sphere when you get close, but the result is very similar to clipping an actual sphere mesh. Generally speaking if the mesh isn’t getting clipped, neither will the ray offset quad.

Once that's added it's only a matter of replacing the SV_Depth semantic in the fragment shaders with the appropriate option. For anything not OpenGL, you'll want to use SV_DepthLessEqual. This is because Unity uses a reversed Z depth for non-OpenGL platforms. Reversed Z depth means things further away have a smaller depth value than closer objects. So really we can just check for if the UNITY_REVERSED_Z keyword is active. For OpenGL … well, really this is all moot. We can't guarantee an OpenGL platform supports the equivalent of SV_DepthGreaterEqual until OpenGL 4.2. (I originally assumed OpenGL 4.2 uses reversed z depth, but see the edit below.) Basically you're likely stuck using SV_Depth on any platform that's not using the reversed z depth. And then all this offsetting of the quad closer to the camera to reduce over shading is sadly pointless for those platforms. But we can at least handle both cases in the shader.

* edit: Unity running OpenGL 4.2+ still uses regular z depth. You could use SV_DepthGreaterEqual for that, but realistically any platform that supports OpenGL 4.2 you’ll want to be running Direct3D, Vulkan, or Metal on instead.

// update the fragment shader functions like this
half4 frag_(forward/shadow) (v2f i
#if UNITY_REVERSED_Z && SHADER_TARGET > 40
, out float outDepth : SV_DepthLessEqual
#else
// the device probably can't use conservative depth
, out float outDepth : SV_Depth
#endif
) : SV_Target

Finishing Touches

There are a few minor items left to fully flesh out the shader. Support for “per vertex” non-important lights, fog, and basic instancing. These aren’t terribly interesting, so I’ll cover them quickly.

“Per Vertex” Non-Important Lights

Since we don’t really have a lot of vertices, we’ll also need to call the “vertex light” function in the fragment shader. This is really just a matter of copy and pasting the vertex lighting function, sticking it inside an #if, and adding the returned value to the lighting.

#if defined(VERTEXLIGHT_ON)
// "per vertex" non-important lights
half3 vertexLighting = Shade4PointLights(
unity_4LightPosX0, unity_4LightPosY0, unity_4LightPosZ0,
unity_LightColor[0].rgb, unity_LightColor[1].rgb,
unity_LightColor[2].rgb, unity_LightColor[3].rgb,
unity_4LightAtten0,
worldPos, worldNormal);
lighting += vertexLighting;
#endif

Or at least it should be this simple. VERTEXLIGHT_ON is one of the keywords controlled by #pragma multi_compile_fwdbase. But it seems like if you don't call that function in the vertex shader, the shader variant with the keyword is never created. So you also have to force it with your own multi compile line.

#pragma multi_compile _ VERTEXLIGHT_ON

Fog

Like with many things covered in this, Unity’s built in macros assume you’re outputting some kind of value from the vertex shader. For desktop, this is just passing the raw clipPos.z to the fragment shader, which then calculates the actual fog falloff as part of the fog macro called there. So we can just put the usual macro with UNITY_APPLY_FOG(clipPos.z, col); at the end of our fragment shader for the forward passes.

For mobile the falloff is calculated in the vertex shader. But we need to use the clipPos.z we calculated in the fragment shader, so we can't just use the usual UNITY_APPLY_FOG(clipPos.z, col) macro if we want to support both mobile and desktop. So we have to calculate the falloff and pass that to the macro, but only when on mobile.

// fog
float fogCoord = clipPos.z;
#if (SHADER_TARGET < 30) || defined(SHADER_API_MOBILE)
// macro calculates fog falloff
// and creates a unityFogFactor variable to hold it
UNITY_CALC_FOG_FACTOR(fogCoord);
fogCoord = unityFogFactor;
#endif
UNITY_APPLY_FOG(fogCoord, col);

Instancing

To add instancing to the shader, copy and paste the appropriate macros mentioned in Unity's own documentation for this:

Go to the section Adding instancing to vertex and fragment Shaders and copy the macros into the appdata & v2f structs, vertex function, and fragment functions. Ignore the BUFFER and PROP macros. But yes, you do need the UNITY_SETUP_INSTANCE_ID(i); in the fragment shaders. In an instanced shader, the unity_ObjectToWorld and unity_WorldToObject matrices are instanced properties. And since we're using those in the fragment shader, we need the instance id there too.
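A condensed sketch of where those macros end up, following Unity's documentation (the bodies of the functions are otherwise unchanged):

// plus #pragma multi_compile_instancing in each pass
struct appdata
{
float4 vertex : POSITION;
UNITY_VERTEX_INPUT_INSTANCE_ID
};
struct v2f
{
float4 pos : SV_POSITION;
float3 rayDir : TEXCOORD0;
float3 rayOrigin : TEXCOORD1;
UNITY_VERTEX_INPUT_INSTANCE_ID
};
v2f vert (appdata v)
{
v2f o;
UNITY_SETUP_INSTANCE_ID(v);
UNITY_TRANSFER_INSTANCE_ID(v, o);
// rest of the vertex shader goes here
return o;
}
// and at the top of each fragment function
// UNITY_SETUP_INSTANCE_ID(i);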

The Finished Shader

Without further ado, here's the finished shader in its entirety.
(direct link to gist.)

Additional Thoughts

Surface Shaders & Shader Graph

Because I know the next question everyone is going to ask is going to be some form of "how do I do this in Surface Shaders / Shader Graph?" Here's the answer to those questions.

You can’t.*

Well, you can construct the ray origin and direction. You can do the raytracing of the sphere. You can do all of the procedural UV stuff too, of course. You can even update the surface normals so it gets lit like a sphere.

The one thing you can’t do is adjust the depth or world position used for lighting and shadows from the fragment shader. So depth intersections will look wrong, shadows will look wrong, and lights really close to the surface won’t look correct either. Because they’ll all be using the original mesh surface’s position.

So the only option for using this technique in any of Unity’s renderers is to use hand written vertex fragment shaders. For the time being at least. I hope one day you’ll be able to output a modified depth value in Shader Graph. But as of writing this, they’ve made no mention of looking to add this feature.

* It’s been pointed out that Shader Graph for the HDRP does have the ability to set a depth on the master node to do per fragment depth. It is using SV_Depth instead of SV_DepthLessEqual though, so there’s no need to do the ray direction offset of the quad. Thanks to Rémy for reminding me. Hopefully they add this feature to the URP at some point as well.

Anti-Aliasing

So many of my other articles are about anti-aliasing, why did I skip that here? Because it’s a difficult problem that has no perfect solution.

Inigo Quilez has an excellent example of how to handle anti-aliasing of ray traced spheres here:

The basic idea is to use a ray to point distance calculation (which is also being used to fix the UVs on the outer edge) to get an approximation of how close to the edge of the sphere the ray is in screen space. That can give you a gradient that can be sharpened using a function like the one used in my Alpha to Coverage article, and then used as the output alpha. It could also be used for alpha blending in non-MSAA and non-opaque use cases.

4x MSAA with the original shader vs using Alpha to Coverage.
// add this to the pass outside of the CGPROGRAM to enable
// alpha to coverage
AlphaToMask On
// ray to sphere pivot distance
float rayToPointDist = length(rayDir * dot(rayDir, -rayOrigin) + rayOrigin);
// fwidth gets the sum of the ddx & ddy partial derivatives
// float fDist = fwidth(rayToPointDist);
// fwidth is a coarse approximation of this
float fDist = length(float2(ddx(rayToPointDist), ddy(rayToPointDist)));
// sharpen ray to point distance
// centered on sphere radius, +/- half a pixel based on derivatives
float alpha = (0.5 - rayToPointDist) / max(fDist, 0.0001) + 0.5;
// clip based on sharpened alpha
// don't clip based on ray hit miss
clip(alpha);
// clamp alpha to a 0 to 1 range and apply to output alpha after
// sampling the texture
col.a = saturate(alpha);

That seems like it should be good enough, right? So why did I say there's no perfect solution? And why didn't I implement that by default? Anti-aliasing the outside edge doesn't solve aliasing on intersections with rasterized meshes or other shaders that output depth from the fragment shader. When rasterizing a triangle with MSAA enabled, the depth is calculated for each subsample the triangle covers, but the fragment shader is only run once per pixel. This means the per subsample coverage of two intersecting meshes can be accurately determined down to the subsample count. This shader is writing the depth from the fragment shader, so there's only a single depth per pixel. The same depth is then used for all of the subsamples. Thus no AA on intersections. Technically there is still some AA on intersections between rasterized geometry and fragment shaders that output depth, as the plane of the intersecting triangle is taken into account. But between two depth writing shaders there will be none.

4x MSAA with the original shader vs using Alpha to Coverage. Note intersections are identical with both approaches. Rasterized surfaces that are aligned with the view plane show aliasing on intersections with impostor. Rasterized surfaces viewed at an angle show anti-aliasing, but it is equivalent to intersecting a view plane aligned surface.

The Shadertoy example above can handle intersections just because it is rendering all of those spheres in one pass and doing per pixel sorting and compositing of the analytical shapes. It’s not even doing any MSAA.

There’s no efficient way I’m aware of to handle fragment shader depth writes with MSAA enabled while still only running the fragment shader once per pixel. That leaves using the sample interpolation modifier to force the fragment shader to be run for every subsample. Not exactly great for performance when the whole point of MSAA was to not do that. It does look nice though.

4x MSAA with the original shader vs shader forced to render per subsample.
4x MSAA with the original shader vs shader forced to render per subsample. Note intersections for the super sampled case are all properly anti-aliased.
// update the v2f struct to use the sample modifier for the
// interpolated ray dir and origin vectors to force fragment
// shader to run for each subsample and for the interpolated
// values to get computed uniquely for each subsample position
struct v2f
{
float4 pos : SV_POSITION;
sample float3 rayDir : TEXCOORD0;
sample float3 rayOrigin : TEXCOORD1;
UNITY_VERTEX_INPUT_INSTANCE_ID
};
// add this inside the CGPROGRAM blocks for the passes as the
// sample modifier is a Shader Model 5.0 feature
#pragma target 5.0
// and you may want to bias the texture mip level
// because why not if we're already super sampling!
half4 col = tex2Dbias(_MainTex, float4(uv, 0, -1));
4x MSAA with Alpha to Coverage vs shader forced to render per subsample.
4x MSAA with original shader vs Alpha to Coverage vs Forced Super Sampling intersection comparison.

Deferred Rendering

I didn't include a deferred rendering pass in the example shader. There's no reason why this wouldn't work with deferred though. It'd be an even easier shader to write. But I was trying to keep the shader as straightforward as possible.
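If you wanted to try it, the fragment function would look roughly like the following. This is an untested sketch of a pass using Unity's built in deferred gbuffer layout, not something from the finished shader:

// rough sketch of a deferred pass fragment function
// the pass would use Tags { "LightMode" = "Deferred" }
void frag_deferred (v2f i,
out half4 outGBuffer0 : SV_Target0, // albedo (rgb), occlusion (a)
out half4 outGBuffer1 : SV_Target1, // specular (rgb), smoothness (a)
out half4 outGBuffer2 : SV_Target2, // packed world normal (rgb)
out half4 outEmission : SV_Target3, // emission and ambient
out float outDepth : SV_DepthLessEqual)
{
// the same ray trace, clip(), UV, and depth code as the forward pass
// would go here, then the gbuffer gets filled from the surface
// position and normal instead of computing the lighting directly
outGBuffer0 = 0;
outGBuffer1 = 0;
outGBuffer2 = 0;
outEmission = 0;
outDepth = 0;
}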
