Mipmapping is ubiquitous in real time rendering, but has limitations. This article is going to go into what mipmapping is, how it’s used, and how it can be made better.
Warning: this page has about 72MB of Gifs! Medium also has a tendency to not load them properly, so if there’s a large gap or overly blurry image, try reloading the page. Ad blockers may cause problems, no script may make it better. It’s also broken in the official Medium app. Sorry, Medium is just weird.
This article is Unity and MSAA / no-FSAA focused. If you’re intending to use some form of post transparency TAA, this article may not be for you.
Inscrutable Text
Okay, story time! On one of the first VR projects I worked on, Wayward Sky, we ran into a curious problem. We had a sign in the game with a character’s name written across it. Not an unusual thing in itself. The problem was the name was almost completely illegible when playing the game.
The first guess by the artist was it was a texture resolution issue, but increasing that didn’t change anything. Moving the camera closer solved the problem, but that wasn’t an option for this particular case. The camera needed to be where it was and the rest of the area couldn’t be significantly changed either. But it proved the base texture resolution wasn’t the issue.
The second guess was it was the display resolution itself. This was an early PSVR title, and that headset has a lower panel resolution than the competition. We also weren’t rendering at a very high resolution scale compared to the recommended target. However when the artist disabled the mipmaps the text was legible. It was mip mapping, and not the display or rendering resolution!
What was happening was the texture was on a surface just far enough away and slightly turned away from the camera such that it got dropped to too low of a mip level. The result was a blurred mess that didn’t even appear to be text.
Some of you probably figured out this was the problem in the first paragraph. I mean, it’s in the title of the article. Mipmapping causing blurring is a common issue. It is magnified by resolution limited situations like VR, or even mobile and home consoles. Disabling mip mapping on the texture is one common “solution” you’ll find online for this problem. But this is a bad solution! Disabling mip mapping means the texture gets extremely aliased (especially bad for VR), and there are performance implications with having the full resolution texture rendering at all times. Ask anyone tasked with optimizing game performance about disabling mip mapping and they’ll have some “fun” stories to tell …
Ultimately we solved the issue with a small amount of mip LOD biasing, and forcing anisotropic filtering on for that texture. But while it was now clearer, it aliased a little.
It was good enough for the project which was close to shipping. For a bit of text that only showed up for a few moments it wasn’t worth the time to investigate solutions more. But it still bugged me.
Cut to our next game, Dino Frontier. This is a light sim game with a lot of in world UI, text, and icons. Sure enough the same issue of blurred images immediately appeared as we started implementing the UI and trying it out. The first solution the team came up with was to scale the UI up to much larger sizes to ensure text and icons remained legible. For some UI elements this meant making them so large as to be abusive to the player’s comfort.
I decided to take the time to solve the issue properly. And here it is:
Don’t worry, I’ll explain all that stuff in the little text under the image.
MIP Mapping
Multum in Parvo
First a bit of discussion about what mip mapping is, and why it’s a good thing. It exists to solve the problem of texture aliasing during minification. In simpler terms, reduce flickering, jaggies, and complete loss of some image details when an image is displayed smaller than its original resolution.
A GPU renders a texture by sampling it once at the center of each screen pixel. Since the texture is only being sampled once per pixel, when the texture has more texels than the screen pixels it covers, some texels won’t be “seen” at all. This results in parts of the image effectively disappearing.
Now it’s possible to figure out how many texels are within the bounds of each pixel, sample all of those, and then output the average color. This would be a form of Supersampling. Offline rendering tools often do this, and it yields extremely high quality results. But it is also stupendously expensive. The below image which I’ll be using as a ground truth reference was generated using 6400 anisotropic samples per pixel.
If rendered at 1920x1080 on an Nvidia GTX 970, this scene takes roughly 350 ms on the GPU (as measured with Nvidia Nsight), or about a third of a second per frame. And even this isn’t quite perfect. Really I’d need to use closer to 15,000 samples per pixel, but that brings the rendering time to multiple seconds per frame. That is the kind of quality that we’re aiming for, and it’s way too expensive for real time. So what if we skip that and just sample the full resolution texture once per pixel? It can’t be that bad, right?
Ouch.
Lines are popping in and out of existence in the foreground, and in the background it’s just a noisy mess. It would be even worse if the texture had more than just some white lines. This is what we mean when we say aliasing in the context of image sampling.
Mip mapping seeks to avoid both the aliasing problem and the cost of Supersampling by prefiltering the image into multiple versions, each half the resolution of the previous one. These are the mipmaps. Each mipmap’s pixels are the average of the 4 corresponding pixels of the next larger mip level. The idea is that as long as you pick the correct mip level, all of the original image details are accounted for. This successive half sizing is also why power of 2 texture sizes are a thing; it’s harder to generate mipmaps when halving the resolution doesn’t produce a whole number.
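To make that concrete, here’s a hypothetical sketch of a single downsample pass in HLSL, generating one mip level from the one above it with a plain 2x2 box filter. The _PrevMip and _PrevMipTexelSize names are made up for this example; engines normally do all of this for you at import time.
// A hypothetical downsample pass: each output texel is the average of the 2x2
// block of texels it covers in the previous (larger) mip level. _PrevMip and
// _PrevMipTexelSize are made up names; _PrevMipTexelSize is 1.0 / the larger
// mip's resolution.
sampler2D _PrevMip;
float2 _PrevMipTexelSize;

half4 frag(float2 uv : TEXCOORD0) : SV_Target
{
    // uv is the center of the output texel, which lands on the shared corner of
    // the 2x2 block in the larger mip, so half texel offsets hit the 4 source
    // texel centers exactly
    half4 col = 0;
    col += tex2D(_PrevMip, uv + _PrevMipTexelSize * float2(-0.5, -0.5));
    col += tex2D(_PrevMip, uv + _PrevMipTexelSize * float2( 0.5, -0.5));
    col += tex2D(_PrevMip, uv + _PrevMipTexelSize * float2(-0.5,  0.5));
    col += tex2D(_PrevMip, uv + _PrevMipTexelSize * float2( 0.5,  0.5));
    return col * 0.25;
}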
The other benefit of mip mapping is memory bandwidth usage. While the overall texture memory usage is now roughly 33% larger due to the inclusion of the mip maps, when the GPU goes to read the texture, it only has to read the data from the appropriate mip level. If you’ve got a very large texture that’s being displayed very small, the GPU doesn’t have to read the full texture, only the smaller mip, reducing the amount of data being passed around. This is especially important on mobile devices where memory bandwidth is in very short supply. Technically a GPU never really loads the full resolution version of the texture out of its memory at once. Instead it loads only a small chunk of it, depending on what’s needed by the shader. The full resolution texture exists in the GPU’s main memory, and when a shader tries to sample from that texture the GPU pulls a small section of it into the L1 cache for the TMU (Texture Mapping Unit, the physical part of the GPU that does the work of sampling textures) to read from. If multiple pixels all sample from a small region, then the GPU doesn’t have to fetch another part of the texture later and can reuse the chunk already loaded. That’s what saves memory bandwidth.
So now, all a GPU has to do to prevent aliasing is pick the best mip level to display. It does this by calculating the expected texel to pixel ratio along both the horizontal and vertical screen axes at each pixel¹, and then, using the largest ratio, picking the closest mip level that would keep it as close to 1 texel to 1 pixel as possible.
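As a rough sketch of that calculation (real GPUs do this per 2x2 pixel quad and the exact math varies by hardware, see the footnote), it looks something like this in HLSL, where texRes is the texture’s resolution:
// Approximate mip level selection: measure how many texels the UVs step across
// per screen pixel in x and y, take the larger of the two, and turn it into a
// mip level with log2.
float CalcMipLevel(float2 uv, float2 texRes)
{
    float2 dx = ddx(uv) * texRes; // texels stepped per pixel, horizontally
    float2 dy = ddy(uv) * texRes; // texels stepped per pixel, vertically
    float maxRatioSq = max(dot(dx, dx), dot(dy, dy));
    return max(0.0, 0.5 * log2(maxRatioSq)); // 0.5 * log2(x*x) == log2(x)
}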
The below image is a 256x256 texture with custom colored mipmaps being rendered at 256x256.
The only time the full 256x256 top mip is sampled is when the quad is very close to the camera. Notice the floor is never showing the top mip. If this was rendered at a higher resolution, or the texture was a lower resolution, the top mip would continue to be shown until further away. Again, this is due to that 1:1 texel to pixel ratio the mip level calculations are trying to achieve. Those game analysis videos online that complain about texture filtering quality being reduced when they lower the screen resolution or a game uses dynamic resolution … the filtering quality is exactly the same, it’s just the resolution ratio that’s changing.
Isotropic Filtering
Let’s go over the basics of texture filtering. Really there are two main kinds of texture filtering, point and linear. There’s also anisotropic, but we’ll come back to that. When sampling a texture you can tell the GPU what filter to use for the “MinMag” filter and the “Mip” filter.
The “MinMag” filter is how to handle blending between texels themselves. Point sampling chooses the closest texel to the position being sampled and returns that color. Linear finds the 4 closest texels and returns a bilinear interpolation of the four colors.
The “Mip” filter determines how to blend between mip levels. Point simply picks the closest mip level and uses that. Linear blends between the colors of the two closest mip levels.
Most people reading this are likely familiar with Point, Bilinear, and Trilinear filtering. Point filtering is using a point filter for both MinMag and Mip. Bilinear is using a linear filter for MinMag, and a point filter for Mip, as you probably guessed.
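If it helps to see the Mip filter written out, here’s a minimal sketch of what the linear Mip filter is effectively doing, using tex2Dlod. This is purely illustrative; the hardware does this blend for you when Trilinear filtering is enabled.
// Trilinear "by hand": bilinearly sample the two closest mip levels with
// tex2Dlod and blend by the fractional part of the calculated mip level.
half4 TrilinearByHand(sampler2D tex, float2 uv, float mipLevel)
{
    half4 nearMip = tex2Dlod(tex, float4(uv, 0.0, floor(mipLevel)));       // sharper mip
    half4 farMip  = tex2Dlod(tex, float4(uv, 0.0, floor(mipLevel) + 1.0)); // blurrier mip
    return lerp(nearMip, farMip, frac(mipLevel));
}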
As you can see, Bilinear shows clear jumps between the mip levels, both on the floor and as the quad moves forward and back. The point at which the changes happen is important: the switch occurs while the texture is still roughly 40% larger than the next mip size, rather than waiting until it has fully scaled down. This leads to the changes not only being abrupt, but also to the texture being obviously blurry when the change occurs.
Trilinear is the same linear filter for MinMag as Bilinear filtering with the addition of a linear filter for Mip, hence the name. This hides the harsh transitions between mip levels, but the blurring still remains as the next mip still starts being faded in early.
This is the downside of mip mapping. Mip maps only accurately represent the image at exactly those half sizes. In between those perfectly scaled mip levels, the GPU has to pick which mip level to display, or blend between two levels. But remember, mip mapping’s goals are to reduce aliasing and rendering cost. It has technically achieved that, even if it’s at the expense of clarity.
The most obvious issue is how blurry the floor gets. This is because mipmaps are isotropic. In simple terms, each mipmap is scaled down uniformly in both the horizontal and vertical axis, and thus can only accurately reproduce a uniformly scaled surface. This works well enough for camera facing surfaces, like the rotating quad in the examples. But when viewing a surface that isn’t perfectly facing the camera, like the ground plane, one axis of the displayed texture is scaling down faster than the other in screen space. The ground plane has non-uniform, or anisotropic, scaling. Mipmaps alone don’t handle that case well. As I mentioned above, GPUs pick the mip level based on the worst case as the alternative would cause aliasing.
Anisotropic Filtering
Anisotropic filtering exists to try to get around the blurry ground problem. Roughly speaking it works by using the mip level of the smaller texel to pixel ratio, then sampling the texture multiple times along the non-uniform scale’s orientation. The mip level used is still limited by the number of samples allowed, so low Anisotropic levels will still become blurry in the distance to prevent aliasing.
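If you’re curious what that looks like in shader terms, here’s a very rough conceptual sketch done by hand with tex2Dlod. Real hardware implementations are far more sophisticated; the sample placement, mip clamp, and equal weighting here are all simplifying assumptions.
// Conceptual anisotropic filtering: pick the mip level from the shorter axis of
// the pixel's texel footprint, and take extra samples spread along the longer
// axis. The clamp by ANISO mirrors the "limited by the number of samples" note.
half4 ManualAniso(sampler2D tex, float2 uv, float2 texRes)
{
    const int ANISO = 4; // e.g. "4x" anisotropic

    float2 dx = ddx(uv);
    float2 dy = ddy(uv);
    float lenX = length(dx * texRes); // footprint length in texels, per screen axis
    float lenY = length(dy * texRes);

    float2 longAxis = (lenX > lenY) ? dx : dy; // direction of the stretched footprint
    float shortLen = min(lenX, lenY);
    float longLen = max(lenX, lenY);

    // mip comes from the smaller ratio, but can't drop more than log2(ANISO)
    // below the isotropic choice or the long axis would alias
    float mipLevel = log2(max(max(shortLen, longLen / ANISO), 1.0));

    half4 col = 0;
    [unroll]
    for (int s = 0; s < ANISO; s++)
    {
        float t = (s + 0.5) / ANISO - 0.5; // spread samples across the footprint
        col += tex2Dlod(tex, float4(uv + longAxis * t, 0.0, mipLevel));
    }
    return col / ANISO;
}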
However the overall result is much sharper ground textures even at lower settings. To my eyes, 4x or 8x are good options for improving the sharpness over a significant portion of the ground at this viewing angle and resolution.
Looks pretty good, right? So I guess we’re done? Well, not so fast. Let's compare back to the “ground truth”.
Notice how much sharper it is still? Especially the quad? No? Wait, I know it’s hard to compare those two when they’re not next to each other. How about this.
Anisotropic filtering helps a lot with textures that are angled away from the camera, but a surface directly facing the camera is actually no different than Trilinear! Even the near ground plane isn’t quite right. Look closely at that center line and you’ll see there’s still a little bit of blurring with Anisotropic filtering.²
So now what?
Going Sharper-ish
The Modest Proposal
So now we know that mip mapping can cause a texture to be displayed a little blurry, even with the best texture filtering enabled. The “obvious solution” is to disable mip mapping altogether. We already saw how effective that is. But this is often the “solution” artists are likely to end up with, mainly because it may be the only option they have immediately available to them. For small amounts of scaling, this is actually quite effective. The problem is, of course, that past that point it quickly goes south. It also usually results in some programmer shouting at said artist some time later when it comes time to do performance optimizations. I’ve shown this to you before near the start of the article as the “just sample the full resolution image” example. To remind you of what that looks like…
So, again, don’t do this, unless you’re doing a pixel art game where you know for sure things aren’t going to be scaled, or are trying to capture the look of a PS1 or PS2 game.
Leveraging Conservative Bias
A better solution is to use mip biasing. Mip biasing tells the GPU to adjust what mip level to use one way or the other. For example a mip bias of -0.5 pushes the mip changes slightly further away; specifically, it moves each transition back halfway between the transition points it would use normally. A mip bias of -1 pushes the mip level one full mip back, so where the GPU would originally be displaying mip level N, it’s now N-1. For some engines you can set the bias in the texture settings directly, and no modifications to shaders are needed. But direct texture asset mip biasing isn’t available on all platforms, for example it’s not supported on Apple devices running Metal, so custom shaders may be needed depending on your target platform(s). It can also be annoying to tweak as it’s not a value exposed in the Unity editor by default. And the LOD Bias setting in Unreal Engine 4 isn’t the same thing. Luckily biasing in a shader is easy to add, and well supported across a wide range of hardware.
https://docs.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-tex2dbias
half4 col = tex2Dbias(_MainTex, float4(i.uv.xy, 0.0, _Bias));
Alternatively you can get the same bias using tex2Dgrad() by multiplying the UV derivatives by 2^bias. This is useful for situations where you don’t have access to a tex2Dbias() function, like Unreal Engine 4’s node based materials.
https://docs.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-tex2dgrad
// per pixel screen space partial derivatives
float2 dx = ddx(i.uv);
float2 dy = ddy(i.uv);

// bias scale
float bias = pow(2, _Bias);

half4 col = tex2Dgrad(_MainTex, i.uv.xy, dx * bias, dy * bias);
But biasing introduces some aliasing again. Notice the horizontal lines on the floor blinking and the diagonals looking like dashed lines. This is especially obvious with a bias of -1.0, so it’s not a perfect solution. As we push the mip levels to change later we’re missing entire texels again. That’s the whole reason why GPUs swap between mip levels when they do. Also this does increase memory bandwidth usage slightly, but significantly less than not using mip mapping at all. The overall impact on performance is likely minimal to non-existent if only used on problem textures.
I’ve shipped games using this technique, and it’s totally fine in a pinch. If your game is using some form of temporal anti-aliasing, this is actually recommended! To the best of my knowledge both Unreal Engine 4 and Unity’s new High Definition Render Pipeline do this!
The Super Solution
So how do we increase the image clarity without introducing aliasing? Well, we might look to other anti-aliasing techniques. Specifically back at Super-Sample Anti-Aliasing (SSAA) and Multi-Sample Anti-Aliasing (MSAA).
Supersampling
Supersampling for rendering means rendering or “sampling” at a higher resolution than the screen displays, then averaging those values. This step of averaging the values is called downsampling. Since we’re sampling an existing texture there’s no higher resolution rendering, just sampling the texture multiple times. As mentioned earlier, the “ground truth” example was rendered using Supersampling with a crazy high sample count.
But that downsampling is a big part of what mip mapping was created to avoid! Doing a downsample from 2x or even 3x the resolution isn’t too bad, but the number of samples required increases quadratically as the texel to screen pixel ratio increases. A 256x256 texture that’s being displayed on screen at 16x16 pixels is a 16:1 ratio. Downsampling in a way that wouldn’t introduce aliasing would require at least 64 bilinear samples of the texture (256 samples if point sampling)! This is why the ground truth example needed so many samples, because it had to handle that 256x256 texture getting down to 1 pixel. Luckily we’re not going to need that many samples, and GPUs have gotten really fast.
We can use mip biasing, anisotropic filtering, and Supersample the texture with only a few samples to remove the added aliasing introduced by the bias. This is the best of both worlds as we’re still using mip mapping and getting its benefits of reduced aliasing through prefiltering and lower memory bandwidth usage. Plus if you sample one texture multiple times within only a small pixel range, it’s actually quite cheap. Like I mentioned above, GPUs load textures in chunks. So if a single pixel samples a texture multiple times in a small area, it might not need to load a new chunk, making successive samples much faster. It’s not free, it’s just not as expensive as the initial texture sample alone.
So now we need to sample the texture multiple times with an offset on each sample and average the results. For this we’re going to use a simple 2x2 screen aligned grid, also called Ordered Grid Super-Sampling (OGSS). The offset needs to be small, ideally roughly within the bounds of the pixel we’re rendering. Luckily my old friend the per pixel screen space partial derivatives are back to rescue us! The derivatives of the texture UVs will give us both the offset magnitude and direction between one pixel and the next in the form of a vector.
// per pixel screen space partial derivatives
float2 dx = ddx(i.uv.xy) * 0.25; // horizontal offset
float2 dy = ddy(i.uv.xy) * 0.25; // vertical offset

// supersampled 2x2 ordered grid
half4 col = 0;
col += tex2Dbias(_MainTex, float4(i.uv.xy + dx + dy, 0.0, _Bias));
col += tex2Dbias(_MainTex, float4(i.uv.xy - dx + dy, 0.0, _Bias));
col += tex2Dbias(_MainTex, float4(i.uv.xy + dx - dy, 0.0, _Bias));
col += tex2Dbias(_MainTex, float4(i.uv.xy - dx - dy, 0.0, _Bias));
col *= 0.25;
The result is a slight increase in the shader cost, but the aliasing introduced by the mip biasing is greatly reduced. How much cost? That will depend on the use case. For a similar scene as the above with the ground mirrored as a ceiling, running at 1920x1080 on an Nvidia GTX 970, it increases the overall frame time by about 0.15 ms, to 0.5~0.6 ms per frame, compared to anisotropic filtering alone. That’s a significant percentage increase due to how little is being rendered, but in a more complex scene it should only add a similar total ms increase.
Multi Sample Anti-Aliasing
MSAA, or more specifically 4x MSAA, has another trick. MSAA is similar to Supersampling in that it’s rendering at a higher resolution than the screen can necessarily display, but differs in that what it renders at a higher resolution isn’t necessarily the scene color, but rather the scene depth. The difference is inconsequential for this topic, and I’ve gone into more detail in another post, Anti-aliased Alpha Test, so we’ll skip that for now. What is important is 4x MSAA uses a Rotated Grid pattern, sometimes called 4 rooks.
The benefit of this pattern is vertical and horizontal lines get 4 evenly spaced positions to be sampled on as they sweep through the pixel, increasing the granularity of anti-aliasing for those kinds of edges. This means higher quality lines for the type that are usually the most affected by aliasing.
// per pixel partial derivatives
float2 dx = ddx(i.uv.xy);
float2 dy = ddy(i.uv.xy);

// rotated grid uv offsets
float2 uvOffsets = float2(0.125, 0.375);
float4 offsetUV = float4(0.0, 0.0, 0.0, _Bias);

// supersampled using 2x2 rotated grid
half4 col = 0;
offsetUV.xy = i.uv.xy + uvOffsets.x * dx + uvOffsets.y * dy;
col += tex2Dbias(_MainTex, offsetUV);
offsetUV.xy = i.uv.xy - uvOffsets.x * dx - uvOffsets.y * dy;
col += tex2Dbias(_MainTex, offsetUV);
offsetUV.xy = i.uv.xy + uvOffsets.y * dx - uvOffsets.x * dy;
col += tex2Dbias(_MainTex, offsetUV);
offsetUV.xy = i.uv.xy - uvOffsets.y * dx + uvOffsets.x * dy;
col += tex2Dbias(_MainTex, offsetUV);
col *= 0.25;
The difference in quality between the Ordered Grid and Rotated Grid Supersampling in this case is very minor. It can be seen in the minor amount of aliasing reduction on the ground plane’s horizontal lines, and slightly worse aliasing on a few diagonal lines. You can read up on the benefits and limitations of the two techniques here: Super-sampling Anti-aliasing Analyzed
Performance wise, the difference isn’t even measurable comparing OGSS and RGSS. So this is an essentially free upgrade over the previous technique.
But the main test is comparing 2x2 Rotated Grid Super-Sampling to the “Ground Truth” image from before.
Not exactly the same, but extremely close. Certainly far better than Anisotropic filtering alone. The ground plane is also now slightly higher quality than even 16x Anisotropic filtering when only using 8x due to the addition of Supersampling.
Closing Thoughts
And there we go! We’ve solved the problem! For a modest increase in shader cost we have near ground truth quality texture filtering! This means noticeably clearer text and images with little to no aliasing at significantly less performance impact than the ground truth.
Caveats & Additional Thoughts
Performance and Limitations
Q: Since this is so much better, shouldn’t we be using this everywhere on everything?!
A: Probably not.
On modern desktop and console GPUs from the last 10 years, the cost of Anisotropic filtering over Bilinear or Trilinear is negligible. But this isn’t necessarily true on mobile GPUs. For mobile VR especially, just the cost of using Anisotropic filtering on everything may be enough to make hitting the frame time targets difficult. The technique still works without Anisotropic filtering, or even Trilinear filtering, especially on camera facing geometry, but it won’t be as effective on surfaces where Anisotropic filtering was helping keep textures sharp, so keep that in mind.
On some devices or scenes the additional texture samples may also be a far more significant performance hit than in my very basic example scene on a desktop GPU. I suspect sampling the texture 4 times on mobile GPUs may be an even bigger impact than enabling Anisotropic filtering on a single sample. On desktop, using this in a shader that’s already sampling a high number of textures may also lead to this becoming a more significant bottleneck.
Also, for 90% of textures just a small amount of Anisotropic filtering, like 2x or 4x, is likely enough to make them look sharp. This is best used on extreme cases where you really need something to fully utilize the available screen resolution, like important text or gameplay icons. Just know that on modern desktop GPUs the cost of going from Trilinear to Anisotropic 2x is nearly identical to going from Trilinear to 16x, so there’s little reason not to max it out if you’re going to turn it on. This isn’t true for mobile, and 2x vs 16x may have a significant performance difference.
I guess my advice is try out different settings.
Also, it’s important to know this really only works properly on unlit / albedo textures. For normal maps and your usual cadre of PBR textures, the average value of the texture alone is significantly different than true Supersampling of the lighting calculations. I really only use it on problem cases like text and icons where clarity is important.
Colored Mipmaps
Q: Those colored mipmaps were fun! Can we see what that looks like for Trilinear and Anisotropic?
A: Glad you liked them, and sure. Here’s another 20 MB of gifs.
Giffy
Q: Why are you using gifs?! You should be using videos instead, they’re waaay better!
A: As strange as it might seem, I’m using gifs on purpose. For one Medium doesn’t have an option for directly uploading videos. It’s also not possible to have embedded videos loop. But those are not the real reason. The real reason is compression.
This topic is about very minute differences in image quality that don’t always translate to a video file. Those fine details get blurred away by compression artifacts. As long as I stick to grayscale, a gif can represent the animations exactly. It’s also why all the images and animations are at 200% scale, to help highlight those subtle differences.
Of course high quality video compression can get pretty close, but again, I can’t embed a video file directly. And I can’t guarantee the video file won’t get rescaled and/or re-compressed by an external service.
Taking Credit
Q: This is so awesome, you should totally patent it / name it after yourself!
A: Nothing presented here is new. Similar if not identical techniques have been used by other games for years. In fact the idea for this originally came from seeing Emil Persson’s example project Selective Supersampling from 2006. I’m certain offline rendering uses something very similar to this technique.
Q: You’re saying you stole this from someone else?
A: Yes.
The real time graphics community is generally a very open and sharing one. I wrote this to share my thoughts and help educate, not to claim credit.
Signed Distance Fields
Q: What about SDF Font rendering?
A: Yes! For Dino Frontier and Falcon Age I explicitly used this technique on SDF Fonts. The main thing to know is you should be doing the edge calculation on each sample and averaging those results. Don’t average the raw SDF and then do an edge calculation. If you’re using a soft edge in the original calculation, you’ll also want to increase the sharpness of the edge, otherwise you’ll just be blurring the text.
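Here’s a rough sketch of what that looks like for a single channel SDF font texture, built on the same 2x2 RGSS pattern from earlier. The _FontTex name, the use of the alpha channel, the 0.5 threshold, and the fwidth based edge width are assumptions; adapt them to however your SDF shader already defines its edge.
// A sketch of per sample SDF edge evaluation on top of the 2x2 RGSS pattern.
// _FontTex, the alpha channel, and the 0.5 threshold are assumptions for this
// example; match them to however your SDF shader defines its edge.
float2 dx = ddx(i.uv.xy);
float2 dy = ddy(i.uv.xy);
float2 uvOffsets = float2(0.125, 0.375);
float4 offsetUV = float4(0.0, 0.0, 0.0, _Bias);

// edge softness from how fast the SDF changes per screen pixel, tightened a bit
// (* 0.5) since the supersampling is already smoothing the edge for us
float edgeWidth = fwidth(tex2D(_FontTex, i.uv.xy).a) * 0.5;

// do the edge calculation on each sample, then average the coverage values
half coverage = 0;
offsetUV.xy = i.uv.xy + uvOffsets.x * dx + uvOffsets.y * dy;
coverage += smoothstep(0.5 - edgeWidth, 0.5 + edgeWidth, tex2Dbias(_FontTex, offsetUV).a);
offsetUV.xy = i.uv.xy - uvOffsets.x * dx - uvOffsets.y * dy;
coverage += smoothstep(0.5 - edgeWidth, 0.5 + edgeWidth, tex2Dbias(_FontTex, offsetUV).a);
offsetUV.xy = i.uv.xy + uvOffsets.y * dx - uvOffsets.x * dy;
coverage += smoothstep(0.5 - edgeWidth, 0.5 + edgeWidth, tex2Dbias(_FontTex, offsetUV).a);
offsetUV.xy = i.uv.xy - uvOffsets.y * dx + uvOffsets.x * dy;
coverage += smoothstep(0.5 - edgeWidth, 0.5 + edgeWidth, tex2Dbias(_FontTex, offsetUV).a);
coverage *= 0.25;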
MORE SAMPLES
Q: If 4 samples are good, more must be better!
A: Of course, but the idea here was to present the option with the smallest performance cost for the benefit. The 2x2 RGSS can certainly exhibit some degree of moire on highly detailed content that a higher sample count would reduce. I’ve tested an 8 sample Halton sequence that has clear advantages over the 2x2 RGSS in extreme cases, but the extra samples come with a significantly higher performance penalty. Going from 4 to 8 samples increases the cost over Anisotropic alone from the 0.15 ms mentioned earlier to about 1 ms per frame!
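For the curious, an 8 sample version looks something like the sketch below. The offsets are the first 8 points of a Halton (2, 3) sequence centered on the pixel; treat the exact values as illustrative rather than a carefully tuned pattern.
// 8 sample supersampling using a centered Halton (2,3) point set instead of the
// 2x2 rotated grid. More samples handle moire better, at a higher cost.
static const float2 halton8[8] = {
    float2( 0.0,    -0.1667), float2(-0.25,    0.1667),
    float2( 0.25,   -0.3889), float2(-0.375,  -0.0556),
    float2( 0.125,   0.2778), float2(-0.125,  -0.2778),
    float2( 0.375,   0.0556), float2(-0.4375,  0.3889)
};

float2 dx = ddx(i.uv.xy);
float2 dy = ddy(i.uv.xy);

half4 col = 0;
[unroll]
for (int s = 0; s < 8; s++)
{
    float2 offset = halton8[s].x * dx + halton8[s].y * dy;
    col += tex2Dbias(_MainTex, float4(i.uv.xy + offset, 0.0, _Bias));
}
col *= 1.0 / 8.0;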
Texture Compression
Q: Will this help make compressed textures look sharper?
A: Yes and no. You’re getting the benefit of sampling from a higher mip level, which means the impact of texture compression artifacts on mipmap quality is lessened, but it’s still there. I try to use single channel uncompressed textures for text and icons when possible to avoid this when it’s a problem. The blurry text examples are actually using DXT1 textures, because the visual difference was imperceptible for that particular texture. All of the grid textures are uncompressed because on those it did matter.
Gamma Color Space
All the above example images are produced using Linear color space. This is because the averaging of the samples needs to be done in linear space. If you’re using Gamma color space, like for mobile, know that your textures may become overly dim. Even if you convert the samples from gamma to linear space in the shader, bilinear filtering itself is happening in the wrong color space and will lead to some amount of dimming that is not possible to correct for.
That’s not to say you can’t use it. The technique still yields similar quality improvements on edges. Just your textures may look slightly darker in areas of high contrast & detail. You may not even notice anything wrong.
See Gamma error in picture scaling for more information on the topic. The short version is the average of the colors white [255,255,255] and black [0,0,0] is [188,188,188], not [127,127,127]. No, that’s not a typo.
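If you want to see the math, here’s the arithmetic using the simple 2.2 power approximation of sRGB (the exact sRGB transfer function is what lands on 188):
// Averaging white and black the "correct" way: convert to linear, average, then
// convert back. Using the 2.2 power approximation of sRGB for brevity.
float white = 1.0; // 255 in 8 bit
float black = 0.0; // 0 in 8 bit
float linearAvg = (pow(white, 2.2) + pow(black, 2.2)) * 0.5; // 0.5 in linear space
float gammaAvg = pow(linearAvg, 1.0 / 2.2); // ~0.729 back in gamma space
// 0.729 * 255 ~ 186 with the 2.2 approximation; the exact sRGB transfer
// function gives ~188. A naive average in gamma space would give 127.5.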
Kaiser-filtered Mipmaps
Another common technique for improving the sharpness of mip mapping is to use a Lanczos, Kaiser, or some similar wider sinc based kernel & sharpening when generating the mipmaps themselves. Certainly Photoshop does something like this by default when downscaling images. And both Unity and Unreal, as well as many other mip mapping utilities, have options for enabling Kaiser or some other sharpening technique. These work by increasing the contrast of the mipmaps to help retain important details. You can use this in combination with the presented Supersampling technique, but I personally find its use unnecessary, and it can potentially make things worse. Absolutely enable it on textures that aren’t using Supersampling and see if you like the effect.
One thing Kaiser is better at than the usual bilinear mipmaps is shape preservation. If you look at the mipmap level animation I presented above, you’ll notice that the thinner lines seem to grow “in” as the mip level increases. This is due to mipmap generation only looking at those 4 pixels from the previous mip level. Kaiser and similar downscaling algorithms look at a wider area, and are better at keeping those details appearing where you would expect them to without “growing” as much. On the other hand they can cause weird dots or lines to appear in the mipmaps that don’t exist in the original image at all. And thin lines can end up brighter than thicker lines with Kaiser!
Compare these examples being Kaiser filter mipmaps and Box filter mipmaps (the default for basically all real time applications). Note that the lines for the center cross are 2 pixel wide in the full resolution texture, and the rest are 1 pixel wide.
edit: It has been pointed out to me that Unity’s Kaiser filtering appears to be incorrectly implemented, with each mipmap being derived from the previous mip when they should all be derived from the top mip level. This would explain the extreme contrast in the later mip levels, as the sharpening is being applied to itself over and over. Properly implemented sinc filters are an expensive operation to do correctly, so this may have been done as an “optimization”, or just an accident. I’ve updated the comparison images to use mipmaps generated with NVIDIA Texture Tools, using the Kaiser filter & 0.4545 gamma correction.
Update: See further below for an update on this.
Note Nvidia’s Kaiser filtering actually makes the lines blurrier overall! Especially in the Anisotropic example. But this texture is an extreme case. A texture with less initial contrast should end up clearer overall. Also note that with RGSS the differences are much less noticeable, though Nvidia’s Kaiser mipmaps are slightly smoother.
For more information on the topic, you can read up on it in these two posts by Jonathan Blow: Mipmapping, Part 1 and Mipmapping, Part 2.
edit #2: Unity’s Kaiser filter will apparently be fixed in 2020.1!
Update: Unity’s Kaiser filter has been updated for 2020.1, and it is indeed much better. The interior lines are now closer to Nvidia’s original Kaiser filter with the thicker center cross retaining the strength over the thinner interior lines. But the exterior edges are much more pronounced. However, since the original post, Nvidia has also updated their Texture Tools Exporter plugin for Photoshop, and that produces different results than either Unity 2020.1 or the legacy Nvidia Texture Tools. The first two below comparison images are still using the textures produced with the legacy version of the tool. This is in part because I want to keep the comparison true to the original. And in part because Unity had problems reading the dds files output by the new Nvidia exporter until I updated it to the latest version (2020.1.3 at the time of writing this).
This shows a comparison between the old and new Kaiser filters from both Unity and Nvidia, without RGSS. Unity 2020.1 and Nvidia Texture Tools Exporter 2020.1.3 now produce similar outputs. As mentioned above, both properly keep the center lines’ strength correct, but also significantly accentuate the outer edge of the quad. Nvidia’s 2020.1.3 Kaiser filter also adds a significant amount of ringing that does not exist in any of the other implementations.
This last image shows a comparison between ground truth, the legacy Nvidia Texture Tools, and the latest Nvidia Texture Tools Exporter. This is to help highlight that the exterior line is way stronger compared to ground truth, though it may be considered more visually pleasing. Technically this is what Kaiser is intended to do: accentuate edge contrast to preserve details usually lost during minification. So it could be argued this is “correct”, but really there is no single correct implementation of a Kaiser filter. They’re all just different parameterizations of the Kaiser windowing function with slightly different goals. Ultimately this is somewhat outside the topic of this article, but it was interesting.
Blurring at 1:1 Scaling
The above examples are all using a 256x256 texture and rendering to a 256x256 target. When the texture is “full screen” it’s perfectly sharp. If you use the code snippets I actually presented above, they will not be perfectly sharp but instead be slightly blurry. I correct for this in the shader I used by limiting the sample offsets by a mip level calculation:
// per pixel partial derivatives
float2 dx = ddx(i.uv);
float2 dy = ddy(i.uv);

// manually calculate the per axis mip level, clamp to 0 to 1
// and use that to scale down the derivatives
dx *= saturate(
    0.5 * log2(dot(dx * textureRes, dx * textureRes))
);
dy *= saturate(
    0.5 * log2(dot(dy * textureRes, dy * textureRes))
);

// rotated grid uv offsets
float2 uvOffsets = float2(0.125, 0.375);
float4 offsetUV = float4(0.0, 0.0, 0.0, _Bias);

// supersampled using 2x2 rotated grid
half4 col = 0;
offsetUV.xy = i.uv.xy + uvOffsets.x * dx + uvOffsets.y * dy;
col += tex2Dbias(_MainTex, offsetUV);
offsetUV.xy = i.uv.xy - uvOffsets.x * dx - uvOffsets.y * dy;
col += tex2Dbias(_MainTex, offsetUV);
offsetUV.xy = i.uv.xy + uvOffsets.y * dx - uvOffsets.x * dy;
col += tex2Dbias(_MainTex, offsetUV);
offsetUV.xy = i.uv.xy - uvOffsets.y * dx + uvOffsets.x * dy;
col += tex2Dbias(_MainTex, offsetUV);
col *= 0.25;
For Unity users, the texture’s resolution can be obtained by adding float4 _TextureName_TexelSize; to the shader. The .zw components are the texture’s resolution.
This will effectively disable Supersampling once the texture is being displayed at or larger than the original resolution. I do not actually use this in any of the games I’ve shipped as it’s so rare for textures to be displayed at exactly pixel snapped 1:1, which is the only time this issue comes into play. For 2D UI or games using 2D sprites this may be a more common problem. For most everything else it’s a non-issue. You could also scale the super sampling down when the mip level is perfectly sized, but this requires a little more work since GPUs don’t use exactly the same mip level calculation as what’s described in the reference.
Even Sharper
Q: It still looks kind of blurry to me. What can I do to make it sharper?
A: You can stick with just biasing, or you can try a slightly higher bias with RGSS. You might also try scaling the offset values a little up or down. Since I’ve been doing VR, reducing aliasing has been my primary concern. For your use case some aliasing for the benefit of some added sharpness may be better. Use whatever you find looks good to you.
Max Anisotropic Filtering Quality
Q: Why are all the images using Anisotropic Filtering set to 8x rather than 16x if the performance is the same?
A: Because that’s the default Unity uses when you set Anisotropic Filtering to Forced On in the Quality settings. I was manually setting the level on the texture, but I’m making the assumption 8x is the quality level most users will end up seeing / using. Also, as mentioned earlier, 16x may not actually be free over 8x on all platforms or scenes.
Q: What about the Anisotropic quality level comparison animation. Why only power of 2 values?
A: Because those are the most common quality levels that GPUs from the last decade support. Some GPUs support all even numbers between 2 and 16, some support only power of 2 values between 2 and 16. I think some have supported any number, and/or up to 32, but not anything recent that I know of.
Unity’s Shader Graph
Q: How do I implement this in Shader Graph?
A: As of right now, not easily. Shader Graph doesn’t have a Sample Texture 2D Bias node. The closest is the Sample Texture 2D LOD node, which uses the equivalent of tex2Dlod(), but that requires the user to calculate the mip level manually. It also disables Anisotropic filtering. You can use a Custom node and write the HLSL manually, but there’s currently no way to use the sampler state of the texture (what you control when setting the filter settings on a texture asset) in a Custom node. So you have to use a Sampler State node, and that means you again can’t use Anisotropic filtering since they don’t offer that option for that node. Hopefully at some point they’ll add a Sample Texture 2D Bias node, or the option of using a texture’s existing sampler state. There’s no reason either of those couldn’t be added.
So the only way currently to do this with Shader Graph is to set the bias on the Texture itself and use the existing Sampler Texture 2D node. Workable, but won’t work on iOS or MacOS when using Metal.
If you’re doing a 2D game and aren’t doing too much non-uniform scaling the lack of Anisotropic filtering may be a non-issue for you. In that case I’d say use a Custom node.
Temporal Jittering
For some small, but free, additional perceived visual quality you can jitter the rotated grid. For VR projects I pass the frame count to shaders and flip the x component of the uvOffsets value. It’s not huge, and for some people it may make things just look like they’re vibrating, but on small text I found it could help with clarity.
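As a small sketch of what that looks like, assuming a _FrameCount value passed to the shader from script (it’s not a built-in):
// Temporal jitter for the rotated grid: flip the x offset on alternating frames
// so the 4 sample positions shuffle slightly over time. _FrameCount is assumed
// to be set from script every frame; it is not a Unity built-in.
float2 uvOffsets = float2(0.125, 0.375);
uvOffsets.x *= (fmod(_FrameCount, 2.0) < 1.0) ? 1.0 : -1.0;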
Quad Antialiasing
Q: What anti-aliasing are you using to get the edges of the quad looking so nice?
A: None. I’m cheating and the quad is using a different texture than the ground plane. It is a 512x512 texture with only the middle 256x256 of the grid visible. I wanted to avoid using MSAA or any kind of full screen supersampling or post processing anti-aliasing that might affect other parts of the image quality. Plus it shows how effective it is at keeping sharp edges for sprites.
Ground Truth
As with most things, the definition of “ground truth” is fuzzier than you might expect. I mentioned above that the ground truth images I present are using 6400 anisotropic samples per pixel. This is actually a 1.5 pixel wide 80x80 ordered grid with the contribution of each pixel scaled by a smoothstep curve. What makes that ground truth? Why 1.5 pixels? Why 80x80? Why smoothstep?
First, it’s not really ground truth. I’m a fan of using scare quotes in my articles because they really are appropriate. Beyond my comment about needing more samples, it’s just chasing some version of “ground truth” defined as “something that looks nice”. Perhaps defined as “aiming for high quality offline rendering”.
I’m using a 1.5 pixel window because this reduces aliasing. It’s closer to the kind of filtering movies use as it makes a much smoother image. Using a Gaussian or Catmull-Rom curve gives the centered samples greater impact than those furthest away, helping with perceived sharpness. I’m calculating a smoothstep individually along the x and y axes and multiplying them together, which is a cheap approximation of a Gaussian.
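Roughly, the per sample weight looks like the function below. The exact window size and falloff were picked by eye for this article, so treat the numbers as descriptive rather than prescriptive.
// Separable smoothstep falloff used as a cheap stand-in for a Gaussian.
// offset is the sample's position relative to the pixel center, in pixels;
// the window is 1.5 pixels wide, so the half width is 0.75.
float SampleWeight(float2 offset)
{
    float wx = 1.0 - smoothstep(0.0, 0.75, abs(offset.x));
    float wy = 1.0 - smoothstep(0.0, 0.75, abs(offset.y));
    return wx * wy;
}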
80x80? Really that comes down to the fact that going any higher was causing Unity to freeze up due to the GPU time being so significant. At one point I used a 256x256 OGSS, but it took multiple seconds per frame, and eventually locked up my GPU.
Notes:
- ^This is actually calculated for each 2x2 pixel quad using screen space partial derivatives. GPUs always render the 4 pixels of each 2x2 group at the same time, and can compare the values calculated between them. By default, derivative functions like ddx and ddy return the same value for all 4 pixels in the group, as they’re only calculated using the first pixel in the group and the pixels to the side and below, leaving the last pixel’s values ignored. Using ddx_fine instead will get you the “real” per pixel derivatives, actually taking into account that last pixel, but they’re still limited to the derivatives within the pixel quad.
- ^As an aside, that small amount of blurring on the floor is actually due to modern GPUs using an approximation of Anisotropic filtering rather than the reference implementation. Almost every GPU family uses a slightly different approximation, with the actual implementation details being something of a trade secret. To be fair to them, the above texture is just about the worst case scenario, and with modern approximations there’s very little appreciable difference between what they’re doing and the “real thing”.