Ah, yeah. I forgot a step for OpenGL.
// perspective divide to get the normalized device depth
outDepth = clipPos.z / clipPos.w;
#if !defined(UNITY_REVERSED_Z)
// OpenGL-style clip space has z in the -1 to +1 range after the
// divide, so remap it to the 0 to 1 range the depth output expects
outDepth = outDepth * 0.5 + 0.5;
#endif
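Put together, the depth calculation can live in a small helper, something along these lines (ComputeSphereDepth is just a name for this sketch, and surfacePos is assumed to be the ray traced sphere surface position in object space):

// sketch of a helper that turns an object space surface position
// into the value to write to SV_Depth
float ComputeSphereDepth(float3 surfacePos)
{
    // transform to clip space the same way the rasterizer would
    float4 clipPos = UnityObjectToClipPos(surfacePos);
    float depth = clipPos.z / clipPos.w;
#if !defined(UNITY_REVERSED_Z)
    // remap OpenGL's -1 to +1 range to 0 to 1
    depth = depth * 0.5 + 0.5;
#endif
    return depth;
}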
Though removing the depth write entirely is a totally valid option if you don't need it, especially on mobile platforms where writing depth from the fragment shader disables early depth rejection and can have a noticeable performance impact.
However, depth writing isn't the only reason why I didn't use a Surface Shader.
Not using one also:

- Removes having to deal with Surface Shaders needing a tangent space normal to be output.
- Removes the need for the quad mesh to have a normal or tangent at all, or to keep them valid after rotating the mesh towards the camera.
- Removes an extra step when doing camera facing, since Surface Shaders only let you modify the local space vertex position; you have to transform the "final" world space position back into local space, adding two unnecessary matrix transforms (see the sketch below).
- Removes the problem of "per vertex" and additional per pixel lights not working properly, because they won't take the sphere's world position into account, meaning the sphere gets lit like the quad it actually is.
And just in general it makes the whole shader much faster, since it isn't calculating or passing around data it doesn't need, which a Surface Shader will always calculate anyway.
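To illustrate the camera facing point, here's a minimal sketch of a billboard vertex function the way a Surface Shader forces you to write it (it assumes a quad mesh and ignores object scale; this isn't the exact shader from this thread):

void vert (inout appdata_full v)
{
    // object pivot in world space
    float3 worldPivot = unity_ObjectToWorld._m03_m13_m23;

    // camera right and up axes pulled from the view matrix
    float3 right = UNITY_MATRIX_V._m00_m01_m02;
    float3 up = UNITY_MATRIX_V._m10_m11_m12;

    // the "final" world space position we actually want
    float3 worldPos = worldPivot + right * v.vertex.x + up * v.vertex.y;

    // the extra transform back into local space, which Unity then
    // immediately undoes again when transforming the vertex to clip space
    v.vertex.xyz = mul(unity_WorldToObject, float4(worldPos, 1.0)).xyz;
}

In a vertex fragment shader you'd just output the clip space position from worldPos directly and skip that round trip.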
But I do know two ways to go about solving the issue. The first option is the world-to-tangent conversion function from my triplanar normal mapping Surface Shader example:
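For reference, that helper uses the WorldNormalVector macro to rebuild the per pixel tangent to world matrix and applies its transpose to bring a world space normal back into tangent space; it looks something like this (the Input struct needs to contain "float3 worldNormal; INTERNAL_DATA" for the macro to work):

half3 WorldToTangentNormalVector(Input IN, half3 normal) {
    // each call extracts one world space axis of the tangent frame
    half3 t2w0 = WorldNormalVector(IN, half3(1, 0, 0));
    half3 t2w1 = WorldNormalVector(IN, half3(0, 1, 0));
    half3 t2w2 = WorldNormalVector(IN, half3(0, 0, 1));
    // rows built from columns, i.e. the transposed (inverse) rotation
    half3x3 t2w = half3x3(t2w0, t2w1, t2w2);
    return normalize(mul(t2w, normal));
}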
The second, cheaper option is to override the vertex normal and tangent in the Surface Shader's vertex function so they always match the object space axes. Something like this:
v.normal = float3(0,0,1);
v.tangent = float4(1,0,0,1);
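For completeness, here's how that override gets wired up; the function name just has to match the vertex:vert directive on the surface pragma:

#pragma surface surf Standard vertex:vert

void vert (inout appdata_full v)
{
    // normal along object space +Z and tangent along +X; the binormal
    // Unity derives from these comes out as +Y, so the tangent space
    // frame lines up exactly with the object space axes
    v.normal = float3(0,0,1);
    v.tangent = float4(1,0,0,1);
}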
Since tangent space now matches object space, instead of calculating the normal and converting it from world space in the shader, you only need to do:
// the object space position on a unit sphere is also its normal
o.Normal = surfacePos;
And lastly, if your ambient lighting looks a little funky, force the ambient to be done per pixel.
// override Unity's default and sample the ambient SH per pixel
#undef UNITY_SAMPLE_FULL_SH_PER_PIXEL
#define UNITY_SAMPLE_FULL_SH_PER_PIXEL 1