- render multi-sampled G-buffers
- perform the lighting ops on an ordinary (non-multisampled) buffer, calculating the lighting for several sub-samples and averaging them.
- continue as usual
From a quality perspective...that seems to be really good, because the lighting is what causes some of the aliasing by itself. In reality, because of the nature of HW-MSAA, we are just wasting performance here, because we often run identical calculations on fully identical sub-samples. DX11 will help with that, but until then...you can only reduce the (unneeded) performance impact by sharing (or distributing) some computations between sub-samples. For example, a simple trick: when you are doing some kind of PCF, instead of sampling the shadow map 16 times for each sub-sample, you can sample it only 4 times for each sub-sample, with each sub-sample using its own subset of the original 16 samples. With a careful selection of the sampling kernel, it looks almost identical to fully super-sampled lighting.
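A minimal sketch of that PCF trick, written in Python rather than shader code so it can be run as-is. The 4x4 kernel layout, the interleaved split and every function name here are assumptions made for illustration, not the kernel actually used:

```python
# Hypothetical 4x4 PCF kernel: tap offsets in shadow-map texels.
KERNEL = [(x - 1.5, y - 1.5) for y in range(4) for x in range(4)]


def subset_for_subsample(i):
    """Return the 4 taps assigned to sub-sample i (0..3).

    Interleaved split (an assumption): each sub-sample takes every second
    tap in x and y, so every subset still spans the whole kernel footprint.
    """
    dx, dy = i % 2, i // 2
    return [(x - 1.5, y - 1.5) for y in range(dy, 4, 2) for x in range(dx, 4, 2)]


def pcf(shadow_test, center, taps):
    """Average a 0/1 shadow test over the given taps around `center`."""
    cx, cy = center
    return sum(shadow_test(cx + ox, cy + oy) for ox, oy in taps) / len(taps)


def lit_fraction_distributed(shadow_test, center):
    """4 taps per sub-sample, then average the 4 sub-samples (the resolve)."""
    per_subsample = [pcf(shadow_test, center, subset_for_subsample(i))
                     for i in range(4)]
    return sum(per_subsample) / len(per_subsample)
```

For a pixel where all four sub-samples belong to the same surface, the four subsets together cover the original 16 taps, so the resolved value matches the full 16-tap PCF at a quarter of the per-sub-sample cost.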
How to utilize MSAA on DX9 hardware?
OK, everybody understands that in MSAA mode the hardware still executes the pixel shader only once per final pixel, although it calculates the specified number of Z-samples attached to that "colour" fragment and sends all of that down to the ROPs/render back-end. So, in most cases the final picture consists of exactly the same data as the non-AA picture, although some pixels are "blended" with their neighbours (actually, that's not quite the case, but you get the idea).
The trick here is to find which fragments have contributed to the current fragment, and by what amount. I found two ways that look "convincing"; here is the pipeline:
- render everything (including lighting) as usual including the final shading/combine pass, but not including the post-processing stages.
- re-render all the on-screen geometry to an MSAA-RT while projecting and sampling the texture from stage #1, using either a) the cheap way: take the direction obtained from the difference between the interpolated texture coordinate and the same texture coordinate with the "_centroid" modifier applied, or b) interpolate the fragment depth as well and search for the closest Z in the nearby samples
- resolve your MSAA-RT and continue as usual
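Variant a) of the second step can be read in more than one way. The sketch below shows one possible reading, in Python so it can be run as-is: when the centroid coordinate differs from the center coordinate, the fragment does not cover the pixel center, so the stage #1 texture is sampled one texel further in the direction of the centroid. The function name, the one-texel step and the `eps` threshold are all assumptions:

```python
def stage1_lookup_uv(uv_center, uv_centroid, texel_size, eps=1e-6):
    """Return the UV at which to sample the stage #1 colour texture.

    uv_center   -- screen UV interpolated at the pixel center
    uv_centroid -- the same UV interpolated with the centroid modifier
    texel_size  -- (1 / width, 1 / height) of the stage #1 texture
    """
    def step(delta, texel):
        # No shift when center and centroid agree (assumed threshold eps).
        if abs(delta) <= eps:
            return 0.0
        # Otherwise move one texel towards the covered part of the pixel.
        return texel if delta > 0.0 else -texel

    du = uv_centroid[0] - uv_center[0]
    dv = uv_centroid[1] - uv_center[1]
    return (uv_center[0] + step(du, texel_size[0]),
            uv_center[1] + step(dv, texel_size[1]))
```

Interior pixels, where both coordinates agree, simply read their own texel; only partially covered edge pixels are redirected.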
Now the downsides:
2.a. suffers from a hardware trick/cheat/bug that all the IHVs I know have implemented. When your pixel/fragment actually covers just two sub-samples, the centroid modifier will correctly give you the location between those two. Obviously, one covered sub-sample is handled correctly as well. But when your fragment covers three sub-samples out of four, they give us not the actual centroid position, but the pixel center instead! Shame on them!
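To make the difference concrete, here is a small Python model of the behaviour just described. The sub-sample positions are an assumed 4x rotated-grid pattern (real hardware patterns differ), and `reported_centroid` only mimics the three-out-of-four case as stated above; it is not taken from any vendor documentation:

```python
# Assumed 4x rotated-grid sub-sample offsets from the pixel center, in pixels.
SAMPLES = [(-0.125, -0.375), (0.375, -0.125), (-0.375, 0.125), (0.125, 0.375)]


def true_centroid(mask):
    """Average position of the covered sub-samples (mask: four 0/1 flags)."""
    covered = [s for s, bit in zip(SAMPLES, mask) if bit]
    if not covered:
        raise ValueError("a fragment always covers at least one sub-sample")
    n = len(covered)
    return (sum(x for x, _ in covered) / n, sum(y for _, y in covered) / n)


def reported_centroid(mask):
    """Model of the described hardware behaviour.

    One or two covered sub-samples: the real centroid.
    Three or four covered sub-samples: the pixel center.
    """
    if sum(mask) >= 3:
        return (0.0, 0.0)
    return true_centroid(mask)
```

With the mask `(1, 1, 1, 0)` the real centroid is shifted away from the uncovered sub-sample, while the modelled hardware value is `(0.0, 0.0)`, so method 2.a. sees no direction at all for that fragment.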
Another downside is polygon intersections. In those places the algorithm behaves exactly like nVidia CSAA, and you'll get little-to-no AA there.
2.b. suffers from precision: for it to work correctly, the depth values from the nearby samples have to be really precise. FP16 doesn't work. FP32 / D24X8 works, but you still have to slightly bias/scale the difference towards the current/center sample to avoid a lot of artifacts from incorrectly selected samples. A little side note: all IHVs provide us with some way to read the depth buffer directly under DX9, although that's "cheat" territory of graphics programming.
And another downside of 2.b.: performance. For 4X-AA you'll have to take 5 depth-samples + 1 final color sample for each fragment + some math - and that is not free either.
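The closest-Z search of 2.b. with its bias towards the center sample can be sketched roughly like this, again in Python for the sake of a runnable example. The function name, the ordering of the five depth samples and the value of `center_bias` are assumptions; which four neighbours to fetch is left open here:

```python
def pick_source_pixel(frag_depth, depths, center_bias=0.9):
    """Pick which stage #1 pixel to take the colour from.

    frag_depth  -- depth interpolated for the current MSAA fragment
    depths      -- 5 depths from the stage #1 depth buffer:
                   index 0 is the current/center pixel, 1..4 its neighbours
    center_bias -- assumed factor (< 1) that scales the center difference,
                   so the center sample wins near-ties
    Returns the index of the chosen sample.
    """
    best = 0
    best_err = abs(depths[0] - frag_depth) * center_bias
    for i in range(1, len(depths)):
        err = abs(depths[i] - frag_depth)
        if err < best_err:
            best, best_err = i, err
    return best
```

The single colour fetch then goes to the pixel with the returned index, which gives the 5 depth samples + 1 colour sample per fragment mentioned above.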
Conclusion:
I don't really know why everybody wants to use hardware MSAA for anti-aliasing, aside from performance. The only good-looking anti-aliasing filter is super-sampling with jittered/rotated sub-samples.
But on DX9 HW (or X360) method 2.a. is relatively cheap and looks good. 2.b. looks better but is too slow for some hardware (X360, for example). For PS3 it's probably faster to use the DX10-style 2X-AA, because the cost of re-rendering the whole visible geometry again would be too high for it.
Anyway, hardware MSAA is just one method of approximating an anti-aliased image among thousands of others.
Stay tuned :)