Following AMD's lead with Mantle, Microsoft aims to provide console-level efficiency with "closer to the metal" access to hardware resources as well as reduced CPU and graphics driver overhead. As was the case with Mantle, most of the DX12 performance improvements are achieved through low-level programming, which allows developers to use resources more efficiently and reduce the single-threaded CPU bottlenecking caused by abstraction through higher-level APIs.

Finally, we have what looks to be a very accurate means of gauging DX12 performance to see just what it means for the future of PC gaming. Stardock recently provided gamers with Steam Early Access to Ashes of the Singularity, one of the first games to use DX12. This new real-time strategy game has been developed on Oxide's Nitrous engine and looks very much like Supreme Commander, though the two games are in no way related. Supreme Commander became popular for its large-scale battles that weren't seen in other RTS games, though the mass of units often took its toll, and gamers planning on big battles required the latest and greatest hardware, particularly on the CPU front.
As impressive as the Supreme Commander skirmishes were, Stardock doesn't believe they could be described as battles due to the limited number of units on screen. Ashes of the Singularity takes things to the next level thanks to advancements in multi-core processors, low-level APIs, 64-bit computing and the invention of a new type of 3D engine. The game could be described as a war across an entire world without abstraction: thousands or even tens of thousands of individual actors can engage in dozens of battles simultaneously.

As mentioned, anyone can currently pre-order Ashes of the Singularity and gain early access to the game. This affords us the opportunity to do some DX12 benchmarking, and as luck would have it, there is a remarkably detailed built-in benchmark to boot. The benchmark was initially designed as a developer tool for internal testing and therefore does an excellent job of reproducing the conditions gamers can expect to find when playing Ashes.

Before jumping to the testing methodology and, more crucially, the benchmark results, we're taking a moment to get up to speed with everything that's happened regarding Ashes of the Singularity's DX12 performance since it was initially tested a few months ago…
Nvidia overstates DX12 support
The latest Maxwell architecture is very efficient in DirectX 11 titles, and we have often found that compared to AMD's current-generation cards, Nvidia generally comes out well on top in the performance-per-watt battle. Nvidia's DX11-supporting architectures have been great at serial scheduling of workloads; in fact, anything prior to Maxwell has been limited to serial rather than parallel scheduling. That made sense, as DirectX 11 is suited to serial scheduling, which is why Nvidia has held the advantage. However, the opposite has been hinted at when testing upcoming DX12 titles.

Despite Maxwell-based GPUs being advertised as fully DirectX 12 compliant, this might not be entirely the case. It was discovered that the new asynchronous compute feature doesn't work correctly on Maxwell GPUs, despite Nvidia advertising the feature for the 900 series. Maxwell doesn't provide hardware asynchronous compute support; instead, Nvidia patched in support at the driver level, which comes at a performance cost. AMD, on the other hand, offers hardware-based asynchronous compute in the GCN architecture, giving it an advantage in certain DirectX 12 benchmarks and games.

Maxwell's asynchronous engine can queue up 31 compute tasks and 1 graphics task. Compare that to AMD's GCN 1.1/1.2, which has 8 asynchronous compute engines, each able to queue 8 compute tasks for a total of 64, alongside 1 graphics task. Again, keep in mind that Maxwell queues in software while GCN 1.1/1.2 queues in hardware.

Adding fuel to the fire, Oxide claims that Nvidia pressured them not to include the asynchronous compute feature in their benchmark, which would remove the GeForce 900 series' disadvantage from the equation when competing against AMD's DirectX 12 compliant GCN architecture.
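For context on what asynchronous compute looks like from the developer's side, below is a minimal, hypothetical D3D12 sketch showing how an engine can create separate graphics and compute queues. This is not Oxide's code; device creation and error handling are omitted, and the function and variable names are our own. DX12 merely exposes the queues. Whether work submitted to them actually runs concurrently depends on how the GPU and driver schedule it, which is exactly where GCN's hardware schedulers and Maxwell's software approach differ.

```cpp
// Hypothetical sketch: creating a graphics (direct) queue and a dedicated
// compute queue in Direct3D 12. Assumes `device` was created elsewhere.
#include <d3d12.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

void CreateQueues(ID3D12Device* device,
                  ComPtr<ID3D12CommandQueue>& graphicsQueue,
                  ComPtr<ID3D12CommandQueue>& computeQueue)
{
    // The "direct" queue accepts graphics, compute and copy work.
    D3D12_COMMAND_QUEUE_DESC graphicsDesc = {};
    graphicsDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
    device->CreateCommandQueue(&graphicsDesc, IID_PPV_ARGS(&graphicsQueue));

    // A separate compute queue lets compute workloads (lighting, physics,
    // post-processing, etc.) be submitted independently of rendering.
    // Hardware schedulers such as GCN's ACEs can interleave the two streams;
    // a software-scheduled implementation may end up serializing them.
    D3D12_COMMAND_QUEUE_DESC computeDesc = {};
    computeDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    device->CreateCommandQueue(&computeDesc, IID_PPV_ARGS(&computeQueue));
}
```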
AMD’s weak DX11 low-resolution performance
Years of DX11 benchmarking have shown us that resolutions of 1080p and below tend to favor Nvidia GPUs. But that's not quite what's happening: it's AMD's GCN-based GPUs that do poorly at lower resolutions. This is down to the fact that the GCN architecture is geared towards parallelism and relies on the CPU to feed it data. That creates a CPU bottleneck, as DX11 can only utilize up to two CPU cores for the graphics pipeline, and those same cores also have to handle things such as AI and physics.

This CPU bottleneck is the reason why we often see Intel's Core i3 processors challenging AMD's flagship FX-9590. The more efficient dual-core Core i3 with Hyper-Threading is able to take on the better-equipped FX because the FX's extra cores aren't being properly utilized.

That is why AMD started pushing the Mantle API back in 2013 as an alternative to Direct3D and OpenGL. Mantle allows AMD GPUs to be fed in parallel, which suits their asynchronous compute engines, designed to split complex workloads into smaller, easier-to-handle tasks. A low-level API sketch of what "feeding the GPU in parallel" means in practice follows below.
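The sketch below is a hypothetical D3D12 example (not Mantle, and not taken from any shipping engine) of the pattern low-level APIs enable: several worker threads record their own command lists, and a single thread submits them all in one call, so no single core has to serialize every draw call the way a DX11 immediate context largely does. The function name, thread count and scene split are our own assumptions, and error handling plus actual draw recording are omitted.

```cpp
// Hypothetical sketch: multi-threaded command list recording in Direct3D 12.
// Assumes `device` and `queue` were created elsewhere; PSO/root signature
// setup and the actual draw calls are omitted for brevity.
#include <d3d12.h>
#include <wrl/client.h>
#include <thread>
#include <vector>

using Microsoft::WRL::ComPtr;

void SubmitFrameInParallel(ID3D12Device* device, ID3D12CommandQueue* queue,
                           unsigned workerCount)
{
    std::vector<ComPtr<ID3D12CommandAllocator>>    allocators(workerCount);
    std::vector<ComPtr<ID3D12GraphicsCommandList>> lists(workerCount);
    std::vector<std::thread>                       workers;

    // Each worker gets its own allocator and command list.
    for (unsigned i = 0; i < workerCount; ++i)
    {
        device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT,
                                       IID_PPV_ARGS(&allocators[i]));
        device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT,
                                  allocators[i].Get(), nullptr,
                                  IID_PPV_ARGS(&lists[i]));
    }

    // Workers record their slice of the scene concurrently on separate cores.
    for (unsigned i = 0; i < workerCount; ++i)
    {
        workers.emplace_back([&, i]
        {
            // ... record the draw calls for this worker's units/effects ...
            lists[i]->Close();
        });
    }
    for (auto& t : workers)
        t.join();

    // One thread hands all recorded lists to the GPU in a single submission.
    std::vector<ID3D12CommandList*> raw;
    for (auto& l : lists)
        raw.push_back(l.Get());
    queue->ExecuteCommandLists(static_cast<UINT>(raw.size()), raw.data());
}
```

The point of the pattern is that command recording, typically the most CPU-expensive part of issuing thousands of draw calls, scales across cores, while submission remains a cheap, single call.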