GeForce GTX 980 Whitepaper 
GM204 HARDWARE ARCHITECTURE 
IN-DEPTH 
 
 
 
from 32 to 64. Again, thanks to the added benefit of higher clocks, pixel fill-rate is actually more than 
double that of GTX 680: 72 Gpixels/sec for GTX 980 versus 32.2 Gpixels/sec for GTX 680. 
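The fill-rate figures above can be reproduced with a short calculation. This is a minimal sketch that assumes the "32 to 64" refers to ROP count, that each ROP writes one pixel per clock, and that the base clocks are 1126 MHz (GTX 980) and 1006 MHz (GTX 680). The clock values and the function name are assumptions and are not stated in this section.

```python
# Peak pixel fill rate = ROP count x core clock (assuming one pixel per ROP per clock).
# The 32 and 64 counts come from the text; the base clocks below are assumed values.

def pixel_fill_rate_gpixels(rops: int, clock_mhz: float) -> float:
    """Return the peak pixel fill rate in Gpixels/sec."""
    return rops * clock_mhz / 1000.0

gtx_980 = pixel_fill_rate_gpixels(rops=64, clock_mhz=1126)  # assumed base clock
gtx_680 = pixel_fill_rate_gpixels(rops=32, clock_mhz=1006)  # assumed base clock

print(f"GTX 980: {gtx_980:.1f} Gpixels/sec")
print(f"GTX 680: {gtx_680:.1f} Gpixels/sec")
print(f"Ratio:   {gtx_980 / gtx_680:.2f}x")
```

Under these assumed clocks the ratio comes out above 2x, which is consistent with the "more than double" claim: doubling the ROP count alone gives exactly 2x, and the higher clock supplies the rest.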
The memory subsystem has also been significantly revamped. GTX 980’s memory clock is over 15% 
higher than GTX 680’s, and GM204’s cache is larger and more efficient than Kepler’s, reducing the 
number of memory requests that must go to DRAM. Improvements in our implementation of 
memory compression further reduce DRAM traffic, effectively amplifying the raw DRAM bandwidth 
in the system.  
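As a rough illustration of the bandwidth reasoning above, the sketch below computes raw DRAM bandwidth from bus width and data rate, then shows how a reduction in DRAM traffic acts like extra bandwidth. The 256-bit bus and the 7 Gbps and 6 Gbps data rates are assumptions not stated in this section, and the 25% traffic saving is purely an illustrative placeholder, not a measured figure.

```python
# Raw DRAM bandwidth = (bus width in bytes) x (effective data rate per pin).
# Bus width and data rates are assumed values, not taken from this section.

def dram_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Return raw DRAM bandwidth in GB/sec."""
    return bus_width_bits / 8 * data_rate_gbps

def effective_bandwidth_gb_s(raw_gb_s: float, traffic_saved_fraction: float) -> float:
    """Bandwidth a design without the savings would need to move the same data."""
    return raw_gb_s / (1.0 - traffic_saved_fraction)

BUS_WIDTH_BITS = 256                                    # assumed for both cards
gtx_980_bw = dram_bandwidth_gb_s(BUS_WIDTH_BITS, 7.0)   # assumed 7 Gbps
gtx_680_bw = dram_bandwidth_gb_s(BUS_WIDTH_BITS, 6.0)   # assumed 6 Gbps
clock_gain = 7.0 / 6.0 - 1.0                            # relative memory clock increase

# Hypothetical saving from caching and compression, for illustration only.
effective = effective_bandwidth_gb_s(gtx_980_bw, traffic_saved_fraction=0.25)

print(f"GTX 980 raw: {gtx_980_bw:.0f} GB/sec, GTX 680 raw: {gtx_680_bw:.0f} GB/sec")
print(f"Memory clock gain: {clock_gain:.1%}")
print(f"Effective bandwidth at 25% less traffic: {effective:.1f} GB/sec")
```

The second function captures the "amplification" idea: if fewer bytes have to cross the DRAM interface for the same work, the same raw bandwidth behaves like a proportionally faster memory system.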
Maxwell Streaming Multiprocessor 
The SM is the heart of our GPUs. Almost every operation flows through the SM at some point in the 
rendering pipeline. Maxwell GPUs feature a new SM that’s been designed to provide dramatically 
better performance per watt than prior GeForce GPUs.  
Compared to GPUs based on our Kepler architecture, Maxwell’s new SMM design has been 
reconfigured to improve efficiency. Each SMM contains four warp schedulers, and each warp 
scheduler is capable of dispatching two instructions per warp every clock. Compared to Kepler’s 
scheduling logic, we’ve integrated a number of improvements in the scheduler to further reduce 
redundant re-computation of scheduling decisions, improving energy efficiency. We’ve also 
integrated a completely new datapath organization. Whereas Kepler’s SM shipped with 192 CUDA 
Cores (a non-power-of-two organization), the Maxwell SMM is partitioned into four distinct 
32-CUDA core processing blocks (128 CUDA cores total per SMM), each with its own dedicated 
resources for scheduling and instruction buffering. This new configuration in Maxwell aligns with 
warp size, making it easier to utilize efficiently and saving area 
Figure 3: GM204 SMM Diagram (GM204 also features 4 DP units per 
SMM, which are not depicted on this diagram)
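The partitioning described in this section can be checked with simple arithmetic. The sketch below is illustrative only: the `SMConfig` class and its fields are hypothetical names, the warp size of 32 threads is the standard value for NVIDIA GPUs, and the Kepler scheduler count (four) and dual dispatch per scheduler are assumptions not stated in this section.

```python
from dataclasses import dataclass

WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


@dataclass(frozen=True)
class SMConfig:
    """Hypothetical summary of one SM's scheduling resources."""
    name: str
    cuda_cores: int
    warp_schedulers: int
    dispatch_per_scheduler: int

    @property
    def cores_per_scheduler(self) -> float:
        return self.cuda_cores / self.warp_schedulers

    @property
    def aligned_with_warp(self) -> bool:
        # True when every scheduler can own a whole number of warp-wide core blocks.
        return self.cuda_cores % (self.warp_schedulers * WARP_SIZE) == 0

    @property
    def peak_dispatch_per_clock(self) -> int:
        return self.warp_schedulers * self.dispatch_per_scheduler


kepler = SMConfig("Kepler SM", cuda_cores=192, warp_schedulers=4,
                  dispatch_per_scheduler=2)   # scheduler figures assumed
maxwell = SMConfig("Maxwell SMM", cuda_cores=128, warp_schedulers=4,
                   dispatch_per_scheduler=2)  # figures from the text

for sm in (kepler, maxwell):
    print(f"{sm.name}: {sm.cores_per_scheduler:.0f} cores per scheduler, "
          f"warp-aligned={sm.aligned_with_warp}, "
          f"peak dispatch={sm.peak_dispatch_per_clock} instr/clock")
```

With 128 cores and four schedulers, each scheduler maps to exactly one warp-wide block of 32 cores, whereas 192 cores split four ways gives 48 per scheduler, which is not a whole multiple of the warp size.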