Apple describes the GPU improvements in the A17 Pro and M3. Metal API-enabled apps and games target specific functions of Apple Silicon GPUs, which get even better with major increases in parallel processing in M3 and A17 Pro. This is how it works.
Apple gave a developer session on these new Apple Silicon GPU features, outlining exactly what’s going on to produce better outcomes. The video goes into tremendous technical detail, but it also provides enough information to explain in layman’s terms.
Developers using the Metal API do not need to make any changes to their apps to enjoy the performance benefits of the M3 and A17 Pro. These chips make the GPU more performant than ever before through Dynamic Caching, hardware-accelerated ray tracing, and hardware-accelerated mesh shading.
Dynamic Shader Core Memory
Dynamic Caching is enabled by a next-generation shader core. On the latest GPU cores in the A17 Pro and M3, shaders can run in parallel far more effectively than before, significantly boosting performance.
Normally, a GPU allocates register memory for the entire duration of a shader, sized for the most register-hungry point in that shader. As a result, if one part of a shader needs much more register memory than the rest, the shader holds that peak amount for its whole run, even though most of it sits unused most of the time.
Dynamic Caching enables the GPU to allocate just the right amount of register memory for each activity it performs. The previously inaccessible register memory is released, allowing many more shader tasks to run concurrently.
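To make the difference concrete, here is a minimal sketch in Python of a toy occupancy model. It is not Apple's allocator: the register budget, the per-phase demands, and both function names are hypothetical, and the dynamic case simply assumes that resident shaders are spread evenly across their phases.

```python
# Toy model: how many copies of a shader fit in a fixed register budget.
# All numbers are illustrative assumptions, not Apple hardware figures.

REGISTER_BUDGET = 1024  # hypothetical registers shared by resident shaders

# Hypothetical shader: register demand in each phase of its execution.
PHASE_DEMAND = [16, 16, 128, 16]


def static_occupancy(phases, budget):
    """Each shader reserves its peak demand for its whole lifetime."""
    return budget // max(phases)


def dynamic_occupancy(phases, budget):
    """Each shader holds only what its current phase needs.

    Assumes shaders are spread evenly across phases, so the average
    demand approximates the steady-state cost per resident shader.
    """
    average = sum(phases) / len(phases)
    return int(budget // average)


print(static_occupancy(PHASE_DEMAND, REGISTER_BUDGET))   # 8
print(dynamic_occupancy(PHASE_DEMAND, REGISTER_BUDGET))  # 23
```

In this made-up case, one short register-heavy phase limits the static scheme to 8 resident shaders, while the per-phase scheme fits 23 in the same budget.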
Flexible On-Chip Memory
Previously, on-chip memory had fixed memory allocations for register, thread group, and tile memory, together with a buffer cache. This meant that if an action consumed more of one type of memory than another, significant amounts of memory were left unused.
With flexible on-chip memory, all of the on-chip memory is a cache that can be used for any memory type. As a result, an action that relies primarily on thread group memory can use the full span of on-chip memory and even spill over into main memory.
To maximise performance, the shader core dynamically modifies on-chip memory occupancy. This means that developers will have to spend less effort optimising occupancy.
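The contrast between fixed partitions and a shared pool can be sketched with another toy model in Python. The partition sizes, the workload, and the function names are all hypothetical, and real hardware manages this as a cache rather than with simple arithmetic.

```python
# Toy model: fixed per-type partitions versus one shared on-chip pool.
# Sizes are arbitrary units and purely illustrative assumptions.

FIXED_PARTITIONS = {"register": 64, "threadgroup": 64, "tile": 64, "buffer_cache": 64}


def spilled_fixed(demand, partitions):
    """Units that do not fit when every memory type has its own fixed slice."""
    return sum(max(0, demand.get(kind, 0) - size) for kind, size in partitions.items())


def spilled_flexible(demand, partitions):
    """Units that do not fit when the same total capacity is one shared pool."""
    capacity = sum(partitions.values())
    return max(0, sum(demand.values()) - capacity)


# Hypothetical workload that leans heavily on thread group memory.
workload = {"register": 32, "threadgroup": 160, "tile": 0, "buffer_cache": 16}

print(spilled_fixed(workload, FIXED_PARTITIONS))     # 96
print(spilled_flexible(workload, FIXED_PARTITIONS))  # 0
```

With fixed slices, 96 units of thread group data do not fit even though most of the other slices are empty. With a shared pool of the same total size, the whole workload stays on chip.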
High-performance ALU pipelines in Shader Core
Apple advises developers to use FP16 math in their programs. The high-performance ALUs, however, can work on integer, FP32, and FP16 instructions in tandem. Because those instructions come from many actions running concurrently, ALU utilisation improves as occupancy increases.
Essentially, if multiple actions contain the same FP32 or FP16 instructions that are executed at various times, the executions can be overlapped to boost parallelism.
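The overlap described above can be sketched with a small scheduling model in Python. This is an illustration under stated assumptions rather than a description of Apple's scheduler: each hypothetical pipeline accepts one instruction per cycle, each group issues its instructions in order, and every instruction takes one cycle.

```python
from collections import deque

# Toy model: three ALU pipelines, each accepting one instruction per cycle.
# Groups issue in order, so only the head instruction of a group is eligible.


def cycles_to_finish(groups):
    """Count cycles needed to issue every instruction in every group."""
    queues = [deque(group) for group in groups]
    cycles = 0
    while any(queues):
        busy = set()  # pipelines already claimed in this cycle
        for queue in queues:
            if queue and queue[0] not in busy:
                busy.add(queue.popleft())
        cycles += 1
    return cycles


# Hypothetical instruction stream shared by every group.
stream = ["fp32", "fp16", "int", "fp16"]

print(cycles_to_finish([stream]))          # 4
print(cycles_to_finish([stream, stream]))  # 5
```

One group needs 4 cycles for 4 instructions. Two groups need only 5 cycles for 8 instructions, because the second group's FP32 instruction runs while the first group is busy with FP16 work.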
Hardware-accelerated graphics pipelines
Hardware-accelerated ray tracing speeds up the process by moving the critical intersection computations out of the GPU function and into dedicated hardware. Because that hardware handles a portion of the calculations, more operations can be performed in parallel.
Hardware-accelerated mesh shading uses a similar mechanism. It takes the middle of the geometry computation pipeline and routes it to a specialised unit, allowing more work to run in parallel.
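For a sense of what an intersection computation involves, here is a minimal Python sketch of the well-known Möller–Trumbore ray-triangle test. It is a standard software algorithm used purely as an illustration. The article does not say which method Apple's hardware uses, and the function names here are hypothetical.

```python
# Möller–Trumbore ray-triangle intersection, written in plain Python.
# Illustrative only: not a description of Apple's hardware.

EPSILON = 1e-9


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_triangle_distance(origin, direction, v0, v1, v2):
    """Return the distance along the ray to the triangle, or None on a miss."""
    edge1 = sub(v1, v0)
    edge2 = sub(v2, v0)
    pvec = cross(direction, edge2)
    det = dot(edge1, pvec)
    if abs(det) < EPSILON:
        return None  # ray is parallel to the triangle's plane
    inv_det = 1.0 / det
    tvec = sub(origin, v0)
    u = dot(tvec, pvec) * inv_det
    if u < 0.0 or u > 1.0:
        return None  # outside the triangle along the first edge
    qvec = cross(tvec, edge1)
    v = dot(direction, qvec) * inv_det
    if v < 0.0 or u + v > 1.0:
        return None  # outside the triangle along the second edge
    t = dot(edge2, qvec) * inv_det
    return t if t > EPSILON else None  # ignore hits behind the origin


triangle = ((0, 0, 5), (1, 0, 5), (0, 1, 5))
print(ray_triangle_distance((0.25, 0.25, 0), (0, 0, 1), *triangle))  # 5.0
print(ray_triangle_distance((2, 2, 0), (0, 0, 1), *triangle))        # None
```

A renderer repeats a test like this for a very large number of ray and triangle pairs per frame, which is why handing it to dedicated hardware frees the shader cores for other work.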
These are complex systems that cannot be summarised in a few paragraphs. We recommend watching the video for the full details, but keep in mind that the A17 Pro and M3 rely on parallelism to accelerate processing.
The M3 is offered in MacBook Pro and 24-inch iMac configurations. The A17 Pro can be found in the iPhone 15 Pro.