
On-chip L3 cache and intelligent caching - IBM Power E1050

The Power10 processor includes a large on-chip L3 cache of up to 120 MB with a non-uniform cache access (NUCA) architecture that provides mechanisms to distribute and share cache footprints across a set of L3 cache regions. Each processor core can access an associated local 8 MB of L3 cache. It can also access the data in the other L3 cache regions on the chip and throughout the system.
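As a quick check of the figures above, the following sketch works out how many L3 regions a chip would have. It assumes, which the text does not state, that the maximum 120 MB consists entirely of equal 8 MB per-core regions. The function name is hypothetical.

```python
def l3_region_count(total_l3_mb: int, region_mb: int) -> int:
    """Return how many equal-sized L3 regions make up the total capacity."""
    if region_mb <= 0:
        raise ValueError("region size must be positive")
    if total_l3_mb % region_mb != 0:
        raise ValueError("total capacity is not a whole number of regions")
    return total_l3_mb // region_mb


# Figures from the text: up to 120 MB of L3, 8 MB local to each core.
regions = l3_region_count(120, 8)
print(regions)  # 15
```

Under that assumption, the 120 MB maximum corresponds to 15 regions of 8 MB each.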

Each L3 region serves as a victim cache for its associated L2 cache, and it can provide aggregate storage for the on-chip cache footprint.
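The victim-cache relationship can be illustrated with a toy model. This is a minimal sketch, not the Power10 design: it assumes plain least-recently-used (LRU) replacement in both levels, whereas the real L3 uses a replacement algorithm with data type and reuse awareness. The class name and the capacities are hypothetical.

```python
from collections import OrderedDict


class VictimCacheModel:
    """Toy L2 + L3 victim cache pair; capacities are counted in cache lines."""

    def __init__(self, l2_lines: int, l3_lines: int) -> None:
        self.l2_lines = l2_lines
        self.l3_lines = l3_lines
        self.l2 = OrderedDict()  # least recently used line first
        self.l3 = OrderedDict()  # holds only lines evicted from L2

    def access(self, addr: int) -> str:
        """Access one line address and report where it was found."""
        if addr in self.l2:
            self.l2.move_to_end(addr)  # refresh the LRU position
            return "L2 hit"
        if addr in self.l3:
            del self.l3[addr]          # the line moves back up to L2
            self._fill_l2(addr)
            return "L3 hit"
        self._fill_l2(addr)            # line is fetched from memory
        return "miss"

    def _fill_l2(self, addr: int) -> None:
        self.l2[addr] = True
        if len(self.l2) > self.l2_lines:
            victim, _ = self.l2.popitem(last=False)  # evict the LRU line from L2
            self.l3[victim] = True                   # the victim lands in L3
            if len(self.l3) > self.l3_lines:
                self.l3.popitem(last=False)          # the oldest victim is dropped


model = VictimCacheModel(l2_lines=2, l3_lines=4)
results = [model.access(a) for a in (1, 2, 3, 1)]
print(results)  # ['miss', 'miss', 'miss', 'L3 hit']
```

In the usage example, line 1 is evicted from the two-line L2 when line 3 arrives, so the later access to line 1 is served from the victim cache instead of memory.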

Intelligent L3 cache management enables the Power10 processor to optimize the access to L3 cache lines and minimize cache latencies. The L3 cache includes a replacement algorithm with data type and reuse awareness. It also supports an array of prefetch requests from the core, including instruction and data, and works cooperatively with the core, memory controller, and SMP interconnection fabric to manage prefetch traffic, which optimizes system throughput and data latency.

IBM Power E1050: Technical Overview and Introduction

The L3 cache supports the following key features:

- Enhanced bandwidth that supports up to 64 bytes per core processor cycle to each SMT8 core.

- Enhanced data prefetch that is enabled by 96 L3 prefetch request machines that service prefetch requests to memory for each SMT8 core.

- Plus-one prefetching at the memory controller for enhanced effective prefetch depth and rate.

- Power10 software prefetch modes that support fetching blocks of data into the L3 cache.

- Data access with reduced latencies.
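To put the 64 bytes per cycle figure in perspective, the sketch below converts it into a peak per-core bandwidth. The 4.0 GHz clock is an illustrative assumption, not a value from this section, and the function name is hypothetical. The result is an upper bound, not a measured rate.

```python
def peak_l3_bandwidth_gb_per_s(bytes_per_cycle: int, core_clock_ghz: float) -> float:
    """Peak bandwidth to one core in GB/s, where 1 GB = 10**9 bytes."""
    # bytes/cycle * (GHz * 1e9 cycles/s) / 1e9 bytes per GB
    return bytes_per_cycle * core_clock_ghz


# 64 bytes per cycle is from the text; 4.0 GHz is an assumed clock frequency.
print(peak_l3_bandwidth_gb_per_s(64, 4.0))  # 256.0
```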

2.1.10 Nest accelerator

The Power10 processor has an on-chip accelerator that is called the nest accelerator (NX) unit. The co-processor features that are available on the Power10 processor are similar to those on the Power9 processor. These co-processors provide specialized functions, such as the following examples:

- IBM proprietary data compression and decompression

- Industry-standard Gzip compression and decompression

- AES and SHA cryptography

- Random number generation
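For readers less familiar with these operations, the following sketch shows software equivalents of three of them by using only the Python standard library. It runs on the CPU cores. Whether any of these calls is offloaded to the NX unit depends on how the underlying system libraries are built, which this section does not specify. AES is omitted because the standard library has no AES implementation, and the payload is arbitrary sample data.

```python
import gzip
import hashlib
import os

# Arbitrary, repetitive sample data so that compression has something to gain.
payload = b"Power10 NX example payload " * 64

# Industry-standard Gzip compression and decompression.
compressed = gzip.compress(payload)
restored = gzip.decompress(compressed)

# SHA cryptography: a SHA-256 digest of the payload.
digest = hashlib.sha256(payload).hexdigest()

# Random number generation from the operating system's entropy source.
nonce = os.urandom(16)

print(len(payload), len(compressed), restored == payload)
print(digest)
print(nonce.hex())
```

These are the kinds of operations that consume CPU cycles when done in software and that the NX unit is designed to take over.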

Figure 2-5 shows a block diagram of the NX unit.

Figure 2-5 Block diagram of the NX unit

Each of the AES and SHA engines, the data compression unit, and the Gzip unit constitutes a co-processor type, and the NX unit features three co-processor types. The NX unit also includes additional hardware to support co-processor invocation by user code, usage of effective addresses, high-bandwidth storage accesses, and interrupt notification of job completion.


The direct memory access (DMA) controller of the NX unit helps to start the co-processors and move data on behalf of co-processors. An SMP interconnect unit (SIU) provides the interface between the Power10 SMP interconnect and the DMA controller.

The NX co-processors can be started transparently through library or OS kernel calls to speed up operations that are related to data compression, Live Partition Mobility (LPM), IPsec, JFS2 encrypted file systems, PKCS11 encryption, random number generation, and the recently announced logical volume encryption.

In effect, this on-chip NX unit on Power10 systems implements a high-throughput engine that can perform the equivalent work of multiple cores. The system performance can benefit by offloading these expensive operations to on-chip accelerators, which can greatly reduce the CPU usage and improve the performance of applications.

The accelerators are shared among the logical partitions (LPARs) under the control of the PowerVM hypervisor and accessed through a hypervisor call. The OS, along with the PowerVM hypervisor, provides a send address space that is unique to each process that requests co-processor access. This configuration allows the user process to directly post entries to the first-in, first-out (FIFO) queues that are associated with the NX accelerators. Each NX co-processor type has a unique receive address space corresponding to a unique FIFO for each of the accelerators.
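The queueing arrangement described above can be pictured with a small model. This is a conceptual sketch only, not the hardware or hypervisor interface: the class, method names, and co-processor type labels are hypothetical, and a Python set stands in for the per-process send address space.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    pid: int        # process that posted the entry
    payload: bytes  # stand-in for a co-processor request block


class NxQueueModel:
    """Toy model: one FIFO per co-processor type, one send window per process."""

    # Hypothetical labels for the three co-processor types.
    COPROCESSOR_TYPES = ("aes_sha", "compression", "gzip")

    def __init__(self) -> None:
        self.fifos = {ctype: deque() for ctype in self.COPROCESSOR_TYPES}
        self.send_windows = {}  # pid -> set of types the process may post to

    def open_send_window(self, pid: int, ctype: str) -> None:
        """Model the OS and hypervisor granting a process access to one type."""
        if ctype not in self.fifos:
            raise ValueError(f"unknown co-processor type: {ctype}")
        self.send_windows.setdefault(pid, set()).add(ctype)

    def post(self, pid: int, ctype: str, payload: bytes) -> None:
        """Post an entry directly to the FIFO of the given type."""
        if ctype not in self.send_windows.get(pid, set()):
            raise PermissionError("process has no send window for this type")
        self.fifos[ctype].append(Request(pid, payload))

    def next_job(self, ctype: str) -> Request:
        """Remove and return the oldest pending entry for this type."""
        return self.fifos[ctype].popleft()


nx = NxQueueModel()
nx.open_send_window(pid=101, ctype="gzip")
nx.post(101, "gzip", b"first")
nx.post(101, "gzip", b"second")
print(nx.next_job("gzip").payload)  # b'first'
```

The model captures two points from the text: a process can post only through a send window that was set up for it, and entries for a given co-processor type are served in arrival order.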

For more information about the usage of the xgzip tool that uses the Gzip accelerator engine, see the following resources:

- Using the Power9 NX (gzip) accelerator in AIX

- Power9 GZIP Data Acceleration with IBM AIX

- Performance improvement in OpenSSH with on-chip data compression accelerator in Power9

- The nxstat command
