
WestmereEX

Thomas.Roehl edited this page Nov 5, 2015 · 5 revisions

Architecture specific notes for Intel® Westmere EX

Performance groups

Intel® Westmere EX Performance groups

Events

The input file for the events on Intel® Westmere EX can be found here.

## Counters

### Core-local counters

#### Fixed-purpose counters

Since the Core2 microarchitecture, Intel® provides a set of fixed-purpose counters. Each can measure only one specific event.

##### Counters

| Counter name | Event name |
| --- | --- |
| FIXC0 | INSTR_RETIRED_ANY |
| FIXC1 | CPU_CLK_UNHALTED_CORE |
| FIXC2 | CPU_CLK_UNHALTED_REF |

##### Available Options

| Option | Argument | Description | Comment |
| --- | --- | --- | --- |
| anythread | N | Set bit 2+(index*4) in config register | |
| kernel | N | Set bit (index*4) in config register | |
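
The bit positions given for the fixed-counter options depend on the counter index. The following minimal sketch only illustrates that arithmetic; the function name `fixed_ctrl_bits` is hypothetical and is not part of LIKWID.

```python
def fixed_ctrl_bits(index, kernel=False, anythread=False):
    """Return the option bits for fixed counter FIXC<index> (hypothetical helper)."""
    if index not in (0, 1, 2):
        raise ValueError("only FIXC0, FIXC1 and FIXC2 are listed")
    bits = 0
    if kernel:
        bits |= 1 << (index * 4)        # kernel: bit (index*4)
    if anythread:
        bits |= 1 << (2 + index * 4)    # anythread: bit 2+(index*4)
    return bits


print(hex(fixed_ctrl_bits(1, kernel=True, anythread=True)))  # 0x50
```

For FIXC1 this sets bit 4 (kernel) and bit 6 (anythread), which gives 0x50.
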
#### General-purpose counters

The Intel® Westmere EX microarchitecture provides 4 general-purpose counters consisting of a config and a counter register.

##### Counters

| Counter name | Event name |
| --- | --- |
| PMC0 | * |
| PMC1 | * |
| PMC2 | * |
| PMC3 | * |

##### Available Options

| Option | Argument | Description | Comment |
| --- | --- | --- | --- |
| edgedetect | N | Set bit 18 in config register | |
| kernel | N | Set bit 17 in config register | |
| threshold | 8 bit hex value | Set bits 24-31 in config register | |
| invert | N | Set bit 23 in config register | |
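
As an illustration of how the general-purpose counter options map onto the config register, the sketch below combines them into one bit pattern. It covers only the option bits listed above, not the event and umask fields, and the helper name `pmc_option_bits` is an assumption made for this example.

```python
def pmc_option_bits(edgedetect=False, kernel=False, invert=False, threshold=0):
    """Combine the PMC option bits listed above (hypothetical helper)."""
    if not 0 <= threshold <= 0xFF:
        raise ValueError("threshold must fit in 8 bits")
    bits = threshold << 24      # threshold: bits 24-31
    if edgedetect:
        bits |= 1 << 18         # edgedetect: bit 18
    if kernel:
        bits |= 1 << 17         # kernel: bit 17
    if invert:
        bits |= 1 << 23         # invert: bit 23
    return bits


print(hex(pmc_option_bits(edgedetect=True, threshold=0x10)))  # 0x10040000
```
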
##### Special handling for events

The Intel® Westmere EX microarchitecture can measure offcore events in the PMC counters. To do so, the stream of offcore events must be filtered using the OFFCORE_RESPONSE registers, of which the Intel® Westmere EX microarchitecture has two. LIKWID defines some events that perform the filtering according to the event name. Although many bitmasks are possible, LIKWID natively provides only the ones with response type ANY. Custom filtering can be applied with the OFFCORE_RESPONSE_0_OPTIONS and OFFCORE_RESPONSE_1_OPTIONS events. Only for those events, two more counter options are available:

| Option | Argument | Description | Comment |
| --- | --- | --- | --- |
| match0 | 8 bit hex value | Input value masked with 0xFF and written to bits 0-7 in the OFFCORE_RESPONSE register | Check the Intel® Software Developer System Programming Manual, Vol. 3, Chapter Performance Monitoring and https://download.01.org/perfmon/WSM-EX. |
| match1 | 8 bit hex value | Input value masked with 0xF7 and written to bits 8-15 in the OFFCORE_RESPONSE register | Check the Intel® Software Developer System Programming Manual, Vol. 3, Chapter Performance Monitoring and https://download.01.org/perfmon/WSM-EX. |
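
The two masks in the table can be read as follows: one input fills bits 0-7 unchanged, and the other fills bits 8-15 after bit 3 of that byte has been cleared by the 0xF7 mask. The sketch below shows the resulting register value. The function and parameter names are hypothetical, and the code only mirrors the masking described above, not the LIKWID sources.

```python
def offcore_response_value(low_byte=0, high_byte=0):
    """Build the low 16 bits of an OFFCORE_RESPONSE value (hypothetical helper)."""
    low = low_byte & 0xFF             # written to bits 0-7
    high = (high_byte & 0xF7) << 8    # masked with 0xF7, written to bits 8-15
    return high | low


print(hex(offcore_response_value(0x01, 0xFF)))  # 0xf701
```

Passing 0xFF as the second input therefore yields 0xF7 in the upper byte, because bit 3 of that byte is dropped by the mask.
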
### Socket-wide counters

#### Memory controller counters

The Intel® Westmere EX microarchitecture provides measurements of the memory controllers in the uncore. The description from Intel®:

> The memory controller interfaces to the Intel® 7500 Scalable Memory Buffers and translates read and write commands into specific Intel® Scalable Memory Interconnect (Intel® SMI) operations. Intel SMI is based on the FB-DIMM architecture, but the Intel 7500 Scalable Memory Buffer is not an AMB2 device and has significant exceptions to the FB-DIMM2 architecture. The memory controller also provides a variety of RAS features, such as ECC, memory scrubbing, thermal throttling, mirroring, and DIMM sparing. Each socket has two independent memory controllers, and each memory controller has two Intel SMI channels that operate in lockstep.

The Intel® Westmere EX microarchitecture has 2 memory controllers, each with 6 general-purpose counters. They are exposed through the MSR interface to the operating system kernel. The MBOX and RBOX setup routines are taken from LIKWID 3. They are not as flexible as the newer setup routines, but programming the MBOXes and RBOXes is tedious for Westmere EX. It is not possible to specify an FVID (Fill Victim Index) for the MBOX or the IPERF option for RBOXes.

##### Counters

| Counter name | Event name |
| --- | --- |
| MBOX<0,1>C0 | * |
| MBOX<0,1>C1 | * |
| MBOX<0,1>C2 | * |
| MBOX<0,1>C3 | * |
| MBOX<0,1>C4 | * |
| MBOX<0,1>C5 | * |
##### Special handling for events

For the events DRAM_CMD_ALL and DRAM_CMD_ILLEGAL two counter options are available:

| Option | Argument | Description | Comment |
| --- | --- | --- | --- |
| match0 | 34 bit address | Set bits 0-33 in MSR_M<0,1>_PMON_ADDR_MATCH register | |
| mask0 | 60 bit hex value | Extract bits 6-33 from address and set bits 0-27 in MSR_M<0,1>_PMON_ADDR_MASK register | |
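
The bit ranges for the two MBOX address options can be sketched as below. This is a reading of the table only: the helper names are hypothetical, and silently truncating oversized inputs is an assumption made for the example.

```python
def mbox_addr_match(address):
    """Value for the ADDR_MATCH register: bits 0-33 of the address (hypothetical helper)."""
    return address & ((1 << 34) - 1)


def mbox_addr_mask(mask):
    """Value for the ADDR_MASK register: input bits 6-33 moved to bits 0-27 (hypothetical helper)."""
    return (mask >> 6) & ((1 << 28) - 1)


print(hex(mbox_addr_match((1 << 34) | 0x40)))  # 0x40
print(hex(mbox_addr_mask(0x3FFFFFFC0)))        # 0xfffffff
```

The second call shows that the lowest six bits of the input do not reach the mask register, since only bits 6-33 are extracted.
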

For the events THERM_TRP_DN and THERM_TRP_UP, you cannot measure events for all DIMMs and for one specific DIMM simultaneously, because both program the same filter register MSR_M<0,1>_PMON_MSC_THR with conflicting configurations.

Although the events FVC_EV<0-3> can measure multiple memory events, some of them overlap and cannot be measured simultaneously, because they program the same filter register MSR_M<0,1>_PMON_ZDP with conflicting configurations. One example is the FVC_EV<0-3>_BBOX_CMDS_READS and FVC_EV<0-3>_BBOX_CMDS_WRITES events, which measure memory reads and writes respectively but cannot be measured at the same time.

#### Home Agent counters

The Intel® Westmere EX microarchitecture provides measurements of the Home Agent in the uncore. The description from Intel®:

> The B-Box is responsible for the protocol side of memory interactions, including coherent and non-coherent home agent protocols (as defined in the Intel® QuickPath Interconnect Specification). Additionally, the B-Box is responsible for ordering memory reads/writes to a given address such that the MBOX does not have to perform this conflict checking. All requests for memory attached to the coupled MBOX must first be ordered through the B-Box.

The memory traffic in an Intel® Westmere EX system is controlled by the Home Agents. Each MBOX has a corresponding BBOX. Each BBOX offers 4 general-purpose counters. They are exposed through the MSR interface to the operating system kernel.

##### Counters

| Counter name | Event name |
| --- | --- |
| BBOX<0,1>C0 | * |
| BBOX<0,1>C1 | * |
| BBOX<0,1>C2 | * |
| BBOX<0,1>C3 | * |

##### Special handling for events

For the matching events MSG_IN_MATCH, MSG_ADDR_IN_MATCH, MSG_OPCODE_ADDR_IN_MATCH, MSG_OPCODE_IN_MATCH, MSG_OPCODE_OUT_MATCH, MSG_OUT_MATCH, OPCODE_ADDR_IN_MATCH, OPCODE_IN_MATCH, OPCODE_OUT_MATCH and ADDR_IN_MATCH two counter options are available:

| Option | Argument | Description | Comment |
| --- | --- | --- | --- |
| match0 | 60 bit hex value | Set bits 0-59 in MSR_B<0,1>_PMON_MATCH register | For register layout and valid settings see Intel® Xeon® Processor E7 Family uncore Performance Monitoring Guide |
| mask0 | 60 bit hex value | Set bits 0-59 in MSR_B<0,1>_PMON_MASK register | For register layout and valid settings see Intel® Xeon® Processor E7 Family uncore Performance Monitoring Guide |

#### Crossbar router counters

The Intel® Westmere EX microarchitecture provides measurements of the crossbar router in the uncore. The description from Intel®:

> The Crossbar Router (R-Box) is a 8 port switch/router implementing the Intel® QuickPath Interconnect Link and Routing layers. The R-Box is responsible for routing and transmitting all intra- and inter-processor communication.

The Intel® Westmere EX microarchitecture has two interfaces to the RBOX although each socket contains only one crossbar router: RBOX0 is the left part and RBOX1 is the right part of the single RBOX. Each RBOX side offers 8 general-purpose counters. They are exposed through the MSR interface to the operating system kernel. The MBOX and RBOX setup routines are taken from LIKWID 3. They are not as flexible as the newer setup routines, but programming the MBOXes and RBOXes is tedious for Westmere EX. It is not possible to specify an FVID (Fill Victim Index) for the MBOX or the IPERF option for RBOXes.

##### Counters

| Counter name | Event name |
| --- | --- |
| RBOX<0,1>C0 | * |
| RBOX<0,1>C1 | * |
| RBOX<0,1>C2 | * |
| RBOX<0,1>C3 | * |
| RBOX<0,1>C4 | * |
| RBOX<0,1>C5 | * |
| RBOX<0,1>C6 | * |
| RBOX<0,1>C7 | * |

#### Last Level cache counters

The Intel® Westmere EX microarchitecture provides measurements of the LLC coherency engine in the uncore. The description from Intel®:

> For the Intel Xeon Processor 7500 Series, the LLC coherence engine (C-Box) manages the interface between the core and the last level cache (LLC). All core transactions that access the LLC are directed from the core to a C-Box via the ring interconnect. The C-Box is responsible for managing data delivery from the LLC to the requesting core. It is also responsible for maintaining coherence between the cores within the socket that share the LLC; generating snoops and collecting snoop responses to the local cores when the MESI protocol requires it.
>
> The C-Box is also the gate keeper for all Intel® QuickPath Interconnect (Intel® QPI) messages that originate in the core and is responsible for ensuring that all Intel QuickPath Interconnect messages that pass through the socket’s LLC remain coherent.

The Intel® Westmere EX microarchitecture has 10 CBOX instances. Each CBOX offers 6 general-purpose counters. They are exposed through the MSR interface to the operating system kernel.

##### Counters

| Counter name | Event name |
| --- | --- |
| CBOX<0-9>C0 | * |
| CBOX<0-9>C1 | * |
| CBOX<0-9>C2 | * |
| CBOX<0-9>C3 | * |
| CBOX<0-9>C4 | * |
| CBOX<0-9>C5 | * |
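
The counter tables on this page use a shorthand in angle brackets for the unit instance, for example `CBOX<0-9>C0` or `MBOX<0,1>C3`. Assuming that `<a-b>` denotes a range of instances and `<a,b>` a list, the concrete counter names can be enumerated with a small sketch like the following. The helper `expand_counter_names` is hypothetical and not a LIKWID function.

```python
import re


def expand_counter_names(pattern):
    """Expand one '<a-b>' or '<a,b,...>' placeholder into concrete names."""
    m = re.search(r"<([^>]+)>", pattern)
    if m is None:
        return [pattern]                      # nothing to expand
    spec = m.group(1)
    if "-" in spec:
        lo, hi = (int(x) for x in spec.split("-"))
        ids = range(lo, hi + 1)               # inclusive range, e.g. 0-9
    else:
        ids = [int(x) for x in spec.split(",")]
    return [pattern[:m.start()] + str(i) + pattern[m.end():] for i in ids]


print(expand_counter_names("MBOX<0,1>C3"))  # ['MBOX0C3', 'MBOX1C3']
print(len(expand_counter_names("CBOX<0-9>C0")))  # 10
```
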
##### Available Options

| Option | Argument | Description | Comment |
| --- | --- | --- | --- |
| edgedetect | N | Set bit 18 in config register | |
| threshold | 5 bit hex value | Set bits 24-28 in config register | |
| invert | N | Set bit 23 in config register | |
#### LLC-to-QPI interface counters

The Intel® Westmere EX microarchitecture provides measurements of the LLC-to-QPI interface in the uncore. The description from Intel®:

> The S-Box represents the interface between the last level cache and the system interface. It manages flow control between the C and R & B-Boxes. The S-Box is broken into system bound (ring to Intel® QPI) and ring bound (Intel® QPI to ring) connections.
>
> As such, it shares responsibility with the C-Box(es) as the Intel® QPI caching agent(s). It is responsible for converting C-box requests to Intel® QPI messages (i.e. snoop generation and data response messages from the snoop response) as well as converting/forwarding ring messages to Intel® QPI packets and vice versa.

The Intel® Westmere EX microarchitecture has 2 SBOX instances. Each SBOX offers 4 general-purpose counters. They are exposed through the MSR interface to the operating system kernel.

##### Counters

| Counter name | Event name |
| --- | --- |
| SBOX<0,1>C0 | * |
| SBOX<0,1>C1 | * |
| SBOX<0,1>C2 | * |
| SBOX<0,1>C3 | * |

##### Available Options

| Option | Argument | Description | Comment |
| --- | --- | --- | --- |
| edgedetect | N | Set bit 18 in config register | |
| threshold | 8 bit hex value | Set bits 24-31 in config register | |
| invert | N | Set bit 23 in config register | |
##### Special handling for events

Only for the TO_R_PROG_EV events, two counter options are available:

| Option | Argument | Description | Comment |
| --- | --- | --- | --- |
| match0 | 64 bit hex value | Set bits 0-63 in MSR_S<0,1>_PMON_MATCH register | For register layout and valid settings see Intel® Xeon® Processor E7 Family uncore Performance Monitoring Guide |
| mask0 | 39 bit hex value | Set bits 0-38 in MSR_S<0,1>_PMON_MASK register | For register layout and valid settings see Intel® Xeon® Processor E7 Family uncore Performance Monitoring Guide |
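
The SBOX match and mask options differ in width (64 versus 39 bits). A minimal sketch of clamping user input to those widths is shown below. Truncating oversized values rather than rejecting them is an assumption of this example, and the helper name `sbox_match_mask` is hypothetical.

```python
SBOX_MATCH_BITS = 64   # match0 fills bits 0-63 of the MATCH register
SBOX_MASK_BITS = 39    # mask0 fills bits 0-38 of the MASK register


def sbox_match_mask(match, mask):
    """Clamp both inputs to the register widths listed above (hypothetical helper)."""
    match_reg = match & ((1 << SBOX_MATCH_BITS) - 1)
    mask_reg = mask & ((1 << SBOX_MASK_BITS) - 1)
    return match_reg, mask_reg


print(sbox_match_mask((1 << 64) | 5, (1 << 39) | 3))  # (5, 3)
```
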
#### Power control unit fixed-purpose counters

The Intel® Westmere EX microarchitecture provides measurements of the power controller in the uncore. The description from Intel®:

> The W-Box is the primary Power Controller for the Intel® Xeon® Processor 7500 Series.

The Intel® Westmere EX microarchitecture has one WBOX and it offers one fixed-purpose counter for the uncore clock frequency. It is exposed through the MSR interface to the operating system kernel.

##### Counters

| Counter name | Event name |
| --- | --- |
| WBOXFIX | UNCORE_CLOCKTICKS |

#### Power control unit general-purpose counters

The Intel® Westmere EX microarchitecture provides measurements of the power controller in the uncore. The description from Intel®:

> The W-Box is the primary Power Controller for the Intel® Xeon® Processor 7500 Series.

The Intel® Westmere EX microarchitecture has one WBOX and it offers 4 general-purpose counters. They are exposed through the MSR interface to the operating system kernel.

##### Counters

| Counter name | Event name |
| --- | --- |
| WBOX0 | * |
| WBOX1 | * |
| WBOX2 | * |
| WBOX3 | * |

##### Available Options

| Option | Argument | Description | Comment |
| --- | --- | --- | --- |
| edgedetect | N | Set bit 18 in config register | |
| threshold | 8 bit hex value | Set bits 24-31 in config register | |
| invert | N | Set bit 23 in config register | |
#### Uncore management counters

The Intel® Westmere EX microarchitecture provides measurements of the system configuration controller in the uncore. The description from Intel®:

> The U-Box serves as the system configuration controller for the Intel® Xeon® Processor E7 Family.

The Intel® Westmere EX microarchitecture has one UBOX and it offers a single general-purpose counter. It is exposed through the MSR interface to the operating system kernel.

##### Counters

| Counter name | Event name |
| --- | --- |
| UBOX0 | * |

##### Available Options

| Option | Argument | Description | Comment |
| --- | --- | --- | --- |
| edgedetect | N | Set bit 18 in config register | |