Analysis with Event-Based Profiling
This section is a brief introduction to analysis with event-based
profiling. It assumes that a CodeAnalyst project is already open, either
by following the directions under Creating a
CodeAnalyst Project or by opening an existing CodeAnalyst
project. It also assumes that session settings have been established
and CodeAnalyst is ready to profile an application.
Event-based profiling uses the hardware performance event counters to
measure the number of specific kinds of events that occur during
execution. Processor clock cycles, retired instructions, data cache
accesses and data cache misses are examples of events. The specific
events to be measured are determined by the profile configuration that
is used to set up data collection. CodeAnalyst provides five
predefined profile configurations to collect performance data using
event-based sampling. These profile configurations are:
- Assess performance: Assess overall program performance.
- Investigate data access: Investigate how well software uses the data cache (DC).
- Investigate instruction access: Investigate how well software uses the instruction cache (IC).
- Investigate L2 cache access: Investigate how well software uses the unified level 2 (L2) cache.
- Investigate branching: Identify mispredicted branches and near returns.
These profile configurations cover the most common program performance issues of interest. Later in this section, we demonstrate how to select events and configure data collection in order to investigate issues that are not covered by the predefined profile configurations.
Assessing Performance
A drop-down list of the available profile configurations is included in the CodeAnalyst toolbar.
- Select the Assess performance profile configuration. This profile configuration is a good starting point for analysis because it generates an overview of program performance. The overall assessment may indicate one or more potential issues to be investigated in more detail by using one of the other predefined configurations (or by using a custom profile configuration of your own).

- Click the Start button in the toolbar or select Profile > Start to begin profiling. CodeAnalyst starts data collection and launches the application program that was specified in the session settings. The session status is displayed in the status bar in the lower left corner of the CodeAnalyst window. Session progress is displayed in the lower right corner. The blank window is the console window in which the application program, “classic,” is running.

When data collection is complete, CodeAnalyst processes the performance data and creates a new session under “EBP Sessions” in the session management area on the left side of the CodeAnalyst window. Results are shown in the System Data, System Graph and Processes tabs. The System Data table, System Graph and Processes table resemble and behave like their timer-based profiling (TBP) counterparts. However, they show the type and number of event-based samples instead of timer samples.

The Overall assessment view displays an overview of software performance. The System Data table shows the number of events and computed performance measurements for each module that was active during data collection. The Overall assessment view shows:
- CPU clocks: CPU clocks not halted event
- Instructions per clock cycle (IPC): Retired instructions / CPU clocks
- DC (data cache) miss rate: DC misses / retired instructions
- DTLB (data translation lookaside buffer) miss rate: DTLB misses / retired instructions
- Misaligned access rate: Misaligned accesses / retired instructions
- Mispredict rate: Retired mispredicted branch instructions / retired instructions
In general, when the term rate appears in a computed performance measurement, the rate is expressed as “events per retired instruction.” A rate indicates how frequently an event is occurring. A high rate, such as a high DC miss rate, may indicate the presence of a performance problem and an opportunity for optimization.
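As a minimal sketch of how these measurements are derived, the following Python function computes the Overall assessment ratios from raw event counts. The dictionary keys and the numbers are hypothetical and are not CodeAnalyst identifiers; only the formulas come from the list above.

```python
def overall_assessment(counts):
    """Compute the Overall assessment ratios from raw event counts.

    `counts` maps hypothetical event names to event counts.
    """
    instructions = counts["retired_instructions"]
    clocks = counts["cpu_clocks"]
    if instructions == 0 or clocks == 0:
        raise ValueError("need nonzero clock and instruction counts")

    def rate(event):
        # A rate is expressed as events per retired instruction.
        return counts[event] / instructions

    return {
        "ipc": instructions / clocks,
        "dc_miss_rate": rate("dc_misses"),
        "dtlb_miss_rate": rate("dtlb_misses"),
        "misaligned_rate": rate("misaligned_accesses"),
        "mispredict_rate": rate("mispredicted_branches"),
    }


# Hypothetical counts for one module.
example = {
    "cpu_clocks": 2_000_000,
    "retired_instructions": 1_000_000,
    "dc_misses": 50_000,
    "dtlb_misses": 2_000,
    "misaligned_accesses": 1_000,
    "mispredicted_branches": 10_000,
}
metrics = overall_assessment(example)
```

With these made-up counts, the module retires one instruction every two clock cycles and misses the data cache on 5% of retired instructions, which is the kind of figure that would prompt a closer look with the Investigate data access configuration.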
The specific combination of events and computed performance measurements that is shown in a table or graph is called a view. CodeAnalyst may offer more than one view depending upon the kinds of data (e.g., events) that were collected. The drop-down list (immediately above the System Data tab) contains the available views. The All Data view is always available.
- Select the All Data view from the drop-down list.

Changing Contents of a View
CodeAnalyst provides a way to change the contents of a view.
- Click the Manage button to change the contents of the currently selected view. A dialog box appears showing the name of the view, a description of the view, the available data that can be shown and the columns (data) that are shown.
- To add data for an event to the current view, select an event in the Available data list and click the right arrow button.
- To remove data for an event from the current view, select an event in the Columns shown list and click the left arrow button.
- Remove all events except Retired instructions and Data cache accesses from the Columns shown list.
- Click the OK button to confirm and accept the changes.

After making these changes, CodeAnalyst updates the System Data table and eliminates the columns for the event data that were removed from the view.

- Select the IPC assessment view from the drop-down list of views.

CodeAnalyst updates the System Data table, which now shows the IPC assessment view. This view consists of:
- The number of retired instruction samples
- The number of CPU clock samples
- The ratio of instructions per clock cycle (IPC)
- The ratio of clock cycles per instruction (CPI)
The ratio of instructions per clock cycle is a basic measure of execution efficiency and is a good indicator of instruction level parallelism (ILP).
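The two ratios are reciprocals of each other (CPI = 1 / IPC). The sketch below shows a module-by-module breakdown computed from sample counts. It assumes that both events were sampled with the same Event Count, so that the ratio of sample counts approximates the ratio of event counts; the module names other than “classic” and all of the sample counts are made up for illustration.

```python
def ipc_cpi(instruction_samples, clock_samples):
    """Return (IPC, CPI) for one module, or (None, None) if undefined.

    Assumes both events used the same sampling period, so the ratio of
    sample counts approximates the ratio of event counts.
    """
    if instruction_samples == 0 or clock_samples == 0:
        return None, None
    ipc = instruction_samples / clock_samples
    return ipc, 1.0 / ipc


# Hypothetical rows: (module, retired-instruction samples, CPU-clock samples).
rows = [
    ("classic", 1200, 800),
    ("module_a", 300, 600),
    ("module_b", 0, 50),
]

breakdown = {name: ipc_cpi(instr, clk) for name, instr, clk in rows}
for name, (ipc, cpi) in breakdown.items():
    if ipc is None:
        print(f"{name:10s} IPC n/a   CPI n/a")
    else:
        print(f"{name:10s} IPC {ipc:.2f}  CPI {cpi:.2f}")
```

A module with no retired-instruction samples has no meaningful IPC, so the sketch reports it as undefined rather than dividing by zero.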

Choosing Events for Data Collection
The predefined profile configurations cover the most common kinds of performance analysis. AMD processors, however, are able to monitor a wide range of performance events.
- To configure data collection using events of your own choice, click on the Session Settings button in the toolbar. A dialog box appears asking for session settings.
- Choose the Current event-based profile configuration in the list of profile configurations. You may freely edit and change this profile configuration and may use this profile configuration as a scratchpad for customized EBP configurations.

- Click the Edit button. A dialog box appears which allows you to edit the “Current event-based profile” configuration. The “Current event-based profile” configuration in this example already contains the “CPU clocks not halted” event.
- Scroll through the list of individual events to find the Retired instructions event. Select Retired instructions.
- Click Add Event. The “Retired instructions” event is added to the list of events in the configuration.
- Set the Event Count field to 1,000,000.
- Find and select the Retired uops event. Click Add Event. The “Retired uops” event is added to the list of events in the configuration.
- Set the Event Count field to 1,000,000.
- If you make a mistake and need to remove an event from the configuration, select the event and then click the Remove button.
The Event Count field specifies the sampling period for the event. The Event Count determines how often a sample is taken for the event. If the Event Count is N, then a sample will be taken after the occurrence of N events of that type. Use smaller Event Count values to sample an event more often. However, more frequent sampling increases measurement overhead.
Caution: Choose the Event Count value conservatively. Start with a large value first and then decrease the value until the desired measurement accuracy is achieved. Very small values may cause the system to hang under certain workload conditions.
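The relationship between the Event Count and the number of samples can be sketched in a few lines of Python. The function names and the workload size are hypothetical; the only rule taken from the text is that one sample is taken after every N occurrences of an event.

```python
def expected_samples(total_events, event_count):
    """Number of samples taken if `total_events` events occur."""
    if event_count <= 0:
        raise ValueError("event_count must be positive")
    return total_events // event_count


def estimated_events(samples, event_count):
    """Estimate how many events occurred from the number of samples.

    One sample is taken after every `event_count` occurrences, so the
    estimate is accurate to within one sampling period.
    """
    if event_count <= 0:
        raise ValueError("event_count must be positive")
    return samples * event_count


# Hypothetical workload: 3.2 billion retired instructions.
total = 3_200_000_000
coarse = expected_samples(total, 1_000_000)  # larger Event Count, fewer samples
fine = expected_samples(total, 100_000)      # smaller Event Count, more samples
```

Reducing the Event Count by a factor of ten produces ten times as many samples for the same workload, which is why the measurement overhead grows as the value shrinks.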
- Click OK to confirm the changes and to dismiss the dialog box.

CodeAnalyst collects performance data according to the session settings and the Current event-based profile configuration.
- Click the Start button in the toolbar to begin data collection, or select Profile > Start from the Profile menu. Results are displayed in the System Data, System Graph and Processes tabs when data collection is finished. A new session, “ExampleSession1,” is added to the list of EBP sessions in the session management area. Notice that CodeAnalyst auto-generates new session names when necessary.
- Click on the System Data tab and select the All Data view from the list of available views. Three columns are displayed, containing the number of samples taken for the CPU clocks (not halted), retired instructions, and retired micro-op (uops) events.
- Examine the list of available views. The predefined IPC assessment view is offered because data is available for both the retired instructions and CPU clocks not halted events. The decision to offer a view is data-driven: if the event data needed to display a view is available, CodeAnalyst offers the view.
- Select the IPC assessment view to display a module-by-module breakdown of IPC and CPI measurements in the System Data table.
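The data-driven rule for offering views can be illustrated with a small Python sketch. This is not CodeAnalyst's implementation: the table of required events is an assumption made for illustration, and “Data access assessment” is a hypothetical view name. Only the IPC assessment requirements and the always-available All Data view are taken from the text.

```python
# Hypothetical view definitions: each view lists the events it requires.
VIEW_REQUIREMENTS = {
    "IPC assessment": {"Retired instructions", "CPU clocks not halted"},
    "Data access assessment": {
        "Retired instructions",
        "Data cache accesses",
        "Data cache misses",
    },
}


def available_views(collected_events):
    """Return the views that can be displayed for the collected events."""
    collected = set(collected_events)
    views = ["All Data"]  # the All Data view is always available
    for name, required in VIEW_REQUIREMENTS.items():
        # Offer a view only if every event it needs was collected.
        if required <= collected:
            views.append(name)
    return views


# The three events configured in the example above.
collected = ["CPU clocks not halted", "Retired instructions", "Retired uops"]
views = available_views(collected)
```

With the three events collected in this example, the sketch offers All Data and IPC assessment, and it withholds the hypothetical data access view because no data cache events were collected.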