Analysis with Event-Based Profiling

This section is a brief introduction to analysis with event-based profiling. It assumes that a CodeAnalyst project is already open, either by following the directions under Creating a CodeAnalyst Project or by opening an existing CodeAnalyst project. It also assumes that session settings have been established and that CodeAnalyst is ready to profile an application.

Event-based profiling uses the hardware performance event counters to measure the number of specific kinds of events that occur during execution. Processor clock cycles, retired instructions, data cache accesses and data cache misses are examples of events. The specific events to be measured are determined by the profile configuration that is used to set up data collection. CodeAnalyst provides five predefined profile configurations to collect performance data using event-based sampling. These profile configurations are:

These profile configurations cover the most common program performance issues of interest. Later in this section, we demonstrate how to select events and configure data collection in order to investigate issues that are not covered by the predefined profile configurations.

Assessing Performance

A drop-down list of the available profile configurations is included in the CodeAnalyst toolbar.

  1. Select the Assess performance profile configuration. This profile configuration is a good starting point for analysis because it generates an overview of program performance. The overall assessment may indicate one or more potential issues to be investigated in more detail by using one of the other predefined configurations (or by using a custom profile configuration of your own).

  1. Click the Start button in the toolbar or select Profile > Start to begin profiling. CodeAnalyst starts data collection and launches the application program that was specified in the session settings. The session status is displayed in the status bar in the lower left corner of the CodeAnalyst window. Session progress is displayed in the lower right corner. The blank window is the console window in which the application program, “classic,” is running.

When data collection is complete, CodeAnalyst processes the performance data and creates a new session under “EBP Sessions” in the session management area on the left side of the CodeAnalyst window. Results are shown in the System Data, System Graph and Processes tabs. The System Data table, System Graph and Processes table resemble and behave like their TBP counterparts. However, the type and number of event-based samples are shown instead of timer samples.

The Overall assessment view displays an overview of software performance. The System Data table shows the number of events and computed performance measurements for each module that was active during data collection. The Overall assessment view shows:

In general, when the term rate appears in a computed performance measurement, the rate is expressed as “events per retired instruction.” A rate indicates how frequently an event is occurring. A high rate, such as a high DC miss rate, may indicate the presence of a performance problem and an opportunity for optimization.
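As a minimal sketch of this definition, a rate can be computed by dividing an event count by the number of retired instructions. The helper name `event_rate` and the numbers used below are illustrative assumptions, not values or functions provided by CodeAnalyst.

```python
def event_rate(event_count: int, retired_instructions: int) -> float:
    """Return events per retired instruction (hypothetical helper)."""
    if retired_instructions <= 0:
        raise ValueError("retired_instructions must be positive")
    return event_count / retired_instructions


# Illustrative numbers only: 2,500 data cache (DC) misses
# observed over 1,000,000 retired instructions.
dc_miss_rate = event_rate(2_500, 1_000_000)
print(f"DC miss rate: {dc_miss_rate:.4f} misses per retired instruction")
```

Whether a given rate counts as "high" depends on the workload, so a sketch like this is most useful for comparing modules or runs against each other.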

The specific combination of events and computed performance measurements shown in a table or graph is called a view. CodeAnalyst may offer more than one view, depending on the kinds of data (e.g., events) that were collected. The drop-down list (immediately above the System Data tab) contains the available views. The All Data view is always available.

  1. Select the All Data view from the drop-down list.

Changing Contents of a View

CodeAnalyst provides a way to change the contents of a view.

  1. Click the Manage button to change the contents of the currently selected view. A dialog box appears showing the name of the view, a description of the view, the available data that can be shown, and the columns (data) that are shown.
  2. Remove all events except Retired instructions and Data cache accesses from the Columns shown list.
  3. Click the OK button to confirm and accept the changes.

After making these changes, CodeAnalyst updates the System Data table and eliminates the columns for the event data that were removed from the view.

  1. Select the IPC assessment view from the drop-down list of views.

CodeAnalyst updates the System Data table, which now shows the IPC assessment view. This view consists of:

The ratio of instructions per clock cycle (IPC) is a basic measure of execution efficiency and is a good indicator of instruction-level parallelism (ILP).
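The ratio can be sketched as follows, assuming counts for retired instructions and unhalted CPU clock cycles are available. The function name and the numbers are hypothetical and are not taken from a CodeAnalyst session.

```python
def instructions_per_cycle(retired_instructions: int, cpu_clocks: int) -> float:
    """Return retired instructions per unhalted CPU clock cycle (hypothetical helper)."""
    if cpu_clocks <= 0:
        raise ValueError("cpu_clocks must be positive")
    return retired_instructions / cpu_clocks


# Illustrative numbers only.
ipc = instructions_per_cycle(1_500_000, 1_000_000)
cpi = 1.0 / ipc  # cycles per instruction is the reciprocal of IPC
print(f"IPC = {ipc:.2f}, CPI = {cpi:.2f}")
```

A higher IPC means more instructions complete per cycle. Comparing the IPC of different modules or functions is usually more informative than reading a single value in isolation.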

Choosing Events for Data Collection

The predefined profile configurations cover the most common kinds of performance analysis. AMD processors, however, are able to monitor a wide range of performance events.

  1. To configure data collection using events of your own choice, click on the Session Settings button in the toolbar. A dialog box appears asking for session settings.
  2. Choose the Current event-based profile configuration in the list of profile configurations. You may freely edit and change this profile configuration and may use this profile configuration as a scratchpad for customized EBP configurations.

  1. Click the Edit button. A dialog box appears which allows you to edit the “Current event-based profile” configuration. The “Current event-based profile” configuration in this example already contains the “CPU clocks not halted” event.
  2. Scroll through the list of individual events to find the Retired instructions event. Select Retired instructions.
  3. Click Add Event. The “Retired instructions” event is added to the list of events in the configuration.
  4. Set the Event Count field to 1,000,000.
  5. Find and select the Retired uops event. Click Add Event.
  6. Set the Event Count field to 1,000,000.

The “Retired uops” event is added to the list of events in the configuration.

The Event Count field specifies the sampling period for the event. The Event Count determines how often a sample is taken for the event. If the Event Count is N, then a sample will be taken after the occurrence of N events of that type. Use smaller Event Count values to sample an event more often. However, more frequent sampling increases measurement overhead.
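The relationship between the Event Count and the number of samples can be sketched as simple arithmetic. This is an idealized model for a single counter, and the function names and event totals below are assumptions made for illustration.

```python
def expected_samples(total_events: int, event_count: int) -> int:
    """Samples taken when one sample is recorded every `event_count` events."""
    if event_count <= 0:
        raise ValueError("event_count must be positive")
    return total_events // event_count


def estimated_events(samples: int, event_count: int) -> int:
    """Approximate number of events represented by `samples` samples."""
    return samples * event_count


# Illustrative numbers only: 250,000,000 retired instructions.
total = 250_000_000
for period in (1_000_000, 100_000):
    n = expected_samples(total, period)
    print(f"Event Count {period:>9,}: about {n:,} samples")
```

Reducing the Event Count by a factor of ten yields about ten times as many samples for the same workload, which is why smaller values improve resolution but increase overhead.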

Caution: Choose the Event Count value conservatively. Start with a large value first and then decrease the value until the desired measurement accuracy is achieved. Very small values may cause the system to hang under certain workload conditions.
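One way to follow this advice is to plan a sequence of Event Count values in advance, starting large and shrinking step by step. The sketch below assumes a halving schedule and a lower bound chosen by the user. Both are hypothetical choices for illustration, not limits documented for CodeAnalyst.

```python
from typing import List


def event_count_schedule(start: int, floor: int, factor: int = 2) -> List[int]:
    """Return candidate Event Count values from `start` down to no less than `floor`.

    `floor` is a user-chosen safety limit (an assumption of this sketch).
    """
    if floor <= 0 or start < floor or factor < 2:
        raise ValueError("require start >= floor > 0 and factor >= 2")
    values = []
    current = start
    while current >= floor:
        values.append(current)
        current //= factor
    return values


# Try each value in order and stop as soon as the results are accurate enough.
print(event_count_schedule(1_000_000, 100_000))
```

Each value would be entered manually in the Event Count field for a separate profiling run. The schedule only helps to avoid jumping straight to a very small value.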

  1. Click OK to confirm the changes and to dismiss the dialog box.

CodeAnalyst collects performance data according to the session settings and the Current event-based profile configuration.

  1. Click the Start button in the toolbar, or select Profile > Start, to begin data collection. Results are displayed in the System Data, System Graph and Processes tabs when data collection is finished. A new session, “ExampleSession1,” is added to the list of EBP sessions in the session management area. Notice that CodeAnalyst auto-generates new session names when necessary.
  2. Click on the System Data tab and select the All Data view from the list of available views. Three columns are displayed, containing the number of samples taken for the CPU clocks not halted, retired instructions, and retired micro-ops (uops) events.
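The sample counts in these three columns can be turned into approximate event counts and ratios by multiplying each count by the Event Count used for that event. The sketch below uses invented sample counts and assumes an Event Count of 1,000,000 for all three events, including CPU clocks not halted, whose actual setting is not stated here. The class and variable names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EventSamples:
    name: str
    samples: int      # value read from the All Data view (invented here)
    event_count: int  # sampling period configured for this event

    def estimated_events(self) -> int:
        """Approximate number of events represented by the samples."""
        return self.samples * self.event_count


# Invented sample counts; Event Count of 1,000,000 is assumed for every event.
clocks = EventSamples("CPU clocks not halted", 400, 1_000_000)
instructions = EventSamples("Retired instructions", 300, 1_000_000)
uops = EventSamples("Retired uops", 450, 1_000_000)

ipc = instructions.estimated_events() / clocks.estimated_events()
uops_per_instruction = uops.estimated_events() / instructions.estimated_events()
print(f"IPC = {ipc:.2f}, uops per instruction = {uops_per_instruction:.2f}")
```

When all events share the same Event Count, the ratios of the raw sample counts equal the ratios of the estimated event counts. When the Event Counts differ, the scaling step is needed before comparing columns.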
