Time-Based Profiling

Time-based profiling (TBP) identifies the hot-spots in a program that are consuming the most time. Hot-spots are good candidates for further investigation and optimization.

Time-based profiling is system-wide, so a hot-spot can be identified anywhere in the system, not just an application under test. Any software component (an executable image, dynamically loaded library, device driver, or even the operating system kernel) that executes during the measurement period may be sampled. System-wide data collection conveniently handles applications consisting of parallel processes without any special configuration.

Time spent in each software module (TBP)

CodeAnalyst reports performance results in one or more tabs in the workspace. Results are broken out and summarized by module, process, function, source line, or instruction.

Time spent in each function within an application (TBP)

CodeAnalyst supports drill-down to source lines and instructions. Source-level presentation of performance results requires symbolic (debug) information. See Preparing an Application for Profiling.

Time spent at source-level hot-spot (TBP)

How Time-Based Profiling Works

Time-based profiling uses statistical sampling to collect and build a program profile. CodeAnalyst handles the mechanics of collecting and building a profile, but a basic understanding of the TBP sampling process is helpful.

When time-based profiling is enabled and started, CodeAnalyst configures a timer that periodically interrupts the program executing on a processor core. When a timer interrupt occurs, a sample is created and saved for post-processing by the CodeAnalyst GUI. Post-processing builds up a kind of histogram, which is a profile of what the system and its software components were doing. The most time-consuming parts of a program will have the most samples because, most likely, the program is executing in those regions when a timer interrupt is generated and a sample is taken.
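The sampling idea can be illustrated with a small simulation. This is a minimal sketch, not CodeAnalyst code: the region names, the time shares, and the function `simulate_profile` are hypothetical, and each simulated timer interrupt simply records which region is executing, with probability equal to that region's share of execution time.

```python
import random
from collections import Counter


def simulate_profile(time_shares, num_samples, seed=0):
    """Simulate timer-driven sampling of a running program.

    time_shares maps a (hypothetical) region name to the fraction of
    execution time spent in that region. Each simulated interrupt
    records one region, chosen in proportion to its time share.
    Returns a Counter mapping region name to sample count.
    """
    rng = random.Random(seed)
    regions = list(time_shares)
    weights = [time_shares[r] for r in regions]
    samples = rng.choices(regions, weights=weights, k=num_samples)
    return Counter(samples)


# Hypothetical workload: 70% of the time in one hot loop.
shares = {"inner_loop": 0.70, "parse_input": 0.20, "write_output": 0.10}
total = 10_000
profile = simulate_profile(shares, num_samples=total)

# Print the histogram, most-sampled region first.
for region, count in profile.most_common():
    print(f"{region:14s} {count:6d} {100.0 * count / total:5.1f}%")
```

The region that consumes the most time collects the most samples, which is how the histogram exposes hot-spots.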

In CodeAnalyst on Linux, time-based profiling uses the CPU_CLK_UNHALTED event (performance counter event 0x76), which counts the clock cycles during which the processor is running, i.e., not in a halted state. This event allows system idle time to be automatically factored out of IPC (or CPI) measurements, provided the OS halts the CPU when going idle. A time interval (in seconds or milliseconds) can be converted to a cycle count using the processor clock speed. For instance, on a processor running at a clock speed of 800 MHz, a 1 millisecond time-based profiling interval corresponds to 800,000 unhalted clock cycles (800,000,000 cycles per second × 0.001 second).
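The conversion between a time interval and a count of unhalted clock cycles is simple arithmetic. The sketch below shows only that arithmetic. The function names are hypothetical, and it assumes a fixed clock speed; it does not describe how CodeAnalyst programs the counter internally.

```python
def cycles_for_interval(clock_hz, interval_us):
    """Unhalted clock cycles that elapse in interval_us microseconds
    on a core running at a fixed clock speed of clock_hz (assumption)."""
    return clock_hz * interval_us // 1_000_000


def interval_us_for_cycles(clock_hz, cycles):
    """Inverse conversion: microseconds of running time for a cycle count."""
    return cycles * 1_000_000 / clock_hz


# 800 MHz processor, 1 millisecond (1,000 microsecond) interval.
count = cycles_for_interval(800_000_000, 1_000)
print(count)  # 800000
```

Integer arithmetic is used for the forward conversion so that the result is exact.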

Sampling Period and Measurement Period

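Because TBP relies upon statistical sampling, collecting enough samples to reason about program behavior is important. The number of TBP samples collected during an experimental run depends upon two factors: how frequently samples are taken (the sampling period), and the length of time during which samples are taken (the measurement period).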

The frequency of sample taking is controlled by the timer interval. This quantity is sometimes called the "sampling period." The default timer interval is 1 millisecond, meaning that a TBP sample is taken on a processor core approximately once per millisecond of wall-clock time. The timer interval can be changed by editing the "Current time-based profile" profile configuration. With the default timer interval, roughly 1,000 TBP samples are taken per second for each processor core.

With a shorter timer interval, samples are taken more often, so more samples are collected within a given, fixed-length measurement period. However, the amount of effort expended to take samples (known as overhead) will increase, placing the test system under a higher load. The process of taking samples and the overhead have an intrusive effect that perturbs the test workload and may bias statistical results.

The second factor is the length of time during which samples are taken: samples accumulate for as long as data is collected. The measurement period depends upon the overall execution time of the workload and the way in which CodeAnalyst data collection is configured. Using either the Session Settings or command line utility options, CodeAnalyst can be configured to collect samples for all or part of the time that the test workload executes. If program run-time is short (less than 15 seconds), it may be necessary to increase program run-time by using a larger data set or more loop iterations to obtain a statistically useful result. It may also be necessary to extend the measurement period by changing the Session Settings or the options passed to the CodeAnalyst command line utility.
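The combined effect of the timer interval and the measurement period can be estimated with a short calculation. This is a rough sketch: the function name `expected_samples` is hypothetical, and it assumes every core is sampled at the full rate for the whole measurement period, so the result is an upper bound rather than a prediction.

```python
def expected_samples(interval_ms, measurement_seconds, cores=1):
    """Approximate upper bound on TBP samples collected.

    interval_ms: timer interval (sampling period) in milliseconds.
    measurement_seconds: length of the measurement period.
    cores: number of processor cores being sampled.
    Assumes each core is sampled once per interval for the whole period.
    """
    samples_per_second = 1000.0 / interval_ms
    return int(samples_per_second * measurement_seconds * cores)


# Default 1 ms interval, 15 second run, one core.
print(expected_samples(1.0, 15))            # 15000
# Same run on four busy cores.
print(expected_samples(1.0, 15, cores=4))   # 60000
# Halving the interval doubles the sample count (and the overhead).
print(expected_samples(0.5, 15))            # 30000
```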

Deciding how many samples are enough requires judgment and a working knowledge of the characteristics of the workload under test. Scientific applications often have tight inner loops that are executed many times. In these situations, samples accumulate rapidly within the inner loops and even a fairly short run-time yields a statistically useful number of samples. Other workloads, like transaction processing, have few intense inner loops and the profiles are relatively "flat." For flat workloads, a longer measurement period is required to build up samples in code regions of interest.
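One way to put rough numbers on this judgment is to treat each sample as an independent trial that lands in a region with probability equal to that region's share of execution time. This binomial model is an assumption made here for illustration, not something CodeAnalyst computes, and the function names are hypothetical. Under that model, the relative error of a region's sample count shrinks with the square root of the number of samples that land in it.

```python
import math


def relative_error(time_share, total_samples):
    """Relative standard deviation of a region's sample count under a
    binomial model (independent samples, an assumption)."""
    p = time_share
    return math.sqrt((1.0 - p) / (p * total_samples))


def samples_needed(time_share, target_relative_error):
    """Total samples needed so that a region with the given time share
    is measured to within the target relative error (same model)."""
    p = time_share
    return math.ceil((1.0 - p) / (p * target_relative_error ** 2))


# A hot inner loop (50% of time) versus one region of a flat profile (1%).
for share in (0.50, 0.01):
    n = samples_needed(share, 0.05)
    print(f"time share {share:.0%}: about {n} samples for 5% relative error")
```

Under these assumptions, a region in a flat profile needs far more total samples than a hot inner loop to reach the same precision, which is why flat workloads call for a longer measurement period.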

Predefined Profile Configurations

CodeAnalyst provides predefined profile configurations to make configuration of time-based profiling and other kinds of analysis convenient. The predefined configuration for time-based profiling is called "Time-based profile." CodeAnalyst also provides a configuration named "Current time-based profile" where the timer interval (sampling period) can be changed.