Instruction-Based Sampling (IBS) uses a hardware sampling technique to generate event information similar to that produced by event-based profiling. Instruction-Based Sampling can be used to identify and diagnose performance issues in program hot-spots. IBS has these advantages:
The processor pipeline stages can be categorized into two main phases—instruction fetch and execution. Each instruction fetch operation produces a block of instruction data that is passed to the decoder stages in the pipeline. The decoder identifies AMD64 instructions in the fetch block. These AMD64 instructions are translated to one or more macro-operations, called "macro-ops" or "ops," that are executed in the execution phase.
Figure: IBS op samples for each software module
Figure: Attribution of IBS op samples to source-level hot-spot
Instruction-Based Sampling provides separate means to sample fetch operations and macro-ops since the fetch and execution phases of the pipeline treat these entities separately. IBS fetch sampling and IBS op sampling may be enabled and collected separately, or both may be enabled and collected together.
IBS fetch sampling is a statistical sampling method that counts completed fetch operations. When the number of completed fetch operations reaches the maximum fetch count (the sampling period), IBS tags the fetch operation and monitors that operation until it either completes or aborts. When a tagged fetch completes or aborts, a sampling interrupt is generated and an IBS fetch sample is taken. An IBS fetch sample contains a timestamp, the identifier of the interrupted process, the virtual fetch address, and several event flags and values that describe what happened during the fetch operation. As with time-based profiling and event-based profiling, CodeAnalyst uses the IBS sample data and information from the executable images, debug information, and source to build an IBS profile for software components executing on the system. (Instruction-Based Sampling is also system-wide.)
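The counting and tagging behavior described above can be sketched as a small simulation. This is an illustrative model only, not the hardware's implementation: `FetchOp`, `sample_fetches`, and the rule that the fetch following the one that fills the counter is the one tagged are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class FetchOp:
    """Hypothetical stand-in for one fetch operation."""
    address: int      # virtual fetch address
    completed: bool   # True if the fetch completed, False if it aborted


def sample_fetches(fetches: Iterable[FetchOp], period: int) -> Iterator[FetchOp]:
    """Yield the fetches that a periodic fetch sampler would tag.

    Assumed model: completed fetches are counted; once the count reaches
    `period`, the next fetch is tagged and reported whether it completes
    or aborts, and the counter restarts from zero.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    completed_count = 0
    for fetch in fetches:
        if completed_count >= period:
            yield fetch            # tagged fetch: a sample is taken for it
            completed_count = 0
        elif fetch.completed:
            completed_count += 1   # only completed fetches advance the counter


# Ten completed fetches at hypothetical 16-byte-spaced addresses.
fetches = [FetchOp(0x1000 + 16 * i, completed=True) for i in range(10)]
sampled = [hex(f.address) for f in sample_fetches(fetches, period=3)]
```

With a period of 3, the sketch tags the fourth and eighth fetches. A real sampling period is far larger, so only a small fraction of fetch operations is ever reported.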
The event data reported in an IBS sample includes:
The IBS fetch address may be the address of a fetch block, the target of a branch, or the address of an instruction that is the fall-through of a conditional branch. A fetch block does not always start with a complete valid AMD64 instruction. This situation occurs when an AMD64 instruction straddles two fetch blocks. In this case, CodeAnalyst associates the IBS fetch sample with the AMD64 instruction in the first (preceding) fetch block.
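The attribution rule for a straddling instruction can be illustrated with a lookup over instruction start addresses. This is a minimal sketch, assuming the tool already has a sorted list of instruction start addresses from the executable image; `attribute_fetch`, the addresses, and the 16-byte fetch block boundary are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def attribute_fetch(instruction_starts: Sequence[int], fetch_address: int) -> int:
    """Return the start address of the instruction that contains `fetch_address`.

    `instruction_starts` must be sorted in ascending order. If the fetch
    address falls in the middle of an instruction, the sample is attributed
    to that instruction, which begins in the preceding fetch block.
    """
    index = bisect_right(instruction_starts, fetch_address) - 1
    if index < 0:
        raise ValueError("fetch address precedes the first known instruction")
    return instruction_starts[index]


# Hypothetical layout: the instruction at 0x100E is 5 bytes long, so it
# straddles an assumed fetch block boundary at 0x1010.
starts = [0x100A, 0x100E, 0x1013]
owner = attribute_fetch(starts, 0x1010)
```

Here `owner` is `0x100E`, the instruction that begins in the first fetch block and spills into the second.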
The terms "killed," "attempted," "completed," and "aborted" refer to specific hardware conditions. A fetch operation may be abandoned before it delivers data to the decoder. A fetch may be abandoned due to a control flow redirection, and it may be abandoned at any time during the fetch process. A fetch abandoned before initial access to the ITLB (before address translation) is not regarded as useful for analysis. These early abandoned fetches are called killed fetches. CodeAnalyst filters out killed fetches. The fetch operations remaining after killed fetches are eliminated are called attempted fetches since these fetches represent valid attempts to obtain instruction data.
A completed fetch is an attempted fetch that successfully delivered instruction data to the decoder. An aborted fetch is an attempted fetch that did not complete.
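The four terms can be summarized as a small classification. The sketch below is illustrative only; the two boolean inputs are hypothetical names for the conditions described in the text, not fields reported by the hardware.

```python
from enum import Enum
from typing import Iterable, List


class FetchOutcome(Enum):
    KILLED = "killed"        # abandoned before the initial ITLB access
    COMPLETED = "completed"  # attempted fetch that delivered data to the decoder
    ABORTED = "aborted"      # attempted fetch that did not complete


def classify_fetch(reached_itlb: bool, delivered_to_decoder: bool) -> FetchOutcome:
    """Classify one fetch using the definitions given in the text."""
    if not reached_itlb:
        return FetchOutcome.KILLED
    if delivered_to_decoder:
        return FetchOutcome.COMPLETED
    return FetchOutcome.ABORTED


def attempted_fetches(outcomes: Iterable[FetchOutcome]) -> List[FetchOutcome]:
    """Drop killed fetches; what remains are the attempted fetches."""
    return [o for o in outcomes if o is not FetchOutcome.KILLED]


outcomes = [
    classify_fetch(reached_itlb=False, delivered_to_decoder=False),
    classify_fetch(reached_itlb=True, delivered_to_decoder=True),
    classify_fetch(reached_itlb=True, delivered_to_decoder=False),
]
attempted = attempted_fetches(outcomes)
```

Every attempted fetch is either completed or aborted, so the number of attempted fetches equals the sum of the other two counts.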
Note: Instruction fetch is an aggressive, speculative activity and even instruction data produced by a completed fetch may not be used.
IBS op sampling selects, tags, and monitors macro-ops as issued from AMD64 instructions. Two options are available for selecting ops for sampling.
The execution stages of the pipeline monitor the tagged macro-op. When the tagged macro-op retires, a sampling interrupt is generated and an IBS op sample is taken. An IBS op sample contains a timestamp, the identifier of the interrupted process, the virtual address of the AMD64 instruction from which the op was issued, and several event flags and values that describe what happened when the macro-op executed. CodeAnalyst uses this and other information to build an IBS profile.
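To make the shape of an IBS op sample concrete, the sketch below models a sample as a record and aggregates samples per process and instruction address. It is a simplified illustration, not CodeAnalyst's data model: `OpSample`, its field names, and the flag strings are assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Tuple


@dataclass(frozen=True)
class OpSample:
    """Hypothetical record holding the fields listed in the text."""
    timestamp: int
    pid: int                      # identifier of the interrupted process
    instruction_address: int      # virtual address of the AMD64 instruction
    flags: FrozenSet[str] = frozenset()  # event flags (names are illustrative)


def build_profile(samples: Iterable[OpSample]) -> "Counter[Tuple[int, int]]":
    """Count samples per (process, instruction address) pair."""
    return Counter((s.pid, s.instruction_address) for s in samples)


samples = [
    OpSample(timestamp=100, pid=42, instruction_address=0x401000),
    OpSample(timestamp=250, pid=42, instruction_address=0x401000,
             flags=frozenset({"dc_miss"})),
    OpSample(timestamp=300, pid=42, instruction_address=0x401008),
]
profile = build_profile(samples)
hottest = profile.most_common(1)[0]
```

A real profile would additionally map each address to a module, function, and source line using the executable image and debug information.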
A cycles-based selection generally produces more IBS samples than dispatched op-based selections. However, the statistical distribution of IBS op samples collected with a cycles-based selection may be affected by pipeline stalls and other time-dependent hardware behavior. The statistical bias is due to stalls at the decoding stage of the pipeline. If a macro-op is not available for tagging when the maximum op count is reached, the hardware skips the opportunity to tag a macro-op and starts counting again from a small, pseudo-random initial count.
From a practical perspective, the distribution of cycles-based IBS op samples may not be uniform across instructions with the same execution frequency (i.e., across instructions within the same basic block). The statistical distribution of IBS op samples collected with dispatched op-based selection is generally more uniform across instructions with the same execution frequency. This is a useful property in practice as IBS op statistics can be more readily used to make comparisons between instruction behavior. The dispatched op-based selection is the preferred collection mode and should be used when available.
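The skip-and-restart behavior of cycles-based selection can be modeled with a short simulation. This is a sketch under stated assumptions, not a description of the hardware: the cycle list, the `small_count_limit` parameter, and the reset-to-zero after a successful tag are all hypothetical.

```python
import random
from typing import List, Optional, Sequence


def cycles_based_samples(
    cycles: Sequence[Optional[str]],
    max_count: int,
    rng: random.Random,
    small_count_limit: int = 4,
) -> List[str]:
    """Return the ops tagged by a simplified cycles-based sampler.

    Each element of `cycles` is the label of the op available in that
    cycle, or None when the pipeline is stalled and no op is available.
    """
    if max_count <= 0 or small_count_limit <= 0:
        raise ValueError("max_count and small_count_limit must be positive")
    tagged = []
    count = 0
    for op in cycles:
        count += 1
        if count < max_count:
            continue
        if op is not None:
            tagged.append(op)   # an op is available: tag it
            count = 0
        else:
            # No op to tag: skip this opportunity and restart the count
            # from a small pseudo-random initial value.
            count = rng.randrange(small_count_limit)
    return tagged


rng = random.Random(0)
no_stalls = cycles_based_samples(["a", "b", "c", "d"] * 3, max_count=4, rng=rng)
all_stalls = cycles_based_samples([None] * 20, max_count=4, rng=rng)
```

Without stalls, the sketch tags op `d` every fourth cycle. When stalls occur, tagging opportunities are skipped and the restart point shifts, which illustrates why instructions with equal execution frequency may receive unequal sample counts.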
Note: The cycles-based selection is supported on all IBS-capable AMD processors. The dispatched op-based selection is a newer IBS feature that is available only in AMD Family 10h processors, revision 4 and beyond. Refer to the relevant version of the AMD BIOS and Kernel Developer's Guide for support details.
IBS op sampling reports a wide range of event data. The following values are reported for all ops:
Attribution of event information is precise because the IBS hardware reports the address of the AMD64 instruction causing the events. For example, branch mispredicts are attributed exactly to the branch that mispredicted and cache misses are attributed exactly to the AMD64 instruction that caused the cache miss. IBS makes it easier to identify the instructions which are performance culprits.
Some ops implement branch semantics. Branches include unconditional and conditional branches, subroutine calls, and subroutine returns. Event information reported for branch ops includes:
IBS also indicates whether a branch operation was a subroutine return and if the return was mispredicted.
Some ops may perform a load (memory read), store (memory write), or a load and a store to the same memory address, as in the case of a read-op-write sequence. When an op performs a load and/or store, event information includes:
Requests made through the Northbridge produce additional event information:
A full list of IBS op event information appears in the section on IBS events. Hardware-level details can be found in the BIOS and Kernel Developer's Guide (BKDG) for the AMD processor in your test platform.
CodeAnalyst translates the IBS information produced by the hardware into derived event sample counts that resemble event-based profiling (EBP) sample counts. All IBS-derived events have "IBS" in the event name and abbreviation. Although IBS-derived events and sample counts look similar to EBP events and sample counts, the source and sampling basis for the IBS event information are quite different. Arithmetic should never be performed between IBS-derived event sample counts and EBP event sample counts. It is also not meaningful to directly compare the number of samples taken for events that represent the same hardware condition. For example, a smaller number of IBS DC miss samples is not necessarily "better" than a larger number of EBP DC miss samples.
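The translation from raw IBS flags to derived event counts can be pictured as follows. This is a minimal sketch, not CodeAnalyst's implementation; the flag strings and the derived event names in `DERIVED_EVENT_NAMES` are hypothetical placeholders.

```python
from collections import Counter
from typing import FrozenSet, Iterable

# Hypothetical mapping from a raw sample flag to a derived event name.
DERIVED_EVENT_NAMES = {
    "dc_miss": "IBS DC miss",
    "branch_mispredicted": "IBS mispredicted branch",
}


def derive_event_counts(samples: Iterable[FrozenSet[str]]) -> "Counter[str]":
    """Turn per-sample flag sets into derived event sample counts."""
    counts: "Counter[str]" = Counter()
    for flags in samples:
        counts["IBS all ops"] += 1           # every sample counts as one op
        for flag in flags:
            name = DERIVED_EVENT_NAMES.get(flag)
            if name is not None:
                counts[name] += 1
    return counts


ibs_counts = derive_event_counts([
    frozenset({"dc_miss"}),
    frozenset(),
    frozenset({"dc_miss", "branch_mispredicted"}),
])
```

Because these counts are derived from sampled ops rather than from event counter overflows, they are best kept in structures separate from EBP counts so the two are never added or subtracted.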
See Instruction-Based Sampling Derived Events for descriptions of the IBS derived events.
AMD CodeAnalyst provides a predefined profile configuration called "Instruction-based sampling", which collects both IBS fetch and IBS op samples. It also provides a configuration named "Current Instruction-based profile", which can be changed and customized.