Tuesday, January 13, 2009

Carbide.c++ Performance Investigator

The primary goal of the Carbide.c++ Performance Investigator (PI) is to gather performance data on a handheld device and save the data to a profiler data (.dat) file. On your PC workstation, PI lets you analyze that performance data. The Performance Investigator analyzer software is a plug-in for the Eclipse-based Carbide.c++ IDE.

Importing the profiler data file in Carbide.c++ will launch the analyzer and present graphs and tables related to the data captured on the handheld target device.

The Performance Investigator collects performance data. It interrupts software execution at regular intervals and copies to memory the contents of some processor registers and the results of some Symbian OS calls.

The Performance Investigator contains two main parts:

First, on the target device is a user application called the Profiler. Installed with the Profiler is a device driver component that collects information from low-level (kernel and hardware-level) sources that are not generally accessed by application software. The Profiler acts as an interface to the device driver component and is required for its configuration, activation, and deactivation.

Second, on the PC there is Performance Investigator code integrated with Carbide.c++. This software consists of an importer for processing profiler data and the Analyzer, which provides a user-friendly way to examine and analyze the run-time performance information.


Software Performance

Two fundamental goals in understanding software performance can be seen as:

The ability to know which part of the software the CPU is executing at any particular instant in time
The ability to know why the CPU is executing that particular software at that particular instant in time

Profiler

The profiler is the part of the Performance Investigator that resides on the target device. Its purpose is to gather performance measurement information from the device at run-time and to output the information to be analyzed within the Analyzer part of the tool. Each type of performance information recorded is called a trace. The most important Profiler traces include:

  • Address / thread trace periodically records the current program counter address and the currently executing thread.
  • Dynamic binary support trace periodically records information sufficient to determine which dynamically loaded user application or DLL is executing.
  • Function call trace periodically records the link register value, so that the caller of the currently executing routine can be determined.

Setting Tracing Options

Use the Tracing options menu in the Profiler to specify the types of tracing data to collect when profiling an application executing on a device.


Dynamic binary support
If the executables you wish to analyze are not in a ROM image (.symbol file) of your target device, then you will need to turn on Dynamic binary support.


Function call capture
If disabled, then Function call information will not appear in the Analyzer.

Button press capture
If disabled, then Button Press information will not appear in the Analyzer.

Memory usage capture
If disabled, then Memory usage information will not appear in the Analyzer.

Thread priority capture
If enabled, captured data will include the time and priority ranking of the currently executing thread when the operating system is called.
If disabled, then thread priority information is not included and a Priority List column will not be shown in the Thread Load table in the Analyzer.

Power usage capture
If disabled, then Power usage information will not appear in the Analyzer.

Trace Items

Recorded data is logically separated into different traces that can be selected or deselected in the Profiler. In analysis, trace data is combined with information produced at compile time (symbol files and binaries). All trace data and the required symbolic information are stored in a stand-alone file that can be imported into the Analyzer.

· Address/Thread Trace

This is a statistical trace for analyzing thread, binary, and function CPU load. All other traces are synchronized with this trace, so every measurement must include an address/thread trace.
Address/Thread trace data can be visualized in different modes, depending on whether the items of interest are threads, binaries, or functions. For the elements associated with each mode, you can calculate the average CPU load within a selected time period, and display that information in tables and graphs. In all modes, address/thread trace graphs share the same semantics:

· Colors represent different trace items, such as threads, binaries, and functions
· Time is represented on the horizontal axis
· CPU load is represented on the vertical axis
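
As a rough illustration of how a statistical trace turns into a load figure, the sketch below counts samples per thread inside a selected time window. The sample layout (a time in milliseconds plus a thread name) and the function name are assumptions made for this example; they are not the Performance Investigator file format.

```python
from collections import Counter

def average_load(samples, start_ms, end_ms):
    """Return {thread: percent of samples} for start_ms <= time < end_ms.

    samples: iterable of (time_ms, thread_name) pairs, one per sampling tick
    (a hypothetical layout, not the real .dat format).
    """
    in_window = [thread for t, thread in samples if start_ms <= t < end_ms]
    if not in_window:
        return {}
    counts = Counter(in_window)
    return {thread: 100.0 * n / len(in_window) for thread, n in counts.items()}

# Hypothetical samples taken once per millisecond
samples = [(0, "NULL"), (1, "app::Main"), (2, "app::Main"), (3, "NULL")]
load = average_load(samples, 0, 4)
```

The same counting works for binaries or functions if each sample carries a binary or function name instead of a thread name.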

· Button Press Trace

This is an exact trace that records data about buttons pressed during the trace.
It is often useful to see the points in time at which different buttons were pressed. The mechanism for collecting BUP traces is simple: when a button is pressed, the key press event is stored to the trace, along with the time the event took place.

· Dynamic Binary Support Trace
During sampling, the dynamic binary support trace records the full path names, starting locations, and lengths of binaries loaded on the target device. That data is used to determine which binaries and functions are associated with address/thread trace samples taken in dynamic binaries.
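
A minimal sketch of that lookup, assuming each loaded binary is described by a start address, a length, and a path: a sampled address belongs to the binary whose range contains it. The record layout, the example paths, and the function name are hypothetical, and the sketch ignores binaries being loaded and unloaded over time.

```python
from bisect import bisect_right

def find_binary(binaries, address):
    """Return (path, offset) for the binary containing address, else None.

    binaries: list of (start_address, length, path), sorted by start_address.
    """
    starts = [start for start, _length, _path in binaries]
    i = bisect_right(starts, address) - 1  # last binary starting at or before address
    if i < 0:
        return None
    start, length, path = binaries[i]
    if address - start < length:
        return path, address - start
    return None

# Hypothetical load addresses and paths
binaries = sorted([
    (0x70000000, 0x4000, r"c:\sys\bin\example_app.exe"),
    (0x70010000, 0x2000, r"c:\sys\bin\example_lib.dll"),
])
hit = find_binary(binaries, 0x70010010)
miss = find_binary(binaries, 0x70005000)  # falls in the gap between the two binaries
```

The returned offset is what a symbol lookup inside the binary would then use to name the function.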

· Function Call Trace
This is a statistical trace that records function caller/callee relationships during execution. The trace is based on periodically sampling link register values. With the link register value, it is possible to resolve the caller function for the currently executing function.
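
The idea can be sketched as follows, assuming each sample holds a program counter and a link register value and that a sorted table of function start addresses is available. The symbol table, addresses, and names are invented for illustration. The sketch also assumes the link register still holds the return address at sample time, which is why the result is statistical rather than exact.

```python
from bisect import bisect_right
from collections import Counter

def caller_callee_counts(symbols, samples):
    """Count (caller, callee) pairs seen in the samples.

    symbols: list of (start_address, function_name), sorted by start_address.
    samples: iterable of (program_counter, link_register) pairs.
    """
    starts = [start for start, _name in symbols]

    def resolve(address):
        # The function containing an address is the last one starting at or before it
        i = bisect_right(starts, address) - 1
        return symbols[i][1] if i >= 0 else "<unknown>"

    return Counter((resolve(lr), resolve(pc)) for pc, lr in samples)

# Hypothetical symbol table and samples
symbols = [(0x1000, "Draw"), (0x2000, "Blit")]
samples = [(0x2010, 0x1004), (0x2020, 0x1004), (0x1008, 0x0500)]
counts = caller_callee_counts(symbols, samples)
```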

· Memory Trace

This trace takes samples of memory consumption of processes and threads. It records the free and total amount of memory on the device, the amount of memory allocated to kernel chunks, and the amount of memory allocated to stack/heap of each individual thread.

· Power Usage Trace

The Power Usage trace sampler uses a specific ISC channel to gather information about energy consumption from the Energy Management (EM) server running on the domestic OS. The sampler collects battery voltage, current, and capacity values, but it does not change any EM settings. The default sampling frequency is 4 Hz (one sample every 250 ms), and the maximum allowed is 20 Hz (one sample every 50 ms). Requesting an interval shorter than the hardware supports will return values identical to the previously sampled values.
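
The relationship between sampling frequency and interval, and a simple check for repeated readings, can be sketched as below. It takes 4 Hz as the default and 20 Hz as the fastest permitted rate; the function names and the idea of measuring the fraction of repeated values are assumptions for this example, not part of the Profiler.

```python
DEFAULT_HZ = 4   # one sample every 250 ms
FASTEST_HZ = 20  # one sample every 50 ms

def sampling_interval_ms(frequency_hz=DEFAULT_HZ):
    """Convert a sampling frequency in Hz to an interval in milliseconds."""
    if not 0 < frequency_hz <= FASTEST_HZ:
        raise ValueError("frequency must be in the range (0, 20] Hz")
    return 1000.0 / frequency_hz

def repeated_fraction(values):
    """Fraction of samples that are identical to the sample before them."""
    if len(values) < 2:
        return 0.0
    repeats = sum(1 for a, b in zip(values, values[1:]) if a == b)
    return repeats / (len(values) - 1)

# Hypothetical battery voltage readings in millivolts
readings = [3900, 3900, 3895, 3895, 3895]
fraction = repeated_fraction(readings)
```

A high repeated fraction may hint that the requested interval is shorter than the hardware update rate, although a steady load would produce the same pattern.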

· Thread Priority Trace

This trace takes samples of thread priorities. When the operating system is called, the time and priority ranking of the currently executing thread are recorded. Because a thread's priority may change over time, more than one priority ranking is sometimes recorded for a thread.
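
To show how several rankings can end up attached to one thread, here is a small sketch that collects the distinct priorities seen per thread, in the order they first appear. The sample layout, thread names, and priority values are hypothetical.

```python
def priority_lists(samples):
    """Return {thread: [distinct priorities in order of first appearance]}.

    samples: iterable of (time_ms, thread_name, priority) tuples.
    """
    result = {}
    for _time_ms, thread, priority in samples:
        seen = result.setdefault(thread, [])
        if priority not in seen:
            seen.append(priority)
    return result

# Hypothetical samples: T1 changes priority during the measurement
samples = [(0, "T1", 300), (5, "T2", 350), (9, "T1", 300), (12, "T1", 400)]
priorities = priority_lists(samples)
```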


Analyzer

The Performance Investigator Analyzer is used to graph, measure and analyze the collected traces from a device running the Profiler. It appears as a standard Analyzer view in a Carbide perspective.

The main analysis view is visible after trace data taken by the profiler has been imported into Carbide.c++ and opened with the Analyzer. The thread load data in the main analysis view is built from the address/thread trace. Other trace data is also shown in the main analysis view, such as dynamic binary support trace and function call trace. Figure 1 is an example view of imported profiler data that has been opened with the analyzer, including memory and power usage information, with a time interval of 9.1 to 13 seconds selected.

Figure 1. Analyzer main view showing Thread, Memory, and Power graphs

The address/thread trace graph is the most generic and perhaps most informative view of the sampled data. The different colors represent different items. You can select Thread load, Binary load, or Function load as shown in Figure 2. Time is represented on the horizontal axis of the graph and share of CPU use is represented on the vertical axis.

Figure 2. Selecting Threads, Binaries, Functions, or Function Calls

Each of the lines in the thread list represents a single thread. The color codes in the thread list correspond to the colors in the thread load graphs. Therefore, each color in the thread load graph represents the load of its corresponding thread.

You can select a portion of the graph by clicking on the trace graph. The Information view beneath the graph displays information about the currently selected part of the graph. For example, thread load statistics are shown in the information view if the thread load graph is selected.

Basic Analysis Procedure
The following basic guidelines describe how to use the Analyzer’s capabilities to find possible problems in the system.

Overall CPU load graph

Scale the time line (i.e. x-axis) in the CPU load graph such that you can see all essential activity from start to end of the measurement. Display all threads. This gives an overview of overall CPU load during your use case.

NULL thread overview
Examine overall NULL thread execution. Time frames in which the NULL thread does not gain any CPU time are potential bottlenecks and need more detailed analysis. On the other hand, when the NULL thread has 100% of the CPU time, the system does not have anything to do. If this occurs, for example in the middle of intensive processing, available resources might be used non-optimally.
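
The NULL thread check can be sketched as a windowed count over address/thread samples: windows where the NULL thread share is 0% are candidate bottlenecks, and windows at 100% are fully idle. The sample layout, the thread name "NULL", and the window size are assumptions made for this example.

```python
def null_thread_share(samples, window_ms, null_name="NULL"):
    """Return {window_start_ms: percent of samples spent in the NULL thread}.

    samples: iterable of (time_ms, thread_name) pairs.
    """
    totals, idle = {}, {}
    for time_ms, thread in samples:
        window = (time_ms // window_ms) * window_ms
        totals[window] = totals.get(window, 0) + 1
        if thread == null_name:
            idle[window] = idle.get(window, 0) + 1
    return {w: 100.0 * idle.get(w, 0) / n for w, n in sorted(totals.items())}

# Hypothetical samples, grouped into 2 ms windows
samples = [(0, "NULL"), (1, "NULL"), (2, "A"), (3, "A"), (4, "A"), (5, "NULL")]
shares = null_thread_share(samples, window_ms=2)
bottlenecks = [w for w, share in shares.items() if share == 0.0]
fully_idle = [w for w, share in shares.items() if share == 100.0]
```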

Overview of other threads
Identify the thread(s) causing most of the CPU load. There are typically 1-3 high-load threads within the selected time frame. Study the time frames that contain potential bottlenecks one by one.

Application characteristics overview

Take an overview of CPU load characteristics of the specific application(s) running in the use case. Focus on time frames in which the application(s) consume more CPU time than assumed. The binaries executing during those time frames tell you where time is being spent in user and system code.

Within the time frames requiring a more detailed analysis, try to determine at a high level what the application has been doing. When getting into a more detailed analysis, it is essential to know the origin causing the activity chain. For example, the end-user has completed a menu selection, which triggers playback of an audio file. The event has to be associated with a specific point in time in the CPU load profile.

Function-level analysis

Examine the set of functions called within high-load threads during selected time frames. Keep in mind that your use case will determine which threads are relevant to analysis. As a rule of thumb, threads consuming more than 10% of available CPU time are reasonable candidates for analysis.
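
The 10% rule of thumb can be applied mechanically to a table of per-thread loads, as in the sketch below. The input format, the thread names, and the choice to skip the NULL thread are assumptions for this example.

```python
def analysis_candidates(loads, threshold_percent=10.0, ignore=("NULL",)):
    """Return thread names whose load exceeds the threshold, highest load first.

    loads: {thread_name: average CPU load in percent} for the selected time frame.
    """
    picked = [(percent, name) for name, percent in loads.items()
              if percent > threshold_percent and name not in ignore]
    return [name for _percent, name in sorted(picked, reverse=True)]

# Hypothetical thread loads for one time frame
loads = {"NULL": 40.0, "Decoder": 35.0, "UI": 15.0, "Timer": 10.0}
candidates = analysis_candidates(loads)
```

The threshold is only a starting point; a thread below 10% can still matter if it sits on the critical path of the use case.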

When you analyze the binaries and functions in which the application spends most of its execution time, it is recommended that you analyze based on a single thread at a time.
For a high-load function, examine the code for an explanation (e.g., computation-intensive loop) as to why significant time was spent in the function. If you have captured function call data, examine the functions calling the high-load function to determine whether so many calls were necessary.
