Author : McMillan
Page : 1 Next >>
Introduction
One often-heard claim during the past 30 years is that performance doesn't matter because the cost of computational power is constantly dropping. Therefore, buying a stronger machine or extending the RAM of an existing one can make up for the sluggish performance of software written in a high-level programming language. In other words, a hardware upgrade is more cost-effective than the laborious task of hand-tuning code. That might be correct for client applications that execute on a standard personal computer. A modestly priced personal computer these days offers higher computational power than a mainframe did two decades ago, and that power still doubles every 18 months or so. However, in many other application domains, a hardware upgrade is less favorable because it is too expensive or because it simply is not an option. In proprietary embedded systems with 128K of RAM or less, extending the RAM requires redesigning the entire system from scratch, as well as investing several years in the development and testing of the new chips. In this case, code optimization is the only viable choice for satisfactory performance.
But optimization is not confined to esoteric application domains such as embedded systems or hard-core real-time applications. Even in mainstream application domains such as financial and billing systems, code optimization is sometimes necessary. For a bank that owns a $1,500,000 mainframe computer, buying a faster machine is a less attractive option than rewriting a few thousand lines of critical code. Code optimization is also the primary tool for achieving satisfactory performance from server applications that support numerous users, such as Relational Database Management Systems and Web servers.
Another common belief is that code optimization implies less readable and harder-to-maintain software. This is not necessarily true. Sometimes, simple code modifications such as relocating the declarations in a source file or choosing a different container type can make all the difference in the world. Yet none of these changes entails unreadable code, nor does any of them incur additional maintenance overhead. In fact, some optimization techniques can even improve the software's extensibility and readability. More aggressive optimizations range from using a simplified class hierarchy to embedding inline assembly code. The result in this case is less readable, harder to maintain, and less portable code. Optimization can be viewed as a continuum; the extent to which it is applied depends on a variety of considerations.
Scope of This Tutorial
Optimization is a vast subject that can easily fill a few thick volumes. This tutorial discusses various optimization techniques, most of which can be easily applied in C++ code without requiring a deep understanding of the underlying hardware architecture of a particular platform. The intent is to give you a rough estimate of the performance cost of choosing one programming strategy over another (you can experiment on your own computer with the programs that are discussed in the following sections). The purpose is to provide you with practical guidelines and notions, rather than to delve into theoretical aspects of performance analysis, efficiency of algorithms, or Big O notation.
Before Optimizing Your Software
Detecting the bottlenecks of a program is the first step in optimizing it. It is important, however, to profile the release version rather than the debug version of the program because the debug version of the executable contains additional code. A debug-enabled executable can be about 40% larger than the equivalent release executable. The extra code is required for symbol lookup and other debug "scaffolding". Most implementations provide distinct debug and release versions of operator new and other library functions. Usually, the debug version of new initializes the allocated memory with a unique value and adds a header at the start of the block; the release version of new doesn't perform either of these tasks. Furthermore, a release version of an executable might have been optimized already in several ways, including the elimination of unnecessary temporary objects, loop unrolling (see the sidebar "A Few Compiler Tricks"), moving objects to the registers, and inlining. For these reasons, you cannot reliably deduce from a debug version where the performance bottlenecks are actually located.
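When no profiler is at hand, a crude way to compare a debug build with a release build is to time the suspect code directly. The following is a minimal sketch, not a substitute for a real profiler. It assumes a C++11 compiler (for <chrono>), and the names sum_values() and time_sum_us() are hypothetical, introduced only for this illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical workload used only for illustration: sums a vector.
long long sum_values(const std::vector<int>& v)
{
    long long total = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        total += v[i];
    return total;
}

// Times a single call of sum_values() and returns the elapsed time in
// microseconds. The sum is handed back through 'out' so that the result
// is actually used by the caller.
long long time_sum_us(const std::vector<int>& v, long long& out)
{
    typedef std::chrono::steady_clock clock_type;
    const clock_type::time_point start = clock_type::now();
    out = sum_values(v);
    const clock_type::time_point stop = clock_type::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Building the same source twice, once with optimizations disabled and once enabled (with GCC or Clang, for instance, -O0 -g versus -O2), and comparing the reported times gives a rough feel for how far the two versions can diverge. A single measurement is noisy, so repeating the call many times and averaging is advisable.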
A Few Compiler Tricks
A compiler can automatically optimize the code in several ways. The named return value optimization and loop unrolling are two instances of such automatic optimizations.
Consider the following code:
int *buff = new int[3];
for (int i = 0; i < 3; i++)
    buff[i] = 0;
This loop is inefficient. Every iteration assigns a value to the next array element, but CPU time is also spent on testing and incrementing the counter and on performing a jump. To avoid this overhead, the compiler can unroll the loop into a sequence of three assignment statements, as follows:
buff[0] = 0;
buff[1] = 0;
buff[2] = 0;
The named return value optimization is a C++-specific optimization that eliminates the construction and destruction of a temporary object. When a temporary object is copied to another object using a copy constructor, and when both objects have the same cv-unqualified type, the Standard allows the implementation to treat the two objects as one, and not perform a copy at all. For example:
class A
{
public:
    A();
    ~A();
    A(const A&);
    A& operator=(const A&); // returns a reference, as is conventional
};
A f()
{
    A a;
    return a;
}
A a2 = f();
The object a does not need to be copied when f() returns. Instead, the return value of f() can be constructed directly into the object a2, thereby avoiding both the construction and destruction of a temporary object on the stack.
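Whether a particular compiler actually elides the copy can be checked by counting calls. The following is a minimal sketch under stated assumptions: the class Counted and the functions make_counted() and copies_for_one_return() are hypothetical names introduced for illustration, and the number of copies reported depends on the compiler and its settings, because the optimization is permitted rather than required.

```cpp
#include <cassert>

// Hypothetical instrumented class: counts calls to its special member functions.
struct Counted
{
    static int constructed;
    static int copied;
    static int destroyed;

    Counted() { ++constructed; }
    Counted(const Counted&) { ++copied; }
    Counted& operator=(const Counted&) { return *this; }
    ~Counted() { ++destroyed; }
};

int Counted::constructed = 0;
int Counted::copied = 0;
int Counted::destroyed = 0;

Counted make_counted()
{
    Counted c;  // named local object
    return c;   // candidate for the named return value optimization
}

// Returns how many copy-constructor calls were needed to initialize
// one object from the return value of make_counted().
int copies_for_one_return()
{
    const int before = Counted::copied;
    {
        Counted c2 = make_counted();
        (void)c2; // silence "unused variable" warnings
    }
    // Every object that was created has been destroyed by now.
    assert(Counted::constructed + Counted::copied == Counted::destroyed);
    return Counted::copied - before;
}
```

A result of 0 from copies_for_one_return() means that the local object was constructed directly in the destination; a result of 1 or 2 means that the compiler performed some or all of the copies it was allowed to skip.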
Remember also that debugging and optimization are two distinct operations. The debug version needs to be used to trap bugs and to verify that the program is free from logical errors. The tested release version needs to be used in performance tuning and optimizations. Of course, applying the code optimization techniques that are presented in this tutorial can enhance the performance of the debug version as well, but the release version is the one that needs to be used for performance evaluation.
NOTE: It is not uncommon to find a "phantom bottleneck" in the debug version, which the programmer strains hard to fix, only to discover later that it has disappeared anyway in the release version. Andrew Koenig wrote an excellent article that tells the story of an evasive bottleneck that automatically dissolved in the release version ("An Example of Hidden Library Overhead", C++ Report vol. 10:2, February 1998, page 11). The lesson that can be learned from this article is applicable to everyone who practices code optimization.
Declaration Placement
The placement of declarations of variables and objects in a program can have significant performance effects. Likewise, choosing between the postfix and prefix operators can also affect performance. This section concentrates on four issues: initialization versus assignment, relocation of declarations to the part of the program that actually uses them, a constructor's member initialization list, and prefix versus postfix operators.
Prefer Initialization to Assignment
C (prior to C99) allows declarations only at a block's beginning, before any program statements. For example:
void f();
void g()
{
    int i;
    double d;
    char * p;
    f();
}
In C++, a declaration is a statement; as such, it can appear almost anywhere within the program. For example:
void f();
void g()
{
    int i;
    f();
    double d;
    char * p;
}
The motivation for this change in C++ was to allow for declarations of objects right before they are used. There are two benefits to this practice. First, this practice guarantees that an object cannot be tampered with by other parts of the program before it has been used. When objects are declared at the block's beginning and are used only 20 or 50 lines later, there is no such guarantee. For instance, a pointer to an object that was allocated on the free store might be accidentally deleted somewhere before it is actually used. Declaring the pointer right before it is used, however, reduces the likelihood of such mishaps.
The second benefit of declaring objects right before their usage is the capability to initialize them immediately with the desired value. For example:
#include <string>
using namespace std;
void func(const string& s)
{
    bool emp = s.empty(); //a local declaration enables immediate initialization
}
For fundamental types, initialization is at most marginally more efficient than assignment, and it is often identical to late assignment in terms of performance. Consider the following version of func(), which applies assignment rather than initialization:
void func2() //less efficient than func()? Not necessarily
{
    string s;
    bool emp;
    emp = s.empty(); //late assignment
}
My compiler produces the same assembly code as it did with the initialization version. However, as far as user-defined types are concerned, the difference between initialization and assignment can be quite noticeable. The following example demonstrates the performance gain in this case (by modifying the preceding example). Instead of a bool variable, a full-blown class object is used, which has all four special member functions defined:
int constructor, assignment_op, copy,
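As an independent illustration of the same point, and not a reconstruction of the listing above, the following minimal sketch counts how many special member functions run in each case. The class Tracked and the helper functions are hypothetical names chosen for this sketch.

```cpp
#include <cassert>

// Hypothetical class that counts calls to its special member functions.
struct Tracked
{
    static int default_ctor_calls;
    static int copy_ctor_calls;
    static int assignment_calls;

    Tracked() { ++default_ctor_calls; }
    Tracked(const Tracked&) { ++copy_ctor_calls; }
    Tracked& operator=(const Tracked&) { ++assignment_calls; return *this; }
    ~Tracked() {}
};

int Tracked::default_ctor_calls = 0;
int Tracked::copy_ctor_calls = 0;
int Tracked::assignment_calls = 0;

// Sets all counters back to zero.
void reset_counts()
{
    Tracked::default_ctor_calls = 0;
    Tracked::copy_ctor_calls = 0;
    Tracked::assignment_calls = 0;
}

// Sum of constructor and assignment calls recorded since the last reset.
int total_calls()
{
    return Tracked::default_ctor_calls
         + Tracked::copy_ctor_calls
         + Tracked::assignment_calls;
}

// Initialization: the object is created with its final value.
int calls_with_initialization(const Tracked& src)
{
    reset_counts();
    Tracked t = src; // a single copy-constructor call
    (void)t;
    return total_calls();
}

// Late assignment: the object is default-constructed, then overwritten.
int calls_with_late_assignment(const Tracked& src)
{
    reset_counts();
    Tracked t;  // default constructor
    t = src;    // assignment operator
    return total_calls();
}
```

Initialization costs one call here, whereas late assignment costs two. For a class whose default constructor or assignment operator does real work, such as allocating memory, the second call is pure overhead.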