Chapter 19. What We Have Learned
facilitate the breaking up of programs into relatively independent pieces that can be developed by
different programmers, who communicate with one another as little as possible.
C++ is a tool for achieving several conflicting goals. It was designed:
as a high-level language for readability (data aggregation, flow of control, scope of
as a language for sharp and quick minds (unique shorthand operators, concise
to use character strings and dynamic memory management
to use libraries that are provided (de facto standard)
C++ as a Traditional Programming Language
Unlike many high-level languages, C++ is case sensitive. Similar to many modern high-level
languages, C++ is space blind (with two or three exceptions). It uses end-of-line comments but
does not use nested block comments.
Similar to most other programming languages, C++ provides basic built-in data types with
operations over the values of these types. The C++ built-in data types are rather limited: just
simple integers and floating point values.
C++ Built-in Data Types
To achieve maximum performance, the C++ integer type is meant to be the fastest type on a given
platform. Its size is typically 16 bits on 16-bit machines and 32 bits on 32-bit machines (and
usually still 32 bits on 64-bit machines). This results in a portability
problem, so typical for C++: There is no guarantee that a program running on one machine will
produce exactly the same results on another machine.
To aid flexibility (i.e., to save memory where possible) and to add computational power (i.e., to
expand ranges where necessary) for complex computations, C++ provides size modifiers (short,
long, unsigned) for finer use of memory. C++ does not standardize the sizes of different types. It
just requires that a short value is not longer than an integer value; it also requires that a long value
is not shorter than an integer value.
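
The ordering rule above can be checked at compile time. The following is a minimal sketch; the helper name bit_width_of is hypothetical and introduced only for illustration. The actual widths it reports depend on the platform.

```cpp
#include <climits>

// The only ordering the language promises: short is no longer than int,
// and long is no shorter than int. The actual sizes are platform facts.
static_assert(sizeof(short) <= sizeof(int), "short must not be longer than int");
static_assert(sizeof(int) <= sizeof(long), "long must not be shorter than int");

// Number of bits occupied by type T on this platform
// (CHAR_BIT is the number of bits in one byte).
template <typename T>
int bit_width_of() {
    return static_cast<int>(sizeof(T) * CHAR_BIT);
}
```

A call such as bit_width_of<long>() reports the width on the machine at hand, which is exactly the value that may differ from one platform to another.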
As a result, on most current machines, short values are 16 bits, and long values are at least 32
bits (64 bits on many 64-bit platforms). Programmers who strive for portability avoid using plain
integers and instead use either short or long modifiers. Programmers who strive for speed use
plain integers and avoid using short and long modifiers.
file://///Administrator/General%20English%20Learning/it2002-7-6/core.htm (1161 of 1187) [8/17/2002 2:58:12 PM]
Use of unsigned values supports even finer memory use and is even more controversial. On the one
hand, defining a value as unsigned indicates to the maintainer that the value is inherently positive
and cannot be negative. Also, the use of the unsigned qualifier doubles the maximum integer value
on the given architecture (for the same number of bits). On the other hand, the mixture of signed
and unsigned values might result in incorrect results in computations. To avoid these errors, many
programmers give up the potential benefits of unsigned values by not using them.
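
One way the mixture of signed and unsigned values goes wrong can be sketched as follows. The function names are hypothetical; the example assumes the usual rule that an int operand is converted to unsigned int when the other operand is unsigned int.

```cpp
// Buggy: 'total' is converted to unsigned int before the division,
// so a negative total turns into a very large positive number.
double average_mixed(int total, unsigned int count) {
    return total / count;
}

// Safer: convert both operands to a signed floating-point type first.
double average_signed(int total, unsigned int count) {
    return static_cast<double>(total) / static_cast<double>(count);
}
```

With a total of -10 and a count of 2, the first version yields a large positive value, while the second yields -5.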
To simplify the choices for the programmer, C++ supports defaults. If the programmer does not
specify whether the value is signed or unsigned, the default is signed; if the programmer does not
specify whether the value is a short integer, a long integer, or just an integer, the default is just an
integer.
Striving for maximum performance, C++ tests computational results neither for underflow nor for
overflow. Everything that should be tested in the program should be tested explicitly in the source
code of the program on the program's own time. If the program does not want to spend time
checking the legitimacy of the results, C++ does not provide any default tests or warnings.
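
An explicit test of this kind might look like the sketch below, which checks an int addition before performing it. The function names are hypothetical, and this is only one of several possible ways to write such a check.

```cpp
#include <climits>

// Returns true when a + b would not fit in an int.
// The test is arranged so that the check itself cannot overflow.
bool add_would_overflow(int a, int b) {
    if (b > 0) return a > INT_MAX - b;
    if (b < 0) return a < INT_MIN - b;
    return false;
}

// Adds two ints, reporting failure instead of overflowing.
bool checked_add(int a, int b, int& result) {
    if (add_would_overflow(a, b)) return false;
    result = a + b;
    return true;
}
```

The program pays for the comparison on every call, which is the cost the language declines to impose by default.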
C++ treats characters as just another kind of integer. Their size varies from one byte per character
to two bytes per character (expanded character set). Arithmetic operations over character values are
legal in C++. They are popular, but they could create portability problems when different machines
use different character sets.
The language allows the programmer to specify both signed and unsigned characters. There is no
standard for the default type: on some machines it is unsigned, on others it is signed. It is a good idea
to assume that a character cannot contain a negative value and to use an integer instead of a
character if a negative value (e.g., end-of-file code) is possible.
Character literals are enclosed in single quotes. They should not be confused with string literals that
are enclosed in double quotes. C++ does not store the string length with the string contents. It uses
the 0 code to mark the end of the string. This is why the storage size of a string literal is one more than
the number of characters in the literal.
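
The difference between the storage size and the number of visible characters can be sketched with sizeof and the library function strlen. The wrapper name visible_length is hypothetical.

```cpp
#include <cstring>

// The literal "abc" occupies four bytes: 'a', 'b', 'c', and the 0 terminator.
static_assert(sizeof("abc") == 4, "three characters plus the terminating zero");

// strlen counts characters up to, but not including, the terminator.
std::size_t visible_length(const char* text) {
    return std::strlen(text);
}
```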
For floating point types, C++ supports three different sizes: float, double, and long double. Their
sizes range from 4 to 8 to 10 bytes, their precision ranges from 7 to 15 to 19 digits. These
characteristics are machine dependent. C++ floating-point constants are always double, not float or
long double. In most cases, this is not important. When it is necessary to specify that the literal is,
for example, float, the appropriate suffix should be used. C++ supports both the fixed decimal point
notation and scientific notation (with the exponent).
Boolean types have two values, true and false. They are also treated as small integers. The size of a
Boolean value of type bool is one byte rather than one bit. C++ does not pack Boolean values one
per bit because addressing individual bits in C++ requires logical operations and shifts. In this
tradeoff between space efficiency and time efficiency, C++ favors time efficiency, since the byte is
the smallest unit of memory that can be addressed directly.
Symbolic names for literal values of any built-in type can be specified using the preprocessor
#define directive. The preprocessor will replace each occurrence of the symbolic name in the
source code with the literal value. Since this is done before the compiler sees the source code, the
errors in the preprocessor directives are often hard to find. Using the const modifier is better
because the names defined with the const modifier follow the scope rules (the names defined in
the #define directive are global).
For each data type, C++ supports two derived data types, a pointer type and a reference type. Both
these types contain an address of the value, but the syntax of their use is different.
C++ allows any conversions between numeric values of different types: the value of one type can
be used where the value of another type is expected. Boolean values and numeric values are also
interchangeable: no syntax error is generated. For numeric values, C++ is a weakly typed
language.
The values of pointers (or references) to different types cannot be converted to each other (or to the
value of the type). For addresses, C++ is a strongly typed language: a syntax error is generated
even when the pointers of different types contain the same address.
An explicit cast can be used for conversions between pointers (and references), but the integrity of
results remains the responsibility of the programmer: no syntax error is generated by the compiler
if the results do not have reasonable meaning or are not portable between different computer
architectures.
C++ contains a conventional set of operations over numeric values, such as sign operators,
arithmetic operations, relational operators, equality operators, and logical operators. It has no
exponentiation operator. Similar to most other programming languages, it has no implied
multiplication¡Xthe asterisk should be used as an explicit operator.
C++ treats statements as expressions. To achieve this uniformity, C++ treats the assignment and the
comma as operators (although their priority is the lowest). As a result, erroneous constructs can be
accepted by C++ compilers as valid code.
Since the number of operators is large, C++ uses two-symbol operators and even one three-symbol
operator (the conditional operator). In C++, the meaning of an operator (and of a keyword) is often
reused for different purposes and thus depends on the context.
Since the sizes of built-in data types are machine dependent, C++ allows the programmer to
evaluate the size of a given variable (given the name of the variable) or the size of any variable of a
given type (given the type name).
Logical, relational, and equality operators return Boolean values true and false, but these values can
be freely converted to numerical values 1 (true) and zero (false). Moreover, any numerical value
can be used where a Boolean value is expected: no syntax error is generated. The zero value is
converted to false, and any other numeric value is interpreted as true. This leniency sometimes
forces C++ compilers to accept code that is semantically incorrect.
Another source of error is the equality operator, which is written as two consecutive equal signs:
omitting one equal sign does not generate a syntax error but quietly changes the meaning of the
source code. This is a common source of error that causes waste of time, frustration, and anxiety.
Logical operators && and || are of different priority: the operator && binds tighter than ||. This
allows avoiding extra parentheses. Both logical operators are short-circuit operators: In the
compound logical expression, the first operand is evaluated first, and the second one is not
evaluated if the result of the operation is known from the evaluation of the first operand.
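
Short-circuit evaluation is commonly used to guard an operation that would otherwise be invalid. The sketch below uses two hypothetical functions to show the pattern.

```cpp
// The right operand is evaluated only when the left one is true,
// so the pointer is never dereferenced when it is null.
bool is_positive(const int* p) {
    return p != nullptr && *p > 0;
}

// The division is skipped when the divisor is zero.
bool ratio_exceeds(int numerator, int divisor, int limit) {
    return divisor != 0 && numerator / divisor > limit;
}
```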
C++ has a number of unique operators that provide access to the underlying representation of
information in computer memory. Among these are the bitwise logical operators: and,
inclusive or, exclusive or, and negation (complement). They operate on each bit of the operand(s)
individually, creating the result bit by bit.
Bitwise shifts shift the given bit pattern to the left and to the right. When the pattern is shifted to the
left, or a positive value is shifted to the right, zeroes are shifted in; these operations are portable.
When a negative value is shifted to the right, the result depends on the implementation: Either
zeroes are shifted in (logical shift), or ones are shifted in (arithmetic shift). This operation is not
portable.
Another set of unique operators includes the increment and decrement operators. They emulate
assembly language type processing by providing a side effect (increment or decrement by 1) on the
single lvalue operand. These operators can be prefix or postfix. The prefix operator is applied first,
then the value is used in other expressions; the postfix operator is applied after the value is used in
other expressions.
C++ does specify the order of evaluation of operators in an expression. However, it does not
specify the order of evaluation of operands. Hence, a C++ program is not allowed to rely on a
specific order of evaluation of operands in an expression. In particular, the operands with side
effects (increment and decrement operators) are a common source of portability problems. It is a
good idea to only use increment and decrement operators in stand-alone expressions to avoid
these problems.
Another unique operator is the conditional operator: Depending on the value of its first operand, it
evaluates either its second operand (the first operand is true) or its third operand (the first
operand is false).
Yet another set of unique C++ operators includes arithmetic assignments and the comma operator.
These operators help to write succinct and expressive C++ code.
C++ binary operators are always applied to operands of exactly the same type. When the source
code specifies operands of different types, C++ applies widening conversions: A shorter operand is
converted to the widest type in the expression. In the assignment operator, the value on the right-hand
side is converted to the type of the left-hand side, even if this might cause a possible loss of
accuracy.
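
The effect of these conversion rules can be sketched with three small functions (the names are hypothetical). The point is that widening happens per operator, not for the whole expression at once.

```cpp
// Both operands are int, so the division is done in int and the fraction
// is lost before the result is widened to double for the return value.
double half_int(int n) {
    return n / 2;
}

// One double operand is enough: the int operand is widened first.
double half_double(int n) {
    return n / 2.0;
}

// Initialization or assignment converts to the type on the left,
// silently dropping the fractional part.
int to_int(double x) {
    int result = x;
    return result;
}
```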
C++ Control Flow
As in other languages, C++ statements are executed sequentially. Each statement is terminated by a
semicolon. Blocks (compound statements) are allowed; they are delimited by braces and can have
local variables. No semicolon is used after the closing brace of the block.
Compound statements can be nested; they can serve as a function body or as a control statement
body. Local variables defined in a nested block are not visible outside the block.
C++ has a standard set of control constructs. The if-else statement does not use the then
keyword; it executes the true branch when the statement's expression has a nonzero value of any
type; it executes the false branch when the statement's expression has a zero (false) value.
To implement repeated actions, C++ supports three forms of iterative statements: the while loop (it
allows for zero repetitions), the do-while loop (it enforces at least one repetition), and the for loop
(mostly for a fixed number of repetitions).
A popular C++ programming idiom is to combine a test for continued iteration with the
assignment. Using this idiom, one has to be careful with parentheses: Omitting parentheses might
change the meaning of the expression because C++ comparison has a higher precedence than C++
assignment.
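
A typical instance of this idiom reads a character and tests for the end of input in the same expression. The sketch below assumes input from a standard stream; the function names are hypothetical.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Counts characters by combining the read with the end-of-input test.
// The inner parentheses are required: without them, c would receive the
// result of the comparison rather than the character that was read.
int count_chars(std::istream& in) {
    int count = 0;
    int c;  // int, not char, so that the end-of-file code fits
    while ((c = in.get()) != std::char_traits<char>::eof()) {
        ++count;
    }
    return count;
}

// Convenience wrapper that reads from a string instead of a file.
int count_chars_in(const std::string& text) {
    std::istringstream in(text);
    return count_chars(in);
}
```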
C++ does not support unrestricted jumps. The goto statement cannot leave its scope and cannot
jump over definitions of variables. The break statement exits from a loop so that control flow
jumps to the statement after the loop. The break statement can be used with all three loop
constructs and is usually executed in a conditional statement. The continue statement skips the
rest of the loop body and returns to the loop top for the test of further iterations.
The C++ switch statement supports multiway decisions in the program: It provides alternative
execution paths based on the value of an integral expression. (Floating point cannot be used.) The
default case is executed if no match is found. Unlike in other languages, the default case is
optional; if it is absent and no match is found, control passes to the statement after the switch. To create a
construct with multiple branches, break statements should be used at the end of each branch.
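
A minimal sketch of a multiway decision follows. The function name digit_name and the chosen labels are illustrative only.

```cpp
#include <string>

std::string digit_name(int digit) {
    std::string text;
    switch (digit) {
    case 0:
        text = "zero";
        break;      // without this break, control would fall into case 1
    case 1:
    case 2:         // two labels deliberately share one branch
        text = "small";
        break;
    default:        // executed when no label matches
        text = "other";
        break;
    }
    return text;
}
```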
C++ as a Modular Language
Similar to most modern high-level languages, C++ supports hierarchies of building blocks for
program data and for program operations. From a software engineering point of view, the benefits
of modularization for large projects include division of labor, simpler programming tasks, reusable
and maintainable program elements, and the opportunity to study the program at different levels,
either in general (disregarding details), or in detail (disregarding high-level issues).
When used correctly, these benefits result in higher productivity both for development and for
maintenance and in fewer errors.
C++ supports programmer-defined aggregate data types: arrays, structures, unions, and
enumerations. Their components can be either of built-in data types or of other C++ aggregate
types (arrays, structures, etc.).
C++ supports programmer-defined functions. The hierarchies of functions model the hierarchies of
actions of real-life objects that the program maintains information about. C++ supports the use of
standard libraries. Standard libraries implement a variety of common tasks. Library functions are
optimized, well tested, and have broad applicability.
The need to specify header files with function prototypes makes using library functions more
difficult than necessary, but one can learn to live with it.
C++ Aggregate Types: Arrays
C++ arrays can only contain elements of the same type. The greatest limitation of C++ arrays is
that the array size must be known at compile time. If the array contains more elements than
necessary, memory is wasted. If the array contains fewer elements than necessary, memory is
corrupted.
Another common source of errors in using C++ arrays is that the index of the first element is
always 0. This cannot be changed. Hence, the index of the last element is 1 less than the array
range. C++ does not support the compile time index check: This is often impossible, but the
compiler would not do that even if it were possible. Neither does C++ support the run-time test of
index validity: It would affect execution time.
C++ philosophy assumes that you do not want to waste time at any access to the array; when you
want to check index validity, you can write your own code to do that; when you do not check index
validity, C++ assumes that you know what you are doing. On memory-rich machines, index errors
might not result in incorrect run-time results (until the memory usage changes). This is a serious
problem with no good solution.
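
Writing your own index check, as described above, could look like the following sketch. The function name get_checked is hypothetical, and the caller has to supply the array size because the array itself does not carry it.

```cpp
#include <cstddef>

// Stores data[index] in value and returns true when the index is valid;
// returns false and leaves value untouched otherwise.
bool get_checked(const int data[], std::size_t size, std::size_t index, int& value) {
    if (index >= size) return false;
    value = data[index];
    return true;
}
```

For an array of five elements, indices 0 through 4 succeed and index 5 is rejected.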
C++ allows the programmer to implement array processing algorithms using either indices to
access array components or pointers. This is based on the fact that the increment (or decrement)
operator applied to a pointer increments the address not by 1 but by the size of the array element.
This operation points the pointer to the next element of the array. The use of pointers allows one to
write concise and expressive array processing code. However, there are no performance advantages
in using this technique. Some programmers find this kind of code somewhat difficult to verify.
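
The two styles can be compared side by side. In this sketch (function names are hypothetical), both functions compute the same sum; the second relies on the fact that incrementing a pointer moves it by one element.

```cpp
#include <cassert>
#include <cstddef>

// Index style: the subscript selects each element in turn.
int sum_by_index(const int data[], std::size_t size) {
    int total = 0;
    for (std::size_t i = 0; i < size; ++i) {
        total += data[i];
    }
    return total;
}

// Pointer style: ++p advances the address by sizeof(int), not by 1.
int sum_by_pointer(const int* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    int total = 0;
    for (const int* p = data; p != data + size; ++p) {
        total += *p;
    }
    return total;
}
```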
C++ supports arrays of any dimension. Under the hood, they are implemented as one-dimensional
arrays with the row-major order. (The right subscript varies the fastest.) Similar to one-dimensional
arrays, C++ multidimensional arrays support no checks of index validity.
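
Row-major order can be illustrated by doing the index arithmetic by hand on a one-dimensional array. The dimensions, the Grid type, and the function names below are assumptions made for this sketch.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t ROWS = 2;   // illustrative dimensions
const std::size_t COLS = 3;

// Offset of element [row][col] in a row-major layout:
// a step in the column moves by 1, a step in the row moves by COLS.
std::size_t flat_index(std::size_t row, std::size_t col) {
    assert(row < ROWS && col < COLS);
    return row * COLS + col;
}

// A two-dimensional grid stored explicitly as a one-dimensional array.
struct Grid {
    int cells[ROWS * COLS];
};

int& at(Grid& grid, std::size_t row, std::size_t col) {
    return grid.cells[flat_index(row, col)];
}
```

Here the assert plays the role of the validity check that the built-in subscript does not perform.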
C++ represents text as arrays of characters. These arrays have to have an extra element to
accommodate the zero sentinel value that is used to mark the end of valid data in the array. When
the compiler processes the program text literals, it also appends the terminating zero to the symbols
of the string; hence, the literals have an extra element as well. All library functions that deal with
arrays of characters expect the terminating zero at the end of valid data. When these library
functions change the contents of the array, they append the terminating zero to the end of valid data
to keep the string in the valid state.
C++ supports neither array assignments nor array comparisons. For arrays of arbitrary types, it is
the responsibility of the programmer to make sure that these operations are performed correctly.
For text strings, library functions are used for assignment, comparisons, concatenation, and other
operations.
Most library functions do not work well when strings overlap in memory. When writing to a
character array, no C++ library function checks for available space. If the string does not have
enough available space, the computer memory is silently corrupted. This is a serious integrity
problem.
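
One defensive approach is to pass the capacity of the destination array explicitly and to check it before copying. The function copy_bounded below is a hypothetical helper, not a library function, and it assumes that the source and destination do not overlap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies at most capacity - 1 characters and always terminates the result.
// Returns false when the source had to be truncated or nothing could be stored.
bool copy_bounded(char* dest, std::size_t capacity, const char* src) {
    assert(dest != nullptr && src != nullptr);
    if (capacity == 0) return false;
    std::size_t length = std::strlen(src);
    bool fits = length < capacity;            // room for the terminating zero too
    std::size_t count = fits ? length : capacity - 1;
    std::memcpy(dest, src, count);
    dest[count] = '\0';
    return fits;
}
```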
C++ Aggregate Types: Structures, Unions, Enumerations
C++ structures combine related components. What components are related and what are not is often
a matter of judgment. C++ leaves this to the discretion of the programmer and does not impose any
limitations on the types of the components.
The structure definition is a blueprint for creation of structure variables. For each structure field,
the programmer supplies the type and the name of the field. The scope of the structure definition is
delimited by the opening and the closing brace that is followed by the semicolon.
Structure variables can be initialized using the syntax similar to the syntax of array initialization (a
comma-separated list of values delimited by the braces).
The dot selector operator selects a structure object's fields (both as an lvalue and as an rvalue).
When the structure variable is referred through a pointer, the dot selector operator does not work;
instead, the arrow selector operator should be used.
C++ supports assignment for structure variables of the same type. The value semantics is
implemented: The fields of the rvalue structure variable are copied bitwise into the fields of the
lvalue structure variables.
Assignments between structure variables of different types are not allowed, even when they have
the same composition and even when the fields in both structure definitions have the same names.
It is the type name that has to be the same. Notice that using the typedef facility would not make
the type name the same: It would only create a synonym for the type name.
Assignments between structure variables and numeric variables (or pointer or reference variables)
are not allowed: For programmer-defined types, C++ behaves as a strongly typed language, and
these assignments are marked as syntax errors by the compiler.
C++ supports no structure comparisons or any other operations over structures; you should write
your own code to implement structure operations.
Union is a type definition that syntactically is similar to the structure definition: Several fields of
different types can be listed between the scope braces (followed by a semicolon). However, these
fields exist in the memory of the computer, not simultaneously (as in structure variables), but
one at a time.
This design allows the program to save space: A union variable can contain information of one of
the mutually exclusive types specified in the union type definition. This is, of course, error prone
because the programmer should make sure that the value retrieved from the union variable is of the
same type as the value saved in this variable earlier, and the union itself has no means to keep the
record of which type was stored.
If the program makes a mistake and retrieves a value of a different type, there is no compile-time
error, and there is no run-time error; a useless bit pattern is retrieved silently. To avoid these errors,
union variables can be used as fields of structures; a tag field can be added to the structure to keep
information on how the union field value was initialized. When the union value is retrieved, the
program consults this tag field and acts accordingly. This is how polymorphism used to be
implemented before virtual functions were invented.
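
The tag-field technique can be sketched as follows. All type and function names here are hypothetical; the structure simply pairs a union with an enumeration that records which field was set.

```cpp
enum ValueKind { INT_VALUE, DOUBLE_VALUE };

struct Value {
    ValueKind kind;     // tag: records which union field holds valid data
    union {
        int i;
        double d;
    } data;
};

Value make_int(int i) {
    Value v;
    v.kind = INT_VALUE;
    v.data.i = i;
    return v;
}

Value make_double(double d) {
    Value v;
    v.kind = DOUBLE_VALUE;
    v.data.d = d;
    return v;
}

// Consults the tag before reading, so only the field that was set is used.
double as_double(const Value& v) {
    switch (v.kind) {
    case INT_VALUE:    return v.data.i;
    case DOUBLE_VALUE: return v.data.d;
    }
    return 0.0;  // not reached for valid tags
}
```

The discipline is entirely manual: nothing stops other code from reading the wrong field, which is why the tag has to be set and checked consistently.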
Enumeration types define variables that accept values from a predefined set of symbolic identifiers.
The syntax of the enumeration type definition is similar to the syntax of the structure definition: A
set of comma-separated symbolic names is specified within the scope of the braces (followed by
the semicolon). This is a popular way to define symbolic constants for the program.
C++ defines no operations over enumeration values. Under the hood, they are implemented as
integers (starting, naturally, with zero), and the program might try to use this knowledge, but this is
not a good idea.
C++ Functions as Modularization Tools
C++ supports hiding operation complexity in functions. The client code uses server functions as
single units of code. This streamlines the caller code toward its goal: The client code is expressed in
terms of function calls to the server functions rather than in terms of lower-level details of
operations over data.
In the latter case, when the client code implements data processing without calling server functions,
the maintenance programmer should figure out what the meaning of the sequence of statements is.
In the case of using server functions, the goal of each operation is expressed by the name of the
function (provided that the function name is sufficiently descriptive).
C++ functions cooperate with each other working toward a common program goal. They cooperate
by working on common data. The values of data can be set in one function and used in another
function. The exchange of data can be implemented using global variables, parameters, and return
values. Coupling through global variables is implicit: It is not immediately evident to the
maintainer and hence should be used as little as possible.
Coupling through parameters is better because it is explicit: It is immediately evident to the
maintainer (and to the client programmer) what values participate in the data flow between the
function and its client functions.
Coupling (the number of parameters) should be minimized by dividing responsibilities between
functions so that what belongs together is not torn apart between different functions. It is the
tearing apart of what should belong together that causes the need for communication between
functions.
When calling a function, the client code has to supply an actual argument for every formal
parameter in the function definition. It is possible to define default values of the arguments so that
they will be used when the client code does not specify the values of actual arguments.
In parameter passing, C++ is designed as a strongly typed language: the number of arguments
should match the number of formal parameters; the type of each argument has to match the type of
the corresponding formal parameter. The deviations from this rule are flagged as syntax errors by
the compiler.
The exception to this rule is made for numeric types only. If there is a type mismatch between a
formal parameter and its actual argument, promotions and conversions can be applied: small
arguments (enum, char, unsigned char, short) are promoted to integers, unsigned short
arguments are promoted either to int or to unsigned int (depending on the machine architecture), and
float arguments are promoted to type double. If after promotion the actual argument type still
does not match the formal parameter type, conversions are applied: Any numeric type can be
converted to any other numeric type, even if it results in loss of accuracy (e.g., from double to
Promotions and conversions do not apply to programmer-defined types, pointers, and references
(even when they are pointers or references to numeric types). They are applied to numeric types
only.
C++ is a language for separate compilations. To assist the compiler, the function interface must be
known to the compiler before a function call is processed. Unless the function definition precedes
the function call in the source file (not a common occurrence), a function prototype should be used,
with the types of parameters and return value specified. Parameter names in a function prototype
are useful but optional.
A C++ function may be defined only once. It can be declared (as a prototype) as many times as
needed. If the function is used in several files, it has to be declared in each one. Function prototypes
are often put in #include header files.
A C++ global function is defined by its name and by the sequence of types of its parameters. When
the function is defined as a class member, the class name is also a part of the function definition.
This combination (the function signature), consisting of the class name (if any), the function name, and the list
of parameter types, has to be unique. This means that the function name can be overloaded: A
function with the same name but with a different set of parameters will be considered a different
function. The return type of the function is not a part of the function signature.
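
Overloading on parameter types can be sketched with three functions that share one name. The name kind_of is hypothetical; each version simply reports which one was selected.

```cpp
#include <string>

// Three distinct functions: same name, different parameter lists.
std::string kind_of(int) {
    return "int";
}

std::string kind_of(double) {
    return "double";
}

std::string kind_of(const char*) {
    return "text";
}
```

A call such as kind_of('a') selects the int version, because a char argument is promoted to int before any other conversion is considered.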
A C++ function can be defined as an inline function. Instead of the function call, the compiler
generates object code for such a function and inserts it into the client code. When this function is
called, time is not spent on the overhead of the call itself. For applications that are concerned with the speed of
execution, this is important.
C++ has functions only: There are no procedures. If the application needs a procedure, a void
function can be used.
If a function returns a value, C++ allows the caller to ignore a return value in a call and use a
function as a statement. Many C++ library functions have return values that are rarely used. It is a
good idea, however, not to ignore return values.
C++ Functions: Parameter Passing
C++ passes parameters by value. At the time of the call, the space for parameters and local
variables is allocated from the stack, and the argument values (variables, expressions, or literals)
are copied into the space allocated for parameters. These values are used in the function during its
execution. When the function terminates, the stack space is returned.
In parameter passing, the values move in one direction only, from actual arguments to formal
parameters; changed parameter values are not passed back, and the actual arguments in the client
scope do not change.
For side effects in the client space, C++ supports pass by pointer: instead of the value of a given
type, a pointer to the value of a given type is passed as an actual argument. C++ pointers are
variables in all respects. They are passed by value: The value of the pointer is copied into the
formal parameter. When the pointer is used during function execution, it contains the address of the
variable in the client space. The value of this variable can be changed through the pointer if
necessary.
When the function execution reaches the closing brace, the pointer is destroyed along with other
formal parameters (if any). Hence, the function cannot change the value of the pointer. But this is
not a problem, because there is no need to change the address. The goal of parameter passing by
pointer is to change the variable in the client space whose address is passed as the actual argument.
Passing parameters by pointer is complex: The programmer has to coordinate the code in three
places: (1) the pointer notation (*) is used in the function header and in the prototype, (2) the
dereferencing operator (*) is used in the function body, and (3) the address-of operator (&) is used
in the function call.
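
The three places that have to be coordinated can be seen in a small sketch. The function names are hypothetical; the second function is included only to contrast with pass by value.

```cpp
#include <cassert>

// (1) Pointer notation in the function header.
void swap_values(int* a, int* b) {
    assert(a != nullptr && b != nullptr);
    int temp = *a;   // (2) dereferencing operator in the function body
    *a = *b;
    *b = temp;
}

// Pass by value for comparison: n is a copy, so the change stays local.
int doubled(int n) {
    n *= 2;
    return n;
}
```

In the client code, the call is written as swap_values(&x, &y), which is place (3): the address-of operator in the function call. After the call, the values of x and y in the client scope are exchanged, whereas a call to doubled(x) leaves x unchanged.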
To simplify parameter passing, C++ adds yet another mode of parameter passing that supports side
effects in the caller space. In pass by reference, the coordination between different places in code is
simpler: (1) the name of the variable without operators is used in the function header, (2) the