Chapter 19. What We Have Learned

facilitate the breaking up of programs into relatively independent pieces that can be developed by

different programmers, who communicate with one another as little as possible.

C++ is a tool for achieving several conflicting goals. It was designed:

• as a high-level language for readability (data aggregation, flow of control, scope of names)

• as a language for sharp and quick minds (unique shorthand operators, concise expressions)

• to use character strings and dynamic memory management

• to use libraries that are provided (de facto standard)



C++ as a Traditional Programming Language

Unlike many high-level languages, C++ is case sensitive. Similar to many modern high-level

languages, C++ is space blind (with two or three exceptions). It supports end-of-line comments and

block comments, but block comments cannot be nested.

Similar to most other programming languages, C++ provides basic built-in data types with

operations over the values of these types. The C++ built-in data types are rather limited: just

simple integers and floating point values.



C++ Built-in Data Types

To achieve maximum performance, the C++ integer type is meant to be the natural (usually the fastest)

type for the platform. Typically, its size is 16 bits on 16-bit machines and 32 bits on 32-bit and most

64-bit machines. This results in a portability problem that is typical for C++: There is no guarantee

that a program running on one machine will produce exactly the same results on another machine.

To aid flexibility (i.e., to save memory where possible) and to add computational power (i.e., to

expand ranges where necessary) for complex computations, C++ provides size modifiers (short,

long, unsigned) for finer use of memory. C++ does not fix the exact sizes of different types. It

requires only that a short value is not longer than an integer value and that a long value is not

shorter than an integer value; it also guarantees at least 16 bits for short and at least 32 bits for long.

As a result, short values are 16 bits on most current machines, while long values are 32 bits on some

platforms and 64 bits on others. Programmers who strive for portability avoid using plain integers and instead use either short

or long modifiers. Programmers who strive for speed use plain integers and avoid using short and

long modifiers.
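
The size rules above can be checked on any given machine. The following is a minimal sketch; the helper names bits_in, sizes_are_ordered, and minimum_widths_hold are invented for this illustration and are not part of any library.

```cpp
#include <climits>
#include <cstddef>

// Number of bits in an object of type T on the current platform.
template <typename T>
constexpr std::size_t bits_in() {
    return sizeof(T) * CHAR_BIT;
}

// The ordering that the language guarantees: short <= int <= long.
constexpr bool sizes_are_ordered() {
    return sizeof(short) <= sizeof(int) && sizeof(int) <= sizeof(long);
}

// The minimum widths that the language guarantees.
constexpr bool minimum_widths_hold() {
    return bits_in<short>() >= 16 && bits_in<int>() >= 16 && bits_in<long>() >= 32;
}
```

Both checks hold on every conforming platform, whereas the value of bits_in<long>() itself differs between platforms, which is exactly the portability concern.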

Use of unsigned values supports even finer memory use and is even more controversial. On the one

hand, defining a value as unsigned indicates to the maintainer that the value is inherently

nonnegative. Also, the use of the unsigned qualifier roughly doubles the maximum integer value

on the given architecture (for the same number of bits). On the other hand, the mixture of signed

and unsigned values might result in incorrect results in computations. To avoid these errors, many

programmers give up the potential benefits of unsigned values by not using them.
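
A small sketch of the kind of error that mixing signed and unsigned values can cause. The function names are hypothetical. The explicit cast in naive_less spells out the conversion that C++ applies silently when an int is compared with an unsigned value.

```cpp
// Naive comparison: the signed value is converted to unsigned first,
// so -1 turns into a very large number.
bool naive_less(int s, unsigned u) {
    return static_cast<unsigned>(s) < u;   // what "s < u" does implicitly
}

// Careful comparison: the negative case is handled before converting.
bool careful_less(int s, unsigned u) {
    return s < 0 || static_cast<unsigned>(s) < u;
}
```

With these definitions, naive_less(-1, 1u) is false even though -1 is mathematically less than 1, while careful_less(-1, 1u) is true.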

To simplify the choices for the programmer, C++ supports defaults. If the programmer does not

specify whether the value is signed or unsigned, the default is signed; if the programmer does not

specify whether the value is a short integer, a long integer, or just an integer, the default is just an

integer.

Striving for maximum performance, C++ tests computational results neither for underflow nor for

overflow. Everything that should be tested in the program should be tested explicitly in the source

code of the program on the program's own time. If the program does not want to spend time

checking the legitimacy of the results, C++ does not provide any default tests or warnings.
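
Because the language itself performs no such test, a program that cares about overflow has to check before it computes. One possible sketch follows; checked_add is a hypothetical helper, not a library function.

```cpp
#include <climits>

// Stores a + b in result and returns true only if the sum fits in an int.
bool checked_add(int a, int b, int& result) {
    if (b > 0 && a > INT_MAX - b) return false;   // the sum would be too large
    if (b < 0 && a < INT_MIN - b) return false;   // the sum would be too small
    result = a + b;
    return true;
}
```

The test is made on the operands rather than on the sum, because the sum of two int values that overflows has no defined result to inspect afterward.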

C++ treats characters as just another kind of integer. Their size varies from one byte per character (type char)

to two or four bytes per character (wide characters of type wchar_t). Arithmetic operations over character values are

legal in C++. They are popular, but they could create portability problems when different machines

use different character sets.

The language allows the programmer to specify both signed and unsigned characters. There is no

standard for the default type: on some machines it is unsigned, on others it is signed. It is a good idea

to assume that a character cannot contain a negative value and to use an integer instead of a

character if a negative value (e.g., end-of-file code) is possible.

Character literals are enclosed in single quotes. They should not be confused with string literals that

are enclosed in double quotes. C++ does not store the string length with the string contents. It uses

the 0 code to mark the end of the string. This is why the size of a string literal in memory is one more than

the number of characters in the literal.
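
The extra element can be observed directly. In this short sketch the constant names are chosen only for illustration.

```cpp
#include <cstddef>
#include <cstring>

// sizeof counts the hidden terminating zero; strlen does not.
const std::size_t literal_size   = sizeof("abc");        // 4 bytes of storage
const std::size_t literal_length = std::strlen("abc");   // 3 visible characters

// A character literal is a single value, not an array.
const char letter = 'a';
```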

For floating point types, C++ supports three different sizes: float, double, and long double. Their

sizes typically range from 4 to 8 to 10 bytes, and their precision from 7 to 15 to 19 digits. These

characteristics are machine dependent. C++ floating-point constants are always double, not float or

long double. In most cases, this is not important. When it is necessary to specify that the literal is,

for example, float, the appropriate suffix should be used. C++ supports both the fixed decimal point

notation and scientific notation (with the exponent).

Boolean types have two values, true and false. They are also treated as small integers. The size of a

Boolean value of type bool is one byte rather than one bit. C++ does not pack Boolean values one

per bit because addressing individual bits in C++ requires logical operations and shifts. In this

tradeoff between space efficiency and time efficiency, C++ favors time efficiency, since the byte is

the smallest unit of memory that can be addressed directly.

Symbolic names for literal values of any built-in type can be specified using the preprocessor

#define directive. The preprocessor will replace each occurrence of the symbolic name in the

source code with the literal value. Since this is done before the compiler sees the source code, the

errors in the preprocessor directives are often hard to find. Using the const modifier is better

because the names defined with the const modifier follow the scope rules (the names defined in

the #define directive ignore scope and stay visible until the end of the source file).

For each data type, C++ supports two derived data types, a pointer type and a reference type. Both

these types contain an address of the value, but the syntax of their use is different.

C++ allows any conversions between numeric values of different types: the value of one type can

be used where the value of another type is expected. Boolean values and numeric values are also

interchangeable: no syntax error is generated. For numeric values, C++ is a weakly typed

language.

The values of pointers (or references) to different types cannot be converted to each other (or to the

value of the type). For addresses, C++ is a strongly typed language: a syntax error is generated

even when the pointers of different types contain the same address.

An explicit cast can be used for conversions between pointers (and references), but the integrity of

results remains the responsibility of the programmer: no syntax error is generated by the compiler

if the results do not have reasonable meaning or are not portable between different computer

architectures.



C++ Expressions

C++ contains a conventional set of operations over numeric values, such as sign operators,

arithmetic operations, relational operators, equality operators, and logical operators. It has no

exponentiation operator. Similar to most other programming languages, it has no implied

multiplication: the asterisk should be used as an explicit operator.

C++ lets any expression serve as a statement. To achieve this uniformity, C++ treats the assignment and the

comma as operators (although their priority is the lowest). As a result, erroneous constructs can be

accepted by C++ compilers as valid code.

Since the number of operators is large, C++ uses two-symbol operators and even one three-operand

operator (the conditional operator). In C++, the meaning of an operator (and of a keyword) is often

reused for different purposes and thus depends on the context.

Since the sizes of built-in data types are machine dependent, C++ allows the programmer to

evaluate the size of a given variable (given the name of the variable) or the size of any variable of a

given type (given the type name).

Logical, relational, and equality operators return Boolean values true and false, but these values can

be freely converted to numerical values 1 (true) and zero (false). Moreover, any numerical value

can be used where a Boolean value is expected: no syntax error is generated. The zero value is

converted to false, and any other numeric value is interpreted as true. This leniency sometimes

forces C++ compilers to accept code that is semantically incorrect.

Another source of error is the equality operator, which is written as two consecutive equal signs:

omitting one equal sign does not generate a syntax error but quietly changes the meaning of the

source code. This is a common source of error that causes waste of time, frustration, and anxiety.
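
A sketch of how this error looks in practice, using two hypothetical functions. The first one compiles and runs, but it never reports zero.

```cpp
// Intended to report whether x is zero, but uses = instead of ==.
// The condition assigns 0 to x and then tests that 0, so the result
// is always false. (The doubled parentheses only silence the warning
// that many compilers give for this construct.)
bool buggy_is_zero(int x) {
    if ((x = 0)) return true;
    return false;
}

// The intended version: a comparison, not an assignment.
bool correct_is_zero(int x) {
    return x == 0;
}
```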

Logical operators && and || are of different priority: the operator && binds tighter than ||. This

allows avoiding extra parentheses. Both logical operators are short-circuit operators: In the

compound logical expression, the first operand is evaluated first, and the second one is not

evaluated if the result of the operation is known from the evaluation of the first operand.
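
Short-circuit evaluation is what makes the common "test the pointer, then use it" condition safe. The sketch below uses hypothetical names; the counter calls exists only to show when the right operand is evaluated.

```cpp
// Safe even for a null pointer: when p is null, the left operand of &&
// is false and the right operand (*p > 0) is never evaluated.
bool points_to_positive(const int* p) {
    return p != nullptr && *p > 0;
}

int calls = 0;   // counts evaluations of the right operand below

bool right_operand() {
    ++calls;
    return true;
}

// When left is true, the result of || is already known,
// and right_operand() is not called.
bool either(bool left) {
    return left || right_operand();
}
```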

C++ has a number of unique operators that provide access to the underlying representation of

information in computer memory. Among them are the bitwise logical operators: and,

inclusive or, exclusive or, and negation (complement). They operate on each bit of the operand(s)

individually, creating the result bit by bit.

Bitwise shifts shift the given bit pattern to the left and to the right. When the pattern is shifted to the

left, or a positive value is shifted to the right, zeroes are shifted in; these operations are portable.

When a negative value is shifted to the right, the result depends on the implementation: Either

zeroes are shifted in (logical shift), or ones are shifted in (arithmetic shift). This operation is not

portable.
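
A typical use of these operators is packing several yes/no flags into one unsigned value. The flag names and helper functions below are invented for this sketch. Unsigned operands are used so that the shifts stay portable.

```cpp
// Hypothetical flag values; each one occupies its own bit.
const unsigned FLAG_READ  = 1u << 0;   // binary 001
const unsigned FLAG_WRITE = 1u << 1;   // binary 010
const unsigned FLAG_EXEC  = 1u << 2;   // binary 100

unsigned set_flag(unsigned flags, unsigned f)    { return flags | f;  }   // inclusive or
unsigned clear_flag(unsigned flags, unsigned f)  { return flags & ~f; }   // and with the complement
unsigned toggle_flag(unsigned flags, unsigned f) { return flags ^ f;  }   // exclusive or
bool     has_flag(unsigned flags, unsigned f)    { return (flags & f) != 0; }
```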

Another set of unique operators includes the increment and decrement operators. They emulate

assembly language type processing by providing a side effect (increment or decrement by 1) on the

single lvalue operand. These operators can be prefix or postfix. The prefix operator is applied first,

then the value is used in other expressions; the postfix operator is applied after the value is used in

other expressions.
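
The difference between the two forms shows up only in the value that the expression yields. The wrapper functions below are hypothetical and exist just to expose that value.

```cpp
// Yields the old value of i; i itself is incremented afterward.
int use_then_increment(int& i) {
    return i++;
}

// Increments i first and yields the new value.
int increment_then_use(int& i) {
    return ++i;
}
```

Starting from i equal to 5, the first call yields 5 and leaves i at 6; a following call to the second function yields 7 and leaves i at 7.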

C++ does specify the order of evaluation of operators in an expression. However, it does not

specify the order of evaluation of operands. Hence, a C++ program is not allowed to rely on a

specific order of evaluation of operands in an expression. In particular, the operands with side

effects (increment and decrement operators) are a common source of portability problems. It is a

good idea to only use increment and decrement operators in stand-alone expressions to avoid

portability issues.

Another unique operator is the conditional operator: Depending on the value of its first operand, it

evaluates either its second operand (the first operand is true) or its third operand (the first

operand is false).

Yet another set of unique C++ operators includes arithmetic assignments and the comma operator.

These operators help to write succinct and expressive C++ code.

C++ binary operators are always applied to operands of exactly the same type. When the source

code specifies operands of different types, C++ applies widening conversions: The narrower operand is

converted to the type of the wider operand of that operator. In the assignment operator, the value on the

right-hand side is converted to the type of the left-hand side, even if this might cause a loss of

precision.
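
A brief sketch of both rules, with function names chosen for illustration only.

```cpp
// Both operands are int, so the division is an integer division.
int int_quotient() {
    return 7 / 2;
}

// One operand is double, so the int operand 7 is widened to double first.
double mixed_quotient() {
    return 7 / 2.0;
}

// Assignment converts the right-hand side to the type of the left-hand side;
// the fractional part is silently dropped.
int to_int(double d) {
    int n = d;
    return n;
}
```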



C++ Control Flow

As in other languages, C++ statements are executed sequentially. Each statement is terminated by a

semicolon. Blocks (compound statements) are allowed; they are delimited by braces and can have

local variables. No semicolon is used after the closing brace of the block.

Compound statements can be nested; they can serve as a function body or as a control statement

body. Local variables defined in a nested block are not visible outside the block.

C++ has a standard set of control constructs. The if-else statement does not use the then

keyword; it executes the true branch when the statement's expression has a nonzero value of any

type; it executes the false branch when the statement's expression has a zero (false) value.

Loops implement repeated actions. C++ supports three forms of iterative statements: the while loop (it

allows for zero repetitions), the do-while loop (it enforces at least one repetition), and the for loop

(mostly for a fixed number of repetitions).

A popular C++ programming idiom is to combine a test for continued iteration with the

assignment. Using this idiom, one has to be careful with parentheses: Omitting parentheses might

change the meaning of the expression because C++ comparison has a higher precedence than C++

assignment has.
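
The idiom can be sketched as follows. The function first_line_length is a hypothetical example that reads characters from a string stream until the end of the line or the end of the data.

```cpp
#include <cstddef>
#include <cstdio>
#include <sstream>
#include <string>

// Counts the characters on the first line of text.
std::size_t first_line_length(const std::string& text) {
    std::istringstream in(text);
    std::size_t count = 0;
    int ch;   // int, not char, so that the end-of-file code can be represented
    // The inner parentheses force the assignment to happen before the comparison.
    while ((ch = in.get()) != EOF && ch != '\n') {
        ++count;
    }
    return count;
}
```

Without the inner parentheses, the comparison with EOF would be evaluated first, and ch would receive the result of the comparison (1 or 0) instead of the character.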

C++ does not support unrestricted jumps. The goto statement cannot leave its function and cannot

jump forward over initialized definitions of variables. The break statement exits from a loop so that control flow

jumps to the statement after the loop. The break statement can be used with all three loop

constructs and is usually executed in a conditional statement. The continue statement skips the

rest of the loop body and returns to the loop top for the test of further iterations.

The C++ switch statement supports multiway decisions in the program: It provides alternative

execution paths based on the value of an integral expression. (Floating point cannot be used.) The

default case is executed if no match is found. Unlike in other languages, the default statement is

optional; if it is absent and the match is not found, the statement after the switch is executed. To create a

construct with separate branches, a break statement should end each branch; otherwise, execution falls through into the next branch.
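
A minimal sketch of a switch with separate branches, one deliberate fall-through, and a default case. The menu letters and the function name are assumptions made for this example.

```cpp
#include <string>

// Maps a hypothetical menu choice to a description.
std::string describe_choice(char choice) {
    std::string text;
    switch (choice) {
    case 'a':                 // no break here: 'a' falls through to the 'A' branch
    case 'A':
        text = "add";
        break;
    case 'd':
        text = "delete";
        break;
    default:                  // executed when no case matches
        text = "unknown";
        break;
    }
    return text;
}
```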



C++ as a Modular Language

Similar to most modern high-level languages, C++ supports hierarchies of building blocks for

program data and for program operations. From a software engineering point of view, the benefits

of modularization for large projects include division of labor, simpler programming tasks, reusable

and maintainable program elements, and the opportunity to study the program at different levels,

either in general (disregarding details), or in detail (disregarding high-level issues).

When used correctly, these benefits result in higher productivity both for development and for

maintenance and in fewer errors.

C++ supports programmer-defined aggregate data types: arrays, structures, unions, and

enumerations. Their components can be either of built-in data types or of other C++ aggregate

types (arrays, structures, etc.).

C++ supports programmer-defined functions. The hierarchies of functions model the hierarchies of

actions of real-life objects that the program maintains information about. C++ supports the use of

standard libraries. Standard libraries implement a variety of common tasks. Library functions are

optimized, well tested, and have broad applicability.

The need to specify header files with function prototypes makes using library functions more

difficult than necessary, but one can learn to live with it.



C++ Aggregate Types: Arrays

C++ arrays can only contain elements of the same type. The greatest limitation of C++ arrays is

that the array size must be known at compile time. If the array contains more elements than

necessary, memory is wasted. If the array contains fewer elements than necessary, memory is

corrupted.

Another common source of errors in using C++ arrays is that the index of the first element is

always 0. This cannot be changed. Hence, the index of the last element is 1 less than the array

size. C++ does not support compile-time index checks: This is often impossible, but the

compiler would not do that even if it were possible. Neither does C++ support the run-time test of

index validity: It would affect execution time.

C++ philosophy assumes that you do not want to waste time at any access to the array; when you

want to check index validity, you can write your own code to do that; when you do not check index

validity, C++ assumes that you know what you are doing. On memory-rich machines, index errors

might not result in incorrect run-time results (until the memory usage changes). This is a serious

problem with no good solution.
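
Writing your own index check might look like the sketch below. The function get_checked is hypothetical; the caller has to pass the array size, because a built-in array passed to a function does not carry it.

```cpp
#include <cstddef>

// Copies a[index] into value and returns true only when index is valid.
bool get_checked(const int a[], std::size_t size, std::size_t index, int& value) {
    if (index >= size) return false;   // the test that C++ itself never makes
    value = a[index];
    return true;
}
```

The price is one comparison on every access, which is the cost that the language declines to impose by default.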

C++ allows the programmer to implement array processing algorithms using either indices to

access array components or pointers. This is based on the fact that the increment (or decrement)

operator applied to a pointer increments the address not by 1 but by the size of the array element.

This operation points the pointer to the next element of the array. The use of pointers allows one to

write concise and expressive array processing code. However, there are no performance advantages

in using this technique. Some programmers find this kind of code somewhat difficult to verify.
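
The two styles can be compared side by side. Both hypothetical functions below compute the same sum; only the way of reaching the elements differs.

```cpp
#include <cstddef>

// Index style: each element is reached as a[i].
int sum_by_index(const int a[], std::size_t size) {
    int sum = 0;
    for (std::size_t i = 0; i < size; ++i) sum += a[i];
    return sum;
}

// Pointer style: ++p advances the address by the size of one int,
// that is, to the next element of the array.
int sum_by_pointer(const int* a, std::size_t size) {
    int sum = 0;
    for (const int* p = a; p != a + size; ++p) sum += *p;
    return sum;
}
```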

C++ supports arrays of any dimension. Under the hood, they are implemented as one-dimensional

arrays with the row-major order. (The right subscript varies the fastest.) Similar to one-dimensional

arrays, C++ multidimensional arrays support no checks of index validity.
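
Row-major order means that element [r][c] of a table with COLS columns sits at position r * COLS + c of the underlying one-dimensional storage. The sketch below demonstrates this by copying the raw bytes of a small table into a flat array; the names ROWS, COLS, and read_flat are assumptions of this example.

```cpp
#include <cstddef>
#include <cstring>

const std::size_t ROWS = 2;
const std::size_t COLS = 3;

// Copies the raw bytes of a two-dimensional array into a flat one, then reads
// element (r, c) from the flat copy using the row-major formula.
int read_flat(const int (&table)[ROWS][COLS], std::size_t r, std::size_t c) {
    int flat[ROWS * COLS];
    std::memcpy(flat, table, sizeof(flat));
    return flat[r * COLS + c];
}
```

Like the built-in subscripts, read_flat makes no check that r and c are within range.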

C++ represents text as arrays of characters. These arrays have to have an extra element to

accommodate the zero sentinel value that is used to mark the end of valid data in the array. When

the compiler processes the program text literals, it also appends the terminating zero to the symbols

of the string; hence, the literals have an extra element as well. All library functions that deal with

arrays of characters expect the terminating zero at the end of valid data. When these library

functions change the contents of the array, they append the terminating zero to the end of valid data

to keep the string in the valid state.

C++ supports neither array assignments nor array comparisons. For arrays of arbitrary types, it is

the responsibility of the programmer to make sure that these operations are performed correctly.

For text strings, library functions are used for assignment, comparisons, concatenation, and other

standard operations.

Most library functions do not work well when strings overlap in memory. When writing to a

character array, the classic library functions (e.g., strcpy and strcat) do not check for available space. If the destination does not have

enough available space, the computer memory is silently corrupted. This is a serious integrity

problem.
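
One way to guard against this is to test the available space before copying. The helper copy_if_fits below is a hypothetical sketch, not a standard library function; it relies on the caller passing the true size of the destination array.

```cpp
#include <cstddef>
#include <cstring>

// Copies src into dst only if it fits, including the terminating zero.
// dst_size is the full size of the destination array.
bool copy_if_fits(char* dst, std::size_t dst_size, const char* src) {
    std::size_t needed = std::strlen(src) + 1;   // +1 for the terminating zero
    if (needed > dst_size) return false;         // refuse instead of corrupting memory
    std::memcpy(dst, src, needed);
    return true;
}
```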



C++ Aggregate Types: Structures, Unions, Enumerations

C++ structures combine related components. What components are related and what are not is often

a matter of judgment. C++ leaves this to the discretion of the programmer and does not impose any

limitations on the types of the components.

The structure definition is a blueprint for creation of structure variables. For each structure field,

the programmer supplies the type and the name of the field. The scope of the structure definition is

delimited by the opening and the closing brace that is followed by the semicolon.

Structure variables can be initialized using the syntax similar to the syntax of array initialization (a

comma-separated list of values delimited by the braces).

The dot selector operator selects a structure object's fields (both as an lvalue and as an rvalue).

When the structure variable is referred through a pointer, the dot selector operator does not work;

instead, the arrow selector operator should be used.

C++ supports assignment for structure variables of the same type. The value semantics is

implemented: The fields of the rvalue structure variable are copied bitwise into the fields of the

lvalue structure variables.
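
The following sketch pulls these points together for a hypothetical structure type Point: the definition, the dot and arrow selectors, and a hand-written comparison, since the language supplies assignment but not equality.

```cpp
// A hypothetical structure type used only for illustration.
struct Point {
    int x;
    int y;
};   // the semicolon after the closing brace is required

// The arrow selector is used when the structure is reached through a pointer.
void shift_right(Point* p, int dx) {
    p->x += dx;
}

// The dot selector is used with a structure object (or a reference to it).
bool same_point(const Point& a, const Point& b) {
    return a.x == b.x && a.y == b.y;
}
```

After Point a = {1, 2}; Point b = {0, 0}; b = a; the two variables hold equal but independent copies, so shift_right(&b, 3) changes b only.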

Assignments between structure variables of different types are not allowed, even when they have

the same composition and even when the fields in both structure definitions have the same names.

It is the type that has to be the same. Notice that the typedef facility cannot make two different

structure types the same: It only creates a synonym for one existing type name.

Assignments between structure variables and numeric variables (or pointer or reference variables)

are not allowed: For programmer-defined types, C++ behaves as a strongly typed language, and

these assignments are marked as syntax errors by the compiler.

C++ supports no structure comparisons or any other operations over structures; you should write

your own code to implement structure operations.

Union is a type definition that syntactically is similar to the structure definition: Several fields of

different types can be listed between the scope braces (followed by a semicolon). However, these

fields do not exist in the memory of the computer simultaneously (as in structure variables); they share

the same storage, so only one of them holds a valid value at any given time.

This design allows the program to save space: A union variable can contain information of one of

the mutually exclusive types specified in the union type definition. This is, of course, error prone

because the programmer should make sure that the value retrieved from the union variable is of the

same type as the value saved in this variable earlier, and the union itself has no means to keep the

type information.

If the program makes a mistake and retrieves a value of a different type, there is no compile-time

error, and there is no run-time error; a useless bit pattern is retrieved silently. To avoid these errors,

union variables can be used as fields of structures; a tag field can be added to the structure to keep

information on how the union field value was initialized. When the union value is retrieved, the

program consults this tag field and acts accordingly. This is how polymorphism used to be

implemented before virtual functions were invented.
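
A minimal sketch of the tag-field technique. The type Value and the functions around it are invented for this illustration.

```cpp
// A hypothetical tagged union: the tag records which union field is valid.
struct Value {
    enum Kind { INTEGER, REAL } kind;   // the tag field
    union {
        int    i;
        double d;
    } data;
};

Value make_integer(int i) {
    Value v;
    v.kind = Value::INTEGER;
    v.data.i = i;
    return v;
}

Value make_real(double d) {
    Value v;
    v.kind = Value::REAL;
    v.data.d = d;
    return v;
}

// Consults the tag before reading, so the wrong field is never retrieved.
double as_double(const Value& v) {
    switch (v.kind) {
    case Value::INTEGER: return v.data.i;
    case Value::REAL:    return v.data.d;
    }
    return 0.0;   // not reached for a correctly tagged value
}
```

The discipline is entirely voluntary: nothing prevents other code from reading data.d when the tag says INTEGER.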

Enumeration types define variables that accept values from a predefined set of symbolic identifiers.

The syntax of the enumeration type definition is similar to the syntax of the structure definition: A

set of comma-separated symbolic names is specified within the scope of the braces (followed by

the semicolon). This is a popular way to define symbolic constants for the program.

C++ defines no operations over enumeration values. Under the hood, they are implemented as

integers (starting, by default, with zero), and the program might try to use this knowledge, but this is

not a good idea.



C++ Functions as Modularization Tools

C++ supports hiding operation complexity in functions. The client code uses server functions as

single units of code. This streamlines the caller code toward its goal: The client code is expressed in

terms of function calls to the server functions rather than in terms of lower-level details of

operations over data.

In the latter case, when the client code implements data processing without calling server functions,

the maintenance programmer should figure out what the meaning of the sequence of statements is.

In the case of using server functions, the goal of each operation is expressed by the name of the

function (provided that the function name is sufficiently descriptive).

C++ functions cooperate with each other working toward a common program goal. They cooperate

by working on common data. The values of data can be set in one function and used in another

function. The exchange of data can be implemented using global variables, parameters, and return

values. Coupling through global variables is implicit: It is not immediately evident to the

maintainer and hence should be used as little as possible.

Coupling through parameters is better because it is explicit: It is immediately evident to the

maintainer (and to the client programmer) what values participate in the data flow between the

function and its client functions.

Coupling (the number of parameters) should be minimized by dividing responsibilities between

functions so that what belongs together is not torn apart between different functions. It is the

tearing apart of what should belong together that causes the need for communication between

functions.



When calling a function, the client code has to supply an actual argument for every formal

parameter in the function definition. It is possible to define default values of the arguments so that

they will be used when the client code does not specify the values of actual arguments.
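
A short sketch of a default argument value; the function power is a hypothetical example.

```cpp
// The second parameter has a default value of 2 (squaring).
int power(int base, int exponent = 2) {
    int result = 1;
    for (int i = 0; i < exponent; ++i) result *= base;
    return result;
}
```

The call power(3) uses the default and yields 9; the call power(2, 10) supplies both arguments and yields 1024.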

In parameter passing, C++ is designed as a strongly typed language: the number of arguments

should match the number of formal parameters; the type of each argument has to match the type of

the corresponding formal parameter. The deviations from this rule are flagged as syntax errors by

the compiler.

The exception to this rule is made for numeric types only. If there is a type mismatch between a

formal parameter and its actual argument, promotions and conversions can be applied: small

arguments (enum, char, unsigned char, short) are promoted to integers, unsigned short

values are promoted either to int or to unsigned int (depending on the machine architecture), and

float arguments are promoted to type double. If after promotion the actual argument type still

does not match the formal parameter type, conversions are applied: Any numeric type can be

converted to any other numeric type, even if it results in loss of accuracy (e.g., from double to

integer).

Promotions and conversions do not apply to programmer-defined types, pointers, and references

(even when they are pointers or references to numeric types). They are applied to numeric types

only.

C++ is a language for separate compilations. To assist the compiler, the function interface must be

known to the compiler before a function call is processed. Unless the function definition precedes

the function call in the source file (not a common occurrence), a function prototype should be used,

with the types of parameters and return value specified. Parameter names in a function prototype

are useful but optional.

A C++ function may be defined only once. It can be declared (as a prototype) as many times as

needed. If the function is used in several files, it has to be declared in each one. Function prototypes

are often put in header files that are brought in with the #include directive.

A C++ global function is defined by its name and by the sequence of types of its parameters. When

the function is defined as a class member, the class name is also a part of the function definition.

This combination (the function signature), that is, the class name (if any), the function name, and the list

of parameter types, has to be unique. This means that the function name can be overloaded: A

function with the same name but with a different set of parameters will be considered a different

function. The return type of the function is not a part of the function signature.
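
Overloading can be sketched with three hypothetical functions that share the name kind_of and differ only in their parameter lists.

```cpp
#include <string>

// Three different functions with one name; the parameter lists differ.
std::string kind_of(int)      { return "int"; }
std::string kind_of(double)   { return "double"; }
std::string kind_of(int, int) { return "two ints"; }

// A declaration that differs only in the return type would be rejected:
// int kind_of(int);   // error: conflicts with std::string kind_of(int)
```

A call such as kind_of('a') selects the int version, because a char argument is promoted to int before any conversion to double is considered.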

A C++ function can be defined as an inline function. Instead of the function call, the compiler

can generate object code for such a function and insert it into the client code (the compiler may ignore this request). When this function is

called, time is not spent on the function call overhead (saving and restoring the context). For applications that are concerned with the speed of

execution, this is important.

C++ has functions only: There are no procedures. If the application needs a procedure, a void

function can be used.

If a function returns a value, C++ allows the caller to ignore a return value in a call and use a

function as a statement. Many C++ library functions have return values that are rarely used. It is a

good idea, however, not to ignore return values.



C++ Functions: Parameter Passing

C++ passes parameters by value. At the time of the call, the space for parameters and local

variables is allocated from the stack, and the argument values (variables, expressions, or literals)

are copied into the space allocated for parameters. These values are used in the function during its

execution. When the function terminates, the stack space is returned.

In parameter passing, the values move in one direction only, from actual arguments to formal

parameters; changed parameter values are not passed back, and the actual arguments in the client

scope do not change.

For side effects in the client space, C++ supports pass by pointer: instead of the value of a given

type, a pointer to the value of a given type is passed as an actual argument. C++ pointers are

variables in all respects. They are passed by value: The value of the pointer is copied into the

formal parameter. When the pointer is used during function execution, it contains the address of the

variable in the client space. The value of this variable can be changed through the pointer if

necessary.

When the function execution reaches the closing brace, the pointer is destroyed along with other

formal parameters (if any). Hence, the function cannot change the value of the pointer in the client space. But this is

not a problem, because there is no need to change the address. The goal of parameter passing by

pointer is to change the variable in the client space whose address is passed as the actual argument.

Passing parameters by pointer is complex: The programmer has to coordinate the code in three

places: (1) the pointer notation (*) is used in the function header and in the prototype, (2) the

dereferencing operator (*) is used in the function body, and (3) the address-of operator (&) is used

in the function call.
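
The contrast between the two passing modes, and the three places that pass by pointer has to coordinate, can be sketched with two hypothetical functions.

```cpp
// Pass by value: n is a copy, so the caller's variable does not change.
int increment_by_value(int n) {
    ++n;
    return n;
}

// Pass by pointer: (1) the pointer notation appears in the header,
// (2) the dereferencing operator appears in the body.
void increment_by_pointer(int* n) {
    ++*n;
}
```

For a client variable x equal to 5, the call increment_by_value(x) returns 6 and leaves x at 5, whereas the call increment_by_pointer(&x), with (3) the address-of operator at the call site, changes x to 6.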

To simplify parameter passing, C++ adds yet another mode of parameter passing that supports side

effects in the caller space. In pass by reference, the coordination between different places in code is

simpler: (1) the name of the variable without operators is used in the function header, (2) the
