Chapter 5. Aggregation with Programmer-Defined Data Types



Quickly: A Brief Overview of C++," is sufficient for understanding the basic concepts about

functions, but not for understanding classes and different ways to build them.

I am going to cover a lot of ground in this chapter, and the material I discuss is going to be diverse

and complex. You may want to make your road to learning classes easier and skip those parts of

this chapter that are not direct prerequisites for understanding classes. If this is the case, concentrate

on arrays (one-dimensional only) and structures (but not on hierarchical structures). Unions and bit

fields are programming techniques that have little to do with classes. I do not mean to say that they

are not important. You can come back to this chapter when you feel you want to expand the breadth

of your programming skills.

I am not so sure about enumerations. Formally, you do not need enumerations to understand

classes, but C++ programmers often use enumerations to define sizes of class components. When

you study Chapter 9, "C++ Class as a Unit of Modularization," you will see some enumerations

used. These are quite intuitive, but if you feel you need more discussion of enumeration type, come

back to this chapter and look it up.

Arrays as Homogeneous Aggregates

An array is a set of elements of the same data type. One can visualize an array as a set of

contiguous memory locations. These locations are all of the same size and represent components of

the same type. We can define arrays of integers, or floating point values, or characters, or any

programmer-defined type as long as this type is known at the place in the source code where the

array is being defined.

Arrays as Vectors of Values

The ordinary variables we studied in Chapter 3, "Working with C++ Data and Expressions," are

called scalars or atomic variables (simple variables).

They are characterized by a single value. Sometimes you might want to distinguish between

different components of the value, for example, between the whole part and the fractional part of a

floating point number. However, the language does not support this distinction. It treats these

variables as having no components. This is why these variables are called scalar or atomic

variables. To extract the fractional part of a floating point number, you have to invent some C++

code to do that. This is not too difficult (library functions are available), but there is no language-defined, built-in way to do that.

fraction = x - floor(x);             // get fractional part of x



Here, x and fraction are double floating point numbers, and floor() is a function defined in the

math.h (or cmath) library header file that returns the largest integer (converted to double) that does

not exceed its argument. The language itself, however, treats the values of built-in types as atomic.
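
The computation above can be packaged as a small function. The sketch below is only an illustration: the names fractionByFloor() and fractionByModf() are hypothetical, and the second function uses the standard library function modf() as one possible alternative to floor().

```cpp
#include <cmath>

// Fractional part computed as in the text: x minus the largest
// integer (as a double) that does not exceed x.
double fractionByFloor(double x) {
    return x - std::floor(x);
}

// Alternative: std::modf() splits x into a whole part and a
// fractional part; both parts carry the sign of x.
double fractionByModf(double x) {
    double whole = 0.0;
    return std::modf(x, &whole);
}
```

The two functions agree for non-negative values. For negative values they differ: floor() rounds down, so fractionByFloor(-2.75) yields 0.25, while fractionByModf(-2.75) yields -0.75.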

Arrays are vectors: their state is characterized by a set of values, not by a single value. Each

component value is available immediately by using native C++ notation (the subscript operator).

Arrays are useful when each array element undergoes the same processing by the program. This is

why arrays have to be homogeneous: All elements of the array have to be of the same type. Then

the program can go over the elements of the array, performing the same operations over each

element. This is why array components are usually processed in a loop. The fact that the array

components are of the same type is important. This prevents any problem that might arise if an

operation can be applied to one array component but cannot be applied to another.

Arrays are ordered collections of data. This means that each element of the array has the previous

element and the next element. There are two obvious exceptions: the first array element does not

have the previous element, and the last element of the array does not have the next element. The

array has a name, but individual array elements do not have individual names. The program

accesses them using the name of the array appended with the subscript (or index), the position of

this element in the ordered collection.

Arrays are finite. The number of elements in the array has to be known at compile time and cannot

be changed during program execution. The programmer has to decide how many elements will be

stored in the array, make a commitment at the time of writing the program, and stick to this

commitment for good and for bad.

This is a serious limitation. If the programmer allocates too much space for program arrays, this

space will be wasted, and the program might not have enough memory for other purposes. If the

programmer does not allocate enough space, the program corrupts memory during execution time,

and the application can crash or produce incorrect results. If the programmer wants to change the

size of the array, this can be done only through editing the program source code, recompiling, and

relinking it. This is simple for a small program, but very difficult for a complex production program

or for a program distributed to thousands of customers.

Sometimes, the size of the array is known exactly. For example, an array with the hours worked for

different days of the week should have seven components (unless you expect a day or two to be

added to the week in the near future). The same is true for the array whose components contain the

number of days in a month (unless the number of months in the year changes). The same is true for

an array whose components represent the chessboard. In most cases, however, we try to find a

"reasonable" compromise, allocating more elements than we think we are going to need but not too

many (twice the amount). The compromise is "reasonable" if this decision is supported by the code



that checks for array overflow and takes a "reasonable" action when it occurs. For some people,

"reasonable" might mean program termination. For others it means termination of input with

notification of the user.
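
As a rough sketch of such an overflow check, the function below refuses to store a value when the array is full and leaves the choice of a "reasonable" action to the caller. The capacity MAX_SCORES and the function name addScore() are assumptions made for this illustration only.

```cpp
const int MAX_SCORES = 5;      // hypothetical capacity chosen for illustration

// Stores value in the next free element of scores[] and advances count.
// Returns false, without touching memory, when the array is already full.
bool addScore(int scores[], int& count, int value) {
    if (count >= MAX_SCORES)
        return false;          // overflow detected: the caller decides what to do
    scores[count++] = value;
    return true;
}
```

A caller that reads input in a loop could stop reading and notify the user as soon as addScore() returns false.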

Sometimes, the position of an element in the array has an application-related meaning. For

example, the name of the doctor running the hospital ward on a given day is related to the day of

the week. When data come in, they do not necessarily come for the first element first and for the

second element next. Some array elements might not have valid data at all. When arrays are used in

this way, we have to design a way for the program to distinguish the elements that

have valid data from the elements that do not. These arrays are called sparse arrays.
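
One possible way to mark unused elements of a sparse array is to fill them with a value that valid data can never take. The sketch below assumes that hours worked are never negative, so -1 can serve as the marker; the names NO_DATA, clearHours(), and totalHours() are hypothetical.

```cpp
const int DAYS = 7;
const int NO_DATA = -1;     // marker; assumes valid hour counts are never negative

// Marks every element as holding no valid data yet.
void clearHours(int hours[]) {
    for (int i = 0; i < DAYS; i++)
        hours[i] = NO_DATA;
}

// Adds up only those elements that have received valid data.
int totalHours(const int hours[]) {
    int total = 0;
    for (int i = 0; i < DAYS; i++)
        if (hours[i] != NO_DATA)
            total += hours[i];
    return total;
}
```

After clearHours(), data can arrive for any day in any order, for example hours[2] = 8, and totalHours() skips the days that have no data.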

Most arrays are used as contiguous arrays. The first data item to be stored in the array is stored in

the first element. The next data item is stored in the next element. Here, we also need to distinguish

between elements that contain valid data and the elements that do not. The advantage of this

approach is that there is no need to mark each array element as valid or unused. All array elements

up to a specific location are used; all array elements after that location are unused.

There are two ways to implement contiguous arrays. One is to keep count of valid values inserted

into the array. Then the loops that process valid elements of the array could use this count to

terminate iterations. Another way to implement contiguous arrays is to have a special value that is

inserted after the last valid element of the array. Then the loops that process valid array elements

would stop when they find this special value. This special value is called a sentinel (it is similar to

the sentinels I used to determine the end of input in Chapter 4, "C++ Control Flow" ). It should be

different from the values that a valid array element can assume.
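
The two techniques can be compared side by side. This is a minimal sketch: the function names are hypothetical, and the sentinel value -1 assumes that valid data values are never negative.

```cpp
const int SENTINEL = -1;    // assumes valid data values are never negative

// Technique 1: the caller keeps a count of the valid elements.
int sumByCount(const int data[], int count) {
    int sum = 0;
    for (int i = 0; i < count; i++)
        sum += data[i];
    return sum;
}

// Technique 2: a sentinel value follows the last valid element.
int sumBySentinel(const int data[]) {
    int sum = 0;
    for (int i = 0; data[i] != SENTINEL; i++)
        sum += data[i];
    return sum;
}
```

With the sentinel technique the array needs one extra element for the sentinel itself, and the loop runs past the end of the array if the sentinel is missing.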

Defining C++ Arrays

Like any C++ variable, an array variable has to be defined before it can be used. The array definition

connects the name of the array with the type of the array elements and with the number of elements

in the array. Like any definition, array definitions cause memory for the array to be allocated during

execution time. Like any definition, array definitions end with a semicolon. You can define each

array on a separate line, or you can combine several definitions on the same line, for example:

int hours[7]; char grade[35]; double amount[20];

This line defines three arrays: array hours[] of 7 integer components, array grade[] of 35

character components, and array amount[] of 20 double floating point components. Notice empty

square brackets attached to the name of the array in this paragraph. This notation indicates that the

variable we are discussing is a vector with several values rather than a scalar with a single value.



For arrays of different types, as in the previous examples, each array has to be defined separately,

ending its definition with the semicolon. For arrays of the same type, it is all right to define several

arrays separating the definitions with commas (and ending the last definition with a semicolon).

Actually, one can combine definitions of arrays and scalar variables if their type is the same, for example:


int category[7], i, num, scores[35], n;

Some programmers choose the names for their arrays using plural. When an array is passed as a

parameter to a function, it is more appropriate to indicate that the function gets a set of scores rather

than an individual score, for example, sum(scores). Others choose array names using singular.

When an individual element of the array is referred to using its index, for example, category[i],

it is more appropriate to indicate that it is a single category that is manipulated rather than a set of

categories. In the broader scheme of things, this issue is not very important.

Although the array size should be known at compile time, it does not have to be a literal value. It

can be a #defined symbolic literal, an integer constant, or an integer expression of any complexity.

The only requirement is that this expression could be evaluated at compile time, not at run time.

For example:

#define MAX_RATES 35               // array size as a #defined value

int const NUM_ITEMS = 10;          // array size as a constant

int rates [MAX_RATES]; double amount[2*NUM_ITEMS];

An array can be initialized at definition just like any other C++ variable. The programmer supplies

initial values similar to initializing scalar variables. These initial values are specified in a comma-separated list of values delimited by braces. Since commas are separators and not terminators, the

last initializer before the closing brace does not have a comma after it.

int hours[7] = { 8, 8, 12, 8, 4, 0, 0 };      // 7 values

int side[5] = { 40,35,41 } ;                  // other array elements are 0's

char option[2] = { 'Y', 'N', 'y', 'n' };      // error: too many initial values

int week[52] = { , , 40, 48 };                // syntax error

The first initial value initializes the first component of the array, the second initializer goes to the



second component, and so on. Initial values should be of the same type as the array type or, if the

types are different, conversion between the values of the two types should be allowed. These

conversions are the same as the conversions discussed in Chapter 3 for mixed numeric types in

expressions. (For example, it is OK to initialize an array of components of type double using

integer initial values.)

In these examples, I supplied values for each component of the array hours[]. It is all right to

supply fewer values than there are components, as for array side[]: The components are

initialized starting with the first one until all initial values are exhausted. Those components that are

left without values are initialized to zero of the appropriate type. It is not all right to supply more

initial values than there are components in the array, as I did for array option[]. And it is not

allowed to skip some components by using commas, as I did for array week[]. Job Control

Language (JCL) allows this syntax, but C++ is not JCL.

Similar to scalar variables, an array variable defined in one file might be used in algorithms that are

implemented in another file. To make it possible, that other file has to declare the array variable

using the same name. The major difference between the array definition and declaration is that the

declaration is marked extern and does not have to specify the size of the array. Array declarations do not allocate memory for the

array. (This is the task for array definitions.) Although C++ declarations and definitions are similar,

the programmer has to distinguish between them.

For example, some other file might need the values of components of array hours[], or it might

compute the values that these components have to be assigned to. In that file, array hours[] would

be declared this way:

extern int hours[];                  // declaration: no memory allocated

For this declaration to be valid, the original definition of the array hours[] should be placed

outside of any function as a global variable.

Similar to declarations of scalar variables, array declarations are used to establish the address of the

array in memory. Now the code in this file can access elements of array hours[] as if the array

were defined in this file. Since array declarations (as any other declarations) do not allocate

memory, they do not support initialization.

C++, however, allows the programmer to use the declaration syntax for defining arrays. This is

done when the size of the array is specified by the number of initializers rather than by an explicit

compile-time constant, for example,



double rates[] = { 1.0, 1.2, 1.4 };      // three elements

Here, despite the declaration notation for array rates[], three elements of the array are allocated

and initialized. This definition is equivalent to the following definition.

double rates[3] = { 1.0, 1.2, 1.4 };     // explicit count

The advantage of the first definition is that one saves keystrokes for the size of the array. On the

other hand, the first definition forgoes the opportunity to define a constant for the size of the array,

and such a constant could come in handy in the algorithms for array processing.

One of the ways to resolve this problem is to compute the number of array components using the

sizeof operator you saw in Chapter 3. Dividing the size of the array by the size of one component,

you get the number of array components.

int num = sizeof(rates) / sizeof(double);
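
A slightly more robust variant of this computation, sketched below, divides by the size of the first element rather than by the size of a named type, so the count stays correct if the element type of the array is changed later. The names NUM_RATES and sumRates() are introduced here only for illustration.

```cpp
double rates[] = { 1.0, 1.2, 1.4 };   // size taken from the initializer list

// sizeof(rates[0]) follows the element type of the array automatically.
const int NUM_RATES = sizeof(rates) / sizeof(rates[0]);

// Uses the computed count as the loop limit.
double sumRates() {
    double sum = 0.0;
    for (int i = 0; i < NUM_RATES; i++)
        sum += rates[i];
    return sum;
}
```

This technique works only where the array definition itself is visible. Inside a function that receives the array as a parameter, sizeof yields the size of a pointer, not the size of the array.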

Notice the sequence of topics that the discussion of C++ arrays goes through. It is similar to the

discussion of other data definition facilities. Each time, we discuss the meaning of the new C++

facility to be introduced (variables in Chapter 3, arrays in this chapter, then structures, classes,

composite classes, and derived classes), the syntax of the definitions (and declarations if

appropriate), and then we discuss the initialization issues. This sequence of discussion is no

accident. Initialization is extremely important in C++, and we will be studying the methods of

initialization related to each kind of memory usage in C++.

Operations over Arrays

The discussion of initialization is invariably followed by the discussion of array operations. What

can we do with arrays? C++ is very limited in this regard. You cannot assign one array variable to

another, and you cannot compare two arrays, add two arrays, multiply them, and so on. The only

thing that you can do to an array is to pass it as an argument to a function. This is why when we

want to assign one array to another, or compare two arrays, and so on, we write our own code or

use library functions if they are available.



All operations can be performed over individual array elements only. When we copy one array to

another, we copy each array element individually. When we compare arrays, we compare

corresponding array elements individually. In these operations, we refer to individual array

elements by using the subscript operator and the index (also called subscript) value.
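
Element-by-element copying and comparison might be written as follows. This is a minimal sketch with hypothetical function names; the number of elements is passed as a separate parameter because the functions have no other way of knowing it.

```cpp
// Copies the first n elements of source into target, one element at a time.
void copyArray(int target[], const int source[], int n) {
    for (int i = 0; i < n; i++)
        target[i] = source[i];
}

// Returns true if the first n elements of a and b are pairwise equal.
bool equalArrays(const int a[], const int b[], int n) {
    for (int i = 0; i < n; i++)
        if (a[i] != b[i])
            return false;      // first mismatch settles the question
    return true;
}
```

The standard library offers std::copy() and std::equal() in the header algorithm for the same purposes.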

For example, side[2] denotes the element of array side at index 2. For all intents and purposes,

side[2] is an ordinary scalar integer variable. Since array side[] is an array of integers, you can

do to side[2] all you can do to any integer variable as an lvalue or as an rvalue. It is just the name

that is different: instead of the identifier that we use for an integer variable, we use the array

name plus the index and the subscript operator.

side[2] = 40;                        // use as lvalue

num = side[2] * 2;                   // use as rvalue

On the first line, side[2] gets the value 40 to store at its location. On the second line, the value

stored at location side[2] is multiplied by 2 and the result is stored in variable num (it has to be

numeric). As we see, individual array elements do not have individual names. Their names are

composed from the name of the array and the value of the subscript (index).

The C++ notation for array elements is quite conventional. What is unusual is that C++ treats the

square brackets as the operator rather than just an element of notation. And if you look at Table 3.1

in Chapter 3, you will see that this operator is of high priority, on the top of the C++ operator table.

Like any operator, the subscript operator has operands. What are they? They are the name of the array and

the value of the subscript. The operator is applied to name side and value 2; the result of the

operation is side[2], the name of the array component.

I know this sounds pretty abstract and remote from practice. What difference does it make whether

it is an operator or special notation? At this point it makes no difference. Later on, we will use this

operator in some interesting contexts.

The subscript does not have to be a literal value or even a compile-time value. Any run-time

expression of an integral type can be used as a subscript. A character, short, or long value is

accepted as is; a floating point value is rejected by the compiler and has to be converted to an

integer explicitly. Here, for example, the function foo() is called

at run time, and its return value is used to compute the subscript.

side[3*foo()] = 40;                  // is this legal?



For this to be legal, the function foo() should be defined and its return value (the value of

subscript) should be within the range of legal indices. If only part of the array elements is assigned

values, the index has to be the index of one of these elements. If all array elements are assigned

values, the index has to be within the first and last components. Indices that are outside of this

range refer to locations in memory that do not belong to the array and hence should not be referred

to as components of the array.

Index Validity Checking

Now brace yourself, fasten the seat belt, and get ready for a bombshell. The programmer cannot

choose the range of index values for an array arbitrarily: It is fixed for all C++ arrays. This is

unpleasant because often we want to assign some meaning to the index. For example, we might

have an array revenue[] that stores revenue data from 1997 to 2006; it would be convenient to

have the range from 1997 to 2006 as the range of array indices. Other languages allow

programmers to choose subscript ranges, but C++ does not: In this case, the interests of compiler

writers got precedence over the interests of application programmers. In C++, the range is fixed.

Moreover, it starts with 0.

Yes, the index of the first array component for any C++ array is 0 and not 1, and this is very


For example, if the array side[] has five components, the legal array components are side[0],

side[1], side[2], side[3], and side[4]. Notice that side[5] is not a legal component of

this array.
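
For an array such as revenue[], whose natural index is a year from 1997 to 2006, a common workaround is to subtract the first year from the year to get a zero-based index. The sketch below assumes the names FIRST_YEAR, LAST_YEAR, NUM_YEARS, and yearIndex(), none of which come from the text.

```cpp
const int FIRST_YEAR = 1997;
const int LAST_YEAR  = 2006;
const int NUM_YEARS  = LAST_YEAR - FIRST_YEAR + 1;   // 10 elements

double revenue[NUM_YEARS];

// Translates a calendar year into a zero-based index into revenue[].
// The caller is assumed to pass a year from FIRST_YEAR to LAST_YEAR.
int yearIndex(int year) {
    return year - FIRST_YEAR;
}
```

With this helper, revenue[yearIndex(1997)] is revenue[0] and revenue[yearIndex(2006)] is revenue[9], the last legal element.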

What happens if you make a mistake and refer to side[-1], or side[6], or even to side[5]?

Does the compiler tell you that you made a mistake? No. A subscript value can be a run-time value,

not known at compilation time, and the compiler writers give up on checking. Even when the index

is a compile-time literal value that can be easily checked, the compiler does not check the index


C++ skips this validity check with the air of deference to the programmer. If you say side[-1] in

your code, you obviously meant something, and it is not the job of the compiler to second-guess

you and tell you that you are wrong. No, there is no built-in compile-time validity check for indices.


Is there a run-time check? After all, some other languages validate every reference to array

components against the legal range of indices. Not C++. Index or subscript validation at run time

would affect performance, and this is a sacred cow in C++. And what if you do not have a

performance problem in your program? What if you want to check the validity of the index at run

time? No problem. Do it yourself; check the value of the index against its legal limits. No, there is



no built-in run-time validity check for indices.

Of course, the underlying assumption for this kind of language design is that the programmer

knows what he or she is doing at every moment and does not need any help from the compiler or

the run-time system. Needless to say, this assumption is totally baseless, and errors in handling

subscripts are a common source of errors and worries for C++ programmers.
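
A do-it-yourself run-time check can be as small as the sketch below. The names NUM_SIDES, validIndex(), and setSide() are hypothetical; the point is only that the index is compared against both limits before it is used.

```cpp
const int NUM_SIDES = 5;

// Returns true if index refers to a legal element of an array of size elements.
bool validIndex(int index, int size) {
    return index >= 0 && index < size;
}

// Stores value only when the index is legal; reports failure otherwise.
bool setSide(int side[], int index, int value) {
    if (!validIndex(index, NUM_SIDES))
        return false;          // nothing is written outside the array
    side[index] = value;
    return true;
}
```

Here setSide(side, 4, 43) succeeds, while setSide(side, 5, 43) and setSide(side, -1, 43) return false and leave memory untouched.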

The reason for this rigidity (inherited from C) is that the array name is used as the address of the

first element of the array. The displacement of the first element from the beginning is zero. The

displacement of the second element is one length of the element (depending on its type). The

displacement of the third element is two lengths of the element. The compiler knows the size of the

element, and it is simpler to compute the address of the element using its displacement than using

its position in the array.
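
The displacement arithmetic can be observed directly. The following sketch, with hypothetical function names, computes the expected displacement in bytes as the index times the element size and measures the actual displacement by subtracting addresses.

```cpp
#include <cstddef>

// Expected byte displacement of element i of an int array:
// i element lengths from the beginning of the array.
std::size_t expectedDisplacement(int i) {
    return static_cast<std::size_t>(i) * sizeof(int);
}

// Actual byte displacement of a[i], measured by subtracting addresses.
std::size_t measuredDisplacement(const int a[], int i) {
    const char* start = reinterpret_cast<const char*>(a);
    const char* elem  = reinterpret_cast<const char*>(&a[i]);
    return static_cast<std::size_t>(elem - start);
}
```

For any legal index the two values are equal, and the displacement of element 0 is zero, which is why index 0 is the natural starting point for the compiler.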

When the index value is invalid, the compiler still uses this index as displacement to compute the

address of the component in memory, and the program corrupts its memory. However, if this

address is not used for something useful, you can get away with that.


There is no compile-time index validity check in C++. There is no run-time index validity check in

C++. The computer memory can be corrupted by your program. Watch out!

Let us look at some consequences of errors in handling indices. Listing 5.1 shows a program that

correctly assigns values to the sides of a polygon, but prints them incorrectly: The first value of the

index is 1, the last value of the index is 5. The program output is shown in Figure 5-1.

Figure 5-1. Output reveals the error in code.

Example 5.1. Erroneous scan over the array.


#include <iostream>
using namespace std;                    // or #include <iostream.h>

int main()
{
  int side[5] = { 39, 40, 41, 42, 43 };
  for (int i = 1; i <= 5; i++)          // bad start, bad end
    cout << " " << side[i];
  cout << endl;
  return 0;
}


In this case, inspection of the output tells you that there is an error in the code. Often, however, if

the programmer is consistent in making errors, inspection of the output shows no sign of error.

Listing 5.2 shows the program that assigns the sides of a polygon incorrectly and prints them

incorrectly. The program does not use the location side[0] that belongs to the array. Instead, it

uses the memory location side[5] that does not belong to the array. As Figure 5-2 shows, the

output is correct although the program corrupts the memory location it refers to as side[5].

Figure 5-2. Correct output hides errors in array handling.

Example 5.2. Error is hidden by correct output.


#include <iostream>
using namespace std;                    // or #include <iostream.h>

int main()
{
  int side[5];
  side[1]=39; side[2]=40; side[3]=41; side[4]=42; side[5]=43;
  for (int i = 1; i <= 5; i++)          // bad start, bad end
    cout << " " << side[i];
  cout << endl;
  return 0;
}


How dangerous is corruption of memory? If the memory that this code corrupts is not allocated for

anything useful (and there are plenty of machines around with huge memories that are not allocated

for anything useful), this is not a problem. If the corrupted memory is used by the program, the

error is difficult to find. As Listing 5.2 shows, it is even difficult to realize that the program is

incorrect and where to start looking for the error. Listing 5.3 expands this example. As you can see

from Figure 5-3, the value in a[0] is incorrect: It changes from 11 to 43 even though there is no

second assignment to a[0]. It is quite unlikely that in a real-life setting one would suspect that

handling the array side[] might change the values in array a[]. On your machine, this program

might corrupt memory in a different way. Whatever it does, this innocent-looking little

program is incorrect.



Figure 5-3. Array a[] is corrupted by handling of array side[].

Example 5.3. Error in one place corrupts memory in another.


#include <iostream>
using namespace std;

int main()
{ int a[3]; int side[5];
  a[0]=11; a[1]=12; a[2]=13;            // a victim of corruption
  side[1]=39; side[2]=40; side[3]=41; side[4]=42; side[5]=43;
  for (int i = 1; i <= 5; i++)          // bad start, bad end
    cout << " " << side[i];
  cout << endl;
  for (i = 0; i < 3; i++)               // correct start, end (i reused: pre-standard scope)
    cout << " " << a[i];
  cout << endl;
  return 0; }

Correct iteration over array components should start with 0, not with 1. Iteration should end one

value short of the array size. If array size is 5, the correct form of the test is i<5; if array size is 3,

the correct form of the test is i<3. In general, if the number of valid array elements is in variable

NUM, the correct form of the continuation test is i<NUM. The second loop in Listing 5.3 is

designed correctly. Notice that index i is defined in the first loop, not at the beginning of the

program. Its name is known until the end of the function. Hence, the second loop does not define

this variable but uses it as if it were defined at the beginning of the function. My compiler

(Microsoft Visual C++ version 6.0) does not implement standard C++ correctly: the scope of

index variable i should be the first loop only, not throughout the function.

Validity of indices is something a C++ programmer has to think about all the time. When you

iterate over all elements of the array, you should start the iteration with index 0. You should end the

iteration with index one less than the number of elements. This is a very simple rule. It is not

difficult to remember. It is not difficult to use. And most of the time we get it right. But at one time

or another, every programmer makes mistakes in accessing array components, and these mistakes

are very costly, especially when they are made at the time of maintenance. If you add together all

the time, effort, and frustration that the software industry has wasted on errors in handling array

subscripts, the result would be staggering. This is why C++ programmers should think about the

validity of indices all the time.
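
For comparison with Listing 5.1, here is a sketch of a scan that follows the rule: it starts with index 0 and continues while the index is less than the number of elements. The function name formatSides() is hypothetical, and the function collects the output in a string instead of printing it so that the result is easy to inspect.

```cpp
#include <sstream>
#include <string>

// Formats the first num elements of side[], each preceded by a space.
std::string formatSides(const int side[], int num) {
    std::ostringstream out;
    for (int i = 0; i < num; i++)      // start at 0, stop one short of num
        out << " " << side[i];
    return out.str();
}
```

For the array { 39, 40, 41, 42, 43 } and num equal to 5, every element is visited exactly once and no location outside the array is touched.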


Start iterations over the array with the index set to 0. Continue while the index value is less than the
