7.4 Summarizing with SUM( ) and AVG( )

SUM( ) and AVG( ) are strictly numeric functions, so they can't be used with strings or temporal values. On the other hand, sometimes you can convert non-numeric values to useful numeric forms. Suppose a table stores TIME values that represent elapsed time:

mysql> SELECT t1 FROM time_val;
+----------+
| t1       |
+----------+
| 15:00:00 |
| 05:01:30 |
| 12:30:20 |
+----------+

To compute the total elapsed time, use TIME_TO_SEC( ) to convert the values to seconds before summing them. The result is also in seconds; pass it to SEC_TO_TIME( ) if you want the sum in TIME format:

mysql> SELECT SUM(TIME_TO_SEC(t1)) AS 'total seconds',
    -> SEC_TO_TIME(SUM(TIME_TO_SEC(t1))) AS 'total time'
    -> FROM time_val;
+---------------+------------+
| total seconds | total time |
+---------------+------------+
|        117110 | 32:31:50   |
+---------------+------------+
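
The arithmetic behind that result can be checked outside the database. The following Python sketch is only an illustration: the helper names time_to_sec and sec_to_time are hypothetical stand-ins that mimic the MySQL functions for non-negative HH:MM:SS values, and the three values are the ones shown in the time_val table.

```python
def time_to_sec(value):
    """Convert an 'HH:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def sec_to_time(total):
    """Convert a non-negative number of seconds back to 'HH:MM:SS'."""
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# The three elapsed-time values from the time_val table.
time_vals = ["15:00:00", "05:01:30", "12:30:20"]

# Sum in seconds first, then convert the total back to a time string.
total_seconds = sum(time_to_sec(t) for t in time_vals)
total_time = sec_to_time(total_seconds)
print(total_seconds, total_time)
```

Summing in seconds first matters because the hour part of the total can exceed 24, which a simple clock-style addition would not handle.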

The SUM( ) and AVG( ) functions are especially useful in applications that compute statistics. They're explored further in Chapter 13, along with STD( ), a related function that calculates standard deviations.

7.5 Using DISTINCT to Eliminate Duplicates

7.5.1 Problem

You want to know which values are present in a set of values, without listing duplicate values repeatedly. Or you want to know how many distinct values there are.

7.5.2 Solution

Use DISTINCT to select unique values, or COUNT(DISTINCT) to count them.

7.5.3 Discussion

One summary operation that doesn't use aggregate functions is determining which values or rows a dataset contains by eliminating duplicates. Do this with DISTINCT (or DISTINCTROW, which is synonymous). DISTINCT is useful for boiling down a query result, and often is combined with ORDER BY to place the values in a more meaningful order. For example, if you want to know the names of the drivers listed in the driver_log table, use the following query:

mysql> SELECT DISTINCT name FROM driver_log ORDER BY name;
+-------+
| name  |
+-------+
| Ben   |
| Henry |
| Suzi  |
+-------+

A query without DISTINCT produces the same names, but is not nearly as easy to understand:

mysql> SELECT name FROM driver_log;
+-------+
| name  |
+-------+
| Ben   |
| Suzi  |
| Henry |
| Henry |
| Ben   |
| Henry |
| Suzi  |
| Henry |
| Ben   |
| Henry |
+-------+

If you want to know how many different drivers there are, use COUNT(DISTINCT):

mysql> SELECT COUNT(DISTINCT name) FROM driver_log;
+----------------------+
| COUNT(DISTINCT name) |
+----------------------+
|                    3 |
+----------------------+

COUNT(DISTINCT) ignores NULL values. If you also want to count NULL as one of the values in the set if it's present, do this:

COUNT(DISTINCT val) + IF(COUNT(IF(val IS NULL,1,NULL))=0,0,1)

The same effect can be achieved using either of the following expressions:

COUNT(DISTINCT val) + IF(SUM(ISNULL(val))=0,0,1)

COUNT(DISTINCT val) + (SUM(ISNULL(val))!=0)
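
To see the effect of the NULL adjustment, here is a small sketch that runs the last form of the expression against an in-memory SQLite database from Python. SQLite is only a stand-in for MySQL here, so ISNULL(val) is written as val IS NULL, and the table t with its five rows is made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (val TEXT)")
# Hypothetical sample data: two distinct non-NULL values and two NULLs.
conn.executemany("INSERT INTO t (val) VALUES (?)",
                 [("a",), ("b",), ("a",), (None,), (None,)])

# COUNT(DISTINCT val) skips the NULL rows entirely.
(distinct_non_null,) = conn.execute(
    "SELECT COUNT(DISTINCT val) FROM t").fetchone()

# Adding the result of the comparison counts NULL once if any NULL is present.
(distinct_with_null,) = conn.execute(
    "SELECT COUNT(DISTINCT val) + (SUM(val IS NULL) != 0) FROM t").fetchone()

print(distinct_non_null, distinct_with_null)
```

However many NULL rows the table holds, the adjustment adds at most 1, which is the point of comparing the sum against zero rather than adding the sum itself.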

COUNT(DISTINCT) is available as of MySQL 3.23.2. Prior to that, you have to use some kind of workaround based on counting the number of rows in a SELECT DISTINCT query. One way to do this is to select the distinct values into another table, then use COUNT(*) to count the number of rows in that table.
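
A rough sketch of that workaround follows, again using Python's sqlite3 module as a stand-in for MySQL. The table name tmp is hypothetical, the names are the ten driver_log rows shown earlier, and SQLite requires the AS keyword in CREATE TABLE ... AS SELECT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE driver_log (name TEXT)")
# The name column of driver_log, in the order shown by the earlier query.
names = ["Ben", "Suzi", "Henry", "Henry", "Ben",
         "Henry", "Suzi", "Henry", "Ben", "Henry"]
conn.executemany("INSERT INTO driver_log (name) VALUES (?)",
                 [(n,) for n in names])

# Step 1: select the distinct values into another (temporary) table.
conn.execute(
    "CREATE TEMPORARY TABLE tmp AS SELECT DISTINCT name FROM driver_log")

# Step 2: count the rows of that table.
(driver_count,) = conn.execute("SELECT COUNT(*) FROM tmp").fetchone()
print(driver_count)
```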

DISTINCT queries often are useful in conjunction with aggregate functions to obtain a more complete characterization of your data. For example, applying COUNT(*) to a customer table indicates how many customers you have, using DISTINCT on the state values in the table tells you which states you have customers in, and COUNT(DISTINCT) on the state values tells you how many states your customer base represents.

When used with multiple columns, DISTINCT shows the different combinations of values in the columns and COUNT(DISTINCT) counts the number of combinations. The following queries show the different sender/recipient pairs in the mail table, and how many such pairs there are:

mysql> SELECT DISTINCT srcuser, dstuser FROM mail
    -> ORDER BY srcuser, dstuser;
+---------+---------+
| srcuser | dstuser |
+---------+---------+
| barb    | barb    |
| barb    | tricia  |
| gene    | barb    |
| gene    | gene    |
| gene    | tricia  |
| phil    | barb    |
| phil    | phil    |
| phil    | tricia  |
| tricia  | gene    |
| tricia  | phil    |
+---------+---------+

mysql> SELECT COUNT(DISTINCT srcuser, dstuser) FROM mail;
+----------------------------------+
| COUNT(DISTINCT srcuser, dstuser) |
+----------------------------------+
|                               10 |
+----------------------------------+

DISTINCT works with expressions, too, not just column values. To determine the number of hours of the day during which messages in the mail table were sent, count the distinct HOUR( ) values:

mysql> SELECT COUNT(DISTINCT HOUR(t)) FROM mail;
+-------------------------+
| COUNT(DISTINCT HOUR(t)) |
+-------------------------+
|                      12 |
+-------------------------+

To find out which hours those were, list them:

mysql> SELECT DISTINCT HOUR(t) FROM mail ORDER BY 1;
+---------+
| HOUR(t) |
+---------+
|       7 |
|       8 |
|       9 |
|      10 |
|      11 |
|      12 |
|      13 |
|      14 |
|      15 |
|      17 |
|      22 |
|      23 |
+---------+

Note that this query doesn't tell you how many messages were sent each hour. That's covered in Recipe 7.16.

7.6 Finding Values Associated with Minimum and Maximum Values

7.6.1 Problem

You want to know the values for other columns in the row containing the minimum or maximum value.

7.6.2 Solution

Use two queries and a SQL variable. Or use the "MAX-CONCAT trick." Or use a join.

7.6.3 Discussion

MIN( ) and MAX( ) find the endpoints of a range of values, but sometimes when finding a minimum or maximum value, you're also interested in other values from the row in which the value occurs. For example, you can find the largest state population like this:

mysql> SELECT MAX(pop) FROM states;
+----------+
| MAX(pop) |
+----------+
| 29760021 |
+----------+

But that doesn't show you which state has this population. The obvious way to try to get that information is like this:

mysql> SELECT name, MAX(pop) FROM states WHERE pop = MAX(pop);
ERROR 1111 at line 1: Invalid use of group function

Probably everyone attempts something like that sooner or later, but it doesn't work, because aggregate functions like MIN( ) and MAX( ) cannot be used in WHERE clauses. The intent of the statement is to determine which record has the maximum population value, then display the associated state name. The problem is that while you and I know perfectly well what we'd mean by writing such a thing, it makes no sense at all to MySQL. The query fails because MySQL uses the WHERE clause to determine which records to select, but it knows the value of an aggregate function only after selecting the records from which the function's value is determined! So, in a sense, the statement is self-contradictory. You could solve this problem using a subselect, except that MySQL won't have those until Version 4.1. Meanwhile, you can use a two-stage approach involving one query that selects the maximum value into a SQL variable, and another that refers to the variable in its WHERE clause:

mysql> SELECT @max := MAX(pop) FROM states;
mysql> SELECT @max AS 'highest population', name FROM states WHERE pop = @max;
+--------------------+------------+
| highest population | name       |
+--------------------+------------+
| 29760021           | California |
+--------------------+------------+
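
The two-stage idea can also be tried from a client program. In the sketch below, which assumes Python's sqlite3 module as a stand-in for MySQL, a Python variable plays the role of @max. The states table is cut down to two columns and the four rows whose populations appear in this section. Because SQLite supports subselects, the single-statement form mentioned above is shown for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE states (name TEXT, pop INTEGER)")
# A reduced states table: only name and pop, only four rows.
conn.executemany("INSERT INTO states (name, pop) VALUES (?, ?)", [
    ("Alabama", 4040587),
    ("Arizona", 3665228),
    ("Arkansas", 2350725),
    ("California", 29760021),
])

# Stage 1: compute the aggregate and hold it in a variable (stands in for @max).
(max_pop,) = conn.execute("SELECT MAX(pop) FROM states").fetchone()

# Stage 2: use the saved value in an ordinary WHERE clause.
two_stage = conn.execute(
    "SELECT pop, name FROM states WHERE pop = ?", (max_pop,)).fetchall()

# The same question as one statement, using a subselect.
subselect = conn.execute(
    "SELECT pop, name FROM states"
    " WHERE pop = (SELECT MAX(pop) FROM states)").fetchall()

print(two_stage, subselect)
```

Both forms return every row that shares the maximum value, so more than one row comes back if there is a tie.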

The SQL variable technique works even if the minimum or maximum value itself isn't actually contained in the row, but is only derived from it. If you want to know the length of the shortest verse in the King James Version, that's easy to find:

mysql> SELECT MIN(LENGTH(vtext)) FROM kjv;
+--------------------+
| MIN(LENGTH(vtext)) |
+--------------------+
|                 11 |
+--------------------+

If you want to ask "What verse is that?," do this instead:

mysql> SELECT @min := MIN(LENGTH(vtext)) FROM kjv;
mysql> SELECT bname, cnum, vnum, vtext FROM kjv WHERE LENGTH(vtext) = @min;
+-------+------+------+-------------+
| bname | cnum | vnum | vtext       |
+-------+------+------+-------------+
| John  |   11 |   35 | Jesus wept. |
+-------+------+------+-------------+

Another technique you can use for finding values associated with minima or maxima is found in the MySQL Reference Manual, where it's called the "MAX-CONCAT trick." It's pretty gruesome, but can be useful if your version of MySQL precedes the introduction of SQL variables. The technique involves appending a column to the summary column using CONCAT( ), finding the maximum of the resulting values using MAX( ), and extracting the nonsummarized part of the value from the result. For example, to find the name of the state with the largest population, you can select the maximum combined value of the pop and name columns, then extract the name part from it. It's easiest to see how this works by proceeding in stages. First, determine the maximum population value to find out how wide it is:

mysql> SELECT MAX(pop) FROM states;
+----------+
| MAX(pop) |
+----------+
| 29760021 |
+----------+

That's eight characters. It's important to know this, because each column within the combined population-plus-name values should occur at a fixed position so that the state name can be extracted reliably later. (By padding the pop column to a length of eight, the name values will all begin at the ninth character.)

However, we must be careful how we pad the populations. The values produced by CONCAT( ) are strings, so the population-plus-name values will be treated as such by MAX( ) for sorting purposes. If we left justify the pop values by padding them on the right with RPAD( ), we'll get combined values like the following:

mysql> SELECT CONCAT(RPAD(pop,8,' '),name) FROM states;
+------------------------------+
| CONCAT(RPAD(pop,8,' '),name) |
+------------------------------+
| 4040587 Alabama              |
| 3665228 Arizona              |
| 2350725 Arkansas             |
...

Those values will sort lexically. That's okay for finding the largest of a set of string values with MAX( ). But pop values are numbers, so we want the values in numeric order. To make the lexical ordering correspond to the numeric ordering, we must right justify the population values by padding them on the left with LPAD( ):

mysql> SELECT CONCAT(LPAD(pop,8,' '),name) FROM states;
+------------------------------+
| CONCAT(LPAD(pop,8,' '),name) |
+------------------------------+
|  4040587Alabama              |
|  3665228Arizona              |
|  2350725Arkansas             |
...

Next, use the CONCAT( ) expression with MAX( ) to find the value with the largest population part:

mysql> SELECT MAX(CONCAT(LPAD(pop,8,' '),name)) FROM states;
+-----------------------------------+
| MAX(CONCAT(LPAD(pop,8,' '),name)) |
+-----------------------------------+
| 29760021California                |
+-----------------------------------+

To obtain the final result (the state name associated with the maximum population), extract from the maximum combined value the substring that begins with the ninth character:

mysql> SELECT SUBSTRING(MAX(CONCAT(LPAD(pop,8,' '),name)),9) FROM states;
+------------------------------------------------+
| SUBSTRING(MAX(CONCAT(LPAD(pop,8,' '),name)),9) |
+------------------------------------------------+
| California                                     |
+------------------------------------------------+

Clearly, using a SQL variable to hold an intermediate result is much easier. In this case, it's also more efficient because it avoids the overhead for concatenating column values for sorting and decomposing the result for display.
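
Why the direction of the padding matters is easy to demonstrate with plain strings. The Python sketch below is an illustration only: it mimics RPAD( ) with str.ljust, LPAD( ) with str.rjust, and MAX( ) on strings with Python's max, using the four populations that appear in this section.

```python
# (name, pop) pairs taken from the output shown in this section.
states = [
    ("Alabama", 4040587),
    ("Arizona", 3665228),
    ("Arkansas", 2350725),
    ("California", 29760021),
]
WIDTH = 8  # width of the largest population value

# Left-justified populations (like RPAD): strings compare by their first digit.
left_justified = [str(pop).ljust(WIDTH) + name for name, pop in states]

# Right-justified populations (like LPAD): shorter numbers start with a space,
# which sorts before any digit, so string order agrees with numeric order.
right_justified = [str(pop).rjust(WIDTH) + name for name, pop in states]

# Take the largest combined string, then strip the fixed-width population part.
name_from_rpad = max(left_justified)[WIDTH:]
name_from_lpad = max(right_justified)[WIDTH:]

print(name_from_rpad, name_from_lpad)
```

With right padding, the winner is the state whose population string starts with the highest digit, not the state with the largest population. Left padding restores the numeric order.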

Yet another way to select other columns from rows containing a minimum or maximum value is to use a join. Select the value into another table, then join it to the original table to select the row that matches the value. To find the record for the state with the highest population, use a join like this:

mysql> CREATE TEMPORARY TABLE t
    -> SELECT MAX(pop) as maxpop FROM states;
mysql> SELECT states.* FROM states, t WHERE states.pop = t.maxpop;
+------------+--------+------------+----------+
| name       | abbrev | statehood  | pop      |
+------------+--------+------------+----------+
| California | CA     | 1850-09-09 | 29760021 |
+------------+--------+------------+----------+
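
For experimenting with the join approach without a MySQL server, the following sketch runs the same two statements through Python's sqlite3 module. The assumptions are that SQLite behaves like MySQL for this query, that the states table is reduced to name and pop with four rows, and that AS is added to the CREATE TABLE statement because SQLite requires it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE states (name TEXT, pop INTEGER)")
# A reduced states table: only name and pop, only four rows.
conn.executemany("INSERT INTO states (name, pop) VALUES (?, ?)", [
    ("Alabama", 4040587),
    ("Arizona", 3665228),
    ("Arkansas", 2350725),
    ("California", 29760021),
])

# Save the maximum into a one-row temporary table.
conn.execute(
    "CREATE TEMPORARY TABLE t AS SELECT MAX(pop) AS maxpop FROM states")

# Join the temporary table back to the original to get the matching row.
rows = conn.execute(
    "SELECT states.* FROM states, t WHERE states.pop = t.maxpop").fetchall()

print(rows)
```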

7.7 Controlling String Case Sensitivity for MIN( ) and MAX( )

7.7.1 Problem

MIN( ) and MAX( ) select strings in case sensitive fashion when you don't want them to, or vice versa.

7.7.2 Solution

Alter the case sensitivity of the strings.

7.7.3 Discussion

When applied to string values, MIN( ) and MAX( ) produce results determined according to lexical sorting rules. One factor in string sorting is case sensitivity, so MIN( ) and MAX( ) are affected by that as well. In Chapter 6, we used a textblob_val table containing two columns of apparently identical values:

mysql> SELECT tstr, bstr FROM textblob_val;
+------+------+
| tstr | bstr |
+------+------+
| aaa  | aaa  |
| AAA  | AAA  |
| bbb  | bbb  |
| BBB  | BBB  |
+------+------+

However, although the values look the same, they don't behave the same. bstr is a BLOB column and is case sensitive. tstr, a TEXT column, is not. As a result, MIN( ) and MAX( ) will not necessarily produce the same results for the two columns:

mysql> SELECT MIN(tstr), MIN(bstr) FROM textblob_val;
+-----------+-----------+
| MIN(tstr) | MIN(bstr) |
+-----------+-----------+
| aaa       | AAA       |
+-----------+-----------+

To make tstr case sensitive, use BINARY:

mysql> SELECT MIN(BINARY tstr) FROM textblob_val;
+------------------+
| MIN(BINARY tstr) |
+------------------+
| AAA              |
+------------------+

To make bstr not case sensitive, you can convert the values to a given lettercase:

mysql> SELECT MIN(LOWER(bstr)) FROM textblob_val;
+------------------+
| MIN(LOWER(bstr)) |
+------------------+
| aaa              |
+------------------+

Unfortunately, doing so also changes the displayed value. If that's an issue, use this technique instead (and note that it may yield a somewhat different result, because it selects every value that matches the minimum without regard to lettercase):

mysql> SELECT @min := MIN(LOWER(bstr)) FROM textblob_val;
mysql> SELECT bstr FROM textblob_val WHERE LOWER(bstr) = @min;
+------+
| bstr |
+------+
| aaa  |
| AAA  |
+------+
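
The difference between the two kinds of comparison can be illustrated without a database. The Python sketch below is an analogy rather than MySQL behavior: Python compares strings by code point, which matches the case sensitive (binary) ordering, and lowercasing the values first imitates the case-insensitive one. The four strings are the textblob_val values.

```python
values = ["aaa", "AAA", "bbb", "BBB"]

# Binary-style comparison: uppercase letters have lower code points,
# so "AAA" sorts before "aaa".
case_sensitive_min = min(values)

# Fold lettercase before comparing; the result is the folded value,
# just as MIN(LOWER(bstr)) changes the displayed value.
folded_min = min(s.lower() for s in values)

# Recover the original spellings by comparing folded values against the
# folded minimum, as in the second query of the two-stage technique.
matches = [s for s in values if s.lower() == folded_min]

print(case_sensitive_min, folded_min, matches)
```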

7.8 Dividing a Summary into Subgroups

7.8.1 Problem
