SQL GROUP BYThe SQL GROUP BY clause is a powerful way of summarizing and aggregating data.
Let's assume a table called CustomerOrderItem. This is a denormalized table generated from a normalized database for reporting purposes. I am using this example to eliminate the use of joins, which have not been described yet in this series of SQL tutorials.
|John Smith||17-Jun-2008||iPod Nano||149.99|
|John Smith||17-Jun-2008||iPod Leather Wallet||59.99|
|Alistair Charmondley-Warner||19-Jun-2008||Widescreen TV||449.50|
|Susan Brown||20-Jun-2008||iPod Nano||149.99|
|Susan Brown||20-Jun-2008||XBox 360||299.99|
If you want the total value of all sales, you would write this:
The GROUP BY allows you breakdown these totals by Item:
|iPod Leather Wallet||59.99|
The SUM(...) function is called an aggregate function. Other aggregate functions that can be used are: COUNT, AVG, MIN, MAX, STDEV, STDEVP, VAR and VARP. It is important that the column being used for the grouping is included in the GROUP BY clause. If you leave this out you will get the following error:
Msg 8120, Level 16, State 1, Line 1
Column 'CustomerOrderItem.ItemName' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Let's have another example, this time using the AVG function. The average purchase value for each customer would be obtained as follows:
The COUNT function is slightly different to the rest. It can take '*' as a parameter, which means "count the number of rows". The number of orders for each customer would be:
You could have used COUNT(ItemName) and it would have returned the same result. Another feature is COUNT(DISTINCT <column_name>) as follows:
This returns the number of different items ordered by each customer. With the example data used here this will actually return the same as the previous query, but it should be obvious what the difference is.
These examples have all used one column for the SQL GROUP BY and one column for the aggregate, but you can have as many as required:
Do not assume that the rows will be returned in the order specified by the GROUP BY columns. The only way to guarantee this is to include an ORDER BY clause at the end.
All these statements could have limited the rows returned through use of the WHERE clause, but only against actual columns in the table(s) or against calculations on the columns. So how do you apply a WHERE condition to the result of the aggregate functions? For example, what if I want to see all customers who have ordered more than one item?
This is done with the HAVING clause:
The comparison in the HAVING clause can also be on the result of a calculation, and does not have to be included in the selected list of columns. The following query returns the names of all customers who have ordered more than 1 "expensive" item (value more than 200.00).
This example uses the CASE statement, which will be described in a future article.
I hope you can see the power of the SQL GROUP BY and HAVING clauses. They enable some very powerful and complex queries to be performed with only a few lines of SQL. Read on for a description of the SQL JOIN statement.