More SQL Please: Aggregate Functions

pop out tables

users

| id | name_first | name_last | email_address | |----|------------|-----------|---------------------------| | 1 | Marcus | Crawford | marcus@mcrawford.ca | | 2 | Elena | Moreno | elena@hellocorp.com | | 3 | Simon | Chen | simon@sqlanalytics.io | | 4 | Priya | Amini | priya@aminiconsulting.com | | 5 | Zara | Amini | zara@aminiconsulting.com | | 6 | Charlotte | NULL | NULL |

donations

| id | user_id | date | amount | |----|---------|---------------------|--------| | 1 | 1 | 2024-04-12 09:10:01 | 20 | | 2 | 3 | 2024-04-19 12:40:15 | 10 | | 3 | 5 | 2024-05-01 19:51:45 | 40 | | 4 | 2 | 2024-05-03 14:09:11 | 75 | | 5 | 99 | 2024-06-01 01:11:44 | 100 | | 6 | 1 | 2024-06-04 09:01:59 | 40 |

The Functions topic introduces the concept of simple/scalar functions, which accept zero or more input values and produce a single output value per row.

Aggregate functions are categorized differently because they operate on more than one row of data at a time.

count() is a function that accepts a column name or a * as input. Instead of returning data though, it returns a count of the total number of rows. If a * is specified, the count of all rows in the table is returned. Think of it as asking for a row count "including all columns".

SELECT count(*)
FROM users;
| count(*) | |----------| | 6 |

If a column name is specified, the returned count is the number of non- NULL rows.

SELECT count(name_last)
FROM users;
| count(name_last) | |------------------| | 5 |

Aggregate functions can accept a DISTINCT clause which causes duplicate rows to be ignored by the function. Here, the second "Amini" last name is not included in the count.

SELECT count(DISTINCT name_last)
FROM users;
| count(DISTINCT name_last) | |---------------------------| | 4 |

Another commonly used aggregate function is max(). It accepts a single column name and returns the maximum value out of all of the rows. The maximum value is determined in the same way as an ORDER BY: by following the current collation for that data type.

To instead find the minimum value, the min() function is used.

Keep in mind that in SQLite and other databases, max() and min() have versions that accept multiple arguments which are actually scalar functions that perform a different task.

SELECT max(amount)
FROM donations;
| max(amount) | |-------------| | 100 |

The sum() function calculates the sum of all non- NULL values.

If every input row is NULL then sum() will return NULL. Unfortunately this is required by the SQL standard and several aggregate functions behave this way. This is not very useful if you are expecting a numerical result so some databases like SQLite have implemented a nonstandard total() function that returns 0.0 in this case instead.

SELECT sum(amount)
FROM donations;
| sum(amount) | |-------------| | 285 |

The avg() function calculates an average value across all the rows. It is equivalent to running total() / count(). Note again that only non- NULL values are considered in the calculation.

In SQLite string values that can't be converted into numbers, such as 'hello', are treated as 0 in the average calculation.

SELECT avg(amount)
FROM donations;
| avg(amount) | |-------------| | 47.5 |

There are often more built-in aggregate functions to use as well as the ability to define your own aggregate functions similar to scalar functions.

Aggregate functions are powerful tools alone but they become even more useful when combined with the GROUP BY clause which we will explore next.

previous: Functions home next: GROUP BY