Aggregation
Aggregate functions will calculate some property of the returned data set. The universally available aggregate functions are
- COUNT(column_name): Counts the number of values in the specified column
- MAX(column_name): returns the highest numeric value in the specified column
- MIN(column_name): returns the lowest numeric value in the specified column
- SUM(column_name): returns the sum of the values in the specified column
- AVG(column_name): returns the average of the values in the specified column
The alias for the MAX column is just to give that derived column a name which will be displayed with the returned dataset.
If we wanted to count the number of instrument models:
GROUP BY
With tables full of data, it's common to want to collect data on related groups within the data returned by a query. For example, in our weatherstations database below, we might want to know how many instruments each weatherstation has, or how many instruments of a given model there are, or the average reading for each instrument of a particular type.
In SQL, we can answer these questions of a dataset by grouping together rows with a common property and acting on them with an aggregate function.. The syntax is demonstrated below.
Counting the number of instruments at each weatherstation:
Counting the instruments of each type:
Calculating the average measurement of all the Atmos instruments, grouped by their instrument_model:
Note that in this case we have constructed a JOIN as the data are shared across Measurements and Instruments, and we can use a WHERE which will be executed before the GROUP BY is applied to restrict the dataset to only Atmos instruments. The GROUP BY then collects together all the Atmos instruments into Atmos 2017 and Atmos 2021 groups, and then calculates the mean measurement for each group.
Restricting which GROUPs are returned with HAVING
We can use WHERE on tables, which is applied before GROUPing, but not with aggregate functions. In the previous query counting instrument_models, if we wanted to restrict the returned groupings to instrument models of which we have more than 10 instruments installed we use HAVING:
WHERE is applied before GROUPing, but HAVING is applied after the groups have formed, to restrict which ones are returned.