Lesson 5 - MySQL step by step: Ordering, Limit, and Aggregation
In the previous lesson, MySQL step by step: Selecting Data (Searching), we prepared test data and learned basic
queries. It was the
SELECT statement and several operators. Today,
we're going to look at ordering and aggregation.
We haven't yet cared about the order of the results that the
SELECT query returned. In fact, there even wasn't any order.
Internally, the database keeps certain sophisticated rules (which are above the
level of this course) and it returned the items as they went. If we made a
change in the database and called the same query again, the order would be
probably completely different. Of course, the database is able to sort the
We can sort by any column. When sorting by the id, we get the items in the
order they were inserted into the database. We can sort by numeric columns, but
also by the text ones (which sorts by alphabet). We can also sort by date and
all other data types, the database will always handle it. Let's select all the
users and sort them by their last name. We'll use the
clause to do so, which comes at the end of the query:
SELECT `first_name`, `last_name` FROM `user` ORDER BY `last_name`
Patricia Adams James Baker Donald Brown David Clark Michelle Davis Lisa Green ...
Of course, there could also be a
WHERE clause in the query, but
to keep things simple, we've just selected all the users.
We can sort by several criteria (columns). Let's sort the users by the number articles they wrote and those with the same number also in alphabetical order:
SELECT `first_name` ,`last_name`, `articles_count` FROM `user` ORDER BY `articles_count`, `last_name`
Michelle Davis 0 Charles Lopez 0 Aron Heaven 1 Daniel Jones 1 Paul Lee 1 Betty Miller 1 Jenifer Williams 1 Donald Brown 2 Lucy Harris 2 Anthony Jackson 2 ...
Of course, we can also sort in the other direction. We can sort in ascending
order (the default direction) with the
ASC keyword and in
descending with the
DESC keyword. Let's try to create a list of the
users by the number of their articles. First will be those with the most
articles, so we'll sort in descending order. Those with the same number of
articles will be sorted in alphabetical order:
SELECT `first_name`, `last_name`, `articles_count` FROM `user` ORDER BY `articles_count` DESC, `last_name`;
Jeff Moore 18 John Smith 17 Donna Johnson 12 Stephen Murphy 12 Paul Walker 9 Carol Young 9 ...
We always have to specify the
DESC keyword again. You can see
that the ordering by
last_name is ascending because we've put the
DESC keyword only after
Let's stay with our user rating by their number of articles for a while. Now, we want to list just 10 best users. If you had one million users, it probably wouldn't be a good idea to select all of them and then use just 10 in your application and throw the 999 990 away. We'll define a limit, the maximum number of the entries we want to select. At the same time, we'll define the sorting order. The limit is always at the end of a query:
SELECT `first_name`, `last_name`, `articles_count` FROM `user` ORDER BY `articles_count` DESC, `last_name` LIMIT 10;
We can use
ORDER BY for other commands as
well, such as
UPDATE. This way, we can
ensure that only one entry is deleted or updated.
The database offers many of so-called aggregation functions. These are functions that manipulate with multiple values in some way and return just a single value as the result.
An example of such a function is the
COUNT() function, that
returns the number of table rows that meet certain criteria. Let's count how
many users wrote at least one article:
SELECT COUNT(*) FROM `user` WHERE `articles_count` > 0;
SELECT to get the value of
COUNT() is not a command, it's a function which gets the table rows
as inputs, and the results are returned by the
SELECT. We use
parentheses to call functions as in other programming languages (at least
in most of them). The asterisk means that we're interested in all the columns.
For example, we can count only the users who have their name filled in (more
precisely, who don't have
NULL in the name column but let's keep
this for further lessons).
You'd certainly come up with another way to achieve this
result. You'd simply select some value as before (for example the name), then
send these rows to your application and count how many names there are. The data
could be removed then. However, such a transfer would overload the database
unnecessarily and slow down the application.
COUNT() transfers only
a single number. Never count by selecting values, just use the
AVG() returns the average of given values. Let's look at the
average number of articles per user:
SELECT AVG (`articles_count`) FROM `user`;
SUM() returns the sum of the given values. Let's see how many
articles were written by people born since 1980:
SELECT SUM(`articles_count`) FROM `user` WHERE `birth_date` > '1980-1-1';
MIN() function returns the minimum (the lowest value). Let's
find the lowest date of birth:
SELECT MIN(`birth_date`) FROM `user`;
Beware, if we'd like to select the first and last names too, this code won't work:
-- This code won't work SELECT `first_name`, `last_name`, MIN(`birth_date`) FROM `user`;
The aggregation function works with multiple column values and the selected
last_name) won't be related
with the value returned by
MIN() anyhow. We could solve this
problem with a sub-query or even more easily, avoid the
MAX() functions and use
ORDER BY and
SELECT `first_name`, `last_name`, `birth_date` FROM `user` ORDER BY `birth_date` LIMIT 1;
Linda Martin 1935-05-15
MIN(), there's also the
Let's find the maximum number of articles a user wrote:
SELECT MAX(`articles_count`) FROM `user`;
MySQL has some more aggregation functions, but these are no longer interesting to us.
We can group database entries by certain criteria. We use grouping almost
always together with aggregation functions. Let's group the users by
SELECT `first_name` FROM `user` GROUP BY `first_name`;
Anthony Aron Barbara Betty Carol ...
We can see that each name is included only once, even if there are there
multiple times in the database. We'll now add, besides the name, also the number
of its occurrences in the table, using the
SELECT `first_name`, COUNT(*) FROM `user` GROUP BY `first_name`;
... Jeff 1 Jenifer 1 John 2 Joseph 1 Linda 1 ...
We can see that there are, for example, two Johns.
To simplify a query, we can define aliases in it. For example, we can rename
a column with a long name to make the query more readable. We'll find this often
in queries through multiple tables where it's very useful. For tables, we use
AS to simplify the operations within the query. For columns, we use
AS so our application can see the data under a different name than
it's really in the database. This can be especially useful for aggregation
functions because there's no column for their result in the database, so it
could be difficult to work with the result instead. Let's modify our last
SELECT `first_name`, COUNT(*) AS `cnt` FROM `user` GROUP BY `first_name`;
... Jeff 1 Jennifer 1 John 2 Joseph 1 Linda 1 ...
In the next lesson MySQL step by step: Data Types and NULL, we'll say something about
introduce charts of data types in MySQL. I also have some additional test data
for you, for a content management system, so we can do more advanced queries
No one has commented yet - be the first!