Seldomly Used SQL: bool Aggregate Functions
Aggregate functions are the cornerstone of performing any type of analysis or reporting on your data that lives within Postgres. The bool_or
and bool_and
aggregate functions are ones that I’ve started to pull out of my toolbox more often. These functions have been particularly useful when I’ve needed to know if any or all values within a column meets a certain condition and to report on it apporpriately.
Logical Operators
Before we can dive into the aggregate functions, we first need to discuss logical operators. If you’re already familiar with these types of operators, this section will likely be review. Any time you’re writing SQL, and are using multiple filters in your where clause, you’re using a logical operator.
SELECT products.id
, products.cost
, sum(orders.amount)
FROM products
INNER JOIN orders
ON orders.product_id = products.id
WHERE products.category = 'food'
AND orders.ordered_at > '2016-01-01 00:00:00'
AND (
orders.region = 'west'
or orders.express_delivery = true
)
GROUP BY 1, 2
In this query we’re using both an AND
operator and an OR
operator. The AND
operator means that both conditions need to be true to return a result. In our example, the products that need to have been ordered need to be food items and ordered after the first of the year. The OR
operator allows for either condition to be true to return a value as part of the result set. In the case of our example, this means orders can be express delivery or have been ordered in the west region. If you need more background on logical operators, the Postgres documentation has a great table on the possible scenarios for all logical operators.
Now imagine being able to take the concept of logical operators and apply that to the rows that are processed as part of a result set. That’s what the bool_or
and bool_and
aggregate functions do for us. Both functions must take a boolean value as it’s parameter. The bool_or
function will return true
if at least one of the input values is true. On the other hand, the bool_and
will return true only if all of the input values are true.
Example Usage
The best way I’ve found to explore these concepts is with some example usage. Let’s assume I own a warehouse and my warehouse will take orders and, based on the customer, will track the category of the products sold and whether the customer asked for express delivery. My orders table might look something like this:
CREATE TABLE orders (
id BIGSERIAL,
customer_id BIGINT,
category TEXT,
express_delivery BOOLEAN
);
Let’s assume this is what our initial dataset looks like:
id | customer_id | category | express_delivery |
---|---|---|---|
1 | 1 | food | TRUE |
2 | 1 | shoes | TRUE |
4 | 2 | food | TRUE |
5 | 2 | auto | FALSE |
6 | 2 | shoes | FALSE |
7 | 3 | books | FALSE |
8 | 3 | auto | FALSE |
bool_and
Our boss has come in and was curious to know if, based on the category of product, the customers will always ask for express delivery. Also the boss wants to see this breakout against all categories in the database. No problem, let’s use the bool_and
function to make sure that all orders by category are always express delivered:
SELECT category
, bool_and(express_delivery) as always_express
FROM orders
GROUP BY 1
Our output should look something like this:
category | always_express |
---|---|
food | t |
shoes | f |
auto | f |
books | f |
bool_or
The boss has come back to us and is curious why so few categories are express delivered. We astutely point out that, we’re only looking for instances where every delivery in a category is express delivered. A better analysis should be to see which product categories ever get express delivered. This means that if a category has express delivery, even once, we should indicate that it has been for the category. Our SQL and the result set should look like this:
SELECT category
, bool_or(express_delivery) as ever_expressed
FROM orders
GROUP BY 1
category | ever_expressed |
---|---|
food | t |
shoes | t |
auto | f |
books | f |
We now see that shoes
have been added to the list of products that have been asked to be expressed delivered by customers. The boss has some sense of what to proritize within the warehouse so that food and shoes get out the warehouse door much more quickly.
Null Values
What happens if our application that’s backing the ordering system has an error and we don’t record the express delivery field correctly? Let’s assume the record got created in the database but the express_delivery
field was set as NULL
. Ideally, we’d have a constraint on the field in the database, but for the purposes of this example let’s assume we didn’t. Let’s say our table now looks like this:
id | customer_id | category | express_delivery |
---|---|---|---|
1 | 1 | food | TRUE |
2 | 1 | shoes | TRUE |
4 | 2 | food | TRUE |
5 | 2 | auto | FALSE |
6 | 2 | shoes | FALSE |
7 | 3 | books | FALSE |
8 | 3 | auto | FALSE |
9 | 1 | running | |
10 | 1 | food |
Both the bool_or
and the bool_and
functions will treat NULL
just as that. Our above queries will do one of two things:
- If other input values in set are not null, then the output will be based on the other values.
- If all input values in the set are null, then the output of the query will be null as well.
Keep this behavior in mind when you’re querying and there is possibility of null values in your data.
Wrapping Up
I hope you found the bool_or
and bool_and
functions as useful as I have. Any time you find yourself starting your analysis with does any
or do all
then checkout these functions. If you have other instances where you use these functions, let me know via twitter or email, I’d love to hear more.
Edit: Thanks to @epc and @floatingatoll for pointing out a small mistake on one of my headings on the query output. Fixed!