5 min read

Partition Usage in PostgreSQL

Partition Usage in PostgreSQL
PostgreSQL

This post includes the following things;

  • What is the partition?
  • Why do we need partitions?
  • How do we use partitions?

What is the Partition?

Basically, Partition is splitting the data into the tables by specific column or columns. It uses mostly for large data tables. In this way, the partition helps us with disk IO optimization, low CPU usage, and low Memory usage. Also, we could scale the table without increasing resources in this way.

For example;

CREATE TABLE logs
(
id serial,
timestamp timestamp default current_timestamp,
type varchar(128) not null,
message text null
) PARTITION BY LIST (type);

CREATE TABLE logs_error PARTITION OF logs FOR VALUES IN ('error');
CREATE TABLE logs_info PARTITION OF logs FOR VALUES IN ('info');
CREATE TABLE logs_warning PARTITION OF logs FOR VALUES IN ('warning');

Why do we need partitions?

Assume you have a table that is used under high traffic, and that table's getting 100 thousand new records each hour. That means 2.4 million new records every single day.

Usually, most users don't think it's a big deal in the beginning. But in time, the select queries could not respond in quickly. Also, some queries need more resource more than at the beginning.

Luckily, the partition comes to rescue us. If we split the data in the table, we probably wouldn't have those problems. Let's explain what I meant;

Let's think we have a query like the following;

SELECT count(id), type FROM logs 
WHERE timestamp >= '2021–06–06 02:00:00' AND timestamp < '2021–06–06 03:00:00' GROUP BY type;

If that query runs each hour, the database engine searches the whole table (or whole index), But it's not necessary. Because we know the data where there are. If the table has partitions by timestamp, the database engine will search the data in just a specific partition table, and the query could be completed in shorter than the non-partitioned table.

Sounds good; how do we use partition in PostgreSQL?

This article will examine LIST and Range (It's my favorite) partitions.

  • LIST Partition

In LIST partition, the table is partitioned into your chosen specific values. It's very useful to group the data by type. Let's make an example.

We have a log table, and we want to split it by log type.

CREATE TABLE logs
(
id serial,
timestamp timestamp default current_timestamp,
type varchar(128) not null,
message text null
) PARTITION BY LIST (type);

CREATE TABLE logs_error PARTITION OF logs FOR VALUES IN ('error');
CREATE TABLE logs_info PARTITION OF logs FOR VALUES IN ('info');
CREATE TABLE logs_warning PARTITION OF logs FOR VALUES IN ('warning');

Now we can insert some records.

INSERT INTO logs (type, message) VALUES ('error', 'This is an error message.');
INSERT INTO logs (type, message) VALUES ('info', 'This is an info message.');
INSERT INTO logs (type, message) VALUES ('warning', 'This is a warning message.');

Looks like everything is fine or not.

SELECT * FROM logs;

+--+--------------------------+-------+--------------------------+
|id|timestamp                 |type   |message                   |
+--+--------------------------+-------+--------------------------+
|1 |2021-06-18 15:03:18.167217|error  |This is an error message. |
|2 |2021-06-18 15:05:31.484505|info   |This is an info message.  |
|3 |2021-06-18 15:05:32.869462|warning|This is a warning message.|
+--+--------------------------+-------+--------------------------+

EXPLAIN SELECT * FROM logs WHERE type = 'info';

+----------------------------------------------------------+
|QUERY PLAN                                                |
+----------------------------------------------------------+
|Seq Scan on logs_info  (cost=0.00..12.88 rows=1 width=318)|
|  Filter: ((type)::text = 'info'::text)                   |
+----------------------------------------------------------+
  • Range Partition

It's used for splitting data between dates and numbers. That's my favorite because I usually use that to split the huge tables into the days. Let's make an example;

You know we have a log table but this time, we will partition it by timestamp.

CREATE TABLE logs
(
id serial,
timestamp timestamp default current_timestamp,
type varchar(128) not null,
message text null
) PARTITION BY RANGE (timestamp);

Now, we need partitions;

CREATE TABLE logs_20210606 PARTITION OF logs FOR VALUES FROM ('2021–06–06') TO ('2021–06–07');
CREATE TABLE logs_20210607 PARTITION OF logs FOR VALUES FROM ('2021–06–07') TO ('2021–06–08');
CREATE TABLE logs_20210608 PARTITION OF logs FOR VALUES FROM ('2021–06–08') TO ('2021–06–09');
CREATE TABLE logs_20210609 PARTITION OF logs FOR VALUES FROM ('2021–06–09') TO ('2021–06–10');

In the end, we can make insert;

INSERT INTO logs (timestamp, type, message) VALUES ('2021–06–06 02:12:12', 'error', 'This is an error message.');
INSERT INTO logs (timestamp, type, message) VALUES ('2021–06–06 02:12:13', 'info', 'This is an info message.');
INSERT INTO logs (timestamp, type, message) VALUES ('2021–06–06 02:12:14', 'warning', 'This is a warning message.');
INSERT INTO logs (timestamp, type, message) VALUES ('2021–06–08 22:12:12', 'info', 'This is an error message.');
INSERT INTO logs (timestamp, type, message) VALUES ('2021–06–08 17:42:23', 'info', 'This is an info message.');
INSERT INTO logs (timestamp, type, message) VALUES ('2021–06–08 15:00:00', 'warning', 'This is a warning message.');

Let's take a look log table.

SELECT * FROM logs;

+--+--------------------------+-------+--------------------------+
|id|timestamp                 |type   |message                   |
+--+--------------------------+-------+--------------------------+
|1 |2021-06-06 02:12:12.000000|error  |This is an error message. |
|2 |2021-06-06 02:12:13.000000|info   |This is an info message.  |
|3 |2021-06-06 02:12:14.000000|warning|This is a warning message.|
|4 |2021-06-08 22:12:12.000000|info   |This is an error message. |
|5 |2021-06-08 17:42:23.000000|info   |This is an info message.  |
|6 |2021-06-08 15:00:00.000000|warning|This is a warning message.|
+--+--------------------------+-------+--------------------------+

Here it is. All data seems in the log table. But I need specific days' data.

EXPLAIN SELECT * FROM logs WHERE timestamp < '2021-06-08';

+----------------------------------------------------------------------------------+
|QUERY PLAN                                                                        |
+----------------------------------------------------------------------------------+
|Append  (cost=0.00..26.52 rows=154 width=318)                                     |
|  ->  Seq Scan on logs_20210606  (cost=0.00..12.88 rows=77 width=318)             |
|        Filter: ("timestamp" < '2021-06-08 00:00:00'::timestamp without time zone)|
|  ->  Seq Scan on logs_20210607  (cost=0.00..12.88 rows=77 width=318)             |
|        Filter: ("timestamp" < '2021-06-08 00:00:00'::timestamp without time zone)|
+----------------------------------------------------------------------------------+

As you could see, The PostgreSQL engine run the query in a specific partition.


Finally

As you know, PostgreSQL is a powerful database, and it's developing each day. I wanted to show you PostgreSQL's partition feature in this article. I hope I can do that.

If you need more details, you could check it out.

5.11. Table Partitioning
5.11.&nbsp;Table Partitioning 5.11.1. Overview 5.11.2. Declarative Partitioning 5.11.3. Partitioning Using Inheritance 5.11.4. Partition Pruning 5.11.5. Partitioning and Constraint Exclusion 5.11.6. …