IVM (Incremental View Maintenance) implementation as a PostgreSQL extension
Find a file
2022-06-02 11:27:41 +09:00
expected Fix to delete the pg_ivm_immv entry when an IMMV is dropped 2022-04-28 19:47:30 +09:00
rpm Fix spec file 2022-06-02 11:27:41 +09:00
sql Fix to delete the pg_ivm_immv entry when an IMMV is dropped 2022-04-28 19:47:30 +09:00
createas.c Split files to make it easier to follow the core code 2022-04-27 14:45:47 +09:00
LICENSE Add copyright 2022-04-01 10:27:00 +09:00
Makefile Fix to delete the pg_ivm_immv entry when an IMMV is dropped 2022-04-28 19:47:30 +09:00
matview.c Split files to make it easier to follow the core code 2022-04-27 14:45:47 +09:00
pg_ivm--1.0.sql Fix to delete the pg_ivm_immv entry when an IMMV is dropped 2022-04-28 19:47:30 +09:00
pg_ivm.c Fix to parse relname and column in create_immv arguments 2022-04-28 18:59:48 +09:00
pg_ivm.control Initial release of pg_ivm 1.0 2022-04-01 01:08:28 +09:00
pg_ivm.h Fix a header gurad macro 2022-05-03 03:19:58 +09:00
README.md Update README.md 2022-05-12 16:25:15 +09:00

pg_ivm

The pg_ivm module provides Incremental View Maintenance (IVM) feature for PostgreSQL.

The extension is compatible with PostgreSQL 14.

Description

Incremental View Maintenance (IVM) is a way to make materialized views up-to-date in which only incremental changes are computed and applied on views rather than recomputing the contents from scratch as REFRESH MATERIALIZED VIEW does. IVM can update materialized views more efficiently than recomputation when only small parts of the view are changed.

There are two approaches with regard to timing of view maintenance: immediate and deferred. In immediate maintenance, views are updated in the same transaction that its base table is modified. In deferred maintenance, views are updated after the transaction is committed, for example, when the view is accessed, as a response to user command like REFRESH MATERIALIZED VIEW, or periodically in background, and so on. pg_ivm provides a kind of immediate maintenance, in which materialized views are updated immediately in AFTER triggers when a base table is modified.

We call a materialized view supporting IVM an Incrementally Maintainable Materialized View (IMMV). To create IMMV, you have to call create_immv function with a relation name and a view definition query. For example:

SELECT create_immv('myview', 'SELECT * FROM mytab');

creates an IMMV with name 'myview' defined as 'SELECT * FROM mytab'. This is corresponding to the following command to create a normal materialized view;

CREATE MATERIALIZED VIEW myview AS SELECT * FROM mytab;

When an IMMV is created, some triggers are automatically created so that the view's contents are immediately updated when its base tables are modified.

postgres=# SELECT create_immv('m', 'SELECT * FROM t0');
NOTICE:  could not create an index on immv "m" automatically
DETAIL:  This target list does not have all the primary key columns, or this view does not contain DISTINCT clause.
HINT:  Create an index on the immv for efficient incremental maintenance.
 create_immv 
-------------
           3
(1 row)

postgres=# SELECT * FROM m;
 i
---
 1
 2
 3
(3 rows)

postgres=# INSERT INTO t0 VALUES (4);
INSERT 0 1
postgres=# SELECT * FROM m; -- automatically updated
 i
---
 1
 2
 3
 4
(4 rows)

Installation

To install pg_ivm, execute this in the module's directory:

make install

If you installed PostgreSQL from rpm or deb, you will need the devel package (for example, postgresql14-devel or postgresql-server-dev-14).

Important: Don't forget to set the PG_CONFIG variable (make PG_CONFIG=...) in case you want to test pg_ivm on a non-default or custom build of PostgreSQL. Read more here.

And, execute CREATE EXTENSION comand.

CREATE EXTENSION pg_ivm;

RPM packages and yum repository

RPM packages of pg_ivm are available from the PostgreSQL yum repository. See the instruction for details. Note that we are not the maintainer of this yum repository and RPMs for pg_ivm in it may be not always latest.

Objects

When pg_ivm is installed, the following objects are created.

IMMV creation function

Use create_immv function to creae IMMV.

create_immv(immv_name text, view_definition text) RETURNS bigint

create_immv defined a new IMMV of a query. A table of the name immv_name is created and a query specified by view_definition is executed and used to populate the IMMV. The query is stored in pg_ivm_immv, so that it can be refreshed later upon incremental view maintenance. create_immv returns the number of rows in the created IMMV.

When an IMMV is created, some triggers are automatically created so that the view's contents are immediately updated when its base tables are modified. In addition, a unique index is created on the IMMV automatically if possible. If the IMMV contains all primary key attritubes of its base tables in the target list, a unique index is created on these attritubes. Or, if the view has DISTINCT clause, a unique index is created on all columns in the target list. In other cases, no index is created.

IMMV metadata catalog

The catalog pg_ivm_immv stores IMMV information.

Name Type Description
immvrelid regclass The OID of the IMMV
viewdef text Query tree (in the form of a nodeToString() representation) for the view definition

Example

In general, IMMVs allow faster updates than REFRESH MATERIALIZED VIEW at the price of slower updates to their base tables. Update of base tables is slower because triggers will be invoked and the IMMV is updated in triggers per modification statement.

For example, suppose a normal materialized view defined as below:

test=# CREATE MATERIALIZED VIEW mv_normal AS
        SELECT a.aid, b.bid, a.abalance, b.bbalance
        FROM pgbench_accounts a JOIN pgbench_branches b USING(bid);
SELECT 10000000

Updating a tuple in a base table of this materialized view is rapid but the REFRESH MATERIALIZED VIEW command on this view takes a long time:

test=# UPDATE pgbench_accounts SET abalance = 1000 WHERE aid = 1;
UPDATE 1
Time: 9.052 ms

test=# REFRESH MATERIALIZED VIEW mv_normal ;
REFRESH MATERIALIZED VIEW
Time: 20575.721 ms (00:20.576)

On the other hand, after creating IMMV with the same view definition as below:

test=# SELECT create_immv('immv',
        'SELECT a.aid, b.bid, a.abalance, b.bbalance
         FROM pgbench_accounts a JOIN pgbench_branches b USING(bid)');
NOTICE:  created index "immv_index" on immv "immv"
 create_immv 
-------------
    10000000
(1 row)

updating a tuple in a base table takes more than the normal view, but its content is updated automatically and this is faster than the REFRESH MATERIALIZED VIEW command.

test=# UPDATE pgbench_accounts SET abalance = 1234 WHERE aid = 1;
UPDATE 1
Time: 15.448 ms

test=# SELECT * FROM immv WHERE aid = 1;
 aid | bid | abalance | bbalance 
-----+-----+----------+----------
   1 |   1 |     1234 |        0
(1 row)

An appropriate indexe on IMMV is necessary for efficient IVM because we need to looks for tuples to be updated in IMMV. If there are no indexes, it will take a long time.

Therefore, when an IMMV is created by the create_immv function, a unique index is created on it automatically if possible. If the IMMV contains all primary key attritubes of its base tables in the target list, a unique index is created on these attritubes. Also, if the view has DISTINCT clause, a unique index is created on all columns in the target list. In other cases, no index is created.

In the previous example, a unique index "immv_index" is created on aid and bid columns of "immv", and this enables the rapid update of the view. Dropping this index make updating the view take a loger time.

test=# DROP INDEX immv_index;
DROP INDEX

test=# UPDATE pgbench_accounts SET abalance = 9876 WHERE aid = 1;
UPDATE 1
Time: 3224.741 ms (00:03.225)

Supported View Definitions and Restriction

Currently, IMMV's view definition can contain inner joins, and DISTINCT clause. Inner joins including self-join are supported, but outer joins are not supported. Aggregates, sub-quereis, CTEs, window functions, LIMIT/OFFSET, UNION/INTERSECT/EXCEPT, DISTINCT ON, TABLEAMPLE, VALUES, and FOR UPDATE/SHARE can not be used in view definition.

The base tables must be simple. Views, materialized views, inheritance parent tables, partitioned tables, partitions, and foreign tables can not be used.

The targetlist cannot contain system columns, columns whose name starts with __ivm_.

Logical replication is not supported, that is, even when a base table at a publisher node is modified, IMMVs at subscriber nodes defined on these base tables are not updated.

When the TRUNCATE command is executed on a base table, nothing is changed on the IMMV.

Notes

DISTINCT

DISTINCT is allowed in IMMV's definition queries. Suppose an IMMV defined with DISTINCT on a base table containing duplicate tuples. When tuples are deleted from the base table, a tuple in the view is deleted if and only if the multiplicity of the tuple becomes zero. Moreover, when tuples are inserted into the base table, a tuple is inserted into the view only if the same tuple doesn't already exist in it.

Physically, an IMMV defined with DISTINCT contains tuples after eliminating duplicates, and the multiplicity of each tuple is stored in a extra column named __ivm_count__ that is added when such IMMV is created.

Concurrent Transactions

Suppose an IMMV is defined on two base tables and each table was modified in different a concurrent transaction simultaneously. In the transaction which was committed first, the IMMV can be updated considering only the change which happened in this transaction. On the other hand, in order to update the IMMV correctly in the transaction which was committed later, we need to know the changes occurred in both transactions. For this reason, ExclusiveLock is held on an IMMV immediately after a base table is modified in READ COMMITTED mode to make sure that the IMMV is updated in the latter transaction after the former transaction is committed. In REPEATABLE READ or SERIALIZABLE mode, an error is raised immediately if lock acquisition fails because any changes which occurred in other transactions are not be visible in these modes and IMMV cannot be updated correctly in such situations. However, as an exception if the IMMV has only one base table, the lock held on the IMMV is RowExclusiveLock.

Row Level Security

If some base tables have row level security policy, rows that are not visible to the materialized view's owner are excluded from the result. In addition, such rows are excluded as well when views are incrementally maintained. However, if a new policy is defined or policies are changed after the materialized view was created, the new policy will not be applied to the view contents. To apply the new policy, you need to recreate IMMV.

Authors

IVM Development Group

License

PostgreSQL License

  • Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
  • Portions Copyright (c) 2022, IVM Development Group