refresh materialized view concurrently
От | Kevin Grittner |
---|---|
Тема | refresh materialized view concurrently |
Дата | |
Msg-id | 1371225929.28496.YahooMailNeo@web162905.mail.bf1.yahoo.com обсуждение исходный текст |
Ответы |
Re: refresh materialized view concurrently
(Simon Riggs <simon@2ndQuadrant.com>)
Re: refresh materialized view concurrently (Heikki Linnakangas <hlinnakangas@vmware.com>) Re: refresh materialized view concurrently (Hitoshi Harada <umi.tanuki@gmail.com>) |
Список | pgsql-hackers |
Attached is a patch for REFRESH MATERIALIZED VIEW CONCURRENTLY for 9.4 CF1. The goal of this patch is to allow a refresh without interfering with concurrent reads, using transactional semantics. It is my hope to get this committed during this CF to allow me to focus on incremental maintenance for the rest of the release cycle. I didn't need to touch very much outside of matview-specific files for this. My biggest concern is that I needed two small functions which did *exactly* what some static functions in ri_triggers.c were doing and couldn't see where the best place to share them from was. For the moment I just duplicated them, but my hope would be that they could be put in a suitable location and called from both places, rather than duplicating the 30-some lines of code. The function signatures are: void quoteOneName(char *buffer, const char *name) void quoteRelationName(char *buffer, Relation rel) Comments in the patch describe the technique used for the transactional refresh, but I'm not sure how easy it is to understand the technique from the comments. Here is a demonstration of the basic technique, using a table to mock the materialized view so it can be run directly. ------------------------------------------------------------------- -- -- Setup -- drop table if exists n, nt, nd cascade; drop table if exists nm; create table n (id int not null primary key, val text); insert into n values (1, 'one'), (2, 'two'), (3, 'three'), (4, 'four'), (5, 'five'), (6, null), (7, null), (8, null), (9, null); -- We use a table to mock this materialized view definition: -- create materialized view nm as select * from n; create table nm as select * from n; insert into n values (10, 'ten'), (11, null); update n set val = 'zwei' where id = 2; update n set val = null where id = 3; update n set id = 44, val = 'forty-four' where id = 4; update n set val = 'seven' where id = 7; delete from n where id = 5; delete from n where id = 8; vacuum analyze; -- -- Sample of internal processing for REFRESH MV CONCURRENTLY. -- begin; create temp table nt as select * from n; analyze nt; create temp table nd as SELECT x.ctid as tid, y FROM nm x FULL JOIN n y ON (y.id OPERATOR(pg_catalog.=) x.id) WHERE (y.*) IS DISTINCT FROM (x.*) ORDER BY tid; analyze nd; delete from nm where ctid in (select tid from nd where tid is not null and y is not distinct from null); update nm x set id = (d.y).id, val = (d.y).val from nd d where d.tid is not null and x.ctid = d.tid; insert into nm select (y).* from nd where tid is null; commit; -- -- Check that results match. -- select * from n order by id; select * from nm order by id; ------------------------------------------------------------------- I also tried a million-row materialized view with the patch to see what the performace was like on a large table with just a few changes. I was surprised that a small change-set like this was actually faster than replacing the heap, at least on my machine. Obviously, when a larger number of rows are affected the transactional CONCURRENTLY option will be slower, and this is not intended in any way as a performace-enhancing feature, that was just a happy surprise in testing. ------------------------------------------------------------------- -- drop from previous test drop table if exists testv cascade; -- create and populate permanent table create table testv (id int primary key, val text); insert into testv select n, cash_words((floor(random() * 100000000) / 100)::text::money) from (select generate_series(1, 2000000, 2)) s(n); update testv set val = NULL where id = 547345; create materialized view matv as select * from testv; create unique index matv_id on matv (id); vacuum analyze matv; delete from testv where id = 16405; insert into testv values (393466, cash_words((floor(random() * 100000000) / 100)::text::money)); update testv set val = cash_words((floor(random() * 100000000) / 100)::text::money) where id = 1947141; refresh materialized view concurrently matv; ------------------------------------------------------------------- People may be surprised to see this using SPI even more than ri_triggers.c does. I think this is the safest and most maintainable approach, although I welcome alternative suggestions. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Вложения
В списке pgsql-hackers по дате отправления: