record identical operator
От | Kevin Grittner |
---|---|
Тема | record identical operator |
Дата | |
Msg-id | 1379024847.48294.YahooMailNeo@web162904.mail.bf1.yahoo.com обсуждение исходный текст |
Ответы |
Re: record identical operator
(Benedikt Grundmann <bgrundmann@janestreet.com>)
Re: record identical operator (Andres Freund <andres@2ndquadrant.com>) Re: record identical operator - Review (Steve Singer <steve@ssinger.info>) |
Список | pgsql-hackers |
Attached is a patch for a bit of infrastructure I believe to be necessary for correct behavior of REFRESH MATERIALIZED VIEW CONCURRENTLY as well as incremental maintenance of matviews. The idea is that after RMVC or incremental maintenance, the matview should not be visibly different that it would have been at creation or on a non-concurrent REFRESH. The issue is easy to demonstrate with citext, but anywhere that the = operator allows user-visible differences between "equal" values it can be an issue. test=# CREATE TABLE citext_table ( test(# id serial primary key, test(# name citext test(# ); CREATE TABLE test=# INSERT INTO citext_table (name) test-# VALUES ('one'), ('two'), ('three'), (NULL), (NULL); INSERT 0 5 test=# CREATE MATERIALIZED VIEW citext_matview AS test-# SELECT * FROM citext_table; SELECT 5 test=# CREATE UNIQUE INDEX citext_matview_id test-# ON citext_matview (id); CREATE INDEX test=# UPDATE citext_table SET name = 'Two' WHERE name = 'TWO'; UPDATE 1 At this point, the table and the matview have visibly different values, yet without the patch the query used to find differences for RMVC would be essentially like this (slightly simplified for readability): test=# SELECT * FROM citext_matview m FULL JOIN citext_table t ON (t.id = m.id AND t = m) WHERE t IS NULL OR m IS NULL; id | name | id | name ----+------+----+------ (0 rows) No differences were found, so without this patch, the matview would remain visibly different from the results generated by a run of its defining query. The patch adds an "identical" operator (===) for the record type: test=# SELECT * FROM citext_matview m FULL JOIN citext_table t ON (t.id = m.id AND t === m) WHERE t IS NULL OR m IS NULL; id | name | id | name ----+------+----+------ | | 2 | Two 2 | two | | (2 rows) The difference is now found, so RMVC makes the appropriate change. test=# REFRESH MATERIALIZED VIEW CONCURRENTLY citext_matview; REFRESH MATERIALIZED VIEW test=# SELECT * FROM citext_matview ORDER BY id; id | name ----+------- 1 | one 2 | Two 3 | three 4 | 5 | (5 rows) The patch adds all of the functions, operators, and catalog information to support merge joins using the "identical" operator. The new operator is logically similar to IS NOT DISTINCT FROM for a record, although its implementation is very different. For one thing, it doesn't replace the operation with column level operators in the parser. For another thing, it doesn't look up operators for each type, so the "identical" operator does not need to be implemented for each type to use it as shown above. It compares values byte-for-byte, after detoasting. The test for identical records can avoid the detoasting altogether for any values with different lengths, and it stops when it finds the first column with a difference. I toyed with the idea of supporting hashing of records using this operator, but could not see how that would be a performance win. The identical (===) and not identical (!==) operator names were chosen because of a vague similarity to the "exactly equals" concepts in JavaScript and PHP, which use that name. The semantics aren't quite the same, but it seemed close enough not to be too surprising. The additional operator names seemed natural to me based on the first two, but I'm not really that attached to these names for the operators if someone has a better idea. Since the comparison of record values is not documented (only comparisons involving row value constructors), it doesn't seem like we should document this special case. It is intended primarily for support of matview refresh and maintenance, and it seems likely that record comparison was not documented on the basis that it is intended primarily for support of such things as indexing and merge joins -- so leaving the new operators undocumented seems consistent with existing policy. I'm open to arguments that the policy should change. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Вложения
В списке pgsql-hackers по дате отправления: