Re: record identical operator

Поиск
Список
Период
Сортировка
От Kevin Grittner
Тема Re: record identical operator
Дата
Msg-id 1379076896.90225.YahooMailNeo@web162901.mail.bf1.yahoo.com
обсуждение исходный текст
Ответ на Re: record identical operator  (Benedikt Grundmann <bgrundmann@janestreet.com>)
Список pgsql-hackers
Benedikt Grundmann <bgrundmann@janestreet.com> wrote:

> Kevin Grittner <kgrittn@ymail.com> wrote:
>
>> Attached is a patch for a bit of infrastructure I believe to be
>> necessary for correct behavior of REFRESH MATERIALIZED VIEW
>> CONCURRENTLY as well as incremental maintenance of matviews.
>> [...]
>> The patch adds an "identical" operator (===) for the record
>> type:

> Wouldn't it be slightly less surprising / magical to not declare
> new operators but just new primitive functions?  In the least
> invasive version they could even be called
> matview_is_record_identical or similar.

I'm not sure what is particularly surprising or magical about new
operators, but it is true that the patch could be smaller if
operators were not added.  The SQL functions added by the current
patch to support the operator approach are:

extern Datum record_image_eq(PG_FUNCTION_ARGS);
extern Datum record_image_ne(PG_FUNCTION_ARGS);
extern Datum record_image_lt(PG_FUNCTION_ARGS);
extern Datum record_image_gt(PG_FUNCTION_ARGS);
extern Datum record_image_le(PG_FUNCTION_ARGS);
extern Datum record_image_ge(PG_FUNCTION_ARGS);
extern Datum btrecordimagecmp(PG_FUNCTION_ARGS);

All take two record arguments and all but the last return a
boolean.  The last one returns -1, 0, or 1 depending on how the
values compare.  All comparisons are based on memcmp() of the data
in its storage format (after detoasting, where applicable).

As currently written, the patch still requires a unique index on
the matview in order to allow RMVC, but this patch was intended to
support removal of that restriction, which is something some people
were saying they wanted.  It just seemed best to do that with a
separate patch once we had the mechanism to support it.

RMVC currently generates its "diff" data with a query similar to
this:

test=# explain analyze
test-# SELECT *
test-#   FROM citext_matview m
test-#   FULL JOIN citext_table t ON (t === m)
test-#   WHERE t.id IS NULL OR m.id IS NULL;
test=# \pset pager off
Pager usage is off.
test=# explain analyze
SELECT *
  FROM citext_matview m
  FULL JOIN citext_table t ON (t.id = m.id AND t === m)
  WHERE t.id IS NULL OR m.id IS NULL;
                                                     QUERY PLAN                                                      
---------------------------------------------------------------------------------------------------------------------
 Hash Full Join  (cost=1.11..2.24 rows=1 width=16) (actual time=0.056..0.056 rows=0 loops=1)
   Hash Cond: (m.id = t.id)
   Join Filter: (t.* === m.*)
   Filter: ((t.id IS NULL) OR (m.id IS NULL))
   Rows Removed by Filter: 5
   ->  Seq Scan on citext_matview m  (cost=0.00..1.05 rows=5 width=40) (actual time=0.002..0.006 rows=5 loops=1)
   ->  Hash  (cost=1.05..1.05 rows=5 width=40) (actual time=0.023..0.023 rows=5 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 1kB
         ->  Seq Scan on citext_table t  (cost=0.00..1.05 rows=5 width=40) (actual time=0.010..0.016 rows=5 loops=1)
 Total runtime: 0.102 ms
(10 rows)

With the operator support, we can remove the primary key columns
from the join conditions, and it still works, albeit with a slower
plan:

test=# explain analyze
SELECT *
  FROM citext_matview m
  FULL JOIN citext_table t ON (t === m)
  WHERE t.id IS NULL OR m.id IS NULL;
                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 Merge Full Join  (cost=2.22..2.32 rows=1 width=16) (actual time=0.072..0.072 rows=0 loops=1)
   Merge Cond: (m.* === t.*)
   Filter: ((t.id IS NULL) OR (m.id IS NULL))
   Rows Removed by Filter: 5
   ->  Sort  (cost=1.11..1.12 rows=5 width=40) (actual time=0.035..0.035 rows=5 loops=1)
         Sort Key: m.*
         Sort Method: quicksort  Memory: 25kB
         ->  Seq Scan on citext_matview m  (cost=0.00..1.05 rows=5 width=40) (actual time=0.012..0.016 rows=5 loops=1)
   ->  Sort  (cost=1.11..1.12 rows=5 width=40) (actual time=0.014..0.014 rows=5 loops=1)
         Sort Key: t.*
         Sort Method: quicksort  Memory: 25kB
         ->  Seq Scan on citext_table t  (cost=0.00..1.05 rows=5 width=40) (actual time=0.003..0.003 rows=5 loops=1)
 Total runtime: 0.128 ms
(13 rows)

So, if the operators are included it will be a very small and
simple patch to relax the requirement for a unique index.
Currently we generate an error if no index columns were found, but
with the new operators we could leave them out and the query would
still work.  Adding indexes to matviews would then be just a matter
of optimization, not a requirement to be able to use the RMVC
feature.

Using the functions instead of the operators things work just as
well as long as we use columns in the join conditions, which is
currently based on indexed columns:

test=# explain analyze
test-# SELECT *
test-#   FROM citext_matview m
test-#   FULL JOIN citext_table t ON (t.id = m.id AND record_image_eq(t, m))
test-#   WHERE t.id IS NULL OR m.id IS NULL;
                                                     QUERY PLAN                                                      
---------------------------------------------------------------------------------------------------------------------
 Hash Full Join  (cost=1.11..2.24 rows=1 width=16) (actual time=0.056..0.056 rows=0 loops=1)
   Hash Cond: (m.id = t.id)
   Join Filter: record_image_eq(t.*, m.*)
   Filter: ((t.id IS NULL) OR (m.id IS NULL))
   Rows Removed by Filter: 5
   ->  Seq Scan on citext_matview m  (cost=0.00..1.05 rows=5 width=40) (actual time=0.003..0.006 rows=5 loops=1)
   ->  Hash  (cost=1.05..1.05 rows=5 width=40) (actual time=0.023..0.023 rows=5 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 1kB
         ->  Seq Scan on citext_table t  (cost=0.00..1.05 rows=5 width=40) (actual time=0.011..0.015 rows=5 loops=1)
 Total runtime: 0.101 ms
(10 rows)

Without columns in the FULL JOIN conditions, the function fails
entirely, because it is the operator information which lets the
planner know that a FULL JOIN is even possible:

test=# explain analyze
test-# SELECT *
test-#   FROM citext_matview m
test-#   FULL JOIN citext_table t ON (record_image_eq(t, m))
test-#   WHERE t.id IS NULL OR m.id IS NULL;
ERROR:  FULL JOIN is only supported with merge-joinable or hash-joinable join conditions

So, either the indexes would continue to be required, or we would
need some other criteria for deciding which columns to use for the
additional FULL JOIN criteria.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: Protocol forced to V2 in low-memory conditions?
Следующее
От: Robert Haas
Дата:
Сообщение: Re: Custom Plan node