Обсуждение: [HACKERS] WIP Patch: Precalculate stable functions

Поиск
Список
Период
Сортировка

[HACKERS] WIP Patch: Precalculate stable functions

От
Marina Polyakova
Дата:
Hello everyone!

Now in Postgresql only immutable functions are precalculated; stable 
functions are calculated for every row so in fact they don't differ from 
volatile functions.

There's a proposal to precalculate stable and immutable functions (= 
calculate once for all output rows, but as many times as function is 
mentioned in query), if they don't return a set and their arguments are 
constants or recursively precalculated functions. The same for 
operators' functions, strict functions, tracking functions. It can be 
very effective, for example, there's a comparison for full text search 
in messages (Intel® Core™ i5-6500 CPU @ 3.20GHz × 4, RAM 8Gb):

Without precalculation:

EXPLAIN (ANALYZE TRUE, BUFFERS TRUE) SELECT COUNT(*) FROM messages WHERE 
body_tsvector @@ to_tsquery('postgres');                                                                QUERY 
PLAN

------------------------------------------------------------------------------------------------------
------------------------------------ Aggregate  (cost=18714.82..18714.83 rows=1 width=8) (actual 
time=2275.334..2275.334 rows=1 loops=1)   Buffers: shared hit=309234 read=184261   ->  Bitmap Heap Scan on messages
(cost=66.93..18702.34rows=4991 
 
width=0) (actual time=70.661..224
7.462 rows=151967 loops=1)         Recheck Cond: (body_tsvector @@ to_tsquery('postgres'::text))         Rows Removed
byIndex Recheck: 118531         Heap Blocks: exact=56726 lossy=33286         Buffers: shared hit=309234 read=184261
   ->  Bitmap Index Scan on message_body_idx  (cost=0.00..65.68 
 
rows=4991 width=0) (actual time=
54.599..54.599 rows=151967 loops=1)               Index Cond: (body_tsvector @@ 
to_tsquery('postgres'::text))               Buffers: shared hit=1 read=37 Planning time: 0.493 ms Execution time:
2276.412ms
 
(12 rows)

With precalculation:

EXPLAIN (ANALYZE TRUE, BUFFERS TRUE) SELECT COUNT(*) FROM messages WHERE 
body_tsvector @@ to_tsquery('postgres');                                                                  QUERY 
PLAN

------------------------------------------------------------------------------------------------------
---------------------------------------- Aggregate  (cost=192269.70..192269.71 rows=1 width=8) (actual 
time=1458.679..1458.680 rows=1 loops=1)   Buffers: shared hit=309234 read=184261   ->  Bitmap Heap Scan on messages
(cost=1445.68..191883.51
 
rows=154474 width=0) (actual time=70.069
..1433.999 rows=151967 loops=1)         Recheck Cond: (body_tsvector @@ to_tsquery('postgres'::text))         Rows
Removedby Index Recheck: 118531         Heap Blocks: exact=56726 lossy=33286         Buffers: shared hit=309234
read=184261        ->  Bitmap Index Scan on message_body_idx  (cost=0.00..1406.81 
 
rows=154474 width=0) (actual t
ime=56.149..56.149 rows=151967 loops=1)               Index Cond: (body_tsvector @@ 
to_tsquery('postgres'::text))               Buffers: shared hit=1 read=37 Planning time: 1.644 ms Execution time:
1459.836ms
 
(12 rows)

Patch is attached. It isn't done yet:
- changing documentation (partly because of next lines);
- precalculation of expressions IS DISTINCT FROM and NULLIF which use 
nonvolatile equality operators;
- precalculation of expressions "scalar op ANY/ALL (array)" which use 
nonvolatile operators;
- precalculation of row compare expressions which use nonvolatile 
operators.

-- 
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
+7 926 92 00 265



[HACKERS] Fwd: WIP Patch: Precalculate stable functions

От
Marina Polyakova
Дата:
Sorry, attached patch.

-------- Исходное сообщение --------
Тема: WIP Patch: Precalculate stable functions
Дата: 20-04-2017 19:56
От: Marina Polyakova <m.polyakova@postgrespro.ru>
Кому: pgsql-hackers@postgresql.org

Hello everyone!

Now in Postgresql only immutable functions are precalculated; stable 
functions are calculated for every row so in fact they don't differ from 
volatile functions.

There's a proposal to precalculate stable and immutable functions (= 
calculate once for all output rows, but as many times as function is 
mentioned in query), if they don't return a set and their arguments are 
constants or recursively precalculated functions. The same for 
operators' functions, strict functions, tracking functions. It can be 
very effective, for example, there's a comparison for full text search 
in messages (Intel® Core™ i5-6500 CPU @ 3.20GHz × 4, RAM 8Gb):

Without precalculation:

EXPLAIN (ANALYZE TRUE, BUFFERS TRUE) SELECT COUNT(*) FROM messages WHERE 
body_tsvector @@ to_tsquery('postgres');
                                                                 QUERY 
PLAN

------------------------------------------------------------------------------------------------------
------------------------------------
  Aggregate  (cost=18714.82..18714.83 rows=1 width=8) (actual 
time=2275.334..2275.334 rows=1 loops=1)
    Buffers: shared hit=309234 read=184261
    ->  Bitmap Heap Scan on messages  (cost=66.93..18702.34 rows=4991 
width=0) (actual time=70.661..224
7.462 rows=151967 loops=1)
          Recheck Cond: (body_tsvector @@ to_tsquery('postgres'::text))
          Rows Removed by Index Recheck: 118531
          Heap Blocks: exact=56726 lossy=33286
          Buffers: shared hit=309234 read=184261
          ->  Bitmap Index Scan on message_body_idx  (cost=0.00..65.68 
rows=4991 width=0) (actual time=
54.599..54.599 rows=151967 loops=1)
                Index Cond: (body_tsvector @@ 
to_tsquery('postgres'::text))
                Buffers: shared hit=1 read=37
  Planning time: 0.493 ms
  Execution time: 2276.412 ms
(12 rows)

With precalculation:

EXPLAIN (ANALYZE TRUE, BUFFERS TRUE) SELECT COUNT(*) FROM messages WHERE 
body_tsvector @@ to_tsquery('postgres');
                                                                   QUERY 
PLAN

------------------------------------------------------------------------------------------------------
----------------------------------------
  Aggregate  (cost=192269.70..192269.71 rows=1 width=8) (actual 
time=1458.679..1458.680 rows=1 loops=1)
    Buffers: shared hit=309234 read=184261
    ->  Bitmap Heap Scan on messages  (cost=1445.68..191883.51 
rows=154474 width=0) (actual time=70.069
..1433.999 rows=151967 loops=1)
          Recheck Cond: (body_tsvector @@ to_tsquery('postgres'::text))
          Rows Removed by Index Recheck: 118531
          Heap Blocks: exact=56726 lossy=33286
          Buffers: shared hit=309234 read=184261
          ->  Bitmap Index Scan on message_body_idx  (cost=0.00..1406.81 
rows=154474 width=0) (actual t
ime=56.149..56.149 rows=151967 loops=1)
                Index Cond: (body_tsvector @@ 
to_tsquery('postgres'::text))
                Buffers: shared hit=1 read=37
  Planning time: 1.644 ms
  Execution time: 1459.836 ms
(12 rows)

Patch is attached. It isn't done yet:
- changing documentation (partly because of next lines);
- precalculation of expressions IS DISTINCT FROM and NULLIF which use 
nonvolatile equality operators;
- precalculation of expressions "scalar op ANY/ALL (array)" which use 
nonvolatile operators;
- precalculation of row compare expressions which use nonvolatile 
operators.

-- 
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
+7 926 92 00 265
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

Re: [HACKERS] Fwd: WIP Patch: Precalculate stable functions

От
Tom Lane
Дата:
Marina Polyakova <m.polyakova@postgrespro.ru> writes:
> Now in Postgresql only immutable functions are precalculated; stable
> functions are calculated for every row so in fact they don't differ from
> volatile functions.

> There's a proposal to precalculate stable and immutable functions (=
> calculate once for all output rows, but as many times as function is
> mentioned in query), if they don't return a set and their arguments are
> constants or recursively precalculated functions.

Have you looked at the previous efforts in this direction?  The last
discussion I can find is

https://www.postgresql.org/message-id/flat/CABRT9RA-RomVS-yzQ2wUtZ%3Dm-eV61LcbrL1P1J3jydPStTfc6Q%40mail.gmail.com

In particular, that relied on the planner to decide which subtrees were
worth caching and insert marker nodes for the purpose.  I'm not certain
that that's better than putting the intelligence into execExpr.c, but
I'm not sure it isn't either.  In principle we could afford to spend
more effort on making such determinations at plan time than we should
do at executor startup.  Also, the fundamental implementation seemed
less invasive, in that only the marker node type had to know about the
caching behavior, whereas I gather from your description that what you
are doing is going to end up touching almost all node types.

v10's new expression eval technology is sufficiently different that
it may well be that that old approach isn't very relevant anymore.
But it would be a good idea to look.
        regards, tom lane