Re: POC, WIP: OR-clause support for indexes

Поиск
Список
Период
Сортировка
От a.rybakina
Тема Re: POC, WIP: OR-clause support for indexes
Дата
Msg-id 668892c1-fb11-3a79-ce5e-1c194b7b3263@postgrespro.ru
обсуждение исходный текст
Ответ на Re: POC, WIP: OR-clause support for indexes  ("a.rybakina" <a.rybakina@postgrespro.ru>)
Ответы Re: POC, WIP: OR-clause support for indexes  (Alexander Korotkov <aekorotkov@gmail.com>)
Список pgsql-hackers
On 29.09.2023 20:35, a.rybakina wrote:

I'm sorry I didn't write for a long time, but I really had a very difficult month, now I'm fully back to work.I was able to implement the patches to the end and moved the transformation of "OR" expressions to ANY. I haven't seen a big difference between them yet, one has a transformation before calculating selectivity (v7.1-Replace-OR-clause-to-ANY.patch), the other after (v7.2-Replace-OR-clause-to-ANY.patch). Regression tests are passing, I don't see any problems with selectivity, nothing has fallen into the coredump, but I found some incorrect transformations. What is the reason for these inaccuracies, I have not found, but, to be honest, they look unusual). Gave the error below.

In the patch, I don't like that I had to drag three libraries from parsing until I found a way around it.The advantage of this approach compared to the other ([1]) is that at this stage all possible or transformations are performed, compared to the patch, where the transformation was done at the parsing stage. That is, here, for example, there are such optimizations in the transformation:
I took the common element out of the bracket and the rest is converted to ANY, while, as noted by Peter Geoghegan, we did not have several bitmapscans, but only one scan through the array.

postgres=# explain analyze SELECT p1.oid, p1.proname
FROM pg_proc as p1
WHERE prolang = 13 AND prolang=1 OR prolang = 13 AND prolang = 2 OR prolang = 13 AND prolang = 3;
                                              QUERY PLAN                                               
-------------------------------------------------------------------------------------------------------
 Seq Scan on pg_proc p1  (cost=0.00..151.66 rows=1 width=68) (actual time=1.167..1.168 rows=0 loops=1)
   Filter: ((prolang = '13'::oid) AND (prolang = ANY (ARRAY['1'::oid, '2'::oid, '3'::oid])))
   Rows Removed by Filter: 3302
 Planning Time: 0.146 ms
 Execution Time: 1.191 ms
(5 rows)

Falls into coredump at me:
explain analyze SELECT p1.oid, p1.proname
FROM pg_proc as p1
WHERE prolang = 13 OR prolang = 2 AND prolang = 2 OR prolang = 13;



Hi, all!

I fixed the kernel dump issue and all the regression tests were successful, but I discovered another problem when I added my own regression tests.
Some queries that contain "or" expressions do not convert to "ANY". I have described this in more detail using diff as expected and real results:diff -U3 /home/alena/postgrespro__copy6/src/test/regress/expected/create_index.out /home/alena/postgrespro__copy6/src/test/regress/results/create_index.out
--- /home/alena/postgrespro__copy6/src/test/regress/expected/create_index.out 2023-10-04 21:54:12.496282667 +0300
+++ /home/alena/postgrespro__copy6/src/test/regress/results/create_index.out  2023-10-04 21:55:41.665422459 +0300
@@ -1925,17 +1925,20 @@
 EXPLAIN (COSTS OFF)
 SELECT count(*) FROM tenk1
   WHERE thousand = 42 AND (tenthous = 1 OR tenthous = 3) OR thousand = 41;
-                                               QUERY PLAN                                               
---------------------------------------------------------------------------------------------------------
+                                                        QUERY PLAN                                                         
+---------------------------------------------------------------------------------------------------------------------------
  Aggregate
    ->  Bitmap Heap Scan on tenk1
-         Recheck Cond: (((thousand = 42) AND (tenthous = ANY ('{1,3}'::integer[]))) OR (thousand = 41))
+         Recheck Cond: ((((thousand = 42) AND (tenthous = 1)) OR ((thousand = 42) AND (tenthous = 3))) OR (thousand = 41))
          ->  BitmapOr
-               ->  Bitmap Index Scan on tenk1_thous_tenthous
-                     Index Cond: ((thousand = 42) AND (tenthous = ANY ('{1,3}'::integer[])))
+               ->  BitmapOr
+                     ->  Bitmap Index Scan on tenk1_thous_tenthous
+                           Index Cond: ((thousand = 42) AND (tenthous = 1))
+                     ->  Bitmap Index Scan on tenk1_thous_tenthous
+                           Index Cond: ((thousand = 42) AND (tenthous = 3))
                ->  Bitmap Index Scan on tenk1_thous_tenthous
                      Index Cond: (thousand = 41)
-(8 rows)
+(11 rows)
@@ -1946,24 +1949,50 @@
 EXPLAIN (COSTS OFF)
 SELECT count(*) FROM tenk1
+  WHERE thousand = 42 OR tenthous = 1 AND thousand = 42 OR tenthous = 1;
+                                            QUERY PLAN                                             
+---------------------------------------------------------------------------------------------------
+ Aggregate
+   ->  Bitmap Heap Scan on tenk1
+         Recheck Cond: ((thousand = 42) OR ((thousand = 42) AND (tenthous = 1)) OR (tenthous = 1))
+         ->  BitmapOr
+               ->  Bitmap Index Scan on tenk1_thous_tenthous
+                     Index Cond: (thousand = 42)
+               ->  Bitmap Index Scan on tenk1_thous_tenthous
+                     Index Cond: ((thousand = 42) AND (tenthous = 1))
+               ->  Bitmap Index Scan on tenk1_thous_tenthous
+                     Index Cond: (tenthous = 1)
+(10 rows)
+
+SELECT count(*) FROM tenk1
+  WHERE thousand = 42 OR tenthous = 1 AND thousand = 42 OR tenthous = 1;
+ count
+-------
+    11
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM tenk1
   WHERE hundred = 42 AND (thousand = 42 OR thousand = 99 OR tenthous < 2) OR thousand = 41;
-                                                         QUERY PLAN                                                          
------------------------------------------------------------------------------------------------------------------------------
+                                                       QUERY PLAN                                                       
+------------------------------------------------------------------------------------------------------------------------
  Aggregate
    ->  Bitmap Heap Scan on tenk1
-         Recheck Cond: (((hundred = 42) AND ((tenthous < 2) OR (thousand = ANY ('{42,99}'::integer[])))) OR (thousand = 41))
+         Recheck Cond: (((hundred = 42) AND ((thousand = 42) OR (thousand = 99) OR (tenthous < 2))) OR (thousand = 41))
          ->  BitmapOr
                ->  BitmapAnd
                      ->  Bitmap Index Scan on tenk1_hundred
                            Index Cond: (hundred = 42)
                      ->  BitmapOr
                            ->  Bitmap Index Scan on tenk1_thous_tenthous
-                                 Index Cond: (tenthous < 2)
+                                 Index Cond: (thousand = 42)
                            ->  Bitmap Index Scan on tenk1_thous_tenthous
-                                 Index Cond: (thousand = ANY ('{42,99}'::integer[]))
+                                 Index Cond: (thousand = 99)
+                           ->  Bitmap Index Scan on tenk1_thous_tenthous
+                                 Index Cond: (tenthous < 2)
                ->  Bitmap Index Scan on tenk1_thous_tenthous
                      Index Cond: (thousand = 41)
-(14 rows)
+(16 rows)

diff -U3 /home/alena/postgrespro__copy6/src/test/regress/expected/join.out /home/alena/postgrespro__copy6/src/test/regress/results/join.out
--- /home/alena/postgrespro__copy6/src/test/regress/expected/join.out 2023-10-04 21:53:55.632069079 +0300
+++ /home/alena/postgrespro__copy6/src/test/regress/results/join.out  2023-10-04 21:55:46.597485979 +0300
 explain (costs off)
 select * from tenk1 a join tenk1 b on
   (a.unique1 < 20 or a.unique1 = 3 or a.unique1 = 1 and b.unique1 = 2) or
   ((a.unique2 = 3 or a.unique2 = 7) and b.hundred = 4);
-                                                                          QUERY PLAN                                                                           
----------------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                         QUERY PLAN                                                                          
+-------------------------------------------------------------------------------------------------------------------------------------------------------------
  Nested Loop
-   Join Filter: ((a.unique1 < 20) OR ((a.unique1 = 1) AND (b.unique1 = 2)) OR ((a.unique2 = ANY ('{3,7}'::integer[])) AND (b.hundred = 4)) OR (a.unique1 = 3))
+   Join Filter: ((a.unique1 < 20) OR (a.unique1 = 3) OR ((a.unique1 = 1) AND (b.unique1 = 2)) OR (((a.unique2 = 3) OR (a.unique2 = 7)) AND (b.hundred = 4)))
    ->  Seq Scan on tenk1 b
    ->  Materialize
          ->  Bitmap Heap Scan on tenk1 a
-               Recheck Cond: ((unique1 < 20) OR (unique1 = 1) OR (unique2 = ANY ('{3,7}'::integer[])) OR (unique1 = 3))
+               Recheck Cond: ((unique1 < 20) OR (unique1 = 3) OR (unique1 = 1) OR (unique2 = 3) OR (unique2 = 7))
                ->  BitmapOr
                      ->  Bitmap Index Scan on tenk1_unique1
                            Index Cond: (unique1 < 20)
                      ->  Bitmap Index Scan on tenk1_unique1
+                           Index Cond: (unique1 = 3)
+                     ->  Bitmap Index Scan on tenk1_unique1
                            Index Cond: (unique1 = 1)
                      ->  Bitmap Index Scan on tenk1_unique2
-                           Index Cond: (unique2 = ANY ('{3,7}'::integer[]))
-                     ->  Bitmap Index Scan on tenk1_unique1
-                           Index Cond: (unique1 = 3)
-(15 rows)
+                           Index Cond: (unique2 = 3)
+                     ->  Bitmap Index Scan on tenk1_unique2
+                           Index Cond: (unique2 = 7)
+(17 rows)

 explain (costs off)
 select * from tenk1 a join tenk1 b on
   (a.unique1 < 20 or a.unique1 = 3 or a.unique1 = 1 and b.unique1 = 2) or
   ((a.unique2 = 3 or a.unique2 = 7) and b.hundred = 4);
-                                                                          QUERY PLAN                                                                           
----------------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                         QUERY PLAN                                                                          
+-------------------------------------------------------------------------------------------------------------------------------------------------------------
  Nested Loop
-   Join Filter: ((a.unique1 < 20) OR ((a.unique1 = 1) AND (b.unique1 = 2)) OR ((a.unique2 = ANY ('{3,7}'::integer[])) AND (b.hundred = 4)) OR (a.unique1 = 3))
+   Join Filter: ((a.unique1 < 20) OR (a.unique1 = 3) OR ((a.unique1 = 1) AND (b.unique1 = 2)) OR (((a.unique2 = 3) OR (a.unique2 = 7)) AND (b.hundred = 4)))
    ->  Seq Scan on tenk1 b
    ->  Materialize
          ->  Bitmap Heap Scan on tenk1 a
-               Recheck Cond: ((unique1 < 20) OR (unique1 = 1) OR (unique2 = ANY ('{3,7}'::integer[])) OR (unique1 = 3))
+               Recheck Cond: ((unique1 < 20) OR (unique1 = 3) OR (unique1 = 1) OR (unique2 = 3) OR (unique2 = 7))
                ->  BitmapOr
                      ->  Bitmap Index Scan on tenk1_unique1
                            Index Cond: (unique1 < 20)
                      ->  Bitmap Index Scan on tenk1_unique1
+                           Index Cond: (unique1 = 3)
+                     ->  Bitmap Index Scan on tenk1_unique1
                            Index Cond: (unique1 = 1)
                      ->  Bitmap Index Scan on tenk1_unique2
-                           Index Cond: (unique2 = ANY ('{3,7}'::integer[]))
-                     ->  Bitmap Index Scan on tenk1_unique1
-                           Index Cond: (unique1 = 3)
-(15 rows)
+                           Index Cond: (unique2 = 3)
+                     ->  Bitmap Index Scan on tenk1_unique2
+                           Index Cond: (unique2 = 7)
+(17 rows)

I haven't been able to fully deal with this problem yet

I have attached my experimental patch with the code.

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: Request for comment on setting binary format output per session
Следующее
От: Robert Haas
Дата:
Сообщение: Re: trying again to get incremental backup