Re: POC, WIP: OR-clause support for indexes

Поиск
Список
Период
Сортировка
От Alena Rybakina
Тема Re: POC, WIP: OR-clause support for indexes
Дата
Msg-id 759292d5-cb51-4b12-89fa-576c1d9b374d@postgrespro.ru
обсуждение исходный текст
Ответ на Re: POC, WIP: OR-clause support for indexes  (Alexander Korotkov <aekorotkov@gmail.com>)
Список pgsql-hackers
On 22.07.2024 03:52, Alexander Korotkov wrote:
Hi, Alena!

Let me answer to some of your findings.

On Mon, Jul 22, 2024 at 12:53 AM Alena Rybakina
<a.rybakina@postgrespro.ru> wrote:
To be honest,I saw a larger problem. Look at the query bellow:

master:

alena@postgres=# create table t (a int not null, b int not null, c int not null);
insert into t (select 1, 1, i from generate_series(1,10000) i);
insert into t (select i, 2, 2 from generate_series(1,10000) i);
create index t_a_b_idx on t (a, b);
Just a side note.  As I mention in [1], there is missing statement
create index t_a_b_idx on t (a, b);
to get same plan as in [2].

create statistics t_a_b_stat (mcv) on a, b from t;
create statistics t_b_c_stat (mcv) on b, c from t;
vacuum analyze t;
CREATE TABLE
INSERT 0 10000
INSERT 0 10000
CREATE INDEX
CREATE STATISTICS
CREATE STATISTICS
VACUUM
alena@postgres=# explain select * from t where a = 1 and (b = 1 or b = 2) and c = 2;                                  QUERY PLAN
------------------------------------------------------------------------------ Bitmap Heap Scan on t  (cost=156.55..465.57 rows=5001 width=12)   Recheck Cond: (a = 1)   Filter: ((c = 2) AND ((b = 1) OR (b = 2)))   ->  Bitmap Index Scan on t_a_b_idx  (cost=0.00..155.29 rows=10001 width=0)         Index Cond: (a = 1)
(5 rows)


The query plan if v26[0] and v27[1] versions are equal and wrong in my opinion -where is c=2 expression?

v27 [1]
alena@postgres=# explain select * from t where a = 1 and (b = 1 or b = 2) and c = 2;                                  QUERY PLAN
------------------------------------------------------------------------------ Bitmap Heap Scan on t  (cost=165.85..474.87 rows=5001 width=12)   Recheck Cond: ((a = 1) AND (b = ANY ('{1,2}'::integer[])))   Filter: (c = 2)   ->  Bitmap Index Scan on t_a_b_idx  (cost=0.00..164.59 rows=10001 width=0)         Index Cond: ((a = 1) AND (b = ANY ('{1,2}'::integer[])))
(5 rows)
v26 [0]
alena@postgres=# explain select * from t where a = 1 and (b = 1 or b = 2) and c = 2;                                  QUERY PLAN
------------------------------------------------------------------------------ Bitmap Heap Scan on t  (cost=165.85..449.86 rows=5001 width=12)   Recheck Cond: ((a = 1) AND (b = ANY ('{1,2}'::integer[])))   Filter: (c = 2)   ->  Bitmap Index Scan on t_a_b_idx  (cost=0.00..164.59 rows=10001 width=0)         Index Cond: ((a = 1) AND (b = ANY ('{1,2}'::integer[])))
(5 rows)
I think both v26 and v27 are correct here.  The c = 2 condition is in
the Filter.
Yes, I see it and agree with that.
In addition, I noticed that the ANY expression will be formed only for first group and ignore for others, like in the sample bellow:

v26 version [0]:

alena@postgres=# explain select * from t where (b = 1 or b = 2) and (a = 2 or a=3);                                    QUERY PLAN
----------------------------------------------------------------------------------- Index Scan using t_a_b_idx on t  (cost=0.29..24.75 rows=2 width=12)   Index Cond: ((a = ANY ('{2,3}'::integer[])) AND (b = ANY ('{1,2}'::integer[])))
(2 rows)

v27 version [1]:

alena@postgres=# explain select * from t where (b = 1 or b = 2 or a = 2 or a=3);                       QUERY PLAN
-------------------------------------------------------- Seq Scan on t  (cost=0.00..509.00 rows=14999 width=12)   Filter: ((b = 1) OR (b = 2) OR (a = 2) OR (a = 3))
(2 rows)
Did you notice you're running different queries on v26 and v27 here?
If you will run ton v27 the same query you run on v26, the plan also
will be the same.

alena@postgres=# create index a_idx on t(a);
CREATE INDEX
alena@postgres=# create index b_idx on t(b);
CREATE INDEX
alena@postgres=# analyze;
ANALYZE

v26:

alena@postgres=# explain select * from t where (b = 1 or b = 2 or a = 2 or a=3);                                     QUERY PLAN
------------------------------------------------------------------------------------ Bitmap Heap Scan on t  (cost=17.18..30.94 rows=4 width=12)   Recheck Cond: ((a = ANY ('{2,3}'::integer[])) OR (a = ANY ('{2,3}'::integer[])))   ->  BitmapOr  (cost=17.18..17.18 rows=4 width=0)         ->  Bitmap Index Scan on a_idx  (cost=0.00..8.59 rows=2 width=0)               Index Cond: (a = ANY ('{2,3}'::integer[]))         ->  Bitmap Index Scan on a_idx  (cost=0.00..8.59 rows=2 width=0)               Index Cond: (a = ANY ('{2,3}'::integer[]))
(7 rows)

v27:

alena@postgres=# explain select * from t where (b = 1 or b = 2 or a = 2 or a=3);                       QUERY PLAN
-------------------------------------------------------- Seq Scan on t  (cost=0.00..509.00 rows=14999 width=12)   Filter: ((b = 1) OR (b = 2) OR (a = 2) OR (a = 3))
(2 rows)

The behavior in version 26 is incorrect, but in version 27, it does not select anything other than seqscan
Please, check that there is still possibility to the generate BitmapOr plan.
It is fine, I think. The transformation works, but due to the fact that index columns are different for two indexes, the transformation hasn't been applied.

# explain select * from t where (b = 1 or b = 2 or a = 2 or a = 3);                                     QUERY PLAN
------------------------------------------------------------------------------------ Bitmap Heap Scan on t  (cost=326.16..835.16 rows=14999 width=12)   Recheck Cond: ((b = 1) OR (b = 2) OR (a = 2) OR (a = 3))   ->  BitmapOr  (cost=326.16..326.16 rows=20000 width=0)         ->  Bitmap Index Scan on t_b_c_idx  (cost=0.00..151.29
rows=10000 width=0)               Index Cond: (b = 1)         ->  Bitmap Index Scan on t_b_c_idx  (cost=0.00..151.29
rows=10000 width=0)               Index Cond: (b = 2)         ->  Bitmap Index Scan on t_a_b_idx  (cost=0.00..4.29 rows=1 width=0)               Index Cond: (a = 2)         ->  Bitmap Index Scan on t_a_b_idx  (cost=0.00..4.29 rows=1 width=0)               Index Cond: (a = 3)

It has higher cost than SeqScan plan, but I think it would be selected
on larger tables.  And yes, this is not ideal, because it fails to
generate BitmapOr over two IndexScans on SAOPs.  But it's not worse
than what current master does.  An optimization doesn't have to do
everything it could possible do.  So, I think this could be improved
in a separate patch.

Links
1. https://www.postgresql.org/message-id/CAPpHfdvhWE5pArZhgJeLViLx3-A3rxEREZvfkTj3E%3Dh7q-Bx9w%40mail.gmail.com
2. https://www.postgresql.org/message-id/CAPpHfdtSXxhdv3mLOLjEewGeXJ%2BFtfhjqodn1WWuq5JLsKx48g%40mail.gmail.com

Yes, I see and agree with you.

To be honest, I have found a big problem in this patch - we try to perform the transformation every time we examime a column:

for (indexcol = 0; indexcol < index->nkeycolumns; indexcol++) { ...

}

I have fixed it and moved the transformation before going through the loop.

I try to make an array expression for "OR" expr, but at the same time I form the result as an "AND" expression, consisting of an "Array" expression and "OR" expressions, and then I check whether there is an index for this column, if so, I save it and write down the transformation. I also had to return the previous part of the patch, where we formed "ANY" groups, since we could end up with several such groups. I hope I made my idea clear, but if not, please tell me.

Unfortunately, I have got the different result one of the query from regression tests and I'm not sure if it is correct:

diff -U3 /home/alena/postgrespro_or3/src/test/regress/expected/create_index.out /home/alena/postgrespro_or3/src/test/regress/results/create_index.out --- /home/alena/postgrespro_or3/src/test/regress/expected/create_index.out 2024-07-23 18:51:13.077311360 +0300 +++ /home/alena/postgrespro_or3/src/test/regress/results/create_index.out 2024-07-25 16:43:56.895132328 +0300 @@ -1860,13 +1860,14 @@ EXPLAIN (COSTS OFF) SELECT * FROM tenk1 WHERE thousand = 42 AND (tenthous = 1 OR tenthous = (SELECT 1 + 2) OR tenthous = 42); - QUERY PLAN ----------------------------------------------------------------------------------------- + QUERY PLAN +--------------------------------------------------------------------------------- Index Scan using tenk1_thous_tenthous on tenk1 - Index Cond: ((thousand = 42) AND (tenthous = ANY (ARRAY[1, (InitPlan 1).col1, 42]))) + Index Cond: ((thousand = 42) AND (tenthous = ANY ('{1,-1,42}'::integer[]))) + Filter: ((tenthous = 1) OR (tenthous = (InitPlan 1).col1) OR (tenthous = 42)) InitPlan 1 -> Result -(4 rows) +(5 rows) SELECT * FROM tenk1 WHERE thousand = 42 AND (tenthous = 1 OR tenthous = (SELECT 1 + 2) OR tenthous = 42);

I'm researching what's wrong here now.

-- 
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Dagfinn Ilmari Mannsåker
Дата:
Сообщение: Re: CREATE MATERIALIZED VIEW
Следующее
От: Dagfinn Ilmari Mannsåker
Дата:
Сообщение: Re: CREATE MATERIALIZED VIEW