Обсуждение: Differents execution times with gin index, prepared statement and literals.

Поиск
Список
Период
Сортировка

Differents execution times with gin index, prepared statement and literals.

От
Pierrick Chovelon
Дата:

Hello,

One of our customer is facing an issue regarding a prepared statement and a gin index.
An important gap between execution times can be shown when using execute or plain literals.

Here is the test that shown this issue :

Init table :

create table tmp_tk_test_index
(
    sync_id           bigint,
    line_id           varchar(50),
    chk_upgrade_index smallint default 0
);

create index idx_tmp_tk_test_index_1
    on tmp_tk_test_index using gin (sync_id, line_id);

insert into tmp_tk_test_index SELECT tb_seq as sync_id, case when tb_seq = 950000 then 'the-test-value-fa529a621a15' else gen_random_uuid()::text end as line_id FROM generate_series(1,1000000) as tb_seq;

Prepare query :

prepare  stmt(bigint, text, int) as delete from  tmp_tk_test_index where sync_id <= $1 and line_id = $2 and chk_upgrade_index = $3;

And then execute it :

postgres=# begin;
BEGIN
postgres=*# explain (analyse) execute stmt(950000, 'the-test-value-fa529a621a15', 0);
                                                                 QUERY PLAN                                            

              

-----------------------------------------------------------------------------------------------------------------------

 Delete on tmp_tk_test_index  (cost=212.52..2662.59 rows=0 width=0) (actual time=60.766..60.767 rows=0 loops=1)

   ->  Bitmap Heap Scan on tmp_tk_test_index  (cost=212.52..2662.59 rows=4 width=6) (actual time=60.756..60.758 rows=1 

loops=1)

         Recheck Cond: ((sync_id <= '950000'::bigint) AND ((line_id)::text = 'the-test-value-fa529a621a15'::text))

         Filter: (chk_upgrade_index = 0)

         Heap Blocks: exact=1

         ->  Bitmap Index Scan on idx_tmp_tk_test_index_1  (cost=0.00..212.52 rows=810 width=0) (actual time=60.745..60

.745 rows=1 loops=1)

               Index Cond: ((sync_id <= '950000'::bigint) AND ((line_id)::text = 'the-test-value-fa529a621a15'::text))

 Planning Time: 6.765 ms

 Execution Time: 61.160 ms

(9 rows)
postgres=*# rollback ;
ROLLBACK

It takes 61.160ms to be executed. However, the "same" query without a prepared statement is far faster :

postgres=# begin;
BEGIN
postgres=*# explain analyze delete from  tmp_tk_test_index where sync_id <= 950000 and line_id = 'the-test-value-fa529a621a15' and chk_upgrade_index = 0;
                                                              QUERY PLAN                                               
                
-----------------------------------------------------------------------------------------------------------------------
----------------
 Delete on tmp_tk_test_index  (cost=21.36..25.38 rows=0 width=0) (actual time=0.084..0.085 rows=0 loops=1)
   ->  Bitmap Heap Scan on tmp_tk_test_index  (cost=21.36..25.38 rows=1 width=6) (actual time=0.042..0.043 rows=1 loops
=1)
         Recheck Cond: ((line_id)::text = 'the-test-value-fa529a621a15'::text)
         Filter: ((sync_id <= 950000) AND (chk_upgrade_index = 0))
         Heap Blocks: exact=1
         ->  Bitmap Index Scan on idx_tmp_tk_test_index_1  (cost=0.00..21.36 rows=1 width=0) (actual time=0.027..0.028 
rows=1 loops=1)
               Index Cond: ((line_id)::text = 'the-test-value-fa529a621a15'::text)
 Planning Time: 0.325 ms
 Execution Time: 0.148 ms
(9 rows)
postgres=*# rollback ;
ROLLBACK

Only 0.148ms to be executed...

What can cause this quite important gap ? (0.148ms vs 61.160ms).
In the explain plan, I can see that the Index Cond is different : in the slowest case (sync_id <= '950000'::bigint) is present.
Why isn't it present in the second case ?

Two more tests :

The prepared statement is tweaked with sync_id+0. The query is fast :

prepare stmt3(bigint, text, int) as delete from tmp_tk_test_index where sync_id+0 <= $1 and line_id = $2 and chk_upgrade_index = $3;
explain (analyse) execute stmt3(950000, 'the-test-value-fa529a621a15', 0);
Delete on tmp_tk_test_index  (cost=21.36..25.38 rows=0 width=0) (actual time=0.057..0.058 rows=0 loops=1)
   ->  Bitmap Heap Scan on tmp_tk_test_index  (cost=21.36..25.38 rows=1 width=6) (actual time=0.040..0.042 rows=1 loops
=1)
         Recheck Cond: ((line_id)::text = 'the-test-value-fa529a621a15'::text)
         Filter: ((chk_upgrade_index = 0) AND ((sync_id + 0) <= '950000'::bigint))
         Heap Blocks: exact=1
         ->  Bitmap Index Scan on idx_tmp_tk_test_index_1  (cost=0.00..21.36 rows=1 width=0) (actual time=0.020..0.020 
rows=1 loops=1)
               Index Cond: ((line_id)::text = 'the-test-value-fa529a621a15'::text)
 Planning Time: 0.355 ms
 Execution Time: 0.101 ms
(9 rows)

And one with an index but without sync_id inside it. Prepared statement isn't changed.

create index idx_tmp_tk_test_index_2 on tmp_tk_test_index using gin (line_id);

 Delete on tmp_tk_test_index  (cost=21.36..25.38 rows=0 width=0) (actual time=0.052..0.053 rows=0 loops=1)
   ->  Bitmap Heap Scan on tmp_tk_test_index  (cost=21.36..25.38 rows=1 width=6) (actual time=0.036..0.038 rows=1 loops
=1)
         Recheck Cond: ((line_id)::text = 'the-test-value-fa529a621a15'::text)
         Filter: ((sync_id <= '950000'::bigint) AND (chk_upgrade_index = 0))
         Heap Blocks: exact=1
         ->  Bitmap Index Scan on idx_tmp_tk_test_index_2  (cost=0.00..21.36 rows=1 width=0) (actual time=0.020..0.021 
rows=1 loops=1)
               Index Cond: ((line_id)::text = 'the-test-value-fa529a621a15'::text)
 Planning Time: 0.439 ms
 Execution Time: 0.098 ms
(9 rows

Quite fast as well...

Have you got an idea on the initial issue ? Why when using a prepared statement and a gin index the execution time "explode" ?
Something to do with the planner ? optimizer ?

(We executed the same test with a btree index and execution times are the same in both cases).

Regards







On 7/16/24 17:43, Pierrick Chovelon wrote:
> ...
>
> Quite fast as well...
> 
> Have you got an idea on the initial issue ? Why when using a prepared
> statement and a gin index the execution time "explode" ?
> Something to do with the planner ? optimizer ?
> 
> (We executed the same test with a btree index and execution times are
> the same in both cases).
> 

The reason why the two queries end up with different plans is pretty
simple - the condition ends up matching different operators, because of
data type difference. In case of the prepared query, the (x <= 950000)
matches <=(bigint,bitint) operator, and thus it matches the index. But
that happens because the query is prepared with bigint parameter. For
the regular query, the 950000 literal gets treated as int, the condition
matches to <=(bigint,int) and that does not match the index - hence it's
treated as a filter, not an index condition.

If you cast the literal to bigint (by doing ::bigint) in the regular
query, we end it'll use the same same plan as the prepared query - but
that's the slower one, unfortunately :-(

Which gets us to why that plan is slower, compared to the plan using
fewer conditions. I think the problem is that <= 950000 matches most of
the table, which means the GIN index will have to load and process a
pretty long TID list, which is clearly not cheap.

I don't think there's much you can do do - we don't consider this when
matching conditions to the index, we simply match as many conditions as
possible. And the GIN code is not smart enough to make judgements about
which columns to process first - it just goes column by column and
builds the bitmap, and building a bitmap on 95% of the table is costly.

If this is a systemic problem for most/all queries (i.e. it's enough to
have a condition on line_id), I believe the +0 trick is a good way to
make sure the condition is treated as a filter.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Differents execution times with gin index, prepared statement and literals.

От
Pierrick Chovelon
Дата:

Hello,

Thanks a lot for your clear answer.

On 16/07/2024 19:54, Tomas Vondra wrote:
On 7/16/24 17:43, Pierrick Chovelon wrote:
...

Quite fast as well...

Have you got an idea on the initial issue ? Why when using a prepared
statement and a gin index the execution time "explode" ?
Something to do with the planner ? optimizer ?

(We executed the same test with a btree index and execution times are
the same in both cases).

The reason why the two queries end up with different plans is pretty
simple - the condition ends up matching different operators, because of
data type difference. In case of the prepared query, the (x <= 950000)
matches <=(bigint,bitint) operator, and thus it matches the index. But
that happens because the query is prepared with bigint parameter. For
the regular query, the 950000 literal gets treated as int, the condition
matches to <=(bigint,int) and that does not match the index - hence it's
treated as a filter, not an index condition.

If you cast the literal to bigint (by doing ::bigint) in the regular
query, we end it'll use the same same plan as the prepared query - but
that's the slower one, unfortunately :-(
I try the following thing :
postgres=# prepare stmt(int, text, int) as delete from  tmp_tk_test_index where sync_id <= $1 and line_id = $2 and chk_upgrade_index = $3;
PREPARE
postgres=# begin ;
BEGIN
postgres=*# explain (analyse) execute stmt(950000, 'the-test-value-fa529a621a15', 0);
                                                              QUERY PLAN                                               
-----------------------------------------------------------------------------------------------------------------------
 Delete on tmp_tk_test_index  (cost=21.36..25.38 rows=0 width=0) (actual time=0.148..0.149 rows=0 loops=1)
   ->  Bitmap Heap Scan on tmp_tk_test_index  (cost=21.36..25.38 rows=1 width=6) (actual time=0.146..0.147 rows=0 loops=1)
         Recheck Cond: ((line_id)::text = 'the-test-value-fa529a621a15'::text)
         Filter: ((sync_id <= 950000) AND (chk_upgrade_index = 0))
         Heap Blocks: exact=1
         ->  Bitmap Index Scan on idx_tmp_tk_test_index_1  (cost=0.00..21.36 rows=1 width=0) (actual time=0.099..0.099 rows=1 loops=1)
               Index Cond: ((line_id)::text = 'the-test-value-fa529a621a15'::text)
 Planning Time: 9.412 ms
 Execution Time: 1.570 ms
(9 rows)
postgres=*# rollback ;
ROLLBACK

So preparing a query with a data type different from the column (int (prepared statement) vs bigint (table)) is faster in our case :/
It doesn't sound obvious to me :)

Thanks again for your answer Tomas.

Which gets us to why that plan is slower, compared to the plan using
fewer conditions. I think the problem is that <= 950000 matches most of
the table, which means the GIN index will have to load and process a
pretty long TID list, which is clearly not cheap.

I don't think there's much you can do do - we don't consider this when
matching conditions to the index, we simply match as many conditions as
possible. And the GIN code is not smart enough to make judgements about
which columns to process first - it just goes column by column and
builds the bitmap, and building a bitmap on 95% of the table is costly.

If this is a systemic problem for most/all queries (i.e. it's enough to
have a condition on line_id), I believe the +0 trick is a good way to
make sure the condition is treated as a filter.


regards

-- 
Pierrick Chovelon
Consultant DBA PostgreSQL - Dalibo