Re: Query with straightforward plan changes and becomes 6520 times slower after VACUUM ANALYZE

From Pavel Stehule
Subject Re: Query with straightforward plan changes and becomes 6520 times slower after VACUUM ANALYZE
Date
Msg-id CAFj8pRAKo9JUWnEXyYbMbknYWuRXxCLubU0Yu3HhVLUGM1ci3Q@mail.gmail.com
In reply to Query with straightforward plan changes and becomes 6520 times slower after VACUUM ANALYZE  (Jan Kort <jan.kort@genetics.nl>)
Responses RE: Query with straightforward plan changes and becomes 6520 times slower after VACUUM ANALYZE  (Jan Kort <jan.kort@genetics.nl>)
List pgsql-bugs
Hi

On Tue, 18 May 2021 at 15:36, Jan Kort <jan.kort@genetics.nl> wrote:

Hi,

 

I tried this on both PostgreSQL 12.7 and 13.3, with both VACUUM ANALYZE and VACUUM FULL ANALYZE; both change a straightforward plan into a reversed, complex plan that makes the query slow. The only thing that works is a pg_dump and pg_restore, but that is not feasible.

 

It seems to depend on specific data. I tried generating other variations of around 1M records, but so far the attached DB (also generated) is the only one where I could reproduce it; I can't figure out what is so special about this data.

 

Given the following record counts:

 

GFO_ZAKEN: 1048587

GFO_ZAKEN_KOSTEN: 3

GFO_ZAKEN_TYPECODE: 4

GFO_ZAKEN.ZAAKTYPECODE_ID is set in 9 rows; the rest are null.

 

When I run this query:

 

EXPLAIN ANALYZE

SELECT GFO_ZAKEN_KOSTEN.ID AS GFO_ZAKEN_KOSTEN_ID,

       GFO_ZAKEN.ID AS GFO_ZAKEN_ID,

       GFO_ZAKEN_TYPECODE.OMSCHRIJVING AS ZAAKTYPECODE_ID

  FROM TRIAL.GFO_ZAKEN_KOSTEN

  JOIN TRIAL.GFO_ZAKEN ON GFO_ZAKEN_KOSTEN.GFO_ZAKEN_ID = GFO_ZAKEN.ID

  JOIN TRIAL.GFO_ZAKEN_TYPECODE ON GFO_ZAKEN.ZAAKTYPECODE_ID = GFO_ZAKEN_TYPECODE.ID

 

It takes around 500 ms, which I associate with a full table scan, even though these are just simple referential constraints with corresponding indexes:

 

CREATE TABLE TRIAL.GFO_ZAKEN_TYPECODE (ID INTEGER PRIMARY KEY, OMSCHRIJVING CHARACTER VARYING(4000));

CREATE TABLE TRIAL.GFO_ZAKEN (ID INTEGER PRIMARY KEY, ZAAKTYPECODE_ID INTEGER,

  CONSTRAINT ZAAKTYPECODE_IDC1 FOREIGN KEY (ZAAKTYPECODE_ID) REFERENCES TRIAL.GFO_ZAKEN_TYPECODE (ID));

CREATE INDEX GFO_ZAKENO18 ON TRIAL.GFO_ZAKEN USING BTREE (ZAAKTYPECODE_ID);

CREATE TABLE TRIAL.GFO_ZAKEN_KOSTEN (ID INTEGER PRIMARY KEY, GFO_ZAKEN_ID INTEGER,

  CONSTRAINT GFO_ZAKEN_IDC1 FOREIGN KEY (GFO_ZAKEN_ID) REFERENCES TRIAL.GFO_ZAKEN (ID));

CREATE INDEX GFO_ZAKEN_KOSTENO14 ON TRIAL.GFO_ZAKEN_KOSTEN USING BTREE (GFO_ZAKEN_ID ASC NULLS LAST);

 

After pg_restore it gives a straightforward plan, starting with the gfo_zaken_kosten primary key and continuing with the join on gfo_zaken_kosten.gfo_zaken_id:

 

Nested Loop  (cost=0.56..3.64 rows=1 width=524) (actual time=0.036..0.037 rows=1 loops=1)

  ->  Nested Loop  (cost=0.43..3.48 rows=1 width=12) (actual time=0.030..0.030 rows=1 loops=1)

        ->  Seq Scan on gfo_zaken_kosten  (cost=0.00..1.04 rows=1 width=8) (actual time=0.016..0.016 rows=1 loops=1)

              Filter: (id = 13)

              Rows Removed by Filter: 2

        ->  Index Scan using gfo_zakenp on gfo_zaken  (cost=0.43..2.45 rows=1 width=8) (actual time=0.012..0.012 rows=1 loops=1)

              Index Cond: (id = gfo_zaken_kosten.gfo_zaken_id)

  ->  Index Scan using gfo_zaken_typecodep on gfo_zaken_typecode  (cost=0.13..0.15 rows=1 width=520) (actual time=0.005..0.005 rows=1 loops=1)

        Index Cond: (id = gfo_zaken.zaaktypecode_id)

Planning Time: 1.538 ms

Execution Time: 0.095 ms

 

After VACUUM ANALYZE the plan becomes inefficient again and no longer starts with the gfo_zaken_kosten primary key; instead it starts at the wrong end, with an index scan over 1M rows:

 

Merge Join  (cost=1.48..1.59 rows=1 width=159) (actual time=619.374..619.376 rows=1 loops=1)

  Merge Cond: (gfo_zaken.id = gfo_zaken_kosten.gfo_zaken_id)

  ->  Nested Loop  (cost=0.43..96503.47 rows=1048587 width=155) (actual time=0.022..619.359 rows=9 loops=1)

        Join Filter: (gfo_zaken.zaaktypecode_id = gfo_zaken_typecode.id)

        Rows Removed by Join Filter: 4194316

        ->  Index Scan using gfo_zakenp on gfo_zaken  (cost=0.43..33587.23 rows=1048587 width=8) (actual time=0.006..141.167 rows=1048587 loops=1)

        ->  Materialize  (cost=0.00..1.06 rows=4 width=155) (actual time=0.000..0.000 rows=4 loops=1048587)

              ->  Seq Scan on gfo_zaken_typecode  (cost=0.00..1.04 rows=4 width=155) (actual time=0.011..0.012 rows=4 loops=1)

  ->  Sort  (cost=1.05..1.05 rows=1 width=8) (actual time=0.012..0.012 rows=1 loops=1)

        Sort Key: gfo_zaken_kosten.gfo_zaken_id

        Sort Method: quicksort  Memory: 25kB

        ->  Seq Scan on gfo_zaken_kosten  (cost=0.00..1.04 rows=1 width=8) (actual time=0.008..0.008 rows=1 loops=1)

              Filter: (id = 13)

              Rows Removed by Filter: 2

Planning Time: 69.151 ms

Execution Time: 619.410 ms
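For reference, the per-column statistics that feed these row estimates can be inspected through the standard pg_stats view (a sketch; schema and column names are the ones from the DDL above):

```sql
-- Show the statistics ANALYZE collected for the join columns.
-- null_frac close to 1.0 for zaaktypecode_id would explain why the
-- planner's "same distribution on both sides" assumption goes wrong.
SELECT tablename, attname, null_frac, n_distinct, most_common_vals
  FROM pg_stats
 WHERE schemaname = 'trial'
   AND attname IN ('zaaktypecode_id', 'gfo_zaken_id');
```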

 



This is not a bug. You can see a very bad estimation on the join. Postgres expects that a value has the same probability in both tables; in your case this is not true. Unfortunately, Postgres does not have multi-table statistics, so there is no easy solution. Usually you need to divide your query in two. And maybe you can check your data to see why the predicate gfo_zaken.zaaktypecode_id = gfo_zaken_typecode.id is almost invalid.
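One possible way to divide the query in two is a materialized CTE, which evaluates the small gfo_zaken_kosten side first and shields it from being merged into the outer join planning (a sketch; WITH ... AS MATERIALIZED is available in the PostgreSQL 12 and 13 versions mentioned above):

```sql
-- Evaluate the tiny filtered table first, then join outward from it.
WITH kosten AS MATERIALIZED (
    SELECT id, gfo_zaken_id
      FROM trial.gfo_zaken_kosten
     WHERE id = 13
)
SELECT k.id          AS gfo_zaken_kosten_id,
       z.id          AS gfo_zaken_id,
       t.omschrijving AS zaaktypecode_id
  FROM kosten k
  JOIN trial.gfo_zaken z          ON k.gfo_zaken_id = z.id
  JOIN trial.gfo_zaken_typecode t ON z.zaaktypecode_id = t.id;
```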

Regards

Pavel Stehule


So it is 6520 times slower after VACUUM ANALYZE than before.

 

Doing VACUUM ANALYZE again doesn't make it better; only pg_dump + pg_restore goes back to the original plan, and then it breaks again on the first VACUUM ANALYZE.
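As a session-local diagnostic (not a fix), the merge join can be discouraged to check whether the planner would otherwise choose the fast nested-loop plan; enable_mergejoin is a standard PostgreSQL planner setting:

```sql
-- Experiment only: disable merge joins for this transaction and
-- re-run the original query to compare the chosen plan.
BEGIN;
SET LOCAL enable_mergejoin = off;
EXPLAIN ANALYZE
SELECT GFO_ZAKEN_KOSTEN.ID AS GFO_ZAKEN_KOSTEN_ID,
       GFO_ZAKEN.ID AS GFO_ZAKEN_ID,
       GFO_ZAKEN_TYPECODE.OMSCHRIJVING AS ZAAKTYPECODE_ID
  FROM TRIAL.GFO_ZAKEN_KOSTEN
  JOIN TRIAL.GFO_ZAKEN ON GFO_ZAKEN_KOSTEN.GFO_ZAKEN_ID = GFO_ZAKEN.ID
  JOIN TRIAL.GFO_ZAKEN_TYPECODE ON GFO_ZAKEN.ZAAKTYPECODE_ID = GFO_ZAKEN_TYPECODE.ID;
ROLLBACK;
```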

 

I attached an 800 KB test DB with all sensitive data removed.

 

I tried both the default config without changes, and the default config with our settings appended:

 

max_connections = 1000

shared_buffers = 512MB

effective_cache_size = 6GB

work_mem = 10485kB

maintenance_work_mem = 512MB

min_wal_size = 1GB

max_wal_size = 2GB

checkpoint_completion_target = 0.7

wal_buffers = 16MB

default_statistics_target = 100

random_page_cost = 1

wal_sync_method = open_datasync

fsync = on

synchronous_commit = off

 

Doing a VACUUM ANALYZE shouldn’t change a straightforward plan.

 

Regards,

 

Jan Kort

 

 

In the pgsql-bugs list, by date sent:

Previous
From: PG Bug reporting form
Date:
Message: BUG #17018: Two versions of the same row of records are returned in one query
Next
From: Tom Lane
Date:
Message: Re: BUG #17017: Two versions of the same row of records are returned in one query