Re: Bad query plan decision when using multiple column index -postgresql uses only first column then filters

Поиск

Список

Период

Сортировка

От	Cosmin Prund
Тема	Re: Bad query plan decision when using multiple column index -postgresql uses only first column then filters
Дата	16 января 2020 г. 17:18:24
Msg-id	CAGU4dz8BrUvdgt6c5Povgj--aQee_psbRyaYqdx_ffyUh7FBmg@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Bad query plan decision when using multiple column index - postgresql uses only first column then filters (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы	Re: Bad query plan decision when using multiple column index -postgresql uses only first column then filters Re: Bad query plan decision when using multiple column index - postgresql uses only first column then filters
Список	pgsql-performance

Дерево обсуждения

Hi Tom, and thanks.

Running ANALYZE doesn't change a thing. REINDEXING doesn't change a thing. I know it's an odd choice of plan - that's why I'm here!

I thought I'd just post what felt relevant, hoping it's not something out of the ordinary and I'm just missing something obvious.

Here's lots of data:

===========================================================================

SELECT version()

PostgreSQL 10.10 (Ubuntu 10.10-0ubuntu0.18.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0, 64-bit

===========================================================================

===========================================================================

SELECT relname, relpages, reltuples, relallvisible, relkind, relnatts, relhassubclass, reloptions, pg_table_size(oid) FROM pg_class WHERE relname='LucrareBugetDate';
relname | relpages | reltuples | relallvisible | relkind | relnatts | relhassubclass | reloptions | pg_table_size
------------------+----------+-------------+---------------+---------+----------+----------------+-----------------+---------------
LucrareBugetDate | 2659660 | 4.17124e+07 | 671510 | r | 25 | f | {fillfactor=50} | 21793775616
(1 row)

===========================================================================

Does the table have anything unusual about it?

contains large objects: NO
has a large proportion of NULLs in several columns: NO
receives a large number of UPDATEs or DELETEs regularly: YES - Lots of UPDATES but no UPDATES to indexed columns. No DELETE's.
is growing rapidly: I'm inserting millions of records at once but not very often. Have manually done ANALYZE and REINDEX
has many indexes on it: NO
uses triggers that may be executing database functions, or is calling functions directly: NO

===========================================================================

EXPLAIN (ANALYZE, BUFFERS) select "IdRand", "IdColoana", "Valoare" from "LucrareBugetDate" where ("LucrareBugetVersiuneId" = 92) and ("LucrareBugetDateId" in (10,11));
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using "IX_LucrareBugetDate_LucrareBugetVersiuneId_LucrareBugetDateId" on "LucrareBugetDate" (cost=0.56..2.37 rows=1 width=13) (actual time=0.096..978.398 rows=2 loops=1)
Index Cond: ("LucrareBugetVersiuneId" = 92)
Filter: ("LucrareBugetDateId" = ANY ('{10,11}'::integer[]))
Rows Removed by Filter: 1869178
Buffers: shared hit=161178
Planning time: 0.699 ms
Execution time: 978.433 ms

===========================================================================

Was this query always slow, or has it gotten slower over time? If the plan/execution of the query used to be different, do you have copies of those query plans? Has anything changed in your database other than the accumulation of data?

The query is usually instantaneous.

Here's the same query ran o a different server running the same database with comparable data (COLD server, frist run! The second run has execution time = 0.040ms):

EXPLAIN (ANALYZE, BUFFERS) select "IdRand", "IdColoana", "Valoare" from "LucrareBugetDate" where ("LucrareBugetVersiuneId" = 92) and ("LucrareBugetDateId" in (10,11));

Index Scan using "PK_LucrareBugetDate" on "LucrareBugetDate" (cost=0.56..4.85 rows=2 width=13) (actual time=22.922..23.123 rows=2 loops=1)

Index Cond: (("LucrareBugetVersiuneId" = 92) AND ("LucrareBugetDateId" = ANY ('{10,11}'::integer[])))

Buffers: shared hit=12 read=4

Planning time: 66.743 ms

Execution time: 23.190 ms

===========================================================================

Hardware:

2 x Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz, 128 GBhz, Local ZFS-based storage built from 4 x NVME SSD drives.

I doubt it's hardware related.

===========================================================================

SELECT * FROM pg_stat_user_tables WHERE relname='table_name';

-------+------------+------------------+----------+--------------+----------+---------------+-----------+-----------+-----------+---------------+------------+------------+---------------------+-------------------------------+-----------------+-------------------------------+-------------------------------+--------------+------------------+---------------+-------------------
20655 | public | LucrareBugetDate | 306 | 7765749768 | 8398680 | 983464378904 | 58388025 | 2944618 | 16675590 | 2887093 | 41712435 | 61524 | 2588381 | 2019-11-03 19:15:58.765546+00 | | 2020-01-15 16:11:26.301756+00 | 2019-12-20 10:12:53.737619+00 | 1 | 0 | 40 | 12

===========================================================================

SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_MCV, tablename, attname, inherited, null_frac, n_distinct, array_length(most_common_vals,1) n_mcv, array_length(histogram_bounds,1) n_hist, correlation FROM pg_stats WHERE attname='LucrareBugetDateId' AND tablename='LucrareBugetDate' ORDER BY 1 DESC;

===========================================================================

SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_MCV, tablename, attname, inherited, null_frac, n_distinct, array_length(most_common_vals,1) n_mcv, array_length(histogram_bounds,1) n_hist, correlation FROM pg_stats WHERE attname='LucrareBugetVersiuneId' AND tablename='LucrareBugetDate' ORDER BY 1 DESC;

===========================================================================

I think I went through most of https://wiki.postgresql.org/wiki/SlowQueryQuestions

I can provide more information if helpful.

On Thu, 16 Jan 2020 at 17:11, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Cosmin Prund <cprund@gmail.com> writes:
> explain analyze
> select R, C, V from LBD
> where Ver = 92 and Id in (10,11)

> Index Scan using "IX_LBD_Ver_Id" on "LBD" (cost=0.56..2.37 rows=1
> width=13) (actual time=0.063..857.725 rows=2 loops=1)
> Index Cond: ("Ver" = 92)
> Filter: ("Id" = ANY ('{10,11}'::integer[]))
> Rows Removed by Filter: 1869178
> Planning time: 0.170 ms
> Execution time: 857.767 ms

> The IX_LBD_Ver_Id index is on two columns (Ver, Id) - it's not in "Ver"
> alone!

Seems like an odd choice of plan, then, but you haven't provided any
detail that would let anyone guess why it's not using the second index
column. For starters it would be good to show the exact table and
index schema (eg via \d+ in psql). Also, does explicitly ANALYZE'ing
the table change anything?

https://wiki.postgresql.org/wiki/Slow_Query_Questions

regards, tom lane

В списке pgsql-performance по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Bad query plan decision when using multiple column index -postgresql uses only first column then filters