Re: Bad query plan decision when using multiple column index -postgresql uses only first column then filters

Поиск
Список
Период
Сортировка
От Michael Lewis
Тема Re: Bad query plan decision when using multiple column index -postgresql uses only first column then filters
Дата
Msg-id CAHOFxGohqdj7VPrWKdQGUFyo6zOVrrF-ACTSr528oo9P0XbXMw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Bad query plan decision when using multiple column index -postgresql uses only first column then filters  (Cosmin Prund <cprund@gmail.com>)
Список pgsql-performance
"Finally I ran "ANALYZE" again and now the problem went away. Running the query with Ver=92 uses the proper plan. I'm not happy with this - I know I haven't solved the problem (I've ran ANALYZE multiple times before)."

Does 92 appear in MCVs list with that new sampling? I wonder if default_statistics_target should be increased a bit to help ensure a thorough sample of the data in this table. Note- don't go too high (maybe 250, not 1000) or planning time can increase significantly. Also, perhaps only increase on this Ver column.

What is the real frequency of value 92? With default_statistics_target = 100, analyze takes 100*300 rows as sample, and if it is missed in that 30k rows set, or very very small when in fact it has equal weight with other values, then during planning time it is expected to be very very rare when in fact it is only slightly less common than the others in the list. If the others in the list are expected to be 100% of the table as you showed with the query to compute "frac_MCV" from pg_stats for that column, then perhaps the optimizer is wise to scan only the LucrareBugetVersiuneId column of the composite index and filter in memory.

Curious, when you get bad plans (re-analyze the table repeatedly to get new samples until the wrong plan is chosen), what does PG estimate for total rows returned with ONLY LucrareBugetVersiuneId = 92 as the where condition?

Note- Tom & Laurenz are real experts. I might have no idea what I am doing yet. It is too early to say.

On Thu, Jan 16, 2020 at 11:15 AM Cosmin Prund <cprund@gmail.com> wrote:
Hello Michael and hello again Tom, sorry for mailing you directly. I just hit Reply in gmail - I expected the emails to have a reply-to=Pgsql. Apparently they do not.

Running the same query with a different "Ver" produces a proper plan. Here's a non-redacted example (Ver=91):

EXPLAIN (ANALYZE, BUFFERS) select "IdRand", "IdColoana", "Valoare" from "LucrareBugetDate" where ("LucrareBugetVersiuneId" = 91) and ("LucrareBugetDateId" in (10,11));
                                                                                     QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using "IX_LucrareBugetDate_LucrareBugetVersiuneId_LucrareBugetDateId" on "LucrareBugetDate"  (cost=0.56..4.95 rows=2 width=13) (actual time=3.617..3.631 rows=2 loops=1)
   Index Cond: (("LucrareBugetVersiuneId" = 91) AND ("LucrareBugetDateId" = ANY ('{10,11}'::integer[])))
   Buffers: shared hit=9 read=3
 Planning time: 0.223 ms
 Execution time: 3.663 ms
(5 rows)

I have reindex everything, not just this INDEX.

"reltuples" for this table is 41712436.

> I'd be curious of the fraction in the MCVs frequency list in stats indicates that rows with Ver = 92 are rare and therefore the index on only Ver column is sufficient to find the rows quickly.

There are 25 valid values for "Ver" in this database. I ran the query for all of them. The only one miss-behaving is "92". I ran the query with random values for Ver (invalid values), the query plan always attempts to use the index using both values.
I looked into "most_common_values" in pg_stats, this value (92) is not in that list.
Finally I ran "ANALYZE" again and now the problem went away. Running the query with Ver=92 uses the proper plan. I'm not happy with this - I know I haven't solved the problem (I've ran ANALYZE multiple times before).


On Thu, 16 Jan 2020 at 19:00, Michael Lewis <mlewis@entrata.com> wrote:
Does the behavior change with different values of Ver column? I'd be curious of the fraction in the MCVs frequency list in stats indicates that rows with Ver = 92 are rare and therefore the index on only Ver column is sufficient to find the rows quickly. What is reltuples for this table by the way?

I also wonder if the situation may be helped by re-indexing the "index on both columns" to remove any chance of issues on bloat in the index. Which order are the columns by the way? If Ver is first, is there also an index on only id column?. Since you aren't on v12, you don't get to re-index concurrently but I assume you know the work around of create concurrently (different name), drop concurrently (old one), and finally rename new index.

В списке pgsql-performance по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Bad query plan decision when using multiple column index - postgresql uses only first column then filters
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Bad query plan decision when using multiple column index - postgresql uses only first column then filters