Re: BUG #17652: Performance degradation after migrating from v13.4 to v14.4

Поиск
Список
Период
Сортировка
От Michael Guissine
Тема Re: BUG #17652: Performance degradation after migrating from v13.4 to v14.4
Дата
Msg-id CACxDrAkYZA44Sn2wbcE+waD45EKc5gD2D-FBDAsuGCikKemVAw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: BUG #17652: Performance degradation after migrating from v13.4 to v14.4  (Jeff Janes <jeff.janes@gmail.com>)
Список pgsql-bugs
Hi Jeff, this is super useful, thanks for spending your time explaining the internals. In our case, we ended up dropping the trgm index and everything is back to normal but I like your idea of reversing the column order, will give it a try!

~ Michael

On Thu, Oct 20, 2022 at 1:03 AM Jeff Janes <jeff.janes@gmail.com> wrote:
On Wed, Oct 19, 2022 at 3:44 PM Michael Guissine <mguissine@gmail.com> wrote:
paritioned_table_hash_p3_partition_key_text_idx is in fact pg_trgm index, you are right! 
interesting, why would it use that index at all on some partitions but not on others? the query is a simple match on 3 columns and we have other indexes to cover it 

The two index choices are (falsely) similar in cost, so just small differences in the selectivity estimates is enough to drive one over the other from partition to partition. 

There are (at least) two planner deficiencies at work here.

For the actually faster index, it thinks it will need to re-descend the index one time for every value in the IN-list on the leading column, and each of those re-descents is going to end up on a different leaf page, charging about one random_io_cost for each one.  But it is counting a re-descent for every item in the IN-list, for every partition.  But it only needs to re-descend for the items of the IN-list which also happen to match the remainder for that hash partition. This is incorrectly inflating the cost of the faster index.  (You could argue that if the executor really does do the re-descents, then it is correct for the planner to account for them.  But I suspect that many of the hopeless re-descents may end up on leaf pages already in memory.  Anyway, fixing the planner and fixing the executor would probably go hand in hand) 

For the slower index, the gin code doesn't estimate how many rechecks there will be, only how many end results there will be.  But the rechecks are where most of the true cost is.  So that makes this one end up with an incorrectly low cost.  The overestimation of one and the underestimation of the other bring the two estimates close enough together that random variations in the stats can drive the choice between them.  Now you could say "all the row estimates are 1, so how could there be any variation"?  But notice that for the bitmap heap scans, there is a filter step which removes around 1000 rows from each partition.  We know how many rows were actually removed from each partition, but we are not told how many rows the planner thought it would remove.  Variations in this unseen estimated filtered number  would lead to variations in the cost estimate.

While fixing the planner would be nice, I think the best current solution for you is to change the order of columns in the 3-column index to be (text, start_time, key).  Since the condition on "text" is extremely selective, the planner should know that by the time it gets to the key column it can check it with a single hash-table probe rather than a bunch of costly re-descents.  This should be estimated to be faster than either of the existing two indexes, and also should actually be faster.

Cheers,

Jeff

В списке pgsql-bugs по дате отправления:

Предыдущее
От: PG Bug reporting form
Дата:
Сообщение: BUG #17658: Download release 15 without success
Следующее
От: Richard Guo
Дата:
Сообщение: Re: Documentation error