Thread: BUG #17652: Performance degradation after migrating from v13.4 to v14.4

BUG #17652: Performance degradation after migrating from v13.4 to v14.4

From: PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      17652
Logged by:          Michael Guissine
Email address:      mguissine@gmail.com
PostgreSQL version: 14.4
Operating system:   AWS RDS
Description:

Hello,

We recently upgraded our AWS RDS instances to PG 14.4 and are seeing weird
performance degradation. The issue seems to be isolated to one specific use
case: querying a partitioned table with a large number of elements in the IN
clause. In our setup, we observed that Postgres switches from an Index Scan
to a Bitmap Index Scan/Bitmap Heap Scan once the number of elements in the
IN clause exceeds 60; below 60, an Index Scan is performed on all partitions.

The query and query plan are linked below. We also tried turning off
enable_bitmapscan and observed that the query then performs well, which was
our experience in the previous version (PG 13.4).
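
For illustration, the query shape and the workaround test looked roughly like
this (identifiers and literal values are placeholders, since the real
statement is obfuscated):

-- Above roughly 60 values in the IN list the planner switches to a bitmap scan;
-- disabling bitmap scans for the session restores the fast plan.
SET enable_bitmapscan = off;

EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   partitioned_table
WHERE  partition_key IN (101, 102, 103 /* ... more than 60 values ... */)
AND    text = 'exact value to match';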

https://explain.depesz.com/s/F587

Thank you!


Re: BUG #17652: Performance degradation after migrating from v13.4 to v14.4

From: Jeff Janes
Date:
On Wed, Oct 19, 2022 at 12:21 PM PG Bug reporting form <noreply@postgresql.org> wrote:
We recently upgraded our AWS RDS instances to PG 14.4 and are seeing weird
performance degradation.

First things first, did you do an ANALYZE on the database after the upgrade?  (Maybe RDS does it for you, I don't know.  But it isn't worth looking into anything else until you make sure that that has been done)

Cheers,

Jeff

Re: BUG #17652: Performance degradation after migrating from v13.4 to v14.4

From: Michael Guissine
Date:
Hi Jeff,

Sorry, I should've mentioned that. Yes, we did ANALYZE all the tables after the upgrade; we even went ahead and ran REINDEX on all indexes and then also ran VACUUM ANALYZE, but we are still facing the same performance issue.
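
Roughly, the maintenance we ran looked like this (names are placeholders):

ANALYZE;                              -- refresh planner statistics for all tables
REINDEX TABLE partitioned_table;      -- rebuild all indexes on the table and its partitions
VACUUM (ANALYZE) partitioned_table;   -- vacuum plus fresh statistics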


Re: BUG #17652: Performance degradation after migrating from v13.4 to v14.4

From: Andres Freund
Date:
Hi,

On 2022-10-19 16:11:15 +0000, PG Bug reporting form wrote:
> We recently upgraded our AWS RDS instances to PG 14.4 and are seeing weird
> performance degradation.

Are they actually configured the same? Of particular interest would be
shared_buffers, effective_cache_size, random_page_cost, seq_page_cost,
cpu_index_tuple_cost, cpu_operator_cost, cpu_tuple_cost, work_mem.
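
For reference, running something like this on both instances would show them
side by side:

SELECT name, setting, unit, source
FROM   pg_settings
WHERE  name IN ('shared_buffers', 'effective_cache_size', 'random_page_cost',
                'seq_page_cost', 'cpu_index_tuple_cost', 'cpu_operator_cost',
                'cpu_tuple_cost', 'work_mem')
ORDER  BY name;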

Your explain shows that most of the time is spent doing IO:
 Append  (cost=0.43..93144.17 rows=121 width=1121) (actual time=225005.053..1160202.658 rows=7 loops=1)
   Buffers: shared hit=379846 read=2382552 dirtied=57320 written=231982
   I/O Timings: read=1113819.977 write=2850.752

Is there a chance at least part of this is due to a cold cache after
upgrading?


What is the definition of the indexes on the partitions?

Based on the names of the indexes, I'd guess that the reason the bitmap scans
are bad is that the index has the partition_key column as its leading column,
but the scan skips over that column and just filters by "text".


I'm a bit confused by your plan showing:
         Rows Removed by Index Recheck: 62841
and
         Heap Blocks: exact=46111
in the same node.

I haven't rechecked the code, but I don't think that should happen for btree
indexes. It could however for gin, bloom, hash etc.


> Query and query plan are attached below. We also tried to turn off
> enable_bitmapscan and observed that query is performing well which was our
> experience in the previous version (pg 13.4)
> 
> https://explain.depesz.com/s/F587

Do you know what the plan was in 13?

Greetings,

Andres Freund



Re: BUG #17652: Performance degradation after migrating from v13.4 to v14.4

From: Jeff Janes
Date:
On Wed, Oct 19, 2022 at 1:30 PM Michael Guissine <mguissine@gmail.com> wrote:
Hi Jeff,

Sorry, I should've mentioned that. Yes, we did ANALYZE all the tables after the upgrade; we even went ahead and ran REINDEX on all indexes and then also ran VACUUM ANALYZE, but we are still facing the same performance issue.

I'm guessing that paritioned_table_hash_p3_partition_key_text_idx is a pg_trgm index, and that you are a victim of the change "Allow GiST/GIN pg_trgm indexes to do equality lookups".  It is not surprising that this change would bite someone.  If you can redo the upgrade but this time refrain from upgrading pg_trgm from 1.5 to 1.6, that should fix the issue (assuming RDS even gives you enough control to do that).  Alternatively, drop the extension and recreate it specifically as version 1.5, which unfortunately would cascade to dropping all the indexes; they would need to be recreated afterwards, even though the index structures themselves would require no change.
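
A minimal sketch of the drop-and-recreate route (RDS permitting; the CASCADE
is what takes the indexes with it):

DROP EXTENSION pg_trgm CASCADE;          -- drops every index that uses gin_trgm_ops/gist_trgm_ops
CREATE EXTENSION pg_trgm VERSION '1.5';  -- reinstall at the pre-equality-lookup version
-- ...then recreate the dropped trigram indexes.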

I have a file which would let you downgrade the extension without needing to drop all the pg_trgm indexes, but there is no way for you to deploy that to RDS.

Cheers,

Jeff
Attachments

Re: BUG #17652: Performance degradation after migrating from v13.4 to v14.4

From: Michael Guissine
Date:
Thank you Andres,

The column names in the plan and queries are obfuscated. A cold cache should not be an issue: we upgraded 3 days ago and have since reindexed and re-analyzed everything, plus we have an active application running against this database, so the data should be cached.

Here is an obfuscated table definition. There are multiple indexes on those columns to support different query patterns; note that the partitioned_table_partition_key_start_time_text index was added after the upgrade because we thought it might help the optimizer come up with a better plan, but it didn't.

                    Partitioned table "public.partitioned_table"
    Column     |  Type   | Collation | Nullable |                    Default
---------------+---------+-----------+----------+------------------------------------------------
 id            | bigint  |           | not null | nextval('partitioned_table_id_seq'::regclass)
 partition_key | integer |           | not null |
 text          | text    |           |          |
 description   | text    |           |          |
 ... more fields ...
Partition key: HASH (partition_key)
Indexes:
    "partitioned_table_pkey" PRIMARY KEY, btree (partition_key, id)
    "partitioned_table_partition_key_start_time_index" btree (partition_key, start_time)
    "partitioned_table_partition_key_start_time_text" btree (partition_key, start_time, text)
    "partitioned_table_partition_key_text_idx" gin (partition_key, text gin_trgm_ops)
    "partitioned_table_text_tsv_index" gin (to_tsvector('simple'::regconfig, text)) WHERE text IS NOT NULL
Number of partitions: 128 (Use \d+ to list them.)


Re: BUG #17652: Performance degradation after migrating from v13.4 to v14.4

From: Michael Guissine
Date:
paritioned_table_hash_p3_partition_key_text_idx is in fact a pg_trgm index, you are right!
Interesting, why would it use that index at all on some partitions but not on others? The query is a simple match on 3 columns, and we have other indexes to cover it.


Re: BUG #17652: Performance degradation after migrating from v13.4 to v14.4

From: Andres Freund
Date:
Hi,

On 2022-10-19 15:32:17 -0400, Michael Guissine wrote:
> The column names in the plan and queries are obfuscated

I guessed so :)


> Indexes:
>     "partitioned_table_pkey" PRIMARY KEY, btree (partition_key, id)
>     "partitioned_table_partition_key_start_time_index" btree
> (partition_key, start_time)
>     "partitioned_table_partition_key_start_time_text" btree
> (partition_key, start_time, text)
>     "partitioned_table_partition_key_text_idx" gin (partition_key,
> text gin_trgm_ops)
>     "partitioned_table_text_tsv_index" gin
> (to_tsvector('simple'::regconfig, text)) WHERE text IS NOT NULL
> Number of partitions: 128 (Use \d+ to list them.)

Ok, that explains why we see the massive rechecks. For the bitmap scans we end
up using a gin index instead of the btree index.

The reason you're seeing this in 14 is likely the following entry in the
release notes:

> Allow GiST/GIN pg_trgm indexes to do equality lookups (Julien Rouhaud)
> This is similar to LIKE except no wildcards are honored.

CCed author/committer of that change. I suspect this would need proper costing
to avoid being used inadvertently.


I assume you need the pg_trgm index?


In theory you could likely work around the issue by reverting the relevant
catalog change:

+ALTER OPERATOR FAMILY gin_trgm_ops USING gin ADD
+        OPERATOR        11       pg_catalog.= (text, text);

by issuing

ALTER OPERATOR FAMILY gin_trgm_ops USING gin DROP OPERATOR 11 (text, text);

but I suspect you won't be allowed to do that in RDS.
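
For reference, whether that operator member is present can be checked with a
plain catalog query:

-- Strategy number 11 is the text = text member added by pg_trgm 1.6.
SELECT amopstrategy, amopopr::regoperator AS operator
FROM   pg_amop
WHERE  amopfamily = (SELECT oid
                     FROM   pg_opfamily
                     WHERE  opfname  = 'gin_trgm_ops'
                     AND    opfmethod = (SELECT oid FROM pg_am WHERE amname = 'gin'))
ORDER  BY amopstrategy;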

Greetings,

Andres Freund



Re: BUG #17652: Performance degradation after migrating from v13.4 to v14.4

From: Peter Geoghegan
Date:
On Wed, Oct 19, 2022 at 1:12 PM Andres Freund <andres@anarazel.de> wrote:
> The reason you're seeing this in 14 is likely the following entry in the
> release notes:
>
> > Allow GiST/GIN pg_trgm indexes to do equality lookups (Julien Rouhaud)
> > This is similar to LIKE except no wildcards are honored.
>
> CCed author/committer of that change. I suspect this would need proper costing
> to not be inadvertently be used.

I am reminded of this recent thread:

https://postgr.es/m/3188617.44csPzL39Z@aivenronan

(Perhaps you were thinking of the same thing already, but thought I'd
mention it, just in case.)
-- 
Peter Geoghegan



Re: BUG #17652: Performance degradation after migrating from v13.4 to v14.4

From: Andres Freund
Date:
Hi,

On 2022-10-19 13:20:37 -0700, Peter Geoghegan wrote:
> On Wed, Oct 19, 2022 at 1:12 PM Andres Freund <andres@anarazel.de> wrote:
> > The reason you're seeing this in 14 is likely the following entry in the
> > release notes:
> >
> > > Allow GiST/GIN pg_trgm indexes to do equality lookups (Julien Rouhaud)
> > > This is similar to LIKE except no wildcards are honored.
> >
> > CCed author/committer of that change. I suspect this would need proper costing
> > to not be inadvertently be used.
> 
> I am reminded of this recent thread:
> 
> https://postgr.es/m/3188617.44csPzL39Z@aivenronan
> 
> (Perhaps you were thinking of the same thing already, but thought I'd
> mention it, just in case.)

No, I hadn't seen that one. That indeed looks quite relevant...

Greetings,

Andres Freund



Re: BUG #17652: Performance degradation after migrating from v13.4 to v14.4

From: Michael Guissine
Date:
Thank you all for such super quick responses. We are considering removing the pg_trgm index and hope that will help!
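
For reference, dropping the parent index on the partitioned table also removes
the matching index on every partition; a sketch, using the index name from the
definition posted earlier:

-- Dropping the parent pg_trgm index also drops the per-partition indexes.
DROP INDEX partitioned_table_partition_key_text_idx;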


Re: BUG #17652: Performance degradation after migrating from v13.4 to v14.4

From: Jeff Janes
Date:
On Wed, Oct 19, 2022 at 3:44 PM Michael Guissine <mguissine@gmail.com> wrote:
paritioned_table_hash_p3_partition_key_text_idx is in fact a pg_trgm index, you are right!
Interesting, why would it use that index at all on some partitions but not on others? The query is a simple match on 3 columns, and we have other indexes to cover it.

The two index choices are (falsely) similar in cost, so just small differences in the selectivity estimates are enough to drive one over the other from partition to partition.

There are (at least) two planner deficiencies at work here.

For the actually faster index, it thinks it will need to re-descend the index one time for every value in the IN-list on the leading column, and each of those re-descents is going to end up on a different leaf page, charging about one random_page_cost for each one.  It counts a re-descent for every item in the IN-list, for every partition, but it only needs to re-descend for the items of the IN-list which also happen to match the remainder for that hash partition.  This incorrectly inflates the cost of the faster index.  (You could argue that if the executor really does do the re-descents, then it is correct for the planner to account for them.  But I suspect that many of the hopeless re-descents may end up on leaf pages already in memory.  Anyway, fixing the planner and fixing the executor would probably go hand in hand.)

For the slower index, the gin code doesn't estimate how many rechecks there will be, only how many end results there will be.  But the rechecks are where most of the true cost is.  So that makes this one end up with an incorrectly low cost.  The overestimation of one and the underestimation of the other bring the two estimates close enough together that random variations in the stats can drive the choice between them.  Now you could say "all the row estimates are 1, so how could there be any variation?"  But notice that for the bitmap heap scans, there is a filter step which removes around 1000 rows from each partition.  We know how many rows were actually removed from each partition, but we are not told how many rows the planner thought it would remove.  Variations in this unseen estimated filter count would lead to variations in the cost estimate.

While fixing the planner would be nice, I think the best current solution for you is to change the order of columns in the 3-column index to be (text, start_time, partition_key).  Since the condition on "text" is extremely selective, the planner should know that by the time it gets to the partition_key column it can check it with a single hash-table probe rather than a bunch of costly re-descents.  This should be estimated to be faster than either of the existing two indexes, and also should actually be faster.
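
A sketch of that index (the name is made up; on a partitioned table this
creates the corresponding index on every partition):

-- Most selective column first, so the equality condition on "text" drives the scan.
CREATE INDEX partitioned_table_text_start_time_partition_key_idx
    ON partitioned_table (text, start_time, partition_key);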

Cheers,

Jeff

Re: BUG #17652: Performance degradation after migrating from v13.4 to v14.4

From: Michael Guissine
Date:
Hi Jeff, this is super useful, thanks for taking the time to explain the internals. In our case, we ended up dropping the trgm index and everything is back to normal, but I like your idea of reordering the columns and will give it a try!

~ Michael
