Re: Query performance
| From | Laurenz Albe |
|---|---|
| Subject | Re: Query performance |
| Date | |
| Msg-id | a6f4756df05f38d3bf8df37c5fb9e810450cb1e8.camel@cybertec.at |
| In reply to | Query performance ("Dirschel, Steve-CW" <Steve.Dirschel@bestbuy.com>) |
| List | pgsql-performance |
On Mon, 2026-04-27 at 15:52 +0000, Dirschel, Steve-CW wrote:
> Aurora Postgres version 17.4.
Aurora works quite differently as far as storage is concerned, so you may be
suffering from some peculiarity of that closed-source software.
> Table in question:
>
>
> \d poslog_publisher_rms_stage
> Table "public.poslog_publisher_rms_stage"
> Column | Type | Collation | Nullable | Default
> -------------------+--------------------------+-----------+----------+---------
> stage_id | uuid | | not null |
> status | character varying(100) | | |
> message_body | text | | not null |
> error_code | character varying(100) | | |
> error_category | character varying(100) | | |
> error_message | text | | |
> error_retry_count | integer | | | 0
> create_date | timestamp with time zone | | not null | now()
> modified_date | timestamp with time zone | | not null | now()
> Indexes:
> "poslog_publisher_rms_stage_pkey" PRIMARY KEY, btree (stage_id)
> "idx_poslog_publisher_stage_create_date_col" btree (create_date)
> "idx_poslog_publisher_stage_status_error_retry_count_modi_date_c" btree (status, error_retry_count, modified_date)
> Referenced by:
> TABLE "poslog_publisher_rms_detail" CONSTRAINT "fk_poslog_publisher_detail_stage_id" FOREIGN KEY (stage_id) REFERENCES poslog_publisher_rms_stage(stage_id)
> Publications:
> "sashpsrms_publication"
>
> The table is constantly being loaded. Rows are inserted with a status of ready, and
> a different process looks for that status and updates it to processed after
> handling the row. We have multiple tables like this.
>
> Then every 2 hours a different process runs this query looking for failed or unprocessed rows:
>
> select
> ppse.stage_id as stageId,
> ppse.status as status,
> ppse.message_body as messageBody
> from
> poslog_publisher_rms_stage ppse
> where
> ppse.status in ('UNPROCESSED','FAILED')
> and ppse.error_retry_count < 3
> order by
> ppse.create_date
> limit 100;
>
> If I run that query with explain it is doing a ton of work to find 0 rows. The index it
> uses seems appropriate for the query.
The index can be used, but it is far from perfect. The ideal index would be:
CREATE INDEX ON poslog_publisher_rms_stage (create_date)
WHERE status in ('UNPROCESSED','FAILED') AND error_retry_count < 3;
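On a table that is being written to constantly, such an index is best built without blocking the writers. A sketch (the index name is illustrative):

```sql
-- CONCURRENTLY avoids blocking concurrent inserts/updates,
-- at the cost of a slower build; it cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY idx_poslog_stage_pending
    ON poslog_publisher_rms_stage (create_date)
    WHERE status IN ('UNPROCESSED', 'FAILED')
      AND error_retry_count < 3;
```

Because the partial index contains only the pending rows, the polling query can scan it in create_date order and stop after 100 rows, with no sort step and no visits to the millions of already-processed entries.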
> QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=10.92..10.93 rows=1 width=1019) (actual time=66566.823..66566.824 rows=0 loops=1)
> Buffers: shared hit=1509768 read=2011479
> I/O Timings: shared read=79792.017
> -> Sort (cost=10.92..10.93 rows=1 width=1019) (actual time=66566.821..66566.821 rows=0 loops=1)
> Sort Key: create_date
> Sort Method: quicksort Memory: 25kB
> Buffers: shared hit=1509768 read=2011479
> I/O Timings: shared read=79792.017
> -> Index Scan using idx_poslog_publisher_stage_status_error_retry_count_modi_date_c on poslog_publisher_rms_stage ppse (cost=0.57..10.91 rows=1 width=1019) (actual time=66566.761..66566.761 rows=0 loops=1)
> Index Cond: (((status)::text = ANY ('{UNPROCESSED,FAILED}'::text[])) AND (error_retry_count < 3))
> Buffers: shared hit=1509765 read=2011479
> I/O Timings: shared read=79792.017
> Planning:
> Buffers: shared hit=195 read=1
> I/O Timings: shared read=1.038
> Planning Time: 2.909 ms
> Execution Time: 66581.498 ms
>
> The query did 3.5 million block reads when scanning the index of which 1.5 million were
> in memory and 2 million were from disk. 5 seconds later I ran the exact same query again:
>
> QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=10.92..10.93 rows=1 width=1019) (actual time=23.589..23.591 rows=0 loops=1)
> Buffers: shared hit=18736
> -> Sort (cost=10.92..10.93 rows=1 width=1019) (actual time=23.588..23.589 rows=0 loops=1)
> Sort Key: create_date
> Sort Method: quicksort Memory: 25kB
> Buffers: shared hit=18736
> -> Index Scan using idx_poslog_publisher_stage_status_error_retry_count_modi_date_c on poslog_publisher_rms_stage ppse (cost=0.57..10.91 rows=1 width=1019) (actual time=23.583..23.584 rows=0 loops=1)
> Index Cond: (((status)::text = ANY ('{UNPROCESSED,FAILED}'::text[])) AND (error_retry_count < 3))
> Buffers: shared hit=18736
> Planning Time: 0.118 ms
> Execution Time: 23.628 ms
>
> Now it only did 18k block reads all in memory. It used the same index, it also
> returned 0 rows. Between those 2 runs I looked at pg_stat_user_tables and could see the
> n_tup_ins increased by 13, n_tup_del increased by 13, n_live_tup increased by 13, and
> n_dead_tup increased by 13. n_live_tup was 96 million and n_dead_tup was 8.4 million.
> 18k logical reads to find 0 rows is still high but I believe that is most likely caused
> by the 8.4 million n_dead_tups.
>
> What is going on here where this query has to do 3+ million block reads to find 0 rows?
> And how is it possible when I run the query 2 times in a row the logical reads from the
> 2nd run comes down significantly? Is this somehow related to determining if rows are
> visible or something like that?
The only way I can imagine this happening in PostgreSQL is if the first execution
kills a lot of index tuples (https://www.cybertec-postgresql.com/en/killed-index-tuples/).
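If dead index entries left behind by the constant insert/update churn are the culprit, keeping autovacuum more aggressive on this one table should prevent the index from accumulating millions of dead entries between the two-hour polling runs. A sketch, with illustrative threshold values rather than recommendations:

```sql
-- Inspect the churn and when vacuum last ran on the staging table.
SELECT n_live_tup, n_dead_tup, last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'poslog_publisher_rms_stage';

-- Trigger autovacuum after roughly 1% dead rows instead of the
-- default 20% (scale factor 0.01 is an illustrative starting point).
ALTER TABLE poslog_publisher_rms_stage
    SET (autovacuum_vacuum_scale_factor = 0.01,
         autovacuum_vacuum_threshold = 1000);
```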
> When I waited 15 minutes between runs the inserted/updated rows only increased by
> 10.3k yet total block reads increased by 3 million.
Now that seems to speak against the above theory, so you are suffering from some Amazon-
specific behavior.
Yours,
Laurenz Albe