Query performance
| От | Dirschel, Steve-CW |
|---|---|
| Тема | Query performance |
| Дата | |
| Msg-id | BYAPR06MB447199DE1FCF363EF389EF1A8A362@BYAPR06MB4471.namprd06.prod.outlook.com обсуждение |
| Ответ на | Re: how to switch user in postgres ("David G. Johnston" <david.g.johnston@gmail.com>) |
| Ответы |
Re: Query performance
|
| Список | pgsql-performance |
Aurora Postgres version 17.4.
Table in question:
\d poslog_publisher_rms_stage
Table "public.poslog_publisher_rms_stage"
Column | Type | Collation | Nullable | Default
-------------------+--------------------------+-----------+----------+---------
stage_id | uuid | | not null |
status | character varying(100) | | |
message_body | text | | not null |
error_code | character varying(100) | | |
error_category | character varying(100) | | |
error_message | text | | |
error_retry_count | integer | | | 0
create_date | timestamp with time zone | | not null | now()
modified_date | timestamp with time zone | | not null | now()
Indexes:
"poslog_publisher_rms_stage_pkey" PRIMARY KEY, btree (stage_id)
"idx_poslog_publisher_stage_create_date_col" btree (create_date)
"idx_poslog_publisher_stage_status_error_retry_count_modi_date_c" btree (status, error_retry_count, modified_date)
Referenced by:
TABLE "poslog_publisher_rms_detail" CONSTRAINT "fk_poslog_publisher_detail_stage_id" FOREIGN KEY (stage_id) REFERENCES poslog_publisher_rms_stage(stage_id)
Publications:
“sashpsrms_publication"
The table is constantly getting loaded into. Rows are inserted with a status ready and then there is a different process looking for that status and will update to processed after processing the row. We have multiple tables like this.
Then every 2 hours a different process runs this query looking for failed or unprocessed rows:
select
ppse.stage_id as stageId,
ppse.status as status,
ppse.message_body as messageBody
from
poslog_publisher_rms_stage ppse
where
ppse.status in ('UNPROCESSED','FAILED')
and ppse.error_retry_count < 3
order by
ppse.create_date
limit 100;
If I run that query with explain it is doing a ton of work to find 0 rows. The index it uses seems appropriate for the query.
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=10.92..10.93 rows=1 width=1019) (actual time=66566.823..66566.824 rows=0 loops=1)
Buffers: shared hit=1509768 read=2011479
I/O Timings: shared read=79792.017
-> Sort (cost=10.92..10.93 rows=1 width=1019) (actual time=66566.821..66566.821 rows=0 loops=1)
Sort Key: create_date
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=1509768 read=2011479
I/O Timings: shared read=79792.017
-> Index Scan using idx_poslog_publisher_stage_status_error_retry_count_modi_date_c on poslog_publisher_rms_stage ppse (cost=0.57..10.91 rows=1 width=1019) (actual time=66566.761..66566.761 rows=0 loops=1)
Index Cond: (((status)::text = ANY ('{UNPROCESSED,FAILED}'::text[])) AND (error_retry_count < 3))
Buffers: shared hit=1509765 read=2011479
I/O Timings: shared read=79792.017
Planning:
Buffers: shared hit=195 read=1
I/O Timings: shared read=1.038
Planning Time: 2.909 ms
Execution Time: 66581.498 ms
The query did 3.5 million block reads when scanning the index of which 1.5 million were in memory and 2 million were from disk. 5 seconds later I ran the exact same query again:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=10.92..10.93 rows=1 width=1019) (actual time=23.589..23.591 rows=0 loops=1)
Buffers: shared hit=18736
-> Sort (cost=10.92..10.93 rows=1 width=1019) (actual time=23.588..23.589 rows=0 loops=1)
Sort Key: create_date
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=18736
-> Index Scan using idx_poslog_publisher_stage_status_error_retry_count_modi_date_c on poslog_publisher_rms_stage ppse (cost=0.57..10.91 rows=1 width=1019) (actual time=23.583..23.584 rows=0 loops=1)
Index Cond: (((status)::text = ANY ('{UNPROCESSED,FAILED}'::text[])) AND (error_retry_count < 3))
Buffers: shared hit=18736
Planning Time: 0.118 ms
Execution Time: 23.628 ms
Now it only did 18k block reads all in memory. It used the same index, it also returned 0 rows. Between those 2 runs I looked at pg_stat_user_tables and could see the n_tup_ins increased by 13, n_tup_del increased by 13, n_live_tup increased by 13, and n_dead_tup increased by 13. n_live_tup was 96 million and n_dead_tup was 8.4 million. 18k logical reads to find 0 rows is still high but I believe that is most likely caused by the 8.4 million n_dead_tups.
15 minutes later I ran the query again:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=10.92..10.93 rows=1 width=1019) (actual time=59431.795..59431.796 rows=0 loops=1)
Buffers: shared hit=1286676 read=1734000
I/O Timings: shared read=70153.578
-> Sort (cost=10.92..10.93 rows=1 width=1019) (actual time=59431.794..59431.794 rows=0 loops=1)
Sort Key: create_date
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=1286676 read=1734000
I/O Timings: shared read=70153.578
-> Index Scan using idx_poslog_publisher_stage_status_error_retry_count_modi_date_c on poslog_publisher_rms_stage ppse (cost=0.57..10.91 rows=1 width=1019) (actual time=59431.789..59431.790 rows=0 loops=1)
Index Cond: (((status)::text = ANY ('{UNPROCESSED,FAILED}'::text[])) AND (error_retry_count < 3))
Buffers: shared hit=1286676 read=1734000
I/O Timings: shared read=70153.578
Planning Time: 0.114 ms
Execution Time: 59431.839 ms
Total blocks reads increased from 18k to 3 million. n_tup_ins, n_tup_del, n_live_tup, and n_dead_tup all increased by 10,320 over that 15 minute period of time.
What is going on here where this query has to do 3+ million block reads to find 0 rows? And how is it possible when I run the query 2 times in a row the logical reads from the 2nd run comes down significantly? Is this somehow related to determining if rows are visible or something like that? When I waited 15 minutes between runs the inserted/updated rows only increased by 10.3k yet total block reads increased by 3 million.
Thanks
В списке pgsql-performance по дате отправления: