Query performance

Поиск
Список
Период
Сортировка
От Dirschel, Steve-CW
Тема Query performance
Дата
Msg-id BYAPR06MB447199DE1FCF363EF389EF1A8A362@BYAPR06MB4471.namprd06.prod.outlook.com
обсуждение
Ответ на Re: how to switch user in postgres  ("David G. Johnston" <david.g.johnston@gmail.com>)
Ответы Re: Query performance
Список pgsql-performance
Aurora Postgres version 17.4.

Table in question:

\d poslog_publisher_rms_stage
                   Table "public.poslog_publisher_rms_stage"
      Column       |           Type           | Collation | Nullable | Default
-------------------+--------------------------+-----------+----------+---------
 stage_id          | uuid                     |           | not null |
 status            | character varying(100)   |           |          |
 message_body      | text                     |           | not null |
 error_code        | character varying(100)   |           |          |
 error_category    | character varying(100)   |           |          |
 error_message     | text                     |           |          |
 error_retry_count | integer                  |           |          | 0
 create_date       | timestamp with time zone |           | not null | now()
 modified_date     | timestamp with time zone |           | not null | now()
Indexes:
    "poslog_publisher_rms_stage_pkey" PRIMARY KEY, btree (stage_id)
    "idx_poslog_publisher_stage_create_date_col" btree (create_date)
    "idx_poslog_publisher_stage_status_error_retry_count_modi_date_c" btree (status, error_retry_count, modified_date)
Referenced by:
    TABLE "poslog_publisher_rms_detail" CONSTRAINT "fk_poslog_publisher_detail_stage_id" FOREIGN KEY (stage_id) REFERENCES poslog_publisher_rms_stage(stage_id)
Publications:
    “sashpsrms_publication"

The table is constantly getting loaded into.  Rows are inserted with a status ready and then there is a different process looking for that status and will update to processed after processing the row.  We have multiple tables like this.

Then every 2 hours a different process runs this query looking for failed or unprocessed rows:

select
                      ppse.stage_id as stageId,
                      ppse.status as status,
                      ppse.message_body as messageBody
                    from
                      poslog_publisher_rms_stage ppse
                    where
                      ppse.status in ('UNPROCESSED','FAILED')
                      and ppse.error_retry_count < 3
                    order by
                      ppse.create_date
                    limit 100;

If I run that query with explain it is doing a ton of work to find 0 rows.  The index it uses seems appropriate for the query.

                                                                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=10.92..10.93 rows=1 width=1019) (actual time=66566.823..66566.824 rows=0 loops=1)
   Buffers: shared hit=1509768 read=2011479
   I/O Timings: shared read=79792.017
   ->  Sort  (cost=10.92..10.93 rows=1 width=1019) (actual time=66566.821..66566.821 rows=0 loops=1)
         Sort Key: create_date
         Sort Method: quicksort  Memory: 25kB
         Buffers: shared hit=1509768 read=2011479
         I/O Timings: shared read=79792.017
         ->  Index Scan using idx_poslog_publisher_stage_status_error_retry_count_modi_date_c on poslog_publisher_rms_stage ppse  (cost=0.57..10.91 rows=1 width=1019) (actual time=66566.761..66566.761 rows=0 loops=1)
               Index Cond: (((status)::text = ANY ('{UNPROCESSED,FAILED}'::text[])) AND (error_retry_count < 3))
               Buffers: shared hit=1509765 read=2011479
               I/O Timings: shared read=79792.017
 Planning:
   Buffers: shared hit=195 read=1
   I/O Timings: shared read=1.038
 Planning Time: 2.909 ms
 Execution Time: 66581.498 ms

The query did 3.5 million block reads when scanning the index of which 1.5 million were in memory and 2 million were from disk.  5 seconds later I ran the exact same query again:

                                                                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=10.92..10.93 rows=1 width=1019) (actual time=23.589..23.591 rows=0 loops=1)
   Buffers: shared hit=18736
   ->  Sort  (cost=10.92..10.93 rows=1 width=1019) (actual time=23.588..23.589 rows=0 loops=1)
         Sort Key: create_date
         Sort Method: quicksort  Memory: 25kB
         Buffers: shared hit=18736
         ->  Index Scan using idx_poslog_publisher_stage_status_error_retry_count_modi_date_c on poslog_publisher_rms_stage ppse  (cost=0.57..10.91 rows=1 width=1019) (actual time=23.583..23.584 rows=0 loops=1)
               Index Cond: (((status)::text = ANY ('{UNPROCESSED,FAILED}'::text[])) AND (error_retry_count < 3))
               Buffers: shared hit=18736
 Planning Time: 0.118 ms
 Execution Time: 23.628 ms

Now it only did 18k block reads all in memory.  It used the same index, it also returned 0 rows.  Between those 2 runs I looked at pg_stat_user_tables and could see the n_tup_ins increased by 13, n_tup_del increased by 13, n_live_tup increased by 13, and n_dead_tup increased by 13.  n_live_tup was 96 million and n_dead_tup was 8.4 million.  18k logical reads to find 0 rows is still high but I believe that is most likely caused by the 8.4 million n_dead_tups.

15 minutes later I ran the query again:

                                                                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=10.92..10.93 rows=1 width=1019) (actual time=59431.795..59431.796 rows=0 loops=1)
   Buffers: shared hit=1286676 read=1734000
   I/O Timings: shared read=70153.578
   ->  Sort  (cost=10.92..10.93 rows=1 width=1019) (actual time=59431.794..59431.794 rows=0 loops=1)
         Sort Key: create_date
         Sort Method: quicksort  Memory: 25kB
         Buffers: shared hit=1286676 read=1734000
         I/O Timings: shared read=70153.578
         ->  Index Scan using idx_poslog_publisher_stage_status_error_retry_count_modi_date_c on poslog_publisher_rms_stage ppse  (cost=0.57..10.91 rows=1 width=1019) (actual time=59431.789..59431.790 rows=0 loops=1)
               Index Cond: (((status)::text = ANY ('{UNPROCESSED,FAILED}'::text[])) AND (error_retry_count < 3))
               Buffers: shared hit=1286676 read=1734000
               I/O Timings: shared read=70153.578
 Planning Time: 0.114 ms
 Execution Time: 59431.839 ms 

Total blocks reads increased from 18k to 3 million.  n_tup_ins, n_tup_del, n_live_tup, and n_dead_tup all increased by 10,320 over that 15 minute period of time.  

What is going on here where this query has to do 3+ million block reads to find 0 rows?  And how is it possible when I run the query 2 times in a row the logical reads from the 2nd run comes down significantly?  Is this somehow related to determining if rows are visible or something like that?  When I waited 15 minutes between runs the inserted/updated rows only increased by 10.3k yet total block reads increased by 3 million.

Thanks

В списке pgsql-performance по дате отправления: