Select count(*) on a 2B Rows Tables Takes ~20 Hours

Поиск
Список
Период
Сортировка
От Fd Habash
Тема Select count(*) on a 2B Rows Tables Takes ~20 Hours
Дата
Msg-id 5b9a9f81.1c69fb81.43388.0e8b@mx.google.com
обсуждение исходный текст
Ответы Re: Select count(*) on a 2B Rows Tables Takes ~20 Hours
Re: Select count(*) on a 2B Rows Tables Takes ~20 Hours
Список pgsql-performance

Based on my research in the forums and Google , it is described in multiple places that ‘select count(*)’ is expected to be slow in Postgres because of the MVCC controls imposed upon the query leading a table scan. Also, the elapsed time increase linearly with table size.

 

However, I do not know if elapsed time I’m getting is to be expected.

 

Table reltuples in pg_class = 2,266,649,344 (pretty close)

Query = select count(*) from jim.sttyations ;

Elapsed time (ET) = 18.5 hrs

 

This is an Aurora cluster running on r4.2xlarge (8 vCPU, 61g). CPU usage during count run hovers around 20% with 20g of freeable memory.

 

Is this ET expected? If not, what could be slowing it down? I’m currently running explain analyze and I’ll share the final output when done.

 

I’m familiar with the ideas listed here https://www.citusdata.com/blog/2016/10/12/count-performance/

 

Table "jim.sttyations"

      Column       |           Type           |         Modifiers          | Storage  | Stats target | Description

-------------------+--------------------------+----------------------------+----------+--------------+-------------

stty_id            | bigint                   | not null                   | plain    |              |

 stty_hitlist_line  | text                     | not null                   | extended |              |

 stty_status        | text                     | not null default 'Y'::text | extended |              |

 stty_status_date   | timestamp with time zone | not null                   | plain    |              |

 vs_number         | integer                  | not null                   | plain    |              |

 stty_date_created  | timestamp with time zone | not null                   | plain    |              |

 stty_stty_id        | bigint                   |                            | plain    |              |

 stty_position      | bigint                   |                            | plain    |              |

 mstty_id           | bigint                   |                            | plain    |              |

 vsr_number        | integer                  |                            | plain    |              |

 stty_date_modified | timestamp with time zone |                            | plain    |              |

 stty_stored        | text                     | not null default 'N'::text | extended |              |

 stty_sequence      | text                     |                            | extended |              |

 stty_hash          | text                     |                            | extended |              |

Indexes:

    "stty_pk" PRIMARY KEY, btree (stty_id)

    "stty_indx_fk01" btree (stty_stty_id)

    "stty_indx_fk03" btree (vsr_number)

    "stty_indx_fk04" btree (vs_number)

    "stty_indx_pr01" btree (mstty_id, stty_id)

Check constraints:

    "stty_cnst_ck01" CHECK (stty_status = ANY (ARRAY['Y'::text, 'N'::text]))

    "stty_cnst_ck02" CHECK (stty_stored = ANY (ARRAY['N'::text, 'Y'::text]))

Foreign-key constraints:

    "stty_cnst_fk01" FOREIGN KEY (stty_stty_id) REFERENCES sttyations(stty_id) NOT VALID

    "stty_cnst_fk02" FOREIGN KEY (mstty_id) REFERENCES master_sttyations(mstty_id)

    "stty_cnst_fk03" FOREIGN KEY (vsr_number) REFERENCES valid_status_reasons(vsr_number)

 

----------------
Thank you

 

 

refpep-> select count(*) from jim.sttyations;

                                                    QUERY PLAN                                                   

------------------------------------------------------------------------------------------------------------------

Aggregate  (cost=73451291.77..73451291.78 rows=1 width=8)

   Output: count(*)

   ->  Index Only Scan using stty_indx_fk03 on jim.sttyations  (cost=0.58..67784668.41 rows=2266649344 width=0)

         Output: vsr_number

(4 rows)

 

В списке pgsql-performance по дате отправления:

Предыдущее
От: padusuma
Дата:
Сообщение: Re: Performance of INSERT into temporary tables using psqlODBCdriver
Следующее
От: Justin Pryzby
Дата:
Сообщение: Re: Select count(*) on a 2B Rows Tables Takes ~20 Hours