Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic

Поиск
Список
Период
Сортировка
От Justin Pryzby
Тема Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic
Дата
Msg-id 20210608173404.GF16435@telsasoft.com
обсуждение исходный текст
Ответ на Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Ответы Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic  (Peter Geoghegan <pg@bowt.ie>)
Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic  (Justin Pryzby <pryzby@telsasoft.com>)
Список pgsql-hackers
On Tue, Jun 08, 2021 at 12:06:02PM -0400, Alvaro Herrera wrote:
> On 2021-Jun-06, Justin Pryzby wrote:
> 
> > However, I also found an autovacuum chewing 100% CPU, and it appears the
> > problem is actually because autovacuum has locked a page of pg-statistic, and
> > every other process then gets stuck waiting in the planner.  I checked a few
> > and found these:
> 
> Hmm ... I wonder if this could be related to commits d9d076222f5b,
> c98763bf51bf, etc.  I don't have any connecting thoughts other than the
> tuple visibility code being involved.  Do you see any procs with the
> PROC_IN_SAFE_IC flag set?

Can you give me a hint how to do that from a corefile ?

I got this far:
(gdb) p MyProc->links
$13 = {prev = 0x0, next = 0x0}

I do have a reindex script which runs CIC, and it does looks suspicious.

<<Mon Jun  7 22:00:54 CDT 2021: starting db=ts
...
Mon Jun  7 22:01:16 CDT 2021: ts: pg_subscription_rel: pg_subscription_rel_srrelid_srsubid_index(reindex system)...
Mon Jun  7 22:01:16 CDT 2021: ts: pg_subscription: pg_subscription_oid_index(reindex system)...
Mon Jun  7 22:01:16 CDT 2021: ts: pg_subscription: pg_subscription_subname_index(reindex system)...
Mon Jun  7 22:01:16 CDT 2021: ts: pg_subscription: pg_toast.pg_toast_6100_index(reindex system)...
Mon Jun  7 22:01:17 CDT 2021: ts: pg_statistic_ext_data: pg_statistic_ext_data_stxoid_index(reindex system)...
Mon Jun  7 22:01:17 CDT 2021: ts: pg_statistic_ext_data: pg_toast.pg_toast_3429_index(reindex system)...
Mon Jun  7 22:01:17 CDT 2021: ts: pg_statistic_ext: pg_statistic_ext_name_index(reindex system)...
Mon Jun  7 22:01:17 CDT 2021: ts: pg_statistic_ext: pg_statistic_ext_oid_index(reindex system)...
Mon Jun  7 22:01:17 CDT 2021: ts: pg_statistic_ext: pg_statistic_ext_relid_index(reindex system)...
Mon Jun  7 22:01:17 CDT 2021: ts: pg_statistic_ext: pg_toast.pg_toast_3381_index(reindex system)...
Mon Jun  7 22:01:17 CDT 2021: ts: pg_statistic: pg_statistic_relid_att_inh_index(reindex system)...
Mon Jun  7 22:02:56 CDT 2021: ts: pg_statistic: pg_toast.pg_toast_2619_index(reindex system)...
Mon Jun  7 22:02:57 CDT 2021: ts: pg_statio_all_tables_snap: pg_statio_all_tables_snap_t_idx(reindex
non-partitioned)...
ERROR:  canceling statement due to statement timeout
reindex: warning, dropping invalid/unswapped index: pg_statio_all_tables_snap_t_idx_ccnew
Mon Jun  7 23:02:57 CDT 2021: ts: pg_statio_all_tables_snap: pg_toast.pg_toast_33011_index(reindex system)...
Mon Jun  7 23:02:57 CDT 2021: ts: pg_statio_all_indexes_snap: pg_statio_all_indexes_snap_t_idx(reindex
non-partitioned)...
ERROR:  canceling statement due to statement timeout
reindex: warning, dropping invalid/unswapped index: pg_statio_all_indexes_snap_t_idx_ccnew
Tue Jun  8 00:02:57 CDT 2021: ts: pg_statio_all_indexes_snap: pg_toast.pg_toast_33020_index(reindex system)...
Tue Jun  8 00:02:57 CDT 2021: ts: pg_shseclabel: pg_shseclabel_object_index(reindex system)...
Tue Jun  8 00:02:58 CDT 2021: ts: pg_shseclabel: pg_toast.pg_toast_3592_index(reindex system)...
Tue Jun  8 00:02:58 CDT 2021: ts: pg_shdescription: pg_shdescription_o_c_index(reindex system)...
Tue Jun  8 00:02:58 CDT 2021: ts: pg_shdescription: pg_toast.pg_toast_2396_index(reindex system)...
...
Tue Jun  8 00:02:57 CDT 2021: ts: pg_statio_all_indexes_snap: pg_toast.pg_toast_33020_index(reindex system)...
Tue Jun  8 00:02:57 CDT 2021: ts: pg_shseclabel: pg_shseclabel_object_index(reindex system)...
Tue Jun  8 00:02:58 CDT 2021: ts: pg_shseclabel: pg_toast.pg_toast_3592_index(reindex system)...
Tue Jun  8 00:02:58 CDT 2021: ts: pg_shdescription: pg_shdescription_o_c_index(reindex system)...
Tue Jun  8 00:02:58 CDT 2021: ts: pg_shdescription: pg_toast.pg_toast_2396_index(reindex system)...
...
Tue Jun  8 01:21:20 CDT 2021: ts: pg_aggregate: pg_aggregate_fnoid_index(reindex system)...
Tue Jun  8 01:21:20 CDT 2021: ts: pg_aggregate: pg_toast.pg_toast_2600_index(reindex system)...
Tue Jun  8 01:21:20 CDT 2021: ts: permissions: perm_group_idx(reindex non-partitioned)...
ERROR:  canceling statement due to statement timeout
reindex: warning, dropping invalid/unswapped index: perm_group_idx_ccnew
Tue Jun  8 02:21:20 CDT 2021: ts: permissions: perm_user_idx(reindex non-partitioned)...
ERROR:  canceling statement due to statement timeout
reindex: warning, dropping invalid/unswapped index: perm_user_idx_ccnew
Tue Jun  8 03:21:20 CDT 2021: ts: permissions: pg_toast.pg_toast_33577_index(reindex system)...
Tue Jun  8 03:21:20 CDT 2021: ts: patchfiles: patchfiles_filename_idx(reindex non-partitioned)...
ERROR:  canceling statement due to statement timeout
reindex: warning, dropping invalid/unswapped index: patchfiles_filename_idx_ccnew
Tue Jun  8 04:21:21 CDT 2021: ts: patchfiles: patchfiles_pkey(reindex non-partitioned)...
ERROR:  canceling statement due to statement timeout
reindex: warning, dropping invalid/unswapped index: patchfiles_pkey_ccnew

=> It's strange that these timed out, since the statio tables are less than
15MB, and permissions and patchfiles are both under 100kB.

And it seems to say that it time out after less than 1sec.

They're running this:
| PGOPTIONS="--deadlock_timeout=333ms -cstatement-timeout=3600s" psql -c "REINDEX INDEX CONCURRENTLY $i"
And if it times out, it then runs: $PSQL "DROP INDEX CONCURRENTLY $bad"

I've killed it a little bit ago, but I should've saved the start time of the
autovacuum.  I found this:

#5  heap_vacuum_rel (rel=0x7fe636faed28, params=0x12ce89c, bstrategy=<optimized out>) at vacuumlazy.c:612
starttime = 676436464463888
ru0 = {tv = {tv_sec = 1623121264, tv_usec = 463887}, ru = {ru_utime = {tv_sec = 0, tv_usec = 77418}, ru_stime = {tv_sec
=0, tv_usec = 13866}, {ru_maxrss = 7440, __ru_maxrss_word = 7440}, {ru_ixrss = 0, __ru_ixrss_word = 0}, {
 
      ru_idrss = 0, __ru_idrss_word = 0}, {ru_isrss = 0, __ru_isrss_word = 0}, {ru_minflt = 1826, __ru_minflt_word =
1826},{ru_majflt = 1, __ru_majflt_word = 1}, {ru_nswap = 0, __ru_nswap_word = 0}, {ru_inblock = 2008,
 
      __ru_inblock_word = 2008}, {ru_oublock = 192, __ru_oublock_word = 192}, {ru_msgsnd = 0, __ru_msgsnd_word = 0},
{ru_msgrcv= 0, __ru_msgrcv_word = 0}, {ru_nsignals = 0, __ru_nsignals_word = 0}, {ru_nvcsw = 29,
 
      __ru_nvcsw_word = 29}, {ru_nivcsw = 9, __ru_nivcsw_word = 9}}}

$ date -d @1623121264
Mon Jun  7 22:01:04 CDT 2021

$ date -d '2000-01-01 UTC + 676436464seconds'
Mon Jun  7 22:01:04 CDT 2021

-- 
Justin



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic
Следующее
От: Peter Eisentraut
Дата:
Сообщение: Re: automatically generating node support functions