Re: BUG #16833: postgresql 13.1 process crash every hour

Поиск
Список
Период
Сортировка
От Alex F
Тема Re: BUG #16833: postgresql 13.1 process crash every hour
Дата
Msg-id CAGbr_zVNNrpDuDTi5eNYrUjg_8SQ_anggyfR6C0=Lx_3yv=jbg@mail.gmail.com
обсуждение исходный текст
Ответ на BUG #16833: postgresql 13.1 process crash every hour  (PG Bug reporting form <noreply@postgresql.org>)
Ответы Re: BUG #16833: postgresql 13.1 process crash every hour  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-bugs
Dear team, previously I wrote about postgresql crash (ref #16833)
Unfortunately the issue is still present. After databases reload from backup I see that crash happens once 1-2 weeks.
It's almost impossible to activate debugging because daily logs size in regular mode > 100gb/day with 10+million queries per day.

Together with colleagues I think we are able to identify a crash case. This is a test environment running on pg13.2 with 10 databases inside (1 working test server = 1 database)
Some tables inside the database are always updated with data product crawler. Based on these tables in reporting purposes we build concurrent materialized views which can build for 9-25min.

Some log example for such view build:
2021-05-03 06:14:55 UTC [6220]: user=servlet,db=feature5042021,app=feature5-vertx,vertx-DataStore,client=vm_ip LOG:  duration: 0.031 ms  parse <unnamed>: REFRESH MATERIALIZED VIEW CONCURRENTLY project.product_original_mv
2021-05-03 06:14:55 UTC [6220]: user=servlet,db=feature5042021,app=feature5-vertx,vertx-DataStore,client=vm_ip LOG:  duration: 0.038 ms  bind <unnamed>: REFRESH MATERIALIZED VIEW CONCURRENTLY project.product_original_mv
2021-05-03 06:16:21 UTC [6220]: user=servlet,db=feature5042021,app=feature5-vertx,vertx-DataStore,client=vm_ip STATEMENT:  REFRESH MATERIALIZED VIEW CONCURRENTLY project.product_original_mv
...
2021-05-03 06:21:22 UTC [6220]: user=servlet,db=feature5042021,app=feature5-vertx,vertx-DataStore,client=vm_ip STATEMENT:  REFRESH MATERIALIZED VIEW CONCURRENTLY project.product_original_mv
2021-05-03 06:37:42 UTC [6220]: user=servlet,db=feature5042021,app=feature5-vertx,vertx-DataStore,client=vm_ip LOG:  duration: 1366910.618 ms  execute <unnamed>: REFRESH MATERIALIZED VIEW CONCURRENTLY project.product_original_mv

From example log cut you can see that STATEMENT [6220] process logs job just every second until it's end.

Now suggest a review of the crash case. MATERIALIZED VIEW build started work 18 min 12s before database crash
2021-05-10 06:45:58 UTC [28907]: user=servlet,db=feature5042021,app=feature5-vertx,vertx-DataStore,client=vm_ip LOG:  duration: 0.039 ms  parse <unnamed>: REFRESH MATERIALIZED VIEW CONCURRENTLY project.product_original_mv
2021-05-10 06:45:58 UTC [28907]: user=servlet,db=feature5042021,app=feature5-vertx,vertx-DataStore,client=vm_ip LOG:  duration: 0.039 ms  bind <unnamed>: REFRESH MATERIALIZED VIEW CONCURRENTLY project.product_original_mv
2021-05-10 06:47:06 UTC [28907]: user=servlet,db=feature5042021,app=feature5-vertx,vertx-DataStore,client=vm_ip STATEMENT:  REFRESH MATERIALIZED VIEW CONCURRENTLY project.product_original_mv
...with last logged entry for [28907] process
2021-05-10 06:52:35 UTC [28907]: user=servlet,db=feature5042021,app=feature5-vertx,vertx-DataStore,client=vm_ip STATEMENT:  REFRESH MATERIALIZED VIEW CONCURRENTLY project.product_original_mv
...then [28907] process do not log anymore
...and crash happened here after 11m 35s from last [28907] log entry
2021-05-10 07:04:10 UTC [1]: user=,db=,app=,client= LOG:  server process (PID 28907) was terminated by signal 11: Segmentation fault
2021-05-10 07:04:10 UTC [1]: user=,db=,app=,client= DETAIL:  Failed process was running: REFRESH MATERIALIZED VIEW CONCURRENTLY project.product_original_mv
2021-05-10 07:04:10 UTC [1]: user=,db=,app=,client= LOG:  terminating any other active server processes
2021-05-10 07:04:10 UTC [32461]: user=servlet,db=staging,app=staging-servlet,servlet-PostgresClient,client=vm_ip1 WARNING:  terminating connection because of crash of another server process
2021-05-10 07:04:10 UTC [32461]: user=servlet,db=staging,app=staging-servlet,servlet-PostgresClient,client=vm_ip1 DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2021-05-10 07:04:10 UTC [32461]: user=servlet,db=staging,app=staging-servlet,servlet-PostgresClient,client=vm_ip1 HINT:  In a moment you should be able to reconnect to the database and repeat your command.

One more important note. Production system run on pg12.2 and have absolutely the same project logic with the same crawlers, table updates and views but work stable for a long time without any crash
Test system with log examples shown run on pg13.2 and crash every week.

Thanks for your support!


В списке pgsql-bugs по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: BUG #15990: PROCEDURE throws "SQL Error [XX000]: ERROR: no known snapshots" with PostGIS geometries
Следующее
От: Tom Lane
Дата:
Сообщение: Re: BUG #16833: postgresql 13.1 process crash every hour