Thread: backend process was crashed when using zabbix monitor

backend process was crashed when using zabbix monitor

From
"梁博"
Date:
## 1. Environment information
PostgreSQL version: 9.6.8
Operating system:   CentOS 6.9

## 2. Problem
While Zabbix was monitoring the server, the following error messages appeared in the PostgreSQL log:

    2018-12-24 10:54:33.928 CST 24677     LOG:  server process (PID 13627) was terminated by signal 6: Aborted
    2018-12-24 10:54:33.928 CST 24677     DETAIL:  Failed process was running: select sum(calls) as sum_calls from pg_stat_statements
    2018-12-24 10:54:33.928 CST 24677     LOG:  terminating any other active server processes
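
A minimal sketch, assuming the log line layout shown above, of how such crash reports could be extracted from a log for triage. The function name `find_backend_crashes` and both regular expressions are illustrative assumptions; they are not part of PostgreSQL or Zabbix, and they would need adjusting for a different `log_line_prefix`.

```python
import re

# Patterns assume the message texts seen in the excerpt above.
CRASH_RE = re.compile(
    r"LOG:\s+server process \(PID (?P<pid>\d+)\) was terminated by signal "
    r"(?P<signal>\d+)(?:: (?P<name>.+))?$"
)
DETAIL_RE = re.compile(r"DETAIL:\s+Failed process was running: (?P<query>.*)$")


def find_backend_crashes(lines):
    """Return one dict per crashed backend found in an iterable of log lines."""
    crashes = []
    for line in lines:
        line = line.rstrip("\n")
        crash = CRASH_RE.search(line)
        if crash:
            crashes.append({
                "pid": int(crash.group("pid")),
                "signal": int(crash.group("signal")),
                "signal_name": crash.group("name"),
                "query": None,
            })
            continue
        detail = DETAIL_RE.search(line)
        # The DETAIL line is expected right after the LOG line of the same crash.
        if detail and crashes and crashes[-1]["query"] is None:
            crashes[-1]["query"] = detail.group("query")
    return crashes


sample = [
    "2018-12-24 10:54:33.928 CST 24677     LOG:  server process (PID 13627) was terminated by signal 6: Aborted",
    "2018-12-24 10:54:33.928 CST 24677     DETAIL:  Failed process was running: select sum(calls) as sum_calls from pg_stat_statements",
    "2018-12-24 10:54:33.928 CST 24677     LOG:  terminating any other active server processes",
]
for crash in find_backend_crashes(sample):
    print(crash)
```

Collecting the PID together with the failing query makes it easier to match a log entry to the corresponding core file, such as the one for PID 13627 examined in the stack trace.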

## 3. Occurrence condition
The crash occurred when Zabbix monitoring executed this SQL: select sum(calls) as sum_calls from pg_stat_statements

## 4. Investigation
### A similar report
https://www.postgresql.org/message-id/1935021523893352%40web12o.yandex.ru

### Stack trace
[root@sndsdevsdd01 crash_20181224]# gdb /usr/pgsql-10/bin/postgres core.postgres.1545620056.13627
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-56.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/pgsql-10/bin/postgres...Reading symbols from /usr/lib/debug/usr/pgsql-10/bin/postgres.debug...done.
done.
[New Thread 13627]
Missing separate debuginfo for
Try: yum --disablerepo='*' --enablerepo='*-debug*' install /usr/lib/debug/.build-id/37/7408871ceed5b3282b8fc1d771a14fec2a7cc7
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
...Core was generated by `postgres: monitor lobadbt1'.
Program terminated with signal 6, Aborted.
#0  0x000000369e232625 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install audit-libs-2.2-2.el6.x86_64 cyrus-sasl-lib-2.1.23-13.el6.x86_64 glibc-2.12-1.166.el6_7.7.x86_64 keyutils-libs-1.4-4.el6.x86_64 krb5-libs-1.9-33.el6.x86_64 libcom_err-1.41.12-12.el6.x86_64 libcurl-7.19.7-26.el6_2.4.x86_64 libgcc-4.4.6-4.el6.x86_64 libicu-4.2.1-9.1.el6_2.x86_64 libidn-1.18-2.el6.x86_64 libselinux-2.0.94-5.3.el6.x86_64 libssh2-1.2.2-7.el6_2.3.x86_64 libstdc++-4.4.6-4.el6.x86_64 libxml2-2.7.6-4.el6_2.4.x86_64 nspr-4.9-1.el6.x86_64 nss-3.13.3-6.el6.x86_64 nss-softokn-freebl-3.12.9-11.el6.x86_64 nss-util-3.13.3-2.el6.x86_64 openldap-2.4.23-26.el6.x86_64 openssl-1.0.1e-42.el6_7.4.x86_64 pam-1.1.1-10.el6_2.1.x86_64 zlib-1.2.3-27.el6.x86_64
(gdb) bt
#0  0x000000369e232625 in raise () from /lib64/libc.so.6
#1  0x000000369e233e05 in abort () from /lib64/libc.so.6
#2  0x000000369e270537 in __libc_message () from /lib64/libc.so.6
#3  0x000000369e275f4e in malloc_printerr () from /lib64/libc.so.6
#4  0x000000369e278cf0 in _int_free () from /lib64/libc.so.6
#5  0x0000000000858ab7 in tuplestore_end (state=0x2b6cee0) at tuplestore.c:458
#6  0x00000000006088a5 in ExecEndFunctionScan (node=0x2b48d00) at nodeFunctionscan.c:550
#7  0x00000000005f7c9e in ExecEndPlan (queryDesc=<value optimized out>) at execMain.c:1606
#8  standard_ExecutorEnd (queryDesc=<value optimized out>) at execMain.c:495
#9  0x00000000005af67e in PortalCleanup (portal=0x2a68e08) at portalcmds.c:302
#10 0x000000000084d19a in PortalDrop (portal=0x2a68e08, isTopCommit=0 '\000') at portalmem.c:489
#11 0x0000000000723f62 in exec_simple_query (
    query_string=0x2a29ef8 "select sum(calls) as sum_calls from pg_stat_statements") at postgres.c:1109
#12 0x0000000000724f09 in PostgresMain (argc=<value optimized out>, argv=<value optimized out>, dbname=0x29b7960 "lobadbt1", username=<value optimized out>) at postgres.c:4088
#13 0x00000000006b8c8a in BackendRun (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:4405
#14 BackendStartup (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:4077
#15 ServerLoop (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1755
#16 PostmasterMain (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1363
#17 0x00000000006395e0 in main (argc=1, argv=0x29739e0) at main.c:228
(gdb) f 5
#5  0x0000000000858ab7 in tuplestore_end (state=0x2b6cee0) at tuplestore.c:458
458 BufFileClose(state->myfile);
(gdb) p state->myfile
$1 = (BufFile *) 0x74e1a68
(gdb) p *state->myfile
$2 = {numFiles = 1, files = 0x587d0a8, offsets = 0x5564668, isTemp = 1 '\001', isInterXact = 0 '\000',
  dirty = 0 '\000', resowner = 0x2974e68, curFile = 0, curOffset = 42890164, pos = 0, nbytes = 0,
  buffer = "152958850543,20181222153019086124,20181222153127806225,20181222153158113565,20181222153317920155,20181222153437816384,20181222153640243830,20181222153758325087,20181227,2018122215383582234"...}
(gdb) p debug_query_string
$3 = 0x2a29ef8 "select sum(calls) as sum_calls from pg_stat_statements"
(gdb)

Re: backend process was crashed when using zabbix monitor

From
Tom Lane
Date:
"梁博" <12020405@qq.com> writes:
> [ abort trap inside malloc_printerr ]

It looks like glibc noticed corruption of malloc's data structures, which
implies that something inside the server overwrote memory it shouldn't
have ... but just where the bug is is impossible to say from this info.

Do you use any other PG extensions besides pg_stat_statements?  Our
usual experience is that extensions have more bugs than the core server,
though in any one case that could be wrong.

It would be worth updating to current (9.6.11) just on general principles.
I'm too lazy to check the release notes to see whether we fixed any
memory-stomp bugs since 9.6.8, but we've certainly fixed many bugs since
then.

            regards, tom lane


Re:Re: backend process was crashed when using zabbix monitor

From
"梁博"
Date:
Hello Tom Lane,

Happy new year!

> Do you use any other PG extensions besides pg_stat_statements?  Our
> usual experience is that extensions have more bugs than the core server,
> though in any one case that could be wrong.

This problem is hard to reproduce (it has appeared only once so far across our multiple business systems), so we cannot say for certain whether other extensions are involved.

> It would be worth updating to current (9.6.11) just on general principles.
> I'm too lazy to check the release notes to see whether we fixed any
> memory-stomp bugs since 9.6.8, but we've certainly fixed many bugs since
> then.

In fact, we are using PostgreSQL 10.2 (I wrote 9.6.8 last time by mistake, sorry). Because the LWLock buffer_content problem also appears in our other systems (https://www.postgresql.org/message-id/flat/47384c2e.297.16703d5b958.Coremail.chjischj%40163.com ef3fb6e8d82013234067f92194), we plan to upgrade all our systems to 10.7 and then check whether this problem still occurs.

Thank you very much for your advice.



------------------ Original message ------------------
Sent: Friday, December 28, 2018, 23:36
To: "梁博" <12020405@qq.com>
Cc: "pgsql-sql" <pgsql-sql@lists.postgresql.org>
Subject: Re: backend process was crashed when using zabbix monitor

Re: backend process was crashed when using zabbix monitor

From
Peter Geoghegan
Date:
On Fri, Dec 28, 2018 at 7:36 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It would be worth updating to current (9.6.11) just on general principles.
> I'm too lazy to check the release notes to see whether we fixed any
> memory-stomp bugs since 9.6.8, but we've certainly fixed many bugs since
> then.

My guess is the same as it was back in April: tuplestore and/or one of
its callers gets confused about some aspect of resource management,
leading to a double-free.
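
As a rough illustration of that failure mode, here is a toy Python model (not PostgreSQL's or glibc's actual code) in which two parties each believe they own the same buffer and both release it. `ToyHeap` and `ToyTupleStore` are hypothetical names invented for this sketch. The real glibc checks are heuristics over malloc's internal chunk metadata and end in `abort()`, which is the signal 6 seen in the log; the toy raises an exception instead.

```python
class ToyHeap:
    """Toy allocator that refuses to free a block that is not live."""

    def __init__(self):
        self._next_id = 0
        self._live = set()

    def alloc(self):
        self._next_id += 1
        self._live.add(self._next_id)
        return self._next_id

    def free(self, block):
        if block not in self._live:
            # glibc would abort the process here; the toy raises instead.
            raise RuntimeError(f"double free or corruption: block {block}")
        self._live.remove(block)


class ToyTupleStore:
    """Hypothetical stand-in for a store that owns a temp-file buffer."""

    def __init__(self, heap):
        self.heap = heap
        self.myfile = heap.alloc()

    def end(self):
        # Releases the buffer without checking whether someone else already did.
        self.heap.free(self.myfile)


heap = ToyHeap()
store = ToyTupleStore(heap)

# Some other cleanup path releases the buffer first ...
heap.free(store.myfile)

# ... and the store, unaware of that, releases it again when it is shut down.
try:
    store.end()
except RuntimeError as exc:
    print("detected:", exc)
```

The point of the sketch is only that the second release is what gets reported, so the frame that crashes (here `end()`) is not necessarily where the ownership mistake was made.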

Last time it was also a view that expands to a function call
(pg_buffercache, rather than pg_stat_statements).

I have a question for 梁博: Does pgpool happen to be involved here? That
was a factor in the bug report back in April. It's not impossible that
it's part of the problem here, too.

--
Peter Geoghegan


Re:Re: backend process was crashed when using zabbix monitor

From
"梁博"
Date:
>I have a question for 梁博: Does pgpool happen to be involved here? That
>was a factor in the bug report back in April. It's not impossible that
>it's part of the problem here, too.

Pgpool is not used in our system (we use pgbouncer). Zabbix monitoring is deployed locally.

------------------ Original message ------------------
Sent: Tuesday, January 1, 2019, 9:27
To: "Tom Lane" <tgl@sss.pgh.pa.us>
Cc: "梁博" <12020405@qq.com>; "pgsql-sql" <pgsql-sql@lists.postgresql.org>
Subject: Re: backend process was crashed when using zabbix monitor