Thread: BUG #8397: pg_basebackup -x from new standby server sometimes causes Segmentation fault

BUG #8397: pg_basebackup -x from new standby server sometimes causes Segmentation fault

From
harukat@sraoss.co.jp
Date:
The following bug has been logged on the website:

Bug reference:      8397
Logged by:          TAKATSUKA Haruka
Email address:      harukat@sraoss.co.jp
PostgreSQL version: 9.2.4
Operating system:   Linux (CentOS6)
Description:

Hi.


I would like to report a small bug.
Running pg_basebackup -x against a new standby server sometimes causes a
segmentation fault.


(1) create a new standby server directory with pg_basebackup, without -x
(2) start the new standby server
(3) run pg_basebackup against the new standby server with -x
(!) when the new standby has no WAL files in pg_xlog,
    the new standby's WAL sender crashes


new standby server's core file:


Core was generated by `postgres: wal sender process postgres ::1(55210)
sending backup "pg_basebackup'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000003b7368ac66 in __rawmemchr_sse2 () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install
glibc-2.12-1.107.el6.x86_64 libxml2-2.7.6-4.el6.x86_64
zlib-1.2.3-27.el6.x86_64
(gdb) bt
#0  0x0000003b7368ac66 in __rawmemchr_sse2 () from /lib64/libc.so.6
#1  0x0000003b73675990 in _IO_str_init_static_internal () from
/lib64/libc.so.6
#2  0x0000003b73669935 in vsscanf () from /lib64/libc.so.6
#3  0x0000003b736639a8 in sscanf () from /lib64/libc.so.6
#4  0x0000000000622351 in perform_base_backup (opt=0x7fffc2e22300,
    tblspcdir=0xd424c0) at basebackup.c:304
#5  0x0000000000622c50 in SendBaseBackup (cmd=<value optimized out>)
    at basebackup.c:558
#6  0x000000000061f5b0 in HandleReplicationCommand () at walsender.c:482
#7  WalSndHandshake () at walsender.c:257
#8  WalSenderMain () at walsender.c:181
#9  0x0000000000650b12 in PostgresMain (argc=1, argv=<value optimized out>,
    dbname=0xc82a90 "", username=0xc82a70 "postgres") at postgres.c:3715
#10 0x000000000060c4f1 in BackendRun () at postmaster.c:3614
#11 BackendStartup () at postmaster.c:3304
#12 ServerLoop () at postmaster.c:1367
#13 0x000000000060f031 in PostmasterMain (argc=<value optimized out>,
    argv=<value optimized out>) at postmaster.c:1127
#14 0x00000000005ae140 in main (argc=5, argv=0xc80bb0) at main.c:199




./backend/replication/basebackup.c:304
   XLogFromFileName(walFiles[0], &tli, &logid, &logseg);


In this case nWalFiles = 0 and walFiles[] was palloc'd with zero size,
so walFiles[0] is read past the end of the array.


pg_basebackup does not have to succeed in this rare case, but
we should add a check like "if (nWalFiles <= 0) ereport(...);".




regards,
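
The guard proposed above can be sketched outside the server as follows. This is a minimal stand-alone illustration, not the patch that was applied: first_wal_segment() is a hypothetical helper, the error is printed with fprintf() rather than raised with ereport(), and the "%08X%08X%08X" format is an assumption about what XLogFromFileName() expands to in 9.2, inferred from the sscanf() frame in the backtrace.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the logic around basebackup.c:304.
 * Returns 0 on success, -1 if there is no usable first WAL file name.
 */
int first_wal_segment(char *const *walFiles, int nWalFiles,
                      unsigned int *tli, unsigned int *logid,
                      unsigned int *logseg)
{
    assert(tli != NULL && logid != NULL && logseg != NULL);

    /* The proposed guard: never read walFiles[0] when the list is empty. */
    if (nWalFiles < 1 || walFiles == NULL)
    {
        fprintf(stderr, "could not find any WAL files\n");
        return -1;
    }

    /* Assumed name layout: three 8-digit hex fields (timeline, log, segment). */
    if (sscanf(walFiles[0], "%08X%08X%08X", tli, logid, logseg) != 3)
    {
        fprintf(stderr, "unexpected WAL file name \"%s\"\n", walFiles[0]);
        return -1;
    }
    return 0;
}
```

With an empty list the helper returns -1 without dereferencing anything, which is the behaviour the report asks for: an error instead of a crash.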

Re: BUG #8397: pg_basebackup -x from new standby server sometimes causes Segmentation fault

From
Magnus Hagander
Date:
On Sat, Aug 24, 2013 at 1:46 PM,  <harukat@sraoss.co.jp> wrote:
> The following bug has been logged on the website:
>
> Bug reference:      8397
> Logged by:          TAKATSUKA Haruka
> Email address:      harukat@sraoss.co.jp
> PostgreSQL version: 9.2.4
> Operating system:   Linux (CentOS6)
> Description:
>
> Hi.
>
>
> I would like to report a small bug.
> Running pg_basebackup -x against a new standby server sometimes causes a
> segmentation fault.
>
>
> (1) create a new standby server directory with pg_basebackup, without -x
> (2) start the new standby server
> (3) run pg_basebackup against the new standby server with -x
> (!) when the new standby has no WAL files in pg_xlog,
>     the new standby's WAL sender crashes
>
>
> new standby server's core file:
>
>
> Core was generated by `postgres: wal sender process postgres ::1(55210)
> sending backup "pg_basebackup'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x0000003b7368ac66 in __rawmemchr_sse2 () from /lib64/libc.so.6
> Missing separate debuginfos, use: debuginfo-install
> glibc-2.12-1.107.el6.x86_64 libxml2-2.7.6-4.el6.x86_64
> zlib-1.2.3-27.el6.x86_64
> (gdb) bt
> #0  0x0000003b7368ac66 in __rawmemchr_sse2 () from /lib64/libc.so.6
> #1  0x0000003b73675990 in _IO_str_init_static_internal () from
> /lib64/libc.so.6
> #2  0x0000003b73669935 in vsscanf () from /lib64/libc.so.6
> #3  0x0000003b736639a8 in sscanf () from /lib64/libc.so.6
> #4  0x0000000000622351 in perform_base_backup (opt=0x7fffc2e22300,
>     tblspcdir=0xd424c0) at basebackup.c:304
> #5  0x0000000000622c50 in SendBaseBackup (cmd=<value optimized out>)
>     at basebackup.c:558
> #6  0x000000000061f5b0 in HandleReplicationCommand () at walsender.c:482
> #7  WalSndHandshake () at walsender.c:257
> #8  WalSenderMain () at walsender.c:181
> #9  0x0000000000650b12 in PostgresMain (argc=1, argv=<value optimized out>,
>     dbname=0xc82a90 "", username=0xc82a70 "postgres") at postgres.c:3715
> #10 0x000000000060c4f1 in BackendRun () at postmaster.c:3614
> #11 BackendStartup () at postmaster.c:3304
> #12 ServerLoop () at postmaster.c:1367
> #13 0x000000000060f031 in PostmasterMain (argc=<value optimized out>,
>     argv=<value optimized out>) at postmaster.c:1127
> #14 0x00000000005ae140 in main (argc=5, argv=0xc80bb0) at main.c:199
>
>
>
>
> ./backend/replication/basebackup.c:304
>    XLogFromFileName(walFiles[0], &tli, &logid, &logseg);
>
>
> In this case nWalFiles = 0 and walFiles[] was palloc'd with zero size,
> so walFiles[0] is read past the end of the array.
>
>
> pg_basebackup does not have to succeed in this rare case, but
> we should add a check like "if (nWalFiles <= 0) ereport(...);".

Yes, we definitely need better error checking there - a crash is never
the right answer.

Does this happen only when you take a backup "really quickly" after
setting up the new standby, or is there some scenario further in its
lifetime when it can happen? In the first case, throwing a hard error
seems quite reasonable, but if it's repeatable, perhaps there is
something better we can do?

Also, while we definitely need a sanity check at this point, might it
be worth putting a second check earlier in the process as well?
AFAICT this error gets thrown only after all the data has already been
sent.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
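
One possible shape for the earlier check suggested above, as a rough sketch only. All three functions are hypothetical and are not part of PostgreSQL; the rule that a WAL segment name is exactly 24 upper-case hex digits is an assumption based on names such as 000000010000000000000003. The idea is to look at the WAL directory listing before any data is sent and refuse a WAL-including backup if nothing usable is there.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: a WAL segment file name is exactly 24 upper-case hex digits. */
int looks_like_wal_segment(const char *name)
{
    return strlen(name) == 24 && strspn(name, "0123456789ABCDEF") == 24;
}

/* Count the entries of a directory listing that look like WAL segments. */
int count_wal_segments(const char *const *names, size_t n)
{
    int count = 0;

    assert(names != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
    {
        if (looks_like_wal_segment(names[i]))
            count++;
    }
    return count;
}

/*
 * Hypothetical pre-flight check, run before any data is sent.
 * Returns 0 to proceed, -1 to refuse a backup that must include WAL.
 */
int precheck_include_wal(const char *const *names, size_t n, int include_wal)
{
    if (include_wal && count_wal_segments(names, n) == 0)
        return -1;
    return 0;
}
```

A check like this is only a heuristic: WAL files may still arrive while the backup is running, so refusing early could be too strict, and the check at the end of the backup is needed either way.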

Re: BUG #8397: pg_basebackup -x from new standby server sometimes causes Segmentation fault

From
TAKATSUKA Haruka
Date:
Thanks for the response.

On Sat, 24 Aug 2013 17:04:21 +0200
Magnus Hagander <magnus@hagander.net> wrote:

> > (1) create a new standby server directory with pg_basebackup, without -x
> > (2) start the new standby server
> > (3) run pg_basebackup against the new standby server with -x
> > (!) when the new standby has no WAL files in pg_xlog,
> >     the new standby's WAL sender crashes
(snip)
> > pg_basebackup does not have to succeed in this rare case, but
> > we should add a check like "if (nWalFiles <= 0) ereport(...);".
>
> Yes, we definitely need better error checking there - a crash is never
> the right answer.
>
> Does this happen only when you take a backup "really quickly" after
> setting up the new standby,

It happens only in this first case.
Therefore, we recognize that this is a problem with how it is used.

regards,


> or is there some scenario further in its
> lifetime when it can happen? In the first case, throwing a hard error
> seems quite reasonable, but if it's repeatable, perhaps there is
> something better we can do?
>
> Also, while we definitely need a sanity check at this point, might it
> be worth putting a second check earlier in the process as well?
> AFAICT this error gets thrown only after all the data has already been
> sent.
>
> --
>  Magnus Hagander
>  Me: http://www.hagander.net/
>  Work: http://www.redpill-linpro.com/

______________________________________________________
  harukat@sraoss.co.jp    (SRA OSS, Inc.  http://www.sraoss.co.jp)

Re: BUG #8397: pg_basebackup -x from new standby server sometimes causes Segmentation fault

From
Magnus Hagander
Date:
On Sun, Aug 25, 2013 at 9:05 AM, TAKATSUKA Haruka <harukat@sraoss.co.jp> wrote:
> Thanks for the response.
>
> On Sat, 24 Aug 2013 17:04:21 +0200
> Magnus Hagander <magnus@hagander.net> wrote:
>
>> > (1) create a new standby server directory with pg_basebackup, without -x
>> > (2) start the new standby server
>> > (3) run pg_basebackup against the new standby server with -x
>> > (!) when the new standby has no WAL files in pg_xlog,
>> >     the new standby's WAL sender crashes
> (snip)
>> > pg_basebackup does not have to succeed in this rare case, but
>> > we should add a check like "if (nWalFiles <= 0) ereport(...);".
>>
>> Yes, we definitely need better error checking there - a crash is never
>> the right answer.
>>
>> Does this happen only when you take a backup "really quickly" after
>> setting up the new standby,
>
> It happens only in this first case.
> Therefore, we recognize that this is a problem with how it is used.

Yeah. Ok, for now I have the patch I applied yesterday that makes it
an error instead of a crash per your suggestion. And if I failed to
mention it, thanks for the report!


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/