Re: several problems in pg_receivexlog
From | Magnus Hagander |
---|---|
Subject | Re: several problems in pg_receivexlog |
Date | |
Msg-id | CABUevEwrxKaz-ifrUXBdiw_UrUv8mS9sF_hEE0SEZF5YYvFodg@mail.gmail.com |
In response to | Re: several problems in pg_receivexlog (Fujii Masao <masao.fujii@gmail.com>) |
Responses | Re: several problems in pg_receivexlog (Fujii Masao <masao.fujii@gmail.com>) |
List | pgsql-hackers |
On Thu, Jul 12, 2012 at 6:07 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Thu, Jul 12, 2012 at 8:39 PM, Magnus Hagander <magnus@hagander.net> wrote:
>> On Tue, Jul 10, 2012 at 7:03 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> On Tue, Jul 10, 2012 at 3:23 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>>> Hi,
>>>>
>>>> I found several problems in pg_receivexlog, e.g., memory leaks,
>>>> file-descripter leaks, ..etc. The attached patch fixes these problems.
>>>>
>>>> ISTM there are still some other problems in pg_receivexlog, so I'll
>>>> read it deeply later.
>>>
>>> While pg_basebackup background process is streaming WAL records,
>>> if its replication connection is terminated (e.g., walsender in the server
>>> is accidentally terminated by SIGTERM signal), pg_basebackup ends
>>> up failing to include all required WAL files in the backup. The problem
>>> is that, in this case, pg_basebackup doesn't emit any error message at all.
>>> So an user might misunderstand that a base backup has been successfully
>>> taken even though it doesn't include all required WAL files.
>>
>> Ouch. That is definitely a bug if it behaves that way.
>>
>>
>>> To fix this problem, I think that, when the replication connection is
>>> terminated, ReceiveXlogStream() should check whether we've already
>>> reached the stop point by calling stream_stop() before returning TRUE.
>>> If we've not yet (this means that we've not received all required WAL
>>> files yet), ReceiveXlogStream() should return FALSE and
>>> pg_basebackup should emit an error message. Comments?
>>
>> Doesn't it already return false because it detects the error of the
>> connection? What's the codepath where we end up returning true even
>> though we had a connection failure? Shouldn't that end up under the
>> "could not read copy data" branch, which already returns false?
>
> You're right. If the error is detected, that function always returns false
> and the error message is emitted (but I think that current error message
> "pg_basebackup: child process exited with error 1" is confusing....),
> so it's OK. But if walsender in the server is terminated by SIGTERM,
> no error is detected and pg_basebackup background process gets out
> of the loop in ReceiveXlogStream() and returns true.

Oh. Because the server does a graceful shutdown. D'uh, of course.

Then yes, your suggested fix seems like a good one.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/