Обсуждение: Re: Your review of pg_receivexlog/pg_basebackup

Поиск
Список
Период
Сортировка

Re: Your review of pg_receivexlog/pg_basebackup

От
Heikki Linnakangas
Дата:
(CC'ing pgsql-hackers, this started as an IM discussion yesterday but 
really belongs in the archives)

On 25.10.2011 23:52, Magnus Hagander wrote:
>> There's a tiny chance to get incomplete xlog files with pg_receivexlog if you crash:
>> 1. pg_receivexlog finishes write()ing a file but system crashes before fsync() finishes.
>> 2. When pg_receivexlog restarts after crash, the last WAL file was not fully flushed to disk, with
>> holes in the middle, but it has the right length. pg_receivexlog will continue streaming from the next file.
>> not sure if we care about such a narrow window, but maybe we do
>
> So how would we go about fixing that?  Always unlink the last file in
> the directory and try from there would seem dangerous too - what if
> it's not available on the master anymore, then we might have given up
> on data...

Start streaming from the beginning of the last segment, but don't unlink 
it first. Just overwrite it as you receive the data.

Or, always create new xlog file as "0000000100000001000000D3.partial", 
and only when it's fully written, fsync it, and then rename it to 
"0000000100000001000000D3". Then you know that if a file doesn't have 
the .partial suffix, it's complete. The fact that the last partial file 
always has the .partial suffix needs some extra pushups in the 
restore_command, though.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Your review of pg_receivexlog/pg_basebackup

От
Magnus Hagander
Дата:
On Wed, Oct 26, 2011 at 09:52, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> (CC'ing pgsql-hackers, this started as an IM discussion yesterday but really
> belongs in the archives)
>
> On 25.10.2011 23:52, Magnus Hagander wrote:
>>>
>>> There's a tiny chance to get incomplete xlog files with pg_receivexlog if
>>> you crash:
>>> 1. pg_receivexlog finishes write()ing a file but system crashes before
>>> fsync() finishes.
>>> 2. When pg_receivexlog restarts after crash, the last WAL file was not
>>> fully flushed to disk, with
>>> holes in the middle, but it has the right length. pg_receivexlog will
>>> continue streaming from the next file.
>>> not sure if we care about such a narrow window, but maybe we do
>>
>> So how would we go about fixing that?  Always unlink the last file in
>> the directory and try from there would seem dangerous too - what if
>> it's not available on the master anymore, then we might have given up
>> on data...
>
> Start streaming from the beginning of the last segment, but don't unlink it
> first. Just overwrite it as you receive the data.
>
> Or, always create new xlog file as "0000000100000001000000D3.partial", and
> only when it's fully written, fsync it, and then rename it to
> "0000000100000001000000D3". Then you know that if a file doesn't have the
> .partial suffix, it's complete. The fact that the last partial file always
> has the .partial suffix needs some extra pushups in the restore_command,
> though.

Here's a version that does this. Turns out this requires a lot less
code than what was previously in there, which is always nice.

We still need to solve the other part which is how to deal with the
partial files on restore. But this is definitely a cleaner way from a
pure pg_receivexlog perspective.

Comments/reviews?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Вложения

Re: Your review of pg_receivexlog/pg_basebackup

От
Fujii Masao
Дата:
On Thu, Oct 27, 2011 at 11:14 PM, Magnus Hagander <magnus@hagander.net> wrote:
> Here's a version that does this. Turns out this requires a lot less
> code than what was previously in there, which is always nice.
>
> We still need to solve the other part which is how to deal with the
> partial files on restore. But this is definitely a cleaner way from a
> pure pg_receivexlog perspective.
>
> Comments/reviews?

Looks good.

Minor comment:
the source code comment of FindStreamingStart() seems to need to be updated.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Your review of pg_receivexlog/pg_basebackup

От
Magnus Hagander
Дата:
On Fri, Oct 28, 2011 at 08:46, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Thu, Oct 27, 2011 at 11:14 PM, Magnus Hagander <magnus@hagander.net> wrote:
>> Here's a version that does this. Turns out this requires a lot less
>> code than what was previously in there, which is always nice.
>>
>> We still need to solve the other part which is how to deal with the
>> partial files on restore. But this is definitely a cleaner way from a
>> pure pg_receivexlog perspective.
>>
>> Comments/reviews?
>
> Looks good.
>
> Minor comment:
> the source code comment of FindStreamingStart() seems to need to be updated.

Here's an updated patch that both includes this update to the comment,
and also the functionality to pre-pad files to 16Mb. This also seems
to have simplified the code, which is a nice bonus.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Вложения

Re: Your review of pg_receivexlog/pg_basebackup

От
Fujii Masao
Дата:
On Tue, Nov 1, 2011 at 3:08 AM, Magnus Hagander <magnus@hagander.net> wrote:
> On Fri, Oct 28, 2011 at 08:46, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Thu, Oct 27, 2011 at 11:14 PM, Magnus Hagander <magnus@hagander.net> wrote:
>>> Here's a version that does this. Turns out this requires a lot less
>>> code than what was previously in there, which is always nice.
>>>
>>> We still need to solve the other part which is how to deal with the
>>> partial files on restore. But this is definitely a cleaner way from a
>>> pure pg_receivexlog perspective.
>>>
>>> Comments/reviews?
>>
>> Looks good.
>>
>> Minor comment:
>> the source code comment of FindStreamingStart() seems to need to be updated.
>
> Here's an updated patch that both includes this update to the comment,
> and also the functionality to pre-pad files to 16Mb. This also seems
> to have simplified the code, which is a nice bonus.

Here are the comments:

In open_walfile(), "zerobuf" needs to be free'd after use of it.

+    f = open(fn, O_WRONLY | O_CREAT | PG_BINARY, 0666);

We should use "S_IRUSR | S_IWUSR" instead of "0666" as a file access modes?

+        if (write(f, zerobuf, XLOG_BLCKSZ) != XLOG_BLCKSZ)
+        {
+            fprintf(stderr, _("%s: could not pad WAL segment %s: %s\n"),
+                    progname, fn, strerror(errno));
+            close(f);
+            return -1;
+        }

When write() fails, we should delete the partial WAL file, like
XLogFileInit() does?
If not, subsequent pg_receivexlog always fails unless a user deletes
it manually.
Because open_walfile() always fails when it finds an existing partial WAL file.

When open_walfile() fails, pg_receivexlog exits without closing the connection.
I don't think this is good error handling. But this issue itself is
not what we're
trying to address now. So I think you can commit separately from current patch.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Your review of pg_receivexlog/pg_basebackup

От
Magnus Hagander
Дата:
On Tue, Nov 1, 2011 at 05:53, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Tue, Nov 1, 2011 at 3:08 AM, Magnus Hagander <magnus@hagander.net> wrote:
>> On Fri, Oct 28, 2011 at 08:46, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> On Thu, Oct 27, 2011 at 11:14 PM, Magnus Hagander <magnus@hagander.net> wrote:
>>>> Here's a version that does this. Turns out this requires a lot less
>>>> code than what was previously in there, which is always nice.
>>>>
>>>> We still need to solve the other part which is how to deal with the
>>>> partial files on restore. But this is definitely a cleaner way from a
>>>> pure pg_receivexlog perspective.
>>>>
>>>> Comments/reviews?
>>>
>>> Looks good.
>>>
>>> Minor comment:
>>> the source code comment of FindStreamingStart() seems to need to be updated.
>>
>> Here's an updated patch that both includes this update to the comment,
>> and also the functionality to pre-pad files to 16Mb. This also seems
>> to have simplified the code, which is a nice bonus.
>
> Here are the comments:
>
> In open_walfile(), "zerobuf" needs to be free'd after use of it.

Ooops, yes.


> +       f = open(fn, O_WRONLY | O_CREAT | PG_BINARY, 0666);
>
> We should use "S_IRUSR | S_IWUSR" instead of "0666" as a file access modes?

Agreed, changed.


> +               if (write(f, zerobuf, XLOG_BLCKSZ) != XLOG_BLCKSZ)
> +               {
> +                       fprintf(stderr, _("%s: could not pad WAL segment %s: %s\n"),
> +                                       progname, fn, strerror(errno));
> +                       close(f);
> +                       return -1;
> +               }
>
> When write() fails, we should delete the partial WAL file, like
> XLogFileInit() does?

Yes, that's probably a good idae. Added a simple unlink() call
directly after the close().

> If not, subsequent pg_receivexlog always fails unless a user deletes
> it manually.
> Because open_walfile() always fails when it finds an existing partial WAL file.
>
> When open_walfile() fails, pg_receivexlog exits without closing the connection.
> I don't think this is good error handling. But this issue itself is
> not what we're
> trying to address now. So I think you can commit separately from current patch.

Wow, when looking into that, there was a nice bug in open_walfile -
when the file failed to open, it would write that error message but
not return - then proceed to write a second error message from fstat.
Oops.

Anyway - yes, the return value of ReceiveXLogStream isn't checked at
all. That's certainly not very nice. I'll go fix that too.

I'll apply the patch with the fixes you've mentioned above. Please
check master again in a few minutes. Thanks!


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/