Обсуждение: [HACKERS] Reversed sync check in pg_receivewal

Поиск
Список
Период
Сортировка

[HACKERS] Reversed sync check in pg_receivewal

От
Magnus Hagander
Дата:
This bug seems to have snuck in there with the introduction of walmethods. AFAICT we are testing the result of sync() backwards, so whenever a partial segment exists for pg_receivewal, it will fail. It will then unlink the file, so when it retries 5 seconds later it works.

It also doesn't log the failure. Oops.

Attached patch reverses the check, and adds a failure message. I'd appreciate a quick review in case I have the logic backwards in my head...

--
Вложения

Re: [HACKERS] Reversed sync check in pg_receivewal

От
Tom Lane
Дата:
Magnus Hagander <magnus@hagander.net> writes:
> Attached patch reverses the check, and adds a failure message. I'd
> appreciate a quick review in case I have the logic backwards in my head...

I think the patch is correct, but if there's any documentation of the
walmethod APIs that would allow one to assert which side of the API got
this wrong, I sure don't see it.  Would it be unreasonable to insist
on some documentation around that?
        regards, tom lane



Re: [HACKERS] Reversed sync check in pg_receivewal

От
Michael Paquier
Дата:
On Tue, Apr 11, 2017 at 9:41 PM, Magnus Hagander <magnus@hagander.net> wrote:
> This bug seems to have snuck in there with the introduction of walmethods.
> AFAICT we are testing the result of sync() backwards, so whenever a partial
> segment exists for pg_receivewal, it will fail. It will then unlink the
> file, so when it retries 5 seconds later it works.
>
> It also doesn't log the failure. Oops.
>
> Attached patch reverses the check, and adds a failure message. I'd
> appreciate a quick review in case I have the logic backwards in my head...

This has been fat-fingered in 56c7d8d4, and looking around I am not
seeing similar mistakes. Thanks for the fix.
-- 
Michael



Re: [HACKERS] Reversed sync check in pg_receivewal

От
Aleksander Alekseev
Дата:
Hi Magnus,

> Attached patch reverses the check, and adds a failure message. I'd
> appreciate a quick review in case I have the logic backwards in my head...

Well, I can state that `make check-world` passes on my laptop and that
code seems to be right. However documentation to WalWriteMethod could
probably be improved. Currently there is none.

--
Best regards,
Aleksander Alekseev

Re: [HACKERS] Reversed sync check in pg_receivewal

От
Magnus Hagander
Дата:
On Tue, Apr 11, 2017 at 3:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Magnus Hagander <magnus@hagander.net> writes:
> Attached patch reverses the check, and adds a failure message. I'd
> appreciate a quick review in case I have the logic backwards in my head...

I think the patch is correct, but if there's any documentation of the
walmethod APIs that would allow one to assert which side of the API got
this wrong, I sure don't see it.  Would it be unreasonable to insist
on some documentation around that?


Agreed.

Would you say comments in the struct in walmethods.h is enough, or were you thinking actual sgml docs when you commented that? 


--

Re: [HACKERS] Reversed sync check in pg_receivewal

От
Tom Lane
Дата:
Magnus Hagander <magnus@hagander.net> writes:
> On Tue, Apr 11, 2017 at 3:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think the patch is correct, but if there's any documentation of the
>> walmethod APIs that would allow one to assert which side of the API got
>> this wrong, I sure don't see it.  Would it be unreasonable to insist
>> on some documentation around that?

> Would you say comments in the struct in walmethods.h is enough, or were you
> thinking actual sgml docs when you commented that?

This is just internal to pg_basebackup isn't it?  I think comments in
walmethods.h would be plenty.
        regards, tom lane



Re: [HACKERS] Reversed sync check in pg_receivewal

От
Magnus Hagander
Дата:
On Tue, Apr 11, 2017 at 3:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Magnus Hagander <magnus@hagander.net> writes:
> On Tue, Apr 11, 2017 at 3:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think the patch is correct, but if there's any documentation of the
>> walmethod APIs that would allow one to assert which side of the API got
>> this wrong, I sure don't see it.  Would it be unreasonable to insist
>> on some documentation around that?

> Would you say comments in the struct in walmethods.h is enough, or were you
> thinking actual sgml docs when you commented that?

This is just internal to pg_basebackup isn't it?  I think comments in
walmethods.h would be plenty.

Local to pg_basebackup and pg_receivewal, yes.

Something like the attached? 

--
Вложения

Re: [HACKERS] Reversed sync check in pg_receivewal

От
Tom Lane
Дата:
Magnus Hagander <magnus@hagander.net> writes:
> Something like the attached?

Not sure about

+ * All methods that have a failure path will set errno on failure.

Given that you've got a getlasterror method, I don't think that's really
the API invariant is it?  If it were, you'd just have the callers doing
strerror(errno) for themselves.  I think maybe a more accurate statement
would be
* All methods that have a failure return indicator will set state* allowing the getlasterror() method to return a
suitablemessage.* Commonly, errno is this state (or part of it); so callers must take* care not to clobber errno
betweena failed method call and use of* getlasterror() to retrieve the message.
 

Also:

* the arguments of open_for_write() could stand to be explained

* what's the return value of finish()?

* wouldn't it be better to declare getlasterror() as returning "const char *"?
        regards, tom lane



Re: [HACKERS] Reversed sync check in pg_receivewal

От
Magnus Hagander
Дата:


On Tue, Apr 11, 2017 at 6:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Magnus Hagander <magnus@hagander.net> writes:
> Something like the attached?

Not sure about

+ * All methods that have a failure path will set errno on failure.

Given that you've got a getlasterror method, I don't think that's really
the API invariant is it?  If it were, you'd just have the callers doing
strerror(errno) for themselves.  I think maybe a more accurate statement
would be

Hmm. You're right, this is what I get for rushing a patch before plane departure :/
 
 * All methods that have a failure return indicator will set state
 * allowing the getlasterror() method to return a suitable message.
 * Commonly, errno is this state (or part of it); so callers must take
 * care not to clobber errno between a failed method call and use of
 * getlasterror() to retrieve the message.


Yeah, that's much better.

 
Also:

* the arguments of open_for_write() could stand to be explained

* what's the return value of finish()?

Both fixed.
 

* wouldn't it be better to declare getlasterror() as returning
  "const char *"?

Yeah, probably is. Will do and push.

Thanks!
 
--