Re: Make mesage at end-of-recovery less scary.

From: James Coleman
Subject: Re: Make mesage at end-of-recovery less scary.
Msg-id: CAAaqYe88ENQp=ksG4c1J-Mi1axM5dtO+48x8tC3afE-_Z_qFSw@mail.gmail.com
In reply to: Re: Make mesage at end-of-recovery less scary. (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Thu, Mar 26, 2020 at 12:41 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Mar 25, 2020 at 8:53 AM Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
> > HINT:  This is to be expected if this is the end of the WAL.  Otherwise,
> > it could indicate corruption.
>
> First, I agree that this general issue is a problem, because it's come
> up for me in quite a number of customer situations. Either people get
> scared when they shouldn't, because the message is innocuous, or they
> don't get scared about other things that actually are scary, because
> if some scary-looking messages are actually innocuous, it can lead
> people to believe that the same is true in other cases.
>
> Second, I don't really like the particular formulation you have above,
> because the user still doesn't know whether or not to be scared. Can
> we figure that out? I think that if we're in crash recovery, we
> should not be scared, because we have no alternative to assuming
> that we've reached the end of WAL, so all crash recoveries will end
> like this. If we're in archive recovery, we should definitely be
> scared if we haven't yet reached the minimum recovery point, because
> more WAL than that should certainly exist. After that, it depends on
> how we got the WAL. If it's being streamed, the question is whether
> we've reached the end of what got streamed. If it's being copied from
> the archive, we ought to have the whole segment, but maybe not more.
> Can we get the right context to the point where the error is being
> reported to know whether we hit the error at the end of the WAL that
> was streamed? If not, can we somehow rejigger things so that we only
> make it sound scary if we keep getting stuck at the same point when we
> would've expected to make progress meanwhile?
>
> I'm just spitballing here, but it would be really good if there's a
> way to know definitely whether or not you should be scared. Corrupted
> WAL segments are definitely a thing that happens, but retries are a
> lot more common.
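The decision procedure described in the quoted paragraph can be written out as a small classifier, which makes the cases easier to compare. This is a minimal, hypothetical sketch, not PostgreSQL code: the names `Source`, `Severity`, `RecoveryState`, `classify_invalid_record` and `StuckDetector` are invented for illustration, LSNs are modelled as plain integers, a 16 MB segment size (the default `wal_segment_size`) is assumed, and the "stuck at the same point" fallback is approximated by an assumed retry threshold.

```python
from dataclasses import dataclass
from enum import Enum, auto

# Hypothetical names for illustration only; not PostgreSQL's actual API.
SEGMENT_SIZE = 16 * 1024 * 1024  # assumed: default wal_segment_size


class Source(Enum):
    CRASH = auto()    # crash recovery from local WAL
    STREAM = auto()   # archive recovery, WAL received by streaming
    ARCHIVE = auto()  # archive recovery, WAL restored segment by segment


class Severity(Enum):
    EXPECTED = auto()    # normal end of currently available WAL
    SUSPICIOUS = auto()  # could indicate corruption


@dataclass
class RecoveryState:
    source: Source
    min_recovery_point: int  # LSN modelled as a plain integer
    streamed_up_to: int = 0  # end of WAL received so far (streaming only)


def classify_invalid_record(state: RecoveryState, error_lsn: int) -> Severity:
    """Decide how alarming an invalid-record error at error_lsn should be."""
    if state.source is Source.CRASH:
        # Crash recovery can only find the end of WAL this way, so every
        # crash recovery ends with such an error.
        return Severity.EXPECTED
    if error_lsn < state.min_recovery_point:
        # WAL up to the minimum recovery point must exist.
        return Severity.SUSPICIOUS
    if state.source is Source.STREAM:
        # Expected only if we stopped at the end of what was streamed.
        if error_lsn >= state.streamed_up_to:
            return Severity.EXPECTED
        return Severity.SUSPICIOUS
    # ARCHIVE: a restored segment should be complete, so only a failure
    # exactly at a segment boundary (next segment not yet there) is expected.
    if error_lsn % SEGMENT_SIZE == 0:
        return Severity.EXPECTED
    return Severity.SUSPICIOUS


class StuckDetector:
    """Fallback when context is missing: escalate only after repeated
    failures at the same LSN (assumed threshold of retries)."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.last_lsn = None
        self.count = 0

    def record_failure(self, error_lsn: int) -> Severity:
        if error_lsn == self.last_lsn:
            self.count += 1
        else:
            # Progress was made since the last failure; reset the counter.
            self.last_lsn = error_lsn
            self.count = 1
        if self.count >= self.threshold:
            return Severity.SUSPICIOUS
        return Severity.EXPECTED
```

For example, `classify_invalid_record(RecoveryState(Source.CRASH, 0), 12345)` returns `Severity.EXPECTED`, while an error below `min_recovery_point` during archive recovery returns `Severity.SUSPICIOUS`. The message wording (or log level) could then be chosen from the returned value.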

First, I agree that getting enough context to say precisely whether to be concerned is by far the ideal.

That being said, as an end user who's found this surprising -- and
momentarily scary every time I initially scan it, even though I *know
intellectually it's not* -- I would find Peter's suggestion a
significant improvement over what we have now. I'm fairly certain my
co-workers on our database team would too. Knowing that something is
at least not always scary is good. I'll grant that this cuts the other
way as well: if the situation actually is scary, this lowers your level
of concern. On the other hand, monitoring would tell us if there's a
real problem (namely replication lag), so I think the trade-off is
clearly worth it.

How about this minor tweak:
HINT:  This is expected if this is the end of currently available WAL.
Otherwise, it could indicate corruption.

Thanks,
James
