Re: Backpatch b61d161c14

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: Backpatch b61d161c14
Дата
Msg-id 20200622200939.jkuniq3vtiumeszj@alap3.anarazel.de
обсуждение исходный текст
Ответ на Backpatch b61d161c14  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы Re: Backpatch b61d161c14 (Introduce vacuum errcontext ...)  (Justin Pryzby <pryzby@telsasoft.com>)
Список pgsql-hackers
Hi,

On 2020-06-22 10:35:47 +0530, Amit Kapila wrote:
> I propose to backpatch b61d161c14 [1] (Introduce vacuum errcontext to
> display additional information.).  In the recent past, we have seen an
> error report similar to "ERROR: found xmin 2157740646 from before
> relfrozenxid 1197" from multiple EDB customers.  A similar report is
> seen on pgsql-bugs as well [2] which I think has triggered the
> implementation of this feature for v13.  Such reports mostly indicate
> database corruption rather than any postgres bug which is also
> indicated by the error-code (from before relfrozenxid) for this
> message. I think there is a good reason to back-patch this as multiple
> users are facing similar issues.  This patch won't fix this issue but
> it will help us in detecting the problematic part of the heap/index
> and then if users wish they can delete the portion of data that
> appeared to be corrupted and resume the operations on that relation.
> 
> I have tried to back-patch this for v12 and attached is the result.
> The attached patch passes make check-world but I have yet to test it
> manually and also prepare the patch for other branches once we agree
> on this proposal.

I think having the additional information in the back branches would be
good. But on the other hand I think this is a somewhat large change
to backpatch, and it hasn't yet much real world exposure.

I'm also uncomfortable with the approach of just copying all of
LVRelStats in several places:

>  /*
> @@ -1580,9 +1648,15 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
>      int            uncnt = 0;
>      TransactionId visibility_cutoff_xid;
>      bool        all_frozen;
> +    LVRelStats    olderrinfo;
>  
>      pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
>  
> +    /* Update error traceback information */
> +    olderrinfo = *vacrelstats;
> +    update_vacuum_error_info(vacrelstats, VACUUM_ERRCB_PHASE_VACUUM_HEAP,
> +                             blkno, NULL);
> +
>      START_CRIT_SECTION();
>  
>      for (; tupindex < vacrelstats->num_dead_tuples; tupindex++)
> @@ -1659,6 +1733,11 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
>                                *vmbuffer, visibility_cutoff_xid, flags);
>      }
>  
> +    /* Revert to the previous phase information for error traceback */
> +    update_vacuum_error_info(vacrelstats,
> +                             olderrinfo.phase,
> +                             olderrinfo.blkno,
> +                             olderrinfo.indname);
>      return tupindex;
>  }

To me that's a very weird approach. It's fragile because we need to be
sure that there's no updates to the wrong LVRelStats for important
things, and it has a good bit of potential to be inefficient because
LVRelStats isn't exactly small. This pretty much relies on the compiler
doing good enough escape analysis to realize that most parts of
olderrinfo aren't touched.

Greetings,

Andres Freund



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: More tzdb fun: POSIXRULES is being deprecated upstream
Следующее
От: Jeff Davis
Дата:
Сообщение: Re: Default setting for enable_hashagg_disk