Re: datfrozenxid not dropping after vacuum

From: Matthew Tice
Subject: Re: datfrozenxid not dropping after vacuum
Date:
Msg-id: CA+taBv8K+FWbh6gHHKZhxmXpa1ryOt7VAi5qxOaCa6OZVX3zKg@mail.gmail.com
In reply to: Re: datfrozenxid not dropping after vacuum  (Matthew Tice <mjtice@gmail.com>)
List: pgsql-general
Interestingly enough, I hopped on the database system this morning and
found that `datfrozenxid` had dropped back down below
`autovacuum_freeze_max_age` around 0200 local time (roughly 18 hours
after the fact).
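
For reference, this is the sort of check I mean -- a minimal sketch
using only standard catalog columns, so nothing here is specific to
our setup:

    SELECT datname,
           age(datfrozenxid) AS datfrozenxid_age,
           current_setting('autovacuum_freeze_max_age')::int AS freeze_max_age
    FROM pg_database
    ORDER BY age(datfrozenxid) DESC;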

Looking through the PostgreSQL logs, I don't see anything standing out
at that time.
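
In case it helps anyone searching the archives, here is a sketch of how
to see which relation is holding `datfrozenxid` back in a given
database; the shared catalogs named in the commit quoted below
(pg_database, pg_authid) show up with relisshared = true:

    SELECT relname, relisshared, age(relfrozenxid) AS relfrozenxid_age
    FROM pg_class
    WHERE relkind IN ('r', 't', 'm')  -- plain tables, TOAST tables, matviews
    ORDER BY age(relfrozenxid) DESC
    LIMIT 10;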

I still plan on patching to 10.17 tonight.

Matt

On Wed, Sep 1, 2021 at 4:01 PM Matthew Tice <mjtice@gmail.com> wrote:
>
> Hi Alvaro, thanks for the quick reply.
>
> I'm scheduled to do my patching maintenance at the end of this month -
> but at this point I don't think I'm going to make it.
>
> Other than patching, is there a workaround?  For example, in #2 above:
> >The fix for 2) is simpler,
> >    simply always remove both the shared and local init files.
>
> I'm not familiar with the difference between the 'shared' and 'local'
> init files -- I'd imagine it was a 'local' file I referenced in my
> original post?
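>
> (For reference: the 'shared' init file appears to be
> $PGDATA/global/pg_internal.init, the relcache cache for shared
> catalogs such as pg_database and pg_authid, while each database has a
> 'local' one at $PGDATA/base/<oid>/pg_internal.init. Both are just
> caches that get rebuilt as needed, so removing them while the server
> is cleanly shut down may work as a manual workaround, though I
> haven't tested that.)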
>
> Thanks!
>
> Matt
>
> On Wed, Sep 1, 2021 at 3:00 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >
> > On 2021-Sep-01, Matthew Tice wrote:
> >
> > [ problem table is pg_database ]
> >
> > > My primary, read/write database is PostgreSQL 10.4 (CentOS 7) while my
> > > standby databases have been patched to 10.17.
> >
> > Hmm, I think there was a bug in the early 10.x versions where advancing
> > the xid age of shared tables would not work correctly for some reason ...
> > Ah yes, this was fixed in 10.5, a mere three years ago:
> >
> > Author: Andres Freund <andres@anarazel.de>
> > Branch: master Release: REL_11_BR [a54e1f158] 2018-06-12 11:13:21 -0700
> > Branch: REL_10_STABLE Release: REL_10_5 [2ce64caaf] 2018-06-12 11:13:21 -0700
> > Branch: REL9_6_STABLE Release: REL9_6_10 [6a46aba1c] 2018-06-12 11:13:21 -0700
> > Branch: REL9_5_STABLE Release: REL9_5_14 [14b3ec6f3] 2018-06-12 11:13:21 -0700
> > Branch: REL9_4_STABLE Release: REL9_4_19 [817f9f9a8] 2018-06-12 11:13:22 -0700
> > Branch: REL9_3_STABLE Release: REL9_3_24 [9b9b622b2] 2018-06-12 11:13:22 -0700
> >
> >     Fix bugs in vacuum of shared rels, by keeping their relcache entries current.
> >
> >     When vacuum processes a relation it uses the corresponding relcache
> >     entry's relfrozenxid / relminmxid as a cutoff for when to remove
> >     tuples etc. Unfortunately for nailed relations (i.e. critical system
> >     catalogs) bugs could frequently lead to the corresponding relcache
> >     entry being stale.
> >
> >     This set of bugs could cause actual data corruption as vacuum would
> >     potentially not remove the correct row versions, potentially reviving
> >     them at a later point.  After 699bf7d05c some corruptions in this vein
> >     were prevented, but the additional error checks could also trigger
> >     spuriously. Examples of such errors are:
> >       ERROR: found xmin ... from before relfrozenxid ...
> >     and
> >       ERROR: found multixact ... from before relminmxid ...
> >     To be caused by this bug the errors have to occur on system catalog
> >     tables.
> >
> >     The two bugs are:
> >
> >     1) Invalidations for nailed relations were ignored, based on the
> >        theory that the relcache entry for such tables doesn't
> >        change. Which is largely true, except for fields like relfrozenxid
> >        etc.  This means that changes to relations vacuumed in other
> >        sessions weren't picked up by already existing sessions.  Luckily
> >        autovacuum doesn't have particularly longrunning sessions.
> >
> >     2) For shared *and* nailed relations, the shared relcache init file
> >        was never invalidated while running.  That means that for such
> >        tables (e.g. pg_authid, pg_database) it's not just already existing
> >        sessions that are affected, but even new connections are as well.
> >        That explains why the reports usually were about pg_authid et al.
> >
> >     To fix 1), revalidate the rd_rel portion of a relcache entry when
> >     invalid. This implies a bit of extra complexity to deal with
> >     bootstrapping, but it's not too bad.  The fix for 2) is simpler,
> >     simply always remove both the shared and local init files.
> >
> >     Author: Andres Freund
> >     Reviewed-By: Alvaro Herrera
> >     Discussion:
> >         https://postgr.es/m/20180525203736.crkbg36muzxrjj5e@alap3.anarazel.de
> >         https://postgr.es/m/CAMa1XUhKSJd98JW4o9StWPrfS=11bPgG+_GDMxe25TvUY4Sugg@mail.gmail.com
> >         https://postgr.es/m/CAKMFJucqbuoDRfxPDX39WhA3vJyxweRg_zDVXzncr6+5wOguWA@mail.gmail.com
> >         https://postgr.es/m/CAGewt-ujGpMLQ09gXcUFMZaZsGJC98VXHEFbF-tpPB0fB13K+A@mail.gmail.com
> >     Backpatch: 9.3-
> >
> >
> > --
> > Álvaro Herrera           39°49'30"S 73°17'W
> > "El número de instalaciones de UNIX se ha elevado a 10,
> > y se espera que este número aumente" (UPM, 1972)


