Re: improving wraparound behavior

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: improving wraparound behavior
Дата
Msg-id CA+TgmoY=j-0r+d9PnyD3FAMXOnhmhpsnKBZiwSVyDQofE3rNag@mail.gmail.com
обсуждение исходный текст
Ответ на Re: improving wraparound behavior  (Andres Freund <andres@anarazel.de>)
Ответы Re: improving wraparound behavior  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
On Fri, May 3, 2019 at 4:47 PM Andres Freund <andres@anarazel.de> wrote:
> I'd actually say the proper fix would be to instead move the truncation
> to *after* finishing updating relfrozenxid etc.  If we truncate, the
> additional cost of another in-place pg_class update, to update relpages,
> is basically insignificant.  And the risk of errors, or being cancelled,
> during truncation is much higher than before (due to the AEL).

That would prevent the ERROR from impeding relfrozenxid advancement,
but it does not prevent the error itself, nor the XID consumption.  If
autovacuum is hitting that ERROR, it will spew errors in the log but
succeed in advancing relfrozenxid anyway.  I don't think that's as
nice as the behavior I proposed, but it's certainly better than the
status quo.  If you are hitting that error due to a manual VACUUM,
under your proposal, you'll stop the manual VACUUM as soon as you hit
the first table where this happens, which is not what you want.
You'll also keep consuming XIDs, which is not what you want either,
especially if you are in single-user mode because the number of
remaining XIDs is less that a million.

> > Also, I think that old prepared transactions and stale replication
> > slots should be emphasized more prominently.  Maybe something like:
> >
> > HINT:  Commit or roll back old prepared transactions, drop stale
> > replication slots, or kill long-running sessions.
> > Ensure that autovacuum is progressing, or run a manual database-wide VACUUM.
>
> I think it'd be good to instead compute what the actual problem is. It'd
> not be particularly hard to show some these in the errdetail:
>
> 1) the xid horizon (xid + age) of the problematic database; potentially,
>    if connected to that database, additionally compute what the oldest
>    xid is (although that's computationally potentially too expensive)
> 2) the xid horizon (xid + age) due to prepared transactions, and the
>    oldest transaction's name
> 3) the xid horizon (xid + age) due to replication slot, and the "oldest"
>    slot's name
> 4) the xid horizon (xid + age) and pid for the connection with the
>    oldest snapshot.
>
> I think that'd allow users much much easier to pinpoint what's going on.
>
> In fact, I think we probably should additionally add a function that can
> display the above. That'd make it much easier to write monitoring
> queries.

I think that the error and hint that you get from
GetNewTransactionId() has to be something that we can generate very
quickly, without doing anything that might hang on cluster with lots
of databases or lots of relations; but if there's useful detail we can
display there, that's good.  With a view, it's more OK if it takes a
long time on a big cluster.

> IMO we also ought to compute the *actual* relfrozenxid/relminmxid for a
> table. I.e. the oldest xid actually present. It's pretty common for most
> tables to have effective horizons that are much newer than what
> GetOldestXmin()/vacuum_set_xid_limits() can return. Obviously we can do
> so only when scanning all non-frozen pages. But being able to record
> "more aggressive" horizons would often save unnecessary work.  And it
> ought to not be hard.  I think especially for regular non-freeze,
> non-wraparound vacuums that'll often result in a much newer relfrozenxid
> (as we'll otherwise just have GetOldestXmin() - vacuum_freeze_min_age).

Sure, that would make sense.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: First-draft release notes for back branches are up
Следующее
От: Tom Lane
Дата:
Сообщение: Re: improving wraparound behavior