Re: snapshot too old issues, first around wraparound and then more.

From: Andres Freund
Subject: Re: snapshot too old issues, first around wraparound and then more.
Date:
Msg-id: 20200401211111.4gya2pu4iqqw4suc@alap3.anarazel.de
In reply to: Re: snapshot too old issues, first around wraparound and then more.  (Kevin Grittner <kgrittn@gmail.com>)
Responses: Re: snapshot too old issues, first around wraparound and then more.
List: pgsql-hackers
Hi,

On 2020-04-01 15:11:52 -0500, Kevin Grittner wrote:
> On Wed, Apr 1, 2020 at 2:43 PM Andres Freund <andres@anarazel.de> wrote:
>
> > The thing that makes me really worried is that the contents of the time
> > mapping seem very wrong. I've reproduced query results in a REPEATABLE
> > READ transaction changing (pruned without triggering an error).
>
>
> That is a very big problem.  On the sort-of bright side (ironic in light of
> the fact that I'm a big proponent of using serializable transactions), none
> of the uses that I have personally seen of this feature use anything other
> than the default READ COMMITTED isolation level.  That might help explain
> the lack of complaints for those using the feature.  But yeah, I REALLY
> want to see a solid fix for that!

I don't think it's dependent on RR - it's just a bit easier to verify
that the query results are wrong that way.


> > And I've
> > reproduced rows not getting removed for much longer than they
> > should, according to old_snapshot_threshold.
> >
> > I suspect one reason for users not noticing either is that
> >
> > a) it's plausible that users of the feature would mostly have
> >   long-running queries/transactions querying immutable or insert only
> >   data. Those would not notice that, on other tables, rows are getting
> >   removed, where access would not trigger the required error.
> >
> > b) I observe long-ish phases where no cleanup is happening (due to
> >   oldSnapshotControl->head_timestamp getting updated more often than
> >   correct). But if old_snapshot_threshold is small enough in relation to
> >   the time the generated bloat becomes problematic, there will still be
> >   occasions to actually perform cleanup.
> >
>
> Keep in mind that the real goal of this feature is not to eagerly _see_
> "snapshot too old" errors, but to prevent accidental debilitating bloat due
> to one misbehaving user connection.

I don't think it's the feature's "intentional" inaccuracy that leads to
this. The map contents are just wrong; in particular, the
head_timestamp is most of the time so new that
TransactionIdLimitedForOldSnapshots() doesn't even consult the mapping.
When filling a new bucket, MaintainOldSnapshotThreshold()
unconditionally updates oldSnapshotControl->head_timestamp to be the
current minute, which means it'll take old_snapshot_threshold minutes
till TransactionIdLimitedForOldSnapshots() even looks at the mapping
again.
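
To make that concrete, here's a tiny standalone model of just that
consequence (a sketch of the behaviour described above, not the real
snapmgr.c logic; the minute numbers and helper names are made up). It
encodes nothing but "the mapping is only consulted once head_timestamp
is at least old_snapshot_threshold minutes old":

    /*
     * Toy model of the effect described above -- NOT the real snapmgr.c
     * code.  It only encodes "the mapping is consulted once head_timestamp
     * is at least old_snapshot_threshold minutes old".  Whole minutes only.
     */
    #include <stdbool.h>
    #include <stdio.h>

    static bool
    map_consulted(int now_min, int head_timestamp_min, int threshold_min)
    {
        return now_min - head_timestamp_min >= threshold_min;
    }

    int
    main(void)
    {
        int threshold = 10;     /* old_snapshot_threshold = 10min */
        int head = 15;          /* head_timestamp just reset to "now" */
        int now;

        for (now = head; now <= head + threshold; now++)
            printf("minute %2d: head is %2d min old, map consulted: %s\n",
                   now, now - head,
                   map_consulted(now, head, threshold) ? "yes" : "no");

        return 0;
    }

With head_timestamp just reset to minute 15 this prints "no" for
minutes 15..24 and only flips to "yes" at minute 25 -- and by then
MaintainOldSnapshotThreshold() may well have bumped head_timestamp
forward again.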

As far as I can tell, with a large old_snapshot_threshold, it can take a
very long time to get to a head_timestamp that's old enough for
TransactionIdLimitedForOldSnapshots() to do anything.  Look at this
trace of a pgbench run with old_snapshot_threshold enabled, showing some of
the debugging output added in the patch upthread.

This is with a threshold of 10min, in a freshly started database:
> 2020-04-01 13:49:00.000 PDT [1268502][2/43571:2068881994] WARNING:  head 0 min: filling 1 buckets starting at 0 for whenTaken 1 min, with xmin 2068881994
> 2020-04-01 13:49:00.000 PDT [1268502][2/43571:2068881994] WARNING:  old snapshot mapping at "after update" with head ts: 1, current entries: 2 max entries: 20, offset: 0
>       entry 0 (ring 0): min 1: xid 2068447214
>       entry 1 (ring 1): min 2: xid 2068881994
>
> 2020-04-01 13:50:00.000 PDT [1268505][5/122542:0] WARNING:  old snapshot mapping at "before update" with head ts: 1, current entries: 2 max entries: 20, offset: 0
>       entry 0 (ring 0): min 1: xid 2068447214
>       entry 1 (ring 1): min 2: xid 2068881994
>
> 2020-04-01 13:50:00.000 PDT [1268505][5/122542:0] WARNING:  head 1 min: updating existing bucket 1 for whenTaken 2 min, with xmin 2069199511
> 2020-04-01 13:50:00.000 PDT [1268505][5/122542:0] WARNING:  old snapshot mapping at "after update" with head ts: 1, current entries: 2 max entries: 20, offset: 0
>       entry 0 (ring 0): min 1: xid 2068447214
>       entry 1 (ring 1): min 2: xid 2069199511
>
> 2020-04-01 13:51:00.000 PDT [1268502][2/202674:2069516501] WARNING:  old snapshot mapping at "before update" with head ts: 1, current entries: 2 max entries: 20, offset: 0
>       entry 0 (ring 0): min 1: xid 2068447214
>       entry 1 (ring 1): min 2: xid 2069199511
>
> 2020-04-01 13:51:00.000 PDT [1268502][2/202674:2069516501] WARNING:  head 1 min: filling 2 buckets starting at 0 for whenTaken 3 min, with xmin 2069516499
> 2020-04-01 13:51:00.000 PDT [1268502][2/202674:2069516501] WARNING:  old snapshot mapping at "after update" with head ts: 3, current entries: 4 max entries: 20, offset: 0
>       entry 0 (ring 0): min 3: xid 2068447214
>       entry 1 (ring 1): min 4: xid 2069199511
>       entry 2 (ring 2): min 5: xid 2069516499
>       entry 3 (ring 3): min 6: xid 2069516499
> ...
> 2020-04-01 14:03:00.000 PDT [1268504][4/1158832:2075918094] WARNING:  old snapshot mapping at "before update" with head ts: 7, current entries: 8 max entries: 20, offset: 0
>       entry 0 (ring 0): min 7: xid 2068447214
>       entry 1 (ring 1): min 8: xid 2071112480
>       entry 2 (ring 2): min 9: xid 2071434473
>       entry 3 (ring 3): min 10: xid 2071755177
>       entry 4 (ring 4): min 11: xid 2072075827
>       entry 5 (ring 5): min 12: xid 2072395700
>       entry 6 (ring 6): min 13: xid 2072715464
>       entry 7 (ring 7): min 14: xid 2073035816

Before the mapping change the database had been running for 15
minutes. But the mapping starts only at 7 minutes from start. And then
it is updated to

> 2020-04-01 14:03:00.000 PDT [1268504][4/1158832:2075918094] WARNING:  head 7 min: filling 8 buckets starting at 0 for whenTaken 15 min, with xmin 2075918093
> 2020-04-01 14:03:00.000 PDT [1268504][4/1158832:2075918094] WARNING:  old snapshot mapping at "after update" with head ts: 15, current entries: 16 max entries: 20, offset: 0
>       entry 0 (ring 0): min 15: xid 2068447214
>       entry 1 (ring 1): min 16: xid 2071112480
>       entry 2 (ring 2): min 17: xid 2071434473
>       entry 3 (ring 3): min 18: xid 2071755177
>       entry 4 (ring 4): min 19: xid 2072075827
>       entry 5 (ring 5): min 20: xid 2072395700
>       entry 6 (ring 6): min 21: xid 2072715464
>       entry 7 (ring 7): min 22: xid 2073035816
>       entry 8 (ring 8): min 23: xid 2075918093
>       entry 9 (ring 9): min 24: xid 2075918093
>       entry 10 (ring 10): min 25: xid 2075918093
>       entry 11 (ring 11): min 26: xid 2075918093
>       entry 12 (ring 12): min 27: xid 2075918093
>       entry 13 (ring 13): min 28: xid 2075918093
>       entry 14 (ring 14): min 29: xid 2075918093
>       entry 15 (ring 15): min 30: xid 2075918093

be considered to have started in that moment. And we expand the size of
the mapping by 8 buckets at the same time, filling the new buckets with
the same xid, despite there being a continuous workload.
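
To spell out the arithmetic from the trace above: the head was at 7 min
and the new snapshot is at 15 min, so 15 - 7 = 8 new buckets get
appended (entries 8..15), every one of them with the current xmin
2075918093, and head_timestamp jumps straight from 7 to 15. The
per-minute xmins that the continuous workload produced during those
eight minutes never make it into the map at all.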

After a few more minutes we get:
> 2020-04-01 14:07:00.000 PDT [1268503][3/1473617:2077202085] WARNING:  head 15 min: updating existing bucket 4 for whenTaken 19 min, with xmin 2077202085
> 2020-04-01 14:07:00.000 PDT [1268503][3/1473617:2077202085] WARNING:  old snapshot mapping at "after update" with head ts: 15, current entries: 16 max entries: 20, offset: 0
>       entry 0 (ring 0): min 15: xid 2068447214
>       entry 1 (ring 1): min 16: xid 2076238895
>       entry 2 (ring 2): min 17: xid 2076559154
>       entry 3 (ring 3): min 18: xid 2076880731
>       entry 4 (ring 4): min 19: xid 2077202085
>       entry 5 (ring 5): min 20: xid 2072395700
>       entry 6 (ring 6): min 21: xid 2072715464
>       entry 7 (ring 7): min 22: xid 2073035816
>       entry 8 (ring 8): min 23: xid 2075918093
>       entry 9 (ring 9): min 24: xid 2075918093
>       entry 10 (ring 10): min 25: xid 2075918093
>       entry 11 (ring 11): min 26: xid 2075918093
>       entry 12 (ring 12): min 27: xid 2075918093
>       entry 13 (ring 13): min 28: xid 2075918093
>       entry 14 (ring 14): min 29: xid 2075918093
>       entry 15 (ring 15): min 30: xid 2075918093
>

Note how the xids are not monotonically ordered. And how IsLimited still
won't be able to make use of the mapping, as the head timestamp is only
4 minutes old (whenTaken == 19 min, head == 15 min).
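
To put numbers on that: with old_snapshot_threshold = 10min, head == 15
min and whenTaken == 19 min, the head is only 19 - 15 = 4 minutes old,
so per the above the mapping won't be looked at before minute
15 + 10 = 25 -- and if head_timestamp gets bumped to the current minute
again before then, the clock restarts once more.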

Greetings,

Andres Freund


