Re: pgsql: Compute XID horizon for page level index vacuum on primary.

From: Simon Riggs
Subject: Re: pgsql: Compute XID horizon for page level index vacuum on primary.
Date:
Msg-id: CANP8+jLEWNQX9oW0RQPPvOXFOh3zEBUdC62QWZ2GLNkeZmXnPA@mail.gmail.com
In response to: Re: pgsql: Compute XID horizon for page level index vacuum on primary.  (Andres Freund <andres@anarazel.de>)
Responses: Re: pgsql: Compute XID horizon for page level index vacuum on primary.  (Andres Freund <andres@anarazel.de>)
Re: pgsql: Compute XID horizon for page level index vacuum on primary.  (Andres Freund <andres@anarazel.de>)
List: pgsql-committers
On Fri, 29 Mar 2019 at 15:29, Andres Freund <andres@anarazel.de> wrote:
 
> On 2019-03-29 09:37:11 +0000, Simon Riggs wrote:
>> While trying to understand this, I see there is an even better way to
>> optimize this. Since we are removing dead index tuples, we could alter the
>> killed index tuple interface so that it returns the xmax of the tuple being
>> marked as killed, rather than just a boolean to say it is dead.
>
> Wouldn't that quite possibly result in additional and unnecessary
> conflicts? Right now the page level horizon is computed whenever the
> page is actually reused, rather than when an item is marked as
> deleted. As it stands right now, the computed horizons are commonly very
> "old", because of that delay, leading to lower rates of conflicts.

I wasn't suggesting we change when the horizon is calculated, so no change there.

The idea was to cache the data for later use, replacing the hint bit with a hint xid.

That won't change the rate of conflicts, up or down - but it does avoid I/O.
 
>> Indexes can then mark the killed tuples with the xmax that killed them
>> rather than just a hint bit. This is possible since the index tuples
>> are dead and cannot be used to follow the htid to the heap, so the
>> htid is redundant and so the block number of the tid could be
>> overwritten with the xmax, zeroing the itemid. Each killed item we
>> mark with its xmax means one less heap fetch we need to perform when
>> we delete the page - it's possible we optimize that away completely by
>> doing this.
>
> That's far from a trivial feature imo. It seems quite possible that we'd
> end up with increased overhead, because the current logic can get away
> with only doing hint bit style writes - but would that be true if we
> started actually replacing the item pointers? Because I don't see any
> guarantee they couldn't cross a page boundary etc? So I think we'd need
> to do WAL logging during index searches, which seems prohibitively
> expensive.

Don't see that.

I was talking about reusing the first 4 bytes of the ItemPointerData, which is
the first field of an index tuple. Index tuples are MAXALIGNed, so I can't see how that would ever cross a page boundary.
 
> And I'm also doubtful it's worth it because:
>
>> Since this point of the code is clearly going to be a performance issue it
>> seems like something we should do now.
>
> I've tried quite a bit to find a workload where this matters, but after
> avoiding redundant buffer accesses by sorting and prefetching I was
> unable to do so.  What workload do you see where this would really be
> bad? Without the performance optimization I'd found a very minor
> regression by trying to force the heap visits to happen in a pretty
> random order, but after sorting that went away.  I'm sure it's possible
> to find a case on overloaded rotational disks where you'd find a small
> regression, but I don't think it'd be particularly bad.

The code can do literally hundreds of random I/Os per 8192-byte block. What happens with 16 kB or 32 kB?

"Small regression"?

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
