Re: pgsql: Compute XID horizon for page level index vacuum on primary.

From Andres Freund
Subject Re: pgsql: Compute XID horizon for page level index vacuum on primary.
Date
Msg-id 20190329161238.eyrmdtofbzmapvyv@alap3.anarazel.de
In response to Re: pgsql: Compute XID horizon for page level index vacuum on primary.  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: pgsql: Compute XID horizon for page level index vacuum on primary.  (Simon Riggs <simon@2ndquadrant.com>)
Re: pgsql: Compute XID horizon for page level index vacuum on primary.  (Simon Riggs <simon@2ndquadrant.com>)
Re: pgsql: Compute XID horizon for page level index vacuum on primary.  (Peter Geoghegan <pg@bowt.ie>)
Re: pgsql: Compute XID horizon for page level index vacuum on primary.  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-committers
Hi,

On 2019-03-29 15:58:14 +0000, Simon Riggs wrote:
> On Fri, 29 Mar 2019 at 15:29, Andres Freund <andres@anarazel.de> wrote:
> > That's far from a trivial feature imo. It seems quite possible that we'd
> > end up with increased overhead, because the current logic can get away
> > with only doing hint bit style writes - but would that be true if we
> > started actually replacing the item pointers? Because I don't see any
> > guarantee they couldn't cross a page boundary etc? So I think we'd need
> > to do WAL logging during index searches, which seems prohibitively
> > expensive.
> >
> 
> Don't see that.
> 
> I was talking about reusing the first 4 bytes of an index tuple's
> ItemPointerData, which is the first field of an index tuple. Index tuples
> are MAXALIGNed, so I can't see how that would ever cross a page boundary.

Index tuple headers are 8 bytes, and MAXALIGN often is 4 bytes:

struct ItemPointerData {
        BlockIdData                ip_blkid;             /*     0     4 */
        OffsetNumber               ip_posid;             /*     4     2 */

        /* size: 6, cachelines: 1, members: 2 */
        /* last cacheline: 6 bytes */
};

struct IndexTupleData {
        ItemPointerData            t_tid;                /*     0     6 */
        short unsigned int         t_info;               /*     6     2 */

        /* size: 8, cachelines: 1, members: 2 */
        /* last cacheline: 8 bytes */
};

So as a whole they definitely can cross sector boundaries. You might be
able to argue your way out of that by saying that the blkid is going to
be aligned, but that's not that trivial, as t_info isn't guaranteed
that.
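
The layout argument above can be checked with a small sketch. It is an illustration only: the structs are simplified stand-ins that mirror the pahole output (not the PostgreSQL headers), the 512-byte sector size and the example offsets are assumptions, and demo_spans_boundary() and demo_same_unit() are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins mirroring the pahole output; not the real headers. */
typedef struct { uint16_t bi_hi; uint16_t bi_lo; } BlockIdData;
typedef uint16_t OffsetNumber;
typedef struct { BlockIdData ip_blkid; OffsetNumber ip_posid; } ItemPointerData;
typedef struct { ItemPointerData t_tid; uint16_t t_info; } IndexTupleData;

/* The sizes the argument depends on: 6 and 8 bytes. */
static_assert(sizeof(ItemPointerData) == 6, "ItemPointerData should be 6 bytes");
static_assert(sizeof(IndexTupleData) == 8, "IndexTupleData should be 8 bytes");

/* Hypothetical helper: does [start, start + len) straddle a multiple of
 * `boundary`? Requires len > 0 and boundary > 0. */
static bool
demo_spans_boundary(size_t start, size_t len, size_t boundary)
{
    return start / boundary != (start + len - 1) / boundary;
}

/* Hypothetical helper: do byte offsets a and b fall in the same unit? */
static bool
demo_same_unit(size_t a, size_t b, size_t boundary)
{
    return a / boundary == b / boundary;
}
```

With an assumed 512-byte sector, a header starting at offset 508 is 4-byte aligned but not 8-byte aligned. demo_spans_boundary(508, sizeof(IndexTupleData), 512) is true, while the 4-byte ip_blkid on its own stays within the first sector. t_info sits at 508 + offsetof(IndexTupleData, t_info) = 514, so demo_same_unit(508, 514, 512) is false: it lands in the next sector.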

But even so, you can't have unlogged changes that you then rely on. Even
if there's no torn page issue. Currently BTP_HAS_GARBAGE and
ItemIdMarkDead() are treated as hints - if we want to guarantee all
these are accurate, I don't quite see how we'd get around WAL logging
those.


> > And I'm also doubtful it's worth it because:
> >
> > > Since this point of the code is clearly going to be a performance issue it
> > > seems like something we should do now.
> >
> > I've tried quite a bit to find a workload where this matters, but after
> > avoiding redundant buffer accesses by sorting, and prefetching I was
> > unable to do so.  What workload do you see where this would really be
> > bad? Without the performance optimization I'd found a very minor
> > regression by trying to force the heap visits to happen in a pretty
> > random order, but after sorting that went away.  I'm sure it's possible
> > to find a case on overloaded rotational disks where you'd find a small
> > regression, but I don't think it'd be particularly bad.

> The code can do literally hundreds of random I/Os in an 8192 blocksize.
> What happens with 16 or 32kB?

It's really hard to construct such cases after the sorting changes, but
obviously not impossible. But to make it actually painful you need a
workload where the implied randomness of accesses isn't already a major
bottleneck - and that's hard.
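
As a rough illustration of why sorting helps, the sketch below counts how often the target heap block changes while walking a list of tuple identifiers, before and after sorting by block number. It is a toy model, not the committed code: DemoTid and the demo_* functions are hypothetical names, and a "block switch" is only a stand-in for a buffer access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified tuple identifier: heap block plus line offset. */
typedef struct { uint32_t block; uint16_t offset; } DemoTid;

/* Order by block number first, then by offset within the block. */
static int
demo_tid_cmp(const void *a, const void *b)
{
    const DemoTid *x = a;
    const DemoTid *y = b;

    if (x->block != y->block)
        return x->block < y->block ? -1 : 1;
    if (x->offset != y->offset)
        return x->offset < y->offset ? -1 : 1;
    return 0;
}

/* Sort in place so that entries on the same block become adjacent. */
static void
demo_sort_tids(DemoTid *tids, size_t n)
{
    assert(tids != NULL || n == 0);
    qsort(tids, n, sizeof(DemoTid), demo_tid_cmp);
}

/* Count how often the block changes when visiting the array in order;
 * the first entry counts as one visit. */
static size_t
demo_count_block_switches(const DemoTid *tids, size_t n)
{
    size_t switches = 0;

    for (size_t i = 0; i < n; i++)
        if (i == 0 || tids[i].block != tids[i - 1].block)
            switches++;
    return switches;
}
```

For the input blocks 7, 3, 7, 3, 7, 9 the unsorted walk switches blocks six times. After demo_sort_tids() the order is 3, 3, 7, 7, 7, 9 and the count drops to three, one per distinct block.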

This has been discussed publicly for a few months...

Greetings,

Andres Freund


