Re: PATCH: optimized DROP of multiple tables within a transaction

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: PATCH: optimized DROP of multiple tables within a transaction
Дата
Msg-id 20121220113300.GB4303@awork2.anarazel.de
обсуждение исходный текст
Ответ на Re: PATCH: optimized DROP of multiple tables within a transaction  (Tomas Vondra <tv@fuzzy.cz>)
Ответы Re: PATCH: optimized DROP of multiple tables within a transaction
Список pgsql-hackers
On 2012-12-20 02:54:48 +0100, Tomas Vondra wrote:
> On 19.12.2012 02:18, Andres Freund wrote:
> > On 2012-12-17 00:31:00 +0100, Tomas Vondra wrote:
> >
> > I think except of the temp buffer issue mentioned below its ready.
> >
> >> -DropRelFileNodeAllBuffers(RelFileNodeBackend rnode)
> >> +DropRelFileNodeAllBuffers(RelFileNodeBackend * rnodes, int nnodes)
> >>  {
> >> -    int            i;
> >> +    int         i, j;
> >> +
> >> +    /* sort the list of rnodes */
> >> +    pg_qsort(rnodes, nnodes, sizeof(RelFileNodeBackend), rnode_comparator);
> >>
> >>      /* If it's a local relation, it's localbuf.c's problem. */
> >> -    if (RelFileNodeBackendIsTemp(rnode))
> >> +    for (i = 0; i < nnodes; i++)
> >>      {
> >> -        if (rnode.backend == MyBackendId)
> >> -            DropRelFileNodeAllLocalBuffers(rnode.node);
> >> -        return;
> >> +        if (RelFileNodeBackendIsTemp(rnodes[i]))
> >> +        {
> >> +            if (rnodes[i].backend == MyBackendId)
> >> +                DropRelFileNodeAllLocalBuffers(rnodes[i].node);
> >> +        }
> >>      }
> >
> > While you deal with local buffers here you don't anymore in the big loop
> > over shared buffers. That wasn't needed earlier since we just returned
> > after noticing we have local relation, but thats not the case anymore.
>
> Hmm, but that would require us to handle the temp relations explicitly
> wherever we call DropRelFileNodeAllBuffers. Currently there are two such
> places - smgrdounlink() and smgrdounlinkall().
>
> By placing it into DropRelFileNodeAllBuffers() this code is shared and I
> think it's a good thing.
>
> But that does not mean the code is perfect - it was based on the
> assumption that if there's a mix of temp and regular relations, the temp
> relations will be handled in the first part and the rest in the second one.
>
> Maybe it'd be better to improve it so that the temp relations are
> removed from the array after the first part (and skip the second one if
> there are no remaining relations).

The problem is that afaik without the backend-local part a temporary
relation could match the same relfilenode as a "full" relation which
would have pretty bad consequences.

> > Still surprised this is supposed to be faster than a memcmp, but as you
> > seem to have measured it earlier..
>
> It surprised me too. These are the numbers with the current patch:
>
> 1) one by one
> =============
>           0    2    4    6    8    10   12   14   16   18   20
> --------------------------------------------------------------
> current  15   22   28   34   41    75   77   82   92   99  106
> memcmp   16   23   29   36   44   122  125  128  153  154  158
>
> Until the number of indexes reaches ~10, the numbers are almost exactly
> the same. Then the bsearch branch kicks in and it's clear how much
> slower the memcmp comparator is.
>
> 2) batches of 100
> =================
>           0    2    4    6    8    10   12   14   16   18   20
> --------------------------------------------------------------
> current   3    5    8   10   12    15   17   21   23   27   31
> memcmp    4    7   10   13   16    19   22   28   30   32   36
>
> Here the difference is much smaller, but even here the memcmp is
> consistently a bit slower.
>
>
> My theory is that while the current comparator starts with the most
> variable part (relation OID), and thus usually starts after the
> comparing the first 4B, the memcmp starts from the other end (tablespace
> and database) and therefore needs to compare more data.

Thats a good guess. I think its ok this way, but if you feel like
playing you could just change the order in the struct...

> >> +void smgrdounlinkall(SMgrRelation * rels, int nrels, bool isRedo)
> >> +{
> >> +    int i = 0;
> >> +    RelFileNodeBackend *rnodes;
> >> +    ForkNumber  forknum;
> >> +
> >> +    /* initialize an array which contains all relations to be dropped */
> >> +    rnodes = palloc(sizeof(RelFileNodeBackend) * nrels);
> >> +    for (i = 0; i < nrels; i++)
> >> +    {
> >> +        RelFileNodeBackend rnode = rels[i]->smgr_rnode;
> >> +        int            which = rels[i]->smgr_which;
> >> +
> >> +        rnodes[i] = rnode;
> >> +
> >> +        /* Close the forks at smgr level */
> >> +        for (forknum = 0; forknum <= MAX_FORKNUM; forknum++)
> >> +            (*(smgrsw[which].smgr_close)) (rels[i], forknum);
> >> +    }
> >> +
> >> +    /*
> >> +     * Get rid of any remaining buffers for the relation.  bufmgr will just
> >> +     * drop them without bothering to write the contents.
> >> +     */
> >> +    DropRelFileNodeAllBuffers(rnodes, nrels);
> >
> > I think it might be a good idea to handle temp relations on/buffers on
> > this level instead of trying to put it into
> > DropRelFileNodeAllBuffers. Would also allow you to only build an array
> > of RelFileNode's instead of RelFileNodeBackend's which might make it
> > slightl faster.
>
> Hmmm, sadly that'd require duplication of code because it needs to be
> done in smgrdounlink too.

It should be fairly small though.

int nrels_nonlocal = 0;

for (i = 0; i < nrels; i++)
{RelFileNodeBackend rnode = rels[i]->smgr_rnode;int            which = rels[i]->smgr_which;
       if (RelFileNodeBackendIsTemp(rnode))          DropRelFileNodeAllLocalBuffers       else
rnodes[nrels_nonlocal++];
for (forknum = 0; forknum <= MAX_FORKNUM; forknum++)           (*(smgrsw[which].smgr_close)) (rels[i], forknum);
}

DropRelFileNodeAllLocalBuffers(rnode, nrels_nonlocal);

In smgrdounlink it should only be a

if (RelFileNodeBackendIsTemp(rnode))  DropRelFileNodeAllLocalBuffers(rnode)
else  DropRelFileNodeAllBuffers(rnode);

ISTM that would be cleaner then memmove'ing the rnode array to remove
the temp rels away.

> >
> >> +    /*
> >> +     * It'd be nice to tell the stats collector to forget it immediately, too.
> >> +     * But we can't because we don't know the OID (and in cases involving
> >> +     * relfilenode swaps, it's not always clear which table OID to forget,
> >> +     * anyway).
> >> +     */
> >
> > This and at least one other comment seems to be in too many locations
> > now.
>
> I see it in three places - smgrdounlink, smgrdounlinkall and
> smgrdounlinkfork. Which ones you consider superfluous? I think it's
> appropriate to keep them in all three places.

I only read the patch, not the result, so maybe youre right. I'll look
at it sometime.

Greetings,

Andres Freund

--Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Groshev Andrey
Дата:
Сообщение: Re: [GENERAL] trouble with pg_upgrade 9.0 -> 9.1
Следующее
От: Bruce Momjian
Дата:
Сообщение: Re: [GENERAL] trouble with pg_upgrade 9.0 -> 9.1