Re: PATCH: optimized DROP of multiple tables within a transaction

From: Tomas Vondra
Subject: Re: PATCH: optimized DROP of multiple tables within a transaction
Date:
Msg-id: 50D26FE8.1040800@fuzzy.cz
In response to: Re: PATCH: optimized DROP of multiple tables within a transaction (Andres Freund <andres@2ndquadrant.com>)
Responses: Re: PATCH: optimized DROP of multiple tables within a transaction
List: pgsql-hackers
On 19.12.2012 02:18, Andres Freund wrote:
> On 2012-12-17 00:31:00 +0100, Tomas Vondra wrote:
> 
> I think except for the temp buffer issue mentioned below it's ready.
> 
>> -DropRelFileNodeAllBuffers(RelFileNodeBackend rnode)
>> +DropRelFileNodeAllBuffers(RelFileNodeBackend * rnodes, int nnodes)
>>  {
>> -    int            i;
>> +    int         i, j;
>> +
>> +    /* sort the list of rnodes */
>> +    pg_qsort(rnodes, nnodes, sizeof(RelFileNodeBackend), rnode_comparator);
>>
>>      /* If it's a local relation, it's localbuf.c's problem. */
>> -    if (RelFileNodeBackendIsTemp(rnode))
>> +    for (i = 0; i < nnodes; i++)
>>      {
>> -        if (rnode.backend == MyBackendId)
>> -            DropRelFileNodeAllLocalBuffers(rnode.node);
>> -        return;
>> +        if (RelFileNodeBackendIsTemp(rnodes[i]))
>> +        {
>> +            if (rnodes[i].backend == MyBackendId)
>> +                DropRelFileNodeAllLocalBuffers(rnodes[i].node);
>> +        }
>>      }
> 
> While you deal with local buffers here, you don't anymore in the big loop
> over shared buffers. That wasn't needed earlier since we just returned
> after noticing we have a local relation, but that's not the case anymore.

Hmm, but that would require us to handle the temp relations explicitly
wherever we call DropRelFileNodeAllBuffers. Currently there are two such
places - smgrdounlink() and smgrdounlinkall().

By placing it into DropRelFileNodeAllBuffers() the code is shared, which I
think is a good thing.

But that does not mean the code is perfect - it was based on the
assumption that if there's a mix of temp and regular relations, the temp
relations will be handled in the first part and the rest in the second one.

Maybe it'd be better to improve it so that the temp relations are
removed from the array after the first part (and skip the second one if
there are no remaining relations).

> 
>>      for (i = 0; i < NBuffers; i++)
>>      {
>> +        RelFileNodeBackend *rnode = NULL;
>>          volatile BufferDesc *bufHdr = &BufferDescriptors[i];
>> -
>> +
>>          /*
>>           * As in DropRelFileNodeBuffers, an unlocked precheck should be safe
>>           * and saves some cycles.
>>           */
>> -        if (!RelFileNodeEquals(bufHdr->tag.rnode, rnode.node))
>> +
>> +        /*
>> +         * For a low number of relations to drop, just use a simple walk-through
>> +         * to save the bsearch overhead. BSEARCH_LIMIT is more a guess than an
>> +         * exactly determined value, as it depends on many factors (CPU and RAM
>> +         * speeds, amount of shared buffers etc.).
>> +         */
>> +        if (nnodes <= BSEARCH_LIMIT)
> 
> I think that's a sensible plan. It makes sense that for a small number of
> relations a sequential scan of the rnodes array is faster than a bsearch,
> and 10 sounds like a good value, although I would guess the optimal value
> is slightly higher on most machines. But if it works fine without
> regressions that's pretty good...

I think it's pointless to hunt for the optimal value here, given how many
factors it depends on. We could use 20 instead of 10, but I probably
wouldn't go higher.

>> +
>> +/*
>> + * Used to sort relfilenode array (ordered by [relnode, dbnode, spcnode]), so
>> + * that it's suitable for bsearch.
>> + */
>> +static int
>> +rnode_comparator(const void * p1, const void * p2)
>> +{
>> +    RelFileNodeBackend n1 = * (RelFileNodeBackend *) p1;
>> +    RelFileNodeBackend n2 = * (RelFileNodeBackend *) p2;
>> +
>> +    if (n1.node.relNode < n2.node.relNode)
>> +        return -1;
>> +    else if (n1.node.relNode > n2.node.relNode)
>> +        return 1;
>> +
>> +    if (n1.node.dbNode < n2.node.dbNode)
>> +        return -1;
>> +    else if (n1.node.dbNode > n2.node.dbNode)
>> +        return 1;
>> +
>> +    if (n1.node.spcNode < n2.node.spcNode)
>> +        return -1;
>> +    else if (n1.node.spcNode > n2.node.spcNode)
>> +        return 1;
>> +    else
>> +        return 0;
>> +}
> 
> Still surprised this is supposed to be faster than a memcmp, but as you
> seem to have measured it earlier...

It surprised me too. These are the numbers with the current patch:

1) one by one
=============

# indexes   0    2    4    6    8   10   12   14   16   18   20
----------------------------------------------------------------
current    15   22   28   34   41   75   77   82   92   99  106
memcmp     16   23   29   36   44  122  125  128  153  154  158

Until the number of indexes reaches ~10, the numbers are almost exactly
the same. Then the bsearch branch kicks in and it's clear how much
slower the memcmp comparator is.

2) batches of 100
=================

# indexes   0    2    4    6    8   10   12   14   16   18   20
----------------------------------------------------------------
current     3    5    8   10   12   15   17   21   23   27   31
memcmp      4    7   10   13   16   19   22   28   30   32   36

Here the difference is much smaller, but even here the memcmp is
consistently a bit slower.


My theory is that because the current comparator starts with the most
variable part (the relation OID), it usually finishes after comparing the
first 4 bytes, while memcmp starts from the other end (tablespace and
database) and therefore has to compare more data.

>> +void smgrdounlinkall(SMgrRelation * rels, int nrels, bool isRedo)
>> +{
>> +    int i = 0;
>> +    RelFileNodeBackend *rnodes;
>> +    ForkNumber  forknum;
>> +
>> +    /* initialize an array which contains all relations to be dropped */
>> +    rnodes = palloc(sizeof(RelFileNodeBackend) * nrels);
>> +    for (i = 0; i < nrels; i++)
>> +    {
>> +        RelFileNodeBackend rnode = rels[i]->smgr_rnode;
>> +        int            which = rels[i]->smgr_which;
>> +
>> +        rnodes[i] = rnode;
>> +
>> +        /* Close the forks at smgr level */
>> +        for (forknum = 0; forknum <= MAX_FORKNUM; forknum++)
>> +            (*(smgrsw[which].smgr_close)) (rels[i], forknum);
>> +    }
>> +
>> +    /*
>> +     * Get rid of any remaining buffers for the relation.  bufmgr will just
>> +     * drop them without bothering to write the contents.
>> +     */
>> +    DropRelFileNodeAllBuffers(rnodes, nrels);
> 
> I think it might be a good idea to handle temp relations/buffers at this
> level instead of trying to put it into DropRelFileNodeAllBuffers. Would
> also allow you to only build an array of RelFileNode's instead of
> RelFileNodeBackend's, which might make it slightly faster.

Hmmm, sadly that'd require duplicating the code, because it needs to be
done in smgrdounlink too.

> 
>> +    /*
>> +     * It'd be nice to tell the stats collector to forget it immediately, too.
>> +     * But we can't because we don't know the OID (and in cases involving
>> +     * relfilenode swaps, it's not always clear which table OID to forget,
>> +     * anyway).
>> +     */
> 
> This and at least one other comment seems to be in too many locations
> now.

I see it in three places - smgrdounlink, smgrdounlinkall and
smgrdounlinkfork. Which ones do you consider superfluous? I think it's
appropriate to keep them in all three places.

Tomas


