Re: COPY FROM WHEN condition

From: Andres Freund
Subject: Re: COPY FROM WHEN condition
Date:
Msg-id: 20190403055646.anm2tfoqo5rzbcls@alap3.anarazel.de
In reply to: Re: COPY FROM WHEN condition  (David Rowley <david.rowley@2ndquadrant.com>)
Responses: Re: COPY FROM WHEN condition  (David Rowley <david.rowley@2ndquadrant.com>)
List: pgsql-hackers
On 2019-04-03 18:44:32 +1300, David Rowley wrote:
> (Fixed all of what you mentioned)
> 
> On Wed, 3 Apr 2019 at 06:57, Andres Freund <andres@anarazel.de> wrote:
> > > +/*
> > > + * CopyMultiInsertInfo_Flush
> > > + *           Write out all stored tuples in all buffers out to the tables.
> > > + *
> > > + * To save us from ending up with buffers for 1000s of partitions we remove
> > > + * buffers belonging to partitions that we've seen no tuples for in this batch
> >
> > That seems a little naive (imagine you have like 5 partitions, and we
> > constantly cycle through 2-3 of them per batch).  It's probably OK for
> > this version.   I guess I'd only do that cleanup if we're actually
> > flushing due to the number of partitions.
> 
> hmm good point.  It seems like being smarter there would be a good thing.
> 
> I've ended up getting rid of the hash table in favour of the List that
> you mentioned and storing the buffer in ResultRelInfo.


Cool.

> I also changed
> the logic that removes buffers once we reach the limit.  Instead of
> getting rid of buffers that were not used on this run, I've changed it
> so it just gets rid of the buffers starting with the oldest one first,
> but stops once the number of buffers is at the maximum again.  This
> can mean that we end up with MAX_BUFFERED_TUPLES buffers instead of
> MAX_PARTITION_BUFFERS if there is only 1 tuple per buffer.  My current
> thinking is that this does not matter since only 1 slot will be
> allocated per buffer.  We'll remove all of the excess buffers during
> the flush and keep just MAX_PARTITION_BUFFERS of the newest buffers.

Yea, that makes sense to me.
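
To make sure we're picturing the same thing, the eviction I understand
from your description is roughly the sketch below (CopyMultiInsertBuffer_Cleanup
is from the patch, but the list argument and the helper name are
placeholders I'm making up here, not necessarily what the patch uses):

static void
CopyMultiInsertInfo_TrimBuffers(List **buffers)
{
	/* Drop buffers oldest-first until we're back down to the limit */
	while (list_length(*buffers) > MAX_PARTITION_BUFFERS)
	{
		CopyMultiInsertBuffer *oldest =
			(CopyMultiInsertBuffer *) linitial(*buffers);

		/* each buffer must already have been flushed at this point */
		CopyMultiInsertBuffer_Cleanup(oldest);
		*buffers = list_delete_first(*buffers);
	}
}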


> Also, after changing CopyMultiInsertBuffer to use fixed sized arrays
> instead of allocating them with another palloc the performance has
> improved a bit more.
> 
> Using the perl files mentioned in [1]
> 
> Master + Patched:
> # copy listp from program $$perl ~/bench_same.pl$$ delimiter '|';
> COPY 35651564
> Time: 9106.776 ms (00:09.107)
> # truncate table listp;
> TRUNCATE TABLE
> # copy listp from program $$perl ~/bench.pl$$ delimiter '|';
> COPY 35651564
> Time: 10154.196 ms (00:10.154)
> 
> 
> Master only:
> # copy listp from program $$perl ~/bench_same.pl$$ delimiter '|';
> COPY 35651564
> Time: 22200.535 ms (00:22.201)
> # truncate table listp;
> TRUNCATE TABLE
> # copy listp from program $$perl ~/bench.pl$$ delimiter '|';
> COPY 35651564
> Time: 18592.107 ms (00:18.592)

Awesome. I'm probably too tired to give you meaningful feedback
tonight. But I'll do a quick readthrough.



> +/* Class the buffer full if there are >= this many bytes of tuples stored */
> +#define MAX_BUFFERED_BYTES        65535

Random aside: This seems pretty small (but should be changed separately).


> +/* Trim the list of buffers back down to this number after flushing */
> +#define MAX_PARTITION_BUFFERS    32
> +
> +/* Stores multi-insert data related to a single relation in CopyFrom. */
> +typedef struct CopyMultiInsertBuffer
> +{
> +    TupleTableSlot *slots[MAX_BUFFERED_TUPLES]; /* Array to store tuples */
> +    ResultRelInfo *resultRelInfo;    /* ResultRelInfo for 'relid' */
> +    BulkInsertState bistate;    /* BulkInsertState for this rel */
> +    int            nused;            /* number of 'slots' containing tuples */
> +    uint64        linenos[MAX_BUFFERED_TUPLES];    /* Line # of tuple in copy
> +                                                 * stream */
> +} CopyMultiInsertBuffer;

I don't think it's needed now, but you'd probably achieve a bit better
locality by storing slots + linenos in a single array of (slot, lineno).
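
Something along these lines, in case it ever becomes worth doing (just a
sketch; the entry struct and its name are invented here, not part of the
patch):

typedef struct CopyMultiInsertEntry
{
	TupleTableSlot *slot;		/* stored tuple */
	uint64		lineno;			/* line # of tuple in copy stream */
} CopyMultiInsertEntry;

/* CopyMultiInsertBuffer would then keep a single array of entries */
typedef struct CopyMultiInsertBuffer
{
	CopyMultiInsertEntry entries[MAX_BUFFERED_TUPLES];
	ResultRelInfo *resultRelInfo;	/* ResultRelInfo for the target rel */
	BulkInsertState bistate;		/* BulkInsertState for this rel */
	int			nused;				/* number of 'entries' in use */
} CopyMultiInsertBuffer;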

> +/*
> + * CopyMultiInsertBuffer_Init
> + *        Allocate memory and initialize a new CopyMultiInsertBuffer for this
> + *        ResultRelInfo.
> + */
> +static CopyMultiInsertBuffer *
> +CopyMultiInsertBuffer_Init(ResultRelInfo *rri)
> +{
> +    CopyMultiInsertBuffer *buffer;
> +
> +    buffer = (CopyMultiInsertBuffer *) palloc(sizeof(CopyMultiInsertBuffer));
> +    memset(buffer->slots, 0, sizeof(TupleTableSlot *) * MAX_BUFFERED_TUPLES);

Is there a reason to not just directly palloc0?
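
I.e. just the below (sketch; assuming the remaining fields are assigned
right afterwards anyway, zeroing the whole struct up front shouldn't
change anything):

	buffer = (CopyMultiInsertBuffer *) palloc0(sizeof(CopyMultiInsertBuffer));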


> +/*
> + * CopyMultiInsertBuffer_Cleanup
> + *        Drop used slots and free member for this buffer.  The buffer
> + *        must be flushed before cleanup.
> + */
> +static inline void
> +CopyMultiInsertBuffer_Cleanup(CopyMultiInsertBuffer *buffer)
> +{
> +    int            i;
> +
> +    /* Ensure buffer was flushed */
> +    Assert(buffer->nused == 0);
> +
> +    /* Remove back-link to ourself */
> +    buffer->resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
> +
> +    ReleaseBulkInsertStatePin(buffer->bistate);

Hm, afaict this still leaks the bistate itself?
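
I'd have expected something like the below after the pin is dropped (just
a sketch):

	ReleaseBulkInsertStatePin(buffer->bistate);
	/* also free the BulkInsertState itself, not just the buffer pin */
	FreeBulkInsertState(buffer->bistate);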


Greetings,

Andres Freund


