Re: COPY FROM WHEN condition

From: David Rowley
Subject: Re: COPY FROM WHEN condition
Date:
Msg-id: CAKJS1f_PK33w3sKcG_2=T+aL4JuOjwA_892r2w93XKZOYXiC4A@mail.gmail.com
In reply to: Re: COPY FROM WHEN condition (Andres Freund <andres@anarazel.de>)
Responses: Re: COPY FROM WHEN condition
List: pgsql-hackers
On Fri, 29 Mar 2019 at 01:15, Andres Freund <andres@anarazel.de> wrote:
> On 2019-03-28 20:48:47 +1300, David Rowley wrote:
> > I had a look at this and performance has improved again, thanks.
> > However, I'm not sure if the patch is exactly what we need, let me
> > explain.
>
> I'm not entirely sure either, I just haven't really seen an alternative
> that's convincing.

I wonder if, instead of having the array of slots in ResultRelInfo, we
could have a struct that's local to copy.c containing the array and the
number of tuples stored so far.  For partitioned tables, we could store
this struct in a hash table keyed by partition Oid. When the partition
changes, we'd check whether that partition's Oid is already in the hash
table and keep adding tuples until the buffer fills.  We could keep a
global count of the number of tuples stored across all the slot arrays
and flush all of them when it gets full.
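
Something like this, perhaps (a completely untested sketch; all the
names and the 1000-tuple buffer size are invented, and it assumes
copy.c's usual includes plus utils/hsearch.h):

/*
 * Per-partition buffer, local to copy.c.  The partition Oid is the
 * hash key; "nused" tracks how many slots currently hold tuples.
 */
#define MAX_BUFFERED_TUPLES 1000

typedef struct CopyMultiInsertBuffer
{
    Oid         relid;          /* hash key: partition Oid */
    ResultRelInfo *resultRelInfo;   /* the partition's ResultRelInfo */
    TupleTableSlot *slots[MAX_BUFFERED_TUPLES]; /* buffered tuples */
    int         nused;          /* number of slots currently in use */
} CopyMultiInsertBuffer;

static HTAB *bufhash;           /* CopyMultiInsertBuffer by partition Oid */
static int  nbuffered_total;    /* tuples buffered across all partitions */

static void
multi_insert_buffers_create(void)
{
    HASHCTL     ctl;

    MemSet(&ctl, 0, sizeof(ctl));
    ctl.keysize = sizeof(Oid);
    ctl.entrysize = sizeof(CopyMultiInsertBuffer);
    bufhash = hash_create("COPY multi-insert buffers", 64, &ctl,
                          HASH_ELEM | HASH_BLOBS);
    nbuffered_total = 0;
}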

The trade-off here is that instead of flushing on each partition
change, we'd do a hash table lookup on each partition change and
possibly create a new array of slots.  This would let us get rid of
the code that conditionally switches batching on and off based on how
often the partition changes. Whether it's a win hinges on a hash
lookup plus multi-row inserts being faster than single-row inserts.
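
A partition change would then become a lookup rather than a flush,
roughly (again untested):

/*
 * Called when the COPY target partition changes.  Looks up (or
 * creates) the buffer for the new partition; nothing is flushed here.
 */
static CopyMultiInsertBuffer *
multi_insert_buffer_for_partition(Oid partoid, ResultRelInfo *rri)
{
    CopyMultiInsertBuffer *buf;
    bool        found;

    buf = (CopyMultiInsertBuffer *) hash_search(bufhash, &partoid,
                                                HASH_ENTER, &found);
    if (!found)
    {
        /* first time we've seen this partition: set up an empty buffer */
        buf->resultRelInfo = rri;
        buf->nused = 0;
        memset(buf->slots, 0, sizeof(buf->slots));
    }
    return buf;
}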

I'm just not too sure how to handle getting rid of the slots when we
flush all the tuples.  Getting rid of them might be a waste, but it
might also stop the code from creating tens of millions of slots in
the worst case.  Maybe to fix that, when we flush, we could get rid of
the slots in arrays that saw no use at all since the last flush, as
indicated by a zero tuple count.  That would require a hash seq scan,
but we could keep it cheap by flushing early if we see too many
distinct partitions. That would save the hash table from getting
bloated if there happened to be a point in the copy stream with high
numbers of distinct partitions carrying just a few tuples each.
Multi-inserts won't help much in that case anyway.
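
Flushing could then look something like this (untested; I've glossed
over the options and bistate arguments to table_multi_insert):

/*
 * Flush every partition's buffered tuples.  Buffers that saw no
 * tuples since the last flush get their slots dropped and are
 * removed, so a burst of distinct partitions can't leave slots
 * around forever.
 */
static void
multi_insert_buffers_flush(void)
{
    HASH_SEQ_STATUS status;
    CopyMultiInsertBuffer *buf;

    hash_seq_init(&status, bufhash);
    while ((buf = (CopyMultiInsertBuffer *) hash_seq_search(&status)) != NULL)
    {
        if (buf->nused == 0)
        {
            /* unused since the last flush: drop its slots and forget it */
            for (int i = 0; i < MAX_BUFFERED_TUPLES && buf->slots[i]; i++)
                ExecDropSingleTupleTableSlot(buf->slots[i]);
            hash_search(bufhash, &buf->relid, HASH_REMOVE, NULL);
            continue;
        }

        table_multi_insert(buf->resultRelInfo->ri_RelationDesc,
                           buf->slots, buf->nused,
                           GetCurrentCommandId(true), 0, NULL);
        buf->nused = 0;
    }
    nbuffered_total = 0;
}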

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


