Re: Parallel copy

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: Parallel copy
Дата
Msg-id CAA4eK1KtdjBJtyLjwFrqykBD0SQgK6JJHLmTYRooUJn08EcTCA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Parallel copy  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Ответы Re: Parallel copy  (Alastair Turner <minion@decodable.me>)
Re: Parallel copy  (vignesh C <vignesh21@gmail.com>)
Список pgsql-hackers
On Tue, Feb 25, 2020 at 9:30 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> On Sun, Feb 23, 2020 at 05:09:51PM -0800, Andres Freund wrote:
> >Hi,
> >
> >> The one piece of information I'm missing here is at least a very rough
> >> quantification of the individual steps of CSV processing - for example
> >> if parsing takes only 10% of the time, it's pretty pointless to start by
> >> parallelising this part and we should focus on the rest. If it's 50% it
> >> might be a different story. Has anyone done any measurements?
> >
> >Not recently, but I'm pretty sure that I've observed CSV parsing to be
> >way more than 10%.
> >
>
> Perhaps. I guess it'll depend on the CSV file (number of fields, ...),
> so I still think we need to do some measurements first.
>

Agreed.

> I'm willing to
> do that, but (a) I doubt I'll have time for that until after 2020-03,
> and (b) it'd be good to agree on some set of typical CSV files.
>

Right, I don't know what is the best way to define that.  I can think
of the below tests.

1. A table with 10 columns (with datatypes as integers, date, text).
It has one index (unique/primary). Load with 1 million rows (basically
the data should be probably 5-10 GB).
2. A table with 10 columns (with datatypes as integers, date, text).
It has three indexes, one index can be (unique/primary). Load with 1
million rows (basically the data should be probably 5-10 GB).
3. A table with 10 columns (with datatypes as integers, date, text).
It has three indexes, one index can be (unique/primary). It has before
and after trigeers. Load with 1 million rows (basically the data
should be probably 5-10 GB).
4. A table with 10 columns (with datatypes as integers, date, text).
It has five or six indexes, one index can be (unique/primary). Load
with 1 million rows (basically the data should be probably 5-10 GB).

Among all these tests, we can check how much time did we spend in
reading, parsing the csv files vs. rest of execution?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michail Nikolaev
Дата:
Сообщение: Re: BUG #16108: Colorization to the output of command-line hasunproperly behaviors at Windows platform
Следующее
От: Juan José Santamaría Flecha
Дата:
Сообщение: Re: BUG #16108: Colorization to the output of command-line hasunproperly behaviors at Windows platform