Re: Parallel copy

From Ashutosh Sharma
Subject Re: Parallel copy
Date
Msg-id CAE9k0PkY1cT2Ax9B4TrYHCPw_YNibWJQ0wBNiPDTXpQ0_aXS0Q@mail.gmail.com
In reply to Re: Parallel copy  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers
I think that, when reporting performance tests on partitioned tables, it would be good to also describe how the data in the input file is distributed across the partitions. One possible distribution: the input file has, say, 100 tuples, and every consecutive tuple belongs to a different partition.
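
To make that distribution concrete, here is a minimal sketch of how such an input file could be generated. It is not taken from the patch or from the test setup used in this thread; it assumes range partitions of equal width on a single integer key, and the names interleaved_keys, write_csv and the "payload" column are hypothetical.

```python
import csv
import io


def interleaved_keys(n_tuples, n_partitions, range_width):
    """Yield unique integer keys so that consecutive keys land in different partitions.

    Assumption (hypothetical layout): partition p covers the key range
    [p * range_width, (p + 1) * range_width).
    """
    per_partition = (n_tuples + n_partitions - 1) // n_partitions
    if per_partition > range_width:
        raise ValueError("range_width is too small to hold the requested tuples")
    for i in range(n_tuples):
        partition = i % n_partitions      # rotate through the partitions
        offset = i // n_partitions        # position inside the chosen partition
        yield partition * range_width + offset


def write_csv(out, n_tuples, n_partitions, range_width):
    """Write one row per key to the file-like object `out`."""
    writer = csv.writer(out)
    for key in interleaved_keys(n_tuples, n_partitions, range_width):
        writer.writerow([key, f"payload-{key}"])


# Example: 100 tuples spread over 4 partitions of width 25.
buf = io.StringIO()
write_csv(buf, n_tuples=100, n_partitions=4, range_width=25)
```

Sorting the same keys before writing them would give the opposite extreme, where all tuples of one partition are contiguous in the file, so the two files could be compared under the same table definition.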

On Thu, Jul 23, 2020 at 8:51 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
On Wed, Jul 22, 2020 at 7:56 PM vignesh C <vignesh21@gmail.com> wrote:
>
> Thanks for reviewing and providing the comments Ashutosh.
> Please find my thoughts below:
>
> On Fri, Jul 17, 2020 at 7:18 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> >
> > Some review comments (mostly) from the leader side code changes:  
> >
> > 3) Should we allow Parallel Copy when the insert method is CIM_MULTI_CONDITIONAL?
> >
> > +   /* Check if the insertion mode is single. */
> > +   if (FindInsertMethod(cstate) == CIM_SINGLE)
> > +       return false;
> >
> > I know we have added checks in CopyFrom() to ensure that if any trigger (before row or instead of) is found on any of the partitions being loaded, the COPY FROM operation fails, but does that mean we are okay to perform parallel copy on a partitioned table? Have we done any performance testing with a partitioned table where the data in the input file needs to be routed to different partitions?
> >
>
> Partition data is handled as Amit described in one of his earlier mails [1]. My colleague Bharath has run performance tests with a partitioned table; he will share the results.
>

I ran tests for partitioned use cases; the results are similar to those of the non-partitioned cases [1].

test case 1: copy from csv file, 5.1GB, 10 million tuples, 4 range partitions, 3 indexes on integer columns, unique data
test case 2: copy from csv file, 5.1GB, 10 million tuples, 4 range partitions, unique data

 parallel workers | test case 1 (exec time in sec) | test case 2 (exec time in sec)
------------------+--------------------------------+--------------------------------
                0 | 205.403 (1X)                   | 135 (1X)
                2 | 114.724 (1.79X)                | 59.388 (2.27X)
                4 | 99.017 (2.07X)                 | 56.742 (2.34X)
                8 | 99.722 (2.06X)                 | 66.323 (2.03X)
               16 | 98.147 (2.09X)                 | 66.054 (2.04X)
               20 | 97.723 (2.1X)                  | 66.389 (2.03X)
               30 | 97.048 (2.11X)                 | 70.568 (1.91X)

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
