[PATCH] Performance Improvement For Copy From Binary Files

From	Bharath Rupireddy
Subject	[PATCH] Performance Improvement For Copy From Binary Files
Date	29 June 2020, 08:20:59
Msg-id	CALj2ACU5Bz06HWLwqSzNMN=Gupoj6Rcn_QVC+k070V4em9wu=A@mail.gmail.com
Replies	Re: [PATCH] Performance Improvement For Copy From Binary Files (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>); Re: [PATCH] Performance Improvement For Copy From Binary Files (Amit Langote <amitlangote09@gmail.com>)
List	pgsql-hackers

Hi Hackers,

In a COPY FROM binary file, the following information is stored for each tuple/row:

1. Field count (number of columns).

2. For every field, the field size (length of the column data).

3. For every field, the field data itself (the actual column data), of the length given by the field size.

Currently, each of these items is read directly from the file with its own fread() call, and this is repeated for every tuple/row.
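The per-item reads described above can be sketched as follows. This is an illustration only, not the PostgreSQL source: the helper names (read_exact, read_int16, read_int32, read_tuple) and the fread_calls counter are hypothetical, and the layout assumed is the documented COPY binary one (a 16-bit field count, then for each field a 32-bit length in network byte order, where a length of -1 marks a NULL field with no data bytes).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical counter, for illustration only: fread() calls made so far. */
static long fread_calls = 0;

/* Read exactly n bytes with a single fread() call; returns 1 on success. */
static int read_exact(FILE *fp, void *dst, size_t n)
{
    assert(fp != NULL && dst != NULL);
    fread_calls++;
    return fread(dst, 1, n, fp) == n;
}

/* Read a big-endian (network byte order) 16-bit integer. */
static int read_int16(FILE *fp, int16_t *out)
{
    unsigned char b[2];

    if (!read_exact(fp, b, sizeof b))
        return 0;
    *out = (int16_t) (uint16_t) ((b[0] << 8) | b[1]);
    return 1;
}

/* Read a big-endian (network byte order) 32-bit integer. */
static int read_int32(FILE *fp, int32_t *out)
{
    unsigned char b[4];

    if (!read_exact(fp, b, sizeof b))
        return 0;
    *out = (int32_t) (((uint32_t) b[0] << 24) | ((uint32_t) b[1] << 16) |
                      ((uint32_t) b[2] << 8) | (uint32_t) b[3]);
    return 1;
}

/*
 * Read one tuple, item by item. Each field's data is copied into scratch
 * (overwriting the previous field). Returns the field count, or -1 on a
 * short read or a field larger than scratch.
 */
static int read_tuple(FILE *fp, unsigned char *scratch, size_t scratch_size)
{
    int16_t nfields;

    if (!read_int16(fp, &nfields) || nfields < 0)
        return -1;
    for (int i = 0; i < nfields; i++)
    {
        int32_t len;

        if (!read_int32(fp, &len))
            return -1;
        if (len == -1)
            continue;           /* NULL field: no data bytes follow */
        if (len < 0 || (size_t) len > scratch_size)
            return -1;
        if (!read_exact(fp, scratch, (size_t) len))
            return -1;
    }
    return nfields;
}
```

In this sketch a tuple with N non-NULL fields costs 1 + 2N fread() calls, which is the pattern that makes the call count grow so quickly.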

One observation is that, of the total execution time of a copy from a binary file, the fread() calls take up to about 20%, and the number of fread() calls is also very high.

For instance, with a dataset of size 5.3GB, 10 million tuples with 10 columns:

total exec time (sec)	total fread() time (sec)	fread() call count
101.193	21.33	210000005
101.345	21.436	210000005

Both the total time taken by fread() and the call count are likely to increase with the number of columns, for instance with 1000 columns.
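The reported call count is consistent with one fread() for the field count plus two per column (size and data). A small sketch of that arithmetic, with hypothetical function names; it assumes no NULL fields and ignores the handful of calls outside the tuples (the measured 210000005 is 5 more than the tuple-level total, presumably for the file header and trailer):

```c
#include <assert.h>
#include <stdint.h>

/* fread() calls for one tuple: 1 (field count) + 2 per column (size, data). */
static int64_t fread_calls_per_tuple(int64_t ncols)
{
    assert(ncols >= 0);
    return 1 + 2 * ncols;
}

/* Tuple-level fread() calls for a whole file of ntuples rows. */
static int64_t tuple_fread_calls(int64_t ntuples, int64_t ncols)
{
    assert(ntuples >= 0);
    return ntuples * fread_calls_per_tuple(ncols);
}
```

With 10 columns this gives 21 calls per tuple, so 210,000,000 for 10 million tuples; with 1000 columns it would be 2001 calls per tuple.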

One solution to this problem is to read data from the binary file in RAW_BUF_SIZE (64KB) chunks instead of calling fread() for every item, which may also avoid a few disk I/Os. This is similar to the approach already followed for csv/text files.
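A minimal sketch of the chunked-read idea follows. It is not the attached patch: BufReader, buf_reader_init and buf_reader_read are hypothetical names, and RAW_BUF_SIZE is defined locally as 64KB to match the value stated above. Small reads are served from an in-memory buffer, and fread() is called only when that buffer runs dry.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define RAW_BUF_SIZE 65536      /* 64KB chunk size, as stated in the post */

/* Hypothetical buffered reader, for illustration only. */
typedef struct BufReader
{
    FILE       *fp;
    size_t      pos;            /* index of the next unread byte in buf */
    size_t      len;            /* number of valid bytes in buf */
    long        fread_calls;    /* how many times fread() was called */
    unsigned char buf[RAW_BUF_SIZE];
} BufReader;

static void buf_reader_init(BufReader *r, FILE *fp)
{
    assert(r != NULL && fp != NULL);
    r->fp = fp;
    r->pos = 0;
    r->len = 0;
    r->fread_calls = 0;
}

/*
 * Copy n bytes into dst, refilling the buffer from the file in
 * RAW_BUF_SIZE chunks when it is empty. Returns the number of bytes
 * copied, which is less than n only at end of file or on a read error.
 */
static size_t buf_reader_read(BufReader *r, void *dst, size_t n)
{
    unsigned char *out = dst;
    size_t      copied = 0;

    while (copied < n)
    {
        if (r->pos == r->len)
        {
            /* Buffer exhausted: one fread() fetches the next chunk. */
            r->len = fread(r->buf, 1, RAW_BUF_SIZE, r->fp);
            r->pos = 0;
            r->fread_calls++;
            if (r->len == 0)
                break;          /* EOF or error */
        }

        size_t      avail = r->len - r->pos;
        size_t      take = (n - copied < avail) ? n - copied : avail;

        memcpy(out + copied, r->buf + r->pos, take);
        r->pos += take;
        copied += take;
    }
    return copied;
}
```

A caller would read the field count, each field size, and each field's data through buf_reader_read() rather than through fread() directly, so the number of fread() calls depends on the file size and not on the number of fields.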

Attached is a patch implementing the above solution for binary format files.

Below is the improvement gained with the patch, on the same dataset.

total exec time (sec)	total fread() time (sec)	fread() call count
75.757	2.73	160884
75.351	2.742	160884

By the timings above, execution is about 1.34x faster, fread() time is reduced by 87%, and the fread() call count is reduced by more than 99%.

I request the community to take this patch up for review if the approach and the improvement seem beneficial.

Any suggestions for further improvement are most welcome.

Also attached is the config file used for testing the above use case.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
