COPY, bytea streaming and memory footprint

Поиск

Список

Период

Сортировка

От	Jerome Wagner
Тема	COPY, bytea streaming and memory footprint
Дата	10 июня 2020 г. 15:52:02
Msg-id	CA+=V_fNkQyZFcUo2XEwFz95=9eaOUViyNcW0JbgiTdy8RsqcFA@mail.gmail.com обсуждение исходный текст
Список	pgsql-hackers

Дерево обсуждения

Hello,

I am trying to understand/optimize how a COPY operation behaves when transfering a bytea from the database to a client.

For simplicity, I'll consider that I have only one bytea _image_ in the _images_ table.

Starting with

COPY (SELECT image FROM images) TO STDOUT BINARY

I understand that :

- the server will create a linear buffer on the server side holding the whole image and then start sending it over the network in one big copyData message chunked in 64KB network chunks

- the client can manage to extract this copyData payload by re-assembling those chunks in memory or by streaming the relevant data parts of the chunks elsewhere.

so the problem I see in a streaming situation is that the server actually needs to buffer the whole image in memory.

Now the image is already compressed so if I

ALTER TABLE images ALTER image SET STORAGE EXTERNAL

I can use the fact that substring on non compressed toasted values will fetch only the needed parts and do

COPY (

SELECT (

SELECT substring(image from n for 65536) from images)

FROM generate_series(1, (select length(image) from images), 65536) n

) TO STDOUT BINARY

As I understand it, this would be less memory intensive on the server side if the server starts sending rows before all rows of the subselect are built because it would only need to prepare a sequence of 65536 bytes long buffers for the rows it would decide to have in memory.

but is there a way to know if such a COPY/SELECT statement will indeed start sending rows before they are all prepared on the server ? Does it depend on the request and is there a difference if I add an order by on the select versus the natural order of the table ?

How many rows will be needed in memory before the sending begins ?

I hope my explanation was clear. I am looking for help in better understanding how the server decides to stream the COPY data out of the server vs the internal retrieval of the COPY'd subselect.

Thank you

Jérôme

В списке pgsql-hackers по дате отправления:

Предыдущее

От: David Rowley
Дата: 10 июня 2020 г., 15:33:45
Сообщение: Re: Parallel Seq Scan vs kernel read ahead

Следующее

От: David Rowley
Дата: 10 июня 2020 г., 15:58:34
Сообщение: Recording test runtimes with the buildfarm

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

COPY, bytea streaming and memory footprint

Предыдущее

Следующее