Re: badly scaling performance with appending to bytea

Поиск

Список

Период

Сортировка

От	Pavel Stehule
Тема	Re: badly scaling performance with appending to bytea
Дата	21 марта 2018 г. 18:59:47
Msg-id	CAFj8pRDfOHDNHLVAgPzV4TH3b3exMWmY32ucFr5C_+pMfQTLnw@mail.gmail.com обсуждение исходный текст
Ответ на	Re: badly scaling performance with appending to bytea (Pavel Stehule <pavel.stehule@gmail.com>)
Ответы	Re: badly scaling performance with appending to bytea
Список	pgsql-performance

Дерево обсуждения

2018-03-21 13:56 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:

2018-03-21 13:03 GMT+01:00 Gary Cowell <gary.cowell@gmail.com>:
We are trying to implement postgresql code to load a large object into
a postgresql bytea in chunks to avoid loading the file into memory in
the client.

First attempt was to do

update build_attachment set chunk = chunk || newdata ;

this did not scale and got significantly slower after 4000-5000 updates.

The chunks are 4K in size, and I'm testing with a 128MB input file,
requiring 32,774 chunk updates.

Next, I tried creating an aggregate, thus:

(taken from stackoverflow)

CREATE AGGREGATE bytea_agg(bytea) (SFUNC=byteacat,STYPE=bytea);

changed the code to insert the chunks to a temporary table :

create temporary table build_attachment (seq bigserial primary key,
chunk bytea ) on commit drop;

we then insert our 4K chunks to this, which takes very little time (20
seconds for the 32,774 inserts)

Here's an example though of trying to select the aggregate:

gary=> \timing
Timing is on.
gary=> select bytea_agg(chunk order by seq) from build_attachment
where seq < 4000 \g output
Time: 13372.843 ms
gary=> select bytea_agg(chunk order by seq) from build_attachment
where seq < 8000 \g output
Time: 54447.541 ms
gary=> select bytea_agg(chunk order by seq) from build_attachment
where seq < 16000 \g output
Time: 582219.773 ms

So those partial aggregates completed in somewhat acceptable times but ...

gary=> select bytea_agg(chunk order by seq) from build_attachment
where seq < 32000 \g output
this one hadn't completed in an hour - the PostgreSQL connection
process for my connection on the server goes to 100% CPU and stays
there, not using much RAM, not doing much IO, oddly

EXPLAINing these aggregate selects doesn't show anything useful.

Am I doomed to not be able to update a bytea this way? Is there some
way I can tune this?

bytea is immutable object without preallocation - so update of big tasks is very expensive.

I am thinking so using LO API and then transformation to bytea will be much more effective

\lo_import path

you can use

CREATE OR REPLACE FUNCTION attachment_to_bytea(attachment oid)
RETURNS bytea AS $$
DECLARE
fd        integer;
size      integer;
BEGIN
fd   := lo_open(attachment, 262144);
size := lo_lseek(fd, 0, 2);
PERFORM lo_lseek(fd, 0, 0);
RETURN loread(fd, size);
EXCEPTION WHEN undefined_object THEN
   PERFORM lo_close(fd);
   RETURN NULL;
END;
$$ LANGUAGE plpgsql STRICT SECURITY DEFINER SET search_path = 'pg_catalog';

function

import cca 44MB was in few seconds

there is native function lo_get

https://www.postgresql.org/docs/current/static/lo-funcs.html

Regards

Pavel

В списке pgsql-performance по дате отправления:

Предыдущее

От: Pavel Stehule
Дата: 21 марта 2018 г., 18:56:24
Сообщение: Re: badly scaling performance with appending to bytea

Следующее

От: Gary Cowell
Дата: 21 марта 2018 г., 19:04:55
Сообщение: Re: badly scaling performance with appending to bytea

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: badly scaling performance with appending to bytea

Предыдущее

Следующее