We are trying to write PostgreSQL code that loads a large file into a bytea column in chunks, so the client never has to hold the whole file in memory.
Our first attempt was to append each chunk:

update build_attachment set chunk = chunk || newdata ;

This did not scale: it got significantly slower after 4,000-5,000 updates, presumably because each update rewrites the entire accumulated value, so the total work grows quadratically.
The chunks are 4K in size, and I'm testing with a 128MB input file, requiring 32,774 chunk updates.
We then insert our 4K chunks as individual rows into build_attachment (one row per chunk), which takes very little time (20 seconds for the 32,774 inserts); a sketch of the assumed layout follows.
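For reference, a minimal sketch of the kind of chunk table the queries below appear to assume; the column names seq and chunk come from those queries, while the types and the serial key are assumptions:

create table build_attachment (
    seq   bigserial primary key,   -- chunk ordering, assumed
    chunk bytea not null           -- one 4K piece of the input file
);

-- each 4K piece read by the client becomes one row, e.g.
-- insert into build_attachment (chunk) values ($1);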
Here's an example though of trying to select the aggregate:
gary=> \timing
Timing is on.
gary=> select bytea_agg(chunk order by seq) from build_attachment where seq < 4000 \g output
Time: 13372.843 ms
gary=> select bytea_agg(chunk order by seq) from build_attachment where seq < 8000 \g output
Time: 54447.541 ms
gary=> select bytea_agg(chunk order by seq) from build_attachment where seq < 16000 \g output
Time: 582219.773 ms
So those partial aggregates completed in somewhat acceptable times but ...
gary=> select bytea_agg(chunk order by seq) from build_attachment where seq < 32000 \g output

This one hadn't completed in an hour. The PostgreSQL backend process for my connection goes to 100% CPU and stays there, not using much RAM and, oddly, not doing much I/O.
EXPLAINing these aggregate selects doesn't show anything useful.
Am I doomed to not be able to update a bytea this way? Is there some way I can tune this?
bytea is an immutable object with no preallocation, so every update of a large value has to rewrite the whole value. Building up a big bytea by repeated updates is therefore very expensive.
I think using the large object (LO) API and then converting the result to bytea will be much more effective. Import the file with:

\lo_import path

Then, to convert the large object to bytea, you can use:
CREATE OR REPLACE FUNCTION attachment_to_bytea(attachment oid)
RETURNS bytea AS $$
DECLARE
  fd   integer;
  size integer;
BEGIN
  fd := lo_open(attachment, 262144);   -- 262144 = INV_READ
  size := lo_lseek(fd, 0, 2);          -- seek to end to find the length
  PERFORM lo_lseek(fd, 0, 0);          -- rewind to the start
  RETURN loread(fd, size);             -- read the whole object in one call;
                                       -- the descriptor is closed at end of transaction
EXCEPTION WHEN undefined_object THEN
  PERFORM lo_close(fd);
  RETURN NULL;
END;
$$ LANGUAGE plpgsql STRICT SECURITY DEFINER SET search_path = 'pg_catalog';
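As a rough usage sketch: \lo_import puts the new large object's OID into the psql variable :LASTOID, so you can copy its contents into a bytea column and then drop the large object. The target table build_attachment_final and its columns are hypothetical names for illustration:

\lo_import path

insert into build_attachment_final (filename, data)
values ('myfile', attachment_to_bytea(:LASTOID));

-- free the temporary large object once its contents have been copied
select lo_unlink(:LASTOID);

This way the file is streamed to the server by \lo_import and the bytea is built in a single read, instead of rewriting an ever-growing value on every chunk.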