Re: [GENERAL] storing large files in database - performance

From: Eric Hill
Subject: Re: [GENERAL] storing large files in database - performance
Date:
Msg-id: CY1PR05MB22654E9D0E41E2AF0894AB2CF0E40@CY1PR05MB2265.namprd05.prod.outlook.com
In response to: Re: [GENERAL] storing large files in database - performance  (Merlin Moncure <mmoncure@gmail.com>)
List: pgsql-general
Thanks, Merlin - lots of good information here, and I had not yet stumbled across pg-large-object - I will look into
it.

Eric

-----Original Message-----
From: Merlin Moncure [mailto:mmoncure@gmail.com] 
Sent: Thursday, May 18, 2017 9:49 AM
To: Eric Hill <Eric.Hill@jmp.com>
Cc: Thomas Kellerer <spam_eater@gmx.net>; PostgreSQL General <pgsql-general@postgresql.org>
Subject: Re: storing large files in database - performance

On Thu, May 18, 2017 at 7:34 AM, Eric Hill <Eric.Hill@jmp.com> wrote:
> I would be thrilled to get 76 MB per second, and it is comforting to know that we have that as a rough upper bound on
> performance. I've got work to do to figure out how to approach that upper bound from Node.js.
>
> In the meantime, I've been looking at performance on the read side.  For that, I can bypass all my Node.js layers and
> just run a query from pgAdmin 4.  I ran this query, where indexFile.contents for the row in question is 25MB in size.
> The query itself took 4 seconds in pgAdmin 4.  Better than the 12 seconds I'm getting in Node.js, but still on the
> order of 6MB per second, not 76.  Do you suppose pgAdmin 4 and I are doing similarly inefficient things in querying
> bytea values?

Probably.  I haven't spent a lot of time with pgAdmin 4 so I'm not entirely sure.  If you want a quick and dirty
comparison, try running your query in psql unaligned mode for a comparison point.  You can also do \copy BINARY in
the case of bytea transfers.
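A rough sketch of that comparison from the shell might look like this (the database, table, and column names are taken from Eric's example or made up, so substitute your own; both commands assume a running PostgreSQL server):

```shell
# Time a bytea fetch through psql in unaligned (-A), tuples-only (-t) mode,
# discarding the output so terminal rendering doesn't skew the timing:
time psql -d mydb -At -c "SELECT contents FROM indexFile WHERE id = 1" > /dev/null

# Or dump the raw bytes with COPY ... BINARY to take text escaping out of
# the picture entirely:
time psql -d mydb -c "\copy (SELECT contents FROM indexFile WHERE id = 1) TO 'contents.bin' WITH BINARY"
```

Comparing the two timings gives a feel for how much of the cost is text-format overhead versus raw transfer.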
 

The basic problem is not really the database, it's that database interaction APIs tend not to be directed to this kind
of problem.
 
The big picture issues are:

*) Driver overhead marshaling from wire format to managed types

*) Driver overhead for memory management

*) Wire format issues.  Certain types are *much* faster with the binary wire format and are additionally much more
memory efficient.
 
Your bytea transfers are probably being serialized to text and back in both directions which is very wasteful,
especially for very large transfers since it's wasteful in terms of memory.
 

If I were to seriously look at node.js performance, my rough thinking is that I'd want to be setting up the javascript
variables directly in C somehow using plv8 internal routines.  Short of that, I would probably be querying all data out
of postgres in json rather than serializing individual fields (which is what I generally do in practice).
 

Another point, some googling turned up
https://www.npmjs.com/package/pg-large-object which is definitely something to consider trying.
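For what it's worth, a minimal read-side sketch with pg-large-object might look like the following. The connection details, OID source, and file layout are assumptions, and the stream API shown is the promise variant from the package's README, so double-check it against the current docs before relying on it:

```javascript
// Sketch only: stream a PostgreSQL large object to disk.
// Assumes the large object's OID is stored somewhere in your schema
// (hypothetical), and that PG* environment variables configure the connection.
const fs = require('fs');
const { Client } = require('pg');
const { LargeObjectManager } = require('pg-large-object');

async function fetchLargeObject(oid, outPath) {
  const client = new Client();
  await client.connect();
  const man = new LargeObjectManager({ pg: client });

  // Large object operations must run inside a transaction.
  await client.query('BEGIN');
  const [size, stream] = await man.openAndReadableStreamAsync(oid, 16384);
  console.log('streaming', size, 'bytes');
  await new Promise((resolve, reject) => {
    stream.pipe(fs.createWriteStream(outPath))
      .on('finish', resolve)
      .on('error', reject);
  });
  await client.query('COMMIT');
  await client.end();
}
```

The appeal here is that the data moves in fixed-size chunks rather than being materialized as one giant text-encoded value, which sidesteps the bytea serialization cost discussed above.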

merlin

In the pgsql-general list, by date:

Previous
From: Merlin Moncure
Date:
Message: Re: [GENERAL] union all taking years - PG 9.6
Next
From: Martin Goodson
Date:
Message: Re: [GENERAL] EnterpriseDB installed PostgreSQL 9.6 vs. REPMGR. Round4 - compilation issues on RHEL 7.2