Discussion: Tweaking bytea / large object block sizes?

Tweaking bytea / large object block sizes?

From: Hanno Schlichting

Hi.

I'm a new Postgres user, so if I happen to ask stupid questions, please
feel free to point me to any documentation I should read or to
guidelines for asking questions.

I'm looking into storing binary data in Postgres and trying to
understand how the data is actually stored in the database. The
dataset I'm looking at consists of images, photos and PDF documents,
which are commonly at least 100 KB each, around 10 MB on average, and
can go up to 100 MB per document. I want to store this data in the
database, as I need transactional integrity and want to avoid the
extra complexity of managing shared filesystems between a number of
frontend application servers and database backends.

The binary data will only be accessed as a whole. So either a complete
new file is written to the DB, or a complete file is read and cached
on a frontend server. I don't need streaming access or the ability to
stream partial data. The use case seems to be well supported by
Oracle 11g with the introduction of "secure files" (pdf: [1]).

But from what I have read of Postgres, my best bet is to store the
data as large objects [2]. Going all the way down, this means storing
the binary data in 2 KB chunks and adding table row overhead for each
of those chunks. Using the bytea type and the TOAST backend [3], it
seems to come down to the same thing: data is actually stored in 2 KB
chunks for a page size of 8 KB.
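
Just to illustrate what I mean, my understanding is that the per-chunk
layout of a large object can be seen directly in the pg_largeobject
catalog (untested, and the OID below is made up):

SELECT loid, pageno, octet_length(data) AS chunk_bytes
FROM pg_largeobject
WHERE loid = 16393          -- some OID returned by lo_creat()/lo_import()
ORDER BY pageno
LIMIT 5;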

I'm assuming I'll be able to get ~8 KB jumbo-frame packets over a
gigabit network connection, and would be able to use an ext4 volume
with a block size of either 32 KB or 64 KB for the volume housing the
binary data, but a smaller block size for the one housing the
relational data.

Given those other constraints, it seems silly to split the data up
into 2 KB chunks at the database level. Is there any way the chunk
size for binary data can be increased independently of the one for
"normal relational" data?

Thanks,
Hanno

[1] http://www.oracle.com/us/dm/h2fy11/securefiles-362607.pdf?evite=EMEAFM10041984MPP017
[2] http://www.postgresql.org/docs/9.0/static/catalog-pg-largeobject.html
[3] http://www.postgresql.org/docs/9.0/static/storage-toast.html

Re: Tweaking bytea / large object block sizes?

From: Vincent Veyron

On Sunday, 12 June 2011 at 18:00 +0200, Hanno Schlichting wrote:

> I'm looking into storing binary data in Postgres and trying to
> understand how data is actually stored in the database. The dataset
> I'm looking at is images, photos, pdf documents which should commonly
> be at a minimum 100kb, on average 10mb and can scale up to 100mb for
> each document. I want to store this data in the database, as I need
> transactional integrity and want to avoid the extra complexity of
> managing shared filesystems between a number of frontend application
> servers and database backends.
>

I had a similar requirement for the app that's in my sig. It uses a
PostgreSQL database, but the binary content of the documents resides in
an SQLite database.

That way, my PostgreSQL database remains very small and easy to manage
and back up. I have a separate procedure to do incremental backups of
the documents as they are added.

The SQLite db looks like this:

CREATE TABLE tbldocument_content (
    id_courrier INTEGER NOT NULL PRIMARY KEY,
    content     BLOB
);

CREATE TABLE tbldocument_state (
    id_courrier  INTEGER NOT NULL,
    backed_up    INTEGER DEFAULT 0,
    date_created DATE DEFAULT (date('now')),
    FOREIGN KEY (id_courrier) REFERENCES tbldocument_content(id_courrier)
);

CREATE INDEX tbldocument_state_backed_up_idx
    ON tbldocument_state(backed_up);
CREATE INDEX tbldocument_state_id_courrier_idx
    ON tbldocument_state(id_courrier);

CREATE TRIGGER create_document_state AFTER INSERT ON tbldocument_content
BEGIN
  INSERT INTO tbldocument_state (id_courrier) VALUES (NEW.id_courrier);
END;

CREATE TRIGGER drop_document_state AFTER DELETE ON tbldocument_content
BEGIN
  DELETE FROM tbldocument_state WHERE id_courrier = OLD.id_courrier;
END;

id_courrier is generated by the PostgreSQL db.
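
For example (hypothetical id and content, just to show the triggers at
work; id 42 would have come from the PostgreSQL side):

INSERT INTO tbldocument_content (id_courrier, content)
VALUES (42, X'255044462D312E34');   -- stub blob ("%PDF-1.4")

-- the AFTER INSERT trigger has created the matching state row
SELECT id_courrier, backed_up, date_created
FROM tbldocument_state
WHERE id_courrier = 42;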

Works great. You can't see it in action on the web site with the demo
account, though.

--
Vincent Veyron
http://marica.fr/
Claims and litigation management software for legal departments


Re: Tweaking bytea / large object block sizes?

From: Bill Moran

On 6/12/11 12:00:19 PM, Hanno Schlichting wrote:
> Hi.
>
> I'm a new Postgres user. If I happen to ask stupid questions please
> feel free to point me to any documentation I should read or guidelines
> for asking questions.
>
> I'm looking into storing binary data in Postgres and trying to
> understand how data is actually stored in the database. The dataset
> I'm looking at is images, photos, pdf documents which should commonly
> be at a minimum 100kb, on average 10mb and can scale up to 100mb for
> each document. I want to store this data in the database, as I need
> transactional integrity and want to avoid the extra complexity of
> managing shared filesystems between a number of frontend application
> servers and database backends.
>
> The binary data will only be accessed as a whole. So either a complete
> new file is written to the DB or a complete file will be read and
> cached on a frontend server. I don't need streaming access or be able
> to stream partial data. The use-case seems to be well supported by
> Oracle 11g with the introduction of "secure files" (pdf: [1]).
>
> But from what I read of Postgres, my best bet is to store data as
> large objects [2]. Going all the way down this means storing the
> binary data as 2kb chunks and adding table row overhead for each of
> those chunks. Using the bytea type and the toast backend [3] it seems
> to come down to the same: data is actually stored in 2kb chunks for a
> page size of 8kb.
>
> I'm assuming I'll be able to get ~8kb jumbo frame packets over a
> gigabit network connection and would be able to use a ext4 volume with
> a block size of either 32kb or 64kb for the volume housing the binary
> data, but a smaller block size for the one housing the relational
> data.
>
> Given those other constraints, it seems silly to split data up into
> 2kb chunks on the database level. Is there any way the chunk size for
> binary data can be increased here independent of the one for "normal
> relational" data?

You could redefine LOBLKSIZE and/or BLCKSZ so that larger chunks get
stored for large objects, and then build PG from source.  I assume
that LOBLKSIZE is defined in terms of BLCKSZ for a reason, and that
adjusting BLCKSZ is the better way to go.
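
If you do go down that road, a couple of quick sanity checks are
possible from SQL (IIRC large object chunks default to BLCKSZ/4):

SHOW block_size;      -- reports the compiled-in page size (BLCKSZ) in bytes

SELECT max(octet_length(data)) AS lo_chunk_bytes
FROM pg_largeobject;  -- largest chunk actually stored for your large objects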

Before doing this, I would set yourself up a performance test case so
that you can be sure that your changes are actually leading to an
improvement.  Since an RDBMS isn't typically used to simulate a
large-scale filestore, it's likely that the 8K page size and 2K LO
block size are inefficient if that's what it's being used for, but I
wouldn't assume that larger sizes automatically mean more performance
until you actually test it.
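
Even something as crude as the following, run with psql's \timing
turned on, would give you a baseline (the repeated 'x' payload is just
a placeholder -- it compresses far better than real images or PDFs, so
substitute data that looks like yours):

CREATE TABLE blob_test (id serial PRIMARY KEY, payload bytea);

-- write ten ~10 MB values
INSERT INTO blob_test (payload)
SELECT convert_to(repeat('x', 10 * 1024 * 1024), 'UTF8')
FROM generate_series(1, 10);

-- read them back whole
SELECT id, octet_length(payload) FROM blob_test;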

--
Bill Moran

Re: Tweaking bytea / large object block sizes?

From: Craig Ringer

On 06/13/2011 12:00 AM, Hanno Schlichting wrote:

> But from what I read of Postgres, my best bet is to store data as
> large objects [2]. Going all the way down this means storing the
> binary data as 2kb chunks and adding table row overhead for each of
> those chunks. Using the bytea type and the toast backend [3] it seems
> to come down to the same: data is actually stored in 2kb chunks for a
> page size of 8kb.

This is probably much less of a concern than you expect. Consider that
your file system almost certainly stores file data in chunks of between
512 bytes and 4kb (the block size) and performs just fine.

Given the file sizes you're working with, I'd try using `bytea' and see
how you go. Put together a test or simulation that you can use to
evaluate performance if you're concerned.
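
Nothing fancier than something like this is needed to get started
(table and column names made up):

CREATE TABLE document (
    id    serial PRIMARY KEY,
    name  text NOT NULL,
    body  bytea NOT NULL
);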

Maybe one day Linux systems will have a file system capable of
transactional behaviour, as NTFS is, so Pg could integrate with the
file system for transactional file management. In the meantime,
`bytea' or `lo' seems to be your best bet.

--
Craig Ringer

Re: Tweaking bytea / large object block sizes?

From: Merlin Moncure

On Sun, Jun 12, 2011 at 11:00 AM, Hanno Schlichting <hanno@hannosch.eu> wrote:
> Hi.
>
> I'm a new Postgres user. If I happen to ask stupid questions please
> feel free to point me to any documentation I should read or guidelines
> for asking questions.
>
> I'm looking into storing binary data in Postgres and trying to
> understand how data is actually stored in the database. The dataset
> I'm looking at is images, photos, pdf documents which should commonly
> be at a minimum 100kb, on average 10mb and can scale up to 100mb for
> each document. I want to store this data in the database, as I need
> transactional integrity and want to avoid the extra complexity of
> managing shared filesystems between a number of frontend application
> servers and database backends.
>
> The binary data will only be accessed as a whole. So either a complete
> new file is written to the DB or a complete file will be read and
> cached on a frontend server. I don't need streaming access or be able
> to stream partial data. The use-case seems to be well supported by
> Oracle 11g with the introduction of "secure files" (pdf: [1]).
>
> But from what I read of Postgres, my best bet is to store data as
> large objects [2]. Going all the way down this means storing the
> binary data as 2kb chunks and adding table row overhead for each of
> those chunks. Using the bytea type and the toast backend [3] it seems
> to come down to the same: data is actually stored in 2kb chunks for a
> page size of 8kb.
>
> I'm assuming I'll be able to get ~8kb jumbo frame packets over a
> gigabit network connection and would be able to use a ext4 volume with
> a block size of either 32kb or 64kb for the volume housing the binary
> data, but a smaller block size for the one housing the relational
> data.
>
> Given those other constraints, it seems silly to split data up into
> 2kb chunks on the database level. Is there any way the chunk size for
> binary data can be increased here independent of the one for "normal
> relational" data?

I would not even consider tweaking the internal block sizes until
you've determined there is a problem you expect you might solve by
doing so.  The single most important factor affecting blob performance
in postgres is how you send and receive the data -- you absolutely
want to use the binary protocol mode (especially for postgres versions
that don't support hex mode).  The next thing to look at is using
bytea/large objects -- large objects are a bit faster and have a
higher theoretical limit on size, but bytea is a standard type and
this offers a lot of conveniences -- I'd say stick with bytea unless
you've determined there is a reason not to.  That said, if you are
not writing C, some client-side drivers might only allow binary
transmission through the lo interface, so that's something to think
about.

merlin

Re: Tweaking bytea / large object block sizes?

From: Craig Ringer

On 13/06/11 09:27, Merlin Moncure wrote:

> want to use the binary protocol mode (especially for postgres versions
> that don't support hex mode)

Allowing myself to get a wee bit sidetracked:

I've been wondering lately why hex was chosen as the new input/output
format when the bytea_output change went in. The Base64 encoding is
trivial to implement, already supported by standard libraries for many
languages and add-ons for the rest, fast to encode/decode, and much more
compact than a hex encoding, so it seems like a more attractive option.
PostgreSQL already supports base64 in explicit encode()/decode() calls.
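
e.g. (output shown in the comments):

SELECT encode(E'\\x001122ff'::bytea, 'base64');   -- 'AAEi/w=='
SELECT decode('AAEi/w==', 'base64');              -- the original bytes back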

Was concern about input format ambiguity a motivator for avoiding
base64? Checking the archives:

http://archives.postgresql.org/pgsql-hackers/2009-05/msg00238.php
http://archives.postgresql.org/pgsql-hackers/2009-05/msg00192.php

... it was considered but knocked back because it's enough more
complex to encode that it could matter on big dumps, and because
standards-compliant base64 appears to require newlines - something
that was viewed as ugly and problematic. Reliable detection of the
input format was also raised, but since the same solution used for hex
input would apply to base64 input too, it doesn't look like that was a
big factor.

Personally, even with the newline 'ick factor' I think it'd be pretty
nice to have as an option for dumps and COPY.

Ascii85 (base85) would be another alternative. It's used in PostScript
and PDF, but isn't anywhere near as widespread as base64. It's still
trivial to implement and is 7-8% more space-efficient than base64.

After a bit of digging, though, I can't help wondering whether a
binary dump format that's machine-representation independent, fast and
compact wouldn't be more practical. Tools like Thrift
(http://thrift.apache.org), Protocol Buffers, etc. might make it less
painful. Maybe an interesting GSoC project? Supporting binary COPY
with a machine-independent format would be a natural extension of
that, too.

--
Craig Ringer

Re: Tweaking bytea / large object block sizes?

From: Merlin Moncure

On Mon, Jun 13, 2011 at 1:58 AM, Craig Ringer
<craig@postnewspapers.com.au> wrote:
> On 13/06/11 09:27, Merlin Moncure wrote:
>
>> want to use the binary protocol mode (especially for postgres versions
>> that don't support hex mode)
>
> Allowing myself to get a wee bit sidetracked:
>
> I've been wondering lately why hex was chosen as the new input/output
> format when the bytea_output change went in. The Base64 encoding is
> trivial to implement, already supported by standard libraries for many
> languages and add-ons for the rest, fast to encode/decode, and much more
> compact than a hex encoding, so it seems like a more attractive option.
> PostgreSQL already supports base64 in explicit encode()/decode() calls.

yeah -- I remember the discussion.  I think the bottom line is that
hex is just simpler all around.  That conveys a number of small
advantages which, when added up, outweigh base64's slightly better
space efficiency.

merlin

Re: Tweaking bytea / large object block sizes?

From: Hanno Schlichting

On Mon, Jun 13, 2011 at 3:27 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> I would not even consider tweaking the internal block sizes until
> you've determined there is a problem you expect you might solve by
> doing so.

It's not a problem as such, but managing data chunks of 2000 bytes
plus the thousands of rows per object in the pg_largeobject table for
10 MB objects seems like a lot of wasted overhead, especially if the
underlying filesystem manages 32 KB or 64 KB blocks. My impression of
those values is that they are a bit antiquated, or are tuned for
storing small variable-length character objects, but not anything I'd
call "binary large objects" these days.

> The single most important factor affecting blob performance
> in postgres is how you send and receive the data -- you absolutely
> want to use the binary protocol mode (especially for postgres versions
> that don't support hex mode).  The next thing to look at is using
> bytea/large object -- large objects are a bit faster and have a higher
> theoretical limit on size but bytea is a standard type and this offers
> a lot of conveniences -- I'd say stick with bytea unless you've
> determined there is a reason not to.  That said, if you are not
> writing C some client side drivers might only allow binary
> transmission through the lo interface so that's something to think
> about.

Thanks, I got as much from the docs and the blogosphere.

We are going to use the large object interface. That seems to be the
least we can do - especially to avoid some encoding overhead. We are
storing bytes, after all, and not ASCII characters, so there should be
no encoding at all. We aren't using SQL as the query interface as
such, but the Python bindings (http://www.initd.org/psycopg/), so we
can take full advantage of the underlying large object APIs and do
pretty direct lo_import / lo_export calls. We are targeting at least
Postgres 9.0, potentially going for 9.1 soon after it hits final.
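
(For completeness, the SQL-level variants look like the following, but
note they resolve paths on the database server rather than the client,
which isn't what we want; the paths and the OID are made up:)

SELECT lo_import('/srv/docs/incoming/report.pdf') AS new_oid;
SELECT lo_export(16393, '/srv/docs/outgoing/report.pdf');
SELECT lo_unlink(16393);    -- remove the large object again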

Once we get further into the project, we'll of course do some
intensive benchmarking of the various options with our specific data
and configuration. I'm just trying to understand what bits and bytes
are actually stored and transferred behind all those APIs.

Hanno

Re: Tweaking bytea / large object block sizes?

From: Merlin Moncure

On Tue, Jun 14, 2011 at 6:48 AM, Hanno Schlichting <hanno@hannosch.eu> wrote:
> On Mon, Jun 13, 2011 at 3:27 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> I would not even consider tweaking the internal block sizes until
>> you've determined there is a problem you expect you might solve by
>> doing so.
>
> It's not a problem as such, but managing data chunks of 2000 bytes +
> the hundreds of rows per object in the large_object table for 10mb
> objects seems like a lot of wasted overhead, especially if the
> underlying filesystem manages 32kb or 64kb blocks. My impression of
> those values was that they are a bit antiquated or are tuned for
> storing small variable character objects, but not anything I'd call
> "binary large objects" these days.

That very well may be the case, and 10 MB is approaching the upper
limit of what is sane to store inside the database.  Still, if you're
going to the trouble of adjusting the setting and recompiling, I'd
definitely benchmark the changes and post your findings here.  Point
being, all else being equal, it's always better to run with stock
postgres if you can manage it.

merlin