Обсуждение: INSERT times - same storage space but more fields -> much slower inserts

От:
Craig Ringer
Дата:

Hi

I've been doing some testing for the Bacula project, which uses
PostgreSQL as one of the databases in which it stores backup catalogs.

Insert times are critical in this environment, as the app may insert
millions of records a day.

I've been evaluating a schema change for Bacula that takes a field
that's currently stored as a gruesome-to-work-with base64-encoded
representation of a binary blob, and expands it into a set of integer
fields that can be searched, indexed, etc.

The table size of the expanded form is marginally smaller than for the
base64-encoded string version. However, INSERT times are *CONSIDERABLY*
greater for the version with more fields. It takes 1011 seconds to
insert the base64 version, vs 1290 seconds for the expanded-fields
version. That's a difference of 279 seconds, or 27%.

Despite that, the final table sizes are the same.

The SQL dump for the base64 version is 1734MB and the expanded one is
2189MB, about a 25% increase. Given that the final table sizes are the
same, is the slowdown likely to just be the cost of parsing the extra
SQL, converting the textual representations of the numbers, etc?

If I use tab-separated input and COPY, the original-format file is
1300MB and the expanded-structure format is 1618MB. The performance hit
on COPY-based insert is not as bad, at 161s vs 182s (13%), but still
quite significant.

Any ideas about what I might be able to do to improve the efficiency of
inserting records with many integer fields?


In case it's of interest, the base64 and expanded schema are:


CREATE TABLE file (
    fileid bigint NOT NULL,
    fileindex integer DEFAULT 0 NOT NULL,
    jobid integer NOT NULL,
    pathid integer NOT NULL,
    filenameid integer NOT NULL,
    markid integer DEFAULT 0 NOT NULL,
    lstat text NOT NULL,
    md5 text NOT NULL
);



CREATE TABLE file (
    fileid bigint,
    fileindex integer,
    jobid integer,
    pathid integer,
    filenameid integer,
    markid integer,
    st_dev integer,
    st_ino integer,
    st_mod integer,
    st_nlink integer,
    st_uid integer,
    st_gid integer,
    st_rdev bigint,
    st_size integer,
    st_blksize integer,
    st_blocks integer,
    st_atime integer,
    st_mtime integer,
    st_ctime integer,
    linkfi integer,
    md5 text
);


( Yes, those are the fields of a `struct lstat' ).

--
Craig Ringer


От:
Stephen Frost
Дата:

Craig,

* Craig Ringer () wrote:
> I've been doing some testing for the Bacula project, which uses
> PostgreSQL as one of the databases in which it stores backup catalogs.

We also use Bacula with a PostgreSQL backend.

> I've been evaluating a schema change for Bacula that takes a field
> that's currently stored as a gruesome-to-work-with base64-encoded
> representation of a binary blob, and expands it into a set of integer
> fields that can be searched, indexed, etc.

This would be extremely nice.

> The table size of the expanded form is marginally smaller than for the
> base64-encoded string version. However, INSERT times are *CONSIDERABLY*
> greater for the version with more fields. It takes 1011 seconds to
> insert the base64 version, vs 1290 seconds for the expanded-fields
> version. That's a difference of 279 seconds, or 27%.
>
> Despite that, the final table sizes are the same.
>
> If I use tab-separated input and COPY, the original-format file is
> 1300MB and the expanded-structure format is 1618MB. The performance hit
> on COPY-based insert is not as bad, at 161s vs 182s (13%), but still
> quite significant.
>
> Any ideas about what I might be able to do to improve the efficiency of
> inserting records with many integer fields?

Bacula should be using COPY for the batch data loads, so hopefully won't
suffer too much from having the fields split out.  I think it would be
interesting to try doing PQexecPrepared with binary-format data instead
of using COPY though.  I'd be happy to help you implement a test setup
for doing that, if you'd like.

        Thanks,

            Stephen

От:
Matthew Wakeling
Дата:

On Tue, 14 Apr 2009, Stephen Frost wrote:
> Bacula should be using COPY for the batch data loads, so hopefully won't
> suffer too much from having the fields split out.  I think it would be
> interesting to try doing PQexecPrepared with binary-format data instead
> of using COPY though.  I'd be happy to help you implement a test setup
> for doing that, if you'd like.

You can always do binary-format COPY.

Matthew

--
 An ant doesn't have a lot of processing power available to it. I'm not trying
 to be speciesist - I wouldn't want to detract you from such a wonderful
 creature, but, well, there isn't a lot there, is there?
                                        -- Computer Science Lecturer

От:
Stephen Frost
Дата:

* Matthew Wakeling () wrote:
> On Tue, 14 Apr 2009, Stephen Frost wrote:
>> Bacula should be using COPY for the batch data loads, so hopefully won't
>> suffer too much from having the fields split out.  I think it would be
>> interesting to try doing PQexecPrepared with binary-format data instead
>> of using COPY though.  I'd be happy to help you implement a test setup
>> for doing that, if you'd like.
>
> You can always do binary-format COPY.

I've never played with binary-format COPY actually.  I'd be happy to
help test that too though.

    Thanks,

        Stephen

От:
Craig Ringer
Дата:

Stephen Frost wrote:
> * Matthew Wakeling () wrote:
>> On Tue, 14 Apr 2009, Stephen Frost wrote:
>>> Bacula should be using COPY for the batch data loads, so hopefully won't
>>> suffer too much from having the fields split out.  I think it would be
>>> interesting to try doing PQexecPrepared with binary-format data instead
>>> of using COPY though.  I'd be happy to help you implement a test setup
>>> for doing that, if you'd like.
>> You can always do binary-format COPY.
>
> I've never played with binary-format COPY actually.  I'd be happy to
> help test that too though.

I'd have to check the source/a protocol dump to be sure, but I think
PQexecPrepared(...), while it takes binary arguments, actually sends
them over the wire in text form. PostgreSQL does have a binary protocol
as well, but it suffers from the same issues as binary-format COPY:

Unlike PQexecPrepared(...), binary-format COPY doesn't handle endian and
type size issues for you. You need to convert the data to the database
server's endianness and type sizes, but I don't think the PostgreSQL
protocol provides any way to find those out.

It's OK if we're connected via a UNIX socket (and thus are on the same
host), though I guess a sufficiently perverse individual could install a
32-bit bacula+libpq, and run a 64-bit PostgreSQL server, or even vice versa.

It should also be OK when connected to `localhost' (127.0.0.0/8) .

In other cases, binary-format COPY would be unsafe without some way to
determine remote endianness and sizeof(various types).

--
Craig Ringer

От:
Tom Lane
Дата:

Craig Ringer <> writes:
> Unlike PQexecPrepared(...), binary-format COPY doesn't handle endian and
> type size issues for you. You need to convert the data to the database
> server's endianness and type sizes, but I don't think the PostgreSQL
> protocol provides any way to find those out.

The on-the-wire binary format is much better specified than you think.
(The documentation of it sucks, however.)  It's big-endian in all cases
and the datatype sizes are well defined.

            regards, tom lane

От:
Stephen Frost
Дата:

Craig,

* Craig Ringer () wrote:
> In other cases, binary-format COPY would be unsafe without some way to
> determine remote endianness and sizeof(various types).

As Tom mentioned already, the binary protocol is actually pretty well
defined, and it's in network-byte-order, aka, big-endian.  The only
issue that I can think of off-hand that you need to know about the
server is if it's using 64-bit integers for date-times or if it's using
float.  That's a simple check to do, however, specifically with:

show integer_datetimes;

It's also alot cheaper to do the necessary byte-flipping to go from
whatever-endian to network-byte-order than to do the whole printf/atoi
conversion.  Handling timestamps takes a bit more magic but you can just
pull the appropriate code/#defines from the server backend, but I don't
think that's even an issue for this particular set.

What does your test harness currently look like, and what would you like
to see to test the binary-format COPY?  I'd be happy to write up the
code necessary to implement binary-format COPY for this.

    Thanks,

        Stephen

От:
Matthew Wakeling
Дата:

On Tue, 14 Apr 2009, Stephen Frost wrote:
> What does your test harness currently look like, and what would you like
> to see to test the binary-format COPY?  I'd be happy to write up the
> code necessary to implement binary-format COPY for this.

If anyone needs this code in Java, we have a version at
http://www.intermine.org/

Download source code: http://www.intermine.org/wiki/SVNCheckout

Javadoc: http://www.intermine.org/api/

The code is contained in the org.intermine.sql.writebatch package, in the
intermine/objectstore/main/src/org/intermine/sql/writebatch directory in
the source.

The public interface is org.intermine.sql.writebatch.Batch.

The Postgres-specific binary COPY code is in
org.intermine.sql.writebatch.BatchWriterPostgresCopyImpl.

The implementation unfortunately relies on a very old modified version of
the Postgres JDBC driver, which is in the intermine/objectstore/main/lib
directory.

The code is released under the LGPL, and we would appreciate notification
if it is used.

The code implements quite a sophisticated system for writing rows to
database tables very quickly. It batches together multiple writes into
COPY statements, and writes them in the background in another thread,
while fully honouring flush calls. When it is using the database
connection is well-defined. I hope someone can find it useful.

Matthew

--
 -. .-.   .-. .-.   .-. .-.   .-. .-.   .-. .-.   .-. .-.   .-.
 ||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||
 |/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/
 '   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-'

От:
Matthew Wakeling
Дата:

On Wed, 15 Apr 2009, Matthew Wakeling wrote:
> If anyone needs this code in Java, we have a version at
> http://www.intermine.org/
>
> Download source code: http://www.intermine.org/wiki/SVNCheckout
>
> Javadoc: http://www.intermine.org/api/

Sorry, that should be http://www.flymine.org/api/

Matthew

--
 What goes up must come down. Ask any system administrator.