Re: INSERT times - same storage space but more fields -> much slower inserts

Поиск
Список
Период
Сортировка
От Stephen Frost
Тема Re: INSERT times - same storage space but more fields -> much slower inserts
Дата
Msg-id 20090414155614.GL8123@tamriel.snowman.net
обсуждение исходный текст
Ответ на INSERT times - same storage space but more fields -> much slower inserts  (Craig Ringer <craig@postnewspapers.com.au>)
Ответы Re: INSERT times - same storage space but more fields -> much slower inserts  (Matthew Wakeling <matthew@flymine.org>)
Список pgsql-performance
Craig,

* Craig Ringer (craig@postnewspapers.com.au) wrote:
> I've been doing some testing for the Bacula project, which uses
> PostgreSQL as one of the databases in which it stores backup catalogs.

We also use Bacula with a PostgreSQL backend.

> I've been evaluating a schema change for Bacula that takes a field
> that's currently stored as a gruesome-to-work-with base64-encoded
> representation of a binary blob, and expands it into a set of integer
> fields that can be searched, indexed, etc.

This would be extremely nice.

> The table size of the expanded form is marginally smaller than for the
> base64-encoded string version. However, INSERT times are *CONSIDERABLY*
> greater for the version with more fields. It takes 1011 seconds to
> insert the base64 version, vs 1290 seconds for the expanded-fields
> version. That's a difference of 279 seconds, or 27%.
>
> Despite that, the final table sizes are the same.
>
> If I use tab-separated input and COPY, the original-format file is
> 1300MB and the expanded-structure format is 1618MB. The performance hit
> on COPY-based insert is not as bad, at 161s vs 182s (13%), but still
> quite significant.
>
> Any ideas about what I might be able to do to improve the efficiency of
> inserting records with many integer fields?

Bacula should be using COPY for the batch data loads, so hopefully won't
suffer too much from having the fields split out.  I think it would be
interesting to try doing PQexecPrepared with binary-format data instead
of using COPY though.  I'd be happy to help you implement a test setup
for doing that, if you'd like.

        Thanks,

            Stephen

Вложения

В списке pgsql-performance по дате отправления:

Предыдущее
От: Merlin Moncure
Дата:
Сообщение: Re: Nested query performance issue
Следующее
От: Nikolas Everett
Дата:
Сообщение: Re: difficulties with time based queries