Обсуждение: pg_dump and --inserts / --column-inserts

Поиск

Список

Период

Сортировка

pg_dump and --inserts / --column-inserts

От

Thomas Kellerer

Дата:

16 июля 2010 г., 08:36:19

Hi,

the explanation of the --inserts option of pg_dumps states that

"The --column-inserts option is safe against column order changes, though even slower."

The way I read this is, that
   INSERT INTO table (column, ...) VALUES ...
is slower than
   INSERT INTO table VALUES ...

Is that really true?
Why would explicitely stating the columns be slower than relying on implicit column ordering?


Regards
Thomas

Re: pg_dump and --inserts / --column-inserts

От

Tom Lane

Дата:

16 июля 2010 г., 16:40:18

Thomas Kellerer <spam_eater@gmx.net> writes:
> the explanation of the --inserts option of pg_dumps states that

> "The --column-inserts option is safe against column order changes, though even slower."

> The way I read this is, that
>    INSERT INTO table (column, ...) VALUES ...
> is slower than
>    INSERT INTO table VALUES ...

> Is that really true?

I believe so, though I've not measured by how much.

> Why would explicitely stating the columns be slower than relying on implicit column ordering?

Well, first off, the volume of pg_dump'd data gets a lot larger due to
all the extra text.  If your column values aren't textually wide, you
could easily be looking at 2x the space.  That costs in I/O and network
transmission.  In the second place, it does take time to parse those
column names and look them up in the catalog.  Not much, but it'll add
up since it's done over again for every row.

            regards, tom lane

Re: pg_dump and --inserts / --column-inserts

От

Thomas Kellerer

Дата:

16 июля 2010 г., 20:26:38

Tom Lane wrote on 16.07.2010 18:40:
> Thomas Kellerer<spam_eater@gmx.net>  writes:
>> the explanation of the --inserts option of pg_dumps states that
>
>> "The --column-inserts option is safe against column order changes, though even slower."
>
>> The way I read this is, that
>>     INSERT INTO table (column, ...) VALUES ...
>> is slower than
>>     INSERT INTO table VALUES ...
>
>> Is that really true?
>
> I believe so, though I've not measured by how much.
>
>> Why would explicitely stating the columns be slower than relying on implicit column ordering?
>
> Well, first off, the volume of pg_dump'd data gets a lot larger due to
> all the extra text.  If your column values aren't textually wide, you
> could easily be looking at 2x the space.  That costs in I/O and network
> transmission.

Of course

> In the second place, it does take time to parse those
> column names and look them up in the catalog.  Not much, but it'll add
> up since it's done over again for every row.

Hmm.
For years I have been advocating to always use fully qualified column lists in INSERTs (for clarity and stability)
And now I learn it's slower when I do so :(

Thomas

Re: pg_dump and --inserts / --column-inserts

От

Craig Ringer

Дата:

17 июля 2010 г., 01:13:32

On 17/07/10 04:26, Thomas Kellerer wrote:

> Hmm.
> For years I have been advocating to always use fully qualified column
> lists in INSERTs (for clarity and stability)
> And now I learn it's slower when I do so :(

If you're not doing hundreds of thousands of identical ones at a time,
it's still very much a good idea. The costs of parsing and transmission
are usually pretty insignificant, and the readability/maintenance
benefits are huge.

It's only when dealing with bulk loading that this sort of thing starts
to be worth thinking about.

--
Craig Ringer

Re: pg_dump and --inserts / --column-inserts

От

Thomas Kellerer

Дата:

17 июля 2010 г., 07:55:34

Craig Ringer wrote on 17.07.2010 03:13:
> On 17/07/10 04:26, Thomas Kellerer wrote:
>
>> Hmm.
>> For years I have been advocating to always use fully qualified column
>> lists in INSERTs (for clarity and stability)
>> And now I learn it's slower when I do so :(
>
> If you're not doing hundreds of thousands of identical ones at a time,
> it's still very much a good idea. The costs of parsing and transmission
> are usually pretty insignificant, and the readability/maintenance
> benefits are huge.
>
> It's only when dealing with bulk loading that this sort of thing starts
> to be worth thinking about.
>
Thanks for the clarification.

I'm till a bit surprised that parsing the statement _with_ a column list is mesurably slower than withou a column list.

Regards
Thomas

Re: pg_dump and --inserts / --column-inserts

От

Tom Lane

Дата:

17 июля 2010 г., 14:36:13

Thomas Kellerer <spam_eater@gmx.net> writes:
> I'm till a bit surprised that parsing the statement _with_ a column list is mesurably slower than withou a column
list.

Well, nobody's offered any actual *numbers* here.  It's clear that
parsing the column list will take more time than not doing so, but
whether that adds up to anything significant or even measurable
compared to the rest of the statement's cost is not so clear.
Want to do some experiments?

            regards, tom lane

Re: pg_dump and --inserts / --column-inserts

От

Thomas Kellerer

Дата:

17 июля 2010 г., 16:29:41

Tom Lane wrote on 17.07.2010 16:36:
> Thomas Kellerer<spam_eater@gmx.net>  writes:
>> I'm till a bit surprised that parsing the statement _with_ a column list is mesurably slower than withou a column
list.
>
> Well, nobody's offered any actual *numbers* here.  It's clear that
> parsing the column list will take more time than not doing so, but
> whether that adds up to anything significant or even measurable
> compared to the rest of the statement's cost is not so clear.

> Want to do some experiments?

OK, I wrote a small Java program that inserts rows using a column list and without column list

I did that for 5,10,15 and 20 (integer) columns. then inserting 10000 rows into the empty table.

I measured the runtime as seen from the JDBC client and as reported by explain analyze (the last line reading "Total
runtime:")
All times are milliseconds  and are averaged over 20 runs

ColumnCount   with columns   without columns
    5          1132           1092.6
   10          1288.53        1148.33
   15          1430           1215.67
   20          1657.6         1313.2

Apparently there *is* a substiantial overhead, but I suspected the sending of the raw SQL literal to be a major factor
here.
(Server and JDBC program were running on the same machine)

So I ran "EXECUTE ANALYZE" instead of INSERT to get the processing time of the server and remove the JDBC/SQL literal
overhead.

   ColumnCount with columns   without columns
    5          116.33         115.3
   10          149.89         128.28
   15          169.94         159.14
   20          197.72         193.66

Which still shows an overhead, but less.

So apparently the "even slower" in the manual *is* right, as the overhead of sending the longer SQL Statement over the
networkdoes impact psql as well.

Re: pg_dump and --inserts / --column-inserts

От

Thomas Kellerer

Дата:

17 июля 2010 г., 16:56:12

Thomas Kellerer wrote on 17.07.2010 18:29:
>> Want to do some experiments?
>
> Apparently there *is* a substiantial overhead, but I suspected the
> sending of the raw SQL literal to be a major factor here.
> (Server and JDBC program were running on the same machine)
>

In case any one is interested.

Out of curiosity I ran the same test with a local Oracle installation (10.2) and there the overhead is substantially
lower(for 20 columns, only 2% slower, compared to 26% with Postgres) 

I can't run something equivalent to explain analyze in Oracle from within JDBC, so I could not compare those figures.
Butit seems that either the parsing overhead in Oracle is lower or the JDBC driver is more efficient... 

Regards
Thomas

Re: pg_dump and --inserts / --column-inserts

От

Tom Lane

Дата:

17 июля 2010 г., 17:35:41

Thomas Kellerer <spam_eater@gmx.net> writes:
> Tom Lane wrote on 17.07.2010 16:36:
>> Well, nobody's offered any actual *numbers* here.

> I measured the runtime as seen from the JDBC client and as reported by explain analyze (the last line reading "Total
runtime:")

The "runtime" from explain analyze really should not be measurably
different, since it doesn't include parse time or data transmission
time, and you ought to get the same execution plan with or without the
column names.  I'd dismiss those numbers as being within experimental
error, except it seems odd that they all differ in the same direction.
The overall times seen from the client seem plausible though;
particularly since we can see an increase in the percentage overhead
as the number of columns increases, which is what you'd expect if
you were accurately measuring a column-name-lookup overhead.

            regards, tom lane

Re: pg_dump and --inserts / --column-inserts

От

Thomas Kellerer

Дата:

17 июля 2010 г., 17:46:28

Tom Lane wrote on 17.07.2010 19:35:
> Thomas Kellerer<spam_eater@gmx.net>  writes:
>> Tom Lane wrote on 17.07.2010 16:36:
>>> Well, nobody's offered any actual *numbers* here.
>
>> I measured the runtime as seen from the JDBC client and as reported by explain analyze (the last line reading "Total
runtime:")
>
> The "runtime" from explain analyze really should not be measurably
> different, since it doesn't include parse time or data transmission
> time, and you ought to get the same execution plan with or without the
> column names.

Interesting.

My intend _was_ to exclude data transmission from the test by using explain analyze, but I'm surprised that it doesn't
includethe parsing in the execution time reported from that. 

> I'd dismiss those numbers as being within experimental
> error, except it seems odd that they all differ in the same direction.

And it's reproducable (at least on my computer). As I said I ran it 20 times (each run did it for 5,10,... columns) and
thevalues I posted were averages of those runs. 

Regards
Thomas

Re: pg_dump and --inserts / --column-inserts

От

Sam Mason

Дата:

19 июля 2010 г., 14:13:27

On Sat, Jul 17, 2010 at 07:46:23PM +0200, Thomas Kellerer wrote:
> Tom Lane wrote on 17.07.2010 19:35:
>> I'd dismiss those numbers as being within experimental
>> error, except it seems odd that they all differ in the same direction.
>
> And it's reproducable (at least on my computer). As I said I ran it 20
> times (each run did it for 5,10,... columns) and the values I posted
> were averages of those runs.

You couldn't give us the standard deviation as well could you?  If the
deviation within a test is larger than that between tests then you can't
say much, but without the numbers this can't be determined.

--
  Sam  http://samason.me.uk/

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: pg_dump and --inserts / --column-inserts