benchmarking the query planner (was Re: Simple postgresql.conf wizard)

Поиск
Список
Период
Сортировка
От Robert Haas
Тема benchmarking the query planner (was Re: Simple postgresql.conf wizard)
Дата
Msg-id 603c8f070812061019r5aca3aajeaa5493b11c5feff@mail.gmail.com
обсуждение исходный текст
Ответы Re: benchmarking the query planner (was Re: Simple postgresql.conf wizard)  (Greg Stark <greg.stark@enterprisedb.com>)
Список pgsql-hackers
Sorry for top posting but we are getting a bit far afield from the
original topic.  I followed up the tests I did last night:

http://archives.postgresql.org/pgsql-hackers/2008-12/msg00369.php

I benchmarked 100 iterations of EXPLAIN on the query Greg Stark put
together as a synthetic benchmark for default_statistics_target with
various values for "SET STATISTICS n".  Testing was done on CVS HEAD
on my laptop with no configure options other than --prefix.  Then I
did this, to disable compression on pg_statistic.

alter table pg_statistic alter column stanumbers1 set storage external;
alter table pg_statistic alter column stanumbers2 set storage external;
alter table pg_statistic alter column stanumbers3 set storage external;
alter table pg_statistic alter column stanumbers4 set storage external;
alter table pg_statistic alter column stavalues1 set storage external;
alter table pg_statistic alter column stavalues2 set storage external;
alter table pg_statistic alter column stavalues3 set storage external;
alter table pg_statistic alter column stavalues4 set storage external;

(Note that you'll need to put allow_system_table_mods=true in your
postgresql.conf file if you want this to work.)  Then I reran the
tests.  The results were pretty dramatic.  In the table below, the
first column is value of "SET STATISTICS n" that was performed the
table column prior to analyzing it.  The second column is the time
required to plan the query 100x AFTER disabling compression on
pg_statistic, and the third column is the time required to plan the
query 100x BEFORE disabling compression on pg_statistic.

10    0.829202    0.8249
20    1.059976    1.06957
30    1.168727    1.143803
40    1.287189    1.263252
50    1.370167    1.363951
60    1.486589    1.460464
70    1.603899    1.571107
80    1.69402    1.689651
90    1.79068    1.804454
100    1.930877    2.803941
150    2.446471    4.833002
200    2.95323    6.217708
250    3.436741    7.507919
300    3.983568    8.895015
350    4.497475    10.201713
400    5.072471    11.576961
450    5.615272    12.933128
500    6.286358    14.408157
550    6.895951    15.745378
600    7.400134    17.192916
650    8.038159    18.568616
700    8.606704    20.025952
750    9.154889    21.45775
800    9.80953    22.74635
850    10.363471    24.057379
900    11.022348    25.559911
950    11.69732    27.021034
1000    12.266699    28.711027

As you can see, for default_statistics_target > 90, this is a HUGE win.

After doing this test, I rebuilt with --enable-profiling and profiled
EXPLAIN 10x with SET STATISTICS 10, 70, 100, 200 with a vanilla
configuration, and then 200 again with compression turned off as
described above.  The, ahem, ridiculously small limit on attachment
size prevents me from attaching the full results, so please see the
attached results which are truncated after the first section.  10x
doesn't seem to be quite enough to get the exact picture of where the
bottlenecks are, but the overall picture is clear enough:
decompression introduces a huge overhead.

Looking specifically at the 200-decompress output, the next biggest
hit is AllocSetAlloc(), which, from the detailed results that I
unfortunately can't include, is being called mostly by datumCopy()
which is being called mostly by get_attstatsslot().  There are 4000
calls to get_attstatsslot() which result 701,500 calls to datumCopy().

I'm not too sure what any of this means in terms of optimizatiion,
other than that changing the storage type of pg_statistic columns to
external looks like a huge win.  Perhaps someone more knowledgeable
than I has some thoughts.

...Robert

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Grzegorz Jaskiewicz
Дата:
Сообщение: Re: Optimizing DISTINCT with LIMIT
Следующее
От: Andrew Chernow
Дата:
Сообщение: user-based query white list