Re: I: About "Our CLUSTER implementation is pessimal" patch

From: Josh Kupershmidt
Subject: Re: I: About "Our CLUSTER implementation is pessimal" patch
Date:
Msg-id AANLkTimQxxis81hh1weqVtwx_HU7exE_Mr8Ki0zPp3Bf@mail.gmail.com
In reply to: Re: I: About "Our CLUSTER implementation is pessimal" patch  (Itagaki Takahiro <itagaki.takahiro@gmail.com>)
Responses: Re: I: About "Our CLUSTER implementation is pessimal" patch  (Alvaro Herrera <alvherre@commandprompt.com>)
Re: I: About "Our CLUSTER implementation is pessimal" patch  (Itagaki Takahiro <itagaki.takahiro@gmail.com>)
List: pgsql-hackers
On Mon, Sep 27, 2010 at 10:05 PM, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:
> I re-ordered some description in the doc. Does it look better?
> Comments and suggestions welcome.

I thought this paragraph was a little confusing:

!     In the second case, a full table scan is followed by a sort operation.
!     The method is faster than the first one when the table is highly fragmented.
!     You need twice disk space of the sum in the case. In addition to the free
!     space needed by the previous case, this approach may also need a temporary
!     disk sort file which can be as big as the original table.

I think the worst-case disk space could be made clearer here, and the
paragraph could use some general wordsmithing as well. I wasn't sure what
"twice disk space of the sum" referred to in this description -- sum of
what (table and all indexes)? And does "twice disk space" include the
temporary disk sort file? Here's an idea of how I think this paragraph
could be cleaned up a bit, if my understanding of the disk space
required is about right:

!     In the second case, a full table scan is followed by a sort operation.
!     This method is faster than the first one when the table is highly
!     fragmented. However, <command>CLUSTER</command> may require available
!     disk space of up to twice the sum of the size of the table and its
!     indexes, if it uses a temporary disk sort file, which can be as big as
!     the original table.
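
To make the arithmetic concrete, here is a minimal sketch in Python. It assumes the first case needs free space equal to the table plus its indexes (the reading guessed at above, not confirmed in this thread) and that the temporary sort file is at most as large as the table. The helper name `cluster_free_space_estimate` and the sizes are hypothetical.

```python
def cluster_free_space_estimate(table_bytes, index_bytes, uses_sort_file=True):
    """Rough worst-case free disk space for CLUSTER (hypothetical helper).

    Assumption: the rewrite needs room for a new copy of the table and its
    rebuilt indexes, plus a sort file no larger than the table itself.
    """
    rewrite = table_bytes + index_bytes               # new table copy + new indexes
    sort_file = table_bytes if uses_sort_file else 0  # temporary disk sort file
    return rewrite + sort_file


GB = 1024 ** 3
table, indexes = 10 * GB, 4 * GB                      # made-up example sizes

estimate = cluster_free_space_estimate(table, indexes)
upper_bound = 2 * (table + indexes)                   # "twice the sum" wording

print(estimate // GB, upper_bound // GB)              # prints "24 28"
```

Under these assumptions the estimate is twice the table size plus the index size, which never exceeds twice the sum of both, so the proposed wording reads as a safe upper bound rather than an exact figure.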

Also, AIUI, this second clustering method is similar to the older
idiom of CREATE TABLE new AS SELECT * FROM old ORDER BY col. Since the
paragraph describing this older idiom is being removed, perhaps a
brief mention in the documentation could be made of this similarity.
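
For readers who have not seen that idiom, here is a small runnable sketch of it. To stay self-contained it uses Python's built-in `sqlite3` module as a stand-in, not PostgreSQL, so it only illustrates the shape of the statement and says nothing about CLUSTER's behavior or performance. The table names `old_table` and `new_table` and the sample rows are made up.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE old_table (col INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO old_table VALUES (?, ?)",
    [(3, "c"), (1, "a"), (2, "b")],  # deliberately out of order
)

# The older idiom: rewrite the table in sorted order via a full scan plus sort.
conn.execute("CREATE TABLE new_table AS SELECT * FROM old_table ORDER BY col")

# Read the copy back in physical insertion order (SQLite's rowid).
rows = conn.execute("SELECT col FROM new_table ORDER BY rowid").fetchall()
print(rows)
conn.close()
```

The copy ends up stored in `col` order, which is the same scan-then-sort shape as the second clustering method described above.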

Some more wordsmithing: change
!      The planner tries to choose a faster method in them base on the information
to:
!      The planner tries to choose the fastest method based on the information

I started looking at the performance impact of this patch based on
Leonardo's SQL file. On the 2 million row table, I see a consistent
~10% advantage for the sequential scan clusters. I'm going to try to
run the bigger tests a few times and post results from there when I
get a chance.

Josh

