trying to use CLUSTER

Поиск
Список
Период
Сортировка
От Jeff Janes
Тема trying to use CLUSTER
Дата
Msg-id CAMkU=1zcK+O+_hvQpj+4D+-+mH_LiVxpY+hzW8hOL3wtaQosxQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: trying to use CLUSTER  (Albe Laurenz <laurenz.albe@wien.gv.at>)
Список pgsql-general
On Tuesday, February 12, 2013, Sahagian, David wrote:

> Version=9.1.7****
>
> ** **
>
> INFO: clustering "my_cool_table"  using sequential scan and sort****
>
> INFO: "my_cool_table": found 1 removable, 1699139 nonremovable row
> versions in 49762 pages****
>
> Detail: 1689396 dead row versions cannot be removed yet.****
>
> CPU 9.80s/4.98u sec elapsed 175.92 sec. ****
>
> **
>
...


> ** **
>
> Can somebody tell me why some "dead row versions cannot be removed yet" ?*
> ***
>
> I assume that means CLUSTER must write them to the new table ?
>

It means that while the CLUSTER itself considers them dead, they might
still look alive to some older transaction, so they need to be copied.
 (That older transaction must not have touched the table yet, or else it
would hold a lock that would prevent the CLUSTER from taking place).  You
might want to hunt down the source of those long-lived transactions and try
to eliminated them.



> It seems very costly to do the CLUSTER, if the new table is not really
> going to be a tiny fraction of the old table.****
>
> Is there a way for me to discover the approx number of "non-removables"
> BEFORE I do the CLUSTER ?
>

Not that I can think of.  Well, other than doing a VACUUM VERBOSE, but that
itself would be very costly to do and wasteful if it most of the table is
dead and immediately going to be CLUSTERed anyway.

You might be able to use the contrib module pageinspect to come up with
your own sampling technique.  Although I don't immediately see how to do
that.

>  **
>
> Also, does the use of [index scan on "pk_cool"] basically depend on the
> ratio of removable/nonremovable row versions ?
>

I don't think so.  It mostly depends on the correlation (i.e. how well it
is already clustered), the maintenance_work_mem, and the size the table in
pages, and the estimated size/number of the "live" rows.   (The true costs
depend on the live+nonremovable, but the planner does not have that number
available to it so it uses "live" instead in making the estimates)

Cheers,

Jeff

В списке pgsql-general по дате отправления:

Предыдущее
От: Sergey Konoplev
Дата:
Сообщение: Re: Determining if an hstore is empty
Следующее
От: Aleksey Tsalolikhin
Дата:
Сообщение: Graphing query results from within psql.