Re: Cluster using tablespaces?

Поиск
Список
Период
Сортировка
От Alvaro Herrera
Тема Re: Cluster using tablespaces?
Дата
Msg-id 20071202135627.GA6693@alvh.no-ip.org
обсуждение исходный текст
Ответ на Re: Cluster using tablespaces?  (Rainer Bauer <usenet@munnin.com>)
Список pgsql-general
Rainer Bauer wrote:
> Alvaro Herrera wrote:

> >It has been theorized that cluster would be faster in general if instead
> >of doing an indexscan we would instead use a seqscan + sort step.  It
> >would be good to measure it.
>
> Could a reindex on the clustered index speed up the clustering (when executed
> immediatelly before the cluster command)? As I understand it, this index is
> used to fetch the table data in the correct order. Or is most of the time
> spend fetching the table data?

I haven't measured it, but my guess is that most of the time is in
fetching heap pages in random order.

> Also, would it make sense to increase <shared_buffers> for the cluster
> operation. This is set to 32MB here on my Windows box as was recommended.

Not sure.  In general yes, but on Windows things are different.


> >> >For btree indexes, there is a temporary copy of the index data, which
> >> >will go wherever you have arranged for temp files to go.  (I think that
> >> >easy user control of this may be new for 8.3, though.)
> >>
> >> Could you give me a hint where that would be on Windows? I guess this might be
> >> worth a try since there are a couple of btree indexes in the database.
> >
> >I think Tom is referring to the new temp_tablespaces config variable.
>
> I moved the pgsql_tmp directory to another disk, but that didn't speed up the
> cluster command.

Probably most of the time is going into creating the new table then.

If you are looking for a short-term solution to your problem, maybe the
best is to follow the recommendation on CLUSTER ref page:

    There is another way to cluster data. The CLUSTER command
    reorders the original table by scanning it using the index you
    specify. This can be slow on large tables because the rows are
    fetched from the table in index order, and if the table is
    disordered, the entries are on random pages, so there is one
    disk page retrieved for every row moved. (PostgreSQL has a
    cache, but the majority of a big table will not fit in the
    cache.) The other way to cluster a table is to use

    CREATE TABLE newtable AS
        SELECT * FROM table ORDER BY columnlist;

    which uses the PostgreSQL sorting code to produce the desired
    order; this is usually much faster than an index scan for
    disordered data. Then you drop the old table, use ALTER TABLE
    ... RENAME to rename newtable to the old name, and recreate the
    table's indexes. The big disadvantage of this approach is that
    it does not preserve OIDs, constraints, foreign key
    relationships, granted privileges, and other ancillary
    properties of the table — all such items must be manually
    recreated. Another disadvantage is that this way requires a sort
    temporary file about the same size as the table itself, so peak
    disk usage is about three times the table size instead of twice
    the table size.

--
Alvaro Herrera                 http://www.amazon.com/gp/registry/CTMLCN8V17R4
"La experiencia nos dice que el hombre peló millones de veces las patatas,
pero era forzoso admitir la posibilidad de que en un caso entre millones,
las patatas pelarían al hombre" (Ijon Tichy)

В списке pgsql-general по дате отправления:

Предыдущее
От: Dewins Murillo
Дата:
Сообщение: Dewins has Tagged you! :)
Следующее
От: "Gautam Sampathkumar"
Дата:
Сообщение: "relation deleted while in use" encountered with Postgresql 8.0.8