Re: Feature suggestion : FAST CLUSTER

Поиск

Список

Период

Сортировка

От	PFC
Тема	Re: Feature suggestion : FAST CLUSTER
Дата	27 мая 2007 г. 14:31:58
Msg-id	op.tszxrsircigqcu@apollo13 обсуждение исходный текст
Ответ на	Re: Feature suggestion : FAST CLUSTER ("Jim C. Nasby" <decibel@decibel.org>)
Ответы	Re: Feature suggestion : FAST CLUSTER Re: Feature suggestion : FAST CLUSTER Re: Feature suggestion : FAST CLUSTER
Список	pgsql-performance

Дерево обсуждения

On Sun, 27 May 2007 17:53:38 +0200, Jim C. Nasby <decibel@decibel.org>
wrote:

> On Tue, May 22, 2007 at 09:29:00AM +0200, PFC wrote:
>>     This does not run a complete sort on the table. It would be about as
>>     fast  as your seq scan disk throughput. Obviously, the end result is
>> not as
>> good  as a real CLUSTER since the table will be made up of several
>> ordered
>> chunks and a range lookup. Therefore, a range lookup on the clustered
>> columns would need at most N seeks, versus 1 for a really clustered
>> table.
>> But it only scans the table once and writes it once, even counting index
>> rebuild.
>
> Do you have any data that indicates such an arrangement would be
> substantially better than less-clustered data?

    While the little benchmark that will answer your question is running,
I'll add a few comments :

    I have been creating a new benchmark for PostgreSQL and MySQL, that I
will call the Forum Benchmark. It mimics the activity of a forum.
    So far, I have got interesting results about Postgres and InnoDB and will
publish an extensive report with lots of nasty stuff in it, in, say, 2
weeks, since I'm doing this in spare time.

    Anyway, forums like clustered tables, specifically clusteriing posts on
(topic_id, post_id), in order to be able to display a page with one disk
seek, instead of one seek per post.
    PostgreSQL humiliates InnoDB on CPU-bound workloads (about 2x faster
since I run it on dual core ; InnoDB uses only one core). However, InnoDB
can automatically cluster tables without maintenance. This means InnoDB
will, even though it sucks and is awfully bloated, run a lot faster than
postgres if things become IO-bound, ie. if the dataset is larger than RAM.
    Postgres needs to cluster the posts table in order to keep going. CLUSTER
is very slow. I tried inserting into a new posts table, ordering by
(post_id, topic_id), then renaming the new table in place of the old. It
is faster, but still slow when handling lots of data.
    I am trying other approaches, some quite hack-ish, and will report my
findings.

    Regards

В списке pgsql-performance по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Feature suggestion : FAST CLUSTER