Re: Feature suggestion : FAST CLUSTER
От | PFC |
---|---|
Тема | Re: Feature suggestion : FAST CLUSTER |
Дата | |
Msg-id | op.tszxrsircigqcu@apollo13 обсуждение исходный текст |
Ответ на | Re: Feature suggestion : FAST CLUSTER ("Jim C. Nasby" <decibel@decibel.org>) |
Ответы |
Re: Feature suggestion : FAST CLUSTER
Re: Feature suggestion : FAST CLUSTER Re: Feature suggestion : FAST CLUSTER |
Список | pgsql-performance |
On Sun, 27 May 2007 17:53:38 +0200, Jim C. Nasby <decibel@decibel.org> wrote: > On Tue, May 22, 2007 at 09:29:00AM +0200, PFC wrote: >> This does not run a complete sort on the table. It would be about as >> fast as your seq scan disk throughput. Obviously, the end result is >> not as >> good as a real CLUSTER since the table will be made up of several >> ordered >> chunks and a range lookup. Therefore, a range lookup on the clustered >> columns would need at most N seeks, versus 1 for a really clustered >> table. >> But it only scans the table once and writes it once, even counting index >> rebuild. > > Do you have any data that indicates such an arrangement would be > substantially better than less-clustered data? While the little benchmark that will answer your question is running, I'll add a few comments : I have been creating a new benchmark for PostgreSQL and MySQL, that I will call the Forum Benchmark. It mimics the activity of a forum. So far, I have got interesting results about Postgres and InnoDB and will publish an extensive report with lots of nasty stuff in it, in, say, 2 weeks, since I'm doing this in spare time. Anyway, forums like clustered tables, specifically clusteriing posts on (topic_id, post_id), in order to be able to display a page with one disk seek, instead of one seek per post. PostgreSQL humiliates InnoDB on CPU-bound workloads (about 2x faster since I run it on dual core ; InnoDB uses only one core). However, InnoDB can automatically cluster tables without maintenance. This means InnoDB will, even though it sucks and is awfully bloated, run a lot faster than postgres if things become IO-bound, ie. if the dataset is larger than RAM. Postgres needs to cluster the posts table in order to keep going. CLUSTER is very slow. I tried inserting into a new posts table, ordering by (post_id, topic_id), then renaming the new table in place of the old. It is faster, but still slow when handling lots of data. I am trying other approaches, some quite hack-ish, and will report my findings. Regards
В списке pgsql-performance по дате отправления: