Feature suggestion : FAST CLUSTER
От | PFC |
---|---|
Тема | Feature suggestion : FAST CLUSTER |
Дата | |
Msg-id | op.tspwemefcigqcu@apollo13 обсуждение исходный текст |
Ответы |
Re: Feature suggestion : FAST CLUSTER
|
Список | pgsql-performance |
Well, CLUSTER is so slow (and it doesn't cluster the toast tables associated with the table to be clustered). However, when people use CLUSTER they use it to speed up their queries. For that the table does not need to be perfectly in-order. So, here is a new idea for CLUSTER : - choose a chunk size (about 50% of your RAM) - setup disk sorts for all indexes - seq scan the table : - take a chunk of chunk_size - sort it (in memory) - write it into new table file - while we have the data on-hand, also send the indexed columns data into the corresponding disk-sorts - finish the index disk sorts and rebuild indexes This does not run a complete sort on the table. It would be about as fast as your seq scan disk throughput. Obviously, the end result is not as good as a real CLUSTER since the table will be made up of several ordered chunks and a range lookup. Therefore, a range lookup on the clustered columns would need at most N seeks, versus 1 for a really clustered table. But it only scans the table once and writes it once, even counting index rebuild. I would think that, with this approach, if people can CLUSTER a large table in 5 minutes instead of hours, they will use it, instead of not using it. Therefore, even if the resulting table is not as optimal as a fully clustered table, it will still be much better than the non-clustered case.
В списке pgsql-performance по дате отправления: