Re: Incremental clustering?
From | Ray Ontko |
---|---|
Subject | Re: Incremental clustering? |
Date | |
Msg-id | 20040105191830.GD8061@ontko.com |
In reply to | Re: Incremental clustering? (John Siracusa <siracusa@mindspring.com>) |
List | pgsql-admin |
John, et al,

We too have an interest in reclustering large tables, but in our case most of the transactions are spread throughout the table (though in some cases not uniformly).

I have been pondering a program that selects all the rows in the table in cluster order and then, as a single transaction, deletes a block-full of rows and then re-inserts them. The program would then move on to the next block-full and repeat the operation. Note that if there are triggers on the table, this may have unintended side-effects.

This would require having a pretty clear idea of the space required for each row, and would probably require frequent vacuums during the process. If there were a way to tell the block address of each row, I suppose you could leave some rows where they are. In the end, you might end up with the same space requirements (a full copy as workspace), but I'm not sure.

Depending on the data, it may be possible to add new rows to the table while this process is going on. Any new rows added by other processes will certainly not be in order, and may interfere with new rows being added in a contiguous fashion (I'm not sure of the allocation algorithm used by PG).

Thoughts? Comments?

Ray

On Mon, Jan 05, 2004 at 01:54:16PM -0500, John Siracusa wrote:
> On 1/4/04 6:24 PM, Christopher Browne wrote:
> > The cluster operation potentially has to reorder all the tuples, and
> > the fact that the table is already _partially_ organized only
> > diminishes the potential. If the new data, generally added "at the
> > end," has values that are fairly uniformly distributed across the
> > index, then the operation really will have to reorder all of the
> > tuples...
>
> What about the special case of a table that is clustered on a column and all
> subsequent inserts will add rows with ever-increasing values of that column?
> This would be the case for creation dates or even a column created from a
> sequence. Basically, after clustering, it would be nice if you could tell
> the system to "only add to the end" and to "add in clustered order."
>
> Programming for special cases is annoying, but sometimes it really helps.
>
> -John

----------------------------------------------------------------------
Ray Ontko   rayo@ontko.com   Phone 1.765.935.4283   Fax 1.765.962.9788
Ray Ontko & Co.   Software Consulting Services   http://www.ontko.com/
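
The block-at-a-time idea in Ray's message (read the keys in cluster order, then delete and re-insert one block-full of rows per transaction) could be sketched roughly as below. This is only an illustration under stated assumptions, not anything from the thread: the function names, the `rows_per_block` guess, and the `placeholder` argument are all hypothetical, the key column is assumed to be unique, and the table and column names are assumed to be trusted identifiers. It is written against Python's generic DB-API and exercised here with `sqlite3` purely to check the logic; it says nothing about where PostgreSQL would physically place the re-inserted rows, which depends on free space and vacuuming as the message notes.

```python
import sqlite3
from typing import Iterator, List, Sequence


def batches(keys: Sequence, size: int) -> Iterator[List]:
    """Yield consecutive slices of `keys`, each holding at most `size` keys."""
    for start in range(0, len(keys), size):
        yield list(keys[start:start + size])


def recluster_in_batches(conn, table: str, key: str, rows_per_block: int,
                         placeholder: str = "?") -> int:
    """Delete and re-insert rows in `key` order, one batch per transaction.

    Hypothetical sketch: `key` is assumed unique, `table` and `key` are
    assumed to be trusted identifiers, and `rows_per_block` is the caller's
    estimate of how many rows fit in one block. Returns the batch count.
    """
    cur = conn.cursor()
    # Step 1: read every key in cluster order.
    cur.execute(f"SELECT {key} FROM {table} ORDER BY {key}")
    keys = [row[0] for row in cur.fetchall()]
    conn.commit()

    done = 0
    for batch in batches(keys, rows_per_block):
        marks = ", ".join([placeholder] * len(batch))
        try:
            # Step 2: fetch the batch, delete it, re-insert it in key order.
            cur.execute(
                f"SELECT * FROM {table} WHERE {key} IN ({marks}) ORDER BY {key}",
                batch,
            )
            rows = cur.fetchall()
            cur.execute(f"DELETE FROM {table} WHERE {key} IN ({marks})", batch)
            if rows:  # rows may have been removed by someone else meanwhile
                cols = ", ".join([placeholder] * len(rows[0]))
                cur.executemany(f"INSERT INTO {table} VALUES ({cols})", rows)
            conn.commit()  # Step 3: one transaction per block-full.
        except Exception:
            conn.rollback()
            raise
        done += 1
    return done
```

With a PostgreSQL driver that uses `%s` parameters, `placeholder="%s"` would be the corresponding assumption. As the message warns, any triggers on the table would fire for each delete and insert. On the question of a row's block address: PostgreSQL does expose a `ctid` system column holding a (block number, tuple index) pair, so a variant that skips rows already sitting where they belong is at least conceivable, though that is not attempted here.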