Re: Free Space Map data structure
From: PFC
Subject: Re: Free Space Map data structure
Date:
Msg-id: op.t9d0j4jzcigqcu@apollo13.peufeu.com
In reply to: Re: Free Space Map data structure (Heikki Linnakangas <heikki@enterprisedb.com>)
List: pgsql-hackers
> PFC wrote:
>> About the FSM:
>> Would it be possible to add a flag marking pages where all tuples
>> are visible to all transactions? (kinda like frozen, I think)
>
> Ah, the visibility map. That's another line of discussion. The current
> plan is to not tie that to the FSM, but implement it separately. There's
> some infrastructure changes that are needed for both, like the "map
> forks" (see recent FSM discussions), which is why we need to have a
> design for FSM as well before we start implementing the visibility map.
>
> It's definitely something I want to do for 8.4.

Ahhhhhh, yes, yes, yes ;) yes !!!!!

> Here's my rough plan:
> 1. Common "map forks" support
> 2. Rewrite FSM
> 3. Implement visibility map, to allow partial vacuums
> 4. Implement index-only scans, using the visibility map.

Throwing in another idea that is related to partial vacuums (perhaps?):

Consider a table that is often inserted into (archive, forum posts, log,
whatever) and that we want to CLUSTER. In that case it would be
beneficial to cluster only the "tail" of the table, where all the
recently inserted rows are.

Example:

- Table is clustered.
- Insert rows in random order, update some, delete some, etc.,
  supposing inserts happen at the end.
- Table now looks like "head":[clustered part with some holes]
  plus "tail":[rows in random order].
- As long as the "tail" fits in disk cache, the random order isn't a
  problem.
- So, when the "tail" reaches a certain size:
  - grab it, sort it by cluster order, and write it to the heap again;
  - update the indexes in a manner similar to VACUUM (ie. bulk update).
- Table now looks like "head":[clustered part with some holes]
  plus "tail":[clustered].

This does not remove the holes in the "head", but is that really a
problem? In this usage scenario, I don't think so. Regular CLUSTER could
also be run, much less frequently than before, and it will also be much
faster since the rows are already approximately in order.
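The tail-sorting steps above can be sketched with a toy model (plain Python; the table, key values, and threshold are all made up for illustration, and this says nothing about PostgreSQL heap internals):

```python
import random

# Toy model of the heap: a list of cluster keys.
# "head" is already clustered; "tail" holds recent inserts in random order.
head = list(range(0, 1000))                   # clustered part (holes ignored here)
tail = random.sample(range(1000, 1200), 200)  # recent inserts, random order

TAIL_LIMIT = 150  # hypothetical "certain size" that triggers the pass

def maybe_recluster_tail(head, tail, limit):
    """When the tail grows past `limit`, sort it by cluster order and
    append it back, mimicking the proposed partial re-cluster."""
    if len(tail) >= limit:
        tail = sorted(tail)  # cheap as long as the tail fits in memory
    return head + tail

table = maybe_recluster_tail(head, tail, TAIL_LIMIT)

# The table is now two sorted runs; a later full CLUSTER would see
# almost-ordered input and run much faster.
assert table[:1000] == sorted(table[:1000])
assert table[1000:] == sorted(table[1000:])
```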
This approach is complementary to the "auto-cluster" approach, where the
index is asked "where should I insert this row?" (which will be used to
fill the holes). Auto-cluster will work well on tables that are updated
very often. But starting from an empty table, an already clustered
table, or in a mostly-insert scenario, the index will have no idea where
to put the row...

The good points of this approach are:

- As long as the "tail" fits in RAM, sorting it is going to be very
  fast (unlike the current CLUSTER).
- Bulk index updates will also be fast as long as the list of changes
  to apply to the index fits in memory.
- Therefore it will block the table for much less time than good old
  CLUSTER.
- Therefore it will get more use ;)

How to make it non-locking? Do something like this in pseudo SQL:

    INSERT INTO table
    SELECT * FROM (DELETE FROM table
                   WHERE date > last time we did this
                   RETURNING *)
    ORDER BY cluster_columns;
    VACUUM;

That is, take the "tail" of the table (as above), sort it, insert it
back in big chunks, and mark the old rows as deleted just like a regular
DELETE would have done. Then VACUUM. In this case you now have:

    "head":[clustered part with some holes] + "big hole" + "tail":[clustered rows]

Is the "big hole" a problem? Probably not; it will be marked as free
space by VACUUM and used for new inserts. A week later we get this:

    "head":[clustered part with some holes] + [rows in random order] + "tail":[clustered rows]

Repeating the process above will make the "tail" grow, and the "hole"
will stay more or less in the same place.

Another way to do it is to use partitions:

- an "archive" table
- a "current" table

Periodically the rows from "current" are transferred to the "archive"
table and sorted in the process; then "current" is truncated. This
works, but it is blocking, and you have the overhead of partitioning...
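The delete-sort-reinsert move in the pseudo SQL above can be acted out step by step with sqlite3 as a stand-in (the `log` table, its columns, and the timestamps are invented for the demo; in PostgreSQL the DELETE ... RETURNING feeding an INSERT would be a single statement, as sketched in the pseudo SQL):

```python
import sqlite3

# Hypothetical append-mostly table: ts is insert time, k is the cluster key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (ts INTEGER, k INTEGER)")
conn.executemany("INSERT INTO log VALUES (?, ?)",
                 [(1, 5), (2, 3), (3, 9), (4, 1)])

last_run = 1  # "last time we did this": only row (1, 5) predates it

# 1. Grab the "tail" (rows newer than the last pass), sorted by cluster order.
tail = conn.execute(
    "SELECT ts, k FROM log WHERE ts > ? ORDER BY k", (last_run,)).fetchall()

# 2. Delete the old copies -- a regular DELETE, so the old row versions
#    become dead space for VACUUM to reclaim (the "big hole").
conn.execute("DELETE FROM log WHERE ts > ?", (last_run,))

# 3. Re-insert the tail in cluster order at the end of the heap.
conn.executemany("INSERT INTO log VALUES (?, ?)", tail)

rows = [k for (k,) in conn.execute("SELECT k FROM log ORDER BY rowid")]
print(rows)  # [5, 1, 3, 9]: untouched head row, then the tail in key order
```

The point of step 1 sorting before step 3 is exactly the ORDER BY cluster_columns in the pseudo SQL: the rewritten tail lands on disk already in cluster order.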