Re: Open issues for HOT patch
From | Gregory Stark |
---|---|
Subject | Re: Open issues for HOT patch |
Date | |
Msg-id | 87myvikgse.fsf@oxford.xeocode.com |
In reply to | Re: Open issues for HOT patch (Heikki Linnakangas <heikki@enterprisedb.com>) |
Responses | Re: Open issues for HOT patch (Alvaro Herrera <alvherre@commandprompt.com>) |
List | pgsql-hackers |

"Heikki Linnakangas" <heikki@enterprisedb.com> writes:

> There is one wacky idea I haven't dared to propose yet:
>
> We could lift the limitation that you can't defragment a page that's
> pinned, if we play some smoke and mirrors in the buffer manager. When
> you prune a page, make a *copy* of the page you're pruning, and keep
> both versions in the buffer cache. Old pointers keep pointing to the old
> version. Any new calls to ReadBuffer will return the new copy, and the
> old copy can be dropped when its pin count drops to zero.

Fwiw when Heikki first mentioned this idea I thought it was the craziest thing
I'd ever heard. But the more I thought about it the more I liked it. I've come
to the conclusion that while it's a wart, it's not much worse than the wart of
the super-exclusive lock which it replaces. In fact it's arguably cleaner in
some ways.

As a result vacuum would never have to wait on pins held for arbitrarily long
periods, and there wouldn't be the concept of a vacuum waiting for a vacuum
lock with strange lock queueing semantics. It also means we could move tuples
around on the page more freely.

The only places which would have to deal with a possible new buffer would be
precisely those places that lock the page. If you aren't locking the page then
you definitely aren't about to fiddle with any bits that matter, since your
writes could be lost. Certainly you're not about to set xmin or xmax or
anything like that.

You might set hint bits which would be lost, but probably not often, since you
would have already checked the visibility of the tuples with the page locked.
There may be one or two places where we fiddle bits for a tuple we've just
inserted ourselves thinking nobody else can see it yet, but the current
philosophy seems to be leaning towards treating such coding practices as
unacceptably fragile anyways.

The buffer manager doesn't really need to track multiple "versions" of pages.
It would just mark the old version as an orphaned buffer which is
automatically a victim for the clock sweep as soon as the pin count drops to
0. It will never need to return such a buffer. What we would need is enough
information to reread the buffer if someone tries to lock it and to unpin it
when someone unpins a newer version.

At first I thought the cost of copying the page would be a downside, but in
fact Heikki pointed out that in defragmentation we're already copying the
page. In fact copying it to new memory, instead of to memory which is almost
certainly in processor caches which would then need to be invalidated, would
actually be faster, and avoiding the use of memmove could be faster too.

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
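
To make the orphaned-buffer bookkeeping above concrete, here is a minimal toy sketch in C. It is not the PostgreSQL buffer manager and not code from the HOT patch: the `ToyBuffer` struct and the `toy_read_buffer`, `toy_release_buffer`, `toy_prune_page` and `toy_lock_buffer` functions are all hypothetical names invented for illustration, there is no locking or clock sweep, and an orphan is simply freed the moment its pin count reaches zero. The sketch assumes that a backend trying to lock an orphaned buffer gets redirected to the current copy of the page.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NBUFFERS  8
#define PAGE_SIZE 64            /* toy size; real pages are much larger */

/* Hypothetical toy buffer header plus page contents. */
typedef struct {
    int  page_id;               /* disk page held in this slot, -1 if free */
    int  pin_count;             /* number of outstanding pins */
    bool orphaned;              /* superseded by a pruned copy */
    char data[PAGE_SIZE];
} ToyBuffer;

static ToyBuffer pool[NBUFFERS];

static void pool_init(void)
{
    memset(pool, 0, sizeof pool);
    for (int i = 0; i < NBUFFERS; i++)
        pool[i].page_id = -1;
}

static int find_free(void)
{
    for (int i = 0; i < NBUFFERS; i++)
        if (pool[i].page_id == -1)
            return i;
    return -1;
}

/* Pin and return the current (non-orphaned) copy of page_id. */
static int toy_read_buffer(int page_id)
{
    for (int i = 0; i < NBUFFERS; i++) {
        if (pool[i].page_id == page_id && !pool[i].orphaned) {
            pool[i].pin_count++;
            return i;
        }
    }
    int slot = find_free();
    if (slot < 0)
        return -1;              /* toy pool is full */
    memset(&pool[slot], 0, sizeof pool[slot]);
    pool[slot].page_id = page_id;
    pool[slot].pin_count = 1;
    return slot;
}

/* Drop one pin; an orphan with no pins left is reclaimed at once. */
static void toy_release_buffer(int buf)
{
    assert(pool[buf].pin_count > 0);
    if (--pool[buf].pin_count == 0 && pool[buf].orphaned)
        pool[buf].page_id = -1;
}

/*
 * Prune the page in old_buf (caller holds a pin on it) by copying it to a
 * fresh slot. Real defragmentation would happen during the copy. The old
 * slot becomes an orphan; the caller's pin moves to the new slot.
 */
static int toy_prune_page(int old_buf)
{
    int new_buf = find_free();
    if (new_buf < 0)
        return -1;
    pool[new_buf] = pool[old_buf];      /* copies page_id and data */
    pool[new_buf].pin_count = 1;
    pool[new_buf].orphaned = false;
    pool[old_buf].orphaned = true;
    toy_release_buffer(old_buf);
    return new_buf;
}

/* "Lock" a pinned buffer: an orphan is swapped for the current copy. */
static int toy_lock_buffer(int buf)
{
    if (!pool[buf].orphaned)
        return buf;
    int cur = toy_read_buffer(pool[buf].page_id);
    if (cur >= 0)
        toy_release_buffer(buf);
    return cur;
}
```

In this toy version, a backend that only reads through an old pin keeps seeing the old copy, and only a caller of `toy_lock_buffer` has to cope with being handed a different buffer. That matches the claim that the only places which need to deal with a possible new buffer are the ones that lock the page.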