Re: Improving vacuum/VM/etc
From | Robert Haas |
---|---|
Subject | Re: Improving vacuum/VM/etc |
Date | |
Msg-id | CA+Tgmoa8JV1gWTELmkc=OqTr7PUM9=hnym8CKeJef49Lfswuzw@mail.gmail.com |
In reply to | Improving vacuum/VM/etc (Jim Nasby <Jim.Nasby@BlueTreble.com>) |
Responses | Re: Improving vacuum/VM/etc (Jim Nasby <Jim.Nasby@BlueTreble.com>) |
List | pgsql-hackers |

On Thu, Apr 23, 2015 at 3:09 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> Unfortunately, the states I came up with using existing semantics don't look
> hugely useful[4], but if we take Robert's idea and make all-visible mean
> all-frozen, we can do much better:
>
> 0: Newly inserted tuples
> Tracking this state allows us to aggressively set hint bits.

Who is "us"? And what do you mean by "aggressively"? As things stand, any process that has to touch a tuple always sets any applicable hint bits.

> 1: Newly deleted
> There are tuples that have been deleted but not pruned. There may also be
> newly inserted tuples that need hinting (state 0).
>
> Similar to state 0, we'd want to be fairly aggressive with these pages,
> because as soon as the deleting XID is committed and older than all
> snapshots we can prune. Because we can prune without hitting indexes, this
> is still a fairly cheap operation, though not as cheap as 0.

What behavior difference would you foresee between state 0 and state 1?

> 2: Fully hinted, not frozen
> This is the really painful state to clean up, because we have to deal with
> indexes. We must enter this state after being in 1.

Neither the fact that a page is fully hinted nor the fact that it is or is not frozen implies anything about dealing with indexes. We need to deal with indexes because the page contains either dead tuples (as a result of an aborted insert, a committed delete, or an aborted or committed update) or dead line pointers (as a result of pruning dead tuples).

> 3: All-visible-frozen
> Every tuple on the page is visible and frozen. Pages in this state need no
> maintenance at all. We might be able to enter this state directly from state
> 0.
>
>
> BENEFITS
> This tracking should help at least 3 problems: the need to set hint bits
> after insert, SELECT queries doing pruning (Simon's recent complaint), and
> needing to scan an entire table for freezing.
>
> The improvement in hinting and pruning is based on the idea that normally
> there would not be a lot of pages in state 0 or 1, and pages that were in
> those states are very likely to still be in disk cache (if not shared
> buffers). That means we can have a background process (or 2) that is very
> aggressive at targeting pages in these states.

OK, I agree that a background process could be useful. Whenever it sees a dirty page, it could attempt to aggressively set hint bits, prune, mark all-visible, and freeze the page before that page gets evicted. However, that doesn't require the sort of state map you're proposing here.

I think your statement about "pages that were in those states are still likely to be in the disk cache" is not really true. I mean, if we're doing OLTP, yes. But not if we're bulk-loading.

> Not needing to scan everything that's frozen is thanks to state 3. I think
> it's OK (at least for now) if only vacuum puts pages into this state, which
> means it can actually freeze the tuples when it does it (thanks to 37484ad
> we won't lose forensic data doing this). That means there's no extra work
> necessary by a foreground process that's dirtying a page.

Did you notice the discussion on the other thread about this increasing WAL volume by a factor of 113?

> Because of 37484ad, I think as part of this we should also deprecate
> vacuum_freeze_min_age, or at least change its behavior. AFAIK the only
> objection to aggressive freezing was loss of forensic data, and that's gone
> now. So vacuum (and presumably the bg process(es) that handle state 0 and 1)
> should freeze tuples if it would allow the whole page to be frozen. Possibly
> it should just do it any time it's dirtying the page. (We could actually do
> this right now; it would let us eliminate the GUC, but I'm not sure there'd
> be other benefit without the rest of this.)

Reducing vacuum_freeze_min_age certainly seems worth considering.
I don't know how to judge whether it's a good idea, though. You're balancing less I/O later against a lot more WAL right now.

> DOWNSIDES
> This does mean doubling the size of the VM. It would still be 32,000 times
> smaller than the heap with 8k pages (and 128,000 times smaller with the
> common warehouse 32k page size), so I suspect this is a non-issue, but it's
> worth mentioning. It might have some effect on an almost entirely read-only
> system; but I suspect in most other cases the other benefits will outweigh
> this.

I don't think that's a problem.

> This approach still does nothing to help the index-related activity in
> vacuum. My gut says state 2 should be further split; but I'm not sure why.
> Perhaps if we had another state we could do something more intelligent with
> index cleanup...

I can't really follow why you've got these states to begin with. 0, 1, and 2 are all pretty much the same. The useful distinction AFAICS is between not-all-visible, all-visible, and all-visible-plus-frozen.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company