Re: [PERFORM] encouraging index-only scans
От | Robert Haas |
---|---|
Тема | Re: [PERFORM] encouraging index-only scans |
Дата | |
Msg-id | CA+Tgmoaf_+kXH6g2syaqaUETJx7=UY47badxv1P16rpPM-oeKw@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: [PERFORM] encouraging index-only scans (Andres Freund <andres@2ndquadrant.com>) |
Ответы |
Re: [PERFORM] encouraging index-only scans
(Andres Freund <andres@2ndquadrant.com>)
|
Список | pgsql-hackers |
On Thu, Sep 19, 2013 at 6:59 PM, Andres Freund <andres@2ndquadrant.com> wrote: > The reason I suggested keeping track of the xids of unremovable tuples > is that the current logic doesn't handle that at all. We just > unconditionally set n_dead_tuples to zero after a vacuum even if not a > single row could actually be cleaned out. Which has the effect that we > will not start a vacuum until enough bloat (or after changing this, new > inserts) has collected to start vacuum anew. Which then will do twice > the work. > > Resetting n_dead_tuples to the actual remaining dead tuples wouldn't do > much good either - we would just immediately trigger a new vacuum the > next time we check, even if the xmin horizon is still the same. One idea would be to store the xmin we used for the vacuum somewhere. Could we make that part of the pgstats infrastructure? Or store it in a new pg_class column? Then we could avoid re-triggering until it advances. Or, maybe better, we could remember the oldest XID that we weren't able to remove due to xmin considerations and re-trigger when the horizon passes that point. >> However, I do have one concern: it might lead to excessive >> index-vacuuming. Right now, we skip the index vac step only if there >> ZERO dead tuples are found during the heap scan. Even one dead tuple >> (or line pointer) will cause an index vac cycle, which may easily be >> excessive. So I further propose that we introduce a threshold for >> index-vac; so that we only do index vac cycle if the number of dead >> tuples exceeds, say 0.1% of the table size. > > Yes, that's a pretty valid concern. But we can't really do it that > easily. a) We can only remove dead line pointers when we know there's no > index pointing to it anymore. Which we only know after the index has > been removed. b) We cannot check the validity of an index pointer if > there's no heap tuple for it. Sure, we could check whether we're > pointing to a dead line pointer, but the random io costs of that are > prohibitive. > Now, we could just mark line pointers as dead and not mark that page as > all-visible and pick it up again on the next vacuum cycle. But that > would suck long-term. > > I think the only real solution here is to store removed tuples tids > (i.e. items where we've marked as dead) somewhere. Whenever we've found > sufficient tuples to-be-removed from indexes we do phase 2. I don't really agree with that. Yes, we could make that change, and yes, it might be better than what we're doing today, but it would be complex and have its own costs. And it doesn't mean that lesser steps are without merit. A vacuum pass over the heap buys us a LOT of space for reuse even without touching the indexes: we don't reclaim the line pointers, but we do reclaim the space for the tuples themselves, which is a big deal. So being able to do that more frequently without causing problems has a lot of value, I think. The fact that we get to set all-visible bits along the way makes future vacuums cheaper, and makes index scans work better, so that's good too. And the first vacuum to find a dead tuple will dirty the page to truncate it to a dead line pointer, while any subsequent revisits prior to the index vac cycle will only examine the page without dirtying it. All in all, just leaving the page to be caught be a future vacuum doesn't seem that bad to me, at least for a first cut. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
В списке pgsql-hackers по дате отправления: