Discussion: Setting visibility map in VACUUM's second phase
Hi All,

I briefly mentioned this idea in one of the other threads, but I'm starting a new thread to highlight the point.

Today, we set the visibility map *only* in the first phase of vacuum. This works when the page has no dead tuples. But when vacuum itself is removing one or more dead tuples from the page, the visibility map update must wait for the next vacuum to come along. If the page receives no other action between this and the next vacuum, the VM bit will get set next time, but not before the next vacuum reads and rescans the page.

To explain this further, please see the test below on current master. I ran pgbench for a few seconds and then vacuumed the pgbench_accounts table (stripping out non-interesting output):

INFO: "pgbench_accounts": found 4709 removable, 1000000 nonremovable row versions in 16475 out of 16475 pages

So VACUUM removed all the dead tuples from 16475 pages. One would expect the next VACUUM to have nothing to do, because the just-vacuumed table is not being accessed at all. But if I run vacuum again, we get this:

INFO: "pgbench_accounts": found 0 removable, 1000000 nonremovable row versions in 16475 out of 16475 pages

It's only the third vacuum on the table that skips all the heap pages, because they are all now marked as all-visible in the VM:

INFO: "pgbench_accounts": found 0 removable, 0 nonremovable row versions in 0 out of 16475 pages

If I repeat the same test with the attached patch, the second vacuum itself skips all the heap pages, because the VM is up-to-date at the end of the first vacuum. Please see the output from the first and second vacuum with the patch applied:

INFO: "pgbench_accounts": found 4163 removable, 1000000 nonremovable row versions in 16465 out of 16465 pages
INFO: "pgbench_accounts": found 0 removable, 0 nonremovable row versions in 0 out of 16465 pages

So the idea that the patch implements is this.
When we scan pages in the first phase of vacuum, if we find a page that has all-visible tuples but also one or more dead tuples that we know the second phase of vacuum will remove, we mark such a page with a special flag called PD_ALL_VISIBLE_OR_DEAD (I'm not married to the name, so feel free to suggest a better one). What this flag tells the second phase of vacuum is to mark the page all-visible and set the VM bit, unless some intermediate action on the page again inserts a not-all-visible tuple. If such an action happens, the PD_ALL_VISIBLE_OR_DEAD flag will be cleared by that backend, and the second phase of vacuum will not set the all-visible flag and VM bit.

The patch itself is quite small and works as intended. One thing that demanded special handling is the fact that visibilitymap_set() requires a cutoff xid to tell Hot Standby to resolve conflicts appropriately. We could have scanned the page again during the second phase to compute the cutoff xid, but I thought we could overload the pd_prune_xid field in the page header to store this information, which is already computed in the first phase. We don't need this field when a page is in the PD_ALL_VISIBLE_OR_DEAD state, because there is nothing to prune on the page in that state. I added a new page header flag to tell whether the XID stored in pd_prune_xid is a prune XID or a cutoff XID, but maybe it's not required: if the page has the PD_ALL_VISIBLE_OR_DEAD flag set, then we can assume that the XID is a cutoff XID; otherwise it's a prune XID. So PageIsPrunable() will need to consult pd_prune_xid only if PD_ALL_VISIBLE_OR_DEAD is clear.

AFAICS pgbench itself may not be the most appropriate benchmark to test this feature, because between successive vacuums of the large pgbench_accounts table, I think almost every page in that table will have dead tuples and need vacuum's attention.
But I think this will be a very powerful enhancement for cases where a user does a lot of UPDATE/DELETE operations on a large table and then calls VACUUM to remove all the dead tuples. On current master, the visibility map will remain unset (as shown by the vacuum output above), so index-only scans won't work until the table is vacuumed one more time. Another downside on current master is that the next vacuum will once again read all those heap blocks into shared buffers and scan them over again. This unnecessary IO activity can be avoided with this idea. I don't have any reasonable hardware to test this though, so any help is highly appreciated.

Comments?

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
Attachments
Pavan Deolasee <pavan.deolasee@gmail.com> writes:
> So the idea that the patch implements is this. When we scan pages in
> the first phase of vacuum, if we find a page that has all-visible
> tuples but also has one or more dead tuples that we know the second
> phase of vacuum will remove, we mark such page with a special flag
> called PD_ALL_VISIBLE_OR_DEAD (I'm not married to the name, so feel
> free to suggest a better name). What this flag tells the second phase
> of vacuum is to mark the page all-visible and set the VM bit unless
> some intermediate action on the page again inserts a not-all-visible
> tuple. If such an action happens, the PD_ALL_VISIBLE_OR_DEAD flag will
> be cleared by that backend and the second phase of vacuum will not set
> all-visible flag and VM bit.

This seems overly complicated, as well as a poor use of a precious page header flag bit. Why not just recheck all-visibility status after performing the deletions? Keep in mind that even if there were some not-all-visible tuples when we were on the page the first time, they might have become all-visible by the time we return. So this is going out of its way to produce a less-than-optimal result.

> The patch itself is quite small and works as intended. One thing that
> demanded special handling is the fact that visibilitymap_set()
> requires a cutoff xid to tell the Hot Standby to resolve conflicts
> appropriately. We could have scanned the page again during the second
> phase to compute the cutoff xid, but I thought we can overload the
> pd_prune_xid field in the page header to store this information which
> is already computed in the first phase.

And this is *far* too cute. The use of that field is complicated enough already without having its meaning vary depending on randomly-designed flag bits. And it's by no means obvious that using a by-now-old value when we return to the page is a good idea anyway (or even correct).
I think taking a second whack at setting the visibility bit is a fine idea, but let's drop all the rest of this premature optimization. regards, tom lane
On Thu, Dec 6, 2012 at 1:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I think taking a second whack at setting the visibility bit is a fine
> idea, but let's drop all the rest of this premature optimization.

+1. If there's any optimization needed here, we should try to do it by remembering relevant details from the first vacuum pass in backend-private memory, rather than by changing the on-disk format.

One other thought: I'm wondering if we shouldn't try to push the work of setting the all-visible bit into heap_page_prune(). That would allow HOT pruning to set the bit. For example, consider an all-visible page. A tuple is HOT-updated and the page becomes not-all-visible. Now the page is pruned, removing the old tuple and changing the line pointer to a redirect. Presto, the page is all-visible again.

Also, it seems to me that heap_page_prune() is already figuring out most of the stuff we need to know in order to decide whether to set PD_ALL_VISIBLE. The logic looks quite different from what is happening in the vacuum code, because the vacuum code iterates over all of the line pointers while the pruning code is walking HOT chains. But it seems to me that a page can't be all-visible unless there are no dead line pointers and no HOT chains of length != 1, and heap_prune_chain() does manage to call HeapTupleSatisfiesVacuum() for every tuple, so the raw information seems to be available without any additional CLOG lookups.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> One other thought: I'm wondering if we shouldn't try to push the work
> of setting the all-visible bit into heap_page_prune().

Hm, maybe ...

> But it seems to me that a page can't be all-visible unless there are
> no dead line pointers and no HOT chains of length != 1, and
> heap_prune_chain() does manage to call HeapTupleSatisfiesVacuum() for
> every tuple, so the raw information seems like it is available without
> any additional CLOG lookups.

HeapTupleSatisfiesVacuum is interested in whether a dead tuple is dead to everybody, but I don't think it figures out whether a live tuple is live to everybody. On the assumption that most tuples are live, adding the latter calculation might represent significant expense.

regards, tom lane
On Thu, Dec 6, 2012 at 11:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I think taking a second whack at setting the visibility bit is a fine
> idea, but let's drop all the rest of this premature optimization.

Fair enough. I thought about doing it that way but was worried that an additional page scan would raise eyebrows. It does not affect the common case, because we would have done that scan anyway in the subsequent vacuum; but in the worst case, where most of the pages do not remain all-visible by the time we come back to the second phase of vacuum, this additional line pointer scan will add some overhead. With a couple of pluses for the approach, I won't mind doing it this way though.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
On Fri, Dec 7, 2012 at 12:05 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Dec 6, 2012 at 1:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think taking a second whack at setting the visibility bit is a fine
>> idea, but let's drop all the rest of this premature optimization.
>
> +1.
>
> If there's any optimization needed here, we should try to do it by
> remembering relevant details from the first vacuum pass in
> backend-private memory, rather than by changing the on-disk format.

Yeah, I talked about that approach on the other thread. I thought we could store the page LSN in backend-private memory for all such pages and compare it with the current page LSN to know whether the page saw an intermediate action that demands a recheck of all-visibility. But I agree to set these aggressive optimizations aside for now and revisit them if necessary.

> One other thought: I'm wondering if we shouldn't try to push the work
> of setting the all-visible bit into heap_page_prune(). That would
> allow HOT pruning to set the bit. For example, consider an
> all-visible page. A tuple is HOT-updated and the page becomes
> not-all-visible. Now the page is pruned, removing the old tuple and
> changing the line pointer to a redirect. Presto, page is all-visible
> again.

+1. I had submitted a patch for that way back in 2008 or 2009, but blame me for not pursuing it to the end.

http://archives.postgresql.org/message-id/2e78013d0812042257x175e5a45w5edeaff14f7249ac@mail.gmail.com

Alex Hunsaker had reviewed that patch and confirmed a significant improvement in the vacuum time. I think the patch needed some rework, but I dropped the ball or got busy with other things while waiting on others. If you think it's useful to have, I will do the necessary work and submit it for the next commitfest.
Excerpts from Alex's comments: "The only major difference was with this patch vacuum time (after the first select after some hot updates) was significantly reduced for my test case (366ms vs 16494ms). There was no noticeable (within noise) select or update slow down. I was able to trigger WARNING: PD_ALL_VISIBLE flag once while running pgbench but have not be able to re-create it... (should I keep trying?)" Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee
On Friday, December 07, 2012 12:06 AM Robert Haas wrote:
> On Thu, Dec 6, 2012 at 1:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I think taking a second whack at setting the visibility bit is a fine
> > idea, but let's drop all the rest of this premature optimization.
>
> +1.
>
> If there's any optimization needed here, we should try to do it by
> remembering relevant details from the first vacuum pass in
> backend-private memory, rather than by changing the on-disk format.
>
> One other thought: I'm wondering if we shouldn't try to push the work
> of setting the all-visible bit into heap_page_prune(). That would
> allow HOT pruning to set the bit. For example, consider an
> all-visible page. A tuple is HOT-updated and the page becomes
> not-all-visible. Now the page is pruned, removing the old tuple and
> changing the line pointer to a redirect. Presto, page is all-visible
> again.

I think it can reduce some of vacuum's load as well; the only concern is that it should not make page pruning too heavy. By the way, isn't this idea similar to the patch at the link below?

http://archives.postgresql.org/pgsql-hackers/2010-02/msg02344.php

With Regards,
Amit Kapila.
On Fri, Dec 7, 2012 at 9:21 AM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
> On Thu, Dec 6, 2012 at 11:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think taking a second whack at setting the visibility bit is a fine
>> idea, but let's drop all the rest of this premature optimization.
>
> Fair enough. I thought about doing it that way but was worried that an
> additional page scan will raise eyebrows. While it does not affect the
> common case because we would have done that scan anyways in the
> subsequent vacuum, but in the worst case where most of the pages not
> remain all-visible by the time we come back to the second phase of
> vacuum, this additional line pointer scan will add some overhead. With
> couple of pluses for the approach, I won't mind doing it this way
> though.

Revised patch attached. This time it is much less invasive. I added a new function heap_page_is_all_visible() to check if the given page is all-visible and also return the visibility cutoff xid. A couple of things:

- We use the same OldestXmin computed in the first phase for the visibility checks. We could potentially recompute this at the start of the second phase, but I wasn't convinced that's needed or even correct.
- It would be a good idea to consolidate the page scanning in a single place to avoid code duplication, but maybe we will do that some other day.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
Attachments
On Friday, December 7, 2012, Pavan Deolasee wrote:
Revised patch attached. This time much less invasive. I added a new
function heap_page_is_all_visible() to check if the given page is
all-visible and also return the visibility cutoff xid.
Hi Pavan,
lazy_vacuum_page now pins and unpins the vmbuffer for each page it marks all-visible, which seems like a lot of needless traffic since the next vmbuffer is likely to be the same as the previous one.
Could it instead get vmbuffer passed down to it from the two calling sites? (One call site would have to have it introduced, just for this purpose, the other should be able to use its existing one.)
Cheers,
Jeff
Hi Jeff,

On Thu, Jan 24, 2013 at 2:41 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> lazy_vacuum_page now pins and unpins the vmbuffer for each page it marks
> all-visible, which seems like a lot of needless traffic since the next
> vmbuffer is likely to be the same as the previous one.

Good idea. Even though the cost of pinning/unpinning may not be high relative to the vacuum cost itself, it seems worth doing because we already do that in other places. Do you have any other review comments on the patch, or shall I fix this and send an updated patch?

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
On Thu, Jan 24, 2013 at 1:28 AM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
> Hi Jeff,
>
> On Thu, Jan 24, 2013 at 2:41 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> lazy_vacuum_page now pins and unpins the vmbuffer for each page it marks
>> all-visible, which seems like a lot of needless traffic since the next
>> vmbuffer is likely to be the same as the previous one.
>
> Good idea. Even though the cost of pinning/unpinning may not be high
> with respect to the vacuum cost itself, but it seems to be a good idea
> because we already do that at other places. Do you have any other
> review comments on the patch or I'll fix this and send an updated
> patch soon.

That was the only thing that stood out to me. You had some doubts about visibility_cutoff_xid, but I don't know enough about that to address them. I agree that it would be nice to avoid the code duplication, but I don't see a reasonable way to do that.

It applies cleanly (some offsets), builds without warnings, and passes make check under cassert. No documentation or regression tests are needed. We want this, and it does it.

I have verified the primary objective (setting the VM bit sooner) but haven't experimentally verified that this actually increases throughput for some workload. For pgbench when all data fits in shared_buffers, at least it doesn't cause a readily noticeable slowdown.

I haven't devised any patch-specific test cases, either for correctness or performance. I just used make check, pgbench, and the "vacuum verbose" verification you told us about in the original submission.

If we are going to scan a block twice, I wonder if it should be done the other way around. If there are any dead tuples that need to be removed from indexes, there is no point in dirtying the page with a HOT prune on the first pass when it will just be dirtied again after the indexes are vacuumed. I don't see this idea holding up your patch though, as surely it would be more invasive than what you are doing.

Cheers,
Jeff
On Thu, Jan 24, 2013 at 9:31 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Thu, Jan 24, 2013 at 1:28 AM, Pavan Deolasee
> <pavan.deolasee@gmail.com> wrote:
>> Good idea. Even though the cost of pinning/unpinning may not be high
>> with respect to the vacuum cost itself, but it seems to be a good idea
>> because we already do that at other places. Do you have any other
>> review comments on the patch or I'll fix this and send an updated
>> patch soon.
>
> That was the only thing that stood out to me.

The attached patch gets that improvement. It is also rebased on the latest head.

> You had some doubts about visibility_cutoff_xid, but I don't know
> enough about that to address them. I agree that it would be nice to
> avoid the code duplication, but I don't see a reasonable way to do
> that.

I have left a comment there to ensure that someone changing the code also takes pains to look at the other part.

> It applies cleanly (some offsets), builds without warnings, passes
> make check under cassert. No documentation or regression tests are
> needed. We want this, and it does it.
>
> I have verified the primary objective (setting vm sooner) but haven't
> experimentally verified that this actually increases throughput for
> some workload. For pgbench when all data fits in shared_buffers, at
> least it doesn't cause a readily noticeable slow down.
>
> I haven't devised any patch-specific test cases, either for
> correctness or performance. I just used make check, pgbench, and the
> "vacuum verbose" verification you told us about in the original
> submission.

Thanks for the tests and all the review.

> If we are going to scan a block twice, I wonder if it should be done
> the other way around. If there are any dead tuples that need to be
> removed from indexes, there is no point in dirtying the page with a
> HOT prune on the first pass when it will just be dirtied again after
> the indexes are vacuumed.
> I don't see this idea holding up your patch though, as surely it would
> be more invasive than what you are doing.

I think there is merit in this idea. But as you rightly said, we should tackle that as a separate patch.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
Attachments
On Sat, Jan 26, 2013 at 11:25 PM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
> On Thu, Jan 24, 2013 at 9:31 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> On Thu, Jan 24, 2013 at 1:28 AM, Pavan Deolasee
>> <pavan.deolasee@gmail.com> wrote:
>>> Good idea. Even though the cost of pinning/unpinning may not be high
>>> with respect to the vacuum cost itself, but it seems to be a good idea
>>> because we already do that at other places. Do you have any other
>>> review comments on the patch or I'll fix this and send an updated
>>> patch soon.
>>
>> That was the only thing that stood out to me.
>
> The attached patch gets that improvement. Also rebased on the latest head.

Hi Pavan,

I get this warning:

    vacuumlazy.c:890: warning: passing argument 6 of 'lazy_vacuum_page'
    makes pointer from integer without a cast

and make check then fails. I've added '&' to that line, and it now passes make check with --enable-cassert.

At line 1096, when you release the vmbuffer, you don't set it to InvalidBuffer like the other places in the code do. It seems like this would lead to a crash or assertion failure, but it does not seem to do so. Other places do:

    if (BufferIsValid(vmbuffer))
    {
        ReleaseBuffer(vmbuffer);
        vmbuffer = InvalidBuffer;
    }

Also, the "Note: If you change anything below, also look at" should probably say "Note: If you change anything in the for loop below, also look at". Otherwise I'd be wondering how far below the caveat applies.

I've attached a patch with these changes made. Does this look OK?

Thanks,
Jeff
Attachments
On Sun, Feb 3, 2013 at 2:31 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> Hi Pavan,
>
> I get this warning:
> vacuumlazy.c:890: warning: passing argument 6 of 'lazy_vacuum_page'
> makes pointer from integer without a cast
>
> and make check then fails.
>
> I've added '&' to that line, and it now passes make check with --enable-cassert.

Stupid me. Obviously I did not run make check before submitting the patch, but I'm surprised my short pgbench test did not catch this. Thanks a lot for finding and fixing this.

> At line 1096, when you release the vmbuffer, you don't set it to
> InvalidBuffer like the other places in the code do. It seems like
> this would lead to a crash or assertion failure, but it does not
> seem to do so.

That's harmless, because vmbuffer is just a local variable in that function, we are at the end of the function, and the variable is not used again. But it helps to be consistent, so I'm OK with your change.

> Also, the "Note: If you change anything below, also look at" should
> probably say "Note: If you change anything in the for loop below, also
> look at". Otherwise I'd be wondering how far below the caveat
> applies.

Ok.

> I've attached a patch with these changes made. Does this look OK?

Looks good to me. I also repeated pgbench and make check, and they work as expected. I'll add it to the CF and also mark the patch "ready for committer".

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
On 03.02.2013 08:24, Pavan Deolasee wrote: > On Sun, Feb 3, 2013 at 2:31 AM, Jeff Janes<jeff.janes@gmail.com> wrote: >> I've attached a patch with these changes made. Does this look OK? > > Looks good to me. I also repeated pgbench and make check and they work > as expected. I'll add it to the CF and also mark the patch "ready for > committer" Looks good to me too. Committed, thanks. - Heikki