Discussion: Re: [PERFORM] encouraging index-only scans
On Wed, Dec 12, 2012 at 05:27:39PM -0500, Andrew Dunstan wrote:
> On 12/12/2012 05:12 PM, Andrew Dunstan wrote:
>> On 12/12/2012 04:32 PM, Tom Lane wrote:
>>> Andrew Dunstan <andrew@dunslane.net> writes:
>>>> A client is testing a migration from 9.1 to 9.2, and has found that a
>>>> large number of queries run much faster if they use index-only scans.
>>>> However, the only way he has found to get such a plan is by increasing
>>>> the seq_page_cost to insanely high levels (3.5). Is there any approved
>>>> way to encourage such scans that's a bit less violent than this?
>>>
>>> Is the pg_class.relallvisible estimate for the table realistic? They
>>> might need a few more VACUUM and ANALYZE cycles to get it into the
>>> neighborhood of reality, if not.
>>
>> That was the problem - I didn't know this hadn't been done.
>
> Actually, the table had been analysed but not vacuumed, so this
> kinda begs the question: what will happen to this value on
> pg_upgrade? Will people's queries suddenly get slower until
> autovacuum kicks in on the table?

[ moved to hackers list ]

Yes, this does seem like a problem for upgrades from 9.2 to 9.3. We can
have pg_dump --binary-upgrade set these, or have ANALYZE set it. I
would prefer the latter.

--
Bruce Momjian <bruce@momjian.us>  http://momjian.us
EnterpriseDB  http://enterprisedb.com
Bruce Momjian <bruce@momjian.us> writes:
> On Wed, Dec 12, 2012 at 05:27:39PM -0500, Andrew Dunstan wrote:
>> Actually, the table had been analysed but not vacuumed, so this
>> kinda begs the question: what will happen to this value on
>> pg_upgrade? Will people's queries suddenly get slower until
>> autovacuum kicks in on the table?

> [ moved to hackers list ]

> Yes, this does seem like a problem for upgrades from 9.2 to 9.3. We can
> have pg_dump --binary-upgrade set these, or have ANALYZE set it. I
> would prefer the latter.

ANALYZE does not set that value, and is not going to start doing so,
because it doesn't scan enough of the table to derive a trustworthy
value.

It's been clear for some time that pg_upgrade ought to do something
about transferring the "statistics" columns in pg_class to the new
cluster. This is just another example of why.

			regards, tom lane
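A quick way to check Tom's question on a live system is to compare
relallvisible against relpages; a minimal diagnostic sketch (the table
name mytable is a placeholder):

    -- fraction of heap pages the planner believes are all-visible
    SELECT relname, relpages, relallvisible,
           round(100.0 * relallvisible / greatest(relpages, 1), 1)
               AS pct_all_visible
    FROM pg_class
    WHERE relname = 'mytable';

    -- a plain (non-FULL) VACUUM scans the visibility map and stores a
    -- trustworthy relallvisible value
    VACUUM ANALYZE mytable;

A ratio near zero on a table that is in fact mostly static is the
symptom Andrew's client hit: index-only scans are costed as if nearly
every heap page still had to be fetched.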
On Thu, Dec 13, 2012 at 9:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Bruce Momjian <bruce@momjian.us> writes:
>> On Wed, Dec 12, 2012 at 05:27:39PM -0500, Andrew Dunstan wrote:
>>> Actually, the table had been analysed but not vacuumed, so this
>>> kinda begs the question: what will happen to this value on
>>> pg_upgrade? Will people's queries suddenly get slower until
>>> autovacuum kicks in on the table?
>
>> [ moved to hackers list ]
>
>> Yes, this does seem like a problem for upgrades from 9.2 to 9.3. We can
>> have pg_dump --binary-upgrade set these, or have ANALYZE set it. I
>> would prefer the latter.
>
> ANALYZE does not set that value, and is not going to start doing so,
> because it doesn't scan enough of the table to derive a trustworthy
> value.

Should we do that, though? That is, scan the entire map and count the
number of set bits at the end of ANALYZE, like we do at the end of
VACUUM? I recently tried to optimize that code path by not recounting
at the end of the vacuum and instead tracking the number of all-visible
bits while scanning them in the earlier phases of vacuum. But it turned
out that it's so fast to count even a million bits that it's probably
not worth doing.

> It's been clear for some time that pg_upgrade ought to do something
> about transferring the "statistics" columns in pg_class to the new
> cluster. This is just another example of why.

+1.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
On 13 December 2012 03:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yes, this does seem like a problem for upgrades from 9.2 to 9.3. We can
>> have pg_dump --binary-upgrade set these, or have ANALYZE set it. I
>> would prefer the latter.
>
> ANALYZE does not set that value, and is not going to start doing so,
> because it doesn't scan enough of the table to derive a trustworthy
> value.

ISTM that ANALYZE doesn't need to scan the table to do this. The
vismap is now trustworthy and we can scan it separately on ANALYZE.

More to the point, since we run ANALYZE more frequently than we run
VACUUM, the value stored by the last VACUUM could be very stale.

> It's been clear for some time that pg_upgrade ought to do something
> about transferring the "statistics" columns in pg_class to the new
> cluster. This is just another example of why.

Agreed, but that could bring other problems as well.

--
Simon Riggs  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Dec 13, 2012 at 09:40:40AM +0000, Simon Riggs wrote:
> On 13 December 2012 03:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>>> Yes, this does seem like a problem for upgrades from 9.2 to 9.3. We can
>>> have pg_dump --binary-upgrade set these, or have ANALYZE set it. I
>>> would prefer the latter.
>>
>> ANALYZE does not set that value, and is not going to start doing so,
>> because it doesn't scan enough of the table to derive a trustworthy
>> value.
>
> ISTM that ANALYZE doesn't need to scan the table to do this. The
> vismap is now trustworthy and we can scan it separately on ANALYZE.
>
> More to the point, since we run ANALYZE more frequently than we run
> VACUUM, the value stored by the last VACUUM could be very stale.

Wouldn't inserts affect the relallvisible ratio, but not cause a
vacuum? It seems we should have ANALYZE update this independently of
pg_upgrade needing it.

Also, why is this in pg_class?

--
Bruce Momjian <bruce@momjian.us>  http://momjian.us
EnterpriseDB  http://enterprisedb.com
On 13 December 2012 03:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> ANALYZE does not set that value, and is not going to start doing so,
> because it doesn't scan enough of the table to derive a trustworthy
> value.

I'm slightly surprised by your remarks here, because the commit
message where the relallvisible column was added (commit
a2822fb9337a21f98ac4ce850bb4145acf47ca27) says:

"Add a column pg_class.relallvisible to remember the number of pages
that were all-visible according to the visibility map as of the last
VACUUM (or ANALYZE, or some other operations that update
pg_class.relpages). Use relallvisible/relpages, instead of an
arbitrary constant, to estimate how many heap page fetches can be
avoided during an index-only scan."

Have I missed some nuance?

--
Peter Geoghegan  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
On Thu, Dec 13, 2012 at 03:31:06PM +0000, Peter Geoghegan wrote:
> On 13 December 2012 03:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> ANALYZE does not set that value, and is not going to start doing so,
>> because it doesn't scan enough of the table to derive a trustworthy
>> value.
>
> I'm slightly surprised by your remarks here, because the commit
> message where the relallvisible column was added (commit
> a2822fb9337a21f98ac4ce850bb4145acf47ca27) says:
>
> "Add a column pg_class.relallvisible to remember the number of pages
> that were all-visible according to the visibility map as of the last
> VACUUM (or ANALYZE, or some other operations that update
> pg_class.relpages). Use relallvisible/relpages, instead of an
> arbitrary constant, to estimate how many heap page fetches can be
> avoided during an index-only scan."
>
> Have I missed some nuance?

I am looking back at this issue now and I think you are correct. The
commit you mention (Oct 7, 2011) says ANALYZE updates
pg_class.relallvisible, and the code matches that:

    if (!inh)
        vac_update_relstats(onerel,
                            RelationGetNumberOfBlocks(onerel),
                            totalrows,
-->                         visibilitymap_count(onerel),
                            hasindex,
                            InvalidTransactionId);

so if an index-only scan was not being used after an ANALYZE, it isn't
a bad relallvisible estimate but something else. This code was in PG
9.2.

--
Bruce Momjian <bruce@momjian.us>  http://momjian.us
EnterpriseDB  http://enterprisedb.com
On Wed, Sep 4, 2013 at 04:56:55PM -0400, Bruce Momjian wrote:
>> "Add a column pg_class.relallvisible to remember the number of pages
>> that were all-visible according to the visibility map as of the last
>> VACUUM (or ANALYZE, or some other operations that update
>> pg_class.relpages). Use relallvisible/relpages, instead of an
>> arbitrary constant, to estimate how many heap page fetches can be
>> avoided during an index-only scan."
>>
>> Have I missed some nuance?
>
> I am looking back at this issue now and I think you are correct. The
> commit you mention (Oct 7, 2011) says ANALYZE updates
> pg_class.relallvisible, and the code matches that:
>
>     if (!inh)
>         vac_update_relstats(onerel,
>                             RelationGetNumberOfBlocks(onerel),
>                             totalrows,
> -->                         visibilitymap_count(onerel),
>                             hasindex,
>                             InvalidTransactionId);
>
> so if an index-only scan was not being used after an ANALYZE, it isn't
> a bad relallvisible estimate but something else. This code was in PG
> 9.2.

Actually, I now realize it is more complex than that, and worse. There
are several questions to study to understand when pg_class.relallvisible
is updated (which is used to determine if index-only scans are a good
optimization choice), and when VM all-visible bits are set so heap pages
can be skipped during index-only scans:

1) When are VM bits set:
	vacuum (non-full)
	analyze (only some random pages)

2) When are massive numbers of rows added but VM bits not set:
	copy

3) When are VM bits cleared:
	insert/update/delete
	vacuum (non-full)

4) When are VM map files cleared:
	vacuum full
	cluster

5) When is pg_class.relallvisible updated via a VM map file scan:
	vacuum (non-full)
	analyze

Vacuums run by autovacuum are driven by n_dead_tuples, which counts
only updates and deletes. Therefore, any scenario where vacuum
(non-full) is never run will not have significant VM bits set. The only
bits that will be set will be by pages visited randomly by analyze. The
following table activities will not set proper VM bits:

	vacuum full
	cluster
	copy
	insert-only

If updates and deletes happen, there will eventually be sufficient
reason for autovacuum to vacuum the table and set proper VM bits, and
pg_class.relallvisible. The calculus we should use to determine when we
need to run vacuum has changed with index-only scans, and I am not sure
we ever fully addressed this.

--
Bruce Momjian <bruce@momjian.us>  http://momjian.us
EnterpriseDB  http://enterprisedb.com
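The insert-only case from the list above is easy to reproduce; a
minimal sketch on a scratch table, assuming no long-running
transactions are holding back the xmin horizon:

    CREATE TABLE t AS SELECT g FROM generate_series(1, 100000) g;

    SELECT relallvisible FROM pg_class WHERE relname = 't';  -- 0: no VM bits yet
    ANALYZE t;   -- recounts the VM bits into relallvisible; still 0
    VACUUM t;    -- sets the VM bits and updates relallvisible
    SELECT relallvisible, relpages FROM pg_class WHERE relname = 't';
    -- relallvisible should now be close to relpages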
On Thu, Sep 5, 2013 at 8:14 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Actually, I now realize it is more complex than that, and worse. There
> are several questions to study to understand when pg_class.relallvisible
> is updated (which is used to determine if index-only scans are a good
> optimization choice), and when VM all-visible bits are set so heap pages
> can be skipped during index-only scans:
>
> 1) When are VM bits set:
> 	vacuum (non-full)
> 	analyze (only some random pages)

Analyze doesn't set visibility-map bits. It only updates statistics
about how many are set.

> The calculus we should use to determine when we need to run vacuum has
> changed with index-only scans, and I am not sure we ever fully addressed
> this.

Yeah, we didn't. I think the hard part is figuring out what behavior
would be best. Counting inserts as well as updates and deletes would
be a simple approach, but I don't have much confidence in it. My
experience is that having vacuum or analyze kick in during a bulk-load
operation is a disaster. We'd kinda like to come up with a way to
make vacuum run after the bulk load is complete, maybe, but how would
we identify that time, and there are probably cases where that's not
right either.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 06/09/13 13:10, Robert Haas wrote:
> On Thu, Sep 5, 2013 at 8:14 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> Actually, I now realize it is more complex than that, and worse. There
>> are several questions to study to understand when pg_class.relallvisible
>> is updated (which is used to determine if index-only scans are a good
>> optimization choice), and when VM all-visible bits are set so heap pages
>> can be skipped during index-only scans:
>>
>> 1) When are VM bits set:
>> 	vacuum (non-full)
>> 	analyze (only some random pages)
>
> Analyze doesn't set visibility-map bits. It only updates statistics
> about how many are set.
>
>> The calculus we should use to determine when we need to run vacuum has
>> changed with index-only scans, and I am not sure we ever fully addressed
>> this.
>
> Yeah, we didn't. I think the hard part is figuring out what behavior
> would be best. Counting inserts as well as updates and deletes would
> be a simple approach, but I don't have much confidence in it. My
> experience is that having vacuum or analyze kick in during a bulk-load
> operation is a disaster. We'd kinda like to come up with a way to
> make vacuum run after the bulk load is complete, maybe, but how would
> we identify that time, and there are probably cases where that's not
> right either.

How about a 'VACUUM AFTER' command (part of the 'BEGIN' transaction
syntax?) that would:

1. only be valid in a transaction
2. initiate a vacuum after the current transaction completed
3. defer any vacuum triggered due to other criteria

If the transaction was rolled back: then if there was a pending vacuum,
due to other reasons, it would then be actioned.

On normal transaction completion, then if there was a pending vacuum it
would be combined with the one in the transaction.

Still would need some method of ensuring any pending vacuum was done if
the transaction hung, or took too long.

Cheers,
Gavin
On Thu, Sep 5, 2013 at 09:10:06PM -0400, Robert Haas wrote:
> On Thu, Sep 5, 2013 at 8:14 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> Actually, I now realize it is more complex than that, and worse. There
>> are several questions to study to understand when pg_class.relallvisible
>> is updated (which is used to determine if index-only scans are a good
>> optimization choice), and when VM all-visible bits are set so heap pages
>> can be skipped during index-only scans:
>>
>> 1) When are VM bits set:
>> 	vacuum (non-full)
>> 	analyze (only some random pages)
>
> Analyze doesn't set visibility-map bits. It only updates statistics
> about how many are set.

Sorry, yes you are correct.

>> The calculus we should use to determine when we need to run vacuum has
>> changed with index-only scans, and I am not sure we ever fully addressed
>> this.
>
> Yeah, we didn't. I think the hard part is figuring out what behavior
> would be best. Counting inserts as well as updates and deletes would
> be a simple approach, but I don't have much confidence in it. My
> experience is that having vacuum or analyze kick in during a bulk-load
> operation is a disaster. We'd kinda like to come up with a way to
> make vacuum run after the bulk load is complete, maybe, but how would
> we identify that time, and there are probably cases where that's not
> right either.

I am unsure how we have gone a year with index-only scans and I am just
now learning that it only works well with update/delete workloads or by
running vacuum manually. I only found this out going back over January
emails. Did other people know this? Was it not considered a serious
problem?

Well, our logic has been that vacuum is only for removing expired rows.
I think we either need to improve that, or somehow make sequential
scans update the VM map, and then find a way to trigger an update of
relallvisible even without inserts.

Ideas
-----

I think we need to detect tables that do not have VM bits set and try
to determine if they should be vacuumed. If a table has most of its VM
bits set, there is no need to vacuum it for VM bit setting.

Autovacuum knows how many pages are in the table via its file size, and
it can scan the VM map to see how many pages are _not_ marked
all-visible. If the VM map has many pages that are _not_ marked as
all-visible, and the change count since the last vacuum is low, those
pages might now be all-visible and vacuum might find them.

One problem is that a long-running transaction is not going to update
relallvisible until commit, so you might be vacuuming a table that is
being modified, e.g. bulk loads. Do we have any way of detecting if a
backend is modifying a table?

--
Bruce Momjian <bruce@momjian.us>  http://momjian.us
EnterpriseDB  http://enterprisedb.com
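On servers with the pg_visibility extension (added to core in 9.6, well
after this thread), the unset-bit count Bruce wants autovacuum to
consult can be read from SQL; a sketch, again with mytable as a
placeholder:

    CREATE EXTENSION pg_visibility;

    -- pages with the all-visible bit set, straight from the visibility map
    SELECT all_visible FROM pg_visibility_map_summary('mytable');

    -- compare against the total page count
    SELECT relpages FROM pg_class WHERE relname = 'mytable';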
Bruce Momjian wrote:
> Ideas
> -----
>
> I think we need to detect tables that do not have VM bits set and try
> to determine if they should be vacuumed. If a table has most of its VM
> bits set, there is no need to vacuum it for VM bit setting.

I think it's shortsighted to keep thinking of autovacuum as just a way
to run VACUUM and ANALYZE. We have already discussed work items that
need to be done separately, such as truncating the last few empty pages
on a relation that was vacuumed recently. We also need to process a GIN
index's pending insertion list; and with minmax indexes I will want to
run summarization of heap page ranges.

So maybe instead of trying to think of VM bit setting as part of
vacuum, we could just keep stats about how many pages we might need to
scan because of possibly needing to set the bit, and then do that in
autovacuum, independently from actually vacuuming the relation.

I'm not sure if we need to expose all these new maintenance actions as
SQL commands.

--
Álvaro Herrera  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> I'm not sure if we need to expose all these new maintenance actions as
> SQL commands.

I strongly think we should, if only for diagnostic purposes. Also to
adapt to some well defined workloads that the automatic system is not
designed to handle.

Regards,
--
Dimitri Fontaine  http://2ndQuadrant.fr
PostgreSQL : Expertise, Formation et Support
On 09/06/2013 09:23 AM, Dimitri Fontaine wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> I'm not sure if we need to expose all these new maintenance actions as
>> SQL commands.
> I strongly think we should, if only for diagnostic purposes.

It would be much easier and more flexible to expose them as pg_*()
function calls, not proper "commands".

> Also to adapt to some well defined workloads that the automatic system
> is not designed to handle.

+1

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ
On 2013-09-06 01:22:36 -0400, Alvaro Herrera wrote:
> I think it's shortsighted to keep thinking of autovacuum as just a way
> to run VACUUM and ANALYZE. We have already discussed work items that
> need to be done separately, such as truncating the last few empty pages
> on a relation that was vacuumed recently. We also need to process a GIN
> index's pending insertion list; and with minmax indexes I will want to
> run summarization of heap page ranges.

Agreed.

> So maybe instead of trying to think of VM bit setting as part of
> vacuum, we could just keep stats about how many pages we might need to
> scan because of possibly needing to set the bit, and then do that in
> autovacuum, independently from actually vacuuming the relation.

I am not sure I understand this, though. What would be the point of
going and setting all-visible and not doing the rest of the vacuuming
work?

I think triggering vacuuming by scanning the visibility map for the
number of unset bits and using that as another trigger is a good idea.
The VM should ensure we're not doing superfluous work.

Greetings,

Andres Freund

--
Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2013-09-06 13:38:56 +0200, Hannu Krosing wrote:
> On 09/06/2013 09:23 AM, Dimitri Fontaine wrote:
>> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>>> I'm not sure if we need to expose all these new maintenance actions as
>>> SQL commands.
>> I strongly think we should, if only for diagnostic purposes.
> It would be much easier and more flexible to expose them as pg_*()
> function calls, not proper "commands".

I don't think that's as easy as you might imagine. For much of what's
done in that context you cannot be in a transaction; you even need to
be in a toplevel statement (since we internally
CommitTransactionCommand/StartTransactionCommand).

So those pg_* commands couldn't be called (except possibly via the
fastpath function call API ...), which might restrict their usefulness
a teensy bit ;)

So I think extending the options passed to VACUUM - since it can take
pretty generic options these days - is a more realistic path.

>> Also to adapt to some well defined workloads that the automatic system
>> is not designed to handle.
> +1

What would you like to expose individually?

Greetings,

Andres Freund

--
Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
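For reference, the parenthesized options list Andres mentions already
exists for VACUUM (since 9.0), so any new maintenance action could
syntactically ride along with it; the existing form looks like:

    VACUUM (ANALYZE, VERBOSE) mytable;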
On 09/06/2013 03:12 PM, Andres Freund wrote:
> On 2013-09-06 13:38:56 +0200, Hannu Krosing wrote:
>> It would be much easier and more flexible to expose them as pg_*()
>> function calls, not proper "commands".
>
> I don't think that's as easy as you might imagine. For much of what's
> done in that context you cannot be in a transaction; you even need to
> be in a toplevel statement (since we internally
> CommitTransactionCommand/StartTransactionCommand).
>
> So those pg_* commands couldn't be called (except possibly via the
> fastpath function call API ...), which might restrict their usefulness
> a teensy bit ;)
>
> So I think extending the options passed to VACUUM - since it can take
> pretty generic options these days - is a more realistic path.

Might be something convoluted like

    VACUUM indexname WITH (function = "pg_cleanup_gin($1)");

:)

>> Also to adapt to some well defined workloads that the automatic system
>> is not designed to handle.
>> +1
> What would you like to expose individually?
On Fri, Sep 6, 2013 at 03:08:54PM +0200, Andres Freund wrote:
> On 2013-09-06 01:22:36 -0400, Alvaro Herrera wrote:
>> I think it's shortsighted to keep thinking of autovacuum as just a way
>> to run VACUUM and ANALYZE. We have already discussed work items that
>> need to be done separately, such as truncating the last few empty pages
>> on a relation that was vacuumed recently. We also need to process a GIN
>> index's pending insertion list; and with minmax indexes I will want to
>> run summarization of heap page ranges.
>
> Agreed.
>
>> So maybe instead of trying to think of VM bit setting as part of
>> vacuum, we could just keep stats about how many pages we might need to
>> scan because of possibly needing to set the bit, and then do that in
>> autovacuum, independently from actually vacuuming the relation.
>
> I am not sure I understand this, though. What would be the point of
> going and setting all-visible and not doing the rest of the vacuuming
> work?
>
> I think triggering vacuuming by scanning the visibility map for the
> number of unset bits and using that as another trigger is a good idea.
> The VM should ensure we're not doing superfluous work.

Yes, I think it might be hard to justify a separate VM-set-only scan of
the table. If you are already reading the table, and already checking
to see if you can set the VM bit, I am not sure why you would not also
remove old rows, especially since removing those rows might be
necessary to allow setting VM bits.

Another problem I thought of is that while automatic vacuuming only
happens under high update/delete load, index-only scans are best on
mostly non-write tables, so we have bad behavior where the ideal case
(static data) doesn't get VM bits set, while update/delete workloads
get the VM bits set, but then cleared as more updates/deletes occur.

The more I look at this the worse it appears. How has this gone
unaddressed for over a year?

--
Bruce Momjian <bruce@momjian.us>  http://momjian.us
EnterpriseDB  http://enterprisedb.com
On 2013-09-06 12:30:56 -0400, Bruce Momjian wrote:
>>> I am not sure I understand this, though. What would be the point of
>>> going and setting all-visible and not doing the rest of the vacuuming
>>> work?
>>>
>>> I think triggering vacuuming by scanning the visibility map for the
>>> number of unset bits and using that as another trigger is a good idea.
>>> The VM should ensure we're not doing superfluous work.
>>
>> Yes, I think it might be hard to justify a separate VM-set-only scan of
>> the table. If you are already reading the table, and already checking
>> to see if you can set the VM bit, I am not sure why you would not also
>> remove old rows, especially since removing those rows might be
>> necessary to allow setting VM bits.

Yep. Although adding the table back into the fsm will lead to it being
used for new writes again...

>> Another problem I thought of is that while automatic vacuuming only
>> happens under high update/delete load, index-only scans are best on
>> mostly non-write tables, so we have bad behavior where the ideal case
>> (static data) doesn't get VM bits set, while update/delete workloads
>> get the VM bits set, but then cleared as more updates/deletes occur.

Well, older tables will get vacuumed due to vacuum_freeze_table_age. So
at some point they will get vacuumed and the VM bits will get set.

>> The more I look at this the worse it appears. How has this gone
>> unaddressed for over a year?

It's been discussed several times, including during the introduction of
the feature. I am a bit surprised by the panicky tone in this thread.

Yes, we need to overhaul the way vacuum works (to reduce the frequency
of rewriting stuff repeatedly) and the way it's triggered
(prioritization, more trigger conditions), but all these are known
things and "just" need somebody with time.

Greetings,

Andres Freund

--
Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Sep 6, 2013 at 06:36:47PM +0200, Andres Freund wrote:
> On 2013-09-06 12:30:56 -0400, Bruce Momjian wrote:
>>> I think triggering vacuuming by scanning the visibility map for the
>>> number of unset bits and using that as another trigger is a good idea.
>>> The VM should ensure we're not doing superfluous work.
>>
>> Yes, I think it might be hard to justify a separate VM-set-only scan of
>> the table. If you are already reading the table, and already checking
>> to see if you can set the VM bit, I am not sure why you would not also
>> remove old rows, especially since removing those rows might be
>> necessary to allow setting VM bits.
>
> Yep. Although adding the table back into the fsm will lead to it being
> used for new writes again...

You mean adding _pages_ back into the table's FSM? Yes, that is going
to cause those pages to get dirty, but it is better than expanding the
table size. I don't see why you would not update the FSM.

>> Another problem I thought of is that while automatic vacuuming only
>> happens under high update/delete load, index-only scans are best on
>> mostly non-write tables, so we have bad behavior where the ideal case
>> (static data) doesn't get VM bits set, while update/delete workloads
>> get the VM bits set, but then cleared as more updates/deletes occur.
>
> Well, older tables will get vacuumed due to vacuum_freeze_table_age. So
> at some point they will get vacuumed and the VM bits will get set.

Hmm, good point. That would help with an insert-only workload, as long
as you can chew through 200M transactions. That doesn't help with a
read-only workload, as we don't consume transaction IDs for SELECT.

>> The more I look at this the worse it appears. How has this gone
>> unaddressed for over a year?
>
> It's been discussed several times, including during the introduction of
> the feature. I am a bit surprised by the panicky tone in this thread.

This December 2012 thread by Andrew Dunstan shows he wasn't aware that a
manual VACUUM was required for index-only scans. That thread ended with
us realizing that pg_upgrade's ANALYZE runs will populate
pg_class.relallvisible.

What I didn't see in that thread is an analysis of what cases are going
to require manual vacuum, and I have seen no work in 9.3 to improve
that. I don't even see it on the TODO list.

It bothers me that we spent time developing index-only scans, but have
significant workloads where it doesn't work well, no efforts on
improving it, and no documentation on manual workarounds. I have not
even seen discussion of how we are going to improve this. I would like
to have that discussion now.

> Yes, we need to overhaul the way vacuum works (to reduce the frequency
> of rewriting stuff repeatedly) and the way it's triggered
> (prioritization, more trigger conditions), but all these are known
> things and "just" need somebody with time.

Based on the work needed to improve this, I would have thought someone
would have taken this on during 9.3 development.

--
Bruce Momjian <bruce@momjian.us>  http://momjian.us
EnterpriseDB  http://enterprisedb.com
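How soon the anti-wraparound rescue Andres describes will arrive for a
given table can be estimated from its xid age; a monitoring sketch
(autovacuum_freeze_max_age defaults to 200 million):

    -- tables closest to a forced anti-wraparound vacuum
    SELECT relname, age(relfrozenxid) AS xid_age
    FROM pg_class
    WHERE relkind = 'r'
    ORDER BY xid_age DESC
    LIMIT 10;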
On Fri, Sep 6, 2013 at 01:01:59PM -0400, Bruce Momjian wrote:
> This December 2012 thread by Andrew Dunstan shows he wasn't aware that a
> manual VACUUM was required for index-only scans. That thread ended with
> us realizing that pg_upgrade's ANALYZE runs will populate
> pg_class.relallvisible.
>
> What I didn't see in that thread is an analysis of what cases are going
> to require manual vacuum, and I have seen no work in 9.3 to improve
> that. I don't even see it on the TODO list.

OK, let's start the discussion then. I have added a TODO item:

	Improve setting of visibility map bits for read-only and
	insert-only workloads

So, what should trigger an autovacuum vacuum for these workloads?
Rather than activity, which is what normally drives autovacuum, it is
lack of activity that should drive it, combined with a high VM
cleared-bit percentage.

It seems we can use these statistics values:

	n_tup_ins           | bigint
	n_tup_upd           | bigint
	n_tup_del           | bigint
	n_tup_hot_upd       | bigint
	n_live_tup          | bigint
	n_dead_tup          | bigint
	n_mod_since_analyze | bigint
	last_vacuum         | timestamp with time zone
	last_autovacuum     | timestamp with time zone

Particularly last_vacuum and last_autovacuum can tell us the last time
of vacuum. If the n_tup_upd/n_tup_del counts are low, and the VM
set-bit count is low, it might need vacuuming, though inserts into
existing pages would complicate that.

--
Bruce Momjian <bruce@momjian.us>  http://momjian.us
EnterpriseDB  http://enterprisedb.com
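Those counters are enough to sketch the heuristic in SQL; the
thresholds below are arbitrary placeholders, not a proposal:

    -- insert-mostly tables that have never been vacuumed: likely to
    -- have few VM bits set, and so to be poor index-only-scan candidates
    SELECT s.relname, s.n_tup_ins, s.n_tup_upd, s.n_tup_del,
           s.last_vacuum, s.last_autovacuum,
           c.relallvisible, c.relpages
    FROM pg_stat_user_tables s
    JOIN pg_class c ON c.oid = s.relid
    WHERE s.n_tup_ins > 10000
      AND s.n_tup_upd + s.n_tup_del < s.n_tup_ins / 10
      AND s.last_vacuum IS NULL
      AND s.last_autovacuum IS NULL;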
On 9/5/13 8:29 PM, Gavin Flower wrote:
> How about a 'VACUUM AFTER' command (part of the 'BEGIN' transaction
> syntax?) that would:
>
> 1. only be valid in a transaction
> 2. initiate a vacuum after the current transaction completed
> 3. defer any vacuum triggered due to other criteria
>
> If the transaction was rolled back: then if there was a pending vacuum,
> due to other reasons, it would then be actioned.
>
> On normal transaction completion, then if there was a pending vacuum it
> would be combined with the one in the transaction.
>
> Still would need some method of ensuring any pending vacuum was done if
> the transaction hung, or took too long.

I *really* like the idea of BEGIN VACUUM AFTER, but I suspect it would
be of very limited usefulness if it didn't account for currently
running transactions.

I'm thinking we add a vacuum_after_xid field somewhere (pg_class), and
instead of attempting to vacuum inside the backend at commit time, the
transaction would set that field to its XID unless the field already
had a newer XID in it.

autovac would then add all tables where vacuum_after_xid < the oldest
running transaction to its priority list.

--
Jim C. Nasby, Data Architect  jim@nasby.net
512.569.9461 (cell)  http://jim.nasby.net
On 9/6/13 2:13 PM, Bruce Momjian wrote:
> OK, let's start the discussion then. I have added a TODO item:
>
> 	Improve setting of visibility map bits for read-only and
> 	insert-only workloads
>
> So, what should trigger an autovacuum vacuum for these workloads?
> Rather than activity, which is what normally drives autovacuum, it is
> lack of activity that should drive it, combined with a high VM
> cleared-bit percentage.
>
> Particularly last_vacuum and last_autovacuum can tell us the last time
> of vacuum. If the n_tup_upd/n_tup_del counts are low, and the VM
> set-bit count is low, it might need vacuuming, though inserts into
> existing pages would complicate that.

Something else that might be useful to look at is whether there are any
FSM entries or not. A true insert-only table shouldn't have any FSM.

That said, there's definitely another case to think about... tables
that see update activity on newly inserted rows but not on older rows.
A work queue that is not pruned would be an example of that:

	INSERT new work item
	UPDATE work item SET status = 'In process';
	UPDATE work item SET completion = '50%';
	UPDATE work item SET status = 'Complete', completion = '100%';

In this case I would expect most of the pages in the table (except the
very end) to be all-visible.

--
Jim C. Nasby, Data Architect  jim@nasby.net
512.569.9461 (cell)  http://jim.nasby.net
On 2013-09-06 13:01:59 -0400, Bruce Momjian wrote:
> On Fri, Sep 6, 2013 at 06:36:47PM +0200, Andres Freund wrote:
>> On 2013-09-06 12:30:56 -0400, Bruce Momjian wrote:
>>> Yes, I think it might be hard to justify a separate VM-set-only scan of
>>> the table. If you are already reading the table, and already checking
>>> to see if you can set the VM bit, I am not sure why you would not also
>>> remove old rows, especially since removing those rows might be
>>> necessary to allow setting VM bits.
>>
>> Yep. Although adding the table back into the fsm will lead to it being
>> used for new writes again...
>
> You mean adding _pages_ back into the table's FSM? Yes, that is going
> to cause those pages to get dirty, but it is better than expanding the
> table size. I don't see why you would not update the FSM.

You're right, we should add them; I wasn't really questioning that. I
was, quietly so you couldn't hear it, wondering whether we should
prioritize the target buffer selection differently.

>> Well, older tables will get vacuumed due to vacuum_freeze_table_age. So
>> at some point they will get vacuumed and the VM bits will get set.
>
> Hmm, good point. That would help with an insert-only workload, as long
> as you can chew through 200M transactions. That doesn't help with a
> read-only workload, as we don't consume transaction IDs for SELECT.

It's even 150 million. For the other workloads, it's pretty common
wisdom to VACUUM after bulk data loading. I think we even document
that.

>> It's been discussed several times, including during the introduction of
>> the feature. I am a bit surprised by the panicky tone in this thread.
>
> This December 2012 thread by Andrew Dunstan shows he wasn't aware that a
> manual VACUUM was required for index-only scans. That thread ended with
> us realizing that pg_upgrade's ANALYZE runs will populate
> pg_class.relallvisible.
>
> What I didn't see in that thread is an analysis of what cases are going
> to require manual vacuum, and I have seen no work in 9.3 to improve
> that. I don't even see it on the TODO list.

Yes, a TODO entry may be missing.

> It bothers me that we spent time developing index-only scans, but have
> significant workloads where it doesn't work well, no efforts on
> improving it, and no documentation on manual workarounds. I have not
> even seen discussion of how we are going to improve this. I would like
> to have that discussion now.

It's not like the feature is useless in this case. You just need to
perform an extra operation to activate it.
I am not saying we shouldn't document it better, but it seriously
worries me that a useful feature is depicted as useless because it
requires a manual VACUUM in some circumstances.

>> Yes, we need to overhaul the way vacuum works (to reduce the frequency
>> of rewriting stuff repeatedly) and the way it's triggered
>> (prioritization, more trigger conditions), but all these are known
>> things and "just" need somebody with time.
>
> Based on the work needed to improve this, I would have thought someone
> would have taken this on during 9.3 development.

There has been some discussion about it indirectly via the freezing
stuff. That also would require more "advanced" scheduling.

Greetings,

Andres Freund

--
Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2013-09-06 15:13:30 -0400, Bruce Momjian wrote:
> OK, let's start the discussion then. I have added a TODO item:
>
> 	Improve setting of visibility map bits for read-only and
> 	insert-only workloads
>
> So, what should trigger an autovacuum vacuum for these workloads?
> Rather than activity, which is what normally drives autovacuum, it is
> lack of activity that should drive it, combined with a high VM
> cleared-bit percentage.
>
> It seems we can use these statistics values:
>
> 	n_tup_ins           | bigint
> 	n_tup_upd           | bigint
> 	n_tup_del           | bigint
> 	n_tup_hot_upd       | bigint
> 	n_live_tup          | bigint
> 	n_dead_tup          | bigint
> 	n_mod_since_analyze | bigint
> 	last_vacuum         | timestamp with time zone
> 	last_autovacuum     | timestamp with time zone
>
> Particularly last_vacuum and last_autovacuum can tell us the last time
> of vacuum. If the n_tup_upd/n_tup_del counts are low, and the VM
> set-bit count is low, it might need vacuuming, though inserts into
> existing pages would complicate that.

I wonder if we shouldn't trigger most vacuums (not analyze!) via unset
fsm bits. Perhaps combined with keeping track of RecentGlobalXmin to
make sure we're not repeatedly checking for work that cannot yet be
done.

Greetings,

Andres Freund

--
Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Sat, Sep 7, 2013 at 12:26:23AM +0200, Andres Freund wrote:
>> So, what should trigger an autovacuum vacuum for these workloads?
>> Rather than activity, which is what normally drives autovacuum, it is
>> lack of activity that should drive it, combined with a high VM
>> cleared-bit percentage.
>>
>> Particularly last_vacuum and last_autovacuum can tell us the last time
>> of vacuum. If the n_tup_upd/n_tup_del counts are low, and the VM
>> set-bit count is low, it might need vacuuming, though inserts into
>> existing pages would complicate that.
>
> I wonder if we shouldn't trigger most vacuums (not analyze!) via unset
> fsm bits. Perhaps combined with keeping track of RecentGlobalXmin to

FSM bits? The FSM tracks the free space on each page. How does that
help?

> make sure we're not repeatedly checking for work that cannot yet be
> done.

The idea of using RecentGlobalXmin to see how much _work_ has happened
since the last vacuum is interesting, but it doesn't handle read-only
transactions; I am not sure how they can be tracked. You make a good
point that 5 minutes passing is meaningless --- you really want to know
how many transactions have completed. Unfortunately, our virtual
transactions make that hard to compute.

--
Bruce Momjian <bruce@momjian.us>  http://momjian.us
EnterpriseDB  http://enterprisedb.com
On 2013-09-06 20:29:08 -0400, Bruce Momjian wrote:
> On Sat, Sep 7, 2013 at 12:26:23AM +0200, Andres Freund wrote:
>> I wonder if we shouldn't trigger most vacuums (not analyze!) via unset
>> fsm bits. Perhaps combined with keeping track of RecentGlobalXmin to
>
> FSM bits? The FSM tracks the free space on each page. How does that
> help?

Err. I was way too tired when I wrote that. VM bits.

>> make sure we're not repeatedly checking for work that cannot yet be
>> done.
>
> The idea of using RecentGlobalXmin to see how much _work_ has happened
> since the last vacuum is interesting, but it doesn't handle read-only
> transactions; I am not sure how they can be tracked. You make a good
> point that 5 minutes passing is meaningless --- you really want to know
> how many transactions have completed.

So, what I was pondering went in a slightly different direction:

(let's ignore anti-wraparound vacuum for now)

Currently we trigger autovacuums by the assumed number of dead tuples.
In the course of its action it usually will find that it cannot remove
all dead rows and that it cannot mark everything as all-visible. That's
because the xmin horizon hasn't advanced far enough. We won't trigger
another vacuum after that unless there are further dead tuples in the
relation...

One trick, if we want to overcome that problem and the fact that we do
not handle setting all-visible nicely for INSERT-only workloads, would
be to trigger vacuum by the number of pages that are not marked
all-visible in the VM.

The problem there is that repeatedly scanning a relation that's only
50% visible, where the rest cannot be marked all-visible because of a
long-running pg_dump, obviously isn't a good idea. So we need something
to notify us when there's work to be done. Using elapsed time seems
like a bad idea because it doesn't adapt to changing workloads very
well and doesn't work nicely for different relations.

What I was thinking of was to keep track of the oldest xids on pages
that cannot be marked all-visible. I haven't thought about the
statistics part much, but what if we binned the space between
[RecentGlobalXmin, ->nextXid) into 10 bins and counted the number of
pages falling into each bin? Then, after the vacuum finished, we could
compute how far RecentGlobalXmin would have to progress to make another
vacuum worthwhile by counting the number of pages from the lowest bin
upwards and using the bin's upper limit as the triggering xid.

Now, we'd definitely need to amend that scheme by something that
handles pages that are newly written to, but it seems something like
that wouldn't be too hard to implement and would make autovacuum more
useful.

> Unfortunately, our virtual transactions make that hard to compute.

I don't think they pose too much of a complexity. We basically only
have to care about PGXACT->xmin here, and virtual transactions don't
change the handling of that...

Greetings,

Andres Freund

--
Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Sat, Sep 7, 2013 at 07:34:49AM +0200, Andres Freund wrote:
>> The idea of using RecentGlobalXmin to see how much _work_ has happened
>> since the last vacuum is interesting, but it doesn't handle read-only
>> transactions; I am not sure how they can be tracked. You make a good
>> point that 5 minutes passing is meaningless --- you really want to know
>> how many transactions have completed.
>
> So, what I was pondering went in a slightly different direction:
>
> (let's ignore anti-wraparound vacuum for now)
>
> Currently we trigger autovacuums by the assumed number of dead tuples.
> In the course of its action it usually will find that it cannot remove
> all dead rows and that it cannot mark everything as all-visible. That's
> because the xmin horizon hasn't advanced far enough. We won't trigger
> another vacuum after that unless there are further dead tuples in the
> relation...
>
> One trick, if we want to overcome that problem and the fact that we do
> not handle setting all-visible nicely for INSERT-only workloads, would
> be to trigger vacuum by the number of pages that are not marked
> all-visible in the VM.
>
> The problem there is that repeatedly scanning a relation that's only
> 50% visible, where the rest cannot be marked all-visible because of a
> long-running pg_dump, obviously isn't a good idea. So we need something
> to notify us when there's work to be done. Using elapsed time seems
> like a bad idea because it doesn't adapt to changing workloads very
> well and doesn't work nicely for different relations.
>
> What I was thinking of was to keep track of the oldest xids on pages
> that cannot be marked all-visible. I haven't thought about the
> statistics part much, but what if we binned the space between
> [RecentGlobalXmin, ->nextXid) into 10 bins and counted the number of
> pages falling into each bin? Then, after the vacuum finished, we could
> compute how far RecentGlobalXmin would have to progress to make another
> vacuum worthwhile by counting the number of pages from the lowest bin
> upwards and using the bin's upper limit as the triggering xid.
>
> Now, we'd definitely need to amend that scheme by something that
> handles pages that are newly written to, but it seems something like
> that wouldn't be too hard to implement and would make autovacuum more
> useful.

That seems very complicated. I think it would be enough to record the
current xid at the time of the vacuum, and when testing for later
vacuums, if that saved xid is earlier than RecentGlobalXmin and there
have been no inserts/updates/deletes, we know that all of the pages can
now be marked as all-visible.

What this doesn't handle is the insert case. What we could do there is
to record the total free space map space, and if the FSM has not
changed since the last vacuum, we can even vacuum if inserts happened
in that period, because we assume the inserts are on new pages. One
problem there is that the FSM is only updated if an insert will not fit
on the page. We could record the table size and make sure the table
size has increased before we allow inserts to trigger a VM-set vacuum.

None of this is perfect, but it is better than what we have, and it
would eventually get the VM bits set.

--
Bruce Momjian <bruce@momjian.us>  http://momjian.us
EnterpriseDB  http://enterprisedb.com
Hi,

On 2013-09-07 12:50:59 -0400, Bruce Momjian wrote:
> That seems very complicated. I think it would be enough to record the
> current xid at the time of the vacuum, and when testing for later
> vacuums, if that saved xid is earlier than RecentGlobalXmin and there
> have been no inserts/updates/deletes, we know that all of the pages can
> now be marked as all-visible.

But that would constantly trigger vacuums, or am I missing something?
Or what are you suggesting this xid be used for?

What I was talking about was how to evaluate the benefit of triggering
a VACUUM even if there's not a significant amount of new dead rows. If
we know that for a certain xmin horizon there are N pages that
potentially can be cleaned and marked all-visible, we have a chance of
making sensible decisions.

We could just use one bin (i.e. use one cutoff xid, as you propose) and
count the number of pages that would be affected. But that would mean
we'd only trigger vacuums very irregularly if you have a workload with
several long-running transactions. When the oldest of a set of
long-running transactions finishes, you possibly can already clean up a
good bit, reducing the chance of further bloat. Otherwise you have to
wait for all of them to finish.

> What this doesn't handle is the insert case. What we could do there is
> to record the total free space map space, and if the FSM has not
> changed since the last vacuum, we can even vacuum if inserts happened
> in that period, because we assume the inserts are on new pages. One
> problem there is that the FSM is only updated if an insert will not fit
> on the page. We could record the table size and make sure the table
> size has increased before we allow inserts to trigger a VM-set vacuum.

Not sure why that's better than just counting the number of pages that
have unset VM bits? Note that you cannot rely on the FSM data to be
correct all the time; we can only use such tricks to trigger vacuums,
not for the actual operation of the vacuum.

Greetings,

Andres Freund

--
Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Sep 5, 2013 at 7:00 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Thu, Sep 5, 2013 at 09:10:06PM -0400, Robert Haas wrote:
>> On Thu, Sep 5, 2013 at 8:14 PM, Bruce Momjian <bruce@momjian.us> wrote:
>>> Actually, I now realize it is more complex than that, and worse. There
>>> are several questions to study to understand when pg_class.relallvisible
>>> is updated (which is used to determine if index-only scans are a good
>>> optimization choice), and when VM all-visible bits are set so heap pages
>>> can be skipped during index-only scans:
>>>
>>> 1) When are VM bits set:
>>> 	vacuum (non-full)
>>> 	analyze (only some random pages)
>>
>> Analyze doesn't set visibility-map bits. It only updates statistics
>> about how many are set.
>
> Sorry, yes you are correct.
>
>>> The calculus we should use to determine when we need to run vacuum has
>>> changed with index-only scans, and I am not sure we ever fully addressed
>>> this.
>>
>> Yeah, we didn't. I think the hard part is figuring out what behavior
>> would be best. Counting inserts as well as updates and deletes would
>> be a simple approach, but I don't have much confidence in it. My
>> experience is that having vacuum or analyze kick in during a bulk-load
>> operation is a disaster. We'd kinda like to come up with a way to
>> make vacuum run after the bulk load is complete, maybe, but how would
>> we identify that time, and there are probably cases where that's not
>> right either.
>
> I am unsure how we have gone a year with index-only scans and I am just
> now learning that it only works well with update/delete workloads or by
> running vacuum manually. I only found this out going back over January
> emails. Did other people know this? Was it not considered a serious
> problem?

I thought it was well known, but maybe I was overly optimistic. I've
considered IOS to be mostly useful for data mining work on read-mostly
tables, which you would probably vacuum manually after a bulk load.

For transactional tables, I think that trying to keep the VM set-bit
density high enough would be a losing battle. If we redefined the
nature of the VM so that doing a HOT update would not clear the
visibility bit, perhaps that would change the outcome of this battle.

Cheers,

Jeff
On Mon, Sep 9, 2013 at 2:35 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Thu, Sep 5, 2013 at 7:00 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> I am unsure how we have gone a year with index-only scans and I am just
>> now learning that it only works well with update/delete workloads or by
>> running vacuum manually. I only found this out going back over January
>> emails. Did other people know this? Was it not considered a serious
>> problem?
>
> I thought it was well known, but maybe I was overly optimistic. I've
> considered IOS to be mostly useful for data mining work on read-mostly
> tables, which you would probably vacuum manually after a bulk load.
>
> For transactional tables, I think that trying to keep the VM set-bit
> density high enough would be a losing battle. If we redefined the
> nature of the VM so that doing a HOT update would not clear the
> visibility bit, perhaps that would change the outcome of this battle.

Wouldn't that make vacuum a bit inefficient, in the sense that it will
skip some of the pages in which there are only HOT updates when
cleaning dead rows?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Sun, Sep 8, 2013 at 8:49 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, Sep 9, 2013 at 2:35 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> I thought it was well known, but maybe I was overly optimistic. I've
>> considered IOS to be mostly useful for data mining work on read-mostly
>> tables, which you would probably vacuum manually after a bulk load.
>>
>> For transactional tables, I think that trying to keep the VM set-bit
>> density high enough would be a losing battle. If we redefined the
>> nature of the VM so that doing a HOT update would not clear the
>> visibility bit, perhaps that would change the outcome of this battle.
>
> Wouldn't that make vacuum a bit inefficient, in the sense that it will
> skip some of the pages in which there are only HOT updates when
> cleaning dead rows?

Maybe. But anyone is competent to clean up dead rows from HOT updates;
it is not exclusively vacuum that can do it, like it is for non-HOT
tuples. So I think any inefficiency would be very small.

Cheers,

Jeff
On Sun, Sep 8, 2013 at 12:47:35AM +0200, Andres Freund wrote: > Hi, > > On 2013-09-07 12:50:59 -0400, Bruce Momjian wrote: > > That seems very complicated. I think it would be enough to record the > > current xid at the time of the vacuum, and when testing for later > > vacuums, if that saved xid is earlier than the RecentGlobalXmin, and > > there have been no inserts/updates/deletes, we know that all of > > the pages can now be marked as allvisible. > > But that would constantly trigger vacuums, or am I missing something? Or > what are you suggesting this xid to be used for? OK, let me give some specifics. Let's suppose we run a vacuum, and at the time the current xid counter is 200. If we later have autovacuum check if it should vacuum, and there have been no dead rows generated (no update/delete/abort), if the current RecentGlobalXmin is >200, then we know that all the transactions that prevented all-visible marking the last time we ran vacuum have completed. That leaves us with just inserts that could prevent all-visible. If there have been no inserts, we can assume that we can vacuum just the non-all-visible pages, and even if there are only 10, it just means we have to read 10 8k blocks, not the entire table, because the all-visible is set for all the rest of the pages. Now, if there have been inserts, there are a few cases. If the inserts happened in pages that were previously marked all-visible, then we now have pages that lost all-visible, and we probably don't want to vacuum those. Of course, we will not have recorded which pages changed, but any decrease in the all-visible table count perhaps should have us avoiding vacuum just to set the visibility map. We should probably update our stored vm bit-set count and current xid value so we can check again later to see if things have stabilized. If the vm-set bit count is the same as the last time autovacuum checked the table, then the inserts happened either in the vm-bit cleared pages, or in new data pages. If the table size is the same, the inserts happened in existing pages, so we probably don't want to vacuum. If the table size has increased, some inserts went into new pages, so we might want to vacuum, but I am unclear how many new pages should force a vacuum. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Mon, Sep 9, 2013 at 9:33 PM, Jeff Janes <jeff.janes@gmail.com> wrote: > On Sun, Sep 8, 2013 at 8:49 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: >> On Mon, Sep 9, 2013 at 2:35 AM, Jeff Janes <jeff.janes@gmail.com> wrote: >>> I thought it was well known, but maybe I was overly optimistic. I've >>> considered IOS to be mostly useful for data mining work on read-mostly >>> tables, which you would probably vacuum manually after a bulk load. >>> >>> For transactional tables, I think that trying to keep the vm set-bit >>> density high enough would be a losing battle. If we redefined the >>> nature of the vm so that doing a HOT update would not clear the >>> visibility bit, perhaps that would change the outcome of this battle. >> >> Wouldn't that make vacuum a bit inefficient, in the sense that it will >> skip some of the pages in which there are only >> HOT updates when cleaning dead rows? > > Maybe. But anyone is competent to clean up dead rows from HOT > updates, it is not exclusively vacuum that can do it, like it is for > non-HOT tuples. Yes, that is right, but how about freezing of tuples? Delaying that also might not be good. Also, it might not be good in all kinds of scenarios for foreground operations to always take care of cleaning up dead rows, leaving very little chance for Vacuum (only when it has to scan all pages, aka anti-wraparound vacuum) to clean up dead rows. If we are sure that Vacuum skipping pages in a database where there are few non-HOT updates and deletes (or mostly inserts and HOT updates) does not have any significant impact, then it can be quite useful for IOS. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On 9/7/13 12:34 AM, Andres Freund wrote: > What I was thinking of was to keep track of the oldest xids on pages > that cannot be marked all visible. I haven't thought about the > statistics part much, but what if we binned the space between > [RecentGlobalXmin, ->nextXid) into 10 bins and counted the number of > pages falling into each bin. Then after the vacuum finished we could > compute how far RecentGlobalXmin would have to progress to make another > vacuum worthwhile by counting the number of pages from the lowest bin > upwards and use the bin's upper limit as the triggering xid. > > Now, we'd definitely need to amend that scheme by something that handles > pages that are newly written to, but it seems something like that > wouldn't be too hard to implement and would make autovacuum more useful. If we're binning by XID though you're still dependent on scanning to build that range. Anything that creates dead tuples will also be problematic, because it's going to unset VM bits on you, and you won't know if it's due to INSERTS or dead tuples. What if we maintained XID stats for ranges of pages in a separate fork? Call it the XidStats fork. Presumably the interesting pieces would be min(xmin) and max(xmax) for pages that aren't all visible. If we did that at a granularity of, say, 1MB worth of pages[1] we're talking 8 bytes per MB, or 1 XidStats page per GB of heap. (Worst case alignment bumps that up to 2 XidStats pages per GB of heap.) Having both min(xmin) and max(xmax) for a range of pages would allow for very granular operation of vacuum. Instead of hitting every heap page that's not all-visible, it would only hit those that are not visible and where min(xmin) or max(xmax) were less than RecentGlobalXmin. One concern is maintaining this data. A key point is that we don't have to update it every time it changes; if the min/max are only off by a few hundred XIDs there's no point to updating the XidStats page. We'd obviously need the XidStats page to be read in, but even a 100GB heap would be either 100 or 200 XidStats pages. [1]: There's a trade-off between how much space we 'waste' on XidStats pages and how many heap pages we potentially have to scan in the range. We'd want to see what this looked like in a real system. The thing that helps here is that regardless of what the stats for a particular heap range are, you're not going to scan any pages in that range that are already all-visible. -- Jim C. Nasby, Data Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
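Jim's space estimate can be sanity-checked with simple arithmetic; a sketch assuming 8kB pages and two 4-byte XIDs per 1MB range (the assumptions implicit in the message above):

    SELECT (1024 * 1024) / 8192 AS pages_per_1mb_range,    -- 128
           2 * 4                AS bytes_per_range,        -- min(xmin) + max(xmax)
           1024 * (2 * 4)       AS xidstats_bytes_per_gb;  -- 8192, i.e. one 8kB page per GB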
On 2013-09-17 11:37:35 -0500, Jim Nasby wrote: > On 9/7/13 12:34 AM, Andres Freund wrote: > >What I was thinking of was to keep track of the oldest xids on pages > >that cannot be marked all visible. I haven't thought about the > >statistics part much, but what if we binned the space between > >[RecentGlobalXmin, ->nextXid) into 10 bins and counted the number of > >pages falling into each bin. Then after the vacuum finished we could > >compute how far RecentGlobalXmin would have to progress to make another > >vacuum worthwhile by counting the number of pages from the lowest bin > >upwards and use the bin's upper limit as the triggering xid. > > > >Now, we'd definitely need to amend that scheme by something that handles > >pages that are newly written to, but it seems something like that > >wouldn't be too hard to implement and would make autovacuum more useful. > > If we're binning by XID though you're still dependent on scanning to > build that range. Anything that creates dead tuples will also be > problematic, because it's going to unset VM bits on you, and you won't > know if it's due to INSERTS or dead tuples. I don't think that's all that much of a problem. In the end, it's a good idea to look at pages shortly after they have been filled/been touched. Setting hint bits at that point avoids repetitive IO and in many cases we will already be able to mark them all-visible. The binning idea was really about sensibly estimating whether a new scan already makes sense, which is currently very hard to judge. I generally think the current logic for triggering VACUUMs via autovacuum doesn't really make all that much sense in the days where we have the visibility map. > What if we maintained XID stats for ranges of pages in a separate > fork? Call it the XidStats fork. Presumably the interesting pieces > would be min(xmin) and max(xmax) for pages that aren't all visible. If > we did that at a granularity of, say, 1MB worth of pages[1] we're > talking 8 bytes per MB, or 1 XidStats page per GB of heap. (Worst case > alignment bumps that up to 2 XidStats pages per GB of heap.) Yes, I have thought about similar ideas as well, but I came to the conclusion that it's not worth it. If you want to make the boundaries precise and the xidstats fork small, you're introducing new contention points because every DML will need to make sure it's correct. Also, the amount of code that would require seems to be bigger than justified by the increase of precision when to vacuum. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 9/17/13 6:10 PM, Andres Freund wrote: >> What if we maintained XID stats for ranges of pages in a separate >> >fork? Call it the XidStats fork. Presumably the interesting pieces >> >would be min(xmin) and max(xmax) for pages that aren't all visible. If >> >we did that at a granularity of, say, 1MB worth of pages[1] we're >> >talking 8 bytes per MB, or 1 XidStats page per GB of heap. (Worst case >> >alignment bumps that up to 2 XidStats pages per GB of heap.) > Yes, I have thought about similar ideas as well, but I came to the > conclusion that it's not worth it. If you want to make the boundaries > precise and the xidstats fork small, you're introducing new contention > points because every DML will need to make sure it's correct. Actually, that's not true... the XidStats only need to be "relatively" precise. IE: within a few hundred or thousand XIDs. So for example, you'd only need to attempt an update if the XID already stored was more than a few hundred/thousand/whatever XIDs away from your XID. If it's any closer don't even bother to update. That still leaves potential for thundering herd on the fork buffer lock if you've got a ton of DML on one table across a bunch of backends, but there might be other ways around that. For example, if you know you can update the XID with a CPU-atomic instruction, you don't need to lock the page. > Also, the amount of code that would require seems to be bigger than > justified by the increase of precision when to vacuum. That's very possibly true. I haven't had a chance to see how much VM bits help reduce vacuum overhead yet, so I don't have anything to add on this front. Perhaps others might. -- Jim C. Nasby, Data Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
On Tue, Sep 17, 2013 at 7:10 PM, Andres Freund <andres@2ndquadrant.com> wrote: > I generally think the current logic for triggering VACUUMs via > autovacuum doesn't really make all that much sense in the days where we > have the visibility map. Right now, whether or not to autovacuum is the result of a two-pronged test. The first prong is based on number of updates and deletes relative to table size; that triggers a regular autovacuum. The second prong is based on age(relfrozenxid) and triggers a non-page-skipping vacuum (colloquially, an anti-wraparound vacuum). The typical case in which this doesn't work out well is when the table has a lot of inserts but few or no updates and deletes. So I propose that we change the first prong to count inserts as well as updates and deletes when deciding whether it needs to vacuum the table. We already use that calculation to decide whether to auto-analyze, so it wouldn't be very novel. We know that the work of marking pages all-visible will need to be done at some point, and doing it sooner will result in doing it in smaller batches, which seems generally good. However, I do have one concern: it might lead to excessive index-vacuuming. Right now, we skip the index vac step only if ZERO dead tuples are found during the heap scan. Even one dead tuple (or line pointer) will cause an index vac cycle, which may easily be excessive. So I further propose that we introduce a threshold for index-vac; so that we only do an index vac cycle if the number of dead tuples exceeds, say 0.1% of the table size. Thoughts? Let the hurling of rotten tomatoes begin. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
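For reference, a sketch of the existing autovacuum arithmetic Robert is building on; the formulas reflect the documented behavior of the era, and the table name and values below are illustrative only:

    -- Prong one fires when dead tuples pass a size-scaled threshold:
    --   n_dead_tup > autovacuum_vacuum_threshold
    --               + autovacuum_vacuum_scale_factor * reltuples
    -- Auto-analyze already counts inserts in its analogous formula.
    -- Both are tunable per table:
    ALTER TABLE measurements
      SET (autovacuum_vacuum_scale_factor = 0.05,
           autovacuum_vacuum_threshold = 1000);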
Robert Haas <robertmhaas@gmail.com> wrote: > Right now, whether or not to autovacuum is the result of a two-pronged > test. The first prong is based on number of updates and deletes > relative to table size; that triggers a regular autovacuum. The > second prong is based on age(relfrozenxid) and triggers a > non-page-skipping vacuum (colloquially, an anti-wraparound vacuum). > > The typical case in which this doesn't work out well is when the table > has a lot of inserts but few or no updates and deletes. So I propose > that we change the first prong to count inserts as well as updates and > deletes when deciding whether it needs to vacuum the table. We > already use that calculation to decide whether to auto-analyze, so it > wouldn't be very novel. We know that the work of marking pages > all-visible will need to be done at some point, and doing it sooner > will result in doing it in smaller batches, which seems generally > good. > > However, I do have one concern: it might lead to excessive > index-vacuuming. Right now, we skip the index vac step only if > ZERO dead tuples are found during the heap scan. Even one dead tuple > (or line pointer) will cause an index vac cycle, which may easily be > excessive. So I further propose that we introduce a threshold for > index-vac; so that we only do an index vac cycle if the number of dead > tuples exceeds, say 0.1% of the table size. +1 I've been thinking of suggesting something along the same lines, for the same reasons. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2013-09-19 14:39:43 -0400, Robert Haas wrote: > On Tue, Sep 17, 2013 at 7:10 PM, Andres Freund <andres@2ndquadrant.com> wrote: > > I generally think the current logic for triggering VACUUMs via > > autovacuum doesn't really make all that much sense in the days where we > > have the visibility map. > > Right now, whether or not to autovacuum is the result of a two-pronged > test. The first prong is based on number of updates and deletes > relative to table size; that triggers a regular autovacuum. The > second prong is based on age(relfrozenxid) and triggers a > non-page-skipping vacuum (colloquially, an anti-wraparound vacuum). And I have some hopes we can get rid of that in 9.4 (that alone would be worth a bump to 10.0 ;)). I really like Heikki's patch, even if I am envious that I didn't have the idea :P. Although it needs quite a bit of work to be ready. > The typical case in which this doesn't work out well is when the table > has a lot of inserts but few or no updates and deletes. So I propose > that we change the first prong to count inserts as well as updates and > deletes when deciding whether it needs to vacuum the table. We > already use that calculation to decide whether to auto-analyze, so it > wouldn't be very novel. We know that the work of marking pages > all-visible will need to be done at some point, and doing it sooner > will result in doing it in smaller batches, which seems generally > good. Yes, that's a desperately needed change. The reason I suggested keeping track of the xids of unremovable tuples is that the current logic doesn't handle that at all. We just unconditionally set n_dead_tuples to zero after a vacuum even if not a single row could actually be cleaned out. Which has the effect that we will not start a vacuum until enough bloat (or after changing this, new inserts) has collected to start vacuum anew. Which then will do twice the work. Resetting n_dead_tuples to the actual remaining dead tuples wouldn't do much good either - we would just immediately trigger a new vacuum the next time we check, even if the xmin horizon is still the same. > However, I do have one concern: it might lead to excessive > index-vacuuming. Right now, we skip the index vac step only if > ZERO dead tuples are found during the heap scan. Even one dead tuple > (or line pointer) will cause an index vac cycle, which may easily be > excessive. So I further propose that we introduce a threshold for > index-vac; so that we only do an index vac cycle if the number of dead > tuples exceeds, say 0.1% of the table size. Yes, that's a pretty valid concern. But we can't really do it that easily. a) We can only remove dead line pointers when we know there's no index entry pointing to them anymore, which we only know after the index entries have been removed. b) We cannot check the validity of an index pointer if there's no heap tuple for it. Sure, we could check whether we're pointing to a dead line pointer, but the random io costs of that are prohibitive. Now, we could just mark line pointers as dead and not mark that page as all-visible and pick it up again on the next vacuum cycle. But that would suck long-term. I think the only real solution here is to store the removed tuples' tids (i.e. items we've marked as dead) somewhere. Whenever we've found sufficient tuples to be removed from indexes, we do phase 2. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
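The counter Andres mentions is visible from SQL; one way to watch it get reset by a completed vacuum cycle (table name hypothetical):

    -- n_dead_tup is the statistics estimate feeding the autovacuum trigger;
    -- it is zeroed when a vacuum of the table finishes, even if the xmin
    -- horizon prevented removing anything.
    SELECT relname, n_dead_tup, last_vacuum, last_autovacuum
    FROM pg_stat_user_tables
    WHERE relname = 'measurements';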
On Thu, Sep 19, 2013 at 6:59 PM, Andres Freund <andres@2ndquadrant.com> wrote: > The reason I suggested keeping track of the xids of unremovable tuples > is that the current logic doesn't handle that at all. We just > unconditionally set n_dead_tuples to zero after a vacuum even if not a > single row could actually be cleaned out. Which has the effect that we > will not start a vacuum until enough bloat (or after changing this, new > inserts) has collected to start vacuum anew. Which then will do twice > the work. > > Resetting n_dead_tuples to the actual remaining dead tuples wouldn't do > much good either - we would just immediately trigger a new vacuum the > next time we check, even if the xmin horizon is still the same. One idea would be to store the xmin we used for the vacuum somewhere. Could we make that part of the pgstats infrastructure? Or store it in a new pg_class column? Then we could avoid re-triggering until it advances. Or, maybe better, we could remember the oldest XID that we weren't able to remove due to xmin considerations and re-trigger when the horizon passes that point. >> However, I do have one concern: it might lead to excessive >> index-vacuuming. Right now, we skip the index vac step only if >> ZERO dead tuples are found during the heap scan. Even one dead tuple >> (or line pointer) will cause an index vac cycle, which may easily be >> excessive. So I further propose that we introduce a threshold for >> index-vac; so that we only do an index vac cycle if the number of dead >> tuples exceeds, say 0.1% of the table size. > > Yes, that's a pretty valid concern. But we can't really do it that > easily. a) We can only remove dead line pointers when we know there's no > index entry pointing to them anymore, which we only know after the index > entries have been removed. b) We cannot check the validity of an index pointer if > there's no heap tuple for it. Sure, we could check whether we're > pointing to a dead line pointer, but the random io costs of that are > prohibitive. > Now, we could just mark line pointers as dead and not mark that page as > all-visible and pick it up again on the next vacuum cycle. But that > would suck long-term. > > I think the only real solution here is to store the removed tuples' tids > (i.e. items we've marked as dead) somewhere. Whenever we've found > sufficient tuples to be removed from indexes, we do phase 2. I don't really agree with that. Yes, we could make that change, and yes, it might be better than what we're doing today, but it would be complex and have its own costs. And it doesn't mean that lesser steps are without merit. A vacuum pass over the heap buys us a LOT of space for reuse even without touching the indexes: we don't reclaim the line pointers, but we do reclaim the space for the tuples themselves, which is a big deal. So being able to do that more frequently without causing problems has a lot of value, I think. The fact that we get to set all-visible bits along the way makes future vacuums cheaper, and makes index scans work better, so that's good too. And the first vacuum to find a dead tuple will dirty the page to truncate it to a dead line pointer, while any subsequent revisits prior to the index vac cycle will only examine the page without dirtying it. All in all, just leaving the page to be caught by a future vacuum doesn't seem that bad to me, at least for a first cut. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2013-09-20 11:30:26 -0400, Robert Haas wrote: > On Thu, Sep 19, 2013 at 6:59 PM, Andres Freund <andres@2ndquadrant.com> wrote: > > The reason I suggested keeping track of the xids of unremovable tuples > > is that the current logic doesn't handle that at all. We just > > unconditionally set n_dead_tuples to zero after a vacuum even if not a > > single row could actually be cleaned out. Which has the effect that we > > will not start a vacuum until enough bloat (or after changing this, new > > inserts) has collected to start vacuum anew. Which then will do twice > > the work. > > > > Resetting n_dead_tuples to the actual remaining dead tuples wouldn't do > > much good either - we would just immediately trigger a new vacuum the > > next time we check, even if the xmin horizon is still the same. > > One idea would be to store the xmin we used for the vacuum somewhere. > Could we make that part of the pgstats infrastructure? Or store it in > a new pg_class column? Then we could avoid re-triggering until it > advances. Or, maybe better, we could remember the oldest XID that we > weren't able to remove due to xmin considerations and re-trigger when > the horizon passes that point. I suggested a slightly more complex variant of this upthread: http://archives.postgresql.org/message-id/20130907053449.GE626072%40alap2.anarazel.de Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Sep 20, 2013 at 11:51 AM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2013-09-20 11:30:26 -0400, Robert Haas wrote: >> On Thu, Sep 19, 2013 at 6:59 PM, Andres Freund <andres@2ndquadrant.com> wrote: >> > The reason I suggested keeping track of the xids of unremovable tuples >> > is that the current logic doesn't handle that at all. We just >> > unconditionally set n_dead_tuples to zero after a vacuum even if not a >> > single row could actually be cleaned out. Which has the effect that we >> > will not start a vacuum until enough bloat (or after changing this, new >> > inserts) has collected to start vacuum anew. Which then will do twice >> > the work. >> > >> > Resetting n_dead_tuples to the actual remaining dead tuples wouldn't do >> > much good either - we would just immediately trigger a new vacuum the >> > next time we check, even if the xmin horizon is still the same. >> >> One idea would be to store the xmin we used for the vacuum somewhere. >> Could we make that part of the pgstats infrastructure? Or store it in >> a new pg_class column? Then we could avoid re-triggering until it >> advances. Or, maybe better, we could remember the oldest XID that we >> weren't able to remove due to xmin considerations and re-trigger when >> the horizon passes that point. > > I suggested a slightly more complex variant of this upthread: > http://archives.postgresql.org/message-id/20130907053449.GE626072%40alap2.anarazel.de Ah, yeah. Sorry, I forgot about that. Personally, I'd try the simpler version first. But I think whoever takes the time to implement this will probably get to pick. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Sep 19, 2013 at 02:39:43PM -0400, Robert Haas wrote: > Right now, whether or not to autovacuum is the result of a two-pronged > test. The first prong is based on number of updates and deletes > relative to table size; that triggers a regular autovacuum. The > second prong is based on age(relfrozenxid) and triggers a > non-page-skipping vacuum (colloquially, an anti-wraparound vacuum). > > The typical case in which this doesn't work out well is when the table > has a lot of inserts but few or no updates and deletes. So I propose > that we change the first prong to count inserts as well as updates and > deletes when deciding whether it needs to vacuum the table. We > already use that calculation to decide whether to auto-analyze, so it > wouldn't be very novel. We know that the work of marking pages > all-visible will need to be done at some point, and doing it sooner > will result in doing it in smaller batches, which seems generally > good. > > However, I do have one concern: it might lead to excessive > index-vacuuming. Right now, we skip the index vac step only if > ZERO dead tuples are found during the heap scan. Even one dead tuple > (or line pointer) will cause an index vac cycle, which may easily be > excessive. So I further propose that we introduce a threshold for > index-vac; so that we only do an index vac cycle if the number of dead > tuples exceeds, say 0.1% of the table size. > > Thoughts? Let the hurling of rotten tomatoes begin. Robert, where are we on this? Should I post a patch? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On Fri, Jan 31, 2014 at 10:22 PM, Bruce Momjian <bruce@momjian.us> wrote: > On Thu, Sep 19, 2013 at 02:39:43PM -0400, Robert Haas wrote: >> Right now, whether or not to autovacuum is the result of a two-pronged >> test. The first prong is based on number of updates and deletes >> relative to table size; that triggers a regular autovacuum. The >> second prong is based on age(relfrozenxid) and triggers a >> non-page-skipping vacuum (colloquially, an anti-wraparound vacuum). >> >> The typical case in which this doesn't work out well is when the table >> has a lot of inserts but few or no updates and deletes. So I propose >> that we change the first prong to count inserts as well as updates and >> deletes when deciding whether it needs to vacuum the table. We >> already use that calculation to decide whether to auto-analyze, so it >> wouldn't be very novel. We know that the work of marking pages >> all-visible will need to be done at some point, and doing it sooner >> will result in doing it in smaller batches, which seems generally >> good. >> >> However, I do have one concern: it might lead to excessive >> index-vacuuming. Right now, we skip the index vac step only if >> ZERO dead tuples are found during the heap scan. Even one dead tuple >> (or line pointer) will cause an index vac cycle, which may easily be >> excessive. So I further propose that we introduce a threshold for >> index-vac; so that we only do an index vac cycle if the number of dead >> tuples exceeds, say 0.1% of the table size. >> >> Thoughts? Let the hurling of rotten tomatoes begin. > > Robert, where are we on this? Should I post a patch? I started working on this at one point but didn't finish the implementation, let alone the no-doubt-onerous performance testing that will be needed to validate whatever we come up with. It would be really easy to cause serious regressions with ill-considered changes in this area, and I don't think many people here have the bandwidth for a detailed study of all the different workloads that might be affected here right this very minute. More generally, you're sending all these pings three weeks after the deadline for CF4. I don't think that's a good time to encourage people to *start* revising old patches, or writing new ones. I've also had some further thoughts about the right way to drive vacuum scheduling. I think what we need to do is tightly couple the rate at which we're willing to do vacuuming to the rate at which we're incurring "vacuum debt". That is, if we're creating 100kB/s of pages needing vacuum, we vacuum at 2-3MB/s (with default settings). If we're creating 10MB/s of pages needing vacuum, we *still* vacuum at 2-3MB/s. Not shockingly, vacuum gets behind, the database bloats, and everything goes to heck. The rate of vacuuming needs to be tied somehow to the rate at which we're creating stuff that needs to be vacuumed. Right now we don't even have a way to measure that, let alone auto-regulate the aggressiveness of autovacuum on that basis. Similarly, for marking of pages as all-visible, we currently make the same decision whether the relation is getting index-scanned (in which case the failure to mark those pages all-visible may be suppressing the use of index scans or making them less effective) or whether it's not being accessed at all (in which case vacuuming it won't help anything, and might hurt by pushing other pages out of cache).
Again, if we had better statistics, we could measure this - counting heap fetches for actual index-only scans plus heap fetches for index scans that might have been planned index-only scans but for the relation having too few all-visible pages doesn't sound like an impossible metric to gather. And if we had that, we could use it to trigger vacuuming, instead of guessing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
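The 2-3MB/s ceiling Robert keeps citing comes from the cost-based delay machinery; a sketch of the settings involved (defaults shown are from this era and have changed in later releases):

    SHOW vacuum_cost_limit;             -- 200 by default
    SHOW autovacuum_vacuum_cost_delay;  -- 20ms by default at the time
    SHOW autovacuum_vacuum_cost_limit;  -- -1, meaning fall back to vacuum_cost_limit

Raising the limit or lowering the delay scales autovacuum's I/O rate roughly proportionally, which is the knob a DBA would turn to keep up with a higher rate of vacuum debt.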
On Mon, Feb 3, 2014 at 8:55 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> I've also had some further thoughts about the right way to drive
> vacuum scheduling. I think what we need to do is tightly couple the
> rate at which we're willing to do vacuuming to the rate at which we're
> incurring "vacuum debt". That is, if we're creating 100kB/s of pages
> needing vacuum, we vacuum at 2-3MB/s (with default settings).

If we can tolerate 2-3MB/s without adverse impact on other work, then we can tolerate it. Do we gain anything substantial by sand-bagging it?

> If
> we're creating 10MB/s of pages needing vacuum, we *still* vacuum at
> 2-3MB/s. Not shockingly, vacuum gets behind, the database bloats, and
> everything goes to heck.

(Your reference to bloat made me think your comments here are about vacuuming in general, not specific to IOS. If that isn't the case, then please ignore.)

If we can only vacuum at 2-3MB/s without adversely impacting other activity, but we are creating 10MB/s of future vacuum need, then there are basically two possibilities I can think of. Either the 10MB/s represents a spike, and vacuum should tolerate it and hope to catch up on the debt later. Or it represents a new permanent condition, in which case I bought too few hard drives for the work load, and no scheduling decision that autovacuum can make will save me from my folly. Perhaps there is some middle ground between those possibilities, but I don't see room for much middle ground.

I guess there might be entirely different possibilities not between those two; for example, I don't realize I'm doing something that is generating 10MB/s of vacuum debt, and would like to have this thing I'm doing be automatically throttled to the point it doesn't interfere with other processes (either directly, or indirectly by bloat).

> The rate of vacuuming needs to be tied
> somehow to the rate at which we're creating stuff that needs to be
> vacuumed. Right now we don't even have a way to measure that, let
> alone auto-regulate the aggressiveness of autovacuum on that basis.

There is the formula used to decide when a table gets vacuumed. Isn't the time delta in this formula a measure of how fast we are creating stuff that needs to be vacuumed for bloat reasons? Is your objection that it doesn't include other reasons we might want to vacuum, or that it just doesn't work very well, or that it is not explicitly exposed?

> Similarly, for marking of pages as all-visible, we currently make the
> same decision whether the relation is getting index-scanned (in which
> case the failure to mark those pages all-visible may be suppressing
> the use of index scans or making them less effective) or whether it's
> not being accessed at all (in which case vacuuming it won't help
> anything, and might hurt by pushing other pages out of cache).

If it is not getting accessed at all because the database is not very active right now, that would be the perfect time to vacuum it. Between "I can accurately project current patterns of (in)activity into the future" and "People don't build large tables just to ignore them forever", I think the latter is more likely to be true. If the system is busy but this particular table is not, then that would be a better reason to de-prioritise vacuuming that table. But can this degree of reasoning really be implemented in a practical way? In core?

> Again,
> if we had better statistics, we could measure this - counting heap
> fetches for actual index-only scans plus heap fetches for index scans
> that might have been planned index-only scans but for the relation
> having too few all-visible pages doesn't sound like an impossible
> metric to gather.

My experience has been that if too few pages are all visible, it generally switches to a seq scan, not an index scan of a different index. But many things that are semantically possible to be index-only-scans would never be planned that way even if allvisible were 100%, so I think it would have to do two planning passes, one with the real allvisible, and a hypothetical one with allvisible set to 100%. And then there is the possibility that, while a high allvisible would be useful, the table is so active that no amount of vacuuming could ever keep it high.

Cheers,

Jeff
First, thanks for this thoughtful email. On Tue, Feb 4, 2014 at 7:14 PM, Jeff Janes <jeff.janes@gmail.com> wrote: > On Mon, Feb 3, 2014 at 8:55 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> I've also had some further thoughts about the right way to drive >> vacuum scheduling. I think what we need to do is tightly couple the >> rate at which we're willing to do vacuuming to the rate at which we're >> incurring "vacuum debt". That is, if we're creating 100kB/s of pages >> needing vacuum, we vacuum at 2-3MB/s (with default settings). > > If we can tolerate 2-3MB/s without adverse impact on other work, then we can > tolerate it. Do we gain anything substantial by sand-bagging it? No. The problem is the other direction. >> If >> we're creating 10MB/s of pages needing vacuum, we *still* vacuum at >> 2-3MB/s. Not shockingly, vacuum gets behind, the database bloats, and >> everything goes to heck. > > (Your reference to bloat made me think your comments here are about > vacuuming in general, not specific to IOS. If that isn't the case, then > please ignore.) > > If we can only vacuum at 2-3MB/s without adversely impacting other activity, > but we are creating 10MB/s of future vacuum need, then there are basically > two possibilities I can think of. Either the 10MB/s represents a spike, and > vacuum should tolerate it and hope to catch up on the debt later. Or it > represents a new permanent condition, in which case I bought too few hard > drives for the work load, and no scheduling decision that autovacuum can > make will save me from my folly. Perhaps there is some middle ground between > those possibilities, but I don't see room for much middle ground. > > I guess there might be entirely different possibilities not between those > two; for example, I don't realize I'm doing something that is generating > 10MB/s of vacuum debt, and would like to have this thing I'm doing be > automatically throttled to the point it doesn't interfere with other > processes (either directly, or indirectly by bloat) The underlying issue here is that, in order for there not to be a problem, a user needs to configure their autovacuum processes to vacuum at a rate which is greater than or equal to the average rate at which vacuum debt is being created. If they don't, they get runaway bloat. But to do that, they need to know at what rate they are creating vacuum debt, which is almost impossible to figure out right now; and even if they did know it, they'd then need to figure out what vacuum cost delay settings would allow vacuuming at a rate sufficient to keep up, which isn't quite as hard to estimate but certainly involves nontrivial math. So a lot of people have this set wrong, and it's not easy to get it right except by frobbing the settings until you find something that works well in practice. Also, a whole *lot* of problems in this area are caused by cases where the rate at which vacuum debt is being created *changes*. Autovacuum is keeping up, but then you have either a load spike or just a gradual increase in activity and it doesn't keep up any more. You don't necessarily notice right away, and by the time you do there's no easy way to recover. If you've got a table with lots of dead tuples in it, but it's also got enough internal freespace to satisfy as many inserts and updates as are happening, then it's possibly reasonable to put off vacuuming in the hopes that system load will be lower at some time in the future.
But if you've got a table with lots of dead tuples in it, and you're extending it to create internal freespace instead of vacuuming it, it is highly likely that you are not doing what will make the user most happy. Even if vacuuming that table slows down foreground activity quite badly, it is probably better than accumulating an arbitrary amount of bloat. >> The rate of vacuuming needs to be tied >> somehow to the rate at which we're creating stuff that needs to be >> vacuumed. Right now we don't even have a way to measure that, let >> alone auto-regulate the aggressiveness of autovacuum on that basis. > > There is the formula used to decide when a table gets vacuumed. Isn't the > time delta in this formula a measure of how fast we are creating stuff that > needs to be vacuumed for bloat reasons? Is your objection that it doesn't > include other reasons we might want to vacuum, or that it just doesn't work > very well, or that it is not explicitly exposed? AFAICT, the problem isn't when the table gets vacuumed so much as *how fast* it gets vacuumed. The autovacuum algorithm does a fine job selecting tables for vacuuming, for the most part. There are problems with insert-only tables and sometimes for large tables the default threshold (0.20) is too high, but it's not terrible. However, the limit on the overall rate of vacuuming activity to 2-3MB/s regardless of how fast we're creating vacuum debt is a big problem. >> Similarly, for marking of pages as all-visible, we currently make the >> same decision whether the relation is getting index-scanned (in which >> case the failure to mark those pages all-visible may be suppressing >> the use of index scans or making them less effective) or whether it's >> not being accessed at all (in which case vacuuming it won't help >> anything, and might hurt by pushing other pages out of cache). > > If it is not getting accessed at all because the database is not very active > right now, that would be the perfect time to vacuum it. Between "I can > accurately project current patterns of (in)activity into the future" and > "People don't build large tables just to ignore them forever", I think the > latter is more likely to be true. If the system is busy but this particular > table is not, then that would be a better reason to de-prioritise vacuuming > that table. But can this degree of reasoning really be implemented in a > practical way? In core? I don't know. But the algorithm for determining the rate at which we vacuum (2-3MB/s) could hardly be stupider than it is right now. It's almost a constant, and to the extent that it's not a constant, it depends on the wrong things. The fact that getting this perfectly right is unlikely to be easy, and may be altogether impossible, shouldn't discourage us from trying to come up with something better than what we have now. > My experience has been that if too few pages are all visible, it generally > switches to a seq scan, not an index scan of a different index. But many > things that are semantically possible to be index-only-scans would never be > planned that way even if allvisible were 100%, so I think it would have to > do two planning passes, one with the real allvisible, and a hypothetical one > with allvisible set to 100%. And then there is the possibility that, while > a high allvisible would be useful, the table is so active that no amount of > vacuuming could ever keep it high. Yeah, those are all good points. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Feb 3, 2014 at 11:55:34AM -0500, Robert Haas wrote: > > Robert, where are we on this? Should I post a patch? > > I started working on this at one point but didn't finish the > implementation, let alone the no-doubt-onerous performance testing > that will be needed to validate whatever we come up with. It would be > really easy to cause serious regressions with ill-considered changes > in this area, and I don't think many people here have the bandwidth > for a detailed study of all the different workloads that might be > affected here right this very minute. More generally, you're sending > all these pings three weeks after the deadline for CF4. I don't think > that's a good time to encourage people to *start* revising old > patches, or writing new ones. > > I've also had some further thoughts about the right way to drive > vacuum scheduling. I think what we need to do is tightly couple the I understand the problems with vacuum scheduling, but I was trying to address _just_ the insert-only workload problem for index-only scans. Right now, as I remember, only vacuum sets the visibility bits. If we don't want to make vacuum trigger for insert-only workloads, can we set pages all-visible more often? Is there a reason that a sequential scan, which does do page pruning, doesn't set the visibility bits too? Or does it? Can a non-index-only index scan that finds the heap tuple all-visible and the page not all-visible check the other items on the page to see if the page can be marked all-visible? Does analyze set pages all-visible? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
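As an aside for later readers: the per-page state Bruce is asking about became directly inspectable with the pg_visibility contrib module, added in PostgreSQL 9.6, well after this thread; a sketch (table name hypothetical):

    CREATE EXTENSION pg_visibility;
    -- How many pages of the table are all-visible (and all-frozen) in the VM?
    SELECT * FROM pg_visibility_map_summary('measurements');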
On Tue, Feb 11, 2014 at 10:56 AM, Bruce Momjian <bruce@momjian.us> wrote: > Right now, as I remember, only vacuum sets the visibility bits. If we > don't want to make vacuum trigger for insert-only workloads, can we set > pages all-visible more often? > > Is there a reason that a sequential scan, which does do page pruning, > doesn't set the visibility bits too? Or does it? Can a non-index-only > index scan that finds the heap tuple all-visible and the page not > all-visible check the other items on the page to see if the page can be > marked all-visible? Does analyze set pages all-visible? A sequential scan will set hint bits and will prune the page, but pruning the page doesn't ever mark it all-visible; that logic is entirely in vacuum. If that could be made cheap enough to be negligible, it might well be worth doing in heap_page_prune(). I think there might be a way to do that, but it's a bit tricky because the pruning logic iterates over the page in a somewhat complex way, not just a straightforward scan of all the item pointers the way the existing logic does. It would be pretty cool if we could just use a bit out of the heap-prune xlog record to indicate whether the all-visible bit should be set; then we'd gain the benefit of marking things all-visible much more often without needing vacuum. That doesn't help insert-only tables much, though, because those won't require pruning. We set hint bits (which dirties the page) but currently don't write WAL. We'd have to change that to set the all-visible bit when scanning such a table, and that would be expensive. :-( -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Feb 11, 2014 at 11:28:36AM -0500, Robert Haas wrote: > A sequential scan will set hint bits and will prune the page, but > pruning the page doesn't ever mark it all-visible; that logic is > entirely in vacuum. If that could be made cheap enough to be > negligible, it might well be worth doing in heap_page_prune(). I > think there might be a way to do that, but it's a bit tricky because > the pruning logic iterates over the page in a somewhat complex way, > not just a straightforward scan of all the item pointers the way the > existing logic does. It would be pretty cool if we could just use > a bit out of the heap-prune xlog record to indicate whether the > all-visible bit should be set; then we'd gain the benefit of marking > things all-visible much more often without needing vacuum. > > That doesn't help insert-only tables much, though, because those won't > require pruning. We set hint bits (which dirties the page) but > currently don't write WAL. We'd have to change that to set the > all-visible bit when scanning such a table, and that would be > expensive. :-( Yes, that pretty much sums it up. We introduced index-only scans in 9.2 (2012) but they still seem to be not usable for insert-only workloads two years later. Based on current progress, it doesn't look like this will be corrected until 9.5 (2015). I am kind of confused why this has not generated more urgency. I guess my question is what approach do we want to take to fixing this? If we are doing pruning, aren't we emitting WAL? You are right that for an insert-only workload, we aren't going to prune, but if pruning WAL overhead is acceptable for a sequential scan, isn't index-only page-all-visible WAL overhead acceptable? Do we want to track the number of inserts in statistics and trigger an auto-vacuum after a specified number of inserts? The problem there is that we really don't need to do any index cleanup, which is what vacuum typically does --- we just want to scan the table and set the all-visible bits, so that approach seems non-optimal. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
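Another aside: the insert-count trigger Bruce proposes here is essentially what PostgreSQL 13 eventually shipped as the autovacuum insert thresholds; these parameters did not exist at the time of this thread, and the values below are illustrative only:

    -- PostgreSQL 13+ only: trigger vacuum after enough inserts, largely to
    -- set VM bits for index-only scans on insert-mostly tables.
    ALTER TABLE measurements
      SET (autovacuum_vacuum_insert_threshold = 10000,
           autovacuum_vacuum_insert_scale_factor = 0.1);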
On 2014-02-11 12:12:13 -0500, Bruce Momjian wrote: > Yes, that pretty much sums it up. We introduced index-only scans in 9.2 > (2012) but they still seem to be not usable for insert-only workloads > two years later. Based on current progress, it doesn't look like this > will be corrected until 9.5 (2015). I am kind of confused why this has > not generated more urgency. I think this is largely FUD. They are hugely beneficial in some scenarios and less so in others. Just like lots of other features we have. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Feb 11, 2014 at 06:54:10PM +0100, Andres Freund wrote: > On 2014-02-11 12:12:13 -0500, Bruce Momjian wrote: > > Yes, that pretty much sums it up. We introduced index-only scans in 9.2 > > (2012) but they still seem to be not usable for insert-only workloads > > two years later. Based on current progress, it doesn't look like this > > will be corrected until 9.5 (2015). I am kind of confused why this has > > not generated more urgency. > > I think this is largely FUD. They are hugely beneficial in some scenarios > and less so in others. Just like lots of other features we have. I don't understand. Index-only scans are known to have benefits --- if an insert-only workload can't use that, why is that acceptable? What is fear-uncertainty-and-doubt about that? Please explain. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On 2014-02-11 13:23:19 -0500, Bruce Momjian wrote: > On Tue, Feb 11, 2014 at 06:54:10PM +0100, Andres Freund wrote: > > On 2014-02-11 12:12:13 -0500, Bruce Momjian wrote: > > > Yes, that pretty much sums it up. We introduced index-only scans in 9.2 > > > (2012) but they still seem to be not usable for insert-only workloads > > > two years later. Based on current progress, it doesn't look like this > > > will be corrected until 9.5 (2015). I am kind of confused why this has > > > not generated more urgency. > > > > I think this is largely FUD. They are hugely beneficial in some scenarios > > and less so in others. Just like lots of other features we have. > > I don't understand. Index-only scans are known to have benefits --- if > an insert-only workload can't use that, why is that acceptable? What is > fear-uncertainty-and-doubt about that? Please explain. Uh, for one, insert only workloads certainly aren't the majority of use cases. Ergo there are plenty of cases where index only scans work out of the box. Also, they *do* work for insert only workloads, you just either have to wait longer, or manually trigger VACUUMs. That's a far cry from not being usable. I am not saying it shouldn't be improved, I just don't see the point of bringing it up while everyone is busy with the last CF and claiming it is unusable and stating that it is surprising that nobody really cares. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Feb 11, 2014 at 07:31:03PM +0100, Andres Freund wrote: > On 2014-02-11 13:23:19 -0500, Bruce Momjian wrote: > > On Tue, Feb 11, 2014 at 06:54:10PM +0100, Andres Freund wrote: > > > On 2014-02-11 12:12:13 -0500, Bruce Momjian wrote: > > > > Yes, that pretty much sums it up. We introduced index-only scans in 9.2 > > > > (2012) but they still seem to be not usable for insert-only workloads > > > > two years later. Based on current progress, it doesn't look like this > > > > will be corrected until 9.5 (2015). I am kind of confused why this has > > > > not generated more urgency. > > > > > > I think this is largely FUD. They are hugely beneficial in some scenarios > > > and less so in others. Just like lots of other features we have. > > > > I don't understand. Index-only scans are known to have benefits --- if > > an insert-only workload can't use that, why is that acceptable? What is > > fear-uncertainty-and-doubt about that? Please explain. > > Uh, for one, insert only workloads certainly aren't the majority of > use cases. Ergo there are plenty of cases where index only scans work out > of the box. True. > Also, they *do* work for insert only workloads, you just either have to > wait longer, or manually trigger VACUUMs. That's a far cry from not Wait longer for what? Anti-xid-wraparound vacuum? > being usable. Is using VACUUM for these cases documented? Should it be? > I am not saying it shouldn't be improved, I just don't see the point of > bringing it up while everyone is busy with the last CF and claiming it > is unusable and stating that it is surprising that nobody really > cares. Well, I brought it up in September too. My point was not that it is a new issue but that it has been such an ignored issue for two years. I am not asking for a fix, but right now we don't even have a plan on how to improve this. I still don't see how this is FUD, and you have not explained it to me. This is a known limitation for two years, not documented (?), and with no TODO item and no plan on how to improve it. Do you want to declare such cases FUD and just ignore them? I don't see how that moves us forward. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Bruce Momjian <bruce@momjian.us> writes: > On Tue, Feb 11, 2014 at 07:31:03PM +0100, Andres Freund wrote: >> I am not saying it shouldn't be improved, I just don't see the point of >> bringing it up while everyone is busy with the last CF and claiming it >> is unusable and stating that it is surprising that nobody really >> cares. > Well, I brought it up in September too. My point was not that it is a > new issue but that it has been such an ignored issue for two years. I > am not asking for a fix, but right now we don't even have a plan on how > to improve this. Indeed, and considering that we're all busy with the CF, I think it's quite unreasonable of you to expect that we'll drop everything else to think about this problem right now. The reason it's like it is is that it's not easy to see how to make it better; so even if we did drop everything else, it's not clear to me that any plan would emerge anytime soon. regards, tom lane
On 2014-02-11 13:41:46 -0500, Bruce Momjian wrote: > Wait longer for what? Anti-xid-wraparound vacuum? Yes. > Is using VACUUM for these cases documented? Should it be? No idea; it seems to be at least part of the folkloric knowledge, from what I see at clients. > > I am not saying it shouldn't be improved, I just don't see the point of > > bringing it up while everyone is busy with the last CF and claiming it > > is unusable and stating that it is surprising that nobody really > > cares. > Well, I brought it up in September too. My point was not that it is a > new issue but that it has been such an ignored issue for two years. I > am not asking for a fix, but right now we don't even have a plan on how > to improve this. Coming up with a plan for this takes time and discussion, not something we seem to have plenty of atm. And even if we were to agree on a plan right now, we wouldn't incorporate it into 9.4, so what's the point of bringing it up now? > I still don't see how this is FUD, and you have not explained it to me. > This is a known limitation for two years, not documented (?), and with > no TODO item and no plan on how to improve it. Do you want to declare > such cases FUD and just ignore them? I don't see how that moves us > forward. Claiming something doesn't work while it just has manageable usability issues doesn't strike me as a reasonable starting point. If it bugs somebody enough to come up with a rough proposal it will get fixed... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Feb 11, 2014 at 01:54:48PM -0500, Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > On Tue, Feb 11, 2014 at 07:31:03PM +0100, Andres Freund wrote: > >> I am not saying it shouldn't be improved, I just don't see the point of > >> bringing it up while everyone is busy with the last CF and claiming it > >> is unusable and stating that it is surprising that nobody really > >> cares. > > > Well, I brought it up in September too. My point was not that it is a > > new issue but that it has been such an ignored issue for two years. I > > am not asking for a fix, but right now we don't even have a plan on how > > to improve this. > > Indeed, and considering that we're all busy with the CF, I think it's > quite unreasonable of you to expect that we'll drop everything else > to think about this problem right now. The reason it's like it is > is that it's not easy to see how to make it better; so even if we did > drop everything else, it's not clear to me that any plan would emerge > anytime soon. Well, documenting the VACUUM requirement and adding it to the TODO list are things we should consider for 9.4. If you think doing that after the commit-fest is best, I can do that. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On Tue, Feb 11, 2014 at 9:12 AM, Bruce Momjian <bruce@momjian.us> wrote:
> On Tue, Feb 11, 2014 at 11:28:36AM -0500, Robert Haas wrote:
>> A sequential scan will set hint bits and will prune the page, but
>> pruning the page doesn't ever mark it all-visible; that logic is
>> entirely in vacuum. If that could be made cheap enough to be
>> negligible, it might well be worth doing in heap_page_prune(). I
>> think there might be a way to do that, but it's a bit tricky because
>> the pruning logic iterates over the page in a somewhat complex way,
>> not just a straightforward scan of all the item pointers the way the
>> existing logic does. It would be pretty cool if we could just use
>> a bit out of the heap-prune xlog record to indicate whether the
>> all-visible bit should be set; then we'd gain the benefit of marking
>> things all-visible much more often without needing vacuum.
>>
>> That doesn't help insert-only tables much, though, because those won't
>> require pruning. We set hint bits (which dirties the page) but
>> currently don't write WAL. We'd have to change that to set the
>> all-visible bit when scanning such a table, and that would be
>> expensive. :-(
>
> Yes, that pretty much sums it up. We introduced index-only scans in 9.2
> (2012) but they still seem to be not usable for insert-only workloads
> two years later. Based on current progress, it doesn't look like this
> will be corrected until 9.5 (2015). I am kind of confused why this has
> not generated more urgency.
For insert-only and select-only workloads, they are usable (if your queries are of the type that could benefit from them); you just have to do some manual intervention. The list of features that sometimes require a DBA to do something to make maximum use of them under some circumstance would be a long one. It would be nice if it were better, but I don't see why this feature is particularly urgent compared to all the other things that could be improved.

In particular, I think freezing without IO is much more important. Freezing is rather unimportant until suddenly it is the most important thing in the universe. If we could stop worrying about that, I think it would free up other aspects of vacuum scheduling to have more meddling/optimization done to them.
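A hedged sketch of that manual intervention, reusing the hypothetical "events" table from the earlier sketch; note that only queries touching indexed columns can benefit from index-only scans in the first place:

    -- Periodic manual VACUUM on an insert-only table keeps the
    -- visibility map current; run it from cron or another scheduler,
    -- since no update/delete activity will ever trip autovacuum's
    -- thresholds on such a table.
    VACUUM events;

    -- An index-only scan is possible only when the query touches just
    -- indexed columns; 'payload' is not in any index here, so the
    -- second query cannot use one no matter how well-vacuumed the
    -- table is.
    EXPLAIN SELECT id FROM events WHERE id BETWEEN 1000 AND 2000;
    EXPLAIN SELECT id, payload FROM events WHERE id BETWEEN 1000 AND 2000;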
> I guess my question is what approach do we want to take to fixing this?
> If we are doing pruning, aren't we emitting WAL? You are right that for
> an insert-only workload, we aren't going to prune, but if pruning WAL
> overhead is acceptable for a sequential scan, isn't index-only
> page-all-visible WAL overhead acceptable?
We often don't find pruning particularly acceptable in seq scans, and there is a patch pending to conditionally turn it off for them.
> Do we want to track the number of inserts in statistics and trigger an
> auto-vacuum after a specified number of inserts?
We track relpages and relallvisible, which seems like a more direct measure. Once analyze is done (which is already triggered by inserts) and sets those, it could fire a vacuum based on the ratio of those values, or the autovac process could just look at the ratio after naptime. So just introduce autovacuum_vacuum_visible_factor. A problem there is that it would be a lot of work to aggressively keep the ratio high, and pointless if the types of queries done on that table don't benefit from IOS anyway, or if pages are dirtied so rapidly that no amount of vacuuming will keep the ratio high. Would we try to automatically tell which tables were which, or rely on the DBA setting per-table autovacuum_vacuum_visible_factor for tables that differ from the database norm?
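For concreteness, the ratio Jeff describes can already be computed from pg_class today; autovacuum_vacuum_visible_factor itself is only a proposed name, not an existing setting:

    -- Fraction of each table's pages currently marked all-visible.
    -- A hypothetical autovacuum_vacuum_visible_factor would compare
    -- a per-table target against something like this ratio.
    SELECT relname,
           relpages,
           relallvisible,
           round(100.0 * relallvisible / NULLIF(relpages, 0), 1)
               AS pct_all_visible
    FROM pg_class
    WHERE relkind = 'r' AND relpages > 0
    ORDER BY pct_all_visible;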
> The problem there is
> that we really don't need to do any index cleanup, which is what vacuum
> typically does --- we just want to scan the table and set the
> all-visible bits, so that approach seems non-optimal.
In the case of no updates or deletes (or aborted inserts?), there would be nothing to clean up in the indexes, and that step would be skipped (already in the current code). And if the indexes do need cleaning up, we certainly can't set the page all-visible without doing that cleanup.
Cheers,
Jeff
On Tue, Feb 11, 2014 at 4:13 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> Do we want to track the number of inserts in statistics and trigger an
>> auto-vacuum after a specified number of inserts?
>
> We track relpages and relallvisible, which seems like a more direct measure.
> Once analyze is done (which is already triggered by inserts) and sets those,
> it could fire a vacuum based on the ratio of those values, or the autovac
> process could just look at the ratio after naptime. So just introduce
> autovacuum_vacuum_visible_factor. A problem there is that it would be a lot
> of work to aggressively keep the ratio high, and pointless if the types of
> queries done on that table don't benefit from IOS anyway, or if pages are
> dirtied so rapidly that no amount of vacuuming will keep the ratio high.
> Would we try to automatically tell which tables were which, or rely on the
> DBA setting per-table autovacuum_vacuum_visible_factor for tables that
> differ from the database norm?

Why not track how many times an IOS would be used but wasn't, or how many heap fetches in IOS have to be performed?

Seems like a more direct measure of whether allvisible needs an update.
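The counters Claudio proposes don't exist yet, but the per-query number he describes is already reported by EXPLAIN (ANALYZE) as "Heap Fetches" on an index-only scan; the output below is illustrative, again using the hypothetical "events" table:

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT id FROM events WHERE id BETWEEN 1000 AND 2000;
    -- Index Only Scan using events_pkey on events ...
    --   Heap Fetches: 1001   -- rows whose visibility had to be checked
    --                        -- in the heap; near zero right after VACUUM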
On Tue, Feb 11, 2014 at 05:51:36PM -0200, Claudio Freire wrote:
>> We track relpages and relallvisible, which seems like a more direct measure.
>> Once analyze is done (which is already triggered by inserts) and sets those,
>> it could fire a vacuum based on the ratio of those values, or the autovac
>> process could just look at the ratio after naptime. So just introduce
>> autovacuum_vacuum_visible_factor. A problem there is that it would be a lot
>> of work to aggressively keep the ratio high, and pointless if the types of
>> queries done on that table don't benefit from IOS anyway, or if pages are
>> dirtied so rapidly that no amount of vacuuming will keep the ratio high.
>> Would we try to automatically tell which tables were which, or rely on the
>> DBA setting per-table autovacuum_vacuum_visible_factor for tables that
>> differ from the database norm?
>
> Why not track how many times an IOS would be used but wasn't, or how
> many heap fetches in IOS have to be performed?
>
> Seems like a more direct measure of whether allvisible needs an update.

Now that is an interesting idea, and more direct. Do we need to adjust for the insert count, i.e. would the threshold for triggering an autovacuum after finding index lookups that had to check the heap page for visibility be higher if many inserts are happening, perhaps dirtying pages? (If we are dirtying via update/delete, autovacuum will already trigger.)

We are aggressive in clearing the page-all-visible flag (we have to be), but I think we need a little more aggressiveness in setting it.

--
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ Everyone has their own god. +