Обсуждение: Re: [PERFORM] encouraging index-only scans

Поиск
Список
Период
Сортировка

Re: [PERFORM] encouraging index-only scans

От
Bruce Momjian
Дата:
On Wed, Dec 12, 2012 at 05:27:39PM -0500, Andrew Dunstan wrote:
> 
> On 12/12/2012 05:12 PM, Andrew Dunstan wrote:
> >
> >On 12/12/2012 04:32 PM, Tom Lane wrote:
> >>Andrew Dunstan <andrew@dunslane.net> writes:
> >>>A client is testing a migration from 9.1 to 9.2, and has found that a
> >>>large number of queries run much faster if they use index-only scans.
> >>>However, the only way he has found to get such a plan is by increasing
> >>>the seq_page_cost to insanely high levels (3.5). Is there any approved
> >>>way to encourage such scans that's a but less violent than this?
> >>Is the pg_class.relallvisible estimate for the table realistic? They
> >>might need a few more VACUUM and ANALYZE cycles to get it into the
> >>neighborhood of reality, if not.
> >
> >That was the problem - I didn't know this hadn't been done.
> >
> 
> Actually, the table had been analysed but not vacuumed, so this
> kinda begs the question what will happen to this value on
> pg_upgrade? Will people's queries suddenly get slower until
> autovacuum kicks in on the table?

[ moved to hackers list.]

Yes, this does seem like a problem for upgrades from 9.2 to 9.3?  We can
have pg_dump --binary-upgrade set these, or have ANALYZE set it.   I
would prefer the later.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +



Re: [PERFORM] encouraging index-only scans

От
Tom Lane
Дата:
Bruce Momjian <bruce@momjian.us> writes:
> On Wed, Dec 12, 2012 at 05:27:39PM -0500, Andrew Dunstan wrote:
>> Actually, the table had been analysed but not vacuumed, so this
>> kinda begs the question what will happen to this value on
>> pg_upgrade? Will people's queries suddenly get slower until
>> autovacuum kicks in on the table?

> [ moved to hackers list.]

> Yes, this does seem like a problem for upgrades from 9.2 to 9.3?  We can
> have pg_dump --binary-upgrade set these, or have ANALYZE set it.   I
> would prefer the later.

ANALYZE does not set that value, and is not going to start doing so,
because it doesn't scan enough of the table to derive a trustworthy
value.

It's been clear for some time that pg_upgrade ought to do something
about transferring the "statistics" columns in pg_class to the new
cluster.  This is just another example of why.
        regards, tom lane



Re: [PERFORM] encouraging index-only scans

От
Pavan Deolasee
Дата:
On Thu, Dec 13, 2012 at 9:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Bruce Momjian <bruce@momjian.us> writes:
>> On Wed, Dec 12, 2012 at 05:27:39PM -0500, Andrew Dunstan wrote:
>>> Actually, the table had been analysed but not vacuumed, so this
>>> kinda begs the question what will happen to this value on
>>> pg_upgrade? Will people's queries suddenly get slower until
>>> autovacuum kicks in on the table?
>
>> [ moved to hackers list.]
>
>> Yes, this does seem like a problem for upgrades from 9.2 to 9.3?  We can
>> have pg_dump --binary-upgrade set these, or have ANALYZE set it.   I
>> would prefer the later.
>
> ANALYZE does not set that value, and is not going to start doing so,
> because it doesn't scan enough of the table to derive a trustworthy
> value.
>

Should we do that though ? i.e. scan the entire map and count the
number of bits at the end of ANALYZE, like we do at the end of VACUUM
? I recently tried to optimize that code path by not recounting at the
end of the vacuum and instead track the number of all-visible bits
while scanning them in the earlier phases on vacuum. But it turned out
that its so fast to count even a million bits that its probably not
worth doing so.

> It's been clear for some time that pg_upgrade ought to do something
> about transferring the "statistics" columns in pg_class to the new
> cluster.  This is just another example of why.
>

+1.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee



Re: [PERFORM] encouraging index-only scans

От
Simon Riggs
Дата:
On 13 December 2012 03:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:

>> Yes, this does seem like a problem for upgrades from 9.2 to 9.3?  We can
>> have pg_dump --binary-upgrade set these, or have ANALYZE set it.   I
>> would prefer the later.
>
> ANALYZE does not set that value, and is not going to start doing so,
> because it doesn't scan enough of the table to derive a trustworthy
> value.

ISTM that ANALYZE doesn't need to scan the table to do this. The
vismap is now trustworthy and we can scan it separately on ANALYZE.

More to the point, since we run ANALYZE more frequently than we run
VACUUM, the value stored by the last VACUUM could be very stale.

> It's been clear for some time that pg_upgrade ought to do something
> about transferring the "statistics" columns in pg_class to the new
> cluster.  This is just another example of why.

Agreed, but that could bring other problems as well.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: [PERFORM] encouraging index-only scans

От
Bruce Momjian
Дата:
On Thu, Dec 13, 2012 at 09:40:40AM +0000, Simon Riggs wrote:
> On 13 December 2012 03:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 
> >> Yes, this does seem like a problem for upgrades from 9.2 to 9.3?  We can
> >> have pg_dump --binary-upgrade set these, or have ANALYZE set it.   I
> >> would prefer the later.
> >
> > ANALYZE does not set that value, and is not going to start doing so,
> > because it doesn't scan enough of the table to derive a trustworthy
> > value.
> 
> ISTM that ANALYZE doesn't need to scan the table to do this. The
> vismap is now trustworthy and we can scan it separately on ANALYZE.
> 
> More to the point, since we run ANALYZE more frequently than we run
> VACUUM, the value stored by the last VACUUM could be very stale.

Wouldn't inserts affect the relallvisible ratio, but not cause a vacuum?
Seems we should be having analyze update this independent of pg_upgrade
needing it.  Also, why is this in pg_class?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +



Re: [PERFORM] encouraging index-only scans

От
Peter Geoghegan
Дата:
On 13 December 2012 03:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> ANALYZE does not set that value, and is not going to start doing so,
> because it doesn't scan enough of the table to derive a trustworthy
> value.

I'm slightly surprised by your remarks here, because the commit
message where the relallvisible column was added (commit
a2822fb9337a21f98ac4ce850bb4145acf47ca27) says:

"Add a column pg_class.relallvisible to remember the number of pages
that were all-visible according to the visibility map as of the last
VACUUM
(or ANALYZE, or some other operations that update pg_class.relpages).
Use relallvisible/relpages, instead of an arbitrary constant, to
estimate how many heap page fetches can be avoided during an
index-only scan."

Have I missed some nuance?

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [PERFORM] encouraging index-only scans

От
Bruce Momjian
Дата:
On Thu, Dec 13, 2012 at 03:31:06PM +0000, Peter Geoghegan wrote:
> On 13 December 2012 03:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > ANALYZE does not set that value, and is not going to start doing so,
> > because it doesn't scan enough of the table to derive a trustworthy
> > value.
> 
> I'm slightly surprised by your remarks here, because the commit
> message where the relallvisible column was added (commit
> a2822fb9337a21f98ac4ce850bb4145acf47ca27) says:
> 
> "Add a column pg_class.relallvisible to remember the number of pages
> that were all-visible according to the visibility map as of the last
> VACUUM
> (or ANALYZE, or some other operations that update pg_class.relpages).
> Use relallvisible/relpages, instead of an arbitrary constant, to
> estimate how many heap page fetches can be avoided during an
> index-only scan."
> 
> Have I missed some nuance?

I am looking back at this issue now and I think you are correct.  The
commit you mention (Oct 7 2011) says ANALYZE updates the visibility map,
and the code matches that:
    if (!inh)        vac_update_relstats(onerel,                            RelationGetNumberOfBlocks(onerel),
                 totalrows,
 
-->                                visibilitymap_count(onerel),                            hasindex,
       InvalidTransactionId);
 

so if an index scan was not being used after an ANALYZE, it isn't a bad
allvisibile estimate but something else.  This code was in PG 9.2.


--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +



Re: [PERFORM] encouraging index-only scans

От
Bruce Momjian
Дата:
On Wed, Sep  4, 2013 at 04:56:55PM -0400, Bruce Momjian wrote:
> > "Add a column pg_class.relallvisible to remember the number of pages
> > that were all-visible according to the visibility map as of the last
> > VACUUM
> > (or ANALYZE, or some other operations that update pg_class.relpages).
> > Use relallvisible/relpages, instead of an arbitrary constant, to
> > estimate how many heap page fetches can be avoided during an
> > index-only scan."
> > 
> > Have I missed some nuance?
> 
> I am looking back at this issue now and I think you are correct.  The
> commit you mention (Oct 7 2011) says ANALYZE updates the visibility map,
> and the code matches that:
> 
>         if (!inh)
>             vac_update_relstats(onerel,
>                                 RelationGetNumberOfBlocks(onerel),
>                                 totalrows,
> -->                                visibilitymap_count(onerel),
>                                 hasindex,
>                                 InvalidTransactionId);
> 
> so if an index scan was not being used after an ANALYZE, it isn't a bad
> allvisibile estimate but something else.  This code was in PG 9.2.

Actually, I now realize it is more complex than that, and worse.  There
are several questions to study to understand when pg_class.relallvisible
is updated (which is used to determine if index-only scans are a good
optimization choice), and when VM all-visible bits are set so heap pages
can be skipped during index-only scans:
1)  When are VM bits set:    vacuum (non-full)    analyze (only some random pages)2)  When are massive rows added but
VMbits not set:    copy3) When are VM bits cleared:    insert/update/delete    vacuum (non-full)4)  When are VM map
filescleared:    vacuum full    cluster5) When is pg_class.relallvisible updated via a VM map file scan:    vacuum
(non-full)   analyze
 

Vacuums run by autovacuum are driven by n_dead_tuples, which is only
update and delete.  Therefore, any scenario where vacuum (non-full) is
never run will not have significant VM bits set.  The only bits that
will be set will be by pages visited randomly by analyze.

The following table activities will not set proper VM bits:
               vacuum full               cluster               copy    insert-only

If updates and deletes happen, there will eventually be sufficient
reason for autovacuum to vacuum the table and set proper VM bits, and
pg_class.relallvisible.

The calculus we should use to determine when we need to run vacuum has
changed with index-only scans, and I am not sure we ever fully addressed
this.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +



Re: [PERFORM] encouraging index-only scans

От
Robert Haas
Дата:
On Thu, Sep 5, 2013 at 8:14 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Actually, I now realize it is more complex than that, and worse.  There
> are several questions to study to understand when pg_class.relallvisible
> is updated (which is used to determine if index-only scans are a good
> optimization choice), and when VM all-visible bits are set so heap pages
> can be skipped during index-only scans:
>
>         1)  When are VM bits set:
>                 vacuum (non-full)
>                 analyze (only some random pages)

Analyze doesn't set visibility-map bits.  It only updates statistics
about how many are set.

> The calculus we should use to determine when we need to run vacuum has
> changed with index-only scans, and I am not sure we ever fully addressed
> this.

Yeah, we didn't.  I think the hard part is figuring out what behavior
would be best.  Counting inserts as well as updates and deletes would
be a simple approach, but I don't have much confidence in it.  My
experience is that having vacuum or analyze kick in during a bulk-load
operation is a disaster.  We'd kinda like to come up with a way to
make vacuum run after the bulk load is complete, maybe, but how would
we identify that time, and there are probably cases where that's not
right either.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] encouraging index-only scans

От
Gavin Flower
Дата:
<div class="moz-cite-prefix">On 06/09/13 13:10, Robert Haas wrote:<br /></div><blockquote
cite="mid:CA+Tgmobh_-n-GvKX++DVctToZ9SZkdhvszP=CBjh9XZOvWsTEg@mail.gmail.com"type="cite"><pre wrap="">On Thu, Sep 5,
2013at 8:14 PM, Bruce Momjian <a class="moz-txt-link-rfc2396E"
href="mailto:bruce@momjian.us"><bruce@momjian.us></a>wrote:
 
</pre><blockquote type="cite"><pre wrap="">Actually, I now realize it is more complex than that, and worse.  There
are several questions to study to understand when pg_class.relallvisible
is updated (which is used to determine if index-only scans are a good
optimization choice), and when VM all-visible bits are set so heap pages
can be skipped during index-only scans:
       1)  When are VM bits set:               vacuum (non-full)               analyze (only some random pages)
</pre></blockquote><pre wrap="">
Analyze doesn't set visibility-map bits.  It only updates statistics
about how many are set.

</pre><blockquote type="cite"><pre wrap="">The calculus we should use to determine when we need to run vacuum has
changed with index-only scans, and I am not sure we ever fully addressed
this.
</pre></blockquote><pre wrap="">
Yeah, we didn't.  I think the hard part is figuring out what behavior
would be best.  Counting inserts as well as updates and deletes would
be a simple approach, but I don't have much confidence in it.  My
experience is that having vacuum or analyze kick in during a bulk-load
operation is a disaster.  We'd kinda like to come up with a way to
make vacuum run after the bulk load is complete, maybe, but how would
we identify that time, and there are probably cases where that's not
right either.

</pre></blockquote><font size="-1">How about a 'VACUUM AFTER' command</font> (part of the 'BEGIN' transaction syntax?)
thatwould:<br /><ol><li>only be valid in a transaction<li>initiate a vacuum after the current transaction
completed<li>deferany vacuum triggered due to other criteria<br /></ol> If the transaction was rolled back: then if
therewas a pending vacuum, due to other reasons, it would then be actioned.<br /><br /> On normal transaction
completion,then if there was a pending vacuum it would be combined with the one in the transaction.<br /><br /> Still
wouldneed some method of ensuring any pending vacuum was done if the transaction hung, or took too long.<br /><br /><br
/>Cheers,<br /> Gavin<br /> 

Re: [PERFORM] encouraging index-only scans

От
Bruce Momjian
Дата:
On Thu, Sep  5, 2013 at 09:10:06PM -0400, Robert Haas wrote:
> On Thu, Sep 5, 2013 at 8:14 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > Actually, I now realize it is more complex than that, and worse.  There
> > are several questions to study to understand when pg_class.relallvisible
> > is updated (which is used to determine if index-only scans are a good
> > optimization choice), and when VM all-visible bits are set so heap pages
> > can be skipped during index-only scans:
> >
> >         1)  When are VM bits set:
> >                 vacuum (non-full)
> >                 analyze (only some random pages)
> 
> Analyze doesn't set visibility-map bits.  It only updates statistics
> about how many are set.

Sorry, yes you are correct.

> > The calculus we should use to determine when we need to run vacuum has
> > changed with index-only scans, and I am not sure we ever fully addressed
> > this.
> 
> Yeah, we didn't.  I think the hard part is figuring out what behavior
> would be best.  Counting inserts as well as updates and deletes would
> be a simple approach, but I don't have much confidence in it.  My
> experience is that having vacuum or analyze kick in during a bulk-load
> operation is a disaster.  We'd kinda like to come up with a way to
> make vacuum run after the bulk load is complete, maybe, but how would
> we identify that time, and there are probably cases where that's not
> right either.

I am unsure how we have gone a year with index-only scans and I am just
now learning that it only works well with update/delete workloads or by
running vacuum manually.  I only found this out going back over January
emails.  Did other people know this?  Was it not considered a serious
problem?

Well, our logic has been that vacuum is only for removing expired rows. 
I think we either need to improve that, or somehow make sequential scans
update the VM map, and then find a way to trigger update of
relallvisible even without inserts.

Ideas
-----

I think we need to detect tables that do not have VM bits set and try to
determine if they should be vacuumed.  If a table has most of its VM
bits set, there in need to vacuum it for VM bit setting.

Autovacuum knows how many pages are in the table via its file size, and
it can scan the VM map to see how many pages are _not_ marked
all-visible.  If the VM map has many pages that are _not_ marked as
all-visible, and change count since last vacuum is low, those pages
might now be all-visible and vacuum might find them.  One problem is
that a long-running transaction is not going to update relallvisible
until commit, so you might be vacuuming a table that is being modified,
e.g. bulk loads.  Do we have any way of detecting if a backend is
modifying a table?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +



Re: [PERFORM] encouraging index-only scans

От
Alvaro Herrera
Дата:
Bruce Momjian escribió:

> Ideas
> -----
> 
> I think we need to detect tables that do not have VM bits set and try to
> determine if they should be vacuumed.  If a table has most of its VM
> bits set, there in need to vacuum it for VM bit setting.

I think it's shortsighted to keep thinking of autovacuum as just a way
to run VACUUM and ANALYZE.  We have already discussed work items that
need to be done separately, such as truncating the last few empty pages
on a relation that was vacuumed recently.  We also need to process a GIN
index' pending insertion list; and with minmax indexes I will want to
run summarization of heap page ranges.

So maybe instead of trying to think of VM bit setting as part of vacuum,
we could just keep stats about how many pages we might need to scan
because of possibly needing to set the bit, and then doing that in
autovacuum, independently from actually vacuuming the relation.

I'm not sure if we need to expose all these new maintenance actions as
SQL commands.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [PERFORM] encouraging index-only scans

От
Dimitri Fontaine
Дата:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> I'm not sure if we need to expose all these new maintenance actions as
> SQL commands.

I strongly think we should, if only for diagnostic purposes. Also to
adapt to some well defined workloads that the automatic system is not
designed to handle.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support



Re: [PERFORM] encouraging index-only scans

От
Hannu Krosing
Дата:
On 09/06/2013 09:23 AM, Dimitri Fontaine wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> I'm not sure if we need to expose all these new maintenance actions as
>> SQL commands.
> I strongly think we should, if only for diagnostic purposes. 
It would be much easier and more flexible to expose them
as pg_*() function calls, not proper "commands".
> Also to
> adapt to some well defined workloads that the automatic system is not
> designed to handle.
+1

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ




Re: [PERFORM] encouraging index-only scans

От
Andres Freund
Дата:
On 2013-09-06 01:22:36 -0400, Alvaro Herrera wrote:
> I think it's shortsighted to keep thinking of autovacuum as just a way
> to run VACUUM and ANALYZE.  We have already discussed work items that
> need to be done separately, such as truncating the last few empty pages
> on a relation that was vacuumed recently.  We also need to process a GIN
> index' pending insertion list; and with minmax indexes I will want to
> run summarization of heap page ranges.

Agreed.

> So maybe instead of trying to think of VM bit setting as part of vacuum,
> we could just keep stats about how many pages we might need to scan
> because of possibly needing to set the bit, and then doing that in
> autovacuum, independently from actually vacuuming the relation.

I am not sure I understand this though. What would be the point to go
and set all visible and not do the rest of the vacuuming work?

I think triggering vacuuming by scanning the visibility map for the
number of unset bits and use that as another trigger is a good idea. The
vm should ensure we're not doing superflous work.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: [PERFORM] encouraging index-only scans

От
Andres Freund
Дата:
On 2013-09-06 13:38:56 +0200, Hannu Krosing wrote:
> On 09/06/2013 09:23 AM, Dimitri Fontaine wrote:
> > Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> >> I'm not sure if we need to expose all these new maintenance actions as
> >> SQL commands.
> > I strongly think we should, if only for diagnostic purposes. 
> It would be much easier and more flexible to expose them
> as pg_*() function calls, not proper "commands".

I don't think that's as easy as you might imagine. For much of what's
done in that context you cannot be in a transaction, you even need to be
in a toplevel statement (since we internally
CommitTransactionCommand/StartTransactionCommand).

So those pg_* commands couldn't be called (except possibly via the
fastpath function call API ...) which might restrict their usefulnes a
teensy bit ;)

So, I think extending the options passed to VACUUM - since it can take
pretty generic options these days - is a more realistic path.

> > Also to
> > adapt to some well defined workloads that the automatic system is not
> > designed to handle.
> +1

What would you like to expose individually?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: [PERFORM] encouraging index-only scans

От
Hannu Krosing
Дата:
On 09/06/2013 03:12 PM, Andres Freund wrote:
> On 2013-09-06 13:38:56 +0200, Hannu Krosing wrote:
>> On 09/06/2013 09:23 AM, Dimitri Fontaine wrote:
>>> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>>>> I'm not sure if we need to expose all these new maintenance actions as
>>>> SQL commands.
>>> I strongly think we should, if only for diagnostic purposes. 
>> It would be much easier and more flexible to expose them
>> as pg_*() function calls, not proper "commands".
> I don't think that's as easy as you might imagine. For much of what's
> done in that context you cannot be in a transaction, you even need to be
> in a toplevel statement (since we internally
> CommitTransactionCommand/StartTransactionCommand).
>
> So those pg_* commands couldn't be called (except possibly via the
> fastpath function call API ...) which might restrict their usefulnes a
> teensy bit ;)
>
> So, I think extending the options passed to VACUUM - since it can take
> pretty generic options these days - is a more realistic path.
Might be something convoluted like 

VACUUM indexname WITH (function = "pg_cleanup_gin($1)");

:)
>
>>> Also to
>>> adapt to some well defined workloads that the automatic system is not
>>> designed to handle.
>> +1
> What would you like to expose individually?
>
> Greetings,
>
> Andres Freund
>




Re: [PERFORM] encouraging index-only scans

От
Bruce Momjian
Дата:
On Fri, Sep  6, 2013 at 03:08:54PM +0200, Andres Freund wrote:
> On 2013-09-06 01:22:36 -0400, Alvaro Herrera wrote:
> > I think it's shortsighted to keep thinking of autovacuum as just a way
> > to run VACUUM and ANALYZE.  We have already discussed work items that
> > need to be done separately, such as truncating the last few empty pages
> > on a relation that was vacuumed recently.  We also need to process a GIN
> > index' pending insertion list; and with minmax indexes I will want to
> > run summarization of heap page ranges.
> 
> Agreed.
> 
> > So maybe instead of trying to think of VM bit setting as part of vacuum,
> > we could just keep stats about how many pages we might need to scan
> > because of possibly needing to set the bit, and then doing that in
> > autovacuum, independently from actually vacuuming the relation.
> 
> I am not sure I understand this though. What would be the point to go
> and set all visible and not do the rest of the vacuuming work?
> 
> I think triggering vacuuming by scanning the visibility map for the
> number of unset bits and use that as another trigger is a good idea. The
> vm should ensure we're not doing superflous work.

Yes, I think it might be hard to justify a separate VM-set-only scan of
the table.  If you are already reading the table, and already checking
to see if you can set the VM bit, I am not sure why you would not also
remove old rows, especially since removing those rows might be necessary
to allow setting VM bits.

Another problem I thought of is that while automatic vacuuming only
happens with high update/delete load, index-only scans are best on
mostly non-write tables, so we have bad behavior where the ideal case
(static data) doesn't get vm-bits set, while update/delete has the
vm-bits set, but then cleared as more update/deletes occur.

The more I look at this the worse it appears.   How has this gone
unaddressed for over a year?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +



Re: [PERFORM] encouraging index-only scans

От
Andres Freund
Дата:
On 2013-09-06 12:30:56 -0400, Bruce Momjian wrote:
> > I am not sure I understand this though. What would be the point to go
> > and set all visible and not do the rest of the vacuuming work?
> >
> > I think triggering vacuuming by scanning the visibility map for the
> > number of unset bits and use that as another trigger is a good idea. The
> > vm should ensure we're not doing superflous work.
>
> Yes, I think it might be hard to justify a separate VM-set-only scan of
> the table.  If you are already reading the table, and already checking
> to see if you can set the VM bit, I am not sure why you would not also
> remove old rows, especially since removing those rows might be necessary
> to allow setting VM bits.

Yep. Although adding the table back into the fsm will lead to it being
used for new writes again...

> Another problem I thought of is that while automatic vacuuming only
> happens with high update/delete load, index-only scans are best on
> mostly non-write tables, so we have bad behavior where the ideal case
> (static data) doesn't get vm-bits set, while update/delete has the
> vm-bits set, but then cleared as more update/deletes occur.

Well, older tables will get vacuumed due to vacuum_freeze_table_age. So
at some point they will get vacuumed and the vm bits will get set.

> The more I look at this the worse it appears.   How has this gone
> unaddressed for over a year?

It's been discussed several times including during the introduction of
the feature. I am a bit surprised about the panickey tone in this
thread.
Yes, we need to overhaul the way vacuum works (to reduce the frequency
of rewriting stuff repeatedly) and the way it's triggered (priorization,
more trigger conditions) but all these are known things and "just" need
somebody with time.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: [PERFORM] encouraging index-only scans

От
Bruce Momjian
Дата:
On Fri, Sep  6, 2013 at 06:36:47PM +0200, Andres Freund wrote:
> On 2013-09-06 12:30:56 -0400, Bruce Momjian wrote:
> > > I am not sure I understand this though. What would be the point to go
> > > and set all visible and not do the rest of the vacuuming work?
> > >
> > > I think triggering vacuuming by scanning the visibility map for the
> > > number of unset bits and use that as another trigger is a good idea. The
> > > vm should ensure we're not doing superflous work.
> >
> > Yes, I think it might be hard to justify a separate VM-set-only scan of
> > the table.  If you are already reading the table, and already checking
> > to see if you can set the VM bit, I am not sure why you would not also
> > remove old rows, especially since removing those rows might be necessary
> > to allow setting VM bits.
> 
> Yep. Although adding the table back into the fsm will lead to it being
> used for new writes again...

You mean adding _pages_ back into the table's FSM?  Yes, that is going
to cause those pages to get dirty, but it is better than expanding the
table size.  I don't see why you would not update the FSM.

> > Another problem I thought of is that while automatic vacuuming only
> > happens with high update/delete load, index-only scans are best on
> > mostly non-write tables, so we have bad behavior where the ideal case
> > (static data) doesn't get vm-bits set, while update/delete has the
> > vm-bits set, but then cleared as more update/deletes occur.
> 
> Well, older tables will get vacuumed due to vacuum_freeze_table_age. So
> at some point they will get vacuumed and the vm bits will get set.

Hmm, good point.  That would help with an insert-only workload, as long
as you can chew through 200M transactions.   That doesn't help with a
read-only workload as we don't consume transction IDs for SELECT.

> > The more I look at this the worse it appears.   How has this gone
> > unaddressed for over a year?
> 
> It's been discussed several times including during the introduction of
> the feature. I am a bit surprised about the panickey tone in this
> thread.

This December 2012 thread by Andrew Dunstan shows he wasn't aware that a
manual VACUUM was required for index-only scans.  That thread ended with
us realizing that pg_upgrade's ANALYZE runs will populate
pg_class.relallvisible.

What I didn't see in that thread is an analysis of what cases are going
to require manual vacuum, and I have seen no work in 9.3 to improve
that.  I don't even see it on the TODO list.

It bothers me that we spent time developing index-only scans, but have
significant workloads where it doesn't work, no efforts on improving it,
and no documentation on manual workarounds.  I have not even seen
discussion on how we are going to improve this.  I would like to have
that discussion now.

> Yes, we need to overhaul the way vacuum works (to reduce the frequency
> of rewriting stuff repeatedly) and the way it's triggered (priorization,
> more trigger conditions) but all these are known things and "just" need
> somebody with time.

Based on the work needed to improve this, I would have thought someone
would have taken this on during 9.3 development.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +



Re: [PERFORM] encouraging index-only scans

От
Bruce Momjian
Дата:
On Fri, Sep  6, 2013 at 01:01:59PM -0400, Bruce Momjian wrote:
> This December 2012 thread by Andrew Dunstan shows he wasn't aware that a
> manual VACUUM was required for index-only scans.  That thread ended with
> us realizing that pg_upgrade's ANALYZE runs will populate
> pg_class.relallvisible.
> 
> What I didn't see in that thread is an analysis of what cases are going
> to require manual vacuum, and I have seen no work in 9.3 to improve
> that.  I don't even see it on the TODO list.

OK, let's start the discussion then.  I have added a TODO list:
Improve setting of visibility map bits for read-only and insert-only workloads

So, what should trigger an auto-vacuum vacuum for these workloads? 
Rather than activity, which is what normally drives autovacuum, it is
lack of activity that should drive it, combined with a high VM cleared
bit percentage.

It seems we can use these statistics values:
 n_tup_ins           | bigint                    n_tup_upd           | bigint                    n_tup_del           |
bigint                   n_tup_hot_upd       | bigint                    n_live_tup          | bigint
n_dead_tup          | bigint                    n_mod_since_analyze | bigint                    last_vacuum         |
timestampwith time zone  last_autovacuum     | timestamp with time zone 
 

Particilarly last_vacuum and last_autovacuum can tell us the last time
of vacuum.  If the n_tup_upd/n_tup_del counts are low, and the  VM set
bit count is low, it might need vacuuming, though inserts into existing
pages would complicate that.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +



Re: [PERFORM] encouraging index-only scans

От
Jim Nasby
Дата:
On 9/5/13 8:29 PM, Gavin Flower wrote:
> How about a 'VACUUM AFTER' command (part of the 'BEGIN' transaction syntax?) that would:
>
>  1. only be valid in a transaction
>  2. initiate a vacuum after the current transaction completed
>  3. defer any vacuum triggered due to other criteria
>
> If the transaction was rolled back: then if there was a pending vacuum, due to other reasons, it would then be
actioned.
>
> On normal transaction completion, then if there was a pending vacuum it would be combined with the one in the
transaction.
>
> Still would need some method of ensuring any pending vacuum was done if the transaction hung, or took too long.

I *really* like the idea of BEGIN VACUUM AFTER, but I suspect it would be of very limited usefulness if it didn't
accountfor currently running transactions.
 

I'm thinking we add a vacuum_after_xid field somewhere (pg_class), and instead of attempting to vacuum inside the
backendat commit time the transaction would set that field to it's XID unless the field already had a newer XID in it.
 

autovac would then add all tables where vacuum_after_xid < the oldest running transaction to it's priority list.
-- 
Jim C. Nasby, Data Architect                       jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net



Re: [PERFORM] encouraging index-only scans

От
Jim Nasby
Дата:
On 9/6/13 2:13 PM, Bruce Momjian wrote:
> On Fri, Sep  6, 2013 at 01:01:59PM -0400, Bruce Momjian wrote:
>> This December 2012 thread by Andrew Dunstan shows he wasn't aware that a
>> manual VACUUM was required for index-only scans.  That thread ended with
>> us realizing that pg_upgrade's ANALYZE runs will populate
>> pg_class.relallvisible.
>>
>> What I didn't see in that thread is an analysis of what cases are going
>> to require manual vacuum, and I have seen no work in 9.3 to improve
>> that.  I don't even see it on the TODO list.
>
> OK, let's start the discussion then.  I have added a TODO list:
>
>     Improve setting of visibility map bits for read-only and insert-only workloads
>
> So, what should trigger an auto-vacuum vacuum for these workloads?
> Rather than activity, which is what normally drives autovacuum, it is
> lack of activity that should drive it, combined with a high VM cleared
> bit percentage.
>
> It seems we can use these statistics values:
>
>      n_tup_ins           | bigint
>      n_tup_upd           | bigint
>      n_tup_del           | bigint
>      n_tup_hot_upd       | bigint
>      n_live_tup          | bigint
>      n_dead_tup          | bigint
>      n_mod_since_analyze | bigint
>      last_vacuum         | timestamp with time zone
>      last_autovacuum     | timestamp with time zone
>
> Particilarly last_vacuum and last_autovacuum can tell us the last time
> of vacuum.  If the n_tup_upd/n_tup_del counts are low, and the  VM set
> bit count is low, it might need vacuuming, though inserts into existing
> pages would complicate that.

Something else that might be useful to look at is if there are any FSM entries or not. True insert only shouldn't have
anyFSM.
 

That said, there's definitely another case to think about... tables that see update activity on newly inserted rows but
noton older rows. A work queue that is not pruned would be an example of that:
 

INSERT new work item
UPDATE work item SET status = 'In process';
UPDATE work item SET completion = '50%';
UPDATE work item SET sattus = 'Complete", completion = '100%';

In this case I would expect most of the pages in the table (except the very end) to be all visible.
-- 
Jim C. Nasby, Data Architect                       jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net



Re: [PERFORM] encouraging index-only scans

От
Andres Freund
Дата:
On 2013-09-06 13:01:59 -0400, Bruce Momjian wrote:
> On Fri, Sep  6, 2013 at 06:36:47PM +0200, Andres Freund wrote:
> > On 2013-09-06 12:30:56 -0400, Bruce Momjian wrote:
> > > > I am not sure I understand this though. What would be the point to go
> > > > and set all visible and not do the rest of the vacuuming work?
> > > >
> > > > I think triggering vacuuming by scanning the visibility map for the
> > > > number of unset bits and use that as another trigger is a good idea. The
> > > > vm should ensure we're not doing superflous work.
> > >
> > > Yes, I think it might be hard to justify a separate VM-set-only scan of
> > > the table.  If you are already reading the table, and already checking
> > > to see if you can set the VM bit, I am not sure why you would not also
> > > remove old rows, especially since removing those rows might be necessary
> > > to allow setting VM bits.
> > 
> > Yep. Although adding the table back into the fsm will lead to it being
> > used for new writes again...
> 
> You mean adding _pages_ back into the table's FSM?  Yes, that is going
> to cause those pages to get dirty, but it is better than expanding the
> table size.  I don't see why you would not update the FSM.

You're right, we should add them, I wasn't really questioning that. I
was, quietly so you couldn't hear it, wondering whether we should
priorize the target buffer selection differently.

> > > Another problem I thought of is that while automatic vacuuming only
> > > happens with high update/delete load, index-only scans are best on
> > > mostly non-write tables, so we have bad behavior where the ideal case
> > > (static data) doesn't get vm-bits set, while update/delete has the
> > > vm-bits set, but then cleared as more update/deletes occur.
> > 
> > Well, older tables will get vacuumed due to vacuum_freeze_table_age. So
> > at some point they will get vacuumed and the vm bits will get set.
> 
> Hmm, good point.  That would help with an insert-only workload, as long
> as you can chew through 200M transactions.   That doesn't help with a
> read-only workload as we don't consume transction IDs for SELECT.

It's even 150mio. For the other workloads, its pretty "common" wisdom to
VACUUM after bulk data loading. I think we even document that.

> > > The more I look at this the worse it appears.   How has this gone
> > > unaddressed for over a year?
> > 
> > It's been discussed several times including during the introduction of
> > the feature. I am a bit surprised about the panickey tone in this
> > thread.
> 
> This December 2012 thread by Andrew Dunstan shows he wasn't aware that a
> manual VACUUM was required for index-only scans.  That thread ended with
> us realizing that pg_upgrade's ANALYZE runs will populate
> pg_class.relallvisible.

> What I didn't see in that thread is an analysis of what cases are going
> to require manual vacuum, and I have seen no work in 9.3 to improve
> that.  I don't even see it on the TODO list.

Yes, TODO maybe missing.

> It bothers me that we spent time developing index-only scans, but have
> significant workloads where it doesn't work, no efforts on improving it,
> and no documentation on manual workarounds.  I have not even seen
> discussion on how we are going to improve this.  I would like to have
> that discussion now.

It's not like the feature is useless in this case. You just need to
perform an extra operation to activate it. I am not saying we shouldn't
document it better, but it seriously worries me that a useful feature is
depicted as useless because it requires a manual VACUUM in some
circumstances.

> > Yes, we need to overhaul the way vacuum works (to reduce the frequency
> > of rewriting stuff repeatedly) and the way it's triggered (priorization,
> > more trigger conditions) but all these are known things and "just" need
> > somebody with time.

> Based on the work needed to improve this, I would have thought someone
> would have taken this on during 9.3 development.

There has been some discussion about it indirectly via the freezing
stuff. That also would require more "advanced" scheduling.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: [PERFORM] encouraging index-only scans

От
Andres Freund
Дата:
On 2013-09-06 15:13:30 -0400, Bruce Momjian wrote:
> On Fri, Sep  6, 2013 at 01:01:59PM -0400, Bruce Momjian wrote:
> > This December 2012 thread by Andrew Dunstan shows he wasn't aware that a
> > manual VACUUM was required for index-only scans.  That thread ended with
> > us realizing that pg_upgrade's ANALYZE runs will populate
> > pg_class.relallvisible.
> > 
> > What I didn't see in that thread is an analysis of what cases are going
> > to require manual vacuum, and I have seen no work in 9.3 to improve
> > that.  I don't even see it on the TODO list.
> 
> OK, let's start the discussion then.  I have added a TODO list:
> 
>     Improve setting of visibility map bits for read-only and insert-only workloads
> 
> So, what should trigger an auto-vacuum vacuum for these workloads? 
> Rather than activity, which is what normally drives autovacuum, it is
> lack of activity that should drive it, combined with a high VM cleared
> bit percentage.
> 
> It seems we can use these statistics values:
> 
>      n_tup_ins           | bigint                   
>      n_tup_upd           | bigint                   
>      n_tup_del           | bigint                   
>      n_tup_hot_upd       | bigint                   
>      n_live_tup          | bigint                   
>      n_dead_tup          | bigint                   
>      n_mod_since_analyze | bigint                   
>      last_vacuum         | timestamp with time zone 
>      last_autovacuum     | timestamp with time zone 
> 
> Particilarly last_vacuum and last_autovacuum can tell us the last time
> of vacuum.  If the n_tup_upd/n_tup_del counts are low, and the  VM set
> bit count is low, it might need vacuuming, though inserts into existing
> pages would complicate that.

I wonder if we shouldn't trigger most vacuums (not analyze!) via unset
fsm bits. Perhaps combined with keeping track of RecentGlobalXmin to
make sure we're not repeatedly checking for work that cannot yet be
done.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: [PERFORM] encouraging index-only scans

От
Bruce Momjian
Дата:
On Sat, Sep  7, 2013 at 12:26:23AM +0200, Andres Freund wrote:
> > So, what should trigger an auto-vacuum vacuum for these workloads? 
> > Rather than activity, which is what normally drives autovacuum, it is
> > lack of activity that should drive it, combined with a high VM cleared
> > bit percentage.
> > 
> > It seems we can use these statistics values:
> > 
> >      n_tup_ins           | bigint                   
> >      n_tup_upd           | bigint                   
> >      n_tup_del           | bigint                   
> >      n_tup_hot_upd       | bigint                   
> >      n_live_tup          | bigint                   
> >      n_dead_tup          | bigint                   
> >      n_mod_since_analyze | bigint                   
> >      last_vacuum         | timestamp with time zone 
> >      last_autovacuum     | timestamp with time zone 
> > 
> > Particilarly last_vacuum and last_autovacuum can tell us the last time
> > of vacuum.  If the n_tup_upd/n_tup_del counts are low, and the  VM set
> > bit count is low, it might need vacuuming, though inserts into existing
> > pages would complicate that.
> 
> I wonder if we shouldn't trigger most vacuums (not analyze!) via unset
> fsm bits. Perhaps combined with keeping track of RecentGlobalXmin to

Fsm bits?  FSM tracks the free space on each page.  How does that help?

> make sure we're not repeatedly checking for work that cannot yet be
> done.

The idea of using RecentGlobalXmin to see how much _work_ has happened
since the last vacuum is interesting, but it doesn't handle read-only
transactions;  I am not sure how they can be tracked.  You make a good
point that 5 minutes passing is meaningless --- you really want to know
how many transactions have completed.  Unfortunately, our virtual
transactions make that hard to compute.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +



Re: [PERFORM] encouraging index-only scans

От
Andres Freund
Дата:
On 2013-09-06 20:29:08 -0400, Bruce Momjian wrote:
> On Sat, Sep  7, 2013 at 12:26:23AM +0200, Andres Freund wrote:
> > I wonder if we shouldn't trigger most vacuums (not analyze!) via unset
> > fsm bits. Perhaps combined with keeping track of RecentGlobalXmin to
> 
> Fsm bits?  FSM tracks the free space on each page.  How does that
> help?

Err. I was way too tired when I wrote that. vm bits.

> > make sure we're not repeatedly checking for work that cannot yet be
> > done.

> The idea of using RecentGlobalXmin to see how much _work_ has happened
> since the last vacuum is interesting, but it doesn't handle read-only
> transactions;  I am not sure how they can be tracked.  You make a good
> point that 5 minutes passing is meaningless --- you really want to know
> how many transactions have completed.

So, what I was pondering went slightly into a different direction:

(lets ignore anti wraparound vacuum for now)

Currently we trigger autovacuums by the assumed number of dead
tuples. In the course of it's action it usually will find that it cannot
remove all dead rows and that it cannot mark everything as all
visible. That's because the xmin horizon hasn't advanced far enough. We
won't trigger another vacuum after that unless there are further dead
tuples in the relation...
One trick if we want to overcome that problem and that we do not handle
setting all visible nicely for INSERT only workloads would be to trigger
vacuum by the amount of pages that are not marked all visible in the vm.

The problem there is that repeatedly scanning a relation that's only 50%
visible where the rest cannot be marked all visible because of a
longrunning pg_dump obivously isn't a good idea. So we need something to
notify us when there's work to be done. Using elapsed time seems like a
bad idea because it doesn't adapt to changing workloads very well and
doesn't work nicely for different relations.

What I was thinking of was to keep track of the oldest xids on pages
that cannot be marked all visible. I haven't thought about the
statistics part much, but what if we binned the space between
[RecentGlobalXmin, ->nextXid) into 10 bins and counted the number of
pages falling into each bin. Then after the vacuum finished we could
compute how far RecentGlobalXmin would have to progress to make another
vacuum worthwile by counting the number of pages from the lowest bin
upwards and use the bin's upper limit as the triggering xid.

Now, we'd definitely need to amend that scheme by something that handles
pages that are newly written to, but it seems something like that
wouldn't be too hard to implement and would make autovacuum more useful.

> Unfortunately, our virtual transactions make that hard to compute.

I don't think they pose too much of a complexity. We basically only have
to care about PGXACT->xmin here and virtual transactions don't change
the handling of that ...

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: [PERFORM] encouraging index-only scans

От
Bruce Momjian
Дата:
On Sat, Sep  7, 2013 at 07:34:49AM +0200, Andres Freund wrote:
> > The idea of using RecentGlobalXmin to see how much _work_ has happened
> > since the last vacuum is interesting, but it doesn't handle read-only
> > transactions;  I am not sure how they can be tracked.  You make a good
> > point that 5 minutes passing is meaningless --- you really want to know
> > how many transactions have completed.
> 
> So, what I was pondering went slightly into a different direction:
> 
> (lets ignore anti wraparound vacuum for now)
> 
> Currently we trigger autovacuums by the assumed number of dead
> tuples. In the course of it's action it usually will find that it cannot
> remove all dead rows and that it cannot mark everything as all
> visible. That's because the xmin horizon hasn't advanced far enough. We
> won't trigger another vacuum after that unless there are further dead
> tuples in the relation...
> One trick if we want to overcome that problem and that we do not handle
> setting all visible nicely for INSERT only workloads would be to trigger
> vacuum by the amount of pages that are not marked all visible in the vm.
> 
> The problem there is that repeatedly scanning a relation that's only 50%
> visible where the rest cannot be marked all visible because of a
> longrunning pg_dump obivously isn't a good idea. So we need something to
> notify us when there's work to be done. Using elapsed time seems like a
> bad idea because it doesn't adapt to changing workloads very well and
> doesn't work nicely for different relations.
> 
> What I was thinking of was to keep track of the oldest xids on pages
> that cannot be marked all visible. I haven't thought about the
> statistics part much, but what if we binned the space between
> [RecentGlobalXmin, ->nextXid) into 10 bins and counted the number of
> pages falling into each bin. Then after the vacuum finished we could
> compute how far RecentGlobalXmin would have to progress to make another
> vacuum worthwile by counting the number of pages from the lowest bin
> upwards and use the bin's upper limit as the triggering xid.
> 
> Now, we'd definitely need to amend that scheme by something that handles
> pages that are newly written to, but it seems something like that
> wouldn't be too hard to implement and would make autovacuum more useful.

That seems very complicated.  I think it would be enough to record the
current xid at the time of the vacuum, and when testing for later
vacuums, if that saved xid is earlier than the RecentGlobalXmin, and
there have been no inserts/updates/deletes, we know that all of
the pages can now be marked as allvisible.

What this doesn't handle is the insert case.  What we could do there is
to record the total free space map space, and if the FSM has not changed
between the last vacuum, we can even vacuum if inserts happened in that
period because we assume the inserts are on new pages.  One problem
there is that the FSM is only updated if an insert will not fit on the
page.  We could record the table size and make sure the table size has
increased before we allow inserts to trigger a vm-set vacuum.

None of this is perfect, but it is better than what we have, and it
would eventually get the VM bits set.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +



Re: [PERFORM] encouraging index-only scans

От
Andres Freund
Дата:
Hi,

On 2013-09-07 12:50:59 -0400, Bruce Momjian wrote:
> That seems very complicated.  I think it would be enough to record the
> current xid at the time of the vacuum, and when testing for later
> vacuums, if that saved xid is earlier than the RecentGlobalXmin, and
> there have been no inserts/updates/deletes, we know that all of
> the pages can now be marked as allvisible.

But that would constantly trigger vacuums, or am I missing something? Or
what are you suggesting this xid to be used for?

What I was talking about was how to evaluate the benefit of triggering
an VACUUM even if there's not a significant amount of new dead rows. If
we know that for a certain xmin horizon there's N pages that potentially
can be cleaned and marked all visible we have a change of making
sensible decisions.
We could just use one bin (i.e. use one cutoff xid as you propose) and
count the number of pages that would be affected. But that would mean
we'd only trigger vacuums very irregularly if you have a workload with
several longrunning transactions. When the oldest of a set of
longrunning transactions finishes you possibly can already clean up a
good bit reducing the chance of further bloat. Otherwise you have to
wait for all of them to finish.

> What this doesn't handle is the insert case.  What we could do there is
> to record the total free space map space, and if the FSM has not changed
> between the last vacuum, we can even vacuum if inserts happened in that
> period because we assume the inserts are on new pages.  One problem
> there is that the FSM is only updated if an insert will not fit on the
> page.  We could record the table size and make sure the table size has
> increased before we allow inserts to trigger a vm-set vacuum.

Not sure why that's better than just counting the number of pages that
have unset vm bits?
Note that you cannot rely on the FSM data to be correct all the time, we
can only use such tricks to trigger vacuums not for the actual operation
in the vacuum.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: [PERFORM] encouraging index-only scans

От
Jeff Janes
Дата:
On Thu, Sep 5, 2013 at 7:00 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Thu, Sep  5, 2013 at 09:10:06PM -0400, Robert Haas wrote:
>> On Thu, Sep 5, 2013 at 8:14 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> > Actually, I now realize it is more complex than that, and worse.  There
>> > are several questions to study to understand when pg_class.relallvisible
>> > is updated (which is used to determine if index-only scans are a good
>> > optimization choice), and when VM all-visible bits are set so heap pages
>> > can be skipped during index-only scans:
>> >
>> >         1)  When are VM bits set:
>> >                 vacuum (non-full)
>> >                 analyze (only some random pages)
>>
>> Analyze doesn't set visibility-map bits.  It only updates statistics
>> about how many are set.
>
> Sorry, yes you are correct.
>
>> > The calculus we should use to determine when we need to run vacuum has
>> > changed with index-only scans, and I am not sure we ever fully addressed
>> > this.
>>
>> Yeah, we didn't.  I think the hard part is figuring out what behavior
>> would be best.  Counting inserts as well as updates and deletes would
>> be a simple approach, but I don't have much confidence in it.  My
>> experience is that having vacuum or analyze kick in during a bulk-load
>> operation is a disaster.  We'd kinda like to come up with a way to
>> make vacuum run after the bulk load is complete, maybe, but how would
>> we identify that time, and there are probably cases where that's not
>> right either.
>
> I am unsure how we have gone a year with index-only scans and I am just
> now learning that it only works well with update/delete workloads or by
> running vacuum manually.  I only found this out going back over January
> emails.  Did other people know this?  Was it not considered a serious
> problem?

I thought it was well known, but maybe I was overly optimistic.  I've
considered IOS to be mostly useful for data mining work on read-mostly
tables, which you would probably vacuum manually after a bulk load.

For transactional tables, I think that trying to keep the vm set-bit
density high enough would be a losing battle.  If we redefined the
nature of the vm so that doing a HOT update would not clear the
visibility bit, perhaps that would change the outcome of this battle.


Cheers,

Jeff



Re: [PERFORM] encouraging index-only scans

От
Amit Kapila
Дата:
On Mon, Sep 9, 2013 at 2:35 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Thu, Sep 5, 2013 at 7:00 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> On Thu, Sep  5, 2013 at 09:10:06PM -0400, Robert Haas wrote:
>>> On Thu, Sep 5, 2013 at 8:14 PM, Bruce Momjian <bruce@momjian.us> wrote:
>>> > Actually, I now realize it is more complex than that, and worse.  There
>>> > are several questions to study to understand when pg_class.relallvisible
>>> > is updated (which is used to determine if index-only scans are a good
>>> > optimization choice), and when VM all-visible bits are set so heap pages
>>> > can be skipped during index-only scans:
>>> >
>>> >         1)  When are VM bits set:
>>> >                 vacuum (non-full)
>>> >                 analyze (only some random pages)
>>>
>>> Analyze doesn't set visibility-map bits.  It only updates statistics
>>> about how many are set.
>>
>> Sorry, yes you are correct.
>>
>>> > The calculus we should use to determine when we need to run vacuum has
>>> > changed with index-only scans, and I am not sure we ever fully addressed
>>> > this.
>>>
>>> Yeah, we didn't.  I think the hard part is figuring out what behavior
>>> would be best.  Counting inserts as well as updates and deletes would
>>> be a simple approach, but I don't have much confidence in it.  My
>>> experience is that having vacuum or analyze kick in during a bulk-load
>>> operation is a disaster.  We'd kinda like to come up with a way to
>>> make vacuum run after the bulk load is complete, maybe, but how would
>>> we identify that time, and there are probably cases where that's not
>>> right either.
>>
>> I am unsure how we have gone a year with index-only scans and I am just
>> now learning that it only works well with update/delete workloads or by
>> running vacuum manually.  I only found this out going back over January
>> emails.  Did other people know this?  Was it not considered a serious
>> problem?
>
> I thought it was well known, but maybe I was overly optimistic.  I've
> considered IOS to be mostly useful for data mining work on read-mostly
> tables, which you would probably vacuum manually after a bulk load.
>
> For transactional tables, I think that trying to keep the vm set-bit
> density high enough would be a losing battle.  If we redefined the
> nature of the vm so that doing a HOT update would not clear the
> visibility bit, perhaps that would change the outcome of this battle.

Wouldn't it make the Vacuum bit in-efficient in the sense that it will
skip some of the pages in which there are only
HOT updates for cleaning dead rows.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [PERFORM] encouraging index-only scans

От
Jeff Janes
Дата:
On Sun, Sep 8, 2013 at 8:49 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, Sep 9, 2013 at 2:35 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> I thought it was well known, but maybe I was overly optimistic.  I've
>> considered IOS to be mostly useful for data mining work on read-mostly
>> tables, which you would probably vacuum manually after a bulk load.
>>
>> For transactional tables, I think that trying to keep the vm set-bit
>> density high enough would be a losing battle.  If we redefined the
>> nature of the vm so that doing a HOT update would not clear the
>> visibility bit, perhaps that would change the outcome of this battle.
>
> Wouldn't it make the Vacuum bit in-efficient in the sense that it will
> skip some of the pages in which there are only
> HOT updates for cleaning dead rows.

Maybe.  But anyone is competent to clean up dead rows from HOT
updates, it is not exclusively vacuum that can do it, like it is for
non-HOT tuples.  So I think any inefficiency would be very small.

Cheers,

Jeff



Re: [PERFORM] encouraging index-only scans

От
Bruce Momjian
Дата:
On Sun, Sep  8, 2013 at 12:47:35AM +0200, Andres Freund wrote:
> Hi,
> 
> On 2013-09-07 12:50:59 -0400, Bruce Momjian wrote:
> > That seems very complicated.  I think it would be enough to record the
> > current xid at the time of the vacuum, and when testing for later
> > vacuums, if that saved xid is earlier than the RecentGlobalXmin, and
> > there have been no inserts/updates/deletes, we know that all of
> > the pages can now be marked as allvisible.
> 
> But that would constantly trigger vacuums, or am I missing something? Or
> what are you suggesting this xid to be used for?

OK, let me give some specifices.  Let's suppose we run a vacuum, and at
the time the current xid counter is 200.  If we later have autovacuum
check if it should vacuum, and there have been no dead rows generated
(no update/delete/abort), if the current RecentGlobalXmin is >200, then
we know that all the transactions that prevented all-visible marking the
last time we ran vacuum has completed.  That leaves us with just
inserts that could prevent all-visible.

If there have been no inserts, we can assume that we can vacuum just the
non-all-visible pages, and even if there are only 10, it just means we
have to read 10 8k blocks, not the entire table, because the all-visible
is set for all the rest of the pages.

Now, if there have been inserts, there are a few cases.  If the inserts
happened in pages that were previously marked all-visible, then we now
have pages that lost all-visible, and we probably don't want to vacuum
those.  Of course, we will not have recorded which pages changed, but
any decrease in the all-visible table count perhaps should have us
avoiding vacuum just to set the visibility map.  We should probably
update our stored vm bit-set count and current xid value so we can check
again later to see if things have sabilized.

If the vm-set bit count is the same as the last time autovacuum checked
the table, then the inserts happened either in the vm-bit cleared pages,
or in new data pages.  If the table size is the same, the inserts
happened in existing pages, so we probably don't want to vacuum.  If the
table size has increased, some inserts went into new pages, so we might
want to vacuum, but I am unclear how many new pages should force a
vacuum.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +



Re: [PERFORM] encouraging index-only scans

От
Amit Kapila
Дата:
On Mon, Sep 9, 2013 at 9:33 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Sun, Sep 8, 2013 at 8:49 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Mon, Sep 9, 2013 at 2:35 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
>>> I thought it was well known, but maybe I was overly optimistic.  I've
>>> considered IOS to be mostly useful for data mining work on read-mostly
>>> tables, which you would probably vacuum manually after a bulk load.
>>>
>>> For transactional tables, I think that trying to keep the vm set-bit
>>> density high enough would be a losing battle.  If we redefined the
>>> nature of the vm so that doing a HOT update would not clear the
>>> visibility bit, perhaps that would change the outcome of this battle.
>>
>> Wouldn't it make the Vacuum bit in-efficient in the sense that it will
>> skip some of the pages in which there are only
>> HOT updates for cleaning dead rows.
>
> Maybe.  But anyone is competent to clean up dead rows from HOT
> updates, it is not exclusively vacuum that can do it, like it is for
> non-HOT tuples.

Yes, that is right, but how about freezing of tuples, delaying that
also might not be good. Also it might not be good for all kind of
scenarios that always foreground operations take care of cleaning up
dead rows leaving very less chance for Vacuum (only when it has to
scan all pages aka anti-wraparound vacuum) to cleanup dead rows.

If we are sure that Vacuum skipping pages in a database where there
are less non-HOT updates and deletes (or mostly inserts and
Hot-updates) is not having any significant impact, then it can be
quite useful for IOS.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [PERFORM] encouraging index-only scans

От
Jim Nasby
Дата:
On 9/7/13 12:34 AM, Andres Freund wrote:
> What I was thinking of was to keep track of the oldest xids on pages
> that cannot be marked all visible. I haven't thought about the
> statistics part much, but what if we binned the space between
> [RecentGlobalXmin, ->nextXid) into 10 bins and counted the number of
> pages falling into each bin. Then after the vacuum finished we could
> compute how far RecentGlobalXmin would have to progress to make another
> vacuum worthwile by counting the number of pages from the lowest bin
> upwards and use the bin's upper limit as the triggering xid.
>
> Now, we'd definitely need to amend that scheme by something that handles
> pages that are newly written to, but it seems something like that
> wouldn't be too hard to implement and would make autovacuum more useful.

If we're binning by XID though you're still dependent on scanning to build that range. Anything that creates dead
tupleswill also be be problematic, because it's going to unset VM bits on you, and you won't know if it's due to
INSERTSor dead tuples.
 

What if we maintained XID stats for ranges of pages in a separate fork? Call it the XidStats fork. Presumably the
interestingpieces would be min(xmin) and max(xmax) for pages that aren't all visible. If we did that at a granularity
of,say, 1MB worth of pages[1] we're talking 8 bytes per MB, or 1 XidStats page per GB of heap. (Worst case alignment
bumpsthat up to 2 XidStats pages per GB of heap.)
 

Having both min(xmin) and max(xmax) for a range of pages would allow for very granular operation of vacuum. Instead of
hittingevery heap page that's not all-visible, it would only hit those that are not visible and where min(xmin) or
max(xmax)were less than RecentGlobalXmin.
 

One concern is maintaining this data. A key point is that we don't have to update it every time it changes; if the
min/maxare only off by a few hundred XIDs there's no point to updating the XidStats page. We'd obviously need the
XidStatspage to be read in, but even a 100GB heap would be either 100 or 200 XidStats pages.
 

[1]: There's a trade-off between how much space we 'waste' on XidStats pages and how many heap pages we potentially
haveto scan in the range. We'd want to see what this looked like in a real system. The thing that helps here is that
regardlessof what the stats for a particular heap range are, you're not going to scan any pages in that range that are
alreadyall-visible.
 
-- 
Jim C. Nasby, Data Architect                       jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net



Re: [PERFORM] encouraging index-only scans

От
Andres Freund
Дата:
On 2013-09-17 11:37:35 -0500, Jim Nasby wrote:
> On 9/7/13 12:34 AM, Andres Freund wrote:
> >What I was thinking of was to keep track of the oldest xids on pages
> >that cannot be marked all visible. I haven't thought about the
> >statistics part much, but what if we binned the space between
> >[RecentGlobalXmin, ->nextXid) into 10 bins and counted the number of
> >pages falling into each bin. Then after the vacuum finished we could
> >compute how far RecentGlobalXmin would have to progress to make another
> >vacuum worthwile by counting the number of pages from the lowest bin
> >upwards and use the bin's upper limit as the triggering xid.
> >
> >Now, we'd definitely need to amend that scheme by something that handles
> >pages that are newly written to, but it seems something like that
> >wouldn't be too hard to implement and would make autovacuum more useful.
> 
> If we're binning by XID though you're still dependent on scanning to
> build that range. Anything that creates dead tuples will also be be
> problematic, because it's going to unset VM bits on you, and you won't
> know if it's due to INSERTS or dead tuples.

I don't think that's all that much of a problem. In the end, it's a good
idea to look at pages shortly after they have been filled/been
touched. Setting hint bits at that point avoid repetitive IO and in many
cases we will already be able to mark them all-visible.
The binning idea was really about sensibly estimating whether a new scan
already makes sense which is currently very hard to judge.

I generally think the current logic for triggering VACUUMs via
autovacuum doesn't really make all that much sense in the days where we
have the visibility map.

> What if we maintained XID stats for ranges of pages in a separate
> fork? Call it the XidStats fork. Presumably the interesting pieces
> would be min(xmin) and max(xmax) for pages that aren't all visible. If
> we did that at a granularity of, say, 1MB worth of pages[1] we're
> talking 8 bytes per MB, or 1 XidStats page per GB of heap. (Worst case
> alignment bumps that up to 2 XidStats pages per GB of heap.)

Yes, I have thought about similar ideas as well, but I came to the
conclusion that it's not worth it. If you want to make the boundaries
precise and the xidstats fork small, you're introducing new contention
points because every DML will need to make sure it's correct.
Also, the amount of code that would require seems to be bigger than
justified by the increase of precision when to vacuum.


Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: [PERFORM] encouraging index-only scans

От
Jim Nasby
Дата:
On 9/17/13 6:10 PM, Andres Freund wrote:
>> What if we maintained XID stats for ranges of pages in a separate
>> >fork? Call it the XidStats fork. Presumably the interesting pieces
>> >would be min(xmin) and max(xmax) for pages that aren't all visible. If
>> >we did that at a granularity of, say, 1MB worth of pages[1] we're
>> >talking 8 bytes per MB, or 1 XidStats page per GB of heap. (Worst case
>> >alignment bumps that up to 2 XidStats pages per GB of heap.)

> Yes, I have thought about similar ideas as well, but I came to the
> conclusion that it's not worth it. If you want to make the boundaries
> precise and the xidstats fork small, you're introducing new contention
> points because every DML will need to make sure it's correct.

Actually, that's not true... the XidStats only need to be "relatively" precise. IE: within a few hundred or thousand
XIDs.

So for example, you'd only need to attempt an update if the XID already stored was more than a few
hundred/thousand/whateverXIDs away from your XID. If it's any closer don't even bother to update.
 

That still leaves potential for thundering herd on the fork buffer lock if you've got a ton of DML on one table across
abunch of backends, but there might be other ways around that. For example, if you know you can update the XID with a
CPU-atomicinstruction, you don't need to lock the page.
 

> Also, the amount of code that would require seems to be bigger than
> justified by the increase of precision when to vacuum.

That's very possibly true. I haven't had a chance to see how much VM bits help reduce vacuum overhead yet, so I don't
haveanything to add on this front. Perhaps others might.
 
-- 
Jim C. Nasby, Data Architect                       jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net



Re: [PERFORM] encouraging index-only scans

От
Robert Haas
Дата:
On Tue, Sep 17, 2013 at 7:10 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> I generally think the current logic for triggering VACUUMs via
> autovacuum doesn't really make all that much sense in the days where we
> have the visibility map.

Right now, whether or not to autovacuum is the rest of a two-pronged
test.  The first prong is based on number of updates and deletes
relative to table size; that triggers a regular autovacuum.  The
second prong is based on age(relfrozenxid) and triggers a
non-page-skipping vacuum (colloquially, an anti-wraparound vacuum).

The typical case in which this doesn't work out well is when the table
has a lot of inserts but few or no updates and deletes.  So I propose
that we change the first prong to count inserts as well as updates and
deletes when deciding whether it needs to vacuum the table.  We
already use that calculation to decide whether to auto-analyze, so it
wouldn't be very novel.   We know that the work of marking pages
all-visible will need to be done at some point, and doing it sooner
will result in doing it in smaller batches, which seems generally
good.

However, I do have one concern: it might lead to excessive
index-vacuuming.  Right now, we skip the index vac step only if there
ZERO dead tuples are found during the heap scan.  Even one dead tuple
(or line pointer) will cause an index vac cycle, which may easily be
excessive.  So I further propose that we introduce a threshold for
index-vac; so that we only do index vac cycle if the number of dead
tuples exceeds, say 0.1% of the table size.

Thoughts?  Let the hurling of rotten tomatoes begin.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] encouraging index-only scans

От
Kevin Grittner
Дата:
Robert Haas <robertmhaas@gmail.com> wrote:

> Right now, whether or not to autovacuum is the rest of a two-pronged

> test.  The first prong is based on number of updates and deletes
> relative to table size; that triggers a regular autovacuum.  The
> second prong is based on age(relfrozenxid) and triggers a
> non-page-skipping vacuum (colloquially, an anti-wraparound vacuum).
>
> The typical case in which this doesn't work out well is when the table
> has a lot of inserts but few or no updates and deletes.  So I propose
> that we change the first prong to count inserts as well as updates and
> deletes when deciding whether it needs to vacuum the table.  We
> already use that calculation to decide whether to auto-analyze, so it
> wouldn't be very novel.   We know that the work of marking pages
> all-visible will need to be done at some point, and doing it sooner
> will result in doing it in smaller batches, which seems generally
> good.
>
> However, I do have one concern: it might lead to excessive
> index-vacuuming.  Right now, we skip the index vac step only if there
> ZERO dead tuples are found during the heap scan.  Even one dead tuple
> (or line pointer) will cause an index vac cycle, which may easily be
> excessive.  So I further propose that we introduce a threshold for
> index-vac; so that we only do index vac cycle if the number of dead
> tuples exceeds, say 0.1% of the table size.

+1  I've been thinking of suggesting something along the same lines,
for the same reasons.

 
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PERFORM] encouraging index-only scans

От
Andres Freund
Дата:
On 2013-09-19 14:39:43 -0400, Robert Haas wrote:
> On Tue, Sep 17, 2013 at 7:10 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> > I generally think the current logic for triggering VACUUMs via
> > autovacuum doesn't really make all that much sense in the days where we
> > have the visibility map.
> 
> Right now, whether or not to autovacuum is the rest of a two-pronged
> test.  The first prong is based on number of updates and deletes
> relative to table size; that triggers a regular autovacuum.  The
> second prong is based on age(relfrozenxid) and triggers a
> non-page-skipping vacuum (colloquially, an anti-wraparound vacuum).

And I have some hopes we can get rid of that in 9.4 (that alone would be
worth a bump to 10.0 ;)). I really like Heikki's patch, even if I am
envious that I didn't have the idea :P. Although it needs quite a bit of
work to be ready.

> The typical case in which this doesn't work out well is when the table
> has a lot of inserts but few or no updates and deletes.  So I propose
> that we change the first prong to count inserts as well as updates and
> deletes when deciding whether it needs to vacuum the table.  We
> already use that calculation to decide whether to auto-analyze, so it
> wouldn't be very novel.   We know that the work of marking pages
> all-visible will need to be done at some point, and doing it sooner
> will result in doing it in smaller batches, which seems generally
> good.

Yes, that's a desperately needed change.

The reason I suggested keeping track of the xids of unremovable tuples
is that the current logic doesn't handle that at all. We just
unconditionally set n_dead_tuples to zero after a vacuum even if not a
single row could actually be cleaned out. Which has the effect that we
will not start a vacuum until enough bloat (or after changing this, new
inserts) has collected to start vacuum anew. Which then will do twice
the work.

Resetting n_dead_tuples to the actual remaining dead tuples wouldn't do
much good either - we would just immediately trigger a new vacuum the
next time we check, even if the xmin horizon is still the same.

> However, I do have one concern: it might lead to excessive
> index-vacuuming.  Right now, we skip the index vac step only if there
> ZERO dead tuples are found during the heap scan.  Even one dead tuple
> (or line pointer) will cause an index vac cycle, which may easily be
> excessive.  So I further propose that we introduce a threshold for
> index-vac; so that we only do index vac cycle if the number of dead
> tuples exceeds, say 0.1% of the table size.

Yes, that's a pretty valid concern. But we can't really do it that
easily. a) We can only remove dead line pointers when we know there's no
index pointing to it anymore. Which we only know after the index has
been removed. b) We cannot check the validity of an index pointer if
there's no heap tuple for it. Sure, we could check whether we're
pointing to a dead line pointer, but the random io costs of that are
prohibitive.
Now, we could just mark line pointers as dead and not mark that page as
all-visible and pick it up again on the next vacuum cycle. But that
would suck long-term.

I think the only real solution here is to store removed tuples tids
(i.e. items where we've marked as dead) somewhere. Whenever we've found
sufficient tuples to-be-removed from indexes we do phase 2.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: [PERFORM] encouraging index-only scans

От
Robert Haas
Дата:
On Thu, Sep 19, 2013 at 6:59 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> The reason I suggested keeping track of the xids of unremovable tuples
> is that the current logic doesn't handle that at all. We just
> unconditionally set n_dead_tuples to zero after a vacuum even if not a
> single row could actually be cleaned out. Which has the effect that we
> will not start a vacuum until enough bloat (or after changing this, new
> inserts) has collected to start vacuum anew. Which then will do twice
> the work.
>
> Resetting n_dead_tuples to the actual remaining dead tuples wouldn't do
> much good either - we would just immediately trigger a new vacuum the
> next time we check, even if the xmin horizon is still the same.

One idea would be to store the xmin we used for the vacuum somewhere.
Could we make that part of the pgstats infrastructure?  Or store it in
a new pg_class column?  Then we could avoid re-triggering until it
advances.  Or, maybe better, we could remember the oldest XID that we
weren't able to remove due to xmin considerations and re-trigger when
the horizon passes that point.

>> However, I do have one concern: it might lead to excessive
>> index-vacuuming.  Right now, we skip the index vac step only if there
>> ZERO dead tuples are found during the heap scan.  Even one dead tuple
>> (or line pointer) will cause an index vac cycle, which may easily be
>> excessive.  So I further propose that we introduce a threshold for
>> index-vac; so that we only do index vac cycle if the number of dead
>> tuples exceeds, say 0.1% of the table size.
>
> Yes, that's a pretty valid concern. But we can't really do it that
> easily. a) We can only remove dead line pointers when we know there's no
> index pointing to it anymore. Which we only know after the index has
> been removed. b) We cannot check the validity of an index pointer if
> there's no heap tuple for it. Sure, we could check whether we're
> pointing to a dead line pointer, but the random io costs of that are
> prohibitive.
> Now, we could just mark line pointers as dead and not mark that page as
> all-visible and pick it up again on the next vacuum cycle. But that
> would suck long-term.
>
> I think the only real solution here is to store removed tuples tids
> (i.e. items where we've marked as dead) somewhere. Whenever we've found
> sufficient tuples to-be-removed from indexes we do phase 2.

I don't really agree with that.  Yes, we could make that change, and
yes, it might be better than what we're doing today, but it would be
complex and have its own costs.  And it doesn't mean that lesser steps
are without merit.  A vacuum pass over the heap buys us a LOT of space
for reuse even without touching the indexes: we don't reclaim the line
pointers, but we do reclaim the space for the tuples themselves, which
is a big deal.  So being able to do that more frequently without
causing problems has a lot of value, I think.  The fact that we get to
set all-visible bits along the way makes future vacuums cheaper, and
makes index scans work better, so that's good too.  And the first
vacuum to find a dead tuple will dirty the page to truncate it to a
dead line pointer, while any subsequent revisits prior to the index
vac cycle will only examine the page without dirtying it.  All in all,
just leaving the page to be caught be a future vacuum doesn't seem
that bad to me, at least for a first cut.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] encouraging index-only scans

От
Andres Freund
Дата:
On 2013-09-20 11:30:26 -0400, Robert Haas wrote:
> On Thu, Sep 19, 2013 at 6:59 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> > The reason I suggested keeping track of the xids of unremovable tuples
> > is that the current logic doesn't handle that at all. We just
> > unconditionally set n_dead_tuples to zero after a vacuum even if not a
> > single row could actually be cleaned out. Which has the effect that we
> > will not start a vacuum until enough bloat (or after changing this, new
> > inserts) has collected to start vacuum anew. Which then will do twice
> > the work.
> >
> > Resetting n_dead_tuples to the actual remaining dead tuples wouldn't do
> > much good either - we would just immediately trigger a new vacuum the
> > next time we check, even if the xmin horizon is still the same.
> 
> One idea would be to store the xmin we used for the vacuum somewhere.
> Could we make that part of the pgstats infrastructure?  Or store it in
> a new pg_class column?  Then we could avoid re-triggering until it
> advances.  Or, maybe better, we could remember the oldest XID that we
> weren't able to remove due to xmin considerations and re-trigger when
> the horizon passes that point.

I suggested a slightly more complex variant of this upthread:
http://archives.postgresql.org/message-id/20130907053449.GE626072%40alap2.anarazel.de

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: [PERFORM] encouraging index-only scans

От
Robert Haas
Дата:
On Fri, Sep 20, 2013 at 11:51 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2013-09-20 11:30:26 -0400, Robert Haas wrote:
>> On Thu, Sep 19, 2013 at 6:59 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>> > The reason I suggested keeping track of the xids of unremovable tuples
>> > is that the current logic doesn't handle that at all. We just
>> > unconditionally set n_dead_tuples to zero after a vacuum even if not a
>> > single row could actually be cleaned out. Which has the effect that we
>> > will not start a vacuum until enough bloat (or after changing this, new
>> > inserts) has collected to start vacuum anew. Which then will do twice
>> > the work.
>> >
>> > Resetting n_dead_tuples to the actual remaining dead tuples wouldn't do
>> > much good either - we would just immediately trigger a new vacuum the
>> > next time we check, even if the xmin horizon is still the same.
>>
>> One idea would be to store the xmin we used for the vacuum somewhere.
>> Could we make that part of the pgstats infrastructure?  Or store it in
>> a new pg_class column?  Then we could avoid re-triggering until it
>> advances.  Or, maybe better, we could remember the oldest XID that we
>> weren't able to remove due to xmin considerations and re-trigger when
>> the horizon passes that point.
>
> I suggested a slightly more complex variant of this upthread:
> http://archives.postgresql.org/message-id/20130907053449.GE626072%40alap2.anarazel.de

Ah, yeah.  Sorry, I forgot about that.

Personally, I'd try the simpler version first.  But I think whoever
takes the time to implement this will probably get to pick.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] encouraging index-only scans

От
Bruce Momjian
Дата:
On Thu, Sep 19, 2013 at 02:39:43PM -0400, Robert Haas wrote:
> Right now, whether or not to autovacuum is the rest of a two-pronged
> test.  The first prong is based on number of updates and deletes
> relative to table size; that triggers a regular autovacuum.  The
> second prong is based on age(relfrozenxid) and triggers a
> non-page-skipping vacuum (colloquially, an anti-wraparound vacuum).
> 
> The typical case in which this doesn't work out well is when the table
> has a lot of inserts but few or no updates and deletes.  So I propose
> that we change the first prong to count inserts as well as updates and
> deletes when deciding whether it needs to vacuum the table.  We
> already use that calculation to decide whether to auto-analyze, so it
> wouldn't be very novel.   We know that the work of marking pages
> all-visible will need to be done at some point, and doing it sooner
> will result in doing it in smaller batches, which seems generally
> good.
> 
> However, I do have one concern: it might lead to excessive
> index-vacuuming.  Right now, we skip the index vac step only if there
> ZERO dead tuples are found during the heap scan.  Even one dead tuple
> (or line pointer) will cause an index vac cycle, which may easily be
> excessive.  So I further propose that we introduce a threshold for
> index-vac; so that we only do index vac cycle if the number of dead
> tuples exceeds, say 0.1% of the table size.
> 
> Thoughts?  Let the hurling of rotten tomatoes begin.

Robert, where are we on this?  Should I post a patch?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: [PERFORM] encouraging index-only scans

От
Robert Haas
Дата:
On Fri, Jan 31, 2014 at 10:22 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Thu, Sep 19, 2013 at 02:39:43PM -0400, Robert Haas wrote:
>> Right now, whether or not to autovacuum is the rest of a two-pronged
>> test.  The first prong is based on number of updates and deletes
>> relative to table size; that triggers a regular autovacuum.  The
>> second prong is based on age(relfrozenxid) and triggers a
>> non-page-skipping vacuum (colloquially, an anti-wraparound vacuum).
>>
>> The typical case in which this doesn't work out well is when the table
>> has a lot of inserts but few or no updates and deletes.  So I propose
>> that we change the first prong to count inserts as well as updates and
>> deletes when deciding whether it needs to vacuum the table.  We
>> already use that calculation to decide whether to auto-analyze, so it
>> wouldn't be very novel.   We know that the work of marking pages
>> all-visible will need to be done at some point, and doing it sooner
>> will result in doing it in smaller batches, which seems generally
>> good.
>>
>> However, I do have one concern: it might lead to excessive
>> index-vacuuming.  Right now, we skip the index vac step only if there
>> ZERO dead tuples are found during the heap scan.  Even one dead tuple
>> (or line pointer) will cause an index vac cycle, which may easily be
>> excessive.  So I further propose that we introduce a threshold for
>> index-vac; so that we only do index vac cycle if the number of dead
>> tuples exceeds, say 0.1% of the table size.
>>
>> Thoughts?  Let the hurling of rotten tomatoes begin.
>
> Robert, where are we on this?  Should I post a patch?

I started working on this at one point but didn't finish the
implementation, let alone the no-doubt-onerous performance testing
that will be needed to validate whatever we come up with.  It would be
really easy to cause serious regressions with ill-considered changes
in this area, and I don't think many people here have the bandwidth
for a detailed study of all the different workloads that might be
affected here right this very minute.  More generally, you're sending
all these pings three weeks after the deadline for CF4.  I don't think
that's a good time to encourage people to *start* revising old
patches, or writing new ones.

I've also had some further thoughts about the right way to drive
vacuum scheduling.  I think what we need to do is tightly couple the
rate at which we're willing to do vacuuming to the rate at which we're
incurring "vacuum debt".  That is, if we're creating 100kB/s of pages
needing vacuum, we vacuum at 2-3MB/s (with default settings).  If
we're creating 10MB/s of pages needing vacuum, we *still* vacuum at
2-3MB/s.  Not shockingly, vacuum gets behind, the database bloats, and
everything goes to heck.  The rate of vacuuming needs to be tied
somehow to the rate at which we're creating stuff that needs to be
vacuumed.  Right now we don't even have a way to measure that, let
alone auto-regulate the aggressiveness of autovacuum on that basis.

Similarly, for marking of pages as all-visible, we currently make the
same decision whether the relation is getting index-scanned (in which
case the failure to mark those pages all-visible may be suppressing
the use of index scans or making them less effective) or whether it's
not being accessed at all (in which case vacuuming it won't help
anything, and might hurt by pushing other pages out of cache).  Again,
if we had better statistics, we could measure this - counting heap
fetches for actual index-only scans plus heap fetches for index scans
that might have been planned index-only scans but for the relation
having too few all-visible pages doesn't sound like an impossible
metric to gather.  And if we had that, we could use it to trigger
vacuuming, instead of guessing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] encouraging index-only scans

От
Jeff Janes
Дата:
On Mon, Feb 3, 2014 at 8:55 AM, Robert Haas <robertmhaas@gmail.com> wrote:
 
I've also had some further thoughts about the right way to drive
vacuum scheduling.  I think what we need to do is tightly couple the
rate at which we're willing to do vacuuming to the rate at which we're
incurring "vacuum debt".  That is, if we're creating 100kB/s of pages
needing vacuum, we vacuum at 2-3MB/s (with default settings).  

If we can tolerate 2-3MB/s without adverse impact on other work, then we can tolerate it.  Do we gain anything substantial by sand-bagging it?

 
If
we're creating 10MB/s of pages needing vacuum, we *still* vacuum at
2-3MB/s.  Not shockingly, vacuum gets behind, the database bloats, and
everything goes to heck.  

(Your reference to bloat made be me think your comments here are about vacuuming in general, not specific to IOS.  If that isn't the case, then please ignore.)

If we can only vacuum at 2-3MB/s without adversely impacting other activity, but we are creating 10MB/s of future vacuum need, then there are basically two possibilities I can think of.  Either the 10MB/s represents a spike, and vacuum should tolerate it and hope to catch up on the debt later.  Or it represents a new permanent condition, in which case I bought too few hard drives for the work load, and no scheduling decision that autovacuum can make will save me from my folly. Perhaps there is some middle ground between those possibilities, but I don't see room for much middle ground.  

I guess there might be entirely different possibilities not between those two; for example, I don't realize I'm doing something that is generating 10MB/s of vacuum debt, and would like to have this thing I'm doing be automatically throttled to the point it doesn't interfere with other processes (either directly, or indirectly by bloat)

 
The rate of vacuuming needs to be tied
somehow to the rate at which we're creating stuff that needs to be
vacuumed.  Right now we don't even have a way to measure that, let
alone auto-regulate the aggressiveness of autovacuum on that basis.

There is the formula used to decide when a table gets vacuumed.  Isn't the time delta in this formula a measure of how fast we are creating stuff that needs to be vacuumed for bloat reasons?  Is your objection that it doesn't include other reasons we might want to vacuum, or that it just doesn't work very well, or that is not explicitly exposed?


 
Similarly, for marking of pages as all-visible, we currently make the
same decision whether the relation is getting index-scanned (in which
case the failure to mark those pages all-visible may be suppressing
the use of index scans or making them less effective) or whether it's
not being accessed at all (in which case vacuuming it won't help
anything, and might hurt by pushing other pages out of cache).

If it is not getting accessed at all because the database is not very active right now, that would be the perfect time to vacuum it.  Between "I can accurately project current patterns of (in)activity into the future" and "People don't build large tables just to ignore them forever", I think the latter is more likely to be true.  If the system is busy but this particular table is not, then that would be a better reason to de-prioritise vacuuming that table.  But can this degree of reasoning really be implemented in a practical way?  In core?
 
 Again,
if we had better statistics, we could measure this - counting heap
fetches for actual index-only scans plus heap fetches for index scans
that might have been planned index-only scans but for the relation
having too few all-visible pages doesn't sound like an impossible
metric to gather.

My experience has been that if too few pages are all visible, it generally switches to a seq scan, not an index scan of a different index.  But many things that are semantically possible to be index-only-scans would never be planned that way even if allvisible were 100%, so I think it would have to do two planning passes, one with the real allvisible, and a hypothetical one with allvisible set to 100%.  And then there is the possibility that, while a high allvisible would be useful, the table is so active that no amount of vacuuming could ever keep it high.

Cheers,

Jeff

Re: [PERFORM] encouraging index-only scans

От
Robert Haas
Дата:
First, thanks for this thoughtful email.

On Tue, Feb 4, 2014 at 7:14 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Mon, Feb 3, 2014 at 8:55 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I've also had some further thoughts about the right way to drive
>> vacuum scheduling.  I think what we need to do is tightly couple the
>> rate at which we're willing to do vacuuming to the rate at which we're
>> incurring "vacuum debt".  That is, if we're creating 100kB/s of pages
>> needing vacuum, we vacuum at 2-3MB/s (with default settings).
>
> If we can tolerate 2-3MB/s without adverse impact on other work, then we can
> tolerate it.  Do we gain anything substantial by sand-bagging it?

No.  The problem is the other direction.

>> If
>> we're creating 10MB/s of pages needing vacuum, we *still* vacuum at
>> 2-3MB/s.  Not shockingly, vacuum gets behind, the database bloats, and
>> everything goes to heck.
>
> (Your reference to bloat made be me think your comments here are about
> vacuuming in general, not specific to IOS.  If that isn't the case, then
> please ignore.)
>
> If we can only vacuum at 2-3MB/s without adversely impacting other activity,
> but we are creating 10MB/s of future vacuum need, then there are basically
> two possibilities I can think of.  Either the 10MB/s represents a spike, and
> vacuum should tolerate it and hope to catch up on the debt later.  Or it
> represents a new permanent condition, in which case I bought too few hard
> drives for the work load, and no scheduling decision that autovacuum can
> make will save me from my folly. Perhaps there is some middle ground between
> those possibilities, but I don't see room for much middle ground.
>
> I guess there might be entirely different possibilities not between those
> two; for example, I don't realize I'm doing something that is generating
> 10MB/s of vacuum debt, and would like to have this thing I'm doing be
> automatically throttled to the point it doesn't interfere with other
> processes (either directly, or indirectly by bloat)

The underlying issue here is that, in order for there not to be a
problem, a user needs to configure their autovacuum processes to
vacuum at a rate which is greater than or equal to the average rate at
which vacuum debt is being created.  If they don't, they get runaway
bloat.  But to do that, they need to know at what rate they are
creating vacuum debt, which is almost impossible to figure out right
now; and even if they did know it, they'd then need to figure out what
vacuum cost delay settings would allow vacuuming at a rate sufficient
to keep up, which isn't quite as hard to estimate but certainly
involves nontrivial math.  So a lot of people have this set wrong, and
it's not easy to get it right except by frobbing the settings until
you find something that works well in practice.

Also, a whole *lot* of problems in this area are caused by cases where
the rate at which vacuum debt is being created *changes*.  Autovacuum
is keeping up, but then you have either a load spike or just a gradual
increase in activity and it doesn't keep up any more.  You don't
necessarily notice right away, and by the time you do there's no easy
way to recover.  If you've got a table with lots of dead tuples in it,
but it's also got enough internal freespace to satisfy as many inserts
and updates as are happening, then it's possibly reasonable to put off
vacuuming in the hopes that system load will be lower at some time in
the future.  But if you've got a table with lots of dead tuples in it,
and you're extending it to create internal freespace instead of
vacuuming it, it is highly like that you are not doing what will make
the user most happy.  Even if vacuuming that table slows down
foreground activity quite badly, it is probably better than
accumulating an arbitrary amount of bloat.

>> The rate of vacuuming needs to be tied
>> somehow to the rate at which we're creating stuff that needs to be
>> vacuumed.  Right now we don't even have a way to measure that, let
>> alone auto-regulate the aggressiveness of autovacuum on that basis.
>
> There is the formula used to decide when a table gets vacuumed.  Isn't the
> time delta in this formula a measure of how fast we are creating stuff that
> needs to be vacuumed for bloat reasons?  Is your objection that it doesn't
> include other reasons we might want to vacuum, or that it just doesn't work
> very well, or that is not explicitly exposed?

AFAICT, the problem isn't when the table gets vacuumed so much as *how
fast* it gets vacuumed.  The autovacuum algorithm does a fine job
selecting tables for vacuuming, for the most part.  There are problems
with insert-only tables and sometimes for large tables the default
threshold (0.20) is too high, but it's not terrible.  However, the
limit on the overall rate of vacuuming activity to 2-3MB/s regardless
of how fast we're creating vacuum debt is a big problem.

>> Similarly, for marking of pages as all-visible, we currently make the
>> same decision whether the relation is getting index-scanned (in which
>> case the failure to mark those pages all-visible may be suppressing
>> the use of index scans or making them less effective) or whether it's
>> not being accessed at all (in which case vacuuming it won't help
>> anything, and might hurt by pushing other pages out of cache).
>
> If it is not getting accessed at all because the database is not very active
> right now, that would be the perfect time to vacuum it.  Between "I can
> accurately project current patterns of (in)activity into the future" and
> "People don't build large tables just to ignore them forever", I think the
> latter is more likely to be true.  If the system is busy but this particular
> table is not, then that would be a better reason to de-prioritise vacuuming
> that table.  But can this degree of reasoning really be implemented in a
> practical way?  In core?

I don't know.  But the algorithm for determining the rate at which we
vacuum (2-3MB/s) could hardly be stupider than it is right now.  It's
almost a constant, and to the extent that it's not a constant, it
depends on the wrong things.  The fact that getting this perfectly
right is unlikely to be easy, and may be altogether impossible,
shouldn't discourage us from trying to come up with something better
than what we have now.

> My experience has been that if too few pages are all visible, it generally
> switches to a seq scan, not an index scan of a different index.  But many
> things that are semantically possible to be index-only-scans would never be
> planned that way even if allvisible were 100%, so I think it would have to
> do two planning passes, one with the real allvisible, and a hypothetical one
> with allvisible set to 100%.  And then there is the possibility that, while
> a high allvisible would be useful, the table is so active that no amount of
> vacuuming could ever keep it high.

Yeah, those are all good points.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] encouraging index-only scans

От
Bruce Momjian
Дата:
On Mon, Feb  3, 2014 at 11:55:34AM -0500, Robert Haas wrote:
> > Robert, where are we on this?  Should I post a patch?
> 
> I started working on this at one point but didn't finish the
> implementation, let alone the no-doubt-onerous performance testing
> that will be needed to validate whatever we come up with.  It would be
> really easy to cause serious regressions with ill-considered changes
> in this area, and I don't think many people here have the bandwidth
> for a detailed study of all the different workloads that might be
> affected here right this very minute.  More generally, you're sending
> all these pings three weeks after the deadline for CF4.  I don't think
> that's a good time to encourage people to *start* revising old
> patches, or writing new ones.
> 
> I've also had some further thoughts about the right way to drive
> vacuum scheduling.  I think what we need to do is tightly couple the

I understand the problems with vacuum scheduling, but I was trying to
address _just_ the insert-only workload problem for index-only scans.

Right now, as I remember, only vacuum sets the visibility bits.  If we
don't want to make vacuum trigger for insert-only workloads, can we set
pages all-visible more often?  

Is there a reason that a sequential scan, which does do page pruning,
doesn't set the visibility bits too?  Or does it?  Can an non-index-only
index scan that finds the heap tuple all-visible and the page not 
all-visible check the other items on the page to see if the page can be
marked all-visible?  Does analyze set pages all-visible?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: [PERFORM] encouraging index-only scans

От
Robert Haas
Дата:
On Tue, Feb 11, 2014 at 10:56 AM, Bruce Momjian <bruce@momjian.us> wrote:
> Right now, as I remember, only vacuum sets the visibility bits.  If we
> don't want to make vacuum trigger for insert-only workloads, can we set
> pages all-visible more often?
>
> Is there a reason that a sequential scan, which does do page pruning,
> doesn't set the visibility bits too?  Or does it?  Can an non-index-only
> index scan that finds the heap tuple all-visible and the page not
> all-visible check the other items on the page to see if the page can be
> marked all-visible?  Does analyze set pages all-visible?

A sequential scan will set hint bits and will prune the page, but
pruning the page doesn't ever mark it all-visible; that logic is
entirely in vacuum.  If that could be made cheap enough to be
negligible, it might well be worth doing in heap_page_prune().  I
think there might be a way to do that, but it's a bit tricky because
the pruning logic iterates over the page in a somewhat complex way,
not just a straightforward scan of all the item pointers the way the
existing logic doesn't.  It would be pretty cool if we could just use
a bit out of the heap-prune xlog record to indicate whether the
all-visible bit should be set; then we'd gain the benefit of marking
things all-visible much more often without needing vacuum.

That doesn't help insert-only tables much, though, because those won't
require pruning.  We set hint bits (which dirties the page) but
currently don't write WAL.  We'd have to change that to set the
all-visible bit when scanning such a table, and that would be
expensive.  :-(

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] encouraging index-only scans

От
Bruce Momjian
Дата:
On Tue, Feb 11, 2014 at 11:28:36AM -0500, Robert Haas wrote:
> A sequential scan will set hint bits and will prune the page, but
> pruning the page doesn't ever mark it all-visible; that logic is
> entirely in vacuum.  If that could be made cheap enough to be
> negligible, it might well be worth doing in heap_page_prune().  I
> think there might be a way to do that, but it's a bit tricky because
> the pruning logic iterates over the page in a somewhat complex way,
> not just a straightforward scan of all the item pointers the way the
> existing logic doesn't.  It would be pretty cool if we could just use
> a bit out of the heap-prune xlog record to indicate whether the
> all-visible bit should be set; then we'd gain the benefit of marking
> things all-visible much more often without needing vacuum.
> 
> That doesn't help insert-only tables much, though, because those won't
> require pruning.  We set hint bits (which dirties the page) but
> currently don't write WAL.  We'd have to change that to set the
> all-visible bit when scanning such a table, and that would be
> expensive.  :-(

Yes, that pretty much sums it up.  We introduced index-only scans in 9.2
(2012) but they still seem to be not usable for insert-only workloads
two years later.  Based on current progress, it doesn't look like this
will be corrected until 9.5 (2015).  I am kind of confused why this has
not generated more urgency.

I guess my question is what approach do we want to take to fixing this? 
If we are doing pruning, aren't we emitting WAL?  You are right that for
an insert-only workload, we aren't going to prune, but if pruning WAL
overhead is acceptable for a sequential scan, isn't index-only
page-all-visible WAL overhead acceptable?

Do we want to track the number of inserts in statistics and trigger an
auto-vacuum after a specified number of inserts?  The problem there is
that we really don't need to do any index cleanup, which is what vacuum
typically does --- we just want to scan the table and set the
all-visible bits, so that approach seems non-optimal.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: [PERFORM] encouraging index-only scans

От
Andres Freund
Дата:
On 2014-02-11 12:12:13 -0500, Bruce Momjian wrote:
> Yes, that pretty much sums it up.  We introduced index-only scans in 9.2
> (2012) but they still seem to be not usable for insert-only workloads
> two years later.  Based on current progress, it doesn't look like this
> will be corrected until 9.5 (2015).  I am kind of confused why this has
> not generated more urgency.

I think this largely FUD. They are hugely beneficial in some scenarios
and less so in others. Just like lots of other features we have.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: [PERFORM] encouraging index-only scans

От
Bruce Momjian
Дата:
On Tue, Feb 11, 2014 at 06:54:10PM +0100, Andres Freund wrote:
> On 2014-02-11 12:12:13 -0500, Bruce Momjian wrote:
> > Yes, that pretty much sums it up.  We introduced index-only scans in 9.2
> > (2012) but they still seem to be not usable for insert-only workloads
> > two years later.  Based on current progress, it doesn't look like this
> > will be corrected until 9.5 (2015).  I am kind of confused why this has
> > not generated more urgency.
> 
> I think this largely FUD. They are hugely beneficial in some scenarios
> and less so in others. Just like lots of other features we have.

I don't understand.  Index-only scans are known to have benefits --- if
an insert-only workload can't use that, why is that acceptable?  What is
fear-uncertainty-and-doubt about that?  Please explain.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: [PERFORM] encouraging index-only scans

От
Andres Freund
Дата:
On 2014-02-11 13:23:19 -0500, Bruce Momjian wrote:
> On Tue, Feb 11, 2014 at 06:54:10PM +0100, Andres Freund wrote:
> > On 2014-02-11 12:12:13 -0500, Bruce Momjian wrote:
> > > Yes, that pretty much sums it up.  We introduced index-only scans in 9.2
> > > (2012) but they still seem to be not usable for insert-only workloads
> > > two years later.  Based on current progress, it doesn't look like this
> > > will be corrected until 9.5 (2015).  I am kind of confused why this has
> > > not generated more urgency.
> > 
> > I think this largely FUD. They are hugely beneficial in some scenarios
> > and less so in others. Just like lots of other features we have.
> 
> I don't understand.  Index-only scans are known to have benefits --- if
> an insert-only workload can't use that, why is that acceptable?  What is
> fear-uncertainty-and-doubt about that?  Please explain.

Uh, for one, insert only workloads certainly aren't the majority of
usecases. Ergo there are plenty of cases where index only scans work out
of the box.
Also, they *do* work for insert only workloads, you just either have to
wait longer, or manually trigger VACUUMs. That's a far cry from not
being usable.

I am not saying it shouldn't be improved, I just don't see the point of
bringing it up while everyone is busy with the last CF and claiming it
is unusable and that stating that it is surprisising that nobody really
cares.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: [PERFORM] encouraging index-only scans

От
Bruce Momjian
Дата:
On Tue, Feb 11, 2014 at 07:31:03PM +0100, Andres Freund wrote:
> On 2014-02-11 13:23:19 -0500, Bruce Momjian wrote:
> > On Tue, Feb 11, 2014 at 06:54:10PM +0100, Andres Freund wrote:
> > > On 2014-02-11 12:12:13 -0500, Bruce Momjian wrote:
> > > > Yes, that pretty much sums it up.  We introduced index-only scans in 9.2
> > > > (2012) but they still seem to be not usable for insert-only workloads
> > > > two years later.  Based on current progress, it doesn't look like this
> > > > will be corrected until 9.5 (2015).  I am kind of confused why this has
> > > > not generated more urgency.
> > > 
> > > I think this largely FUD. They are hugely beneficial in some scenarios
> > > and less so in others. Just like lots of other features we have.
> > 
> > I don't understand.  Index-only scans are known to have benefits --- if
> > an insert-only workload can't use that, why is that acceptable?  What is
> > fear-uncertainty-and-doubt about that?  Please explain.
> 
> Uh, for one, insert only workloads certainly aren't the majority of
> usecases. Ergo there are plenty of cases where index only scans work out
> of the box.

True.

> Also, they *do* work for insert only workloads, you just either have to
> wait longer, or manually trigger VACUUMs. That's a far cry from not

Wait longer for what?  Anti-xid-wraparound vacuum?

> being usable.

Is using VACUUM for these cases documented?  Should it be?

> I am not saying it shouldn't be improved, I just don't see the point of
> bringing it up while everyone is busy with the last CF and claiming it
> is unusable and that stating that it is surprisising that nobody really
> cares.

Well, I brought it up in September too. My point was not that it is a
new issue but that it has been such an ignored issue for two years.  I
am not asking for a fix, but right now we don't even have a plan on how
to improve this.

I still don't see how this is FUD, and you have not explained it to me. 
This is a known limitation for two years, not documented (?), and with
no TODO item and no plan on how to improve it.  Do you want to declare
such cases FUD and just ignore them?  I don't see how that moves us
forward.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: [PERFORM] encouraging index-only scans

От
Tom Lane
Дата:
Bruce Momjian <bruce@momjian.us> writes:
> On Tue, Feb 11, 2014 at 07:31:03PM +0100, Andres Freund wrote:
>> I am not saying it shouldn't be improved, I just don't see the point of
>> bringing it up while everyone is busy with the last CF and claiming it
>> is unusable and that stating that it is surprisising that nobody really
>> cares.

> Well, I brought it up in September too. My point was not that it is a
> new issue but that it has been such an ignored issue for two years.  I
> am not asking for a fix, but right now we don't even have a plan on how
> to improve this.

Indeed, and considering that we're all busy with the CF, I think it's
quite unreasonable of you to expect that we'll drop everything else
to think about this problem right now.  The reason it's like it is
is that it's not easy to see how to make it better; so even if we did
drop everything else, it's not clear to me that any plan would emerge
anytime soon.
        regards, tom lane



Re: [PERFORM] encouraging index-only scans

От
Andres Freund
Дата:
On 2014-02-11 13:41:46 -0500, Bruce Momjian wrote:
> Wait longer for what?  Anti-xid-wraparound vacuum?

Yes.

> Is using VACUUM for these cases documented?  Should it be?

No idea, it seems to be part of at least part of the folkloric
knowledge, from what I see at clients.

> > I am not saying it shouldn't be improved, I just don't see the point of
> > bringing it up while everyone is busy with the last CF and claiming it
> > is unusable and that stating that it is surprisising that nobody really
> > cares.

> Well, I brought it up in September too. My point was not that it is a
> new issue but that it has been such an ignored issue for two years.  I
> am not asking for a fix, but right now we don't even have a plan on how
> to improve this.

Coming up with a plan for this takes time and discussion, not something
we seem to have aplenty of atm. And even if were to agree on a plan
right now, we wouldn't incorporate it into 9.4, so what's the point of
bringing it up now?

> I still don't see how this is FUD, and you have not explained it to me. 
> This is a known limitation for two years, not documented (?), and with
> no TODO item and no plan on how to improve it.  Do you want to declare
> such cases FUD and just ignore them?  I don't see how that moves us
> forward.

Claiming something doesn't work while it just has manageable usability
issues doesn't strike me as a reasonable starting point. If it bugs
somebody enough to come up with a rough proposal it will get fixed...

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: [PERFORM] encouraging index-only scans

От
Bruce Momjian
Дата:
On Tue, Feb 11, 2014 at 01:54:48PM -0500, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > On Tue, Feb 11, 2014 at 07:31:03PM +0100, Andres Freund wrote:
> >> I am not saying it shouldn't be improved, I just don't see the point of
> >> bringing it up while everyone is busy with the last CF and claiming it
> >> is unusable and that stating that it is surprisising that nobody really
> >> cares.
> 
> > Well, I brought it up in September too. My point was not that it is a
> > new issue but that it has been such an ignored issue for two years.  I
> > am not asking for a fix, but right now we don't even have a plan on how
> > to improve this.
> 
> Indeed, and considering that we're all busy with the CF, I think it's
> quite unreasonable of you to expect that we'll drop everything else
> to think about this problem right now.  The reason it's like it is
> is that it's not easy to see how to make it better; so even if we did
> drop everything else, it's not clear to me that any plan would emerge
> anytime soon.

Well, documenting the VACUUM requirement and adding it to the TODO list
are things we should consider for 9.4.  If you think doing that after
the commit-fest is best, I can do that.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: [PERFORM] encouraging index-only scans

От
Jeff Janes
Дата:
On Tue, Feb 11, 2014 at 9:12 AM, Bruce Momjian <bruce@momjian.us> wrote:
On Tue, Feb 11, 2014 at 11:28:36AM -0500, Robert Haas wrote:
> A sequential scan will set hint bits and will prune the page, but
> pruning the page doesn't ever mark it all-visible; that logic is
> entirely in vacuum.  If that could be made cheap enough to be
> negligible, it might well be worth doing in heap_page_prune().  I
> think there might be a way to do that, but it's a bit tricky because
> the pruning logic iterates over the page in a somewhat complex way,
> not just a straightforward scan of all the item pointers the way the
> existing logic doesn't.  It would be pretty cool if we could just use
> a bit out of the heap-prune xlog record to indicate whether the
> all-visible bit should be set; then we'd gain the benefit of marking
> things all-visible much more often without needing vacuum.
>
> That doesn't help insert-only tables much, though, because those won't
> require pruning.  We set hint bits (which dirties the page) but
> currently don't write WAL.  We'd have to change that to set the
> all-visible bit when scanning such a table, and that would be
> expensive.  :-(

Yes, that pretty much sums it up.  We introduced index-only scans in 9.2
(2012) but they still seem to be not usable for insert-only workloads
two years later.  Based on current progress, it doesn't look like this
will be corrected until 9.5 (2015).  I am kind of confused why this has
not generated more urgency.


For insert and select only, they are usable (if your queries are of the type that could benefit from them), you just have to do some manual intervention.  The list of features that sometimes require a DBA to do something to make maximum use of them under some circumstance would be a long one.  It would be nice if it were better, but I don't see why this feature is particularly urgent compared to all the other things that could be improved.  In particular I think the Freezing without IO is much more important.  Freezing is rather unimportant until suddenly it is is the most important thing in the universe.  If we could stop worrying about that, I think it would free up other aspects of vacuum scheduling to have more meddling/optimization done to it.

 

I guess my question is what approach do we want to take to fixing this?
If we are doing pruning, aren't we emitting WAL?  You are right that for
an insert-only workload, we aren't going to prune, but if pruning WAL
overhead is acceptable for a sequential scan, isn't index-only
page-all-visible WAL overhead acceptable?


We often don't find that pruning particularly acceptable in seq scans, and there is a patch pending to conditionally turn it off for them.
 

Do we want to track the number of inserts in statistics and trigger an
auto-vacuum after a specified number of inserts?

We track relpages and relallvisible, which seems like a more direct measure.  Once analyze is done (which is already triggered by inserts) and sets those, it could fire a vacuum based on the ratio of those values, or the autovac process could just look at the ratio after naptime.  So just introduce autovacuum_vacuum_visible_factor. A problem there is that it would be a lot of work to aggressively keep the ratio high, and pointless if the types of queries done on that table don't benefit from IOS anyway, or if pages are dirtied so rapidly that no amount of vacuuming will keep the ratio high.  Would we try to automatically tell which tables were which, or rely on the DBA setting per-table autovacuum_vacuum_visible_factor for tables that differ from the database norm?
 
 The problem there is
that we really don't need to do any index cleanup, which is what vacuum
typically does --- we just want to scan the table and set the
all-visible bits, so that approach seems non-optimal.

In the case of no updates or deletes (or aborted inserts?), there would be nothing to clean up in the indexes and that step would be skipped (already in the current code). And if the indexes do need cleaning up, we certainly can't set the page all visible without doing that clean up.
 
Cheers,

Jeff

Re: [PERFORM] encouraging index-only scans

От
Claudio Freire
Дата:
On Tue, Feb 11, 2014 at 4:13 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>>
>> Do we want to track the number of inserts in statistics and trigger an
>> auto-vacuum after a specified number of inserts?
>
>
> We track relpages and relallvisible, which seems like a more direct measure.
> Once analyze is done (which is already triggered by inserts) and sets those,
> it could fire a vacuum based on the ratio of those values, or the autovac
> process could just look at the ratio after naptime.  So just introduce
> autovacuum_vacuum_visible_factor. A problem there is that it would be a lot
> of work to aggressively keep the ratio high, and pointless if the types of
> queries done on that table don't benefit from IOS anyway, or if pages are
> dirtied so rapidly that no amount of vacuuming will keep the ratio high.
> Would we try to automatically tell which tables were which, or rely on the
> DBA setting per-table autovacuum_vacuum_visible_factor for tables that
> differ from the database norm?


Why not track how many times an IOS would be used but wasn't, or how
many heap fetches in IOS have to be performed?

Seems like a more direct measure of whether allvisible needs an update.



Re: [PERFORM] encouraging index-only scans

От
Bruce Momjian
Дата:
On Tue, Feb 11, 2014 at 05:51:36PM -0200, Claudio Freire wrote:
> > We track relpages and relallvisible, which seems like a more direct measure.
> > Once analyze is done (which is already triggered by inserts) and sets those,
> > it could fire a vacuum based on the ratio of those values, or the autovac
> > process could just look at the ratio after naptime.  So just introduce
> > autovacuum_vacuum_visible_factor. A problem there is that it would be a lot
> > of work to aggressively keep the ratio high, and pointless if the types of
> > queries done on that table don't benefit from IOS anyway, or if pages are
> > dirtied so rapidly that no amount of vacuuming will keep the ratio high.
> > Would we try to automatically tell which tables were which, or rely on the
> > DBA setting per-table autovacuum_vacuum_visible_factor for tables that
> > differ from the database norm?
> 
> 
> Why not track how many times an IOS would be used but wasn't, or how
> many heap fetches in IOS have to be performed?
> 
> Seems like a more direct measure of whether allvisible needs an update.

Now that is in interesting idea, and more direct. 

Do we need to adjust for the insert count, i.e. would the threadhold to
trigger an autovacuum after finding index lookups that had to check the
heap page for visibility be higher if many inserts are happening,
perhaps dirtying pages? (If we are dirtying via update/delete,
autovacuum will already trigger.)

We are aggressive in clearing the page-all-visible flag (we have to be),
but I think we need a little more aggressiveness for setting it.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +