Обсуждение: visibility map and reltuples

Поиск
Список
Период
Сортировка

visibility map and reltuples

От
"Ned T. Crigler"
Дата:
It appears that the visibility map patch is causing pg_class.reltuples to be
set improperly after a vacuum. For example, it is set to 0 if the map
indicated that no pages in the heap needed to be scanned.

Perhaps reltuples should not be updated unless every page was scanned during
the vacuum?

-- 
Ned T. Crigler


Re: visibility map and reltuples

От
Heikki Linnakangas
Дата:
Ned T. Crigler wrote:
> It appears that the visibility map patch is causing pg_class.reltuples to be
> set improperly after a vacuum. For example, it is set to 0 if the map
> indicated that no pages in the heap needed to be scanned.
> 
> Perhaps reltuples should not be updated unless every page was scanned during
> the vacuum?

Yeah, vacuum shouldn't overwrite reltuples if it hasn't scanned all pages.

The interplay of vacuum and analyze in VACUUM ANALYZE needs to be 
changed too. Currently, the analyze after vacuum doesn't overwrite 
reltuples, because the one calculated by vacuum is based on scanning all 
pages, and is thus more accurate than the one estimated from the sample 
(which is not true anymore, as you pointed out). I think the vacuum 
needs to somehow tell analyze whether it updated reltuples or not, so 
that analyze can update reltuples if the vacuum didn't scan all pages.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: visibility map and reltuples

От
Heikki Linnakangas
Дата:
Heikki Linnakangas wrote:
> Ned T. Crigler wrote:
>> It appears that the visibility map patch is causing pg_class.reltuples 
>> to be
>> set improperly after a vacuum. For example, it is set to 0 if the map
>> indicated that no pages in the heap needed to be scanned.
>>
>> Perhaps reltuples should not be updated unless every page was scanned 
>> during
>> the vacuum?
> 
> Yeah, vacuum shouldn't overwrite reltuples if it hasn't scanned all pages.

Because we use reltuples divided by relpages in the planner, we probably 
shouldn't update relpages either if we don't update reltuples. 
Otherwise, if the table has grown a lot since we last updated reltuples, 
the reltuples / relpages ratio would be less, not more, accurate, if 
relpages is updated to a new higher value but reltuples is not.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: visibility map and reltuples

От
Greg Stark
Дата:
I wonder if we should switch to keeping reltuplesperpage instead. Then  
a partial vacuum could update it by taking the average number of  
tuples per page forbthe pages it saw. Perhaps adjusting it to the  
weights average between the old value and the new value based on how  
many pages were seen.

I suppose there's no reason we can't update reltuples using that same  
logic though it would be a big opaque.

-- 
Greg


On 15 Dec 2008, at 04:01, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com > wrote:

> Heikki Linnakangas wrote:
>> Ned T. Crigler wrote:
>>> It appears that the visibility map patch is causing  
>>> pg_class.reltuples to be
>>> set improperly after a vacuum. For example, it is set to 0 if the  
>>> map
>>> indicated that no pages in the heap needed to be scanned.
>>>
>>> Perhaps reltuples should not be updated unless every page was  
>>> scanned during
>>> the vacuum?
>> Yeah, vacuum shouldn't overwrite reltuples if it hasn't scanned all  
>> pages.
>
> Because we use reltuples divided by relpages in the planner, we  
> probably shouldn't update relpages either if we don't update  
> reltuples. Otherwise, if the table has grown a lot since we last  
> updated reltuples, the reltuples / relpages ratio would be less, not  
> more, accurate, if relpages is updated to a new higher value but  
> reltuples is not.
>
> -- 
>  Heikki Linnakangas
>  EnterpriseDB   http://www.enterprisedb.com
>
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


Re: visibility map and reltuples

От
Heikki Linnakangas
Дата:
Greg Stark wrote:
> I wonder if we should switch to keeping reltuplesperpage instead. Then a 
> partial vacuum could update it by taking the average number of tuples 
> per page forbthe pages it saw. Perhaps adjusting it to the weights 
> average between the old value and the new value based on how many pages 
> were seen.

The pages scanned by a partial vacuum isn't a random sample of pages in 
the table. That would bias the reltuplesperpage value towards those 
pages that are updated more.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: visibility map and reltuples

От
Tom Lane
Дата:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Greg Stark wrote:
>> I wonder if we should switch to keeping reltuplesperpage instead. Then a 
>> partial vacuum could update it by taking the average number of tuples 
>> per page forbthe pages it saw. Perhaps adjusting it to the weights 
>> average between the old value and the new value based on how many pages 
>> were seen.

> The pages scanned by a partial vacuum isn't a random sample of pages in 
> the table. That would bias the reltuplesperpage value towards those 
> pages that are updated more.

Yeah ... and it's highly likely that repeatedly-updated pages would have
more dead space than never-updated ones, so there'd be a systematic
creep towards underestimation of the total tuple count.

I think your previous sketch is right: suppress update of reltuples (and
relpages) from a partial vacuum scan, and ensure that the analyze phase
is allowed to do it instead if it happens during VACUUM ANALYZE.
        regards, tom lane


Re: visibility map and reltuples

От
Greg Smith
Дата:
On Mon, 15 Dec 2008, Greg Stark wrote:

> I wonder if we should switch to keeping reltuplesperpage instead.

It would be preferrable to not touch the user side of reltuples if 
possible, since it's the only instant way to get a good estimate of the 
number of rows in a table right now.  That's been a regular application 
technique for at least two years now, since 
http://www.varlena.com/GeneralBits/120.php popularized it.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD


Re: visibility map and reltuples

От
Heikki Linnakangas
Дата:
Tom Lane wrote:
> I think your previous sketch is right: suppress update of reltuples (and
> relpages) from a partial vacuum scan, and ensure that the analyze phase
> is allowed to do it instead if it happens during VACUUM ANALYZE.

We also mustn't reset n_live_tuples in pgstat in partial vacuum. 
Committed a patch to do that.

Thanks for the report!

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com