Обсуждение: Wrong stat on pg_stat_user_tables

Поиск
Список
Период
Сортировка

Wrong stat on pg_stat_user_tables

От
Thomas SIMON
Дата:
Hi all,

I have strange behavior with a stat of pg_stat_user_tables (explained
below), where number of live tuples is not the real number at all. (25M
vs 111M)


  - select n_live_tup, last_autoanalyze from pg_stat_user_tables where
relname = 'items'

n_live_tup : 111 191 791
last_autoanalyze : 2016-09-27 04:03:35.09233+02


  - select count (1) from items;
   count
----------
  25307071


I've comparated my 10 biggest tables, only this table gave me wrong results.

Does anyone have an idea about it ?

Thanks

--

Thomas



Re: Wrong stat on pg_stat_user_tables

От
Tom Lane
Дата:
Thomas SIMON <tsimon@neteven.com> writes:
> I have strange behavior with a stat of pg_stat_user_tables (explained
> below), where number of live tuples is not the real number at all.
> ...
> I've comparated my 10 biggest tables, only this table gave me wrong results.

The pgstats stats are only approximate, but ideally they'd converge
towards reality over time when they're wrong.  Seems like maybe this one
table has a usage pattern that's odd enough to throw things off.  Maybe
you have unusual autovacuum/autoanalyze parameters for it, for example?
Is there anything strange about the way you insert/update/delete in it?

A simpler explanation might be that this table has inheritance children
--- the pgstats count would consider only the table itself, but
"select count(1) from items" would scan the child tables too.

            regards, tom lane


Re: Wrong stat on pg_stat_user_tables

От
Glyn Astill
Дата:
> From: Thomas SIMON <tsimon@neteven.com>
>To: "pgsql-admin@postgresql.org" <pgsql-admin@postgresql.org>
>Sent: Tuesday, 27 September 2016, 9:20
>Subject: [ADMIN] Wrong stat on pg_stat_user_tables
>
>
>Hi all,
>
>I have strange behavior with a stat of pg_stat_user_tables (explained
>below), where number of live tuples is not the real number at all. (25M
>vs 111M)
>
>
>  - select n_live_tup, last_autoanalyze from pg_stat_user_tables where
>relname = 'items'
>
>n_live_tup : 111 191 791
>last_autoanalyze : 2016-09-27 04:03:35.09233+02
>
>
>  - select count (1) from items;
>   count
>----------
>  25307071
>
>
>I've comparated my 10 biggest tables, only this table gave me wrong results.
>
>Does anyone have an idea about it ?
>

Do you have any long running transactions?  The value returned by n_live_tup is an estimate of all the live rows in the
table,vs the value you see from count is the number rows visible to the current transaction. 


How long between sending the email and running the queries? What output do you see if you analyze the table? And what
pgversion is this? 

I think (I may be wrong) the value you see in pg_stat_user_tables is only an estimate based on a sample of tuples per
pagein the table, so if you've an uneven distribution of live rows you could see an incorrect value. 


What do the planner stats show you? Try:

   select reltuples::integer from pg_class where oid = 'items'::regclass;


You might get a better response from the pgsql-general list.

Glyn

Re: Wrong stat on pg_stat_user_tables

От
Thomas SIMON
Дата:
Hi guys,
Thanks for your replies, and sorry for the late answer.

I have grouped your messages to have more visibility.

> Do you have any long running transactions?  The value returned by n_live_tup is an estimate of all the live rows in
thetable, vs the value you see from count is the number rows visible to the current transaction. 
no, I havent got particulary long transactions on this table, or at
least not more than on other tables

> How long between sending the email and running the queries? What output do you see if you analyze the table? And what
pgversion is this? 
If I analyse table, I have following output :

analyse verbose items;
INFO:  analyzing "public.items"
INFO:  "items": scanned 30000 of 11676017 pages, containing 65181 live
rows and 132481 dead rows; 30000 rows in sample, 89400829 estimated
total rows
ANALYZE

It is 9.5 postgresql version

>
> I think (I may be wrong) the value you see in pg_stat_user_tables is only an estimate based on a sample of tuples per
pagein the table, so if you've an uneven distribution of live rows you could see an incorrect value. 
I think you hit something, indeed this is a sample, and indeed when I
see postgres's answer, I see something wrong, I see way too much dead
rows, and planner is probably cheated by this.

I've made some search about analyze, and seen that we can tell to
analyze to use greater sample, so I tried (x4) but I have
approximatively the same result.

#analyse verbose items;
INFO:  analyzing "public.items"
INFO:  "items": scanned 30000 of 11676017 pages, containing 65853 live
rows and 133172 dead rows; 30000 rows in sample, 89236981 estimated
total rows
ANALYZE

# ALTER TABLE items ALTER item_id SET STATISTICS 400;
ALTER TABLE

# analyse verbose items;
INFO:  analyzing "public.items"
INFO:  "items": scanned 120000 of 11676017 pages, containing 260685 live
rows and 527336 dead rows; 120000 rows in sample, 88580538 estimated
total rows
ANALYZE


>
>
> What do the planner stats show you? Try:
>
>     select reltuples::integer from pg_class where oid = 'items'::regclass;
I'm far away from reality too.

# select reltuples::integer from pg_class where oid = 'items'::regclass;
-[ RECORD 1 ]-------
reltuples | 89400832

>
>
> You might get a better response from the pgsql-general list.
>
> Glyn
>

------------


> The pgstats stats are only approximate, but ideally they'd converge
> towards reality over time when they're wrong.  Seems like maybe this one
> table has a usage pattern that's odd enough to throw things off.  Maybe
> you have unusual autovacuum/autoanalyze parameters for it, for example?
> Is there anything strange about the way you insert/update/delete in it?
Indeed I have special autovacuum parameters on this table (
autovacuum_vacuum_scale_factor=0.05 ), because there are lot of updates
on this table, and I did not want it to bloat too much. And so
autovacuum is almost always running on this table. But I don't
understand why it should change something ?
Should i downgrade scale_factor ?
>
> A simpler explanation might be that this table has inheritance children
> --- the pgstats count would consider only the table itself, but
> "select count(1) from items" would scan the child tables too.

I understand your meaning about inheritance, but in my case select count
(25m)  is lower than pgstats (110m).


Thanks
--
Thomas