Обсуждение: Indexes not always used after inserts/updates/vacuum analyze

Поиск
Список
Период
Сортировка

Indexes not always used after inserts/updates/vacuum analyze

От
"Michael G. Martin"
Дата:
I recently upgraded to 7.2 from 7.1.
Prior to 7,2, I was shutting down the database, droping indexes,
vacuuming analayze, re-building all the indexes on a nightly basis ( all
automated of course ;) ).  Things ran fine.
After upgrading to 7.2, I replaced all that with a nightly  on-line
/usr/local/pgsql/bin/vacuumdb -v -z  on the database.

The problems I am seeing is this.  As new data is added and modified,
indexes are not being used with some queries, but work fine on others.
 Even a vacuum full analyze did not fix the problem.  I had to drop and
re-build the index for the query to use the index. Now, I see the
problem starting again.  The table has about 20-30 million rows (
4-5Gigs in size), so seq scan kills any access.

This email is lengthy, but I wanted to detail this well.

Here is the table definition:

                Table "symbol_data"

       Column       |         Type          | Modifiers

--------------------+-----------------------+-----------

 symbol_name        | character varying(10) | not null
 date               | date                  | not null
 open               | numeric(15,3)         |
 high               | numeric(15,3)         |
 low                | numeric(15,3)         |
 d_close            | numeric(15,3)         |
 volume             | numeric(15,0)         |
 earnings           | numeric(15,3)         |
 dividend           | numeric(15,3)         |
 source             | character varying(50) |
 daily_mp_12a_long  | character(3)          |
 weekly_mp_12a_long | character(3)          |
 daily_mp_32a_long  | character(3)          |
 weekly_mp_32a_long | character(3)          |
Indexes: symbol_data_date_indx
Unique keys: symbol_data_pkey
------------------------------------------------------
Index "symbol_data_date_indx"
 Column | Type
--------+------
 date   | date
btree
------------------------------------------------------
    Index "symbol_data_pkey"
   Column    |         Type
-------------+-----------------------
 symbol_name | character varying(10)
 date        | date
unique btree
----------------------------------------------------------

Here is what I would expect which usually happens:

explain select * from symbol_data where symbol_name='IBM';
Index Scan using symbol_data_pkey on symbol_data  (cost=0.00..512.99 rows=128 width=129)

Here is one that fails:

explain select * from symbol_data where symbol_name='ELTE';
Seq Scan on symbol_data  (cost=0.00..707415.32 rows=438015 width=129)

Now I thought maybe it had something to do with the concatenated primary key, but:
 explain select * from symbol_data where symbol_name='IBM' and date between '1990-01-01' and '2002-01-01';
 Index Scan using symbol_data_pkey on symbol_data  (cost=0.00..402.55 rows=100 width=129)

explain select * from symbol_data where symbol_name='ELTE' and date between '1990-01-01' and '2002-01-01';
Seq Scan on symbol_data  (cost=0.00..810075.06 rows=342903 width=129)

Now, changing the date range will eventually use the index:

explain select * from symbol_data where symbol_name='ELTE' and date between '2002-01-01' and '2002-02-01';
NOTICE:  QUERY PLAN:

Index Scan using symbol_data_pkey on symbol_data  (cost=0.00..10815.42 rows=2706 width=129)




...now I do a vacuum analyze:
 VACUUM verbose ANALYZE symbol_data;
NOTICE:  --Relation symbol_data--
NOTICE:  Index symbol_data_date_indx: Pages 49709; Tuples 20536054: Deleted 4221.
    CPU 4.35s/16.30u sec elapsed 45.33 sec.
NOTICE:  Index symbol_data_pkey: Pages 74029; Tuples 20536054: Deleted 4221.
    CPU 6.44s/15.15u sec elapsed 31.00 sec.
NOTICE:  Removed 4221 tuples in 70 pages.
    CPU 0.00s/0.04u sec elapsed 0.08 sec.
NOTICE:  Pages 450911: Changed 0, Empty 0; Tup 20536054: Vac 4221, Keep 0, UnUsed 1858963.
    Total CPU 49.20s/36.31u sec elapsed 149.00 sec.
NOTICE:  Analyzing symbol_data
VACUUM

explain select * from symbol_data where symbol_name='ELTE' and date between '1990-01-01' and '2002-01-01';
Seq Scan on symbol_data  (cost=0.00..810291.94 rows=292916 width=129)

No change on the name.

A new index:

 create index test on symbol_data (symbol_name);

explain select * from symbol_data where symbol_name='ELTE';
NOTICE:  QUERY PLAN:

Seq Scan on symbol_data  (cost=0.00..707611.68 rows=383340 width=129)

EXPLAIN
vpm=> VACUUM verbose ANALYZE  symbol_data;
NOTICE:  --Relation symbol_data--
NOTICE:  Pages 450911: Changed 0, Empty 0; Tup 20536054: Vac 0, Keep 0, UnUsed 1863184.
    Total CPU 38.99s/4.50u sec elapsed 67.95 sec.
NOTICE:  Analyzing symbol_data
VACUUM
vpm=>  explain select * from symbol_data where symbol_name='ELTE';
NOTICE:  QUERY PLAN:

Seq Scan on symbol_data  (cost=0.00..707611.68 rows=355958 width=129)

Doesnt works.  I think the only way to make this ever work is to drop the indexes, vacuum full, and rebuild.

Any thoughts?  This is a very dynamic table, but I was hoping the online vacuum in 7.2 would fix the problems.

I'm in the process of splitting this table up into smaller pieces which will make life easier anyway,
but I think there is something going on here.

Thanks,
Michael

Re: Indexes not always used after inserts/updates/vacuum analyze

От
Tom Lane
Дата:
"Michael G. Martin" <michael@vpmonline.com> writes:
> Here is what I would expect which usually happens:

> explain select * from symbol_data where symbol_name='IBM';
> Index Scan using symbol_data_pkey on symbol_data  (cost=0.00..512.99 rows=128 width=129)

> Here is one that fails:

> explain select * from symbol_data where symbol_name='ELTE';
> Seq Scan on symbol_data  (cost=0.00..707415.32 rows=438015 width=129)

The planner thinks that there are 438K occurrences of 'ELTE' in your
table.  If that's true, a sequential scan is not obviously the wrong
choice.  How many are there, in reality?  What do you get from

select * from pg_stats where tablename = 'symbol_data' and attname =
'symbol_name';

Also, to put the rubber to the road: if you force an indexscan by
doing "set enable_seqscan = off", does it get faster or slower?
(EXPLAIN ANALYZE would be useful here.)

            regards, tom lane

Re: Indexes not always used after inserts/updates/vacuum analyze

От
"Michael G. Martin"
Дата:
Hi Tom,

Here is what is actually there:

 select count(*) from symbol_data where symbol_name='ELTE';
 count
-------
   687

Here is the pg_stat query:


 select * from pg_stats where tablename = 'symbol_data' and attname ='symbol_name';
  tablename  |   attname   | null_frac | avg_width | n_distinct |               most_common_vals               |                                               most_common_freqs                                               |                  histogram_bounds                   | correlation
-------------+-------------+-----------+-----------+------------+----------------------------------------------+---------------------------------------------------------------------------------------------------------------+-----------------------------------------------------+-------------
 symbol_data | symbol_name |         0 |         7 |     152988 | {EBALX,ELTE,LIT,OEX,RESC,BS,ESH,HOC,IBC,IDA} | {0.0183333,0.0173333,0.00166667,0.00166667,0.00166667,0.00133333,0.00133333,0.00133333,0.00133333,0.00133333} | {A,BMO,DBD,FSCHX,IIX,MAS,NSANY,PTEC,SR,UTIL,_^^VPM} |    0.128921
(1 row)


Interesting eh? EBALX also does a full scan--all others in the above list get an index scan.

Here's the  variable stuff--I attached the verbose outputs.
 set enable_seqscan = on;
SET VARIABLE
vpm=> explain select * from symbol_data where symbol_name='ELTE';
NOTICE:  QUERY PLAN:

Seq Scan on symbol_data  (cost=0.00..707611.68 rows=355958 width=129)

 set enable_seqscan = off;
SET VARIABLE
vpm=> explain select * from symbol_data where symbol_name='ELTE';
NOTICE:  QUERY PLAN:

Index Scan using symbol_data_pkey on symbol_data  (cost=0.00..1420899.51 rows=355958 width=129)

Index scan appears slower in explain, but the rows value is weird.

Thanks,
Michael


Tom Lane wrote:
"Michael G. Martin" <michael@vpmonline.com> writes:
Here is what I would expect which usually happens:

explain select * from symbol_data where symbol_name='IBM';
Index Scan using symbol_data_pkey on symbol_data (cost=0.00..512.99 rows=128 width=129)

Here is one that fails:

explain select * from symbol_data where symbol_name='ELTE';
Seq Scan on symbol_data (cost=0.00..707415.32 rows=438015 width=129)

The planner thinks that there are 438K occurrences of 'ELTE' in your
table. If that's true, a sequential scan is not obviously the wrong
choice. How many are there, in reality? What do you get from

select * from pg_stats where tablename = 'symbol_data' and attname =
'symbol_name';

Also, to put the rubber to the road: if you force an indexscan by
doing "set enable_seqscan = off", does it get faster or slower?
(EXPLAIN ANALYZE would be useful here.)

regards, tom lane

Re: Indexes not always used after inserts/updates/vacuum analyze

От
Tom Lane
Дата:
"Michael G. Martin" <michael@vpmonline.com> writes:
> Here is what is actually there:

>  select count(*) from symbol_data where symbol_name='ELTE';
>    687

Hmm.  Do you have reason to think that that was also true when you last
did VACUUM ANALYZE or VACUUM?

> Here is the pg_stat query:
>  select * from pg_stats where tablename = 'symbol_data' and attname
> ='symbol_name';
>   tablename  |   attname   | null_frac | avg_width | n_distinct
> |               most_common_vals
> |
> most_common_freqs
> |                  histogram_bounds                   | correlation
>
-------------+-------------+-----------+-----------+------------+----------------------------------------------+---------------------------------------------------------------------------------------------------------------+-----------------------------------------------------+-------------
>  symbol_data | symbol_name |         0 |         7 |     152988 |
> {EBALX,ELTE,LIT,OEX,RESC,BS,ESH,HOC,IBC,IDA} |
> {0.0183333,0.0173333,0.00166667,0.00166667,0.00166667,0.00133333,0.00133333,0.00133333,0.00133333,0.00133333}
> | {A,BMO,DBD,FSCHX,IIX,MAS,NSANY,PTEC,SR,UTIL,_^^VPM} |    0.128921
> (1 row)

What this says is that in the last ANALYZE, EBALX accounted for 18% of
the sample, and ELTE for 17%.  Does that seem plausible to you?  If the
sample was accurate then I'd agree with the planner's choices.  It'd
seem that either your table contents are changing drastically (in which
case more-frequent ANALYZEs may be the answer), or you had the bad luck
to get a very unrepresentative sample, or there's some bug in the
statistical calculations.

            regards, tom lane

Re: Indexes not always used after inserts/updates/vacuum analyze

От
"Michael G. Martin"
Дата:
yes.  each symbol_name only gets one row added and maybe a few updated
each market day.
This is interesting too.  Planner thinks 128 rows on this symbol, GE,
yet there are really 5595.  Not as off as ELTE, but a large factor.  at
least the index get hit here.

explain select * from symbol_data where symbol_name='GE';
NOTICE:  QUERY PLAN:

Index Scan using symbol_data_pkey on symbol_data  (cost=0.00..513.09
rows=128 width=129)

EXPLAIN
vpm=> select count(*) from  symbol_data where symbol_name='GE';
 count
-------
  5595



Tom Lane wrote:

>"Michael G. Martin" <michael@vpmonline.com> writes:
>
>>Here is what is actually there:
>>
>
>> select count(*) from symbol_data where symbol_name='ELTE';
>>   687
>>
>
>Hmm.  Do you have reason to think that that was also true when you last
>did VACUUM ANALYZE or VACUUM?
>
>>Here is the pg_stat query:
>> select * from pg_stats where tablename = 'symbol_data' and attname
>>='symbol_name';
>>  tablename  |   attname   | null_frac | avg_width | n_distinct
>>|               most_common_vals
>>|
>>most_common_freqs
>>|                  histogram_bounds                   | correlation

>>-------------+-------------+-----------+-----------+------------+----------------------------------------------+---------------------------------------------------------------------------------------------------------------+-----------------------------------------------------+-------------
>> symbol_data | symbol_name |         0 |         7 |     152988 |
>>{EBALX,ELTE,LIT,OEX,RESC,BS,ESH,HOC,IBC,IDA} |
>>{0.0183333,0.0173333,0.00166667,0.00166667,0.00166667,0.00133333,0.00133333,0.00133333,0.00133333,0.00133333}
>>| {A,BMO,DBD,FSCHX,IIX,MAS,NSANY,PTEC,SR,UTIL,_^^VPM} |    0.128921
>>(1 row)
>>
>
>What this says is that in the last ANALYZE, EBALX accounted for 18% of
>the sample, and ELTE for 17%.  Does that seem plausible to you?  If the
>sample was accurate then I'd agree with the planner's choices.  It'd
>seem that either your table contents are changing drastically (in which
>case more-frequent ANALYZEs may be the answer), or you had the bad luck
>to get a very unrepresentative sample, or there's some bug in the
>statistical calculations.
>
>            regards, tom lane
>

Re: Indexes not always used after inserts/updates/vacuum analyze

От
Tom Lane
Дата:
I said:
>> symbol_data | symbol_name |         0 |         7 |     152988 |
>> {EBALX,ELTE,LIT,OEX,RESC,BS,ESH,HOC,IBC,IDA} |
>> {0.0183333,0.0173333,0.00166667,0.00166667,0.00166667,0.00133333,0.00133333,0.00133333,0.00133333,0.00133333}
>> | {A,BMO,DBD,FSCHX,IIX,MAS,NSANY,PTEC,SR,UTIL,_^^VPM} |    0.128921
>> (1 row)

> What this says is that in the last ANALYZE, EBALX accounted for 18% of
> the sample, and ELTE for 17%.

Argh, make that 1.8% and 1.7%.

That's still orders of magnitude away from what you say the correct
frequency is, however: 687 out of 20+ million.  I'd like to think that
the statistical sampling would be unlikely to make such a large error.

            regards, tom lane

Re: Indexes not always used after inserts/updates/vacuum analyze

От
"Michael G. Martin"
Дата:
Sorry, I missed your bottom part before I replied last.

The table breakdown consists of about 8000 symbol_names with at most
5000 rows of data for each symbol ( stock market history ).

So, those sample percents seem huge.  The most any symbol would have
would be about 5000 / (8000*5000) = .0125%

--Michael



Tom Lane wrote:

>"Michael G. Martin" <michael@vpmonline.com> writes:
>
>>Here is what is actually there:
>>
>
>> select count(*) from symbol_data where symbol_name='ELTE';
>>   687
>>
>
>Hmm.  Do you have reason to think that that was also true when you last
>did VACUUM ANALYZE or VACUUM?
>
>>Here is the pg_stat query:
>> select * from pg_stats where tablename = 'symbol_data' and attname
>>='symbol_name';
>>  tablename  |   attname   | null_frac | avg_width | n_distinct
>>|               most_common_vals
>>|
>>most_common_freqs
>>|                  histogram_bounds                   | correlation

>>-------------+-------------+-----------+-----------+------------+----------------------------------------------+---------------------------------------------------------------------------------------------------------------+-----------------------------------------------------+-------------
>> symbol_data | symbol_name |         0 |         7 |     152988 |
>>{EBALX,ELTE,LIT,OEX,RESC,BS,ESH,HOC,IBC,IDA} |
>>{0.0183333,0.0173333,0.00166667,0.00166667,0.00166667,0.00133333,0.00133333,0.00133333,0.00133333,0.00133333}
>>| {A,BMO,DBD,FSCHX,IIX,MAS,NSANY,PTEC,SR,UTIL,_^^VPM} |    0.128921
>>(1 row)
>>
>
>What this says is that in the last ANALYZE, EBALX accounted for 18% of
>the sample, and ELTE for 17%.  Does that seem plausible to you?  If the
>sample was accurate then I'd agree with the planner's choices.  It'd
>seem that either your table contents are changing drastically (in which
>case more-frequent ANALYZEs may be the answer), or you had the bad luck
>to get a very unrepresentative sample, or there's some bug in the
>statistical calculations.
>
>            regards, tom lane
>
>---------------------------(end of broadcast)---------------------------
>TIP 5: Have you checked our extensive FAQ?
>
>http://www.postgresql.org/users-lounge/docs/faq.html
>

Re: Indexes not always used after inserts/updates/vacuum analyze

От
"Michael G. Martin"
Дата:
Heh--i was gonna ask why the strange percent representation in the stats
table.

I just ran a vacuum analyze with the specific column.  Still get the
same explain plan:

Seq Scan on symbol_data  (cost=0.00..709962.90 rows=369782 width=129)

--Michael

Tom Lane wrote:

>I said:
>
>>>symbol_data | symbol_name |         0 |         7 |     152988 |
>>>{EBALX,ELTE,LIT,OEX,RESC,BS,ESH,HOC,IBC,IDA} |
>>>{0.0183333,0.0173333,0.00166667,0.00166667,0.00166667,0.00133333,0.00133333,0.00133333,0.00133333,0.00133333}
>>>| {A,BMO,DBD,FSCHX,IIX,MAS,NSANY,PTEC,SR,UTIL,_^^VPM} |    0.128921
>>>(1 row)
>>>
>
>>What this says is that in the last ANALYZE, EBALX accounted for 18% of
>>the sample, and ELTE for 17%.
>>
>
>Argh, make that 1.8% and 1.7%.
>
>That's still orders of magnitude away from what you say the correct
>frequency is, however: 687 out of 20+ million.  I'd like to think that
>the statistical sampling would be unlikely to make such a large error.
>
>            regards, tom lane
>

Re: Indexes not always used after inserts/updates/vacuum analyze

От
Tom Lane
Дата:
"Michael G. Martin" <michael@vpmonline.com> writes:
> I just ran a vacuum analyze with the specific column.  Still get the
> same explain plan:

Did the pg_stats data change noticeably?

ANALYZE is a statistical sampling process in 7.2, so I'd expect the
results to move around somewhat each time you repeat it.  But if it
changes a lot then we have a problem.

You could also try

ALTER TABLE symbol_data ALTER symbol_name SET STATISTICS n

for larger values of n (10 is the default) and then re-ANALYZE
to see if the stats get any more accurate.  The default of 10
was more or less picked out of the air ... perhaps it's too small.

            regards, tom lane

Re: Indexes not always used after inserts/updates/vacuum analyze

От
"Michael G. Martin"
Дата:
Here's the new stats since the vacuum on that column--quite a few changes.

 select * from pg_stats where tablename = 'symbol_data' and attname
='symbol_name';
  tablename  |   attname   | null_frac | avg_width | n_distinct
|               most_common_vals
|
most_common_freqs
|                   histogram_bounds                   | correlation

-------------+-------------+-----------+-----------+------------+----------------------------------------------+-----------------------------------------------------------------------------------------------------------+------------------------------------------------------+-------------
 symbol_data | symbol_name |         0 |         7 |     150712 |
{EBALX,ELTE,SRP,KMG,MKC,AEN,BAC,BDX,BKF,BRT} |
{0.0233333,0.018,0.00266667,0.00166667,0.00166667,0.00133333,0.00133333,0.00133333,0.00133333,0.00133333}
| {A,BRN,DPAC,FUTR,INTC,MDSN,OCA,RAA,SSYS,USTB,_^^VPM} |    0.112971
(1 row)

I'll alter and play with the table tomorrow and let you know what I find.

-Michael

Tom Lane wrote:

>"Michael G. Martin" <michael@vpmonline.com> writes:
>
>>I just ran a vacuum analyze with the specific column.  Still get the
>>same explain plan:
>>
>
>Did the pg_stats data change noticeably?
>
>ANALYZE is a statistical sampling process in 7.2, so I'd expect the
>results to move around somewhat each time you repeat it.  But if it
>changes a lot then we have a problem.
>
>You could also try
>
>ALTER TABLE symbol_data ALTER symbol_name SET STATISTICS n
>
>for larger values of n (10 is the default) and then re-ANALYZE
>to see if the stats get any more accurate.  The default of 10
>was more or less picked out of the air ... perhaps it's too small.
>
>            regards, tom lane
>
>---------------------------(end of broadcast)---------------------------
>TIP 3: if posting/reading through Usenet, please send an appropriate
>subscribe-nomail command to majordomo@postgresql.org so that your
>message can get through to the mailing list cleanly
>

Re: Indexes not always used after inserts/updates/vacuum

От
Reinhard Max
Дата:
On Wed, 27 Feb 2002 at 22:57, Tom Lane wrote:

> Also, to put the rubber to the road: if you force an indexscan by
> doing "set enable_seqscan = off", does it get faster or slower?
> (EXPLAIN ANALYZE would be useful here.)

I've just found a case where forcing indexscans results in much higher
speed. On the the 350000 rows table mentioned in my other post after a
"VACUUM FULL ANALYZE":

max=# set enable_seqscan to false;
max=# EXPLAIN analyze
    SELECT count(foo.id) FROM foo, bar
    WHERE foo.id = bar.ref2foo;
Aggregate    (cost=27513.65..27513.65 rows=1 width=8)
        (actual time=652.38..652.38 rows=1 loops=1)
  ->  Merge Join
        (cost=0.00..27417.57 rows=38431 width=8)
        (actual time=0.06..603.02 rows=38431 loops=1)
        ->  Index Scan using foo_pkey on foo
        (cost=0.00..25153.18 rows=352072 width=4)
        (actual time=0.03..157.57 rows=38432 loops=1)
        ->  Index Scan using idx_bar_ref2foo on bar
        (cost=0.00..807.74 rows=38431 width=4)
        (actual time=0.02..170.25 rows=38431 loops=1)
Total runtime: 652.58 msec
               ^^^^^^^^^^^
max=# set enable_seqscan to true;
max=# EXPLAIN analyze
    SELECT count(foo.id) FROM foo, bar
    WHERE foo.id = bar.ref2foo;

Aggregate
        (cost=18560.65..18560.65 rows=1 width=8)
        (actual time=4951.57..4951.57 rows=1 loops=1)
  ->  Hash Join
        (cost=911.39..18464.58 rows=38431 width=8)
        (actual time=653.26..4905.37 rows=38431 loops=1)
        ->  Seq Scan on foo
        (cost=0.00..9251.72 rows=352072 width=4)
        (actual time=0.02..769.60 rows=352072 loops=1)
        ->  Hash
        (cost=683.31..683.31 rows=38431 width=4)
        (actual time=140.60..140.60 rows=0 loops=1)
              ->  Seq Scan on bar
        (cost=0.00..683.31 rows=38431 width=4)
        (actual time=0.02..78.57 rows=38431 loops=1)
Total runtime: 4951.70 msec
               ^^^^^^^^^^^^

I've reproduced that several times. Even on a newly started postmaster
the query takes less than 2.5 seconds with seqscans swited off.

cu
    Reinhard

Re: Indexes not always used after inserts/updates/vacuum analyze

От
"Michael G. Martin"
Дата:
Ok, so this morning after the automated nightly vacuum -z -v on the
database, ELTE no longer appears in the pg_stats table, and the index is
picked no problem.  The table data has not changed since last eve.

However, now there is a new symbol which is behaving the same way--I.
 This symbol was just loaded into the database yesterday. There are
officially 4108 rows in the symbol_data table where symbol_name='I'.  I
bumped the STATISTICS value up to 100, re-analyzed, but the pg_stats
table still shows I first on the list with a value of 0.0182--didn't
change much from the original STATISTICS value of 10.

Here are the explain analyzes:

set enable_seqscan = on;
explain analyze select * from symbol_data where symbol_name='I' order by
date;
NOTICE:  QUERY PLAN:

Sort  (cost=811813.33..811813.33 rows=373904 width=129) (actual
time=93423.45..93427.02 rows=4108 loops=1)
  ->  Seq Scan on symbol_data  (cost=0.00..709994.20 rows=373904
width=129) (actual time=92483.55..93399.60 rows=4108 loops=1)
Total runtime: 93431.50 msec


 set enable_seqscan = off;
SET VARIABLE
vpm=> explain analyze select * from symbol_data where symbol_name='I'
order by date;
NOTICE:  QUERY PLAN:

Sort  (cost=1584564.49..1584564.49 rows=373904 width=129) (actual
time=129.38..133.01 rows=4108 loops=1)
  ->  Index Scan using symbol_data_pkey on symbol_data
(cost=0.00..1482745.36 rows=373904 width=129) (actual time=21.54..105.46
rows=4108 loops=1)
Total runtime: 137.55 msec


Even though the optimizer thinks the index will cost more, it does pick
it and use it with the performance expected when enable_seqscan = off;

-Michael


Tom Lane wrote:

>"Michael G. Martin" <michael@vpmonline.com> writes:
>
>>I just ran a vacuum analyze with the specific column.  Still get the
>>same explain plan:
>>
>
>Did the pg_stats data change noticeably?
>
>ANALYZE is a statistical sampling process in 7.2, so I'd expect the
>results to move around somewhat each time you repeat it.  But if it
>changes a lot then we have a problem.
>
>You could also try
>
>ALTER TABLE symbol_data ALTER symbol_name SET STATISTICS n
>
>for larger values of n (10 is the default) and then re-ANALYZE
>to see if the stats get any more accurate.  The default of 10
>was more or less picked out of the air ... perhaps it's too small.
>
>            regards, tom lane
>
>---------------------------(end of broadcast)---------------------------
>TIP 3: if posting/reading through Usenet, please send an appropriate
>subscribe-nomail command to majordomo@postgresql.org so that your
>message can get through to the mailing list cleanly
>

Re: Indexes not always used after inserts/updates/vacuum analyze

От
Tom Lane
Дата:
Reinhard Max <max@suse.de> writes:
> I've just found a case where forcing indexscans results in much higher
> speed.

>         ->  Index Scan using foo_pkey on foo
>         (cost=0.00..25153.18 rows=352072 width=4)
>         (actual time=0.03..157.57 rows=38432 loops=1)

The major estimation error is evidently in this indexscan.  What
statistics does pg_stats show for this table?

            regards, tom lane

Re: Indexes not always used after inserts/updates/vacuum analyze

От
"Michael G. Martin"
Дата:
Good news.

I looked through the code and after a little debugging found that the
STATISTICS * 300 gives you the sample size of rows used to gather
statistics.
With the symbol_data table with 20million tuples and on this column with
about 8000 unique values, i needed a very large sample size.
Even with a STATISTICS of 500 ( 150,000  random rows) I still got a few
symbols with a most_common_freqs of .01 or so.
Bumping the STATISTICS to 1000 put the highest most_common_freqs at
0.00788667, so no seq scans now.
Not too much of a time difference in the analyze either--at least not an
impact.


The only strange thing I see is still the estimated rows returned.  The
index is picked, so I don't know that it matters.  Even though this
query has 688 tuples, the explain thinks 17k+:

Index Scan using symbol_data_pkey on symbol_data  (cost=0.00..70648.22
rows=17700 width=129)

-Michael

>
> Tom Lane wrote:
>
>>"Michael G. Martin" <michael@vpmonline.com> writes:
>>
>>>I just ran a vacuum analyze with the specific column.  Still get the
>>>same explain plan:
>>>
>>
>>Did the pg_stats data change noticeably?
>>
>>ANALYZE is a statistical sampling process in 7.2, so I'd expect the
>>results to move around somewhat each time you repeat it.  But if it
>>changes a lot then we have a problem.
>>
>>You could also try
>>
>>ALTER TABLE symbol_data ALTER symbol_name SET STATISTICS n
>>
>>for larger values of n (10 is the default) and then re-ANALYZE
>>to see if the stats get any more accurate.  The default of 10
>>was more or less picked out of the air ... perhaps it's too small.
>>
>>            regards, tom lane
>>
>>---------------------------(end of broadcast)---------------------------
>>TIP 3: if posting/reading through Usenet, please send an appropriate
>>subscribe-nomail command to majordomo@postgresql.org so that your
>>message can get through to the mailing list cleanly
>>
>

Large shared_buffers freezing computers

От
"Michael G. Martin"
Дата:
I've had this happen on 2 seperate servers now.

After reading the docs, I bumped up shared_buffers.  On one machine with
2G pyhsical ram, I set the param to use 1G of memory ( 131072 value), on
another machine with 800M of RAM, I set the value to about 500M ( 64000
).  ipcs shows the correct amounts allocated.

Both servers run fine for a bit, then at some point, the entire box
freezes.  Pings work, but nothing else does, so a hard reboot is necessary.

Any ideas.  Any limits on what you can set these to.  I thought these
values would leave plenty for the other stuff to run on the server.

Here is a top output before freezing:

 9:14pm  up 38 days, 12:47,  2 users,  load average: 4.78, 5.12, 4.91
101 processes: 99 sleeping, 2 running, 0 zombie, 0 stopped
CPU0 states: 24.1% user,  7.2% system,  0.0% nice, 68.1% idle
CPU1 states: 28.0% user,  6.4% system,  0.0% nice, 64.4% idle
Mem:   898892K av,  897300K used,    1592K free,       0K shrd,       0K buff
Swap:  819272K av,   65792K used,  753480K free                  805924K cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
 3370 postgres   9   0  382M 382M  381M S     0.1 43.5  10:53 postmaster
32762 postgres  10   0  104M 104M  103M S    15.2 11.9  65:25 postmaster
 1226 postgres   9   0 54372  53M 52852 S     0.0  6.0   0:08 postmaster
 1334 postgres   9   0 47756  46M 46240 S     0.0  5.3   0:03 postmaster
 1181 postgres   9   0 46184  45M 44592 S     0.0  5.1   0:12 postmaster
 1227 postgres   9   0 39796  38M 38328 S     0.0  4.4   0:06 postmaster
 1228 postgres   9   0 25072  24M 23580 S     0.0  2.7   0:05 postmaster
 9082 postgres  10   0 16608  16M 15180 D     5.0  1.8   0:00 postmaster
 9084 postgres  10   0 14700  14M 13316 S     4.6  1.6   0:00 postmaster
 3244 postgres   9   0 13376  13M 12052 S     0.0  1.4   0:00 postmaster
32668 postgres   9   0 11488  11M 10224 S     0.0  1.2   0:02 postmaster
32669 postgres   9   0 11136  10M  9888 S     0.0  1.2   0:55 postmaster
 9085 postgres  15   0 10820  10M  9520 S     2.5  1.1   0:00 postmaster
 9087 postgres  18   0 10796  10M  9496 R     2.9  1.1   0:00 postmaster
 9086 postgres  16   0 10696  10M  9400 S     2.3  1.1   0:00 postmaster

Thanks,
Michael

Re: Large shared_buffers freezing computers

От
"Michael G. Martin"
Дата:
I read an earlier post by Tom where he recommends 1/4 of physical ram.
 I will go to 1/5 to be safe and I assume it will be ok.  I'm guessing
my 50% was probably overkill.

--Michael

Michael G. Martin wrote:

> I've had this happen on 2 seperate servers now.
>
> After reading the docs, I bumped up shared_buffers.  On one machine
> with 2G pyhsical ram, I set the param to use 1G of memory ( 131072
> value), on another machine with 800M of RAM, I set the value to about
> 500M ( 64000 ).  ipcs shows the correct amounts allocated.
>
> Both servers run fine for a bit, then at some point, the entire box
> freezes.  Pings work, but nothing else does, so a hard reboot is
> necessary.
>
> Any ideas.  Any limits on what you can set these to.  I thought these
> values would leave plenty for the other stuff to run on the server.
>
> Here is a top output before freezing:
>
> 9:14pm  up 38 days, 12:47,  2 users,  load average: 4.78, 5.12, 4.91
> 101 processes: 99 sleeping, 2 running, 0 zombie, 0 stopped
> CPU0 states: 24.1% user,  7.2% system,  0.0% nice, 68.1% idle
> CPU1 states: 28.0% user,  6.4% system,  0.0% nice, 64.4% idle
> Mem:   898892K av,  897300K used,    1592K free,       0K shrd,
> 0K buff
> Swap:  819272K av,   65792K used,  753480K free
> 805924K cached
>
>  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
> 3370 postgres   9   0  382M 382M  381M S     0.1 43.5  10:53 postmaster
> 32762 postgres  10   0  104M 104M  103M S    15.2 11.9  65:25 postmaster
> 1226 postgres   9   0 54372  53M 52852 S     0.0  6.0   0:08 postmaster
> 1334 postgres   9   0 47756  46M 46240 S     0.0  5.3   0:03 postmaster
> 1181 postgres   9   0 46184  45M 44592 S     0.0  5.1   0:12 postmaster
> 1227 postgres   9   0 39796  38M 38328 S     0.0  4.4   0:06 postmaster
> 1228 postgres   9   0 25072  24M 23580 S     0.0  2.7   0:05 postmaster
> 9082 postgres  10   0 16608  16M 15180 D     5.0  1.8   0:00 postmaster
> 9084 postgres  10   0 14700  14M 13316 S     4.6  1.6   0:00 postmaster
> 3244 postgres   9   0 13376  13M 12052 S     0.0  1.4   0:00 postmaster
> 32668 postgres   9   0 11488  11M 10224 S     0.0  1.2   0:02 postmaster
> 32669 postgres   9   0 11136  10M  9888 S     0.0  1.2   0:55 postmaster
> 9085 postgres  15   0 10820  10M  9520 S     2.5  1.1   0:00 postmaster
> 9087 postgres  18   0 10796  10M  9496 R     2.9  1.1   0:00 postmaster
> 9086 postgres  16   0 10696  10M  9400 S     2.3  1.1   0:00 postmaster
>
> Thanks,
> Michael
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster