Обсуждение: Is dump-reload the only cure?

Поиск
Список
Период
Сортировка

Is dump-reload the only cure?

От
Дата:
Hi ,

For a particular table  it was only dump and reload of the table that
helped in enabling index usage.

I tried VACUUM ANALYZE and even recreating the indexes but it
did not work.

why does the planner use the index like a miser?
below are the details

was there anything bettwer i could have done for indexes getting used?


regds
mallah.

Query:

explain SELECT count( email_id  )  from email_bank_mailing_lists   where query_id='499';
NOTICE:  QUERY PLAN:

Aggregate  (cost=4330.48..4330.48 rows=1 width=4)
  ->  Index Scan using email_bank_ml_qid on email_bank_mailing_lists  (cost=0.00..4327.28
  rows=1282 width=4)
EXPLAIN


distribution of query_id in table:
total: 256419

query_id  |  count(*)
----------------------
      298 |  6167
      328 |  2083
      354 |  9875
      404 |  6974
      432 |  5059
      437 |  2497
      440 |  2837
      448 | 14624
      449 | 13053
      454 |   409
      455 |  3725
      456 |   560
      458 |  3477
      460 |  5561
      486 | 41842
      488 | 63642
      492 |  2244
      493 |  6047
      494 | 37415
      499 | 25010
      501 |  3318


before dump reload:
tradein_clients=# VACUUM VERBOSE ANALYZE email_bank_mailing_lists;
NOTICE:  --Relation email_bank_mailing_lists--
NOTICE:  Pages 3583: Changed 0, Empty 0; Tup 256419: Vac 0, Keep 0, UnUsed 44822.
        Total CPU 0.24s/0.04u sec elapsed 0.30 sec.
NOTICE:  Analyzing email_bank_mailing_lists
VACUUM
tradein_clients=# explain SELECT count( email_id  )  from email_bank_mailing_lists   where
query_id=499;NOTICE:  QUERY PLAN:

Aggregate  (cost=6863.24..6863.24 rows=1 width=4)
  ->  Seq Scan on email_bank_mailing_lists  (cost=0.00..6788.24 rows=30001 width=4)

EXPLAIN













-----------------------------------------
Get your free web based email at trade-india.com.
   "India's Leading B2B eMarketplace.!"
http://www.trade-india.com/



Re: Is dump-reload the only cure?

От
Rod Taylor
Дата:
On Fri, 2002-11-01 at 06:15, mallah@trade-india.com wrote:


Looks like a borderline case.  See the costs of the index scan and
sequential scan are very similar.  Since 499 covers nearly 1 in 10
tuples, it's likely found on nearly every page.  This should make a
sequential scan much cheaper.

However, if the data is clumped together (not distributed throughout the
table) than an index scan may be preferable.   So...  CLUSTER may be
useful to you.

In the future please 'explain analyze' the queries you're looking at to
see actual costs as compared to the estimated cost.


>       499 | 25010
>       501 |  3318
>
>
> before dump reload:
> tradein_clients=# VACUUM VERBOSE ANALYZE email_bank_mailing_lists;
> NOTICE:  --Relation email_bank_mailing_lists--
> NOTICE:  Pages 3583: Changed 0, Empty 0; Tup 256419: Vac 0, Keep 0, UnUsed 44822.
>         Total CPU 0.24s/0.04u sec elapsed 0.30 sec.
> NOTICE:  Analyzing email_bank_mailing_lists
> VACUUM
> tradein_clients=# explain SELECT count( email_id  )  from email_bank_mailing_lists   where
> query_id=499;NOTICE:  QUERY PLAN:
>
> Aggregate  (cost=6863.24..6863.24 rows=1 width=4)
>   ->  Seq Scan on email_bank_mailing_lists  (cost=0.00..6788.24 rows=30001 width=4)
>
> EXPLAIN

--
  Rod Taylor


Re: Is dump-reload the only cure?

От
Дата:
Hi Rod ,

Does it means that index scan is used for less frequenlty occuring data?
yes my table was not clustered.

can u tell me what does 0.00..6788.24 and rows and width means?

in explain out put cost=0.00..6788.24 rows=30001 width=4


I have one more table where i face the similar problem , i have not dump - reloaded
it yet , will post again if i face the problem.


thanks

Regds
Mallah.


> On Fri, 2002-11-01 at 06:15, mallah@trade-india.com wrote:
>
>
> Looks like a borderline case.  See the costs of the index scan and sequential scan are very
> similar.  Since 499 covers nearly 1 in 10 tuples, it's likely found on nearly every page.  This
> should make a sequential scan much cheaper.
>
> However, if the data is clumped together (not distributed throughout the table) than an index
> scan may be preferable.   So...  CLUSTER may be useful to you.
>
> In the future please 'explain analyze' the queries you're looking at to see actual costs as
> compared to the estimated cost.
>
>
>>       499 | 25010
>>       501 |  3318
>>
>>
>> before dump reload:
>> tradein_clients=# VACUUM VERBOSE ANALYZE email_bank_mailing_lists; NOTICE:  --Relation
>> email_bank_mailing_lists--
>> NOTICE:  Pages 3583: Changed 0, Empty 0; Tup 256419: Vac 0, Keep 0, UnUsed 44822.
>>         Total CPU 0.24s/0.04u sec elapsed 0.30 sec.
>> NOTICE:  Analyzing email_bank_mailing_lists
>> VACUUM
>> tradein_clients=# explain SELECT count( email_id  )  from email_bank_mailing_lists   where
>> query_id=499;NOTICE:  QUERY PLAN:
>>
>> Aggregate  (cost=6863.24..6863.24 rows=1 width=4)
>>   ->  Seq Scan on email_bank_mailing_lists  (cost=0.00..6788.24 rows=30001 width=4)
>>
>> EXPLAIN
>
> --
>  Rod Taylor



-----------------------------------------
Get your free web based email at trade-india.com.
   "India's Leading B2B eMarketplace.!"
http://www.trade-india.com/



Re: Is dump-reload the only cure?

От
Дата:

Rod ,

Clustering did work for my other case ;-)


tradein_clients=> explain analyze SELECT  count(*) from email_source where source_id=173;
NOTICE:  QUERY PLAN:

Aggregate  (cost=13042.91..13042.91 rows=1 width=0) (actual time=1415.32..1415.32 rows=1 loops=1)
  ->  Seq Scan on email_source  (cost=0.00..12964.48 rows=31375 width=0) (actual
  time=1.19..1368.58 rows=32851 loops=1)Total runtime: 1415.42 msec

EXPLAIN
tradein_clients=> \d email_source
      Table "email_source"
  Column   |  Type   | Modifiers
-----------+---------+-----------
 email_id  | integer |
 source_id | integer |
Indexes: email_source_sid
Unique keys: email_source_idx
tradein_clients=> CLUSTER email_source_sid on email_source ;
CLUSTER
tradein_clients=>
tradein_clients=> explain analyze SELECT  count(*) from email_source where source_id=173;
NOTICE:  QUERY PLAN:

Aggregate  (cost=11458.83..11458.83 rows=1 width=0) (actual time=207.73..207.73 rows=1 loops=1)
  ->  Index Scan using email_source_sid on email_source  (cost=0.00..11449.76 rows=3627 width=0)
  (actual time=0.27..161.04 rows=32851 loops=1)Total runtime: 207.90 msec
EXPLAIN


Does it Mean that clustered indexes are guarrented  to be used for index scan?
one more thing does clustering means that all future data addition will happen
in the ordered manner only i mean consecutively in terms of source_id?

Regds
MALLAH.











> On Fri, 2002-11-01 at 06:15, mallah@trade-india.com wrote:
>
>
> Looks like a borderline case.  See the costs of the index scan and sequential scan are very
> similar.  Since 499 covers nearly 1 in 10 tuples, it's likely found on nearly every page.  This
> should make a sequential scan much cheaper.
>
> However, if the data is clumped together (not distributed throughout the table) than an index
> scan may be preferable.   So...  CLUSTER may be useful to you.
>
> In the future please 'explain analyze' the queries you're looking at to see actual costs as
> compared to the estimated cost.
>
>
>>       499 | 25010
>>       501 |  3318
>>
>>
>> before dump reload:
>> tradein_clients=# VACUUM VERBOSE ANALYZE email_bank_mailing_lists; NOTICE:  --Relation
>> email_bank_mailing_lists--
>> NOTICE:  Pages 3583: Changed 0, Empty 0; Tup 256419: Vac 0, Keep 0, UnUsed 44822.
>>         Total CPU 0.24s/0.04u sec elapsed 0.30 sec.
>> NOTICE:  Analyzing email_bank_mailing_lists
>> VACUUM
>> tradein_clients=# explain SELECT count( email_id  )  from email_bank_mailing_lists   where
>> query_id=499;NOTICE:  QUERY PLAN:
>>
>> Aggregate  (cost=6863.24..6863.24 rows=1 width=4)
>>   ->  Seq Scan on email_bank_mailing_lists  (cost=0.00..6788.24 rows=30001 width=4)
>>
>> EXPLAIN
>
> --
>  Rod Taylor



-----------------------------------------
Get your free web based email at trade-india.com.
   "India's Leading B2B eMarketplace.!"
http://www.trade-india.com/



Re: Is dump-reload the only cure?

От
Rod Taylor
Дата:
See Paragraph 2 of the description section:
http://www.postgresql.org/idocs/index.php?sql-explain.html

In the above is a good explanation of 'cost'.  Rows is the number of
rows estimated to be returned, and width is the expected number of
columns it needs to deal with at that point.

On Fri, 2002-11-01 at 08:03, mallah@trade-india.com wrote:
>
> Hi Rod ,
>
> Does it means that index scan is used for less frequenlty occuring data?
> yes my table was not clustered.
>
> can u tell me what does 0.00..6788.24 and rows and width means?
>
> in explain out put cost=0.00..6788.24 rows=30001 width=4
>
>
> I have one more table where i face the similar problem , i have not dump - reloaded
> it yet , will post again if i face the problem.
>
>
> thanks
>
> Regds
> Mallah.
>
>
> > On Fri, 2002-11-01 at 06:15, mallah@trade-india.com wrote:
> >
> >
> > Looks like a borderline case.  See the costs of the index scan and sequential scan are very
> > similar.  Since 499 covers nearly 1 in 10 tuples, it's likely found on nearly every page.  This
> > should make a sequential scan much cheaper.
> >
> > However, if the data is clumped together (not distributed throughout the table) than an index
> > scan may be preferable.   So...  CLUSTER may be useful to you.
> >
> > In the future please 'explain analyze' the queries you're looking at to see actual costs as
> > compared to the estimated cost.
> >
> >
> >>       499 | 25010
> >>       501 |  3318
> >>
> >>
> >> before dump reload:
> >> tradein_clients=# VACUUM VERBOSE ANALYZE email_bank_mailing_lists; NOTICE:  --Relation
> >> email_bank_mailing_lists--
> >> NOTICE:  Pages 3583: Changed 0, Empty 0; Tup 256419: Vac 0, Keep 0, UnUsed 44822.
> >>         Total CPU 0.24s/0.04u sec elapsed 0.30 sec.
> >> NOTICE:  Analyzing email_bank_mailing_lists
> >> VACUUM
> >> tradein_clients=# explain SELECT count( email_id  )  from email_bank_mailing_lists   where
> >> query_id=499;NOTICE:  QUERY PLAN:
> >>
> >> Aggregate  (cost=6863.24..6863.24 rows=1 width=4)
> >>   ->  Seq Scan on email_bank_mailing_lists  (cost=0.00..6788.24 rows=30001 width=4)
> >>
> >> EXPLAIN
> >
> > --
> >  Rod Taylor
>
>
>
> -----------------------------------------
> Get your free web based email at trade-india.com.
>    "India's Leading B2B eMarketplace.!"
> http://www.trade-india.com/
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
>
--
  Rod Taylor


Re: Is dump-reload the only cure?

От
Rod Taylor
Дата:
On Fri, 2002-11-01 at 08:03, mallah@trade-india.com wrote:
>
> Hi Rod ,
>
> Does it means that index scan is used for less frequenlty occuring data?
> yes my table was not clustered.
>
> can u tell me what does 0.00..6788.24 and rows and width means?
>
> in explain out put cost=0.00..6788.24 rows=30001 width=4
>
>
> I have one more table where i face the similar problem , i have not dump - reloaded
> it yet , will post again if i face the problem.

Keep in mind that an index scan is very expensive in regards to a single
tuple.  It has to run through (fetch) the index pages, then fetch the
pages from the table.  Since the table fetches are random, the harddrive
will probably incur a seek for each tuple found in the index.  The seeks
add up much quicker than a sequential scan (without nearly as many seeks
or drive head movements).

--
  Rod Taylor


Re: Is dump-reload the only cure?

От
Rod Taylor
Дата:
> Does it Mean that clustered indexes are guarrented  to be used for index scan?
> one more thing does clustering means that all future data addition will happen
> in the ordered manner only i mean consecutively in terms of source_id?

No, but clustering a table allows an index scan to visit less pages, and
make less disk seeks. This in turn makes it a better choice for some
queries due to the current layout of tuples on the disk.  However, there
are new borderline cases -- just in different places than before.

--
  Rod Taylor


Re: Is dump-reload the only cure?

От
Дата:

Thanks for the insight Rod.
is there any other place i can know more abt these principles?

But if the table is clustered then the pages are stored catagiously
with respect to that column rite?


> On Fri, 2002-11-01 at 08:03, mallah@trade-india.com wrote:
>>
>> Hi Rod ,
>>
>> Does it means that index scan is used for less frequenlty occuring data? yes my table was not
>> clustered.
>>
>> can u tell me what does 0.00..6788.24 and rows and width means?
>>
>> in explain out put cost=0.00..6788.24 rows=30001 width=4
>>
>>
>> I have one more table where i face the similar problem , i have not dump - reloaded it yet ,
>> will post again if i face the problem.
>
> Keep in mind that an index scan is very expensive in regards to a single tuple.  It has to run
> through (fetch) the index pages, then fetch the pages from the table.  Since the table fetches
> are random, the harddrive will probably incur a seek for each tuple found in the index.  The
> seeks add up much quicker than a sequential scan (without nearly as many seeks or drive head
> movements).
>
> --
>  Rod Taylor



-----------------------------------------
Get your free web based email at trade-india.com.
   "India's Leading B2B eMarketplace.!"
http://www.trade-india.com/



Re: [pgsql-performance] Is dump-reload the only cure?

От
Robert Treat
Дата:
On Fri, 2002-11-01 at 06:15, mallah@trade-india.com wrote:
> tradein_clients=# VACUUM VERBOSE ANALYZE email_bank_mailing_lists;
> NOTICE:  --Relation email_bank_mailing_lists--
> NOTICE:  Pages 3583: Changed 0, Empty 0; Tup 256419: Vac 0, Keep 0, UnUsed 44822.
>         Total CPU 0.24s/0.04u sec elapsed 0.30 sec.
> NOTICE:  Analyzing email_bank_mailing_lists
> VACUUM

I'd suggest running a vacuum full and then running vacuum analyze more
often.  44822 unused tuples seems quite excessive to me...

Robert Treat