Re: [Gmod-schema] Re: performace problem after VACUUM ANALYZE

Поиск
Список
Период
Сортировка
От ShengQiang Shu
Тема Re: [Gmod-schema] Re: performace problem after VACUUM ANALYZE
Дата
Msg-id 3E4D59C5.304@fruitfly.org
обсуждение исходный текст
Ответ на performace problem after VACUUM ANALYZE  (Scott Cain <cain@cshl.org>)
Список pgsql-performance
PG really does not do the right thing as I mentioned earlier for joining
tables. To force not to use seqscan, it still does not use right index
(srcfeature_id, nbeg, nend) and performance is even worse.

c_gadfly3=# \d fl_src_b_e_key;
Index "public.fl_src_b_e_key"
     Column     |  Type
---------------+---------
  srcfeature_id | integer
  nbeg          | integer
  nend          | integer
btree, for table "public.featureloc"

c_gadfly3=# explain analyze select * from featureloc fl, feature f where
f.feature_id = fl.feature_id and srcfeature_id=1 and (nbeg >= 1000 and
nend <= 2000);

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------
  Merge Join  (cost=115700.27..232516.85 rows=69007 width=445) (actual
time=12342.97..12461.23 rows=32 loops=1)
    Merge Cond: ("outer".feature_id = "inner".feature_id)
    ->  Index Scan using feature_pkey on feature f
(cost=0.00..110535.53 rows=2060653 width=361) (actual time=17.85..490.86
rows=28341 loops=1)
    ->  Sort  (cost=115700.27..115872.79 rows=69007 width=84) (actual
time=11944.23..11944.25 rows=32 loops=1)
          Sort Key: fl.feature_id
          ->  Seq Scan on featureloc fl  (cost=0.00..107580.43
rows=69007 width=84) (actual time=375.85..11944.10 rows=32 loops=1)
                Filter: ((srcfeature_id = 1) AND (nbeg >= 1000) AND
(nend <= 2000))
  Total runtime: 12461.37 msec
(8 rows)

c_gadfly3=#
c_gadfly3=# set enable_seqscan=0;
SET
c_gadfly3=# explain analyze select * from featureloc fl, feature f where
f.feature_id = fl.feature_id and srcfeature_id=1 and (nbeg >= 1000 and
nend <= 2000);

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------
  Merge Join  (cost=0.00..236345.49 rows=69007 width=445) (actual
time=721.75..26078.64 rows=32 loops=1)
    Merge Cond: ("outer".feature_id = "inner".feature_id)
    ->  Index Scan using fl_feature_id_key on featureloc fl
(cost=0.00..119701.43 rows=69007 width=84) (actual time=549.14..25854.12
rows=32 loops=1)
          Filter: ((srcfeature_id = 1) AND (nbeg >= 1000) AND (nend <=
2000))
    ->  Index Scan using feature_pkey on feature f
(cost=0.00..110535.53 rows=2060653 width=361) (actual time=50.95..200.37
rows=28342 loops=1)
  Total runtime: 26078.80 msec
(6 rows)




Scott Cain wrote:

> An update: I ran alter table as suggested, ie,
>
> alter table featureloc alter srcfeature_id set statistics 100;
>
> on each column in the table, running vacuum analyze and explain analyze
> on the query in between each alter to see if it made any difference.  It
> did not.  Postgres still instists on doing a seq scan on featureloc:
>
> Unique  (cost=336831.46..337179.45 rows=2320 width=47) (actual
> time=27219.62..27220.30 rows=179 loops=1)
>   ->  Sort  (cost=336831.46..336831.46 rows=23200 width=47) (actual
> time=27219.61..27219.80 rows=186 loops=1)
>         ->  Nested Loop  (cost=0.00..334732.77 rows=23200 width=47)
> (actual time=1003.04..27217.99 rows=186 loops=1)
>               ->  Seq Scan on featureloc fl  (cost=0.00..261709.31
> rows=23200 width=14) (actual time=814.68..26094.18 rows=186 loops=1)
>               ->  Index Scan using feature_pkey on feature f
> (cost=0.00..3.14 rows=1 width=33) (actual time=6.03..6.03 rows=1
> loops=186)
> Total runtime: 27220.63 msec
>
>
> On Fri, 2003-02-14 at 12:29, Scott Cain wrote:
>
> >Tom,
> >
> >Sorry about that:  I'll try to briefly give the information you are
> >looking for.  I've read the docs on ALTER TABLE, but it is not clear to
> >me what columns I should change STATISTICS on, or should I just do it on
> >all of the columns for which indexes exist?
> >
> >Here's the query again:
> >
> >select distinct f.name,fl.nbeg,fl.nend,fl.strand,f.type_id,f.feature_id
> >    from feature f, featureloc fl
> >    where
> >      fl.srcfeature_id = 1 and
> >      ((fl.strand=1  and fl.nbeg <= 393164 and fl.nend >= 390956) OR
> >      (fl.strand=-1 and fl.nend <= 393164 and fl.nbeg >= 390956)) and
> >      f.feature_id  = fl.feature_id
> >
> >--------------------------------------------------------------------------
> >
> >Naive database:
> >
> >Unique  (cost=75513.46..75513.48 rows=1 width=167) (actual
> >time=22815.25..22815.93 rows=179 loops=1)
> >  ->  Sort  (cost=75513.46..75513.46 rows=1 width=167) (actual
> >time=22815.24..22815.43 rows=186 loops=1)
> >        ->  Nested Loop  (cost=0.00..75513.45 rows=1 width=167) (actual
> >time=2471.25..22814.01 rows=186 loops=1)
> >              ->  Index Scan using featureloc_idx2 on featureloc fl
> >(cost=0.00..75508.43 rows=1 width=14) (actual time=2463.83..22796.50
> >rows=186 loops=1)
> >              ->  Index Scan using feature_pkey on feature f
> >(cost=0.00..5.01 rows=1 width=153) (actual time=0.08..0.08 rows=1
> >loops=186)
> >Total runtime: 22816.63 msec
> >--------------------------------------------------------------------------
> >
> >Naive database after featureloc_idx2 dropped:
> >
> >Unique  (cost=75545.46..75545.48 rows=1 width=167) (actual
> >time=5232.36..5234.51 rows=179 loops=1)
> >  ->  Sort  (cost=75545.46..75545.46 rows=1 width=167) (actual
> >time=5232.35..5232.54 rows=186 loops=1)
> >        ->  Nested Loop  (cost=0.00..75545.45 rows=1 width=167) (actual
> >time=291.46..5220.69 rows=186 loops=1)
> >              ->  Index Scan using featureloc_src_strand_beg_end on
> >featureloc fl  (cost=0.00..75540.43 rows=1 width=14) (actual
> >time=291.30..5214.46 rows=186 loops=1)
> >              ->  Index Scan using feature_pkey on feature f
> >(cost=0.00..5.01 rows=1 width=153) (actual time=0.02..0.03 rows=1
> >loops=186)
> >Total runtime: 5234.89 msec
> >--------------------------------------------------------------------------
> >
> >Database after VACUUM ANALYZE was run:
> >
> >Unique  (cost=344377.70..344759.85 rows=2548 width=47) (actual
> >time=26466.82..26467.51 rows=179 loops=1)
> >  ->  Sort  (cost=344377.70..344377.70 rows=25477 width=47) (actual
> >time=26466.82..26467.01 rows=186 loops=1)
> >        ->  Nested Loop  (cost=0.00..342053.97 rows=25477 width=47)
> >(actual time=262.66..26465.63 rows=186 loops=1)
> >              ->  Seq Scan on featureloc fl  (cost=0.00..261709.31
> >rows=25477 width=14) (actual time=118.62..26006.05 rows=186 loops=1)
> >              ->  Index Scan using feature_pkey on feature f
> >(cost=0.00..3.14 rows=1 width=33) (actual time=2.45..2.46 rows=1
> >loops=186)
> >Total runtime: 26467.85 msec
> >--------------------------------------------------------------------------
> >
> >After disallowing seqscans (set enable_seqscan=0):
> >
> >Unique  (cost=356513.46..356895.61 rows=2548 width=47) (actual
> >time=27494.62..27495.34 rows=179 loops=1)
> >  ->  Sort  (cost=356513.46..356513.46 rows=25477 width=47) (actual
> >time=27494.61..27494.83 rows=186 loops=1)
> >        ->  Nested Loop  (cost=0.00..354189.73 rows=25477 width=47)
> >(actual time=198.88..27493.48 rows=186 loops=1)
> >              ->  Index Scan using featureloc_idx1 on featureloc fl
> >(cost=0.00..273845.08 rows=25477 width=14) (actual time=129.30..27280.95
> >rows=186 loops=1)
> >              ->  Index Scan using feature_pkey on feature f
> >(cost=0.00..3.14 rows=1 width=33) (actual time=1.13..1.13 rows=1
> >loops=186)
> >Total runtime: 27495.66 msec
> >--------------------------------------------------------------------------
> >
> >After dropping featureloc_idx1:
> >
> >Unique  (cost=1310195.21..1310577.36 rows=2548 width=47) (actual
> >time=21692.69..21693.37 rows=179 loops=1)
> >  ->  Sort  (cost=1310195.21..1310195.21 rows=25477 width=47) (actual
> >time=21692.69..21692.88 rows=186 loops=1)
> >        ->  Nested Loop  (cost=0.00..1307871.48 rows=25477 width=47)
> >(actual time=2197.65..21691.39 rows=186 loops=1)
> >              ->  Index Scan using featureloc_idx2 on featureloc fl
> >(cost=0.00..1227526.82 rows=25477 width=14) (actual
> >time=2197.49..21618.89 rows=186 loops=1)
> >              ->  Index Scan using feature_pkey on feature f
> >(cost=0.00..3.14 rows=1 width=33) (actual time=0.37..0.38 rows=1
> >loops=186)
> >Total runtime: 21693.72 msec
> >--------------------------------------------------------------------------
> >
> >After dropping featureloc_idx2:
> >
> >Unique  (cost=1414516.98..1414899.13 rows=2548 width=47) (actual
> >time=1669.17..1669.86 rows=179 loops=1)
> >  ->  Sort  (cost=1414516.98..1414516.98 rows=25477 width=47) (actual
> >time=1669.17..1669.36 rows=186 loops=1)
> >        ->  Nested Loop  (cost=0.00..1412193.25 rows=25477 width=47)
> >(actual time=122.69..1668.08 rows=186 loops=1)
> >              ->  Index Scan using featureloc_src_strand_beg_end on
> >featureloc fl  (cost=0.00..1331848.60 rows=25477 width=14) (actual
> >time=122.51..1661.81 rows=186 loops=1)
> >              ->  Index Scan using feature_pkey on feature f
> >(cost=0.00..3.14 rows=1 width=33) (actual time=0.02..0.03 rows=1
> >loops=186)
> >Total runtime: 1670.20 msec
> >
> >
> >On Fri, 2003-02-14 at 12:00, Tom Lane wrote:
> >
> >>Scott Cain  writes:
> >>
> >>>[ much stuff ]
> >>
> >>Could we see EXPLAIN ANALYZE, not just EXPLAIN, output for all these
> >>alternatives?  Your question boils down to "why is the planner
> >>misestimating these queries" ... which is a difficult question to
> >>answer when given only the estimates and not the reality.
> >>
> >>A suggestion though is that you might need to raise the statistics
> >>target on the indexed columns, so that ANALYZE will collect
> >>finer-grained statistics.  (See ALTER TABLE ... SET STATISTICS.)
> >>Try booting it up to 100 (from the default 10), re-analyze, and
> >>then see if/how the plans change.
> >>
> >>            regards, tom lane



В списке pgsql-performance по дате отправления:

Предыдущее
От: Scott Cain
Дата:
Сообщение: Re: [Gmod-schema] Re: performace problem after VACUUM
Следующее
От: Tom Lane
Дата:
Сообщение: Re: performace problem after VACUUM ANALYZE