Thread: Huge Performance Difference on Similar Query in Pg7.2


Huge Performance Difference on Similar Query in Pg7.2

From:
Heiko Klein
Date:
Hi,

I have two views:
LowHiSectorGridEmissionsView: > 1,000,000 rows
DataSetsView: > 30,000 rows

When I now perform the two similar queries (which return equal results) on
those views:
1)
select count(*) from EmissionsView, DataSetsView
where DataSetsView.setid = EmissionsView.setid
  and EmissionsView.setid = '4614' ;
      -------------
2)
select count(*) from EmissionsView, DataSetsView
where DataSetsView.setid = EmissionsView.setid
  and DataSetsView.setid = '4614' ;
      ------------

I have a huge performance difference. 1) does an index scan on
EmissionsView and a Seq scan on DataSetsView: time: 1-2s.
2) does an index scan on DataSetsView and a Seq scan on EmissionsView:
time: ~50s.

The count(*) isn't the point here; the same is true for every other type
of select.

I'm just porting the database from Ora7 on a 2x200MHz SGI machine to
Pg7.2 on an 850MHz Athlon/Linux PC. On Ora7, both queries take about the
same time: 1-2s. I cannot easily change all the queries, since I only have
time to work on the database, not on the frontend, which generates the
queries.


Why is Pg7.2 using a Seq scan at all (even after a vacuum analyze!)? And
if a Seq scan is needed: why doesn't the query planner rewrite the query
to the much faster version itself? And most important: how should I
proceed? Is there a Pg-internal solution for this, or do I have to go the
hard way and modify the frontend?

Heiko


Below are the explain results:
1)
emep=> explain select  count(*) from  EmissionsView ,    DataSetsView where  DataSetsView.setid = EmissionsView.setid
and    EmissionsView.setid = '4614' ; 
NOTICE:  QUERY PLAN:

Aggregate  (cost=5975.66..5975.66 rows=1 width=131)
  ->  Hash Join  (cost=5958.18..5975.58 rows=35 width=131)
        ->  Hash Join  (cost=5956.30..5972.99 rows=35 width=127)
              ->  Hash Join  (cost=5954.41..5970.40 rows=35 width=123)
                    ->  Seq Scan on reports  (cost=0.00..11.78 rows=378 width=8)
                    ->  Hash  (cost=5954.30..5954.30 rows=42 width=115)
                          ->  Nested Loop  (cost=4375.32..5954.30 rows=42 width=115)
                                ->  Nested Loop  (cost=4375.32..5812.95 rows=42 width=103)
                                      ->  Seq Scan on footnotes  (cost=0.00..1.01 rows=1 width=4)
                                      ->  Materialize  (cost=5811.31..5811.31 rows=42 width=99)
                                            ->  Hash Join  (cost=4375.32..5811.31 rows=42 width=99)
                                                  ->  Hash Join  (cost=4371.95..5796.47 rows=307 width=77)
                                                        ->  Merge Join  (cost=4370.73..5789.88 rows=307 width=50)
                                                              ->  Sort  (cost=4370.73..4370.73 rows=28192 width=38)
                                                                    ->  Hash Join  (cost=24.06..1834.10 rows=28192
width=38)
                                                                          ->  Hash Join  (cost=22.17..1268.37
rows=28192 width=34)
                                                                                ->  Seq Scan on datasets
(cost=0.00..724.86 rows=33786 width=22)
                                                                                ->  Hash  (cost=21.23..21.23 rows=378
width=12)
                                                                                      ->  Hash Join  (cost=1.89..21.23
rows=378 width=12)
                                                                                            ->  Seq Scan on reports
(cost=0.00..11.78 rows=378 width=8)
                                                                                            ->  Hash  (cost=1.71..1.71
rows=71 width=4)
                                                                                                  ->  Seq Scan on areas
(cost=0.00..1.71 rows=71 width=4) 
                                                                          ->  Hash  (cost=1.71..1.71 rows=71 width=4)
                                                                                ->  Seq Scan on areas  (cost=0.00..1.71
rows=71 width=4)
                                                              ->  Index Scan using lhsecgriemis_setid_idx on emissions
(cost=0.00..1343.91 rows=368 width=12)
                                                        ->  Hash  (cost=1.18..1.18 rows=18 width=27)
                                                              ->  Seq Scan on datatypes  (cost=0.00..1.18 rows=18
width=27)
                                                  ->  Hash  (cost=3.09..3.09 rows=109 width=22)
                                                        ->  Seq Scan on sectordefinitions  (cost=0.00..3.09 rows=109
width=22)
                                ->  Index Scan using datasets_setid_idx on datasets  (cost=0.00..3.37 rows=1 width=12)
              ->  Hash  (cost=1.71..1.71 rows=71 width=4)
                    ->  Seq Scan on areas  (cost=0.00..1.71 rows=71 width=4)
        ->  Hash  (cost=1.71..1.71 rows=71 width=4)
              ->  Seq Scan on areas  (cost=0.00..1.71 rows=71 width=4)

EXPLAIN





2)
emep=> explain select  count(*) from  EmissionsView ,    DataSetsView where  DataSetsView.setid = EmissionsView.setid
and    DataSetsView.setid = '4614' ; 
NOTICE:  QUERY PLAN:

Aggregate  (cost=91660.36..91660.36 rows=1 width=131)
  ->  Hash Join  (cost=2732.23..91660.35 rows=3 width=131)
        ->  Nested Loop  (cost=2719.53..91058.61 rows=117798 width=103)
              ->  Seq Scan on footnotes  (cost=0.00..1.01 rows=1 width=4)
              ->  Materialize  (cost=89290.63..89290.63 rows=117798 width=99)
                    ->  Hash Join  (cost=2719.53..89290.63 rows=117798 width=99)
                          ->  Hash Join  (cost=2716.17..56957.45 rows=863754 width=77)
                                ->  Seq Scan on emissions  (cost=0.00..18502.34 rows=1035134 width=12)
                                ->  Hash  (cost=2328.69..2328.69 rows=28192 width=65)
                                      ->  Hash Join  (cost=25.29..2328.69 rows=28192 width=65)
                                            ->  Hash Join  (cost=24.06..1834.10 rows=28192 width=38)
                                                  ->  Hash Join  (cost=22.17..1268.37 rows=28192 width=34)
                                                        ->  Seq Scan on datasets  (cost=0.00..724.86 rows=33786
width=22)
                                                        ->  Hash  (cost=21.23..21.23 rows=378 width=12)
                                                              ->  Hash Join  (cost=1.89..21.23 rows=378 width=12)
                                                                    ->  Seq Scan on reports  (cost=0.00..11.78 rows=378
width=8)
                                                                    ->  Hash  (cost=1.71..1.71 rows=71 width=4)
                                                                          ->  Seq Scan on areas  (cost=0.00..1.71
rows=71 width=4)
                                                  ->  Hash  (cost=1.71..1.71 rows=71 width=4)
                                                        ->  Seq Scan on areas  (cost=0.00..1.71 rows=71 width=4)
                                            ->  Hash  (cost=1.18..1.18 rows=18 width=27)
                                                  ->  Seq Scan on datatypes  (cost=0.00..1.18 rows=18 width=27)
                          ->  Hash  (cost=3.09..3.09 rows=109 width=22)
                                ->  Seq Scan on sectordefinitions  (cost=0.00..3.09 rows=109 width=22)
        ->  Hash  (cost=12.70..12.70 rows=1 width=28)
              ->  Hash Join  (cost=10.62..12.70 rows=1 width=28)
                    ->  Seq Scan on areas  (cost=0.00..1.71 rows=71 width=4)
                    ->  Hash  (cost=10.62..10.62 rows=1 width=24)
                          ->  Nested Loop  (cost=0.00..10.62 rows=1 width=24)
                                ->  Nested Loop  (cost=0.00..5.96 rows=1 width=16)
                                      ->  Index Scan using datasets_setid_idx on datasets  (cost=0.00..3.37 rows=1
width=12)
                                      ->  Seq Scan on areas  (cost=0.00..1.71 rows=71 width=4)
                                ->  Index Scan using reports_pkey on reports  (cost=0.00..4.64 rows=1 width=8)

EXPLAIN


Re: Huge Performance Difference on Similar Query in Pg7.2

From:
"Luis Alberto Amigo Navarro"
Date:
Can you send "explain analyze" results instead of "explain", so that we can
see the real times?
Regards


Re: Huge Performance Difference on Similar Query in Pg7.2

From:
Heiko Klein
Date:
Luis Alberto Amigo Navarro writes:
 > Can you send "explain analyze" results instead of "explain", so that we can
 > see the real times?
 > Regards
 >
Here it is:
1)
emep=> explain analyze select  count(*) from  EmissionsView ,    DataSetsView where  DataSetsView.setid =
EmissionsView.setid and EmissionsView.setid = '4614' ;
NOTICE:  QUERY PLAN:

Aggregate  (cost=5975.66..5975.66 rows=1 width=131) (actual time=1264.12..1264.13 rows=1 loops=1)
  ->  Hash Join  (cost=5958.18..5975.58 rows=35 width=131) (actual time=1205.90..1262.46 rows=1606 loops=1)
        ->  Hash Join  (cost=5956.30..5972.99 rows=35 width=127) (actual time=1197.85..1248.93 rows=1606 loops=1)
              ->  Hash Join  (cost=5954.41..5970.40 rows=35 width=123) (actual time=1197.50..1224.92 rows=1606 loops=1)
                    ->  Seq Scan on reports  (cost=0.00..11.78 rows=378 width=8) (actual time=0.02..1.03 rows=378
loops=1)
                    ->  Hash  (cost=5954.30..5954.30 rows=42 width=115) (actual time=1193.55..1193.55 rows=0 loops=1)
                          ->  Nested Loop  (cost=4375.32..5954.30 rows=42 width=115) (actual time=1121.44..1188.23
rows=1606 loops=1)
                                ->  Nested Loop  (cost=4375.32..5812.95 rows=42 width=103) (actual
time=1115.41..1134.58 rows=1606 loops=1)
                                      ->  Seq Scan on footnotes  (cost=0.00..1.01 rows=1 width=4) (actual
time=0.01..0.02 rows=1 loops=1)
                                      ->  Materialize  (cost=5811.31..5811.31 rows=42 width=99) (actual
time=1115.36..1116.33 rows=1606 loops=1)
                                            ->  Hash Join  (cost=4375.32..5811.31 rows=42 width=99) (actual
time=1047.90..1112.31 rows=1606 loops=1)
                                                  ->  Hash Join  (cost=4371.95..5796.47 rows=307 width=77) (actual
time=1042.42..1076.85 rows=1606 loops=1)
                                                        ->  Merge Join  (cost=4370.73..5789.88 rows=307 width=50)
(actual time=1039.84..1059.42 rows=1606 loops=1)
                                                              ->  Sort  (cost=4370.73..4370.73 rows=28192 width=38)
(actual time=996.25..1001.69 rows=3830 loops=1)
                                                                    ->  Hash Join  (cost=24.06..1834.10 rows=28192
width=38) (actual time=7.50..507.66 rows=33593 loops=1)
                                                                          ->  Hash Join  (cost=22.17..1268.37
rows=28192 width=34) (actual time=7.15..303.78 rows=33593 loops=1)
                                                                                ->  Seq Scan on datasets
(cost=0.00..724.86 rows=33786 width=22) (actual time=0.36..122.78 rows=33786 loops=1)
                                                                                ->  Hash  (cost=21.23..21.23 rows=378
width=12) (actual time=6.73..6.73 rows=0 loops=1)
                                                                                      ->  Hash Join  (cost=1.89..21.23
rows=378 width=12) (actual time=3.17..6.03 rows=377 loops=1)
                                                                                            ->  Seq Scan on reports
(cost=0.00..11.78 rows=378 width=8) (actual time=2.83..4.05 rows=378 loops=1)
                                                                                            ->  Hash  (cost=1.71..1.71
rows=71 width=4) (actual time=0.26..0.26 rows=0 loops=1)
                                                                                                  ->  Seq Scan on areas
(cost=0.00..1.71 rows=71 width=4) (actual time=0.01..0.14 rows=71 loops=1) 
                                                                          ->  Hash  (cost=1.71..1.71 rows=71 width=4)
(actual time=0.27..0.27 rows=0 loops=1)
                                                                                ->  Seq Scan on areas  (cost=0.00..1.71
rows=71 width=4) (actual time=0.01..0.15 rows=71 loops=1)
                                                              ->  Index Scan using lhsecgriemis_setid_idx on emissions
(cost=0.00..1343.91 rows=368 width=12) (actual time=33.77..42.61 rows=1606 loops=1)
                                                        ->  Hash  (cost=1.18..1.18 rows=18 width=27) (actual
time=2.49..2.49 rows=0 loops=1)
                                                              ->  Seq Scan on datatypes  (cost=0.00..1.18 rows=18
width=27) (actual time=2.40..2.44 rows=18 loops=1)
                                                  ->  Hash  (cost=3.09..3.09 rows=109 width=22) (actual time=5.41..5.41
rows=0 loops=1)
                                                        ->  Seq Scan on sectordefinitions  (cost=0.00..3.09 rows=109
width=22) (actual time=4.89..5.19 rows=109 loops=1)
                                ->  Index Scan using datasets_setid_idx on datasets  (cost=0.00..3.37 rows=1 width=12)
(actual time=0.02..0.02 rows=1 loops=1606)
              ->  Hash  (cost=1.71..1.71 rows=71 width=4) (actual time=0.25..0.25 rows=0 loops=1)
                    ->  Seq Scan on areas  (cost=0.00..1.71 rows=71 width=4) (actual time=0.01..0.13 rows=71 loops=1)
        ->  Hash  (cost=1.71..1.71 rows=71 width=4) (actual time=7.95..7.95 rows=0 loops=1)
              ->  Seq Scan on areas  (cost=0.00..1.71 rows=71 width=4) (actual time=7.69..7.83 rows=71 loops=1)
Total runtime: 1361.51 msec

EXPLAIN



2)
emep=> explain analyze select  count(*) from  EmissionsView ,    DataSetsView where  DataSetsView.setid =
EmissionsView.setid and DataSetsView.setid = '4614' ;
NOTICE:  QUERY PLAN:

Aggregate  (cost=91660.36..91660.36 rows=1 width=131) (actual time=64414.80..64414.80 rows=1 loops=1)
  ->  Hash Join  (cost=2732.23..91660.35 rows=3 width=131) (actual time=58428.47..64413.14 rows=1606 loops=1)
        ->  Nested Loop  (cost=2719.53..91058.61 rows=117798 width=103) (actual time=49523.50..63005.67 rows=1025405
loops=1)
              ->  Seq Scan on footnotes  (cost=0.00..1.01 rows=1 width=4) (actual time=0.01..0.03 rows=1 loops=1)
              ->  Materialize  (cost=89290.63..89290.63 rows=117798 width=99) (actual time=49523.43..51974.76
rows=1025405 loops=1)
                    ->  Hash Join  (cost=2719.53..89290.63 rows=117798 width=99) (actual time=1619.56..47188.00
rows=1025405 loops=1)
                          ->  Hash Join  (cost=2716.17..56957.45 rows=863754 width=77) (actual time=1617.06..27358.00
rows=1035128 loops=1)
                                ->  Seq Scan on emissions  (cost=0.00..18502.34 rows=1035134 width=12) (actual
time=6.42..5027.94 rows=1035134 loops=1)
                                ->  Hash  (cost=2328.69..2328.69 rows=28192 width=65) (actual time=848.29..848.29
rows=0 loops=1)
                                      ->  Hash Join  (cost=25.29..2328.69 rows=28192 width=65) (actual
time=12.33..762.21 rows=33593 loops=1)
                                            ->  Hash Join  (cost=24.06..1834.10 rows=28192 width=38) (actual
time=10.57..519.08 rows=33593 loops=1)
                                                  ->  Hash Join  (cost=22.17..1268.37 rows=28192 width=34) (actual
time=10.22..315.37 rows=33593 loops=1)
                                                        ->  Seq Scan on datasets  (cost=0.00..724.86 rows=33786
width=22) (actual time=0.44..129.13 rows=33786 loops=1)
                                                        ->  Hash  (cost=21.23..21.23 rows=378 width=12) (actual
time=9.71..9.71 rows=0 loops=1)
                                                              ->  Hash Join  (cost=1.89..21.23 rows=378 width=12)
(actual time=5.23..8.98 rows=377 loops=1)
                                                                    ->  Seq Scan on reports  (cost=0.00..11.78 rows=378
width=8) (actual time=4.90..7.04 rows=378 loops=1)
                                                                    ->  Hash  (cost=1.71..1.71 rows=71 width=4) (actual
time=0.26..0.26 rows=0 loops=1)
                                                                          ->  Seq Scan on areas  (cost=0.00..1.71
rows=71 width=4) (actual time=0.01..0.14 rows=71 loops=1)
                                                  ->  Hash  (cost=1.71..1.71 rows=71 width=4) (actual time=0.27..0.27
rows=0 loops=1)
                                                        ->  Seq Scan on areas  (cost=0.00..1.71 rows=71 width=4)
(actual time=0.01..0.15 rows=71 loops=1)
                                            ->  Hash  (cost=1.18..1.18 rows=18 width=27) (actual time=1.69..1.69 rows=0
loops=1)
                                                  ->  Seq Scan on datatypes  (cost=0.00..1.18 rows=18 width=27) (actual
time=1.60..1.64 rows=18 loops=1)
                          ->  Hash  (cost=3.09..3.09 rows=109 width=22) (actual time=2.44..2.44 rows=0 loops=1)
                                ->  Seq Scan on sectordefinitions  (cost=0.00..3.09 rows=109 width=22) (actual
time=1.91..2.20 rows=109 loops=1)
        ->  Hash  (cost=12.70..12.70 rows=1 width=28) (actual time=25.72..25.72 rows=0 loops=1)
              ->  Hash Join  (cost=10.62..12.70 rows=1 width=28) (actual time=25.53..25.71 rows=1 loops=1)
                    ->  Seq Scan on areas  (cost=0.00..1.71 rows=71 width=4) (actual time=0.01..0.14 rows=71 loops=1)
                    ->  Hash  (cost=10.62..10.62 rows=1 width=24) (actual time=25.44..25.44 rows=0 loops=1)
                          ->  Nested Loop  (cost=0.00..10.62 rows=1 width=24) (actual time=25.24..25.43 rows=1 loops=1)
                                ->  Nested Loop  (cost=0.00..5.96 rows=1 width=16) (actual time=21.79..21.98 rows=1
loops=1)
                                      ->  Index Scan using datasets_setid_idx on datasets  (cost=0.00..3.37 rows=1
width=12) (actual time=15.48..15.48 rows=1 loops=1)
                                      ->  Seq Scan on areas  (cost=0.00..1.71 rows=71 width=4) (actual time=6.23..6.38
rows=71 loops=1)
                                ->  Index Scan using reports_pkey on reports  (cost=0.00..4.64 rows=1 width=8) (actual
time=3.44..3.44 rows=1 loops=1)
Total runtime: 64568.82 msec

EXPLAIN

Re: Huge Performance Difference on Similar Query in Pg7.2

From:
"Luis Alberto Amigo Navarro"
Date:
First of all, the difference between estimated times and real times may be
reduced by tuning your postgresql.conf.
Try setting enable_seqscan to off and then re-running explain analyze to see
if there is another possible solution.
Regards
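
For reference, the suggestion amounts to something like the following, run in
a psql session (the SELECT is just the slow query from this thread; SET only
affects the current session, so postgresql.conf itself is left untouched):

SET enable_seqscan TO off;

EXPLAIN ANALYZE
SELECT count(*)
  FROM EmissionsView, DataSetsView
 WHERE DataSetsView.setid = EmissionsView.setid
   AND DataSetsView.setid = '4614';

SET enable_seqscan TO on;   -- restore the default afterwards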


Re: Huge Performance Difference on Similar Query in Pg7.2

From:
Heiko Klein
Date:
Luis Alberto Amigo Navarro writes:
 > First of all, the difference between estimated times and real times may be
 > reduced by tuning your postgresql.conf.
 > Try setting enable_seqscan to off and then re-running explain analyze to see
 > if there is another possible solution.
 > Regards

About the times: how do I adjust those? Does it really make such a huge
difference, or is it just for better display?

I set enable_seqscan to off, and it changed the Seq scan on emissions
to an index scan on emissions, but the time didn't change; there is still
about a factor of 50 between the two queries:

New explain analyze report of 2)
emep=> explain analyze select  count(*) from  LowHiSectorGridEmissionsView ,    DataSetsView where  DataSetsView.setid
='4614' and DataSetsView.setid = LowHiSectorGridEmissionsView.setid ; 
NOTICE:  QUERY PLAN:

Aggregate  (cost=103440665.94..103440665.94 rows=1 width=131) (actual time=61151.81..61151.81 rows=1 loops=1)
  ->  Hash Join  (cost=100005420.21..103440665.93 rows=4 width=131) (actual time=49892.37..61150.12 rows=1606 loops=1)
        ->  Nested Loop  (cost=100005403.12..103440051.40 rows=119480 width=103) (actual time=46157.46..59840.49
rows=1025405 loops=1)
              ->  Seq Scan on footnotes  (cost=100000000.00..100000001.01 rows=1 width=4) (actual time=0.01..0.03
rows=1 loops=1)
              ->  Materialize  (cost=3438258.19..3438258.19 rows=119480 width=99) (actual time=46157.41..48845.75
rows=1025405 loops=1)
                    ->  Hash Join  (cost=5403.12..3438258.19 rows=119480 width=99) (actual time=1344.17..43964.04
rows=1025405 loops=1)
                          ->  Merge Join  (cost=5392.13..3405513.72 rows=871449 width=77) (actual
time=1336.38..23857.31 rows=1035128 loops=1)
                                ->  Sort  (cost=5392.13..5392.13 rows=28443 width=65) (actual time=1323.65..1383.10
rows=31471 loops=1)
                                      ->  Hash Join  (cost=54.91..2615.14 rows=28443 width=65) (actual
time=21.62..901.24 rows=33593 loops=1)
                                            ->  Hash Join  (cost=50.15..2112.63 rows=28443 width=38) (actual
time=11.27..646.34 rows=33593 loops=1)
                                                  ->  Hash Join  (cost=43.48..1537.09 rows=28443 width=34) (actual
time=10.80..443.36 rows=33593 loops=1)
                                                        ->  Index Scan using datasets_setid_idx on datasets
(cost=0.00..969.14 rows=33786 width=22) (actual time=0.64..249.20 rows=33786 loops=1)
                                                        ->  Hash  (cost=42.53..42.53 rows=378 width=12) (actual
time=10.08..10.08 rows=0 loops=1)
                                                              ->  Hash Join  (cost=6.68..42.53 rows=378 width=12)
(actual time=0.53..9.39 rows=377 loops=1)
                                                                    ->  Index Scan using reports_pkey on reports
(cost=0.00..28.29 rows=378 width=8) (actual time=0.07..7.25 rows=378 loops=1)
                                                                    ->  Hash  (cost=6.50..6.50 rows=71 width=4) (actual
time=0.38..0.38 rows=0 loops=1)
                                                                          ->  Index Scan using areas_pkey on areas
(cost=0.00..6.50 rows=71 width=4) (actual time=0.01..0.24 rows=71 loops=1)
                                                  ->  Hash  (cost=6.50..6.50 rows=71 width=4) (actual time=0.40..0.40
rows=0 loops=1)
                                                        ->  Index Scan using areas_pkey on areas  (cost=0.00..6.50
rows=71 width=4) (actual time=0.02..0.27 rows=71 loops=1)
                                            ->  Hash  (cost=4.71..4.71 rows=18 width=27) (actual time=10.26..10.26
rows=0 loops=1)
                                                  ->  Index Scan using datatypes_pkey on datatypes  (cost=0.00..4.71
rows=18 width=27) (actual time=10.15..10.23 rows=18 loops=1)
                                ->  Index Scan using lhsecgriemis_setid_idx on lowhisectorgridemissions
(cost=0.00..3386569.53 rows=1035134 width=12) (actual time=12.67..13841.30 rows=1035134 loops=1)
                          ->  Hash  (cost=10.72..10.72 rows=109 width=22) (actual time=7.68..7.68 rows=0 loops=1)
                                ->  Index Scan using sectordefinitions_pkey on sectordefinitions  (cost=0.00..10.72
rows=109 width=22) (actual time=6.74..7.46 rows=109 loops=1)
        ->  Hash  (cost=17.09..17.09 rows=1 width=28) (actual time=27.32..27.32 rows=0 loops=1)
              ->  Nested Loop  (cost=0.00..17.09 rows=1 width=28) (actual time=27.29..27.31 rows=1 loops=1)
                    ->  Nested Loop  (cost=0.00..12.55 rows=1 width=24) (actual time=27.26..27.28 rows=1 loops=1)
                          ->  Nested Loop  (cost=0.00..8.01 rows=1 width=20) (actual time=12.04..12.05 rows=1 loops=1)
                                ->  Index Scan using datasets_setid_idx on datasets  (cost=0.00..3.36 rows=1 width=12)
(actual time=4.61..4.62 rows=1 loops=1)
                                ->  Index Scan using reports_pkey on reports  (cost=0.00..4.64 rows=1 width=8) (actual
time=7.41..7.41 rows=1 loops=1)
                          ->  Index Scan using areas_pkey on areas  (cost=0.00..4.52 rows=1 width=4) (actual
time=15.21..15.21 rows=1 loops=1)
                    ->  Index Scan using areas_pkey on areas  (cost=0.00..4.52 rows=1 width=4) (actual time=0.02..0.02
rows=1 loops=1)
Total runtime: 61309.75 msec

EXPLAIN

Re: Huge Performance Difference on Similar Query in Pg7.2

From:
"Luis Alberto Amigo Navarro"
Date:
In fact it seems it's not the planner's mistake; you may just need some more
indexing. What is the table layout, and what indexes do you have?


Re: Huge Performance Difference on Similar Query in Pg7.2

From:
Heiko Klein
Date:
Luis Alberto Amigo Navarro writes:
 > In fact it seems it's not the planner's mistake; you may just need some more
 > indexing. What is the table layout, and what indexes do you have?
 >

Those are the main tables:

CREATE TABLE DataSets
    (setid            INTEGER        NOT NULL,
      areaid          INTEGER     NULL,
      reportid         INTEGER     NOT NULL,
      datatype        VARCHAR(12)    NULL,
      component        VARCHAR(12)     NULL,
      year            INTEGER     NULL,
      updated            DATE         NULL,
      scaling            INTEGER     NULL,
      xcomment        VARCHAR(2000)     NULL,
      quality            CHARACTER(1)     DEFAULT '9'    NULL,
    PRIMARY KEY (setid),
    FOREIGN KEY (areaid) REFERENCES Areas(areaid),
    FOREIGN KEY (reportid) REFERENCES Reports(reportid),
    FOREIGN KEY (datatype) REFERENCES DataTypes(datatype),
    FOREIGN KEY (component) REFERENCES Components(component)
);
Additional indexes on areaid and reportid (as far as I understood, a primary
    key always has an index?)


CREATE TABLE LowHiSectorGridEmissions
    (setid            integer     not null,
    sector            integer        not null,
    x            integer        not null,
    y            integer        not null,
    lowemission        integer        null,
    highemission        integer        null,
    footnoteid        INTEGER        NULL,
    PRIMARY KEY (setid, sector, x, y),
    FOREIGN KEY (footnoteid) REFERENCES FootNotes(footnoteid),
    FOREIGN KEY (setid) REFERENCES DataSets
);
Additional indexes on setid, sector, x, y.
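
For reference, the extra indexes described above would have been created with
statements along these lines. The index names are illustrative, except for
lhsecgriemis_setid_idx and datasets_setid_idx, which do appear in the plans;
each primary key already carries an implicit unique index:

CREATE INDEX datasets_setid_idx    ON DataSets (setid);   -- probably redundant: setid is the primary key
CREATE INDEX datasets_areaid_idx   ON DataSets (areaid);
CREATE INDEX datasets_reportid_idx ON DataSets (reportid);

CREATE INDEX lhsecgriemis_setid_idx  ON LowHiSectorGridEmissions (setid);
CREATE INDEX lhsecgriemis_sector_idx ON LowHiSectorGridEmissions (sector);
-- ...and similarly for x and y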


The main Views:
CREATE VIEW DataSetsView
   AS SELECT
    setid,
        DataSets.areaid AS areaid,
        Areas.name AS name,
        Areas.acronym AS acronym,
        DataSets.reportid AS reportid,
    reportcode,
    reportyear,
    datereceived,
      datatype,
    origin,
      component,
      year,
    (SUBSTRING(origin FROM 1 FOR 1) ||
     SUBSTRING(datatype FROM 1 FOR 2) ||
     '-' ||
     TRIM(BOTH ' ' FROM component) ||
     '-' ||
     TRIM(BOTH ' ' FROM Areas.acronym) ||
     SUBSTRING(year FROM 3 FOR 2) ||
     SUBSTRING (reportyear FROM 3 FOR 2)
    )                    AS datasetcode,
      updated,
      scaling,
        DataSets.xcomment AS xcomment,
    datesent,
    datemscw,
    quality
    FROM DataSets, Areas, ReportsView
    WHERE DataSets.areaid = Areas.areaid
      AND DataSets.reportid = ReportsView.reportid
;


CREATE VIEW LowHiSectorGridEmissionsView
       AS SELECT
        LowHiSectorGridEmissions.setid AS setid,
        LowHiSectorGridEmissions.sector AS sector,
    sectorcode,
        x,
        y,
        scaling,
    ('EMIS_' ||
     scaling ||
     '_0x' ||
     int2hex(LowHiSectorGridEmissions.lowemission)
    )                AS exlowemission,
    ('EMIS_' ||
     scaling ||
     '_0x' ||
     int2hex(LowHiSectorGridEmissions.highemission)
    )                AS exhighemission,
    ('0x' ||
     int2hex(LowHiSectorGridEmissions.lowemission)
    )                AS rawlowemission,
    ('0x' ||
     int2hex(LowHiSectorGridEmissions.highemission)
    )                AS rawhighemission,
    FootNotes.footnote AS footnote
    FROM DataSetsView, LowHiSectorGridEmissions, DataTypes,
         SectorDefinitions, FootNotes
    WHERE DataSetsView.setid = LowHiSectorGridEmissions.setid
      AND DataSetsView.datatype = DataTypes.datatype
      AND DataTypes.sectordefinition = SectorDefinitions.sectordefinition
      AND LowHiSectorGridEmissions.sector = SectorDefinitions.sector
      AND ( LowHiSectorGridEmissions.footnoteid = FootNotes.footnoteid
        OR FootNotes.footnoteid = '0')
;

Areas, FootNotes, Components, DataTypes, SectorDefinitions are all small tables.

I also tried adding indexes on everything that appears in the WHERE
clauses of the views, without success.

Heiko


Below I add the complete table description:
-- $Author$
-- $Date$
-- $Header$
-- $Id$
-- $Name$
-- $Locker$
-- $Log$
-- $Revision$
-- $Source$
-- $State$

CREATE TABLE AreaCollections
    (collection        varchar(12)     not null,
    acronyms        varchar(2000)    null,
    description        varchar(100)    null,
    PRIMARY KEY (collection)
);

CREATE TABLE Areas
    (areaid        INTEGER         NOT NULL,
    includedin      INTEGER        NULL,
    name        VARCHAR(100)    NULL,
    acronym        VARCHAR(12)    NULL,
    areastart    DATE        NULL,
    areaend        DATE        NULL,
    PRIMARY KEY (areaid),
    FOREIGN KEY (includedin) REFERENCES Areas(areaid)
);

CREATE TABLE ComponentCategories
    (componentcategory   VARCHAR(12)    NOT NULL,
    description         VARCHAR(100)    NULL,
    PRIMARY KEY (componentcategory)
);

CREATE TABLE Components
    (component        VARCHAR(12)     NOT NULL,
    componentcategory    VARCHAR(12)    NOT NULL,
    name            VARCHAR(100)    NULL,
    PRIMARY KEY (component),
    FOREIGN KEY (componentcategory)
        REFERENCES ComponentCategories(componentcategory)
);

CREATE TABLE SectorDefinitions
    (sdid            integer        not null,
    sectordefinition    varchar(15)    not null,
    sector            integer        not null,
    sectorcode        varchar(20)    not null,
    anthropogenic        integer        not null,
    description        varchar(2000)    null,
    PRIMARY KEY (sectordefinition, sector),
    UNIQUE (sdid)
);

CREATE TABLE GridDefinitions
    (griddefinition        varchar(12)    not null,
    xlow            integer        null,
    xhigh            integer        null,
    ylow            integer        null,
    yhigh            integer        null,
    description        varchar(2000)    null,
    PRIMARY KEY (griddefinition)
);

CREATE TABLE GridAllocations
    (griddefinition        VARCHAR(12)    NOT NULL,
    areaid            INTEGER        NOT NULL,
    x            INTEGER        NOT NULL,
    y            INTEGER        NOT NULL,
    fraction        VARCHAR(12)    NULL,
    PRIMARY KEY (griddefinition, areaid, x, y),
    FOREIGN KEY (areaid) REFERENCES Areas(areaid),
    FOREIGN KEY (griddefinition) REFERENCES GridDefinitions(griddefinition)
);

CREATE TABLE DataTypes
    (datatype        VARCHAR(12)     NOT NULL,
    sectordefinition    VARCHAR(15)    NULL,
    griddefinition        VARCHAR(12)    NULL,
    description        VARCHAR(2000)    NULL,
    datatable        VARCHAR(100)    NULL,
    PRIMARY KEY (datatype),
    FOREIGN KEY (sectordefinition) REFERENCES SectorDefinitions(sectordefinition),
    FOREIGN KEY (griddefinition) REFERENCES GridDefinitions (griddefinition)
);


create table Origins
    (origin            varchar(12)    not null,
    description        varchar(2000)    null,
    PRIMARY KEY (origin)
);

CREATE TABLE Reports
    (reportid          INTEGER     NOT NULL,
    reportcode              VARCHAR(100)    NULL,
    areaid             INTEGER     NULL,
    datereceived        DATE         NULL,
    datesent        DATE         NULL,
    datemscw        DATE         NULL,
    origin            VARCHAR(12)    NULL,
    xcomment        VARCHAR(2000)    NULL,
    PRIMARY KEY (reportid),
    FOREIGN KEY (areaid) REFERENCES Areas(areaid)
);

CREATE TABLE DataSets
    (setid            INTEGER        NOT NULL,
      areaid          INTEGER     NULL,
      reportid         INTEGER     NOT NULL,
      datatype        VARCHAR(12)    NULL,
      component        VARCHAR(12)     NULL,
      year            INTEGER     NULL,
      updated            DATE         NULL,
      scaling            INTEGER     NULL,
      xcomment        VARCHAR(2000)     NULL,
      quality            CHARACTER(1)     DEFAULT '9'    NULL,
    PRIMARY KEY (setid),
    FOREIGN KEY (areaid) REFERENCES Areas(areaid),
    FOREIGN KEY (reportid) REFERENCES Reports(reportid),
    FOREIGN KEY (datatype) REFERENCES DataTypes(datatype),
    FOREIGN KEY (component) REFERENCES Components(component)
);

create table DataCollections
    (collection         varchar(12)     not null,
    description        varchar(2000)    null,
    PRIMARY KEY (collection)
);

CREATE TABLE CollectionSpecs
    (collection         VARCHAR(12)     NOT NULL,
    setid            INTEGER        NOT NULL,
    PRIMARY KEY (collection, setid),
    FOREIGN KEY (collection) REFERENCES DataCollections(collection),
    FOREIGN KEY (setid) REFERENCES DataSets(setid)
);

CREATE TABLE FootNotes
       (footnoteid             INTEGER        NOT NULL,
        footnote           VARCHAR(2000)    NULL,
    reportid           INTEGER        NULL,
    PRIMARY KEY (footnoteid),
    FOREIGN KEY (reportid) REFERENCES Reports(reportid)
);


CREATE TABLE Emissions
    (setid            INTEGER     NOT NULL,
    emission        INTEGER        NULL,
    xcomment        VARCHAR(2000)    NULL,
    footnoteid        INTEGER        NULL,
    PRIMARY KEY (setid),
    FOREIGN KEY (footnoteid) REFERENCES FootNotes(footnoteid),
    FOREIGN KEY (setid) REFERENCES DataSets
);

create table GridEmissions
    (setid            integer        not null,
    x            integer        not null,
    y            integer        not null,
    emission        integer        null,
    footnoteid        INTEGER        NULL,
    PRIMARY KEY (setid, x, y),
    FOREIGN KEY (footnoteid) REFERENCES FootNotes(footnoteid),
    FOREIGN KEY (setid) REFERENCES DataSets
);

create table PointEmissions
    (setid            integer     not null,
    lps            varchar(12)    not null,
    emission        integer        null,
    footnoteid        INTEGER        NULL,
    PRIMARY KEY (setid, lps),
    FOREIGN KEY (footnoteid) REFERENCES FootNotes(footnoteid),
    FOREIGN KEY (setid) REFERENCES DataSets
);

create table LowHiEmissions
    (setid            integer     not null,
    lowemission        integer        null,
    highemission        integer        null,
    xcomment        varchar(2000)    null,
    footnoteid        INTEGER        NULL,
    PRIMARY KEY (setid),
    FOREIGN KEY (footnoteid) REFERENCES FootNotes(footnoteid),
    FOREIGN KEY (setid) REFERENCES DataSets
);

create table LowHiGridEmissions
    (setid            integer     not null,
    x            integer        not null,
    y            integer        not null,
    lowemission        integer        null,
    highemission        integer        null,
    footnoteid        INTEGER        NULL,
    PRIMARY KEY (setid, x, y),
    FOREIGN KEY (footnoteid) REFERENCES FootNotes(footnoteid),
    FOREIGN KEY (setid) REFERENCES DataSets
);

create table SectorEmissions
    (setid            integer     not null,
    sector            integer        not null,
    emission        integer        null,
    xcomment        varchar(2000)    null,
    footnoteid        INTEGER        NULL,
    PRIMARY KEY (setid, sector),
    FOREIGN KEY (footnoteid) REFERENCES FootNotes(footnoteid),
    FOREIGN KEY (setid) REFERENCES DataSets
);

create table SectorGridEmissions
    (setid            integer     not null,
    sector            integer        not null,
    x            integer     not null,
    y            integer     not null,
    emission        integer        null,
    footnoteid        INTEGER        NULL,
    PRIMARY KEY (setid, sector, x, y),
    FOREIGN KEY (footnoteid) REFERENCES FootNotes(footnoteid),
    FOREIGN KEY (setid) REFERENCES DataSets
);

create table LowHiSectorGridEmissions
    (setid            integer     not null,
    sector            integer        not null,
    x            integer        not null,
    y            integer        not null,
    lowemission        integer        null,
    highemission        integer        null,
    footnoteid        INTEGER        NULL,
    PRIMARY KEY (setid, sector, x, y),
    FOREIGN KEY (footnoteid) REFERENCES FootNotes(footnoteid),
    FOREIGN KEY (setid) REFERENCES DataSets
);

create table LargePointSources
    (lpsid            integer        not null,
    areaid            integer     not null,
    lps             varchar(5)     not null,
    name            varchar(100)     null,
    latitude        integer        null,
    longitude        integer        null,
    height            integer        null,
    exitsurface        integer        null,
    speed            integer        null,
    temperature        integer        null,
    PRIMARY KEY (areaid, lps),
    FOREIGN KEY (areaid) REFERENCES Areas(areaid),
    UNIQUE (lpsid)
);

create table UpdateSessions
    (sesid            integer        not null,
    updatetime        date        not null,
    logname            varchar(30)    not null,
    tablename        varchar(30)    not null,
    PRIMARY KEY (sesid)
);

create table UpdateRecords
    (sesid            integer     not null,
    seqnum            integer     not null,
    updatetype        varchar(12)    not null,
    recordid        varchar(12)     null,
-- recordid may, but need not, be the Records/recordid
-- this is an awful design flaw!!!
    setid            integer     null,
    sector            integer     null,
    x            integer     null,
    y            integer     null,
    PRIMARY KEY (sesid,seqnum),
    FOREIGN KEY (sesid) REFERENCES UpdateSessions(sesid)
);

create table UpdateValues
    (sesid            integer        not null,
    seqnum            integer     not null,
    varname            varchar(30)    not null,
    oldval            varchar(2000)    null,
    newval            varchar(2000)    null,
    PRIMARY KEY (sesid, seqnum, varname),
    FOREIGN KEY (sesid) REFERENCES UpdateSessions(sesid)
);

create table Globals
    (globid            varchar(12)     not null,
    globvalue        integer        null,
    PRIMARY KEY (globid)
);

create table Batches
    (batchid        varchar(12)     not null,
    first            integer        not null,
    last            integer        not null,
    PRIMARY KEY (batchid, first)
);





-- $Author$
-- $Date$
-- $Header$
-- $Id$
-- $Name$
-- $Locker$
-- $Log$
-- $Revision$
-- $Source$
-- $State$

-- This works only as superuser. On polar, plperl is compiled into template1
-- so will exist in each db created
-- DROP LANGUAGE 'plperl';
-- CREATE LANGUAGE 'plperl';

DROP FUNCTION int2hex(integer);
CREATE FUNCTION int2hex(integer) RETURNS char(10)
    AS '$tmp = sprintf "%x", $_[0];
        return ("0"x(8-length($tmp)).uc($tmp));'
    LANGUAGE 'plperl';
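
-- An illustrative call: the function zero-pads to 8 hex digits and
-- upper-cases the result, so 255 comes back as 000000FF (possibly
-- blank-padded further because of the declared char(10) return type).
SELECT int2hex(255);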


DROP VIEW SectorDefView;
CREATE VIEW SectorDefView AS
        SELECT DISTINCT
      sectordefinition
    FROM SectorDefinitions
;

DROP VIEW GridAllocView;
CREATE VIEW GridAllocView AS
        SELECT
       griddefinition,
       Areas.areaid      AS areaid,
       name,
       acronym,
       x,
       y,
       fraction
    FROM GridAllocations, Areas
    WHERE GridAllocations.areaid = Areas.areaid
;

DROP VIEW ReportsView;
CREATE VIEW ReportsView
       AS SELECT
      reportid,
      to_char(datereceived,'YYYY')               AS reportyear,
      reportcode,
      Areas.areaid AS areaid,
      acronym,
      name,
      to_char(datereceived,'YYYYMMDD')        AS datereceived,
      to_char(datesent,'YYYYMMDD')            AS datesent,
      to_char(datemscw,'YYYYMMDD')            AS datemscw,
      origin,
      xcomment
      FROM Reports, Areas
      WHERE Reports.areaid = Areas.areaid
;

DROP VIEW YearOfRepView;
CREATE VIEW YearOfRepView
       AS SELECT DISTINCT
      reportyear
      FROM ReportsView
;

DROP VIEW DataSetsView;
CREATE VIEW DataSetsView
   AS SELECT
    setid,
        DataSets.areaid AS areaid,
        Areas.name AS name,
        Areas.acronym AS acronym,
        DataSets.reportid AS reportid,
    reportcode,
    reportyear,
    datereceived,
      datatype,
    origin,
      component,
      year,
    (SUBSTRING(origin FROM 1 FOR 1) ||
     SUBSTRING(datatype FROM 1 FOR 2) ||
     '-' ||
     TRIM(BOTH ' ' FROM component) ||
     '-' ||
     TRIM(BOTH ' ' FROM Areas.acronym) ||
     SUBSTRING(year FROM 3 FOR 2) ||
     SUBSTRING (reportyear FROM 3 FOR 2)
    )                    AS datasetcode,
      updated,
      scaling,
        DataSets.xcomment AS xcomment,
    datesent,
    datemscw,
    quality
    FROM DataSets, Areas, ReportsView
    WHERE DataSets.areaid = Areas.areaid
      AND DataSets.reportid = ReportsView.reportid
;

DROP VIEW EmissionsView;
CREATE VIEW EmissionsView
   AS SELECT
      Emissions.setid AS setid,
      areaid,
      name,
      acronym,
      DataSetsView.reportid,
      reportcode,
      reportyear,
      datereceived,
      datatype,
      origin,
      component,
      year,
      datasetcode,
      updated,
      scaling,
      Emissions.emission AS emission ,
      ('EMIS_' ||
       scaling ||
       '_0x' ||
       int2hex(Emissions.emission)
      )                      AS exemission,
      Emissions.xcomment AS xcomment,
      FootNotes.footnote AS footnote,
      ('0x' ||
       int2hex(Emissions.emission)
      )                                 AS rawemission
      FROM DataSetsView, Emissions, FootNotes
      WHERE DataSetsView.setid = Emissions.setid
        AND ( Emissions.footnoteid = FootNotes.footnoteid
         OR   FootNotes.footnoteid = '0' )
;

DROP VIEW GridEmissionsView;
CREATE VIEW GridEmissionsView
       AS SELECT
      GridEmissions.setid AS setid,
      x,
      y,
      scaling,
      GridEmissions.emission AS emission,
      ('EMIS_' ||
       scaling ||
       '_0x' ||
       int2hex(GridEmissions.emission)
      )                   AS exemission,
      ('0x' ||
       int2hex(GridEmissions.emission)
      )                AS rawemission,
      FootNotes.footnote as footnote
      FROM DataSetsView, GridEmissions, FootNotes
      WHERE DataSetsView.setid = GridEmissions.setid
        AND ( GridEmissions.footnoteid = FootNotes.footnoteid
              OR FootNotes.footnoteid = '0')
;

DROP VIEW LowHiEmissionsView;
CREATE VIEW LowHiEmissionsView
       AS SELECT
        LowHiEmissions.setid AS setid,
      areaid,
    name,
    acronym,
      DataSetsView.reportid,
    reportcode,
    reportyear,
    datereceived,
      datatype,
    origin,
      component,
      year,
    datasetcode,
      updated,
      scaling,
    ('EMIS_' ||
     scaling ||
     '_0x' ||
     int2hex(LowHiEmissions.lowemission)
    )                   AS exlowemission,
    ('EMIS_' ||
     scaling ||
     '_0x' ||
     int2hex(LowHiEmissions.highemission)
    )                   AS exhighemission,
        LowHiEmissions.xcomment AS xcomment,
    ('0x' ||
     int2hex(LowHiEmissions.lowemission)
    )                AS rawlowemission,
    ('0x' ||
     int2hex(LowHiEmissions.highemission)
    )                AS rawhighemission,
    datesent,
    datemscw,
    FootNotes.footnote AS footnote
    FROM DataSetsView, LowHiEmissions, FootNotes
    WHERE DataSetsView.setid = LowHiEmissions.setid
      AND ( LowHiEmissions.footnoteid = FootNotes.footnoteid
            OR FootNotes.footnoteid = '0' )
;

DROP VIEW LowHiGridEmissionsView;
CREATE VIEW LowHiGridEmissionsView
       AS SELECT
    LowHiGridEmissions.setid AS setid,
    x,
    y,
      scaling,
    ('EMIS_' ||
     scaling ||
     '_0x' ||
     int2hex(LowHiGridEmissions.lowemission)
    )                   AS exlowemission,
    ('EMIS_' ||
     scaling ||
     '_0x' ||
     int2hex(LowHiGridEmissions.highemission)
    )                   AS exhighemission,
    ('0x' ||
     int2hex(LowHiGridEmissions.lowemission)
    )                AS rawlowemission,
    ('0x' ||
     int2hex(LowHiGridEmissions.highemission)
    )                AS rawhighemission,
    FootNotes.footnote AS footnote
    FROM DataSetsView, LowHiGridEmissions, FootNotes
    WHERE DataSetsView.setid = LowHiGridEmissions.setid
      AND ( LowHiGridEmissions.footnoteid = FootNotes.footnoteid
        OR FootNotes.footnoteid = '0')
;

DROP VIEW SectorEmissionsView;
CREATE VIEW SectorEmissionsView
       AS SELECT
        SectorEmissions.setid AS setid,
        areaid,
        name,
        acronym,
        DataSetsView.reportid,
        reportcode,
        reportyear,
        datereceived,
        DataSetsView.datatype AS datatype,
        origin,
        component,
        year,
        datasetcode,
        updated,
        scaling,
        SectorEmissions.sector AS sector,
    sectorcode,
    ('EMIS_' ||
     scaling ||
     '_0x' ||
     int2hex(SectorEmissions.emission)
    )                AS exemission,
        SectorEmissions.xcomment AS xcomment,
    ('0x' ||
     int2hex(SectorEmissions.emission)
    )                AS rawemission,
        datesent,
        datemscw,
    FootNotes.footnote AS footnote
    FROM DataSetsView, SectorEmissions, DataTypes, SectorDefinitions,
         FootNotes
    WHERE DataSetsView.setid = SectorEmissions.setid
      AND DataSetsView.datatype = DataTypes.datatype
      AND DataTypes.sectordefinition = SectorDefinitions.sectordefinition
      AND SectorEmissions.sector = SectorDefinitions.sector
      AND ( SectorEmissions.footnoteid = FootNotes.footnoteid
        OR FootNotes.footnoteid = '0')
;

DROP VIEW SectorGridEmissionsView;
CREATE VIEW SectorGridEmissionsView
       AS SELECT
        SectorGridEmissions.setid AS setid,
        SectorGridEmissions.sector AS sector,
    sectorcode,
        x,
        y,
        scaling,
    ('EMIS_' ||
     scaling ||
     '_0x' ||
     int2hex(SectorGridEmissions.emission)
    )                AS exemission,
    ('0x' ||
     int2hex(SectorGridEmissions.emission)
    )                AS rawemission,
    FootNotes.footnote AS footnote
    FROM DataSetsView, SectorGridEmissions, DataTypes,
         SectorDefinitions, FootNotes
    WHERE DataSetsView.setid = SectorGridEmissions.setid
      AND DataSetsView.datatype = DataTypes.datatype
      AND DataTypes.sectordefinition = SectorDefinitions.sectordefinition
      AND SectorGridEmissions.sector = SectorDefinitions.sector
      AND ( SectorGridEmissions.footnoteid = FootNotes.footnoteid
        OR FootNotes.footnoteid = '0' )
;

DROP VIEW LowHiSectorGridEmissionsView;
CREATE VIEW LowHiSectorGridEmissionsView
       AS SELECT
        LowHiSectorGridEmissions.setid AS setid,
        LowHiSectorGridEmissions.sector AS sector,
    sectorcode,
        x,
        y,
        scaling,
    ('EMIS_' ||
     scaling ||
     '_0x' ||
     int2hex(LowHiSectorGridEmissions.lowemission)
    )                AS exlowemission,
    ('EMIS_' ||
     scaling ||
     '_0x' ||
     int2hex(LowHiSectorGridEmissions.highemission)
    )                AS exhighemission,
    ('0x' ||
     int2hex(LowHiSectorGridEmissions.lowemission)
    )                AS rawlowemission,
    ('0x' ||
     int2hex(LowHiSectorGridEmissions.highemission)
    )                AS rawhighemission,
    FootNotes.footnote AS footnote
    FROM DataSetsView, LowHiSectorGridEmissions, DataTypes,
         SectorDefinitions, FootNotes
    WHERE DataSetsView.setid = LowHiSectorGridEmissions.setid
      AND DataSetsView.datatype = DataTypes.datatype
      AND DataTypes.sectordefinition = SectorDefinitions.sectordefinition
      AND LowHiSectorGridEmissions.sector = SectorDefinitions.sector
      AND ( LowHiSectorGridEmissions.footnoteid = FootNotes.footnoteid
        OR FootNotes.footnoteid = '0')
;

DROP VIEW EmissionYearView;
CREATE VIEW EmissionYearView
       AS SELECT DISTINCT
          year
      FROM DataSets
;

DROP VIEW CollSpecsView;
CREATE VIEW CollSpecsView
       AS SELECT
      collection,
      CollectionSpecs.setid AS setid,
      datasetcode
      FROM CollectionSpecs, DataSetsView
      WHERE CollectionSpecs.setid = DataSetsView.setid
;

DROP VIEW LaPoSoView;
CREATE VIEW LaPoSoView
       AS SELECT
      LargePointSources.lpsid AS lpsid,
      LargePointSources.areaid AS areaid,
      Areas.name AS name,
      acronym,
      lps,
      (acronym || lps) AS lpscode,
      LargePointSources.name AS lpsname,
      latitude,
      longitude,
      height,
      exitsurface,
      speed,
      temperature
      FROM LargePointSources, Areas
      WHERE LargePointSources.areaid = Areas.areaid
;

DROP VIEW EmissionsUpdView;
CREATE VIEW EmissionsUpdView
       AS SELECT
      setid,
      xcomment,
      ('0x' ||
       int2hex(Emissions.emission)
      )                AS rawemission
      FROM Emissions
;

Re: Huge Performance Difference on Similar Query in Pg7.2

From:
"Luis Alberto Amigo Navarro"
Date:
> Aggregate  (cost=5975.66..5975.66 rows=1 width=131) (actual time=1264.12..1264.13 rows=1 loops=1)
>   ->  Hash Join  (cost=5958.18..5975.58 rows=35 width=131) (actual time=1205.90..1262.46 rows=1606 loops=1)
>         ->  Hash Join  (cost=5956.30..5972.99 rows=35 width=127) (actual time=1197.85..1248.93 rows=1606 loops=1)
 2)
> emep=> explain analyze select  count(*) from  EmissionsView , DataSetsView where  DataSetsView.setid = EmissionsView.setid and DataSetsView.setid = '4614' ;
> NOTICE:  QUERY PLAN:
>
> Aggregate  (cost=91660.36..91660.36 rows=1 width=131) (actual time=64414.80..64414.80 rows=1 loops=1)
>   ->  Hash Join  (cost=2732.23..91660.35 rows=3 width=131) (actual time=58428.47..64413.14 rows=1606 loops=1)
>         ->  Nested Loop  (cost=2719.53..91058.61 rows=117798 width=103) (actual time=49523.50..63005.67 rows=1025405 loops=1)

I think here is the difference: in the first case you are performing the
hash join on 33000 rows, in the second case on 1M rows, so it is wasting
time on hashing. It seems the first condition is more restrictive than the
second one, i.e. there are fewer rows with setid 4614 in EmissionsView than
in DataSetsView.
Regards


Re: Huge Performance Difference on Similar Query in Pg7.2

From:
"Nigel J. Andrews"
Date:
On Fri, 22 Mar 2002, Luis Alberto Amigo Navarro wrote:

> > Aggregate  (cost=5975.66..5975.66 rows=1 width=131) (actual time=1264.12..1264.13 rows=1 loops=1)
> >   ->  Hash Join  (cost=5958.18..5975.58 rows=35 width=131) (actual time=1205.90..1262.46 rows=1606 loops=1)
> >         ->  Hash Join  (cost=5956.30..5972.99 rows=35 width=127) (actual time=1197.85..1248.93 rows=1606 loops=1)
>  2)
> > emep=> explain analyze select  count(*) from  EmissionsView , DataSetsView where  DataSetsView.setid = EmissionsView.setid and DataSetsView.setid = '4614' ;
> > NOTICE:  QUERY PLAN:
> >
> > Aggregate  (cost=91660.36..91660.36 rows=1 width=131) (actual time=64414.80..64414.80 rows=1 loops=1)
> >   ->  Hash Join  (cost=2732.23..91660.35 rows=3 width=131) (actual time=58428.47..64413.14 rows=1606 loops=1)
> >         ->  Nested Loop  (cost=2719.53..91058.61 rows=117798 width=103) (actual time=49523.50..63005.67 rows=1025405 loops=1)
>
> I think here is the difference: in the first case you are performing the
> hash join on 33000 rows, in the second case on 1M rows, so it is wasting
> time on hashing. It seems the first condition is more restrictive than the
> second one, i.e. there are fewer rows with setid 4614 in EmissionsView than
> in DataSetsView.
> Regards
>

If I may be so bold as to join this thread as well, this is a close enough
match to the subject of my long pending post I mentioned in another thread to
warrant it, I believe.

I composed my message within a newsgroup thread but I had trouble posting it
there. Therefore I have some message fragments quoted at the start which will
not have been seen on this list.

To summarise: it's long (sorry), it's mostly to do with the planner not
using an index and it's long.

Here goes...

Re: Huge Performance Difference on Similar Query in Pg7.2

From:
Tom Lane
Date:
Heiko Klein <Heiko.Klein@met.no> writes:
> When I now perform the two similar queries (which return equal results) on
> those views:
> 1)
> select count(*) from EmissionsView, DataSetsView
> where DataSetsView.setid = EmissionsView.setid
>   and EmissionsView.setid = '4614' ;
>       -------------
> 2)
> select count(*) from EmissionsView, DataSetsView
> where DataSetsView.setid = EmissionsView.setid
>   and DataSetsView.setid = '4614' ;
>       ------------

> I have a huge performance difference.

Apparently the condition EmissionsView.setid = '4614' is a lot more
selective on that table than DataSetsView.setid = '4614' is on its
table.  So pushing down the former condition translates into lots
fewer rows to be joined than pushing down the latter.
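
(One way to check that guess is to compare how many rows each restriction
leaves to be joined, against the base tables that show up in the plans above:)

SELECT count(*) FROM emissions WHERE setid = '4614';
SELECT count(*) FROM datasets  WHERE setid = '4614';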

Unfortunately the planner is not very bright about transitivity of
equals and so does not realize that it could derive EmissionsView.setid
= '4614' from the other two conditions.  I imagine the reason Oracle
gives similar behavior for both queries is that it does expend the
cycles to make that deduction.  (This is a guess though; it'd be
interesting to see their query plans.)
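
(For queries that can be edited by hand, a workaround along these lines makes
the selective restriction explicit without touching the views; the extra
predicate is redundant, but it supplies the deduction the planner does not
make itself:)

SELECT count(*)
  FROM EmissionsView, DataSetsView
 WHERE DataSetsView.setid = EmissionsView.setid
   AND DataSetsView.setid  = '4614'
   AND EmissionsView.setid = '4614';   -- redundant, but helps the planner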

Given the complexity of the plans, it may just be that the planner is
switching into GEQO search mode and is failing to find the best plan.
You might consider setting geqo_threshold larger than the number of
base tables in the query (looks like 12, here) to see if better plans
emerge.
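
(A quick way to test that, per session and without editing postgresql.conf:)

SET geqo_threshold = 20;   -- anything above the number of base tables joined

EXPLAIN ANALYZE
SELECT count(*)
  FROM EmissionsView, DataSetsView
 WHERE DataSetsView.setid = EmissionsView.setid
   AND DataSetsView.setid = '4614';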

It'd also be worth asking whether you couldn't simplify your views ;-).
Do you really require a 12-way join to do whatever it is you're doing?
I'm pretty amazed that the system is able to find a good plan for either
query ...

            regards, tom lane

Re: Huge Performance Difference on Similar Query in Pg7.2

From:
Heiko Klein
Date:

Tom Lane writes:
 > Unfortunately the planner is not very bright about transitivity of
 > equals and so does not realize that it could derive EmissionsView.setid
 > = '4614' from the other two conditions.  I imagine the reason Oracle
 > gives similar behavior for both queries is that it does expend the
 > cycles to make that deduction.  (This is a guess though; it'd be
 > interesting to see their query plans.)

I don't know how to retrieve the query plan from Oracle7. Any hints and
I will send them.

 >
 > Given the complexity of the plans, it may just be that the planner is
 > switching into GEQO search mode and is failing to find the best plan.
 > You might consider setting geqo_threshold larger than the number of
 > base tables in the query (looks like 12, here) to see if better plans
 > emerge.

Wow, that's impressive. By changing the geqo_threshold from 11 to 12, I
reduced the time from 70000ms to 700ms. Increasing it step by step, it stays
constant until 15, then I get 70000ms at 16 and 17, and it drops to 200ms
at 18 and stays between 100 and 150ms when I increase it further. So why
not use a very high geqo_threshold (maybe 100) as the default? (See the
new query plan below with geqo_threshold = 20)

 > It'd also be worth asking whether you couldn't simplify your views ;-).
 > Do you really require a 12-way join to do whatever it is you're doing?
 > I'm pretty amazed that the system is able to find a good plan for either
 > query ...

Exactly that was the problem: all the queries were quite slow. This one
was just one I use very often. But I cannot do everything at once. I
just convinced my boss to switch from Oracle to Postgres, and that this
wouldn't take long. So the frontend dictates the views (it asks for
them), and I cannot change it before I have managed to convert to
Postgres smoothly. In short: it's politics.

But thanks a lot for your help, that was really useful and a huge step
forward for me.

Heiko


emep=> explain analyze select count(*) from LowHiSectorGridEmissionsView, DataSetsView where DataSetsView.setid = '4614' and DataSetsView.setid = LowHiSectorGridEmissionsView.setid ;
NOTICE:  QUERY PLAN:

Aggregate  (cost=1992.56..1992.56 rows=1 width=131) (actual time=112.80..112.80 rows=1 loops=1)
  ->  Nested Loop  (cost=1989.38..1992.55 rows=4 width=131) (actual time=105.12..111.02 rows=1606 loops=1)
        ->  Seq Scan on footnotes  (cost=0.00..1.01 rows=1 width=4) (actual time=0.03..0.04 rows=1 loops=1)
        ->  Materialize  (cost=1991.49..1991.49 rows=4 width=127) (actual time=105.06..106.01 rows=1606 loops=1)
              ->  Hash Join  (cost=1989.38..1991.49 rows=4 width=127) (actual time=75.34..102.23 rows=1606 loops=1)
                    ->  Seq Scan on areas  (cost=0.00..1.71 rows=71 width=4) (actual time=0.02..0.19 rows=71 loops=1)
                    ->  Hash  (cost=1989.37..1989.37 rows=4 width=123) (actual time=75.15..75.15 rows=0 loops=1)
                          ->  Hash Join  (cost=1984.27..1989.37 rows=4 width=123) (actual time=42.51..71.05 rows=1606 loops=1)
                                ->  Seq Scan on sectordefinitions  (cost=0.00..3.09 rows=109 width=22) (actual time=0.01..0.35 rows=109 loops=1)
                                ->  Hash  (cost=1984.20..1984.20 rows=26 width=101) (actual time=30.29..30.29 rows=0 loops=1)
                                      ->  Nested Loop  (cost=22.01..1984.20 rows=26 width=101) (actual time=1.02..25.55 rows=1606 loops=1)
                                            ->  Hash Join  (cost=22.01..24.08 rows=1 width=89) (actual time=0.97..1.19 rows=1 loops=1)
                                                  ->  Seq Scan on areas  (cost=0.00..1.71 rows=71 width=4) (actual time=0.01..0.16 rows=71 loops=1)
                                                  ->  Hash  (cost=22.00..22.00 rows=1 width=85) (actual time=0.87..0.87 rows=0 loops=1)
                                                        ->  Nested Loop  (cost=16.07..22.00 rows=1 width=85) (actual time=0.81..0.87 rows=1 loops=1)
                                                              ->  Hash Join  (cost=16.07..17.35 rows=1 width=77) (actual time=0.78..0.84 rows=1 loops=1)
                                                                    ->  Seq Scan on datatypes  (cost=0.00..1.18 rows=18 width=27) (actual time=0.01..0.06 rows=18 loops=1)
                                                                    ->  Hash  (cost=16.07..16.07 rows=1 width=50) (actual time=0.69..0.69 rows=0 loops=1)
                                                                          ->  Nested Loop  (cost=8.02..16.07 rows=1 width=50) (actual time=0.31..0.68 rows=1 loops=1)
                                                                                ->  Nested Loop  (cost=8.02..12.69 rows=1 width=28) (actual time=0.28..0.65 rows=1 loops=1)
                                                                                      ->  Hash Join  (cost=8.02..10.09 rows=1 width=24) (actual time=0.21..0.39 rows=1 loops=1)
                                                                                            ->  Seq Scan on areas  (cost=0.00..1.71 rows=71 width=4) (actual time=0.01..0.14 rows=71 loops=1)
                                                                                            ->  Hash  (cost=8.01..8.01 rows=1 width=20) (actual time=0.11..0.11 rows=0 loops=1)
                                                                                                  ->  Nested Loop  (cost=0.00..8.01 rows=1 width=20) (actual time=0.10..0.11 rows=1 loops=1)
                                                                                                        ->  Index Scan using datasets_setid_idx on datasets  (cost=0.00..3.36 rows=1 width=12) (actual time=0.06..0.06 rows=1 loops=1)
                                                                                                        ->  Index Scan using reports_pkey on reports  (cost=0.00..4.64 rows=1 width=8) (actual time=0.03..0.03 rows=1 loops=1)
                                                                                      ->  Seq Scan on areas  (cost=0.00..1.71 rows=71 width=4) (actual time=0.00..0.16 rows=71 loops=1)
                                                                                ->  Index Scan using datasets_setid_idx on datasets  (cost=0.00..3.36 rows=1 width=22) (actual time=0.02..0.02 rows=1 loops=1)
                                                              ->  Index Scan using reports_pkey on reports  (cost=0.00..4.64 rows=1 width=8) (actual time=0.01..0.01 rows=1 loops=1)
                                            ->  Index Scan using lhsecgriemis_setid_idx on lowhisectorgridemissions  (cost=0.00..1953.36 rows=541 width=12) (actual time=0.04..7.30 rows=1606 loops=1)
Total runtime: 114.50 msec

EXPLAIN

Re: Huge Performance Difference on Similar Query in Pg7.2

From
Stephan Szabo
Date:
> I've read lots of messages on this subject, and the FAQ, recently but I'm still
> confused. None of what I have read seems to account for the execution plans I'm
> seeing, except for this mention of scanning a million values in an index and
> discarding all but a small number. However, even this I can not see applies to
> a primary key on a million row table. I upgraded from 7.0.x to 7.2 because I
> was thinking it was an oddity that was probably fixed.
>
> First, my apologies for the length of this posting.
>
> Next some background:
>
> I have a database where one of its tables records 'sessions', called
> chat_sessions. It has an integer field, session_id, declared as primary key.
> This table is the small table in the example with only about 2000 rows.
>
> There is another table called chat_post, the large table holding about 1
> million rows. It has two integer fields, session_id and post_number, which

Are they actually integers (int4), or is either of them a different type
like int2 or int8?  There are special-case workarounds for those two, due
to a problem with the types of integer literals.  This looks likely, since
even with enable_seqscan set off it wanted to do a sequential scan, which
generally means it doesn't believe it can use the index.
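For reference, that test looks like this on 7.2 (a sketch only, reusing the query quoted just below):

    SET enable_seqscan = off;   -- discourage sequential scans while testing
    EXPLAIN ANALYZE SELECT count(*) FROM chat_post cp WHERE cp.session_id = 123;
    SET enable_seqscan = on;    -- restore the default afterwards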

> explain analyze select count(*) from chat_post cp where cp.session_id
> = 123;

Does cp.session_id='123' give something different?

> Show looking up in large table, selecting on primary key, uses
> sequential scan on large
>
> explain analyze select count(*) from chat_post cp where cp.session_id
> = 123 and cp.post_number = 10;

Same here for '123' and '10'.
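In other words, if the columns do turn out to be int2 (or int8), two formulations should let the 7.2 planner consider the index; a sketch using the table and column from this thread, and the same would apply to post_number:

    -- quoting the literal leaves its type unknown, so it resolves against the column's type:
    SELECT count(*) FROM chat_post cp WHERE cp.session_id = '123';
    -- an explicit cast has the same effect:
    SELECT count(*) FROM chat_post cp WHERE cp.session_id = 123::int2;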


Re: Huge Performance Difference on Similar Query in Pg7.2

From
"Nigel J. Andrews"
Date:
On Fri, 22 Mar 2002, Stephan Szabo wrote:

> > I wrote:
> >
> > I've read lots of messages on this subject, and the FAQ, recently but I'm still
> > confused. None of what I have read seems to account for the execution plans I'm
> > seeing, except for this mention of scanning a million values in an index and
> > discarding all but a small number. However, even this I can not see applies to
> > a primary key on a million row table. I upgraded from 7.0.x to 7.2 because I
> > was thinking it was an oddity that was probably fixed.
> >
> > First, my apologies for the length of this posting.
> >
> > Next some background:
> >
> > I have a database where one of its tables records 'sessions', called
> > chat_sessions. It has an integer field, session_id, declared as primary key.
> > This table is the small table in the example with only about 2000 rows.
> >
> > There is another table called chat_post, the large table holding about 1
> > million rows. It has two integer fields, session_id and post_number, which
>
> Are they actually integers (int4), or is either of them a different type
> like int2 or int8?  There are special-case workarounds for those two, due
> to a problem with the types of integer literals.  This looks likely, since
> even with enable_seqscan set off it wanted to do a sequential scan, which
> generally means it doesn't believe it can use the index.
>
> > explain analyze select count(*) from chat_post cp where cp.session_id
> > = 123;
>
> Does cp.session_id='123' give something different?
>
> > Show looking up in large table, selecting on primary key, uses
> > sequential scan on large
> >
> > explain analyze select count(*) from chat_post cp where cp.session_id
> > = 123 and cp.post_number = 10;
>
> Same here for '123' and '10'.

Spectacular! Bingo! etc.

Using the numbers quoted yields use of the primary key. I am indeed using
something other than int4: int2, in fact. So this is something to do with
using integer literals, which are presumably first interpreted as int4 and then
converted in some long-winded fashion, or something, to int2 for each and
every test or row, whereas specifying them as text causes the backend to
convert to the correct int2 only at the start?

I chose the smaller int because these values are unlikely to be restrictive for this
DB and I thought I may as well try not to waste space. Whether it used the
same storage as int4 didn't really matter, as I'd given it the opportunity to
use less if it could. Is it worth me moving these to the int4 type?

Thanks for that, I can now tell my friend who had a good laugh with me last
weekend about this.

BTW, is this sort of feature documented anywhere or does it come into the 'what
makes a person an expert' category?


Nigel Andrews
Logictree Systems Limited


Re: Huge Performance Difference on Similar Query in Pg7.2

From
Stephan Szabo
Date:

On Fri, 22 Mar 2002, Nigel J. Andrews wrote:

> Spectacular! Bingo! etc.
>
> Using the numbers quoted yields use of the primary key. I am indeed using
> something other than int4: int2, in fact. So this is something to do with
> using integer literals, which are presumably first interpreted as int4 and then
> converted in some long-winded fashion, or something, to int2 for each and
> every test or row, whereas specifying them as text causes the backend to
> convert to the correct int2 only at the start?

Basically.  The quoted one gets kept as unknown rather than prematurely
being made into an int4.

> I chose the smaller int because these values are unlikely to be restrictive for this
> DB and I thought I may as well try not to waste space. Whether it used the
> same storage as int4 didn't really matter, as I'd given it the opportunity to
> use less if it could. Is it worth me moving these to the int4 type?

Probably, unless the space is really going to be an issue.  Eventually
someone will come up with a way of dealing with it that doesn't involve
hardcoding type information, and it'll get fixed.

> BTW, is this sort of feature documented anywhere or does it come into the 'what
> makes a person an expert' category?

Hmm, I'm not sure.  If it's not in FAQ item 4.8, it probably should be.



Re: Huge Performance Difference on Similar Query in Pg7.2

From
Masaru Sugawara
Date:
On Fri, 22 Mar 2002 13:35:47 +0000 (GMT)
"Nigel J. Andrews" <nandrews@investsystems.co.uk> wrote:

> Show looking up in large table, selecting on partial primary key, uses
> sequential scan on large
>
> explain analyze select count(*) from chat_post cp where cp.session_id
> = 123;
> NOTICE:  QUERY PLAN:
>
> Aggregate  (cost=20411.68..20411.68 rows=1 width=0) (actual time=31691.92..31691.93 rows=1 loops=1)
>   ->  Seq Scan on chat_post cp  (cost=0.00..20411.49 rows=77 width=0) (actual time=1736.29..31688.80 rows=321 loops=1)
> Total runtime: 31692.35 msec

Judging from the EXPLAIN output, queries that restrict chat_post by
session_id = 123 always seem to use a sequential scan on it, while queries
that don't include that condition seem to use an index scan.  So, instead of
session_id = 123, you might try running your queries again with other
conditions and see whether they use an index scan.  This opinion is not
based on any real evidence, though.

BTW, even though the number of rows selected from chat_post is small, the
aggregation and sorting time seems long.  If sort_mem is still at its default
value, you should increase it before trying the above, possibly by a factor of 10 or 20.
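A sketch of what that looks like (the value is in kilobytes; the figure below is only an example and assumes the machine has memory to spare):

    SET sort_mem = 8192;    -- per session, roughly 8 MB for sorts
    -- or change the server-wide default by setting sort_mem in postgresql.conf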


Regards,
Masaru Sugawara



Re: Huge Performance Difference on Similar Query in Pg7.2

From
Tom Lane
Date:
"Nigel J. Andrews" <nandrews@investsystems.co.uk> writes:
> Using the numbers quoted yields use of the primary key. I am indeed using
> something other than int4: int2, in fact. So this is something to do with
> using integer literals, which are presumably first interpreted as int4 and then
> converted in some long-winded fashion, or something, to int2 for each and
> every test or row, whereas specifying them as text causes the backend to
> convert to the correct int2 only at the start?

No, they're not converted at all: if you write, say,
    int2var = 42
then the constant is immediately taken as int4, and "=" is resolved as
the int2-equals-int4 operator, which works fine but is not one of the
set of operators that the system knows how to use with an int2 index.

If you write
    int2var = '42'
then the '42' is initially treated as an unknown-type literal, and there
are resolution rules that will preferentially choose int2 to match
what's on the other side of the operator, whereupon "=" gets resolved
as the int2-equals-int2 operator, which is indexable.
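A minimal illustration of the difference, using a hypothetical throwaway table named t with an indexed int2 column (names invented here for demonstration only):

    create table t (int2var int2 primary key);     -- the primary key gives an index on the int2 column
    explain select * from t where int2var = 42;    -- "=" resolves as int2-equals-int4: index not considered
    explain select * from t where int2var = '42';  -- literal resolves as int2: index can be used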

To fix this and related problems we need to twiddle the ambiguity
resolution rules so that numeric constants can be given something other
than their "natural" datatype ... without breaking a lot of other cases
that work conveniently today.  See (many) past discussions on the
pghackers list.

            regards, tom lane