Really dumb planner decision

From: Matthew Wakeling
Subject: Really dumb planner decision
Date:
Msg-id: alpine.DEB.2.00.0904151800310.4053@aragorn.flymine.org
Responses: Re: Really dumb planner decision  (Grzegorz Jaśkiewicz <gryzman@gmail.com>)
           Re: Really dumb planner decision  (Grzegorz Jaśkiewicz <gryzman@gmail.com>)
List: pgsql-performance
I have a query that is executed really badly by Postgres. It is a
nine-table join, where two of the tables are brought in through a view. If
I remove one of the tables from the query, then the query runs very quickly
using a completely different plan.

Here is the view:

release-16.0-preview-09-apr=# \d locatedsequencefeatureoverlappingfeatures
View "public.locatedsequencefeatureoverlappingfeatures"
          Column          |  Type   | Modifiers
--------------------------+---------+-----------
 overlappingfeatures      | integer |
 locatedsequencefeature   | integer |
View definition:
 SELECT l1.subjectid AS overlappingfeatures, l2.subjectid AS locatedsequencefeature
   FROM location l1, location l2
  WHERE l1.objectid = l2.objectid
    AND l1.subjectid <> l2.subjectid
    AND bioseg_create(l1.intermine_start, l1.intermine_end) && bioseg_create(l2.intermine_start, l2.intermine_end);
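
For context, the && there is the bioseg overlap operator, and both plans
below use an index called location_bioseg. I haven't pasted its definition,
but it is presumably an expression GiST index along these lines (a sketch,
not the actual definition from the database):

CREATE INDEX location_bioseg ON location
    USING gist (bioseg_create(intermine_start, intermine_end));

which is what lets an overlap test like the one in the view be answered
with an index scan.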


Here is the query that works:

SELECT *
FROM
     gene AS a1_,
     intergenicregion AS a2_,
     regulatoryregion AS a3_,
     chromosome AS a4_,
     location AS a5_,
     LocatedSequenceFeatureOverlappingFeatures AS indirect0,
     BioEntitiesDataSets AS indirect1
WHERE
         a1_.id = 1267676
     AND a1_.upstreamIntergenicRegionId = a2_.id
     AND a2_.id = indirect0.LocatedSequenceFeature
     AND indirect0.OverlappingFeatures = a3_.id
     AND a3_.chromosomeid = a4_.id
     AND a3_.chromosomeLocationId = a5_.id
     AND a3_.id = indirect1.BioEntities

QUERY PLAN
-----------------------------------------------------------------
  Nested Loop  (cost=0.00..44.82 rows=1 width=787)
               (actual time=18.347..184.178 rows=105 loops=1)
    ->  Nested Loop
         (cost=0.00..44.54 rows=1 width=626)
         (actual time=18.329..182.837 rows=105 loops=1)
          ->  Nested Loop
               (cost=0.00..43.82 rows=1 width=561)
               (actual time=18.249..180.801 rows=105 loops=1)
                ->  Nested Loop
                     (cost=0.00..43.51 rows=1 width=380)
                     (actual time=10.123..178.471 rows=144 loops=1)
                      ->  Nested Loop
                           (cost=0.00..42.85 rows=1 width=372)
                           (actual time=0.854..31.446 rows=142 loops=1)
                            ->  Nested Loop
                                 (cost=0.00..38.57 rows=1 width=168)
                                 (actual time=0.838..29.505 rows=142 loops=1)
                                  Join Filter: ((l1.subjectid <> l2.subjectid) AND (l2.objectid = l1.objectid))
                                  ->  Nested Loop
                                       (cost=0.00..10.02 rows=1 width=176)
                                       (actual time=0.207..0.218 rows=1 loops=1)
                                        ->  Index Scan using gene_pkey on gene a1_
                                              (cost=0.00..4.29 rows=1 width=160)
                                              (actual time=0.107..0.110 rows=1 loops=1)
                                              Index Cond: (id = 1267676)
                                        ->  Index Scan using location__key_all on location l2
                                              (cost=0.00..5.70 rows=2 width=16)
                                              (actual time=0.090..0.093 rows=1 loops=1)
                                              Index Cond: (l2.subjectid = a1_.upstreamintergenicregionid)
                                  ->  Index Scan using location_bioseg on location l1
                                        (cost=0.00..12.89 rows=696 width=16)
                                        (actual time=0.095..26.458 rows=1237 loops=1)
                                        Index Cond: (bioseg_create(l1.intermine_start, l1.intermine_end) && bioseg_create(l2.intermine_start, l2.intermine_end))
                            ->  Index Scan using intergenicregion_pkey on intergenicregion a2_
                                  (cost=0.00..4.27 rows=1 width=204)
                                  (actual time=0.004..0.006 rows=1 loops=142)
                                  Index Cond: (a2_.id = a1_.upstreamintergenicregionid)
                      ->  Index Scan using bioentitiesdatasets__bioentities on bioentitiesdatasets indirect1
                            (cost=0.00..0.63 rows=2 width=8)
                            (actual time=1.026..1.028 rows=1 loops=142)
                            Index Cond: (indirect1.bioentities = l1.subjectid)
                ->  Index Scan using regulatoryregion_pkey on regulatoryregion a3_
                      (cost=0.00..0.29 rows=1 width=181)
                      (actual time=0.008..0.009 rows=1 loops=144)
                      Index Cond: (a3_.id = l1.subjectid)
          ->  Index Scan using location_pkey on location a5_
                (cost=0.00..0.71 rows=1 width=65)
                (actual time=0.010..0.012 rows=1 loops=105)
                Index Cond: (a5_.id = a3_.chromosomelocationid)
    ->  Index Scan using chromosome_pkey on chromosome a4_
          (cost=0.00..0.27 rows=1 width=161)
          (actual time=0.003..0.005 rows=1 loops=105)
          Index Cond: (a4_.id = a3_.chromosomeid)
  Total runtime: 184.596 ms
(25 rows)


Here is the query that does not work:

SELECT *
FROM
     gene AS a1_,
     intergenicregion AS a2_,
     regulatoryregion AS a3_,
     chromosome AS a4_,
     location AS a5_,
     dataset AS a6_,
     LocatedSequenceFeatureOverlappingFeatures AS indirect0,
     BioEntitiesDataSets AS indirect1
WHERE
         a1_.id = 1267676
     AND a1_.upstreamIntergenicRegionId = a2_.id
     AND a2_.id = indirect0.LocatedSequenceFeature
     AND indirect0.OverlappingFeatures = a3_.id
     AND a3_.chromosomeid = a4_.id
     AND a3_.chromosomeLocationId = a5_.id
     AND a3_.id = indirect1.BioEntities
     AND indirect1.DataSets = a6_.id

I just left this running overnight, and it hasn't completed an EXPLAIN
ANALYSE. It is basically the previous query (which returns 105 rows) with
one more table joined on its primary key, so it should be very quick. (The
plan below is therefore from a plain EXPLAIN, with no actual-time figures.)
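
For reference, the only new join compared with the working query is the
primary-key lookup into dataset, which on its own is a trivial probe along
the lines of (the id value here is made up for illustration):

SELECT * FROM dataset AS a6_ WHERE a6_.id = 12345;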

QUERY PLAN
---------------------------------------------------------------------------
  Nested Loop
    (cost=0.21..49789788.95 rows=1 width=960)
    ->  Nested Loop
          (cost=0.21..49789788.67 rows=1 width=799)
          ->  Nested Loop
                (cost=0.21..49789787.94 rows=1 width=734)
                ->  Merge Join
                      (cost=0.21..49789787.64 rows=1 width=553)
                      Merge Cond: (a1_.upstreamintergenicregionid = a2_.id)
                      ->  Nested Loop
                            (cost=0.00..99575037.26 rows=2 width=349)
                            ->  Nested Loop
                                  (cost=0.00..99575036.70 rows=2 width=176)
                                  ->  Nested Loop
                                        (cost=0.00..99575036.05 rows=1 width=168)
                                        Join Filter: (a1_.upstreamintergenicregionid = l2.subjectid)
                                        ->  Index Scan using gene__upstreamintergenicregion on gene a1_
                                              (cost=0.00..6836.09 rows=1 width=160)
                                              Index Cond: (id = 1267676)
                                        ->  Nested Loop
                                              (cost=0.00..99507386.51 rows=4865076 width=8)
                                              Join Filter: ((l1.subjectid <> l2.subjectid) AND (l1.objectid = l2.objectid))
                                              ->  Index Scan using location__key_all on location l1
                                                    (cost=0.00..158806.58 rows=3479953 width=16)
                                              ->  Index Scan using location_bioseg on location l2
                                                    (cost=0.00..12.89 rows=696 width=16)
                                                    Index Cond: (bioseg_create(l1.intermine_start, l1.intermine_end) && bioseg_create(l2.intermine_start, l2.intermine_end))
                                  ->  Index Scan using bioentitiesdatasets__bioentities on bioentitiesdatasets indirect1
                                        (cost=0.00..0.63 rows=2 width=8)
                                        Index Cond: (indirect1.bioentities = l1.subjectid)
                            ->  Index Scan using dataset_pkey on dataset a6_
                                  (cost=0.00..0.27 rows=1 width=173)
                                  Index Cond: (a6_.id = indirect1.datasets)
                      ->  Index Scan using intergenicregion_pkey on intergenicregion a2_
                            (cost=0.00..2132.03 rows=54785 width=204)
                ->  Index Scan using regulatoryregion_pkey on regulatoryregion a3_
                      (cost=0.00..0.29 rows=1 width=181)
                      Index Cond: (a3_.id = l1.subjectid)
          ->  Index Scan using location_pkey on location a5_
                (cost=0.00..0.71 rows=1 width=65)
                Index Cond: (a5_.id = a3_.chromosomelocationid)
    ->  Index Scan using chromosome_pkey on chromosome a4_
          (cost=0.00..0.27 rows=1 width=161)
          Index Cond: (a4_.id = a3_.chromosomeid)
(27 rows)

I'm curious about two things. Firstly, why is it choosing such a dumb way
of joining l1 to l2, with a full (unconstrained) index scan on l1, when it
could use a conditional index scan on l1 as in the working query? Secondly,
why is the merge join's cost half that of the nested loop inside it?
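
One experiment I could run (just a sketch, not something I have tried yet)
is to check whether the planner is even considering the good join order:
the failing query touches nine base tables once the view is expanded, which
is above the default of 8 for the collapse limits, so raising those for one
session -- or simply disabling merge joins -- might be informative:

SET from_collapse_limit = 20;
SET join_collapse_limit = 20;
-- or: SET enable_mergejoin = off;
EXPLAIN SELECT ...;   -- the failing query from above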

geqo_threshold is set to 15, so this is not the genetic optimiser stuffing
up. Besides, it creates the same plan each time. The database is fully
analysed, with a reasonably high statistics target. Here are all the
non-comment entries in postgresql.conf:

listen_addresses = '*'          # what IP address(es) to listen on;
max_connections = 300                   # (change requires restart)
shared_buffers = 500MB                  # min 128kB or max_connections*16kB
temp_buffers = 100MB                    # min 800kB
work_mem = 2000MB                               # min 64kB
maintenance_work_mem = 1600MB           # min 1MB
max_stack_depth = 9MB                   # min 100kB
max_fsm_pages = 204800                  # min max_fsm_relations*16, 6 bytes each
random_page_cost = 2.0                  # same scale as above
effective_cache_size = 23GB
geqo_threshold = 15
default_statistics_target = 500         # range 1-1000
log_destination = 'stderr'              # Valid values are combinations of
logging_collector = on                  # Enable capturing of stderr and csvlog
log_directory = 'pg_log'                # directory where log files are written,
log_truncate_on_rotation = on           # If on, an existing log file of the
log_rotation_age = 1d                   # Automatic rotation of logfiles will
log_rotation_size = 0                   # Automatic rotation of logfiles will
log_min_duration_statement = 0          # -1 is disabled, 0 logs all statements
log_duration = on
log_line_prefix = '%t '                 # special values:
log_statement = 'all'                   # none, ddl, mod, all
datestyle = 'iso, mdy'
lc_messages = 'C'                       # locale for system error message
lc_monetary = 'C'                       # locale for monetary formatting
lc_numeric = 'C'                        # locale for number formatting
lc_time = 'C'                           # locale for time formatting
default_text_search_config = 'pg_catalog.english'
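
(Those are the values in postgresql.conf; what the session actually sees
can be double-checked with SHOW, for example:

SHOW geqo_threshold;
SHOW default_statistics_target;

just to rule out an override somewhere.)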

Anything I can do to solve this?

Matthew

--
Surely the value of C++ is zero, but C's value is now 1?
  -- map36, commenting on the "No, C++ isn't equal to D. 'C' is undeclared
  [...] C++ should really be called 1" response to "C++ -- shouldn't it
  be called D?"
