Обсуждение: planner picking more expensive plan

Поиск
Список
Период
Сортировка

planner picking more expensive plan

От
Sam Mason
Дата:
Hi,

I've just been referred here after a conversion on IRC and everybody
seemed to think I've stumbled upon some strangeness.

The planner (in PG version 8.0.2) is choosing what it thinks is a more
expensive plan.  I've got a table of animals (about 3M rows) and their
movements (about 16M rows), and I'm trying to execute this query:

  SELECT a.birthlocnid, m.locnid
  FROM animals a
    LEFT JOIN movements m ON (a.animalid = m.animalid AND m.mtypeid=0)
  LIMIT 10;

If I have "work_mem" set to something small (1000) it uses this plan:

    QUERY PLAN

 Limit  (cost=0.00..202.52 rows=10 width=8) (actual time=0.221..0.600 rows=10 loops=1)
   ->  Merge Left Join  (cost=0.00..66888828.30 rows=3302780 width=8) (actual time=0.211..0.576 rows=10 loops=1)
         Merge Cond: ("outer".animalid = "inner".animalid)
         ->  Index Scan using animals_pkey on animals a  (cost=0.00..10198983.91 rows=3302780 width=8) (actual
time=0.112..0.276rows=10 loops=1) 
         ->  Index Scan using movement_animal on movements m  (cost=0.00..56642740.73 rows=3107737 width=8) (actual
time=0.088..0.235rows=10 loops=1) 
               Filter: (mtypeid = 0)
 Total runtime: 0.413 ms

But if I increase "work_mem" to 10000 it uses this plan:

    QUERY PLAN

 Limit  (cost=565969.42..566141.09 rows=10 width=8) (actual time=27769.047..27769.246 rows=10 loops=1)
   ->  Merge Right Join  (cost=565969.42..57264070.77 rows=3302780 width=8) (actual time=27769.043..27769.228 rows=10
loops=1)
         Merge Cond: ("outer".animalid = "inner".animalid)
         ->  Index Scan using movement_animal on movements m  (cost=0.00..56642740.73 rows=3107737 width=8) (actual
time=0.022..0.154rows=10 loops=1) 
               Filter: (mtypeid = 0)
         ->  Sort  (cost=565969.42..574226.37 rows=3302780 width=8) (actual time=27768.991..27769.001 rows=10 loops=1)
               Sort Key: a.animalid
               ->  Seq Scan on animals a  (cost=0.00..77086.80 rows=3302780 width=8) (actual time=0.039..5620.651
rows=3303418loops=1) 
 Total runtime: 27851.097 ms


I've tried playing with the statistics as people suggested on IRC but to
no effect.  There was some discussion about why it would be doing this,
but nothing obvious came out of it.

SHOW ALL output is at the end of this mail but it should be pretty
standard apart from:

  shared_buffers = 10000
  work_mem = 8192
  max_connections = 100
  effective_cache_size = 10000

Hope that's enough information to be useful.

Thanks.

   Sam


              name              |            setting
--------------------------------+--------------------------------
 add_missing_from               | on
 archive_command                | /home/postgres/pgarchive "%p"
 australian_timezones           | off
 authentication_timeout         | 60
 bgwriter_delay                 | 200
 bgwriter_maxpages              | 100
 bgwriter_percent               | 1
 block_size                     | 8192
 check_function_bodies          | on
 checkpoint_segments            | 3
 checkpoint_timeout             | 300
 checkpoint_warning             | 30
 client_encoding                | SQL_ASCII
 client_min_messages            | notice
 commit_delay                   | 0
 commit_siblings                | 5
 config_file                    | /home/pgdata/postgresql.conf
 cpu_index_tuple_cost           | 0.001
 cpu_operator_cost              | 0.0025
 cpu_tuple_cost                 | 0.01
 custom_variable_classes        | unset
 data_directory                 | /home/pgdata
 DateStyle                      | ISO, MDY
 db_user_namespace              | off
 deadlock_timeout               | 1000
 debug_pretty_print             | off
 debug_print_parse              | off
 debug_print_plan               | off
 debug_print_rewritten          | off
 debug_shared_buffers           | 0
 default_statistics_target      | 10
 default_tablespace             | unset
 default_transaction_isolation  | read committed
 default_transaction_read_only  | off
 default_with_oids              | on
 dynamic_library_path           | $libdir
 effective_cache_size           | 10000
 enable_hashagg                 | on
 enable_hashjoin                | on
 enable_indexscan               | on
 enable_mergejoin               | on
 enable_nestloop                | on
 enable_seqscan                 | off
 enable_sort                    | on
 enable_tidscan                 | on
 explain_pretty_print           | on
 external_pid_file              | unset
 extra_float_digits             | 0
 from_collapse_limit            | 8
 fsync                          | on
 geqo                           | on
 geqo_effort                    | 5
 geqo_generations               | 0
 geqo_pool_size                 | 0
 geqo_selection_bias            | 2
 geqo_threshold                 | 12
 hba_file                       | /home/pgdata/pg_hba.conf
 ident_file                     | /home/pgdata/pg_ident.conf
 integer_datetimes              | off
 join_collapse_limit            | 8
 krb_server_keyfile             | unset
 lc_collate                     | C
 lc_ctype                       | C
 lc_messages                    | C
 lc_monetary                    | C
 lc_numeric                     | C
 lc_time                        | C
 listen_addresses               | *
 log_connections                | on
 log_destination                | stderr
 log_directory                  | pg_log
 log_disconnections             | off
 log_duration                   | off
 log_error_verbosity            | default
 log_executor_stats             | off
 log_filename                   | postgresql-%Y-%m-%d_%H%M%S.log
 log_hostname                   | off
 log_line_prefix                | %t %u
 log_min_duration_statement     | -1
 log_min_error_statement        | panic
 log_min_messages               | notice
 log_parser_stats               | off
 log_planner_stats              | off
 log_rotation_age               | 1440
 log_rotation_size              | 10240
 log_statement                  | all
 log_statement_stats            | off
 log_truncate_on_rotation       | off
 maintenance_work_mem           | 256000
 max_connections                | 100
 max_files_per_process          | 1000
 max_fsm_pages                  | 20000
 max_fsm_relations              | 1000
 max_function_args              | 32
 max_identifier_length          | 63
 max_index_keys                 | 32
 max_locks_per_transaction      | 64
 max_stack_depth                | 2048
 password_encryption            | on
 port                           | 5432
 pre_auth_delay                 | 0
 preload_libraries              | unset
 random_page_cost               | 4
 redirect_stderr                | off
 regex_flavor                   | advanced
 rendezvous_name                | unset
 search_path                    | $user,public
 server_encoding                | SQL_ASCII
 server_version                 | 8.0.2
 shared_buffers                 | 1000
 silent_mode                    | off
 sql_inheritance                | on
 ssl                            | off
 statement_timeout              | 0
 stats_block_level              | off
 stats_command_string           | off
 stats_reset_on_server_start    | on
 stats_row_level                | off
 stats_start_collector          | on
 superuser_reserved_connections | 2
 syslog_facility                | LOCAL0
 syslog_ident                   | postgres
 TimeZone                       | GMT
 trace_notify                   | off
 transaction_isolation          | read committed
 transaction_read_only          | off
 transform_null_equals          | off
 unix_socket_directory          | unset
 unix_socket_group              | unset
 unix_socket_permissions        | 511
 vacuum_cost_delay              | 0
 vacuum_cost_limit              | 200
 vacuum_cost_page_dirty         | 20
 vacuum_cost_page_hit           | 1
 vacuum_cost_page_miss          | 10
 wal_buffers                    | 8
 wal_sync_method                | fdatasync
 work_mem                       | 128000
 zero_damaged_pages             | off


Re: planner picking more expensive plan

От
Tom Lane
Дата:
Sam Mason <sam@samason.me.uk> writes:
> The planner (in PG version 8.0.2) is choosing what it thinks is a more
> expensive plan.

I fooled around trying to duplicate this behavior, without success.
Can you create a self-contained test case?

            regards, tom lane

Re: planner picking more expensive plan

От
Sam Mason
Дата:
Tom Lane wrote:
>I fooled around trying to duplicate this behavior, without success.
>Can you create a self-contained test case?

I'll try and see if I can put something together, it's probably
going to be early next week though.  I wont be able to give you our
data, so I'll be a bit of a headscratching exercise generating
something that'll provoke the same behaviour.

Not sure if it'll help, but here's what the database schema looks
like at the moment:

             Table "public.animals"
   Column    |         Type          | Modifiers
-------------+-----------------------+-----------
 animalid    | integer               | not null
 sex         | character(1)          | not null
 dob         | date                  | not null
 birthlocnid | integer               |
 breedid     | character varying(8)  |
 eartag_1    | character varying(20) |
 eartag_2    | character varying(20) |
 eartag_3    | character varying(20) |
Indexes:
    "animals_pkey" primary key, btree (animalid)
    "animal_birthlocn" btree (birthlocnid)
    "animal_breed" btree (breedid)
    "animal_eartag" btree (eartag_1)
Check constraints:
    "animal_sex" CHECK (sex = 'M'::bpchar OR sex = 'F'::bpchar)

    Table "public.movements"
  Column  |  Type   | Modifiers
----------+---------+-----------
 locnid   | integer | not null
 animalid | integer | not null
 movedate | date    | not null
 mtypeid  | integer | not null
Indexes:
    "movement_animal" btree (animalid)
    "movement_location" btree (locnid)
    "movement_movedate" btree (movedate)
    "movement_movetype" btree (mtypeid)
Foreign-key constraints:
    "movement_location" FOREIGN KEY (locnid) REFERENCES locations(locnid)
    "movement_animal" FOREIGN KEY (animalid) REFERENCES animals(animalid)
    "movement_type" FOREIGN KEY (mtypeid) REFERENCES k_movement_type(mtypeid)

          Table "public.locations"
 Column |         Type          | Modifiers
--------+-----------------------+-----------
 locnid | integer               | not null
 ptype  | character varying(8)  |
 ltype  | character varying(8)  | not null
 cph    | character varying(20) |
 unk    | integer               |
Indexes:
    "locations_pkey" primary key, btree (locnid)
    "location_cph" btree (cph)
    "location_ltype" btree (ltype)
    "location_ptype" btree (ptype)
Foreign-key constraints:
    "location_ptype" FOREIGN KEY (ptype) REFERENCES k_premise_type(ptypeid)
    "location_ltype" FOREIGN KEY (ltype) REFERENCES k_location_type(ltypeid)

As I said, animals contains about 3M rows, movements about 16M rows
and locations about 80K rows.  There are about 3 to 8 rows for each
and every animal in the movements table, with at most one entry of
mtypeid=0 for each animal (95% of the animals have an entry).

Not sure if that's going to help making some demo data.  It's just
that it took quite a while loading it all here, so coming up with
some code to make demo data may take a while.


Thanks!

  Sam

Re: planner picking more expensive plan

От
John A Meinel
Дата:
Sam Mason wrote:

>Hi,
>
>I've just been referred here after a conversion on IRC and everybody
>seemed to think I've stumbled upon some strangeness.
>
>The planner (in PG version 8.0.2) is choosing what it thinks is a more
>expensive plan.  I've got a table of animals (about 3M rows) and their
>movements (about 16M rows), and I'm trying to execute this query:
>
>  SELECT a.birthlocnid, m.locnid
>  FROM animals a
>    LEFT JOIN movements m ON (a.animalid = m.animalid AND m.mtypeid=0)
>  LIMIT 10;
>
>
>
Why are you using LIMIT without having an ORDER BY?
What are actually trying to get out of this query? Is it just trying to
determine where the 'home' locations are?
It just seems like this query isn't very useful. As it doesn't restrict
by animal id, and it just gets 10 randomly selected animals where
m.mtypeid=0.
And why a LEFT JOIN instead of a normal join?
Anyway, the general constraints you are applying seem kind of confusing.
What happens if you change the plan to:

  SELECT a.birthlocnid, m.locnid
  FROM animals a
    LEFT JOIN movements m ON (a.animalid = m.animalid AND m.mtypeid=0)
  ORDER BY a.animalid LIMIT 10;


I would guess that this would help the planner realize it should try to
use an index, since it can realize that it wants only a few rows by
a.animalid in order.
Though I also recognize that you aren't returning a.animalid so you
don't really know which animals you are returning.

I get the feeling you are trying to ask something like "do animals stay
at their birth location", or at least "how are animals moving around". I
don't know what m.typeid = 0 means, but I'm guessing it is something
like where their home is.

Anyway, I would say you need to put a little bit more restriction in, so
the planner can figure out how to get only 10 rows.

John
=:->

>If I have "work_mem" set to something small (1000) it uses this plan:
>
>    QUERY PLAN
>
> Limit  (cost=0.00..202.52 rows=10 width=8) (actual time=0.221..0.600 rows=10 loops=1)
>   ->  Merge Left Join  (cost=0.00..66888828.30 rows=3302780 width=8) (actual time=0.211..0.576 rows=10 loops=1)
>         Merge Cond: ("outer".animalid = "inner".animalid)
>         ->  Index Scan using animals_pkey on animals a  (cost=0.00..10198983.91 rows=3302780 width=8) (actual
time=0.112..0.276rows=10 loops=1) 
>         ->  Index Scan using movement_animal on movements m  (cost=0.00..56642740.73 rows=3107737 width=8) (actual
time=0.088..0.235rows=10 loops=1) 
>               Filter: (mtypeid = 0)
> Total runtime: 0.413 ms
>
>But if I increase "work_mem" to 10000 it uses this plan:
>
>    QUERY PLAN
>
> Limit  (cost=565969.42..566141.09 rows=10 width=8) (actual time=27769.047..27769.246 rows=10 loops=1)
>   ->  Merge Right Join  (cost=565969.42..57264070.77 rows=3302780 width=8) (actual time=27769.043..27769.228 rows=10
loops=1)
>         Merge Cond: ("outer".animalid = "inner".animalid)
>         ->  Index Scan using movement_animal on movements m  (cost=0.00..56642740.73 rows=3107737 width=8) (actual
time=0.022..0.154rows=10 loops=1) 
>               Filter: (mtypeid = 0)
>         ->  Sort  (cost=565969.42..574226.37 rows=3302780 width=8) (actual time=27768.991..27769.001 rows=10 loops=1)
>               Sort Key: a.animalid
>               ->  Seq Scan on animals a  (cost=0.00..77086.80 rows=3302780 width=8) (actual time=0.039..5620.651
rows=3303418loops=1) 
> Total runtime: 27851.097 ms
>
>
>I've tried playing with the statistics as people suggested on IRC but to
>no effect.  There was some discussion about why it would be doing this,
>but nothing obvious came out of it.
>
>SHOW ALL output is at the end of this mail but it should be pretty
>standard apart from:
>
>  shared_buffers = 10000
>  work_mem = 8192
>  max_connections = 100
>  effective_cache_size = 10000
>
>Hope that's enough information to be useful.
>
>Thanks.
>
>   Sam
>


Вложения

Re: planner picking more expensive plan

От
Sam Mason
Дата:
John A Meinel wrote:
>Why are you using LIMIT without having an ORDER BY?

I'm just exploring the data, trying to figure out what it's like.

>It just seems like this query isn't very useful. As it doesn't restrict
>by animal id, and it just gets 10 randomly selected animals where
>m.mtypeid=0.

Yup, that's the point.  Check to see if the animals were born where
they say they were.  The data's come from an external source and
I'm just trying to figure out how good it is before I do too much
with it

>And why a LEFT JOIN instead of a normal join?

I'm not sure if some animals will have missing data!

>Anyway, the general constraints you are applying seem kind of confusing.

This was a slightly cut down query in an attempt to reduce general
confusion -- I guess I failed.  Sorry!

>I would guess that this would help the planner realize it should try to
>use an index, since it can realize that it wants only a few rows by
>a.animalid in order.

This seems to work the appropiate magic.  It always seems to prefer
index scans now.

The real point of asking this question orignally was to find out
why the planner was choosing a more expensive plan over a cheaper
one.  When I discovered this orignally I was disabling seqscan and
then it picked the correct version.  The actual work_mem didn't
change when I did this, it just picked the correct plan.  I discovered
the work_mem parameter fiddle later.  I think I forgot to mention
that in the original email though!


  Sam

Re: planner picking more expensive plan

От
Dennis Bjorklund
Дата:
On Fri, 1 Jul 2005, Sam Mason wrote:

The key thing with the query that Sam have is that if you turn off seqscan
you get the first plan that run in 0.4ms and if seqscan is on the runtime
is 27851ms.

There are 100 way to make it select the seq scan, including rewriting the
query to something more useful, tweaking different parameters and so on.

The interesting part is that pg give the fast plan a cost of 202 and the
slow a cost of 566141, but still it chooses the slow query unless seqscan
is turned off (or some other tweak with the same effect). It know very
well that the plan with the index scan will be much faster, it just don't
manage to generate it unless you force it to.

It makes you wonder if pg throws away some plans too early in the planning
phase.

>  Limit  (cost=0.00..202.52 rows=10 width=8) (actual time=0.221..0.600 rows=10 loops=1)
>    ->  Merge Left Join  (cost=0.00..66888828.30 rows=3302780 width=8) (actual time=0.211..0.576 rows=10 loops=1)
>          Merge Cond: ("outer".animalid = "inner".animalid)
>          ->  Index Scan using animals_pkey on animals a  (cost=0.00..10198983.91 rows=3302780 width=8) (actual
time=0.112..0.276rows=10 loops=1) 
>          ->  Index Scan using movement_animal on movements m  (cost=0.00..56642740.73 rows=3107737 width=8) (actual
time=0.088..0.235rows=10 loops=1) 
>                Filter: (mtypeid = 0)
>  Total runtime: 0.413 ms
>
>  Limit  (cost=565969.42..566141.09 rows=10 width=8) (actual time=27769.047..27769.246 rows=10 loops=1)
>    ->  Merge Right Join  (cost=565969.42..57264070.77 rows=3302780 width=8) (actual time=27769.043..27769.228 rows=10
loops=1)
>          Merge Cond: ("outer".animalid = "inner".animalid)
>          ->  Index Scan using movement_animal on movements m  (cost=0.00..56642740.73 rows=3107737 width=8) (actual
time=0.022..0.154rows=10 loops=1) 
>                Filter: (mtypeid = 0)
>          ->  Sort  (cost=565969.42..574226.37 rows=3302780 width=8) (actual time=27768.991..27769.001 rows=10
loops=1)
>                Sort Key: a.animalid
>                ->  Seq Scan on animals a  (cost=0.00..77086.80 rows=3302780 width=8) (actual time=0.039..5620.651
rows=3303418loops=1) 
>  Total runtime: 27851.097 ms


Another thing to notice is that if one remove the Limit node then the
situation is reversed and the plan that pg choose (with the Limit node) is
the one with the lowest cost. The startup cost is however very high so
combining that Merge Join with a Limit will of course produce something
slow compared to the upper plan where the startup cost is 0.0.

A stand alone test case would be nice, but even without the above plans
are interesting.

--
/Dennis Björklund