Plan for relatively simple query seems to be very inefficient

Список
Период
Сортировка
От Arjen van der Meijden
Тема Plan for relatively simple query seems to be very inefficient
Дата
Msg-id 425413D3.5030304@vulcanus.its.tudelft.nl
обсуждение исходный текст
Ответы Re: Plan for relatively simple query seems to be very inefficient  (Steve Atkins)
Re: Plan for relatively simple query seems to be very inefficient  (Tom Lane)
Re: Plan for relatively simple query seems to be very inefficient  (Mischa)
Список pgsql-performance
Дерево обсуждения
Plan for relatively simple query seems to be very inefficient  (Arjen van der Meijden, )
 Re: Plan for relatively simple query seems to be very inefficient  (Steve Atkins, )
  Re: Plan for relatively simple query seems to be very inefficient  (Arjen van der Meijden, )
 Re: Plan for relatively simple query seems to be very inefficient  (Tom Lane, )
  Re: Plan for relatively simple query seems to be very inefficient  (Arjen van der Meijden, )
   Re: Plan for relatively simple query seems to be very inefficient  (Tom Lane, )
    Recognizing range constraints (was Re: Plan for relatively simple query seems to be very inefficient)  (Tom Lane, )
     Re: [HACKERS] Recognizing range constraints (was Re: Plan for relatively simple query seems to be very inefficient)  ("Jim C. Nasby", )
      Re: [HACKERS] Recognizing range constraints (was Re: Plan for relatively simple query seems to be very inefficient)  (Tom Lane, )
       Re: [HACKERS] Recognizing range constraints (was Re: Plan  (John A Meinel, )
        Re: [HACKERS] Recognizing range constraints (was Re: Plan for relatively simple query seems to be very inefficient)  (Tom Lane, )
       Re: [HACKERS] Recognizing range constraints (was Re: Plan for relatively simple query seems to be very inefficient)  ("Jim C. Nasby", )
     Re: Recognizing range constraints (was Re: Plan for  (Simon Riggs, )
     Re: Recognizing range constraints (was Re: Plan for relatively simple query seems to be very inefficient)  (Bruno Wolff III, )
      Re: Recognizing range constraints (was Re: Plan for relatively simple query seems to be very inefficient)  (Tom Lane, )
       Re: Recognizing range constraints (was Re: Plan for relatively simple query seems to be very inefficient)  (Mischa, )
        Re: Recognizing range constraints (was Re: Plan for relatively simple query seems to be very inefficient)  (Tom Lane, )
         Re: Recognizing range constraints (was Re: Plan for relatively simple query seems to be very inefficient)  (, )
 Re: Plan for relatively simple query seems to be very inefficient  ("Dave Held", )
  Re: Plan for relatively simple query seems to be very inefficient  (Tom Lane, )
  Re: Plan for relatively simple query seems to be very inefficient  (Tom Lane, )
 Re: Plan for relatively simple query seems to be very inefficient  (Mischa, )
Hi list,

I noticed on a forum a query taking a surprisingly large amount of time
in MySQL. Of course I wanted to prove PostgreSQL 8.0.1 could do it much
better. To my surprise PostgreSQL was ten times worse on the same
machine! And I don't understand why.

I don't really need this query to be fast since I don't use it, but the
range-thing is not really an uncommon query I suppose. So I'm wondering
why it is so slow and this may point to a wrong plan being chosen or
generated.

Here are table definitions:

         Table "public.postcodes"
    Column    |     Type      | Modifiers
-------------+---------------+-----------
  postcode_id | smallint      | not null
  range_from  | smallint      |
  range_till  | smallint      |
Indexes:
     "postcodes_pkey" PRIMARY KEY, btree (postcode_id)
     "range" UNIQUE, btree (range_from, range_till)

    Table "public.data_main"
  Column |   Type   | Modifiers
--------+----------+-----------
  userid | integer  | not null
  range  | smallint |
Indexes:
     "data_main_pkey" PRIMARY KEY, btree (userid)

And here's the query I ran:

SELECT COUNT(*) FROM
data_main AS dm,
postcodes AS p
WHERE dm.range BETWEEN p.range_from AND p.range_till
                                                           QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------
  Aggregate  (cost=332586.85..332586.85 rows=1 width=0) (actual
time=22712.038..22712.039 rows=1 loops=1)
    ->  Nested Loop  (cost=3.76..328945.96 rows=1456356 width=0) (actual
time=0.054..22600.826 rows=82688 loops=1)
          Join Filter: (("outer".range >= "inner".range_from) AND
("outer".range <= "inner".range_till))
          ->  Seq Scan on data_main dm  (cost=0.00..1262.20 rows=81920
width=2) (actual time=0.020..136.930 rows=81920 loops=1)
          ->  Materialize  (cost=3.76..5.36 rows=160 width=4) (actual
time=0.001..0.099 rows=160 loops=81920)
                ->  Seq Scan on postcodes p  (cost=0.00..3.60 rows=160
width=4) (actual time=0.010..0.396 rows=160 loops=1)
  Total runtime: 22712.211 ms


When I do something completely bogus, which will result in coupling the
data per record from data_main on one record from postcodes, it still
not very fast but acceptable:

SELECT COUNT(*) FROM
data_main AS dm,
postcodes AS p
WHERE dm.range / 10 = p.postcode_id

                                                                  QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------
  Aggregate  (cost=10076.98..10076.98 rows=1 width=0) (actual
time=1456.016..1456.017 rows=1 loops=1)
    ->  Merge Join  (cost=8636.81..9913.13 rows=65537 width=0) (actual
time=1058.105..1358.571 rows=81920 loops=1)
          Merge Cond: ("outer".postcode_id = "inner"."?column2?")
          ->  Index Scan using postcodes_pkey on postcodes p
(cost=0.00..5.76 rows=160 width=2) (actual time=0.034..0.507 rows=160
loops=1)
          ->  Sort  (cost=8636.81..8841.61 rows=81920 width=2) (actual
time=1057.698..1169.879 rows=81920 loops=1)
                Sort Key: (dm.range / 10)
                ->  Seq Scan on data_main dm  (cost=0.00..1262.20
rows=81920 width=2) (actual time=0.020..238.886 rows=81920 loops=1)
  Total runtime: 1461.156 ms


Doing something similarily bogus, but with less results is much faster,
even though it should have basically the same plan:

SELECT COUNT(*) FROM
data_main AS dm,
postcodes AS p
WHERE dm.range  = p.postcode_id

                                                           QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------
  Aggregate  (cost=2138.63..2138.63 rows=1 width=0) (actual
time=180.667..180.668 rows=1 loops=1)
    ->  Hash Join  (cost=4.00..2087.02 rows=20642 width=0) (actual
time=180.645..180.645 rows=0 loops=1)
          Hash Cond: ("outer".range = "inner".postcode_id)
          ->  Seq Scan on data_main dm  (cost=0.00..1262.20 rows=81920
width=2) (actual time=0.005..105.548 rows=81920 loops=1)
          ->  Hash  (cost=3.60..3.60 rows=160 width=2) (actual
time=0.592..0.592 rows=0 loops=1)
                ->  Seq Scan on postcodes p  (cost=0.00..3.60 rows=160
width=2) (actual time=0.025..0.349 rows=160 loops=1)
  Total runtime: 180.807 ms
(7 rows)

If you like to toy around with the datasets on your heavily optimized
postgresql-installs, let me know. The data is just generated for
testing-purposes and I'd happily send a copy to anyone interested.

Best regards,

Arjen van der Meijden

В списке pgsql-performance по дате отправления:

Предыдущее
От: "Jim C. Nasby"
Дата:
Сообщение: Re: How to improve db performance with $7K?
Следующее
От: John A Meinel
Дата:
Сообщение: Re: [HACKERS] Recognizing range constraints (was Re: Plan