Re: Plan for relatively simple query seems to be very inefficient

От: Mischa
Тема: Re: Plan for relatively simple query seems to be very inefficient
Дата: ,
Msg-id: 1112812553.42542c091d8ce@webmail.telus.net
(см: обсуждение, исходный текст)
Ответ на: Plan for relatively simple query seems to be very inefficient  (Arjen van der Meijden)
Список: pgsql-performance

Скрыть дерево обсуждения

Plan for relatively simple query seems to be very inefficient  (Arjen van der Meijden, )
 Re: Plan for relatively simple query seems to be very inefficient  (Steve Atkins, )
  Re: Plan for relatively simple query seems to be very inefficient  (Arjen van der Meijden, )
 Re: Plan for relatively simple query seems to be very inefficient  (Tom Lane, )
  Re: Plan for relatively simple query seems to be very inefficient  (Arjen van der Meijden, )
   Re: Plan for relatively simple query seems to be very inefficient  (Tom Lane, )
    Recognizing range constraints (was Re: Plan for relatively simple query seems to be very inefficient)  (Tom Lane, )
     Re: [HACKERS] Recognizing range constraints (was Re: Plan for relatively simple query seems to be very inefficient)  ("Jim C. Nasby", )
      Re: [HACKERS] Recognizing range constraints (was Re: Plan for relatively simple query seems to be very inefficient)  (Tom Lane, )
       Re: [HACKERS] Recognizing range constraints (was Re: Plan  (John A Meinel, )
        Re: [HACKERS] Recognizing range constraints (was Re: Plan for relatively simple query seems to be very inefficient)  (Tom Lane, )
       Re: [HACKERS] Recognizing range constraints (was Re: Plan for relatively simple query seems to be very inefficient)  ("Jim C. Nasby", )
     Re: Recognizing range constraints (was Re: Plan for  (Simon Riggs, )
     Re: Recognizing range constraints (was Re: Plan for relatively simple query seems to be very inefficient)  (Bruno Wolff III, )
      Re: Recognizing range constraints (was Re: Plan for relatively simple query seems to be very inefficient)  (Tom Lane, )
       Re: Recognizing range constraints (was Re: Plan for relatively simple query seems to be very inefficient)  (Mischa, )
        Re: Recognizing range constraints (was Re: Plan for relatively simple query seems to be very inefficient)  (Tom Lane, )
         Re: Recognizing range constraints (was Re: Plan for relatively simple query seems to be very inefficient)  (, )
 Re: Plan for relatively simple query seems to be very inefficient  ("Dave Held", )
  Re: Plan for relatively simple query seems to be very inefficient  (Tom Lane, )
  Re: Plan for relatively simple query seems to be very inefficient  (Tom Lane, )
 Re: Plan for relatively simple query seems to be very inefficient  (Mischa, )

Quoting Arjen van der Meijden <>:

> Hi list,
>
> I noticed on a forum a query taking a surprisingly large amount of time
> in MySQL. Of course I wanted to prove PostgreSQL 8.0.1 could do it much
> better. To my surprise PostgreSQL was ten times worse on the same
> machine! And I don't understand why.
>
> I don't really need this query to be fast since I don't use it, but the
> range-thing is not really an uncommon query I suppose. So I'm wondering
> why it is so slow and this may point to a wrong plan being chosen or
> generated.
>
> Here are table definitions:
>
>          Table "public.postcodes"
>     Column    |     Type      | Modifiers
> -------------+---------------+-----------
>   postcode_id | smallint      | not null
>   range_from  | smallint      |
>   range_till  | smallint      |
> Indexes:
>      "postcodes_pkey" PRIMARY KEY, btree (postcode_id)
>      "range" UNIQUE, btree (range_from, range_till)
>
>     Table "public.data_main"
>   Column |   Type   | Modifiers
> --------+----------+-----------
>   userid | integer  | not null
>   range  | smallint |
> Indexes:
>      "data_main_pkey" PRIMARY KEY, btree (userid)
>
> And here's the query I ran:
>
> SELECT COUNT(*) FROM
> data_main AS dm,
> postcodes AS p
> WHERE dm.range BETWEEN p.range_from AND p.range_till

I just posted an answer to this (via webcafe webmail; can't recall which
pg-list), that might interest you.

BTree indexes as they stand (multi-column, ...) answer what most people need for
queries. Unfortunately, out-of-the-box, they have no good way of handling range
queries. To compensate, you can use a small amount of kinky SQL. This is in the
same line as the tricks used to implement hierarchic queries in relational SQL.

[1] Create a table "widths"(wid int) of powers of 2, up to what will just cover
max(range_till-range_from). Since your "range" column is a smallint, this table
can have no more than 15 rows. You can get as fussy as you want about keeping
this table to a minimum.

[2] Change postcodes:
    ALTER TABLE postcodes
       ADD wid INT USING 2 ^ CEIL(LOG(range_from - range_till,2));
    ALTER TABLE postcodes
       ADD start INT USING range_from - (range_from % wid);
    CREATE INDEX postcodes_wid_start_index ON (wid, start);
    ANALYZE postcodes;

[4] Write your query as:
    SELECT COUNT(*)
    FROM data_main AS dm
    CROSS JOIN widths -- yes, CROSS JOIN. For once, it HELPS performance.
    JOIN postcodes AS p
      ON dm.wid = widths.wid AND dm.start = p.range - p.range % widths.wid
    WHERE dm.range BETWEEN p.range_from AND p.range_till

This uses BTREE exact-match to make a tight restriction on which rows to check.
YMMV, but this has worked even for multi-M table joins.

--
"Dreams come true, not free."



В списке pgsql-performance по дате сообщения:

От: Alex Turner
Дата:
Сообщение: Re: How to improve db performance with $7K?
От: Alex Turner
Дата:
Сообщение: Re: How to improve db performance with $7K?