Re: Very specialised query

Список
Период
Сортировка
От Matthew Wakeling
Тема Re: Very specialised query
Дата
Msg-id alpine.DEB.2.00.0903271708080.21772@aragorn.flymine.org
обсуждение исходный текст
Ответ на Re: Very specialised query  (Віталій Тимчишин)
Ответы Re: Very specialised query  (Tom Lane)
Список pgsql-performance
Дерево обсуждения
Very specialised query  (Matthew Wakeling, )
 Re: Very specialised query  ("Kevin Grittner", )
 Re: Very specialised query  (Tom Lane, )
  Re: Very specialised query  (Matthew Wakeling, )
 Re: Very specialised query  (Matthew Wakeling, )
  Re: Very specialised query  (Tom Lane, )
 Re: Very specialised query  (Віталій Тимчишин, )
  Re: Very specialised query  (Matthew Wakeling, )
   Re: Very specialised query  (Tom Lane, )
    Re: Very specialised query  (Matthew Wakeling, )
    Re: Very specialised query  (Matthew Wakeling, )
     Re: Very specialised query  (Віталій Тимчишин, )
      Re: Very specialised query  (Matthew Wakeling, )
       Re: Very specialised query  (Віталій Тимчишин, )
        Re: Very specialised query  (Matthew Wakeling, )
 Re: Very specialised query  (Dimitri Fontaine, )
  Re: Very specialised query  (Matthew Wakeling, )
 Re: Very specialised query  ("Marc Mamin", )
  Re: Very specialised query  (Matthew Wakeling, )
   Re: Very specialised query  ("Marc Mamin", )
    Re: Very specialised query  (Matthew Wakeling, )
 Re: Very specialised query  (Matthew Wakeling, )
  Re: Very specialised query  (Віталій Тимчишин, )
   Re: Very specialised query  (Matthew Wakeling, )
    Re: Very specialised query  (Matthew Wakeling, )
     Re: Very specialised query  (Matthew Wakeling, )
      Re: Very specialised query  (Craig Ringer, )
 Re: Very specialised query  ("Marc Mamin", )
  Re: Very specialised query  (Matthew Wakeling, )
On Fri, 27 Mar 2009, Віталій Тимчишин wrote:
> ...an index on (objectid, start) would help...

Definitely.

> You could try  adding    "AND l2.start > l1.start" to the first query. 
> This will drop symmetric half of intersections (the ones that will
> remain are l2 inside or to the left of l1), but you can redo results by
> id1,id2 union all id2, id1 and may allow to use start index for
> "between"

That's remarkably clever, and should have been obvious. Thanks.

Adding the constraint as you say does indeed make the query fast. However,
there seems to be a problem with the planner, in that it does not
distinguish between the costs of two alternative plans, which have vastly
different real costs. Consider the following:

SELECT
     l1.id AS id1,
     l2.id AS id2
FROM
     location l1,
     location l2
WHERE
         l1.objectid = 228000093
     AND l2.objectid = 228000093
     AND l1.id <> l2.id
     AND l1.start < l2.end
     AND l1.end > l2.start
     AND l1.start < l2.start
UNION ALL
SELECT
     l1.id AS id1, l2.id AS id2
FROM
     location l1,
     location l2
WHERE
         l1.objectid = 228000093
     AND l2.objectid = 228000093
     AND l1.id <> l2.id
     AND l1.start < l2.end
     AND l1.end > l2.start
     AND l1.start >= l2.start;
                                                                     QUERY
PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------
  Append  (cost=0.00..20479179.74 rows=138732882 width=8)
    ->  Nested Loop  (cost=0.00..9545925.46 rows=69366441 width=8)
          Join Filter: ((l1.id <> l2.id) AND (l1.start < l2.end))
          ->  Index Scan using location_test_obj_end on location l1  (cost=0.00..55966.07 rows=43277 width=12)
                Index Cond: (objectid = 228000093)
          ->  Index Scan using location_test_obj_start on location l2  (cost=0.00..123.10 rows=4809 width=12)
                Index Cond: ((l2.objectid = 228000093) AND (l1.end > l2.start) AND (l1.start < l2.start))
    ->  Nested Loop  (cost=0.00..9545925.46 rows=69366441 width=8)
          Join Filter: ((l1.id <> l2.id) AND (l1.start < l2.end))
          ->  Index Scan using location_test_obj_end on location l1  (cost=0.00..55966.07 rows=43277 width=12)
                Index Cond: (objectid = 228000093)
          ->  Index Scan using location_test_obj_start on location l2  (cost=0.00..123.10 rows=4809 width=12)
                Index Cond: ((l2.objectid = 228000093) AND (l1.end > l2.start) AND (l1.start >= l2.start))
(13 rows)


Notice the two different index conditions:
     (l1.end > l2.start) AND (l1.start < l2.start)  - "between"
     (l1.end > l2.start) AND (l1.start >= l2.start) - open-ended
Both have a cost of (cost=0.00..123.10 rows=4809 width=12)

Postgres estimates these two index scans to be equivalent in cost, where
they are actually vastly different in real cost. Shouldn't Postgres favour
a "between" index scan over an open-ended one?

Matthew

--
 [About NP-completeness] These are the problems that make efficient use of
 the Fairy Godmother.                    -- Computer Science Lecturer

В списке pgsql-performance по дате отправления:

Предыдущее
От: "Marc Mamin"
Дата:
Сообщение: Re: Very specialised query
Следующее
От: Josh Berkus
Дата:
Сообщение: Re: Proposal of tunable fix for scalability of 8.4