Re: Optimizer: limit not taken into account

Поиск
Список
Период
Сортировка
От Bruno Wolff III
Тема Re: Optimizer: limit not taken into account
Дата
Msg-id 20060517174426.GA14567@wolff.to
обсуждение исходный текст
Ответ на Optimizer: limit not taken into account  ("Craig A. James" <cjames@modgraph-usa.com>)
Ответы Re: Optimizer: limit not taken into account  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-performance
Please don't reply to previous messages to start new threads. This makes it
harder to find stuff in the archives and may keep people from noticing your
message.

On Wed, May 17, 2006 at 08:54:52 -0700,
  "Craig A. James" <cjames@modgraph-usa.com> wrote:
> Here's a "corner case" that might interest someone.  It tripped up one of
> our programmers.
>
> We have a table with > 10 million rows.  The ID column is indexed, the
> table has been vacuum/analyzed.  Compare these two queries:
>
>   select * from tbl where id >= 10000000 limit 1;
>   select * from tbl where id >= 10000000 order by id limit 1;
>
> The first takes 4 seconds, and uses a full table scan.  The second takes 32
> msec and uses the index.  Details are below.

I suspect it wasn't intended to be a full table scan. But rather a sequential
scan until it found a matching row. If the data in the table is ordered by
by id, this strategy may not work out well. Where as if the data is randomly
ordered, it would be expected to find a match quickly.

Have you analyzed the table recently? If the planner has bad stats on the
table, that is going to make it more likely to choose a bad plan.


> I understand why the planner makes the choices it does -- the "id >
> 10000000" isn't very selective and under normal circumstances a full table
> scan is probably the right choice.  But the "limit 1" apparently doesn't
> alter the planner's strategy at all.  We were surprised by this.
>
> Adding the "order by" was a simple solution.
>
> Craig
>
>
>
> pg=> explain analyze select url, url_digest from url_queue where priority
> >= 10000000 limit 1;
>                       QUERY PLAN
> ------------------------------------------------------------------------------------------
> Limit  (cost=0.00..0.65 rows=1 width=108) (actual time=4036.113..4036.117
> rows=1 loops=1)
>   ->  Seq Scan on url_queue  (cost=0.00..391254.35 rows=606176 width=108)
>   (actual time=4036.101..4036.101 rows=1 loops=1)
>         Filter: (priority >= 10000000)
> Total runtime: 4036.200 ms
> (4 rows)
>
> pg=> explain analyze select url, url_digest from url_queue where priority
> >= 10000000 order by priority limit 1;
>                       QUERY PLAN
> --------------------------------------------------------------------------------------
> Limit  (cost=0.00..2.38 rows=1 width=112) (actual time=32.445..32.448
> rows=1 loops=1)
>   ->  Index Scan using url_queue_priority on url_queue
>   (cost=0.00..1440200.41 rows=606176 width=112) (actual time=32.434..32.434
>   rows=1 loops=1)
>         Index Cond: (priority >= 10000000)
> Total runtime: 32.566 ms
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster

В списке pgsql-performance по дате отправления:

Предыдущее
От: john_oshea@wordbank.com
Дата:
Сообщение: Re: SQL CPU time usage
Следующее
От: Chris Mckenzie
Дата:
Сообщение: Performance/Maintenance test result collection