Re: slow IN() clause for many cases

Поиск
Список
Период
Сортировка
От Martijn van Oosterhout
Тема Re: slow IN() clause for many cases
Дата
Msg-id 20051016163328.GF5779@svana.org
обсуждение исходный текст
Ответ на Re: slow IN() clause for many cases  (Greg Stark <gsstark@mit.edu>)
Ответы Re: slow IN() clause for many cases  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On Sun, Oct 16, 2005 at 12:03:33PM -0400, Greg Stark wrote:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
> > Another point here is that LIMIT without any ORDER BY isn't an amazingly
> > useful case.  Neither the old implementation of OR indexscans nor the
> > new can produce ordered output, which means you're not going to get
> > fast-start behavior anyway for real queries.
>
> That's true. That's why I was wondering more about cases where the client end
> was going to read all the records until it found the record it's looking for
> or found enough records for its purposes.

I would argue that the client should simply ask for what it wants
rather than filtering on the client end. Then PostgreSQL has the info
to optimise appropriately.

> There are other uses of fast-start as well. If the records are going to be put
> through a slow process it can be more efficient to pipeline them through while
> the later records are still coming from the database.

Well, IIRC if you create a cursor or use LIMIT, PostgreSQL will prefer
fast-start plans if that gets the requested result faster.

> The example above raises another idea though. Would it be possible for the
> optimizer to recognize when a clause is so expansive that it would be faster
> to read the complement than the actual clause as written?
>
> That could be useful with bitmap operations if it meant less time reading the
> index to build the bitmap but the eventual result after the bitmap operations
> is restrictive enough to justify an index scan.

The problem is, the bitmaps are lossy. If so many matches indicate
about the same area in the table, they are coalesced into a single
block. Hence, you cannot meaningfully ask for a complement.

However, if you knew from the beginning that you wanted the complement
you may be able to arrange something. OTOH, there's no way to tell from
the index if it's referred to *all* the tuples in a page so there is no
way to exclude a page from consideration.

> eg a case where 90% of users are active, and 10% have pending set and there's
> an index on each of these:
>
> WHERE user_status = 'active' AND pending = true
>
> If the optimizer tries to a bitmap index scan now it has to read in a huge
> index; it's probably better off just using the pending index alone. But if it
> realizes that it can use the index to find the tuples that *aren't* active and
> then take the complement of that bitmap it could build a bitmap reading in
> only 10% of that index.

As I pointed out above, I don't think you can take the complement of a
bitmap. Besides, I havn't looked at the costings for bitmap scans but
ISTM that if the stats indicate a 90% coverage for user_status and the
index is not highly correlated, it should exclude that index from
consideration altogether. And whether it uses a normal index scan or a
bitmap scan for pending also depends on the stats. It's the comparison
between scanning the whole index and then the pages in sequence versus
just grabbing the tuples from the index.

Besides, the case you provide is the perfect candidate for a partial
index. Attributes that only apply to a small fraction of the table are
better off as predicates if you're going to be searching for them a
lot.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Magnus Hagander"
Дата:
Сообщение: Re: Advice needed concerning Win32 signals
Следующее
От: Thomas Hallgren
Дата:
Сообщение: Re: Advice needed concerning Win32 signals