Re: slow IN() clause for many cases
От | Martijn van Oosterhout |
---|---|
Тема | Re: slow IN() clause for many cases |
Дата | |
Msg-id | 20051016163328.GF5779@svana.org обсуждение исходный текст |
Ответ на | Re: slow IN() clause for many cases (Greg Stark <gsstark@mit.edu>) |
Ответы |
Re: slow IN() clause for many cases
(Tom Lane <tgl@sss.pgh.pa.us>)
|
Список | pgsql-hackers |
On Sun, Oct 16, 2005 at 12:03:33PM -0400, Greg Stark wrote: > Tom Lane <tgl@sss.pgh.pa.us> writes: > > Another point here is that LIMIT without any ORDER BY isn't an amazingly > > useful case. Neither the old implementation of OR indexscans nor the > > new can produce ordered output, which means you're not going to get > > fast-start behavior anyway for real queries. > > That's true. That's why I was wondering more about cases where the client end > was going to read all the records until it found the record it's looking for > or found enough records for its purposes. I would argue that the client should simply ask for what it wants rather than filtering on the client end. Then PostgreSQL has the info to optimise appropriately. > There are other uses of fast-start as well. If the records are going to be put > through a slow process it can be more efficient to pipeline them through while > the later records are still coming from the database. Well, IIRC if you create a cursor or use LIMIT, PostgreSQL will prefer fast-start plans if that gets the requested result faster. > The example above raises another idea though. Would it be possible for the > optimizer to recognize when a clause is so expansive that it would be faster > to read the complement than the actual clause as written? > > That could be useful with bitmap operations if it meant less time reading the > index to build the bitmap but the eventual result after the bitmap operations > is restrictive enough to justify an index scan. The problem is, the bitmaps are lossy. If so many matches indicate about the same area in the table, they are coalesced into a single block. Hence, you cannot meaningfully ask for a complement. However, if you knew from the beginning that you wanted the complement you may be able to arrange something. OTOH, there's no way to tell from the index if it's referred to *all* the tuples in a page so there is no way to exclude a page from consideration. > eg a case where 90% of users are active, and 10% have pending set and there's > an index on each of these: > > WHERE user_status = 'active' AND pending = true > > If the optimizer tries to a bitmap index scan now it has to read in a huge > index; it's probably better off just using the pending index alone. But if it > realizes that it can use the index to find the tuples that *aren't* active and > then take the complement of that bitmap it could build a bitmap reading in > only 10% of that index. As I pointed out above, I don't think you can take the complement of a bitmap. Besides, I havn't looked at the costings for bitmap scans but ISTM that if the stats indicate a 90% coverage for user_status and the index is not highly correlated, it should exclude that index from consideration altogether. And whether it uses a normal index scan or a bitmap scan for pending also depends on the stats. It's the comparison between scanning the whole index and then the pages in sequence versus just grabbing the tuples from the index. Besides, the case you provide is the perfect candidate for a partial index. Attributes that only apply to a small fraction of the table are better off as predicates if you're going to be searching for them a lot. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
В списке pgsql-hackers по дате отправления: