Re: Index Skip Scan
| От | Dmitry Dolgov | 
|---|---|
| Тема | Re: Index Skip Scan | 
| Дата | |
| Msg-id | 20200210193056.l4slz4bspqv44gqu@localhost обсуждение исходный текст | 
| Ответ на | Re: Index Skip Scan (James Coleman <jtc331@gmail.com>) | 
| Список | pgsql-hackers | 
> On Sat, Feb 08, 2020 at 01:31:02PM -0500, James Coleman wrote: > On Sat, Feb 8, 2020 at 10:24 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote: > > > > > On Sat, Feb 08, 2020 at 03:22:17PM +0100, Tomas Vondra wrote: > > > So in this case the skip scan is ~15x slower than the usual plan (index > > > only scan + unique). The reason why this happens is pretty simple - to > > > estimate the number of groups we multiply the ndistinct estimates for > > > the two columns (which both have n_distinct = 10000), but then we cap > > > the estimate to 10% of the table. But when the columns are independent > > > with high cardinalities that under-estimates the actual value, making > > > the cost for skip scan much lower than it should be. > > > > The current implementation checks if we can find the next value on the > > same page to do a shortcut instead of tree traversal and improve such > > kind of situations, but I can easily imagine that it's still not enough > > in some extreme situations. > > This is almost certainly rehashing already covered ground, but since I > doubt it's been discussed recently, would you be able to summarize > that choice (not to always get the next tuple by scanning from the top > of the tree again) and the performance/complexity tradeoffs? Yeah, this part of discussion happened already some time ago. The idea [1] is to protect ourselves at least partially from incorrect ndistinct estimations. Simply doing jumping over an index means that even if the next key we're searching for is on the same page as previous, we still end up doing a search from the root of the tree, which is of course less efficient than just check right on the page before jumping further. Performance tradeoff in this case is simple, we make regular use case slightly slower, but can perform better in the worst case scenarios. Complexity tradeoff was never discussed, but I guess everyone assumed it's relatively straightforward to check the current page and return if something was found before jumping. [1]: https://www.postgresql.org/message-id/CA%2BTgmoY7QTHhzLWZupNSyyqFRBfMgYocg3R-6g%3DDRgT4-KBGqg%40mail.gmail.com
В списке pgsql-hackers по дате отправления: