Re: Optimize single tuple fetch from nbtree index
From        | Floris Van Nee
Subject     | Re: Optimize single tuple fetch from nbtree index
Date        |
Msg-id      | 1564784330839.3645@Optiver.com
In reply to | Re: Optimize single tuple fetch from nbtree index (Tom Lane <tgl@sss.pgh.pa.us>)
List        | pgsql-hackers
Hi Tom,

Thanks for your quick reply!

> Regardless of whether there's actually a LIMIT 1? That seems disastrous
> for every other case than the narrow one where the optimization wins.
> Because every other case is now going to have to touch the index page
> twice. That's more CPU and about double the contention --- if you could
> not measure any degradation from that, you're not measuring the right
> thing.

I thought the same at first. Note that I did measure a degradation of 2-3% in some cases, as mentioned, but initially I also expected worse. Do you have any ideas on cases that would suffer the most? I thought the tight inner nested loop that I posted in my performance tests would have this index lookup as its bottleneck. I know it is the bottleneck for the LIMIT 1 query (because those queries improve by a factor of 2-3 with the patch). My theory is that the price paid for this optimization is highest for a LIMIT 3, because it would touch the page twice and read all items from it, while only returning three of them.

> In principle, you could pass down knowledge of whether there's a LIMIT,
> using the same mechanism used to enable top-N sorting. But it'd have to
> also go through the AM interface layer, so I'm not sure how messy this
> would be.

This was an idea I had as well, and I would be willing to implement such a thing if the community deems it interesting enough. However, I didn't want to do this for the first version of this patch, as it would be quite some extra work, which would be useless if the idea of the patch itself gets rejected. :-) I'd appreciate any pointers in the right direction - I can take a look at how top-N sorting pushes the LIMIT down. Given enough interest in the basic idea of this patch, I will implement it.

>> This calls _bt_first in nbtsearch.c, which will, if there are scankeys, descend the tree to a leaf page and read just the first (or possibly two) tuples. It won't touch the rest of the page yet.
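As a side note, the top-N mechanism referred to above is visible from plain SQL: when the planner knows the bound, the executor passes it down to the sort node, which then keeps only a small heap instead of sorting the full input. A minimal sketch, using a made-up table purely for illustration:

```sql
-- Hypothetical table, just to demonstrate the top-N sort bound.
CREATE TABLE events (id int, ts timestamptz);
INSERT INTO events
SELECT g, now() - g * interval '1 second'
FROM generate_series(1, 100000) g;

-- With a LIMIT above the sort, EXPLAIN ANALYZE typically reports
-- "Sort Method: top-N heapsort" rather than a full quicksort or
-- external sort, showing that the bound was pushed down.
EXPLAIN ANALYZE
SELECT * FROM events ORDER BY ts DESC LIMIT 5;
```

The exact plan and sort method shown depend on the server version, statistics, and available indexes; the point is only that the LIMIT already flows down to the sort, whereas no analogous bound currently reaches the index AM.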
>> If indeed just one tuple was required, there won't be a call to _bt_next and we're done. If we do need more than one tuple, _bt_next will resume reading tuples from the index page at the point where we left off.

> How do you know how many index entries you have to fetch to get a tuple
> that's live/visible to the query?

Indeed, we don't know that - that's why this initial patch makes no assumptions about it and just assumes the good-weather scenario that everything is visible. I'm not sure whether it's possible to give an estimate of this, or whether that would be useful. Currently, if it turns out that the tuple is not visible, there'll just be another call to _bt_next, which will resume reading the page as normal. I'm open to implementing any suggestions that may improve this.

>> - We need to take into account page splits, insertions and vacuums while we do not have the read-lock in between _bt_first and the first call to _bt_next. This made my patch quite a bit more complicated than my initial implementation.

> Meh. I think the odds that you got this 100% right are small, and the
> odds that it would be maintainable are smaller. There's too much that
> can happen if you're not holding any lock --- and there's a lot of active
> work on btree indexes, which could break whatever assumptions you might
> make today.

Agreed, which is also why I posted this initial version of the patch here already: to get some input from the experts on this topic about which assumptions can be made now and in the future. If it turns out that an optimization like this is completely infeasible because of other constraints in the btree implementation, then we're done pretty quickly here. :-) For what it's worth, the patch at least passes make check consistently - I caught a lot of these edge cases related to page splits and insertions while running the regression tests, which exercise the modified bits of code quite often and in parallel.
There may be plenty of edge cases left, however...

> I'm not unalterably opposed to doing something like this, but my sense
> is that the complexity and probable negative performance impact on other
> cases are not going to look like a good trade-off for optimizing the
> case at hand.

I do think it could be a big win if we could get something like this working. Cases with a LIMIT seem common enough to me to justify some extra optimization, especially if that could lead to 2-3x the TPS for these kinds of queries. However, it indeed needs to stay within reasonable complexity. If it turns out that optimizing this requires adding a lot of extra complexity, it may not be worth it.

> BTW, you haven't even really made the case that optimizing a query that
> behaves this way is the right thing to be doing ... maybe some other
> plan shape that isn't a nestloop around a LIMIT query would be a better
> solution.

Unfortunately, it is pretty difficult to come up with a faster plan than this. We have a large database with many tables of timeseries data, and when tables get large, when there's an efficient multi-column index, and when you want to do these kinds of time-based or item-based lookups, the nested loop is generally the fastest option. I can elaborate more on this, but I'm not sure this thread is the best place for that.

I appreciate you taking the time to look at this. I'd be happy to look into any suggestions you come up with. Working on this has taught me a lot about the internals of Postgres already - I find it really interesting!

-Floris
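To make the discussed plan shape concrete, a "latest row per item" timeseries lookup of this kind could look like the following sketch. The schema, table names, and index are all hypothetical, chosen only to show the nestloop-around-LIMIT-1 pattern:

```sql
-- Hypothetical schema: one row per (item, timestamp) observation,
-- with a multi-column index matching the lookup order.
CREATE TABLE prices (
    item_id int,
    ts      timestamptz,
    price   numeric
);
CREATE INDEX prices_item_ts_idx ON prices (item_id, ts DESC);

-- "Latest price per item": the inner LIMIT 1 probe descends the
-- multi-column index once per item in a nested loop, fetching a
-- single tuple each time -- exactly the access pattern the patch
-- tries to make cheaper.
SELECT i.item_id, p.ts, p.price
FROM items i
CROSS JOIN LATERAL (
    SELECT ts, price
    FROM prices
    WHERE prices.item_id = i.item_id
    ORDER BY ts DESC
    LIMIT 1
) p;
```

With a suitable index, each inner probe is a single index descent that needs only the first (visible) tuple on the leaf page, which is why alternative plan shapes tend to lose here.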