Re: GIN fast insert
From | Robert Haas |
---|---|
Subject | Re: GIN fast insert |
Date | |
Msg-id | 603c8f070902241335i575269a8ydccf01043644250f@mail.gmail.com |
In reply to | Re: GIN fast insert (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: GIN fast insert |
List | pgsql-hackers |
On Tue, Feb 24, 2009 at 2:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> On the other hand, Teodor showed a typical use case and a very
>> substantial performance gain:
>
> Yeah.  Whatever we do here is a tradeoff (and whether Robert likes it
> or not, reliability and code maintainability weigh heavily in the
> tradeoff).

I have no problem with reliability or code maintainability, and I'm not sure what I said that would give that impression.

If the consensus of the group is that the performance loss from dropping index scans is not important, then I'm fine with that, especially if that consensus is reached with an informed understanding of what that performance loss is likely to be. To me, a 2x slowdown on a two-table anti-join seems pretty bad, but I just work here.

Perhaps nobody else thinks that a semi-join or anti-join against a GIN index is a plausible use case (like, find all of the words from the following list that do not appear in any document)? If everyone agrees that we don't care about that case (or about ORDER-BY-without-LIMIT, which is certainly less compelling), then go ahead and remove it. I have no horse in this race other than having been asked to review the patch, which I did.

On the other hand, if a significant number of people think that it might be a bad idea to make that case significantly worse, then some redesign work is called for, and that may mean the patch needs to get bumped. My own opinion is that it is better to decide on the right design and then figure out which release that design can go into than it is to start by deciding this has to go into 8.4 and then figuring out what can be done in that period of time.

I don't think there is any question that making GIN continue to support both index scans and bitmap index scans will make the code more complex, but how bad will it be?
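To make the anti-join concern concrete, here is a toy Python model (all names hypothetical; this is not PostgreSQL code). It assumes only that an index scan can hand back matches one at a time, while a bitmap scan must collect every matching entry before returning the first. An anti-join such as "words from this list that appear in no document" needs just one match per word to rule it out, so the lazy scan can stop early. The counter of posting entries read is a crude stand-in for startup cost; it ignores page I/O and GIN's pending list entirely.

```python
from typing import Callable, Dict, Iterator, List, Tuple

# Toy model (hypothetical, not PostgreSQL internals): each word maps to a
# posting list of document ids, loosely standing in for GIN entries.
Index = Dict[str, List[int]]
Counter = List[int]  # one-element list used as a mutable counter


def index_scan(index: Index, word: str, examined: Counter) -> Iterator[int]:
    """Yield matches one at a time, so the caller can stop after the first."""
    for doc_id in index.get(word, []):
        examined[0] += 1  # posting entries actually read
        yield doc_id


def bitmap_scan(index: Index, word: str, examined: Counter) -> Iterator[int]:
    """Read every match into a set (the 'bitmap') before returning any."""
    bitmap = set()
    for doc_id in index.get(word, []):
        examined[0] += 1
        bitmap.add(doc_id)
    return iter(sorted(bitmap))


Scan = Callable[[Index, str, Counter], Iterator[int]]


def anti_join(words: List[str], index: Index, scan: Scan) -> Tuple[List[str], int]:
    """Return words found in no document, plus total posting entries read.

    An anti-join only needs to know whether at least one match exists,
    so it asks the scan for its first result and nothing more.
    """
    examined: Counter = [0]
    missing = []
    for word in words:
        if next(scan(index, word, examined), None) is None:
            missing.append(word)
    return missing, examined[0]


if __name__ == "__main__":
    index = {"gin": list(range(1000)), "gist": [3, 7]}
    words = ["gin", "gist", "btree"]
    print(anti_join(words, index, index_scan))   # (['btree'], 2)
    print(anti_join(words, index, bitmap_scan))  # (['btree'], 1002)
```

Both strategies return the same answer; only the amount of work done before the first match differs, and that gap grows with posting-list length. The numbers here illustrate the mechanism only and are not a measurement of GIN.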
So far we've ruled out using the planner to prevent index scans when the pending list is long (because it's not reliable) and cleaning up the pending list during insert when needed (because it won't work with Hot Standby). We haven't decided what WILL work, apart from ripping out index scans altogether, so to some degree we're comparing against an unknown.

>> I wonder how many people really use GIN with non-bitmap scans for some
>> benefit? And even if the benefit exists, does the planner have a way to
>> identify those cases reliably, or does it have to be done manually?
>
> A relevant point there is that most of the estimator functions for
> GIN-amenable operators are just smoke and mirrors; so if the planner
> is making a good choice between indexscan and bitmapscan at all, it's
> mostly luck.  This might get better someday, but not in 8.4.

Based on the limited testing I've done thus far, it appears to pick an index scan for small numbers of rows and a bitmap index scan for larger numbers of rows. Index scans have lower startup costs, which can be valuable if you only need to scan part of the index (as in the semi-join and anti-join cases). I haven't done enough testing to see whether there is any benefit when scanning the whole index and returning only a few tuples.

...Robert