Обсуждение: [HACKERS] Index usage for elem-contained-by-const-range clauses
The topic has been previously discussed[0] on the -performance mailing list, about four years ago. In that thread, Tom suggested[0] the planner could be made to "expand "intcol <@ 'x,y'::int4range" into "intcol between x and y", using something similar to the index LIKE optimization (ie, the "special operator" stuff in indxpath.c)". This patch tries to do exactly that. It's not tied to any specific datatype, and has been tested with both builtin types and custom range types. Most of the checking for proper datatypes, operators, and btree index happens before this code, so I haven't run into any issues yet in my testing. But I'm not familiar enough with the internals to be able to confidently say it can handle all cases just yet. [0]: https://www.postgresql.org/message-id/flat/9860.1364013108%40sss.pgh.pa.us#9860.1364013108@sss.pgh.pa.us -- #!/usr/bin/env regards Chhatoi Pritam Baral -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
On Thu, Feb 23, 2017 at 4:47 AM, Pritam Baral <pritam@pritambaral.com> wrote: > The topic has been previously discussed[0] on the -performance mailing list, > about four years ago. > > In that thread, Tom suggested[0] the planner could be made to "expand > "intcol <@ > 'x,y'::int4range" into "intcol between x and y", using something similar > to the > index LIKE optimization (ie, the "special operator" stuff in indxpath.c)". > > This patch tries to do exactly that. Please add your patch to https://commitfest.postgresql.org/ so it doesn't get overlooked. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi, Pritam!
I've assigned to review this patch.
On Thu, Feb 23, 2017 at 2:17 AM, Pritam Baral <pritam@pritambaral.com> wrote:
The topic has been previously discussed[0] on the -performance mailing list,
about four years ago.
In that thread, Tom suggested[0] the planner could be made to "expand
"intcol <@
'x,y'::int4range" into "intcol between x and y", using something similar
to the
index LIKE optimization (ie, the "special operator" stuff in indxpath.c)".
That's cool idea. But I would say more. Sometimes it's useful to transform "intcol between x and y" into "intcol <@ 'x,y'::int4range". btree_gin supports "intcol between x and y" as overlap of "intcol >= x" and "intcol <= y". That is very inefficient. But it this clause would be transformed into "intcol <@ 'x,y'::int4range", btree_gin could handle this very efficient.
This patch tries to do exactly that. It's not tied to any specific datatype,
and has been tested with both builtin types and custom range types. Most
of the
checking for proper datatypes, operators, and btree index happens before
this
code, so I haven't run into any issues yet in my testing. But I'm not
familiar
enough with the internals to be able to confidently say it can handle
all cases
just yet.
I've tried this patch. It applies cleanly, but doesn't compile.
indxpath.c:4252:1: error: conflicting types for 'range_elem_contained_quals'
range_elem_contained_quals(Node *leftop, Datum rightop)
^
indxpath.c:192:14: note: previous declaration is here
static List *range_elem_contained_quals(Node *leftop, Oid expr_op, Oid opfamily,
^
Could you please recheck that you published right version of patch?
Also, I noticed that patch haven't regression tests. Some mention of this optimization in docs is also nice to have.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 3/10/17 8:29 AM, Alexander Korotkov wrote: > That's cool idea. But I would say more. Sometimes it's useful to > transform "intcol between x and y" into "intcol <@ 'x,y'::int4range". > btree_gin supports "intcol between x and y" as overlap of "intcol >= x" > and "intcol <= y". That is very inefficient. But it this clause would > be transformed into "intcol <@ 'x,y'::int4range", btree_gin could handle > this very efficient. That's certainly be nice as well, but IMHO it's outside the scope of this patch to accomplish that. BTW, while we're wishing for things... Something else that would be nice is if there was a way to do these kind of transforms without hacking the backend... > Also, I noticed that patch haven't regression tests. BTW, those tests need to pay special attention to inclusive vs exclusive bounds. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
Pritam Baral <pritam@pritambaral.com> writes: > The topic has been previously discussed[0] on the -performance mailing list, > about four years ago. > In that thread, Tom suggested[0] the planner could be made to "expand > "intcol <@ > 'x,y'::int4range" into "intcol between x and y", using something similar > to the > index LIKE optimization (ie, the "special operator" stuff in indxpath.c)". > This patch tries to do exactly that. I took a quick look through this, and have some thoughts --- * In match_special_index_operator, there are two switch statements and you've added a clause to only one of them. In the second one, you need to add a check that you're working with a btree index. I'd expect the patch as-is to fall over if an "indexkey <@ range" clause were matched to a hash index. * You're failing to account for the case of "range @> indexkey", which really this ought to work for. That would take a bit of fooling around with the structure of match_special_index_operator to allow indexkey on the right, but we've foreseen since the beginning that that would someday be necessary. Looks like today is that day. * The first bit in range_elem_contained_quals will fall over for an indexkey that is an expression rather than a simple Var. Probably you should just be using exprType() instead. (Not sure about whether that's sufficient in domain cases, though.) Or actually, why look at that at all? Seems like what you want is to look at the RHS input, ie the range value, and get the relevant element datatype from it. That would correspond to what will happen if the <@ operator executes normally: elem_contained_by_range does not consult the type of its LHS. * The "return NIL" for an empty range looks pretty dubious. Even if it fails to fail altogether, it's not doing what we really want, which is to signal that the condition cannot succeed so we needn't search the index. Maybe what we should do in that case is generate an "indexkey = NULL" qual. * Likewise, if the range is infinite, you're just returning NIL and that's leaving something on the table. Probably worth generating "indexkey IS NOT NULL" in that case; it's not going to help much in typical usage, but it would prevent scanning nulls if there are a lot of them in the index. * elog(ERROR) doesn't return, so stuff like this is not idiomatic: + if (opr2oid == InvalidOid) + { + elog(ERROR, "no <= operator for opfamily %u", opfamily); + return NIL; + } It'd be sufficient to do + if (opr2oid == InvalidOid) + elog(ERROR, "no <= operator for opfamily %u", opfamily); * You're not bothering to insert any inputcollid into the generated comparison operator nodes. I'm not sure why that fails to fall over for text comparisons (if indeed it does fail ...) but it's wrong. Use the range type's collation there. * It's sort of annoying that the whole thing only works for a Const range value. A different approach you might think about is to make this work more like ScalarArrayOp, ie we pass the qual through to btree as-is and teach relevant functions in access/nbtree/ how to extract index bound conditions from the range datum at runtime. That would likely end up being a significantly larger patch, though, so you might reasonably conclude it's not worth the effort. regards, tom lane
I wrote: > * You're not bothering to insert any inputcollid into the generated > comparison operator nodes. I'm not sure why that fails to fall over > for text comparisons (if indeed it does fail ...) but it's wrong. > Use the range type's collation there. Oh ... looking at this again, I realize that there's an additional validity check missing: if the range type's collation doesn't match the index column's collation, we can't do this optimization at all. That check probably belongs in match_special_index_operator. regards, tom lane
On Sunday 12 March 2017 01:58 AM, Jim Nasby wrote: > On 3/10/17 8:29 AM, Alexander Korotkov wrote: >> That's cool idea. But I would say more. Sometimes it's useful to >>transform "intcol between x and y" into "intcol <@ 'x,y'::int4range". >> btree_gin supports "intcol between x and y" asoverlap of "intcol >= x" >> and "intcol <= y". That is very inefficient. But it this clause would >> be transformed into"intcol <@ 'x,y'::int4range", btree_gin could handle >> this very efficient. > > That's certainly be nice as well, butIMHO it's outside the scope of this patch to accomplish that. Also, I think btree indexes are more common than btree_gin. The motivation for this originally came from trying to use the primary key of a large table in a range search, and the primary key index was the default btree. Also, this is my first deep dive into Postgres's source code, so I took a few easy ways out, just to get started. If it's not too complex to get btree_gin to handle between queries as contained-in-range, I can give it a try. > > BTW, while we're wishing for things... Something else that would be nice is if there was a way to do these kind of transformswithout hacking the backend... Indeed. And this was one of the things Tom said back when a similar discussion had happened (on the -performance mailing list). But seeing as how it's been almost four years since then, I decided to go ahead with the backend hacking anyway. > >> Also, I noticed that patch haven't regression tests. > > BTW, those tests need to pay special attention to inclusivevs exclusive bounds. I will add regression tests, though I do have to get through all of Tom's suggestions elsewhere in this thread first. -- #!/usr/bin/env regards Chhatoi Pritam Baral
On Friday 10 March 2017 07:59 PM, Alexander Korotkov wrote: > Hi, Pritam! > > I've assigned to review this patch. > > On Thu, Feb 23, 2017 at 2:17 AM, Pritam Baral <pritam@pritambaral.com>wrote: > > The topic has been previously discussed[0] on the -performance mailing list, > about four years ago. > > In that thread, Tom suggested[0] the planner could be made to "expand > "intcol <@ > 'x,y'::int4range" into "intcol between x and y", using something similar > to the > index LIKE optimization (ie,the "special operator" stuff in indxpath.c)". > > > That's cool idea. But I would say more. Sometimes it's useful totransform "intcol between x and y" into "intcol <@ 'x,y'::int4range". btree_gin supports "intcol between x and y" as overlapof "intcol >= x" and "intcol <= y". That is very inefficient. But it this clause would be transformed into "intcol<@ 'x,y'::int4range", btree_gin could handle this very efficient. > > > > This patch tries to do exactly that.It's not tied to any specific datatype, > and has been tested with both builtin types and custom range types. Most > of the > checking for proper datatypes, operators,and btree index happens before > this > code, so I haven't run into any issues yet in my testing. But I'mnot > familiar > enough with the internals to be able to confidently say it can handle > all cases > justyet. > > > I've tried this patch. It applies cleanly, but doesn't compile. > > indxpath.c:4252:1: error: conflictingtypes for 'range_elem_contained_quals' > range_elem_contained_quals(Node *leftop, Datum rightop) > ^ > indxpath.c:192:14:note: previous declaration is here > static List *range_elem_contained_quals(Node *leftop, Oid expr_op,Oid opfamily, > ^ > Could you please recheck that you published right version of patch? So sorry. I'm attaching the correct version of the original with this, in case you want to test the limited implementation, because I still have to go through Tom's list of suggestions. BTW, the patch is for applying on top of REL9_6_2, and while I suspect it may work on master too, I haven't tested it since the original submission (Feb 23). > Also, I noticed that patch haven't regression tests. Some mention of this optimization in docs is also nice to have. > > ------ > Alexander Korotkov > Postgres Professional: http://www.postgrespro.com > The Russian Postgres Company -- #!/usr/bin/env regards Chhatoi Pritam Baral -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
Hi Pritam, On 3/17/17 5:41 PM, Pritam Baral wrote: > > So sorry. I'm attaching the correct version of the original with this, > in case you want to test the limited implementation, because I still > have to go through Tom's list of suggestions. > > BTW, the patch is for applying on top of REL9_6_2, and while I > suspect it may work on master too, I haven't tested it since the > original submission (Feb 23). > >> Also, I noticed that patch haven't regression tests. Some mention of >> this optimization in docs is also nice to have. > > ------ > >> Alexander Korotkov > Postgres Professional: http://www.postgrespro.com >> > The Russian Postgres Company This thread has been idle for a week. Please respond and/or post a new patch by 2017-03-28 00:00 AoE (UTC-12) or this submission will be marked "Returned with Feedback". Thanks, -- -David david@pgmasters.net
On Sat, Mar 18, 2017 at 12:41 AM, Pritam Baral <pritam@pritambaral.com> wrote:
On Friday 10 March 2017 07:59 PM, Alexander Korotkov wrote:Hi, Pritam! > > I've assigned to review this patch. > > On Thu, Feb 23, 2017 at 2:17 AM, Pritam Baral <pritam@pritambaral.com> wrote: > > The topic has been previously discussed[0] on the -performance mailing list, > about four years ago. > > In that thread, Tom suggested[0] the planner could be made to "expand > "intcol <@ > 'x,y'::int4range" into "intcol between x and y", using something similar > to the > index LIKE optimization (ie, the "special operator" stuff in indxpath.c)". > > > That's cool idea. But I would say more. Sometimes it's useful to transform "intcol between x and y" into "intcol <@ 'x,y'::int4range". btree_gin supports "intcol between x and y" as overlap of "intcol >= x" and "intcol <= y". That is very inefficient. But it this clause would be transformed into "intcol <@ 'x,y'::int4range", btree_gin could handle this very efficient. > > > > This patch tries to do exactly that. It's not tied to any specific datatype, > and hasbeen tested with both builtin types and custom range types. Most > of the > checking for proper datatypes, operators, and btree index happens before > this > code, so I haven't run into any issues yet in my testing. But I'm not > familiar > enough with the internals to be able to confidently say it can handle > all cases > just yet. > > > I've tried this patch. It applies cleanly, but doesn't compile. > > indxpath.c:4252:1: error: conflicting types for 'range_elem_contained_quals' > range_elem_contained_quals(Node *leftop, Datum rightop) > ^ > indxpath.c:192:14: note: previous declaration is here > static List *range_elem_contained_quals(No de *leftop, Oid expr_op, Oid opfamily, > ^ > Could you please recheck that you published right version of patch?
So sorry. I'm attaching the correct version of the original with this,
in case you want to test the limited implementation, because I still
have to go through Tom's list of suggestions.
BTW, the patch is for applying on top of REL9_6_2, and while I
suspect it may work on master too, I haven't tested it since the
original submission (Feb 23).
What is idea behind basing patch on the REL9_6_2?
This patch implements new functionality and it's definitely not going to be considered to be committed to stable release branches.
If you are interesting in committing this patch to master, please rebase it on master branch. If not, please clarify the purpose of this submission.
Also, please include some numbering to the patch name, so that we could distinguish one version of patch from another.
The Russian Postgres Company
On 3/24/17 10:50 AM, David Steele wrote: > Hi Pritam, > > On 3/17/17 5:41 PM, Pritam Baral wrote: >> >> So sorry. I'm attaching the correct version of the original with this, >> in case you want to test the limited implementation, because I still >> have to go through Tom's list of suggestions. >> >> BTW, the patch is for applying on top of REL9_6_2, and while I >> suspect it may work on master too, I haven't tested it since the >> original submission (Feb 23). >> >>> Also, I noticed that patch haven't regression tests. Some mention of >>> this optimization in docs is also nice to have. > > ------ > >>> Alexander Korotkov > Postgres Professional: http://www.postgrespro.com >>> > The Russian Postgres Company > > This thread has been idle for a week. Please respond and/or post a new > patch by 2017-03-28 00:00 AoE (UTC-12) or this submission will be marked > "Returned with Feedback". This submission has been marked "Returned with Feedback". Please feel free to resubmit to a future commitfest. Regards, -- -David david@pgmasters.net