Обсуждение: Generalizing SortSupport for text to work with char(n), bytea, and alternative opclasses
Generalizing SortSupport for text to work with char(n), bytea, and alternative opclasses
От
Peter Geoghegan
Дата:
We lack SortSupport for many character-like-type cases. In full, the cases within the core system are: * char(n) opfamily (bpchar_ops). * text_pattern_ops opfamily (includes text and varchar "pattern" opclasses, which are generally recommended for accelerating LIKE operator queries). * bpchar_pattern_ops -- the operator family/class for char(n), used where "pattern" style indexing is required for the char(n) type. * bytea default opclass. This is a type that, like the others, shares its representation with text (a varlena header and some data bytes -- a string, essentially). Its comparator already behaves identically to that of the text comparator when the "C" collation is used. I've actually seen a specific request for this [1]. These cases do matter. For one thing, even if they're less important than the default text/varchar opclasses, having such large inconsistencies in character type sort performance is a fairly major POLA violation; opclasses like text_pattern_ops are *supposed* to be faster though less capable alternatives to the defaults. For another, char(n) is in the SQL standard, which may be why all TPC benchmarks use char(n) for columns that are sorted on or used for grouping. char(n) sorting can be made about 8x faster with SortSupport/abbreviation, making it the best candidate for abbreviation optimization that I've ever seen. It would be regrettable if we accidentally lost a benchmark due to not having char(n) SortSupport. Attached patch adds SortSupport for all of the cases listed above. I did not bother doing anything with contrib/citext, because I think it's not worth it. I think that we should definitely invest in case insensitive collations, and retire contrib/citext. The fact that the *default* collation (and not the input collation) is passed by citextcmp()'s call to str_tolower() suggests to me that the only way to make citext do the right thing is to basically introduce case insensitive collations to Postgres. Of course, doing so would make contrib/citext obsolete. I also didn't bother extending the name type's SortSupport (something that has existed since 9.2) to perform abbreviation, although that wouldn't be very hard. I saw no point. I think I might also get around to adding abbreviated key support for the network address types during the 9.6 cycle. That seems like something a less experienced contributor could easily pick up, though -- if anyone wants to take it off my hands, please do so. [1] https://lwn.net/Articles/653721/ -- Peter Geoghegan
Вложения
Re: Generalizing SortSupport for text to work with char(n), bytea, and alternative opclasses
От
Peter Geoghegan
Дата:
On Thu, Nov 12, 2015 at 4:38 PM, Peter Geoghegan <pg@heroku.com> wrote: > * bytea default opclass. This is a type that, like the others, shares > its representation with text (a varlena header and some data bytes -- > a string, essentially). Its comparator already behaves identically to > that of the text comparator when the "C" collation is used. I've > actually seen a specific request for this [1]. I probably should have given an explanation for why it's okay that NUL bytes can exist in strings from the point of view of the generalized SortSupport for text worker function. The next revision will have comments along those lines. -- Peter Geoghegan
Re: Re: Generalizing SortSupport for text to work with char(n), bytea, and alternative opclasses
От
Alvaro Herrera
Дата:
Peter Geoghegan wrote: > On Thu, Nov 12, 2015 at 4:38 PM, Peter Geoghegan <pg@heroku.com> wrote: > > * bytea default opclass. This is a type that, like the others, shares > > its representation with text (a varlena header and some data bytes -- > > a string, essentially). Its comparator already behaves identically to > > that of the text comparator when the "C" collation is used. I've > > actually seen a specific request for this [1]. > > I probably should have given an explanation for why it's okay that NUL > bytes can exist in strings from the point of view of the generalized > SortSupport for text worker function. The next revision will have > comments along those lines. Did you ever post "the next revision"? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Generalizing SortSupport for text to work with char(n), bytea, and alternative opclasses
От
Alvaro Herrera
Дата:
Peter Geoghegan wrote: > We lack SortSupport for many character-like-type cases. In full, the > cases within the core system are: You're stealthily introducing a new abstraction called "string", including a typedef and DatumGetString support macros. Is that really necessary? Shouldn't it be discussed specifically? I don't necessarily oppose it as is, mainly because it's limited to within varlena.c for now, but I'm not sure it'd okay to make it more public. Also, there's a lot of churn in this patch to just remove tss to sss. Can't we just keep the original variable name? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Generalizing SortSupport for text to work with char(n), bytea, and alternative opclasses
От
Peter Geoghegan
Дата:
On Thu, Jan 7, 2016 at 7:41 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > You're stealthily introducing a new abstraction called "string", > including a typedef and DatumGetString support macros. Is that really > necessary? Shouldn't it be discussed specifically? I don't necessarily > oppose it as is, mainly because it's limited to within varlena.c for > now, but I'm not sure it'd okay to make it more public. Note that a similar abstraction for the "unknown" type also exists only within varlena.c. So, DatumGetStringP() and so on appear right alongside DatumGetUnknownP() and so on. The idea of the "string" abstraction is that is advertises that certain functions could equally well apply to a variety of "varlena header + some bytes" types. I thought about just using the varlena type instead, but preferred the "string" abstraction. > Also, there's a lot of churn in this patch to just remove tss to sss. > Can't we just keep the original variable name? I think that minimizing lines changed in a mechanical way by a commit is overrated as a goal for Postgres patches, but I don't feel too strongly about holding on to the "churn" in this patch. I attach a new revision, which has the changes I outlined to code comments. I haven't minimized the differences in the way you suggest just yet. -- Peter Geoghegan
Вложения
Re: Generalizing SortSupport for text to work with char(n), bytea, and alternative opclasses
От
Andreas Karlsson
Дата:
Hi, I have reviewed this now and I think this is a useful addition even though these indexes are less common. Consistent behavior is worth a lot in my mind and this patch is reasonably small. The patch no longer applies due to 1) oid collisions and 2) a trivial conflict. When I fixed those two the test suite passed. I tested this patch by building indexes with all the typess and got nice measurable speedups. Logically I think the patch makes sense and the code seems to be correct, but I have some comments on it. - You use two names a lot "string" vs "varstr". What is the difference between those? Is there any reason for not using varstr consistently? - You have a lot of renaming as has been mentioned previously in this thread. I do not care strongly for it either way, but it did make it harder to spot what the patch changed. If it was my own project I would have considered splitting the patch into two parts, one renaming everything and another adding the new feature, but the PostgreSQL seem to often prefer having one commit per feature. Do as you wish here. - I think the comment about NUL bytes in varstr_abbrev_convert makes it seem like the handling is much more complicated than it actually is. I did not understand it after a couple of readings and had to read the code understand what it was talking about. Nice work, I like your sorting patches. Andreas
On Sun, Jan 31, 2016 at 10:59 PM, Andreas Karlsson <andreas@proxel.se> wrote: > I have reviewed this now and I think this is a useful addition even though > these indexes are less common. Consistent behavior is worth a lot in my mind > and this patch is reasonably small. > > The patch no longer applies due to 1) oid collisions and 2) a trivial > conflict. When I fixed those two the test suite passed. > > I tested this patch by building indexes with all the typess and got nice > measurable speedups. > > Logically I think the patch makes sense and the code seems to be correct, > but I have some comments on it. > > - You use two names a lot "string" vs "varstr". What is the difference > between those? Is there any reason for not using varstr consistently? > > - You have a lot of renaming as has been mentioned previously in this > thread. I do not care strongly for it either way, but it did make it harder > to spot what the patch changed. If it was my own project I would have > considered splitting the patch into two parts, one renaming everything and > another adding the new feature, but the PostgreSQL seem to often prefer > having one commit per feature. Do as you wish here. > > - I think the comment about NUL bytes in varstr_abbrev_convert makes it seem > like the handling is much more complicated than it actually is. I did not > understand it after a couple of readings and had to read the code understand > what it was talking about. > > Nice work, I like your sorting patches. Thanks for the review. I fixed the OID conflict, tweaked a few comments, and committed this. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Generalizing SortSupport for text to work with char(n), bytea, and alternative opclasses
От
Peter Geoghegan
Дата:
On Sun, Jan 31, 2016 at 7:59 PM, Andreas Karlsson <andreas@proxel.se> wrote: > Nice work, I like your sorting patches. Thanks. I like your reviews of my sorting patches. :-) -- Peter Geoghegan
Re: Generalizing SortSupport for text to work with char(n), bytea, and alternative opclasses
От
Peter Geoghegan
Дата:
On Wed, Feb 3, 2016 at 11:31 AM, Robert Haas <robertmhaas@gmail.com> wrote: > Thanks for the review. I fixed the OID conflict, tweaked a few > comments, and committed this. Thanks. I noticed a tiny, preexisting typo in the string abbreviated key code. The attached patch fixes it. -- Peter Geoghegan
Вложения
On Fri, Feb 5, 2016 at 6:14 AM, Peter Geoghegan <pg@heroku.com> wrote: > On Wed, Feb 3, 2016 at 11:31 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> Thanks for the review. I fixed the OID conflict, tweaked a few >> comments, and committed this. > > Thanks. I noticed a tiny, preexisting typo in the string abbreviated > key code. The attached patch fixes it. Gosh, I must have looked at that line 10 times without seeing that mistake. Thanks, committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company