Обсуждение: text_pattern_ops and complex regexps
Greetings, I've run into an annoying issue which I would think could be handled better. Basically, indexes using text_pattern_opsdon't work with some complex regexps even when they (imv anyway) could. I'm willing to believe I'm wrongabout the potential to use them, or that my regexp is wrong, but I don't see it. Test case: create table text_test (name text); insert into text_test values ('North'); insert into text_test values ('North West');create index text_test_name_idx on text_test using btree (name text_pattern_ops); set enable_seqscan = false; -- justto show the test -- works fine explain analyze select * from text_test where name ~ '^(North)'; -- works fine explainanalyze select * from text_test where name ~ '^(North)( West)'; -- doesn't work explain analyze select * from text_testwhere name ~ '^(North)(| West)'; Results: CREATE TABLE INSERT 0 1 INSERT 0 1 CREATE INDEX SET QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------Index Scanusing text_test_name_idx on text_test (cost=0.00..8.27 rows=1 width=32) (actual time=0.071..0.077 rows=2 loops=1) IndexCond: ((name ~>=~ 'North'::text) AND (name ~<~ 'Norti'::text)) Filter: (name ~ '^(North)'::text)Total runtime: 0.121ms (4 rows) QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------Index Scanusing text_test_name_idx on text_test (cost=0.00..8.27 rows=1 width=32) (actual time=0.176..0.178 rows=1 loops=1) IndexCond: ((name ~>=~ 'North'::text) AND (name ~<~ 'Norti'::text)) Filter: (name ~ '^(North)( West)'::text)Total runtime:0.209 ms (4 rows) QUERY PLAN --------------------------------------------------------------------------------------------------------------------Seq Scanon text_test (cost=100000000.00..100000001.03 rows=1 width=32) (actual time=0.013..0.019 rows=2 loops=1) Filter: (name~ '^(North)(| West)'::text)Total runtime: 0.045 ms (3 rows) I don't see why the last case can't use the index. Obviously, for this example case, doing a Seq Scan is fine but withthe real data set there are cases where an index could help. Any help would be greatly appreciated. Thanks, Stephen
Stephen Frost <sfrost@snowman.net> writes: > I don't see why the last case can't use the index. The planner's understanding of regexps is far weaker than yours. (In particular, I think it's set up to abandon optimization if it sees | anywhere.) regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > Stephen Frost <sfrost@snowman.net> writes: > > I don't see why the last case can't use the index. > > The planner's understanding of regexps is far weaker than yours. > > (In particular, I think it's set up to abandon optimization if it > sees | anywhere.) That's kind of what I figured from the empirical data. My hope was that it might be something which could be fixed. Is this entirely the planner's doing (eg: PG code)? Perhaps this is misguided but I would think that the regexp libraries might have some support for "give me all anchored required text for this regexp" which we could then use in the planner. Certainly in an ideal world we wouldn't have to teach the planner the knowledge that the regexp libraries include for this. Thoughts? Thanks, Stephen
Stephen Frost <sfrost@snowman.net> writes: > * Tom Lane (tgl@sss.pgh.pa.us) wrote: >> (In particular, I think it's set up to abandon optimization if it >> sees | anywhere.) > That's kind of what I figured from the empirical data. My hope was that > it might be something which could be fixed. See regex_fixed_prefix(), but it's a pretty hard problem without writing a complete regex parser. > Perhaps this is misguided but I would > think that the regexp libraries might have some support for "give me all > anchored required text for this regexp" which we could then use in the > planner. I wouldn't see why. It's certainly worth considering to hand the pattern to the regex engine and then burrow into the data structure it builds; but right now we consider that structure to be entirely private to backend/regex/. There's also the problem that we'd have no easy way to determine how much the result depends on the current regex flavor setting. There are some cases now where regex_fixed_prefix deliberately omits possible optimizations because of uncertainty about the flavor. regards, tom lane
Tom Lane wrote: > Stephen Frost <sfrost@snowman.net> writes: > > Perhaps this is misguided but I would think that the regexp > > libraries might have some support for "give me all anchored required > > text for this regexp" which we could then use in the planner. > > I wouldn't see why. It's certainly worth considering to hand the > pattern to the regex engine and then burrow into the data structure it > builds; but right now we consider that structure to be entirely private > to backend/regex/. There's also the problem that we'd have no easy > way to determine how much the result depends on the current regex flavor > setting. There are some cases now where regex_fixed_prefix deliberately > omits possible optimizations because of uncertainty about the flavor. I think changeable regex flavors turned out to be a bad idea. They can wreak all sorts of havoc. You change the setting, SIGHUP, and suddenly your application fails to work as expected. Maybe we should make that setting PGC_POSTMASTER (or just get rid of it?), and provide was to pass flags to change the flavor for particular operations (this is easy for function-based stuff but not so easy for operators). That way it doesn't intrude in stuff like cached plans and so on. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes: > I think changeable regex flavors turned out to be a bad idea. They can > wreak all sorts of havoc. You change the setting, SIGHUP, and suddenly > your application fails to work as expected. Maybe we should make that > setting PGC_POSTMASTER (or just get rid of it?), and provide was to pass > flags to change the flavor for particular operations (this is easy for > function-based stuff but not so easy for operators). That way it > doesn't intrude in stuff like cached plans and so on. Maybe so. I think it was originally intended mostly as a backwards-compatibility measure when we added the support for ARE flavor. It's pretty likely that no one changes the flavor setting in practice anymore. If we just locked it down as "advanced always" then we could simplify the documentation by a measurable amount ... regards, tom lane
On Wed, May 06, 2009 at 12:10:49PM -0400, Alvaro Herrera wrote: > Tom Lane wrote: > > Stephen Frost <sfrost@snowman.net> writes: > > > > Perhaps this is misguided but I would think that the regexp > > > libraries might have some support for "give me all anchored > > > required text for this regexp" which we could then use in the > > > planner. > > > > I wouldn't see why. It's certainly worth considering to hand the > > pattern to the regex engine and then burrow into the data > > structure it builds; but right now we consider that structure to > > be entirely private to backend/regex/. There's also the problem > > that we'd have no easy way to determine how much the result > > depends on the current regex flavor setting. There are some cases > > now where regex_fixed_prefix deliberately omits possible > > optimizations because of uncertainty about the flavor. > > I think changeable regex flavors turned out to be a bad idea. +1 Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Tom Lane wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: >> I think changeable regex flavors turned out to be a bad idea. They can >> wreak all sorts of havoc. You change the setting, SIGHUP, and suddenly >> your application fails to work as expected. Maybe we should make that >> setting PGC_POSTMASTER (or just get rid of it?), and provide was to pass >> flags to change the flavor for particular operations (this is easy for >> function-based stuff but not so easy for operators). That way it >> doesn't intrude in stuff like cached plans and so on. > > Maybe so. I think it was originally intended mostly as a > backwards-compatibility measure when we added the support for ARE > flavor. It's pretty likely that no one changes the flavor setting > in practice anymore. If we just locked it down as "advanced always" > then we could simplify the documentation by a measurable amount ... yeah I don't recall a single incident in the last few years that required playing with the regex flavours.... Stefan
Alvaro Herrera <alvherre@commandprompt.com> writes: > ... Maybe we should make that > setting PGC_POSTMASTER (or just get rid of it?), and provide was to pass > flags to change the flavor for particular operations (this is easy for > function-based stuff but not so easy for operators). BTW, if you are putting it on the application to use some other syntax to get at the old flavors, then there already is an adequate feature built into the regex library: a pattern beginning with (?b) or (?e) will be taken as a BRE or ERE respectively, cf. table 9-19 in current docs. So I don't see any value in inventing something additional. The only reason for regex_flavor to exist is to satisfy applications that were written to expect the pre-7.4 regex syntax to work as-is. If we think there aren't any of those anymore, let's just kill the GUC and be done with it. regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > If we think there aren't any of those anymore, let's just kill the > GUC and be done with it. +1. I'll try to spend some time in backend/regexp and regex_fixed_prefix soon. Thanks, Stephen
Tom Lane wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: > >> I think changeable regex flavors turned out to be a bad idea. They can >> wreak all sorts of havoc. You change the setting, SIGHUP, and suddenly >> your application fails to work as expected. Maybe we should make that >> setting PGC_POSTMASTER (or just get rid of it?), and provide was to pass >> flags to change the flavor for particular operations (this is easy for >> function-based stuff but not so easy for operators). That way it >> doesn't intrude in stuff like cached plans and so on. >> > > Maybe so. I think it was originally intended mostly as a > backwards-compatibility measure when we added the support for ARE > flavor. It's pretty likely that no one changes the flavor setting > in practice anymore. If we just locked it down as "advanced always" > then we could simplify the documentation by a measurable amount ... > > > I know of at least one significant client (OpenACS) that still apparently requires extended flavor. Removing the compatibility option would be a major pain point for some of my clients. PGC_POSTMASTER would be fine, though. cheers andrew
On Wed, 2009-05-06 at 15:55 -0400, Andrew Dunstan wrote: > I know of at least one significant client (OpenACS) that still > apparently requires extended flavor. Removing the compatibility option > would be a major pain point for some of my clients. PGC_POSTMASTER would > be fine, though. Isn't that why we wouldn't remove it from back releases? Joshua D. Drake > > cheers > > andrew > -- PostgreSQL - XMPP: jdrake@jabber.postgresql.org Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997
Joshua D. Drake wrote: > On Wed, 2009-05-06 at 15:55 -0400, Andrew Dunstan wrote: > > >> I know of at least one significant client (OpenACS) that still >> apparently requires extended flavor. Removing the compatibility option >> would be a major pain point for some of my clients. PGC_POSTMASTER would >> be fine, though. >> > > Isn't that why we wouldn't remove it from back releases? > > My clients aren't going to be very happy if they can't upgrade because of this. cheers andrew
On Wed, 2009-05-06 at 16:10 -0400, Andrew Dunstan wrote: > > Isn't that why we wouldn't remove it from back releases? > > > > > > My clients aren't going to be very happy if they can't upgrade because > of this. Certainly. Nobody wants to make clients unhappy but for the good of the code man, for the good of the code :). 8.3 will be supported for a very long time to come. Sincerely, Joshua D. Drake -- PostgreSQL - XMPP: jdrake@jabber.postgresql.org Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997
Andrew Dunstan <andrew@dunslane.net> writes: > I know of at least one significant client (OpenACS) that still > apparently requires extended flavor. Is this demonstrable, or just speculation? The incompatibilities between ARE mode and (legal) ERE patterns are pretty darn small. regards, tom lane
Alvaro Herrera <alvherre@commandprompt.com> writes: > ... Maybe we should make that > setting PGC_POSTMASTER (or just get rid of it?), Another thought here: if we do get persuaded that the regex_flavor GUC has to stay, we could eliminate it as a hazard for planning by changing its scope to PGC_BACKEND. That would be much less restrictive than PGC_POSTMASTER; for instance it'd still work to set it for a particular application via ALTER ROLE. regards, tom lane
Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: > >> I know of at least one significant client (OpenACS) that still >> apparently requires extended flavor. >> > > Is this demonstrable, or just speculation? The incompatibilities > between ARE mode and (legal) ERE patterns are pretty darn small. > > > It's explicitly documented. Whether or not there is a good basis for the documentation I can't yet say. If that's going to be influential I will dig deeper. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > Tom Lane wrote: >> Is this demonstrable, or just speculation? The incompatibilities >> between ARE mode and (legal) ERE patterns are pretty darn small. > It's explicitly documented. Whether or not there is a good basis for the > documentation I can't yet say. If that's going to be influential I will > dig deeper. Our fine manual asserts that the only such incompatibility is that inside square brackets (character alternative lists), ERE mode takes backslash as an ordinary character while ARE mode thinks it begins an escape. Given the additional capabilities afforded by the latter interpretation, and that every other modern regex engine on the planet takes the latter approach *without* giving you any option, it doesn't seem too unreasonable to ask OpenACS to join the twenty-first century ... regards, tom lane