Обсуждение: Regular Expression in SQL
I recall not long ago a discussion about regular expressions in a query that hit on this exact topic but don't think it wasever resolved so I am giving it a go again...<br /><br />Here is my query (keep in mind that I am just experimenting nowso don't worry about the fact that I am using nested substring function calls): <br /><br />SELECT<br /> referrer,<br/> substring(referrer FROM '^([^\\/]*\\/\\/[^\\/]*)(\\/)?'),<br /> substring(referrer FROM '^[^\\/]*\\/\\/www\\.google\\.[^\\/]*\\/[^\\?]*\\?(.*)$'),<br/> substring(substring(referrer FROM '^[^\\/]*\\/\\/www\\.google\\.[^\\/]*\\/[^\\?]*\\?(.*)$')FROM '((%&q=)|(q=))#"[^&]*#"((&%)|())' FOR '#') <br/>FROM one_hour_air.web_page_view<br />WHERE referrer ~ '^[^\\/]*\\/\\/(www.google\\.[^\\/]*)\\/'<br /><br />What I getis:<br /><br /><div style="margin-left: 40px;">referrer <br />substring<br />substring_1<br />substring_2<br /><br /><ahref="http://www.google.ca/search?q=one+hour+heating&hl=en"> http://www.google.ca/search?q=one+hour+heating&hl=en</a><br/><a href="http://www.google.ca">http://www.google.ca</a><br/>q=one+hour+heating&hl=en<br />q=<br /><br /><a href="http://www.google.com/search?hl=en&q=One+hour+heating+and+Air"> http://www.google.com/search?hl=en&q=One+hour+heating+and+Air</a><br/><a href="http://www.google.com">http://www.google.com</a><br/>hl=en&q=One+hour+heating+and+Air<br />hl=en&q=<br /></div><br/>What I expected for substring_2 was (respectively): <br /><br /><div style="margin-left: 40px;">one+hour+heating<br/></div><div style="margin-left: 40px;"><br /></div><div style="margin-left: 40px;">One+hour+heating+and+Air<br/></div><br />I thought by using the FOR '#' I could specify exactly what part of the expressionI would get but it still grabs the first (...) of the pattern. At least that is what the documentation in seciton9.7.2 at <a href="http://www.postgresql.org/docs/8.1/static/functions-matching.html">http://www.postgresql.org/docs/8.1/static/functions-matching.html</a> ledme to believe. How can I get the part of the string I am really after without using one nested substring after another?<br /><br />Thanks,<br />Aaron Bono<br /><br />==================================================================<br/> Aaron Bono<br /> Aranya Software Technologies,Inc.<br /> <a href="http://www.aranya.com">http://www.aranya.com </a><br />==================================================================
"Aaron Bono" <postgresql@aranya.com> writes: > I thought by using the FOR '#' I could specify exactly what part of the > expression I would get but it still grabs the first (...) of the pattern. Hmm ... I think that this is a bug in similar_escape(): it ought to transform parentheses in a SIMILAR pattern into non-capturing parentheses. Until this is fixed, your best bet is to use the POSIX-regexp form of substring(). You can't sneak non-capturing parens through similar_escape, because it'll try to escape the ? ... regards, tom lane