Discussion: Does pgsql's regex processor optimize Common-Prefix?
Hi all.

I am developing an application which searches for city names in a column. There are a lot of cities, and I have to run a LIKE for every name, which is not efficient enough. So I want to know whether pgsql's regex processor can optimize regexes such as:

    Nebraska|Nevada|North Carolina

to

    N(e(braska|vada)|orth Carolina)

If the processor can do that, like a dictionary tree (trie), it may be acceptable for me; otherwise I have to write a matcher myself.

Any suggestion is appreciated. Thank you, and I apologize for my poor English.

--Xig
Kurapica wrote:

> I am developing an application which searches for city names in a
> column. There are a lot of cities, and I have to run a LIKE for every
> name, which is not efficient enough. So I want to know whether pgsql's
> regex processor can optimize regexes such as:
>
> Nebraska|Nevada|North Carolina
> to
> N(e(braska|vada)|orth Carolina)
>
> If the processor can do that, like a dictionary tree (trie), it may be
> acceptable for me; otherwise I have to write a matcher myself.
>
> Any suggestion is appreciated. Thank you, and I apologize for my poor English.

Compared to using indexes to avoid scanning the whole table, this optimization is going to have very little impact. So don't worry about it.

--
Alvaro Herrera                 http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Kurapica,

I'd use contrib/pg_trgm for your application.

Oleg

On Tue, 26 Dec 2006, Alvaro Herrera wrote:

> Kurapica wrote:
>
>> I am developing an application which searches for city names in a
>> column. There are a lot of cities, and I have to run a LIKE for every
>> name, which is not efficient enough. So I want to know whether pgsql's
>> regex processor can optimize regexes such as:
>>
>> Nebraska|Nevada|North Carolina
>> to
>> N(e(braska|vada)|orth Carolina)
>>
>> If the processor can do that, like a dictionary tree (trie), it may be
>> acceptable for me; otherwise I have to write a matcher myself.
>>
>> Any suggestion is appreciated. Thank you, and I apologize for my poor English.
>
> Compared to using indexes to avoid scanning the whole table, this
> optimization is going to have very little impact. So don't worry about
> it.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
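As an illustration of the pg_trgm suggestion above, here is a minimal sketch of how a trigram index might be set up on a recent PostgreSQL release. The table `documents` and its column `body` are hypothetical names invented for this example, and the `CREATE EXTENSION` command and index support for `LIKE` assume PostgreSQL 9.1 or later (on releases contemporary with this thread, the module was installed from its contrib SQL script and offered only a GiST operator class).

```sql
-- Hypothetical schema, for illustration only.
CREATE EXTENSION IF NOT EXISTS pg_trgm;   -- assumes PostgreSQL 9.1 or later

CREATE TABLE documents (
    id   serial PRIMARY KEY,
    body text NOT NULL
);

-- Trigram GIN index on the searched column.
CREATE INDEX documents_body_trgm_idx
    ON documents USING gin (body gin_trgm_ops);

-- Unanchored substring searches can then be index-assisted
-- instead of forcing a sequential scan for every city name.
SELECT id
FROM documents
WHERE body LIKE '%Nebraska%'
   OR body LIKE '%Nevada%'
   OR body LIKE '%North Carolina%';

-- Fuzzy matching with the % similarity operator, also index-assisted.
SELECT id, similarity(body, 'North Carolina') AS sml
FROM documents
WHERE body % 'North Carolina'
ORDER BY sml DESC;
```

Whether the planner actually picks the index depends on table size and statistics, so it is worth checking the plan with `EXPLAIN` on real data.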
Alvaro Herrera <alvherre@commandprompt.com> writes:

> Kurapica wrote:
>> So I want to know whether pgsql's regex
>> processor can optimize regexes such as:
>> Nebraska|Nevada|North Carolina
>> to
>> N(e(braska|vada)|orth Carolina)

> Compared to the use of indexes to skip whole table scanning, this
> optimization is going to have very little impact. So don't worry about
> it.

Well, if you were able to extract a long enough common prefix to make an index optimization possible/useful, then it would have some value. But that seems unlikely. What I think would be considerably more interesting is a conversion to an OR form:

    state ~ '(^Nebraska)|(^Nevada)|(^North Carolina)'

to

    state ~ '^Nebraska' OR state ~ '^Nevada' OR state ~ '^North Carolina'

which could be planned as three separate, very-selective indexscans, unlike the rewritten version proposed above.

But Oleg's suggestion of using pg_trgm or some other full-text searching mechanism is probably at least as good, and it requires no new coding.

regards, tom lane
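The OR form described above can be written by hand today; the following is a minimal sketch, not something the planner derives automatically from the single alternation. The table `addresses` and the index name are hypothetical, introduced only for this example. The `text_pattern_ops` operator class is used on the assumption that the database does not use the C locale, in which case an ordinary B-tree index cannot serve left-anchored pattern matches.

```sql
-- Hypothetical schema, for illustration only.
CREATE TABLE addresses (
    id    serial PRIMARY KEY,
    state text NOT NULL
);

-- B-tree index usable for left-anchored patterns in non-C locales.
CREATE INDEX addresses_state_pattern_idx
    ON addresses (state text_pattern_ops);

-- Each anchored regex has a fixed prefix, so each OR arm can be
-- turned into an index range condition on its own.
EXPLAIN
SELECT id, state
FROM addresses
WHERE state ~ '^Nebraska'
   OR state ~ '^Nevada'
   OR state ~ '^North Carolina';
```

This only helps when the match is anchored at the start of the column value. For names that may appear anywhere inside a longer string, the prefix-based index does not apply, which is where a trigram or full-text approach becomes the better fit.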