Обсуждение: Number of occurrence of characters?
Hi,
I have a need to fairly often select data where the number of
occurrences of a character in the field is "x". Semantically, it's
literally "SELECT something FROM table WHERE numch('/', field)=$x".
The problem is how to do it efficiently. I see there isn't a built-in
function that counts character occurrences so I'd have to write it
myself. An additional constraint is that it must be implemented with
built-in capabilities, i.e. SQL and plpsql languages. I can do it the
brute force way, looping over the string and processing one by one
character with substring(), but is there a faster way?
Whatever the function is, I intend to create an index on it.
Вложения
On Thu, Feb 05, 2009 at 02:31:24PM +0100, Ivan Voras wrote:
> The problem is how to do it efficiently. I see there isn't a built-in
> function that counts character occurrences so I'd have to write it
> myself. An additional constraint is that it must be implemented with
> built-in capabilities, i.e. SQL and plpsql languages. I can do it the
> brute force way, looping over the string and processing one by one
> character with substring(), but is there a faster way?
# select length(regexp_replace('/some/string/with/slashes', '[^/]+', '', 'g'));
length
--------
4
(1 row)
depesz
--
Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/
jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007
Ivan Voras wrote:
> I have a need to fairly often select data where the number of
> occurrences of a character in the field is "x". Semantically, it's
> literally "SELECT something FROM table WHERE numch('/', field)=$x".
>
> The problem is how to do it efficiently. I see there isn't a built-in
> function that counts character occurrences so I'd have to write it
> myself. An additional constraint is that it must be implemented with
> built-in capabilities, i.e. SQL and plpsql languages. I can do it the
> brute force way, looping over the string and processing one by one
> character with substring(), but is there a faster way?
Hmm, you could do this:
CREATE OR REPLACE FUNCTION numch(text, text) RETURNS integer AS $$
SELECT length($2) - length(replace($2, $1, '')) $$ LANGUAGE SQL;
ie. remove the characters we're counting, and see how much shorter the
string became. I don't know if this is any faster than looping in a
plpgsql function, but it might be.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
hubert depesz lubaczewski wrote:
> On Thu, Feb 05, 2009 at 02:31:24PM +0100, Ivan Voras wrote:
>> The problem is how to do it efficiently. I see there isn't a built-in
>> function that counts character occurrences so I'd have to write it
>> myself. An additional constraint is that it must be implemented with
>> built-in capabilities, i.e. SQL and plpsql languages. I can do it the
>> brute force way, looping over the string and processing one by one
>> character with substring(), but is there a faster way?
>
> # select length(regexp_replace('/some/string/with/slashes', '[^/]+', '', 'g'));
> length
> --------
> 4
> (1 row)
Thank you (and Heikki), I had a feeling I was missing an approach.
Вложения
It all depends at the end of a day, how crucial is that functionality to your app ? If very, I would add to every insert/update a trigger, that would update info on other table(s) with stats per character. Other option, is to write a function in C that would parse word and chop it in C,O (character, number of occurrences ) result. -- GJ