Re: looking for a faster way to do that

Поиск
Список
Период
Сортировка
От Alban Hertroys
Тема Re: looking for a faster way to do that
Дата
Msg-id 976CEC06-8525-4A87-BBCF-84211ACD91CB@gmail.com
обсуждение исходный текст
Ответ на Re: looking for a faster way to do that  (hamann.w@t-online.de)
Список pgsql-general
On 25 Sep 2011, at 8:04, hamann.w@t-online.de wrote:
> Hi Eduardo,
>
> it is clear that scanning the table once with a list of matches will outperform
> rescanning the table for every string wanted. Now, my problem is that the patterns are
> dynamic as well. So if I could translate a table with one column  and a few thousand rows
> into something like
> regexp_matches(code,'string1|string2|.....string2781')
> would ideally be a performant query. Unfortunately I have no idea how I could achieve this
> transformation inside the database. Doing it externally fails, because any single query cannot
> be more than so few characters.



To me it sounds a little bit like you're comparing every item in a warehouse to a set of descriptions to see what type
ofitem it is, which is something you would be much better off storing as a property of the item. If an item is a fruit,
storethat it's a fruit! 
But I'm guessing at what you're trying to accomplish, so here's a few other options...

I guess you could create 2781 expression indexes to do what you want (is there a limit that prevents this?). Query
planningwould probably become kind of slow and the indices will take up a considerable fraction of the total table
storagerequired - that's a pretty outlandish approach. 

CREATE INDEX tbl_str_regex1 ON tbl (regexp_matches(str, 'string1'));
CREATE INDEX tbl_str_regex1 ON tbl (regexp_matches(str, 'string2'));
...
CREATE INDEX tbl_str_regex1 ON tbl (regexp_matches(str, 'string2781'));

Or are you really going to query every record against all 2781 regexes? I can't figure out a realistic scenario why you
(oranyone) would want that. 
In that case those indices aren't going to help you much, as the planner would have to hold every record in tbl to each
index- it won't do that. 


You could also create a giant lookup table (a materialized view, if you like) where the results of every match of str
intbl against the wantcode in the regex table is stored. That's some huge overhead, but it will probably outperform
mostother options. With the numbers you gave that table will hold about 2-3 billion records with two foreign key values
anda truth value each. 


Alban Hertroys

--
The scale of a problem often equals the size of an ego.



В списке pgsql-general по дате отправления:

Предыдущее
От: Eduardo Morras
Дата:
Сообщение: Re: Speed of lo_unlink vs. DELETE on BYTEA
Следующее
От: pasman pasmański
Дата:
Сообщение: New feature: accumulative functions.