Re: Regex performance issue

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: Regex performance issue
Дата
Msg-id 4571FFA6.6070804@enterprisedb.com
обсуждение исходный текст
Ответ на Re: Regex performance issue  ("Alexandru Coseru" <alexandru.coseru@totaltelecom.ro>)
Список pgsql-performance
Alexandru Coseru wrote:
> Hello..
>
> I cannot use the first advice , because i'm not aware of the prefix
> length in the database...
> This is why i'm ordering after length(prefix)..
>
>
> On the 2nd one , i'm not sure that i can follow you..

Ok, let me try again :)

> asterisk=> select * from destlist LIMIT 10;
> id | id_ent | dir |   prefix   |   country   |      network       | tip
> ----+--------+-----+------------+-------------+--------------------+-----
>  1 |     -1 |   0 | (^0093)    | AFGHANISTAN | AFGHANISTAN        |   6
>  2 |     -1 |   0 | (^00937)   | AFGHANISTAN | AFGHANISTAN Mobile |   5
>  3 |     -1 |   0 | (^00355)   | ALBANIA     | ALBANIA            |   6
>  4 |     -1 |   0 | (^0035538) | ALBANIA     | ALBANIA Mobile     |   5
>  5 |     -1 |   0 | (^0035568) | ALBANIA     | ALBANIA Mobile     |   5
>  6 |     -1 |   0 | (^0035569) | ALBANIA     | ALBANIA Mobile     |   5
>  7 |     -1 |   0 | (^00213)   | ALGERIA     | ALGERIA            |   6
>  8 |     -1 |   0 | (^0021361) | ALGERIA     | ALGERIA Mobile     |   5
>  9 |     -1 |   0 | (^0021362) | ALGERIA     | ALGERIA Mobile     |   5
> 10 |     -1 |   0 | (^0021363) | ALGERIA     | ALGERIA Mobile     |   5

Store the prefix in a character column, without the regex stuff. Like
below. I've removed the columns that are not relevant, in fact it would
make sense to store them in another table, and have just the id and
prefix in this table.

id | prefix  | network
---+---------+--------------------
  1 | 0093    | AFGHANISTAN
  2 | 00937   | AFGHANISTAN Mobile
  3 | 00355   | ALBANIA
  4 | 0035538 | ALBANIA Mobile
  5 | 0035568 | ALBANIA Mobile
  6 | 0035569 | ALBANIA Mobile
  7 | 00213   | ALGERIA
  8 | 0021361 | ALGERIA Mobile
  9 | 0021362 | ALGERIA Mobile
10 | 0021363 | ALGERIA Mobile

Now, add the rows marked with start below:

  id | prefix  | network
----+---------+--------------------
   1 | 0093    | AFGHANISTAN
   2 | 00937   | AFGHANISTAN Mobile
* 1 | 00938   | AFGHANISTAN
   3 | 00355   | ALBANIA
   4 | 0035538 | ALBANIA Mobile
* 3 | 0035539 | ALBANIA
   5 | 0035568 | ALBANIA Mobile
   6 | 0035569 | ALBANIA Mobile
* 3 | 003557  | ALBANIA
   7 | 00213   | ALGERIA
   8 | 0021361 | ALGERIA Mobile
   9 | 0021362 | ALGERIA Mobile
  10 | 0021363 | ALGERIA Mobile
* 7 | 0021364 | ALGERIA

The added rows make it unnecessary to use regex for the searches. You
can use just the >= operator like this: (using the example number you gave)

SELECT id FROM destlist WHERE '00213618833' >= prefix ORDER BY prefix
LIMIT 1

Which would return id 7. As another example, a query of "00213654321"
would match the last row, which again has id 7.

I'm too tired to figure out the exact algorithm for adding the rows, but
I'm pretty sure it can be automated... The basic idea is that when
there's a row with id A and prefix XXXX, and another row with id B and
prefix XXXXY, we add another row with id A and prefix XXXX(Y+1).

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

В списке pgsql-performance по дате отправления:

Предыдущее
От: "Alexandru Coseru"
Дата:
Сообщение: Re: Regex performance issue
Следующее
От: "Alexandru Coseru"
Дата:
Сообщение: Re: Regex performance issue