select with "like" from another table

Поиск
Список
Период
Сортировка
От Anton Maksimenkov
Тема select with "like" from another table
Дата
Msg-id 8cac8dd0605290253v75d43b9p249ebf58ee28b63c@mail.gmail.com
обсуждение исходный текст
Ответы Re: select with "like" from another table
Список pgsql-performance
Hi.

I have 2 tables - one with calls numbers and another with calls codes.
The structure almost like this:
billing=# \d a_voip
                                        Table "public.a_voip"
      Column       |            Type             |
 Modifiers
--------------------+-----------------------------+-----------------------------------------------------
 id                 | integer                     | not null default
nextval('a_voip_id_seq'::regclass)
 tm                 | timestamp without time zone | not null
 user_name          | character varying(50)       | not null
...
 calling_station_id | character varying(20)       | not null
 called_station_id  | character varying(20)       | not null
Indexes:
   "a_voip_pkey" PRIMARY KEY, btree (id)
   "a_voip_tm" btree (tm)

billing=# \d a_voip_codes
       Table "public.a_voip_codes"
 Column |         Type          | Modifiers
--------+-----------------------+-----------
 code   | integer               | not null
 region | character varying(77) |
 tarif  | numeric(13,7)         |
Indexes:
   "a_voip_codes_pkey" PRIMARY KEY, btree (code)

I need to select longest codes from a_voip_codes which match with the
the called_station_id. Because codes (very rarely) changes I construct
query

SELECT  user_name, called_station_id,
(SELECT code FROM a_voip_codes AS c where v.called_station_id like
c.code || '%' order by code desc limit 1) AS code
 FROM a_voip AS v WHERE user_name = 'dixi' AND tm between '2006-04-01'
and '2006-05-01' group by user_name, called_station_id;

Analyzed variant
billing=# explain analyze SELECT user_name, called_station_id, (SELECT
code FROM a_voip_codes AS c where v.called_station_id like c.code ||
'%' order by code desc limit 1) AS code FROM a_voip AS v WHERE
user_name = 'dixi' AND tm between '2006-04-01' and '2006-05-01' group
by user_name, called_station_id;

 QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=11515.93..12106.26 rows=69 width=22) (actual
time=215.719..677.044 rows=130 loops=1)
  ->  Bitmap Heap Scan on a_voip v  (cost=1106.66..11513.16 rows=554
width=22) (actual time=72.336..207.618 rows=848 loops=1)
        Recheck Cond: ((tm >= '2006-04-01 00:00:00'::timestamp
without time zone) AND (tm <= '2006-05-01 00:00:00'::timestamp without
time zone))
        Filter: ((user_name)::text = 'dixi'::text)
        ->  Bitmap Index Scan on a_voip_tm  (cost=0.00..1106.66
rows=90943 width=0) (actual time=69.441..69.441 rows=93594 loops=1)
              Index Cond: ((tm >= '2006-04-01 00:00:00'::timestamp
without time zone) AND (tm <= '2006-05-01 00:00:00'::timestamp without
time zone))
  SubPlan
    ->  Limit  (cost=0.00..8.55 rows=1 width=4) (actual
time=3.565..3.567 rows=1 loops=130)
          ->  Index Scan Backward using a_voip_codes_pkey on
a_voip_codes c  (cost=0.00..85.45 rows=10 width=4) (actual
time=3.560..3.560 rows=1 loops=130)
                Filter: (($0)::text ~~ ((code)::text || '%'::text))
 Total runtime: 678.186 ms
(11 rows)

It is ugly, however not so long (but only for 69 rows). If I want to
select for ALL users it goes veeeery long:
billing=# explain analyze SELECT user_name, called_station_id, (SELECT
code FROM a_voip_codes AS c where v.called_station_id like c.code ||
'%' order by code desc limit 1) AS code FROM a_voip AS v WHERE tm
between '2006-04-01' and '2006-05-01' group by user_name,
called_station_id;

  QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=11740.52..107543.85 rows=11198 width=22) (actual
time=779.488..75637.623 rows=20564 loops=1)
  ->  Bitmap Heap Scan on a_voip v  (cost=1106.66..11285.81
rows=90943 width=22) (actual time=72.539..274.850 rows=90204 loops=1)
        Recheck Cond: ((tm >= '2006-04-01 00:00:00'::timestamp
without time zone) AND (tm <= '2006-05-01 00:00:00'::timestamp without
time zone))
        ->  Bitmap Index Scan on a_voip_tm  (cost=0.00..1106.66
rows=90943 width=0) (actual time=69.853..69.853 rows=93594 loops=1)
              Index Cond: ((tm >= '2006-04-01 00:00:00'::timestamp
without time zone) AND (tm <= '2006-05-01 00:00:00'::timestamp without
time zone))
  SubPlan
    ->  Limit  (cost=0.00..8.55 rows=1 width=4) (actual
time=3.631..3.633 rows=1 loops=20564)
          ->  Index Scan Backward using a_voip_codes_pkey on
a_voip_codes c  (cost=0.00..85.45 rows=10 width=4) (actual
time=3.623..3.623 rows=1 loops=20564)
                Filter: (($0)::text ~~ ((code)::text || '%'::text))
 Total runtime: 75652.199 ms
(10 rows)

So I want to ask, how can I reorganize query/structure for achieve
good performance?

 I experiment with additional column (matched_code) for a_voip table
and think about RULE which will update that column "matched_code"
doing the (SELECT code FROM a_voip_codes AS c where
v.called_station_id like c.code || '%' order by code desc limit 1) job
when a_voip_codes updated. Or about TRIGGER. But this may also takes
long time, especially with short "code" numbers (like 1 digit). Look:

billing=# explain analyze UPDATE a_voip SET matched_code = (SELECT
code FROM a_voip_codes AS c WHERE a_voip.called_station_id like c.code
|| '%' order by code desc limit 1) WHERE matched_code LIKE '1%';

   QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on a_voip  (cost=20.34..20467.27 rows=2057
width=168) (actual time=13.407..22201.369 rows=2028 loops=1)
  Filter: ((matched_code)::text ~~ '1%'::text)
  ->  Bitmap Index Scan on a_voip_matched_code  (cost=0.00..20.34
rows=2057 width=0) (actual time=2.035..2.035 rows=2028 loops=1)
        Index Cond: (((matched_code)::text >= '1'::character varying)
AND ((matched_code)::text < '2'::character varying))
  SubPlan
    ->  Limit  (cost=0.00..8.55 rows=1 width=4) (actual
time=10.909..10.911 rows=1 loops=2028)
          ->  Index Scan Backward using a_voip_codes_pkey on
a_voip_codes c  (cost=0.00..85.45 rows=10 width=4) (actual
time=10.923..10.923 rows=1 loops=2028)
                Filter: (($0)::text ~~ ((code)::text || '%'::text))
 Total runtime: 23216.770 ms
(9 rows)

Is there any other ways to connect longest "code" with "called_station_id"?
--
engineer

В списке pgsql-performance по дате отправления:

Предыдущее
От: Greg Stark
Дата:
Сообщение: Re: App very unresponsive while performing simple update
Следующее
От: "chris smith"
Дата:
Сообщение: Re: select with "like" from another table