Обсуждение: select with "like" from another table

Поиск
Список
Период
Сортировка

select with "like" from another table

От
"Anton Maksimenkov"
Дата:
Hi.

I have 2 tables - one with calls numbers and another with calls codes.
The structure almost like this:
billing=# \d a_voip
                                        Table "public.a_voip"
      Column       |            Type             |
 Modifiers
--------------------+-----------------------------+-----------------------------------------------------
 id                 | integer                     | not null default
nextval('a_voip_id_seq'::regclass)
 tm                 | timestamp without time zone | not null
 user_name          | character varying(50)       | not null
...
 calling_station_id | character varying(20)       | not null
 called_station_id  | character varying(20)       | not null
Indexes:
   "a_voip_pkey" PRIMARY KEY, btree (id)
   "a_voip_tm" btree (tm)

billing=# \d a_voip_codes
       Table "public.a_voip_codes"
 Column |         Type          | Modifiers
--------+-----------------------+-----------
 code   | integer               | not null
 region | character varying(77) |
 tarif  | numeric(13,7)         |
Indexes:
   "a_voip_codes_pkey" PRIMARY KEY, btree (code)

I need to select longest codes from a_voip_codes which match with the
the called_station_id. Because codes (very rarely) changes I construct
query

SELECT  user_name, called_station_id,
(SELECT code FROM a_voip_codes AS c where v.called_station_id like
c.code || '%' order by code desc limit 1) AS code
 FROM a_voip AS v WHERE user_name = 'dixi' AND tm between '2006-04-01'
and '2006-05-01' group by user_name, called_station_id;

Analyzed variant
billing=# explain analyze SELECT user_name, called_station_id, (SELECT
code FROM a_voip_codes AS c where v.called_station_id like c.code ||
'%' order by code desc limit 1) AS code FROM a_voip AS v WHERE
user_name = 'dixi' AND tm between '2006-04-01' and '2006-05-01' group
by user_name, called_station_id;

 QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=11515.93..12106.26 rows=69 width=22) (actual
time=215.719..677.044 rows=130 loops=1)
  ->  Bitmap Heap Scan on a_voip v  (cost=1106.66..11513.16 rows=554
width=22) (actual time=72.336..207.618 rows=848 loops=1)
        Recheck Cond: ((tm >= '2006-04-01 00:00:00'::timestamp
without time zone) AND (tm <= '2006-05-01 00:00:00'::timestamp without
time zone))
        Filter: ((user_name)::text = 'dixi'::text)
        ->  Bitmap Index Scan on a_voip_tm  (cost=0.00..1106.66
rows=90943 width=0) (actual time=69.441..69.441 rows=93594 loops=1)
              Index Cond: ((tm >= '2006-04-01 00:00:00'::timestamp
without time zone) AND (tm <= '2006-05-01 00:00:00'::timestamp without
time zone))
  SubPlan
    ->  Limit  (cost=0.00..8.55 rows=1 width=4) (actual
time=3.565..3.567 rows=1 loops=130)
          ->  Index Scan Backward using a_voip_codes_pkey on
a_voip_codes c  (cost=0.00..85.45 rows=10 width=4) (actual
time=3.560..3.560 rows=1 loops=130)
                Filter: (($0)::text ~~ ((code)::text || '%'::text))
 Total runtime: 678.186 ms
(11 rows)

It is ugly, however not so long (but only for 69 rows). If I want to
select for ALL users it goes veeeery long:
billing=# explain analyze SELECT user_name, called_station_id, (SELECT
code FROM a_voip_codes AS c where v.called_station_id like c.code ||
'%' order by code desc limit 1) AS code FROM a_voip AS v WHERE tm
between '2006-04-01' and '2006-05-01' group by user_name,
called_station_id;

  QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=11740.52..107543.85 rows=11198 width=22) (actual
time=779.488..75637.623 rows=20564 loops=1)
  ->  Bitmap Heap Scan on a_voip v  (cost=1106.66..11285.81
rows=90943 width=22) (actual time=72.539..274.850 rows=90204 loops=1)
        Recheck Cond: ((tm >= '2006-04-01 00:00:00'::timestamp
without time zone) AND (tm <= '2006-05-01 00:00:00'::timestamp without
time zone))
        ->  Bitmap Index Scan on a_voip_tm  (cost=0.00..1106.66
rows=90943 width=0) (actual time=69.853..69.853 rows=93594 loops=1)
              Index Cond: ((tm >= '2006-04-01 00:00:00'::timestamp
without time zone) AND (tm <= '2006-05-01 00:00:00'::timestamp without
time zone))
  SubPlan
    ->  Limit  (cost=0.00..8.55 rows=1 width=4) (actual
time=3.631..3.633 rows=1 loops=20564)
          ->  Index Scan Backward using a_voip_codes_pkey on
a_voip_codes c  (cost=0.00..85.45 rows=10 width=4) (actual
time=3.623..3.623 rows=1 loops=20564)
                Filter: (($0)::text ~~ ((code)::text || '%'::text))
 Total runtime: 75652.199 ms
(10 rows)

So I want to ask, how can I reorganize query/structure for achieve
good performance?

 I experiment with additional column (matched_code) for a_voip table
and think about RULE which will update that column "matched_code"
doing the (SELECT code FROM a_voip_codes AS c where
v.called_station_id like c.code || '%' order by code desc limit 1) job
when a_voip_codes updated. Or about TRIGGER. But this may also takes
long time, especially with short "code" numbers (like 1 digit). Look:

billing=# explain analyze UPDATE a_voip SET matched_code = (SELECT
code FROM a_voip_codes AS c WHERE a_voip.called_station_id like c.code
|| '%' order by code desc limit 1) WHERE matched_code LIKE '1%';

   QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on a_voip  (cost=20.34..20467.27 rows=2057
width=168) (actual time=13.407..22201.369 rows=2028 loops=1)
  Filter: ((matched_code)::text ~~ '1%'::text)
  ->  Bitmap Index Scan on a_voip_matched_code  (cost=0.00..20.34
rows=2057 width=0) (actual time=2.035..2.035 rows=2028 loops=1)
        Index Cond: (((matched_code)::text >= '1'::character varying)
AND ((matched_code)::text < '2'::character varying))
  SubPlan
    ->  Limit  (cost=0.00..8.55 rows=1 width=4) (actual
time=10.909..10.911 rows=1 loops=2028)
          ->  Index Scan Backward using a_voip_codes_pkey on
a_voip_codes c  (cost=0.00..85.45 rows=10 width=4) (actual
time=10.923..10.923 rows=1 loops=2028)
                Filter: (($0)::text ~~ ((code)::text || '%'::text))
 Total runtime: 23216.770 ms
(9 rows)

Is there any other ways to connect longest "code" with "called_station_id"?
--
engineer

Re: select with "like" from another table

От
"chris smith"
Дата:
On 5/29/06, Anton Maksimenkov <anton200@gmail.com> wrote:
> Hi.
>
> I have 2 tables - one with calls numbers and another with calls codes.
> The structure almost like this:
...

How long does this query take?

SELECT code FROM a_voip_codes c, a_voip v where v.called_station_id
like c.code ||
'%' order by code desc limit 1

I wonder if you'll benefit from an index on a_voip(called_station_id)
to speed up this join.

--
Postgresql & php tutorials
http://www.designmagick.com/

Re: select with "like" from another table

От
"Anton Maksimenkov"
Дата:
> > I have 2 tables - one with calls numbers and another with calls codes.
> > The structure almost like this:
> ...
> How long does this query take?
>
> SELECT code FROM a_voip_codes c, a_voip v where v.called_station_id
> like c.code ||
> '%' order by code desc limit 1

billing=# explain analyze SELECT code FROM a_voip_codes c, a_voip v
where v.called_station_id like c.code || '%' order by code desc limit
1;

 QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..11.24 rows=1 width=4) (actual
time=15809.846..15809.848 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..35877212.61 rows=3192650 width=4)
(actual time=15809.841..15809.841 rows=1 loops=1)
         Join Filter: (("inner".called_station_id)::text ~~
(("outer".code)::text || '%'::text))
         ->  Index Scan Backward using a_voip_codes_pkey on
a_voip_codes c  (cost=0.00..69.87 rows=2078 width=4) (actual
time=0.029..0.106 rows=6 loops=1)
         ->  Seq Scan on a_voip v  (cost=0.00..11887.81 rows=307281
width=13) (actual time=1.696..935.368 rows=254472 loops=6)
 Total runtime: 15810.088 ms
(6 rows)


> I wonder if you'll benefit from an index on a_voip(called_station_id)
> to speed up this join.

Yes, it's long. But index gives no help here:

billing=# CREATE INDEX a_voip_called_station_id ON a_voip(called_station_id);
CREATE INDEX
billing=# vacuum analyze;
VACUUM
billing=# explain analyze SELECT code FROM a_voip_codes c, a_voip v
where v.called_station_id like c.code || '%' order by code desc limit
1;

 QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..11.27 rows=1 width=4) (actual
time=15254.783..15254.785 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..35767665.65 rows=3172732 width=4)
(actual time=15254.778..15254.778 rows=1 loops=1)
         Join Filter: (("inner".called_station_id)::text ~~
(("outer".code)::text || '%'::text))
         ->  Index Scan Backward using a_voip_codes_pkey on
a_voip_codes c  (cost=0.00..69.87 rows=2078 width=4) (actual
time=0.021..0.097 rows=6 loops=1)
         ->  Seq Scan on a_voip v  (cost=0.00..11868.64 rows=305364
width=13) (actual time=0.006..750.337 rows=254472 loops=6)
 Total runtime: 15255.066 ms
(6 rows)


The main problem with first (main) query:

SELECT  user_name, called_station_id,
(SELECT code FROM a_voip_codes AS c where v.called_station_id LIKE
c.code || '%' order by code desc limit 1) AS code
 FROM a_voip AS v WHERE user_name = 'dixi' AND tm between '2006-04-01'
and '2006-05-01' group by user_name, called_station_id;

is that internal (SELECT... v.called_station_id LIKE c.code || '%'...)
executed for each row, returned by external SELECT user_name... part.
So I looking how to avoid internal (SELECT ...) part of query.

 Terrible oracle gives something like "over by (partition by ... order
by code desc) rnum ... where rnum = 1" which works like DISTINCT and
numerate similate rows, then we get just longest (rnum = 1) rows. But
I can't imagine how to implement some kind of this algorithm with
postgres.
--
engineer