Re: grep -f keyword data query

Поиск
Список
Период
Сортировка
От Hiroyuki Sato
Тема Re: grep -f keyword data query
Дата
Msg-id CA+Tq-Rrm_mf_DQzwSjaLmEfoU9Yv9kcXVavxMxH+S9Kt3ZQb5A@mail.gmail.com
обсуждение исходный текст
Ответ на Re: grep -f keyword data query  (Jeff Janes <jeff.janes@gmail.com>)
Ответы Re: grep -f keyword data query  (David Rowley <david.rowley@2ndquadrant.com>)
Список pgsql-general
Hello Jeff

Thank you for replying.

2015年12月29日(火) 4:35 Jeff Janes <jeff.janes@gmail.com>:
On Sun, Dec 27, 2015 at 3:53 PM, Hiroyuki Sato <hiroysato@gmail.com> wrote:
> Hello Andreas and Tom
>
> Thank you for replying.
>
> Sorry, I re-created my questions. I was mis-pasted query log on previous
> question.
> (@~ operator is PGroonga extension (http://pgroonga.github.io))
> Please ignore it.
>
> Best regards.
>
> 1, Problem.
>   (1) Following query is exteme slow. (478sec)
>     SELECT
>       u.url
>     FROM
>       url_lists4 u,
>       keywords4 k
>     WHERE
>       u.url like k.url
>     AND
>       k.name = 'esc_url';
>
>
>   (2) grep -f kwd.txt sample.txt (exec time under 1sec)

Certainly not in my hands.  The best I can get is 9 seconds.

>
> 2, Questions
>
>   (1) Is it possible to improve this query like the command ``grep -f
> keyword data``?

You will not get a general tool to match a specialized tool in the
specialized tool's own domain.  fgrep is good at what fgrep does.

This is just same data and real data is forward proxy log.
There are no own domain. It is contain 20,000,000 log per day.
 


Since your queries all have constant text strings at the beginning,
they could use the index.  But if you are not using the C collation,
then you need build a special index:

create index on url_lists4 (url text_pattern_ops);

Currently I just use C collation (ASCII) only. 
 

But, the planner refuses to use this index for your query anyway,
because it can't see that the patterns are all left-anchored.

Really, your best bet is refactor your url data so it is stored with a
url_prefix and url_suffix column.  Then you can do exact matching
rather than pattern matching.

I see, exact matching faster than pattern matting. 
But I need pattern match in path part 
 I would like to pattern match '/a/b/c' part. 

That's why I asked this question.

If it is impossible to improve join speed, I will dump data once, and 
match it with grep or something tools.

Thanks.

 

Cheers,

Jeff

В списке pgsql-general по дате отправления:

Предыдущее
От: Hiroyuki Sato
Дата:
Сообщение: Re: grep -f keyword data query
Следующее
От: Christopher Molnar
Дата:
Сообщение: Regex help again (sorry, I am bad at these)