Обсуждение: Similarity Search with Wildcards

Поиск
Список
Период
Сортировка

Similarity Search with Wildcards

От
Ghislain Hachey
Дата:
Hi list,

I have a varchar column with content such as "Client Name - Brief
Description of Problem" (it's a help desk ticket system). I want to
generate reports by clients and the only thing I can base my query on is
this column. The client names often contain typos or are entered
slightly differently. I installed the pg_trgm extension and it almost
does what I want. The problem is that it searches the similarity of the
whole field and not just the client name resulting in not so similar
searches (I include my query below).

SELECT
  tickets.id as ticket_id,
  tickets.subject as ticket_subject,
  similarity(tickets.subject, 'Client Name') AS sml,
FROM
 tickets
WHERE
  tickets.subject % 'Client Name';

I thought about using wildcards as discussed here
<http://www.postgresql.org/message-id/flat/4D3CC2DC.6060002@wulczer.org#4D3CC2DC.6060002@wulczer.org>
but this does not seem to have any effect (I include the query I tried
below).

SELECT
  tickets.id as ticket_id,
  tickets.subject as ticket_subject,
  similarity(tickets.subject, '%Client Name%') AS sml,
FROM
 tickets
WHERE
  tickets.subject % '%Client Name%';

Both queries result in the same similarity. I would hope that the
similarity algorithm would only work on the "Client Name" part of the
string and ignore what is before and after; in other words, the latter
query above would return a similarity factor of 1 on the content "Client
Name - Brief Description of Problem".

Any pointer in a right direction would be appreciated.

--
GH<www.ghachey.info>

Re: Similarity Search with Wildcards

От
John R Pierce
Дата:
On 2/27/2013 10:35 PM, Ghislain Hachey wrote:
> I have a varchar column with content such as "Client Name - Brief
> Description of Problem" (it's a help desk ticket system). I want to
> generate reports by clients and the only thing I can base my query on
> is this column. The client names often contain typos or are entered
> slightly differently. I installed the pg_trgm extension and it almost
> does what I want. The problem is that it searches the similarity of
> the whole field and not just the client name resulting in not so
> similar searches (I include my query below).


why isn't client name a separate field??   thats the logical approach



--
john r pierce                                      37N 122W
somewhere on the middle of the left coast

Re: Similarity Search with Wildcards

От
Ghislain Hachey
Дата:
On 02/28/2013 06:12 PM, John R Pierce wrote:
> On 2/27/2013 10:35 PM, Ghislain Hachey wrote:
>> I have a varchar column with content such as "Client Name - Brief
>> Description of Problem" (it's a help desk ticket system). I want to
>> generate reports by clients and the only thing I can base my query on
>> is this column. The client names often contain typos or are entered
>> slightly differently. I installed the pg_trgm extension and it almost
>> does what I want. The problem is that it searches the similarity of
>> the whole field and not just the client name resulting in not so
>> similar searches (I include my query below).
>
>
> why isn't client name a separate field??   thats the logical approach

I know, but the system and the workflow of the staff is already in
place. I was hoping to get something quick with minimal changes. I was
also hoping to understand more how wildcards can be used with pg_trgm.
Otherwise, I will add a field and modify the app.

Thanks,

--
GH<www.ghachey.info>

Re: Similarity Search with Wildcards

От
Ken Tanzer
Дата:
I'm not sure about the indexing and performance impacts, but I think you
could use SUBSTRING with a regex to pull out the client name, and then
match on that.

 SELECT substring('Client Name - Description' FROM '^(.*) [-]');

  substring
-------------
 Client Name


On Thu, Feb 28, 2013 at 12:02 AM, Ghislain Hachey <ghachey@gmail.com> wrote:

>  On 02/28/2013 06:12 PM, John R Pierce wrote:
>
> On 2/27/2013 10:35 PM, Ghislain Hachey wrote:
>
> I have a varchar column with content such as "Client Name - Brief
> Description of Problem" (it's a help desk ticket system). I want to
> generate reports by clients and the only thing I can base my query on is
> this column. The client names often contain typos or are entered slightly
> differently. I installed the pg_trgm extension and it almost does what I
> want. The problem is that it searches the similarity of the whole field and
> not just the client name resulting in not so similar searches (I include my
> query below).
>
>
>
> why isn't client name a separate field??   thats the logical approach
>
>
> I know, but the system and the workflow of the staff is already in place.
> I was hoping to get something quick with minimal changes. I was also hoping
> to understand more how wildcards can be used with pg_trgm. Otherwise, I
> will add a field and modify the app.
>
> Thanks,
>
> --
> GH <http://www.ghachey.info>
>



--
AGENCY Software
A data system that puts you in control
100% Free Software
*http://agency-software.org/*
ken.tanzer@agency-software.org
(253) 245-3801

Subscribe to the mailing
list<agency-general-request@lists.sourceforge.net?body=subscribe>
 to
learn more about AGENCY or
follow the discussion.