Re: How to optimize query that concatenates strings?

Поиск
Список
Период
Сортировка
От Stephane Bortzmeyer
Тема Re: How to optimize query that concatenates strings?
Дата
Msg-id 20060707205841.GA10545@sources.org
обсуждение исходный текст
Ответ на How to optimize query that concatenates strings?  ("badlydrawnbhoy" <badlydrawnbhoy@gmail.com>)
Список pgsql-general
On Fri, Jul 07, 2006 at 04:29:51AM -0700,
 badlydrawnbhoy <badlydrawnbhoy@gmail.com> wrote
 a message of 48 lines which said:

> I've got a database of URLs, and when inserting new data into it I
> want to make sure that there are no functionally equivalent URLs
> already present. For example, 'umist.ac.uk' is functionally the same
> as 'umist.ac.uk/'.

IMHO, your problem seems to be an instance of a very general class:
data that needs canonicalization. For instance, I work for a domain
name registry and domain names are case-insensitive. Do we store the
domain names blindly and then always use ILIKE or other
case-insensitive operators? No, we canonicalize domain names by
changing them to lower-case. That way:

* we do not have to think of using case-insensitive operators,

* indexes do work.

This is what I recommend here: decide on a canonical form and
canonicalize everything when it comes into the database (it is easy to
do it from a trigger).

If the original form is important, you can still store it in a column
intended for display, not for searching.

Here is a way to canonicalize, with a trigger. The function
"canonicalize" is left as an exercice (you can write it in PL/pgSQL,
C, Python, etc):

CREATE OR REPLACE FUNCTION force_canonicalization() RETURNS TRIGGER
   AS 'BEGIN
          NEW.url = canonicalize(NEW.url);
          RETURN NEW;
       END;'
     LANGUAGE PLPGSQL;

CREATE TRIGGER force_canonicalization
   BEFORE INSERT ON Things
   FOR EACH ROW
   EXECUTE PROCEDURE force_canonicalization();


В списке pgsql-general по дате отправления:

Предыдущее
От: Adrian Klaver
Дата:
Сообщение: Re: How to insert .xls files into database
Следующее
От: Michael Loftis
Дата:
Сообщение: Bug? Changing where distinct occurs produces error?