Re: Index on TEXT versus CHAR(32)... fast exact TEXT matching

Поиск
Список
Период
Сортировка
От Pierre-Frédéric Caillaud
Тема Re: Index on TEXT versus CHAR(32)... fast exact TEXT matching
Дата
Msg-id opsdu638jlcq72hf@musicbox
обсуждение исходный текст
Ответ на Index on TEXT versus CHAR(32)... fast exact TEXT matching  (Jon Lapham <lapham@jandr.org>)
Список pgsql-general
> CREATE TABLE a (id SERIAL, thetext TEXT, thetext_md5 CHAR(32));
> SELECT id FROM a WHERE thetext_m5d=md5('Some other text');

    From your explanations :
    - you need an exact match text->id
    - you already know the text

    Then, why bother with the TEXT column ?
    I have no idea on the performance of pg text indexes, but I do know that
a table which fits in RAM, or an index which fits in RAM, is faster than
when it does not.

    Try this :

create table my_searches (
    id        serial primary key,
    text_md5    char[16] (or even a binary string)
);

    Then create a table my_text (id,full_text) with appropriate foreign key
(to get a 1<->1 relationship between your tables) because you may still
want to remember the text. Index it on id of course.

    My point is that your search table no longer has the big text field, it
has only constant size fields, (thus constant rowsize) and it is much
smaller. The index will also be smaller and maybe fit in RAM. You then :

    SELECT id FROM my_text WHERE id in (SELECT id FROM my_searches WHERE
text_md5 = (your md5)) AND full_text=(your text);

    Unless a MD5 collision occurs, which should be rare, the subquery will
return 1 result and the first SELECT will just do an index lookup on ID.

    Now, ok, I'm stupid, I reimplemented the Hash index...

    I'd be interested in your benchmarks of your real data with :
    - table with full text + id, with btree index on text
    - table with full text + id, with hash index on text
    - the scheme I just came up with



В списке pgsql-general по дате отправления:

Предыдущее
От: Scott Ribe
Дата:
Сообщение: Re: How to determine a database is intact?
Следующее
От: Pierre-Frédéric Caillaud
Дата:
Сообщение: Re: How to determine a database is intact?