Обсуждение: spgist text_ops and LIKE

Поиск

Список

Период

Сортировка

spgist text_ops and LIKE

От

Robert Haas

Дата:

01 февраля 2012 г., 21:51:05

Is spgist intended to support prefix searches with LIKE?

I ask because, first, it seems like something spgist ought to be good
at (unless I'm confused), and, second, the text_ops opfamily includes
these operators:

~<~(text,text)~<=~(text,text)~>=~(text,text)~>~(text,text)

...which seems to be the same operators that are used for btree
pattern-matching searches:

rhaas=# explain select count(*) from person where last_name like 'WAR%';                                       QUERY
PLAN
------------------------------------------------------------------------------------------Aggregate
(cost=2519.27..2519.28rows=1 width=0)  ->  Bitmap Heap Scan on person  (cost=24.70..2496.75 rows=9005 width=0)
Filter:(last_name ~~ 'WAR%'::text)        ->  Bitmap Index Scan on person_tpo  (cost=0.00..22.45
 
rows=900 width=0)              Index Cond: ((last_name ~>=~ 'WAR'::text) AND
(last_name ~<~ 'WAS'::text))
(5 rows)

...but when I create an index like this:

create index person_spg on person using spgist (last_name text_ops);

...I can't get LIKE to use it, even if I disable seqscans.

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: spgist text_ops and LIKE

От

Tom Lane

Дата:

02 февраля 2012 г., 02:58:09

Robert Haas <robertmhaas@gmail.com> writes:
> Is spgist intended to support prefix searches with LIKE?

Too lazy to look at the code right now, but I think indxpath.c contains
hardwired assumptions that LIKE prefix optimizations are only possible
with btree indexes.  Possibly it would be worth relaxing that.  (The
whole "special index operator" mechanism is undesirably special-purpose,
but I currently have no ideas about how to make it more flexible.)
        regards, tom lane

Re: spgist text_ops and LIKE

От

Tom Lane

Дата:

02 февраля 2012 г., 16:19:22

Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Feb 1, 2012 at 10:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Too lazy to look at the code right now, but I think indxpath.c contains
>> hardwired assumptions that LIKE prefix optimizations are only possible
>> with btree indexes.  Possibly it would be worth relaxing that.

> Is it as simple as the attached?

Yeah, that looks about right, with the single quibble that the other
places where we have #defines in pg_opfamily.h put them after the
corresponding DATA line not before.  Please fix and commit.

> On the flip side, spg_text_inner_consistent()
> seems to fall back to a full index-scan for the regular comparison
> operators, which makes me wonder why anyone would bother having the
> operator at all.

MHO is that that code path is only useful when you are in C locale;
but I think that's a useful enough case to be worth the rather small
amount of incremental code to handle it.  I'm not sure I believe your
speculations about making spgist handle inequality searches in other
locales.  Non-C sort orders are too bizarre and we have too little
visibility into the precise rules the current locale is following.

... however ... could it be reasonable to apply strxfrm and store the
result of that in the index?  Then searches would just need byte-by-byte
comparisons.
        regards, tom lane

Re: spgist text_ops and LIKE

От

Robert Haas

Дата:

02 февраля 2012 г., 17:21:58

On Thu, Feb 2, 2012 at 12:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Feb 1, 2012 at 10:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Too lazy to look at the code right now, but I think indxpath.c contains
>>> hardwired assumptions that LIKE prefix optimizations are only possible
>>> with btree indexes.  Possibly it would be worth relaxing that.
>
>> Is it as simple as the attached?
>
> Yeah, that looks about right, with the single quibble that the other
> places where we have #defines in pg_opfamily.h put them after the
> corresponding DATA line not before.  Please fix and commit.

Done.

>> On the flip side, spg_text_inner_consistent()
>> seems to fall back to a full index-scan for the regular comparison
>> operators, which makes me wonder why anyone would bother having the
>> operator at all.
>
> MHO is that that code path is only useful when you are in C locale;
> but I think that's a useful enough case to be worth the rather small
> amount of incremental code to handle it.  I'm not sure I believe your
> speculations about making spgist handle inequality searches in other
> locales.  Non-C sort orders are too bizarre and we have too little
> visibility into the precise rules the current locale is following.

Unfortunately, I fear that you are right, unless we were to switch
over to using a locale machinery of our own devising in place of that
provided by the OS.  And no, I'm not proposing that.

> ... however ... could it be reasonable to apply strxfrm and store the
> result of that in the index?  Then searches would just need byte-by-byte
> comparisons.

Yeah, you could, but I don't see the point.  I was hoping that spgist
would give us the ability to have a single index that allows both
inequality comparisons and pattern matching.  If you're going to have
one index where the strings are stored "straight" and another where
strxfrm() is applied, then you could just build two btree indexes, one
the regular way and a second with text_pattern_ops.

I'm having trouble figuring out under what set of circumstances spgist
is expected to be the best available alternative.  It only supports a
small subset of the data types that GiST does, so I suppose the point
is that it should be faster for the cases that it does handle.  And,
considering that this is all brand-new code, the fact that it's almost
keeping pace with btree on both pattern-matching and equality
comparisons is certainly respectable -- but I so far haven't found any
cases where it's a clear win.  There's limited glory in being the
almost-fastest way of indexing for a certain class of queries.
Admittedly, I haven't tried the point-in-box stuff yet.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: spgist text_ops and LIKE

От

Tom Lane

Дата:

02 февраля 2012 г., 18:02:19

Robert Haas <robertmhaas@gmail.com> writes:
> I'm having trouble figuring out under what set of circumstances spgist
> is expected to be the best available alternative.  It only supports a
> small subset of the data types that GiST does, so I suppose the point
> is that it should be faster for the cases that it does handle.  And,
> considering that this is all brand-new code, the fact that it's almost
> keeping pace with btree on both pattern-matching and equality
> comparisons is certainly respectable -- but I so far haven't found any
> cases where it's a clear win.  There's limited glory in being the
> almost-fastest way of indexing for a certain class of queries.
> Admittedly, I haven't tried the point-in-box stuff yet.

I think the text opclass is mostly meant as sample code/proof-of-concept.
What spgist is likely to be good for is queries that btree can't cope
with at all.  But it's cute that we can make it handle LIKE for a few
more lines of code, so I'm fine with adding that.
        regards, tom lane

Re: spgist text_ops and LIKE

От

Alexander Korotkov

Дата:

02 февраля 2012 г., 18:07:59

On Thu, Feb 2, 2012 at 10:21 PM, Robert Haas <robertmhaas@gmail.com> wrote:

I'm having trouble figuring out under what set of circumstances spgist
is expected to be the best available alternative. It only supports a
small subset of the data types that GiST does, so I suppose the point
is that it should be faster for the cases that it does handle. And,
considering that this is all brand-new code, the fact that it's almost
keeping pace with btree on both pattern-matching and equality
comparisons is certainly respectable -- but I so far haven't found any
cases where it's a clear win. There's limited glory in being the
almost-fastest way of indexing for a certain class of queries.

I think that current implementation of suffix tree (in particular implementation of consistent methods) in spgist is far from reveal full potential of suffix tree. I see at least two additional search types which can be efficiently evaluated using suffix tree:
1) Search by prefix regular expression. For example, search for /^a(bc|d)e[fgh]/ in single index scan.
2) Search by levenshtein distance, i.e. "SELECT * FROM tbl WHERE levenshtein(col, 'some constant') < k;" using index (surely actual syntax for such index scan would be anouther).
This is only two additional applications which first comes to my mind, there could be other. Unfortunately, I don't have enough of time for it now. But I like to put my hands on this in future.

Admittedly, I haven't tried the point-in-box stuff yet.

In short, I believe spgist for geometrical data-types is more CPU-effective and less IO-effective than gist. Gist have to call consistent method for all index tuples of the page it uses in scan, while spgist scans smaller nodes. Sure, comprehensive testing is required for doing final conclusion about that.

-----
With best regards,
Alexander Korotkov.

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: spgist text_ops and LIKE

spgist text_ops and LIKE

Re: spgist text_ops and LIKE

Re: spgist text_ops and LIKE

Вложения

Re: spgist text_ops and LIKE

Re: spgist text_ops and LIKE

Re: spgist text_ops and LIKE

Re: spgist text_ops and LIKE