Re: WIP: index support for regexp search

From: Alexander Korotkov
Subject: Re: WIP: index support for regexp search
Msg-id: CAPpHfdtWpbVoXGU+damLwb+_-1wcRRwuzm2XSuy8Qr1aDdnDzw@mail.gmail.com
In reply to: Re: WIP: index support for regexp search (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers

On Tue, Apr 9, 2013 at 9:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alexander Korotkov <aekorotkov@gmail.com> writes:
> > On Mon, Apr 8, 2013 at 9:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> I spent the weekend hacking on this, making a number of bug fixes and a
> >> whole lot of cosmetic changes.  I think there are large parts of this
> >> that are in committable shape now, but I still find the actual graph
> >> transformation logic to be mostly unintelligible.  I think what's most
> >> obscure is the distinction between the arcs list and the keys list of
> >> each state in the expanded graph.  I get the impression that the
> >> general idea is for the arcs to represent exactly-known transitions
> >> while the keys represent imprecisely-known transitions ... but there
> >> seems to be at least some leakage between those categories.  Could
> >> you write down a specification for what's supposed to be happening
> >> there?
>
> > Here is my try to specify it.
>
> Thanks.  I hacked on this some more and committed it.  I found a number
> of bugs along the way with respect to handling of word boundaries
> (partially-blank transition trigrams) and EOL-color ($) handling.
> I think it's all fixed now but it could definitely use some more
> study and testing.
 
Great, thanks! I also will do some more testing.

> One issue that bothered me is that the regression tests really don't
> provide much visibility into what the code is doing.  Some of the bugs
> had to do with failing to generate expected trigrams, for instance
> col ~ 'foo bar' only generating trigram "foo" and not "bar".  This still
> led to getting the right answer, so the error was invisible as far as the
> tests were concerned.  Is it worth thinking of a way to expose what the
> extract function did at SQL level, so we could test more carefully?

Yes, I had a similar idea. But I think we need a relatively stable representation of the resulting graph in order to expose it. Many equivalent graphs can exist, and a change in the implementation could switch the output from one equivalent graph to another. It would be better not to have to rewrite the tests in that case. Ideally, we would expose a representation that is the same for all equivalent graphs, but that doesn't seem realistic. Still, I think we could at least make the ordering of states and color trigrams stable. Another option I see is to expose just the set of trigrams. It doesn't carry complete information, but it is quite stable.
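
To make the two options concrete, here is a minimal Python sketch. It is not the pg_trgm code and does not model colors or the arcs/keys distinction. The graph layout (a dict from state id to a list of (trigram, target) pairs) and the names canonical_dump and trigram_set are hypothetical, chosen only for illustration. It assumes that arc labels are distinct within each state, so that sorting by label gives a unique visiting order.

```python
from collections import deque


def canonical_dump(initial, arcs):
    """Dump a graph with state numbers that do not depend on internal ids.

    `arcs` maps a state id to a list of (trigram, target_state) pairs.
    States are renumbered in breadth-first order starting from `initial`,
    visiting each state's outgoing arcs sorted by trigram label.
    Assumption: labels are distinct within a state; only reachable
    states are dumped.
    """
    order = {initial: 0}          # original state id -> stable number
    queue = deque([initial])
    dump = []
    while queue:
        state = queue.popleft()
        for trigram, target in sorted(arcs.get(state, []), key=lambda a: a[0]):
            if target not in order:
                order[target] = len(order)
                queue.append(target)
            dump.append((order[state], trigram, order[target]))
    return dump


def trigram_set(arcs):
    """Second option: expose only the sorted set of trigrams on any arc."""
    return sorted({trigram for outs in arcs.values() for trigram, _ in outs})


# Two graphs that differ only in state ids and in arc order.
g1 = {0: [("foo", 1), ("bar", 2)], 1: [("baz", 3)], 2: [("baz", 3)]}
g2 = {7: [("bar", 5), ("foo", 9)], 5: [("baz", 1)], 9: [("baz", 1)]}

print(canonical_dump(0, g1) == canonical_dump(7, g2))
print(trigram_set(g1))
```

The first function only hides renumbering and arc reordering; graphs that are equivalent but structurally different would still dump differently. The second loses the structure entirely, but its output changes only when the set of extracted trigrams changes.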

------
With best regards,
Alexander Korotkov.
