Re: pg_trgm vs. Solr ngram

Поиск
Список
Период
Сортировка
От Bertrand Mamasam
Тема Re: pg_trgm vs. Solr ngram
Дата
Msg-id CACZ67_UD38B7r3ug5uSW2_TgtdGt+hPhhw0BnsM+bK1vQE_01g@mail.gmail.com
обсуждение исходный текст
Ответ на pg_trgm vs. Solr ngram  (Chris <rc@networkz.ch>)
Список pgsql-general


Le ven. 10 févr. 2023, 03:20, Chris <rc@networkz.ch> a écrit :
Hello list

I'm pondering migrating an FTS application from Solr to Postgres, just
because we use Postgres for everything else.

The application is basically fgrep with a web frontend. However the
indexed documents are very computer network specific and contain a lot
of hyphenated hostnames with dot-separated domains, as well as IPv4 and
IPv6 addresses. In Solr I was using ngrams and customized the
TokenizerFactories until more or less only whitespace was as separator,
while [.:-_\d] remains part of the ngrams. This allows to search for
".12.255/32" or "xzy-eth5.example.org" without any false positives.

It looks like a straight conversion of this method is not possible since
the tokenization in pg_trgm is not configurable afaict. Is there some
other good method to search for a random substring including all the
punctuation using an index? Or a pg_trgm-style module that is more
flexible like the Solr/Lucene variant?

Or maybe hacking my own pg_trgm wouldn't be so hard and could be fun, do
I pretty much just need to change the emitted tokens or will this lead
to significant complications in the operators, indexes etc.?

thanks for any hints & cheers
Christian

In Solr you used FTS so I suggest that you do the same in Postgres and look at the full text search functions. You can create a tsvector yourself in many different ways or use one of the provided functions. So you could add complete IP adresses to your index and then search for them using something like phrase search. You can also create text search configurations or just use the "simple" one if you just need something like fgrep. Of course, the end result will be more like Solr and less like fgrep. 



В списке pgsql-general по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: pg_trgm vs. Solr ngram
Следующее
От: Alban Hertroys
Дата:
Сообщение: Re: WHERE col = ANY($1) extended to 2 or more columns?