Re: Replacing Apache Solr with Postgre Full Text Search?

From Mike Rylander
Subject Re: Replacing Apache Solr with Postgre Full Text Search?
Date
Msg-id CAO8ar==10fY-Q+mP+krz+eMdqcQ1CtSdtHzPiC0NGj5WafWzUQ@mail.gmail.com
In reply to Re: Replacing Apache Solr with Postgre Full Text Search?  (J2eeInside J2eeInside <j2eeinside@gmail.com>)
Responses Re: Replacing Apache Solr with Postgre Full Text Search?
List pgsql-general
On Thu, Mar 26, 2020 at 4:03 AM J2eeInside J2eeInside
<j2eeinside@gmail.com> wrote:
>
> Hi Mike, and thanks for valuable answer!
> In short, you think a PG Full Text Search can do the same as Apache Solr?
>

Can it?  I mean, it does today.  Whether it would for you depends on
your needs and how much effort you can afford to put into the stuff
that is /not/ the full text engine itself, like document normalizers
and search UIs.

There are trade-offs to be made when choosing any tool.  Solr is
great, and so is Lucene (Solr's heart), and so is Elasticsearch.  For
that matter, Zebra is awesome for full text indexing, too.  Those all
make indexing a pile of documents easy.  But, none of those are great
as an authoritative data store, so for instance there will necessarily
be drift between your data and the Solr index requiring a full
refresh.  It's also hard to integrate non-document filtering
requirements like I have in my use case.  Both of those are important
to my use case, so PG's full text is my preference.
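
To make the non-document filtering point concrete, here is a minimal
sketch (not Evergreen's actual code) of building one query that combines
a full-text match with ordinary column filters.  The table
"documents" and its columns (body_tsv, location, available) are
hypothetical, and the %(name)s placeholders assume a driver such as
psycopg; only websearch_to_tsquery (PostgreSQL 11+), ts_rank, and the
@@ operator are real PostgreSQL features.

```python
# Hypothetical schema: documents(id, body_tsv tsvector, location text,
# available boolean).  All table and column names are assumptions.

def build_search_query(terms, location=None, only_available=False, limit=20):
    """Return (sql, params) for a ranked full-text search with optional filters."""
    # The full-text match is always present; other filters are plain SQL.
    where = ["d.body_tsv @@ q.query"]
    params = {"terms": terms}
    if location is not None:
        where.append("d.location = %(location)s")
        params["location"] = location
    if only_available:
        where.append("d.available")
    sql = (
        "SELECT d.id, ts_rank(d.body_tsv, q.query) AS rank\n"
        "FROM documents AS d,\n"
        "     websearch_to_tsquery('english', %(terms)s) AS q(query)\n"
        "WHERE " + "\n  AND ".join(where) + "\n"
        "ORDER BY rank DESC\n"
        f"LIMIT {int(limit)}"
    )
    return sql, params


if __name__ == "__main__":
    sql, params = build_search_query("open source library", location="main")
    print(sql)
    print(params)
```

Because the text match and the other filters live in the same WHERE
clause, there is no second index to keep in sync with the data.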

Solr also didn't exist (publicly) in 2004 when we started building Evergreen. :)

> P.S. I need to index .pdf, .html and MS Word .doc/.docx files, are there any constraints in Full Text Search regarding
> those file types?
>

It can't handle those without some help -- it supports only plain text --
but you can extract the text using other tools.
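
As a rough illustration of that extraction step, here is a minimal
sketch for the HTML case using only Python's standard library.  The
names TextExtractor and html_to_text are made up for this example.
PDF and Word files need an external extractor (pdftotext or Apache
Tika, for example), which is not shown here.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def html_to_text(html):
    """Return the plain text of an HTML string, ready to be indexed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    page = "<html><body><h1>Title</h1><p>Some <b>bold</b> text.</p></body></html>"
    print(html_to_text(page))
```

The resulting string is what you would store in a text column and
pass to to_tsvector on the PostgreSQL side.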

--
Mike Rylander
 | Executive Director
 | Equinox Open Library Initiative
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  miker@equinoxinitiative.org
 | web:  http://equinoxinitiative.org

>
> On Wed, Mar 25, 2020 at 3:36 PM Mike Rylander <mrylander@gmail.com> wrote:
>>
>> On Wed, Mar 25, 2020 at 8:37 AM J2eeInside J2eeInside
>> <j2eeinside@gmail.com> wrote:
>> >
>> > Hi all,
>> >
>> > I hope someone  can help/suggest:
>> > I'm currently maintaining a project that uses Apache Solr/Lucene. To be honest, I would like to replace Solr with
>> > Postgre Full Text Search. However, there is a huge amount of documents involved - around 200GB. Wondering, can Postgre
>> > handle this efficiently?
>> > Does anyone have specific experience, and what should the infrastructure look like?
>> >
>> > P.S. Not to be confused, Solr works just fine, I just wanted to eliminate one component from the whole system
>> > (if Full Text Search can replace Solr at all)
>>
>> I'm one of the core developers (and the primary developer of the
>> search subsystem) for the Evergreen ILS [1] (integrated library system
>> -- think book library, not software library).  We've been using PG's
>> full-text indexing infrastructure since day one, and I can say it is
>> definitely capable of handling pretty much anything you can throw at
>> it.
>>
>> Our indexing requirements are very complex and need to be very
>> configurable, and need to include a lot more than just "search and
>> rank a text column," so we've had to build a ton of infrastructure
>> around record (document) ingest, searching/filtering, linking, and
>> display.  If your indexing and search requirements are stable,
>> specific, and well-understood, it should be straightforward,
>> especially if you don't have to take into account non-document
>> attributes like physical location, availability, and arbitrary
>> real-time visibility rules like Evergreen does.
>>
>> As for scale, it's more about document count than total size.  There
>> are Evergreen libraries with several million records to search, and
>> with proper hardware and tuning everything works well.  Our main
>> performance issue has to do with all of the stuff outside the records
>> (documents) themselves that has to be taken into account during
>> search.  The core full-text search part of our queries is extremely
>> performant, and has only gotten better over the years.
>>
>> [1] http://evergreen-ils.org
>>
>> HTH,
>> --
>> Mike Rylander
>>  | Executive Director
>>  | Equinox Open Library Initiative
>>  | phone:  1-877-OPEN-ILS (673-6457)
>>  | email:  miker@equinoxinitiative.org
>>  | web:  http://equinoxinitiative.org


