Re: Replacing Apache Solr with Postgre Full Text Search?

From: J2eeInside J2eeInside
Subject: Re: Replacing Apache Solr with Postgre Full Text Search?
Message-ID: CAK-aFFbaE33n4t_wOdHGwAZYPacpo-v87w72tA5cZcdAdRCYkw@mail.gmail.com
In reply to: Re: Replacing Apache Solr with Postgre Full Text Search?  (Mike Rylander <mrylander@gmail.com>)
Replies: Re: Replacing Apache Solr with Postgre Full Text Search?
List: pgsql-general
Hi Mike, and thanks for the valuable answer!
In short, do you think PG Full Text Search can do the same as Apache Solr?

P.S. I need to index .pdf, .html and MS Word .doc/.docx files. Are there any constraints in Full Text Search regarding those file types?
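PostgreSQL full-text search works on plain text: `to_tsvector` takes a `text` value, so PDF and .doc/.docx files (binary or compressed container formats) need their text extracted by an external tool before indexing. HTML is already text, but its markup and any script/style contents are noise. As a minimal sketch, assuming extraction happens in the application layer in Python, the standard-library `html.parser` module can strip tags before the result is stored and indexed; `html_to_text` and `_TextExtractor` are hypothetical names, not part of any PostgreSQL tooling.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes, skipping <script>/<style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only visible, non-blank text.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document, joined by single spaces."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


if __name__ == "__main__":
    doc = ("<html><head><style>p{color:red}</style></head>"
           "<body><h1>Solr</h1><p>vs PostgreSQL</p></body></html>")
    print(html_to_text(doc))  # Solr vs PostgreSQL
```

The returned string can then be inserted into an ordinary text column and indexed with `to_tsvector` like any other text.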


On Wed, Mar 25, 2020 at 3:36 PM Mike Rylander <mrylander@gmail.com> wrote:
On Wed, Mar 25, 2020 at 8:37 AM J2eeInside J2eeInside
<j2eeinside@gmail.com> wrote:
>
> Hi all,
>
> I hope someone can help or suggest something:
> I'm currently maintaining a project that uses Apache Solr/Lucene. To be honest, I would like to replace Solr with PostgreSQL Full Text Search. However, a huge amount of documents is involved: around 200 GB. I'm wondering: can PostgreSQL handle this efficiently?
> Does anyone have specific experience, and what should the infrastructure look like?
>
> P.S. Not to be confused: Solr works just fine, I just want to eliminate one component from the whole system (if Full Text Search can replace Solr at all).

I'm one of the core developers (and the primary developer of the
search subsystem) for the Evergreen ILS [1] (integrated library system
-- think book library, not software library).  We've been using PG's
full-text indexing infrastructure since day one, and I can say it is
definitely capable of handling pretty much anything you can throw at
it.

Our indexing requirements are very complex and need to be very
configurable, and need to include a lot more than just "search and
rank a text column," so we've had to build a ton of infrastructure
around record (document) ingest, searching/filtering, linking, and
display.  If your indexing and search requirements are stable,
specific, and well-understood, it should be straightforward,
especially if you don't have to take into account non-document
attributes like physical location, availability, and arbitrary
real-time visibility rules like Evergreen does.
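For the simple case Mike describes as "search and rank a text column," a minimal sketch might look like the following. It assumes PostgreSQL 12 or later (stored generated columns; `websearch_to_tsquery` needs 11+), the `english` text search configuration, and a hypothetical table `documents(id, body)`. The Python module only holds the SQL and builds driver parameters (psycopg-style `%(name)s` placeholders); it is an illustration, not the approach Evergreen uses.

```python
"""Sketch: single-column full-text search in PostgreSQL (assumes PG 12+)."""

# Hypothetical table "documents" with a text column "body" (assumption,
# not from the thread). A stored generated column keeps the tsvector in
# sync with body, and a GIN index lets the @@ match use an index scan.
SCHEMA_SQL = """
ALTER TABLE documents
    ADD COLUMN body_tsv tsvector
    GENERATED ALWAYS AS (to_tsvector('english', coalesce(body, ''))) STORED;

CREATE INDEX documents_body_tsv_idx ON documents USING GIN (body_tsv);
"""

# %(q)s and %(limit)s are driver placeholders, so user input is never
# spliced into the SQL text. Results are ordered by ts_rank, best first.
SEARCH_SQL = """
SELECT id,
       ts_rank(body_tsv, query) AS rank
FROM documents,
     websearch_to_tsquery('english', %(q)s) AS query
WHERE body_tsv @@ query
ORDER BY rank DESC
LIMIT %(limit)s;
"""


def search_params(user_query: str, limit: int = 20) -> dict:
    """Build the parameter dict for SEARCH_SQL; rejects non-positive limits."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return {"q": user_query, "limit": limit}


if __name__ == "__main__":
    print(SCHEMA_SQL)
    print(SEARCH_SQL)
    print(search_params("solr replacement"))
```

Keeping the tsvector in a stored generated column means it is computed once on write rather than on every query; with a driver such as psycopg, the search would run as `cursor.execute(SEARCH_SQL, search_params(...))`.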

As for scale, it's more about document count than total size.  There
are Evergreen libraries with several million records to search, and
with proper hardware and tuning everything works well.  Our main
performance issue has to do with all of the stuff outside the records
(documents) themselves that has to be taken into account during
search.  The core full-text search part of our queries is extremely
performant, and has only gotten better over the years.

[1] http://evergreen-ils.org

HTH,
--
Mike Rylander
 | Executive Director
 | Equinox Open Library Initiative
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  miker@equinoxinitiative.org
 | web:  http://equinoxinitiative.org
