phraseto_tsquery design

Поиск
Список
Период
Сортировка
От Sagiv Some
Тема phraseto_tsquery design
Дата
Msg-id CADcPEWBedG8jUG6yuSS4d8n7SsUM2fajH01Eprp27do1sAQEBA@mail.gmail.com
обсуждение исходный текст
Ответы Re: phraseto_tsquery design  (Arthur Zakirov <a.zakirov@postgrespro.ru>)
Список pgsql-hackers
I recently migrated over from SQL Server to Postgres (through Amazon Aurora) for the cost. 

I have been working with the full-text search functions for quite some time, but have encountered a few major roadblocks in my quest to use postgres full-text search functionality to its fullest. I am using it as a blob text-search tool, where the entire document (could be up to 100 pages) is indexed and searched on:

1. Limited table and document sizes. In SQL server, I can create a 1.5 TB table no problem (filingid, data blobs). However, in postgres, this is not possible - I am limited to approximately 100-200GB per table and 1MB per tsvector. This is an annoyance, but can get around it.

2. However, it seems impossible to bypass the performance problem of phrase searching. I conduct quite a bit of phrase searching, and although postgres' "phraseto_tsquery" performs great on phrases with uncommon words, it slows to a screeching halt on phrases with common words such as "law firm" or, for example, "bank of america". This is a huge problem, because "plainto_tsquery" performs just fine on these but as I understand it, phrase searching is built to do a scan after finding each word using "plainto"?

There are already positions and the "plainto" function is quite fast; is there a way to modify the "phraseto" query to perform a useful and fast search that looks for the distance between found words appropriately? 

Seems fairly trivial to modify the function this way, but I didn't see it on the development roadmap. It's a pretty critical piece and I have seen that other people have had issues with it too, and a test between this function and some other similar functions show that the performance of the phraseto function is quite poor.

Thanks,
Sagiv


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robbie Harwood
Дата:
Сообщение: Re: libpq compression
Следующее
От: Bruce Momjian
Дата:
Сообщение: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)