Re: tsvector field length limitation

From:
Jonathan Marks <jonathanaverymarks@gmail.com>
Date:
What if we just didn’t store positions at all, i.e. populated the tsvector with lexemes only?
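Whether a lexemes-only tsvector would fit can be estimated against the documented limit, which counts "lexemes + positions". The Python sketch below is a simplified model, not PostgreSQL's actual size accounting: it treats lowercased whitespace-separated words as lexemes (no text search configuration, stemming, or stop words), stores each distinct lexeme's UTF-8 bytes once, and charges an assumed 2 bytes per position plus 2 bytes of per-lexeme overhead, ignoring alignment padding and entry headers. It applies the documented cap of 256 positions per lexeme. The names `estimate_tsvector_bytes` and `fits` are hypothetical; in SQL, the built-in `strip()` function is what removes positions from an existing tsvector.

```python
from collections import Counter

# Limit from the docs: "(lexemes + positions) must be less than 1 megabyte".
LIMIT_BYTES = 1 << 20            # 1,048,576 bytes
MAX_POSITIONS_PER_LEXEME = 256   # documented cap; extra positions are not kept


def estimate_tsvector_bytes(text: str, with_positions: bool = True) -> int:
    """Hypothetical, simplified size model of a tsvector's data area.

    Assumptions: lowercased whitespace-separated words stand in for lexemes,
    each position costs 2 bytes, each lexeme with positions carries a 2-byte
    count, and alignment padding and per-entry headers are ignored.
    """
    counts = Counter(word.lower() for word in text.split())
    total = 0
    for lexeme, n in counts.items():
        total += len(lexeme.encode("utf-8"))  # each distinct lexeme stored once
        if with_positions:
            total += 2 + 2 * min(n, MAX_POSITIONS_PER_LEXEME)
    return total


def fits(text: str, with_positions: bool = True) -> bool:
    """True if the modelled size is under the 1 MB limit."""
    return estimate_tsvector_bytes(text, with_positions) < LIMIT_BYTES


if __name__ == "__main__":
    sample = "the cat saw the cat"
    print(estimate_tsvector_bytes(sample))                        # with positions
    print(estimate_tsvector_bytes(sample, with_positions=False))  # lexemes only
```

In this model, once positions are dropped a repeated word costs nothing extra, so the lexemes-only size grows with the vocabulary of a record rather than with its word count.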

> On Jun 20, 2018, at 10:49 AM, Tom Lane  wrote:
> 
> Jonathan Marks  writes:
>> ... we run into the max tsvector length requirement "The length of a tsvector (lexemes + positions) must be less than 1 megabyte"
> 
>> Is there any way to disable or increase that limit in Postgres 10.3?
> 
> No; it's forced by the representation used for tsvector, which stores
> lexeme offsets in 20-bit fields (cf WordEntry in
> src/include/tsearch/ts_type.h).  Perhaps that was short-sighted but
> I don't foresee it changing anytime soon.  You'd more or less need
> a whole new datatype ("bigtsvector"?) to make it happen.
> 
> 			regards, tom lane


Re: tsvector field length limitation

From:
AJG <ayden@gera.co.nz>
Date:
Hi Jonathan,

Check out this potential fix/extension

https://github.com/postgrespro/tsvector2


--
Sent from: http://www.postgresql-archive.org/PostgreSQL-general-f1843780.html

Re: tsvector field length limitation

From:
Tom Lane <tgl@sss.pgh.pa.us>
Date:
Jonathan Marks  writes:
> ... we run into the max tsvector length requirement "The length of a tsvector (lexemes + positions) must be less than 1 megabyte"

> Is there any way to disable or increase that limit in Postgres 10.3?

No; it's forced by the representation used for tsvector, which stores
lexeme offsets in 20-bit fields (cf WordEntry in
src/include/tsearch/ts_type.h).  Perhaps that was short-sighted but
I don't foresee it changing anytime soon.  You'd more or less need
a whole new datatype ("bigtsvector"?) to make it happen.

			regards, tom lane
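The arithmetic behind the limit: a 20-bit field can hold offsets from 0 to 2^20 - 1 = 1,048,575, which is where the roughly 1 MB ceiling comes from. The Python sketch below models a packed 32-bit entry of that kind. The 1/11/20 split (a has-positions flag, an 11-bit lexeme length, a 20-bit offset) is assumed here to match WordEntry in src/include/tsearch/ts_type.h and should be checked against that header; the bit order and the names `pack_word_entry`/`unpack_word_entry` are hypothetical, since C bit-field layout is implementation-defined.

```python
# Illustrative model of a WordEntry-style header packed into one 32-bit word:
#   haspos:1  -> whether the lexeme has positions
#   len:11    -> lexeme length in bytes (max 2047)
#   pos:20    -> byte offset of the lexeme in the data area (max 1,048,575)
# The bit order below is an assumption made for this sketch.

HASPOS_BITS, LEN_BITS, POS_BITS = 1, 11, 20
MAX_LEN = (1 << LEN_BITS) - 1  # 2047
MAX_POS = (1 << POS_BITS) - 1  # 1,048,575


def pack_word_entry(haspos: bool, length: int, pos: int) -> int:
    """Pack the three fields into a 32-bit integer, rejecting values that overflow."""
    if not 0 <= length <= MAX_LEN:
        raise ValueError(f"lexeme length {length} does not fit in {LEN_BITS} bits")
    if not 0 <= pos <= MAX_POS:
        raise ValueError(f"offset {pos} does not fit in {POS_BITS} bits")
    return int(haspos) | (length << HASPOS_BITS) | (pos << (HASPOS_BITS + LEN_BITS))


def unpack_word_entry(word: int) -> tuple[bool, int, int]:
    """Recover (haspos, length, pos) from a packed 32-bit integer."""
    haspos = bool(word & 1)
    length = (word >> HASPOS_BITS) & MAX_LEN
    pos = (word >> (HASPOS_BITS + LEN_BITS)) & MAX_POS
    return haspos, length, pos


if __name__ == "__main__":
    print(MAX_POS)  # largest representable offset
    print(unpack_word_entry(pack_word_entry(True, 5, 123456)))
```

An offset at or beyond 2^20 simply has no representation in such a field, which is why raising the limit would take a new on-disk format (the hypothetical "bigtsvector") rather than a configuration setting. The 11-bit length field likewise lines up with the documented rule that each lexeme must be less than 2 kilobytes.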

tsvector field length limitation

From:
Jonathan Marks <jonathanaverymarks@gmail.com>
Date:
Hi folks —

We use Postgres’ full text search system pretty heavily in our team’s operations and often index tens of millions of records with varying lengths of text. In most cases the text we need to index is pretty short (no more than a few hundred words), but in rare cases a single record is very, very long (high hundreds of thousands of words or longer). With those records, we run into the maximum tsvector length limit: "The length of a tsvector (lexemes + positions) must be less than 1 megabyte".

I understand the performance implications of having very long tsvectors (our GIN index updates are pretty terrible in some cases), but would really appreciate it if the max tsvector length were larger (even 5 MB would make a huge difference) or if that error were a stern warning rather than a hard error.

Is there any way to disable or increase that limit in Postgres 10.3? Perhaps in a future version?

Thank you!
Jonathan