Обсуждение: B-tree index row size limit

Поиск
Список
Период
Сортировка

B-tree index row size limit

От
Florian Weimer
Дата:
The index row size limit reared its ugly head again.

My current use of PostgreSQL is to load structured data into it but
from sources I don't have control over, to support a wide range of
queries whose precise nature is not yet known to me.  (Is this called
a data warehouse?)

Anyway, what happens from time to time is that some data which has
been processed successfully in the past suddenly failed to load
because there happens to be a very long string in it.  I know how to
work around this, but it's still annoying when it happens, and the
workarounds may make it much, much harder to write efficient queries.

What it would it take to eliminate the B-tree index row size limit (or
rather, increase it to several hundred megabytes)?  I don't care about
performance for index-based lookups for overlong columns, I just want
to be able to load arbitrary data and index it.



Re: B-tree index row size limit

От
Heikki Linnakangas
Дата:
On 10/09/2016 02:39 PM, Florian Weimer wrote:
> What it would it take to eliminate the B-tree index row size limit (or
> rather, increase it to several hundred megabytes)?  I don't care about
> performance for index-based lookups for overlong columns, I just want
> to be able to load arbitrary data and index it.

A few ideas:

* Add support for "truncate" B-tree support functions. Long values 
wouldn't be stored, but they would be cut at a suitable length. This 
would complicate things when you have two values that only differ in the 
truncated-away portion. You'd need to still be able to order them 
correctly in the index, perhaps by fetching the actual value from the heap.

* Use TOAST for index datums. That would involve adding a whole new 
toast table for the index, with the index for the toast table.

* Have something like TOAST, implemented within the B-tree AM. When a 
large datum is stored, chop it into chunks that are stored in special 
"toast" pages in the index.

* Add smarts to the planner, to support using an expression index even 
if the predicate doesn't contain the expression verbatim. For example, 
if you have an index on SUBSTR(column, 100), and a predicate "column = 
'foo'", you could use the index, if the planner just knew enough about 
SUBSTR to realize that.

* Don't do it. Use a hash index instead. If all goes well, it'll be 
WAL-logged in PostgreSQL 10.

- Heikki