Re: How to store "blobs" efficiently for small and large sizes, with random access

Поиск

Список

Период

Сортировка

От	Dominique Devienne
Тема	Re: How to store "blobs" efficiently for small and large sizes, with random access
Дата	19 октября 2022 г. 16:32:50
Msg-id	CAFCRh-8o5XzNYjPLZ1hxmzgVCEBA90fRrWTmEq6Ard82n1GDPA@mail.gmail.com обсуждение исходный текст
Ответ на	Re: How to store "blobs" efficiently for small and large sizes, with random access (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Ответы	Re: How to store "blobs" efficiently for small and large sizes, with random access (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Список	pgsql-general

Дерево обсуждения

On Wed, Oct 19, 2022 at 3:05 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> On 2022-Oct-19, Dominique Devienne wrote:
> > OTOH, lo has random access, which I also need...
>
> Generally speaking, bytea sucks for random access, because if a TOAST
> item is compressed, it has to be always read from the beginning in order
> to decompress correctly.  However, if you set
> ALTER TABLE tab ALTER COLUMN byt SET STORAGE EXTERNAL
> then compression is not used, and random access becomes fast.

Thank you Álvaro. Quite insightful as well. I was aware of TOAST compression,
but didn't make the connection to the side-effects on random-access.

But now that TOAST has LZ4 support, which decompresses extremely fast,
compared to ZLib (I have experience with LZ4 for a WebSocket-based
server messages),
and choosing an appropriately small shard/chunk size, that might be
mitigated somewhat.
Would need testing / benchmarking to compare uncompressed vs LZ4, at
various chunk
and subset/offset sizes, of course.

Anybody has an answer to my question regarding how substr() works on
bytea values?
I.e. is it "pushed down" / optimized enough that it avoids reading the
whole N-byte value,
to then pass it to substr(), which then returns an M-byte value (where M < N)?

If TOAST stores 2,000 chunks, and those chunks' PKs are numbers,
simple arithmetic
should be able to select only the chunks of interest, those incurring
only the necessary IO
for the selected range, no?

Or the fact subsetting a bytea currently requires substr() prevents
using such a scenario?
And if so, why not support a native subsetting notation that did
support that scenario,
like the obvious bytea_col[offset, count] or bytea_col[start:end]?

Seems to be me efficient native subsetting of varlength values would
be quite valuable.

В списке pgsql-general по дате отправления:

Предыдущее

От: Ravi Krishna
Дата: 19 октября 2022 г., 16:31:49
Сообщение: Speeding up adding fky on a very large table

Следующее

От: Tom Lane
Дата: 19 октября 2022 г., 16:34:20
Сообщение: Re: byte-size of column values

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: How to store "blobs" efficiently for small and large sizes, with random access

Предыдущее

Следующее