Re: How to store "blobs" efficiently for small and large sizes, with random access

From: Daniel Verite
Subject: Re: How to store "blobs" efficiently for small and large sizes, with random access
Date:
Msg-id: 08dacb6b-09ea-4c04-b3db-b308e8b208ab@manitou-mail.org
In reply to: How to store "blobs" efficiently for small and large sizes, with random access  (Dominique Devienne <ddevienne@gmail.com>)
Responses: Re: How to store "blobs" efficiently for small and large sizes, with random access  (Dominique Devienne <ddevienne@gmail.com>)
List: pgsql-general
    Dominique Devienne wrote:

> PostgreSQL bytea is much better and simpler, except limited to 1GB...
> Bytea also has no direct random access, except via substr[ing], but
> how efficient and "random access" is that?

Bytea contents are compressed before being sliced into chunks of
TOAST_MAX_CHUNK_SIZE bytes (typically just under 2000 bytes with the
default page size), so it's not possible to access a piece of the data
without first decompressing everything that precedes it.

By contrast large objects are sliced before compression, so the
performance of random access is likely to be completely different.

> Here are the main requirements I need to fulfil:
> 1) store literally millions of rows, 1 "blob" per row. (scientific data).
> 2) most "blobs" are smallish, below a few KBs. bytea's perfect for that.
> 3) yet many blobs are in the dozens of MBs. bytea's still OK.
> 4) but some blobs exceed the 1 GB bytea limit. Found at least a dozen
> just in our test data, so clients will have them.
> 5) for accessing larger blobs, the API I must implement accesses
> contiguous chunks of the blobs. Thus I need random access.

In your case I would probably opt for bytea (as opposed to large
objects), and slice the blobs in the application into chunks of a
fixed size much larger than what TOAST uses (for instance, somewhere
between 128 kB and 32 MB).

That is, having a table like:

create table blobs (
  blob_id some_type,
  chunk_no int,   /* 0..N, consecutive within a blob */
  chunk bytea,
  primary key (blob_id, chunk_no)
);
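The application-side splitting is trivial. A minimal sketch (Python here for illustration; the function name and the 1 MiB chunk size are my assumptions, not something prescribed above):

```python
CHUNK_SIZE = 1024 * 1024  # assumed 1 MiB; anything in the 128 kB - 32 MB range works


def split_into_chunks(blob: bytes, chunk_size: int = CHUNK_SIZE):
    """Yield (chunk_no, chunk) pairs ready to insert as rows of the blobs table."""
    n_chunks = (len(blob) + chunk_size - 1) // chunk_size  # ceiling division
    for chunk_no in range(n_chunks):
        yield chunk_no, blob[chunk_no * chunk_size:(chunk_no + 1) * chunk_size]
```

Each yielded pair becomes one INSERT into the table above, with the chunk column sent as a binary parameter.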

It's not as simple as using a single bytea field or large objects,
but overall it avoids the management difficulties of both large
objects and very large contents in bytea columns.
Random access is achieved by skipping the chunks before the
requested piece of data.
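Locating the requested piece then reduces to integer arithmetic on chunk numbers. A sketch of that logic (function names assumed; the `chunks` dict stands in for rows fetched with a `WHERE blob_id = ... AND chunk_no BETWEEN first AND last` query):

```python
def chunk_range(offset: int, length: int, chunk_size: int):
    """Return (first_chunk, last_chunk, skip) covering bytes [offset, offset+length)."""
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    return first, last, offset - first * chunk_size


def read_range(chunks: dict, offset: int, length: int, chunk_size: int) -> bytes:
    """Reassemble bytes [offset, offset+length) from fetched chunk_no -> chunk rows."""
    first, last, skip = chunk_range(offset, length, chunk_size)
    data = b"".join(chunks[n] for n in range(first, last + 1))
    return data[skip:skip + length]
```

Only the chunks overlapping the requested range are fetched and decompressed, which is the whole point of slicing above TOAST.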

If the app is able to read/write the chunks in binary mode, its queries
should perform as well as the large object functions.


Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite


