Re: How to store "blobs" efficiently for small and large sizes, with random access

From: Dominique Devienne
Subject: Re: How to store "blobs" efficiently for small and large sizes, with random access
Date:
Msg-id: CAFCRh--1AGj=81sh8Dhudq6mWHWXMdNmJ1CuKd2Hj+tFu3Sq3A@mail.gmail.com
In reply to: How to store "blobs" efficiently for small and large sizes, with random access  (Dominique Devienne <ddevienne@gmail.com>)
List: pgsql-general
On Thu, Oct 20, 2022 at 12:21 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> On Thu, 2022-10-20 at 10:32 +0200, Dominique Devienne wrote:
> > FWIW, when Oracle introduced SecureFile blobs years ago in v11, it
> > represented a leap forward in
> > performance, and back then we were seeing them being 3x faster than LO
> > at GB sizes, if I recall correctly,
>
> I don't know what exactly they are, but I suspect that they are just
> files (segments?) in Oracle's "file system" (tablespaces/datafiles).
> So pretty much what we recommend.

I don't think so :) Pretty much the opposite in fact.

Don't let the term SecureFile fool you. From the DB client's perspective,
they are a server-side value like any other, living in a row/col (relation/tuple),
pretty much like a BYTEA value. But unlike BYTEA, and like LO, what you
SELECT or UPDATE is a LOB locator, not the content itself, which you then
use with separate APIs, so very much like LO. So you get the best of both
BYTEA (acts like a value, in a table/column) and LO (random access).
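
To make the contrast concrete on the PostgreSQL side, here's a minimal
sketch (the table names docs_bytea / docs_lo are made up for illustration),
using the built-in server-side lo_from_bytea() / lo_get() functions:

  -- Value-like: the whole content travels with the row.
  CREATE TABLE docs_bytea (id int PRIMARY KEY, content bytea);
  INSERT INTO docs_bytea VALUES (1, '\x0102030405');

  -- Locator-like: the row only carries an OID; the bytes live in pg_largeobject.
  CREATE TABLE docs_lo (id int PRIMARY KEY, content_oid oid);
  INSERT INTO docs_lo VALUES (1, lo_from_bytea(0, '\x0102030405'));

  -- Random access on the LO side: read 2 bytes starting at offset 3.
  SELECT lo_get(content_oid, 3, 2) FROM docs_lo WHERE id = 1;

SecureFile LOBs give you both halves at once, which is the whole point.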

Fully transactional. In fact, the LOB locator is a handle into the MVCC machinery,
so you can get the locator(s) within a transaction, commit that transaction, and
if you later read from the locator, you are still AS OF that transaction
(i.e. read-consistent). That's super handy for lazy-loading the data in the app
on demand. You do of course expose yourself to "snapshot too old". We missed that
dearly in PostgreSQL. You could emulate it with an explicit snapshot left open,
but that's not nearly as convenient.
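
By way of illustration, the emulation I have in mind looks something like
this (purely a sketch; the snapshot id and the docs_lo table are made up):

  -- Session A: take a snapshot, export it, and keep this transaction open.
  BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
  SELECT pg_export_snapshot();   -- say it returns '00000004-0000002A-1'

  -- Session B, later, when the app lazily loads the data:
  BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
  SET TRANSACTION SNAPSHOT '00000004-0000002A-1';        -- value from session A
  SELECT lo_get(content_oid) FROM docs_lo WHERE id = 1;  -- read AS OF that snapshot
  COMMIT;

The catch is that session A's transaction has to stay open the whole time,
which is exactly the "not nearly as convenient" part.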

And there are specialized APIs that allow operating on *multiple* LOBs in a
single server round-trip, which was essential for performance with smaller ones.
In Oracle, anything above 32KB had to be a LOB (at the time), yet you don't want
to do a round-trip for each and every 32KB chunk of data when you have thousands
to load (unless you shard yourself "manually", but good luck matching the perf
of SecureFile LOBs).
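
PostgreSQL has no equivalent multi-LOB round-trip API that I know of, but for
smaller blobs you can at least amortize round-trips at the SQL level, e.g.
(again with the made-up docs_lo table):

  -- Fetch the content of many large objects in a single statement,
  -- i.e. a single server round-trip, instead of one lo_open()/loread()
  -- exchange per object.
  SELECT id, lo_get(content_oid) AS content
    FROM docs_lo
   WHERE id = ANY (ARRAY[1, 2, 3, 4]);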

I'm not privy to how they were implemented internally. But I do believe the old
blobs they were replacing (at the time) were more like LO is, i.e. handled
(mostly) like the other datatypes, in the table infrastructure; while these new
(in v11) blobs were handled internally completely differently, more in a
file-system manner. Thus the large performance gap between the OLD and NEW
Oracle LOBs.

But from the outside, that's an "implementation detail". They were in the DB,
transactional, value-like (modulo the level of indirection for random access),
and importantly, efficient. I really wish PostgreSQL had an equivalent.

There's apparently an Oracle ACE on this list, so you can fill in the gaps,
or correct any falsehoods I wrote above. My knowledge is over 10 years old, so....
I was a big fan of another ACE, Tom Kyte, whose books helped me a lot, and I was
neck-deep in OCI for a few years, but I was mostly a self-taught Oracle developer,
so definitely not an expert like an ACE. FWIW, --DD


