Re: Pluggable toaster

From: Nikita Malakhov
Subject: Re: Pluggable toaster
Date:
Msg-id: CAN-LCVNkU+kdieu4i_BDnLgGszNY1RCnL6Dsrdz44fY7FOG3vg@mail.gmail.com
In reply to: Re: Pluggable toaster  (Nikita Malakhov <hukutoc@gmail.com>)
Responses: Re: Pluggable toaster  (Nikita Malakhov <hukutoc@gmail.com>)
List: pgsql-hackers
Hi hackers!
Following earlier requests, the patch branch has been cleaned of stray files (logs and other garbage). Conflict resolutions were merged
into the patch commits where they occur, and the branch was rebased so that each patch corresponds to a single commit. The branch was
brought up to date and a fresh patch set was generated.

What we propose, in short:
We suggest making TOAST pluggable as storage (similar to Pluggable Access Methods): TOAST mechanics are detached from the Heap AM
and turned into an independent, pluggable and extensible component via our newly developed TOAST API.
With this patch set you can develop and plug in your own TOAST mechanics for table columns. Knowing the internals and/or the workflow
and workload of the data being TOASTed makes custom Toasters much more efficient in both performance and storage.
Backwards compatibility is preserved: the default TOAST mechanics work exactly as before, silently handling any toastable datatype
(including TOASTed values and tables from previous versions; nothing changes there) and remaining the default Toaster unless another
one is specified, but now routed through our TOAST API.
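
As a quick illustration of the intended workflow (a minimal sketch based on the syntax described for patch 0002; the HANDLER clause
and the handler function name are assumptions for illustration, not the definitive interface of the patch set):

-- Register a custom Toaster provided by an extension (handler name is illustrative).
CREATE TOASTER my_custom_toaster HANDLER my_custom_toaster_handler;
-- Attach it to a column at table creation time ...
CREATE TABLE t (data bytea STORAGE EXTERNAL TOASTER my_custom_toaster);
-- ... or switch an existing column to it later.
ALTER TABLE t ALTER data SET TOASTER my_custom_toaster;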

We have already presented our work at the HighLoad, PgCon and PgConf conferences; you can find the materials here.
The testing scripts used in the talks are a bit rough and require a lot of manual handling, so bundling them into the patch set takes
some extra work; please be patient, I'll try to do it ASAP.

We have ready-to-plug-in extension Toasters:
- a bytea appendable Toaster for the bytea datatype (impressive speedup of the bytea append operation), included in this patch set;
- a JSONB Toaster for JSONB (very significant performance improvements when dealing with TOASTed JSONB), to be provided later;
- prototype Toasters (in development) for PostGIS (much faster than the default with geometric data) and for large binary objects
(like pg_largeobject, but much, much larger, and without the existing large object limitations); we are also evaluating a default
Toaster implementation that does not use indexes (direct access by TIDs, up to 3 times faster than the default on smaller values,
and less storage due to the absence of an index tree).

The patch set consists of 8 incremental patches:
0001_create_table_storage_v5.patch - SQL syntax fix for the CREATE TABLE clause, so that SET STORAGE ... is processed correctly;
this patch is already being discussed in a separate thread;

0002_toaster_interface_v8.patch - TOAST API interface and SQL syntax allowing creation of a custom Toaster (CREATE TOASTER ...)
and assignment of a Toaster to a table column (CREATE TABLE t (data bytea STORAGE EXTERNAL TOASTER bytea_toaster);)

0003_toaster_default_v7.patch - Default TOAST implemented via TOAST API;

0004_toaster_snapshot_v7.patch - refactoring of Default TOAST and support for versioned Toast rows;

0005_bytea_appendable_toaster_v7.patch - contrib module bytea_appendable_toaster - special Toaster for bytea datatype with customized append operation;

0006_toasterapi_docs_v3.patch - documentation package for Pluggable TOAST;

0007_fix_alignment_of_custom_toast_pointers_v3.patch - fixes the alignment of custom TOAST pointers
required by the bytea toaster, by Nikita Glukhov;

0008_fix_toast_tuple_externalize_v3.patch - fixes the toast_tuple_externalize function
so that it does not TOAST the value again when the old data is the same as the new data.

An example of using the TOAST API:
CREATE EXTENSION bytea_toaster;
CREATE TABLE test_bytea_append (id int, a bytea STORAGE EXTERNAL);
ALTER TABLE test_bytea_append ALTER a SET TOASTER bytea_toaster;
INSERT INTO test_bytea_append SELECT i, repeat('a', 10000)::bytea FROM generate_series(1, 10) i;
UPDATE test_bytea_append SET a = a || repeat('b', 3000)::bytea;
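
A quick sanity check one could run after the UPDATE (not part of the original example; the expected length simply follows from it,
10000 original bytes plus 3000 appended bytes per row):

SELECT id, length(a) FROM test_bytea_append ORDER BY id;  -- expect length(a) = 13000 for every row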

This patch set opens the following discussion points:
1) With TOAST made independent of the AM that uses it, it makes sense to move compression from the AM into the Toaster and make
compression one of the Toaster's options; in fact, Toasters allow using any compression method independently of the AM (a purely
hypothetical sketch follows this list);
2) Should the default Toaster be implemented without using indexes (currently in development)?
3) The API allows different, SQL-accessed large objects of almost unlimited size stored in the database, unlike the current
large_object functionality, and does not limit their quantity;
4) Several already-developed Toasters show impressive results for the datatypes they were designed for.
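
A purely hypothetical sketch of point 1: the OPTIONS clause and the option name below do not exist in the current patch set; they only
illustrate what "compression as a Toaster option" could look like:

CREATE TOASTER lz4_bytea_toaster
    HANDLER bytea_toaster_handler      -- handler name is illustrative
    OPTIONS (compression 'lz4');       -- hypothetical per-Toaster compression setting
ALTER TABLE test_bytea_append ALTER a SET TOASTER lz4_bytea_toaster;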

We're awaiting feedback.

Regards,
Nikita Malakhov
Postgres Professional 

On Mon, Jul 11, 2022 at 3:03 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
Hi!
We have a branch with incremental commits from which the patches were generated with format-patch.
I'll clean the commits of the garbage files ASAP; sorry, I hadn't noticed them while moving the changes.

Best regards,
Nikita Malakhov

On Fri, Jul 1, 2022 at 3:27 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
On Thu, 30 Jun 2022 at 22:26, Nikita Malakhov <hukutoc@gmail.com> wrote:
>
> Hi hackers!
> Here is the patch set rebased onto current master (15 rel beta 2 with commit from 29.06).

Thanks!

> Just to remind:
> With this patch set you will be able to develop and plug in your own TOAST mechanics for table columns. Knowing internals and/or workflow and workload
> of data being TOASTed makes Custom Toasters much more efficient in performance and storage.

The new toast API doesn't seem to be very well documented, nor are the
new features. Could you include a README or extend the comments on how
this is expected to work, and/or how you expect people to use (the
result of) `get_vtable`?

> Patch set consists of 9 incremental patches:
> [...]
> 0002_toaster_interface_v7.patch - TOAST API interface and SQL syntax allowing creation of custom Toaster (CREATE TOASTER ...)
> and setting Toaster to a table column (CREATE TABLE t (data bytea STORAGE EXTERNAL TOASTER bytea_toaster);)

This patch 0002 seems to include changes to log files (!) that don't
exist in current HEAD, but at the same time are not created by patch
0001. Could you please check and sanitize your patches to ensure that
the changes are actually accurate?

Like Robert Haas mentioned earlier[0], please create a branch in a git
repository that has a commit containing the changes for each patch,
and then use git format-patch to generate a single patchset, one that
shares a single version number. Keeping track of what patches are
needed to test this CF entry is already quite difficult due to the
number of patches and their packaging (I'm having trouble managing
these separate .patch.gz), and the different version tags definitely
don't help in finding the correct set of patches to apply once
downloaded.

Kind regards,

Matthias van de Meent

[0] https://www.postgresql.org/message-id/CA%2BTgmoZBgNipyKuQAJzNw2w7C9z%2B2SMC0SAHqCnc_dG1nSLNcw%40mail.gmail.com



