Re: Pluggable toaster

From: Nikita Malakhov
Subject: Re: Pluggable toaster
Date:
Msg-id: CAN-LCVNkU+kdieu4i_BDnLgGszNY1RCnL6Dsrdz44fY7FOG3vg@mail.gmail.com
In reply to: Re: Pluggable toaster  (Nikita Malakhov <hukutoc@gmail.com>)
Responses: Re: Pluggable toaster  (Nikita Malakhov <hukutoc@gmail.com>)
List: pgsql-hackers
Hi hackers!
Following earlier requests, the patch branch has been cleaned of stray files (logs and other garbage). Conflict resolutions were merged
into the patch commits where they occur, and the branch was rebased so that each patch corresponds to a single commit. The branch was
brought up to date and a fresh patch set was generated.

What we propose, in short:
We suggest making TOAST pluggable as storage (similar to Pluggable Access Methods): TOAST mechanics are detached from the Heap AM
and turned into an independent, pluggable and extensible component via our newly developed TOAST API.
With this patch set you can develop and plug in your own TOAST mechanics for table columns. Knowing the internals and/or the workflow
and workload of the data being TOASTed makes custom Toasters much more efficient in both performance and storage.
Backwards compatibility is preserved: the default TOAST mechanics work exactly as before, silently handling any toastable datatype
(including TOASTed values and tables from previous versions; nothing changes there) and remaining the default Toaster unless another
one is specified, but now routed through our TOAST API.
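
As a quick illustration of the intended workflow (a minimal sketch based on the syntax described for patch 0002; the HANDLER clause
and the handler function name are assumptions for illustration, not the definitive interface of the patch set):

-- Register a custom Toaster provided by an extension (handler name is illustrative).
CREATE TOASTER my_custom_toaster HANDLER my_custom_toaster_handler;
-- Attach it to a column at table creation time ...
CREATE TABLE t (data bytea STORAGE EXTERNAL TOASTER my_custom_toaster);
-- ... or switch an existing column to it later.
ALTER TABLE t ALTER data SET TOASTER my_custom_toaster;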

We have already presented our work at the HighLoad, PgCon and PgConf conferences; you can find the materials here.
The testing scripts used in the talks are a bit rough and require a lot of manual handling, so bundling them into the patch set takes
some extra work; please be patient, I'll try to do it ASAP.

We have ready-to-plug-in extension Toasters:
- a bytea appendable Toaster for the bytea datatype (impressive speedup of the bytea append operation), included in this patch set;
- a JSONB Toaster for JSONB (very significant performance improvements when dealing with TOASTed JSONB), to be provided later;
- prototype Toasters (in development) for PostGIS (much faster than the default with geometric data) and for large binary objects
(like pg_largeobject, but much, much larger, and without the existing large object limitations); we are also evaluating a default
Toaster implementation that does not use indexes (direct access by TIDs, up to 3 times faster than the default on smaller values,
and less storage due to the absence of an index tree).

The patch set consists of 8 incremental patches:
0001_create_table_storage_v5.patch - SQL syntax fix for the CREATE TABLE clause, so that SET STORAGE ... is processed correctly;
this patch is already being discussed in a separate thread;

0002_toaster_interface_v8.patch - TOAST API interface and SQL syntax allowing creation of a custom Toaster (CREATE TOASTER ...)
and assignment of a Toaster to a table column (CREATE TABLE t (data bytea STORAGE EXTERNAL TOASTER bytea_toaster);)

0003_toaster_default_v7.patch - Default TOAST implemented via TOAST API;

0004_toaster_snapshot_v7.patch - refactoring of Default TOAST and support for versioned Toast rows;

0005_bytea_appendable_toaster_v7.patch - contrib module bytea_appendable_toaster - special Toaster for bytea datatype with customized append operation;

0006_toasterapi_docs_v3.patch - documentation package for Pluggable TOAST;

0007_fix_alignment_of_custom_toast_pointers_v3.patch - fixes the alignment of custom TOAST pointers
required by the bytea toaster, by Nikita Glukhov;

0008_fix_toast_tuple_externalize_v3.patch - fixes the toast_tuple_externalize function
so that it does not TOAST the value again when the old data is the same as the new data.

An example of using the TOAST API:
CREATE EXTENSION bytea_toaster;
CREATE TABLE test_bytea_append (id int, a bytea STORAGE EXTERNAL);
ALTER TABLE test_bytea_append ALTER a SET TOASTER bytea_toaster;
INSERT INTO test_bytea_append SELECT i, repeat('a', 10000)::bytea FROM generate_series(1, 10) i;
UPDATE test_bytea_append SET a = a || repeat('b', 3000)::bytea;
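
A quick sanity check one could run after the UPDATE (not part of the original example; the expected length simply follows from it,
10000 original bytes plus 3000 appended bytes per row):

SELECT id, length(a) FROM test_bytea_append ORDER BY id;  -- expect length(a) = 13000 for every row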

This patch set opens the following discussion points:
1) With TOAST made independent of the AM that uses it, it makes sense to move compression from the AM into the Toaster and make
compression one of the Toaster's options; in fact, Toasters allow using any compression method independently of the AM (a purely
hypothetical sketch follows this list);
2) Should the default Toaster be implemented without using indexes (currently in development)?
3) The API allows different, SQL-accessed large objects of almost unlimited size stored in the database, unlike the current
large_object functionality, and does not limit their quantity;
4) Several already-developed Toasters show impressive results for the datatypes they were designed for.
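
A purely hypothetical sketch of point 1: the OPTIONS clause and the option name below do not exist in the current patch set; they only
illustrate what "compression as a Toaster option" could look like:

CREATE TOASTER lz4_bytea_toaster
    HANDLER bytea_toaster_handler      -- handler name is illustrative
    OPTIONS (compression 'lz4');       -- hypothetical per-Toaster compression setting
ALTER TABLE test_bytea_append ALTER a SET TOASTER lz4_bytea_toaster;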

We're awaiting feedback.

Regards,
Nikita Malakhov
Postgres Professional 

On Mon, Jul 11, 2022 at 3:03 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
Hi!
We have a branch with incremental commits from which the patches were generated with format-patch.
I'll clean the commits of the garbage files ASAP; sorry, I hadn't noticed them while moving the changes.

Best regards,
Nikita Malakhov

On Fri, Jul 1, 2022 at 3:27 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
On Thu, 30 Jun 2022 at 22:26, Nikita Malakhov <hukutoc@gmail.com> wrote:
>
> Hi hackers!
> Here is the patch set rebased onto current master (15 rel beta 2 with commit from 29.06).

Thanks!

> Just to remind:
> With this patch set you will be able to develop and plug in your own TOAST mechanics for table columns. Knowing internals and/or workflow and workload
> of data being TOASTed makes Custom Toasters much more efficient in performance and storage.

The new toast API doesn't seem to be very well documented, nor are the
new features. Could you include a README or extend the comments on how
this is expected to work, and/or how you expect people to use (the
result of) `get_vtable`?

> Patch set consists of 9 incremental patches:
> [...]
> 0002_toaster_interface_v7.patch - TOAST API interface and SQL syntax allowing creation of custom Toaster (CREATE TOASTER ...)
> and setting Toaster to a table column (CREATE TABLE t (data bytea STORAGE EXTERNAL TOASTER bytea_toaster);)

This patch 0002 seems to include changes to log files (!) that don't
exist in current HEAD, but at the same time are not created by patch
0001. Could you please check and sanitize your patches to ensure that
the changes are actually accurate?

Like Robert Haas mentioned earlier[0], please create a branch in a git
repository that has a commit containing the changes for each patch,
and then use git format-patch to generate a single patchset, one that
shares a single version number. Keeping track of what patches are
needed to test this CF entry is already quite difficult due to the
number of patches and their packaging (I'm having trouble managing
these separate .patch.gz), and the different version tags definitely
don't help in finding the correct set of patches to apply once
downloaded.

Kind regards,

Matthias van de Meent

[0] https://www.postgresql.org/message-id/CA%2BTgmoZBgNipyKuQAJzNw2w7C9z%2B2SMC0SAHqCnc_dG1nSLNcw%40mail.gmail.com



