Re: Pluggable toaster

From: Nikita Malakhov
Subject: Re: Pluggable toaster
Msg-id: CAN-LCVMhhCP1+AyCwtQJ_XabcfoP-RZHcwDjC197pEKkzi-=+g@mail.gmail.com
In reply to: Re: Pluggable toaster  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List: pgsql-hackers
Hi,

>I'm not convinced that's universally true. Yes, I'm sure certain TOAST
>implementations would benefit from tighter control over compression, but
>does that imply compression and toast are redundant? I doubt that,
>because we compress non-toasted types too, for example. And layering has
>a value too, as it makes it easier to replace the pieces.
Not exactly. It is a means to control TOAST itself without changing the core
each time you want to change the TOAST strategy or method. Compression is
just an example. And without the proposed patch no other toasters are
available: there is only the one built-in implementation.

>Perhaps. My main point is that we should not be making too many radical
>changes at once - it makes it much harder to actually get anything done.
>So yeah, doing TOAST through IOT might be interesting, but I'd leave
>that for a separate patch.
That's why 4 distinct patches with incremental changes were proposed:
1) just the new Toaster API, with the core changes the API requires;
2) the default toaster routed via the new API (its functions are otherwise
unchanged), plus a dummy toaster extension as an example;
3) 1+2 plus some refactoring and versioning;
4) an extension module for bytea columns.
TOAST through IOT is a topic for discussion, but according to tests it does
not seem to give a major advantage over the existing storage method.

>It seems better to prevent such incompatible combinations and restrict
>each toaster to just compatible data types, and the mapping table
>(linking toaster and data types) seems a way to do that.
To handle this case, a validate function (toastervalidate_function) is proposed
in the TsrRoutine structure.
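
To illustrate the idea, here is a minimal, self-contained sketch of how such a validate callback could reject an incompatible type. Only the names toastervalidate_function and TsrRoutine come from the proposal; the one-argument signature, the single-member struct, and the helper toaster_accepts_type are simplifying assumptions made for this example, not the patch's actual definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins so the sketch compiles outside the PostgreSQL tree. */
typedef unsigned int Oid;
#define BYTEAOID ((Oid) 17)     /* OID of bytea in pg_type */
#define JSONBOID ((Oid) 3802)   /* OID of jsonb in pg_type */

/* Assumed, simplified signature: "can this toaster handle this type?" */
typedef bool (*toastervalidate_function) (Oid typeoid);

/* Only the validate member is modelled; the proposed struct has more. */
typedef struct TsrRoutine
{
    toastervalidate_function toastervalidate;
} TsrRoutine;

/* A specialized toaster that understands bytea only. */
static bool
bytea_toaster_validate(Oid typeoid)
{
    return typeoid == BYTEAOID;
}

static const TsrRoutine bytea_toaster = {bytea_toaster_validate};

/*
 * Hypothetical check the core could run before a toaster is assigned
 * to a column. A toaster without a callback is treated as generic.
 */
static bool
toaster_accepts_type(const TsrRoutine *tsr, Oid typeoid)
{
    assert(tsr != NULL);
    if (tsr->toastervalidate == NULL)
        return true;
    return tsr->toastervalidate(typeoid);
}
```

With this shape, assigning bytea_toaster to a jsonb column would be refused up front instead of failing later when the toaster misreads the data.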

>If you have to implement custom toaster to implement custom compression
>method, doesn't that make things more complex? You'd have to solve all
>the issues for custom compression methods and also all issues for custom
>toaster. Also, what if you want to just compress the column, not toast?
Default compression is restricted to 2 compression methods; all other methods
require extensions. Also, the name Toaster is a little misleading because
it implies that the data is being sliced, which is not always the case: to be
toasted, a piece of bread does not necessarily have to be sliced.

Regards,
--
Nikita Malakhov
Postgres Professional

On Tue, Jan 18, 2022 at 7:06 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:


On 1/18/22 15:56, Teodor Sigaev wrote:
> Hi!
>
>> Maybe doing that kind of compression in TOAST is somehow simpler, but
>> I don't see it.
> Seems, in ideal world, compression should be inside toaster.
>

I'm not convinced that's universally true. Yes, I'm sure certain TOAST
implementations would benefit from tighter control over compression, but
does that imply compression and toast are redundant? I doubt that,
because we compress non-toasted types too, for example. And layering has
a value too, as it makes it easier to replace the pieces.

>>
>>> 2 Current toast storage stores chunks in the heap access method and, to
>>> provide fast access by toast id, it builds an index. Ideas:
>>>    - store chunks directly in a btree; pgsql's btree already has
>>>      INCLUDE columns, so chunks and visibility data will be stored only
>>>      in leaf pages. Obviously this reduces the number of disk accesses
>>>      for "untoasting".
>>>    - use another access method for chunk storage
>>>
>>
>> Maybe, but that probably requires more thought - e.g. btree requires
>> the values to be less than 1/3 page, so I wonder how would that play
>> with toasting of values.
> That's ok, because the chunk size is 2000 bytes right now and it could be
> kept.
>>

Perhaps. My main point is that we should not be making too many radical
changes at once - it makes it much harder to actually get anything done.
So yeah, doing TOAST through IOT might be interesting, but I'd leave
that for a separate patch.

>
>> Seems you'd need a mapping table, to allow M:N mapping between types
>> and toasters, linking it to all "compatible" types. It's not clear to
>> me how would this work with custom data types, domains etc.
> If a toaster looks into the internal structure then it should know the type's
> binary format. So, new custom types have little chance of working with an
> old custom toaster. The default toaster works with any type.

The question is what happens when you combine data type with a toaster
that is not designed for that type. I mean, imagine you have a JSONB
toaster and you set it for a bytea column. Naive implementation will
just crash, because it'll try to process bytea as if it was JSONB.

It seems better to prevent such incompatible combinations and restrict
each toaster to just compatible data types, and the mapping table
(linking toaster and data types) seems a way to do that.

However, it seems toasters are either generic (agnostic to data types,
treating everything as bytea) or specialized. I doubt any specialized
toaster can reasonably support multiple data types, so maybe each
toaster can have just one "compatible type" OID. If it's invalid, it'd
be "generic" and otherwise it's useful for that type and types derived
from it (e.g. domains).

So you'd have the toaster OID in two places:

pg_type.toaster_oid      - default toaster for the type
pg_attribute.toaster_oid - current toaster for this column

and then you'd have

pg_toaster.typid - type this toaster handles (or InvalidOid for generic)
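
As a rough sketch of the compatibility rule outlined above: a toaster whose typid is InvalidOid is generic, and otherwise it is accepted for its own type and for domains over that type. The structs below are hypothetical in-memory stand-ins for catalog rows, and toaster_compatible and lookup_type are names invented for this example; none of this is existing PostgreSQL code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Hypothetical model of a pg_toaster row. */
typedef struct ToasterEntry
{
    Oid oid;
    Oid typid;      /* type this toaster handles, InvalidOid = generic */
} ToasterEntry;

/* Hypothetical model of a type; a domain points at its base type. */
typedef struct TypeEntry
{
    Oid oid;
    Oid basetype;   /* InvalidOid unless this type is a domain */
} TypeEntry;

static const TypeEntry *
lookup_type(const TypeEntry *types, size_t ntypes, Oid oid)
{
    for (size_t i = 0; i < ntypes; i++)
        if (types[i].oid == oid)
            return &types[i];
    return NULL;
}

/* True if the toaster may be used for a column of type typid. */
static bool
toaster_compatible(const ToasterEntry *toaster,
                   const TypeEntry *types, size_t ntypes, Oid typid)
{
    assert(toaster != NULL);
    if (toaster->typid == InvalidOid)
        return true;            /* generic toaster: any type is fine */

    /* Walk from the column type down through its domain base types. */
    while (typid != InvalidOid)
    {
        const TypeEntry *te;

        if (typid == toaster->typid)
            return true;
        te = lookup_type(types, ntypes, typid);
        if (te == NULL)
            return false;       /* unknown type: refuse */
        typid = te->basetype;
    }
    return false;
}
```

A single typid column keeps the check to one comparison per level of domain nesting, at the cost of not supporting a specialized toaster that serves several unrelated types.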


>>
>> Also, what happens to existing values when you change the toaster?
>> What if the toasters don't use the same access method to store the
>> chunks (heap vs. btree)? And so on.
>
> varatt_custom contains the OID of the toaster, and a toaster is not allowed
> to be dropped (at least, in the suggested patches). So, if a column's toaster
> has been changed, then old values will be detoasted by the toaster pointed to
> in the varatt_custom structure, not the one in the column definition. This is
> very similar to how the storage attribute works: when we alter the storage
> attribute, only new values will be stored with the specified storage type.
>

IIRC we do this for compression methods, right?
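
The dispatch rule in the quoted paragraph can be sketched as follows: the detoast routine is chosen from the toaster OID stored inside the value itself, so the column's current toaster never enters the lookup. The struct below is a deliberately simplified, hypothetical stand-in for varatt_custom (the real layout in the patch has more fields), and the registry and function names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;

/* Simplified, hypothetical stand-in for a custom TOAST pointer. */
typedef struct CustomToastPointer
{
    Oid         toasterid;  /* toaster that wrote this value */
    const char *payload;    /* toaster-private data */
} CustomToastPointer;

typedef const char *(*detoast_function) (const CustomToastPointer *ptr);

/* Hypothetical registry entry: toaster OID and its detoast routine. */
typedef struct ToasterSlot
{
    Oid              oid;
    detoast_function detoast;
} ToasterSlot;

static const char *
old_toaster_detoast(const CustomToastPointer *ptr)
{
    return ptr->payload;
}

static const char *
new_toaster_detoast(const CustomToastPointer *ptr)
{
    return ptr->payload;
}

/*
 * Pick the detoast routine from the OID embedded in the value.
 * Note that the column definition is not an input at all.
 */
static detoast_function
lookup_detoaster(const ToasterSlot *registry, size_t nslots,
                 const CustomToastPointer *ptr)
{
    assert(registry != NULL && ptr != NULL);
    for (size_t i = 0; i < nslots; i++)
        if (registry[i].oid == ptr->toasterid)
            return registry[i].detoast;
    return NULL;    /* would mean the toaster was dropped */
}
```

The NULL branch is exactly the case the "toaster may not be dropped" restriction is meant to rule out: as long as old values can exist, their toaster has to stay resolvable.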

>>
>>> More thought:
>>> Postgres now has two per-column options, storage and compression, and
>>> now we add toaster. To me that seems too redundant. It seems storage
>>> should be a binary value: inplace (plain, as now) or toastable. All
>>> other variations, such as toast limit, enabling compression, and
>>> compression kind, should be per-column options for the toaster (that's
>>> why we suggest a valid toaster OID for any column with a
>>> varlena/toastable datatype). It looks like a good abstraction, but we
>>> will have a problem with backward compatibility, and I'm afraid I
>>> can't implement it very fast.
>>>
>>
>> So you suggest we move all of this to toaster? I'd say -1 to that,
>> because it makes it much harder to e.g. add custom compression method,
>> etc.
> Hmm, I suggested to leave only toaster at upper level. Compression kind
> could be chosen in toaster's options (not implemented yet) or even make
> an API interface to compression to make it configurable. Right now,
> module developer could not implement a module with new compression
> method and it is a disadvantage.

If you have to implement custom toaster to implement custom compression
method, doesn't that make things more complex? You'd have to solve all
the issues for custom compression methods and also all issues for custom
toaster. Also, what if you want to just compress the column, not toast?


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

