Re: Pluggable toaster

From: Teodor Sigaev
Subject: Re: Pluggable toaster
Msg-id: aad5a6ff-75fa-47c5-7f8c-33aaa4f64b3a@sigaev.ru
In reply to: Re: Pluggable toaster (Simon Riggs <simon.riggs@enterprisedb.com>)
Responses: Re: Pluggable toaster (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List: pgsql-hackers

> In my understanding, we want to be able to
> 1. Access data from a toasted object one slice at a time, by using
> knowledge of the structure
> 2. If toasted data is updated, then update a minimum number of
> slice(s), without rewriting the existing slices
> 3. If toasted data is expanded, then allow new slices to be appended to
> the object without rewriting the existing slices
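Points 1 and 2 come down to mapping a byte range onto fixed-size chunks. A minimal sketch, assuming a hypothetical chunk size of 2000 bytes (real TOAST uses `TOAST_MAX_CHUNK_SIZE`, which depends on the block size) and hypothetical helper names, shows which chunks a slice touches, so only those need to be read or rewritten:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative chunk size only; not PostgreSQL's actual value. */
#define CHUNK_SIZE 2000

typedef struct ChunkRange {
    size_t first_chunk;   /* index of the first chunk touched */
    size_t last_chunk;    /* index of the last chunk touched (inclusive) */
    size_t first_offset;  /* byte offset of the slice inside first_chunk */
} ChunkRange;

/* Map the byte slice [offset, offset + length) onto chunk indexes. */
static ChunkRange slice_to_chunks(size_t offset, size_t length)
{
    ChunkRange r;

    assert(length > 0);
    r.first_chunk = offset / CHUNK_SIZE;
    r.last_chunk = (offset + length - 1) / CHUNK_SIZE;
    r.first_offset = offset % CHUNK_SIZE;
    return r;
}

/* Number of chunks to fetch (or rewrite, for a same-size in-place update);
 * all chunks outside this range are left untouched. */
static size_t chunks_touched(size_t offset, size_t length)
{
    ChunkRange r = slice_to_chunks(offset, length);

    return r.last_chunk - r.first_chunk + 1;
}
```

A slice that straddles a chunk boundary (e.g. 2 bytes starting at offset 1999) touches two chunks; a 100-byte slice at offset 4500 touches only chunk 2.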

There are more options:
1. Share common parts not only between versions of a row but between all
   rows in a column. It seems strange, but here are examples:
   - URLs often have a common prefix, so storing them in a prefix tree (as
     SP-GiST does) allows a significant decrease in storage size
   - the same goes for JSON: a common use case is values that share part
     of their hierarchical structure
   - one more use case for JSON: if the values use only a few schemas
     (structures), it's possible to store only the values in toast storage
     and not store the keys and structure
2. Current toast storage keeps chunks in the heap access method and, to
   provide fast access by toast id, builds an index on them. Ideas:
   - store chunks directly in a btree: PostgreSQL's btree already has
     INCLUDE columns, so chunks and visibility data would be stored only
     in leaf pages. Obviously this reduces the number of disk accesses
     for "untoasting".
   - use another access method for chunk storage
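The prefix-sharing idea for URLs can be illustrated with front coding of a sorted list. This is a simpler technique than the SP-GiST prefix tree mentioned above and is not necessarily what a toaster would use. The sketch assumes a hypothetical entry format (one byte for the shared-prefix length, then the unshared suffix and a terminator) and only counts bytes:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of the common prefix of a and b. */
static size_t common_prefix(const char *a, const char *b)
{
    size_t n = 0;

    while (a[n] != '\0' && a[n] == b[n])
        n++;
    return n;
}

/* Bytes needed for sorted strings with front coding: each entry stores
 * 1 byte of shared-prefix length (capped at 255), the unshared suffix,
 * and a NUL terminator. */
static size_t front_coded_size(const char *const *sorted, size_t n)
{
    size_t total = 0;

    assert(n == 0 || sorted != NULL);
    for (size_t i = 0; i < n; i++) {
        size_t shared = (i == 0) ? 0 : common_prefix(sorted[i - 1], sorted[i]);

        if (shared > 255)
            shared = 255;
        total += 1 + (strlen(sorted[i]) - shared) + 1;
    }
    return total;
}

/* Bytes needed to store every string in full, NUL-terminated. */
static size_t plain_size(const char *const *strs, size_t n)
{
    size_t total = 0;

    for (size_t i = 0; i < n; i++)
        total += strlen(strs[i]) + 1;
    return total;
}
```

For the three URLs `https://example.com/a`, `.../b` and `.../c`, the plain form takes 66 bytes and the front-coded form 29, because the 20-byte prefix is stored once.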

> ISTM that we would want the toast algorithm to be associated with the
> datatype, not the column?
> Can you explain your thinking?
Hm. I'll try to explain my motivation.
1) A datatype could have more than one suitable toaster, for different
    use cases: fast retrieval, compact storage, fast update, etc. As I
    said above, for jsonb there are several optimal toasting strategies:
    for values with a few different structures, for values with similar
    hierarchical structures, and for values whose parts differ by access
    mode (it's easy to imagine JSON with some keys used for search and
    other keys used only for output to the user)
2) A toaster could be designed to work with different data types. The
    suggested appendable toaster is designed to work with bytea but could
    also work with text

Looking at this point, I have doubts about where to store the connection
between toaster and datatype. If we add toasteroid to pg_type, how do we
deal with several toasters for one datatype? (And we might want to have
different toasters in one table!) If we add typoid to pg_toaster, how
would it work with several datatypes? The idea of adding a new
many-to-many connection table seems workable, but then there are other
questions, such as: will any toaster work with any table access method?

To resolve this bundle of questions, we propose a validate() method for
the toaster, which should be called during DDL operations, i.e. when a
toaster is assigned to a column or a column's datatype is changed.
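A minimal sketch of how such a validate() hook could look. All names here (`Toaster`, `can_assign_toaster`, the example toasters, and the type and access-method constants) are hypothetical and not the patch's actual API. In this sketch each toaster decides for itself which datatypes and table access methods it accepts, which is how it sidesteps the many-to-many question:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;  /* stand-in for PostgreSQL's Oid */

/* Illustrative constants for this example only. */
enum { TYPE_BYTEA = 17, TYPE_TEXT = 25, TYPE_JSONB = 3802, AM_HEAP = 2 };

/* Hypothetical toaster descriptor. */
typedef struct Toaster {
    const char *name;
    /* Return true if this toaster can store values of typeoid in a
     * table that uses access method amoid. */
    bool (*validate)(Oid typeoid, Oid amoid);
} Toaster;

/* An appendable toaster that accepts bytea and text on heap tables. */
static bool appendable_validate(Oid typeoid, Oid amoid)
{
    return (typeoid == TYPE_BYTEA || typeoid == TYPE_TEXT) && amoid == AM_HEAP;
}

/* A jsonb-specific toaster that accepts only jsonb on heap tables. */
static bool jsonb_validate(Oid typeoid, Oid amoid)
{
    return typeoid == TYPE_JSONB && amoid == AM_HEAP;
}

static const Toaster appendable_toaster = { "appendable", appendable_validate };
static const Toaster jsonb_toaster = { "jsonb_schema", jsonb_validate };

/* What DDL would call when a toaster is assigned to a column or when
 * the column's type changes. */
static bool can_assign_toaster(const Toaster *t, Oid typeoid, Oid amoid)
{
    assert(t != NULL);
    return t->validate != NULL && t->validate(typeoid, amoid);
}
```

With this shape, one datatype can have several toasters and one toaster can serve several datatypes without any catalog mapping between them.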

More thoughts:
Postgres now has two options for a column, storage and compression, and
now we add a toaster. To me this seems too redundant. Storage should
probably be a binary value: in-place (plain, as now) or toastable. All
other variations, such as the toast limit, whether compression is
enabled, and the compression kind, should be per-column options of the
toaster (that's why we suggest a valid toaster oid for every column with
a varlena/toastable datatype). It looks like a good abstraction, but we
will have a problem with backward compatibility, and I'm afraid I can't
implement it very quickly.
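One way to picture the proposed split, as a sketch with hypothetical names: storage becomes a two-valued setting, and everything else (toast limit, compression kind) lives in per-column options that only the toaster interprets. `pglz` and `lz4` are PostgreSQL's existing compression methods; the struct layout is an assumption:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;          /* stand-in for PostgreSQL's Oid */
#define INVALID_OID ((Oid) 0)

/* Storage is reduced to a binary choice. */
typedef enum ColumnStorage { STORAGE_INPLACE, STORAGE_TOASTABLE } ColumnStorage;

typedef enum CompressionKind { COMPRESSION_NONE, COMPRESSION_PGLZ, COMPRESSION_LZ4 } CompressionKind;

/* Options owned and interpreted by the toaster, not by core code. */
typedef struct ToasterOptions {
    size_t          toast_limit;   /* values larger than this are toasted */
    CompressionKind compression;
} ToasterOptions;

typedef struct ColumnToastConfig {
    ColumnStorage  storage;
    Oid            toaster;        /* must be valid for a toastable column */
    ToasterOptions options;
} ColumnToastConfig;

/* A toastable column must name a toaster; an in-place column need not. */
static bool config_is_valid(const ColumnToastConfig *c)
{
    assert(c != NULL);
    return c->storage == STORAGE_INPLACE || c->toaster != INVALID_OID;
}

/* In this sketch the toaster's own limit decides when a value leaves the row. */
static bool should_toast(const ColumnToastConfig *c, size_t value_size)
{
    assert(c != NULL);
    return c->storage == STORAGE_TOASTABLE && value_size > c->options.toast_limit;
}
```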



> 
> We already have Expanded toast format, in-memory, which was designed
> specifically to allow us to access sub-structure of the datatype
> in-memory. So I was expecting to see an Expanded, on-disk, toast
> format that roughly matched that concept, since Tom has already shown
> us the way. (varatt_expanded). This would be usable by both JSON and
> PostGIS.
Hm, I don't understand. varatt_custom has a variable-length tail which a
toaster can use in any way it likes; the appendable toaster uses it to
store the appended tail.
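To illustrate a toaster-private variable-length tail, here is a sketch with a hypothetical pointer layout (it is not the patch's `varatt_custom` definition, and the helper names are made up). Appending grows only the tail inside the pointer; the out-of-line chunks referenced by `value_id` are never rewritten:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned int Oid;  /* stand-in for PostgreSQL's Oid */

/* Hypothetical custom toast pointer with a toaster-private tail. */
typedef struct CustomToastPointer {
    Oid      toaster;   /* which toaster owns this pointer */
    uint64_t value_id;  /* id of the already-stored out-of-line chunks */
    size_t   tail_len;  /* bytes currently held in tail[] */
    char     tail[];    /* toaster-private area: here, appended bytes */
} CustomToastPointer;

static CustomToastPointer *pointer_create(Oid toaster, uint64_t value_id)
{
    CustomToastPointer *p = malloc(sizeof(CustomToastPointer));

    if (p == NULL)
        return NULL;
    p->toaster = toaster;
    p->value_id = value_id;
    p->tail_len = 0;
    return p;
}

/* Append n bytes to the tail. Returns the (possibly moved) pointer, or
 * NULL on allocation failure, in which case p is still valid. */
static CustomToastPointer *pointer_append(CustomToastPointer *p, const void *data, size_t n)
{
    CustomToastPointer *q;

    assert(p != NULL);
    q = realloc(p, sizeof(CustomToastPointer) + p->tail_len + n);
    if (q == NULL)
        return NULL;
    memcpy(q->tail + q->tail_len, data, n);
    q->tail_len += n;
    return q;
}
```

A real toaster would presumably move the tail out of line once it grows past some limit; that policy is left out here.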

> 
> 
> Some other thoughts:
> 
> I imagine the data type might want to keep some kind of dictionary
> inside the main toast pointer, so we could make allowance for some
> optional datatype-specific private area in the toast pointer itself,
> allowing a mix of inline and out-of-line data, and/or a table of
> contents to the slices.
> 
> I'm thinking could also tackle these things at the same time:
> * We want to expand TOAST to 64-bit pointers, so we can have more
> pointers in a table
> * We want to avoid putting the data length into the toast pointer, so
> we can allow the toasted data to be expanded without rewriting
> everything (to avoid O(N^2) cost)
Right
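A sketch of both points together, assuming a hypothetical pointer layout and an in-memory array standing in for the toast table (all names are made up for illustration). The pointer carries a 64-bit value id but no length, so the length is derived from the chunks, and appending adds a chunk without touching the pointer or the existing chunks:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef unsigned int Oid;  /* stand-in for PostgreSQL's Oid */

/* Hypothetical 64-bit toast pointer that deliberately omits the length. */
typedef struct ToastPointer64 {
    Oid      toastrelid;  /* toast table holding the chunks */
    uint64_t value_id;    /* 64-bit id: far more values than 32 bits allow */
} ToastPointer64;

/* One row of the simulated toast table. */
typedef struct Chunk {
    uint64_t value_id;
    uint32_t seq;
    uint32_t len;
} Chunk;

#define MAX_CHUNKS 64

typedef struct ToastTable {
    Chunk  chunks[MAX_CHUNKS];
    size_t nchunks;
} ToastTable;

/* The length is computed from the chunks, not read from the pointer. */
static uint64_t value_length(const ToastTable *t, ToastPointer64 ptr)
{
    uint64_t total = 0;

    assert(t != NULL);
    for (size_t i = 0; i < t->nchunks; i++)
        if (t->chunks[i].value_id == ptr.value_id)
            total += t->chunks[i].len;
    return total;
}

/* Append a chunk with the next sequence number. Existing chunks and the
 * pointer stay unchanged. Returns 0 on success, -1 if the table is full. */
static int append_chunk(ToastTable *t, ToastPointer64 ptr, uint32_t len)
{
    uint32_t next_seq = 0;

    if (t->nchunks == MAX_CHUNKS)
        return -1;
    for (size_t i = 0; i < t->nchunks; i++)
        if (t->chunks[i].value_id == ptr.value_id && t->chunks[i].seq >= next_seq)
            next_seq = t->chunks[i].seq + 1;
    t->chunks[t->nchunks++] = (Chunk){ ptr.value_id, next_seq, len };
    return 0;
}
```

The cost of this design is that finding the length needs a chunk lookup (an index scan in a real toast table) instead of reading a field in the pointer.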

-- 
Teodor Sigaev                      E-mail: teodor@sigaev.ru
                                       WWW: http://www.sigaev.ru/


