Re: Zedstore - compressed in-core columnar storage

From	Ashwin Agrawal
Subject	Re: Zedstore - compressed in-core columnar storage
Date	April 15, 2019, 16:15:51
Msg-id	CALfoeitV6Hj-_JHxQXoDERs=s0R=whAGYJz7Gv=g5t1z8_DKRw@mail.gmail.com
In response to	Re: Zedstore - compressed in-core columnar storage (Peter Geoghegan <pg@bowt.ie>)
Responses	Re: Zedstore - compressed in-core columnar storage
List	pgsql-hackers

On Sat, Apr 13, 2019 at 4:22 PM Peter Geoghegan <pg@bowt.ie> wrote:

On Thu, Apr 11, 2019 at 6:06 AM Rafia Sabih <rafia.pghackers@gmail.com> wrote:
> Reading about it reminds me of this work -- TAG column storage( https://urldefense.proofpoint.com/v2/url?u=http-3A__www09.sigmod.org_sigmod_record_issues_0703_03.article-2Dgraefe.pdf&d=DwIBaQ&c=lnl9vOaLMzsy2niBC8-h_K-7QJuNJEsFrzdndhuJ3Sw&r=gxIaqms7ncm0pvqXLI_xjkgwSStxAET2rnZQpzba2KM&m=H2hOVqCm9svWVOW1xh7FhoURKEP-WWpWso6lKD1fLoM&s=KNOse_VUg9-BW7SyDXt1vw92n6x_B92N9SJHZKrdoIo&e= ).
> Isn't this storage system inspired from there, with TID as the TAG?
>
> It is not referenced here so made me wonder.

I don't think they're particularly similar, because that paper
describes an architecture based on using purely logical row
identifiers, which is not what a TID is. TID is a hybrid
physical/logical identifier, sometimes called a "physiological"
identifier, which will have significant overhead.

The storage system wasn't inspired by that paper, but yes, it also talks about laying out column data in btrees, which is good to see. As Peter pointed out, though, the main thing the paper focuses on, saving space for the TAG, isn't something zedstore plans to leverage, since it is more restrictive. As discussed below, we can use other alternatives to save space.

Ashwin said that
ZedStore TIDs are logical identifiers, but I don't see how that's
compatible with a hybrid row/column design (unless you map heap TID to
logical row identifier using a separate B-Tree).

I would like to know more specifics on this, Peter. We may have different contexts in mind for the hybrid row/column design. When we said the design supports hybrid row/column families, we did not mean within the same table. That is, a single table cannot have some data stored in row form and some in columnar form; for a given table, the structure is homogeneous. But the design can easily support storing all the columns together, a subset of columns together, or each column on its own, all connected by TID.
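
To make the "connected by TID" idea concrete, here is a minimal Python sketch. It is purely illustrative and not Zedstore's actual layout or API: the function names (`split_into_groups`, `fetch_row`), the use of plain dicts in place of btrees, and the sequential integer TIDs are all assumptions made for the example. Each column group gets its own TID-keyed structure, and a row is reassembled by looking up the same TID in every group.

```python
# Hypothetical illustration only; not Zedstore's on-disk format or API.
from typing import Dict, List, Tuple

Row = Dict[str, object]
Group = Tuple[str, ...]


def split_into_groups(rows: List[Row], groups: List[Group]) -> Dict[Group, Dict[int, tuple]]:
    """Store each column group in its own TID-keyed mapping (a stand-in for a btree)."""
    trees: Dict[Group, Dict[int, tuple]] = {group: {} for group in groups}
    # Assumption: TIDs are assigned sequentially starting at 1.
    for tid, row in enumerate(rows, start=1):
        for group in groups:
            trees[group][tid] = tuple(row[col] for col in group)
    return trees


def fetch_row(trees: Dict[Group, Dict[int, tuple]], tid: int) -> Row:
    """Reassemble a full row by looking up the same TID in every column group."""
    row: Row = {}
    for group, tree in trees.items():
        row.update(zip(group, tree[tid]))
    return row


rows = [
    {"id": 1, "name": "a", "price": 10},
    {"id": 2, "name": "b", "price": 20},
]
# One group stores two columns together; the other stores a single column.
trees = split_into_groups(rows, [("id", "name"), ("price",)])
```

Grouping all columns into one tuple would give a row-like layout, while one group per column would give a purely columnar one; in both cases the TID is the only link between the groups.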

The big idea with Graefe's TAG design is that there is practically no
storage overhead for these logical identifiers, because each entry's
identifier is calculated by adding its slot number to the page's
tag/low key. The ZedStore design, in contrast, explicitly stores TID
for every entry. ZedStore seems more flexible for that reason, but at
the same time the per-datum overhead seems very high to me. Maybe
prefix compression could help here, which a low key and high key can
do rather well.

Yes, the plan is to optimize away the per-datum TID space, either by prefix compression, delta compression, or some other trick.
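
As a rough illustration of the delta-compression option, the Python sketch below stores the first TID on a page verbatim and then only the gap to each following TID. It assumes TIDs can be treated as increasing integers within a page, and the helper names `delta_encode` and `delta_decode` are hypothetical; it does not describe what Zedstore actually implements.

```python
# Hypothetical sketch of delta-encoding TIDs; assumes integer, ascending TIDs.
from typing import List


def delta_encode(tids: List[int]) -> List[int]:
    """Keep the first TID as-is, then store the difference from the previous TID."""
    if not tids:
        return []
    deltas = [tids[0]]
    for prev, cur in zip(tids, tids[1:]):
        deltas.append(cur - prev)
    return deltas


def delta_decode(deltas: List[int]) -> List[int]:
    """Rebuild the original TIDs with a running sum over the stored deltas."""
    tids: List[int] = []
    total = 0
    for d in deltas:
        total += d
        tids.append(total)
    return tids


tids = [1000, 1001, 1002, 1005, 1006]
encoded = delta_encode(tids)  # [1000, 1, 1, 3, 1]
```

When TIDs on a page are dense, most deltas are small numbers that fit in far fewer bits than a full TID. In the limiting case where every delta is 1, nothing needs to be stored beyond the first value, which is close to the low-key-plus-slot-number scheme Peter describes for the TAG design.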
