Thread: Diagonal storage model

Diagonal storage model

From: Konstantin Knizhnik
Date: Sun, 1 Apr 2018 15:48:07 +0300

Hi hackers,

Vertical (columnar) storage is the best fit for analytics, which is why it is widely used in OLAP-oriented databases such as Vertica, HyPer, KDB, ...

In Postgres we have the cstore extension, which cannot provide all the benefits of the vertical model because the executor lacks support for vector operations.

The situation could change if we had a pluggable storage API with support for vectorized execution.

But the vertical model is not so good for updates and data loading (because data is mostly imported in horizontal format).
This is why most existing systems keep data in both formats (at least for some time).

I want to announce a new model, "diagonal storage", which combines the benefits of both approaches.
The idea is very simple: we first store column 1 of the first record, then column 2 of the second record, ... and so on until we reach the last column.

After that we store the second column of the first record, the third column of the second record, ...
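
The ordering described above can be sketched as follows. This is only an illustration, not code from the attached patch. It assumes a square tile (as many records as columns) and that each diagonal wraps around to the first column after reaching the last one; the message specifies neither, and the name diagonal_order is hypothetical.

```python
def diagonal_order(records):
    """Return the cells of a square tile in "diagonal" order.

    Assumptions (not stated in the original message): the tile holds
    n records of n columns each, and diagonal k wraps around to
    column 0 after the last column.
    """
    n = len(records)
    if any(len(rec) != n for rec in records):
        raise ValueError("sketch assumes a square tile: n records of n columns")

    out = []
    for k in range(n):        # k-th diagonal, starting at column k of record 0
        for r in range(n):    # walk down the records
            out.append(records[r][(r + k) % n])
    return out


# Three records (a, b, c) with three columns each.
tile = [
    ["a1", "a2", "a3"],
    ["b1", "b2", "b3"],
    ["c1", "c2", "c3"],
]
print(diagonal_order(tile))
# -> ['a1', 'b2', 'c3', 'a2', 'b3', 'c1', 'a3', 'b1', 'c2']
```

Under these assumptions every cell is stored exactly once, and each run of n consecutive cells contains exactly one column of each record.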

Profiling of TPC-H queries shows that most of the query execution time (about 17%) is spent in heap_deform_tuple.
The new format will significantly reduce the time spent on heap deforming, because each tile holds just one column of any particular record.

Moreover, we can deform many tuples in parallel, which is especially efficient on quantum computers.

Please find attached a patch with the first prototype implementation. It provides about a 3.14 times performance improvement on most TPC-H queries.
 


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachments

Re: Diagonal storage model

From: Dmitry Voronin (Дмитрий Воронин)

Hi, Konstantin!

Thank you for working on the new pluggable storage API.

The attached patch is only 505 bytes and contains just a diff to explain.c. Is that right?

01.04.2018, 15:48, "Konstantin Knizhnik" <k.knizhnik@postgrespro.ru>:
> Hi hackers,
>
> Vertical (columnar) storage is the best fit for analytics, which is why it is widely used in OLAP-oriented databases such as Vertica, HyPer, KDB, ...
>
> In Postgres we have the cstore extension, which cannot provide all the benefits of the vertical model because the executor lacks support for vector operations.
>
> The situation could change if we had a pluggable storage API with support for vectorized execution.
>
> But the vertical model is not so good for updates and data loading (because data is mostly imported in horizontal format).
> This is why most existing systems keep data in both formats (at least for some time).
>
> I want to announce a new model, "diagonal storage", which combines the benefits of both approaches.
> The idea is very simple: we first store column 1 of the first record, then column 2 of the second record, ... and so on until we reach the last column.
>
> After that we store the second column of the first record, the third column of the second record, ...
>
> Profiling of TPC-H queries shows that most of the query execution time (about 17%) is spent in heap_deform_tuple.
> The new format will significantly reduce the time spent on heap deforming, because each tile holds just one column of any particular record.
>
> Moreover, we can deform many tuples in parallel, which is especially efficient on quantum computers.
>
> Please find attached a patch with the first prototype implementation. It provides about a 3.14 times performance improvement on most TPC-H queries.
>
> --
> Konstantin Knizhnik
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company

-- 
Best regards, Dmitry Voronin



Re: Diagonal storage model

From: legrand legrand

Great idea!
Thank you, Konstantin.



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html


Re: Diagonal storage model

From: Alexander Korotkov

Hi!

On Sun, Apr 1, 2018 at 3:48 PM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
> I want to announce a new model, "diagonal storage", which combines the benefits of both approaches.
> The idea is very simple: we first store column 1 of the first record, then column 2 of the second record, ... and so on until we reach the last column.
> After that we store the second column of the first record, the third column of the second record, ...
 
Sounds interesting.  Could "diagonal storage" be applied twice?  That is, could we apply
a diagonal transformation to the result of another diagonal transformation?  I expect we
should get a "square diagonal" transformation...

> Please find attached a patch with the first prototype implementation. It provides about a 3.14 times performance improvement on most TPC-H queries.

Great, but with a square diagonal transformation we should get a 3.14^2 times improvement,
which is even better!

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: Diagonal storage model

From: David Fetter

On Sun, Apr 01, 2018 at 03:48:07PM +0300, Konstantin Knizhnik wrote:
> Hi hackers,
>
> Vertical (columnar) storage is the best fit for analytics, which is why it is widely used in OLAP-oriented databases such as Vertica, HyPer, KDB, ...
>
> In Postgres we have the cstore extension, which cannot provide all the benefits of the vertical model because the executor lacks support for vector operations.
>
> The situation could change if we had a pluggable storage API with support for vectorized execution.
>
> But the vertical model is not so good for updates and data loading (because data is mostly imported in horizontal format).
> This is why most existing systems keep data in both formats (at least for some time).
>
> I want to announce a new model, "diagonal storage", which combines the benefits of both approaches.
> The idea is very simple: we first store column 1 of the first record, then column 2 of the second record, ... and so on until we reach the last column.
>
> After that we store the second column of the first record, the third column of the second record, ...
>
> Profiling of TPC-H queries shows that most of the query execution time (about 17%) is spent in heap_deform_tuple.
> The new format will significantly reduce the time spent on heap deforming, because each tile holds just one column of any particular record.
>
> Moreover, we can deform many tuples in parallel, which is especially efficient on quantum computers.
>
> Please find attached a patch with the first prototype implementation. It provides about a 3.14 times performance improvement on most TPC-H queries.

You're sure it's not 3.14159265358979323...?

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: Diagonal storage model

From: Marko Tiikkaja

On Sun, Apr 1, 2018 at 3:48 PM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
> I want to announce a new model, "diagonal storage", which combines the benefits of both approaches.
> The idea is very simple: we first store column 1 of the first record, then column 2 of the second record, ... and so on until we reach the last column.
> After that we store the second column of the first record, the third column of the second record, ...

I'm a little worried about the fact that even with this model we're still limited to only two dimensions.  That's bound to cause problems sooner or later.


.m

Re: Diagonal storage model

From: Ashutosh Bapat

On Mon, Apr 2, 2018 at 3:49 AM, Marko Tiikkaja <marko@joh.to> wrote:
> On Sun, Apr 1, 2018 at 3:48 PM, Konstantin Knizhnik
> <k.knizhnik@postgrespro.ru> wrote:
>>
>> I want to announce new model, "diagonal storage" which combines benefits
>> of both approaches.
>> The idea is very simple: we first store column 1 of first record, then
>> column 2 of second record, ... and so on until we reach the last column.
>> After it we store second column of first record, third column of the
>> second record,...
>
>
> I'm a little worried about the fact that even with this model we're still
> limited to only two dimensions.  That's bound to cause problems sooner or
> later.
>

How about a 3D storage model, whose first dimension gives a horizontal
view, the second provides a vertical or columnar view, and the third
provides a diagonal view? It also provides the capability to add extra
dimensions for additional views, like a double diagonal view. Alas! It all
collapses since I was late to the party.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: Diagonal storage model

From: Andrey Borodin

> On 2 Apr 2018, at 16:57, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
> On Mon, Apr 2, 2018 at 3:49 AM, Marko Tiikkaja <marko@joh.to> wrote:
>>
>> I'm a little worried about the fact that even with this model we're still
>> limited to only two dimensions.  That's bound to cause problems sooner or
>> later.
> How about a 3D storage model, whose first dimension gives horizontal
> view, second provides vertical or columnar view and third one provides
> diagonal view. It also provides capability to add extra dimensions to
> provide additional views like double diagonal view. Alas! it all
> collapses since I was late to the party.

BTW, MDX expressions actually provide multidimensional results. They have COLUMNS, ROWS, PAGES, SECTIONS, CHAPTERS, and
AXIS(N) for those who are not satisfied with named dimensions.

Best regards, Andrey Borodin.

Re: Diagonal storage model

From: Michael Paquier

On Sun, Apr 01, 2018 at 03:48:07PM +0300, Konstantin Knizhnik wrote:
> Attach please find patch with first prototype implementation. It
> provides about 3.14 times improvement of performance at most of TPC-H
> queries.

Congratulations on finding a patch able to improve all workloads of
Postgres in such a simple and magic way, especially on this particular
date.  I would have used M_PI if I were you.
--
Michael

Attachments