Re: On columnar storage (2)

From Konstantin Knizhnik
Subject Re: On columnar storage (2)
Date
Msg-id 567AA672.2010602@postgrespro.ru
In response to Re: On columnar storage (2)  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: On columnar storage (2)  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Re: On columnar storage (2)  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
Hi Alvaro,

You may know that I have implemented IMCS (an in-memory columnar store)
as a PostgreSQL extension.
It was not very successful, mostly because people prefer to use standard
SQL rather than special functions for accessing the columnar storage
(CS). Now I am thinking about a second incarnation of IMCS, based on FDW
and CSP (custom nodes). This is why I am very interested in your patch.
I have investigated the previous version of the patch and have some
questions; I would be glad if you could clarify them for me:

1. CS API.
I agree with you that the FDW API does not seem sufficient to support
work with a CS efficiently.
At the very least we need batch insert.
But maybe it is better to extend the FDW API rather than create a
special API for CS?
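For example, with the current FDW API a bulk load such as the following
(quote_cs and quote_staging are hypothetical table names, used only for
illustration)

    insert into quote_cs
        select * from quote_staging;

results in one ExecForeignInsert() call per row, which is prohibitively
expensive for a columnar store that wants to append whole column chunks
at once.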

2. Horizontal<->vertical data mapping. As far as I understand this
patch, the CS model assumes that some table columns are stored in
horizontal format (in the heap) and some in vertical format (in the CS),
with a one-to-one mapping between the horizontal and vertical parts of a
row via CTID. But having been involved in several projects requiring
OLAP, I found that in most cases a one-to-many mapping is more
convenient. Assume some trading system dealing with stock quotes.
The data looks something like this:

Symbol  Day         Open  Close  High  Low   Volume
AAA     12/22/2015  10.0  12.0   13.0   8.0     100
AAB     12/22/2015   9.0   8.0   10.0   9.0     200
...
AAA     12/23/2015  12.0  11.5   12.5  11.0      50
AAB     12/23/2015   8.0   8.8    8.5   8.0     300

So it can be represented using the following table:

    create table Quote (Symbol char(10), Day date, Open real, High real,
                        Low real, Close real, Volume integer);

Most likely we need to calculate some statistics for a particular symbol
or set of symbols.
For example, a portfolio is a set of symbols, and we need to somehow
analyze the instruments in this portfolio.

There are several thousand symbols, tens of instruments in a portfolio,
and tens of thousands of quotes per symbol (in other cases the time
series are much larger: millions of elements).
How can we efficiently execute a query like:

    select Symbol, sum(Close*Volume)/sum(Volume) as VWAP
    from Quote
    where Day between '01/01/2001' and '01/01/2010'
      and Symbol in ('AAA', 'AAB', 'ABB', ...)
    group by Symbol;

If we have an index on Symbol, it will contain a lot of duplicates,
and it is not clear how to efficiently combine an index scan by symbol
name with a time slice.
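The conventional row-store answer, for reference (my illustration, not
something from the patch), would be a composite index:

    create index quote_sym_day_idx on Quote (Symbol, Day);

But even with such an index, each of the tens of thousands of matching
quotes per symbol is still fetched as a separate heap tuple, which is
exactly the per-row overhead a columnar representation is supposed to
avoid.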

One possible solution is to embed time series into tuples.
In this case we would have something like this:

    create table Quote (Symbol char(10), Day timeseries(date),
                        Open timeseries(real), High timeseries(real),
                        Low timeseries(real), Close timeseries(real),
                        Volume timeseries(integer));

We are using here a nonexistent type, timeseries. It is similar to an
array, but its content is stored in the columnar storage rather than in
the record's TOAST.
In this case we can efficiently locate records by symbol (there are only
a few thousand entries in the table) and then perform CS operations on
the located time series.
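With this representation, the VWAP query above might look something like
the following (cs_slice, cs_mul and cs_sum are hypothetical CS functions,
shown only to illustrate the shape of such a query):

    select Symbol,
           cs_sum(cs_mul(cs_slice(Close,  Day, '01/01/2001', '01/01/2010'),
                         cs_slice(Volume, Day, '01/01/2001', '01/01/2010')))
           / cs_sum(cs_slice(Volume, Day, '01/01/2001', '01/01/2010')) as VWAP
    from Quote
    where Symbol in ('AAA', 'AAB', 'ABB');

Here the row-level executor only has to locate a few dozen tuples, and
all the heavy work is done by vector operations inside the CS.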

So here we also split the tuple into a horizontal and a vertical part;
in the horizontal part we store just the identifier of the time series.
The query plan should combine standard nodes with custom CS nodes. But
mixing horizontal and vertical operations significantly complicates the
optimizer and restricts flexibility: with the proposed representation it
is difficult to efficiently calculate some characteristic for all
symbols in a specified time range. This is why I am not sure that this
is the only possible or the most efficient approach. But in any case
there should be some efficient plan for queries like the one above.

3. Transpose of data and the role of CS.
Let's look once again at the Quote example above. The data arrives in
time-ascending order, but most queries require grouping it by symbol, so
at some stage we have to "transpose" the data. To efficiently append
data to a time series we need to buffer it somewhere and then append a
whole range of values at once. In the Fujitsu approach two different
representations of the data are used: reader-optimized and
writer-optimized. In the IMCS approach the CS is just a temporary
projection of normal PostgreSQL tables, so we do not need to worry about
durability: it is enforced by PostgreSQL.
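As a sketch of the two-representation idea (all names here are
illustrative, not an existing API): new rows land in a writer-optimized
heap buffer and are periodically transposed into the CS in bulk:

    -- writer-optimized buffer: a plain heap table, cheap row-at-a-time inserts
    create table quote_buffer (like Quote);

    -- periodically transpose the buffered rows into the reader-optimized CS
    -- (cs_append is a hypothetical bulk-append function):
    select cs_append('quote_cs',
                     'select * from quote_buffer order by Symbol, Day');
    truncate quote_buffer;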

So the question is whether the CS should be the only storage for the
data, or just a (possibly transient) copy of a normal table?

Best regards,
Konstantin

On 22.12.2015 17:43, Alvaro Herrera wrote:
> Michael Paquier wrote:
>> On Wed, Dec 9, 2015 at 3:10 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>>> Could we get this rebased past the merge of the parallel execution commits?
>> +1. Alvaro, Tomas, Simon, what are the next plans with those patches?
> Yeah, I've been working intermittently on getting the whole tree rebased
> and squashed, because after the last submission we made a lot of
> progress.  I'll repost later.  I think it should be marked "returned
> with feedback" for now.
>



