Re: Yet another vectorized engine

From: Hubert Zhang
Subject: Re: Yet another vectorized engine
Msg-id: CAB0yrenYmbYsioz167OrcO_8wVsvb=MA381-McLNcjEb1EJQYg@mail.gmail.com
In reply to: Re: Yet another vectorized engine (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
Responses: Re: Yet another vectorized engine (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
List: pgsql-hackers
Thanks, Konstantin, for your detailed review!

On Tue, Dec 3, 2019 at 5:58 PM Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:


On 02.12.2019 4:15, Hubert Zhang wrote:

The prototype extension is at https://github.com/zhangh43/vectorize_engine

I am very sorry that I had not followed this link earlier.
A few questions concerning your design decisions:

1. Would it be more efficient to use native arrays in vtype instead of an array of Datum? I think that would allow the compiler to generate more efficient code for operations on float4 and int32 types.
A union could be used to keep the size of vtype fixed.
 
Yes, I'm also considering that. When scanning a column store, the column batch is loaded into a contiguous memory region: for int32 the region is 4*BATCHSIZE bytes, while for int16 it is 2*BATCHSIZE bytes. With native arrays, a single memcpy would be enough to fill the vtype batch.
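To illustrate the native-array layout discussed in point 1, here is a minimal standalone C sketch. VType, VElemType and the helper functions are hypothetical names, not the prototype's actual definitions, and BATCHSIZE is assumed to be 1024. The union keeps the struct size fixed across element types, a column chunk is copied in with a single memcpy, and the typed loop hands the compiler a plain int32 array to work on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BATCHSIZE 1024  /* assumed batch size, matching the prototype's 1024 */

/* Hypothetical element-type tag; not the prototype's actual enum. */
typedef enum { VT_INT16, VT_INT32, VT_FLOAT4 } VElemType;

/* Fixed-size vector: the union keeps sizeof(VType) the same for every
 * element type, while exposing typed arrays instead of Datums. */
typedef struct VType
{
    VElemType elemtype;
    int       dim;                 /* number of valid elements */
    bool      isnull[BATCHSIZE];
    union
    {
        int16_t i16[BATCHSIZE];
        int32_t i32[BATCHSIZE];
        float   f4[BATCHSIZE];
    } values;
} VType;

/* Fill an int32 vector from a contiguous column chunk with one memcpy. */
static void
vtype_fill_int32(VType *v, const int32_t *column, int n)
{
    assert(n >= 0 && n <= BATCHSIZE);
    v->elemtype = VT_INT32;
    v->dim = n;
    memcpy(v->values.i32, column, (size_t) n * sizeof(int32_t));
    memset(v->isnull, 0, (size_t) n * sizeof(bool));
}

/* Element-wise int32 addition over plain arrays, a loop shape compilers
 * can typically auto-vectorize. Overflow checks, which PostgreSQL's int4
 * operators perform, are omitted in this sketch. */
static void
vtype_add_int32(VType *result, const VType *a, const VType *b)
{
    assert(a->dim == b->dim);
    result->elemtype = VT_INT32;
    result->dim = a->dim;
    for (int i = 0; i < a->dim; i++)
    {
        result->values.i32[i] = a->values.i32[i] + b->values.i32[i];
        result->isnull[i] = a->isnull[i] || b->isnull[i];
    }
}
```

With int16 columns the same struct would be filled from values.i16 with a copy of 2*n bytes, so the vector's size never depends on the column type.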
 
2. Why does VectorTupleSlot contain an array (batch) of heap tuples rather than vectors (an array of vtype)?

a. VectorTupleSlot does store an array of vtype, in the tts_values field; this keeps code changes small and lets us reuse functions like ExecProject. Of course, we could use a separate field to store the vtypes.
b. VectorTupleSlot also contains an array of heap tuples, which is used for heap tuple deforming. The tuples in a batch may span many pages, so we also need to pin an array of the related pages instead of just one page.
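A rough sketch of the page bookkeeping described in 2b: because a batch can span several heap pages, the slot has to remember every page that must stay pinned until the batch is released. This is standalone C with hypothetical names (BatchPages, batch_add_tuple); a real slot would hold buffer pins through PostgreSQL's buffer manager rather than bare block numbers.

```c
#include <assert.h>
#include <stdint.h>

#define BATCHSIZE       1024
#define MAX_BATCH_PAGES BATCHSIZE   /* worst case: one tuple per page */

typedef uint32_t BlockNum;          /* stand-in for PostgreSQL's BlockNumber */

/* Hypothetical per-batch bookkeeping: the page of each tuple, plus the
 * distinct pages that must stay pinned while the batch is in use. */
typedef struct BatchPages
{
    int      ntuples;
    BlockNum tuple_block[BATCHSIZE];
    int      npinned;
    BlockNum pinned[MAX_BATCH_PAGES];
} BatchPages;

/* Start a new batch; a real slot would release all previous pins here. */
static void
batch_reset(BatchPages *bp)
{
    bp->ntuples = 0;
    bp->npinned = 0;
}

/* Record a tuple returned by a sequential scan. Pages arrive in order,
 * so a new pin is needed only when the block number changes. */
static void
batch_add_tuple(BatchPages *bp, BlockNum blk)
{
    assert(bp->ntuples < BATCHSIZE);
    if (bp->npinned == 0 || bp->pinned[bp->npinned - 1] != blk)
        bp->pinned[bp->npinned++] = blk;   /* a real slot would pin the buffer here */
    bp->tuple_block[bp->ntuples++] = blk;
}
```

For example, 500 tuples at roughly 226 tuples per page would leave three pages pinned for a single batch, rather than the single page a row-at-a-time slot holds.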

3. Why did you have to implement your own plan_tree_mutator instead of using expression_tree_mutator?

I also want to replace plan nodes, e.g. Agg -> CustomScan (with a VectorAgg implementation). expression_tree_mutator cannot be used to mutate plan nodes such as Agg, am I right?
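Assuming, as point 3 suggests, that expression_tree_mutator only descends into expression nodes, a plan-level mutator has to follow the lefttree/righttree links of Plan nodes itself. The sketch below uses simplified stand-in types (Plan, PlanTag and the T_Vector* tags are hypothetical). It mirrors the expression_tree_mutator calling pattern: the callback recurses through the generic walker, which copies each node, and then swaps supported node types. In PostgreSQL the replacement would be a newly built CustomScan node rather than a retagged copy.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified plan node tags; not PostgreSQL's NodeTag values. */
typedef enum { T_SeqScan, T_Agg, T_VectorScan, T_VectorAgg } PlanTag;

typedef struct Plan
{
    PlanTag      tag;
    struct Plan *lefttree;
    struct Plan *righttree;
} Plan;

typedef Plan *(*plan_mutator_fn)(Plan *node, void *context);

/* Generic walker: copy the node, then let the callback produce new
 * children. The input tree is never modified. */
static Plan *
plan_tree_mutator(Plan *node, plan_mutator_fn mutator, void *context)
{
    assert(mutator != NULL);
    if (node == NULL)
        return NULL;
    Plan *copy = malloc(sizeof(Plan));
    if (copy == NULL)
        abort();
    *copy = *node;
    copy->lefttree = mutator(node->lefttree, context);
    copy->righttree = mutator(node->righttree, context);
    return copy;
}

/* Callback: mutate children first, then replace supported node types. */
static Plan *
vectorize_mutator(Plan *node, void *context)
{
    if (node == NULL)
        return NULL;
    Plan *copy = plan_tree_mutator(node, vectorize_mutator, context);
    if (copy->tag == T_SeqScan)
        copy->tag = T_VectorScan;
    else if (copy->tag == T_Agg)
        copy->tag = T_VectorAgg;
    return copy;
}
```

Because the walker copies nodes, the original plan stays intact, which is what makes falling back to it cheap.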
 
4. As far as I understand, you now always try to replace SeqScan with your custom vectorized scan. But that makes sense only if there are quals for this scan or aggregation is performed.
In other cases batch+unbatch just adds extra overhead, doesn't it?

There is probably extra overhead for the heap format and a query like 'select i from t;' without quals, projection or aggregation.
But with a column store, VectorScan could read a batch directly, with no additional batching cost. A column store is the better choice for OLAP queries.
Can we conclude that it would be better to use the vector engine for OLAP queries and the row engine for OLTP queries?

5. Throwing and catching an exception for queries which cannot be vectorized does not seem to be the safest or most efficient way of handling such cases.
Maybe it is better to return an error code from plan_tree_mutator and propagate this error upstairs?
 
Yes. As for efficiency, another way is to vectorize some plan nodes, leave the others non-vectorized, and add a batch/unbatch layer between them (is this what you meant by "propagate this error upstairs"?). As you mentioned, this could introduce additional overhead. Are there any other good approaches?
What do you mean by not the safest? PG_CATCH will receive the ERROR and fall back to the original non-vectorized plan.
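The error-code alternative suggested in point 5 can be sketched without exceptions: each recursive call returns a status code, the first failure propagates upward, and the caller simply keeps the original plan. This is hypothetical standalone C (PlanNode, VecStatus and the choice of N_WindowAgg as the "unsupported" case are illustrative assumptions). It uses a check pass followed by an apply pass so that a failure never leaves a half-converted plan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified node kinds; N_WindowAgg is treated as unsupported purely
 * for illustration. */
typedef enum { N_SeqScan, N_Agg, N_WindowAgg } NodeKind;

typedef struct PlanNode
{
    NodeKind         kind;
    bool             vectorized;
    struct PlanNode *child;     /* single-child plans keep the sketch short */
} PlanNode;

/* Hypothetical status codes returned instead of raising ERROR. */
typedef enum { VEC_OK, VEC_UNSUPPORTED } VecStatus;

/* Pass 1: recurse to the leaves and propagate the first failure upward. */
static VecStatus
vectorize_check(const PlanNode *node)
{
    if (node == NULL)
        return VEC_OK;
    VecStatus st = vectorize_check(node->child);
    if (st != VEC_OK)
        return st;
    return node->kind == N_WindowAgg ? VEC_UNSUPPORTED : VEC_OK;
}

/* Pass 2: only reached when every node in the tree is supported. */
static void
vectorize_apply(PlanNode *node)
{
    for (; node != NULL; node = node->child)
        node->vectorized = true;
}

/* Returns false when the caller should keep the original row-based plan;
 * no longjmp or PG_TRY/PG_CATCH is involved. */
static bool
vectorize_plan(PlanNode *root)
{
    assert(root != NULL);
    if (vectorize_check(root) != VEC_OK)
        return false;
    vectorize_apply(root);
    return true;
}
```

The mixed approach from the reply would instead stop at the unsupported node, keep its subtree vectorized, and insert an unbatch step there, at the price of the extra conversion.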


6. Have you experimented with different batch sizes? I have done similar experiments in VOPS and found that tile sizes larger than 128 do not provide a noticeable increase in performance.
You are currently using a batch size of 1024, which is significantly larger than the typical number of tuples on one page.

Good point. We will do some experiments on it.
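To put rough numbers on point 6, the sketch below estimates how many heap tuples fit on a page and how many pages one batch touches. It assumes default 8 kB pages, a 24-byte page header, a 23-byte heap tuple header, 4-byte line pointers, 8-byte MAXALIGN, fixed-width rows with no nulls, and no fillfactor slack. Under those assumptions a single int4 column gives 226 tuples per page, so a 1024-tuple batch spans at least 5 pages while a 128-tuple batch fits on one.

```c
#include <assert.h>

/* Assumed defaults for a stock 64-bit PostgreSQL build. */
#define PAGE_SIZE        8192  /* BLCKSZ */
#define PAGE_HEADER_SIZE 24    /* page header */
#define TUPLE_HEADER     23    /* heap tuple header, before alignment */
#define LINE_POINTER     4     /* one line pointer per tuple */
#define MAX_ALIGN        8

#define ALIGN_UP(len) (((len) + MAX_ALIGN - 1) / MAX_ALIGN * MAX_ALIGN)

/* Upper bound on tuples per heap page for fixed-width rows without nulls,
 * ignoring fillfactor and fragmentation. */
static int
tuples_per_page(int data_width)
{
    assert(data_width >= 0);
    int tuple_len = ALIGN_UP(ALIGN_UP(TUPLE_HEADER) + data_width);
    return (PAGE_SIZE - PAGE_HEADER_SIZE) / (tuple_len + LINE_POINTER);
}

/* Minimum number of heap pages a batch of batch_size tuples spans. */
static int
pages_per_batch(int batch_size, int data_width)
{
    int per_page = tuples_per_page(data_width);
    return (batch_size + per_page - 1) / per_page;
}
```

With a data width of 0 the formula gives 291, the familiar per-page maximum for 8 kB pages, which is a useful sanity check on the constants.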

7. How can vectorized scan be combined with parallel execution (it is already supported in 9.6, isn't it?)

We haven't implemented it yet, but the idea is the same as for the non-parallel case: copy the current parallel scan and implement a vectorized Gather, keeping VectorTupleTableSlot as their interface.
Our basic idea is to reuse most of the current PG executor logic, make it vectorized, and then tune performance gradually.
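A minimal standalone sketch of how a parallel vectorized scan might feed a combining leader. It is not PostgreSQL code: all names are hypothetical, and it uses POSIX threads in place of background workers and shared memory. Workers claim whole blocks from a shared atomic counter, roughly as a parallel sequential scan hands out blocks, build a batch per block and run a vectorized partial SUM over it. The leader then combines the partial results, standing in for a vectorized Gather followed by a combine step.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define NBLOCKS        64    /* hypothetical table size in pages */
#define ROWS_PER_BLOCK 226   /* e.g. one int4 column per row */
#define NWORKERS       4

/* Shared "parallel scan state": the next block to hand out. */
static atomic_uint next_block;

typedef struct WorkerResult
{
    int64_t rows;   /* rows this worker aggregated */
    int64_t sum;    /* partial SUM over those rows */
} WorkerResult;

/* Worker: claim blocks until none are left. Rows are synthetic: row r of
 * block b holds the value b * ROWS_PER_BLOCK + r. */
static void *
worker_main(void *arg)
{
    WorkerResult *res = arg;
    int32_t       batch[ROWS_PER_BLOCK];

    for (;;)
    {
        unsigned blk = atomic_fetch_add(&next_block, 1);
        if (blk >= NBLOCKS)
            break;
        for (int r = 0; r < ROWS_PER_BLOCK; r++)   /* "scan" the block into a batch */
            batch[r] = (int32_t) (blk * ROWS_PER_BLOCK + r);
        for (int r = 0; r < ROWS_PER_BLOCK; r++)   /* partial aggregate over the batch */
            res->sum += batch[r];
        res->rows += ROWS_PER_BLOCK;
    }
    return NULL;
}

/* Leader: launch workers, wait for them, and combine their results. */
static WorkerResult
parallel_sum(void)
{
    pthread_t    threads[NWORKERS];
    WorkerResult parts[NWORKERS] = {{0, 0}};
    WorkerResult total = {0, 0};

    atomic_store(&next_block, 0);
    for (int i = 0; i < NWORKERS; i++)
    {
        int rc = pthread_create(&threads[i], NULL, worker_main, &parts[i]);
        assert(rc == 0);
        (void) rc;
    }
    for (int i = 0; i < NWORKERS; i++)
    {
        pthread_join(threads[i], NULL);
        total.rows += parts[i].rows;   /* "Gather": merge partial results */
        total.sum += parts[i].sum;
    }
    return total;
}
```

Each worker writes only its own WorkerResult, and pthread_join orders those writes before the leader reads them, so only the block counter needs to be atomic.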

--
Thanks

Hubert Zhang
