Re: asynchronous and vectorized execution

From Pavel Stehule
Subject Re: asynchronous and vectorized execution
Msg-id CAFj8pRCMLh1rpZ78wn2ovAR3nBkBK3zjG7tA4DNzVzF+W9H90w@mail.gmail.com
In reply to Re: asynchronous and vectorized execution  (David Rowley <david.rowley@2ndquadrant.com>)
List pgsql-hackers


2016-05-10 8:05 GMT+02:00 David Rowley <david.rowley@2ndquadrant.com>:
On 10 May 2016 at 16:34, Greg Stark <stark@mit.edu> wrote:
>
> On 9 May 2016 8:34 pm, "David Rowley" <david.rowley@2ndquadrant.com> wrote:
>>
>> This project does appear to require that we bloat the code with 100's
>> of vector versions of each function. I'm not quite sure if there's a
>> better way to handle this. The problem is that the fmgr is pretty much
>> a barrier to SIMD operations, and this was the only idea that I've had
>> so far about breaking through that barrier. So further ideas here are
>> very welcome.
>
> Well yes and no. In practice I think you only need to worry about vectorised
> versions of integer and possibly float. For other data types there either
> aren't vectorised operators or there's little using them.
>
> And I'll make a bold claim here that the only operators I think really
> matter are =
>
> The reason is that using SIMD instructions is a minor win if you have any
> further work to do per tuple. The only time it's a big win is if you're
> eliminating entire tuples from consideration efficiently. = is going to do
> that often, other btree operator classes might be somewhat useful, but
> things like + really only would come up in odd examples.
>
> But even that understates things. If you have column oriented storage then =
> becomes even more important since every scan has a series of implied
> equijoins to reconstruct the tuple. And the coup de grace is that in a
> column oriented storage you try to store variable length data as integer
> indexes into a dictionary of common values so *everything* is an integer =
> operation.
>
> How to do this without punching right through the executor as an abstraction
> and still supporting extensible data types and operators was puzzling me
> already. I do think it involves having these vector operators in the
> catalogue and also some kind of compression mapping to integer indexes. But
> I'm not sure that's all that would be needed.
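
As a rough illustration of the dictionary idea described above, variable-length values can be mapped to integer codes once, after which an equality filter only compares integers. This is a minimal sketch under stated assumptions: `dict_lookup` and `filter_eq_int32` are hypothetical names, the dictionary is a plain array searched linearly, and nothing here corresponds to existing PostgreSQL code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the integer code of `value` in the
 * dictionary, or -1 if the value is not present. */
static int32_t
dict_lookup(const char *const *dict, size_t dict_len, const char *value)
{
    assert(dict != NULL && value != NULL);
    for (size_t i = 0; i < dict_len; i++)
        if (strcmp(dict[i], value) == 0)
            return (int32_t) i;
    return -1;
}

/* Hypothetical batch filter over a column of dictionary codes.
 * Sets match[i] to 1 where codes[i] == key, 0 otherwise, and returns
 * the number of matches. The loop body is branch-free and touches only
 * fixed-width integers, which is the shape compilers can often turn
 * into SIMD compares (not guaranteed). */
static size_t
filter_eq_int32(const int32_t *codes, size_t n, int32_t key, uint8_t *match)
{
    size_t nmatch = 0;

    assert(codes != NULL && match != NULL);
    for (size_t i = 0; i < n; i++)
    {
        match[i] = (uint8_t) (codes[i] == key);
        nmatch += match[i];
    }
    return nmatch;
}
```

With this layout the string comparison happens once per query (the lookup of the constant), and a key of -1 means no row can match, so the scan could be skipped entirely.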

Perhaps the first move to make on this front will be for aggregate
functions. It should be quite simple to experiment and find out which
functions bring enough benefit. I imagine that even Datums where
the type is not processor native might yield a small speedup, not from
SIMD, but just from fewer calls through fmgr. Perhaps we'll realise
that those are not worth the trouble; I've no idea at this stage.
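
The "fewer calls through fmgr" point can be sketched as the difference between one indirect call per value and one indirect call per batch. The function-pointer types below are hypothetical stand-ins loosely modeled on an aggregate transition function; they are not PostgreSQL's fmgr interface, and `max` is used only as an example aggregate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signatures, not PostgreSQL's fmgr types. */
typedef int64_t (*trans_fn)(int64_t state, int64_t value);
typedef int64_t (*trans_batch_fn)(int64_t state, const int64_t *values, size_t n);

/* Scalar transition function: folds one value into the state. */
static int64_t
max_trans(int64_t state, int64_t value)
{
    return value > state ? value : state;
}

/* Batch transition function: folds n values into the state in one call. */
static int64_t
max_trans_batch(int64_t state, const int64_t *values, size_t n)
{
    for (size_t i = 0; i < n; i++)
        state = values[i] > state ? values[i] : state;
    return state;
}

/* One indirect call per value; *ncalls counts the calls made. */
static int64_t
agg_per_value(trans_fn fn, int64_t state, const int64_t *values, size_t n,
              size_t *ncalls)
{
    assert(fn != NULL && ncalls != NULL);
    for (size_t i = 0; i < n; i++)
    {
        state = fn(state, values[i]);
        (*ncalls)++;
    }
    return state;
}

/* One indirect call for the whole batch. */
static int64_t
agg_per_batch(trans_batch_fn fn, int64_t state, const int64_t *values,
              size_t n, size_t *ncalls)
{
    assert(fn != NULL && ncalls != NULL);
    state = fn(state, values, n);
    (*ncalls)++;
    return state;
}
```

Both paths compute the same result; the batch path simply moves the loop to the other side of the function-pointer boundary, so the per-call overhead is paid once per batch instead of once per value. Whether that saving is measurable for a given type is exactly what would need experimenting.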

In the first iteration it can be reduced to sum and count. On the other hand, a lot of OLAP reports are based on pretty complex expressions, and there compilation is probably the better way.
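
A minimal sketch of what a batch-at-a-time sum and count could look like, assuming a hypothetical `SumCountState` struct and a separate null-flag array; overflow handling and PostgreSQL's real aggregate state types are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical running state shared by sum and count. */
typedef struct SumCountState
{
    int64_t sum;
    int64_t count;
} SumCountState;

/* Fold a batch of int32 values into the state. `isnull` may be NULL,
 * meaning the batch contains no nulls. Null inputs are skipped, as SQL
 * sum() and count(column) do. Overflow of the int64 sum is not checked
 * in this sketch. */
static void
sum_count_batch(SumCountState *state, const int32_t *values,
                const bool *isnull, size_t n)
{
    assert(state != NULL && values != NULL);
    for (size_t i = 0; i < n; i++)
    {
        bool valid = (isnull == NULL) || !isnull[i];

        state->sum += valid ? (int64_t) values[i] : 0;
        state->count += valid;
    }
}
```

An average can be derived from the same state as `sum / count` when `count` is non-zero, so one batch function would cover three common aggregates.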

Regards

Pavel
 

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
