Re: Yet another vectorized engine

Поиск
Список
Период
Сортировка
От Hubert Zhang
Тема Re: Yet another vectorized engine
Дата
Msg-id CAB0yre=yjWj-=X-aL6cYc0icN4Md9VKx=244w19Sn4RZ1O4sCw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Yet another vectorized engine  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
Ответы Re: Yet another vectorized engine
Список pgsql-hackers
Hi Konstantin,

On Tue, Feb 25, 2020 at 6:44 PM Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:


On 25.02.2020 11:06, Hubert Zhang wrote:
Hi Konstantin,

I checkout your branch pg13 in repo https://github.com/zhangh43/vectorize_engine
After I fixed some compile error, I tested Q1 on TPCH-10G
The result is different from yours and vectorize version is too slow. Note that I disable parallel worker by default.
no JIT no Vectorize:  36 secs
with JIT only:             23 secs
with Vectorize only:   33 secs
JIT + Vectorize:         29 secs

My config option is `CFLAGS='-O3 -g -march=native' --prefix=/usr/local/pgsql/ --disable-cassert --enable-debug --with-llvm`
I will do some spike on why vectorized is so slow. Could you please provide your compile option and the TPCH dataset size and your queries(standard Q1?) to help me to debug on it.



Hi, Hubert

Sorry, looks like I have used slightly deteriorated snapshot of master so I have not noticed some problems.
Fixes are committed.

Most of the time is spent in unpacking heap tuple (tts_buffer_heap_getsomeattrs):

  24.66%  postgres  postgres             [.] tts_buffer_heap_getsomeattrs
   8.28%  postgres  vectorize_engine.so  [.] VExecStoreColumns
   5.94%  postgres  postgres             [.] HeapTupleSatisfiesVisibility
   4.21%  postgres  postgres             [.] bpchareq
   4.12%  postgres  vectorize_engine.so  [.] vfloat8_accum


In my version of nodeSeqscan I do not keep all fetched 1024 heap tuples but stored there attribute values in vector columns immediately.
But to avoid extraction of useless data it is necessary to know list of used columns.
The same problem is solved in zedstore, but unfortunately there is no existed method in Postgres to get list
of used attributes. I have done it but my last implementation contains error which cause loading of all columns.
Fixed version is committed.

Now profile without JIT is:

 15.52%  postgres  postgres             [.] tts_buffer_heap_getsomeattrs
  10.25%  postgres  postgres             [.] ExecInterpExpr
   6.54%  postgres  postgres             [.] HeapTupleSatisfiesVisibility
   5.12%  postgres  vectorize_engine.so  [.] VExecStoreColumns
   4.86%  postgres  postgres             [.] bpchareq
   4.80%  postgres  vectorize_engine.so  [.] vfloat8_accum
   3.78%  postgres  postgres             [.] tts_minimal_getsomeattrs
   3.66%  postgres  vectorize_engine.so  [.] VExecAgg
   3.38%  postgres  postgres             [.] hashbpchar

and with JIT:

 13.88%  postgres  postgres             [.] tts_buffer_heap_getsomeattrs
   7.15%  postgres  vectorize_engine.so  [.] vfloat8_accum
   6.03%  postgres  postgres             [.] HeapTupleSatisfiesVisibility
   5.55%  postgres  postgres             [.] bpchareq
   4.42%  postgres  vectorize_engine.so  [.] VExecStoreColumns
   4.19%  postgres  postgres             [.] hashbpchar
   4.09%  postgres  vectorize_engine.so  [.] vfloat8pl


I also tested Q1 with your latest code. Result of vectorized is still slow.
PG13 native: 38 secs
PG13 Vec: 30 secs
PG13 JIT: 23 secs
PG13 JIT+Vec: 27 secs

My perf result is as belows. There are three parts: 
1. lookup_hash_entry(43.5%) this part is not vectorized yet.
2. scan part: fetch_input_tuple(36%)
3. vadvance_aggregates part(20%)
I also perfed on PG96 vectorized version and got similar perf results and running time of vectorized PG96 and PG13 are also similar. But PG13 is much faster than PG96. So I just wonder whether we merge all the latest executor code of PG13 into the vectorized PG13 branch? 

- agg_fill_hash_table ◆ - 43.50% lookup_hash_entry (inlined) ▒ + 39.07% LookupTupleHashEntry ▒ 0.56% ExecClearTuple (inlined) ▒ - 36.06% fetch_input_tuple ▒ - ExecProcNode (inlined) ▒ - 36.03% VExecScan ▒ - 34.60% ExecScanFetch (inlined) ▒ - ExecScanFetch (inlined) ▒ - VSeqNext ▒ + 16.64% table_scan_getnextslot (inlined) ▒ - 10.29% slot_getsomeattrs (inlined) ▒ - 10.17% slot_getsomeattrs_int ▒ + tts_buffer_heap_getsomeattrs ▒ 7.14% VExecStoreColumns ▒ + 1.38% ExecQual (inlined) ▒ - 20.30% Vadvance_aggregates (inlined) ▒ - 17.46% Vadvance_transition_function (inlined) ▒ + 11.95% vfloat8_accum ▒ + 4.74% vfloat8pl ▒ 0.75% vint8inc_any ▒ + 2.77% ExecProject (inlined)

--
Thanks

Hubert Zhang

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Konstantin Knizhnik
Дата:
Сообщение: Re: Yet another vectorized engine
Следующее
От: John Naylor
Дата:
Сообщение: Re: truncating timestamps on arbitrary intervals