Re: Yet another vectorized engine
От | Hubert Zhang |
---|---|
Тема | Re: Yet another vectorized engine |
Дата | |
Msg-id | CAB0yre=yjWj-=X-aL6cYc0icN4Md9VKx=244w19Sn4RZ1O4sCw@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Yet another vectorized engine (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>) |
Ответы |
Re: Yet another vectorized engine
|
Список | pgsql-hackers |
Hi Konstantin,
On Tue, Feb 25, 2020 at 6:44 PM Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
On 25.02.2020 11:06, Hubert Zhang wrote:Hi Konstantin,I checkout your branch pg13 in repo https://github.com/zhangh43/vectorize_engineAfter I fixed some compile error, I tested Q1 on TPCH-10GThe result is different from yours and vectorize version is too slow. Note that I disable parallel worker by default.no JIT no Vectorize: 36 secswith JIT only: 23 secswith Vectorize only: 33 secsJIT + Vectorize: 29 secsMy config option is `CFLAGS='-O3 -g -march=native' --prefix=/usr/local/pgsql/ --disable-cassert --enable-debug --with-llvm`I will do some spike on why vectorized is so slow. Could you please provide your compile option and the TPCH dataset size and your queries(standard Q1?) to help me to debug on it.
Hi, Hubert
Sorry, looks like I have used slightly deteriorated snapshot of master so I have not noticed some problems.
Fixes are committed.
Most of the time is spent in unpacking heap tuple (tts_buffer_heap_getsomeattrs):
24.66% postgres postgres [.] tts_buffer_heap_getsomeattrs
8.28% postgres vectorize_engine.so [.] VExecStoreColumns
5.94% postgres postgres [.] HeapTupleSatisfiesVisibility
4.21% postgres postgres [.] bpchareq
4.12% postgres vectorize_engine.so [.] vfloat8_accum
In my version of nodeSeqscan I do not keep all fetched 1024 heap tuples but stored there attribute values in vector columns immediately.
But to avoid extraction of useless data it is necessary to know list of used columns.
The same problem is solved in zedstore, but unfortunately there is no existed method in Postgres to get list
of used attributes. I have done it but my last implementation contains error which cause loading of all columns.
Fixed version is committed.
Now profile without JIT is:
15.52% postgres postgres [.] tts_buffer_heap_getsomeattrs
10.25% postgres postgres [.] ExecInterpExpr
6.54% postgres postgres [.] HeapTupleSatisfiesVisibility
5.12% postgres vectorize_engine.so [.] VExecStoreColumns
4.86% postgres postgres [.] bpchareq
4.80% postgres vectorize_engine.so [.] vfloat8_accum
3.78% postgres postgres [.] tts_minimal_getsomeattrs
3.66% postgres vectorize_engine.so [.] VExecAgg
3.38% postgres postgres [.] hashbpchar
and with JIT:
13.88% postgres postgres [.] tts_buffer_heap_getsomeattrs
7.15% postgres vectorize_engine.so [.] vfloat8_accum
6.03% postgres postgres [.] HeapTupleSatisfiesVisibility
5.55% postgres postgres [.] bpchareq
4.42% postgres vectorize_engine.so [.] VExecStoreColumns
4.19% postgres postgres [.] hashbpchar
4.09% postgres vectorize_engine.so [.] vfloat8pl
I also tested Q1 with your latest code. Result of vectorized is still slow.
PG13 native: 38 secs
PG13 Vec: 30 secs
PG13 JIT: 23 secs
PG13 JIT+Vec: 27 secs
My perf result is as belows. There are three parts:
1. lookup_hash_entry(43.5%) this part is not vectorized yet.
2. scan part: fetch_input_tuple(36%)
3. vadvance_aggregates part(20%)
I also perfed on PG96 vectorized version and got similar perf results and running time of vectorized PG96 and PG13 are also similar. But PG13 is much faster than PG96. So I just wonder whether we merge all the latest executor code of PG13 into the vectorized PG13 branch?
- agg_fill_hash_table ◆ - 43.50% lookup_hash_entry (inlined) ▒ + 39.07% LookupTupleHashEntry ▒ 0.56% ExecClearTuple (inlined) ▒ - 36.06% fetch_input_tuple ▒ - ExecProcNode (inlined) ▒ - 36.03% VExecScan ▒ - 34.60% ExecScanFetch (inlined) ▒ - ExecScanFetch (inlined) ▒ - VSeqNext ▒ + 16.64% table_scan_getnextslot (inlined) ▒ - 10.29% slot_getsomeattrs (inlined) ▒ - 10.17% slot_getsomeattrs_int ▒ + tts_buffer_heap_getsomeattrs ▒ 7.14% VExecStoreColumns ▒ + 1.38% ExecQual (inlined) ▒ - 20.30% Vadvance_aggregates (inlined) ▒ - 17.46% Vadvance_transition_function (inlined) ▒ + 11.95% vfloat8_accum ▒ + 4.74% vfloat8pl ▒ 0.75% vint8inc_any ▒ + 2.77% ExecProject (inlined)
Thanks
Hubert Zhang
В списке pgsql-hackers по дате отправления: