Re: Vitesse DB call for testing

From: Merlin Moncure
Subject: Re: Vitesse DB call for testing
Date:
Msg-id: CAHyXU0wnMSV=U2BKvTqxuo0G7cuc-W7iCpVrvZAFMsYTNpbo=w@mail.gmail.com
In response to: Re: Vitesse DB call for testing (CK Tan <cktan@vitessedata.com>)
List: pgsql-hackers
On Fri, Oct 17, 2014 at 10:47 AM, CK Tan <cktan@vitessedata.com> wrote:
> Merlin, glad you tried it.
>
> We take the query plan exactly as given by the planner and decide whether to JIT or to punt depending on the cost. If we
> punt, it goes back to the pg executor. If we JIT and we can't proceed (usually because of operators we haven't
> implemented yet), we again punt. Once we are able to generate the code, there is no going back; we call into LLVM to
> obtain the function entry point, and run it to completion. The 3% improvement you see in OLTP tests is definitely noise.
>
> The bigint sum, avg, count case in the example you tried has some optimization. We use int128 to accumulate the bigint
> instead of numeric as pg does. Hence the big speed-up. Try the same query on int4 for the improvement where both pg and
> vitessedb are using int4 in the execution.
>
> The speed-up is really noticeable when the data type is non-varlena. In the varlena cases, we still call into pg
> routines most of the time. Again, try the sum, avg, count query on numeric, and you will see what I mean.
>
> Also, we don't support UDFs at the moment, so all queries involving a UDF get sent to the pg executor.
>
> On your question about the 32k page size, the rationale is that some of our customers could be interested in a data warehouse
> on pg. A 32k page size is a big win when all you do is seqscan all day long.
>
> We are looking for bug reports at this stage, and for some stress tests done without our own prejudices. Tests on
> real data in a non-prod setting, on queries that are highly CPU-bound, would be ideal.
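
For context, the int128 accumulation described above presumably looks
something like the minimal C sketch below -- not Vitesse's actual code, and
the names are made up; the point is just that summing int8 values into a
native 128-bit accumulator avoids a per-row detour through numeric:

    /* Hypothetical sketch of bigint sum/avg/count accumulation via __int128
     * (a GCC/Clang extension); not Vitesse's code. */
    #include <stdint.h>
    #include <stdio.h>

    typedef struct
    {
        __int128 sum;    /* wide accumulator: no per-row numeric arithmetic */
        int64_t  count;  /* row count, used for avg at the end */
    } Int128AggState;

    static void
    accum_int64(Int128AggState *state, int64_t value)
    {
        state->sum += value;   /* cannot overflow for any realistic row count */
        state->count += 1;
    }

    int
    main(void)
    {
        Int128AggState st = {0, 0};
        int64_t rows[] = {9223372036854775807LL, 42, -7};

        for (int i = 0; i < 3; i++)
            accum_int64(&st, rows[i]);

        /* avg is derived once at the end; the cast is fine for this example */
        printf("count=%lld avg=%lld\n",
               (long long) st.count,
               (long long) (st.sum / st.count));
        return 0;
    }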

One thing I noticed is that when slamming your benchmark query via
pgbench, resident memory consumption was really aggressive and would
have taken down the server had I not stopped the test early.  Memory
consumption did return to baseline after that, so I figured some kind
of LLVM memory management games were going on.  This isn't really a
problem for most OLAP workloads, but it's something to be aware of.
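
To make the LLVM lifecycle concrete, the pattern in play is presumably
something like the generic LLVM C API sketch below (my guess at the shape of
it, not Vitesse's code); if a compiled module and execution engine were kept
around per query rather than torn down, resident memory would grow in just
this way:

    /* Generic MCJIT sketch: build a trivial function, obtain its entry point,
     * run it, then dispose the engine. Illustrative only. */
    #include <llvm-c/Core.h>
    #include <llvm-c/ExecutionEngine.h>
    #include <llvm-c/Target.h>
    #include <stdio.h>

    int main(void)
    {
        LLVMLinkInMCJIT();
        LLVMInitializeNativeTarget();
        LLVMInitializeNativeAsmPrinter();

        LLVMModuleRef mod = LLVMModuleCreateWithName("jit_sketch");
        LLVMTypeRef i64 = LLVMInt64Type();
        LLVMTypeRef args[] = { i64, i64 };
        LLVMValueRef fn = LLVMAddFunction(mod, "add2",
                                          LLVMFunctionType(i64, args, 2, 0));

        LLVMBuilderRef b = LLVMCreateBuilder();
        LLVMPositionBuilderAtEnd(b, LLVMAppendBasicBlock(fn, "entry"));
        LLVMBuildRet(b, LLVMBuildAdd(b, LLVMGetParam(fn, 0),
                                     LLVMGetParam(fn, 1), "sum"));

        char *err = NULL;
        LLVMExecutionEngineRef ee;
        if (LLVMCreateExecutionEngineForModule(&ee, mod, &err))
        {
            fprintf(stderr, "MCJIT error: %s\n", err);
            return 1;
        }

        /* the "obtain the function entry point" step: compile and fetch */
        typedef long long (*add2_fn)(long long, long long);
        add2_fn add2 = (add2_fn) LLVMGetFunctionAddress(ee, "add2");
        printf("add2(40, 2) = %lld\n", add2(40, 2));

        /* disposing the engine (which owns the module) releases JIT memory;
         * skipping this per query would show up as steadily growing RSS */
        LLVMDisposeBuilder(b);
        LLVMDisposeExecutionEngine(ee);
        return 0;
    }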

Via 'perf top' on stock postgres, you see the usual suspects: palloc,
hash_search, etc.  On your build, though, HeapTupleSatisfiesMVCC zooms
right to the top of the stack, which is pretty interesting... the
executor you've built is very lean and mean for sure.  A drop-in
optimization engine with little to no schema/SQL changes is pretty
neat -- your primary competition here is going to be column-organized
table solutions to OLAP-type problems.

merlin


