Re: [HACKERS] GSoC 2017

Поиск
Список
Период
Сортировка
От Ruben Buchatskiy
Тема Re: [HACKERS] GSoC 2017
Дата
Msg-id CAFRJ5K3sDGSn=pKgnsobYQX4CMTOU=0uJ-vt2kF3t1FsVnTCRQ@mail.gmail.com
обсуждение исходный текст
Ответ на [HACKERS] GSoC 2017  (Alexander Korotkov <a.korotkov@postgrespro.ru>)
Ответы Re: [HACKERS] GSoC 2017  (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
Re: [HACKERS] GSoC 2017  (Stephen Frost <sfrost@snowman.net>)
Re: [HACKERS] GSoC 2017  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
2017-01-10 12:53 GMT+03:00 Alexander Korotkov <a.korotkov@postgrespro.ru>:

1. What project ideas we have?


Hi!

We would like to propose a project on rewriting PostgreSQL executor from

traditional Volcano-style [1] to so-called push-based architecture as implemented in

Hyper [2][3] and VitesseDB [4]. The idea is to reverse the direction of data flow

control: instead of pulling up tuples one-by-one with ExecProcNode(), we suggest

pushing them from below to top until blocking operator (e.g. Aggregation) is

encountered. There’s a good example and more detailed explanation for this approach in [2].

The advantages of this approach:

 * It allows to completely avoid the need of loading/storing the internal state of the bottommost

   (scanning) nodes, which will significantly reduce overhead. With current pull-based model,

   we call functions like heapgettup_pagemode() (and many others) number-of-tuples-to-retrieve

   times, while in push-based model we will call them only once. Currently, we have

   implemented a prototype for SeqScan node and achieved 2x speedup on query

   “select * from lineitem”;

 * The number of memory accesses is minimized; generally better code and data locality,

    cache is used more effectively;

 * Switching to push model also makes a good base for building effective JIT-compiler.

   Currently we have working LLVM-based JIT compiler for expressions [5], as well as whole query

   JIT-compiler [6], which speeds up TPC-H queries up to 4-5 times, but the latter took manually

   re-implementing the executor logic with LLVM API using push model to get this speedup. JIT-compiling

   from original Postgres C code didn't give significant improvement because of Volcano-style model

   inherent inefficiency. After making a switch to push-model we expect to achieve speedup comparable

   to stand-alone JIT, but using the same code for both JIT and the interpreter.

Also, while working on this project, we are likely be revealing and fixing other

weak places of the current query executor. Volcano-style model is known to have

inadequate performance characteristics [7][8], e.g. function call overhead,

and we should deal with it anyway. We also plan to make relatively small patches,

which will optimize the redundant reload of the internal state in the current pull-model.

Many DB systems with support of full query compilation (e.g. LegoBase [9], Hekaton [10]) implement it in push-based manner.

Also we have seen in the mailing list that Kumar Rajeev had been investigating this idea too, and he reported that the results were impressive (unfortunately, without specifying more details):

https://www.postgresql.org/message-id/BF2827DCCE55594C8D7A8F7FFD3AB77159A9B904%40szxeml521-mbs.china.huawei.com

References

[1]  Graefe G.. Volcano — an extensible and parallel query evaluation system. IEEE Trans. Knowl. Data Eng.,6(1): 120–135, 1994.

[2] Efficiently Compiling Efficient Query Plans for Modern Hardware,

   http://www.vldb.org/pvldb/vol4/p539-neumann.pdf

[3] Compiling Database Queries into Machine Code,

   http://sites.computer.org/debull/A14mar/p3.pdf

[4] https://docs.google.com/presentation/d/1R0po7_Wa9fym5U9Y5qHXGlUi77nSda2LlZXPuAxtd-M/pub?slide=id.g9b338944f_4_131

[5] PostgreSQL with JIT compiler for expressions,

   https://github.com/ispras/postgres

[6] LLVM Cauldron, slides,

   http://llvm.org/devmtg/2016-09/slides/Melnik-PostgreSQLLLVM.pdf

[7] MonetDB/X100: Hyper-Pipelining Query Execution

   http://cidrdb.org/cidr2005/papers/P19.pdf

[8] Vectorization vs. Compilation in Query Execution,

   https://pdfs.semanticscholar.org/dcee/b1e11d3b078b0157325872a581b51402ff66.pdf

[9] http://www.vldb.org/pvldb/vol7/p853-klonatos.pdf[10] https://www.microsoft.com/en-us/research/wp-content/uploads/2013/06/Hekaton-Sigmod2013-final.pdf


--
Best Regards,
Ruben. <ruben@ispras.ru>
ISP RAS.
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Konstantin Knizhnik
Дата:
Сообщение: Re: [HACKERS] Active zombies at AIX
Следующее
От: Alvaro Herrera
Дата:
Сообщение: Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY