Re: In-Memory Columnar Store
From: knizhnik
Subject: Re: In-Memory Columnar Store
Date:
Msg-id: 52A989B0.5020407@garret.ru
In reply to: Re: In-Memory Columnar Store (Pavel Stehule <pavel.stehule@gmail.com>)
Responses: Re: In-Memory Columnar Store (Merlin Moncure <mmoncure@gmail.com>)
List: pgsql-hackers
On 12/12/2013 11:42 AM, Pavel Stehule wrote:
> It is an interesting idea. For me, the significant information is from the comparison, so we are doing something significantly wrong. A memory engine should naturally be faster, but I don't think it can be 1000x.

Sorry, but I didn't fabricate these results. Below is just a snapshot from my computer:

postgres=# select DbItem_load();
 dbitem_load
-------------
     9999998
(1 row)

postgres=# \timing
Timing is on.
postgres=# select cs_used_memory();
 cs_used_memory
----------------
     4441894912
(1 row)

postgres=# select agg_val,cs_cut(group_by,'c22c30c10') from
  (select (cs_project_agg(ss1.*)).* from
    (select (s1).sum/(s2).sum,(s1).groups from DbItem_get() q,
      cs_hash_sum(q.score*q.volenquired, q.trader||q.desk||q.office) s1,
      cs_hash_sum(q.volenquired, q.trader||q.desk||q.office) s2) ss1) ss2;
     agg_val      |                           cs_cut
------------------+----------------------------------------------------
 1.50028393511844 | ("John Coltrane","New York Corporates","New York")
 ....
Time: 506.125 ms

postgres=# select sum(score*volenquired)/sum(volenquired) from DbItem group by (trader,desk,office);
...
Time: 449328.645 ms
postgres=# select sum(score*volenquired)/sum(volenquired) from DbItem group by (trader,desk,office);
...
Time: 441530.689 ms

Please notice that the time of the second execution is almost the same as the first, although by then all the data could fit in the cache!

Certainly it was interesting to me to understand the reason for such bad performance, and I found out two things:

1.
select sum(score*volenquired)/sum(volenquired) from DbItem group by (trader,desk,office);

and

select sum(score*volenquired)/sum(volenquired) from DbItem group by trader,desk,office;

are not the same query (which is hard for a C programmer to understand :). The parenthesized form groups by a single composite ROW value rather than by three separate columns, and it executes significantly slower.

2. It is not enough to increase the "shared_buffers" parameter in postgresql.conf; "work_mem" is also very important. When I increased it from the default 1MB to 1GB, query execution time was reduced to 7107.146 ms. So the real difference is ten times, not 1000 times.

On 12/12/2013, Pavel Stehule also wrote:
> Yesterday we did some tests, which show that for large tables (5GB) our hashing is not effective. Disabling hash join and using merge join increased speed 2x.

On 9 December 2013 at 20:41, "knizhnik" <knizhnik@garret.ru> wrote:
>
> Hello!
>
> I want to announce my implementation of an In-Memory Columnar Store extension for PostgreSQL:
>
> Documentation: http://www.garret.ru/imcs/user_guide.html
> Sources: http://www.garret.ru/imcs-1.01.tar.gz
>
> Any feedback, bug reports, and suggestions are welcome.
>
> The vertical representation of data is stored in PostgreSQL shared memory.
> This is why it is important to be able to utilize all available physical memory.
> Nowadays servers with a terabyte or more of RAM are not exotic, especially in the financial world.
> But in Linux with standard 4kB pages there is a limit on the maximal size of a mapped memory segment: 256GB.
> It is possible to overcome this limitation either by creating
> multiple segments - but that requires too many changes in the PostgreSQL memory manager -
> or by just setting the MAP_HUGETLB flag (assuming that huge pages have been allocated in the system).
>
> I found several messages related to the MAP_HUGETLB flag; the most recent one was from 21 November:
> http://www.postgresql.org/message-id/20131125032920.GA23793@toroid.org
>
> I wonder what the current status of this patch is?
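The MAP_HUGETLB approach mentioned above can be sketched roughly as follows. This is only a minimal illustration under my own assumptions (the `map_segment` helper is hypothetical, not part of the extension or of the patch being discussed): ask for a huge-page-backed anonymous mapping first, and fall back to regular 4kB pages when no huge pages are reserved in the system (e.g. vm.nr_hugepages is 0).

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

/* Map `size` bytes of anonymous memory, preferring huge pages.
 * `size` should be a multiple of the huge page size (typically 2MB),
 * otherwise the MAP_HUGETLB attempt fails with EINVAL.
 * On success *used_huge reports which page size backs the mapping. */
static void *map_segment(size_t size, int *used_huge)
{
    void *p = MAP_FAILED;
#ifdef MAP_HUGETLB
    /* Fails with ENOMEM when no huge pages have been reserved. */
    p = mmap(NULL, size, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
#endif
    if (p != MAP_FAILED) {
        *used_huge = 1;
        return p;
    }
    /* Fall back to the default (4kB) page size. */
    p = mmap(NULL, size, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    *used_huge = 0;
    return p == MAP_FAILED ? NULL : p;
}
```

A caller would do, e.g., `int huge; void *seg = map_segment(2 * 1024 * 1024, &huge);` and release the segment with `munmap(seg, size)`. Note that with 2MB huge pages the per-segment limit rises accordingly, which is the point of the flag for very large shared-memory segments.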