Re: In-Memory Columnar Store

From: knizhnik
Subject: Re: In-Memory Columnar Store
Date:
Msg-id: 52A989B0.5020407@garret.ru
In reply to: Re: In-Memory Columnar Store (Pavel Stehule <pavel.stehule@gmail.com>)
Responses: Re: In-Memory Columnar Store
List: pgsql-hackers
On 12/12/2013 11:42 AM, Pavel Stehule wrote:
> It is an interesting idea. For me, the significant information is in the comparison, since it means we are doing something significantly wrong. A memory engine should naturally be faster, but I don't think it can be 1000x.

Sorry, but I didn't fabricate these results.
Below is just a snapshot from my computer:

postgres=# select DbItem_load();
 dbitem_load
-------------
     9999998
(1 row)

postgres=# \timing
Timing is on.
postgres=# select cs_used_memory();
 cs_used_memory
----------------
     4441894912
(1 row)

postgres=# select agg_val,cs_cut(group_by,'c22c30c10') from
      (select (cs_project_agg(ss1.*)).* from
           (select (s1).sum/(s2).sum,(s1).groups from DbItem_get() q,
                 cs_hash_sum(q.score*q.volenquired,q.trader||q.desk||q.office) s1,
                 cs_hash_sum(q.volenquired,q.trader||q.desk||q.office) s2) ss1) ss2;
     agg_val      |                           cs_cut
------------------+------------------------------------------------------------
 1.50028393511844 | ("John Coltrane","New York Corporates","New York")
....
Time: 506.125 ms

postgres=# select sum(score*volenquired)/sum(volenquired) from DbItem group by (trader,desk,office);
...
Time: 449328.645 ms
postgres=# select sum(score*volenquired)/sum(volenquired) from DbItem group by (trader,desk,office);
...
Time: 441530.689 ms

Please notice that the time of the second execution is almost the same as the first, although all the data can fit in the cache!

Certainly it was interesting for me to understand the reason for such bad performance, and I found out two things:

1. The queries

      select sum(score*volenquired)/sum(volenquired) from DbItem group by (trader,desk,office);

and

      select sum(score*volenquired)/sum(volenquired) from DbItem group by trader,desk,office;

are not the same (which is hard for a C programmer to understand :), and the first one is executed significantly slower.

2. It is not enough to increase the "shared_buffers" parameter in postgresql.conf; "work_mem" is also very important. When I increased it from the default 1MB to 1GB, the query execution time was reduced to 7107.146 ms. So the real difference is ten times, not 1000 times.

> Yesterday we did some tests which show that for large tables (5GB) our hashing is not effective. Disabling hash join and using merge join increased speed 2x.
>
> On 9. 12. 2013 20:41, "knizhnik" <knizhnik@garret.ru> wrote:
> >
> > Hello!
> >
> > I want to announce my implementation of an In-Memory Columnar Store extension for PostgreSQL:
> >
> >      Documentation: http://www.garret.ru/imcs/user_guide.html
> >      Sources: http://www.garret.ru/imcs-1.01.tar.gz
> >
> > Any feedback, bug reports and suggestions are welcome.
> >
> > The vertical representation of the data is stored in PostgreSQL shared memory.
> > This is why it is important to be able to utilize all available physical memory.
> > Nowadays servers with 1TB or more of RAM are not something exotic, especially in the financial world.
> > But there is a limitation in Linux with standard 4KB pages on the maximal size of a mapped memory segment: 256GB.
> > It is possible to overcome this limitation either by creating multiple segments - but that requires too many changes in the PostgreSQL memory manager -
> > or by just setting the MAP_HUGETLB flag (assuming that huge pages were allocated in the system).
> >
> > I found several messages related to the MAP_HUGETLB flag; the most recent one was from the 21st of November:
> > http://www.postgresql.org/message-id/20131125032920.GA23793@toroid.org
> >
> > I wonder what the current status of this patch is?
> >
> > --
> > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-hackers
