Materialiation is slower than non-materialized

Поиск
Список
Период
Сортировка
От Rajeev rastogi
Тема Materialiation is slower than non-materialized
Дата
Msg-id BF2827DCCE55594C8D7A8F7FFD3AB77159876742@szxeml521-mbs.china.huawei.com
обсуждение исходный текст
Ответы Re: Materialiation is slower than non-materialized  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
<div class="WordSection1"><p class="MsoNormal">During my routine work, I observed that incase of execution of plan
havinginner node of NLJ as materialized node (on top of SeqScan) is slower compared to non-materialized SeqScan node.
Thishappens only if “Work_mem is not big enough to hold all tuples in memory.”<p class="MsoNormal"> <p
class="MsoNormal">Tomake test easy and faster, I set the work_mem as 256kB. Then result is as below:<p
class="MsoNormal"> <pclass="MsoNormal">=========With Material off=============<p class="MsoNormal">postgres=# set
enable_materialto off;<p class="MsoNormal">SET<p class="MsoNormal">Time: 0.225 ms<p class="MsoNormal">postgres=# select
count(tbl.id1)from tbl, tbl2 where tbl.id1<tbl2.id1;<p class="MsoNormal">  count<p class="MsoNormal">----------<p
class="MsoNormal">49995000<pclass="MsoNormal">(1 row)<p class="MsoNormal"> <p class="MsoNormal">Time: <b>26674.299
ms</b><pclass="MsoNormal">postgres=# explain select count(tbl.id1) from tbl, tbl2 where tbl.id1<tbl2.id1;<p
class="MsoNormal">                             QUERY PLAN<p
class="MsoNormal">----------------------------------------------------------------------<pclass="MsoNormal">Aggregate 
(cost=2783478.33..2783478.34rows=1 width=4)<p class="MsoNormal">   ->  Nested Loop  (cost=0.00..2700145.00
rows=33333333width=4)<p class="MsoNormal">         Join Filter: (tbl.id1 < tbl2.id1)<p class="MsoNormal">        
-> Seq Scan on tbl  (cost=0.00..145.00 rows=10000 width=4)<p class="MsoNormal">         ->  Seq Scan on tbl2 
(cost=0.00..145.00rows=10000 width=4)<p class="MsoNormal">Planning time: 0.120 ms<p class="MsoNormal">(6 rows)<p
class="MsoListParagraph"> <pclass="MsoNormal">=========With Material on=============<p class="MsoNormal"> <p
class="MsoNormal">postgres=#set enable_material to on;<p class="MsoNormal">SET<p class="MsoNormal">Time: 0.222 ms<p
class="MsoNormal">postgres=#select count(tbl.id1) from tbl, tbl2 where tbl.id1<tbl2.id1;<p class="MsoNormal"> 
count<pclass="MsoNormal">----------<p class="MsoNormal">49995000<p class="MsoNormal">(1 row)<p class="MsoNormal"> <p
class="MsoNormal">Time:<b>32839.627 ms</b><p class="MsoNormal">postgres=# explain select count(tbl.id1) from tbl, tbl2
wheretbl.id1<tbl2.id1;<p class="MsoNormal">                                 QUERY PLAN<p
class="MsoNormal">----------------------------------------------------------------------------<p
class="MsoNormal">Aggregate (cost=1983648.33..1983648.34 rows=1 width=4)<p class="MsoNormal">   ->  Nested Loop 
(cost=0.00..1900315.00rows=33333333 width=4)<p class="MsoNormal">         Join Filter: (tbl.id1 < tbl2.id1)<p
class="MsoNormal">        ->  Seq Scan on tbl  (cost=0.00..145.00 rows=10000 width=4)<p class="MsoNormal">        
-> Materialize  (cost=0.00..235.00 rows=10000 width=4)<p class="MsoNormal">               ->  Seq Scan on tbl2 
(cost=0.00..145.00rows=10000 width=4)<p class="MsoNormal">Planning time: 0.140 ms<p class="MsoNormal">(7 rows)<p
class="MsoNormal"> <pclass="MsoNormal">As per my analysis, above result is aligned with our current design. <p
class="MsoNormal"> <pclass="MsoNormal"><b><u>Materialization Node</u></b>:<p class="MsoNormal">Cost Calculation @ Plan
time:<pclass="MsoNormal">                If the results spills over to disk in case of Materialization, it considers
thecost for the same in total cost.<p class="MsoNormal">Actual Execution:<p class="MsoNormal">                Result is
actuallyfetched from disk only even on re-scan.<p class="MsoNormal"> <p class="MsoNormal"><b><u>Scan Node</u></b>:<p
class="MsoNormal">CostCalculation @ Plan time:<p class="MsoNormal" style="margin-left:36.0pt">The cost of re-scan of
SeqScannode is considered to be same scan of SeqScan node, which always assumes that the records is fetched from disk
andhence disk access cost is added (As we don’t know really how much memory will be available to cache during
execution).<pclass="MsoNormal">Actual Execution:<p class="MsoNormal">                After first scan, once the whole
recordsis loaded to memory (provided shared_buffer is big enough), rescan of records are read from memory only and
henceit is much faster.<p class="MsoNormal"> <p class="MsoNormal">So because of this while planning cost of
Materializednode is lesser than that of SeqScan node but while execution SeqScan is faster because it fetches tuples
frommemory on re-scan.<p class="MsoNormal"> <p class="MsoNormal">I am not sure if we can consider this to be a problem
ornot but I just wanted to share as generally it is expected by user to be Materialization faster than
Non-materialized.<pclass="MsoNormal">Please provide your opinion. If we can do something about this then I can take up
thiswork.<p class="MsoNormal"> <p class="MsoNormal"><i><span style="color:black">Thanks and Regards,</span></i><p
class="MsoNormal"><i>KumarRajeev Rastogi<span style="color:black"> </span></i><p class="MsoNormal"><span
style="font-size:7.5pt;font-family:"华文细黑","serif";color:gray">------------------------------------------------------------------------------------------------------------------------------<br
/></span><spanstyle="font-size:10.0pt;font-family:"Arial","sans-serif";color:gray">This e-mail and its attachments
containconfidential information from HUAWEI, which <br /> is intended only for the person or entity whose address is
listedabove. Any use of the <br /> information contained herein in any way (including, but not limited to, total or
partial<br /> disclosure, reproduction, or dissemination) by persons other than the intended <br /> recipient(s) is
prohibited.If you receive this e-mail in error, please notify the sender by <br /> phone or email immediately and
deleteit!</span><span style="font-size:10.0pt;color:black"></span><p class="MsoNormal"
style="margin-left:21.0pt"> </div>

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Geoghegan
Дата:
Сообщение: Re: Display of multi-target-table Modify plan nodes in EXPLAIN
Следующее
От: Michael Paquier
Дата:
Сообщение: Exposing PG_VERSION_NUM in pg_config