Re: Expression Pruning in postgress

Поиск
Список
Период
Сортировка
От HarmeekSingh Bedi
Тема Re: Expression Pruning in postgress
Дата
Msg-id CALLwk6vj0V959K_F2QQM=TsL7cGw3=tYh-JE7p4eyzDtu56GJg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Expression Pruning in postgress  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Expression Pruning in postgress
Список pgsql-hackers
<br />Thanks Tom. Here is a example. Just a background of things . I have made changes in postgress execution and
storageengine to make it a MPP style engine - keeping all optimizer intact. Basically take pgress serial plan and
constructa parallel plan. The query I am running is below.<br /><br /><u><b>Query</b></u><br /><br /># explain select
ooj.local_time,ooj.is_timeout,ooj.capsulename,<br/>       regexp_split_to_table(<br />            case<br />          
    when ooj.isubstr is null then 'none'<br />                when ooj.isubstr='' then 'none'<br />                else
ooj.isubstr<br/>            end ,';') as interest,<br />       sum(ooj.impression) count_impressions,<br />      
sum(ooj.click)count_clicks<br />from (<br />    select (impression.server_utc_time/3600000)*3600 as local_time,<br />  
        impression.is_timeout_default_impression as is_timeout,<br />            impression.capsule_name as
capsulename,<br/>            substring(impression.website_variables from 'ybt=([0-9;]*)') as isubstr,<br />          
 1as impression,<br />            case click.impression_id when null then 0 else 1 end as click<br />   from<br />    
      impression_ytw2_row impression<br />   left outer join  clicks_row click<br />   on impression.impression_id =
click.impression_id)ooj<br /> group by local_time, is_timeout, capsulename, interest;<br /><br /><u><b>Now if you
kindlybear with me and ignore the Parallel nodes in the plan {which are just parallel distributors} . If you compare
thetwo plans below and check the output tupledesc of the Hash join you will see that some columns are not gettign
prunedout - In my case this is a big column. <br /></b></u><br /><b>Case 1 Plan {inline view gets merged} </b><br /><br
/>  Now when the inline view gets merged via pullup_subqueries I can see that we have columns
{impression.website_variables}which should get pruned out but it does not after the JOIN { it gets pruned out after
hashaggregate). <br /><br /> Parallel reciever RANDOM queue  (nodenum=0 cost=132.79..133.12 rows=10 width=104)<br />  
Output:(sum(1)), impression.is_timeout_default_impression, impression.capsule_name, (sum(1)), sum(1), sum(1)<br />  
-> Parallel sender RANDOM queue  (nodenum=1 cost=132.79..133.12 rows=10 width=104)<br />          Output: (sum(1)),
impression.is_timeout_default_impression,impression.capsule_name, (sum(1)), sum(1), sum(1)<br />         -> 
HashAggregate (nodenum=2 cost=132.79..133.12 rows=10 width=104)<br />               Output:
(((impression.server_utc_time/ 3600000) * 3600)), impression.is_timeout_default_impression, impression.capsule_name,
(regexp_split_to_table(CASEWHEN ("substring"((impression.website_variables)::text, 'ybt=([0-9;]*)'::text) IS NULL<br />
 THEN'none'::text WHEN ("substring"((impression.website_variables)::text, 'ybt=([0-9;]*)'::text) = ''::text) THEN
'none'::textELSE "substring"((impression.website_variables)::text, 'ybt=([0-9;]*)'::text) END, ';'::text)), sum(1),
sum(1)<br/>                ->  Parallel reciever HASH-AGG queue  (nodenum=3 cost=117.45..130.08 rows=181
width=104)<br/>                     Output: impression.server_utc_time, impression.is_timeout_default_impression,
impression.capsule_name,<b>impression.website_variables</b>, ((impression.server_utc_time / 3600000) * 3600),
regexp_split_to_table(CASEWHEN ("substring"((impres<br /> ion.website_variables)::text, 'ybt=([0-9;]*)'::text) IS NULL)
THEN'none'::text WHEN ("substring"((impression.website_variables)::text, 'ybt=([0-9;]*)'::text) = ''::text) THEN
'none'::textELSE "substring"((impression.website_variables)::text, 'ybt=([0-9;<br /> *)'::text) END, ';'::text)<br
/>                    ->  Parallel sender HASH-AGG queue  (nodenum=4 cost=117.45..130.08 rows=181 width=104)<br
/>                          Output: impression.server_utc_time, impression.is_timeout_default_impression,
impression.capsule_name,<b>impression.website_variables</b>, ((impression.server_utc_time / 3600000) * 3600),
regexp_split_to_table(CASEWHEN ("substring"((<br /> mpression.website_variables)::text, 'ybt=([0-9;]*)'::text) IS NULL)
THEN'none'::text WHEN ("substring"((impression.website_variables)::text, 'ybt=([0-9;]*)'::text) = ''::text) THEN
'none'::textELSE "substring"((impression.website_variables)::text, 'ybt=<br /> [0-9;]*)'::text) END, ';'::text)<br
/>                          <b>->  Hash Left Join  (nodenum=5 cost=117.45..130.08 rows=181 width=104)<br
/>                                Output: impression.server_utc_time, impression.is_timeout_default_impression,
impression.capsule_name,impression.website_variables, ((impression.server_utc_time / 3600000) * 3600),
regexp_split_to_table(CASEWHEN ("substr<br /> ng"((impression.website_variables)::text, 'ybt=([0-9;]*)'::text) IS NULL)
THEN'none'::text WHEN ("substring"((impression.website_variables)::text, 'ybt=([0-9;]*)'::text) = ''::text) THEN
'none'::textELSE "substring"((impression.website_variables)::text,</b><br /> 'ybt=([0-9;]*)'::text) END, ';'::text)<br
/>                                ->  Parallel reciever HASH-LEFT queue  (nodenum=6 cost=0.00..3.10 rows=10
width=136)<br/>                                       Output: impression.server_utc_time,
impression.is_timeout_default_impression,impression.capsule_name, impression.website_variables,
impression.impression_id<br/>                                        ->  Parallel sender HASH-LEFT queue  (nodenum=7
cost=0.00..3.10rows=10 width=136)<br />                                             Output: impression.server_utc_time,
impression.is_timeout_default_impression,impression.capsule_name, impression.website_variables,
impression.impression_id<br/>                                              ->  Seq Scan on impression_ytw2_row
impression (nodenum=8 cost=0.00..3.10 rows=10 width=136)<br />                                                  
Output:impression.server_utc_time, impression.is_timeout_default_impression, impression.capsule_name,
impression.website_variables,impression.impression_id<br />                                  ->  Hash  (nodenum=9
cost=72.20..72.20rows=3620 width=32)<br />                                       Output: click.impression_id<br
/>                                      ->  Parallel reciever HASH-RIGHT queue  (nodenum=10 cost=0.00..72.20
rows=3620width=32)<br />                                              Output: click.impression_id<br
/>                                            ->  Parallel sender HASH-RIGHT queue  (nodenum=11 cost=0.00..72.20
rows=3620width=32)<br />                                                   Output: click.impression_id<br />
                                                  ->  Seq Scan on clicks_row click  (nodenum=12 cost=0.00..72.20
rows=3620width=32)<br />                                                         Output: click.impression_id<br /><br
/><u><b>Case2 { PLan when I disable code to merge view - basically do not call pullup_subqueries - it does the right
thing}<br /><br /></b></u> Parallel reciever RANDOM queue  (nodenum=0 cost=133.70..137.32 rows=181 width=112)<br />   
Output:ooj.local_time, ooj.is_timeout, ooj.capsulename, (), sum((sum((sum(ooj.impression))))),
sum((sum((sum(ooj.click)))))<br/>   ->  Parallel sender RANDOM queue  (nodenum=1 cost=133.70..137.32 rows=181
width=112)<br/>          Output: ooj.local_time, ooj.is_timeout, ooj.capsulename, (), sum((sum(ooj.impression))),
sum((sum(ooj.click)))<br/>         ->  HashAggregate  (nodenum=2 cost=133.70..137.32 rows=181 width=112)<br
/>              Output: ooj.local_time, ooj.is_timeout, ooj.capsulename, (regexp_split_to_table(CASE WHEN (ooj.isubstr
ISNULL) THEN 'none'::text WHEN (ooj.isubstr = ''::text) THEN 'none'::text ELSE ooj.isubstr END, ';'::text)),
sum(ooj.impression),sum(oo<br /> .click)<br />               ->  Parallel reciever HASH-AGG queue  (nodenum=3
cost=117.45..130.98rows=181 width=112)<br />                     Output: ooj.local_time, ooj.is_timeout,
ooj.capsulename,ooj.isubstr, ooj.impression, ooj.click, regexp_split_to_table(CASE WHEN (ooj.isubstr IS NULL) THEN
'none'::textWHEN (ooj.isubstr = ''::text) THEN 'none'::text ELSE ooj.isubstr <br /> ND, ';'::text)<br
/>                    ->  Parallel sender HASH-AGG queue  (nodenum=4 cost=117.45..130.98 rows=181 width=112)<br
/>                          Output: ooj.local_time, ooj.is_timeout, ooj.capsulename, ooj.isubstr, ooj.impression,
ooj.click,regexp_split_to_table(CASE WHEN (ooj.isubstr IS NULL) THEN 'none'::text WHEN (ooj.isubstr = ''::text) THEN
'none'::textELSE <a href="http://ooj.is">ooj.is</a><br /> bstr END, ';'::text)<br />                           -> 
SubqueryScan ooj  (nodenum=5 cost=117.45..130.98 rows=181 width=112)<br />                                 Output:
ooj.local_time,ooj.is_timeout, ooj.capsulename, ooj.isubstr, ooj.impression, ooj.click, regexp_split_to_table(CASE WHEN
(ooj.isubstrIS NULL) THEN 'none'::text WHEN (ooj.isubstr = ''::text) THEN 'none'::text ELSE <br /> oj.isubstr END,
';'::text)<br/>                                 <b>->  Hash Left Join  (nodenum=6 cost=117.45..128.27 rows=181
width=104)<br/>                                       Output: ((impression.server_utc_time / 3600000) * 3600),
impression.is_timeout_default_impression,impression.capsule_name, "substring"((impression.website_variables)::text,
'ybt=([0-9;]*)'::text),1, 1</b><br />                                        ->  Parallel reciever HASH-LEFT queue 
(nodenum=7cost=0.00..3.10 rows=10 width=136)<br />                                             Output:
impression.server_utc_time,impression.is_timeout_default_impression, impression.capsule_name,
impression.website_variables,impression.impression_id<br />                                              -> 
Parallelsender HASH-LEFT queue  (nodenum=8 cost=0.00..3.10 rows=10 width=136)<br
/>                                                  Output: impression.server_utc_time,
impression.is_timeout_default_impression,impression.capsule_name, impression.website_variables,
impression.impression_id<br/>                                                    ->  Seq Scan on impression_ytw2_row
impression (nodenum=9 cost=0.00..3.10 rows=10 width=136)<br />                                                        
Output:impression.server_utc_time, impression.is_timeout_default_impression, impression.capsule_name,
impression.website_variables,impression.impression_id<br />                                        ->  Hash 
(nodenum=10cost=72.20..72.20 rows=3620 width=32)<br />                                             Output:
click.impression_id<br/>                                             ->  Parallel reciever HASH-RIGHT queue 
(nodenum=11cost=0.00..72.20 rows=3620 width=32)<br />                                                    Output:
click.impression_id<br/>                                                   ->  Parallel sender HASH-RIGHT queue 
(nodenum=12cost=0.00..72.20 rows=3620 width=32)<br />                                                          Output:
click.impression_id<br/>                                                         ->  Seq Scan on clicks_row click 
(nodenum=13cost=0.00..72.20 rows=3620 width=32)<br />                                                               
Output:click.impression_id<br /><br /><u><b>More analysis<br /><br /></b></u><ol><li>Looked at code in make_join_rel
andbuild_joinrel_tlist I can see in one case the bms_nonempty_difference code kicks in and finds that the column can be
prunedout and other case it does not do the right thing.<li>I am still trying to work out why pullup_subquery should
makea difference.</ol>Any pointers appreciated .<br /><br />Regards<br />Harmeek<br /><br />2011 at 7:24 AM, Tom Lane
<<ahref="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>> wrote:<br /> > HarmeekSingh Bedi <<a
href="mailto:harmeeksingh@gmail.com">harmeeksingh@gmail.com</a>>writes:<br />>>    In a plan where Node 1 is
parent{say join) and Node 2 is child<br />>> (say scan)  . If node 1 has a expression say foo(column) then
scan<br/> >> will project 'column'  for sure and join will<br />>>       evaluate foo(column).  Now if the
nodeabove join does not need<br />>> column ? how does postgress remove the column from join's projection<br />
>>list .<br />><br />> See build_joinrel_tlist() in relnode.c.<br />><br />>> I am seeing that it
doesnot in many<br />>>       cases specially when sort appears above.<br />><br />> Please show a concrete
example.<br/> ><br />>                        regards, tom lane<br />><br /><br /> 

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Pavel Stehule
Дата:
Сообщение: Re: Enhanced psql in core?
Следующее
От: mike beeper
Дата:
Сообщение: Re: [GENERAL] Creating temp tables inside read only transactions