Обсуждение: Retrieving unused tuple attributes in ExecScan

Поиск
Список
Период
Сортировка

Retrieving unused tuple attributes in ExecScan

От
"Ma, Marcus"
Дата:

Hey,

 

If I understand correctly, when a Sequential Scan takes place, the ExecScan function (located in executor/execScan.c) does not retrieve all attributes per tuple in the TupleTableSlot and only retrieves the necessary attribute. So for example, let’s imagine we have a table t1 with 3 number fields, c1, c2, and c3. So in the command:

 

Select * from t1 where t1.c1 > 500;

 

The returned TupleTableSlot will have its field of tts_values in the form (X, 0, 0), where X is the real value of t1.c1 but the fields of c2 and c3 are not actually retrieved because they aren’t used. Similarly, for the command:

 

Select * from t1;

 

The TupleTableSlot will always return the values of (0, 0, 0) because no comparisons are necessary. I am working on code where I’ll need access to attributes that aren’t listed in any qualification – what code should I change in execScan, or nodeSeqScan to be able to retrieve any attribute of a tuple? Basically, being able to make execScan return (X, Y, Z) instead of (0, 0, 0) even if the command doesn’t use any attribute comparisons.

 

Marcus

Re: Retrieving unused tuple attributes in ExecScan

От
Andres Freund
Дата:
Hi,

On 2022-06-27 19:00:44 +0000, Ma, Marcus wrote:
> If I understand correctly, when a Sequential Scan takes place, the ExecScan function (located in executor/execScan.c)
doesnot retrieve all attributes per tuple in the TupleTableSlot and only retrieves the necessary attribute. So for
example,let’s imagine we have a table t1 with 3 number fields, c1, c2, and c3. So in the command:
 
> 
> Select * from t1 where t1.c1 > 500;
> 
> The returned TupleTableSlot will have its field of tts_values in the form (X, 0, 0), where X is the real value of
t1.c1but the fields of c2 and c3 are not actually retrieved because they aren’t used. Similarly, for the command:
 
> 
> Select * from t1;
> 
> The TupleTableSlot will always return the values of (0, 0, 0) because no
> comparisons are necessary. I am working on code where I’ll need access to
> attributes that aren’t listed in any qualification – what code should I
> change in execScan, or nodeSeqScan to be able to retrieve any attribute of a
> tuple? Basically, being able to make execScan return (X, Y, Z) instead of
> (0, 0, 0) even if the command doesn’t use any attribute comparisons.

You'll need to tell the planner that those columns are needed. It's not just
seqscans that otherwise will discard / not compute values.

Where exactly do you need those columns and why?

Greetings,

Andres Freund



Re: Retrieving unused tuple attributes in ExecScan

От
"Ma, Marcus"
Дата:
Hey Andres,

So I'm actually using the columns during merge join, basically I'm building a bloom filter on the outer relation and
filteringout data on the inner relation of the join. I'm building the filter on the join keys, so the columns are being
usedfurther up the execution tree. However, even on a command like:
 

Select * from t1 inner join t2 on t1.c1 = t2.c2;

The execScan function returns slots that have (0, 0, 0) even though t1.c1 and t2.c2 will be used later on. I know that
theSort node and the MergeJoin node are able to read the actual values of the join keys, but for some reason the values
aren'tshowing up on the SeqScan level. However, as soon as I add a qualification, such as:
 

Select * from t1 inner join on t1.c1 = t2.c2 where t1.c1 % 2 = 0;

The qualification makes the t1.c1 value show up during execScan, but not the t2.c2 value.

Marcus

On 6/27/22, 3:10 PM, "Andres Freund" <andres@anarazel.de> wrote:

    CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you
canconfirm the sender and know the content is safe.
 



    Hi,

    On 2022-06-27 19:00:44 +0000, Ma, Marcus wrote:
    > If I understand correctly, when a Sequential Scan takes place, the ExecScan function (located in
executor/execScan.c)does not retrieve all attributes per tuple in the TupleTableSlot and only retrieves the necessary
attribute.So for example, let’s imagine we have a table t1 with 3 number fields, c1, c2, and c3. So in the command:
 
    >
    > Select * from t1 where t1.c1 > 500;
    >
    > The returned TupleTableSlot will have its field of tts_values in the form (X, 0, 0), where X is the real value of
t1.c1but the fields of c2 and c3 are not actually retrieved because they aren’t used. Similarly, for the command:
 
    >
    > Select * from t1;
    >
    > The TupleTableSlot will always return the values of (0, 0, 0) because no
    > comparisons are necessary. I am working on code where I’ll need access to
    > attributes that aren’t listed in any qualification – what code should I
    > change in execScan, or nodeSeqScan to be able to retrieve any attribute of a
    > tuple? Basically, being able to make execScan return (X, Y, Z) instead of
    > (0, 0, 0) even if the command doesn’t use any attribute comparisons.

    You'll need to tell the planner that those columns are needed. It's not just
    seqscans that otherwise will discard / not compute values.

    Where exactly do you need those columns and why?

    Greetings,

    Andres Freund


Re: Retrieving unused tuple attributes in ExecScan

От
Andres Freund
Дата:
Hi,

(please don't top-quote on PG lists)

On 2022-06-27 19:29:34 +0000, Ma, Marcus wrote:
> So I'm actually using the columns during merge join, basically I'm building a bloom filter on the outer relation and
filteringout data on the inner relation of the join. I'm building the filter on the join keys, so the columns are being
usedfurther up the execution tree. However, even on a command like:
 
> 
> Select * from t1 inner join t2 on t1.c1 = t2.c2;
> 
> The execScan function returns slots that have (0, 0, 0) even though t1.c1 and t2.c2 will be used later on. I know
thatthe Sort node and the MergeJoin node are able to read the actual values of the join keys, but for some reason the
valuesaren't showing up on the SeqScan level. However, as soon as I add a qualification, such as:
 
> 
> Select * from t1 inner join on t1.c1 = t2.c2 where t1.c1 % 2 = 0;
> 
> The qualification makes the t1.c1 value show up during execScan, but not the t2.c2 value.

Slots can incrementally deform tuples. You need to call
   slot_getsomeattrs(slot, number-up-to-which-you-need-tuples)
to reliably have columns deformed.

Greetings,

Andres Freund



Re: Retrieving unused tuple attributes in ExecScan

От
"Finnerty, Jim"
Дата:
Re: So I'm actually using the columns during merge join, basically I'm building a bloom filter on the outer relation
andfiltering out data on the inner relation of the join. I'm building the filter on the join keys
 

We had a whole implementation for Bloom filtering for hash inner join, complete with costing and pushdown of the Bloom
filterfrom the build side to the execution tree on the probe side (i.e. building a Bloom filter on the inner side of
thejoin at the conclusion of the build phase of the hash join, then pushing it down as a semi-join filter to the probe
sideof the join, where it could potentially be applied to multiple scans).  After a large change to that same area of
thecode by the community it got commented out and has been in that state ever since.  It's a good example of the sort
ofchange that really ought to be made with the community because there's too much merge burden otherwise.
 

It was a pretty effective optimization in some cases, though.  Most commercial systems have an optimization like this,
sometimeswith special optimizations when the number of distinct join keys is very small. If there is interest in
revivingthis functionality, we could probably extract some patches and work with the community to try to get it running
again. 
 

   /Jim