Обсуждение: Join the same row

Поиск
Список
Период
Сортировка

Join the same row

От
Edison Azzi
Дата:
Hi,

I´m trying to optimize some selects between 2 tables and the best way I
found was
alter the first table and add the fields of the 2nd table. I adjusted
the contents and
now a have only one table with all info that I need. Now resides my
problem, because
of legacy queries I decided to make a Rule that replace the 2nd table.

Until now all worked well, but I found when I make a join between de result
table and de Rule, even tought is the same row in the same table, the
optimizer
generete two access for the same row:
cta_pag is the table and ctapag_adm is the  rule.

CREATE OR REPLACE RULE "_RETURN" AS
    ON SELECT TO ctapag_adm DO INSTEAD  SELECT cta_pag.nrlancto,
cta_pag.codconta, cta_pag.frequencia, cta_pag.nrlanctopai
   FROM cta_pag
  WHERE cta_pag.origem = 'A'::bpchar;

This is one of the legacy queries:

select * from cta_pag p , ctapag_adm a where a.nrlancto= p.nrlancto and
p.nrlancto = 21861;

EXPLAIN:
Nested Loop  (cost=0.00..11.49 rows=1 width=443) (actual
time=0.081..0.088 rows=1 loops=1)
  ->  Index Scan using cta_pag_pk on cta_pag p  (cost=0.00..5.74 rows=1
width=408) (actual time=0.044..0.046 rows=1 loops=1)
        Index Cond: (nrlancto = 21861::numeric)
  ->  Index Scan using cta_pag_pk on cta_pag  (cost=0.00..5.74 rows=1
width=35) (actual time=0.023..0.025 rows=1 loops=1)
        Index Cond: (21861::numeric = nrlancto)
        Filter: (origem = 'A'::bpchar)
Total runtime: 0.341 ms


    Resulting in twice the time for accessing.

Acessing just on time the same row:

select * from cta_pag p where  p.nrlancto = 21861

EXPLAIN:
Index Scan using cta_pag_pk on cta_pag p  (cost=0.00..5.74 rows=1
width=408) (actual time=0.044..0.047 rows=1 loops=1)
  Index Cond: (nrlancto = 21861::numeric)
Total runtime: 0.161 ms


    Is there a way to force the optimizer to understand that is the same
row?

    Thanks,
    Edison


--
Edison Azzi
<edisonazzi (at ) terra ( dot ) com ( dot ) br>


Re: Join the same row

От
Richard Huxton
Дата:
Edison Azzi wrote:
> Hi,
>
> I´m trying to optimize some selects between 2 tables and the best way I
> found was
> alter the first table and add the fields of the 2nd table. I adjusted
> the contents and
> now a have only one table with all info that I need. Now resides my
> problem, because
> of legacy queries I decided to make a Rule that replace the 2nd table.
>
> Until now all worked well, but I found when I make a join between de result
> table and de Rule, even tought is the same row in the same table, the
> optimizer
> generete two access for the same row:
> cta_pag is the table and ctapag_adm is the  rule.
>
> CREATE OR REPLACE RULE "_RETURN" AS
>    ON SELECT TO ctapag_adm DO INSTEAD  SELECT cta_pag.nrlancto,
> cta_pag.codconta, cta_pag.frequencia, cta_pag.nrlanctopai
>   FROM cta_pag
>  WHERE cta_pag.origem = 'A'::bpchar;
>
> This is one of the legacy queries:
>
> select * from cta_pag p , ctapag_adm a where a.nrlancto= p.nrlancto and
> p.nrlancto = 21861;

OK - and you get a self-join (which is what you asked for, but you'd
like the planner to notice that it might not be necessary).

>    Resulting in twice the time for accessing.
>
> Acessing just on time the same row:
>
> select * from cta_pag p where  p.nrlancto = 21861

This isn't the same query though. Your rule has an additional condition
origem='A'. This means it wouldn't be correct to eliminate the self-join
even if the planner could.

>    Is there a way to force the optimizer to understand that is the same
> row?

However, even if you removed the condition on origem, I don't think the
planner will notice that it can eliminate the join. It's just too
unusual a case for the planner to have a rule for it.

I might be wrong about the planner - I'm just another user. One of the
developers may correct me.
--
   Richard Huxton
   Archonet Ltd


Re: Join the same row

От
Richard Huxton
Дата:
Edison Azzi wrote:
> Richard Huxton escreveu:
>> However, even if you removed the condition on origem, I don't think
>> the planner will notice that it can eliminate the join. It's just too
>> unusual a case for the planner to have a rule for it.
>>
>> I might be wrong about the planner - I'm just another user. One of the
>> developers may correct me.
>
>
> You are rigth, the planner will not eliminate the join, see:
>
> select * from cta_pag a, cta_pag p where a.nrlancto=p.nrlancto and
> p.nrlancto = 21861;
>
> EXPLAIN:
> Nested Loop  (cost=0.00..11.48 rows=1 width=816)
>  ->  Index Scan using cta_pag_pk on cta_pag a  (cost=0.00..5.74 rows=1
> width=408)
>        Index Cond: (21861::numeric = nrlancto)
>  ->  Index Scan using cta_pag_pk on cta_pag p  (cost=0.00..5.74 rows=1
> width=408)
>        Index Cond: (nrlancto = 21861::numeric)
>
>
> I know that this is too unusual case, but I hoped that the planner could
> deal
> with this condition. I´m trying to speed up without have to rewrite a
> bunch of
> queries. Now I'll have to think another way to work around this issue.

Is the performance really so bad? All the data is guaranteed to be
cached for the second index-scan.

--
   Richard Huxton
   Archonet Ltd


Re: Join the same row

От
Edison Azzi
Дата:
Richard Huxton escreveu:

> Edison Azzi wrote:
>
>> Hi,
>>
>> I´m trying to optimize some selects between 2 tables and the best way
>> I found was
>> alter the first table and add the fields of the 2nd table. I adjusted
>> the contents and
>> now a have only one table with all info that I need. Now resides my
>> problem, because
>> of legacy queries I decided to make a Rule that replace the 2nd table.
>>
>> Until now all worked well, but I found when I make a join between de
>> result
>> table and de Rule, even tought is the same row in the same table, the
>> optimizer
>> generete two access for the same row:
>> cta_pag is the table and ctapag_adm is the  rule.
>>
>> CREATE OR REPLACE RULE "_RETURN" AS
>>    ON SELECT TO ctapag_adm DO INSTEAD  SELECT cta_pag.nrlancto,
>> cta_pag.codconta, cta_pag.frequencia, cta_pag.nrlanctopai
>>   FROM cta_pag
>>  WHERE cta_pag.origem = 'A'::bpchar;
>>
>> This is one of the legacy queries:
>>
>> select * from cta_pag p , ctapag_adm a where a.nrlancto= p.nrlancto
>> and p.nrlancto = 21861;
>
>
> OK - and you get a self-join (which is what you asked for, but you'd
> like the planner to notice that it might not be necessary).
>
>>    Resulting in twice the time for accessing.
>>
>> Acessing just on time the same row:
>>
>> select * from cta_pag p where  p.nrlancto = 21861
>
>
> This isn't the same query though. Your rule has an additional
> condition origem='A'. This means it wouldn't be correct to eliminate
> the self-join even if the planner could.
>
>>    Is there a way to force the optimizer to understand that is the
>> same row?
>
>
> However, even if you removed the condition on origem, I don't think
> the planner will notice that it can eliminate the join. It's just too
> unusual a case for the planner to have a rule for it.
>
> I might be wrong about the planner - I'm just another user. One of the
> developers may correct me.


You are rigth, the planner will not eliminate the join, see:

select * from cta_pag a, cta_pag p where a.nrlancto=p.nrlancto and
p.nrlancto = 21861;

EXPLAIN:
Nested Loop  (cost=0.00..11.48 rows=1 width=816)
  ->  Index Scan using cta_pag_pk on cta_pag a  (cost=0.00..5.74 rows=1
width=408)
        Index Cond: (21861::numeric = nrlancto)
  ->  Index Scan using cta_pag_pk on cta_pag p  (cost=0.00..5.74 rows=1
width=408)
        Index Cond: (nrlancto = 21861::numeric)


I know that this is too unusual case, but I hoped that the planner could
deal
with this condition. I´m trying to speed up without have to rewrite a
bunch of
queries. Now I'll have to think another way to work around this issue.

Thanks,

    Edison.




Re: Join the same row

От
Tom Lane
Дата:
Edison Azzi <edisonazzi@terra.com.br> writes:
> You are rigth, the planner will not eliminate the join, see:

> select * from cta_pag a, cta_pag p where a.nrlancto=p.nrlancto and
> p.nrlancto = 21861;

> EXPLAIN:
> Nested Loop  (cost=0.00..11.48 rows=1 width=816)
>   ->  Index Scan using cta_pag_pk on cta_pag a  (cost=0.00..5.74 rows=1
> width=408)
>         Index Cond: (21861::numeric = nrlancto)
>   ->  Index Scan using cta_pag_pk on cta_pag p  (cost=0.00..5.74 rows=1
> width=408)
>         Index Cond: (nrlancto = 21861::numeric)

But do you care?  That second fetch of the same row isn't going to cost
much of anything, since everything it needs to touch will have been
sucked into cache already.  I don't really see the case for adding logic
to the planner to detect this particular flavor of badly-written query.

Notice that the planner *is* managing to propagate the constant
comparison to both relations.

            regards, tom lane