Обсуждение: combination of function to simple query makes query slow

Поиск
Список
Период
Сортировка

combination of function to simple query makes query slow

От
"jan aerts (RI)"
Дата:
Hi,

I'm having some problems in combining a function within a standard
query.

I basically have two tables (see between hashes below):
* alias_table: objects can have different names (e.g. name1 is an alias
of name2, and name3 is an alias of name4)
* mapping_table: shows objects that 'lead to' other objects (e.g. the
object with name1 leads to the object with name5)

I want to traverse the mapping_table in a transparent way: if a first
object leads to a second object, and that second object leads to a third
object, I want to list the first and the third objects. So from the
example table below: name6 leads to name2, which is an alias of name1,
which in its turn leads to name5. The result of my query should be:
name6 leads to name5.

To accomplish this, I wrote a little function ('aliases_of'), as
described below between the hashes.

The query SELECT aliases_of(1);
returns: aliases_of ----------          1          2
, which is the expected output.

The query SELECT * FROM mapping_table WHERE object1_id IN ( SELECT aliases_of(2) );
returns: object1_id | object1_name | object2_id | object2_name ----------------------------------------------------- 1
       | name1        | 5          | name5 
, which is the expected output, because it is equivalent to: SELECT * FROM mapping_table WHERE object1_id IN ( 1, 2 );

However, the query that would traverse the data: SELECT m1.object1_id, m1.object1_name, m2.object2_id, m2.object2_name
FROMmapping_table m1, mapping_table m2 WHERE m1.object2_id IN ( SELECT aliases_of(m2.object1_id) ); 
just keeps on running, and never seems to stop. (I must say that that
table in effect contains about a million records.)
The expected output would have been: object1_id | object1_name | object2_id | object2_name
-----------------------------------------------------6          | name6        | 5          | name5 

I also tried to following, which doesn't work either. SELECT m1.object1_id, m1.object1_name, m2.object2_id,
m2.object2_nameFROM mapping_table m1 JOIN mapping_table m2 ON ( m1.object2_id IN ( 
SELECT aliases_of(m2.object1_id) );

Is there a way to speed up this query? I'd also have to be able to
traverse the mapping_table ever further, following a path of things that
'lead to' other things.

I hope I made myself clear a bit...
Any help appreciated. Many thanks,
jan.

################################################################
(A) TABLE: alias_table object1_id | object1_name | object2_id | object2_name
-----------------------------------------------------1          | name1        | 2          | name2 3          | name3
     | 4          | name4 

(B) TABLE: mapping_table object1_id | object1_name | object2_id | object2_name
-----------------------------------------------------1          | name1        | 5          | name5 6          | name6
     | 2          | name2 

(C) FUNCTION: aliases_of CREATE FUNCTION aliases_of(bigint) RETURNS SETOF bigint AS 'SELECT $1     UNION     SELECT
object1_idFROM alias_table WHERE object2_id = $1     UNION     SELECT object2_id FROM alias_table WHERE object1_id = $1
  ' LANGUAGE SQL; 
################################################################



---------The obligatory disclaimer--------
The information contained in this e-mail (including any attachments) is
confidential and is intended for the use of the addressee only.   The
opinions expressed within this e-mail (including any attachments) are
the opinions of the sender and do not necessarily constitute those of
Roslin Institute (Edinburgh) ("the Institute") unless specifically
stated by a sender who is duly authorised to do so on behalf of the
Institute.


Re: combination of function to simple query makes query slow

От
Jaime Casanova
Дата:
On 9/30/05, jan aerts (RI) <jan.aerts@bbsrc.ac.uk> wrote:
> Hi,
>
> I'm having some problems in combining a function within a standard
> query.
>
> I basically have two tables (see between hashes below):
> * alias_table: objects can have different names (e.g. name1 is an alias
> of name2, and name3 is an alias of name4)
> * mapping_table: shows objects that 'lead to' other objects (e.g. the
> object with name1 leads to the object with name5)
>
> I want to traverse the mapping_table in a transparent way: if a first
> object leads to a second object, and that second object leads to a third
> object, I want to list the first and the third objects. So from the
> example table below: name6 leads to name2, which is an alias of name1,
> which in its turn leads to name5. The result of my query should be:
> name6 leads to name5.
>
> To accomplish this, I wrote a little function ('aliases_of'), as
> described below between the hashes.
>
> The query
>  SELECT aliases_of(1);
> returns:
>  aliases_of
>  ----------
>           1
>           2
> , which is the expected output.
>
> The query
>  SELECT * FROM mapping_table
>  WHERE object1_id IN ( SELECT aliases_of(2) );
> returns:
>  object1_id | object1_name | object2_id | object2_name
>  -----------------------------------------------------
>  1          | name1        | 5          | name5
> , which is the expected output, because it is equivalent to:
>  SELECT * FROM mapping_table
>  WHERE object1_id IN ( 1, 2 );
>
> However, the query that would traverse the data:
>  SELECT m1.object1_id, m1.object1_name, m2.object2_id, m2.object2_name
>  FROM mapping_table m1, mapping_table m2
>  WHERE m1.object2_id IN ( SELECT aliases_of(m2.object1_id) );
> just keeps on running, and never seems to stop. (I must say that that
> table in effect contains about a million records.)
> The expected output would have been:
>  object1_id | object1_name | object2_id | object2_name
>  -----------------------------------------------------
>  6          | name6        | 5          | name5
>
> I also tried to following, which doesn't work either.
>  SELECT m1.object1_id, m1.object1_name, m2.object2_id, m2.object2_name
>  FROM mapping_table m1 JOIN mapping_table m2 ON ( m1.object2_id IN (
> SELECT aliases_of(m2.object1_id) );
>
> Is there a way to speed up this query? I'd also have to be able to
> traverse the mapping_table ever further, following a path of things that
> 'lead to' other things.
>
> I hope I made myself clear a bit...
> Any help appreciated. Many thanks,
> jan.
>
> ################################################################
> (A) TABLE: alias_table
>  object1_id | object1_name | object2_id | object2_name
>  -----------------------------------------------------
>  1          | name1        | 2          | name2
>  3          | name3        | 4          | name4
>
> (B) TABLE: mapping_table
>  object1_id | object1_name | object2_id | object2_name
>  -----------------------------------------------------
>  1          | name1        | 5          | name5
>  6          | name6        | 2          | name2
>
> (C) FUNCTION: aliases_of
>  CREATE FUNCTION aliases_of(bigint) RETURNS SETOF bigint
>  AS 'SELECT $1
>      UNION
>      SELECT object1_id FROM alias_table WHERE object2_id = $1
>      UNION
>      SELECT object2_id FROM alias_table WHERE object1_id = $1
>     '
>  LANGUAGE SQL;
> ################################################################
>
>


maybe you can show us an EXPLAIN  of your select:

EXPLAIN select_statement


--
Atentamente,
Jaime Casanova
(DBA: DataBase Aniquilator ;)


Re: combination of function to simple query makes query slow

От
"jan aerts (RI)"
Дата:
Some more information:

An EXPLAIN of the following query my_db=> explain select m1.object1_id, m1.object2_id, m2.object1_id,
m2.object2_id my_db-> from c_mappings m1, c_mappings m2 my_db-> where m1.object1_id = 16575564 my_db-> and
m2.object1_idin (select aliases_of(m1.object2_id)); 
gives:                                            QUERY PLAN
------------------------------------------------------------------------
----------------------------Nested Loop  (cost=0.00..99746.00 rows=1170281 width=16)  Join Filter: (subplan)  ->  Index
Scanusing ind_cmappings_object1_id on c_mappings m1 
(cost=0.00..6.12 rows=2 width=8)        Index Cond: (object1_id = 16575564)  ->  Seq Scan on c_mappings m2
(cost=0.00..36052.89rows=1435589 
width=8)  SubPlan    ->  Result  (cost=0.00..0.01 rows=1 width=0)
(7 rows)

All columns of c_mappings, as well as the columns that are accessed
through the aliases_of function, as indexed. However, notice how the
second loop uses a "Seq Scan" instead of an "Index Scan".
Is there a way to use an index scan on the results of a function?

Thanks,
jan.

> -----Original Message-----
> From: pgsql-sql-owner@postgresql.org
> [mailto:pgsql-sql-owner@postgresql.org] On Behalf Of jan aerts (RI)
> Sent: 30 September 2005 17:02
> To: pgsql-sql@postgresql.org
> Subject: [SQL] combination of function to simple query makes
> query slow
>
> Hi,
>
> I'm having some problems in combining a function within a
> standard query.
>
> I basically have two tables (see between hashes below):
> * alias_table: objects can have different names (e.g. name1
> is an alias of name2, and name3 is an alias of name4)
> * mapping_table: shows objects that 'lead to' other objects
> (e.g. the object with name1 leads to the object with name5)
>
> I want to traverse the mapping_table in a transparent way: if
> a first object leads to a second object, and that second
> object leads to a third object, I want to list the first and
> the third objects. So from the example table below: name6
> leads to name2, which is an alias of name1, which in its turn
> leads to name5. The result of my query should be:
> name6 leads to name5.
>
> To accomplish this, I wrote a little function ('aliases_of'),
> as described below between the hashes.
>
> The query
>   SELECT aliases_of(1);
> returns:
>   aliases_of
>   ----------
>            1
>            2
> , which is the expected output.
>
> The query
>   SELECT * FROM mapping_table
>   WHERE object1_id IN ( SELECT aliases_of(2) );
> returns:
>   object1_id | object1_name | object2_id | object2_name
>   -----------------------------------------------------
>   1          | name1        | 5          | name5
> , which is the expected output, because it is equivalent to:
>   SELECT * FROM mapping_table
>   WHERE object1_id IN ( 1, 2 );
>
> However, the query that would traverse the data:
>   SELECT m1.object1_id, m1.object1_name, m2.object2_id,
> m2.object2_name
>   FROM mapping_table m1, mapping_table m2
>   WHERE m1.object2_id IN ( SELECT aliases_of(m2.object1_id)
> ); just keeps on running, and never seems to stop. (I must
> say that that table in effect contains about a million
> records.) The expected output would have been:
>   object1_id | object1_name | object2_id | object2_name
>   -----------------------------------------------------
>   6          | name6        | 5          | name5
>
> I also tried to following, which doesn't work either.
>   SELECT m1.object1_id, m1.object1_name, m2.object2_id,
> m2.object2_name
>   FROM mapping_table m1 JOIN mapping_table m2 ON (
> m1.object2_id IN ( SELECT aliases_of(m2.object1_id) );
>
> Is there a way to speed up this query? I'd also have to be
> able to traverse the mapping_table ever further, following a
> path of things that 'lead to' other things.
>
> I hope I made myself clear a bit...
> Any help appreciated. Many thanks,
> jan.
>
> ################################################################
> (A) TABLE: alias_table
>   object1_id | object1_name | object2_id | object2_name
>   -----------------------------------------------------
>   1          | name1        | 2          | name2
>   3          | name3        | 4          | name4
>
> (B) TABLE: mapping_table
>   object1_id | object1_name | object2_id | object2_name
>   -----------------------------------------------------
>   1          | name1        | 5          | name5
>   6          | name6        | 2          | name2
>
> (C) FUNCTION: aliases_of
>   CREATE FUNCTION aliases_of(bigint) RETURNS SETOF bigint
>   AS 'SELECT $1
>       UNION
>       SELECT object1_id FROM alias_table WHERE object2_id = $1
>       UNION
>       SELECT object2_id FROM alias_table WHERE object1_id = $1
>      '
>   LANGUAGE SQL;
> ################################################################
>
>
>
> ---------The obligatory disclaimer-------- The information
> contained in this e-mail (including any attachments) is
> confidential and is intended for the use of the addressee only.   The
> opinions expressed within this e-mail (including any
> attachments) are the opinions of the sender and do not
> necessarily constitute those of Roslin Institute (Edinburgh)
> ("the Institute") unless specifically stated by a sender who
> is duly authorised to do so on behalf of the Institute.
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match
>


Re: combination of function to simple query makes query slow

От
Jaime Casanova
Дата:
On 10/3/05, jan aerts (RI) <jan.aerts@bbsrc.ac.uk> wrote:
> Some more information:
>
> An EXPLAIN of the following query
>  my_db=> explain select m1.object1_id, m1.object2_id, m2.object1_id,
> m2.object2_id
>  my_db-> from c_mappings m1, c_mappings m2
>  my_db-> where m1.object1_id = 16575564
>  my_db-> and m2.object1_id in (select aliases_of(m1.object2_id));
> gives:
>                                             QUERY PLAN
> ------------------------------------------------------------------------
> ----------------------------
>  Nested Loop  (cost=0.00..99746.00 rows=1170281 width=16)
>   Join Filter: (subplan)
>   ->  Index Scan using ind_cmappings_object1_id on c_mappings m1
> (cost=0.00..6.12 rows=2 width=8)
>         Index Cond: (object1_id = 16575564)
>   ->  Seq Scan on c_mappings m2  (cost=0.00..36052.89 rows=1435589
> width=8)
>   SubPlan
>     ->  Result  (cost=0.00..0.01 rows=1 width=0)
> (7 rows)
>
> All columns of c_mappings, as well as the columns that are accessed
> through the aliases_of function, as indexed. However, notice how the
> second loop uses a "Seq Scan" instead of an "Index Scan".
> Is there a way to use an index scan on the results of a function?
>
> Thanks,
> jan.
>


what version is your postgres?

what if you make temp table first? something like this:

select * from c_mappings
where object1_id = 16575564  into temp m1;

select m1.object1_id, m1.object2_id, m2.object1_id,  m2.object2_id from m1, c_mappings m2
where m2.object1_id in (select aliases_of(m1.object2_id));

just an idea...

--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)


Re: combination of function to simple query makes query slow

От
"jan aerts (RI)"
Дата:
My Postgres version is 7.3.4 (on a central server, so I can't upgrade if
that would be one of the suggestions...)

It is indeed completely valid to make such a temporary table, but I need
this function to help me automate some standard queries other people can
make on the database. In other words: (1) the query has to be done in
one go, instead of using 2 different selects, and (2) using the complete
definition of the function in the query itself becomes pretty bloating,
as this function can be used 3 or more times in a single query...

I found out that it has probably something to do with Postgres not able
to use an index scan on this function. Even though I ended up defining
it as 'stable' or even 'immutable'. (Or I may be completely wrong, of
course).
I thought that making a function stable or immutable would make it
available for an index search.

Any additional suggestions?

Thanks,
jan.

> -----Original Message-----
> From: Jaime Casanova [mailto:systemguards@gmail.com]
> Sent: 03 October 2005 20:22
> To: jan aerts (RI)
> Cc: pgsql-sql@postgresql.org
> Subject: Re: [SQL] combination of function to simple query
> makes query slow
>
> On 10/3/05, jan aerts (RI) <jan.aerts@bbsrc.ac.uk> wrote:
> > Some more information:
> >
> > An EXPLAIN of the following query
> >  my_db=> explain select m1.object1_id, m1.object2_id,
> m2.object1_id,
> > m2.object2_id  my_db-> from c_mappings m1, c_mappings m2  my_db->
> > where m1.object1_id = 16575564  my_db-> and m2.object1_id
> in (select
> > aliases_of(m1.object2_id));
> > gives:
> >                                             QUERY PLAN
> >
> ----------------------------------------------------------------------
> > --
> > ----------------------------
> >  Nested Loop  (cost=0.00..99746.00 rows=1170281 width=16)
> >   Join Filter: (subplan)
> >   ->  Index Scan using ind_cmappings_object1_id on c_mappings m1
> > (cost=0.00..6.12 rows=2 width=8)
> >         Index Cond: (object1_id = 16575564)
> >   ->  Seq Scan on c_mappings m2  (cost=0.00..36052.89 rows=1435589
> > width=8)
> >   SubPlan
> >     ->  Result  (cost=0.00..0.01 rows=1 width=0)
> > (7 rows)
> >
> > All columns of c_mappings, as well as the columns that are accessed
> > through the aliases_of function, as indexed. However,
> notice how the
> > second loop uses a "Seq Scan" instead of an "Index Scan".
> > Is there a way to use an index scan on the results of a function?
> >
> > Thanks,
> > jan.
> >
>
>
> what version is your postgres?
>
> what if you make temp table first? something like this:
>
> select * from c_mappings
> where object1_id = 16575564
>    into temp m1;
>
> select m1.object1_id, m1.object2_id, m2.object1_id,  m2.object2_id
>   from m1, c_mappings m2
> where m2.object1_id in (select aliases_of(m1.object2_id));
>
> just an idea...
>
> --
> regards,
> Jaime Casanova
> (DBA: DataBase Aniquilator ;)
>


Re: combination of function to simple query makes query slow

От
Tom Lane
Дата:
"jan aerts (RI)" <jan.aerts@bbsrc.ac.uk> writes:
> My Postgres version is 7.3.4 (on a central server, so I can't upgrade if
> that would be one of the suggestions...)

7.3.4 has multiple known data-loss bugs and security issues.  If you're
dealing with someone who won't upgrade it, find someone else to deal
with.  At the very least they need to move to 7.3.10 (or as of today,
7.3.11).  See
http://developer.postgresql.org/docs/postgres/release-7-3-11.html
and following pages for reasons why.

> I thought that making a function stable or immutable would make it
> available for an index search.

Your problem isn't the function, it's the IN (SELECT ...) construct.
7.3 is not bright enough to optimize that.  (Given that it's a
correlated sub-SELECT, I'm afraid later releases aren't either :-(.)
You need to find a way of expressing the query without that.

My guess is that trying to use a function for this is counterproductive
in itself; the table access that's going on inside the function needs
to be exposed for optimization in order to get reasonable overall
performance.
        regards, tom lane