Обсуждение: getting the ranks out of items with SHARED

Поиск
Список
Период
Сортировка

getting the ranks out of items with SHARED

От
Janning Vygen
Дата:
Hi,

in postgresql you have several possibilites to get the rank of items. A thread
earlier this year shows correlated subqueries (not very performant) and other
tricks and techniques to solve the ranking problem:

http://archives.postgresql.org/pgsql-general/2005-05/msg00157.php

The possibility to use a SHARED variable in plperl can be another nice way to
get a rank of items. ( good example showing SHARED in use is at
http://www.varlena.com/varlena/GeneralBits/114.php)

So i wrote two functions for my purpose:

CREATE OR REPLACE function ranking(int4) RETURNS int4 LANGUAGE 'plperl' AS $$
  my %this;

  $this{'punkte'}  = shift;
  $this{'ranking'} = $_SHARED{'prev'}{'ranking'}; # defaults to 0
  $this{'count'}   = $_SHARED{'prev'}{'count'} + 1; # defaults to 1

  $this{'ranking'} = $this{'count'} unless
    $this{'punkte'} == $_SHARED{'prev'}{'punkte'};

  $_SHARED{'prev'} = \%this;
  return $this{'ranking'};
$$;

CREATE OR REPLACE FUNCTION reset_ranking() RETURNS void LANGUAGE 'plperl' AS
$$
  $_SHARED{'prev'} = undef;
$$;


Nice Thing: the function drops rankings which other ranking solutions in the
given thread can't. Like this:

rank | points
-------------
1    | 10
2    |  9
2    |  9
4    |  8
5    |  7

It drops rank 3 because we have to entries for second rank.

It would be even nice if you can write a genric ranking() function which takes
anyarray as an argument, but as far as i know you can't pass an "anyarray" to
a plperl function, right?


Now i can do the following in plpsql Function which updates a caching table
for me and it works fine:

PERFORM reset_ranking();

CREATE TEMP TABLE ranking AS
SELECT
  *,
  ranking(r1.gc_gesamtpunkte) AS rank
FROM (
  SELECT
    mg_name,
    gc_gesamtpunkte
    FROM temp_gc
    ORDER BY gc_gesamtpunkte DESC, mg_name ASC
) AS r1
;

EXECUTE '
      UPDATE temp_gc
      SET gc_rank = ranking.rank
      FROM ranking
      WHERE temp_gc.mg_name = ranking.mg_name;
';

DROP TABLE ranking;


Problems arrise when you try to do the select and update step together without
any temporary table in between:

PERFORM reset_ranking();

UPDATE temp_gc SET gc_rank = ranking.rank
FROM (
  SELECT
    *,
    ranking(r1.gc_gesamtpunkte) AS rank
  FROM (
    SELECT
      mg_name,
      gc_gesamtpunkte
    FROM temp_gc
    ORDER BY gc_gesamtpunkte DESC, mg_name ASC
  ) AS r1
) AS ranking
WHERE temp_gc.mg_name = ranking.mg_name;
';

I have a guess, what happens here: The order of the subselect statement is
dropped by the optimizer because the optimizer doesn't see the "side-effect"
of the ranking function. that's ok because using such functions isn't SQLish,
i guess.

Is there a way to FORCE the optimizer to keep things orders like the sql
statement author wanted it?

kind regards,
janning


Re: getting the ranks out of items with SHARED

От
Tom Lane
Дата:
Janning Vygen <vygen@gmx.de> writes:
> I have a guess, what happens here: The order of the subselect statement is
> dropped by the optimizer because the optimizer doesn't see the "side-effect"
> of the ranking function.

That guess is wrong.

I think the problem is that you are trying to update multiple rows in
the same statement, which would require a "reset ranking" between each
row, which this approach doesn't provide for.

The whole thing looks mighty fragile in other ways; anything involving a
single global variable isn't going to work nicely in very many cases.
Consider casting your solution as an aggregate instead...

            regards, tom lane

postgresql SSL off

От
Tony Smith
Дата:
When I was trying to connect my databse with jdbc, I
got the following error message:

org.postgresql.util.PSQLException: Connection
rejected: FATAL: no pg_hba.conf entry for host
"mydomain", user "xxxx", database "myDB", SSL off.

When I run in dos console "psql myDB..." it works
fine.

My jdbc code is at a different at different domain
from the database machine.

Thanks,




__________________________________
Do you Yahoo!?
Yahoo! Mail - Find what you need with new enhanced search.
http://info.mail.yahoo.com/mail_250

Re: postgresql SSL off

От
"Joshua D. Drake"
Дата:
Tony Smith wrote:
> When I was trying to connect my databse with jdbc, I
> got the following error message:
>
> org.postgresql.util.PSQLException: Connection
> rejected: FATAL: no pg_hba.conf entry for host
> "mydomain", user "xxxx", database "myDB", SSL off.
>

You need to setup your pg_hba.conf to allow remote connections to
the database for the user.

http://www.postgresql.org/docs/7.4/static/client-authentication.html


Sincerely,

Joshua D. Drake



> When I run in dos console "psql myDB..." it works
> fine.
>
> My jdbc code is at a different at different domain
> from the database machine.
>
> Thanks,
>
>
>
>
> __________________________________
> Do you Yahoo!?
> Yahoo! Mail - Find what you need with new enhanced search.
> http://info.mail.yahoo.com/mail_250
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org


--
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/

Re: getting the ranks out of items with SHARED

От
Janning Vygen
Дата:
Hi,

Am Mittwoch, 13. Juli 2005 00:03 schrieb Tom Lane:
> Janning Vygen <vygen@gmx.de> writes:
> > I have a guess, what happens here: The order of the subselect statement
> > is dropped by the optimizer because the optimizer doesn't see the
> > "side-effect" of the ranking function.
>
> That guess is wrong.

ah, and i already thought to be already on a higher level of understanding
postgresql...

> I think the problem is that you are trying to update multiple rows in
> the same statement, which would require a "reset ranking" between each
> row, which this approach doesn't provide for.

no thats not the point, i guess (which might be wrong again)

but i still don't understand why it doesn't work:

this is my important query named *Q* :=

   SELECT
     *,
     ranking(r1.gc_gesamtpunkte, r1.gc_gesamtsiege) AS rank
   FROM (
     SELECT
       mg_name,
       gc_gesamtpunkte,
       gc_gesamtsiege
     FROM temp_gc
     ORDER BY gc_gesamtpunkte DESC, gc_gesamtsiege DESC, mg_name ASC
   ) AS r1

this way it works:

CREATE TEMP TABLE ranking AS *Q*;
EXECUTE 'UPDATE temp_gc SET gc_rank = ranking.rank
FROM ranking WHERE temp_gc.mg_name = ranking.mg_name;';

and this way it doesn't:

UPDATE temp_gc
SET gc_rank = ranking.rank
FROM (*Q*)
ranking
WHERE temp_gc.mg_name = ranking.mg_name;
;

i want to update multiple rows, but the all data in table temp_gc doesnt need
a reset of the ranking.

> The whole thing looks mighty fragile in other ways; anything involving a
> single global variable isn't going to work nicely in very many cases.
> Consider casting your solution as an aggregate instead...

I know that this is not the best solution but it is the fastest. A corrolated
subquery with aggregates takes ages in opposite to the ranking function
solution.

But by the time of writing i have a new problem with my solution posted today
with subject "strange error with temp table: pg_type_typname_nsp_index"

kind regards,
janning

Re: getting the ranks out of items with SHARED

От
Tom Lane
Дата:
Janning Vygen <vygen@gmx.de> writes:
> this way it works:

> CREATE TEMP TABLE ranking AS *Q*;
> EXECUTE 'UPDATE temp_gc SET gc_rank = ranking.rank
> FROM ranking WHERE temp_gc.mg_name = ranking.mg_name;';

> and this way it doesn't:

> UPDATE temp_gc
> SET gc_rank = ranking.rank
> FROM (*Q*)
> ranking
> WHERE temp_gc.mg_name = ranking.mg_name;

It's difficult to be sure without looking at EXPLAIN output, but I would
guess that the second query is being done with a plan that involves
multiple scans of "*Q*", and that's confusing your function.

            regards, tom lane

Re: getting the ranks out of items with SHARED

От
Tom Lane
Дата:
Janning Vygen <vygen@planwerk6.de> writes:
> and this is the combined statement:

>    UPDATE temp_gc
>     SET gc_rank = ranking.rank
>     FROM (

>     SELECT
>       *,
>     ranking(r1.gc_gesamtpunkte, r1.gc_gesamtsiege) AS rank
>     FROM (
>         SELECT
>           mg_name,
>           gc_gesamtpunkte,
>           gc_gesamtsiege
>         FROM temp_gc
>         ORDER BY gc_gesamtpunkte DESC, gc_gesamtsiege DESC, mg_name ASC
>     ) AS r1
>     ) AS ranking
>     WHERE temp_gc.mg_name = ranking.mg_name;


> to me it looks like the call to ranking() is moved from the subquery plan
> "upwards". but i really can't interpret this explain output regarding to
> "where" the ranking funcion is called.

Yeah, I think you are exactly right.  The EXPLAIN output doesn't show
targetlist contents (EXPLAIN VERBOSE would, but not very readably :-(),
but the small difference in the estimated costs seems to indicate that
the function evaluation is not happening at the level of the "subquery
scan" plan node in the combined plan.  This is consistent with the
planner's behavior in general.  It will honor the "order by" in the
sense that the output of the "r1" subquery is delivered to the upper
plan level in that order, but it feels no compunction about flattening
the "ranking" subquery into the outer query, and after that you cannot
be certain about the order in which the executions of the ranking()
function happen.

What you need is to prevent the flattening of the intermediate subquery.
The current favorite technique is to insert a dummy OFFSET clause:

 UPDATE temp_gc
    SET gc_rank = ranking.rank
    FROM (

    SELECT
      *,
    ranking(r1.gc_gesamtpunkte, r1.gc_gesamtsiege) AS rank
    FROM (
        SELECT
          mg_name,
          gc_gesamtpunkte,
          gc_gesamtsiege
        FROM temp_gc
        ORDER BY gc_gesamtpunkte DESC, gc_gesamtsiege DESC, mg_name ASC
    ) AS r1
    OFFSET 0
    ) AS ranking
    WHERE temp_gc.mg_name = ranking.mg_name;

although you could also do it by inserting an ORDER BY at that level.

BTW, there's been some discussion of preventing flattening of subqueries
whose SELECT lists contain volatile functions.  If we did that then
declaring ranking() as volatile would be enough to avoid the problem.
I've been hesitant to make the change because I'm concerned about the
probable loss of optimization in cases where the function is labeled
volatile by default, merely because the author didn't think about how
to mark it.  And in any case "volatile" doesn't really describe the
issue with your function...

> my problem is that getting a rank out of items is very expensive with
> aggregate functions,

I'm unconvinced --- I don't see a reason that you can't implement it
exactly the same way as an aggregate.  The only difference is that the
state value is an aggregate state value instead of a global variable.
There might be some extra palloc overhead, but nothing worse.

            regards, tom lane

Re: getting the ranks out of items with SHARED

От
Janning Vygen
Дата:
Am Mittwoch, 13. Juli 2005 15:35 schrieb Tom Lane:
> Janning Vygen <vygen@gmx.de> writes:
> > this way it works:
> >
> > CREATE TEMP TABLE ranking AS *Q*;
> > EXECUTE 'UPDATE temp_gc SET gc_rank = ranking.rank
> > FROM ranking WHERE temp_gc.mg_name = ranking.mg_name;';
> >
> > and this way it doesn't:
> >
> > UPDATE temp_gc
> > SET gc_rank = ranking.rank
> > FROM (*Q*)
> > ranking
> > WHERE temp_gc.mg_name = ranking.mg_name;
>
> It's difficult to be sure without looking at EXPLAIN output, but I would
> guess that the second query is being done with a plan that involves
> multiple scans of "*Q*", and that's confusing your function.
>
>             regards, tom lane

here you are. both versions with explain output
first version creates temp table (explain no 1) and updates afterwards
(explain no. 2).
second version combines both (explain no.3 )

[whats the best way to post explain output? My mailclient wraps the output. i
hope it is still readable]

no 1 ***** first create temp table *****
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Subquery Scan r1  (cost=58.54..70.99 rows=830 width=44) (actual
time=0.186..0.403 rows=7 loops=1)
   ->  Sort  (cost=58.54..60.62 rows=830 width=44) (actual time=0.106..0.137
rows=7 loops=1)
         Sort Key: gc_gesamtpunkte, gc_gesamtsiege, mg_name
         ->  Seq Scan on temp_gc  (cost=0.00..18.30 rows=830 width=44) (actual
time=0.015..0.052 rows=7 loops=1)
 Total runtime: 0.470 ms
(5 Zeilen)

no 2 **** update statement references temp table ****
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=127.70..193.49 rows=4109 width=70) (actual
time=0.221..0.404 rows=7 loops=1)
   Merge Cond: ("outer".mg_name = "inner".mg_name)
   ->  Sort  (cost=58.54..60.62 rows=830 width=66) (actual time=0.110..0.142
rows=7 loops=1)
         Sort Key: temp_gc.mg_name
         ->  Seq Scan on temp_gc  (cost=0.00..18.30 rows=830 width=66) (actual
time=0.013..0.055 rows=7 loops=1)
   ->  Sort  (cost=69.16..71.63 rows=990 width=36) (actual time=0.089..0.119
rows=7 loops=1)
         Sort Key: ranking.mg_name
         ->  Seq Scan on ranking  (cost=0.00..19.90 rows=990 width=36) (actual
time=0.006..0.042 rows=7 loops=1)
 Total runtime: 0.525 ms
(9 Zeilen)

no 3 **** combined update statement ****
                                                         QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=167.70..232.14 rows=3445 width=78) (actual
time=0.455..0.774 rows=7 loops=1)
   Merge Cond: ("outer".mg_name = "inner".mg_name)
   ->  Sort  (cost=58.54..60.62 rows=830 width=66) (actual time=0.111..0.142
rows=7 loops=1)
         Sort Key: temp_gc.mg_name
         ->  Seq Scan on temp_gc  (cost=0.00..18.30 rows=830 width=66) (actual
time=0.016..0.057 rows=7 loops=1)
   ->  Sort  (cost=109.16..111.23 rows=830 width=44) (actual time=0.248..0.281
rows=7 loops=1)
         Sort Key: r1.mg_name
         ->  Subquery Scan r1  (cost=58.54..68.92 rows=830 width=44) (actual
time=0.102..0.201 rows=7 loops=1)
               ->  Sort  (cost=58.54..60.62 rows=830 width=44) (actual
time=0.092..0.125 rows=7 loops=1)
                     Sort Key: gc_gesamtpunkte, gc_gesamtsiege, mg_name
                     ->  Seq Scan on temp_gc  (cost=0.00..18.30 rows=830
width=44) (actual time=0.008..0.045 rows=7 loops=1)
 Total runtime: 0.886 ms
(12 Zeilen)

and this is the combined statement:

   UPDATE temp_gc
    SET gc_rank = ranking.rank
    FROM (

    SELECT
      *,
    ranking(r1.gc_gesamtpunkte, r1.gc_gesamtsiege) AS rank
    FROM (
        SELECT
          mg_name,
          gc_gesamtpunkte,
          gc_gesamtsiege
        FROM temp_gc
        ORDER BY gc_gesamtpunkte DESC, gc_gesamtsiege DESC, mg_name ASC
    ) AS r1
    ) AS ranking
    WHERE temp_gc.mg_name = ranking.mg_name;


to me it looks like the call to ranking() is moved from the subquery plan
"upwards". but i really can't interpret this explain output regarding to
"where" the ranking funcion is called.

my problem is that getting a rank out of items is very expensive with
aggregate functions, so i try to do a trick here which is not very relational
indeed and this of course can lead to trouble.

Maybe i have to rethink the whole stuff. (because i get some temp table errors
anyway as mentioned in the other thread)

Kind regards,
Janning Vygen