Discussion: transposing data for a view
I have a generalized table:

scanid | region | volume
-------------------------
1        A        34.4
1        B        32.1
1        C        29.1
2        A        32.4
2        B        33.2
2        C        35.6
.
.
.

I want to create a flattened out view that looks like the following:

scanid | A_volume | B_volume | C_volume
----------------------------------------
1        34.4       32.1       29.1
2        32.4       33.2       35.6
.
.
.

How do I correctly/efficiently construct a psql query to pivot/transpose the data? I am using PostgreSQL version 7.0.x

thanks,
Jeremy
----- Original Message -----
From: "H Jeremy Bockholt" <jeremy@wundt.psychiatry.uiowa.edu>
To: <pgsql-sql@postgresql.org>
Sent: Tuesday, October 30, 2001 7:36 PM
Subject: [SQL] transposing data for a view

> I have a generalized table:
>
> scanid | region | volume
> -------------------------
> 1        A        34.4
> 1        B        32.1
> 1        C        29.1
> 2        A        32.4
> 2        B        33.2
> 2        C        35.6
>
> I want to create a flattened out view that looks like the following:
>
> scanid | A_volume | B_volume | C_volume
> ----------------------------------------
> 1        34.4       32.1       29.1
> 2        32.4       33.2       35.6
>
> How do I correctly/efficiently construct a psql query to
> pivot/transpose the data? I am using postgreSQL version 7.0.x

SELECT A.scanid, A.volume AS A_volume, B.volume AS B_volume, C.volume AS C_volume
FROM table A
JOIN table B ON (A.scanid = B.scanid)
JOIN table C ON (B.scanid = C.scanid)
WHERE A.region = 'A' AND B.region = 'B' AND C.region = 'C'

- James
Jeremy,

First, to do a pivot table, you have to be using Postgres 7.1.x. 7.0.x will not do it. So upgrade now.

> I want to create a flattened out view that looks like the following:
>
> scanid | A_volume | B_volume | C_volume
> ----------------------------------------
> 1        34.4       32.1       29.1
> 2        32.4       33.2       35.6

There are two approaches, the simple approach and the complex. The simple approach requires you to know in advance of building the view all of the possible values for your category column. The complex approach, which is dynamic, requires a rather sophisticated function (which I will write eventually, really!) so we won't go into it here.

The simple approach is to create each column as a sub-select in the FROM clause of your statement. So, per the example above:

SELECT scanid, A_volume, B_volume, C_volume
FROM (SELECT scanid FROM volumes GROUP BY scanid) scan
LEFT OUTER JOIN
  (SELECT scanid, volume as A_volume FROM volumes WHERE region = 'A') av
  ON scan.scanid = av.scanid LEFT OUTER JOIN
  (SELECT scanid, volume as B_volume FROM volumes WHERE region = 'B') bv
  ON scan.scanid = bv.scanid LEFT OUTER JOIN
  (SELECT scanid, volume as C_volume FROM volumes WHERE region = 'C') cv
  ON scan.scanid = cv.scanid
ORDER BY scanid;

This approach can be adapted to include aggregates and the like.

-Josh Berkus

______AGLIO DATABASE SOLUTIONS___________________________
Josh Berkus
Complete information technology      josh@agliodbs.com
and data management solutions        (415) 565-7293
for law firms, small businesses      fax 621-2533
and non-profit organizations.        San Francisco
I think this might do the trick...
CREATE FUNCTION hori_view() RETURNS BOOLEAN AS '
DECLARE
  view_select TEXT;
  view_from TEXT;
  view_where TEXT;
  column_name TEXT;
  last_column_name TEXT;
  g_row generalized_table%ROWTYPE;
BEGIN
  SELECT region INTO column_name
    FROM generalized_table ORDER BY region LIMIT 1;
  view_select := ''SELECT '' || column_name || ''.scanid, '' ||
    column_name || ''.volume AS "'' || column_name || ''_volume"'';
  view_from := '' FROM generalized_table '' || column_name;
  view_where := '' WHERE '' || column_name ||
    ''.region = '''''' || column_name || '''''''';
  last_column_name := column_name;
  FOR g_row IN SELECT DISTINCT ON (region) *
    FROM generalized_table ORDER BY region OFFSET 1
  LOOP
    view_select := view_select || '', '' || g_row.region ||
      ''.volume AS "'' || g_row.region || ''_volume"'';
    view_from := view_from || '' JOIN generalized_table '' ||
      g_row.region || '' ON ('' || last_column_name ||
      ''.scanid = '' || g_row.region || ''.scanid)'';
    view_where := view_where || '' AND '' || g_row.region ||
      ''.region = '''''' || g_row.region || '''''''';
    last_column_name := g_row.region;
  END LOOP;
  EXECUTE ''CREATE VIEW generalized_view AS '' || view_select ||
    view_from || view_where;
  RETURN TRUE;
END;
' LANGUAGE 'plpgsql';
SELECT hori_view();
SELECT * FROM generalized_view;
Ok, it may not be pretty but it works,
Regards,
Aasmund.
On Wed, 31 Oct 2001 12:42:10 -0800, "Josh Berkus" <josh@agliodbs.com> wrote:
> Jeremy,
>
> First, to do a pivot table, you have to be using Postgres 7.1.x. 7.0.x
> will not do it. So upgrade now.
>
>
> There are two approaches, the simple approach and the complex. The
> simple approach requires you to know in advance of building the view all
> of the possible values for your category column. The complex approach,
> which is dynamic, requires a rather sophisticated function (which I will
> write eventually, really!) so we won't go into it here.
>
> The simple approach is to create each column as a sub-select in the FROM
> clause of your statement. So, per the example above:
>
> SELECT scanid, A_volume, B_volume, C_volume
> FROM (SELECT scanid FROM volumes GROUP BY scanid) scan
> LEFT OUTER JOIN
> (SELECT scanid, volume as A_volume FROM volumes WHERE region = 'A') av
> ON scan.scanid = av.scanid LEFT OUTER JOIN
> (SELECT scanid, volume as B_volume FROM volumes WHERE region = 'B') bv
> ON scan.scanid = bv.scanid LEFT OUTER JOIN
> (SELECT scanid, volume as C_volume FROM volumes WHERE region = 'C') cv
> ON scan.scanid = cv.scanid
> ORDER BY scanid;
>
> This approach can be adapted to include aggregates and the like.
>
> -Josh Berkus
>
> ______AGLIO DATABASE SOLUTIONS___________________________
> Josh Berkus
> Complete information technology josh@agliodbs.com
> and data management solutions (415) 565-7293
> for law firms, small businesses fax 621-2533
> and non-profit organizations. San Francisco
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
Aasmund Midttun Godal
aasmund@godal.com - http://www.godal.com/
+47 40 45 20 46
Hi,

I see how your idea could work; the only thing I would change would be that region needs to be scanid on the first line. However, I'm using PostgreSQL v7.03, so I can not do subqueries within the FROM clause.

Other than creating a temporary table, is there a way to restructure this solution to get around this limitation?

thanks,
jeremy

Arian Prins <prinsarian@zonnet.nl> wrote in message news:<3BDFAD88.6BB058DA@zonnet.nl>...
> H Jeremy Bockholt schreef:
>
> > I have a generalized table:
> >
> > scanid | region | volume
> > -------------------------
> > 1        A        34.4
> > 1        B        32.1
> > 1        C        29.1
> > 2        A        32.4
> > 2        B        33.2
> > 2        C        35.6
> >
> > I want to create a flattened out view that looks like the following:
> >
> > scanid | A_volume | B_volume | C_volume
> > ----------------------------------------
> > 1        34.4       32.1       29.1
> > 2        32.4       33.2       35.6
> >
> > How do I correctly/efficiently construct a psql query to
> > pivot/transpose the data? I am using postgreSQL version 7.0.x
> >
> > thanks,
> > Jeremy
>
> Try This:
>
> select region, sum(a_volume) AS a_volume,
>        sum(b_volume) AS b_volume,
>        sum(c_volume) AS c_volume
> from (
>   select scanid, volume AS a_volume, 0 AS b_volume, 0 AS c_volume
>   from mytable where region = A
>   UNION
>   select scanid, 0 AS a_volume, volume AS b_volume, 0 AS c_volume
>   from mytable where region = B
>   UNION
>   select scanid, 0 AS a_volume, 0 AS b_volume, volume AS c_volume
>   from mytable where region = C
> ) tmp
>
> (you might have to specifically typecast the zero's)
>
> It would probably also be possible using CASE-statements. This is just
> _one_ idea.
>
> Arian.
H Jeremy Bockholt schreef:

> I have a generalized table:
>
> scanid | region | volume
> -------------------------
> 1        A        34.4
> 1        B        32.1
> 1        C        29.1
> 2        A        32.4
> 2        B        33.2
> 2        C        35.6
>
> I want to create a flattened out view that looks like the following:
>
> scanid | A_volume | B_volume | C_volume
> ----------------------------------------
> 1        34.4       32.1       29.1
> 2        32.4       33.2       35.6
>
> How do I correctly/efficiently construct a psql query to
> pivot/transpose the data? I am using postgreSQL version 7.0.x
>
> thanks,
> Jeremy

Try This:

select region, sum(a_volume) AS a_volume,
       sum(b_volume) AS b_volume,
       sum(c_volume) AS c_volume
from (
  select scanid, volume AS a_volume, 0 AS b_volume, 0 AS c_volume
  from mytable where region = A
  UNION
  select scanid, 0 AS a_volume, volume AS b_volume, 0 AS c_volume
  from mytable where region = B
  UNION
  select scanid, 0 AS a_volume, 0 AS b_volume, volume AS c_volume
  from mytable where region = C
) tmp

(you might have to specifically typecast the zero's)

It would probably also be possible using CASE-statements. This is just _one_ idea.

Arian.
Jeff,
> SELECT
> scanid,
> sum ( CASE WHEN region = 'A' THEN volume ELSE NULL
> )
> AS A_volume,
> sum ( CASE WHEN region = 'B' THEN volume ELSE NULL
> )
> AS B_volume,
> sum ( CASE WHEN region = 'C' THEN volume ELSE NULL
> )
> AS C_volume
> FROM table
> GROUP BY scanid;
Hey, that's an elegant solution to doing it in 7.0.3. I hadn't thought
of it. Jeremy, never mind what I said about being forced to upgrade.
Upgrading *would* still be a good idea, of course.
Of course, it's only *half* a solution. Your query will result in:
scanid A_volume B_volume C_volume
1 34.5
1 55.1
1 12.3
2 11.1
etc.
For the second half of the solution, Jeremy needs to create the above as
a view ('volume_rollup_1') and apply this second view:
SELECT scanid, SUM(A_volume) as A_volume, SUM(B_Volume) as B_Volume,
SUM(C_volume) as C_volume
FROM volume_rollup_1
GROUP BY scanid;
This will give Jeremy the "pivot" grid he's looking for.
> BTW, I don't believe the self-join approach proposed
> earlier will work, because joining on "scanid" will
> create a cartesian type join where the region values
> will be duplicated (multiplicated!).
Not if you're talking about my query, they won't. I use that query form
in many projects to create roll-ups; it's the "best" SQL92 approach to
the "pivot table" problem. However, it will not work in 7.0.3.
-Josh
______AGLIO DATABASE SOLUTIONS___________________________
Josh Berkus
Complete information technology josh@agliodbs.com
and data management solutions (415) 565-7293
for law firms, small businesses fax 621-2533
and non-profit organizations. San Francisco
Jeremy,
> I see how your idea could work--the only thing I would change would
> be region needs to be scanid on the first line; however,
> I'm using PostgreSQL v7.03, so I can not do subqueries within FROM
> clause.
>
> Other than creating a temporary table, is there a way to
> restructure this solution to get around this limitation?
No, you need to upgrade. What's the obstacle to using 7.1.3, anyway?
-Josh
______AGLIO DATABASE SOLUTIONS___________________________
Josh Berkus
Complete information technology josh@agliodbs.com
and data management solutions (415) 565-7293
for law firms, small businesses fax 621-2533
and non-profit organizations. San Francisco
How about:
SELECT scanid, sum ( CASE WHEN region = 'A' THEN volume ELSE NULL
) AS A_volume, sum ( CASE WHEN region = 'B' THEN volume ELSE NULL
) AS B_volume, sum ( CASE WHEN region = 'C' THEN volume ELSE NULL
) AS C_volume
FROM table
GROUP BY scanid;
Requires that you know in advance the range of region
values.
A bit shorter than some of the other approaches? :-)
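For readers following along outside SQL, the one-pass pivot that the CASE query performs can be sketched in Python; this is only an illustration, with the sample rows copied from the example at the top of the thread:

```python
# One-pass pivot mirroring SUM(CASE WHEN region = 'X' THEN volume END):
# each input row fills exactly one output column, named by its region.
rows = [  # (scanid, region, volume), from the thread's example table
    (1, 'A', 34.4), (1, 'B', 32.1), (1, 'C', 29.1),
    (2, 'A', 32.4), (2, 'B', 33.2), (2, 'C', 35.6),
]
regions = ('A', 'B', 'C')  # must be known in advance, as with the CASE query

pivot = {}
for scanid, region, volume in rows:
    row = pivot.setdefault(scanid, dict.fromkeys(regions))  # columns start as NULL
    # Like SUM over the CASE arms: accumulate into the matching column only.
    row[region] = volume if row[region] is None else row[region] + volume

print(pivot[1])  # {'A': 34.4, 'B': 32.1, 'C': 29.1}
```

As in the SQL version, a region absent for some scanid simply stays NULL (None) in that row.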
BTW, I don't believe the self-join approach proposed
earlier will work, because joining on "scanid" will
create a cartesian type join where the region values
will be duplicated (multiplicated!).
--- Aasmund Midttun Godal <postgresql@envisity.com>
wrote:
> I think this might do the trick...
>
> CREATE FUNCTION hori_view() RETURNS BOOLEAN AS '
> DECLARE
> view_select TEXT;
> view_from TEXT;
> view_where TEXT;
> column_name TEXT;
> last_column_name TEXT;
> g_row generalized_table%ROWTYPE;
> BEGIN
> SELECT region INTO column_name
> FROM generalized_table ORDER BY region LIMIT 1;
> view_select := ''SELECT '' || column_name ||
> ''.scanid, '' || column_name || ''.volume AS "''
> ||
> column_name || ''_volume"'';
> view_from := '' FROM generalized_table '' ||
> column_name;
> view_where := '' WHERE '' ||column_name ||
> ''.region = '''''' || column_name || '''''''';
> last_column_name := column_name;
> FOR g_row IN SELECT DISTINCT ON (region) *
> FROM generalized_table ORDER BY region OFFSET 1
> LOOP
> view_select := view_select || '', '' ||
> g_row.region ||
> ''.volume AS "'' || g_row.region || ''_volume"'';
> view_from := view_from || '' JOIN
> generalized_table '' ||
> g_row.region || '' ON ('' || last_column_name ||
> ''.scanid = '' || g_row.region || ''.scanid)'';
> view_where := view_where || '' AND '' ||
> g_row.region ||
> ''.region = '''''' || g_row.region || '''''''';
> last_column_name := g_row.region;
> END LOOP;
> EXECUTE ''CREATE VIEW generalized_view AS '' ||
> view_select ||
> view_from || view_where;
> RETURN TRUE;
> END;
> ' LANGUAGE 'plpgsql';
>
> SELECT hori_view();
>
> SELECT * FROM generalized_view;
>
> Ok, it may not be pretty but it works,
>
> Regards,
>
> Aasmund.
>
> On Wed, 31 Oct 2001 12:42:10 -0800, "Josh Berkus"
> <josh@agliodbs.com> wrote:
> > Jeremy,
> >
> > First, to do a pivot table, you have to be using Postgres 7.1.x. 7.0.x
> > will not do it. So upgrade now.
> >
> > There are two approaches, the simple approach and the complex. The
> > simple approach requires you to know in advance of building the view all
> > of the possible values for your category column. The complex approach,
> > which is dynamic, requires a rather sophisticated function (which I will
> > write eventually, really!) so we won't go into it here.
> >
> > The simple approach is to create each column as a sub-select in the FROM
> > clause of your statement. So, per the example above:
> >
> > SELECT scanid, A_volume, B_volume, C_volume
> > FROM (SELECT scanid FROM volumes GROUP BY scanid) scan
> > LEFT OUTER JOIN
> > (SELECT scanid, volume as A_volume FROM volumes WHERE region = 'A') av
> > ON scan.scanid = av.scanid LEFT OUTER JOIN
> > (SELECT scanid, volume as B_volume FROM volumes WHERE region = 'B') bv
> > ON scan.scanid = bv.scanid LEFT OUTER JOIN
> > (SELECT scanid, volume as C_volume FROM volumes WHERE region = 'C') cv
> > ON scan.scanid = cv.scanid
> > ORDER BY scanid;
> >
> > This approach can be adapted to include aggregates and the like.
> >
> > -Josh Berkus
> >
> > ______AGLIO DATABASE SOLUTIONS___________________________
> > Josh Berkus
> > Complete information technology      josh@agliodbs.com
> > and data management solutions        (415) 565-7293
> > for law firms, small businesses      fax 621-2533
> > and non-profit organizations.        San Francisco
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 4: Don't 'kill -9' the postmaster
>
> Aasmund Midttun Godal
>
> aasmund@godal.com - http://www.godal.com/
> +47 40 45 20 46
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
__________________________________________________
Do You Yahoo!?
Make a great connection at Yahoo! Personals.
http://personals.yahoo.com
Josh,

I don't see how you got the result you quote below. My query sums the volume figures, grouping by scanid: that worked as expected (one line per scanid) on my system when I tested it (version 7.1.2, though I don't think that matters).

Jeff

--- Josh Berkus <josh@agliodbs.com> wrote:
> Of course, it's only *half* a solution. Your query
> will result in:
>
> scanid A_volume B_volume C_volume
> 1 34.5
> 1 55.1
> 1 12.3
> 2 11.1
> etc.
Jeff,
> I don't see how you got the result you quote below.
> My query sums the volume figures, grouping by scanid:
> that worked as expected (one line per scanid) on my
> system when I tested it (version 7.1.2, though I don't
> think that matters).
Lemme try it ....
Yes, you're right. Sorry! Disregard my commentary about the second
view.
-Josh
______AGLIO DATABASE SOLUTIONS___________________________
Josh Berkus
Complete information technology josh@agliodbs.com
and data management solutions (415) 565-7293
for law firms, small businesses fax 621-2533
and non-profit organizations. San Francisco
> > BTW, I don't believe the self-join approach proposed
> > earlier will work, because joining on "scanid" will
> > create a cartesian type join where the region values
> > will be duplicated (multiplicated!).
>
> Not if you're talking about my query, they won't. I use that query form
> in many projects to create roll-ups; it's the "best" SQL92 approach to
> the "pivot table" problem. However, it will not work in 7.0.3.

I think he might be talking about mine. The region values will not be duplicated, the WHERE clause prevents it.

I kind of prefer my own query aesthetically, is it as efficient internally?

- James
--- James Orr <james@lrgmail.com> wrote:
> I think he might be talking about mine. The region
> values will not be duplicated, the WHERE clause prevents it.

If you are saying that I didn't read the original query closely enough, you're probably right. Unfortunately I deleted the original message, so I can't check that.

> I kind of prefer my own query aesthetically, is it
> as efficient internally?

Someone more knowledgeable will have to answer that: though I would guess that working with three joined tables would slow things down somewhat.

> - James
(apologies if this was sent twice)
Hello all. I have an academic project that I’m working on and, as I’m relatively new to optimization techniques and database design, I’m wondering if some of you can give me some pointers. Below is the schema (to model baseball statistics), and I’m pretty much stuck with it at this point. If anyone has any suggestions involving changing the schema, I’d appreciate hearing them just for future reference…
There are several interesting queries that I’ve been constructing just to get a feel for this schema, and given that some of these tables contain 100,000-200,000 tuples, some queries are taking a good 5-10 seconds to execute. I’m just wondering if this is simply the fault of my schema or are these queries poorly constructed? Here are some queries I’ve been trying:
Here’s a query for the top ten all time home run leaders:
select P.first_name, P.last_name, S.player_id, sum(B.hr) as hr, (sum(B.h)::float / sum(B.ab)::float) ab
from statistics S, batting_stats B, players P
where S.id=B.id and S.player_id=P.id and B.ab>0
group by P.first_name, P.last_name, S.player_id
order by hr desc limit 10;
Select the 10 highest batting averages of all time where the batter had 600 or more at bats. Also gather the name, year, team, hits, at bats...
select (B.h::float / B.ab) as avg, B.h, B.ab, S.year, P.first_name, P.last_name, T.city, T.name
from batting_stats B, statistics S, players P, Teams T
where B.ab > 600 and S.id=B.id and S.player_id=P.id and S.team_id=T.id
order by avg desc limit 10;
To find the names of the single season home run leaders, along with the total number of home runs, the team name/city and the year:
select P.first_name, P.last_name, B.hr, T.name, T.city, S.year from statistics S, batting_stats B, players P, teams T
where (S.id=B.id) and
(S.player_id=P.id) and
(B.hr>30) and
(T.id=S.team_id) limit 10;
You get the idea. These queries take a while. Is this just the way it is or there things that can be done to optimize this?
One separate issue (the reason why the above examples are all about batting statistics) I’m having is representing the innings pitched statistic. The way it is often represented (and the way it is done in this schema) is something like this 123.0 means exactly 123 innings pitched, 123.1 means 123 1/3 innings, and 123.2 means 123 2/3 innings pitched. I’m contemplating the best way to write a function that knows how to sum these values accurately. Is this something that can be done with PL/PGSQL or should I go straight to something like PLPERL? Alternatively, I could research a way to represent fractions in the DB and write a script to convert all values in this column. Any advice here??
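One way to make such sums exact (sketched here in Python, with helper names invented for illustration) is to normalize innings pitched to a whole number of thirds of an inning, do all arithmetic in that unit, and only format back to the .1/.2 notation for display:

```python
# Innings pitched encoded as 123.1 = 123 and 1/3 innings, 123.2 = 123 and 2/3.
# Doing arithmetic in whole thirds sidesteps the decimal encoding entirely.

def to_thirds(ip: float) -> int:
    """123.1 -> 370: whole innings times 3, plus the tenths digit (0, 1 or 2)."""
    whole = int(ip)
    return whole * 3 + round((ip - whole) * 10)

def from_thirds(thirds: int) -> str:
    """370 -> '123.1': integer-divide back into the display notation."""
    return f"{thirds // 3}.{thirds % 3}"

total = sum(to_thirds(ip) for ip in (123.1, 0.2, 6.0))  # 370 + 2 + 18 = 390
print(from_thirds(total))  # 130.0  (123 1/3 + 2/3 + 6 innings exactly)
```

The same idea carries over to PL/pgSQL; the point is only that the summation happens in an integer unit.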
Thanks in advance for any thoughts, comments, suggestions…
-Jeff
create table leagues
(
-- create an integer id field for easier and efficient FK referencing
id serial,
name varchar(50),
first_year integer not null
check (first_year > 1850 and first_year < 2010),
last_year integer
check (last_year > 1850 and last_year < 2010),
primary key(id),
constraint chronological_order
check (last_year >= first_year)
);
create table teams
(
id serial,
name varchar(50) not null,
city varchar(50),
first_year integer not null
check (first_year > 1850 and first_year < 2010),
last_year integer
check (last_year > 1850 and last_year < 2010),
alt_id integer,
league_id integer not null,
primary key(id),
foreign key(alt_id) references teams(id),
foreign key(league_id) references leagues(id)
on delete cascade,
constraint chronological_order
check (last_year >= first_year)
);
create table players
(
id serial,
first_name varchar(30),
last_name varchar(30) not null,
bats char(1) check (bats in ('L','R','S')),
throws char(1) check (throws in ('L','R')),
dob date,
primary key(id)
);
create table statistics
(
id serial,
year integer not null,
g integer,
player_id integer not null,
team_id integer not null,
foreign key(player_id) references players(id)
on delete cascade,
foreign key(team_id) references teams(id)
on delete cascade,
primary key(id)
);
create table managing_stats
(
id integer not null,
w integer,
l integer,
primary key(id),
foreign key(id) references statistics(id)
on delete cascade
);
create table fielding_stats
(
id integer not null,
pos char(5),
po integer,
a integer,
e integer,
dp integer,
primary key(id),
foreign key(id) references statistics(id)
on delete cascade
);
create table batting_stats
(
id integer not null,
ab integer,
r integer,
h integer,
doubles integer,
triples integer,
hr integer,
rbi integer,
sb integer,
cs integer,
bb integer,
so integer,
sh integer,
sf integer,
ibb integer,
hbp integer,
primary key(id),
foreign key(id) references statistics(id)
on delete cascade
);
create table pitching_stats
(
id integer not null,
w integer,
l integer,
gs integer,
cg integer,
sh integer,
sv integer,
ip numeric(5,1),
h integer,
er integer,
hr integer,
bb integer,
so integer,
primary key(id),
foreign key(id) references statistics(id)
on delete cascade
);
On Fri, 9 Nov 2001, Jeff Sack wrote:

> To find the names of the single season home run leaders, along with the
> total number of home runs, the team name/city and the year:
>
> select P.first_name, P.last_name, B.hr, T.name, T.city, S.year from
> statistics S, batting_stats B, players P, teams T
> where (S.id=B.id) and
> (S.player_id=P.id) and
> (B.hr>30) and
> (T.id=S.team_id) limit 10;
>
> You get the idea. These queries take a while. Is this just the way it
> is or there things that can be done to optimize this?

As a starting point, have you run vacuum analyze, and what does explain show for the query? Also, do you have indexes on fields that you're limiting on (like hr, etc...)?

> One separate issue (the reason why the above examples are all about
> batting statistics) I'm having is representing the innings pitched
> statistic. The way it is often represented (and the way it is done in
> this schema) is something like this: 123.0 means exactly 123 innings
> pitched, 123.1 means 123 1/3 innings, and 123.2 means 123 2/3 innings
> pitched. I'm contemplating the best way to write a function that knows
> how to sum these values accurately. Is this something that can be done
> with PL/PGSQL or should I go straight to something like PLPERL?
> Alternatively, I could research a way to represent fractions in the DB
> and write a script to convert all values in this column. Any advice
> here??

You'd probably be best off doing the conversion at insert time into an additional field, assuming that inserts are much less likely than selects on your data.

(col - floor(col)) * (10/3::numeric) seems to get back an appropriate value but is probably reasonably expensive.
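Stephan's conversion expression can be sanity-checked numerically; here is a small Python sketch (the function name is mine, and the parentheses are placed the way the expression seems intended):

```python
import math

def ip_to_true_innings(col: float) -> float:
    """Map the 123.1-style encoding to real innings pitched:
    the tenths digit counts thirds, so (col - floor(col)) * (10/3)
    turns .1 into 1/3 and .2 into 2/3 of an inning."""
    whole = math.floor(col)
    return whole + (col - whole) * (10 / 3)

print(round(ip_to_true_innings(123.2), 4))  # 123.6667
```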
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:

> On Fri, 9 Nov 2001, Jeff Sack wrote:
>
> > To find the names of the single season home run leaders, along with the
> > total number of home runs, the team name/city and the year:
> >
> > select P.first_name, P.last_name, B.hr, T.name, T.city, S.year from
> > statistics S, batting_stats B, players P, teams T
> > where (S.id=B.id) and
> > (S.player_id=P.id) and
> > (B.hr>30) and
> > (T.id=S.team_id) limit 10;
> >
> > You get the idea. These queries take a while. Is this just the way it
> > is or there things that can be done to optimize this?
>
> As a starting point, have you run vacuum analyze and what does
> explain show for the query. Also, do you have indexes on fields that
> you're limiting on (like hr, etc...).
>
> > One separate issue (the reason why the above examples are all about
> > batting statistics) I'm having is representing the innings pitched
> > statistic. The way it is often represented (and the way it is done in
> > this schema) is something like this: 123.0 means exactly 123 innings
> > pitched, 123.1 means 123 1/3 innings, and 123.2 means 123 2/3 innings
> > pitched. I'm contemplating the best way to write a function that knows
> > how to sum these values accurately. Is this something that can be done
> > with PL/PGSQL or should I go straight to something like PLPERL?
> > Alternatively, I could research a way to represent fractions in the DB
> > and write a script to convert all values in this column. Any advice
> > here??
>
> You'd probably be best off doing the conversion at insert time into an
> additional field assuming that inserts are much less likely than
> selects on your data.
>
> (col - floor(col)) * (10/3::numeric) seems to get back an appropriate value
> but is probably reasonably expensive.

Another thought would be to simply store this as an integer that is the number of thirds of an inning that the pitcher pitched. In other words, your 123.1 would be stored as 370.
That would allow you a very easy way to manipulate these values mathematically, and it would be easy to write a simple function to format these values so that your users would see the 123.1 that they expect. Heck, something like:

SELECT (370 / 3)::text || '.' || (370 % 3)::text;

would do exactly what you want. Throw in some indexes like Josh and Stephan suggest and you'll be cooking with gasoline.

You also might want to take a look at Bruce Momjian's article about performance tuning: http://www2.linuxjournal.com/lj-issues/issue88/4791.html

Jason
On Fri, 9 Nov 2001 07:57:41 -0800 (PST)
Stephan Szabo <sszabo@megazone23.bigpanda.com> wrote:
> > On Fri, 9 Nov 2001, Jeff Sack wrote:
> >
> > One separate issue (the reason why the above examples are all about
> > batting statistics) I'm having is representing the innings pitched
> > statistic. The way it is often represented (and the way it is done in
> > this schema) is something like this 123.0 means exactly 123 innings
> > pitched, 123.1 means 123 1/3 innings, and 123.2 means 123 2/3 innings
> > pitched. I'm contemplating the best way to write a function that knows
> > how to sum these values accurately. Is this something that can be done
> > with PL/PGSQL or should I go straight to something like PLPERL?
> > Alternatively, I could research a way to represent fractions in the DB
> > and write a script to convert all values in this column. Any advice
> > here??
>
> You'd probably be best off doing the conversion at insert time into an
> additional field assuming that inserts are much less likely that
> selects on your data.
>
> (col - floor(col)) * (10/3::numeric) seems to get back an appropriate value
> but is probably reasonably expensive.
Hi,

I tried to put your nice idea into the GROUP BY clause, since I wanted to know how expensive it is. There are several players in the players table and about 60k rows in pitching_stats, which has only the columns concerned with the SUM() aggregate function. On my PC with a 333MHz clock, 256M SDRAM, and an ATA33 HDD, the costs of executing the following queries are about 4 sec and 6 sec, respectively. It seems to me that they are reasonable. However, the sort appears to occupy quite a bit of the time.

CREATE TABLE players (
  id serial CONSTRAINT pkey_players PRIMARY KEY,
  first_name varchar(30)
);
CREATE TABLE pitching_stats (
  id integer NOT NULL CONSTRAINT key_pitching_stats REFERENCES players(id),
  ip numeric(5,1)
);
----- 1st version. It costs about 4 sec.

SELECT pl.id, pl.first_name, t.ret
  FROM (SELECT ps.id,
               sum(ps.ip - floor(ps.ip)) % 0.3::numeric
               + floor(sum(ps.ip - floor(ps.ip)) / 0.3::numeric)
               + sum(floor(ps.ip)) AS ret
          FROM pitching_stats AS ps
         GROUP BY ps.id) AS t
 INNER JOIN players AS pl ON (t.id = pl.id);

----- 2nd version. It costs about 6 sec.

SELECT pl.id, pl.first_name,
       sum(ps.ip - floor(ps.ip)) % 0.3::numeric
       + floor(sum(ps.ip - floor(ps.ip)) / 0.3::numeric)
       + sum(floor(ps.ip)) AS ret
  FROM players AS pl
 INNER JOIN pitching_stats AS ps ON (pl.id = ps.id)
 GROUP BY pl.id, pl.first_name;
QUERY PLAN:

----- 1st version.

Hash Join  (cost=6357.01..7197.06 rows=6000 width=32)
  ->  Subquery Scan t  (cost=6355.96..6955.96 rows=6000 width=16)
        ->  Aggregate  (cost=6355.96..6955.96 rows=6000 width=16)
              ->  Group  (cost=6355.96..6505.96 rows=60000 width=16)
                    ->  Sort  (cost=6355.96..6355.96 rows=60000 width=16)
                          ->  Seq Scan on pitching_stats ps  (cost=0.00..983.00 rows=60000 width=16)
  ->  Hash  (cost=1.04..1.04 rows=4 width=16)
        ->  Seq Scan on players pl  (cost=0.00..1.04 rows=4 width=16)

----- 2nd version.

Aggregate  (cost=9037.33..9787.33 rows=6000 width=32)
  ->  Group  (cost=9037.33..9337.33 rows=60000 width=32)
        ->  Sort  (cost=9037.33..9037.33 rows=60000 width=32)
              ->  Hash Join  (cost=1.05..3384.10 rows=60000 width=32)
                    ->  Seq Scan on pitching_stats ps  (cost=0.00..983.00 rows=60000 width=16)
                    ->  Hash  (cost=1.04..1.04 rows=4 width=16)
                          ->  Seq Scan on players pl  (cost=0.00..1.04 rows=4 width=16)
regards,
Masaru Sugawara