Обсуждение: Calculated values

Поиск
Список
Период
Сортировка

Calculated values

От
Camm Maguire
Дата:
Greetings!  What is the 'best way' or 'guiding philosophy' if there is
one for dealing with calculated values in a pg database?  For example,
say you have a table with a column of floats, and you want to make a
view showing this column as a percent of the total across the column.
Should you

a) create another table with the total, and put on triggers on insert,
update, and delete to modify the right total or

b) create the view with a specific subselect to recalculate the total
at select time.  This has the disadvantage that the total seems to be
recalculated for each row.  Is there any sql syntax which can merge a
dynamically generated aggregate, *calculated only once*, into each
output row?

Thanks!


--
Camm Maguire                             camm@enhanced.com
==========================================================================
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah

Re: Calculated values

От
Emmanuel Charpentier
Дата:
Camm Maguire wrote:
>
> Greetings!  What is the 'best way' or 'guiding philosophy' if there is
> one for dealing with calculated values in a pg database?  For example,
> say you have a table with a column of floats, and you want to make a
> view showing this column as a percent of the total across the column.
> Should you
>
> a) create another table with the total, and put on triggers on insert,
> update, and delete to modify the right total or
>
> b) create the view with a specific subselect to recalculate the total
> at select time.  This has the disadvantage that the total seems to be
> recalculated for each row.  Is there any sql syntax which can merge a
> dynamically generated aggregate, *calculated only once*, into each
> output row?

Hmmm ... You want to avoid "remerging", if I follow you ...

Coud you try :

CREATE VIEW my view AS
    SELECT id, partialsum, (partialsum/totalsum) AS percentage
    FROM (SELECT id, SUM(item) AS partialsum GROUP BY id)
        JOIN (SELECT SUM(item) AS totalsum); -- Note : *no* "ON" clause

That way, totalsum should be computed once.

"Caching" computations in a secondary table is a (somewhat hidden) form
of redundancy, and therefore a nice way to ask for trouble ...

Hope this helps ...

--
Emmanuel Charpentier

Re: Calculated values

От
Camm Maguire
Дата:
Greetings, and thanks for your reply!

Emmanuel Charpentier <charpent@bacbuc.dyndns.org> writes:

> Camm Maguire wrote:
> >
> > Greetings!  What is the 'best way' or 'guiding philosophy' if there is
> > one for dealing with calculated values in a pg database?  For example,
> > say you have a table with a column of floats, and you want to make a
> > view showing this column as a percent of the total across the column.
> > Should you
> >
> > a) create another table with the total, and put on triggers on insert,
> > update, and delete to modify the right total or
> >
> > b) create the view with a specific subselect to recalculate the total
> > at select time.  This has the disadvantage that the total seems to be
> > recalculated for each row.  Is there any sql syntax which can merge a
> > dynamically generated aggregate, *calculated only once*, into each
> > output row?
>
> Hmmm ... You want to avoid "remerging", if I follow you ...
>
> Coud you try :
>
> CREATE VIEW my view AS
>     SELECT id, partialsum, (partialsum/totalsum) AS percentage
>     FROM (SELECT id, SUM(item) AS partialsum GROUP BY id)
>         JOIN (SELECT SUM(item) AS totalsum); -- Note : *no* "ON" clause
>

I can't seem to get this syntax to work with pg.  No subselects seem
to be accepted in the from clause, and join doesn't seem to be a
reserved word at all.  But this kind of thing is exactly what I'm
looking for.

Another idea is to write a plpgsql function which returns tuples of a
given table.  Only problem here appears to be that to access the
columns of the result, one has to do something like 'select
id(function),percentage(function),...' which then calls the function
multiple times for each column!

Maybe this is better done outside the db.

Any advice most appreciated!

> That way, totalsum should be computed once.
>
> "Caching" computations in a secondary table is a (somewhat hidden) form
> of redundancy, and therefore a nice way to ask for trouble ...
>
> Hope this helps ...
>
> --
> Emmanuel Charpentier
>
>

--
Camm Maguire                             camm@enhanced.com
==========================================================================
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah

Re: Calculated values

От
Tom Lane
Дата:
Camm Maguire <camm@enhanced.com> writes:
> Emmanuel Charpentier <charpent@bacbuc.dyndns.org> writes:
>> CREATE VIEW my view AS
>> SELECT id, partialsum, (partialsum/totalsum) AS percentage
>> FROM (SELECT id, SUM(item) AS partialsum GROUP BY id)
>> JOIN (SELECT SUM(item) AS totalsum); -- Note : *no* "ON" clause

> I can't seem to get this syntax to work with pg.  No subselects seem
> to be accepted in the from clause, and join doesn't seem to be a
> reserved word at all.

Sounds like you are trying to do it in 7.0 or before.  Emmanuel is
relying on 7.1 features --- and the example won't work as given anyway,
since (a) the subselects neglect to specify source tables; (b) you
have to write CROSS JOIN not JOIN if you want to omit ON/USING.

In 7.0 you could accomplish the same thing with temp tables, or more
straightforwardly by something like

SELECT    id,
    SUM(item) AS partialsum,
    SUM(item) / (SELECT SUM(item) FROM table) AS percentage
FROM table
GROUP BY id

This relies for efficiency on the poorly-documented fact that the
sub-select will only be evaluated once, since it has no dependency
on the state of the outer select.  (You can check this by seeing that
EXPLAIN shows the subselect as an InitPlan not a SubPlan.)

            regards, tom lane

Re: Calculated values

От
Camm Maguire
Дата:
Greetings, and thanks so much for your reply!

Tom Lane <tgl@sss.pgh.pa.us> writes:

> Camm Maguire <camm@enhanced.com> writes:
> > Emmanuel Charpentier <charpent@bacbuc.dyndns.org> writes:
> >> CREATE VIEW my view AS
> >> SELECT id, partialsum, (partialsum/totalsum) AS percentage
> >> FROM (SELECT id, SUM(item) AS partialsum GROUP BY id)
> >> JOIN (SELECT SUM(item) AS totalsum); -- Note : *no* "ON" clause
>
> > I can't seem to get this syntax to work with pg.  No subselects seem
> > to be accepted in the from clause, and join doesn't seem to be a
> > reserved word at all.
>
> Sounds like you are trying to do it in 7.0 or before.  Emmanuel is
> relying on 7.1 features --- and the example won't work as given anyway,
> since (a) the subselects neglect to specify source tables; (b) you
> have to write CROSS JOIN not JOIN if you want to omit ON/USING.
>

Thanks!  Indeed, I'm using 7.0.3.

> In 7.0 you could accomplish the same thing with temp tables, or more
> straightforwardly by something like
>
> SELECT    id,
>     SUM(item) AS partialsum,
>     SUM(item) / (SELECT SUM(item) FROM table) AS percentage
> FROM table
> GROUP BY id


>
> This relies for efficiency on the poorly-documented fact that the
> sub-select will only be evaluated once, since it has no dependency
> on the state of the outer select.  (You can check this by seeing that
> EXPLAIN shows the subselect as an InitPlan not a SubPlan.)

Thanks again.  Alas, the 'explain' for me still shows the subplan, and
I can see why.  Here is my view definition:

create view csp2 as
    SELECT c1.asof, c1.ticker, c1.sp, c1.price AS p1,
    (((100 * float8(c1.sp)) * c1.price) /
        (select sum(price*sp) from sprices,sinfo
            where sprices.sinfo = sinfo.sinfo
            and sprices.asof = c1.asof)) AS wt,
    c2.price AS p2, c2.split, c1.div,
    (100 * c1.ret)
    FROM csp1 c1, csp1 c2,  dates
    WHERE ((((c1.asof = dates.asof))
        AND (c2.asof = dates.nasof))
        AND (c1.id = c2.id));

What is obviously doing this is the 'sprices.asof = c1.asof'
dependency between the inner and outer select.  Trouble is, my only
intention is to be able to use this view with a constant, albeit
'runtime-selectable', 'asof' or date, as in

select * from csp where asof = '20000103';

Any other suggestions?  This dependency issue slows the above query
down significantly, resulting in the calculation of the same sum ~
1000 times.  Do you also agree with the previous respondent that
trying to have a table of sums, updated dynamically with triggers, is
not a good idea?  I'm trying to find the right philosophy to the
design of this db, and am ending up searching for a happy medium
between select speed and insert complexity.


>
>             regards, tom lane
>
>

Thanks again,

--
Camm Maguire                             camm@enhanced.com
==========================================================================
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah

performance...

От
"chris markiewicz"
Дата:
hello.

this might be as much of a general database question as it is a postgres
question...

i have a table with 5 columns...a primary key (integer), three small (10
character) text fields, and one semi-large (1400 characters) text field.
note that only a small percentage (5% ?) of the rows contain 1400 characters
in the 5th column...the other 95% have approx 10 characters.  it has 1100
rows.

the problem is this - queries (command line) often take a very long time -
anywhere from 5-15 seconds - to execute.  the queries use only the primary
key and nothing else in the where clause.  no joins.  a sample query is:

select * from weather where weatherid = 12372;

from the command line, it seems that the first query can take a very long
time but subsequent queries happen quickly ( < 1 sec).  i'm guessing that
this is the result of caching or something.

do the long times make sense?  what can i do to shorten them?  would a
smaller text field help?  i have no reason to think that this would be
faster or slower in another db, so it might be unrelated to postgres itself.

i greatly appreciate your help.

chris


Re: performance...

От
adb
Дата:
First thing to try is to do

explain select * from weather where weatherid = 12372;

and see if it's doing a sequential scan on your table.
If it is, the index and table stats may be out of date
and I think you fix those using vacuum analyze.

One other thing is that if you only have 1100 rows of the sizes
you describe, even a table scan shouldn't take as long as you
say unless the machine is either under serious load or doesn't
have enough memory to run postgres without swapping. (or the disk is
freaking out with io errors)

Alex.

On Mon, 29 Jan 2001, chris markiewicz wrote:

> hello.
>
> this might be as much of a general database question as it is a postgres
> question...
>
> i have a table with 5 columns...a primary key (integer), three small (10
> character) text fields, and one semi-large (1400 characters) text field.
> note that only a small percentage (5% ?) of the rows contain 1400 characters
> in the 5th column...the other 95% have approx 10 characters.  it has 1100
> rows.
>
> the problem is this - queries (command line) often take a very long time -
> anywhere from 5-15 seconds - to execute.  the queries use only the primary
> key and nothing else in the where clause.  no joins.  a sample query is:
>
> select * from weather where weatherid = 12372;
>
> from the command line, it seems that the first query can take a very long
> time but subsequent queries happen quickly ( < 1 sec).  i'm guessing that
> this is the result of caching or something.
>
> do the long times make sense?  what can i do to shorten them?  would a
> smaller text field help?  i have no reason to think that this would be
> faster or slower in another db, so it might be unrelated to postgres itself.
>
> i greatly appreciate your help.
>
> chris
>


Re: performance...

От
Alfred Perlstein
Дата:
* adb <adb@Beast.COM> [010129 13:40] wrote:
>
> One other thing is that if you only have 1100 rows of the sizes
> you describe, even a table scan shouldn't take as long as you
> say unless the machine is either under serious load or doesn't
> have enough memory to run postgres without swapping. (or the disk is
> freaking out with io errors)

Increasing the amount of shared memory for postgresql can help.

RE: performance...

От
"chris markiewicz"
Дата:
vacuum analyze seems to have done the trick...this is the sort of thing that
happens when a non-dba is doing dba work...

sincerest thanks to all that responded!

chris

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org]On Behalf Of adb
Sent: Monday, January 29, 2001 4:36 PM
To: chris markiewicz
Cc: pgsql-general@postgresql.org
Subject: Re: performance...


First thing to try is to do

explain select * from weather where weatherid = 12372;

and see if it's doing a sequential scan on your table.
If it is, the index and table stats may be out of date
and I think you fix those using vacuum analyze.

One other thing is that if you only have 1100 rows of the sizes
you describe, even a table scan shouldn't take as long as you
say unless the machine is either under serious load or doesn't
have enough memory to run postgres without swapping. (or the disk is
freaking out with io errors)

Alex.

On Mon, 29 Jan 2001, chris markiewicz wrote:

> hello.
>
> this might be as much of a general database question as it is a postgres
> question...
>
> i have a table with 5 columns...a primary key (integer), three small (10
> character) text fields, and one semi-large (1400 characters) text field.
> note that only a small percentage (5% ?) of the rows contain 1400
characters
> in the 5th column...the other 95% have approx 10 characters.  it has 1100
> rows.
>
> the problem is this - queries (command line) often take a very long time -
> anywhere from 5-15 seconds - to execute.  the queries use only the primary
> key and nothing else in the where clause.  no joins.  a sample query is:
>
> select * from weather where weatherid = 12372;
>
> from the command line, it seems that the first query can take a very long
> time but subsequent queries happen quickly ( < 1 sec).  i'm guessing that
> this is the result of caching or something.
>
> do the long times make sense?  what can i do to shorten them?  would a
> smaller text field help?  i have no reason to think that this would be
> faster or slower in another db, so it might be unrelated to postgres
itself.
>
> i greatly appreciate your help.
>
> chris
>


Re: performance...

От
Mitch Vincent
Дата:
You have weatherid indexed, right? If not, index it and run VACUUM
ANALYZE -- if so make then sure you've run VACUUM ANALYZE after you
created the index..

Also, you might want to EXPLAIN that query to get the query plan (just
type EXPLAIN <query>)if the above doesn't work. Post the ouput of
EXPLAIN along with the schema of your tables and perhaps someone can help
some more..

-Mitch


On Mon, 29 Jan 2001, chris markiewicz wrote:

> hello.
>
> this might be as much of a general database question as it is a postgres
> question...
>
> i have a table with 5 columns...a primary key (integer), three small (10
> character) text fields, and one semi-large (1400 characters) text field.
> note that only a small percentage (5% ?) of the rows contain 1400 characters
> in the 5th column...the other 95% have approx 10 characters.  it has 1100
> rows.
>
> the problem is this - queries (command line) often take a very long time -
> anywhere from 5-15 seconds - to execute.  the queries use only the primary
> key and nothing else in the where clause.  no joins.  a sample query is:
>
> select * from weather where weatherid = 12372;
>
> from the command line, it seems that the first query can take a very long
> time but subsequent queries happen quickly ( < 1 sec).  i'm guessing that
> this is the result of caching or something.
>
> do the long times make sense?  what can i do to shorten them?  would a
> smaller text field help?  i have no reason to think that this would be
> faster or slower in another db, so it might be unrelated to postgres itself.
>
> i greatly appreciate your help.
>
> chris
>
>