Re: Tricky math in PSQL

Поиск
Список
Период
Сортировка
От Kip Warner
Тема Re: Tricky math in PSQL
Дата
Msg-id 1461740984.28754.14.camel@thevertigo.com
обсуждение исходный текст
Ответ на Re: Tricky math in PSQL  (Merlin Moncure <mmoncure@gmail.com>)
Список pgsql-novice
On Tue, 2016-04-26 at 09:13 -0500, Merlin Moncure wrote:
> I'm not going to work out the whole problem for you, but you probably
> want to start looking in the direction custom aggregate function.
> Most people are not aware of this but custom aggregates can be
> applied over a window.  The state can be a composite type and thus
> hold arbitrarily complex information that you can maintain as you
> walk through the set (this works great for time series and things
> like that).
>
> Barring that, if you are still looking for solutions to this type of
> problem in the database, I'd point you in the direction of pl/r.  The
> learning curve is somewhat steep but rewards mastery.
>
> CREATE AGGREGATE RateChange(NUMERIC, NUMERIC, INT)
> (
>   SFUNC = AggRateChange,
>   STYPE = NUMERIC
> );
>
> CREATE OR REPLACE FUNCTION AggRateChange(
>   State NUMERIC,
>   Seed NUMERIC,
>   RateChange NUMERIC,
>   NDigits INT) RETURNS NUMERIC AS
> $$
>   SELECT ROUND(COALESCE($1, $2) * (1.0 + $3), NDigits);
> $$ LANGUAGE SQL IMMUTABLE;
>
> select ratechange(100, (random() * 0.05)::numeric, 3) over(order by
> s)
> from generate_series(1,10) s;
>  ratechange
> ────────────
>     101.524
>     104.570
>     104.674
>     106.599
>     111.685
>     114.583
>     120.174
>     125.561
>     130.291

Hey Merlin,

Thank you for your thoughtful suggestions. I spent the better part of
the day wrestling with some refactored schema and queries with a great
deal of help from the kind folks in #postgresql. This is what I've got
now, albeit without custom aggregate functions:

    http://pastebin.com/zwWmYMd0

As you can see, it does appear to work, but it's a problem of doing so
efficiently now.  The solution I have around line 192 to 208 performs
total_diff() for a given id against all others, but with some constants
removed. It does work, but for a statistic table of 1,000,000 rows and
a statistics_child table of 101,948,612, the query takes approximately
4m 11s on my machine.

I'd like to improve this. I wonder if providing indexes on some of the
columns in the two views would be helpful?

--
Kip Warner -- Senior Software Engineer
OpenPGP encrypted/signed mail preferred
http://www.thevertigo.com


Вложения

В списке pgsql-novice по дате отправления:

Предыдущее
От: Merlin Moncure
Дата:
Сообщение: Re: Tricky math in PSQL
Следующее
От: Hassan Schroeder
Дата:
Сообщение: Logging connections, not queries