Re: Growth planning

From: Alban Hertroys
Subject: Re: Growth planning
Msg-id: 5F0C1A8D-F199-497A-B7FA-8143E48BB020@gmail.com
In reply to: Growth planning (Israel Brewster <ijbrewster@alaska.edu>)
List: pgsql-general
> On 4 Oct 2021, at 18:22, Israel Brewster <ijbrewster@alaska.edu> wrote:

(…)

> the script owner is talking about wanting to process and pull in “all the
> historical data we have access to”, which would go back several years, not
> to mention the probable desire to keep things running into the foreseeable
> future.

(…)

> - The largest SELECT workflow currently is a script that pulls all available
> data for ONE channel of each station (currently, I suspect that will change
> to all channels in the near future), and runs some post-processing machine
> learning algorithms on it. This script (written in R, if that makes a
> difference) currently takes around half an hour to run, and is run once
> every four hours. I would estimate about 50% of the run time is data
> retrieval and the rest doing its own thing. I am only responsible for
> integrating this script with the database, what it does with the data (and
> therefore how long that takes, as well as what data is needed), is up to my
> colleague. I have this script running on the same machine as the DB to
> minimize data transfer times.

I suspect that a large portion of the time is spent on downloading this data to the R script. Would it help to rewrite it
in PL/R and do (part of) the ML calculations on the DB side?
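
To illustrate the general idea of moving computation to where the data lives, here is a minimal sketch. It is not PL/R and not the actual script: it uses Python's standard-library sqlite3 module as a stand-in for PostgreSQL, and the table name, columns, sample rows and the simple per-station average are all assumptions made up for the example. The point is only that the database-side version sends one row per station to the client instead of every reading.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table of readings (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (station TEXT, channel TEXT, value REAL)")
    rows = [
        ("A", "c1", 1.0), ("A", "c1", 3.0),
        ("B", "c1", 10.0), ("B", "c1", 20.0),
        ("B", "c2", 99.0),
    ]
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    return conn


def mean_client_side(conn, channel):
    """Pull every row for the channel to the client, then aggregate there."""
    sums = {}
    query = "SELECT station, value FROM readings WHERE channel = ?"
    for station, value in conn.execute(query, (channel,)):
        total, count = sums.get(station, (0.0, 0))
        sums[station] = (total + value, count + 1)
    return {station: total / count for station, (total, count) in sums.items()}


def mean_db_side(conn, channel):
    """Let the database aggregate; only one row per station is transferred."""
    query = (
        "SELECT station, AVG(value) FROM readings "
        "WHERE channel = ? GROUP BY station"
    )
    return dict(conn.execute(query, (channel,)).fetchall())


if __name__ == "__main__":
    conn = build_demo_db()
    print(mean_client_side(conn, "c1"))
    print(mean_db_side(conn, "c1"))
```

Both functions return the same per-station result; whether the real ML post-processing can be reduced or partly pre-computed in this way depends on what the script actually does with the data.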

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



