Re: Time bucketing query performance

From: Justin Pryzby
Subject: Re: Time bucketing query performance
Date:
Msg-id: 20180508010857.GI5192@telsasoft.com
In response to: Time bucketing query performance  (Julian Wilson <julianwlsn@gmail.com>)
List: pgsql-performance
On Mon, May 07, 2018 at 07:33:17PM -0400, Julian Wilson wrote:
> Hi,
> 
> I'm trying to debug and improve the performance of my time bucketing query.
> What I'm trying to do is essentially bucket by an arbitrary interval and
> then do some aggregations within that interval (min,max,sum, etc). I am
> using a `max` in the query I posted. For context in the data, it is 1
> minute candles of cryptocurrency data (open price, high price, low price,
> close price, volume, for an interval). I want to transform this to a 10
> minute interval, on demand, and that is what this query is meant to do.
> 
> I understand the slow part of my query is in the LEFT JOIN, but I just
> can't quite figure out how to do it without the LEFT JOIN.
> 
> Here is my pastebin with all the details so I don't clutter the message. I
> tried to follow everything in the 'Slow Query Questions' WIKI page. There
> is also a depesz link there.
> 
> https://ybin.me/p/9d3f52d88b4b2a46#kYLotYpNuIjjbp2P4l3la8fGSJIV0p+opH4sPq1m2/Y=

These may not be a substantial part of the issue, but I have some suggestions:

0) You're using a CTE, which cannot have statistics (unlike a temporary table).  Can you
rewrite it without one, perhaps with GROUP BY date_trunc('hour', time_open)?
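
For example, something along these lines, with no CTE at all (price_high here is
only a stand-in for whatever column you aggregate; the filter values are taken from
your plan and your description; for 10-minute buckets you'd swap date_trunc for an
arithmetic bucket expression):

    SELECT date_trunc('hour', time_open) AS bucket,
           max(price_high)               AS high
    FROM historical_ohlcv
    WHERE exchange_symbol = 'BINANCE'
      AND symbol_id = 'ETHBTC'
      AND time_open >= now() - interval '5 months'
    GROUP BY 1
    ORDER BY 1;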

1) You're querying on start_time AND end_time, and I believe the planner thinks
those conditions are independent, but they're not.  Try getting rid of the
frame_end and move the "5 months" into the main query using BETWEEN or two
ANDed conditions on the same variable.  See if the rowcount estimate is more
accurate:

        ->  Index Scan using historical_ohlcv_pkey on historical_ohlcv ohlcv  (cost=0.56..2488.58 rows=12110 width=22) (actual time=3.709..4.403 rows=9 loops=3625)
              Index Cond: ((exchange_symbol = 'BINANCE'::text) AND (symbol_id = 'ETHBTC'::text) AND (time_open >= g.start_time))
              Filter: (time_close < g.end_time)
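
Roughly what I mean, keeping your LEFT JOIN shape but with both bounds on
time_open and no frame_end (price_high is just a stand-in for the column you
aggregate; the other names are from your plan; the 10 minutes and 5 months are
from your description):

    SELECT g.start_time, max(ohlcv.price_high) AS high
    FROM generate_series(now() - interval '5 months', now(), interval '10 minutes') AS g(start_time)
    LEFT JOIN historical_ohlcv ohlcv
           ON ohlcv.exchange_symbol = 'BINANCE'
          AND ohlcv.symbol_id = 'ETHBTC'
          AND ohlcv.time_open >= g.start_time
          AND ohlcv.time_open <  g.start_time + interval '10 minutes'
    GROUP BY 1
    ORDER BY 1;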

Alternatively, you could try creating extended statistics (the statistics object needs a name; anything will do):
CREATE STATISTICS historical_ohlcv_times (dependencies) ON time_open, time_close FROM historical_ohlcv;
ANALYZE historical_ohlcv;

2) Is your work_mem really the default?  64kB?  Recommend changing it to see if the
plan changes (although it looks like that's not the issue).
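
For example, just for the current session (the value is only a guess; pick
something your RAM allows):

    SET work_mem = '64MB';
    -- then re-run the query in this session and see whether the plan changes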

3) If you have an SSD, you should probably CREATE TABLESPACE tmp LOCATION '/srv/pgsql_tmp', then
ALTER SYSTEM SET temp_tablespaces='tmp' and SELECT pg_reload_conf(); (spelled out below).
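
Spelled out, that's roughly (the directory must already exist and be owned by the
postgres OS user):

    CREATE TABLESPACE tmp LOCATION '/srv/pgsql_tmp';
    ALTER SYSTEM SET temp_tablespaces = 'tmp';
    SELECT pg_reload_conf();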

4) If those don't help, as a test, try running with SET enable_nestloop=off.
I'm guessing that fixing rowcount estimate in (1) will be sufficient.
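
One way to keep that test from leaking into the rest of your session:

    BEGIN;
    SET LOCAL enable_nestloop = off;
    -- run the bucketing query here and compare the plan
    ROLLBACK;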

5) May not be important, but rerun with EXPLAIN (ANALYZE, BUFFERS) and show the
results.

Justin

