Re: Time bucketing query performance

From: Justin Pryzby
Subject: Re: Time bucketing query performance
Date:
Msg-id: 20180508010857.GI5192@telsasoft.com
In response to: Time bucketing query performance  (Julian Wilson <julianwlsn@gmail.com>)
List: pgsql-performance
On Mon, May 07, 2018 at 07:33:17PM -0400, Julian Wilson wrote:
> Hi,
> 
> I'm trying to debug and improve the performance of my time bucketing query.
> What I'm trying to do is essentially bucket by an arbitrary interval and
> then do some aggregations within that interval (min,max,sum, etc). I am
> using a `max` in the query I posted. For context in the data, it is 1
> minute candles of cryptocurrency data (open price, high price, low price,
> close price, volume, for an interval). I want to transform this to a 10
> minute interval, on demand, and that is what this query is meant to do.
> 
> I understand the slow part of my query is in the LEFT JOIN, but I just
> can't quite figure out how to do it without the LEFT JOIN.
> 
> Here is my pastebin with all the details so I don't clutter the message. I
> tried to follow everything in the 'Slow Query Questions' WIKI page. There
> is also a depesz link there.
> 
> https://ybin.me/p/9d3f52d88b4b2a46#kYLotYpNuIjjbp2P4l3la8fGSJIV0p+opH4sPq1m2/Y=

These may not be a substantial part of the issue, but I have some suggestions:

0) You're using a CTE, which cannot have statistics (unlike a temporary table).  Can you
rewrite it without one, perhaps with GROUP BY date_trunc('hour', time_open)?
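
For example, something along these lines, with no CTE at all (price_high here is
only a stand-in for whatever column you aggregate; the filter values are taken from
your plan and your description; for 10-minute buckets you'd swap date_trunc for an
arithmetic bucket expression):

    SELECT date_trunc('hour', time_open) AS bucket,
           max(price_high)               AS high
    FROM historical_ohlcv
    WHERE exchange_symbol = 'BINANCE'
      AND symbol_id = 'ETHBTC'
      AND time_open >= now() - interval '5 months'
    GROUP BY 1
    ORDER BY 1;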

1) You're querying on start_time AND end_time, and I believe the planner thinks
those conditions are independent, but they're not.  Try getting rid of the
frame_end and move the "5 months" into the main query using BETWEEN or two
ANDed conditions on the same variable.  See if the rowcount estimate is more
accurate:

        ->  Index Scan using historical_ohlcv_pkey on historical_ohlcv ohlcv  (cost=0.56..2488.58 rows=12110 width=22) (actual time=3.709..4.403 rows=9 loops=3625)
              Index Cond: ((exchange_symbol = 'BINANCE'::text) AND (symbol_id = 'ETHBTC'::text) AND (time_open >= g.start_time))
              Filter: (time_close < g.end_time)
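
Roughly what I mean, keeping your LEFT JOIN shape but with both bounds on
time_open and no frame_end (price_high is just a stand-in for the column you
aggregate; the other names are from your plan; the 10 minutes and 5 months are
from your description):

    SELECT g.start_time, max(ohlcv.price_high) AS high
    FROM generate_series(now() - interval '5 months', now(), interval '10 minutes') AS g(start_time)
    LEFT JOIN historical_ohlcv ohlcv
           ON ohlcv.exchange_symbol = 'BINANCE'
          AND ohlcv.symbol_id = 'ETHBTC'
          AND ohlcv.time_open >= g.start_time
          AND ohlcv.time_open <  g.start_time + interval '10 minutes'
    GROUP BY 1
    ORDER BY 1;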

Alternatively, you could try creating extended statistics (the statistics object needs a name; anything will do):
CREATE STATISTICS historical_ohlcv_times (dependencies) ON time_open, time_close FROM historical_ohlcv;
ANALYZE historical_ohlcv;

2) Is your work_mem really the default?  64kB?  Recommend changing it to see if the
plan changes (although it looks like that's not the issue).
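
For example, just for the current session (the value is only a guess; pick
something your RAM allows):

    SET work_mem = '64MB';
    -- then re-run the query in this session and see whether the plan changes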

3) If you have an SSD, you should probably CREATE TABLESPACE tmp LOCATION '/srv/pgsql_tmp', then
ALTER SYSTEM SET temp_tablespaces='tmp' and SELECT pg_reload_conf(); (spelled out below).
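
Spelled out, that's roughly (the directory must already exist and be owned by the
postgres OS user):

    CREATE TABLESPACE tmp LOCATION '/srv/pgsql_tmp';
    ALTER SYSTEM SET temp_tablespaces = 'tmp';
    SELECT pg_reload_conf();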

4) If those don't help, as a test, try running with SET enable_nestloop=off.
I'm guessing that fixing rowcount estimate in (1) will be sufficient.
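
One way to keep that test from leaking into the rest of your session:

    BEGIN;
    SET LOCAL enable_nestloop = off;
    -- run the bucketing query here and compare the plan
    ROLLBACK;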

5) May not be important, but rerun with EXPLAIN (ANALYZE, BUFFERS) and show the
results.

Justin

