Re: Too few rows expected by Planner on partitioned tables

Поиск
Список
Период
Сортировка
От Julian Wolf
Тема Re: Too few rows expected by Planner on partitioned tables
Дата
Msg-id VI1PR10MB1632B519F07E47AF61FE2507F0540@VI1PR10MB1632.EURPRD10.PROD.OUTLOOK.COM
обсуждение исходный текст
Ответ на Re: Too few rows expected by Planner on partitioned tables  (Justin Pryzby <pryzby@telsasoft.com>)
Ответы Re: Too few rows expected by Planner on partitioned tables  (Michael Lewis <mlewis@entrata.com>)
Список pgsql-performance
Hi Justin,

thank you very much for your help and sorry for the late answer.

After testing around with your suggestions, it actually was the daterange type which caused all the problems. Messing around with the statistics value improved performance drastically but did not solve the problem. We decided to replace the daterange type with a BIGINT and calculate the "id" of the daterange by just using the BIGINT (2x 4 bytes) representation of the daterange. Thus, it can be transformed in both directions immutably.

CREATE OR REPLACE FUNCTION to_daterange_id(daterange DATERANGE)
RETURNS BIGINT
IMMUTABLE
LANGUAGE plpgsql
AS
$$
BEGIN
return (extract(EPOCH FROM lower(daterange))::BIGINT << 32) |
extract(EPOCH FROM upper(daterange))::BIGINT;
end;
--------------------------------------------------------------------------------------------------------------
CREATE OR REPLACE FUNCTION to_daterange(daterange_id BIGINT)
RETURNS DATERANGE
IMMUTABLE
LANGUAGE plpgsql
AS
$$
BEGIN
RETURN daterange(to_timestamp(daterange_id >> 32)::DATE, to_timestamp(daterange_id & x'FFFFFFFF'::BIGINT)::DATE);
END;
$$;

So there is no daterange object messing up the primary key index. Your other suggestions sadly didn't work, as the daterange was the partition key of the table too, this field was inevitably the first criterion in all queries and thus overruled every other index.

With that said and done, it would be nice, if daterange objects could be used in unique indexes too. They are a great way to identify data which represents a week, month, etc. worth of data (similar to a two-column-date representation).

Thank you very much again for your time and help

Julian

Julian P. Wolf | Invenium Data Insights GmbH
julian.wolf@invenium.io | +43 664 88 199 013
Herrengasse 28 | 8010 Graz | www.invenium.io


From: Justin Pryzby <pryzby@telsasoft.com>
Sent: Wednesday, July 22, 2020 4:40 PM
To: Julian Wolf <julian.wolf@invenium.io>
Cc: pgsql-performance Postgres Mailing List <pgsql-performance@lists.postgresql.org>
Subject: Re: Too few rows expected by Planner on partitioned tables
 
On Tue, Jul 21, 2020 at 01:09:22PM +0000, Julian Wolf wrote:
> Hello,
>
> A description of what you are trying to achieve and what results you expect:
> Our database is growing on a daily basis by about 2.5million rows per table (2 at the moment). Because of that, we decided to partition the data, especially, as we are pre-aggregating the data for weeks, months, quarters and years. Every aggregation is stored in a separate partition:
>
...
> Our problem is, that the planner always predicts one row to be returned, although only a part of the primary key is queried. This problem exceeds feasibility of performance rapidly - a query only involving a few days already takes dozens of seconds. All tables are analyzed and pg_stats looks reasonable IMHO.
...
>     PRIMARY KEY ( daterange, spatial_feature_id, visitor_profile_id, activity_type_combination_id,
>                  activity_chain_id),
...
> ) PARTITION BY LIST (daterange);

> schemaname relname n_live_tup
> mobility_insights location_statistics_y2019m03d 23569853
> mobility_insights location_statistics_y2019m03w 19264373
> mobility_insights location_statistics_y2019m03 18105295

> Aggregate  (cost=2.79..2.80 rows=1 width=8) (actual time=143.073..143.073 rows=1 loops=1)
>   Buffers: shared hit=67334
>   ->  Index Scan using location_statistics_y2019m03w_pkey on location_statistics_y2019m03w st  (cost=0.56..2.78 rows=1 width=8) (actual time=0.026..117.284 rows=516277 loops=1)
>         Index Cond: ((daterange = '[2019-03-04,2019-03-11)'::daterange) AND (spatial_feature_id = 12675))
>         Buffers: shared hit=67334

I guess this isn't actually the problem query, since it takes 143ms and not
dozens of seconds.  I don't know what is the problem query, but maybe it might
help to create an new index on spatial_feature_id, which could be scanned
rather than scanning the unique index.

Also, if daterange *and* spatial_feature_id are always *both* included, then
this might work:

postgres=# CREATE STATISTICS t_stats (mcv) ON daterange,spatial_feature_id FROM t ; ANALYZE t;

--
Justin

В списке pgsql-performance по дате отправления:

Предыдущее
От: Jeff Janes
Дата:
Сообщение: Re: CPU hogged by concurrent SELECT..FOR UPDATE SKIP LOCKED
Следующее
От: Michael Lewis
Дата:
Сообщение: Re: Too few rows expected by Planner on partitioned tables