Discussion: Performance degrade in Planning Time to find appropriate Partial Index


Performance degrade in Planning Time to find appropriate Partial Index

From:
Meenatchi Sandanam
Date:
I have created a table with 301 columns (ID, 150 BIGINT, 150 TEXT). The table holds data for multiple forms, differentiated by ID range, so a single column contains data from more than one form. To get a unique constraint and indexing per form, I chose PostgreSQL partial indexes, which suit my requirement. I created the partial indexes with the ID range as the criterion, and they provide uniqueness and indexing per form as expected. However, DML operations on a particular form consider all the indexes created for the entire table instead of only the indexes for that form's ID range. This degrades planner performance and query time by more than 10x, as shown below:

Query Result for the table with 3000 Partial Indexes (15 Indexes per form):

explain analyse select id from form_data_copy where id between 3001 and 4000 and bigint50=789; 
QUERY PLAN 
------------------------------------------------------------------------------------------------------------------------------------------ 
Index Scan using form_data_1_bigint50_3000 on form_data_copy (cost=0.28..8.29 rows=1 width=8) (actual time=0.057..0.057 rows=0 loops=1) 
Index Cond: (bigint50 = 789) 
Planning time: 99.287 ms 
Execution time: 0.112 ms 
(4 rows) 

Time: 103.967 ms 
Query Result for the table with no Indexes (with the same record count as the above table):

explain analyse select id from form_data_copy1 where id between 3001 and 4000 and bigint50=789;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------- 
Index Scan using form_data_copy1_fk1_idx on form_data_copy1 (cost=0.42..208.62 rows=1 width=8) (actual time=1.576..1.576 rows=0 loops=1) 
Index Cond: ((id >= 3001) AND (id <= 4000)) 
Filter: (bigint50 = 789) 
Rows Removed by Filter: 859 
Planning time: 1.243 ms 
Execution time: 1.701 ms 
(6 rows) 

Time: 5.891 ms 

To verify that the 99.287 ms planning time is not simply the time taken to scan the 15 indexes for the form, I created only 15 indexes on the table and got the result below:

explain analyse select id from form_data_copy1 where id between 3001 and 4000 and bigint50=789; 
QUERY PLAN 
----------------------------------------------------------------------------------------------------------------------------------------------- 
Index Scan using form_data_copy1_bigint50_3000 on form_data_copy1 (cost=0.28..8.29 rows=1 width=8) (actual time=0.025..0.025 rows=0 loops=1) 
Index Cond: (bigint50 = 789) 
Planning time: 3.017 ms 
Execution time: 0.086 ms 
(4 rows) 

Time: 7.291 ms 
It seems PostgreSQL considers all 3000 indexes even though I provided the ID range in the query. Please clarify whether my assumption is correct, or what else explains the high planning time. Also, please suggest a way to reduce this planning time.
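For reference, the per-form setup described above would look something like the following; this DDL is a reconstruction from the plan output (the index and table names are taken from the plans, the rest is an assumption):

```sql
-- Hypothetical sketch of the described scheme: one partial unique index
-- per indexed column per form, restricted to that form's ID range.
CREATE UNIQUE INDEX form_data_1_bigint50_3000
    ON form_data_copy (bigint50)
    WHERE id BETWEEN 3001 AND 4000;
-- ...repeated per indexed column and per form, 3000 indexes in total.
```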

Re: Performance degrade in Planning Time to find appropriate Partial Index

From:
Laurenz Albe
Date:
Meenatchi Sandanam wrote:
> I have created a table with 301 columns(ID, 150 BIGINT, 150 TEXT). The table contains
> multiple form data differentiated by ID range. Hence a column contains more than one form data.
> To achieve Unique Constraint and Indexing per form, I chose PostgreSQL Partial Indexes
> which suits my requirement. I have created Partial Indexes with ID Range as criteria and
> it provides Uniqueness and Indexing per form basis as expected. But DML operations on a
> particular form scans all the Indexes created for the entire table instead of scanning
> the Indexes created for that particular form ID Range. This degrades Planner Performance
> and Query Time more than 10 times as below, 
> 
> Query Result for the table with 3000 Partial Indexes(15 Indexes per form) : 

It is crazy to create 3000 partial indexes on one table.

No wonder planning and DML statements take so long: the planner has to consider all of those indexes.

> explain analyse select id from form_data_copy where id between 3001 and 4000 and bigint50=789;

Use a single index on (bigint50, id) for best performance.
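A minimal sketch of that suggestion, using the table and column names from the query above:

```sql
-- One composite index handles the equality on bigint50 plus the ID range
-- for every form, so the planner has a single index to consider:
CREATE INDEX ON form_data_copy (bigint50, id);
```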

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com


Re: Performance degrade in Planning Time to find appropriate Partial Index

From:
Michael Loftis
Date:

On Thu, Mar 1, 2018 at 03:10 Meenatchi Sandanam <meen.opm@gmail.com> wrote:
I have created a table with 301 columns(ID, 150 BIGINT, 150 TEXT). The table contains multiple form data differentiated by ID range. Hence a column contains more than one form data. To achieve Unique Constraint and Indexing per form, I chose PostgreSQL Partial Indexes which suits my requirement. I have created Partial Indexes with ID Range as criteria and it provides Uniqueness and Indexing per form basis as expected. But DML operations on a particular form scans all the Indexes created for the entire table instead of scanning the Indexes created for that particular form ID Range. This degrades Planner Performance and Query Time more than 10 times as below, 

Query Result for the table with 3000 Partial Indexes(15 Indexes per form) : 

This smells like you’ve failed to normalize your data correctly. 3000 indexes just to ensure uniqueness? It sounds a lot more like you need 15 tables for 15 forms, perhaps with a view for reading; or a table with 1/15th of the columns to begin with, distinguished by a form_type column; or perhaps an index over a function for the unique constraint, such that the output of the function is the normalized portion of the data that is required to be unique.

If you’ve really got 3000 different uniqueness criteria differing only by ID range, then an expression index over a function emitting the uniqueness hash might work, though that would still be hairy; at least you wouldn’t pay the cost on every read. But I would reduce that ID-range-based scheme to a unique_type indicator column instead.
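One possible reading of the discriminator-column suggestion (the column and table names here are hypothetical, not from the thread):

```sql
-- Replace the ID-range encoding with an explicit form_type column:
ALTER TABLE form_data_copy ADD COLUMN form_type integer NOT NULL DEFAULT 0;
-- A single unique index per indexed column then covers all forms:
CREATE UNIQUE INDEX ON form_data_copy (form_type, bigint50);
```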



--

"Genius might be described as a supreme capacity for getting its possessors
into trouble of all kinds."
-- Samuel Butler

Re: Performance degrade in Planning Time to find appropriate Partial Index

From:
Nandakumar M
Date:
Hi,

https://heapanalytics.com/blog/engineering/running-10-million-postgresql-indexes-in-production

From the link shared above, it looks like what Meenatchi has done should work.

Do the conditions on the partial index and the query match exactly? (A greater-than vs. greater-than-or-equal mismatch, maybe?)

If the conditions for those partial indexes are mutually exclusive and the query has a matching condition, then Postgres can use that index alone. Are we missing something here?
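As a hedged illustration of the matching question: the planner must prove that the query's WHERE clause implies the index predicate, and it does so symbolically, without assuming integers are discrete:

```sql
-- Partial index predicate:
--   WHERE id >= 3001 AND id <= 4000
-- This query matches, because BETWEEN expands to the same >= / <= tests:
--   SELECT ... WHERE id BETWEEN 3001 AND 4000;
-- This logically equivalent query may NOT be matched to the index, since
-- the planner cannot prove that id > 3000 implies id >= 3001:
--   SELECT ... WHERE id > 3000 AND id < 4001;
```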

Regards,
Nanda



Re: Performance degrade in Planning Time to find appropriate Partial Index

From:
Pavel Stehule
Date:


2018-03-02 14:49 GMT+01:00 Nandakumar M <m.nanda92@gmail.com>:
Hi,

https://heapanalytics.com/blog/engineering/running-10-million-postgresql-indexes-in-production

From the link shared above, it looks like what Meenatchi has done should work.

It can be a different situation: the article does not say how many indexes there are per table. And the fact that a project works does not mean it is well designed.

PostgreSQL does not have column storage. Look at column-store databases; they are designed for extra-wide tables.

Regards

Pavel
 



Re: Performance degrade in Planning Time to find appropriate Partial Index

From:
Pavel Stehule
Date:


2018-03-02 15:29 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:


2018-03-02 14:49 GMT+01:00 Nandakumar M <m.nanda92@gmail.com>:
Hi,

https://heapanalytics.com/blog/engineering/running-10-million-postgresql-indexes-in-production

From the link shared above, it looks like what Meenatchi has done should work.


Read the article:

1. They probably use Citus.

2. “Since partial indexes are so easy to create and work with, we’ve wound up with over 10 million partial indexes across our entire cluster.”



Regards

Pavel
 
