Discussion: Performance degradation in planning time to find the appropriate partial index
I have created a table with 301 columns (ID, 150 BIGINT, 150 TEXT). The table contains data for multiple forms, differentiated by ID range, so a single column contains data for more than one form. To achieve a unique constraint and indexing per form, I chose PostgreSQL partial indexes, which suit my requirement. I created the partial indexes with the ID range as the criterion, and they provide uniqueness and indexing per form as expected. But DML operations on a particular form consider all the indexes created for the entire table instead of only the indexes created for that form's ID range. This degrades planner performance and query time by more than 10 times, as shown below.
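The post does not include the index DDL; a sketch of the kind of per-form partial unique index being described, with the definition inferred from the index names in the plans (the exact DDL is an assumption), might be:

```sql
-- Inferred, hypothetical DDL: one partial unique index per indexed column
-- per form's ID range, e.g. for the form occupying ids 3001-4000:
CREATE UNIQUE INDEX form_data_1_bigint50_3000
    ON form_data_copy (bigint50)
    WHERE id BETWEEN 3001 AND 4000;
-- With 15 such indexes per form and 200 forms, the table ends up
-- with 3000 partial indexes.
```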
Query Result for the table with 3000 Partial Indexes (15 Indexes per form):

explain analyse select id from form_data_copy where id between 3001 and 4000 and bigint50=789;

                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using form_data_1_bigint50_3000 on form_data_copy  (cost=0.28..8.29 rows=1 width=8) (actual time=0.057..0.057 rows=0 loops=1)
   Index Cond: (bigint50 = 789)
 Planning time: 99.287 ms
 Execution time: 0.112 ms
(4 rows)

Time: 103.967 ms

Query Result for the table with no Indexes (with the same record count as the above table):

explain analyse select id from form_data_copy1 where id between 3001 and 4000 and bigint50=789;

                                                                QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using form_data_copy1_fk1_idx on form_data_copy1  (cost=0.42..208.62 rows=1 width=8) (actual time=1.576..1.576 rows=0 loops=1)
   Index Cond: ((id >= 3001) AND (id <= 4000))
   Filter: (bigint50 = 789)
   Rows Removed by Filter: 859
 Planning time: 1.243 ms
 Execution time: 1.701 ms
(6 rows)

Time: 5.891 ms

To ensure that the planning time of 99.287 ms is not the time taken to scan the 15 indexes for the form, I created only 15 indexes on the table and got the result below:

explain analyse select id from form_data_copy1 where id between 3001 and 4000 and bigint50=789;

                                                                 QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using form_data_copy1_bigint50_3000 on form_data_copy1  (cost=0.28..8.29 rows=1 width=8) (actual time=0.025..0.025 rows=0 loops=1)
   Index Cond: (bigint50 = 789)
 Planning time: 3.017 ms
 Execution time: 0.086 ms
(4 rows)

Time: 7.291 ms

It seems PostgreSQL considers all 3000 indexes even though I provided the ID range in the query. Please clarify whether my assumption is correct, or explain the reason for the higher planning time. Also, please suggest a way to reduce this planning time.
Meenatchi Sandanam wrote:
> I have created a table with 301 columns(ID, 150 BIGINT, 150 TEXT). The table contains
> multiple form data differentiated by ID range. Hence a column contains more than one form data.
> To achieve Unique Constraint and Indexing per form, I chose PostgreSQL Partial Indexes
> which suits my requirement. I have created Partial Indexes with ID Range as criteria and
> it provides Uniqueness and Indexing per form basis as expected. But DML operations on a
> particular form scans all the Indexes created for the entire table instead of scanning
> the Indexes created for that particular form ID Range. This degrades Planner Performance
> and Query Time more than 10 times as below,
>
> Query Result for the table with 3000 Partial Indexes(15 Indexes per form) :

It is crazy to create 3000 partial indexes on one table.

No wonder planning and DML statements take very long; they have to consider all the indexes.

> explain analyse select id from form_data_copy where id between 3001 and 4000 and bigint50=789;

Use a single index on (bigint50, id) for best performance.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
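A minimal sketch of the single-index suggestion above (the index name is arbitrary):

```sql
-- One composite index instead of 3000 partial ones:
CREATE INDEX form_data_copy_bigint50_id_idx
    ON form_data_copy (bigint50, id);

-- The equality on bigint50 plus the range on id can then be answered
-- by a single index scan:
SELECT id FROM form_data_copy
WHERE id BETWEEN 3001 AND 4000 AND bigint50 = 789;
```

Note that this addresses lookup performance only; it does not by itself replace the per-form unique constraints.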
On Thu, Mar 1, 2018 at 03:10 Meenatchi Sandanam <meen.opm@gmail.com> wrote:
I have created a table with 301 columns(ID, 150 BIGINT, 150 TEXT). The table contains multiple form data differentiated by ID range. Hence a column contains more than one form data. To achieve Unique Constraint and Indexing per form, I chose PostgreSQL Partial Indexes which suits my requirement. I have created Partial Indexes with ID Range as criteria and it provides Uniqueness and Indexing per form basis as expected. But DML operations on a particular form scans all the Indexes created for the entire table instead of scanning the Indexes created for that particular form ID Range. This degrades Planner Performance and Query Time more than 10 times as below,
Query Result for the table with 3000 Partial Indexes(15 Indexes per form) :
This smells like you've failed to normalize your data correctly. 3,000 indexes to ensure uniqueness? It sounds a lot more like you need 15 tables for 15 forms, perhaps with a view for reading, or maybe 1/15th of the columns to begin with by adding a form_type column. Or perhaps an expression index for the unique constraint, such that the output of the function is the normalized portion of the data that is required to be unique.
If you've really got 3,000 different uniqueness criteria differing by "id" ranges, then it sounds like an expression index with a function spitting out the hash of the unique fields. That would still be hairy, but at least you wouldn't eat the time on every read. I'd reduce that ID-range-based problem by including a unique_type indicator column instead.
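One way to read the unique_type suggestion (schema and names are hypothetical):

```sql
-- Hypothetical reshaping: an explicit discriminator column instead of
-- encoding form identity in ID ranges.
CREATE TABLE form_data (
    id        bigint PRIMARY KEY,
    form_type integer NOT NULL,  -- which form this row belongs to
    bigint50  bigint
);

-- A single unique index then enforces per-form uniqueness for all forms:
CREATE UNIQUE INDEX form_data_form_type_bigint50_key
    ON form_data (form_type, bigint50);
```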
"Genius might be described as a supreme capacity for getting its possessors
into trouble of all kinds."
-- Samuel Butler
Hi,

https://heapanalytics.com/blog/engineering/running-10-million-postgresql-indexes-in-production

From the link shared above, it looks like what Meenatchi has done should work.

Do the conditions on the partial index and the query match exactly? (A greater-than / greater-than-or-equals mismatch, maybe?)

If the conditions for those partial indexes are mutually exclusive and the query has a matching condition, then Postgres can use that index alone.

Are we missing something here?

Regards,
Nanda

On Thu, Mar 1, 2018 at 6:33 PM, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> It is crazy to create 3000 partial indexes on one table.
>
> No wonder planning and DML statements take very long, they have to consider all the
> indexes.
>
> Use a single index on (bigint50, id) for best performance.
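To illustrate the operator-mismatch point, a sketch (predicates assumed to resemble the thread's ID ranges): the planner uses a partial index only when it can prove that the query's WHERE clause implies the index predicate, and that proof does not do integer arithmetic.

```sql
-- Partial index with a closed-range predicate:
CREATE INDEX ON form_data_copy (bigint50)
    WHERE id >= 3001 AND id <= 4000;

-- BETWEEN expands to the same >= / <= conditions, so the predicate is
-- provably implied and the partial index is usable:
SELECT id FROM form_data_copy
WHERE id BETWEEN 3001 AND 4000 AND bigint50 = 789;

-- "id > 3000" only implies "id >= 3001" by integer arithmetic, which the
-- planner's predicate prover does not attempt, so this equivalent query
-- may not be matched to the partial index:
SELECT id FROM form_data_copy
WHERE id > 3000 AND id <= 4000 AND bigint50 = 789;
```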
2018-03-02 14:49 GMT+01:00 Nandakumar M <m.nanda92@gmail.com>:
> Hi,
>
> https://heapanalytics.com/blog/engineering/running-10-million-postgresql-indexes-in-production
>
> From the link shared above, it looks like what Meenatchi has done should work.
It can be a different situation; the number of indexes per table is not specified there. And the fact that some project works doesn't mean it is well designed.

PostgreSQL has no column storage. Look at column-store databases; they are designed for extra-wide tables.
Regards
Pavel
2018-03-02 15:29 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:
> 2018-03-02 14:49 GMT+01:00 Nandakumar M <m.nanda92@gmail.com>:
>> Hi,
>>
>> https://heapanalytics.com/blog/engineering/running-10-million-postgresql-indexes-in-production
>>
>> From the link shared above, it looks like what Meenatchi has done should work.
>
> It can be a different situation; the number of indexes per table is not specified there. And the fact that some project works doesn't mean it is well designed.
>
> PostgreSQL has no column storage. Look at column-store databases; they are designed for extra-wide tables.
Read the article:

1. Probably they use Citus.

2. "Since partial indexes are so easy to create and work with, we've wound up with over 10 million partial indexes across our entire cluster."

Regards,
Pavel