Discussion: Table partitioning
I have created a parent table and several child tables that inherit from it, to facilitate table partitioning.
I noticed that the child tables did not inherit any of the indexes, etc., in their DDL.
Do these need to be created, or are they actually there via inheritance but only "used" when the parent table is queried directly?
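For reference, a minimal sketch of this kind of setup (the real DDL is not shown in the thread; table and column names here are guesses modeled on the queries below):

```sql
-- Hypothetical sketch: a parent table plus one inherited child.
-- Column names mirror the queries in this thread; other columns omitted.
CREATE TABLE tblksraw (
    ksrawsysid  bigint PRIMARY KEY,
    payload     text
);

-- The child inherits the columns and defaults, but NOT the primary key,
-- other indexes, or unique constraints -- those must be created per child.
CREATE TABLE tblksraw01 () INHERITS (tblksraw);
```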
Also, I am seeing what appear to be some performance issues when querying a large table which currently has about 1.8 million rows.
It may be a matter of me adjusting to how PostgreSQL's planner works.
ksrawsysid is the primary key. I expected it to perform very quickly, yet it took about 90 seconds to return a recordset.
Adding a where clause made the query almost instantaneous.
The query which I issued was:
select * from tblksraw order by ksrawsysid desc limit 10
QUERY PLAN
Limit  (cost=5351703.70..5351703.72 rows=10 width=1186)
  ->  Sort  (cost=5351703.70..5355948.36 rows=1697865 width=1186)
        Sort Key: public.tblksraw.ksrawsysid
        ->  Result  (cost=0.00..108314.65 rows=1697865 width=1186)
              ->  Append  (cost=0.00..108314.65 rows=1697865 width=1186)
                    ->  Seq Scan on tblksraw  (cost=0.00..108187.45 rows=1697145 width=255)
                    ->  Seq Scan on tblksraw01 tblksraw  (cost=0.00..10.60 rows=60 width=1186)
                    ->  Seq Scan on tblksraw02 tblksraw  (cost=0.00..10.60 rows=60 width=1186)
                    ->  Seq Scan on tblksraw03 tblksraw  (cost=0.00..10.60 rows=60 width=1186)
                    ->  Seq Scan on tblksraw04 tblksraw  (cost=0.00..10.60 rows=60 width=1186)
                    ->  Seq Scan on tblksraw05 tblksraw  (cost=0.00..10.60 rows=60 width=1186)
                    ->  Seq Scan on tblksraw06 tblksraw  (cost=0.00..10.60 rows=60 width=1186)
                    ->  Seq Scan on tblksraw07 tblksraw  (cost=0.00..10.60 rows=60 width=1186)
                    ->  Seq Scan on tblksraw08 tblksraw  (cost=0.00..10.60 rows=60 width=1186)
                    ->  Seq Scan on tblksraw09 tblksraw  (cost=0.00..10.60 rows=60 width=1186)
                    ->  Seq Scan on tblksraw10 tblksraw  (cost=0.00..10.60 rows=60 width=1186)
                    ->  Seq Scan on tblksraw11 tblksraw  (cost=0.00..10.60 rows=60 width=1186)
                    ->  Seq Scan on tblksraw12 tblksraw  (cost=0.00..10.60 rows=60 width=1186)
Adding a WHERE clause made the query almost instantaneous:
select * from tblksraw where ksrawsysid between 1000000 and 1000020 order by ksrawsysid desc limit 10
QUERY PLAN
Limit  (cost=142.85..142.87 rows=10 width=1186)
  ->  Sort  (cost=142.85..142.93 rows=31 width=1186)
        Sort Key: public.tblksraw.ksrawsysid
        ->  Result  (cost=0.00..142.08 rows=31 width=1186)
              ->  Append  (cost=0.00..142.08 rows=31 width=1186)
                    ->  Index Scan using pk_tblksraw on tblksraw  (cost=0.00..11.28 rows=19 width=255)
                          Index Cond: ((ksrawsysid >= 1000000) AND (ksrawsysid <= 1000020))
                    ->  Seq Scan on tblksraw01 tblksraw  (cost=0.00..10.90 rows=1 width=1186)
                          Filter: ((ksrawsysid >= 1000000) AND (ksrawsysid <= 1000020))
                    ->  Seq Scan on tblksraw02 tblksraw  (cost=0.00..10.90 rows=1 width=1186)
                          Filter: ((ksrawsysid >= 1000000) AND (ksrawsysid <= 1000020))
                    ->  Seq Scan on tblksraw03 tblksraw  (cost=0.00..10.90 rows=1 width=1186)
                          Filter: ((ksrawsysid >= 1000000) AND (ksrawsysid <= 1000020))
                    ->  Seq Scan on tblksraw04 tblksraw  (cost=0.00..10.90 rows=1 width=1186)
                          Filter: ((ksrawsysid >= 1000000) AND (ksrawsysid <= 1000020))
                    ->  Seq Scan on tblksraw05 tblksraw  (cost=0.00..10.90 rows=1 width=1186)
                          Filter: ((ksrawsysid >= 1000000) AND (ksrawsysid <= 1000020))
                    ->  Seq Scan on tblksraw06 tblksraw  (cost=0.00..10.90 rows=1 width=1186)
                          Filter: ((ksrawsysid >= 1000000) AND (ksrawsysid <= 1000020))
                    ->  Seq Scan on tblksraw07 tblksraw  (cost=0.00..10.90 rows=1 width=1186)
                          Filter: ((ksrawsysid >= 1000000) AND (ksrawsysid <= 1000020))
                    ->  Seq Scan on tblksraw08 tblksraw  (cost=0.00..10.90 rows=1 width=1186)
                          Filter: ((ksrawsysid >= 1000000) AND (ksrawsysid <= 1000020))
                    ->  Seq Scan on tblksraw09 tblksraw  (cost=0.00..10.90 rows=1 width=1186)
                          Filter: ((ksrawsysid >= 1000000) AND (ksrawsysid <= 1000020))
                    ->  Seq Scan on tblksraw10 tblksraw  (cost=0.00..10.90 rows=1 width=1186)
                          Filter: ((ksrawsysid >= 1000000) AND (ksrawsysid <= 1000020))
                    ->  Seq Scan on tblksraw11 tblksraw  (cost=0.00..10.90 rows=1 width=1186)
                          Filter: ((ksrawsysid >= 1000000) AND (ksrawsysid <= 1000020))
                    ->  Seq Scan on tblksraw12 tblksraw  (cost=0.00..10.90 rows=1 width=1186)
                          Filter: ((ksrawsysid >= 1000000) AND (ksrawsysid <= 1000020))
From the query plans, it appears that I have to explicitly create indexes on the inherited tables. I would have assumed they would be inherited as well.
I expected that limiting the number of rows would produce an immediate result, whereas what actually appears to happen is that the entire table is processed as a result set, with only 10 rows returned. That work seems extraneous.
Regards,
Benjamin
Each of the partition tables needs its own set of indexes. Build them, and see if that does not fix your performance issues. Also, be sure you turned on the constraint_exclusion parameter, and that each table (other than the "master") has a constraint on it that is unique.
HTH,
Chris
On Wed, Apr 26, 2006 at 04:33:04PM -0400, Chris Hoover wrote:
> Each of the partition tables needs its own set of indexes. Build them,
> and see if that does not fix your performance issues. Also, be sure you
> turned on the constraint_exclusion parameter, and each table (other
> than the "master") has a constraint on it that is unique.

I don't believe constraint_exclusion is smart enough to deal with ORDER BY / LIMIT yet... :/

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
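Chris's advice can be sketched roughly as follows (a hedged illustration, not DDL from the thread; the range boundaries, constraint name, and index name are invented):

```sql
-- Each child gets its own index (the parent's primary key is not
-- inherited) and a non-overlapping CHECK constraint so constraint
-- exclusion has something to work with. Boundaries are illustrative.
ALTER TABLE tblksraw01
    ADD CONSTRAINT tblksraw01_range_chk
    CHECK (ksrawsysid >= 0 AND ksrawsysid < 200000);

CREATE UNIQUE INDEX pk_tblksraw01 ON tblksraw01 (ksrawsysid);

-- Repeat for tblksraw02 .. tblksraw12 with adjacent, non-overlapping
-- ranges, then enable the planner feature:
SET constraint_exclusion = on;
```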
From: Jim Nasby
Sent: Thursday, May 04, 2006 11:37 AM

Please include the mailing list in your replies so others can provide input.

> From: Sriram Dandapani
> Most of our reports use an ORDER BY ... LIMIT X. The row count in some
> tables is > 200 million (and the table size is about 50-100 GB).
> Does the fact that constraint_exclusion doesn't deal with ORDER BY /
> LIMIT make partitioning an unwise choice?

Well, in a worst-case scenario, partitioning will perform no worse than if you had one giant table. So it's not hurting you; it may just not be helping you.

> What if the main query does just an ORDER BY and an outer query wraps
> the inner query with a LIMIT?

It all depends on whether the order-by code is partitioning-aware, and I'm not sure that it is. But if you index on the appropriate column it should hopefully make use of that...

> I am trying to figure out whether I should use partitioning or not (my
> goal is two-fold: purge lots of data in aged tables and make queries
> partition-aware).

Well, reason #1 sounds like plenty of justification for using partitioning to me.
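Since constraint exclusion did not help ORDER BY / LIMIT at the time, one manual workaround is to push the LIMIT into each partition and re-sort the small combined set. This is a hedged sketch, not something from the thread; only two partitions are shown:

```sql
-- Take the top 10 from each child explicitly (each branch can use its
-- own index), then sort the merged handful of rows and limit again.
SELECT * FROM (
    (SELECT * FROM tblksraw01 ORDER BY ksrawsysid DESC LIMIT 10)
    UNION ALL
    (SELECT * FROM tblksraw02 ORDER BY ksrawsysid DESC LIMIT 10)
    -- ... one branch per partition ...
) AS merged
ORDER BY ksrawsysid DESC
LIMIT 10;
```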
From: Sriram Dandapani

Thanks...looks like partitioning will help.
From: Jim Nasby

BTW, I should have mentioned that partitioning is a very new feature and that folks probably would like to know about shortcomings you find while using it.
>BTW, I should have mentioned that partitioning is a very new feature and
>that folks probably would like to know about shortcomings you find
>while using it.
We just implemented partitioning, and have found it very useful for dropping old data, as opposed to deleting and enduring the subsequent vacuum.
One unforeseen issue we ran into had to do with Postgres basing its execution plans on the master table (which is generally empty) and therefore choosing inefficient merge joins, in our case. The work-around for us was to delete the rows in pg_statistic for the master tables, as this thread indicates:
http://archives.postgresql.org/pgsql-performance/2006-05/msg00024.php
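The work-around referenced there amounts to something like the following (a hedged sketch; it requires superuser privileges, and a later ANALYZE may repopulate the rows):

```sql
-- Remove the planner-statistics rows for the (empty) master table so
-- the planner stops basing its join estimates on them.
DELETE FROM pg_statistic
WHERE starelid = 'public.tblksraw'::regclass;
```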
Once we employed the work-around things seem good.
- Mark