Discussion: Table partitioning
I have created a parent table and several child tables that inherit from it, to facilitate table partitioning.
I noticed that the child tables did not inherit any of the indexes, etc., in their DDL.
Do these need to be created, or are they actually there via inheritance but only "used" when the parent table is queried directly?
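For reference, a minimal sketch of this kind of setup (the real DDL is not shown in the thread; table and column names here are guesses modeled on the queries below):

```sql
-- Hypothetical sketch: a parent table plus one inherited child.
-- Column names mirror the queries in this thread; other columns omitted.
CREATE TABLE tblksraw (
    ksrawsysid  bigint PRIMARY KEY,
    payload     text
);

-- The child inherits the columns and defaults, but NOT the primary key,
-- other indexes, or unique constraints -- those must be created per child.
CREATE TABLE tblksraw01 () INHERITS (tblksraw);
```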
Also, I am seeing what appear to be some performance issues when querying a large table which currently has about 1.8 million rows.
It may be a matter of me adjusting to how PostgreSQL's planner works.
ksrawsysid is the primary key. I expected it to perform very quickly, yet it took about 90 seconds to return a recordset.
Adding a where clause made the query almost instantaneous.
The query which I issued was:
select * from tblksraw order by ksrawsysid desc limit 10
QUERY PLAN
Limit  (cost=5351703.70..5351703.72 rows=10 width=1186)
  ->  Sort  (cost=5351703.70..5355948.36 rows=1697865 width=1186)
        Sort Key: public.tblksraw.ksrawsysid
        ->  Result  (cost=0.00..108314.65 rows=1697865 width=1186)
              ->  Append  (cost=0.00..108314.65 rows=1697865 width=1186)
                    ->  Seq Scan on tblksraw  (cost=0.00..108187.45 rows=1697145 width=255)
                    ->  Seq Scan on tblksraw01 tblksraw  (cost=0.00..10.60 rows=60 width=1186)
                    ->  Seq Scan on tblksraw02 tblksraw  (cost=0.00..10.60 rows=60 width=1186)
                    ->  Seq Scan on tblksraw03 tblksraw  (cost=0.00..10.60 rows=60 width=1186)
                    ->  Seq Scan on tblksraw04 tblksraw  (cost=0.00..10.60 rows=60 width=1186)
                    ->  Seq Scan on tblksraw05 tblksraw  (cost=0.00..10.60 rows=60 width=1186)
                    ->  Seq Scan on tblksraw06 tblksraw  (cost=0.00..10.60 rows=60 width=1186)
                    ->  Seq Scan on tblksraw07 tblksraw  (cost=0.00..10.60 rows=60 width=1186)
                    ->  Seq Scan on tblksraw08 tblksraw  (cost=0.00..10.60 rows=60 width=1186)
                    ->  Seq Scan on tblksraw09 tblksraw  (cost=0.00..10.60 rows=60 width=1186)
                    ->  Seq Scan on tblksraw10 tblksraw  (cost=0.00..10.60 rows=60 width=1186)
                    ->  Seq Scan on tblksraw11 tblksraw  (cost=0.00..10.60 rows=60 width=1186)
                    ->  Seq Scan on tblksraw12 tblksraw  (cost=0.00..10.60 rows=60 width=1186)
Adding a WHERE clause made the query almost instantaneous:
select * from tblksraw where ksrawsysid between 1000000 and 1000020 order by ksrawsysid desc limit 10
QUERY PLAN
Limit  (cost=142.85..142.87 rows=10 width=1186)
  ->  Sort  (cost=142.85..142.93 rows=31 width=1186)
        Sort Key: public.tblksraw.ksrawsysid
        ->  Result  (cost=0.00..142.08 rows=31 width=1186)
              ->  Append  (cost=0.00..142.08 rows=31 width=1186)
                    ->  Index Scan using pk_tblksraw on tblksraw  (cost=0.00..11.28 rows=19 width=255)
                          Index Cond: ((ksrawsysid >= 1000000) AND (ksrawsysid <= 1000020))
                    ->  Seq Scan on tblksraw01 tblksraw  (cost=0.00..10.90 rows=1 width=1186)
                          Filter: ((ksrawsysid >= 1000000) AND (ksrawsysid <= 1000020))
                    ->  Seq Scan on tblksraw02 tblksraw  (cost=0.00..10.90 rows=1 width=1186)
                          Filter: ((ksrawsysid >= 1000000) AND (ksrawsysid <= 1000020))
                    ->  Seq Scan on tblksraw03 tblksraw  (cost=0.00..10.90 rows=1 width=1186)
                          Filter: ((ksrawsysid >= 1000000) AND (ksrawsysid <= 1000020))
                    ->  Seq Scan on tblksraw04 tblksraw  (cost=0.00..10.90 rows=1 width=1186)
                          Filter: ((ksrawsysid >= 1000000) AND (ksrawsysid <= 1000020))
                    ->  Seq Scan on tblksraw05 tblksraw  (cost=0.00..10.90 rows=1 width=1186)
                          Filter: ((ksrawsysid >= 1000000) AND (ksrawsysid <= 1000020))
                    ->  Seq Scan on tblksraw06 tblksraw  (cost=0.00..10.90 rows=1 width=1186)
                          Filter: ((ksrawsysid >= 1000000) AND (ksrawsysid <= 1000020))
                    ->  Seq Scan on tblksraw07 tblksraw  (cost=0.00..10.90 rows=1 width=1186)
                          Filter: ((ksrawsysid >= 1000000) AND (ksrawsysid <= 1000020))
                    ->  Seq Scan on tblksraw08 tblksraw  (cost=0.00..10.90 rows=1 width=1186)
                          Filter: ((ksrawsysid >= 1000000) AND (ksrawsysid <= 1000020))
                    ->  Seq Scan on tblksraw09 tblksraw  (cost=0.00..10.90 rows=1 width=1186)
                          Filter: ((ksrawsysid >= 1000000) AND (ksrawsysid <= 1000020))
                    ->  Seq Scan on tblksraw10 tblksraw  (cost=0.00..10.90 rows=1 width=1186)
                          Filter: ((ksrawsysid >= 1000000) AND (ksrawsysid <= 1000020))
                    ->  Seq Scan on tblksraw11 tblksraw  (cost=0.00..10.90 rows=1 width=1186)
                          Filter: ((ksrawsysid >= 1000000) AND (ksrawsysid <= 1000020))
                    ->  Seq Scan on tblksraw12 tblksraw  (cost=0.00..10.90 rows=1 width=1186)
                          Filter: ((ksrawsysid >= 1000000) AND (ksrawsysid <= 1000020))
From the query plans, it appears that I have to explicitly create indexes on the inherited tables. I would have assumed they would be inherited as well.
I expected that limiting the number of rows would produce an immediate result, whereas what actually appears to happen is that the entire table is processed as a result set, with only 10 rows returned. That work seems extraneous.
Regards,
Benjamin
Each of the partition tables needs its own set of indexes. Build them, and see if that does not fix your performance issues. Also, be sure you turned on the constraint_exclusion parameter, and that each table (other than the "master") has a constraint on it that is unique.
HTH,
Chris
On Wed, Apr 26, 2006 at 04:33:04PM -0400, Chris Hoover wrote:
> Each of the partition tables needs its own set of indexes. Build them,
> and see if that does not fix your performance issues. Also, be sure you
> turned on the constraint_exclusion parameter, and each table (other
> than the "master") has a constraint on it that is unique.

I don't believe constraint_exclusion is smart enough to deal with ORDER BY / LIMIT yet... :/

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
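Chris's advice can be sketched roughly as follows (a hedged illustration, not DDL from the thread; the range boundaries, constraint name, and index name are invented):

```sql
-- Each child gets its own index (the parent's primary key is not
-- inherited) and a non-overlapping CHECK constraint so constraint
-- exclusion has something to work with. Boundaries are illustrative.
ALTER TABLE tblksraw01
    ADD CONSTRAINT tblksraw01_range_chk
    CHECK (ksrawsysid >= 0 AND ksrawsysid < 200000);

CREATE UNIQUE INDEX pk_tblksraw01 ON tblksraw01 (ksrawsysid);

-- Repeat for tblksraw02 .. tblksraw12 with adjacent, non-overlapping
-- ranges, then enable the planner feature:
SET constraint_exclusion = on;
```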
From: Jim Nasby
Sent: Thursday, May 04, 2006 11:37 AM

Please include the mailing list in your replies so others can provide input.

> From: Sriram Dandapani
> Most of our reports use an ORDER BY ... LIMIT X. The row count in some
> tables is > 200 million (and the table size is about 50-100 GB).
> Does the fact that constraint_exclusion doesn't deal with ORDER BY /
> LIMIT make partitioning an unwise choice?

Well, in a worst-case scenario, partitioning will perform no worse than if you had one giant table. So it's not hurting you; it may just not be helping you.

> What if the main query does just an ORDER BY and an outer query wraps
> the inner query with a LIMIT?

It all depends on whether the order-by code is partitioning-aware, and I'm not sure that it is. But if you index on the appropriate column it should hopefully make use of that...

> I am trying to figure out whether I should use partitioning or not (my
> goal is two-fold: purge lots of data in aged tables and make queries
> partition-aware).

Well, reason #1 sounds like plenty of justification for using partitioning to me.
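Since constraint exclusion did not help ORDER BY / LIMIT at the time, one manual workaround is to push the LIMIT into each partition and re-sort the small combined set. This is a hedged sketch, not something from the thread; only two partitions are shown:

```sql
-- Take the top 10 from each child explicitly (each branch can use its
-- own index), then sort the merged handful of rows and limit again.
SELECT * FROM (
    (SELECT * FROM tblksraw01 ORDER BY ksrawsysid DESC LIMIT 10)
    UNION ALL
    (SELECT * FROM tblksraw02 ORDER BY ksrawsysid DESC LIMIT 10)
    -- ... one branch per partition ...
) AS merged
ORDER BY ksrawsysid DESC
LIMIT 10;
```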
From: Sriram Dandapani

Thanks...looks like partitioning will help.
From: Jim Nasby

BTW, I should have mentioned that partitioning is a very new feature and that folks probably would like to know about shortcomings you find while using it.
>BTW, I should have mentioned that partitioning is a very new feature and
>that folks probably would like to know about shortcomings you find
>while using it.
We just implemented partitioning, and have found it very useful for dropping old data, as opposed to deleting and enduring the subsequent vacuum.
One unforeseen issue we ran into had to do with Postgres basing its execution plans on the master table (which is generally empty) and therefore choosing inefficient merge joins, in our case. The work-around for us was to delete the rows in pg_statistic for the master tables, as this thread indicates:
http://archives.postgresql.org/pgsql-performance/2006-05/msg00024.php
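The work-around referenced there amounts to something like the following (a hedged sketch; it requires superuser privileges, and a later ANALYZE may repopulate the rows):

```sql
-- Remove the planner-statistics rows for the (empty) master table so
-- the planner stops basing its join estimates on them.
DELETE FROM pg_statistic
WHERE starelid = 'public.tblksraw'::regclass;
```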
Once we employed the work-around things seem good.
- Mark