Обсуждение: Table Partitioning Advice Request
Hi all. I'm planning to implement table partitioning as "suggested" (among other sources) in the official documentation. I'm using v8.4.2 at the moment. My case is far from the general one as: 1. I'll never UPDATE or DELETE rows from partitioned tables (only INSERTs) 2. Rows will be inserted one-by-one or, in the worse case, in bunches of two or three 3. Partitioning will be based upon TIMESTAMP ranges 4. The "virtual" tables should approach (and possibly go past) 100M rows 5. Most (99%) of the INSERTs (and possibly SELECTs) will actually operate on a rather small number of partitions (hardly more than 2). My main TABLE is like the following one: CREATE TABLE events ( eventtype text not null, item_id int8 not null, event_date timestamp not null default now(), row_date timestamp not null default now(), event_id serial8 primary key ); where the partitioning would happen over the values of the event_date column. The row_date columns is to record the row creation TIMESTAMP as events can be created relatively to the past, the future or the current time. In my mind a solution which is simple to maintain is to add a simple RULE ... ON INSERT for every newly created partition table. The TRIGGER approach, in my opinion, is much more complex to maintain as either the body of the function needs to be rewritten as new partitions are added, or some external TABLE lookup is needed to choose the actual table name to be used for a (dynamically created) INSERT. Now the questions. 1. As the number of RULEs will grow with the time, how will change the efficiency of the query planner while "browsing" among the RULES? 2. In the case the previous answer would lead to bad news for me, is there any better/different approach to partitioning with TRIGGERs? 3. Is there any more general advise for such approaches? -- Vincenzo Romano NON QVIETIS MARIBVS NAVTA PERITVS
Rule is not advisable, Trigger is the best solution. --- Thanks Sam Jas --- On Thu, 17/12/09, Vincenzo Romano <vincenzo.romano@notorand.it> wrote:
|
The INTERNET now has a personality. YOURS! See your Yahoo! Homepage.
Why? If you have to choose among a couple hundred partition tables, the trigger function body is far from trivial! You really think that calling and running a trigger function for every line is the best solution? 2009/12/17 Sam Jas <samjas33@yahoo.com> > > Rule is not advisable, Trigger is the best solution. > > --- > Thanks > Sam Jas > > --- On Thu, 17/12/09, Vincenzo Romano <vincenzo.romano@notorand.it> wrote: > > From: Vincenzo Romano <vincenzo.romano@notorand.it> > Subject: [GENERAL] Table Partitioning Advice Request > To: pgsql-general@postgresql.org > Date: Thursday, 17 December, 2009, 11:05 AM > > Hi all. > > I'm planning to implement table partitioning as "suggested" (among > other sources) in the official documentation. > I'm using v8.4.2 at the moment. > > My case is far from the general one as: > 1. I'll never UPDATE or DELETE rows from partitioned tables (only INSERTs) > 2. Rows will be inserted one-by-one or, in the worse case, in bunches > of two or three > 3. Partitioning will be based upon TIMESTAMP ranges > 4. The "virtual" tables should approach (and possibly go past) 100M rows > 5. Most (99%) of the INSERTs (and possibly SELECTs) will actually > operate on a rather small number of partitions (hardly more than 2). > > My main TABLE is like the following one: > > CREATE TABLE events ( > eventtype text not null, > item_id int8 not null, > event_date timestamp not null default now(), > row_date timestamp not null default now(), > event_id serial8 primary key > ); > > where the partitioning would happen over the values of the event_date column. > The row_date columns is to record the row creation TIMESTAMP as events > can be created relatively to the past, the future or the current time. > > In my mind a solution which is simple to maintain is to add a simple > RULE ... ON INSERT for every newly created partition table. > The TRIGGER approach, in my opinion, is much more complex to maintain > as either the body of the function needs to be rewritten as new > partitions are added, or > some external TABLE lookup is needed to choose the actual table name > to be used for a (dynamically created) INSERT. > > Now the questions. > > 1. As the number of RULEs will grow with the time, how will change > the efficiency of the query planner while "browsing" among the RULES? > 2. In the case the previous answer would lead to bad news for me, is > there any better/different approach to partitioning with TRIGGERs? > 3. Is there any more general advise for such approaches? > > -- > Vincenzo Romano > NON QVIETIS MARIBVS NAVTA PERITVS > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general > > ________________________________ > The INTERNET now has a personality. YOURS! See your Yahoo! Homepage. -- Vincenzo Romano NotOrAnd Information Technologies cel. +39 339 8083886 | gtalk. vincenzo.romano@notorand.it fix. +39 0823 454163 | skype. notorand.it fax. +39 02 700506964 | msn. notorand.it NON QVIETIS MARIBVS NAVTA PERITVS
On Thu, Dec 17, 2009 at 02:41:40PM +0100, Vincenzo Romano wrote: > > 2009/12/17 Sam Jas <samjas33@yahoo.com> > > Rule is not advisable, Trigger is the best solution. > If you have to choose among a couple hundred partition tables, the > trigger function body is far from trivial! It's possible to generate such code and deploy it automatically. > You really think that calling and running a trigger function for every > line is the best solution? Yes. The trigger function is choosing from a small subset of the tables, or you know which tables exactly the rows are going into and insert them there. Oh, and please do trim, and don't top-post. I've fix this in this post. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
2009/12/17 David Fetter <david@fetter.org>: >> You really think that calling and running a trigger function for every >> line is the best solution? > > Yes. The trigger function is choosing from a small subset of the > tables, or you know which tables exactly the rows are going into and > insert them there. So I understand that when a table has multiple RULEs, the planner just browse all of them to check the one(s) that will apply. What I have in mind is a "side table" with some bookkeeping data to be used by the trigger function to select the actual table to be used. Do you think this is faster than RULEs? Is there any performance study for the trigger-based implementation? > Oh, and please do trim, and don't top-post. I've fix this in this > post. Sorry! :-)
2009/12/17 Sam Jas <samjas33@yahoo.com> > > Rule is not advisable, Trigger is the best solution. Does the trigger solution need the TABLE CHECK constraint? It looks to me it won't.
Vincenzo Romano <vincenzo.romano@notorand.it> writes: > Is there any performance study for the trigger-based implementation? Consider that if you use RULE to partition, when you DROP a partition the INSERTs are locked out because the query depends on the table being droped. That alone could lead you to stop considering RULEs for partitioning. -- dim
2009/12/17 Dimitri Fontaine <dfontaine@hi-media.com>: > Vincenzo Romano <vincenzo.romano@notorand.it> writes: >> Is there any performance study for the trigger-based implementation? > > Consider that if you use RULE to partition, when you DROP a partition > the INSERTs are locked out because the query depends on the table being > droped. > > That alone could lead you to stop considering RULEs for partitioning. In that case I would also drop the relevant rule(s). But it seems anyway that the rule system would need to scan all the rules in order to know which one(s) to apply. And, at least in my case, I would have hundreds of rules ... Thanks for the hint, anyway. -- Vincenzo Romano NON QVIETIS MARIBVS NAVTA PERITVS
On Thu, Dec 17, 2009 at 8:29 AM, Vincenzo Romano <vincenzo.romano@notorand.it> wrote: > 2009/12/17 Sam Jas <samjas33@yahoo.com> >> >> Rule is not advisable, Trigger is the best solution. > > Does the trigger solution need the TABLE CHECK constraint? > It looks to me it won't. The table check constraint is used during selects to pick just the child tables needed to deliver the data you are asking for, assuming the key value is in the where clause. The main reason to avoid rules is that they're much less efficient than triggers. We use partitioning at work for our stats db, and partition by day, and we have about 2 years worth of stats data, so our insert trigger has about 700 if / elseif / else conditions in it, and the trigger is still very very fast. I tried it with rules before and it was way too slow. Note that the absolute fastest way to insert to the right child table is to do it from the application side, choosing the right child table there. We automated our trigger creation. While the trigger is huge in terms of number of lines, speed wise the creation of the new trigger each night at midnight is measured in milliseconds
2009/12/18 Scott Marlowe <scott.marlowe@gmail.com>: > The main reason to avoid rules is that they're much less efficient > than triggers. We use partitioning at work for our stats db, and > partition by day, and we have about 2 years worth of stats data, so > our insert trigger has about 700 if / elseif / else conditions in it, > and the trigger is still very very fast. I tried it with rules before > and it was way too slow. I still don't understand the poor efficiency of the rule system. I presume it's in the implementation, not the concept. Well, for the sake of efficiency I would like to avoid that if/elseif chain with an extra support table to look for the right partition. Unless you have a different advise against dynamic SQL (EXECUTE...) to define the partition table name. > Note that the absolute fastest way to insert to the right child table > is to do it from the application side, choosing the right child table > there. I definitely disagree on this advise. For the sake of speed I could also run some library level (and thus application level) database solution. The main objective for using an RDBMS (at least in my view) is to gather in a single place (the RDBMS) the data storage and management stuff. Moving policies to the application level would just scatter that stuff with faster but more error prone solutions. And the aplpication needs to know the partitioning policies. > We automated our trigger creation. While the trigger is huge in terms > of number of lines, speed wise the creation of the new trigger each > night at midnight is measured in milliseconds Correct. There is still another point for me to investigate into this partitioning area. When the partitions get created. I see from your answers that everyone is expecting the partitions to be already in place at the moment the trigger or the rule get fired. In my application you can generate events for the past or the future where there's no partition yet. -- Vincenzo Romano NON QVIETIS MARIBVS NAVTA PERITVS