Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1

Поиск
Список
Период
Сортировка
От Emmanuel Cecchet
Тема Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1
Дата
Msg-id 49470CDC.7020707@frogthinker.org
обсуждение исходный текст
Ответ на Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1  ("Jaime Casanova" <jcasanov@systemguards.com.ec>)
Ответы Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1  ("Nikhil Sontakke" <nikhil.sontakke@enterprisedb.com>)
Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1  (ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp>)
Список pgsql-hackers
Hi all,

I will be working on a roadmap for the partitioning features. I think
that there are different needs and that we will not be able to address
them all in 8.5 or even 8.6.
The goal will be to get things done step by step but possibly with a
design that will not require major refactoring to support new features.
I will try to setup the new wiki page tomorrow.

In the meantime, I have made some more tests with the trigger in C (see
attached patch). To prevent duplicating too much code, it requires the
ExecRelCheck method to be exported (that would be nice to have this
function exported in 8.4 so that we can start experimenting in 8.4 and
don't have to wait another year for 8.5). If there is locality in the
inserts (which might be the case if you COPY sorted data), the
performance remains constant regardless the number of child tables.
My initial tests to insert 140k rows are as follows:
- direct inserts in a child table: 2 seconds
- pgplsql trigger (IF ... ELSE IF ... blocks) : 14.5 seconds.
- C trigger: 4 seconds (actually the overhead is in the constraint check)

Right now if the row cannot be inserted in any child table, it is
inserted in the parent. If you want to fail, we can add a 'fail trigger'
(after all other triggers) that generates an error if previous triggers
did not capture the row. If you want to create a new partition, you can
have another trigger to handle that.

So I think that this trigger approach is pretty flexible like people
used AOP in J2EE servers to process requests. It has also the advantage
of allowing fast prototyping. It should also be easy to push that
functionality down in the core as needed.

Is it ok if I move Simon's requirement document under a more generic
'Table partitioning' page on the Wiki?

Thanks for your feedback,
manu

Jaime Casanova wrote:
> On Mon, Dec 15, 2008 at 1:46 PM, Gregory Stark <stark@enterprisedb.com> wrote:
>
>> Josh Berkus <josh@agliodbs.com> writes:
>>
>>
>>> Hackers,
>>>
>>> We don't yet seem to have a clear specification for this feature, and the Other
>>> Open Source DB has shown us how problematic it is to get auto-partitioning
>>> wrong.
>>>
>>> Should we defer auto-partitioning to 8.5?
>>>
>> If we're serious about having a "next generation" partitioning with a concept
>> of partition keys then it seems to me to make more sense to do that first and
>> then add on a feature like this.
>>
>>
>
> +1
>
>
>> This is still very useful. I haven't looked at the actual patch, does it
>> require core changes or can it be stashed in a pgfoundry or contrib module?
>>
>>
>
> what i don't like about this one is that it creates partitions at
> create table time and to manually add all new partitions (inherit
> tables and modify the trigger)... and what i want to see is an
> automatic creation when it's needed...
>
>
>


--
Emmanuel Cecchet
FTO @ Frog Thinker
Open Source Development & Consulting
--
Web: http://www.frogthinker.org
email: manu@frogthinker.org
Skype: emmanuel_cecchet

### Eclipse Workspace Patch 1.0
#P Postgres-HEAD
Index: src/test/regress/regress.c
===================================================================
RCS file: /root/cvsrepo/pgsql/src/test/regress/regress.c,v
retrieving revision 1.71
diff -u -r1.71 regress.c
--- src/test/regress/regress.c    25 Mar 2008 22:42:46 -0000    1.71
+++ src/test/regress/regress.c    16 Dec 2008 01:42:50 -0000
@@ -10,6 +10,9 @@
 #include "utils/geo_decls.h"    /* includes <math.h> */
 #include "executor/executor.h"    /* For GetAttributeByName */
 #include "commands/sequence.h"    /* for nextval() */
+#include "catalog/namespace.h"
+#include "executor/executor.h"
+#include "executor/tuptable.h"

 #define P_MAXDIG 12
 #define LDELIM            '('
@@ -732,3 +735,141 @@
     *--walk = '\0';
     PG_RETURN_CSTRING(result);
 }
+
+
+/*
+ * Partition trigger test
+ *
+ * The trigger should be used this way:
+ * CREATE TRIGGER trigger_name
+    BEFORE INSERT ON master_table
+    FOR EACH ROW EXECUTE PROCEDURE partition_insert_trigger(child_table_name, column_number, min_val, max_val);
+ */
+
+extern Datum partition_insert_trigger(PG_FUNCTION_ARGS);
+
+static Datum
+check_constraints_and_insert_tuple(Relation child_table_relation, TriggerData *trigdata ,HeapTuple trigtuple)
+{ // Check the constraints
+    ResultRelInfo    *resultRelInfo;
+    TupleTableSlot    *slot;
+    EState             *estate = CreateExecutorState();
+    Datum             result;
+
+    result = PointerGetDatum(trigdata->tg_trigtuple);
+
+    resultRelInfo = makeNode(ResultRelInfo);
+    resultRelInfo->ri_RangeTableIndex = 1;        /* dummy */
+    resultRelInfo->ri_RelationDesc = child_table_relation;
+
+    estate->es_result_relations = resultRelInfo;
+    estate->es_num_result_relations = 1;
+    estate->es_result_relation_info = resultRelInfo;
+
+    /* Set up a tuple slot too */
+    slot = MakeSingleTupleTableSlot(trigdata->tg_relation->rd_att);
+    ExecStoreTuple(trigtuple, slot, InvalidBuffer, false);
+
+    if (ExecRelCheck(resultRelInfo, slot, estate) == NULL)
+    { // Constraints satisfied, insert the row in the child table
+        bool use_wal = true;
+        bool use_fsm=true;
+
+        /* BEFORE ROW INSERT Triggers */
+        if (resultRelInfo->ri_TrigDesc &&
+                resultRelInfo->ri_TrigDesc->n_before_row[TRIGGER_EVENT_INSERT] > 0)
+        {
+            HeapTuple    newtuple;
+            newtuple = ExecBRInsertTriggers(estate, resultRelInfo, trigtuple);
+
+            if (newtuple != trigtuple) /* modified by Trigger(s) */
+            {
+                heap_freetuple(trigtuple);
+                trigtuple = newtuple;
+            }
+        }
+
+        /* Perform the insert
+         * TODO: Check that we detect constraint violation if before row insert does something bad */
+        heap_insert(child_table_relation, trigtuple, GetCurrentCommandId(true), use_wal, use_fsm);
+
+        /* Update indices */
+        ExecOpenIndices(resultRelInfo);
+        if (resultRelInfo->ri_NumIndices > 0)
+            ExecInsertIndexTuples(slot, &(trigtuple->t_self), estate, false);
+
+        /* AFTER ROW INSERT Triggers */
+        ExecARInsertTriggers(estate, resultRelInfo, trigtuple);
+
+        result = PointerGetDatum(NULL);
+    }
+    // Free resources
+    FreeExecutorState(estate);
+    ExecDropSingleTupleTableSlot(slot);
+
+    return result;
+}
+
+
+PG_FUNCTION_INFO_V1(partition_insert_trigger);
+
+static Relation    last_inserted_relation;
+
+Datum
+partition_insert_trigger(PG_FUNCTION_ARGS)
+{
+    TriggerData *trigdata = (TriggerData *) fcinfo->context;
+    HeapTuple    trigtuple= trigdata->tg_trigtuple;
+    TupleConstr *constr;
+    char        *child_table_name;
+    Relation    child_table_relation;
+    Oid            relation_id;
+    Datum        result;
+
+    // Try to exploit locality for bulk inserts
+    // We expect consecutive insert to go to the same child table
+    if (last_inserted_relation != NULL)
+    {   // Try the last table we used
+        result = check_constraints_and_insert_tuple(last_inserted_relation, trigdata, trigtuple);
+        if (result == PointerGetDatum(NULL))
+            return result;
+        // We got a miss
+        last_inserted_relation = NULL;
+    }
+
+    /* make sure it's called as a trigger at all */
+    if (!CALLED_AS_TRIGGER(fcinfo))
+        elog(ERROR, "partition_insert_trigger: not called by trigger manager");
+
+    /* Sanity checks */
+    if (!TRIGGER_FIRED_BY_INSERT(trigdata->tg_event) || !TRIGGER_FIRED_BEFORE(trigdata->tg_event))
+        elog(ERROR, "partition_insert_trigger: not called on insert before");
+
+    // Child table name is either given as the unique parameter or it is the name of the trigger
+    if (trigdata->tg_trigger->tgnargs == 1)
+        child_table_name = trigdata->tg_trigger->tgargs[0];
+    else
+        child_table_name = trigdata->tg_trigger->tgname;
+
+    // Lookup the child relation
+    relation_id = RelnameGetRelid(child_table_name);
+    if (relation_id == InvalidOid)
+        elog(ERROR, "partition_insert_trigger: Invalid child table %s", child_table_name);
+    child_table_relation = RelationIdGetRelation(relation_id);
+    if (child_table_relation == NULL)
+        elog(ERROR, "partition_insert_trigger: Failed to locate relation for child table %s", child_table_name);
+
+    constr = child_table_relation->rd_att->constr;
+    if (constr->num_check == 0)
+        elog(ERROR, "partition_insert_trigger: No constraint found for child table %s", child_table_name);
+
+    result = check_constraints_and_insert_tuple(child_table_relation, trigdata, trigtuple);
+
+    if (result == PointerGetDatum(NULL))
+        last_inserted_relation = child_table_relation;
+
+    RelationClose(child_table_relation);
+
+    return result;
+}
+
Index: src/backend/executor/execMain.c
===================================================================
RCS file: /root/cvsrepo/pgsql/src/backend/executor/execMain.c,v
retrieving revision 1.314
diff -u -r1.314 execMain.c
--- src/backend/executor/execMain.c    31 Oct 2008 21:07:54 -0000    1.314
+++ src/backend/executor/execMain.c    16 Dec 2008 01:42:50 -0000
@@ -1947,7 +1947,7 @@
 /*
  * ExecRelCheck --- check that tuple meets constraints for result relation
  */
-static const char *
+const char *
 ExecRelCheck(ResultRelInfo *resultRelInfo,
              TupleTableSlot *slot, EState *estate)
 {
Index: src/test/regress/output/create_function_1.source
===================================================================
RCS file: /root/cvsrepo/pgsql/src/test/regress/output/create_function_1.source,v
retrieving revision 1.34
diff -u -r1.34 create_function_1.source
--- src/test/regress/output/create_function_1.source    31 Oct 2008 19:37:56 -0000    1.34
+++ src/test/regress/output/create_function_1.source    16 Dec 2008 01:42:50 -0000
@@ -47,6 +47,10 @@
         RETURNS int4
         AS '@libdir@/regress@DLSUFFIX@'
         LANGUAGE C STRICT;
+CREATE FUNCTION partition_insert_trigger ()
+        RETURNS trigger
+        AS '@libdir@/regress@DLSUFFIX@'
+        LANGUAGE C STRICT;
 -- Things that shouldn't work:
 CREATE FUNCTION test1 (int) RETURNS int LANGUAGE SQL
     AS 'SELECT ''not an integer'';';
@@ -80,3 +84,70 @@
 CREATE FUNCTION test1 (int) RETURNS int LANGUAGE internal
     AS 'nosuch';
 ERROR:  there is no built-in function named "nosuch"
+-- Partitioning trigger test
+CREATE TABLE master (
+  id     int not null,
+  date   date not null,
+  value  int
+);
+CREATE TABLE child_y2008m01 (
+    CHECK ( date >= DATE '2008-01-01' AND date < DATE '2008-02-01' )
+) INHERITS (master);
+CREATE TABLE child_y2008m02 (
+    CHECK ( date >= DATE '2008-02-01' AND date < DATE '2008-03-01' )
+) INHERITS (master);
+CREATE TABLE child_y2008m03 (
+    CHECK ( date >= DATE '2008-03-01' AND date < DATE '2008-04-01' )
+) INHERITS (master);
+CREATE TRIGGER insert_child_y2008m01
+    BEFORE INSERT ON master
+    FOR EACH ROW EXECUTE PROCEDURE partition_insert_trigger('child_y2008m01');
+CREATE TRIGGER child_y2008m02
+    BEFORE INSERT ON master
+    FOR EACH ROW EXECUTE PROCEDURE partition_insert_trigger();
+CREATE TRIGGER child_y2008m03
+    BEFORE INSERT ON master
+    FOR EACH ROW EXECUTE PROCEDURE partition_insert_trigger();
+INSERT INTO master VALUES (1, '2008-01-15', 1);
+INSERT INTO master VALUES (2, '2008-02-15', 2);
+INSERT INTO master VALUES (3, '2008-03-15', 3);
+INSERT INTO master VALUES (4, '2008-04-15', 4);
+COPY master FROM '/root/pg_partitions/copy_input.txt';
+select * from master;
+ id |    date    | value
+----+------------+-------
+  4 | 04-15-2008 |     4
+  1 | 01-15-2008 |     1
+ 11 | 01-10-2008 |    11
+  2 | 02-15-2008 |     2
+ 12 | 02-15-2008 |    12
+  3 | 03-15-2008 |     3
+ 13 | 03-15-2008 |    13
+(7 rows)
+
+ select * from child_y2008m01;
+ id |    date    | value
+----+------------+-------
+  1 | 01-15-2008 |     1
+ 11 | 01-10-2008 |    11
+(2 rows)
+
+  select * from child_y2008m02;
+ id |    date    | value
+----+------------+-------
+  2 | 02-15-2008 |     2
+ 12 | 02-15-2008 |    12
+(2 rows)
+
+  select * from child_y2008m03;
+ id |    date    | value
+----+------------+-------
+  3 | 03-15-2008 |     3
+ 13 | 03-15-2008 |    13
+(2 rows)
+
+DROP TABLE master CASCADE;
+NOTICE:  drop cascades to 3 other objects
+DETAIL:  drop cascades to table child_y2008m01
+drop cascades to table child_y2008m02
+drop cascades to table child_y2008m03
Index: src/include/executor/executor.h
===================================================================
RCS file: /root/cvsrepo/pgsql/src/include/executor/executor.h,v
retrieving revision 1.152
diff -u -r1.152 executor.h
--- src/include/executor/executor.h    31 Oct 2008 21:07:55 -0000    1.152
+++ src/include/executor/executor.h    16 Dec 2008 01:42:50 -0000
@@ -155,6 +155,8 @@
 extern bool ExecContextForcesOids(PlanState *planstate, bool *hasoids);
 extern void ExecConstraints(ResultRelInfo *resultRelInfo,
                 TupleTableSlot *slot, EState *estate);
+extern const char *ExecRelCheck(ResultRelInfo *resultRelInfo,
+             TupleTableSlot *slot, EState *estate);
 extern TupleTableSlot *EvalPlanQual(EState *estate, Index rti,
              ItemPointer tid, TransactionId priorXmax);
 extern PlanState *ExecGetActivePlanTree(QueryDesc *queryDesc);
Index: src/test/regress/input/create_function_1.source
===================================================================
RCS file: /root/cvsrepo/pgsql/src/test/regress/input/create_function_1.source,v
retrieving revision 1.19
diff -u -r1.19 create_function_1.source
--- src/test/regress/input/create_function_1.source    1 Oct 2008 22:38:57 -0000    1.19
+++ src/test/regress/input/create_function_1.source    16 Dec 2008 01:42:50 -0000
@@ -52,6 +52,12 @@
         AS '@libdir@/regress@DLSUFFIX@'
         LANGUAGE C STRICT;

+CREATE FUNCTION partition_insert_trigger ()
+        RETURNS trigger
+        AS '@libdir@/regress@DLSUFFIX@'
+        LANGUAGE C STRICT;
+
+
 -- Things that shouldn't work:

 CREATE FUNCTION test1 (int) RETURNS int LANGUAGE SQL
@@ -77,3 +83,48 @@

 CREATE FUNCTION test1 (int) RETURNS int LANGUAGE internal
     AS 'nosuch';
+
+-- Partitioning trigger test
+
+CREATE TABLE master (
+  id     int not null,
+  date   date not null,
+  value  int
+);
+
+CREATE TABLE child_y2008m01 (
+    CHECK ( date >= DATE '2008-01-01' AND date < DATE '2008-02-01' )
+) INHERITS (master);
+
+CREATE TABLE child_y2008m02 (
+    CHECK ( date >= DATE '2008-02-01' AND date < DATE '2008-03-01' )
+) INHERITS (master);
+
+CREATE TABLE child_y2008m03 (
+    CHECK ( date >= DATE '2008-03-01' AND date < DATE '2008-04-01' )
+) INHERITS (master);
+
+CREATE TRIGGER insert_child_y2008m01
+    BEFORE INSERT ON master
+    FOR EACH ROW EXECUTE PROCEDURE partition_insert_trigger('child_y2008m01');
+
+CREATE TRIGGER child_y2008m02
+    BEFORE INSERT ON master
+    FOR EACH ROW EXECUTE PROCEDURE partition_insert_trigger();
+
+CREATE TRIGGER child_y2008m03
+    BEFORE INSERT ON master
+    FOR EACH ROW EXECUTE PROCEDURE partition_insert_trigger();
+
+INSERT INTO master VALUES (1, '2008-01-15', 1);
+INSERT INTO master VALUES (2, '2008-02-15', 2);
+INSERT INTO master VALUES (3, '2008-03-15', 3);
+INSERT INTO master VALUES (4, '2008-04-15', 4);
+COPY master FROM '/root/pg_partitions/copy_input.txt';
+
+select * from master;
+select * from child_y2008m01;
+select * from child_y2008m02;
+select * from child_y2008m03;
+
+DROP TABLE master CASCADE;

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: planner issue with constraint exclusion
Следующее
От: Bruce Momjian
Дата:
Сообщение: Re: stat() vs cygwin