Обсуждение: Issues with patitionning and triggers

Поиск
Список
Период
Сортировка

Issues with patitionning and triggers

От
Samuel Gilbert
Дата:
I have data warehousing DB 2 fairly big tables : one contains about 200
million rows and the other one contains about 4 billion rows.  Some queries
are now taking way too long to run (> 13 hours).  I need to get these queries
to run in an hour or so.  The slowdown was gradual, but I eventually hit a
wall, when the planner stopped using indexes.

All of this was done on PostgreSQL 9.2.0 64-bit compiled from the official
source.  Significant changes in postgresql.conf :

shared_buffers = 8GB
work_mem = 8GB
maintenance_work_mem = 8GB
max_stack_depth = 2MB

Here is the information about the big tables :

SELECT
   nspname || '.' || relname AS relation,
   pg_stat_get_live_tuples(C.oid) AS live_tuples,
   pg_size_pretty(pg_relation_size(C.oid, 'main')) AS relation_size,
   pg_size_pretty(pg_indexes_size(C.oid)) AS indexes_size,
   pg_size_pretty(pg_total_relation_size(C.oid)) AS total_size
FROM pg_class C LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE
   nspname NOT IN ('pg_catalog', 'information_schema') AND
   C.relkind <> 'i' AND
   nspname !~ '^pg_toast' AND
   relname IN ('obs', 'forecast');

    relation     | live_tuples | relation_size | indexes_size | total_size
-----------------+-------------+---------------+--------------+------------
 public.obs      |   193235914 | 18 GB         | 15 GB        | 33 GB
 public.forecast |  3914247064 | 425 GB        | 148 GB       | 573 GB

From what I read, the planner probably stopped using indexes since they are so
big compared to the system RAM (64 GB).  I therefore tried partitioning the
obs table.  The forecast table is the biggest issue, but I wanted to be able
to try this out in a single day before tackling the real monster.

Here is the structure of the parent table :

                 Table "public.observation"
      Column      |            Type             | Modifiers
------------------+-----------------------------+-----------
 station          | integer                     | not null
 method           | integer                     | not null
 startdate        | timestamp without time zone | not null
 duration         | interval                    | not null
 value            | real                        | not null
 dataset          | integer                     | not null
 modificationdate | timestamp without time zone | not null
Check constraints:
    "observation_check1" CHECK ((startdate + duration) < now())
    "observation_duration_check1" CHECK (duration > '00:00:00'::interval)
Foreign-key constraints:
    "observation_dataset_fkey1" FOREIGN KEY (dataset) REFERENCES dataset(id)
MATCH FULL
    "observation_method_fkey1" FOREIGN KEY (method) REFERENCES method(id)
MATCH FULL
    "observation_station_fkey1" FOREIGN KEY (station) REFERENCES station(id)
MATCH FULL
Triggers:
    trigger_insert_00 BEFORE INSERT ON observation FOR EACH ROW EXECUTE
PROCEDURE observation_insert_trigger()

INSERT ... RETURNING does not work with partitioned tables, since the trigger
function on the parent that dispatches new rows to the children tables, must
return NULL.  If the trigger function on the parent ends with "RETURN NEW",
INSERT ... RETURNING works, but new rows are duplicated; they are inserted
both in the parent and child tables.

Is there a way to make INSERT ... RETURNING work without duplicating the rows?

The other issue I'm encountering is that I also have very simple BEFORE UPDATE
and BEFORE INSERT triggers that set the modification date on every single row
:

CREATE FUNCTION set_modificationDate() RETURNS TRIGGER AS $$
BEGIN
   NEW.modificationDate := now();
   RETURN NEW;
END;
$$ LANGUAGE 'plpgsql';

The modification date must be updated if any row is modified in any way.  I
first tried to define the triggers on the parent table.  This worked, but I
realized that if a queries targets explicitly a child table, it could modify a
row without the date being updated.  I therefore dropped the triggers on the
parent table and defined them for every child.  To my great surprise, the
insert below failed with a message saying that NULLs are not allowed in the
modificationdate column.

INSERT INTO observation
(dataset, station, method, startdate, duration, value)
VALUES (9, 2, 128, '2014-01-01 00:00:00', '24 hours', 42);

Why isn't the BEFORE INSERT trigger on the child table being executed?

Cheers,
Samuel Gilbert


Re: Issues with patitionning and triggers

От
Adrian Klaver
Дата:
On 02/18/2014 02:10 PM, Samuel Gilbert wrote:
> I have data warehousing DB 2 fairly big tables : one contains about 200
> million rows and the other one contains about 4 billion rows.  Some queries
> are now taking way too long to run (> 13 hours).  I need to get these queries
> to run in an hour or so.  The slowdown was gradual, but I eventually hit a
> wall, when the planner stopped using indexes.
>

> The other issue I'm encountering is that I also have very simple BEFORE UPDATE
> and BEFORE INSERT triggers that set the modification date on every single row
> :
>
> CREATE FUNCTION set_modificationDate() RETURNS TRIGGER AS $$
> BEGIN
>     NEW.modificationDate := now();
>     RETURN NEW;
> END;
> $$ LANGUAGE 'plpgsql';
>
> The modification date must be updated if any row is modified in any way.  I
> first tried to define the triggers on the parent table.  This worked, but I
> realized that if a queries targets explicitly a child table, it could modify a
> row without the date being updated.  I therefore dropped the triggers on the
> parent table and defined them for every child.  To my great surprise, the
> insert below failed with a message saying that NULLs are not allowed in the
> modificationdate column.
>
> INSERT INTO observation
> (dataset, station, method, startdate, duration, value)
> VALUES (9, 2, 128, '2014-01-01 00:00:00', '24 hours', 42);
>
> Why isn't the BEFORE INSERT trigger on the child table being executed?

Constraints are checked before triggers are run.

>
> Cheers,
> Samuel Gilbert
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Issues with patitionning and triggers

От
Samuel Gilbert
Дата:
On 2014-02-18 14:25:59 Adrian Klaver wrote:
> On 02/18/2014 02:10 PM, Samuel Gilbert wrote:
> > I have data warehousing DB 2 fairly big tables : one contains about 200
> > million rows and the other one contains about 4 billion rows.  Some
> > queries
> > are now taking way too long to run (> 13 hours).  I need to get these
> > queries to run in an hour or so.  The slowdown was gradual, but I
> > eventually hit a wall, when the planner stopped using indexes.
> >
> >
> > The other issue I'm encountering is that I also have very simple BEFORE
> > UPDATE and BEFORE INSERT triggers that set the modification date on every
> > single row
> >
> >
> > CREATE FUNCTION set_modificationDate() RETURNS TRIGGER AS $$
> > BEGIN
> >
> >     NEW.modificationDate := now();
> >     RETURN NEW;
> >
> > END;
> > $$ LANGUAGE 'plpgsql';
> >
> > The modification date must be updated if any row is modified in any way.
> > I
> > first tried to define the triggers on the parent table.  This worked, but
> > I
> > realized that if a queries targets explicitly a child table, it could
> > modify a row without the date being updated.  I therefore dropped the
> > triggers on the parent table and defined them for every child.  To my
> > great surprise, the insert below failed with a message saying that NULLs
> > are not allowed in the modificationdate column.
> >
> > INSERT INTO observation
> > (dataset, station, method, startdate, duration, value)
> > VALUES (9, 2, 128, '2014-01-01 00:00:00', '24 hours', 42);
> >
> > Why isn't the BEFORE INSERT trigger on the child table being executed?
>
> Constraints are checked before triggers are run.
>
> > Cheers,
> > Samuel Gilbert

I don't think that is the case since I currently have the BEFORE INSERT
trigger working on the non-partitioned version of the table.

The modificationdate field has a NOT NULL constraint.  Even if I explicitly
provide a NULL for the modificationdate column, a date gets written in the
table.  This leads me to believe that the BEFORE INSERT trigger is really
executed before the constraint is checked.

What I don't understand is why the trigger doesn't appear to be executed when
it's defined on a child table.  I'll add a RAISE NOTICE to the trigger
function to makes sure it's not getting called.


Re: Issues with patitionning and triggers

От
Tom Lane
Дата:
Samuel Gilbert <samuel.gilbert@ec.gc.ca> writes:
> All of this was done on PostgreSQL 9.2.0 64-bit compiled from the official
> source.  Significant changes in postgresql.conf :

Why in the world are you using 9.2.0?  You're missing a year and a half
worth of bug fixes, some of them quite serious.

> INSERT ... RETURNING does not work with partitioned tables, since the trigger
> function on the parent that dispatches new rows to the children tables, must
> return NULL.  If the trigger function on the parent ends with "RETURN NEW",
> INSERT ... RETURNING works, but new rows are duplicated; they are inserted
> both in the parent and child tables.

> Is there a way to make INSERT ... RETURNING work without duplicating the rows?

Fraid not --- it only shows what got inserted into the parent table, which
is nothing if you're using this technique.

> The modification date must be updated if any row is modified in any way.  I
> first tried to define the triggers on the parent table.  This worked, but I
> realized that if a queries targets explicitly a child table, it could modify a
> row without the date being updated.  I therefore dropped the triggers on the
> parent table and defined them for every child.  To my great surprise, the
> insert below failed with a message saying that NULLs are not allowed in the
> modificationdate column.

You'd have to provide a self-contained example for anyone to help you with
that.  The most obvious explanation is that you forgot to attach the
trigger to the specific child table ...

            regards, tom lane


Re: Issues with patitionning and triggers

От
Adrian Klaver
Дата:
On 02/18/2014 02:42 PM, Samuel Gilbert wrote:
> On 2014-02-18 14:25:59 Adrian Klaver wrote:
>> On 02/18/2014 02:10 PM, Samuel Gilbert wrote:
>>> I have data warehousing DB 2 fairly big tables : one contains about 200
>>> million rows and the other one contains about 4 billion rows.  Some
>>> queries
>>> are now taking way too long to run (> 13 hours).  I need to get these
>>> queries to run in an hour or so.  The slowdown was gradual, but I
>>> eventually hit a wall, when the planner stopped using indexes.
>>>
>>>
>>> The other issue I'm encountering is that I also have very simple BEFORE
>>> UPDATE and BEFORE INSERT triggers that set the modification date on every
>>> single row
>>>
>>>
>>> CREATE FUNCTION set_modificationDate() RETURNS TRIGGER AS $$
>>> BEGIN
>>>
>>>      NEW.modificationDate := now();
>>>      RETURN NEW;
>>>
>>> END;
>>> $$ LANGUAGE 'plpgsql';
>>>
>>> The modification date must be updated if any row is modified in any way.
>>> I
>>> first tried to define the triggers on the parent table.  This worked, but
>>> I
>>> realized that if a queries targets explicitly a child table, it could
>>> modify a row without the date being updated.  I therefore dropped the
>>> triggers on the parent table and defined them for every child.  To my
>>> great surprise, the insert below failed with a message saying that NULLs
>>> are not allowed in the modificationdate column.
>>>
>>> INSERT INTO observation
>>> (dataset, station, method, startdate, duration, value)
>>> VALUES (9, 2, 128, '2014-01-01 00:00:00', '24 hours', 42);
>>>
>>> Why isn't the BEFORE INSERT trigger on the child table being executed?
>>
>> Constraints are checked before triggers are run.
>>
>>> Cheers,
>>> Samuel Gilbert
>
> I don't think that is the case since I currently have the BEFORE INSERT
> trigger working on the non-partitioned version of the table.

Sorry for steering you wrong. I could have sworn I saw the behavior I
mentioned, previously, when trying to do what you have done.

>
> The modificationdate field has a NOT NULL constraint.  Even if I explicitly
> provide a NULL for the modificationdate column, a date gets written in the
> table.  This leads me to believe that the BEFORE INSERT trigger is really
> executed before the constraint is checked.
>
> What I don't understand is why the trigger doesn't appear to be executed when
> it's defined on a child table.  I'll add a RAISE NOTICE to the trigger
> function to makes sure it's not getting called.

Still not sure what is going on, but I do have a question based on this
statement from your original post:

"The modification date must be updated if any row is modified in any way."

If that is the case shouldn't the trigger also cover UPDATE?

>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Issues with patitionning and triggers

От
Samuel Gilbert
Дата:
> "The modification date must be updated if any row is modified in any way."
>
> If that is the case shouldn't the trigger also cover UPDATE?

You completely right about that!  I actually have both configured, but I
focused only on the INSERT to try keep the length of my post as short as
possible.

As Tom Lane pointed out, it's hard to get help without a complete self-
contained example.  I will work on writing that up tomorrow.

Cheers!


On 2014-02-18 15:02:41 Adrian Klaver wrote:
> On 02/18/2014 02:42 PM, Samuel Gilbert wrote:
> > On 2014-02-18 14:25:59 Adrian Klaver wrote:
> >> On 02/18/2014 02:10 PM, Samuel Gilbert wrote:
> >>> I have data warehousing DB 2 fairly big tables : one contains about 200
> >>> million rows and the other one contains about 4 billion rows.  Some
> >>> queries
> >>> are now taking way too long to run (> 13 hours).  I need to get these
> >>> queries to run in an hour or so.  The slowdown was gradual, but I
> >>> eventually hit a wall, when the planner stopped using indexes.
> >>>
> >>>
> >>> The other issue I'm encountering is that I also have very simple BEFORE
> >>> UPDATE and BEFORE INSERT triggers that set the modification date on
> >>> every
> >>> single row
> >>>
> >>>
> >>> CREATE FUNCTION set_modificationDate() RETURNS TRIGGER AS $$
> >>> BEGIN
> >>>
> >>>      NEW.modificationDate := now();
> >>>      RETURN NEW;
> >>>
> >>> END;
> >>> $$ LANGUAGE 'plpgsql';
> >>>
> >>> The modification date must be updated if any row is modified in any way.
> >>> I
> >>> first tried to define the triggers on the parent table.  This worked,
> >>> but
> >>> I
> >>> realized that if a queries targets explicitly a child table, it could
> >>> modify a row without the date being updated.  I therefore dropped the
> >>> triggers on the parent table and defined them for every child.  To my
> >>> great surprise, the insert below failed with a message saying that NULLs
> >>> are not allowed in the modificationdate column.
> >>>
> >>> INSERT INTO observation
> >>> (dataset, station, method, startdate, duration, value)
> >>> VALUES (9, 2, 128, '2014-01-01 00:00:00', '24 hours', 42);
> >>>
> >>> Why isn't the BEFORE INSERT trigger on the child table being executed?
> >>
> >> Constraints are checked before triggers are run.
> >>
> >>> Cheers,
> >>> Samuel Gilbert
> >
> > I don't think that is the case since I currently have the BEFORE INSERT
> > trigger working on the non-partitioned version of the table.
>
> Sorry for steering you wrong. I could have sworn I saw the behavior I
> mentioned, previously, when trying to do what you have done.
>
> > The modificationdate field has a NOT NULL constraint.  Even if I
> > explicitly
> > provide a NULL for the modificationdate column, a date gets written in the
> > table.  This leads me to believe that the BEFORE INSERT trigger is really
> > executed before the constraint is checked.
> >
> > What I don't understand is why the trigger doesn't appear to be executed
> > when it's defined on a child table.  I'll add a RAISE NOTICE to the
> > trigger function to makes sure it's not getting called.
>
> Still not sure what is going on, but I do have a question based on this
> statement from your original post:
>
> "The modification date must be updated if any row is modified in any way."
>
> If that is the case shouldn't the trigger also cover UPDATE?


Re: Issues with patitionning and triggers

От
Samuel Gilbert
Дата:
On 2014-02-18 17:59:35 Tom Lane wrote:
> Samuel Gilbert <samuel.gilbert@ec.gc.ca> writes:
> > All of this was done on PostgreSQL 9.2.0 64-bit compiled from the official
>
> > source.  Significant changes in postgresql.conf :
> Why in the world are you using 9.2.0?  You're missing a year and a half
> worth of bug fixes, some of them quite serious.

Yes.  I know and I bear the pain and shame not running at least the latest
revision of the 9.2 branch.  Unfortunately, it's hard to get my manager to
view the update of software that "just works" as something to prioritize.

The good news is that your reply is a good argument to do so!  :)

Cheers!

>
> > INSERT ... RETURNING does not work with partitioned tables, since the
> > trigger function on the parent that dispatches new rows to the children
> > tables, must return NULL.  If the trigger function on the parent ends
> > with "RETURN NEW", INSERT ... RETURNING works, but new rows are
> > duplicated; they are inserted both in the parent and child tables.
> >
> > Is there a way to make INSERT ... RETURNING work without duplicating the
> > rows?
> Fraid not --- it only shows what got inserted into the parent table, which
> is nothing if you're using this technique.
>
> > The modification date must be updated if any row is modified in any way.
> > I
> > first tried to define the triggers on the parent table.  This worked, but
> > I
> > realized that if a queries targets explicitly a child table, it could
> > modify a row without the date being updated.  I therefore dropped the
> > triggers on the parent table and defined them for every child.  To my
> > great surprise, the insert below failed with a message saying that NULLs
> > are not allowed in the modificationdate column.
>
> You'd have to provide a self-contained example for anyone to help you with
> that.  The most obvious explanation is that you forgot to attach the
> trigger to the specific child table ...
>
>             regards, tom lane


Re: Issues with patitionning and triggers

От
Samuel Gilbert
Дата:
Hello everyone,

Here is a complete example demonstrating the issue I am encountering :

CREATE TABLE parent (
   split INTEGER NOT NULL,
   happiness INTEGER NOT NULL,
   modificationDate TIMESTAMP NOT NULL
);

CREATE TABLE child_split1 ( CHECK (split = 1) ) INHERITS (parent);
CREATE TABLE child_split2 ( CHECK (split = 2) ) INHERITS (parent);

CREATE OR REPLACE FUNCTION parent_dispatcher_trigger()
RETURNS TRIGGER AS $$
BEGIN
   CASE NEW.split
      WHEN 1 THEN INSERT INTO child_split1 VALUES (NEW.*);
      WHEN 2 THEN INSERT INTO child_split2 VALUES (NEW.*);
      ELSE RAISE EXCEPTION 'Partition for % does not exist!',NEW.split;
   END CASE;
   RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trigger_insert_00
   BEFORE INSERT ON parent
   FOR EACH ROW EXECUTE PROCEDURE parent_dispatcher_trigger();

CREATE OR REPLACE FUNCTION set_modificationDate_debug() RETURNS TRIGGER AS $$
BEGIN
   NEW.modificationDate := now();
   RAISE NOTICE 'NEW row is now : (%, %, %)',
      NEW.split, NEW.happiness, NEW.modificationDate;
   RETURN NEW;
END;
$$ LANGUAGE 'plpgsql';

CREATE TRIGGER trigger_insert_00
   BEFORE INSERT ON child_split1
   FOR EACH ROW EXECUTE PROCEDURE set_modificationdate_debug();

CREATE TRIGGER trigger_insert_00
   BEFORE INSERT ON child_split2
   FOR EACH ROW EXECUTE PROCEDURE set_modificationdate_debug();

INSERT INTO parent (split, happiness) VALUES (1, 42);

-- NOTICE:  NEW row is now : (1, 42, 2014-02-19 16:31:07.384151)
-- CONTEXT:  SQL statement "INSERT INTO child_split1 VALUES (NEW.*)"
-- PL/pgSQL function parent_dispatcher_trigger() line 4 at SQL statement
-- ERROR:  null value in column "modificationdate" violates not-null
constraint
-- DETAIL:  Failing row contains (1, 42, null).

INSERT INTO child_split1 (split, happiness) VALUES (1, 42);

-- NOTICE:  NEW row is now : (1, 42, 2014-02-19 16:37:27.134194)
-- INSERT 0 1


So, we clearly see that trigger_insert_00 is called in both cases.  I don't
understand why the query fails on the parent, but works when the INSERT
targets the child table directly.

Regards,
Samuel Gilbert


On 2014-02-18 23:16:31 Samuel Gilbert wrote:
> On 2014-02-18 17:59:35 Tom Lane wrote:
> > Samuel Gilbert <samuel.gilbert@ec.gc.ca> writes:
> > > All of this was done on PostgreSQL 9.2.0 64-bit compiled from the
> > > official
> >
> > > source.  Significant changes in postgresql.conf :
> > Why in the world are you using 9.2.0?  You're missing a year and a half
> > worth of bug fixes, some of them quite serious.
>
> Yes.  I know and I bear the pain and shame not running at least the latest
> revision of the 9.2 branch.  Unfortunately, it's hard to get my manager to
> view the update of software that "just works" as something to prioritize.
>
> The good news is that your reply is a good argument to do so!  :)
>
> Cheers!
>
> > > INSERT ... RETURNING does not work with partitioned tables, since the
> > > trigger function on the parent that dispatches new rows to the children
> > > tables, must return NULL.  If the trigger function on the parent ends
> > > with "RETURN NEW", INSERT ... RETURNING works, but new rows are
> > > duplicated; they are inserted both in the parent and child tables.
> > >
> > > Is there a way to make INSERT ... RETURNING work without duplicating the
> > > rows?
> >
> > Fraid not --- it only shows what got inserted into the parent table, which
> > is nothing if you're using this technique.
> >
> > > The modification date must be updated if any row is modified in any way.
> > > I
> > > first tried to define the triggers on the parent table.  This worked,
> > > but
> > > I
> > > realized that if a queries targets explicitly a child table, it could
> > > modify a row without the date being updated.  I therefore dropped the
> > > triggers on the parent table and defined them for every child.  To my
> > > great surprise, the insert below failed with a message saying that NULLs
> > > are not allowed in the modificationdate column.
> >
> > You'd have to provide a self-contained example for anyone to help you with
> > that.  The most obvious explanation is that you forgot to attach the
> > trigger to the specific child table ...
> >
> >             regards, tom lane


Re: Issues with patitionning and triggers

От
Tom Lane
Дата:
Samuel Gilbert <samuel.gilbert@ec.gc.ca> writes:
> Here is a complete example demonstrating the issue I am encountering :

What seems to be happening is:

1. The parent table has a NOT NULL constraint on modificationDate.

2. parent_dispatcher_trigger does RETURN NEW, so the tuple will be
inserted into the parent table as well as the selected child (presumably
this is not what you wanted ...)

3. There is no trigger that will replace the modificationDate for an
insertion into the parent.  So the parent insertion fails, after all
the triggers have run.

            regards, tom lane