Обсуждение: [GENERAL] Error that shouldn't happen?
Hello Everyone,
I've run into a strange error on the PostgreSQL 9.5.4 DB we use for our Zabbix Server. I implemented auto-partitioning based on the design from this wiki article: https://www.zabbix.org/wiki/Docs/howto/zabbix2_postgresql_autopartitioning
I implemented auto-partitioning for the history_uint table using the following trigger function:
CREATE FUNCTION zbx_part_trigger_func() RETURNS trigger
LANGUAGE plpgsql
AS $_$
DECLARE
prefix text := 'partitions';
timeformat text;
selector text;
_interval interval;
tablename text;
startdate text;
enddate text;
create_table_part text;
create_index_part text;
BEGIN
selector = TG_ARGV[0];
IF selector = 'hour' THEN
timeformat := 'YYYY_MM_DD_HH24';
ELSIF selector = 'day' THEN
timeformat := 'YYYY_MM_DD';
ELSIF selector = 'month' THEN
timeformat := 'YYYY_MM';
ELSE
RAISE EXCEPTION 'zbx_part_trigger_func: Specify "hour", "day", or "month" for interval selector instead of "%"', selector;
END IF;
_interval := '1 ' || selector;
tablename := TG_TABLE_NAME || '_p' || to_char(to_timestamp(NEW.clock), timeformat);
EXECUTE 'INSERT INTO ' || quote_ident(prefix) || '.' || quote_ident(tablename) || ' SELECT ($1).*' USING NEW;
RETURN NULL;
EXCEPTION
WHEN undefined_table THEN
startdate := extract(epoch FROM date_trunc(selector, to_timestamp(NEW.clock)));
enddate := extract(epoch FROM date_trunc(selector, to_timestamp(NEW.clock) + _interval ));
create_table_part := 'CREATE TABLE IF NOT EXISTS ' || quote_ident(prefix) || '.' || quote_ident(tablename)
|| ' (CHECK ((clock >= ' || quote_literal(startdate)
|| ' AND clock < ' || quote_literal(enddate)
|| '))) INHERITS (' || TG_TABLE_NAME || ')';
create_index_part := 'CREATE INDEX ' || quote_ident(tablename)
|| '_1 on ' || quote_ident(prefix) || '.' || quote_ident(tablename) || '(itemid,clock)';
EXECUTE create_table_part;
EXECUTE create_index_part;
--insert it again
EXECUTE 'INSERT INTO ' || quote_ident(prefix) || '.' || quote_ident(tablename) || ' SELECT ($1).*' USING NEW;
RETURN NULL;
END;
$_$;
With this trigger (no other triggers defined):
zbx_partition_trg BEFORE INSERT ON history_uint FOR EACH ROW EXECUTE PROCEDURE zbx_part_trigger_func('day');
I had fully expected race conditions to occur on a very busy system and throw errors trying to create the table, but instead I got the following index creation error:
ERROR: relation "history_uint_p2017_05_17_1" already exists
CONTEXT: SQL statement "CREATE INDEX history_uint_p2017_05_17_1 on partitions.history_uint_p2017_05_17(itemid,clock)"
PL/pgSQL function zbx_part_trigger_func() line 43 at EXECUTE
STATEMENT: insert into history_uint (itemid,clock,ns,value) values (73800,1494979201,11841804,99382669312),(30061,1494979201,17605067,0);
I am unable to figure out how the trigger was able to successfully create the table, but then fail creating the index. I would have expected one thread to "win" and create both the table and index, but other threads would fail when creating the table… but NOT when creating the index.
The only other function defined in the system is the "cleanup" function which was not running at the time.
The target table and index were still created.
Can anyone shed any light on how this could have occurred? Is this a bug or am I missing something?
Pertinent details:
· PostgreSQL 9.5.4 installed from PGDG packages on Centos 7.3.1611
· Zabbix 3.2 server
Thanks,
Rob Brucks
I am unable to figure out how the trigger was able to successfully create the table, but then fail creating the index. I would have expected one thread to "win" and create both the table and index, but other threads would fail when creating the table… but NOT when creating the index.
According to this post, adding "if not exists" won't really help for race conditions.
"The bottom line is that CREATE TABLE IF NOT EXISTS doesn't pretend to
handle concurrency issues any better than regular old CREATE TABLE,
which is to say not very well." - Robert Haas
It still doesn't explain how the function got past creating the table, but failed on the index. If another thread was also creating the table then there should have been lock contention on the create table statement.
Thanks,
Rob
From: "David G. Johnston" <david.g.johnston@gmail.com>
Date: Thursday, May 18, 2017 at 3:05 PM
To: Rob Brucks <rob.brucks@rackspace.com>
Cc: "pgsql-general@postgresql.org" <pgsql-general@postgresql.org>
Subject: Re: [GENERAL] Error that shouldn't happen?
On Thu, May 18, 2017 at 12:48 PM, Rob Brucks <rob.brucks@rackspace.com> wrote:
I am unable to figure out how the trigger was able to successfully create the table, but then fail creating the index. I would have expected one thread to "win" and create both the table and index, but other threads would fail when creating the table… but NOT when creating the index.
I don't fully comprehend the locking involved here but if you want a failure while creating the table you shouldn't use "IF NOT EXISTS". On the other side adding "IF NOT EXISTS" to the CREATE INDEX will supposedly prevent the error you are seeing.
The trigger that failed to create the index also failed to create the table - it just didn't care because of the IF NOT EXISTS. At least this is what I am observing from your description.
David J.
According to this post, adding "if not exists" won't really help for race conditions.
"The bottom line is that CREATE TABLE IF NOT EXISTS doesn't pretend to
handle concurrency issues any better than regular old CREATE TABLE,
which is to say not very well." - Robert Haas
It still doesn't explain how the function got past creating the table, but failed on the index. If another thread was also creating the table then there should have been lock contention on the create table statement.
Thanks.
I can code an exception block to handle the table problem, and probably one for the index collision too.
My point is how did two concurrent threads successfully create the same table? That had to have happened if one of the threads hit a duplicate index error.
It almost seems like Postgres skipped checking for duplicate tables due to some timing issue. I don't want my DB to ending up hosed by something like that.
Thanks,
Rob
From: "David G. Johnston" <david.g.johnston@gmail.com>
Date: Thursday, May 18, 2017 at 3:31 PM
To: Rob Brucks <rob.brucks@rackspace.com>
Cc: "pgsql-general@postgresql.org" <pgsql-general@postgresql.org>
Subject: Re: [GENERAL] Error that shouldn't happen?
On Thu, May 18, 2017 at 1:18 PM, Rob Brucks <rob.brucks@rackspace.com> wrote:
According to this post, adding "if not exists" won't really help for race conditions.
"The bottom line is that CREATE TABLE IF NOT EXISTS doesn't pretend to
handle concurrency issues any better than regular old CREATE TABLE,
which is to say not very well." - Robert Haas
It still doesn't explain how the function got past creating the table, but failed on the index. If another thread was also creating the table then there should have been lock contention on the create table statement.
AT1: Insert, failed, cannot find table
AT2: Insert, failed, cannot find table
BT2: Create Table, succeeds
BT1: Create Table; fails, it exists now, if exists converts to a warning
CT2: Create Index, succeeds
CT1: Create Index, fails , hard error
DT2: Insert, succeeds
DT1: Never Happens
What that post seems to be describing is that it is possible the "BT1" actually hard errors instead of just being converted into a notice. There is no statement visible action to show that interleave but there is an underlying race condition since both BT1 and BT2 are executing concurrently.
In short even with IF NOT EXISTS you are not guaranteed to not fail. But at least IF NOT EXISTS makes the probability of not failing > 0. It doesn't handle the concurrency any better - but it does change the outcome in some of those less-than-ideally handled situations.
David J.
According to this post, adding "if not exists" won't really help for race conditions.
"The bottom line is that CREATE TABLE IF NOT EXISTS doesn't pretend to
handle concurrency issues any better than regular old CREATE TABLE,
which is to say not very well." - Robert Haas
It still doesn't explain how the function got past creating the table, but failed on the index. If another thread was also creating the table then there should have been lock contention on the create table statement.
AT1: Insert, failed, cannot find tableAT2: Insert, failed, cannot find tableBT2: Create Table, succeedsBT1: Create Table; fails, it exists now, if exists converts to a warningCT2: Create Index, succeedsCT1: Create Index, fails , hard errorDT2: Insert, succeedsDT1: Never HappensWhat that post seems to be describing is that it is possible the "BT1" actually hard errors instead of just being converted into a notice. There is no statement visible action to show that interleave but there is an underlying race condition since both BT1 and BT2 are executing concurrently.In short even with IF NOT EXISTS you are not guaranteed to not fail. But at least IF NOT EXISTS makes the probability of not failing > 0. It doesn't handle the concurrency any better - but it does change the outcome in some of those less-than-ideally handled situations.David J.
--
'If at first you dont succeed, dont take up skydiving.'
My point is how did two concurrent threads successfully create the same table?
Thanks.
I can code an exception block to handle the table problem, and probably one for the index collision too.
My point is how did two concurrent threads successfully create the same table? That had to have happened if one of the threads hit a duplicate index error.
It almost seems like Postgres skipped checking for duplicate tables due to some timing issue. I don't want my DB to ending up hosed by something like that.
Thanks,
Rob
From: "David G. Johnston" <david.g.johnston@gmail.com>
Date: Thursday, May 18, 2017 at 3:31 PM
To: Rob Brucks <rob.brucks@rackspace.com>
Cc: "pgsql-general@postgresql.org" <pgsql-general@postgresql.org>
Subject: Re: [GENERAL] Error that shouldn't happen?
On Thu, May 18, 2017 at 1:18 PM, Rob Brucks <rob.brucks@rackspace.com> wrote:
According to this post, adding "if not exists" won't really help for race conditions.
"The bottom line is that CREATE TABLE IF NOT EXISTS doesn't pretend to
handle concurrency issues any better than regular old CREATE TABLE,
which is to say not very well." - Robert Haas
It still doesn't explain how the function got past creating the table, but failed on the index. If another thread was also creating the table then there should have been lock contention on the create table statement.
AT1: Insert, failed, cannot find table
AT2: Insert, failed, cannot find table
BT2: Create Table, succeeds
BT1: Create Table; fails, it exists now, if exists converts to a warning
CT2: Create Index, succeeds
CT1: Create Index, fails , hard error
DT2: Insert, succeeds
DT1: Never Happens
What that post seems to be describing is that it is possible the "BT1" actually hard errors instead of just being converted into a notice. There is no statement visible action to show that interleave but there is an underlying race condition since both BT1 and BT2 are executing concurrently.
In short even with IF NOT EXISTS you are not guaranteed to not fail. But at least IF NOT EXISTS makes the probability of not failing > 0. It doesn't handle the concurrency any better - but it does change the outcome in some of those less-than-ideally handled situations.
David J.
On 5/18/2017 1:40 PM, Andrew Kerber wrote: > It appears to me you might be making this a lot more difficult than > necessary. Why not just pre-create the required partitions daily or > weekly or monthly? Or do you have a requirement that a new partition > only be created the first time it is required? +1 we create new partitions in advance of their being needed as part of a maintenance process that's strictly single threaded. -- john r pierce, recycling bits in santa cruz
On 5/18/2017 1:40 PM, Andrew Kerber wrote:It appears to me you might be making this a lot more difficult than necessary. Why not just pre-create the required partitions daily or weekly or monthly? Or do you have a requirement that a new partition only be created the first time it is required?
+1
we create new partitions in advance of their being needed as part of a maintenance process that's strictly single threaded.
On 2017-05-18 21:48, Rob Brucks wrote: > Hello Everyone, > > I am unable to figure out how the trigger was able to successfully > create the table, but then fail creating the index. I would have > expected one thread to "win" and create both the table and index, but > other threads would fail when creating the table… but NOT when > creating the index. First, I agree whole heartedly with the other's suggestions to "not do this". Create a cronjob of whatever that prepares the required tables before you need them, empty tables are cheap. Second: IF EXISTS only tells you that an object exists and is ready for use. So what happens when a process is in the middle of creating that object? Does IF EXISTS tell you it exists or not? What you need (accepting that this whole trigger based approach is probably not the best option) is a proper locking mechanism. A "thundering herd" protection. The first time the trigger is triggered it should set a lock (n advisory lock for example) that subsequent calls to the same trigger can lok at to see if the table they need is being created at that time, so they will skip the create commands and *WAIT* for the first process to complete before using the table. That *WaIT* is important, and also something you probably don't want, especially if you have a busy database.