Обсуждение: I'm puzzled by a foreign key constraint problem

Поиск
Список
Период
Сортировка

I'm puzzled by a foreign key constraint problem

От
Jonathan Guthrie
Дата:
I've been tearing my hair out over this one issue and I'm hoping that
someone on this list will have an insight on the matter that will shed
some light on why the system is doing what it's doing.

I have a database with a number of tables, two of which are projects and
resources.  We also have a users table and tables for files and folders
and such.  Files, folders, users, and so forth are considered resources
and their identifiers (UUID's) are listed in the resources table.
Entries in the resources table are used to do things like attach
metadata to resources and to set permissions to allow users other than
the owner to access specified resources.  Because permissions are
associated with resources, there is a foreign key constraint between the
resourceid field in the permissions table and the resourceid column of
the resource table, which is the primary key of the resource table.

The system itself is written in C++ and database wrappers have been
created which work with the original database (MS SQL Server) and we are
porting to Postgres.  The stored procedures (250 or so of them) have
been converted into Postgres functions using a tool called SQLWays.  I'm
using named prepared procedures for each of the Postgres functions that
are called, plus three, "BEGIN", "COMMIT", and "ROLLBACK", which consist
of the named operation.  Those three are necessary because we're
returning data in cursors, so each time one of those functions is
executed it's done inside a transaction and a status code is returned
which determines whether the transaction is committed or rolled back.

When I create a project, entries in the project table and the resource
table are created in a single function.  Then, separate functions are
called to set the owner's access to the new project.  These other
functions are failing because of the resourceid foreign key constraint.

Now, I've turned the logging as far up as it will go, and I seem to have
verified that the first transaction, the one that populates the resource
table, is completed and committed before any calls are made to the
function that sets the permissions value.  (It's called five times and
the all fail.)  The logging generated by the foreign key constraint
failure includes the UUID for the resource and I can go back in later
and do a query for that record in the resource table and that query
successfully returns a single row.  Also, I can go in at a later time
and execute the function that sets the permissions and it works.

To me, this implies that it's a race condition.  When I first ran into
this problem last week I set the foreign key check to deferred and it
seemed to stop complaining although I don't remember what steps (if any)
I did to test that conclusion.  In any case, it happens with the check
set to deferred, so that didn't fix the problem.  I don't think it
should have had an effect anyway.  I spent the afternoon reading
documentation about constraints and keys and various options associated
with them

Anyway, I need for these operations to succeed because the lack of
permissions causes odd problems in other parts of the system.  I don't
even know where to begin looking for the problem or what magic might be
useful at allowing the system to work as intended.  I'm sure I'm doing
something wrong, but I don't know what.  Does this problem ring a bell
with anyone?  Is there any information that isn't clear from my
description or which I could gather which would be helpful?



!DSPAM:1544,490fb75c40305259311678!



Re: I'm puzzled by a foreign key constraint problem

От
Craig Ringer
Дата:
Jonathan Guthrie wrote:


> The stored procedures (250 or so of them) have
> been converted into Postgres functions using a tool called SQLWays.  I'm
> using named prepared procedures for each of the Postgres functions that
> are called, plus three, "BEGIN", "COMMIT", and "ROLLBACK", which consist
> of the named operation.

You're talking about prepared statements at the ODBC level, right? Is
this with client-side or with server-side prepare?

PostgreSQL won't actually let you use these in prepared statements at
the SQL level:

craig=> PREPARE teststm AS BEGIN;
ERROR:  syntax error at or near "BEGIN"
LINE 1: PREPARE teststm AS BEGIN;

Nor can you use them in a function:

craig=> CREATE OR REPLACE FUNCTION testfn() RETURNS void AS $$
craig$> BEGIN;
craig$> $$ LANGUAGE 'sql';
CREATE FUNCTION
craig=> SELECT testfn();
ERROR:  BEGIN is not allowed in a SQL function
CONTEXT:  SQL function "testfn" during startup

This makes sense, given that to invoke a function without a containing
transaction is impossible; PostgreSQL will implicitly wrap it in a
transaction that's committed as soon as the statement is executed.

I don't *think* you can use BEGIN etc in prepared statements at the v3
protocol level for the same reasons, but I'm not 100% certain of that.

Given those limitations, I'm assuming you're talking about named
prepared statements on the client side.

> Now, I've turned the logging as far up as it will go, and I seem to have
> verified that the first transaction, the one that populates the resource
> table, is completed and committed before any calls are made to the
> function that sets the permissions value.

What transaction isolation level are you using? If you're on the READ
COMMITTED level, then yes all you should need is for the transaction
that creates the records of interest to commit before another
transaction (even one that was already running) can see the values.

In any case, I'm a little puzzled as to why you're not doing the
creation of the initial records and the related permissions records etc
all in the one transaction.

> Anyway, I need for these operations to succeed because the lack of
> permissions causes odd problems in other parts of the system.

That really shows that you need to do it all in one transaction, then.

--
Craig Ringer

Re: I'm puzzled by a foreign key constraint problem

От
Richard Huxton
Дата:
Jonathan Guthrie wrote:
> When I create a project, entries in the project table and the resource
> table are created in a single function.  Then, separate functions are
> called to set the owner's access to the new project.  These other
> functions are failing because of the resourceid foreign key constraint.

Have you turned statement logging on? Your message suggests that's the
case, but didn't say so explicitly.

Are the two steps:
  1. Create project, resource
  2. Set access-rights
done in separate connections by any chance? If so it's possible (due to
MVCC) that #2 is still seeing the database as it was before #1 committed.

> Anyway, I need for these operations to succeed because the lack of
> permissions causes odd problems in other parts of the system.

If you want both steps to succeed or fail together though, they need to
be in the same transaction.

--
  Richard Huxton
  Archonet Ltd

Re: I'm puzzled by a foreign key constraint problem

От
Jonathan Guthrie
Дата:
On Tue, 2008-11-04 at 07:49 +0000, Richard Huxton wrote:
> Jonathan Guthrie wrote:
> > When I create a project, entries in the project table and the resource
> > table are created in a single function.  Then, separate functions are
> > called to set the owner's access to the new project.  These other
> > functions are failing because of the resourceid foreign key constraint.
>
> Have you turned statement logging on? Your message suggests that's the
> case, but didn't say so explicitly.
>
> Are the two steps:
>   1. Create project, resource
>   2. Set access-rights
> done in separate connections by any chance? If so it's possible (due to
> MVCC) that #2 is still seeing the database as it was before #1 committed.

It's possible, likely even.  We use a connection pool to manage
connections to the database and they're doled out as the system sees
fit.  However, at some point every update has to finish such that any
view of the database will see that update as finished, right?

> > Anyway, I need for these operations to succeed because the lack of
> > permissions causes odd problems in other parts of the system.

> If you want both steps to succeed or fail together though, they need to
> be in the same transaction.

That's what Mr Ringer said, and although I understand that answer and I
understand the reason that two people have independently responded with
it, I'm dissatisfied with it.

There are two reasons why I'm dissatisfied with that answer.  First, my
mandate is basically to create an interface layer for Postgres and then
port the SQL Server stored procedures without changing how they work.
If I change the logic in this part, it will be different not only from
the mechanism used in the original SQL Server stored procedure, but also
different from the logic used in other stored procedures that do similar
things.

The second reason is because adding permissions doesn't just happen at
project creation time.  The software I work on is middleware for the
actual client applications and the client can assign any user
permissions to access the project just as soon as it knows the project's
ID, which is one of the values returned by the project creation
function.  If the issue is a difference in views because the requests
come in on different connections, then there's a time window during
which a valid and unanticipatable request from the client could fail if
the request happens to use a connection to communicate with the database
that is different from the one used to create the project.

Anyway, while I agree that adding the logic to set permissions to the
project create function seems the simplest approach to dealing with the
issue, I'd really rather not change the logic until I've thoroughly
explored all other options.  I CAN guarantee that the other operations
on a project definitely won't begin until the create is committed.  So,
is there any way of causing a commit to not return until all the views
are consistent?



Re: I'm puzzled by a foreign key constraint problem

От
"Scott Marlowe"
Дата:
On Tue, Nov 4, 2008 at 11:18 AM, Jonathan Guthrie
<jguthrie@brokersys.com> wrote:
> On Tue, 2008-11-04 at 07:49 +0000, Richard Huxton wrote:
>> Jonathan Guthrie wrote:
>> > When I create a project, entries in the project table and the resource
>> > table are created in a single function.  Then, separate functions are
>> > called to set the owner's access to the new project.  These other
>> > functions are failing because of the resourceid foreign key constraint.
>>
>> Have you turned statement logging on? Your message suggests that's the
>> case, but didn't say so explicitly.
>>
>> Are the two steps:
>>   1. Create project, resource
>>   2. Set access-rights
>> done in separate connections by any chance? If so it's possible (due to
>> MVCC) that #2 is still seeing the database as it was before #1 committed.
>
> It's possible, likely even.  We use a connection pool to manage
> connections to the database and they're doled out as the system sees
> fit.  However, at some point every update has to finish such that any
> view of the database will see that update as finished, right?

Sure.  But, if the query to add the permissions is running under an
already active transaction, and you're running in serializable mode,
it can't see the changes because it started before they were
committed.


>> > Anyway, I need for these operations to succeed because the lack of
>> > permissions causes odd problems in other parts of the system.
>
>> If you want both steps to succeed or fail together though, they need to
>> be in the same transaction.
>
> That's what Mr Ringer said, and although I understand that answer and I
> understand the reason that two people have independently responded with
> it, I'm dissatisfied with it.

Mr Ringer was right.  So was Jonathan.

> There are two reasons why I'm dissatisfied with that answer.  First, my
> mandate is basically to create an interface layer for Postgres and then
> port the SQL Server stored procedures without changing how they work.

Even if they're broken?

> If I change the logic in this part, it will be different not only from
> the mechanism used in the original SQL Server stored procedure, but also
> different from the logic used in other stored procedures that do similar
> things.

Then they might be broken and need fixing as well.  Blindly converting
broken code that just happened to work is not the best way to approach
a project.

> The second reason is because adding permissions doesn't just happen at
> project creation time.

Is there some reason you can call the external permission setting
function from within the function that adds the users?  This would
seem the simplest and cleanest solution, since they would then
automatically share a transaction.  Other operations could still call
the permissions setting function as they used to.

Re: I'm puzzled by a foreign key constraint problem

От
Richard Huxton
Дата:
Jonathan Guthrie wrote:
> On Tue, 2008-11-04 at 07:49 +0000, Richard Huxton wrote:
>> Jonathan Guthrie wrote:
>>> When I create a project, entries in the project table and the resource
>>> table are created in a single function.  Then, separate functions are
>>> called to set the owner's access to the new project.  These other
>>> functions are failing because of the resourceid foreign key constraint.
>> Have you turned statement logging on? Your message suggests that's the
>> case, but didn't say so explicitly.
>>
>> Are the two steps:
>>   1. Create project, resource
>>   2. Set access-rights
>> done in separate connections by any chance? If so it's possible (due to
>> MVCC) that #2 is still seeing the database as it was before #1 committed.
>
> It's possible, likely even.  We use a connection pool to manage
> connections to the database and they're doled out as the system sees
> fit.  However, at some point every update has to finish such that any
> view of the database will see that update as finished, right?

You'll need to read the section of the manuals regarding transaction
isolation and how it impacts MVCC for full details, but the short answer
is "no". A pre-existing transaction might well see the database as it
was when its snapshot was first taken. More likely to happen if you have
a connection pool that issues BEGINs too early...

>>> Anyway, I need for these operations to succeed because the lack of
>>> permissions causes odd problems in other parts of the system.
>
>> If you want both steps to succeed or fail together though, they need to
>> be in the same transaction.
>
> That's what Mr Ringer said, and although I understand that answer and I
> understand the reason that two people have independently responded with
> it, I'm dissatisfied with it.
>
> There are two reasons why I'm dissatisfied with that answer.  First, my
> mandate is basically to create an interface layer for Postgres and then
> port the SQL Server stored procedures without changing how they work.
> If I change the logic in this part, it will be different not only from
> the mechanism used in the original SQL Server stored procedure, but also
> different from the logic used in other stored procedures that do similar
> things.

The logic is wrong regardless of whether you use PostgreSQL, SQL Server,
Oracle or any other DB though. If you want a guarantee that both actions
succeed or fail together you'll need to wrap them in a transaction. What
you're saying is that at the moment there is no such guarantee with SQL
Server as your database, it just happens to work most (e.g. 99.99%) of
the time.

> The second reason is because adding permissions doesn't just happen at
> project creation time.  The software I work on is middleware for the
> actual client applications and the client can assign any user
> permissions to access the project just as soon as it knows the project's
> ID, which is one of the values returned by the project creation
> function.  If the issue is a difference in views because the requests
> come in on different connections, then there's a time window during
> which a valid and unanticipatable request from the client could fail if
> the request happens to use a connection to communicate with the database
> that is different from the one used to create the project.

This is separate from the issue of both actions succeeding or failing.

> Anyway, while I agree that adding the logic to set permissions to the
> project create function seems the simplest approach to dealing with the
> issue, I'd really rather not change the logic until I've thoroughly
> explored all other options.  I CAN guarantee that the other operations
> on a project definitely won't begin until the create is committed.  So,
> is there any way of causing a commit to not return until all the views
> are consistent?

It doesn't. They are. But I think your second connection is fixed to an
older snapshot.  Set aside an hour, read through the concurrency control
/ transaction-isolation section of the manuals and experiment with two
psql screens open at the same time until you're clear how it all works.
It'll probably take 5 mins to find the problem then (in consunction with
statment logging turned on at the server side).

I might be wrong about the cause, but since (1) foreign-keys work in PG,
(2) you seem to know what you're doing, I'm guessing it's a combination
of the subtleties of mvcc and your connection-pool interacting.

--
  Richard Huxton
  Archonet Ltd


Re: I'm puzzled by a foreign key constraint problem

От
Craig Ringer
Дата:
Jonathan Guthrie wrote:

> It's possible, likely even.  We use a connection pool to manage
> connections to the database and they're doled out as the system sees
> fit.  However, at some point every update has to finish such that any
> view of the database will see that update as finished, right?

So that any /new/ snapshot of the state of the database sees it, yes.

If you have long-lived transactions at the SERIALIZABLE isolation level,
they won't (by design) see changes committed by other transactions after
the first statement in the SERIALIZABLE transaction has been issued.

See the manual for more information on concurrency, MVCC, and
transaction isolation.

> There are two reasons why I'm dissatisfied with that answer.  First, my
> mandate is basically to create an interface layer for Postgres and then
> port the SQL Server stored procedures without changing how they work.

This is unlikely to be possible. The two databases are extremely
different in some important ways.

In particular, MS SQL Server uses a locking approach to concurrency,
whereas PostgreSQL uses an multi-version approach (more like Oracle).
You should usually be able to make the locking approach work in
PostgreSQL, but there *will* be differences you need to think about in
the way procedures interact when run concurrently. You may need to add
more explicit locking to correct for assumptions that're valid under MS
SQL Server but not under PostgreSQL, or adjust your logic to exploit
multi-versioning properly instead. You will get much better performance
if you adapt your code to the MVCC model instead of trying to stick to
using locking for concurrency control.

It doesn't help that PostgreSQL does not at present support true stored
procedures. There is no top-level CALLable procedure support; instead
PostgreSQL has very powerful functions. The most important difference
this makes is that you CAN NOT perform transaction control operations
(BEGIN, ROLLBACK, COMMIT) within any procedural function in PostgreSQL.
They are inherently wrapped in a transaction. You *can* RAISE EXCEPTION
from PL/PgSQL to trigger a rollback (unless the caller traps and handles
the exception), but there's no way to force a commit or begin a new and
distinct transaction.

OK, that's not absolutely 100% true. You can do it with dblink. You just
don't want to.

Anyway, if your MS SQL server stored procedures expect to be able to
BEGIN a transaction, do some work, COMMIT it, then BEGIN another and do
some more work before COMMITTING that second piece of work, you're going
to have to do some redesign.

> The second reason is because adding permissions doesn't just happen at
> project creation time.

That's fine. Nothing stops you from issuing something like:

BEGIN;
SELECT create_it(blah);
SELECT set_permissions(blah, perms);
COMMIT;

and later using:

SELECT set_permissions(blah, otherperms);

standalone or inside another, unrelated transaction.

The point is that if your initial create and the setting of the initial
permissions must succeed or fail together, they MUST be done within a
single transaction. That is, in fact, the fundamental point of database
transactions.

What you should avoid doing is:

TRANSACTION 1                 TRANSACTION 2

BEGIN;
                              BEGIN;
SELECT create_it(blah);
                              SELECT set_permissions(blah, perms);
COMMIT;
                              COMMIT;

... because that just won't work. It sounds like you've got that right,
but you might be doing this:

TRANSACTION 1                 TRANSACTION 2

BEGIN;
                              BEGIN;
                              SET transaction_isolation = SERIALIZABLE;
                              -- do something else that triggers
                              -- freezing of the transaction's snapshot,
                              -- even something like:
                              SELECT 1;
SELECT create_it(blah);
COMMIT;
                              SELECT set_permissions(blah, perms);
                              COMMIT;

... which will also fail.

> The software I work on is middleware for the
> actual client applications and the client can assign any user
> permissions to access the project just as soon as it knows the project's
> ID, which is one of the values returned by the project creation
> function.

When the project's creation function returns, the created project is not
yet visible to other transactions, even ones begun after the function
returns. It only becomes visible after the transaction in which the
create function was called COMMITs.

This is fine if your later manipulations of the permissions etc happen
within the same transaction as the initial create (as they should).
However, if you're trying to refer to the created record from another
transaction before the one that created the record has committed, it
won't yet be visible.

Furthermore, if the transaction trying to refer to the record created by
some other transaction is at the SERIALIZABLE isolation level, it won't
be able to see the created object at all if the first statement in the
transaction was issued before the record of interest was created. It's
likely that READ COMMITTED is what you want anyway, but you really
/must/ read the documentation on transaction isolation levels etc to
understand and be sure about what you need.

> If the issue is a difference in views because the requests
> come in on different connections, then there's a time window during
> which a valid and unanticipatable request from the client could fail if
> the request happens to use a connection to communicate with the database
> that is different from the one used to create the project.

It has nothing to do with which connection is used. It's all about the
transactions involved; a READ COMMITTED or SERIALIZABLE transaction
cannot see another's dirty (uncomitted) changes, and a SERIALIZABLE
transaction cannot see committed changes newer than its own start time.

> Anyway, while I agree that adding the logic to set permissions to the
> project create function seems the simplest approach to dealing with the
> issue, I'd really rather not change the logic until I've thoroughly
> explored all other options.  I CAN guarantee that the other operations
> on a project definitely won't begin until the create is committed.

OK, then you should be having no issues if you're using the READ
COMMITTED isolation level, or if the transaction that performs those
other operations begins after the transaction that created the project
committed.

>  So,
> is there any way of causing a commit to not return until all the views
> are consistent?

They're always internally consistent; a transaction's view of the
database state is never contaminated by uncommitted data, and changes
happen atomically between statements as far as the transaction is
concerned (if it's in READ COMMITTED mode) or never become visible at
all (in SERIALIZABLE mode).

Different concurrent transaction's views of the database state are not
DESIGNED to be consistent when compared with each other externally by
some side channel. Two READ COMMITTED transactions that have no dirty
changes will see the same view of the database, but two SERIALIZABLE
transactions started at different times won't. Nor will a SERIALIZABLE
transaction compared to a READ COMMITTED transaction when changes have
been committed after the SERIALIZABLE transaction's snapshot was taken.
And, of course, even two READ COMMITTED transactions don't see the same
view of the database state if either or both of them have made changes.

--
Craig Ringer

Re: I'm puzzled by a foreign key constraint problem

От
Jonathan Guthrie
Дата:
On Wed, 2008-11-05 at 04:40 +0900, Craig Ringer wrote:
> The point is that if your initial create and the setting of the initial
> permissions must succeed or fail together, they MUST be done within a
> single transaction. That is, in fact, the fundamental point of database
> transactions.

I understand that.  Honestly, I do.  If I hadn't ever said that odd
things happen when the permissions aren't set, then maybe I could find
out what I'm doing wrong.

> What you should avoid doing is:
>
> TRANSACTION 1                 TRANSACTION 2
>
> BEGIN;
>                               BEGIN;
> SELECT create_it(blah);
>                               SELECT set_permissions(blah, perms);
> COMMIT;
>                               COMMIT;
>
> ... because that just won't work. It sounds like you've got that right,
> but you might be doing this:

> TRANSACTION 1                 TRANSACTION 2
>
> BEGIN;
>                               BEGIN;
>                               SET transaction_isolation = SERIALIZABLE;
>                               -- do something else that triggers
>                               -- freezing of the transaction's snapshot,
>                               -- even something like:
>                               SELECT 1;
> SELECT create_it(blah);
> COMMIT;
>                               SELECT set_permissions(blah, perms);
>                               COMMIT;
>
> ... which will also fail.

The thing is, the C++ code does this

BEGIN transaction 1
INSERT project
COMMIT

BEGIN transaction 2
SET permissions
COMMIT

or, at least, it's supposed to.  Those two operations are not supposed
to overlap at all even if they're on two different connections.  I
thought I had verified this by looking at the log file.  I mean, I can
look at the log file and see things like

2008-11-03 16:29:22 CST DEBUG:  00000: StartTransactionCommand
and
2008-11-03 16:29:22 CST DEBUG:  00000: CommitTransactionCommand

where I would expect them to if what I'm expecting is going on, but the
log file doesn't appear to have enough information to see a transaction
created, proceed, and then end.  That is, how do I know which
transaction was started and which one was committed?

I'm kind of confused by lines like this:

2008-11-03 16:29:22 CST DEBUG:  00000: name: unnamed; blockState:    INPROGRESS; state: INPROGR, xid/subid/cid:
678145/1/4,nestlvl: 1, children: 678146 678147 

Is there an easy explanation somewhere?



Re: I'm puzzled by a foreign key constraint problem

От
Tom Lane
Дата:
Jonathan Guthrie <jguthrie@brokersys.com> writes:
> ... or, at least, it's supposed to.  Those two operations are not supposed
> to overlap at all even if they're on two different connections.  I
> thought I had verified this by looking at the log file.  I mean, I can
> look at the log file and see things like

> 2008-11-03 16:29:22 CST DEBUG:  00000: StartTransactionCommand
> and
> 2008-11-03 16:29:22 CST DEBUG:  00000: CommitTransactionCommand

> where I would expect them to if what I'm expecting is going on, but the
> log file doesn't appear to have enough information to see a transaction
> created, proceed, and then end.  That is, how do I know which
> transaction was started and which one was committed?

You need to add more identification info to your log_line_prefix.
The PID would be the most reliable way to tie those entries together,
but I think there's also an option that writes the transaction ID.

> I'm kind of confused by lines like this:

> 2008-11-03 16:29:22 CST DEBUG:  00000: name: unnamed; blockState:    INPROGRESS; state: INPROGR, xid/subid/cid:
678145/1/4,nestlvl: 1, children: 678146 678147 

> Is there an easy explanation somewhere?

You'd have to look at the source code to figure out most of the
DEBUG-level messages.

            regards, tom lane

Re: I'm puzzled by a foreign key constraint problem

От
Craig Ringer
Дата:
Jonathan Guthrie wrote:

> The thing is, the C++ code does this
>
> BEGIN transaction 1
> INSERT project
> COMMIT
>
> BEGIN transaction 2
> SET permissions
> COMMIT
>
> or, at least, it's supposed to.

OK, and we know that if it is doing what it is supposed to, transaction
2 /must/ see the changes made by transaction 1. Either it's not doing
what it's supposed to, or you've hit a mind bogglingly huge bug in
PostgreSQL that's never been noticed before. I'm inclined to suspect the
former.

Tom Lane pointed out where to go from here re the server logging. It'll
be interesting to see what turns up once you've got new logs that show
the backend pid and the involved xid.

Sorry about hammering on the point re transaction interleaving and so
on. You do see people here who've misunderstood how MVCC visibility
works, and I wasn't sure from your mail that your transactions didn't
overlap.

--
Craig Ringer

Re: I'm no longer puzzled by a foreign key constraint problem

От
Jonathan Guthrie
Дата:
On Wed, 2008-11-05 at 12:14 +0900, Craig Ringer wrote:
> Jonathan Guthrie wrote:
>
> > The thing is, the C++ code does this
> >
> > BEGIN transaction 1
> > INSERT project
> > COMMIT
> >
> > BEGIN transaction 2
> > SET permissions
> > COMMIT
> >
> > or, at least, it's supposed to.
>
> OK, and we know that if it is doing what it is supposed to, transaction
> 2 /must/ see the changes made by transaction 1. Either it's not doing
> what it's supposed to, or you've hit a mind bogglingly huge bug in
> PostgreSQL that's never been noticed before. I'm inclined to suspect the
> former.

As am I.  In fact, I'm rather counting on it, and that's the reason I
posted my question to pgsql-general rather than bugs.  As an aside, I
prefer any problems I run into to be in my code because I can most
readily fix those.

> Tom Lane pointed out where to go from here re the server logging. It'll
> be interesting to see what turns up once you've got new logs that show
> the backend pid and the involved xid.

I'll be keeping Mr. Lane's message for the logging tips.  For my own
part, since people kept making a strong distinction between the READ
COMMITTED transaction isolation level as opposed to the SERIALIZABLE
transaction level, I concluded that one likely explanation is that the
transaction isolation level was not what I thought it was and put an
explicit "ISOLATION LEVEL READ COMMITTED" in the code that issues the
BEGIN.

The problem has since gone away.  Of course, with a problem that isn't
perfectly understood one must guard carefully against the idea that
you've solved a problem that doesn't recur or that the solution is
certain to be what you thought it was.  I'm thinking that there may have
been some weird interaction between the business logic that's built in
to the executable and the database layer code that lives in a shared
object file.  Perhaps it wasn't the code I changed that did it, but the
fact that I had to recompile to apply the change.

> Sorry about hammering on the point re transaction interleaving and so
> on. You do see people here who've misunderstood how MVCC visibility
> works, and I wasn't sure from your mail that your transactions didn't
> overlap.

There's nothing to be sorry for.  I can now see several deficincies in
my original message that hindered rather than helped the communication.
The only thing I can say in my own defense is that it appears likely
that if I had known what sort of things I needed to be certain of, and
certain to mention, in my initial message, then I wouldn't have had to
write the message in the first place.

In any case, my thanks to all of you for your attention and time.  I am
profoundly grateful for it.



Re: I'm no longer puzzled by a foreign key constraint problem

От
Craig Ringer
Дата:
Jonathan Guthrie wrote:

>> Tom Lane pointed out where to go from here re the server logging. It'll
>> be interesting to see what turns up once you've got new logs that show
>> the backend pid and the involved xid.
>
> I'll be keeping Mr. Lane's message for the logging tips.  For my own
> part, since people kept making a strong distinction between the READ
> COMMITTED transaction isolation level as opposed to the SERIALIZABLE
> transaction level, I concluded that one likely explanation is that the
> transaction isolation level was not what I thought it was and put an
> explicit "ISOLATION LEVEL READ COMMITTED" in the code that issues the
> BEGIN.
>
> The problem has since gone away.

This VERY strongly suggests that your transactions do in fact overlap.
You've probably got something like this:


Transaction 1                       Transaction 2
BEGIN;
SELECT schema_version FROM app_info;
SELECT create_the_object();
                                    BEGIN;
                                    SELECT schema_version FROM app_info;

COMMIT;
                                    SELECT set_permissions();
                                    COMMIT;


In READ COMMITTED isolation this would work fine. Transaction 2 would be
able to see the object transaction 1 created because transaction 1 had
been committed before the statement "SELECT set_permissions()" was issued.

In SERIALIZABLE isolation, because the snapshot is frozen when the first
database access occurs in the transaction (in this case "SELECT
schema_version ..." but it could be anything) and that happens BEFORE
transaction 1 commits, transaction 2 cannot see the work done by
transaction 1 even though it's committed.

If you have a highly layered system with connection pools, etc, then it
seems reasonably likely that you're doing something like "disabling
autocommit" (which might BEGIN a transaction and issue a few setup
statements) in the connection setup code for your connection pool.

If this turns out to be the case you should be able to tell because
you'll have connections in the `idle in transaction' state in the output
of `select * from pg_stat_activity'. These will mess with VACUUM's
ability to clean out dead tuples, causing table and index bloat that'll
slow your system down and waste disk space. In the SERIALIZABLE
isolation level it'll also cause visibility problems that'll give you
major headaches.

The best thing to do, though, is configure your logging as Tom Lane
suggested and definitively confirm whether the transactions of interest
do or do not overlap in time. Even if the problem appears to have gone
away, there may be other consequences and it's something you really need
to investigate.

If that does turn out to be the problem, you also need to look at why
you're using the SERIALIZABLE isolation level by default. As per the
(excellent and very strongly recommended, I cannot stress it enough)
documentation, there are some downsides to using SERIALIZABLE isolation,
like the need to be prepared to retry a transaction in case of
serialization failure. For most operations READ COMMITTED is entirely
safe; you just need to think about concurrency when writing your SQL and
avoid "read-modify-write" code (eg SELECT value, add to value, UPDATE
value).

If you're using an ORM layer then all bets are off, since they *love* to
read-modify-write and there's not really any way around it. The ORM
should provide optimistic locking to detect conflicts, allowing you to
still use READ COMMITTED safely. If it doesn't, you're stuck with
SERIALIZABLE and will have to be very, very careful with your connection
pooling, transaction lifetimes, etc. Of course, you should be anyway to
be VACUUM friendly among other things.

> I'm thinking that there may have
> been some weird interaction between the business logic that's built in
> to the executable and the database layer code that lives in a shared
> object file.  Perhaps it wasn't the code I changed that did it, but the
> fact that I had to recompile to apply the change.

Well, remove the change, recompile again, and see if the problem comes
back. If it does, you know you've got an issue in your code, and that's
important to confirm.


--
Craig Ringer