Thread: CREATE DATABASE/DROP DATABASE race conditions

CREATE DATABASE/DROP DATABASE race conditions

From
Florian Weimer
Date:
I'm observing weird behavior with CREATE DATABASE/DROP DATABASE and
PostgreSQL 7.2.1 (on Debian/unstable): There seems to be a short time
period after the 'Z' response of the backend during which the database
continues to exist (e.g. subsequent CREATE DATABASE operations with
the same name fail) or does not exist yet (e.g. it is not possible to
connect to the database).

You cannot reproduce this with "psql" or "createdb"/"dropdb".  In the
first case, the operations appear to be properly serialized; in the
second case, the process creation overhead prevents the race condition
from being triggered.  (I observe these failures with a test suite for
a client interface library.)

This might be a kernel or file system quirk (I'm using XFS 1.1).  Is
this a known issue, or shall I try to come up with a C test case?

--
Florian Weimer                       Weimer@CERT.Uni-Stuttgart.DE
University of Stuttgart           http://CERT.Uni-Stuttgart.DE/people/fw/
RUS-CERT                          fax +49-711-685-5898

Re: CREATE DATABASE/DROP DATABASE race conditions

From
Tom Lane
Date:
Florian Weimer <Weimer@CERT.Uni-Stuttgart.DE> writes:
> I'm observing weird behavior with CREATE DATABASE/DROP DATABASE and
> PostgreSQL 7.2.1 (on Debian/unstable): There seems to be a short time
> period after the 'Z' response of the backend during which the database
> continues to exist (e.g. subsequent CREATE DATABASE operations with
> the same name fail) or does not exist yet (e.g. it is not possible to
> connect to the database).

> You cannot reproduce this with "psql" or "createdb"/"dropdb".

I'm inclined to think it's a bug in your application coding, then.
psql certainly doesn't go out of its way to serialize operations.

Could we see a self-contained test case?

            regards, tom lane

Re: CREATE DATABASE/DROP DATABASE race conditions

From
Florian Weimer
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

>> You cannot reproduce this with "psql" or "createdb"/"dropdb".
>
> I'm inclined to think it's a bug in your application coding, then.

Well, sort of.

> psql certainly doesn't go out of its way to serialize operations.

Well, unless you play around with \connect, it uses a single
connection, which makes a difference.  Using \connect, I can reproduce
the underlying effect:

\connect template1 fw
CREATE DATABASE aaa;
\connect aaa fw;
CREATE TABLE a (a text);
\connect template1 fw
DROP DATABASE aaa;

sometimes results in:

You are now connected to database template1 as user fw.
CREATE DATABASE aaa;
CREATE DATABASE
You are now connected to database aaa as user fw.
CREATE TABLE a (a text);
CREATE
You are now connected to database template1 as user fw.
DROP DATABASE aaa;
psql:/tmp/t.sql:6: ERROR:  DROP DATABASE: database "aaa" is being accessed by other users

There was indeed a glitch in my test code, which didn't retry often
enough (or pause long enough), so the database wasn't created reliably.
I think I've misinterpreted the results.

The behavior shown above using psql is annoying, but has to be
expected.  The standard approach to connection termination in the
front end/back end protocol is asynchronous, so front ends cannot know
when the back end process has actually released all locks (or whatever
you want to call it) on the template1 database.

Hmm, I think I can live with some exponential backoff algorithm.
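
Roughly, such a backoff could look like the Python sketch below.  This
is only an illustration, not code from the test suite: the names
retry_with_backoff and is_database_busy, the delay values and the
attempt limit are all assumptions, and "operation" stands for whatever
callable issues the DROP DATABASE (or CREATE DATABASE) through the
client library.

```python
import time


def is_database_busy(exc):
    # Assumption: the client library surfaces the backend's error text,
    # as in the psql output shown above.
    return "is being accessed by other users" in str(exc)


def retry_with_backoff(operation, is_retryable, max_attempts=6,
                       initial_delay=0.05, factor=2.0, sleep=time.sleep):
    """Call operation() until it succeeds, doubling the pause after each
    retryable failure.  Re-raises the last error once max_attempts is
    reached, and re-raises non-retryable errors immediately."""
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:
            if attempt == max_attempts or not is_retryable(exc):
                raise
            sleep(delay)      # give the old backend time to exit
            delay *= factor   # wait longer before the next attempt
```

A caller would wrap the statement that can hit the lingering backend,
e.g. retry_with_backoff(drop_aaa, is_database_busy), where drop_aaa is
a hypothetical function that sends DROP DATABASE aaa.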

--
Florian Weimer                       Weimer@CERT.Uni-Stuttgart.DE
University of Stuttgart           http://CERT.Uni-Stuttgart.DE/people/fw/
RUS-CERT                          fax +49-711-685-5898

Re: CREATE DATABASE/DROP DATABASE race conditions

From
Tom Lane
Date:
Florian Weimer <Weimer@CERT.Uni-Stuttgart.DE> writes:
> \connect template1 fw
> CREATE DATABASE aaa;
> \connect aaa fw;
> CREATE TABLE a (a text);
> \connect template1 fw
> DROP DATABASE aaa;

> sometimes results in:

> You are now connected to database template1 as user fw.
> CREATE DATABASE aaa;
> CREATE DATABASE
> You are now connected to database aaa as user fw.
> CREATE TABLE a (a text);
> CREATE
> You are now connected to database template1 as user fw.
> DROP DATABASE aaa;
> psql:/tmp/t.sql:6: ERROR:  DROP DATABASE: database "aaa" is being accessed by other users

Well, this is not what you asserted to begin with.  The reason the above
fails is that it takes a nonzero amount of time for a backend to exit
after it detects client disconnect.  The "other user" being complained
of is simply your own old backend that had been used for the CREATE
TABLE command.  (It doesn't help any that psql doesn't drop the old
connection till it's successfully established a new one; so the normal
backend startup time doesn't offer any offsetting delay in this
scenario.)

It might be possible to tweak the FE/BE protocol to allow this to be
handled more carefully.  Right now the normal case is

    client sends X
    client closes connection

                    backend closes connection

                    backend cleans up

but we could do

    client sends X

    client waits to see EOF

                    backend cleans up

                    backend closes connection

    client closes connection

The backend code change would be trivial (instead of explicitly closing
the socket, just let it be closed by the kernel when the process exits).
The frontend change would be a little less trivial, and would be wanted
only by a few clients anyway.  Not sure how to handle that; maybe create
a variant version of PQfinish ...
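
To make the second sequence concrete, here is a rough Python sketch of
what the client side might do, using a plain socket rather than libpq.
Nothing here is an existing API: close_after_backend_exit and
fake_backend are hypothetical names, the terminate message is
simplified to the single byte 'X', and the timeout is an arbitrary
choice.

```python
import socket
import threading


def close_after_backend_exit(sock, timeout=5.0):
    """Send the terminate message, then wait for EOF from the peer
    before closing our own end of the connection."""
    sock.settimeout(timeout)
    try:
        sock.sendall(b"X")           # client sends X
        while sock.recv(4096):       # client waits to see EOF;
            pass                     # any trailing data is discarded
    finally:
        sock.close()                 # client closes connection


def fake_backend(sock, events):
    """Stand-in for a backend, for experimenting with the sketch only."""
    if sock.recv(1) == b"X":
        events.append("backend cleaned up")   # backend cleans up
    sock.close()                              # backend closes connection
```

With socket.socketpair() and fake_backend running in a thread, the
client call returns only after the stand-in backend has recorded its
cleanup, which is the ordering the second diagram asks for.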

            regards, tom lane