Re: ERROR: Cannot insert a duplicate key into unique index pg_class_relname_nsp_index

Поиск
Список
Период
Сортировка
От Kragen Sitaker
Тема Re: ERROR: Cannot insert a duplicate key into unique index pg_class_relname_nsp_index
Дата
Msg-id 20040109164039.A11165@fs.corp.airwave.com
обсуждение исходный текст
Ответ на Re: ERROR: Cannot insert a duplicate key into unique index pg_class_relname_nsp_index  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: ERROR: Cannot insert a duplicate key into unique index pg_class_relname_nsp_index  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-general
On Fri, Jan 09, 2004 at 06:19:00PM -0500, Tom Lane wrote:
> Kragen Sitaker <kragen+pgsql@airwave.com> writes:
> > ERROR:  Cannot insert a duplicate key into unique index pg_class_relname_nsp_index
> > We've been getting this error in our application every once in a while
> > --- typically once an hour to once a day, although it varies over time.
>
> This seems to me that it must indicate a collision on name+schema of the
> temp table.  Now that ought to be impossible :-(

Those were my first thoughts too :)

> --- you should get errors earlier than this if you were actually
> creating a duplicately named temp table, and the design for selecting
> nonconflicting temp schemas seems pretty bulletproof to me too.

Sure.  We thought maybe we had two instances of the daemons running at
once, but we tried that, and couldn't make the error happen every time.

It's worth mentioning that the daemon will often run for hours before
dying with this error.  Then, when it comes back up a few seconds later,
it's likely to fail again immediately, but it's even more likely to run
without a problem for hours more.

> > We started seeing this error on November 22, three days after we migrated
> > from Postgres 7.2.3 and 7.3.2 to 7.3.4.  We still see the error with
> > 7.3.5, but not with 7.4.0.
>
> Hmm.  I'm not aware of any 7.4 bug fix that would affect such a thing,
> so I wouldn't want to bet that 7.4 has really solved the issue.

I'm glad to know that.

> Digging in the 7.3.2-to-7.3.4 change logs, I see one potentially
> relevant change:
>
> 2003-02-06 20:33  tgl
>
>     * src/: backend/catalog/dependency.c, backend/catalog/namespace.c,
>     include/catalog/dependency.h (REL7_3_STABLE): Revise mechanism for
>     getting rid of temp tables at backend shutdown.  Instead of
>     grovelling through pg_class to find them, make use of the handy
>     dandy dependency mechanism: just delete everything that depends on
>     our temp schema.  Unlike the pg_class scan, the dependency
>     mechanism is smart enough to delete things in an order that doesn't
>     fall foul of any dependency restrictions.  Fixes problem reported
>     by David Heggie: a temp table with a serial column may cause a
>     backend FATAL exit at shutdown time, if it chances to try to delete
>     the temp sequence first.
>
> Now this change also exists in 7.4, but perhaps it is malfunctioning in
> 7.3.*.  Or maybe you just haven't stressed the 7.4 installation enough
> to reproduce the problem there --- what do you think are the odds of
> that?

It's possible.  We've re-downgraded that development machine to 7.3.4
to experiment with other ways of solving the problem, and it looks like
our nightly backup script didn't work last night, so I can't inspect
the state of the database that didn't manifest the problems with 7.4.
It's possible it might have had less stuff in it :(

We'll run the experiment again.  Should we try 7.3.3 too?

> Given that you're explicitly dropping the temp table before exit, it's
> not clear how a problem in this code could cause you grief anyway.

Well, it's possible the daemon could have gotten killed while it was
inside the transaction, followed shortly by a shutdown of postgres ---
a dozen times or more --- and during development, we frequently kill
the daemon so that it will restart with new code.  For our application,
we shut down and restart Postgres every night because it seems to make
VACUUM FULL work better.

> But it's the only starting point I can see.  You might try adding some
> monitoring code to see if you can detect temp tables being left around
> by exited backends.

Something like this?
foobar=> select count(*), pg_class.relnamespace group by relnamespace;
 count | relnamespace
-------+--------------
   106 |           11
    70 |           99
   147 |         2200
(3 rows)
foobar=> select oid, * from pg_namespace;
   oid   |  nspname   | nspowner | nspacl
---------+------------+----------+--------
      11 | pg_catalog |        1 | {=U}
      99 | pg_toast   |        1 | {=}
    2200 | public     |        1 | {=UC}
   16765 | pg_temp_1  |        1 |
   17593 | pg_temp_2  |        1 |
   17647 | pg_temp_15 |        1 |
   20278 | pg_temp_16 |        1 |
 1570284 | pg_temp_32 |        1 |
(8 rows)

I wonder why those old namespaces are left around?  A new one shows up
whenever I kill and restart the daemon that creates the temporary tables.

We could run this code periodically to see when new namespaces pop up.

> > Creating and destroying a temporary table with a lot of attributes every
> > second is causing other problems, too; the indices on the pg_attribute
> > table grow to hundreds of megabytes in size,
>
> Right.  7.4 should fix that though.

Great!

> > and for some reason,
> > reindexing our system tables corrupts our database.
>
> That seems suspicious as well.  What happens exactly?  How did you get
> out of it??

I don't remember what happens exactly.  One of us will try this again
this weekend or early next week to get more details.  All my coworkers
can remember is that PostgreSQL complained about "something about
the heap".

We never deployed that code to any customer sites, so we recovered from
it by wiping the data directory on the development machines that we had
that problem on.

> > Do you have any suggestions for how to diagnose this unpredictable
> > performance?
>
> I think you have more urgent things to worry about.  Like finding why it
> doesn't work reliably.

Well, if we didn't have the unpredictable performance, we wouldn't have
been creating the temporary table in the first place, which is the only
thing that exposed this problem for us.  PostgreSQL otherwise works great,
rock-solidly reliably; we've lost data to disk failure, flaky hardware,
filesystem corruption due to power failure on ext2fs, and human error,
in hundreds of thousands of hours of production operation, but never
yet to Postgres.

BTW, we're using the 7.3.4 PGDG RPMs with an extra patch to add
pg_autovacuum.

Thank you very much.

-Kragen
--
Very clever, young man.  But if you use turtles for RPC, you'll have a
very, very slow RPC system.  -- Jacqueline Arnold

В списке pgsql-general по дате отправления:

Предыдущее
От: "Nigel J. Andrews"
Дата:
Сообщение: Re: no space left on device
Следующее
От: Kragen Sitaker
Дата:
Сообщение: Re: ERROR: Cannot insert a duplicate key into unique index pg_class_relname_nsp_index