Обсуждение: reindexdb hangs

Поиск
Список
Период
Сортировка

reindexdb hangs

От
"dx k9"
Дата:
Hi,

What would cause or what should I do with a table that hangs during its
reindexing?  I can see in the messages the last table to reindex, so
sequentially the next table must be the problem.
I have statement_timeout set for 6 hours, then I get "canceling statement
due to statement timeout"  Ok, that's fine, and I don't think setting the
statement_timeout to 0 will really help.  The question is why did it hang or
what are some reasons that would make a table hang a reindexing?   And is
there a solution?

At 1:45 AM the reindexing was going fine.  Then, it just hung up --  &
finally cancelled.

Thanks for any ideas.  My experience has been that this happens sometimes
then the next reindexdb -a , everything reindexes fine.

~DJK

_________________________________________________________________
Interest Rates NEAR 39yr LOWS!  $430,000 Mortgage for $1,299/mo - Calculate
new payment
http://www.lowermybills.com/lre/index.jsp?sourceid=lmb-9632-19132&moid=14888


Re: reindexdb hangs

От
Scott Marlowe
Дата:
On Wed, 2007-05-02 at 09:50, dx k9 wrote:
> Hi,
>
> What would cause or what should I do with a table that hangs during its
> reindexing?  I can see in the messages the last table to reindex, so
> sequentially the next table must be the problem.
> I have statement_timeout set for 6 hours, then I get "canceling statement
> due to statement timeout"  Ok, that's fine, and I don't think setting the
> statement_timeout to 0 will really help.  The question is why did it hang or
> what are some reasons that would make a table hang a reindexing?   And is
> there a solution?

The real question is did it hang?  If the machine was still working on
reindexing then it wasn't hung, it was just taking longer than you
wanted it to.  Which can also be a problem, but isn't the same thing as
hanging, and as it's a different problem, would have a different
solution than if it is hanging.

So, what does vmstat / top / iostat have to say about this "hang"?  Was
it really hanging, or just taking a long time?

Re: reindexdb hangs

От
"dx k9"
Дата:
Thanks for the response Scott.

It turns out it was a temporary database and temporary table, that just
wasn't there maybe it thought it was there from some type of snapshot then
the next minute it was gone.

Too bad we can't come up with a -e parameter for exclude say,
Something like this would be nice
reindexdb -a -e -p 4444 -e tempdb template1

The way it is now, I just have a 76 line script with each -d database .  I
also have to keep it up to date on my own, when they add a new or remove
some database in the cluster.

~DjK

_________________________________________________________________
Download Messenger. Join the i�m Initiative. Help make a difference today.
http://im.live.com/messenger/im/home/?source=TAGHM_APR07


Re: reindexdb hangs

От
Scott Marlowe
Дата:
On Wed, 2007-05-02 at 10:49, dx k9 wrote:
> Thanks for the response Scott.
>
> It turns out it was a temporary database and temporary table, that just
> wasn't there maybe it thought it was there from some type of snapshot then
> the next minute it was gone.
>
> Too bad we can't come up with a -e parameter for exclude say,
> Something like this would be nice
> reindexdb -a -e -p 4444 -e tempdb template1
>
> The way it is now, I just have a 76 line script with each -d database .  I
> also have to keep it up to date on my own, when they add a new or remove
> some database in the cluster.

Had a similar problem at my last job.  I wrote a bash script that used
psql to get a list of databases and update the file that listed all the
dbs to go through.  We also used that list to do backups of each db
individually.

Re: reindexdb hangs

От
Tom Lane
Дата:
"dx k9" <bitsandbytes88@hotmail.com> writes:
> [ stuck reindex ]
> It turns out it was a temporary database and temporary table, that just
> wasn't there maybe it thought it was there from some type of snapshot then
> the next minute it was gone.

Hmm, there is not any filter in ReindexDatabase() to exclude temp tables
of other backends, but it sure seems like there needs to be.  CLUSTER
might have the same issue.  I think we fixed this in VACUUM long ago,
but we need to check the other commands that grovel over all of a database.

            regards, tom lane

Re: reindexdb hangs

От
Alvaro Herrera
Дата:
Tom Lane wrote:
> "dx k9" <bitsandbytes88@hotmail.com> writes:
> > [ stuck reindex ]
> > It turns out it was a temporary database and temporary table, that just
> > wasn't there maybe it thought it was there from some type of snapshot then
> > the next minute it was gone.
>
> Hmm, there is not any filter in ReindexDatabase() to exclude temp tables
> of other backends, but it sure seems like there needs to be.  CLUSTER
> might have the same issue.  I think we fixed this in VACUUM long ago,
> but we need to check the other commands that grovel over all of a database.

Was this ever fixed?  I think it wasn't, because I don't see any check
in ReindexDatabase.  Here is a patch to add one.

I examined cluster.c and it does seem to be missing a check too.  I'm
not sure where to add one though; the best choice would be the place
where the list of rels is built, but that scans only pg_index, so it
doesn't have access to the namespace of each rel.  So one idea would be
to get the pg_class row for each candidate, but that seems slow.
Another idea would be to just add all the candidates and silently skip
the temp indexes in cluster_rel.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Вложения

Re: reindexdb hangs

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane wrote:
>> Hmm, there is not any filter in ReindexDatabase() to exclude temp tables
>> of other backends, but it sure seems like there needs to be.  CLUSTER
>> might have the same issue.  I think we fixed this in VACUUM long ago,
>> but we need to check the other commands that grovel over all of a database.

> Was this ever fixed?  I think it wasn't, because I don't see any check
> in ReindexDatabase.  Here is a patch to add one.

No, that's still on the todo list.  Your patch looks reasonable.

> I examined cluster.c and it does seem to be missing a check too.  I'm
> not sure where to add one though; the best choice would be the place
> where the list of rels is built, but that scans only pg_index, so it
> doesn't have access to the namespace of each rel.  So one idea would be
> to get the pg_class row for each candidate, but that seems slow.
> Another idea would be to just add all the candidates and silently skip
> the temp indexes in cluster_rel.

Yeah, an extra fetch of the pg_class row doesn't seem all that nice.
I think you'd want to check it in approximately the same two places
where pg_class_ownercheck() is applied (one for the 1-xact and one for
the multi-xact path).

Are there any other commands to be worried about?  I can't see any
besides VACUUM/ANALYZE, and those seem covered.

            regards, tom lane

Re: reindexdb hangs

От
Alvaro Herrera
Дата:
Tom Lane wrote:

> > I examined cluster.c and it does seem to be missing a check too.  I'm
> > not sure where to add one though; the best choice would be the place
> > where the list of rels is built, but that scans only pg_index, so it
> > doesn't have access to the namespace of each rel.  So one idea would be
> > to get the pg_class row for each candidate, but that seems slow.
> > Another idea would be to just add all the candidates and silently skip
> > the temp indexes in cluster_rel.
>
> Yeah, an extra fetch of the pg_class row doesn't seem all that nice.
> I think you'd want to check it in approximately the same two places
> where pg_class_ownercheck() is applied (one for the 1-xact and one for
> the multi-xact path).

Actually, the 1-xact path does not need it, because the check is already
elsewhere.  We only need logic enough to skip temp tables silently while
building the list in the multi-xact path.  What this means is that very
few people, if any, clusters temp tables; because as soon as you do, a
plain CLUSTER command in another session errors out with

alvherre=# cluster;
ERROR:  cannot cluster temporary tables of other sessions

So, patch attached.

> Are there any other commands to be worried about?  I can't see any
> besides VACUUM/ANALYZE, and those seem covered.

I can't think of any.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Вложения

Re: reindexdb hangs

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane wrote:
>> Yeah, an extra fetch of the pg_class row doesn't seem all that nice.
>> I think you'd want to check it in approximately the same two places
>> where pg_class_ownercheck() is applied (one for the 1-xact and one for
>> the multi-xact path).

> Actually, the 1-xact path does not need it, because the check is already
> elsewhere.

Yeah, but if you do it there it's one added comparison
(isOtherTempNamespace is very cheap, and you can get the namespace
cheaply from the already-open rel).  This way you need an extra syscache
lookup because you are insisting on doing the check in a place where you
don't have easy access to the pg_class row.  Doesn't seem better.

            regards, tom lane

Re: reindexdb hangs

От
Alvaro Herrera
Дата:
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Tom Lane wrote:
> >> Yeah, an extra fetch of the pg_class row doesn't seem all that nice.
> >> I think you'd want to check it in approximately the same two places
> >> where pg_class_ownercheck() is applied (one for the 1-xact and one for
> >> the multi-xact path).
>
> > Actually, the 1-xact path does not need it, because the check is already
> > elsewhere.
>
> Yeah, but if you do it there it's one added comparison
> (isOtherTempNamespace is very cheap, and you can get the namespace
> cheaply from the already-open rel).  This way you need an extra syscache
> lookup because you are insisting on doing the check in a place where you
> don't have easy access to the pg_class row.  Doesn't seem better.

I'm not sure I follow.  Are you suggesting adding a new function,
similar to pg_class_ownercheck, which additionally checks for temp-ness?

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: reindexdb hangs

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> I'm not sure I follow.  Are you suggesting adding a new function,
> similar to pg_class_ownercheck, which additionally checks for temp-ness?

No, I was just suggesting adding the check for temp-ness in cluster()
and cluster_rel() where we do pg_class_ownercheck.  We already have the
rel open there and so it's cheap to do the temp-ness check.

I guess it's a question of which path you are more concerned about
making cheap, of course.  Your proposal was to filter before putting the
rel into the list at all, which certainly saves cycles when we reject a
remote temp table; but it adds cycles for all other tables.  I argue
that the remote-temp-table case is uncommon (else we'd have had many
more trouble reports) and therefore optimizing it at the cost of the
normal case isn't a win.

            regards, tom lane

Re: reindexdb hangs

От
Alvaro Herrera
Дата:
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > I'm not sure I follow.  Are you suggesting adding a new function,
> > similar to pg_class_ownercheck, which additionally checks for temp-ness?
>
> No, I was just suggesting adding the check for temp-ness in cluster()
> and cluster_rel() where we do pg_class_ownercheck.  We already have the
> rel open there and so it's cheap to do the temp-ness check.

I applied a patch along these lines to HEAD and 8.2.

I am unsure if I should backpatch to 8.1: the code in cluster.c has
changed, and while it is relatively easy to modify the patch, this is a
rare bug and nobody has reported it in CLUSTER (not many people clusters
temp tables, it seems).  Should I patch only REINDEX?  How far back?

--
Alvaro Herrera                  http://www.amazon.com/gp/registry/5ZYLFMCVHXC
"The eagle never lost so much time, as
when he submitted to learn of the crow." (William Blake)

Re: reindexdb hangs

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> I am unsure if I should backpatch to 8.1: the code in cluster.c has
> changed, and while it is relatively easy to modify the patch, this is a
> rare bug and nobody has reported it in CLUSTER (not many people clusters
> temp tables, it seems).  Should I patch only REINDEX?  How far back?

I'd say go as far back as you can conveniently modify the patch for.
This is a potential data-loss bug (even if only for temporary data)
so we ought to take it seriously.

            regards, tom lane

Re: reindexdb hangs

От
Alvaro Herrera
Дата:
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > I am unsure if I should backpatch to 8.1: the code in cluster.c has
> > changed, and while it is relatively easy to modify the patch, this is a
> > rare bug and nobody has reported it in CLUSTER (not many people clusters
> > temp tables, it seems).  Should I patch only REINDEX?  How far back?
>
> I'd say go as far back as you can conveniently modify the patch for.
> This is a potential data-loss bug (even if only for temporary data)
> so we ought to take it seriously.

OK, I fixed it all the way back that it was needed: 7.4 for CLUSTER and
8.1 for REINDEX.

Before 7.4 there wasn't a database-wide version of CLUSTER.  This wasn't
a very serious bug in any case, because it would have thrown an ERROR if
it tried to cluster a remote temp table.  So apparently no one ever saw
it, because I can't remember any report about it.


For REINDEX the story is very different, because what happens is that
queries start silently returning wrong data.  My test case was

alvherre=# create temp table foo (a int primary key);
NOTICE:  CREATE TABLE / PRIMARY KEY creará el índice implícito «foo_pkey» para la tabla «foo»
CREATE TABLE
alvherre=# insert into foo select * from generate_series(1,40000);
INSERT 0 40000

-- "reindex database alvherre" in another session

alvherre=# select * from foo where a = 400;
  a
-----
(0 rows)

If you now REINDEX this table in the current session, it correctly
returns one tuple.  So if somebody is executing a procedure which
involve temp tables and someone else concurrently does a REINDEX
DATABASE, the queries of the first session are automatically corrupted.
It seems like the worst kind of bug.

Maybe we need additional protections on the bufmgr to prevent this kind
of problem.

We only introduced REINDEX DATABASE as a way to reindex user indexes in
8.1.  Before that, it only considered system catalogs, which ISTM are
never temp.

Many thanks to dx k9 for the original report.

--
Alvaro Herrera                               http://www.PlanetPostgreSQL.org/
"Endurecerse, pero jamás perder la ternura" (E. Guevara)