Обсуждение: reindexdb hangs
Hi, What would cause or what should I do with a table that hangs during its reindexing? I can see in the messages the last table to reindex, so sequentially the next table must be the problem. I have statement_timeout set for 6 hours, then I get "canceling statement due to statement timeout" Ok, that's fine, and I don't think setting the statement_timeout to 0 will really help. The question is why did it hang or what are some reasons that would make a table hang a reindexing? And is there a solution? At 1:45 AM the reindexing was going fine. Then, it just hung up -- & finally cancelled. Thanks for any ideas. My experience has been that this happens sometimes then the next reindexdb -a , everything reindexes fine. ~DJK _________________________________________________________________ Interest Rates NEAR 39yr LOWS! $430,000 Mortgage for $1,299/mo - Calculate new payment http://www.lowermybills.com/lre/index.jsp?sourceid=lmb-9632-19132&moid=14888
On Wed, 2007-05-02 at 09:50, dx k9 wrote: > Hi, > > What would cause or what should I do with a table that hangs during its > reindexing? I can see in the messages the last table to reindex, so > sequentially the next table must be the problem. > I have statement_timeout set for 6 hours, then I get "canceling statement > due to statement timeout" Ok, that's fine, and I don't think setting the > statement_timeout to 0 will really help. The question is why did it hang or > what are some reasons that would make a table hang a reindexing? And is > there a solution? The real question is did it hang? If the machine was still working on reindexing then it wasn't hung, it was just taking longer than you wanted it to. Which can also be a problem, but isn't the same thing as hanging, and as it's a different problem, would have a different solution than if it is hanging. So, what does vmstat / top / iostat have to say about this "hang"? Was it really hanging, or just taking a long time?
Thanks for the response Scott. It turns out it was a temporary database and temporary table, that just wasn't there maybe it thought it was there from some type of snapshot then the next minute it was gone. Too bad we can't come up with a -e parameter for exclude say, Something like this would be nice reindexdb -a -e -p 4444 -e tempdb template1 The way it is now, I just have a 76 line script with each -d database . I also have to keep it up to date on my own, when they add a new or remove some database in the cluster. ~DjK _________________________________________________________________ Download Messenger. Join the i�m Initiative. Help make a difference today. http://im.live.com/messenger/im/home/?source=TAGHM_APR07
On Wed, 2007-05-02 at 10:49, dx k9 wrote: > Thanks for the response Scott. > > It turns out it was a temporary database and temporary table, that just > wasn't there maybe it thought it was there from some type of snapshot then > the next minute it was gone. > > Too bad we can't come up with a -e parameter for exclude say, > Something like this would be nice > reindexdb -a -e -p 4444 -e tempdb template1 > > The way it is now, I just have a 76 line script with each -d database . I > also have to keep it up to date on my own, when they add a new or remove > some database in the cluster. Had a similar problem at my last job. I wrote a bash script that used psql to get a list of databases and update the file that listed all the dbs to go through. We also used that list to do backups of each db individually.
"dx k9" <bitsandbytes88@hotmail.com> writes: > [ stuck reindex ] > It turns out it was a temporary database and temporary table, that just > wasn't there maybe it thought it was there from some type of snapshot then > the next minute it was gone. Hmm, there is not any filter in ReindexDatabase() to exclude temp tables of other backends, but it sure seems like there needs to be. CLUSTER might have the same issue. I think we fixed this in VACUUM long ago, but we need to check the other commands that grovel over all of a database. regards, tom lane
Tom Lane wrote: > "dx k9" <bitsandbytes88@hotmail.com> writes: > > [ stuck reindex ] > > It turns out it was a temporary database and temporary table, that just > > wasn't there maybe it thought it was there from some type of snapshot then > > the next minute it was gone. > > Hmm, there is not any filter in ReindexDatabase() to exclude temp tables > of other backends, but it sure seems like there needs to be. CLUSTER > might have the same issue. I think we fixed this in VACUUM long ago, > but we need to check the other commands that grovel over all of a database. Was this ever fixed? I think it wasn't, because I don't see any check in ReindexDatabase. Here is a patch to add one. I examined cluster.c and it does seem to be missing a check too. I'm not sure where to add one though; the best choice would be the place where the list of rels is built, but that scans only pg_index, so it doesn't have access to the namespace of each rel. So one idea would be to get the pg_class row for each candidate, but that seems slow. Another idea would be to just add all the candidates and silently skip the temp indexes in cluster_rel. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Вложения
Alvaro Herrera <alvherre@commandprompt.com> writes: > Tom Lane wrote: >> Hmm, there is not any filter in ReindexDatabase() to exclude temp tables >> of other backends, but it sure seems like there needs to be. CLUSTER >> might have the same issue. I think we fixed this in VACUUM long ago, >> but we need to check the other commands that grovel over all of a database. > Was this ever fixed? I think it wasn't, because I don't see any check > in ReindexDatabase. Here is a patch to add one. No, that's still on the todo list. Your patch looks reasonable. > I examined cluster.c and it does seem to be missing a check too. I'm > not sure where to add one though; the best choice would be the place > where the list of rels is built, but that scans only pg_index, so it > doesn't have access to the namespace of each rel. So one idea would be > to get the pg_class row for each candidate, but that seems slow. > Another idea would be to just add all the candidates and silently skip > the temp indexes in cluster_rel. Yeah, an extra fetch of the pg_class row doesn't seem all that nice. I think you'd want to check it in approximately the same two places where pg_class_ownercheck() is applied (one for the 1-xact and one for the multi-xact path). Are there any other commands to be worried about? I can't see any besides VACUUM/ANALYZE, and those seem covered. regards, tom lane
Tom Lane wrote: > > I examined cluster.c and it does seem to be missing a check too. I'm > > not sure where to add one though; the best choice would be the place > > where the list of rels is built, but that scans only pg_index, so it > > doesn't have access to the namespace of each rel. So one idea would be > > to get the pg_class row for each candidate, but that seems slow. > > Another idea would be to just add all the candidates and silently skip > > the temp indexes in cluster_rel. > > Yeah, an extra fetch of the pg_class row doesn't seem all that nice. > I think you'd want to check it in approximately the same two places > where pg_class_ownercheck() is applied (one for the 1-xact and one for > the multi-xact path). Actually, the 1-xact path does not need it, because the check is already elsewhere. We only need logic enough to skip temp tables silently while building the list in the multi-xact path. What this means is that very few people, if any, clusters temp tables; because as soon as you do, a plain CLUSTER command in another session errors out with alvherre=# cluster; ERROR: cannot cluster temporary tables of other sessions So, patch attached. > Are there any other commands to be worried about? I can't see any > besides VACUUM/ANALYZE, and those seem covered. I can't think of any. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Вложения
Alvaro Herrera <alvherre@commandprompt.com> writes: > Tom Lane wrote: >> Yeah, an extra fetch of the pg_class row doesn't seem all that nice. >> I think you'd want to check it in approximately the same two places >> where pg_class_ownercheck() is applied (one for the 1-xact and one for >> the multi-xact path). > Actually, the 1-xact path does not need it, because the check is already > elsewhere. Yeah, but if you do it there it's one added comparison (isOtherTempNamespace is very cheap, and you can get the namespace cheaply from the already-open rel). This way you need an extra syscache lookup because you are insisting on doing the check in a place where you don't have easy access to the pg_class row. Doesn't seem better. regards, tom lane
Tom Lane wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: > > Tom Lane wrote: > >> Yeah, an extra fetch of the pg_class row doesn't seem all that nice. > >> I think you'd want to check it in approximately the same two places > >> where pg_class_ownercheck() is applied (one for the 1-xact and one for > >> the multi-xact path). > > > Actually, the 1-xact path does not need it, because the check is already > > elsewhere. > > Yeah, but if you do it there it's one added comparison > (isOtherTempNamespace is very cheap, and you can get the namespace > cheaply from the already-open rel). This way you need an extra syscache > lookup because you are insisting on doing the check in a place where you > don't have easy access to the pg_class row. Doesn't seem better. I'm not sure I follow. Are you suggesting adding a new function, similar to pg_class_ownercheck, which additionally checks for temp-ness? -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes: > I'm not sure I follow. Are you suggesting adding a new function, > similar to pg_class_ownercheck, which additionally checks for temp-ness? No, I was just suggesting adding the check for temp-ness in cluster() and cluster_rel() where we do pg_class_ownercheck. We already have the rel open there and so it's cheap to do the temp-ness check. I guess it's a question of which path you are more concerned about making cheap, of course. Your proposal was to filter before putting the rel into the list at all, which certainly saves cycles when we reject a remote temp table; but it adds cycles for all other tables. I argue that the remote-temp-table case is uncommon (else we'd have had many more trouble reports) and therefore optimizing it at the cost of the normal case isn't a win. regards, tom lane
Tom Lane wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: > > I'm not sure I follow. Are you suggesting adding a new function, > > similar to pg_class_ownercheck, which additionally checks for temp-ness? > > No, I was just suggesting adding the check for temp-ness in cluster() > and cluster_rel() where we do pg_class_ownercheck. We already have the > rel open there and so it's cheap to do the temp-ness check. I applied a patch along these lines to HEAD and 8.2. I am unsure if I should backpatch to 8.1: the code in cluster.c has changed, and while it is relatively easy to modify the patch, this is a rare bug and nobody has reported it in CLUSTER (not many people clusters temp tables, it seems). Should I patch only REINDEX? How far back? -- Alvaro Herrera http://www.amazon.com/gp/registry/5ZYLFMCVHXC "The eagle never lost so much time, as when he submitted to learn of the crow." (William Blake)
Alvaro Herrera <alvherre@commandprompt.com> writes: > I am unsure if I should backpatch to 8.1: the code in cluster.c has > changed, and while it is relatively easy to modify the patch, this is a > rare bug and nobody has reported it in CLUSTER (not many people clusters > temp tables, it seems). Should I patch only REINDEX? How far back? I'd say go as far back as you can conveniently modify the patch for. This is a potential data-loss bug (even if only for temporary data) so we ought to take it seriously. regards, tom lane
Tom Lane wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: > > I am unsure if I should backpatch to 8.1: the code in cluster.c has > > changed, and while it is relatively easy to modify the patch, this is a > > rare bug and nobody has reported it in CLUSTER (not many people clusters > > temp tables, it seems). Should I patch only REINDEX? How far back? > > I'd say go as far back as you can conveniently modify the patch for. > This is a potential data-loss bug (even if only for temporary data) > so we ought to take it seriously. OK, I fixed it all the way back that it was needed: 7.4 for CLUSTER and 8.1 for REINDEX. Before 7.4 there wasn't a database-wide version of CLUSTER. This wasn't a very serious bug in any case, because it would have thrown an ERROR if it tried to cluster a remote temp table. So apparently no one ever saw it, because I can't remember any report about it. For REINDEX the story is very different, because what happens is that queries start silently returning wrong data. My test case was alvherre=# create temp table foo (a int primary key); NOTICE: CREATE TABLE / PRIMARY KEY creará el índice implícito «foo_pkey» para la tabla «foo» CREATE TABLE alvherre=# insert into foo select * from generate_series(1,40000); INSERT 0 40000 -- "reindex database alvherre" in another session alvherre=# select * from foo where a = 400; a ----- (0 rows) If you now REINDEX this table in the current session, it correctly returns one tuple. So if somebody is executing a procedure which involve temp tables and someone else concurrently does a REINDEX DATABASE, the queries of the first session are automatically corrupted. It seems like the worst kind of bug. Maybe we need additional protections on the bufmgr to prevent this kind of problem. We only introduced REINDEX DATABASE as a way to reindex user indexes in 8.1. Before that, it only considered system catalogs, which ISTM are never temp. Many thanks to dx k9 for the original report. -- Alvaro Herrera http://www.PlanetPostgreSQL.org/ "Endurecerse, pero jamás perder la ternura" (E. Guevara)