Discussion: vacuum analyze
Hi,

I've spent the last 4 days working my butt off trying to find the cause of the seemingly random vacuum analyze crash. Actually, I've just been trying to reproduce it, because as soon as I added -ggdb to the compile rules it stopped happening *grrr* (not that I'm surprised. It was random at best before, and things like this always hide when you try to look for them).

But after 4 days of frustration, I just want to be sure - nobody else has found the problem and solved it, have they? I just don't want to waste my time on this if someone else has found the cause...

Thanx

M Simms

Michael Simms <grim@argh.demon.co.uk> writes:
> But after 4 days of frustration, I just want to be sure - nobody else
> has found the problem and solved it have they? I just dont want to
> waste my time on this if someone else has found the cause...

Let's see ... I know that removing pg_vlock while vacuum is running will lead to a coredump after vacuum finishes (it doesn't recover cleanly after its attempt to unlink pg_vlock fails). I think I know how to fix that but it's not done yet. The same problem could affect any error that is detected between vacuum's internal transactions. Do you get any error reports in the postmaster log when there is a crash?

Beyond that, I don't recall having heard of any recent fixes that affect vacuum.

If you can create a reproducible example then more people could poke at it, so that seems like the avenue to focus on.

regards, tom lane

> Michael Simms <grim@argh.demon.co.uk> writes:
> > But after 4 days of frustration, I just want to be sure - nobody else
> > has found the problem and solved it have they? I just dont want to
> > waste my time on this if someone else has found the cause...
>
> Let's see ... I know that removing pg_vlock while vacuum is running
> will lead to a coredump after vacuum finishes (it doesn't recover
> cleanly after its attempt to unlink pg_vlock fails). I think I know
> how to fix that but it's not done yet. The same problem could affect
> any error that is detected between vacuum's internal transactions.
> Do you get any error reports in the postmaster log when there is a
> crash?

Ahem, well, to be honest, I've never found any documentation on how to read the logs *embarrassed smile*.

template1=> select * from pg_log;
ERROR: pg_log cannot be accessed by users

That happens with any account.

It COULD be a problem with that, as I have a crontab process that vacuums everything every 24 hours, but I also perform some minor vacuums in the meantime, some of which may occur while the main vacuum is running. I didn't notice that as a pattern, but it certainly COULD be that. I'll check into it.

> Beyond that, I don't recall having heard of any recent fixes that affect
> vacuum.
>
> If you can create a reproducible example then more people could poke
> at it, so that seems like the avenue to focus on.

Yup, well, if I could get it to happen *at all* any more, I could poke around, as I am running the backend that is handling the vacuum under gdb.

If I find a reproducible way I will certainly report it here.

Thanx

M Simms

Michael Simms <grim@argh.demon.co.uk> writes:
> ahem, well, to be honest, Ive never found any documentation on how to
> read the logs *embarrassed smile*.
> template1=> select * from pg_log;
> ERROR: pg_log cannot be accessed by users

No, no, not pg_log. I'm talking about the text file that you've directed the postmaster's stdout and stderr into. (You are doing that and not dropping it on the floor, I trust.)

> It COULD be a problem with that, as I have a crontab process that vacuums
> everything every 24 hours, but also I perform some minor vacuums in the
> meantime, some of which may occur when the main vacuum is happening.

pg_vlock exists specifically to prevent two concurrent vacuums. The scenario I was talking about involved removing it by hand, which you wouldn't do unless you were trying to provoke a vacuum error (or, perhaps, cleaning up after a previous vacuum run coredumped).

regards, tom lane
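
For readers following the pg_vlock discussion: the general idea of a lock file that keeps two runs from overlapping, and of coping when that file disappears mid-run, can be sketched as below. This is only an illustration in Python, not PostgreSQL's vacuum code. The names `LockHeld`, `acquire_lock`, `release_lock` and `run_exclusive` are hypothetical and invented here, and any path passed in is the caller's choice.

```python
import os
import tempfile


class LockHeld(Exception):
    """Raised when another run already holds the lock file (hypothetical name)."""


def acquire_lock(path):
    # O_CREAT | O_EXCL makes creation atomic: the call fails if the file exists,
    # so two concurrent runs cannot both believe they own the lock.
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except FileExistsError:
        raise LockHeld(path)
    os.close(fd)


def release_lock(path):
    # Report a missing lock file instead of raising, so that cleanup
    # still finishes in an orderly way if someone removed the file by hand.
    try:
        os.unlink(path)
        return True
    except FileNotFoundError:
        return False


def run_exclusive(path, job):
    # Assumption: `job` is any zero-argument callable standing in for the real work.
    acquire_lock(path)
    try:
        return job()
    finally:
        if not release_lock(path):
            print("warning: lock file vanished during the run:", path)
```

The design point is in `release_lock`: a failed unlink is treated as a condition to report, not as a fatal error, which is the kind of clean recovery the message above says was missing.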

Tom Lane wrote:
> Let's see ... I know that removing pg_vlock while vacuum is running
> will lead to a coredump after vacuum finishes (it doesn't recover
> cleanly after its attempt to unlink pg_vlock fails). I think I know
> how to fix that but it's not done yet. The same problem could affect
> any error that is detected between vacuum's internal transactions.
> Do you get any error reports in the postmaster log when there is a
> crash?
>
> Beyond that, I don't recall having heard of any recent fixes that affect
> vacuum.
>
> If you can create a reproducible example then more people could poke
> at it, so that seems like the avenue to focus on.
>
> regards, tom lane

Perhaps the bug I reported on pgsql-bugs about a week ago has some relation to this problem:

I was able to reproducibly (?) crash the postmaster with my example program (a loop of table updates) combined with several vacuum commands in a separate task. As the size of the table's index grows, a failure becomes almost certain.

If you think the program might help you, contact me or look in the pgsql-bugs archives.

Regards
Christof
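
A reproduction along the lines Christof describes (an update loop in one task, vacuum commands in another) might be structured like this. It is a sketch under assumptions, not his actual program: the table `stress_test`, its columns `id` and `val`, the function `stress`, and the `connect` factory are all hypothetical. `connect` has to be supplied by the reader and must return a function that executes one SQL statement on its own database connection.

```python
import threading


def stress(connect, updates=1000, vacuums=10, table="stress_test"):
    """Run an UPDATE loop and a VACUUM ANALYZE loop concurrently.

    Returns a list of (label, exception) pairs, one per task that failed.
    """
    errors = []

    def worker(label, statements):
        try:
            run_sql = connect()  # assumption: one connection per task
            for sql in statements:
                run_sql(sql)
        except Exception as exc:
            # A crashed backend would typically surface here as a connection error.
            errors.append((label, exc))

    # Hypothetical schema: an indexed integer id and a counter column val.
    update_sql = (
        f"UPDATE {table} SET val = val + 1 WHERE id = {i % 100}"
        for i in range(updates)
    )
    vacuum_sql = (f"VACUUM ANALYZE {table}" for _ in range(vacuums))

    threads = [
        threading.Thread(target=worker, args=("update", update_sql)),
        threading.Thread(target=worker, args=("vacuum", vacuum_sql)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors
```

Keeping the database access behind `connect` means the same loop can be pointed at any client library, and the returned error list gives something concrete to post alongside the postmaster log.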