>>> On 16.06.2006 at 23:21:21, in message
<200606162121.k5GLLLw13054@candle.pha.pa.us>, Bruce Momjian
<pgman@candle.pha.pa.us> wrote:
> Yeah. Were you using WAL archiving? We will have a fix in 8.1.5 to
> prevent multiple archivers from starting. Perhaps that was a cause.
>
Not at the time, no. The rename in question was just a regular WAL
segment rename.
> Yes, I just reread that thread. I also am confused where to go from
> here.
>
Yeah, it's unfortunate that our best theory (a _commit on a deleted
file) just doesn't seem to be supported by the evidence. Although the
servers that see a heavy SELECT load now run Linux, we still have a
couple of Windows servers receiving the normal replication traffic. We
still get regular fsync errors after the scheduled CLUSTERs, so if you do
find a fix (or come up with a new theory), there's a test bed there (at
least for now).
> Were you the only one using Win32 under heavy load? You were on Win2003.
> Were there some bugs in the OS that got fixed later?
...
> Yep. What has me baffled is why no one else is seeing the problem.
> We had a rash of reports, and now all is quiet.
>
We might also be somewhat more susceptible than most. Because of the way
our middle tier parcels out queries, some connections might sit idle for
a long time. Per Tom's explanation in the original thread, this is an
important factor. Ultimately, if a concurrent rename isn't possible on
Windows (and that looks likely), it's going to be a problem as things
stand now.
Pete