Thread: Isolation tests still falling over routinely

Isolation tests still falling over routinely

From
Tom Lane
Date:
The buildfarm is still showing isolation test failures more days than
not, eg
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pika&dt=2011-09-17%2012%3A43%3A11
and I've personally seen such failures when testing with
CLOBBER_CACHE_ALWAYS.  Could we please fix those tests to not have such
fragile timing assumptions?
        regards, tom lane


Re: Isolation tests still falling over routinely

From
"Kevin Grittner"
Date:
Tom Lane  wrote:
> The buildfarm is still showing isolation test failures more days
> than not, eg
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pika&dt=2011-09-17%2012%3A43%3A11
> and I've personally seen such failures when testing with
> CLOBBER_CACHE_ALWAYS. Could we please fix those tests to not have
> such fragile timing assumptions?
I went back over two months, and only found one failure related to an
SSI test, and that was because the machine ran out of disk space. 
There should never be any timing-related failures on the SSI tests,
as there is no blocking or deadlocking.
If you have seen any failures on isolation tests other than the fk-*
tests, I'd be very interested in details.
The rest are not related to SSI but test deadlock conditions related
to foreign keys.  I had nothing to do with these beyond providing
alternate result files for REPEATABLE READ and SERIALIZABLE
isolation levels.  (I test the installcheck-world target and the
isolation tests in those modes frequently, and the fk-deadlock tests
were failing every time at those levels.)
If I remember right, Alvaro chose these timings to balance run time
against chance of failure.  Unless we want to remove these deadlock
handling tests or ignore failures (which both seem like bad ideas to
me), I think we need to bump the long timings by an order of
magnitude and just concede that those tests run for a while.
-Kevin


Re: Isolation tests still falling over routinely

From
Alvaro Herrera
Date:
Excerpts from Kevin Grittner's message of Tue Sep 20 22:51:39 -0300 2011:

> If I remember right, Alvaro chose these timings to balance run time
> against chance of failure.  Unless we want to remove these deadlock
> handling tests or ignore failures (which both seem like bad ideas to
> me), I think we need to bump the long timings by an order of
> magnitude and just concede that those tests run for a while.

The main problem I have is that I haven't found a way to reproduce the
problems on my machine.  I was playing with modifying the way the error
messages are reported, but that ended up unfinished in a local branch.

I'll give it a go once more and see if I can commit so that buildfarm
tells us if it works or not.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Isolation tests still falling over routinely

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> The main problem I have is that I haven't found a way to reproduce the
> problems on my machine.

Try -DCLOBBER_CACHE_ALWAYS.
        regards, tom lane


Re: Isolation tests still falling over routinely

From
Alvaro Herrera
Date:
Excerpts from Tom Lane's message of Tue Sep 20 21:30:42 -0300 2011:
> 
> The buildfarm is still showing isolation test failures more days than
> not, eg
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pika&dt=2011-09-17%2012%3A43%3A11
> and I've personally seen such failures when testing with
> CLOBBER_CACHE_ALWAYS.  Could we please fix those tests to not have such
> fragile timing assumptions?

The fix has now been installed for two weeks and no new failure has
occurred.  The only failure in the IsolationCheck phase since then was
caused by a disk filling up (and it wasn't in the fk-* tests anyway).
I think we can consider this issue fixed.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Isolation tests still falling over routinely

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Excerpts from Tom Lane's message of Tue Sep 20 21:30:42 -0300 2011:
>> Could we please fix those tests to not have such
>> fragile timing assumptions?

> The fix has now been installed for two weeks and no new failure has
> occurred.  The only failure in the IsolationCheck phase since then was
> caused by a disk filling up (and it wasn't in the fk-* tests anyway).
> I think we can consider this issue fixed.

Yeah, it looks good.  Thanks!
        regards, tom lane