Обсуждение: New regression driver
I't committed.
There is a new shell script run_check.sh in the regression
test directory. It is driven by the conficuration file
./sql/run_check.tests and runs most of our tests parallel. It
is invoked with the new GNUmakefile target 'runcheck'.
The regress.sh is using the new tests file too by extracting
the tests to run via awk, so ./sql/tests is obsolete now and
subject to be removed soon.
Bruce Momjian wrote:
> Any modifications to shared pg_ tables would be a problem. Also, pg_log
> and pg_variable locking is not happening in there either, is it?
Thus, it does a complete independant database installation
below the regression test, starting it's own postmaster (and
terminating it at the end, of course). The entire test suite
can be run without even shutting down the currently installed
database.
So a
...src > ./configure
...src > make
...src > cd test/regression
...src/test/regression > make clean all runcheck
sequence will compile and temporarily install the new build
under the regression path, and then run all the tests against
it.
I think if my new test driver has settled, we should change
the GNUmakefile to just print some messages if 'make runtest'
is typed. The current runtest target should IMHO still be
availabe under another name, to test the real life
installation created by 'make install'.
Alternatively (IMHO better) some parameter to run_check.sh
could tell if it should create it's own, temporary
installation, or if it should use the existing installed
database system and the already running postmaster.
Tom Lane wrote:
> In other words, you've already exposed a bug! Right on!
Absolutely right and I've commented out that code for now.
It is in utils/cache/catcache.c line 996. The comments say
that the code should prevent the backend from entering
infinite recursion while loading new cache entries. But the
flag used for it seems to live in shared memory, thus it is
affected by other backends too. If the flag is true doesn't
tell if a backend set it itself, or if another one did. If we
really need this check, it must be implemented in another
way.
Another bug I discoverd is that Vadims WAL code allways looks
for the pg_control file in the PGDATA or compiled in default
directory, ignoring the -D switch. Haven't fixed it up to
now, but run_check.sh uses PGDATA, so it's safe at the
moment.
I ran the old regress.sh using the default installation
parallel to the run_check.sh using it's own installation and
postmaster.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#========================================= wieck@debis.com (Jan Wieck) #
wieck@debis.com (Jan Wieck) writes:
> There is a new shell script run_check.sh in the regression
> test directory. It is driven by the conficuration file
> ./sql/run_check.tests and runs most of our tests parallel. It
> is invoked with the new GNUmakefile target 'runcheck'.
This is way cool. I had to fix a couple of silly little portability
problems, but I like it.
> I think if my new test driver has settled, we should change
> the GNUmakefile to just print some messages if 'make runtest'
> is typed. The current runtest target should IMHO still be
> availabe under another name, to test the real life
> installation created by 'make install'.
> Alternatively (IMHO better) some parameter to run_check.sh
> could tell if it should create it's own, temporary
> installation, or if it should use the existing installed
> database system and the already running postmaster.
We should leave the old driver available, so that if an unexpected
problem arises one can easily check to see if it's being triggered by
concurrent execution or not. Or, run_check could have a parameter to
force serialized execution, if you would rather have just one script.
In that case we could toss the old runtest and rename run_check to
runtest. (If we do keep both scripts, can we pick more helpful names
than "runtest" and "run_check"? The difference is not immediately
obvious...)
I agree that run_check needs to be able to test a normal installation
as well as a temporary one.
> Absolutely right and I've commented out that code for now.
> It is in utils/cache/catcache.c line 996. The comments say
> that the code should prevent the backend from entering
> infinite recursion while loading new cache entries. But the
> flag used for it seems to live in shared memory, thus it is
> affected by other backends too. If the flag is true doesn't
> tell if a backend set it itself, or if another one did. If we
> really need this check, it must be implemented in another
> way.
I will look at this. I don't think that the catcaches live in
shared memory, so the problem is probably not what you suggest.
The fact that the behavior is different under load may point to a
real problem, not just an insufficiently clever debugging check.
> I ran the old regress.sh using the default installation
> parallel to the run_check.sh using it's own installation and
> postmaster.
They both give the same results on my platform, too.
regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes:
> wieck@debis.com (Jan Wieck) writes:
>> It is in utils/cache/catcache.c line 996. The comments say
>> that the code should prevent the backend from entering
>> infinite recursion while loading new cache entries.
> I will look at this. I don't think that the catcaches live in
> shared memory, so the problem is probably not what you suggest.
> The fact that the behavior is different under load may point to a
> real problem, not just an insufficiently clever debugging check.
Indeed, this is a real bug, and commenting out the code that caught
it is not the right fix!
What is happening is that utils/inval.c is trying to initialize some
variables that contain OIDs of system relations. This means calling
the catcache routines in order to look up relation names in pg_class.
However, if a shared cache inval message arrives from another backend
while that's happening, we recursively invoke inval.c to deal with the
message. And inval.c sees that its OID variables aren't initialized
yet, so it recursively calls the catcache routines to try to get them
initialized. Or, if just the first one's been initialized so far,
ValidateHacks() assumes they're all valid, and you can end up at the
elog(FATAL) panic at the bottom of CacheIdInvalidate(). I've got a core
dump which contains a ten-deep recursion between inval.c and syscache.c,
culminating in elog(FATAL) because the eleventh incoming sinval message
was just slow enough to let inval.c's first OID variable get filled in
before it arrived.
In short: we don't deal very robustly with cache invals happening
during backend startup. Send invals at a new backend with just the
right timing, and it'll choke.
I am not sure if this bug is of long standing or if we introduced it
since 6.5. It's possible I created it while messing with the relcache
stuff a month or two ago. But I can easily believe that it's been
there a long time and we never had a way of reproducing the problem
with any reliability before.
I think the fix is to rip out inval.c's attempt to look up system
relation names, and just give it hardwired knowledge of their OIDs.
Even though it sort-of works to do the lookups, it's bad practice for
routines that are potentially called during catcache initialization
to depend on the catcache to be already working. And there are other
places that already have hardwired knowledge of the system relation
OIDs, so...
regards, tom lane
> I think the fix is to rip out inval.c's attempt to look up system > relation names, and just give it hardwired knowledge of their OIDs. > Even though it sort-of works to do the lookups, it's bad practice for > routines that are potentially called during catcache initialization > to depend on the catcache to be already working. And there are other > places that already have hardwired knowledge of the system relation > OIDs, so... FYI, I am in the process of coding all cache miss lookups to use new system indexes. I have also added code to SearchSelfReferences() because pg_operator has some fancy depency on its lookup using an index, and has to have certain lookup happen with an sequential and not an index scan. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> ... I have also added code to SearchSelfReferences()
> because pg_operator has some fancy depency on its lookup using an index,
> and has to have certain lookup happen with an sequential and not an
> index scan.
Say what? That's got to be a symptom of a bug somewhere. Maybe
pg_operator needs some CommandCounterIncrement calls so that the
tuples it inserts become visible earlier? What are you seeing exactly?
For that matter, SearchSelfReferences looks like one giant kluge to me.
Who added this, and why, and what's the logic? (Undocumented kluges
are very high on my hate list.)
regards, tom lane
> -----Original Message----- > From: owner-pgsql-hackers@postgreSQL.org > [mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Tom Lane > Sent: Sunday, November 21, 1999 11:12 AM > To: Bruce Momjian > Cc: PostgreSQL HACKERS > Subject: Re: [HACKERS] New regression driver > > > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > ... I have also added code to SearchSelfReferences() > > because pg_operator has some fancy depency on its lookup using an index, > > and has to have certain lookup happen with an sequential and not an > > index scan. > > Say what? That's got to be a symptom of a bug somewhere. Maybe > pg_operator needs some CommandCounterIncrement calls so that the > tuples it inserts become visible earlier? What are you seeing exactly? > > For that matter, SearchSelfReferences looks like one giant kluge to me. > Who added this, and why, and what's the logic? (Undocumented kluges > are very high on my hate list.) > It's me who added the function. I left it undocumented,sorry. Bruce,could you add an document on it ? Bruce added a new index to pg_index. Index scan needs an information of pg_index. If we use the new index,we needs the information about the index in pg_index. Doesn't this cause a real cycle ? I added the function in order to hold one tuple which causes a real cycle. The tuple in pg_index should be scanned sequentially. I don't think it's the best solution. Please change it if there's a better way. Regards. Hiroshi Inoue Inoue@tpf.co.jp
> It's me who added the function. > I left it undocumented,sorry. > Bruce,could you add an document on it ? Done. Will commit soon. > > Bruce added a new index to pg_index. > Index scan needs an information of pg_index. > If we use the new index,we needs the information about the index > in pg_index. > Doesn't this cause a real cycle ? Yes. I am using it for a pg_operator index too. > > I added the function in order to hold one tuple which causes a real > cycle. The tuple in pg_index should be scanned sequentially. > > I don't think it's the best solution. > Please change it if there's a better way. I talked to Tom, and we think it is a good solution. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026