Thread: Cache-flush stress testing

Cache-flush stress testing

From
Tom Lane
Date:
I've completed a round of stress testing the system for vulnerabilities
to unexpected cache flush events (relcache, catcache, or typcache
entries disappearing while in use).  I'm pleased to report that the 8.1
branch now passes all available regression tests (main, contrib, pl)
with CLOBBER_CACHE_ALWAYS defined as per the attached patch.
I have not had the patience to run a full regression cycle with
CLOBBER_CACHE_RECURSIVELY (I estimate that would take over a week on the
fastest machine I have) but I have gotten through the first dozen or so
tests, and I doubt that completing the full set would find anything not
found by CLOBBER_CACHE_ALWAYS.

HEAD is still broken pending resolution of the lookup_rowtype_tupdesc()
business.  8.0 should be OK but I haven't actually tested it.

I'm still bothered by the likelihood that there are cache-flush bugs in
code paths that are not exercised by the regression tests.  The
CLOBBER_CACHE patch is far too slow to consider enabling on any regular
basis, but it seems that throwing in cache flushes at random intervals,
as in the test program I posted here:
http://archives.postgresql.org/pgsql-hackers/2006-01/msg00244.php
doesn't provide very good test coverage.  Has anyone got any ideas about
better ways to locate such bugs?
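
For illustration, random-interval flush injection can be sketched roughly as below. This is not the test program posted at the URL above; the names next_random, should_flush and count_injected_flushes, and the percent parameter, are hypothetical and exist only in this sketch. It shows that each flush opportunity is hit independently with a small probability, so a hazard that needs a flush at one particular point is exercised only rarely, which is one plausible reason the coverage is poor.

```c
#include <stdbool.h>
#include <stdint.h>

/* Small deterministic PRNG (xorshift32) so a run can be reproduced
 * from its seed.  Hypothetical helper, not PostgreSQL code. */
static uint32_t
next_random(uint32_t *state)
{
    uint32_t x = *state;

    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Decide whether to inject a cache flush at this opportunity.
 * "percent" is the assumed injection probability, 0..100. */
bool
should_flush(uint32_t *state, unsigned percent)
{
    return (next_random(state) % 100) < percent;
}

/* Count how many of n_opportunities would receive an injected flush. */
unsigned
count_injected_flushes(uint32_t seed, unsigned percent,
                       unsigned n_opportunities)
{
    uint32_t state = seed ? seed : 1;   /* xorshift state must be nonzero */
    unsigned flushes = 0;

    for (unsigned i = 0; i < n_opportunities; i++)
    {
        if (should_flush(&state, percent))
            flushes++;
    }
    return flushes;
}
```

In a real backend the call to should_flush() would sit wherever a flush could legitimately occur and would trigger the flush itself; here it only counts, so the sketch stays self-contained.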

        regards, tom lane


Index: inval.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/cache/inval.c,v
retrieving revision 1.74
diff -c -r1.74 inval.c
*** inval.c    22 Nov 2005 18:17:24 -0000    1.74
--- inval.c    19 Jan 2006 21:47:07 -0000
***************
*** 625,630 ****
--- 625,660 ----
  {
      ReceiveSharedInvalidMessages(LocalExecuteInvalidationMessage,
                                   InvalidateSystemCaches);
+ 
+     /*
+      * Test code to force cache flushes anytime a flush could happen.
+      *
+      * If used with CLOBBER_FREED_MEMORY, CLOBBER_CACHE_ALWAYS provides a
+      * fairly thorough test that the system contains no cache-flush hazards.
+      * However, it also makes the system unbelievably slow --- the regression
+      * tests take about 100 times longer than normal.
+      *
+      * If you're a glutton for punishment, try CLOBBER_CACHE_RECURSIVELY.
+      * This slows things by at least a factor of 10000, so I wouldn't suggest
+      * trying to run the entire regression tests that way.  It's useful to
+      * try a few simple tests, to make sure that cache reload isn't subject
+      * to internal cache-flush hazards, but after you've done a few thousand
+      * recursive reloads it's unlikely you'll learn more.
+      */
+ #if defined(CLOBBER_CACHE_ALWAYS)
+     {
+         static bool in_recursion = false;
+ 
+         if (!in_recursion)
+         {
+             in_recursion = true;
+             InvalidateSystemCaches();
+             in_recursion = false;
+         }
+     }
+ #elif defined(CLOBBER_CACHE_RECURSIVELY)
+     InvalidateSystemCaches();
+ #endif
  }

  /*
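
The difference between the two modes in the patch can be seen in a toy model such as the following. It is only a sketch under simplifying assumptions: it assumes that each flush leads to one cache reload, and that the reload reaches the invalidation-acceptance point once. All names here (accept_invalidation_messages, invalidate_system_caches, count_flushes, MAX_RELOAD_DEPTH) are stand-ins invented for the sketch, and the depth limit exists only so that the unguarded variant terminates.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state for this sketch; none of it is PostgreSQL code. */
#define MAX_RELOAD_DEPTH 3      /* artificial bound on nested reloads */

static int flush_count = 0;
static int reload_depth = 0;

static void accept_invalidation_messages(bool guarded);

/* Stand-in for InvalidateSystemCaches(): count the flush, then model
 * the reload, which (by assumption) reaches the acceptance point again. */
static void
invalidate_system_caches(bool guarded)
{
    assert(reload_depth <= MAX_RELOAD_DEPTH);
    flush_count++;

    if (reload_depth < MAX_RELOAD_DEPTH)
    {
        reload_depth++;
        accept_invalidation_messages(guarded);
        reload_depth--;
    }
}

/* Stand-in for the patched function.  guarded = true mimics
 * CLOBBER_CACHE_ALWAYS, guarded = false mimics CLOBBER_CACHE_RECURSIVELY. */
static void
accept_invalidation_messages(bool guarded)
{
    static bool in_recursion = false;

    if (!guarded)
    {
        /* Flush again even while a reload is in progress. */
        invalidate_system_caches(guarded);
    }
    else if (!in_recursion)
    {
        /* Flush once; nested calls made during the reload do nothing. */
        in_recursion = true;
        invalidate_system_caches(guarded);
        in_recursion = false;
    }
}

/* Number of flushes triggered by one top-level acceptance call. */
int
count_flushes(bool guarded)
{
    flush_count = 0;
    reload_depth = 0;
    accept_invalidation_messages(guarded);
    return flush_count;
}
```

In this toy, count_flushes(true) returns 1 and count_flushes(false) returns 4 (one flush per nesting level up to the artificial limit). In the real system the nesting is not bounded this way, which is consistent with the much larger slowdown the patch comment reports for CLOBBER_CACHE_RECURSIVELY.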


Re: Cache-flush stress testing

From
"Jim C. Nasby"
Date:
On Thu, Jan 19, 2006 at 05:03:20PM -0500, Tom Lane wrote:
> I'm still bothered by the likelihood that there are cache-flush bugs in
> code paths that are not exercised by the regression tests.  The
> CLOBBER_CACHE patch is far too slow to consider enabling on any regular
> basis, but it seems that throwing in cache flushes at random intervals,
> as in the test program I posted here:
> http://archives.postgresql.org/pgsql-hackers/2006-01/msg00244.php
> doesn't provide very good test coverage.  Has anyone got any ideas about
> better ways to locate such bugs?

Some of the machines in the buildfarm do nothing else useful. If this
were turned into a configure option, it would be trivial to set up some
of those machines to just hammer away at it.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461