Valgrind Memcheck support

Поиск
Список
Период
Сортировка
От Noah Misch
Тема Valgrind Memcheck support
Дата
Msg-id 20130609212559.GB491289@tornado.leadboat.com
обсуждение исходный текст
Ответы Re: Valgrind Memcheck support  (Andres Freund <andres@2ndquadrant.com>)
Re: Valgrind Memcheck support  (Andres Freund <andres@2ndquadrant.com>)
Re: Valgrind Memcheck support  (Greg Stark <stark@mit.edu>)
Список pgsql-hackers
Valgrind's Memcheck tool[1] is handy for finding bugs, but our use of a custom
allocator limits its ability to detect problems in unmodified PostgreSQL.
During the 9.1 beta cycle, I found some bugs[2] with a rough patch adding
instrumentation to aset.c and mcxt.c such that Memcheck understood our
allocator.  I've passed that patch around to a few people over time, and I've
now removed the roughness such that it's ready for upstream.  In hopes of
making things clearer in the commit history, I've split out a preliminary
refactoring patch from the main patch and attached each separately.

Besides the aset.c/mcxt.c instrumentation, this patch adds explicit checks for
undefined memory to PageAddItem() and printtup(); this has caught C-language
functions that fabricate a Datum without initializing all bits.  The inclusion
of all this is controlled by a pg_config_manual.h setting.  The patch also
adds a "suppression file" that directs Valgrind to silences nominal errors we
don't plan to fix.

To smoke-test the instrumentation, I used "make installcheck" runs on x86_64
GNU/Linux and ppc64 GNU/Linux.  This turned up various new and newly-detected
memory bugs, which I will discuss in a separate thread.  With those fixed,
"make installcheck" has a clean report (in my one particular configuration).
I designed the suppression file to work across platforms; I specifically
anticipated eventual use on x86_64 Darwin and x86_64 FreeBSD.  Valgrind 3.8.1
quickly crashed when running PostgreSQL on Darwin; I did not dig further.

Since aset.c and mcxt.c contain the hottest code paths in the backend, I
verified that a !USE_VALGRIND, non-assert build produces the same code before
and after the patch.  Testing that revealed the need to move up the
AllocSizeIsValid() check in repalloc(), though I don't understand why GCC
reacted that way.

Peter Geoghegan and Korry Douglas provided valuable feedback on earlier
versions of this code.


Possible future directions:

- Test "make installcheck-world".  When I last did this in past years, contrib did
  trigger some errors.

- Test recovery, such as by running a streaming replica under Memcheck while
  the primary runs "make installcheck-world".

- Test newer compilers and higher optimization levels.  I used GCC 4.2 at -O1.
  A brief look at -O2 results showed a new error that I have not studied.  GCC
  4.8 at -O3 might show still more due to increasingly-aggressive assumptions.

- A buildfarm member running its installcheck steps this way.

- Memcheck has support for detecting leaks.  I have not explored that side at
  all, always passing --leak-check=no.  We could add support for freeing
  "everything" at process exit, thereby making the leak detection meaningful.


Brief notes for folks reproducing my approach: I typically start the
Memcheck-hosted postgres like this:

  valgrind --leak-check=no --gen-suppressions=all \
    --suppressions=src/tools/valgrind.supp --time-stamp=yes \
    --log-file=$HOME/pg-valgrind/%p.log postgres

If that detected an error on which I desired more detail, I would rerun a
smaller test case with "--track-origins=yes --read-var-info=yes".  That slows
things noticeably but gives more-specific messaging.  When even that left the
situation unclear, I would temporarily hack allocChunkLimit so every palloc()
turned into a malloc().

I strongly advise installing the latest-available Valgrind, particularly
because recent releases suffer far less of a performance drop processing the
instrumentation added by this patch.  A "make installcheck" run takes 273
minutes under Vaglrind 3.6.0 but just 27 minutes under Valgrind 3.8.1.

Thanks,
nm

[1] http://valgrind.org/docs/manual/mc-manual.html
[2] http://www.postgresql.org/message-id/20110312133224.GA7833@tornado.gateway.2wire.net

--
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andrew Dunstan
Дата:
Сообщение: postgres_fdw regression tests order dependency
Следующее
От: Noah Misch
Дата:
Сообщение: 9.3 crop of memory errors