make check hang on AIX 5L p690 4way/I have two solutions (corrected)
| From | Tomoyuki Niijima | 
|---|---|
| Subject | make check hang on AIX 5L p690 4way/I have two solutions (corrected) | 
| Date | |
| Msg-id | OFF407F862.04F474BA-ON49256C27.0080B26C@LocalDomain | 
| List | pgsql-bugs | 
It sounds like I should not have posted a bug report with a patch to
pgsql-patches@postgresql.org, so I am posting the same report to
pgsql-bugs@postgresql.org.
PoorTom
----- Forwarded by Tomoyuki Niijima/Japan/IBM on 2002/09/02 08:29 -----
                      Tomoyuki Niijima
                                               To:       pgsql-patches@postgresql.org
                      2002/08/30 01:55         cc:
                                               From:     Tomoyuki Niijima/Japan/IBM@IBMJP
                                               Subject:  make check hang on AIX 5L p690 4way/I have two
                                                solutions (corrected)
Your name               : Tomoyuki Niijima
Your email address      : niijima@jp.ibm.com
System Configuration
---------------------
  Architecture (example: Intel Pentium)         : IBM 7040-681 (pSeries 690) 4way (LPAR)
  Operating System (example: Linux 2.0.26 ELF)  : AIX 5L 5.1
  PostgreSQL version (example: PostgreSQL-7.2.1):   PostgreSQL-7.2.1
  Compiler used (example:  gcc 2.95.2)          : gcc 2.9
Please enter a FULL description of your problem:
------------------------------------------------
I built PostgreSQL with the steps below and saw the backends hang during
the regression test. The problem was reproduced on two machines, but both
have the same hardware and software. I also tried to recreate the problem
on other machines running older versions of AIX, but I could not.
Please describe a way to repeat the problem.   Please try to provide a
concise reproducible example, if at all possible:
----------------------------------------------------------------------
./configure --enable-multibyte=EUC_JP --with-CC=gcc
make
By attaching dbx (the AIX debugger) to one of the 'postgres:' processes, I
learned that the backend was sleeping in semop().
If you know how this problem might be fixed, list the solution below:
---------------------------------------------------------------------
After looking through the pgsql-hackers mailing list, I focused on the spin
lock issue to solve the problem. The easiest solution, though it may not be
the best, is to give up HAS_TEST_AND_SET. This actually works.
*** src/include/port/aix.h.org  Tue Feb 13 23:32:52 2001
--- src/include/port/aix.h      Fri Aug 30 01:02:28 2002
***************
*** 1,8 ****
  #define CLASS_CONFLICT
  #define DISABLE_XOPEN_NLS
! #define HAS_TEST_AND_SET
  #define NO_MKTIME_BEFORE_1970
! typedef unsigned int slock_t;
  #include <sys/machine.h>              /* ENDIAN definitions for network
                                         * communication */
--- 1,8 ----
  #define CLASS_CONFLICT
  #define DISABLE_XOPEN_NLS
! /* #define HAS_TEST_AND_SET */
  #define NO_MKTIME_BEFORE_1970
! /* typedef unsigned int slock_t; */
  #include <sys/machine.h>              /* ENDIAN definitions for network
                                         * communication */
Another, better solution is to use _check_lock() and _clear_lock() as the
spin lock.  The important thing here is to define S_UNLOCK() with
_clear_lock().  This will solve the so-called "Compiler bug" issue that
someone wrote about on the mailing list.
AIX has other APIs for test-and-set, such as cs(), compare_and_swap() and
fetch_and_or(), but none of these solved my problem.  I wrote a tiny test
program to see whether these AIX APIs have any bugs, but I could not see
any problem except with compare_and_swap().  It seems that
compare_and_swap() cannot be used for this purpose, as it did not work as a
spin lock on any SMP machine I tested.  I don't know why neither cs() nor
fetch_and_or()/fetch_and_and() works with PostgreSQL on the p690; they
worked with my test program on all machines I tested.
*** ./src/include/storage/s_lock.h.org  Wed Jan 30 00:44:42 2002
--- ./src/include/storage/s_lock.h      Fri Aug 30 01:13:15 2002
***************
*** 440,446 ****
   * Note that slock_t on POWER/POWER2/PowerPC is int instead of char
   * (see storage/ipc.h).
   */
! #define TAS(lock)     cs((int *) (lock), 0, 1)
  #endif         /* _AIX */
--- 440,447 ----
   * Note that slock_t on POWER/POWER2/PowerPC is int instead of char
   * (see storage/ipc.h).
   */
! #define TAS(lock)     _check_lock(lock, 0, 1)
! #define S_UNLOCK(lock)        _clear_lock(lock, 0)
  #endif         /* _AIX */
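
The locking pattern in the patch above (an atomic test-and-set to acquire,
and a dedicated unlock call rather than a plain store to release) can be
tried in isolation with a small test program. The following is only a
portable sketch and is not the test program mentioned in the report: it
assumes C11 atomics and POSIX threads as stand-ins for the AIX calls, and
every name in it (demo_slock_t, demo_tas, demo_unlock, demo_run) is
hypothetical. The acquire and release orderings are meant to mirror the
barrier behaviour that _check_lock() and _clear_lock() are understood to
provide; that correspondence is an assumption, not something the report
states.

```c
/* Portable sketch only: C11 atomics stand in for the AIX
 * _check_lock()/_clear_lock() calls. All demo_* names are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define DEMO_MAX_THREADS 16

typedef atomic_int demo_slock_t;    /* stand-in for slock_t */

/* Like TAS(): returns 0 if the lock was acquired, nonzero if it was
 * already held by someone else. */
static int demo_tas(demo_slock_t *lock)
{
    int expected = 0;

    /* Atomically change 0 -> 1; acquire ordering on success. */
    return !atomic_compare_exchange_strong_explicit(
        lock, &expected, 1, memory_order_acquire, memory_order_relaxed);
}

/* Like S_UNLOCK(): a release store, so writes made inside the critical
 * section are visible before another CPU can see the lock as free. */
static void demo_unlock(demo_slock_t *lock)
{
    atomic_store_explicit(lock, 0, memory_order_release);
}

struct demo_shared {
    demo_slock_t lock;
    long counter;       /* deliberately not atomic; guarded by lock */
    long iterations;
};

static void *demo_worker(void *arg)
{
    struct demo_shared *s = arg;

    for (long i = 0; i < s->iterations; i++) {
        while (demo_tas(&s->lock))
            ;                       /* spin until the lock is free */
        s->counter++;
        demo_unlock(&s->lock);
    }
    return NULL;
}

/* Starts nthreads workers (capped at DEMO_MAX_THREADS), each doing
 * `iterations` locked increments, and returns the final counter. */
long demo_run(int nthreads, long iterations)
{
    struct demo_shared s;
    pthread_t tid[DEMO_MAX_THREADS];

    if (nthreads > DEMO_MAX_THREADS)
        nthreads = DEMO_MAX_THREADS;

    atomic_init(&s.lock, 0);
    s.counter = 0;
    s.iterations = iterations;

    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tid[i], NULL, demo_worker, &s);
        assert(rc == 0);
        (void) rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);

    return s.counter;
}
```

Called from a small main() and built with thread support (for example
with -pthread on gcc), demo_run(4, 50000) should return 200000 when the
lock works; a lower value would mean increments were lost because two
threads were inside the critical section at once.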
		