Re:Re:Re: BUG #15187: When use huge page, there may be a lot of hanged connections with status startup or authentication

From chenhj
Subject Re:Re:Re: BUG #15187: When use huge page, there may be a lot of hanged connections with status startup or authentication
Date
Msg-id 3f0cf5ff.777d.16378c971d3.Coremail.chjischj@163.com
In response to Re:Re: BUG #15187: When use huge page, there may be a lot of hanged connections with status startup or authentication  (chenhj <chjischj@163.com>)
List pgsql-bugs

At 2018-05-15 06:16:39, "chenhj" <chjischj@163.com> wrote:
At 2018-05-07 02:57:12, "Andres Freund" <andres@anarazel.de> wrote:
>On 2018-05-06 23:45:17 +0800, chenhj wrote:
>> >>
>> >>Chen, have you disabled transparent hugepages and zone reclaim?
>> >>
>> >>Greetings,
>> >>
>> >>Andres Freund
>> >
>> >c) Depend on huge page
>> >huge_page=on, happen(no matter transparent_hugepage is [always] or [never])
>> >huge_page=off, not happen
>> >
>> >When transparent hugepages are disabled, this problem also occurs.
>> >About zone reclaim, I will look at it later.
>> >What puzzles me is that this problem does not occur on PostgreSQL 9.6.2 (I tested 10.2 and 9.6.2 on the same machine).
>> >d) Depends on PostgreSQL version
>> >PostgreSQL 10.2: happens
>> >PostgreSQL 9.6: does not happen
>> >Chen Huajun
>> The problem occurs whether vm.zone_reclaim_mode is set to 0 or 1.
>> 
>> In addition, I need to correct my earlier statement: even huge_pages=off is problematic.
>> 
>> With huge_pages=on, SQL execution is very slow, and there are hanged connections in the startup and authentication states.
>> 
>
>You'd probably need to provide a few perf profiles to get further
>insight.
>
>Greetings,
>
>Andres Freund
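
The settings asked about in the quoted exchange (transparent hugepages, vm.zone_reclaim_mode, and the huge page pool) can be read back from the kernel before each test run. The following is only a minimal sketch: the helper names are made up for illustration, and the file locations are the usual Linux ones, which may differ or be absent on a given system (RHEL/CentOS 6 kernels, for example, expose the transparent hugepage switch under a `redhat_transparent_hugepage` directory).

```python
from pathlib import Path
from typing import Optional

# Usual Linux locations (assumptions; either may be missing on a given host).
THP_CANDIDATES = [
    Path("/sys/kernel/mm/transparent_hugepage/enabled"),
    Path("/sys/kernel/mm/redhat_transparent_hugepage/enabled"),  # RHEL/CentOS 6
]
ZONE_RECLAIM = Path("/proc/sys/vm/zone_reclaim_mode")
MEMINFO = Path("/proc/meminfo")


def read_text(path: Path) -> Optional[str]:
    """Return the stripped file contents, or None if the file cannot be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def active_thp_mode(raw: str) -> Optional[str]:
    """Extract the bracketed choice from text such as 'always madvise [never]'."""
    for token in raw.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def hugepage_counters(meminfo: str) -> dict:
    """Collect the HugePages_* counters from /proc/meminfo text."""
    counters = {}
    for line in meminfo.splitlines():
        name, _, value = line.partition(":")
        if name.startswith("HugePages_"):
            counters[name] = int(value.split()[0])
    return counters


if __name__ == "__main__":
    thp_raw = next((t for t in map(read_text, THP_CANDIDATES) if t), None)
    print("transparent_hugepage:", active_thp_mode(thp_raw) if thp_raw else "unavailable")
    print("vm.zone_reclaim_mode:", read_text(ZONE_RECLAIM) or "unavailable")
    meminfo = read_text(MEMINFO)
    print("huge pages:", hugepage_counters(meminfo) if meminfo else "unavailable")
```

Recording this output next to each test result makes it easier to tell which combination of settings a given hang was observed under.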

According to my tests, this problem is related to commit "ecb0d20a9d2e09b7112d3b192047f711f9ff7e59", which switched Linux from SysV semaphores to POSIX semaphores.

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=ecb0d20a9d2e09b7112d3b192047f711f9ff7e59
Use unnamed POSIX semaphores, if available, on Linux and FreeBSD.
We've had support for using unnamed POSIX semaphores instead of System V semaphores for quite some time, but it was not used by default on any platform.  Since many systems have rather small limits on the number of SysV semaphores allowed, it seems desirable to switch to POSIX semaphores where they're available and don't create performance or kernel resource problems.  Experimentation by me shows that unnamed POSIX semaphores are at least as good as SysV semaphores on Linux, and we previously had a report from Maksym Sobolyev that FreeBSD is significantly worse with SysV semaphores than POSIX ones.  So adjust those two platforms to use unnamed POSIX semaphores, if configure can find the necessary library functions.  If this goes well, we may switch other platforms as well, but it would be advisable to test them individually first.
It's not currently contemplated that we'd encourage users to select a semaphore API for themselves, but anyone who wants to experiment can add PREFERRED_SEMAPHORES=UNNAMED_POSIX (or NAMED_POSIX, or SYSV) to their configure command line to do so.
I also tweaked configure to report which API it's selected, mainly so that we can tell that from buildfarm reports.
I did not touch the user documentation's discussion about semaphores; that will need some adjustment once the dust settles.
Discussion: <8536.1475704230@sss.pgh.pa.us>

This is why the problem does not occur on 9.6.2 but does occur on 10.2.

As for the root cause, it may be a bug in the Linux kernel, but it is not clear which kernel version fixed it. The problem still occurs after upgrading the CentOS 6.5 kernel from 2.6.32-431 to 2.6.32-504.23.4.
To avoid the problem, the only way may be to upgrade CentOS to a later release (such as CentOS 7.3).

Regards,
Chen Huajun

We have confirmed that this is a known Linux kernel bug, fixed by the following commit. Thanks for all the help.
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?id=13d60f4b6ab5b702dc8d2ee20999f98a93728aec  
  futex: Take hugepages into account when generating futex_key
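
To illustrate what the commit title refers to, here is a toy model in Python. It is not kernel code. It assumes, based on my reading of the upstream commit description, that the key for a process-shared futex was derived from a page index plus the offset within a base page, and that for huge pages the index was the same for every base page inside one huge page. The function names, the 4 kB and 2 MB sizes, and the example offsets are all assumptions made for the sketch.

```python
PAGE_SIZE = 4096                    # assumed base page size
HUGE_PAGE_SIZE = 2 * 1024 * 1024    # assumed huge page size (2 MB)


def key_ignoring_subpages(offset: int) -> tuple:
    """Model of the old behaviour: the index identifies only the huge page,
    so all base pages inside it contribute the same index."""
    huge_index = offset // HUGE_PAGE_SIZE
    return (huge_index, offset % PAGE_SIZE)


def key_with_subpages(offset: int) -> tuple:
    """Model of the fixed behaviour: the index identifies the base page,
    so locations in different base pages get different keys."""
    base_index = offset // PAGE_SIZE
    return (base_index, offset % PAGE_SIZE)


if __name__ == "__main__":
    # Two hypothetical semaphores, one base page apart, inside the same huge page.
    sem_a = 64
    sem_b = 64 + PAGE_SIZE

    print("old model collides:", key_ignoring_subpages(sem_a) == key_ignoring_subpages(sem_b))
    print("new model collides:", key_with_subpages(sem_a) == key_with_subpages(sem_b))
```

If two unnamed POSIX semaphores in a huge-page shared memory segment end up with the same key, a wake-up meant for one of them could plausibly be delivered to a waiter on the other, which would be consistent with backends hanging; that last step is an inference on my part, not something stated in the commit title.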

Regards,
Chen Huajun
