Re: [HACKERS] jacana hung after failing to acquire random number

From Andrew Dunstan
Subject Re: [HACKERS] jacana hung after failing to acquire random number
Date
Msg-id 50108a9a-72ad-4887-320f-f2d9de149c41@dunslane.net
In reply to Re: [HACKERS] jacana hung after failing to acquire random number  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: [HACKERS] jacana hung after failing to acquire random number  (Heikki Linnakangas <hlinnaka@iki.fi>)
Re: [HACKERS] jacana hung after failing to acquire random number  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-hackers

On 12/12/2016 02:32 AM, Heikki Linnakangas wrote:
> On 12/12/2016 05:58 AM, Michael Paquier wrote:
>> On Sun, Dec 11, 2016 at 9:06 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
>>>
>>> jacana (mingw, 64 bit compiler, no openssl) is currently hung on "make
>>> check". After starting the autovacuum launcher there are 120 messages
>>> in its log about "Could not acquire random number". Then nothing.
>>>
>>>
>>> So I suspect the problem here is commit
>>> fe0a0b5993dfe24e4b3bcf52fa64ff41a444b8f1, although I haven't looked in
>>> detail.
>>>
>>>
>>> Shouldn't we want the postmaster to shut down if it's not going to go
>>> further? Note that frogmouth, also mingw, which builds with openssl,
>>> doesn't have this issue.
>>
>> Did you unlock it in some way at the end? Here is the shape of the
>> report for others:
>> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2016-12-10%2022%3A00%3A15 
>>
>> And here is of course the interesting bit:
>> 2016-12-10 17:25:38.822 EST [584c80e2.ddc:2] LOG:  could not acquire
>> random number
>> 2016-12-10 17:25:39.869 EST [584c80e2.ddc:3] LOG:  could not acquire
>> random number
>> 2016-12-10 17:25:40.916 EST [584c80e2.ddc:4] LOG:  could not acquire
>> random number
>>
>> I am not seeing any problems with MSVC without openssl, so this
>> problem is specific to MinGW. I am starting to wonder whether it is
>> actually a good idea to cache the crypt context and then re-use it.
>> Using a new context every time is definitely not good for performance,
>> though.
>
> Actually, looking at the config.log on jacana, it's trying to use 
> /dev/urandom:
>
> configure:15028: checking for /dev/urandom
> configure:15041: result: yes
> configure:15054: checking which random number source to use
> configure:15073: result: /dev/urandom
>
> And looking closer at configure.in, I can see why:
>
>   elif test "$PORTNAME" = x"win32" ; then
>     USE_WIN32_RANDOM=1
>
> That test is broken. It looks like the x"$VAR" = x"constant" idiom, 
> but the left side of the comparison doesn't have the 'x'. Oops.
>
> Fixed that, let's see if it made jacana happy again.
>
> This makes me wonder if we should work a bit harder to get a good 
> error message, if acquiring a random number fails for any reason. This 
> needs to work in the frontend as well as the backend, but we could still 
> have an elog(LOG, ...) there, inside an #ifndef FRONTEND block.
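
The broken comparison described above can be reproduced in isolation. What follows is only a minimal sketch: PORTNAME is set by hand here (in configure it is derived from the build platform), the variable names broken and fixed are made up for illustration, and the corrected line is written the way the idiom is normally spelled rather than copied from the actual commit.

```shell
#!/bin/sh
# Assumption: set by hand for the demonstration.
PORTNAME=win32

# Broken form: the left side lacks the x prefix, so the string "win32"
# is compared with "xwin32" and the branch is never taken.
if test "$PORTNAME" = x"win32"; then
  broken=matched
else
  broken=skipped
fi

# Usual form of the idiom: both sides carry the same prefix.
if test x"$PORTNAME" = x"win32"; then
  fixed=matched
else
  fixed=skipped
fi

echo "broken test: $broken"
echo "fixed test: $fixed"
```

With the broken form the win32 branch is skipped, which would explain why configure fell through to the /dev/urandom check on jacana.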


I see you have now improved the messages in postmaster.c, which is good.

However, the bigger problem (ISTM) is that when this failed I had a 
system which was running but where every connection immediately failed:
   ============== creating temporary instance            ==============
   ============== initializing database system           ==============
   ============== starting postmaster                    ==============

   pg_regress: postmaster did not respond within 120 seconds
   Examine c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/src/test/regress/log/postmaster.log for the reason
   make: *** [check] Error 2


Should one or more of these errors be fatal? Or should we at least get 
pg_regress to try to shut down the postmaster if it can't connect after 
120 seconds?
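
To make the second option concrete, here is a rough shell sketch of the "give up and stop the server" behaviour. It is not pg_regress code: the background sleep stands in for a wedged postmaster, server_ready is a hypothetical probe that always fails, and the wait is shortened from 120 seconds to 2 so the sketch finishes quickly.

```shell
#!/bin/sh
# Stand-in for a postmaster that starts but never accepts connections.
sleep 30 &
server_pid=$!

# Hypothetical readiness probe; always fails here to simulate the hang.
server_ready() {
  return 1
}

max_wait=2   # the report above involved a 120 second wait
waited=0
status=ready

# Poll once per second until the server answers or the limit is reached.
until server_ready; do
  if [ "$waited" -ge "$max_wait" ]; then
    status=timed_out
    break
  fi
  sleep 1
  waited=$((waited + 1))
done

if [ "$status" = timed_out ]; then
  # Do not leave the unresponsive server running after the failed check.
  kill "$server_pid" 2>/dev/null || true
  wait "$server_pid" 2>/dev/null || true
  echo "server did not respond within $max_wait seconds; stopped it"
fi
```

The point is only that the caller cleans up the process it started before reporting failure, so nothing is left running for someone to kill by hand.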


[In answer to Michael's question above, I forcibly shut down the 
postmaster by hand. Otherwise it would still be running, and we would 
not have got the report on the buildfarm server.]

cheers

andrew



