Thread: castoroides spinlock failure on test_shm_mq

castoroides spinlock failure on test_shm_mq

From
Alvaro Herrera
Date:
Has anybody noticed the way castoroides is randomly failing?
 SELECT test_shm_mq_pipelined(16384, (select string_agg(chr(32+(random()*95)::int), '') from generate_series(1,270000)),200, 3);
 
! PANIC:  stuck spinlock (100cb92f4) detected at atomics.c:30
! server closed the connection unexpectedly
!     This probably means the server terminated abnormally
!     before or while processing the request.
! connection to server was lost


-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: castoroides spinlock failure on test_shm_mq

From
Robert Haas
Date:
On Sat, Jun 20, 2015 at 12:24 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Has anybody noticed the way castoroides is randomly failing?
>
>   SELECT test_shm_mq_pipelined(16384, (select string_agg(chr(32+(random()*95)::int), '') from generate_series(1,270000)),200, 3);
>
> ! PANIC:  stuck spinlock (100cb92f4) detected at atomics.c:30
> ! server closed the connection unexpectedly
> !       This probably means the server terminated abnormally
> !       before or while processing the request.
> ! connection to server was lost

Yeah, Andres and I discussed it a month ago:

http://www.postgresql.org/message-id/20150527225528.GP5310@alap3.anarazel.de

I think we're going to need to try to implement real memory barriers
on all architectures we support.  It's not clear whether there's some
suitable generic fallback that we could use or whether we're going to
need something different for each case.  I had thought Andres was
planning to work on this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: castoroides spinlock failure on test_shm_mq

From
Andres Freund
Date:
On 2015-06-20 09:35:39 -0400, Robert Haas wrote:
> On Sat, Jun 20, 2015 at 12:24 AM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
> > Has anybody noticed the way castoroides is randomly failing?
> >
> >   SELECT test_shm_mq_pipelined(16384, (select string_agg(chr(32+(random()*95)::int), '') from generate_series(1,270000)),200, 3);
> >
> > ! PANIC:  stuck spinlock (100cb92f4) detected at atomics.c:30
> > ! server closed the connection unexpectedly
> > !       This probably means the server terminated abnormally
> > !       before or while processing the request.
> > ! connection to server was lost
> 
> Yeah, Andres and I discussed it a month ago:
> 
> http://www.postgresql.org/message-id/20150527225528.GP5310@alap3.anarazel.de
> 
> I think we're going to need to try to implement real memory barriers
> on all architectures we support.  It's not clear whether there's some
> suitable generic fallback that we could use or whether we're going to
> need something different for each case.  I had thought Andres was
> planning to work on this.

I am. I'd posted on the other thread that I want to use
waitpid(PostmasterPid, WNOHANG) as the fallback for now. Unless somebody
protests, I'm going to commit that first, wait a while to see whether
it stabilizes the Solaris members, and then commit a better fallback for
Solaris with suncc.

Greetings,

Andres Freund
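

The fallback discussed above can be sketched roughly as follows. This is
only an illustration of the idea, not the code that was committed: it
assumes that a cheap system call such as waitpid() acts as a full memory
barrier because entering the kernel orders memory accesses, which is the
premise of the proposal and is not guaranteed by POSIX. The names
postmaster_pid_stand_in, fallback_memory_barrier, publish and consume are
hypothetical, and the thread's waitpid(PostmasterPid, WNOHANG) is written
with the full three-argument POSIX signature.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for PostgreSQL's PostmasterPid global. */
static pid_t postmaster_pid_stand_in;

/*
 * Fallback "barrier": issue a cheap, non-blocking system call and rely on
 * the ASSUMPTION that the kernel entry/exit orders earlier loads and
 * stores before later ones. The call is expected to fail (the target is
 * usually not our child); only its side effect matters.
 */
static void
fallback_memory_barrier(void)
{
    int save_errno = errno;     /* do not leak ECHILD to the caller */

    (void) waitpid(postmaster_pid_stand_in, NULL, WNOHANG);

    errno = save_errno;
}

/* Toy shared state: a payload guarded by a "ready" flag. */
static int shared_payload;
static volatile int shared_ready;

/* Writer: store the payload, then the flag, with the barrier in between. */
static void
publish(int value)
{
    shared_payload = value;
    fallback_memory_barrier();
    shared_ready = 1;
}

/* Reader: check the flag, then read the payload after the barrier. */
static int
consume(void)
{
    if (!shared_ready)
        return -1;              /* nothing published yet */
    fallback_memory_barrier();
    return shared_payload;
}
```

In a single process this only shows where the barrier sits relative to the
stores and loads; whether the system call really orders memory on a given
platform is exactly what the buildfarm run is meant to find out.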