Re: Elusive segfault with 9.3.5 & query cancel

From Jim Nasby
Subject Re: Elusive segfault with 9.3.5 & query cancel
Date
Msg-id 54823492.3000009@BlueTreble.com
In reply to Re: Elusive segfault with 9.3.5 & query cancel  (Peter Geoghegan <pg@heroku.com>)
Replies Re: Elusive segfault with 9.3.5 & query cancel  (Peter Geoghegan <pg@heroku.com>)
Re: Elusive segfault with 9.3.5 & query cancel  (Richard Frith-Macdonald <richardfrithmacdonald@gmail.com>)
List pgsql-hackers
On 12/5/14, 4:11 PM, Peter Geoghegan wrote:
> On Fri, Dec 5, 2014 at 1:29 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>> We made some changes which decreased query cancel (optimizing queries,
>>> turning on hot_standby_feedback) and we haven't seen a segfault since
>>> then.  As far as the user is concerned, this solves the problem, so I'm
>>> never going to get a trace or a core dump file.
>>
>> Forgot a major piece of evidence as to why I think this is related to
>> query cancel:  in each case, the segfault was preceded by a
>> multi-backend query cancel 3ms to 30ms beforehand.  It is possible that
>> the backend running the query which segfaulted might have been the only
>> backend *not* cancelled due to query conflict concurrently.
>> Contradicting this, there are other multi-backend query cancels in the
>> logs which do NOT produce a segfault.
>
> I wonder if it would be useful to add additional instrumentation so
> that even without a core dump, there was some cursory information
> about the nature of a segfault.
>
> Yes, doing something with a SIGSEGV handler is very scary, and there
> are major portability concerns (e.g.
> https://bugs.ruby-lang.org/issues/9654), but I believe it can be made
> robust on Linux. For what it's worth, this open source project offers
> that kind of functionality in the form of a library:
> https://github.com/vmarkovtsev/DeathHandler

Perhaps we should also officially recommend that production servers be set up to create core files. AFAIK the only
downside is the time it would take to write a core that's huge because of shared buffers, but perhaps there's some way
to avoid writing those? (That means the core won't help if the bug is due to something in a buffer, but that seems
unlikely enough that the tradeoff is worth it...)
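
One possible way to avoid writing shared memory into the core, on Linux only, is the per-process /proc/<pid>/coredump_filter bitmask described in core(5). The sketch below is not from this thread and is not anything PostgreSQL does itself: the function names are hypothetical, and it assumes shared buffers live in a mapping covered by one of the "shared" filter bits, which depends on how the server allocates shared memory. The filter value is inherited across fork() and preserved across execve(), so a wrapper could set it before starting the postmaster.

```python
import resource

# Bit positions from Linux core(5); a set bit means "include in the core".
ANON_PRIVATE = 1 << 0
ANON_SHARED = 1 << 1
FILE_PRIVATE = 1 << 2
FILE_SHARED = 1 << 3
ELF_HEADERS = 1 << 4
HUGETLB_PRIVATE = 1 << 5
HUGETLB_SHARED = 1 << 6

# Every mapping type that is shared between processes.
SHARED_BITS = ANON_SHARED | FILE_SHARED | HUGETLB_SHARED


def without_shared(mask: int) -> int:
    """Return the filter mask with all shared-mapping bits cleared."""
    return mask & ~SHARED_BITS


def enable_core_dumps() -> None:
    """Raise the soft core-size limit to the hard limit for this process."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))


def apply_filter(path: str = "/proc/self/coredump_filter") -> int:
    """Clear the shared-mapping bits in this process's coredump filter.

    The kernel reports the value as bare hex digits, but a written value
    needs the 0x prefix to be parsed as hex.
    """
    with open(path) as f:
        current = int(f.read().strip(), 16)
    new = without_shared(current)
    with open(path, "w") as f:
        f.write(f"{new:#x}")
    return new
```

A wrapper script would call enable_core_dumps() and apply_filter() and then exec the server. Starting from the kernel default of 0x33, the resulting filter is 0x31, which keeps private anonymous memory (backend-local state, stacks, heap) and drops shared mappings. That is the same tradeoff as above: the core is useless if the bug is in a buffer.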
 
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


