On 04/30/2015 11:09 PM, Peter Geoghegan wrote:
> I've been unable to reproduce the unprincipled deadlock using the same
> test case as before. However, the exclusion constraint code now
> livelocks. Here is example output from a stress-testing session:
>
>...
>
> [Fri May 1 04:45:35 2015] normal exit at 1430455535 after 128000
> items processed at count_upsert_exclusion.pl line 192.
> trying 128 clients:
> [Fri May 1 04:45:58 2015] NOTICE: extension "btree_gist" already
> exists, skipping
> [Fri May 1 04:45:58 2015] init done at count_upsert_exclusion.pl line 106.
>
>
> (I ssh into server, check progress). Then, due to some issue with the
> scheduler or something, progress continues:
>
> [Fri May 1 05:17:57 2015] sum is 462
> [Fri May 1 05:17:57 2015] count is 8904
> [Fri May 1 05:17:58 2015] normal exit at 1430457478 after 128000
> items processed at count_upsert_exclusion.pl line 192.
> trying 128 clients:

Hmm, so it was stuck for half an hour at that point? Why do you think it
was a livelock?

> This is the same server that I shared credentials with you for. Feel
> free to ssh in and investigate it yourself.

I logged in, but the system seems very unresponsive in general. I just
started "apt-get install gdb" on it, to investigate what the backends
are stuck at. It's been running for about 30 minutes now, and I'm still
waiting...

- Heikki