On Wed, Jul 24, 2019 at 11:59 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Tue, Jul 16, 2019 at 12:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > In the meantime, we've had *lots* of buildfarm failures in the
> > added pg_stat_all_tables query, which indicate that indeed the
> > stats collector mechanism isn't terribly reliable. But that
> > doesn't directly prove anything about the original problem,
> > since the planner doesn't look at stats collector data.
>
> I noticed that if you look at the list of failures of this type, there
> are often pairs of animals belonging to Andres that failed at the same
> time. I wonder if he might be running a bunch of animals on one
> kernel, and need to increase net.core.rmem_max and
> net.core.rmem_default (or maybe the write side variants, or both, or
> something like that).

In further support of that theory, here are the counts of 'stats'
failures (excluding bogus reports due to crashes) for the past 90
days:

          owner          |    animal    | count
-------------------------+--------------+-------
 andres-AT-anarazel.de   | desmoxytes   |     5
 andres-AT-anarazel.de   | dragonet     |     9
 andres-AT-anarazel.de   | flaviventris |     1
 andres-AT-anarazel.de   | idiacanthus  |     5
 andres-AT-anarazel.de   | komodoensis  |    11
 andres-AT-anarazel.de   | pogona       |     1
 andres-AT-anarazel.de   | serinus      |     3
 andrew-AT-dunslane.net  | lorikeet     |     1
 buildfarm-AT-coelho.net | moonjelly    |     1
 buildfarm-AT-coelho.net | seawasp      |    17
 clarenceho-AT-gmail.com | mayfly       |     2

Andres's animals report the same hostname and run at the same time, so
it'd be interesting to know what net.core.rmem_max is set to and
whether these problems go away if it's cranked up 10x higher or
something. In a quick test I can see that make installcheck is
capable of sending a *lot* of 936 byte messages in the same
millisecond.
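
For anyone who wants to poke at the receive-buffer theory on their own box, here is a rough sketch, not taken from the PostgreSQL source. It assumes the stats messages travel as UDP datagrams over loopback, and it fires a burst of 936 byte datagrams at a receiver that isn't reading yet, then counts how many survived. The function name burst_test and the burst size are made up for illustration; the effective SO_RCVBUF it reports is what the kernel granted, which on Linux is capped by net.core.rmem_max.

```python
import socket

MSG_SIZE = 936  # message size observed in the quick test above


def burst_test(n_messages, rcvbuf=None):
    """Send n_messages datagrams to an idle loopback receiver.

    Returns (sent, received, effective_rcvbuf). The receiver does not
    read until the burst is over, so anything that does not fit in its
    socket buffer is dropped by the kernel.
    """
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    if rcvbuf is not None:
        # Request a bigger buffer; the kernel may clamp the request.
        rx.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    rx.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    effective = rx.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)

    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payload = b"x" * MSG_SIZE
    sent = 0
    for _ in range(n_messages):
        try:
            tx.sendto(payload, rx.getsockname())
            sent += 1
        except OSError:
            # Some platforms report a full buffer instead of dropping silently.
            pass

    # Drain whatever made it into the receive buffer.
    rx.settimeout(0.2)
    received = 0
    while True:
        try:
            rx.recv(65535)
            received += 1
        except socket.timeout:
            break

    tx.close()
    rx.close()
    return sent, received, effective


if __name__ == "__main__":
    sent, received, effective = burst_test(2000)
    print(f"SO_RCVBUF={effective} sent={sent} received={received} "
          f"dropped={sent - received}")
```

Running it once with the default buffer and once with a larger rcvbuf argument should show whether the drop count moves with the buffer size. If the larger request makes no difference to the reported SO_RCVBUF, that would suggest the system-wide limit is the thing to raise.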
--
Thomas Munro
https://enterprisedb.com