Re: Maybe BF "timedout" failures are the client script's fault?
| From | Andrew Dunstan |
|---|---|
| Subject | Re: Maybe BF "timedout" failures are the client script's fault? |
| Date | |
| Msg-id | 05efa923-a1b2-48b5-b9ec-9abf8758f720@dunslane.net |
| In reply to | Maybe BF "timedout" failures are the client script's fault? (Tom Lane <tgl@sss.pgh.pa.us>) |
| Responses | Re: Maybe BF "timedout" failures are the client script's fault? |
| List | pgsql-hackers |

On 2026-01-09 Fr 3:41 PM, Tom Lane wrote:
> We've been assuming that all the "timedout" failures on BF member
> fruitcrow were due to some wonkiness in the GNU/Hurd platform.
> I got suspicious about that though after noticing that there are
> a small number of such failures on other animals, eg [1][2][3].
> In each case, the failure message claims it waited a good long
> time, which is at variance with the actually observed runtime.
> For instance [1] says "timed out after 14400 secs", but the
> actual total test runtime is only 01:24:28 according to the
> summary at the top of the page.
>
> Looking into the buildfarm client, I realized that it's assuming that
> "sleep($wait_time)" is sufficient to wait for $wait_time seconds.
> However, the Perl docs point out that sleep() can be interrupted by a
> signal. So now I'm suspicious that many of these failures are caused
> by a stray signal waking up the wait_timeout thread prematurely.
> GNU/Hurd might just be more prone to that than other platforms.
>
> I propose the attached patch to the BF client to try to make this
> more robust.
>
> regards, tom lane
>
> [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=ovenbird&dt=2025-11-14%2009%3A21%3A05
> [2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2025-10-17%2018%3A32%3A07
> [3] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=opaleye&dt=2026-01-08%2023%3A07%3A37
>

The patch seems reasonable on its face, but I doubt it's the issue.
Rather I think what's happening here is that a test is hanging silently
and lastcommand.log's mtime doesn't get updated, causing a misreporting
of the run duration. So in addition to the above I have added some code
to update that timestamp if the file exists (which should only be the
case with a timeout).

See https://github.com/PGBuildFarm/client-code/commit/e5d67a35a0136a53e441fccf0ecc9b1b6322526c

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
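
For readers following along, the two ideas in this message can be sketched roughly as below. This is an illustration only, written in Python rather than the buildfarm client's Perl, and it is neither Tom's patch nor the code in the linked commit. The function names `sleep_until_elapsed` and `touch_if_exists` are hypothetical. The first function re-sleeps for the remaining time whenever the underlying sleep returns early, which is the general pattern for a sleep that a signal can cut short. (Python's own `time.sleep` already resumes after most signals since Python 3.5, so the loop matters mainly for sleep calls that can return early, as Perl's can.) The second function refreshes a file's mtime only when the file already exists.

```python
import os
import time


def sleep_until_elapsed(wait_time, sleep=time.sleep, clock=time.monotonic):
    """Block until at least wait_time seconds have passed on the clock.

    The sleep callable is allowed to return early (as a signal-interrupted
    sleep can), so the remaining time is recomputed and slept again.
    The sleep and clock parameters are injectable for testing.
    """
    deadline = clock() + wait_time
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            return
        sleep(remaining)


def touch_if_exists(path):
    """Set path's atime and mtime to now, but only if the file exists.

    Returns True if the timestamp was updated, False otherwise.
    """
    if os.path.exists(path):
        os.utime(path, None)  # None means "use the current time"
        return True
    return False
```

Using a monotonic clock for the deadline means the wait is not affected by wall-clock adjustments, and checking for existence first means the sketch never creates a file such as lastcommand.log where none was present.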