Discussion: The buildfarm is in a pretty bad way, folks
It sure looks like there's been a frantic push to commit stuff that maybe wasn't quite fully baked. I'm not terribly on board with that, because it's likely to be hard to disentangle who broke what.

But in particular, it's clear that partition_prune and isolation/checksum_cancel are showing big problems.

regards, tom lane
Hi,

On 2018-04-06 16:59:11 -0400, Tom Lane wrote:
> It sure looks like there's been a frantic push to commit stuff that
> maybe wasn't quite fully baked. I'm not terribly on board with that,
> because it's likely to be hard to disentangle who broke what.
> But in particular, it's clear that partition_prune and
> isolation/checksum_cancel are showing big problems.

While I'm obviously also unhappy about the frantic push to push semi baked stuff, I'm not sure the two issues you point to above are that good examples of carelessness. At least the latter seems mostly a pretty normal portability thing around orderedness?

Greetings,

Andres Freund
On Fri, Apr 6, 2018 at 10:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It sure looks like there's been a frantic push to commit stuff that
> maybe wasn't quite fully baked. I'm not terribly on board with that,
> because it's likely to be hard to disentangle who broke what.
> But in particular, it's clear that partition_prune and
> isolation/checksum_cancel are showing big problems.
Daniel is working on investigating the isolationtester thing. See a mail on one of the threads where initial indications were the "atomics with no real atomics" (or whatever you'd call it) were to blame. We could redo that thing without atomics to get rid of that (and possibly should), but it would be good to figure out if it's actually broken first, so that part can get fixed if it is.
On 2018-04-06 23:12:19 +0200, Magnus Hagander wrote:
> Daniel is working on investigating the isolationtester thing. See a mail on
> one of the threads where initial indications were the "atomics with no real
> atomics" (or whatever you'd call it) were to blame. We could redo that
> thing without atomics to get rid of that (and possibly should), but it
> would be good to figure out if it's actually broken first, so that part can
> get fixed if it is.

Is that an explanation for
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2018-04-06%2019%3A18%3A11
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2018-04-06%2016%3A03%3A01
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2018-04-06%2015%3A46%3A16
? Those all don't seem to fall under that? Having proper atomics?

Greetings,

Andres Freund
On Fri, Apr 6, 2018 at 11:19 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2018-04-06 23:12:19 +0200, Magnus Hagander wrote:
> > Daniel is working on investigating the isolationtester thing. See a mail on
> > one of the threads where initial indications were the "atomics with no real
> > atomics" (or whatever you'd call it) were to blame. We could redo that
> > thing without atomics to get rid of that (and possibly should), but it
> > would be good to figure out if it's actually broken first, so that part can
> > get fixed if it is.
> Is that an explanation for
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2018-04-06%2019%3A18%3A11
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2018-04-06%2016%3A03%3A01
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2018-04-06%2015%3A46%3A16
> ? Those all don't seem to fall under that? Having proper atomics?
No, sorry, bad wording. Those were the initial indications, but they're not the *only* indications. There is possibly/probably more than one thing.
Tom Lane wrote:
> It sure looks like there's been a frantic push to commit stuff that
> maybe wasn't quite fully baked. I'm not terribly on board with that,
> because it's likely to be hard to disentangle who broke what.
> But in particular, it's clear that partition_prune and
> isolation/checksum_cancel are showing big problems.

The partition_prune failure is clearly a minor portability issue which I'll investigate after I pick up the kids. From where I sit, if we let that patch bake any more, it will burn in the oven.

Partition prune also broke the sepgsql test -- I think because one partition is no longer scanned. Seems a reasonable thing to me, just need to update the expected file. But I'll look closer.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Andres Freund <andres@anarazel.de> writes:
> On 2018-04-06 16:59:11 -0400, Tom Lane wrote:
>> But in particular, it's clear that partition_prune and
>> isolation/checksum_cancel are showing big problems.
> While I'm obviously also unhappy about the frantic push to push semi
> baked stuff, I'm not sure the two issues you point to above are that
> good examples of carelessness. At least the latter seems mostly a pretty
> normal portability thing around orderedness?

I'm just venting, perhaps, but if there's a good reason for that to have been left broken for ~24 hours, I don't know what it is. It's getting in the way of testing other recent commits.

(I'm also not real happy about the amount of time the checksum-xxx tests consume.)

regards, tom lane
On Fri, Apr 6, 2018 at 11:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2018-04-06 16:59:11 -0400, Tom Lane wrote:
> >> But in particular, it's clear that partition_prune and
> >> isolation/checksum_cancel are showing big problems.
> > While I'm obviously also unhappy about the frantic push to push semi
> > baked stuff, I'm not sure the two issues you point to above are that
> > good examples of carelessness. At least the latter seems mostly a pretty
> > normal portability thing around orderedness?
> I'm just venting, perhaps, but if there's a good reason for that
> to have been left broken for ~24 hours, I don't know what it is.
> It's getting in the way of testing other recent commits.
> (I'm also not real happy about the amount of time the checksum-xxx
> tests consume.)
The isolation tester ones, or the regular ones? Because the regular ones finish in << 30 seconds here, just wondering if that actually counts as too time consuming in this type of tests?
Magnus Hagander <magnus@hagander.net> writes:
> On Fri, Apr 6, 2018 at 11:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> (I'm also not real happy about the amount of time the checksum-xxx
>> tests consume.)
> The isolation tester ones, or the regular ones? Because the regular ones
> finish in << 30 seconds here, just wondering if that actually counts as too
> time consuming in this type of tests?

The isolationtester ones. Looking at longfin, which while not a speed demon isn't real slow either, the isolation-check step was taking 2:05 two days ago and now it's at 2:48. That's a pretty big incremental jump for one feature.

regards, tom lane
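For a rough sense of the size of that jump, the two longfin timings quoted above (2:05 and 2:48) can be compared directly. This is only an illustrative sketch: the helper name `to_seconds` is made up here, and the figures are taken at face value from the message rather than read from the buildfarm logs.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'minutes:seconds' string such as '2:05' to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# Timings of the isolation-check step on longfin, as quoted in the thread.
before = to_seconds("2:05")
after = to_seconds("2:48")

delta = after - before          # absolute increase in seconds
pct = 100.0 * delta / before    # relative increase in percent

print(f"isolation-check grew by {delta} s ({pct:.1f}%)")
```

Under those numbers the step went from 125 s to 168 s, an increase of 43 s or roughly a third of the earlier runtime.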