Re: REL_13_STABLE Windows 10 Regression Failures

From Heath Lord
Subject Re: REL_13_STABLE Windows 10 Regression Failures
Date
Msg-id CA+BEBhvx8PJxMEx8xwakWuwtjhZ4DAS=KRRxdD3ZmM88CvMrwQ@mail.gmail.com
In response to Re: REL_13_STABLE Windows 10 Regression Failures  (Heath Lord <heath.lord@crunchydata.com>)
Responses Re: REL_13_STABLE Windows 10 Regression Failures  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
List pgsql-bugs
On Fri, Oct 30, 2020 at 3:47 PM Heath Lord <heath.lord@crunchydata.com> wrote:
>
> Tom,
>    We are working to set up our environment to allow us to get a stack
> trace as we do not have any of the Visual Studios stuff installed
> right now.  However, I thought I would send you a little more
> information while we are trying to get that working.
>    Going through the stats_ext.sql file line by line with a freshly
> built REL_13_STABLE database stood up we have determined that running
> any of the following commands back to back will cause the database to
> crash:
>
> CREATE STATISTICS tst ON relnatts + relpages FROM pg_class;
> CREATE STATISTICS tst ON (relpages, reltuples) FROM pg_class;
>
>   If you run another command in between them like:
>
> SELECT version();
>
>   Then it will not crash when you run either of those commands again.
> However if you run any combination of those 2 commands back to back it
> will crash the database.  The output from the psql instance after
> stepping through the stats_ext.sql file is in the
> stats_ext_psql_output.txt file attached.
>
>   The information from the postgres logfile for the above is attached
> in the pg_logfile_output.txt file.
>
>    Hopefully, this will at least give you some information while we
> are working on getting the backtrace.  Thanks.
>
> -Heath
>
> On Fri, Oct 30, 2020 at 1:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Heath Lord <heath.lord@crunchydata.com> writes:
> > > When building from source on a Windows 10 VM using MinGW (8.1.0), I
> > > get a random number of regression failures off the REL_13_STABLE
> > > branch.  I debugged this a little bit and found out that the "random"
> > > number of failures is fully dependent on the machine and if I disable
> > > the "stats_ext.sql" regression test; all other tests pass without
> > > issue. When the "stats_ext.sql" regression test runs, it causes a
> > > database exception and PostgreSQL crashes.
> >
> > Hmph ... it's weird that we have not seen this in the buildfarm.
> > Have you tried to extract any info from the crash, like a stack trace?
> >
> > > I did some digging and determined that on the REL_13_STABLE branch
> > > this instability was introduced with this commit
> > > "b380484a850b6bf7d9fc0d85c555a2366e38451f"[1]. This corresponds to
> > > commit "19f5a37b9fc48a12c77edafb732543875da2f4a3"[1] on master. I
> > > worked backwards from there to determine when the regressions stopped
> > > failing and determined that with commit
> > > "be0a6666656ec3f68eb7d8e7abab5139fcd47012"[2] the regression tests are
> > > no longer failing.
> >
> > I'm having a hard time believing that b380484a8 would have introduced
> > a portability problem, and an even harder time believing that be0a66666
> > would have resolved it if so.  What seems more likely is that there's
> > some underlying issue such as a memory stomp, that the first commit
> > accidentally exposed and the second one accidentally hid again.
> > So, even if back-patching be0a66666 seemed feasible from a stability
> > standpoint (which I don't think it is), I fear it'd just mask a problem
> > that would eventually bite us again.
> >
> > So I think we need to dig down and try to identify the root cause,
> > without any preconceptions about how to fix it.  Again, a stack trace
> > would be pretty useful.  Or at least some info about which step of
> > stats_ext.sql is crashing.
> >
> >                         regards, tom lane

All,
   I was finally able to get a stack trace.  I apologize for it taking
so long, but for some reason, when I ran configure with
"--enable-cassert --enable-debug", all of the regression tests passed.
However, I was finally able to get it to work using only
"--enable-debug" and then using the MinGW version of gdb to get what I
have attached.  Please let me know if this contains any useful
information.  I also noticed that the way to cause the crash was
slightly different, and it definitely behaved differently with debug
enabled.  Thank you in advance for any help.

-Heath

Attachments
