Discussion: Debug strategy for musl Postgres?

Debug strategy for musl Postgres?

From: John Mudd
Date:
I built Postgres 9.3.4 from source on top of the musl C library,
http://www.musl-libc.org/
I also built zlib, bzip2, ncurses, openssl, readline and Python using musl
as a foundation for Postgres.

I'm using musl to increase the portability of the Postgres binary. I build
on Ubuntu 13.10, but the binary will run on older Linux boxes.

So far I get better results with the musl Postgres built on modern Ubuntu
and running on an old kernel than with Postgres built directly on the old
Linux box against the standard C library. But the musl Postgres is still
not working fully: I'm not getting responses from the server.

Here's the tail end of the "strace pg_isready" output for musl Postgres
built and running on Ubuntu 13.10:

clock_gettime(CLOCK_REALTIME, {1397359337, 426941692}) = 0
poll([{fd=4, events=POLLOUT|POLLERR}], 1, 3000) = 1 ([{fd=4, revents=POLLOUT}])
sendto(4, "\0\0\0=\0\3\0\0user\0mudd\0database\0mudd\0"..., 61, MSG_NOSIGNAL, NULL, 0) = 61
clock_gettime(CLOCK_REALTIME, {1397359337, 427070343}) = 0
poll([{fd=4, events=POLLIN|POLLERR}], 1, 3000) = 1 ([{fd=4, revents=POLLIN}])
recvfrom(4, "R\0\0\0\10\0\0\0\0E\0\0\0RSFATAL\0C3D000\0Mdat"..., 16384, 0, NULL, NULL) = 92
close(4)                                = 0
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
writev(1, [{"/tmp:5432 - accepting connection"..., 33}, {"\n", 1}], 2) = 34
exit_group(0)                           = ?


Here's the tail end of the "strace pg_isready" output for musl Postgres
built on Ubuntu 13.10 but running on the old Linux box:

clock_gettime(0, 0xbfffa5a8)            = -1 ENOSYS (Function not implemented)
gettimeofday(NULL, {300, 0})            = 0
poll([{fd=3, events=POLLOUT|POLLERR, revents=POLLOUT}], 1, 3000) = 1
sendto(3, "\0\0\0?\0\3\0\0user\0jmudd\0database\0jmud"..., 63, 0x4000, NULL, 0) = 63

clock_gettime(0, 0xbfffa5a8)            = -1 ENOSYS (Function not implemented)
gettimeofday(NULL, {300, 0})            = 0
poll([{fd=3, events=POLLIN|POLLERR}], 1, 3000) = 0
close(3)                                = 0
ioctl(1, TCGETS, {B38400 opost isig icanon echo ...}) = 0
writev(1, [{"/tmp:5432 - no response", 23}, {"\n", 1}], 2) = 24
exit_group(2)                           = ?


For my next step I'll try building musl Postgres with the --enable-cassert
option. What else can I do to debug this?

John

Re: Debug strategy for musl Postgres?

From: Euler Taveira
Date:
On 13-04-2014 00:40, John Mudd wrote:
> I built Postgres 9.3.4 from source on top of the musl C library,
> http://www.musl-libc.org/
> I also built zlib, bzip2, ncurses, openssl, readline and Python using musl
> as a foundation for Postgres.
>
This is not a bug. This kind of discussion belongs to -hackers.

While reading this email, I gave musl a try. I'm using Debian jessie,
which contains musl 1.0.0. I compiled the source (git master) using
CC="musl-gcc" and disabled zlib and readline. It passed all regression
tests. I also tried pgbench, which ran like a charm. (After installing
the binaries, I had to set the library path for musl in
/etc/ld-musl-x86_64.d.)

> I'm using musl to increase the portability of the Postgres binary. I build
> on Ubuntu 13.10, but the binary will run on older Linux boxes.
>
Could you give details about your architecture?

> For my next step I'll try building musl Postgres with the --enable-cassert
> option. What else can I do to debug this?
>
Is postgres running and listening on port 5432? Did you try other binaries
(e.g. psql), or even postgres in single-user mode?


--
   Euler Taveira                   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consulting, Development, 24x7 Support and Training

Re: Debug strategy for musl Postgres?

From: John Mudd
Date:
I agree, not a bug. I was just following the instructions to post as a bug
first and then move to -hackers if directed. I'll repost on -hackers and
give the rest of my reply there.


On Sun, Apr 13, 2014 at 12:04 PM, Euler Taveira <euler@timbira.com.br> wrote:

> On 13-04-2014 00:40, John Mudd wrote:
> > I built Postgres 9.3.4 from source on top of the musl C library,
> > http://www.musl-libc.org/
> > I also built zlib, bzip2, ncurses, openssl, readline and Python using musl
> > as a foundation for Postgres.
> >
> This is not a bug. This kind of discussion belongs to -hackers.

Re: Debug strategy for musl Postgres?

From: John Mudd
Date:
On Sun, Apr 13, 2014 at 12:04 PM, Euler Taveira <euler@timbira.com.br> wrote:

> On 13-04-2014 00:40, John Mudd wrote:
> > I built Postgres 9.3.4 from source on top of the musl C library,
> > http://www.musl-libc.org/
> > I also built zlib, bzip2, ncurses, openssl, readline and Python using musl
> > as a foundation for Postgres.
> >
> This is not a bug. This kind of discussion belongs to -hackers.
>
> While reading this email, I gave musl a try. I'm using Debian jessie,
> which contains musl 1.0.0. I compiled the source (git master) using
> CC="musl-gcc" and disabled zlib and readline. It passed all regression
> tests. I also tried pgbench, which ran like a charm. (After installing
> the binaries, I had to set the library path for musl in
> /etc/ld-musl-x86_64.d.)
>
> > I'm using musl to increase the portability of the Postgres binary. I build
> > on Ubuntu 13.10, but the binary will run on older Linux boxes.
> >
> Could you give details about your architecture?
>

Built on 3.8.0-35-generic #50-Ubuntu SMP Tue Dec 3 01:25:33 UTC 2013 i686 i686 i686 GNU/Linux
Runs fine there.

Moved the postgres install directory to a machine running 2.4.21-4.EL #1 Fri Oct 3 18:13:58 EDT 2003 i686 i686 i386 GNU/Linux
Not working fully there.
Note: it says 2.4 kernel, but I've been told that's misleading. The kernel
has upgrades that make it effectively 2.6.


>
> > For my next step I'll try building musl Postgres with the --enable-cassert
> > option. What else can I do to debug this?
> >
> Is postgres running and listening on port 5432? Did you try other binaries
> (e.g. psql), or even postgres in single-user mode?
>
>
I rebuilt with --enable-cassert and reran it: no difference on the 2.4 machine.

It's listening even on the 2.4 machine. I ran strace on the main postgres
process and got the following while running pg_isready:

Process 23811 attached - interrupt to quit
Process 23811 detached

But pg_isready just reports "/tmp:5432 - no response" after a few seconds.



Re: Debug strategy for musl Postgres?

From: John Mudd
Date:
On Sun, Apr 13, 2014 at 4:19 PM, John Mudd <johnbmudd@gmail.com> wrote:

>
> It's listening even on the 2.4 machine. I ran strace on the main postgres
> process and got the following while running pg_isready:
>
> Process 23811 attached - interrupt to quit
> Process 23811 detached
>
>
Correction: the main postgres process does not show any awareness that
pg_isready is trying to connect. The messages I listed above are just
strace's own attach and detach notices.

The same happens if I try psql: it just waits indefinitely.

Fwd: [HACKERS] Fwd: Debug strategy for musl Postgres?

From: John Mudd
Date:
On Sun, Apr 13, 2014 at 4:28 PM, Andres Freund <andres@2ndquadrant.com> wrote:

> Hi,
>
> On 2014-04-13 16:08:00 -0400, John Mudd wrote:
> > I built Postgres 9.3.4 from source on top of the musl C library,
> > http://www.musl-libc.org/
> > I also built zlib, bzip2, ncurses, openssl, readline and Python using musl
> > as a foundation for Postgres.
> >
> > I'm using musl to increase the portability of the Postgres binary. I build
> > on Ubuntu 13.10, but the binary will run on older Linux boxes.
> >
> > So far I get better results with the musl Postgres built on modern Ubuntu
> > and running on an old kernel than building Postgres directly on the old
> > Linux using standard C library. But the musl Postgres is still not working
> > fully. I'm not getting responses from the server.
>
> I tend to think that this is more a matter for the musl devs than
> postgres. Postgres works on a fair number of libcs, and musl is pretty
> new and rough around the edges.
>

Okay. I just wanted to check here too.


>
> > clock_gettime(0, 0xbfffa5a8)            = -1 ENOSYS (Function not implemented)
>
> This looks suspicious.
>
> > gettimeofday(NULL, {300, 0})            = 0
> > poll([{fd=3, events=POLLOUT|POLLERR, revents=POLLOUT}], 1, 3000) = 1
> > sendto(3, "\0\0\0?\0\3\0\0user\0jmudd\0database\0jmud"..., 63, 0x4000, NULL, 0) = 63
> >
> > clock_gettime(0, 0xbfffa5a8)            = -1 ENOSYS (Function not implemented)
> > gettimeofday(NULL, {300, 0})            = 0
> > poll([{fd=3, events=POLLIN|POLLERR}], 1, 3000) = 0
>
> Here a poll didn't return anything. You'll likely have to look at
> the server side.
>

Yes, the server: it's in a tight loop, and this is all it's doing. Thanks,
I'll look into this.

clock_gettime(0, 0xbfffded8)            = -1 ENOSYS (Function not implemented)
gettimeofday(NULL, {300, 0})            = 0
clock_gettime(0, 0xbfffded8)            = -1 ENOSYS (Function not implemented)
gettimeofday(NULL, {300, 0})            = 0
clock_gettime(0, 0xbfffded8)            = -1 ENOSYS (Function not implemented)
gettimeofday(NULL, {300, 0})            = 0
clock_gettime(0, 0xbfffded8)            = -1 ENOSYS (Function not implemented)
gettimeofday(NULL, {300, 0})            = 0
clock_gettime(0, 0xbfffded8)            = -1 ENOSYS (Function not implemented)
gettimeofday(NULL, {300, 0})            = 0



>
> Greetings,
>
> Andres Freund
>
> --
>  Andres Freund                     http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>

Re: Debug strategy for musl Postgres?

From: Stefan Kaltenbrunner
Date:
On 04/13/2014 10:19 PM, John Mudd wrote:
>
> On Sun, Apr 13, 2014 at 12:04 PM, Euler Taveira <euler@timbira.com.br> wrote:
>
>     On 13-04-2014 00:40, John Mudd wrote:
>     > I built Postgres 9.3.4 from source on top of the musl C library,
>     > http://www.musl-libc.org/
>     > I also built zlib, bzip2, ncurses, openssl, readline and Python using musl
>     > as a foundation for Postgres.
>     >
>     This is not a bug. This kind of discussion belongs to -hackers.
>
>     While reading this email, I gave musl a try. I'm using Debian jessie,
>     which contains musl 1.0.0. I compiled the source (git master) using
>     CC="musl-gcc" and disabled zlib and readline. It passed all regression
>     tests. I also tried pgbench, which ran like a charm. (After installing
>     the binaries, I had to set the library path for musl in
>     /etc/ld-musl-x86_64.d.)
>
>     > I'm using musl to increase the portability of the Postgres binary. I build
>     > on Ubuntu 13.10, but the binary will run on older Linux boxes.
>     >
>     Could you give details about your architecture?
>
>
> Built on 3.8.0-35-generic #50-Ubuntu SMP Tue Dec 3 01:25:33 UTC 2013 i686 i686 i686 GNU/Linux
> Runs fine there.
>
> Moved the postgres install directory to a machine running 2.4.21-4.EL #1 Fri Oct 3 18:13:58 EDT 2003 i686 i686 i386 GNU/Linux
> Not working fully there.
> Note: it says 2.4 kernel, but I've been told that's misleading. The
> kernel has upgrades that make it effectively 2.6.

This looks like a RHEL3 version number. While that kernel was kind of a
creepy thing with a lot of backported patches (some from the 2.6 era), it
is definitely not a 2.6 kernel (also note that 2.6.0 was released in
December 2003, while RHEL 3 was released in October of that year). Judging
from the version number, this also seems to be the very first RHEL3
kernel, missing all the follow-up bugfixes made during the RHEL3 lifetime.

So I would not be at all surprised if a modern and young C library
running on a >10-year-old kernel that never looked like the upstream
kernel misbehaved with a complex userspace app like PostgreSQL.



Stefan