Discussion: Solaris 10u9, PG 8.4.6, 'c' lang function, fails on 1 of 5 servers
Hello Postgresql Community Members,
I am stumped trying to install a few 'c' language functions
on a particular Solaris server (64-bit, AMD CPU architecture, not SPARC). I actually
have 5 PostgreSQL servers, and the .so loads fine into 4 of them, but
refuses to load into the 5th. I've quintuple-checked the file
permissions, build of the .so, gcc versions, PostgreSQL versions,
etc. I've had a colleague double-check my work. We're both stumped.
Details to follow.
All servers are running Solaris 10u9 on 64-bit hardware inside
Solaris zones. Two of the servers are X4270s, 144 GB RAM, 24 Intel
CPU cores. These two servers run the 4 working Solaris zones that
are able to load the function implemented in the .so files. PostgreSQL
version 8.4.6, compiled from source (not a binary package).
The server that is misbehaving is an X4600, 128 GB RAM, 16 AMD CPU
cores, but otherwise identical: Solaris 10u9, 64-bit OS, PostgreSQL
8.4.6. All 5 systems use the stock gcc that ships with Solaris (v3.4.3;
it's old, I know).
Here are the permissions on the files and PostgreSQL directories: first
a working server, then the server that is not working as expected.
(root@working: </db>) # ls -ld /db /db/*.so
drwx------ 11 pgsql root 23 Sep 27 10:39 /db
-rwxr-xr-x 1 root root 57440 Sep 27 10:39 /db/pgsql_micr_parser_64.so
(root@working: </db>) # psql -Upgsql -dpostgres -c"select version();"
PostgreSQL 8.4.6 on x86_64-pc-solaris2.11, compiled by GCC gcc (GCC) 3.4.3 (csl-sol210-3_4-20050802), 64-bit
(root@working: </db>) # file /opt/local/x64/postgresql-8.4.6/bin/postgres
/opt/local/x64/postgresql-8.4.6/bin/postgres: ELF 64-bit LSB executable AMD64 Version 1 [SSE], dynamically linked, not stripped
(root@working: </db>) # psql -Upgsql -dmy_db -c"create or replace function parse_micr(text) returns micr_struct
as '/db/pgsql_micr_parser_64.so', 'pgsql_micr_parser' language c volatile cost 1;"
CREATE FUNCTION
(root@working: </db>) # psql -Upgsql -dmy_db -t -c"select transit from parse_micr(':8888=8888: <45800=100<');"
8888=8888
(root@failed: </db>) # ls -ld /db /db/*.so
drwx------ 11 pgsql root 24 Sep 29 11:16 /db
-rwxr-xr-x 1 root root 57440 Sep 29 09:46 /db/pgsql_micr_parser_64.so
(root@failed: </db>) # psql -Upgsql -dpostgres -c"select version();"
PostgreSQL 8.4.6 on x86_64-pc-solaris2.11, compiled by GCC gcc (GCC) 3.4.3 (csl-sol210-3_4-20050802), 64-bit
(root@failed: </db>) # file /opt/local/x64/postgresql-8.4.6/bin/postgres
/opt/local/x64/postgresql-8.4.6/bin/postgres: ELF 64-bit LSB executable AMD64 Version 1 [SSE], dynamically linked, not stripped
(root@failed: </db>) # psql -Upgsql -dmy_db -c"create or replace function parse_micr(text) returns micr_struct
as '/db/pgsql_micr_parser_64.so', 'pgsql_micr_parser' language c volatile cost 1;"
ERROR: could not load library "/db/pgsql_micr_parser_64.so": ld.so.1: postgres: fatal: /db/pgsql_micr_parser_64.so: Permission denied
Ok. Well, the file permissions are correct, so what gives? Next
step is to trace the backend process as it attempts to load the .so.
So I connect to the "failed" server via pgAdmin and run "select getpid();"
I then run "truss -p <PID>" from my shell, and in pgAdmin, execute the
SQL to create the function. This is the result of the system trace:
(root@failed: </db>) # truss -p 10369
recv(9, 0x0097C103, 5, 0) (sleeping...)
recv(9, "170301\0 ", 5, 0) = 5
recv(9, " TBEE5 n J\0 VF6E4DDCF84".., 32, 0) = 32
recv(9, "170301\0B0", 5, 0) = 5
recv(9, "AAD5A5 L97B0CEA5A9F0CD89".., 176, 0) = 176
stat("/db/pgsql_micr_parser_64.so", 0xFFFFFD7FFFDF9520) = 0
stat("/db/pgsql_micr_parser_64.so", 0xFFFFFD7FFFDF9530) = 0
stat("/db/pgsql_micr_parser_64.so", 0xFFFFFD7FFFDF8F50) = 0
resolvepath("/db/pgsql_micr_parser_64.so", "/db/pgsql_micr_parser_64.so", 1023) = 27
open("/db/pgsql_micr_parser_64.so", O_RDONLY) = 22
mmap(0x00010000, 32768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_ALIGN, 22, 0) Err#13 EACCES
close(22) = 0
setcontext(0xFFFFFD7FFFDF9050)
setcontext(0xFFFFFD7FFFDF9BB0)
We can see that the backend is able to open the .so file for
reading, but the mmap fails. From the Solaris man page on mmap:
ERRORS
The mmap() function will fail if:
EACCES The fildes file descriptor is not open for
read, regardless of the protection speci-
fied; or fildes is not open for write and
PROT_WRITE was specified for a MAP_SHARED
type mapping.
My analysis:
1) The file descriptor (#22) is open for O_RDONLY.
2) PROT_WRITE and MAP_SHARED are not specified, so write access is not relevant.
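One way to take PostgreSQL out of the picture is to repeat the two calls from the trace (open read-only, then a private PROT_READ|PROT_EXEC mapping) in a standalone helper and look at the errno it reports. The following is only a sketch: `try_exec_map` is a hypothetical name, it maps a single page rather than the 32768 bytes seen in the trace, and it omits the Solaris-specific MAP_ALIGN flag so that it stays portable.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of PostgreSQL or ld.so.1).
 * Opens `path` read-only and asks for a private, executable mapping of
 * its first page, roughly mirroring the first mmap() in the trace.
 * Returns 0 on success, otherwise the errno of the call that failed.
 */
int try_exec_map(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return errno;               /* e.g. ENOENT or EACCES from open() */

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) {
        close(fd);
        return EINVAL;              /* could not determine the page size */
    }

    void *p = mmap(NULL, (size_t)page, PROT_READ | PROT_EXEC,
                   MAP_PRIVATE, fd, 0);
    int err = (p == MAP_FAILED) ? errno : 0;

    if (p != MAP_FAILED)
        munmap(p, (size_t)page);
    close(fd);
    return err;
}
```

Called from a small main() run as the database user against the .so path, a non-zero result that matches the trace would suggest the refusal comes from below PostgreSQL (file system or kernel policy) rather than from the database itself. The exact error code may differ between operating systems.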
Things that I tried, unsuccessfully:
1) I recompiled the .so on the target system (X4600, AMD chips) just
in case it is somehow different from the .so that got built on the
working system (X4270, Intel chips).
2) Tested with a different .so (I have another that implements forward
and reverse DNS lookups, so one may invoke DNS functions inside SQL
statements). Same behavior. Loads fine on the X4270 systems, but
fails on the X4600 system.
3) Compiled both .so's on 32-bit and 64-bit Gentoo Linux and loaded them
into PostgreSQL 9.0.4. Works fine.
4) Compiled both .so's on 64-bit Solaris 10u9, postgresql 9.1 on an
X4270 and it loads fine there too.
5) Examined a truss on a working system while loading the function.
Since it loaded fine already, I had to drop the function, then
disconnect pgAdmin (to make the backend exit), reconnect and redo
the "create function":
(root@working: </db>) # truss -p 16921
## (I elided a bunch of non-relevant grovelling through the FSM mapped file)
stat("/db/pgsql_micr_parser_64.so", 0xFFFFFD7FFFDF9520) = 0
stat("/db/pgsql_micr_parser_64.so", 0xFFFFFD7FFFDF9530) = 0
stat("/db/pgsql_micr_parser_64.so", 0xFFFFFD7FFFDF8F50) = 0
resolvepath("/db/pgsql_micr_parser_64.so", "/db/pgsql_micr_parser_64.so", 1023) = 27
open("/db/pgsql_micr_parser_64.so", O_RDONLY) = 22
mmap(0x00010000, 32768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_ALIGN, 22, 0) = 0xFFFFFD7FFED80000
mmap(0x00010000, 90112, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, 4294967295, 0) = 0xFFFFFD7FFED00000
mmap(0xFFFFFD7FFED00000, 21997, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_TEXT, 22, 0) = 0xFFFFFD7FFED00000
mmap(0xFFFFFD7FFED15000, 2576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_INITDATA, 22, 20480) = 0xFFFFFD7FFED15000
munmap(0xFFFFFD7FFED06000, 61440) = 0
memcntl(0xFFFFFD7FFED00000, 7008, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0
close(22) = 0
6) There is nothing interesting in dmesg or syslog.
7) Disconnecting and reconnecting a few times, to try a freshly
launched backend. No luck.
Any thoughts or suggestions?
On Thu, 2011-09-29 at 12:08 -0500, dennis jenkins wrote:
> ERROR: could not load library "/db/pgsql_micr_parser_64.so": ld.so.1:
> postgres: fatal: /db/pgsql_micr_parser_64.so: Permission denied
This is for a different shared object, but may provide clues...
Error: "- adding iplike database function... <snip> org.postgresql.util.PSQLException: ERROR: could not access file '<snip>/lib/iplike.so': Permission denied"
The PostgreSQL server cannot access the iplike.so file. This could be due to the file itself not having appropriate permissions for the user that PostgreSQL runs as, and/or one or more of the parent directories of iplike.so not having appropriate permissions.
Error: "- adding iplike database function... <snip> org.postgresql.util.PSQLException: ERROR: could not load library ..."
The latter part of the error could be something like "<path>/iplike.so: cannot open shared object file: No such file or directory" or "ld.so.1: postgres: fatal: <path>/iplike.so: wrong ELF class: ELFCLASS32".
The PostgreSQL server cannot load the iplike.so file. This is almost always caused by the PostgreSQL server and the iplike.so file being compiled for different processor instruction sets.
On Thu, Sep 29, 2011 at 12:08 PM, dennis jenkins <dennis.jenkins.75@gmail.com> wrote:
(root@failed: </db>) # psql -Upgsql -dmy_db -c"create or replace function parse_micr(text) returns micr_struct
as '/db/pgsql_micr_parser_64.so', 'pgsql_micr_parser' language c volatile cost 1;"
ERROR: could not load library "/db/pgsql_micr_parser_64.so": ld.so.1: postgres: fatal: /db/pgsql_micr_parser_64.so: Permission denied
stat("/db/pgsql_micr_parser_64.so", 0xFFFFFD7FFFDF8F50) = 0
resolvepath("/db/pgsql_micr_parser_64.so", "/db/pgsql_micr_parser_64.so", 1023) = 27
open("/db/pgsql_micr_parser_64.so", O_RDONLY) = 22
mmap(0x00010000, 32768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_ALIGN, 22, 0) Err#13 EACCES
close(22) = 0
Problem solved.
First, some more background. These 5 PostgreSQL servers are local zones running in Solaris (similar to a Linux container). The database itself is inside "/db". The database user ("pgsql") has no problems reading any file there, including the micr_parser.so file.
"/db" is not just a directory under "/". It is a separate file system mounted within the zone (via a loopback mount). The actual file-system is a ZFS sub-filesystem on a dedicated zpool on the (shared) servers. On the four servers where the "create function" worked, "/db" was mounted with options "nodevices".
However, on the server where it failed, "/db" was mounted with "nodevices,noexec". This was causing mmap() to fail when it requested "PROT_EXEC" access.
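The difference between the two option strings can be checked mechanically. The sketch below assumes you already have the comma-separated option list for the file system in hand (for example as reported by the mount command or, on Solaris, in /etc/mnttab); `mount_opts_have` is a hypothetical helper name, and the code does not read the mount table itself.

```c
#include <string.h>

/*
 * Hypothetical helper: returns 1 if the comma-separated mount option
 * list `opts` contains the exact token `wanted`, otherwise 0.
 * Tokens are compared whole, so "exec" does not match "noexec".
 */
int mount_opts_have(const char *opts, const char *wanted)
{
    size_t wlen = strlen(wanted);
    const char *p = opts;

    while (*p != '\0') {
        size_t len = strcspn(p, ",");   /* length of the current token */
        if (len == wlen && strncmp(p, wanted, wlen) == 0)
            return 1;
        p += len;
        if (*p == ',')
            p++;                        /* skip the separator */
    }
    return 0;
}
```

With the strings from this post, mount_opts_have("nodevices,noexec", "noexec") reports the failing configuration and mount_opts_have("nodevices", "noexec") reports the working one.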
My inspiration for solving this riddle was that I copied the .so to a local directory under "/" that was not under "/db". "create function" then succeeded. The little light bulb over my head turned on and I began checking filesystem mount flags.
So, for anyone who finds this posting via a search engine years from now... if "create function" fails with "permission denied", also check that the filesystem holding the ".so" file is not mounted with "exec" disabled.
I've not tested this behavior on Linux. I humbly suggest putting a note in the PostgreSQL documentation about the impact of FS mount flags on 'C' language functions.