Обсуждение: Re: [GENERAL] query log corrupted-looking entries

Поиск
Список
Период
Сортировка

Re: [GENERAL] query log corrupted-looking entries

От
Tom Lane
Дата:
"George Pavlov" <gpavlov@mynewplace.com> writes:
>> Hmm.  If the messages are less than PIPE_BUF bytes long (4096 bytes on
>> Linux) then the writes are supposed to be atomic.

> Some of them involve long messages (>4K), but there are many that do not
> (like the ones I had posted at the start of this thread).

I checked around with some kernel/glibc gurus in Red Hat, and the
consensus seemed to be that we'd be better off to bypass fprintf() and
just send message strings to stderr using write() --- ie, instead of
elog.c doing

            fprintf(stderr, "%s", buf.data);

do

            write(fileno(stderr), buf.data, strlen(buf.data));

Anyone have any comments on possible portability risks?  In particular,
will this work on Windows?

            regards, tom lane

Re: [GENERAL] query log corrupted-looking entries

От
"Magnus Hagander"
Дата:
> >> Hmm.  If the messages are less than PIPE_BUF bytes long
> (4096 bytes
> >> on
> >> Linux) then the writes are supposed to be atomic.
>
> > Some of them involve long messages (>4K), but there are
> many that do
> > not (like the ones I had posted at the start of this thread).
>
> I checked around with some kernel/glibc gurus in Red Hat, and
> the consensus seemed to be that we'd be better off to bypass
> fprintf() and just send message strings to stderr using
> write() --- ie, instead of elog.c doing
>
>             fprintf(stderr, "%s", buf.data);
>
> do
>
>             write(fileno(stderr), buf.data, strlen(buf.data));
>
> Anyone have any comments on possible portability risks?  In
> particular, will this work on Windows?

Should work fine on Windows. fileno() is deprecated however, with the
following comment:
        C:\Program Files\Microsoft Visual Studio
8\VC\INCLUDE\stdio.h(688) : see
 declaration of 'fileno'
        Message: 'The POSIX name for this item is deprecated. Instead,
use the ISO C++ conformant name: _fileno. See online help for details.'


It still works, and there is a define to get around that warning though,
so it's definitly not critical.


//Magnus

Re: [GENERAL] query log corrupted-looking entries

От
Tom Lane
Дата:
"Magnus Hagander" <mha@sollentuna.net> writes:
> Should work fine on Windows. fileno() is deprecated however, with the
> following comment:
>         C:\Program Files\Microsoft Visual Studio
> 8\VC\INCLUDE\stdio.h(688) : see
>  declaration of 'fileno'
>         Message: 'The POSIX name for this item is deprecated. Instead,
> use the ISO C++ conformant name: _fileno. See online help for details.'

Only Microsoft would think that they get to unilaterally deprecate a
POSIX standard API :-(

We appear to be using fileno() in some dozens of places already,
so I see no reason to worry about this.

            regards, tom lane

Re: [GENERAL] query log corrupted-looking entries

От
Alvaro Herrera
Дата:
Tom Lane wrote:
> "Magnus Hagander" <mha@sollentuna.net> writes:
> > Should work fine on Windows. fileno() is deprecated however, with the
> > following comment:
> >         C:\Program Files\Microsoft Visual Studio
> > 8\VC\INCLUDE\stdio.h(688) : see
> >  declaration of 'fileno'
> >         Message: 'The POSIX name for this item is deprecated. Instead,
> > use the ISO C++ conformant name: _fileno. See online help for details.'
>
> Only Microsoft would think that they get to unilaterally deprecate a
> POSIX standard API :-(

Yeah, these are the same guys who chose to implement select() to only
work on sockets and not on plain file descriptors (for example you can't
make it work on a pipe).  This must be the single most far-reaching
change they did to POSIX compatibility.  It's incredibly idiotic and
problematic.  And note that it's not that it's deprecated -- it just
doesn't work at all :-(

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: [GENERAL] query log corrupted-looking entries

От
Tom Lane
Дата:
I wrote:
> I checked around with some kernel/glibc gurus in Red Hat, and the
> consensus seemed to be that we'd be better off to bypass fprintf() and
> just send message strings to stderr using write() --- ie, instead of
> elog.c doing
>             fprintf(stderr, "%s", buf.data);
> do
>             write(fileno(stderr), buf.data, strlen(buf.data));

I did some strace'ing of the backend, and observed that even for very
long messages (upwards of 100K) the fprintf results in a single write()
call.  This was true on both Fedora Core 5 and a pretty old HPUX version.
So it'd seem that most versions of libc already know about optimizing
fprintf-%s into a direct write(), and changing our code wouldn't change
the behavior.  It'd be interesting to verify whether it's the same on
George's machine though.
        regards, tom lane


Re: [GENERAL] query log corrupted-looking entries

От
"George Pavlov"
Дата:
> the behavior.  It'd be interesting to verify whether it's the same on
> George's machine though.

Let me know how to test this. (Please do a "for dummies" version -- I am
not sure I can figure it out from the thread even though someone else
might be able to.)



Re: [GENERAL] query log corrupted-looking entries

От
Tom Lane
Дата:
"George Pavlov" <gpavlov@mynewplace.com> writes:
>> It'd be interesting to verify whether it's the same on
>> George's machine though.

> Let me know how to test this.

Identify the PID of one of your active backends (from "ps" or by looking
in pg_stat_activity), and then run
strace -p backend-pid-here 2>tmpfile

You'll need to do that as either root or postgres.  Let it run for a
minute or three, preferably long enough to capture some long log
messages being emitted by that backend.  (Bonus points if you can
capture one that's actually intermixed with others in the log, but
that might take some luck.)  Stop it with control-C whenever you get
bored, and then grep through the tmpfile for lines beginning "write(2,".
It should be pretty clear whether log messages are being sent as single
or multiple write() operations.
        regards, tom lane


Re: [GENERAL] query log corrupted-looking entries

От
"Magnus Hagander"
Дата:
> > > Should work fine on Windows. fileno() is deprecated however, with
> > > the following comment:
> > >         C:\Program Files\Microsoft Visual Studio
> > > 8\VC\INCLUDE\stdio.h(688) : see
> > >  declaration of 'fileno'
> > >         Message: 'The POSIX name for this item is deprecated.
> > > Instead, use the ISO C++ conformant name: _fileno. See
> online help for details.'
> >
> > Only Microsoft would think that they get to unilaterally
> deprecate a
> > POSIX standard API :-(

Well.
a) It's not a POSIX platform.
b) There is a #define you can set to disable the deprecation, from what
I can tell.


> Yeah, these are the same guys who chose to implement select()
> to only work on sockets and not on plain file descriptors
> (for example you can't make it work on a pipe).  This must be
> the single most far-reaching change they did to POSIX
> compatibility.  It's incredibly idiotic and problematic.  And
> note that it's not that it's deprecated -- it just doesn't
> work at all :-(

You will notice that in the win32 API there is no such thing as a "plain
file descriptor". The Windows API has you using HANDLEs for both sockets
and files, and as long as you use the Windows API, you can pass them
both to the same function (just use waitformultipleobjects instead of
select).
select() is only in the winsock library, which is there only for
compatibility with Berkeley Sockets. And it works with sockets created
that way...

That said, it can certainly be irritating :-) Oh, and if you want it for
pipes, look at some of the port code for win32 - claudio did a pipe()
implementation that works fine with select().

//Magnus

Re: [GENERAL] query log corrupted-looking entries

От
"George Pavlov"
Дата:
Hoping to resurrect this thread. I am seeing more and more of this as
the database gets more usage and it really messes up query log analysis.


A quick summary: When I posted this was getting corrupted query log
entries. I still am. They look like this (the problem line + one line
before and after):

2007-05-15 14:24:52 PDT [3859]: [968-1] LOG:  duration: 0.071 ms
2007-05-15 14:24:52 PDT [3859]: [969-1] LOG:  statement2007-05-15
14:24:52 PDT [3885]: [208-1] LOG:  duration: 11.800 ms
: PREPARE <unnamed> AS SELECT P.IS_DISPLAY, P.IS_ACTIVE,
P.CITY_POSTAL_CODE_ID,P.property_name,P.address_1,P.address_2,

This seems to happen with statements of all lengths (some are long, but
many are not).

This used to happen on 8.1.3. I am now on: PostgreSQL 8.1.8 on
x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 3.4.6 20060404 (Red
Hat 3.4.6-3)

I think what Tom suggested last (see below) is that I try something on
my machine, but it was not entirely to me what exactly and I failed to
follow up. I'd be happy to do any tests to help diagnose this better,
although I don't have a reliable way to reproduce it ("put a lot of
stress on the DB" is all I can do and hope it happens).

Thanks,

George



> -----Original Message-----
> From: George Pavlov
> Sent: Wednesday, October 18, 2006 1:23 PM
> To: Tom Lane
> Cc: pgsql-hackers@postgreSQL.org
> Subject: RE: [GENERAL] query log corrupted-looking entries
>
> > the behavior.  It'd be interesting to verify whether it's
> the same on
> > George's machine though.
>
> Let me know how to test this. (Please do a "for dummies"
> version -- I am not sure I can figure it out from the thread
> even though someone else might be able to.)
>
>

> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Wednesday, October 18, 2006 1:17 PM
> To: George Pavlov
> Cc: pgsql-hackers@postgreSQL.org
> Subject: Re: [GENERAL] query log corrupted-looking entries
>
> I wrote:
> > I checked around with some kernel/glibc gurus in Red Hat, and the
> > consensus seemed to be that we'd be better off to bypass
> fprintf() and
> > just send message strings to stderr using write() --- ie, instead of
> > elog.c doing
> >             fprintf(stderr, "%s", buf.data);
> > do
> >             write(fileno(stderr), buf.data, strlen(buf.data));
>
> I did some strace'ing of the backend, and observed that even for very
> long messages (upwards of 100K) the fprintf results in a
> single write()
> call.  This was true on both Fedora Core 5 and a pretty old
> HPUX version.
> So it'd seem that most versions of libc already know about optimizing
> fprintf-%s into a direct write(), and changing our code
> wouldn't change
> the behavior.  It'd be interesting to verify whether it's the same on
> George's machine though.
>
>             regards, tom lane
>

Re: [GENERAL] query log corrupted-looking entries

От
"Ed L."
Дата:
On Wednesday 23 May 2007 1:04 pm, George Pavlov wrote:
> Hoping to resurrect this thread. I am seeing more and more of
> this as the database gets more usage and it really messes up
> query log analysis.
>
>
> A quick summary: When I posted this was getting corrupted
> query log entries. I still am. They look like this (the
> problem line + one line before and after):
>
> 2007-05-15 14:24:52 PDT [3859]: [968-1] LOG:  duration: 0.071
> ms 2007-05-15 14:24:52 PDT [3859]: [969-1] LOG:
> statement2007-05-15 14:24:52 PDT [3885]: [208-1] LOG:
> duration: 11.800 ms

FWIW, I've also been seeing this sort of query log corruption for
as long as I can remember, 7.1 through 8.2, HPUX (parisc, ia64),
Linux on intel, amd...

Ed