Обсуждение: Questions regarding notify processing.

Поиск
Список
Период
Сортировка

Questions regarding notify processing.

От
Terry Lee Tucker
Дата:
Hello List:

I am working on the proper method for Notification processing. The application
interface is X-Windows and I am using the database socket as an alternate
input source to the X-Server. I have a callback that fires when there is data
to read on the socket. Here is the setup:

/* This callback fires when there is data to read on the database socket. */
void DbInputCB (XtPointer clientData, int *source, XtInputId *id)
{
    .
    .
    CheckForNotifies (ctrl);
    .
    .
    .
}

/* This function processes NOTIFY messages. */
void CheckForNotifies (Controls *ctrl)
{
   PQflush (ctrl->sys->conn);              // always flush it first ;o)

    PQconsumeInput (ctrl->sys->conn);       // suck up queued input
    while ((ctrl->notify = PQnotifies (ctrl->sys->conn)) != NULL)
    {
        /* If you are interested in receiving any NOTICE data, you will
         * have to have this function defined locally. */
        if (ctrl->main->backEndNotify)
        {
            void (*p) () = (void *) ctrl->main->backEndNotify;
            (*p) (ctrl);                    // call the function
        }
        PQfreemem (ctrl->notify);           // free the memory
    }
}

All this works great except for certain cases where one of the notify
messages, the one I'm really interested in, gets spooled, queued, or
something and is not delivered until I send another notification, after the
fact, from another client; then, I finally get the message delivered along
with the one I just sent. I have a certain interface that registers an
interest in APPT_MADE. I have a trigger that, when the final appointment is
made, issues a NOTIFY for those interested. The operation that created the
appointment is performed as an Asynchronous command with PQsendQuery due to
the fact that there may be two sql commands in the sql string, depending on
what the user does. The interface allows editing of two different tables
simultaneously. The message sent by the trigger fired as a result of the
update does not get delivered unless I send another notify, then, it is
delivered, along with the one I just sent. I tried calling CheckForNotifies()
from the interface code after the save operation, but then, after two back to
back updates, or sometimes just one, the PQsendQuery call hangs up. The only
way to "unhang it" is to issue a NOTIFY from another client, which allows
PQsendQuery to continue.

Obviously, something is blocking. Also, I don't understand why the APPT_MADE
message does not get delivered. Any pointers would be appreciated.

Thanks...

Re: Questions regarding notify processing.

От
Tom Lane
Дата:
Terry Lee Tucker <terry@esc1.com> writes:
> All this works great except for certain cases where one of the notify
> messages, the one I'm really interested in, gets spooled, queued, or
> something and is not delivered until I send another notification, after the
> fact, from another client; then, I finally get the message delivered along
> with the one I just sent.

What PG version is this?  Some of what you mention sounds a bit like old
bugs, but not enough to be sure.  Also, are you using SSL?

AFAIK the only gotcha on the server side in current code is that it
won't send notify messages while you are within a transaction; so if you
are using BEGIN/COMMIT that could be the source of the issue.

Looking at the source for PQconsumeInput, I note that depending on the
kernel behavior it may not read all available data; so you might try
turning that call into a loop
    while (PQconsumeInput(conn) > 0)
       /* loop */ ;
Ordinarily this is not necessary because if the socket remains
read-ready you should get another callback right away anyway.  But
perhaps the X toolkit is doing something weird that suppresses
repeated callbacks.

> I tried calling CheckForNotifies() from the interface code after the
> save operation, but then, after two back to back updates, or sometimes
> just one, the PQsendQuery call hangs up. The only way to "unhang it"
> is to issue a NOTIFY from another client, which allows PQsendQuery to
> continue.

That's just plain bizarre.  Can you get a stack trace to find out
exactly where it's hung?  And what is the server process doing at
the same time?

            regards, tom lane

Re: Questions regarding notify processing.

От
Terry Lee Tucker
Дата:
Thanks for the reply Tom. See comments below:

On Thursday 24 February 2005 12:06 pm, Tom Lane saith:
> Terry Lee Tucker <terry@esc1.com> writes:
> > All this works great except for certain cases where one of the notify
> > messages, the one I'm really interested in, gets spooled, queued, or
> > something and is not delivered until I send another notification, after
> > the fact, from another client; then, I finally get the message delivered
> > along with the one I just sent.
>
> What PG version is this?  Some of what you mention sounds a bit like old
> bugs, but not enough to be sure.  Also, are you using SSL?
I never remember this until after I've sent the message. Here it is:
rnd=# select version ();
                                                   version
--------------------------------------------------------------------------------------------------------------
 PostgreSQL 7.4.6 on i686-redhat-linux-gnu, compiled by GCC gcc (GCC) 3.2.3
20030502 (Red Hat Linux 3.2.3-49)

>
> AFAIK the only gotcha on the server side in current code is that it
> won't send notify messages while you are within a transaction; so if you
> are using BEGIN/COMMIT that could be the source of the issue.

The message is being sent inside a transaction by an after update trigger that
updates values in another table.

>
> Looking at the source for PQconsumeInput, I note that depending on the
> kernel behavior it may not read all available data; so you might try
> turning that call into a loop
>     while (PQconsumeInput(conn) > 0)
>        /* loop */ ;
I will give this a try.

> Ordinarily this is not necessary because if the socket remains
> read-ready you should get another callback right away anyway.  But
> perhaps the X toolkit is doing something weird that suppresses
> repeated callbacks.
>
> > I tried calling CheckForNotifies() from the interface code after the
> > save operation, but then, after two back to back updates, or sometimes
> > just one, the PQsendQuery call hangs up. The only way to "unhang it"
> > is to issue a NOTIFY from another client, which allows PQsendQuery to
> > continue.
>
> That's just plain bizarre.  Can you get a stack trace to find out
> exactly where it's hung?  And what is the server process doing at
> the same time?

I will be happy to do this if you or someone else will tell me how to do it.

>
>             regards, tom lane

Thank you for your help ;o)


Re: Questions regarding notify processing.

От
Terry Lee Tucker
Дата:
See partial output from strace below. On this particular run, the command
locked up during the second edit operation. The notify message, move_update,
was delivered, but the PQsendquery never returns.

Any ideas Tom?

recv(3, "A\0\0\0\25\0\0~3move_update\0\0C\0\0\0\rUPDAT"..., 16384, 0) = 42
write(2, "CheckForNotifies: ASYNC NOTIFY o"..., 82CheckForNotifies: ASYNC
NOTIFY of 'move_update' from backend pid '32307' received
) = 82
write(2, "main/main.c: 1638: backEndNotify"..., 54main/main.c: 1638:
backEndNotify (32307: move_update)
) = 54
write(2, "main/main.c: 1655: backEndNotify"..., 43main/main.c: 1655:
backEndNotify (leaving)
) = 43
select(5, [3], [], [], {0, 0})          = 0 (Timeout)
select(5, [3], [], [], {0, 0})          = 0 (Timeout)
select(5, [3], [], [], {0, 0})          = 0 (Timeout)
select(5, [3], [], [], {0, 0})          = 0 (Timeout)
select(5, [3], [], [], {0, 0})          = 0 (Timeout)
select(5, [3], [], [], {0, 0})          = 0 (Timeout)
select(5, [3], [], [], {0, 0})          = 0 (Timeout)
select(5, [3], [], [], {0, 0})          = 0 (Timeout)
select(5, [3], [], [], {0, 0})          = 0 (Timeout)
select(5, [3], [], [], {0, 0})          = 0 (Timeout)
select(5, [3], [], [], {0, 0})          = 0 (Timeout)
select(5, [3], [], [], {0, 0})          = 0 (Timeout)
select(5, [3], [], [], {0, 0})          = 0 (Timeout)
select(5, [3], [], [], {0, 0})          = 0 (Timeout)
select(5, [3], [], [], {0, 0})          = 0 (Timeout)
select(5, [3], [], [], {0, 0})          = 0 (Timeout)
select(5, [3], [], [], {0, 0})          = 0 (Timeout)
ioctl(4, FIONREAD, [32])                = 0
read(4, "\34\364_\264\311\4\300\7<\1\0\0\304\243G\22\0\0\0\0\304"..., 32) = 32
select(5, [3], [], [], {0, 0})          = 0 (Timeout)
ioctl(4, FIONREAD, [32])                = 0
read(4, "\34\364_\264\311\4\300\7\f\1\0\0\305\243G\22\0\4\300\7"..., 32) = 32
select(5, [3], [], [], {0, 0})          = 0 (Timeout)
ioctl(4, FIONREAD, [384])               = 0
read(4, "\34\364_\264\311\4\300\7\365\0\0\0\306\243G\22\0\0\0\0"..., 384) =
384
select(5, [3], [], [], {0, 0})          = 0 (Timeout)
select(5, [3], [], [], {0, 0})          = 0 (Timeout)
select(5, [3], [], [], {0, 0})          = 0 (Timeout)
select(5, [3], [], [], {0, 0})          = 0 (Timeout)
select(5, [3], [], [], {0, 0})          = 0 (Timeout)
ioctl(4, FIONREAD, [32])                = 0
read(4, "\226\252_\264\311\4\300\7\311\4\300\7\0\0\0\0\370\0\177"..., 32) = 32
write(4, "\2\2\4\0\314\4\300\7\0\10\0\0}\240 \0", 16) = 16
select(5, [3], [], [], {0, 0})          = 0 (Timeout)
select(5, [3], [], [], {0, 0})          = 0 (Timeout)
select(5, [3], [], [], {0, 0})          = 0 (Timeout)
select(5, [3], [], [], {0, 0})          = 0 (Timeout)
select(5, [3], [], [], {0, 0})          = 0 (Timeout)
select(5, [3], [], [], {0, 0})          = 0 (Timeout)
write(4, "8\2\4\0\32\0\300\7\0\0\10\0\0\0\0\0>\5\7\0\312\4\300\7"..., 500) =
500
read(4, 0xbffff040, 32)                 = -1 EAGAIN (Resource temporarily
unavailable)
select(5, [4], NULL, NULL, NULL)        = 1 (in [4])
read(4, "\1\1t\264\0\0\0\0\304\364\3\1\370\0\177\1\320\2460\t`\347"..., 32) =
32
write(4, "\17\2\2\0\314\4\300\7", 8)    = 8
read(4, 0xbffff120, 32)                 = -1 EAGAIN (Resource temporarily
unavailable)
select(5, [4], NULL, NULL, NULL)        = 1 (in [4])
read(4, "\1ku\264\0\0\0\0\257\0\0\0\311\4\300\7\0\0\0\0\0\0\0\0"..., 32) = 32
write(4, "B\2\7\0\314\4\300\7\24\0\300\7\256\0@\0\3\1@\0\256\0A\0"..., 64) =
64
read(4, 0xbffff0b0, 32)                 = -1 EAGAIN (Resource temporarily
unavailable)
select(5, [4], NULL, NULL, NULL)        = 1 (in [4])
read(4, "\1tx\264\0\0\0\0\257\0\0\0\311\4\300\7\0\0;\t\260\267\343"..., 32) =
32
select(5, [3], [], [], {0, 0})          = 0 (Timeout)
select(5, [3], [], [], {0, 0})          = 0 (Timeout)
ioctl(4, FIONREAD, [0])                 = 0
write(4, "F\2\v\0\314\4\300\7\r\0\300\7\252\0<\0^\0\2\0\252\0j\0"..., 44) = 44
ioctl(4, FIONREAD, [0])                 = 0
gettimeofday({1109328767, 349228}, NULL) = 0
select(5, [3 4], [], [], NULL)          = 1 (in [4])
ioctl(4, FIONREAD, [32])                = 0
read(4, "\36\203y\264\0\0\0\0\247\3\300\7]\1`\1\1\0\0\0^\1\0\0%"..., 32) = 32
write(4, "\24\0\6\0]\1`\1%\1\0\0\0\0\0\0\0\0\0\0\200\226\230\0", 24) = 24
read(4, 0xbffff2f0, 32)                 = -1 EAGAIN (Resource temporarily
unavailable)
select(5, [4], NULL, NULL, NULL)        = 1 (in [4])
read(4, "\1\0z\264\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 32) = 32
write(4, "+\0\1\0", 4)                  = 4
read(4, 0xbffff2e0, 32)                 = -1 EAGAIN (Resource temporarily
unavailable)
select(5, [4], NULL, NULL, NULL)        = 1 (in [4])
read(4, "\1\1{\264\0\0\0\0\311\4\300\7\0\0\0\0\1\0\0\0\2\0\0\0@"..., 32) = 32
write(4, "\22\0\25\0]\1`\1%\1\0\0\4\0\0\0 \0\0\0\17\0\0\0^\1\0\0"..., 132) =
132
read(4, 0xbffff440, 32)                 = -1 EAGAIN (Resource temporarily
unavailable)
select(5, [4], NULL, NULL, NULL)        = 1 (in [4])
read(4, "\1\1~\264\0\0\0\0\311\4\300\7\0\0\0\0\0\0\0\0\2\0\0\0@"..., 32) = 32
select(5, [3], [], [], {0, 0})          = 0 (Timeout)
ioctl(4, FIONREAD, [0])                 = 0
ioctl(4, FIONREAD, [0])                 = 0
gettimeofday({1109328768, 129317}, NULL) = 0
select(5, [3 4], [], [], NULL)          = 1 (in [4])

Re: Questions regarding notify processing.

От
Terry Lee Tucker
Дата:
This thing is hanging up in PQnotifies. I have added a simple fprintf
statement to the following function:
void CheckForNotifies (Controls *ctrl)
{
#ifndef CALL_TRACE
    fprintf (stderr, "%s: %d: %s (%s)\n", __FILE__, __LINE__,
            __FUNCTION__, ctrl->table);
#endif

    PQflush (ctrl->sys->conn);              // always flush it first ;o)

    PQconsumeInput (ctrl->sys->conn);
    fprintf (stderr, "just before call to PQnotifies...\n");
    while ((ctrl->notify = PQnotifies (ctrl->sys->conn)) != NULL)
    {
#ifndef DEBUG
        fprintf (stderr,
        "%s: ASYNC NOTIFY of '%s' from backend pid '%d' received\n",
                 __FUNCTION__, ctrl->notify->relname,
                 ctrl->notify->be_pid);
#endif
        /* If you are interested in receiving any NOTICE data, you will
         * have to have this function defined locally. */
        if (ctrl->main->backEndNotify)
        {
            void (*p) () = (void *) ctrl->main->backEndNotify;
            (*p) (ctrl);                    // call the function
        }
        PQfreemem (ctrl->notify);           // free the memory
    }
}
The fprintf statement prints and nothing else happens. Observe:
move/move.c: 2072: preMenuActFunc (move::save_b)
move/move.c: 2718: save_SQL ()
move/move.c: 2798: save_SQL ()
move/move.c: 2816: restoreState ()
utility/global.c: 1804: CheckForNotifies (move)
just before call to PQnotifies...

And that is where it sits until I send a message notification from a separate
client (plsql). When the message is sent, the process continues. Is this a
bug?  Did anybody look at the stack trace is sent? I don't have the expertise
to analyze it.

TIA

On Friday 25 February 2005 06:00 am, Terry Lee Tucker saith:
> See partial output from strace below. On this particular run, the command
> locked up during the second edit operation. The notify message,
> move_update, was delivered, but the PQsendquery never returns.
>
> Any ideas Tom?
>
> recv(3, "A\0\0\0\25\0\0~3move_update\0\0C\0\0\0\rUPDAT"..., 16384, 0) = 42
> write(2, "CheckForNotifies: ASYNC NOTIFY o"..., 82CheckForNotifies: ASYNC
> NOTIFY of 'move_update' from backend pid '32307' received
> ) = 82
> write(2, "main/main.c: 1638: backEndNotify"..., 54main/main.c: 1638:
> backEndNotify (32307: move_update)
> ) = 54
> write(2, "main/main.c: 1655: backEndNotify"..., 43main/main.c: 1655:
> backEndNotify (leaving)
> ) = 43
> select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> ioctl(4, FIONREAD, [32])                = 0
> read(4, "\34\364_\264\311\4\300\7<\1\0\0\304\243G\22\0\0\0\0\304"..., 32) =
> 32 select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> ioctl(4, FIONREAD, [32])                = 0
> read(4, "\34\364_\264\311\4\300\7\f\1\0\0\305\243G\22\0\4\300\7"..., 32) =
> 32 select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> ioctl(4, FIONREAD, [384])               = 0
> read(4, "\34\364_\264\311\4\300\7\365\0\0\0\306\243G\22\0\0\0\0"..., 384) =
> 384
> select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> ioctl(4, FIONREAD, [32])                = 0
> read(4, "\226\252_\264\311\4\300\7\311\4\300\7\0\0\0\0\370\0\177"..., 32) =
> 32 write(4, "\2\2\4\0\314\4\300\7\0\10\0\0}\240 \0", 16) = 16
> select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> write(4, "8\2\4\0\32\0\300\7\0\0\10\0\0\0\0\0>\5\7\0\312\4\300\7"..., 500)
> = 500
> read(4, 0xbffff040, 32)                 = -1 EAGAIN (Resource temporarily
> unavailable)
> select(5, [4], NULL, NULL, NULL)        = 1 (in [4])
> read(4, "\1\1t\264\0\0\0\0\304\364\3\1\370\0\177\1\320\2460\t`\347"..., 32)
> = 32
> write(4, "\17\2\2\0\314\4\300\7", 8)    = 8
> read(4, 0xbffff120, 32)                 = -1 EAGAIN (Resource temporarily
> unavailable)
> select(5, [4], NULL, NULL, NULL)        = 1 (in [4])
> read(4, "\1ku\264\0\0\0\0\257\0\0\0\311\4\300\7\0\0\0\0\0\0\0\0"..., 32) =
> 32 write(4, "B\2\7\0\314\4\300\7\24\0\300\7\256\0@\0\3\1@\0\256\0A\0"...,
> 64) = 64
> read(4, 0xbffff0b0, 32)                 = -1 EAGAIN (Resource temporarily
> unavailable)
> select(5, [4], NULL, NULL, NULL)        = 1 (in [4])
> read(4, "\1tx\264\0\0\0\0\257\0\0\0\311\4\300\7\0\0;\t\260\267\343"..., 32)
> = 32
> select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> ioctl(4, FIONREAD, [0])                 = 0
> write(4, "F\2\v\0\314\4\300\7\r\0\300\7\252\0<\0^\0\2\0\252\0j\0"..., 44) =
> 44 ioctl(4, FIONREAD, [0])                 = 0
> gettimeofday({1109328767, 349228}, NULL) = 0
> select(5, [3 4], [], [], NULL)          = 1 (in [4])
> ioctl(4, FIONREAD, [32])                = 0
> read(4, "\36\203y\264\0\0\0\0\247\3\300\7]\1`\1\1\0\0\0^\1\0\0%"..., 32) =
> 32 write(4, "\24\0\6\0]\1`\1%\1\0\0\0\0\0\0\0\0\0\0\200\226\230\0", 24) =
> 24 read(4, 0xbffff2f0, 32)                 = -1 EAGAIN (Resource
> temporarily unavailable)
> select(5, [4], NULL, NULL, NULL)        = 1 (in [4])
> read(4, "\1\0z\264\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 32) =
> 32 write(4, "+\0\1\0", 4)                  = 4
> read(4, 0xbffff2e0, 32)                 = -1 EAGAIN (Resource temporarily
> unavailable)
> select(5, [4], NULL, NULL, NULL)        = 1 (in [4])
> read(4, "\1\1{\264\0\0\0\0\311\4\300\7\0\0\0\0\1\0\0\0\2\0\0\0@"..., 32) =
> 32 write(4, "\22\0\25\0]\1`\1%\1\0\0\4\0\0\0 \0\0\0\17\0\0\0^\1\0\0"...,
> 132) = 132
> read(4, 0xbffff440, 32)                 = -1 EAGAIN (Resource temporarily
> unavailable)
> select(5, [4], NULL, NULL, NULL)        = 1 (in [4])
> read(4, "\1\1~\264\0\0\0\0\311\4\300\7\0\0\0\0\0\0\0\0\2\0\0\0@"..., 32) =
> 32 select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> ioctl(4, FIONREAD, [0])                 = 0
> ioctl(4, FIONREAD, [0])                 = 0
> gettimeofday({1109328768, 129317}, NULL) = 0
> select(5, [3 4], [], [], NULL)          = 1 (in [4])
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org


Re: Questions regarding notify processing.

От
Terry Lee Tucker
Дата:
I realize that I'm carrying on a conversation with myself here; however,
nothing new 'bout that, I do it all the time ;o)

Just in case anyone else is interested, the only way I can get this to work
correctly is to put a 100 millisecond timer in front of the call to
CheckForNotifies as in:
       XtAppAddTimeOut (app, 100,
                (XtTimerCallbackProc) CheckForNotifies, (XtPointer) ctrl);
Now the 1/10 second timer, on my notebook with 1.2 gigs of ram and a 1700 MHz
processor, works just fine for now; however, when this app rolls out into the
real world with 100 users pounding on it, will 1/10 of a second still be long
enough? And, why does it have to be there at all?

Points to ponder, or not...

On Friday 25 February 2005 01:05 pm, Terry Lee Tucker saith:
> This thing is hanging up in PQnotifies. I have added a simple fprintf
> statement to the following function:
> void CheckForNotifies (Controls *ctrl)
> {
> #ifndef CALL_TRACE
>     fprintf (stderr, "%s: %d: %s (%s)\n", __FILE__, __LINE__,
>             __FUNCTION__, ctrl->table);
> #endif
>
>     PQflush (ctrl->sys->conn);              // always flush it first ;o)
>
>     PQconsumeInput (ctrl->sys->conn);
>     fprintf (stderr, "just before call to PQnotifies...\n");
>     while ((ctrl->notify = PQnotifies (ctrl->sys->conn)) != NULL)
>     {
> #ifndef DEBUG
>         fprintf (stderr,
>         "%s: ASYNC NOTIFY of '%s' from backend pid '%d' received\n",
>                  __FUNCTION__, ctrl->notify->relname,
>                  ctrl->notify->be_pid);
> #endif
>         /* If you are interested in receiving any NOTICE data, you will
>          * have to have this function defined locally. */
>         if (ctrl->main->backEndNotify)
>         {
>             void (*p) () = (void *) ctrl->main->backEndNotify;
>             (*p) (ctrl);                    // call the function
>         }
>         PQfreemem (ctrl->notify);           // free the memory
>     }
> }
> The fprintf statement prints and nothing else happens. Observe:
> move/move.c: 2072: preMenuActFunc (move::save_b)
> move/move.c: 2718: save_SQL ()
> move/move.c: 2798: save_SQL ()
> move/move.c: 2816: restoreState ()
> utility/global.c: 1804: CheckForNotifies (move)
> just before call to PQnotifies...
>
> And that is where it sits until I send a message notification from a
> separate client (plsql). When the message is sent, the process continues.
> Is this a bug?  Did anybody look at the stack trace is sent? I don't have
> the expertise to analyze it.
>
> TIA
>
> On Friday 25 February 2005 06:00 am, Terry Lee Tucker saith:
> > See partial output from strace below. On this particular run, the command
> > locked up during the second edit operation. The notify message,
> > move_update, was delivered, but the PQsendquery never returns.
> >
> > Any ideas Tom?
> >
> > recv(3, "A\0\0\0\25\0\0~3move_update\0\0C\0\0\0\rUPDAT"..., 16384, 0) =
> > 42 write(2, "CheckForNotifies: ASYNC NOTIFY o"..., 82CheckForNotifies:
> > ASYNC NOTIFY of 'move_update' from backend pid '32307' received
> > ) = 82
> > write(2, "main/main.c: 1638: backEndNotify"..., 54main/main.c: 1638:
> > backEndNotify (32307: move_update)
> > ) = 54
> > write(2, "main/main.c: 1655: backEndNotify"..., 43main/main.c: 1655:
> > backEndNotify (leaving)
> > ) = 43
> > select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> > select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> > select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> > select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> > select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> > select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> > select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> > select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> > select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> > select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> > select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> > select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> > select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> > select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> > select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> > select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> > select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> > ioctl(4, FIONREAD, [32])                = 0
> > read(4, "\34\364_\264\311\4\300\7<\1\0\0\304\243G\22\0\0\0\0\304"..., 32)
> > = 32 select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> > ioctl(4, FIONREAD, [32])                = 0
> > read(4, "\34\364_\264\311\4\300\7\f\1\0\0\305\243G\22\0\4\300\7"..., 32)
> > = 32 select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> > ioctl(4, FIONREAD, [384])               = 0
> > read(4, "\34\364_\264\311\4\300\7\365\0\0\0\306\243G\22\0\0\0\0"..., 384)
> > = 384
> > select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> > select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> > select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> > select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> > select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> > ioctl(4, FIONREAD, [32])                = 0
> > read(4, "\226\252_\264\311\4\300\7\311\4\300\7\0\0\0\0\370\0\177"..., 32)
> > = 32 write(4, "\2\2\4\0\314\4\300\7\0\10\0\0}\240 \0", 16) = 16
> > select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> > select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> > select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> > select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> > select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> > select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> > write(4, "8\2\4\0\32\0\300\7\0\0\10\0\0\0\0\0>\5\7\0\312\4\300\7"...,
> > 500) = 500
> > read(4, 0xbffff040, 32)                 = -1 EAGAIN (Resource temporarily
> > unavailable)
> > select(5, [4], NULL, NULL, NULL)        = 1 (in [4])
> > read(4, "\1\1t\264\0\0\0\0\304\364\3\1\370\0\177\1\320\2460\t`\347"...,
> > 32) = 32
> > write(4, "\17\2\2\0\314\4\300\7", 8)    = 8
> > read(4, 0xbffff120, 32)                 = -1 EAGAIN (Resource temporarily
> > unavailable)
> > select(5, [4], NULL, NULL, NULL)        = 1 (in [4])
> > read(4, "\1ku\264\0\0\0\0\257\0\0\0\311\4\300\7\0\0\0\0\0\0\0\0"..., 32)
> > = 32 write(4,
> > "B\2\7\0\314\4\300\7\24\0\300\7\256\0@\0\3\1@\0\256\0A\0"..., 64) = 64
> > read(4, 0xbffff0b0, 32)                 = -1 EAGAIN (Resource temporarily
> > unavailable)
> > select(5, [4], NULL, NULL, NULL)        = 1 (in [4])
> > read(4, "\1tx\264\0\0\0\0\257\0\0\0\311\4\300\7\0\0;\t\260\267\343"...,
> > 32) = 32
> > select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> > select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> > ioctl(4, FIONREAD, [0])                 = 0
> > write(4, "F\2\v\0\314\4\300\7\r\0\300\7\252\0<\0^\0\2\0\252\0j\0"..., 44)
> > = 44 ioctl(4, FIONREAD, [0])                 = 0
> > gettimeofday({1109328767, 349228}, NULL) = 0
> > select(5, [3 4], [], [], NULL)          = 1 (in [4])
> > ioctl(4, FIONREAD, [32])                = 0
> > read(4, "\36\203y\264\0\0\0\0\247\3\300\7]\1`\1\1\0\0\0^\1\0\0%"..., 32)
> > = 32 write(4, "\24\0\6\0]\1`\1%\1\0\0\0\0\0\0\0\0\0\0\200\226\230\0", 24)
> > = 24 read(4, 0xbffff2f0, 32)                 = -1 EAGAIN (Resource
> > temporarily unavailable)
> > select(5, [4], NULL, NULL, NULL)        = 1 (in [4])
> > read(4, "\1\0z\264\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 32)
> > = 32 write(4, "+\0\1\0", 4)                  = 4
> > read(4, 0xbffff2e0, 32)                 = -1 EAGAIN (Resource temporarily
> > unavailable)
> > select(5, [4], NULL, NULL, NULL)        = 1 (in [4])
> > read(4, "\1\1{\264\0\0\0\0\311\4\300\7\0\0\0\0\1\0\0\0\2\0\0\0@"..., 32)
> > = 32 write(4, "\22\0\25\0]\1`\1%\1\0\0\4\0\0\0
> > \0\0\0\17\0\0\0^\1\0\0"..., 132) = 132
> > read(4, 0xbffff440, 32)                 = -1 EAGAIN (Resource temporarily
> > unavailable)
> > select(5, [4], NULL, NULL, NULL)        = 1 (in [4])
> > read(4, "\1\1~\264\0\0\0\0\311\4\300\7\0\0\0\0\0\0\0\0\2\0\0\0@"..., 32)
> > = 32 select(5, [3], [], [], {0, 0})          = 0 (Timeout)
> > ioctl(4, FIONREAD, [0])                 = 0
> > ioctl(4, FIONREAD, [0])                 = 0
> > gettimeofday({1109328768, 129317}, NULL) = 0
> > select(5, [3 4], [], [], NULL)          = 1 (in [4])
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faq

--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.
MailScanner thanks transtec Computers for their support.

Re: Questions regarding notify processing.

От
Tom Lane
Дата:
Terry Lee Tucker <terry@esc1.com> writes:
>> Is this a bug?  Did anybody look at the stack trace is sent? I don't have
>> the expertise to analyze it.

I haven't seen any stack trace.  strace is not a stack trace --- I'm
talking about attaching to the stuck process with gdb and getting the
result of "bt".

If it is hanging up in PQnotifies then that's certainly a bug, but you
haven't proven that.  The debug printouts you showed could equally well
be interpreted as PQnotifies returned NULL and control went back to the
idle loop.  PQnotifies is not supposed to block, ever.

At the moment I suspect the problem is somewhere upstream, probably in
your interaction with the X toolkit.  I think the program is blocking
when in fact there are still bytes ready to be read by libpq.

            regards, tom lane

Re: Questions regarding notify processing.

От
Terry Lee Tucker
Дата:
Sorry, I didn't know what you needed. Here is the trace:
(gdb) bt
#0  0x2b04eef2 in select () from /lib/i686/libc.so.6
#1  0x2adeb06c in ?? () from /usr/X11R6/lib/libXt.so.6
#2  0x00000001 in ?? ()
#3  0x08136b88 in ?? ()
#4  0x2adc0c47 in XtChangeManagedSet () from /usr/X11R6/lib/libXt.so.6
#5  0x2adc13b0 in _XtWaitForSomething () from /usr/X11R6/lib/libXt.so.6
#6  0x2adc23f6 in XtAppNextEvent () from /usr/X11R6/lib/libXt.so.6
#7  0x2adb6a7c in XtAppMainLoop () from /usr/X11R6/lib/libXt.so.6
#8  0x080b136a in main (argc=2, argv=0xbffff794) at main/main.c:252

Thanks for your help...

On Friday 25 February 2005 03:06 pm, Tom Lane saith:
> Terry Lee Tucker <terry@esc1.com> writes:
> >> Is this a bug?  Did anybody look at the stack trace is sent? I don't
> >> have the expertise to analyze it.
>
> I haven't seen any stack trace.  strace is not a stack trace --- I'm
> talking about attaching to the stuck process with gdb and getting the
> result of "bt".
>
> If it is hanging up in PQnotifies then that's certainly a bug, but you
> haven't proven that.  The debug printouts you showed could equally well
> be interpreted as PQnotifies returned NULL and control went back to the
> idle loop.  PQnotifies is not supposed to block, ever.
>
> At the moment I suspect the problem is somewhere upstream, probably in
> your interaction with the X toolkit.  I think the program is blocking
> when in fact there are still bytes ready to be read by libpq.
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

Re: Questions regarding notify processing.

От
Tom Lane
Дата:
Terry Lee Tucker <terry@esc1.com> writes:
> Sorry, I didn't know what you needed. Here is the trace:
> (gdb) bt
> #0  0x2b04eef2 in select () from /lib/i686/libc.so.6
> #1  0x2adeb06c in ?? () from /usr/X11R6/lib/libXt.so.6
> #2  0x00000001 in ?? ()
> #3  0x08136b88 in ?? ()
> #4  0x2adc0c47 in XtChangeManagedSet () from /usr/X11R6/lib/libXt.so.6
> #5  0x2adc13b0 in _XtWaitForSomething () from /usr/X11R6/lib/libXt.so.6
> #6  0x2adc23f6 in XtAppNextEvent () from /usr/X11R6/lib/libXt.so.6
> #7  0x2adb6a7c in XtAppMainLoop () from /usr/X11R6/lib/libXt.so.6
> #8  0x080b136a in main (argc=2, argv=0xbffff794) at main/main.c:252

Well, indeed it is not stuck in PQnotifies --- it's waiting for input,
and evidently not seeing any.

Looking back at your strace, the first line shows that the connection to
the PG server is on FD 3, since that's recognizably a NOTIFY message:

recv(3, "A\0\0\0\25\0\0~3move_update\0\0C\0\0\0\rUPDAT"..., 16384, 0) = 42

I suppose that the traffic on FD 4 is going to and from the X server.

What seems odd is that some of the select()s are waiting on both FD 3
and FD 4, but some on only one or the other.  I'd have expected all of
them to wait on both.

I think what is happening is that the X toolkit is failing to wait on
FD 3 at the critical point, presumably because it thinks you are no
longer interested in input from that FD.  It's been a very long time
since I touched low-level X stuff, but perhaps you need to remind the
toolkit repeatedly that you are interested in FD 3?  Is there any path
of control that could cancel the callback assignment for DbInputCB?

            regards, tom lane

Re: Questions regarding notify processing.

От
Terry Lee Tucker
Дата:
Tom,

The alternate input source is created on application startup with:
ctrl->sys->dbInputId = XtAppAddInput (app,
            ctrl->sys->socketId,
            (XtPointer) (XtInputReadMask),
            DbInputCB, (XtPointer) ctrl);
And, it remains throughout the duration of the application. I believe this
issue is related to your comment regarding the fact that selects are waiting
on both 3 and 4 and sometimes just one or the other.

I was using a different approach, that is, using code from one of the earlier
examples for Asynchronous command processing in the documentation. I had two
functions, one that established a few miliseconds read on the socket and, if
nothing was there, a .5 second timer was called and when that expired, we did
it all again. Using the alternate input feature of X is MUCH faster, but with
a caveat, as this thread reveals. I think I will leave the timer where it is
and see what develops when we test this before going into production. I think
the reason this pops up here is due to the fact that there are several
database accesses involved with this particular interface. Most are not
nearly as complicated as this one is.

I just need the notification in a reasonable amount of time after the Edit
operation and can even extend the timer that calls CheckForNotifies if
necessary.

You've been very helpful. Thanks...


On Friday 25 February 2005 03:51 pm, Tom Lane saith:
> Terry Lee Tucker <terry@esc1.com> writes:
> > Sorry, I didn't know what you needed. Here is the trace:
> > (gdb) bt
> > #0  0x2b04eef2 in select () from /lib/i686/libc.so.6
> > #1  0x2adeb06c in ?? () from /usr/X11R6/lib/libXt.so.6
> > #2  0x00000001 in ?? ()
> > #3  0x08136b88 in ?? ()
> > #4  0x2adc0c47 in XtChangeManagedSet () from /usr/X11R6/lib/libXt.so.6
> > #5  0x2adc13b0 in _XtWaitForSomething () from /usr/X11R6/lib/libXt.so.6
> > #6  0x2adc23f6 in XtAppNextEvent () from /usr/X11R6/lib/libXt.so.6
> > #7  0x2adb6a7c in XtAppMainLoop () from /usr/X11R6/lib/libXt.so.6
> > #8  0x080b136a in main (argc=2, argv=0xbffff794) at main/main.c:252
>
> Well, indeed it is not stuck in PQnotifies --- it's waiting for input,
> and evidently not seeing any.
>
> Looking back at your strace, the first line shows that the connection to
> the PG server is on FD 3, since that's recognizably a NOTIFY message:
>
> recv(3, "A\0\0\0\25\0\0~3move_update\0\0C\0\0\0\rUPDAT"..., 16384, 0) = 42
>
> I suppose that the traffic on FD 4 is going to and from the X server.
>
> What seems odd is that some of the select()s are waiting on both FD 3
> and FD 4, but some on only one or the other.  I'd have expected all of
> them to wait on both.
>
> I think what is happening is that the X toolkit is failing to wait on
> FD 3 at the critical point, presumably because it thinks you are no
> longer interested in input from that FD.  It's been a very long time
> since I touched low-level X stuff, but perhaps you need to remind the
> toolkit repeatedly that you are interested in FD 3?  Is there any path
> of control that could cancel the callback assignment for DbInputCB?
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
>                http://archives.postgresql.org

Re: Questions regarding notify processing.

От
Tom Lane
Дата:
Terry Lee Tucker <terry@esc1.com> writes:
> The alternate input source is created on application startup with:
> ctrl->sys->dbInputId = XtAppAddInput (app,
>             ctrl->sys->socketId,
>             (XtPointer) (XtInputReadMask),
>             DbInputCB, (XtPointer) ctrl);
> And, it remains throughout the duration of the application. I believe this
> issue is related to your comment regarding the fact that selects are waiting
> on both 3 and 4 and sometimes just one or the other.

I wonder if you have found a bug in the X toolkit?  It might be worth
trying to identify exactly where the selects that only wait on FD 4 are
coming from.  If you can prove that the toolkit is not paying attention
to FD 3 when it should be, it's time to file a bug with those folk.

            regards, tom lane