Thread: BUG #16007: Regarding patch for BUG #3995: pqSocketCheck doesn't return


BUG #16007: Regarding patch for BUG #3995: pqSocketCheck doesn't return

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      16007
Logged by:          Kiran Khatke
Email address:      kirankhatke23may@gmail.com
PostgreSQL version: Unsupported/Unknown
Operating system:   Linux (2.6.32)
Description:

Hi, 
We are using PostgreSQL version 8.3.17 in our product and are running into the issue
described under BUG #3995.
Can you please tell us whether there is any patch for the 8.3.x version that addresses this
issue?

Regards,
Kiran


Re: BUG #16007: Regarding patch for BUG #3995: pqSocketCheck doesn't return

From
Tom Lane
Date:
PG Bug reporting form <noreply@postgresql.org> writes:
> We are using PostgreSQL Version 8.3.17 in our product and running into issue
> described under BUG#3995.
> Can you please share if any patch is there on 8.3.x version to address this
> issue.

So far as I can see from the discussion of #3995, there was absolutely
no reason to think there was any PG bug.  But you are following in the
footsteps of that reporter by (a) jumping to a conclusion about what
your problem is, and (b) providing absolutely zero concrete information
that would allow anyone to help you.

Please read
https://wiki.postgresql.org/wiki/Guide_to_reporting_problems

Also, you really should think about updating to a version of Postgres
that isn't many years obsolete.  Even if your problem did trace down
to being a PG bug, we are not going to fix it in 8.3.x.

            regards, tom lane



Re: BUG #16007: Regarding patch for BUG #3995: pqSocketCheck doesn't return

From
Kiran Khatke
Date:
Hello Tom,

Please find the requested information below, and let me know if you have any questions.

A description of what you are trying to achieve and what results you expect:
I expect to change the DB configuration by running an SQL statement (MGMT_SERVER_TIME).
I would not expect it to hang in poll().

PostgreSQL version number you are running:
8.3.17

How you installed PostgreSQL:

Changes made to the settings in the postgresql.conf file (see Server Configuration for a quick way to list them all):

Operating system and version:
Linux, 2.6.32

What program you're using to connect to PostgreSQL:
A daemon called DBMGR, written in C, which interfaces with Postgres to issue SQL statements.

For questions about any kind of error:
One of the threads of the DBMGR daemon is waiting for poll() to return.
poll() was called by pqSocketCheck(), so pqSocketCheck() never returned; it is hung in poll().
Below is the backtrace.

#6  0x1006f618 in pga_stop () at src/dbmgr/pg_admin.c:168
#7  0x10f0c330 in _dbm_sigabrt (signo=6, si=0x7f766d58, context=0x7f766dd8) at src/dbmgr/dbm_main.c:1567
#8  <signal handler called>
#9  0x2ea7f184 in *__GI___poll (fds=<value optimized out>, nfds=1, timeout=<value optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
#10 0x2b600238 in pqSocketCheck (conn=0x11088158, forRead=1, forWrite=0, end_time=-1) at fe-misc.c:1043
#11 0x2b600404 in pqWaitTimed (forRead=<value optimized out>, forWrite=4, conn=0x11088158, finish_time=1) at fe-misc.c:917
#12 0x2b5ff884 in PQgetResult (conn=0x11088158) at fe-exec.c:1223
#13 0x2b5ffb48 in PQexecFinish (conn=0x11088158) at fe-exec.c:1452
#14 0x100c2930 in dbConnObj::execStatement (this=0x11091048, sqlStatement=0x3100bec4 "UPDATE MGMT_SERVER SET LAST_SUCCESSFUL_CONNECTION='1561569998' ", checkAlreadyExists=false, freeResult=true,
    retSeqErr=false) at src/dbmgr/dbConnObj.c:243
#15 0x10099498 in dbConnectionMgr::updateSQL (this=0x11090640, objID=DBO_MGMT_SERVER_TIME, type=DB_CONFIGURATION, serialObj=0x1156249c "1,19,16385,10,1561569998", consObj=@0x7f7673f8, cons=0x2e8957e4 "",
    consSerial=0x2e8957e4 "") at src/dbmgr/dbConnectionMgr.c:1647


What you were doing when the error happened / how to cause the error:
The DBMGR daemon's main processing thread, which handles database commands (insert/update/delete), got stuck.
As a result, no further DBMGR calls for any of the DB tables were executed.
Eventually the watchdog timer for DBMGR expired and DBMGR was killed.

The EXACT TEXT of the error message you're getting, if there is one: (Copy and paste the message to the email, do not send a screenshot)

Regards,
Kiran

On Thu, Sep 12, 2019 at 1:17 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Re: BUG #16007: Regarding patch for BUG #3995: pqSocketCheck doesn't return

From
Tom Lane
Date:
Kiran Khatke <kirankhatke23may@gmail.com> writes:
> One of the thread of DBMGR Daemon is waiting for the result of poll()
> function.
> poll() was called by pgSocketCheck(). So pqSocketCheck() didn't return,
> hung in poll().
> Below is the backtrace.

Well, it's waiting for the query to finish, or so it thinks.  Did you
look at what the server thinks the session is doing?

Your reference to multiple threads is a red flag to me.  Very often
we see people whose programs try to use the same PGconn object from
multiple threads.  That doesn't work --- and libpq does not have any
internal mutexes that would prevent the object's state from getting
messed up by concurrent operations.  So a plausible theory is that
this PGconn was used concurrently, and now this particular thread
is stuck because the object's state is corrupt (i.e., it shows the
query as busy but the server doesn't think so).

It might be worth enabling log_statement = all on the server side
and then watching the server log to see what seems to be happening
from that end.

            regards, tom lane



Re: BUG #16007: Regarding patch for BUG #3995: pqSocketCheck doesn't return

From
Kiran Khatke
Date:

Hello Tom, 

Thanks for the support.

Below are the threads that use libpq; both threads are stuck in poll().

We had not enabled server logging before, so we are not sure what was happening on the server side.

This issue reproduces only rarely, so we have not yet been able to check it with server logging enabled.

Thread 1:

#0  0x2ea7f184 in *__GI___poll (fds=<value optimized out>, nfds=1, timeout=<value optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0x2b600238 in pqSocketCheck (conn=0x110999b8, forRead=1, forWrite=0, end_time=-1) at fe-misc.c:1043
#2  0x2b600404 in pqWaitTimed (forRead=<value optimized out>, forWrite=1, conn=0x110999b8, finish_time=0) at fe-misc.c:917
#3  0x2b5ff884 in PQgetResult (conn=0x110999b8) at fe-exec.c:1223
#4  0x100c2fa4 in dbConnObj::execStatement_nowait (this=0x110910e8,
    sqlStatement=0x313aae84 "INSERT INTO event (event_id,severity,flags,timestamp,managed_obj_id,managed_obj,groups,params) VALUES (184,5,0,'2019-06-26T10:26:38.133353-07:00',6,'ServicesNode.1025','TRClient','Name=\"031663-SCSN-FO"...) at src/dbmgr/dbConnObj.c:169
#5  0x10099c80 in dbConnectionMgr::insertSQL (this=0x11090640, objID=DBO_EVENT, type=DB_LOGGING,
    serialObj=0x1156273c "9,11,1171457,1,0,13,1171458,3,184,11,1171459,1,5,11,1171460,1,0,43,1171461,32,2019-06-26T10:26:38.133353-07:00,11,1171462,1,6,28,1171463,17,ServicesNode.1025,18,1171464,8,TRClient,84,1171465,73,Name=\""..., retSeqErr=true) at src/dbmgr/dbConnectionMgr.c:1489

Thread 2: (main processing thread)

#5  0x1006f4b8 in _pga_stop_db () at src/dbmgr/pg_admin.c:7643
#6  0x1006f618 in pga_stop () at src/dbmgr/pg_admin.c:168
#7  0x10f0c330 in _dbm_sigabrt (signo=6, si=0x7f766d58, context=0x7f766dd8) at src/dbmgr/dbm_main.c:1567
#8  <signal handler called>
#9  0x2ea7f184 in *__GI___poll (fds=<value optimized out>, nfds=1, timeout=<value optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
#10 0x2b600238 in pqSocketCheck (conn=0x11088158, forRead=1, forWrite=0, end_time=-1) at fe-misc.c:1043
#11 0x2b600404 in pqWaitTimed (forRead=<value optimized out>, forWrite=4, conn=0x11088158, finish_time=1) at fe-misc.c:917
#12 0x2b5ff884 in PQgetResult (conn=0x11088158) at fe-exec.c:1223
#13 0x2b5ffb48 in PQexecFinish (conn=0x11088158) at fe-exec.c:1452
#14 0x100c2930 in dbConnObj::execStatement (this=0x11091048, sqlStatement=0x3100bec4 "UPDATE MGMT_SERVER SET LAST_SUCCESSFUL_CONNECTION='1561569998' ", checkAlreadyExists=false, freeResult=true,
    retSeqErr=false) at src/dbmgr/dbConnObj.c:243
#15 0x10099498 in dbConnectionMgr::updateSQL (this=0x11090640, objID=DBO_MGMT_SERVER_TIME, type=DB_CONFIGURATION, serialObj=0x1156249c "1,19,16385,10,1561569998", consObj=@0x7f7673f8, cons=0x2e8957e4 "",
    consSerial=0x2e8957e4 "") at src/dbmgr/dbConnectionMgr.c:1647

Regards,
Kiran 

On Mon, Sep 16, 2019 at 6:54 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: