Обсуждение: wire protocol errors on Debian (py3/psyco 2.8)
Hi all, we have two reports for psycopg2 2.8 under Python 3 on Debian showing a problem with the wire protocol (full logs appended): psycopg2.OperationalError: datos insuficientes en el mensaje «T» or psycopg2.DatabaseError: el contenido del mensaje no concuerda con el largo, en el mensaje tipo «C» The error happens at different times when running the same code. The versions involved: psycopg2 module version: 2.8.4 (dt dec pq3 ext lo64) PostgreSQL via DB-API module "<module 'psycopg2' from '/usr/lib/python3/dist-packages/psycopg2/__init__.py'>": API level2.0, thread safety 2, parameter style "pyformat" libpq version (compiled in): 120000 libpq version (loaded now) : 120002 Platform: uname_result(system='Linux', node='Epigen7', release='5.4.0-4-amd64', version='#1 SMP Debian 5.4.19-1 (2020-02-13)',machine='x86_64', processor='') Python 3.7.6 (default, Jan 19 2020, 22:34:52) <\n>[GCC 9.2.1 20200117] on linux (posix) lsb_release: {'ID': 'Debian', 'OS': 'GNU/Linux', 'DESCRIPTION': 'Debian GNU/Linux bullseye/sid', 'RELEASE': 'testing', 'CODENAME':'bullseye'} threading: sys.thread_info(name='pthread', lock='semaphore', version='NPTL 2.29') We don't yet know the exact version of PostgreSQL this is running against but it is likely 12.2-1+b1 from Debian/Unstable. Any ideas or hints what to look at next ? Thanks, Karsten -- GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B
Вложения
On 3/14/20 8:34 AM, Karsten Hilbert wrote: > Hi all, > > we have two reports for psycopg2 2.8 under Python 3 on Debian > showing a problem with the wire protocol (full logs appended): > > psycopg2.OperationalError: datos insuficientes en el mensaje «T» > > or > > psycopg2.DatabaseError: el contenido del mensaje no concuerda con el largo, en el mensaje tipo «C» > > The error happens at different times when running the same > code. The versions involved: > > psycopg2 module version: 2.8.4 (dt dec pq3 ext lo64) > PostgreSQL via DB-API module "<module 'psycopg2' from '/usr/lib/python3/dist-packages/psycopg2/__init__.py'>": API level2.0, thread safety 2, parameter style "pyformat" > libpq version (compiled in): 120000 > libpq version (loaded now) : 120002 > > Platform: uname_result(system='Linux', node='Epigen7', release='5.4.0-4-amd64', version='#1 SMP Debian 5.4.19-1 (2020-02-13)',machine='x86_64', processor='') > > Python 3.7.6 (default, Jan 19 2020, 22:34:52) <\n>[GCC 9.2.1 20200117] on linux (posix) > > lsb_release: {'ID': 'Debian', 'OS': 'GNU/Linux', 'DESCRIPTION': 'Debian GNU/Linux bullseye/sid', 'RELEASE': 'testing','CODENAME': 'bullseye'} > > threading: sys.thread_info(name='pthread', lock='semaphore', version='NPTL 2.29') > > We don't yet know the exact version of PostgreSQL this is > running against but it is likely 12.2-1+b1 from > Debian/Unstable. > > Any ideas or hints what to look at next ? How was psycopg2 installed? What is the code being run when the errors are thrown? > > Thanks, > Karsten > -- > GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B > -- Adrian Klaver adrian.klaver@aklaver.com
Dear Adrian, unfortunately, we don't have direct contact with the reporter (only via our mailing list and they haven't answered so far). > How was psycopg2 installed? Given they are running a "standard" Debian/Unstable with a Debian package of GNUmed installed and the matching psycopg2 package version psycopg2 module version: 2.8.4 (dt dec pq3 ext lo64) I suspect they apt-get install'ed it. Which assumption seems supported by the package file string in: PostgreSQL via DB-API module "<module 'psycopg2' from '/usr/lib/python3/dist-packages/psycopg2/__init__.py'>": API level2.0, thread safety 2, parameter style "pyformat" The one thing that's odd is that the Python version Python 3.7.6 (default, Jan 19 2020, 22:34:52) <\n>[GCC 9.2.1 20200117] on linux (posix) does not match what's available from Debian: python3: Installiert: 3.7.3-1 Installationskandidat: 3.7.3-1 Versionstabelle: 3.8.2-1 500 500 https://deb.debian.org/debian unstable/main i386 Packages 3.7.5-3 500 500 https://deb.debian.org/debian bullseye/main i386 Packages *** 3.7.3-1 990 990 https://deb.debian.org/debian buster/main i386 Packages 100 /var/lib/dpkg/status > What is the code being run when the errors are thrown? It happens (two reports so far) at different stages during the bringup phase of the GNUmed client. The code being run is the same, but the time of occurrence is different. case 1: select vco.value from cfg.v_cfg_opts_numeric vco where vco.owner = CURRENT_USER and vco.workplace = %(wp)s and vco.option= %(opt)s and vco.cookie is null limit 1 case 2: select _(message) from cfg.db_logon_banner where _() translates "message". This is the Python code being run (the place of error marked with xxxxx): def run_ro_queries(link_obj=None, queries=None, verbose=False, return_data=True, get_col_idx=False): """Run read-only queries. <queries> must be a list of dicts: [ {'cmd': <string>, 'args': <dict> or <tuple>}, {...}, ... ] """ if isinstance(link_obj, dbapi._psycopg.cursor): curs = link_obj curs_close = lambda *x:None tx_rollback = lambda *x:None readonly_rollback_just_in_case = lambda *x:None elif isinstance(link_obj, dbapi._psycopg.connection): curs = link_obj.cursor() curs_close = curs.close tx_rollback = link_obj.rollback if link_obj.autocommit is True: # readonly connection ? readonly_rollback_just_in_case = link_obj.rollback else: # do not rollback readonly queries on passed-in readwrite # connections just in case because they may have already # seen fully legitimate write action which would get lost readonly_rollback_just_in_case = lambda *x:None elif link_obj is None: conn = get_connection(readonly = True, verbose = verbose) curs = conn.cursor() curs_close = curs.close tx_rollback = conn.rollback readonly_rollback_just_in_case = conn.rollback else: raise ValueError('link_obj must be cursor, connection or None but not [%s]' % link_obj) if verbose: _log.debug('cursor: %s', curs) for query in queries: try: args = query['args'] except KeyError: args = None try: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx curs.execute(query['cmd'], args) xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx if verbose: gmConnectionPool.log_cursor_state(curs) except PG_ERROR_EXCEPTION as pg_exc: _log.error('query failed in RO connection') gmConnectionPool.log_pg_exception_details(pg_exc) try: curs_close() except PG_ERROR_EXCEPTION as pg_exc2: _log.exception('cannot close cursor') gmConnectionPool.log_pg_exception_details(pg_exc2) try: tx_rollback() # need to rollback so ABORT state isn't preserved in pooled conns except PG_ERROR_EXCEPTION as pg_exc2: _log.exception('cannot rollback transaction') gmConnectionPool.log_pg_exception_details(pg_exc2) if pg_exc.pgcode == sql_error_codes.INSUFFICIENT_PRIVILEGE: details = 'Query: [%s]' % curs.query.strip().strip('\n').strip().strip('\n') if curs.statusmessage != '': details = 'Status: %s\n%s' % ( curs.statusmessage.strip().strip('\n').strip().strip('\n'), details ) if pg_exc.pgerror is None: msg = '[%s]' % pg_exc.pgcode else: msg = '[%s]: %s' % (pg_exc.pgcode, pg_exc.pgerror) raise gmExceptions.AccessDenied ( msg, source = 'PostgreSQL', code = pg_exc.pgcode, details = details ) raise except Exception: _log.exception('error during query run in RO connection') gmConnectionPool.log_cursor_state(curs) try: curs_close() except PG_ERROR_EXCEPTION as pg_exc: _log.exception('cannot close cursor') gmConnectionPool.log_pg_exception_details(pg_exc) try: tx_rollback() # need to rollback so ABORT state isn't preserved in pooled conns except PG_ERROR_EXCEPTION as pg_exc: _log.exception('cannot rollback transation') gmConnectionPool.log_pg_exception_details(pg_exc) raise data = None col_idx = None if return_data: data = curs.fetchall() if verbose: _log.debug('last query returned [%s (%s)] rows', curs.rowcount, len(data)) _log.debug('cursor description: %s', curs.description) if get_col_idx: col_idx = get_col_indices(curs) curs_close() # so we can see data committed meanwhile if the # link object had been passed in and thusly might # be part of a long-running read-only transaction readonly_rollback_just_in_case() return (data, col_idx) Thanks, Karsten -- GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B
On 3/14/20 9:12 AM, Karsten Hilbert wrote: > Dear Adrian, > > unfortunately, we don't have direct contact with the reporter > (only via our mailing list and they haven't answered so far). > >> How was psycopg2 installed? > > Given they are running a "standard" Debian/Unstable with a > Debian package of GNUmed installed and the matching psycopg2 > package version > > psycopg2 module version: 2.8.4 (dt dec pq3 ext lo64) > > I suspect they apt-get install'ed it. Which assumption seems > supported by the package file string in: > > PostgreSQL via DB-API module "<module 'psycopg2' from '/usr/lib/python3/dist-packages/psycopg2/__init__.py'>": APIlevel 2.0, thread safety 2, parameter style "pyformat" > > The one thing that's odd is that the Python version > > Python 3.7.6 (default, Jan 19 2020, 22:34:52) <\n>[GCC 9.2.1 20200117] on linux (posix) > > does not match what's available from Debian: > > python3: > Installiert: 3.7.3-1 > Installationskandidat: 3.7.3-1 > Versionstabelle: > 3.8.2-1 500 > 500 https://deb.debian.org/debian unstable/main i386 Packages > 3.7.5-3 500 > 500 https://deb.debian.org/debian bullseye/main i386 Packages > *** 3.7.3-1 990 > 990 https://deb.debian.org/debian buster/main i386 Packages > 100 /var/lib/dpkg/status To me resolving the above would help with the below. I have to go out shortly and will not be able to look over the below in detail until later. > >> What is the code being run when the errors are thrown? > > It happens (two reports so far) at different stages during > the bringup phase of the GNUmed client. The code being run is > the same, but the time of occurrence is different. > > case 1: > > select vco.value from cfg.v_cfg_opts_numeric vco where vco.owner = CURRENT_USER and vco.workplace = %(wp)s and vco.option= %(opt)s and vco.cookie is null limit 1 > > case 2: > > select _(message) from cfg.db_logon_banner > > where _() translates "message". This is the Python code being > run (the place of error marked with xxxxx): > > def run_ro_queries(link_obj=None, queries=None, verbose=False, return_data=True, get_col_idx=False): > """Run read-only queries. > > <queries> must be a list of dicts: > [ > {'cmd': <string>, 'args': <dict> or <tuple>}, > {...}, > ... > ] > """ > if isinstance(link_obj, dbapi._psycopg.cursor): > curs = link_obj > curs_close = lambda *x:None > tx_rollback = lambda *x:None > readonly_rollback_just_in_case = lambda *x:None > elif isinstance(link_obj, dbapi._psycopg.connection): > curs = link_obj.cursor() > curs_close = curs.close > tx_rollback = link_obj.rollback > if link_obj.autocommit is True: # readonly connection ? > readonly_rollback_just_in_case = link_obj.rollback > else: > # do not rollback readonly queries on passed-in readwrite > # connections just in case because they may have already > # seen fully legitimate write action which would get lost > readonly_rollback_just_in_case = lambda *x:None > elif link_obj is None: > conn = get_connection(readonly = True, verbose = verbose) > curs = conn.cursor() > curs_close = curs.close > tx_rollback = conn.rollback > readonly_rollback_just_in_case = conn.rollback > else: > raise ValueError('link_obj must be cursor, connection or None but not [%s]' % link_obj) > > if verbose: > _log.debug('cursor: %s', curs) > > for query in queries: > try: > args = query['args'] > except KeyError: > args = None > try: > xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx > curs.execute(query['cmd'], args) > xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx > if verbose: > gmConnectionPool.log_cursor_state(curs) > except PG_ERROR_EXCEPTION as pg_exc: > _log.error('query failed in RO connection') > gmConnectionPool.log_pg_exception_details(pg_exc) > try: > curs_close() > except PG_ERROR_EXCEPTION as pg_exc2: > _log.exception('cannot close cursor') > gmConnectionPool.log_pg_exception_details(pg_exc2) > try: > tx_rollback() # need to rollback so ABORT state isn't preserved in pooled conns > except PG_ERROR_EXCEPTION as pg_exc2: > _log.exception('cannot rollback transaction') > gmConnectionPool.log_pg_exception_details(pg_exc2) > if pg_exc.pgcode == sql_error_codes.INSUFFICIENT_PRIVILEGE: > details = 'Query: [%s]' % curs.query.strip().strip('\n').strip().strip('\n') > if curs.statusmessage != '': > details = 'Status: %s\n%s' % ( > curs.statusmessage.strip().strip('\n').strip().strip('\n'), > details > ) > if pg_exc.pgerror is None: > msg = '[%s]' % pg_exc.pgcode > else: > msg = '[%s]: %s' % (pg_exc.pgcode, pg_exc.pgerror) > raise gmExceptions.AccessDenied ( > msg, > source = 'PostgreSQL', > code = pg_exc.pgcode, > details = details > ) > raise > except Exception: > _log.exception('error during query run in RO connection') > gmConnectionPool.log_cursor_state(curs) > try: > curs_close() > except PG_ERROR_EXCEPTION as pg_exc: > _log.exception('cannot close cursor') > gmConnectionPool.log_pg_exception_details(pg_exc) > try: > tx_rollback() # need to rollback so ABORT state isn't preserved in pooled conns > except PG_ERROR_EXCEPTION as pg_exc: > _log.exception('cannot rollback transation') > gmConnectionPool.log_pg_exception_details(pg_exc) > raise > > data = None > col_idx = None > if return_data: > data = curs.fetchall() > if verbose: > _log.debug('last query returned [%s (%s)] rows', curs.rowcount, len(data)) > _log.debug('cursor description: %s', curs.description) > if get_col_idx: > col_idx = get_col_indices(curs) > > curs_close() > # so we can see data committed meanwhile if the > # link object had been passed in and thusly might > # be part of a long-running read-only transaction > readonly_rollback_just_in_case() > return (data, col_idx) > > > Thanks, > Karsten > -- > GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B > > -- Adrian Klaver adrian.klaver@aklaver.com
On Sat, Mar 14, 2020 at 09:26:39AM -0700, Adrian Klaver wrote: > > The one thing that's odd is that the Python version > > > > Python 3.7.6 (default, Jan 19 2020, 22:34:52) <\n>[GCC 9.2.1 20200117] on linux (posix) > > > > does not match what's available from Debian: > > > > python3: > > Installiert: 3.7.3-1 > > Installationskandidat: 3.7.3-1 > > Versionstabelle: > > 3.8.2-1 500 > > 500 https://deb.debian.org/debian unstable/main i386 Packages > > 3.7.5-3 500 > > 500 https://deb.debian.org/debian bullseye/main i386 Packages > > *** 3.7.3-1 990 > > 990 https://deb.debian.org/debian buster/main i386 Packages > > 100 /var/lib/dpkg/status > > To me resolving the above would help with the below. I know, that would have been my first guess, too. I will see whether OP will return with data. Karsten -- GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B
On 3/14/20 10:41 AM, Karsten Hilbert wrote: > On Sat, Mar 14, 2020 at 09:26:39AM -0700, Adrian Klaver wrote: > >>> The one thing that's odd is that the Python version >>> >>> Python 3.7.6 (default, Jan 19 2020, 22:34:52) <\n>[GCC 9.2.1 20200117] on linux (posix) >>> >>> does not match what's available from Debian: >>> >>> python3: >>> Installiert: 3.7.3-1 >>> Installationskandidat: 3.7.3-1 >>> Versionstabelle: >>> 3.8.2-1 500 >>> 500 https://deb.debian.org/debian unstable/main i386 Packages >>> 3.7.5-3 500 >>> 500 https://deb.debian.org/debian bullseye/main i386 Packages >>> *** 3.7.3-1 990 >>> 990 https://deb.debian.org/debian buster/main i386 Packages >>> 100 /var/lib/dpkg/status >> >> To me resolving the above would help with the below. > > I know, that would have been my first guess, too. I will see > whether OP will return with data. Yeah, I just know that there have been problems with the wheel(binary) installs of psycopg2 and mismatched external libraries: https://www.psycopg.org/docs/install.html Binary install from PyPI I could see that also happening if you are using a disto build of a package(psycopg2) with a third party build of Python. FYI, when I was trying to figure out the install procedure for GNUmed this link: https://www.gnumed.de/documentation/InstallerGuideHome threw a 404 error. > > Karsten > -- > GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B > > -- Adrian Klaver adrian.klaver@aklaver.com
On Sun, 15 Mar 2020, 04:34 Karsten Hilbert, <Karsten.Hilbert@gmx.net> wrote:
Hi all,
we have two reports for psycopg2 2.8 under Python 3 on Debian
showing a problem with the wire protocol (full logs appended):
psycopg2.OperationalError: datos insuficientes en el mensaje «T»
or
psycopg2.DatabaseError: el contenido del mensaje no concuerda con el largo, en el mensaje tipo «C»
The error happens at different times when running the same
code. The versions involved:
psycopg2 module version: 2.8.4 (dt dec pq3 ext lo64)
...
libpq version (loaded now) : 120002
...
We don't yet know the exact version of PostgreSQL this is
running against but it is likely 12.2-1+b1 from
Debian/Unstable.
Any ideas or hints what to look at next ?
The error comes from the libpq failing to parse messages from the server (at a glance CommandComplete and RowDescription - see https://www.postgresql.org/docs/current/protocol-message-formats.html) which are probably just the most common ones.
I doubt it's a bug with libpq 12.2 or whatever server is on the other side of the wire: we would have seen this error more often. If you want to investigate in that direction you may ask core devs if there have been changes in that area for those versions.
My wild guess is either faulty network or faulty memory on the machine, most likely the latter.
-- Daniele
On 3/14/20 9:12 AM, Karsten Hilbert wrote: > Dear Adrian, > > unfortunately, we don't have direct contact with the reporter > (only via our mailing list and they haven't answered so far). > >> How was psycopg2 installed? > > Given they are running a "standard" Debian/Unstable with a > Debian package of GNUmed installed and the matching psycopg2 > package version > > psycopg2 module version: 2.8.4 (dt dec pq3 ext lo64) > > I suspect they apt-get install'ed it. Which assumption seems > supported by the package file string in: > > PostgreSQL via DB-API module "<module 'psycopg2' from '/usr/lib/python3/dist-packages/psycopg2/__init__.py'>": APIlevel 2.0, thread safety 2, parameter style "pyformat" > > The one thing that's odd is that the Python version > > Python 3.7.6 (default, Jan 19 2020, 22:34:52) <\n>[GCC 9.2.1 20200117] on linux (posix) > > does not match what's available from Debian: > > python3: > Installiert: 3.7.3-1 > Installationskandidat: 3.7.3-1 > Versionstabelle: > 3.8.2-1 500 > 500 https://deb.debian.org/debian unstable/main i386 Packages > 3.7.5-3 500 > 500 https://deb.debian.org/debian bullseye/main i386 Packages > *** 3.7.3-1 990 > 990 https://deb.debian.org/debian buster/main i386 Packages > 100 /var/lib/dpkg/status > >> What is the code being run when the errors are thrown? > > It happens (two reports so far) at different stages during > the bringup phase of the GNUmed client. The code being run is > the same, but the time of occurrence is different. > > case 1: > > select vco.value from cfg.v_cfg_opts_numeric vco where vco.owner = CURRENT_USER and vco.workplace = %(wp)s and vco.option= %(opt)s and vco.cookie is null limit 1 > > case 2: > > select _(message) from cfg.db_logon_banner > > where _() translates "message". This is the Python code being > run (the place of error marked with xxxxx): > Went through the function and the attached logs from the OP. I do not pretend to understand all of what the logs are showing. Still I do not see anything that stands out to me. -- Adrian Klaver adrian.klaver@aklaver.com
On 3/14/20 12:35 PM, Daniele Varrazzo wrote: > > > > The error comes from the libpq failing to parse messages from the server > (at a glance CommandComplete and RowDescription - see > https://www.postgresql.org/docs/current/protocol-message-formats.html) > which are probably just the most common ones. > > I doubt it's a bug with libpq 12.2 or whatever server is on the other > side of the wire: we would have seen this error more often. If you want > to investigate in that direction you may ask core devs if there have > been changes in that area for those versions. I did run across this: https://www.postgresql.org/docs/12/release-12-2.html " Fix bugs in handling of non-blocking I/O when using GSSAPI encryption (Tom Lane) These errors could result in dropping data (usually leading to subsequent wire-protocol-violation errors) or in a “livelock” situation where a sending process goes to sleep although not all its data has been sent. Moreover, libpq failed to keep separate encryption state for each connection, creating the possibility for failures in applications using multiple encrypted database connections. " Would be nice to know what actual Postgres version is and how it is configured? > > My wild guess is either faulty network or faulty memory on the machine, > most likely the latter. > > -- Daniele -- Adrian Klaver adrian.klaver@aklaver.com
On Sat, Mar 14, 2020 at 01:29:58PM -0700, Adrian Klaver wrote: > > The error comes from the libpq failing to parse messages from the server > > (at a glance CommandComplete and RowDescription - see > > https://www.postgresql.org/docs/current/protocol-message-formats.html) > > which are probably just the most common ones. > > > > I doubt it's a bug with libpq 12.2 or whatever server is on the other > > side of the wire: we would have seen this error more often. If you want > > to investigate in that direction you may ask core devs if there have > > been changes in that area for those versions. > > I did run across this: > > https://www.postgresql.org/docs/12/release-12-2.html > > " > > Fix bugs in handling of non-blocking I/O when using GSSAPI encryption (Tom > Lane) > > These errors could result in dropping data (usually leading to subsequent > wire-protocol-violation errors) or in a “livelock” situation where a sending > process goes to sleep although not all its data has been sent. Moreover, > libpq failed to keep separate encryption state for each connection, creating > the possibility for failures in applications using multiple encrypted > database connections. > " > > Would be nice to know what actual Postgres version is and how it is > configured? As long as OP doesn't get back in touch we probably can't do much more. Many thanks for providing insights. I'll post more if more info becomes available. Thanks, Karsten -- GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B
> FYI, when I was trying to figure out the install procedure for GNUmed Thanks for bothering. > this link: > > https://www.gnumed.de/documentation/InstallerGuideHome > > threw a 404 error. Thanks for reporting. I am slowly rebuilding the documention after a foswiki proved to insecure ;-) Here's the new site: https://www.gnumed.de I have uploaded a *minimal* installation sort-of-guide: https://www.gnumed.de/documentation/GNUmedInstallation.html HTH, Karsten -- GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B