Reporting UnicodeEncodeError info on arbitrary data sent to PG with psycopg3

Поиск
Список
Период
Сортировка
От Karl O. Pinc
Тема Reporting UnicodeEncodeError info on arbitrary data sent to PG with psycopg3
Дата
Msg-id 20240213193732.28cb8329@slate.karlpinc.com
обсуждение исходный текст
Ответы Re: Reporting UnicodeEncodeError info on arbitrary data sent to PG with psycopg3
Список psycopg
Hi,

I'm writing a program which takes a text file from the user,
constructs an INPUT statement (the table and column names are not
known in advance), and uses the constructed statement
to insert rows of the text file into PG.

psycopg3 v3.1.18 (installed in a venv)
PG v15.5
Python 3.11.2
Debian 12.5 (stable, bookworm)

Should the user supply text that cannot be converted to the
server's encoding I want to report exactly where in the data
the problem lies.  The sample code below contains (some of)
what I want to report when there is a problem.

Because I want to know more than just "this particular
row of data caused a problem", after an error is
discovered the code below calls a psycopg3 dumper
on each data value to try to determine which data
value is problematic.

This does not work.  What is wrong with what I'm doing
and how do I do what I want?  (And how am I supposed to
know why this does not work and what works?)  I call the
dumper because I want to rely on psycopg3's mechanisms
and not have to query the server for its encoding
and figure out the PG->Python encoding mappings myself.

Thanks for the help.


Setup:

Client side encoding is UTF-8.

create database test_latin1 template template0 encoding 'LATIN1' locale 'C';

-- In the latin1 db
create table foo (col1 text, col2 text, col3 text);


Mock up of desired code (only slightly mangled by my email client):
-----------------<snip>-----------
#!/usr/bin/python3
#
# Test handing of database insertion encoding exception

import psycopg
import psycopg.adapt
import sys

# Latin1 does not have the euro symbol
data_row = ("ascii", "euro € symbol", "more ascii")

# Connect to an existing database
with psycopg.connect("dbname=test_latin1 user=test") as conn:
    text_dumper = conn.adapters.get_dumper(
                      str, psycopg.adapt.PyFormat.TEXT)

    # Open a cursor to perform database operations
    with conn.cursor() as cur:
        try:
            cur.execute(
                """
                insert into foo (col1, col2, col3)
                  values (%s, %s, %s)
                """,
                data_row,
            )
        except UnicodeEncodeError:
            # Call the text dumper on each data value to find which
            # caused the problem and provide a detailed report.
            col = 1
            for col_val in data_row:
                try:
                    text_dumper.dump(col_val)
                except UnicodeEncodeError as err:
                    print(
                        f"Column number {col} ({col_val}) contains a"
                        " character"
                        f" ({col_val[err.start, err.end]}), position"
                        f" {err.start + 1} through {err.end}, that"
                        " cannot be converted to the server-side"
                        " encoding"
                        f" ({err.encoding})"
                    )
                    print(str(err))
                    sys.exit()
                col += 1
            print("Cannot find cause of error")
            sys.exit()

print("Unexpectedly got no error")
sys.exit()
-----------------<snip>-----------

The relevant traceback (after the traceback of the initial
exception that is caught) is:

Traceback (most recent call last):
  File "/home/kop/projects/devel/test_encoding.py", line 32, in <module>
    text_dumper.dump(col_val)
TypeError: _StrDumper.dump() missing 1 required positional argument: 'obj'

Regards,

Karl <kop@karlpinc.com>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein



В списке psycopg по дате отправления:

Предыдущее
От: Daniele Varrazzo
Дата:
Сообщение: New releases, and happy new year!
Следующее
От: Daniele Varrazzo
Дата:
Сообщение: Re: Reporting UnicodeEncodeError info on arbitrary data sent to PG with psycopg3