Reporting UnicodeEncodeError info on arbitrary data sent to PG with psycopg3
От | Karl O. Pinc |
---|---|
Тема | Reporting UnicodeEncodeError info on arbitrary data sent to PG with psycopg3 |
Дата | |
Msg-id | 20240213193732.28cb8329@slate.karlpinc.com обсуждение исходный текст |
Ответы |
Re: Reporting UnicodeEncodeError info on arbitrary data sent to PG with psycopg3
|
Список | psycopg |
Hi, I'm writing a program which takes a text file from the user, constructs an INPUT statement (the table and column names are not known in advance), and uses the constructed statement to insert rows of the text file into PG. psycopg3 v3.1.18 (installed in a venv) PG v15.5 Python 3.11.2 Debian 12.5 (stable, bookworm) Should the user supply text that cannot be converted to the server's encoding I want to report exactly where in the data the problem lies. The sample code below contains (some of) what I want to report when there is a problem. Because I want to know more than just "this particular row of data caused a problem", after an error is discovered the code below calls a psycopg3 dumper on each data value to try to determine which data value is problematic. This does not work. What is wrong with what I'm doing and how do I do what I want? (And how am I supposed to know why this does not work and what works?) I call the dumper because I want to rely on psycopg3's mechanisms and not have to query the server for its encoding and figure out the PG->Python encoding mappings myself. Thanks for the help. Setup: Client side encoding is UTF-8. create database test_latin1 template template0 encoding 'LATIN1' locale 'C'; -- In the latin1 db create table foo (col1 text, col2 text, col3 text); Mock up of desired code (only slightly mangled by my email client): -----------------<snip>----------- #!/usr/bin/python3 # # Test handing of database insertion encoding exception import psycopg import psycopg.adapt import sys # Latin1 does not have the euro symbol data_row = ("ascii", "euro € symbol", "more ascii") # Connect to an existing database with psycopg.connect("dbname=test_latin1 user=test") as conn: text_dumper = conn.adapters.get_dumper( str, psycopg.adapt.PyFormat.TEXT) # Open a cursor to perform database operations with conn.cursor() as cur: try: cur.execute( """ insert into foo (col1, col2, col3) values (%s, %s, %s) """, data_row, ) except UnicodeEncodeError: # Call the text dumper on each data value to find which # caused the problem and provide a detailed report. col = 1 for col_val in data_row: try: text_dumper.dump(col_val) except UnicodeEncodeError as err: print( f"Column number {col} ({col_val}) contains a" " character" f" ({col_val[err.start, err.end]}), position" f" {err.start + 1} through {err.end}, that" " cannot be converted to the server-side" " encoding" f" ({err.encoding})" ) print(str(err)) sys.exit() col += 1 print("Cannot find cause of error") sys.exit() print("Unexpectedly got no error") sys.exit() -----------------<snip>----------- The relevant traceback (after the traceback of the initial exception that is caught) is: Traceback (most recent call last): File "/home/kop/projects/devel/test_encoding.py", line 32, in <module> text_dumper.dump(col_val) TypeError: _StrDumper.dump() missing 1 required positional argument: 'obj' Regards, Karl <kop@karlpinc.com> Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein