Discussion: Error inserting RFC1738-encoded URLs
Hello,

Sometimes I get encoding errors when inserting an encoded URL into a text field. The database uses UTF8, with both the collation and the ctype set to en_US.UTF-8, and the URL column is defined as VARCHAR(1024). If a URL is longer than 1024 characters, the software truncates it.

The inserted URLs are extracted from the log file of the Squid proxy, which is encoded in UTF-8. The URLs use RFC 1738 percent-encoding for all non-ASCII characters in the path and query sections, and Punycode for characters in the host (authority) section.

RFC 1738 -> http://www.ietf.org/rfc/rfc1738.txt

Examples of URLs that raise the error:

http://www.formacion.aimplas.es/_Documentos/2011/FORMACIÓN%20ABIERTA/Folleto%20Especialistas%20Universitarios%20Polímeros%20ok.pdf
http://ads.prisacom.com/RealMedia/ads/adstream_mjx.ads/www.elpais.es/edicionimpresa/deportes/articulos/1452867580@Middle,Middle1,Top,Top2,TopRight,x02,x20?search=VUELTA%20A%20ESPAÑA,Ciclismo,Deportes
http://ads.prisacom.com/RealMedia/ads/adstream_nx.ads/www.elpais.es/edicionimpresa/deportes/articulos/1452867580@Middle,Middle1,Top,Top2,TopRight,x02,x20!Middle?search=VUELTA%20A%20ESPAÑA,Ciclismo,Deportes
http://www.t-a-o.com/ES/moda-bebe-nino/pantalón/flash/zoom.swf?image_lien=52905_C1057_A_zoom.jpg&lang=ES
http://static.slidesharecdn.com/swf/menu.swf?embedCode=<div%20style="width:425px"%20id="__ss_1320169">%20<strong%20style="display:block;margin:12px%200%204px"><a%20href="http://www.slideshare.net/raimonesteve/que-es-openerp"%20title="¿Que%20es%20Openerp?"%20target="_blank">¿Que%20es%20Openerp?</a></strong>%20<iframe%20src="http://www.slideshare.net/slideshow/embed_code/1320169"%20width="425"%20height="355"%20frameborder="0"%20marginwidth="0"%20marginheight="0"%20scrolling="no"></iframe>%20<div%20style="padding:5px%200%2012px">%20View%20more%20<a%20href="http://www.slideshare.net/"%20target="_blank">presentations</a>%20from%20<a%20href="http://www.slideshare.net/raimonesteve"%20target="_blank">raimonesteve</a>%20</div>%20</div>&showID=1320169&showURL=http://www.slideshare.net/raimonesteve/que-es-openerp

---------------- End URL examples --------------------------------

Does anyone know what I must do to be able to safely insert any HTTP URL?

Thanks for your time,
Javier
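One way to make any URL safe to insert is to percent-encode every character that may not appear unquoted in a URL before the INSERT. Below is a minimal Python sketch, assuming the URL has already been decoded into a text string. It escapes each offending character as its UTF-8 bytes (an assumption about which byte encoding to use). It leaves reserved delimiters and any existing `%XX` escapes untouched. The result is pure ASCII, so it is valid under any server encoding, and cutting it to the VARCHAR(1024) limit cannot split a multibyte character. `normalize_url`, `SAFE_CHARS` and `MAX_URL_LEN` are hypothetical names for this sketch, not part of Squid or DBI.

```python
from urllib.parse import quote

# Hypothetical constants for this sketch.
# Reserved URL delimiters plus '%' are left as-is, so escapes already
# present in the Squid log (e.g. %20) are not double-encoded.
SAFE_CHARS = ":/?#[]@!$&'()*+,;=%"
MAX_URL_LEN = 1024  # length of the VARCHAR(1024) column in the post


def normalize_url(url: str, max_len: int = MAX_URL_LEN) -> str:
    """Percent-encode characters that may not appear unquoted in a URL.

    Non-ASCII characters such as 'Ó' become their UTF-8 escapes
    ('%C3%93'); spaces, double quotes and angle brackets are escaped too.
    The result is pure ASCII, so slicing it by characters can never
    split a multibyte sequence.
    """
    encoded = quote(url, safe=SAFE_CHARS)
    return encoded[:max_len]


if __name__ == "__main__":
    samples = [
        "http://www.formacion.aimplas.es/_Documentos/2011/FORMACIÓN%20ABIERTA/"
        "Folleto%20Especialistas%20Universitarios%20Polímeros%20ok.pdf",
        "http://www.t-a-o.com/ES/moda-bebe-nino/pantalón/flash/zoom.swf"
        "?image_lien=52905_C1057_A_zoom.jpg&lang=ES",
    ]
    for s in samples:
        print(normalize_url(s))
```

Truncation can still cut a `%XX` escape in half at the 1024-character boundary. That gives a malformed URL, but not an encoding error on insert.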
Thanks for your reply, Marti. I will try to complete the description of the problem.

I get errors like this:

Error inserting data: INSERT INTO squid_access ( bytes, event, elapsed, rfc931, timestamp, url, method, peer, mimetype, remotehost, code) VALUES ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
ERROR: invalid byte sequence for encoding "UTF8": 0xe97469
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".

I am sending the data to the database through the Perl DBI module. Tomorrow I will make sure that the data is read as UTF-8, but I think I had already tried encoding it as UTF-8 before (I'm not sure; it was some weeks ago).

Anyway, given this new information, do you still think the problem is that the data is sent in the wrong encoding?

Regards,
Javier

On 10/24/2011 01:37 PM, Marti Raudsepp wrote:
> On Mon, Oct 24, 2011 at 10:27, Javier Amor garcia <jamor@zentyal.com> wrote:
>> sometimes I get encoding errors when inserting an encoded URL in a text
>> field.
>
> You forgot the most important thing: *What's* the error that you get?
>
>> http://www.formacion.aimplas.es/_Documentos/2011/FORMACIÓN%20ABIERTA/Folleto%20Especialistas%20Universitarios%20Polímeros%20ok.pdf
>
> Since I have to guess, I suspect you're sending these strings to
> Postgres in a non-UTF-8 encoding.
>
> This isn't a valid URL anyway -- you can't have unquoted "Ó" or "í"
> characters since they're not valid ASCII. But a 'varchar' field would
> accept them anyway if you send them in the right encoding.
>
> Regards,
> Marti
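The byte sequence in the error is a useful clue. 0xE9 is 'é' in ISO-8859-1 (Latin-1). In UTF-8, 0xE9 would have to start a three-byte sequence, and the next byte, 0x74 ('t'), is not a valid continuation byte. That suggests at least some log lines contain Latin-1 bytes rather than UTF-8. Below is a minimal Python sketch of a defensive decoding step. It assumes, without confirmation in the thread, that any field which is not valid UTF-8 is Latin-1. `decode_log_field` is a hypothetical helper, not part of Squid or DBI.

```python
def decode_log_field(raw: bytes) -> str:
    """Decode a raw field from the Squid log into text.

    Assumption for this sketch: the field is UTF-8 when it is valid UTF-8,
    and ISO-8859-1 (Latin-1) otherwise. Latin-1 maps every byte to a
    character, so the fallback never raises, but it will produce
    mojibake if the real encoding is something else.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


if __name__ == "__main__":
    bad = bytes([0xE9, 0x74, 0x69])  # the bytes reported by PostgreSQL
    try:
        bad.decode("utf-8")
    except UnicodeDecodeError as exc:
        print("not UTF-8:", exc)
    text = decode_log_field(bad)
    print(repr(text))            # 'éti'
    print(text.encode("utf-8"))  # b'\xc3\xa9ti'
```

The decoded string still has to reach the server as UTF-8, matching the client_encoding mentioned in the HINT. How to ensure that depends on the driver in use.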
On Mon, Oct 24, 2011 at 10:27, Javier Amor garcia <jamor@zentyal.com> wrote:
> sometimes I get encoding errors when inserting an encoded URL in a text
> field.

You forgot the most important thing: *What's* the error that you get?

> http://www.formacion.aimplas.es/_Documentos/2011/FORMACIÓN%20ABIERTA/Folleto%20Especialistas%20Universitarios%20Polímeros%20ok.pdf

Since I have to guess, I suspect you're sending these strings to Postgres in a non-UTF-8 encoding.

This isn't a valid URL anyway -- you can't have unquoted "Ó" or "í" characters since they're not valid ASCII. But a 'varchar' field would accept them anyway if you send them in the right encoding.

Regards,
Marti