Re: PATCH: Add uri percent-encoding for binary data

From: Anders Åstrand
Subject: Re: PATCH: Add uri percent-encoding for binary data
Msg-id: CAPwPebuhhnhr6KC45uEVBKwQsa44SdoLozGQDXdD=gEKOto1OA@mail.gmail.com
In response to: Re: PATCH: Add uri percent-encoding for binary data  (Isaac Morland <isaac.morland@gmail.com>)
List: pgsql-hackers

On Mon, Oct 7, 2019 at 11:38 PM Isaac Morland <isaac.morland@gmail.com> wrote:
>
> On Mon, 7 Oct 2019 at 03:15, Anders Åstrand <anders@449.se> wrote:
>>
>> Hello
>>
>> Attached is a patch for adding uri as an encoding option for
>> encode/decode. It uses what's called "percent-encoding" in rfc3986
>> (https://tools.ietf.org/html/rfc3986#section-2.1).
>>
>> The background for this patch is that I could easily build urls in
>> plpgsql, but doing the actual encoding of the url parts is painfully
>> slow. The list of available encodings for encode/decode looks quite
>> arbitrary to me, so I can't see any reason this one couldn't be in
>> there.
>>
>> In modern web scenarios one would probably most likely want to encode
>> the utf8 representation of a text string for inclusion in a url, in
>> which case correct invocation would be ENCODE(CONVERT_TO('some text in
>> database encoding goes here', 'UTF8'), 'uri'), but uri
>> percent-encoding can of course also be used for other text encodings
>> and arbitrary binary data.
>
>
> This seems like a useful idea to me. I've used the equivalent in Python and it provides more options:
>
> https://docs.python.org/3/library/urllib.parse.html#url-quoting
>
> I suggest reviewing that documentation there, because there are a few
> details that need to be checked carefully. Whether or not space should
> be encoded as plus and whether certain byte values should be exempt
> from %-encoding is something that depends on the application.
> Unfortunately, as far as I can tell there isn't a single version of URL
> encoding that satisfies all situations (thus explaining the complexity
> of the Python implementation). It might be feasible to suppress some of
> the Python options (I'm wondering about the safe= parameter) but I'm
> pretty sure you at least need the equivalent of quote and quote_plus.

Thanks a lot for your reply!

I agree that some (but not all) of the options available in that
Python library could be helpful for developers who want to build urls
without having to encode the separate parts and stitch them
together, but they are not necessary for this patch to be useful. For
generic uri encoding the slash (/) must be percent-encoded, because it
has special meaning in the standard. Some other characters may
appear unencoded depending on context, but it's generally safer
to just encode them all rather than hope that the encoder knows about
the context and skips over certain characters.
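
To illustrate the "encode them all" approach, here is a minimal Python
sketch, not the patch itself. It assumes the encoder leaves only the
rfc3986 unreserved characters (letters, digits, "-", ".", "_", "~")
untouched and percent-encodes every other byte, including the slash.
The name uri_encode is hypothetical; the last line compares the result
with urllib.parse.quote called with safe="".

```python
from urllib.parse import quote

# rfc3986 section 2.3 "unreserved" characters, as byte values.
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789-._~"
)


def uri_encode(data: bytes) -> str:
    """Percent-encode every byte that is not an unreserved character.

    Hypothetical helper for illustration only; the patch under
    discussion may make different choices.
    """
    return "".join(
        chr(b) if b in UNRESERVED else f"%{b:02X}" for b in data
    )


# The slash and the space are both encoded.
print(uri_encode("a/b c".encode("utf-8")))  # a%2Fb%20c

# Non-ASCII text is encoded byte by byte from its UTF-8 representation.
print(uri_encode("å".encode("utf-8")))  # %C3%A5

# Python's quote() with an empty safe= list agrees on this input.
print(uri_encode(b"a/b c") == quote(b"a/b c", safe=""))  # True
```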

This does bring up an interesting point however. Maybe decode should
validate that only characters that are allowed unencoded appear in the
input?
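
As a rough sketch of what such a strict decode could look like (in
Python rather than C, and only under the assumption that "allowed
unencoded" means the rfc3986 unreserved set), the hypothetical function
below rejects any literal character outside that set and any malformed
percent escape.

```python
import string

# Assumption: only rfc3986 unreserved characters may appear unencoded.
UNRESERVED = frozenset(string.ascii_letters + string.digits + "-._~")
HEX_DIGITS = frozenset(string.hexdigits)


def uri_decode_strict(text: str) -> bytes:
    """Decode percent-encoding, raising ValueError on anything unexpected.

    Hypothetical helper for illustration; not the behaviour of the patch.
    """
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "%":
            pair = text[i + 1:i + 3]
            # A valid escape is '%' followed by exactly two hex digits.
            if len(pair) != 2 or not all(c in HEX_DIGITS for c in pair):
                raise ValueError(f"malformed escape at offset {i}")
            out.append(int(pair, 16))
            i += 3
        elif ch in UNRESERVED:
            out.append(ord(ch))
            i += 1
        else:
            raise ValueError(f"character {ch!r} must be percent-encoded")
    return bytes(out)


print(uri_decode_strict("a%2Fb%20c"))  # b'a/b c'
```

With this sketch, input such as "a/b" or "%zz" raises ValueError
instead of being passed through silently.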

Luckily, the plus-encoding of spaces is not part of the uri standard
at all but instead part of the format referred to as
application/x-www-form-urlencoded data. Fortunately that format is
close to dying now that forms more often post json.
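
For reference, the difference between the two formats can be seen with
the Python functions Isaac mentioned. This only demonstrates Python's
urllib.parse behaviour; it says nothing about how the patch behaves.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

# Percent-encoding: a space becomes %20.
print(quote("a b&c", safe=""))  # a%20b%26c

# Form encoding (application/x-www-form-urlencoded): a space becomes "+",
# so a literal plus sign has to be escaped as %2B.
print(quote_plus("a b&c"))  # a+b%26c
print(quote_plus("1+1"))    # 1%2B1

# The decoders differ in the same way: only unquote_plus turns "+"
# back into a space.
print(unquote("a+b"))       # a+b
print(unquote_plus("a+b"))  # a b
```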

Regards,
Anders


