Re: [HACKERS] Implementation of SASLprep for SCRAM-SHA-256

From: Heikki Linnakangas
Subject: Re: [HACKERS] Implementation of SASLprep for SCRAM-SHA-256
Msg-id: 7560f076-a3c4-bcf7-09f7-bf7f10be78dd@iki.fi
In reply to: Re: [HACKERS] Implementation of SASLprep for SCRAM-SHA-256  (Heikki Linnakangas <hlinnaka@iki.fi>)
Replies: Re: [HACKERS] Implementation of SASLprep for SCRAM-SHA-256
List: pgsql-hackers
On 04/06/2017 07:59 PM, Heikki Linnakangas wrote:
> Another thing I'd like some more eyes on, is how this will work with
> encodings other than UTF-8. We will now try to normalize the password as
> if it was in UTF-8, even if it isn't. That's OK as long as we're
> consistent about it, but there is one worrisome scenario: what if the
> user's password consists mostly of characters that, when interpreted as
> UTF-8, are in the list of ignored characters. IOW, is it realistic that
> a user might have a password in a non-UTF-8 encoding, that gets silently
> mangled into something much shorter? I think that's highly unlikely, but
> can anyone come up with a plausible example of that?

I did some testing on what the byte sequences for the Unicode characters 
that SASLprep ignores mean in other encodings. I created a text file 
containing every ignored character, in UTF-8, and ran "iconv -f <other 
encoding> -t UTF-8//TRANSLIT" on the file, using all supported server 
encodings. The idea is to take each of the ignored byte sequences and 
pretend that it is in some other encoding. If converting it to UTF-8 
results in a legitimate character, then that byte sequence means 
something in that encoding, and could be misinterpreted if it's used in 
a password.

Here are some characters that could plausibly be misinterpreted and 
ignored by SASLprep:

-------
EUC-JP and EUC-JISX0213:

U+00AD (C2 AD): 足 (meaning "foot", per Unihan database)
U+FE00-FE0F (EF B8 8X): 鏝 (meaning "trowel", per Unihan database)

EUC-CN:

U+00AD (C2 AD): 颅 (meaning "skull", per Unihan database)
U+FE00-FE0F (EF B8 8X): 锔 (meaning "curium", per Unihan database)
U+FEFF (EF BB BF): 锘 (meaning "nobelium", per Wikipedia)

EUC-KR:

U+FE00-FE0F (EF B8 8X): 截 (meanings "cut off, stop, obstruct, 
intersect", per Unihan database)
U+FEFF (EF BB BF): 癤 (meanings "pimple, sore, boil", per Unihan database)

EUC-TW:

U+FE00-FE0F (EF B8 8X): 踫 (meanings "collide, bump into", per Unihan database)
U+FEFF (EF BB BF): 踢 (meaning "kick", per Unihan database)

CP866:

U+1806 (E1 A0 86): саЖ
U+180B (E1 A0 8B): саЛ
U+180C (E1 A0 8C): саМ
U+180D (E1 A0 8D): саН
U+200B (E2 80 8B): тАЛ
U+200C (E2 80 8C): тАМ
U+200D (E2 80 8D): тАН
-------

The CP866 cases seem most likely to cause confusion. Several of them 
spell common Russian words, apart from the letter case (e.g. "сам", 
"там"). I don't know how common those Chinese/Japanese characters are.

Overall, I think this is OK. Even though some characters can be 
misinterpreted, for it to be a problem all of the following have to 
be true:

1. The client is using one of those encodings.
2. The password string as a whole has to look like valid UTF-8.
3. Ignoring those characters/words from the password would lead to a 
significantly weaker password, i.e. it was not very long to begin with, 
or it consisted almost entirely of those characters/words.
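
To see how these conditions interact, here is a minimal Python sketch of only the "mapped to nothing" step, applied to raw password bytes. It assumes, per condition 2, that bytes which are not valid UTF-8 are passed through unchanged; `strip_ignored` is a hypothetical name, and the rest of SASLprep (NFKC normalization, prohibited-character and bidi checks, mapping non-ASCII spaces) is deliberately left out.

```python
# Hypothetical sketch of only the "mapped to nothing" step of SASLprep
# (RFC 3454 table B.1). NFKC, prohibited-character and bidi checks, and the
# non-ASCII-space mapping are omitted.
MAPPED_TO_NOTHING = frozenset(
    [0x00AD, 0x034F, 0x1806, 0x180B, 0x180C, 0x180D,
     0x200B, 0x200C, 0x200D, 0x2060, 0xFEFF]
    + list(range(0xFE00, 0xFE10))
)


def strip_ignored(password: bytes) -> bytes:
    """Drop table B.1 characters, but only if the bytes are valid UTF-8."""
    try:
        text = password.decode("utf-8")
    except UnicodeDecodeError:
        # Condition 2: not valid UTF-8, so (as assumed here) the bytes are
        # used unmodified.
        return password
    kept = "".join(ch for ch in text if ord(ch) not in MAPPED_TO_NOTHING)
    return kept.encode("utf-8")


if __name__ == "__main__":
    # A client using CP866 sends these Cyrillic strings as raw bytes.
    for pw in ("тАЛтАМтАН", "там", "Пароль саЖ"):
        raw = pw.encode("cp866")
        out = strip_ignored(raw)
        print(raw.hex(" "), "->", out.hex(" ") or "(empty)")
```

With CP866 input, "тАЛтАМтАН" collapses to an empty password. The lowercase "там" (E2 A0 AC) decodes to an unrelated but valid character, U+282C, and survives, while "Пароль саЖ" starts with 0x8F, which is not a valid UTF-8 lead byte, so it is left alone. This illustrates why the mangling needs the exact byte sequences from the table, and why conditions 2 and 3 rarely hold together.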

Thoughts? Attached are the full results of running iconv with each 
encoding, from which I picked the above cases.

- Heikki



