Re: git: uh-oh

From Magnus Hagander
Subject Re: git: uh-oh
Date
Msg-id AANLkTim+gfwcLXKZ5JkP-5sJVFkfWdVhePhC1=Fd5OK8@mail.gmail.com
In reply to Re: git: uh-oh  (Max Bowsher <maxb@f2s.com>)
List pgsql-hackers
On Wed, Aug 25, 2010 at 13:03, Max Bowsher <maxb@f2s.com> wrote:
> On 25/08/10 09:18, Magnus Hagander wrote:
>> On Wed, Aug 25, 2010 at 07:11, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Robert Haas <robertmhaas@gmail.com> writes:
>
>>>> 2. Any non-ASCII characters in, for example, contributor's names show
>>>> up differently in the two repos.  Generally, the original repo is OK
>>>> and the new repo is garbled; although I found one very old example
>>>> that went the other way.
>>>
>>> What it looks like to me is that a Latin1->UTF8 conversion has been
>>> applied to the log text.  Which might be a good idea if it all *was*
>>> Latin1, but a fair-sized percentage isn't.  Applying this conversion to
>>> UTF8 entries results in garbage, of course.  Even if this could be done
>>> reliably, I think this counts as editorializing on the historical
>>> record, and should be switched off if possible.
>>
>> I think the problem is that we have a mix of them :( git requires it to be utf8.
>>
>> cvs2git is configured to try, in order, latin1, utf8 and ascii, and
>> use whichever first returns correct result. In this case it seems it
>> does return saying things are right, because the result is valid utf8
>> - just not the utf8 we expected.
>>
>> I can give it a try the other way around - trying utf8 *before*
>> latin1, to see if that makes it better - utf8 tends to be more strict.
>
> *Every* byte sequence is valid latin1, therefore if you try latin1,
> utf8, ascii in that order, latin1 will always be used.
>
> You most likely want utf8, latin1 (no point also including ascii since
> it's a strict subset of latin1).

Yup. I re-ran it with utf8, latin1, ascii and that commit looks better now.
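
To illustrate why the order matters, here is a minimal Python sketch of a
"try each encoding in turn" decoder. It is not cvs2git's actual code; the
function name decode_with_fallback and the sample name "Peter Müller" are
made up for the example.

```python
def decode_with_fallback(raw, encodings):
    """Return (text, encoding) for the first encoding that decodes raw cleanly."""
    for name in encodings:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding could decode the input")


# Hypothetical log-message bytes: the same name stored in each encoding.
utf8_bytes = "Peter Müller".encode("utf-8")     # b'Peter M\xc3\xbcller'
latin1_bytes = "Peter Müller".encode("latin1")  # b'Peter M\xfcller'

# latin1 first: decoding never fails, so UTF-8 input is silently garbled.
garbled = decode_with_fallback(utf8_bytes, ["latin1", "utf8", "ascii"])

# utf8 first: real UTF-8 is decoded correctly, and Latin-1 input is rejected
# by the stricter UTF-8 decoder and falls through to latin1.
from_utf8 = decode_with_fallback(utf8_bytes, ["utf8", "latin1"])
from_latin1 = decode_with_fallback(latin1_bytes, ["utf8", "latin1"])

print(garbled)
print(from_utf8)
print(from_latin1)
```

With latin1 first, the UTF-8 input comes back as "Peter MÃ¼ller"; with utf8
first, both inputs come back as "Peter Müller". The utf8-first order can
still guess wrong for Latin-1 text that happens to be valid UTF-8, but such
byte sequences are rare in real names.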

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

