Thread: [HACKERS] shift_sjis_2004 related autority files are remaining

[HACKERS] shift_sjis_2004 related autority files are remaining

From
Kyotaro HORIGUCHI
Date:
Hi, I happened to notice that the backend/utils/mb/Unicode directory
contains two encoding authority files, which I believe should not
be there.

euc-jis-2004-std.txt
sjis-0213-2004-std.txt

And what is more astonishing, make distclean didn't do its work.

| $ make distclean
| rm -f 

The Makefile there is missing the definition of TEXT.

# Sorry for the bogus patch by me..

The attached is the *first patch* that fixes distclean and adds
the two files into GENERICTEXTS.

=====

I haven't attached the *second* patch since it is too large for such
a trivial change and can be reproduced with the following steps.

$ cd src/backend/utils/mb/Unicode
$ git rm *.txt
$ git commit


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: [HACKERS] shift_sjis_2004 related autority files are remaining

From
Robert Haas
Date:
On Fri, Apr 7, 2017 at 1:59 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> Hi, I happened to notice that the backend/utils/mb/Unicode directory
> contains two encoding authority files, which I believe should not
> be there.
>
> euc-jis-2004-std.txt
> sjis-0213-2004-std.txt
>
> And what is more astonishing, make distclean didn't do its work.
>
> | $ make distclean
> | rm -f
>
> The Makefile there is missing the definition of TEXT.
>
> # Sorry for the bogus patch by me..
>
> The attached is the *first patch* that fixes distclean and adds
> the two files into GENERICTEXTS.
>
> =====
>
> I haven't attached the *second* patch since it is too large for such
> a trivial change and can be reproduced with the following steps.
>
> $ cd src/backend/utils/mb/Unicode
> $ git rm *.txt
> $ git commit

I think you are right about all of this, although I am not an expert
in this area.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] shift_sjis_2004 related autority files are remaining

From
Tatsuo Ishii
Date:
> Hi, I happened to notice that the backend/utils/mb/Unicode directory
> contains two encoding authority files, which I believe should not
> be there.
> 
> euc-jis-2004-std.txt
> sjis-0213-2004-std.txt

Why do you believe so?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: [HACKERS] shift_sjis_2004 related autority files are remaining

From
Michael Paquier
Date:
On Fri, Jun 23, 2017 at 8:12 AM, Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
>> Hi, I happened to notice that the backend/utils/mb/Unicode directory
>> contains two encoding authority files, which I believe should not
>> be there.

(Worked on that with Horiguchi-san a couple of weeks back.)

>> euc-jis-2004-std.txt
>> sjis-0213-2004-std.txt
>
> Why do you believe so?

Unicode/Makefile contains this rule:
euc-jis-2004-std.txt sjis-0213-2004-std.txt:
        $(DOWNLOAD) http://x0213.org/codetable/$(@F)

So those files ought to be downloaded when rebuilding the maps, and
they should not be in the tree. In short, I think that Horiguchi-san
is right. On top of the two pointed out by Horiguchi-san,
gb-18030-2000.xml should not be in the tree.
-- 
Michael
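
The argument in the message above, that any file with a $(DOWNLOAD)
rule can be re-fetched and so need not be tracked, can be checked
mechanically. The following is only an illustrative Python sketch, not
part of the thread or of the PostgreSQL tree: the function names
download_targets and tracked_downloadables are hypothetical, the
Makefile parsing is deliberately simplistic (tab-indented recipes, no
line continuations), and the list of tracked files is supplied by hand
rather than obtained from git.

```python
import re


def download_targets(makefile_text):
    """Return the set of targets whose recipe invokes $(DOWNLOAD)."""
    targets = set()
    current = []  # targets of the rule currently being read
    for line in makefile_text.splitlines():
        if line.startswith("\t"):
            # Recipe line: it belongs to the most recent rule header.
            if "$(DOWNLOAD)" in line:
                targets.update(current)
            continue
        # Rule header such as "a.txt b.txt:"; the lookahead skips ":=" assignments.
        match = re.match(r"([^#:=\s][^:=]*):(?!=)", line)
        current = match.group(1).split() if match else []
    return targets


def tracked_downloadables(makefile_text, tracked_files):
    """Return downloadable targets that are also present in the tree, sorted."""
    return sorted(download_targets(makefile_text) & set(tracked_files))


# Example input modelled on the rule quoted in the thread (assumed layout).
makefile = (
    "euc-jis-2004-std.txt sjis-0213-2004-std.txt:\n"
    "\t$(DOWNLOAD) http://x0213.org/codetable/$(@F)\n"
)
tracked = ["Makefile", "euc-jis-2004-std.txt", "sjis-0213-2004-std.txt"]
print(tracked_downloadables(makefile, tracked))
```

In a real checkout, the tracked list could come from the output of
"git ls-files" run in the Unicode directory; any name the sketch
reports is a file that is both committed and re-downloadable.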



Re: [HACKERS] shift_sjis_2004 related autority files are remaining

From
Tatsuo Ishii
Date:
>> Why do you believe so?
> 
> Unicode/Makefile contains this rule:
> euc-jis-2004-std.txt sjis-0213-2004-std.txt:
>         $(DOWNLOAD) http://x0213.org/codetable/$(@F)
> 
> So those files ought to be downloaded when rebuilding the maps, and
> they should not be in the tree. In short, I think that Horiguchi-san
> is right. On top of the two pointed out by Horiguchi-san,
> gb-18030-2000.xml should not be in the tree.

I think we should keep the original .txt files because:

- It allows us to track changes in the original files if we decide to
  change the map files.

- The site http://x0213.org/ may disappear in the future. If that
  happens, we will lose the record of how we created the map files.

I believe we had better follow the same approach that src/timezone
uses to keep the original timezone data.

The above reasoning would not hold if we had a way to reconstruct the
original txt files from the map files. However, I doubt it's worth
the trouble to create such tools.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: [HACKERS] shift_sjis_2004 related autority files are remaining

From
Michael Paquier
Date:
On Fri, Jun 23, 2017 at 9:39 AM, Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
> I think we should keep the original .txt files because:

Hm. I am wondering about licensing issues with keeping those files in
the tree. I am no lawyer.

> - It allows us to track changes in the original files if we decide to
>   change the map files.

You have done that in the past for a couple of codepoints, haven't you?

> - The site http://x0213.org/ may disappear in the future. If that
>   happens, we will lose the record of how we created the map files.

There are other problems then as there are 3 sites in use to fetch the data:
- GB2312.TXT comes from greenstone.org.
- Some from icu-project.org.
- The rest is from unicode.org.

> I believe we had better follow the same approach that src/timezone
> uses to keep the original timezone data.
>
> The above reasoning would not hold if we had a way to reconstruct the
> original txt files from the map files. However, I doubt it's worth
> the trouble to create such tools.

That's true as well. No need for reverse-engineering if there is no
reason to. That would be possible though.
-- 
Michael



Re: [HACKERS] shift_sjis_2004 related autority files are remaining

From
Tatsuo Ishii
Date:
> Hm. I am wondering about licensing issues with keeping those files in
> the tree. I am no lawyer.

Of course. Regarding euc-jis-2004-std.txt and sjis-0213-2004-std.txt,
it seems safe to keep them.

> ## Date: 13 May 2006
> ## License:
> ##     Copyright (C) 2001 earthian@tama.or.jp, All Rights Reserved.
> ##     Copyright (C) 2001 I'O, All Rights Reserved.
> ##     Copyright (C) 2006 Project X0213, All Rights Reserved.
> ##     You can use, modify, distribute this table freely.

>> - It allows us to track changes in the original files if we decide to
>>   change the map files.
> 
> You have done that in the past for a couple of codepoints, haven't you?

I believe the reason I didn't keep the other txt files was that their
licenses prohibited keeping copies of them.

>> - The site http://x0213.org/ may disappear in the future. If that
>>   happens, we will lose the record of how we created the map files.
> 
> There are other problems then as there are 3 sites in use to fetch the data:
> - GB2312.TXT comes from greenstone.org.
> - Some from icu-project.org.
> - The rest is from unicode.org.

Maybe, but I don't know how to deal with them.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: [HACKERS] shift_sjis_2004 related autority files are remaining

From
Kyotaro HORIGUCHI
Date:
At Fri, 23 Jun 2017 10:04:26 +0900 (JST), Tatsuo Ishii <ishii@sraoss.co.jp> wrote in
<20170623.100426.157023025943107410.t-ishii@sraoss.co.jp>
> > Hm. I am wondering about licensing issues with keeping those files in
> > the tree. I am no lawyer.
> 
> Of course. Regarding euc-jis-2004-std.txt and sjis-0213-2004-std.txt,
> it seems safe to keep them.
> 
> > ## Date: 13 May 2006
> > ## License:
> > ##     Copyright (C) 2001 earthian@tama.or.jp, All Rights Reserved.
> > ##     Copyright (C) 2001 I'O, All Rights Reserved.
> > ##     Copyright (C) 2006 Project X0213, All Rights Reserved.
> > ##     You can use, modify, distribute this table freely.
> 
> >> - It allows us to track changes in the original files if we decide to
> >>   change the map files.
> > 
> > > You have done that in the past for a couple of codepoints, haven't you?
> 
> I believe the reason I didn't keep the other txt files was that their
> licenses prohibited keeping copies of them.

To be clear, I personally prefer to keep all the source text files
in the repository, especially so that we can detect changes to
them. But since we decided that most of them should not be there
(for license reasons), I just don't see a reason to keep only the
rest, even though they are free of that restriction.

> >> - The site http://x0213.org/ may disappear in the future. If that
> >>   happens, we will lose the record of how we created the map files.
> > 
> > There are other problems then as there are 3 sites in use to fetch the data:
> > - GB2312.TXT comes from greenstone.org.
> > - Some from icu-project.org.
> > - The rest is from unicode.org.
> 
> Maybe, but I don't know how to deal with them.

Apart from detecting changes, as mentioned upthread, if we ever need
the authority files (why?) after the authority site is lost, we
can regenerate a linear mapping from a .map file. But I believe
that further changes (that we would need to follow) are unlikely.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] shift_sjis_2004 related autority files are remaining

From
Tatsuo Ishii
Date:
> To be clear, I personally prefer to keep all the source text files
> in the repository, especially so that we can detect changes to
> them. But since we decided that most of them should not be there
> (for license reasons), I just don't see a reason to keep only the
> rest, even though they are free of that restriction.

So are you saying that if n of the m authority files are not kept
because of license issues, then the remaining m-n authority files
should not be kept either? What benefit would we get from doing so?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: [HACKERS] shift_sjis_2004 related autority files are remaining

From
Peter Eisentraut
Date:
On 6/23/17 11:15, Tatsuo Ishii wrote:
>> To be clear, I personally prefer to keep all the source text files
>> in the repository, especially so that we can detect changes to
>> them. But since we decided that most of them should not be there
>> (for license reasons), I just don't see a reason to keep only the
>> rest, even though they are free of that restriction.
> 
> So are you saying that if n of the m authority files are not kept
> because of license issues, then the remaining m-n authority files
> should not be kept either? What benefit would we get from doing so?

I don't have a clear opinion on this particular issue, but I think we
should have clarity on why particular files or code exist.  So why do
these files exist and the others don't?  Is it just the license?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] shift_sjis_2004 related autority files are remaining

From
Tatsuo Ishii
Date:
> I don't have a clear opinion on this particular issue, but I think we
> should have clarity on why particular files or code exist.  So why do
> these files exist and the others don't?  Is it just the license?

I think so.

Many of those files are from http://ftp.unicode.org. There's no
license description there, so I think we should not copy those files,
to be on the safe side. (I vaguely recall that they explicitly
prohibited distributing the files in the past, but I cannot find such
a statement at the moment.)

gb-18030-2000.xml and windows-949-2000.xml are from
https://ssl.icu-project.org/. I do not know what licenses those files
use (maybe Apache).

euc-jis-2004-std.txt and sjis-0213-2004-std.txt are from
http://x0213.org. The license is described in the files.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: [HACKERS] shift_sjis_2004 related autority files are remaining

From
Kyotaro HORIGUCHI
Date:
Hi,

At Sun, 25 Jun 2017 09:20:10 +0900 (JST), Tatsuo Ishii <ishii@sraoss.co.jp> wrote in
<20170625.092010.542143642647288693.t-ishii@sraoss.co.jp>
> > I don't have a clear opinion on this particular issue, but I think we
> > should have clarity on why particular files or code exist.  So why do
> > these files exist and the others don't?  Is it just the license?
> 
> I think so.
> 
> Many of those files are from http://ftp.unicode.org. There's no
> license description there, so I think we should not copy those files,
> to be on the safe side. (I vaguely recall that they explicitly
> prohibited distributing the files in the past, but I cannot find such
> a statement at the moment.)

The license for the files can be found in "EXHIBIT 1" at the following URL.

http://www.unicode.org/copyright.html

Roughly, it requires that copied files, or software containing
copies of the files, be accompanied by the same copyright
notice, or that the notice appear in the associated documentation.
So we could include the files by adding such a notice, but in the
end we decided not to include them in the repository, I think.

> gb-18030-2000.xml and windows-949-2000.xml are from
> https://ssl.icu-project.org/. I do not know what licenses those files
> use (maybe Apache).
> 
> euc-jis-2004-std.txt and sjis-0213-2004-std.txt are from
> http://x0213.org. The license is described in the files.

I don't intend to insist on removing them if someone strongly
wants to preserve them, since their existence doesn't harm
anything.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center