Thread: test git conversion

test git conversion

From
Marko Kreen
Date:
So I did quick conversion with cvs2git (latest from svn).

Result is here:

  https://github.com/markokr/pgjdbc-test

Attached files, so you can repeat it:

convert.sh - fetches cvs repo, converts it, pushes out
c2g.config.py - cvs2git config
c2g-config.diff - my changes to sample config

I grepped author list from git-dump.dat, so it should be reasonably
complete:

  awk '/^committer / { print $2; }' cvs2svn-tmp/git-dump.dat | sort -u

Also I checked for non-ascii symbols in commits with:

  LANG=C grep -E '[^  -~]' cvs2svn-tmp/git-dump.dat

and they look fine to me.  (The bracket expression needs a literal tab
inside it, shown above as a space; typing ^V^I may help.)
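
The two shell checks above can also be sketched in Python, which sidesteps the literal-tab issue. This is only a minimal sketch, not one of the attached scripts: the function names are hypothetical, the sample lines are made up, and it assumes the dump contains `committer <name> <email> <when>` lines as in the awk command.

```python
import re

# Any byte that is not a tab or printable ASCII (space through tilde),
# the same character class as the grep command.
NON_ASCII = re.compile(rb"[^\t -~]")


def committer_names(lines):
    """Return the sorted, unique second field of 'committer ' lines (awk's $2)."""
    names = set()
    for line in lines:
        if line.startswith(b"committer "):
            fields = line.split()
            if len(fields) > 1:
                names.add(fields[1].decode("ascii", "replace"))
    return sorted(names)


def suspicious_lines(lines):
    """Return the lines that contain a byte outside tab/printable ASCII."""
    return [line for line in lines if NON_ASCII.search(line)]


# Made-up sample lines, with trailing newlines already stripped.
sample = [
    b"committer jurka <jurka> 1185531340 +0000",
    b"committer davec <davec> 1185531341 +0000",
    b"Fix na\xc3\xafve parsing",
]
print(committer_names(sample))
print(len(suspicious_lines(sample)))
```

For a real run the lines would come from reading cvs2svn-tmp/git-dump.dat in binary mode and stripping the newline from each line.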

Todo:
- fill author map
- spot problems

--
marko

Attachments

Re: test git conversion

From
Marko Kreen
Date:
On Wed, Oct 5, 2011 at 9:42 AM, Marko Kreen <markokr@gmail.com> wrote:
> Todo:
> - fill author map
> - spot problems

I found author map from https://github.com/mhagander/pggit_migrate
and used that (blind and oliver i guessed).  Also renamed
cvs2svn to cvs2git in config.

Few uncertain areas that I know of:
- What to do with keywords?  Currently they are squashed.
- What to do with .cvsignore.  There is option of skipping them.
  [I would prefer explicit commit that converts them to .gitignore]
- There are few manufactured commits on branches (not on HEAD):

$ grep -i cvs2 cvs2svn-tmp/git-dump.dat
This commit was manufactured by cvs2git to create branch 'REL6_4'.
This commit was manufactured by cvs2git to create tag 'REL7_1_BETA'.
This commit was manufactured by cvs2git to create tag 'REL7_1'.
This commit was manufactured by cvs2git to create branch 'REL7_3_STABLE'.
This commit was manufactured by cvs2git to create branch 'REL7_4_STABLE'.
This commit was manufactured by cvs2git to create branch 'REL7_4_STABLE'.
This commit was manufactured by cvs2git to create branch 'REL8_0_STABLE'.
This commit was manufactured by cvs2git to create branch 'REL8_2_STABLE'.
This commit was manufactured by cvs2git to create branch 'REL8_1_STABLE'.
This commit was manufactured by cvs2git to create branch 'REL8_2_STABLE'.
This commit was manufactured by cvs2git to create tag 'REL8_2_505'.
This commit was manufactured by cvs2git to create branch 'REL8_0_STABLE'.
This commit was manufactured by cvs2git to create branch 'REL8_1_STABLE'.
This commit was manufactured by cvs2git to create branch 'REL8_2_STABLE'.

So the question is does the code look sane around those points
and do we need to do something with it.  As cvs2git has seen improvements
since postgres conversion, perhaps they are ok and we can ignore those.
Especially since they are only on older branches.
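
To see at a glance which branches and tags are affected, the grep output above can be tallied. A minimal sketch, assuming the message wording shown in the output; the function name is hypothetical and the sample reuses a few of the lines above.

```python
import re
from collections import Counter

# Matches the commit message cvs2git writes, as shown in the grep output.
MANUFACTURED = re.compile(
    r"This commit was manufactured by cvs2git to create (branch|tag) '([^']+)'\."
)


def count_manufactured(lines):
    """Count manufactured commits per (kind, symbol name)."""
    counts = Counter()
    for line in lines:
        match = MANUFACTURED.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


sample = [
    "This commit was manufactured by cvs2git to create branch 'REL6_4'.",
    "This commit was manufactured by cvs2git to create tag 'REL7_1'.",
    "This commit was manufactured by cvs2git to create branch 'REL8_2_STABLE'.",
    "This commit was manufactured by cvs2git to create branch 'REL8_2_STABLE'.",
    "Remove unused imports.",
]
for (kind, name), n in sorted(count_manufactured(sample).items()):
    print(kind, name, n)
```

Symbols that appear more than once (REL8_2_STABLE in the full output, for instance) are probably the first places to inspect.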

New config attached.  No changes to convert script.

--
marko

Attachments

Re: test git conversion

From
Tom Lane
Date:
Marko Kreen <markokr@gmail.com> writes:
> - There are few manufactured commits on branches (not on HEAD):

> $ grep -i cvs2 cvs2svn-tmp/git-dump.dat
> This commit was manufactured by cvs2git to create branch 'REL6_4'.
> ...

> So the question is does the code look sane around those points
> and do we need to do something with it.  As cvs2git has seen improvements
> since postgres conversion, perhaps they are ok and we can ignore those.
> Especially since they are only on older branches.

I don't have details in my head anymore, but when we did the server's
git conversion, every one of those "manufactured commits" was a real
problem, ie, it didn't represent the history in a nice way.  I think the
most common cause was adding a file on HEAD and then on a pre-existing
branch, and that the unmodified git conversion then misrepresented the
state of the file at earlier instants in that branch (ie, it was there
when it should not be).  This is a bug/ambiguity in the CVS
representation, there is nothing cvs2git can do about it.  We worked
around it in the server conversion by editing the RCS files ... don't
know whether you guys are feeling sufficiently anal to do likewise.

            regards, tom lane

Re: test git conversion

From
Marko Kreen
Date:
On Wed, Oct 5, 2011 at 4:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Marko Kreen <markokr@gmail.com> writes:
>> - There are few manufactured commits on branches (not on HEAD):
>
>> $ grep -i cvs2 cvs2svn-tmp/git-dump.dat
>> This commit was manufactured by cvs2git to create branch 'REL6_4'.
>> ...
>
>> So the question is does the code look sane around those points
>> and do we need to do something with it.  As cvs2git has seen improvements
>> since postgres conversion, perhaps they are ok and we can ignore those.
>> Especially since they are only on older branches.
>
> I don't have details in my head anymore, but when we did the server's
> git conversion, every one of those "manufactured commits" was a real
> problem, ie, it didn't represent the history in a nice way.  I think the
> most common cause was adding a file on HEAD and then on a pre-existing
> branch, and that the unmodified git conversion then misrepresented the
> state of the file at earlier instants in that branch (ie, it was there
> when it should not be).  This is a bug/ambiguity in the CVS
> representation, there is nothing cvs2git can do about it.  We worked
> around it in the server conversion by editing the RCS files ... don't
> know whether you guys are feeling sufficiently anal to do likewise.

I actually got interfaces/jdbc parts from Magnus' repository_fixups
working against jdbc repo.  Attached script gets rid of two REL7_1 tag
related commits.

But looking at the script it indeed is not about "representing cvs
history in git" but "working around bad history in cvs".  And it does
not affect main branch history in any way.

So it's dubious whether it's worthwhile to do it, and in any case I'm
wrong person to do it, as I'm completely unfamiliar with JDBC
code, both old and new...

--
marko

Attachments

Re: test git conversion

From
Marko Kreen
Date:
On Wed, Oct 5, 2011 at 5:15 PM, Marko Kreen <markokr@gmail.com> wrote:
> On Wed, Oct 5, 2011 at 4:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Marko Kreen <markokr@gmail.com> writes:
>>> - There are few manufactured commits on branches (not on HEAD):
>>
>>> $ grep -i cvs2 cvs2svn-tmp/git-dump.dat
>>> This commit was manufactured by cvs2git to create branch 'REL6_4'.
>>> ...
>>
>>> So the question is does the code look sane around those points
>>> and do we need to do something with it.  As cvs2git has seen improvements
>>> since postgres conversion, perhaps they are ok and we can ignore those.
>>> Especially since they are only on older branches.
>>
>> I don't have details in my head anymore, but when we did the server's
>> git conversion, every one of those "manufactured commits" was a real
>> problem, ie, it didn't represent the history in a nice way.  I think the
>> most common cause was adding a file on HEAD and then on a pre-existing
>> branch, and that the unmodified git conversion then misrepresented the
>> state of the file at earlier instants in that branch (ie, it was there
>> when it should not be).  This is a bug/ambiguity in the CVS
>> representation, there is nothing cvs2git can do about it.  We worked
>> around it in the server conversion by editing the RCS files ... don't
>> know whether you guys are feeling sufficiently anal to do likewise.
>
> I actually got interfaces/jdbc parts from Magnus' repository_fixups
> working against jdbc repo.  Attached script gets rid of two REL7_1 tag
> related commits.
>
> But looking at the script it indeed is not about "representing cvs
> history in git" but "working around bad history in cvs".  And it does
> not affect main branch history in any way.
>
> So it's dubious whether it's worthwhile to do it, and in any case I'm
> wrong person to do it, as I'm completely unfamiliar with JDBC
> code, both old and new...

Another approach:

https://github.com/markokr/pgjdbc-clean

I just added line:

  ExcludeRegexpStrategyRule(r'REL[678].*|WIN32_DEV|release-6-3'),

to config, so that only REL9* branches and tags are converted:

 * [new branch]      REL9_0_STABLE -> REL9_0_STABLE
 * [new branch]      REL9_1_STABLE -> REL9_1_STABLE
 * [new branch]      master -> master
 * [new tag]         REL9_0_801 -> REL9_0_801
 * [new tag]         REL9_0_802 -> REL9_0_802
 * [new tag]         REL9_1_901 -> REL9_1_901

No manufactured commits.
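
The effect of that pattern can be checked outside cvs2git. A minimal sketch, assuming the rule is tested against the whole symbol name (the exact anchoring cvs2git applies is an assumption here, so verify it against the cvs2git documentation); the symbol names are taken from this thread and the function name is hypothetical.

```python
import re

# Same pattern as passed to ExcludeRegexpStrategyRule in the config line above.
EXCLUDE = re.compile(r"REL[678].*|WIN32_DEV|release-6-3")


def is_excluded(symbol):
    """True if the whole symbol name matches the exclusion pattern."""
    return EXCLUDE.fullmatch(symbol) is not None


symbols = [
    "REL6_4", "REL7_1_BETA", "REL8_2_505", "WIN32_DEV", "release-6-3",
    "REL9_0_STABLE", "REL9_1_901", "REL9_0_801",
]
kept = [s for s in symbols if not is_excluded(s)]
print(kept)
```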

I really suggest going with that approach, otherwise the
conversion will be huge amount of delicate work.
Or broken old branches.

Sanity checking is also easier this way.

If needed the interested people can fix and convert
older branches later.

--
marko

Attachments

Re: test git conversion

From
Marko Kreen
Date:
On Wed, Oct 5, 2011 at 6:07 PM, Kris Jurka <books@ejurka.com> wrote:
> On 10/5/2011 9:03 AM, Marko Kreen wrote:
>> I just added line:
>>
>>   ExcludeRegexpStrategyRule(r'REL[678].*|WIN32_DEV|release-6-3'),
>>
>> to config, so that only REL9* branches and tags are converted:
>
> That's not going to do it.  We need the old branches and tags preserved.

Well, they *are* preserved, in cvs repo :)

But it's up to you to decide your requirements.

--
marko

Re: test git conversion

From
Kris Jurka
Date:
On 10/5/2011 9:03 AM, Marko Kreen wrote:

> I just added line:
>
>    ExcludeRegexpStrategyRule(r'REL[678].*|WIN32_DEV|release-6-3'),
>
> to config, so that only REL9* branches and tags are converted:

That's not going to do it.  We need the old branches and tags preserved.

Kris Jurka

Re: test git conversion

From
Maciek Sakrejda
Date:
If anyone is interested, I've created a github repo for the conversion
and included Marko's scripts so far:

https://github.com/deafbybeheading/pgjdbc2git/

It may be helpful to track what works and what doesn't in various attempts.

I did not include the lopping off of pre-9 branches, since that's not
an acceptable trade-off.

---
Maciek Sakrejda | System Architect | Truviso

1065 E. Hillsdale Blvd., Suite 215
Foster City, CA 94404
(650) 242-3500 Main
www.truviso.com

Re: test git conversion

From
Marko Kreen
Date:
On Thu, Oct 6, 2011 at 9:48 AM, Maciek Sakrejda <msakrejda@truviso.com> wrote:
> If anyone is interested, I've created a github repo for the conversion
> and included Marko's scripts so far:
>
> https://github.com/deafbybeheading/pgjdbc2git/
>
> It may be helpful to track what works and what doesn't in various attempts.
>
> I did not include the lopping off of pre-9 branches, since that's not
> an acceptable trade-off.

Cool, thanks for doing it.

I propose following patch to it:
- include commented-out line for dropping branches, then all my hacks
will be there.
- comment out pushing
- show how to create pgjdbc.fixed
- grep for manufactured commits
- change scripts to be executable

Also I suggest removing the .diff, it's pointless.

--
marko

Attachments

Re: test git conversion

From
Dave Cramer
Date:
Ok, I've had a crack at this. I have a local copy on github
git://github.com/davecramer/pgjdbc.git . How can I clone the github
copy to git.postgresql.org ?

Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca



On Thu, Oct 6, 2011 at 4:13 AM, Marko Kreen <markokr@gmail.com> wrote:
> On Thu, Oct 6, 2011 at 9:48 AM, Maciek Sakrejda <msakrejda@truviso.com> wrote:
>> If anyone is interested, I've created a github repo for the conversion
>> and included Marko's scripts so far:
>>
>> https://github.com/deafbybeheading/pgjdbc2git/
>>
>> It may be helpful to track what works and what doesn't in various attempts.
>>
>> I did not include the lopping off of pre-9 branches, since that's not
>> an acceptable trade-off.
>
> Cool, thanks for doing it.
>
> I propose following patch to it:
> - include commented-out line for dropping branches, then all my hacks
> will be there.
> - comment out pushing
> - show how to create pgjdbc.fixed
> - grep for manufactured commits
> - change scripts to be executable
>
> Also I suggest removing the .diff, it's pointless.
>
> --
> marko
>
>
> --
> Sent via pgsql-jdbc mailing list (pgsql-jdbc@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-jdbc
>

Re: test git conversion

From
Maciek Sakrejda
Date:
> Ok, I've had a crack at this. I have a local copy on github
> git://github.com/davecramer/pgjdbc.git . How can I clone the github
> copy to git.postgresql.org ?

Excellent. Before we do that, though, should we try to clean up the
cvs2svn manufactured commits on branches with git filter-branch? I've
been meaning to get back to this, but now that someone's taken the
initiative, I can give that a shot tonight. Also, is the author info
right?:

maciek@anemone:~$ git log | grep 'Author:' | sort | uniq
Author: barry <barry>
Author: blind <blind>
Author: davec <davec>
Author: davecramer <davecramer>
Author: jurka <jurka>
Author: momjian <momjian>
Author: oliver <oliver>
Author: petere <petere>
Author: peter <peter>
Author: pgsql <pgsql>
Author: scrappy <scrappy>
Author: tgl <tgl>
Author: wieck <wieck>

It does not include e-mail addresses (I don't know if we necessarily
need them, but, e.g., the main PostgreSQL repo seems to have them) and
it seems to dupe some committers with aliases (davec/davecramer and
possibly petere/peter? also, should the "pgsql" commits be attributed
to someone?). The filter-branch (or another filter-branch) can take
care of this as well.

Thanks,
---
Maciek Sakrejda | System Architect | Truviso

1065 E. Hillsdale Blvd., Suite 215
Foster City, CA 94404
(650) 242-3500 Main
www.truviso.com

Re: test git conversion

From
Dave Cramer
Date:
Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca



On Tue, Dec 6, 2011 at 4:23 PM, Maciek Sakrejda <msakrejda@truviso.com> wrote:
>> Ok, I've had a crack at this. I have a local copy on github
>> git://github.com/davecramer/pgjdbc.git . How can I clone the github
>> copy to git.postgresql.org ?
>
> Excellent. Before we do that, though, should we try to clean up the
> cvs2svn manufactured commits on branches with git filter-branch? I've
> been meaning to get back to this, but now that someone's taken the
> initiative, I can give that a shot tonight. Also, is the author info
> right?:
>
> maciek@anemone:~$ git log | grep 'Author:' | sort | uniq

barry and blind are the same person.
> Author: barry <barry>
> Author: blind <blind>

davec and davecramer are the same person (me).
> Author: davec <davec>
> Author: davecramer <davecramer>
> Author: jurka <jurka>
> Author: momjian <momjian>
> Author: oliver <oliver>

petere and peter could well be two different people.
> Author: petere <petere>
> Author: peter <peter>

no idea who pgsql is?
> Author: pgsql <pgsql>
> Author: scrappy <scrappy>
> Author: tgl <tgl>
> Author: wieck <wieck>
>
> It does not include e-mail addresses (I don't know if we necessarily
> need them, but, e.g., the main PostgreSQL repo seems to have them) and
> it seems to dupe some committers with aliases (davec/davecramer and
> possibly petere/peter? also, should the "pgsql" commits be attributed
> to someone?). The filter-branch (or another filter-branch) can take
> care of this as well.
>
> Thanks,
> ---
> Maciek Sakrejda | System Architect | Truviso
>
> 1065 E. Hillsdale Blvd., Suite 215
> Foster City, CA 94404
> (650) 242-3500 Main
> www.truviso.com

Re: test git conversion

From
Dave Cramer
Date:
Sorry, missed the fact that you had a question in there.

Yes if we can clean up the manufactured commits that would be good.

Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca



On Tue, Dec 6, 2011 at 4:23 PM, Maciek Sakrejda <msakrejda@truviso.com> wrote:
>> Ok, I've had a crack at this. I have a local copy on github
>> git://github.com/davecramer/pgjdbc.git . How can I clone the github
>> copy to git.postgresql.org ?
>
> Excellent. Before we do that, though, should we try to clean up the
> cvs2svn manufactured commits on branches with git filter-branch? I've
> been meaning to get back to this, but now that someone's taken the
> initiative, I can give that a shot tonight. Also, is the author info
> right?:
>
> maciek@anemone:~$ git log | grep 'Author:' | sort | uniq
> Author: barry <barry>
> Author: blind <blind>
> Author: davec <davec>
> Author: davecramer <davecramer>
> Author: jurka <jurka>
> Author: momjian <momjian>
> Author: oliver <oliver>
> Author: petere <petere>
> Author: peter <peter>
> Author: pgsql <pgsql>
> Author: scrappy <scrappy>
> Author: tgl <tgl>
> Author: wieck <wieck>
>
> It does not include e-mail addresses (I don't know if we necessarily
> need them, but, e.g., the main PostgreSQL repo seems to have them) and
> it seems to dupe some committers with aliases (davec/davecramer and
> possibly petere/peter? also, should the "pgsql" commits be attributed
> to someone?). The filter-branch (or another filter-branch) can take
> care of this as well.
>
> Thanks,
> ---
> Maciek Sakrejda | System Architect | Truviso
>
> 1065 E. Hillsdale Blvd., Suite 215
> Foster City, CA 94404
> (650) 242-3500 Main
> www.truviso.com

Re: test git conversion

From
Marko Kreen
Date:
On Tue, Dec 6, 2011 at 11:23 PM, Maciek Sakrejda <msakrejda@truviso.com> wrote:
>> Ok, I've had a crack at this. I have a local copy on github
>> git://github.com/davecramer/pgjdbc.git . How can I clone the github
>> copy to git.postgresql.org ?
>
> Excellent. Before we do that, though, should we try to clean up the
> cvs2svn manufactured commits on branches with git filter-branch? I've
> been meaning to get back to this, but now that someone's taken the
> initiative, I can give that a shot tonight. Also, is the author info
> right?:

The manufactured commit means cvs2git has lost sync with repo
and uses such commit to get back on track.  Can you really
simply filter out such commits, without fixing the sync problem first?

Instead, if the surrounding commits look fine, you can leave them as-is.

> maciek@anemone:~$ git log | grep 'Author:' | sort | uniq
> Author: barry <barry>
> Author: blind <blind>
> Author: davec <davec>
> Author: davecramer <davecramer>
> Author: jurka <jurka>
> Author: momjian <momjian>
> Author: oliver <oliver>
> Author: petere <petere>
> Author: peter <peter>
> Author: pgsql <pgsql>
> Author: scrappy <scrappy>
> Author: tgl <tgl>
> Author: wieck <wieck>
>
> It does not include e-mail addresses (I don't know if we necessarily
> need them, but, e.g., the main PostgreSQL repo seems to have them) and
> it seems to dupe some committers with aliases (davec/davecramer and
> possibly petere/peter? also, should the "pgsql" commits be attributed
> to someone?). The filter-branch (or another filter-branch) can take
> care of this as well.

This file should have good author list:

https://github.com/mhagander/pggit_migrate/blob/master/cvs2git.options

--
marko

Re: test git conversion

From
Tom Lane
Date:
Dave Cramer <pg@fastcrypt.com> writes:
> petere and peter could well be two different ppl.
>> Author: petere <petere>
>> Author: peter <peter>

petere would be Peter Eisentraut, the other is Peter Mount, I believe.

> no idea who pgsql is ?
>> Author: pgsql <pgsql>
>> Author: scrappy <scrappy>

Both of those are Marc Fournier --- Marc sometimes committed from the
pgsql user account, but I don't think anybody else ever did.

For consistency's sake I'd suggest you use the same spellings used in
the core repository conversion, which are

Alvaro Herrera <alvherre@alvh.no-ip.org>
Andrew Dunstan <andrew@dunslane.net>
Barry Lind <barry@xythos.com>
Bruce Momjian <bruce@momjian.us>
Bryan Henderson <bryanh@giraffe.netgate.net>
Byron Nikolaidis <byronn@insightdist.com>
CVS to git conversion script <webmaster@postgresql.org>
D'Arcy J.M. Cain <darcy@druid.net>
Dave Cramer <davec@fastcrypt.com>
Dennis Bjorklund <db@zigo.dhs.org>
Edmund Mergl <E.Mergl@bawue.de>
Greg Stark <stark@mit.edu>
Heikki Linnakangas <heikki.linnakangas@iki.fi>
Hiroshi Inoue <inoue@tpf.co.jp>
Itagaki Takahiro <itagaki.takahiro@gmail.com>
Jan Wieck <JanWieck@Yahoo.com>
Joe Conway <mail@joeconway.com>
Julian Assange <proff@suburbia.net>
Kris Jurka <books@ejurka.com>
Magnus Hagander <magnus@hagander.net>
Marc G. Fournier <scrappy@hub.org>
Michael Meskes <meskes@postgresql.org>
Neil Conway <neilc@samurai.com>
Peter Eisentraut <peter_e@gmx.net>
Peter Mount <peter@retep.org.uk>
Philip Warner <pjw@rhyme.com.au>
PostgreSQL Daemon <webmaster@postgresql.org>
Robert Haas <rhaas@postgresql.org>
Simon Riggs <simon@2ndQuadrant.com>
Tatsuo Ishii <ishii@postgresql.org>
Teodor Sigaev <teodor@sigaev.ru>
Thomas G. Lockhart <lockhart@fourpalms.org>
Tom Lane <tgl@sss.pgh.pa.us>
Vadim B. Mikheev <vadim4o@yahoo.com>
Vince Vielhaber <vev@michvhf.com>

although if any of the current JDBC committers prefer other email
addresses, there's probably no harm in adjusting those.
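
The identifications in this thread could be collected into an author map along these lines. A minimal sketch: the dictionary layout and function names are hypothetical, pairing CVS user names such as jurka or wieck with the list entries is an inference from the names, and "oliver" is left unmapped on purpose because the list above has no entry for that user.

```python
import re

# CVS user name -> (full name, email), spellings taken from the list above.
AUTHOR_MAP = {
    "barry": ("Barry Lind", "barry@xythos.com"),
    "blind": ("Barry Lind", "barry@xythos.com"),
    "davec": ("Dave Cramer", "davec@fastcrypt.com"),
    "davecramer": ("Dave Cramer", "davec@fastcrypt.com"),
    "jurka": ("Kris Jurka", "books@ejurka.com"),
    "momjian": ("Bruce Momjian", "bruce@momjian.us"),
    "petere": ("Peter Eisentraut", "peter_e@gmx.net"),
    "peter": ("Peter Mount", "peter@retep.org.uk"),
    "pgsql": ("Marc G. Fournier", "scrappy@hub.org"),
    "scrappy": ("Marc G. Fournier", "scrappy@hub.org"),
    "tgl": ("Tom Lane", "tgl@sss.pgh.pa.us"),
    "wieck": ("Jan Wieck", "JanWieck@Yahoo.com"),
}

AUTHOR_LINE = re.compile(r"^Author: (\S+) <\1>$")


def rewrite_author(line):
    """Rewrite 'Author: user <user>' using AUTHOR_MAP; leave other lines alone."""
    match = AUTHOR_LINE.match(line)
    if match and match.group(1) in AUTHOR_MAP:
        name, email = AUTHOR_MAP[match.group(1)]
        return f"Author: {name} <{email}>"
    return line


def unmapped(usernames):
    """Return the user names that still need a decision."""
    return sorted(set(usernames) - set(AUTHOR_MAP))


print(rewrite_author("Author: pgsql <pgsql>"))
print(unmapped(["tgl", "oliver", "peter"]))
```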

            regards, tom lane

Re: test git conversion

From
Maciek Sakrejda
Date:
Okay, based on some quick log digging, I *think* "petere" is Peter
Eisentraut and "peter" is Peter Mount. I'll reference the list Marko
just sent out and the core PostgreSQL repo logs for any other author
information. I'll clean up the authors and remove the manufactured
commits. As to the "synch" concerns, I'll spot-check as I do this and
we can investigate further. I'll script this so we can refine/redo it
if necessary.
---
Maciek Sakrejda | System Architect | Truviso

1065 E. Hillsdale Blvd., Suite 215
Foster City, CA 94404
(650) 242-3500 Main
www.truviso.com

Re: test git conversion

From
Tom Lane
Date:
Marko Kreen <markokr@gmail.com> writes:
> The manufactured commit means cvs2git has lost sync with repo
> and uses such commit to get back on track.  Can you really
> simply filter out such commits, without fixing the sync problem first?

When we did the core conversion, all the "manufactured" commits turned
out to be due to problems in the CVS repository, and we fixed them by
tweaking the repository contents before running the conversion.  There's
lots of detail in the pgsql-hackers archives.

            regards, tom lane

Re: test git conversion

From
Maciek Sakrejda
Date:
> When we did the core conversion, all the "manufactured" commits turned
> out to be due to problems in the CVS repository, and we fixed them by
> tweaking the repository contents before running the conversion.  There's
> lots of detail in the pgsql-hackers archives.

Kris had pointed out something like this, too. I had looked at the
most recent few manufactured commits and didn't see any problems
(i.e., they were empty and could simply have been filtered out), but I
investigated further and you're absolutely right: starting shortly
before the 8.2-508 tag (and all the way back to the beginning), there
are some bogus commits leading to a weird history: e.g.,
https://github.com/davecramer/pgjdbc/commit/266a282e221a33e6e5f52f5388ea739528267f70
. It's hard to see this without a visualizer like gitk, but that
commit has two parents, merging the master branch into what it claims
is the newly-created 8.2 branch (which in fact has been around for a
while by that point, as evidenced by all the 8.2.x tags). This sort of
thing seems to happen over and over. I can't quite follow what it's
doing (or the mailing list explanations I've found, for that matter),
possibly since I have extremely limited experience with CVS (and I
thank my lucky stars for that). I also tried with a newer cvs2git, but
no dice.
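
Without a visualizer, the two-parent commits described above can be listed from the text that `git rev-list --parents --all` prints (each line is a commit hash followed by its parent hashes). A minimal sketch; the function name is hypothetical and the short hashes in the sample are made up.

```python
def merge_commits(rev_list_output):
    """Return (commit, parents) for every commit with two or more parents."""
    merges = []
    for line in rev_list_output.splitlines():
        fields = line.split()
        if len(fields) >= 3:  # commit hash plus at least two parent hashes
            merges.append((fields[0], fields[1:]))
    return merges


# Made-up sample: c3 has two parents, the others have one or none.
sample = "c3 b2 a1\nb2 a1\na1\n"
for commit, parents in merge_commits(sample):
    print(commit, "merges", ", ".join(parents))
```

In a straight CVS conversion every such commit is worth a look, since CVS itself never records merges.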

I think if we care about history, moving forward with the migration
without resolving this would be a mistake. I'll pore over the core
list history on this and see what I can see. Advice from anyone with
CVS background welcome--e.g., are there any (Linux, ideally, but
Windows works) tools for visualizing CVS history (like gitk for git?).
Does TortoiseCVS support that (I vaguely recall that TortoiseSVN
does)?

---
Maciek Sakrejda | System Architect | Truviso

1065 E. Hillsdale Blvd., Suite 215
Foster City, CA 94404
(650) 242-3500 Main
www.truviso.com

Re: test git conversion

From
Tom Lane
Date:
Maciek Sakrejda <msakrejda@truviso.com> writes:
> Kris had pointed out something like this, too. I had looked at the
> most recent few manufactured commits and didn't see any problems
> (i.e., they were empty and could simply have been filtered out), but I
> investigated further and you're absolutely right: starting shortly
> before the 8.2-508 tag (and all the way back to the beginning), there
> are some bogus commits leading to a weird history: e.g.,
> https://github.com/davecramer/pgjdbc/commit/266a282e221a33e6e5f52f5388ea739528267f70
> . It's hard to see this without a visualizer like gitk, but that
> commit has two parents, merging the master branch into what it claims
> is the newly-created 8.2 branch (which in fact has been around for a
> while by that point, as evidenced by all the 8.2.x tags). This sort of
> thing seems to happen over and over. I can't quite follow what it's
> doing (or the mailing list explanations I've found, for that matter),
> possibly since I have extremely limited experience with CVS (and I
> thank my lucky stars for that). I also tried with a newer cvs2git, but
> no dice.

Many of the problems in the core conversion stemmed from the weird way
in which CVS deals with files that are added to branches later than they
are added to mainline.  Originally, CVS had no way to represent the fact
that such a file had not existed on the branch all the way back to the
time of its addition to mainline; although any tags you'd added to the
branch in between would not be on such a file, making it apparent that
it wasn't really there.  So cvs2git sees an inconsistent history and has
to do something ugly to fix it.

Later on, the CVS people invented a kluge way to represent this situation,
so it's possible that you don't have any such cases in recent history.
Basically the kluge requires inserting a deletion event on the branch at
the time of the file's creation on master.  We did that by hand-editing
the RCS files when we did the core conversion, which was a tad more
exciting than I would've liked, but there weren't really enough cases
to justify building a tool for it.

There were also assorted problems stemming from having done squirrelly
things back in the day, like moving the point from which the REL7_1
branch had been sprouted.  I'm not clear on how much of that might
affect the JDBC repository.

There's lots and lots of detail in the pgsql-hackers archives about
this, around mid September 2010.

            regards, tom lane

Re: test git conversion

From
Maciek Sakrejda
Date:
Thanks, this is extremely helpful.

> Many of the problems in the core conversion stemmed from the weird way
> in which CVS deals with files that are added to branches later than they
> are added to mainline.

You mean something like branch REL_X_STABLE created, file A added to
trunk, some more commits to trunk, then eventually the addition of A
back-ported to REL_X_STABLE?

> Originally, CVS had no way to represent the fact
> that such a file had not existed on the branch all the way back to the
> time of its addition to mainline; although any tags you'd added to the
> branch in between would not be on such a file, making it apparent that
> it wasn't really there.  So cvs2git sees an inconsistent history and has
> to do something ugly to fix it.
>
> Later on, the CVS people invented a kluge way to represent this situation,
> so it's possible that you don't have any such cases in recent history.
> Basically the kluge requires inserting a deletion event on the branch at
> the time of the file's creation on master.  We did that by hand-editing
> the RCS files when we did the core conversion, which was a tad more
> exciting than I would've liked, but there weren't really enough cases
> to justify building a tool for it.

Okay, I'll dig into the RCS files from the dump Kris sent out and
compare the (well-migrated) recent branches to the older branches
which show problems. I'm not one to shy away from excitement.

---
Maciek Sakrejda | System Architect | Truviso

1065 E. Hillsdale Blvd., Suite 215
Foster City, CA 94404
(650) 242-3500 Main
www.truviso.com

Re: test git conversion

From
Tom Lane
Date:
Maciek Sakrejda <msakrejda@truviso.com> writes:
>> Many of the problems in the core conversion stemmed from the weird way
>> in which CVS deals with files that are added to branches later than they
>> are added to mainline.

> You mean something like branch REL_X_STABLE created, file A added to
> trunk, some more commits to trunk, then eventually the addition of A
> back-ported to REL_X_STABLE?

Yeah, except it's not the other activity on trunk that's interesting,
it's the other activity on the branch.  To be concrete:

    1. Branch REL_X_STABLE is forked off trunk.
    2. Time passes, some commits occur in REL_X_STABLE.
    3. File A gets added to trunk.
    4. More time passes, more commits occur in REL_X_STABLE.
    5. We make a release and apply a tag REL_X_Y in the branch.
       (In CVS this means that every single RCS file that has
       live content in the branch receives a tag marker.  Since
       file A is not in the branch, it gets no tag marker.)
    6. File A gets added to branch REL_X_STABLE.

In the representation used by CVS, the commit 6 appears to be a direct
child of commit 3 (or maybe even commit 1, I forget); therefore it
appears that file A has existed on the branch since its inception.
But the lack of a REL_X_Y tag marker puts the lie to that appearance,
so cvs2git can tell the history is inconsistent.  To fix it, we need to
insert a deletion event on the branch immediately after commit 3
(or was it commit 1? ... anyway look at the pgsql-hackers discussions
for details).  Recent versions of CVS know to insert a special deletion
event when first adding a file to a branch, but older ones didn't.

One thing that took me awhile to figure out is that these magic deletion
events have to conform to exactly the format cvs2git expects, including
the wording of the manufactured commit message, else it fails to
understand what it's supposed to do with them.
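
The six-step scenario above can be modelled as a toy timeline. This is only an illustrative sketch, not how CVS or cvs2git store anything: the event names and functions are hypothetical, and it takes commit 3 as the point where file A appears to join the branch, which the text above is itself unsure about.

```python
def present_at(events, step):
    """Replay one file's branch events in order; True if it exists after `step`."""
    present = False
    for when, kind in events:
        if when > step:
            break
        present = kind in ("appear", "add")
    return present


def tag_consistent(events, tag_step, file_has_tag):
    """A tag is consistent if the file is tagged exactly when it is present."""
    return present_at(events, tag_step) == file_has_tag


# File A as old CVS records it: it seems to be on the branch from step 3 on.
naive = [(3, "appear"), (6, "add")]
# Same history with a deletion event inserted right after step 3.
fixed = [(3, "appear"), (3, "delete"), (6, "add")]

# The REL_X_Y tag is applied at step 5 and file A carries no tag marker.
print(tag_consistent(naive, 5, file_has_tag=False))
print(tag_consistent(fixed, 5, file_has_tag=False))
```

In this model the naive history contradicts the tag, while the history with the inserted deletion agrees with it and still has file A on the branch from step 6 onward.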

            regards, tom lane

Re: test git conversion

From
Maciek Sakrejda
Date:
Okay, thanks for the tips. I found the relevant mailing list archives
and have been looking through the chronicles of the conversion. I'm a
little scared and very impressed. If anyone else is interested,

http://archives.postgresql.org/pgsql-hackers/2010-09/msg00636.php
(Tom's main post-conversion/cleanup report, including some details on
the RCS hacks)
http://archives.postgresql.org/pgsql-hackers/2010-08/msg01247.php
(thread detailing original issues)
http://archives.postgresql.org/pgsql-hackers/2010-09/msg00014.php
(continuation of above)

There are some other threads that month, but nothing else seems
directly relevant. I'll spend some time digging through these and try
the same RCS hack.

---
Maciek Sakrejda | System Architect | Truviso

1065 E. Hillsdale Blvd., Suite 215
Foster City, CA 94404
(650) 242-3500 Main
www.truviso.com

Re: test git conversion

From
Maciek Sakrejda
Date:
FYI: still working on this. It was a little more involved than I
realized, but it's coming along now. I understand the dummy commit
issue and have crafted CVS history that seems to turn the manufactured
commits benign (they're still there, but the history around them is
not crazy). We can filter them out later. I've also restored Marko's
original author mapping (which seems to cover everything, but I'll
double-check against the list Tom sent out) and started using an svn
trunk checkout of cvs2git (the older version I have chokes on Marko's
config).

My conversion scripts are still available at
https://github.com/deafbybeheading/pgjdbc2git . I've been working on
the older dump that Kris had provided, but everything is automated and
a conversion takes  a minute or two, so syncing it with any recent
changes should not be a big deal.

Remaining work:
 - Fix another couple of occurrences of the missing deletion
 - Add automatic validation of tags (and possibly other points in
history) based on Marko's script
 - Validate author mappings
 - Manually inspect/compare repository histories
 - Keyword expansion (It looks like the core PostgreSQL project just
replaced these with file name only; that's probably sensible)

I'll keep working on this and send out another update.

Thanks,
---
Maciek Sakrejda | System Architect | Truviso


Re: test git conversion

From
Maciek Sakrejda
Date:
Ok, some more progress:

I've taken care of all but two of the manufactured commit issues, have
a step to strip keywords (more on that later), and have done some
preliminary spot-checks of the generated history.

The two manufactured commits that probably warrant a little more attention:

maciek@anemone:~/jdbc-to-git/pgjdbc2git/pgjdbc-checkout$ git log --all --grep=manufactured
commit e64c318487848ebd7dee6b795dc360a3633337e3
Author: CVS to git conversion script <webmaster@postgresql.org>
Date:   Fri Jul 27 10:15:40 2007 +0000

    This commit was manufactured by cvs2git to create tag 'REL8_2_505'.

    Sprout from REL8_2_STABLE 2007-04-18 08:15:18 UTC Kris Jurka <books@ejurka.com> 'Prepare for release of 8.2-505.'
    Cherrypick from master 2007-07-27 10:15:39 UTC Kris Jurka <books@ejurka.com> 'Remove unused imports.':
        META-INF/services/java.sql.Driver
        org/postgresql/test/jdbc4/Jdbc4TestSuite.java
        org/postgresql/test/jdbc4/LOBTest.java

commit bb06ce11331182c6e0cb73adc6e494c4b92da8c1
Author: CVS to git conversion script <webmaster@postgresql.org>
Date:   Mon Oct 12 02:45:46 1998 +0000

    This commit was manufactured by cvs2git to create branch 'REL6_4'.

    Sprout from master 1998-10-12 02:45:45 UTC Bruce Momjian <bruce@momjian.us> 'This patch updates the ImageViewer example to use Multiple Threading.'
    Delete:
        postgresql/ChangeLog
        postgresql/PG_Object.java
        postgresql/PGbox.java
        postgresql/PGcircle.java
        postgresql/PGlseg.java
        postgresql/PGpath.java
        postgresql/PGpoint.java
        postgresql/PGpolygon.java
        postgresql/PGtokenizer.java


The first of these is weird because it's actually a *tag*. The second
is weird because it's a number of deletions back-patched into REL6_4
(which seems to be a stub branch with only a single commit), with an
irrelevant commit message. I'm happy to pull the standard metadata
wrangling for these as well, but I'd like to understand what's going
on here if someone with more repo history knowledge has some insight.

I've also run Marko's verify script and the diffs seem benign (some
keyword expansion stuff). As I understand from reading the core
PostgreSQL lists, the solution taken there was to port over with
expansion, and then kill expansion on all active branches in a series
of post-conversion commits. Is that acceptable here?
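
For illustration, one way such a post-conversion cleanup could look is to collapse every expanded keyword back to its bare form. This is only a sketch: the function name collapse_keywords and the keyword list are assumptions, and it is not necessarily what the core project actually committed.

```shell
# Hypothetical filter: rewrite "$Keyword: ...expanded text... $" as "$Keyword$".
# The keyword list is an assumption; adjust it to whatever the repository uses.
collapse_keywords() {
    sed -E 's/\$(Header|Id|Revision|Date|Author|PostgreSQL): [^$]*\$/$\1$/g'
}

# Example: filter a single line read from standard input.
printf '%s\n' '# $Id: Driver.java,v 1.2 2007-07-27 10:15:39 jurka Exp $' | collapse_keywords
```

Lines whose keywords are already unexpanded pass through unchanged, so the filter can be run repeatedly on the same tree.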

Any other questions or comments?

Thanks,
---
Maciek Sakrejda | System Architect | Truviso


Re: test git conversion

From
Kris Jurka
Date:

On Tue, 27 Dec 2011, Maciek Sakrejda wrote:

> Ok, some more progress:
>
> I've taken care of all but two of the manufactured commit issues, have
> a step to strip keywords (more on that later), and have done some
> preliminary spot-checks of the generated history.
>
> The two manufactured commits that probably warrant a little more attention:
>
> maciek@anemone:~/jdbc-to-git/pgjdbc2git/pgjdbc-checkout$ git log --all --grep=manufactured
> commit e64c318487848ebd7dee6b795dc360a3633337e3
> Author: CVS to git conversion script <webmaster@postgresql.org>
> Date:   Fri Jul 27 10:15:40 2007 +0000
>
>     This commit was manufactured by cvs2git to create tag 'REL8_2_505'.
>
>     Sprout from REL8_2_STABLE 2007-04-18 08:15:18 UTC Kris Jurka <books@ejurka.com> 'Prepare for release of 8.2-505.'
>     Cherrypick from master 2007-07-27 10:15:39 UTC Kris Jurka <books@ejurka.com> 'Remove unused imports.':
>         META-INF/services/java.sql.Driver
>         org/postgresql/test/jdbc4/Jdbc4TestSuite.java
>         org/postgresql/test/jdbc4/LOBTest.java
>

This tag should never have been applied to these three files.  I'm not
sure how that happened, but please just remove the tag from these files.


> commit bb06ce11331182c6e0cb73adc6e494c4b92da8c1
> Author: CVS to git conversion script <webmaster@postgresql.org>
> Date:   Mon Oct 12 02:45:46 1998 +0000
>
>     This commit was manufactured by cvs2git to create branch 'REL6_4'.
>
>     Sprout from master 1998-10-12 02:45:45 UTC Bruce Momjian <bruce@momjian.us> 'This patch updates the ImageViewer example to use Multiple Threading.'
>     Delete:
>         postgresql/ChangeLog
>         postgresql/PG_Object.java
>         postgresql/PGbox.java
>         postgresql/PGcircle.java
>         postgresql/PGlseg.java
>         postgresql/PGpath.java
>         postgresql/PGpoint.java
>         postgresql/PGpolygon.java
>         postgresql/PGtokenizer.java

I don't have any context for this ancient change, but I've done a little
digging, and I believe the historical action was:

1) Make some modifications.
2) Remove the files in the generated commit.
3) Branch the REL6_4 release.

I think anytime the last action before a branch is a delete, cvs2git
will be confused because the deleted files don't get tagged, so the
branch date comes from the last non-delete modification and it doesn't see
the deletes as occurring before the branch.

What we want is the REL6_4 branch point to be after the delete
commit.

Kris Jurka

Re: test git conversion

From
Maciek Sakrejda
Date:
Thanks, Kris--I really appreciate the help. I added patches to the
conversion script to strip the tags from that first set, but I'm still
not sure what to do on the second set, since it looks like there is no
other activity on the REL6_4 branch (no tags, even). It seems like
what you are saying is that REL6_4 should just point at the next
commit in the main development branch: is that right?

That is, this is the git history from around there
(git log --all --stat --graph --until '1999-01-16' --decorate):
https://gist.github.com/1659451

It looks like the same set of changes was committed to both trunk and
the branch, and nothing else ever happened on that branch. You're
saying that we should just drop that branch point and have the git
branch point at f44a35... (see gist linked above) as a stub branch
(that is, a branch with no commits of its own), yes?
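
In plain git terms, pointing a branch at an existing commit could be sketched as below. The helper name point_branch_at is hypothetical, the commit argument stays a placeholder because the full hash is not given here, and the same result might equally be achieved inside the conversion itself rather than afterwards.

```shell
# Hypothetical helper: make branch $2 in the repository at $1 point at commit $3.
# "git branch -f" creates the branch, or moves it if it already exists;
# it refuses to move the branch that is currently checked out.
point_branch_at() {
    git -C "$1" branch -f "$2" "$3"
}

# Usage (the commit id is a placeholder, not a real hash):
#   point_branch_at /path/to/pgjdbc-checkout REL6_4 <commit>
```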

Thanks,
---
Maciek Sakrejda | System Architect | Truviso





Re: test git conversion

From
Kris Jurka
Date:

On Sun, 22 Jan 2012, Maciek Sakrejda wrote:

> Thanks, Kris--I really appreciate the help. I added patches to the
> conversion script to strip the tags from that first set, but I'm still
> not sure what to do on the second set, since it looks like there is no
> other activity on the REL6_4 branch (no tags, even). It seems like
> what you are saying is that REL6_4 should just point at the next
> commit in the main development branch: is that right?

Correct.

> It looks like the same set of changes was committed to both trunk and
> the branch, and nothing else ever happened on that branch. You're
> saying that we should just drop that branch point and have the git
> branch point at f44a35... (see gist linked above) as a stub branch
> (that is, a branch with no commits of its own), yes?

Yes.

Also, I've posted an updated version of the CVS repo at:

http://ejurka.com/pgsql/tmp/

It didn't produce any new manufactured commits, so I think we're getting
close to being ready for a move.  I will start a new thread to discuss the
actual details/timing of that.


Kris Jurka

Re: test git conversion

From
Maciek Sakrejda
Date:
>> It looks like the same set of changes was committed to both trunk and
>> the branch, and nothing else ever happened on that branch. You're
>> saying that we should just drop that branch point and have the git
>> branch point at f44a35... (see gist linked above) as a stub branch
>> (that is, a branch with no commits of its own), yes?
>
> Yes.

Done. I think the only remaining thing is the keyword cleanup. I
presume what I suggested before works (keep keyword expansion as it
was in CVS; do a cleanup commit in git in all active branches to purge
expandable keywords; I believe this is what the main project did)? I
took out the "grep -v"s from the tag verification script to see what
was there, and there are some substantial diffs around that. I tried
tweaking the cvs2git config, but that only gives different diffs. I
need to spend more time looking at this, but I should be done soon
(though if anyone has any insight here, I welcome it).

---
Maciek Sakrejda | System Architect | Truviso


Re: test git conversion

From
Maciek Sakrejda
Date:
One more thing: are we missing the config for the $PostgreSQL$ custom
keyword in that tarball? I thought it was being processed properly,
but now that I've read up a bit on CVS keywords, I think it may not be
processed at all because the config is missing.
---
Maciek Sakrejda | System Architect | Truviso


Re: test git conversion

From
Maciek Sakrejda
Date:
I'm now working on getting the keyword expansion right. Some files in
older revisions have $Header$ keywords, which include the full path on
expansion, which is a problem (since the expanded version has to be
committed to git). I looked at what the server project did, and the
CVS root references seem to have been replaced with "/cvsroot". I
added functionality to do this on conversion in cvs2git (seems to
work; will try to commit a patch upstream), and added it as a prep
step to the CVS checkouts in the verify-tags.sh script, but I still
see diffs:

diff -r '--exclude=CVS' '--exclude=.git' ./Makefile ../cvs-co/REL6_5/Makefile
7c7
< #    $Header: /cvsroot/pgjdbc/Makefile,v 1.14 1999-06-23 05:56:17 peter Exp $
---
> #    $Header: /cvsroot/pgjdbc/Attic/Makefile,v 1.14 1999-06-23 05:56:17 peter Exp $

There are several dozen of these across all the various tags. The left
is the git conversion, the right is the CVS checkout. Everything in
the CVS checkout seems to be in the attic (even though the files
themselves are not--at least not at those checked out tags). Is this
expected? Does it have to do with the missing config file?
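
As a rough sketch of the "/cvsroot" substitution described above, applied as a text filter rather than inside cvs2git: the function name normalize_header_root and the example root /tmp/cvs-bundle are hypothetical, and the root is assumed to contain no characters that are special to sed.

```shell
# Hypothetical filter: in expanded $Header$ keywords, replace the local
# repository root ($1) with the fixed string "/cvsroot".
# Assumes $1 contains no "|" or regex metacharacters.
normalize_header_root() {
    sed -e 's|\([$]Header: \)'"$1"'/|\1/cvsroot/|'
}

# Usage: normalize a checked-out file before comparing it with the git tree.
#   normalize_header_root /tmp/cvs-bundle < Makefile > Makefile.normalized
```

Note that this only rewrites the root prefix; it would not touch an "/Attic/" path component like the one in the diff above.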

Thanks,
---
Maciek Sakrejda | System Architect | Truviso


Re: test git conversion

From
Maciek Sakrejda
Date:
> There are several dozen of these across all the various tags. The left
> is the git conversion, the right is the CVS checkout. Everything in
> the CVS checkout seems to be in the attic (even though the files
> themselves are not--at least not at those checked out tags). Is this
> expected? Does it have to do with the missing config file?

So not much news on this front except that I did a fresh checkout from
pgjdbc CVS, and I seem to have the same "/Attic/" path component on
those $Header$ expansions in old branches. I still don't understand
it, but it seems legit. If I don't hear otherwise, I'll throw a git
filter-branch step into the conversion process to make git history
consistent with these.

---
Maciek Sakrejda | System Architect | Truviso


Re: test git conversion

From
Maciek Sakrejda
Date:
The "/Attic" issue turned out to be a cvs2git problem; Michael
Haggerty fixed it very quickly. We now convert with spotless [1]
verification across all 80 (!) tags and the history looks clean. I
think we're ready to roll.
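
The core of a per-tag check of this kind can be sketched as a recursive diff of two checkouts. The diff flags are the ones from the diff output quoted earlier in the thread; the function name trees_match and the directory arguments are hypothetical, and this is not the actual verification script.

```shell
# Hypothetical helper: succeed only if the two directory trees have identical
# content, ignoring the CVS and .git metadata directories.
trees_match() {
    diff -r --exclude=CVS --exclude=.git "$1" "$2" >/dev/null
}

# Usage, for one tag checked out from both systems:
#   trees_match ./git-co/REL6_5 ./cvs-co/REL6_5 || echo "REL6_5 differs"
```

Running this once per tag, after whatever keyword normalization is applied to the CVS side, gives a simple pass/fail result for each tag.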

The conversion process is a little convoluted, but most of it is
automated. As before, the scripts to do the conversion are on github:
https://github.com/deafbybeheading/pgjdbc2git/ . In theory, I think
anyone running the script against the same repository bundle *should*
produce the same git history, but I didn't want to push the result to
github yet just in case.

[1]: Because git avoids CVS-style keywords, the conversion cannot be
perfect; in verification I substitute "/cvsroot" for the repository
root (though this only affects some older branches)
---
Maciek Sakrejda | System Architect | Truviso
