Re: Using database to find file doublettes in my computer

From: Eus
Subject: Re: Using database to find file doublettes in my computer
Date:
Msg-id: 849157.43436.qm@web37603.mail.mud.yahoo.com
In reply to: Using database to find file doublettes in my computer  (Lothar Behrens <lothar.behrens@lollisoft.de>)
List: pgsql-general
Hi Ho!

--- On Tue, 11/18/08, Lothar Behrens <lothar.behrens@lollisoft.de> wrote:

> Hi,
>
> I have a problem: I need to find, as fast as possible, files that
> are duplicates, i.e. identical, and also to identify those files
> that are not identical.
>
> My approach was to use dir /s and an awk script to convert the
> listing into an SQL script to be imported into a table. That done,
> I could start issuing queries.
>
> But how do I query for files so as to display a 'left / right
> view' for each file that exists in multiple places?
>
> I mean this:
>
> This File;Also here
> C:\some.txt;C:\backup\some.txt
> C:\some.txt;C:\backup1\some.txt
> C:\some.txt;C:\backup2\some.txt
>
> but I have only this list:
>
> C:\some.txt
> C:\backup\some.txt
> C:\backup1\some.txt
> C:\backup2\some.txt
>
>
> The reason for this is that I am faced with ECAD projects that
> have been copied around many times, and I have to identify which
> files are missing here and which files are there.
>
> So a manual approach is as follows:
>
> 1)   Identify one file (schematic1.sch) and see where copies of it
>      are.
> 2)   Compare the files of both directories and make a decision
>      about which files to use further.
> 3)   Determine conflicts, i.e. files that can't be copied together
>      for a cleanup.
>
> Are there any approaches or tools that could help?
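For the 'left / right' pairing itself, a self-join is one way to line each file up against its other copies. A minimal sketch, assuming a hypothetical table files(name, path) built from the dir /s listing:

```sql
-- Hypothetical table built from the "dir /s" listing:
--   CREATE TABLE files (name text, path text);
-- Pair every file with each other copy of the same name.
-- "a.path < b.path" keeps one ordered pair per couple and
-- prevents a file from being paired with itself.
SELECT a.path AS this_file,
       b.path AS also_here
FROM files AS a
JOIN files AS b
  ON a.name = b.name
 AND a.path < b.path
ORDER BY a.path, b.path;
```

Note that this matches on filename only; two files with the same name may still differ in content, so a checksum column would be needed to confirm that they are truly identical.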

I have also been in this kind of situation before, though I work under GNU/Linux.

1. At that time, I used `md5sum' to generate the fingerprint of all files in a given directory to be cleaned up.

2. Later, I created a simple Java program to group the names of all files that had the same fingerprint (i.e., MD5
hash).

3. I then deleted all but one of the files sharing the same MD5 hash, keeping the copy with a good filename (in my
case, the filenames couldn't be relied on for comparison, since copies differed by small additions like a date, the
author's name, and the like).

4. After that, I used my brain to find related files based on their filenames (e.g., `[2006-05-23] Jeff - x.txt' should
be the same as `Jenny - x.txt'). Of course, the Java program also helped me group the files that I thought were
related.

5. Next, I perused the related files to see whether most of their contents were the same. If so, I kept the latest one
based on the modification time.
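If the md5sum output is loaded back into the database, steps 1 and 2 above can be expressed as a single query. A sketch, assuming a hypothetical table hashes(hash, path) holding one row per file:

```sql
-- Hypothetical table loaded from `md5sum' output:
--   CREATE TABLE hashes (hash text, path text);
-- List only the files whose content exists more than once,
-- grouped by their MD5 fingerprint:
SELECT h.hash, h.path
FROM hashes AS h
JOIN (SELECT hash
      FROM hashes
      GROUP BY hash
      HAVING count(*) > 1) AS dup USING (hash)
ORDER BY h.hash, h.path;
```

Within each group (one hash), every file but the keeper is a candidate for deletion, since matching MD5 fingerprints mean the contents are almost certainly identical.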

> This is a very time-consuming job and I am searching for any
> solution that helps me save time :-)

Well, I think I saved a lot of time back then: I was able to eliminate about 7,000 of 15,000 files in about
two weeks.

> I know that those problems would not arise if the projects were
> well structured and kept in a version management system. But that
> isn't the case here :-)

I hope you employ such a system ASAP :-)

> Thanks
>
> Lothar

Best regards,

Eus (FSF member #4445)



In this digital era, where computing technology is pervasive,
your freedom depends on the software controlling those computing devices.

Join the free software movement today!
It is free as in freedom, not as in free beer!

Join: http://www.fsf.org/jf?referrer=4445



