Claus Guttesen <kometen@gmail.com> schrieb:
> Hi.
> 
> I have two tables, images and duplicates. The images-table is our
> current table and has approx. 90 mill. entries. I want to weed out
> duplicate file-entries (based on the md5-checksum of the file and
> user-id) and update the file name with the first entry found, if any.
> 
> The images-table is:
> 
> id serial primary key,
> userid int,
> filename text,
> hashcode text,
> and some additional fields like upload-time, exif-date etc.
> 
> Duplicates:
> id serial primary key,
> userid int,
> filename text,
> hashcode text,
> ref_count int
> 
> What I'd like to do is to perform a single query where I select from
> both tables and then test whether the file is all-ready in duplicates:
I'm not sure if i understand you correctly, but maybe this is what you
want. First, my tables:
test=# select * from images;userid | filename | ref_count
--------+----------+-----------     1 | foo      |     2 | bar      |     3 | foobar   |
(3 Zeilen)
Zeit: 0,153 ms
test=*# select * from duplicates ;userid | filename
--------+----------     2 | bar     3 | foobar     3 | foobar
(3 Zeilen)
Okay, now i update images and set the corrent ref_count:
test=*# update images  set ref_count = count from (   select i.userid, i.filename, count(d.filename) from images i
leftouter join duplicates d        using(userid,filename)      group by 1,2 ) foo  where    images.userid=foo.userid
and   images.filename=foo.filename;
 
UPDATE 3
Zeit: 0,621 ms
test=*# select * from images;userid | filename | ref_count
--------+----------+-----------     1 | foo      |         0     2 | bar      |         1     3 | foobar   |         2
(3 Zeilen)
HTH, Andreas
Andreas
-- 
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°