Обсуждение: restore/dup OIDs HELP!
Greetings Group, I'm running 7.1. Basically, my question is this: how do I delete an exact dup without deleting the original? Let me explain... I just accidentally ran a restore on my perfectly running database. When I originally made the dump file, I had OIDs turned on. I figured they were unique. Now, after the restore is done, I see that they are not. So now I have duplicate entries in a large table. They are duplicate all the way down to the OIDs! I was stupid enough not to set a unique key field for this table when I designed it, so the system accepted the "new" entries with the exact same OIDs as already existed. However, almost all of my other tables do have unique keys set, so the dups were rejected. Check out my stupidity: # SELECT oid, sender, length(text), date_submitted FROM comm ORDER BY sender, date_submitted LIMIT 20; oid | sender | length | date_submitted -------+--------+--------+------------------------ 61385 | 132 | 2179 | 2001-02-23 16:43:00-08 61385 | 132 | 2179 | 2001-02-23 16:43:00-08 61386 | 132 | 1313 | 2001-02-25 17:40:00-08 52234 | 154 | 2073 | 2001-05-07 23:40:00-07 52234 | 154 | 2073 | 2001-05-07 23:40:00-07 49588 | 168 | 3063 | 2002-03-20 12:04:00-08 49588 | 168 | 3063 | 2002-03-20 12:04:00-08 49592 | 168 | 5243 | 2002-03-26 10:54:00-08 49592 | 168 | 5243 | 2002-03-26 10:54:00-08 49801 | 188 | 1010 | 2000-08-22 12:30:00-07 49801 | 188 | 1010 | 2000-08-22 12:30:00-07 49802 | 188 | 307 | 2000-08-22 12:37:00-07 49802 | 188 | 307 | 2000-08-22 12:37:00-07 49803 | 188 | 1790 | 2000-08-22 12:39:00-07 49803 | 188 | 1790 | 2000-08-22 12:39:00-07 49804 | 188 | 531 | 2000-08-22 12:41:00-07 49804 | 188 | 531 | 2000-08-22 12:41:00-07 49805 | 188 | 4700 | 2000-08-22 12:45:00-07 49805 | 188 | 4700 | 2000-08-22 12:45:00-07 49809 | 188 | 2855 | 2000-08-22 12:47:00-07 (20 rows) The 'sender' field, plus the 'date_submitted' field are used as "the key" for each entry. If you're sharp, you'll notice that for 'sender' 132, there are three total, one dup. This is because I tried to delete one of the dups. This was a mistake, as BOTH were deleted. I re-added it again from the dump file manually (boy, that was really not fun). So, once again, here's my question: how do I go about deleting the duplicate entries WITHOUT also deleting the originals?
On Sun, 5 Jan 2003, Jack Flak wrote: > So, once again, here's my question: how do I go about deleting the > duplicate entries WITHOUT also deleting the originals? Maybe a dump with some awk/perl magic followed by a restore might be the easiest way. :-( "pg_dump -a -D" might give output that is fairly amenable to munging; a bit of text editing followed by sort | uniq might even do it. -- David.
On Sun, 5 Jan 2003, Jack Flak wrote: > Greetings Group, > > I'm running 7.1. > > Basically, my question is this: how do I delete an exact dup without > deleting the original? You can use one of the other hidden fields. ctid will be unique for the various rows, but I'm not sure of a good way to bulk delete them (tids don't seem to have comparison operators so the normal ways I can think of don't work). If all of the new rows were added by a single transaction (through copy say) you'd probably be able to delete all the rows added by that transaction using the xmin column. Find the xmin belonging to those rows and delete all the rows having that xmin.
Jack Flak <jack@flak.nospam.org> writes: > So, once again, here's my question: how do I go about deleting the > duplicate entries WITHOUT also deleting the originals? Perhaps something like CREATE TABLE foo AS SELECT DISTINCT * FROM original; then delete all from original, then INSERT INTO original SELECT * FROM foo; regards, tom lane
Would this work for you (if you can muck with the comm table temporarily without causing problems)? --Create a table with unique values create temporary table communique as select distinct * from comm; --Empty the table delete from comm; (or truncate comm if recent enough version or drop and recreate comm if it's really large and an older version of PostgreSQL) --Repopulate comm from commtemp insert into comm select * from commtemp; Cheers, Steve On Saturday 04 January 2003 10:30 pm, Jack Flak wrote: > Greetings Group, > > I'm running 7.1. > > Basically, my question is this: how do I delete an exact dup without > deleting the original? > > Let me explain... > > I just accidentally ran a restore on my perfectly running database. When I > originally made the dump file, I had OIDs turned on. I figured they were > unique. Now, after the restore is done, I see that they are not. > > So now I have duplicate entries in a large table. They are duplicate all > the way down to the OIDs! I was stupid enough not to set a unique key > field for this table when I designed it, so the system accepted the "new" > entries with the exact same OIDs as already existed. However, almost all of > my other tables do have unique keys set, so the dups were rejected. Check > out my stupidity: > > > # SELECT oid, sender, length(text), date_submitted FROM comm ORDER BY > sender, date_submitted LIMIT 20; > oid | sender | length | date_submitted > -------+--------+--------+------------------------ > 61385 | 132 | 2179 | 2001-02-23 16:43:00-08 > 61385 | 132 | 2179 | 2001-02-23 16:43:00-08 > 61386 | 132 | 1313 | 2001-02-25 17:40:00-08 > 52234 | 154 | 2073 | 2001-05-07 23:40:00-07 > 52234 | 154 | 2073 | 2001-05-07 23:40:00-07 > 49588 | 168 | 3063 | 2002-03-20 12:04:00-08 > 49588 | 168 | 3063 | 2002-03-20 12:04:00-08 > 49592 | 168 | 5243 | 2002-03-26 10:54:00-08 > 49592 | 168 | 5243 | 2002-03-26 10:54:00-08 > 49801 | 188 | 1010 | 2000-08-22 12:30:00-07 > 49801 | 188 | 1010 | 2000-08-22 12:30:00-07 > 49802 | 188 | 307 | 2000-08-22 12:37:00-07 > 49802 | 188 | 307 | 2000-08-22 12:37:00-07 > 49803 | 188 | 1790 | 2000-08-22 12:39:00-07 > 49803 | 188 | 1790 | 2000-08-22 12:39:00-07 > 49804 | 188 | 531 | 2000-08-22 12:41:00-07 > 49804 | 188 | 531 | 2000-08-22 12:41:00-07 > 49805 | 188 | 4700 | 2000-08-22 12:45:00-07 > 49805 | 188 | 4700 | 2000-08-22 12:45:00-07 > 49809 | 188 | 2855 | 2000-08-22 12:47:00-07 > (20 rows) > > > The 'sender' field, plus the 'date_submitted' field are used as "the key" > for each entry. > If you're sharp, you'll notice that for 'sender' 132, there are three > total, one dup. This is because I tried to delete one of the dups. This > was a mistake, as BOTH were deleted. I re-added it again from the dump > file manually (boy, that was really not fun). > > So, once again, here's my question: how do I go about deleting the > duplicate entries WITHOUT also deleting the originals? > > > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly
Stephan, That's very interesting! I didn't even know about these other "hidden" fields. How many others are there? You're right about the xmin value. It's the same for all the dups. That could have saved me a lot of work. I built a perl script to locate the dups, pull the data, re-insert it, and then delete by old OID. What is the 'ctid' field about? It's wierd. For that matter, what is 'xmin'??? Stephan Szabo wrote: > On Sun, 5 Jan 2003, Jack Flak wrote: > >> Greetings Group, >> >> I'm running 7.1. >> >> Basically, my question is this: how do I delete an exact dup without >> deleting the original? > > You can use one of the other hidden fields. ctid will be unique for the > various rows, but I'm not sure of a good way to bulk delete them (tids > don't seem to have comparison operators so the normal ways I can think of > don't work). > > If all of the new rows were added by a single transaction (through copy > say) you'd probably be able to delete all the rows added by that > transaction using the xmin column. Find the xmin belonging to those rows > and delete all the rows having that xmin. > > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster >
Steve, Thanks for the excellent suggestion! Yes, that would have worked. I guess I did it the hard way. I wrote a nice little perl script to locate the dups by OID and then pull the data, re-insert it, and then delete the original dup by OID. It was smart, I could run it as many times as needed and it would not have touched any entry which was not dupped. This is the SQL soltuion to such a problem. Thanks for the help! Steve Crawford wrote: > Would this work for you (if you can muck with the comm table temporarily > without causing problems)? > > --Create a table with unique values > create temporary table communique as select distinct * from comm; > > --Empty the table > delete from comm; > (or truncate comm if recent enough version or drop and recreate comm if > it's really large and an older version of PostgreSQL) > > --Repopulate comm from commtemp > insert into comm select * from commtemp; > > Cheers, > Steve > > On Saturday 04 January 2003 10:30 pm, Jack Flak wrote: >> Greetings Group, >> >> I'm running 7.1. >> >> Basically, my question is this: how do I delete an exact dup without >> deleting the original? >> >> Let me explain... >> >> I just accidentally ran a restore on my perfectly running database. When >> I originally made the dump file, I had OIDs turned on. I figured they >> were unique. Now, after the restore is done, I see that they are not. >> >> So now I have duplicate entries in a large table. They are duplicate all >> the way down to the OIDs! I was stupid enough not to set a unique key >> field for this table when I designed it, so the system accepted the "new" >> entries with the exact same OIDs as already existed. However, almost all >> of my other tables do have unique keys set, so the dups were rejected. >> Check out my stupidity: >> >> >> # SELECT oid, sender, length(text), date_submitted FROM comm ORDER BY >> sender, date_submitted LIMIT 20; >> oid | sender | length | date_submitted >> -------+--------+--------+------------------------ >> 61385 | 132 | 2179 | 2001-02-23 16:43:00-08 >> 61385 | 132 | 2179 | 2001-02-23 16:43:00-08 >> 61386 | 132 | 1313 | 2001-02-25 17:40:00-08 >> 52234 | 154 | 2073 | 2001-05-07 23:40:00-07 >> 52234 | 154 | 2073 | 2001-05-07 23:40:00-07 >> 49588 | 168 | 3063 | 2002-03-20 12:04:00-08 >> 49588 | 168 | 3063 | 2002-03-20 12:04:00-08 >> 49592 | 168 | 5243 | 2002-03-26 10:54:00-08 >> 49592 | 168 | 5243 | 2002-03-26 10:54:00-08 >> 49801 | 188 | 1010 | 2000-08-22 12:30:00-07 >> 49801 | 188 | 1010 | 2000-08-22 12:30:00-07 >> 49802 | 188 | 307 | 2000-08-22 12:37:00-07 >> 49802 | 188 | 307 | 2000-08-22 12:37:00-07 >> 49803 | 188 | 1790 | 2000-08-22 12:39:00-07 >> 49803 | 188 | 1790 | 2000-08-22 12:39:00-07 >> 49804 | 188 | 531 | 2000-08-22 12:41:00-07 >> 49804 | 188 | 531 | 2000-08-22 12:41:00-07 >> 49805 | 188 | 4700 | 2000-08-22 12:45:00-07 >> 49805 | 188 | 4700 | 2000-08-22 12:45:00-07 >> 49809 | 188 | 2855 | 2000-08-22 12:47:00-07 >> (20 rows) >> >> >> The 'sender' field, plus the 'date_submitted' field are used as "the key" >> for each entry. >> If you're sharp, you'll notice that for 'sender' 132, there are three >> total, one dup. This is because I tried to delete one of the dups. This >> was a mistake, as BOTH were deleted. I re-added it again from the dump >> file manually (boy, that was really not fun). >> >> So, once again, here's my question: how do I go about deleting the >> duplicate entries WITHOUT also deleting the originals? >> >> >> >> ---------------------------(end of broadcast)--------------------------- >> TIP 3: if posting/reading through Usenet, please send an appropriate >> subscribe-nomail command to majordomo@postgresql.org so that your >> message can get through to the mailing list cleanly > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster >
On Thu, 9 Jan 2003, Jack Flak wrote: > Stephan, > > That's very interesting! I didn't even know about these other "hidden" > fields. How many others are there? Let's see, I think the set is ctid, oid, xmin, cmin, xmax, cmax, tableoid IIRC, ctid is basically like a physical pointer to where the row actually is oid you know xmin - transaction that made the row cmin - command counter in that transaction that made the row xmax, cmax - like xmin, cmin but for the transaction that removes it tableoid - which table it belongs to (for inheritance) Cmin and xmax share storage, and it looks like xmin/xmax are of a type that you can't actually do much with from an SQL statement.
Thanks man; Iappreciate the data. Stephan Szabo wrote: > > On Thu, 9 Jan 2003, Jack Flak wrote: > >> Stephan, >> >> That's very interesting! I didn't even know about these other "hidden" >> fields. How many others are there? > > Let's see, I think the set is > ctid, oid, xmin, cmin, xmax, cmax, tableoid > > IIRC, > ctid is basically like a physical pointer to where the row actually is > oid you know > xmin - transaction that made the row > cmin - command counter in that transaction that made the row > xmax, cmax - like xmin, cmin but for the transaction that removes it > tableoid - which table it belongs to (for inheritance) > > Cmin and xmax share storage, and it looks like xmin/xmax are of a type > that you can't actually do much with from an SQL statement. > > > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) >