Обсуждение: restore/dup OIDs HELP!

Поиск
Список
Период
Сортировка

restore/dup OIDs HELP!

От
Jack Flak
Дата:
Greetings Group,

I'm running 7.1.

Basically, my question is this:  how do I delete an exact dup without
deleting the original?

Let me explain...

I just accidentally ran a restore on my perfectly running database. When I
originally made the dump file, I had OIDs turned on. I figured they were
unique. Now, after the restore is done, I see that they are not.

So now I have duplicate entries in a large table. They are duplicate all
the way down to the OIDs!  I was stupid enough not to set a unique key
field for this table when I designed it, so the system accepted the "new"
entries with the exact same OIDs as already existed. However, almost all of
my other tables do have unique keys set, so the dups were rejected. Check
out my stupidity:


# SELECT oid, sender, length(text), date_submitted  FROM comm  ORDER BY
sender, date_submitted  LIMIT 20;
  oid  | sender | length |     date_submitted
-------+--------+--------+------------------------
 61385 | 132    |   2179 | 2001-02-23 16:43:00-08
 61385 | 132    |   2179 | 2001-02-23 16:43:00-08
 61386 | 132    |   1313 | 2001-02-25 17:40:00-08
 52234 | 154    |   2073 | 2001-05-07 23:40:00-07
 52234 | 154    |   2073 | 2001-05-07 23:40:00-07
 49588 | 168    |   3063 | 2002-03-20 12:04:00-08
 49588 | 168    |   3063 | 2002-03-20 12:04:00-08
 49592 | 168    |   5243 | 2002-03-26 10:54:00-08
 49592 | 168    |   5243 | 2002-03-26 10:54:00-08
 49801 | 188    |   1010 | 2000-08-22 12:30:00-07
 49801 | 188    |   1010 | 2000-08-22 12:30:00-07
 49802 | 188    |    307 | 2000-08-22 12:37:00-07
 49802 | 188    |    307 | 2000-08-22 12:37:00-07
 49803 | 188    |   1790 | 2000-08-22 12:39:00-07
 49803 | 188    |   1790 | 2000-08-22 12:39:00-07
 49804 | 188    |    531 | 2000-08-22 12:41:00-07
 49804 | 188    |    531 | 2000-08-22 12:41:00-07
 49805 | 188    |   4700 | 2000-08-22 12:45:00-07
 49805 | 188    |   4700 | 2000-08-22 12:45:00-07
 49809 | 188    |   2855 | 2000-08-22 12:47:00-07
(20 rows)


The 'sender' field, plus the 'date_submitted' field are used as "the key"
for each entry.
If you're sharp, you'll notice that for 'sender' 132, there are three
total, one dup.  This is because I tried to delete one of the dups.  This
was a mistake, as BOTH were deleted.  I re-added it again from the dump
file manually (boy, that was really not fun).

So, once again, here's my question:  how do I go about deleting the
duplicate entries WITHOUT also deleting the originals?



Re: restore/dup OIDs HELP!

От
"David F. Skoll"
Дата:
On Sun, 5 Jan 2003, Jack Flak wrote:

> So, once again, here's my question:  how do I go about deleting the
> duplicate entries WITHOUT also deleting the originals?

Maybe a dump with some awk/perl magic followed by a restore might be
the easiest way. :-(

"pg_dump -a -D" might give output that is fairly amenable to munging;
a bit of text editing followed by sort | uniq might even do it.

--
David.

Re: restore/dup OIDs HELP!

От
Stephan Szabo
Дата:
On Sun, 5 Jan 2003, Jack Flak wrote:

> Greetings Group,
>
> I'm running 7.1.
>
> Basically, my question is this:  how do I delete an exact dup without
> deleting the original?

You can use one of the other hidden fields.  ctid will be unique for the
various rows, but I'm not sure of a good way to bulk delete them (tids
don't seem to have comparison operators so the normal ways I can think of
don't work).

If all of the new rows were added by a single transaction (through copy
say) you'd probably be able to delete all the rows added by that
transaction using the xmin column.  Find the xmin belonging to those rows
and delete all the rows having that xmin.


Re: restore/dup OIDs HELP!

От
Tom Lane
Дата:
Jack Flak <jack@flak.nospam.org> writes:
> So, once again, here's my question:  how do I go about deleting the
> duplicate entries WITHOUT also deleting the originals?

Perhaps something like

    CREATE TABLE foo AS SELECT DISTINCT * FROM original;

then delete all from original, then INSERT INTO original SELECT * FROM foo;

            regards, tom lane

Re: restore/dup OIDs HELP!

От
Steve Crawford
Дата:
Would this work for you (if you can muck with the comm table temporarily
without causing problems)?

--Create a table with unique values
create temporary table communique as select distinct * from comm;

--Empty the table
delete from comm;
(or truncate comm if recent enough version or drop and recreate comm if it's
really large and an older version of PostgreSQL)

--Repopulate comm from commtemp
insert into comm select * from commtemp;

Cheers,
Steve

On Saturday 04 January 2003 10:30 pm, Jack Flak wrote:
> Greetings Group,
>
> I'm running 7.1.
>
> Basically, my question is this:  how do I delete an exact dup without
> deleting the original?
>
> Let me explain...
>
> I just accidentally ran a restore on my perfectly running database. When I
> originally made the dump file, I had OIDs turned on. I figured they were
> unique. Now, after the restore is done, I see that they are not.
>
> So now I have duplicate entries in a large table. They are duplicate all
> the way down to the OIDs!  I was stupid enough not to set a unique key
> field for this table when I designed it, so the system accepted the "new"
> entries with the exact same OIDs as already existed. However, almost all of
> my other tables do have unique keys set, so the dups were rejected. Check
> out my stupidity:
>
>
> # SELECT oid, sender, length(text), date_submitted  FROM comm  ORDER BY
> sender, date_submitted  LIMIT 20;
>   oid  | sender | length |     date_submitted
> -------+--------+--------+------------------------
>  61385 | 132    |   2179 | 2001-02-23 16:43:00-08
>  61385 | 132    |   2179 | 2001-02-23 16:43:00-08
>  61386 | 132    |   1313 | 2001-02-25 17:40:00-08
>  52234 | 154    |   2073 | 2001-05-07 23:40:00-07
>  52234 | 154    |   2073 | 2001-05-07 23:40:00-07
>  49588 | 168    |   3063 | 2002-03-20 12:04:00-08
>  49588 | 168    |   3063 | 2002-03-20 12:04:00-08
>  49592 | 168    |   5243 | 2002-03-26 10:54:00-08
>  49592 | 168    |   5243 | 2002-03-26 10:54:00-08
>  49801 | 188    |   1010 | 2000-08-22 12:30:00-07
>  49801 | 188    |   1010 | 2000-08-22 12:30:00-07
>  49802 | 188    |    307 | 2000-08-22 12:37:00-07
>  49802 | 188    |    307 | 2000-08-22 12:37:00-07
>  49803 | 188    |   1790 | 2000-08-22 12:39:00-07
>  49803 | 188    |   1790 | 2000-08-22 12:39:00-07
>  49804 | 188    |    531 | 2000-08-22 12:41:00-07
>  49804 | 188    |    531 | 2000-08-22 12:41:00-07
>  49805 | 188    |   4700 | 2000-08-22 12:45:00-07
>  49805 | 188    |   4700 | 2000-08-22 12:45:00-07
>  49809 | 188    |   2855 | 2000-08-22 12:47:00-07
> (20 rows)
>
>
> The 'sender' field, plus the 'date_submitted' field are used as "the key"
> for each entry.
> If you're sharp, you'll notice that for 'sender' 132, there are three
> total, one dup.  This is because I tried to delete one of the dups.  This
> was a mistake, as BOTH were deleted.  I re-added it again from the dump
> file manually (boy, that was really not fun).
>
> So, once again, here's my question:  how do I go about deleting the
> duplicate entries WITHOUT also deleting the originals?
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly

Re: restore/dup OIDs HELP!

От
Jack Flak
Дата:
Stephan,

That's very interesting!  I didn't even know about these other "hidden"
fields.  How many others are there?

You're right about the xmin value.  It's the same for all the dups.  That
could have saved me a lot of work.  I built a perl script to locate the
dups, pull the data, re-insert it, and then delete by old OID.

What is the 'ctid' field about?  It's wierd.
For that matter, what is 'xmin'???


Stephan Szabo wrote:

> On Sun, 5 Jan 2003, Jack Flak wrote:
>
>> Greetings Group,
>>
>> I'm running 7.1.
>>
>> Basically, my question is this:  how do I delete an exact dup without
>> deleting the original?
>
> You can use one of the other hidden fields.  ctid will be unique for the
> various rows, but I'm not sure of a good way to bulk delete them (tids
> don't seem to have comparison operators so the normal ways I can think of
> don't work).
>
> If all of the new rows were added by a single transaction (through copy
> say) you'd probably be able to delete all the rows added by that
> transaction using the xmin column.  Find the xmin belonging to those rows
> and delete all the rows having that xmin.
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>


Re: restore/dup OIDs HELP!

От
Jack Flak
Дата:
Steve,

Thanks for the excellent suggestion!

Yes, that would have worked.  I guess I did it the hard way.  I wrote a
nice little perl script to locate the dups by OID and then pull the data,
re-insert it, and then delete the original dup by OID.  It was smart, I
could run it as many times as needed and it would not have touched any
entry which was not dupped.

This is the SQL soltuion to such a problem.  Thanks for the help!


Steve Crawford wrote:

> Would this work for you (if you can muck with the comm table temporarily
> without causing problems)?
>
> --Create a table with unique values
> create temporary table communique as select distinct * from comm;
>
> --Empty the table
> delete from comm;
> (or truncate comm if recent enough version or drop and recreate comm if
> it's really large and an older version of PostgreSQL)
>
> --Repopulate comm from commtemp
> insert into comm select * from commtemp;
>
> Cheers,
> Steve
>
> On Saturday 04 January 2003 10:30 pm, Jack Flak wrote:
>> Greetings Group,
>>
>> I'm running 7.1.
>>
>> Basically, my question is this:  how do I delete an exact dup without
>> deleting the original?
>>
>> Let me explain...
>>
>> I just accidentally ran a restore on my perfectly running database. When
>> I originally made the dump file, I had OIDs turned on. I figured they
>> were unique. Now, after the restore is done, I see that they are not.
>>
>> So now I have duplicate entries in a large table. They are duplicate all
>> the way down to the OIDs!  I was stupid enough not to set a unique key
>> field for this table when I designed it, so the system accepted the "new"
>> entries with the exact same OIDs as already existed. However, almost all
>> of my other tables do have unique keys set, so the dups were rejected.
>> Check out my stupidity:
>>
>>
>> # SELECT oid, sender, length(text), date_submitted  FROM comm  ORDER BY
>> sender, date_submitted  LIMIT 20;
>>   oid  | sender | length |     date_submitted
>> -------+--------+--------+------------------------
>>  61385 | 132    |   2179 | 2001-02-23 16:43:00-08
>>  61385 | 132    |   2179 | 2001-02-23 16:43:00-08
>>  61386 | 132    |   1313 | 2001-02-25 17:40:00-08
>>  52234 | 154    |   2073 | 2001-05-07 23:40:00-07
>>  52234 | 154    |   2073 | 2001-05-07 23:40:00-07
>>  49588 | 168    |   3063 | 2002-03-20 12:04:00-08
>>  49588 | 168    |   3063 | 2002-03-20 12:04:00-08
>>  49592 | 168    |   5243 | 2002-03-26 10:54:00-08
>>  49592 | 168    |   5243 | 2002-03-26 10:54:00-08
>>  49801 | 188    |   1010 | 2000-08-22 12:30:00-07
>>  49801 | 188    |   1010 | 2000-08-22 12:30:00-07
>>  49802 | 188    |    307 | 2000-08-22 12:37:00-07
>>  49802 | 188    |    307 | 2000-08-22 12:37:00-07
>>  49803 | 188    |   1790 | 2000-08-22 12:39:00-07
>>  49803 | 188    |   1790 | 2000-08-22 12:39:00-07
>>  49804 | 188    |    531 | 2000-08-22 12:41:00-07
>>  49804 | 188    |    531 | 2000-08-22 12:41:00-07
>>  49805 | 188    |   4700 | 2000-08-22 12:45:00-07
>>  49805 | 188    |   4700 | 2000-08-22 12:45:00-07
>>  49809 | 188    |   2855 | 2000-08-22 12:47:00-07
>> (20 rows)
>>
>>
>> The 'sender' field, plus the 'date_submitted' field are used as "the key"
>> for each entry.
>> If you're sharp, you'll notice that for 'sender' 132, there are three
>> total, one dup.  This is because I tried to delete one of the dups.  This
>> was a mistake, as BOTH were deleted.  I re-added it again from the dump
>> file manually (boy, that was really not fun).
>>
>> So, once again, here's my question:  how do I go about deleting the
>> duplicate entries WITHOUT also deleting the originals?
>>
>>
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 3: if posting/reading through Usenet, please send an appropriate
>> subscribe-nomail command to majordomo@postgresql.org so that your
>> message can get through to the mailing list cleanly
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>


Re: restore/dup OIDs HELP!

От
Stephan Szabo
Дата:
On Thu, 9 Jan 2003, Jack Flak wrote:

> Stephan,
>
> That's very interesting!  I didn't even know about these other "hidden"
> fields.  How many others are there?

Let's see, I think the set is
ctid, oid, xmin, cmin, xmax, cmax, tableoid

IIRC,
 ctid is basically like a physical pointer to where the row actually is
 oid you know
 xmin - transaction that made the row
 cmin - command counter in that transaction that made the row
 xmax, cmax - like xmin, cmin but for the transaction that removes it
 tableoid - which table it belongs to (for inheritance)

Cmin and xmax share storage, and it looks like xmin/xmax are of a type
that you can't actually do much with from an SQL statement.



Re: restore/dup OIDs HELP!

От
Jack Flak
Дата:
Thanks man; Iappreciate the data.

Stephan Szabo wrote:

>
> On Thu, 9 Jan 2003, Jack Flak wrote:
>
>> Stephan,
>>
>> That's very interesting!  I didn't even know about these other "hidden"
>> fields.  How many others are there?
>
> Let's see, I think the set is
> ctid, oid, xmin, cmin, xmax, cmax, tableoid
>
> IIRC,
>  ctid is basically like a physical pointer to where the row actually is
>  oid you know
>  xmin - transaction that made the row
>  cmin - command counter in that transaction that made the row
>  xmax, cmax - like xmin, cmin but for the transaction that removes it
>  tableoid - which table it belongs to (for inheritance)
>
> Cmin and xmax share storage, and it looks like xmin/xmax are of a type
> that you can't actually do much with from an SQL statement.
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>