Обсуждение: Delete with subquery deleting all records

Поиск
Список
Период
Сортировка

Delete with subquery deleting all records

От
Francisco Reyes
Дата:
I have two tables
exports
export_messages

They were done without a foreign key and I am trying to clean the data to
put a constraint.

For every record in exports_messages there is supposed to be a matching
record in exports with a export_id (ie export_id is the foreign key for
export_messages)

The following query identified all records that I need to delete:
SELECT distinct export_messages.export_id as id
FROM export_messages
LEFT OUTER JOIN exports ON (export_messages.export_id = exports.export_id);

I checked a number of them.. and all the records returned from that select
do not have a matching export_id in exports.

When I try to run:
delete from export_messages where export_id in
(SELECT distinct export_messages.export_id as id
FROM export_messages
LEFT OUTER JOIN exports ON (export_messages.export_id = exports.export_id)
);


All records from export_messages get deleted.
Any suggestions?

Re: Delete with subquery deleting all records

От
"Joris Dobbelsteen"
Дата:
Hint: LEFT JOIN is your mistake...

Thought: are you sure you are going to delete those rows? In there cases
human verification is usually the way to go, though it takes a lot of
time.

Read on...

>-----Original Message-----
>From: pgsql-general-owner@postgresql.org
>[mailto:pgsql-general-owner@postgresql.org] On Behalf Of
>Francisco Reyes
>Sent: donderdag 24 mei 2007 1:12
>To: PostgreSQL general
>Subject: [GENERAL] Delete with subquery deleting all records
>
>I have two tables
>exports
>export_messages
>
>They were done without a foreign key and I am trying to clean
>the data to put a constraint.
>
>For every record in exports_messages there is supposed to be a
>matching record in exports with a export_id (ie export_id is
>the foreign key for
>export_messages)
>
>The following query identified all records that I need to delete:
>SELECT distinct export_messages.export_id as id FROM
>export_messages LEFT OUTER JOIN exports ON
>(export_messages.export_id = exports.export_id);
>
>I checked a number of them.. and all the records returned from
>that select do not have a matching export_id in exports.
>
>When I try to run:
>delete from export_messages where export_id in (SELECT
>distinct export_messages.export_id as id FROM export_messages
>LEFT OUTER JOIN exports ON (export_messages.export_id =
>exports.export_id) );
>

What seems more resonable:
DELETE FORM export_messages
WHERE NOT export_id IN (SELECT export_id FROM exports)

Make sure you know what you are doing (backup)...

[snip]

- Joris


Re: Delete with subquery deleting all records

От
Francisco Reyes
Дата:
Joris Dobbelsteen writes:

> Hint: LEFT JOIN is your mistake...

The use of left join in general.. or my left join?
When I do the left join by itself I verified manually and it had the data I
wanted.

> Thought: are you sure you are going to delete those rows? In there cases
> human verification is usually the way to go, though it takes a lot of
> time.

If I can't not find a way to do this through SQL I will write a program.
The time to try and clean this by hand would be countless hours. There are a
lot of records in the child table that do not have a matching record in the
parent table.


Re: Delete with subquery deleting all records

От
"Joris Dobbelsteen"
Дата:
>-----Original Message-----
>From: Francisco Reyes [mailto:lists@stringsutils.com]
>Sent: donderdag 24 mei 2007 2:04
>To: Joris Dobbelsteen
>Cc: PostgreSQL general
>Subject: Re: [GENERAL] Delete with subquery deleting all records
>
>Joris Dobbelsteen writes:
>
>> Hint: LEFT JOIN is your mistake...
>
>The use of left join in general.. or my left join?
>When I do the left join by itself I verified manually and it
>had the data I wanted.

Your usage in this context...

Did you really check your list thoroughly.
>>>SELECT distinct export_messages.export_id as id
>>>FROM export_messages
>>>>LEFT OUTER JOIN exports ON
>>>(export_messages.export_id = exports.export_id);

Take any value from "SELECT export_id FROM exports"
Does it not exist in your list?

Try this:
SELECT distinct export_messages.export_id as id,
  exports.export_id as exports_export_id
FROM export_messages
LEFT OUTER JOIN exports ON
(export_messages.export_id = exports.export_id)
WHERE exports.export_id IS NOT NULL;

At this point you should know whats going wrong...

>>>DELETE FROM export_messages
>>>WHERE export_id IN
>>>   (SELECT distinct export_messages.export_id as id
>>>    FROM export_messages
>>>    LEFT OUTER JOIN exports

The LEFT OUTER join will at all times return ALL rows from
export_messages.
In effect, you generate a list with ALL export_messages.export_id. Thus
we must conclude that for every row you are trying to delete, the
condition must evaluate to true.

>>>    ON (export_messages.export_id = exports.export_id)
>>>    );

>> Thought: are you sure you are going to delete those rows? In there
>> cases human verification is usually the way to go, though it takes a
>> lot of time.
>
>If I can't not find a way to do this through SQL I will write
>a program.
>The time to try and clean this by hand would be countless
>hours. There are a lot of records in the child table that do
>not have a matching record in the parent table.

That's the trade-off: effects of a mistake * chance of a mistake against
the cost to prevent these.

Hope this helps...

- Joris


Re: Delete with subquery deleting all records

От
Francisco Reyes
Дата:
Joris Dobbelsteen writes:

> Did you really check your list thoroughly.
>>>>SELECT distinct export_messages.export_id as id
>>>>FROM export_messages
>>>>>LEFT OUTER JOIN exports ON
>>>>(export_messages.export_id = exports.export_id);
>
> Take any value from "SELECT export_id FROM exports"
> Does it not exist in your list?

Correct.
I thought of that.. and the outerjoin as I showed.. only shows values that
are in export_messages but are not in exports.

I went over nearly 100 values and that select only had the right values.

> Try this:
> SELECT distinct export_messages.export_id as id,
>   exports.export_id as exports_export_id
> FROM export_messages
> LEFT OUTER JOIN exports ON
> (export_messages.export_id = exports.export_id)
> WHERE exports.export_id IS NOT NULL;

Thanks will try it.

> The LEFT OUTER join will at all times return ALL rows from
> export_messages.

What is very, very strange is that it didn't return all values.

> In effect, you generate a list with ALL export_messages.expor
> we must conclude that for every row you are trying to delete, the
> condition must evaluate to true.

This is what was driving me crazy.. when I did the select by itself the list
was correct.

> That's the trade-off: effects of a mistake * chance of a mistake against
> the cost to prevent these.

Thanks much. Will try your query.
doing all this within a transaction so I can double check the results.. that
is the primary reason i would rather get it done from within psql.
If I do it in a program I will have no easy way to tell if I am doing the
right thing... Small tests.. and print statements will helpfully help, but
once I believe the program is working.. and run it.. the only solution is a
restore (I do a backup before doing any changes of course).

Re: Delete with subquery deleting all records

От
Francisco Reyes
Дата:
Joris Dobbelsteen writes:

> Try this:
> SELECT distinct export_messages.export_id as id,
>   exports.export_id as exports_export_id
> FROM export_messages
> LEFT OUTER JOIN exports ON
> (export_messages.export_id = exports.export_id)
> WHERE exports.export_id IS NOT NULL;

In my case I needed "IS NULL"

Your query worked. Thanks!!!

However.. I find it very strange that just the selects by themselves
produced the same ouput up to limit 100.

SELECT distinct export_messages.export_id as id
FROM export_messages
LEFT OUTER JOIN exports ON (export_messages.export_id = exports.export_id)
limit 100;

and
SELECT distinct export_messages.export_id as id
FROM export_messages
LEFT OUTER JOIN exports ON
(export_messages.export_id = exports.export_id)
WHERE exports.export_id IS NULL limit 100;

Produced the same output.







> At this point you should know whats going wrong...
>
>>>>DELETE FROM export_messages
>>>>WHERE export_id IN
>>>>   (SELECT distinct export_messages.export_id as id
>>>>    FROM export_messages
>>>>    LEFT OUTER JOIN exports
>
> The LEFT OUTER join will at all times return ALL rows from
> export_messages.
> In effect, you generate a list with ALL export_messages.export_id. Thus
> we must conclude that for every row you are trying to delete, the
> condition must evaluate to true.
>
>>>>    ON (export_messages.export_id = exports.export_id)
>>>>    );
>
>>> Thought: are you sure you are going to delete those rows? In there
>>> cases human verification is usually the way to go, though it takes a
>>> lot of time.
>>
>>If I can't not find a way to do this through SQL I will write
>>a program.
>>The time to try and clean this by hand would be countless
>>hours. There are a lot of records in the child table that do
>>not have a matching record in the parent table.
>
> That's the trade-off: effects of a mistake * chance of a mistake against
> the cost to prevent these.
>
> Hope this helps...
>
> - Joris
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match

Re: Delete with subquery deleting all records

От
Alban Hertroys
Дата:
Francisco Reyes wrote:
> When I try to run:
> delete from export_messages where export_id in
> (SELECT distinct export_messages.export_id as id
> FROM export_messages
> LEFT OUTER JOIN exports ON (export_messages.export_id = exports.export_id)
> );

Why not use EXISTS?

DELETE FROM export_messages WHERE NOT EXISTS (
    SELECT 1
      FROM exports
     WHERE exports.export_id = export_messages.export_id
)

I suppose you run those queries in a transaction block, right?

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

Re: Delete with subquery deleting all records

От
Francisco Reyes
Дата:
Alban Hertroys writes:

> Why not use EXISTS?
>
> DELETE FROM export_messages WHERE NOT EXISTS (
>     SELECT 1
>       FROM exports
>      WHERE exports.export_id = export_messages.export_id
> )

Didn't think of it. Thanks for the code.

> I suppose you run those queries in a transaction block, right?

Correct.
Also I do a backup before doing the deletions.


Re: Delete with subquery deleting all records

От
Scott Marlowe
Дата:
Francisco Reyes wrote:
> Alban Hertroys writes:
>
>> I suppose you run those queries in a transaction block, right?
>
> Correct.
> Also I do a backup before doing the deletions.

Note that if you're in a transaction, you don't technically need the
backup (doesn't hurt though) as if you get it wrong you can just roll it
back.

Re: Delete with subquery deleting all records

От
Francisco Reyes
Дата:
Scott Marlowe writes:

> Note that if you're in a transaction, you don't technically need the
> backup (doesn't hurt though) as if you get it wrong you can just roll it
> back.


I know.. I do the backup in case I forget to do the transaction. :-)

Re: Delete with subquery deleting all records

От
Lew
Дата:
Your top-posting was confusing to me, but I eventually figured out what went
where.

Francisco Reyes wrote:
> However.. I find it very strange that just the selects by themselves
> produced the same ouput up to limit 100.

Strange?  Why?  Did you expect a particular statistical distribution?  Perhaps
you were surprised by the extent of the situation, not thinking there could be
100 records that didn't match?

Apparently by coincidence the first 100 distinct values returned from
export_messages just happened not to have corresponding rows in exports.

There is really nothing strange about your result, just like there's really
nothing strange about getting 100 heads in a row in a coin toss.

> SELECT distinct export_messages.export_id as id
> FROM export_messages
> LEFT OUTER JOIN exports ON (export_messages.export_id = exports.export_id)
> limit 100;
>
> and
> SELECT distinct export_messages.export_id as id
> FROM export_messages
> LEFT OUTER JOIN exports ON
> (export_messages.export_id = exports.export_id)
> WHERE exports.export_id IS NULL limit 100;
>
> Produced the same output.

The unconstrained outer join is guaranteed to return every distinct value of
export_messages.export_id, the LEFT table, by the definition of LEFT OUTER JOIN.

As Alban said, a better query would be (SELECT column changed from his
suggestion):

DELETE FROM export_messages WHERE NOT EXISTS
  ( SELECT export_id FROM exports
    WHERE exports.export_id = export_messages.export_id
  );

or

DELETE FROM export_messages WHERE export_id NOT IN
  ( SELECT export_id FROM exports );

--
Lew

Re: Delete with subquery deleting all records

От
Francisco Reyes
Дата:
Lew writes:

> Strange?  Why?  Did you expect a particular statistical distribution?  Perhaps


The impression was that one query was returning everything.. and the other
only the records that did not exist in the one table.

> you were surprised by the extent of the situation, not thinking there could be
> 100 records that didn't match?

Surprised that the outer join actually did ONLY display records that did not
exist in the second table, even though I did not have a where clause to not
list the records with a NULL value.

> The unconstrained outer join is guaranteed to return every distinct value of
> export_messages.export_id, the LEFT table, by the definition of LEFT OUTER JOIN.

That is what I expected, BUT it only returned records that did NOT exist in
the second table. It did not, as far as I could check, return all records.

>
> DELETE FROM export_messages WHERE NOT EXISTS
>   ( SELECT export_id FROM exports
>     WHERE exports.export_id = export_messages.export_id
>   );

That is what I ended up using.
It worked.


> DELETE FROM export_messages WHERE export_id NOT IN
>   ( SELECT export_id FROM exports );

Will keep that one handy too for future reference.

I think I may have not explained properly what I think I was seeing..
The left outter join without the where clause seemed to return the right
data.. only rows that existed in one table, but not in the other.

I looked at a few hundred records and check a good deal of them manually.

The issue was that I used it as a subquery to delete it seemed to produce
the entire list (ie all records from both tables)..

It is possible, or even likely, that I did something wrong or had some form
of oversight.. After I got the suggestion with the better query (with the
where clause) I did not go back to test anymore.

Thanks for your feedback.

Re: Delete with subquery deleting all records

От
Lew
Дата:
Francisco Reyes wrote:
> Lew writes:
>
>> Strange?  Why?  Did you expect a particular statistical distribution?
>> Perhaps
>
>
> The impression was that one query was returning everything.. and the
> other only the records that did not exist in the one table.
>
>> you were surprised by the extent of the situation, not thinking there
>> could be 100 records that didn't match?
>
> Surprised that the outer join actually did ONLY display records that did
> not exist in the second table, even though I did not have a where clause
> to not list the records with a NULL value.

You only looked at some of the records, not all of them, correct?

Ah, yes, you did say,
> I checked a number of them.

Your evaluation of a whole data set by manual examination of a small subset of
the returned results cannot be certain.

Did you try SELECT COUNT(*) to check if the queries differed in the size of
their returned result sets?

> That is what I expected, BUT it only returned records that did NOT exist in the second table. It did not, as far as I
couldcheck, return all records. 

You mean "as far as you did check".  You still do not know the truth of your
assertion that the outer join returned only a subset of the records.

SELECT COUNT( DISTINCT export_messages.export_id )
FROM export_messages
LEFT OUTER JOIN exports ON (export_messages.export_id = exports.export_id);

vs.

SELECT COUNT( DISTINCT export_messages.export_id )
FROM export_messages;

will reveal.

--
Lew