Re: Index seems "lost" after consecutive deletes

Поиск
Список
Период
Сортировка
От Edson Richter
Тема Re: Index seems "lost" after consecutive deletes
Дата
Msg-id BLU436-SMTP259970C19A04869FEBE92FFCF540@phx.gbl
обсуждение исходный текст
Ответ на Re: Index seems "lost" after consecutive deletes  ("David G. Johnston" <david.g.johnston@gmail.com>)
Ответы Re: Index seems "lost" after consecutive deletes  (Edson Richter <edsonrichter@hotmail.com>)
Список pgsql-general
Em 14/06/2016 10:32, David G. Johnston escreveu:
On Tue, Jun 14, 2016 at 12:51 AM, Edson Richter <edsonrichter@hotmail.com> wrote:
Em 14/06/2016 01:33, David G. Johnston escreveu:
On Monday, June 13, 2016, Edson Richter <edsonrichter@hotmail.com> wrote:
Em 13/06/2016 23:36, Edson Richter escreveu:
Em 13/06/2016 23:18, rob stone escreveu:
On Mon, 2016-06-13 at 22:41 -0300, Edson Richter wrote:
Em 13/06/2016 22:33, Edson Richter escreveu:
I've a table "A" with 4,000,000 records.

I've decided to delete records from oldest to newest but I can't 
delete records that have references in tables "B", "C" or "D".


so, I've


with qry as (

    select A.id

      from A

    where not exists (select 1 from B where B.a_id = A.id)

       and not exists (select 1 from C where C.a_id = A.id)

       and not exists (select 1 from D where D.a_id = A.id)

       and A.creation_date < (now()::date - interval '12 month')

  order by A.id DESC

  limit 2000

)

delete from A where id in (select id from qry);


All three referenced tables have indexes (B.a_id; C.a_id; D.a_id)
in 
order to make query faster.

So for first 2 million rows it worked really well, taking about 1 
minute to delete each group of 2000 records.

Then, after a while I just started to get errors like:


​​
ERROR: update or delete in "A" violates foreign key
​​
"fk_C_A" in "C". DETAIL:  Key (id)=(3240124) is still referenced by table "C". Seems to me that indexes got lost in the path - the query is really  specific and no "C" referenced records can be in my deletion. Has anyone faced a behavior like this? Am I doing something wrong?
Of course:
Version string    PostgreSQL 9.4.8 on x86_64-unknown-linux-gnu,
compiled 
by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4), 64-bit
Oracle Linux 7 x64 with all updates. Running on EXT4 file system.
Computer is Dell R420 with mirrored disks, 80GB of RAM (database has
< 
40GB in total).

Sorry for not putting the info in the first e-mail.

Edson


What does:-

SELECT COUNT(*) FROM C WHERE C.a_id = 3240124;

return?

Is it a many-to-one or a one-to-one relationship?



SELECT COUNT(*) FROM C WHERE C.a_id = 3240124;
 count
-------
     1
(1 registro)


A.id is primary key of A table. Each table has its own primary key.

Relationship to others table is 1-N, being N = {0,1}


A.id -> B.a_id (being B.a_id unique but not enforced by unique key)

A.id -> C.a_id (being C.a_id unique but not enforced by unique key)

A.id -> D.a_id (being D.a_id unique but not enforced by unique key)


Regards,

Edson


Just in case, I've run:

- vacuum full analyze verbose;
- reindex index ix_c_a_id;

Result I get same error. So, I'm inclined to discard that this is a index error.


Interesting:

with qry as (select A.id
 from A
where creatingdate < (now()::date - interval '12 month')
and not exists (select 1 from B where B.a_id = A.id)
and not exists (select 1 from C where C.a_id = A.id)
and not exists (select 1 from D where D.a_id = A.id)
order by A.id limit 2000)

select * from qry where id = 3240124;

Why do you assume 3240124 is within the first 2000 qualified records that the CTE is limited to checking?

Because this is the key causing the error on delete.

​Wasn't the point though I do suspect your expectations are acceptable in this instance.  If you truly want to see if qry contains 3240124 you should lose the LIMIT 2000.​


Running the risk to deviate the focus, if records are ordered in the query, limiting them will always produce same result.



 


Total query runtime: 2.2 secs
0 rows retrieved.


Why delete causes error, but querying don't?

Given the error message this answer seems self-evident...

Not self-evident to me: delete says I'm trying to delete the record with id = 3240124 and I can't. But the select says this record with id = 3240124 is not there!!!


​This error:
 ​ERROR: update or delete in "A" violates foreign key "fk_C_A" in
​ ​
"C".
is impossible to encounter when executing a pure select...

Yes, but is also impossible to get this error if the record is not in the subquery results. That's why I've executed the query filtering id=3240124.
If this record is not in the subquery, why does the "delete..." is trying to remove it?


 

Would it be a bug when using delete ... where id in (subquery)?
  
 
I'm unsure regarding the solution but I suspect the problem is that between selecting the A row and deleting it another concurrent process added a record to C that, if you were to re-run the select would cause the row from A to be skipped.  But the single query doesn't have that option so it ends up failing.

There is a FOR UPDATE clause you can add to the select but I don't think that works here since table C is the one being altered and at the time of the query there is nothing to lock.

I'm doubting this is a bug, just poor concurrency understanding.  Sorry I cannot enlighten further at the moment.

There is not other process adding/updating records - and even there is, the ID would be so high (because it is a sequence) that cannot be the 3240124 (this record has been generated more than 2 years ago - that's why my query has the "creationdate" filter - I don't touch records created within last 12 months).

Also, I've tried the same with "for update": same error!


​Have you confirmed that 
"fk_C_A" is referencing the columns you think it is?

Yes, first thing. All references are maintained by a automated system. If the relation is not there, then it will be automatically created.
But your question raised another interesting line of investigation: if there is any other cascading foreign keys pointing to A table.
Until now, I've been concentrated in the related tables, but would be possible that another FK is cascading, which in turn would have another cascade that is causing the error.


​What's the history of this machine?  Did you pass through 9.3 (especially early releases) on your way to 9.4?

Started with 9.0, then 9.1, 9.2, 9.3 and now 9.4.
Nevertheless, for every migration I've used a "dump" and "restore" to avoid the "upgrade" caveats.
For example, when migrating from 9.3 to 9.4, I've used "9.4" pg_dump to create the dump, and then "9.4" pg_restore to restore it in the new cluster.

Edson


David J.


В списке pgsql-general по дате отправления:

Предыдущее
От: "David G. Johnston"
Дата:
Сообщение: Re: How to pass jsonb and inet arguments to a stored function with JDBC?
Следующее
От: Rupesh Choudhary
Дата:
Сообщение: Data ingestion failing when using higher Batch size