Обсуждение: How does this FK constraint error happen?

Поиск
Список
Период
Сортировка

How does this FK constraint error happen?

От
Ron Johnson
Дата:
PG 14.12

The job does DELETE FROM rel_group_user; (no WHERE clause!!) then does DELETE FROM public.access_user; (also no WHERE clause), but the public.access_user statement fails on FK constraint error on rel_group_user (which was just recently emptied).

Each statement is in a different transaction, since they are executed via separate psql statements.  Thus, no apparent MVCC visibility weirdness.

My first thought, of course, was that there are two rel_group_user tables.  Alas, no, there's just one.  See below for grep statement.

Excerpts from the cron job log file:
[snip]
2024-07-15 02:40:04 Deleting from FISPTAPPGS401DA/TAPd.rel_group_user
DELETE FROM rel_group_user;
DELETE 42747
[snip]
2024-07-15 02:41:15 Deleting from FISPTAPPGS401DA/TAPd.public.access_user
DELETE FROM public.access_user;
ERROR:  update or delete on table "access_user" violates foreign key constraint "fk_rel_group_user_1" on table "rel_group_user"
DETAIL:  Key (user_id)=(1210) is still referenced from table "rel_group_user".
ERROR: deleting FISPTAPPGS401DA/TAPd.public.access_user
[snip]

$ psql --host=FISPTAPPGS401DA TAPd -Xc "\dt *.*" | grep " rel_group_user "
 public             | rel_group_user                                | table       | TAP

Here are the table definitions (if relevant):
TAPd=# \d public.access_user
                                                   Table "public.access_user"
           Column           |            Type             | Collation | Nullable |                   Default                    
----------------------------+-----------------------------+-----------+----------+----------------------------------------------
 user_id                    | integer                     |           | not null | nextval('access_user_user_id_seq'::regclass)
 login_id                   | character varying(255)      |           | not null |
[snip]
 Indexes:
    "pk_access_user" PRIMARY KEY, btree (user_id)
    "idx_user_login_id" UNIQUE, btree (login_id)
Foreign-key constraints:
    "fk_access_user_home_domain" FOREIGN KEY (home_domain_id) REFERENCES access_domain(domain_id)
    "fk_user_userdesktop" FOREIGN KEY (user_desktop_id) REFERENCES user_desktop(user_desktop_id)
Referenced by:
[snip]
    TABLE "rel_group_user" CONSTRAINT "fk_rel_group_user_1" FOREIGN KEY (user_id) REFERENCES access_user(user_id)
[snip]

TAPd=# \d rel_group_user
                       Table "public.rel_group_user"
   Column    |            Type             | Collation | Nullable | Default
-------------+-----------------------------+-----------+----------+---------
 user_id     | integer                     |           | not null |
 group_id    | integer                     |           | not null |
 modified_by | integer                     |           |          |
 modified_on | timestamp without time zone |           | not null |
Indexes:
    "idx_rel_group_user" UNIQUE, btree (user_id, group_id)
    "idx_rel_group_user_groupid" btree (group_id)
    "idx_rel_group_user_userid" btree (user_id)
Foreign-key constraints:
    "fk_rel_group_user_1" FOREIGN KEY (user_id) REFERENCES access_user(user_id)
    "fk_rel_group_user_2" FOREIGN KEY (group_id) REFERENCES access_group(group_id)




Re: How does this FK constraint error happen?

От
"Peter J. Holzer"
Дата:
On 2024-07-15 10:04:39 -0400, Ron Johnson wrote:
> The job does DELETE FROM rel_group_user; (no WHERE clause!!) then does DELETE
> FROM public.access_user; (also no WHERE clause), but the public.access_user
>  statement fails on FK constraint error on rel_group_user (which was just
> recently emptied).
>
> Each statement is in a different transaction, since they are executed via
> separate psql statements.  Thus, no apparent MVCC visibility weirdness.
>
> My first thought, of course, was that there are two rel_group_user tables. 
> Alas, no, there's just one.  See below for grep statement.
>
> Excerpts from the cron job log file:
> [snip]
> 2024-07-15 02:40:04 Deleting from FISPTAPPGS401DA/TAPd.rel_group_user
> DELETE FROM rel_group_user;
> DELETE 42747
> [snip]
> 2024-07-15 02:41:15 Deleting from FISPTAPPGS401DA/TAPd.public.access_user
> DELETE FROM public.access_user;
> ERROR:  update or delete on table "access_user" violates foreign key constraint
> "fk_rel_group_user_1" on table "rel_group_user"
> DETAIL:  Key (user_id)=(1210) is still referenced from table "rel_group_user".
> ERROR: deleting FISPTAPPGS401DA/TAPd.public.access_user
> [snip]

Is it possible that some other process created an entry in
rel_group_user between these two queries?

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Вложения

Re: How does this FK constraint error happen?

От
Ron Johnson
Дата:
On Mon, Jul 15, 2024 at 10:35 AM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
[snip] 

Is it possible that some other process created an entry in
rel_group_user between these two queries?
 
That was, in fact, the problem.  At just the wrong time to impact one of the child databases (TAPd), but not the other two (TAPb and TAPc).

TAPd=# select * from rel_group_user
where user_id between 1100 and 1300
order by user_id;
 user_id | group_id | modified_by |       modified_on      
---------+----------+-------------+-------------------------
    1133 |        2 |        1133 | 2024-07-15 08:43:35.669
    1142 |        2 |        1142 | 2024-07-15 09:05:58.451
    1147 |        2 |        1147 | 2024-07-15 09:30:37.169
    1158 |        2 |        1158 | 2024-07-15 09:36:45.142
    1197 |        2 |        1197 | 2024-07-15 09:52:58.477
    1210 |        2 |        1210 | 2024-07-15 02:42:09.355  <<<<<<<<<<<<<
    1229 |        2 |        1229 | 2024-07-15 08:33:48.443
    1242 |        2 |        1242 | 2024-07-15 10:29:51.176
    1260 |        2 |        1260 | 2024-07-15 07:36:21.182
    1283 |        2 |        1283 | 2024-07-15 09:48:25.214
    1288 |        2 |        1288 | 2024-07-15 08:10:33.609
(11 rows)

TAPd=# select user_id, login_id, created_on, modified_on
TAPd-# from public.access_user
TAPd-# where user_id = 1210;
 user_id |  login_id  |       created_on        |       modified_on      
---------+------------+-------------------------+-------------------------
    1210 | JORIEUSER3 | 2023-10-20 11:54:24.562 | 2024-07-15 02:42:09.355
(1 row)
 

Re: How does this FK constraint error happen?

От
Adrian Klaver
Дата:
On 7/15/24 07:53, Ron Johnson wrote:
> On Mon, Jul 15, 2024 at 10:35 AM Peter J. Holzer <hjp-pgsql@hjp.at 
> <mailto:hjp-pgsql@hjp.at>> wrote:
> [snip]
> 
> 
>     Is it possible that some other process created an entry in
>     rel_group_user between these two queries?
> 
> That was, in fact, the problem.  At just the wrong time to impact one of 
> the child databases (TAPd), but not the other two (TAPb and TAPc).
> 
> TAPd=# select * from rel_group_user
> where user_id between 1100 and 1300
> order by user_id;
>   user_id | group_id | modified_by |       modified_on
> ---------+----------+-------------+-------------------------
>      1133 |        2 |        1133 | 2024-07-15 08:43:35.669
>      1142 |        2 |        1142 | 2024-07-15 09:05:58.451
>      1147 |        2 |        1147 | 2024-07-15 09:30:37.169
>      1158 |        2 |        1158 | 2024-07-15 09:36:45.142
>      1197 |        2 |        1197 | 2024-07-15 09:52:58.477
>      1210 |        2 |        1210 | 2024-07-15 02:42:09.355  <<<<<<<<<<<<<

Time travel?

2024-07-15 02:41:15 Deleting from FISPTAPPGS401DA/TAPd.public.access_user
DELETE FROM public.access_user;

Or do the cron jobs take that long to execute?

How is modified_on created?


>      1229 |        2 |        1229 | 2024-07-15 08:33:48.443
>      1242 |        2 |        1242 | 2024-07-15 10:29:51.176
>      1260 |        2 |        1260 | 2024-07-15 07:36:21.182
>      1283 |        2 |        1283 | 2024-07-15 09:48:25.214
>      1288 |        2 |        1288 | 2024-07-15 08:10:33.609
> (11 rows)
> 
> TAPd=# select user_id, login_id, created_on, modified_on
> TAPd-# from public.access_user
> TAPd-# where user_id = 1210;
>   user_id |  login_id  |       created_on        |       modified_on
> ---------+------------+-------------------------+-------------------------
>      1210 | JORIEUSER3 | 2023-10-20 11:54:24.562 | 2024-07-15 02:42:09.355
> (1 row)

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: How does this FK constraint error happen?

От
Ron Johnson
Дата:
On Mon, Jul 15, 2024 at 11:06 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 7/15/24 07:53, Ron Johnson wrote:
> On Mon, Jul 15, 2024 at 10:35 AM Peter J. Holzer <hjp-pgsql@hjp.at
> <mailto:hjp-pgsql@hjp.at>> wrote:
> [snip]
>
>
>     Is it possible that some other process created an entry in
>     rel_group_user between these two queries?
>
> That was, in fact, the problem.  At just the wrong time to impact one of
> the child databases (TAPd), but not the other two (TAPb and TAPc).
>
> TAPd=# select * from rel_group_user
> where user_id between 1100 and 1300
> order by user_id;
>   user_id | group_id | modified_by |       modified_on
> ---------+----------+-------------+-------------------------
>      1133 |        2 |        1133 | 2024-07-15 08:43:35.669
>      1142 |        2 |        1142 | 2024-07-15 09:05:58.451
>      1147 |        2 |        1147 | 2024-07-15 09:30:37.169
>      1158 |        2 |        1158 | 2024-07-15 09:36:45.142
>      1197 |        2 |        1197 | 2024-07-15 09:52:58.477
>      1210 |        2 |        1210 | 2024-07-15 02:42:09.355  <<<<<<<<<<<<<

Time travel?

😞
 

2024-07-15 02:41:15 Deleting from FISPTAPPGS401DA/TAPd.public.access_user
DELETE FROM public.access_user;

Or do the cron jobs take that long to execute?

The deletes from 26*3 tables (the same 26 tables in three children) took from 02:40:02 to 02:41:47.
 
Then a bunch of COPY statements run (pg_dump from the federation master, then COPY to the federation children).  Must be done in a specific order.

How is modified_on created?

It's updated by the application.
 
>      1229 |        2 |        1229 | 2024-07-15 08:33:48.443
>      1242 |        2 |        1242 | 2024-07-15 10:29:51.176
>      1260 |        2 |        1260 | 2024-07-15 07:36:21.182
>      1283 |        2 |        1283 | 2024-07-15 09:48:25.214
>      1288 |        2 |        1288 | 2024-07-15 08:10:33.609
> (11 rows)
>
> TAPd=# select user_id, login_id, created_on, modified_on
> TAPd-# from public.access_user
> TAPd-# where user_id = 1210;
>   user_id |  login_id  |       created_on        |       modified_on
> ---------+------------+-------------------------+-------------------------
>      1210 | JORIEUSER3 | 2023-10-20 11:54:24.562 | 2024-07-15 02:42:09.355
> (1 row)

--
Adrian Klaver
adrian.klaver@aklaver.com

Re: How does this FK constraint error happen?

От
Adrian Klaver
Дата:
On 7/15/24 08:18, Ron Johnson wrote:
> On Mon, Jul 15, 2024 at 11:06 AM Adrian Klaver 
> <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote:
> 
>     On 7/15/24 07:53, Ron Johnson wrote:
>      > On Mon, Jul 15, 2024 at 10:35 AM Peter J. Holzer


>      > TAPd=# select * from rel_group_user
>      > where user_id between 1100 and 1300
>      > order by user_id;
>      >   user_id | group_id | modified_by |       modified_on
>      > ---------+----------+-------------+-------------------------
>      >      1133 |        2 |        1133 | 2024-07-15 08:43:35.669
>      >      1142 |        2 |        1142 | 2024-07-15 09:05:58.451
>      >      1147 |        2 |        1147 | 2024-07-15 09:30:37.169
>      >      1158 |        2 |        1158 | 2024-07-15 09:36:45.142
>      >      1197 |        2 |        1197 | 2024-07-15 09:52:58.477
>      >      1210 |        2 |        1210 | 2024-07-15 02:42:09.355 
>     <<<<<<<<<<<<<
> 
>     Time travel?
> 
> 
> 😞
> 
> 
>     2024-07-15 02:41:15 Deleting from
>     FISPTAPPGS401DA/TAPd.public.access_user
>     DELETE FROM public.access_user;
> 
>     Or do the cron jobs take that long to execute?
> 
> 
> The deletes from 26*3 tables (the same 26 tables in three children) took 
> from 02:40:02 to 02:41:47.
> Then a bunch of COPY statements run (pg_dump from the federation master, 
> then COPY to the federation children).  Must be done in a specific order.

I don't think it is entirely coincidental that 1210 is the only shown 
user_id with a modified_on value that is in proximity to the delete 
error. My suspicion is that actions are not happening in the exact order 
you think they are. I would think that combining DELETE FROM 
rel_group_user; and DELETE FROM public.access_user; in a single 
transaction would be a good start to fixing this.

> 
>     How is modified_on created?
> 
> 
> It's updated by the application.

At what point in the process?

> 
>      >      1229 |        2 |        1229 | 2024-07-15 08:33:48.443
>      >      1242 |        2 |        1242 | 2024-07-15 10:29:51.176
>      >      1260 |        2 |        1260 | 2024-07-15 07:36:21.182
>      >      1283 |        2 |        1283 | 2024-07-15 09:48:25.214
>      >      1288 |        2 |        1288 | 2024-07-15 08:10:33.609
>      > (11 rows)
>      >
>      > TAPd=# select user_id, login_id, created_on, modified_on
>      > TAPd-# from public.access_user
>      > TAPd-# where user_id = 1210;
>      >   user_id |  login_id  |       created_on        |       modified_on
>      >
>     ---------+------------+-------------------------+-------------------------
>      >      1210 | JORIEUSER3 | 2023-10-20 11:54:24.562 | 2024-07-15
>     02:42:09.355
>      > (1 row)
> 
>     -- 
>     Adrian Klaver
>     adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: How does this FK constraint error happen?

От
Ron Johnson
Дата:
On Mon, Jul 15, 2024 at 11:37 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 7/15/24 08:18, Ron Johnson wrote:
> On Mon, Jul 15, 2024 at 11:06 AM Adrian Klaver
> <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote:
>
>     On 7/15/24 07:53, Ron Johnson wrote:
>      > On Mon, Jul 15, 2024 at 10:35 AM Peter J. Holzer


>      > TAPd=# select * from rel_group_user
>      > where user_id between 1100 and 1300
>      > order by user_id;
>      >   user_id | group_id | modified_by |       modified_on
>      > ---------+----------+-------------+-------------------------
>      >      1133 |        2 |        1133 | 2024-07-15 08:43:35.669
>      >      1142 |        2 |        1142 | 2024-07-15 09:05:58.451
>      >      1147 |        2 |        1147 | 2024-07-15 09:30:37.169
>      >      1158 |        2 |        1158 | 2024-07-15 09:36:45.142
>      >      1197 |        2 |        1197 | 2024-07-15 09:52:58.477
>      >      1210 |        2 |        1210 | 2024-07-15 02:42:09.355
>     <<<<<<<<<<<<<
>
>     Time travel?
>
>
> 😞
>
>
>     2024-07-15 02:41:15 Deleting from
>     FISPTAPPGS401DA/TAPd.public.access_user
>     DELETE FROM public.access_user;
>
>     Or do the cron jobs take that long to execute?
>
>
> The deletes from 26*3 tables (the same 26 tables in three children) took
> from 02:40:02 to 02:41:47.
> Then a bunch of COPY statements run (pg_dump from the federation master,
> then COPY to the federation children).  Must be done in a specific order.

I don't think it is entirely coincidental that 1210 is the only shown
user_id with a modified_on value that is in proximity to the delete
error.

I don't think so either.
 
My suspicion is that actions are not happening in the exact order
you think they are.

modified_on is CURRENT_TIMESTAMP or NOW() or somesuch.  I'm not sure, because I'm not privy to the code.

But I'm printing the system time in bash before every statement.
 
I would think that combining DELETE FROM
rel_group_user; and DELETE FROM public.access_user; in a single
transaction would be a good start to fixing this.

That is in fact what I'm working on now.  There are 26 tables, and they must be done in a specific order when deleting, and the reverse while inserting.

postgres_fdw would make this easier...
 
>
>     How is modified_on created?
>
>
> It's updated by the application.

At what point in the process?

>
>      >      1229 |        2 |        1229 | 2024-07-15 08:33:48.443
>      >      1242 |        2 |        1242 | 2024-07-15 10:29:51.176
>      >      1260 |        2 |        1260 | 2024-07-15 07:36:21.182
>      >      1283 |        2 |        1283 | 2024-07-15 09:48:25.214
>      >      1288 |        2 |        1288 | 2024-07-15 08:10:33.609
>      > (11 rows)
>      >
>      > TAPd=# select user_id, login_id, created_on, modified_on
>      > TAPd-# from public.access_user
>      > TAPd-# where user_id = 1210;
>      >   user_id |  login_id  |       created_on        |       modified_on
>      >
>     ---------+------------+-------------------------+-------------------------
>      >      1210 | JORIEUSER3 | 2023-10-20 11:54:24.562 | 2024-07-15
>     02:42:09.355
>      > (1 row)
>
>     --
>     Adrian Klaver
>     adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
>

--
Adrian Klaver
adrian.klaver@aklaver.com

Re: How does this FK constraint error happen?

От
Adrian Klaver
Дата:
On 7/15/24 09:21, Ron Johnson wrote:
> On Mon, Jul 15, 2024 at 11:37 AM Adrian Klaver 
> <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote:
> 


>     I don't think it is entirely coincidental that 1210 is the only shown
>     user_id with a modified_on value that is in proximity to the delete
>     error.
> 
> 
> I don't think so either.
> 
>     My suspicion is that actions are not happening in the exact order
>     you think they are.
> 
> 
> modified_on is CURRENT_TIMESTAMP or NOW() or somesuch.  I'm not sure, 
> because I'm not privy to the code.
> 
> But I'm printing the system time in bash before every statement.

That is why I wrote 'Time travel?'.

I suspect the modified_on time in the table is not accurately 
representing when the row is modified.

> 
>     I would think that combining DELETE FROM
>     rel_group_user; and DELETE FROM public.access_user; in a single
>     transaction would be a good start to fixing this.
> 
> 
> That is in fact what I'm working on now.  There are 26 tables, and they 
> must be done in a specific order when deleting, and the reverse while 
> inserting.
> 
> postgres_fdw would make this easier...

It can't be installed?



-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: How does this FK constraint error happen?

От
Ron Johnson
Дата:
On Mon, Jul 15, 2024 at 12:47 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 7/15/24 09:21, Ron Johnson wrote:
> On Mon, Jul 15, 2024 at 11:37 AM Adrian Klaver
> <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote:
>


>     I don't think it is entirely coincidental that 1210 is the only shown
>     user_id with a modified_on value that is in proximity to the delete
>     error.
>
>
> I don't think so either.
>
>     My suspicion is that actions are not happening in the exact order
>     you think they are.
>
>
> modified_on is CURRENT_TIMESTAMP or NOW() or somesuch.  I'm not sure,
> because I'm not privy to the code.
>
> But I'm printing the system time in bash before every statement.

That is why I wrote 'Time travel?'.

I suspect the modified_on time in the table is not accurately
representing when the row is modified.

That JBDC code is pretty slow...
 

>
>     I would think that combining DELETE FROM
>     rel_group_user; and DELETE FROM public.access_user; in a single
>     transaction would be a good start to fixing this.
>
>
> That is in fact what I'm working on now.  There are 26 tables, and they
> must be done in a specific order when deleting, and the reverse while
> inserting.
>
> postgres_fdw would make this easier...

It can't be installed?

Less bureaucratic overhead to write a script.

Re: How does this FK constraint error happen?

От
Adrian Klaver
Дата:

On 7/15/24 10:14 AM, Ron Johnson wrote:

>     It can't be installed?
> 
> 
> Less bureaucratic overhead to write a script.
> 

But potentially fewer errors.

Also if the current process allows for connecting to one database and 
transferring from it to another, I'm not sure how using postgres_fdw 
really changes anything?

-- 
Adrian Klaver
adrian.klaver@aklaver.com