Обсуждение: EXPLAIN PLAN for DELETE CASCADE or DELETE not using pkey indexdespite EXPLAINing that it would?

Поиск
Список
Период
Сортировка
Hi,

I have 250 rows to delete, but they are a target to a bunch of child 
tables with foreign key on delete cascade.

EXPLAIN DELETE FROM Foo WHERE id = (SELECT fooId FROM Garbage);

shows me that it uses the nested loop by Foo_pkey index to find the 250 
items from Garbage to be deleted.

But once that starts, I see HUGE amount of read activity from the 
tablespace Foo_main that contains the Foo table, and only the Foo table, 
not the Foo_pkey, not any other index, not any other child table, not 
even the toast table for Foo is contained in that tablespace (I have the 
toast table diverted with symlinks to another volume).

I see the read activity with iostat, reading heavily at 130 MB/s for a 
long time until my burst balance is used up, then continuing to churn 
with 32 MB/s.

I also see the read activity with iotop, that tells me that it is that 
postgres backend running the DELETE query that is doing this, not some 
autovacuum nor anything else.

It looks to me that in actuality it is doing a sequential scan for each 
of the 250 rows, despite it EPLAINing to me that it was going to use 
that index.

It would really be good to know what it is churning so heavily?

I have seen some ways of using dtrace or things like that to do some 
measurement points. But I haven't seen how this is done to inspect the 
effective execution plan and where in that plan it is, i.e., which 
iteration. It would be nice if there was some way of doing a "deep 
explain plan" or even better, having an idea of the execution plan which 
the executor is actually following, and a way to report on the current 
status of work according to this plan.

How else do I figure out what causes this heavy read activity on the 
main Foo table?

This is something I might even want to contribute. For many years I am 
annoyed by this waiting for long running statement without any idea 
where it is and how much is there still to go. If I have a plan 
structure and an executor that follows the plan structure, there must be 
a way to dump it out.

The pg_stat_activity table might contain a current statement id, and 
then a superuser might ask EXPLAIN STATEMENT :statementId. Or just a 
pg_plantrace command which would dump the current plan with an 
indication of completion % of each step.

But also delete cascades and triggers should be viewable from this, they 
should be traced, I am sure that somewhere inside there is some data 
structure representing this activity and all it would take is to dump it?

regards,
-Gunther



Gunther <raj@gusw.net> writes:
> I have 250 rows to delete, but they are a target to a bunch of child 
> tables with foreign key on delete cascade.

> EXPLAIN DELETE FROM Foo WHERE id = (SELECT fooId FROM Garbage);

> shows me that it uses the nested loop by Foo_pkey index to find the 250 
> items from Garbage to be deleted.

> But once that starts, I see HUGE amount of read activity from the 
> tablespace Foo_main that contains the Foo table, and only the Foo table, 
> not the Foo_pkey, not any other index, not any other child table, not 
> even the toast table for Foo is contained in that tablespace (I have the 
> toast table diverted with symlinks to another volume).

I'm betting you neglected to index the referencing column for one
or more of the foreign keys involved.  You can get away with that
as long as you're not concerned with the speed of DELETE ...

            regards, tom lane


On 3/21/2019 17:16, Tom Lane wrote:
> Gunther <raj@gusw.net> writes:
>> I have 250 rows to delete, but they are a target to a bunch of child
>> tables with foreign key on delete cascade.
>> EXPLAIN DELETE FROM Foo WHERE id = (SELECT fooId FROM Garbage);
>> shows me that it uses the nested loop by Foo_pkey index to find the 250
>> items from Garbage to be deleted.
>> But once that starts, I see HUGE amount of read activity from the
>> tablespace Foo_main that contains the Foo table, and only the Foo table,
>> not the Foo_pkey, not any other index, not any other child table, not
>> even the toast table for Foo is contained in that tablespace (I have the
>> toast table diverted with symlinks to another volume).
> I'm betting you neglected to index the referencing column for one
> or more of the foreign keys involved.  You can get away with that
> as long as you're not concerned with the speed of DELETE ...
>
>             regards, tom lane

I had the same suspicion. But firstly my schema is generated 
automatically and all foreign keys have the indexes.

But what is even more stunning is that the table where this massive read 
activity happens is the Foo heap table. I verified that by using strace 
where all the massive amounts of reads are on those files for the main 
Foo table. And this doesn't make sense, since any foreign key targets 
its primary key. The foreign keys of the child tables are also indexed 
and there is no io on the volumes that hold these child tables, nor is 
the io on the volume that holds the Foo_pkey.




Re: EXPLAIN PLAN for DELETE CASCADE or DELETE not using pkey indexdespite EXPLAINing that it would?

От
Justin Pryzby
Дата:
On Thu, Mar 21, 2019 at 03:31:42PM -0400, Gunther wrote:
> Hi,
> 
> I have 250 rows to delete, but they are a target to a bunch of child tables
> with foreign key on delete cascade.
> 
> EXPLAIN DELETE FROM Foo WHERE id = (SELECT fooId FROM Garbage);

Probably because:
https://www.postgresql.org/docs/current/ddl-constraints.html#DDL-CONSTRAINTS-FK
"Since a DELETE of a row from the referenced table [...] will require a scan of
the referencing table for rows matching the old value, it is often a good idea
to index the referencing columns too."

Can you show "\d+ foo", specifically its FKs ?

Justin