Обсуждение: Deleting, indexes and transactions

Поиск
Список
Период
Сортировка

Deleting, indexes and transactions

От
Stevo Slavić
Дата:
Hello PostgreSQL community,

Two tables, A and B, both with auto generated technical PK, A and B are in relationship via nullable non-unique FK a_fk column in B to A's PK. There are no other relationships involving table A. Lets say A has ~20k rows, and B ~500k rows.

When there is no index on a_fk column, if one deletes Bs with DELETE FROM b WHERE a_fk IS NOT NULL, and then in same transaction also deletes all As - deleting As lasts painfully long.

Adding an index on FK in B, improves A deletion times significantly.

Can someone please provide an explanation/rationale of this behavior, why does it take so long to delete As in first case without index? Thanks in advance!

Btw, I'm using PostgreSQL 9.0. Will try how 9.1 behaves.

Kind regards,
Stevo.

Re: Deleting, indexes and transactions

От
Adrian Klaver
Дата:
On 05/28/2012 07:23 AM, Stevo Slavić wrote:
> Hello PostgreSQL community,
>
> Two tables, A and B, both with auto generated technical PK, A and B are
> in relationship via nullable non-unique FK a_fk column in B to A's PK.
> There are no other relationships involving table A. Lets say A has ~20k
> rows, and B ~500k rows.
>
> When there is no index on a_fk column, if one deletes Bs with DELETE
> FROM b WHERE a_fk IS NOT NULL, and then in same transaction also deletes
> all As - deleting As lasts painfully long.
>
> Adding an index on FK in B, improves A deletion times significantly.
>
> Can someone please provide an explanation/rationale of this behavior,
> why does it take so long to delete As in first case without index?
> Thanks in advance!

It is documented behavior:
http://www.postgresql.org/docs/9.0/interactive/sql-createtable.html

"If the referenced column(s) are changed frequently, it might be wise to
add an index to the foreign key column so that referential actions
associated with the foreign key column can be performed more efficiently."

Though in your case would it not be worth it to just have an ON DELETE
CASCADE clause on your FK?

>
> Btw, I'm using PostgreSQL 9.0. Will try how 9.1 behaves.
>
> Kind regards,
> Stevo.


--
Adrian Klaver
adrian.klaver@gmail.com

Re: Deleting, indexes and transactions

От
Stevo Slavić
Дата:
Hello Adrian,

Thanks for replying!

I guess in this case, referential action, from your quote, on deleting As is check that there are no Bs referencing to-be-deleted A row. But since all Bs are deleted (not committed yet though) prior to deleting As, I don't understand why is this check taking that long time. Doesn't this transaction, that both deleting Bs and As belong to, have enough "knowledge" if not to skip this check then to at least have it executed faster? It seems, in case without index, that postgres is executing this referential integrity check sequentially over B data as if they were not deleted, it just skips raising error because it ultimately after long time finds B records are about to be deleted. It would be faster if postgres had a structure/info on transaction level which would allow it to execute following (sequential) queries/checks only over rows which haven't been marked for deletion - I guess that would add complexity. With index I guess postgres does same logic just uses index to lookup Bs referencing to-be-deleted A much faster, and then determines Bs have been marked for deletion and doesn't raise error. I wonder how other RDBMS behave in this case.

Anyway, regarding your second question, cascade delete hasn't been applied or tried yet. Case I've initially explained is one subcase of actual case that needs to be supported which is to sync As with an external source, which unfortunatelly doesn't provide info whether Bs have been changed or not for given A. So, there are two subcases, one where almost all data is dropped (As and Bs) and replaced with new, while in other subcase just some As data gets added while some As are deleted. In either case, we need to drop all Bs and add them because of lack of information of changes in Bs. Will check and see how that performs for both scenarios.

Kind regards,
Stevo.

On Mon, May 28, 2012 at 4:40 PM, Adrian Klaver <adrian.klaver@gmail.com> wrote:
On 05/28/2012 07:23 AM, Stevo Slavić wrote:
Hello PostgreSQL community,

Two tables, A and B, both with auto generated technical PK, A and B are
in relationship via nullable non-unique FK a_fk column in B to A's PK.
There are no other relationships involving table A. Lets say A has ~20k
rows, and B ~500k rows.

When there is no index on a_fk column, if one deletes Bs with DELETE
FROM b WHERE a_fk IS NOT NULL, and then in same transaction also deletes
all As - deleting As lasts painfully long.

Adding an index on FK in B, improves A deletion times significantly.

Can someone please provide an explanation/rationale of this behavior,
why does it take so long to delete As in first case without index?
Thanks in advance!

It is documented behavior:
http://www.postgresql.org/docs/9.0/interactive/sql-createtable.html

"If the referenced column(s) are changed frequently, it might be wise to add an index to the foreign key column so that referential actions associated with the foreign key column can be performed more efficiently."

Though in your case would it not be worth it to just have an ON DELETE CASCADE clause on your FK?



Btw, I'm using PostgreSQL 9.0. Will try how 9.1 behaves.

Kind regards,
Stevo.


--
Adrian Klaver
adrian.klaver@gmail.com

Re: Deleting, indexes and transactions

От
Adrian Klaver
Дата:
On 05/28/2012 08:46 AM, Stevo Slavić wrote:
> Hello Adrian,
>
> Thanks for replying!
>
> I guess in this case, referential action, from your quote, on deleting
> As is check that there are no Bs referencing to-be-deleted A row. But
> since all Bs are deleted (not committed yet though) prior to deleting
> As, I don't understand why is this check taking that long time. Doesn't
> this transaction, that both deleting Bs and As belong to, have enough
> "knowledge" if not to skip this check then to at least have it executed
> faster? It seems, in case without index, that postgres is executing this
> referential integrity check sequentially over B data as if they were not
> deleted, it just skips raising error because it ultimately after long
> time finds B records are about to be deleted. It would be faster if
> postgres had a structure/info on transaction level which would allow it
> to execute following (sequential) queries/checks only over rows which
> haven't been marked for deletion - I guess that would add complexity.
> With index I guess postgres does same logic just uses index to lookup Bs
> referencing to-be-deleted A much faster, and then determines Bs have
> been marked for deletion and doesn't raise error. I wonder how other
> RDBMS behave in this case.

FKs, as I understand it, are basically system triggers. The exact method
by which they work and the effect of indexes on that are beyond me at
this point. Others may have more insight.

>
> Anyway, regarding your second question, cascade delete hasn't been
> applied or tried yet. Case I've initially explained is one subcase of
> actual case that needs to be supported which is to sync As with an
> external source, which unfortunatelly doesn't provide info whether Bs
> have been changed or not for given A. So, there are two subcases, one
> where almost all data is dropped (As and Bs) and replaced with new,
> while in other subcase just some As data gets added while some As are
> deleted. In either case, we need to drop all Bs and add them because of
> lack of information of changes in Bs. Will check and see how that
> performs for both scenarios.

Well the issue seems to be with what you do to A not B. The FK is on B
but the reference is to A and when you do an action on A in it needs to
verify the state of the referring rows in B. By explicitly specifying a
course of action (ON DELETE CASCADE) you streamline the process in the
first case. In the second case it not as big an issue because you are
only changing a small subset of A.

>
> Kind regards,
> Stevo.
>



--
Adrian Klaver
adrian.klaver@gmail.com

Re: Deleting, indexes and transactions

От
Tom Lane
Дата:
Adrian Klaver <adrian.klaver@gmail.com> writes:
> On 05/28/2012 08:46 AM, Stevo Slavić wrote:
>> I guess in this case, referential action, from your quote, on deleting
>> As is check that there are no Bs referencing to-be-deleted A row. But
>> since all Bs are deleted (not committed yet though) prior to deleting
>> As, I don't understand why is this check taking that long time.

> FKs, as I understand it, are basically system triggers. The exact method
> by which they work and the effect of indexes on that are beyond me at
> this point. Others may have more insight.

Right.  The ON DELETE trigger for table A is fired for each row, and it
has to search table B to verify that there are no rows referencing the
to-be-deleted one.  If there's an index on the referencing column, this
is a reasonably cheap operation; without an index, not so much.  There
is no way for that trigger to know that the current transaction already
deleted all the referencing rows; and even if it did know that, it would
have to search table B anyway, because some other transaction could have
inserted a referencing row in between the DELETE on B and the DELETE on A.

> Well the issue seems to be with what you do to A not B. The FK is on B
> but the reference is to A and when you do an action on A in it needs to
> verify the state of the referring rows in B. By explicitly specifying a
> course of action (ON DELETE CASCADE) you streamline the process in the
> first case. In the second case it not as big an issue because you are
> only changing a small subset of A.

ON DELETE CASCADE isn't really going to help here, because that's just a
different trigger that has still got to search table B.  You still need
the index if you want its performance to not suck.

We have had several discussions over the years about whether a foreign
key constraint should require indexes on both sides of the constraint,
not just the PK side.  The answer has remained "no", on the grounds that
(a) this would be contrary to SQL standard, which does not require any
such thing, and (b) if you never delete keys from the PK table then you
don't really need to pay the cost of keeping an index on the FK column.
But the rule of thumb for most cases is that you want to have that
index.

            regards, tom lane