Обсуждение: index(fct(primary key)) kills INSERTs

Поиск
Список
Период
Сортировка

index(fct(primary key)) kills INSERTs

От
Frank Miles
Дата:
see attached bug-report

Re: index(fct(primary key)) kills INSERTs

От
Tom Lane
Дата:
Frank Miles <fpm@u.washington.edu> writes:
> If an index is created based on a function of the primary key,
> you cannot insert new entries into the database.

I think the critical point here is that your "function of the primary
key" is actually doing a SELECT from the table:

                SELECT INTO prec * FROM test_table WHERE tt_id = dum_int;
                IF NOT FOUND THEN
                        RAISE EXCEPTION ''project % not found'', dum_int;
                END IF;

When I try your example, I get

play=> INSERT INTO test_table (tt_descr) VALUES ('third - will fail');
ERROR:  project 3 not found

which surprises me not at all, because at the point where this function
is invoked, the new record with tt_id 3 hasn't been entered into the
table yet.

I'm not sure what you are really trying to accomplish here --- as you
say, it's a stripped-down example and not too intelligible.  As far
as the example goes, you could skip the SELECT and just use the
passed-in parameter value.  What was the original goal that made you
feel you needed to SELECT the about-to-be-inserted row?

            regards, tom lane

Re: index(fct(primary key)) kills INSERTs

От
Frank Miles
Дата:
On Fri, 10 Nov 2000, Tom Lane wrote:

> Frank Miles <fpm@u.washington.edu> writes:
> > If an index is created based on a function of the primary key,
> > you cannot insert new entries into the database.
>
> I think the critical point here is that your "function of the primary
> key" is actually doing a SELECT from the table:
>
>                 SELECT INTO prec * FROM test_table WHERE tt_id = dum_int;
>                 IF NOT FOUND THEN
>                         RAISE EXCEPTION ''project % not found'', dum_int;
>                 END IF;
>
> When I try your example, I get
>
> play=> INSERT INTO test_table (tt_descr) VALUES ('third - will fail');
> ERROR:  project 3 not found
>
> which surprises me not at all, because at the point where this function
> is invoked, the new record with tt_id 3 hasn't been entered into the
> table yet.

This makes sense if the index is updated before the new record is completely
added.  (showing my ignorance here:) Is that necessary?  Can't the new
index be updated after entry?  How can it be that the table has the
item (it's attempting to update the index based on a table that includes
the record '3'), yet the table doesn't have it (project 3 not found).
This smells like a race condition.

> I'm not sure what you are really trying to accomplish here --- as you
> say, it's a stripped-down example and not too intelligible.  As far
> as the example goes, you could skip the SELECT and just use the
> passed-in parameter value.  What was the original goal that made you
> feel you needed to SELECT the about-to-be-inserted row?

I don't need the function to work for the insertion.  That's only
used for queries.  But having the index clearly makes insertion impossible.

In the real tables, I have several indexes which depend on a variety
of values in the table.  A part of the database (eventually) will be
used to track projects done by my group for researchers at this University
and elsewhere.  Unfortunately different people at different times have
used different enumeration schemes for tracking these projects.  The
indexes allow me to have a simple SERIAL-based enumeration scheme,
while allowing the records to be accessed using older-format enumerations.

It also keeps response time to queries reasonable for these old formats.
The conversion is roughly a two-step process.  I have a set of python
routines that convert the old data formats into a form suitable for
postgres.  In this part of the process, the indicies are critical in
avoiding sequential scans, as the python routines query the database
in order to do their part of the conversion.  This has been especially
important when converting 'large numbers' (hundreds) of records from
various legacy data sources.  Without these indicies, processing can take
on the order of a second per record.  Once a set has been prepared,
the whole array is inserted as a single transaction.  The original data
have inconsistencies that are trapped by constraints set up in the
database.  While this is good -- we will end up with a cleaner database
-- it means that we have to go back to the data source, diagnose the
problem, make corrections, re-run the python scripts (need the indicies),
then try the insert transaction again (can't have the indicies).  The
whole process is ugly.

Clearly when I'm doing the inserts, these extra indicies aren't necessary.
In fact, my (hopefully temporary) workaround is to drop the indicies
when I need to insert new records; then restore the indicies.

In the long term, it's not critical for me, since eventually we won't be
importing 'large numbers' of records -- the functional index can be
killed, and people can simply wait the second or so for a query to be
processed.

I regard this bug as a 'wishlist'-level item : it seems that you should
be able to do this without the database choking.  It's not critical
for me -- I have a workaround.  I clearly know little about databases,
you are undoubtedly more knowledgable regarding whether my use of
this index might be something someone else might want.

Thank you for your attention to this matter.

    -frank

Re: index(fct(primary key)) kills INSERTs

От
Tom Lane
Дата:
Frank Miles <fpm@u.washington.edu> writes:
>> which surprises me not at all, because at the point where this function
>> is invoked, the new record with tt_id 3 hasn't been entered into the
>> table yet.

> This makes sense if the index is updated before the new record is completely
> added.  (showing my ignorance here:) Is that necessary?  Can't the new
> index be updated after entry?  How can it be that the table has the
> item (it's attempting to update the index based on a table that includes
> the record '3'), yet the table doesn't have it (project 3 not found).
> This smells like a race condition.

It's not a race condition, it's a visibility condition, and it's
entirely intentional.  The tuple is indeed physically there in the table
(else we wouldn't know where to make the index entry point), but you
can't see it until the command that's inserting it is completed.
This is just like the normal rule that you can't see the results of an
uncommitted transaction, except it applies to individual commands within
a transaction.  Without this rule we'd have all sorts of problems with
ghost tuples, infinite loops during UPDATE, etc.

You still haven't explained why your index function should need to use
a SELECT to retrieve the tuple it's going to index.  It's supposed to
get all the info it needs from the passed parameter(s).  Doing a SELECT
is not only extremely inefficient, but opens up all sorts of interesting
questions about whether your function is depending only on the tuple
it's allegedly indexing.  (A "functional index" that depends on more
inputs than just the indexed tuple is a logical contradiction that
will cause all kinds of headaches; for example, the system won't know
it needs to update the index entries when those other inputs change.)
Why don't you just make the function receive all the columns it needs as
passed parameters?

            regards, tom lane

Re: index(fct(primary key)) kills INSERTs

От
Frank Miles
Дата:
On Fri, 10 Nov 2000, Tom Lane wrote:

> Frank Miles <fpm@u.washington.edu> writes:
> >> which surprises me not at all, because at the point where this function
> >> is invoked, the new record with tt_id 3 hasn't been entered into the
> >> table yet.
>
> > This makes sense if the index is updated before the new record is completely
> > added.  (showing my ignorance here:) Is that necessary?  Can't the new
> > index be updated after entry?  How can it be that the table has the
> > item (it's attempting to update the index based on a table that includes
> > the record '3'), yet the table doesn't have it (project 3 not found).
> > This smells like a race condition.
>
> It's not a race condition, it's a visibility condition, and it's
> entirely intentional.  The tuple is indeed physically there in the table
> (else we wouldn't know where to make the index entry point), but you
> can't see it until the command that's inserting it is completed.
> This is just like the normal rule that you can't see the results of an
> uncommitted transaction, except it applies to individual commands within
> a transaction.  Without this rule we'd have all sorts of problems with
> ghost tuples, infinite loops during UPDATE, etc.

Good. Makes sense.

> You still haven't explained why your index function should need to use
> a SELECT to retrieve the tuple it's going to index.  It's supposed to
> get all the info it needs from the passed parameter(s).  Doing a SELECT
> is not only extremely inefficient, but opens up all sorts of interesting
> questions about whether your function is depending only on the tuple
> it's allegedly indexing.  (A "functional index" that depends on more
> inputs than just the indexed tuple is a logical contradiction that
> will cause all kinds of headaches; for example, the system won't know
> it needs to update the index entries when those other inputs change.)
> Why don't you just make the function receive all the columns it needs as
> passed parameters?

Ok, maybe the light slowly is dawning.  I need the capabilities of the
function analogous to what I've described, but it is a misuse of it
to use it for the purposes of indexing.  For that, I should generate
an additional (new) function which has the arguments of the columns much
as you describe.  It will do the same processing, but won't require
the SELECT.

Thanks again!  Sorry for being dense...

    -frank