Обсуждение: Can I assume relation would not be invalid during from ExecutorRun to ExecutorEnd

Поиск
Список
Период
Сортировка

Can I assume relation would not be invalid during from ExecutorRun to ExecutorEnd

От
Andy Fan
Дата:
Hi,

During my recent work,  I need some new stuff attached to Relation.  Rather than adding
some new data structures,  I added it to Relation directly.  Like relation->balabala.  Then
I initialize it during ExecutorRun,  like  table_tuple_insert.. and destroy it at ExecutorEnd.

The above solution works based on 2 assumptions at least: 
1.  During the ExecutorRun & ExecutorEnd,  the relcache will never by invalidated, if not
the old relation->balabala will be lost.  I assume this is correct since I didn't see any places
where we handle such changes in Executor code. 
2.  We need to consider the ExuecotRun raised error,  we need to destroy the balabala resource
as well.  so I added it to the RelationClose function.  

So the overall design works like this:

xxx_table_tuple_insert(Relation rel, ...)
{
   if (rel->balabala == NULL)
        rel->balabala = allocate_bala_resource(rel);  // Allocate the memory xCtx which is under TopTransactionContext.
   do_task_with(rel->balabala); 
}

at the end of the executor,  I run

release_bala_resource(Relation rel)
{
   if (rel->balabala == NULL)
         return;
   do_the_real_task();
   MemoryContextDelete(rel->bala->memctx); 
   rel->balabala = NULL
}

For the failed cases:

RelationClose(..)
{
   if (RelationHasReferenceCountZero(relation))
         release_bala_resource(relation); 
}

Will my suluation work?

--
Best Regards
Andy Fan

Re: Can I assume relation would not be invalid during from ExecutorRun to ExecutorEnd

От
Tom Lane
Дата:
Andy Fan <zhihui.fan1213@gmail.com> writes:
> During my recent work,  I need some new stuff attached to Relation.  Rather
> than adding
> some new data structures,  I added it to Relation directly.  Like
> relation->balabala.  Then
> I initialize it during ExecutorRun,  like  table_tuple_insert.. and
> destroy it at ExecutorEnd.

This is not going to work, at least not if you expect that a relcache
reset would not preserve the data.  Also, what happens in the case of
nested executor runs touching the same relation?

Why do you think this ought to be in the relcache, and not in the
executor's rangetable-associated data structures?

            regards, tom lane



Re: Can I assume relation would not be invalid during from ExecutorRun to ExecutorEnd

От
Robert Haas
Дата:
On Mon, Nov 29, 2021 at 2:10 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:
> 1.  During the ExecutorRun & ExecutorEnd,  the relcache will never by invalidated, if not
> the old relation->balabala will be lost.  I assume this is correct since I didn't see any places
> where we handle such changes in Executor code.

It's not correct. We accept invalidation messages in a number of
different code paths, including whenever we acquire a lock on a
relation. That doesn't typically happen in the middle of a query, but
that's just because most queries don't happen to do anything that
would make it happen. They can, though. For example, the query can
call a user-defined function that accesses a table not previously
touched by the transaction. Or a built-in function that does the same
thing, like table_to_xml().

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Can I assume relation would not be invalid during from ExecutorRun to ExecutorEnd

От
Andy Fan
Дата:


On Mon, Nov 29, 2021 at 10:33 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andy Fan <zhihui.fan1213@gmail.com> writes:
> During my recent work,  I need some new stuff attached to Relation.  Rather
> than adding
> some new data structures,  I added it to Relation directly.  Like
> relation->balabala.  Then
> I initialize it during ExecutorRun,  like  table_tuple_insert.. and
> destroy it at ExecutorEnd.
 
at least not if you expect that a relcache reset would not preserve the data. 

Thanks for looking into this question. 

I am not clear about this sentence .  IMO,  if the relcache is reset,  my data 
would be lost,  That's why I expect there is no relcache reset happening in 
Executor Stage,  are you talking about something different?  
 
Also, what happens in the case of nested executor runs touching the same relation?

If you are talking about the RelationClose would be called many times, then
my solution is my resource is only destroyed when the refcount == 0;   If you are talking
about the different situation needs different resource types, then that's something
I didn't describe it clearly.  At the current time, we can assume "For 1 SQL statement, there
are only 1 resource needed per relation, even the executor or nest executor touch the
same relation many times".   I'd like to describe this clearly as well for review purposes, 
but I want to focus on one topic only first.  So let's first assume my assumption is correct. 
 
Why do you think this ought to be in the relcache, and not in the
executor's rangetable-associated data structures?


I just see the ExecutorEnd code is not called after the exceptions case.  and I hope my resource
can be released totally even if the statement raises errors.  for example:

CREATE TABLE t(A INT PRIMARY KEY);

INSERT INTO t VALUES(1);
INSERT INTO t VALUES(1);

In the above case, the ExecutorEnd is not called, but the RelationClose will be absolutely called
during the ResourceOwnerRelease call. 

--
Best Regards
Andy Fan

Re: Can I assume relation would not be invalid during from ExecutorRun to ExecutorEnd

От
Andy Fan
Дата:


On Mon, Nov 29, 2021 at 10:56 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Nov 29, 2021 at 2:10 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:
> 1.  During the ExecutorRun & ExecutorEnd,  the relcache will never by invalidated, if not
> the old relation->balabala will be lost.  I assume this is correct since I didn't see any places
> where we handle such changes in Executor code.

It's not correct. We accept invalidation messages in a number of
different code paths, including whenever we acquire a lock on a
relation. That doesn't typically happen in the middle of a query, but
that's just because most queries don't happen to do anything that
would make it happen. They can, though.

Thanks for looking into this question. 

Actually I think this would not happen.  My reasons are:

1. I think the case which would cause the relcache reset needs a lock, after
the executor start,  the executor lock is acquired so on one can really have chances
to send an invalidation message until the lock is released. (let me first ignore the 2 
examples you talked about, and I will talk about it later). 

2. _If_ the relation can be reset after we open it during Executor code, then would the
relation (RelationData *) pointed memory still validated after the relcache reset?  For example

CREATE TABLE t(a int);
INSERT INTO t VALUES(1), (2);

UPDATE t set a = 100; 

We need to update 2 tuples in the update statement, if the relcache is reset,  can we still use the previous 
(RelationData *) to do the following update?  If not, what code is used to change the relation for the old relcache
address to the new relcache.   I assumed (RelationData *) pointer. to the relcache directly, hope this is correct.. 

For example, the query can
call a user-defined function that accesses a table not previously
touched by the transaction. Or a built-in function that does the same
thing, like table_to_xml().


OK, this is something I missed before.  but looks it would not caused different _in my case_;  

IIUC, you are describing the thing like this:

CREATE FUNCTION udf() 
...
SELECT * FROM t2;
$$;

SELECT udf() FROM t1;

then the relation t2 will not be opened at the beginning of ExecutorRun and it will be only opened
when we fetch the first tuple from t1;  so we can have cache invalidation between ExecutorRun and
the first call of udf. 

But in my case, my exception should be that the relcache should not be invalidated _after the first relation open_
in the executor (not the beginning of executorRun), this is something I didn't describe well when I post
my message since I didn't find out this situation at that time. 

 
--
Best Regards
Andy Fan

Re: Can I assume relation would not be invalid during from ExecutorRun to ExecutorEnd

От
Andy Fan
Дата:
 
Why do you think this ought to be in the relcache, and not in the
executor's rangetable-associated data structures?

I re-think about this,  I guess I didn't mention something  clear enough.
That's true that I bound my bala struct to Relation struct, and the memory
relation used  is allocated in relcache.  but the memory of bala is allocated in
TopTransactionMemory context.  

xxx_table_tuple_insert(Relation rel, ...)
{
   if (rel->balabala == NULL)
        rel->balabala = allocate_bala_resource(rel);  //  TopTransactionContext.
   do_task_with(rel->balabala); 
}

not sure if this should be called as putting my data in relcache. 

and I rechecked the RelationData struct, and it looks like some Executor-bind struct also
resides in it. for example: RelationData.rd_lookInfo.  If the relcache can be reset, the 
fields like this are unsafe to access as well.  Am I missing something? 

--
Best Regards
Andy Fan

Re: Can I assume relation would not be invalid during from ExecutorRun to ExecutorEnd

От
Dilip Kumar
Дата:
On Tue, Nov 30, 2021 at 6:44 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:
>>
>>
>>>
>>> Why do you think this ought to be in the relcache, and not in the
>>> executor's rangetable-associated data structures?
>
>
> I re-think about this,  I guess I didn't mention something  clear enough.
> That's true that I bound my bala struct to Relation struct, and the memory
> relation used  is allocated in relcache.  but the memory of bala is allocated in
> TopTransactionMemory context.
>
> xxx_table_tuple_insert(Relation rel, ...)
> {
>    if (rel->balabala == NULL)
>         rel->balabala = allocate_bala_resource(rel);  //  TopTransactionContext.
>    do_task_with(rel->balabala);
> }
>
> not sure if this should be called as putting my data in relcache.
>
> and I rechecked the RelationData struct, and it looks like some Executor-bind struct also
> resides in it. for example: RelationData.rd_lookInfo.  If the relcache can be reset, the
> fields like this are unsafe to access as well.  Am I missing something?

I think you are talking about rd_lockInfo? first, think this is not a
pointer so even if you get a new recache entry this memory will be
allocated along with the new relation descriptor and it will be
initialized whenever a new relation descriptor is created check
RelationInitLockInfo().  You will see there are many pointers also in
RelationData but we ensure before we access them they are initialized,
and most of them are allocated while building the relation descriptor
see RelationBuildDesc().

So if you keep some random pointer in RelationData without ensuring
that is getting reinitialized while building the relation cache then I
think that's not the correct way to do it.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Can I assume relation would not be invalid during from ExecutorRun to ExecutorEnd

От
Andy Fan
Дата:

 You will see there are many pointers also in
RelationData but we ensure before we access them they are initialized,

The initialized values are not much helpful in the cases I provided here.

What do you think about this question?

2. _If_ the relation can be reset after we open it during Executor code, then would the
relation (RelationData *) pointed memory still validated after the relcache reset?  For example

CREATE TABLE t(a int);
INSERT INTO t VALUES(1), (2);

UPDATE t set a = 100; 

We need to update 2 tuples in the update statement, if the relcache is reset after the first tuple is
updated,  can we still use the previous (RelationData *) to do the 2nd update? This is just 
a common example.  If you would say, in this case the relcache can't be reset,  then the question
come back to what situation the relcache can be reset between the first time I open it during execution
code  and the end of execution code.  I think we have some talks about this at [1]. 

Right now, I am not pretty sure I am doing something well.  That's why I post the question here.  but still
I didn't find a better solution right now. [2]

 
--
Best Regards
Andy Fan

Re: Can I assume relation would not be invalid during from ExecutorRun to ExecutorEnd

От
Dilip Kumar
Дата:
On Tue, Nov 30, 2021 at 12:12 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:
>>
>>
>>  You will see there are many pointers also in
>> RelationData but we ensure before we access them they are initialized,
>
>
> The initialized values are not much helpful in the cases I provided here.
>
> What do you think about this question?
>
> 2. _If_ the relation can be reset after we open it during Executor code, then would the
> relation (RelationData *) pointed memory still validated after the relcache reset?  For example
>
> CREATE TABLE t(a int);
> INSERT INTO t VALUES(1), (2);
>
> UPDATE t set a = 100;
>
> We need to update 2 tuples in the update statement, if the relcache is reset after the first tuple is
> updated,  can we still use the previous (RelationData *) to do the 2nd update? This is just
> a common example.  If you would say, in this case the relcache can't be reset,  then the question
> come back to what situation the relcache can be reset between the first time I open it during execution
> code  and the end of execution code.  I think we have some talks about this at [1].

IMHO, if you are doing an update then you must be already holding the
relation lock, so even if you call some UDF (e.g. UPDATE t set a = 100
WHERE x= UDF()) and it accepts the invalidation it wont affect your
RelationData because you are already holding the lock so parallelly
there could not be any DDL on this particular relation so this recache
entry should not be invalidated at least in the example you have
given.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Can I assume relation would not be invalid during from ExecutorRun to ExecutorEnd

От
Andy Fan
Дата:

Thanks for everyone's insight so far!

my exception should be that the relcache should not be invalidated _after the first relation open_
in the executor (not the beginning of executorRun)。 


s/exception/expectation. 

To be more accurate,  my expectation is for a single sql statement,  after the first time I write data into
the relation, until the statement is completed or "aborted and RelationClose is called in ResourceOwnerRelease",
the relcache reset should never happen. 

Since there are many places the write table access methods can be called like COPY,  CAST,  REFRESH MATVIEW,
VACUUM and ModifyNode,  and it is possible that we can get error in each place,  so I think RelationClose
should be a great places for exceptional case(at the same time, we should remember to destroy properly for non
exceptional case). 

This might be not very professional since I bind some executor related data into a relcache related struct.
But it should be workable in my modified user case.  The most professional method I can think out is adding
another resource type in ResourceOwner and let ResourceOwnerRelease to handle the exceptional cases.

--
Best Regards
Andy Fan

Re: Can I assume relation would not be invalid during from ExecutorRun to ExecutorEnd

От
Robert Haas
Дата:
On Tue, Nov 30, 2021 at 4:47 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:
>> my exception should be that the relcache should not be invalidated _after the first relation open_
>> in the executor (not the beginning of executorRun)。
>
> s/exception/expectation.
>
> To be more accurate,  my expectation is for a single sql statement,  after the first time I write data into
> the relation, until the statement is completed or "aborted and RelationClose is called in ResourceOwnerRelease",
> the relcache reset should never happen.

Well .... I'm not sure why you asked the question and then argued with
the answer you got. I mean, you can certainly decide how you think it
works, but that's not how I think it works.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Can I assume relation would not be invalid during from ExecutorRun to ExecutorEnd

От
Andy Fan
Дата:
On Wed, Dec 1, 2021 at 3:33 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Nov 30, 2021 at 4:47 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:
>> my exception should be that the relcache should not be invalidated _after the first relation open_
>> in the executor (not the beginning of executorRun)。
>
> s/exception/expectation.
>
> To be more accurate,  my expectation is for a single sql statement,  after the first time I write data into
> the relation, until the statement is completed or "aborted and RelationClose is called in ResourceOwnerRelease",
> the relcache reset should never happen.

Well .... I'm not sure why you asked the question and then argued with
the answer you got.

I think you misunderstand me,  I argued with the answer because after I got the
answer and I rethink my problem, I found my question description is not accurate
enough,  so I improved the question and willing discussion again. My exception was
things will continue with something like this:
1. In your new described situation,  your solution still does not work because ...
2. In your new described situation,  the solution would work for sure
3.  your situation is still not cleared enough. 


--
Best Regards
Andy Fan

Re: Can I assume relation would not be invalid during from ExecutorRun to ExecutorEnd

От
Robert Haas
Дата:
On Tue, Nov 30, 2021 at 7:50 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:
> I think you misunderstand me,  I argued with the answer because after I got the
> answer and I rethink my problem, I found my question description is not accurate
> enough,  so I improved the question and willing discussion again. My exception was
> things will continue with something like this:
> 1. In your new described situation,  your solution still does not work because ...
> 2. In your new described situation,  the solution would work for sure
> 3.  your situation is still not cleared enough.

I mean, it's clear to me that you can make a new table get opened at
any time in the execution of a query. Just use CASE or an IF statement
inside of a stored procedure to do nothing for the first 9999 calls
and then on call 10000 access a new relation. And as soon as that
happens, you can AcceptInvalidationMessages(), which can cause
relcache entries to be destroyed or rebuilt. If the relation is open,
the relcache entry can't be destroyed altogether, but it can be
rebuilt: see RelationClearRelation(). Whether that's a problem for
what you are doing I don't know. But the overall point is that access
to a new relation can happen at any point in the query -- and as soon
as it does, we will accept ALL pending invalidation messages for ALL
relations regardless of what locks anyone holds on anything.

So it's generally a mistake to assume that relcache entries are
"stable" across large stretches of code. They are in fact stable in a
certain sense - if we have the relation open, we hold a reference
count on it, and so the Relation pointer itself will remain valid. But
the data it points to can change in various ways, and different
members of the RelationData struct are handled differently. Separately
from the reference count, the heavyweight lock that we also hold on
the relation as a condition of opening it prevents certain kinds of
changes, so that even if the relation cache entry is rebuilt, certain
particular fields will be unaffected. Which fields are protected in
this way will depend on what kind of lock is held. It's hard to speak
in general terms. The best advice I can give you is (1) look exactly
what RelationClearRelation() is going to do to the fields you care
about if a rebuild happens, (2) err on the side of assuming that
things can change under you, and (3) try running your code under
debug_discard_caches = 1. It will be slow that way, but it's pretty
effective in finding places where you've made unsafe assumptions.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Can I assume relation would not be invalid during from ExecutorRun to ExecutorEnd

От
Andy Fan
Дата:

If the relation is open,
the relcache entry can't be destroyed altogether, but it can be
rebuilt: see RelationClearRelation().

Thanks! This is a new amazing knowledge for me!
 
They are in fact stable in a certain sense - if we have the relation open
we hold a reference count on it, and so the Relation pointer itself will 
remain valid.

This sounds amazing as well. 
 
But
the data it points to can change in various ways, and different
members of the RelationData struct are handled differently. Separately
from the reference count, the heavyweight lock that we also hold on
the relation as a condition of opening it prevents certain kinds of
changes, so that even if the relation cache entry is rebuilt, certain
particular fields will be unaffected. Which fields are protected in
this way will depend on what kind of lock is held. It's hard to speak
in general terms.

Amazing++; 
 
The best advice I can give you is (1) look exactly
what RelationClearRelation() is going to do to the fields you care
about if a rebuild happens, (2) err on the side of assuming that
things can change under you, and (3) try running your code under
debug_discard_caches = 1. It will be slow that way, but it's pretty
effective in finding places where you've made unsafe assumptions.


Thanks! I clearly understand what's wrong in my previous knowledge. 
That is, after a relation is open with some lock, then the content of the relation
will never change until the RelationClose.  It would take time to fill the
gap, but I'd like to say "thank you!" first. 

--
Best Regards
Andy Fan

Re: Can I assume relation would not be invalid during from ExecutorRun to ExecutorEnd

От
Robert Haas
Дата:
On Wed, Dec 1, 2021 at 10:58 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:
> Thanks! I clearly understand what's wrong in my previous knowledge.

Cool, glad it helped.

-- 
Robert Haas
EDB: http://www.enterprisedb.com