Обсуждение: Why to index a "Recently DEAD" tuple when creating index

Поиск
Список
Период
Сортировка

Why to index a "Recently DEAD" tuple when creating index

От
Alex
Дата:
 HEAPTUPLE_RECENTLY_DEAD, /* tuple is dead, but not deletable yet */

 It is a tuple which has been deleted AND committed but before the delete there is a transaction started but not committed. Let call this transaction as Transaction A.

if we create index on this time, Let's call this index as Index A, it still index this record.  my question is why need this.   

The only reason I can think out (maybe also not reasonable enough) is:
If we index like this and the  isolate level of transaction A is serializable, it is possible that the query in transaction A can use Index A since it contains the snapshot data when the transaction A was began.   this reason may be not reasonable enough is because the transaction A may be should not see the index A at all. 


Re: Why to index a "Recently DEAD" tuple when creating index

От
Kuntal Ghosh
Дата:
On Mon, Jun 10, 2019 at 12:15 PM Alex <zhihui.fan1213@gmail.com> wrote:
 HEAPTUPLE_RECENTLY_DEAD, /* tuple is dead, but not deletable yet */

 It is a tuple which has been deleted AND committed but before the delete there is a transaction started but not committed. Let call this transaction as Transaction A.

if we create index on this time, Let's call this index as Index A, it still index this record.  my question is why need this.   

In this case, the changes of the tuple is not visible yet. Now suppose, your transaction A is serializable and you've another serializable transaction B which can see the index A. It generates a plan that requires to fetch the deleted tuple through an index scan. If the tuple is not present in the index, how are you going to create a conflict edge between transaction A and transaction B?

Basically, you need to identify the following clause to detect serializable conflicts:
Transaction A precedes transaction B. (Because, transaction A has deleted a tuple and it's not visible to transaction B)

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com

Re: Why to index a "Recently DEAD" tuple when creating index

От
Alex
Дата:


On Mon, Jun 10, 2019 at 3:28 PM Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:
On Mon, Jun 10, 2019 at 12:15 PM Alex <zhihui.fan1213@gmail.com> wrote:
 HEAPTUPLE_RECENTLY_DEAD, /* tuple is dead, but not deletable yet */

 It is a tuple which has been deleted AND committed but before the delete there is a transaction started but not committed. Let call this transaction as Transaction A.

if we create index on this time, Let's call this index as Index A, it still index this record.  my question is why need this.   

In this case, the changes of the tuple is not visible yet. Now suppose, your transaction A is serializable and you've another serializable transaction B which can see the index A. It generates a plan that requires to fetch the deleted tuple through an index scan. If the tuple is not present in the index, how are you going to create a conflict edge between transaction A and transaction B?

Basically, you need to identify the following clause to detect serializable conflicts:
Transaction A precedes transaction B. (Because, transaction A has deleted a tuple and it's not visible to transaction B)


thanks Ghosh.  Looks your answer is similar with my previous point (transaction is  serializable).   actually if the transaction B can't see the “deleted" which has been committed,  should it see the index A which is created after the "delete" transaction?


--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com

Re: Why to index a "Recently DEAD" tuple when creating index

От
Kuntal Ghosh
Дата:
On Mon, Jun 10, 2019 at 1:30 PM Alex <zhihui.fan1213@gmail.com> wrote:
>
>
>
> On Mon, Jun 10, 2019 at 3:28 PM Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:
>>
>> On Mon, Jun 10, 2019 at 12:15 PM Alex <zhihui.fan1213@gmail.com> wrote:
>>>
>>>  HEAPTUPLE_RECENTLY_DEAD, /* tuple is dead, but not deletable yet */
>>>
>>>  It is a tuple which has been deleted AND committed but before the delete there is a transaction started but not
committed.Let call this transaction as Transaction A. 
>>>
>>> if we create index on this time, Let's call this index as Index A, it still index this record.  my question is why
needthis. 
>>>
>> In this case, the changes of the tuple is not visible yet. Now suppose, your transaction A is serializable and
you'veanother serializable transaction B which can see the index A. It generates a plan that requires to fetch the
deletedtuple through an index scan. If the tuple is not present in the index, how are you going to create a conflict
edgebetween transaction A and transaction B? 
>>
>> Basically, you need to identify the following clause to detect serializable conflicts:
>> Transaction A precedes transaction B. (Because, transaction A has deleted a tuple and it's not visible to
transactionB) 
>>
>
> thanks Ghosh.  Looks your answer is similar with my previous point (transaction is  serializable).   actually if the
transactionB can't see the “deleted" which has been committed,  should it see the index A which is created after the
"delete"transaction? 
>
I think what I'm trying to say is different.

For my case, the sequence is as following:
1. Transaction A has deleted a tuple, say t1 and got committed.
2. Index A has been created successfully.
3. Now, transaction B starts and use the index A to fetch the tuple
t1. While doing visibility check, transaction B gets to know that t1
has been deleted by a committed transaction A, so it can't see the
tuple. But, it creates a dependency edge that transaction A precedes
transaction B. This edge is required to detect a serializable conflict
failure.

If you don't create the index entry, it'll not be able to create that edge.


--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com



Re: Why to index a "Recently DEAD" tuple when creating index

От
Alex
Дата:


On Mon, Jun 10, 2019 at 4:10 PM Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:
On Mon, Jun 10, 2019 at 1:30 PM Alex <zhihui.fan1213@gmail.com> wrote:
>
>
>
> On Mon, Jun 10, 2019 at 3:28 PM Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:
>>
>> On Mon, Jun 10, 2019 at 12:15 PM Alex <zhihui.fan1213@gmail.com> wrote:
>>>
>>>  HEAPTUPLE_RECENTLY_DEAD, /* tuple is dead, but not deletable yet */
>>>
>>>  It is a tuple which has been deleted AND committed but before the delete there is a transaction started but not committed. Let call this transaction as Transaction A.
>>>
>>> if we create index on this time, Let's call this index as Index A, it still index this record.  my question is why need this.
>>>
>> In this case, the changes of the tuple is not visible yet. Now suppose, your transaction A is serializable and you've another serializable transaction B which can see the index A. It generates a plan that requires to fetch the deleted tuple through an index scan. If the tuple is not present in the index, how are you going to create a conflict edge between transaction A and transaction B?
>>
>> Basically, you need to identify the following clause to detect serializable conflicts:
>> Transaction A precedes transaction B. (Because, transaction A has deleted a tuple and it's not visible to transaction B)
>>
>
> thanks Ghosh.  Looks your answer is similar with my previous point (transaction is  serializable).   actually if the transaction B can't see the “deleted" which has been committed,  should it see the index A which is created after the "delete" transaction?
>
I think what I'm trying to say is different.

For my case, the sequence is as following:
1. Transaction A has deleted a tuple, say t1 and got committed.
2. Index A has been created successfully.
3. Now, transaction B starts and use the index A to fetch the tuple
t1. While doing visibility check, transaction B gets to know that t1
has been deleted by a committed transaction A, so it can't see the
tuple. But, it creates a dependency edge that transaction A precedes
transaction B. This edge is required to detect a serializable conflict
failure.

If you don't create the index entry, it'll not be able to create that edge.

Thanks,  I got the difference now, but still not get the necessity of it.   
1.   Assume we don't index it,  in which situation we can get a wrong result?
2.   If we only support "Read Committed" isolation level,  is there a safe way to not index such data?

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com

Re: Why to index a "Recently DEAD" tuple when creating index

От
Kuntal Ghosh
Дата:
On Mon, Jun 10, 2019 at 2:12 PM Alex <zhihui.fan1213@gmail.com> wrote:
> On Mon, Jun 10, 2019 at 4:10 PM Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:
>> I think what I'm trying to say is different.
>>
>> For my case, the sequence is as following:
>> 1. Transaction A has deleted a tuple, say t1 and got committed.
>> 2. Index A has been created successfully.
>> 3. Now, transaction B starts and use the index A to fetch the tuple
>> t1. While doing visibility check, transaction B gets to know that t1
>> has been deleted by a committed transaction A, so it can't see the
>> tuple. But, it creates a dependency edge that transaction A precedes
>> transaction B. This edge is required to detect a serializable conflict
>> failure.
>>
>> If you don't create the index entry, it'll not be able to create that edge.
>
>
> Thanks,  I got the difference now, but still not get the necessity of it.
> 1.   Assume we don't index it,  in which situation we can get a wrong result?

Consider the following sequence of three different transactions X,A and B:

1. Transaction X reads a tuple t2.
2. Transaction A updates the tuple t2, deletes a tuple t1 and gets
committed. So, there transaction X precedes transaction A, i.e., X <-
A.
3. Index A is created successfully.
4. Transaction B starts and use the index A to fetch tuple t1. But,
it's already deleted by the committed transaction A. So, transaction A
precedes transaction B, i.e., A<-B.
5. At this point you've a dangerous structure X<-A<-B (definition of
dangerous structure src/backend/storage/lmgr/README-SSI) in the graph
which can produce an anomaly. For example now, if X tries to update
another tuple previously read by B, you'll have a dependency B<-X.
But, you already have X<-B which leads to serializable conflict.
Postgres tries to resolve this anomaly by rolling back one of the
transaction.

In your case, it'll be difficult to detect.

> 2.   If we only support "Read Committed" isolation level,  is there a safe way to not index such data?
>
I can't think of a case where the RECENTLY_DELETED tuple needs to be
indexed in "Read Committed" case. So, your suggestion likely to work
logically in "Read committed" isolation level. But, I'm not sure
whether you'll encounter any assertion failures in vacuum path or
concurrent index paths.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com



Re: Why to index a "Recently DEAD" tuple when creating index

От
Alex
Дата:
Thanks! Appreciate it for your time!

On Mon, Jun 10, 2019 at 5:34 PM Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:
On Mon, Jun 10, 2019 at 2:12 PM Alex <zhihui.fan1213@gmail.com> wrote:
> On Mon, Jun 10, 2019 at 4:10 PM Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:
>> I think what I'm trying to say is different.
>>
>> For my case, the sequence is as following:
>> 1. Transaction A has deleted a tuple, say t1 and got committed.
>> 2. Index A has been created successfully.
>> 3. Now, transaction B starts and use the index A to fetch the tuple
>> t1. While doing visibility check, transaction B gets to know that t1
>> has been deleted by a committed transaction A, so it can't see the
>> tuple. But, it creates a dependency edge that transaction A precedes
>> transaction B. This edge is required to detect a serializable conflict
>> failure.
>>
>> If you don't create the index entry, it'll not be able to create that edge.
>
>
> Thanks,  I got the difference now, but still not get the necessity of it.
> 1.   Assume we don't index it,  in which situation we can get a wrong result?

Consider the following sequence of three different transactions X,A and B:

1. Transaction X reads a tuple t2.
2. Transaction A updates the tuple t2, deletes a tuple t1 and gets
committed. So, there transaction X precedes transaction A, i.e., X <-
A.
3. Index A is created successfully.
4. Transaction B starts and use the index A to fetch tuple t1. But,
it's already deleted by the committed transaction A. So, transaction A
precedes transaction B, i.e., A<-B.
5. At this point you've a dangerous structure X<-A<-B (definition of
dangerous structure src/backend/storage/lmgr/README-SSI) in the graph
which can produce an anomaly. For example now, if X tries to update
another tuple previously read by B, you'll have a dependency B<-X.
But, you already have X<-B which leads to serializable conflict.
Postgres tries to resolve this anomaly by rolling back one of the
transaction.

In your case, it'll be difficult to detect.

> 2.   If we only support "Read Committed" isolation level,  is there a safe way to not index such data?
>
I can't think of a case where the RECENTLY_DELETED tuple needs to be
indexed in "Read Committed" case. So, your suggestion likely to work
logically in "Read committed" isolation level. But, I'm not sure
whether you'll encounter any assertion failures in vacuum path or
concurrent index paths.


--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com

Re: Why to index a "Recently DEAD" tuple when creating index

От
Tom Lane
Дата:
Kuntal Ghosh <kuntalghosh.2007@gmail.com> writes:
>> 2.   If we only support "Read Committed" isolation level,  is there a safe way to not index such data?

> I can't think of a case where the RECENTLY_DELETED tuple needs to be
> indexed in "Read Committed" case.

I think you're making dangerously optimistic assumptions about how
long a query snapshot might survive in READ COMMITTED mode.

In particular, I suspect you're reasoning that the new index couldn't
be used except by a freshly-created query plan, which is possibly
true, and that such a plan must be used with a freshly-created
snapshot, which is simply wrong.  A counterexample could be built
using a SQL or PL function that's marked STABLE, because such a
function is defined to be executed using the calling query's
snapshot.  But it'll make query plans using current reality.

            regards, tom lane



Re: Why to index a "Recently DEAD" tuple when creating index

От
Kuntal Ghosh
Дата:
On Mon, Jun 10, 2019 at 5:26 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Kuntal Ghosh <kuntalghosh.2007@gmail.com> writes:
> >> 2.   If we only support "Read Committed" isolation level,  is there a safe way to not index such data?
>
> > I can't think of a case where the RECENTLY_DELETED tuple needs to be
> > indexed in "Read Committed" case.
>
> I think you're making dangerously optimistic assumptions about how
> long a query snapshot might survive in READ COMMITTED mode.
>
> In particular, I suspect you're reasoning that the new index couldn't
> be used except by a freshly-created query plan, which is possibly
> true, and that such a plan must be used with a freshly-created
> snapshot, which is simply wrong.  A counterexample could be built
> using a SQL or PL function that's marked STABLE, because such a
> function is defined to be executed using the calling query's
> snapshot.  But it'll make query plans using current reality.
>
Wow. I've not thought of this scenario. Also, I'm not aware about this
different snapshot usage as well. I'll debug the same. Thank you Tom.

So, the READ COMMITTED mode will also cause this kind of issues.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com