Обсуждение: [HACKERS] GSoC 2017: weekly progress reports (week 6)

Поиск
Список
Период
Сортировка

[HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Shubham Barai
Дата:
Project: Explicitly support predicate locks in index AMs besides b-tree


I have done following tasks during this week.

1) worked on how to detect rw conflicts when fast update is enabled

2) created tests for different gin operators

3) went through some patches on commitfest to review

4) solved some issues that came up while testing


Sent with Mailtrack

Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Shubham Barai
Дата:
Hi,

I am attaching a patch for predicate locking in gin index.


Regards,
Shubham



Sent with Mailtrack

On 11 July 2017 at 19:10, Shubham Barai <shubhambaraiss@gmail.com> wrote:
Project: Explicitly support predicate locks in index AMs besides b-tree


I have done following tasks during this week.

1) worked on how to detect rw conflicts when fast update is enabled

2) created tests for different gin operators

3) went through some patches on commitfest to review

4) solved some issues that came up while testing


Sent with Mailtrack

Вложения

Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Shubham Barai
Дата:
Hi,

Please find the updated patch for predicate locking in gin index here.

There was a small issue in the previous patch. I didn't consider the case
where only root page exists in the tree, and there is a predicate lock on it,
and it gets split. 

If we treat the original page as a left page and create a new root and right
page, then we just need to copy a predicate lock from the left to the right 
page (this is the case in B-tree).

But if we treat the original page as a root and create a new left and right
page, then we need to copy a predicate lock on both new pages (in the case of rum and gin).


Regards,
Shubham




Sent with Mailtrack

On 17 July 2017 at 19:08, Shubham Barai <shubhambaraiss@gmail.com> wrote:
Hi,

I am attaching a patch for predicate locking in gin index.


Regards,
Shubham



Sent with Mailtrack

On 11 July 2017 at 19:10, Shubham Barai <shubhambaraiss@gmail.com> wrote:
Project: Explicitly support predicate locks in index AMs besides b-tree


I have done following tasks during this week.

1) worked on how to detect rw conflicts when fast update is enabled

2) created tests for different gin operators

3) went through some patches on commitfest to review

4) solved some issues that came up while testing


Sent with Mailtrack


Вложения

Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Alexander Korotkov
Дата:
Hi!

On Wed, Aug 9, 2017 at 6:30 PM, Shubham Barai <shubhambaraiss@gmail.com> wrote:
Please find the updated patch for predicate locking in gin index here.

There was a small issue in the previous patch. I didn't consider the case
where only root page exists in the tree, and there is a predicate lock on it,
and it gets split. 

If we treat the original page as a left page and create a new root and right
page, then we just need to copy a predicate lock from the left to the right 
page (this is the case in B-tree).

But if we treat the original page as a root and create a new left and right
page, then we need to copy a predicate lock on both new pages (in the case of rum and gin).


I've assigned to review this patch.  First of all I'd like to understand general idea of this patch.

As I get, you're placing predicate locks to both entry tree leaf pages and posting tree leaf pages.  But, GIN implements so called "fast scan" technique which allows it to skip some leaf pages of posting tree when these pages are guaranteed to not contain matching item pointers.  Wherein the particular order of posting list scan and skip depends of their estimated size (so it's a kind of coincidence).

But thinking about this more generally, I found that proposed locking scheme is redundant.  Currently when entry has posting tree, you're locking both:
1) entry tree leaf page containing pointer to posting tree,
2) leaf pages of corresponding posting tree.
Therefore conflicting transactions accessing same entry would anyway conflict while accessing the same entry tree leaf page.  So, there is no necessity to lock posting tree leaf pages at all.  Alternatively, if entry has posting tree, you can skip locking entry tree leaf page and lock posting tree leaf pages instead.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Alexander Korotkov
Дата:
On Thu, Sep 28, 2017 at 12:45 AM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
On Wed, Aug 9, 2017 at 6:30 PM, Shubham Barai <shubhambaraiss@gmail.com> wrote:
Please find the updated patch for predicate locking in gin index here.

There was a small issue in the previous patch. I didn't consider the case
where only root page exists in the tree, and there is a predicate lock on it,
and it gets split. 

If we treat the original page as a left page and create a new root and right
page, then we just need to copy a predicate lock from the left to the right 
page (this is the case in B-tree).

But if we treat the original page as a root and create a new left and right
page, then we need to copy a predicate lock on both new pages (in the case of rum and gin).


I've assigned to review this patch.  First of all I'd like to understand general idea of this patch.

As I get, you're placing predicate locks to both entry tree leaf pages and posting tree leaf pages.  But, GIN implements so called "fast scan" technique which allows it to skip some leaf pages of posting tree when these pages are guaranteed to not contain matching item pointers.  Wherein the particular order of posting list scan and skip depends of their estimated size (so it's a kind of coincidence).

But thinking about this more generally, I found that proposed locking scheme is redundant.  Currently when entry has posting tree, you're locking both:
1) entry tree leaf page containing pointer to posting tree,
2) leaf pages of corresponding posting tree.
Therefore conflicting transactions accessing same entry would anyway conflict while accessing the same entry tree leaf page.  So, there is no necessity to lock posting tree leaf pages at all.  Alternatively, if entry has posting tree, you can skip locking entry tree leaf page and lock posting tree leaf pages instead.

I'd like to note that I had following warnings during compilation using clang.

gininsert.c:519:47: warning: incompatible pointer to integer conversion passing 'void *' to parameter of type 'Buffer' (aka 'int') [-Wint-conversion]
                CheckForSerializableConflictIn(index, NULL, NULL);
                                                            ^~~~
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/8.0.0/include/stddef.h:105:16: note: expanded from macro 'NULL'
#  define NULL ((void*)0)
               ^~~~~~~~~~
../../../../src/include/storage/predicate.h:64:87: note: passing argument to parameter 'buffer' here
extern void CheckForSerializableConflictIn(Relation relation, HeapTuple tuple, Buffer buffer);
                                                                                      ^
1 warning generated.
ginvacuum.c:163:2: warning: implicit declaration of function 'PredicateLockPageCombine' is invalid in C99 [-Wimplicit-function-declaration]
        PredicateLockPageCombine(gvs->index, deleteBlkno, rightlink);
        ^
1 warning generated. 

Also, I tried to remove predicate locks from posting tree leafs.  At least isolation tests passed correctly after this change.

However, after telegram discussion with Andrew Borodin, we decided that it would be better to do predicate locking and conflict checking for posting tree leafs, but skip that for entry tree leafs (in the case when entry has posting tree).  That would give us more granular locking and less false positives.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
 

Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Shubham Barai
Дата:




Sent with Mailtrack

On 28 September 2017 at 15:49, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
On Thu, Sep 28, 2017 at 12:45 AM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
On Wed, Aug 9, 2017 at 6:30 PM, Shubham Barai <shubhambaraiss@gmail.com> wrote:
Please find the updated patch for predicate locking in gin index here.

There was a small issue in the previous patch. I didn't consider the case
where only root page exists in the tree, and there is a predicate lock on it,
and it gets split. 

If we treat the original page as a left page and create a new root and right
page, then we just need to copy a predicate lock from the left to the right 
page (this is the case in B-tree).

But if we treat the original page as a root and create a new left and right
page, then we need to copy a predicate lock on both new pages (in the case of rum and gin).


I've assigned to review this patch.  First of all I'd like to understand general idea of this patch.

As I get, you're placing predicate locks to both entry tree leaf pages and posting tree leaf pages.  But, GIN implements so called "fast scan" technique which allows it to skip some leaf pages of posting tree when these pages are guaranteed to not contain matching item pointers.  Wherein the particular order of posting list scan and skip depends of their estimated size (so it's a kind of coincidence).

But thinking about this more generally, I found that proposed locking scheme is redundant.  Currently when entry has posting tree, you're locking both:
1) entry tree leaf page containing pointer to posting tree,
2) leaf pages of corresponding posting tree.
Therefore conflicting transactions accessing same entry would anyway conflict while accessing the same entry tree leaf page.  So, there is no necessity to lock posting tree leaf pages at all.  Alternatively, if entry has posting tree, you can skip locking entry tree leaf page and lock posting tree leaf pages instead.

I'd like to note that I had following warnings during compilation using clang.

gininsert.c:519:47: warning: incompatible pointer to integer conversion passing 'void *' to parameter of type 'Buffer' (aka 'int') [-Wint-conversion]
                CheckForSerializableConflictIn(index, NULL, NULL);
                                                            ^~~~
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/8.0.0/include/stddef.h:105:16: note: expanded from macro 'NULL'
#  define NULL ((void*)0)
               ^~~~~~~~~~
../../../../src/include/storage/predicate.h:64:87: note: passing argument to parameter 'buffer' here
extern void CheckForSerializableConflictIn(Relation relation, HeapTuple tuple, Buffer buffer);
                                                                                      ^
1 warning generated.
ginvacuum.c:163:2: warning: implicit declaration of function 'PredicateLockPageCombine' is invalid in C99 [-Wimplicit-function-declaration]
        PredicateLockPageCombine(gvs->index, deleteBlkno, rightlink);
        ^
1 warning generated. 

Also, I tried to remove predicate locks from posting tree leafs.  At least isolation tests passed correctly after this change.

However, after telegram discussion with Andrew Borodin, we decided that it would be better to do predicate locking and conflict checking for posting tree leafs, but skip that for entry tree leafs (in the case when entry has posting tree).  That would give us more granular locking and less false positives.


 
Hi Alexander,

I have made changes according to your suggestions. Please have a look at the updated patch. 
I am also considering your suggestions for my other patches also. But, I will need some time to 
make changes as I am currently busy doing my master's.


Kind Regards,
Shubham 

Вложения

Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Alexander Korotkov
Дата:
On Sat, Sep 30, 2017 at 6:12 PM, Shubham Barai <shubhambaraiss@gmail.com> wrote:
I have made changes according to your suggestions. Please have a look at the updated patch. 
I am also considering your suggestions for my other patches also. But, I will need some time to 
make changes as I am currently busy doing my master's.

I don't understand why sometimes you call PredicateLockPage() only when fast update is off.  For example:

@@ -94,6 +101,9 @@ scanPostingTree(Relation index, GinScanEntry scanEntry,
  break; /* no more pages */
 
  buffer = ginStepRight(buffer, index, GIN_SHARE);
+
+ if (!GinGetUseFastUpdate(index))
+ PredicateLockPage(index, BufferGetBlockNumber(buffer), snapshot);
  }
 
  UnlockReleaseBuffer(buffer);

But sometimes you call PredicateLockPage() unconditionally.

@@ -131,6 +141,8 @@ collectMatchBitmap(GinBtreeData *btree, GinBtreeStack *stack,
  attnum = scanEntry->attnum;
  attr = btree->ginstate->origTupdesc->attrs[attnum - 1];
 
+ PredicateLockPage(btree->index, stack->buffer, snapshot);
+
  for (;;)
  {
  Page page;

As I understand, all page-level predicate locking should happen only when fast update is off.

Also, despite general idea is described in README-SSI, in-code comments would be useful.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Shubham Barai
Дата:


On 1 October 2017 at 01:47, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
On Sat, Sep 30, 2017 at 6:12 PM, Shubham Barai <shubhambaraiss@gmail.com> wrote:
I have made changes according to your suggestions. Please have a look at the updated patch. 
I am also considering your suggestions for my other patches also. But, I will need some time to 
make changes as I am currently busy doing my master's.

I don't understand why sometimes you call PredicateLockPage() only when fast update is off.  For example:

@@ -94,6 +101,9 @@ scanPostingTree(Relation index, GinScanEntry scanEntry,
  break; /* no more pages */
 
  buffer = ginStepRight(buffer, index, GIN_SHARE);
+
+ if (!GinGetUseFastUpdate(index))
+ PredicateLockPage(index, BufferGetBlockNumber(buffer), snapshot);
  }
 
  UnlockReleaseBuffer(buffer);

But sometimes you call PredicateLockPage() unconditionally.

@@ -131,6 +141,8 @@ collectMatchBitmap(GinBtreeData *btree, GinBtreeStack *stack,
  attnum = scanEntry->attnum;
  attr = btree->ginstate->origTupdesc->attrs[attnum - 1];
 
+ PredicateLockPage(btree->index, stack->buffer, snapshot);
+
  for (;;)
  {
  Page page;

As I understand, all page-level predicate locking should happen only when fast update is off.

Also, despite general idea is described in README-SSI, in-code comments would be useful.

 
Hi Alexander,

Yes, page-level predicate locking should happen only when fast update is off.
Actually, I forgot to put conditions in updated patch. Does everything else look ok ?
 



Kind Regards,
Shubham

Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Alexander Korotkov
Дата:
On Sun, Oct 1, 2017 at 11:53 AM, Shubham Barai <shubhambaraiss@gmail.com> wrote:
Yes, page-level predicate locking should happen only when fast update is off.
Actually, I forgot to put conditions in updated patch. Does everything else look ok ?

I think that isolation tests should be improved.  It doesn't seems that any posting tree would be generated by the tests that you've provided, because all the TIDs could fit the single posting list.  Note, that you can get some insight into GIN physical structure using pageinspect contrib.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Michael Paquier
Дата:
On Tue, Oct 3, 2017 at 1:51 AM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> I think that isolation tests should be improved.  It doesn't seems that any
> posting tree would be generated by the tests that you've provided, because
> all the TIDs could fit the single posting list.  Note, that you can get some
> insight into GIN physical structure using pageinspect contrib.

This thread had no updates for almost two months, so I am marking it
as returned with feedback.
-- 
Michael


Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Shubham Barai
Дата:


On 2 October 2017 at 22:21, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
On Sun, Oct 1, 2017 at 11:53 AM, Shubham Barai <shubhambaraiss@gmail.com> wrote:
Yes, page-level predicate locking should happen only when fast update is off.
Actually, I forgot to put conditions in updated patch. Does everything else look ok ?

I think that isolation tests should be improved.  It doesn't seems that any posting tree would be generated by the tests that you've provided, because all the TIDs could fit the single posting list.  Note, that you can get some insight into GIN physical structure using pageinspect contrib.


I have created new isolation tests. Please have a look at
updated patch.

Regards,
Shubham
Вложения

Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Thomas Munro
Дата:
On Wed, Jan 3, 2018 at 4:31 AM, Shubham Barai <shubhambaraiss@gmail.com> wrote:
> I have created new isolation tests. Please have a look at
> updated patch.

Hi Shubham,

Could we please have a rebased version of the gin one?

-- 
Thomas Munro
http://www.enterprisedb.com


Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Shubham Barai
Дата:


On 28 February 2018 at 05:51, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
On Wed, Jan 3, 2018 at 4:31 AM, Shubham Barai <shubhambaraiss@gmail.com> wrote:
> I have created new isolation tests. Please have a look at
> updated patch.

Hi Shubham,

Could we please have a rebased version of the gin one?

Sure. I have attached a rebased version

Regards,
Shubham
Вложения

Re: GSoC 2017: weekly progress reports (week 6)

От
Andres Freund
Дата:
This appears to be a duplicate of https://commitfest.postgresql.org/17/1466/ - as the other one is older, I'm closing
thisone. 

Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Andrey Borodin
Дата:
Hi!

> 28 февр. 2018 г., в 22:19, Shubham Barai <shubhambaraiss@gmail.com> написал(а):
>
> Sure. I have attached a rebased version

I've looked into the code closely again. The patch is heavily reworked since GSoC state :)
Tests are looking fine and locking is fine-grained.
But there is one thing I could not understand:
Why do we take a lock during moveRightIfItNeeded()?
This place is supposed to be called whenever page is split just before we a locking it and right after we've come to
thepage from parent. 

Best regards, Andrey Borodin.

Re: GSoC 2017: weekly progress reports (week 6)

От
Alvaro Herrera
Дата:
Andres Freund wrote:

> This appears to be a duplicate of https://commitfest.postgresql.org/17/1466/ - as the other one is older, I'm closing
thisone.
 

This comment makes no sense from the POV of the mail archives.  I had to
look at the User-Agent in your email to realize that you wrote it in the
commitfest app.  I see three problems here

1. impersonating the "From:" header is a bad idea; needs fixed much as
   we did with the bugs and doc comments submission forms
2. it should have had a header indicating it comes from CF app
3. it would be great to include in said header a link to the CF entry
   to which the comment was attached.

Thanks

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: GSoC 2017: weekly progress reports (week 6)

От
Alvaro Herrera
Дата:
Andres Freund wrote:

> This appears to be a duplicate of https://commitfest.postgresql.org/17/1466/ - as the other one is older, I'm closing
thisone.
 

This comment makes no sense from the POV of the mail archives.  I had to
look at the User-Agent in your email to realize that you wrote it in the
commitfest app.  I see three problems here

1. impersonating the "From:" header is a bad idea; needs fixed much as
   we did with the bugs and doc comments submission forms
2. it should have had a header indicating it comes from CF app
3. it would be great to include in said header a link to the CF entry
   to which the comment was attached.

Thanks

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Alvaro Herrera
Дата:
I suggest to create a new function GinPredicateLockPage() that checks
whether fast update is enabled for the index.  The current arrangement
looks too repetitive and it seems easy to make a mistake.

Stylistically, please keep #include lines ordered alphabetically, and
cut long lines to below 80 chars.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: GSoC 2017: weekly progress reports (week 6)

От
Andres Freund
Дата:
Hi,

On 2018-03-07 11:58:51 -0300, Alvaro Herrera wrote:
> > This appears to be a duplicate of https://commitfest.postgresql.org/17/1466/ - as the other one is older, I'm
closingthis one.
 
> 
> This comment makes no sense from the POV of the mail archives.  I had to
> look at the User-Agent in your email to realize that you wrote it in the
> commitfest app.

Yea, I stopped doing so afterwards...


> 1. impersonating the "From:" header is a bad idea; needs fixed much as
>    we did with the bugs and doc comments submission forms
> 2. it should have had a header indicating it comes from CF app
> 3. it would be great to include in said header a link to the CF entry
>    to which the comment was attached.

Sounds reasonable.

Greetings,

Andres Freund


Re: GSoC 2017: weekly progress reports (week 6)

От
Andres Freund
Дата:
Hi,

On 2018-03-07 11:58:51 -0300, Alvaro Herrera wrote:
> > This appears to be a duplicate of https://commitfest.postgresql.org/17/1466/ - as the other one is older, I'm
closingthis one.
 
> 
> This comment makes no sense from the POV of the mail archives.  I had to
> look at the User-Agent in your email to realize that you wrote it in the
> commitfest app.

Yea, I stopped doing so afterwards...


> 1. impersonating the "From:" header is a bad idea; needs fixed much as
>    we did with the bugs and doc comments submission forms
> 2. it should have had a header indicating it comes from CF app
> 3. it would be great to include in said header a link to the CF entry
>    to which the comment was attached.

Sounds reasonable.

Greetings,

Andres Freund


Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Shubham Barai
Дата:


On 07-Mar-2018 11:00 PM, "Alvaro Herrera" <alvherre@2ndquadrant.com> wrote:
I suggest to create a new function GinPredicateLockPage() that checks
whether fast update is enabled for the index.  The current arrangement
looks too repetitive and it seems easy to make a mistake.

Stylistically, please keep #include lines ordered alphabetically, and
cut long lines to below 80 chars.

Okay, I will update the patch.

Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Alexander Korotkov
Дата:
On Wed, Mar 7, 2018 at 8:30 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
I suggest to create a new function GinPredicateLockPage() that checks
whether fast update is enabled for the index.  The current arrangement
looks too repetitive and it seems easy to make a mistake.

BTW, should we also skip CheckForSerializableConflictIn() when
fast update is enabled?  AFAICS, now it doesn't cause any errors or
false positives, but makes useless load.  Is it correct?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Andrey Borodin
Дата:

> 12 марта 2018 г., в 1:54, Alexander Korotkov <a.korotkov@postgrespro.ru> написал(а):
>
> On Wed, Mar 7, 2018 at 8:30 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> I suggest to create a new function GinPredicateLockPage() that checks
> whether fast update is enabled for the index.  The current arrangement
> looks too repetitive and it seems easy to make a mistake.
>
> BTW, should we also skip CheckForSerializableConflictIn() when
> fast update is enabled?  AFAICS, now it doesn't cause any errors or
> false positives, but makes useless load.  Is it correct?
>
BTW to BTW. I think we should check pending list size with GinGetPendingListCleanupSize() here
+
+    /*
+     * If fast update is enabled, we acquire a predicate lock on the entire
+     * relation as fast update postpones the insertion of tuples into index
+     * structure due to which we can't detect rw conflicts.
+     */
+    if (GinGetUseFastUpdate(ginstate->index))
+        PredicateLockRelation(ginstate->index, snapshot);

Because we can alter alter index set (fastupdate = off), but there still will be pending list.

We were discussing this with Shubham back in July, chosen some approach that seemed better, but I can't remember what
wasthat... 

Best regards, Andrey Borodin.

Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Alexander Korotkov
Дата:
On Mon, Mar 12, 2018 at 9:47 AM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> 12 марта 2018 г., в 1:54, Alexander Korotkov <a.korotkov@postgrespro.ru> написал(а):
>
> On Wed, Mar 7, 2018 at 8:30 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> I suggest to create a new function GinPredicateLockPage() that checks
> whether fast update is enabled for the index.  The current arrangement
> looks too repetitive and it seems easy to make a mistake.
>
> BTW, should we also skip CheckForSerializableConflictIn() when
> fast update is enabled?  AFAICS, now it doesn't cause any errors or
> false positives, but makes useless load.  Is it correct?
>
BTW to BTW. I think we should check pending list size with GinGetPendingListCleanupSize() here
+
+       /*
+        * If fast update is enabled, we acquire a predicate lock on the entire
+        * relation as fast update postpones the insertion of tuples into index
+        * structure due to which we can't detect rw conflicts.
+        */
+       if (GinGetUseFastUpdate(ginstate->index))
+               PredicateLockRelation(ginstate->index, snapshot);

Because we can alter alter index set (fastupdate = off), but there still will be pending list.

And what happen if somebody concurrently set (fastupdate = on)?
Can we miss conflicts because of that?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Alvaro Herrera
Дата:
Alexander Korotkov wrote:

> And what happen if somebody concurrently set (fastupdate = on)?
> Can we miss conflicts because of that?

I think it'd be better to have that option require AccessExclusive lock,
so that it can never be changed concurrently with readers.  Seems to me
that penalizing every single read to cope with this case would be a bad
trade-off.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Andrey Borodin
Дата:

> 13 марта 2018 г., в 17:02, Alexander Korotkov <a.korotkov@postgrespro.ru> написал(а):
>
> BTW to BTW. I think we should check pending list size with GinGetPendingListCleanupSize() here
> +
> +       /*
> +        * If fast update is enabled, we acquire a predicate lock on the entire
> +        * relation as fast update postpones the insertion of tuples into index
> +        * structure due to which we can't detect rw conflicts.
> +        */
> +       if (GinGetUseFastUpdate(ginstate->index))
> +               PredicateLockRelation(ginstate->index, snapshot);
>
> Because we can alter alter index set (fastupdate = off), but there still will be pending list.
>
> And what happen if somebody concurrently set (fastupdate = on)?
> Can we miss conflicts because of that?
No, AccessExclusiveLock will prevent this kind of problems with enabling fastupdate.

Best regards, Andrey Borodin.

Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Alexander Korotkov
Дата:
On Tue, Mar 13, 2018 at 3:26 PM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> 13 марта 2018 г., в 17:02, Alexander Korotkov <a.korotkov@postgrespro.ru> написал(а):
>
> BTW to BTW. I think we should check pending list size with GinGetPendingListCleanupSize() here
> +
> +       /*
> +        * If fast update is enabled, we acquire a predicate lock on the entire
> +        * relation as fast update postpones the insertion of tuples into index
> +        * structure due to which we can't detect rw conflicts.
> +        */
> +       if (GinGetUseFastUpdate(ginstate->index))
> +               PredicateLockRelation(ginstate->index, snapshot);
>
> Because we can alter alter index set (fastupdate = off), but there still will be pending list.
>
> And what happen if somebody concurrently set (fastupdate = on)?
> Can we miss conflicts because of that?
No, AccessExclusiveLock will prevent this kind of problems with enabling fastupdate.

True.  I didn't notice that ALTER INDEX SET locks index in so high mode.
Thus, everything is fine from this perspective.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Alexander Korotkov
Дата:
On Tue, Mar 13, 2018 at 3:25 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Alexander Korotkov wrote:

> And what happen if somebody concurrently set (fastupdate = on)?
> Can we miss conflicts because of that?

I think it'd be better to have that option require AccessExclusive lock,
so that it can never be changed concurrently with readers.  Seems to me
that penalizing every single read to cope with this case would be a bad
trade-off.

As Andrey Borodin mentioned, we already do.  Sorry for buzz :)

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Shubham Barai
Дата:


On 16 March 2018 at 03:57, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
On Tue, Mar 13, 2018 at 3:25 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Alexander Korotkov wrote:

> And what happen if somebody concurrently set (fastupdate = on)?
> Can we miss conflicts because of that?

I think it'd be better to have that option require AccessExclusive lock,
so that it can never be changed concurrently with readers.  Seems to me
that penalizing every single read to cope with this case would be a bad
trade-off.

As Andrey Borodin mentioned, we already do.  Sorry for buzz :)



I have updated the patch based on suggestions.

Regards,
Shubham
Вложения

Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Teodor Sigaev
Дата:
Hi!

Patch seems good, but I found one bug in it, in fact, nobody
checks serializible conflict with fastupdate=on:
gininsert()
{
    if (GinGetUseFastUpdate())
    {
    /* two next lines are GinCheckForSerializableConflictIn() */
        if (!GinGetUseFastUpdate())
            CheckForSerializableConflictIn()
    }
}

I  changed to direct call CheckForSerializableConflictIn() (see attachment)

I'd like to see fastupdate=on in test too, now tests cover only case without 
fastupdate. Please, add them.

Shubham Barai wrote:
> 
> 
> On 16 March 2018 at 03:57, Alexander Korotkov <a.korotkov@postgrespro.ru 
> <mailto:a.korotkov@postgrespro.ru>> wrote:
> 
>     On Tue, Mar 13, 2018 at 3:25 PM, Alvaro Herrera <alvherre@alvh.no-ip.org
>     <mailto:alvherre@alvh.no-ip.org>> wrote:
> 
>         Alexander Korotkov wrote:
> 
>         > And what happen if somebody concurrently set (fastupdate = on)?
>         > Can we miss conflicts because of that?
> 
>         I think it'd be better to have that option require AccessExclusive lock,
>         so that it can never be changed concurrently with readers.  Seems to me
>         that penalizing every single read to cope with this case would be a bad
>         trade-off.
> 
> 
>     As Andrey Borodin mentioned, we already do.  Sorry for buzz :)
> 
> 
> 
> I have updated the patch based on suggestions.
> 
> Regards,
> Shubham

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Вложения

Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Dmitry Ivanov
Дата:
> I'd like to see fastupdate=on in test too, now tests cover only case
> without fastupdate. Please, add them.

Here's a couple of tests for pending list (fastupdate = on).

-- 
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Вложения

Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Dmitry Ivanov
Дата:
>> I'd like to see fastupdate=on in test too, now tests cover only case
>> without fastupdate. Please, add them.
> 
> Here's a couple of tests for pending list (fastupdate = on).

By the way, isn't it strange that permutation "fu1" "rxy3" "wx3" "rxy4" 
"c1" "wy4" "c2" doesn't produce an ERROR?

-- 
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Teodor Sigaev
Дата:
Hi!

I slightly modified test for clean demonstration of difference between 
fastupdate on and off. Also I added CheckForSerializableConflictIn() to 
fastupdate off codepath, but only in case of non-empty pending list.

The next question what I see: why do not we lock entry leaf pages in some cases? 
As I understand, scan should lock any visited page, but now it's true only for 
posting tree. Seems, it also should lock pages in entry tree because concurrent 
procesess could add new entries which could be matched by partial search, for 
example. BTW, partial search (see collectMatchBitmap()) locks correctly entry 
tree, but regular startScanEntry() doesn't lock entry page in case of posting 
tree, only in case of posting list.


Dmitry Ivanov wrote:
>> I'd like to see fastupdate=on in test too, now tests cover only case
>> without fastupdate. Please, add them.
> 
> Here's a couple of tests for pending list (fastupdate = on).
> 

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Вложения

Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Teodor Sigaev
Дата:
> The next question what I see: why do not we lock entry leaf pages in some cases? 

I've modified patch to predicate lock each leaf (entry or posting) page. Now 
patch reaches commitable state from my point view.

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Вложения

Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Alvaro Herrera
Дата:
I don't quite understand the new call in gininsert -- I mean I see that
it wants to check for conflicts even when fastupdate is set, but why?
Maybe, just maybe, it would be better to add a new flag to the
GinCheckForSerializableConflictIn function, that's passed differently
for this one callsite, and then a comment next to it that indicates why
do we test for fastupdates in one case and not the other case.
If you don't like this idea, then I think more commentary on why
fastupdate is not considered in gininsert is warranted.

In startScanEntry, if you "goto restartScanEntry" in the fastupdate
case, are you trying to acquire the same lock again?  Maybe the lock
acquire should occur before the goto target? (If this doesn't matter for
some reason, maybe add a comment about it)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Alexander Korotkov
Дата:
On Thu, Mar 29, 2018 at 1:38 PM, Teodor Sigaev <teodor@sigaev.ru> wrote:
The next question what I see: why do not we lock entry leaf pages in some cases?

I've modified patch to predicate lock each leaf (entry or posting) page. Now patch reaches commitable state from my point view.

I made some enhancements in comments and readme.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Вложения

Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Teodor Sigaev
Дата:

Alvaro Herrera wrote:
> I don't quite understand the new call in gininsert -- I mean I see that
> it wants to check for conflicts even when fastupdate is set, but why?
If fastupdate is set then we check conflict with whole index, not a particular 
pages in it. Predicate lock on penging list pages will be effectively a lock 
over index, because every scan will begin from pending list and each insert will 
insert into it. I

> Maybe, just maybe, it would be better to add a new flag to the
> GinCheckForSerializableConflictIn function, that's passed differently
> for this one callsite, and then a comment next to it that indicates why
> do we test for fastupdates in one case and not the other case.
> If you don't like this idea, then I think more commentary on why
> fastupdate is not considered in gininsert is warranted.
> 
> In startScanEntry, if you "goto restartScanEntry" in the fastupdate
> case, are you trying to acquire the same lock again?  Maybe the lock
> acquire should occur before the goto target? (If this doesn't matter for
> some reason, maybe add a comment about it)
Thank you for noticing that, I've completely rework this part. Somehow I 
misreaded actual work of GinGetPendingListCleanupSize() :(.

See attached patch
-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Вложения

Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Alvaro Herrera
Дата:
Teodor Sigaev wrote:
> 
> Alvaro Herrera wrote:
> > I don't quite understand the new call in gininsert -- I mean I see that
> > it wants to check for conflicts even when fastupdate is set, but why?
> If fastupdate is set then we check conflict with whole index, not a
> particular pages in it. Predicate lock on penging list pages will be
> effectively a lock over index, because every scan will begin from pending
> list and each insert will insert into it. I

Oh, right, that makes sense.  I'm not sure that the comments explain
this sufficiently -- I think it'd be good to expand that.


Given this patch, it seems clear that serializable mode is much worse
with fastupdate than without it!

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Teodor Sigaev
Дата:
Thanks for everyone, pushed

> Oh, right, that makes sense.  I'm not sure that the comments explain
> this sufficiently -- I think it'd be good to expand that.
I improved comments

> Given this patch, it seems clear that serializable mode is much worse
> with fastupdate than without it!
That's true. Any list-based approaches like pending lis, brin or bloom indexes 
could work only with locking entire relation.

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/


Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Tom Lane
Дата:
Teodor Sigaev <teodor@sigaev.ru> writes:
> Thanks for everyone, pushed

prion doesn't like this patch, which probably means that something is
trying to use the content of a relcache entry after having closed it.

            regards, tom lane


Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Tom Lane
Дата:
I wrote:
> prion doesn't like this patch, which probably means that something is
> trying to use the content of a relcache entry after having closed it.

Oh, sorry, scratch that --- looking closer, it was failing before this.

            regards, tom lane


Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Heikki Linnakangas
Дата:
On 16/03/18 00:26, Alexander Korotkov wrote:
> On Tue, Mar 13, 2018 at 3:26 PM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>> On 13/03/18 14:02, Alexander Korotkov wrote:
>>> And what happen if somebody concurrently set (fastupdate = on)?
>>> Can we miss conflicts because of that?
>>
>> No, AccessExclusiveLock will prevent this kind of problems with enabling
>> fastupdate.
> 
> True.  I didn't notice that ALTER INDEX SET locks index in so high mode.
> Thus, everything is fine from this perspective.

Nope, an AccessExclusiveLock is not good enough. Predicate locks stay 
around after the transaction has committed and regular locks have been 
released.

Attached is a test case that demonstrates a case where we miss a 
serialization failure, when fastupdate is turned on concurrently. It 
works on v10, but fails to throw a serialization error on v11.

- Heikki

Вложения

Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Heikki Linnakangas
Дата:
On 28/03/18 19:53, Teodor Sigaev wrote:
> Hi!
> 
> I slightly modified test for clean demonstration of difference between
> fastupdate on and off. Also I added CheckForSerializableConflictIn() to
> fastupdate off codepath, but only in case of non-empty pending list.
> 
> The next question what I see: why do not we lock entry leaf pages in some cases?

Why should we?

> As I understand, scan should lock any visited page, but now it's true only for
> posting tree. Seems, it also should lock pages in entry tree because concurrent
> procesess could add new entries which could be matched by partial search, for
> example. BTW, partial search (see collectMatchBitmap()) locks correctly entry
> tree, but regular startScanEntry() doesn't lock entry page in case of posting
> tree, only in case of posting list.
I think this needs some high-level comments or README to explain how the 
locking works. It seems pretty ad hoc at the moment. And incorrect.

1. Why do we lock all posting tree pages, even though they all represent 
the same value? Isn't it enough to lock the root of the posting tree?

2. Why do we lock any posting tree pages at all, if we lock the entry 
tree page anyway? Isn't the lock on the entry tree page sufficient to 
cover the key value?

3. Why do we *not* lock the entry leaf page, if there is no match? We 
still need a lock to remember that we probed for that value and there 
was no match, so that we conflict with a tuple that might be inserted later.

At least #3 is a bug. The attached patch adds an isolation test that 
demonstrates it. #1 and #2 are weird, and cause unnecessary locking, so 
I think we should fix those too, even if they don't lead to incorrect 
results.

Remember, the purpose of predicate locks is to lock key ranges, not 
physical pages or tuples in the index. We use leaf pages as handy 
shortcut for "any key value that would belong on this page", but it is 
just an implementation detail.

I took a stab at fixing those issues, as well as the bug when fastupdate 
is turned on concurrently. Does the attached patch look sane to you?

- Heikki

Вложения

Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Teodor Sigaev
Дата:
> Attached is a test case that demonstrates a case where we miss a serialization 
> failure, when fastupdate is turned on concurrently. It works on v10, but fails 
> to throw a serialization error on v11.
Thank you for reserching!

Proof of concept:
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 43b2fce2c5..b8291f96e2 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -10745,6 +10745,7 @@ ATExecSetRelOptions(Relation rel, List *defList, 
AlterTableType operation,
         case RELKIND_INDEX:
         case RELKIND_PARTITIONED_INDEX:
             (void) index_reloptions(rel->rd_amroutine->amoptions, newOptions, 
true);
+           TransferPredicateLocksToHeapRelation(rel);
             break;
         default:
             ereport(ERROR,

it fixes pointed bug, but will gives false positives. Right place for that in 
ginoptions function, but ginoptions doesn't has an access to relation structure 
and I don't see a reason why it should.

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/


Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Teodor Sigaev
Дата:
Ugh, I miss your last email where you another locking protocol. Reading.

Teodor Sigaev wrote:
>> Attached is a test case that demonstrates a case where we miss a serialization 
>> failure, when fastupdate is turned on concurrently. It works on v10, but fails 
>> to throw a serialization error on v11.
> Thank you for reserching!
> 
> Proof of concept:
> diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
> index 43b2fce2c5..b8291f96e2 100644
> --- a/src/backend/commands/tablecmds.c
> +++ b/src/backend/commands/tablecmds.c
> @@ -10745,6 +10745,7 @@ ATExecSetRelOptions(Relation rel, List *defList, 
> AlterTableType operation,
>          case RELKIND_INDEX:
>          case RELKIND_PARTITIONED_INDEX:
>              (void) index_reloptions(rel->rd_amroutine->amoptions, newOptions, 
> true);
> +           TransferPredicateLocksToHeapRelation(rel);
>              break;
>          default:
>              ereport(ERROR,
> 
> it fixes pointed bug, but will gives false positives. Right place for that in 
> ginoptions function, but ginoptions doesn't has an access to relation structure 
> and I don't see a reason why it should.
> 

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/


Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Teodor Sigaev
Дата:
Hi!


> 1. Why do we lock all posting tree pages, even though they all represent the 
> same value? Isn't it enough to lock the root of the posting tree?
> 2. Why do we lock any posting tree pages at all, if we lock the entry tree page 
> anyway? Isn't the lock on the entry tree page sufficient to cover the key value?
> 3. Why do we *not* lock the entry leaf page, if there is no match? We still need 
> a lock to remember that we probed for that value and there was no match, so that 
> we conflict with a tuple that might be inserted later.
> 
> At least #3 is a bug. The attached patch adds an isolation test that 
> demonstrates it. #1 and #2 are weird, and cause unnecessary locking, so I think 
> we should fix those too, even if they don't lead to incorrect results.

I can't find a hole here. Agree.

> I took a stab at fixing those issues, as well as the bug when fastupdate is 
> turned on concurrently. Does the attached patch look sane to you?

I like an idea use metapage locking, thank you. Patch seems good, will you push it?

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/


Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Alexander Korotkov
Дата:
Hi!

Thank you for taking a look at this patch.  I really appreciate your
attention over complex subjects like this.

On Mon, Apr 9, 2018 at 1:33 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 28/03/18 19:53, Teodor Sigaev wrote:
As I understand, scan should lock any visited page, but now it's true only for
posting tree. Seems, it also should lock pages in entry tree because concurrent
procesess could add new entries which could be matched by partial search, for
example. BTW, partial search (see collectMatchBitmap()) locks correctly entry
tree, but regular startScanEntry() doesn't lock entry page in case of posting
tree, only in case of posting list.
I think this needs some high-level comments or README to explain how the locking works. It seems pretty ad hoc at the moment. And incorrect.
 
I agree that explanation in README in insufficient.

1. Why do we lock all posting tree pages, even though they all represent the same value? Isn't it enough to lock the root of the posting tree?

2. Why do we lock any posting tree pages at all, if we lock the entry tree page anyway? Isn't the lock on the entry tree page sufficient to cover the key value?

I already have similar concerns in [1].  The idea of locking posting tree leafs was to
get more granular locks.  I think you've correctly describe it in the commit message
here:

With a very large posting tree, it would
possibly be better to lock the posting tree leaf pages instead, so that a
"skip scan" with a query like "A & B", you could avoid unnecessary conflict
if a new tuple is inserted with A but !B. But let's keep this simple.

However, it's very complex and error prone.  So, +1 for simplify it for v11.
 
3. Why do we *not* lock the entry leaf page, if there is no match? We still need a lock to remember that we probed for that value and there was no match, so that we conflict with a tuple that might be inserted later.
 
+1

At least #3 is a bug. The attached patch adds an isolation test that demonstrates it. #1 and #2 are weird, and cause unnecessary locking, so I think we should fix those too, even if they don't lead to incorrect results.

Remember, the purpose of predicate locks is to lock key ranges, not physical pages or tuples in the index. We use leaf pages as handy shortcut for "any key value that would belong on this page", but it is just an implementation detail.

I took a stab at fixing those issues, as well as the bug when fastupdate is turned on concurrently. Does the attached patch look sane to you?

Teodor has already answered you, and I'd like to mention that your patch
looks good for me too.


------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Alvaro Herrera
Дата:
Heikki Linnakangas wrote:

> Remember, the purpose of predicate locks is to lock key ranges, not physical
> pages or tuples in the index. We use leaf pages as handy shortcut for "any
> key value that would belong on this page", but it is just an implementation
> detail.

Hmm ... so, thinking about pending list locking, would it work to
acquire locks on the posting tree's root of each item in the pending
list, when the item is put in the pending list? (even if we insert the
item in the pending list instead of its posting tree).

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Andrey Borodin
Дата:
> 9 апр. 2018 г., в 19:50, Teodor Sigaev <teodor@sigaev.ru> написал(а):
>>
>> 3. Why do we *not* lock the entry leaf page, if there is no match? We still need a lock to remember that we probed
forthat value and there was no match, so that we conflict with a tuple that might be inserted later. 
>> At least #3 is a bug. The attached patch adds an isolation test that demonstrates it. #1 and #2 are weird, and cause
unnecessarylocking, so I think we should fix those too, even if they don't lead to incorrect results. 
>
> I can't find a hole here. Agree.
Please correct me if I'm wrong.
Let's say we have posting trees for word A and word B.
We are looking for a document that contains both.
We will read through all posting tree of A, but only through some segments of B.
If we will not find anything in B, we have to lock only segments where we actually were looking, not all the posting
treeof B. 

BTW I do not think that we lock ranges. We lock possibility of appearance of tuples that we might find. Ranges are
shortcutsfor places where we were looking.. That's how I understand, chances are I'm missing something. 

Best regards, Andrey Borodin.

Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Teodor Sigaev
Дата:

Alvaro Herrera wrote:
> Heikki Linnakangas wrote:
> 
>> Remember, the purpose of predicate locks is to lock key ranges, not physical
>> pages or tuples in the index. We use leaf pages as handy shortcut for "any
>> key value that would belong on this page", but it is just an implementation
>> detail.
> 
> Hmm ... so, thinking about pending list locking, would it work to
> acquire locks on the posting tree's root of each item in the pending
> list, when the item is put in the pending list? (even if we insert the
> item in the pending list instead of its posting tree).

Items in pending list doesn't use posting tree or list, pending list is just 
list of pair (ItemPointer to heap, entry) represented as IndexTuple. There is no 
order in pending list, so Heikki suggests to lock metapage always instead of 
locking whole relation if fastupdate=on. If fastupdate is off then insertion 
process will not change metapage.

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/


Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Heikki Linnakangas
Дата:
On 09/04/18 18:04, Alvaro Herrera wrote:
> Heikki Linnakangas wrote:
> 
>> Remember, the purpose of predicate locks is to lock key ranges, not physical
>> pages or tuples in the index. We use leaf pages as handy shortcut for "any
>> key value that would belong on this page", but it is just an implementation
>> detail.
> 
> Hmm ... so, thinking about pending list locking, would it work to
> acquire locks on the posting tree's root of each item in the pending
> list, when the item is put in the pending list? (even if we insert the
> item in the pending list instead of its posting tree).

Hmm, you mean, when inserting a new tuple? Yes, that would be correct. I 
don't think it would perform very well, though. If you have to traverse 
down to the posting trees, anyway, then you might as well insert the new 
tuples there directly, and forget about the pending list.

- Heikki


Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Heikki Linnakangas
Дата:
On 09/04/18 18:21, Andrey Borodin wrote:
> 
>> 9 апр. 2018 г., в 19:50, Teodor Sigaev <teodor@sigaev.ru>
>> написал(а):
>>> 
>>> 3. Why do we *not* lock the entry leaf page, if there is no
>>> match? We still need a lock to remember that we probed for that
>>> value and there was no match, so that we conflict with a tuple
>>> that might be inserted later. At least #3 is a bug. The attached
>>> patch adds an isolation test that demonstrates it. #1 and #2 are
>>> weird, and cause unnecessary locking, so I think we should fix
>>> those too, even if they don't lead to incorrect results.
>> 
>> I can't find a hole here. Agree.
> Please correct me if I'm wrong. Let's say we have posting trees for
> word A and word B. We are looking for a document that contains both. 
> We will read through all posting tree of A, but only through some
> segments of B. If we will not find anything in B, we have to lock
> only segments where we actually were looking, not all the posting
> tree of B.

True, that works. It was not clear from the code or comments that that 
was intended. I'm not sure if that's worthwhile, compared to locking 
just the posting tree root block. I'll let Teodor decide..

> BTW I do not think that we lock ranges. We lock possibility of
> appearance of tuples that we might find. Ranges are shortcuts for
> places where we were looking.. That's how I understand, chances are
> I'm missing something.

Yeah, that's one way of thinking about it.

- Heikki


Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Andrey Borodin
Дата:


9 апр. 2018 г., в 23:04, Heikki Linnakangas <hlinnaka@iki.fi> написал(а):

On 09/04/18 18:21, Andrey Borodin wrote:
9 апр. 2018 г., в 19:50, Teodor Sigaev <teodor@sigaev.ru>
написал(а):
3. Why do we *not* lock the entry leaf page, if there is no
match? We still need a lock to remember that we probed for that
value and there was no match, so that we conflict with a tuple
that might be inserted later. At least #3 is a bug. The attached
patch adds an isolation test that demonstrates it. #1 and #2 are
weird, and cause unnecessary locking, so I think we should fix
those too, even if they don't lead to incorrect results.
I can't find a hole here. Agree.
Please correct me if I'm wrong. Let's say we have posting trees for
word A and word B. We are looking for a document that contains both. We will read through all posting tree of A, but only through some
segments of B. If we will not find anything in B, we have to lock
only segments where we actually were looking, not all the posting
tree of B.

True, that works. It was not clear from the code or comments that that was intended. I'm not sure if that's worthwhile, compared to locking just the posting tree root block.
From the text search POV this is kind of bulky granularity: if you have frequent words like "the", "a", "in", conflicts are inevitable.
I'm not sure we have means for picking optimal granularity: should it be ranges of postings, ranges of pages of posting trees, entries, pages of entries or whole index.
Technically, [time for locking] should be less than [time of transaction retry]*[probability of conflict]. Holding this constraint we should minimize [time for locking] + [time of transaction retry]*[probability of conflict].
I suspect that [time for locking] is some orders of magnitude less than time of transaction. So, efforts should be skewed towards smaller granularity to reduce [probability of conflict].
But all this is not real math and have no strength of science.

I'll let Teodor decide..
+1. I belive this is very close to optimal solution :)

Best regards, Andrey Borodin.

Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Teodor Sigaev
Дата:
> I took a stab at fixing those issues, as well as the bug when fastupdate is 
> turned on concurrently. Does the attached patch look sane to you?
That's look sane, and I believe it should be applied but I see some issue in 
your patch:

I don't very happy with rootBuffer added everywhere. ginInsertItemPointers() and 
  ginPrepareDataScan() now take both args, rootBlkno and rootBuffer, second 
could be invalid. As I can see, you did it to call 
CheckForSerializableConflictIn() in ginEntryInsert(). Seems, it could be 
reverted and CheckForSerializableConflictIn() should be added to 
ginFindLeafPage() with searchMode = false.

Rebased patch is attached.

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Вложения

Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Teodor Sigaev
Дата:
> I don't very happy with rootBuffer added everywhere. ginInsertItemPointers() and 
>   ginPrepareDataScan() now take both args, rootBlkno and rootBuffer, second 
> could be invalid. As I can see, you did it to call 
> CheckForSerializableConflictIn() in ginEntryInsert(). Seems, it could be 
> reverted and CheckForSerializableConflictIn() should be added to 
> ginFindLeafPage() with searchMode = false.
Implemented, v3 is attached.

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Вложения

Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

От
Teodor Sigaev
Дата:
Thanks to everyone, v3 is pushed.

Teodor Sigaev wrote:
>> I don't very happy with rootBuffer added everywhere. ginInsertItemPointers() 
>> and   ginPrepareDataScan() now take both args, rootBlkno and rootBuffer, 
>> second could be invalid. As I can see, you did it to call 
>> CheckForSerializableConflictIn() in ginEntryInsert(). Seems, it could be 
>> reverted and CheckForSerializableConflictIn() should be added to 
>> ginFindLeafPage() with searchMode = false.
> Implemented, v3 is attached.
> 

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/