Thread: Cache Hash Index meta page.


Cache Hash Index meta page.

From
Mithun Cy
Date:

I have created a patch to cache the meta page of a hash index in backend-private memory. This saves reading the meta page buffer every time we want to find a bucket page. In the “_hash_first” call, we read the meta page buffer twice just to make sure the bucket was not split after we found the bucket page. With this patch, the meta page buffer is not read at all if the bucket has not been split since the meta page was cached.

The idea is to cache the meta page data in rd_amcache and to store the maxbucket number in hasho_prevblkno of each bucket primary page (which is always InvalidBlockNumber otherwise, so it can be reused for this purpose). When we do a hash lookup for a bucket page, if the locally cached maxbucket number is greater than or equal to the bucket page's stored maxbucket number, then the bucket has not been split since we cached the meta page, and the meta page buffer read can be avoided.
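To make the check concrete, here is a minimal sketch of the intended logic (assuming the cached copy lives in rel->rd_amcache as described above; the attached patch is authoritative):

    HashMetaPage cachedmetap = (HashMetaPage) rel->rd_amcache;
    HashPageOpaque opaque = (HashPageOpaque) PageGetSpecialPointer(page);

    /*
     * hasho_prevblkno on a bucket primary page holds the hashm_maxbucket
     * value at the time the bucket was created or last split.  If our
     * cached maxbucket is at least that large, the bucket cannot have
     * split since we cached the metapage.
     */
    if (opaque->hasho_prevblkno <= cachedmetap->hashm_maxbucket)
    {
        /* cached bucket mapping still valid; no metapage read needed */
    }
    else
    {
        /* bucket split since caching; re-read the metapage and retry */
    }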

I have attached the benchmark results and perf stats (refer to hash_index_perf_stat_and_benchmarking.odc [sheet 1: perf stats; sheet 2: benchmark results]). We can see improvements at higher client counts, as lwlock contention due to buffer reads grows with the client count. If I apply the same patch on top of Amit's concurrent hash index patch [1], we see improvements at lower client counts as well; Amit's patch removed a heavyweight page lock which was the bottleneck at lower client counts.

[1] Concurrent Hash Indexes


--

Thanks and Regards
Mithun C Y

Attachments

Re: Cache Hash Index meta page.

From
Jeff Janes
Date:
On Fri, Jul 22, 2016 at 3:02 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> I have created a patch to cache the meta page of a hash index in
> backend-private memory. This saves reading the meta page buffer every
> time we want to find a bucket page. In the “_hash_first” call, we read the
> meta page buffer twice just to make sure the bucket was not split after we
> found the bucket page. With this patch, the meta page buffer is not read at
> all if the bucket has not been split since the meta page was cached.
>
> The idea is to cache the meta page data in rd_amcache and to store the
> maxbucket number in hasho_prevblkno of each bucket primary page (which is
> always InvalidBlockNumber otherwise, so it can be reused for this purpose).

If it is otherwise unused, shouldn't we rename the field to reflect
what it is now used for?

What happens on a system which has gone through pg_upgrade?  Are we
sure that those on-disk representations will always have
InvalidBlockNumber in that field?  If not, then it seems we can't
support pg_upgrade at all.  If so, I don't see a provision for
properly dealing with pages which still have InvalidBlockNumber in
them.  Unless I am missing something, the code below will always think
it found the right bucket in such cases, won't it?

if (opaque->hasho_prevblkno <= metap->hashm_maxbucket)

Cheers,

Jeff



Re: Cache Hash Index meta page.

From
Tom Lane
Date:
Jeff Janes <jeff.janes@gmail.com> writes:
> On Fri, Jul 22, 2016 at 3:02 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
>> I have created a patch to cache the meta page of a hash index in
>> backend-private memory. This saves reading the meta page buffer every
>> time we want to find a bucket page. In the “_hash_first” call, we read the
>> meta page buffer twice just to make sure the bucket was not split after we
>> found the bucket page. With this patch, the meta page buffer is not read
>> at all if the bucket has not been split since the meta page was cached.

Is this really safe?  The metapage caching in btree is all right because
the algorithm is guaranteed to work even if it starts with a stale idea of
where the root page is.  I do not think the hash code is equally robust
about stale data in its metapage.

>> The idea is to cache the meta page data in rd_amcache and to store the
>> maxbucket number in hasho_prevblkno of each bucket primary page (which is
>> always InvalidBlockNumber otherwise, so it can be reused for this purpose).

> If it is otherwise unused, shouldn't we rename the field to reflect
> what it is now used for?

No, because on other pages that still means what it used to.  Nonetheless,
I agree that's a particularly ugly wart, and probably a dangerous one.

> What happens on a system which has gone through pg_upgrade?

That being one reason why.  It might be okay if we add another hasho_flag
bit saying that hasho_prevblkno really contains a maxbucket number, and
then add tests for that bit everyplace that hasho_prevblkno is referenced.
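A minimal sketch of that suggestion (the flag name and bit value here are purely illustrative, not from any patch):

    /* hypothetical flag: hasho_prevblkno holds a maxbucket number */
    #define LH_PREVBLKNO_IS_MAXBUCKET (1 << 7)

    if (opaque->hasho_flag & LH_PREVBLKNO_IS_MAXBUCKET)
        last_split_maxbucket = opaque->hasho_prevblkno;
    else
        prev_blkno = opaque->hasho_prevblkno;   /* pre-upgrade meaning */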
        regards, tom lane



Re: Cache Hash Index meta page.

From
Amit Kapila
Date:
On Thu, Aug 4, 2016 at 3:36 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
>> On Fri, Jul 22, 2016 at 3:02 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
>>> I have created a patch to cache the meta page of a hash index in
>>> backend-private memory. This saves reading the meta page buffer every
>>> time we want to find a bucket page. In the “_hash_first” call, we read the
>>> meta page buffer twice just to make sure the bucket was not split after we
>>> found the bucket page. With this patch, the meta page buffer is not read
>>> at all if the bucket has not been split since the meta page was cached.
>
> Is this really safe?  The metapage caching in btree is all right because
> the algorithm is guaranteed to work even if it starts with a stale idea of
> where the root page is.  I do not think the hash code is equally robust
> about stale data in its metapage.
>

I think stale data in the metapage could only cause a problem if it leads
to a wrong calculation of the bucket based on the hashkey.  I think that
shouldn't happen.  It seems to me that the safety comes from the fact
that the fields required to calculate the bucket (lowmask/highmask) won't
be changed more than once without splitting the current bucket (which
we are going to scan).  Do you see a problem in the hashkey-to-bucket
mapping (_hash_hashkey2bucket) if the lowmask/highmask are changed by
one additional table half, or do you have something else in mind?

>
>> What happens on a system which has gone through pg_upgrade?
>
> That being one reason why.  It might be okay if we add another hasho_flag
> bit saying that hasho_prevblkno really contains a maxbucket number, and
> then add tests for that bit everyplace that hasho_prevblkno is referenced.
>

Good idea.

- if (retry)
+ if (opaque->hasho_prevblkno <=  metap->hashm_maxbucket)

This code seems to be problematic with respect to upgrades, because
hasho_prevblkno will be initialized to 0xFFFFFFFF without the patch.
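(For reference, that value is just InvalidBlockNumber; block.h defines

    #define InvalidBlockNumber  ((BlockNumber) 0xFFFFFFFF)

so pre-patch pages will never satisfy hasho_prevblkno <= hashm_maxbucket in the check quoted above.)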


--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Cache Hash Index meta page.

From
Jesper Pedersen
Date:
On 07/22/2016 06:02 AM, Mithun Cy wrote:
> I have created a patch to cache the meta page of a hash index in
> backend-private memory. This saves reading the meta page buffer every
> time we want to find a bucket page. In the “_hash_first” call, we read
> the meta page buffer twice just to make sure the bucket was not split
> after we found the bucket page. With this patch, the meta page buffer is
> not read at all if the bucket has not been split since the meta page was
> cached.
>
> The idea is to cache the meta page data in rd_amcache and to store the
> maxbucket number in hasho_prevblkno of each bucket primary page (which is
> always InvalidBlockNumber otherwise, so it can be reused for this
> purpose). When we do a hash lookup for a bucket page, if the locally
> cached maxbucket number is greater than or equal to the bucket page's
> stored maxbucket number, then the bucket has not been split since we
> cached the meta page, and the meta page buffer read can be avoided.
>
> I have attached the benchmark results and perf stats (refer to
> hash_index_perf_stat_and_benchmarking.odc [sheet 1: perf stats; sheet 2:
> benchmark results]). We can see improvements at higher client counts, as
> lwlock contention due to buffer reads grows with the client count. If I
> apply the same patch on top of Amit's concurrent hash index patch [1], we
> see improvements at lower client counts as well; Amit's patch removed a
> heavyweight page lock which was the bottleneck at lower client counts.
>

Could you provide a rebased patch based on Amit's v5?

Best regards, Jesper





Re: Cache Hash Index meta page.

From
Mithun Cy
Date:

On Sep 2, 2016 7:38 PM, "Jesper Pedersen" <jesper.pedersen@redhat.com> wrote:
> Could you provide a rebased patch based on Amit's v5?

Please find the patch, based on Amit's V5.

I have fixed the following things:

1. Now in "_hash_first" we check if (opaque->hasho_prevblkno == InvalidBlockNumber) to see whether the bucket is from an older-version hash index that has been upgraded. Since InvalidBlockNumber is one greater than the maximum value the variable "metap->hashm_maxbucket" can be set to (see _hash_expandtable), we can distinguish it from the rest. I tested the upgrade issue reported by Amit; it is fixed now.

2. One case where a bucket's hasho_prevblkno is used is the backward scan. So now, before testing the previous block number, I test whether the current page is a bucket page; if so, we end the bucket scan (see changes in _hash_readprev). In the other places where hasho_prevblkno is used it is not for a bucket page, so I have not put in any extra check to verify whether it is a bucket page. A rough sketch of the check from point 1 follows.
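This is condensed from the description above; the attached patch is authoritative:

    opaque = (HashPageOpaque) PageGetSpecialPointer(page);

    /*
     * Bucket pages written before this patch still carry
     * InvalidBlockNumber (0xFFFFFFFF) in hasho_prevblkno, and
     * hashm_maxbucket can never reach that value (see _hash_expandtable),
     * so such pages are distinguishable from pages storing a real
     * maxbucket number.
     */
    if (opaque->hasho_prevblkno == InvalidBlockNumber ||
        opaque->hasho_prevblkno > cachedmetap->hashm_maxbucket)
    {
        /* cannot trust the cache: re-read the metapage and retry */
    }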

Attachments

Re: Cache Hash Index meta page.

From
Amit Kapila
Date:
On Tue, Sep 6, 2016 at 12:20 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> On Sep 2, 2016 7:38 PM, "Jesper Pedersen" <jesper.pedersen@redhat.com>
> wrote:
>> Could you provide a rebased patch based on Amit's v5?
>
> Please find the patch, based on Amit's V5.
>

I think you mean to say it is based on the patch in the mail below:
https://www.postgresql.org/message-id/CAA4eK1J6b8O4PcEPqRxNYbLVbfToNMJEEm%2Bqn0jZX31-obXrJw%40mail.gmail.com

It is better if we can provide a link to the patch on which the
current patch is based; that will help people easily identify
the dependent patches.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Cache Hash Index meta page.

From
Jesper Pedersen
Date:
On 09/05/2016 02:50 PM, Mithun Cy wrote:
> On Sep 2, 2016 7:38 PM, "Jesper Pedersen" <jesper.pedersen@redhat.com>
> wrote:
>> Could you provide a rebased patch based on Amit's v5?
>
> Please find the patch, based on Amit's V5.
>
> I have fixed the following things:
>
> 1. Now in "_hash_first" we check if (opaque->hasho_prevblkno ==
> InvalidBlockNumber) to see whether the bucket is from an older-version
> hash index that has been upgraded. Since InvalidBlockNumber is one
> greater than the maximum value the variable "metap->hashm_maxbucket"
> can be set to (see _hash_expandtable), we can distinguish it from the
> rest. I tested the upgrade issue reported by Amit; it is fixed now.
>
> 2. One case where a bucket's hasho_prevblkno is used is the backward
> scan. So now, before testing the previous block number, I test whether
> the current page is a bucket page; if so, we end the bucket scan (see
> changes in _hash_readprev). In the other places where hasho_prevblkno
> is used it is not for a bucket page, so I have not put in any extra
> check to verify whether it is a bucket page.
>

I think that the

+ pageopaque->hasho_prevblkno = metap->hashm_maxbucket;

trick should be documented in the README, as hashm_maxbucket is defined 
as uint32 whereas hasho_prevblkno is a BlockNumber.

(All bucket variables should probably use the Bucket definition instead 
of uint32).
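(For reference, hash.h already carries

    typedef uint32 Bucket;

so that would be a purely cosmetic, type-documenting change.)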

For the archives, this patch conflicts with the WAL patch [1].

[1] 
https://www.postgresql.org/message-id/CAA4eK1JS%2BSiRSQBzEFpnsSmxZKingrRH7WNyWULJeEJSj1-%3D0w%40mail.gmail.com

Best regards, Jesper




Re: Cache Hash Index meta page.

From
Mithun Cy
Date:
On Thu, Sep 8, 2016 at 11:21 PM, Jesper Pedersen <jesper.pedersen@redhat.com> wrote:
> For the archives, this patch conflicts with the WAL patch [1].

> [1] https://www.postgresql.org/message-id/CAA4eK1JS%2BSiRSQBzEFpnsSmxZKingrRH7WNyWULJeEJSj1-%3D0w%40mail.gmail.com

Updated the patch; it applies over Amit's concurrent hash index [1] and Amit's WAL for hash index [2] patches together.

--
Thanks and Regards
Mithun C Y

Attachments

Re: Cache Hash Index meta page.

From
Jeff Janes
Date:
On Tue, Sep 13, 2016 at 12:55 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
On Thu, Sep 8, 2016 at 11:21 PM, Jesper Pedersen <jesper.pedersen@redhat.com> wrote:
> For the archives, this patch conflicts with the WAL patch [1].

> [1] https://www.postgresql.org/message-id/CAA4eK1JS%2BSiRSQBzEFpnsSmxZKingrRH7WNyWULJeEJSj1-%3D0w%40mail.gmail.com

Updated the patch; it applies over Amit's concurrent hash index [1] and Amit's WAL for hash index [2] patches together.

I think that this needs to be updated again for v8 of concurrent and v5 of wal

Thanks,

Jeff

Re: Cache Hash Index meta page.

From
Mithun Cy
Date:
On Tue, Sep 27, 2016 at 1:53 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
 > I think that this needs to be updated again for v8 of concurrent and v5 of wal

Adding the rebased patch over [1] + [2]


--
Thanks and Regards
Mithun C Y

Attachments

Re: Cache Hash Index meta page.

From
Michael Paquier
Date:
On Thu, Sep 29, 2016 at 12:55 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> On Tue, Sep 27, 2016 at 1:53 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
>  > I think that this needs to be updated again for v8 of concurrent and v5
> of wal
>
> Adding the rebased patch over [1] + [2]
>
> [1] Concurrent Hash index.
> [2] Wal for hash index.

Moved to next CF.
-- 
Michael



Re: Cache Hash Index meta page.

From
Jeff Janes
Date:

On Fri, Jul 22, 2016 at 3:02 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

I have created a patch to cache the meta page of a hash index in backend-private memory. This saves reading the meta page buffer every time we want to find a bucket page. In the “_hash_first” call, we read the meta page buffer twice just to make sure the bucket was not split after we found the bucket page. With this patch, the meta page buffer is not read at all if the bucket has not been split since the meta page was cached.

The idea is to cache the meta page data in rd_amcache and to store the maxbucket number in hasho_prevblkno of each bucket primary page (which is always InvalidBlockNumber otherwise, so it can be reused for this purpose). When we do a hash lookup for a bucket page, if the locally cached maxbucket number is greater than or equal to the bucket page's stored maxbucket number, then the bucket has not been split since we cached the meta page, and the meta page buffer read can be avoided.

I have attached the benchmark results and perf stats (refer to hash_index_perf_stat_and_benchmarking.odc [sheet 1: perf stats; sheet 2: benchmark results]). We can see improvements at higher client counts, as lwlock contention due to buffer reads grows with the client count. If I apply the same patch on top of Amit's concurrent hash index patch [1], we see improvements at lower client counts as well; Amit's patch removed a heavyweight page lock which was the bottleneck at lower client counts.

[1] Concurrent Hash Indexes


Hi Mithun,

Can you describe your benchmarking machine?  Your benchmarking data went up to 128 clients.  But how many cores does the machine have?  Are you testing how well it can use the resources it has, or how well it can deal with oversubscription of the resources?

Also, was the file supposed to be named .ods?  I didn't find it to be openable as an .odc file.

Cheers,

Jeff


Re: Cache Hash Index meta page.

From
Mithun Cy
Date:
On Tue, Oct 4, 2016 at 11:55 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>Can you describe your benchmarking machine?  Your benchmarking data went up to 128 clients.  But how many cores does the machine have?  Are
>you testing how well it can use the resources it has, or how well it can deal with oversubscription of the resources?

It is a power2 machine with 192 hyperthreads.

Architecture:          ppc64le
Byte Order:            Little Endian
CPU(s):                192
On-line CPU(s) list:   0-191
Thread(s) per core:    8
Core(s) per socket:    1
Socket(s):             24
NUMA node(s):          4
Model:                 IBM,8286-42A
L1d cache:             64K
L1i cache:             32K
L2 cache:              512K
L3 cache:              8192K
NUMA node0 CPU(s):     0-47
NUMA node1 CPU(s):     48-95
NUMA node2 CPU(s):     96-143
NUMA node3 CPU(s):     144-191
 
 >Also, was the file supposed to be named .ods?  I didn't find it to be openable as an .odc file.

Yes, .ods is right; it is a spreadsheet in ODF.

--
Thanks and Regards
Mithun C Y

Re: Cache Hash Index meta page.

From
Jesper Pedersen
Date:
On 09/28/2016 11:55 AM, Mithun Cy wrote:
> On Tue, Sep 27, 2016 at 1:53 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
>  > I think that this needs to be updated again for v8 of concurrent and v5
> of wal
>
> Adding the rebased patch over [1] + [2]
>

As the concurrent hash index patch was committed in 6d46f4, this patch 
needs a rebase.

I have moved this submission to the next CF.

Thanks for working on this!

Best regards, Jesper





Re: Cache Hash Index meta page.

From
Mithun Cy
Date:
On Thu, Dec 1, 2016 at 8:10 PM, Jesper Pedersen <jesper.pedersen@redhat.com> wrote:
> As the concurrent hash index patch was committed in 6d46f4, this patch needs a rebase.

Thanks Jesper,

Adding the rebased patch.

I have re-run the pgbench read-only tests with the modifications below.

"alter table pgbench_accounts drop constraint pgbench_accounts_pkey" postgres
"create index pgbench_accounts_pkey on pgbench_accounts using hash(aid)" postgres

Postgres Server settings:
./postgres -c shared_buffers=8GB -N 200 -c min_wal_size=15GB -c max_wal_size=20GB -c checkpoint_timeout=900 -c maintenance_work_mem=1GB -c checkpoint_completion_target=0.9

pgbench settings:
scale_factor = 300 (so the database fits in shared_buffers)
Mode =  -M prepared -S (prepared readonly mode).

Machine used:
power2 with sufficient RAM for the above shared_buffers.

#############lscpu
CPU(s):                192
On-line CPU(s) list:   0-191
Thread(s) per core:    8
Core(s) per socket:    1
Socket(s):             24
NUMA node(s):          4
Model:                 IBM,8286-42A

Clients    Cache Meta Page patch    Base code with Amit's changes       %imp

  1             17062.513102             17218.353817           -0.9050848685
  8            138525.808342            128149.381759            8.0971335488
 16            212278.44762             205870.456661            3.1126326054
 32            369453.224112            360423.566937            2.5052904425
 64            576090.293018            510665.044842           12.8117733604
 96            686813.187117            504950.885867           36.0158396272
104            688932.67516             498365.55841            38.2384202789
128            730728.526322            409011.008553           78.6574226711


It appears there is a good improvement at higher client counts.


--
Thanks and Regards
Mithun C Y

Attachments

Re: Cache Hash Index meta page.

From
Mithun Cy
Date:
On Tue, Dec 6, 2016 at 1:28 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

Clients    Cache Meta Page patch    Base code with Amit's changes       %imp

  1             17062.513102             17218.353817           -0.9050848685
  8            138525.808342            128149.381759            8.0971335488
 16            212278.44762             205870.456661            3.1126326054
 32            369453.224112            360423.566937            2.5052904425
 64            576090.293018            510665.044842           12.8117733604
 96            686813.187117            504950.885867           36.0158396272
104            688932.67516             498365.55841            38.2384202789
128            730728.526322            409011.008553           78.6574226711


All the above readings are the median of 3 runs.

--
Thanks and Regards
Mithun C Y

Re: Cache Hash Index meta page.

From
Robert Haas
Date:
On Mon, Dec 5, 2016 at 2:58 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
On Thu, Dec 1, 2016 at 8:10 PM, Jesper Pedersen <jesper.pedersen@redhat.com> wrote:
> As the concurrent hash index patch was committed in 6d46f4, this patch needs a rebase.

Thanks Jesper,

Adding the rebased patch.

-        bucket = _hash_hashkey2bucket(hashkey,
-                                      metap->hashm_maxbucket,
+        bucket = _hash_hashkey2bucket(hashkey, metap->hashm_maxbucket,
                                       metap->hashm_highmask,
                                       metap->hashm_lowmask);

This hunk appears useless.

+        metap = (HashMetaPage)rel->rd_amcache;

Whitespace.

+        /*  Cache the metapage data for next time*/

Whitespace.

+        /* Check if this bucket is split after we have cached the metapage.

Whitespace.

Shouldn't _hash_doinsert() be using the cache, too?

I think it's probably not a good idea to cache the entire metapage.  The only things that you are "really" trying to cache, I think, are hashm_maxbucket, hashm_lowmask, and hashm_highmask.  The entire HashMetaPageData structure is 696 bytes on my machine, and it doesn't really make sense to copy the whole thing into memory if you only need 16 bytes of it.  It could even be dangerous -- if somebody tries to rely on the cache for some other bit of data and we're not really guaranteeing that it's fresh enough for that.

I'd suggest defining a new structure HashMetaDataCache with members hmc_maxbucket, hmc_lowmask, and hmc_highmask.  The structure can have a comment explaining that we only care about having the data be fresh enough to test whether the bucket mapping we computed for a tuple is still correct, and that for that to be the case we only need to know whether a bucket has suffered a new split since we last refreshed the cache.
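Something along these lines, perhaps (a sketch of the suggestion above; the name and layout are only illustrative):

    /*
     * Backend-private cache of the metapage fields needed to check
     * whether a previously computed bucket mapping is still correct,
     * i.e. whether the bucket has suffered a split since the cache was
     * last refreshed.  Nothing else should rely on this cache.
     */
    typedef struct HashMetaDataCache
    {
        uint32  hmc_maxbucket;  /* ID of maximum bucket in use */
        uint32  hmc_highmask;   /* mask to modulo into entire table */
        uint32  hmc_lowmask;    /* mask to modulo into lower half of table */
    } HashMetaDataCache;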

The comments in this patch need some work, e.g.:

-
+       oopaque->hasho_prevblkno = maxbucket;

No comment?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] Cache Hash Index meta page.

From
Mithun Cy
Date:
Thanks Robert, I have tried to address all of the comments,
On Tue, Dec 6, 2016 at 2:20 AM, Robert Haas <robertmhaas@gmail.com> wrote:
+        bucket = _hash_hashkey2bucket(hashkey, metap->hashm_maxbucket,
                                       metap->hashm_highmask,
                                       metap->hashm_lowmask);

This hunk appears useless.

+        metap = (HashMetaPage)rel->rd_amcache;

Whitespace.

Fixed. 

+        /*  Cache the metapage data for next time*/

Whitespace.

Fixed. 
+        /* Check if this bucket is split after we have cached the metapage.

Whitespace.

Fixed. 

Shouldn't _hash_doinsert() be using the cache, too?

Yes, we have an opportunity there; I have added the same in the code. But one difference is that, at the end, we need to read the meta page at least once to split and/or write. The performance improvement might not be as much as in the read-only case.

I did some pgbench simple-update tests for the same, with the changes below.

-               "alter table pgbench_branches add primary key (bid)",
-               "alter table pgbench_tellers add primary key (tid)",
-               "alter table pgbench_accounts add primary key (aid)"
+               "create index pgbench_branches_bid on pgbench_branches using hash (bid)",
+               "create index pgbench_tellers_tid on pgbench_tellers using hash (tid)",
+               "create index pgbench_accounts_aid on pgbench_accounts using hash (aid)"

And removed all reference keys. But I see no improvement; I will do further benchmarking for the COPY command and report the same.

Clients    After Meta page cache     Base Code           %imp

  1             2276.151633          2304.253611     -1.2195696631
 32            36816.596513         36439.552652      1.0347104549
 64            50943.763133         51005.236973     -0.120524565
128            49156.980457         48458.275106      1.4418700407

The above results are the median of three runs, and each run is 30 minutes.

Postgres Server settings:
./postgres -c shared_buffers=8GB -N 200 -c min_wal_size=15GB -c max_wal_size=20GB -c checkpoint_timeout=900 -c maintenance_work_mem=1GB -c checkpoint_completion_target=0.9

pgbench settings:
scale_factor = 300 (so the database fits in shared_buffers)
Mode =  -M prepared -N (prepared simple-update).

Machine used:
power2, same as described above.


I think it's probably not a good idea to cache the entire metapage.  The only things that you are "really" trying to cache, I think, are hashm_maxbucket, hashm_lowmask, and hashm_highmask.  The entire HashMetaPageData structure is 696 bytes on my machine, and it doesn't really make sense to copy the whole thing into memory if you only need 16 bytes of it.  It could even be dangerous -- if somebody tries to rely on the cache for some other bit of data and we're not really guaranteeing that it's fresh enough for that.

I'd suggest defining a new structure HashMetaDataCache with members hmc_maxbucket, hmc_lowmask, and hmc_highmask.  The structure can have a comment explaining that we only care about having the data be fresh enough to test whether the bucket mapping we computed for a tuple is still correct, and that for that to be the case we only need to know whether a bucket has suffered a new split since we last refreshed the cache.

It is not only hashm_maxbucket, hashm_lowmask, and hashm_highmask (3 uint32s); we also need

uint32      hashm_spares[HASH_MAX_SPLITPOINTS], for the bucket-number-to-block mapping in "BUCKET_TO_BLKNO(metap, bucket)".

Note: #define HASH_MAX_SPLITPOINTS 32, so it is (3*uint32 + 32*uint32) = 35*4 = 140 bytes.
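For reference, the macro in question (from src/include/access/hash.h as of this thread) is why the spares array has to come along:

    /*
     * Bucket B's primary page is block number B, offset by the overflow
     * and bitmap pages allocated before B's splitpoint (tracked in
     * hashm_spares[]), plus one for the metapage itself.
     */
    #define BUCKET_TO_BLKNO(metap,B) \
        ((BlockNumber) ((B) + ((B) ? (metap)->hashm_spares[_hash_log2((B)+1)-1] : 0)) + 1)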


The comments in this patch need some work, e.g.:

-
+       oopaque->hasho_prevblkno = maxbucket;

No comment?


I have tried to improve the comments in the new patch.

Apart from this, there seems to be a bug in the base code in _hash_doinsert():

+    * XXX this is useless code if we are only storing hash keys.
+    */
+    if (itemsz > HashMaxItemSize((Page) metap))
+        ereport(ERROR,
+                (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+                 errmsg("index row size %zu exceeds hash maximum %zu",
+                        itemsz, HashMaxItemSize((Page) metap)),
+           errhint("Values larger than a buffer page cannot be indexed.")));

"metap" (HashMetaPage) and Page are different data structures whose members are not in sync, so we should not blindly typecast as above. I think we should remove this part of the code, as we only store hash keys. So I have removed the same but kept the assert below it as it is.

Also, there was a bug in the previous patch: I was not releasing the bucket page lock if the cached metadata was old. The same is now fixed.

--
Thanks and Regards
Mithun C Y

Attachments

Re: [HACKERS] Cache Hash Index meta page.

From
Robert Haas
Date:
On Fri, Dec 16, 2016 at 5:16 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
Shouldn't _hash_doinsert() be using the cache, too?

Yes, we have an opportunity there; I have added the same in the code. But one difference is that, at the end, we need to read the meta page at least once to split and/or write. The performance improvement might not be as much as in the read-only case.

Why would you need to read it at least once in one case but not the other?
 

I think it's probably not a good idea to cache the entire metapage.  The only things that you are "really" trying to cache, I think, are hashm_maxbucket, hashm_lowmask, and hashm_highmask.  The entire HashMetaPageData structure is 696 bytes on my machine, and it doesn't really make sense to copy the whole thing into memory if you only need 16 bytes of it.  It could even be dangerous -- if somebody tries to rely on the cache for some other bit of data and we're not really guaranteeing that it's fresh enough for that.

I'd suggest defining a new structure HashMetaDataCache with members hmc_maxbucket, hmc_lowmask, and hmc_highmask.  The structure can have a comment explaining that we only care about having the data be fresh enough to test whether the bucket mapping we computed for a tuple is still correct, and that for that to be the case we only need to know whether a bucket has suffered a new split since we last refreshed the cache.

It is not only hashm_maxbucket, hashm_lowmask, and hashm_highmask (3 uint32s); we also need

uint32      hashm_spares[HASH_MAX_SPLITPOINTS], for the bucket-number-to-block mapping in "BUCKET_TO_BLKNO(metap, bucket)".

Note: #define HASH_MAX_SPLITPOINTS 32, so it is (3*uint32 + 32*uint32) = 35*4 = 140 bytes.

Well, I guess that makes it more appealing to cache the whole page at least in terms of raw number of bytes, but I suppose my real complaint here is that there don't seem to be any clear rules for where, whether, and to what extent we can rely on the cache to be valid.  Without that, I think this patch is creating an extremely serious maintenance hazard for anyone who wants to try to modify this code in the future.  A future patch author needs to be able to tell what data they can use and what data they can't use, and why.

Apart from this, there seems to be a bug in the base code in _hash_doinsert():

+    * XXX this is useless code if we are only storing hash keys.
+    */
+    if (itemsz > HashMaxItemSize((Page) metap))
+        ereport(ERROR,
+                (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+                 errmsg("index row size %zu exceeds hash maximum %zu",
+                        itemsz, HashMaxItemSize((Page) metap)),
+           errhint("Values larger than a buffer page cannot be indexed.")));

"metap" (HashMetaPage) and Page are different data structures whose members are not in sync, so we should not blindly typecast as above. I think we should remove this part of the code, as we only store hash keys. So I have removed the same but kept the assert below it as it is.

Any existing bugs should be the subject of a separate patch.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] Cache Hash Index meta page.

From
Mithun Cy
Date:
On Tue, Dec 20, 2016 at 3:51 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Dec 16, 2016 at 5:16 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
Shouldn't _hash_doinsert() be using the cache, too?

Yes, we have an opportunity there; I have added the same in the code. But one difference is that, at the end, we need to read the meta page at least once to split and/or write. The performance improvement might not be as much as in the read-only case.

Why would you need to read it at least once in one case but not the other?

For read-only: in _hash_first, if the target bucket has not been split after caching the meta page contents, we never need to read the metapage. But for insert: in _hash_doinsert, at the end, we modify the meta page to store the number of tuples.

+ /*
+  * Write-lock the metapage so we can increment the tuple count. After
+  * incrementing it, check to see if it's time for a split.
+  */
+ _hash_chgbufaccess(rel, metabuf, HASH_NOLOCK, HASH_WRITE);
+
+ metap->hashm_ntuples += 1;


Well, I guess that makes it more appealing to cache the whole page at least in terms of raw number of bytes, but I suppose my real complaint here is that there don't seem to be any clear rules for where, whether, and to what extent we can rely on the cache to be valid.

-- Okay, I will revert back to caching the entire meta page.
 
Without that, I think this patch is creating an extremely serious maintenance hazard for anyone who wants to try to modify this code in the future.  A future patch author needs to be able to tell what data they can use and what data they can't use, and why.


-- I think, if it is okay, I can document for each member of HashMetaPageData whether to read it from the cached meta page or directly from the current meta page. Below I have briefly commented on each member. If you suggest I go with that approach, I will produce a neat patch for the same.

typedef struct HashMetaPageData
{

1. uint32 hashm_magic; /* magic no. for hash tables */
-- This should remain the same, so it can be read from the cached meta page.

2. uint32 hashm_version; /* version ID */
-- Initialized once and never changed afterwards, so it can be read from the cached meta page.

3. double hashm_ntuples; /* number of tuples stored in the table */
-- This changes on every insert, so it should not be read from the cached data.

4. uint16 hashm_ffactor; /* target fill factor (tuples/bucket) */
-- Initialized once and never changed afterwards, so it can be read from the cached meta page.

5. uint16 hashm_bsize; /* index page size (bytes) */
-- Initialized once and never changed afterwards, so it can be read from the cached meta page.

6. uint16 hashm_bmsize; /* bitmap array size (bytes) - must be a power of 2 */
-- Initialized once and never changed afterwards, so it can be read from the cached meta page.

7. uint16 hashm_bmshift; /* log2(bitmap array size in BITS) */
-- Initialized once and never changed afterwards, so it can be read from the cached meta page.

8. uint32 hashm_maxbucket; /* ID of maximum bucket in use */
   uint32 hashm_highmask; /* mask to modulo into entire table */
   uint32 hashm_lowmask; /* mask to modulo into lower half of table */
-- If hashm_maxbucket, hashm_highmask, and hashm_lowmask are all read and cached at the same time while the metapage is locked, the key-to-bucket-number mapping function _hash_hashkey2bucket() will always produce the same output (see the sketch after this list). If a bucket is split after these elements are cached (which we can detect, because old bucket pages never move once allocated and we mark each bucket page's hasho_prevblkno with the incremented hashm_maxbucket), we can invalidate them and re-read the same from the meta page. If the intention is not to save a metapage read while mapping a key to its bucket page, do not read them from the cached meta page.

9. uint32 hashm_ovflpoint; /* splitpoint from which ovflpgs being allocated */
-- Since this is used for allocating overflow pages, always get the latest value directly from the meta page.

10. uint32 hashm_firstfree; /* lowest-number free ovflpage (bit#) */
-- Should always be read from the metapage directly.

11. uint32 hashm_nmaps; /* number of bitmap pages */
-- Should always be read from the metapage directly.

12. RegProcedure hashm_procid; /* hash procedure id from pg_proc */
-- Never used so far; when we do use it, we need to decide its policy.

13. uint32 hashm_spares[HASH_MAX_SPLITPOINTS]; /* spare pages before each splitpoint */
-- Spares before a given splitpoint never change, and bucket pages never move. So, combined with hashm_maxbucket, hashm_highmask, and hashm_lowmask (all cached at the same time under lock), BUCKET_TO_BLKNO will always produce the same output, pointing to the right physical block. Should only be used to save a meta page read when mapping a key to its bucket block, as said above.

14. BlockNumber hashm_mapp[HASH_MAX_BITMAPS]; /* blknos of ovfl bitmaps */
-- Always read from the metapage directly.

} HashMetaPageData;
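The mapping function referenced in point 8, for reference (this matches _hash_hashkey2bucket in hashutil.c, with explanatory comments added here):

    Bucket
    _hash_hashkey2bucket(uint32 hashkey, uint32 maxbucket,
                         uint32 highmask, uint32 lowmask)
    {
        Bucket      bucket;

        /* take the hash modulo the table size rounded up to a power of 2 */
        bucket = hashkey & highmask;

        /*
         * If that bucket has not been created yet, the tuple lives in the
         * lower-half bucket it will eventually be split out of.
         */
        if (bucket > maxbucket)
            bucket = bucket & lowmask;

        return bucket;
    }

As long as (maxbucket, highmask, lowmask) are a consistent snapshot taken under the metapage lock, this returns the bucket the key mapped to at snapshot time; a later split is detectable from the maxbucket value stored in the bucket page itself.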


 
Any existing bugs should be the subject of a separate patch.


Sorry, I will make a new patch for the same separately.
 
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



--
Thanks and Regards
Mithun C Y

Re: [HACKERS] Cache Hash Index meta page.

From
Robert Haas
Date:
On Tue, Dec 20, 2016 at 2:25 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> -- I think, if it is okay, I can document for each member of
> HashMetaPageData whether to read it from the cached meta page or directly
> from the current meta page. Below I have briefly commented on each member.
> If you suggest I go with that approach, I will produce a neat patch for
> the same.

Plain text emails are preferred on this list.

I don't have any confidence in this approach.  I'm not sure exactly
what needs to be changed here, but what you're doing right now is just
too error-prone.  There's a cached metapage available, and you've got
code accessing it directly, and that's OK except when it's not, and maybe
we can add some comments to explain, but I don't think that's going to
be good enough to really make it clear and maintainable.  We need some
kind of more substantive safeguard to prevent the cached metapage data
from being used in unsafe ways -- and while we're at it, we should try
to use it in as many of the places where it *is* safe as possible.  My
suggestion for a separate structure was one idea; another might be
providing some kind of API that's always used to access the metapage
cache.  Or maybe there's a third option.  But this, the way it is
right now, is just too ad-hoc, even with more comments.  IMHO, anyway.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Cache Hash Index meta page.

From
Amit Kapila
Date:
On Wed, Dec 21, 2016 at 9:26 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Dec 20, 2016 at 2:25 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
>> -- I think, if it is okay, I can document for each member of
>> HashMetaPageData whether to read it from the cached meta page or directly
>> from the current meta page. Below I have briefly commented on each member.
>> If you suggest I go with that approach, I will produce a neat patch for
>> the same.
>
> Plain text emails are preferred on this list.
>
> I don't have any confidence in this approach.  I'm not sure exactly
> what needs to be changed here, but what you're doing right now is just
> too error-prone.  There's a cached metapage available, and you've got
> code accessing directly, and that's OK except when it's not, and maybe
> we can add some comments to explain, but I don't think that's going to
> be good enough to really make it clear and maintainable.  We need some
> kind of more substantive safeguard to prevent the cached metapage data
> from being used in unsafe ways -- and while we're at it, we should try
> to use it in as many of the places where it *is* safe as possible.  My
> suggestion for a separate structure was one idea; another might be
> providing some kind of API that's always used to access the metapage
> cache.  Or maybe there's a third option.
>

This metapage cache can be validated only when we have a bucket in
which we have stored the maxbucket value.  I think what we can do to
localize the use of the metapage cache is to write a new API which will
return a bucket page locked in the specified mode based on the hashkey.
Something like Buffer _hash_get_buc_buffer_from_hashkey(hashkey,
lockmode).  I think this will make metapage cache access somewhat
similar to what we have in btree where we use the cache to access the
root page.  Will something like that address your concern?



--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Cache Hash Index meta page.

From
Mithun Cy
Date:
On Thu, Dec 22, 2016 at 12:17 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Dec 21, 2016 at 9:26 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Tue, Dec 20, 2016 at 2:25 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
>>> -- I think, if it is okay, I can document for each member of
>>> HashMetaPageData whether to read it from the cached meta page or directly
>>> from the current meta page. Below I have briefly commented on each member.
>>> If you suggest I go with that approach, I will produce a neat patch for
>>> the same.
>>
>> Plain text emails are preferred on this list.

Sorry, I have set the mail to plain text mode now.

> I think this will make metapage cache access somewhat
> similar to what we have in btree where we use the cache to access the
> root page.  Will something like that address your concern?

Thanks, just like _bt_getroot I am introducing a new function
_hash_getbucketbuf_from_hashkey() which will give the target bucket
buffer for the given hashkey. This will use the cached metapage
contents instead of reading the meta page buffer if the cached data is
valid. There are 2 places which can use this service: 1. _hash_first
and 2. _hash_doinsert.
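A condensed sketch of how such a function can be structured (the helper name and details here are illustrative; the attached patch is authoritative):

    /* illustrative helper: (re)fill the backend-local metapage cache */
    static HashMetaPage
    hash_fill_metap_cache(Relation rel)
    {
        Buffer metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ,
                                      LH_META_PAGE);

        if (rel->rd_amcache == NULL)
            rel->rd_amcache = MemoryContextAlloc(rel->rd_indexcxt,
                                                 sizeof(HashMetaPageData));
        memcpy(rel->rd_amcache, HashPageGetMeta(BufferGetPage(metabuf)),
               sizeof(HashMetaPageData));
        _hash_relbuf(rel, metabuf);

        return (HashMetaPage) rel->rd_amcache;
    }

    Buffer
    _hash_getbucketbuf_from_hashkey(Relation rel, uint32 hashkey, int access)
    {
        HashMetaPage metap = (HashMetaPage) rel->rd_amcache;
        Buffer       buf;

        if (metap == NULL)
            metap = hash_fill_metap_cache(rel);

        for (;;)
        {
            Bucket          bucket;
            BlockNumber     blkno;
            HashPageOpaque  opaque;

            /* map the key to a bucket, and the bucket to a block,
             * using only the cached metapage fields */
            bucket = _hash_hashkey2bucket(hashkey, metap->hashm_maxbucket,
                                          metap->hashm_highmask,
                                          metap->hashm_lowmask);
            blkno = BUCKET_TO_BLKNO(metap, bucket);

            buf = _hash_getbuf(rel, blkno, access, LH_BUCKET_PAGE);
            opaque = (HashPageOpaque)
                PageGetSpecialPointer(BufferGetPage(buf));

            /*
             * Bucket not split since we cached the metapage? Done.  (The
             * pg_upgrade case, where hasho_prevblkno is InvalidBlockNumber,
             * needs the extra check discussed earlier in the thread.)
             */
            if (opaque->hasho_prevblkno <= metap->hashm_maxbucket)
                break;

            /* stale cache: release the bucket page, refresh, retry */
            _hash_relbuf(rel, buf);
            metap = hash_fill_metap_cache(rel);
        }

        return buf;
    }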

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachments

Re: [HACKERS] Cache Hash Index meta page.

From
Mithun Cy
Date:
On Tue, Dec 27, 2016 at 1:36 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
Oops, the patch number should be 08; re-attaching the same after renaming.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachments

Re: [HACKERS] Cache Hash Index meta page.

From
Amit Kapila
Date:
On Tue, Dec 27, 2016 at 1:36 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> On Thu, Dec 22, 2016 at 12:17 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Wed, Dec 21, 2016 at 9:26 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Tue, Dec 20, 2016 at 2:25 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
>>>> -- I think, if it is okay, I can document for each member of
>>>> HashMetaPageData whether to read it from the cached meta page or directly
>>>> from the current meta page. Below I have briefly commented on each member.
>>>> If you suggest I go with that approach, I will produce a neat patch for
>>>> the same.
>>>
>>> Plain text emails are preferred on this list.
>
> Sorry, I have set the mail to plain text mode now.
>
>> I think this will make metapage cache access somewhat
>> similar to what we have in btree where we use the cache to access the
>> root page.  Will something like that address your concern?
>
> Thanks, just like _bt_getroot I am introducing a new function
> _hash_getbucketbuf_from_hashkey() which will give the target bucket
> buffer for the given hashkey. This will use the cached metapage
> contents instead of reading the meta page buffer if the cached data is
> valid. There are 2 places which can use this service: 1. _hash_first
> and 2. _hash_doinsert.
>

This version of the patch looks much better than the previous version.
I have few review comments:

1.
+ /*
+  * pin so we can use the metabuf while writing into to it below.
+  */
+ metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_NOLOCK, LH_META_PAGE);

The usage "into to .." in the above comment seems to be wrong.

2.
- pageopaque->hasho_prevblkno = InvalidBlockNumber;
+
+ /*
+  * Setting hasho_prevblkno of bucket page with latest maxbucket number
+  * to indicate bucket has been initialized and need to reconstruct
+  * HashMetaCache if it is older.
+  */
+ pageopaque->hasho_prevblkno = metap->hashm_maxbucket;

In the above comment, a reference to HashMetaCache is confusing; are
you referring to any structure here?  If you change this, consider
changing the similar usage at other places in the patch as well.

Also, it is not clear what you mean by ".. to indicate bucket
has been initialized .."; assigning maxbucket as a special value to
prevblkno is not related to initializing a bucket page.

3.
typedef struct HashPageOpaqueData
{
+ /*
+  * If this is an ovfl page this stores previous ovfl (or bucket) blkno.
+  * Else if this is a bucket page we use this for a special purpose. We
+  * store hashm_maxbucket value, whenever this page is initialized or
+  * split. So this helps us to know whether the bucket has been split after
+  * caching the some of the meta page data. See _hash_doinsert(),
+  * _hash_first() to know how to use same.
+  */

In above comment, where you are saying ".. caching the some of the
meta page data .." is slightly confusing, because it appears to me
that you are caching whole of the metapage not some part of it.

4.
+_hash_getbucketbuf_from_hashkey(uint32 hashkey, Relation rel,

Generally, for _hash_* API's, we use rel as the first parameter, so I
think it is better to maintain the same with this API as well.

5. _hash_finish_split(rel, metabuf, buf, pageopaque->hasho_bucket,
-   maxbucket, highmask, lowmask);
+   usedmetap->hashm_maxbucket,
+   usedmetap->hashm_highmask,
+   usedmetap->hashm_lowmask);

I think we should add an Assert for the validity of usedmetap before using it.

6. Getting few compilation errors in assert-enabled build.

1>src/backend/access/hash/hashinsert.c(85): error C2065: 'bucket' :
undeclared identifier
1>src/backend/access/hash/hashinsert.c(155): error C2065: 'bucket' :
undeclared identifier

1>src/backend/access/hash/hashsearch.c(223): warning C4101: 'blkno' :
unreferenced local variable
1>  hashutil.c
1>\src\backend\access\hash\hashsearch.c(284): warning C4700:
uninitialized local variable 'bucket' used

7.
+ /*
+ * Check if this bucket is split after we have cached the hash meta
+ * data. To do this we need to check whether cached maxbucket number
+ * is less than or equal to maxbucket number stored in bucket page,
+ * which was set with that time's maxbucket number during bucket page
+ * splits. In case of upgrade hasho_prevblkno of old bucket page will
+ * be set with InvalidBlockNumber. And as of now maximum value the
+ * hashm_maxbucket can take is 1 less than InvalidBlockNumber (see
+ * _hash_expandtable). So an explicit check for InvalidBlockNumber in
+ * hasho_prevblkno will tell whether current bucket has been split
+ * after caching hash meta data.
+ */

I can understand what you want to say in the above comment, but I think
you can write it in a somewhat shorter form.  There is no need to
explain the exact check.

8.
@@ -152,6 +152,11 @@ _hash_readprev(IndexScanDesc scan,
  _hash_relbuf(rel, *bufp);
  *bufp = InvalidBuffer;
+
+ /* If it is a bucket page there will not be a prevblkno. */
+ if ((*opaquep)->hasho_flag & LH_BUCKET_PAGE)
+ return;
+

I don't think the above check is right.  Even if it is a bucket page, we
might need to scan a bucket being populated; refer to the check else if
(so->hashso_buc_populated && so->hashso_buc_split).  Apart from that,
you can't access a bucket page after releasing the lock on it.  Why have
you added such a check?



--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Cache Hash Index meta page.

From
Mithun Cy
Date:
Thanks, Amit, for the detailed review and for pointing out various issues
in the patch. I have tried to fix all of your comments as below.

On Mon, Jan 2, 2017 at 11:29 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

> 1.
> usage "into to .." in above comment seems to be wrong.usage "into to .." in above comment seems to be wrong.usage
"intoto .." in above comment seems to be wrong.usage "into to .." in above comment seems to be wrong. 

-- Fixed.

> 2.
> In the above comment, a reference to HashMetaCache is confusing; are
> you referring to any structure here?  If you change this, consider
> changing the similar usage at other places in the patch as well.

-- Fixed. Removed HashMetaCache everywhere in the code; wherever
needed, added HashMetaPageData.

> Also, it is not clear what you mean by ".. to indicate bucket
> has been initialized .."; assigning maxbucket as a special value to
> prevblkno is not related to initializing a bucket page.

-- To be consistent with our definition of prevblkno, I am setting
prevblkno with current hashm_maxbucket when we initialize the bucket
page. I have tried to correct the comments accordingly.

> 3.
> In above comment, where you are saying ".. caching the some of the
> meta page data .." is slightly confusing, because it appears to me
> that you are caching whole of the metapage not some part of it.

-- Fixed. Changed to caching the HashMetaPageData.

> 4.
> +_hash_getbucketbuf_from_hashkey(uint32 hashkey, Relation rel,
>
> Generally, for _hash_* API's, we use rel as the first parameter, so I
> think it is better to maintain the same with this API as well.

-- Fixed.

> 5.
>   _hash_finish_split(rel, metabuf, buf, pageopaque->hasho_bucket,
> -   maxbucket, highmask, lowmask);
> +   usedmetap->hashm_maxbucket,
> +   usedmetap->hashm_highmask,
> +   usedmetap->hashm_lowmask);

> I think we should add an Assert for the validity of usedmetap before using it.

-- Fixed. Added Assert(usedmetap != NULL);

> 6. Getting few compilation errors in assert-enabled build.
>

-- Fixed. Sorry, I missed handling the bucket number, which is needed in
the code below. I have fixed the same now.

> 7.
> I can understand what you want to say in above comment, but I think
> you can write it in somewhat shorter form.  There is no need to
> explain the exact check.

-- Fixed. I have tried to compress it into a few lines.

> 8.
> @@ -152,6 +152,11 @@ _hash_readprev(IndexScanDesc scan,
>   _hash_relbuf(rel, *bufp);
>
>   *bufp = InvalidBuffer;
> +
> + /* If it is a bucket page there will not be a prevblkno. */
> + if ((*opaquep)->hasho_flag & LH_BUCKET_PAGE)
> + return;
> +
>
> I don't think above check is right.  Even if it is a bucket page, we
> might need to scan bucket being populated, refer check else if
> (so->hashso_buc_populated && so->hashso_buc_split).  Apart from that,
> you can't access bucket page after releasing the lock on it.  Why have
> you added such a check?
>

-- Fixed. That was a mistake; now I have fixed it. Actually, if a bucket
page is passed to _hash_readprev then there will not be a prevblkno.
But from this patch onwards, the prevblkno of a bucket page will store
hashm_maxbucket, so a check of BlockNumberIsValid(blkno) will not be
valid anymore. I have fixed it by adding the below:
+ /* If it is a bucket page there will not be a prevblkno. */
+ if (!((*opaquep)->hasho_flag & LH_BUCKET_PAGE))
  {
+ Assert(BlockNumberIsValid(blkno));

There are 2 other places that do the same: _hash_freeovflpage and
_hash_squeezebucket. But those will only be called for overflow pages,
so I did not make changes there. However, I think we should also change
them to make it consistent.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachments

Re: [HACKERS] Cache Hash Index meta page.

From
Mithun Cy
Date:
On Tue, Jan 3, 2017 at 12:05 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

As part of the performance/space analysis of hash indexes on the varchar
data type with this patch, I have run some tests for the same with a
modified pgbench. The modifications include:
1. Changed the aid column of the pgbench_accounts table from int to varchar(x).
2. To generate unique values as before, inserted the stringified integer
repeatedly to fill x.

I have mainly tested varchar(30) and varchar(90).
The document below has the detailed report on our captured data. The new
hash indexes have some ~25% improved performance over btree and, as
expected, are very space efficient.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachments

Re: [HACKERS] Cache Hash Index meta page.

From
Mithun Cy
Date:
On Wed, Jan 4, 2017 at 4:19 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> As part of the performance/space analysis of hash indexes on the varchar
> data type with this patch, I have run some tests for the same with a
> modified pgbench.
 
I forgot to mention that all these tests were run on the power2 machine,
which has 192 hyperthreads.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Cache Hash Index meta page.

From
Mithun Cy
Date:
On Wed, Jan 4, 2017 at 5:21 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
I have re-based the patch to fix one compilation warning in
_hash_doinsert, where the variable bucket was used only for an Assert
but was not marked as such.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachments

Re: [HACKERS] Cache Hash Index meta page.

From
Amit Kapila
Date:
On Thu, Jan 5, 2017 at 11:38 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> On Wed, Jan 4, 2017 at 5:21 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> I have re-based the patch to fix one compilation warning in
> _hash_doinsert, where the variable bucket was used only for an Assert
> but was not marked as such.
>

A few more comments:

1.
 }
+
+
+/*
+ * _hash_getbucketbuf_from_hashkey() -- Get the bucket's buffer for the given

No need for two extra lines; one is sufficient and matches the nearby
coding pattern.

2.
+ * old bucket in case of bucket split, see @_hash_doinsert().

Do you see anywhere else in the code the pattern of using @ symbol in
comments before function name?  In general, there is no harm in using
it, but maybe it is better to be consistent with usage at other
places.

3.
+ /*
+ * @_hash_getbucketbuf_from_hashkey we have verified the hasho_bucket.
+ * Should be safe to use further.
+ */
+ bucket = pageopaque->hasho_bucket;
 /*
  * If this bucket is in the process of being split, try to finish the
@@ -152,7 +107,9 @@ restart_insert:
  LockBuffer(buf, BUFFER_LOCK_UNLOCK);
 _hash_finish_split(rel, metabuf, buf, pageopaque->hasho_bucket,
-   maxbucket, highmask, lowmask);
+   usedmetap->hashm_maxbucket,

After this change, I think you can directly use bucket in
_hash_finish_split instead of pageopaque->hasho_bucket.

4.
@@ -154,8 +154,11 @@ _hash_readprev(IndexScanDesc scan,
  *bufp = InvalidBuffer;
  /* check for interrupts while we're not holding any buffer lock */
  CHECK_FOR_INTERRUPTS();
 
- if (BlockNumberIsValid(blkno))
+
+ /* If it is a bucket page there will not be a prevblkno. */
+ if (!((*opaquep)->hasho_flag & LH_BUCKET_PAGE))

Won't a check similar to the existing one (if (*bufp ==
so->hashso_bucket_buf || *bufp == so->hashso_split_bucket_buf)) just
above this code suffice?  If so, then you can check it
once and use it in both places.

5. The reader and insertion algorithms need to be updated in the README.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Cache Hash Index meta page.

From
Robert Haas
Date:
On Tue, Dec 27, 2016 at 3:06 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> Thanks, just like _bt_getroot I am introducing a new function
> _hash_getbucketbuf_from_hashkey() which will give the target bucket
> buffer for the given hashkey. This will use the cached metapage
> contents instead of reading meta page buffer if cached data is valid.
> There are 2 places which can use this service 1. _hash_first and 2.
> _hash_doinsert.

Can we adapt the ad-hoc caching logic in hashbulkdelete() to work with
this new logic?  Or at least update the comments?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Cache Hash Index meta page.

From
Mithun Cy
Date:
On Fri, Jan 6, 2017 at 11:43 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> Few more comments:
> 1.
> No need for two extra lines; one is sufficient and matches the nearby
> coding pattern.

-- Fixed.

> 2.
> Do you see anywhere else in the code the pattern of using @ symbol in
> comments before function name?

-- Fixed.

> 3.
>
> after this change, i think you can directly use bucket in
> _hash_finish_split instead of pageopaque->hasho_bucket

-- Fixed.

> 4.
>
> Won't the check similar to the existing check (if (*bufp ==
> so->hashso_bucket_buf || *bufp == so->hashso_split_bucket_buf)) just
> above this code will suffice the need?  If so, then you can check it
> once and use it in both places.
>

-- Fixed.

> 5. The reader and insertion algorithm needs to be updated in README.

-- Added info in README.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachments

Re: [HACKERS] Cache Hash Index meta page.

From
Mithun Cy
Date:
On Wed, Jan 11, 2017 at 12:46 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> Can we adapt the ad-hoc caching logic in hashbulkdelete() to work with
> this new logic?  Or at least update the comments? 
I have introduced a new function _hash_getcachedmetap in patch 11 [1]; with this, hashbulkdelete() can use the metapage cache instead of saving it locally.

[1] cache_hash_index_meta_page_11.patch
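For context, the call shape in hashbulkdelete() would then be roughly as in the diff quoted in the next reply (shown here as a hedged illustration, not the exact patch text):

    /* refresh (or build) the backend-local metapage cache and use it */
    cachedmetap = _hash_getcachedmetap(rel, true);
    orig_maxbucket = cachedmetap->hashm_maxbucket;
    orig_ntuples = cachedmetap->hashm_ntuples;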
--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Cache Hash Index meta page.

From
Amit Kapila
Date:
On Fri, Jan 13, 2017 at 9:58 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> On Fri, Jan 6, 2017 at 11:43 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Below are review comments on the latest version of the patch.

1.
 /*
- * Read the metapage to fetch original bucket and tuple counts.  Also, we
- * keep a copy of the last-seen metapage so that we can use its
- * hashm_spares[] values to compute bucket page addresses.  This is a bit
- * hokey but perfectly safe, since the interesting entries in the spares
- * array cannot change under us; and it beats rereading the metapage for
- * each bucket.
+ * update and get the metapage cache data. */
- metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
- metap = HashPageGetMeta(BufferGetPage(metabuf));
- orig_maxbucket = metap->hashm_maxbucket;
- orig_ntuples = metap->hashm_ntuples;
- memcpy(&local_metapage, metap, sizeof(local_metapage));
- /* release the lock, but keep pin */
- LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
+ cachedmetap = _hash_getcachedmetap(rel, true);
+ orig_maxbucket = cachedmetap->hashm_maxbucket;
+ orig_ntuples = cachedmetap->hashm_ntuples;

(a) I think you can retain the previous comment or modify it slightly.
Just removing the whole comment and replacing it with a single line
seems like a step backward.
(b) Another somewhat bigger problem is that with this new change it
won't retain the pin on the meta page till the end, which means we might
need to perform I/O again during the operation to fetch the meta page.
AFAICS, you have just changed it so that you can call the new API
_hash_getcachedmetap; if that's true, then I think you have to find
some other way of doing it.  BTW, why can't you design your new API
such that it always takes a pinned metapage?  You can always release the
pin in the caller if required.  I understand that you don't always
need the metapage in that API, but I think the current usage of that API
is also not that good.


2.
+ if (bucket_opaque->hasho_prevblkno != InvalidBlockNumber ||
+ bucket_opaque->hasho_prevblkno > cachedmetap->hashm_maxbucket)
+ cachedmetap = _hash_getcachedmetap(rel, true);

I don't understand the meaning of the above if check.  It seems like you
will update the metapage cache whenever the previous block number is not
a valid block number, which will be true until the first split.  How will
you ensure that there was a re-split and the cached metapage is no longer
relevant?  I think if there were && in the above condition, then we could
ensure it.

3.
+ Given a hashkey get the target bucket page with read lock, using cached
+ metapage. The getbucketbuf_from_hashkey method below explains the same.
+

All the sentences in the algorithm start with lowercase letters, so why
do you need an exception for this sentence?  I think you don't need to
add an empty line.  Also, I think the usage of
getbucketbuf_from_hashkey seems out of place.  How about writing it
as:

The usage of the cached metapage is explained later.


4.
+ If target bucket is split before metapage data was cached then we are
+ done.
+ Else first release the bucket page and then update the metapage cache
+ with latest metapage data.

I think it is better to retain the original text of the README and add
the part about the metapage update.

5.
+ Loop:
..
..
+ Loop again to reach the new target bucket.

No need to write "Loop again ..."; that is implicit.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Cache Hash Index meta page.

From
Mithun Cy
Date:
On Tue, Jan 17, 2017 at 10:07 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> 1.
> (a) I think you can retain the previous comment or modify it slightly.
> Just removing the whole comment and replacing it with a single line
> seems like a step backward.

-- Fixed; just modified the comment slightly.

> (b) Another somewhat bigger problem is that with this new change it
> won't retain the pin on the meta page till the end, which means we might
> need to perform I/O again during the operation to fetch the meta page.
> AFAICS, you have just changed it so that you can call the new API
> _hash_getcachedmetap; if that's true, then I think you have to find
> some other way of doing it.  BTW, why can't you design your new API
> such that it always takes a pinned metapage?  You can always release the
> pin in the caller if required.  I understand that you don't always
> need the metapage in that API, but I think the current usage of that API
> is also not that good.


-- Yes, what you say is right.  I wanted to make the _hash_getcachedmetap
API self-sufficient, but two possible consecutive reads of the metapage
are always connected, since we pin the page in a buffer to avoid I/O.  I
have now redesigned the API so that the caller passes a pinned meta page
when we want to set the metapage cache; this gives us the flexibility to
use the cached meta page data in different places.
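
To illustrate, the intended calling pattern would look roughly like this
(a sketch only, not patch text; _hash_dropbuf releases the pin in the
caller):

    /* caller pins the meta page once and hands it to the cache API */
    metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_NOLOCK, LH_META_PAGE);
    cachedmetap = _hash_getcachedmetap(rel, metabuf);

    /* ... use cachedmetap->hashm_maxbucket, hashm_spares[], etc. ... */

    _hash_dropbuf(rel, metabuf);    /* caller releases the pin at the end */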


> 2.
> + if (bucket_opaque->hasho_prevblkno != InvalidBlockNumber ||
> + bucket_opaque->hasho_prevblkno > cachedmetap->hashm_maxbucket)
> + cachedmetap = _hash_getcachedmetap(rel, true);
>
> I don't understand the meaning of the above if check.  It seems like you
> will update the metapage cache whenever the previous block number is not
> a valid block number, which will be true until the first split.  How will
> you ensure that there was a re-split and the cached metapage is no longer
> relevant?  I think if there were && in the above condition, then we could
> ensure it.
>

-- Oops, that was a mistake; corrected as you stated.
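
For clarity, the corrected check now reads (a sketch using the names from
the quoted patch):

    /*
     * Refresh the cached metapage only if this bucket page carries a
     * valid split number and that split happened after we cached the
     * metapage.
     */
    if (bucket_opaque->hasho_prevblkno != InvalidBlockNumber &&
        bucket_opaque->hasho_prevblkno > cachedmetap->hashm_maxbucket)
        cachedmetap = _hash_getcachedmetap(rel, true);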

> 3.
> + Given a hashkey get the target bucket page with read lock, using cached
> + metapage. The getbucketbuf_from_hashkey method below explains the same.
> +
>
> All the sentences in the algorithm start with lowercase letters, so why
> do you need an exception for this sentence?  I think you don't need to
> add an empty line.  Also, I think the usage of
> getbucketbuf_from_hashkey seems out of place.  How about writing it
> as:
>
> The usage of the cached metapage is explained later.
>

-- Fixed as you asked.

>
> 4.
> + If target bucket is split before metapage data was cached then we are
> + done.
> + Else first release the bucket page and then update the metapage cache
> + with latest metapage data.
>
> I think it is better to retain the original text of the README and
> add the part about the metapage update.
>

-- Fixed.  Now, wherever it is meaningful, I have kept the original
wording.  But the way we get to the right target buffer after the latest
split is slightly different from what the original code did, so there is
a slight modification to show that we use the metapage cache.

> 5.
> + Loop:
> ..
> ..
> + Loop again to reach the new target bucket.
>
> No need to write "Loop again ..."; that is implicit.
>

-- Fixed as you asked.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Attachments

Re: [HACKERS] Cache Hash Index meta page.

From
Amit Kapila
Date:
On Wed, Jan 18, 2017 at 11:51 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> On Tue, Jan 17, 2017 at 10:07 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
>> (b) Another somewhat bigger problem is that with this new change it
>> won't retain the pin on the meta page till the end, which means we might
>> need to perform I/O again during the operation to fetch the meta page.
>> AFAICS, you have just changed it so that you can call the new API
>> _hash_getcachedmetap; if that's true, then I think you have to find
>> some other way of doing it.  BTW, why can't you design your new API
>> such that it always takes a pinned metapage?  You can always release the
>> pin in the caller if required.  I understand that you don't always
>> need the metapage in that API, but I think the current usage of that API
>> is also not that good.
>
>
> -- Yes, what you say is right.  I wanted to make the _hash_getcachedmetap
> API self-sufficient, but two possible consecutive reads of the metapage
> are always connected, since we pin the page in a buffer to avoid I/O.  I
> have now redesigned the API so that the caller passes a pinned meta page
> when we want to set the metapage cache; this gives us the flexibility to
> use the cached meta page data in different places.
>

1.
@@ -505,26 +505,22 @@ hashbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
..
..

+ metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_NOLOCK, LH_META_PAGE);
+ cachedmetap = _hash_getcachedmetap(rel, metabuf);


In the above flow, do we really need an updated metapage; can't we use
the cached one?  We are already taking care of bucket splits further
down in that function.

2.
+HashMetaPage
+_hash_getcachedmetap(Relation rel, Buffer metabuf)
+{
..
..
+ if (BufferIsInvalid(metabuf))
+ return (HashMetaPage) rel->rd_amcache;
..


+_hash_getbucketbuf_from_hashkey(Relation rel, uint32 hashkey, int access,
+ HashMetaPage *cachedmetap)
{
..

+ if (!(metap = _hash_getcachedmetap(rel, InvalidBuffer)))
+ {
+ metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_NOLOCK, LH_META_PAGE);
+ metap = _hash_getcachedmetap(rel, metabuf);
+ Assert(metap != NULL);
+ }
..
}

The above two chunks of code look worse compared to your previous
patch.  I think what we can do is keep the patch ready with both
versions of the _hash_getcachedmetap API (as you have in _v11 and _v12)
and let the committer take the final call.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Cache Hash Index meta page.

From
Mithun Cy
Date:
On Tue, Jan 24, 2017 at 3:10 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> 1.
> @@ -505,26 +505,22 @@ hashbulkdelete(IndexVacuumInfo *info,

> In the above flow, do we really need an updated metapage; can't we use
> the cached one?  We are already taking care of bucket splits further
> down in that function.

Yes, we can use the old cached metap entry; the only reason I decided
to use the latest metapage content is that the old code used to do
that.  Also, the cached metap is used to avoid ad-hoc local saving of
the same data, and hence unifies the cached metap API.  I did not
intend to save the metapage read here, which I thought would not be
very useful: if new buckets are added, we need to read the metapage at
the end anyway.  I have taken your comments; now I only read the metap
cache which is already set.

> 2.
> The above two chunks of code look worse compared to your previous
> patch.  I think what we can do is keep the patch ready with both
> versions of the _hash_getcachedmetap API (as you have in _v11 and _v12)
> and let the committer take the final call.

The _v11 API was a self-sufficient one, but it does not hold a pin on
the metapage buffer, whereas in _v12 we hold the pin across two
consecutive reads of the metapage.  I have taken your advice and am
producing two different patches.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Attachments

Re: [HACKERS] Cache Hash Index meta page.

From
Robert Haas
Date:
On Thu, Jan 26, 2017 at 1:48 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> The _v11 API was a self-sufficient one, but it does not hold a pin on
> the metapage buffer, whereas in _v12 we hold the pin across two
> consecutive reads of the metapage.  I have taken your advice and am
> producing two different patches.

Hmm.  I think both of these APIs have some advantages.  On the one
hand, passing metabuf sometimes allows you to avoid pin/unpin cycles -
e.g. in hashbulkdelete it makes a fair amount of sense to keep the
metabuf pinned once we've had to read it, just in case we need it
again.  On the other hand, it's surprising that passing a value for
the metabuf forces the cached to be refreshed.  I wonder if a good API
might be something like this:

HashMetaPage _hash_getcachedmetap(Relation rel, Buffer *metabuf, bool
force_refresh);

If the cache is initialized and force_refresh is not true, then this
just returns the cached data, and the metabuf argument isn't used.
Otherwise, if *metabuf == InvalidBuffer, we set *metabuf to point to
the metabuffer, pin and lock it, use it to set the cache, release the
lock, and return with the pin still held.  If *metabuf !=
InvalidBuffer, we assume it's pinned and return with it still pinned.
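
For concreteness, here is a minimal sketch of how that contract might be
realized (an illustration only, not committed code; allocation-failure
handling is omitted):

HashMetaPage
_hash_getcachedmetap(Relation rel, Buffer *metabuf, bool force_refresh)
{
    Page        page;

    Assert(metabuf);

    /* Fast path: cached data is valid and no refresh was requested. */
    if (rel->rd_amcache != NULL && !force_refresh)
        return (HashMetaPage) rel->rd_amcache;

    /* Pin the meta page unless the caller handed us a pinned buffer. */
    if (BufferIsInvalid(*metabuf))
        *metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_NOLOCK,
                                LH_META_PAGE);

    /* Lock just long enough to copy the meta page into backend memory. */
    LockBuffer(*metabuf, BUFFER_LOCK_SHARE);
    page = BufferGetPage(*metabuf);

    if (rel->rd_amcache == NULL)
        rel->rd_amcache = MemoryContextAlloc(rel->rd_indexcxt,
                                             sizeof(HashMetaPageData));
    memcpy(rel->rd_amcache, HashPageGetMeta(page),
           sizeof(HashMetaPageData));

    /* Release the lock, but return with the pin still held. */
    LockBuffer(*metabuf, BUFFER_LOCK_UNLOCK);

    return (HashMetaPage) rel->rd_amcache;
}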

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Cache Hash Index meta page.

From
Mithun Cy
Date:
> HashMetaPage _hash_getcachedmetap(Relation rel, Buffer *metabuf, bool
> force_refresh);
>
> If the cache is initialized and force_refresh is not true, then this
> just returns the cached data, and the metabuf argument isn't used.
> Otherwise, if *metabuf == InvalidBuffer, we set *metabuf to point to
> the metabuffer, pin and lock it, use it to set the cache, release the
> lock, and return with the pin still held.  If *metabuf !=
> InvalidBuffer, we assume it's pinned and return with it still pinned.

Thanks, Robert, I have made a new patch which tries to do the same.
Now I think the code looks less complicated.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Attachments

Re: [HACKERS] Cache Hash Index meta page.

From
Michael Paquier
Date:
On Sun, Jan 29, 2017 at 6:13 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
>> HashMetaPage _hash_getcachedmetap(Relation rel, Buffer *metabuf, bool
>> force_refresh);
>>
>> If the cache is initialized and force_refresh is not true, then this
>> just returns the cached data, and the metabuf argument isn't used.
>> Otherwise, if *metabuf == InvalidBuffer, we set *metabuf to point to
>> the metabuffer, pin and lock it, use it to set the cache, release the
>> lock, and return with the pin still held.  If *metabuf !=
>> InvalidBuffer, we assume it's pinned and return with it still pinned.
>
> Thanks, Robert, I have made a new patch which tries to do the same.
> Now I think the code looks less complicated.

Moved to CF 2017-03.
-- 
Michael



Re: [HACKERS] Cache Hash Index meta page.

From
Robert Haas
Date:
On Wed, Feb 1, 2017 at 12:23 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Sun, Jan 29, 2017 at 6:13 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
>>> HashMetaPage _hash_getcachedmetap(Relation rel, Buffer *metabuf, bool
>>> force_refresh);
>>>
>>> If the cache is initialized and force_refresh is not true, then this
>>> just returns the cached data, and the metabuf argument isn't used.
>>> Otherwise, if *metabuf == InvalidBuffer, we set *metabuf to point to
>>> the metabuffer, pin and lock it, use it to set the cache, release the
>>> lock, and return with the pin still held.  If *metabuf !=
>>> InvalidBuffer, we assume it's pinned and return with it still pinned.
>>
>> Thanks, Robert, I have made a new patch which tries to do the same.
>> Now I think the code looks less complicated.
>
> Moved to CF 2017-03.

Committed with some changes (which I noted in the commit message).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Cache Hash Index meta page.

From
Erik Rijkers
Date:
On 2017-02-07 18:41, Robert Haas wrote:
> Committed with some changes (which I noted in the commit message).

This has caused a warning with gcc 6.2.0:

hashpage.c: In function ‘_hash_getcachedmetap’:
hashpage.c:1245:20: warning: ‘cache’ may be used uninitialized in this function [-Wmaybe-uninitialized]
    rel->rd_amcache = cache;
    ~~~~~~~~~~~~~~~~^~~~~~~

which hopefully can be prevented...

thanks,

Erik Rijkers


Re: [HACKERS] Cache Hash Index meta page.

From
Mithun Cy
Date:
On Tue, Feb 7, 2017 at 11:21 PM, Erik Rijkers <er@xs4all.nl> wrote:
> On 2017-02-07 18:41, Robert Haas wrote:
>>
>> Committed with some changes (which I noted in the commit message).

Thanks, Robert and all who have reviewed the patch and given their
valuable comments.

> This has caused a warning with gcc 6.2.0:
>
> hashpage.c: In function ‘_hash_getcachedmetap’:
> hashpage.c:1245:20: warning: ‘cache’ may be used uninitialized in this
> function [-Wmaybe-uninitialized]
>     rel->rd_amcache = cache;
>     ~~~~~~~~~~~~~~~~^~~~~~~

Yes, I also see the warning.  I think the compiler is not able to see
that cache is always initialized and used only under the condition if
(rel->rd_amcache == NULL).
I think to make the compiler happy we can initialize cache to NULL
where it is defined.
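
The change would presumably be a one-liner in _hash_getcachedmetap
(a sketch in diff form):

-    char       *cache;
+    char       *cache = NULL;  /* keeps -Wmaybe-uninitialized quiet */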

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Attachments

Re: [HACKERS] Cache Hash Index meta page.

From
Robert Haas
Date:
On Tue, Feb 7, 2017 at 1:52 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> On Tue, Feb 7, 2017 at 11:21 PM, Erik Rijkers <er@xs4all.nl> wrote:
>> On 2017-02-07 18:41, Robert Haas wrote:
>>>
>>> Committed with some changes (which I noted in the commit message).
>
> Thanks, Robert and all who have reviewed the patch and given their
> valuable comments.
>
>> This has caused a warning with gcc 6.2.0:
>>
>> hashpage.c: In function ‘_hash_getcachedmetap’:
>> hashpage.c:1245:20: warning: ‘cache’ may be used uninitialized in this
>> function [-Wmaybe-uninitialized]
>>     rel->rd_amcache = cache;
>>     ~~~~~~~~~~~~~~~~^~~~~~~
>
> Yes, I also see the warning.  I think the compiler is not able to see
> that cache is always initialized and used only under the condition if
> (rel->rd_amcache == NULL).
> I think to make the compiler happy we can initialize cache to NULL
> where it is defined.

Thanks for the reports and patch.  Committed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company