Re: [HACKERS] Cache Hash Index meta page.

From: Mithun Cy
Subject: Re: [HACKERS] Cache Hash Index meta page.
Date:
Msg-id CAD__OuhWbWt=t-nXGMGdvdqqm7X8aNfoh=AA13huVtx80S8+9g@mail.gmail.com
In response to: Re: Cache Hash Index meta page.  (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: [HACKERS] Cache Hash Index meta page.  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
Thanks Robert, I have tried to address all of your comments.
On Tue, Dec 6, 2016 at 2:20 AM, Robert Haas <robertmhaas@gmail.com> wrote:
+        bucket = _hash_hashkey2bucket(hashkey, metap->hashm_maxbucket,
                                       metap->hashm_highmask,
                                       metap->hashm_lowmask);

This hunk appears useless.

+        metap = (HashMetaPage)rel->rd_amcache;

Whitespace.

Fixed. 

+        /*  Cache the metapage data for next time*/

Whitespace.

Fixed. 
+        /* Check if this bucket is split after we have cached the metapage.

Whitespace.

Fixed. 

Shouldn't _hash_doinsert() be using the cache, too?

Yes, there is an opportunity there, and I have added it in the code. One difference is that for inserts we still need to read the meta page at least once at the end in order to split and/or write, so the performance improvement may not be as large as in the read-only case.
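
To make the idea concrete, here is a rough C sketch (illustrative only, with hypothetical variable names; it is not the patch text) of how the cached fields can be used to pick the target bucket without reading the metapage buffer. Since the cached values may be stale, the bucket mapping still has to be re-verified after locking the bucket page, as discussed further below.

    /*
     * Sketch only: if a copy of the metapage has been cached in
     * rel->rd_amcache, compute the bucket and its block number from the
     * cached values instead of reading the metapage buffer.
     */
    Bucket      bucket;
    BlockNumber blkno;

    if (rel->rd_amcache != NULL)
    {
        HashMetaPage cachedmetap = (HashMetaPage) rel->rd_amcache;

        bucket = _hash_hashkey2bucket(hashkey,
                                      cachedmetap->hashm_maxbucket,
                                      cachedmetap->hashm_highmask,
                                      cachedmetap->hashm_lowmask);
        blkno = BUCKET_TO_BLKNO(cachedmetap, bucket);
    }
    else
    {
        /* No cache yet: read the metapage, compute the bucket, cache it. */
    }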

I ran some pgbench simple-update tests for this, with the changes below:

-               "alter table pgbench_branches add primary key (bid)",
-               "alter table pgbench_tellers add primary key (tid)",
-               "alter table pgbench_accounts add primary key (aid)"
+               "create index pgbench_branches_bid on pgbench_branches using hash (bid)",
+               "create index pgbench_tellers_tid on pgbench_tellers using hash (tid)",
+               "create index pgbench_accounts_aid on pgbench_accounts using hash (aid)"

I also removed all the reference keys. But I see no improvement; I will do further benchmarking with the COPY command and report the results.

Clients    After Meta page cache (TPS)    Base Code (TPS)    %imp
  1              2276.151633                2304.253611      -1.2195696631
 32             36816.596513               36439.552652       1.0347104549
 64             50943.763133               51005.236973      -0.120524565
128             49156.980457               48458.275106       1.4418700407

The above results are the median of three runs, each run lasting 30 minutes.

Postgres Server settings:
./postgres -c shared_buffers=8GB -N 200 -c min_wal_size=15GB -c max_wal_size=20GB -c checkpoint_timeout=900 -c maintenance_work_mem=1GB -c checkpoint_completion_target=0.9

pgbench settings:
scale_factor = 300 (so the database fits in shared_buffers)
Mode = -M prepared -N (prepared simple-update).

Machine used:
power2, same as described above.


I think it's probably not a good idea to cache the entire metapage.  The only things that you are "really" trying to cache, I think, are hashm_maxbucket, hashm_lowmask, and hashm_highmask.  The entire HashMetaPageData structure is 696 bytes on my machine, and it doesn't really make sense to copy the whole thing into memory if you only need 16 bytes of it.  It could even be dangerous -- if somebody tries to rely on the cache for some other bit of data, we're not really guaranteeing that it's fresh enough for that.

I'd suggest defining a new structure HashMetaDataCache with members hmc_maxbucket, hmc_lowmask, and hmc_highmask.  The structure can have a comment explaining that we only care about having the data be fresh enough to test whether the bucket mapping we computed for a tuple is still correct, and that for that to be the case we only need to know whether a bucket has suffered a new split since we last refreshed the cache.

It is not only hashm_maxbucket, hashm_lowmask, and hashm_highmask (3 uint32s); we also need

uint32      hashm_spares[HASH_MAX_SPLITPOINTS], for the bucket-number-to-block mapping in "BUCKET_TO_BLKNO(metap, bucket)".

Note: #define HASH_MAX_SPLITPOINTS 32, so the cache needs (3*uint32 + 32*uint32) = 35*4 = 140 bytes.
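
For illustration, a cache structure along the lines Robert suggested, extended with the spares array, could look roughly like this (a sketch only; the hmc_ field names follow the suggestion above and are not from the actual patch):

    typedef struct HashMetaDataCache
    {
        uint32      hmc_maxbucket;  /* ID of maximum bucket in use */
        uint32      hmc_highmask;   /* mask to modulo into entire table */
        uint32      hmc_lowmask;    /* mask to modulo into lower half of table */
        uint32      hmc_spares[HASH_MAX_SPLITPOINTS];  /* spare pages before
                                                        * each splitpoint, needed
                                                        * for the bucket-to-block
                                                        * mapping */
    } HashMetaDataCache;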


The comments in this patch need some work, e.g.:

-
+       oopaque->hasho_prevblkno = maxbucket;

No comment?


I have tried to improve the comments in the new patch.
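
For what it is worth, the point of storing maxbucket in hasho_prevblkno is that a later reader can cheaply detect a stale cache; roughly (a sketch with hypothetical variable names, not the patch text):

    /*
     * Each primary bucket page records, in hasho_prevblkno, the metapage's
     * hashm_maxbucket as of the last time the bucket was created or split.
     * After locking the bucket page, if that value is larger than the
     * maxbucket we cached, the bucket has been split since the cache was
     * filled and the cache must be refreshed.
     */
    HashPageOpaque pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);

    if (pageopaque->hasho_prevblkno > cachedmetap->hashm_maxbucket)
    {
        /* stale: release the bucket page lock, re-read the metapage,
         * refresh the cache, and retry the lookup */
    }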

Apart from this, there seems to be a pre-existing bug in _hash_doinsert():
+    * XXX this is useless code if we are only storing hash keys.
+    */
+    if (itemsz > HashMaxItemSize((Page) metap))
+        ereport(ERROR,
+                (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+                 errmsg("index row size %zu exceeds hash maximum %zu",
+                        itemsz, HashMaxItemSize((Page) metap)),
+           errhint("Values larger than a buffer page cannot be indexed.")));

 "metap" (HashMetaPage) and Page are different data structure their member types are not in sync, so should not typecast blindly as above. I think we should remove this part of the code as we only store hash keys. So I have removed same but kept the assert below as it is.

Also, there was a bug in the previous patch: I was not releasing the bucket page lock when the cached metadata was stale. That is now fixed.

--
Thanks and Regards
Mithun C Y
