Re: Performance Improvement by reducing WAL for Update Operation

From: Amit Kapila
Subject: Re: Performance Improvement by reducing WAL for Update Operation
Date:
Msg-id: CAA4eK1+wpwK=t=c4ga0K3_d1dDbbdtSXOpzS8_CB4w7hsg1QuQ@mail.gmail.com
In reply to: Re: Performance Improvement by reducing WAL for Update Operation  (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: Performance Improvement by reducing WAL for Update Operation  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Fri, Jan 10, 2014 at 9:12 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> 2. Provide a new reloption to specify Wal compression
>>     for update operation on table
>>     Create table tbl(c1 char(100)) With (compress_wal = true);
>>
>>     Alternative options:
>>     a. compress_wal can take input as operation, e.g. 'insert', 'update',
>>     b. use alternate syntax:
>>         Create table tbl(c1 char(100))  Compress Wal For Update;
>>     c. anything better?
>
> I think WITH (compress_wal = true) is pretty good.  I don't understand
> your point about taking the operation as input, because this only
> applies to updates.

Yes, currently this applies only to updates. What I have in mind is that
if in the future someone wants to use WAL compression for another
operation, such as 'full_page_writes', it can be easily extended.

To be honest, I have not evaluated whether such a flag or compression
would make sense for full page writes, but I think it should be possible
to check such a flag, if present, while doing a full page write
(BkpBlock has the RelFileNode).

> But we could try to work "update" into the name
> of the setting somehow, so as to be less likely to conflict with
> future additions, like maybe wal_compress_update.  I think the
> alternate syntax you propose is clearly worse, because it would
> involve adding new keywords, something we try to avoid.

Yes, this would be better than the current name. I will include it in
the next version of the patch, unless there is a better idea than
this one.

>
>> Points to consider
>> -----------------------------
>>
>> 1. As the current algorithm store the entry for same chunks at head of list,
>>    it will always find last but one chunk (we don't store last 4 bytes) for
>>    long matching string during match phase in encoding (pgrb_delta_encode).
>>
>>    We can improve it either by storing same chunks at end of list instead of at
>>    head or by trying to find a good_match technique used in lz algorithm.
>>    Finding good_match technique can have overhead in some of the cases
>>    when there is actually no match.
>
> I don't see what the good_match thing has to do with anything in the
> Rabin algorithm.

The case for which I mentioned it is where most of the data is
repetitive and the modified tuple is almost the same as the original.

For example

original tuple
aaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

modified tuple
ccaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb


Now let us see what happens with the current algorithm:

Step 1: Form the hash table (let's consider 4-byte chunks just for the
purpose of explanation):
   a. The first chunk is after the first 4 a's (aaaa), with hash value 1024.
   b. After that, all the chunks will be 4 b's (bbbb) and will have the
      same hash value (let's call it H2), so they will be mapped to the
      same bucket (let us call it 'B2') and hence form a list. Every new
      chunk with the same hash value is added to the front of the list,
      so if we go by this, B2 has a front element with hash value H2 and
      location 58 (last but 8 bytes, as we don't include the last 4
      bytes in our algorithm).

Step 2: Perform encoding for the modified tuple using the hash table
formed in Step 1:
   a. The first chunk is after the first 4 bytes (ccaa), with hash value 1056.
   b. Try to find a match in the hash table; there is no match, so proceed.
   c. The next chunk is 4 b's with hash value H2; try to find a match.
      It will find a match in the B2 bucket (at the first element).
   d. The hash value matched, so it will try to match each byte; if all
      bytes match, it will consider the chunk a matched chunk.
   e. Now that the chunk matches, we try to match consecutive bytes
      after the chunk (in the hope of finding a longer match), and in
      this case it will find a match of 8 bytes and consider match_len as 8.
   f. It will increment the pointer into the modified tuple (dp) to
      point to the byte just after match_len.
   g. Again, it will consider the next 4 b's as a chunk and repeat
      steps c~f.

So here, what happens is that steps c~f keep getting repeated, whereas
if we had added identical chunks at the end of the list in step 1b, we
could have found the matching string in a single pass through c~f.

The reason for adding the same chunk at the head of the list is that it
uses the same technique as pglz_hist_add. Now, pglz does not have to
repeat steps c~f, because it has the concept of good_match, which lets
it get this done in one go.
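
To make both alternatives concrete (appending identical chunks at the
tail of a bucket, and a pglz-style good_match cutoff), here is a minimal
C sketch; the struct and function names and the threshold are invented
for illustration and are not the patch's actual code.

#include <stddef.h>

typedef struct ChunkEntry
{
    size_t              pos;    /* offset of the chunk in the history (old) tuple */
    struct ChunkEntry  *next;
} ChunkEntry;

/* current behaviour: the newest chunk becomes the head of the bucket */
static void
bucket_push_head(ChunkEntry **bucket, ChunkEntry *e)
{
    e->next = *bucket;
    *bucket = e;
}

/* alternative: keep chunks in order of appearance by appending at the
 * tail, so the earliest (longest-extendable) occurrence is found first */
static void
bucket_append_tail(ChunkEntry **bucket, ChunkEntry *e)
{
    ChunkEntry **p = bucket;

    while (*p != NULL)
        p = &(*p)->next;
    e->next = NULL;
    *p = e;
}

#define RB_GOOD_MATCH 32        /* hypothetical "good enough" length, in bytes */

/* walk a bucket, extending each candidate byte by byte, and stop early
 * once the best match is already "good enough" (pglz-style good_match) */
static size_t
bucket_find_match(const ChunkEntry *bucket,
                  const char *history, size_t history_len,
                  const char *input, size_t input_len,
                  size_t *best_pos)
{
    size_t      best_len = 0;
    const ChunkEntry *e;

    for (e = bucket; e != NULL; e = e->next)
    {
        size_t  len = 0;

        while (len < input_len &&
               e->pos + len < history_len &&
               history[e->pos + len] == input[len])
            len++;

        if (len > best_len)
        {
            best_len = len;
            *best_pos = e->pos;
        }
        if (best_len >= RB_GOOD_MATCH)
            break;              /* good_match-style early exit */
    }
    return best_len;
}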

That being said, I am really not sure how much real-world data falls
into the above category and whether we should optimize based on the
above example, but yes, it will save some CPU cycles in the current
test we are using.

>
>But I do think there might be a bug here, which is
> that, unless I'm misinterpreting something, hp is NOT the end of the
> chunk.  After calling pgrb_hash_init(), we've looked at the first FOUR
> bytes of the input.  If we find that we have a zero hash value at that
> point, shouldn't the chunk size be 4, not 1?  And similarly if we find
> it after sucking in one more byte, shouldn't the chunk size be 5, not
> 2?  Right now, we're deciding where the chunks should end based on the
> data in the chunk plus the following 3 bytes, and that seems wonky.  I
> would expect us to include all of those bytes in the chunk.

It depends on how we define a chunk; basically, the chunk size is based
on the byte for which we compute the hindex. The hindex for any byte is
calculated from that byte and the following 3 bytes, so after calling
pgrb_hash_init(), even though we have looked at 4 bytes, the hindex is
still for the first byte, and that is why the chunk size is considered
1, not 4.

Isn't it similar to how the current pglz works? It also uses the next 4
bytes to calculate the index (pglz_hist_idx) but still does byte-by-byte
comparison; if we try to map that onto Rabin's delta encoding, the
chunk size is always 1.

If we follow the same logic to define a chunk both for encoding and for
matching, will there be any problem?

I have tried to keep the implementation close to the previous LZ delta
encoding, but if you see a benefit in including the supporting bytes
(the next 3 bytes) in the chunk, then I can try to change it.
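
For what it's worth, here is a toy sketch of that bookkeeping (the
names and the hash function are invented, not taken from the patch):
the hash at position i is computed from bytes i..i+3, but the boundary
is attributed to byte i itself, which is why a boundary found right
after initialization gives a chunk of length 1 rather than 4.

#include <stdint.h>
#include <stddef.h>

#define TOY_PATTERN_AFTER_BITS 4    /* boundary when the low 4 hash bits are zero */

static uint32_t
toy_hash4(const unsigned char *p)
{
    /* any mixing of 4 consecutive bytes works for illustration */
    return (uint32_t) p[0] * 2654435761u ^ (uint32_t) p[1] * 40503u ^
           (uint32_t) p[2] * 31u ^ (uint32_t) p[3];
}

/*
 * Length of the first chunk in buf: the hash for position i looks at
 * bytes i..i+3, but the chunk is considered to end at byte i, so a
 * boundary found at i = 0 yields a chunk of length 1, not 4.
 */
static size_t
toy_first_chunk_len(const unsigned char *buf, size_t len)
{
    size_t      i;

    for (i = 0; i + 4 <= len; i++)
    {
        uint32_t    h = toy_hash4(buf + i);

        if ((h & ((1u << TOY_PATTERN_AFTER_BITS) - 1)) == 0)
            return i + 1;
    }
    return len;                 /* no boundary found within the buffer */
}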

>> 2. Another optimization that we can do in pgrb_find_match(), is that
>> currently if
>>     it doesn't find the first chunk (chunk got by hash index) matching, it
>>     continues to find the match in other chunks. I am not sure if there is any
>>     benefit to search for other chunks if first one is not matching.
>
> Well, if you took that out, I suspect it would hurt the compression
> ratio.

True, that is the reason I have kept it, but I was not sure what kinds
of scenarios it benefits and whether such scenarios are common for
updates.

> Unless the CPU savings are substantial, I'd leave it alone.

Okay, let's leave it as it is.

>> 3. We can move code from pg_lzcompress.c to some new file pg_rbcompress.c,
>>     if we want to move, then we need to either duplicate some common macros
>>     like pglz_out_tag or keep it common, but might be change the name.
>
> +1 for a new file.

Okay, will take care of it in next version.

>
>> 4. Decide on min and max chunksize. (currently kept as 2 and 4 respectively).
>>     The point to consider is that if we keep bigger chunk sizes, then it can
>>     save us on CPU cycles, but less reduction in Wal, on the other side if we
>>     keep it small it can have better reduction in Wal but consume more CPU
>>     cycles.
>
> Whoa.  That seems way too small.  Since PGRB_PATTERN_AFTER_BITS is 4,
> the average length of a chunk is about 16 bytes.  It makes little
> sense to have the maximum chunk size be 25% of the expected chunk
> length.  I'd recommend making the maximum chunk length something like
> 4 * PGRB_CONST_NUM, and the minimum chunk length maybe something like
> 4.

Agreed, but I think for some strings (where it doesn't find the special
bit pattern) it will create long chunks, which can affect the reduction.
AFAIR, the current test we are using to evaluate the performance of
this patch has strings where this happens, but on the other hand, that
test might not be representative of real update strings.

I will make these modifications and report here if it affects the results.
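
(For reference, a back-of-the-envelope way to see the 16-byte figure
Robert mentions, assuming the rolling hash bits are roughly uniform: a
chunk ends whenever the low PGRB_PATTERN_AFTER_BITS bits of the hash
hit a fixed pattern, so

    P(boundary at any given byte)  =  1 / 2^PGRB_PATTERN_AFTER_BITS  =  1/16
    expected chunk length          ~  2^PGRB_PATTERN_AFTER_BITS      =  16 bytes

which is why a maximum chunk size of 4 would be only about a quarter of
the expected chunk length.)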

>> 5. kept an guc variable 'wal_update_compression_ratio', for test purpose, we
>>    can remove it before commit.
>
> Let's remove it now.

Sure, will remove in next version.

>
>> 7. docs needs to be updated, tab completion needs some work.
>
> Tab completion can be skipped for now, but documentation is important.

Agreed; I left it out of this version of the patch so that we can first
settle on the syntax, and then I can document it accordingly.

>> 8. We can extend Alter Table to set compress option for table.
>
> I don't understand what you have in mind here.

Let us say a user created a table without this new option (compress_wal)
and later wants to enable it; we can provide something similar to what
we provide for other storage parameters:
    Alter Table Set (compress_wal = true)

One more point to note: in the version of the patch I posted, the
default value of compress_wal is true, just for ease of testing; we
might want to change it to false for backward compatibility.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


