Re: Performance Improvement by reducing WAL for Update Operation

Поиск

Список

Период

Сортировка

От	Amit Kapila
Тема	Re: Performance Improvement by reducing WAL for Update Operation
Дата	14 января 2014 г. 09:16:22
Msg-id	CAA4eK1JHPiCCmsnSOBoiRwSa-QV9KBdt5crt9005jtZFfyLU_Q@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Performance Improvement by reducing WAL for Update Operation (Robert Haas <robertmhaas@gmail.com>)
Ответы	Re: Performance Improvement by reducing WAL for Update Operation (Robert Haas <robertmhaas@gmail.com>)
Список	pgsql-hackers

Дерево обсуждения

On Tue, Jan 14, 2014 at 2:16 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, Jan 11, 2014 at 1:08 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> Yes, currently this applies to update, what I have in mind is that
>> in future if some one wants to use WAL compression for any other
>> operation like 'full_page_writes', then it can be easily extendible.
>>
>> To be honest, I have not evaluated whether such a flag or compression
>> would make sense for full page writes, but I think it should be possible
>> while doing full page write (BkpBlock has RelFileNode) to check such a
>> flag if it's present.
>
> Makes sense.
  So shall I change it to string instead of bool and keep the name as  compress_wal or compress_wal_for_opr?

>> The reason of adding the same chunk in head of list is that it uses same
>> technique as pglz_hist_add. Now in pglz, it will not have repeat steps
>> from c~f, as it has concept of good_match which leads to get this done in
>> one go.
>>
>> Being said above, I am really not sure, how much real world data falls
>> in above category and should we try to optimize based on above example,
>> but yes it will save some CPU cycles in current test we are using.
>
> In the Rabin algorithm, we shouldn't try to find a longer match.  The
> match should end at the chunk end, period.  Otherwise, you lose the
> shift-resistant property of the algorithm.
  Okay, it will work well for cases when most chunks in tuple are due  due to special pattern in it, but it will loose
outon CPU cycles in  cases where most of the chunks are due to maximum chunk boundary  and most part of new tuple
matcheswith old tuple. The reason is that  if the algorithm have some such property of finding longer matches than
chunkboundaries, then it can save us on calculating hash again and  again when we try to find match in old tuple.
HoweverI think it is better to go with rabin's algorithm instead of adding  optimizations based on our own assumptions,
becauseit is difficult to  predict the real world tuple data.
 

>>
>> Isn't it similar to how current pglz works, basically it also
>> uses next 4 bytes to calculate index (pglz_hist_idx) but still
>> does byte by byte comparison, here if we try to map to rabin's
>> delta encoding then always chunk size is 1.
>
> I don't quite understand this.  The point of the Rabin algorithm is to
> split the old tuple up into chunks and then for those chunks in the
> new tuple.  For example, suppose the old tuple is
> abcdefghijklmnopqrstuvwxyz.  It might get split like this: abcdef
> hijklmnopqrstuvw xyz.  If any of those three chunks appear in the new
> tuple, then we'll use them for compression.  If not, we'll just copy
> the literal bytes.  If the chunks appear in the new tuple reordered or
> shifted or with stuff inserted between one chunk at the next, we'll
> still find them.  Unless I'm confused, which is possible, what you're
> doing is essentially looking at the string and spitting it in those
> three places, but then recording the chunks as being three bytes
> shorter than they really are.  I don't see how that can be right.
 Today again spending some time on algorithm, I got the bug you are pointing to and you are right in saying that chunk
isshorter. I think it should not be difficult to address this issue without affecting most part of algorithm, let me
tryto handle it.
 


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Craig Ringer
Дата: 14 января 2014 г., 08:55:39
Сообщение: Re: Case sensitive mode in windows build option

Следующее

От: KONDO Mitsumasa
Дата: 14 января 2014 г., 09:47:12
Сообщение: Re: gaussian distribution pgbench

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Performance Improvement by reducing WAL for Update Operation

Предыдущее

Следующее