Re: Performance Improvement by reducing WAL for Update Operation

From: Robert Haas
Subject: Re: Performance Improvement by reducing WAL for Update Operation
Date:
Msg-id: CA+TgmobOpaE3azdgo3cV47wYwy7CB9vKoerWkXV=mpSWYZOPbQ@mail.gmail.com
In reply to: Re: Performance Improvement by reducing WAL for Update Operation  (Amit Kapila <amit.kapila16@gmail.com>)
Responses: Re: Performance Improvement by reducing WAL for Update Operation  (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-hackers
On Sat, Jan 11, 2014 at 1:08 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> Yes, currently this applies to update; what I have in mind is that
> if in future someone wants to use WAL compression for any other
> operation, like 'full_page_writes', the mechanism can easily be extended.
>
> To be honest, I have not evaluated whether such a flag or compression
> would make sense for full-page writes, but I think it should be possible
> while doing a full-page write (BkpBlock has a RelFileNode) to check such
> a flag if it's present.

Makes sense.

> The reason for adding the same chunk at the head of the list is that it
> uses the same technique as pglz_hist_add. In pglz, steps c~f are not
> repeated, because its concept of a good_match gets this done in one go.
>
> That said, I am really not sure how much real-world data falls into the
> above category and whether we should optimize based on the above example,
> but yes, it will save some CPU cycles in the current test we are using.

In the Rabin algorithm, we shouldn't try to find a longer match.  The
match should end at the chunk end, period.  Otherwise, you lose the
shift-resistant property of the algorithm.
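The property described here can be sketched roughly as follows. This is a hypothetical, simplified content-defined chunker, not the actual pgrb_* code; the window size, hash function, and boundary mask are made-up illustration values.

```c
#include <stdint.h>

#define WINDOW 4
#define BOUNDARY_MASK 0x0F      /* made-up value: average chunk ~16 bytes */

/* Toy hash of a 4-byte window.  The real code would use a rolling
 * (Rabin-style) update rather than rehashing from scratch. */
static uint32_t
window_hash(const unsigned char *p)
{
    return (uint32_t) p[0] * 131 + (uint32_t) p[1] * 31 +
           (uint32_t) p[2] * 7 + (uint32_t) p[3];
}

/* Split data[0..len) into content-defined chunks.  Store each chunk's
 * end offset (exclusive) in ends[]; return the number of chunks.  A
 * chunk ends exactly where the window hash hits the boundary mask --
 * the match is never extended past that point, which is what keeps
 * the chunking shift-resistant. */
static int
split_chunks(const unsigned char *data, int len, int *ends, int max_ends)
{
    int nchunks = 0;
    int i = 0;

    while (i + WINDOW <= len && nchunks < max_ends)
    {
        if ((window_hash(data + i) & BOUNDARY_MASK) == 0)
        {
            ends[nchunks++] = i + WINDOW;   /* window bytes belong to the chunk */
            i += WINDOW;
        }
        else
            i++;
    }
    if (nchunks < max_ends && (nchunks == 0 || ends[nchunks - 1] < len))
        ends[nchunks++] = len;              /* trailing partial chunk */
    return nchunks;
}
```

Because a boundary depends only on the bytes inside the window, inserting data before a chunk shifts the chunk's position but not where it ends relative to its own contents.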

>> But I do think there might be a bug here, which is
>> that, unless I'm misinterpreting something, hp is NOT the end of the
>> chunk.  After calling pgrb_hash_init(), we've looked at the first FOUR
>> bytes of the input.  If we find that we have a zero hash value at that
>> point, shouldn't the chunk size be 4, not 1?  And similarly if we find
>> it after sucking in one more byte, shouldn't the chunk size be 5, not
>> 2?  Right now, we're deciding where the chunks should end based on the
>> data in the chunk plus the following 3 bytes, and that seems wonky.  I
>> would expect us to include all of those bytes in the chunk.
>
> It depends on how we define a chunk; basically, the chunk size is based
> on the byte for which we compute the hindex. The hindex for any byte is
> calculated from that byte and the following 3 bytes, so after calling
> pgrb_hash_init(), even though we have looked at 4 bytes, the hindex is
> still for the first byte, and that's why it considers the chunk size to
> be 1, not 4.
>
> Isn't this similar to how the current pglz works? It also uses the
> next 4 bytes to calculate the index (pglz_hist_idx) but still does a
> byte-by-byte comparison; if we map that onto Rabin's delta encoding,
> the chunk size is always 1.

I don't quite understand this.  The point of the Rabin algorithm is to
split the old tuple up into chunks and then search for those chunks in
the new tuple.  For example, suppose the old tuple is
abcdefghijklmnopqrstuvwxyz.  It might get split like this: abcdefg
hijklmnopqrstuvw xyz.  If any of those three chunks appear in the new
tuple, then we'll use them for compression.  If not, we'll just copy
the literal bytes.  If the chunks appear in the new tuple reordered or
shifted or with stuff inserted between one chunk and the next, we'll
still find them.  Unless I'm confused, which is possible, what you're
doing is essentially looking at the string and splitting it in those
three places, but then recording the chunks as being three bytes
shorter than they really are.  I don't see how that can be right.
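The matching step described above can be written out as a tiny sketch. This is hypothetical illustration code: the chunk boundaries are hard-coded from the example in the mail rather than computed by a rolling hash, and matched chunks are merely reported instead of being emitted as back-references.

```c
#include <stdio.h>
#include <string.h>

/* For each chunk of the old tuple, look for it in the new tuple.
 * Matched chunks could be emitted as back-references; anything not
 * found would be copied as literal bytes.  Returns how many chunks
 * were found in the new tuple. */
static int
match_chunks(const char *new_tuple, const char *chunks[], int nchunks)
{
    int matched = 0;

    for (int i = 0; i < nchunks; i++)
    {
        const char *hit = strstr(new_tuple, chunks[i]);

        if (hit != NULL)
        {
            printf("match:   \"%s\" at offset %ld\n",
                   chunks[i], (long) (hit - new_tuple));
            matched++;
        }
        else
            printf("literal: \"%s\"\n", chunks[i]);
    }
    return matched;
}
```

With the chunks "abcdefg", "hijklmnopqrstuvw", and "xyz", a new tuple containing them reordered or with bytes inserted between them still matches all three; that is the shift-resistance that is lost if each recorded chunk is three bytes shorter than the region the boundary decision actually covered.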

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



In the pgsql-hackers list, by message date:

Previous
From: Tom Lane
Subject: Re: Planning time in explain/explain analyze

Next
From: Robert Haas
Subject: Re: Planning time in explain/explain analyze