Re: [HACKERS] Proposal: generic WAL compression

Поиск
Список
Период
Сортировка
От Oleg Ivanov
Тема Re: [HACKERS] Proposal: generic WAL compression
Дата
Msg-id 5A87DA4C.6050509@postgrespro.ru
обсуждение исходный текст
Ответ на Re: [HACKERS] Proposal: generic WAL compression  (Antonin Houska <ah@cybertec.at>)
Список pgsql-hackers
Hello everyone!

Unfortunately, I faced the use case with RUM index, in which my patch 
produced
enormously large time overhead (queries execution time is about 2 or 3 times
slower). Only now I finally managed to limit this overhead by 20% or 
even make
it statistically insignificant on my HDD. Nevertheless, SSD evaluation 
is still
required.
So I attach new test totat_test.sh script which includes the bad RUM use 
case
and the new version of the patch.

Thanks everybody for review, it improved the patch a lot. The majority 
of the
listed drawbacks were fixed, the others are discussed below.

On 11/16/2017 08:31 PM, Antonin Houska wrote:
> * writeToPtr() and readFromPtr() are applied to the existing code. I think
>    this is a reason for a separate diff, to be applied before the actual patch.
Removed the existing code refactoring, that should be a separate patch 
indeed.

> * What's the reason for changing FRAGMENT_HEADER_SIZE ? I see no change in
>    the data layout that writeFragment() produces.
Diff delta has its own header which consists of one char and one int. To 
make
this point more clear, I defined DIFF_DELTA_HEADER_SIZE and added
2 * DIFF_DELTA_HEADER_SIZE to MAX_DELTA_SIZE.

> ** "An O(ND) Difference Algorithm and Its Variations" - I think the reference
>     should contain more information, e.g. name of the author(s). I think I even
>     saw URLs to scientific papers in the PG source but I'm not 100% sure right
>     now.
The reference to the paper contains much more information now.

> ** As for the algorithm itself: Although I didn't have enough patience (and
>     time) to follow it in every detail for this first review, I think I can
>     imagine what it is about. Yet I think reading would be easier if the
>     concepts of alignment and "diff delta" were depicted in the comments,
>     perhaps using some sort of "ASCII graphics".
I briefly described the main idea of the dynamical programming algorithm
in comments of alignRegions. It can be further refined if you think it is
still unclear now, but I'm afraid the detailed description will result 
in just
copying the referred paper into the comments.

> ** DiffDeltaOperations enumeration: the individual values should be described.
>
>
> * computeRegionDiffDelta()
>
> ** Does "previous delta" in one of the comments refer to "base delta",
>     i.e. the delta computation w/o your patch? If so, the comment should
>     mention it explicitly.
>
> ** If you use Min() to initialize the for-loop, it'd be more consistent if you
>     also used Max() when checking the limits:
>
> for (i = Min(validStart, targetStart); i < Max(validEnd, targetEnd); ++i)
>
> And similarly you can simplify the body of the loop.
I found that this part of code is unnecessary and deleted it.

On 11/17/2017 03:02 PM, Arthur Silva wrote:
> I wonder if the algorithm could be changed to operate with units of 2 
> or 4 bytes (instead of 1).
> Since the algorithm speed is essentially doubled/quadrupled it could 
> yield a better runtime/compression trade-off.
>
> Does that make sense?
This approach implies that we cannot detect data shift by one byte, for 
example.
It is suitable for 2- or 4-bytes-aligned data, but looks not general enough.

On 01/23/2018 12:37 AM, Stephen Frost wrote:
>> * writeToPtr() and readFromPtr() are applied to the existing code. I think
>>    this is a reason for a separate diff, to be applied before the actual patch.
> I'm not really a fan of using these, for my 2c.  I'm not sure how others
> feel, but having these macros which change one of the arguments to the
> macro (by advancing the pointer) doesn't result in a net improvement to
> readability for me.
Sounds reasonable, but the patch uses such constructions as writeToPtr 
rather
often. How do you feel about using writeToPtr(&ptr, data) which which more
explicitly indicates the first argument changing?

On 02/07/2018 09:02 PM, Markus Nullmeier wrote:
> One general comment I can already make is that enabling compression
> should be made optional, which appears to be a small and easy addition
> to the generic WAL API.
Now I'm working on this.I consider creation the function
GenericXLogRegisterBufferEx in addition to GenericXLogRegisterBuffer.
GenericXLogRegisterBufferEx will take a structure with parameters for diff
delta such as maximal alignment length, to which parts of page it must be
applied, etc. And GenericXLogRegisterBufferwill call
GenericXLogRegisterBufferEx with default parameters. This allows using
different compression settings for different indexes. What do you think 
about
such solution?

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: [HACKERS] why not parallel seq scan for slow functions
Следующее
От: David Rowley
Дата:
Сообщение: Re: [HACKERS] Runtime Partition Pruning