Re: Performance Improvement by reducing WAL for Update Operation

From Amit Kapila
Subject Re: Performance Improvement by reducing WAL for Update Operation
Msg-id 00a201cdb5e2$2f6d9700$8e48c500$@kapila@huawei.com
In response to Re: Performance Improvement by reducing WAL for Update Operation  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
On Sunday, October 28, 2012 12:28 AM Heikki Linnakangas wrote:
> On 27.10.2012 14:27, Amit Kapila wrote:
> > On Saturday, October 27, 2012 4:03 AM Noah Misch wrote:
> >> In my previous review, I said:
> >>
> >>     Given [not relying on the executor to know which columns changed],
> >> why not
> >
> > For this patch I am interested to go with delta encoding approach based on
> > column boundaries.
> >
> > However I shall try to do it separately and if it gives positive results
> > then I will share with hackers.
> > I will try with VCDiff once or let me know if you have any other algorithm
> > in mind.
> One idea is to use the LZ format in the WAL record, but use your
> memcmp() code to construct it. I believe the slow part in LZ compression
> is in trying to locate matches in the "history", so if you just replace
> that with your code that's aware of the column boundaries and uses
> simple memcmp() to detect what parts changed, you could create LZ
> compressed output just as quickly as the custom encoded format. It would
> leave the door open for making the encoding smarter or to do actual
> compression in the future, without changing the format and the code to
> decode it.

This is a good idea. I shall try it.

In the existing algorithm, storing new data that is not present in the
history needs 1 control byte for every 8 bytes of new data. This can make
the compressed output larger than what our delta encoding approach produces.
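
To put rough numbers on that overhead, here is a small Python sketch. It
assumes a stream of new data only (in the real format the control bits are
shared between new bytes and history matches) and a hypothetical run
encoding that spends one length byte per run of up to 255 bytes. The
function names are illustrative and do not come from the patch or from the
existing LZ code.

```python
import math


def literal_cost_current(n_new_bytes: int) -> int:
    """Bytes emitted when every new byte is its own item:
    the data itself plus 1 control byte per 8 items."""
    return n_new_bytes + math.ceil(n_new_bytes / 8)


def literal_cost_run(n_new_bytes: int, max_run: int = 255) -> int:
    """Bytes emitted under the assumed run encoding:
    the data itself plus 1 length byte per run of up to max_run bytes.
    (The control bit for the run is ignored here; it is shared with
    other items in the same control byte.)"""
    return n_new_bytes + math.ceil(n_new_bytes / max_run)


if __name__ == "__main__":
    for n in (8, 100, 1000):
        print(n, literal_cost_current(n), literal_cost_run(n))
```

Under these assumptions a 100-byte changed column costs 113 bytes in the
current scheme and 101 bytes as a single length-prefixed run.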

Shall we modify the LZ algorithm a little, so that it works best for
our case?

Approach-1
---------------
Is it possible to increase the control data from 1 bit to 2 bits [0 - new
data, 1 - pick from history based on OFFSET-LENGTH, 2 - length and new data]?
The new value (2) handles new field data as one continuous stream of data,
instead of treating every byte as new data.
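
A minimal Python sketch of what Approach-1 might look like, only to make
the idea concrete. Everything about the byte layout is an assumption: four
2-bit codes are packed into each control byte, a history reference is a
2-byte offset into the old tuple plus a 1-byte length, and a run is a
1-byte length followed by the data. The names LITERAL, COPY, RUN, encode
and decode are hypothetical.

```python
LITERAL, COPY, RUN = 0, 1, 2  # the three 2-bit control values


def encode(items):
    """items: (LITERAL, byte) | (COPY, offset, length) | (RUN, data).
    Lengths are assumed to fit in one byte (1..255)."""
    out = bytearray()
    for i in range(0, len(items), 4):
        ctrl = 0
        body = bytearray()
        for slot, item in enumerate(items[i:i + 4]):
            ctrl |= item[0] << (2 * slot)
            if item[0] == LITERAL:
                body.append(item[1])
            elif item[0] == COPY:
                body += item[1].to_bytes(2, "big") + bytes([item[2]])
            else:  # RUN: length byte, then the new data as one stream
                body += bytes([len(item[1])]) + item[1]
        out.append(ctrl)
        out += body
    return bytes(out)


def decode(data, history):
    """Rebuild the new tuple from the encoded data and the old tuple."""
    out = bytearray()
    pos = 0
    while pos < len(data):
        ctrl = data[pos]
        pos += 1
        for slot in range(4):
            if pos >= len(data):  # unused slots in the last control byte
                break
            code = (ctrl >> (2 * slot)) & 3
            if code == LITERAL:
                out.append(data[pos])
                pos += 1
            elif code == COPY:
                offset = int.from_bytes(data[pos:pos + 2], "big")
                length = data[pos + 2]
                out += history[offset:offset + length]
                pos += 3
            elif code == RUN:
                n = data[pos]
                out += data[pos + 1:pos + 1 + n]
                pos += 1 + n
            else:
                raise ValueError("unused control value 3")
    return bytes(out)


if __name__ == "__main__":
    old = b"id=0001;name=alice;city=paris"
    items = [(COPY, 0, 13), (RUN, b"robert"), (COPY, 18, 11)]
    packed = encode(items)
    print(len(packed), decode(packed, old))
```

In this layout the changed field travels as one run with a single length
byte, while value 0 remains available for isolated new bytes.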

Approach-2
---------------
Use only one bit for control data [0 - length and new data, 1 - pick from
history based on OFFSET-LENGTH].
The modified value (0) handles new field data as one continuous stream of
data, instead of treating every byte as new data.
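
A rough Python sketch of Approach-2, combined with the column-boundary
comparison that Heikki suggested for building the output. The layout is
assumed, not taken from the patch: eight control bits per control byte, a
history reference stored as a 2-byte offset into the old tuple plus a
1-byte length, and new data stored as a 1-byte length plus the bytes. The
helper names delta_items, encode and decode are hypothetical, and every
length is assumed to fit in one byte.

```python
def delta_items(old_cols, new_cols):
    """Compare columns pairwise (the memcmp() step) and emit
    ("copy", offset, length) for unchanged columns, ("new", data) otherwise.
    offset is the position of the column in the old tuple."""
    items = []
    offset = 0
    for i, new in enumerate(new_cols):
        old = old_cols[i] if i < len(old_cols) else None
        if new and old is not None and new == old:
            last = items[-1] if items else None
            if (last and last[0] == "copy" and last[1] + last[2] == offset
                    and last[2] + len(new) <= 255):
                # adjacent unchanged columns become one history reference
                items[-1] = ("copy", last[1], last[2] + len(new))
            else:
                items.append(("copy", offset, len(new)))
        elif new:
            items.append(("new", new))
        if old is not None:
            offset += len(old)
    return items


def encode(items):
    out = bytearray()
    for i in range(0, len(items), 8):
        ctrl = 0
        body = bytearray()
        for bit, item in enumerate(items[i:i + 8]):
            if item[0] == "copy":  # control bit 1: OFFSET-LENGTH
                ctrl |= 1 << bit
                body += item[1].to_bytes(2, "big") + bytes([item[2]])
            else:                  # control bit 0: length and new data
                body += bytes([len(item[1])]) + item[1]
        out.append(ctrl)
        out += body
    return bytes(out)


def decode(data, history):
    out = bytearray()
    pos = 0
    while pos < len(data):
        ctrl = data[pos]
        pos += 1
        for bit in range(8):
            if pos >= len(data):  # unused bits in the last control byte
                break
            if ctrl & (1 << bit):
                offset = int.from_bytes(data[pos:pos + 2], "big")
                out += history[offset:offset + data[pos + 2]]
                pos += 3
            else:
                n = data[pos]
                out += data[pos + 1:pos + 1 + n]
                pos += 1 + n
    return bytes(out)


if __name__ == "__main__":
    old_cols = [b"0001", b"alice", b"paris"]
    new_cols = [b"0001", b"robert", b"paris"]
    packed = encode(delta_items(old_cols, new_cols))
    print(len(packed), decode(packed, b"".join(old_cols)))
```

With one control bit per item, the decoder needs no third case, and a
changed column of any length up to 255 bytes costs a single length byte
in this sketch.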


With Regards,
Amit Kapila.



