Re: refactoring basebackup.c

From Jeevan Ladhe
Subject Re: refactoring basebackup.c
Date
Msg-id CAOgcT0PdvW1aV+Pnim-NuqDxU3zkN4EkQJ1Op9ZVKOssuWBQkw@mail.gmail.com
In response to Re: refactoring basebackup.c  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: refactoring basebackup.c  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
>> + /*
>> + * LZ4F_compressUpdate() returns the number of bytes written into output
>> + * buffer. We need to keep track of how many bytes have been cumulatively
>> + * written into the output buffer(bytes_written). But,
>> + * LZ4F_compressUpdate() returns 0 in case the data is buffered and not
>> + * written to output buffer, set autoFlush to 1 to force the writing to the
>> + * output buffer.
>> + */
>> + prefs->autoFlush = 1;
>>
>> I don't see why this should be necessary. Elsewhere you have code that
>> caters to bytes being stuck inside LZ4's buffer, so why do we also
>> require this?
>
> This is needed to know the actual bytes written in the output buffer. If it is
> set to 0, then LZ4F_compressUpdate() would randomly return 0 or actual
> bytes are written to the output buffer, depending on whether it has buffered
> or really flushed data to the output buffer.

The problem is that if we autoflush, I think it will cause the
compression ratio to be less good. Try un-lz4ing a file that is
produced this way and then re-lz4 it and compare the size of the
re-lz4'd file to the original one. Compressors rely on postponing
decisions about how to compress until they've seen as much of the
input as possible, and flushing forces them to decide earlier, and
maybe making a decision that isn't as good as it could have been. So I
believe we should look for a way of avoiding this. Now I realize
there's a problem there with doing that and also making sure the
output buffer is large enough, and I'm not quite sure how we solve
that problem, but there is probably a way to do it.
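
To make the trade-off above concrete, here is a minimal sketch using a toy run-length encoder. It is NOT LZ4 and nothing in it comes from the patch; every name (toy_rle, toy_update, toy_flush, toy_compress_chunked) is hypothetical. It only mimics the two behaviours under discussion: an update call that returns 0 while data is still buffered, and a forced flush after every call that makes the output larger.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy run-length encoder, NOT LZ4: it only mimics "buffer now, emit later". */
typedef struct
{
    int     have_run;   /* is a run currently pending (buffered)? */
    uint8_t byte;       /* byte value of the pending run */
    uint8_t count;      /* length of the pending run, 1..255 */
    int     auto_flush; /* emit the pending run at the end of every update */
} toy_rle;

/* Emit the pending run as a (count, byte) pair; returns bytes written. */
static size_t
toy_flush(toy_rle *c, uint8_t *dst)
{
    if (!c->have_run)
        return 0;
    dst[0] = c->count;
    dst[1] = c->byte;
    c->have_run = 0;
    return 2;
}

/*
 * Consume n input bytes and return the number of bytes written to dst,
 * which is 0 when everything was absorbed into the pending run.
 * dst needs room for 2 * (n + 1) bytes in the worst case.
 */
static size_t
toy_update(toy_rle *c, uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t written = 0;

    for (size_t i = 0; i < n; i++)
    {
        if (c->have_run && c->byte == src[i] && c->count < 255)
        {
            c->count++;         /* extend the run, emit nothing yet */
            continue;
        }
        written += toy_flush(c, dst + written);
        c->have_run = 1;
        c->byte = src[i];
        c->count = 1;
    }
    if (c->auto_flush)          /* forced early decision */
        written += toy_flush(c, dst + written);
    return written;
}

/*
 * Feed src in fixed-size chunks, advancing bytes_written by each return
 * value. Returns the total output size and counts updates that returned 0.
 */
static size_t
toy_compress_chunked(const uint8_t *src, size_t len, size_t chunk,
                     int auto_flush, uint8_t *dst, size_t *zero_returns)
{
    toy_rle c = {0, 0, 0, auto_flush};
    size_t  bytes_written = 0;

    *zero_returns = 0;
    for (size_t off = 0; off < len; off += chunk)
    {
        size_t n = (len - off < chunk) ? len - off : chunk;
        size_t w = toy_update(&c, dst + bytes_written, src + off, n);

        if (w == 0)
            (*zero_returns)++;
        bytes_written += w;
    }
    /* final flush of whatever is still buffered */
    return bytes_written + toy_flush(&c, dst + bytes_written);
}
```

With 64 identical input bytes fed in 16-byte chunks, the buffered variant emits a single pair at the very end, while the auto-flush variant emits one pair per chunk. A real compressor is far more elaborate, but the direction of the effect is the same.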

Yes, you are right, and I was able to verify this with an experiment.
When autoFlush is 1, the file compresses less well, i.e. the compressed file
is larger than the one generated with autoFlush set to 0.
But as of now I can't think of a solution, because we need to know how many
bytes have actually been written to the output buffer so that we can advance
our position in it before writing more.
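
For discussion only, here is a sketch of one bookkeeping pattern that might work without autoFlush. It is an assumption, not something this patch does: before each compress call, make sure the free space covers that call's worst-case output, then advance by whatever the call returns, including 0. In the LZ4 frame API, LZ4F_compressBound() is the function that reports such a worst-case figure. The names out_state, out_reserve and out_advance are hypothetical, and the "forwarding" is just a counter standing in for real I/O.

```c
#include <stddef.h>

/* Hypothetical output-buffer state; names are illustrative only. */
typedef struct
{
    size_t capacity;        /* total size of the output buffer */
    size_t bytes_written;   /* bytes currently held in the output buffer */
    size_t bytes_forwarded; /* bytes already handed to the next stage */
} out_state;

/*
 * Call before each compress step. If the free space is smaller than the
 * worst-case output of that step, hand off the buffer contents and start
 * again at offset 0. Returns the free space available for the step.
 * Assumes capacity >= worst_case.
 */
static size_t
out_reserve(out_state *o, size_t worst_case)
{
    if (o->capacity - o->bytes_written < worst_case)
    {
        o->bytes_forwarded += o->bytes_written; /* stand-in for real I/O */
        o->bytes_written = 0;
    }
    return o->capacity - o->bytes_written;
}

/*
 * Call after each compress step with its return value. A return of 0
 * (input was only buffered) simply leaves the position unchanged.
 */
static void
out_advance(out_state *o, size_t produced)
{
    o->bytes_written += produced;
}
```

The point of the sketch is that a return value of 0 does no harm to the accounting as long as room for the worst case was reserved beforehand. Whether that holds up inside the sink code here still needs to be checked.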

Regards,
Jeevan Ladhe
 
