Re: refactoring basebackup.c

From	Jeevan Ladhe
Subject	Re: refactoring basebackup.c
Date	21 September 2021 16:07:37
Msg-id	CAOgcT0PdvW1aV+Pnim-NuqDxU3zkN4EkQJ1Op9ZVKOssuWBQkw@mail.gmail.com
In reply to	Re: refactoring basebackup.c (Robert Haas <robertmhaas@gmail.com>)
Replies	Re: refactoring basebackup.c (Robert Haas <robertmhaas@gmail.com>)
List	pgsql-hackers

>> + /*
>> + * LZ4F_compressUpdate() returns the number of bytes written into output
>> + * buffer. We need to keep track of how many bytes have been cumulatively
>> + * written into the output buffer (bytes_written). But
>> + * LZ4F_compressUpdate() returns 0 when the data is buffered and not yet
>> + * written to the output buffer, so set autoFlush to 1 to force the writing
>> + * to the output buffer.
>> + */
>> + prefs->autoFlush = 1;
>>
>> I don't see why this should be necessary. Elsewhere you have code that
>> caters to bytes being stuck inside LZ4's buffer, so why do we also
>> require this?
>
> This is needed to know the actual number of bytes written to the output
> buffer. If it is set to 0, then LZ4F_compressUpdate() would return either 0
> or the actual number of bytes written to the output buffer, depending on
> whether it has buffered the data or really flushed it to the output buffer.

The problem is that if we autoflush, I think it will cause the
compression ratio to be less good. Try un-lz4ing a file that is
produced this way and then re-lz4 it and compare the size of the
re-lz4'd file to the original one. Compressors rely on postponing
decisions about how to compress until they've seen as much of the
input as possible, and flushing forces them to decide earlier,
possibly making a decision that isn't as good as it could have been. So I
believe we should look for a way of avoiding this. Now I realize
there's a problem there with doing that and also making sure the
output buffer is large enough, and I'm not quite sure how we solve
that problem, but there is probably a way to do it.

Yes, you are right, and I was able to verify this with an experiment.

When autoflush is 1, the file compresses less well, i.e. the compressed file
is larger than the one generated when autoflush is set to 0.

But so far I haven't been able to think of a solution, because we need to
advance bytes_written by the number of bytes actually written to the output
buffer so that we know where to write into it next.

Regards,

Jeevan Ladhe
