Re: Memory leak in XLOG reader facility when decoding records (caused by WAL refactoring)

Поиск
Список
Период
Сортировка
От Michael Paquier
Тема Re: Memory leak in XLOG reader facility when decoding records (caused by WAL refactoring)
Дата
Msg-id CAB7nPqTev+fS7GTVXw27e6O-kzrvnhg7Z88pUZzZGfdyu7WpnQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Memory leak in XLOG reader facility when decoding records (caused by WAL refactoring)  (Heikki Linnakangas <hlinnaka@iki.fi>)
Ответы Re: Memory leak in XLOG reader facility when decoding records (caused by WAL refactoring)  (Heikki Linnakangas <hlinnaka@iki.fi>)
Список pgsql-bugs
On Tue, Jul 28, 2015 at 2:32 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> On 07/27/2015 04:40 AM, Michael Paquier wrote:
>>
>> On Mon, Jul 27, 2015 at 12:32 AM, Andres Freund <andres@anarazel.de>
>> wrote:
>>>
>>> On 2015-07-23 22:50:12 +0900, Michael Paquier wrote:
>>>>
>>>>
>>>> While running valgring on pg_rewind, I have noticed that each new call
>>>> to XLogReadRecord leaks XLogReaderState->main_data and
>>>> XLogReaderState->blocks[block_id].data because each one of them is
>>>> palloc'd to store respectively the main data of the record and the
>>>> data attached to each block. I think that the most correct fix to
>>>> prevent the leak is to free those pointers when calling ResetDecoder()
>>>> which is used to reset a reader state between two records.
>>>
>>>
>>> I don't think that'd be a good fix. Isn't the idea here that we only
>>> allocate new buffers when the previous one is too small? Your fix will
>>> greatly increase the memory management overhead.
>>
>>
>> OK.
>>
>>>> ==46805== 154 bytes in 1 blocks are definitely lost in loss record 103
>>
>> of 167
>>>>
>>>> ==46805==    at 0x6B1B: malloc (in
>>>>
>>
>> /usr/local/Cellar/valgrind/3.10.1/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
>>>>
>>>> ==46805==    by 0x10000DB2D: pg_malloc_internal (fe_memutils.c:30)
>>>> ==46805==    by 0x10000DE79: palloc (fe_memutils.c:118)
>>>> ==46805==    by 0x1000052A0: DecodeXLogRecord (xlogreader.c:1203)
>>>> ==46805==    by 0x100003AA9: XLogReadRecord (xlogreader.c:461)
>>>> ==46805==    by 0x1000022F0: extractPageMap (parsexlog.c:78)
>>>> ==46805==    by 0x10000188C: main (pg_rewind.c:280)
>>>
>>>
>>> Uh. This is the leak?
>>>          if (state->main_data_len > 0)
>>>          {
>>>                  if (!state->main_data || state->main_data_len >
>>
>> state->main_data_bufsz)
>>>
>>>                  {
>>>                          if (state->main_data)g
>>>                                  pfree(state->main_data);
>>>                          state->main_data_bufsz = state->main_data_len;
>>>                          state->main_data =
>>
>> palloc(state->main_data_bufsz); <--- here
>>>
>>>                  }
>>>                  memcpy(state->main_data, ptr, state->main_data_len);
>>>                  ptr += state->main_data_len;
>>>          }
>>>
>>> Where/When are we leaking memory here? The previously used memory is
>>> freed before we allocate a larger buffer.
>>
>>
>> Yep, it is here, and it is not related to the calls of XLogReadRecord
>> actually, that's a problem in XLogReaderFree that relies on max_block_id,
>> which gets reinitialized to -1 between two records in ResetDecoder,
>> instead
>> of XLR_MAX_BLOCK_ID to clean up the data blocks when freeing the XLOG
>> reader (note: looking at memory leaks at night can lead to incorrect
>> conclusions). So for example if a first record uses some data blocks and
>> allocates buffers for some data, and then a second record uses no data
>> blocks, we'll leak. So that's less critical than I though as the couple
>> XLofReaderAllocate/XLogReaderFree is called only once in StartupXLOG,
>> still
>> that's a leak.
>>
>> Attached is a self-contained script able to show the problem if anybody
>> wants to check by himself (the other leaks related to pg_rewind are
>> addressed in another thread), and a patch that takes care of it.
>
>
> Thanks, committed. The leak was not actually in the "main_data" handling, as
> quoted in the above snippet, but in the per-block buffers.

The patch pushed does not take care of the leak (tested on 3d5cb31c).
With the test script I sent upthread, I am still seeing it:
==32015== 154 bytes in 1 blocks are definitely lost in loss record 103 of 155
==32015==    at 0x6B1B: malloc (in
/usr/local/Cellar/valgrind/3.10.1/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==32015==    by 0x10000DB2D: pg_malloc_internal (fe_memutils.c:30)
==32015==    by 0x10000DE79: palloc (fe_memutils.c:118)
==32015==    by 0x100005270: DecodeXLogRecord (xlogreader.c:1200)
==32015==    by 0x100003A79: XLogReadRecord (xlogreader.c:458)
==32015==    by 0x100002AF0: findLastCheckpoint (parsexlog.c:186)
==32015==    by 0x1000017E1: main (pg_rewind.c:257)
==32015==
==32015== 154 bytes in 1 blocks are definitely lost in loss record 104 of 155
==32015==    at 0x6B1B: malloc (in
/usr/local/Cellar/valgrind/3.10.1/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==32015==    by 0x10000DB2D: pg_malloc_internal (fe_memutils.c:30)
==32015==    by 0x10000DE79: palloc (fe_memutils.c:118)
==32015==    by 0x100005270: DecodeXLogRecord (xlogreader.c:1200)
==32015==    by 0x100003A79: XLogReadRecord (xlogreader.c:458)
==32015==    by 0x1000022E0: extractPageMap (parsexlog.c:78)
==32015==    by 0x10000187C: main (pg_rewind.c:280)

The problem is that state->max_block_id gets reset to -1 each time a
new record is read via ResetDecoder. Hence if we decode a record that
has block data and then if a second record that does not contain block
data is decoded, we will leak if XLogReaderFree is called because no
attempts will be made to pfree what has been allocated. Per se the
patch attached.
Regards,
--
Michael

Вложения

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: BUG #13442: ISBN doesn't always roundtrip with text
Следующее
От: Sandeep Thakkar
Дата:
Сообщение: Re: installation error please help