Re: post-freeze damage control

From David Steele
Subject Re: post-freeze damage control
Date
Msg-id 5a78917f-67a9-4678-a89b-f9adb7666951@pgmasters.net
In response to Re: post-freeze damage control  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List pgsql-hackers
On 4/13/24 21:02, Tomas Vondra wrote:
> On 4/13/24 01:23, David Steele wrote:
> 
>> Even for the summarizer, though, I do worry about the complexity of
>> maintaining it over time. It seems like it would be very easy to
>> introduce a bug and have it go unnoticed until it causes problems in the
>> field. A lot of testing was done outside of the test suite for this
>> feature and I'm not sure if we can rely on that focus with every release.
>>
> 
> I'm not sure there's a simpler way to implement this. I haven't really
> worked on that part (not until the CoW changes a couple weeks ago), but
> I think Robert was very conscious of the complexity.
> 
> I don't expect this code to change very often, but I agree it's
> not great to rely on testing outside the regular regression test suite.
> But I'm not sure how much more we can do, really - for example my
> testing was very much "randomized stress testing" with a lot of data and
> long runs, looking for unexpected stuff. That's not something we could
> do in the usual regression tests, I think.
> 
> But if you have suggestions how to extend the testing ...

Doing stress testing in the regular test suite is obviously a problem 
due to runtime, but it would still be great to see tests for issues that 
were found during external stress testing.

For example, the issue you and Jakub found was fixed in 55a5ee30 but 
there is no accompanying test and no existing test was broken by the change.

>> For me an incremental approach would be to introduce the WAL summarizer
>> first. There are already plenty of projects that do page-level
>> incremental (WAL-G, pg_probackup, pgBackRest) and could help shake out
>> the bugs. Then introduce the client tools later when they are more
>> robust. Or, release the client tools now but mark them as experimental
>> or something so people know that changes are coming and they don't get
>> blindsided by that in the next release. Or, at the very least, make the
>> caveats very clear so users can make an informed choice.
>>
> 
> I don't think introducing just the summarizer, without any client tools,
> would really work. How would we even test the summarizer, for example?
> If the only users of that code are external tools, we'd do only some
> very rudimentary tests. But the more complex tests would happen in the
> external tools, which means it wouldn't be covered by cfbot, buildfarm
> and so on. Considering the external tools are likely a bit behind, it's
> not clear to me how I would do the stress testing, for example.
> 
> IMHO we should aim to have in-tree clients when possible, even if some
> external tools can do more advanced stuff etc.
> 
> This however reminds me of my question: does the summarizer provide the
> right interface(s) for the external tools? One option is to do pg_basebackup
> and then parse the incremental files, but is that suitable for the
> external tools, or should there be a more convenient way?

Running a pg_basebackup to get the incremental changes would not be at 
all satisfactory. Luckily there are the 
pg_wal_summary_contents()/pg_available_wal_summaries() functions, which 
seem to provide the required information. I have not played with them 
much but I think they will do the trick.

They are pretty awkward to work with since they are essentially 
time-series data but what you'd really want, I think, is the ability to 
get page changes for a particular relfileid/segment.
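
To illustrate the kind of regrouping meant here, below is a rough sketch (not anything that exists in the tree) of how a client might turn the per-summary rows into per-relfilenode/segment block lists. It assumes the rows come from a query like the one in SUMMARY_QUERY, that the column names match what pg_wal_summary_contents() returns in the current development branch, and that the server uses the default 8 kB block and 1 GB segment sizes. The function name group_changes and the constant names are hypothetical.

```python
from collections import defaultdict

# Assumption: default build (8 kB blocks, 1 GB segments => 131072 blocks each).
BLOCKS_PER_SEGMENT = 131072

# Assumed query shape; column names should be checked against the server docs.
SUMMARY_QUERY = """
SELECT c.relfilenode, c.relforknumber, c.relblocknumber, c.is_limit_block
  FROM pg_available_wal_summaries() s,
       pg_wal_summary_contents(s.tli, s.start_lsn, s.end_lsn) c
"""

def group_changes(rows):
    """Regroup time-ordered summary rows by relation file segment.

    rows: iterable of (relfilenode, relforknumber, relblocknumber,
    is_limit_block) tuples, e.g. fetched with SUMMARY_QUERY.

    Returns (changed, limits):
      changed maps (relfilenode, fork, segment_no) to a sorted list of
              block offsets within that segment;
      limits  maps (relfilenode, fork) to the smallest limit block seen.
    Note: a real tool would also key on tablespace and database, since
    relfilenode alone is not unique across them.
    """
    changed = defaultdict(set)
    limits = {}
    for relfilenode, fork, block, is_limit in rows:
        if is_limit:
            key = (relfilenode, fork)
            limits[key] = min(block, limits.get(key, block))
        else:
            segment_no = block // BLOCKS_PER_SEGMENT
            changed[(relfilenode, fork, segment_no)].add(block % BLOCKS_PER_SEGMENT)
    return {k: sorted(v) for k, v in changed.items()}, limits
```

The grouping is done client-side only because the functions return one row per modified block per summary file; a lookup keyed by relfilenode/segment on the server would make this step unnecessary.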

Regards,
-David


