Re: Warn when parallel restoring a custom dump without data offsets

Поиск
Список
Период
Сортировка
От David Gilman
Тема Re: Warn when parallel restoring a custom dump without data offsets
Дата
Msg-id 20200525185429.6pp5vdvfu3vxlwac@dar.local
обсуждение исходный текст
Ответ на Re: Warn when parallel restoring a custom dump without data offsets  (Justin Pryzby <pryzby@telsasoft.com>)
Ответы Re: Warn when parallel restoring a custom dump without data offsets  (David Gilman <dgilman@gilslotd.com>)
Re: Warn when parallel restoring a custom dump without data offsets  (Justin Pryzby <pryzby@telsasoft.com>)
Список pgsql-hackers
Updated patches are attached, I ditched the gmail web interface so
hopefully this works.

Not mentioned in Justin's feedback: I dropped the extra sort in the test
as it's no longer necessary. I also added a parallel dump -> parallel
restore -> dump test run for the directory format to get some free test
coverage.

On Sat, May 23, 2020 at 05:47:51PM -0500, Justin Pryzby wrote:
> I'm not sure this will be considered a bugfix, since the behavior is known.
> Maybe eligible for backpatch though (?)

I'm not familiar with how your release management works, but I'm
personally fine with whatever version you can get it into. I urge you to
try landing this as soon as possible. The minimum reproducible example
in the test case is very minimal and I imagine all real world databases
are going to trigger this.

> I would have thought to mention the seeks() ; but it's true that the read()s now
> grow quadratically.  I did run a test, but I don't know how many objects would
> be unreasonable or how many it'd take to show a problem.

And I misunderstood how bad it was. I thought it was reading little
header structs off the disk but it's actually reading the entire table
(see _skipData). So you're quadratically rereading entire tables and
thrashing your cache. Oops.

> Maybe we should avoid fseeko(0, SEEK_SET) unless we need to wrap around after
> EOF - I'm not sure.

The seek location is already the location of the end of the last good
object so just adding wraparound gives the good algorithmic performance
from the technique in commit 42f70cd9c. I’ve gone ahead and implemented
this.

> Is it possible to dump to stdout (or pipe to cat or dd) to avoid a new option ?

The underlying IPC::Run code seems to support piping in a cross-platform
way. I am not a Perl master though and after spending an evening trying
to get it to work I went with this approach. If you can put me in touch
with anyone to help me out here I'd appreciate it.

-- 
David Gilman  :DG<
https://gilslotd.com

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Jeff Davis
Дата:
Сообщение: Re: Trouble with hashagg spill I/O pattern and costing
Следующее
От: Mark Dilger
Дата:
Сообщение: Re: some grammar refactoring