Re: .ready and .done files considered harmful

From: Robert Haas
Subject: Re: .ready and .done files considered harmful
Date:
Msg-id: CA+TgmoYc84KaS=ieRM_AdpU+O_7DnbofXOiKVZ0+TFojFPNThg@mail.gmail.com
In reply to: Re: .ready and .done files considered harmful  ("Bossart, Nathan" <bossartn@amazon.com>)
Responses: Re: .ready and .done files considered harmful  ("Bossart, Nathan" <bossartn@amazon.com>)
List: pgsql-hackers
On Tue, Sep 7, 2021 at 1:28 PM Bossart, Nathan <bossartn@amazon.com> wrote:
> Thanks for chiming in.  The limit of 64 in the multiple-files-per-
> directory-scan approach was mostly arbitrary.  My earlier testing [0]
> with different limits didn't reveal any significant difference, but
> using a higher limit might yield a small improvement when there are
> several hundred thousand .ready files.  IMO increasing the limit isn't
> really worth it for this approach.  For 500,000 .ready files,
> ordinarily you'd need 500,000 directory scans.  When 64 files are
> archived for each directory scan, you need ~8,000 directory scans.
> With 128 files per directory scan, you need ~4,000.  With 256, you
> need ~2000.  The difference between 8,000 directory scans and 500,000
> is quite significant.  The difference between 2,000 and 8,000 isn't
> nearly as significant in comparison.

That's certainly true.

I guess what I don't understand about the multiple-files-per-directory-
scan implementation is what happens when something occurs that would
require the keep-trying-the-next-file approach to perform a forced
scan. It seems to me that you still need to force an immediate full
scan, because if the idea is that you want to, say, prioritize
archiving of new timeline files over any others, a cached list of
files that you should archive next doesn't accomplish that, just as
keeping on trying the next file in sequence doesn't accomplish that.

So I'm wondering if in the end the two approaches converge somewhat,
so that with either patch you get (1) some kind of optimization to
scan the directory less often, plus (2) some kind of notification
mechanism to tell you when you need to avoid applying that
optimization. If you wanted to, (1) could even include both batching
and then, when the batch is exhausted, trying files in sequence. I'm
not saying that's the way to go, but you could. In the end, it seems
less important that we do any particular thing here and more important
that we do something - but if prioritizing timeline history files is
important, then we have to preserve that behavior.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


