Re: making relfilenodes 56 bits

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: making relfilenodes 56 bits
Дата
Msg-id 20220712170944.uyhlisplliyz2sir@awork3.anarazel.de
обсуждение исходный текст
Ответ на Re: making relfilenodes 56 bits  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: making relfilenodes 56 bits  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
Hi,

On 2022-07-12 09:51:12 -0400, Robert Haas wrote:
> On Mon, Jul 11, 2022 at 7:22 PM Andres Freund <andres@anarazel.de> wrote:
> > I guess I'm not enthused in duplicating the necessary knowledge in evermore
> > places. We've forgotten one of the magic incantations in the past, and needing
> > to find all the places that need to be patched is a bit bothersome.
> >
> > Perhaps we could add extract helpers out of durable_rename()?
> >
> > OTOH, I don't really see what we gain by keeping things out of the critical
> > section? It does seem good to have the temp-file creation/truncation and write
> > separately, but after that I don't think it's worth much to avoid a
> > PANIC. What legitimate issue does it avoid?
> 
> OK, so then I think we should just use durable_rename(). Here's a
> patch that does it that way. I briefly considered the idea of
> extracting helpers, but it doesn't seem worthwhile to me. There's not
> that much code in durable_rename() in the first place.

Cool.


> In this version, I also removed the struct padding, changed the limit
> on the number of entries to a nice round 64, and made some comment
> updates.

What does currently happen if we exceed that?

I wonder if we should just reference a new define generated by genbki.pl
documenting the number of relations that need to be tracked. Then we don't
need to maintain this manually going forward.


> I considered trying to go further and actually make the file
> variable-size, so that we never again need to worry about the limit on
> the number of entries, but I don't actually think that's a good idea.

Yea, I don't really see what we'd gain. For this stuff to change we need to
recompile anyway.


> If we were going to split up durable_rename(), the only intelligible
> split I can see would be to have a second version of the function, or
> a flag to the existing function, that caters to the situation where
> the old file is already known to have been fsync()'d.

I was thinking of something like durable_rename_prep() that'd fsync the
file/directories under their old names, and then durable_rename_exec() that
actually renames and then fsyncs.  But without a clear usecase...


> +    /* Write new data to the file. */
> +    pgstat_report_wait_start(WAIT_EVENT_RELATION_MAP_WRITE);
> +    if (write(fd, newmap, sizeof(RelMapFile)) != sizeof(RelMapFile))
...
> +    pgstat_report_wait_end();
> +

Not for this patch, but we eventually should move this sequence into a
wrapper. Perhaps combined with retry handling for short writes, the ENOSPC
stuff and an error message when the write fails. It's a bit insane how many
copies of this we have.


> diff --git a/src/include/utils/wait_event.h b/src/include/utils/wait_event.h
> index b578e2ec75..5d3775ccde 100644
> --- a/src/include/utils/wait_event.h
> +++ b/src/include/utils/wait_event.h
> @@ -193,7 +193,7 @@ typedef enum
>      WAIT_EVENT_LOGICAL_REWRITE_TRUNCATE,
>      WAIT_EVENT_LOGICAL_REWRITE_WRITE,
>      WAIT_EVENT_RELATION_MAP_READ,
> -    WAIT_EVENT_RELATION_MAP_SYNC,
> +    WAIT_EVENT_RELATION_MAP_RENAME,

Very minor nitpick: To me REPLACE would be a bit more accurate than RENAME,
since it includes fsync etc?

Greetings,

Andres Freund



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Следующее
От: Andres Freund
Дата:
Сообщение: Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)