Re: Refactoring the checkpointer's fsync request queue

Поиск
Список
Период
Сортировка
От Thomas Munro
Тема Re: Refactoring the checkpointer's fsync request queue
Дата
Msg-id CA+hUKG+=-p4oiNC=U6yjtDigAvTXxGbe2__c=cVSxdxZYBxTPQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Refactoring the checkpointer's fsync request queue  (Shawn Debnath <sdn@amazon.com>)
Ответы Re: Refactoring the checkpointer's fsync request queue  (Thomas Munro <thomas.munro@gmail.com>)
Список pgsql-hackers
On Thu, Apr 4, 2019 at 10:44 AM Shawn Debnath <sdn@amazon.com> wrote:
> On Thu, Apr 04, 2019 at 01:08:31AM +1300, Thomas Munro wrote:
> > As a sanity check on the programming interface this thing gives you, I
> > tried teaching the SLRUs to use the fsync queue.  I finished up making
> > a few small improvements, but the main thing I learned is that
> > "handler" needs to be part of the hash table key.  I suppose the
> > discriminator could even be inside FileTag itself, but I chose to keep
> > it separate and introduce a new struct to hold hander enum + FileTag
> > in the hash table.
>
> I think this is fine, but can you elaborate a bit more on why we need to
> include the handler in the key for the hash table? We are de-duping
> relfilenodes here and these should never collide with files from another
> component. The component separation would be encoded in the RelFileNode
> that the target smgr, or SLRU in this case, would be able to decipher.
> Do you foresee, or know of, use cases where FileTag alone will result in
> conflicts on the same file but from different handlers?

Well it depends how we think of the FileTag namespace.  I didn't worry
before because md.c and my development version of undofile.c generate
FileTag objects that never collide, because undofile.c sets the dbNode
to invalid (and in earlier versions had a special magic DB OID), while
md.c always uses valid values.  But when I did the SLRU experiment
with that 0002 patch, I figured it was reasonable to want to set only
the segment number part of the FileTag.  Then sync requests for
pg_xact, pg_multixact/offsets and pg_multixact/members got merged.
Oops.

As you say, we could decide to assign bogus reserved dbNode values to
keep them the keyspaces apart and not allow future modules to deviate
from using a RelFileNode in the tag, but we already more-or-less
decided in another thread that we don't want to do that for buffer
tags.  People seemed keen to see another discriminator for smgr ID,
rather than requiring eg md.c and undo.c to use non-colliding
RelFileNode values via a fake dbNode scheme.  This problem is
essentially the same, except that we decided we want the set of sync
handlers to be potentially larger than the set of smgr handlers.  For
example, until we have a patch that moves SLRUs into shared buffers,
SLRUs don't have SMGR IDs, and get as the 0002 patch shows, they could
theoretically still benefit from handing off fsync work; there may be
other cases like that.  Generally, the question is similar in both
places: (1) do sync handlers each get their own separate namespace of
FileTags?, (2) do smgrs get their own namespace of BufferTag?  Perhaps
that is an argument for putting the sync handler number *inside* the
FileTag, since we currently intend to do that with smgr IDs in
BufferTag (stealing space from ForkNumber).

> +/*
> + * Populate a file tag describing a segment file.  We only use the segment
> + * number, since we can derive everything else we need by having separate
> + * sync handler functions for clog, multixact etc.
> + */
> +#define INIT_SLRUFILETAG(a,xx_segno) \
> +( \
> +       (a).rnode.spcNode = 0, \
> +       (a).rnode.dbNode = 0, \
> +       (a).rnode.relNode = 0, \
> +       (a).forknum = 0, \
> +       (a).segno = (xx_segno) \
> +)
>
> Based on the definition of INIT_SLRUFILETAG in your patch, it seems you
> are trying to only use the segno to identify the file. Not sure why we
> can't use other unused fields in FileTag to identify the component?  You
> could for example in the current SLRU implementation have the handler
> set to SYNC_HANDLER_SLRU when invoking RegisterSyncRequest() and use
> relNode to distinguish between each SLRU component in a wrapper function
> and call slrusyncfiletag() with the right SlruCtl.

Yes, that would work too, though then slru.c would have to know about
clog, multixact etc so it could find their SlruCtlData objects, which
are currently static in the eg clog.c, multixact.c etc.  That's why I
put a tiny callback into each of those that could call slru.c to do
the work.  I'm not really proposing anything here, and I understand
that you might want to refactor this stuff completely, I just wanted a
quick sanity check of *something* else using this interface, other
than md.c and my undo patch, in a particular a
thing-that-isn't-connected-to-smgr.  If it turns out to be useful for
later work, good, if not, I won't be sad.  I also suppressed the urge
to teach it to use fd.c files and keep them open in a small cache --
all topics for future threads.

> For the SLRU to buffer cache work, I was planning on using the relNode
> field to identify which specific component this tag belongs to.  dbNode
> would be pointing to the type of smgr (as discussed earlier in the
> thread and still TBD).

I look forward to the patches :-)  But as mentioned, there was some
serious push-back on hijacking dbNode like that.

> > The 0001 patch is what I'm going to commit soon.  I don't know why
> > Shawn measured a performance change -- a bug in an earlier version? --
> > I can't see it, but I'm planning to look into that a bit more first.
> > I've attached the 0002 SLRU patch for interest, but I'm not planning
> > to commit that one.
>
> Question is which block of code did you measure? I can redo the
> instrumentation on the latest patch and re-validate and share the
> results. I previously measured the average time it took mdsync() and
> ProcessSyncRequests() in the patch to complete under similar workloads.

Ok.  Let me try that again here too.

-- 
Thomas Munro
https://enterprisedb.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: explain plans with information about (modified) gucs
Следующее
От: Raymond Martin
Дата:
Сообщение: RE: minimizing pg_stat_statements performance overhead