Re: Show WAL write and fsync stats in pg_stat_io

From: Melanie Plageman
Subject: Re: Show WAL write and fsync stats in pg_stat_io
Date:
Msg-id: CAAKRu_ZDwdvg64=HHVk8f7tOaFK_rQugfWoZBVqjOtLao3wP4w@mail.gmail.com
In reply to: Re: Show WAL write and fsync stats in pg_stat_io  (Nazir Bilal Yavuz <byavuz81@gmail.com>)
Responses: Re: Show WAL write and fsync stats in pg_stat_io  (Michael Paquier <michael@paquier.xyz>)
List: pgsql-hackers
I have code review feedback as well, but I've saved that for my next email.

On Wed, Jan 3, 2024 at 8:11 AM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
>
> On Sun, 31 Dec 2023 at 03:58, Michael Paquier <michael@paquier.xyz> wrote:
> >
> > On Tue, Dec 26, 2023 at 03:35:52PM +0300, Nazir Bilal Yavuz wrote:
> > > On Tue, 26 Dec 2023 at 13:10, Michael Paquier <michael@paquier.xyz> wrote:
> >> I am not sure, while the whole point of the exercise is to have all the
> > >> I/O related data in a single view.  Something that I've also found a
> > >> bit disturbing yesterday while looking at your patch is the fact that
> > >> the operation size is guessed from the context and object type when
> > >> querying the view because now everything is tied to BLCKSZ.  This
> > >> patch extends it with two more operation sizes, and there are even
> > >> cases where it may be a variable.  Could it be a better option to
> > >> extend pgstat_count_io_op_time() so as callers can themselves give the
> > >> size of the operation?
> > >
> > > Do you mean removing the op_bytes column and tracking the number of
> > > bytes in reads, writes, and extends? If so, that makes sense to me but
> > > I don't want to remove the number of operations; I believe that has a
> > > value too. We can extend the pgstat_count_io_op_time() so it can both
> > > track the number of bytes and the number of operations.
> >
> > Apologies if my previous wording sounded confusing.  The idea I had in
> > mind was to keep op_bytes in pg_stat_io, and extend it so that a value
> > of NULL (or 0, or -1) is a synonym for "writes", "extends" and "reads"
> > being counted as a number of bytes.
>
> Oh, I understand it now. Yes, that makes sense.
> I thought removing op_bytes completely (as you said, "This patch
> extends it with two more operation sizes, and there are even cases
> where it may be a variable") from the pg_stat_io view and then adding
> something like {read | write | extend}_bytes and {read | write |
> extend}_calls could be better, so that we don't lose any information.

Forgive me as I catch up on this thread.

Upthread, Michael says:

> I find the use of 1 in this context a bit confusing, because when
> referring to a counter at N, it can be understood as doing an
> operation N times,

I didn't understand this argument, so I'm not sure if I agree or
disagree with it.

I think these are the three proposals for handling WAL reads:

1) setting op_bytes to 1 and the number of reads is the number of bytes
2) setting op_bytes to XLOG_BLCKSZ and the number of reads is the
number of calls to pg_pread() or similar
3) setting op_bytes to NULL and the number of reads is the number of
calls to pg_pread() or similar

Looking at the patch, I think it is still doing 2.

It would be good to list all our options with pros and cons (if only
because they are a bit spread throughout the thread now).
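
Just to make sure I'm reading those options correctly, here is a made-up
example (the numbers are only for illustration): say WAL is read with a
single pg_pread() of 16 kB and XLOG_BLCKSZ is 8 kB. Under 1, op_bytes is 1
and reads goes up by 16384. Under 2, op_bytes is 8192 but reads goes up by
1, so op_bytes * reads undercounts by 8 kB. Under 3, op_bytes is NULL and
reads goes up by 1, and the byte total has to be tracked somewhere else.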

For an unpopular idea: we could add separate [IOOp]_bytes columns for
all those IOOps for which it would be relevant. It kind of stinks, but
it would give us the freedom to document exactly what a single IOOp
means for each combination of BackendType, IOContext, IOObject, and
IOOp (as relevant) and still have an accurate number in the *bytes
columns. Everyone will probably hate us if we do that, though,
especially because having bytes for the existing IOObjects is an
existing feature.
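
If we did go that route, I imagine the backing counters would look roughly
like the below -- a sketch only, written from memory against the current
PgStat_BktypeIO layout, and not something the patch does:

typedef struct PgStat_BktypeIO
{
    /* existing per-(object, context, op) operation counts and timings */
    PgStat_Counter counts[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
    PgStat_Counter times[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];

    /*
     * Hypothetical addition: per-(object, context, op) byte totals backing
     * the proposed [IOOp]_bytes columns. Only reads, writes, and extends
     * would ever set these.
     */
    PgStat_Counter bytes[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
} PgStat_BktypeIO;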

A separate question: suppose [1] goes in (to read WAL from WAL buffers
directly). Now, WAL reads are not from permanent storage anymore. Are
we only tracking permanent storage I/O in pg_stat_io? I also had this
question for some of the WAL receiver functions. Should we track any
I/O other than permanent storage I/O? Or did I miss this being
addressed upthread?

> > > Also, it is not directly related to this patch but vectored IO [1] is
> > > coming soon; so the number of operations could be wrong since vectored
> > > IO could merge a couple of operations.
> >
> > Hmm.  I have not checked this patch series so I cannot say for sure,
> > but we'd likely just want to track the number of bytes if a single
> > operation has a non-equal size rather than registering in pg_stat_io N
> > rows with different op_bytes, no?
>
> Yes, that is correct.

I do not like the idea of having basically GROUP BY op_bytes in the
view (if that is the suggestion).

In terms of what I/O we should track in a streaming/asynchronous
world, the options would be:

1) track read/write syscalls
2) track blocks of BLCKSZ submitted to the kernel
3) track bytes submitted to the kernel
4) track merged I/Os (after doing any merging in the application)

I think the debate was largely between 2 and 4. There was some
disagreement, but I think we landed on 2 because merging can happen at
many levels in the storage stack (even in the storage controller).
Distinguishing whether Postgres submitted 2 32kB I/Os or 8 8kB I/Os
could be useful while you are developing AIO, but I think it might be
confusing for the Postgres user trying to determine why their query is
slow. It probably makes the most sense to keep tracking in block-sized
units.
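
As a strawman (pseudocode only; not what the patch or the AIO patchset
does, and I am assuming the pgstat_count_io_op_n() counting helper here),
the difference between 2 and 4 for a merged write covering nblocks
buffers issued with a single pwritev() would be:

    /* option 2: count block-sized units submitted to the kernel */
    pgstat_count_io_op_n(IOOBJECT_RELATION, io_context, IOOP_WRITE, nblocks);

    /* option 4: count the single merged I/O instead */
    pgstat_count_io_op_n(IOOBJECT_RELATION, io_context, IOOP_WRITE, 1);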

No matter what solution we pick, you should get a correct number if
you multiply op_bytes by an IOOp's count (assuming nothing is NULL).
Or, rather, there should be some way of getting an accurate number of
bytes for the amount of a particular kind of I/O that has been done.
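
One way to get there is what Michael suggested upthread: let callers
report the size themselves. Roughly (a sketch of the signature only,
written from memory; the exact shape is up for debate and is not what
the patch currently does):

    void
    pgstat_count_io_op_time(IOObject io_object, IOContext io_context,
                            IOOp io_op, instr_time start_time,
                            uint32 cnt, uint64 bytes);

Then a WAL read of any size could bump the read count by 1 and the byte
count by however many bytes the pg_pread() actually returned.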

- Melanie


