Обсуждение: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

Поиск
Список
Период
Сортировка

Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
"Drouvot, Bertrand"
Дата:
Hi hackers,

Please find attached a patch proposal for $SUBJECT.

The idea has been raised in [1] by Andres: it would allow to simplify even more the work done to
generate pg_stat_get_xact*() functions with Macros.

Indeed, with the reconciliation done in find_tabstat_entry() then all the pg_stat_get_xact*() functions
(making use of find_tabstat_entry()) now "look the same" (should they take into account live subtransactions or not).

Looking forward to your feedback,

Regards

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

[1]: https://www.postgresql.org/message-id/20230111225907.6el6c5j3hukizqxc%40awork3.anarazel.de
Вложения

Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
Kyotaro Horiguchi
Дата:
At Thu, 9 Feb 2023 11:38:18 +0100, "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> wrote in 
> Hi hackers,
> 
> Please find attached a patch proposal for $SUBJECT.
> 
> The idea has been raised in [1] by Andres: it would allow to simplify
> even more the work done to
> generate pg_stat_get_xact*() functions with Macros.
> 
> Indeed, with the reconciliation done in find_tabstat_entry() then all
> the pg_stat_get_xact*() functions
> (making use of find_tabstat_entry()) now "look the same" (should they
> take into account live subtransactions or not).
> 
> Looking forward to your feedback,

I like that direction.

Don't we rename PgStat_FunctionCounts to PgStat_FuncStatus, unifying
neighboring functions?

Why does find_tabstat_entry() copies the whole pending data and
performs subxaction summarization? The summarization is needed only by
few callers but now that cost is imposed to the all callers along with
additional palloc()/pfree() calls. That doesn't seem reasonable.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
"Drouvot, Bertrand"
Дата:
Hi,

On 2/10/23 3:32 AM, Kyotaro Horiguchi wrote:
> At Thu, 9 Feb 2023 11:38:18 +0100, "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> wrote in
>> Hi hackers,
>>
>> Please find attached a patch proposal for $SUBJECT.
>>
>> The idea has been raised in [1] by Andres: it would allow to simplify
>> even more the work done to
>> generate pg_stat_get_xact*() functions with Macros.
>>
>> Indeed, with the reconciliation done in find_tabstat_entry() then all
>> the pg_stat_get_xact*() functions
>> (making use of find_tabstat_entry()) now "look the same" (should they
>> take into account live subtransactions or not).
>>
>> Looking forward to your feedback,
> 
> I like that direction.
> 

Thanks for looking at it!

> Don't we rename PgStat_FunctionCounts to PgStat_FuncStatus, unifying
> neighboring functions?
> 

Not sure, I think it's the counter part of PgStat_TableCounts for example.

> Why does find_tabstat_entry() copies the whole pending data and
> performs subxaction summarization? 

It copies the pending data to not increment it's counters while doing the summarization.
The summarization was done here to avoid the pg_stat_get_xact*() functions to do the computation so that all
the pg_stat_get_xact*() functions look the same but....

> The summarization is needed only by
> few callers but now that cost is imposed to the all callers along with
> additional palloc()/pfree() calls. That doesn't seem reasonable.
> 

I agree that's not the best approach.....

Let me come back with another proposal (thinking to increment reconciled
counters in pgstat_count_heap_insert(), pgstat_count_heap_delete() and
pgstat_count_heap_update()).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
Andres Freund
Дата:
: Andres Freund <andres@anarazel.de>
References: <b9e1f543-ee93-8168-d530-d961708ad9d3@gmail.com>
 <20230210.113242.699878230551547182.horikyota.ntt@gmail.com>
 <5420b28c-d33f-d25d-9f47-b06b8a2372ba@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <5420b28c-d33f-d25d-9f47-b06b8a2372ba@gmail.com>

Hi,

On 2023-02-10 16:50:32 +0100, Drouvot, Bertrand wrote:
> On 2/10/23 3:32 AM, Kyotaro Horiguchi wrote:
> > The summarization is needed only by
> > few callers but now that cost is imposed to the all callers along with
> > additional palloc()/pfree() calls. That doesn't seem reasonable.
> > 
> 
> I agree that's not the best approach.....

I think it's completely fine to do unnecessary reconciliation for the _xact_
functions. They're not that commonly used, and very rarely is there a huge
number of relations with lots of pending data across lots of subtransactions.


> Let me come back with another proposal (thinking to increment reconciled
> counters in pgstat_count_heap_insert(), pgstat_count_heap_delete() and
> pgstat_count_heap_update()).

Those are the performance crucial functions, we shouldn't do any additional
work there if we can avoid it. Shifting cost from the "looking at
transactional stats" side to the collecting stats side is the opposite of what
we should.

Greetings,

Andres Freund



Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
Andres Freund
Дата:
Hi,

On 2023-02-09 11:38:18 +0100, Drouvot, Bertrand wrote:
> Please find attached a patch proposal for $SUBJECT.
> 
> The idea has been raised in [1] by Andres: it would allow to simplify even more the work done to
> generate pg_stat_get_xact*() functions with Macros.

Thanks!

I think this is useful beyond being able to generate those functions with
macros. The fact that we had to deal with transactional code in pgstatfuncs.c
meant that a lot of the relevant itnernals had to be exposed "outside" pgstat,
which seems architecturally not great.


> Indeed, with the reconciliation done in find_tabstat_entry() then all the pg_stat_get_xact*() functions
> (making use of find_tabstat_entry()) now "look the same" (should they take into account live subtransactions or
not).

I'm not bothered by making all of pg_stat_get_xact* functions more expensive,
they're not a hot code path. But if we need to, we could just add a parameter
to find_tabstat_entry() indicating whether we need to reconcile or not.


>      /* save stats for this function, later used to compensate for recursion */
> -    fcu->save_f_total_time = pending->f_counts.f_total_time;
> +    fcu->save_f_total_time = pending->f_total_time;
>  
>      /* save current backend-wide total time */
>      fcu->save_total = total_func_time;

The diff is noisy due to all the mechanical changes like the above. Could that
be split into a separate commit?


>  find_tabstat_entry(Oid rel_id)
>  {
>      PgStat_EntryRef *entry_ref;
> +    PgStat_TableStatus *tablestatus = NULL;
>  
>      entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, MyDatabaseId, rel_id);
>      if (!entry_ref)
>          entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, InvalidOid, rel_id);
>  
>      if (entry_ref)
> -        return entry_ref->pending;
> -    return NULL;
> +    {
> +        PgStat_TableStatus *tabentry = (PgStat_TableStatus *) entry_ref->pending;

I'd add an early return for the !entry_ref case, that way you don't need to
indent the bulk of the function.


> +        PgStat_TableXactStatus *trans;
> +
> +        tablestatus  = palloc(sizeof(PgStat_TableStatus));
> +        memcpy(tablestatus, tabentry, sizeof(PgStat_TableStatus));

For things like this I'd use
  *tablestatus = *tabentry;

that way the compiler will warn you about mismatching types, and you don't
need the sizeof().


> +        /* live subtransactions' counts aren't in t_counts yet */
> +        for (trans = tabentry->trans; trans != NULL; trans = trans->upper)
> +        {
> +            tablestatus->t_counts.t_tuples_inserted += trans->tuples_inserted;
> +            tablestatus->t_counts.t_tuples_updated += trans->tuples_updated;
> +            tablestatus->t_counts.t_tuples_deleted += trans->tuples_deleted;
> +        }
> +    }

Hm, why do we end uup with t_counts still being used here, but removed other
places?


> diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
> index 6737493402..40a6fbf871 100644
> --- a/src/backend/utils/adt/pgstatfuncs.c
> +++ b/src/backend/utils/adt/pgstatfuncs.c
> @@ -1366,7 +1366,10 @@ pg_stat_get_xact_numscans(PG_FUNCTION_ARGS)
>      if ((tabentry = find_tabstat_entry(relid)) == NULL)
>          result = 0;
>      else
> +    {
>          result = (int64) (tabentry->t_counts.t_numscans);
> +        pfree(tabentry);
> +    }
>  
>      PG_RETURN_INT64(result);
>  }

I don't think we need to bother with individual pfrees in this path. The
caller will call the function in a dedicated memory context, that'll be reset
very soon after this.

Greetings,

Andres Freund



Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
"Drouvot, Bertrand"
Дата:
Hi,

On 2/10/23 10:46 PM, Andres Freund wrote:
> Hi,
> 
> On 2023-02-09 11:38:18 +0100, Drouvot, Bertrand wrote:
>> Please find attached a patch proposal for $SUBJECT.
>>
>> The idea has been raised in [1] by Andres: it would allow to simplify even more the work done to
>> generate pg_stat_get_xact*() functions with Macros.
> 
> Thanks!
> 

Thanks for looking at it!

> I think this is useful beyond being able to generate those functions with
> macros. The fact that we had to deal with transactional code in pgstatfuncs.c
> meant that a lot of the relevant itnernals had to be exposed "outside" pgstat,
> which seems architecturally not great.
> 
  
Right, good point.

>> Indeed, with the reconciliation done in find_tabstat_entry() then all the pg_stat_get_xact*() functions
>> (making use of find_tabstat_entry()) now "look the same" (should they take into account live subtransactions or
not).
> 
> I'm not bothered by making all of pg_stat_get_xact* functions more expensive,
> they're not a hot code path. But if we need to, we could just add a parameter
> to find_tabstat_entry() indicating whether we need to reconcile or not.
> 

I think that's a good idea to avoid doing extra work if not needed.
V2 adds such a bool.

>>       /* save stats for this function, later used to compensate for recursion */
>> -    fcu->save_f_total_time = pending->f_counts.f_total_time;
>> +    fcu->save_f_total_time = pending->f_total_time;
>>   
>>       /* save current backend-wide total time */
>>       fcu->save_total = total_func_time;
> 
> The diff is noisy due to all the mechanical changes like the above. Could that
> be split into a separate commit?
> 

Fully agree, the PgStat_BackendFunctionEntry stuff will be done in a separate patch.

> 
>>   find_tabstat_entry(Oid rel_id)
>>   {
>>       PgStat_EntryRef *entry_ref;
>> +    PgStat_TableStatus *tablestatus = NULL;
>>   
>>       entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, MyDatabaseId, rel_id);
>>       if (!entry_ref)
>>           entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, InvalidOid, rel_id);
>>   
>>       if (entry_ref)
>> -        return entry_ref->pending;
>> -    return NULL;
>> +    {
>> +        PgStat_TableStatus *tabentry = (PgStat_TableStatus *) entry_ref->pending;
> 
> I'd add an early return for the !entry_ref case, that way you don't need to
> indent the bulk of the function.
> 

Good point, done in V2.

> 
>> +        PgStat_TableXactStatus *trans;
>> +
>> +        tablestatus  = palloc(sizeof(PgStat_TableStatus));
>> +        memcpy(tablestatus, tabentry, sizeof(PgStat_TableStatus));
> 
> For things like this I'd use
>    *tablestatus = *tabentry;
> 
> that way the compiler will warn you about mismatching types, and you don't
> need the sizeof().
> 
> 

Good point, done in V2.

>> +        /* live subtransactions' counts aren't in t_counts yet */
>> +        for (trans = tabentry->trans; trans != NULL; trans = trans->upper)
>> +        {
>> +            tablestatus->t_counts.t_tuples_inserted += trans->tuples_inserted;
>> +            tablestatus->t_counts.t_tuples_updated += trans->tuples_updated;
>> +            tablestatus->t_counts.t_tuples_deleted += trans->tuples_deleted;
>> +        }
>> +    }
> 
> Hm, why do we end uup with t_counts still being used here, but removed other
> places?
> 

t_counts are not removed, maybe you are confused with the "f_counts" that were removed
in V1 due to the PgStat_BackendFunctionEntry related changes?

> 
>> diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
>> index 6737493402..40a6fbf871 100644
>> --- a/src/backend/utils/adt/pgstatfuncs.c
>> +++ b/src/backend/utils/adt/pgstatfuncs.c
>> @@ -1366,7 +1366,10 @@ pg_stat_get_xact_numscans(PG_FUNCTION_ARGS)
>>       if ((tabentry = find_tabstat_entry(relid)) == NULL)
>>           result = 0;
>>       else
>> +    {
>>           result = (int64) (tabentry->t_counts.t_numscans);
>> +        pfree(tabentry);
>> +    }
>>   
>>       PG_RETURN_INT64(result);
>>   }
> 
> I don't think we need to bother with individual pfrees in this path. The
> caller will call the function in a dedicated memory context, that'll be reset
> very soon after this.

Oh right, the palloc is done in the ExprContext memory context that is reset soon after.
Removing the pfrees in V2 attached.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Вложения

Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
Kyotaro Horiguchi
Дата:
At Mon, 13 Feb 2023 08:09:50 +0100, "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> wrote in 
>> I think this is useful beyond being able to generate those functions
>> with
>> macros. The fact that we had to deal with transactional code in
>> pgstatfuncs.c
>> meant that a lot of the relevant itnernals had to be exposed "outside"
>> pgstat,
>> which seems architecturally not great.
>> 
> Right, good point.

Agreed.

> Removing the pfrees in V2 attached.

Ah, that sound good.

     if (!entry_ref)
+    {
         entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, InvalidOid, rel_id);
+        return tablestatus;
+    }

We should return something if the call returns a non-null value?

So, since we want to hide the internal from pgstatfuncs, the
additional flag should be gone. If the additional cost doesn't bother
anyone, I don't mind to remove the flag.  The patch will get far
simpler by that.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
"Drouvot, Bertrand"
Дата:
Hi,

On 2/13/23 8:40 AM, Kyotaro Horiguchi wrote:
> At Mon, 13 Feb 2023 08:09:50 +0100, "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> wrote in
>>> I think this is useful beyond being able to generate those functions
>>> with
>>> macros. The fact that we had to deal with transactional code in
>>> pgstatfuncs.c
>>> meant that a lot of the relevant itnernals had to be exposed "outside"
>>> pgstat,
>>> which seems architecturally not great.
>>>
>> Right, good point.
> 
> Agreed.
> 
>> Removing the pfrees in V2 attached.
> 
> Ah, that sound good.
> 
>       if (!entry_ref)
> +    {
>           entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, InvalidOid, rel_id);
> +        return tablestatus;
> +    }
> 
> We should return something if the call returns a non-null value?

What we do is: if entry_ref is NULL then we return NULL (so that the caller returns 0).

If entry_ref is not NULL then we return a copy of entry_ref->pending (with or without subtrans).

> 
> So, since we want to hide the internal from pgstatfuncs, the
> additional flag should be gone. 

I think there is pros and cons for both but I don't have a strong opinion about that.

So also proposing V3 attached without the flag in case this is the preferred approach.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Вложения

Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
Kyotaro Horiguchi
Дата:
Thanks for the new version.

At Mon, 13 Feb 2023 09:58:52 +0100, "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> wrote in 
> Hi,
> 
> On 2/13/23 8:40 AM, Kyotaro Horiguchi wrote:
> > At Mon, 13 Feb 2023 08:09:50 +0100, "Drouvot, Bertrand"
> > <bertranddrouvot.pg@gmail.com> wrote in
> >>> I think this is useful beyond being able to generate those functions
> >>> with
> >>> macros. The fact that we had to deal with transactional code in
> >>> pgstatfuncs.c
> >>> meant that a lot of the relevant itnernals had to be exposed "outside"
> >>> pgstat,
> >>> which seems architecturally not great.
> >>>
> >> Right, good point.
> > Agreed.
> > 
> >> Removing the pfrees in V2 attached.
> > Ah, that sound good.
> >       if (!entry_ref)
> > +    {
> >           entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION,
> >           InvalidOid, rel_id);
> > +        return tablestatus;
> > +    }
> > We should return something if the call returns a non-null value?
> 
> What we do is: if entry_ref is NULL then we return NULL (so that the
> caller returns 0).
> 
> If entry_ref is not NULL then we return a copy of entry_ref->pending
> (with or without subtrans).

Isn't it ignoring the second call to pgstat_fetch_pending_entry?

What the code did is: if entry_ref is NULL for MyDatabaseId then we
*retry* fetching an global (not database-wise) entry. If any global
entry is found, return it (correctly entry_ref->pending) to the caller.

The current patch returns NULL when a glboal entry is found.

I thought that we might be able to return entry_ref->pending since the
callers don't call pfree on the returned pointer, but it is not great
that we don't inform the callers if the returned memory can be safely
pfreed or not.

Thus what I have in mind is the following.

>       if (!entry_ref)
> +    {
>           entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION,
>           InvalidOid, rel_id);
> +        if (!entry_ref)
> +         return NULL;
> +    }



> > So, since we want to hide the internal from pgstatfuncs, the
> > additional flag should be gone. 
> 
> I think there is pros and cons for both but I don't have a strong
> opinion about that.
> 
> So also proposing V3 attached without the flag in case this is the
> preferred approach.

That part looks good to me.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
"Drouvot, Bertrand"
Дата:
Hi,

On 2/14/23 7:11 AM, Kyotaro Horiguchi wrote:
> 
> Isn't it ignoring the second call to pgstat_fetch_pending_entry?
> 

Oh right, my bad (the issue has been introduced in V2).
Fixed in V4.

> I thought that we might be able to return entry_ref->pending since the
> callers don't call pfree on the returned pointer, but it is not great
> that we don't inform the callers if the returned memory can be safely
> pfreed or not.
> 
> Thus what I have in mind is the following.
> 
>>        if (!entry_ref)
>> +    {
>>            entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION,
>>            InvalidOid, rel_id);
>> +        if (!entry_ref)
>> +         return NULL;
>> +    }

LGTM, done that way in V4.

> 
> 
> 
>>> So, since we want to hide the internal from pgstatfuncs, the
>>> additional flag should be gone.
>>
>> I think there is pros and cons for both but I don't have a strong
>> opinion about that.
>>
>> So also proposing V3 attached without the flag in case this is the
>> preferred approach.
> 
> That part looks good to me.
> 

Thanks!

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Вложения

Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
Kyotaro Horiguchi
Дата:
At Tue, 14 Feb 2023 15:43:26 +0100, "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> wrote in 
> Oh right, my bad (the issue has been introduced in V2).
> Fixed in V4.

Great!

> > I thought that we might be able to return entry_ref->pending since the
> > callers don't call pfree on the returned pointer, but it is not great
> > that we don't inform the callers if the returned memory can be safely
> > pfreed or not.
> > Thus what I have in mind is the following.
> > 
> >>        if (!entry_ref)
> >> +    {
> >>            entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION,
> >>            InvalidOid, rel_id);
> >> +        if (!entry_ref)
> >> +         return NULL;
> >> +    }
> 
> LGTM, done that way in V4.

That part looks good to me, thanks!

I was going through v4 and it seems to me that the comment for
find_tabstat_entry may not be quite right.

> * find any existing PgStat_TableStatus entry for rel
> *
> * Find any existing PgStat_TableStatus entry for rel_id in the current
> * database. If not found, try finding from shared tables.
> *
> * If no entry found, return NULL, don't create a new one

The comment assumed that the function directly returns an entry from
shared memory, but now it copies the entry's contents into a palloc'ed
memory and stores the sums of some counters for the current
transaction in it. Do you think we should update the comment to
reflect this change?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
"Drouvot, Bertrand"
Дата:
Hi,

On 2/15/23 1:56 AM, Kyotaro Horiguchi wrote:
> At Tue, 14 Feb 2023 15:43:26 +0100, "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> wrote in
> 
> The comment assumed that the function directly returns an entry from
> shared memory, but now it copies the entry's contents into a palloc'ed
> memory and stores the sums of some counters for the current
> transaction in it. Do you think we should update the comment to
> reflect this change?
> 

Good point, thanks! Yeah, definitively, done in V5 attached.
  
Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Вложения

Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
Andres Freund
Дата:
Hi,

On 2023-02-15 09:21:48 +0100, Drouvot, Bertrand wrote:
> diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
> index f793ac1516..b26e2a5a7a 100644
> --- a/src/backend/utils/activity/pgstat_relation.c
> +++ b/src/backend/utils/activity/pgstat_relation.c
> @@ -471,20 +471,46 @@ pgstat_fetch_stat_tabentry_ext(bool shared, Oid reloid)
>   * Find any existing PgStat_TableStatus entry for rel_id in the current
>   * database. If not found, try finding from shared tables.
>   *
> + * If an entry is found, copy it and increment the copy's counters with their
> + * subtransactions counterparts. Then return the copy. There is no need for the
> + * caller to pfree the copy as the MemoryContext will be reset soon after.
> + *

The "There is no need" bit seems a bit off. Yes, that's true for the current
callers, but who says that it has to stay that way?

Otherwise this looks ready, on a casual scan.

Greetings,

Andres Freund



Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
"Drouvot, Bertrand"
Дата:
Hi,

On 2/16/23 10:21 PM, Andres Freund wrote:
> Hi,
> 
> On 2023-02-15 09:21:48 +0100, Drouvot, Bertrand wrote:
>> diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
>> index f793ac1516..b26e2a5a7a 100644
>> --- a/src/backend/utils/activity/pgstat_relation.c
>> +++ b/src/backend/utils/activity/pgstat_relation.c
>> @@ -471,20 +471,46 @@ pgstat_fetch_stat_tabentry_ext(bool shared, Oid reloid)
>>    * Find any existing PgStat_TableStatus entry for rel_id in the current
>>    * database. If not found, try finding from shared tables.
>>    *
>> + * If an entry is found, copy it and increment the copy's counters with their
>> + * subtransactions counterparts. Then return the copy. There is no need for the
>> + * caller to pfree the copy as the MemoryContext will be reset soon after.
>> + *
> 
> The "There is no need" bit seems a bit off. Yes, that's true for the current
> callers, but who says that it has to stay that way?
> 

Good point. Wording has been changed in V6 attached.

> Otherwise this looks ready, on a casual scan.
> 

Thanks for having looked at it!

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Вложения

Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
Michael Paquier
Дата:
On Mon, Mar 06, 2023 at 08:33:15AM +0100, Drouvot, Bertrand wrote:
> Thanks for having looked at it!

Looking at that, I have a few comments.

+    tabentry = (PgStat_TableStatus *) entry_ref->pending;
+    tablestatus = palloc(sizeof(PgStat_TableStatus));
+    *tablestatus = *tabentry;
+
[...]
+    for (trans = tabentry->trans; trans != NULL; trans = trans->upper)
+    {
+        tablestatus->t_counts.t_tuples_inserted += trans->tuples_inserted;
+        tablestatus->t_counts.t_tuples_updated += trans->tuples_updated;
+        tablestatus->t_counts.t_tuples_deleted += trans->tuples_deleted;
+    }

-    if (entry_ref)
-        return entry_ref->pending;
-    return NULL;
+    return tablestatus;

From what I get with this change, the number of tuples changed by DMLs
have their computations done a bit earlier, meaning that it would make
all the callers of find_tabstat_entry() pay the computation cost.
Still it is not really going to matter, because we will just do the
computation once when looking at any pending changes of
pg_stat_xact_all_tables for each entry.  There are 9 callers of
find_tabstat_entry, with 7 being used for pg_stat_xact_all_tables.
How much do we need to care about the remaining two callers
pg_stat_get_xact_blocks_fetched() and pg_stat_get_xact_blocks_hit()?
Could it be a problem if these two also pay the extra computation cost
if a transaction with many subtransactions (aka )needs to look at their
data?  These two are used nowhere, they have pg_proc entries and they
are undocumented, so it is hard to say the impact of this change on
them..

Second question: once the data from the subtransactions is copied,
would it be cleaner to set trans to NULL after the data copy is done?

It would feel a bit safer to me to document that find_tabstat_entry()
is currently only used for this xact system view..  The extra
computation could lead to surprises, actually, if this routine is used
outside this context?  Perhaps that's OK, but it does not give me a
warm feeling, just to reshape three functions of pgstatfuncs.c with
macros.
--
Michael

Вложения

Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
"Drouvot, Bertrand"
Дата:
Hi,

On 3/16/23 7:29 AM, Michael Paquier wrote:
> On Mon, Mar 06, 2023 at 08:33:15AM +0100, Drouvot, Bertrand wrote:
>> Thanks for having looked at it!
> 
> Looking at that, I have a few comments.
> 
> +    tabentry = (PgStat_TableStatus *) entry_ref->pending;
> +    tablestatus = palloc(sizeof(PgStat_TableStatus));
> +    *tablestatus = *tabentry;
> +
> [...]
> +    for (trans = tabentry->trans; trans != NULL; trans = trans->upper)
> +    {
> +        tablestatus->t_counts.t_tuples_inserted += trans->tuples_inserted;
> +        tablestatus->t_counts.t_tuples_updated += trans->tuples_updated;
> +        tablestatus->t_counts.t_tuples_deleted += trans->tuples_deleted;
> +    }
>   
> -    if (entry_ref)
> -        return entry_ref->pending;
> -    return NULL;
> +    return tablestatus;
> 
>  From what I get with this change, the number of tuples changed by DMLs
> have their computations done a bit earlier,

Thanks for looking at it!

Right, but note this is in a dedicated new tablestatus (created within find_tabstat_entry()).

> meaning that it would make
> all the callers of find_tabstat_entry() pay the computation cost.

Right. Another suggested approach was to add a flag but then we'd not really
hide the internal from pgstatfuncs.

> Still it is not really going to matter, because we will just do the
> computation once when looking at any pending changes of
> pg_stat_xact_all_tables for each entry. 

Yes.

> There are 9 callers of
> find_tabstat_entry, with 7 being used for pg_stat_xact_all_tables.

Right, those are:

pg_stat_get_xact_numscans()

pg_stat_get_xact_tuples_returned()

pg_stat_get_xact_tuples_fetched()

pg_stat_get_xact_tuples_inserted()

pg_stat_get_xact_tuples_updated()

pg_stat_get_xact_tuples_deleted()

pg_stat_get_xact_tuples_hot_updated()

> How much do we need to care about the remaining two callers
> pg_stat_get_xact_blocks_fetched() and pg_stat_get_xact_blocks_hit()?

Regarding pg_stat_get_xact_blocks_fetched() and pg_stat_get_xact_blocks_hit()
the callers (if any) are outside of the core PG (as from what I can see they are not used
at all).

I don't think we should pay any particular attention to those 2 ones as anyway nothing
prevent the 7 others to be called outside of the pg_stat_xact_all_tables view.

> Could it be a problem if these two also pay the extra computation cost
> if a transaction with many subtransactions (aka )needs to look at their
> data?  These two are used nowhere, they have pg_proc entries and they
> are undocumented, so it is hard to say the impact of this change on
> them..
> 

Right, and that's the same for the 7 others as nothing prevent them to be called outside
of the pg_stat_xact_all_tables view.

Do you think it would be better to add the extra flag then?

> Second question: once the data from the subtransactions is copied,
> would it be cleaner to set trans to NULL after the data copy is done?
> 

That would not hurt but I'm not sure it's worth it (well, it's currently
not done in pg_stat_get_xact_tuples_inserted() for example).

> It would feel a bit safer to me to document that find_tabstat_entry()
> is currently only used for this xact system view..  The extra
> computation could lead to surprises, actually, if this routine is used
> outside this context?  Perhaps that's OK, but it does not give me a
> warm feeling, just to reshape three functions of pgstatfuncs.c with
> macros.

That's a fair point. On the other hand those 9 functions (which can all be used outside
of the pg_stat_xact_all_tables view) are not documented, so I'm not sure this is that much of
a concern (and if we think it is we still gave the option to add an extra flag to indicate whether
or not the extra computation is needed.)

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
Michael Paquier
Дата:
On Thu, Mar 16, 2023 at 11:32:56AM +0100, Drouvot, Bertrand wrote:
> On 3/16/23 7:29 AM, Michael Paquier wrote:
>>  From what I get with this change, the number of tuples changed by DMLs
>> have their computations done a bit earlier,
>
> Thanks for looking at it!
>
> Right, but note this is in a dedicated new tablestatus (created
> within find_tabstat_entry()).

Sure, however it copies the pointer of the PgStat_TableXactStatus from
tabentry, isn't it?  This means that it keeps a reference of the chain
of subtransactions.  It does not matter for the functions but it could
for out-of-core callers of find_tabstat_entry(), no?  Perhaps you are
right and that's not worth worrying, still I don't feel particularly
confident that this is the best approach we can take.

>> How much do we need to care about the remaining two callers
>> pg_stat_get_xact_blocks_fetched() and pg_stat_get_xact_blocks_hit()?
>
> Regarding pg_stat_get_xact_blocks_fetched() and pg_stat_get_xact_blocks_hit()
> the callers (if any) are outside of the core PG (as from what I can
> see they are not used at all).
>
> I don't think we should pay any particular attention to those 2 ones
> as anyway nothing prevent the 7 others to be called outside of the
> pg_stat_xact_all_tables view.

I am not quite sure, TBH.  Did you look at the difference with a long
chain of subtrans, like savepoints?  The ODBC driver "loves" producing
a lot of savepoints, for example.

>> It would feel a bit safer to me to document that find_tabstat_entry()
>> is currently only used for this xact system view..  The extra
>> computation could lead to surprises, actually, if this routine is used
>> outside this context?  Perhaps that's OK, but it does not give me a
>> warm feeling, just to reshape three functions of pgstatfuncs.c with
>> macros.
>
> That's a fair point. On the other hand those 9 functions (which can
> all be used outside of the pg_stat_xact_all_tables view) are not
> documented, so I'm not sure this is that much of a concern (and if
> we think it is we still gave the option to add an extra flag to
> indicate whether or not the extra computation is needed.)

That's not quite exact, I think.  The first 7 functions are used in a
system catalog that is documented.  Still we have a problem here.  I
can actually see a few projects relying on these two functions while
looking a bit around, so they are used.  And the issue comes from
ddfc2d9, that has removed these functions from the documentation
ignoring that they are used in no system catalogs.  I think that we
should fix that and re-add the two missing functions with a proper
description in the docs, at least?  There is no trace of them.
Perhaps the ones exposted through pg_stat_xact_all_tables are fine if
not listed.
--
Michael

Вложения

Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
"Drouvot, Bertrand"
Дата:
On 3/16/23 12:46 PM, Michael Paquier wrote:
> On Thu, Mar 16, 2023 at 11:32:56AM +0100, Drouvot, Bertrand wrote:
>> On 3/16/23 7:29 AM, Michael Paquier wrote:
>>>   From what I get with this change, the number of tuples changed by DMLs
>>> have their computations done a bit earlier,
>>
>> Thanks for looking at it!
>>
>> Right, but note this is in a dedicated new tablestatus (created
>> within find_tabstat_entry()).
> 
> Sure, however it copies the pointer of the PgStat_TableXactStatus from
> tabentry, isn't it?  

Oh I see what you mean, yeah, the pointer is copied.

> This means that it keeps a reference of the chain
> of subtransactions.  It does not matter for the functions but it could
> for out-of-core callers of find_tabstat_entry(), no?

Yeah, maybe.

> Perhaps you are
> right and that's not worth worrying, still I don't feel particularly
> confident that this is the best approach we can take.
> 

due to what potential out-of-core callers could do with it?

>>> How much do we need to care about the remaining two callers
>>> pg_stat_get_xact_blocks_fetched() and pg_stat_get_xact_blocks_hit()?
>>
>> Regarding pg_stat_get_xact_blocks_fetched() and pg_stat_get_xact_blocks_hit()
>> the callers (if any) are outside of the core PG (as from what I can
>> see they are not used at all).
>>
>> I don't think we should pay any particular attention to those 2 ones
>> as anyway nothing prevent the 7 others to be called outside of the
>> pg_stat_xact_all_tables view.
> 
> I am not quite sure, TBH.  Did you look at the difference with a long
> chain of subtrans, like savepoints?  The ODBC driver "loves" producing
> a lot of savepoints, for example.
> 

No, I did not measure the impact.

>>> It would feel a bit safer to me to document that find_tabstat_entry()
>>> is currently only used for this xact system view..  The extra
>>> computation could lead to surprises, actually, if this routine is used
>>> outside this context?  Perhaps that's OK, but it does not give me a
>>> warm feeling, just to reshape three functions of pgstatfuncs.c with
>>> macros.
>>
>> That's a fair point. On the other hand those 9 functions (which can
>> all be used outside of the pg_stat_xact_all_tables view) are not
>> documented, so I'm not sure this is that much of a concern (and if
>> we think it is we still gave the option to add an extra flag to
>> indicate whether or not the extra computation is needed.)
> 
> That's not quite exact, I think.  The first 7 functions are used in a
> system catalog that is documented. 

Right.

> Still we have a problem here.  I
> can actually see a few projects relying on these two functions while
> looking a bit around, so they are used.  And the issue comes from
> ddfc2d9, that has removed these functions from the documentation
> ignoring that they are used in no system catalogs.  I think that we
> should fix that and re-add the two missing functions with a proper
> description in the docs, at least? 

As they could be/are used outside of the xact view, yes I think the same.

> There is no trace of them.
> Perhaps the ones exposted through pg_stat_xact_all_tables are fine if
> not listed.

I'd be tempted to add documentation for all of them, I can look at it.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
Michael Paquier
Дата:
On Thu, Mar 16, 2023 at 02:13:45PM +0100, Drouvot, Bertrand wrote:
> On 3/16/23 12:46 PM, Michael Paquier wrote:
>> There is no trace of them.
>> Perhaps the ones exposted through pg_stat_xact_all_tables are fine if
>> not listed.
>
> I'd be tempted to add documentation for all of them, I can look at it.

I am not sure that there is any need to completely undo ddfc2d9, later
simplified by 5f2b089, so my opinion would be to just add
documentation for the functions that can be used but are used in none
of the system functions.

Anyway, double-checking, I only see an inconsistency for these two,
confirming my first impression:
- pg_stat_get_xact_blocks_fetched
- pg_stat_get_xact_blocks_hit

There may be a point in having them in some of the system views, but
the non-xact flavors are only used in the statio views, which don't
really need xact versions AFAIK.  I am not sure that it makes much
sense to add them in pg_stat_xact_all_tables, either.  Another view is
just remove them, though some could rely on them externally.  At the
end, documenting both still sounds like the best move to me.
--
Michael

Вложения

Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
"Drouvot, Bertrand"
Дата:
Hi,

On 3/20/23 12:43 AM, Michael Paquier wrote:
> At the
> end, documenting both still sounds like the best move to me.

Agree.

Please find attached v1-0001-pg_stat_get_xact_blocks_fetched-and_hit-doc.patch doing so.

I did not put the exact same wording as the one being removed in ddfc2d9, as:

- For pg_stat_get_xact_blocks_hit(), I think it's better to be closer to say the
pg_statio_all_tables.heap_blks_hit definition.

- For pg_stat_get_xact_blocks_fetched(), I think that using "buffer" is better (than block) as at the
end it's related to pgstat_count_buffer_read().

At the end there is a choice to be made for both for the wording between block and buffer. Indeed their
counters get incremented in "buffer" macros while retrieved in those "blocks" functions.

"Buffer" sounds more appropriate to me, so the attached has been done that way.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Вложения

Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
Michael Paquier
Дата:
On Mon, Mar 20, 2023 at 11:57:31AM +0100, Drouvot, Bertrand wrote:
> "Buffer" sounds more appropriate to me, so the attached has been done that way.

This choice is OK for me.

> +        <indexterm>
> +         <primary>pg_stat_get_xact_blocks_fetched</primary>
> +        </indexterm>
> +        <function>pg_stat_get_xact_blocks_fetched</function> ( <type>oid</type> )
> +        <returnvalue>bigint</returnvalue>
> +       </para>
> +       <para>
> +        Returns the number of buffer fetches for table or index, in the current transaction

This should be "number of buffer fetched", no?

> +        </indexterm>
> +        <function>pg_stat_get_xact_blocks_hit</function> ( <type>oid</type> )
> +        <returnvalue>bigint</returnvalue>
> +       </para>
> +       <para>
> +        Returns the number of buffer hits for table or index, in the current transaction
> +       </para></entry>

This one looks OK to me too.
--
Michael

Вложения

Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
Kyotaro Horiguchi
Дата:
At Wed, 22 Mar 2023 10:16:12 +0900, Michael Paquier <michael@paquier.xyz> wrote in 
> On Mon, Mar 20, 2023 at 11:57:31AM +0100, Drouvot, Bertrand wrote:
> > "Buffer" sounds more appropriate to me, so the attached has been done that way.
> 
> This choice is OK for me.
> 
> > +        <indexterm>
> > +         <primary>pg_stat_get_xact_blocks_fetched</primary>
> > +        </indexterm>
> > +        <function>pg_stat_get_xact_blocks_fetched</function> ( <type>oid</type> )
> > +        <returnvalue>bigint</returnvalue>
> > +       </para>
> > +       <para>
> > +        Returns the number of buffer fetches for table or index, in the current transaction
> 
> This should be "number of buffer fetched", no?

In the original description, "buffer fetches" appears to be a plural
form of a compound noun and correct, similar to "buffer hits"
mentioned later. If we reword it, I think it should be "number of
buffers fetched".

> > +        </indexterm>
> > +        <function>pg_stat_get_xact_blocks_hit</function> ( <type>oid</type> )
> > +        <returnvalue>bigint</returnvalue>
> > +       </para>
> > +       <para>
> > +        Returns the number of buffer hits for table or index, in the current transaction
> > +       </para></entry>
> 
> This one looks OK to me too.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
Michael Paquier
Дата:
On Wed, Mar 22, 2023 at 11:37:03AM +0900, Kyotaro Horiguchi wrote:
> In the original description, "buffer fetches" appears to be a plural
> form of a compound noun and correct, similar to "buffer hits"
> mentioned later. If we reword it, I think it should be "number of
> buffers fetched".

Using the plural makes sense, yes.
--
Michael

Вложения

Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
"Drouvot, Bertrand"
Дата:
Hi,

On 3/22/23 5:45 AM, Michael Paquier wrote:
> On Wed, Mar 22, 2023 at 11:37:03AM +0900, Kyotaro Horiguchi wrote:
>> In the original description, "buffer fetches" appears to be a plural
>> form of a compound noun and correct, similar to "buffer hits"
>> mentioned later. If we reword it, I think it should be "number of
>> buffers fetched".
> 
> Using the plural makes sense, yes.

Yeah, "buffer fetches" is similar to "buffer hits".

For consistency, ISTM than renaming it to "buffers fetched" would also mean
renaming "buffer hits" to "buffers hit". But then it would not be consistent
with the documentation for things like pg_statio_all_tables.heap_blks_hit, idx_blks_hit, toast_blks_hit,
tidx_blks_hit, pg_statio_all_indexes.idx_blks_hit, pg_statio_all_sequences.blks_hit
where "Number of buffer hits" is used).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
"Drouvot, Bertrand"
Дата:
Hi,

On 3/22/23 7:44 AM, Drouvot, Bertrand wrote:
> Hi,
> 
> On 3/22/23 5:45 AM, Michael Paquier wrote:
>> On Wed, Mar 22, 2023 at 11:37:03AM +0900, Kyotaro Horiguchi wrote:
>>> In the original description, "buffer fetches" appears to be a plural
>>> form of a compound noun and correct, similar to "buffer hits"
>>> mentioned later. If we reword it, I think it should be "number of
>>> buffers fetched".
>>
>> Using the plural makes sense, yes.
> 
> Yeah, "buffer fetches" is similar to "buffer hits".
> 
> For consistency, ISTM than renaming it to "buffers fetched" would also mean
> renaming "buffer hits" to "buffers hit". But then it would not be consistent
> with the documentation for things like pg_statio_all_tables.heap_blks_hit, idx_blks_hit, toast_blks_hit,
> tidx_blks_hit, pg_statio_all_indexes.idx_blks_hit, pg_statio_all_sequences.blks_hit
> where "Number of buffer hits" is used).
> 

That said, please find enclosed V2 with "buffers fetched" suggested above (and no changes to
"buffer hits" to keep consistency with the other part of the documentation mentioned up-thread).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Вложения

Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
Michael Paquier
Дата:
On Wed, Mar 22, 2023 at 09:20:25AM +0100, Drouvot, Bertrand wrote:
> That said, please find enclosed V2 with "buffers fetched" suggested
> above (and no changes to "buffer hits" to keep consistency with the
> other part of the documentation mentioned up-thread).

Thanks.  Applied and backpatched that.  In 12 and 11, the style of the
table for these functions was a bit different.
--
Michael

Вложения

Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
Melanie Plageman
Дата:
On Mon, Mar 20, 2023 at 6:58 AM Drouvot, Bertrand
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On 3/20/23 12:43 AM, Michael Paquier wrote:
> > At the
> > end, documenting both still sounds like the best move to me.
>
> Agree.
>
> Please find attached v1-0001-pg_stat_get_xact_blocks_fetched-and_hit-doc.patch doing so.
>
> I did not put the exact same wording as the one being removed in ddfc2d9, as:
>
> - For pg_stat_get_xact_blocks_hit(), I think it's better to be closer to say the
> pg_statio_all_tables.heap_blks_hit definition.
>
> - For pg_stat_get_xact_blocks_fetched(), I think that using "buffer" is better (than block) as at the
> end it's related to pgstat_count_buffer_read().
>
> At the end there is a choice to be made for both for the wording between block and buffer. Indeed their
> counters get incremented in "buffer" macros while retrieved in those "blocks" functions.
>
> "Buffer" sounds more appropriate to me, so the attached has been done that way.

Apologies as I know this docs update has already been committed, but
buffers fetched and blocks fetched both feel weird to me. If you have a
cache hit, you don't end up really "fetching" anything at all (since
pgstat_count_buffer_read() is called before ReadBuffer_common() and we
don't know if it is a hit or miss yet). And, I would normally associate
fetching with fetching a block into a buffer. It seems like this counter
is really reflecting the number of buffers acquired or used.

tuples_fetched makes more sense because a tuple is "fetched" into a
slot.

This isn't really the fault of this patch since that member was already
called blocks_fetched.

- Melanie



Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
Michael Paquier
Дата:
On Wed, Mar 22, 2023 at 02:21:12PM -0400, Melanie Plageman wrote:
> Apologies as I know this docs update has already been committed, but
> buffers fetched and blocks fetched both feel weird to me. If you have a
> cache hit, you don't end up really "fetching" anything at all (since
> pgstat_count_buffer_read() is called before ReadBuffer_common() and we
> don't know if it is a hit or miss yet). And, I would normally associate
> fetching with fetching a block into a buffer. It seems like this counter
> is really reflecting the number of buffers acquired or used.

Well, it is the number of times we've requested a block read, though
it may not actually be a read if the block was in the cache already.

> This isn't really the fault of this patch since that member was already
> called blocks_fetched.

The original documentation of these functions added by 46aa77c refers
to "block fetch requests" and "block requests found in cache", so that
would not be right either based on your opinion here.  If you find
"fetch" to be incorrect in this context, here is another idea:
- "block read requests" for blocks_fetched().
- "block read requested but actually found in cache" for blocks_hit().

All the system views care only about the difference between both
counters to count the number of physical reads really done.
--
Michael

Вложения

Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
Michael Paquier
Дата:
On Thu, Mar 16, 2023 at 02:13:45PM +0100, Drouvot, Bertrand wrote:
> On 3/16/23 12:46 PM, Michael Paquier wrote:
>>> I don't think we should pay any particular attention to those 2 ones
>>> as anyway nothing prevent the 7 others to be called outside of the
>>> pg_stat_xact_all_tables view.
>>
>> I am not quite sure, TBH.  Did you look at the difference with a long
>> chain of subtrans, like savepoints?  The ODBC driver "loves" producing
>> a lot of savepoints, for example.
>
> No, I did not measure the impact.

I have been thinking again about this particular point, and I would be
fine with an additional boolean flag to compute the subtrans data in
the global counter when only necessary.  This would not make the
macros for the stat functions that much more complicated, while being
sure that we don't do unnecessary computations when we know that we
don't need them..
--
Michael

Вложения

Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
Andres Freund
Дата:
Hi,

On 2023-03-25 11:56:22 +0900, Michael Paquier wrote:
> On Thu, Mar 16, 2023 at 02:13:45PM +0100, Drouvot, Bertrand wrote:
> > On 3/16/23 12:46 PM, Michael Paquier wrote:
> >>> I don't think we should pay any particular attention to those 2 ones
> >>> as anyway nothing prevent the 7 others to be called outside of the
> >>> pg_stat_xact_all_tables view.
> >> 
> >> I am not quite sure, TBH.  Did you look at the difference with a long
> >> chain of subtrans, like savepoints?  The ODBC driver "loves" producing
> >> a lot of savepoints, for example.
> > 
> > No, I did not measure the impact.
> 
> I have been thinking again about this particular point, and I would be
> fine with an additional boolean flag to compute the subtrans data in
> the global counter when only necessary.  This would not make the
> macros for the stat functions that much more complicated, while being
> sure that we don't do unnecessary computations when we know that we
> don't need them..

I don't understand what we're optimizing for here. These functions are very
very very far from being a hot path. The xact functions are barely ever
used. Compared to the cost of query evaluation the cost of iterating throught
he subxacts is neglegible.

Greetings,

Andres Freund



Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
Michael Paquier
Дата:
On Fri, Mar 24, 2023 at 08:00:44PM -0700, Andres Freund wrote:
> I don't understand what we're optimizing for here. These functions are very
> very very far from being a hot path. The xact functions are barely ever
> used. Compared to the cost of query evaluation the cost of iterating throught
> he subxacts is neglegible.

I was wondering about that, and I see why I'm wrong.  I have quickly
gone up to 10k subtransactions, and while I was seeing what looks like
difference of 8~10% in runtime when looking at
pg_stat_xact_all_tables, the overval runtime was still close enough
(5.8ms vs 6.4ms).  At this scale, possible that it was some noise,
these seemed repeatable still not to worry about.

Anyway, I was looking at this patch, and I still feel that it is a bit
incorrect to have the copy of PgStat_TableStatus returned by
find_tabstat_entry() to point to the same list of subtransaction data
as the pending entry found, while the counters are incremented.  This
could lead to mistakes if the copy from find_tabstat_entry() is used
in an unexpected way in the future.  The current callers are OK, but
this does not give me a warm feeling :/
--
Michael

Вложения

Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
Melanie Plageman
Дата:
On Wed, Mar 22, 2023 at 6:42 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Mar 22, 2023 at 02:21:12PM -0400, Melanie Plageman wrote:
> > Apologies as I know this docs update has already been committed, but
> > buffers fetched and blocks fetched both feel weird to me. If you have a
> > cache hit, you don't end up really "fetching" anything at all (since
> > pgstat_count_buffer_read() is called before ReadBuffer_common() and we
> > don't know if it is a hit or miss yet). And, I would normally associate
> > fetching with fetching a block into a buffer. It seems like this counter
> > is really reflecting the number of buffers acquired or used.
>
> Well, it is the number of times we've requested a block read, though
> it may not actually be a read if the block was in the cache already.
>
> > This isn't really the fault of this patch since that member was already
> > called blocks_fetched.
>
> The original documentation of these functions added by 46aa77c refers
> to "block fetch requests" and "block requests found in cache", so that
> would not be right either based on your opinion here.  If you find
> "fetch" to be incorrect in this context, here is another idea:
> - "block read requests" for blocks_fetched().
> - "block read requested but actually found in cache" for blocks_hit().

I do like/prefer "block read requests" and
"blocks requested found in cache"
Though, now I fear my initial complaint may have been a bit pedantic.

- Melanie



Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
Michael Paquier
Дата:
On Mon, Mar 27, 2023 at 10:24:46AM -0400, Melanie Plageman wrote:
> I do like/prefer "block read requests" and
> "blocks requested found in cache"
> Though, now I fear my initial complaint may have been a bit pedantic.

That's fine.  Let's ask for extra opinions, then.

So, have others an opinion to share here?
--
Michael

Вложения

Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
Kyotaro Horiguchi
Дата:
At Tue, 28 Mar 2023 07:38:25 +0900, Michael Paquier <michael@paquier.xyz> wrote in 
> On Mon, Mar 27, 2023 at 10:24:46AM -0400, Melanie Plageman wrote:
> > I do like/prefer "block read requests" and
> > "blocks requested found in cache"
> > Though, now I fear my initial complaint may have been a bit pedantic.
> 
> That's fine.  Let's ask for extra opinions, then.
> 
> So, have others an opinion to share here?

I do not have a strong preference for the wording, but consistency is
important.  IMHO simply swapping out a few words won't really improve
things.

I found that commit ddfc2d9a37 removed the descriptions for
pg_stat_get_blocks_fetched and pg_stat_get_blocks_hit. Right before
that commit, monitoring.sgml had these lines:

-     <function>pg_stat_get_blocks_fetched</function> minus
-     <function>pg_stat_get_blocks_hit</function> gives the number of kernel
-     <function>read()</> calls issued for the table, index, or
-     database; the number of actual physical reads is usually
-     lower due to kernel-level buffering.  The <literal>*_blks_read</>
-     statistics columns use this subtraction, i.e., fetched minus hit.

The commit then added the following sentence to the description for
pg_statio_all_tables.heap_blks_read.

>     <entry>Number of disk blocks read from this table.
>     This value can also be returned by directly calling
>     the <function>pg_stat_get_blocks_fetched</function> and
>     <function>pg_stat_get_blocks_hit</function> functions and
>     subtracting the results.</entry>

Later, in 5f2b089387 it twas revised as:
+     <entry>Number of disk blocks read in this database</entry>

This revision lost the explantion regarding the relationship among
fetch, hit and read, as it became hidden in the views' definitions.

As the result, in the current state, it doesn't make sense to just add
a description for pg_stat_get_xact_blocks_fetched().

The confusion stems from the inconsistency between the views and
underlying functions related to block reads and hits. If we add
descriptions for the two functions, we should also explain their
relationship.  Otherwise, it might be better to add the functions
pg_stat_*_blocks_read() instead.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
Michael Paquier
Дата:
On Tue, Mar 28, 2023 at 12:36:15PM +0900, Kyotaro Horiguchi wrote:
> I found that commit ddfc2d9a37 removed the descriptions for
> pg_stat_get_blocks_fetched and pg_stat_get_blocks_hit. Right before
> that commit, monitoring.sgml had these lines:
>
> -     <function>pg_stat_get_blocks_fetched</function> minus
> -     <function>pg_stat_get_blocks_hit</function> gives the number of kernel
> -     <function>read()</> calls issued for the table, index, or
> -     database; the number of actual physical reads is usually
> -     lower due to kernel-level buffering.  The <literal>*_blks_read</>
> -     statistics columns use this subtraction, i.e., fetched minus hit.
>
> The commit then added the following sentence to the description for
> pg_statio_all_tables.heap_blks_read.
>
> Later, in 5f2b089387 it twas revised as:
> +     <entry>Number of disk blocks read in this database</entry>

Yeah, maybe adding something like that at the bottom of the table for
stat functions, telling that the difference is the number of read()
calls, may help.  Perhaps also adding a mention that these are used in
none of the existing system views..

> The confusion stems from the inconsistency between the views and
> underlying functions related to block reads and hits. If we add
> descriptions for the two functions, we should also explain their
> relationship.  Otherwise, it might be better to add the functions
> pg_stat_*_blocks_read() instead.

I am not sure that we really need to get down to that as this holds
the same meaning as the current system views showing read as the
difference between fetched and hit.
--
Michael

Вложения

Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
"Drouvot, Bertrand"
Дата:
Hi,

On 3/28/23 7:23 AM, Michael Paquier wrote:
> On Tue, Mar 28, 2023 at 12:36:15PM +0900, Kyotaro Horiguchi wrote:
>> I found that commit ddfc2d9a37 removed the descriptions for
>> pg_stat_get_blocks_fetched and pg_stat_get_blocks_hit. Right before
>> that commit, monitoring.sgml had these lines:
>>
>> -     <function>pg_stat_get_blocks_fetched</function> minus
>> -     <function>pg_stat_get_blocks_hit</function> gives the number of kernel
>> -     <function>read()</> calls issued for the table, index, or
>> -     database; the number of actual physical reads is usually
>> -     lower due to kernel-level buffering.  The <literal>*_blks_read</>
>> -     statistics columns use this subtraction, i.e., fetched minus hit.
>>
>> The commit then added the following sentence to the description for
>> pg_statio_all_tables.heap_blks_read.
>>
>> Later, in 5f2b089387 it twas revised as:
>> +     <entry>Number of disk blocks read in this database</entry>
> 
> Yeah, maybe adding something like that at the bottom of the table for
> stat functions, telling that the difference is the number of read()
> calls, may help.  Perhaps also adding a mention that these are used in
> none of the existing system views..
> 
>> The confusion stems from the inconsistency between the views and
>> underlying functions related to block reads and hits. If we add
>> descriptions for the two functions, we should also explain their
>> relationship. 

I agree that adding more explanation would help and avoid confusion.

What about something like?

for pg_stat_get_xact_blocks_fetched(): "block read requests for table or index, in the current
transaction. This number minus pg_stat_get_xact_blocks_hit() gives the number of kernel
read() calls."

pg_stat_get_xact_blocks_hit(): "block read requests for table or index, in the current
transaction, found in cache (not triggering kernel read() calls)".

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
Michael Paquier
Дата:
On Tue, Mar 28, 2023 at 07:49:45AM +0200, Drouvot, Bertrand wrote:
> What about something like?
>
> for pg_stat_get_xact_blocks_fetched(): "block read requests for table or index, in the current
> transaction. This number minus pg_stat_get_xact_blocks_hit() gives the number of kernel
> read() calls."
>
> pg_stat_get_xact_blocks_hit(): "block read requests for table or index, in the current
> transaction, found in cache (not triggering kernel read() calls)".

Something among these lines within the table would be also OK by me.
Horiguchi-san or Melanie-san, perhaps you have counter-proposals?
--
Michael

Вложения

Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
Kyotaro Horiguchi
Дата:
At Tue, 28 Mar 2023 15:16:36 +0900, Michael Paquier <michael@paquier.xyz> wrote in 
> On Tue, Mar 28, 2023 at 07:49:45AM +0200, Drouvot, Bertrand wrote:
> > What about something like?
> > 
> > for pg_stat_get_xact_blocks_fetched(): "block read requests for table or index, in the current
> > transaction. This number minus pg_stat_get_xact_blocks_hit() gives the number of kernel
> > read() calls."
> > 
> > pg_stat_get_xact_blocks_hit(): "block read requests for table or index, in the current
> > transaction, found in cache (not triggering kernel read() calls)".
> 
> Something among these lines within the table would be also OK by me.
> Horiguchi-san or Melanie-san, perhaps you have counter-proposals?

No. Fine by me, except that "block read requests" seems to suggest
kernel read() calls, maybe because it's not clear whether "block"
refers to our buffer blocks or file blocks to me.. If it is generally
clear, I'm fine with the proposal.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
Michael Paquier
Дата:
On Tue, Mar 28, 2023 at 05:43:26PM +0900, Kyotaro Horiguchi wrote:
> No. Fine by me, except that "block read requests" seems to suggest
> kernel read() calls, maybe because it's not clear whether "block"
> refers to our buffer blocks or file blocks to me.. If it is generally
> clear, I'm fine with the proposal.

Okay.  Would somebody like to draft a patch?
--
Michael

Вложения

Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
"Drouvot, Bertrand"
Дата:
Hi,

On 3/29/23 2:09 AM, Michael Paquier wrote:
> On Tue, Mar 28, 2023 at 05:43:26PM +0900, Kyotaro Horiguchi wrote:
>> No. Fine by me, except that "block read requests" seems to suggest
>> kernel read() calls, maybe because it's not clear whether "block"
>> refers to our buffer blocks or file blocks to me.. If it is generally
>> clear, I'm fine with the proposal.
> 
> Okay.  Would somebody like to draft a patch?

Please find a draft attached.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Вложения

Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
Michael Paquier
Дата:
On Wed, Mar 29, 2023 at 07:44:20AM +0200, Drouvot, Bertrand wrote:
> Please find a draft attached.

This addition looks OK for me.  Thanks for the patch!
--
Michael

Вложения

Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
Michael Paquier
Дата:
On Tue, Apr 04, 2023 at 09:04:34PM +0900, Michael Paquier wrote:
> This addition looks OK for me.  Thanks for the patch!

Okay, finally done.  One part that was still not complete to me in
light of the information ddfc2d9 has removed is that the number of
physical reads could be lower than the reported number depending on
what the kernel cache holds.  So I've added this sentence, while on
it.
--
Michael

Вложения

Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
"Drouvot, Bertrand"
Дата:
Hi,

On 3/27/23 8:35 AM, Michael Paquier wrote:
> On Fri, Mar 24, 2023 at 08:00:44PM -0700, Andres Freund wrote:
>> I don't understand what we're optimizing for here. These functions are very
>> very very far from being a hot path. The xact functions are barely ever
>> used. Compared to the cost of query evaluation the cost of iterating throught
>> he subxacts is neglegible.
> 
> I was wondering about that, and I see why I'm wrong.  I have quickly
> gone up to 10k subtransactions, and while I was seeing what looks like
> difference of 8~10% in runtime when looking at
> pg_stat_xact_all_tables, the overval runtime was still close enough
> (5.8ms vs 6.4ms).  At this scale, possible that it was some noise,
> these seemed repeatable still not to worry about.
> 
> Anyway, I was looking at this patch, and I still feel that it is a bit
> incorrect to have the copy of PgStat_TableStatus returned by
> find_tabstat_entry() to point to the same list of subtransaction data
> as the pending entry found, while the counters are incremented.  This
> could lead to mistakes if the copy from find_tabstat_entry() is used
> in an unexpected way in the future.  The current callers are OK, but
> this does not give me a warm feeling :/

FWIW, please find attached V7 (mandatory rebase).

It would allow to also define:

- pg_stat_get_xact_tuples_inserted
- pg_stat_get_xact_tuples_updated
- pg_stat_get_xact_tuples_deleted

as macros, joining others pg_stat_get_xact_*() that are already
defined as macros.

The concern you raised above has not been addressed, meaning that
find_tabstat_entry() still return a copy of PgStat_TableStatus.

By "used in an unexpected way in the future", what do you mean exactly? Do you mean
the caller forgetting it is working on a copy and then could work with
"stale" counters?

Trying to understand to see if I should invest time to try to address your concern
or leave those 3 functions as they are currently before moving back to the
"Split index and table statistics into different types of stats" work [1].


[1]: https://www.postgresql.org/message-id/flat/f572abe7-a1bb-e13b-48c7-2ca150546822@gmail.com

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Вложения

Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
Michael Paquier
Дата:
On Thu, Oct 26, 2023 at 10:04:25AM +0200, Drouvot, Bertrand wrote:
> By "used in an unexpected way in the future", what do you mean exactly? Do you mean
> the caller forgetting it is working on a copy and then could work with
> "stale" counters?

(Be careful about the code indentation.)

The part that I found disturbing is here:
+       tabentry = (PgStat_TableStatus *) entry_ref->pending;
+       tablestatus = palloc(sizeof(PgStat_TableStatus));
+       *tablestatus = *tabentry;

This causes tablestatus->trans to point to the same location as
tabentry->trans, but wouldn't it be better to set tablestatus->trans
to NULL instead for the copy returned to the caller?
--
Michael

Вложения

Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
"Drouvot, Bertrand"
Дата:
Hi,

On 10/27/23 8:07 AM, Michael Paquier wrote:
> 
> The part that I found disturbing is here:
> +       tabentry = (PgStat_TableStatus *) entry_ref->pending;
> +       tablestatus = palloc(sizeof(PgStat_TableStatus));
> +       *tablestatus = *tabentry;
> 
> This causes tablestatus->trans to point to the same location as
> tabentry->trans, but wouldn't it be better to set tablestatus->trans
> to NULL instead for the copy returned to the caller?

Oh I see, yeah I do agree to set tablestatus->trans to NULL to avoid
any undesired interference with tabentry->trans.

Done in V8 attached (pgindent has been run on pgstatfuncs.c and
pgstat_relation.c).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Вложения

Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
Michael Paquier
Дата:
On Fri, Oct 27, 2023 at 09:45:52AM +0200, Drouvot, Bertrand wrote:
> Oh I see, yeah I do agree to set tablestatus->trans to NULL to avoid
> any undesired interference with tabentry->trans.
>
> Done in V8 attached (pgindent has been run on pgstatfuncs.c and
> pgstat_relation.c).

LGTM.
--
Michael

Вложения

Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
Michael Paquier
Дата:
On Fri, Oct 27, 2023 at 09:45:52AM +0200, Drouvot, Bertrand wrote:
> Done in V8 attached (pgindent has been run on pgstatfuncs.c and
> pgstat_relation.c).

And applied that after editing a bit the comments.
--
Michael

Вложения

Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

От
Andres Freund
Дата:
On 2023-10-30 08:25:17 +0900, Michael Paquier wrote:
> On Fri, Oct 27, 2023 at 09:45:52AM +0200, Drouvot, Bertrand wrote:
> > Done in V8 attached (pgindent has been run on pgstatfuncs.c and
> > pgstat_relation.c).
> 
> And applied that after editing a bit the comments.

Thank you both!