Обсуждение: pg_test_timing: fix unit typo and widen diff type

Поиск
Список
Период
Сортировка

pg_test_timing: fix unit typo and widen diff type

От
Chao Li
Дата:
Hi,

This morning, as part of my usual routine, I synced the master branch and read through the recent commits. While
reading82c0cb4e672, I noticed a mistake in an error message. The relevant code is like: 
```
diff = INSTR_TIME_GET_NANOSEC(diff_time);

fprintf(stderr, _("Time warp: %d ms\n"), diff);
```

Here, “diff" is in nanoseconds, but the error message prints ms as the unit, which is incorrect.

To fix that, I think there are two possible options:

1. Use INSTR_TIME_GET_MILLISEC to get “diff"
2. Change “ms" to “ns" in the error message

After reading through the whole file, I think option 2 is the right fix. While doing that, I also noticed another
issue.

“diff" is currently defined as int32. Although one might think that is enough for a time delta, I believe it should be
int64for two reasons: 

* INSTR_TIME_GET_NANOSEC() explicitly returns int64:
```
#define INSTR_TIME_GET_NANOSEC(t) \
               ((int64) (t).ticks)
```

* The current code has a sanity check for backward clock drift:
```
        /* Did time go backwards? */
        if (unlikely(diff < 0))
        {
            fprintf(stderr, _("Detected clock going backwards in time.\n"));
            fprintf(stderr, _("Time warp: %d ms\n"), diff);
            exit(1);
        }
```
Clock jumping forward is also possible, and a forward jump of about 2.14 seconds would overflow int32 when expressed in
nanoseconds,making the value appear negative. In that case, the code could report a “backwards” clock jump when the
actualjump was forwards, which would be misleading. 

To make the patch easier to process, I split it into two parts: 0001 fixes the unit in the error message, and 0002
changesthe type of diff. If this gets accepted, I would be happy to squash them into one commit. 

I should also note that although I noticed this while reading 82c0cb4e672, I do not think this was an oversight of that
commit.More likely, because clock drift backwards is rare, this issue has simply gone unnoticed for many years. 

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/





Вложения

Re: pg_test_timing: fix unit typo and widen diff type

От
Lukas Fittl
Дата:
Hi Chao,

On Wed, Apr 1, 2026 at 8:10 PM Chao Li <li.evan.chao@gmail.com> wrote:
>
> Hi,
>
> This morning, as part of my usual routine, I synced the master branch and read through the recent commits. While
reading82c0cb4e672, I noticed a mistake in an error message. The relevant code is like: 
> ```
> diff = INSTR_TIME_GET_NANOSEC(diff_time);
>
> fprintf(stderr, _("Time warp: %d ms\n"), diff);
> ```
>
> Here, “diff" is in nanoseconds, but the error message prints ms as the unit, which is incorrect.

Good catch!

It looks like the use of nanoseconds for "diff" got introduced last
year in 0b096e379e6f9bd49 (as you note later in the email, today's
commit didn't actually change that part), CCing Tom and Hannu as
authors of that earlier change.

That said, its a bit odd that we were using INSTR_TIME_GET_MICROSEC
there before that earlier commit, but called it "ms" (i.e.
milliseconds) in the error message.

>
> To fix that, I think there are two possible options:
>
> 1. Use INSTR_TIME_GET_MILLISEC to get “diff"
> 2. Change “ms" to “ns" in the error message
>
> After reading through the whole file, I think option 2 is the right fix. While doing that, I also noticed another
issue.
>
> “diff" is currently defined as int32. Although one might think that is enough for a time delta, I believe it should
beint64 for two reasons: 
>
> * INSTR_TIME_GET_NANOSEC() explicitly returns int64:
> ```
> #define INSTR_TIME_GET_NANOSEC(t) \
>                ((int64) (t).ticks)
> ```
>
> * The current code has a sanity check for backward clock drift:
> ```
>         /* Did time go backwards? */
>         if (unlikely(diff < 0))
>         {
>             fprintf(stderr, _("Detected clock going backwards in time.\n"));
>             fprintf(stderr, _("Time warp: %d ms\n"), diff);
>             exit(1);
>         }
> ```
> Clock jumping forward is also possible, and a forward jump of about 2.14 seconds would overflow int32 when expressed
innanoseconds, making the value appear negative. In that case, the code could report a “backwards” clock jump when the
actualjump was forwards, which would be misleading. 

I agree it doesn't seem sound to use an int32 for storing the result
of INSTR_TIME_GET_NANOSEC. It looks like we may also need to adjust
the call to pg_leftmost_one_pos32 though if we actually accept that
large a "diff" value, as in your patch.

Maybe we should error out if the diff is larger than an int32, noting
a positive time drift?

Independently of that, its worth noting we could easily emit the diff
in a larger unit (micro or milliseconds) for easier interpretation, by
just calling INSTR_TIME_GET_MICROSEC / INSTR_TIME_GET_MILLISEC on the
"diff_time" again.

Thanks,
Lukas


--
Lukas Fittl



Re: pg_test_timing: fix unit typo and widen diff type

От
Chao Li
Дата:

> On Apr 2, 2026, at 12:17, Lukas Fittl <lukas@fittl.com> wrote:
>
> Hi Chao,
>
> On Wed, Apr 1, 2026 at 8:10 PM Chao Li <li.evan.chao@gmail.com> wrote:
>>
>> Hi,
>>
>> This morning, as part of my usual routine, I synced the master branch and read through the recent commits. While
reading82c0cb4e672, I noticed a mistake in an error message. The relevant code is like: 
>> ```
>> diff = INSTR_TIME_GET_NANOSEC(diff_time);
>>
>> fprintf(stderr, _("Time warp: %d ms\n"), diff);
>> ```
>>
>> Here, “diff" is in nanoseconds, but the error message prints ms as the unit, which is incorrect.
>
> Good catch!
>
> It looks like the use of nanoseconds for "diff" got introduced last
> year in 0b096e379e6f9bd49 (as you note later in the email, today's
> commit didn't actually change that part), CCing Tom and Hannu as
> authors of that earlier change.
>
> That said, its a bit odd that we were using INSTR_TIME_GET_MICROSEC
> there before that earlier commit, but called it "ms" (i.e.
> milliseconds) in the error message.
>
>>
>> To fix that, I think there are two possible options:
>>
>> 1. Use INSTR_TIME_GET_MILLISEC to get “diff"
>> 2. Change “ms" to “ns" in the error message
>>
>> After reading through the whole file, I think option 2 is the right fix. While doing that, I also noticed another
issue.
>>
>> “diff" is currently defined as int32. Although one might think that is enough for a time delta, I believe it should
beint64 for two reasons: 
>>
>> * INSTR_TIME_GET_NANOSEC() explicitly returns int64:
>> ```
>> #define INSTR_TIME_GET_NANOSEC(t) \
>>               ((int64) (t).ticks)
>> ```
>>
>> * The current code has a sanity check for backward clock drift:
>> ```
>>        /* Did time go backwards? */
>>        if (unlikely(diff < 0))
>>        {
>>            fprintf(stderr, _("Detected clock going backwards in time.\n"));
>>            fprintf(stderr, _("Time warp: %d ms\n"), diff);
>>            exit(1);
>>        }
>> ```
>> Clock jumping forward is also possible, and a forward jump of about 2.14 seconds would overflow int32 when expressed
innanoseconds, making the value appear negative. In that case, the code could report a “backwards” clock jump when the
actualjump was forwards, which would be misleading. 
>
> I agree it doesn't seem sound to use an int32 for storing the result
> of INSTR_TIME_GET_NANOSEC. It looks like we may also need to adjust
> the call to pg_leftmost_one_pos32 though if we actually accept that
> large a "diff" value, as in your patch.

You are right. Changed to pg_leftmost_one_pos64 in v2.

>
> Maybe we should error out if the diff is larger than an int32, noting
> a positive time drift?

I agree we should warn/fail upon clock forwards drift. But I doubt int32 is too big (~2.14 seconds), I consider 1
secondcould be a too big threshold. Let’s wait for more voices on this. 

>
> Independently of that, its worth noting we could easily emit the diff
> in a larger unit (micro or milliseconds) for easier interpretation, by
> just calling INSTR_TIME_GET_MICROSEC / INSTR_TIME_GET_MILLISEC on the
> "diff_time" again.
>

Given the error should rarely happen, I personally feel that might not be super helpful. Also, if the drift is just
beyondthe threshold, bumping to microsecond or millisecond might print just 0. 

PFA v2 - updated 0002 for pg_leftmost_one_pos64.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/





Вложения

Re: pg_test_timing: fix unit typo and widen diff type

От
Chao Li
Дата:

> On Apr 2, 2026, at 15:28, Chao Li <li.evan.chao@gmail.com> wrote:
>
>
>
>> On Apr 2, 2026, at 12:17, Lukas Fittl <lukas@fittl.com> wrote:
>>
>> Hi Chao,
>>
>> On Wed, Apr 1, 2026 at 8:10 PM Chao Li <li.evan.chao@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> This morning, as part of my usual routine, I synced the master branch and read through the recent commits. While
reading82c0cb4e672, I noticed a mistake in an error message. The relevant code is like: 
>>> ```
>>> diff = INSTR_TIME_GET_NANOSEC(diff_time);
>>>
>>> fprintf(stderr, _("Time warp: %d ms\n"), diff);
>>> ```
>>>
>>> Here, “diff" is in nanoseconds, but the error message prints ms as the unit, which is incorrect.
>>
>> Good catch!
>>
>> It looks like the use of nanoseconds for "diff" got introduced last
>> year in 0b096e379e6f9bd49 (as you note later in the email, today's
>> commit didn't actually change that part), CCing Tom and Hannu as
>> authors of that earlier change.
>>
>> That said, its a bit odd that we were using INSTR_TIME_GET_MICROSEC
>> there before that earlier commit, but called it "ms" (i.e.
>> milliseconds) in the error message.
>>
>>>
>>> To fix that, I think there are two possible options:
>>>
>>> 1. Use INSTR_TIME_GET_MILLISEC to get “diff"
>>> 2. Change “ms" to “ns" in the error message
>>>
>>> After reading through the whole file, I think option 2 is the right fix. While doing that, I also noticed another
issue.
>>>
>>> “diff" is currently defined as int32. Although one might think that is enough for a time delta, I believe it should
beint64 for two reasons: 
>>>
>>> * INSTR_TIME_GET_NANOSEC() explicitly returns int64:
>>> ```
>>> #define INSTR_TIME_GET_NANOSEC(t) \
>>>              ((int64) (t).ticks)
>>> ```
>>>
>>> * The current code has a sanity check for backward clock drift:
>>> ```
>>>       /* Did time go backwards? */
>>>       if (unlikely(diff < 0))
>>>       {
>>>           fprintf(stderr, _("Detected clock going backwards in time.\n"));
>>>           fprintf(stderr, _("Time warp: %d ms\n"), diff);
>>>           exit(1);
>>>       }
>>> ```
>>> Clock jumping forward is also possible, and a forward jump of about 2.14 seconds would overflow int32 when
expressedin nanoseconds, making the value appear negative. In that case, the code could report a “backwards” clock jump
whenthe actual jump was forwards, which would be misleading. 
>>
>> I agree it doesn't seem sound to use an int32 for storing the result
>> of INSTR_TIME_GET_NANOSEC. It looks like we may also need to adjust
>> the call to pg_leftmost_one_pos32 though if we actually accept that
>> large a "diff" value, as in your patch.
>
> You are right. Changed to pg_leftmost_one_pos64 in v2.
>
>>
>> Maybe we should error out if the diff is larger than an int32, noting
>> a positive time drift?
>
> I agree we should warn/fail upon clock forwards drift. But I doubt int32 is too big (~2.14 seconds), I consider 1
secondcould be a too big threshold. Let’s wait for more voices on this. 
>
>>
>> Independently of that, its worth noting we could easily emit the diff
>> in a larger unit (micro or milliseconds) for easier interpretation, by
>> just calling INSTR_TIME_GET_MICROSEC / INSTR_TIME_GET_MILLISEC on the
>> "diff_time" again.
>>
>
> Given the error should rarely happen, I personally feel that might not be super helpful. Also, if the drift is just
beyondthe threshold, bumping to microsecond or millisecond might print just 0. 
>
> PFA v2 - updated 0002 for pg_leftmost_one_pos64.
>
> Best regards,
> --
> Chao Li (Evan)
> HighGo Software Co., Ltd.
> https://www.highgo.com/
>
>
>
>
>
<v2-0001-pg_test_timing-fix-unit-in-backward-clock-warning.patch><v2-0002-pg_test_timing-use-int64-for-largest-observed-tim.patch>

PFA v3: Fixed a CI failure.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/





Вложения

Re:Re: pg_test_timing: fix unit typo and widen diff type

От
"wang.xiao.peng"
Дата:
At 2026-04-08 12:06:07, "Chao Li" <li.evan.chao@gmail.com> wrote:
>
>
>> On Apr 2, 2026, at 15:28, Chao Li <li.evan.chao@gmail.com> wrote:
>> 
>> 
>> 
>>> On Apr 2, 2026, at 12:17, Lukas Fittl <lukas@fittl.com> wrote:
>>> 
>>> Hi Chao,
>>> 
>>> On Wed, Apr 1, 2026 at 8:10 PM Chao Li <li.evan.chao@gmail.com> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> This morning, as part of my usual routine, I synced the master branch and read through the recent commits. While reading 82c0cb4e672, I noticed a mistake in an error message. The relevant code is like:
>>>> ```
>>>> diff = INSTR_TIME_GET_NANOSEC(diff_time);
>>>> 
>>>> fprintf(stderr, _("Time warp: %d ms\n"), diff);
>>>> ```
>>>> 
>>>> Here, “diff" is in nanoseconds, but the error message prints ms as the unit, which is incorrect.
>>> 
>>> Good catch!
>>> 
>>> It looks like the use of nanoseconds for "diff" got introduced last
>>> year in 0b096e379e6f9bd49 (as you note later in the email, today's
>>> commit didn't actually change that part), CCing Tom and Hannu as
>>> authors of that earlier change.
>>> 
>>> That said, its a bit odd that we were using INSTR_TIME_GET_MICROSEC
>>> there before that earlier commit, but called it "ms" (i.e.
>>> milliseconds) in the error message.
>>> 
>>>> 
>>>> To fix that, I think there are two possible options:
>>>> 
>>>> 1. Use INSTR_TIME_GET_MILLISEC to get “diff"
>>>> 2. Change “ms" to “ns" in the error message
>>>> 
>>>> After reading through the whole file, I think option 2 is the right fix. While doing that, I also noticed another issue.
>>>> 
>>>> “diff" is currently defined as int32. Although one might think that is enough for a time delta, I believe it should be int64 for two reasons:
>>>> 
>>>> * INSTR_TIME_GET_NANOSEC() explicitly returns int64:
>>>> ```
>>>> #define INSTR_TIME_GET_NANOSEC(t) \
>>>>              ((int64) (t).ticks)
>>>> ```
>>>> 
>>>> * The current code has a sanity check for backward clock drift:
>>>> ```
>>>>       /* Did time go backwards? */
>>>>       if (unlikely(diff < 0))
>>>>       {
>>>>           fprintf(stderr, _("Detected clock going backwards in time.\n"));
>>>>           fprintf(stderr, _("Time warp: %d ms\n"), diff);
>>>>           exit(1);
>>>>       }
>>>> ```
>>>> Clock jumping forward is also possible, and a forward jump of about 2.14 seconds would overflow int32 when expressed in nanoseconds, making the value appear negative. In that case, the code could report a “backwards” clock jump when the actual jump was forwards, which would be misleading.
>>> 
>>> I agree it doesn't seem sound to use an int32 for storing the result
>>> of INSTR_TIME_GET_NANOSEC. It looks like we may also need to adjust
>>> the call to pg_leftmost_one_pos32 though if we actually accept that
>>> large a "diff" value, as in your patch.
>> 
>> You are right. Changed to pg_leftmost_one_pos64 in v2.
>> 
>>> 
>>> Maybe we should error out if the diff is larger than an int32, noting
>>> a positive time drift?
>> 
>> I agree we should warn/fail upon clock forwards drift. But I doubt int32 is too big (~2.14 seconds), I consider 1 second could be a too big threshold. Let’s wait for more voices on this.
>> 
>>> 
>>> Independently of that, its worth noting we could easily emit the diff
>>> in a larger unit (micro or milliseconds) for easier interpretation, by
>>> just calling INSTR_TIME_GET_MICROSEC / INSTR_TIME_GET_MILLISEC on the
>>> "diff_time" again.
>>> 
>> 
>> Given the error should rarely happen, I personally feel that might not be super helpful. Also, if the drift is just beyond the threshold, bumping to microsecond or millisecond might print just 0.
>> 
>> PFA v2 - updated 0002 for pg_leftmost_one_pos64.
>> 
>> Best regards,
>> --
>> Chao Li (Evan)
>> HighGo Software Co., Ltd.
>> https://www.highgo.com/
>> 
>> 
>> 
>> 
>> <v2-0001-pg_test_timing-fix-unit-in-backward-clock-warning.patch><v2-0002-pg_test_timing-use-int64-for-largest-observed-tim.patch>
>
>PFA v3: Fixed a CI failure.
>
>Best regards,
>--
>Chao Li (Evan)
>HighGo Software Co., Ltd.
>https://www.highgo.com/
>
>
Hi Chao, Lukas,
Just finished reviewing the v3 patches - looks good to me. The pg_leftmost_one_pos64 fix is spot on, and using INT64_FORMAT for the error message is the right approach.
Nice catch on the unit mismatch and the int32 overflow issue. Both patches are ready to go from my perspective.
Cheers,
Xiaopeng Wang