Discussion: What is a typical precision of gettimeofday()?
Over in the thread discussing the addition of UUIDv7 support [0], there is some uncertainty about what timestamp precision one can expect from gettimeofday(). UUIDv7 uses milliseconds since the Unix epoch, and can optionally use up to 12 additional bits of timestamp precision (see [1]), but it can also just use a counter instead of the extra precision. The current patch uses the counter method "because of portability concerns" (source code comment). I feel that we don't actually have any information about this portability concern. Does anyone know what precision we can expect from gettimeofday()? Can we expect the full microsecond precision usually? [0]: https://www.postgresql.org/message-id/flat/CAAhFRxitJv=yoGnXUgeLB_O+M7J2BJAmb5jqAT9gZ3bij3uLDA@mail.gmail.com [1]: https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis#section-6.2-5.6.1
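For context on what those 12 optional bits are: the draft scales the sub-millisecond fraction of the timestamp into a 12-bit field. A minimal sketch of that arithmetic, assuming a microsecond timestamp as input (`subms_12bit` is an illustrative helper, not code from the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Scale the sub-millisecond part of a microsecond timestamp into the
 * optional 12-bit extra-precision field of UUIDv7 (values 0..4095).
 * With only microsecond input, at most ~1000 of the 4096 values occur. */
static uint16_t
subms_12bit(long usec)
{
    long sub_us = usec % 1000;                 /* microseconds within the millisecond */

    return (uint16_t) (sub_us * 4096 / 1000);  /* map 0..999 us onto 0..4095 */
}
```

This is also why the thread's question matters: if gettimeofday() only ticks in, say, 10 ms steps, these bits carry no information and a counter is the safer uniqueness mechanism.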
Hi, cc: Andrey > Over in the thread discussing the addition of UUIDv7 support [0], there > is some uncertainty about what timestamp precision one can expect from > gettimeofday(). > > UUIDv7 uses milliseconds since Unix epoch, but can optionally use up to > 12 additional bits of timestamp precision (see [1]), but it can also > just use a counter instead of the extra precision. The current patch > uses the counter method "because of portability concerns" (source code > comment). > > I feel that we don't actually have any information about this > portability concern. Does anyone know what precision we can expect from > gettimeofday()? Can we expect the full microsecond precision usually? Specifically in the UUIDv7 application the goal is to generate not necessarily time-precise UUIDs but rather to do our best to get *unique* UUIDs. As I understand, this is the actual reason why the patch needs counters. As the Linux man page puts it: """ The time returned by gettimeofday() is affected by discontinuous jumps in the system time (e.g., if the system administrator manually changes the system time). """ On top of that, the macOS man page says: """ The resolution of the system clock is hardware dependent, and the time may be updated continuously or in ``ticks.'' """ On Windows our gettimeofday() implementation is a wrapper for GetSystemTimePreciseAsFileTime(). The corresponding MSDN page [1] is somewhat laconic. Considering the number of environments PostgreSQL can run in (OS + hardware + virtualization technologies) and the fact that hardware/software changes, I doubt that it's realistic to expect any particular guarantees from gettimeofday() in the general case. [1]: https://learn.microsoft.com/en-us/windows/win32/api/sysinfoapi/nf-sysinfoapi-getsystemtimepreciseasfiletime -- Best regards, Aleksander Alekseev
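For readers unfamiliar with the Windows wrapper mentioned above: GetSystemTimePreciseAsFileTime() returns 100-nanosecond ticks since 1601-01-01, so a gettimeofday() emulation has to rebase to the Unix epoch and truncate to microseconds. A sketch of that conversion (the epoch offset is the standard 1601-to-1970 constant; this is an illustration, not PostgreSQL's actual win32gettimeofday.c):

```c
#include <assert.h>
#include <stdint.h>

/* 100-ns intervals between 1601-01-01 (FILETIME epoch) and 1970-01-01 */
#define EPOCH_DIFF_100NS UINT64_C(116444736000000000)

/* Convert a FILETIME-style 100-ns tick count to Unix seconds + microseconds.
 * The "% 10" truncation is where sub-microsecond precision is discarded. */
static void
filetime_to_timeval(uint64_t ft_ticks, int64_t *sec, int64_t *usec)
{
    uint64_t unix_100ns = ft_ticks - EPOCH_DIFF_100NS;

    *sec = (int64_t) (unix_100ns / UINT64_C(10000000));         /* 10^7 ticks/s */
    *usec = (int64_t) ((unix_100ns % UINT64_C(10000000)) / 10); /* 10 ticks/us */
}
```

So even when the Windows clock source is finer than 1 us, the timeval interface itself caps what gettimeofday() can report.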
On 19.03.24 10:38, Aleksander Alekseev wrote: > Considering the number of environments PostgreSQL can run in (OS + > hardware + virtualization technologies) and the fact that > hardware/software changes I doubt that it's realistic to expect any > particular guarantees from gettimeofday() in the general case. If we want to be robust without any guarantees from gettimeofday(), then arguably gettimeofday() is not the right underlying function to use for UUIDv7. I'm not arguing that, I think we can assume some reasonable baseline for what gettimeofday() produces. But it would be good to get some information about what that might be. Btw., here is util-linux saying /* Assume that the gettimeofday() has microsecond granularity */ https://github.com/util-linux/util-linux/blob/master/libuuid/src/gen_uuid.c#L232
On Wed, 20 Mar 2024 at 07:35, Peter Eisentraut <peter@eisentraut.org> wrote: > If we want to be robust without any guarantees from gettimeofday(), then > arguably gettimeofday() is not the right underlying function to use for > UUIDv7. There's also clock_gettime(), which exposes its resolution via clock_getres().
> On 19 Mar 2024, at 13:28, Peter Eisentraut <peter@eisentraut.org> wrote: > > I feel that we don't actually have any information about this portability concern. Does anyone know what precision we can expect from gettimeofday()? Can we expect the full microsecond precision usually? At PGConf.dev Hannu Krosing drew attention to the pg_test_timing module. I’ve tried this module (slightly modified to measure nanoseconds) on some systems, and everywhere I found ~100ns resolution (95% of ticks fall into 64ns and 128ns buckets). I’ll add cc Hannu, and also the pg_test_timing module authors Ants and Greg. Maybe they can add some context. Best regards, Andrey Borodin.
I plan to send a patch for pg_test_timing in a day or two.
the underlying time precision on modern linux seems to be
2 ns for some Intel CPUs
10 ns for Zen4
40 ns for ARM (Ampere)
---
Hannu
On Tue, Jun 18, 2024 at 7:48 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
> On 19 Mar 2024, at 13:28, Peter Eisentraut <peter@eisentraut.org> wrote:
>
> I feel that we don't actually have any information about this portability concern. Does anyone know what precision we can expect from gettimeofday()? Can we expect the full microsecond precision usually?
At PGConf.dev Hannu Krosing drew attention to the pg_test_timing module. I’ve tried this module (slightly modified to measure nanoseconds) on some systems, and everywhere I found ~100ns resolution (95% of ticks fall into 64ns and 128ns buckets).
I’ll add cc Hannu, and also the pg_test_timing module authors Ants and Greg. Maybe they can add some context.
Best regards, Andrey Borodin.
Here is the output of nanosecond-precision pg_test_timing on M1 Macbook Air /work/postgres/src/bin/pg_test_timing % ./pg_test_timing Testing timing overhead for 3 seconds. Per loop time including overhead: 21.54 ns Histogram of timing durations: <= ns % of total running % count 0 49.1655 49.1655 68481688 1 0.0000 49.1655 0 3 0.0000 49.1655 0 7 0.0000 49.1655 0 15 0.0000 49.1655 0 31 0.0000 49.1655 0 63 50.6890 99.8545 70603742 127 0.1432 99.9976 199411 255 0.0015 99.9991 2065 511 0.0001 99.9992 98 1023 0.0001 99.9993 140 2047 0.0002 99.9995 284 4095 0.0000 99.9995 50 8191 0.0000 99.9996 65 16383 0.0002 99.9997 240 32767 0.0001 99.9998 128 65535 0.0001 99.9999 97 131071 0.0000 99.9999 58 262143 0.0000 100.0000 44 524287 0.0000 100.0000 22 1048575 0.0000 100.0000 7 2097151 0.0000 100.0000 2 First 128 exact nanoseconds: 0 49.1655 49.1655 68481688 41 16.8964 66.0619 23534708 42 33.7926 99.8545 47069034 83 0.0835 99.9380 116362 84 0.0419 99.9799 58349 125 0.0177 99.9976 24700 As you see the 40 ns internal tick gets somehow blurred into not-quite-40-ns timing step On Linux / ARM Ampere where __builtin_readcyclecounter() works (it compiles but crashes on Mac OS M1, I have not yet tested on Linux M1) the tick is exactly 40 ns and I'd expect it to be the same on M1. On Tue, Jun 18, 2024 at 5:08 PM Hannu Krosing <hannuk@google.com> wrote: > > I plan to send patch to pg_test_timing in a day or two > > the underlying time precision on modern linux seems to be > > 2 ns for some Intel CPUs > 10 ns for Zen4 > 40 ns for ARM (Ampere) > > --- > Hannu > > > > | > > > > > On Tue, Jun 18, 2024 at 7:48 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote: >> >> >> >> > On 19 Mar 2024, at 13:28, Peter Eisentraut <peter@eisentraut.org> wrote: >> > >> > I feel that we don't actually have any information about this portability concern. Does anyone know what precisionwe can expect from gettimeofday()? Can we expect the full microsecond precision usually? 
>> >> At PGConf.dev Hannu Krosing drew attention to the pg_test_timing module. I’ve tried this module (slightly modified to measure nanoseconds) on some systems, and everywhere I found ~100ns resolution (95% of ticks fall into 64ns and 128ns buckets). >> >> I’ll add cc Hannu, and also the pg_test_timing module authors Ants and Greg. Maybe they can add some context. >> >> >> Best regards, Andrey Borodin.
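One plausible explanation for the 0/41/42/83/84/125 pattern in the "First 128 exact nanoseconds" output above: assuming the counter behind the clock runs at 24 MHz (one tick = 1000/24 ≈ 41.67 ns), converting whole ticks to integer nanoseconds makes consecutive deltas alternate between 41 and 42 ns, i.e. exactly the "not-quite-40-ns" blur. A sketch of that arithmetic (the 24 MHz figure is an assumption, not confirmed in the thread):

```c
#include <stdint.h>

/* Nanosecond timestamp for tick i of a hypothetical 24 MHz counter,
 * using integer math: one tick is 1000/24 = ~41.67 ns, so rounding down
 * makes successive deltas alternate between 41 and 42 ns. */
static int64_t
tick_to_ns_24mhz(int64_t tick)
{
    return tick * 1000 / 24;
}
```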
On 18.06.24 07:47, Andrey M. Borodin wrote: > > >> On 19 Mar 2024, at 13:28, Peter Eisentraut <peter@eisentraut.org> wrote: >> >> I feel that we don't actually have any information about this portability concern. Does anyone know what precision we can expect from gettimeofday()? Can we expect the full microsecond precision usually? > > At PGConf.dev Hannu Krosing drew attention to the pg_test_timing module. I’ve tried this module (slightly modified to measure nanoseconds) on some systems, and everywhere I found ~100ns resolution (95% of ticks fall into 64ns and 128ns buckets). AFAICT, pg_test_timing doesn't use gettimeofday(), so this doesn't really address the original question.
Peter Eisentraut <peter@eisentraut.org> writes: > AFAICT, pg_test_timing doesn't use gettimeofday(), so this doesn't > really address the original question. It's not exactly hard to make it do so (see attached). I tried this on several different machines, and my conclusion is that gettimeofday() reports full microsecond precision on any platform anybody is likely to be running PG on today. Even my one surviving pet dinosaur, NetBSD 10 on PowerPC Mac (mamba), shows results like this: $ ./pg_test_timing Testing timing overhead for 3 seconds. Per loop time including overhead: 901.41 ns Histogram of timing durations: < us % of total count 1 10.46074 348148 2 89.51495 2979181 4 0.00574 191 8 0.00430 143 16 0.00691 230 32 0.00376 125 64 0.00012 4 128 0.00303 101 256 0.00027 9 512 0.00009 3 1024 0.00009 3 I also modified pg_test_timing to measure nanoseconds not microseconds (second patch attached), and got this: $ ./pg_test_timing Testing timing overhead for 3 seconds. Per loop time including overhead: 805.50 ns Histogram of timing durations: < ns % of total count 1 19.84234 739008 2 0.00000 0 4 0.00000 0 8 0.00000 0 16 0.00000 0 32 0.00000 0 64 0.00000 0 128 0.00000 0 256 0.00000 0 512 0.00000 0 1024 80.14013 2984739 2048 0.00078 29 4096 0.00658 245 8192 0.00290 108 16384 0.00252 94 32768 0.00250 93 65536 0.00016 6 131072 0.00185 69 262144 0.00008 3 524288 0.00008 3 1048576 0.00008 3 confirming that when the result changes it generally does so by 1usec. Applying just the second patch, I find that clock_gettime on this old hardware seems to be limited to 1us resolution, but on my more modern machines (mac M1, x86_64) it can tick at 40ns or less. Even a raspberry pi 4 shows $ ./pg_test_timing Testing timing overhead for 3 seconds. 
Per loop time including overhead: 69.12 ns Histogram of timing durations: < ns % of total count 1 0.00000 0 2 0.00000 0 4 0.00000 0 8 0.00000 0 16 0.00000 0 32 0.00000 0 64 37.59583 16317040 128 62.38568 27076131 256 0.01674 7265 512 0.00002 8 1024 0.00000 0 2048 0.00000 0 4096 0.00153 662 8192 0.00019 83 16384 0.00001 3 32768 0.00001 5 suggesting that the clock_gettime resolution is better than 64 ns. So I concur with Hannu that it's time to adjust pg_test_timing to resolve nanoseconds not microseconds. I gather he's created a patch that does more than mine below, so I'll wait for that. regards, tom lane diff --git a/src/include/portability/instr_time.h b/src/include/portability/instr_time.h index a6fc1922f2..5509d23d2f 100644 --- a/src/include/portability/instr_time.h +++ b/src/include/portability/instr_time.h @@ -84,7 +84,7 @@ typedef struct instr_time /* Use clock_gettime() */ -#include <time.h> +#include <sys/time.h> /* * The best clockid to use according to the POSIX spec is CLOCK_MONOTONIC, @@ -111,10 +111,10 @@ static inline instr_time pg_clock_gettime_ns(void) { instr_time now; - struct timespec tmp; + struct timeval tmp; - clock_gettime(PG_INSTR_CLOCK, &tmp); - now.ticks = tmp.tv_sec * NS_PER_S + tmp.tv_nsec; + gettimeofday(&tmp, NULL); + now.ticks = tmp.tv_sec * NS_PER_S + tmp.tv_usec * 1000; return now; } diff --git a/src/bin/pg_test_timing/pg_test_timing.c b/src/bin/pg_test_timing/pg_test_timing.c index c29d6f8762..ea2b565b14 100644 --- a/src/bin/pg_test_timing/pg_test_timing.c +++ b/src/bin/pg_test_timing/pg_test_timing.c @@ -133,7 +133,7 @@ test_timing(unsigned int duration) total_time = duration > 0 ? 
duration * INT64CONST(1000000) : 0; INSTR_TIME_SET_CURRENT(start_time); - cur = INSTR_TIME_GET_MICROSEC(start_time); + cur = INSTR_TIME_GET_NANOSEC(start_time); while (time_elapsed < total_time) { @@ -142,7 +142,7 @@ test_timing(unsigned int duration) prev = cur; INSTR_TIME_SET_CURRENT(temp); - cur = INSTR_TIME_GET_MICROSEC(temp); + cur = INSTR_TIME_GET_NANOSEC(temp); diff = cur - prev; /* Did time go backwards? */ @@ -183,7 +183,7 @@ output(uint64 loop_count) { int64 max_bit = 31, i; - char *header1 = _("< us"); + char *header1 = _("< ns"); char *header2 = /* xgettext:no-c-format */ _("% of total"); char *header3 = _("count"); int len1 = strlen(header1);
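The gist of the instr_time.h substitution above: both code paths normalize to a nanosecond tick count, and the gettimeofday() values simply arrive with three trailing zeros, which is why the nanosecond histogram clusters around 1024 ns when the source only ticks in microseconds. A sketch of the two conversions (helper names are mine, not PostgreSQL's):

```c
#include <stdint.h>
#include <time.h>
#include <sys/time.h>

#define NS_PER_S INT64_C(1000000000)

/* clock_gettime() path: full nanosecond resolution in the tick count. */
static int64_t
timespec_to_ticks(struct timespec ts)
{
    return ts.tv_sec * NS_PER_S + ts.tv_nsec;
}

/* gettimeofday() path: microseconds scaled up by 1000, so successive
 * tick values can only differ by multiples of 1000 ns. */
static int64_t
timeval_to_ticks(struct timeval tv)
{
    return tv.tv_sec * NS_PER_S + tv.tv_usec * 1000;
}
```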
(resending to list and other CC:s ) Hi Tom This is my current patch which also adds running % and optionally uses faster way to count leading zeros, though I did not see a change from that. It also bucketizes first 128 ns to get better overview of exact behaviour. We may want to put reporting this behind a flag --- Hannu On Wed, Jun 19, 2024 at 6:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Peter Eisentraut <peter@eisentraut.org> writes: > > AFAICT, pg_test_timing doesn't use gettimeofday(), so this doesn't > > really address the original question. > > It's not exactly hard to make it do so (see attached). > > I tried this on several different machines, and my conclusion is that > gettimeofday() reports full microsecond precision on any platform > anybody is likely to be running PG on today. Even my one surviving > pet dinosaur, NetBSD 10 on PowerPC Mac (mamba), shows results like > this: > > $ ./pg_test_timing > Testing timing overhead for 3 seconds. > Per loop time including overhead: 901.41 ns > Histogram of timing durations: > < us % of total count > 1 10.46074 348148 > 2 89.51495 2979181 > 4 0.00574 191 > 8 0.00430 143 > 16 0.00691 230 > 32 0.00376 125 > 64 0.00012 4 > 128 0.00303 101 > 256 0.00027 9 > 512 0.00009 3 > 1024 0.00009 3 > > I also modified pg_test_timing to measure nanoseconds not > microseconds (second patch attached), and got this: > > $ ./pg_test_timing > Testing timing overhead for 3 seconds. > Per loop time including overhead: 805.50 ns > Histogram of timing durations: > < ns % of total count > 1 19.84234 739008 > 2 0.00000 0 > 4 0.00000 0 > 8 0.00000 0 > 16 0.00000 0 > 32 0.00000 0 > 64 0.00000 0 > 128 0.00000 0 > 256 0.00000 0 > 512 0.00000 0 > 1024 80.14013 2984739 > 2048 0.00078 29 > 4096 0.00658 245 > 8192 0.00290 108 > 16384 0.00252 94 > 32768 0.00250 93 > 65536 0.00016 6 > 131072 0.00185 69 > 262144 0.00008 3 > 524288 0.00008 3 > 1048576 0.00008 3 > > confirming that when the result changes it generally does so by 1usec. 
> > Applying just the second patch, I find that clock_gettime on this > old hardware seems to be limited to 1us resolution, but on my more > modern machines (mac M1, x86_64) it can tick at 40ns or less. > Even a raspberry pi 4 shows > > $ ./pg_test_timing > Testing timing overhead for 3 seconds. > Per loop time including overhead: 69.12 ns > Histogram of timing durations: > < ns % of total count > 1 0.00000 0 > 2 0.00000 0 > 4 0.00000 0 > 8 0.00000 0 > 16 0.00000 0 > 32 0.00000 0 > 64 37.59583 16317040 > 128 62.38568 27076131 > 256 0.01674 7265 > 512 0.00002 8 > 1024 0.00000 0 > 2048 0.00000 0 > 4096 0.00153 662 > 8192 0.00019 83 > 16384 0.00001 3 > 32768 0.00001 5 > > suggesting that the clock_gettime resolution is better than 64 ns. > > So I concur with Hannu that it's time to adjust pg_test_timing to > resolve nanoseconds not microseconds. I gather he's created a > patch that does more than mine below, so I'll wait for that. > > regards, tom lane >
Attachments
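The "faster way to count leading zeros" Hannu mentions presumably swaps pg_test_timing's shift loop for the compiler builtin when computing the log2 histogram bucket. A hedged sketch of both variants (illustrative, not the actual patch):

```c
#include <stdint.h>

/* Log2 bucket of a duration: 0 for 0, otherwise the bit width of the
 * value.  Loop version, in the style of the original pg_test_timing. */
static int
bucket_loop(uint64_t diff)
{
    int bits = 0;

    while (diff)
    {
        diff >>= 1;
        bits++;
    }
    return bits;
}

/* Same result via the gcc/clang count-leading-zeros builtin (which is
 * undefined for 0, hence the guard). */
static int
bucket_clz(uint64_t diff)
{
    return diff ? 64 - __builtin_clzll(diff) : 0;
}
```

As Hannu notes, the speedup is hard to see here: either form is dwarfed by the cost of the clock read itself.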
I also have a variant that uses the low-level CPU cycle counter directly (attached). It currently only works on clang, as it is done using __builtin_readcyclecounter() in order to support both x64 and ARM. This one is there to understand the overhead of the calculations when going from cycle counter to POSIX time struct. This works OK with Clang, but we should probably not integrate this directly into the code as it has some interesting corner cases. For example, Apple's clang does compile __builtin_readcyclecounter() but crashes with an unknown instruction when trying to run it. Therefore I have not integrated it into the Makefile, so if you want to use it, just copy it into src/bin/pg_test_timing and run

cd src/bin/pg_test_timing
mv pg_test_timing.c pg_test_timing.c.backup
cp pg_test_cyclecounter.c pg_test_timing.c
make
mv pg_test_timing pg_test_cyclecounter
mv pg_test_timing.c.backup pg_test_timing.c

It gives output like

Testing timing overhead for 3 seconds.
Total 25000001 ticks in 1000000073 ns, 24999999.175000 ticks per ns
This CPU is running at 24999999 ticks / second, will run test for 74999997 ticks
loop_count 287130958
Per loop time including overhead: 10.45 ns, min: 0 ticks (0.0 ns), same: 212854591
Total ticks in: 74999997, in: 3000000541 nr
Log2 histogram of timing durations:
  < ticks (     < ns)  % of total  running %      count
        1 (     40.0)     74.1315    74.1315  212854591
        2 (     80.0)     25.8655    99.9970   74267876
        4 (    160.0)      0.0000    99.9970          7
        8 (    320.0)      0.0000    99.9970          3
       16 (    640.0)      0.0000    99.9970          1
       32 (   1280.0)      0.0000    99.9971         27
       64 (   2560.0)      0.0012    99.9983       3439
      128 (   5120.0)      0.0016    99.9999       4683
      256 (  10240.0)      0.0001   100.0000        265
      512 (  20480.0)      0.0000   100.0000         37
     1024 (  40960.0)      0.0000   100.0000         23
     2048 (  81920.0)      0.0000   100.0000          6
First 64 ticks --
        0 (      0.0)     74.1315    74.1315  212854591
        1 (     40.0)     25.8655    99.9970   74267876
        2 (     80.0)      0.0000    99.9970          2
        3 (    120.0)      0.0000    99.9970          5
        4 (    160.0)      0.0000    99.9970          2
        6 (    240.0)      0.0000    99.9983          1
       13 (    520.0)      0.0000   100.0000          1
...
       59 (   2360.0)      0.0000   100.0000        140
       60 (   2400.0)      0.0001   100.0000        210
       61 (   2440.0)      0.0002   100.0000        497
       62 (   2480.0)      0.0002   100.0000        524
       63 (   2520.0)      0.0001   100.0000        391

If you run on some interesting hardware, please share the results. If we get enough I will put them together in a spreadsheet and share. I also attach my lightning talk slides here.

---
Hannu

On Thu, Jun 20, 2024 at 12:41 PM Hannu Krosing <hannuk@google.com> wrote: > > (resending to list and other CC:s ) > > Hi Tom > > This is my current patch which also adds running % and optionally uses > faster way to count leading zeros, though I did not see a change from > that. > > It also bucketizes first 128 ns to get better overview of exact behaviour. > > We may want to put reporting this behind a flag > > --- > Hannu > > On Wed, Jun 19, 2024 at 6:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > Peter Eisentraut <peter@eisentraut.org> writes: > > > AFAICT, pg_test_timing doesn't use gettimeofday(), so this doesn't > > > really address the original question. > > > > It's not exactly hard to make it do so (see attached). > > > > I tried this on several different machines, and my conclusion is that > > gettimeofday() reports full microsecond precision on any platform > > anybody is likely to be running PG on today. Even my one surviving > > pet dinosaur, NetBSD 10 on PowerPC Mac (mamba), shows results like > > this: > > > > $ ./pg_test_timing > > Testing timing overhead for 3 seconds. > > Per loop time including overhead: 901.41 ns > > Histogram of timing durations: > > < us % of total count > > 1 10.46074 348148 > > 2 89.51495 2979181 > > 4 0.00574 191 > > 8 0.00430 143 > > 16 0.00691 230 > > 32 0.00376 125 > > 64 0.00012 4 > > 128 0.00303 101 > > 256 0.00027 9 > > 512 0.00009 3 > > 1024 0.00009 3 > > > > I also modified pg_test_timing to measure nanoseconds not > > microseconds (second patch attached), and got this: > > > > $ ./pg_test_timing > > Testing timing overhead for 3 seconds.
> > Per loop time including overhead: 805.50 ns > > Histogram of timing durations: > > < ns % of total count > > 1 19.84234 739008 > > 2 0.00000 0 > > 4 0.00000 0 > > 8 0.00000 0 > > 16 0.00000 0 > > 32 0.00000 0 > > 64 0.00000 0 > > 128 0.00000 0 > > 256 0.00000 0 > > 512 0.00000 0 > > 1024 80.14013 2984739 > > 2048 0.00078 29 > > 4096 0.00658 245 > > 8192 0.00290 108 > > 16384 0.00252 94 > > 32768 0.00250 93 > > 65536 0.00016 6 > > 131072 0.00185 69 > > 262144 0.00008 3 > > 524288 0.00008 3 > > 1048576 0.00008 3 > > > > confirming that when the result changes it generally does so by 1usec. > > > > Applying just the second patch, I find that clock_gettime on this > > old hardware seems to be limited to 1us resolution, but on my more > > modern machines (mac M1, x86_64) it can tick at 40ns or less. > > Even a raspberry pi 4 shows > > > > $ ./pg_test_timing > > Testing timing overhead for 3 seconds. > > Per loop time including overhead: 69.12 ns > > Histogram of timing durations: > > < ns % of total count > > 1 0.00000 0 > > 2 0.00000 0 > > 4 0.00000 0 > > 8 0.00000 0 > > 16 0.00000 0 > > 32 0.00000 0 > > 64 37.59583 16317040 > > 128 62.38568 27076131 > > 256 0.01674 7265 > > 512 0.00002 8 > > 1024 0.00000 0 > > 2048 0.00000 0 > > 4096 0.00153 662 > > 8192 0.00019 83 > > 16384 0.00001 3 > > 32768 0.00001 5 > > > > suggesting that the clock_gettime resolution is better than 64 ns. > > > > So I concur with Hannu that it's time to adjust pg_test_timing to > > resolve nanoseconds not microseconds. I gather he's created a > > patch that does more than mine below, so I'll wait for that. > > > > regards, tom lane > >
Attachments
Another thing I changed in reporting was to report <= ns instead of < ns This was inspired by not wanting to report "zero ns" as "< 1 ns" and easiest was to change them all to <= On Thu, Jun 20, 2024 at 12:41 PM Hannu Krosing <hannuk@google.com> wrote: > > (resending to list and other CC:s ) > > Hi Tom > > This is my current patch which also adds running % and optionally uses > faster way to count leading zeros, though I did not see a change from > that. > > It also bucketizes first 128 ns to get better overview of exact behaviour. > > We may want to put reporting this behind a flag > > --- > Hannu > > On Wed, Jun 19, 2024 at 6:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > Peter Eisentraut <peter@eisentraut.org> writes: > > > AFAICT, pg_test_timing doesn't use gettimeofday(), so this doesn't > > > really address the original question. > > > > It's not exactly hard to make it do so (see attached). > > > > I tried this on several different machines, and my conclusion is that > > gettimeofday() reports full microsecond precision on any platform > > anybody is likely to be running PG on today. Even my one surviving > > pet dinosaur, NetBSD 10 on PowerPC Mac (mamba), shows results like > > this: > > > > $ ./pg_test_timing > > Testing timing overhead for 3 seconds. > > Per loop time including overhead: 901.41 ns > > Histogram of timing durations: > > < us % of total count > > 1 10.46074 348148 > > 2 89.51495 2979181 > > 4 0.00574 191 > > 8 0.00430 143 > > 16 0.00691 230 > > 32 0.00376 125 > > 64 0.00012 4 > > 128 0.00303 101 > > 256 0.00027 9 > > 512 0.00009 3 > > 1024 0.00009 3 > > > > I also modified pg_test_timing to measure nanoseconds not > > microseconds (second patch attached), and got this: > > > > $ ./pg_test_timing > > Testing timing overhead for 3 seconds. 
> > Per loop time including overhead: 805.50 ns > > Histogram of timing durations: > > < ns % of total count > > 1 19.84234 739008 > > 2 0.00000 0 > > 4 0.00000 0 > > 8 0.00000 0 > > 16 0.00000 0 > > 32 0.00000 0 > > 64 0.00000 0 > > 128 0.00000 0 > > 256 0.00000 0 > > 512 0.00000 0 > > 1024 80.14013 2984739 > > 2048 0.00078 29 > > 4096 0.00658 245 > > 8192 0.00290 108 > > 16384 0.00252 94 > > 32768 0.00250 93 > > 65536 0.00016 6 > > 131072 0.00185 69 > > 262144 0.00008 3 > > 524288 0.00008 3 > > 1048576 0.00008 3 > > > > confirming that when the result changes it generally does so by 1usec. > > > > Applying just the second patch, I find that clock_gettime on this > > old hardware seems to be limited to 1us resolution, but on my more > > modern machines (mac M1, x86_64) it can tick at 40ns or less. > > Even a raspberry pi 4 shows > > > > $ ./pg_test_timing > > Testing timing overhead for 3 seconds. > > Per loop time including overhead: 69.12 ns > > Histogram of timing durations: > > < ns % of total count > > 1 0.00000 0 > > 2 0.00000 0 > > 4 0.00000 0 > > 8 0.00000 0 > > 16 0.00000 0 > > 32 0.00000 0 > > 64 37.59583 16317040 > > 128 62.38568 27076131 > > 256 0.01674 7265 > > 512 0.00002 8 > > 1024 0.00000 0 > > 2048 0.00000 0 > > 4096 0.00153 662 > > 8192 0.00019 83 > > 16384 0.00001 3 > > 32768 0.00001 5 > > > > suggesting that the clock_gettime resolution is better than 64 ns. > > > > So I concur with Hannu that it's time to adjust pg_test_timing to > > resolve nanoseconds not microseconds. I gather he's created a > > patch that does more than mine below, so I'll wait for that. > > > > regards, tom lane > >
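On the "<= ns" relabeling mentioned above: with log2 buckets, bucket i's inclusive upper bound is 2^i - 1, so the change only touches the printed label, and bucket 0 then honestly reads "<= 0 ns" instead of the misleading "< 1 ns". A small sketch of the bound arithmetic (my illustration, not the patch):

```c
#include <stdint.h>

/* Inclusive upper bound, in ns, of log2 bucket i: every duration d whose
 * bit width is i satisfies d <= 2^i - 1.  Bucket 0 holds only d == 0. */
static uint64_t
bucket_upper_bound(int i)
{
    return (UINT64_C(1) << i) - 1;
}
```

This matches the "<= ns" column in the M1 output earlier in the thread: 0, 1, 3, 7, 15, 31, 63, 127, ...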
Hannu Krosing <hannuk@google.com> writes: > This is my current patch which also adds running % and optionally uses > faster way to count leading zeros, though I did not see a change from > that. I've not read the patch yet, but I did create a CF entry [1] to get some CI cycles on this. The cfbot complains [2] about

[19:24:31.951] pg_test_timing.c: In function ‘output’:
[19:24:31.951] pg_test_timing.c:229:11: error: format ‘%ld’ expects argument of type ‘long int’, but argument 3 has type ‘int64’ {aka ‘long long int’} [-Werror=format=]
[19:24:31.951]   229 |    printf("%*ld %*.4f %*.4f %*lld\n",
[19:24:31.951]       |           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[19:24:31.951]   230 |           Max(8, len1), i,
[19:24:31.951]       |                         ~
[19:24:31.951]       |                         |
[19:24:31.951]       |                         int64 {aka long long int}

which seems a bit confused, but anyway you cannot assume that int64 is a match for "%ld", or "%lld" either. What we generally do for this elsewhere is to explicitly cast printf arguments to long long int. Also there's this on Windows:

[19:23:48.231] ../src/bin/pg_test_timing/pg_test_timing.c(162): warning C4067: unexpected tokens following preprocessor directive - expected a newline

regards, tom lane

[1] https://commitfest.postgresql.org/48/5066/
[2] http://cfbot.cputube.org/highlights/all.html#5066
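The portable idiom Tom describes can be sketched as follows: because int64 may be either long or long long depending on platform, the argument is explicitly cast to long long and printed with %lld (`format_count` is a hypothetical helper for illustration):

```c
#include <stdint.h>
#include <stdio.h>

typedef int64_t int64;   /* stand-in for PostgreSQL's int64 typedef */

/* Format a histogram count portably: the explicit (long long) cast
 * guarantees the argument matches %lld on every platform, regardless
 * of which integer type int64 maps to. */
static void
format_count(char *buf, size_t buflen, int64 count)
{
    snprintf(buf, buflen, "%lld", (long long) count);
}
```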