

BUG #16561: timestamp with time zone microseconds inconsistently recorded correctly and incorrectly

From:  PG Bug reporting form <noreply@postgresql.org>
Date:  Thu, Jul 30, 2020 at 11:52 AM
The following bug has been logged on the website:

Bug reference:      16561
Logged by:          Peter Thomas
Email address:      peter@mccarthy.co.nz
PostgreSQL version: 9.5.9
Operating system:   Windows 10 x64
Description:

Windows C++ code issues UPDATE query to local Windows PostgreSQL 9.5.9 over
libpq.

One field being updated is a timestamp provided as a UTC text representation
(e.g. '2020-07-29T22:30:00.124248Z') but stored as timestamp with time
zone. The timestamp's sub-second component is not consistently written:
sometimes it is stored correctly, sometimes incorrectly. It is always the
sub-second part of the time that is affected (including its more
significant digits), never the date/time from seconds upwards.

The query is verified client-side and then logged by PostgreSQL, confirming
that the client query reaches PostgreSQL verbatim.

Interestingly, the same query applied manually through pgAdmin III never
stores the microsecond component incorrectly, even after numerous attempts.

The query is not performed within an explicit transaction and runs
immediately after creating the record that is then updated. This is the
test case that revealed the problem.

The manual test through pgAdmin III differs in that the record had already
been updated at least once.

The query is similar to the following ("locked_tz" and "uid" vary):

UPDATE "tnt_res_b195217c_cfc8_11ea_8c1b_00155dce25bc".job 
SET 
locked_tz = CASE WHEN locked_tz IS NULL THEN '2020-07-29T22:30:00.124248Z'
ELSE locked_tz END, 
locked_by_operative_uid = CASE WHEN locked_by_operative_uid IS NULL THEN
'b32ffd2c-cfc8-11ea-987d-00155dce25bc' ELSE locked_by_operative_uid END, 
version = CASE WHEN locked_tz IS NULL THEN version + 1 ELSE version END,
description='2020-07-29T22:30:00.124248Z' -- added for debugging
WHERE uid = '09dbe5d6-d1eb-11ea-9185-00155dce25bc' 
RETURNING locked_tz, locked_by_operative_uid;

The query is written this way to ensure that locked_tz and
locked_by_operative_uid are returned regardless of whether the values are
updated. It is a "job locking" query; when executed in the test code, the
pre-existing values of locked_tz and locked_by_operative_uid are both
NULL.

The time was also written into the "description" column as a sanity check
(debugging).
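
For illustration, here is a minimal sketch of how a libpq client might
issue this query with bound parameters and inspect the RETURNING row. The
code is hypothetical (not the actual test code): the table name is
shortened to job and only the two locking columns are shown.

#include <libpq-fe.h>
#include <cstdio>

/* Hypothetical sketch: run the locking UPDATE and read back the RETURNING
   row. The caller can compare the returned locked_tz and
   locked_by_operative_uid with the values it supplied to decide whether
   this call acquired the lock. */
bool try_lock_job(PGconn *conn, const char *lock_ts,
                  const char *operative_uid, const char *job_uid)
{
    const char *params[3] = { lock_ts, operative_uid, job_uid };
    PGresult *res = PQexecParams(conn,
        "UPDATE job SET "
        "  locked_tz = CASE WHEN locked_tz IS NULL "
        "              THEN $1::timestamptz ELSE locked_tz END, "
        "  locked_by_operative_uid = CASE WHEN locked_by_operative_uid IS NULL "
        "              THEN $2::uuid ELSE locked_by_operative_uid END "
        "WHERE uid = $3::uuid "
        "RETURNING locked_tz, locked_by_operative_uid",
        3, NULL, params, NULL, NULL, 0);

    if (PQresultStatus(res) != PGRES_TUPLES_OK || PQntuples(res) != 1)
    {
        fprintf(stderr, "lock query failed: %s\n", PQerrorMessage(conn));
        PQclear(res);
        return false;
    }
    printf("locked_tz=%s locked_by=%s\n",
           PQgetvalue(res, 0, 0), PQgetvalue(res, 0, 1));
    PQclear(res);
    return true;
}

Binding the values as parameters rather than splicing them into the SQL
string also rules out any quoting issues in the text representation.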

The outputs from two back-to-back test runs are shown below; the first run
succeeded and the second failed.

"07ff87e0-d1eb-11ea-93e6-00155dce25bc";"b32ffd2c-cfc8-11ea-987d-00155dce25bc";"New
Job";"2020-07-29T22:29:56.992816Z";"";"";"";"JOTRAINING";"";FALSE;"b32ffd2c-cfc8-11ea-987d-00155dce25bc";"2020-07-30
10:29:56.992816+12";"";"";"";"";"2020-07-30 10:29:56.999488+12";1

"09dbe5d6-d1eb-11ea-9185-00155dce25bc";"b32ffd2c-cfc8-11ea-987d-00155dce25bc";"New
Job";"2020-07-29T22:30:00.124248Z";"";"";"";"JOTRAINING";"";FALSE;"b32ffd2c-cfc8-11ea-987d-00155dce25bc";"2020-07-30
10:30:00.114735+12";"";"";"";"";"2020-07-30 10:30:00.125314+12";1

The description column, containing the supplied time as a (debug) string,
follows the name column containing "New Job". In the second record you can
see that the time in the description has a different microsecond component
from that in the "+12" timestamp with time zone field.

I plan to upgrade and see if this goes away but will wait in case more info
is required.

As an FYI, the table definition is as follows:

CREATE TABLE tnt_res_b195217c_cfc8_11ea_8c1b_00155dce25bc.job
(
  uid uuid NOT NULL DEFAULT uuid_generate_v4(),
  creator_operative_uid uuid NOT NULL,
  name character varying(1024),
  description character varying(16384),
  scenario_uid uuid,
  scenario_jdoc json,
  folder_uid uuid,
  job_type_code character varying(10) NOT NULL,
  time_zone character varying(64),
  allow_lc_res boolean NOT NULL DEFAULT false,
  locked_by_operative_uid uuid,
  locked_tz timestamp with time zone,
  closed_tz timestamp with time zone,
  purged_tz timestamp with time zone,
  execution_jdoc json,
  extension_jdoc json,
  created_tz timestamp with time zone NOT NULL DEFAULT now(),
  version integer NOT NULL DEFAULT 0,
  CONSTRAINT job_pkey PRIMARY KEY (uid)
)
WITH (
  OIDS=FALSE,
  autovacuum_enabled=true
);

postgresql.conf:
timezone = 'Pacific/Auckland'


PG Bug reporting form <noreply@postgresql.org> writes:
> One field being updated is a timestamp provided as a UTC text representation
> (e.g. '2020-07-29T22:30:00.124248Z') but stored as timestamp with time
> zone. The timestamp's sub-second component is not consistently written:
> sometimes it is stored correctly, sometimes incorrectly. It is always the
> sub-second part of the time that is affected (including its more
> significant digits), never the date/time from seconds upwards.

Given the described query:

> UPDATE "tnt_res_b195217c_cfc8_11ea_8c1b_00155dce25bc".job 
> SET 
> locked_tz = CASE WHEN locked_tz IS NULL THEN '2020-07-29T22:30:00.124248Z'
> ELSE locked_tz END, 
> locked_by_operative_uid = CASE WHEN locked_by_operative_uid IS NULL THEN
> 'b32ffd2c-cfc8-11ea-987d-00155dce25bc' ELSE locked_by_operative_uid END, 
> version = CASE WHEN locked_tz IS NULL THEN version + 1 ELSE version END,
> description='2020-07-29T22:30:00.124248Z' -- added for debugging
> WHERE uid = '09dbe5d6-d1eb-11ea-9185-00155dce25bc' 
> RETURNING locked_tz, locked_by_operative_uid;

what seems far more likely than random data corruption is that some other
transaction updated this same row slightly earlier, setting the locked_tz
value that you are reading back.  The CASE in this query would then have
preserved that value, but the description field would get updated anyway.

In the specific example you show, if I'm not confused, the reported
locked_tz value is a bit older than the description value, so that
this sequence of events seems very plausible.  But even if the order
were reversed, that wouldn't immediately destroy this theory, because
you haven't said where the timestamps are coming from.  The transaction
that got to the row first could possibly try to store a "newer" timestamp
than the one that got there second, unless there's some guarantee about
how those timestamps are computed that you've not described.

In short, I think you need to take a hard look at whatever logic you
think is preventing concurrent selection of the same job row by multiple
transactions, because what it looks like from here is that that's not
working reliably.

I won't completely deny that there could be a Postgres bug here, but
there are enough moving parts that are missing from this bug report
that it can't be investigated usefully.

            regards, tom lane



Hi Tom,

Thanks for the detailed consideration.

I can confirm that there is only one connection/transaction accessing the job record on the given database.

The flow is simply:

    initiate test
    create job
    update job (using provided query)
    compare timestamps

No other logic has access to the "job" record that is created in the isolated test (on the given database) other than an AFTER ROW trigger that simply does a NOTIFY (no modification of data).

The timestamp was generated by determining "now" in C++ code and was provided as an ISO 8601 string in the query.
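
For illustration, a minimal sketch of how such a string can be produced
with std::chrono (hypothetical, not the actual test code; gmtime_s is the
Windows CRT variant):

#include <chrono>
#include <cstdio>
#include <ctime>
#include <string>

// Hypothetical sketch: format the current UTC time as an ISO 8601 string
// with a microsecond fraction, e.g. "2020-07-29T22:30:00.124248Z".
std::string now_iso8601_utc()
{
    using namespace std::chrono;
    auto now = system_clock::now();
    auto us  = duration_cast<microseconds>(now.time_since_epoch()) % 1000000;
    std::time_t t = system_clock::to_time_t(now);
    std::tm tm{};
    gmtime_s(&tm, &t);              // Windows CRT; gmtime_r(&t, &tm) on POSIX
    char head[32];
    std::strftime(head, sizeof head, "%Y-%m-%dT%H:%M:%S", &tm);
    char out[48];
    std::snprintf(out, sizeof out, "%s.%06lldZ", head,
                  static_cast<long long>(us.count()));
    return out;
}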

When I say "on the given database" that is because the same logic is executed concurrently on two databases hosted by the same PostgreSQL server. Two separate connections. Two independent threads. Each thread using its own connection for both operations (create then update). Each thread using the same query.

So, the same update query (hence time) gets applied to two databases on the same server in parallel. In the real (production) world these will be separate servers.


Regards,

Peter Thomas

On Thu, Jul 30, 2020 at 11:52 AM PG Bug reporting form
<noreply@postgresql.org> wrote:
> One field being updated is a timestamp provided as a UTC text representation
> (e.g. '2020-07-29T22:30:00.124248Z') but stored as timestamp with time
> zone. The timestamp's sub-second component is not consistently written:
> sometimes it is stored correctly, sometimes incorrectly. It is always the
> sub-second part of the time that is affected (including its more
> significant digits), never the date/time from seconds upwards.

Kia ora,

Just to rule out another theory, if you run pg_controldata -D pgdata,
you can see which storage format is used for timestamps:

Date/time type storage:               64-bit integers

Before release 10, it was possible for it to use floating point
storage instead of integers; I wonder if that could be a factor here.
There's a note about that here:

https://www.postgresql.org/docs/9.5/datatype-datetime.html
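
The same flag is also visible to a libpq client without shell access,
because the server reports integer_datetimes among its startup parameters.
A minimal sketch (connection string elided):

#include <libpq-fe.h>
#include <cstdio>

// Sketch: read the server-reported integer_datetimes parameter over libpq.
int main()
{
    PGconn *conn = PQconnectdb("");   // fill in your connection string
    if (PQstatus(conn) == CONNECTION_OK)
        printf("integer_datetimes = %s\n",
               PQparameterStatus(conn, "integer_datetimes"));
    PQfinish(conn);
    return 0;
}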



Hi Thomas,

pg_controldata -D pgdata

returns

Date/time type storage:               64-bit integers



Regarding the parallel execution on two same-server databases: I can also reproduce the problem running sequentially (in a single thread) across the same two databases.

(Note: in both the parallel and the sequential case, the same-query inserts on both databases complete before the same-query updates are executed.)

What is interesting is that in either threading case the incorrect timestamp values are written consistently in both databases, which is a down-vote for random corruption.



Regards,

Peter Thomas


Hi Tom, Thomas,


Tom was correct. You may mark this as closed.

Highly embarrassing.

Default C++ parameter propagation caused locked_tz to be set at create time.
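
(For the record, a hypothetical sketch of the failure mode; the names are
invented, not the actual code:)

#include <string>

// Hypothetical illustration: a create helper with a defaulted
// lock-timestamp parameter that is meant to stay empty, but the current
// time propagates into it through the call chain. locked_tz is then
// already non-NULL at create time, so the later
// CASE WHEN locked_tz IS NULL ... branch never fires.
void create_job(const std::string &name,
                const std::string &locked_tz = std::string())
{
    // INSERT INTO job (..., locked_tz, ...) with locked_tz bound
    // (NULL when empty); elided.
}

void run_test()
{
    std::string now = "2020-07-29T22:30:00.124248Z";
    create_job("New Job", now);  // oops: 'now' lands in locked_tz at create
    // The locking UPDATE later preserves this slightly older locked_tz
    // while description is overwritten with a fresher timestamp, which is
    // exactly the observed mismatch.
}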

If it's any consolation I've been staring at this for two days.

Sorry to have wasted everybody's time.


Regards,

Peter Thomas


Thomas Munro <thomas.munro@gmail.com> writes:
> Before release 10, it was possible for it to use floating point
> storage instead of integers; I wonder if that could be a factor here.

I wondered about that briefly as well.  It doesn't seem to fit the
facts though.

(1) integer timestamps have been the default since 8.4;

(2) if the OP were using float timestamps, he'd have to be working
with dates millions of years away from the epoch to lose any
substantial fraction of a second in precision.  2^53 seconds is
~285 million years if I've not miscounted.  Even dropping the last
microsecond digit won't happen for dates within a couple hundred
years of AD 2000.
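
To make the arithmetic concrete, here is a small sketch (hypothetical)
showing the spacing of adjacent doubles at various distances from the
epoch, assuming float timestamps store seconds since 2000-01-01 in a C
double:

#include <cmath>
#include <cstdio>

// Spacing between adjacent doubles at a given magnitude of seconds.
// Microsecond precision is only lost once this spacing exceeds 1e-6 s,
// i.e. a few hundred years from the epoch.
static void show_step(const char *label, double seconds)
{
    double step = std::nextafter(seconds, INFINITY) - seconds;
    std::printf("%-26s step = %g s\n", label, step);
}

int main()
{
    show_step("20 years from 2000", 20.0 * 365.25 * 86400.0);
    show_step("300 years from 2000", 300.0 * 365.25 * 86400.0);
    show_step("1M years from 2000", 1e6 * 365.25 * 86400.0);
    return 0;
}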

            regards, tom lane