Re: Strategies for converting UTC data to local windows for arbitrary resolutions and timezones

Поиск
Список
Период
Сортировка
Thank you to all who have weighed in! Very much appreciated.

A few thoughts based on what I've read:

As far as Lincoln describes it, the series is not the real problem here,
but is just intended to be a simplified example of his actual data.

This both is and isn't the case. I was using gneerate_series to create some data for testing purposes, but I also would love to be able to use generate_series for the logic as well.

> SELECT
>     sub.gs AS ts_in_utc
>     ,sub.gs AT TIME ZONE 'America/New_York' AS ts_in_local_tz
>     ,date_bin('1 days', sub.gs AT TIME ZONE 'America/New_York',
> '2023-01-01')
> FROM (SELECT generate_series('2023-11-03 00:00:00Z'::timestamptz,
> '2023-11-07 00:00:00Z'::timestamptz, '5 minutes'::interval) AS gs) AS sub
> WHERE
>     sub.gs >= '2023-11-05 00:00:00 America/New_York'::timestamptz AND
> sub.gs < '2023-11-06 00:00:00 America/New_York'::timestamptz

I believe this query will be funky around DST borders, because `sub.gs AT TIME ZONE 'America/New_York'` will be localized in a way that erases the difference between hours with different offsets, which are genuinely different. For instance, I ran this and there are two rows within it that look like:

` 2023-11-05 05:00:00+00 | 2023-11-05 01:00:00 | 2023-11-05 00:00:00`
and 
` 2023-11-05 06:00:00+00 | 2023-11-05 01:00:00 | 2023-11-05 00:00:00`

I think that the non-unique second column will pose an issue for the date binning at a resolution finer than 1 day.

> Note that setting the time zone is a client/connection setting so if you
> set it within a transaction, it will stay set when the transaction
> concludes. But time manipulation is tricky and trying to DIY reinvent the
> wheel is painful and often buggy. Let PostgreSQL do the work for you.

Yeah, I'm definitely interested in keeping as much of the DST stuff outside my code as possible. I guess my concern is that I think there are other places where this database is being used in my codebase that may rely on the database time setting being UTC (or really, GMT, though I don't think there's a difference for these purposes). It would be best if all of my application's code declared its intentions about the time zone of the database connection before running its query, but I don't think that's a feasible change to make right now. That's what's motivating my pursuit of finding a way to write these queries without changing this setting, through appropriate casting and such.

> create function generate_series(timestamptz, timestamptz, interval, text)
> returns setof timestamptz
strict immutable language plpgsql as
$$
begin
perform set_config('timezone', $4, true);
return query select generate_series($1, $2, $3);
end
$$ set timezone = 'UTC';

This is a nice suggestion, and in fact, it would be fine from my perspective to reset to UTC every time. My concern is only around the safety of the final `set timezone`. Under what circumstances/when can I count on that being set? E.g. if a query using that function was cancelled before finishing, would the connection timezone remain as $4? I guess the 3rd parameter to set_config is `is_local` (based on https://pgpedia.info/s/set_config.html). Does that mean I could run this outside this context of a function, and expect the setting to go back to UTC on a rollback? Apologies if these are naive questions.

Thanks again for all the help.

Best,
Lincoln


On Wed, Oct 4, 2023 at 11:09 AM Marian Wendt <marian.wendt@yahoo.com> wrote:


Am 04.10.2023 um 16:11 schrieb Tom Lane:
Steve Crawford <scrawford@pinpointresearch.com> writes:
On Tue, Oct 3, 2023 at 7:33 PM Lincoln Swaine-Moore <lswainemoore@gmail.com>
wrote:
5) Ideally, the solution would not involve messing with the
server/connection's value of timezone. (Though I would be interested if
there was a solution that relaxed this constraint and was relatively
safe/compatible with transactions and psycopg2.)
Note that setting the time zone is a client/connection setting so if you
set it within a transaction, it will stay set when the transaction
concludes. But time manipulation is tricky and trying to DIY reinvent the
wheel is painful and often buggy. Let PostgreSQL do the work for you.
Expanding on that philosophy: you should be able to set the timezone
locally within a function, so that it wouldn't be that hard to make a
wrapper for generate_series that emulates the 4-argument version added
in v16.

Rather than messing with manually saving and restoring the prevailing
zone, I'd let the function SET infrastructure do it for me.  Sadly,
that SET clause only takes literal constant arguments, so it'd go
roughly like this:

create function generate_series(timestamptz, timestamptz, interval, text)
returns setof timestamptz
strict immutable language plpgsql as
$$
begin  perform set_config('timezone', $4, true);  return query select generate_series($1, $2, $3);
end
$$ set timezone = 'UTC';

Setting the zone to UTC is a useless step, but that triggers
restoring the previous zone when the function exits; simpler
and probably faster than coding the save/restore explicitly.

Side note: whether this is really "immutable" is a matter for
debate, since time zone definitions tend to change over time.
But we chose to mark the new 4-argument version that way,
so you might as well do so too.
			regards, tom lane



As far as Lincoln describes it, the series is not the real problem here, but is just intended to be a simplified example of his actual data. The consideration that you can use the time zone using a function should apply here... The following SELECT should show as an example that the desired result can be achieved (TIMEZONE set to 'Etc/UTC'). Variations of date_bin for 15 minutes or 1 hour should work similarly...

SELECT
    sub.gs AS ts_in_utc
    ,sub.gs AT TIME ZONE 'America/New_York' AS ts_in_local_tz
    ,date_bin('1 days', sub.gs AT TIME ZONE 'America/New_York', '2023-01-01')
FROM (SELECT generate_series('2023-11-03 00:00:00Z'::timestamptz, '2023-11-07 00:00:00Z'::timestamptz, '5 minutes'::interval) AS gs) AS sub
WHERE
    sub.gs >= '2023-11-05 00:00:00 America/New_York'::timestamptz AND sub.gs < '2023-11-06 00:00:00 America/New_York'::timestamptz

--
regards, marian wendt


--
Lincoln Swaine-Moore

В списке pgsql-general по дате отправления:

Предыдущее
От: Adrian Klaver
Дата:
Сообщение: Re: [EXT]Re: Strange error trying to import with Ora2PG
Следующее
От: "Johnson, Bruce E - (bjohnson)"
Дата:
Сообщение: Re: [EXT]Re: Strange error trying to import with Ora2PG