time-based range partitioning and truncate/delete different timezones

Поиск
Список
Период
Сортировка
От Niels Jespersen
Тема time-based range partitioning and truncate/delete different timezones
Дата
Msg-id 97f889bbea084fdb8cfd94e73ab1358d@dst.dk
обсуждение исходный текст
Ответы Re: time-based range partitioning and truncate/delete different timezones  (Michael Lewis <mlewis@entrata.com>)
Список pgsql-general
Hi all

I need ideas regarding a way to design an insert, truncate/delete, and reinsert process on data that is in UTC and
databaserange partitions that are in the Europe/Brussels time zone. Therefore, any input is welcome.  

The issue at hand is this. We receive daily file deliveries of time-based measurements. The files are zipped csv's. A
filecovers a UTC-day (e.g. 2021-01-15 00:00:00+00 to 2021-01-15 23:59:59+00). About 100 million rows a day.  

Loaded by Python into a Postgres 12 database table, range-partitioned on Europe/Brussels days. Database session in UTC,
sincetimestamps in data has no time zone indicator. Most rows hit the same partition, but the last hour (this time of
year,in the summer it is 2 hours) hits the next partition. Streamed directly from zip by psycopg2.cursor.copy_expert.
Worksperfect.  

Then analysis on the data can proceed. Aggregating stuff on daily or monthly basis. This works fine since data is
partitionedon our local time zone and we want aggregates on the same time zone.  

Then, two weeks later a better delivery comes in covering the same UTC-period. Data quality is better, so we want to
replacethe old data with the new data. If data were partitioned on the same time zone boundary as the file covers, then
atruncate partition, insert new data would be the obvious solution. That is no good, since the truncate will wipe an
hourtoo much on one end and an hour too little on the other end.  

So what is a better solution? Delete from t where t.timecolumn between a and b, and reinsert the new data. My instinct
saysno, but I cannot really think of a good alternative. It is 100 million rows with 5 or 6 numeric columns. 

Regards Niels Jespersen






В списке pgsql-general по дате отправления:

Предыдущее
От: Soni M
Дата:
Сообщение: Re: could not fsync file ""pg_wal/xlogtemp.23229"": Input/output error
Следующее
От: Atul Kumar
Дата:
Сообщение: migration from postgres to enterprosedb