Discussion: Are all unlogged tables in any case truncated after a server crash?
Hi everyone,

Every few weeks I use Postgres's ability to import huge data sets very fast by means of unlogged tables. The bulk load (consisting of many COPY and DML statements) and the subsequent spatial index creation take about 5 hours on a capable server (PostgreSQL 12.7 with the PostGIS extension). After that, all unlogged tables remain completely unchanged (no DML/DDL statements). Hence my huge unlogged, "static" tables never become "unclean" and should not need to be truncated after a server crash.

Incidentally, if I set all unlogged tables to logged after the bulk load, it takes an additional 1.5 hours, mainly because of re-indexing, I suppose. I assume that a restart of the database after a server crash would also take about 1.5 hours (reading from WAL) until the database is up and running. I am therefore looking for a strategy to avoid tagging those tables as "unclean", so that the unlogged tables are not truncated on server restart.

Cheers and regards
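For context, the workflow described above can be sketched roughly as follows; the table name, columns, and file path are hypothetical placeholders:

```sql
-- Hypothetical sketch of the bulk-load workflow described above.
-- Unlogged tables skip WAL writes, which makes COPY and index builds faster.
CREATE UNLOGGED TABLE roads (
    id    bigint PRIMARY KEY,
    name  text,
    geom  geometry(LineString, 4326)  -- requires the PostGIS extension
);

-- Bulk load from a file (path is a placeholder):
COPY roads (id, name, geom) FROM '/data/roads.csv' WITH (FORMAT csv);

-- Spatial index creation afterwards:
CREATE INDEX roads_geom_idx ON roads USING gist (geom);
```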
Re: Are all unlogged tables in any case truncated after a server-crash?
From: "David G. Johnston"
Date: 11.11.2021, 22:10
On Thu, Nov 11, 2021 at 11:39 AM <sch8el@posteo.de> wrote:
> After that all unlogged tables remain completely unchanged (no
> DML-/DDL-Statements). Hence all of my huge unlogged, "static" tables get
> never "unclean" and should not be truncated after a server crash.
The server cannot make this assumption, so it truncates unlogged relations upon an unclean shutdown/crash because it has no WAL with which to ensure a proper restoration.
> BTW, if I set all unlogged tables to logged after bulk load, it takes
> additional 1.5 hours, mainly because of re-indexing, I suppose.
More likely it is writing the entire table, and all of its indexes, to WAL.
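For reference, the conversion in question is a single DDL statement; the table name below is a placeholder:

```sql
-- Converting an unlogged table to a logged one.
-- PostgreSQL rewrites the whole table (and its indexes) into WAL here,
-- which accounts for the extra 1.5 hours observed above.
ALTER TABLE roads SET LOGGED;
```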
> I assume
> that a restart of the database after a server crash takes another 1.5
> hours (reading from WAL) until the database is up and running.
That would be incorrect. See "CHECKPOINT".
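Crash recovery only replays WAL written since the last completed checkpoint, not the whole load. One way to bound the recovery window (a sketch; CHECKPOINT requires superuser privileges on this version of PostgreSQL) is to force a checkpoint once the bulk load is done:

```sql
-- Force a checkpoint so all dirty buffers are flushed to disk.
-- Crash recovery then only needs to replay WAL generated after this point,
-- so a post-crash restart does not re-read the entire load from WAL.
CHECKPOINT;
```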
> Therefore I am seeking a strategy, to not tagging those tables as
> "unclean" and not truncating all unlogged tables on server restart.
There is no middle ground that I am aware of. Either the contents of the table are in WAL, or they are not. If not, they can be lost upon an unclean shutdown. For manually initiated shutdowns you do have the option to do so cleanly.
This topic (unlogged-table optimizations) draws quite a bit of attention every year, but so far the problem of proving to the system that the physical file on disk is a truly accurate representation of the post-crash relation remains unsolved.
David J.
Re: Are all unlogged tables in any case truncated after a server-crash?
From: Laurenz Albe

On Thu, 2021-11-11 at 18:39 +0000, sch8el@posteo.de wrote:
> every few weeks I use Postgres ability, to import huge data sets very
> fast by means of "unlogged tables". The bulk load (consisting of plenty
> "copy"- & DML-Stmts) and the spatial index creation afterwards, takes
> about 5 hours on a proper server (pg12.7 & PostGIS-Extension). After
> that all unlogged tables remain completely unchanged (no
> DML-/DDL-Statements). Hence all of my huge unlogged, "static" tables get
> never "unclean" and should not be truncated after a server crash.

There is no way to achieve that.

But you could keep the "huge data sets" around and load them again if your server happens to crash (which doesn't happen often, I hope).

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
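The reload-after-crash approach can be partly automated: after a restart, check whether crash recovery emptied the unlogged tables and only then trigger the bulk load. A sketch, with a hypothetical table name:

```sql
-- After a (possibly unclean) restart, detect whether the unlogged table
-- was truncated by crash recovery. An empty "static" table that should
-- be full is the signal to re-run the bulk load.
SELECT CASE
         WHEN NOT EXISTS (SELECT 1 FROM roads)
         THEN 'table is empty: re-run the bulk load'
         ELSE 'table survived: nothing to do'
       END AS status;
```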
Hi David,
Thanks for your comments and your advice on reading the docs on CHECKPOINT.
Of course consistency is most important to any DBMS, and when in doubt about it, truncating the data rows and restoring from WAL is the right default.
But in this case, where the data is never modified after the bulk load, I thought there might be an undocumented feature or workaround, like:
- an option to set the data files of those tables to read-only mode and record this in the metadata
- on server recovery, sparing these unlogged tables and indexes from having all their data rows truncated
It's truly a "nice to have" thing, but I have learned now that there is no feature like that.
Mart
On 12.11.2021 at 08:41, Laurenz Albe wrote:
> There is no way to achieve that.
>
> But you could keep the "huge data sets" around and load them again if
> your server happens to crash (which doesn't happen often, I hope).

Thanks, Laurenz, for your reply! Yes, that's what we did after server crashes (about two per year, at different locations). But the system is at least 5 hours offline, plus the time until an admin manually restarts the bulk loads. On my system, I have 6 databases configured like this, and for all of them I have to redo the bulk loads. I had hoped there was a "switch" for crash recovery to avoid truncating the data files of these unlogged tables, which are definitely in perfect condition.

Mart
Curious: why keep the tables unlogged if the data is static? If you can spare the disk space, create a regular (logged) table with the same definition, gradually copy the data over to spread the impact on WAL, and when the copy is complete, take a brief exclusive lock to drop the unlogged table and rename the new one.
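A sketch of that copy-and-swap approach; the table names and the batching predicate are hypothetical:

```sql
-- Create a logged copy with the same definition
-- (INCLUDING ALL copies defaults, constraints, and indexes).
CREATE TABLE roads_logged (LIKE roads INCLUDING ALL);

-- Copy in batches to spread the WAL write impact over time,
-- e.g. by id range; repeat for further ranges:
INSERT INTO roads_logged SELECT * FROM roads WHERE id BETWEEN 1 AND 1000000;

-- Finally, swap the tables; the DDL takes a brief ACCESS EXCLUSIVE lock.
BEGIN;
DROP TABLE roads;
ALTER TABLE roads_logged RENAME TO roads;
COMMIT;
```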