Discussion: journaled FS and WAL
Hi,
two questions related to the WAL.

1) I read in the docs that a journaled FS is not important because the WAL is itself a journal. But who guarantees that the WAL is written correctly? I know that it is sequential and that a partial WAL update can be discarded after a restart. But can I be sure that, without a journaled FS, a crash during a WAL update cannot corrupt anything already written to the WAL before my commit?

2) Suppose I have one database with one table of 100,000 rows, each 256 bytes. In a single SQL commit I update row 10, row 30,000 and row 80,000. By how much should I expect the WAL to grow (assuming no WAL segments are deleted)? I would guess 8192 x 3, but I'm not sure.

Regards
Pupillo
t.dalpozzo@gmail.com wrote:
> two questions related to the WAL.
>
> 1) I read in the docs that a journaled FS is not important because the WAL
> is itself a journal. But who guarantees that the WAL is written correctly?
> I know that it is sequential and that a partial WAL update can be discarded
> after a restart. But can I be sure that, without a journaled FS, a crash
> during a WAL update cannot corrupt anything already written to the WAL
> before my commit?

At commit time, the WAL is synchronized: PostgreSQL instructs the operating system to write the data to the physical medium (not just to a memory cache) and only reports success if that write succeeded. After a successful commit, the WAL file and its metadata are on disk.

Moreover, the file metadata won't change (except for the write and access timestamps), because WAL files are created at their full size and never extended, so no WAL file should ever get "lost" because of partial metadata writes.

> 2) Suppose I have one database with one table of 100,000 rows, each 256
> bytes. In a single SQL commit I update row 10, row 30,000 and row 80,000.
> By how much should I expect the WAL to grow (assuming no WAL segments are
> deleted)? I would guess 8192 x 3, but I'm not sure.

It will be roughly that much immediately after a checkpoint, but for subsequent writes to the same disk blocks, only the actually changed parts of each data block are written to the WAL.

Yours,
Laurenz Albe
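The full-page-write behaviour described above can be put into a rough back-of-the-envelope estimate. This is only a sketch, not PostgreSQL's actual WAL accounting; the per-record overhead is an assumed illustrative figure.

```python
# Rough estimate of WAL volume for the 3-row update in the question.
# Assumptions (illustrative, not exact PostgreSQL internals):
#   - 8 kB data pages (the default block size)
#   - right after a checkpoint, each touched page is logged in full
#     (full_page_writes = on), plus a small per-record overhead
#   - later updates to the same pages log only the changed tuple data

BLOCK_SIZE = 8192          # default PostgreSQL page size
RECORD_OVERHEAD = 50       # assumed per-record header/bookkeeping bytes
ROW_SIZE = 256

def wal_estimate(pages_touched, first_after_checkpoint):
    """Estimate WAL bytes for updating one row on each of `pages_touched` pages."""
    if first_after_checkpoint:
        # one full page image per touched page
        return pages_touched * (BLOCK_SIZE + RECORD_OVERHEAD)
    # only the modified tuple data plus overhead
    return pages_touched * (ROW_SIZE + RECORD_OVERHEAD)

first = wal_estimate(3, True)    # first commit right after a checkpoint
later = wal_estimate(3, False)   # same pages updated again before the next checkpoint
print(first, later)
```

With these assumptions, the first commit after a checkpoint costs a little over 3 x 8 kB, matching the "8192 x 3" guess, while repeating the same commit before the next checkpoint costs well under 1 kB.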
On Fri, Oct 14, 2016 at 11:27 PM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
> After a successful commit, the WAL file and its metadata are on disk.
> Moreover, the file metadata won't change (except for the write and access
> timestamps), because WAL files are created at their full size and never
> extended, so no WAL file should ever get "lost" because of partial metadata
> writes.

This behavior also depends on the value of wal_sync_method. For example, with fdatasync the metadata is not flushed. That does not matter for WAL segments, as Albe has already mentioned, but the choice here does impact performance.
--
Michael
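The fsync/fdatasync distinction Michael mentions can be illustrated at the system-call level. This is a minimal sketch of the general POSIX behaviour, not PostgreSQL code; on platforms without `os.fdatasync` (e.g. Windows) it falls back to `os.fsync`, and the file path is a throwaway temporary file.

```python
import os
import tempfile

# fsync() flushes file data *and* all metadata (size, timestamps) to disk.
# fdatasync() flushes the data but skips metadata not needed to retrieve it,
# such as mtime -- cheaper, and safe for WAL segments because they are
# preallocated at full size and never grow.
fdatasync = getattr(os, "fdatasync", os.fsync)  # portable fallback

def durable_append(path, payload):
    """Append payload and make sure the bytes themselves are on disk."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    try:
        os.write(fd, payload)
        fdatasync(fd)  # data is durable; mtime may lag behind
    finally:
        os.close(fd)

path = os.path.join(tempfile.mkdtemp(), "segment")
durable_append(path, b"commit record\n")
print(os.path.getsize(path))
```

This is why fdatasync is a reasonable wal_sync_method: for fixed-size, preallocated segments the skipped metadata is not needed to read the data back after a crash.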
So, as for the data content of the WAL file, I see that no new pages will be allocated. I still wonder whether, during a crash, strange things can happen at the disk level, in particular on SSD devices; we have no control over those, and perhaps journaling helps there?

As for the metadata: if it is flushed during a crash (with fdatasync, only when the FS decides to do so), can anything bad happen without journaling?

Third, let's suppose the WAL cannot get corrupted. When the system flushes data pages to disk according to the WAL content and there is a crash, can I be sure that the table files' old pages and/or their metadata, inodes, etc. cannot get corrupted? If they can, there is no way to reconstruct things, even from the WAL. In this case too, perhaps journaling helps.

I don't mind about performance, but I absolutely mind about reliability, so I was wondering about the safest Linux FS and PostgreSQL settings I can use.

Thanks!
Pupillo

On 15/10/2016 07:52, Michael Paquier wrote:
> On Fri, Oct 14, 2016 at 11:27 PM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
>> After a successful commit, the WAL file and its metadata are on disk.
>> Moreover, the file metadata won't change (except for the write and access
>> timestamps), because WAL files are created at their full size and never
>> extended, so no WAL file should ever get "lost" because of partial metadata
>> writes.
> This behavior also depends on the value of wal_sync_method. For
> example, with fdatasync the metadata is not flushed. That does not matter
> for WAL segments, as Albe has already mentioned, but the choice
> here does impact performance.
t.dalpozzo@gmail.com wrote:
> I don't mind about performance, but I absolutely mind about reliability,
> so I was wondering about the safest Linux FS and PostgreSQL settings I
> can use.

Sure, use journaling then. I do it all the time.

Yours,
Laurenz Albe
-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of t.dalpozzo@gmail.com
Sent: Wednesday, October 19, 2016 11:01 AM
To: Michael Paquier <michael.paquier@gmail.com>
Cc: Albe Laurenz <laurenz.albe@wien.gv.at>; pgsql-general@postgresql.org
Subject: Re: [GENERAL] journaled FS and WAL

Hi!
PG can lose segments of its data files and nobody will know it. For PG, no file = no data and no need to recover after a crash; there is no information about which data files belong to PG. After this, don't bother about the WAL and anything else =)

Just use an FS with a journal, checksum your DB with initdb -k, keep fsync=on, do regular backups and check them thoroughly with restores. Also don't forget to praise the gods that so far the PG clog files have not been corrupted, since they are not protected by any checksums. You will never know that the PG clog is corrupted until "doomsday".

--
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
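Alex's point about initdb -k (page checksums) can be sketched conceptually. This is not PostgreSQL's actual checksum algorithm (which is a custom FNV-based hash stored in each 8 kB page header); CRC32 stands in for it here just to show how a per-page checksum catches silent corruption that the filesystem would pass through unnoticed.

```python
import zlib

PAGE_SIZE = 8192  # PostgreSQL's default block size

def checksum_page(page: bytes) -> int:
    # CRC32 stands in for PostgreSQL's real page-checksum algorithm
    return zlib.crc32(page) & 0xFFFFFFFF

# Write-time: store the checksum alongside the page.
page = bytes(range(256)) * 32          # 8192 bytes of sample data
assert len(page) == PAGE_SIZE
stored_sum = checksum_page(page)

# Read-time: a single flipped bit (e.g. silent SSD corruption) is detected,
# because the recomputed checksum no longer matches the stored one.
corrupted = bytearray(page)
corrupted[100] ^= 0x01
assert checksum_page(bytes(corrupted)) != stored_sum
print("corruption detected")
```

Without checksums (the default before PostgreSQL 12), such a page would be read back and used as if it were valid.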
Hi,
let's suppose I have:
- a primary server with its own local archive location, configured for continuous archiving;
- a standby server without an archive.

These servers are configured for synchronous streaming replication. Let's suppose the standby stays down for a long time, then restarts, goes into catchup mode and now needs some old WALs from the primary's archive location.

Will the standby be able to automatically retrieve those files through replication, or only the WALs currently being written by the primary?

Regards
Pupillo
On Tuesday 25 October 2016 17:08:26 t.dalpozzo@gmail.com wrote:
> Hi,
> let's suppose I have:
> - a primary server with its own local archive location, configured for
> continuous archiving;
> - a standby server without an archive.
> These servers are configured for synchronous streaming replication.
> Let's suppose the standby stays down for a long time, then restarts, goes
> into catchup mode and now needs some old WALs from the primary's archive
> location.
> Will the standby be able to automatically retrieve those files through
> replication, or only the WALs currently being written by the primary?

It would need its own direct access to the master's archive.
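Concretely, the standby would point restore_command at the primary's archive, reachable e.g. over a shared mount. A sketch of the 9.x-era recovery.conf; the paths and connection string here are hypothetical:

```
# recovery.conf on the standby (PostgreSQL 9.x; on 12+ these settings
# live in postgresql.conf together with a standby.signal file)
standby_mode = 'on'
primary_conninfo = 'host=primary.example.com user=replicator'
# fetch archived segments the primary has already recycled;
# /archive must be the primary's archive, mounted or copied locally
restore_command = 'cp /archive/%f %p'
```

Alternatively, on 9.4 and later a physical replication slot on the primary retains WAL segments until the standby has consumed them, which avoids the need for a shared archive (at the cost of unbounded WAL growth while the standby is down).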
I may be confused but...

On Tue, Oct 25, 2016 at 5:08 PM, t.dalpozzo@gmail.com <t.dalpozzo@gmail.com> wrote:
> These servers are configured for synchronous streaming replication.
> Let's suppose the standby stays down for a long time, then restarts,

Doesn't synchronous replication plus a standby being down mean the primary will stop accepting writes?

Francisco Olarte.
Sure, you're right... my oversight, sorry. I only wanted to create a situation in which the standby falls quite far behind with updates, so we can suppose there is a list of standby servers (so the primary keeps going with the second one), or simply suppose the replication is asynchronous.
Pupillo

On 25/10/2016 18:17, Francisco Olarte wrote:
> I may be confused but...
>
> On Tue, Oct 25, 2016 at 5:08 PM, t.dalpozzo@gmail.com
> <t.dalpozzo@gmail.com> wrote:
>> These servers are configured for synchronous streaming replication.
>> Let's suppose the standby stays down for a long time, then restarts,
> Doesn't synchronous replication plus a standby being down mean the primary
> will stop accepting writes?
>
> Francisco Olarte.