Discussion: pg_archivecleanup bug
An EDB customer reported a problem with pg_archivecleanup which I have looked into and found a likely cause. It is, in any event, a bug which I think should be fixed. It has to do with our use of the readdir() function: http://pubs.opengroup.org/onlinepubs/7908799/xsh/readdir_r.html These are the relevant bits: | Applications wishing to check for error situations should set | errno to 0 before calling readdir(). If errno is set to non-zero | on return, an error occurred. | Upon successful completion, readdir() returns a pointer to an | object of type struct dirent. When an error is encountered, a | null pointer is returned and errno is set to indicate the error. | When the end of the directory is encountered, a null pointer is | returned and errno is not changed. Here is our current usage: http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=contrib/pg_archivecleanup/pg_archivecleanup.c;h=8f77998de12f95f41bb95c3e05a14de6cdf18047;hb=7800229b36d0444cf2c61f5c5895108ee5e8ee2a#l110 So an error in scanning the directory will not be reported; the cleanup will quietly terminate the WAL deletions without processing the remainder of the directory. Attached is the simplest fix, which would report the error, stop looking for WAL files, and continue with other clean-ups. I'm not sure we should keep the fix that simple. We could set a flag so that we would exit with a non-zero code, or we could try a new directory scan as long as the last scan found and deleted at least one WAL file. Perhaps we want to back-patch the simple fix and do something fancier for 9.4? I would also add a few comment lines before committing this, if we decide to go with the simple approach -- this is for purposes of illustration; to facilitate discussion. Thoughts? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
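The readdir() rule quoted above (clear errno before each call, then inspect it when NULL comes back) can be sketched as follows. This is a minimal illustration and not the patch attached to the message; the helper name count_dir_entries is hypothetical.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not taken from pg_archivecleanup): count the entries
 * in "path".  Returns the count, or -1 if the directory cannot be opened or
 * a read error interrupts the scan.
 */
long
count_dir_entries(const char *path)
{
	DIR		   *dir;
	struct dirent *de;
	long		n = 0;
	int			read_errno;

	assert(path != NULL);

	dir = opendir(path);
	if (dir == NULL)
	{
		fprintf(stderr, "could not open directory \"%s\": %s\n",
				path, strerror(errno));
		return -1;
	}

	for (;;)
	{
		errno = 0;				/* must be cleared before every readdir() */
		de = readdir(dir);
		if (de == NULL)
			break;				/* end of directory, or error: errno says which */
		n++;
	}

	read_errno = errno;			/* save it before closedir() can change it */
	closedir(dir);

	if (read_errno != 0)
	{
		fprintf(stderr, "could not read directory \"%s\": %s\n",
				path, strerror(read_errno));
		return -1;
	}
	return n;
}
```

Without the errno check after the loop, a failed readdir() would be indistinguishable from end-of-directory, which is the silent early termination described in the report.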
On Thu, Dec 5, 2013 at 3:06 PM, Kevin Grittner <kgrittn@ymail.com> wrote: > An EDB customer reported a problem with pg_archivecleanup which I > have looked into and found a likely cause. It is, in any event, a > bug which I think should be fixed. It has to do with our use of > the readdir() function: > > http://pubs.opengroup.org/onlinepubs/7908799/xsh/readdir_r.html > > These are the relevant bits: > > | Applications wishing to check for error situations should set > | errno to 0 before calling readdir(). If errno is set to non-zero > | on return, an error occurred. > > | Upon successful completion, readdir() returns a pointer to an > | object of type struct dirent. When an error is encountered, a > | null pointer is returned and errno is set to indicate the error. > | When the end of the directory is encountered, a null pointer is > | returned and errno is not changed. > > Here is our current usage: > > http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=contrib/pg_archivecleanup/pg_archivecleanup.c;h=8f77998de12f95f41bb95c3e05a14de6cdf18047;hb=7800229b36d0444cf2c61f5c5895108ee5e8ee2a#l110 > > So an error in scanning the directory will not be reported; the > cleanup will quietly terminate the WAL deletions without processing > the remainder of the directory. Attached is the simplest fix, > which would report the error, stop looking for WAL files, and > continue with other clean-ups. I'm not sure we should keep the fix > that simple. We could set a flag so that we would exit with a > non-zero code, or we could try a new directory scan as long as the > last scan found and deleted at least one WAL file. Perhaps we want > to back-patch the simple fix and do something fancier for 9.4? A directory that you can't read sounds like a pretty bad thing. I'd be inclined to print an error message and exit forthwith. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Kevin Grittner <kgrittn@ymail.com> writes: > | Applications wishing to check for error situations should set > | errno to 0 before calling readdir(). If errno is set to non-zero > | on return, an error occurred. > So an error in scanning the directory will not be reported; the > cleanup will quietly terminate the WAL deletions without processing > the remainder of the directory. Attached is the simplest fix, > which would report the error, stop looking for WAL files, and > continue with other clean-ups. I'm not sure we should keep the fix > that simple. We could set a flag so that we would exit with a > non-zero code, or we could try a new directory scan as long as the > last scan found and deleted at least one WAL file. Perhaps we want > to back-patch the simple fix and do something fancier for 9.4? A quick grep shows about ten other readdir() usages, most of which have a similar disease. In general, I think there is no excuse for code in the backend to use readdir() directly; it should be using ReadDir(), which takes care of this as well as error reporting. It appears that src/backend/storage/ipc/dsm.c didn't get that memo; it certainly is innocent of any error checking concerns. But the other usages seem to be in assorted utilities, which will need to do it right for themselves. initdb.c's walkdir() seems to have it right and might be a reasonable model to follow. Or maybe we should invent a frontend-friendly version of ReadDir() rather than duplicating all the error checking code in ten-and-counting places? regards, tom lane
On Thu, Dec 5, 2013 at 6:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > In general, I think there is no excuse for code in the backend to use > readdir() directly; it should be using ReadDir(), which takes care of this > as well as error reporting. My understanding is that the fd.c infrastructure can't be used in the postmaster. I agree that sucks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Dec 5, 2013 at 6:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> In general, I think there is no excuse for code in the backend to use >> readdir() directly; it should be using ReadDir(), which takes care of this >> as well as error reporting. > My understanding is that the fd.c infrastructure can't be used in the > postmaster. Say what? See ParseConfigDirectory for code that certainly runs in the postmaster, and uses ReadDir(). regards, tom lane
On Fri, Dec 6, 2013 at 11:10 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Thu, Dec 5, 2013 at 6:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> In general, I think there is no excuse for code in the backend to use >>> readdir() directly; it should be using ReadDir(), which takes care of this >>> as well as error reporting. > >> My understanding is that the fd.c infrastructure can't be used in the >> postmaster. > > Say what? See ParseConfigDirectory for code that certainly runs in the > postmaster, and uses ReadDir(). Gosh, I could have sworn that I had calls into fd.c that were crashing and burning during development because they happened too early in postmaster startup. But it seems to work fine now, so I've pushed a fix for this and a few related issues. Please let me know if you think there are remaining issues. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Dec 5, 2013 at 6:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > But the other usages seem to be in assorted utilities, which > will need to do it right for themselves. initdb.c's walkdir() seems to > have it right and might be a reasonable model to follow. Or maybe we > should invent a frontend-friendly version of ReadDir() rather than > duplicating all the error checking code in ten-and-counting places? If there's enough uniformity in all of those places to make that feasible, it certainly seems wise to do it that way. I don't know if that's the case, though - e.g. maybe some callers want to exit and others do not. pg_resetxlog wants to exit; pg_archivecleanup and pg_standby most likely want to print an error and carry on. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
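One way to picture the frontend-friendly wrapper discussed here, with the exit-or-carry-on decision left to the caller, is the sketch below. The name frontend_read_dir and its exit_on_error flag are assumptions made for illustration; nothing in the thread specifies such an interface.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical frontend wrapper around readdir().  Returns the next entry,
 * or NULL at end of directory or on error.  On error it prints a message;
 * it then either exits (exit_on_error) or returns NULL with errno left
 * nonzero so the caller can tell an error from end-of-directory.
 */
struct dirent *
frontend_read_dir(DIR *dir, const char *dirname, bool exit_on_error)
{
	struct dirent *de;

	assert(dir != NULL && dirname != NULL);

	errno = 0;
	de = readdir(dir);
	if (de == NULL && errno != 0)
	{
		int			save_errno = errno;

		fprintf(stderr, "could not read directory \"%s\": %s\n",
				dirname, strerror(save_errno));
		if (exit_on_error)
			exit(1);
		errno = save_errno;		/* fprintf() may have changed it */
	}
	return de;
}
```

A tool that wants to stop (as suggested for pg_resetxlog) would pass true; one that prefers to report and continue (as suggested for pg_archivecleanup and pg_standby) would pass false and test errno once the loop ends.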
On Thu, Dec 5, 2013 at 12:06:07PM -0800, Kevin Grittner wrote: > An EDB customer reported a problem with pg_archivecleanup which I > have looked into and found a likely cause. It is, in any event, a > bug which I think should be fixed. It has to do with our use of > the readdir() function: > > http://pubs.opengroup.org/onlinepubs/7908799/xsh/readdir_r.html > > These are the relevant bits: > > | Applications wishing to check for error situations should set > | errno to 0 before calling readdir(). If errno is set to non-zero > | on return, an error occurred. > > | Upon successful completion, readdir() returns a pointer to an > | object of type struct dirent. When an error is encountered, a > | null pointer is returned and errno is set to indicate the error. > | When the end of the directory is encountered, a null pointer is > | returned and errno is not changed. Wow, another case where errno clearing is necessary. We were just looking at this requirement for getpwuid() last week. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On Mon, Dec 9, 2013 at 11:27:28AM -0500, Robert Haas wrote: > On Thu, Dec 5, 2013 at 6:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > But the other usages seem to be in assorted utilities, which > > will need to do it right for themselves. initdb.c's walkdir() seems to > > have it right and might be a reasonable model to follow. Or maybe we > > should invent a frontend-friendly version of ReadDir() rather than > > duplicating all the error checking code in ten-and-counting places? > > If there's enough uniformity in all of those places to make that > feasible, it certainly seems wise to do it that way. I don't know if > that's the case, though - e.g. maybe some callers want to exit and > others do not. pg_resetxlog wants to exit; pg_archivecleanup and > pg_standby most likely want to print an error and carry on. I have developed the attached patch which fixes all cases where readdir() wasn't checking for errno, and cleaned up the syntax in other cases to be consistent. While I am not a fan of backpatching, the fact we are ignoring errors in some critical cases seems the non-cosmetic parts should be backpatched. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On Thu, Mar 13, 2014 at 1:48 AM, Bruce Momjian <bruce@momjian.us> wrote: > On Mon, Dec 9, 2013 at 11:27:28AM -0500, Robert Haas wrote: >> On Thu, Dec 5, 2013 at 6:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> > But the other usages seem to be in assorted utilities, which >> > will need to do it right for themselves. initdb.c's walkdir() seems to >> > have it right and might be a reasonable model to follow. Or maybe we >> > should invent a frontend-friendly version of ReadDir() rather than >> > duplicating all the error checking code in ten-and-counting places? >> >> If there's enough uniformity in all of those places to make that >> feasible, it certainly seems wise to do it that way. I don't know if >> that's the case, though - e.g. maybe some callers want to exit and >> others do not. pg_resetxlog wants to exit; pg_archivecleanup and >> pg_standby most likely want to print an error and carry on. > > I have developed the attached patch which fixes all cases where > readdir() wasn't checking for errno, and cleaned up the syntax in other > cases to be consistent. Thanks! > While I am not a fan of backpatching, the fact we are ignoring errors in > some critical cases seems the non-cosmetic parts should be backpatched. While I haven't read the patch, I agree that this is a back-patchable bug fix. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Mar 13, 2014 at 11:18 AM, Bruce Momjian <bruce@momjian.us> wrote:
>
> I have developed the attached patch which fixes all cases where
> readdir() wasn't checking for errno, and cleaned up the syntax in other
> cases to be consistent.

1. One common thing missed wherever handling for errno is added is below check which is present in all existing cases where errno is used (initdb.c, pg_resetxlog.c, ReadDir, ..)

#ifdef WIN32
    /*
     * This fix is in mingw cvs (runtime/mingwex/dirent.c rev 1.4), but not in
     * released version
     */
    if (GetLastError() == ERROR_NO_MORE_FILES)
        errno = 0;
#endif

2.
!   if (errno || closedir(chkdir) == -1)
        result = -1;    /* some kind of I/O error? */

Is there a special need to check return value of closedir in this function, as all other uses (initdb.c, pg_resetxlog.c, pgfnames.c) of it in similar context doesn't check the same?

One thing I think for which this code needs change is to check errno before closedir as is done in initdb.c or pg_resetxlog.c

> While I am not a fan of backpatching, the fact we are ignoring errors in
> some critical cases seems the non-cosmetic parts should be backpatched.

+1

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
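The second point, looking at errno before closedir() has a chance to overwrite it, might look like the sketch below. The function dir_is_empty and its return convention are hypothetical and are not the code under review.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if "path" is an empty directory, 0 if it
 * contains anything besides "." and "..", and -1 on any I/O error.
 */
int
dir_is_empty(const char *path)
{
	DIR		   *dir;
	struct dirent *de;
	int			result = 1;
	int			save_errno;

	assert(path != NULL);

	dir = opendir(path);
	if (dir == NULL)
		return -1;

	for (;;)
	{
		errno = 0;
		de = readdir(dir);
		if (de == NULL)
		{
			if (errno != 0)
				result = -1;	/* read error, not end of directory */
			break;
		}
		if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
			continue;
		result = 0;				/* found a real entry */
		break;
	}

	/* Capture the readdir() status first; closedir() may change errno. */
	save_errno = (result == -1) ? errno : 0;

	if (closedir(dir) != 0 && result != -1)
		return -1;				/* errno was set by closedir() */

	if (result == -1)
		errno = save_errno;		/* report the readdir() failure */
	return result;
}
```

Whether a closedir() failure should count as an error at all is the open question raised in the message; this sketch treats it as one only when the scan itself succeeded.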
On Tue, Mar 18, 2014 at 11:25:46AM +0530, Amit Kapila wrote:
> On Thu, Mar 13, 2014 at 11:18 AM, Bruce Momjian <bruce@momjian.us> wrote:
> >
> > I have developed the attached patch which fixes all cases where
> > readdir() wasn't checking for errno, and cleaned up the syntax in other
> > cases to be consistent.
>
> 1. One common thing missed wherever handling for errno is added
> is below check which is present in all existing cases where errno
> is used (initdb.c, pg_resetxlog.c, ReadDir, ..)
>
> #ifdef WIN32
> /*
> * This fix is in mingw cvs (runtime/mingwex/dirent.c rev 1.4), but not in
> * released version
> */
> if (GetLastError() == ERROR_NO_MORE_FILES)
> errno = 0;
> #endif

Very good point. I have modified the patch to add this block in all cases where it was missing. I started to wonder about the comment and if the Mingw fix was released. Based on some research, I see this as fixed in mingw-runtime-3.2, released 2003-10-10. That's pretty old. (What I don't know is when that was paired with Msys in a bundled release.)

Here is the Mingw fixed code:

http://ftp.ntua.gr/mirror/mingw/OldFiles/mingw-runtime-3.2-src.tar.gz

    {
      /* Get the next search entry. */
      if(_tfindnext (dirp->dd_handle, &(dirp->dd_dta)))
        {
          /* We are off the end or otherwise error.
             _findnext sets errno to ENOENT if no more file
             Undo this. */
          DWORD winerr = GetLastError();
          if (winerr == ERROR_NO_MORE_FILES)
            errno = 0;

The current code has a better explanation:

http://sourceforge.net/p/mingw/mingw-org-wsl/ci/master/tree/src/libcrt/tchar/dirent.c

    if( dirp->dd_private.dd_stat++ > 0 )
    {
      /* Otherwise...
       *
       * Get the next search entry. POSIX mandates that this must
       * return NULL after the last entry has been read, but that it
       * MUST NOT change errno in this case. MS-Windows _findnext()
       * DOES change errno (to ENOENT) after the last entry has been
       * read, so we must be prepared to restore it to its previous
       * value, when no actual error has occurred.
       */
      int prev_errno = errno;
      if( DIRENT_UPDATE( dirp->dd_private ) != 0 )
      {
        /* May be an error, or just the case described above...
         */
        if( GetLastError() == ERROR_NO_MORE_FILES )
          /*
           * ...which requires us to reset errno.
           */
          errno = prev_errno;

but it is basically doing the same thing. I am wondering if we should back-patch the PG code block where it was missing, and remove it from head in all places on the logic that everyone running 9.4 will have a post-3.1 version of Mingw. Postgres 8.4 was released in 2009 and it is possible some people are still using pre-3.2 Mingw versions with that PG release.

> 2.
> ! if (errno || closedir(chkdir) == -1)
> result = -1; /* some kind of I/O error? */
>
> Is there a special need to check return value of closedir in this
> function, as all other uses (initdb.c, pg_resetxlog.c, pgfnames.c)
> of it in similar context doesn't check the same?
>
> One thing I think for which this code needs change is to check
> errno before closedir as is done in initdb.c or pg_resetxlog.c

Yes, good point. Patch adjusted to add this.

> > While I am not a fan of backpatching, the fact we are ignoring errors in
> > some critical cases seems the non-cosmetic parts should be backpatched.
>
> +1

The larger the patch gets, the more worried I am about backpatching.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ Everyone has their own god. +
Bruce Momjian <bruce@momjian.us> writes: > Very good point. I have modified the patch to add this block in all > cases where it was missing. I started to wonder about the comment and > if the Mingw fix was released. Based on some research, I see this as > fixed in mingw-runtime-3.2, released 2003-10-10. That's pretty old. Yeah. I would vote for removing that code in all branches. There is no reason to suppose somebody is going to install 8.4.22 on a machine that they haven't updated mingw on since 2003. Or, if you prefer, just remove it in HEAD --- but going around and *adding* more copies seems like make-work. The fact that we've not heard complaints about the omissions is good evidence that nobody's using the buggy mingw versions anymore. regards, tom lane
On Tue, Mar 18, 2014 at 9:56 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Bruce Momjian <bruce@momjian.us> writes: >> Very good point. I have modified the patch to add this block in all >> cases where it was missing. I started to wonder about the comment and >> if the Mingw fix was released. Based on some research, I see this as >> fixed in mingw-runtime-3.2, released 2003-10-10. That's pretty old. > > Yeah. I would vote for removing that code in all branches. There is no > reason to suppose somebody is going to install 8.4.22 on a machine that > they haven't updated mingw on since 2003. Or, if you prefer, just remove > it in HEAD --- but going around and *adding* more copies seems like > make-work. The fact that we've not heard complaints about the omissions > is good evidence that nobody's using the buggy mingw versions anymore. I don't think it is. Right now we're not checking errno *at all* in a bunch of these places, so we're sure not going to get complaints about doing it incorrectly in those places. Or do I need more caffeine? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Mar 18, 2014 at 10:03:46AM -0400, Robert Haas wrote: > On Tue, Mar 18, 2014 at 9:56 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Bruce Momjian <bruce@momjian.us> writes: > >> Very good point. I have modified the patch to add this block in all > >> cases where it was missing. I started to wonder about the comment and > >> if the Mingw fix was released. Based on some research, I see this as > >> fixed in mingw-runtime-3.2, released 2003-10-10. That's pretty old. > > > > Yeah. I would vote for removing that code in all branches. There is no > > reason to suppose somebody is going to install 8.4.22 on a machine that > > they haven't updated mingw on since 2003. Or, if you prefer, just remove > > it in HEAD --- but going around and *adding* more copies seems like > > make-work. The fact that we've not heard complaints about the omissions > > is good evidence that nobody's using the buggy mingw versions anymore. > > I don't think it is. Right now we're not checking errno *at all* in a > bunch of these places, so we're sure not going to get complaints about > doing it incorrectly in those places. Or do I need more caffeine? You are correct. This code is seriously broken and I am surprised we have not gotten more complaints. Good thing readdir/closedir rarely fail. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Bruce Momjian wrote: > On Tue, Mar 18, 2014 at 10:03:46AM -0400, Robert Haas wrote: > > On Tue, Mar 18, 2014 at 9:56 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > Bruce Momjian <bruce@momjian.us> writes: > > >> Very good point. I have modified the patch to add this block in all > > >> cases where it was missing. I started to wonder about the comment and > > >> if the Mingw fix was released. Based on some research, I see this as > > >> fixed in mingw-runtime-3.2, released 2003-10-10. That's pretty old. > > > > > > Yeah. I would vote for removing that code in all branches. There is no > > > reason to suppose somebody is going to install 8.4.22 on a machine that > > > they haven't updated mingw on since 2003. Or, if you prefer, just remove > > > it in HEAD --- but going around and *adding* more copies seems like > > > make-work. The fact that we've not heard complaints about the omissions > > > is good evidence that nobody's using the buggy mingw versions anymore. > > > > I don't think it is. Right now we're not checking errno *at all* in a > > bunch of these places, so we're sure not going to get complaints about > > doing it incorrectly in those places. Or do I need more caffeine? > > You are correct. This code is seriously broken and I am surprised we > have not gotten more complaints. Good thing readdir/closedir rarely > fail. I think we need to keep the check for old mingw runtime in older branches; it seems reasonable to keep updating Postgres when new versions come out but keep mingw the same if it doesn't break. A good criterion here, to me, is: would we make it a runtime error if an old mingw version is detected? If we would, then let's go and remove all those errno checks. Then we force everyone to update to a sane mingw. But if we're not adding such a check, then we might cause subtle trouble just because we think running old mingw is unlikely.
On another note, please let's not make the code dissimilar in some branches just because source code embellishments are not back-ported out of fear. I mean, if we want them in master, let them be in older branches as well. Otherwise we end up with slightly different versions that make back-patching future fixes a lot harder, for no gain. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 18 March 2014 14:15, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Bruce Momjian wrote: >> On Tue, Mar 18, 2014 at 10:03:46AM -0400, Robert Haas wrote: >> > On Tue, Mar 18, 2014 at 9:56 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> > > Bruce Momjian <bruce@momjian.us> writes: >> > >> Very good point. I have modified the patch to add this block in all >> > >> cases where it was missing. I started to wonder about the comment and >> > >> if the Mingw fix was released. Based on some research, I see this as >> > >> fixed in mingw-runtime-3.2, released 2003-10-10. That's pretty old. >> > > >> > > Yeah. I would vote for removing that code in all branches. There is no >> > > reason to suppose somebody is going to install 8.4.22 on a machine that >> > > they haven't updated mingw on since 2003. Or, if you prefer, just remove >> > > it in HEAD --- but going around and *adding* more copies seems like >> > > make-work. The fact that we've not heard complaints about the omissions >> > > is good evidence that nobody's using the buggy mingw versions anymore. >> > >> > I don't think it is. Right now we're not checking errno *at all* in a >> > bunch of these places, so we're sure not going to get complaints about >> > doing it incorrectly in those places. Or do I need more caffeine? >> >> You are correct. This code is seriously broken and I am surprised we >> have not gotten more complaints. Good thing readdir/closedir rarely >> fail. > back-patching Some commentary on this... Obviously, all errors are mine. If pg_archivecleanup is a problem, then so is pg_standby a problem. Given the above, this means we've run for about 7 years without a reported issue on this. If we are going to "make this better" by actually having it throw errors in places that didn't throw errors before, are we sure that is going to make people happier? The archive cleanup isn't exactly critical in most cases, so dynamic errors don't matter much.
Also, the programs were originally written to work as standalone programs as well as an archive_cleanup_command. So we can't use PostgreSQL infrastructure (can we?). That aspect is needed to allow testing the program before it goes live. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Mar 18, 2014 at 11:36 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > Given the above, this means we've run for about 7 years without a > reported issue on this. If we are going to "make this better" by > actually having it throw errors in places that didn't throw errors > before, are we sure that is going to make people happier? The archive > cleanup isn't exactly critical in most cases, so dynamic errors don't > matter much. We report errors returned by system calls in many other places. I can't see why this place should be any different. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 18 March 2014 15:50, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Mar 18, 2014 at 11:36 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> Given the above, this means we've run for about 7 years without a >> reported issue on this. If we are going to "make this better" by >> actually having it throw errors in places that didn't throw errors >> before, are we sure that is going to make people happier? The archive >> cleanup isn't exactly critical in most cases, so dynamic errors don't >> matter much. > > We report errors returned by system calls in many other places. I > can't see why this place should be any different. Sure. Just wanted to make sure it's a conscious, explicit choice to do so. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 13 March 2014 05:48, Bruce Momjian <bruce@momjian.us> wrote: > On Mon, Dec 9, 2013 at 11:27:28AM -0500, Robert Haas wrote: >> On Thu, Dec 5, 2013 at 6:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> > But the other usages seem to be in assorted utilities, which >> > will need to do it right for themselves. initdb.c's walkdir() seems to >> > have it right and might be a reasonable model to follow. Or maybe we >> > should invent a frontend-friendly version of ReadDir() rather than >> > duplicating all the error checking code in ten-and-counting places? >> >> If there's enough uniformity in all of those places to make that >> feasible, it certainly seems wise to do it that way. I don't know if >> that's the case, though - e.g. maybe some callers want to exit and >> others do not. pg_resetxlog wants to exit; pg_archivecleanup and >> pg_standby most likely want to print an error and carry on. > > I have developed the attached patch which fixes all cases where > readdir() wasn't checking for errno, and cleaned up the syntax in other > cases to be consistent. > > While I am not a fan of backpatching, the fact we are ignoring errors in > some critical cases seems the non-cosmetic parts should be backpatched. pg_resetxlog was not an offender here; its coding was sound. We shouldn't be discussing backpatching a patch that contains changes to coding style. ISTM we should change the code with missing checks to adopt the coding style of pg_resetxlog, not the other way around. I assume you or Kevin have this in hand and you don't want me to apply the patch? (Since it was originally my bug) -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Mar 18, 2014 at 04:17:53PM +0000, Simon Riggs wrote: > > While I am not a fan of backpatching, the fact we are ignoring errors in > > some critical cases seems the non-cosmetic parts should be backpatched. > > pg_resetxlog was not an offender here; its coding was sound. > > We shouldn't be discussing backpatching a patch that contains changes > to coding style. I was going to remove the coding style changes to pg_resetxlog from the backpatched portion. > ISTM we should change the code with missing checks to adopt the coding > style of pg_resetxlog, not the other way around. > > I assume you or Kevin have this in hand and you don't want me to apply > the patch? (Since it was originally my bug) I know the email subject says pg_archivecleanup but the problem is all over our code. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On 18 March 2014 18:01, Bruce Momjian <bruce@momjian.us> wrote: > On Tue, Mar 18, 2014 at 04:17:53PM +0000, Simon Riggs wrote: >> > While I am not a fan of backpatching, the fact we are ignoring errors in >> > some critical cases seems the non-cosmetic parts should be backpatched. >> >> pg_resetxlog was not an offender here; its coding was sound. >> >> We shouldn't be discussing backpatching a patch that contains changes >> to coding style. > > I was going to remove the coding style changes to pg_resetxlog from the > backpatched portion. Why make style changes at all? A patch with no style changes would mean backpatch and HEAD versions would be the same. >> ISTM we should change the code with missing checks to adopt the coding >> style of pg_resetxlog, not the other way around. >> >> I assume you or Kevin have this in hand and you don't want me to apply >> the patch? (Since it was originally my bug) > > I know the email subject says pg_archivecleanup but the problem is all > over our code. Yes, understood. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Mar 18, 2014 at 06:11:30PM +0000, Simon Riggs wrote: > On 18 March 2014 18:01, Bruce Momjian <bruce@momjian.us> wrote: > > On Tue, Mar 18, 2014 at 04:17:53PM +0000, Simon Riggs wrote: > >> > While I am not a fan of backpatching, the fact we are ignoring errors in > >> > some critical cases seems the non-cosmetic parts should be backpatched. > >> > >> pg_resetxlog was not an offender here; its coding was sound. > >> > >> We shouldn't be discussing backpatching a patch that contains changes > >> to coding style. > > > > I was going to remove the coding style changes to pg_resetxlog from the > > backpatched portion. > > Why make style changes at all? A patch with no style changes would > mean backpatch and HEAD versions would be the same. The old style had errno set in two places in the loop, while the new style has it set in only one place. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Bruce Momjian wrote: > On Tue, Mar 18, 2014 at 06:11:30PM +0000, Simon Riggs wrote: > > On 18 March 2014 18:01, Bruce Momjian <bruce@momjian.us> wrote: > > > On Tue, Mar 18, 2014 at 04:17:53PM +0000, Simon Riggs wrote: > > >> > While I am not a fan of backpatching, the fact we are ignoring errors in > > >> > some critical cases seems the non-cosmetic parts should be backpatched. > > >> > > >> pg_resetxlog was not an offender here; its coding was sound. > > >> > > >> We shouldn't be discussing backpatching a patch that contains changes > > >> to coding style. > > > > > > I was going to remove the coding style changes to pg_resetxlog from the > > > backpatched portion. > > > > Why make style changes at all? A patch with no style changes would > > mean backpatch and HEAD versions would be the same. > > The old style had errno set in two places in the loop, while the new > style has it set in only one place. I think it makes more sense to have all callsites look the same in all supported branches. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 18 March 2014 18:18, Bruce Momjian <bruce@momjian.us> wrote: > On Tue, Mar 18, 2014 at 06:11:30PM +0000, Simon Riggs wrote: >> On 18 March 2014 18:01, Bruce Momjian <bruce@momjian.us> wrote: >> > On Tue, Mar 18, 2014 at 04:17:53PM +0000, Simon Riggs wrote: >> >> > While I am not a fan of backpatching, the fact we are ignoring errors in >> >> > some critical cases seems the non-cosmetic parts should be backpatched. >> >> >> >> pg_resetxlog was not an offender here; its coding was sound. >> >> >> >> We shouldn't be discussing backpatching a patch that contains changes >> >> to coding style. >> > >> > I was going to remove the coding style changes to pg_resetxlog from the >> > backpatched portion. >> >> Why make style changes at all? A patch with no style changes would >> mean backpatch and HEAD versions would be the same. > > The old style had errno set in two places in the loop, while the new > style has it set in only one place. Seems better to leave the previously-good coding in place. ISTM to be clearer to use simple C. You're doing this anyway for the backpatch, so I'm not causing you any more work. Better one patch than two. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs wrote: > On 18 March 2014 18:18, Bruce Momjian <bruce@momjian.us> wrote: > > On Tue, Mar 18, 2014 at 06:11:30PM +0000, Simon Riggs wrote: > >> Why make style changes at all? A patch with no style changes would > >> mean backpatch and HEAD versions would be the same. > > > > The old style had errno set in two places in the loop, while the new > > style has it set in only one place. > > Seems better to leave the previously-good coding in place. ISTM to be > clearer to use simple C. If you're saying we should use that style in all readdir loops, with the errno=0 before the loop and at the bottom of it, I don't disagree. Let's just make sure they're all safe though (i.e. watch out for "continue" for instance). That said, I don't find comma expression to be particularly "not simple". -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 18 March 2014 18:55, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > That said, I don't find comma expression to be particularly "not > simple". Maybe, but we've not used it before anywhere in Postgres, so I don't see a reason to start now. Especially since C is not the native language of many people these days and people just won't understand it. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 03/18/2014 09:04 PM, Simon Riggs wrote: > On 18 March 2014 18:55, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > >> That said, I don't find comma expression to be particularly "not >> simple". > > Maybe, but we've not used it before anywhere in Postgres, so I don't > see a reason to start now. Especially since C is not the native > language of many people these days and people just won't understand > it. Agreed. The psqlODBC code is littered with comma expressions, and the first time I saw it, it took me a really long time to figure out what "if (foo = malloc(...), foo) { } " meant. And I consider myself quite experienced with C. - Heikki
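For readers unfamiliar with the construct: the comma operator evaluates its left operand, discards the result, and yields the value of its right operand. The sketch below shows the quoted "if (foo = malloc(...), foo)" idiom next to an equivalent form without the comma. Both function names are hypothetical and exist only for this illustration.

```c
#include <stdlib.h>

/* Comma-expression form: assign first, then test the value of foo. */
int
alloc_ok_comma(size_t n)
{
    char       *foo;

    if (foo = malloc(n), foo)
    {
        free(foo);
        return 1;
    }
    return 0;
}

/* Equivalent form without the comma operator. */
int
alloc_ok_plain(size_t n)
{
    char       *foo = malloc(n);

    if (foo != NULL)
    {
        free(foo);
        return 1;
    }
    return 0;
}
```

The two functions behave identically; the only difference is whether the assignment and the test share one expression.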
On Tue, Mar 18, 2014 at 09:13:28PM +0200, Heikki Linnakangas wrote: > On 03/18/2014 09:04 PM, Simon Riggs wrote: > >On 18 March 2014 18:55, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > > >>That said, I don't find comma expression to be particularly "not > >>simple". > > > >Maybe, but we've not used it before anywhere in Postgres, so I don't > >see a reason to start now. Especially since C is not the native > >language of many people these days and people just won't understand > >it. > > Agreed. The psqlODBC code is littered with comma expressions, and > the first time I saw it, it took me a really long time to figure out > what "if (foo = malloc(...), foo) { } " meant. And I consider myself > quite experienced with C. I can see how the comma syntax would be confusing, though it does the job well. Attached is a patch that does the double-errno. However, some of these loops are large, and there are 'continue' calls in there, causing the addition of many new errno locations. I am not totally comfortable that this new coding layout will stay unbroken. Would people accept? for (errno = 0; (dirent = readdir(dir)) != NULL; errno = 0) That would keep the errno's together and avoid the 'continue' additions. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Attachments
Bruce Momjian <bruce@momjian.us> writes: > Would people accept? > for (errno = 0; (dirent = readdir(dir)) != NULL; errno = 0) It's a bit weird looking, but I agree that there's value in only needing the errno-zeroing in precisely one place. regards, tom lane
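A minimal sketch of the proposed for() form, assuming a hypothetical helper count_with_suffix that counts directory entries ending in a given suffix. It is not taken from the patch. The point it shows is that "continue" jumps to the third clause of the for statement, so errno is still cleared before the next readdir() call without any extra assignment in the body.

```c
#include <dirent.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: count entries in 'path' whose names end in 'suffix'.
 * Returns -1 if the directory could not be opened, read, or closed.
 */
int
count_with_suffix(const char *path, const char *suffix)
{
    DIR        *dir = opendir(path);
    struct dirent *de;
    size_t      suflen = strlen(suffix);
    int         count = 0;
    int         save_errno;

    if (dir == NULL)
        return -1;

    for (errno = 0; (de = readdir(dir)) != NULL; errno = 0)
    {
        size_t      len = strlen(de->d_name);

        if (len < suflen ||
            strcmp(de->d_name + len - suflen, suffix) != 0)
            continue;           /* still runs the errno = 0 step */
        count++;
    }

    /* The loop ended on NULL: errno distinguishes error from end-of-dir */
    save_errno = errno;
    if (closedir(dir) != 0 || save_errno != 0)
        return -1;
    return count;
}
```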
On 03/19/2014 02:30 AM, Bruce Momjian wrote: > On Tue, Mar 18, 2014 at 09:13:28PM +0200, Heikki Linnakangas wrote: >> On 03/18/2014 09:04 PM, Simon Riggs wrote: >>> On 18 March 2014 18:55, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: >>> >>>> That said, I don't find comma expression to be particularly "not >>>> simple". >>> >>> Maybe, but we've not used it before anywhere in Postgres, so I don't >>> see a reason to start now. Especially since C is not the native >>> language of many people these days and people just won't understand >>> it. >> >> Agreed. The psqlODBC code is littered with comma expressions, and >> the first time I saw it, it took me a really long time to figure out >> what "if (foo = malloc(...), foo) { } " meant. And I consider myself >> quite experienced with C. > > I can see how the comma syntax would be confusing, though it does the > job well. Attached is a patch that does the double-errno. However, > some of these loops are large, and there are 'continue' calls in there, > causing the addition of many new errno locations. I am not totally > comfortable that this new coding layout will stay unbroken. > > Would people accept? > > for (errno = 0; (dirent = readdir(dir)) != NULL; errno = 0) > > That would keep the errno's together and avoid the 'continue' additions. That's clever. A less clever way would be: for (;;) { errno = 0; if ((dirent = readdir(dir)) == NULL) break; ... } I'm fine with either, but that's how I would naturally write it. Yet another way would be to have a wrapper function for readdir that resets errno, and just replace the current readdir() calls with that. And now that I look at initdb.c, walkdir is using the comma expression for this already. So there's some precedent, and it doesn't actually look that bad. So I withdraw my objection to that approach; I'm fine with any of the discussed alternatives, really. - Heikki
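The wrapper-function alternative mentioned above might look roughly like this. It is only a sketch under stated assumptions: readdir_clear_errno and longest_name_length are hypothetical names, and no such wrapper is claimed to exist in the PostgreSQL tree.

```c
#include <dirent.h>
#include <errno.h>
#include <string.h>

/* Hypothetical wrapper: clear errno so that a NULL result is unambiguous. */
struct dirent *
readdir_clear_errno(DIR *dir)
{
    errno = 0;
    return readdir(dir);
}

/*
 * Hypothetical caller: return the length of the longest entry name in
 * 'path', or -1 if the directory could not be opened, read, or closed.
 */
long
longest_name_length(const char *path)
{
    DIR        *dir = opendir(path);
    struct dirent *de;
    long        longest = 0;
    int         save_errno;

    if (dir == NULL)
        return -1;

    /* The loop body needs no errno bookkeeping, even with 'continue' */
    while ((de = readdir_clear_errno(dir)) != NULL)
    {
        long        len = (long) strlen(de->d_name);

        if (len > longest)
            longest = len;
    }

    save_errno = errno;
    if (closedir(dir) != 0 || save_errno != 0)
        return -1;
    return longest;
}
```

The trade-off is one more function to maintain in exchange for call sites that look like an ordinary readdir() loop.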
On Wed, Mar 19, 2014 at 09:59:19AM +0200, Heikki Linnakangas wrote: > >Would people accept? > > > > for (errno = 0; (dirent = readdir(dir)) != NULL; errno = 0) > > > >That would keep the errno's together and avoid the 'continue' additions. > > That's clever. A less clever way would be: > > for (;;) > { > errno = 0; > if ((dirent = readdir(dir)) == NULL) > break; > > ... > } > > I'm fine with either, but that's how I would naturally write it. > > Yet another way would be to have a wrapper function for readdir that > resets errno, and just replace the current readdir() calls with > that. > > And now that I look at initdb.c, walkdir is using the comma > expression for this already. So there's some precedent, and it > doesn't actually look that bad. So I withdraw my objection to that > approach; I'm fine with any of the discussed alternatives, really. OK, let's go with the comma. Ironically, the for() loop would be an odd way to avoid commas as 'for' uses commas often: for (i = 0, j = 1; i < 10; i++, j++) The attached patch is slightly updated. I will apply it to head and all the back branches, including the stylistic change to pg_resetxlog (for consistency) and remove the MinGW block in head. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Attachments
On Wed, Mar 19, 2014 at 02:02:50PM -0400, Bruce Momjian wrote: > The attached patch is slightly updated. I will apply it to head and all > the back branches, including the stylistic change to pg_resetxlog (for > consistency) and remove the MinGW block in head. Patch applied back through 8.4. I had the closedir() tests backwards and that was fixed. I also went over all the readdir/closedir() calls in all back branches to make sure they were properly handled. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
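As a rough illustration of the pattern the thread settled on, the sketch below combines the comma-expression readdir() loop with a closedir() check. It is not the committed patch: scan_directory is a hypothetical function, and the message wording is invented for the example. The detail that is easy to get backwards is that closedir() returns 0 on success and -1 on failure, so the error branch is the nonzero one.

```c
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical function: count the entries in 'path' other than "." and
 * "..".  Reports problems on stderr and returns -1 if the directory could
 * not be opened, read, or closed.
 */
int
scan_directory(const char *path)
{
    DIR        *dir = opendir(path);
    struct dirent *de;
    int         count = 0;
    int         ok = 1;

    if (dir == NULL)
    {
        fprintf(stderr, "could not open directory \"%s\": %s\n",
                path, strerror(errno));
        return -1;
    }

    /* errno is cleared immediately before every readdir() call */
    while (errno = 0, (de = readdir(dir)) != NULL)
    {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        count++;
    }

    if (errno != 0)
    {
        fprintf(stderr, "could not read directory \"%s\": %s\n",
                path, strerror(errno));
        ok = 0;
    }

    /* closedir() returns 0 on success, so nonzero is the failure case */
    if (closedir(dir) != 0)
    {
        fprintf(stderr, "could not close directory \"%s\": %s\n",
                path, strerror(errno));
        ok = 0;
    }

    return ok ? count : -1;
}
```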