Discussion: strncpy is not a safe version of strcpy
Hi All,
As a bit of a background task, over the past few days I've been analysing the uses of strncpy in the code to check whether it is the right function to be using. I've already seen quite a few places where it is used on a wrong assumption about its behaviour.
As many of you will know, and maybe some of you have forgotten, strncpy is not a safe version of strcpy. It is also quite an inefficient way to copy a string to another buffer, as strncpy will zero out any space that happens to remain in the buffer. If there is no space left after the copy, then the buffer won't end with a 0.
It is likely far better explained here --> http://www.courtesan.com/todd/papers/strlcpy.html
For example, the following 2 lines in jsonfuncs.c:
memset(name, 0, NAMEDATALEN);
strncpy(name, fname, NAMEDATALEN);
The memset here is redundant as strncpy will null the remaining buffer. This example is not dangerous, but it does highlight that there's code that's made the final cut which made this wrong assumption about strncpy.
I was not going to bring this to light until I had done some more analysis, but there was just a commit which added a usage of strncpy that really looks like it should be a strlcpy.
I'll continue with my analysis, but perhaps posting this early will bring something to light which I've not yet realised.
Regards
David Rowley
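The two strncpy properties described above (zero-filling the rest of the buffer, and leaving no terminator when the source fills the buffer) can be checked with a small sketch. The helper names here are hypothetical and are not taken from the PostgreSQL sources; the buffer is pre-filled with 'X' only so that the zero-filling is visible.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if buf[0..size) contains at least one NUL byte. */
static bool
has_terminator(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) != NULL;
}

/*
 * Hypothetical helper: fill dst with a non-zero pattern, then strncpy() src
 * into it, so the caller can inspect what strncpy() actually wrote.
 */
static void
fill_and_strncpy(char *dst, size_t size, const char *src)
{
    memset(dst, 'X', size);

    /*
     * If strlen(src) < size, strncpy() writes NUL bytes up to dst[size - 1].
     * If strlen(src) >= size, it copies exactly size bytes and writes no NUL.
     */
    strncpy(dst, src, size);
}
```

With an 8-byte buffer, copying "abc" leaves bytes 3 to 7 all zero, while copying "abcdefgh" leaves no NUL anywhere in the buffer, so a later strlen() on it would read past the end.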
On 15 November 2013, 0:07, David Rowley wrote:
> Hi All,
>
> As a bit of a background task, over the past few days I've been analysing
> the uses of strncpy in the code just to try and validate if it is the right
> function to be using. I've already seen quite a few places where their
> usage is wrongly assumed.
>
> As many of you will know and maybe some of you have forgotten that strncpy
> is not a safe version of strcpy. It is also quite an inefficient way to
> copy a string to another buffer as strncpy will 0 out any space that
> happens to remain in the buffer. If there is no space left after the copy
> then the buffer won't end with a 0.
>
> It is likely far better explained here -->
> http://www.courtesan.com/todd/papers/strlcpy.html
>
> For example , the following 2 lines in jsonfuncs.c
>
> memset(name, 0, NAMEDATALEN);
> strncpy(name, fname, NAMEDATALEN);

Be careful with the 'Name' data type - it's not just a simple string buffer. AFAIK it needs to work with hashing etc., so the zeroing is actually needed here to make sure two values produce the same result. At least that's how I understand the code after a quick check - for example this is from the same jsonfuncs.c you mentioned:

memset(fname, 0, NAMEDATALEN);
strncpy(fname, NameStr(tupdesc->attrs[i]->attname), NAMEDATALEN);
hashentry = hash_search(json_hash, fname, HASH_FIND, NULL);

So the zeroing is on purpose, although if strncpy does that then the memset is probably superfluous. Either people do that because of habit / copy'n'paste, or maybe there are supported platforms where strncpy does not behave like this for some reason.

I seriously doubt this inefficiency is going to be measurable in the real world. If the result was a buffer-overflow bug, that'd be a different story, but maybe we could check the ~120 calls to strncpy in the whole code base and replace them with strlcpy where appropriate.

That being said, thanks for looking into things like this.

Tomas
On Fri, Nov 15, 2013 at 12:33 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
> > It is likely far better explained here -->
> > http://www.courtesan.com/todd/papers/strlcpy.html
> >
> > For example , the following 2 lines in jsonfuncs.c
> >
> > memset(name, 0, NAMEDATALEN);
> > strncpy(name, fname, NAMEDATALEN);
>
> Be careful with 'Name' data type - it's not just a simple string buffer.
> AFAIK it needs to work with hashing etc. so the zeroing is actually needed
> here to make sure two values produce the same result. At least that's how
> I understand the code after a quick check - for example this is from the
> same jsonfuncs.c you mentioned:
>
> memset(fname, 0, NAMEDATALEN);
> strncpy(fname, NameStr(tupdesc->attrs[i]->attname), NAMEDATALEN);
> hashentry = hash_search(json_hash, fname, HASH_FIND, NULL);
>
> So the zeroing is on purpose, although if strncpy does that then the
> memset is probably superflous. Either people do that because of habit /
> copy'n'paste, or maybe there are supported platforms when strncpy does not
> behave like this for some reason.

I had not thought of the fact that some platforms might not properly implement strncpy(). On a quick check, http://man.he.net/man3/strncpy seems to indicate that this behaviour is part of the C89 standard. So does this mean we can always assume that all supported platforms zero out the remaining buffer?

> I seriously doubt this inefficiency is going to be measurable in real
> world. If the result was a buffer-overflow bug, that'd be a different
> story, but maybe we could check the ~120 calls to strncpy in the whole
> code base and replace it with strlcpy where appropriate.
>
> That being said, thanks for looking into things like this.
>
> Tomas

The example was more of a demonstration of a wrong assumption than of wasted cycles, though the wasted cycles were on my mind a bit too. I was more focused on trying to draw a bit of attention to commit 061b88c732952c59741374806e1e41c1ec845d50, which uses strncpy and does not properly set the last byte to 0 afterwards. I think this case could just be replaced with strlcpy, which does all this hard work for us.

Regards

David Rowley
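For readers unfamiliar with the strlcpy contract discussed above, here is a minimal sketch of a function with the same semantics as described in the linked strlcpy paper: it copies at most size - 1 bytes, always terminates the destination when size is non-zero, and returns the length of the source. The name sketch_strlcpy is hypothetical; this is an illustration, not the implementation used by PostgreSQL or any libc.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical re-implementation of strlcpy() semantics, for illustration.
 * Copies at most size - 1 bytes of src into dst and NUL-terminates dst
 * whenever size > 0. Returns strlen(src), so a result >= size means the
 * copy was truncated.
 */
static size_t
sketch_strlcpy(char *dst, const char *src, size_t size)
{
    size_t srclen = strlen(src);

    if (size > 0)
    {
        /* Leave room for the terminator. */
        size_t n = (srclen < size - 1) ? srclen : size - 1;

        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;
}
```

Unlike strncpy, this never leaves the destination unterminated and never writes beyond the copied string plus one NUL byte, which is why it suits a case such as copying argv[0] into a fixed-size path buffer.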
On 15 November 2013, 1:00, David Rowley wrote:
> On Fri, Nov 15, 2013 at 12:33 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
>
>> > It is likely far better explained here -->
>> > http://www.courtesan.com/todd/papers/strlcpy.html
>> >
>> > For example , the following 2 lines in jsonfuncs.c
>> >
>> > memset(name, 0, NAMEDATALEN);
>> > strncpy(name, fname, NAMEDATALEN);
>>
>> Be careful with 'Name' data type - it's not just a simple string buffer.
>> AFAIK it needs to work with hashing etc. so the zeroing is actually
>> needed here to make sure two values produce the same result. At least
>> that's how I understand the code after a quick check - for example this
>> is from the same jsonfuncs.c you mentioned:
>>
>> memset(fname, 0, NAMEDATALEN);
>> strncpy(fname, NameStr(tupdesc->attrs[i]->attname), NAMEDATALEN);
>> hashentry = hash_search(json_hash, fname, HASH_FIND, NULL);
>>
>> So the zeroing is on purpose, although if strncpy does that then the
>> memset is probably superflous. Either people do that because of habit /
>> copy'n'paste, or maybe there are supported platforms when strncpy does
>> not behave like this for some reason.
>
> I had not thought of the fact the some platforms don't properly implement
> strncpy(). On quick check http://man.he.net/man3/strncpy seems to indicate
> that this behaviour is part of the C89 standard. So does this mean we can
> always assume that all supported platforms always 0 out the remaining
> buffer?

I don't know of any such platform - I was merely speculating about why people might use such code.

>> I seriously doubt this inefficiency is going to be measurable in real
>> world. If the result was a buffer-overflow bug, that'd be a different
>> story, but maybe we could check the ~120 calls to strncpy in the whole
>> code base and replace it with strlcpy where appropriate.
>
> The example was more of a demonstration of wrong assumption rather than
> wasted cycles. Though the wasted cycles was on my mind a bit too.

Yeah. To be fair, the number of occurrences in the code base is not a particularly exact measure of the impact - some of those uses might be in code paths that are quite busy.

> I was more focused on trying to draw a bit of attention to commit
> 061b88c732952c59741374806e1e41c1ec845d50 which uses strncpy and does not
> properly set the last byte to 0 afterwards. I think this case could just
> be replaced with strlcpy which does all this hard work for us.

Hmm, you mean this piece of code?

strncpy(saved_argv0, argv[0], MAXPGPATH);

IMHO you're right that's probably broken, unless there's some checking happening before the call.

Tomas
* Tomas Vondra (tv@fuzzy.cz) wrote: > On 15 Listopad 2013, 1:00, David Rowley wrote: > > more focused on trying to draw a bit of attention to commit > > 061b88c732952c59741374806e1e41c1ec845d50 which uses strncpy and does not > > properly set the last byte to 0 afterwards. I think this case could just > > be > > replaced with strlcpy which does all this hard work for us. > > Hmm, you mean this piece of code? > > strncpy(saved_argv0, argv[0], MAXPGPATH); > > IMHO you're right that's probably broken, unless there's some checking > happening before the call. Agreed, that looks like a place we should be using strlcpy() instead. Robert, what do you think? Thanks, Stephen
Tomas Vondra <tv@fuzzy.cz> wrote: > On 15 Listopad 2013, 1:00, David Rowley wrote: >> more focused on trying to draw a bit of attention to commit >> 061b88c732952c59741374806e1e41c1ec845d50 which uses strncpy and >> does not properly set the last byte to 0 afterwards. I think >> this case could just be replaced with strlcpy which does all >> this hard work for us. > > Hmm, you mean this piece of code? > > strncpy(saved_argv0, argv[0], MAXPGPATH); > > IMHO you're right that's probably broken, unless there's some > checking happening before the call. I agree, and there is no such checking. Fix pushed. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2013-11-15 09:24:59 -0500, Stephen Frost wrote: > * Tomas Vondra (tv@fuzzy.cz) wrote: > > On 15 Listopad 2013, 1:00, David Rowley wrote: > > > more focused on trying to draw a bit of attention to commit > > > 061b88c732952c59741374806e1e41c1ec845d50 which uses strncpy and does not > > > properly set the last byte to 0 afterwards. I think this case could just > > > be > > > replaced with strlcpy which does all this hard work for us. > > > > Hmm, you mean this piece of code? > > > > strncpy(saved_argv0, argv[0], MAXPGPATH); > > > > IMHO you're right that's probably broken, unless there's some checking > > happening before the call. > > Agreed, that looks like a place we should be using strlcpy() instead. I don't mind fixing it, but I think anything but s/strncpy/strlcpy/ is over the top. Translating such strings is just a waste of translator's time. If you really worry about paths being longer than MAXPGPATH, there's lots, and lots of things to do that are, far, far more critical than this. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 2013-11-15 04:21:50 +0100, Tomas Vondra wrote:
> Hmm, you mean this piece of code?
>
> strncpy(saved_argv0, argv[0], MAXPGPATH);
>
> IMHO you're right that's probably broken, unless there's some checking
> happening before the call.

FWIW, argv0 is pretty much guaranteed to be shorter than MAXPGPATH since MAXPGPATH is the longest a path can be, and argv[0] is either the executable's name (if executed via PATH) or the path to the executable.

Now, you could probably write a program to execve() a binary with argv[0] being longer, but in that case you can also just put garbage in there.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
* Andres Freund (andres@2ndquadrant.com) wrote: > FWIW, argv0 is pretty much guaranteed to be shorter than MAXPGPATH since > MAXPGPATH is the longest a path can be, and argv[0] is either the executable's > name (if executed via PATH) or the path to the executable. Err, it's the longest that *we* think the path can be.. That's not the same as actually being the longest that a path can be, which depends on the filesystem and OS... It's not hard to get past our 1024 limit: sfrost@beorn:/really/long/path> echo $PWD | wc -c 1409 > Now, you could probably write a program to exeve() a binary with argv[0] > being longer, but in that case you can also just put garbage in there. We shouldn't blow up in that case either, really. Thanks, Stephen
On 2013-11-15 09:53:24 -0500, Stephen Frost wrote: > * Andres Freund (andres@2ndquadrant.com) wrote: > > FWIW, argv0 is pretty much guaranteed to be shorter than MAXPGPATH since > > MAXPGPATH is the longest a path can be, and argv[0] is either the executable's > > name (if executed via PATH) or the path to the executable. > > Err, it's the longest that *we* think the path can be.. That's not the > same as actually being the longest that a path can be, which depends on > the filesystem and OS... It's not hard to get past our 1024 limit: Sure, there can be longer paths, but postgres don't support them. In a *myriad* of places. It's just not worth spending code on it. Just about any of the places that use MAXPGPATH are "vulnerable" or produce confusing error messages if it's exceeded. And there are about zero complaints about it. > > Now, you could probably write a program to exeve() a binary with argv[0] > > being longer, but in that case you can also just put garbage in there. > > We shouldn't blow up in that case either, really. Good luck. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
* Andres Freund (andres@2ndquadrant.com) wrote: > Sure, there can be longer paths, but postgres don't support them. In a > *myriad* of places. It's just not worth spending code on it. > > Just about any of the places that use MAXPGPATH are "vulnerable" or > produce confusing error messages if it's exceeded. And there are about > zero complaints about it. Confusing error messages are one thing, segfaulting is another. Thanks, Stephen
On 2013-11-15 10:04:12 -0500, Stephen Frost wrote: > * Andres Freund (andres@2ndquadrant.com) wrote: > > Sure, there can be longer paths, but postgres don't support them. In a > > *myriad* of places. It's just not worth spending code on it. > > > > Just about any of the places that use MAXPGPATH are "vulnerable" or > > produce confusing error messages if it's exceeded. And there are about > > zero complaints about it. > > Confusing error messages are one thing, segfaulting is another. I didn't argue against s/strncpy/strlcpy/. That's clearly a sensible fix. I am arguing about introducing additional code and error messages about it, that need to be translated. And starting doing so in isolationtester of all places. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
David Rowley escribió: > On Fri, Nov 15, 2013 at 12:33 PM, Tomas Vondra <tv@fuzzy.cz> wrote: > > Be careful with 'Name' data type - it's not just a simple string buffer. > > AFAIK it needs to work with hashing etc. so the zeroing is actually needed > > here to make sure two values produce the same result. At least that's how > > I understand the code after a quick check - for example this is from the > > same jsonfuncs.c you mentioned: > > > > memset(fname, 0, NAMEDATALEN); > > strncpy(fname, NameStr(tupdesc->attrs[i]->attname), NAMEDATALEN); > > hashentry = hash_search(json_hash, fname, HASH_FIND, NULL); > > > > So the zeroing is on purpose, although if strncpy does that then the > > memset is probably superflous. This code should probably be using namecpy(). Note namecpy() doesn't memset() after strncpy() and has survived the test of time, which strongly suggests that the memset is indeed superfluous. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
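The point about hashing can be illustrated with a small sketch. It assumes, as Tomas describes, that the hash table treats the key as a fixed-width block of bytes, so every byte after the terminator matters. KEY_LEN is a hypothetical stand-in for NAMEDATALEN and both helpers are invented for the example; they are not PostgreSQL functions.

```c
#include <stdbool.h>
#include <string.h>

#define KEY_LEN 16              /* hypothetical stand-in for NAMEDATALEN */

/*
 * Build a fixed-width key. For names shorter than KEY_LEN, strncpy() zero
 * fills the rest of the buffer, so no separate memset() is needed.
 * Assumes strlen(name) < KEY_LEN; otherwise the key is left unterminated.
 */
static void
make_key_padded(char key[KEY_LEN], const char *name)
{
    strncpy(key, name, KEY_LEN);
}

/* Compare keys the way a fixed-width hash key would be compared. */
static bool
keys_equal(const char a[KEY_LEN], const char b[KEY_LEN])
{
    return memcmp(a, b, KEY_LEN) == 0;
}
```

Two buffers that held different garbage beforehand compare equal after make_key_padded() with the same name. If the name were written with plain strcpy() instead, the strings would match under strcmp() but the leftover bytes could still make the fixed-width keys differ.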
Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > This code should probably be using namecpy(). Note namecpy() > doesn't memset() after strncpy() and has survived the test of > time, which strongly suggests that the memset is indeed > superfluous. That argument would be more persuasive if I could find any current usage of the namecpy() function anywhere in the source code. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Andres Freund <andres@2ndquadrant.com> writes: > I didn't argue against s/strncpy/strlcpy/. That's clearly a sensible > fix. > I am arguing about introducing additional code and error messages about > it, that need to be translated. And starting doing so in isolationtester > of all places. I agree with Andres on this. Commit 7cb964acb794078ef033cbf2e3a0e7670c8992a9 is the very definition of overkill, and I don't want to see us starting to plaster the source code with things like this. Converting strncpy to strlcpy seems appropriate --- and sufficient. regards, tom lane
Kevin Grittner escribió: > Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > > This code should probably be using namecpy(). Note namecpy() > > doesn't memset() after strncpy() and has survived the test of > > time, which strongly suggests that the memset is indeed > > superfluous. > > That argument would be more persuasive if I could find any current > usage of the namecpy() function anywhere in the source code. Well, its cousin namestrcpy is used in a lot of places. That one uses a regular C string as source; namecpy uses a Name as source, so they are slightly different but the coding is pretty much the same. There is a difference in using the macro StrNCpy instead of the strncpy library function directly. ISTM this makes sense because Name is known to be zero-terminated at NAMEDATALEN, which a random C string is not. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > I didn't argue against s/strncpy/strlcpy/. That's clearly a sensible > > fix. > > I am arguing about introducing additional code and error messages about > > it, that need to be translated. And starting doing so in isolationtester > > of all places. > > I agree with Andres on this. Commit > 7cb964acb794078ef033cbf2e3a0e7670c8992a9 is the very definition of > overkill, and I don't want to see us starting to plaster the source > code with things like this. Converting strncpy to strlcpy seems > appropriate --- and sufficient. Personally, I'd like to see better handling like this- but done in a way which minimizes impact to code and translators. A function like namecpy() (which I agree with Kevin about- curious that it's not used..) which handled the check, errmsg and exit seems reasonable to me, for the "userland" binaries (and perhaps the postmaster when doing command-line checking of, eg, -D) that need it. Still, I'm not offering to go do it, so take my feelings on it with that in mind. :) Thanks, Stephen
Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Kevin Grittner wrote:
>> That argument would be more persuasive if I could find any current
>> usage of the namecpy() function anywhere in the source code.
>
> Well, its cousin namestrcpy is used in a lot of places. That one uses a
> regular C string as source; namecpy uses a Name as source, so they are
> slightly different but the coding is pretty much the same.

Fair enough.

> There is a difference in using the macro StrNCpy instead of the strncpy
> library function directly. ISTM this makes sense because Name is known
> to be zero-terminated at NAMEDATALEN, which a random C string is not.

Is the capital T in the second #undef in this pg_locale.c code intended?:

#ifdef WIN32
/*
 * This Windows file defines StrNCpy. We don't need it here, so we undefine
 * it to keep the compiler quiet, and undefine it again after the file is
 * included, so we don't accidentally use theirs.
 */
#undef StrNCpy
#include <shlwapi.h>
#ifdef StrNCpy
#undef STrNCpy
#endif
#endif

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, Nov 16, 2013 at 4:09 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> David Rowley wrote:
> > > Be careful with 'Name' data type - it's not just a simple string buffer.
> > > AFAIK it needs to work with hashing etc. so the zeroing is actually needed
> > > here to make sure two values produce the same result. At least that's how
> > > I understand the code after a quick check - for example this is from the
> > > same jsonfuncs.c you mentioned:
> > >
> > > memset(fname, 0, NAMEDATALEN);
> > > strncpy(fname, NameStr(tupdesc->attrs[i]->attname), NAMEDATALEN);
> > > hashentry = hash_search(json_hash, fname, HASH_FIND, NULL);
> > >
> > > So the zeroing is on purpose, although if strncpy does that then the
> > > memset is probably superflous.
>
> This code should probably be using namecpy(). Note namecpy() doesn't
> memset() after strncpy() and has survived the test of time, which
> strongly suggests that the memset is indeed superfluous.
I went on a bit of a strncpy cleanup rampage this morning and ended up finding quite a few places where strncpy is used wrongly.
I'm not quite sure if I have got them all in this patch, but I think I've got the obvious ones at least.
For the hash_search in jsonfuncs.c, after thinking about it a bit more... Can we not just pass the attname without making a copy of it? I see keyPtr in hash_search is const void *, so it shouldn't get modified in there. I can't quite see the reason for making the copy.
Attached is a patch with various cleanups where I didn't like the look of the strncpy. I didn't go overboard with this, as I know making these sorts of small changes all over can be a bit scary and I thought maybe it would get rejected on that basis.
I also cleaned up things like strncpy(dest, src, strlen(src)); which just seems a bit weird and I'm failing to get my head around why it was done. I replaced these with memcpy instead, but they could perhaps be a plain old strcpy.
Regards
David Rowley
Attachments
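The strncpy(dest, src, strlen(src)) pattern mentioned in the message above can be compared with memcpy in a short sketch. Both helper names are hypothetical. Because the length limit equals the source length, strncpy stops before the terminator and writes no NUL, which is exactly what memcpy with the same length does.

```c
#include <string.h>

/*
 * Hypothetical helper showing the questioned pattern: the bound is the
 * source length, so the terminating NUL is never copied and no padding
 * is written either.
 */
static void
copy_prefix_strncpy(char *dst, const char *src)
{
    strncpy(dst, src, strlen(src));
}

/* The same bytes are written by a plain memcpy() of strlen(src) bytes. */
static void
copy_prefix_memcpy(char *dst, const char *src)
{
    memcpy(dst, src, strlen(src));
}
```

Neither variant protects the destination from overflow, since the bound comes from the source rather than from the destination's size. Whether plain strcpy would be a valid replacement depends on whether the caller wants the terminator written, which has to be checked at each call site.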
On Sat, Nov 16, 2013 at 12:53:10PM +1300, David Rowley wrote: > I went on a bit of a strncpy cleanup rampage this morning and ended up > finding quite a few places where strncpy is used wrongly. > I'm not quite sure if I have got them all in this patch, but I' think I've > got the obvious ones at least. > > For the hash_search in jsconfuncs.c after thinking about it a bit more... > Can we not just pass the attname without making a copy of it? I see keyPtr > in hash_search is const void * so it shouldn't get modified in there. I > can't quite see the reason for making the copy. +1 for the goal of this patch. Another commit took care of your jsonfuncs.c concerns, and the patch for CVE-2014-0065 fixed several of the others. Plenty remain, though. > Attached is a patch with various cleanups where I didn't like the look of > the strncpy. I didn't go overboard with this as I know making this sort of > small changes all over can be a bit scary and I thought maybe it would get > rejected on that basis. > > I also cleaned up things like strncpy(dest, src, strlen(src)); which just > seems a bit weird and I'm failing to get my head around why it was done. I > replaced these with memcpy instead, but they could perhaps be a plain old > strcpy. I suggest preparing one or more patches that focus on the cosmetic-only changes, such as strncpy() -> memcpy() when strncpy() is guaranteed not to reach a NUL byte. With that noise out of the way, it will be easier to give the rest the attention it deserves. Thanks, nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com
On Wed, Aug 13, 2014 at 3:19 PM, Noah Misch <noah@leadboat.com> wrote:
> On Sat, Nov 16, 2013 at 12:53:10PM +1300, David Rowley wrote:
> > I went on a bit of a strncpy cleanup rampage this morning and ended up
> > finding quite a few places where strncpy is used wrongly.
> > I'm not quite sure if I have got them all in this patch, but I' think I've
> > got the obvious ones at least.
> >
> > For the hash_search in jsconfuncs.c after thinking about it a bit more...
> > Can we not just pass the attname without making a copy of it? I see keyPtr
> > in hash_search is const void * so it shouldn't get modified in there. I
> > can't quite see the reason for making the copy.
>
> +1 for the goal of this patch. Another commit took care of your jsonfuncs.c
> concerns, and the patch for CVE-2014-0065 fixed several of the others. Plenty
> remain, though.
Thanks for taking interest in this.
I had a quick look at the usages of strncpy in master tonight and I've really just picked out the obviously broken ones for now. The other ones, on first look, either look safe, or require some more analysis to see what's actually done with the string.
I think this is likely best tackled in small increments anyway.
Does anyone disagree with the 2 changes in the attached?
Regards
David Rowley
Attachments
David Rowley <dgrowleyml@gmail.com> wrote: > I had a quick look at the usages of strncpy in master tonight and > I've really just picked out the obviously broken ones for now. > The other ones, on first look, either look safe, or require some > more analysis to see what's actually done with the string. > > Does anyone disagree with the 2 changes in the attached? I am concerned that failure to check for truncation could allow deletion of unexpected files or directories. While this is probably not as dangerous as *executing* unexpected files, it seems potentially problematic. At the very least, a code comment explaining why calling unlink on something which is not what appears to be expected is not a problem there. Some might consider it overkill, but I tend to draw a pretty hard line on deleting or executing random files, even if the odds seem to be that the mangled name won't find a match. Granted, those problems exist now, but without checking for truncation it seems to me that we're just deleting *different* incorrect filenames, not really fixing the problem. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Kevin Grittner <kgrittn@ymail.com> writes: > I am concerned that failure to check for truncation could allow > deletion of unexpected files or directories. I believe that we deal with this by the expedient of checking the lengths of tablespace paths in advance, when the tablespace is created. regards, tom lane
On 08/13/2014 04:31 PM, Kevin Grittner wrote: > David Rowley <dgrowleyml@gmail.com> wrote: > >> I had a quick look at the usages of strncpy in master tonight and >> I've really just picked out the obviously broken ones for now. >> The other ones, on first look, either look safe, or require some >> more analysis to see what's actually done with the string. >> >> Does anyone disagree with the 2 changes in the attached? > > I am concerned that failure to check for truncation could allow > deletion of unexpected files or directories. While this is > probably not as dangerous as *executing* unexpected files, it seems > potentially problematic. At the very least, a code comment > explaining why calling unlink on something which is not what > appears to be expected is not a problem there. > > Some might consider it overkill, but I tend to draw a pretty hard > line on deleting or executing random files, even if the odds seem > to be that the mangled name won't find a match. Granted, those > problems exist now, but without checking for truncation it seems to > me that we're just deleting *different* incorrect filenames, not > really fixing the problem. strlcpy is clearly better than strncpy here, but I wonder if we should have yet another string copying function that throws an error instead of truncating, if the buffer is too small. What you really want in these cases is a "path too long" error. - Heikki
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Kevin Grittner <kgrittn@ymail.com> writes:
>
>> I am concerned that failure to check for truncation could allow
>> deletion of unexpected files or directories.
>
> I believe that we deal with this by the expedient of checking the
> lengths of tablespace paths in advance, when the tablespace is
> created.

As long as it is covered. I would point out that when strlcpy is used, it returns a size_t which can be directly compared to one of the arguments passed in (in this case MAXPGPATH) to detect whether the name was truncated, for the cost of an integer compare (probably in registers). No additional scan of the data is needed.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
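The truncation check described above, and the "error instead of truncate" wrapper suggested earlier in the thread, might look roughly like the following. Since strlcpy is not available in every C library, this sketch uses snprintf, whose return value can be compared against the buffer size in the same way. The function name copy_checked is hypothetical, and a real PostgreSQL wrapper would presumably report the failure through its own error machinery rather than return a flag.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical wrapper: copy src into dst (capacity size bytes) and report
 * whether it fit. snprintf() returns the length the full string would have
 * had, so a result >= size means the copy was truncated. dst is always
 * NUL-terminated when size > 0.
 */
static bool
copy_checked(char *dst, size_t size, const char *src)
{
    int needed = snprintf(dst, size, "%s", src);

    return needed >= 0 && (size_t) needed < size;
}
```

A caller about to unlink or execute the resulting path could refuse to proceed when copy_checked() returns false, which turns silent truncation into an explicit "path too long" condition.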
On Wed, Aug 13, 2014 at 10:21:50AM -0400, Tom Lane wrote:
> Kevin Grittner <kgrittn@ymail.com> writes:
> > I am concerned that failure to check for truncation could allow
> > deletion of unexpected files or directories.
>
> I believe that we deal with this by the expedient of checking the lengths
> of tablespace paths in advance, when the tablespace is created.

The files under scrutiny here are not located in a tablespace. Even if they were, isn't the length of $PGDATA/pg_tblspc the important factor? $PGDATA can change between runs if the DBA moves the data directory or reaches it via different symlinks, so any DDL-time defense would be incomplete.

> > Some might consider it overkill, but I tend to draw a pretty hard
> > line on deleting or executing random files, even if the odds seem
> > to be that the mangled name won't find a match. Granted, those
> > problems exist now, but without checking for truncation it seems to
> > me that we're just deleting *different* incorrect filenames, not
> > really fixing the problem.

I share your (Kevin's) discomfort with our use of strlcpy(). I wouldn't mind someone replacing most strlcpy()/snprintf() calls with calls to wrappers that ereport(ERROR) on truncation. Though as reliability problems go, this one has been minor. David's specific patch has no concrete problem:

On Wed, Aug 13, 2014 at 10:26:01PM +1200, David Rowley wrote:
> --- a/contrib/pg_archivecleanup/pg_archivecleanup.c
> +++ b/contrib/pg_archivecleanup/pg_archivecleanup.c
> @@ -108,7 +108,7 @@ CleanupPriorWALFiles(void)
>   {
>       while (errno = 0, (xlde = readdir(xldir)) != NULL)
>       {
> -         strncpy(walfile, xlde->d_name, MAXPGPATH);
> +         strlcpy(walfile, xlde->d_name, MAXPGPATH);

The code proceeds to check strlen(walfile) == XLOG_DATA_FNAME_LEN, so a long name can't trick it.

>           TrimExtension(walfile, additional_ext);
>
>           /*
> diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
> index 37745dc..0c9498a 100644
> --- a/src/backend/access/transam/xlogarchive.c
> +++ b/src/backend/access/transam/xlogarchive.c
> @@ -459,7 +459,7 @@ KeepFileRestoredFromArchive(char *path, char *xlogfname)
>                   xlogfpath, oldpath)));
>   }
>  #else
> - strncpy(oldpath, xlogfpath, MAXPGPATH);
> + strlcpy(oldpath, xlogfpath, MAXPGPATH);

This one never overflows, because it's copying from one MAXPGPATH buffer to another. Plain strcpy() would be fine, too.

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com
Noah Misch <noah@leadboat.com> writes: > On Wed, Aug 13, 2014 at 10:21:50AM -0400, Tom Lane wrote: >> I believe that we deal with this by the expedient of checking the lengths >> of tablespace paths in advance, when the tablespace is created. > The files under scrutiny here are not located in a tablespace. Even if they > were, isn't the length of $PGDATA/pg_tblspc the important factor? The length of $PGDATA is of no relevance whatsoever; we chdir into that directory at startup, and subsequently all paths are implicitly relative to there. If there is any backend code that's prepending $PGDATA to something else, it's wrong to start with. regards, tom lane
On Thu, Aug 14, 2014 at 02:50:02AM -0400, Tom Lane wrote: > Noah Misch <noah@leadboat.com> writes: > > On Wed, Aug 13, 2014 at 10:21:50AM -0400, Tom Lane wrote: > >> I believe that we deal with this by the expedient of checking the lengths > >> of tablespace paths in advance, when the tablespace is created. > > > The files under scrutiny here are not located in a tablespace. Even if they > > were, isn't the length of $PGDATA/pg_tblspc the important factor? > > The length of $PGDATA is of no relevance whatsoever; we chdir into that > directory at startup, and subsequently all paths are implicitly relative > to there. If there is any backend code that's prepending $PGDATA to > something else, it's wrong to start with. Ah; quite right. -- Noah Misch EnterpriseDB http://www.enterprisedb.com
On Thu, Aug 14, 2014 at 4:13 PM, Noah Misch <noah@leadboat.com> wrote:
> I share your (Kevin's) discomfort with our use of strlcpy(). I wouldn't mind
> someone replacing most strlcpy()/snprintf() calls with calls to wrappers that
> ereport(ERROR) on truncation. Though as reliability problems go, this one has
> been minor.
Or maybe it would be better to just remove the restriction and just palloc something of the correct size?
Although, that sounds like a much larger patch. I'd vote that the strlcpy should be used in the meantime.
Regards
David Rowley
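The dynamic-allocation alternative raised above could be sketched as follows. Backend code would use palloc, as the message says; this standalone illustration uses malloc as a stand-in, and the function name copy_exact is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: allocate exactly enough room for src, including its
 * terminator, and copy it. Returns NULL if allocation fails. The caller
 * owns the result and must free() it.
 */
static char *
copy_exact(const char *src)
{
    size_t len = strlen(src) + 1;   /* +1 for the terminating NUL */
    char  *dst = malloc(len);

    if (dst != NULL)
        memcpy(dst, src, len);
    return dst;
}
```

With this approach there is no fixed limit to exceed, so truncation cannot occur; the cost is that every caller has to manage the lifetime of the returned buffer, which is the kind of code churn the following reply refers to.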
On Sat, Aug 16, 2014 at 10:38:39AM +1200, David Rowley wrote: > On Thu, Aug 14, 2014 at 4:13 PM, Noah Misch <noah@leadboat.com> wrote: > > > I share your (Kevin's) discomfort with our use of strlcpy(). I wouldn't > > mind > > someone replacing most strlcpy()/snprintf() calls with calls to wrappers > > that > > ereport(ERROR) on truncation. Though as reliability problems go, this one > > has > > been minor. > > > > > Or maybe it would be better to just remove the restriction and just palloc > something of the correct size? > Although, that sounds like a much larger patch. I'd vote that the strlcpy > should be used in the meantime. I agree that, in principle, dynamic allocation might be better still. I also agree that it would impose more code churn, for an awfully-narrow benefit. Barring objections, I will commit your latest patch with some comments about why truncation is harmless for those two particular calls. -- Noah Misch EnterpriseDB http://www.enterprisedb.com
On Fri, Aug 15, 2014 at 11:26:55PM -0400, Noah Misch wrote: > Barring objections, I will commit your latest patch with some comments about > why truncation is harmless for those two particular calls. Done.