Обсуждение: "buffer too small" or "path too long"?
Recently we added the error messages "buffer for root directory too small" and siblings to pg_upgrade. This means "<new_cluster's pgdata>/pg_upgrade_output.d" was longer than MAXPGPATH. I feel that the "root directory" is obscure here, and moreover "buffer is too small" looks pointless since no user can do something on the buffer length. At least I can't read out from the message concretely what I should do next.. The root cause of the errors is that the user-provided directory path of new cluster's root was too long. Anywhich one of the four buffers is overflowed, it doesn't makes any difference for users and doesn't offer any further detail to suppoerters/developers. I see "output directory path of new cluster too long" clear enough. Above all, this change reduces the number of messages that need translation:) # And the messages are missing trailing line breaks. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Вложения
Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes: > The root cause of the errors is that the user-provided directory path > of new cluster's root was too long. Anywhich one of the four buffers > is overflowed, it doesn't makes any difference for users and doesn't > offer any further detail to suppoerters/developers. I see "output > directory path of new cluster too long" clear enough. +1, but I'm inclined to make it read "... is too long". > # And the messages are missing trailing line breaks. I was about to question that, but now I remember that pg_upgrade has its own logging facility with a different idea about who provides the trailing newline than common/logging.[hc] has. Undoubtedly that's the source of this mistake. We really need to get pg_upgrade out of the business of having its own logging conventions. regards, tom lane
At Mon, 13 Jun 2022 13:25:01 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in > Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes: > > The root cause of the errors is that the user-provided directory path > > of new cluster's root was too long. Anywhich one of the four buffers > > is overflowed, it doesn't makes any difference for users and doesn't > > offer any further detail to suppoerters/developers. I see "output > > directory path of new cluster too long" clear enough. > > +1, but I'm inclined to make it read "... is too long". Yeah, I feel so and it is what I wondered about recently when I saw some complete error messages. Is that because of the length of the subject? > > # And the messages are missing trailing line breaks. > > I was about to question that, but now I remember that pg_upgrade has > its own logging facility with a different idea about who provides > the trailing newline than common/logging.[hc] has. Undoubtedly > that's the source of this mistake. We really need to get pg_upgrade > out of the business of having its own logging conventions. Yes... I don't find a written reason excluding pg_upgrade in both the commit 9a374b77fb and or the thread [1]. But I guess that we decided that we first provide the facility in the best style ignoring the current impletent in pg_upgrade then let pg_upgrade use it. So I think it should emerge in the next cycle? I'll give it a shot if no one is willing to do that for now. (I believe it is straightforward..) [1] https://www.postgresql.org/message-id/941719.1645587865%40sss.pgh.pa.us regards. -- Kyotaro Horiguchi NTT Open Source Software Center
At Tue, 14 Jun 2022 09:48:26 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > At Mon, 13 Jun 2022 13:25:01 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in > > +1, but I'm inclined to make it read "... is too long". > > Yeah, I feel so and it is what I wondered about recently when I saw > some complete error messages. Is that because of the length of the > subject? And I found that it is alrady done. Thanks! regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Tue, Jun 14, 2022 at 09:52:52AM +0900, Kyotaro Horiguchi wrote: > At Tue, 14 Jun 2022 09:48:26 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in >> Yeah, I feel so and it is what I wondered about recently when I saw >> some complete error messages. Is that because of the length of the >> subject? > > And I found that it is alrady done. Thanks! I have noticed this thread and 4e54d23 as a result this morning. If you want to spread this style more, wouldn't it be better to do that in all the places of pg_upgrade where we store paths to files? I can see six code paths with log_opts.basedir that could do the same, as of the attached. The hardcoded file names have various lengths, and some of them are quite long making the generated paths more exposed to being cut in the middle. -- Michael
Вложения
Michael Paquier <michael@paquier.xyz> writes: > I have noticed this thread and 4e54d23 as a result this morning. If > you want to spread this style more, wouldn't it be better to do that > in all the places of pg_upgrade where we store paths to files? I can > see six code paths with log_opts.basedir that could do the same, as of > the attached. The hardcoded file names have various lengths, and some > of them are quite long making the generated paths more exposed to > being cut in the middle. Well, I just fixed the ones in make_outputdirs because it seemed weird that that part of the function was not doing something the earlier parts did. I didn't look around for more trouble. I think that pg_fatal'ing on the grounds of path-too-long once we've already started the upgrade isn't all that great. Really we want to fail on that early on --- so coding make_outputdirs like this is fine, but maybe we need a different plan for files made later. regards, tom lane
Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes: > At Mon, 13 Jun 2022 13:25:01 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in >> I was about to question that, but now I remember that pg_upgrade has >> its own logging facility with a different idea about who provides >> the trailing newline than common/logging.[hc] has. Undoubtedly >> that's the source of this mistake. We really need to get pg_upgrade >> out of the business of having its own logging conventions. > Yes... I don't find a written reason excluding pg_upgrade in both the > commit 9a374b77fb and or the thread [1]. Well, as far as 9a374b77fb went, Peter had left pg_upgrade out of the mix in the original creation of common/logging.c, and I didn't think that dealing with that was a reasonable part of my update patch. > But I guess that we decided > that we first provide the facility in the best style ignoring the > current impletent in pg_upgrade then let pg_upgrade use it. So I > think it should emerge in the next cycle? I'll give it a shot if no > one is willing to do that for now. (I believe it is straightforward..) Actually, I spent some time earlier today looking into that, and I can see why Peter stayed away from it :-(. There are a few issues: * The inconsistency with the rest of the world about trailing newlines. That aspect actually seems fixable fairly easily, and I have a patch mostly done for it. * logging.c believes it should prefix every line of output with the program's name and so on. This doesn't seem terribly appropriate for pg_upgrade's use --- at least, not unless we make pg_upgrade WAY less chatty. Perhaps that'd be fine, I dunno. * pg_upgrade's pg_log_v duplicates all (well, most) stdout messages into the INTERNAL_LOG_FILE log file, something logging.c has no provision for (and it'd not be too easy to do, because of the C standard's restrictions on use of va_list). Personally I'd be okay with nuking the INTERNAL_LOG_FILE log file from orbit, but I bet somebody will fight to keep it. * pg_log_v has also got a bunch of specialized rules around how to format PG_STATUS message traffic. Again I wonder how useful that whole behavior really is, but taking it out would be a big user-visible change. In short, it seems like pg_upgrade's logging habits are sufficiently far out in left field that we couldn't rebase it on top of logging.c without some seriously large user-visible behavioral changes. I have better things to spend my time on than advocating for that. However, the inconsistency in newline handling is a problem: I found that there are already other bugs with missing or extra newlines, and it will only get worse if we don't unify that behavior. So my inclination for now is to fix that and let the other issues go. Patch coming. regards, tom lane
On Mon, Jun 13, 2022 at 10:41:41PM -0400, Tom Lane wrote: > * logging.c believes it should prefix every line of output with the > program's name and so on. This doesn't seem terribly appropriate > for pg_upgrade's use --- at least, not unless we make pg_upgrade > WAY less chatty. Perhaps that'd be fine, I dunno. pg_upgrade was designed to be chatty because it felt it could fail under unpredictable circumstances --- I am not sure how true that is today. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Indecision is a decision. Inaction is an action. Mark Batterson
On 14.06.22 03:55, Michael Paquier wrote: > On Tue, Jun 14, 2022 at 09:52:52AM +0900, Kyotaro Horiguchi wrote: >> At Tue, 14 Jun 2022 09:48:26 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in >>> Yeah, I feel so and it is what I wondered about recently when I saw >>> some complete error messages. Is that because of the length of the >>> subject? >> >> And I found that it is alrady done. Thanks! > > I have noticed this thread and 4e54d23 as a result this morning. If > you want to spread this style more, wouldn't it be better to do that > in all the places of pg_upgrade where we store paths to files? I can > see six code paths with log_opts.basedir that could do the same, as of > the attached. The hardcoded file names have various lengths, and some > of them are quite long making the generated paths more exposed to > being cut in the middle. We have this problem of long file names being silently truncated all over the source code. Instead of equipping each one of them with a length check, why don't we get rid of the fixed-size buffers and allocate dynamically, as in the attached patch.
Вложения
On Wed, Jun 15, 2022 at 2:51 AM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote: > We have this problem of long file names being silently truncated all > over the source code. Instead of equipping each one of them with a > length check, why don't we get rid of the fixed-size buffers and > allocate dynamically, as in the attached patch. I've always wondered why we rely on MAXPGPATH instead of dynamic allocation. It seems pretty lame. I don't know how much we gain by fixing one place and not all the others, but maybe it would set a trend. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Jun 15, 2022 at 2:51 AM Peter Eisentraut > <peter.eisentraut@enterprisedb.com> wrote: >> We have this problem of long file names being silently truncated all >> over the source code. Instead of equipping each one of them with a >> length check, why don't we get rid of the fixed-size buffers and >> allocate dynamically, as in the attached patch. > I don't know how much we gain by fixing one place and not all the > others, but maybe it would set a trend. Yeah, that was what was bugging me about this proposal. Removing one function's dependency on MAXPGPATH isn't much of a step forward. I note also that the patch leaks quite a lot of memory (a kilobyte or so per pathname, IIRC). That's probably negligible in this particular context, but anyplace that was called more than once per program run would need to be more tidy. regards, tom lane
On Wed, Jun 15, 2022 at 02:02:03PM -0400, Tom Lane wrote: > Yeah, that was what was bugging me about this proposal. Removing > one function's dependency on MAXPGPATH isn't much of a step forward. This comes down to out-of-memory vs path length at the end. Changing only the paths of make_outputdirs() without touching all the paths in check.c and the one in function.c does not sound good to me, as this increases the risk of failing pg_upgrade in the middle, and that's what we should avoid, as said upthread. > I note also that the patch leaks quite a lot of memory (a kilobyte or > so per pathname, IIRC). That's probably negligible in this particular > context, but anyplace that was called more than once per program run > would need to be more tidy. Surely. -- Michael
Вложения
On 15.06.22 19:08, Robert Haas wrote: > On Wed, Jun 15, 2022 at 2:51 AM Peter Eisentraut > <peter.eisentraut@enterprisedb.com> wrote: >> We have this problem of long file names being silently truncated all >> over the source code. Instead of equipping each one of them with a >> length check, why don't we get rid of the fixed-size buffers and >> allocate dynamically, as in the attached patch. > > I've always wondered why we rely on MAXPGPATH instead of dynamic > allocation. It seems pretty lame. I think it came in before we had extensible string buffers APIs.