Обсуждение: backslash in psql output
I believe Tom Lane removed the double-backslash from the psql output by modifying PQprint. I recommend that is reversed, because psql escapes out the table column delimiters with a single backslash, and now the format will be ambigious. rtest=> create table test(x text); ERROR: test relation already exists test=> create table test3(x text); CREATE test=> insert into test3 values ('\\x'); INSERT 322185 1 test=> select * from test3; x -- \x (1 row) This used to show as: x -- \\x (1 row) Comments? -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
> now the format will be ambigious. > test=> insert into test3 values ('\\x'); > test=> select * from test3; > -- > \x > This used to show as: > -- > \\x > Comments? Well, actually I've been thinking that this is closer to the behavior we might want (though I haven't looked carefully at the new version). Of course it bothered me more than it should have, since I misunderstood where the re-escaping was happening; I had thought it was happening in the backend. psql could have an option to re-escape strings, but imho by default should display what is stored, not what was typed in originally. pg_dump _should_ re-escape everything, so that it reloads properly. - $0.02 from Tom
> > now the format will be ambigious. > > test=> insert into test3 values ('\\x'); > > test=> select * from test3; > > -- > > \x > > This used to show as: > > -- > > \\x > > Comments? > > Well, actually I've been thinking that this is closer to the behavior we > might want (though I haven't looked carefully at the new version). Of > course it bothered me more than it should have, since I misunderstood > where the re-escaping was happening; I had thought it was happening in > the backend. > > psql could have an option to re-escape strings, but imho by default > should display what is stored, not what was typed in originally. > > pg_dump _should_ re-escape everything, so that it reloads properly. But what about backward compatability? Aren't there people expecting psql output to show double backslashes? What do we do to display pipes in the output? -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
> > psql could have an option to re-escape strings, but imho by default > > should display what is stored, not what was typed in originally. > > pg_dump _should_ re-escape everything, so that it reloads properly. > But what about backward compatability? Aren't there people expecting > psql output to show double backslashes? What do we do to display > pipes in the output? We can include a command-line option to re-enable the escapes. Anyone can use that form, and can define an alias for it if they need. I understand your concern about backward compatibility, but imho the convention it was using makes things more confusing, especially for a new user. I'm not certain (assuming things have changed recently) that the new behavior is what I would want, but the old behavior was not one I would have guessed at. Using psql to pipe output to another program is the best example of why one might _not_ want to re-escape the strings. If the goal is to use data from the database, then that data is what you will want to see in the pipe. Anyway, in the long run I'd like to consider moving toward a "no-escape" default output for psql. If you think that it is too short notice to fully understand the ramifications of a change this close to release, or if your "faithful correspondents" have cottage cheese for brains, then go ahead and revert it ;) But I like the new behavior in principle... - Tom Another possibility is to make it a configuration option: ./configure --enable-psql-string-escapes or something like that...
> We can include a command-line option to re-enable the escapes. Anyone > can use that form, and can define an alias for it if they need. > > I understand your concern about backward compatibility, but imho the > convention it was using makes things more confusing, especially for a > new user. I'm not certain (assuming things have changed recently) that > the new behavior is what I would want, but the old behavior was not one > I would have guessed at. > > Using psql to pipe output to another program is the best example of why > one might _not_ want to re-escape the strings. If the goal is to use > data from the database, then that data is what you will want to see in > the pipe. > > Anyway, in the long run I'd like to consider moving toward a "no-escape" > default output for psql. If you think that it is too short notice to > fully understand the ramifications of a change this close to release, or > if your "faithful correspondents" have cottage cheese for brains, then > go ahead and revert it ;) But I like the new behavior in principle... > > - Tom > > Another possibility is to make it a configuration option: > > ./configure --enable-psql-string-escapes > > or something like that... > I realize the double-backslash is confusing, but I don't think we can make such a user-visible change at this time. I think we need to open discussion on this issue on the general list, and to include discussion of NULL displays, and any other issues, as well as how to properly output the column separation character if that appears in the data. So, I think we have to put it back to the old way, and open discussion about this after 6.4. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian wrote: >> > now the format will be ambigious. >> > test=> insert into test3 values ('\\x'); >> > test=> select * from test3; >> > -- >> > \x >> > This used to show as: >> > -- >> > \\x >> > Comments? >>... >But what about backward compatability? Aren't there people expecting >psql output to show double backslashes? What do we do to display pipes >in the output? That change seems a good thing: the front-end ought to display what the user wants. Any manipulations should be done behind the scenes. If I store a DOS pathname, I don't want to see the backslashes doubled in it. Even worse, I don't want to see them eliminated altogether, which is what happens now if I don't remember to double them on input. You mentioned that psql backslash-escapes the column delimiter character. I think that this behaviour ought to be removed as well; it should be obvious from the alignment with headings and other lines whether a pipe character is part of the data or a column delimiter. If it really matters, a user can specify another character to use as delimiter. An unsophisticated user expects to type characters and have them accepted; he should not have to know that certain characters need to be doubled or escaped, nor that certain characters he sees in the output are to be ignored. -- Oliver Elphick Oliver.Elphick@lfix.co.uk Isle of Wight http://www.lfix.co.uk/oliver PGP key from public servers; key ID 32B8FAA1 ======================================== "He that covereth his sins shall not prosper; but whoso confesseth and forsaketh them shall have mercy." Proverbs 28:13
Bruce Momjian <maillist@candle.pha.pa.us> writes: > I realize the double-backslash is confusing, but I don't think we can > make such a user-visible change at this time. I think we need to open > discussion on this issue on the general list, and to include discussion > of NULL displays, and any other issues, as well as how to properly > output the column separation character if that appears in the data. > So, I think we have to put it back to the old way, and open discussion > about this after 6.4. Well, actually there *was* public discussion of this issue, on the pgsql-interfaces list around 12/13 August. The consensus was that unnecessary backslashing was a bad idea --- in fact, I didn't see *anyone* arguing in favor of the old behavior, and the people who actually had backslashes in their data definitely didn't want it. Admittedly it was a pretty small sample (Tom Lockhart and I were two of the primary complainers) but there wasn't any sentiment for keeping the old behavior. Keep in mind that what we are discussing here is the behavior of PQprint(), not the behavior of FE/BE transport protocol or anything else that affects data received by applications. PQprint's goal in life is to present data in a reasonably human-friendly way, *not* to produce a completely unambiguous machine-readable syntax. Its output format is in fact very ambiguous. Here's an example: play=> create table test(id int4, val text); play=> insert into test values (1, NULL); play=> insert into test values (2, ' '); play=> insert into test values (3, 'foobar'); play=> insert into test values (4, 'oneback\\slash'); play=> insert into test values (5, 'onevert|bar'); play=> select * from test; id|val --+------------- 1| 2| 3|foobar 4|oneback\slash 5|onevert|bar (5 rows) You can't tell the difference between a NULL field and an all-blanks value in this format; nor can you really be sure how many trailing blanks there are in tuples 3 and 5. So the goal is readability, not lack of ambiguity. Given that goal, I don't see the value of printing backslash escapes. Are you really having difficulty telling the data vertical bar from the ones used as column separators? Physical alignment is the cue the eye relies on, I think. The only cases that PQprint inserted backslashes for were the column separator char (unnecessary per above example), newlines (also not exactly hard to recognize), and backslash itself. All of these seem unnecessary and confusing to me. I'm sorry that this change sat in my to-do queue for so long, but I don't see it as a last-minute thing. The consensus to do it was established two months ago. regards, tom lane
> Bruce Momjian wrote: > >> > now the format will be ambigious. > >> > test=> insert into test3 values ('\\x'); > >> > test=> select * from test3; > >> > -- > >> > \x > >> > This used to show as: > >> > -- > >> > \\x > >> > Comments? > >>... > >But what about backward compatability? Aren't there people expecting > >psql output to show double backslashes? What do we do to display pipes > >in the output? > > That change seems a good thing: the front-end ought to display what the > user wants. Any manipulations should be done behind the scenes. If I > store a DOS pathname, I don't want to see the backslashes doubled in it. > Even worse, I don't want to see them eliminated altogether, which is what > happens now if I don't remember to double them on input. > > You mentioned that psql backslash-escapes the column delimiter character. > I think that this behaviour ought to be removed as well; it should be obvious > from the alignment with headings and other lines whether a pipe character is > part of the data or a column delimiter. If it really matters, a user > can specify another character to use as delimiter. If the user is reading psql output into a program, it is very unclear how to determine a valid column delimiter vs. a delimiter in the data. Yes, they can change delimiters, but they person has to choose one that would never appear in the data stream, and that is sometimes impossible. I don't know how many people do this type of thing, but I think we have to ask the general users. > > An unsophisticated user expects to type characters and have them > accepted; he should not have to know that certain characters need to be > doubled or escaped, nor that certain characters he sees in the output are > to be ignored. If we don't require double backslashes on input, we can no longer accept C escape sequences like \r or \n, or octal values. As it is now, octal values are now interpreted, instead of returning their values: test=> insert into test3 values ('\253'); INSERT 323237 1 test=> select * from test3; x -------- nedd � � (3 rows) I don't know if it always did this or not. I can understand people wanting to change things, but we have to discuss all the issues. And what about the COPY command. Do you want to change the display of escape characters there too? I hope not. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian wrote:... >And what about the COPY command. Do you want to change the display of >escape characters there too? I hope not. No; but I wouldn't expect unsophisticated users to be using COPY. -- Oliver Elphick Oliver.Elphick@lfix.co.uk Isle of Wight http://www.lfix.co.uk/oliver PGP key from public servers; key ID 32B8FAA1 ======================================== "He that covereth his sins shall not prosper; but whoso confesseth and forsaketh them shall have mercy." Proverbs 28:13
Bruce Momjian <maillist@candle.pha.pa.us> writes: > If the user is reading psql output into a program, it is very unclear > how to determine a valid column delimiter vs. a delimiter in the data. Quite true, but the PQprint output format has never been program-friendly, as I pointed out previously. I think the appropriate response to this gripe is to add another command-line switch to psql, which would change its display of SELECT data to something more suitable for program consumption, rather than trying to make a single output format that's a bad compromise between human readability and machine readability. Perhaps the COPY data syntax would work for a program-friendly output format? (Basically tabs and newlines are field and row delimiters, with (IIRC) backslash-escaping of tabs, newlines, backslash itself, and maybe nulls and suchlike. Also \N stands for a null field.) > And what about the COPY command. Do you want to change the display of > escape characters there too? I hope not. I wasn't proposing any such thing, unless it proves necessary to ensure we can copy arbitrary-8-bit text data out and back in. But that's a task for a future release. regards, tom lane
> You can't tell the difference between a NULL field and an all-blanks > value in this format; nor can you really be sure how many trailing > blanks there are in tuples 3 and 5. So the goal is readability, > not lack of ambiguity. Given that goal, I don't see the value of > printing backslash escapes. Are you really having difficulty telling > the data vertical bar from the ones used as column separators? > Physical alignment is the cue the eye relies on, I think. > > The only cases that PQprint inserted backslashes for were the column > separator char (unnecessary per above example), newlines (also not > exactly hard to recognize), and backslash itself. All of these > seem unnecessary and confusing to me. OK, I understand your point here, that we must maximize readability, and that robustness is not as important. OK, let's keep the removal of backslashes. Can you recommend a nice NULL display, perhaps '[NULL]' or '<NULL>'? -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Thus spake Bruce Momjian > OK, let's keep the removal of backslashes. Can you recommend a nice > NULL display, perhaps '[NULL]' or '<NULL>'? I'd like to make at least one vote to keep the status quo. Some of us have come to depend on the existing behaviour. At least make it an option that you have to turn on. -- D'Arcy J.M. Cain <darcy@{druid|vex}.net> | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 424 2871 (DoD#0082) (eNTP) | what's for dinner.
> Thus spake Bruce Momjian > > OK, let's keep the removal of backslashes. Can you recommend a nice > > NULL display, perhaps '[NULL]' or '<NULL>'? > > I'd like to make at least one vote to keep the status quo. Some of > us have come to depend on the existing behaviour. At least make > it an option that you have to turn on. Man, I can't win. :-) I vote for the status quo, and people want double backslashes removed. I want to add a NULL display, and people want status quo. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
> > Bruce Momjian <maillist@candle.pha.pa.us> writes: > > I realize the double-backslash is confusing, but I don't think we can > > make such a user-visible change at this time. I think we need to open > > discussion on this issue on the general list, and to include discussion > > of NULL displays, and any other issues, as well as how to properly > > output the column separation character if that appears in the data. > > So, I think we have to put it back to the old way, and open discussion > > about this after 6.4. > > Well, actually there *was* public discussion of this issue, on the > pgsql-interfaces list around 12/13 August. The consensus was that > unnecessary backslashing was a bad idea --- in fact, I didn't see > *anyone* arguing in favor of the old behavior, and the people who > actually had backslashes in their data definitely didn't want it. > Admittedly it was a pretty small sample (Tom Lockhart and I were > two of the primary complainers) but there wasn't any sentiment > for keeping the old behavior. > > Keep in mind that what we are discussing here is the behavior of > PQprint(), not the behavior of FE/BE transport protocol or anything > else that affects data received by applications. PQprint's goal in > life is to present data in a reasonably human-friendly way, *not* > to produce a completely unambiguous machine-readable syntax. Its > output format is in fact very ambiguous. Here's an example: > > play=> create table test(id int4, val text); > play=> insert into test values (1, NULL); > play=> insert into test values (2, ' '); > play=> insert into test values (3, 'foobar'); > play=> insert into test values (4, 'oneback\\slash'); > play=> insert into test values (5, 'onevert|bar'); > play=> select * from test; > id|val > --+------------- > 1| > 2| > 3|foobar > 4|oneback\slash > 5|onevert|bar > (5 rows) > > You can't tell the difference between a NULL field and an all-blanks > value in this format; nor can you really be sure how many trailing > blanks there are in tuples 3 and 5. So the goal is readability, > not lack of ambiguity. Given that goal, I don't see the value of > printing backslash escapes. Are you really having difficulty telling > the data vertical bar from the ones used as column separators? > Physical alignment is the cue the eye relies on, I think. > > The only cases that PQprint inserted backslashes for were the column > separator char (unnecessary per above example), newlines (also not > exactly hard to recognize), and backslash itself. All of these > seem unnecessary and confusing to me. > > I'm sorry that this change sat in my to-do queue for so long, but > I don't see it as a last-minute thing. The consensus to do it was > established two months ago. > > regards, tom lane > > > In my opinion we should privilege machine-readableness first and then provide some user option to enable user-friendly conversion in psql output if one really needs it. In situations where data is processed by other programs it is very important that there is no ambiguity in strings exchanged between the application and the backend. This is already done for input, which supports C-like escape, but not yet for output, which can produce ambiguous data when nulls, arrays or non-printing characters are involved. This is the reason why I always use my C-like output functions (contrib/string-io) in all my applications. These arguments apply also to the copy command which uses the same output functions. Consider the case where a text field contains a multi-line string with newlines embedded; if you export the table into an external files the field is split into many lines which are interpreted as separate records by commonly used line-oriented filters like awk or grep. I believe that the right way to handle all this stuff is the following: input: binary data escaped data | | (user conversion) (psql input) | | +-----------------------+ | escaped query | (libpq) | escaped query escaped data | | (parser unescape) (copy-from unescape) | | +-----------------------+ | binary data | (input function) | internal data output: internal data | (output function) | escaped data | +-----------------------+ | | (libpq) (copy-to) | | escaped data escaped data | | +-----------------------+-----------------------+ | | | (user conversion) (psql output) (psql unescape) | | | binary data escaped data binary data In the above schema binary data means the external representation of data containing non-printing or delimiters characters like quotes or newlines. In this schema all the data exchanged with the backend should be escaped in order to guarantee unambiguity to applications. The input and output user conversion functions could be provided by libpq as utilities, and the conversion could possibly be done automatically by libpq itself if some global flag is set by the application. Psql input should accept only escaped data while the output could be escaped (default) or binary depending on a user supplied switch. Files read or written by the copy command should be always escaped with exactly one record for line. Pg_dump should produce escaped strings. All this stuff requires the use of new output functions like those provided in contrib/string-io. There is still the problem of distinguishing between scalars and arrays which is necessary for user output conversion. In my output functions I solved the problem by escaping the first '{' of each field if it is not an array. Another problem is that array input requires a double escaping, one for the query parser and a second one for the array parser. Also nulls (\0) are not handled by the input code. This should be fixed if we want true binary data. I don't know if C-escapes violate the ansi sql standard but I believe they makes life easier for the programmer. And if we add some global flag in libpq we could also do automatic conversion to be compatible with ansi sql and old applications. Note that arrays aren't ansi sql anyway. Anyway a runtime switch is preferable to a configure switch. -- Massimo Dal Zotto +----------------------------------------------------------------------+ | Massimo Dal Zotto email: dz@cs.unitn.it | | Via Marconi, 141 phone: ++39-461-534251 | | 38057 Pergine Valsugana (TN) www: http://www.cs.unitn.it/~dz/ | | Italy pgp: finger dz@tango.cs.unitn.it | +----------------------------------------------------------------------+
On Sun, 11 Oct 1998, D'Arcy J.M. Cain wrote: > I'd like to make at least one vote to keep the status quo. Some of > us have come to depend on the existing behaviour. At least make > it an option that you have to turn on. > > -- > D'Arcy J.M. Cain <darcy@{druid|vex}.net> | Democracy is three wolves > http://www.druid.net/darcy/ | and a sheep voting on > +1 416 424 2871 (DoD#0082) (eNTP) | what's for dinner. > To which I add my hearty approval (I'd like both options)! Marc Zuckman marc@fallon.classyad.com _\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_ _ Visit The Home and Condo MarketPlace _ _ http://www.ClassyAd.com _ _ _ _ FREE basic property listings/advertisements and searches. _ _ _ _ Try our premium, yet inexpensive services for a real _ _ selling or buying edge! _ _\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_
(Sorry for being so slow to respond to this...) Massimo Dal Zotto <dz@cs.unitn.it> writes: > I believe that the right way to handle all this stuff is the following: > input: > binary data escaped data > | | > (user conversion) (psql input) > | | > +-----------------------+ > | > escaped query > | > (libpq) > | > escaped query escaped data > | | > (parser unescape) (copy-from unescape) > | | > +-----------------------+ > | > binary data > | > (input function) > | > internal data > output: > internal data > | > (output function) > | > escaped data > | > +-----------------------+ > | | > (libpq) (copy-to) > | | > escaped data escaped data > | > | > +-----------------------+-----------------------+ > | | | > (user conversion) (psql output) (psql unescape) > | | | > binary data escaped data binary data I disagree with this, at least for the output side. The FE/BE protocol for SELECT/FETCH results is already completely 8-bit clean. There is no reason to escape output data before passing it across the wire and through libpq. The application program might want to escape the data for its own purposes, but that's not our concern. As far as I can tell, COPY IN/OUT data is the only case where we really have an issue. Since the COPY protocol is inherently text-based, we have to escape anything that won't do as text. (Offhand, I think only tab, newline, null, and of course backslash are essential to convert, though we might also want to convert other nonprinting characters for readability's sake.) The conversions involved need to be nailed down and documented as part of the FE/BE protocol. Coping with array-valued fields is also a concern --- there needs to be some reasonable way for an application to discover that a given field is an array and what datatype it is an array of. But I think the need there is to extend the RowDescription information returned by SELECT, not to modify the data representation. regards, tom lane
> As far as I can tell, COPY IN/OUT data is the only case where we really > have an issue. Since the COPY protocol is inherently text-based, we > have to escape anything that won't do as text. (Offhand, I think only > tab, newline, null, and of course backslash are essential to convert, > though we might also want to convert other nonprinting characters for > readability's sake.) The conversions involved need to be nailed down > and documented as part of the FE/BE protocol. \. as end-of-input is also escaped. Not sure that gets sent to the backend, or is just used by the frontend protocol to signal end-of-input. > > Coping with array-valued fields is also a concern --- there needs to > be some reasonable way for an application to discover that a given field > is an array and what datatype it is an array of. But I think the need > there is to extend the RowDescription information returned by SELECT, > not to modify the data representation. Yes, arrays can be a problem. Not sure. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026