Discussion: Allowing line-continuation in pgbench custom scripts
Hi,

In a custom pgbench script, it seems convenient to be able to split a
really long query to span multiple lines using an escape character
(bash-style). Attached adds that capability to read_line_from_file()
in pgbench.c

For example,

BEGIN;
\setrandom id 1 16500000
UPDATE table \
SET col2 = (clock_timestamp() + '10s'::interval * random() * 1000), \
    col3 = (clock_timestamp() + '10s'::interval * sin(random() * (2*pi()) ) * 1000) \
WHERE col1 = :id;
COMMIT;

instead of:

BEGIN;
\setrandom id 1 16500000
UPDATE table SET col2 = (clock_timestamp() + '10s'::interval *
random() * :id), col3 = (clock_timestamp() + '10s'::interval *
sin(random() * (2*pi()) ) * 100000) WHERE col1 = :id;
COMMIT;

Thoughts?

--
Amit
Attachments
On Mon, May 26, 2014 at 6:50 PM, Amit Langote <amitlangote09@gmail.com> wrote:
> Hi,
>
> In a custom pgbench script, it seems convenient to be able to split a
> really long query to span multiple lines using an escape character
> (bash-style). Attached adds that capability to read_line_from_file()
> in pgbench.c
>
> For example,
>
> BEGIN;
> \setrandom id 1 16500000
> UPDATE table \
> SET col2 = (clock_timestamp() + '10s'::interval * random() * 1000), \
>     col3 = (clock_timestamp() + '10s'::interval * sin(random() *
> (2*pi()) ) * 1000) \
> WHERE col1 = :id;
> COMMIT;
>
> instead of:
>
> BEGIN;
> \setrandom id 1 16500000
> UPDATE table SET col2 = (clock_timestamp() + '10s'::interval *
> random() * :id), col3 = (clock_timestamp() + '10s'::interval *
> sin(random() * (2*pi()) ) * 100000) WHERE col1 = :id;
> COMMIT;
>
> Thoughts?

IMO it's better if we can write SQL in multiple lines *without* a trailing
escape character, like psql's input file.

Regards,

--
Fujii Masao
On Mon, May 26, 2014 at 10:59 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Mon, May 26, 2014 at 6:50 PM, Amit Langote <amitlangote09@gmail.com> wrote:
>> Thoughts?
>
> IMO it's better if we can write SQL in multiple lines *without* a trailing
> escape character, like psql's input file.

Yeah, that would be much cleaner.

--
Amit
Amit Langote wrote:
> On Mon, May 26, 2014 at 10:59 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Mon, May 26, 2014 at 6:50 PM, Amit Langote <amitlangote09@gmail.com> wrote:
>>> Thoughts?
>>
>> IMO it's better if we can write SQL in multiple lines *without* a trailing
>> escape character, like psql's input file.
>
> Yeah, that would be much cleaner.

But that would require duplicating the lexing stuff to determine where
quotes are and where commands end.  There are already some cases where
pgbench itself is the bottleneck; adding a lexing step would be more
expensive, no?  Whereas simply detecting line continuations would be
cheaper.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Amit Langote <amitlangote09@gmail.com> writes:
> In a custom pgbench script, it seems convenient to be able to split a
> really long query to span multiple lines using an escape character
> (bash-style). Attached adds that capability to read_line_from_file()
> in pgbench.c

This seems pretty likely to break existing scripts that happen to contain
backslashes.  Is it really worth the compatibility risk?

The patch as written has got serious problems even discounting any
compatibility risk: it will be fooled by a backslash near the end of a
bufferload that doesn't end with a newline, and it doesn't allow for
DOS-style newlines (\r\n), and it indexes off the array if the buffer
contains *only* a newline (and, assuming that it fails to crash in that
case, it'd also fail to note a backslash that had been in the previous
bufferload).

			regards, tom lane
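[For illustration only: this is not the attached patch, but a rough sketch of
continuation handling that tries to address the failure modes listed above.
The function name read_logical_line and its shape are hypothetical; it
accumulates bufferloads into a growable string so a backslash is only honored
at a true end-of-line, strips \r\n endings, and tolerates a buffer holding
only a newline.]

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: read one logical line from fp, joining lines whose
 * newline-stripped text ends in a backslash.  Handles DOS-style \r\n and
 * lines longer than the fixed buffer by accumulating into a malloc'd
 * string.  Returns the string (caller frees), or NULL at EOF.
 */
static char *
read_logical_line(FILE *fp)
{
	char	buf[1024];
	char   *line = NULL;
	size_t	len = 0;

	while (fgets(buf, sizeof(buf), fp) != NULL)
	{
		size_t	n = strlen(buf);
		bool	saw_newline = (n > 0 && buf[n - 1] == '\n');

		/* strip the \n, and a preceding \r if present */
		if (saw_newline)
		{
			buf[--n] = '\0';
			if (n > 0 && buf[n - 1] == '\r')
				buf[--n] = '\0';
		}

		/* append this bufferload (n may be 0 for a bare newline) */
		line = realloc(line, len + n + 1);
		memcpy(line + len, buf, n + 1);
		len += n;

		if (!saw_newline)
			continue;		/* partial bufferload; keep reading this line */

		/* complete line: is there a trailing backslash continuation? */
		if (len > 0 && line[len - 1] == '\\')
		{
			line[--len] = '\0';	/* drop the backslash, join next line */
			continue;
		}
		return line;
	}

	/* EOF: return any text accumulated from a final unterminated line */
	if (line && len > 0)
		return line;
	free(line);
	return NULL;
}
```

A backslash just before EOF simply vanishes into an unterminated last line,
which seems like the least surprising behavior for a sketch like this.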
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Amit Langote wrote:
>> On Mon, May 26, 2014 at 10:59 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> IMO it's better if we can write SQL in multiples line *without* a tailing
>>> escape character, like psql's input file.

>> Yeah, that would be much cleaner.

> But that would require duplicating the lexing stuff to determine where
> quotes are and where commands end.  There are already some cases where
> pgbench itself is the bottleneck; adding a lexing step would be more
> expensive, no?  Whereas simply detecting line continuations would be
> cheaper.

Well, we only parse the script file(s) once at run start, and that time
isn't included in the TPS timing, so I don't think performance is really
an issue here.  But yeah, the amount of code that would have to be
duplicated out of psql is pretty daunting --- it'd be a maintenance
nightmare, for what seems like not a lot of gain.  There would also
be a compatibility issue if we went this way, because existing scripts
that haven't bothered with semicolon line terminators would break.

			regards, tom lane
On Tue, May 27, 2014 at 12:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amit Langote <amitlangote09@gmail.com> writes:
>> In a custom pgbench script, it seems convenient to be able to split a
>> really long query to span multiple lines using an escape character
>> (bash-style). Attached adds that capability to read_line_from_file()
>> in pgbench.c
>
> This seems pretty likely to break existing scripts that happen to contain
> backslashes.  Is it really worth the compatibility risk?
>
> The patch as written has got serious problems even discounting any
> compatibility risk: it will be fooled by a backslash near the end of a
> bufferload that doesn't end with a newline, and it doesn't allow for
> DOS-style newlines (\r\n), and it indexes off the array if the buffer
> contains *only* a newline (and, assuming that it fails to crash in that
> case, it'd also fail to note a backslash that had been in the previous
> bufferload).

Sorry, the patch was in a really bad shape.  Should have pondered these
points before submitting it.

Even if I drop the backslash line-continuation idea and decide to use
semicolons as SQL command separators, given the compatibility issues
mentioned downthread, it would not be worthwhile.

--
Amit
On Monday, May 26, 2014, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Amit Langote <amitlangote09@gmail.com> writes:
> In a custom pgbench script, it seems convenient to be able to split a
> really long query to span multiple lines using an escape character
> (bash-style). Attached adds that capability to read_line_from_file()
> in pgbench.c
This seems pretty likely to break existing scripts that happen to contain
backslashes. Is it really worth the compatibility risk?
Do you mean due to the bugs you point out, or in general? Is it really at all likely that someone has ended a line of their custom benchmark file with a backslash? I'm having a hard time seeing what, other than malice, would prod someone to do that.
Cheers,
Jeff
On 2014-05-26 11:44:31 -0400, Tom Lane wrote:
> Amit Langote <amitlangote09@gmail.com> writes:
>> In a custom pgbench script, it seems convenient to be able to split a
>> really long query to span multiple lines using an escape character
>> (bash-style). Attached adds that capability to read_line_from_file()
>> in pgbench.c
>
> This seems pretty likely to break existing scripts that happen to contain
> backslashes.  Is it really worth the compatibility risk?

Weaknesses in the implementation aside, I don't think pgbench has to be
held to the same level of compatibility as many of our other tools.
It's just a benchmark tool after all.

I've more than once wished for the capability.

Greetings,

Andres Freund

--
Andres Freund                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Jeff Janes <jeff.janes@gmail.com> writes:
> On Monday, May 26, 2014, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> This seems pretty likely to break existing scripts that happen to contain
>> backslashes.  Is it really worth the compatibility risk?

> Do you mean due to the bugs you point out, or in general?  Is it really at
> all likely that someone has ended a line of their custom benchmark file
> with a backslash?  I'm having a hard time seeing what, other than malice,
> would prod someone to do that.

No, I was worried that the feature would pose such a risk even when
correctly implemented.  But on reflection, you're right, it seems a bit
hard to credit that any existing script file would have a backslash just
before EOL.  There's certainly no SQL syntax in which that could be
valid; perhaps someone would do it in a "--" comment, but that seems a
tad far-fetched.

			regards, tom lane
Andres Freund wrote:

> I've more than once wished for the capability.

+1

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 05/26/2014 08:52 AM, Tom Lane wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> Amit Langote wrote:
>>> On Mon, May 26, 2014 at 10:59 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>>> IMO it's better if we can write SQL in multiple lines *without* a trailing
>>>> escape character, like psql's input file.
>
>>> Yeah, that would be much cleaner.
>
>> But that would require duplicating the lexing stuff to determine where
>> quotes are and where commands end.  There are already some cases where
>> pgbench itself is the bottleneck; adding a lexing step would be more
>> expensive, no?  Whereas simply detecting line continuations would be
>> cheaper.
>
> Well, we only parse the script file(s) once at run start, and that time
> isn't included in the TPS timing, so I don't think performance is really
> an issue here.  But yeah, the amount of code that would have to be
> duplicated out of psql is pretty daunting --- it'd be a maintenance
> nightmare, for what seems like not a lot of gain.  There would also
> be a compatibility issue if we went this way, because existing scripts
> that haven't bothered with semicolon line terminators would break.

What if we make using semicolons or not a config option in the file? i.e.:

\multiline

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
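[For illustration: under such a hypothetical \multiline mode, command
accumulation could be as simple as the sketch below, which treats a line whose
last non-whitespace character is a semicolon as ending a command. It is
deliberately naive: it knows nothing about quotes, dollar-quoting, or
comments, which is exactly the lexing work the thread worries about
duplicating from psql.]

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical predicate for a "\multiline" mode: does this line, ignoring
 * trailing whitespace, end a SQL command?  Naive on purpose: a semicolon
 * inside a string literal or comment would fool it.
 */
static bool
line_ends_command(const char *line)
{
	size_t	n = strlen(line);

	/* back over trailing whitespace, including any \r\n */
	while (n > 0 && (line[n - 1] == ' ' || line[n - 1] == '\t' ||
					 line[n - 1] == '\r' || line[n - 1] == '\n'))
		n--;
	return n > 0 && line[n - 1] == ';';
}
```

A caller would keep appending lines to the current command buffer until this
returns true, then submit the accumulated command.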
Re: Tom Lane 2014-05-26 <26629.1401119523@sss.pgh.pa.us>
>>> Yeah, that would be much cleaner.
>
>> But that would require duplicating the lexing stuff to determine where
>> quotes are and where commands end.  There are already some cases where
>> pgbench itself is the bottleneck; adding a lexing step would be more
>> expensive, no?  Whereas simply detecting line continuations would be
>> cheaper.
>
> Well, we only parse the script file(s) once at run start, and that time
> isn't included in the TPS timing, so I don't think performance is really
> an issue here.  But yeah, the amount of code that would have to be
> duplicated out of psql is pretty daunting --- it'd be a maintenance
> nightmare, for what seems like not a lot of gain.  There would also
> be a compatibility issue if we went this way, because existing scripts
> that haven't bothered with semicolon line terminators would break.

Fwiw, I would love to have some \ line continuation thing for .psqlrc as
well.  I have a few dozen \set commands in there containing queries for
looking into stats/locks/whatever that I can invoke just by typing e.g.
:user_tables, and these are pretty hard to edit as they are squeezed
onto one line.

I agree that putting an SQL parser into the backslash command parser is
overkill, but there's hardly a chance that backslashes at the end of a
backslash command line would break anything, beyond meeting what most
people would expect.

Christoph

--
cb@df7cb.de | http://www.df7cb.de/
On Tue, May 27, 2014 at 6:19 AM, Josh Berkus <josh@agliodbs.com> wrote:
> On 05/26/2014 08:52 AM, Tom Lane wrote:
>> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>>> Amit Langote wrote:
>>>> On Mon, May 26, 2014 at 10:59 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>>>> IMO it's better if we can write SQL in multiple lines *without* a trailing
>>>>> escape character, like psql's input file.
>>
>>>> Yeah, that would be much cleaner.
>>
>>> But that would require duplicating the lexing stuff to determine where
>>> quotes are and where commands end.  There are already some cases where
>>> pgbench itself is the bottleneck; adding a lexing step would be more
>>> expensive, no?  Whereas simply detecting line continuations would be
>>> cheaper.
>>
>> Well, we only parse the script file(s) once at run start, and that time
>> isn't included in the TPS timing, so I don't think performance is really
>> an issue here.  But yeah, the amount of code that would have to be
>> duplicated out of psql is pretty daunting --- it'd be a maintenance
>> nightmare, for what seems like not a lot of gain.  There would also
>> be a compatibility issue if we went this way, because existing scripts
>> that haven't bothered with semicolon line terminators would break.
>
> What if we make using semicolons or not a config option in the file? i.e.:
>
> \multiline

And perhaps make 'off' the default, if I understand it correctly?  It
would apply only to the SQL commands though, no?

--
Amit