Обсуждение: Patch to fix search_path defencies with pg_bench
Hello, I have been doing some testing with pgbench and I realized that it forces the use of public as its search_path. This is bad if: * You want to run multiple pgbench instances within the same database * You don't want to use public (for whatever reason) This patch removes that ability and thus will defer to the default search_path for the connecting user. diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c index ad20cac..1f25921 100644 --- a/contrib/pgbench/pgbench.c +++ b/contrib/pgbench/pgbench.c @@ -357,8 +357,6 @@ doConnect(void) return NULL; } - executeStatement(conn, "SET search_path = public"); - return conn;} -- PostgreSQL - XMPP: jdrake@jabber.postgresql.org Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997
"Joshua D. Drake" <jd@commandprompt.com> writes: > I have been doing some testing with pgbench and I realized that it > forces the use of public as its search_path. This is bad if: > * You want to run multiple pgbench instances within the same database > * You don't want to use public (for whatever reason) > This patch removes that ability and thus will defer to the default > search_path for the connecting user. Hmm. The search_path setting seems to have been added here http://archives.postgresql.org/pgsql-committers/2002-10/msg00118.php http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/contrib/pgbench/pgbench.c.diff?r1=1.20;r2=1.21 as part of a mass patch to make everything in contrib work in the public schema. I agree that it probably wasn't considered carefully whether pg_bench should do that; but does anyone see a reason not to change it? regards, tom lane
On Tue, 5 May 2009, Tom Lane wrote: > I agree that it probably wasn't considered carefully whether pg_bench > should do that; but does anyone see a reason not to change it? I thought of one pretty weak use-case for not making this change, but would wager the additional flexibility here is far more likely to be appreciated. I'd say it's a clear net improvement. As for that case...many good database designs put all the user relations into a schema, so that it's easier to do bulk operations on all of them while avoiding catalog tables etc.--less work to filter out pg_class to find them for example. I once did some pgbench testing on a system that included a real "accounts" table in a named schema. "pgbench -i" will execute "drop table if exists accounts". It had already accidentally wiped out the copy of the accounts table on the system during an earlier test, before the schema policy was in place, leaving everyone wary of it. I was able to defend the risk for running pgbench with the new schema layout by saying "that can only execute against public.accounts no matter what the user search_path is, so you're safe now". That made everybody happy. Anyone counting on such behavior could be rudely surprised at this change. For all I know I'm the only person to ever actually run into that particular situation though. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Greg Smith <gsmith@gregsmith.com> writes:
> I once did some pgbench testing on a system that included a real 
> "accounts" table in a named schema.  "pgbench -i" will execute "drop table 
> if exists accounts".  It had already accidentally wiped out the copy of 
> the accounts table on the system during an earlier test, before the schema 
> policy was in place, leaving everyone wary of it.
Seems like the right policy for that is "run pgbench in its own
database".  I doubt that either adding or removing the "set search_path"
command changes the risk of trouble very much.
        regards, tom lane
			
		Em Qua, 2009-05-06 às 09:37 -0400, Tom Lane escreveu: > Greg Smith <gsmith@gregsmith.com> writes: > > I once did some pgbench testing on a system that included a real > > "accounts" table in a named schema. "pgbench -i" will execute "drop table > > if exists accounts". It had already accidentally wiped out the copy of > > the accounts table on the system during an earlier test, before the schema > > policy was in place, leaving everyone wary of it. > > Seems like the right policy for that is "run pgbench in its own > database". A text warning about this could be shown at start of pgbench if the target database isn't named "pgbench", for examplo, or just some text could be added to the docs. regards. -- Dickson S. Guedes mail/xmpp: guedes@guedesoft.net - skype: guediz http://guedesoft.net - http://planeta.postgresql.org.br
Dickson S. Guedes wrote: > Em Qua, 2009-05-06 às 09:37 -0400, Tom Lane escreveu: > > Seems like the right policy for that is "run pgbench in its own > > database". > > A text warning about this could be shown at start of pgbench if the > target database isn't named "pgbench", for examplo, or just some text > could be added to the docs. I think it would be better that the schema is specified on the command line. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
"Dickson S. Guedes" <listas@guedesoft.net> writes:
> Em Qua, 2009-05-06 �s 09:37 -0400, Tom Lane escreveu:
>> Seems like the right policy for that is "run pgbench in its own
>> database". 
> A text warning about this could be shown at start of pgbench if the
> target database isn't named "pgbench", for examplo, or just some text
> could be added to the docs.
There already is a prominent warning in the pgbench docs:
    Caution
pgbench -i creates four tables accounts, branches, history, andtellers, destroying any existing tables of these names.
Beverycareful to use another database if you have tables having thesenames!
 
        regards, tom lane
			
		On Wed, 2009-05-06 at 15:13 -0400, Alvaro Herrera wrote: > Dickson S. Guedes wrote: > > Em Qua, 2009-05-06 às 09:37 -0400, Tom Lane escreveu: > > > > Seems like the right policy for that is "run pgbench in its own > > > database". > > > > A text warning about this could be shown at start of pgbench if the > > target database isn't named "pgbench", for examplo, or just some text > > could be added to the docs. > > I think it would be better that the schema is specified on the command > line. I could see that as an option but applications that use a role should adhere to the rules the DBA sets forth for that role. In this particular case I explicitly said that role bench01 was to connect to the database bench and that his search path was bench01 (thus all tables would be created under the schema bench01). Public should never come into play in that scenario. Sincerely, Joshua D. Drake -- PostgreSQL - XMPP: jdrake@jabber.postgresql.org Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997
Alvaro Herrera <alvherre@commandprompt.com> writes:
> I think it would be better that the schema is specified on the command
> line.
Surely that's more work than the issue is worth.  It's also inconvenient
to use, because you'd have to remember to give the switch both for the
-i run and the normal test runs.
        regards, tom lane
			
		On Wed, 2009-05-06 at 17:42 -0300, Dickson S. Guedes wrote: > Em Qua, 2009-05-06 às 16:27 -0400, Tom Lane escreveu: > > Alvaro Herrera <alvherre@commandprompt.com> writes: > > > I think it would be better that the schema is specified on the command > > > line. > > > > Surely that's more work than the issue is worth. It's also inconvenient > > to use, because you'd have to remember to give the switch both for the > > -i run and the normal test runs. > > So, in my opinion, the Joshua alternative is a good little change that > let "pgbench" runs in a more flexible way. > > But, there is the possibility that someone are using an automated script > that could be broken by this change? Only if the role pgbench is using as an explicit search_path set. Sincerely, Joshua D. Drake -- PostgreSQL - XMPP: jdrake@jabber.postgresql.org Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997
Em Qua, 2009-05-06 às 16:27 -0400, Tom Lane escreveu: > Alvaro Herrera <alvherre@commandprompt.com> writes: > > I think it would be better that the schema is specified on the command > > line. > > Surely that's more work than the issue is worth. It's also inconvenient > to use, because you'd have to remember to give the switch both for the > -i run and the normal test runs. So, in my opinion, the Joshua alternative is a good little change that let "pgbench" runs in a more flexible way. But, there is the possibility that someone are using an automated script that could be broken by this change? -- Dickson S. Guedes mail/xmpp: guedes@guedesoft.net - skype: guediz http://guedesoft.net - http://planeta.postgresql.org.br
Em Qua, 2009-05-06 às 13:49 -0700, Joshua D. Drake escreveu: > On Wed, 2009-05-06 at 17:42 -0300, Dickson S. Guedes wrote: > > Em Qua, 2009-05-06 às 16:27 -0400, Tom Lane escreveu: > > > Alvaro Herrera <alvherre@commandprompt.com> writes: > > > > I think it would be better that the schema is specified on the command > > > > line. > > > > > > Surely that's more work than the issue is worth. It's also inconvenient > > > to use, because you'd have to remember to give the switch both for the > > > -i run and the normal test runs. > > > > So, in my opinion, the Joshua alternative is a good little change that > > let "pgbench" runs in a more flexible way. > > > > But, there is the possibility that someone are using an automated script > > that could be broken by this change? > > Only if the role pgbench is using as an explicit search_path set. So, in a way to avoid the scenario where a ROLE has an explicit search_path set to schemes that already have tables named same as the pgbench's tables, doesn't makes sense also create a "pgbench_" suffix for them? -- Dickson S. Guedes mail/xmpp: guedes@guedesoft.net - skype: guediz http://guedesoft.net - http://planeta.postgresql.org.br
"Joshua D. Drake" <jd@commandprompt.com> writes:
> On Wed, 2009-05-06 at 17:42 -0300, Dickson S. Guedes wrote:
>> But, there is the possibility that someone are using an automated script
>> that could be broken by this change? 
> Only if the role pgbench is using as an explicit search_path set.
Even then, it's not a problem from the point of view of pgbench ---
the tables will still get created and used correctly.  The only problem
shows up if someone is ignoring the existing warning in the docs and
running pgbench in a database that has application tables named accounts
&etc.  If you're doing that you're at considerable risk anyway, no
matter *what* we do or don't do with pgbench's search path.
        regards, tom lane
			
		"Dickson S. Guedes" <listas@guedesoft.net> writes:
> So, in a way to avoid the scenario where a ROLE has an explicit
> search_path set to schemes that already have tables named same as the
> pgbench's tables, doesn't makes sense also create a "pgbench_" suffix
> for them?
Hm, just rename the standard scenario's tables to pgbench_accounts
etc?  Sure, but then we break custom pgbench scripts that happen
to be using the default tables for their own purposes.  There's
no free lunch.
        regards, tom lane
			
		On Wed, 2009-05-06 at 15:18 -0400, Tom Lane wrote: > "Dickson S. Guedes" <listas@guedesoft.net> writes: > > Em Qua, 2009-05-06 s 09:37 -0400, Tom Lane escreveu: > >> Seems like the right policy for that is "run pgbench in its own > >> database". > > > A text warning about this could be shown at start of pgbench if the > > target database isn't named "pgbench", for examplo, or just some text > > could be added to the docs. > > There already is a prominent warning in the pgbench docs: > > Caution > > pgbench -i creates four tables accounts, branches, history, and > tellers, destroying any existing tables of these names. Be very > careful to use another database if you have tables having these > names! Holy Handgrenade, what a huge footgun! It doesn't even have a conceivable upside. The table names "accounts" and "history" are fairly common and a caution isn't a sufficient safeguard on production data. We know the manual rarely gets read *after* a problem, let alone beforehand. We should check they are the correct tables before we just drop them. Perhaps check for the comment "Tables for pgbench application. Not production data" on the tables, which would be nice to add anyway. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
On Thu, May 7, 2009 at 10:12 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > > On Wed, 2009-05-06 at 15:18 -0400, Tom Lane wrote: >> "Dickson S. Guedes" <listas@guedesoft.net> writes: >> > Em Qua, 2009-05-06 s 09:37 -0400, Tom Lane escreveu: >> >> Seems like the right policy for that is "run pgbench in its own >> >> database". >> >> > A text warning about this could be shown at start of pgbench if the >> > target database isn't named "pgbench", for examplo, or just some text >> > could be added to the docs. >> >> There already is a prominent warning in the pgbench docs: >> >> Caution >> >> pgbench -i creates four tables accounts, branches, history, and >> tellers, destroying any existing tables of these names. Be very >> careful to use another database if you have tables having these >> names! > > Holy Handgrenade, what a huge footgun! It doesn't even have a > conceivable upside. > > The table names "accounts" and "history" are fairly common and a caution > isn't a sufficient safeguard on production data. We know the manual > rarely gets read *after* a problem, let alone beforehand. > > We should check they are the correct tables before we just drop them. > Perhaps check for the comment "Tables for pgbench application. Not > production data" on the tables, which would be nice to add anyway. I bet it would be just as good and a lot simpler to do what someone suggested upthread, namely s/^/pgbench_/ ...Robert
On Thu, 2009-05-07 at 11:14 -0400, Robert Haas wrote: > > We should check they are the correct tables before we just drop them. > > Perhaps check for the comment "Tables for pgbench application. Not > > production data" on the tables, which would be nice to add anyway. > > I bet it would be just as good and a lot simpler to do what someone > suggested upthread, namely s/^/pgbench_/ Running pgbench has become more popular now, with various people recommending running it on every system to test performance. I don't disagree with that recommendation, but I've had problems myself with a similar issue - hence earlier patch to prevent pgbench running a complete database VACUUM before every test run. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
* Robert Haas <robertmhaas@gmail.com> [090507 11:15]: > I bet it would be just as good and a lot simpler to do what someone > suggested upthread, namely s/^/pgbench_/ That has the "legacy compatibility" problem... But seeing as "legacy" has a:SET search_path TO public; And uses plain <table> in it's queries/creates/drops, couldn't we just make "new" pgbench refer to tables as <schema>.<table> where <schema> is "public"? If we leave "schema" as public, and leave in the search_path, we should be identical to what we currently have, except we've explicliyt scoped was was searched for before. And it leads to an easy way for people to change public (in the search path and/or <schema>.<table>) to do other things (although I'm not saying that's necessarily required or desired either). a. -- Aidan Van Dyk Create like a god, aidan@highrise.ca command like a king, http://www.highrise.ca/ work like a slave.
Simon Riggs <simon@2ndQuadrant.com> writes:
> On Thu, 2009-05-07 at 11:14 -0400, Robert Haas wrote:
>>> We should check they are the correct tables before we just drop them.
>>> Perhaps check for the comment "Tables for pgbench application. Not
>>> production data" on the tables, which would be nice to add anyway.
>> 
>> I bet it would be just as good and a lot simpler to do what someone
>> suggested upthread, namely s/^/pgbench_/
> Running pgbench has become more popular now, with various people
> recommending running it on every system to test performance. I don't
> disagree with that recommendation, but I've had problems myself with a
> similar issue - hence earlier patch to prevent pgbench running a
> complete database VACUUM before every test run.
Well, pgbench has been coded this way since forever and we've only seen
this one report of trouble.  Still, I can't object very hard to renaming
the tables to pgbench_accounts etc --- it's a trivial change and the
only thing it could break is custom pgbench scenarios that rely on the
default scenario's tables, which there are probably not many of.
So do we have consensus on dropping the "SET search_path" and renaming
the tables?
        regards, tom lane
			
		Aidan Van Dyk <aidan@highrise.ca> writes:
> ... couldn't we just
> make "new" pgbench refer to tables as <schema>.<table> where <schema> is
> "public"?
I'd prefer not to do that because it changes the amount of parsing work
demanded by the benchmark.  Maybe not by enough to matter ... or maybe
it does.  Adjusting the length of the identifiers is a small enough
change that I'm prepared to believe it doesn't invalidate comparisons,
but changing the set of catalog lookups that occur is another question.
        regards, tom lane
			
		On Thu, 2009-05-07 at 12:47 -0400, Tom Lane wrote: > Well, pgbench has been coded this way since forever and we've only seen > this one report of trouble. Still, I can't object very hard to renaming > the tables to pgbench_accounts etc --- it's a trivial change and the > only thing it could break is custom pgbench scenarios that rely on the > default scenario's tables, which there are probably not many of. > > So do we have consensus on dropping the "SET search_path" and renaming > the tables? +1 (I hate prefixed table names but I get the idea) Joshua D. Drake > > regards, tom lane > -- PostgreSQL - XMPP: jdrake@jabber.postgresql.org Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997
* Tom Lane <tgl@sss.pgh.pa.us> [090507 12:53]: > Aidan Van Dyk <aidan@highrise.ca> writes: > > ... couldn't we just > > make "new" pgbench refer to tables as <schema>.<table> where <schema> is > > "public"? > > I'd prefer not to do that because it changes the amount of parsing work > demanded by the benchmark. Maybe not by enough to matter ... or maybe > it does. Adjusting the length of the identifiers is a small enough > change that I'm prepared to believe it doesn't invalidate comparisons, > but changing the set of catalog lookups that occur is another question. True enough... What about making the prefix be configurable, so by default, it could be "pgbench_", it could be set to "" (to force it to use old pgbench names) or set to "something." to get it to use a different schema (noting that the comparisons to older ones not doing catalog lookups are void). But by dropping the search_path, you're necessarily changing the catalog comparisons and lookups anyways, because your now taking a "random" search path to follow (which could have multiple entries in it) instead of one guaranteed to be a single, useable entry. -- Aidan Van Dyk Create like a god, aidan@highrise.ca command like a king, http://www.highrise.ca/ work like a slave.
On Thu, 2009-05-07 at 12:58 -0400, Aidan Van Dyk wrote: > True enough... What about making the prefix be configurable, so by > default, it could be "pgbench_", it could be set to "" (to force it to > use old pgbench names) or set to "something." to get it to use a > different schema (noting that the comparisons to older ones not doing > catalog lookups are void). Then you have to pass the prefix on the command line. That seems a bit over doing it for such a simple utility. > > But by dropping the search_path, you're necessarily changing the catalog > comparisons and lookups anyways, because your now taking a "random" > search path to follow (which could have multiple entries in it) instead > of one guaranteed to be a single, useable entry. Except that it isn't a guaranteed usable entry, which is why I submitted the patch. Sincerely, Joshua D. Drake -- PostgreSQL - XMPP: jdrake@jabber.postgresql.org Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997
* Joshua D. Drake <jd@commandprompt.com> [090507 13:02]: > On Thu, 2009-05-07 at 12:58 -0400, Aidan Van Dyk wrote: > > > True enough... What about making the prefix be configurable, so by > > default, it could be "pgbench_", it could be set to "" (to force it to > > use old pgbench names) or set to "something." to get it to use a > > different schema (noting that the comparisons to older ones not doing > > catalog lookups are void). > > Then you have to pass the prefix on the command line. That seems a bit > over doing it for such a simple utility. Sure, but by putting a sane default (which seems to be leaning towards "" or "pgbench_"), you don't *need* to do anything on the command line. > > But by dropping the search_path, you're necessarily changing the catalog > > comparisons and lookups anyways, because your now taking a "random" > > search path to follow (which could have multiple entries in it) instead > > of one guaranteed to be a single, useable entry. > > Except that it isn't a guaranteed usable entry, which is why I submitted > the patch. Well ya, but at least you didn't have any pgbench result to try and "compare unevenly" with something else ;-) a. -- Aidan Van Dyk Create like a god, aidan@highrise.ca command like a king, http://www.highrise.ca/ work like a slave.
On Thu, 2009-05-07 at 09:53 -0700, Joshua D. Drake wrote: > On Thu, 2009-05-07 at 12:47 -0400, Tom Lane wrote: > > > Well, pgbench has been coded this way since forever and we've only seen > > this one report of trouble. Still, I can't object very hard to renaming > > the tables to pgbench_accounts etc --- it's a trivial change and the > > only thing it could break is custom pgbench scenarios that rely on the > > default scenario's tables, which there are probably not many of. > > > > So do we have consensus on dropping the "SET search_path" and renaming > > the tables? > > +1 (I hate prefixed table names but I get the idea) +1, sorry JD. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
"Joshua D. Drake" <jd@commandprompt.com> writes:
> On Thu, 2009-05-07 at 12:58 -0400, Aidan Van Dyk wrote:
>> True enough... What about making the prefix be configurable, so by
>> default, it could be "pgbench_", it could be set to "" (to force it to
>> use old pgbench names) or set to "something." to get it to use a
>> different schema (noting that the comparisons to older ones not doing
>> catalog lookups are void).
> Then you have to pass the prefix on the command line. That seems a bit
> over doing it for such a simple utility.
Yes, this seems like vastly more work than is called for.
>> But by dropping the search_path, you're necessarily changing the catalog
>> comparisons and lookups anyways, because your now taking a "random"
>> search path to follow (which could have multiple entries in it) instead
>> of one guaranteed to be a single, useable entry.
> Except that it isn't a guaranteed usable entry, which is why I submitted
> the patch.
I think this argument is bogus anyway.  The tables are always going to be
created in the default creation schema, ie, the first one on the search
path.  As long as you don't change the effective search_path between
pgbench -i and the actual test runs, it won't matter whether that is
public or something else.
        regards, tom lane
			
		"Joshua D. Drake" <jd@commandprompt.com> writes:
> --- a/contrib/pgbench/pgbench.c
> +++ b/contrib/pgbench/pgbench.c
> @@ -357,8 +357,6 @@ doConnect(void)
>                 return NULL;
>         }
>  
> -       executeStatement(conn, "SET search_path = public");
> -
>         return conn;
>  }
Applied along with changes of table names accounts -> pgbench_accounts
etc, per discussion.
        regards, tom lane
			
		On Thu, 7 May 2009, Aidan Van Dyk wrote: > But by dropping the search_path, you're necessarily changing the catalog > comparisons and lookups anyways, because your now taking a "random" > search path to follow (which could have multiple entries in it) instead > of one guaranteed to be a single, useable entry. You are correct here. Right now, pgbench is guaranteed to be running against a search_path with only one entry in it. If someone runs the new version against a configuration with something like: search_path='a,b,c,d,e,f,g,h,i,j,public' instead, that is going to execute more slowly than the current pgbench would have. But it seems pretty unlikely such a change would actually be noticable relative to how much per-transaction overhead and run to run variation there already is in pgbench for reasonably sized catalogs. Maybe it's worth adding a quick comment about the issue in the docs, I don't think this downside is significant enough to worry about beyond that. I think Joshua's original suggestion here is worth considering a bug fix for merging into 8.4. As long as testers don't do anything crazy with their manually set search_path, results should be identical with the earlier verions. The additional suggestion of renaming the tables with a prefix is reasonable to me, but it seems way out of scope for something to consider applying right now though. If you look at the pgbench client, there's a lot of string parsing going on in there that's not particularly efficient. I'd want to see a benchmark aimed that quantifying whether that part suffers measurably from making the table names all longer before such a change got applied. And there's already a couple of us who are in the middle of 8.4 pgbench tests that really don't need disruption like that thrown into the mix right now. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Greg Smith <gsmith@gregsmith.com> writes:
> On Thu, 7 May 2009, Aidan Van Dyk wrote:
> You are correct here.  Right now, pgbench is guaranteed to be running 
> against a search_path with only one entry in it.  If someone runs the new 
> version against a configuration with something like:
> search_path='a,b,c,d,e,f,g,h,i,j,public'
> instead, that is going to execute more slowly than the current pgbench 
> would have.
No, it is not.  The tables will be created and used in schema 'a',
and the effective search path depth will be the same.
        regards, tom lane
			
		On Thu, 7 May 2009, Tom Lane wrote: > The tables will be created and used in schema 'a', and the effective > search path depth will be the same. The case to be concerned about here is where the search_path changes between initialization and the pgbench run, which certainly isn't impossible. That can leave you with a longer effective path to search. Pretty unlikely to be a problem in the field though. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Greg Smith <gsmith@gregsmith.com> writes:
> On Thu, 7 May 2009, Tom Lane wrote:
>> The tables will be created and used in schema 'a', and the effective 
>> search path depth will be the same.
> The case to be concerned about here is where the search_path changes 
> between initialization and the pgbench run, which certainly isn't 
> impossible.  That can leave you with a longer effective path to search. 
> Pretty unlikely to be a problem in the field though.
Yeah.  Also, there is another consideration here that hasn't been
brought up AFAIR: the main point of running pgbench in-the-field
is to find out whether your installation is properly tuned.  If
you've chosen a search_path setting that *did* have some unexpected
performance issues, wouldn't you want pgbench to reveal that?
It's peculiar to have pgbench forcing this one particular GUC setting
and not any others.
        regards, tom lane