Discussion: random_page_cost = 2.0 on Heroku Postgres
Per the thread from last month, I've updated the default random_page_cost on Heroku Postgres to reduce the expected cost of a random_page on all new databases. Thanks to everyone who helped come to this conclusion! Peter -- Peter van Hardenberg San Francisco, California "Everything was beautiful, and nothing hurt." -- Kurt Vonnegut
On 2/7/12 4:59 PM, Peter van Hardenberg wrote: > Per the thread from last month, I've updated the default > random_page_cost on Heroku Postgres to reduce the expected cost of a > random_page on all new databases. This is because Heroku uses AWS storage, which has fast seeks but poor throughput compared to internal disk on a standard system, BTW. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Wed, Feb 8, 2012 at 5:50 PM, Josh Berkus <josh@agliodbs.com> wrote: > On 2/7/12 4:59 PM, Peter van Hardenberg wrote: >> Per the thread from last month, I've updated the default >> random_page_cost on Heroku Postgres to reduce the expected cost of a >> random_page on all new databases. > > This is because Heroku uses AWS storage, which has fast seeks but poor > throughput compared to internal disk on a standard system, BTW. Also, judging by the other thread, it might be something to drop closer to 1.2 or 1.4.
Having read the thread, I don't really see how I could study what a more principled value would be. That said, I have access to a very large fleet from which I can collect data, so I'm all ears for suggestions about how to measure, and would gladly share the results with the list. Peter On Wed, Feb 8, 2012 at 5:39 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > On Wed, Feb 8, 2012 at 5:50 PM, Josh Berkus <josh@agliodbs.com> wrote: >> On 2/7/12 4:59 PM, Peter van Hardenberg wrote: >>> Per the thread from last month, I've updated the default >>> random_page_cost on Heroku Postgres to reduce the expected cost of a >>> random_page on all new databases. >> >> This is because Heroku uses AWS storage, which has fast seeks but poor >> throughput compared to internal disk on a standard system, BTW. > > Also judging by the other thread, it might be something to stop closer > to 1.2 to 1.4 or something. > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance -- Peter van Hardenberg San Francisco, California "Everything was beautiful, and nothing hurt." -- Kurt Vonnegut
On Wed, Feb 8, 2012 at 6:45 PM, Peter van Hardenberg <pvh@pvh.ca> wrote: > Having read the thread, I don't really see how I could study what a > more principled value would be. Agreed. Just pointing out more research needs to be done. > That said, I have access to a very large fleet in which to can collect > data so I'm all ears for suggestions about how to measure and would > gladly share the results with the list. I wonder if some kind of script that grabbed random queries and ran them with explain analyze and various random_page_cost to see when they switched and which plans are faster would work?
On 08/02/12 21:15, Peter van Hardenberg wrote: > Having read the thread, I don't really see how I could study what a > more principled value would be. > > That said, I have access to a very large fleet in which to can collect > data so I'm all ears for suggestions about how to measure and would > gladly share the results with the list. > > Peter > > On Wed, Feb 8, 2012 at 5:39 PM, Scott Marlowe<scott.marlowe@gmail.com> wrote: >> On Wed, Feb 8, 2012 at 5:50 PM, Josh Berkus<josh@agliodbs.com> wrote: >>> On 2/7/12 4:59 PM, Peter van Hardenberg wrote: >>>> Per the thread from last month, I've updated the default >>>> random_page_cost on Heroku Postgres to reduce the expected cost of a >>>> random_page on all new databases. >>> This is because Heroku uses AWS storage, which has fast seeks but poor >>> throughput compared to internal disk on a standard system, BTW. >> Also judging by the other thread, it might be something to stop closer >> to 1.2 to 1.4 or something. You can execute several queries with the three different values provided by Scott and Josh. - SET random_page_cost = 2.0 First execution of the queries with EXPLAIN ANALYZE - SET random_page_cost = 1.4 Second execution of the queries with EXPLAIN ANALYZE - SET random_page_cost = 1.2 Third execution of the queries with EXPLAIN ANALYZE And then you can compare the patterns behind these query executions. Regards, -- Marcos Luis Ortíz Valmaseda Sr. Software Engineer (UCI) http://marcosluis2186.posterous.com http://www.linkedin.com/in/marcosluis2186 Twitter: @marcosluis2186 End the injustice, FREEDOM NOW FOR OUR FIVE COMPATRIOTS UNJUSTLY HELD IN U.S. PRISONS! http://www.antiterroristas.cu http://justiciaparaloscinco.wordpress.com
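Marcos's procedure can be sketched as a psql session. The table and query here are hypothetical placeholders, since the interesting queries would come from each application; random_page_cost can be set per session, so nothing needs to change server-wide while experimenting:

```sql
-- Illustrative session for the procedure above; "orders" and the query
-- itself are placeholder examples, not anything from this thread.
SET random_page_cost = 2.0;
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

SET random_page_cost = 1.4;
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

SET random_page_cost = 1.2;
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

RESET random_page_cost;
```

Comparing the three plans and their actual times shows whether, and where, the planner's choice flips between settings.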
On Wed, Feb 8, 2012 at 6:28 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > On Wed, Feb 8, 2012 at 6:45 PM, Peter van Hardenberg <pvh@pvh.ca> wrote: >> That said, I have access to a very large fleet in which to can collect >> data so I'm all ears for suggestions about how to measure and would >> gladly share the results with the list. > > I wonder if some kind of script that grabbed random queries and ran > them with explain analyze and various random_page_cost to see when > they switched and which plans are faster would work? We aren't exactly in a position where we can adjust random_page_cost on our users' databases arbitrarily to see what breaks. That would be... irresponsible of us. How would one design a meta-analyzer which we could run across many databases and collect data? Could we perhaps collect useful information from pg_stat_user_indexes, for example? -p -- Peter van Hardenberg San Francisco, California "Everything was beautiful, and nothing hurt." -- Kurt Vonnegut
On Wed, Feb 8, 2012 at 6:47 PM, Peter van Hardenberg <pvh@pvh.ca> wrote: > On Wed, Feb 8, 2012 at 6:28 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: >> On Wed, Feb 8, 2012 at 6:45 PM, Peter van Hardenberg <pvh@pvh.ca> wrote: >>> That said, I have access to a very large fleet in which to can collect >>> data so I'm all ears for suggestions about how to measure and would >>> gladly share the results with the list. >> >> I wonder if some kind of script that grabbed random queries and ran >> them with explain analyze and various random_page_cost to see when >> they switched and which plans are faster would work? > > We aren't exactly in a position where we can adjust random_page_cost > on our users' databases arbitrarily to see what breaks. That would > be... irresponsible of us. > Oh, of course we could do this on the session, but executing potentially expensive queries would still be unneighborly. Perhaps another way to think of this problem would be that we want to find queries where the cost estimate is inaccurate. -- Peter van Hardenberg San Francisco, California "Everything was beautiful, and nothing hurt." -- Kurt Vonnegut
On Wed, Feb 8, 2012 at 7:54 PM, Peter van Hardenberg <pvh@pvh.ca> wrote: > On Wed, Feb 8, 2012 at 6:47 PM, Peter van Hardenberg <pvh@pvh.ca> wrote: >> On Wed, Feb 8, 2012 at 6:28 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: >>> On Wed, Feb 8, 2012 at 6:45 PM, Peter van Hardenberg <pvh@pvh.ca> wrote: >>>> That said, I have access to a very large fleet in which to can collect >>>> data so I'm all ears for suggestions about how to measure and would >>>> gladly share the results with the list. >>> >>> I wonder if some kind of script that grabbed random queries and ran >>> them with explain analyze and various random_page_cost to see when >>> they switched and which plans are faster would work? >> >> We aren't exactly in a position where we can adjust random_page_cost >> on our users' databases arbitrarily to see what breaks. That would >> be... irresponsible of us. >> > > Oh, of course we could do this on the session, but executing > potentially expensive queries would still be unneighborly. > > Perhaps another way to think of this problem would be that we want to > find queries where the cost estimate is inaccurate. Yeah, have a script the user runs for you heroku guys in their spare time to see what queries are using the most time and then to jangle the random_page_cost while running them to get an idea what's faster and why.
> You can execute several queries with the three different values provided > by Scott and Josh. > - SET random_page_cost = 2.0 > First execution of the queries with EXPLAIN ANALYZE > - SET random_page_cost = 1.4 > Second execution of the queries with EXPLAIN ANALYZE > - SET random_page_cost = 1.2 > Second execution of the queries with EXPLAIN ANALYZE Well, such a tool would ideally be smarter than that, such that you would run EXPLAIN and compare to the previous plan and only run EXPLAIN ANALYZE if the plan changed. One could even decrement rpc slowly and find out at what point it changes, which would be more interesting than testing arbitrary numbers. Would lead to some really sweet graphs as well. :) -- Greg Sabino Mullane greg@turnstep.com End Point Corporation http://www.endpoint.com/
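Greg's idea of stepping random_page_cost down and only paying for EXPLAIN ANALYZE at the points where the plan actually changes can be sketched in a few lines of Python. The explain() callable is an assumption standing in for a real wrapper (e.g. around a database driver) that would run "SET random_page_cost = ...; EXPLAIN <query>" and return the plan text:

```python
# Sketch of the tool described above: walk random_page_cost downward, run a
# cheap EXPLAIN at each step, and record only the settings where the plan
# changed; those are the points worth a full EXPLAIN ANALYZE. The explain()
# argument is a hypothetical callable: rpc -> plan text.

def find_plan_changes(explain, start=4.0, stop=1.0, step=0.1):
    """Return [(rpc, plan)] at each rpc where the plan differs from the last one."""
    changes = []
    prev_plan = None
    rpc = start
    while rpc >= stop - 1e-9:
        rpc = round(rpc, 2)          # keep the setting a tidy two-decimal value
        plan = explain(rpc)
        if plan != prev_plan:        # plan flipped: this threshold is interesting
            changes.append((rpc, plan))
            prev_plan = plan
        rpc -= step
    return changes

if __name__ == "__main__":
    # Stand-in for a real database: pretend the planner switches from a
    # sequential scan to an index scan once random_page_cost drops below 2.0.
    fake = lambda rpc: "Index Scan" if rpc < 2.0 else "Seq Scan"
    print(find_plan_changes(fake))
```

Each (rpc, plan) pair in the result marks a threshold worth a closer look with EXPLAIN ANALYZE, and plotting those thresholds across many queries would give the "sweet graphs" mentioned above.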
On 09/02/12 00:09, Greg Sabino Mullane wrote: >> You can execute several queries with the three different values provided >> by Scott and Josh. >> - SET random_page_cost = 2.0 >> First execution of the queries with EXPLAIN ANALYZE >> - SET random_page_cost = 1.4 >> Second execution of the queries with EXPLAIN ANALYZE >> - SET random_page_cost = 1.2 >> Second execution of the queries with EXPLAIN ANALYZE > Well, such a tool would ideally be smarter than that, such that > you would run EXPLAIN and compare to the previous plan and > only run EXPLAIN ANALYZE if the plan changed. One could even > decrement rpc slowly and find out at one points it changes, > which would be more interesting than testing arbitrary numbers. > Would lead to some really sweet graphs as well. :) Well, the MyYearBook.com guys built something similar called Posuta. I don't know if the project is still alive, but we can ask them (michael.glaesemann@myyearbook.com). http://area51.myyearbook.com Posuta could be a starting point for this. It uses Ruby and Clojure for the core functionality and jQuery/Flot for graphs. -- Marcos Luis Ortíz Valmaseda Sr. Software Engineer (UCI)
On Wed, Feb 8, 2012 at 6:28 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > On Wed, Feb 8, 2012 at 6:45 PM, Peter van Hardenberg <pvh@pvh.ca> wrote: >> Having read the thread, I don't really see how I could study what a >> more principled value would be. > > Agreed. Just pointing out more research needs to be done. > >> That said, I have access to a very large fleet in which to can collect >> data so I'm all ears for suggestions about how to measure and would >> gladly share the results with the list. > > I wonder if some kind of script that grabbed random queries and ran > them with explain analyze and various random_page_cost to see when > they switched and which plans are faster would work? But if you grab a random query and execute it repeatedly, you drastically change the caching. Results from any execution after the first one are unlikely to give you results which are meaningful to the actual production situation. Cheers, Jeff
Hmm, perhaps we could usefully aggregate auto_explain output. On Thu, Feb 9, 2012 at 7:32 AM, Jeff Janes <jeff.janes@gmail.com> wrote: > On Wed, Feb 8, 2012 at 6:28 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: >> On Wed, Feb 8, 2012 at 6:45 PM, Peter van Hardenberg <pvh@pvh.ca> wrote: >>> Having read the thread, I don't really see how I could study what a >>> more principled value would be. >> >> Agreed. Just pointing out more research needs to be done. >> >>> That said, I have access to a very large fleet in which to can collect >>> data so I'm all ears for suggestions about how to measure and would >>> gladly share the results with the list. >> >> I wonder if some kind of script that grabbed random queries and ran >> them with explain analyze and various random_page_cost to see when >> they switched and which plans are faster would work? > > But if you grab a random query and execute it repeatedly, you > drastically change the caching. > > Results from any execution after the first one are unlikely to give > you results which are meaningful to the actual production situation. > > Cheers, > > Jeff -- Peter van Hardenberg San Francisco, California "Everything was beautiful, and nothing hurt." -- Kurt Vonnegut
On Thu, Feb 9, 2012 at 3:41 PM, Peter van Hardenberg <pvh@pvh.ca> wrote: > Hmm, perhaps we could usefully aggregate auto_explain output. How about something where you run a site at a random_page_cost of x, then y, then z, and do some aggregating of query times in each? A scatter plot should tell you lots.
On 2/9/12 2:41 PM, Peter van Hardenberg wrote: > Hmm, perhaps we could usefully aggregate auto_explain output. The other option is to take a statistical approach. After all, what you want to do is optimize average response times across all your users' databases, not optimize for a few specific queries. So one thought would be to add in pg_stat_statements to your platform ... something I'd like to see Heroku do anyway. Then you can sample this across dozens (or hundreds) of user databases, each with RPC set to a slightly different level, and aggregate it into a heat map. That's the way I'd do it, anyway. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
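A sketch of what each per-database sample might look like, assuming pg_stat_statements is installed (column names and time units vary between PostgreSQL versions, so treat this as illustrative):

```sql
-- Illustrative only: pull the most expensive normalized statements along
-- with the random_page_cost this database is running, so results can be
-- aggregated across the fleet. total_time units depend on the version.
SELECT current_setting('random_page_cost') AS rpc,
       query,
       calls,
       total_time / calls AS mean_time_per_call
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 20;
```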
On Friday, February 10, 2012 at 20:32:50, Josh Berkus wrote: > On 2/9/12 2:41 PM, Peter van Hardenberg wrote: > > Hmm, perhaps we could usefully aggregate auto_explain output. > > The other option is to take a statistical approach. After all, what you > want to do is optimize average response times across all your user's > databases, not optimize for a few specific queries. > > So one thought would be to add in pg_stat_statements to your platform > ... something I'd like to see Heroku do anyway. Then you can sample > this across dozens (or hundreds) of user databases, each with RPC set to > a slightly different level, and aggregate it into a heat map. > > That's the way I'd do it, anyway. In such a setup I sometimes build a ratio between transactions processed and CPU usage. Many indicators exist, inside and outside the DB, that are useful to combine and use as a "this is normal behavior" baseline. It then becomes easy in the long term to see whether things are getting better or worse. -- Cédric Villemain +33 (0)6 20 30 22 52 http://2ndQuadrant.fr/ PostgreSQL: 24x7 Support - Development, Expertise and Training
On Fri, Feb 10, 2012 at 11:32 AM, Josh Berkus <josh@agliodbs.com> wrote: > On 2/9/12 2:41 PM, Peter van Hardenberg wrote: > So one thought would be to add in pg_stat_statements to your platform > ... something I'd like to see Heroku do anyway. Then you can sample > this across dozens (or hundreds) of user databases, each with RPC set to > a slightly different level, and aggregate it into a heat map. > We've funded some work by Peter Geoghegan to make pg_stat_statements more useful, but the patch is currently sitting in the commitfest in need of a champion. I'd very much like to see it landed. Between that work, 9.2, and Dimitri's extension whitelist module, pg_stat_statements should be usefully installable by everyone. We'd seriously consider installing it by default, but that would make us Not Vanilla, which is something we avoid very diligently. -- Peter van Hardenberg San Francisco, California "Everything was beautiful, and nothing hurt." -- Kurt Vonnegut
Peter, > We've funded some work by Peter Geoghegan to make pg_stat_statements > more useful, but the patch is currently sitting in the commitfest in > need of a champion. I'd very much like to see it landed. Ok, let me review it then ... > Between that work, 9.2, and Dimitri's extension whitelist module, > pg_stat_statements should be usefully installable by everyone. We'd > seriously consider installing it by default, but that would make us > Not Vanilla, which is something we avoid very diligently. Bummer. You can get why this would be useful for autotuning, though, yes? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Fri, Feb 10, 2012 at 5:40 PM, Josh Berkus <josh@agliodbs.com> wrote: > Peter, > >> We've funded some work by Peter Geoghegan to make pg_stat_statements >> more useful, but the patch is currently sitting in the commitfest in >> need of a champion. I'd very much like to see it landed. > > Ok, let me review it then ... > >> Between that work, 9.2, and Dimitri's extension whitelist module, >> pg_stat_statements should be usefully installable by everyone. We'd >> seriously consider installing it by default, but that would make us >> Not Vanilla, which is something we avoid very diligently. > > Bummer. You can get why this would be useful for autotuning, though, yes? > Absolutely! Everything we can do to better monitor users' database performance without interfering (unduly) with that performance or affecting their dump/restore cycle is very exciting for us. -- Peter van Hardenberg San Francisco, California "Everything was beautiful, and nothing hurt." -- Kurt Vonnegut
On Thu, Feb 9, 2012 at 2:41 PM, Peter van Hardenberg <pvh@pvh.ca> wrote: > Hmm, perhaps we could usefully aggregate auto_explain output. By the time you realize the query is running long, it is too late to start analyzing it. And without analyzing it, you probably can't get the information you need. Maybe with the timing = off feature, it might make sense to just preemptively analyze everything. Cheers, Jeff
On Sat, Feb 11, 2012 at 8:26 AM, Jeff Janes <jeff.janes@gmail.com> wrote: > By the time you realize the query is running long, it is too late to > start analyzing it. And without analyzing it, you probably can't get > the information you need. > > Maybe with the timing = off feature,it would might make sense to just > preemptively analyze everything. > As before, I'm reluctant to introduce structural performance costs across the fleet, though I suppose giving users the option to opt out might ameliorate that. I don't think I have time right now to seriously explore these ideas, but I'll keep it in the back of my mind for a while. If anyone is interested in seriously exploring the idea of researching query planner accuracy across an enormous fleet of production databases with the goal of feeding that information back to the project please feel free to contact me off-list. -- Peter van Hardenberg San Francisco, California "Everything was beautiful, and nothing hurt." -- Kurt Vonnegut
On Thu, Feb 9, 2012 at 5:29 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > On Thu, Feb 9, 2012 at 3:41 PM, Peter van Hardenberg <pvh@pvh.ca> wrote: >> Hmm, perhaps we could usefully aggregate auto_explain output. > > How about something where you run a site at random_page cost of x, > then y, then z and you do some aggregating of query times in each. A > scatter plot should tell you lots. Is there an easy and unintrusive way to get such a metric as the aggregated query times? And to normalize it for how much work happened to be going on at the time? Without a good way to do normalization, you could just do lots of tests with randomized settings, to average out any patterns in workload, but that means you need an awful lot of tests to have enough data to rely on randomization. But it would be desirable to do this anyway, in case the normalization isn't as effective as we think. But how long should each setting be tested for? If a different setting causes a certain index to start being used, then performance would go down until those indexes get cached and then increase from there. But how long is long enough to allow this to happen? Thanks, Jeff
> Is there an easy and unintrusive way to get such a metric as the > aggregated query times? And to normalize it for how much work > happens > to have been doing on at the time? You'd pretty much need to do large-scale log harvesting combined with samples of query concurrency taken several times per minute. Even that won't "normalize" things the way you want, though, since all queries are not equal in terms of the amount of data they hit. Given that, I'd personally take a statistical approach. Sample query execution times across a large population of servers and over a moderate amount of time. Then apply common tests of statistical significance. This is why Heroku has the opportunity to do this in a way that smaller sites could not; they have enough servers to (probably) cancel out any random activity effects. -- Josh Berkus
On Sun, Feb 12, 2012 at 12:01 PM, Joshua Berkus <josh@agliodbs.com> wrote: > You'd pretty much need to do large-scale log harvesting combined with samples of query concurrency taken several times per minute. Even that won't "normalize" things the way you want, though, since all queries are not equal in terms of the amount of data they hit. > > Given that, I'd personally take a statistical approach. Sample query execution times across a large population of servers and over a moderate amount of time. Then apply common tests of statistical significance. This is why Heroku has the opportunity to do this in a way that smaller sites could not; they have enough servers to (probably) cancel out any random activity effects. > Yes, I think if we could normalize, anonymize, and randomly EXPLAIN ANALYZE 0.1% of all queries that run on our platform we could look for bad choices by the planner. I think the potential here could be quite remarkable. -- Peter van Hardenberg San Francisco, California "Everything was beautiful, and nothing hurt." -- Kurt Vonnegut
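As a toy illustration of the statistical approach Josh describes, comparing mean query times from two sub-fleets running different random_page_cost values could start with a plain significance test. All numbers below are invented, and the hand-rolled Welch's t-statistic (stdlib only) is just one possible choice of test:

```python
# Toy sketch of the fleet-comparison idea: two groups of databases, each
# running a different random_page_cost, each reporting a mean query time.
# A large |t| suggests the difference between the groups is not just noise.
import math
import statistics

def welch_t(a, b):
    """Welch's t-statistic for two independent samples."""
    va, vb = statistics.variance(a), statistics.variance(b)
    na, nb = len(a), len(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va / na + vb / nb)

rpc_4_0 = [12.1, 11.8, 13.0, 12.6, 12.2]  # invented mean query times (ms) per server
rpc_1_4 = [10.2, 10.9, 10.4, 10.1, 10.6]

t = welch_t(rpc_4_0, rpc_1_4)
print(round(t, 2))
```

With enough servers per group, workload randomness should wash out, which is exactly the advantage of Heroku's fleet size noted above.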
On 12 February 2012 22:28, Peter van Hardenberg <pvh@pvh.ca> wrote: > Yes, I think if we could normalize, anonymize, and randomly EXPLAIN > ANALYZE 0.1% of all queries that run on our platform we could look for > bad choices by the planner. I think the potential here could be quite > remarkable. Tom Lane suggested that plans, rather than the query tree, might be a more appropriate thing for the new pg_stat_statements to be hashing, as plans should be directly blamed for execution costs. While I don't think that that's appropriate for normalisation (consider that there'd often be duplicate pg_stat_statements entries per query), it does seem like an idea that could be worked into a future revision, to detect problematic plans. Maybe it could be usefully combined with auto_explain or something like that (in a revision of auto_explain that doesn't necessarily explain every plan, and therefore doesn't pay the considerable overhead of that instrumentation across the board). -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
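The "considerable overhead of that instrumentation" above comes from auto_explain running every query with full instrumentation. A hypothetical postgresql.conf fragment showing the knobs involved (auto_explain.log_timing was added in 9.2):

```ini
# Minimal auto_explain setup; every parameter here is standard auto_explain.
shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = '500ms'  # log plans only for queries slower than this
auto_explain.log_analyze = on            # include actual row counts (adds overhead)
auto_explain.log_timing = off            # 9.2+: skip per-node timings to cut the cost
```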