Discussion: random_page_cost = 2.0 on Heroku Postgres
Per the thread from last month, I've updated the default random_page_cost on Heroku Postgres to reduce the expected cost of a random_page on all new databases. Thanks to everyone who helped come to this conclusion! Peter -- Peter van Hardenberg San Francisco, California "Everything was beautiful, and nothing hurt." -- Kurt Vonnegut
On 2/7/12 4:59 PM, Peter van Hardenberg wrote: > Per the thread from last month, I've updated the default > random_page_cost on Heroku Postgres to reduce the expected cost of a > random_page on all new databases. This is because Heroku uses AWS storage, which has fast seeks but poor throughput compared to internal disk on a standard system, BTW. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Wed, Feb 8, 2012 at 5:50 PM, Josh Berkus <josh@agliodbs.com> wrote: > On 2/7/12 4:59 PM, Peter van Hardenberg wrote: >> Per the thread from last month, I've updated the default >> random_page_cost on Heroku Postgres to reduce the expected cost of a >> random_page on all new databases. > > This is because Heroku uses AWS storage, which has fast seeks but poor > throughput compared to internal disk on a standard system, BTW. Also, judging by the other thread, it might be something to drop closer to 1.2 or 1.4.
Having read the thread, I don't really see how I could study what a more principled value would be. That said, I have access to a very large fleet from which I can collect data, so I'm all ears for suggestions about how to measure, and would gladly share the results with the list. Peter On Wed, Feb 8, 2012 at 5:39 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > On Wed, Feb 8, 2012 at 5:50 PM, Josh Berkus <josh@agliodbs.com> wrote: >> On 2/7/12 4:59 PM, Peter van Hardenberg wrote: >>> Per the thread from last month, I've updated the default >>> random_page_cost on Heroku Postgres to reduce the expected cost of a >>> random_page on all new databases. >> >> This is because Heroku uses AWS storage, which has fast seeks but poor >> throughput compared to internal disk on a standard system, BTW. > > Also judging by the other thread, it might be something to stop closer > to 1.2 to 1.4 or something. > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance -- Peter van Hardenberg San Francisco, California "Everything was beautiful, and nothing hurt." -- Kurt Vonnegut
On Wed, Feb 8, 2012 at 6:45 PM, Peter van Hardenberg <pvh@pvh.ca> wrote: > Having read the thread, I don't really see how I could study what a > more principled value would be. Agreed. Just pointing out more research needs to be done. > That said, I have access to a very large fleet in which to can collect > data so I'm all ears for suggestions about how to measure and would > gladly share the results with the list. I wonder if some kind of script that grabbed random queries and ran them with explain analyze and various random_page_cost to see when they switched and which plans are faster would work?
On 08/02/12 21:15, Peter van Hardenberg wrote: > Having read the thread, I don't really see how I could study what a > more principled value would be. > > That said, I have access to a very large fleet in which to can collect > data so I'm all ears for suggestions about how to measure and would > gladly share the results with the list. > > Peter > > On Wed, Feb 8, 2012 at 5:39 PM, Scott Marlowe<scott.marlowe@gmail.com> wrote: >> On Wed, Feb 8, 2012 at 5:50 PM, Josh Berkus<josh@agliodbs.com> wrote: >>> On 2/7/12 4:59 PM, Peter van Hardenberg wrote: >>>> Per the thread from last month, I've updated the default >>>> random_page_cost on Heroku Postgres to reduce the expected cost of a >>>> random_page on all new databases. >>> This is because Heroku uses AWS storage, which has fast seeks but poor >>> throughput compared to internal disk on a standard system, BTW. >> Also judging by the other thread, it might be something to stop closer >> to 1.2 to 1.4 or something. You can execute several queries with the three different values provided by Scott and Josh. - SET random_page_cost = 2.0 First execution of the queries with EXPLAIN ANALYZE - SET random_page_cost = 1.4 Second execution of the queries with EXPLAIN ANALYZE - SET random_page_cost = 1.2 Third execution of the queries with EXPLAIN ANALYZE And then you can compare the patterns behind these query executions. Regards, -- Marcos Luis Ortíz Valmaseda Sr. Software Engineer (UCI) http://marcosluis2186.posterous.com http://www.linkedin.com/in/marcosluis2186 Twitter: @marcosluis2186 End the injustice, FREEDOM NOW FOR OUR FIVE COMPATRIOTS UNJUSTLY HELD IN U.S. PRISONS! http://www.antiterroristas.cu http://justiciaparaloscinco.wordpress.com
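Marcos's procedure can be sketched as a psql session. The table and query here are hypothetical placeholders, since the interesting queries would come from each application; random_page_cost can be set per session, so nothing needs to change server-wide while experimenting:

```sql
-- Illustrative session for the procedure above; "orders" and the query
-- itself are placeholder examples, not anything from this thread.
SET random_page_cost = 2.0;
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

SET random_page_cost = 1.4;
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

SET random_page_cost = 1.2;
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

RESET random_page_cost;
```

Comparing the three plans and their actual times shows whether, and where, the planner's choice flips between settings.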
On Wed, Feb 8, 2012 at 6:28 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > On Wed, Feb 8, 2012 at 6:45 PM, Peter van Hardenberg <pvh@pvh.ca> wrote: >> That said, I have access to a very large fleet in which to can collect >> data so I'm all ears for suggestions about how to measure and would >> gladly share the results with the list. > > I wonder if some kind of script that grabbed random queries and ran > them with explain analyze and various random_page_cost to see when > they switched and which plans are faster would work? We aren't exactly in a position where we can adjust random_page_cost on our users' databases arbitrarily to see what breaks. That would be... irresponsible of us. How would one design a meta-analyzer which we could run across many databases and collect data? Could we perhaps collect useful information from pg_stat_user_indexes, for example? -p -- Peter van Hardenberg San Francisco, California "Everything was beautiful, and nothing hurt." -- Kurt Vonnegut
On Wed, Feb 8, 2012 at 6:47 PM, Peter van Hardenberg <pvh@pvh.ca> wrote: > On Wed, Feb 8, 2012 at 6:28 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: >> On Wed, Feb 8, 2012 at 6:45 PM, Peter van Hardenberg <pvh@pvh.ca> wrote: >>> That said, I have access to a very large fleet in which to can collect >>> data so I'm all ears for suggestions about how to measure and would >>> gladly share the results with the list. >> >> I wonder if some kind of script that grabbed random queries and ran >> them with explain analyze and various random_page_cost to see when >> they switched and which plans are faster would work? > > We aren't exactly in a position where we can adjust random_page_cost > on our users' databases arbitrarily to see what breaks. That would > be... irresponsible of us. > Oh, of course we could do this on the session, but executing potentially expensive queries would still be unneighborly. Perhaps another way to think of this problem would be that we want to find queries where the cost estimate is inaccurate. -- Peter van Hardenberg San Francisco, California "Everything was beautiful, and nothing hurt." -- Kurt Vonnegut
On Wed, Feb 8, 2012 at 7:54 PM, Peter van Hardenberg <pvh@pvh.ca> wrote: > On Wed, Feb 8, 2012 at 6:47 PM, Peter van Hardenberg <pvh@pvh.ca> wrote: >> On Wed, Feb 8, 2012 at 6:28 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: >>> On Wed, Feb 8, 2012 at 6:45 PM, Peter van Hardenberg <pvh@pvh.ca> wrote: >>>> That said, I have access to a very large fleet in which to can collect >>>> data so I'm all ears for suggestions about how to measure and would >>>> gladly share the results with the list. >>> >>> I wonder if some kind of script that grabbed random queries and ran >>> them with explain analyze and various random_page_cost to see when >>> they switched and which plans are faster would work? >> >> We aren't exactly in a position where we can adjust random_page_cost >> on our users' databases arbitrarily to see what breaks. That would >> be... irresponsible of us. >> > > Oh, of course we could do this on the session, but executing > potentially expensive queries would still be unneighborly. > > Perhaps another way to think of this problem would be that we want to > find queries where the cost estimate is inaccurate. Yeah, have a script the user runs for you heroku guys in their spare time to see what queries are using the most time and then to jangle the random_page_cost while running them to get an idea what's faster and why.
> You can execute several queries with the three different values provided > by Scott and Josh. > - SET random_page_cost = 2.0 > First execution of the queries with EXPLAIN ANALYZE > - SET random_page_cost = 1.4 > Second execution of the queries with EXPLAIN ANALYZE > - SET random_page_cost = 1.2 > Second execution of the queries with EXPLAIN ANALYZE Well, such a tool would ideally be smarter than that, such that you would run EXPLAIN and compare to the previous plan and only run EXPLAIN ANALYZE if the plan changed. One could even decrement rpc slowly and find out at what point it changes, which would be more interesting than testing arbitrary numbers. Would lead to some really sweet graphs as well. :) -- Greg Sabino Mullane greg@turnstep.com End Point Corporation http://www.endpoint.com/
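Greg's idea of stepping random_page_cost down and only paying for EXPLAIN ANALYZE at the points where the plan actually changes can be sketched in a few lines of Python. The explain() callable is an assumption standing in for a real wrapper (e.g. around a database driver) that would run "SET random_page_cost = ...; EXPLAIN <query>" and return the plan text:

```python
# Sketch of the tool described above: walk random_page_cost downward, run a
# cheap EXPLAIN at each step, and record only the settings where the plan
# changed; those are the points worth a full EXPLAIN ANALYZE. The explain()
# argument is a hypothetical callable: rpc -> plan text.

def find_plan_changes(explain, start=4.0, stop=1.0, step=0.1):
    """Return [(rpc, plan)] at each rpc where the plan differs from the last one."""
    changes = []
    prev_plan = None
    rpc = start
    while rpc >= stop - 1e-9:
        rpc = round(rpc, 2)          # keep the setting a tidy two-decimal value
        plan = explain(rpc)
        if plan != prev_plan:        # plan flipped: this threshold is interesting
            changes.append((rpc, plan))
            prev_plan = plan
        rpc -= step
    return changes

if __name__ == "__main__":
    # Stand-in for a real database: pretend the planner switches from a
    # sequential scan to an index scan once random_page_cost drops below 2.0.
    fake = lambda rpc: "Index Scan" if rpc < 2.0 else "Seq Scan"
    print(find_plan_changes(fake))
```

Each (rpc, plan) pair in the result marks a threshold worth a closer look with EXPLAIN ANALYZE, and plotting those thresholds across many queries would give the "sweet graphs" mentioned above.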
On 09/02/12 00:09, Greg Sabino Mullane wrote: >> You can execute several queries with the three different values provided >> by Scott and Josh. >> - SET random_page_cost = 2.0 >> First execution of the queries with EXPLAIN ANALYZE >> - SET random_page_cost = 1.4 >> Second execution of the queries with EXPLAIN ANALYZE >> - SET random_page_cost = 1.2 >> Second execution of the queries with EXPLAIN ANALYZE > Well, such a tool would ideally be smarter than that, such that > you would run EXPLAIN and compare to the previous plan and > only run EXPLAIN ANALYZE if the plan changed. One could even > decrement rpc slowly and find out at one points it changes, > which would be more interesting than testing arbitrary numbers. > Would lead to some really sweet graphs as well. :) Well, the MyYearBook.com guys built something similar called Posuta. I don't know if the project is still alive, but we can ask them (michael.glaesemann@myyearbook.com). http://area51.myyearbook.com Posuta could be a starting point for this. It uses Ruby and Clojure for the core functionality and jQuery/Flot for graphs. -- Marcos Luis Ortíz Valmaseda Sr. Software Engineer (UCI)
On Wed, Feb 8, 2012 at 6:28 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > On Wed, Feb 8, 2012 at 6:45 PM, Peter van Hardenberg <pvh@pvh.ca> wrote: >> Having read the thread, I don't really see how I could study what a >> more principled value would be. > > Agreed. Just pointing out more research needs to be done. > >> That said, I have access to a very large fleet in which to can collect >> data so I'm all ears for suggestions about how to measure and would >> gladly share the results with the list. > > I wonder if some kind of script that grabbed random queries and ran > them with explain analyze and various random_page_cost to see when > they switched and which plans are faster would work? But if you grab a random query and execute it repeatedly, you drastically change the caching. Results from any execution after the first one are unlikely to give you results which are meaningful to the actual production situation. Cheers, Jeff
Hmm, perhaps we could usefully aggregate auto_explain output. On Thu, Feb 9, 2012 at 7:32 AM, Jeff Janes <jeff.janes@gmail.com> wrote: > On Wed, Feb 8, 2012 at 6:28 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: >> On Wed, Feb 8, 2012 at 6:45 PM, Peter van Hardenberg <pvh@pvh.ca> wrote: >>> Having read the thread, I don't really see how I could study what a >>> more principled value would be. >> >> Agreed. Just pointing out more research needs to be done. >> >>> That said, I have access to a very large fleet in which to can collect >>> data so I'm all ears for suggestions about how to measure and would >>> gladly share the results with the list. >> >> I wonder if some kind of script that grabbed random queries and ran >> them with explain analyze and various random_page_cost to see when >> they switched and which plans are faster would work? > > But if you grab a random query and execute it repeatedly, you > drastically change the caching. > > Results from any execution after the first one are unlikely to give > you results which are meaningful to the actual production situation. > > Cheers, > > Jeff -- Peter van Hardenberg San Francisco, California "Everything was beautiful, and nothing hurt." -- Kurt Vonnegut
On Thu, Feb 9, 2012 at 3:41 PM, Peter van Hardenberg <pvh@pvh.ca> wrote: > Hmm, perhaps we could usefully aggregate auto_explain output. How about something where you run a site at a random_page_cost of x, then y, then z, and do some aggregating of query times in each? A scatter plot should tell you lots.
On 2/9/12 2:41 PM, Peter van Hardenberg wrote: > Hmm, perhaps we could usefully aggregate auto_explain output. The other option is to take a statistical approach. After all, what you want to do is optimize average response times across all your users' databases, not optimize for a few specific queries. So one thought would be to add in pg_stat_statements to your platform ... something I'd like to see Heroku do anyway. Then you can sample this across dozens (or hundreds) of user databases, each with RPC set to a slightly different level, and aggregate it into a heat map. That's the way I'd do it, anyway. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
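A sketch of what each per-database sample might look like, assuming pg_stat_statements is installed (column names and time units vary between PostgreSQL versions, so treat this as illustrative):

```sql
-- Illustrative only: pull the most expensive normalized statements along
-- with the random_page_cost this database is running, so results can be
-- aggregated across the fleet. total_time units depend on the version.
SELECT current_setting('random_page_cost') AS rpc,
       query,
       calls,
       total_time / calls AS mean_time_per_call
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 20;
```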
On Friday, February 10, 2012 at 20:32:50, Josh Berkus wrote: > On 2/9/12 2:41 PM, Peter van Hardenberg wrote: > > Hmm, perhaps we could usefully aggregate auto_explain output. > > The other option is to take a statistical approach. After all, what you > want to do is optimize average response times across all your user's > databases, not optimize for a few specific queries. > > So one thought would be to add in pg_stat_statements to your platform > ... something I'd like to see Heroku do anyway. Then you can sample > this across dozens (or hundreds) of user databases, each with RPC set to > a slightly different level, and aggregate it into a heat map. > > That's the way I'd do it, anyway. In such a setup I sometimes build a ratio between transactions processed and CPU usage. Many indicators exist, inside and outside the DB, that are useful to combine and use as a "this is normal behavior" baseline. It then becomes easy in the long term to see whether things are getting better or worse. -- Cédric Villemain +33 (0)6 20 30 22 52 http://2ndQuadrant.fr/ PostgreSQL: 24x7 Support - Development, Expertise and Training
On Fri, Feb 10, 2012 at 11:32 AM, Josh Berkus <josh@agliodbs.com> wrote: > On 2/9/12 2:41 PM, Peter van Hardenberg wrote: > So one thought would be to add in pg_stat_statements to your platform > ... something I'd like to see Heroku do anyway. Then you can sample > this across dozens (or hundreds) of user databases, each with RPC set to > a slightly different level, and aggregate it into a heat map. > We've funded some work by Peter Geoghegan to make pg_stat_statements more useful, but the patch is currently sitting in the commitfest in need of a champion. I'd very much like to see it landed. Between that work, 9.2, and Dimitri's extension whitelist module, pg_stat_statements should be usefully installable by everyone. We'd seriously consider installing it by default, but that would make us Not Vanilla, which is something we avoid very diligently. -- Peter van Hardenberg San Francisco, California "Everything was beautiful, and nothing hurt." -- Kurt Vonnegut
Peter, > We've funded some work by Peter Geoghegan to make pg_stat_statements > more useful, but the patch is currently sitting in the commitfest in > need of a champion. I'd very much like to see it landed. Ok, let me review it then ... > Between that work, 9.2, and Dimitri's extension whitelist module, > pg_stat_statements should be usefully installable by everyone. We'd > seriously consider installing it by default, but that would make us > Not Vanilla, which is something we avoid very diligently. Bummer. You can get why this would be useful for autotuning, though, yes? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Fri, Feb 10, 2012 at 5:40 PM, Josh Berkus <josh@agliodbs.com> wrote: > Peter, > >> We've funded some work by Peter Geoghegan to make pg_stat_statements >> more useful, but the patch is currently sitting in the commitfest in >> need of a champion. I'd very much like to see it landed. > > Ok, let me review it then ... > >> Between that work, 9.2, and Dimitri's extension whitelist module, >> pg_stat_statements should be usefully installable by everyone. We'd >> seriously consider installing it by default, but that would make us >> Not Vanilla, which is something we avoid very diligently. > > Bummer. You can get why this would be useful for autotuning, though, yes? > Absolutely! Everything we can do to better monitor users' database performance without interfering (unduly) with that performance or affecting their dump/restore cycle is very exciting for us. -- Peter van Hardenberg San Francisco, California "Everything was beautiful, and nothing hurt." -- Kurt Vonnegut
On Thu, Feb 9, 2012 at 2:41 PM, Peter van Hardenberg <pvh@pvh.ca> wrote: > Hmm, perhaps we could usefully aggregate auto_explain output. By the time you realize the query is running long, it is too late to start analyzing it. And without analyzing it, you probably can't get the information you need. Maybe with the timing = off feature, it might make sense to just preemptively analyze everything. Cheers, Jeff
On Sat, Feb 11, 2012 at 8:26 AM, Jeff Janes <jeff.janes@gmail.com> wrote: > By the time you realize the query is running long, it is too late to > start analyzing it. And without analyzing it, you probably can't get > the information you need. > > Maybe with the timing = off feature,it would might make sense to just > preemptively analyze everything. > As before, I'm reluctant to introduce structural performance costs across the fleet, though I suppose giving users the option to opt out might ameliorate that. I don't think I have time right now to seriously explore these ideas, but I'll keep it in the back of my mind for a while. If anyone is interested in seriously exploring the idea of researching query planner accuracy across an enormous fleet of production databases with the goal of feeding that information back to the project please feel free to contact me off-list. -- Peter van Hardenberg San Francisco, California "Everything was beautiful, and nothing hurt." -- Kurt Vonnegut
On Thu, Feb 9, 2012 at 5:29 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > On Thu, Feb 9, 2012 at 3:41 PM, Peter van Hardenberg <pvh@pvh.ca> wrote: >> Hmm, perhaps we could usefully aggregate auto_explain output. > > How about something where you run a site at random_page cost of x, > then y, then z and you do some aggregating of query times in each. A > scatter plot should tell you lots. Is there an easy and unintrusive way to get such a metric as the aggregated query times? And to normalize it for how much work happened to be going on at the time? Without a good way to do normalization, you could just do lots of tests with randomized settings, to average out any patterns in workload, but that means you need an awful lot of tests to have enough data to rely on randomization. But it would be desirable to do this anyway, in case the normalization isn't as effective as we think. But how long should each setting be tested for? If a different setting causes a certain index to start being used, then performance would go down until those indexes get cached and then increase from there. But how long is long enough to allow this to happen? Thanks, Jeff
> Is there an easy and unintrusive way to get such a metric as the > aggregated query times? And to normalize it for how much work > happens > to have been doing on at the time? You'd pretty much need to do large-scale log harvesting combined with samples of query concurrency taken several times per minute. Even that won't "normalize" things the way you want, though, since all queries are not equal in terms of the amount of data they hit. Given that, I'd personally take a statistical approach. Sample query execution times across a large population of servers and over a moderate amount of time. Then apply common tests of statistical significance. This is why Heroku has the opportunity to do this in a way that smaller sites could not; they have enough servers to (probably) cancel out any random activity effects. -- Josh Berkus
On Sun, Feb 12, 2012 at 12:01 PM, Joshua Berkus <josh@agliodbs.com> wrote: > You'd pretty much need to do large-scale log harvesting combined with samples of query concurrency taken several times per minute. Even that won't "normalize" things the way you want, though, since all queries are not equal in terms of the amount of data they hit. > > Given that, I'd personally take a statistical approach. Sample query execution times across a large population of servers and over a moderate amount of time. Then apply common tests of statistical significance. This is why Heroku has the opportunity to do this in a way that smaller sites could not; they have enough servers to (probably) cancel out any random activity effects. > Yes, I think if we could normalize, anonymize, and randomly EXPLAIN ANALYZE 0.1% of all queries that run on our platform we could look for bad choices by the planner. I think the potential here could be quite remarkable. -- Peter van Hardenberg San Francisco, California "Everything was beautiful, and nothing hurt." -- Kurt Vonnegut
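As a toy illustration of the statistical approach Josh describes, comparing mean query times from two sub-fleets running different random_page_cost values could start with a plain significance test. All numbers below are invented, and the hand-rolled Welch's t-statistic (stdlib only) is just one possible choice of test:

```python
# Toy sketch of the fleet-comparison idea: two groups of databases, each
# running a different random_page_cost, each reporting a mean query time.
# A large |t| suggests the difference between the groups is not just noise.
import math
import statistics

def welch_t(a, b):
    """Welch's t-statistic for two independent samples."""
    va, vb = statistics.variance(a), statistics.variance(b)
    na, nb = len(a), len(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va / na + vb / nb)

rpc_4_0 = [12.1, 11.8, 13.0, 12.6, 12.2]  # invented mean query times (ms) per server
rpc_1_4 = [10.2, 10.9, 10.4, 10.1, 10.6]

t = welch_t(rpc_4_0, rpc_1_4)
print(round(t, 2))
```

With enough servers per group, workload randomness should wash out, which is exactly the advantage of Heroku's fleet size noted above.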
On 12 February 2012 22:28, Peter van Hardenberg <pvh@pvh.ca> wrote: > Yes, I think if we could normalize, anonymize, and randomly EXPLAIN > ANALYZE 0.1% of all queries that run on our platform we could look for > bad choices by the planner. I think the potential here could be quite > remarkable. Tom Lane suggested that plans, rather than the query tree, might be a more appropriate thing for the new pg_stat_statements to be hashing, as plans should be directly blamed for execution costs. While I don't think that that's appropriate for normalisation (consider that there'd often be duplicate pg_stat_statements entries per query), it does seem like an idea that could be worked into a future revision, to detect problematic plans. Maybe it could be usefully combined with auto_explain or something like that (in a revision of auto_explain that doesn't necessarily explain every plan, and therefore doesn't pay the considerable overhead of that instrumentation across the board). -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
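The "considerable overhead of that instrumentation" above comes from auto_explain running every query with full instrumentation. A hypothetical postgresql.conf fragment showing the knobs involved (auto_explain.log_timing was added in 9.2):

```ini
# Minimal auto_explain setup; every parameter here is standard auto_explain.
shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = '500ms'  # log plans only for queries slower than this
auto_explain.log_analyze = on            # include actual row counts (adds overhead)
auto_explain.log_timing = off            # 9.2+: skip per-node timings to cut the cost
```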