Discussion: understanding explain data

understanding explain data

From
Sim Zacks
Date:
I am looking at the explain data for my query and it is mostly
understandable, thanks to an excellent article by Jim Nasby:
http://www.pervasive-postgres.com/instantkb13/article.aspx?id=10120

Walking through the explain output is very time-consuming and confusing.
Is there a possibility (or does it even make sense) to feed the explain
output into an application that would tell you what you need to do to
optimize the query?

Something such as: with this explain data, adding an index on table tbl
column A would drastically improve the efficiency. Or at least an
application that would say, the least efficient part of your query is on
this part of the code so that you could more easily figure out what to
do about it.

Re: understanding explain data

From
Alban Hertroys
Date:
Sim Zacks wrote:
> Something such as: with this explain data, adding an index on table tbl
> column A would drastically improve the efficiency. Or at least an
> application that would say, the least efficient part of your query is on
> this part of the code so that you could more easily figure out what to
> do about it.

The latter part is the most useful IMO; optimizing usually needs a
(human) brain to put things into the right perspective. Adding an index
can speed up your queries only so much; a more optimal data presentation
(like moving calculations to insert/update instead of select) can
sometimes do a lot more.

It looks like something like that shouldn't be too hard to write...
Maybe it even exists already. Personally I'd prefer a command line
tool ;)
It would help if you could pipe the output of explain analyze to an
external tool from within psql.
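For illustration, a minimal sketch of that kind of pipe as psql allows it
today; the filter script name (plan-check) and the query are made up here:

    -- \o can send query output to a shell command; plan-check stands in
    -- for a hypothetical filter that highlights the expensive plan nodes
    \o |plan-check
    EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
    \o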

Regards,
--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
    7500 AK Enschede

// Integrate Your World //

Re: understanding explain data

From
"chris smith"
Date:
On 5/10/06, Alban Hertroys <alban@magproductions.nl> wrote:
> Sim Zacks wrote:
> > Something such as: with this explain data, adding an index on table tbl
> > column A would drastically improve the efficiency. Or at least an
> > application that would say, the least efficient part of your query is on
> > this part of the code so that you could more easily figure out what to
> > do about it.
>
> The latter part is the most useful IMO, optimizing usually needs a
> (human) brain to put things into the right perspective. Adding an index
> can speed up your queries only so much, a more optimal data presentation
>   (like moving calculations to insert/update instead of select) can do a
> lot more sometimes.
>
> It looks like something like that shouldn't be too hard to write...
> Maybe it even does exist already. Personally I'd prefer a command line
> tool ;)
> It would help if you can pipe the output of explain analyze to an
> external tool from within psql.

I've thought about writing a similar tool... I'm about 30% of the way :)

It's written in Python and can grab the queries out of the db logs,
but the harder part is working out the explain output, and also taking
into consideration that an index might be available but not the best
option for the query.

I guess the easiest way to check is to have the script turn seq scans
off when it runs explain and go from there.
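A rough sketch of that check in plain SQL (the table and column names are
invented for the example):

    -- compare the plan chosen with sequential scans allowed vs. disabled;
    -- a big difference in cost or runtime hints that index usage matters
    -- for this query
    BEGIN;
    EXPLAIN ANALYZE SELECT * FROM tbl WHERE a = 42;
    SET LOCAL enable_seqscan = off;
    EXPLAIN ANALYZE SELECT * FROM tbl WHERE a = 42;
    ROLLBACK;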

If anyone's interested in helping it go further, contact me off-list
(I can put it on pgfoundry.org and go from there if need be).

--
Postgresql & php tutorials
http://www.designmagick.com/

Re: understanding explain data

From
Sim Zacks
Date:
I agree with you that an index isn't always the answer; that was more of
an example. I was thinking more along the lines of an intelligent part
of the database that has access to the statistics and would be able to
spit out recommendations for the query.

Such as: I type in a monster query and say "optimize", and it would be
able to spit out 4 smaller views that use each other and generate the same
result in 10% of the time. Or it would say this query is optimized, but
you need an index on these columns.

I disagree with you that a human brain would be better than a machine
for optimizing purposes. If the system is programmed to optimize
correctly, then it will know when to stick data into a temp table and
update columns instead of doing a select, because x number of joins are
too much for the select. Humans may not know the optimal number of joins
before the query becomes inefficient.

Alban Hertroys wrote:
> Sim Zacks wrote:
>> Something such as: with this explain data, adding an index on table
>> tbl column A would drastically improve the efficiency. Or at least an
>> application that would say, the least efficient part of your query is
>> on this part of the code so that you could more easily figure out what
>> to do about it.
>
> The latter part is the most useful IMO, optimizing usually needs a
> (human) brain to put things into the right perspective. Adding an index
> can speed up your queries only so much, a more optimal data presentation
>  (like moving calculations to insert/update instead of select) can do a
> lot more sometimes.
>
> It looks like something like that shouldn't be too hard to write...
> Maybe it even does exist already. Personally I'd prefer a command line
> tool ;)
> It would help if you can pipe the output of explain analyze to an
> external tool from within psql.
>
> Regards,

Re: understanding explain data

From
Alban Hertroys
Date:
Sim Zacks wrote:
> I disagree with you that a human brain would be better than a machine
> for optimizing purposes. If the system is programmed to optimize
> correctly, then it will know when to stick data into a temp table and
> update columns instead of doing a select, because x number of joins are
> too much for the select. Humans may not know the optimal number of joins
> before the query becomes inefficient.

You're thinking about single queries here ;) A piece of software can
very well optimize a single query, but it will be limited to that.

I was referring to changing parts of your database so that the query can
be written in a simpler, more optimal way.

As an example, say that you have a tree structure of factories, each
with a multitude of departments, and you want to query the number of
employees (sum of all departments) in a specific factory.

You could write a query to read in all those departments and then count
all the employees in them.

You could also keep track of the employee-count for each department
when you insert or delete employee records, and summarize these values
for each factory record (using a trigger, most likely). If you do that,
you only need to read the employee-count of the factory record.

Which do you think is faster? And how do you think a piece of software
can get to this solution only from reading the explain output of your query?
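A rough SQL sketch of that approach (the schema and names here are
invented, not Alban's actual tables):

    -- per-department employee counts maintained by a trigger, so the
    -- per-factory total becomes a cheap sum over a handful of rows
    CREATE TABLE department (
        id         serial PRIMARY KEY,
        factory_id integer NOT NULL,
        emp_count  integer NOT NULL DEFAULT 0
    );
    CREATE TABLE employee (
        id            serial PRIMARY KEY,
        department_id integer NOT NULL REFERENCES department(id)
    );

    CREATE FUNCTION maintain_emp_count() RETURNS trigger AS $$
    BEGIN
        IF TG_OP = 'INSERT' THEN
            UPDATE department SET emp_count = emp_count + 1
             WHERE id = NEW.department_id;
        ELSE  -- DELETE
            UPDATE department SET emp_count = emp_count - 1
             WHERE id = OLD.department_id;
        END IF;
        RETURN NULL;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER employee_count
        AFTER INSERT OR DELETE ON employee
        FOR EACH ROW EXECUTE PROCEDURE maintain_emp_count();

    -- the expensive count over every employee row...
    SELECT count(*)
      FROM employee e
      JOIN department d ON d.id = e.department_id
     WHERE d.factory_id = 1;
    -- ...reduces to reading a few department rows
    SELECT sum(emp_count) FROM department WHERE factory_id = 1;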

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
    7500 AK Enschede

// Integrate Your World //

Re: understanding explain data

From
Richard Huxton
Date:
Alban Hertroys wrote:
> Sim Zacks wrote:
>> Something such as: with this explain data, adding an index on table
>> tbl column A would drastically improve the efficiency. Or at least an
>> application that would say, the least efficient part of your query is
>> on this part of the code so that you could more easily figure out what
>> to do about it.
>
> The latter part is the most useful IMO, optimizing usually needs a
> (human) brain to put things into the right perspective. Adding an index
> can speed up your queries only so much, a more optimal data presentation
>  (like moving calculations to insert/update instead of select) can do a
> lot more sometimes.
>
> It looks like something like that shouldn't be too hard to write...
> Maybe it even does exist already. Personally I'd prefer a command line
> tool ;)
> It would help if you can pipe the output of explain analyze to an
> external tool from within psql.

I've got a short Perl script that I throw explain output into. It's not
brilliant - it can give false positives - but it will usually give you
two or three lines to look at.

For those that are interested, I've attached it.

--
   Richard Huxton
   Archonet Ltd

Attachments

Re: understanding explain data

From
Sim Zacks
Date:
Now you're talking about data warehouse design and not optimizing
queries, though they are obviously interrelated. A human looking at the
explain data would not be able to determine that it would be better to
have a summary table either.

However, first you would want to optimize your queries, and if that
didn't bring the desired performance, then you would consider adding
denormalized summary tables. An experienced DBA would know when to put in
the summary tables to begin with.

Alban Hertroys wrote:
> Sim Zacks wrote:
>> I disagree with you that a human brain would be better than a machine
>> for optimizing purposes. If the system is programmed to optimize
>> correctly, then it will know when to stick data into a temp table and
>> update columns instead of doing a select, because x number of joins are
>> too much for the select. Humans may not know the optimal number of
>> joins before the query becomes inefficient.
>
> You're thinking about single queries here ;) A piece of software can
> very well optimize a single query, but it will be limited to that.
>
> I was referring to changing parts of your database so that the query can
> be written in a simpler, more optimal way.
>
> As an example, say that you have a tree structure of factories, each
> with a multitude of departments, and you want to query the number of
> employees (sum of all departments) in a specific factory.
>
> You could write a query to read in all those departments and then count
> all the employees in them.
>
> You could also keep track of the employee-count for each department
> when you insert or delete employee records, and summarize these values
> for each factory record (using a trigger, most likely). If you do that,
> you only need to read the employee-count of the factory record.
>
> Which do you think is faster? And how do you think a piece of software
> can get to this solution only from reading the explain output of your
> query?
>

Re: understanding explain data

From
Sim Zacks
Date:
Thanks for the script.
It does a great job of finding exactly which path is taking the most
time. Now for the hard part: why is that part taking the longest?

Richard Huxton wrote:
> I've got a short perl script that I throw explain output into. It's not
> brilliant - can give false positives, but it will usually give you two
> or three lines to look at.
>
> For those that are interested, I've attached it.

Re: understanding explain data

From
"Jim C. Nasby"
Date:
On Wed, May 10, 2006 at 11:00:14PM +1000, chris smith wrote:
> On 5/10/06, Alban Hertroys <alban@magproductions.nl> wrote:
> >Sim Zacks wrote:
> >> Something such as: with this explain data, adding an index on table tbl
> >> column A would drastically improve the efficiency. Or at least an
> >> application that would say, the least efficient part of your query is on
> >> this part of the code so that you could more easily figure out what to
> >> do about it.
> >
> >The latter part is the most useful IMO, optimizing usually needs a
> >(human) brain to put things into the right perspective. Adding an index
> >can speed up your queries only so much, a more optimal data presentation
> >  (like moving calculations to insert/update instead of select) can do a
> >lot more sometimes.
> >
> >It looks like something like that shouldn't be too hard to write...
> >Maybe it even does exist already. Personally I'd prefer a command line
> >tool ;)
> >It would help if you can pipe the output of explain analyze to an
> >external tool from within psql.
>
> I've thought about writing a similar tool.. I'm about 30% of the way :)
>
> It's written in python and can grab the queries out of the db logs..
> but the harder part is working out the explain output.. also taking in
> to consideration an index might be available but not the best option
> for the query.
>
> I guess the easiest way to check is to have the script turn seq scans
> off when it runs explain and go from there.
>
> If anyone's interested in helping it go further contact me off list
> (can put it on pgfoundry.org and go from there if need be).

There's been some discussion about allowing EXPLAIN to produce
machine-readable output, probably in XML. I agree that it would be a lot
easier if there were some way you could take explain output and plug it
into some tool that would present it in an easier-to-understand format. I
tend to shy away from helping people who post EXPLAIN output just because
it's such a PITA to read (especially if their email program word-wrapped
the output).
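(For context, a sketch of the kind of invocation being discussed. The
FORMAT option did not exist when this thread was written; it was added to
EXPLAIN in later PostgreSQL releases, and the table name below is made up.)

    -- machine-readable plan output as later implemented; XML, JSON and
    -- YAML formats are available
    EXPLAIN (ANALYZE, FORMAT XML)
    SELECT * FROM orders WHERE customer_id = 42;
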
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461