Обсуждение: BASH script for collecting analyze-related info

Поиск
Список
Период
Сортировка

BASH script for collecting analyze-related info

От
Ken Tanzer
Дата:
I just sent off to this list for query help, and found the process of gathering all the requested info somewhat tedious.  So I created a little BASH script to try to pull together as much of this information as possible.  

The script reads an analyze file, and generates SQL queries to retrieve the following information:

  • Postgres Version
  • Changes to postgresql.conf
  • Description of all tables scanned
  • Description of all Indices
  • Actual and estimated row counts for all the tables

Hopefully this may be of use to others as well.  This list doesn't include definitions for any views or custom functions--I don't think they're in the analyze output, but if they are please let me know.

Any comments or suggestions for improvement would be most welcome.  Thanks.

Ken

p.s.,  This script runs fine on my computer (Ubuntu 13.04), but on a Fedora 11 machine it dies with 

pg_analyze_info.sh: line 18: unexpected EOF while looking for matching `)'
pg_analyze_info.sh: line 57: syntax error: unexpected end of file

If anyone knows why, or encounters a similar error and fixes it, please let me know!


--
AGENCY Software  
A data system that puts you in control
100% Free Software
(253) 245-3801

learn more about AGENCY or
follow the discussion.
Вложения

Re: BASH script for collecting analyze-related info

От
Craig James
Дата:

On Sun, Sep 29, 2013 at 2:09 AM, Ken Tanzer <ken.tanzer@gmail.com> wrote:
I just sent off to this list for query help, and found the process of gathering all the requested info somewhat tedious.  So I created a little BASH script to try to pull together as much of this information as possible.  

The script reads an analyze file, and generates SQL queries to retrieve the following information:

  • Postgres Version
  • Changes to postgresql.conf
  • Description of all tables scanned
  • Description of all Indices
  • Actual and estimated row counts for all the tables

Hopefully this may be of use to others as well.  This list doesn't include definitions for any views or custom functions--I don't think they're in the analyze output, but if they are please let me know.

Any comments or suggestions for improvement would be most welcome.  Thanks.

Ken

p.s.,  This script runs fine on my computer (Ubuntu 13.04), but on a Fedora 11 machine it dies with 

pg_analyze_info.sh: line 18: unexpected EOF while looking for matching `)'
pg_analyze_info.sh: line 57: syntax error: unexpected end of file

If anyone knows why, or encounters a similar error and fixes it, please let me know!

It's the blank line on line 26.  Put a backslash on that line or delete it entirely.

Craig
 


--
AGENCY Software  
A data system that puts you in control
100% Free Software
(253) 245-3801

learn more about AGENCY or
follow the discussion.


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: BASH script for collecting analyze-related info

От
Ken Tanzer
Дата:
p.s.,  This script runs fine on my computer (Ubuntu 13.04), but on a Fedora 11 machine it dies with 

pg_analyze_info.sh: line 18: unexpected EOF while looking for matching `)'
pg_analyze_info.sh: line 57: syntax error: unexpected end of file

If anyone knows why, or encounters a similar error and fixes it, please let me know!

It's the blank line on line 26.  Put a backslash on that line or delete it entirely.

Craig


Well that made perfect sense, but after making that change I'm still getting the same error.

# Get tables
TABLES=$( \
cat <( \
# Indexed tables \
egrep -o 'Index Scan using \b[a-zA-Z0-9_-]* on [a-zA-Z0-9_-]*' $FILE  | cut -f 6 -d ' ' \
) <( \
# Scanned Tables \
egrep -o 'Seq Scan on \b[a-zA-Z0-9_-]* ' $FILE | cut -f 4 -d ' ' \
) | sort | uniq )



Re: BASH script for collecting analyze-related info

От
Craig James
Дата:
On Sun, Sep 29, 2013 at 2:24 PM, Ken Tanzer <ken.tanzer@gmail.com> wrote:
p.s.,  This script runs fine on my computer (Ubuntu 13.04), but on a Fedora 11 machine it dies with 

pg_analyze_info.sh: line 18: unexpected EOF while looking for matching `)'
pg_analyze_info.sh: line 57: syntax error: unexpected end of file

If anyone knows why, or encounters a similar error and fixes it, please let me know!

It's the blank line on line 26.  Put a backslash on that line or delete it entirely.

Craig


Well that made perfect sense, but after making that change I'm still getting the same error.

Try putting the entire TABLE= on one big line and see if that works.  If it does, work backwards from there, inserting backslash-newlines.  It may be something simple like an extra space after a backslash.

Craig

Re: BASH script for collecting analyze-related info

От
hubert depesz lubaczewski
Дата:
On nie, wrz 29, 2013 at 02:09:07 -0700, Ken Tanzer wrote:
> p.s.,  This script runs fine on my computer (Ubuntu 13.04), but on a Fedora
> 11 machine it dies with
>
> pg_analyze_info.sh: line 18: unexpected EOF while looking for matching `)'
> pg_analyze_info.sh: line 57: syntax error: unexpected end of file
>
> If anyone knows why, or encounters a similar error and fixes it, please let
> me know!

Fixed by changing it to:

#v+
# Get tables
TABLES=$(
cat <(
# Indexed tables
egrep -o 'Index Scan using \b[a-zA-Z0-9_-]* on [a-zA-Z0-9_-]*' $FILE  | cut -f 6 -d ' '
) <(
# Scanned Tables
egrep -o 'Seq Scan on \b[a-zA-Z0-9_-]* ' $FILE | cut -f 4 -d ' '
) | sort | uniq

)
#v-

That is - I removed the "\" at the end - it's of no use.

There are couple of issues/questions though:
1. instead of: "SELECT 'Postgres Version';" it's better to use \echo Postgres Version
2. why is there union with nulls in the last query?
3. When extracting table names you are missing:
   a. Index Scan Backward
   b. Bitmap Heap Scan
4. When extracting index names, you're missing Index Only Scans and Index Scan
   Backwards.
5. The whole script will fail if you're using table names with spaces (not that
   I think this is sane, but the script should recognize it)
6. It's generally better to use
   if [[ ...
   than
   if [ ...
   reason - [[ is internal for bash, while [ is fork to external program
7. instead of | sort | uniq, it's better to use sort -u
8. usage of all-upper-case variables in bash is (by some, more
   bash-skilled people, like on #bash on irc.freenode) frowned upon.
   all-uppercase is supposed to be for environment variables only.

All in all - looks pretty good.

depesz


Re: BASH script for collecting analyze-related info

От
Ken Tanzer
Дата:
Thanks for the suggestions, help and feedback.  New version attached.


3. When extracting table names you are missing:
   a. Index Scan Backward
   b. Bitmap Heap Scan
4. When extracting index names, you're missing Index Only Scans and Index Scan Backwards.

If someone can send me analyze output with these characteristics, I'll try to get the script to pick them up. 


5. The whole script will fail if you're using table names with spaces (not that
   I think this is sane, but the script should recognize it)

Uggh yes.  I imagine it will fail on international characters as well.  Not sure if I want to tackle that right now, though suggestions welcome.  (Or if someone else wants to do it!)  I did tweak so that quoted identifiers will work, e.g. mixed case field names, and also ones with the $ sign.  I completely understand people using other languages, but really do people need spaces in their names? :)


2. why is there union with nulls in the last query?

Laziness, convenience or expediency, pick your preferred label.  I needed something to go with the last "UNION" that was generated.  I changed it to do this more cleanly.


1. instead of: "SELECT 'Postgres Version';" it's better to use \echo Postgres Version

Much better.  I used qecho so it can be redirected with the rest of the output.


That is - I removed the "\" at the end - it's of no use.

Great.  I ended up taking them all out.

 
6. It's generally better to use
   if [[ ...
   than
   if [ ...
   reason - [[ is internal for bash, while [ is fork to external program
7. instead of | sort | uniq, it's better to use sort -u

Check and check, did both of these

 
8. usage of all-upper-case variables in bash is (by some, more
   bash-skilled people, like on #bash on irc.freenode) frowned upon.
   all-uppercase is supposed to be for environment variables only.

Personally I like the upper case names as they stand out easily in the script.  But I'd hate to be frowned on by the bashers (or bashed by the frowners), so I changed them to lower case.

 
All in all - looks pretty good.


Thanks!
 
Cheers,
Ken

--
AGENCY Software  
A data system that puts you in control
100% Free Software
(253) 245-3801

learn more about AGENCY or
follow the discussion.
Вложения

Re: BASH script for collecting analyze-related info

От
Ken Tanzer
Дата:
On Wed, Oct 9, 2013 at 10:39 AM, John Melesky <john.melesky@rentrakmail.com> wrote:
(off-list)
(on-list)
 
I doubt I'm the first to ask, but have you considered putting this up on github (or similar) so others can contribute more easily and keep it up-to-date?

If that's not in the cards, are you opposed to someone else putting it up on github and taking over management of it?

-john


Actually, you definitely were the first to ask.  I'm not opposed to any combination of putting this on github, continuing to work on this script, or having someone else do it.  But before doing so (or to start), let me throw out a few questions:

First and foremost (and primarily directed to people who are kind enough to provide help on this list), is a script like this worthwhile?  Will it help get better problem reports and save back-and-forth time of "please post x, y and z?"  If not, I don't see the point of pursuing this.

If it is worthwhile, what's the best way of going about getting the necessary information?  Parsing the contents of EXPLAIN output that wasn't designed for this purpose seems doable, but clunky at best.  And it doesn't tell you which views are involved.

In an ideal situation, I'd see this being built into postgres, so that you could do something like EXPLAIN [ANALYZE] DESCRIBE ..., and get your descriptions directly as part of the output.  If that is pure fantasy, is there any way to directly identify all the tables, views and indexes that are involved in a query plan?

If no to the above, then parsing the analyze output seems the only option.  I was pondering what language a script to do this should be written in, and I'm thinking that writing it in pgpsql might be the best option, since it would be the most portable and presumably available on all systems.  I haven't fully thought that through, so I'm wondering if anyone sees reasons that wouldn't work, or if some other language would be a better or more natural choice.

Cheers,
Ken
 

--
AGENCY Software  
A data system that puts you in control
100% Free Software
(253) 245-3801

learn more about AGENCY or
follow the discussion.

Re: BASH script for collecting analyze-related info

От
Ken Tanzer
Дата:
Hey John, and thanks for the input.

On Wed, Oct 16, 2013 at 11:00 AM, John Melesky <john.melesky@rentrakmail.com> wrote:
On Wed, Oct 16, 2013 at 12:20 AM, Ken Tanzer <ken.tanzer@gmail.com> wrote:
First and foremost (and primarily directed to people who are kind enough to provide help on this list), is a script like this worthwhile?  Will it help get better problem reports and save back-and-forth time of "please post x, y and z?"  If not, I don't see the point of pursuing this.

Given the amount of back-and-forth that's already happened in this thread, it looks like the script meets at least some needs.

Agreed.  It definitely met my immediate need.  But I'm actually seeing the lack of responses from "list helpers" as an indication this might not be of much general interest.  Of course that's ok too. :)


(Quoting myself.)  In an ideal situation, I'd see this being built into postgres, so that you could do something like EXPLAIN [ANALYZE] DESCRIBE ..., and get your descriptions directly as part of the output.  If that is pure fantasy, is there any way to directly identify all the tables, views and indexes that are involved in a query plan?

This approach seems like the only way you could really get from half-assed to Ass 1.0.  I'll just keep it in the pipe dream category unless or until someone else says otherwise.  A script it is, with a possible goal of getting to Ass 0.7.
 
 
If it is worthwhile, what's the best way of going about getting the necessary information?  Parsing the contents of EXPLAIN output that wasn't designed for this purpose seems doable, but clunky at best.  And it doesn't tell you which views are involved.

Well, EXPLAIN can actually output in a couple of more machine-readable formats (XML, JSON, and YAML), which would be a good place to start. I'm not sure what the best way to get view information would be (aside from potentially parsing the query itself), but I'm sure someone has an idea.
If that is pure fantasy, is there any way to directly identify all the tables, views and indexes that are involved in a query plan? 

Tables and indexes you can get from explaining with a machine-readable format, then traversing the tree looking for 'Relation Name' or 'Index Name' attributes.

I'm not seeing how these other output formats would help much.  They don't seem to contain additional information (and still not the views).  Plus, I'd think it would be best to make sure the describes are based on the same explain.  Since the "regular"? format seems to be what is submitted to the mailing list, it seems best to just stick with parsing that.
 
If no to the above, then parsing the analyze output seems the only option.  I was pondering what language a script to do this should be written in, and I'm thinking that writing it in pgpsql might be the best option, since it would be the most portable and presumably available on all systems.  I haven't fully thought that through, so I'm wondering if anyone sees reasons that wouldn't work, or if some other language would be a better or more natural choice.

I'm a proponent of just taking what you have and publishing it. That lets people use it who need it now, and makes it easier for others to improve it. 

I'm not really sure how putting this in on github and linking to it here makes it any easier to access than the version attached in this thread, but here goes.  https://github.com/ktanzer/pg_analyze_info

If there's a need for a cross-platform version, that's more likely to happen if people who use that platform can test that script for you.

I was thinking that a pgpsql version would be inherently more cross-platform, and definitely more available.  A bash script seems to cut out at least most of the Windows users.
 
Notably, publishing what you have now won't in any way prevent you from rewriting it in a plpgsql function and packaging that up on pgxn later.

Yup.

Cheers,
Ken

 
--
AGENCY Software  
A data system that puts you in control
100% Free Software
(253) 245-3801

learn more about AGENCY or
follow the discussion.