Обсуждение: Are indexes blown?

Поиск
Список
Период
Сортировка

Are indexes blown?

От
"Phoenix Kiula"
Дата:
Hi,

I have an index on the user_id field in the query below:

   myuser=# delete from clients where user_id like '64.22.91.%';
   DELETE 22
   Time: 220324.975 ms

Is there any reason why it's taking 220 seconds to run this simple
query? There are about 3 million rows in this table.

How can I debug this? How can I check if the index is bloated or
blown? From the VACUUM ANALYZE output, nothing like this is apparent.

Thanks.

Re: Are indexes blown?

От
"Shoaib Mir"
Дата:


On Fri, Feb 15, 2008 at 3:55 PM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
Hi,

I have an index on the user_id field in the query below:

  myuser=# delete from clients where user_id like '64.22.91.%';
  DELETE 22
  Time: 220324.975 ms

Is there any reason why it's taking 220 seconds to run this simple
query? There are about 3 million rows in this table.


Use the 'pgstattuple' contrib module --> http://www.postgresql.org/docs/current/static/pgstattuple.html

pgstatindex function from the contrib module should be able to help you there.

--
Shoaib Mir
Fujitsu Australia Software Technology
shoaibm[@]fast.fujitsu.com.au

Re: Are indexes blown?

От
Richard Huxton
Дата:
Phoenix Kiula wrote:
> Hi,
>
> I have an index on the user_id field in the query below:
>
>    myuser=# delete from clients where user_id like '64.22.91.%';
>    DELETE 22
>    Time: 220324.975 ms
>
> Is there any reason why it's taking 220 seconds to run this simple
> query? There are about 3 million rows in this table.

First guess is that it's not using the index. What does
  EXPLAIN ANALYSE SELECT user_id FROM clients WHERE user_id LIKE '...'
show?

Check the list archives for locale and like and text_pattern_ops too -
that's a good place to check.

--
   Richard Huxton
   Archonet Ltd

Re: Are indexes blown?

От
"Phoenix Kiula"
Дата:
On 15/02/2008, Shoaib Mir <shoaibmir@gmail.com> wrote:
>
>
> On Fri, Feb 15, 2008 at 3:55 PM, Phoenix Kiula <phoenix.kiula@gmail.com>
> wrote:
> > Hi,
> >
> > I have an index on the user_id field in the query below:
> >
> >   myuser=# delete from clients where user_id like '64.22.91.%';
> >   DELETE 22
> >   Time: 220324.975 ms
> >
> > Is there any reason why it's taking 220 seconds to run this simple
> > query? There are about 3 million rows in this table.
> >
> >
>
> Use the 'pgstattuple' contrib module -->
> http://www.postgresql.org/docs/current/static/pgstattuple.html
>
> pgstatindex function from the contrib module should be able to help you
> there.



How should I install a "contrib" without bringing down my database, or
stopping it, or doing ANYTHING to it? It's in production. I can't
touch it. Will it be installed on the side and then I simply start
using it?

Re: Are indexes blown?

От
"Phoenix Kiula"
Дата:
On 15/02/2008, Richard Huxton <dev@archonet.com> wrote:

> First guess is that it's not using the index. What does
>   EXPLAIN ANALYSE SELECT user_id FROM clients WHERE user_id LIKE '...'
>  show?
>
>  Check the list archives for locale and like and text_pattern_ops too -
>  that's a good place to check.


There is nothing to do with locale. The same database has been working
just fine for 2 years. Why should this be an issue now?

When I ran the EXPLAIN SELECT, the database was hanging. Or taking too
much time (waiting for 5 minutes), or whatever. I cancelled it.

That's the problem. It works, then it doesn't. Then it works again. I
am guessing it could be the load, but there's nothing new in terms of
load that should be causing this!

Re: Are indexes blown?

От
Richard Huxton
Дата:
Phoenix Kiula wrote:
> On 15/02/2008, Richard Huxton <dev@archonet.com> wrote:
>
>> First guess is that it's not using the index. What does
>>   EXPLAIN ANALYSE SELECT user_id FROM clients WHERE user_id LIKE '...'
>>  show?
>>
>>  Check the list archives for locale and like and text_pattern_ops too -
>>  that's a good place to check.
>
>
> There is nothing to do with locale. The same database has been working
> just fine for 2 years. Why should this be an issue now?

No reason, but you hadn't said this was a change in behaviour, just that
it seemed slow.

> When I ran the EXPLAIN SELECT, the database was hanging. Or taking too
> much time (waiting for 5 minutes), or whatever. I cancelled it.
>
> That's the problem. It works, then it doesn't. Then it works again. I
> am guessing it could be the load, but there's nothing new in terms of
> load that should be causing this!

Ah, more new information! This does seem to point to the load,
particularly if it's exactly the same query each time. So what do
top/vmstat etc show for these "go-slow" periods?

--
   Richard Huxton
   Archonet Ltd

Re: Are indexes blown?

От
"Shoaib Mir"
Дата:


On Fri, Feb 15, 2008 at 5:18 PM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
 


How should I install a "contrib" without bringing down my database, or
stopping it, or doing ANYTHING to it? It's in production. I can't
touch it. Will it be installed on the side and then I simply start
using it?



You do not need to restart the database server for that purpose as all you need is the pgstattuple.so file copied to <PG-HOME>/lib folder.

Do the following (in case you have installed server from source):

- Go to the <PostgreSQL-source>/contrib/pgstattuple folder
- run make and make install (this will copy pgstattuple.so file to the lib folder of your PostgreSQL installation)
- Now from psql execute the pgstattuple.sql file for that specific database which can be found in <PG-HOME>/share/contrib folder
- Once the sql file is executed now you can use the pgstattuple function

--
Shoaib Mir
Fujitsu Australia Software Technology
shoaibm[@]fast.fujitsu.com.au

Re: Are indexes blown?

От
"Phoenix Kiula"
Дата:
Actually my host has just told me that I have a number of "hung
semaphores" in my server. And he is relating them to postgresql. I am
not surprised, because this is the only utility that has issues. All
the rest is working (apache, mysql, exim, etc). Any thoughts on where
I should start looking for hung semaphores?

Re: Are indexes blown?

От
"Phoenix Kiula"
Дата:
On 15/02/2008, Shoaib Mir <shoaibmir@gmail.com> wrote:
>
> You do not need to restart the database server for that purpose as all you
> need is the pgstattuple.so file copied to <PG-HOME>/lib folder.
>
> Do the following (in case you have installed server from source):
>
> - Go to the <PostgreSQL-source>/contrib/pgstattuple folder
> - run make and make install (this will copy pgstattuple.so file to the lib
> folder of your PostgreSQL installation)
> - Now from psql execute the pgstattuple.sql file for that specific database
> which can be found in <PG-HOME>/share/contrib folder
>  - Once the sql file is executed now you can use the pgstattuple function



Thanks. But I had installed from rpm. Can I just download that .so
file and put in the lib folder for pgsql and then start using it?

Re: Are indexes blown?

От
"Shoaib Mir"
Дата:


On Fri, Feb 15, 2008 at 6:13 PM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:


Thanks. But I had installed from rpm. Can I just download that .so
file and put in the lib folder for pgsql and then start using it?


Well I would say download the source for the same version you have, copy it to your desktop machine, build it and then build the .so file for contrib module using 'make' and 'make install'.... once that is done copy the .so from lib folder of PG to your production PG box's lib folder.

--
Shoaib Mir
Fujitsu Australia Software Technology
shoaibm[@]fast.fujitsu.com.au

Re: Are indexes blown?

От
Richard Huxton
Дата:
Phoenix Kiula wrote:
> On 15/02/2008, Richard Huxton <dev@archonet.com> wrote:
>
>> Ah, more new information! This does seem to point to the load,
>>  particularly if it's exactly the same query each time. So what do
>>  top/vmstat etc show for these "go-slow" periods?
>
> In included top and vmstat info in my other post yesterday, but here
> it is again:

Ah, you had a post yesterday!

(goes away, searches for previous post)
   http://archives.postgresql.org/pgsql-general/2008-02/msg00689.php
   PG quitting sporadically!!

Right, OK. Firstly, stop worrying about index usage and/or bloat. You
have unexplained process crashes to deal with first. There's no point in
looking at indexes until you figure out what is killing your processes.

Secondly, a single line from vmstat isn't useful, you want to compare
what is happening when things are fine with when they aren't. Leave
vmstat 10 logging to a file so you can catch it.

Thirdly, have you upgraded to the latest 8.2 (8.2.6) yet?

I see you've reduced work_mem, that's good.

Oh, you might as well lower max_connections from 150 too, there's no way
you can support that many concurrent queries anyway.


The fact that you're seeing various strange socket-related problems is
odd. As is the fact that logging doesn't seem to work for you.

Are you sure the two sets of vmstat/top figures are from when PG was
crashing/running queries slow? Everything seems idle to me in those figures.

--
   Richard Huxton
   Archonet Ltd

Re: Are indexes blown?

От
Harald Fuchs
Дата:
In article <bf54be870802150517q2dce6219kd5633ffb99e49d8b@mail.gmail.com>,
"Shoaib Mir" <shoaibmir@gmail.com> writes:

> On Fri, Feb 15, 2008 at 6:13 PM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:



>     Thanks. But I had installed from rpm. Can I just download that .so
>     file and put in the lib folder for pgsql and then start using it?



> Well I would say download the source for the same version you have, copy it to
> your desktop machine, build it and then build the .so file for contrib module
> using 'make' and 'make install'.... once that is done copy the .so from lib
> folder of PG to your production PG box's lib folder.

But you have to ensure that you build PostgreSQL on your desktop
machine in exactly the same way as the RPM got built
(integer_datetimes etc).

Re: Are indexes blown?

От
"Douglas McNaught"
Дата:
On 2/15/08, Harald Fuchs <hari.fuchs@googlemail.com> wrote:

>  But you have to ensure that you build PostgreSQL on your desktop
>  machine in exactly the same way as the RPM got built
>  (integer_datetimes etc).

It'd probably be much easier to just install the -contrib RPM.  :)
--
-Doug

Re: Are indexes blown?

От
"Phoenix Kiula"
Дата:
On 15/02/2008, Richard Huxton <dev@archonet.com> wrote:
> Phoenix Kiula wrote:
>  > On 15/02/2008, Richard Huxton <dev@archonet.com> wrote:
>  >
>  >> Ah, more new information! This does seem to point to the load,
>  >>  particularly if it's exactly the same query each time. So what do
>  >>  top/vmstat etc show for these "go-slow" periods?
>  >
>  > In included top and vmstat info in my other post yesterday, but here
>  > it is again:
>
>
> Ah, you had a post yesterday!
>
>  (goes away, searches for previous post)
>    http://archives.postgresql.org/pgsql-general/2008-02/msg00689.php
>    PG quitting sporadically!!
>
>  Right, OK. Firstly, stop worrying about index usage and/or bloat. You
>  have unexplained process crashes to deal with first. There's no point in
>  looking at indexes until you figure out what is killing your processes.
>
>  Secondly, a single line from vmstat isn't useful, you want to compare
>  what is happening when things are fine with when they aren't. Leave
>  vmstat 10 logging to a file so you can catch it.
>
>  Thirdly, have you upgraded to the latest 8.2 (8.2.6) yet?
>
>  I see you've reduced work_mem, that's good.
>
>  Oh, you might as well lower max_connections from 150 too, there's no way
>  you can support that many concurrent queries anyway.
>
>
>  The fact that you're seeing various strange socket-related problems is
>  odd. As is the fact that logging doesn't seem to work for you.
>
>  Are you sure the two sets of vmstat/top figures are from when PG was
>  crashing/running queries slow? Everything seems idle to me in those figures.



No. They are the vmstat figures from when I was replying to your
email. What will vmstat tell me and how should I set it up to do
"vmstat 10 logging"?

Btw, postgresql logging is working. But here're the kind of things I
have in there:


LOG:  test message did not get through on socket for statistics collector
LOG:  disabling statistics collector for lack of working socket
LOG:  database system was shut down at 2008-02-15 06:12:10 CST
LOG:  checkpoint record is at 8/E785304C
LOG:  redo record is at 8/E785304C; undo record is at 0/0; shutdown TRUE
LOG:  next transaction ID: 0/296892698; next OID: 97929
LOG:  next MultiXactId: 1; next MultiXactOffset: 0
LOG:  database system is ready

LOG:  could not receive data from client: Connection reset by peer
LOG:  unexpected EOF on client connection
LOG:  could not receive data from client: Connection reset by peer
LOG:  unexpected EOF on client connection


Now I don't know what is wrong or even where I should look. Postgresql
is often taking quite a bit of memory and CPU resources.

I've reduced work_mem to 10MB and Max_connections to 100. (Anyway, the
old values were working just fine until recently!)

The biggest problem: when I su into postgres user and do a psql to get
into the PG console in my SSH, it takes a whole lot of time to come
up! It used to come up in a jiffy earlier!!! It now shows me this
error:

 ~ >
psql: could not connect to server: Connection timed out
        Is the server running on host "localhost" and accepting
        TCP/IP connections on port 5432?

Then, five minutes later, I can connect again! In less than a second!
What gives?

Finally, very simple queries like this one:

select url, disable_in_statistics, id, user_known from links where
alias = '1yqw7' and status = 'Y' limit 1

Which used to be server in "5 ms" (0.005 seconds)  are now taking
upwards of 200 seconds! Your suggestion to "Explain Analyze" --

=# explain analyze select url, disable_in_statistics, id, user_known
from links where alias = '1yqw7' and status = 'Y' limit 1 ;
                                                           QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..8.74 rows=1 width=113) (actual time=9.639..9.643
rows=1 loops=1)
   ->  Index Scan using links2_alias_key on links  (cost=0.00..8.74
rows=1 width=113) (actual time=9.630..9.630 rows=1 loops=1)
         Index Cond: ((alias)::text = '1yqw7'::text)
         Filter: (status = 'Y'::bpchar)
 Total runtime: 16.425 ms
(5 rows)


Now this is only when I have connected to the psql console, of course.
Still, these queries are intermittently very slow!

Re: Are indexes blown?

От
"Douglas McNaught"
Дата:
On 2/15/08, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:

>  LOG:  could not receive data from client: Connection reset by peer
>  LOG:  unexpected EOF on client connection
>  LOG:  could not receive data from client: Connection reset by peer
>  LOG:  unexpected EOF on client connection

This means your client processes are dying or getting killed (possibly
due to memory shortages?).  Are these running on the same machine as
Postgres?  Are there any logs you can look at to see what might be
going wrong?  If this is Linux, are there any OOM-killer messages in
the syslogs?

-Doug

Re: Are indexes blown?

От
"Scott Marlowe"
Дата:
On Fri, Feb 15, 2008 at 8:36 AM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:

>  No. They are the vmstat figures from when I was replying to your
>  email. What will vmstat tell me and how should I set it up to do
>  "vmstat 10 logging"?

Something like

vmstat 10 > vmstat.log

>  LOG:  could not receive data from client: Connection reset by peer
>  LOG:  unexpected EOF on client connection
>  LOG:  could not receive data from client: Connection reset by peer
>  LOG:  unexpected EOF on client connection
>
>  Now I don't know what is wrong or even where I should look. Postgresql
>  is often taking quite a bit of memory and CPU resources.
>
>  I've reduced work_mem to 10MB and Max_connections to 100. (Anyway, the
>  old values were working just fine until recently!)
>
>  The biggest problem: when I su into postgres user and do a psql to get
>  into the PG console in my SSH, it takes a whole lot of time to come
>  up! It used to come up in a jiffy earlier!!! It now shows me this
>  error:

How many pgsql processes are there when this happens?  Try something like

ps axu|grep postgres

to see.  use

ps axu|grep postgres|wc -l

to get a rough count.  I'm guessing that your web service layer is
keeping old connections open.  could be something as ugly as php's
pg_pconnect or a buggy jdbc driver, etc...

Re: Are indexes blown?

От
Richard Huxton
Дата:
Phoenix Kiula wrote:
> On 15/02/2008, Richard Huxton <dev@archonet.com> wrote:
>>
>>  Are you sure the two sets of vmstat/top figures are from when PG was
>>  crashing/running queries slow? Everything seems idle to me in those figures.
>
> No. They are the vmstat figures from when I was replying to your
> email. What will vmstat tell me and how should I set it up to do
> "vmstat 10 logging"?

I'd write a small script and call it e.g. "trackusage.sh" and save it in
/tmp/

#!/bin/sh
while (/bin/true)
do
   date >> /tmp/vmstat_figures.txt
   vmstat 10 60 >> /tmp/vmstat_figures.txt
done

Then, set the execute flag on it and do something like:
   nohup /tmp/trackusage.sh &

That should run even when you disconnect (don't forget to kill it once
this is fixed). It will log a timestamp every 10 minutes and vmstat
activity between.

[snip logging fragment]
> Now I don't know what is wrong or even where I should look. Postgresql
> is often taking quite a bit of memory and CPU resources.

Just checking - this is a real machine and not a virtual one, isn't it?

> I've reduced work_mem to 10MB and Max_connections to 100. (Anyway, the
> old values were working just fine until recently!)
>
> The biggest problem: when I su into postgres user and do a psql to get
> into the PG console in my SSH, it takes a whole lot of time to come
> up! It used to come up in a jiffy earlier!!! It now shows me this
> error:
>
>  ~ >
> psql: could not connect to server: Connection timed out
>         Is the server running on host "localhost" and accepting
>         TCP/IP connections on port 5432?
>
> Then, five minutes later, I can connect again! In less than a second!
> What gives?

Hopefully vmstat will show us.

> Finally, very simple queries like this one:
>
> select url, disable_in_statistics, id, user_known from links where
> alias = '1yqw7' and status = 'Y' limit 1
>
> Which used to be server in "5 ms" (0.005 seconds)  are now taking
> upwards of 200 seconds!

Same symptom. I'd have guessed the machine is running out of memory and
swapping, but the vmstat/top stuff all look fine.

 > Your suggestion to "Explain Analyze" --
>
> =# explain analyze select url, disable_in_statistics, id, user_known
> from links where alias = '1yqw7' and status = 'Y' limit 1 ;
>                                                            QUERY PLAN
>
--------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..8.74 rows=1 width=113) (actual time=9.639..9.643
> rows=1 loops=1)
>    ->  Index Scan using links2_alias_key on links  (cost=0.00..8.74
> rows=1 width=113) (actual time=9.630..9.630 rows=1 loops=1)
>          Index Cond: ((alias)::text = '1yqw7'::text)
>          Filter: (status = 'Y'::bpchar)
>  Total runtime: 16.425 ms

Fine - it's nothing to do with the planner, indexes or anything else.
This is system-related, and vmstat should point us in the right direction.
--
   Richard Huxton
   Archonet Ltd

Re: Are indexes blown?

От
"Phoenix Kiula"
Дата:
On 15/02/2008, Richard Huxton <dev@archonet.com> wrote:
> Phoenix Kiula wrote:
>  > On 15/02/2008, Richard Huxton <dev@archonet.com> wrote:
>  >>
>
> >>  Are you sure the two sets of vmstat/top figures are from when PG was
>  >>  crashing/running queries slow? Everything seems idle to me in those figures.
>  >
>  > No. They are the vmstat figures from when I was replying to your
>  > email. What will vmstat tell me and how should I set it up to do
>  > "vmstat 10 logging"?
>
>
> I'd write a small script and call it e.g. "trackusage.sh" and save it in
>  /tmp/
>
>  #!/bin/sh
>  while (/bin/true)
>  do
>    date >> /tmp/vmstat_figures.txt
>    vmstat 10 60 >> /tmp/vmstat_figures.txt
>  done
>
>  Then, set the execute flag on it and do something like:
>    nohup /tmp/trackusage.sh &
>


Thanks Richard!

The script you suggested doesn't work:

    tmp > ./trackusage.sh
    -bash: ./trackusage.sh: /bin/sh: bad interpreter: Permission denied

Anyway, I did the vmstat command. I was running it while the system
was ok, then not ok, then ok...and so on. So I hope these numbers have
captured what the issue is:


tmp > vmstat 10 60
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1  0   3380 331140 114344 2992304    0    0    31    34   20    73  2  1 93  3
 0  0   3380 414412 114352 2992296    0    0     0   100 1105   286  1  1 96  2
 0  0   3380 430356 114380 2992268    0    0     0   133 1103   280  1  1 95  3
 0  0   3380 418988 114392 2992256    0    0     2   120 1098   277  1  2 93  4
 0  0   3380 347996 114408 2992240    0    0     0    85 1081   134  1  0 97  2
 0  0   3380 293236 114444 2992204    0    0     0    40 1076   138  0  0 97  2
 0  0   3380 252860 114456 2992192    0    0     0    80 1086   141  0  0 97  2
 0  0   3380 141340 114480 2992168    0    0     2    54 1078   145  1  0 97  2
 0  0   3380 119940 114504 2992144    0    0     0    31 1079   143  1  1 97  1
 0  0   3380 104252 114524 2992124    0    0     0    64 1087   182  1  1 96  2
 0  0   3380  90556 114560 2992088    0    0     0    59 1087   144  1  0 97  2
 0  0   3380 132476 115088 2995460    0    0    52   174 1130   447  2  1 92  4
 0  1   3380 280628 115124 2995684    0    0    31   220 1144   479  4  2 91  4
 0  0   3380 361340 115152 2995656    0    0     0   147 1135   338  2  1 94  3
 0  0   3380 382028 115180 2995628    0    0     2   113 1109   253  1  1 96  2
 0  0   3380 369740 115220 2995588    0    0     3   200 1107   260  1  1 93  4
 0  0   3380 323140 115248 2995560    0    0     0    60 1097   153  1  0 97  2
 0  0   3380 280260 115272 2995536    0    0     0    77 1087   133  1  0 98  1
 0  0   3380 200580 115296 2995512    0    0     2    65 1089   140  1  0 97  2
 0  0   3380  81916 115392 2995676    0    0    17    82 1089   188  2  1 94  2
 0  0   3380  16980  98072 2974256    0    0    48   122 1102   190  2  1 95  3
 1  0   3380  21588  73160 2954708    0    0    86   274 1128   276  2  2 88  8
 0  0   3380  52692  57860 2932048    0    0     1   128 1106   211  2  1 95  3
 0  0   3380 184748  57960 2931948    0    0     6   219 1128   451  2  1 92  5
 0  0   3380 342996  58016 2931892    0    0     0   140 1122   465  2  1 94  3
 0  0   3380 452020  58068 2932100    0    0     1   122 1114   268  1  1 95  2
 0  0   3380 478044  58132 2932036    0    0     0   106 1099   294  1  1 95  3
 0  0   3380 447540  58224 2931944    0    0     1   238 1098   319  2  2 91  5
 0  0   3380 392524  58284 2931884    0    0     0    71 1078   134  0  1 97  2
 0  0   3380 299684  58340 2931828    0    0     1    88 1079   150  1  0 97  2
 0  0   3380 231652  58388 2931780    0    0     0    40 1076   135  1  1 97  1
 0  0   3380 139012  58432 2931736    0    0     0    42 1076   145  1  0 97  2
 0  0   3380 117884  58472 2931696    0    0     1    67 1092   151  1  0 96  2
 0  0   3380 129460  58528 2931640    0    0     0    59 1097   190  1  1 96  2
 0  0   3380 179892  58584 2931584    0    0     0    42 1100   158  1  1 97  2
 0  0   3380 272900  58648 2931520    0    0     0   111 1114   308  1  1 95  3
 0  0   3380 399100  58704 2931724    0    0     0   132 1128   352  1  1 95  2
 0  0   3380 484556  58748 2931680    0    0     0    76 1111   269  1  1 96  2
 0  0   3380 501180  58804 2931884    0    0     0    93 1103   249  1  1 96  2
 0  0   3380 492636  58864 2931824    0    0     0   138 1094   259  1  1 95  3
 1  1   3380 428380  58912 2932036    0    0     0    44 1088   142  1  0 98  1
 0  0   3380 362340  58996 2931952    0    0     1    45 1085   138  1  0 97  2
 0  0   3380 292708  59072 2931876    0    0     0    71 1082   138  1  1 97  2
 0  0   3380 179292  59172 2931776    0    0     0    65 1089   149  1  0 97  2
 0  0   3380 127292  59236 2931712    0    0     0    38 1090   149  1  0 97  1
 0  0   3380 101940  59304 2931904    0    0    22    72 1097   186  1  1 96  2
 0  0   3380 134068  59340 2931868    0    0     0    74 1100   148  1  0 97  1
 0  1   3380 257908  59400 2932068    0    0     0   112 1114   424  2  1 95  3
 0  1   3380 399484  59460 2932008    0    0     0    96 1127   336  1  1 96  2
 1  0   3380 480548  59524 2932464    0    0     1   100 1118   286  1  1 96  2
 1  0   3380 497092  59580 2932408    0    0     0   120 1110   282  1  1 96  3
 0  1   3380 481684  59652 2932336    0    0     0   174 1099   310  2  2 92  4
 0  0   3380 416772  59692 2932296    0    0     0    49 1085   136  1  0 97  2
 0  0   3380 372108  59740 2932248    0    0     1    68 1089   144  1  0 97  2
 1  1   3380 307676  59808 2932180    0    0     1    75 1083   140  1  0 97  2
 0  0   3380 232620  59904 2932084    0    0     4    78 1077   154  1  0 97  2
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1  0   3380 169996  59948 2932300    0    0     0    41 1092   157  1  1 96  2
 0  0   3380 171660  59996 2932252    0    0     0   135 1105   190  1  1 96  3
 2  0   3380 166140  60052 2932456    0    0     0    47 1101   158  1  0 97  2
 4  0   3380 269860  60104 2932404    0    0     0   103 1114   326  1  2 94  2


Please advise??

Re: Are indexes blown?

От
Greg Smith
Дата:
On Sat, 16 Feb 2008, Phoenix Kiula wrote:

> The script you suggested doesn't work:
>    tmp > ./trackusage.sh
>    -bash: ./trackusage.sh: /bin/sh: bad interpreter: Permission denied

Try changing the first line to

#!/bin/bash

> Anyway, I did the vmstat command. I was running it while the system
> was ok, then not ok, then ok...and so on. So I hope these numbers have
> captured what the issue is:
>
> tmp > vmstat 10 60
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
> r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
> 0  0   3380 323140 115248 2995560    0    0     0    60 1097   153  1  0 97  2
> 0  0   3380 280260 115272 2995536    0    0     0    77 1087   133  1  0 98  1
> 0  0   3380 200580 115296 2995512    0    0     2    65 1089   140  1  0 97  2
> 0  0   3380  81916 115392 2995676    0    0    17    82 1089   188  2  1 94  2
> 0  0   3380  16980  98072 2974256    0    0    48   122 1102   190  2  1 95  3
> 1  0   3380  21588  73160 2954708    0    0    86   274 1128   276  2  2 88  8
> 0  0   3380  52692  57860 2932048    0    0     1   128 1106   211  2  1 95  3
> 0  0   3380 184748  57960 2931948    0    0     6   219 1128   451  2  1 92  5
> 0  0   3380 342996  58016 2931892    0    0     0   140 1122   465  2  1 94  3

Looks like the worst spot was in the middle here.  Something gobbled up
over 300MB of memory in 40 seconds, enough to force the OS to blow away
almost half its disk buffers just to keep working memory free.  Not so bad
that it went to swap or invoked the OOM killer but enough to push the I/O
block out (bo) up.  I would guess the other ugly spots were the later
portions where the bo spiked >100.

But without knowing more about what the processing using this memory and
generating the output I/O are doing it's hard to say why.  That's why I
suggested you watch top with the command lines turned on for a bit, to see
what process(es) are jumping around during the bad periods.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Are indexes blown?

От
"Phoenix Kiula"
Дата:
On 16/02/2008, Greg Smith <gsmith@gregsmith.com> wrote:
> On Sat, 16 Feb 2008, Phoenix Kiula wrote:
>
>  > The script you suggested doesn't work:
>  >    tmp > ./trackusage.sh
>  >    -bash: ./trackusage.sh: /bin/sh: bad interpreter: Permission denied
>
>
> Try changing the first line to
>
>  #!/bin/bash


Thanks Greg. Same problem with that too. I guess my tmp folder is
secured and doesn't allow for executables? I put it in another folder
and it's working.


...snip....
> Looks like the worst spot was in the middle here.  Something gobbled up
>  over 300MB of memory in 40 seconds, enough to force the OS to blow away
>  almost half its disk buffers just to keep working memory free.  Not so bad
>  that it went to swap or invoked the OOM killer but enough to push the I/O
>  block out (bo) up.  I would guess the other ugly spots were the later
>  portions where the bo spiked >100.
>
>  But without knowing more about what the processing using this memory and
>  generating the output I/O are doing it's hard to say why.  That's why I
>  suggested you watch top with the command lines turned on for a bit, to see
>  what process(es) are jumping around during the bad periods.


Happy to do that, but top keeps changing before I can copy text from
it. I think most of the connections seem to be "httpd" which is Apache
2.2.6. I checked the netstat commands and the server is not under DDOS
or anything.

My hosting provider tells me that the Postgresql server is taking up a
lot of memory but I've been running the same db with the same config
for over 2 years. Yes we have been growing but what happened in the
last 3 days to warrant a sudden spike in memory consumption??!!

Anyway, I want to go back to them with some hard data that postgresql
is NOT the one that is causing my server to load. The indexes are all
in place (and I've REINDEXed my big tables anyway) so the performance
of pg itself is not an issue.

I just don't know where to get this hard data. The top output shows
httpd on top, and sometimes postmaster, but I don't know how to
repeatedly capture it. Any suggestions?

Re: Are indexes blown?

От
"Scott Marlowe"
Дата:
On Feb 15, 2008 10:38 PM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:

> My hosting provider tells me that the Postgresql server is taking up a
> lot of memory but I've been running the same db with the same config
> for over 2 years. Yes we have been growing but what happened in the
> last 3 days to warrant a sudden spike in memory consumption??!!

OK, you've repeated this a few times.  Unless your load has been the
same the whole time, this statement means little.  If the same config
works for 2 years at load x, but fails in 1 day at load 3x then the
problem might have been there all along, and you just weren't running
the system hard enough to find the problem.

Just because PostgreSQL is exhibiting problems doesn't mean it's all
postgresql's fault.

150 or 100 connections is a LOT for a postgresql server, but
especially so if you went from actually using 5 or 10 to using 98.
The setting's the same, but the number is use is vastly different and
will have vastly different results on how postgresql runs.

Hanging connections could EASILY cause the problem you're seeing.  If
the network loses your connection from your app tier to your database,
your database might have 100 connections open doing nothing but
sitting idle in transaction holding data in memory until the
tcp_keepalive kicks in and kills them.

The earlier host connection errors point to that problem as well.

So, do you have mysteriously crashing or disappearing apache child
processes?  What do the error logs for apache have to say?

Can you tell what your load was when the system worked and what it is
now by trawling through the logs or something? (apache or pgsql as
long as their equivalent for both time periods.)

If you start leaving hanging connections to the database then you are
in fact DOSing the database server.  Not all DOS attacks are
intentional, and a crashing apache - php  can do it even without
persistent connections.

I'd say you haven't proven where the problem is yet, and should look
at the app tier.

Re: Are indexes blown?

От
Greg Smith
Дата:
On Sat, 16 Feb 2008, Phoenix Kiula wrote:

> The top output shows httpd on top, and sometimes postmaster, but I don't
> know how to repeatedly capture it. Any suggestions?

Try this:

top -bc | tee topdata

That will save everything to a file called topdata while also letting you
watch it scroll by.  Not as easy to catch the bad periods that way, the
output is going to be a huge data file, but you'll have a log to sort
through of everything.  Control-C to get out of there when you're bored.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Are indexes blown?

От
"Andrej Ricnik-Bay"
Дата:
On 16/02/2008, Greg Smith <gsmith@gregsmith.com> wrote:

> top -bc | tee topdata
>
> That will save everything to a file called topdata while also letting you
> watch it scroll by.  Not as easy to catch the bad periods that way, the
> output is going to be a huge data file, but you'll have a log to sort
> through of everything.  Control-C to get out of there when you're bored.

Or pipe through the following awk-script to make the output
less overwhelming ... save it as top.awk

BEGIN{
  # the sort numerically sorts by memory usage percentage
  # head filters out the top 20 contenders
  command = "LC_ALL=C sort -k 10,10gr|head -20"
}
{
  # read all of top into an array
  line[NR]=$0
  last=NR
}
END{
  # print the header well-formed
  for(i=1;i<8;i++){
    print line[i]
  }
  # and do the sort & strip of the processes
  for(i=8;i<last;i++){
    print line[i]|& command
  }
  close(command, "to")
  while ((command |& getline out) > 0)
    print out
  close(command)
}

Invoke like so
top -b -d 1 | awk -f top.awk | tee topdata

Instead of the "getting bored Ctrl-C" maybe a "-n 3600" as
extra parameter to top to get roughly one hours worth of data...


Cheers,
Andrej

Re: Are indexes blown?

От
"Phoenix Kiula"
Дата:
On 17/02/2008, Andrej Ricnik-Bay <andrej.groups@gmail.com> wrote:
> On 16/02/2008, Greg Smith <gsmith@gregsmith.com> wrote:
>
> > top -bc | tee topdata
> >
> > That will save everything to a file called topdata while also letting you
> > watch it scroll by.  Not as easy to catch the bad periods that way, the
> > output is going to be a huge data file, but you'll have a log to sort
> > through of everything.  Control-C to get out of there when you're bored.
>
> Or pipe through the following awk-script to make the output
> less overwhelming ... save it as top.awk
>
> BEGIN{
>   # the sort numerically sorts by memory usage percentage
>   # head filters out the top 20 contenders
>   command = "LC_ALL=C sort -k 10,10gr|head -20"
> }
> {
>   # read all of top into an array
>   line[NR]=$0
>   last=NR
> }
> END{
>   # print the header well-formed
>   for(i=1;i<8;i++){
>     print line[i]
>   }
>   # and do the sort & strip of the processes
>   for(i=8;i<last;i++){
>     print line[i]|& command
>   }
>   close(command, "to")
>   while ((command |& getline out) > 0)
>     print out
>   close(command)
> }
>
> Invoke like so
> top -b -d 1 | awk -f top.awk | tee topdata
>
> Instead of the "getting bored Ctrl-C" maybe a "-n 3600" as
> extra parameter to top to get roughly one hours worth of data...



Thanks, but it gives me syntax errors:


~> top -b -d 1 | awk -f top.awk | tee topdata

awk: top.awk:24:  for(i=8;i<last;i++
awk: top.awk:24:                    ^ syntax error
awk: top.awk:28:
awk: top.awk:28:  ^ syntax error
awk: top.awk:29:
awk: top.awk:29:    ^ syntax error
awk: top.awk:31:
awk: top.awk:31: ^ syntax error
awk: top.awk:31: }
awk: top.awk:31:  ^ invalid char '' in expression



Any ideas?

Re: Are indexes blown?

От
"Phoenix Kiula"
Дата:
On 17/02/2008, Shashank Tripathi <shanx@shanx.com> wrote:
> On 17/02/2008, Andrej Ricnik-Bay <andrej.groups@gmail.com> wrote:
> > On 17/02/2008, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
> > > ~> top -b -d 1 | awk -f top.awk | tee topdata
> > >
> > > awk: top.awk:24:  for(i=8;i<last;i++
> > > awk: top.awk:24:                    ^ syntax error
> > > awk: top.awk:28:
> > > awk: top.awk:28:  ^ syntax error
> > > awk: top.awk:29:
> > > awk: top.awk:29:    ^ syntax error
> > > awk: top.awk:31:
> > > awk: top.awk:31: ^ syntax error
> > > awk: top.awk:31: }
> > > awk: top.awk:31:  ^ invalid char ' ' in expression
> >
> >
> > > Any ideas?
> > What OS are you on, which version of awk?  Mine
> > works on most current Linux variants with a gawk > 3.x
> > awk -W version
> > GNU Awk 3.1.5
> > Copyright (C) 1989, 1991-2005 Free Software Foundation.
> >
> > Another thought is that maybe when you copy & pasted you
> > got some special characters into the script that awk doesn't
> > like ....
> >
>
>
>
> Thanks. I am on CentOS 4 (Linux) and the awk bit is
>
>     GNU Awk 3.1.3
>     Copyright (C) 1989, 1991-2003 Free Software Foundatio
>
> Do I need to update awk?
>



My question exactly.

Anyway I downloaded the text file that was attached in this thread,
and then it works. Must have been some copy/paste problem as
suggested.

I ran it with this command:

    top -b -d 1 -n 3600 | awk -f top.awk | tee topdata

But this is kind of sitting there, hogging the command prompt. Is
there any way I can let it go on in the background?

Thanks for the awk tip. Looks like a thing I need to learn!

Re: Are indexes blown?

От
"Andrej Ricnik-Bay"
Дата:
On 17/02/2008, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
> But this is kind of sitting there, hogging the command prompt. Is
> there any way I can let it go on in the background?

Ouch ... no, that's entirely my fault, wasn't quite awake I
guess, and hadn't thought it through completely ... that's
not going to give us the desired result...

Try this:

for z in `seq 1 3600`; do top -b -d 1 -n 1| awk -f top.awk; done | tee topoutput

Not sure whether it's going to give us the desire granularity of time...


Cheers,
Andrej


--
Please don't top post, and don't use HTML e-Mail :}  Make your quotes concise.

http://www.american.edu/econ/notes/htmlmail.htm

Re: Are indexes blown?

От
Andrew Sullivan
Дата:
On Sat, Feb 16, 2008 at 12:38:04PM +0800, Phoenix Kiula wrote:

> My hosting provider tells me that the Postgresql server is taking up a
> lot of memory but I've been running the same db with the same config
> for over 2 years. Yes we have been growing but what happened in the
> last 3 days to warrant a sudden spike in memory consumption??!!

Maybe your database has just grown big enough that it no longer all fits in
memory, and your disk hardware is flakey?  Or maybe the disk is flakey?  Or
maybe the data itself is now different enough that the planner is changing
what it's doing?  There are lots of explanations.

But I have to agree with someone upthread: you have to stop the crashes and
clean the system up before you try to debug this stuff.  Your problem is
likely there.

A

Re: Are indexes blown?

От
Andrew Sullivan
Дата:
On Fri, Feb 15, 2008 at 09:09:32PM +0800, Phoenix Kiula wrote:
> Actually my host has just told me that I have a number of "hung
> semaphores" in my server. And he is relating them to postgresql. I am
> not surprised, because this is the only utility that has issues. All
> the rest is working (apache, mysql, exim, etc). Any thoughts on where
> I should start looking for hung semaphores?

If this means what I think it means, you have bigger problems than you've
been posting.  Shut down postgres, and clean up any shared memory segments
that are hanging around.  See the manual on ipcclean.

A