Re: Really really slow select count(*)

Поиск
Список
Период
Сортировка
От Scott Marlowe
Тема Re: Really really slow select count(*)
Дата
Msg-id AANLkTinDE0NjJqU8jQwsoyQOjm3jAFs4W6f=muHiyJ_W@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Really really slow select count(*)  (felix <crucialfelix@gmail.com>)
Список pgsql-performance
On Sun, Feb 6, 2011 at 6:52 PM, felix <crucialfelix@gmail.com> wrote:
> On Sun, Feb 6, 2011 at 4:23 PM, Scott Marlowe <scott.marlowe@gmail.com>
> wrote:
>>
>> Let's review:
>>
>> 1: No test or staging system used before production
>
> no, I do not have a full ubuntu machine replicating the exact memory and
> application load of the production server.
> this was changing one configuration parameter. something I was advised to
> do, read about quite a bit, tested on my development server (mac) and then
> proceeded to do at 6 am on Sunday morning, our slowest time.

I would strongly suggest you at least test these changes out
elsewhere.  It doesn't have to exactly match, but if you had a machine
that was even close to test on you'd have known what to expect.
Virtual machines are dirt simple to set up now.  So not having one
inexcusable.

>> 2: DB left in an unknown state (trying to shut down, not able)
>
> what ?

You told it to restart, which is a stop and a start.  It didn't stop.
It was in an unknown state.  With settings in its config file you
didn't know whether or not they worked because you hadn't tested them
already on somthing similar.

> Why did it shutdown so much later ?

Because that's when the last open connection from before when you told
it to shutdown / restart.

> I have never restarted postgres before, so this was all new to me.

Which is why you use a virtual machine to build a test lab so you CAN
make these changes somewhere other than produciton.

>  I apologize that I wasn't born innately with such knowledge.

Guess what!?  Neither was I!  I do however know how to setup a test
system so I don't test things on my production machine.

> So is it normal for postgres to report that it failed to shut down, operate
> for an hour and then go ahead and restart itself ?

Yes.  It eventually finished your restart you told it to do.

>> 3: No monitoring software to tell you when the site is down
>
> of course I have monitoring software.  both external and internal.  but it
> doesn't come and kick me out of bed.  yes, I need an automated cel phone
> call.  that was the first thing I saw to afterwards.

Monitoring software that can't send you emails when things break is in
need of having that feature enabled.

>
>> 4: I'm gonna just go ahead and guess no backups were taken either, or
>> are regularly taken.
>
> WTF ?   of course I have backups.  I just went through a very harsh down
> period event.  I fail to see why it is now necessary for you to launch such
> an attack on me.

No, it just seemed like your admin skills were pretty sloppy, so a
lack of a backup wouldn't surprise me.

> Perhaps the tone of my post sounded like I was blaming you, or at least you
> felt that way.

It felt more like you were blaming PostgreSQL for being overly
complex, but I wasn't taking it all that personally.

>  Why do you feel that way ?

I don't.

> Why not respond with:  "ouch !  did you check this ... that...."  say
> something nice and helpful.  correct my mistakes

I'd be glad to, but your message wasn't looking for help.  go back and
read it.  It's one long complaint.

>> This website can't be very important, if that's the way you treat it.
>
> just to let you know, that is straight up offensive

Really?  I'd say performing maintenance with no plan or pre-testing is
far more offensive.

> This is high traffic real estate site.  Downtime is unacceptable.  I had
> less downtime than this when I migrated to the new platform.

I expect you did more planning an testing?

> I spent rather a large amount of time reading and questioning here.  I asked
> many questions for clarification and didn't do ANYTHING until I was sure it
> was the correct solution.  I didn't just pull some shit off a blog and start
> changing settings at random.

But yet you failed to test it on even the simplest similar system
setup.  And so you lacked the practical knowledge of how to make this
change in production safely.

> I double checked opinions against different people and I searched for more
> docs on that param.  Amazingly none of the ones I found commented on the
> shared memory issue and I didn't even understand the docs discussing shared
> memory because it didn't seem to apply to what I was doing.  that's my
> misunderstanding.  I come her to share my misunderstanding.

Well, that's useful.  And I can see where there could be some changes
made to the docs or a user friendly howto on how to increase shared
memory and fsm and all that.

>> Please, that's a gross exaggeration.  The sum totoal to changing them is:
>>
>> run sysctl -a|grep shm
>> copy out proper lines to cahnge
>> edit sysctl.conf
>> put new lines in there with changes
>> sudo sysctl -p  # applies changes
>> edit the appropriate postgresql.conf, make changes
>> sudo /etc/init.d/postgresql-8.3 stop
>> sudo /etc/init.d/postgresql-8.3 start
>
> Considering how splendidly the experiment with changing fsm_max_pages went,
> I think you can understand that I have no desire to experiment with kernel
> settings.

Experimenting is what you do on a test machine, not a production server.

> It is easy for you because you ALREADY KNOW everything involved.

But this is important, it was NOT EASY the first time, and I certainly
didn't try to make changes on a production server the first time.

> I am not a
> sysadmin and we don't have one.  My apologies for that.

No need to apologize.  Learn the skills needed to fill that role, or
hire someone.

> so does the above mean that I don't have to restart the entire server, just
> postgres ?  I assumed that changing kernel settings means rebooting the
> server.

Exactly.  Just pgsql.  You use sysctl -p to make the changes take effect.

>> Did you actually delete the old entries before clustering it?  if it's
>> still got 4G of old sessions or whatever in it, clustering ain't gonna
>> help.
>
> its a different table.  the problem one has only 300k rows
> the problem is not the size, the problem is the speed is catastrophic

Well, is it bloated?  Which table in that previous post is it?

> sorry, it didn't seem to be the most important topic when I got out of bed

If it's not coffee, it's not an important topic when I get out of bed.

В списке pgsql-performance по дате отправления:

Предыдущее
От: Craig Ringer
Дата:
Сообщение: Re: Really really slow select count(*)
Следующее
От: Shaun Thomas
Дата:
Сообщение: Re: Really really slow select count(*)