Обсуждение: "Mysterious" issues with newly installed 8.3

Поиск
Список
Период
Сортировка

"Mysterious" issues with newly installed 8.3

От
Carlos Moreno
Дата:
Ok, I know that such an open and vague question like this one
is...  well, open and vague...  But still.

The short story:

Just finished an 8.3.4 installation on a new machine, to replace
an existing one;  the new machine is superior (i.e., higher
performance) in virtually every way --- twice as much memory,
faster processor, faster drives, etc.

I made an exact copy of the existing database on the new
machine, and the exact same queries run on both reveal that
the old machine beats the new one by a factor of close to 2 !!!!
(i.e., the same queries run close to twice as fast on the old
machine!!!)

To make things worse:  the old machine is in operation, under
normal workload  (and right now the system may be around
peak time), and the new machine is there sitting doing nothing;
just one user logged in using psql to run the queries --- *no-one
and nothing* is connecting to the new server.

So... What's going on???


The details:

CPU:
New: Opteron DC 1218HE  (1MB cache per core) @2.6GHz
Old:  Athlon64 X2  (512K cache per core)  @2.2GHz

RAM:
New:  4GB
Old:   2GB

HD:
Doesn't matter the capacity, but I have every reason to believe
the new one is faster --- hdparm reports 105MB/sec transfer
rate;  the measurement for the old server is meaningless, since
it is in operation  (i.e., there is actual database activity), so it
measures between 50MB/sec and 70MB/sec.  Given its age, I
would estimate 70 to 80 MB/sec

OS:
New:  CentOS 5.2  (gcc 4.1.2)
Old:  FC6  (gcc 4.1.2)

PG:
New:  8.3.4 installed from source
Old:   8.2.4 installed from source

Presumably relevant configuration parameters --- shared_buffers
was set to 250MB on the old one;  I set it to 500MB on the new
one  (kinda makes sense, no?  1/8 of the physical memory in both
cases).

I set max_fsm_pages a little bit higher on the new one (409600
instead of 307200 on the old one).  The rest is pretty much
identical  (except for the autovacuum --- I left the defaults in the
new one)


The old machine is vacuum-analyzed once a day  (around 4AM);
on the new one, I ran a vacuumdb -z -f after populating it.


Some interesting outputs:

explain analyze select count(*) from users;
New:
 Aggregate  (cost=8507.11..8507.12 rows=1 width=0) (actual
time=867.582..867.584 rows=1 loops=1)
   ->  Seq Scan on users  (cost=0.00..7964.49 rows=217049 width=0)
(actual time=0.016..450.560 rows=217049 loops=1)
 Total runtime: 867.744 ms

Old:
 Aggregate  (cost=17171.22..17171.22 rows=1 width=0) (actual
time=559.475..559.476 rows=1 loops=1)
   ->  Seq Scan on users  (cost=0.00..16628.57 rows=217057 width=0)
(actual time=0.009..303.026 rows=217107 loops=1)
 Total runtime: 559.536 ms

Running the same command again several times practically
does not change anything.


explain analyze select count(*) from users where username like 'A%';
New:
 Aggregate  (cost=6361.28..6361.29 rows=1 width=0) (actual
time=87.528..87.530 rows=1 loops=1)
   ->  Bitmap Heap Scan on users  (cost=351.63..6325.33 rows=14376
width=0) (actual time=6.444..53.426 rows=17739 loops=1)
         Filter: ((username)::text ~~ 'a%'::text)
         ->  Bitmap Index Scan on c_username_unique  (cost=0.00..348.04
rows=14376 width=0) (actual time=5.383..5.383 rows=17739 loops=1)
               Index Cond: (((username)::text >= 'a'::text) AND
((username)::text < 'b'::text))
 Total runtime: 87.638 ms

Old:
 Aggregate  (cost=13188.91..13188.92 rows=1 width=0) (actual
time=61.743..61.745 rows=1 loops=1)
   ->  Bitmap Heap Scan on users  (cost=392.07..13157.75 rows=12466
width=0) (actual time=7.433..40.847 rows=17747 loops=1)
         Filter: ((username)::text ~~ 'a%'::text)
         ->  Bitmap Index Scan on c_username_unique  (cost=0.00..388.96
rows=12466 width=0) (actual time=5.652..5.652 rows=17861 loops=1)
               Index Cond: (((username)::text >= 'a'::character varying)
AND ((username)::text < 'b'::character varying))
 Total runtime: 61.824 ms


Any ideas?


Re: "Mysterious" issues with newly installed 8.3

От
"Scott Carey"
Дата:
On Thu, Oct 9, 2008 at 4:51 PM, Carlos Moreno <morenopg@mochima.com> wrote:

Ok, I know that such an open and vague question like this one
is...  well, open and vague...  But still.

The short story:

Just finished an 8.3.4 installation on a new machine, to replace
an existing one;  the new machine is superior (i.e., higher
performance) in virtually every way --- twice as much memory,
faster processor, faster drives, etc.

I made an exact copy of the existing database on the new
machine, and the exact same queries run on both reveal that
the old machine beats the new one by a factor of close to 2 !!!!
(i.e., the same queries run close to twice as fast on the old
machine!!!)

To make things worse:  the old machine is in operation, under
normal workload  (and right now the system may be around
peak time), and the new machine is there sitting doing nothing;
just one user logged in using psql to run the queries --- *no-one
and nothing* is connecting to the new server.

So... What's going on???


The details:

CPU:
New: Opteron DC 1218HE  (1MB cache per core) @2.6GHz
Old:  Athlon64 X2  (512K cache per core)  @2.2GHz

RAM:
New:  4GB
Old:   2GB

HD:
Doesn't matter the capacity, but I have every reason to believe
the new one is faster --- hdparm reports 105MB/sec transfer
rate;  the measurement for the old server is meaningless, since
it is in operation  (i.e., there is actual database activity), so it
measures between 50MB/sec and 70MB/sec.  Given its age, I
would estimate 70 to 80 MB/sec

OS:
New:  CentOS 5.2  (gcc 4.1.2)
Old:  FC6  (gcc 4.1.2)

PG:
New:  8.3.4 installed from source
Old:   8.2.4 installed from source

Presumably relevant configuration parameters --- shared_buffers
was set to 250MB on the old one;  I set it to 500MB on the new
one  (kinda makes sense, no?  1/8 of the physical memory in both
cases).

I set max_fsm_pages a little bit higher on the new one (409600
instead of 307200 on the old one).  The rest is pretty much
identical  (except for the autovacuum --- I left the defaults in the
new one)


The old machine is vacuum-analyzed once a day  (around 4AM);
on the new one, I ran a vacuumdb -z -f after populating it.


Some interesting outputs:

explain analyze select count(*) from users;
New:
 Aggregate  (cost=8507.11..8507.12 rows=1 width=0) (actual
time=867.582..867.584 rows=1 loops=1)
  ->  Seq Scan on users  (cost=0.00..7964.49 rows=217049 width=0)
(actual time=0.016..450.560 rows=217049 loops=1)
 Total runtime: 867.744 ms

Old:
 Aggregate  (cost=17171.22..17171.22 rows=1 width=0) (actual
time=559.475..559.476 rows=1 loops=1)
  ->  Seq Scan on users  (cost=0.00..16628.57 rows=217057 width=0)
(actual time=0.009..303.026 rows=217107 loops=1)
 Total runtime: 559.536 ms

Running the same command again several times practically
does not change anything.


explain analyze select count(*) from users where username like 'A%';
New:
 Aggregate  (cost=6361.28..6361.29 rows=1 width=0) (actual
time=87.528..87.530 rows=1 loops=1)
  ->  Bitmap Heap Scan on users  (cost=351.63..6325.33 rows=14376
width=0) (actual time=6.444..53.426 rows=17739 loops=1)
        Filter: ((username)::text ~~ 'a%'::text)
        ->  Bitmap Index Scan on c_username_unique  (cost=0.00..348.04
rows=14376 width=0) (actual time=5.383..5.383 rows=17739 loops=1)
              Index Cond: (((username)::text >= 'a'::text) AND
((username)::text < 'b'::text))
 Total runtime: 87.638 ms

Old:
 Aggregate  (cost=13188.91..13188.92 rows=1 width=0) (actual
time=61.743..61.745 rows=1 loops=1)
  ->  Bitmap Heap Scan on users  (cost=392.07..13157.75 rows=12466
width=0) (actual time=7.433..40.847 rows=17747 loops=1)
        Filter: ((username)::text ~~ 'a%'::text)
        ->  Bitmap Index Scan on c_username_unique  (cost=0.00..388.96
rows=12466 width=0) (actual time=5.652..5.652 rows=17861 loops=1)
              Index Cond: (((username)::text >= 'a'::character varying)
AND ((username)::text < 'b'::character varying))
 Total runtime: 61.824 ms


Any ideas?


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

First, use iostat or another tool  to view the disk usage on the new machine during these queries and validate that it is not using the disk at all.  This is most likely the case.

Then, to be sure, set its config parameters to be equal to the old one, and turn off auto-vacuum.  This will also most likely have no effect.

Once this is confirmed, we can be pretty sure that the issue is restricted to:
CPU / RAM / Motherboard on the hardware side.  There may still be some software effects in the OS or drivers, or PostgreSQL to account for, but lets drill into the hardware and try and eliminate that first.

Sure, the processor should be faster, but Athlon64s / Opterons are very sensitive to the RAM used and its performance and tuning.
So, you should find some basic CPU benchmarks and RAM benchmarks -- you'll want to measure latency as well as bandwidth.
Athlon64 and Opteron both typically have two memory busses per processor, and it is possible to populate the memory banks in such a way that the system has half the bandwidth.
In any event, you'll first want to identify if simple benchmark software is able to prove a disparity between the systems independant of postgres.  This may be a bit difficult to do on the live system however. 

But it is my suspicion that Postgres performance is often more dependant on the memory subsystem performance than the CPU Mhz (as are most databases) and poor components, configuration, or tuning on that side would show up in queries like the examples here.

Re: "Mysterious" issues with newly installed 8.3

От
Craig James
Дата:
Scott Carey wrote:
> On Thu, Oct 9, 2008 at 4:51 PM, Carlos Moreno <morenopg@mochima.com
> <mailto:morenopg@mochima.com>> wrote:
>
>
>     Ok, I know that such an open and vague question like this one
>     is...  well, open and vague...  But still.
>
>     The short story:
>
>     Just finished an 8.3.4 installation on a new machine, to replace
>     an existing one;  the new machine is superior (i.e., higher
>     performance) in virtually every way --- twice as much memory,
>     faster processor, faster drives, etc.
>
>     I made an exact copy of the existing database on the new
>     machine, and the exact same queries run on both reveal that
>     the old machine beats the new one by a factor of close to 2 !!!!
>     (i.e., the same queries run close to twice as fast on the old
>     machine!!!)
>
>     To make things worse:  the old machine is in operation, under
>     normal workload  (and right now the system may be around
>     peak time), and the new machine is there sitting doing nothing;
>     just one user logged in using psql to run the queries --- *no-one
>     and nothing* is connecting to the new server.
>
>     So... What's going on???

Did you do an ANALYZE on the new database after you cloned it?  I was suprised by this too, that after doing a
pg_dump/pg_restore,the performance sucked.  But it was simply because the new database had no statistics yet. 

Craig

Re: "Mysterious" issues with newly installed 8.3

От
"Scott Marlowe"
Дата:
The first thing I'd try is installing 8.2 on the new server to see if
the problem is the server or postgresql.  Set up the new server and
new pgsql install the same and see how it runs.

Re: "Mysterious" issues with newly installed 8.3

От
Greg Smith
Дата:
On Thu, 9 Oct 2008, Craig James wrote:

> Did you do an ANALYZE on the new database after you cloned it?

He ran "vacuumdb -f -z", the -z does an analyze.  Also, he's getting
nearly identical explain plans out of the two systems, which suggests the
stats are similar enough in the two cases.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: "Mysterious" issues with newly installed 8.3

От
Greg Smith
Дата:
On Thu, 9 Oct 2008, Scott Carey wrote:

> Sure, the processor should be faster, but Athlon64s / Opterons are very
> sensitive to the RAM used and its performance and tuning. So, you should
> find some basic CPU benchmarks and RAM benchmarks -- you'll want to
> measure latency as well as bandwidth. Athlon64 and Opteron both
> typically have two memory busses per processor, and it is possible to
> populate the memory banks in such a way that the system has half the
> bandwidth.

This is really something to watch out for.  One quick thing first though:
what frequency does the CPU on the new server show when you look at
/proc/cpuinfo?  If you see "cpu MHz:  1000.00" you probably are throttling
the CPU down hard with power management which was cause the slowness you
describe.  In that case I'd suggest editing /etc/sysconfig/cpuspeed and
changing "GOVERNER=performance".

Back to memory.  What I do with any new, untrusted system is boot with a
memtest86+ CD is take a look at the memory speed information it shows,
with the most important number being the uncached RAM speed.  If you're
not running in dual-channel mode and at the maximum frequency the RAM
supports, that can run seriously slow things down.  You probably can't
take down the production server for comparison.  I can tell you that on my
little Athlon X2@2.4GHz server, I've seen the memtest86+ reported raw
memory speed run anywhere from 2093MB/s (with crummy DDR2 667 that doesn't
match the CPU bus frequency very well) to 3367MB/s (using good DDR2 800).
You should see even better from your Opteron system.

Another really handy way to gauge memory speed on Linux, if there are
similar kernels installed on each system like your case, is to use "hdparm
-T".  That cached read figure is highly correlated with overall memory
performance.  The nice part about that is you can probably get a useful
comparison result from the old server if you run that a bunch of times
even with other activity (just take the highest number you ever see),
whereas memtest86+ requires some downtime.  Those numbers are lower than
the I'd expect around 2500MB/s out of your new server here (that's what I
got when I just tested an Opteron 2220 system @2.8GHz using the RHEL5
hdparm -T).

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: "Mysterious" issues with newly installed 8.3

От
Carlos Moreno
Дата:
Thanks Greg and others for your replies,

> This is really something to watch out for.  One quick thing first
> though: what frequency does the CPU on the new server show when you
> look at /proc/cpuinfo?  If you see "cpu MHz:  1000.00"

It was like that in the initial setup --- I just disabled the cpuspeed
service  (service cpuspeed stop;  chkconfig cpuspeed off ), and now
it shows the full 2600MHz at all times  (the installation of PG was
done after this change)

> Another really handy way to gauge memory speed on Linux, if there are
> similar kernels installed on each system like your case, is to use
> "hdparm -T".

Great tip!  I was familiar with the -T switch, but was not clear on the
notion that the figure tells you that much about the overall memory
performance!

Anyway, I checked on both, and the new system is slightly superior
(around 2200 for the new, around 1900 for the old one) --- a bit below
the figure you mention you'd expect  (2500 --- though that was for a
2.8GHz Opteron, presumably with faster FSB and faster memory??)

I guess my logical next step is what was suggested by Scott --- install
8.2.4 and repeat the same tests with this one;  that should give me
interesting information.

Anyway, if I find something interesting or puzzling, I would post again
with the results of those tests.

Thanks again for the valuable advice and comments!

Carlos
--


Re: "Mysterious" issues with newly installed 8.3

От
"Scott Marlowe"
Дата:
On Mon, Oct 13, 2008 at 8:55 AM, Carlos Moreno <morenopg@mochima.com> wrote:
> I guess my logical next step is what was suggested by Scott --- install
> 8.2.4 and repeat the same tests with this one;  that should give me
> interesting information.

I'd suggest updating to the latest 8.2.x update as well.  Not for
performance tuning reasons but to make sure you're data's not at risk
etc... I think there's a good year or more of updates missing from the
8.2.4 branch.

Re: "Mysterious" issues with newly installed 8.3

От
Carlos Moreno
Дата:
Scott Marlowe wrote:
> On Mon, Oct 13, 2008 at 8:55 AM, Carlos Moreno <morenopg@mochima.com> wrote:
>
>> I guess my logical next step is what was suggested by Scott --- install
>> 8.2.4 and repeat the same tests with this one;  that should give me
>> interesting information.
>>
>
> I'd suggest updating to the latest 8.2.x update as well.  Not for
> performance tuning reasons but to make sure you're data's not at risk
> etc... I think there's a good year or more of updates missing from the
> 8.2.4 branch.
>


Of course --- but do keep in mind that the reason for this was to
do a meanigful comparison;  SQLs being run and clicked on an
8.2.4 system vs. the same SQLs being run on a different hardware
with the exact same software.  If for some reason I conclude that
the 8.2 seems to offer better performance with the given hardware,
then of course I would go with the latest 8.2.x ...

If you're referring to the existing installation, well, yeah, I've been
meaning to upgrade it, but I guess now that we are going with a
hardware upgrade as well, then the software upgrade will be a
side-effect of the maneuver.

Thanks,

Carlos
--


Re: "Mysterious" issues with newly installed 8.3

От
Greg Smith
Дата:
On Mon, 13 Oct 2008, Carlos Moreno wrote:

>> Another really handy way to gauge memory speed on Linux, if there are
>> similar kernels installed on each system like your case, is to use
>> "hdparm -T".
>
> Great tip!  I was familiar with the -T switch, but was not clear on the
> notion that the figure tells you that much about the overall memory
> performance!

I wouldn't go so far as to say it tells you *much* about it, but it does
give a fairly useful comparison figure if the kernels are basically the
same and can help spot gross errors.  I used it a bunch when I was
tinkering with DDR speeds and such earlier this year, it correlated fairly
well with other memory bandwidth measurements within the same processor
family (so far my tests suggestion results are much higher per clock on
Intel CPUs).  Certainly of no use for comparison if one system has a
32-bit kernel and the other 64, and results will vary depending on general
kernel configuration (I get very different results from seemingly similar
RedHat and Ubuntu kernels on the same system for example).

It sounds like your CPU and memory setup are all fine, which leaves your
mystery open.  Please let us know if you find anything interesting out,
normally an 8.3 upgrade would run faster so your situation is a bit
curious.  The only other benchmark I'd suggest just as a sanity check is
bonnie++, that's a bit more thorough than what hdparm -t reports.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD