Discussion: Compared MS SQL 2000 to Postgresql 9.0 on Windows

From:
Tom Polak
Date:

We are in the process of deciding how to proceed on a database upgrade.  We currently have MS SQL 2000 running on Windows 2003 (on my test server).  I was shocked at the cost of MS SQL 2008 R2 for a new server (2 CPU license).  I started comparing DBs and came across postgresql.  It seemed to be exactly what I was after.  All of our programming is in ASP.net.  Since I am running MSSQL 2000 I get no benefit from .Net integration, so it is not a concern.

 

I ran a head to head test of MS SQL 2000 and Postgresql 9.0.  Both are running on Windows 2003.  What I found was quite surprising and I am wondering if anyone can point out what is going on here. 
Here is the test I ran. 
I created 2 tables, the main table had 5 fields with a serial ID field.  The second table linked to table 1 for a state field.

I had ASP.net via MSSQL create 1,000 records in the main table. Took 9.85 seconds to complete.
Next I had ASP.net via Postgresql create 1,000 records.  Took .65625 seconds.
Postgresql smoked MS SQL server on that test.

 

Next test is to use ASP.net and join all 1,000 rows with table 2 and then display the text out.

MS SQL took 0.76 seconds to display
select name,address,city,state,statename,stateid,other from pgtemp1 left join pgtemp2 on state=stateid

 

Then I did the same test via Postgresql and it took 8.85 seconds!  I tried it again as I thought I did something wrong.  I did a few tweaks such as increasing the shared buffers.  Still, the best I could get it to was 7.5 seconds.  This is insanely slow compared to MSSQL 2000.  What am I missing?  Here is my SQL statement for postgresql:
select name,address,city,state,statename,stateid,other from pgtemp1 left join pgtemp2 on state=stateid

 

Any ideas on why the Postgres server is soooo much slower on the joins?   I am trying to understand what is going on here so please don’t flame me.  Any advice is appreciated. 

 

 

Thanks,
Tom Polak
Rockford Area Association of Realtors

The information contained in this email message is intended only for the use of the individual or entity named.  If the reader of this email is not the intended recipient or the employee or agent responsible for delivering it to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this email is strictly prohibited.  If you have received this email in error, please immediately notify us by telephone and reply email.  Thank you.

Although this email and any attachments are believed to be free of any viruses or other defects that might affect any computer system into which it is received and opened, it is the responsibility of the recipient to ensure that it is free of viruses, and the Rockford Area Association of Realtors hereby disclaims any liability for any loss or damage that results.

 

From:
"Kevin Grittner"
Date:

Tom Polak <> wrote:

> the best I could get it to was 7.5 seconds.

> select name,address,city,state,statename,stateid,other from
> pgtemp1 left join pgtemp2 on state=stateid

We'd need a lot more information.  Please read this and post again:

http://wiki.postgresql.org/wiki/SlowQueryQuestions

Be sure to include hardware info, postgresql.conf settings
(excluding comments), table layouts including indexes and
constraints, and the results of:

EXPLAIN ANALYZE select ...

-Kevin

From:
Andy Colson
Date:

On 12/7/2010 11:34 AM, Tom Polak wrote:
> We are in the process of deciding on how to proceed on a database
> upgrade.  We currently have MS SQL 2000 running on Windows 2003 (on my
> test server).  I was shocked at the cost for MS SQL 2008 R2 for a new
> server (2 CPU license).  I started comparing DB’s and came across
> postgresql.  It seemed to be exactly what I was after.  All of our
> programming is in ASP.net.  Since I am running MSSQL 2000 I have no
> benefit for .Net integration, so it is not a concern.
>
> I ran a head to head test of MS SQL 2000 and Postgresql 9.0.  Both are
> running on Windows 2003.  What I found was quite surprising and I am
> wondering if anyone can point out what is going on here.
> Here is the test I ran.
> I created 2 tables, the main table had 5 fields with a serial ID field.
> The second table linked to table 1 for a state field.
>
> I had ASP.net via MSSQL create 1,000 records in the main table. Took
> 9.85 seconds to complete.
> Next I had ASP.net via Postgresql create 1,000 records.  Took .65625
> seconds.
> Postgresql smoked MS SQL server on that test.

Did you play with the postgresql.conf file?  Maybe turn off fsync?  I'd
guess the difference above is that mssql is flushing to disk while PG isn't.
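For reference, the setting in question lives in postgresql.conf; shown below with its default (turning it off trades crash safety for speed, so it is only sensible for a throwaway benchmark):

```
fsync = on      # flush WAL to disk at commit; 'off' is faster but unsafe on crash
```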

>
> Next test is to use ASP.net and join all 1,000 rows with table 2 and
> then display the text out.
>
> MS SQL took 0.76 seconds to display
> select name,address,city,state,statename,stateid,other from pgtemp1 left
> join pgtemp2 on state=stateid
>
> Then I did the same test via Postgresql and it took 8.85 seconds!  I
> tried it again as I thought I did something wrong.  I did a few tweaks
> such as increasing the shared buffers.  Still the best I could get it to
> was 7.5 seconds.  This is insanely slow compared to MSSQL 2000.  What am
> I missing.  Here is my SQL statement for postgresql:
> select name,address,city,state,statename,stateid,other from pgtemp1 left
> join pgtemp2 on state=stateid
>
> Any ideas on why the Postgres server is soooo much slower on the
> joins?   I am trying to understand what is going on here so please don’t
> flame me.  Any advice is appreciated.
>

Did you create an index?  That'd be my first guess.  Also, can you run
the sql from the command line client (psql) and see if it takes that
long?  While you're in psql, stick an 'explain analyze' in front of your
query, and let's see its output.
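For the tables in the query, that would look something like this (the index name is illustrative; the column comes from the join condition):

```sql
-- index the join column of the lookup table
CREATE INDEX pgtemp2_stateid_idx ON pgtemp2 (stateid);

-- then run the query through the planner with actual timings
EXPLAIN ANALYZE
SELECT name, address, city, state, statename, stateid, other
FROM pgtemp1 LEFT JOIN pgtemp2 ON state = stateid;
```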

Also, as a fair warning: mssql doesn't really care about transactions,
but PG really does.  Make sure all your code is properly starting and
committing transactions.

-Andy

From:
Andres Freund
Date:

On Tuesday 07 December 2010 18:34:25 Tom Polak wrote:
> Then I did the same test via Postgresql and it took 8.85 seconds!  I tried
> it again as I thought I did something wrong.  I did a few tweaks such as
> increasing the shared buffers.  Still the best I could get it to was 7.5
> seconds.  This is insanely slow compared to MSSQL 2000.  What am I missing.
> Here is my SQL statement for postgresql:
> select name,address,city,state,statename,stateid,other from pgtemp1 left
> join pgtemp2 on state=stateid
I think you should at least provide the exact schema and possibly some example
data (pg_dump) to get us anywhere.

I would suggest you post the output of EXPLAIN ANALYZE $yourquery - that gives
us information about how that query was executed.

Greetings,

Andres

From:
Craig James
Date:

On 12/7/10 9:34 AM, Tom Polak wrote:
> We are in the process of deciding on how to proceed on a database upgrade.  We currently have MS SQL 2000 running on Windows 2003 (on my test server).  I was shocked at the cost for MS SQL 2008 R2 for a new server (2 CPU license).  I started comparing DB’s and came across postgresql.  It seemed to be exactly what I was after.  All of our programming is in ASP.net.  Since I am running MSSQL 2000 I have no benefit for .Net integration, so it is not a concern.
>
> I ran a head to head test of MS SQL 2000 and Postgresql 9.0.  Both are running on Windows 2003.  What I found was quite surprising and I am wondering if anyone can point out what is going on here.
> Here is the test I ran.
> I created 2 tables, the main table had 5 fields with a serial ID field.  The second table linked to table 1 for a state field.

Did you run ANALYZE on the database after creating it and loading the data?  If not, do it and try again (along with the other suggestions you'll get here).  ANALYZE gathers the statistics that allow the planner to do its job.  Without statistics, all bets are off.
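For the two test tables in this thread, that is just:

```sql
ANALYZE pgtemp1;
ANALYZE pgtemp2;
```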

Craig

> I had ASP.net via MSSQL create 1,000 records in the main table. Took 9.85 seconds to complete.
> Next I had ASP.net via Postgresql create 1,000 records.  Took .65625 seconds.
> Postgresql smoked MS SQL server on that test.
>
> Next test is to use ASP.net and join all 1,000 rows with table 2 and then display the text out.
>
> MS SQL took 0.76 seconds to display
> select name,address,city,state,statename,stateid,other from pgtemp1 left join pgtemp2 on state=stateid
>
> Then I did the same test via Postgresql and it took 8.85 seconds!  I tried it again as I thought I did something wrong.  I did a few tweaks such as increasing the shared buffers.  Still the best I could get it to was 7.5 seconds.  This is insanely slow compared to MSSQL 2000.  What am I missing?  Here is my SQL statement for postgresql:
> select name,address,city,state,statename,stateid,other from pgtemp1 left join pgtemp2 on state=stateid
>
> Any ideas on why the Postgres server is soooo much slower on the joins?   I am trying to understand what is going on here so please don’t flame me.  Any advice is appreciated.
>
> Thanks,
> Tom Polak
> Rockford Area Association of Realtors


From:
Andy Colson
Date:

On 12/7/2010 1:22 PM, Justin Pitts wrote:
>>
>> Also, as a fair warning: mssql doesn't really care about transactions, but
>> PG really does.  Make sure all your code is properly starting and commiting
>> transactions.
>>
>> -Andy
>
> I do not understand that statement. Can you explain it a bit better?

In mssql you can write code that connects to the db, fires off updates
and inserts, and then disconnects.  I believe mssql will keep all your
changes, and the transaction stuff is done for you.

In PG the first statement you fire off (like an "insert into" for
example) will start a transaction.  If you don't commit before you
disconnect, that transaction will be rolled back.  Even worse, if your
program does not commit but keeps the connection to the db open, the
transaction will stay open too.

There are differences in the way mssql and pg do transactions.  mssql
uses a transaction log and keeps current data in the table.  In mssql, if
you open a transaction and write a bunch of stuff, the table contains
that new stuff and everyone can see it.  (I think the default transaction
isolation level is read committed.)  But if you set your isolation level
to something with repeatable read, then your program will block and have
to wait on every little change to the table (or, probably, page... I
think mssql has page-level locking?).

Anyway, in PG multiple versions of the same row are kept, and when you
open, and keep open, a transaction, PG has to keep a version of the row
for every change that other people make.  So a long-lasting transaction
could create hundreds of versions of one row.  Then when somebody goes
to select against that table, it has to scan not only the rows, but
every version of every row!

So my point is, in PG, use transactions as they were meant to be used,
as single atomic operations.  Start, do some work, commit.

mssql made it easy to ignore transactions by doing it for you.  Ignoring
transactions in PG will hurt you.

you can google MVCC and "postgres idle in transaction" for more.
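Using the first test table from this thread, that pattern is simply:

```sql
BEGIN;
INSERT INTO pgtemp1 (name, address, city, state)
    VALUES ('Tom1', '1231 main st', 'rockford', 1);
COMMIT;
```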

-Andy

From:
Richard Broersma
Date:

On Tue, Dec 7, 2010 at 11:43 AM, Andy Colson <> wrote:

> In PG the first statement you fire off (like an "insert into" for example)
> will start a transaction.  If you dont commit before you disconnect that
> transaction will be rolled back.  Even worse, if your program does not
> commit, but keeps the connection to the db open, the transaction will stay
> open too.

Huh - is this new?  I always thought that every statement was wrapped
in its own transaction unless you explicitly start your own.  So you
shouldn't need to commit before closing a connection if you never
opened a transaction to begin with.


--
Regards,
Richard Broersma Jr.

From:
Kenneth Marshall
Date:

On Tue, Dec 07, 2010 at 11:56:51AM -0800, Richard Broersma wrote:
> On Tue, Dec 7, 2010 at 11:43 AM, Andy Colson <> wrote:
>
> > In PG the first statement you fire off (like an "insert into" for example)
> > will start a transaction.  If you don't commit before you disconnect that
> > transaction will be rolled back.  Even worse, if your program does not
> > commit, but keeps the connection to the db open, the transaction will stay
> > open too.
>
> Huh - is this new?  I always thought that every statement was wrapped
> in its own transaction unless you explicitly start your own.  So you
> shouldn't need to commit before closing a connection if you never
> opened a transaction to begin with.
>
>
> --
> Regards,
> Richard Broersma Jr.
>

The default of autocommit unless explicitly starting a transaction with
BEGIN is the normal behavior that I have seen as well.

Cheers,
Ken

From:
Gary Doades
Date:

On 07/12/2010 7:43 PM, Andy Colson wrote:
> On 12/7/2010 1:22 PM, Justin Pitts wrote:
>>>
>>> Also, as a fair warning: mssql doesn't really care about
>>> transactions, but
>>> PG really does.  Make sure all your code is properly starting and
>>> commiting
>>> transactions.
>>>
>>> -Andy
>>
>> I do not understand that statement. Can you explain it a bit better?
>
> In mssql you can write code that connects to the db, fire off updates
> and inserts, and then disconnects.  I believe mssql will keep all your
> changes, and the transaction stuff is done for you.
>
> In PG the first statement you fire off (like an "insert into" for
> example) will start a transaction.  If you dont commit before you
> disconnect that transaction will be rolled back.  Even worse, if your
> program does not commit, but keeps the connection to the db open, the
> transaction will stay open too.
As far as I know both MS SQL and Postgres work just the same as
regards explicit and implicit (autocommit) transactions; only the
underlying storage/logging mechanisms are different.

Transactions shouldn't make any real difference to the select/join
performance being complained about though.  It's already been stated that
the insert performance of postgres far exceeds SQL Server, which is my
experience also.

As already suggested, until we see the exact table definitions including
indexes etc. there's no real way to tell what the problem is. How many
rows are in the second table? It really shouldn't take that much time to
read 1000 rows unless you have a bizarrely slow hard disk.

It would be nice to eliminate any programmatic or driver influence too.
How does the SQL select execute in enterprise manager for mssql and psql
or pgadmin for postgres?
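For the Postgres side, psql can time the raw statement itself (\timing is a psql meta-command):

```sql
\timing on
SELECT name, address, city, state, statename, stateid, other
FROM pgtemp1 LEFT JOIN pgtemp2 ON state = stateid;
```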

Cheers,
Gary.


From:
Mladen Gogala
Date:

Tom Polak wrote:
>
> We are in the process of deciding on how to proceed on a database
> upgrade.  We currently have MS SQL 2000 running on Windows 2003 (on my
> test server).  I was shocked at the cost for MS SQL 2008 R2 for a new
> server (2 CPU license).  I started comparing DB’s and came across
> postgresql.  It seemed to be exactly what I was after.  All of our
> programming is in ASP.net.  Since I am running MSSQL 2000 I have no
> benefit for .Net integration, so it is not a concern.
>
>
>
> I ran a head to head test of MS SQL 2000 and Postgresql 9.0.  Both are
> running on Windows 2003.  What I found was quite surprising and I am
> wondering if anyone can point out what is going on here.
> Here is the test I ran.
> I created 2 tables, the main table had 5 fields with a serial ID
> field.  The second table linked to table 1 for a state field.
>
> I had ASP.net via MSSQL create 1,000 records in the main table. Took
> 9.85 seconds to complete.
> Next I had ASP.net via Postgresql create 1,000 records.  Took .65625
> seconds.
> Postgresql smoked MS SQL server on that test.
>
>
>
> Next test is to use ASP.net and join all 1,000 rows with table 2 and
> then display the text out.
>
> MS SQL took 0.76 seconds to display
> select name,address,city,state,statename,stateid,other from pgtemp1
> left join pgtemp2 on state=stateid
>
>
>
> Then I did the same test via Postgresql and it took 8.85 seconds!  I
> tried it again as I thought I did something wrong.  I did a few tweaks
> such as increasing the shared buffers.  Still the best I could get it
> to was 7.5 seconds.  This is insanely slow compared to MSSQL 2000.
> What am I missing.  Here is my SQL statement for postgresql:
> select name,address,city,state,statename,stateid,other from pgtemp1
> left join pgtemp2 on state=stateid
>
>
>
> Any ideas on why the Postgres server is soooo much slower on the
> joins?   I am trying to understand what is going on here so please
> don’t flame me.  Any advice is appreciated.
>
>
>
>
>
>
Are all structures the same? Are all indexes the same? What does
"explain analyze" tell you?


--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions




From:
Andy Colson
Date:

On 12/7/2010 2:10 PM, Kenneth Marshall wrote:
> On Tue, Dec 07, 2010 at 11:56:51AM -0800, Richard Broersma wrote:
>> On Tue, Dec 7, 2010 at 11:43 AM, Andy Colson<>  wrote:
>>
>>> In PG the first statement you fire off (like an "insert into" for example)
>>> will start a transaction.  If you don't commit before you disconnect that
>>> transaction will be rolled back.  Even worse, if your program does not
>>> commit, but keeps the connection to the db open, the transaction will stay
>>> open too.
>>
>> Huh - is this new?  I always thought that every statement was wrapped
>> in its own transaction unless you explicitly start your own.  So you
>> shouldn't need to commit before closing a connection if you never
>> opened a transaction to begin with.
>>
>>
>> --
>> Regards,
>> Richard Broersma Jr.
>>
>
> The default of autocommit unless explicitly starting a transaction with
> BEGIN is the normal behavior that I have seen as well.
>
> Cheers,
> Ken

Crikey!  You're right.  I need to be more careful with my assumptions.

I maintain that people need to be more careful with pg transactions.
I've seen several posts about "idle in transaction".  But it's not as bad
as I made out.  My confusion comes from the library I use to hit PG,
which fires off a "begin" for me, and if I don't explicitly commit, it
gets rolled back.

Sorry, I was confused between the framework and PG.

-Andy

From:
Tom Polak
Date:

What I was really after was a quick comparison between the two.  I did not
create anything special, just the two tables.  SQL generated the records
for one of the tables for me.  I did not tweak anything after installing
either system.  There was a primary key on the ID field of both tables,
but no indexes in either system.  The second table had 1 record in it.
The hardware it is running on is fairly good: dual Xeon CPUs, 4 GB of RAM,
RAID 5.  Btw, the cost for MS SQL 2008 R2 is ~$14,000 for 2 CPUs,
http://www.cdw.com/shop/products/default.aspx?EDC=2167810 .  That is why I
am pursuing this.  :)

Here is the ASP.net code that I was running:

Dim starttime As Date = Date.Now
Dim endtime As Date
Dim reader As NpgsqlDataReader
Dim output2 As String = ""

Dim oConn As New NpgsqlConnection("Server=192.168.1.5;Port=5432;Userid=postgres;Password=12345;Protocol=3;SSL=false;Pooling=true;MinPoolSize=1;MaxPoolSize=20;Encoding=UNICODE;Timeout=15;SslMode=Disable;Database=tomtemp")
oConn.Open()
Dim x As Integer = 0
'For x = 0 To 1000 'uncomment to insert records.
'Dim command As New NpgsqlCommand("insert into pgtemp1(name, address, city, state) values ('Tom" & x & "','123" & x & " main st','rockford',1) ", oConn) 'meant for loop to put in 1,000 records in pgtemp1 table
'Dim command As New NpgsqlCommand("insert into pgtemp2(statename, stateid, other) values ('Illinois',1,'This is a lot of fun') ", oConn) 'only sends 1 record into the table pgtemp2
'command.ExecuteNonQuery()
'Next

'join tables and read 1000 rows.
Dim command As New NpgsqlCommand("select name,address,city,state,statename,stateid,other from pgtemp1 left join pgtemp2 on state=stateid", oConn)
reader = command.ExecuteReader()
While reader.read()
    output2 += "<tr><td>" & reader("name") & "</td><td>" & reader("address") & "</td><td>" & reader("city") & "</td><td>" & reader("statename") & "</td><td>" & reader("other") & "</td></tr>"
End While
oConn.Close()
readeroutput.text = "<table><tr><td>Name:</td><td>Address:</td><td>City:</td><td>State</td><td>Other</td></tr>" & output2 & "</table>"

endtime = Date.Now
Dim runtime As String
runtime = endtime.Subtract(starttime).TotalSeconds
output.text = starttime.ToString & " " & runtime

The SQL is a straight convert from MS SQL code.  I did not tweak either
system.

From EXPLAIN ANALYZE I can see the query ran much faster.
"Nested Loop Left Join  (cost=0.00..138.04 rows=1001 width=1298) (actual
time=0.036..4.679 rows=1001 loops=1)"
"  Join Filter: (pgtemp1.state = pgtemp2.stateid)"
"  ->  Seq Scan on pgtemp1  (cost=0.00..122.01 rows=1001 width=788)
(actual time=0.010..0.764 rows=1001 loops=1)"
"  ->  Materialize  (cost=0.00..1.01 rows=1 width=510) (actual
time=0.000..0.001 rows=1 loops=1001)"
"        ->  Seq Scan on pgtemp2  (cost=0.00..1.01 rows=1 width=510)
(actual time=0.006..0.008 rows=1 loops=1)"
"Total runtime: 5.128 ms"

The general question comes down to: can I expect decent performance from
Postgresql compared to MSSQL?  I was hoping that Postgresql 9.0 would beat
MSSQL 2000, since MSSQL 2000 is over 10 years old.

Thanks,
Tom Polak
Rockford Area Association of Realtors
815-395-6776 x203




From:
Craig James
Date:

On 12/7/10 1:29 PM, Tom Polak wrote:
> What I was really after was a quick comparison between the two.  I did not
> create anything special, just the two tables.  One table SQL generated the
> records for me.  I did not tweak anything after installing either system.

That's not a valid test.  Postgres is NOT intended to be used out of the box.  The default parameters aren't useful.

> There was a primary key on the ID field of both tables, no indexes though
> in either system.  The second table had 1 record in it.  The hardware it
> is running on is fairly good, dual Xeon CPUs, 4 GB of RAM, Raid 5.  Btw,
> the cost for MS SQL 2008 R2 is ~$14,000 for 2 cpus,
> http://www.cdw.com/shop/products/default.aspx?EDC=2167810 .  That is why I
> am pursuing this.  :)
>
> Here is the ASP.net code that I was running
> Dim starttime As Date = Date.Now
>          Dim endtime As Date
>          Dim reader As NpgsqlDataReader
>          Dim output2 As String = ""
>
>
>          Dim oConn As New
> NpgsqlConnection("Server=192.168.1.5;Port=5432;Userid=postgres;Password=12
> 345;Protocol=3;SSL=false;Pooling=true;MinPoolSize=1;MaxPoolSize=20;Encodin
> g=UNICODE;Timeout=15;SslMode=Disable;Database=tomtemp")
>          oConn.Open()
>          Dim x As Integer = 0
>          'For x = 0 To 1000 'uncomment to insert records.
>          'Dim command As New NpgsqlCommand("insert into pgtemp1(name,
> address, city, state) values ('Tom"&  x&  "','123"&  x&  " main
> st','rockford',1) ", oConn) 'meant for loop to put in 1,000 records in
> pgtemp1 table
>          'Dim command As New NpgsqlCommand("insert into pgtemp2(statename,
> stateid, other) values ('Illinois',1,'This is a lot of fun') ", oConn)
> 'only sends 1 record into the table pgtemp2
>          'command.ExecuteNonQuery()
>          'Next

You still haven't done an ANALYZE sql statement after filling your tables with data.  You should execute "analyze pgtemp1" and "analyze pgtemp2" before you do any performance tests.  Otherwise your results are meaningless.

Craig

>
>          'join table and read 1000 rows.
>          Dim command As New NpgsqlCommand("select
> name,address,city,state,statename,stateid,other from pgtemp1 left join
> pgtemp2 on state=stateid", oConn)
>          reader = command.ExecuteReader()
>          While reader.read()
>              output2 += "<tr><td>"&  reader("name")&  "</td><td>"&
> reader("address")&  "</td><td>"&  reader("city")&  "</td><td>"&
> reader("statename")&  "</td><td>"&  reader("other")&  "</td></tr>"
>          End While
>          oConn.Close()
>          readeroutput.text =
> "<table><tr><td>Name:</td><td>Address:</td><td>City:</td><td>State</td><td
>> Other</td></tr>"&  output2&  "</table>"
>
>          endtime = Date.Now
>          Dim runtime As String
>          runtime = endtime.Subtract(starttime).TotalSeconds
>          output.text = starttime.ToString&  " "&  runtime
>
> The SQL is a straight convert from MS SQL code.  I did not tweak either
> system.
>
>> From EXPLAIN ANALYZE I can see the query ran much faster.
> "Nested Loop Left Join  (cost=0.00..138.04 rows=1001 width=1298) (actual
> time=0.036..4.679 rows=1001 loops=1)"
> "  Join Filter: (pgtemp1.state = pgtemp2.stateid)"
> "  ->   Seq Scan on pgtemp1  (cost=0.00..122.01 rows=1001 width=788)
> (actual time=0.010..0.764 rows=1001 loops=1)"
> "  ->   Materialize  (cost=0.00..1.01 rows=1 width=510) (actual
> time=0.000..0.001 rows=1 loops=1001)"
> "        ->   Seq Scan on pgtemp2  (cost=0.00..1.01 rows=1 width=510)
> (actual time=0.006..0.008 rows=1 loops=1)"
> "Total runtime: 5.128 ms"
>
> The general question comes down to, can I expect decent perfomance from
> Postgresql compared to MSSQL.  I was hoping that Postgresql 9.0 beat MSSQL
> 2000 since MS 2000 is over 10 years old.
>
> Thanks,
> Tom Polak
> Rockford Area Association of Realtors
> 815-395-6776 x203


From:
Gary Doades
Date:

On 07/12/2010 9:29 PM, Tom Polak wrote:
>
>  From EXPLAIN ANALYZE I can see the query ran much faster.
> "Nested Loop Left Join  (cost=0.00..138.04 rows=1001 width=1298) (actual
> time=0.036..4.679 rows=1001 loops=1)"
> "  Join Filter: (pgtemp1.state = pgtemp2.stateid)"
> "  ->   Seq Scan on pgtemp1  (cost=0.00..122.01 rows=1001 width=788)
> (actual time=0.010..0.764 rows=1001 loops=1)"
> "  ->   Materialize  (cost=0.00..1.01 rows=1 width=510) (actual
> time=0.000..0.001 rows=1 loops=1001)"
> "        ->   Seq Scan on pgtemp2  (cost=0.00..1.01 rows=1 width=510)
> (actual time=0.006..0.008 rows=1 loops=1)"
> "Total runtime: 5.128 ms"
>
> The general question comes down to, can I expect decent performance from
> Postgresql compared to MSSQL.  I was hoping that Postgresql 9.0 beat MSSQL
> 2000 since MS 2000 is over 10 years old.
>
So postgres actually executed the select in around 5 milliseconds.
Pretty good, I would say.  The problem therefore lies not with postgres
itself, but with what is done with the results afterwards.  Assuming that
this is purely local, and therefore no network issues, perhaps there is a
performance issue in this case with the Npgsql driver?  Someone who knows
more about this driver could perhaps shed some light on it.

I have used .NET (C#) with postgres before, but only using the odbc
driver. Perhaps you could try that instead (using OdbcCommand,
OdbcDataReader etc.).

I mainly use ruby (jruby) with postgres both under linux and Windows,
but I can certainly process 1000 records of similar structure in well
under 1 second.

Cheers,
Gary.


От:
"Kevin Grittner"
Дата:

Tom Polak <> wrote:

> I did not tweak anything after installing either system.

PostgreSQL is set up with defaults such that it will start up and
run on the most ancient and underpowered system people are likely to
have lying around.  It is expected that people will tune it for
serious production use, although people often run for years before
they hit a case where the tuning makes enough of a difference that
they do something about it.  For guidelines see this page:

http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

You can get a quick comparison without doing any tuning, but it
won't tell you much about how something else compares to PostgreSQL
when it is configured for production use.

> The hardware it is running on is fairly good, dual Xeon CPUs, 4 GB
> of RAM, Raid 5.

For comparison, I would set shared_buffers to at least 200 MB,
effective_cache_size to 2 to 3 GB, and I would probably drop both
seq_page_cost and random_page_cost to 0.1, unless you actually
expect to be using a database large enough that the active portion
won't be cached.  (In that case, a test with tiny tables *really*
means nothing, though.)  There are other settings that will also
help.
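As a concrete starting point, the suggestions above might translate into postgresql.conf lines like these (values are illustrative for the 4 GB machine described, not a tuned recommendation):

```
# postgresql.conf sketch based on the suggestions above (4 GB RAM box);
# treat these as starting points, not tuned values
shared_buffers = 200MB          # 9.0's default is far smaller
effective_cache_size = 2GB      # approx. RAM the OS can use for caching
seq_page_cost = 0.1             # assume the active data set stays cached
random_page_cost = 0.1          # ditto; raise both if it spills to disk
```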

> "Nested Loop Left Join  (cost=0.00..138.04 rows=1001 width=1298)
> (actual time=0.036..4.679 rows=1001 loops=1)"

> "Total runtime: 5.128 ms"

The 0.036 ms is how long it took to produce the first row of the
result once it started running, 4.679 ms is the total run time, and
5.128 includes miscellaneous other time, such as planning time.  Of
course, the EXPLAIN ANALYZE adds some overhead, so the actual run
time would normally be faster, and with tuning it might be still
faster.

> The general question comes down to, can I expect decent performance
> from Postgresql compared to MSSQL.

That has been my experience.  There's something about your runtime
environment which isn't playing well with PostgreSQL.  If it were
me, I would make sure that as little of my stack as possible
depended on products provided by anyone with an interest in seeing
PostgreSQL look bad compared to the alternative.  I can think of at
least one company with fourteen thousand reasons to do so.

-Kevin

От:
"Pierre C"
Дата:

> The hardware it
> is running on is fairly good, dual Xeon CPUs, 4 GB of RAM, Raid 5.

For a database you'd want to consider replacing the RAID5 with a RAID1 (or
RAID10). RAID5 is slow for small random updates, which are common in
databases. Since you probably have enough hard disks anyway, this won't
cost you anything. Linux or FreeBSD would also be a better choice than
Windows for postgres.

Also, as said, your issue looks very much like a problem in the way your
application communicates with postgres : if it takes postgres 5 ms to
process the query and your application gets the result 8 seconds later,
there is a problem. Note that SQL Server probably takes just a few ms for
such a simple query, too, so you're not really benchmarking SQL Server
either.

От:
Tom Polak
Дата:

So, I am back on this topic again.
I have a related question, though this might not be the correct thread for
it (please let me know if so).  The boss is pressing the issue because of
the cost of MSSQL.

What kind of performance can I expect out of Postgres compared to MSSQL?
Let's assume that Postgres is running on Cent OS x64 and MSSQL is running
on Windows 2008 x64, both are on identical hardware running RAID 5 (for
data redundancy/security), SAS drives 15k RPM, dual XEON Quad core CPUs,
24 GB of RAM.  I have searched around and I do not see anyone ever really
compare the two in terms of performance.  I have learned from this thread
that Postgres needs a lot of configuration to perform the best.

We provide the MLS service to our members.  Our data goes back to 1997 and
nothing is ever deleted.  Here is a general overview of our current MSSQL
setup.  We have over 10GB of data in a couple of tables (no pictures are
stored in SQL server).  Our searches do a lot of joins to combine data to
display a listing, history, comparables, etc.  We probably do 3 or 4 reads
for every write in the database.

Any comparisons in terms of performance would be great.  If not, how can I
quickly truly compare the two systems myself without coding everything to
work for both?  Thoughts? Opinions?

Thanks,
Tom Polak
Rockford Area Association of Realtors
815-395-6776 x203

The information contained in this email message is intended only for the
use of the individual or entity named.  If the reader of this email is not
the intended recipient or the employee or agent responsible for delivering
it to the intended recipient, you are hereby notified that any
dissemination, distribution or copying of this email is strictly
prohibited.  If you have received this email in error, please immediately
notify us by telephone and reply email.  Thank you.

Although this email and any attachments are believed to be free of any
viruses or other defects that might affect any computer system into which
it is received and opened, it is the responsibility of the recipient to
ensure that it is free of viruses, and the Rockford Area Association of
Realtors hereby disclaims any liability for any loss or damage that
results.


От:
Craig James
Дата:

On 12/17/10 9:08 AM, Tom Polak wrote:
> So, I am back on this topic again.
> I have a related question, but this might be the correct thread (and
> please let me know that).  The boss is pressing the issue because of the
> cost of MSSQL.

You need to analyze the total cost of the system.  For the price of MSSQL
and Windows, you can probably buy a couple more really nice servers, or one
Really Big Server that would walk all over a Windows/MSSQL system of the
same total cost (hardware+software).

But that said, if Postgres is properly tuned and your application tuned to
make good use of Postgres' features, it will compare well with any modern
database.

> What kind of performance can I expect out of Postgres compare to MSSQL?
> Let's assume that Postgres is running on Cent OS x64 and MSSQL is running
> on Windows 2008 x64, both are on identical hardware running RAID 5 (for
> data redundancy/security), SAS drives 15k RPM, dual XEON Quad core CPUs,
> 24 GB of RAM.

RAID5 is a Really Bad Idea for any database.  It is S...L...O...W.  It does
NOT give better redundancy and security; RAID10 with a battery-backed RAID
controller card is massively better for performance and just as good for
redundancy and security.

Craig

От:
Richard Broersma
Дата:

On Fri, Dec 17, 2010 at 9:08 AM, Tom Polak <> wrote:
> Any comparisons in terms of performance would be great.  If not, how can I
> quickly truly compare the two systems myself without coding everything to
> work for both?  Thoughts? Opinions?

I can only offer anecdotal information.

If you strictly have an OLTP workload, with lots of simultaneous
connections issuing queries across small chunks of data, then
PostgreSQL would be a good match for SQL server.

On the other-hand, if some of your work load is OLAP with a few
connections issuing complicated queries across large chunks of data,
then PostgreSQL will not perform as well as SQL server.  SQL server
can divide the processing load of complicated queries across several
processors, while PostgreSQL cannot.

So, I guess it depends upon your workload.


--
Regards,
Richard Broersma Jr.

От:
Rob Wultsch
Дата:

On Fri, Dec 17, 2010 at 10:08 AM, Tom Polak
<> wrote:
> What kind of performance can I expect out of Postgres compare to MSSQL?

You should take any generalizations with a grain of salt. I suggest
that you do a POC.

> Let's assume that Postgres is running on Cent OS x64 and MSSQL is running
> on Windows 2008 x64, both are on identical hardware running RAID 5 (for
> data redundancy/security), SAS drives 15k RPM, dual XEON Quad core CPUs,
> 24 GB of RAM.

RAID-5 = suckage for databases.

Things to think about:
How big is your data set and how big is your working set?
Do you have a raid card? Is it properly configured?


--
Rob Wultsch


От:
Robert Haas
Дата:

On Fri, Dec 17, 2010 at 12:08 PM, Tom Polak
<> wrote:
> What kind of performance can I expect out of Postgres compare to MSSQL?
> Let's assume that Postgres is running on Cent OS x64 and MSSQL is running
> on Windows 2008 x64, both are on identical hardware running RAID 5 (for
> data redundancy/security), SAS drives 15k RPM, dual XEON Quad core CPUs,
> 24 GB of RAM.  I have searched around and I do not see anyone ever really
> compare the two in terms of performance.  I have learned from this thread
> that Postgres needs a lot of configuration to perform the best.

I think this is a pretty difficult question to answer.  There are
certainly people who are running databases on hardware like that -
even databases much bigger than yours - on PostgreSQL - and getting
acceptable performance.  But it does take some work.  In all fairness,
I think that if you started on PostgreSQL and moved to MS SQL (or any
other product), you'd probably need to make some adjustments going the
other direction to get good performance, too.  You're not going to
compare two major database systems across the board and find that one
of them is just twice as fast, across the board.  They have different
advantages and disadvantages.  When you're using one product, you
naturally do things in a way that works well for that product, and
moving to a different product means starting over.  Oh, putting this
in a stored procedure was faster on MS SQL, but it's slower on
PostgreSQL.  Using a view here was terrible on MS SQL, but much faster
under PostgreSQL.

The real answer here is that anything could be true for your workload,
and asking people on a mailing list to guess is a recipe for
disappointment.  You probably need to do some real benchmarking, and
PostgreSQL will be slower at first, and you'll tune it, and it's
LIKELY that you'll be able to achieve parity, or close enough that
it's worth it to save the $$$.  But you won't really know until you
try it, I think.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

От:
Andy Colson
Дата:

On 12/17/2010 11:08 AM, Tom Polak wrote:
> So, I am back on this topic again.
> I have a related question, but this might be the correct thread (and
> please let me know that).  The boss is pressing the issue because of the
> cost of MSSQL.
>
> What kind of performance can I expect out of Postgres compare to MSSQL?
> Let's assume that Postgres is running on Cent OS x64 and MSSQL is running
> on Windows 2008 x64, both are on identical hardware running RAID 5 (for
> data redundancy/security), SAS drives 15k RPM, dual XEON Quad core CPUs,
> 24 GB of RAM.  I have searched around and I do not see anyone ever really
> compare the two in terms of performance.  I have learned from this thread
> that Postgres needs a lot of configuration to perform the best.
>
> We provide the MLS service to our members.  Our data goes back to 1997 and
> nothing is ever deleted.  Here is a general overview of our current MSSQL
> setup.  We have over 10GB of data in a couple of tables (no pictures are
> stored in SQL server).  Our searches do a lot of joins to combine data to
> display a listing, history, comparables, etc.  We probably do 3 or 4 reads
> for every write in the database.
>
> Any comparisons in terms of performance would be great.  If not, how can I
> quickly truly compare the two systems myself without coding everything to
> work for both?  Thoughts? Opinions?
>
> Thanks,
> Tom Polak
> Rockford Area Association of Realtors
> 815-395-6776 x203

Most of the time, the database is not the bottle neck.  So find the spot
where your current database IS the bottleneck.  Then write a test that
kinda matches that situation.

Let's say it's 20 people doing an MLS lookup at the exact same time, while
an update is running in the background to copy in new data.

Then write a simple test (I use perl for my simple tests) for both
databases.  If PG can hold up to your worst case situation, then maybe
you'll be alright.
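A minimal concurrent-readers-plus-a-background-writer harness along those lines could look like this in Python (a sketch only; `run_query` is a stand-in for whatever driver call actually hits your database):

```python
import threading
import time

def run_query():
    # Stand-in for a real database call (e.g. an MLS lookup);
    # replace the sleep with your actual driver/query code.
    time.sleep(0.01)

def reader(n_queries, latencies):
    # Each simulated user runs n_queries and records per-query latency.
    for _ in range(n_queries):
        start = time.perf_counter()
        run_query()
        latencies.append(time.perf_counter() - start)

def bench(n_readers=20, n_queries=10):
    """Simulate n_readers concurrent users while one background update runs."""
    latencies = []
    writer = threading.Thread(target=run_query)   # the background "update"
    readers = [threading.Thread(target=reader, args=(n_queries, latencies))
               for _ in range(n_readers)]
    writer.start()
    for t in readers:
        t.start()
    for t in readers:
        t.join()
    writer.join()
    latencies.sort()
    return {"n": len(latencies),
            "median": latencies[len(latencies) // 2],
            "worst": latencies[-1]}
```

Run the same harness against both databases with a realistic `run_query` and compare the median and worst-case numbers, rather than a single cold run.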

Also:  Are you pegged right now?  Do you have slowness problems?  Even
if PG is a tad slower, will anybody even notice?  Maybe it's not worth
worrying about?  If your database isn't pegging the box, I'd bet you won't
even notice a switch.

The others who have answered have sound advice... but I thought I'd
say: I'm using raid-5!  Gasp!

It's true, I'm hosting maps with PostGIS, and the slowest part of the
process is the aerial imagery, which is HUGE.  The database queries sit
around 1% of my cpu.  I needed the disk space for the imagery.  The
imagery code uses more cpu than PG does.  The database is 98% read,
though, so my setup is different from yours.

My maps get 100K hits a day.  The CPUs never use more than 20%.  I'm
running on a $350 computer, AMD Dual core, with 4 IDE disks in software
raid-5.  On Slackware Linux, of course!

-Andy

От:
Andy Colson
Дата:

On 12/17/2010 11:37 AM, Robert Haas wrote:
> On Fri, Dec 17, 2010 at 12:08 PM, Tom Polak
> <>  wrote:

> other direction to get good performance, too.  You're not going to
> compare two major database systems across the board and find that one
> of them is just twice as fast, across the board.  They have different
> advantages and disadvantages.  When you're using one product, you
> naturally do things in a way that works well for that product, and
> moving to a different product means starting over.  Oh, putting this
> in a stored procedure was faster on MS SQL, but it's slower on
> PostgreSQL.  Using a view here was terrible on MS SQL, but much faster
> under PostgreSQL.
>

Yeah, totally agree with that.  Every database has its own personality,
and you have to work with it, its way.  Don't expect one bit of code to
work great on all the different databases.  You need 5 different bits of
code, one for each database.

In the end, can PG be fast?  Yes.  Very.  But only when you treat it as
PG.  If you try to use PG as if it were mssql, you won't be a happy camper.

-Andy

От:
Tom Polak
Дата:

> The real answer here is that anything could be true for your workload,
> and asking people on a mailing list to guess is a recipe for
> disappointment.  You probably need to do some real benchmarking, and
> PostgreSQL will be slower at first, and you'll tune it, and it's LIKELY
> that you'll be able to achieve parity, or close enough that it's worth it
> to save the $$$.  But you won't really know until you try it, I think.

That is what I am really after.  I know that it will be a lot of work, but
at $15,000 for MSSQL server that is a lot of man hours.  Before I invest a
lot of time to do some real benchmarking I need to make sure it would be
worth my time.  I realize going into this that we will need to change
almost everything except maybe the simplest SELECT statements.

> How big is your data set and how big is your working set?
> Do you have a raid card? Is it properly configured?

The data set can get large.  Just think of a real estate listing.  When we
display a Full View, EVERYTHING must be pulled from the database.
Sometimes we are talking about 75-100 fields if not more.  We can have up
to 300 members logged in to the system at one time doing various tasks (we
usually peak at about 30-50 requests per second).

The servers would be running on a RAID hardware solution, so it would all
be offloaded from the CPU.  I will have to check out RAID 10 for the next
server.


Thanks for all your help and opinions.

Thanks,
Tom Polak
Rockford Area Association of Realtors




От:
Robert Haas
Дата:

On Fri, Dec 17, 2010 at 12:49 PM, Tom Polak
<> wrote:
> That is what I am really after.  I know that it will be a lot of work, but
> at $15,000 for MSSQL server that is a lot of man hours.  Before I invest a
> lot of time to do some real benchmarking I need to make sure it would be
> worth my time.  I realize going into this that we will need to change
> almost everything expect maybe the simplest Select statements.

I doubt it will be as bad as all that.  I think you'll need to spend
some time getting the database configured properly (you can ask for
help here, or buy support) and then I'd guess that much of it will
just work.  60%?  80%?  95%?  And then there will be some number of
problem cases that you'll need to spend time beating into submission.
I've had really good luck with PG over the years, and in fact switched
to it originally because I was having problems with another database
and when I switched to PG they just... went away.  Now your data set
is bigger than the ones I've worked with, so that tends to make things
a bit more complicated, but the important thing is to have some
patience and don't assume that any problems you run into are
insoluble.  They probably aren't.  Run EXPLAIN ANALYZE a lot, read the
documentation, ask questions, and if all else fails pay somebody a few
bucks to help you get through it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

От:
Scott Marlowe
Дата:

On Fri, Dec 17, 2010 at 10:32 AM, Craig James
<> wrote:
> RAID5 is a Really Bad Idea for any database.  It is S...L...O...W.  It does
> NOT give better redundancy and security; RAID 10 with a battery-backed RAID
> controller card is massively better for performance and just as good for
> redundancy and security.

The real performance problem with RAID 5 won't show up until a drive
dies and it starts rebuilding, at which point it's then WAYYYY slower,
and while it's rebuilding you don't have redundancy.  If you HAVE to
use stripes with redundancy, use RAID-6.  It's no faster when working
right, but with a single dead drive it's still ok on performance and
can rebuild at leisure since there's still redundancy in the system.
But really if you're running a db on anything other than RAID-10 you
need to reassess your priorities.

От:
Gael Le Mignot
Дата:

Hello Scott!

Fri, 17 Dec 2010 19:06:15 -0700, you wrote:

 > On Fri, Dec 17, 2010 at 10:32 AM, Craig James
 > <> wrote:
 >> RAID5 is a Really Bad Idea for any database.  It is S...L...O...W.  It does
 >> NOT give better redundancy and security; RAID 10 with a battery-backed RAID
 >> controller card is massively better for performance and just as good for
 >> redundancy and security.

 > The real performance problem with RAID 5 won't show up until a drive
 > dies and it starts rebuilding

I don't agree with that.  RAID5 is very slow for random writes, since
it needs to:

1. Read a copy of the old sector you are writing (usually in cache, but
not always);

2. Read a copy of the parity sector corresponding to it;

3. Recompute the parity;

4. Write the new data on the sector you are writing;

5. Write the new parity data.

Operation 3 is fast, but that's still 2 reads and 2 writes for writing
a sector, and the writes have to come after the reads, so it can't even
be fully parallelised.

And if the database has heavy indexes, any INSERT/UPDATE will trigger
random writes to update the indexes.  Increasing the checkpoint interval
can group some of the random writes, but they'll still occur.

A RAID controller with a lot of cache can mitigate the random write
slowness, but with enough random writes the cache will be saturated
anyway.

As other people commented, RAID10 is much more efficient for databases,
even if it "costs" a bit more (if you put 4 disks in RAID10, you have 2x
the capacity of one disk; if you put them in RAID5, you have 3x the
capacity of one disk).
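The capacity and write-amplification arithmetic above can be sketched as follows (illustrative only; it assumes single-parity RAID5, a simple mirrored-pairs RAID10, and no controller cache):

```python
def raid5_usable(disks):
    """RAID5: one disk's worth of capacity goes to parity."""
    return disks - 1

def raid10_usable(disks):
    """RAID10: every block is mirrored, so half the capacity is usable."""
    return disks // 2

# Physical I/Os per small random logical write:
# RAID5 needs read-old-data + read-old-parity + write-data + write-parity
# (steps 1, 2, 4 and 5 above; step 3, recomputing parity, is CPU-only).
RAID5_WRITE_IOS = 4
# RAID10 just writes the block to both halves of the mirror.
RAID10_WRITE_IOS = 2

# With 4 disks: RAID5 yields 3x one disk, RAID10 yields 2x one disk,
# matching the capacity trade-off described above.
```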

--
Gaël Le Mignot - 
Pilot Systems - 9, rue Desargues - 75011 Paris
Tel : +33 1 44 53 05 55 - www.pilotsystems.net
Gérez vos contacts et vos newsletters : www.cockpit-mailing.com

От:
Scott Marlowe
Дата:

2010/12/18 Gael Le Mignot <>:
> Hello Scott!
>
> Fri, 17 Dec 2010 19:06:15 -0700, you wrote:
>
>  > On Fri, Dec 17, 2010 at 10:32 AM, Craig James
>  > <> wrote:
>  >> RAID5 is a Really Bad Idea for any database.  It is S...L...O...W.  It does
>  >> NOT give better redundancy and security; RAID 10 with a battery-backed RAID
>  >> controller card is massively better for performance and just as good for
>  >> redundancy and security.
>
>  > The real performance problem with RAID 5 won't show up until a drive
>  > dies and it starts rebuilding
>
> I don't  agree with that. RAID5 is  very slow for random  writes, since
> it needs to :

Trust me I'm well aware of how bad RAID 5 is for write performance.
But as bad as that is, when the array is degraded it's 100 times
worse.  For a lot of workloads, the meh-grade performance of a working
RAID-5 is ok.  "Not a lot of write" data warehousing often runs just
fine on RAID-5.  Until the array degrades.  Then it's much much slower
than even a single drive would be.

От:
"Pierre C"
Дата:

>  > The real performance problem with RAID 5 won't show up until a drive
>  > dies and it starts rebuilding
>
> I don't  agree with that. RAID5 is  very slow for random  writes, since
> it needs to :

"The real problem" is when RAID5 loses a drive and goes from "acceptable"
kind of slow to "someone's fired" kind of slow. Then of course, in the
middle of the rebuild, a bad sector is discovered in some place the
filesystem has never visited yet on one of the remaining drives, and all
hell breaks loose.

RAID6 is only one extra disk...

От:
Justin Pitts
Дата:

> If you strictly have an OLTP workload, with lots of simultaneous
> connections issuing queries across small chunks of data, then
> PostgreSQL would be a good match for SQL server.

This matches my observations. In fact, PostgreSQL's MVCC seems to work
heavily in my favor in OLTP workloads.

> On the other-hand, if some of your work load is OLAP with a few
> connections issuing complicated queries across large chunks of data,
> then PostgreSQL will not perform as well as SQL server.  SQL server
> can divide processing load of complicated queries across several
> processor, while PostgreSQL cannot.

While I agree with this in theory, it may or may not have a big impact
in practice. If you're not seeing multi-cpu activity spike up on your
MSSQL box during complex queries, you aren't likely to benefit much.
You can test by timing a query with and without the MAXDOP 1 query
hint:

    select * from foo option (maxdop 1)

which limits it to one processor. If it runs just as fast on one
processor, then this feature isn't something you'll miss.

Another set of features that could swing performance in MSSQL's favor
are covering indexes and clustered indexes. You can sort-of get around
clustered indexes being unavailable in PostgreSQL, especially on
low-churn tables, by scheduling CLUSTER commands.  I've seen
discussions recently that one or both of these features are being
looked at pretty closely for inclusion in PostgreSQL.