Discussion: RE : RE: Postgresql vs SQLserver for this application ?


RE : RE: Postgresql vs SQLserver for this application ?

From:
bsimon@loxane.com
Date:

Unfortunately.

But we are in the process of choosing Postgresql with pgcluster. I'm currently running some tests (performance, stability...).
Save the money on the license fees; you can spend it on your hardware ;-)

I still welcome any advice or comments, and I'll let you know how the project is going.

Benjamin.


"Mohan, Ross" <RMohan@arbinet.com>

05/04/2005 20:48

       
        To:        <bsimon@loxane.com>
        Cc:
        Subject:        RE: [PERFORM] Postgresql vs SQLserver for this application ?



You never got answers on this? Apologies, I don't have one, but'd be curious to hear about any you did get....
 
thx
 
Ross
-----Original Message-----
From:
pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of bsimon@loxane.com
Sent:
Monday, April 04, 2005 4:02 AM
To:
pgsql-performance@postgresql.org
Subject:
[PERFORM] Postgresql vs SQLserver for this application ?


Hi all.


We are designing quite a big application that requires a high-performance database backend.
The rates we need are at least 5000 inserts per second and 15 selects per second on a single connection. There should only be 3 or 4 simultaneous connections.
I think our main concern is dealing with the constant flow of data coming from the inserts, which must be available for selection as fast as possible (a kind of real-time access...).


As a consequence, the database should rapidly grow to more than one hundred gigs. We still have to determine how and when we should back up old data to protect the application from a performance drop. We intend to develop some kind of real-time partitioning on our main table to keep the flows up.


At first, we were planning to use SQL Server, as it has features that in my opinion could help us a lot:

       - replication
       - clustering


Recently we started to study Postgresql as a solution for our project:

       - it also has replication
       - the PostGIS module can handle geographic datatypes (which would facilitate our development)

       - we have strong knowledge of Postgresql administration (we use it for production processes)

       - it is free (!) and we could save money for hardware purchases.


Is SQL Server clustering a real asset? How reliable are Postgresql replication tools? Should I trust Postgresql performance for this kind of need?


My question is a bit fuzzy, but any advice is most welcome... hardware, tuning or design tips as well :))


Thanks a lot.


Benjamin.



Re: RE : RE: Postgresql vs SQLserver for this application ?

From:
Alex Turner
Date:
I think everyone was scared off by the 5000 inserts per second number.

I've never seen even Oracle do this on a top end Dell system with
copious SCSI attached storage.

Alex Turner
netEconomist

On Apr 6, 2005 3:17 AM, bsimon@loxane.com <bsimon@loxane.com> wrote:
> [snip]

Re: RE : RE: Postgresql vs SQLserver for this application ?

From:
"Mohan, Ross"
Date:
I wish I had a Dell system and a run case to show you, Alex, but I don't...
however... using Oracle's "direct path" feature, it's pretty straightforward.

We've done 110,000 rows per second into index-less tables on a big system
(IBM Power5 chips, Hitachi SAN). (Yes, I am sure: over 100K a second, sustained
for almost 9 minutes.)

Yes, this is an exception, but Oracle's direct-path/insert-append/bulk-load
feature enabled us to migrate a 4 TB database... really quickly.

Now... if you ask me "can this work without Power5 and a Hitachi SAN?"
my answer is... give me a top-end Dell and SCSI III on 15K disks
and I'll likely match it, yeah.

I'd love to see PG get into this range... I am a big fan of PG (just a
rank newbie) but I have to think the underlying code to do this has
to be not too complex...

Best,

Ross



-----Original Message-----
From: Alex Turner [mailto:armtuk@gmail.com]
Sent: Wednesday, April 06, 2005 11:38 AM
To: bsimon@loxane.com
Cc: pgsql-performance@postgresql.org; Mohan, Ross
Subject: Re: RE : RE: [PERFORM] Postgresql vs SQLserver for this application ?


I think everyone was scared off by the 5000 inserts per second number.

I've never seen even Oracle do this on a top end Dell system with copious SCSI attached storage.

Alex Turner
netEconomist

On Apr 6, 2005 3:17 AM, bsimon@loxane.com <bsimon@loxane.com> wrote:
> [snip]

Re: RE : RE: Postgresql vs SQLserver for this application ?

From:
Steve Wampler
Date:
Mohan, Ross wrote:
> I wish I had a Dell system and run case to show you Alex, but I don't...
> however...using Oracle's "direct path" feature, it's pretty straightforward.
>
> We've done 110,000 rows per second into index-less tables on a big system
> (IBM Power5 chips, Hitachi SAN). ( Yes, I am sure: over 100K a second. Sustained
> for almost 9 minutes. )
>
> Yes, this is an exception, but oracle directpath/InsertAppend/BulkLoad
> feature enabled us to migrate a 4 TB database...really quickly.

How close to this is PG's COPY?  I get surprisingly good results using
COPY with JDBC on smallish systems (now if that patch would only make it
into the mainstream PG JDBC support!).  I think COPY has a bit more
overhead than a dedicated bulk-load feature, but I suspect it's not that
much more.

> Now...if you ask me "can this work without Power5 and Hitachi SAN?"
> my answer is..you give me a top end Dell and SCSI III on 15K disks
> and I'll likely easily match it, yea.
>
> I'd love to see PG get into this range..i am a big fan of PG (just a
> rank newbie) but I gotta think the underlying code to do this has
> to be not-too-complex.....

It may not be that far off if you can use COPY instead of INSERT.
But comparing Bulkload to INSERT is a bit apples<->orangish.

--
Steve Wampler -- swampler@noao.edu
The gods that smiled on your birth are now laughing out loud.

Re: RE : RE: Postgresql vs SQLserver for this application ?

From:
Rod Taylor
Date:
On Wed, 2005-04-06 at 16:12 +0000, Mohan, Ross wrote:
> I wish I had a Dell system and run case to show you Alex, but I don't...
> however...using Oracle's "direct path" feature, it's pretty straightforward.
>
> We've done 110,000 rows per second into index-less tables on a big system
> (IBM Power5 chips, Hitachi SAN). ( Yes, I am sure: over 100K a second. Sustained
> for almost 9 minutes. )

Just for kicks I did a local test on a desktop machine (single CPU,
single IDE drive) using COPY FROM STDIN for a set of integers, loaded in a
single transaction with no indexes.

1,572,864 tuples were loaded in 13715.613 ms, which is approx 115k rows per
second.

Okay, there were no checkpoints and I didn't cross an index boundary, but I
also haven't tuned the config file beyond bumping up the buffers.

Let's try again with more data this time.

31 million tuples were loaded in approx 279 seconds, or approx 112k rows
per second.
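As a sanity check on those figures, the throughput arithmetic is easy to reproduce. The script below is a hypothetical back-of-the-envelope calculation (not part of the original thread), using only the row counts and timings reported above:

```python
# Back-of-the-envelope check of the COPY throughput figures quoted above.

def rows_per_second(rows: int, elapsed_ms: float) -> float:
    """Convert a (row count, elapsed-milliseconds) pair into rows/second."""
    return rows / (elapsed_ms / 1000.0)

# First run: 1,572,864 tuples in 13715.613 ms
small = rows_per_second(1_572_864, 13_715.613)
print(f"small run: {small:,.0f} rows/s")   # roughly 115k rows/s

# Second run: 31 million tuples in ~279 seconds
large = rows_per_second(31_000_000, 279 * 1000)
print(f"large run: {large:,.0f} rows/s")   # roughly 111k rows/s
```

Either way, both runs land more than 20x above the 5000 inserts/second the original poster asked about.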

> I'd love to see PG get into this range..i am a big fan of PG (just a
> rank newbie) but I gotta think the underlying code to do this has
> to be not-too-complex.....

I'd say we're there.

> -----Original Message-----
> From: Alex Turner [mailto:armtuk@gmail.com]
> Sent: Wednesday, April 06, 2005 11:38 AM
> To: bsimon@loxane.com
> Cc: pgsql-performance@postgresql.org; Mohan, Ross
> Subject: Re: RE : RE: [PERFORM] Postgresql vs SQLserver for this application ?
>
> [snip]

Re: RE : RE: Postgresql vs SQLserver for this application ?

From:
"Mohan, Ross"
Date:
How close to this is PG's COPY?  I get surprisingly good results using COPY with jdbc on smallish systems (now if that
patch would make into the mainstream PG jdbc support!)  I think COPY has a bit more overhead than what a Bulkload
feature may have, but I suspect it's not that much more.

||  Steve, I do not know. But am reading the docs now, and should figure it out. Ask
    me later if you remember. Oracle's "direct path" is a way of just slamming blocks
    filled with rows into the table, above the high water mark. It sidesteps freelist
    management and all manner of intrablock issues. There is a "payback", but the benefits
    far far outweigh the costs.

> Now...if you ask me "can this work without Power5 and Hitachi SAN?" my
> answer is..you give me a top end Dell and SCSI III on 15K disks and
> I'll likely easily match it, yea.
>
> I'd love to see PG get into this range..i am a big fan of PG (just a
> rank newbie) but I gotta think the underlying code to do this has to
> be not-too-complex.....

It may not be that far off if you can use COPY instead of INSERT. But comparing Bulkload to INSERT is a bit
apples<->orangish.

||  Oh! I see! I had no idea I was doing that!  Thanks for pointing it out clearly to me. Yea, I would
    say a full transactional INSERT of 5K rows/sec into an indexed-table is a near-mythology without significant
    caveats (parallelized, deferred buffering, etc.)




COPY Hacks (WAS: RE: Postgresql vs SQLserver for this application ?)

From:
Mischa
Date:
This thread seems to be focusing on COPY efficiency, so I'd like to ask
something I got no answer to a few months ago.

Using COPY ... FROM STDIN via the Perl DBI (DBD::Pg) interface,
I accidentally strung together several \n-terminated input lines,
and sent them to the server with a single "putline".

To my (happy) surprise, I ended up with exactly that number of rows
in the target table.

Is this a bug? Is this fundamental to the protocol?

Since it hasn't been documented (but then, "endcopy" isn't documented),
I've been shy of investing in perf-testing such mass copy calls.
But if it DOES work, it should reduce the number of network
round trips.

So. Is it a feechur? Worth stress-testing? Could be VERY cool.

--
"Dreams come true, not free."


Re: COPY Hacks (WAS: RE: Postgresql vs SQLserver for this application ?)

From:
Tom Lane
Date:
Mischa <mischa.Sandberg@telus.net> writes:
> Using COPY ... FROM STDIN via the Perl DBI (DBD::Pg) interface,
> I accidentally strung together several \n-terminated input lines,
> and sent them to the server with a single "putline".

> To my (happy) surprise, I ended up with exactly that number of rows
> in the target table.

> Is this a bug?

No, it's the way it's supposed to work.  "putline" really just sends a
stream of data ... there's no semantic significance to the number of
putline calls you use to send the stream, only to the contents of the
stream.  (By the same token, it's unlikely that deliberately aggregating
such calls would be much of a win.)

            regards, tom lane
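Tom's point can be illustrated without a server: if the receiver simply concatenates whatever arrives and splits rows on newlines, the number of putline-style calls is irrelevant. The following is a hypothetical simulation in Python (not part of the original exchange) of how the COPY data stream gets parsed:

```python
# Simulate a COPY data stream: the server sees one byte stream, so it does
# not matter how the client slices that stream across individual sends.

def parse_copy_stream(chunks):
    """Concatenate arbitrary chunks, then split into tab-separated rows."""
    data = "".join(chunks)
    return [line.split("\t") for line in data.split("\n") if line]

rows = [f"{i}\tname-{i}\n" for i in range(6)]

one_call   = parse_copy_stream(["".join(rows)])        # one big send
many_calls = parse_copy_stream(rows)                   # one send per row
odd_splits = parse_copy_stream(["".join(rows)[:7],     # splits that ignore
                                "".join(rows)[7:]])    # row boundaries

assert one_call == many_calls == odd_splits
print(len(one_call), "rows either way")  # 6 rows either way
```

Only the contents of the stream carry semantics; the chunking is invisible to the parser.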

Re: COPY Hacks (WAS: RE: Postgresql vs SQLserver for this application ?)

From:
Christopher Kings-Lynne
Date:
> Using COPY ... FROM STDIN via the Perl DBI (DBD::Pg) interface,
> I accidentally strung together several \n-terminated input lines,
> and sent them to the server with a single "putline".
>
> To my (happy) surprise, I ended up with exactly that number of rows
> in the target table.
>
> Is this a bug? Is this fundamental to the protocol?
>
> Since it hasn't been documented (but then, "endcopy" isn't documented),
> I've been shy of investing in perf testing such mass copy calls.
> But, if it DOES work, it should be reducing the number of network
> roundtrips.

I think it's documented in the libpq docs...

Chris

Re: COPY Hacks (WAS: RE: Postgresql vs SQLserver for this application ?)

From:
Harald Fuchs
Date:
In article <1112813199.42542e8f17b4d@webmail.telus.net>,
Mischa <mischa.Sandberg@telus.net> writes:

> This thread seems to be focusing in on COPY efficiency,
> I'd like to ask something I got no answer to, a few months ago.

> Using COPY ... FROM STDIN via the Perl DBI (DBD::Pg) interface,
> I accidentally strung together several \n-terminated input lines,
> and sent them to the server with a single "putline".

> To my (happy) surprise, I ended up with exactly that number of rows
> in the target table.

> Is this a bug? Is this fundamental to the protocol?

> Since it hasn't been documented (but then, "endcopy" isn't documented),
> I've been shy of investing in perf testing such mass copy calls.
> But, if it DOES work, it should be reducing the number of network
> roundtrips.

> So. Is it a feechur? Worth stress-testing? Could be VERY cool.

Using COPY from DBD::Pg _is_ documented - provided you use DBD::Pg
version 1.41, released just today.

Re: COPY Hacks (WAS: RE: Postgresql vs SQLserver for this application ?)

From:
"Greg Sabino Mullane"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


> Using COPY ... FROM STDIN via the Perl DBI (DBD::Pg) interface,
> I accidentally strung together several \n-terminated input lines,
> and sent them to the server with a single "putline".
...
> So. Is it a feechur? Worth stress-testing? Could be VERY cool.

As explained elsewhere, not really a feature, more of a side-effect.
Keep in mind, however, that any network round-trip time saved has to
be balanced against some additional overhead of constructing the
combined strings in Perl before sending them over. Most times COPY
is used to parse a newline-separated file anyway. If you have a slow
network connection to the database, it *might* be a win, but my
limited testing shows that it is not an advantage for a "normal"
connection: I added 1 million rows via COPY using the normal way
(1 million pg_putline calls), via pg_putline of 1000 rows at a
time, and via 10,000 rows at a time. They all ran in 22 seconds,
with no statistical difference between them. (This was the "real" time,
the system time was actually much lower for the combined calls).

It can't hurt to test things out on your particular system and see
if it makes a real difference: it certainly does no harm as long as
you make sure the strings you send always *end* in a newline.

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200504072201
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8

-----BEGIN PGP SIGNATURE-----

iD8DBQFCVeZrvJuQZxSWSsgRAoP+AJ9jTNetePMwKv9rdyu6Lz+BjSiDOQCguoSU
ie9TaeIxUuvd5fhjFueacvM=
=1hWn
-----END PGP SIGNATURE-----
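For anyone wanting to repeat Greg's experiment, the batching he describes is easy to sketch. The helper below is hypothetical (a real client would push each batch with DBD::Pg's pg_putline or an equivalent driver call), but it preserves the one invariant Greg warns about: every batch must end in a newline.

```python
# Group newline-terminated COPY rows into batches of `size` rows, so each
# batch can be pushed with a single putline-style call.  If a batch did not
# end in a newline, the server would glue its last row to the first row of
# the next batch.

def batch_copy_lines(rows, size):
    """Yield strings containing up to `size` newline-terminated rows each."""
    for start in range(0, len(rows), size):
        chunk = "".join(rows[start:start + size])
        assert chunk.endswith("\n"), "each batch must end in a newline"
        yield chunk

rows = [f"{i}\tvalue-{i}\n" for i in range(100)]
batches = list(batch_copy_lines(rows, 40))

print(len(batches), "batches")              # 3 batches (40 + 40 + 20 rows)
print(sum(b.count("\n") for b in batches))  # 100 rows total, none lost
```

As Greg's timings suggest, whether this saves anything depends on the network: the string joins are cheap, so the win, if any, comes from fewer round trips.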



Re: multi-line copy (was: Re: COPY Hacks)

From:
Mischa Sandberg
Date:
Quoting Greg Sabino Mullane <greg@turnstep.com>:

> [snip]
> Keep in mind, however, that any network round-trip time saved has to
> be balanced against some additional overhead of constructing the
> combined strings in Perl before sending them over.
> [snip]

Many thanks for digging into it.

For the app I'm working with, the time delay between rows being posted
is /just/ enough to exceed the TCP Nagle delay, so every row goes across
in its own packet :-( Reducing the number of network round trips
by a factor of 40 is enough to cut elapsed time in half.
The cost of join("",@FortyRows), which produces a 1-4K string, is
negligible in this case.

--
"Dreams come true, not free" -- S.Sondheim, ITW