Обсуждение: Unexpected pgbench result

Поиск
Список
Период
Сортировка

Unexpected pgbench result

От
Dave Johansen
Дата:
I'm working on setting up a large database (or at least what I consider to be a large one with several tables having 10-20 million records inserted per day), and I've been using pgbench to verify that the hardware and database are configured in an optimal manner.

When I run pgbench in "SELECT only" after doing "-i -s 2000" I get what appears to be good performance (60k-70k tps) but if I initialize a new database with "-i -s 4000" the tps drops to 4k-7k. Is this order of magnitude drop expected? Or is there something wrong with my hardware or database configuration that is causing this issue?

The one option that I've been considering is partitioning and I've been asking about it here:
http://www.postgresql.org/message-id/CAAcYxUcb0NFfMDsMOCL5scNRrUL7=9hKxjz021JMQp0r7f5sCQ@mail.gmail.com

Re: Unexpected pgbench result

От
Shaun Thomas
Дата:
On 12/19/2013 11:00 AM, Dave Johansen wrote:

> When I run pgbench in "SELECT only" after doing "-i -s 2000" I get what
> appears to be good performance (60k-70k tps) but if I initialize a new
> database with "-i -s 4000" the tps drops to 4k-7k. Is this order of
> magnitude drop expected? Or is there something wrong with my hardware or
> database configuration that is causing this issue?

When you increase the size of the initialized pgbench tables, you
increase the size on disk. My guess is that you doubled it so that the
data no longer fits in memory. You can verify this yourself:

SELECT pg_size_pretty(sum(pg_database_size(oid))::bigint)
   from pg_database;

Any amount of memory you have that is smaller than that, will affect
select performance. I can guarantee you will not get 60k-70k tps from
anything short of an array of SSD devices or a PCIe NVRAM solution. Your
'-s 2000' test was probably running mostly from memory, while the '-s
4000' did not.

What you're seeing is the speed your records are being supplied from
disk, plus whatever cache effects are there when records are read before
they are flushed in favor of more recent data.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email


Re: Unexpected pgbench result

От
Dave Johansen
Дата:
On Thu, Dec 19, 2013 at 10:44 AM, Shaun Thomas <sthomas@optionshouse.com> wrote:
On 12/19/2013 11:00 AM, Dave Johansen wrote:

When I run pgbench in "SELECT only" after doing "-i -s 2000" I get what
appears to be good performance (60k-70k tps) but if I initialize a new
database with "-i -s 4000" the tps drops to 4k-7k. Is this order of
magnitude drop expected? Or is there something wrong with my hardware or
database configuration that is causing this issue?

When you increase the size of the initialized pgbench tables, you increase the size on disk. My guess is that you doubled it so that the data no longer fits in memory. You can verify this yourself:

SELECT pg_size_pretty(sum(pg_database_size(oid))::bigint)
  from pg_database;

Any amount of memory you have that is smaller than that, will affect select performance. I can guarantee you will not get 60k-70k tps from anything short of an array of SSD devices or a PCIe NVRAM solution. Your '-s 2000' test was probably running mostly from memory, while the '-s 4000' did not.

What you're seeing is the speed your records are being supplied from disk, plus whatever cache effects are there when records are read before they are flushed in favor of more recent data.

Yep, that was exactly it and that definitely makes sense now that you point it out.

Right now, we're running a RAID 1 for pg_clog, pg_log and pg_xlog and then a RAID 1+0 with 12 disks for the data. Would there be any benefit to running a separate RAID 1+0 with a tablespace for the indexes? Or is reading the indexes and data a serial process where separating them like that won't have any big benefit?

Thanks,
Dave

Re: Unexpected pgbench result

От
Shaun Thomas
Дата:
On 12/19/2013 04:06 PM, Dave Johansen wrote:

> Right now, we're running a RAID 1 for pg_clog, pg_log and pg_xlog and
> then a RAID 1+0 with 12 disks for the data. Would there be any benefit
> to running a separate RAID 1+0 with a tablespace for the indexes?

Not really. PostgreSQL doesn't currently support parallel backend
fetches, aggregation, or really anything. It's looking like 9.4 will get
us a lot closer to that, but right now, everything is serial.

Serial or not, separate backends will have separate read concerns, and
PostgreSQL 9.2 and above *do* support index only scans. So
theoretically, you might actually see some benefit there. If it were me
and I had spindles available, I would just increase the overall size of
the pool. It's a lot easier than managing multiple tablespaces.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email


Re: Unexpected pgbench result

От
Dave Johansen
Дата:
On Fri, Dec 20, 2013 at 7:10 AM, Shaun Thomas <sthomas@optionshouse.com> wrote:
On 12/19/2013 04:06 PM, Dave Johansen wrote:

Right now, we're running a RAID 1 for pg_clog, pg_log and pg_xlog and
then a RAID 1+0 with 12 disks for the data. Would there be any benefit
to running a separate RAID 1+0 with a tablespace for the indexes?

Not really. PostgreSQL doesn't currently support parallel backend fetches, aggregation, or really anything. It's looking like 9.4 will get us a lot closer to that, but right now, everything is serial.

Serial or not, separate backends will have separate read concerns, and PostgreSQL 9.2 and above *do* support index only scans. So theoretically, you might actually see some benefit there. If it were me and I had spindles available, I would just increase the overall size of the pool. It's a lot easier than managing multiple tablespaces.

Ok, that makes sense. Is there a benefit to having the WAL and logs on the separate RAID 1? Or is just having them be part of the larger RAID 1+0 just as good?

Re: Unexpected pgbench result

От
Kevin Grittner
Дата:
Dave Johansen <davejohansen@gmail.com> wrote:

> Is there a benefit to having the WAL and logs on the separate
> RAID 1? Or is just having them be part of the larger RAID 1+0
> just as good?

I once accidentally left the pg_xlog directory on the 40-spindle
RAID with most of the data instead of moving it.  Results with
graph here:

http://www.postgresql.org/message-id/4B71358E020000250002F0E4@gw.wicourts.gov

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Unexpected pgbench result

От
Dave Johansen
Дата:
On Fri, Dec 20, 2013 at 8:22 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
Dave Johansen <davejohansen@gmail.com> wrote:

> Is there a benefit to having the WAL and logs on the separate
> RAID 1? Or is just having them be part of the larger RAID 1+0
> just as good?

I once accidentally left the pg_xlog directory on the 40-spindle
RAID with most of the data instead of moving it.  Results with
graph here:

http://www.postgresql.org/message-id/4B71358E020000250002F0E4@gw.wicourts.gov

That's very helpful information. Thanks for sharing it,
Dave