Thread: pgbench --unlogged-tables

pgbench --unlogged-tables

From
Robert Haas
Date:
I know I'm not the only one to hack up pgbench to create unlogged
tables, so I thought maybe it would be useful to have an option to do
that.

I wasn't excited about picking a single letter option name, so I
modified pgbench to use getopt_long.  Patch attached.
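
For readers who have not used long options before, here is a minimal Python sketch of getopt_long-style parsing together with the DDL difference the new option implies. It is not the attached patch (pgbench itself is written in C). The helper names `parse_args` and `create_table_sql` are hypothetical, and only the `--unlogged-tables` spelling comes from the subject of this thread.

```python
import getopt


def parse_args(argv):
    """Parse a tiny pgbench-like command line (illustrative subset only)."""
    # "is:" declares short options -i (no argument) and -s (takes an argument);
    # the list declares long options, much as getopt_long does in C.
    opts, rest = getopt.getopt(argv, "is:", ["unlogged-tables"])
    settings = {"initialize": False, "scale": 1, "unlogged": False}
    for flag, value in opts:
        if flag == "-i":
            settings["initialize"] = True
        elif flag == "-s":
            settings["scale"] = int(value)
        elif flag == "--unlogged-tables":
            settings["unlogged"] = True
    return settings, rest


def create_table_sql(name, column_defs, unlogged=False):
    """Build a CREATE TABLE statement, adding UNLOGGED when requested."""
    kind = "UNLOGGED TABLE" if unlogged else "TABLE"
    return f"CREATE {kind} {name} ({column_defs})"
```

A long option avoids having to claim one more single-letter flag, which is the motivation given above.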

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments

Re: pgbench --unlogged-tables

From
Greg Smith
Date:
That looks straightforward enough.  The other thing I keep realizing 
would be useful recently is to allow specifying a different tablespace 
to switch to when creating all of the indexes.  The old "data here, 
indexes on faster storage here" trick was already popular in some 
environments.  But it's becoming a really big win for environments that 
put indexes on SSD, and being able to simulate that easily with pgbench 
would be nice.
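
As a rough illustration of the idea, the Python sketch below generates primary-key DDL that places the indexes in a separate tablespace. It assumes the standard pgbench table and key column names; the function names and the tablespace name `ssd_indexes` are hypothetical, and this is not code from pgbench.

```python
# Standard pgbench tables and their key columns (assumed, not stated in this thread).
PGBENCH_KEYS = [
    ("pgbench_branches", "bid"),
    ("pgbench_tellers", "tid"),
    ("pgbench_accounts", "aid"),
]


def quote_ident(name):
    """Double-quote an SQL identifier, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def primary_key_sql(table, column, index_tablespace=None):
    """Build ADD PRIMARY KEY, optionally placing the index in a tablespace."""
    stmt = f"ALTER TABLE {table} ADD PRIMARY KEY ({column})"
    if index_tablespace is not None:
        stmt += f" USING INDEX TABLESPACE {quote_ident(index_tablespace)}"
    return stmt


def all_primary_keys(index_tablespace=None):
    """Return one statement per pgbench table."""
    return [primary_key_sql(t, c, index_tablespace) for t, c in PGBENCH_KEYS]
```

Calling `all_primary_keys("ssd_indexes")` yields three statements that leave the table data in the default tablespace and put only the indexes on the faster storage.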

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



Re: pgbench --unlogged-tables

From
David Fetter
Date:
On Fri, Jul 22, 2011 at 05:15:37PM -0400, Greg Smith wrote:
> That looks straightforward enough.  The other thing I keep realizing
> would be useful recently is to allow specifying a different
> tablespace to switch to when creating all of the indexes.  The old
> "data here, indexes on faster storage here" trick was already
> popular in some environments.  But it's becoming a really big win
> for environments that put indexes on SSD, and being able to simulate
> that easily with pgbench would be nice.

Do you have any theories as to how indexing on SSD speeds things up?
IIRC you found only marginal benefit in putting WALs there.  Are there
cases that SSD helps more than others when it comes to indexing?

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: pgbench --unlogged-tables

From
Greg Smith
Date:
On 07/22/2011 08:15 PM, David Fetter wrote:
> Do you have any theories as to how indexing on SSD speeds things up?
> IIRC you found only marginal benefit in putting WALs there.  Are there
> cases that SSD helps more than others when it comes to indexing?
>    

Yes, I've found a variety of workloads where using a SSD turns out to be 
slower than the old-school array of drives with a battery-backed write 
cache.  Tiny commits are slower, sequential writes can easily be slower, 
and if there isn't a random I/O component to the job the SSD has no 
way to make up for that.

In the standard pgbench case, the heavy UPDATE traffic does a lot of 
random writes to the index blocks of the pgbench_accounts table.  Even 
in cases where the whole index fits into RAM, having the indexes backed 
by a faster store can end up speeding those up, particularly at 
checkpoint time.  And if you can't quite fit the whole index in RAM, but 
it does fit on the SSD, being able to shuffle it in/out of flash as 
needed to look up pointers to data blocks is a whole lot better than 
seeking around a regular drive.  That case is where the biggest win 
seems to be.

I'd like to publish some hard numbers on all this, but have realized I 
need to relocate just the pgbench indexes to do a good simulation.  And 
I'm getting tired of doing that manually.  If I'm going to put time into 
testing this unlogged table variation that Robert has submitted, and I 
expect to, I'm just pointing out that I'd like to have the "index on 
alternate tablespace" one available then too.
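
The manual relocation step could look something like the Python sketch below, which only generates the SQL. The index names assume pgbench's default primary-key index naming; the helper names and the tablespace name `ssd_indexes` are hypothetical.

```python
# Default primary-key index names created for the pgbench tables (assumed).
PGBENCH_INDEXES = [
    "pgbench_branches_pkey",
    "pgbench_tellers_pkey",
    "pgbench_accounts_pkey",
]


def quote_ident(name):
    """Double-quote an SQL identifier, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def relocate_indexes_sql(tablespace, indexes=PGBENCH_INDEXES):
    """Return ALTER INDEX statements that move each index to `tablespace`."""
    target = quote_ident(tablespace)
    return [f"ALTER INDEX {name} SET TABLESPACE {target};" for name in indexes]


if __name__ == "__main__":
    # Print the statements so they can be piped into psql.
    for statement in relocate_indexes_sql("ssd_indexes"):
        print(statement)
```

Each statement rewrites the index into the new tablespace, so this has to be repeated after every re-initialization, which is the tedium described above.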

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



Re: pgbench --unlogged-tables

From
Robert Haas
Date:
On Fri, Jul 22, 2011 at 5:15 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> That looks straightforward enough.

OK, committed.

> The other thing I keep realizing would
> be useful recently is to allow specifying a different tablespace to switch
> to when creating all of the indexes.  The old "data here, indexes on faster
> storage here" trick was already popular in some environments.  But it's
> becoming a really big win for environments that put indexes on SSD, and
> being able to simulate that easily with pgbench would be nice.

Hearing no objections, I did this, too.
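
For anyone scripting test runs, here is a small Python sketch that assembles an initialization command using both options. The `--index-tablespace` spelling is the one found in released pgbench and is not named in this thread, so treat it as an assumption to check against your version. The database name `bench`, the tablespace name `ssd_indexes`, and the function name are hypothetical.

```python
import shlex


def pgbench_init_command(dbname, scale, unlogged=False, index_tablespace=None):
    """Return a shell-safe pgbench initialization command line as a string."""
    args = ["pgbench", "-i", "-s", str(scale)]
    if unlogged:
        args.append("--unlogged-tables")
    if index_tablespace:
        # Assumed flag spelling; verify with `pgbench --help` on your build.
        args.append(f"--index-tablespace={index_tablespace}")
    args.append(dbname)
    # shlex.join quotes any argument that needs it for a POSIX shell.
    return shlex.join(args)


if __name__ == "__main__":
    print(pgbench_init_command("bench", 100, unlogged=True,
                               index_tablespace="ssd_indexes"))
```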

At some point, we also need to sort out the scale factor limit issues,
so you can make these things bigger.
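
To put a number on that limit, here is a short Python sketch of the arithmetic. It assumes account ids are stored as 32-bit signed integers and that pgbench creates 100,000 account rows per unit of scale; neither figure is stated in this thread, and the function names are hypothetical.

```python
INT32_MAX = 2**31 - 1            # largest value a 32-bit signed id can hold
ACCOUNTS_PER_SCALE = 100_000     # assumed pgbench_accounts rows per scale unit


def max_scale(max_id=INT32_MAX, rows_per_scale=ACCOUNTS_PER_SCALE):
    """Largest scale factor whose highest account id still fits in max_id."""
    return max_id // rows_per_scale


def overflows(scale, max_id=INT32_MAX, rows_per_scale=ACCOUNTS_PER_SCALE):
    """True if the highest account id at this scale would exceed max_id."""
    return scale * rows_per_scale > max_id


if __name__ == "__main__":
    print("max scale:", max_scale())
    print("scale 20000 overflows:", overflows(20000))
```

Under those assumptions the ceiling works out to a scale of 21474, only slightly above the largest scales people already want to run.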

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pgbench --unlogged-tables

From
Greg Smith
Date:
On 07/25/2011 09:23 AM, Robert Haas wrote:
> At some point, we also need to sort out the scale factor limit issues,
> so you can make these things bigger.
>    

I had a patch to improve that whole situation, but it hasn't seemed to nag 
at me recently.  I forget why it seemed less important, but I doubt I'll 
make it another six months without coming to some resolution there.

The two systems I have in for benchmarking right now have 128GB and 
192GB of RAM in them, so large scales should have been tested.  
Unfortunately, it looks like the real-world limiting factor on doing 
lots of tests at big scales is how long it takes to populate the data 
set.  For example, here's pgbench creation time on a big server (48 
cores, 128GB RAM) with a RAID10 array, when scale=20000 (292GB):

real    174m12.055s
user    17m35.994s
sys     0m52.358s

And here's the same server putting the default tablespace (but not the 
WAL) on [much faster flash device I can't talk about yet]:

Creating new pgbench tables, scale=20000
real    169m59.541s
user    18m19.527s
sys     0m52.833s

I was hoping for a bigger drop here; maybe I needed to use unlogged 
tables? (ha!)  I think I need to start looking at the pgbench data 
generation stage as its own optimization problem.  Given how expensive 
systems this large are, I never get them for very long before they are 
rushed into production.  People don't like hearing that just generating 
the data set for a useful test is going to take 3 hours; that tends to 
limit how many of them I can schedule running.
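
To make the two timings easier to compare, the Python sketch below converts the `real` figures above into seconds and an approximate load rate. It assumes 100,000 account rows per unit of scale, which is not stated in this message; the function names are hypothetical.

```python
import re

ROWS_PER_SCALE = 100_000  # assumed pgbench_accounts rows per scale unit


def parse_time(text):
    """Convert a `time`-style duration such as '174m12.055s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def rows_per_second(real, scale):
    """Approximate account rows loaded per second of wall-clock time."""
    return scale * ROWS_PER_SCALE / parse_time(real)


def percent_faster(baseline, candidate):
    """Wall-clock reduction of `candidate` relative to `baseline`, in percent."""
    base = parse_time(baseline)
    return 100.0 * (base - parse_time(candidate)) / base


if __name__ == "__main__":
    print(round(rows_per_second("174m12.055s", 20000)))
    print(round(percent_faster("174m12.055s", "169m59.541s"), 1))
```

With the numbers above this comes to roughly 191,000 rows per second on the RAID10 array, and the flash device shortens the run by only about 2.4%.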

And, yes, I'm going to try and sneak in some time to test fastpath 
locking on one of these before they head into production.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



Re: pgbench --unlogged-tables

From
David Fetter
Date:
On Fri, Jul 22, 2011 at 10:15:08PM -0400, Greg Smith wrote:
> On 07/22/2011 08:15 PM, David Fetter wrote:
> >Do you have any theories as to how indexing on SSD speeds things
> >up?  IIRC you found only marginal benefit in putting WALs there.
> >Are there cases that SSD helps more than others when it comes to
> >indexing?
> 
> Yes, I've found a variety of workloads where using a SSD turns out
> to be slower than the old-school array of drives with a
> battery-backed write cache.  Tiny commits are slower, sequential
> writes can easily be slower, and if there isn't a random I/O
> component to the job the SSD has no way to make up for that.

So you're saying this is more of a flash thing than an SSD thing?  I
haven't heard of systems with PCM having this limitation.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate