Discussion: Shared memory for large PostGIS operations


Shared memory for large PostGIS operations

From
maplabs@light42.com
Date:
(from #postgresql IRC on freenode)

darkblue_b  I did an interesting experiment the other day davidfetter_vmw ..
davidfetter_vmw  do tell
darkblue_b  well you know I do these huge monolithic PostGIS queries on an otherwise idle linux machine.. and there was
a persistent thought in my head that PostgreSQL+PostGIS did not make good use of memory allocation >2G

darkblue_b  so I had this long, python driven analysis.. 15 steps.. some, unusual for me, are multiple queries running
at once on the same data ... and others are just one labor intensive thing then the next
    (one result table is 1.8M rows for 745M on disk, others are smaller)

darkblue_b  I finally got the kinks worked out.. so I ran it twice.. 4.5 hours on our hardware.. once with
shared_buffers set to 2400M and the second time with shared_buffers set to 18000M

darkblue_b work_mem was unchanged at 640M and.. the run times were within seconds of each other.. no improvement, no
penalty

darkblue_b I have been wondering about this for the last two years

davidfetter_vmw  darkblue_b, have you gone over any of this on -performance or -hackers?
darkblue_b  no - though I think I should start a blog .. I have a couple of things like this now
darkblue_b  good story though eh ?

davidfetter_vmw  darkblue_b, um, it's a story that hasn't really gotten started until you've gotten some feedback from -performance
darkblue_b  ok - true...

darkblue_b    pg 9.1  PostGIS 1.5.3    Ubuntu Server Oneiric 64bit  Dual Xeons  
one Western Digital black label for pg_default; one 3-disk RAID 5 for the database tablespace

==
Brian Hamlin
GeoCal
OSGeo California Chapter
415-717-4462 cell


Re: Shared memory for large PostGIS operations

From
Andy Colson
Date:

So let me clean that up for you:

 > On 3/14/2012 11:29 PM, maplabs@light42.com wrote:

Hello list, my name is Brian Hamlin, but I prefer to go by darkblue, its
mysterious and dangerous!

I run PG 9.1, PostGIS 1.5.3, Linux 64 on Dual Xeons, OS on a single
drive, and db is on 3-disk raid 5.  I'm the only user.

work_mem = 640M

I do these huge monolithic postGIS queries on an otherwise idle linux
machine.  python driven analysis.. 15 steps.. some, unusual for me, are
multiple queries running at once on the same data ... and others are
just one labor intensive thing then the next (one result table is 1.8M
rows for 745M on disk, others are smaller)

I tried shared_buffers at both 2400M and 18000M, and it took 4.5 hours
both times.  I dont know if I am CPU bound or IO bound, but since giving
PG more ram didnt help much, I'll assume I'm CPU bound.  I heard of this
program called vmstat that I'll read up on and post some results for.

I don't know how much memory my box has, and I've never run explain
analyze, but I'll try it out and post some.  I just learned about
http://explain.depesz.com/ and figure it might help me.

This is the best list ever!  Thanks all!  (especially that poetic Dave
Fetter, and that somewhat mean, but helpful, Andy Colson)

Shout outs to my friends Garlynn, Nick and Rush (best band ever!).
Party, my house, next week!

> ==
> (Virtually) Brian Hamlin
> GeoCal
> OSGeo California Chapter
> 415-717-4462 cell


-Andy

Re: Shared memory for large PostGIS operations

From
"Kevin Grittner"
Date:
Andy Colson <andy@squeakycode.net> wrote:

> I tried shared_buffers at both 2400M and 18000M, and it took 4.5
> hours both times.  I dont know if I am CPU bound or IO bound, but
> since giving PG more ram didnt help much, I'll assume I'm CPU
> bound.

All else being the same, adjusting shared_buffers affects how much
of your cache is managed by PostgreSQL and how much of your cache is
managed by the OS; it doesn't exactly change how much you have
cached or necessarily affect disk waits.  (There's a lot more that
can be said about the fine points of this, but you don't seem to
have sorted out the big picture yet.)

> I heard of this program called vmstat that I'll read up on and
> post some results for.

That's a good way to get a handle on whether your bottleneck is
currently CPU or disk access.
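[Editor's sketch: one way to read a data row of `vmstat 5` output. The column layout matches Linux vmstat (us, sy, id, wa, st as the last five fields); the 30% wait / 50% busy thresholds are arbitrary illustrations, and the sample rows are made up, not from Brian's machine.]

```python
def classify(vmstat_line):
    """Return 'cpu' or 'io' for one data row of `vmstat` output.

    The last five columns of a Linux vmstat row are us, sy, id, wa, st
    (user, system, idle, I/O-wait, and stolen CPU percentages).
    """
    fields = vmstat_line.split()
    us, sy, idle, wa = (int(f) for f in fields[-5:-1])
    # High I/O wait with mostly idle CPUs suggests a disk bottleneck;
    # otherwise, busy CPUs point at a CPU bottleneck.
    return "io" if wa > 30 and us + sy < 50 else "cpu"

# Hypothetical rows: procs / memory / swap / io / system / cpu columns
cpu_bound = "12  0  0  81920 102400 409600  0  0   120    340 900 1500 92  6  2  0  0"
io_bound  = " 1  9  0  81920 102400 409600  0  0 48000  51000 700 1200  5  3 12 80  0"

print(classify(cpu_bound))  # cpu
print(classify(io_bound))   # io
```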

> I don't know how much memory my box has

That's pretty basic information when it comes to tuning.  What does
`free -m` show?  (Copy/paste is a good thing.)

> and I've never run explain analyze

If you're looking to make things faster (a fact not yet exactly in
evidence), you might want to start with the query which runs the
longest, or perhaps the one which most surprises you with its run
time, and get the EXPLAIN ANALYZE output for that query.  There is
other information you should include; this page should help:

http://wiki.postgresql.org/wiki/SlowQueryQuestions
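[Editor's sketch: what that would look like for one slow PostGIS step. The table and column names here are hypothetical, not from Brian's workload; the BUFFERS option, available in 9.1, also shows how much of the work was served from shared_buffers.]

```sql
-- Capture actual timings and row counts, plus buffer usage, for one step.
EXPLAIN (ANALYZE, BUFFERS)
SELECT a.id, ST_Intersection(a.geom, b.geom)
FROM parcels a
JOIN floodzones b ON ST_Intersects(a.geom, b.geom);
```

The output can be pasted into explain.depesz.com directly.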

> I just learned about http://explain.depesz.com/ and figure it
> might help me.

It is a nice way to present EXPLAIN ANALYZE output from complex
queries.

-Kevin

Re: Shared memory for large PostGIS operations

From
"Kevin Grittner"
Date:
Brian Hamlin <maplabs@light42.com> wrote:
> On Mar 16, 2012, at 7:17 AM, Kevin Grittner wrote:
>> Andy Colson <andy@squeakycode.net> wrote:
>>
>>> I tried shared_buffers at both 2400M and 18000M, and it took 4.5
>>> hours both times.  ... (weak attempts at humor omitted) ....

Ah, I didn't pick up on the attempts at humor; perhaps that's why
you mistook something I said as an attempt at an insult.  We get
posts here from people at all different levels of experience, and
many people are grateful for pointers on what various utilities can
do for them or how best to formulate a post so they can get help
when they need it.  Attempts to help don't constitute insults, even
if the need is feigned.

>> All else being the same, adjusting shared_buffers affects how
>> much of your cache is managed by PostgreSQL and how much of your
>> cache is managed by the OS; it doesn't exactly change how much
>> you have cached or necessarily affect disk waits.

> Linux caching is aggressive already.. so I think this example
> points out that Postgres caching is not contributing here..  that's
> why I posted this short example to this list.. I thought it was a
> useful data point.. that it might be useful to others... and to
> the PostgreSQL project devs...

Yeah, guidelines for shared_buffers in the docs are vague because
the best setting varies so much with the workload.  While the docs
hint at setting it at 25% of the computer's RAM, most benchmarks
posted on this list have found throughput to peak at around 8GB to
10GB on systems where 25% would be more than that.  (Far less on
Windows, as the docs mention.)
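[Editor's sketch: the guidance above as arithmetic. The 8 GB cap reflects the benchmark results mentioned on this list, and the Windows figure the docs' "far less" caveat; neither is an official PostgreSQL rule.]

```python
def shared_buffers_start_mb(ram_mb, windows=False):
    """Suggest a starting shared_buffers value in MB.

    25% of RAM, capped near the 8 GB region where list benchmarks
    saw throughput peak; the caps are illustrative, not official.
    """
    cap_mb = 512 if windows else 8192
    return min(ram_mb // 4, cap_mb)

print(shared_buffers_start_mb(128 * 1024))  # 8192  (128 GB box: capped, not 32 GB)
print(shared_buffers_start_mb(16 * 1024))   # 4096  (16 GB box: plain 25%)
```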

There can be a point well before that where there are latency
spikes.  In our shop we have a multi-TB database backing a web site,
and to prevent these latency spikes we keep shared_buffers down to
2GB even though the system has 128GB RAM.  Forcing dirty pages out
to the OS cache helps them to be written in a more timely manner by
code which knows something about the hardware and what order of
writes will be most efficient.  PostgreSQL has, as a matter of a
design choice, decided to leave a lot to the OS caching, file
systems, and device drivers, and a key part of tuning is to discover
what balance of that versus the DBMS caching performs best for your
workload.

> some of the queries have been gone over fairly well, others not..
> Its a complex sequence and we are in production mode here,
> so I dont get a chance to do everything I might do with regard to
> one particular query...

You may want to take a look at auto_explain:

http://www.postgresql.org/docs/current/interactive/auto-explain.html
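[Editor's sketch: a minimal postgresql.conf fragment for enabling it. The one-second threshold is an arbitrary example, and log_analyze adds measurable overhead, so it is left off here.]

```
# postgresql.conf -- load auto_explain and log plans of slow statements
shared_preload_libraries = 'auto_explain'   # or LOAD 'auto_explain' per session
auto_explain.log_min_duration = '1s'        # log the plan of anything over 1 second
auto_explain.log_analyze = off              # 'on' adds actual timings, at a cost
```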

Since you're already in production it may be hard to test the
performance of your disk system, but it's a pretty safe bet that if
you are at all disk-bound you would benefit greatly from adding one
more drive and converting your 3 drive RAID 5 to a 4 drive RAID 10,
preferably with a RAID controller with BBU cache configured for
write-back.

-Kevin

Re: Shared memory for large PostGIS operations

From
Andy Colson
Date:
On 03/16/2012 05:30 PM, Kevin Grittner wrote:
> Brian Hamlin<maplabs@light42.com>  wrote:
>> On Mar 16, 2012, at 7:17 AM, Kevin Grittner wrote:
>>> Andy Colson<andy@squeakycode.net>  wrote:
>>>
>>>> I tried shared_buffers at both 2400M and 18000M, and it took 4.5
>>>> hours both times.  ... (weak attempts at humor omitted) ....
>
> Ah, I didn't pick up on the attempts at humor; perhaps that's why
> you mistook something I said as an attempt at an insult.

It wasn't you Kevin, it was me that insulted him.  (Although I was trying to be funny, and not mean).

Sorry again Brian.

-Andy

Re: Shared memory for large PostGIS operations

From
"Kevin Grittner"
Date:
Andy Colson  wrote:
On 03/16/2012 05:30 PM, Kevin Grittner wrote:

>> Ah, I didn't pick up on the attempts at humor; perhaps that's why
>> you mistook something I said as an attempt at an insult.
>
> It wasn't you Kevin, it was me that insulted him. (Although I was
> trying to be funny, and not mean).

Adding to the confusion, I think I missed one of the emails/posts.
Oh, well, it sounds like mostly people need to use more smiley-faces,
since humor can be so easy to miss in this medium.

Brian, I hope this doesn't put you off from posting -- we try to be
helpful here.

-Kevin

Re: Shared memory for large PostGIS operations

From
Brian Hamlin
Date:
Hi Kevin, List, others...

On Mar 16, 2012, at 7:17 AM, Kevin Grittner wrote:

> Andy Colson <andy@squeakycode.net> wrote:
>
>> I tried shared_buffers at both 2400M and 18000M, and it took 4.5
>> hours both times.  ... (weak attempts at humor omitted) ....
>
> All else being the same, adjusting shared_buffers affects how much
> of your cache is managed by PostgreSQL and how much of your cache is
> managed by the OS; it doesn't exactly change how much you have
> cached or necessarily affect disk waits.  (There's a lot more that
> can be said about the fine points of this, but you don't seem to
> have sorted out the big picture yet.)

   Linux caching is aggressive already.. so I think this example
points out that Postgres caching is not contributing here.. that's
why I posted this short example to this list.. I thought it was a
useful data point.. that it might be useful to others... and to the
PostgreSQL project devs...

>
>> I heard of this program called vmstat that I'll read up on and
>> post some results for.  -----ignore- I dont take advice with
>> vinegar well...
>
> That's a good way to get a handle on whether your bottleneck is
> currently CPU or disk access.
>
>> (attempted insults omitted)
>
> If you're looking to make things faster (a fact not yet exactly in
> evidence), you might want to start with the query which runs the
> longest, or perhaps the one which most surprises you with its run
> time, and get the EXPLAIN ANALYZE output for that query.  There is
> other information you should include; this page should help:
>
> http://wiki.postgresql.org/wiki/SlowQueryQuestions


   some of the queries have been gone over fairly well, others not..
Its a complex sequence and we are in production mode here,
so I dont get a chance to do everything I might do with regard to
one particular query...


>
>> I just learned about http://explain.depesz.com/ and figure it
>> might help me.
>
> It is a nice way to present EXPLAIN ANALYZE output from complex
> queries.


   explain.depesz.com  definitely a good reference, thank you for that..


==
Brian Hamlin
GeoCal
OSGeo California Chapter
415-717-4462 cell