Re: Do I have a hardware or a software problem?

Поиск
Список
Период
Сортировка
От Jeff Janes
Тема Re: Do I have a hardware or a software problem?
Дата
Msg-id CAMkU=1xqUC0s_ML+KT3UbTYK5B+p40fFjq-N-wdcgQuXngsfMg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Do I have a hardware or a software problem?  (Niels Kristian Schjødt <nielskristian@autouncle.com>)
Список pgsql-performance
On Wed, Dec 12, 2012 at 8:46 AM, Niels Kristian Schjødt
<nielskristian@autouncle.com> wrote:
>
> Den 11/12/2012 kl. 18.25 skrev Jeff Janes <jeff.janes@gmail.com>:
>
>> On Tue, Dec 11, 2012 at 2:04 AM, Niels Kristian Schjødt
>> <nielskristian@autouncle.com> wrote:
>>
>>> Maybe I should mention, that I never see more than max 5Gb out of my total 32Gb being in use on the server… Can I
somehowutilize more of it? 
>>
>> What tool do you use to determine that?  Is that on top of the 4GB
>> shared_buffers, are including it?
>
> Okay I might not have made myself clear, I was talking "physical" memory utilization. Here is the stats:
> free -m
> total       used       free     shared    buffers     cached
> Mem:         32075      25554       6520          0         69      22694
> -/+ buffers/cache:       2791      29284
> Swap:         2046        595       1451

I don't how you get 5 Gig from that, though.  You have 22 Gig of
cached file-system, which for your purposes probably counts as being
utilized.  Although we don't know how much of this is for postgres
data files, chances are it is a pretty good chunk.


>>
>> How big is your entire data set?  Maybe all your data fits in 5GB
>> (believable, as all your indexes listed below sum to < 2.5GB) so there
>> is no need to use more.
>
> It doesn't we are a search engine for used cars, and there are quite a lot of those out there :-)

But how big?  More than 22GB?  (you can use \l+ in psql, or du -s on
the data directory)

> However, my indexes are almost all partial indexes, which mean that they are only on cars which is still for sale, so
inthat sense, the indexes them selves doesn't really grow, but the tables do. 

So maybe this reverses things.  If your car table is huge and the
active cars are scattered randomly among all the inactive ones, then
updating random active cars is going to generate a lot of randomly
scattered writing which can't be combined into sequential writes.

Do you have plans for archiving cars that are no longer for sale?  Why
do you keep them around in the first place, i.e. what types of queries
do you do on inactive ones?

Unfortunately you currently can't use CLUSTER with partial indexes,
otherwise that might be a good idea.  You could build a full index on
whatever it is you use as the criterion for the partial indexes,
cluster on that, and then drop it.

But the table would eventually become unclustered again, so if this
works you might want to implement partitioning between active and
inactive partitions so as to maintain the clustering.


>>>> You could really crank up shared_buffers or vm.dirty_background_ratio,
>>>> but doing so might cause problems with checkpoints stalling and
>>>> latency spikes.  That would probably not be a problem during the
>>>> night, but could be during the day.
>>
>>> What do you have in mind here? Tweaking what parameters to what values?
>>
>> I'd set shared_buffers to 20GB (or 10GB, if that will hold all of your
>
> I had that before, Shaun suggested that I changed it to 4GB as he was talking about a strange behavior when larger
thanthat on 12.04. But I can say, that there has not been any notable difference between having it at 4Gb and at 8Gb. 

It is almost an all or nothing thing.  If you need 16 or 20GB, just
going from 4 to 8 isn't going to show much difference.  If you can
test this easily, I'd just set it to 24 or even 28GB and run the bulk
update.  I don't think you'd want to run a server permanently at those
settings, but it is an easy way to rule in or out different theories
about what is going on.

>> But if your database is growing so rapidly that it soon won't fit on
>> 240GB, then cranking up shared_buffers won't do for long.  If you can
>> get your tables and all of their indexes clustered together, then you
>> can do the updates in an order that makes IO more efficient.  Maybe
>> partitioning would help.
>
> Can you explain a little more about this, or provide me a good link?

If all your partial btree indexes are using the same WHERE clause,
then your indexes are already clustered together in a sense--a partial
index is kind of like a composite index with the WHERE clause as the
first column.

So the trick would be to get the table to be clustered on the same
thing--either by partitioning or by the CLUSTER command, or something
equivalent to those.  I don't know of a good link, other than the
documentation (which is more about how to do it, rather than why you
would want to or how to design it)

Cheers,

Jeff


В списке pgsql-performance по дате отправления:

Предыдущее
От: Hari Babu
Дата:
Сообщение: Memory issue for inheritance tables.
Следующее
От: "Kevin Grittner"
Дата:
Сообщение: Re: hash join vs nested loop join