Re: tweaking MemSet() performance - 7.4.5

From Manfred Spraul
Subject Re: tweaking MemSet() performance - 7.4.5
Date
Msg-id 414C567C.3060503@colorfullife.com
In reply to Re: tweaking MemSet() performance - 7.4.5  (Marc Colosimo <mcolosimo@mitre.org>)
Responses Re: tweaking MemSet() performance - 7.4.5  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Marc Colosimo wrote:

> Oops, I used the same setting as in the old hacking message (-O2, gcc 
> 3.3). If I understand what you are saying, then it turns out yes, PG's 
> MemSet is faster for smaller blocksizes (see below, between 32 and 
> 64). I just replaced the whole MemSet with memset and it is not very 
> low when I profile.

Could you check what the OS-X memset function does internally?
One trick to speed up memset is to bypass the cache and bulk-write 
directly from the write buffers to main memory. i386 CPUs support that, and 
in microbenchmarks it's 3 times faster (or something like that). 
Unfortunately it's a loss in real-world tests: typically a structure is 
initialized with memset and then immediately accessed. If the memset 
bypasses the cache, the following access will cause a cache 
miss, which can be so slow that using the faster memset results in a 
net performance loss.
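
To make the cache-bypass trick concrete, here is a minimal sketch, assuming an x86 build with SSE2 and a GCC-style compiler. The name fill_nocache is hypothetical, and nothing here claims to be what the OS-X memset (or any libc) actually does. It uses non-temporal stores for the aligned middle of the buffer and falls back to plain memset everywhere else:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_set1_epi8, _mm_stream_si128, _mm_sfence */
#endif

/*
 * Hypothetical helper (not part of PostgreSQL or any libc):
 * fill n bytes at dst with the byte value c, using non-temporal
 * (cache-bypassing) stores where SSE2 is available.
 */
static void *
fill_nocache(void *dst, int c, size_t n)
{
    unsigned char *p = (unsigned char *) dst;

    assert(dst != NULL);

#if defined(__SSE2__)
    /* _mm_stream_si128 needs a 16-byte aligned address: fill the head bytewise. */
    while (n > 0 && ((uintptr_t) p & 15) != 0)
    {
        *p++ = (unsigned char) c;
        n--;
    }

    /* Stream 16 bytes at a time; these stores do not allocate cache lines. */
    {
        const __m128i v = _mm_set1_epi8((char) c);

        while (n >= 16)
        {
            _mm_stream_si128((__m128i *) p, v);
            p += 16;
            n -= 16;
        }
    }

    /* Make the streamed data globally visible before returning. */
    _mm_sfence();
#endif

    /* Remaining tail on SSE2 builds, or the whole range otherwise. */
    memset(p, c, n);
    return dst;
}
```

A caller that reads the structure right after fill_nocache(&s, 0, sizeof(s)) would have to fetch those lines from main memory again, which is the real-world loss described above.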

> I could squeeze more out of it if I spent more time trying to 
> understand it (change MEMSET_LOOP_LIMIT to 32 and then add memset 
> after that?). I'm now working on understanding Spin Locks and 
> friends. Putting in a sync call (in s_lock.h) is really a time killer 
> and bad for performance (it takes up 35 cycles).
>
That's the price you pay for weakly ordered memory access.
Linux on ppc uses eieio; on ppc64 it uses lwsync. Could you check whether 
they are faster?
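
One way to run that comparison is to make the barrier in the unlock path swappable. The following is only a sketch, assuming GCC-style inline assembly and the GCC __sync builtins; the DEMO_* macros, demo_lock_t, and the demo_* functions are hypothetical names and not what s_lock.h defines. It does not settle which barrier is sufficient for correctness, it only makes it easy to time each one:

```c
#include <assert.h>

/*
 * Hypothetical barrier macros. On PowerPC they map to the three
 * instructions under discussion; on any other target they all fall
 * back to a full barrier so the sketch still compiles and runs.
 */
#if defined(__GNUC__) && (defined(__powerpc__) || defined(__ppc__) || \
                          defined(__powerpc64__) || defined(__ppc64__))
#define DEMO_BARRIER_SYNC()    __asm__ __volatile__ ("sync"   : : : "memory")
#define DEMO_BARRIER_LWSYNC()  __asm__ __volatile__ ("lwsync" : : : "memory")
#define DEMO_BARRIER_EIEIO()   __asm__ __volatile__ ("eieio"  : : : "memory")
#else
#define DEMO_BARRIER_SYNC()    __sync_synchronize()
#define DEMO_BARRIER_LWSYNC()  __sync_synchronize()
#define DEMO_BARRIER_EIEIO()   __sync_synchronize()
#endif

/* Pick the barrier to measure; change this line and re-run the profile. */
#ifndef DEMO_RELEASE_BARRIER
#define DEMO_RELEASE_BARRIER() DEMO_BARRIER_LWSYNC()
#endif

typedef volatile int demo_lock_t;

/* Returns 1 if the lock was acquired, 0 if it was already held. */
static int
demo_try_lock(demo_lock_t *lock)
{
    /* Atomic exchange with acquire semantics (GCC builtin). */
    return __sync_lock_test_and_set(lock, 1) == 0;
}

static void
demo_unlock(demo_lock_t *lock)
{
    /* Order the critical section's accesses before the releasing store. */
    DEMO_RELEASE_BARRIER();
    *lock = 0;
}
```

Building once with -DDEMO_RELEASE_BARRIER=DEMO_BARRIER_SYNC and once with the lwsync or eieio variant gives comparable cycle counts for the unlock path.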

--   Manfred

