Re: Wait free LW_SHARED acquisition - v0.9

From Andres Freund
Subject Re: Wait free LW_SHARED acquisition - v0.9
Msg-id 20141011005901.GF6724@awork2.anarazel.de
In response to Re: Wait free LW_SHARED acquisition - v0.9  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Wait free LW_SHARED acquisition - v0.9
List pgsql-hackers
On 2014-10-11 06:18:11 +0530, Amit Kapila wrote:
> On Fri, Oct 10, 2014 at 8:11 PM, Andres Freund <andres@2ndquadrant.com>
> wrote:
> > On 2014-10-10 17:18:46 +0530, Amit Kapila wrote:
> > > On Fri, Oct 10, 2014 at 1:27 PM, Andres Freund <andres@2ndquadrant.com>
> > > wrote:
> > > > > Observations
> > > > > ----------------------
> > > > > > a. The patch performs really well (increase up to ~40%) in case
> > > > > > all the data fits in shared buffers (scale factor -100).
> > > > > > b. In case data doesn't fit in shared buffers, but fits in RAM
> > > > > > (scale factor -3000), there is performance increase up to 16
> > > > > > client count, however after that it starts dipping (in above
> > > > > > config up to ~4.4%).
> > > >
> > > > Hm. Interesting. I don't see that dip on x86.
> > >
> > > Is it possible that implementation of some atomic operation is costlier
> > > for particular architecture?
> >
> > Yes, sure. And IIRC POWER improved atomics performance considerably for
> > POWER8...
> >
> > > > I have tried again for scale factor 3000 and could see the dip and this
> > > > time I have even tried with 175 client count and the dip is
> > > > approximately 5% which is slightly more than 160 client count.

I've run some short tests on hydra:

scale 1000:

base:
4GB:
tps = 296273.004800 (including connections establishing)
tps = 296373.978100 (excluding connections establishing)

8GB:
tps = 338001.455970 (including connections establishing)
tps = 338177.439106 (excluding connections establishing)

base + freelist:
4GB:
tps = 297057.523528 (including connections establishing)
tps = 297156.987418 (excluding connections establishing)

8GB:
tps = 335123.867097 (including connections establishing)
tps = 335239.122472 (excluding connections establishing)

base + LW_SHARED:
4GB:
tps = 296262.164455 (including connections establishing)
tps = 296357.524819 (excluding connections establishing)

8GB:
tps = 336988.744742 (including connections establishing)
tps = 337097.836395 (excluding connections establishing)

base + LW_SHARED + freelist:
4GB:
tps = 296887.981743 (including connections establishing)
tps = 296980.231853 (excluding connections establishing)

8GB:
tps = 345049.062898 (including connections establishing)
tps = 345161.947055 (excluding connections establishing)

I've also run some preliminary tests using scale=3000 - and I couldn't
see a performance difference either.

Note that all these are noticeably faster than your results.

> > >
> > > Lwlock_contention patches - client_count=128
> > > ----------------------------------------------------------------------
> > >
> > > +   7.95%      postgres  postgres               [.] GetSnapshotData
> > > +   3.58%      postgres  postgres               [.] AllocSetAlloc
> > > +   2.51%      postgres  postgres               [.] _bt_compare
> > > +   2.44%      postgres  postgres               [.]
> > > hash_search_with_hash_value
> > > +   2.33%      postgres  [kernel.kallsyms]      [k] .__copy_tofrom_user
> > > +   2.24%      postgres  postgres               [.] AllocSetFreeIndex
> > > +   1.75%      postgres  postgres               [.]
> > > pg_atomic_fetch_add_u32_impl
> >
> > Uh. Huh? Normally that'll be inline. That's compiled with gcc? What were
> > the compiler settings you used?
> 
> Nothing specific, for performance tests where I have to take profiles
> I use below:
> ./configure --prefix=<installation_path> CFLAGS="-fno-omit-frame-pointer"
> make

Hah. Doing so overwrites the CFLAGS configure normally sets. Check
# CFLAGS are selected so:
# If the user specifies something in the environment, that is used.
# else:  If the template file set something, that is used.
# else:  If coverage was enabled, don't set anything.
# else:  If the compiler is GCC, then we use -O2.
# else:  If the compiler is something else, then we use -O, unless debugging.

So, if you do it like above, you're compiling without optimizations...
Include at least -O2 as well.
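
The selection order in the comment quoted above can be sketched as a small shell function. This is only an illustration under stated assumptions: pick_cflags and its argument names are hypothetical, it treats an empty string as "not set", and it is not the actual code from configure. It just mirrors the five rules to show why a user-supplied CFLAGS that lacks -O2 ends up as an unoptimized build.

```shell
#!/bin/sh
# Hypothetical sketch of the CFLAGS selection order described in the
# quoted comment. Not the real configure code; all names are made up,
# and an empty string stands in for "not set".
# Arguments:
#   $1  CFLAGS given by the user ("" if none)
#   $2  CFLAGS set by the platform template ("" if none)
#   $3  "yes" if coverage was enabled
#   $4  "yes" if the compiler is GCC
#   $5  "yes" if debugging was requested
pick_cflags() {
    if [ -n "$1" ]; then
        printf '%s\n' "$1"      # user setting wins and is used verbatim
    elif [ -n "$2" ]; then
        printf '%s\n' "$2"      # template setting
    elif [ "$3" = yes ]; then
        printf '\n'             # coverage: don't set anything
    elif [ "$4" = yes ]; then
        printf '%s\n' "-O2"     # GCC default
    elif [ "$5" != yes ]; then
        printf '%s\n' "-O"      # other compilers, unless debugging
    else
        printf '\n'
    fi
}

# Only frame pointers requested: the -O2 default is never applied.
pick_cflags "-fno-omit-frame-pointer" "" no yes no
# Nothing requested, compiler is GCC: the default applies.
pick_cflags "" "" no yes no
# Profiling build that keeps optimization.
pick_cflags "-O2 -fno-omit-frame-pointer" "" no yes no
```

Applied to the command quoted earlier, that would mean running ./configure --prefix=<installation_path> CFLAGS="-O2 -fno-omit-frame-pointer" before make.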

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


