Re: [PATCH v3] Avoid manual shift-and-test logic in AllocSetFreeIndex

Поиск
Список
Период
Сортировка
От Jeremy Kerr
Тема Re: [PATCH v3] Avoid manual shift-and-test logic in AllocSetFreeIndex
Дата
Msg-id 200907201047.18603.jk@ozlabs.org
обсуждение исходный текст
Ответ на Re: [PATCH v3] Avoid manual shift-and-test logic in AllocSetFreeIndex  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: [PATCH v3] Avoid manual shift-and-test logic in AllocSetFreeIndex  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
Hi Robert,

> That having been said, Jeremy, you probably want to take a look at
> those comments and I have a few responses to them as well.

OK, thanks for the heads-up.

> following comment:
> > Applied and built cleanly. Regress passes. Trying to hunt down ppc
> > box to see if performance enhancement can be seen.
>
> Question: are we only doing this because of PowerPC?

No, this patch will benefit any architecture that has a gcc 
implementation of __builtin_clz. I know that both x86 and powerpc gcc 
support this.

> What is going to happen on x86 and other architectures?  x86 is not
> the center of the universe, but at a very minimum we need to make sure
> that things don't go backwards on what is a very common platform.  Has
> anyone done any benchmarking of this?

Yes, Atsushi Ogawa did some benchmarking with this patch on x86, his 
numbers are here:
http://archives.postgresql.org/message-id/4A2895C4.9050108@hi-ho.ne.jp

In fact, Atushi originally submitted a patch using inline asm (using 
"bsr") to do this on x86. Coincidentally, I was working on a powerpc 
implementation (using "cntlz") at the same time, so submitted a patch 
using the gcc builtin that would work on both (and possibly other) 
architectures.

> A related question: the original email for this patch says that it
> results in a performance increase of about 2% on PPC.  But since it
> gives no details on exactly what the test was that improved by 2%,
> it's hard to judge the impact of this.  If this means that
> AllocSetFreeIndex() is 2% faster, I think we should reject this patch
> and move on.  It's not worth introducing architecture-specific code
> to get a performance improvement that probably won't even move the
> needle on a macro-benchmark.  On the other hand, if the test was
> something like a pgbench run, then this is really very impressive. 
> But we don't know which it is.

The 2% improvement I saw is for a sysbench OLTP run. I'm happy to re-do 
the run and report specific numbers if you need them.

> Zdenek Kotala added this comment:
> > I have several objections:
> >
> > - inline is forbidden to use in PostgreSQL - you need exception or
> > do it differently
> >
> > - type mismatch - Size vs unsigned int vs 32. you should use only
> > Size and sizeof(Size)

OK, happy to make these changes. What's the commitfest process here, 
should I redo the patch and re-send to -hackers?

(inline again: should I just make this a static, the compiler can inline 
where possible? or do you want a macro?)

> > And general comment:
> >
> > Do we want to have this kind of code which is optimized for one
> > compiler and platform.

One compiler, multiple platforms.

> > See openSSL or MySQL, they have many
> > optimizations which work fine on one platform with specific version
> > of compiler and specific version of OS. But if you select different
> > version of compiler or different compiler you can get very
> > different performance result (e.g. MySQL 30% degradation, OpenSSL
> > RSA 3x faster and so on).

I don't think we're going to see a lot of variation here (besides the 
difference where __builtin_clz isn't available). Given that this is 
implemented with a couple of instructions, I'm confident that we won't 
see any degradation for platforms that support __builtin_clz. For the 
other cases, we just use the exiting while-loop algorithm so there 
should be no change (unless we mess up the inlining...).

> > I think in this case, it makes sense to have optimization here, but
> > we should do it like spinlocks are implemented and put this code
> > into /port.

Unless I'm missing something, spinlocks are done the same way - we have 
one file, src/include/storage/s_lock.h, which is mostly different 
implementations of the lock primitives for different platforms, 
separated by preprocessor tests.

Essentially, this is just one line of code, protected by 
HAVE_BUILTIN_CLZ (which is a feature-test, rather than a specific 
platform or compiler test), and is only used in one place. I don't think 
that warrants a separate file.

Cheers,


Jeremy


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: [PATCH 1/2 v3] [libpq] rework sigpipe-handling macros
Следующее
От: Andrew Dunstan
Дата:
Сообщение: Re: navigation menu for documents