Re: [HACKERS] Arrays of domains

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: [HACKERS] Arrays of domains
Дата
Msg-id 5970.1506635083@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: [HACKERS] Arrays of domains  (Andrew Dunstan <andrew.dunstan@2ndquadrant.com>)
Ответы Re: [HACKERS] Arrays of domains  (Andrew Dunstan <andrew.dunstan@2ndquadrant.com>)
Список pgsql-hackers
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
> On 09/28/2017 01:11 PM, Tom Lane wrote:
>>> I wonder if we need to do any benchmarking to assure ourselves that the
>>> changes to ArrayCoerceExpr don't have a significant performance impact?

>> That would likely be a good idea, though I'm not very sure what or
>> how to benchmark.

> Some case where we only expect the current code to produce a single
> ArrayCoerceExpr, I guess. say doing text[] -> int[] ?

I spent some time looking into this.  I settled on int4[] -> int8[]
as the appropriate case to test, because int48() is about as cheap
a cast function as we have.  Q1 is the base case without a cast:

select count(x) from (select array[i,i,i,i,i,i,i,i,i,i] as x  from generate_series(1,10000000) i) ss;

Q2 is same with a cast added:

select count(x::int8[]) from (select array[i,i,i,i,i,i,i,i,i,i] as x  from generate_series(1,10000000) i) ss;

Q3 and Q4 are the same thing, but testing 100-element instead of
10-element arrays:

select count(x) from (select array[
i,i,i,i,i,i,i,i,i,i,
i,i,i,i,i,i,i,i,i,i,
i,i,i,i,i,i,i,i,i,i,
i,i,i,i,i,i,i,i,i,i,
i,i,i,i,i,i,i,i,i,i,
i,i,i,i,i,i,i,i,i,i,
i,i,i,i,i,i,i,i,i,i,
i,i,i,i,i,i,i,i,i,i,
i,i,i,i,i,i,i,i,i,i,
i,i,i,i,i,i,i,i,i,i
] as x  from generate_series(1,10000000) i) ss;

select count(x::int8[]) from (select array[
i,i,i,i,i,i,i,i,i,i,
i,i,i,i,i,i,i,i,i,i,
i,i,i,i,i,i,i,i,i,i,
i,i,i,i,i,i,i,i,i,i,
i,i,i,i,i,i,i,i,i,i,
i,i,i,i,i,i,i,i,i,i,
i,i,i,i,i,i,i,i,i,i,
i,i,i,i,i,i,i,i,i,i,
i,i,i,i,i,i,i,i,i,i,
i,i,i,i,i,i,i,i,i,i
] as x  from generate_series(1,10000000) i) ss;

I get these query timings in a non-cassert build:
HEAD        Patch

Q1    5453.235 ms    5440.876 ms
Q2    9340.670 ms    10191.194 ms
Q3    19078.457 ms    18967.279 ms
Q4    48196.338 ms    58547.531 ms

(Timings are reproducible to a few percent.)

So that's a bit disappointing; the per-element overhead is clearly
noticeably more than before.  However, poking into it with "perf"
gives some grounds for optimism; Q4's hotspots with the patch are
 Children      Self       Samples  Command          Shared Object                Symbol
+   33.44%    33.35%         81314  postmaster       postgres                     [.] ExecInterpExpr
+   21.88%    21.83%         53223  postmaster       postgres                     [.] array_map
+   15.19%    15.15%         36944  postmaster       postgres                     [.] CopyArrayEls
+   14.63%    14.60%         35585  postmaster       postgres                     [.] ArrayCastAndSet
+    6.07%     6.06%         14765  postmaster       postgres                     [.] construct_md_array
+    1.80%     1.79%          4370  postmaster       postgres                     [.] palloc0
+    0.77%     0.77%          1883  postmaster       postgres                     [.] AllocSetAlloc
+    0.75%     0.74%          1815  postmaster       postgres                     [.] int48
+    0.52%     0.52%          1276  postmaster       postgres                     [.] advance_aggregates

Surely we could get the amount of time spent in ExecInterpExpr down.

One idea is to make a dedicated evalfunc for the case where the
expression is just EEOP_CASE_TESTVAL + EEOP_FUNCEXPR[_STRICT],
similar to the existing fast-path routines (ExecJust*).  I've not
actually tried to do that, but a reasonable guess is that it'd about
halve that overhead, putting this case back on a par with the HEAD code.
Also, I'd imagine that Andres' planned work on JIT-compiled expressions
would put this back on par with HEAD, if not better, for installations
using that.

Also I believe that Andres has plans to revamp the CaseTestExpr mechanism,
which might shave a few cycles as well.  Right now there's an extra copy
of each array datum + isnull value, because array_map sticks those into
one pair of variables and then the EEOP_CASE_TESTVAL opcode just moves
them somewhere else.  It's reasonable to hope that we could redesign that
so that array_map sticks the values straight into where they're needed,
and then we need only the FUNCEXPR opcode, which'd be a great candidate
for an ExecJust* fast-path.

Assuming that that's going to happen for v11, I'm inclined to leave the
optimization problem until the dust settles around CaseTestExpr.
Does anyone feel that this can't be committed before that's addressed?
        regards, tom lane


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple
Следующее
От: Peter Geoghegan
Дата:
Сообщение: Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple