Thread: Faster array_length()


Faster array_length()

From: Hadi Moshayedi
Date:
Hello,

The attached patch improves the performance of array_length() by detoasting only the overhead part of the datum.
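
For anyone who doesn't want to open the attachment, the idea is roughly the following. This is only a minimal sketch, not the patch itself (the function name is made up), but it shows the header-slice trick:

#include "postgres.h"
#include "fmgr.h"
#include "utils/array.h"

/*
 * Sketch: read only the dimension metadata of an array argument.  The
 * ArrayType header plus 2 * MAXDIM ints always covers ndim, dims[] and
 * lbound[], so we ask the toast code for just that prefix instead of
 * detoasting the whole (possibly huge) array.
 */
Datum
array_length_sketch(PG_FUNCTION_ARGS)
{
    ArrayType  *v = (ArrayType *)
        PG_DETOAST_DATUM_SLICE(PG_GETARG_DATUM(0), 0,
                               ARR_OVERHEAD_NONULLS(MAXDIM));
    int         reqdim = PG_GETARG_INT32(1);
    int        *dimv;

    /* Same sanity checks as the existing array_length(). */
    if (ARR_NDIM(v) <= 0 || ARR_NDIM(v) > MAXDIM)
        PG_RETURN_NULL();
    if (reqdim <= 0 || reqdim > ARR_NDIM(v))
        PG_RETURN_NULL();

    dimv = ARR_DIMS(v);
    PG_RETURN_INT32(dimv[reqdim - 1]);
}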

Here is a test case:

postgres=# create table array_length_test as select array_agg(a) a from generate_series(1, 10000) a, generate_series(1, 10000) b group by b;

Without the patch:

postgres=#  select sum(array_length(a, 1)) from array_length_test;
    sum    
-----------
 100000000
(1 row)

Time: 199.002 ms

With the patch:

postgres=#  select sum(array_length(a, 1)) from array_length_test;
    sum    
-----------
 100000000
(1 row)

Time: 34.599 ms

The motivation for the patch is that some of our customers use arrays to store sequences of tens of thousands of events per row. They often need to get the last 10 events for each row, for which we do A[array_length(A, 1) - 9 : 1000000] (assuming 1M is an upper bound; we could use array_length() instead of this constant too, but that is unnecessary if we know the upper bound, and it only slows down the query). Without this optimization, the array gets detoasted twice. With this patch, array_length() becomes much faster, and the whole query saves a few seconds.

Of course this technique is applicable to some other functions too, but they have never become a bottleneck for me, so I decided to limit the changes to this function.

Another alternative I could think of was introducing Python-style slicing, in which negative indexes count from the end of the array, so -10 means the 10th element from the end. I thought this would be a bigger change and is probably unnecessary, so I decided to improve array_length() instead.

Feedback is welcome.

Thanks,
   -- Hadi

Attachments

Re: Faster array_length()

From: Tom Lane
Date:
Hadi Moshayedi <hadi@moshayedi.net> writes:
> The attached patch improves the performance of array_length() by detoasting
> only the overhead part of the datum.

It looks to me like this would actually make things worse for cases where
the input array wasn't toasted-out-of-line (because it would uselessly
make a copy of the header part, costing a palloc cycle).  I'm not averse
to improving the case you're worried about, but you have to pay attention
to not having bad side-effects on other cases.
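
Something along these lines (an untested sketch, assuming the same headers an array function already pulls in) would avoid the useless copy by taking the slice path only for genuinely external values:

/*
 * Hypothetical helper: fetch just the dimension header of an array
 * argument.  Out-of-line values go through the slice machinery; a plain
 * in-line array comes back as-is from the ordinary macro with no extra
 * palloc, and an in-line compressed one is decompressed only once.
 */
static ArrayType *
get_array_dim_header(FunctionCallInfo fcinfo, int argno)
{
    Datum       d = PG_GETARG_DATUM(argno);

    if (VARATT_IS_EXTERNAL(DatumGetPointer(d)))
        return (ArrayType *)
            PG_DETOAST_DATUM_SLICE(d, 0, ARR_OVERHEAD_NONULLS(MAXDIM));

    return PG_GETARG_ARRAYTYPE_P(argno);
}

array_length() and friends could then call this instead of PG_GETARG_ARRAYTYPE_P(0).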

Another thought is that this can only win for arrays that are external
without being compressed; when they are compressed, then
heap_tuple_untoast_attr_slice will fetch and decompress the entire array
anyway (and then, just to add insult to injury, make another copy :-().
With that in mind, I was surprised that your test case showed any
improvement at all --- it looks like the arrays aren't getting compressed
for some reason.  There are going to be a lot of other cases where this
patch doesn't help unless the user turns off compression, which will hurt
his performance in other ways.

Now, the slice detoast support was only designed to work with data stored
in "external" mode (that is, with compression manually disabled via the
appropriate ALTER TABLE option), and that's not unreasonable for its
originally-intended application of being able to fetch any part of an
external text string.  But it strikes me that for what you want here,
namely fetching just a few bytes from the start, it ought to be possible
to do better.  Could we teach the toast code to fetch and decompress just
an initial subset of the data?  (This might be useful even for the
original use-case of slice fetches, as long as the desired slice isn't
too close to the end of the datum.)
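
For instance (an untested sketch, assuming pglz_decompress() grows the ability to stop after producing a requested number of output bytes and to report how many it wrote -- the five-argument form below does not exist today):

#include "postgres.h"
#include "utils/pg_lzcompress.h"

/*
 * Sketch: decompress only the first slicelength bytes of a compressed
 * in-line or already-fetched toast value.  Assumes the usual compressed
 * varlena layout: a 4-byte length word, a 4-byte raw-size word, then the
 * compressed payload.
 */
static struct varlena *
toast_decompress_prefix(struct varlena *attr, int32 slicelength)
{
    struct varlena *result = (struct varlena *) palloc(slicelength + VARHDRSZ);
    const char *payload = VARDATA(attr) + sizeof(int32);
    int32       payload_len = VARSIZE(attr) - VARHDRSZ - sizeof(int32);
    int32       written;

    written = pglz_decompress(payload, payload_len,
                              VARDATA(result), slicelength,
                              false);   /* stopping early is expected */
    if (written < 0)
        elog(ERROR, "compressed data is corrupted");

    SET_VARSIZE(result, written + VARHDRSZ);
    return result;
}

heap_tuple_untoast_attr_slice() could then use something like this for compressed inputs instead of decompressing everything and copying the slice out afterwards.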

Bottom line for me is that you've shown that there can be a win from
improving this code, but we need to do some basic work on the
slice-fetching logic to get full value out of the idea.

> Of course this technique is applicable to some other functions too, but
> they have never become a bottleneck for me, so I decided to keep the
> changes only to this function.

I would expect a committable version of this patch to cover all the
array-dimension-related functions.
        regards, tom lane



Re: Faster array_length()

From: Hadi Moshayedi
Date:
Thanks for looking into this.

> With that in mind, I was surprised that your test case showed any
> improvement at all --- it looks like the arrays aren't getting compressed
> for some reason.

You are right, it seems they were not getting compressed, probably because the arrays were essentially "seq 10000", which pglz apparently cannot compress. When I changed the test data to arrays containing 10000 ones, there was no speed improvement anymore.

I'll look into how to improve the compressed case and other issues you raised.

Thanks,
   -- Hadi