Обсуждение: Aggregating tsqueries

Поиск
Список
Период
Сортировка

Aggregating tsqueries

От
Alexander Hill
Дата:
Hello,

I have a table of tree nodes with a tsquery column. To get a subtree's tsquery, I need to OR all of its nodes' tsqueries together.

I defined a custom aggregate using tsquery_or:

    CREATE AGGREGATE tsquery_or_agg (tsquery)
    (
        sfunc = tsquery_or,
        stype = tsquery
    );

but I've found that

    tsquery_or_agg(query)

is about a hundred times slower than this:

    ('(' || string_agg(query::text, ')|(') || ')')::tsquery

That works perfectly so I'm happy to continue doing it, but I'm curious to know why the difference is so great and if anything can be done about it?

Cheers,
Alex

Re: Aggregating tsqueries

От
Heikki Linnakangas
Дата:
On 09/17/2014 07:56 AM, Alexander Hill wrote:
> Hello,
>
> I have a table of tree nodes with a tsquery column. To get a subtree's
> tsquery, I need to OR all of its nodes' tsqueries together.
>
> I defined a custom aggregate using tsquery_or:
>
>      CREATE AGGREGATE tsquery_or_agg (tsquery)
>      (
>          sfunc = tsquery_or,
>          stype = tsquery
>      );
>
> but I've found that
>
>      tsquery_or_agg(query)
>
> is about a hundred times slower than this:
>
>      ('(' || string_agg(query::text, ')|(') || ')')::tsquery
>
> That works perfectly so I'm happy to continue doing it, but I'm curious to
> know why the difference is so great and if anything can be done about it?

string_agg's state transition function uses a buffer that's expanded as
needed. At every step, the next string is appended to the buffer. Your
custom aggregate is less efficient, because it constructs a new tsquery
object at every step. In every step, a new tsquery object is allocated
and the old result and the next source tsquery are copied to it. That's
much more expensive.

If you're not shy of writing C code, you could write a more efficient
version of tsquery_or_agg too, using a similar technique.

- Heikki