Re: The Contrib Roundup (long)

Поиск
Список
Период
Сортировка
От Ron Mayer
Тема Re: The Contrib Roundup (long)
Дата
Msg-id 42A69B56.5060603@cheapcomplexdevices.com
обсуждение исходный текст
Ответ на The Contrib Roundup (long)  (Josh Berkus <josh@agliodbs.com>)
Список pgsql-hackers
Josh Berkus wrote:
> 
> intagg: what does this module do which is not already available
> through the built-in array functions and operators?  Maybe I
> don't understand what it does. Unnatributed in the README.  Move
> to pgfoundry?

Short summary:
  Is there an equivalent of "int_array_enum()" built in?
  I use it for substantial (9X) performance improvements  for doing joins similar to those described in its README.
  I think it can be used to do somewhat similar things with  integer arrays that the SQL2003 UNNEST operator does  on
MULTISETs(but yeah, they're quite different too).
 


Long and boring, but with examples:

I find that it can speed up certain kinds of joins (like
those described in it's readme) drastically.  I have a
pretty big application that has a lot of joins that use
int_array_enum() to expand an array stored in one
column into something that looks like a table instead
of having a third join table connecting two tables.

Note that this is often much faster than the
array IN/ANY/SOME/NOT IN comparisons because when
planning the join it can all the various join plans
like hash joins; while the array operators seem to
just do linear searches of the arrays.

This trick is especially useful in conjunction with an aggregate
based on the "_int_union" function from the intarray/ contrib
module (similar to the FUSION operator for MULTISETS) when you
only want distinct values for that type of join. Sample queries
from an actual application showing a factor-of-9 performance
improvement(7 seconds to 800ms) are shown below.

-- similar to the standard FUSION operator for MULTISETS.
create aggregate intarray_union_agg (    sfunc = _int_union,    basetype = int[],    stype = int[],    initcond =
'{-1}'
);
explain analyze select fac_nam  from userfeatures.point_features  join entity_facets using (entity_id)  where
featureid=115group by fac_nam;
 
-- Total runtime: 7125.322 ms
explain analyze select fac_nam from (select distinct int_array_enum(fac_ids) as fac_id from (select distinct fac_ids
fromentity_facids natural join point_features where featureid=115) as a) as a join facet_lookup using (fac_id);
 
-- Total runtime: 1297.558 ms
explain analyze select fac_nam from (select distinct int_array_enum(fac_ids) as fac_id from (select
intarray_union_agg(fac_ids)as fac_ids from entity_facids natural join point_features where featureid=115) as a) as a
joinfacet_lookup using (fac_id);
 
-- Total runtime: 803.187 ms

I don't have access to the system right now, so I don't have the
full table definitions - but the basic problem is that there are
many "facets" for each row in the "point_features" table and there
are many "features" with featureid=115.   The queries are trying
to find the names of each facet available from that set of point_features.



> intarray: data_types/

Well, the array of int's data type is built in, so I think
this module is more about the functions, operators, and
indexes that it provides that operate on arrays of ints.
Would that make it fit better under functions/ in your new
directory tree?



If I had a vote, I'd think it nice if the intagg module got
merged with the intarray module (wherever it ends up) because
they really are quite complementary in providing useful
tools for manipulating arrays of ints.


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Abhijit Menon-Sen
Дата:
Сообщение: linuxtag 2005
Следующее
От: Ron Mayer
Дата:
Сообщение: Re: The Contrib Roundup (long)