Re: Combining Aggregates

From: David Rowley
Subject: Re: Combining Aggregates
Date:
Msg-id: CAKJS1f-jc4tBC4VfXNaVv5FhVKCz0HFFSoFGf9-_tH=HTztawA@mail.gmail.com
In reply to: Re: Combining Aggregates (Robert Haas <robertmhaas@gmail.com>)
Replies: Re: Combining Aggregates (David Rowley <david.rowley@2ndquadrant.com>)
         Re: Combining Aggregates (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On 25 December 2015 at 14:10, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Dec 21, 2015 at 4:53 PM, David Rowley
> <david.rowley@2ndquadrant.com> wrote:
>> On 22 December 2015 at 01:30, Robert Haas <robertmhaas@gmail.com> wrote:
>>> Can we use Tom's expanded-object stuff instead of introducing
>>> aggserialfn and aggdeserialfn?  In other words, if you have
>>> aggtranstype = INTERNAL, then what we do is:
>>>
>>> 1. Create a new data type that represents the transition state.
>>> 2. Use expanded-object notation for that data type when we're just
>>> within a single process, and flatten it when we need to send it
>>> between processes.
>>>
>>
>> I'd not seen this before, but on looking at it I'm not sure it will
>> be practical to use for this. I may have missed something, but it seems that
>> after each call of the transition function, I'd need to ensure that the
>> INTERNAL state was in the varlena format.
>
> No, the idea I had in mind was to allow it to continue to exist in the
> expanded format until you really need it in the varlena format, and
> then serialize it at that point.  You'd actually need to do the
> opposite: if you get an input that is not in expanded format, expand
> it.
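To make the "stay expanded until a flat form is really needed" idea concrete, here is a minimal C sketch. It assumes a hypothetical avg()-style state (`SumCountState`) and a hypothetical tagged value (`StateDatum`) that holds either a flat byte image or a pointer to the expanded struct. None of these names are PostgreSQL's expanded-object API, and the raw `memcpy` layout ignores byte-order and alignment concerns.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory ("expanded") transition state for an avg()-like aggregate. */
typedef struct {
    int64_t count;
    int64_t sum;
} SumCountState;

/* Hypothetical tagged value: either a flat byte image or an expanded state. */
typedef struct {
    int is_expanded;
    SumCountState *expanded;                  /* valid when is_expanded != 0 */
    unsigned char flat[2 * sizeof(int64_t)];  /* valid when is_expanded == 0 */
} StateDatum;

/* If the input is flat, expand it; an already-expanded value is left alone. */
static SumCountState *ensure_expanded(StateDatum *d)
{
    if (!d->is_expanded) {
        SumCountState *s = malloc(sizeof *s);
        if (s == NULL)
            abort();
        memcpy(&s->count, d->flat, sizeof s->count);
        memcpy(&s->sum, d->flat + sizeof s->count, sizeof s->sum);
        d->expanded = s;
        d->is_expanded = 1;
    }
    return d->expanded;
}

/* Flatten only at the point where a byte image is required (e.g. to send it). */
static void flatten(StateDatum *d)
{
    if (d->is_expanded) {
        memcpy(d->flat, &d->expanded->count, sizeof(int64_t));
        memcpy(d->flat + sizeof(int64_t), &d->expanded->sum, sizeof(int64_t));
        free(d->expanded);
        d->expanded = NULL;
        d->is_expanded = 0;
    }
}

/* Transition function: always works on the expanded form. */
static void trans_add(StateDatum *d, int64_t value)
{
    SumCountState *s = ensure_expanded(d);
    s->count += 1;
    s->sum += value;
}
```

Repeated transition calls pay the expansion cost at most once; the flat image is produced only when `flatten()` is called.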

Admittedly, I'm struggling to see how this can be done, even after spending a good bit of time analysing how the expanded-object code works.

Hypothetically, let's say we could make it work like this:

1. During partial aggregation (finalizeAggs = false), in finalize_aggregates(), where we'd normally call the final function, we instead flatten the INTERNAL state and store the flattened Datum in place of the pointer to the INTERNAL state.
2. During combining aggregation (combineStates = true), the combine functions for INTERNAL states are all written so that they expand the flattened states before combining them.
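The two steps above could be sketched roughly as follows. This is an illustration only: `AvgState`, `partial_finalize()`, `expand()`, `combine()` and `avg_final()` are hypothetical names, the flat form is a plain host-endian byte image rather than a real varlena, and the empty-input case returns 0.0 where a real avg() would return NULL.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical INTERNAL-style transition state for avg(int8). */
typedef struct {
    int64_t count;
    int64_t sum;
} AvgState;

#define AVG_FLAT_SIZE (2 * sizeof(int64_t))

/* Step 1: in a partial aggregate (finalizeAggs = false), flatten the state
 * into a byte image instead of calling the final function. */
static void partial_finalize(const AvgState *s, unsigned char out[AVG_FLAT_SIZE])
{
    memcpy(out, &s->count, sizeof s->count);
    memcpy(out + sizeof s->count, &s->sum, sizeof s->sum);
}

/* Step 2a: expand a flattened state back into the in-memory struct. */
static void expand(const unsigned char in[AVG_FLAT_SIZE], AvgState *s)
{
    memcpy(&s->count, in, sizeof s->count);
    memcpy(&s->sum, in + sizeof s->count, sizeof s->sum);
}

/* Step 2b: the combine function expands its flattened input, then merges. */
static void combine(AvgState *acc, const unsigned char in[AVG_FLAT_SIZE])
{
    AvgState other;
    expand(in, &other);
    acc->count += other.count;
    acc->sum += other.sum;
}

/* Final function, run once after all partial states have been combined.
 * Simplification: returns 0.0 for no rows instead of NULL. */
static double avg_final(const AvgState *s)
{
    return s->count == 0 ? 0.0 : (double) s->sum / (double) s->count;
}
```

For example, worker states {count 2, sum 10} and {count 3, sum 20} flatten, combine into {5, 30}, and finalize to 6.0.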

Does that sound like what you had in mind?

If so, I can't quite wrap my head around step 1, as I'm really not sure how we'd flatten the INTERNAL pointer from finalize_aggregates(). How do we know which flatten function to call there? From reading the expanded-object code, I see that it's used in expand_array(). In that case we know we're working with arrays, so it can always use the globally scoped EA_methods struct to get the function pointers it needs for flattening the array. For finalize_aggregates(), the best I can think of is to have a bunch of global structs and then a giant case statement to select the correct one. That's clearly horrid and not commit-worthy, and it does nothing to help user-defined aggregates that use INTERNAL types. Am I missing something here?
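The dispatch problem can be illustrated with a small C sketch. A per-type methods table (shaped loosely like the "get flat size" and "flatten into" callbacks that expanded objects use) works when the caller already knows the type, as with arrays and EA_methods; a bare `void *` INTERNAL pointer gives the caller nothing to choose a table from. `FlattenMethods`, `IntArray`, `int_array_methods` and `flatten_with()` are hypothetical names for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical methods table: one callback reports the flat size, the other
 * writes the flat form into a caller-supplied buffer. */
typedef struct {
    size_t (*get_flat_size)(const void *obj);
    void   (*flatten_into)(const void *obj, unsigned char *dst, size_t size);
} FlattenMethods;

/* Hypothetical expanded int array (fixed capacity to keep the sketch short). */
typedef struct {
    int32_t nelems;
    int32_t elems[8];
} IntArray;

static size_t int_array_flat_size(const void *obj)
{
    const IntArray *a = obj;
    return sizeof a->nelems + (size_t) a->nelems * sizeof a->elems[0];
}

static void int_array_flatten(const void *obj, unsigned char *dst, size_t size)
{
    const IntArray *a = obj;
    (void) size;   /* caller sized dst using get_flat_size() */
    memcpy(dst, &a->nelems, sizeof a->nelems);
    memcpy(dst + sizeof a->nelems, a->elems, (size_t) a->nelems * sizeof a->elems[0]);
}

/* One global table per type, similar in spirit to EA_methods: code that
 * already knows it holds an array can simply use this table. */
static const FlattenMethods int_array_methods = {
    int_array_flat_size,
    int_array_flatten,
};

/* Generic flattening needs a methods table from somewhere. An opaque
 * INTERNAL pointer (void *) carries no table, so a caller such as
 * finalize_aggregates() has nothing to pass as 'm' unless the table is
 * recorded per aggregate elsewhere. Returns 0 if 'cap' is too small. */
static size_t flatten_with(const FlattenMethods *m, const void *obj,
                           unsigned char *dst, size_t cap)
{
    size_t need = m->get_flat_size(obj);
    if (need > cap)
        return 0;
    m->flatten_into(obj, dst, need);
    return need;
}
```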

As of the most recent patch I posted, having the serial and deserial functions in the catalogs allows user-defined aggregates with INTERNAL states to work just fine. Admittedly, I'm not all that happy that I've had to add 4 new columns to pg_aggregate to support this, but if I could think of a way to make it work without them, I'd likely do that instead.
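A rough sketch of why per-aggregate catalog entries sidestep the dispatch problem: generic code looks up the aggregate's row and calls whatever serial/deserial functions it names, so a user-defined aggregate only has to register its own pair. The struct layout, the `maxcount` aggregate and `lookup_agg()` are hypothetical; only the roles of aggserialfn/aggdeserialfn come from the thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-aggregate catalog row. The two function pointers play
 * the role of aggserialfn / aggdeserialfn; the rest is illustrative. */
typedef struct {
    const char *aggname;
    size_t (*aggserialfn)(const void *state, unsigned char *dst, size_t cap);
    void   (*aggdeserialfn)(const unsigned char *src, size_t len, void *state);
} AggCatalogEntry;

/* A hypothetical user-defined INTERNAL state: running max plus row count. */
typedef struct {
    int64_t max;
    int64_t rows;
} MaxCountState;

static size_t maxcount_serial(const void *state, unsigned char *dst, size_t cap)
{
    const MaxCountState *s = state;
    if (cap < sizeof *s)
        return 0;              /* buffer too small */
    memcpy(dst, s, sizeof *s);
    return sizeof *s;
}

static void maxcount_deserial(const unsigned char *src, size_t len, void *state)
{
    MaxCountState *s = state;
    if (len == sizeof *s)
        memcpy(s, src, sizeof *s);
}

/* The "catalog": the aggregate's author registers its own functions here. */
static const AggCatalogEntry catalog[] = {
    {"maxcount", maxcount_serial, maxcount_deserial},
};

/* Generic executor-side lookup: no knowledge of the state's C type and no
 * central switch over all aggregates is needed. */
static const AggCatalogEntry *lookup_agg(const char *name)
{
    for (size_t i = 0; i < sizeof catalog / sizeof catalog[0]; i++)
        if (strcmp(catalog[i].aggname, name) == 0)
            return &catalog[i];
    return NULL;
}
```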

If your problem with the serialize and deserialize stuff is the serialized format, then I see no reason why we couldn't just invent some composite types for the current INTERNAL aggregate states, have the serialfn convert the INTERNAL state into one of those, and have the deserialfn perform the opposite. This would likely be neater than what I have at the moment, which just converts the INTERNAL state into text.
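The difference between a text serialization and a composite-type-like one can be sketched in C. The text path has to format and then re-parse a string, while the record path just moves named, typed fields. `AvgState`, `AvgStateRecord` and the "count,sum" text format are hypothetical stand-ins, not the format used in the posted patch.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical INTERNAL state for avg(int8). */
typedef struct {
    int64_t count;
    int64_t sum;
} AvgState;

/* Text-style serialization: the state becomes a string such as "3,15",
 * which the deserialfn must parse back. Returns 1 on success. */
static int avg_serial_text(const AvgState *s, char *buf, size_t cap)
{
    int n = snprintf(buf, cap, "%" PRId64 ",%" PRId64, s->count, s->sum);
    return n >= 0 && (size_t) n < cap;
}

static int avg_deserial_text(const char *buf, AvgState *s)
{
    return sscanf(buf, "%" SCNd64 ",%" SCNd64, &s->count, &s->sum) == 2;
}

/* Composite-style serialization: a record with named, typed fields,
 * standing in for a hypothetical composite type (count int8, sum int8). */
typedef struct {
    int64_t count;
    int64_t sum;
} AvgStateRecord;

static AvgStateRecord avg_serial_record(const AvgState *s)
{
    AvgStateRecord r = { s->count, s->sum };
    return r;
}

static AvgState avg_deserial_record(const AvgStateRecord *r)
{
    AvgState s = { r->count, r->sum };
    return s;
}
```

The record form also keeps the fields individually typed and visible, which is part of why a composite type may be tidier than an opaque text blob.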

Please let me know what I'm missing with the expanded-object code.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
