Discussion: Using the indexing and sampling APIs to realize progressive features


Using the indexing and sampling APIs to realize progressive features

From:
Date:

Hi,


I have some questions regarding the indexing and sampling API.

My aim is to implement a variant of progressive indexing as described in this paper (link). To summarize, I want to implement a variant of online aggregation, where an aggregate query (like SUM, AVG, etc.) is answered in real time and the result becomes more and more accurate as tuples are consumed.

I thought that I could maybe use a custom sampling routine to consume table samples until I have seen the whole table with no duplicate tuples.

Meanwhile, with every consumed sample and returned partial answer, I want to add the tuples consumed to a progressively evolving index.

This would mean that I would have to be able to uniquely identify each row in order to add it to the growing index, right? Since OIDs are deprecated / have been phased out, I am still unsure how to solve this.
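To make the idea concrete, here is a rough Python sketch of the loop I have in mind. It is purely illustrative, not PostgreSQL internals: the dict standing in for a table, the batch sampling, and the coverage-based scaling are all simplifying assumptions.

```python
import random

def progressive_sum(table, batch_size=2, seed=0):
    """Yield successively refined estimates of sum(table.values()).

    `table` maps a unique row id to a value -- this stands in for the
    unique tuple identifier the growing index would need.  Each pass
    samples a batch of rows, skips ids already consumed (duplicates
    from overlapping samples), and scales the partial sum by the
    inverse of the fraction of rows seen so far.
    """
    rng = random.Random(seed)
    ids = list(table)
    seen = set()          # the "progressively evolving index" of consumed ids
    partial = 0.0
    while len(seen) < len(table):
        for rid in rng.sample(ids, min(batch_size, len(ids))):
            if rid in seen:
                continue  # duplicate tuple; already indexed
            seen.add(rid)
            partial += table[rid]
        # current estimate, refined on every batch
        yield partial * len(table) / len(seen)

table = {i: float(i) for i in range(100)}   # true sum = 4950
estimates = list(progressive_sum(table))
print(estimates[-1])   # exact once every row has been seen: 4950.0
```

The point is only the shape of the computation: early estimates are noisy, and the estimate becomes exact once the sampler has covered the whole table.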

Does this sound reasonable or is there an obvious flaw in my thinking?

I would also be thankful for any material beyond the Postgres documentation that would help me get started modifying the source to realize something like this.


Regards

Michael H.


Re: Using the indexing and sampling APIs to realize progressive features

From: Vijaykumar Jain
Date:


On Thu, Feb 3, 2022, 8:55 PM <hohenstein@cs.uni-kl.de> wrote:

> Hi,
>
> I have some questions regarding the indexing and sampling API.
>
> My aim is to implement a variant of progressive indexing as described in this paper (link). To summarize, I want to implement a variant of online aggregation, where an aggregate query (like SUM, AVG, etc.) is answered in real time and the result becomes more and more accurate as tuples are consumed.
>
> I thought that I could maybe use a custom sampling routine to consume table samples until I have seen the whole table with no duplicate tuples.



I am not sure if I understand correctly, but if this is referring to faceted search, then the following may be of some help.


Performance may vary, but it should help you get an idea of the implementation.
You also have rollups and cubes (GROUP BY ROLLUP / CUBE), but they get slow over large tables and require more resources to speed up.
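For intuition, ROLLUP computes the aggregate at every prefix of the grouping columns (including the grand total), padding the dropped columns with NULL. A minimal Python emulation of that behavior for a SUM aggregate, with illustrative column names, might look like:

```python
from collections import defaultdict

def rollup_sum(rows, group_cols, value_col):
    """Emulate GROUP BY ROLLUP(col1, col2, ...) for a SUM aggregate.

    `rows` is a list of dicts.  For each prefix of `group_cols`
    (k = 0 gives the grand total), accumulate the sum under a key
    padded with None for the dropped columns, as ROLLUP emits NULLs.
    """
    totals = defaultdict(float)
    for row in rows:
        for k in range(len(group_cols) + 1):
            key = tuple(row[c] for c in group_cols[:k])
            key += (None,) * (len(group_cols) - k)
            totals[key] += row[value_col]
    return dict(totals)

rows = [
    {"region": "EU", "city": "Paris",  "sales": 10.0},
    {"region": "EU", "city": "Berlin", "sales": 20.0},
    {"region": "US", "city": "NYC",    "sales": 30.0},
]
result = rollup_sum(rows, ["region", "city"], "sales")
print(result[(None, None)])   # grand total: 60.0
print(result[("EU", None)])   # EU subtotal: 30.0
```

This single pass over the rows is also why ROLLUP gets expensive on large tables: every input row contributes to one accumulator per grouping level.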


If this is not what you wanted, feel free to ignore.