Re: distinct values and count over window function

Поиск
Список
Период
Сортировка
От Rodrigo Rosenfeld Rosas
Тема Re: distinct values and count over window function
Дата
Msg-id 50B56AB4.1060700@gmail.com
обсуждение исходный текст
Ответ на Re: distinct values and count over window function  ("David Johnston" <polobo@yahoo.com>)
Список pgsql-general
Em 27-11-2012 19:50, David Johnston escreveu:
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Rodrigo Rosenfeld
> Rosas
> Sent: Tuesday, November 27, 2012 4:23 PM
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] distinct values and count over window function
>
> Hello, this is my first post to this particular list (I've also posted to
> performance list a while ago to report a bug).
>
> I generate complex dynamic queries and paginate them using "LIMIT 50". I
> have a need for implementing some statistics for the results as well and I'd
> like to know if I can avoid extra queries or how to be able to do that
> without much performance hit.
>
> The statistics calculation that bothers me most is a distinct one. For
> example, suppose you have a query that would return more than 50 distinct
> values while only the first 50 ones are brought by the paginated query.
>
> When the user asks for its statistics it should show all counts by distinct
> value including those not present in the current results page for certain
> columns.
>
> I noticed that there is a plan for PG 9.2 to include a query_to_json
> function that would indeed help me to get this done when used altogether
> with a window function.
>
> Our application is currently running PG 9.1 (was rolled back from 9.2 due to
> a performance bug that seems to be now fixed but not released yet). Is there
> anyway I could avoid running another query to get this kind of statistics?
>
> The queries can become really complex and take almost 5s to complete in some
> cases. Which would mean that performing 2 queries with those same conditions
> would take about 10s... I could perform them in different times but then the
> user would have to wait for 5s just to get the statistics calculation...
> This is what I'd like to avoid.
>
> This application is a clustered web application, so I'd rather avoid any
> client library support for paginating results using cursors or something
> like that and stick with OFFSET and LIMIT.
>
> Any help is appreciated!
>
> Thanks,
> Rodrigo.
>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> .
> Conceptually you could execute and cache (via a Temporary or Permanent
> Table) the results of the unlimited query then use that cache to calculate
> your statistics as well as perform your subsequent LIMIT/OFFSET queries.
>
> In 9.1+ you have the ability to incorporate INSERT inside a WITH/CTE so that
> you can perform multiple actions like this without resorting to the use of a
> function - though a function may have value as well.
>
> David J.
>
>

Indeed David, thanks! Temporary tables seems like a good enough
solution.  I tested with a complex query here, which takes about 300ms
in my dev machine.

I was implementing pagination by issuing 2 similar queries for the first
page, one of them being a count(*) one. Both summed for about 500ms
while they complete in around 350ms if I use a temporary table in within
a transaction using "CREATE TEMP TABLE partial_results ON COMMIT DROP AS".

I'd love to store those partial results for a longer time but this would
complicate things a lot. I'd need a way to ensure each query will use a
different name for the results table and I should also implement a
system that would take care of dropping such tables as the used memory
passes some caching threshold or when some timeout has occurred.

Are there any explicit caching mechanisms like described above built-in
into PostgreSQL?

Anyway, I won't have time to look into this at least until Monday so
I'll use the temp table strategy described above for now due to a tight
dead-line.

Thanks a lot!

Best,
Rodrigo.


В списке pgsql-general по дате отправления:

Предыдущее
От: Scott Marlowe
Дата:
Сообщение: Re: w7 vs linux
Следующее
От: Jeff Janes
Дата:
Сообщение: Re: Restore postgres to specific time