Re: Using window functions to get the unpaginated count for paginated queries

Поиск
Список
Период
Сортировка
От Albe Laurenz
Тема Re: Using window functions to get the unpaginated count for paginated queries
Дата
Msg-id D960CB61B694CF459DCFB4B0128514C208AF0960@exadv11.host.magwien.gv.at
обсуждение исходный текст
Ответ на Using window functions to get the unpaginated count for paginated queries  (Clemens Park <clemens.park@gmail.com>)
Ответы Re: Using window functions to get the unpaginated count for paginated queries  (Igor Romanchenko <igor.a.romanchenko@gmail.com>)
Список pgsql-general
Clemens Park wrote:
> Recently, during a performance improvement sweep for an application at
my company, one of the hotspots
> that was discovered was pagination.
>
> In order to display the correct pagination links on the page, the
pagination library we used (most
> pagination libraries for that matter) ran the query with OFFSET and
LIMIT to get the paginated
> results, and then re-ran the query without the OFFSET and LIMIT and
wrapped them in a SELECT COUNT(*)
> FROM main_query to get the total number of rows.
>
> In an attempt to optimize this, we used a window function as follows:
>
> Given a query that looked as follows:
>
> SELECT a,b,c
> FROM table
> WHERE clauses
> OFFSET x LIMIT y;
>
> add total_entries_count column as follows:
>
> SELECT COUNT(table.id) OVER () AS total_entries_count, a,b,c
> FROM table
> WHERE clauses
> OFFSET x LIMIT y;
>
> This calculates the total number of unpaginated rows correctly,
without affecting the runtime of the
> query.  At least as far as I can tell.

It can affect the runtime considerably.

I created a 10^6 row test table and tried:

test=> EXPLAIN ANALYZE SELECT id, val FROM large OFFSET 100 LIMIT 10;
                                                    QUERY PLAN
------------------------------------------------------------------------
------------------------------------------
 Limit  (cost=1.49..1.64 rows=10 width=12) (actual time=0.177..0.195
rows=10 loops=1)
   ->  Seq Scan on large  (cost=0.00..14902.00 rows=1000000 width=12)
(actual time=0.028..0.114 rows=110 loops=1)
 Total runtime: 0.251 ms
(3 rows)

test=> EXPLAIN ANALYZE SELECT id, val, COUNT(*) OVER () AS
total_entries_count FROM large OFFSET 100 LIMIT 10;
                                                          QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------
 Limit  (cost=2.74..3.01 rows=10 width=12) (actual
time=1893.606..1893.625 rows=10 loops=1)
   ->  WindowAgg  (cost=0.00..27402.00 rows=1000000 width=12) (actual
time=1893.435..1893.559 rows=110 loops=1)
         ->  Seq Scan on large  (cost=0.00..14902.00 rows=1000000
width=12) (actual time=0.025..647.182 rows=1000000 loops=1)
 Total runtime: 1915.255 ms
(4 rows)

That is because the second query will have to scan all rows, while the
first one can stop scanning after 110 rows.

> The questions I have are:
>
> 1) Are there any adverse effects that the above window function can
have?

I can only think of the performance degradation mentioned above.

> 2) Are there any cases where the count would return incorrectly?

No.

> 3) In general, is this an appropriate use-case for using window
functions?

I think it is.

Maybe you can do better if you don't retrieve the total count
of rows for every set of rows you select.

Yours,
Laurenz Albe


В списке pgsql-general по дате отправления:

Предыдущее
От: Tianyin Xu
Дата:
Сообщение: How do query optimizers affect window functions
Следующее
От: "Albe Laurenz"
Дата:
Сообщение: Re: Understanding streaming replication