Discussion: Paged Query


Paged Query

From:
Hermann Matthes
Date:
I want to implement a "paged query" feature, where the user can enter
in a dialog how many rows he wants to see. After displaying the first
page of rows, he can push a button to display the next/previous page.
At the database level I could use "limit" to implement this feature. My
problem is that the user is not permitted to view all rows. For every
row a permission check is performed, and if permission is granted, the
row is added to the list of rows sent to the client.
If, for example, the user has entered a page size of 50 and I use
"limit 50" to fetch only 50 records, what should I do if he is only
permitted to see 20 of those 50 records? There may be more records he
can view.
But if I don't use "limit", what happens if the query would return
5,000,000 rows? Would my result set contain 5,000,000 rows, or would
the performance of the database go down?

Thanks in advance
Hermann

Re: Paged Query

From:
Vitalii Tymchyshyn
Date:
What language are you using? Usually there is an iterator with a chunked fetch option (like setFetchSize in Java JDBC). So you pass the query without a limit and then read as many results as you need. Note that the query plan in this case won't be optimized for your limit, and I don't remember whether Postgres has an "optimize for N rows" statement option.
Also, if your statement is ordered by some key, you can use the general paging technique of rerunning the query with a "key > max_prev_value" filter to get the next chunk.
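As a rough sketch of that keyset technique (the "items" table and its indexed "id" column here are hypothetical):

```sql
-- First page: plain ORDER BY + LIMIT on an indexed key.
SELECT id, payload
FROM items
ORDER BY id
LIMIT 50;

-- Next page: instead of OFFSET, filter on the largest id seen so far
-- (say it was 1050), so the index scan starts where the last page ended.
SELECT id, payload
FROM items
WHERE id > 1050
ORDER BY id
LIMIT 50;
```

Unlike OFFSET, the cost of fetching a page this way does not grow with the page number.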

On Wednesday, July 4, 2012, Hermann Matthes <hermann.matthes@web.de> wrote:
> I want to implement a "paged query" feature, where the user can enter in a dialog how many rows he wants to see. After displaying the first page of rows, he can push a button to display the next/previous page.
> At the database level I could use "limit" to implement this feature. My problem is that the user is not permitted to view all rows. For every row a permission check is performed, and if permission is granted, the row is added to the list of rows sent to the client.
> If, for example, the user has entered a page size of 50 and I use "limit 50" to fetch only 50 records, what should I do if he is only permitted to see 20 of those 50 records? There may be more records he can view.
> But if I don't use "limit", what happens if the query would return 5,000,000 rows? Would my result set contain 5,000,000 rows, or would the performance of the database go down?
>
> Thanks in advance
> Hermann
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

--
Best regards,
 Vitalii Tymchyshyn

Re: Paged Query

From:
"Albe Laurenz"
Date:
Hermann Matthes wrote:
> I want to implement a "paged query" feature, where the user can enter
> in a dialog how many rows he wants to see. After displaying the first
> page of rows, he can push a button to display the next/previous page.
> At the database level I could use "limit" to implement this feature.
> My problem is that the user is not permitted to view all rows. For
> every row a permission check is performed, and if permission is
> granted, the row is added to the list of rows sent to the client.
> If, for example, the user has entered a page size of 50 and I use
> "limit 50" to fetch only 50 records, what should I do if he is only
> permitted to see 20 of those 50 records? There may be more records he
> can view.
> But if I don't use "limit", what happens if the query would return
> 5,000,000 rows? Would my result set contain 5,000,000 rows, or would
> the performance of the database go down?

Selecting all 5,000,000 rows would consume a lot of memory wherever
they are cached. It might also lead to bad response times (with
an appropriate LIMIT clause, the server can choose a plan that
returns the first few rows quickly).

I assume that there is some kind of ORDER BY involved, so that
the order of rows displayed is not random.

I have two ideas:
- Try to integrate the permission check in the query.
  It might be more efficient, and you could just use LIMIT
  and OFFSET like you intended.
- Select some more rows than you want to display on one page,
  perform the permission checks. Stop when you reach the end
  or have enough rows. Remember the sort key of the last row
  processed.
  When the next page is to be displayed, use the remembered
  sort key value to SELECT the next rows.
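The first idea might look something like this (the "documents" table and the "permissions" ACL table are made up for illustration):

```sql
-- Join the ACL into the query so LIMIT/OFFSET only ever count rows
-- the user is actually allowed to see.
SELECT d.id, d.title
FROM documents d
JOIN permissions p
  ON p.doc_id = d.id
 AND p.user_id = 42          -- the current user
ORDER BY d.id
LIMIT 50 OFFSET 100;         -- page 3 at 50 rows per page
```

With the check inside the query, a page of 50 always contains 50 visible rows (unless you are on the last page).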

Yours,
Laurenz Albe

Re: Paged Query

From:
Greg Spiegelberg
Date:
On Wed, Jul 4, 2012 at 6:25 AM, Hermann Matthes <hermann.matthes@web.de> wrote:
I want to implement a "paged query" feature, where the user can enter in a dialog how many rows he wants to see. After displaying the first page of rows, he can push a button to display the next/previous page.
At the database level I could use "limit" to implement this feature. My problem is that the user is not permitted to view all rows. For every row a permission check is performed, and if permission is granted, the row is added to the list of rows sent to the client.
If, for example, the user has entered a page size of 50 and I use "limit 50" to fetch only 50 records, what should I do if he is only permitted to see 20 of those 50 records? There may be more records he can view.
But if I don't use "limit", what happens if the query would return 5,000,000 rows? Would my result set contain 5,000,000 rows, or would the performance of the database go down?


Sounds like your permission check is not implemented in the database.  If it were, those records would be excluded and the OFFSET-LIMIT combo would be your solution.  It also appears that you have access to the application.  If so, I would recommend implementing the permission check in the database.  Much cleaner from a query and pagination standpoint.

An alternative is to have the application complicate the query with the appropriate permission logic, excluding the unviewable records before the final ORDER BY-OFFSET-LIMIT.  This will give you an accurate page count.

IMHO, the worst alternative is to select your max page size, exclude the rows the user cannot see, and rinse and repeat until you reach your records-per-page limit.  Whatever you're ordering on will serve as the page marker.  The issue with this solution is that you may not have an accurate page count.

Luck.

-Greg

Re: Paged Query

From:
Misa Simic
Date:
Hi Hermann,

Well,

It's not clear how you get the rows for the user without paging.

If it is some query like:

SELECT columns FROM table WHERE UserHasPermission(rowPK, userid)

then paging would be:

SELECT columns FROM table WHERE UserHasPermission(rowPK, userid) LIMIT NoOfRecords OFFSET page*NoOfRecords

Kind Regards,

Misa

2012/7/4 Hermann Matthes <hermann.matthes@web.de>
I want to implement a "paged query" feature, where the user can enter in a dialog how many rows he wants to see. After displaying the first page of rows, he can push a button to display the next/previous page.
At the database level I could use "limit" to implement this feature. My problem is that the user is not permitted to view all rows. For every row a permission check is performed, and if permission is granted, the row is added to the list of rows sent to the client.
If, for example, the user has entered a page size of 50 and I use "limit 50" to fetch only 50 records, what should I do if he is only permitted to see 20 of those 50 records? There may be more records he can view.
But if I don't use "limit", what happens if the query would return 5,000,000 rows? Would my result set contain 5,000,000 rows, or would the performance of the database go down?

Thanks in advance
Hermann


Re: Paged Query

From:
Gregg Jaskiewicz
Date:
Use cursors.
By far the most flexible; offset/limit have their downsides.
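For reference, a minimal cursor-based paging sketch (the "items" table is hypothetical):

```sql
BEGIN;
-- SCROLL allows moving backwards to re-fetch earlier pages.
DECLARE page_cur SCROLL CURSOR FOR
    SELECT id, payload FROM items ORDER BY id;

FETCH 50 FROM page_cur;            -- page 1
FETCH 50 FROM page_cur;            -- page 2
MOVE BACKWARD 100 FROM page_cur;   -- rewind to the start
FETCH 50 FROM page_cur;            -- page 1 again

CLOSE page_cur;
COMMIT;
```

A plain cursor dies with the transaction; "DECLARE ... WITH HOLD" keeps it alive after COMMIT, at the cost of materializing the result set on the server.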

Re: Paged Query

From:
Craig Ringer
Date:
On 07/09/2012 07:55 PM, Gregg Jaskiewicz wrote:
> Use cursors.
> By far the most flexible. offset/limit have their down sides.
Do do cursors.

Keeping a cursor open across user think time has resource costs on the
database. It doesn't necessarily require keeping the transaction open
(with hold cursors) but it's going to either require a snapshot to be
retained or the whole query to be executed by the DB and stored somewhere.

Then the user goes away on a week's holiday and leaves their PC at your
"next" button.

All in all, limit/offset have better bounded and defined costs, albeit
not very nice ones.

--
Craig Ringer


Re: Paged Query

From:
Shaun Thomas
Date:
On 07/09/2012 07:02 AM, Craig Ringer wrote:

> Do do cursors.

Did you mean "Do not use cursors" here?

> Then the user goes away on a week's holiday and leaves their PC at
> your "next" button.

This exactly. Cursors have limited functionality that isn't directly
disruptive to the database in general. At the very least, the
transaction ID reservation necessary to preserve a cursor long-term can
wreak havoc on your transaction ID wraparound if you have a fairly busy
database. I can't think of a single situation where either client
caching or LIMIT/OFFSET can't supplant it with better risk levels and costs.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com



______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email

Re: Paged Query

From:
Craig James
Date:
On Mon, Jul 9, 2012 at 6:22 AM, Shaun Thomas <sthomas@optionshouse.com> wrote:
On 07/09/2012 07:02 AM, Craig Ringer wrote:

Do do cursors.

Did you mean "Do not use cursors" here?

Then the user goes away on a week's holiday and leaves their PC at
your "next" button.

This exactly. Cursors have limited functionality that isn't directly disruptive to the database in general. At the very least, the transaction ID reservation necessary to preserve a cursor long-term can wreak havoc on your transaction ID wraparound if you have a fairly busy database. I can't think of a single situation where either client caching or LIMIT/OFFSET can't supplant it with better risk levels and costs.

A good solution to this general problem is "hitlists."  I wrote about this concept before:

http://archives.postgresql.org/pgsql-performance/2010-05/msg00058.php

Craig James (the other Craig)



Re: Paged Query

From:
Greg Spiegelberg
Date:
On Mon, Jul 9, 2012 at 8:16 AM, Craig James <cjames@emolecules.com> wrote:

A good solution to this general problem is "hitlists."  I wrote about this concept before:

http://archives.postgresql.org/pgsql-performance/2010-05/msg00058.php


I implemented this exact strategy in our product years ago.  Our queries were once quite complicated, involving many nested sub-SELECTs and several JOINs per SELECT.  The basics of our implementation now consist of:

 1. A table tracking all "cache" tables.  A cache table is a permanent table that was once one of the former sub-SELECTs.  The table includes the MD5 hash of the query used to create the table, the time created, the query type (helps to determine the expire time), and a comment field to help in debugging.
 2. Simple logic that checks for the existence of the cache table and creates it if it does not exist.
 3. Using one or many of the named cache tables in the final query, with ORDER BY-LIMIT-OFFSET in a CURSOR.
 4. One scheduled backend process to clear the "expired" cache tables based on the query type.

The reason for the CURSOR is to execute once to get a tally of records for pagination purposes, then rewind and fetch the right "page".
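The count-then-rewind trick can be sketched like this (the cache table name and sort column are made up):

```sql
BEGIN;
DECLARE hits SCROLL CURSOR FOR
    SELECT * FROM cache_0a1b2c ORDER BY score DESC;

MOVE FORWARD ALL FROM hits;    -- command tag ("MOVE n") gives the total row count
MOVE ABSOLUTE 100 FROM hits;   -- position at row 100, i.e. just before page 3
FETCH 50 FROM hits;            -- rows 101-150 for a 50-row page
CLOSE hits;
COMMIT;
```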

Highly recommended.

-Greg

Re: Paged Query

From:
Misa Simic
Date:


2012/7/9 Gregg Jaskiewicz <gryzman@gmail.com>
Use cursors.
By far the most flexible. offset/limit have their down sides.


Well, I am not aware of any downsides to LIMIT/OFFSET that do not exist in the other solutions for paged queries... But I agree there must always be some compromise between flexibility and response time (as long as the user has the impression that he is working "immediately", whether the query executes in 1 ms or 1 s is not important...).

The query must be parsed and executed inside the DB before it returns results, so that time is unavoidable. A cursor just lets you take the (already executed) results from the DB one by one... OK, in the cursor scenario parse and execute are done just once... But execution plans are cached, so I don't see a big downside even if the query is executed thousands of times; you will notice in Pg that the second query is much faster than the first one...

So if you need to jump straight from page 1 to page 576, I agree the response will be a bit slower in the LIMIT/OFFSET case (below 100 pages at 50 rows per page there is no point discussing performance: you can get all the rows from the DB at once and do the "paging" client-side, in memory). However, I am not sure the CURSOR scenario would be enough faster to outweigh the many other limitations of cursors in general... (Personally I have not used them in more than 7 years; I really don't see the need for them today, when hardware has more and more power...)

From my experience users very rarely go to the last pages anyway... it is easier for them to sort the data by a field to get those rows on the very first pages...

Kind Regards,

Misa

Re: Paged Query

From:
Andrew Dunstan
Date:
On 07/09/2012 01:41 PM, Misa Simic wrote:
>
>
> From my experience users very rarely go to the last pages... it is
> easier for them to sort the data by a field to get those rows on the
> very first pages...
>
>


Yeah, the problem really is that most client code wants to know how many
pages there are, even if it only wants one page right now.

cheers

andrew

Re: Paged Query

From:
Gurjeet Singh
Date:
On Mon, Jul 9, 2012 at 1:46 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

On 07/09/2012 01:41 PM, Misa Simic wrote:


From my experience users very rarely go to the last pages... it is easier for them to sort the data by a field to get those rows on the very first pages...




Yeah, the problem really is that most client code wants to know how many pages there are, even if it only wants one page right now.

FWIW, I wrote a little about getting the numbered results along with total result count in one query[1]. The suggestions in comments to use CTE provided even better performance.

[1] http://gurjeet-tech.blogspot.com/2011/02/pagination-of-results-in-postgres.html
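One way to fold the total into the page query itself is a window aggregate; a sketch below, against a hypothetical "documents" table:

```sql
-- count(*) OVER () is evaluated over the full filtered result set,
-- before LIMIT/OFFSET cut it down to one page.
SELECT id, title,
       count(*) OVER () AS total_matches
FROM documents
WHERE title ILIKE '%postgres%'
ORDER BY id
LIMIT 50 OFFSET 0;
```

Every returned row carries the same total_matches, so the client can compute the page count from any one of them.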

Best regards,
--
Gurjeet Singh
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Re: Paged Query

From:
Misa Simic
Date:
Hi Andrew,

Sure... We are sending data as JSON to the clients:

{
total: 6784,
data: [50 rows for the desired page]
}

SELECT count(*) FROM table (without WHERE, without joins) used to have
bad performance... However, in a real scenario we have never had a case
without joins and WHERE... Join columns are always indexed, and we
always try to put indexes on the columns that will mostly be used in
the WHERE clauses of the usual queries...

So far we haven't met performance problems...

But to be honest with you, the total in our responses is very rarely
bigger than 10k, and mostly it is less than 1k... which is a really
small number these days (even though the tables have a few million
rows, the restrictions always reduce the "desired" total to fewer than
1,000 rows...)

When users want to work on something on an everyday basis, they
usually want "immediately" the things that are just for them... drafts
they worked on in the last few days, items assigned just to them, etc.

When they need to go through some process once a month and performance
is "slow", they usually don't bother... Everyday tasks are what is
important, and that is where we care about having good performance...


In very rare cases, when we know performance must be slow for many
reasons, we lie :) - we return the first page (hopefully with the data
the user is looking for) and return 1000 as the total... We return the
result to the user and asynchronously run
CalculateTotalForThisCaseAndCache... On the first subsequent request
for the same thing (but another page), if the calculation is done, we
return the results from the cache (with the real total number)... But
this happens on a very exceptional basis rather than regularly...

Cheers

Misa

Sent from my Windows Phone
From: Andrew Dunstan
Sent: 09/07/2012 19:47
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Paged Query

On 07/09/2012 01:41 PM, Misa Simic wrote:
>
>
> From my experience users very rarely go to the last pages... it is
> easier for them to sort the data by a field to get those rows on the
> very first pages...
>
>


Yeah, the problem really is that most client code wants to know how many
pages there are, even if it only wants one page right now.

cheers

andrew


Re: Paged Query

From:
Craig Ringer
Date:
On 07/09/2012 09:22 PM, Shaun Thomas wrote:
> On 07/09/2012 07:02 AM, Craig Ringer wrote:
>
>> Do do cursors.
>
> Did you mean "Do not use cursors" here?
>
Oops. "So do cursors".
>> Then the user goes away on a week's holiday and leaves their PC at
>> your "next" button.
>
> This exactly. Cursors have limited functionality that isn't directly
> disruptive to the database in general. At the very least, the
> transaction ID reservation necessary to preserve a cursor long-term
> can wreak havoc on your transaction ID wraparound if you have a fairly
> busy database. I can't think of a single situation where either client
> caching or LIMIT/OFFSET can't supplant it with better risk levels and
> costs.
>
My ideal is a cursor with timeout.

If I could use a cursor but know that the DB would automatically expire
the cursor and any associated resources after a certain inactivity
period (_not_ total life, inactivity) that'd be great. Or, for that
matter, a cursor the DB could expire when it began to get in the way.

I'm surprised more of the numerous tools that use LIMIT and OFFSET don't
instead use cursors that they hold for a short time, then drop if
there's no further activity and re-create next time there's interaction
from the user. ORMs that tend to use big joins would particularly
benefit from doing this.

I suspect the reason is that many tools - esp ORMs, web frameworks, etc
- try to be portable between DBs, and cursors are a high-quirk-density
area in SQL RDBMSs, not to mention unsupported by some DBs. Pity, though.

There's nothing wrong with using a cursor so long as you don't hang onto
it over user think-time without also setting a timeout of some kind to
destroy it in the background.

--
Craig Ringer



Re: Paged Query

From:
Craig Ringer
Date:
On 07/10/2012 06:24 AM, Misa Simic wrote:
> Hi Andrew,
>
> Sure... We are sending data in Json to clients
>
> {
> total:6784,
> data:[50 rows for desired page]
> }
>
> SELECT count(*) FROM table - without where, without joins used to have
> bad performance... However, in real scenario we have never had the case
> without joins and where... Join columns are always indexed, and we
> always try to put indexes on columns what will mostly be used in where
> in usual queries...

When/if you do need a count of a single table without any filters, a
common trick is to use the table statistics to return an approximation. If
your autovacuum is running regularly it's usually a very good
approximation, too.

Sounds like this hack may become unnecessary in 9.2 though.
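The statistics trick is usually spelled like this (the table name is hypothetical):

```sql
-- reltuples is the planner's row-count estimate, refreshed by
-- VACUUM / ANALYZE / autovacuum; cheap but only approximate.
SELECT reltuples::bigint AS approx_rows
FROM pg_class
WHERE relname = 'items';
```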

--
Craig Ringer

Re: Paged Query

From:
Jeff Janes
Date:
On Mon, Jul 9, 2012 at 4:50 PM, Craig Ringer <ringerc@ringerc.id.au> wrote:
>
>
> When/if you do need a count of a single table without any filters, a common
> trick is to use the table statistics to return an approximation. If your
> autovacuum is running regularly it's usually a very good approximation, too.
>
> Sounds like this hack may become unnecessary in 9.2 though.

Index only scans in 9.2 are nice, but they don't fundamentally change
this type of thing.

Cheers,

Jeff


Re: Paged Query

From:
Vitalii Tymchyshyn
Date:


On Monday, July 9, 2012, Misa Simic <misa.simic@gmail.com> wrote:
>
>
> 2012/7/9 Gregg Jaskiewicz <gryzman@gmail.com>
>>
>> Use cursors.
>> By far the most flexible. offset/limit have their down sides.
>
>
> Well, I am not aware of any downsides to LIMIT/OFFSET that do not exist in the other solutions for paged queries...

where key > last-previous-key order by key

--
Best regards,
 Vitalii Tymchyshyn

Re: Paged Query

From:
Vitalii Tymchyshyn
Date:


On Monday, July 9, 2012, Misa Simic <misa.simic@gmail.com> wrote:
>
>
> 2012/7/9 Gregg Jaskiewicz <gryzman@gmail.com>
>>
>> Use cursors.
>> By far the most flexible. offset/limit have their down sides.
>
>
> Well, I am not aware of any downsides to LIMIT/OFFSET that do not exist in the other solutions for paged queries...

'where key > last-value order by key limit N' performs much better for large offsets.
p.s. Sorry for previous email- hit send too early.

--
Best regards,
 Vitalii Tymchyshyn