Re: [HACKERS] Solution for LIMIT cost estimation
От | Philip Warner |
---|---|
Тема | Re: [HACKERS] Solution for LIMIT cost estimation |
Дата | |
Msg-id | 3.0.5.32.20000215001755.03435430@mail.rhyme.com.au обсуждение исходный текст |
Ответ на | Solution for LIMIT cost estimation (Chris <chris@bitmead.com>) |
Ответы |
Re: [HACKERS] Solution for LIMIT cost estimation
(Tom Lane <tgl@sss.pgh.pa.us>)
|
Список | pgsql-hackers |
At 23:12 14/02/00 +1100, Chris wrote: > >How about this as a compromise: > >If you give an offset without an ORDER BY the offset >is useless if this optimisation is in place. If you >allowed the offset with the optimisation and no >order by it would be encouraging broken behaviour. Not that I would do this necessarily, but select * from t where <stuff> offset 1 limit 1 is a valid way of checking for duplicates. I would hate to see the optimization turned off in this case. >So therefore it would be reasonable to optimise a >limit,offset query with no order by as if there were >no offset. This would give consistent results, albeit >it may not choose the best plan. But at least it >won't hurt anyone. ...etc The problem with using a stateless connection is that you have no transaction control, so can not control table contents between calls. eg. if it contains: f - abel tasman charles sturt ferdinand marcos (spot the odd one out) and do a 'select * from t order by f offset 0 limit 2', then someone adds 'bruce stringsteen' and you try to get the next two rows via 'select * from t order by f offset 2 limit 2', you will get 'charles sturt' again, and miss bruce. Either you have to say that the database is almost never updated (ie. it's just a copy of real data, used for web applications), in which case you can add all sorts of fields for optimizing stateless calls (notably an ID field), or you have to implement some kind of state preservation, and dump ID's into a temporary table or use 'held' cursors, which is not really that hard [Don't know if PG supports either, but you can 'fake' temporary tables pretty easily]. I may have missed something in what you need, but someone else has already mentioned using 'MOVE' within a cursor, and it still seems to me that putting the optimizer through hoops to achieve the result is liable to be a problem in the long term. eg. The Dec/Rdb optimizer actually assesses it's strategy while it's running. If the query is taking too long, or the estimates it used prove too inaccurate, it may change strategy. If PG implemented such a thing, then this whole approach to offset/limit would be blown away - a strategy will change depending on the data retrieved. It would be a pity if this sort of improvement in the optimizer were blocked because of problems caused by breaking successive calls to offset/limit queries. Maybe either 'held cursors' or 'local temporary tables' could be added to the ToDo list for a future release. As to documenting the behaviour, I suspect that any NOTICE should also say 'This behaviour may change in the future - don't rely on it unless you like living dangerously'. Just my 0.02c, but I don't like putting limits on an optimizer. As an aside, and because I like bringing this thing up, stored query strategies would solve the problem for selected queries; you could specify the strategy to be used in all executions of a prticular query...maybe this could go on the ToDo list? ;-} ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: +61-03-5367 7422 | _________ \ Fax: +61-03-5367 7430 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
В списке pgsql-hackers по дате отправления:
Следующее
От: Joel ReedДата:
Сообщение: patch for binding tuples from result set to user allocated memory