Обсуждение: View Index and UNION
Hi I've encountered a fundamental problem which - to me - can only be solved with an (future/possible) real index on views in PostgreSQL (like the exist already in MS SQL Server and Ora): Given following schema: 1. TABLE a and TABLE b, each with INDEX on attribute geom. 2. A VIEW with union: CREATE VIEW myview AS SELECT * FROM a UNION SELECT * FROM b; 3. And a simple query with KNN index and a coordinate "mypos" : SELECT * FROM myview ORDER BY ST_Geomfromtext(mypos) <-> myview.geom Now, the problem is, that for the "order by" it is not enough that each on the two tables calculate the ordering separately: We want a total ordering over all involved tables! In fact, the planner realizes that and chooses a seq scan over all tuples of table a and b - which is slow and suboptimal! To me, that's a use case where we would wish to have a distinct index on views. Any opinions on this? Yours, Stefan
Could this scenario not be handled by a step that orders the two tables independently, then for the view interleaves the presorted results? Merging two sorted sets into a single sorted set is usually a trivial task, and it could still take advantage of the existing indexes. William King Senior Engineer Quentus Technologies, INC 1037 NE 65th St Suite 273 Seattle, WA 98115 Main: (877) 211-9337 Office: (206) 388-4772 Cell: (253) 686-5518 william.king@quentustech.com On 05/25/2013 05:35 PM, Stefan Keller wrote: > Hi > > I've encountered a fundamental problem which - to me - can only be > solved with an (future/possible) real index on views in PostgreSQL > (like the exist already in MS SQL Server and Ora): > > Given following schema: > > 1. TABLE a and TABLE b, each with INDEX on attribute geom. > > 2. A VIEW with union: > > CREATE VIEW myview AS > SELECT * FROM a > UNION > SELECT * FROM b; > > 3. And a simple query with KNN index and a coordinate "mypos" : > > SELECT * FROM myview > ORDER BY ST_Geomfromtext(mypos) <-> myview.geom > > Now, the problem is, that for the "order by" it is not enough that > each on the two tables calculate the ordering separately: We want a > total ordering over all involved tables! > > In fact, the planner realizes that and chooses a seq scan over all > tuples of table a and b - which is slow and suboptimal! > > To me, that's a use case where we would wish to have a distinct index on views. > > Any opinions on this? > > Yours, Stefan > >
Yes, it actually does, but the planner chooses a seq scan to prepare for that. -S. 2013/5/26 William King <william.king@quentustech.com>: > Could this scenario not be handled by a step that orders the two tables > independently, then for the view interleaves the presorted results? > Merging two sorted sets into a single sorted set is usually a trivial > task, and it could still take advantage of the existing indexes. > > William King > Senior Engineer > Quentus Technologies, INC > 1037 NE 65th St Suite 273 > Seattle, WA 98115 > Main: (877) 211-9337 > Office: (206) 388-4772 > Cell: (253) 686-5518 > william.king@quentustech.com > > On 05/25/2013 05:35 PM, Stefan Keller wrote: >> Hi >> >> I've encountered a fundamental problem which - to me - can only be >> solved with an (future/possible) real index on views in PostgreSQL >> (like the exist already in MS SQL Server and Ora): >> >> Given following schema: >> >> 1. TABLE a and TABLE b, each with INDEX on attribute geom. >> >> 2. A VIEW with union: >> >> CREATE VIEW myview AS >> SELECT * FROM a >> UNION >> SELECT * FROM b; >> >> 3. And a simple query with KNN index and a coordinate "mypos" : >> >> SELECT * FROM myview >> ORDER BY ST_Geomfromtext(mypos) <-> myview.geom >> >> Now, the problem is, that for the "order by" it is not enough that >> each on the two tables calculate the ordering separately: We want a >> total ordering over all involved tables! >> >> In fact, the planner realizes that and chooses a seq scan over all >> tuples of table a and b - which is slow and suboptimal! >> >> To me, that's a use case where we would wish to have a distinct index on views. >> >> Any opinions on this? >> >> Yours, Stefan >> >> > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers
I appear to have been able to replicate what you are talking about, but it required explicitly binding the order by in different ways. See attached files. William King Senior Engineer Quentus Technologies, INC 1037 NE 65th St Suite 273 Seattle, WA 98115 Main: (877) 211-9337 Office: (206) 388-4772 Cell: (253) 686-5518 william.king@quentustech.com On 05/26/2013 02:22 AM, Stefan Keller wrote: > Yes, it actually does, but the planner chooses a seq scan to prepare for that. > > -S. > > 2013/5/26 William King <william.king@quentustech.com>: >> Could this scenario not be handled by a step that orders the two tables >> independently, then for the view interleaves the presorted results? >> Merging two sorted sets into a single sorted set is usually a trivial >> task, and it could still take advantage of the existing indexes. >> >> William King >> Senior Engineer >> Quentus Technologies, INC >> 1037 NE 65th St Suite 273 >> Seattle, WA 98115 >> Main: (877) 211-9337 >> Office: (206) 388-4772 >> Cell: (253) 686-5518 >> william.king@quentustech.com >> >> On 05/25/2013 05:35 PM, Stefan Keller wrote: >>> Hi >>> >>> I've encountered a fundamental problem which - to me - can only be >>> solved with an (future/possible) real index on views in PostgreSQL >>> (like the exist already in MS SQL Server and Ora): >>> >>> Given following schema: >>> >>> 1. TABLE a and TABLE b, each with INDEX on attribute geom. >>> >>> 2. A VIEW with union: >>> >>> CREATE VIEW myview AS >>> SELECT * FROM a >>> UNION >>> SELECT * FROM b; >>> >>> 3. And a simple query with KNN index and a coordinate "mypos" : >>> >>> SELECT * FROM myview >>> ORDER BY ST_Geomfromtext(mypos) <-> myview.geom >>> >>> Now, the problem is, that for the "order by" it is not enough that >>> each on the two tables calculate the ordering separately: We want a >>> total ordering over all involved tables! >>> >>> In fact, the planner realizes that and chooses a seq scan over all >>> tuples of table a and b - which is slow and suboptimal! >>> >>> To me, that's a use case where we would wish to have a distinct index on views. >>> >>> Any opinions on this? >>> >>> Yours, Stefan >>> >>> >> >> >> -- >> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
Stefan Keller <sfkeller@gmail.com> writes: > Given following schema: > 1. TABLE a and TABLE b, each with INDEX on attribute geom. > 2. A VIEW with union: > CREATE VIEW myview AS > SELECT * FROM a > UNION > SELECT * FROM b; > 3. And a simple query with KNN index and a coordinate "mypos" : > SELECT * FROM myview > ORDER BY ST_Geomfromtext(mypos) <-> myview.geom I think this would work out-of-the-box in 9.1 or later, if you made the view use UNION ALL instead of UNION. regards, tom lane
Hi Tom You are right: UNION ALL is correct in terms of contents (tables contents are disjunct) and of performance (no separate sort required theoretically). In my specific case even with UNION ALL the planner still chose a "Seq Scan". Note that there is a KNN index with "ORDER BY ... <-> ..." involved. I have to dig into my tests in order to give you the EXPLAIN ANALYZE. Yours, Stefan 2013/5/26 Tom Lane <tgl@sss.pgh.pa.us>: > Stefan Keller <sfkeller@gmail.com> writes: >> Given following schema: > >> 1. TABLE a and TABLE b, each with INDEX on attribute geom. > >> 2. A VIEW with union: > >> CREATE VIEW myview AS >> SELECT * FROM a >> UNION >> SELECT * FROM b; > >> 3. And a simple query with KNN index and a coordinate "mypos" : > >> SELECT * FROM myview >> ORDER BY ST_Geomfromtext(mypos) <-> myview.geom > > I think this would work out-of-the-box in 9.1 or later, if you > made the view use UNION ALL instead of UNION. > > regards, tom lane