Обсуждение: BUG #14136: select distinct from a materialized view does not preserve result order
BUG #14136: select distinct from a materialized view does not preserve result order
От
seandknutson@gmail.com
Дата:
The following bug has been logged on the website: Bug reference: 14136 Logged by: Sean Email address: seandknutson@gmail.com PostgreSQL version: 9.5.0 Operating system: Arch Linux Description: It seems that ordering is lost when doing a "select distinct" from a materialized view. Say I have a table called "test" that contains id | name ----+------- 1 | steve 2 | adam 3 | jim 4 | steve 5 | adam 6 | sean and a materialized view defined as create materialized view view_test as (select * from test order by name); If I run select distinct name from view_test; I get name ------- steve adam sean jim instead of name ------- adam jim sean steve like I'd expect. If I have a non-materialized view with the same definition, the query returns name ------- adam jim sean steve as expected. For simple cases, the workaround is relatively simple: just reapply the same "order by" in the "select distinct.." query. E.g. select distinct name from view_test order by name; However, if the ordering defined in the mat view is complex, it may be difficult to replicate, or even impossible (say if the order is based on a column from a table that isn't included in the "select" part of the view definition).
seandknutson@gmail.com writes: > It seems that ordering is lost when doing a "select distinct" from a > materialized view. SELECT DISTINCT doesn't promise to preserve order in any context, matview or otherwise. If you want a particular output ordering you need to say ORDER BY explicitly in the query. Otherwise the planner is free to do the DISTINCT via hashing, as it evidently did here. (Actually, it's free to do it by hashing anyhow; but with ORDER BY it'd then have to re-sort.) > and a materialized view defined as > create materialized view view_test as (select * from test order by name); AFAIK, a matview is just a table, and even if you define it like that there's no guarantee that the rows in the matview will be physically ordered by name. Certainly, if you haven't created an index on the matview, the planner will assume that it must either sort-and-unique or hash in order to do the DISTINCT correctly ... and unless the table is too large for the hashtable to fit in memory, it's likely to think the hash approach is preferable. This seems like a documentation shortcoming, though, as the documentation doesn't really say that a matview won't preserve row order (or at least I see nothing about it in the CREATE MATERIALIZED VIEW ref page --- maybe it's mentioned somewhere else?) regards, tom lane
Re: BUG #14136: select distinct from a materialized view does not preserve result order
От
Kevin Grittner
Дата:
On Fri, May 13, 2016 at 5:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > seandknutson@gmail.com writes: >> and a materialized view defined as >> create materialized view view_test as (select * from test order by name); > > AFAIK, a matview is just a table, and even if you define it like that > there's no guarantee that the rows in the matview will be physically > ordered by name. > This seems like a documentation shortcoming, though, as the documentation > doesn't really say that a matview won't preserve row order (or at least > I see nothing about it in the CREATE MATERIALIZED VIEW ref page --- maybe > it's mentioned somewhere else?) I agree that it should be documented. I'll do something about that. Note that even if the table is initially created in order, there is no guarantee that REFRESH CONCURRENTLY will leave it in order. (In that regard it's more-or-less like CLUSTER.) And even if you query it while it is in order, there is no guarantee that this is the order in which the rows would be returned, as there could be a synchronous scan or an index-only scan on an index in some other order. >> If I have a non-materialized view with the same definition, >> the query returns >> >> name >> ------- >> adam >> jim >> sean >> steve >> >> as expected. It is bad technique to assume an order from a query that is not explicitly specified in that query, regardless of whether the data is being drawn from a table, view, or materialized view. A DISTINCT or GROUP BY clause should not be assumed to order the output -- it might sometimes do so, if that is the fastest way to group things; but there is no requirement that it do so. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: BUG #14136: select distinct from a materialized view does not preserve result order
От
Sean Knutson
Дата:
> It is bad technique to assume an order from a query that is not > explicitly specified in that query, regardless of whether the data > is being drawn from a table, view, or materialized view. A > DISTINCT or GROUP BY clause should not be assumed to order the > output -- it might sometimes do so, if that is the fastest way to > group things; but there is no requirement that it do so. Thanks for the quick replies! Yeah, that makes sense. I think at first I blindly assumed that it would preserve the order of the result set when doing a DISTINCT, although it makes sense why it wouldn't (or at least that you can't count on it). I think the bigger point of confusion for me (and the reason I felt it worth reporting as a bug) was why the DISTINCT would behave differently when selecting from a mat view or table vs from a view or subquery, and it sounds like that's simply a result of which algorithm the query planner chooses to evaluate the DISTINCT in each situation? > I agree that it should be documented. I'll do something about that. Thanks, I think that will help a great deal! All the best, -Sean On Fri, May 13, 2016 at 7:33 PM, Kevin Grittner <kgrittn@gmail.com> wrote: > On Fri, May 13, 2016 at 5:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > seandknutson@gmail.com writes: > > >> and a materialized view defined as > >> create materialized view view_test as (select * from test order by > name); > > > > AFAIK, a matview is just a table, and even if you define it like that > > there's no guarantee that the rows in the matview will be physically > > ordered by name. > > > This seems like a documentation shortcoming, though, as the documentation > > doesn't really say that a matview won't preserve row order (or at least > > I see nothing about it in the CREATE MATERIALIZED VIEW ref page --- maybe > > it's mentioned somewhere else?) > > I agree that it should be documented. I'll do something about that. > > Note that even if the table is initially created in order, there is > no guarantee that REFRESH CONCURRENTLY will leave it in order. (In > that regard it's more-or-less like CLUSTER.) And even if you query > it while it is in order, there is no guarantee that this is the > order in which the rows would be returned, as there could be a > synchronous scan or an index-only scan on an index in some other > order. > > >> If I have a non-materialized view with the same definition, > >> the query returns > >> > >> name > >> ------- > >> adam > >> jim > >> sean > >> steve > >> > >> as expected. > > It is bad technique to assume an order from a query that is not > explicitly specified in that query, regardless of whether the data > is being drawn from a table, view, or materialized view. A > DISTINCT or GROUP BY clause should not be assumed to order the > output -- it might sometimes do so, if that is the fastest way to > group things; but there is no requirement that it do so. > > -- > Kevin Grittner > EDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company >