Обсуждение: BUG #14136: select distinct from a materialized view does not preserve result order

Поиск
Список
Период
Сортировка

BUG #14136: select distinct from a materialized view does not preserve result order

От
seandknutson@gmail.com
Дата:
The following bug has been logged on the website:

Bug reference:      14136
Logged by:          Sean
Email address:      seandknutson@gmail.com
PostgreSQL version: 9.5.0
Operating system:   Arch Linux
Description:

It seems that ordering is lost when doing a "select distinct" from a
materialized view.

Say I have a table called "test" that contains

 id | name
----+-------
  1 | steve
  2 | adam
  3 | jim
  4 | steve
  5 | adam
  6 | sean

and a materialized view defined as

create materialized view view_test as (select * from test order by name);

If I run

select distinct name from view_test;

I get

 name
-------
 steve
 adam
 sean
 jim

instead of

 name
-------
 adam
 jim
 sean
 steve

like I'd expect. If I have a non-materialized view with the same definition,
the query returns

 name
-------
 adam
 jim
 sean
 steve

as expected.

For simple cases, the workaround is relatively simple: just reapply the same
"order by" in the "select distinct.." query. E.g.

select distinct name from view_test order by name;

However, if the ordering defined in the mat view is complex, it may be
difficult to replicate, or even impossible (say if the order is based on a
column from a table that isn't included in the "select" part of the view
definition).
seandknutson@gmail.com writes:
> It seems that ordering is lost when doing a "select distinct" from a
> materialized view.

SELECT DISTINCT doesn't promise to preserve order in any context,
matview or otherwise.  If you want a particular output ordering
you need to say ORDER BY explicitly in the query.  Otherwise the
planner is free to do the DISTINCT via hashing, as it evidently
did here.  (Actually, it's free to do it by hashing anyhow; but
with ORDER BY it'd then have to re-sort.)

> and a materialized view defined as
> create materialized view view_test as (select * from test order by name);

AFAIK, a matview is just a table, and even if you define it like that
there's no guarantee that the rows in the matview will be physically
ordered by name.  Certainly, if you haven't created an index on the
matview, the planner will assume that it must either sort-and-unique
or hash in order to do the DISTINCT correctly ... and unless the table
is too large for the hashtable to fit in memory, it's likely to think
the hash approach is preferable.

This seems like a documentation shortcoming, though, as the documentation
doesn't really say that a matview won't preserve row order (or at least
I see nothing about it in the CREATE MATERIALIZED VIEW ref page --- maybe
it's mentioned somewhere else?)

            regards, tom lane

Re: BUG #14136: select distinct from a materialized view does not preserve result order

От
Kevin Grittner
Дата:
On Fri, May 13, 2016 at 5:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> seandknutson@gmail.com writes:

>> and a materialized view defined as
>> create materialized view view_test as (select * from test order by name);
>
> AFAIK, a matview is just a table, and even if you define it like that
> there's no guarantee that the rows in the matview will be physically
> ordered by name.

> This seems like a documentation shortcoming, though, as the documentation
> doesn't really say that a matview won't preserve row order (or at least
> I see nothing about it in the CREATE MATERIALIZED VIEW ref page --- maybe
> it's mentioned somewhere else?)

I agree that it should be documented.  I'll do something about that.

Note that even if the table is initially created in order, there is
no guarantee that REFRESH CONCURRENTLY will leave it in order.  (In
that regard it's more-or-less like CLUSTER.)  And even if you query
it while it is in order, there is no guarantee that this is the
order in which the rows would be returned, as there could be a
synchronous scan or an index-only scan on an index in some other
order.

>> If I have a non-materialized view with the same definition,
>> the query returns
>>
>>  name
>> -------
>>  adam
>>  jim
>>  sean
>>  steve
>>
>> as expected.

It is bad technique to assume an order from a query that is not
explicitly specified in that query, regardless of whether the data
is being drawn from a table, view, or materialized view.  A
DISTINCT or GROUP BY clause should not be assumed to order the
output -- it might sometimes do so, if that is the fastest way to
group things; but there is no requirement that it do so.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: BUG #14136: select distinct from a materialized view does not preserve result order

От
Sean Knutson
Дата:
> It is bad technique to assume an order from a query that is not
> explicitly specified in that query, regardless of whether the data
> is being drawn from a table, view, or materialized view.  A
> DISTINCT or GROUP BY clause should not be assumed to order the
> output -- it might sometimes do so, if that is the fastest way to
> group things; but there is no requirement that it do so.

Thanks for the quick replies! Yeah, that makes sense. I think at first I
blindly assumed that it would preserve the order of the result set when
doing a DISTINCT, although it makes sense why it wouldn't (or at least that
you can't count on it). I think the bigger point of confusion for me (and
the reason I felt it worth reporting as a bug) was why the DISTINCT would
behave differently when selecting from a mat view or table vs from a view
or subquery, and it sounds like that's simply a result of which algorithm
the query planner chooses to evaluate the DISTINCT in each situation?

> I agree that it should be documented.  I'll do something about that.

Thanks, I think that will help a great deal!

All the best,
-Sean

On Fri, May 13, 2016 at 7:33 PM, Kevin Grittner <kgrittn@gmail.com> wrote:

> On Fri, May 13, 2016 at 5:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > seandknutson@gmail.com writes:
>
> >> and a materialized view defined as
> >> create materialized view view_test as (select * from test order by
> name);
> >
> > AFAIK, a matview is just a table, and even if you define it like that
> > there's no guarantee that the rows in the matview will be physically
> > ordered by name.
>
> > This seems like a documentation shortcoming, though, as the documentation
> > doesn't really say that a matview won't preserve row order (or at least
> > I see nothing about it in the CREATE MATERIALIZED VIEW ref page --- maybe
> > it's mentioned somewhere else?)
>
> I agree that it should be documented.  I'll do something about that.
>
> Note that even if the table is initially created in order, there is
> no guarantee that REFRESH CONCURRENTLY will leave it in order.  (In
> that regard it's more-or-less like CLUSTER.)  And even if you query
> it while it is in order, there is no guarantee that this is the
> order in which the rows would be returned, as there could be a
> synchronous scan or an index-only scan on an index in some other
> order.
>
> >> If I have a non-materialized view with the same definition,
> >> the query returns
> >>
> >>  name
> >> -------
> >>  adam
> >>  jim
> >>  sean
> >>  steve
> >>
> >> as expected.
>
> It is bad technique to assume an order from a query that is not
> explicitly specified in that query, regardless of whether the data
> is being drawn from a table, view, or materialized view.  A
> DISTINCT or GROUP BY clause should not be assumed to order the
> output -- it might sometimes do so, if that is the fastest way to
> group things; but there is no requirement that it do so.
>
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>