Обсуждение: Documentation example for RECURISVE WITH isn't what I would expect

Поиск
Список
Период
Сортировка

Documentation example for RECURISVE WITH isn't what I would expect

От
Alex Sherwin
Дата:
The 9.1 docs for RECURSIVE WITH: http://www.postgresql.org/docs/9.1/static/queries-with.html eventually builds up to this example query:

WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (       SELECT g.id, g.link, g.data, 1,         ARRAY[g.id],         false       FROM graph g     UNION ALL       SELECT g.id, g.link, g.data, sg.depth + 1,         path || g.id,         g.id = ANY(path)       FROM graph g, search_graph sg       WHERE g.id = sg.link AND NOT cycle
)
SELECT * FROM search_graph;

From reading the docs and this query, it would be my understanding that the recursive portion (after UNION ALL) can refer to itself, which is what it appears this query does with the definition of cycle as "g.id = ANY(path)" then later referring to it in the WHERE clause "ANT NOT cycle", meaning that I would expect this to exclude duplicate traversals.

But, in my test this is not the case, I have to modify the query (complete working example below) as follows to make it work as the docs (I believe) intended:

drop table graph;
create table graph (
id varchar not null primary key,
link varchar null,
data varchar null);

insert into graph (id, link, data) values ('a', null, 'a');
insert into graph (id, link, data) values ('ab', 'a', 'ab');
insert into graph (id, link, data) values ('ac', 'a', 'ac');
insert into graph (id, link, data) values ('aca', 'ac', 'aca');
insert into graph (id, link, data) values ('acb', 'ac', 'acb');
insert into graph (id, link, data) values ('acba', 'acb', 'acba');
insert into graph (id, link, data) values ('acbb', 'acb', 'acbb');
insert into graph (id, link, data) values ('acc', 'ac', 'acc');
insert into graph (id, link, data) values ('ad', 'a', 'ad');

with recursive search_graph(id, link, data, depth, path, cycle) as 
(
  select 
    g.id, g.link, g.data, 1, array[g.id], false 
  from graph g
  union all
  select 
    g.id, g.link, g.data, sg.depth + 1, path || g.id, g.id=any(path || g.id)
  from graph g, search_graph sg
  where g.id=sg.link and not cycle and not g.id=any(path || g.id)
)
select * from search_graph


Without the change "g.id=any(path || g.id)", cycle is never set to true.  If this were the only change, the first level ('a') would still be re-visited, but stop any further duplicate visits from there.  In order to prevent 'a' from being re-visited altogether, the WHERE must be modified as well with "and not g.id=any(path || g.id)", which essentially negates the need for cycle to begin with and can be re-factored as:

with recursive search_graph(id, link, data, depth, path) as 
(
  select 
    g.id, g.link, g.data, 1, array[g.id]
  from graph g
  union all
  select 
    g.id, g.link, g.data, sg.depth + 1, path || g.id
  from graph g, search_graph sg
  where g.id=sg.link and not g.id=any(path || g.id)
)
select * from search_graph


Unless I'm missing something here.. but I don't think the example in the docs works correctly.

To apply the docs example to my data set, would look like:

with recursive search_graph(id, link, data, depth, path, cycle) as 
(
  select 
    g.id, g.link, g.data, 1, array[g.id], false 
  from graph g
  union all
  select 
    g.id, g.link, g.data, sg.depth + 1, path || g.id, g.id=any(path)
  from graph g, search_graph sg
  where g.id=sg.link and not cycle
)
select * from search_graph


Which re-visits many rows





--
Alexander Sherwin

Re: Documentation example for RECURISVE WITH isn't what I would expect

От
Tom Lane
Дата:
Alex Sherwin <alex.sherwin@gmail.com> writes:
> The 9.1 docs for RECURSIVE WITH:
> http://www.postgresql.org/docs/9.1/static/queries-with.html eventually
> builds up to this example query:
> [ which is claimed not to work ]

No, the example works as intended for me.  The reason "cycle" never gets
set to true in your test case is that your test data contains no loops.
Try something like this instead:

create table graph (id int, link int, data text);
insert into graph values (1, null, 'root');
insert into graph values (2, 1, 'top 1');
insert into graph values (3, 1, 'top 2');
insert into graph values (4, 2, 'child 1');
insert into graph values (5, 3, 'child 2');
insert into graph values (10, 11, 'loop 1');
insert into graph values (11, 10, 'loop 2');

The query as shown in the docs produces:

 id | link |  data   | depth |    path    | cycle
----+------+---------+-------+------------+-------
  1 |      | root    |     1 | {1}        | f
  2 |    1 | top 1   |     1 | {2}        | f
  3 |    1 | top 2   |     1 | {3}        | f
  4 |    2 | child 1 |     1 | {4}        | f
  5 |    3 | child 2 |     1 | {5}        | f
 10 |   11 | loop 1  |     1 | {10}       | f
 11 |   10 | loop 2  |     1 | {11}       | f
  1 |      | root    |     2 | {3,1}      | f
  1 |      | root    |     2 | {2,1}      | f
  2 |    1 | top 1   |     2 | {4,2}      | f
  3 |    1 | top 2   |     2 | {5,3}      | f
 10 |   11 | loop 1  |     2 | {11,10}    | f
 11 |   10 | loop 2  |     2 | {10,11}    | f
  1 |      | root    |     3 | {4,2,1}    | f
  1 |      | root    |     3 | {5,3,1}    | f
 10 |   11 | loop 1  |     3 | {10,11,10} | t
 11 |   10 | loop 2  |     3 | {11,10,11} | t
(17 rows)

Your proposed query produces:

 id | link |  data   | depth | path | cycle
----+------+---------+-------+------+-------
  1 |      | root    |     1 | {1}  | f
  2 |    1 | top 1   |     1 | {2}  | f
  3 |    1 | top 2   |     1 | {3}  | f
  4 |    2 | child 1 |     1 | {4}  | f
  5 |    3 | child 2 |     1 | {5}  | f
 10 |   11 | loop 1  |     1 | {10} | f
 11 |   10 | loop 2  |     1 | {11} | f
(7 rows)

which seems pretty broken to me, since it's not showing the paths at
all, much less providing any hint of where the loops are.

You could argue about the usefulness of the specific output from the
sample query --- in particular, it might make more sense to treat the
links as going in the other direction --- but the point here is to
discover all the paths that are traceable through the given set of
links.

            regards, tom lane