Обсуждение: recursive WITH nested union ALL with NOCYCLE logic

Поиск
Список
Период
Сортировка

recursive WITH nested union ALL with NOCYCLE logic

От
Michael Moore
Дата:
I have two tables, 1 is a hierarchical table and the other a map to alternative hierarchies. Given a starting node, I need to be able to return the hierarchy and all related hierarchies. 

In the mike_hier table note that:
aaa is the top of my target hierarchy with a children; bbb and ccc
However, from mike_map (see below) , we see that there are two alternative hierarchies for aaa, namely ddd, and eee .
 

CREATE TABLE mike_hier
(
  key character(3) NOT NULL,
  val character varying(5),
  parent character(3),
  CONSTRAINT key PRIMARY KEY (key)
);
INSERT INTO mike_hier( key, val, parent) VALUES  
('aaa','tom',''),
('bbb','cat','aaa'),
('ccc','tad','aaa'),
('ddd','cow',''),
('eee','rat','ddd'),
('fff','fan','ddd'),
('ggg','ram','eee'),
('hhh','sam',''),
('iii','ted','hhh'),
('jjj','abe','hhh'),
('kkk','red',''),
('lll','blu','kkk'),
('mmm','yl','kkk');

CREATE TABLE mike_map
(
  key character(3) NOT NULL,
  child character(3),
  parent character(3),
  CONSTRAINT key_const PRIMARY KEY (key)
);
insert into mike_map (key,child,parent) values
('111','ddd','aaa'),
('222','eee','aaa'),
('333','hhh','kkk');

I got pretty much what I want with :
with
    recursive inn_t(keyv, val, parent) as (
       select * from (
          select key as keyv, val, parent 
             from mike_hier hi where hi.key ='aaa'
          union all
            -- get all alt hierarchies
            select child ,null ,null from mike_map ma  where ma.parent ='aaa' ) gg
    union all
    (
       with xxx as ( select * from inn_t i ) -- only a single reference allowed to inn_t
         select * from 
          (
          select mh.key     , mh.val        , mh.parent 
           from mike_hier mh
           where  mh.parent in (select keyv from xxx) -- normally would join inn_t
          union all
           select child ,null ,null 
           from mike_map ma  
           where ma.parent in (select keyv from xxx)  -- normally would join inn_t
          ) unionall
      )
     ) 
   select distinct * from inn_t where val is not null;

So far so good, but what if I introduce a data loop?
insert into mike_map (key,child,parent) values
('555','aaa','aaa');

WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (        SELECT g.id, g.link, g.data, 1,          ARRAY[g.id],          false        FROM graph g      UNION ALL        SELECT g.id, g.link, g.data, sg.depth + 1,          path || g.id,          g.id = ANY(path)        FROM graph g, search_graph sg        WHERE g.id = sg.link AND NOT cycle
)
SELECT * FROM search_graph;
I can't figure out how to work this ARRAY concept into my existing query since I am not actually JOINing in the recursive part of my query. 


Re: recursive WITH nested union ALL with NOCYCLE logic

От
"David G. Johnston"
Дата:
On Fri, Mar 18, 2016 at 2:06 PM, Michael Moore <michaeljmoore@gmail.com> wrote:
I have two tables, 1 is a hierarchical table and the other a map to alternative hierarchies. Given a starting node, I need to be able to return the hierarchy and all related hierarchies. 

with
    recursive inn_t(keyv, val, parent) as (
       select * from (
          select key as keyv, val, parent 
             from mike_hier hi where hi.key ='aaa'
          union all
            -- get all alt hierarchies
            select child ,null ,null from mike_map ma  where ma.parent ='aaa' ) gg
    union all
    (
       with xxx as ( select * from inn_t i ) -- only a single reference allowed to inn_t
         select * from 
          (
          select mh.key     , mh.val        , mh.parent 
           from mike_hier mh
           where  mh.parent in (select keyv from xxx) -- normally would join inn_t
          union all
           select child ,null ,null 
           from mike_map ma  
           where ma.parent in (select keyv from xxx)  -- normally would join inn_t
          ) unionall
      )
     ) 
   select distinct * from inn_t where val is not null;


Where should I send the bill for the pain relievers :)

​with recursive --applies to the second CTE really but placed at the top by convention (maybe by rule)
inn_t(keyv, val, parent) as (  --not recursive, no reference to inn_t in this CTE
-- Given a base tree lets return all rows where it is the primary...
       select * from (
          select key as keyv, val, parent 
             from mike_hier hi where hi.key ='aaa'
-- ...as well as the primary rows for any of its alises (derived though they may be it should work)
          union all
            -- get all alt hierarchies
            select child ,null ,null from mike_map ma  where ma.parent ='aaa' ) gg
), recurse_here_instead AS (
-- Now for each of the those primary rows locate in mike_heir locate the direct descendants
-- and add them to the working set.  On the next pass the original parents will be skipped
-- because they were already processed but all of these newly added children will be
-- put through the wringer to find their children.
         select * from inn_t i   --initial condition is a complex query so simplify the recursive portion by referecing a CTE
         UNION ALL
          select mh.key     , mh.val        , mh.parent 
           from mike_hier mh
           join inn_t ON (mh.parent = inn_t.keyv)
)
-- got rid of distinct...honestly not positive why but I suspect if you write the query correct DISTINCT on the outer layer should
-- be redundant.
   select * from recurse_here_instead where val is not null;

​I haven't yet coded a variation of this query that used the path array and cycle-avoidance logic so I'm leaving that open for the moment.  Now that this is written more correctly incorporating that from other's examples should be easier.

David J.

Re: recursive WITH nested union ALL with NOCYCLE logic

От
Michael Moore
Дата:

On Fri, Mar 18, 2016 at 2:32 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Fri, Mar 18, 2016 at 2:06 PM, Michael Moore <michaeljmoore@gmail.com> wrote:
I have two tables, 1 is a hierarchical table and the other a map to alternative hierarchies. Given a starting node, I need to be able to return the hierarchy and all related hierarchies. 

with
    recursive inn_t(keyv, val, parent) as (
       select * from (
          select key as keyv, val, parent 
             from mike_hier hi where hi.key ='aaa'
          union all
            -- get all alt hierarchies
            select child ,null ,null from mike_map ma  where ma.parent ='aaa' ) gg
    union all
    (
       with xxx as ( select * from inn_t i ) -- only a single reference allowed to inn_t
         select * from 
          (
          select mh.key     , mh.val        , mh.parent 
           from mike_hier mh
           where  mh.parent in (select keyv from xxx) -- normally would join inn_t
          union all
           select child ,null ,null 
           from mike_map ma  
           where ma.parent in (select keyv from xxx)  -- normally would join inn_t
          ) unionall
      )
     ) 
   select distinct * from inn_t where val is not null;


Where should I send the bill for the pain relievers :)

​with recursive --applies to the second CTE really but placed at the top by convention (maybe by rule)
inn_t(keyv, val, parent) as (  --not recursive, no reference to inn_t in this CTE
-- Given a base tree lets return all rows where it is the primary...
       select * from (
          select key as keyv, val, parent 
             from mike_hier hi where hi.key ='aaa'
-- ...as well as the primary rows for any of its alises (derived though they may be it should work)
          union all
            -- get all alt hierarchies
            select child ,null ,null from mike_map ma  where ma.parent ='aaa' ) gg
), recurse_here_instead AS (
-- Now for each of the those primary rows locate in mike_heir locate the direct descendants
-- and add them to the working set.  On the next pass the original parents will be skipped
-- because they were already processed but all of these newly added children will be
-- put through the wringer to find their children.
         select * from inn_t i   --initial condition is a complex query so simplify the recursive portion by referecing a CTE
         UNION ALL
          select mh.key     , mh.val        , mh.parent 
           from mike_hier mh
           join inn_t ON (mh.parent = inn_t.keyv)
)
-- got rid of distinct...honestly not positive why but I suspect if you write the query correct DISTINCT on the outer layer should
-- be redundant.
   select * from recurse_here_instead where val is not null;

​I haven't yet coded a variation of this query that used the path array and cycle-avoidance logic so I'm leaving that open for the moment.  Now that this is written more correctly incorporating that from other's examples should be easier.

David J.

David,
If I am understanding you correctly, you are assuming that alternative hierarchies are mapped to only ROOT level hierarchies. It's a reasonable assumption on your part given my illustration only covered this use case. But lets add another mapping.

insert into mike_map (key,child,parent) values
('555','kkk','bbb');
This creates an alternative hierarchy that branches from a CHILD (bbb) of the ROOT (aaa) hierarchy.
So I think I still need the UNION ALL inside the recursive part. 



Re: recursive WITH nested union ALL with NOCYCLE logic

От
"David G. Johnston"
Дата:
On Fri, Mar 18, 2016 at 4:11 PM, Michael Moore <michaeljmoore@gmail.com> wrote:
David,
If I am understanding you correctly, you are assuming that alternative hierarchies are mapped to only ROOT level hierarchies. It's a reasonable assumption on your part given my illustration only covered this use case. But lets add another mapping.

insert into mike_map (key,child,parent) values
('555','kkk','bbb');
This creates an alternative hierarchy that branches from a CHILD (bbb) of the ROOT (aaa) hierarchy.
So I think I still need the UNION ALL inside the recursive part. 


​You are correct about my assumed premise - and I'm not sure I want to deal with the brain damage that would result from exploring the model you are working with...

​If you can change how you model "mike_map" this should be workable.

The following provides the desired result for the original data as well as, I think, the correct result when adding you kkk/bbb alias.

I think this gets you closer..but probably not all of the way to your goal.

My specific concern is that none of these queries (including your original) have considered "ddd,cow,null" as a valid result...because it lacks a parent.

I'll leave to you to copy-paste and make it readable...

WITH recursive
mike_map_alt (parent, children) AS (
--using an array here is probably not mandatory but makes the logic below simpler by use of unnest()
VALUES ('aaa',ARRAY['ddd','eee']::text[]), ('bbb',ARRAY['kkk']::text[])
), mike_hier_aliased AS (
--here we "canonical-ize" the data so that for each row we know which direct aliases are in play
SELECT mh.key::text AS key, mh.val, mh.parent::text AS parent, 
mh.key::text || COALESCE(mp.children, ARRAY[]::text[]) AS key_aliases,
mh.parent::text || COALESCE(mk.children, ARRAY[]::text[]) AS parent_aliases
FROM mike_hier mh
LEFT JOIN mike_map_alt mp ON (mh.key = mp.parent)
LEFT JOIN mike_map_alt mk ON (mh.parent = mk.parent)
)
--SELECT * FROM mike_hier_aliased  --used for debugging the transformed data

, recursive_part (my_key, my_val, my_parent) AS (

-- unnest the for the original row output the value; we start at the root so parent should be null for all records
-- for aliases we don't know what their root entry is yet so simply record null for value
-- and let the first iteration pick up the real record

-- this could be altered to left join the mike_heir_aliased table records containing null parents
-- and coalesce the joined "val" column.  This would cause "cow" from the "ddd' record to be added to the
-- initial "ddd" unnested alias record.
SELECT me_key, CASE WHEN key = me_key THEN val ELSE null::text END AS my_val, null::text AS my_parent FROM (
SELECT *, unnest(key_aliases) AS me_key FROM mike_hier_aliased WHERE mike_hier_aliased.key = 'aaa'
) src

-- on each iteration we again have to look for aliases and create
-- dummy records for them (via unnest) for the next iteration
UNION  --let the recursive part take care of duplicates
SELECT me_key, 
CASE WHEN key = me_key THEN val ELSE null::text END AS my_val, 
CASE WHEN key = me_key THEN parent ELSE null::text END AS my_parent 
FROM (
SELECT mha.key, mha.val, mha.parent, unnest(mha.key_aliases) AS me_key
FROM mike_hier_aliased mha
JOIN recursive_part rp
ON (rp.my_key = mha.parent)  --fails to match mha.(ddd,cow,null) though rp.(ddd,null,null) is present
) rec_src
)

SELECT * FROM recursive_part WHERE my_val IS NOT NULL
​--our dummy records all have null for my_val so lets remove them
David J.

Re: recursive WITH nested union ALL with NOCYCLE logic

От
Michael Moore
Дата:


On Fri, Mar 18, 2016 at 6:03 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Fri, Mar 18, 2016 at 4:11 PM, Michael Moore <michaeljmoore@gmail.com> wrote:
David,
If I am understanding you correctly, you are assuming that alternative hierarchies are mapped to only ROOT level hierarchies. It's a reasonable assumption on your part given my illustration only covered this use case. But lets add another mapping.

insert into mike_map (key,child,parent) values
('555','kkk','bbb');
This creates an alternative hierarchy that branches from a CHILD (bbb) of the ROOT (aaa) hierarchy.
So I think I still need the UNION ALL inside the recursive part. 


​You are correct about my assumed premise - and I'm not sure I want to deal with the brain damage that would result from exploring the model you are working with...

​If you can change how you model "mike_map" this should be workable.

The following provides the desired result for the original data as well as, I think, the correct result when adding you kkk/bbb alias.

I think this gets you closer..but probably not all of the way to your goal.

My specific concern is that none of these queries (including your original) have considered "ddd,cow,null" as a valid result...because it lacks a parent.

I'll leave to you to copy-paste and make it readable...

WITH recursive
mike_map_alt (parent, children) AS (
--using an array here is probably not mandatory but makes the logic below simpler by use of unnest()
VALUES ('aaa',ARRAY['ddd','eee']::text[]), ('bbb',ARRAY['kkk']::text[])
), mike_hier_aliased AS (
--here we "canonical-ize" the data so that for each row we know which direct aliases are in play
SELECT mh.key::text AS key, mh.val, mh.parent::text AS parent, 
mh.key::text || COALESCE(mp.children, ARRAY[]::text[]) AS key_aliases,
mh.parent::text || COALESCE(mk.children, ARRAY[]::text[]) AS parent_aliases
FROM mike_hier mh
LEFT JOIN mike_map_alt mp ON (mh.key = mp.parent)
LEFT JOIN mike_map_alt mk ON (mh.parent = mk.parent)
)
--SELECT * FROM mike_hier_aliased  --used for debugging the transformed data

, recursive_part (my_key, my_val, my_parent) AS (

-- unnest the for the original row output the value; we start at the root so parent should be null for all records
-- for aliases we don't know what their root entry is yet so simply record null for value
-- and let the first iteration pick up the real record

-- this could be altered to left join the mike_heir_aliased table records containing null parents
-- and coalesce the joined "val" column.  This would cause "cow" from the "ddd' record to be added to the
-- initial "ddd" unnested alias record.
SELECT me_key, CASE WHEN key = me_key THEN val ELSE null::text END AS my_val, null::text AS my_parent FROM (
SELECT *, unnest(key_aliases) AS me_key FROM mike_hier_aliased WHERE mike_hier_aliased.key = 'aaa'
) src

-- on each iteration we again have to look for aliases and create
-- dummy records for them (via unnest) for the next iteration
UNION  --let the recursive part take care of duplicates
SELECT me_key, 
CASE WHEN key = me_key THEN val ELSE null::text END AS my_val, 
CASE WHEN key = me_key THEN parent ELSE null::text END AS my_parent 
FROM (
SELECT mha.key, mha.val, mha.parent, unnest(mha.key_aliases) AS me_key
FROM mike_hier_aliased mha
JOIN recursive_part rp
ON (rp.my_key = mha.parent)  --fails to match mha.(ddd,cow,null) though rp.(ddd,null,null) is present
) rec_src
)

SELECT * FROM recursive_part WHERE my_val IS NOT NULL
​--our dummy records all have null for my_val so lets remove them
David J.

Here is the formatted version of the query you wrote:
with
   recursive 
   mike_map_alt (parent, children) AS ( 
               VALUES  
                 ('aaa', ARRAY['ddd','eee']::text[]),
                 ('bbb', ARRAY['kkk']::text[])
                 ), 
   mike_hier_aliased AS ( 
       SELECT mh.key::text AS key, 
              mh.val, 
              mh.parent::text AS parent, 
              mh.key::text    || COALESCE(mp.children, ARRAY[]::text[]) AS key_aliases,
              mh.parent::text || COALESCE(mk.children, ARRAY[]::text[]) AS parent_aliases
       FROM mike_hier mh
       LEFT JOIN mike_map_alt mp ON (mh.key = mp.parent)
       LEFT JOIN mike_map_alt mk ON (mh.parent = mk.parent)
)
SELECT * FROM mike_hier_aliased;

and you said:
"My specific concern is that none of these queries (including your original) have considered "ddd,cow,null" as a valid result...because it lacks a parent."
I think this concern is the result of me not explaining the use case for "alternative hierarchies" sufficiently (or at all). The word "alternative" is misleading. "extension" would have been more appropriate. In other words, we want to extend the target hierarchy (the hierarchy which is the subject of the query)  with the children of the alternative hierarchy. In my example, we could say; hierarchy aaa is to be extended with the children of hierarchies ddd and eee. Hence "ddd,cow,null" in not interesting in the context of a search on aaa. 

Another thing I didn't mention, because I did not anticipate it's importance, is the number of rows on the REAL tables I and dealing with. The hierarchy table has 200,000 rows and the mapping table 4,000 rows. 
The average query will return about 20 rows. So, for this reason it's not going to be practical to "pre-join" all of the rows; not that you were advocating this, just thought I should get it out there. 

I've learned a lot from looking at what you have provided so far. 
In theory there should be no recursive loops in the data. I just thought I'd put the NO-CYCLE logic in just to be safe. The point is, I can go with my existing solution so this isn't urgent, it's more of a puzzle; a "is it possible to do this" sort of thing. Thanks again for your time and help
Mike