Обсуждение: [GENERAL] Dealing with ordered hierarchies

Поиск
Список
Период
Сортировка

[GENERAL] Dealing with ordered hierarchies

От
Tim Uckun
Дата:
I have read many articles about dealing with hierarchies in postgres including nested sets, ltree, materialized paths, using arrays as parentage,  CTEs etc but nobody talks about the following scenario.

Say I have a hierarchy like this

1
1.1
1.1.1
1.1.2
1.2
1.3
2
2.1

In this hierarchy the order is very important and I want to run frequent(ish) re-ordering of both subsets and entire trees and even more frequent inserts.

Scenario 1: I want to insert a child into the 1.1 subtree.  The next item should be 1.1.3 and I can't figure out any other way to do this other than to subquery the children and to figure out the max child ID, add one to it which is a race condition waiting to happen.

Scenario 2: I now decide the recently inserted item is the second most important so I reset the ID to 1.1.2 and then increment 1.1.2 (and possibly everything below).  Again this is both prone to race conditions and involves a heavy update.

Is there a better way to deal with this or is the complexity unavoidable?

I should state that like most database reads will be much more frequent than writes and inserts will be more frequent than updates (re-ordering)

Re: [GENERAL] Dealing with ordered hierarchies

От
Achilleas Mantzios
Дата:
On 24/07/2017 10:02, Tim Uckun wrote:
> I have read many articles about dealing with hierarchies in postgres including nested sets, ltree, materialized
paths,using arrays as parentage,  CTEs etc but nobody talks about the following scenario. 
>
> Say I have a hierarchy like this
>
> 1
> 1.1
> 1.1.1
> 1.1.2
> 1.2
> 1.3
> 2
> 2.1
>
> In this hierarchy the order is very important and I want to run frequent(ish) re-ordering of both subsets and entire
treesand even more frequent inserts. 
>
> Scenario 1: I want to insert a child into the 1.1 subtree. The next item should be 1.1.3 and I can't figure out any
otherway to do this other than to subquery the children and to figure out the max  
> child ID, add one to it which is a race condition waiting to happen.
>
> Scenario 2: I now decide the recently inserted item is the second most important so I reset the ID to 1.1.2 and then
increment1.1.2 (and possibly everything below).  Again this is both prone to  
> race conditions and involves a heavy update.
>
> Is there a better way to deal with this or is the complexity unavoidable?
Maybe you could try a hybrid approach with genealogical paths, represented by arrays, and a (possible bidirectional)
linkedlist storing the siblings of the same parent. 
Basically what you'd normally want is to convert your problem into something that can be represented in such a way that
itcan run fast on postgresql. 
>
> I should state that like most database reads will be much more frequent than writes and inserts will be more frequent
thanupdates (re-ordering) 


--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt



Re: [GENERAL] Dealing with ordered hierarchies

От
Alban Hertroys
Дата:
> On 24 Jul 2017, at 9:02, Tim Uckun <timuckun@gmail.com> wrote:
>
> I have read many articles about dealing with hierarchies in postgres including nested sets, ltree, materialized
paths,using arrays as parentage,  CTEs etc but nobody talks about the following scenario. 
>
> Say I have a hierarchy like this
>
> 1
> 1.1
> 1.1.1
> 1.1.2
> 1.2
> 1.3
> 2
> 2.1
>
> In this hierarchy the order is very important and I want to run frequent(ish) re-ordering of both subsets and entire
treesand even more frequent inserts. 

Since they're hierarchies, the order is already in the structure of the data. Do you really need to add it to the data
orwould it suffice to add it to the query result? 

If that's the case, you only need a simple ordering number per branch, like 1, 2, 3, etc. The full path (ie. '1.1.3')
getsgenerated in the query. 

I regularly generate structures like your above example using recursive CTE's. The "path" helps to get the results in
thecorrect order for starters (although you're in for a surprise if any of your levels go past 9 in the above). It's
greathow you can "trickle" all kinds of calculations through the hierarchy using CTE's. 

Something like this should help to get you started (untested, I usually do this in Oracle, which has several
peculiarities):

with recursive hierarchy (parent, node, sequence_number, path) as (
    select null, node, sequence_number, sequence_number::text from table
    union all
    select h.node, t.node, t.sequence_number, h.path || '.' || t.sequence_number::text
      from table t
      join hierarchy h on (t.parent = h.node)
)
select node, path
  from hierarchy

Where the table "table" has fields:
    parent    -- parent node
    node    -- actual node
    sequence_number    -- Order of sequence of this node within its parent branch

You may need to add a surrogate key if your parent/child combinations are otherwise not unique. That would then also be
theway to address a node directly (otherwise it would be (parent, node)). 

For the sequence_number I'd probably just use an actual sequence generator with a large enough gap to prevent problems
withreordering items later on (increment by 10 for example). You will also want to pad the sequence numbers in the
"path"column with leading zeroes (otherwise 10 sorts between 1 and 2, etc.), enough that you won't run out of numbers
perlevel. 

If you require your sequence numbers to be subsequent in the result: You can add a field with such numbering based on
theexisting sequence_numbers, by using a windowing function in each branch of the union - it's down to a fairly basic
rownumbering problem at this point. 

> Scenario 1: I want to insert a child into the 1.1 subtree.  The next item should be 1.1.3 and I can't figure out any
otherway to do this other than to subquery the children and to figure out the max child ID, add one to it which is a
racecondition waiting to happen. 

You would first need to determine which node is the parent node by traversing the hierarchy up to the point of
insertionand use the (parent, node) or surrogate key fields to append under. Similar to using '1.1', really. 

> Scenario 2: I now decide the recently inserted item is the second most important so I reset the ID to 1.1.2 and then
increment1.1.2 (and possibly everything below).  Again this is both prone to race conditions and involves a heavy
update.

No need to bother with that (much) with the above approach. And if you do run out of gaps, you can fairly simply update
allthe sequence numbers under the same parent without causing concurrency issues and without requiring
locks/synchronisation.

> Is there a better way to deal with this or is the complexity unavoidable?

I think it's better, but I don't think its ideal. It's fairly complicated to understand, for one thing, which can cause
problemsfor maintenance (I have colleagues who don't dare to touch my queries, for example). 

> I should state that like most database reads will be much more frequent than writes and inserts will be more frequent
thanupdates (re-ordering) 

More of the logic (and thus system load) gets moved to the read-side of things, that's probably a drawback, but most of
itis just keeping state and counting. I don't expect that to be all that much. 

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



Re: [GENERAL] Dealing with ordered hierarchies

От
Tim Uckun
Дата:
I don't like the approach with a large increment. It would mean complicated logic to see if you filled the gap and then update all the other peers if you did. It sounds like the re-order is going to be expensive no matter what. My primary concern are race conditions though. What if two or more users are trying to update the hierarchy either by inserts or updates? I can definitely see a situation where we have issues transactions trip over each other.

On Mon, Jul 24, 2017 at 10:32 PM, Alban Hertroys <haramrae@gmail.com> wrote:

> On 24 Jul 2017, at 9:02, Tim Uckun <timuckun@gmail.com> wrote:
>
> I have read many articles about dealing with hierarchies in postgres including nested sets, ltree, materialized paths, using arrays as parentage,  CTEs etc but nobody talks about the following scenario.
>
> Say I have a hierarchy like this
>
> 1
> 1.1
> 1.1.1
> 1.1.2
> 1.2
> 1.3
> 2
> 2.1
>
> In this hierarchy the order is very important and I want to run frequent(ish) re-ordering of both subsets and entire trees and even more frequent inserts.

Since they're hierarchies, the order is already in the structure of the data. Do you really need to add it to the data or would it suffice to add it to the query result?

If that's the case, you only need a simple ordering number per branch, like 1, 2, 3, etc. The full path (ie. '1.1.3') gets generated in the query.

I regularly generate structures like your above example using recursive CTE's. The "path" helps to get the results in the correct order for starters (although you're in for a surprise if any of your levels go past 9 in the above). It's great how you can "trickle" all kinds of calculations through the hierarchy using CTE's.

Something like this should help to get you started (untested, I usually do this in Oracle, which has several peculiarities):

with recursive hierarchy (parent, node, sequence_number, path) as (
        select null, node, sequence_number, sequence_number::text from table
        union all
        select h.node, t.node, t.sequence_number, h.path || '.' || t.sequence_number::text
          from table t
          join hierarchy h on (t.parent = h.node)
)
select node, path
  from hierarchy

Where the table "table" has fields:
        parent  -- parent node
        node    -- actual node
        sequence_number -- Order of sequence of this node within its parent branch

You may need to add a surrogate key if your parent/child combinations are otherwise not unique. That would then also be the way to address a node directly (otherwise it would be (parent, node)).

For the sequence_number I'd probably just use an actual sequence generator with a large enough gap to prevent problems with reordering items later on (increment by 10 for example). You will also want to pad the sequence numbers in the "path" column with leading zeroes (otherwise 10 sorts between 1 and 2, etc.), enough that you won't run out of numbers per level.

If you require your sequence numbers to be subsequent in the result: You can add a field with such numbering based on the existing sequence_numbers, by using a windowing function in each branch of the union - it's down to a fairly basic row numbering problem at this point.

> Scenario 1: I want to insert a child into the 1.1 subtree.  The next item should be 1.1.3 and I can't figure out any other way to do this other than to subquery the children and to figure out the max child ID, add one to it which is a race condition waiting to happen.

You would first need to determine which node is the parent node by traversing the hierarchy up to the point of insertion and use the (parent, node) or surrogate key fields to append under. Similar to using '1.1', really.

> Scenario 2: I now decide the recently inserted item is the second most important so I reset the ID to 1.1.2 and then increment 1.1.2 (and possibly everything below).  Again this is both prone to race conditions and involves a heavy update.

No need to bother with that (much) with the above approach. And if you do run out of gaps, you can fairly simply update all the sequence numbers under the same parent without causing concurrency issues and without requiring locks/synchronisation.

> Is there a better way to deal with this or is the complexity unavoidable?

I think it's better, but I don't think its ideal. It's fairly complicated to understand, for one thing, which can cause problems for maintenance (I have colleagues who don't dare to touch my queries, for example).

> I should state that like most database reads will be much more frequent than writes and inserts will be more frequent than updates (re-ordering)

More of the logic (and thus system load) gets moved to the read-side of things, that's probably a drawback, but most of it is just keeping state and counting. I don't expect that to be all that much.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.


Re: [GENERAL] Dealing with ordered hierarchies

От
"Peter J. Holzer"
Дата:
On 2017-07-25 01:15:56 +1200, Tim Uckun wrote:
> I don't like the approach with a large increment. It would mean complicated
> logic to see if you filled the gap and then update all the other peers if you
> did. It sounds like the re-order is going to be expensive no matter what. My
> primary concern are race conditions though. What if two or more users are
> trying to update the hierarchy either by inserts or updates? I can definitely
> see a situation where we have issues transactions trip over each other.

You could add a unique index over (parent, sequence_number). That way
two transactions won't be able to add a node with the same sequence
number under the same parent. You will have to handle duplicate key
errors, though.

        hp

--
   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | hjp@hjp.at         | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>

Вложения