Re: [GENERAL] Dealing with ordered hierarchies

Поиск
Список
Период
Сортировка
От Alban Hertroys
Тема Re: [GENERAL] Dealing with ordered hierarchies
Дата
Msg-id ABB26569-B4C9-48EE-AB39-AEC2D78F9C0A@gmail.com
обсуждение исходный текст
Ответ на [GENERAL] Dealing with ordered hierarchies  (Tim Uckun <timuckun@gmail.com>)
Ответы Re: [GENERAL] Dealing with ordered hierarchies
Список pgsql-general
> On 24 Jul 2017, at 9:02, Tim Uckun <timuckun@gmail.com> wrote:
>
> I have read many articles about dealing with hierarchies in postgres including nested sets, ltree, materialized
paths,using arrays as parentage,  CTEs etc but nobody talks about the following scenario. 
>
> Say I have a hierarchy like this
>
> 1
> 1.1
> 1.1.1
> 1.1.2
> 1.2
> 1.3
> 2
> 2.1
>
> In this hierarchy the order is very important and I want to run frequent(ish) re-ordering of both subsets and entire
treesand even more frequent inserts. 

Since they're hierarchies, the order is already in the structure of the data. Do you really need to add it to the data
orwould it suffice to add it to the query result? 

If that's the case, you only need a simple ordering number per branch, like 1, 2, 3, etc. The full path (ie. '1.1.3')
getsgenerated in the query. 

I regularly generate structures like your above example using recursive CTE's. The "path" helps to get the results in
thecorrect order for starters (although you're in for a surprise if any of your levels go past 9 in the above). It's
greathow you can "trickle" all kinds of calculations through the hierarchy using CTE's. 

Something like this should help to get you started (untested, I usually do this in Oracle, which has several
peculiarities):

with recursive hierarchy (parent, node, sequence_number, path) as (
    select null, node, sequence_number, sequence_number::text from table
    union all
    select h.node, t.node, t.sequence_number, h.path || '.' || t.sequence_number::text
      from table t
      join hierarchy h on (t.parent = h.node)
)
select node, path
  from hierarchy

Where the table "table" has fields:
    parent    -- parent node
    node    -- actual node
    sequence_number    -- Order of sequence of this node within its parent branch

You may need to add a surrogate key if your parent/child combinations are otherwise not unique. That would then also be
theway to address a node directly (otherwise it would be (parent, node)). 

For the sequence_number I'd probably just use an actual sequence generator with a large enough gap to prevent problems
withreordering items later on (increment by 10 for example). You will also want to pad the sequence numbers in the
"path"column with leading zeroes (otherwise 10 sorts between 1 and 2, etc.), enough that you won't run out of numbers
perlevel. 

If you require your sequence numbers to be subsequent in the result: You can add a field with such numbering based on
theexisting sequence_numbers, by using a windowing function in each branch of the union - it's down to a fairly basic
rownumbering problem at this point. 

> Scenario 1: I want to insert a child into the 1.1 subtree.  The next item should be 1.1.3 and I can't figure out any
otherway to do this other than to subquery the children and to figure out the max child ID, add one to it which is a
racecondition waiting to happen. 

You would first need to determine which node is the parent node by traversing the hierarchy up to the point of
insertionand use the (parent, node) or surrogate key fields to append under. Similar to using '1.1', really. 

> Scenario 2: I now decide the recently inserted item is the second most important so I reset the ID to 1.1.2 and then
increment1.1.2 (and possibly everything below).  Again this is both prone to race conditions and involves a heavy
update.

No need to bother with that (much) with the above approach. And if you do run out of gaps, you can fairly simply update
allthe sequence numbers under the same parent without causing concurrency issues and without requiring
locks/synchronisation.

> Is there a better way to deal with this or is the complexity unavoidable?

I think it's better, but I don't think its ideal. It's fairly complicated to understand, for one thing, which can cause
problemsfor maintenance (I have colleagues who don't dare to touch my queries, for example). 

> I should state that like most database reads will be much more frequent than writes and inserts will be more frequent
thanupdates (re-ordering) 

More of the logic (and thus system load) gets moved to the read-side of things, that's probably a drawback, but most of
itis just keeping state and counting. I don't expect that to be all that much. 

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



В списке pgsql-general по дате отправления:

Предыдущее
От: Dmitry Lazurkin
Дата:
Сообщение: Re: [GENERAL] Perfomance of IN-clause with many elements and possiblesolutions
Следующее
От: PAWAN SHARMA
Дата:
Сообщение: [GENERAL] monitoring PostgreSQL