Обсуждение: How to join table to itself N times?

Поиск
Список
Период
Сортировка

How to join table to itself N times?

От
"W. Matthew Wilson"
Дата:
I got this table right now:

select * from market_segment_dimension_values ;
+--------------------------+---------------+
| market_segment_dimension |     value     |
+--------------------------+---------------+
| geography                | north         |
| geography                | south         |
| industry type            | retail        |
| industry type            | manufacturing |
| industry type            | wholesale     |
+--------------------------+---------------+
(5 rows)

The PK is (market_segment_dimension, value).

The dimension column refers to another table called market_segment_dimensions.

So, "north" and "south" are to values for the "geography" dimension.

In that data above, there are two dimensions.  But sometimes there could be just one dimension, or maybe three, ... up to ten.

Now here's the part where I'm stumped.

I need to create a cartesian product of the dimensions.

I came up with this approach by hard-coding the different dimensions:

 with geog as (                                 
    select value
    from market_segment_dimension_values
    where market_segment_dimension = 'geography'),

    industry_type as (
    select value
    from market_segment_dimension_values
    where market_segment_dimension = 'industry type')

select geog.value as g,
industry_type.value as ind_type
from geog
cross join industry_type
;
+-------+---------------+
|   g   |   ind_type    |
+-------+---------------+
| north | retail        |
| north | manufacturing |
| north | wholesale     |
| south | retail        |
| south | manufacturing |
| south | wholesale     |
+-------+---------------+
(6 rows)

But that won't work if I add a new dimension (unless I update the query).  For example, maybe I need to add a new dimension called, say, customer size, which has values "big" and "small".  A

I've got some nasty plan B solutions, but I want to know if there's some solution.

There's a really elegant solution in python using itertools.product, like this:

>>> list(itertools.product(*[['north', 'south'], ['retail', 'manufacturing', 'wholesale']]))

[('north', 'retail'),
 ('north', 'manufacturing'),
 ('north', 'wholesale'),
 ('south', 'retail'),
 ('south', 'manufacturing'),
 ('south', 'wholesale')]

All advice is welcome.  Thanks in advance!

Matt



--
W. Matthew Wilson
matt@tplus1.com
http://tplus1.com

Re: How to join table to itself N times?

От
AI Rumman
Дата:


On Wed, Mar 20, 2013 at 7:38 PM, W. Matthew Wilson <matt@tplus1.com> wrote:
I got this table right now:

select * from market_segment_dimension_values ;
+--------------------------+---------------+
| market_segment_dimension |     value     |
+--------------------------+---------------+
| geography                | north         |
| geography                | south         |
| industry type            | retail        |
| industry type            | manufacturing |
| industry type            | wholesale     |
+--------------------------+---------------+
(5 rows)

The PK is (market_segment_dimension, value).

The dimension column refers to another table called market_segment_dimensions.

So, "north" and "south" are to values for the "geography" dimension.

In that data above, there are two dimensions.  But sometimes there could be just one dimension, or maybe three, ... up to ten.

Now here's the part where I'm stumped.

I need to create a cartesian product of the dimensions.

I came up with this approach by hard-coding the different dimensions:

 with geog as (                                 
    select value
    from market_segment_dimension_values
    where market_segment_dimension = 'geography'),

    industry_type as (
    select value
    from market_segment_dimension_values
    where market_segment_dimension = 'industry type')

select geog.value as g,
industry_type.value as ind_type
from geog
cross join industry_type
;
+-------+---------------+
|   g   |   ind_type    |
+-------+---------------+
| north | retail        |
| north | manufacturing |
| north | wholesale     |
| south | retail        |
| south | manufacturing |
| south | wholesale     |
+-------+---------------+
(6 rows)

But that won't work if I add a new dimension (unless I update the query).  For example, maybe I need to add a new dimension called, say, customer size, which has values "big" and "small".  A

I've got some nasty plan B solutions, but I want to know if there's some solution.

There's a really elegant solution in python using itertools.product, like this:

>>> list(itertools.product(*[['north', 'south'], ['retail', 'manufacturing', 'wholesale']]))

[('north', 'retail'),
 ('north', 'manufacturing'),
 ('north', 'wholesale'),
 ('south', 'retail'),
 ('south', 'manufacturing'),
 ('south', 'wholesale')]

All advice is welcome.  Thanks in advance!

Matt



--
W. Matthew Wilson
matt@tplus1.com
http://tplus1.com


You may try:

Select a.value, b.value
from market_segment_dimension_values as a,
from market_segment_dimension_values as b
where a.market_segment_dimension <> b.market_segment_dimension

-- AI

Re: How to join table to itself N times?

От
Scott Marlowe
Дата:
On Wed, Mar 20, 2013 at 5:38 PM, W. Matthew Wilson <matt@tplus1.com> wrote:
> I got this table right now:
>
> select * from market_segment_dimension_values ;
> +--------------------------+---------------+
> | market_segment_dimension |     value     |
> +--------------------------+---------------+
> | geography                | north         |
> | geography                | south         |
> | industry type            | retail        |
> | industry type            | manufacturing |
> | industry type            | wholesale     |
> +--------------------------+---------------+
> (5 rows)
>
> The PK is (market_segment_dimension, value).
>
> The dimension column refers to another table called
> market_segment_dimensions.
>
> So, "north" and "south" are to values for the "geography" dimension.
>
> In that data above, there are two dimensions.  But sometimes there could be
> just one dimension, or maybe three, ... up to ten.

If the number of dimensions is not fixed, then you'll probably have to
write a plpgsql function to first interrogate the data set for how
many dimensions there are and then to build an n-dimension query.
While joining a variable number of tables may be problematic as you
won't have a fixed number of columns, using a union might give you
what you want with a fixed number of columns.


Re: How to join table to itself N times?

От
Paul Jungwirth
Дата:
Wow, this is a fun puzzle. I'd love to be the first to solve it with
just SQL, but I don't have a solution yet. Here are some elements that
might be useful:

SELECT market_segment_dimension, array_agg(value)
FROM market_segment_dimension_values
GROUP BY market_segment_dimension;

the UNNEST function
the ROW function
window functions like row_number and nth_value
the crosstab function (requires installing an extension; this seems
like cheating if you ask me)

Good luck!
Paul


On Wed, Mar 20, 2013 at 7:14 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Wed, Mar 20, 2013 at 5:38 PM, W. Matthew Wilson <matt@tplus1.com> wrote:
>> I got this table right now:
>>
>> select * from market_segment_dimension_values ;
>> +--------------------------+---------------+
>> | market_segment_dimension |     value     |
>> +--------------------------+---------------+
>> | geography                | north         |
>> | geography                | south         |
>> | industry type            | retail        |
>> | industry type            | manufacturing |
>> | industry type            | wholesale     |
>> +--------------------------+---------------+
>> (5 rows)
>>
>> The PK is (market_segment_dimension, value).
>>
>> The dimension column refers to another table called
>> market_segment_dimensions.
>>
>> So, "north" and "south" are to values for the "geography" dimension.
>>
>> In that data above, there are two dimensions.  But sometimes there could be
>> just one dimension, or maybe three, ... up to ten.
>
> If the number of dimensions is not fixed, then you'll probably have to
> write a plpgsql function to first interrogate the data set for how
> many dimensions there are and then to build an n-dimension query.
> While joining a variable number of tables may be problematic as you
> won't have a fixed number of columns, using a union might give you
> what you want with a fixed number of columns.
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general



--
_________________________________
Pulchritudo splendor veritatis.


Re: How to join table to itself N times?

От
Paul Jungwirth
Дата:
Okay, how about this (table names shortened):

create table m (d varchar(255) not null, v varchar(255) not null);
insert into m (d, v) values ('geography', 'north'), ('geography',
'south'), ('industry type', 'retail'), ('industry type',
'manufacturing'), ('industry type', 'wholesale');

WITH RECURSIVE t(combo, n) AS (
  WITH dims AS (SELECT DISTINCT d, row_number() OVER () AS n FROM m GROUP BY d)
  SELECT '{}'::text[], 1
  UNION ALL
  SELECT array_append(t2.combo::text[], m.v::text), t2.n+1
  FROM  t t2, dims
  CROSS JOIN m
  WHERE m.d = dims.d AND dims.n = t2.n
)
SELECT *
FROM t
WHERE n = (SELECT COUNT(DISTINCT d) + 1 FROM m);

Gives these results:

         combo         | n
-----------------------+---
 {retail,north}        | 3
 {manufacturing,north} | 3
 {wholesale,north}     | 3
 {retail,south}        | 3
 {manufacturing,south} | 3
 {wholesale,south}     | 3
(6 rows)

Paul


On Wed, Mar 20, 2013 at 8:40 PM, Paul Jungwirth
<pj@illuminatedcomputing.com> wrote:
> Wow, this is a fun puzzle. I'd love to be the first to solve it with
> just SQL, but I don't have a solution yet. Here are some elements that
> might be useful:
>
> SELECT market_segment_dimension, array_agg(value)
> FROM market_segment_dimension_values
> GROUP BY market_segment_dimension;
>
> the UNNEST function
> the ROW function
> window functions like row_number and nth_value
> the crosstab function (requires installing an extension; this seems
> like cheating if you ask me)
>
> Good luck!
> Paul
>
>
> On Wed, Mar 20, 2013 at 7:14 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>> On Wed, Mar 20, 2013 at 5:38 PM, W. Matthew Wilson <matt@tplus1.com> wrote:
>>> I got this table right now:
>>>
>>> select * from market_segment_dimension_values ;
>>> +--------------------------+---------------+
>>> | market_segment_dimension |     value     |
>>> +--------------------------+---------------+
>>> | geography                | north         |
>>> | geography                | south         |
>>> | industry type            | retail        |
>>> | industry type            | manufacturing |
>>> | industry type            | wholesale     |
>>> +--------------------------+---------------+
>>> (5 rows)
>>>
>>> The PK is (market_segment_dimension, value).
>>>
>>> The dimension column refers to another table called
>>> market_segment_dimensions.
>>>
>>> So, "north" and "south" are to values for the "geography" dimension.
>>>
>>> In that data above, there are two dimensions.  But sometimes there could be
>>> just one dimension, or maybe three, ... up to ten.
>>
>> If the number of dimensions is not fixed, then you'll probably have to
>> write a plpgsql function to first interrogate the data set for how
>> many dimensions there are and then to build an n-dimension query.
>> While joining a variable number of tables may be problematic as you
>> won't have a fixed number of columns, using a union might give you
>> what you want with a fixed number of columns.
>>
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>
>
>
> --
> _________________________________
> Pulchritudo splendor veritatis.



--
_________________________________
Pulchritudo splendor veritatis.


Re: How to join table to itself N times?

От
David Johnston
Дата:
Matt Wilson wrote
> I got this table right now:
>
> select * from market_segment_dimension_values ;
> +--------------------------+---------------+
> | market_segment_dimension |     value     |
> +--------------------------+---------------+
> | geography                | north         |
> | geography                | south         |
> | industry type            | retail        |
> | industry type            | manufacturing |
> | industry type            | wholesale     |
> +--------------------------+---------------+
> (5 rows)

Most likely you can solve your problem by using the "hstore" extension.  I
could be more certain of this if you actually state the
requirements/use-case/business-problem.

SQL requires that you know the column structure of the output so if hstore
does not suffice you will have to execute a dynamic query in your API after
querying the dimension map table to decide how many output columns you will
need.  hstore avoids that by giving you a dynamic table-in-a-column.

David J.



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/How-to-join-table-to-itself-N-times-tp5749107p5749125.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: How to join table to itself N times?

От
Misa Simic
Дата:
Hi,

Not clear what is expected result - if you add new dimension...

a) three columns? - well not possible to write SQL query which returns undefined number of columns... unfortunatelly - though I am not clear why :)

b) But you can get the similar result as from python... my guess is you expect:



 ('north', 'retail', small),
 ('north', 'retail', big),
 ('north', 'manufacturing', small),
 ('north', 'manufacturing', big),
 ('north', 'wholesale', small),
 ('north', 'wholesale', big),
 ('south', 'retail', small),
 ('south', 'retail', big),
 ('south', 'manufacturing', small),
 ('south', 'manufacturing', big)
 ('south', 'wholesale', small)
 ('south', 'wholesale', big)



In your dimensions table (called: market_dimensions) you would need one more column to define desired result order

i.e.

market_segment_dimensions
market_segment_dimension ,                    ord
geography,                                                 1
industry type,                                              2
customer size,                                             3


WITH RECURSIVE t (

SELECT array_agg(value) AS values, ord + 1 AS next_dim_ord, ord AS agg_dims 
FROM market_segment_dimension_values 
INNER JOIN market_segment_dimensions USING (market_segment_dimension) 
WHERE ord = 1
UNION ALL
SELECT array_agg(value) AS values, ord + 1 AS next_dim_ord, ord AS agg_dims 
FROM t 
INNER JOIN market_segment_dimensions  ON (ord = t.next_dim_ord) 
INNER JOIN market_segment_dimension_values  USING (market_segment_dimension)
)

SELECT values FROM t WHERE t.agg_dims = (SELECT MAX(ord) FROM market_segment_dimensions) 






2013/3/21 W. Matthew Wilson <matt@tplus1.com>
I got this table right now:

select * from market_segment_dimension_values ;
+--------------------------+---------------+
| market_segment_dimension |     value     |
+--------------------------+---------------+
| geography                | north         |
| geography                | south         |
| industry type            | retail        |
| industry type            | manufacturing |
| industry type            | wholesale     |
+--------------------------+---------------+
(5 rows)

The PK is (market_segment_dimension, value).

The dimension column refers to another table called market_segment_dimensions.

So, "north" and "south" are to values for the "geography" dimension.

In that data above, there are two dimensions.  But sometimes there could be just one dimension, or maybe three, ... up to ten.

Now here's the part where I'm stumped.

I need to create a cartesian product of the dimensions.

I came up with this approach by hard-coding the different dimensions:

 with geog as (                                 
    select value
    from market_segment_dimension_values
    where market_segment_dimension = 'geography'),

    industry_type as (
    select value
    from market_segment_dimension_values
    where market_segment_dimension = 'industry type')

select geog.value as g,
industry_type.value as ind_type
from geog
cross join industry_type
;
+-------+---------------+
|   g   |   ind_type    |
+-------+---------------+
| north | retail        |
| north | manufacturing |
| north | wholesale     |
| south | retail        |
| south | manufacturing |
| south | wholesale     |
+-------+---------------+
(6 rows)

But that won't work if I add a new dimension (unless I update the query).  For example, maybe I need to add a new dimension called, say, customer size, which has values "big" and "small".  A

I've got some nasty plan B solutions, but I want to know if there's some solution.

There's a really elegant solution in python using itertools.product, like this:

>>> list(itertools.product(*[['north', 'south'], ['retail', 'manufacturing', 'wholesale']]))

[('north', 'retail'),
 ('north', 'manufacturing'),
 ('north', 'wholesale'),
 ('south', 'retail'),
 ('south', 'manufacturing'),
 ('south', 'wholesale')]

All advice is welcome.  Thanks in advance!

Matt



--
W. Matthew Wilson
matt@tplus1.com
http://tplus1.com


Re: How to join table to itself N times?

От
Misa Simic
Дата:
correction:


2013/3/22 Misa Simic <misa.simic@gmail.com>
Hi,

Not clear what is expected result - if you add new dimension...

a) three columns? - well not possible to write SQL query which returns undefined number of columns... unfortunatelly - though I am not clear why :)

b) But you can get the similar result as from python... my guess is you expect:



 ('north', 'retail', small),
 ('north', 'retail', big),
 ('north', 'manufacturing', small),
 ('north', 'manufacturing', big),
 ('north', 'wholesale', small),
 ('north', 'wholesale', big),
 ('south', 'retail', small),
 ('south', 'retail', big),
 ('south', 'manufacturing', small),
 ('south', 'manufacturing', big)
 ('south', 'wholesale', small)
 ('south', 'wholesale', big)



In your dimensions table (called: market_dimensions) you would need one more column to define desired result order

i.e.

market_segment_dimensions
market_segment_dimension ,                    ord
geography,                                                 1
industry type,                                              2
customer size,                                             3


WITH RECURSIVE t (

SELECT array_agg(value) AS values, ord + 1 AS next_dim_ord, ord AS agg_dims 
FROM market_segment_dimension_values 
INNER JOIN market_segment_dimensions USING (market_segment_dimension) 
WHERE ord = 1
UNION ALL
SELECT array_agg(value) AS values, ord + 1 AS next_dim_ord, ord AS agg_dims 
FROM t 
INNER JOIN market_segment_dimensions  ON (ord = t.next_dim_ord) 
INNER JOIN market_segment_dimension_values  USING (market_segment_dimension)
)

SELECT values FROM t WHERE t.agg_dims = (SELECT MAX(ord) FROM market_segment_dimensions) 






2013/3/21 W. Matthew Wilson <matt@tplus1.com>

I got this table right now:

select * from market_segment_dimension_values ;
+--------------------------+---------------+
| market_segment_dimension |     value     |
+--------------------------+---------------+
| geography                | north         |
| geography                | south         |
| industry type            | retail        |
| industry type            | manufacturing |
| industry type            | wholesale     |
+--------------------------+---------------+
(5 rows)

The PK is (market_segment_dimension, value).

The dimension column refers to another table called market_segment_dimensions.

So, "north" and "south" are to values for the "geography" dimension.

In that data above, there are two dimensions.  But sometimes there could be just one dimension, or maybe three, ... up to ten.

Now here's the part where I'm stumped.

I need to create a cartesian product of the dimensions.

I came up with this approach by hard-coding the different dimensions:

 with geog as (                                 
    select value
    from market_segment_dimension_values
    where market_segment_dimension = 'geography'),

    industry_type as (
    select value
    from market_segment_dimension_values
    where market_segment_dimension = 'industry type')

select geog.value as g,
industry_type.value as ind_type
from geog
cross join industry_type
;
+-------+---------------+
|   g   |   ind_type    |
+-------+---------------+
| north | retail        |
| north | manufacturing |
| north | wholesale     |
| south | retail        |
| south | manufacturing |
| south | wholesale     |
+-------+---------------+
(6 rows)

But that won't work if I add a new dimension (unless I update the query).  For example, maybe I need to add a new dimension called, say, customer size, which has values "big" and "small".  A

I've got some nasty plan B solutions, but I want to know if there's some solution.

There's a really elegant solution in python using itertools.product, like this:

>>> list(itertools.product(*[['north', 'south'], ['retail', 'manufacturing', 'wholesale']]))

[('north', 'retail'),
 ('north', 'manufacturing'),
 ('north', 'wholesale'),
 ('south', 'retail'),
 ('south', 'manufacturing'),
 ('south', 'wholesale')]

All advice is welcome.  Thanks in advance!

Matt



--
W. Matthew Wilson
matt@tplus1.com
http://tplus1.com



Re: How to join table to itself N times?

От
Misa Simic
Дата:
correction:

WITH RECURSIVE t (

SELECT array_agg('{}'::text[], value) AS values, ord + 1 AS next_dim_ord, ord AS agg_dims 
FROM market_segment_dimension_values 
INNER JOIN market_segment_dimensions USING (market_segment_dimension) 
WHERE ord = 1
UNION ALL
SELECT array_agg(t.values, value) AS values, ord + 1 AS next_dim_ord, ord AS agg_dims 
FROM t 
INNER JOIN market_segment_dimensions  ON (ord = t.next_dim_ord) 
INNER JOIN market_segment_dimension_values  USING (market_segment_dimension)
)

SELECT values FROM t WHERE t.agg_dims = (SELECT MAX(ord) FROM market_segment_dimensions) 



2013/3/22 Misa Simic <misa.simic@gmail.com>
correction:


2013/3/22 Misa Simic <misa.simic@gmail.com>
Hi,

Not clear what is expected result - if you add new dimension...

a) three columns? - well not possible to write SQL query which returns undefined number of columns... unfortunatelly - though I am not clear why :)

b) But you can get the similar result as from python... my guess is you expect:



 ('north', 'retail', small),
 ('north', 'retail', big),
 ('north', 'manufacturing', small),
 ('north', 'manufacturing', big),
 ('north', 'wholesale', small),
 ('north', 'wholesale', big),
 ('south', 'retail', small),
 ('south', 'retail', big),
 ('south', 'manufacturing', small),
 ('south', 'manufacturing', big)
 ('south', 'wholesale', small)
 ('south', 'wholesale', big)



In your dimensions table (called: market_dimensions) you would need one more column to define desired result order

i.e.

market_segment_dimensions
market_segment_dimension ,                    ord
geography,                                                 1
industry type,                                              2
customer size,                                             3


WITH RECURSIVE t (

SELECT array_agg(value) AS values, ord + 1 AS next_dim_ord, ord AS agg_dims 
FROM market_segment_dimension_values 
INNER JOIN market_segment_dimensions USING (market_segment_dimension) 
WHERE ord = 1
UNION ALL
SELECT array_agg(value) AS values, ord + 1 AS next_dim_ord, ord AS agg_dims 
FROM t 
INNER JOIN market_segment_dimensions  ON (ord = t.next_dim_ord) 
INNER JOIN market_segment_dimension_values  USING (market_segment_dimension)
)

SELECT values FROM t WHERE t.agg_dims = (SELECT MAX(ord) FROM market_segment_dimensions) 






2013/3/21 W. Matthew Wilson <matt@tplus1.com>

I got this table right now:

select * from market_segment_dimension_values ;
+--------------------------+---------------+
| market_segment_dimension |     value     |
+--------------------------+---------------+
| geography                | north         |
| geography                | south         |
| industry type            | retail        |
| industry type            | manufacturing |
| industry type            | wholesale     |
+--------------------------+---------------+
(5 rows)

The PK is (market_segment_dimension, value).

The dimension column refers to another table called market_segment_dimensions.

So, "north" and "south" are to values for the "geography" dimension.

In that data above, there are two dimensions.  But sometimes there could be just one dimension, or maybe three, ... up to ten.

Now here's the part where I'm stumped.

I need to create a cartesian product of the dimensions.

I came up with this approach by hard-coding the different dimensions:

 with geog as (                                 
    select value
    from market_segment_dimension_values
    where market_segment_dimension = 'geography'),

    industry_type as (
    select value
    from market_segment_dimension_values
    where market_segment_dimension = 'industry type')

select geog.value as g,
industry_type.value as ind_type
from geog
cross join industry_type
;
+-------+---------------+
|   g   |   ind_type    |
+-------+---------------+
| north | retail        |
| north | manufacturing |
| north | wholesale     |
| south | retail        |
| south | manufacturing |
| south | wholesale     |
+-------+---------------+
(6 rows)

But that won't work if I add a new dimension (unless I update the query).  For example, maybe I need to add a new dimension called, say, customer size, which has values "big" and "small".  A

I've got some nasty plan B solutions, but I want to know if there's some solution.

There's a really elegant solution in python using itertools.product, like this:

>>> list(itertools.product(*[['north', 'south'], ['retail', 'manufacturing', 'wholesale']]))

[('north', 'retail'),
 ('north', 'manufacturing'),
 ('north', 'wholesale'),
 ('south', 'retail'),
 ('south', 'manufacturing'),
 ('south', 'wholesale')]

All advice is welcome.  Thanks in advance!

Matt



--
W. Matthew Wilson
matt@tplus1.com
http://tplus1.com