Обсуждение: partial JOIN (was: ID column naming convention)

Поиск
Список
Период
Сортировка

partial JOIN (was: ID column naming convention)

От
Rafal Pietrak
Дата:
Hi,

Recent interesting discussion on the list, on (just) naming convention
reminded me of a related problem which I haven't resolved myself, jet.

As slowly I learn SQL (like a blind dog in a meat market), currently
I've just started to use table JOINS more extensively ... and I often
bump on a "two columns have the same name" error.

The point is, that my "large object" often contain sets of (quite)
identical component objects; like: a SLED has LEFT_RUNNER and
RIGHT_RUNNER, both referring to the same RUNNERS table. And yes, I have
sattled with naming convention where table is a plural noun (table
RUNNERS), while primary key column name is an unspeciffic singular noun
(A_RUNNER).

So, when I join the SLEDS table with RUNNERS table (twice: left and
right runner, to get a complete bom for a sled):
SELECT * FROM sleds s JOIN runners l ON (s.left=l.runner) JOIN runners r
ON (s.right=r.runner);
.... it doesn't work, when SLED table and RUNNERS table both have the
same column (like LENGTH). And it wouldn't work even if I called
sled.length a SLED.SLED_LENGTH and runner.length a RUNNER.RUNNER_LENGTH,
since RUNNERS are joined twice.

Is there an sql-language level solution (idiom) to cope with such
queries? And I don't mean:
SELECT s.*,r.*,l.* FROM sleds s JOIN runners l ON (s.left=l.runner) JOIN
runners r ON (s.right=r.runner);
...since in such case, there would be even more duplicate column names
then in the first example. The only way I know to avoid the column name
duplication is to explicity select column list:
SELECT s.sled,s.length,s....,r.runner as right,r.length as
right_length,r....,l.runner as left,l.length as left_length,l.* FROM
sleds s JOIN runners l ON (s.left=l.runner) JOIN runners r ON
(s.right=r.runner);
.... which is truely overtalkative (and thus obfuscates future query
analize during code maintenance).

To explain the problem a little better, here is a "pseudoSQL" query
example, which should ilustrate the problem/solution:
SELECT * FROM sleds s JOIN runners(length as left_length, weight as
left_weight, runner as left) l USING (left) JOIN runners(length as
right_length, weight as right_weight, runner as right) r USING (right);

I found one way to get "almost exactly" to this point, that is by
createing VIEWs to do the job of renameing columns:
CREATE VIEW right_runner AS SELECT length as right_length, weight as
right_weight, runner as right FROM runners;
CREATE VIEW left_runner AS SELECT length as left_length, weight as
right_weight, runner as left FROM runners;
.... one problem with this solution is that during the livetime of an
application, columns of tables (like RUNNERS) change (like by adding new
attributes), and it's quite tricky to promote those changes smoothly up
to the top "SELECT * FROM sleds ...", particularly if that last one is
actually a CREATED VIEW.
... the other problem is that it exploads application/db schema, and
thus make it more difficult to "comprehend" during future maintenance.

A real live example of such query is quering addreses of a person:
residential_address, office_address, delivery_address, etc... all of
them from a single ADDRESSES table. Or people telephone numbers from
PHONE_NUMERS table.

I have seen the WINDOW-PARTITION-OVER syntax very similar to this
problem, but countrary to JOINS, WINDOW can be defined
outside of a select column list, which is helpfull. I havent found such
construct for JOINs. Is there any?

Is there an ordinary SQL phrase/idiom (laconic/tarse in it's form) to
get the above result?

-R




Re: partial JOIN (was: ID column naming convention)

От
"David G. Johnston"
Дата:
On Sat, Oct 24, 2015 at 6:41 AM, Rafal Pietrak <rafal@ztk-rp.eu> wrote:
The only way I know to avoid the column name
duplication is to explicity select column list:
SELECT s.sled,s.length,s....,r.runner as right,r.length as
right_length,r....,l.runner as left,l.length as left_length,l.* FROM
sleds s JOIN runners l ON (s.left=l.runner) JOIN runners r ON
(s.right=r.runner);
.... which is truely overtalkative (and thus obfuscates future query
analize during code maintenance).

​Skimmed...​

​Using explicit column names is expected - using "*" in non-trivial and production queries is not.

You can move the aliases if you would like.

SELECT *
FROM tablea (col1, col2, col4)
JOIN tableb AS tb1 (col1, col3, col5) USING (col1)
JOIN tableb AS tb2​
 
​(col1, col6, col7) USING (col1)

The "USING" clause ensure that only a single "col1" appears in the output.

For all other columns in the duplicate join you need to provide a context-specific alias.​

David J.

Re: partial JOIN (was: ID column naming convention)

От
Rafal Pietrak
Дата:

W dniu 24.10.2015 o 15:00, David G. Johnston pisze:
> On Sat, Oct 24, 2015 at 6:41 AM, Rafal Pietrak <rafal@ztk-rp.eu
> <mailto:rafal@ztk-rp.eu>>wrote:
[----------------------]
>
> ​Using explicit column names is expected - using "*" in non-trivial and
> production queries is not.
>
> You can move the aliases if you would like.
>
> SELECT *
> FROM tablea (col1, col2, col4)
> JOIN tableb AS tb1 (col1, col3, col5) USING (col1)
> JOIN tableb AS tb2​
>
> ​(col1, col6, col7) USING (col1)

I knew there must have been something like this.

thenx,

-R


Re: partial JOIN (was: ID column naming convention)

От
Rafal Pietrak
Дата:

W dniu 24.10.2015 o 21:03, Rafal Pietrak pisze:
>
>
> W dniu 24.10.2015 o 15:00, David G. Johnston pisze:
>> On Sat, Oct 24, 2015 at 6:41 AM, Rafal Pietrak <rafal@ztk-rp.eu
>> <mailto:rafal@ztk-rp.eu>>wrote:
> [----------------------]
>>
>> ​Using explicit column names is expected - using "*" in non-trivial and
>> production queries is not.
>>
>> You can move the aliases if you would like.
>>
>> SELECT *
>> FROM tablea (col1, col2, col4)
>> JOIN tableb AS tb1 (col1, col3, col5) USING (col1)
>> JOIN tableb AS tb2​
>>
>> ​(col1, col6, col7) USING (col1)
>
> I knew there must have been something like this.

Upss. Almost, but not quite. I've just read the manual on that
(http://www.postgresql.org/docs/9.4/static/queries-table-expressions.html)
and it looks like "col1", "col2", etc in the above example are column
*aliases*. Right?

So I have to list *all* the columns of the aliased table irrespectively
if I need any of them within the output, or not.

It's a pity standard didn't choose to make column aliasing optional,
allowing for cherry pick what's aliased like following:

.. JOIN table AS tb(column7 [as alias1], column3 [as alias2],...)

thenx anyway, "Mandatory" column aliasing is helpfull too.

-R




Re: partial JOIN (was: ID column naming convention)

От
Dane Foster
Дата:
On Sat, Oct 24, 2015 at 3:42 PM, Rafal Pietrak <rafal@ztk-rp.eu> wrote:


W dniu 24.10.2015 o 21:03, Rafal Pietrak pisze:
>
>
> W dniu 24.10.2015 o 15:00, David G. Johnston pisze:
>> On Sat, Oct 24, 2015 at 6:41 AM, Rafal Pietrak <rafal@ztk-rp.eu
>> <mailto:rafal@ztk-rp.eu>>wrote:
> [----------------------]
>>
>> ​Using explicit column names is expected - using "*" in non-trivial and
>> production queries is not.
>>
>> You can move the aliases if you would like.
>>
>> SELECT *
>> FROM tablea (col1, col2, col4)
>> JOIN tableb AS tb1 (col1, col3, col5) USING (col1)
>> JOIN tableb AS tb2​
>>
>> ​(col1, col6, col7) USING (col1)
>
> I knew there must have been something like this.

Upss. Almost, but not quite. I've just read the manual on that
(http://www.postgresql.org/docs/9.4/static/queries-table-expressions.html)
and it looks like "col1", "col2", etc in the above example are column
*aliases*. Right?

So I have to list *all* the columns of the aliased table irrespectively
if I need any of them within the output, or not.

It's a pity standard didn't choose to make column aliasing optional,
allowing for cherry pick what's aliased like following:

.. JOIN table AS tb(column7 [as alias1], column3 [as alias2],...)

thenx anyway, "Mandatory" column aliasing is helpfull too.

-R




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
You may be able to accomplish that using aliased sub-selects as in-line views. The purpose of the sub-selects in this use-case is simply to cherry pick the columns you want.
SELECT *
FROM
  (SELECT col1, col2, col4 FROM tablea) AS iv
  JOIN (SELECT co1, col3, col5 FROM tableb) AS tb1 USING (col1))
  JOIN (SELECT col1, col6, col7 FROM tableb) AS tb22 USING (col1)


Please note, this may be a performance nightmare for large tables because w/o a WHERE clause that can be pushed down to the sub-selects each sub-select will do a full table scan.

Please note that the 3rd JOIN clause is nutty (I translated it from your original) because why would you join a table to itself just to select a different set of columns?

Good luck,

Dane


Re: partial JOIN (was: ID column naming convention)

От
Dane Foster
Дата:

On Sat, Oct 24, 2015 at 5:23 PM, Dane Foster <studdugie@gmail.com> wrote:
On Sat, Oct 24, 2015 at 3:42 PM, Rafal Pietrak <rafal@ztk-rp.eu> wrote:


W dniu 24.10.2015 o 21:03, Rafal Pietrak pisze:
>
>
> W dniu 24.10.2015 o 15:00, David G. Johnston pisze:
>> On Sat, Oct 24, 2015 at 6:41 AM, Rafal Pietrak <rafal@ztk-rp.eu
>> <mailto:rafal@ztk-rp.eu>>wrote:
> [----------------------]
>>
>> ​Using explicit column names is expected - using "*" in non-trivial and
>> production queries is not.
>>
>> You can move the aliases if you would like.
>>
>> SELECT *
>> FROM tablea (col1, col2, col4)
>> JOIN tableb AS tb1 (col1, col3, col5) USING (col1)
>> JOIN tableb AS tb2​
>>
>> ​(col1, col6, col7) USING (col1)
>
> I knew there must have been something like this.

Upss. Almost, but not quite. I've just read the manual on that
(http://www.postgresql.org/docs/9.4/static/queries-table-expressions.html)
and it looks like "col1", "col2", etc in the above example are column
*aliases*. Right?

So I have to list *all* the columns of the aliased table irrespectively
if I need any of them within the output, or not.

It's a pity standard didn't choose to make column aliasing optional,
allowing for cherry pick what's aliased like following:

.. JOIN table AS tb(column7 [as alias1], column3 [as alias2],...)

thenx anyway, "Mandatory" column aliasing is helpfull too.

-R




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
You may be able to accomplish that using aliased sub-selects as in-line views. The purpose of the sub-selects in this use-case is simply to cherry pick the columns you want.
SELECT *
FROM
  (SELECT col1, col2, col4 FROM tablea) AS iv
  JOIN (SELECT co1, col3, col5 FROM tableb) AS tb1 USING (col1))
  JOIN (SELECT col1, col6, col7 FROM tableb) AS tb22 USING (col1)


Please note, this may be a performance nightmare for large tables because w/o a WHERE clause that can be pushed down to the sub-selects each sub-select will do a full table scan.

Please note that the 3rd JOIN clause is nutty (I translated it from your original) because why would you join a table to itself just to select a different set of columns?

Good luck,

Dane
For the record SELECT * in my example is absolutely the wrong thing to do but your original didn't leave me w/ any other option.

Dane

Re: partial JOIN (was: ID column naming convention)

От
Rafal Pietrak
Дата:

W dniu 24.10.2015 o 23:25, Dane Foster pisze:
>
> On Sat, Oct 24, 2015 at 5:23 PM, Dane Foster <studdugie@gmail.com
> <mailto:studdugie@gmail.com>> wrote:
>

[--------------------]
>         --
>         Sent via pgsql-general mailing list
>         (pgsql-general@postgresql.org <mailto:pgsql-general@postgresql.org>)
>         To make changes to your subscription:
>         http://www.postgresql.org/mailpref/pgsql-general
>
>     ​
>     You may be able to accomplish that using aliased sub-selects as
>     in-line views. The purpose of the sub-selects in this use-case is
>     simply to cherry pick the columns you want.
>     SELECT *
>     FROM
>       (SELECT col1, col2, col4 FROM tablea) AS iv
>       JOIN (SELECT co1, col3, col5 FROM tableb) AS tb1 USING (col1))
>       JOIN (SELECT col1, col6, col7 FROM tableb) AS tb22 USING (col1)
>
>     Please note, this may be a performance nightmare for large tables
>     because w/o a WHERE clause that can be pushed down to the
>     sub-selects each sub-select will do a full table scan.

Yes. And that's why this is not truely an option. I'd rather give all
coluns aliases (when coding), then opt for subquery on every execute.

>
>     Please note that the 3rd JOIN clause is nutty (I translated it from
>     your original) because why would you join a table to itself just to
>     select a different set of columns?

One example (a bit artificial, I know) might be the address data, for
waybill:
create table purchases( basket int, customer int, delivery int, ...);
select * from purchases p join buyers b(customer, city, address) using
(customer) join buyers d (delivery, to_city, to_address, to_zip) using
(delivery);

... or something like that. ZIP code is not actually needed to indicate
customer (SSN might be instead).

But I wouldn't agrue if real life programming actually needs that. I've
just wanted to have the most generic example I've imagined.

>
>     Good luck,
>
>     Dane
>
> ​
> For the record SELECT * in my example is absolutely the wrong thing to
> do but your original didn't leave me w/ any other option.
>

Hmmm. I've seen people say that. I do keep that in mind, but frankly I
actually never had to avoid that to get my code working (and
maintained). I do that sometimes to limit the bandwidth necesery to
deliver the results, but not so often.

But I'd say, that "the standard" sort of does that (i.e. the star)
notoriusly:
1. with table aliasing (the case we are discussing now), standard
expects us to give column aliases *in order* they are defined within the
aliased table - that's nothing else but a "hidden star" somwhere there.
And I really wish they did it without that.
2. see the systax of INSERT: a list of column names to be prowided with
values is optional, and when you don't give it, it's like you've written
"a star" in its place. This I find *very bad* and never use it myself.
But standard people thought otherwise.

So personally, I don't see a star in a select list so harmfull, quite
the oposit.

-R


Re: partial JOIN (was: ID column naming convention)

От
Jim Nasby
Дата:
On 10/25/15 2:30 AM, Rafal Pietrak wrote:
> So personally, I don't see a star in a select list so harmfull, quite
> the oposit.

Using * outside the database is generally very dangerous. Using it
*inside* the database can be very useful, because frequently it's
exactly what you want (give me all the columns, dammit!). In particular,
I find it useful in views where I want the original data, along with
some calculated or other values. For example,

SELECT nspname, c.oid, * FROM pg_class c LEFT JOIN pg_namespace ON...

But there's also times I've wanted a way to manipulate what * would
normally do. In particular, *_except_for(field_list), and *_replace(
regexp_replace to run on each field name). If those existed (and maybe a
way to combine them), it wouldn't be terribly hard for you do handle
your sled query with something like:

SELECT s.*, l.*( s/.*/left_%/ ), r.*( s/.*/right_& )
   FROM sled s JOIN runner l ON ... JOIN runner r ON ...
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


Re: partial JOIN (was: ID column naming convention)

От
Rafal Pietrak
Дата:

W dniu 26.10.2015 o 01:49, Jim Nasby pisze:
> On 10/25/15 2:30 AM, Rafal Pietrak wrote:
[----------------]
> But there's also times I've wanted a way to manipulate what * would
> normally do. In particular, *_except_for(field_list), and *_replace(
> regexp_replace to run on each field name). If those existed (and maybe a
> way to combine them), it wouldn't be terribly hard for you do handle
> your sled query with something like:
>
> SELECT s.*, l.*( s/.*/left_%/ ), r.*( s/.*/right_& )
>   FROM sled s JOIN runner l ON ... JOIN runner r ON ...

It certainly look nasty.

But in the long run I strongly believe in peoples ability to "settle for
conventions", and I think I could find my ways with that construct, if
only it was possible. Not that I'd prefere it to the explicit aliasing
for aliased table coluns, syntactically inside column alias list. My
personal point of view is that:
1. "a star" in select is not so harmfull, since backend gives out the
names along with the data; and you cen see them in psql output and
retrieve them from client library (libpq) thus adopting for any order "a
star" happen to create.
2. in all other cases (from which I know: table aliases and inserts),
the backend does not interfere only when your error in arrangeing the
list results in a data type missmatch. this is wrong.

But anyway, thenx for pointing me to column aliasing, which I didn't
know before.

-R