Обсуждение: joining tables

Поиск
Список
Период
Сортировка

joining tables

От
TedJones
Дата:
I'm having problems joining 3 tables to provide a 4th table in the correct
format. I believe I need a FULL OUTER JOIN but does not give the result that
I require.See below:

Example 1
Table: Names                Table: Friend
Name    Tel    email    PostCode        Name    friend    email
Jim    1    aa    I            Dave    yes    cc
Ted    2    bb    J            Will    yes    dd
Dave    3    cc    K            Zac    yes    ff
Will    4    dd    L            Byron    yes    gg    

Table: Details
Name    email    Town    PostCode
John    ee    A    M
Zac    ff    B    N
Byron    gg    C    O
Will    dd    D    L
Gary    hh    E    P
Ted    bb    F    J

Combined table: Result
Name    Tel    email    Town    PostCode    friend
Jim    1    aa        I
Ted    2    bb    F    J
Dave    3    cc        K        yes    
Will    4    dd    D    L        yes
John        ee    A    M
Zac        ff    B    N        yes
Byron        gg    C    O        yes
Gary        hh    E    P



Example 2
Table: Names                Table: Friend
Name    Tel    email    PostCode        Name    friend    email
Jim    1    aa    I            Gareth    yes    ii
Ted    2    bb    J            Tony    yes    jj
Dave    3    cc    K            Ken    yes    kk
Will    4    dd    L            Lloyd    yes    ll        

Table: Details
Name    email    Town    PostCode
John    ee    A    M
Zac    ff    B    N
Byron    gg    C    O
Gary    hh    E    P

Combined table: Result
Name    Tel    email    Town    PostCode    friend
Jim    1    aa        I
Ted    2    bb        J
Dave    3    cc        K            
Will    4    dd        L        
John        ee    A    M
Zac        ff    B    N        
Byron        gg    C    O        
Gary        hh    E    P
Gareth        ii                yes
Tony        jj                yes            
Ken        kk                yes    
Lloyd        ll                yes





--
Sent from: https://www.postgresql-archive.org/PostgreSQL-pgadmin-support-f2191615.html



Re: joining tables

От
Patrick Headley
Дата:
The first thing I noticed is that there is no primary key field on Names. If Friend is a child table then you will use a LEFT OUTER JOIN. Friend will need a foreign key field that has values matching the primary key field in Names. Friend should also have it's own primary key field.

If you are storing names and friends in the same table then you should join them using a junction table. A junction table will have the unique ID value of the name record and the unique ID value of the friend record within the same record. A junction table allows for many-to-many relationships. In other words, you could build a tree of friend relationships.

Where you want to display yes/no values, if you use a LEFT OUTER JOIN and there is no related friend you will get the Name record back and an empty Friend record. If there is a friend record you will get the Name and the Friend record back. If there is more than one friend you will get one Name record per Friend record.

If you really just want yes/no values, you can use a sub query instead of a JOIN within the main query that selects the count of Friend records related to each Name record. If the count = 0, there are no Friend records. If the count is greater than 0, there are Friends. A CASE statement can convert the count to yes/no.


The primary key field on each table should be unique to that table and have no other meaning. For example, you shouldn't use email or telephone as primary key fields because of the chance that someone else my eventually have the same email address or telephone number as the person you've already stored in your database. Another good reason for primary keys is if you want to create updatable views. Without primary keys the database engine can't determine which records you are working on.

A primary key field can be an auto incrementing number (sequence or identity). It can also be a timestamp or UUID. The problem with timestamp and UUID is the slight possibility of a duplicate value.
Additionally, numeric keys process faster than text keys.

The other thing about your sample is why Details is separate from Names. Things that could have multiple records per name record should be in separate tables. Examples would be emails, addresses and phone numbers (work, home, etc.). Additional details for a person should be stored with the person unless the same field could have more than one value.

Patrick Headley
Linx Consulting, Inc.
(303) 916-5522
pheadley@linxco-inc.com
www.linxco-inc.com

On 9/2/19 11:24 AM, TedJones wrote:
I'm having problems joining 3 tables to provide a 4th table in the correct
format. I believe I need a FULL OUTER JOIN but does not give the result that
I require.See below:

Example 1
Table: Names				Table: Friend
Name	Tel	email	PostCode		Name	friend	email
Jim	1	aa	I			Dave	yes	cc
Ted	2	bb	J			Will	yes	dd
Dave	3	cc	K			Zac	yes	ff
Will	4	dd	L			Byron	yes	gg	

Table: Details
Name	email	Town	PostCode
John	ee	A	M
Zac	ff	B	N
Byron	gg	C	O
Will	dd	D	L
Gary	hh	E	P
Ted	bb	F	J

Combined table: Result
Name	Tel	email	Town	PostCode	friend
Jim	1	aa		I
Ted	2	bb	F	J
Dave	3	cc		K		yes	
Will	4	dd	D	L		yes
John		ee	A	M
Zac		ff	B	N		yes
Byron		gg	C	O		yes
Gary		hh	E	P



Example 2
Table: Names				Table: Friend
Name	Tel	email	PostCode		Name	friend	email
Jim	1	aa	I			Gareth	yes	ii
Ted	2	bb	J			Tony	yes	jj
Dave	3	cc	K			Ken	yes	kk
Will	4	dd	L			Lloyd	yes	ll		

Table: Details
Name	email	Town	PostCode
John	ee	A	M
Zac	ff	B	N
Byron	gg	C	O
Gary	hh	E	P

Combined table: Result
Name	Tel	email	Town	PostCode	friend
Jim	1	aa		I
Ted	2	bb		J
Dave	3	cc		K			
Will	4	dd		L		
John		ee	A	M
Zac		ff	B	N		
Byron		gg	C	O		
Gary		hh	E	P
Gareth		ii				yes
Tony		jj				yes			
Ken		kk				yes	
Lloyd		ll				yes





--
Sent from: https://www.postgresql-archive.org/PostgreSQL-pgadmin-support-f2191615.html



Re: joining tables

От
TedJones
Дата:
Hi Patrick

Thanks for your response. The tables that I gave were just an example and
there is no relevance in the friend/yes or names/contact details in
different tables.

To clarity, another example would be three tables - shop1, shop2 and shop3.
Each would have products so that would be a common column but product price
may also be a common column. However, there would be also different column
names for each shop. (columns with same names would be of the same data
type).

I have no control over the three tables that I get as csv files. It should
be easy enough, if needed to add a primary key column (to each?) after
reading in the csv files into tables.

So, generally, what I am trying to achieve is to combine three tables into
one table where some of the columns are the same and some are not. e.g table
1 with 12 columns; table 2 with 10 columns where 4 are the same as in table
1; table 3 with 4 columns where 1 is the same as in table 1 and 1 the same
as in table 2. The number of columns in the result table would then be 12
(table 1) +  6 (new ones from table 2) + 2 (new ones from table 3) = 20
columns.

Usually the data rows from each table will be different but not always. If
data in a column common to all three tables e.g product name, was the same
from all tables i.e. same product name then all the information about that
product from the three tables would be in all the columns for that data row.
I hope that’s clear. i.e. the difference between example1 and example2 in my
original question.

In this was in reverse it would start with the large result table and use
three SELECT statements to create the three tables.

Thanks
Ted Jones





--
Sent from: https://www.postgresql-archive.org/PostgreSQL-pgadmin-support-f2191615.html



Re: joining tables

От
Jack Royal-Gordon
Дата:
Hi Ted,

In essence, you want to merge the three tables, removing records that are duplicated by another table, right?

Here are two approaches:

1) SELECT DISTINCT (field list) FROM (SELECT * from table1 UNION SELECT * from table2 UNION SELECT * from table 3).
This will remove all duplicates, regardless of their source (e.g. if table1 as two “Ted” fields, one of the rows will
beeliminated. Overlapping fields 
will be combined (e.g. only one field in the output), and non-overlapping fields will be merged. For example if t1 has
f1,f2, and f3, and t2 has 
f1, f3, and f4, and table 3 has f1, f2, and f5, the resulting output will have f1, f2, f3, f4, and f5; fields that did
notexist in the source table will be NULL 
in the output.

2) SELECT CASE WHEN t1.field1 IS NOT NULL THEN t1.field1 WHEN t2.field1 IS NOT NULL THEN t2.field1 ELSE t3.field1 END,
… 
    FROM t1 LEFT OUTER JOIN t2 ON (key field) LEFT OUTER JOIN t3 ON (key field)
Using LEFT OUTER JOIN ensures that all records from all three tables will be included and that duplicates among the
threetables will result 
in only one record. Duplicates within a table create a problem. For example, if t1 has “Ted” twice and t2 has “Ted”
twice,there will be four “Ted” 
records in the output.

If you can live with the elimination of duplicate rows within a table, option 1 is the easiest as you don’t have to
specificallycode each field to be selected. 

> On Sep 3, 2019, at 4:01 AM, TedJones <ted@mentra.co.uk> wrote:
>
> Hi Patrick
>
> Thanks for your response. The tables that I gave were just an example and
> there is no relevance in the friend/yes or names/contact details in
> different tables.
>
> To clarity, another example would be three tables - shop1, shop2 and shop3.
> Each would have products so that would be a common column but product price
> may also be a common column. However, there would be also different column
> names for each shop. (columns with same names would be of the same data
> type).
>
> I have no control over the three tables that I get as csv files. It should
> be easy enough, if needed to add a primary key column (to each?) after
> reading in the csv files into tables.
>
> So, generally, what I am trying to achieve is to combine three tables into
> one table where some of the columns are the same and some are not. e.g table
> 1 with 12 columns; table 2 with 10 columns where 4 are the same as in table
> 1; table 3 with 4 columns where 1 is the same as in table 1 and 1 the same
> as in table 2. The number of columns in the result table would then be 12
> (table 1) +  6 (new ones from table 2) + 2 (new ones from table 3) = 20
> columns.
>
> Usually the data rows from each table will be different but not always. If
> data in a column common to all three tables e.g product name, was the same
> from all tables i.e. same product name then all the information about that
> product from the three tables would be in all the columns for that data row.
> I hope that’s clear. i.e. the difference between example1 and example2 in my
> original question.
>
> In this was in reverse it would start with the large result table and use
> three SELECT statements to create the three tables.
>
> Thanks
> Ted Jones
>
>
>
>
>
> --
> Sent from: https://www.postgresql-archive.org/PostgreSQL-pgadmin-support-f2191615.html
>
>




Re: joining tables

От
TedJones
Дата:
Hi Jack

I'm not sure if I understand your comment about two 'Ted' fields and
duplication of rows. In the example below there is 'Ted' twice in Table3 1
and 3 and must appear twice as I've shown in the result as the data in the
rest of the row is different. I agree with no duplication of rows if all of
the row is the same.

Ted

Table: 1                
Author    Title    Sales    Publication Date        
Jim    A    aa    I            
Ted    B    bb    J        
Dave    C    cc    K            
Ted    D    dd    L            

Table: 2
Author    Publisher    
Jim    him    
Ted    me    
Dave    me    
Will    you
Gary    him

Table:3
Author    Title    Country
Ted    B    UK
Ted    D    US
Jim    A    UK
Dave    C    UK

Combined table: Result
Author    Title    Publication Date    Publisher    Sales    Country
Ted    B    J    me    bb    UK
Ted    D    L    me    dd    US
Jim    A    I    him    aa    UK
Dave    C    K    me    cc    UK
Gary    null    null    him    null    null    
Will    null    null    you    null    null




--
Sent from: https://www.postgresql-archive.org/PostgreSQL-pgadmin-support-f2191615.html



Re: joining tables

От
Patrick Headley
Дата:
Not having unique IDs from the source tables will present a problem in that you could have more than one record in one of the tables with the same data.

It looks like you will need to determine the fields that make each row unique but common in all three tables. Then, you can match up records from the three tables.

The next issue is if there is similar data in similar columns, which data takes prescience. That will be something you have to decide. I think the best way to do that is to import the most accurate table first. If a record from the second table matches a record from the first table, only import column values that weren't already populated by the first import. Then do the same thing with the 3rd table and so on.

I think the overall process will be to import each CSV file into a Postgres work table of it's own. Then, import the first table into a master table that contains every field. Do the same with the second import with the additional logic to only set a column value if it wasn't already set by the first (prior) import. Then, do the same with the third table and so on. By getting the CSV files into Postgres first, it will be easier to manage cases where a column between the two tables is the same data but the names are different. I think it will also perform better.

The import of the first table into the main table will be a simple INSERT query. For each of the rest of the tables I think I would first run an UPDATE query with an INNER JOIN that updates column values of matching records but only if the column isn't already populated. Then, run an INSERT query using the second table on the left side of a LEFT OUTER JOIN and the main table on the right side and only return records from the left hand table that don't match any records on the right hand table (i.e. fields in right table are null). Do the same for the rest of the tables.

A different way to perform the INSERT query would be to use a sub query in the WHERE clause of the second table that only returns records that don't have matches in the main table. This will be easier if each of the source tables has it's own unique ID across all source tables. If you don't have unique IDs, you can use a concatenation of key field values.

SELECT columns
FROM table2
WHERE [concatenation of ] key_column(s) in table2 NOT IN (SELECT [contenation of] key_column(s) FROM table1);

Patrick Headley
Linx Consulting, Inc.
(303) 916-5522
pheadley@linxco-inc.com
www.linxco-inc.com

On 9/3/19 5:01 AM, TedJones wrote:
Hi Patrick

Thanks for your response. The tables that I gave were just an example and
there is no relevance in the friend/yes or names/contact details in
different tables.

To clarity, another example would be three tables - shop1, shop2 and shop3.
Each would have products so that would be a common column but product price
may also be a common column. However, there would be also different column
names for each shop. (columns with same names would be of the same data
type).

I have no control over the three tables that I get as csv files. It should
be easy enough, if needed to add a primary key column (to each?) after
reading in the csv files into tables.

So, generally, what I am trying to achieve is to combine three tables into
one table where some of the columns are the same and some are not. e.g table
1 with 12 columns; table 2 with 10 columns where 4 are the same as in table
1; table 3 with 4 columns where 1 is the same as in table 1 and 1 the same
as in table 2. The number of columns in the result table would then be 12
(table 1) +  6 (new ones from table 2) + 2 (new ones from table 3) = 20
columns. 

Usually the data rows from each table will be different but not always. If
data in a column common to all three tables e.g product name, was the same
from all tables i.e. same product name then all the information about that
product from the three tables would be in all the columns for that data row.
I hope that’s clear. i.e. the difference between example1 and example2 in my
original question.

In this was in reverse it would start with the large result table and use
three SELECT statements to create the three tables.

Thanks
Ted Jones





--
Sent from: https://www.postgresql-archive.org/PostgreSQL-pgadmin-support-f2191615.html



Re: joining tables

От
Jack Royal-Gordon
Дата:
Let me step back a bit, as I realize upon further reflection that the first method will not work.

As far as the issue about two “Ted” records, read it as though I said two “Ted D” records. It refers to two records in
thesame table with the same key value. If “Ted D” appeared twice in table 1 and twice in table 2, the join would give
fourresulting records (all combinations of the records from table 1 and the records from table 2 (and the records from
table3) — that’s fundamentally how a join works. If you don’t want that, then make sure that there are no duplicates
withineach of the tables. 

> On Sep 3, 2019, at 9:35 AM, TedJones <ted@mentra.co.uk> wrote:
>
> Hi Jack
>
> I'm not sure if I understand your comment about two 'Ted' fields and
> duplication of rows. In the example below there is 'Ted' twice in Table3 1
> and 3 and must appear twice as I've shown in the result as the data in the
> rest of the row is different. I agree with no duplication of rows if all of
> the row is the same.
>
> Ted
>
> Table: 1
> Author    Title    Sales    Publication Date
> Jim    A    aa    I
> Ted    B    bb    J
> Dave    C    cc    K
> Ted    D    dd    L
>
> Table: 2
> Author    Publisher
> Jim    him
> Ted    me
> Dave    me
> Will    you
> Gary    him
>
> Table:3
> Author    Title    Country
> Ted    B    UK
> Ted    D    US
> Jim    A    UK
> Dave    C    UK
>
> Combined table: Result
> Author    Title    Publication Date    Publisher    Sales    Country
> Ted    B    J    me    bb    UK
> Ted    D    L    me    dd    US
> Jim    A    I    him    aa    UK
> Dave    C    K    me    cc    UK
> Gary    null    null    him    null    null
> Will    null    null    you    null    null
>
>
>
>
> --
> Sent from: https://www.postgresql-archive.org/PostgreSQL-pgadmin-support-f2191615.html
>
>




Re: joining tables

От
Michel Feinstein
Дата:
Also, be aware that this is NOT the postgresql email list, but the pgAdmin email list, you would get a lot more help about SQL there. 

On Tue, Sep 3, 2019, 19:44 Jack Royal-Gordon <jackrg@pobox.com> wrote:
Let me step back a bit, as I realize upon further reflection that the first method will not work.

As far as the issue about two “Ted” records, read it as though I said two “Ted D” records. It refers to two records in the same table with the same key value. If “Ted D” appeared twice in table 1 and twice in table 2, the join would give four resulting records (all combinations of the records from table 1 and the records from table 2 (and the records from table 3) — that’s fundamentally how a join works. If you don’t want that, then make sure that there are no duplicates within each of the tables.

> On Sep 3, 2019, at 9:35 AM, TedJones <ted@mentra.co.uk> wrote:
>
> Hi Jack
>
> I'm not sure if I understand your comment about two 'Ted' fields and
> duplication of rows. In the example below there is 'Ted' twice in Table3 1
> and 3 and must appear twice as I've shown in the result as the data in the
> rest of the row is different. I agree with no duplication of rows if all of
> the row is the same.
>
> Ted
>
> Table: 1                             
> Author        Title   Sales   Publication Date               
> Jim   A       aa      I                       
> Ted   B       bb      J               
> Dave  C       cc      K                       
> Ted   D       dd      L                       
>
> Table: 2
> Author        Publisher       
> Jim   him     
> Ted   me     
> Dave  me     
> Will  you
> Gary  him
>
> Table:3
> Author        Title   Country
> Ted   B       UK
> Ted   D       US
> Jim   A       UK
> Dave  C       UK
>
> Combined table: Result
> Author        Title   Publication Date        Publisher       Sales   Country
> Ted   B       J       me      bb      UK
> Ted   D       L       me      dd      US
> Jim   A       I       him     aa      UK
> Dave  C       K       me      cc      UK
> Gary  null    null    him     null    null   
> Will  null    null    you     null    null
>
>
>
>
> --
> Sent from: https://www.postgresql-archive.org/PostgreSQL-pgadmin-support-f2191615.html
>
>



Re: joining tables

От
Avin Kavish
Дата:
yes, what you need is a full outer join. How are your actual results different from the expected ones?

On Wed, Sep 4, 2019 at 4:18 AM Michel Feinstein <michelfeinstein@gmail.com> wrote:
Also, be aware that this is NOT the postgresql email list, but the pgAdmin email list, you would get a lot more help about SQL there. 

On Tue, Sep 3, 2019, 19:44 Jack Royal-Gordon <jackrg@pobox.com> wrote:
Let me step back a bit, as I realize upon further reflection that the first method will not work.

As far as the issue about two “Ted” records, read it as though I said two “Ted D” records. It refers to two records in the same table with the same key value. If “Ted D” appeared twice in table 1 and twice in table 2, the join would give four resulting records (all combinations of the records from table 1 and the records from table 2 (and the records from table 3) — that’s fundamentally how a join works. If you don’t want that, then make sure that there are no duplicates within each of the tables.

> On Sep 3, 2019, at 9:35 AM, TedJones <ted@mentra.co.uk> wrote:
>
> Hi Jack
>
> I'm not sure if I understand your comment about two 'Ted' fields and
> duplication of rows. In the example below there is 'Ted' twice in Table3 1
> and 3 and must appear twice as I've shown in the result as the data in the
> rest of the row is different. I agree with no duplication of rows if all of
> the row is the same.
>
> Ted
>
> Table: 1                             
> Author        Title   Sales   Publication Date               
> Jim   A       aa      I                       
> Ted   B       bb      J               
> Dave  C       cc      K                       
> Ted   D       dd      L                       
>
> Table: 2
> Author        Publisher       
> Jim   him     
> Ted   me     
> Dave  me     
> Will  you
> Gary  him
>
> Table:3
> Author        Title   Country
> Ted   B       UK
> Ted   D       US
> Jim   A       UK
> Dave  C       UK
>
> Combined table: Result
> Author        Title   Publication Date        Publisher       Sales   Country
> Ted   B       J       me      bb      UK
> Ted   D       L       me      dd      US
> Jim   A       I       him     aa      UK
> Dave  C       K       me      cc      UK
> Gary  null    null    him     null    null   
> Will  null    null    you     null    null
>
>
>
>
> --
> Sent from: https://www.postgresql-archive.org/PostgreSQL-pgadmin-support-f2191615.html
>
>