Re: joining tables

Поиск
Список
Период
Сортировка
От Patrick Headley
Тема Re: joining tables
Дата
Msg-id 419aab2d-186b-730f-4440-ebb8fbb1ce50@linxco-inc.com
обсуждение исходный текст
Ответ на Re: joining tables  (TedJones <ted@mentra.co.uk>)
Список pgadmin-support
Not having unique IDs from the source tables will present a problem in that you could have more than one record in one of the tables with the same data.

It looks like you will need to determine the fields that make each row unique but common in all three tables. Then, you can match up records from the three tables.

The next issue is if there is similar data in similar columns, which data takes prescience. That will be something you have to decide. I think the best way to do that is to import the most accurate table first. If a record from the second table matches a record from the first table, only import column values that weren't already populated by the first import. Then do the same thing with the 3rd table and so on.

I think the overall process will be to import each CSV file into a Postgres work table of it's own. Then, import the first table into a master table that contains every field. Do the same with the second import with the additional logic to only set a column value if it wasn't already set by the first (prior) import. Then, do the same with the third table and so on. By getting the CSV files into Postgres first, it will be easier to manage cases where a column between the two tables is the same data but the names are different. I think it will also perform better.

The import of the first table into the main table will be a simple INSERT query. For each of the rest of the tables I think I would first run an UPDATE query with an INNER JOIN that updates column values of matching records but only if the column isn't already populated. Then, run an INSERT query using the second table on the left side of a LEFT OUTER JOIN and the main table on the right side and only return records from the left hand table that don't match any records on the right hand table (i.e. fields in right table are null). Do the same for the rest of the tables.

A different way to perform the INSERT query would be to use a sub query in the WHERE clause of the second table that only returns records that don't have matches in the main table. This will be easier if each of the source tables has it's own unique ID across all source tables. If you don't have unique IDs, you can use a concatenation of key field values.

SELECT columns
FROM table2
WHERE [concatenation of ] key_column(s) in table2 NOT IN (SELECT [contenation of] key_column(s) FROM table1);

Patrick Headley
Linx Consulting, Inc.
(303) 916-5522
pheadley@linxco-inc.com
www.linxco-inc.com

On 9/3/19 5:01 AM, TedJones wrote:
Hi Patrick

Thanks for your response. The tables that I gave were just an example and
there is no relevance in the friend/yes or names/contact details in
different tables.

To clarity, another example would be three tables - shop1, shop2 and shop3.
Each would have products so that would be a common column but product price
may also be a common column. However, there would be also different column
names for each shop. (columns with same names would be of the same data
type).

I have no control over the three tables that I get as csv files. It should
be easy enough, if needed to add a primary key column (to each?) after
reading in the csv files into tables.

So, generally, what I am trying to achieve is to combine three tables into
one table where some of the columns are the same and some are not. e.g table
1 with 12 columns; table 2 with 10 columns where 4 are the same as in table
1; table 3 with 4 columns where 1 is the same as in table 1 and 1 the same
as in table 2. The number of columns in the result table would then be 12
(table 1) +  6 (new ones from table 2) + 2 (new ones from table 3) = 20
columns. 

Usually the data rows from each table will be different but not always. If
data in a column common to all three tables e.g product name, was the same
from all tables i.e. same product name then all the information about that
product from the three tables would be in all the columns for that data row.
I hope that’s clear. i.e. the difference between example1 and example2 in my
original question.

In this was in reverse it would start with the large result table and use
three SELECT statements to create the three tables.

Thanks
Ted Jones





--
Sent from: https://www.postgresql-archive.org/PostgreSQL-pgadmin-support-f2191615.html



В списке pgadmin-support по дате отправления:

Предыдущее
От: Edson Richter
Дата:
Сообщение: Re: pgAdmin - migration
Следующее
От: Jack Royal-Gordon
Дата:
Сообщение: Re: joining tables