Обсуждение: "Ungroup" data for import into PostgreSQL

Поиск
Список
Период
Сортировка

"Ungroup" data for import into PostgreSQL

От
"George Weaver"
Дата:
Hi List,

I need to import data from a large Excel spreadsheet into a PostgreSQL
table.  I have a program that uses ODBC to connect to Excel and extract data
using SQL queries.  The program then inserts the data into a PostgreSQL
table.

The challenge with this particular spreadsheet is that it is arrayed thus:

Model No 1     Product Code 15
    Serial No 1       No on Hand
    Serial No 2       No on Hand
    Serial No 3       No on Hand
Model No 4    Product Code 9
    Serial No 12     No on Hand
Model No 5    Product Code 27
    Serial No 6       No on Hand
    Serial No 14     No on Hand

etc.

I need the data in PostgreSQL arrayed thus

Model No 1    Product Code 15    Serial No 1       No on Hand
Model No 1    Product Code 15    Serial No 2       No on Hand
Model No 1    Product Code 15    Serial No 3       No on Hand
Model No 4    Product Code 9      Serial No 12     No on Hand
Model No 5    Product Code 27    Serial No 6       No on Hand
Model No 5    Product Code 27    Serial No 14     No on Hand

I can import the data procedurely using plpgsql to match the individual rows
to the master for each row (import the raw data into a buffer table in
PostgreSQL and then looping through the rows in the buffer table and
checking to see when the Model No changes).

I'm wondering if there is a more elegant way to do this using straight sql
from Excel?

Thanks for your help/suggestions!
George



Re: "Ungroup" data for import into PostgreSQL

От
Jim Nasby
Дата:
On 1/15/15 9:43 AM, George Weaver wrote:
> Hi List,
>
> I need to import data from a large Excel spreadsheet into a PostgreSQL table.  I have a program that uses ODBC to
connectto Excel and extract data using SQL queries.  The program then inserts the data into a PostgreSQL table. 
>
> The challenge with this particular spreadsheet is that it is arrayed thus:
>
> Model No 1     Product Code 15
>     Serial No 1       No on Hand
>     Serial No 2       No on Hand
>     Serial No 3       No on Hand
> Model No 4    Product Code 9
>     Serial No 12     No on Hand
> Model No 5    Product Code 27
>     Serial No 6       No on Hand
>     Serial No 14     No on Hand
>
> etc.
>
> I need the data in PostgreSQL arrayed thus
>
> Model No 1    Product Code 15    Serial No 1       No on Hand
> Model No 1    Product Code 15    Serial No 2       No on Hand
> Model No 1    Product Code 15    Serial No 3       No on Hand
> Model No 4    Product Code 9      Serial No 12     No on Hand
> Model No 5    Product Code 27    Serial No 6       No on Hand
> Model No 5    Product Code 27    Serial No 14     No on Hand
>
> I can import the data procedurely using plpgsql to match the individual rows to the master for each row (import the
rawdata into a buffer table in PostgreSQL and then looping through the rows in the buffer table and checking to see
whenthe Model No changes). 

Note that if you're doing that you better be putting the rownumber from excel into the table... result sets are NOT
guaranteedto be in insert order! 

> I'm wondering if there is a more elegant way to do this using straight sql from Excel?

Well, that's really an excel question, not a Postgres question...

If you load the whole spreadsheet into a single table and have a way to differentiate between the different rows then
youmight be able to do something with CTE's to relate a serial number to the product code. That might be faster than
plpgsql.

You might also be able to do something creative with formulas in excel to copy the product code data to the serial #
rows.You could then import the whole thing and re-normalize it. 

There's probably some stuff you could do with VBA too. If you care about performance you don't want to execute SQL
statementsfor each spreadsheet row. 
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


Re: "Ungroup" data for import into PostgreSQL

От
Adrian Klaver
Дата:
On 01/15/2015 04:56 PM, Jim Nasby wrote:
> On 1/15/15 9:43 AM, George Weaver wrote:
>> Hi List,
>>
>> I need to import data from a large Excel spreadsheet into a PostgreSQL
>> table.  I have a program that uses ODBC to connect to Excel and
>> extract data using SQL queries.  The program then inserts the data
>> into a PostgreSQL table.
>>
>> The challenge with this particular spreadsheet is that it is arrayed
>> thus:
>>
>> Model No 1     Product Code 15
>>     Serial No 1       No on Hand
>>     Serial No 2       No on Hand
>>     Serial No 3       No on Hand
>> Model No 4    Product Code 9
>>     Serial No 12     No on Hand
>> Model No 5    Product Code 27
>>     Serial No 6       No on Hand
>>     Serial No 14     No on Hand
>>
>> etc.
>>
>> I need the data in PostgreSQL arrayed thus
>>
>> Model No 1    Product Code 15    Serial No 1       No on Hand
>> Model No 1    Product Code 15    Serial No 2       No on Hand
>> Model No 1    Product Code 15    Serial No 3       No on Hand
>> Model No 4    Product Code 9      Serial No 12     No on Hand
>> Model No 5    Product Code 27    Serial No 6       No on Hand
>> Model No 5    Product Code 27    Serial No 14     No on Hand
>>
>> I can import the data procedurely using plpgsql to match the
>> individual rows to the master for each row (import the raw data into a
>> buffer table in PostgreSQL and then looping through the rows in the
>> buffer table and checking to see when the Model No changes).
>
> Note that if you're doing that you better be putting the rownumber from
> excel into the table... result sets are NOT guaranteed to be in insert
> order!
>
>> I'm wondering if there is a more elegant way to do this using straight
>> sql from Excel?
>
> Well, that's really an excel question, not a Postgres question...
>
> If you load the whole spreadsheet into a single table and have a way to
> differentiate between the different rows then you might be able to do
> something with CTE's to relate a serial number to the product code. That
> might be faster than plpgsql.
>
> You might also be able to do something creative with formulas in excel
> to copy the product code data to the serial # rows. You could then
> import the whole thing and re-normalize it.
>
> There's probably some stuff you could do with VBA too. If you care about
> performance you don't want to execute SQL statements for each
> spreadsheet row.

Or if you really want to slice and dice and you use Python, then take a
look at Pandas:

http://pandas.pydata.org/

In particular the IO functions:

http://pandas.pydata.org/pandas-docs/stable/io.html


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: "Ungroup" data for import into PostgreSQL

От
"George Weaver"
Дата:
Sorry for the late reply...life interefered...

From: "Jim Nasby" <Jim.Nasby@BlueTreble.com>


> On 1/15/15 9:43 AM, George Weaver wrote:
>> Hi List,
>>
>> I need to import data from a large Excel spreadsheet into a PostgreSQL
>> table.  I have a program that uses ODBC to connect to Excel and extract
>> data using SQL queries.  The program then inserts the data into a
>> PostgreSQL table.
>>
>> The challenge with this particular spreadsheet is that it is arrayed
>> thus:
>>
>> Model No 1     Product Code 15
>>     Serial No 1       No on Hand
>>     Serial No 2       No on Hand
>>     Serial No 3       No on Hand
>> Model No 4    Product Code 9
>>     Serial No 12     No on Hand
>> Model No 5    Product Code 27
>>     Serial No 6       No on Hand
>>     Serial No 14     No on Hand
>>
>> etc.
>>
>> I need the data in PostgreSQL arrayed thus
>>
>> Model No 1    Product Code 15    Serial No 1       No on Hand
>> Model No 1    Product Code 15    Serial No 2       No on Hand
>> Model No 1    Product Code 15    Serial No 3       No on Hand
>> Model No 4    Product Code 9      Serial No 12     No on Hand
>> Model No 5    Product Code 27    Serial No 6       No on Hand
>> Model No 5    Product Code 27    Serial No 14     No on Hand
>>
>> I can import the data procedurely using plpgsql to match the individual
>> rows to the master for each row (import the raw data into a buffer table
>> in PostgreSQL and then looping through the rows in the buffer table and
>> checking to see when the Model No changes).
>
> Note that if you're doing that you better be putting the rownumber from
> excel into the table... result sets are NOT guaranteed to be in insert
> order!

Good point.
>
>> I'm wondering if there is a more elegant way to do this using straight
>> sql from Excel?
>
> Well, that's really an excel question, not a Postgres question...
>
> If you load the whole spreadsheet into a single table and have a way to
> differentiate between the different rows then you might be able to do
> something with CTE's to relate a serial number to the product code. That
> might be faster than plpgsql.

I have gone the route of loading the whole file into a buffer table in
PostgreSQL and processing it from there.  Since the only way I can relate
each row to the model number is by its relative row position, I have used a
plpgsql loop to "flatten" the data.

>
> You might also be able to do something creative with formulas in excel to
> copy the product code data to the serial # rows. You could then import the
> whole thing and re-normalize it.

I looked at this but decided to just do it in PostgreSQL - much more
powerful :-)

>
> There's probably some stuff you could do with VBA too. If you care about
> performance you don't want to execute SQL statements for each spreadsheet
> row.
> --
> Jim Nasby, Data Architect, Blue Treble Consulting
> Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: "Ungroup" data for import into PostgreSQL

От
"George Weaver"
Дата:
Hi Adrian,

From: "Adrian Klaver" <adrian.klaver@aklaver.com>
Subject: Re: [GENERAL] "Ungroup" data for import into PostgreSQL


> On 01/15/2015 04:56 PM, Jim Nasby wrote:
>> On 1/15/15 9:43 AM, George Weaver wrote:
>>> Hi List,
>>>
>>> I need to import data from a large Excel spreadsheet into a PostgreSQL
>>> table.  I have a program that uses ODBC to connect to Excel and
>>> extract data using SQL queries.  The program then inserts the data
>>> into a PostgreSQL table.
>>>
>>> The challenge with this particular spreadsheet is that it is arrayed
>>> thus:
>>>
>>> Model No 1     Product Code 15
>>>     Serial No 1       No on Hand
>>>     Serial No 2       No on Hand
>>>     Serial No 3       No on Hand
>>> Model No 4    Product Code 9
>>>     Serial No 12     No on Hand
>>> Model No 5    Product Code 27
>>>     Serial No 6       No on Hand
>>>     Serial No 14     No on Hand
>>>
>>> etc.
>>>
>>> I need the data in PostgreSQL arrayed thus
>>>
>>> Model No 1    Product Code 15    Serial No 1       No on Hand
>>> Model No 1    Product Code 15    Serial No 2       No on Hand
>>> Model No 1    Product Code 15    Serial No 3       No on Hand
>>> Model No 4    Product Code 9      Serial No 12     No on Hand
>>> Model No 5    Product Code 27    Serial No 6       No on Hand
>>> Model No 5    Product Code 27    Serial No 14     No on Hand
>>>
>>> I can import the data procedurely using plpgsql to match the
>>> individual rows to the master for each row (import the raw data into a
>>> buffer table in PostgreSQL and then looping through the rows in the
>>> buffer table and checking to see when the Model No changes).
>>
>> Note that if you're doing that you better be putting the rownumber from
>> excel into the table... result sets are NOT guaranteed to be in insert
>> order!
>>
>>> I'm wondering if there is a more elegant way to do this using straight
>>> sql from Excel?
>>
>> Well, that's really an excel question, not a Postgres question...
>>
>> If you load the whole spreadsheet into a single table and have a way to
>> differentiate between the different rows then you might be able to do
>> something with CTE's to relate a serial number to the product code. That
>> might be faster than plpgsql.
>>
>> You might also be able to do something creative with formulas in excel
>> to copy the product code data to the serial # rows. You could then
>> import the whole thing and re-normalize it.
>>
>> There's probably some stuff you could do with VBA too. If you care about
>> performance you don't want to execute SQL statements for each
>> spreadsheet row.
>
> Or if you really want to slice and dice and you use Python, then take a
> look at Pandas:
>
> http://pandas.pydata.org/
>
> In particular the IO functions:
>
> http://pandas.pydata.org/pandas-docs/stable/io.html

I don't use Python but I found the information in these links very
interesting.

Thanks!

> --
> Adrian Klaver
> adrian.klaver@aklaver.com


Re: "Ungroup" data for import into PostgreSQL

От
"George Weaver"
Дата:
Hi Adrian,

From: "Adrian Klaver" <adrian.klaver@aklaver.com>
Subject: Re: [GENERAL] "Ungroup" data for import into PostgreSQL


> On 01/15/2015 04:56 PM, Jim Nasby wrote:
>> On 1/15/15 9:43 AM, George Weaver wrote:
>>> Hi List,
>>>
>>> I need to import data from a large Excel spreadsheet into a PostgreSQL
>>> table.  I have a program that uses ODBC to connect to Excel and
>>> extract data using SQL queries.  The program then inserts the data
>>> into a PostgreSQL table.
>>>
>>> The challenge with this particular spreadsheet is that it is arrayed
>>> thus:
>>>
>>> Model No 1     Product Code 15
>>>     Serial No 1       No on Hand
>>>     Serial No 2       No on Hand
>>>     Serial No 3       No on Hand
>>> Model No 4    Product Code 9
>>>     Serial No 12     No on Hand
>>> Model No 5    Product Code 27
>>>     Serial No 6       No on Hand
>>>     Serial No 14     No on Hand
>>>
>>> etc.
>>>
>>> I need the data in PostgreSQL arrayed thus
>>>
>>> Model No 1    Product Code 15    Serial No 1       No on Hand
>>> Model No 1    Product Code 15    Serial No 2       No on Hand
>>> Model No 1    Product Code 15    Serial No 3       No on Hand
>>> Model No 4    Product Code 9      Serial No 12     No on Hand
>>> Model No 5    Product Code 27    Serial No 6       No on Hand
>>> Model No 5    Product Code 27    Serial No 14     No on Hand
>>>
>>> I can import the data procedurely using plpgsql to match the
>>> individual rows to the master for each row (import the raw data into a
>>> buffer table in PostgreSQL and then looping through the rows in the
>>> buffer table and checking to see when the Model No changes).
>>
>> Note that if you're doing that you better be putting the rownumber from
>> excel into the table... result sets are NOT guaranteed to be in insert
>> order!
>>
>>> I'm wondering if there is a more elegant way to do this using straight
>>> sql from Excel?
>>
>> Well, that's really an excel question, not a Postgres question...
>>
>> If you load the whole spreadsheet into a single table and have a way to
>> differentiate between the different rows then you might be able to do
>> something with CTE's to relate a serial number to the product code. That
>> might be faster than plpgsql.
>>
>> You might also be able to do something creative with formulas in excel
>> to copy the product code data to the serial # rows. You could then
>> import the whole thing and re-normalize it.
>>
>> There's probably some stuff you could do with VBA too. If you care about
>> performance you don't want to execute SQL statements for each
>> spreadsheet row.
>
> Or if you really want to slice and dice and you use Python, then take a
> look at Pandas:
>
> http://pandas.pydata.org/
>
> In particular the IO functions:
>
> http://pandas.pydata.org/pandas-docs/stable/io.html

I don't use Python but I found the information in these links very
interesting.

Thanks!

>
>
> --
> Adrian Klaver
> adrian.klaver@aklaver.com



Re: "Ungroup" data for import into PostgreSQL

От
"George Weaver"
Дата:
Hi Adrian,

From: "Adrian Klaver" <adrian.klaver@aklaver.com>
Subject: Re: [GENERAL] "Ungroup" data for import into PostgreSQL


> On 01/15/2015 04:56 PM, Jim Nasby wrote:
>> On 1/15/15 9:43 AM, George Weaver wrote:
>>> Hi List,
>>>
>>> I need to import data from a large Excel spreadsheet into a PostgreSQL
>>> table.  I have a program that uses ODBC to connect to Excel and
>>> extract data using SQL queries.  The program then inserts the data
>>> into a PostgreSQL table.
>>>
>>> The challenge with this particular spreadsheet is that it is arrayed
>>> thus:
>>>
>>> Model No 1     Product Code 15
>>>     Serial No 1       No on Hand
>>>     Serial No 2       No on Hand
>>>     Serial No 3       No on Hand
>>> Model No 4    Product Code 9
>>>     Serial No 12     No on Hand
>>> Model No 5    Product Code 27
>>>     Serial No 6       No on Hand
>>>     Serial No 14     No on Hand
>>>
>>> etc.
>>>
>>> I need the data in PostgreSQL arrayed thus
>>>
>>> Model No 1    Product Code 15    Serial No 1       No on Hand
>>> Model No 1    Product Code 15    Serial No 2       No on Hand
>>> Model No 1    Product Code 15    Serial No 3       No on Hand
>>> Model No 4    Product Code 9      Serial No 12     No on Hand
>>> Model No 5    Product Code 27    Serial No 6       No on Hand
>>> Model No 5    Product Code 27    Serial No 14     No on Hand
>>>
>>> I can import the data procedurely using plpgsql to match the
>>> individual rows to the master for each row (import the raw data into a
>>> buffer table in PostgreSQL and then looping through the rows in the
>>> buffer table and checking to see when the Model No changes).
>>
>> Note that if you're doing that you better be putting the rownumber from
>> excel into the table... result sets are NOT guaranteed to be in insert
>> order!
>>
>>> I'm wondering if there is a more elegant way to do this using straight
>>> sql from Excel?
>>
>> Well, that's really an excel question, not a Postgres question...
>>
>> If you load the whole spreadsheet into a single table and have a way to
>> differentiate between the different rows then you might be able to do
>> something with CTE's to relate a serial number to the product code. That
>> might be faster than plpgsql.
>>
>> You might also be able to do something creative with formulas in excel
>> to copy the product code data to the serial # rows. You could then
>> import the whole thing and re-normalize it.
>>
>> There's probably some stuff you could do with VBA too. If you care about
>> performance you don't want to execute SQL statements for each
>> spreadsheet row.
>
> Or if you really want to slice and dice and you use Python, then take a
> look at Pandas:
>
> http://pandas.pydata.org/
>
> In particular the IO functions:
>
> http://pandas.pydata.org/pandas-docs/stable/io.html

I don't use Python but I found the information in these links very
interesting.

Thanks!

> --
> Adrian Klaver
> adrian.klaver@aklaver.com


Re: "Ungroup" data for import into PostgreSQL

От
"George Weaver"
Дата:
Sorry for the late reply...life interefered...

>From: Jim Nasby

>On 1/15/15 9:43 AM, George Weaver wrote:
>> Hi List,
>>
>> I need to import data from a large Excel spreadsheet into a PostgreSQL
>> table.  I have a program that uses ODBC to connect to Excel and extract
>> data using SQL queries.  The program then inserts the data into a
>> PostgreSQL table.
>>
>> The challenge with this particular spreadsheet is that it is arrayed
>> thus:
>>
>> Model No 1     Product Code 15
>>     Serial No 1       No on Hand
>>     Serial No 2       No on Hand
>>     Serial No 3       No on Hand
>> Model No 4    Product Code 9
>>     Serial No 12     No on Hand
>> Model No 5    Product Code 27
>>     Serial No 6       No on Hand
>>     Serial No 14     No on Hand
>>
>> etc.
>>
>> I need the data in PostgreSQL arrayed thus
>>
>> Model No 1    Product Code 15    Serial No 1       No on Hand
>> Model No 1    Product Code 15    Serial No 2       No on Hand
>> Model No 1    Product Code 15    Serial No 3       No on Hand
>> Model No 4    Product Code 9      Serial No 12     No on Hand
>> Model No 5    Product Code 27    Serial No 6       No on Hand
>> Model No 5    Product Code 27    Serial No 14     No on Hand
>>
>> I can import the data procedurely using plpgsql to match the individual
>> rows to the master for each row (import the raw data into a buffer table
>> in PostgreSQL and then looping through the rows in the buffer table and
>> checking to see when the Model No changes).

>Note that if you're doing that you better be putting the rownumber from
>excel into the table... result sets are NOT guaranteed to be in insert
>order!

Good point.

>> I'm wondering if there is a more elegant way to do this using straight
>> sql from Excel?

>Well, that's really an excel question, not a Postgres question...

>If you load the whole spreadsheet into a single table and have a way to
>differentiate between the different rows then you might be able to do
>something wih CTE's to relate a serial number to the product code. That
>might be faster than plpgsql.

I have gone the route of loading the whole file into a buffer table in
PostgreSQL and processing it from there.  Since the only way I can relate
each row to the model number is by its relative row position, I have used a
plpgsql loop to "flatten" the data.

>You might also be able to do something creative with formulas in excel to
>copy the product code data to the serial # rows. You could then import the
>whole thing and re-normalize it.

I looked at this but decided to just do it in PostgreSQL - much more
powerful :-)

>There's probably some stuff you could do with VBA too. If you care about
>performance you don't want to execute SQL statements for each spreadsheet
>row.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com