[NOVICE] Normalizing Unnormalized Input

Поиск
Список
Период
Сортировка
От Stephen Froehlich
Тема [NOVICE] Normalizing Unnormalized Input
Дата
Msg-id DM5PR06MB28919D0E5AB84F8D18CD207EE5C50@DM5PR06MB2891.namprd06.prod.outlook.com
обсуждение исходный текст
Ответы Re: [NOVICE] Normalizing Unnormalized Input
Список pgsql-novice

I have been building a PostgreSQL database for about a year and (shame) when I first built it, I built it unnormalized.  I am in the process of normalizing it at the moment (thankfully it isn’t live for anyone but me, so I can freeze it until I get it all normalized).

 

The part of the problem that I haven’t solved conceptually yet is how to normalize the incoming data.  Historically, I have been pulling a csv into R and perform whatever cleaning is necessary and then writing that unnormalized table straight out to PostgreSQL using the cumbersome and slow “caroline::dbWriteTable2” function.

 

I have seen:

 

https://stackoverflow.com/questions/23708278/how-to-normalize-data-efficently-while-inserting-into-sql-table-postgres

 

… but given the batch nature of my process, I was thinking of something a little different than this line-by-line function.

 

Instead, I was thinking about the following algorithm:

  • Have an empty unnormalized table without indexes that is written into in batches by R.
  • Have a function triggered by the update of that table that will:
    • Rename the table to a temp table name
    • Create a fresh blank table
    • “UPSERT” INSERT INTO … ON CONFLICT DO NOTHING each of the foreign key tables to capture any new values
    • Then build indexes on each of the text fields of the unnormalized table and then do a INSERT INTO the main normalized table with a SELECT with all of the proper joins ..
    • Delete the temp table

 

Does this sound like a good method?  How will it behave if I am doing two simultaneous writes to the unnormalized input table?

 

One thing I’m not sure of yet is how in a function to retain a variable name of the new temp table name, but I haven’t Googled that yet.

 

I am running PostgreSQL 9.6 and given that I both benefit from parallelization and also don’t run anything “live”, I’ll likely be upgrading to PostgreSQL 10 in August when it is final. I do share the server with a couple of colleagues, but that’s all the coordination I need.

 

 

Stephen Froehlich
Sr. Strategist, CableLabs®


s.froehlich@cablelabs.com

Tel: +1 (303) 661-3708

 

В списке pgsql-novice по дате отправления:

Предыдущее
От: Mick
Дата:
Сообщение: Re: [NOVICE] Question on installation of postgresql
Следующее
От: "David G. Johnston"
Дата:
Сообщение: Re: [NOVICE] Normalizing Unnormalized Input