Re: Populating large DB from Perl script

Поиск
Список
Период
Сортировка
От Dawid Kuroczko
Тема Re: Populating large DB from Perl script
Дата
Msg-id 758d5e7f0711021157g7eca84e8y7c682ba83e86cb04@mail.gmail.com
обсуждение исходный текст
Ответ на Populating large DB from Perl script  ("Kynn Jones" <kynnjo@gmail.com>)
Список pgsql-general
On 11/1/07, Kynn Jones <kynnjo@gmail.com> wrote:
> Hi.  This is a recurrent problem that I have not been able to find a
> good solution for.  I have  large database that needs to be built from
> scratch roughly once every month.  I use a Perl script to do this.
>
> The tables are very large, so I avoid as much as possible using
> in-memory data structures, and instead I rely heavily on temporary
> flat files.
>
> The problem is the population of tables that refer to "internal" IDs
> on other tables.  By "internal" I mean IDs that have no meaning
> external to the database; they exist only to enable relational
> referencing.  They are always defined as serial integers.  So the
> script either must create and keep track of them, or it must populate
> the database in stages, letting Pg assign the serial IDs, and query
> the database for these IDs during subsequent stages.

If it is possible, perhaps you could load "raw" data into temporary
table and then create ids using these tables.
For instance:
CREATE TEMP TABLE foo_raw (host text, city text, who text, value int);
INSERT INTO hosts (host) SELECT DISTINCT host FROM foo; -- group by perhaps?
INSERT INTO [...]
INSERT INTO foo SELECT host_id,city_id,who_id,value
   FROM foo_raw
   JOIN hosts USING (host)
   JOIN cities USING (city)
   JOIN who USING (who);
This may or may not work, depending on your setup.

But perhaps a better approach, while needing more work would be:

Your script establishes two DB connections, one for "processing data"
and one for maintaining IDs.  Now whenever you need to get and ID do:
  1) query memcached if found, return it
  2) query database if found return it and insert into memcached
  3) insert into database, and insert into memcached, and perhaps commit it.
Befriend thyself with Cache::* perl modules. :)

   Regards,
      Dawid

В списке pgsql-general по дате отправления:

Предыдущее
От: "Kynn Jones"
Дата:
Сообщение: Re: Populating large DB from Perl script
Следующее
От: andy
Дата:
Сообщение: Re: Populating large DB from Perl script