Re: large numbers of inserts out of memory strategy

Поиск
Список
Период
Сортировка
От Peter J. Holzer
Тема Re: large numbers of inserts out of memory strategy
Дата
Msg-id 20171130174005.jc3t5km2f2n4qmk7@hjp.at
обсуждение исходный текст
Ответ на Re: large numbers of inserts out of memory strategy  (Ted Toth <txtoth@gmail.com>)
Ответы Re: large numbers of inserts out of memory strategy  (Ted Toth <txtoth@gmail.com>)
Список pgsql-general
On 2017-11-30 08:43:32 -0600, Ted Toth wrote:
> Date: Thu, 30 Nov 2017 08:43:32 -0600
> From: Ted Toth <txtoth@gmail.com>
> To: "Peter J. Holzer" <hjp-pgsql@hjp.at>
> Cc: pgsql-general@lists.postgresql.org
> Subject: Re: large numbers of inserts out of memory strategy
>
> On Thu, Nov 30, 2017 at 4:22 AM, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
> > On 2017-11-29 08:32:02 -0600, Ted Toth wrote:
> >> Yes I did generate 1 large DO block:
> >>
> >> DO $$
> >> DECLARE thingid bigint; thingrec bigint; thingdataid bigint;
> >> BEGIN
> >> INSERT INTO thing
> >> (ltn,classification,machine,source,thgrec,flags,serial,type) VALUES
> >> ('T007336','THING',0,1025,7336,7,'XXX869977564',1) RETURNING id,thgrec
> >> INTO thingid,thingrec;
> >> INSERT INTO recnum_thing (recnum,thing_id) VALUES (thingrec,thingid);
> >> INSERT INTO thingstatus
> >>
(thing_id,nrpts,rmks_cs,force_type_id,ftn_cs,force_code,arr_cs,mask,toi_state,plot_id,signa_cs,lastchange,des_cs,rig_cs,ownship,correlation,maxrpts,rtn_cs,ctc_cs,group_mask,dep_cs)
> >> VALUES (thingid,121,'{0,0,0,0}',440,0,23,0,0,0,'{23,-1,3803,3805,-1,-1,0,6}',0,1509459164,0,0,0,0,1000,0,0,0,0);
[...]
> >>
> >> <repeated for each thing>
> >>
> >> END $$;
> >>
> >> Should I limit the number of 'thing' inserts within a DO block or
> >> wrapping each 'thing' insert in it's own DO block?
> >
>
> Thanks for the specific suggestions.
>
> > I would suggest getting rid of the do block entirely if that is
> > possible. Just create lots of insert statements. You can get the current
> > value of a sequence with currval('sequence_name').
>
> What is the downside of using a DO block?

As you discovered, the downside is that the whole do block needs
to be parsed before it can be executed. If you generate a huge do block,
that can be an issue. If you create lots of little do blocks, the
overhead is probably not noticable.

But I'd like to pose the question the other way around: What are the
advantages of using a do block? If there aren't any, I wouldn't use it
(keep it simple). One advantage is that you can use variables. But I
don't think you need that. Other possible advantages could be speed or
transactions: I don't know if these exist.

> I'd have to do a nextval on each sequence before I could use currval,
> right?

That happens automatically if you insert the default value into a serial
column. You seem to do that here:

> >> INSERT INTO thing (ltn,classification,machine,source,thgrec,flags,serial,type)
> >> VALUES ('T007336','THING',0,1025,7336,7,'XXX869977564',1)
> >> RETURNING id,thgrec INTO thingid,thingrec;

The column id is probably defined as serial (or explicitely associated
with a sequence). So you can get the value you just inserted with
currval('thing_id_seq') (Check the table definition to get the sequence
name, but unless you explicitely specified the sequence name that's
it).

So you could ditch the "RETURNING ..." clause and replace the next
statement with:
   INSERT INTO recnum_thing (recnum, thing_id) VALUES (7336, currval('thing_id_seq'));

And similar for the other statements where you use thingid and
thingdataid (thingrec is already known as far as i can see).

Note that this doesn't work if you need two values from the same
sequence. So you can't replace
   insert into thing(..) values(...) returning id as parent_id;   insert into thing(..) values(...) returning id as
child_id;  insert into structure(parent, child) values(parent_id, child_id); 

in the same way. But you don't seem to need anything like that.


> One thing that is unclear to me is when commits occur while using psql
> would you know where in the docs I can find information on this
> subject?

By default psql enables autocommit which causes an implicit commit after
every statement. With a do block I'm not sure whether that means after
the do block or after each statement within the do block. I'd just turn
autocommit off and add explicit commits wherever I wanted them.


> > Alternately or in addition, since you are using python, you might want
> > to insert directly into the database from python using psycopg2. For
> > separate insert statements that should have about the same performance.
> > (It is usually much faster to write to a csv file and load that with
> > copy than to insert each row, but you don't do that and it might be
> > difficult in your case).
>
> Yes, I thought about generating csv files but didn't see a way to deal
> with the foreign keys.

Still: Is there a reason why you use a python script to create an sql
script instead of directly issuing the sql queries from your python
script?
       hp

--   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | hjp@hjp.at         | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>

В списке pgsql-general по дате отправления:

Предыдущее
От: Laurenz Albe
Дата:
Сообщение: Re: Searching for big differences between values
Следующее
От: Christopher Browne
Дата:
Сообщение: Re: large numbers of inserts out of memory strategy