Thread: Optimization required for multiple insertions in PostgreSQL


Optimization required for multiple insertions in PostgreSQL

From: siva palanisamy
Date:
I basically have 3 tables: one core table and two others that depend on it. I need to add up to 70000 records to these tables. I do have constraints (primary and foreign keys, indexes, unique constraints, etc.) set on the tables. I can't go for bulk import (using the COPY command) as there is no standard .csv file involved; the mapping is explicitly required, plus a few validations are applied externally in a C program. Each record (up to 70000 of them) is passed from a .pgc file (an ECPG-based C program) to PostgreSQL. It takes little time for the first few records, but performance gets worse and worse for the later ones! The sad result is that it takes days to cover about 20000! What performance measures could I take here? Please guide me.

Re: Optimization required for multiple insertions in PostgreSQL

From: "Kevin Grittner"
Date:
siva palanisamy <psivait@gmail.com> wrote:

> I basically have 3 tables: one core table and two others that
> depend on it. I need to add up to 70000 records to these
> tables. I do have constraints (primary and foreign keys,
> indexes, unique constraints, etc.) set on the tables. I can't
> go for bulk import (using the COPY command) as there is no
> standard .csv file involved; the mapping is explicitly
> required, plus a few validations are applied externally in a C
> program. Each record (up to 70000 of them) is passed from a
> .pgc file (an ECPG-based C program) to PostgreSQL. It takes
> little time for the first few records, but performance gets
> worse and worse for the later ones! The sad result is that it
> takes days to cover about 20000! What performance measures
> could I take here? Please guide me.

There's an awful lot you're not telling us, like what version of
PostgreSQL you're using, what your hardware looks like, how many
rows you're trying to insert per database transaction, what resource
usage looks like on the machine when it's running slow, what the specific
slow queries are and what their execution plans look like, etc.  I
could make a lot of guesses and take a shot in the dark with some
generic advice, but you would be better served by the more specific
advice you will get if you provide more detail.
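
(Just as one generic illustration of why the rows-per-transaction number
matters: committing after every single row is dramatically slower than
grouping many rows into one transaction, roughly as in the sketch below;
the table and column names are made up.)

BEGIN;
INSERT INTO some_table (name, val) VALUES ('a', 1);
INSERT INTO some_table (name, val) VALUES ('b', 2);
-- ... thousands more rows in the same transaction ...
COMMIT;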

Please review this page (and its links) and post again:

http://wiki.postgresql.org/wiki/SlowQueryQuestions

-Kevin

Re: Optimization required for multiple insertions in PostgreSQL

From: "Tomas Vondra"
Date:
On 3 November 2011, 16:52, siva palanisamy wrote:
> I basically have 3 tables: one core table and two others that depend on
> it. I need to add up to 70000 records to these tables. I do have
> constraints (primary and foreign keys, indexes, unique constraints, etc.)
> set on the tables. I can't go for bulk import (using the COPY command) as
> there is no standard .csv file involved; the mapping is explicitly
> required, plus a few validations are applied externally in a C program.
> Each record (up to 70000 of them) is passed from a .pgc file (an
> ECPG-based C program) to PostgreSQL. It takes little time for the first
> few records, but performance gets worse and worse for the later ones! The
> sad result is that it takes days to cover about 20000! What performance
> measures could I take here? Please guide me.

As Kevin already pointed out, this very vague overall description is not
sufficient. We need to know at least the following, for starters:

- version of PostgreSQL
- environment (what OS, what hardware - CPU, RAM, drives)
- basic PostgreSQL config values (shared buffers, checkpoint segments)
- structure of the tables, indexes etc.
- output of vmstat/iostat collected when the inserts are slow

And BTW the fact that you're not using a standard .csv file does not mean
you can't use COPY. You can either transform the file to CSV or create it
on the fly.
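
For example, if the C code that does the validations writes the rows back
out as CSV (or streams them over the connection), a single COPY can then
load them in one pass. A rough sketch, where the table and column names
are only placeholders:

COPY contacts (display_name, first_name, last_name, company_name)
    FROM '/tmp/contacts.csv' WITH CSV;  -- server-side file, needs superuser

-- or, without an intermediate file, stream the rows from the client:
COPY contacts (display_name, first_name, last_name, company_name)
    FROM STDIN WITH CSV;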

Tomas


Re: Optimization required for multiple insertions in PostgreSQL

From: "Kevin Grittner"
Date:
[Please keep the list copied.]

siva palanisamy <psivait@gmail.com> wrote:

> Could you please guide me on how to minimize the time consumption? I
> have PostgreSQL 8.1.4 on a Linux OS.

Well, the first thing to do is to use a supported version of
PostgreSQL.  More recent releases perform better, for starters.

http://wiki.postgresql.org/wiki/PostgreSQL_Release_Support_Policy

Whichever major release you use, you should be up-to-date on bug
fixes, some of which are fixes for bugs which cause performance
problems:

http://www.postgresql.org/support/versioning

> I'm yet to check its RAM and other memory capacity, but I
> guess it has the necessary resources.

Knowing what hardware you have, and what your current PostgreSQL
configuration settings are, would allow us to suggest what you might
reconfigure to tune your database.

> My master table's schema is
>
> CREATE TABLE contacts ( contact_id SERIAL PRIMARY KEY,
> contact_type INTEGER DEFAULT 0, display_name TEXT NOT NULL DEFAULT
> '', first_name TEXT DEFAULT '', last_name TEXT DEFAULT '',
> company_name TEXT DEFAULT '', last_updated TIMESTAMP NOT NULL
> DEFAULT current_timestamp, UNIQUE(display_name) ) WITHOUT OIDS;

Not that this is a performance issue, but you almost certainly want
the semantics provided by TIMESTAMP WITH TIME ZONE for your
last_updated column.  Just specifying TIMESTAMP is probably going to
give you an unpleasant surprise somewhere down the road.
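
In other words, only the type of that one column changes; a minimal
sketch of the same table with that adjustment:

CREATE TABLE contacts (
    contact_id    SERIAL PRIMARY KEY,
    contact_type  INTEGER DEFAULT 0,
    display_name  TEXT NOT NULL DEFAULT '',
    first_name    TEXT DEFAULT '',
    last_name     TEXT DEFAULT '',
    company_name  TEXT DEFAULT '',
    -- stores an unambiguous point in time rather than a zone-less reading
    last_updated  TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT current_timestamp,
    UNIQUE (display_name)
) WITHOUT OIDS;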

> I have an SQL function that is called from a C program, with
> parameters being passed to it. It is replicated for the other 2
> tables as well. In total, I have 3 tables.

Which table is the source of your slowness, and how do you know
that?

> FYI, the database connection is opened before the 1st record and
> closed only after the last record is attempted. Do you think these
> constraints take a lot of time?

The only constraints you've shown are PRIMARY KEY and UNIQUE.  It is
somewhat slower to add rows with those constraints in place than to
blast in data without the constraints and then add the constraints;
but I understand that if the data is not known to be clean and free
of duplicates, that's not possible.  That certainly doesn't account
for the timings you describe.
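
If the data ever were known to be clean, that pattern would look
something like this (the constraint name below is only the default
PostgreSQL generates for UNIQUE(display_name); check the real name with
\d contacts):

-- only safe when the incoming rows are known to be duplicate-free
ALTER TABLE contacts DROP CONSTRAINT contacts_display_name_key;
-- ... bulk load here, with COPY or batched INSERTs ...
ALTER TABLE contacts ADD CONSTRAINT contacts_display_name_key
    UNIQUE (display_name);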

> taking days to complete 20000-odd records is not encouraging!

I've seen PostgreSQL insert more rows than that per second, so it's
not like it is some inherent slowness of PostgreSQL.  There is
something you're doing with it that is that slow.  Getting onto a
modern version of PostgreSQL may help a lot, but most likely there's
something you're not telling us yet that is the thing that really
needs to change.

Just as one off-the-wall example of what *can* happen -- if someone
disabled autovacuum and had a function which did an update to all
rows in a table each time the function was called, they would see
performance like you describe.  How do I know, from what you've told
me, that you're *not* doing that?  Or one of a hundred other things
I could postulate?  (Hint, if you showed us your current PostgreSQL
settings I could probably have ruled this out.)
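
For what it's worth, pasting the output of something along these lines
(or simply SHOW ALL) into your next post would let us rule that sort of
thing out:

-- settings that have been changed from the built-in defaults
SELECT name, setting, source
  FROM pg_settings
 WHERE source <> 'default';

SHOW autovacuum;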

-Kevin