Discussion: Performance of INSERT into temporary tables using psqlODBC driver


Performance of INSERT into temporary tables using psqlODBC driver

From: padusuma
Date:
I am working on adding support for PostgreSQL database for our application.
In a lot of our use-cases, data is inserted into temporary tables using
INSERT INTO statements with bind parameters, and subsequently queries are
run by joining to these temp tables. Following is some of the data for these
INSERT statements:

Table definition: CREATE TEMPORARY TABLE Table1( auid varchar(15) ) ON
COMMIT DELETE ROWS;

SQL statement: INSERT INTO Table1 (auid) VALUES (:1);

Time taken to insert 24428 rows: 10.077 sec
Time taken to insert 32512 rows: 16.026 sec
Time taken to insert 32512 rows: 15.821 sec
Time taken to insert  6107 rows: 1.514 sec
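
The insert path is a prepared statement executed once per row through the
ODBC API, roughly along the lines of this simplified sketch (illustrative,
not our actual code; error handling omitted; the :1 placeholder becomes a
'?' marker at the ODBC level):

#include <string.h>
#include <sql.h>
#include <sqlext.h>

void insert_rows(SQLHDBC hdbc, const char **rows, int nrows)
{
    SQLHSTMT hstmt;
    char value[16];                   /* varchar(15) plus terminator */
    SQLLEN ind = SQL_NTS;

    SQLAllocHandle(SQL_HANDLE_STMT, hdbc, &hstmt);
    SQLPrepare(hstmt,
               (SQLCHAR *) "INSERT INTO Table1 (auid) VALUES (?)", SQL_NTS);
    SQLBindParameter(hstmt, 1, SQL_PARAM_INPUT, SQL_C_CHAR, SQL_VARCHAR,
                     15, 0, value, sizeof(value), &ind);

    for (int i = 0; i < nrows; i++) {
        strncpy(value, rows[i], sizeof(value) - 1);
        value[sizeof(value) - 1] = '\0';
        SQLExecute(hstmt);            /* one server round trip per row */
    }
    SQLFreeHandle(SQL_HANDLE_STMT, hstmt);
}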

I am looking for suggestions to improve the performance of these INSERT
statements into temporary tables. Database is located on a Linux VM and the
version is "PostgreSQL 10.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC)
4.4.7 20120313 (Red Hat 4.4.7-18), 64-bit". The application is running on a
Windows platform and connecting to the database using psqlODBC driver
version 10.03.

Please let me know if any additional information is needed.





Re: Performance of INSERT into temporary tables using psqlODBC driver

From: Tim Cross
Date:
padusuma <venkata.adusumalli@gmail.com> writes:

> I am working on adding support for PostgreSQL database for our application.
> In a lot of our use-cases, data is inserted into temporary tables using
> INSERT INTO statements with bind parameters, and subsequently queries are
> run by joining to these temp tables. Following is some of the data for these
> INSERT statements:
>
> Table definition: CREATE TEMPORARY TABLE Table1( auid varchar(15) ) ON
> COMMIT DELETE ROWS;
>
> SQL statement: INSERT INTO Table1 (auid) VALUES (:1);
>
> Time taken to insert 24428 rows: 10.077 sec
> Time taken to insert 32512 rows: 16.026 sec
> Time taken to insert 32512 rows: 15.821 sec
> Time taken to insert  6107 rows: 1.514 sec
>
> I am looking for suggestions to improve the performance of these INSERT
> statements into temporary tables. Database is located on a Linux VM and the
> version is "PostgreSQL 10.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC)
> 4.4.7 20120313 (Red Hat 4.4.7-18), 64-bit". The application is running on a
> Windows platform and connecting to the database using psqlODBC driver
> version 10.03.
>

We are inserting large numbers (millions) of rows into a postgres
database from a Javascript application and found using the COPY command
was much, much faster than doing regular inserts (even with multi-insert
commit). If you can do this using the driver you are using, that will
give you the largest performance boost. 
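
We drive this from Javascript, but from C the usual route is libpq's COPY
support. A rough sketch (illustrative names, minimal error handling; note
that a temporary table is only visible to the session that created it, so
the COPY must run on the same connection that later joins against the
table):

#include <stdio.h>
#include <libpq-fe.h>

int copy_rows(PGconn *conn, const char **rows, int nrows)
{
    PGresult *res = PQexec(conn, "COPY Table1 (auid) FROM STDIN");
    if (PQresultStatus(res) != PGRES_COPY_IN) {
        fprintf(stderr, "COPY failed: %s", PQerrorMessage(conn));
        PQclear(res);
        return -1;
    }
    PQclear(res);

    for (int i = 0; i < nrows; i++) {
        /* One line of COPY text format per row; real code must escape
         * backslash, tab and newline characters in the data. */
        char line[64];
        int len = snprintf(line, sizeof(line), "%s\n", rows[i]);
        if (PQputCopyData(conn, line, len) != 1)
            return -1;
    }
    if (PQputCopyEnd(conn, NULL) != 1)
        return -1;

    res = PQgetResult(conn);          /* final status of the COPY */
    int ok = (PQresultStatus(res) == PGRES_COMMAND_OK) ? 0 : -1;
    PQclear(res);
    return ok;
}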


-- 
Tim Cross


Re: Performance of INSERT into temporary tables using psqlODBC driver

From: padusuma
Date:
>We are inserting large numbers (millions) of rows into a postgres
>database from a Javascript application and found using the COPY command
>was much, much faster than doing regular inserts (even with multi-insert
>commit). If you can do this using the driver you are using, that will
>give you the largest performance boost. 

The data to be inserted into temporary tables is obtained from one or more
queries run earlier, and the data is available as a vector of strings. If I
need to use the COPY FROM command, then the application would need to create
a file with the data to be inserted, and the file needs to be readable by
the user running the database server process, which may not always be
possible unless the application is running on the same host. I think this
approach may not be feasible for our application.

I have increased the value for the temp_buffers server parameter from the
default 8 MB to 128 MB. However, this change did not affect the INSERT time
for temporary tables.






Re: Performance of INSERT into temporary tables using psqlODBC driver

From: Sergei Kornilov
Date:
Hello

> The data to be inserted into temporary tables is obtained from one or more
> queries run earlier and the data is available as a vector of strings.
Can you not use an "insert into temp_table select /*anything you wish*/" statement?
Or even INSERT ... SELECT ... RETURNING, if you need to get data back into the application?

> If I need to use COPY FROM command, then the application would need to create a
> file with the data to be inserted
Can you not use a "COPY FROM STDIN" statement? The client streams the data over its connection, so no server-side file is needed.

regards, Sergei


Re: Performance of INSERT into temporary tables using psqlODBC driver

From: padusuma
Date:
Hello Sergei,
>> The data to be inserted into temporary tables is obtained from one or
>> more queries run earlier, and the data is available as a vector of strings.
> Can you not use an "insert into temp_table select /*anything you wish*/"
> statement?
> Or even INSERT ... SELECT ... RETURNING, if you need to get data back into
> the application?
Unfortunately, the existing functionality in our application is such that
the data returned from one or more SELECT queries is processed and filtered
by server business logic, and the filtered data is then inserted into the
temporary tables. This is the reason I could not use INSERT INTO ... SELECT
or INSERT ... SELECT ... RETURNING statements.

>> If I need to use COPY FROM command, then the application would need to
>> create a file with the data to be inserted
> Can you not use a "COPY FROM STDIN" statement?
Thank you for suggesting the usage of "COPY FROM STDIN". I am not sure how
to pass the values to be inserted as input for a "COPY FROM STDIN" statement
from my psqlODBC-based application. Can someone point me to an example or
suggest how to pass data from a client application to a "COPY FROM STDIN"
statement?
Thanks.





Re: Performance of INSERT into temporary tables using psqlODBC driver

From: Tim Cross
Date:
padusuma <venkata.adusumalli@gmail.com> writes:

>>We are inserting large numbers (millions) of rows into a postgres
>>database from a Javascript application and found using the COPY command
>>was much, much faster than doing regular inserts (even with multi-insert
>>commit). If you can do this using the driver you are using, that will
>>give you the largest performance boost.
>
> The data to be inserted into temporary tables is obtained from one or more
> queries run earlier and the data is available as a vector of strings. If I
> need to use COPY FROM command, then the application would need to create a
> file with the data to be inserted and the file needs to be readable by the
> user running the database server process, which may not always be possible
> unless the application is running on the same host. I think this approach
> may not be feasible for our application.
>

OK, that does make a difference. If your data is already in the
database, COPY is not going to help you much.

> I have increased the value for the temp_buffers server parameter from the
> default 8 MB to 128 MB. However, this change did not affect the INSERT time
> for temporary tables.

It isn't clear why you create vectors of strings rather than just select
into or something similar.

There are no 'quick fixes' which can be applied without real analysis of
the system. However, based on the limited information available, you may
want to consider -

- Increase work_mem to reduce use of temp files. It needs to be 2 to 3
  times the largest temp file (but use common sense)

- Tweak WAL checkpoint parameters to prevent checkpoints occurring too
  frequently. Note that there is a trade-off here between the frequency
  of checkpoints and recovery time after a crash. Fewer WAL checkpoints
  will usually improve performance, but recovery takes longer.

- Verify that your inserts into temporary tables are the bottleneck and
  not the selects from existing data (check explain plans and adjust
  indexes accordingly).
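
- If you have to stay with plain INSERTs through ODBC, test
  parameter-array binding, so one execute submits a whole batch of rows.
  A rough sketch follows (whether psqlODBC turns this into fewer network
  round trips depends on the driver version, so measure it):

#include <sql.h>
#include <sqlext.h>

void insert_batch(SQLHDBC hdbc, char values[][16], SQLLEN *ind, SQLULEN nrows)
{
    SQLHSTMT hstmt;
    SQLULEN processed = 0;

    SQLAllocHandle(SQL_HANDLE_STMT, hdbc, &hstmt);

    /* Declare the bound buffers as arrays holding nrows parameter sets. */
    SQLSetStmtAttr(hstmt, SQL_ATTR_PARAMSET_SIZE, (SQLPOINTER) nrows, 0);
    SQLSetStmtAttr(hstmt, SQL_ATTR_PARAMS_PROCESSED_PTR, &processed, 0);

    /* ind[i] = SQL_NTS for each row; 16 is the size of one array element. */
    SQLBindParameter(hstmt, 1, SQL_PARAM_INPUT, SQL_C_CHAR, SQL_VARCHAR,
                     15, 0, values, 16, ind);

    /* One execute covers the whole array instead of one call per row. */
    SQLExecDirect(hstmt,
                  (SQLCHAR *) "INSERT INTO Table1 (auid) VALUES (?)", SQL_NTS);
    SQLFreeHandle(SQL_HANDLE_STMT, hstmt);
}

  Committing once per batch rather than per row matters here as well.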

How effectively you can improve insert times will depend on the memory
and CPU profile of the system. More memory and less use of temp files
makes for a faster system, so spend a bit of time to make sure your
system is configured to squeeze as much out of that RAM as you can!

--
Tim Cross


Re: Performance of INSERT into temporary tables using psqlODBC driver

From: padusuma
Date:
Hello Tim,

>> I have increased the value for the temp_buffers server parameter from the
>> default 8 MB to 128 MB. However, this change did not affect the INSERT time
>> for temporary tables.

>It isn't clear why you create vectors of strings rather than just select 
>into or something similar. 

>There are no 'quick fixes' which can be applied without real analysis of 
>the system. However, based on the limited information available, you may 
>want to consider - 

>- Increase work_mem to reduce use of temp files. It needs to be 2 to 3
>  times the largest temp file (but use common sense)

I have already increased the work_mem and maintenance_work_mem to 256MB. I
will check on the temp file sizes and adjust the work_mem parameter as you
suggested.

>- Tweak WAL checkpoint parameters to prevent checkpoints occurring too
>  frequently. Note that there is a trade-off here between the frequency
>  of checkpoints and recovery time after a crash. Fewer WAL checkpoints
>  will usually improve performance, but recovery takes longer.

>- Verify that your inserts into temporary tables are the bottleneck and
>  not the selects from existing data (check explain plans and adjust
>  indexes accordingly).

In a few use-cases, I see that multiple inserts took 150 seconds out of a
total database processing time of 175 seconds, and hence the focus is on
these insert statements. I have run an ANALYZE statement after the data is
inserted into the temporary tables, before they are used in joins in
subsequent queries. This reduced the subsequent query processing times due
to the updated statistics. I will look into adding indexes for these
temporary tables as well.

>How effectively you can improve insert times will depend on the memory
>and CPU profile of the system. More memory and less use of temp files
>makes for a faster system, so spend a bit of time to make sure your
>system is configured to squeeze as much out of that RAM as you can!

Thank you for the suggestions. I will try these out.





Re: Performance of INSERT into temporary tables using psqlODBC driver

From: padusuma
Date:
Hello Tim,

I have tried the suggestions provided to the best of my knowledge, but I did
not see any improvement in the INSERT performance for temporary tables. The
Linux host on which the PostgreSQL database is installed has 32 GB RAM.
Following are the current settings in my postgresql.conf file:
shared_buffers = 8GB
temp_buffers = 256MB
work_mem = 256MB
maintenance_work_mem = 256MB
wal_buffers = 256MB

checkpoint_timeout = 30min
checkpoint_completion_target = 0.75
max_wal_size = 1GB

effective_cache_size = 16GB

>>- Increase work_mem to reduce use of temp files. It needs to be 2 to 3
>>  times the largest temp file (but use common sense)

>I have already increased the work_mem and maintenance_work_mem to 256MB. I 
>will check on the temp file sizes and adjust the work_mem parameter as you 
>suggested. 

>- Tweak WAL checkpoint parameters to prevent checkpoints occurring too
>  frequently. Note that there is a trade-off here between the frequency
>  of checkpoints and recovery time after a crash. Fewer WAL checkpoints
>  will usually improve performance, but recovery takes longer.

>How effectively you can improve insert times will depend on the memory
>and CPU profile of the system. More memory and less use of temp files
>makes for a faster system, so spend a bit of time to make sure your
>system is configured to squeeze as much out of that RAM as you can!

Please let me know if there are any other suggestions that I can try.





Re: Performance of INSERT into temporary tables using psqlODBC driver

From: Tim Cross
Date:
padusuma <venkata.adusumalli@gmail.com> writes:

> Hello Tim,
>
> I have tried the suggestions provided to the best of my knowledge, but I did
> not see any improvement in the INSERT performance for temporary tables. The
> Linux host on which the PostgreSQL database is installed has 32 GB RAM.
> Following are the current settings in my postgresql.conf file:
> shared_buffers = 8GB
> temp_buffers = 256MB
> work_mem = 256MB
> maintenance_work_mem = 256MB
> wal_buffers = 256MB
>
> checkpoint_timeout = 30min
> checkpoint_completion_target = 0.75
> max_wal_size = 1GB
>
> effective_cache_size = 16GB
>
>>>- Increase work_mem to reduce use of temp files. It needs to be 2 to 3
>>>  times the largest temp file (but use common sense)
>
>>I have already increased the work_mem and maintenance_work_mem to 256MB. I
>>will check on the temp file sizes and adjust the work_mem parameter as you
>>suggested.
>
>>- Tweak WAL checkpoint parameters to prevent checkpoints occurring too
>>  frequently. Note that there is a trade-off here between the frequency
>>  of checkpoints and recovery time after a crash. Fewer WAL checkpoints
>>  will usually improve performance, but recovery takes longer.
>
>>How effectively you can improve insert times will depend on the memory
>>and CPU profile of the system. More memory and less use of temp files
>>makes for a faster system, so spend a bit of time to make sure your
>>system is configured to squeeze as much out of that RAM as you can!
>
> Please let me know if there are any other suggestions that I can try.

How are you gathering metrics to determine if performance has improved
or not?

Have you seen any change in your explain (analyze, buffers) plans?

Make sure your table statistics are all up-to-date before performing
each benchmark test. I often turn off autovacuum when doing this sort of
testing so that I know exactly when tables get vacuumed and statistics
get updated (just ensure you remember to turn it back on when you're
finished!).

Are the wal checkpoints being triggered every 30 mins or more
frequently?

Are you still seeing the system use lots of temp files?

Do you have any indexes on the tables you're inserting into?

As mentioned previously, there are no simple/quick fixes here - you
cannot just change a setting and see performance improve. It will be
necessary to do a lot of experimentation, gathering statistics and
investigating how postgres is using buffers, disk IO etc. All of these
parameters interact with each other, so it is critical you have good
metrics to see exactly what your changes do. It is complex and time
consuming. I highly recommend PostgreSQL High Performance (Ahmed & Smith)
and Mastering PostgreSQL (Schönig) for valuable background/tips - there
really is just far too much to communicate effectively via email.

Tim


--
Tim Cross


Re: Performance of INSERT into temporary tables using psqlODBC driver

From: padusuma
Date:
Hello Tim,

>How are you gathering metrics to determine if performance has improved 
>or not? 
I am measuring the response times with a timer around the execution of SQL
statements through the psqlODBC driver. The response times for INSERT INTO
temp-table statements have not changed with the parameters I modified.

>Have you seen any change in your explain (analyze, buffers) plans? 

There was no change in the EXPLAIN output for the INSERT INTO statement,
but the performance of the queries improved by about 5%.

>Make sure your table statistics are all up-to-date before performing 
>each benchmark test. I often turn off autovacuum when doing this sort of 
>testing so that I know exactly when tables get vacuumed and statistics 
>get updated (just ensure you remember to turn it back on when you're 
>finished!). 
I ran the VACUUM ANALYZE statement manually before starting the tests. Even
though autovacuum was turned on, it did not get invoked due to the
thresholds, and because the bulk of the inserts are into temporary tables.

>Are the wal checkpoints being triggered every 30 mins or more 
>frequently? 
The wal checkpoints are triggered every 30 mins.

>Are you still seeing the system use lots of temp files? 
I do not see any files in pgsql_tmp folders in the tablespaces where the
tables are created. Also, I do not see pgsql_tmp folder in base and global
folders. Am I checking for these files in the correct location? Also, I ran
the following query (taken from another forum) to check the temporary files
generated for all the databases:
SELECT temp_files AS "Temporary files", temp_bytes AS "Size of temporary
files" FROM pg_stat_database db;

The result is 0 for both columns.

>Do you have any indexes on the tables you're inserting into? 
I have not created indexes on these temporary tables, but I programmatically
execute an ANALYZE <temp-table> statement after the data is inserted into
these temp tables, to generate/update statistics for them. Indexes
do exist for all regular tables.

>As mentioned previously, there are no simple/quick fixes here - you 
>cannot just change a setting and see performance improve. It will be 
>necessary to do a lot of experimentation, gathering statistics and 
>investigating how postgres is using buffers, disk IO etc. All of these 
>parameters interact with each other, so it is critical you have good 
>metrics to see exactly what your changes do. It is complex and time 
>consuming. I highly recommend PostgreSQL High Performance (Ahmed & Smith) 
>and Mastering PostgreSQL (Schönig) for valuable background/tips - there 
>really is just far too much to communicate effectively via email.

Thank you for the suggestions on the books. I will go through these.





Re: Performance of INSERT into temporary tables using psqlODBC driver

From: Tim Cross
Date:
padusuma <venkata.adusumalli@gmail.com> writes:

> Hello Tim,
>
>>How are you gathering metrics to determine if performance has improved
>>or not?
> I am measuring the response times with a timer around the execution of SQL
> statements through the psqlODBC driver. The response times for INSERT INTO
> temp-table statements have not changed with the parameters I modified.
>
>>Have you seen any change in your explain (analyze, buffers) plans?
>
> There was no change in the EXPLAIN output for the INSERT INTO statement,
> but the performance of the queries improved by about 5%.
>
>>Make sure your table statistics are all up-to-date before performing
>>each benchmark test. I often turn off autovacuum when doing this sort of
>>testing so that I know exactly when tables get vacuumed and statistics
>>get updated (just ensure you remember to turn it back on when you're
>>finished!).
> I ran the VACUUM ANALYZE statement manually before starting the tests. Even
> though autovacuum was turned on, it did not get invoked due to the
> thresholds, and because the bulk of the inserts are into temporary tables.
>
>>Are the wal checkpoints being triggered every 30 mins or more
>>frequently?
> The wal checkpoints are triggered every 30 mins.
>
>>Are you still seeing the system use lots of temp files?
> I do not see any files in pgsql_tmp folders in the tablespaces where the
> tables are created. Also, I do not see pgsql_tmp folder in base and global
> folders. Am I checking for these files in the correct location? Also, I ran
> the following query (taken from another forum) to check the temporary files
> generated for all the databases:
> SELECT temp_files AS "Temporary files", temp_bytes AS "Size of temporary
> files" FROM pg_stat_database db;
>
> The result is 0 for both columns.
>
>>Do you have any indexes on the tables you're inserting into?
> I have not created indexes on these temporary tables, but I programmatically
> execute an ANALYZE <temp-table> statement after the data is inserted into
> these temp tables, to generate/update statistics for them. Indexes
> do exist for all regular tables.
>
>>As mentioned previously, there are no simple/quick fixes here - you
>>cannot just change a setting and see performance improve. It will be
>>necessary to do a lot of experimentation, gathering statistics and
>>investigating how postgres is using buffers, disk IO etc. All of these
>>parameters interact with each other, so it is critical you have good
>>metrics to see exactly what your changes do. It is complex and time
>>consuming. I highly recommend PostgreSQL High Performance (Ahmed & Smith)
>>and Mastering PostgreSQL (Schönig) for valuable background/tips - there
>>really is just far too much to communicate effectively via email.
>
> Thank you for the suggestions on the books. I will go through these.

Based on your responses, it sounds like you have done the 'easy' stuff
which often results in improved performance. Now you are going to have
to dig much harder. It might be worth looking more closely at how
buffers/caching is working (the pg_buffercache extension might be useful),
and verifying where the performance bottlenecks are (this can sometimes be
surprising - it may not be where you think it is; don't forget to profile
your client, network/driver throughput, OS-level disk I/O, etc.). This is
where books like PostgreSQL High Performance will be useful.

My only word of caution is that you are likely to now begin looking at
options which can improve throughput but often come at a cost to
stability, data integrity or recovery time. These are things
which can only be assessed on a per case basis and largely depend on
business priorities. It will take time and you will need to make changes
slowly and do a lot of benchmarking.

It is really important to have a clear idea as to what would be
acceptable performance rather than just a vague concept of making things
faster. For example, one application I have inserts 1.3+ billion rows
per day. This represents two 'sets' of data. Our minimum requirement was
the ability to process 1 set, but if possible, 2 sets would be
ideal. Initially, with the original technology being used, it took
between 23 and 26 hours to process 1 set. We were able to tune this to
get it always to be under 24 hours, but there was no way we were going
to get the level of improvement which would allow more than 1 set to be
processed per day - not with the technology and design that was in
place.

A decision was made to re-implement using a different technology and
design. This was where we gained the improvements in performance we
really required. While the technology did play a part, it was really the
re-design which gave us the performance improvement to reach our desired
goal of 2 sets per day. Even 3 sets per day is a possibility now.

We could have spent a lot of time tuning and re-spec'ing hardware etc. to
get to 1 set per day and we would have succeeded, but that would have
been the absolute upper limit. I suspect it would have cost about the
same as the re-implementation, but with a much lower upper limit.

Re-implementation of a solution is often a hard case to sell, but it
might be the only way to get the performance you want. The big positive
to a re-implementation is that you usually get a better solution because
you are implementing with more knowledge and experience about the
problem domain. Design is often cleaner and as a result, easier to
maintain. It usually takes a lot less time than the original
implementation as well and can be the more economical solution compared
to fighting a system which has fundamental design limitations that
restrict performance.

good luck,

Tim
--
Tim Cross


Re: Performance of INSERT into temporary tables using psqlODBC driver

From: padusuma
Date:
Hello Tim,

>Re-implementation of a solution is often a hard case to sell, but it 
>might be the only way to get the performance you want. The big positive 
>to a re-implementation is that you usually get a better solution because 
>you are implementing with more knowledge and experience about the 
>problem domain. Design is often cleaner and as a result, easier to 
>maintain. It usually takes a lot less time than the original 
>implementation as well and can be the more economical solution compared 
>to fighting a system which has fundamental design limitations that 
>restrict performance.

Thank you for the suggestions and advice. I will definitely look into
re-implementation of certain parts of our solution as an option to improve
performance.



