Discussion: Performance Problems
Hi, I came across a problem for which I don't really have an explanation. If anyone has some ideas, help would be greatly appreciated. Here we go...

I run PostgreSQL 7.2.1 on a SunFire with 1 GB of memory, and a Perl batch job (DBI, same machine) that does the following:

1. Iterates through all the records in a table (4,500 records in total) by selecting one record at a time.
2. Each record is used to select a couple of records (10-15 on average) from another table (400,000 records in total); the job complements those records with additional information and stores them in a temp table.
3. The temp table is then consulted, and one record is selected and inserted into a results table. The results table grows by 4,500 records a day; the temp table is used mainly to apply the selection logic by sorting records.

The process:
The whole process loops 4,500 times. For every loop
- the temp table is cleared
- 10-15 records are inserted into the temp table
- one record is selected from the temp table and inserted into the results table
In between, some calculations are done in Perl. The total number of selects is ca. 400,000, the total number of table deletes is 4,500, and the total number of inserts is ca. 400,000.

The problem:
Initially I am able to evaluate 5-6 records a second and insert them into the results table, but towards the end of the run the figure drops to under 2. I have optimized the code, I use transaction blocks, and I have tuned the database (it has 700 MB of memory allocated to it). What I don't quite understand is why the system slows down, since the process steps are the same. Could Perl be the reason?

Any ideas?

Thanks
Alex
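[For reference, a minimal sketch in Perl/DBI of the loop described above; all table, column, and function names (driver_table, detail, temp_work, results, compute_extra) are hypothetical, since the post does not name them:

    #!/usr/bin/perl
    # Minimal sketch of the batch job as described; the table, column and
    # function names are hypothetical stand-ins.
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=mydb', '', '',
                           { AutoCommit => 0, RaiseError => 1 });

    # step 1: iterate over all ~4,500 records of the driving table
    my $keys   = $dbh->selectcol_arrayref('SELECT key FROM driver_table');
    my $detail = $dbh->prepare('SELECT a, b FROM detail WHERE key = ?');
    my $stage  = $dbh->prepare('INSERT INTO temp_work (a, b, c) VALUES (?, ?, ?)');

    for my $key (@$keys) {
        $dbh->do('DELETE FROM temp_work');    # the temp table is cleared

        # step 2: fetch the 10-15 matching rows from the 400,000-row table,
        # complement them in Perl, and stage them in the temp table
        $detail->execute($key);
        while (my ($a, $b) = $detail->fetchrow_array) {
            $stage->execute($a, $b, compute_extra($a, $b));
        }

        # step 3: sort the staged rows and keep exactly one
        $dbh->do(q{INSERT INTO results (a, b, c)
                   SELECT a, b, c FROM temp_work
                   ORDER BY c DESC LIMIT 1});
        $dbh->commit;
    }

    sub compute_extra { my ($a, $b) = @_; return $a + $b }   # placeholder logic
]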
Alex Paulusberger <alexp@meta-bit.com> writes:
> Initially I am able to evaluate 5-6 records a second and insert them
> into the results table, but towards the end of the run the figure
> drops to under 2.

Do you have suitable index(es) on the temp table? Is the system actually using it/them for your queries?

regards, tom lane
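[Acting on that question, a hedged sketch of what suitable indexes might look like, assuming the per-loop lookup filters the 400,000-row table on a key column and the final pick sorts the temp table on a computed column (all names hypothetical):

    # run once, before the batch job starts
    $dbh->do('CREATE INDEX detail_key_idx ON detail (key)');    # 400k-row lookups
    $dbh->do('CREATE INDEX temp_work_c_idx ON temp_work (c)');  # per-loop sort
    $dbh->commit;

Whether the planner actually uses them can be checked by running EXPLAIN on the same queries in psql and looking for an Index Scan rather than a Seq Scan.]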
Another thought...

Alex Paulusberger <alexp@meta-bit.com> writes:
> The whole process loops 4,500 times.
> For every loop
> - the temp table is cleared

How exactly are you clearing the temp table? DELETE FROM isn't a good plan because you'll still have dead tuples in there. You could do a DELETE FROM and then VACUUM, but I'd suggest TRUNCATE instead.

regards, tom lane
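[In DBI terms, the two clean-up options Tom mentions might look like the sketch below (temp_work is a hypothetical table name). Neither VACUUM nor, on 7.2, TRUNCATE can run inside a transaction block, hence the AutoCommit juggling:

    # Option 1: DELETE leaves dead tuples behind; VACUUM reclaims them
    $dbh->do('DELETE FROM temp_work');
    $dbh->commit;
    $dbh->{AutoCommit} = 1;          # VACUUM refuses to run in a transaction
    $dbh->do('VACUUM temp_work');
    $dbh->{AutoCommit} = 0;

    # Option 2 (Tom's suggestion): TRUNCATE throws the rows away outright,
    # leaving no dead tuples to scan past on the next loop
    $dbh->{AutoCommit} = 1;
    $dbh->do('TRUNCATE TABLE temp_work');
    $dbh->{AutoCommit} = 0;
]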
Tom,
thanks. I do use DELETE FROM, since TRUNCATE is not an option inside transaction blocks, and with DBI autocommit has to be turned off when connecting to the DB.

But maybe you are right, and the overhead of not being able to truncate the table is bigger than the benefit of using transaction blocks.

Regards
Alex

Tom Lane wrote:
> How exactly are you clearing the temp table? DELETE FROM isn't a good
> plan because you'll still have dead tuples in there. You could do a
> DELETE FROM and then VACUUM, but I'd suggest TRUNCATE instead.
Alex,

Databases are designed for set operations, i.e. processing multiple rows at a time. The process you describe is inherently inefficient: looping through the whole table, one row at a time, with a DBI call for each, results in many unnecessary table scans and lots of overhead for the several thousand DBI calls.

So you would get the greatest performance improvement if you could rewrite your logic in a way that completely eliminates the loop: write one query that creates the whole temp table in one shot, then write a second query to insert into the results table. Depending on the complexity of your calculations, you may need to create your own SQL functions and add them to your database. A sketch of this approach follows this message.

Regards,
Nikolaus Dilger

On Fri, 23 August 2002, Alex Paulusberger wrote:
> The whole process loops 4,500 times.
> ...
> What I don't quite understand is why the system slows down, since the
> process steps are the same.
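[A hedged sketch of the set-based rewrite Nikolaus suggests, assuming the "pick one record" rule is "best matching row per driving record under some ordering". All names are hypothetical, and my_calc() stands in for the per-row Perl calculation rewritten as a SQL function:

    # one set-based statement replaces the 4,500-iteration loop
    $dbh->do(q{
        INSERT INTO results (driver_id, a, b, c)
        SELECT DISTINCT ON (d.id)
               d.id, t.a, t.b, my_calc(t.a, t.b)
        FROM   driver_table d
        JOIN   detail t ON t.key = d.key
        ORDER  BY d.id, my_calc(t.a, t.b) DESC
    });
    $dbh->commit;

PostgreSQL's DISTINCT ON keeps the first row per d.id under the given ordering, which reproduces the "sort the temp table, keep one row" step without any temp table at all.]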
Alex,

You're able to alter the autocommit value during processing:

    $dbh->{AutoCommit} = 0;                          # transactional work
    $dbh->do( qq{insert into yucky values ('A')} );
    $dbh->commit;

    $dbh->{AutoCommit} = 1;                          # autocommit on, so that
    $dbh->do( qq{truncate table yucky} );            # TRUNCATE runs outside a
                                                     # transaction block
    $dbh->{AutoCommit} = 0;                          # back to transactions
    $dbh->do( qq{insert into yucky values ('A')} );
    $dbh->commit;

Do you need the use of a temp database table? If so, have you looked at using a memory-resident table (DBD::AnyData)?

Another thing I've found that helps is to cache reference or lookup data; Memoize is an easy way to cache (see the sketch after this message). This only works for data that doesn't change between queries: select x from y where z = 1 always returns 'F' ...

Tom

On Fri, Aug 23, 2002 at 11:21:22PM +0900, Alex Paulusberger wrote:
> thanks. I do use DELETE FROM, since TRUNCATE is not an option inside
> transaction blocks, and with DBI autocommit has to be turned off when
> connecting to the DB.
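[A minimal sketch of the Memoize idea, assuming a hypothetical lookup function and query; once memoized, repeated calls with the same argument are answered from an in-memory cache instead of hitting the database:

    use strict;
    use warnings;
    use DBI;
    use Memoize;

    my $dbh = DBI->connect('dbi:Pg:dbname=mydb', '', '', { RaiseError => 1 });

    sub lookup_code {                 # hypothetical lookup whose answer is
        my ($z) = @_;                 # stable for a given argument
        my ($x) = $dbh->selectrow_array(
            'SELECT x FROM y WHERE z = ?', undef, $z);
        return $x;
    }
    memoize('lookup_code');          # wrap the sub in a transparent cache

    my $code = lookup_code(1);       # first call queries the database...
    $code    = lookup_code(1);       # ...repeat calls are served from memory
]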