Discussion: procedure using CURSOR to insert is extremely slow


procedure using CURSOR to insert is extremely slow

From: Szalontai Zoltán
Date:

Hi,

 

We have a Class db.t2.medium database on AWS.

We use a procedure to transfer data records from the Source to the Target Schema.

Transfers are identified by the log_id field in the target table.

 

The procedure is (a minimal sketch of the pattern appears below):

1. All records with the current log_id value are deleted from the Target table.

2. A complicated SELECT (joining numerous tables) is created on the Source system.

3. A cursor is defined based on this SELECT.

4. We go through the cursor and insert new records into the Target table with this log_id.

 

(Actually we have about 100 tables in the Target schema, and the database backup file is about 1 GByte. But we do the same for all the Target tables.)
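
A minimal sketch of this pattern (the procedure, table, and column names here are invented for illustration; the real code is more complex):

   CREATE OR REPLACE PROCEDURE transfer_one_table(p_log_id bigint)
   LANGUAGE plpgsql AS $$
   DECLARE
       rec record;
   BEGIN
       -- step 1: remove rows left over from any previous run with this log_id
       DELETE FROM target_table WHERE log_id = p_log_id;

       -- steps 2-4: loop over the joined source rows, inserting one by one
       FOR rec IN
           SELECT s1.id, s1.val
           FROM source_t1 s1
           JOIN source_t2 s2 ON s2.id = s1.t2_id
       LOOP
           INSERT INTO target_table (log_id, id, val)
           VALUES (p_log_id, rec.id, rec.val);
       END LOOP;
   END;
   $$;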

 

Our procedure is extremely slow on the first run: 3 days for the 100 tables. The second and all subsequent runs are fast enough (15 minutes).

The only difference between the first run and all the others is that in the first run there are no records in the Target schema with this log_id.

 

It seems that in the first step the DELETE operation frees up some “space”, and the INSERT operation in step 4 can reuse it. But if no records are deleted in the first step, the procedure is extremely slow.

 

To speed up the first run we found the following workaround:

We inserted dummy records into the Target tables with the proper log_id, and the first run indeed became fast again.

 

Is there any “normal” way to speed up this procedure?

In the production environment there will be only “first runs”; the same log_id will never be used again.

 

 

Thanks,

Zoltán

 

 

Re: procedure using CURSOR to insert is extremely slow

From: Hervé Schweitzer (HER)
Date:
If you do a DELETE in the first step without any statistics, your query will do a full scan of the table, which will be slower.

Did you check the different execution plans?
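
For example, to look at the plan of the DELETE step (the statement and names are illustrative only; note that EXPLAIN ANALYZE actually executes the DELETE, so wrap it in a transaction and roll back if you want to keep the rows):

   BEGIN;
   EXPLAIN (ANALYZE, BUFFERS)
   DELETE FROM target_table WHERE log_id = 42;
   ROLLBACK;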


From: Szalontai Zoltán <szalontai.zoltan@t-online.hu>
Sent: Thursday, April 8, 2021 01:24 PM
To: pgsql-performance@lists.postgresql.org <pgsql-performance@lists.postgresql.org>
Subject: procedure using CURSOR to insert is extremely slow
 

[quoted message trimmed]

RE: procedure using CURSOR to insert is extremely slow

From: Szalontai Zoltán
Date:

How do we check the execution plans?

We are in the loop of the cursor, and we do the INSERT operations inside it.
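
One way to capture plans from inside a procedure is the standard auto_explain contrib module. A sketch (the procedure name is invented, and on managed hosting such as RDS the module may have to be enabled through the parameter group rather than LOAD):

   LOAD 'auto_explain';
   SET auto_explain.log_min_duration = 0;        -- log every statement's plan
   SET auto_explain.log_analyze = on;            -- include actual timings/row counts
   SET auto_explain.log_nested_statements = on;  -- statements inside PL/pgSQL too
   -- now run the procedure; the plans appear in the server log
   CALL transfer_one_table(42);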

 

From: Hervé Schweitzer (HER) <herve.schweitzer@dbi-services.com>
Sent: Thursday, April 8, 2021 1:40 PM
To: Szalontai Zoltán <szalontai.zoltan@t-online.hu>; pgsql-performance@lists.postgresql.org
Subject: Re: procedure using CURSOR to insert is extremely slow

 

[quoted message trimmed]

Re: procedure using CURSOR to insert is extremely slow

From: Milos Babic
Date:
Hi Zoltan,

Is there any particular reason why you don't do a bulk insert, as in:
   insert into target_table
   select ... from source_table(s) (with joins etc)
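
A slightly fleshed-out sketch of this idea (names are invented for illustration), carrying the log_id along in a single set-based statement:

   INSERT INTO target_table (log_id, id, val)
   SELECT 42, s1.id, s1.val
   FROM source_t1 s1
   JOIN source_t2 s2 ON s2.id = s1.t2_id;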

Regards,
Milos



On Thu, Apr 8, 2021 at 1:24 PM Szalontai Zoltán <szalontai.zoltan@t-online.hu> wrote:

[quoted message trimmed]

RE: procedure using CURSOR to insert is extremely slow

From: Szalontai Zoltán
Date:

Hi Milos,

 

Inside the loops we frequently use if/else branches for value transformations.

We could not solve it without using a cursor.

 

Regards,

Zoltán

 

From: Milos Babic <milos.babic@gmail.com>
Sent: Thursday, April 8, 2021 2:31 PM
To: Szalontai Zoltán <szalontai.zoltan@t-online.hu>
Cc: Pgsql Performance <pgsql-performance@lists.postgresql.org>
Subject: Re: procedure using CURSOR to insert is extremely slow

 

[quoted message trimmed]

RE: procedure using CURSOR to insert is extremely slow

From: "Mike Sofen"
Date:

Hi Zoltan,

 

I haven't needed to use a cursor in 20 years of sometimes very complex SQL coding.

 

Why? Cursors result in RBAR ("row by agonizing row") operation, which throws away the power of set-based SQL operations. Performance will always suffer, sometimes to extremes. I'm all about the fastest possible performance for a given SQL solution.

 

How? There have been times I've initially said a similar thing: "I don't see how to solve this without a cursor." When I hit that point, I stop, decompose the problem into simpler bits, and soak on it, and always, literally always, a solution appears.

 

It's all in how we envision the solution, especially with Postgres and its amazing ecosystem of SQL functions. We really can do almost anything. Since the code is obviously way too complex to post here, I'd simply encourage you to rethink how you're approaching the solution.

 

Mike

 

From: Szalontai Zoltán <szalontai.zoltan@t-online.hu>
Sent: Thursday, April 08, 2021 6:57 AM
To: 'Milos Babic' <milos.babic@gmail.com>
Cc: 'Pgsql Performance' <pgsql-performance@lists.postgresql.org>
Subject: RE: procedure using CURSOR to insert is extremely slow

 

[quoted message trimmed]

Re: procedure using CURSOR to insert is extremely slow

From: Milos Babic
Date:
Hi Zoltan,

you should try to rethink the logic behind the query.
Numerous if/then/else branches can be transformed into CASE WHEN expressions, or into a bunch of UNIONs, which, I'm 100% certain, will do much better than row-by-row insertion (see the sketch below).
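
For instance (illustrative names), per-row logic such as "IF status = 'A' THEN v := 1; ELSIF status = 'B' THEN v := 2; ELSE v := 0; END IF;" can usually move into the query itself:

   INSERT INTO target_table (log_id, id, val)
   SELECT 42, s.id,
          CASE s.status
              WHEN 'A' THEN 1
              WHEN 'B' THEN 2
              ELSE 0
          END
   FROM source_t1 s;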

However, this is a general note.
It still doesn't explain why inserting is faster after the deletions (?!!)
Is there any chance the set you are inserting in the second run is smaller (e.g. only a fraction of the original one)?

If possible, you can send over a fragment of the code, and we can look into it.

regards,
Milos






On Thu, Apr 8, 2021 at 3:56 PM Szalontai Zoltán <szalontai.zoltan@t-online.hu> wrote:

[quoted message trimmed]

RE: procedure using CURSOR to insert is extremely slow

From: Szalontai Zoltán
Date:

Hi Milos,

 

I will discuss this kind of rethinking with the team.

 

Perhaps I can make a copy of our database on AWS available for you, so you can check it.

 

Thanks,

Zoltán

 

From: Milos Babic <milos.babic@gmail.com>
Sent: Thursday, April 8, 2021 8:22 PM
To: Szalontai Zoltán <szalontai.zoltan@t-online.hu>
Cc: Pgsql Performance <pgsql-performance@lists.postgresql.org>
Subject: Re: procedure using CURSOR to insert is extremely slow

 

[quoted message trimmed]