Re: Oracle to postgres migration via ora2pg (blob data)

From: Gilles Darold
Subject: Re: Oracle to postgres migration via ora2pg (blob data)
Date:
Msg-id: a0094d92-5bf3-45ec-ddd9-81688def0433@darold.net
In reply to: Re: Oracle to postgres migration via ora2pg (blob data) (Amol Tarte <amoltarte@gmail.com>)
List: pgsql-general
On 31/07/2019 at 18:02, Amol Tarte wrote:
FullConvert does this job much faster than ora2pg


With Warm Regards,
Amol Tarte,
Project Lead,
Rajdeep InfoTechno Pvt. Ltd.
Visit us at http://it.rajdeepgroup.com

On Wed 31 Jul, 2019, 5:16 PM Niels Jespersen, <NJN@dst.dk> wrote:

I would look at the source table in Oracle first. It looks a lot like audit data, and perhaps not all of its content is needed in Postgres. If it is, the table and its LOBs may benefit from being reorganised in Oracle:

ALTER TABLE CLIENT_DB_AUDIT_LOG MOVE;

ALTER TABLE CLIENT_DB_AUDIT_LOG MOVE LOB (SYS_LOB0000095961C00008$$);

-- Three more of these.

The syntax is from the back of my head; you may need to look up the details.
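
One caveat to verify: ALTER TABLE ... MOVE LOB expects the LOB column name rather than the segment name, and the mapping between the two can be read from DBA_LOBS:

    -- Map the SYS_LOB... segment names back to their LOB columns.
    SELECT column_name, segment_name
      FROM dba_lobs
     WHERE table_name = 'CLIENT_DB_AUDIT_LOG';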


Niels


From: Daulat Ram <Daulat.Ram@exponential.com>
Sent: 31 July 2019 13:32
To: pgsql-general@lists.postgresql.org; pgsql-performance@lists.postgresql.org
Subject: Oracle to postgres migration via ora2pg (blob data)

Hello team,


We have to migrate a schema from Oracle to Postgres, but one table has the following large LOB segments. This table is taking a long time to export. Which parameters should we set in ora2pg.conf to speed up the data export by ora2pg?

Table:    CLIENT_DB_AUDIT_LOG

SEGMENT TYPE         SEGMENT NAME                       SIZE
LOBSEGMENT           SYS_LOB0000095961C00008$$          80.26
LOBSEGMENT           SYS_LOB0000095961C00007$$          79.96
LOBSEGMENT           SYS_LOB0000094338C00008$$           8.84
LOBSEGMENT           SYS_LOB0000084338C00007$$           8.71
LOBSEGMENT           SYS_LOB0000085961C00009$$           5.32

VM details:

RAM    8 GB
vCPUs  2
Disk   40 GB

Thanks,


Hi,

Before using impressive commercial products, you can try some additional configuration with Ora2Pg.

The only solution to improve data migration performance is to use parallelism. The problem with BLOBs is that they have to be converted into hex to be inserted into a bytea. The internal function in Ora2Pg responsible for this job is _escape_lob(). The other problem is that Oracle is very slow to send BLOBs to Ora2Pg; I don't know whether commercial products do better on this point, but I challenged Ora2Pg against Kettle some years ago without much difference. So the first thing to do is to set the following in your ora2pg.conf:

NO_LOB_LOCATOR    1
LONGREADLEN        100000000

This will force Oracle to send the full content of the BLOB in a single pass, otherwise it will use small chunks. The only drawback is that you have to set LONGREADLEN to the highest BLOB size in your table to avoid a LONGTRUNC error.
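
To size LONGREADLEN, you can first ask Oracle for the largest LOB in the table. A minimal sketch, assuming a BLOB column named AUDIT_DATA (hypothetical; the real column names behind the SYS_LOB... segments are listed in DBA_LOBS):

    -- Largest BLOB in bytes; LONGREADLEN must be at least this value.
    SELECT MAX(DBMS_LOB.GETLENGTH(audit_data)) AS max_blob_bytes
      FROM client_db_audit_log;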

That also means that for each row returned, DBD::Oracle (the Perl driver for Oracle) will allocate at least LONGREADLEN bytes in memory, and we want to extract as many rows as possible at a time. For example, with LONGREADLEN at 100000000 (100MB), your 8GB of memory caps you at roughly 80 rows held in memory at once, so it will limit the number of rows that can be exported at the same time.

The other point is that Oracle is slow, so you have to parallelize the data export. Use -J 4 on the command line to create 4 simultaneous processes for the data export. Parallelization on the Oracle side is only possible if there is a numeric column that can be used to split the data, using modulo 4 in this case. This is a basic implementation, but it is enough in most cases.
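
Conceptually, each process fetches its own slice of the table; the queries have roughly this shape (a sketch, not the exact SQL Ora2Pg generates), assuming a numeric column named ID:

    -- Each of the 4 processes reads one residue class of the split column.
    SELECT * FROM client_db_audit_log WHERE MOD(id, 4) = 0;
    -- ...and MOD(id, 4) = 1, 2, 3 for the other three processes.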

Converting BLOB to bytea consumes a lot of CPU cycles too, so it is good practice to parallelize this work as well. Use -j 2 or -j 3 for it. The number of parallel processes should be tested, because there is a limit beyond which you will not gain anything.
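
To see where the cycles go: every byte of every BLOB is rewritten as two hex characters for the bytea input format, which doubles the volume on top of the escaping work itself. A tiny illustration of the target format (the value is made up):

    -- The 5 bytes of 'Hello' as a bytea hex literal: '\x' plus 2 hex digits per byte.
    SELECT '\x48656c6c6f'::bytea;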

If you have, let's say, 32GB of memory and 12 CPUs, you could try a command like:

    ora2pg -c ora2pg.conf -J 4 -j 3 -t CLIENT_DB_AUDIT_LOG -L 500

If you have fewer resources, don't forget that -J and -j must be multiplied to get the number of processes that Ora2Pg will run in parallel. The -L option (DATA_LIMIT) is used to reduce the number of rows extracted at a time. Here, with a value of 500, it will process 50 (DATA_LIMIT/10) rows with BLOBs at a time. If the table does not have any BLOB, it will use 500 rows at a time. For most tables this parameter should be set to 10000 up to 250000, and if you have lots of memory the value can be higher. If you think the BLOB batch size is too low, you can set BLOB_LIMIT in ora2pg.conf to a higher value.
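
For example, starting values to tune rather than a recommendation:

    # ora2pg.conf -- example values, adjust to your memory budget
    DATA_LIMIT    10000   # rows per batch for tables without BLOBs
    BLOB_LIMIT    500     # rows per batch for tables with BLOBs (overrides DATA_LIMIT/10)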

However, Ora2Pg shows you the data migration speed, so you can adjust all these parameters and see whether you gain performance. If you want to know exactly at which speed Oracle is able to send the data, add --oracle_speed to the ora2pg command: Ora2Pg will only extract data from Oracle, with no bytea transformation and no data writing, just the raw Oracle speed. Run some tests with different values of the -J option to find the best one. On the other side, you can use the --ora2pg_speed option to see at which speed Ora2Pg is able to convert the data; nothing will be written either. Use it to check whether the value of the -j option gains you anything. Don't forget to run some additional tests with the BLOB_LIMIT value to see if there is further improvement. If someone can prove to me that they get better performance on the Oracle data extraction side, I will be pleased to look at that code.
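
Put together, a possible benchmarking sequence using the flags described above:

    # 1. Raw Oracle extraction speed only, no conversion, no writes: tune -J
    ora2pg -c ora2pg.conf -t CLIENT_DB_AUDIT_LOG -J 4 --oracle_speed

    # 2. Extraction plus bytea conversion, still no writes: tune -j
    ora2pg -c ora2pg.conf -t CLIENT_DB_AUDIT_LOG -J 4 -j 3 --ora2pg_speed

    # 3. Real export once the numbers look right
    ora2pg -c ora2pg.conf -t CLIENT_DB_AUDIT_LOG -J 4 -j 3 -L 500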

I hope this will help.

Regards,
-- 
Gilles Darold
http://www.darold.net/
