Discussion: COPY from STDIN vs file with large CSVs

COPY from STDIN vs file with large CSVs

From: Wells Oliver
Date:
I have a CSV that's ~30GB. Some 400m rows. Would there be a meaningful performance difference to run COPY from STDIN using: cat f.csv | psql "COPY .. FROM STDIN WITH CSV" versus just doing "COPY ... FROM 'f.csv' WITH CSV"?

Thanks. It took about four hours to copy one and I felt that was a little much.
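
For concreteness, the two invocations being compared would look roughly like this (the database and table names below are placeholders, not from the thread):

    # client side: psql streams the data over the connection with COPY FROM STDIN
    cat f.csv | psql -d mydb -c "COPY my_table FROM STDIN WITH CSV"

    # server side: the backend opens the file itself; it must be readable by the
    # server process and requires superuser (or pg_read_server_files) privileges
    psql -d mydb -c "COPY my_table FROM '/path/to/f.csv' WITH CSV"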

--

RE: COPY from STDIN vs file with large CSVs

From: Indirani Venkopa Chetty
Date:

Hi Wells,

I am trying to import a CSV file into Amazon RDS PostgreSQL.

I am getting the following error; on our internal PostgreSQL it works.

Here is the psql script. Can you also help me with a PowerShell command to execute it?

COPY product."tblProductLicenseType" FROM E'//XXXXXXXXXXX/Product_Licensing_Orders/Development/tblProductLicenseType.csv'  DELIMITER E',' NULL '\N'  QUOTE '"' HEADER CSV ;

COPY product."tblProductLicense" FROM E'//XXXXXXXXXXX/Product_Licensing_Orders/Development/tblProductLicense.csv' WITH (format csv, DELIMITER ',', header true, force_null ("LicenseId", "Season", "ProductCode", "Account", "UserId", "StatusId", "CreateDate", "UpdateDate", "SourceId", ord_num, item_num, "lType"), ENCODING 'UTF8' );

COPY product."tblCobranderUserSeasonProduct" FROM E'//XXXXXXXXXXX/Product_Licensing_Orders/Development/tblCobranderUserSeasonProduct.csv' WITH (format csv, DELIMITER ',', header true, force_null ("uspId", "CobranderUserLinkId", season, prod_cd, "statusId", "statusDate", "createdBy", "createdDate", "modifiedBy", "modifiedDate"), ENCODING 'UTF8' );

COPY product."tblDailyFundedBankProducts" FROM E'//XXXXXXXXXXX/Product_Licensing_Orders/Development/tblDailyFundedBankProducts.csv' WITH (format csv, DELIMITER ',', header true, force_null ("Account", "UserID", "BankProductSeason", "TransmitterPaymentDate", "FundedBankProducts", "ProtectionPlusFunded"), ENCODING 'UTF8' );

COPY product."tblProductSoftwarePackage"("ProductCode", "Season", "SoftwarePackageId", "CreateDate", "UpdateDate" , "createdBy", "updatedBy") FROM E'//XXXXXXXXXXX/Product_Licensing_Orders/Development/tblProductSoftwarePackage.csv' WITH (format csv, DELIMITER ',', header true, force_null ("ProductCode", "Season", "SoftwarePackageId", "CreateDate", "UpdateDate", "createdBy", "updatedBy"), ENCODING 'UTF8' );

COPY product."tblUserProductEval" FROM E'//XXXXXXXXXXX/Product_Licensing_Orders/Development/tblUserProductEval.csv' WITH (format csv, DELIMITER ',', header true, force_null ("evalId", "userId", "prodCd", "licenseId", "evalStartDate", "evalEndDate", "createdBy", "createdDate", "modifiedBy", "modifiedDate"), ENCODING 'UTF8' );

 

COPY orders."OrderType" FROM E'//XXXXXXXXXXX/Product_Licensing_Orders/Development/OrderType.csv' WITH (format csv, DELIMITER ',', header true, force_null ("OrderTypeCode", "Description"), ENCODING 'UTF8' );

COPY orders.orders(ord_num, ord_stat, account, user_id, season, county, tax_rate, tax_amt, disc_pct, disc_amt, freight, ord_tot, bal_due, pmt1_amt, pmt1_dt, pmt2_amt, pmt2_dt, note1, note2, sales_exec, split, chk_num1, chk_name1, chk_amt1, exp_mon1, exp_yr1, auth1, ref1, chk_num2, chk_name2, chk_amt2, exp_mon2, exp_yr2, auth2, ref2, b_fname, b_lname, b_company, b_addr1, b_addr2, b_city, b_state, b_zip, s_phone, s_fname, s_lname, s_company, s_addr1, s_addr2, s_city, s_state, s_zip, orig_by, orig_dt, edit_by, edit_dt, security_cd1, security_cd2, export_date, exec_email, s_fax, do_not_ship, ship_email, "CommissionType", "OrderType", "PaymentPlan", "sfOrderId")

FROM E'//XXXXXXXXXXX/Product_Licensing_Orders/Development/orders.csv' WITH (format csv, DELIMITER ',', header true, force_null (ord_num, ord_stat, account, user_id, season, county, tax_rate, tax_amt, disc_pct, disc_amt, freight, ord_tot, bal_due, pmt1_amt, pmt1_dt, pmt2_amt, pmt2_dt, note1, note2, sales_exec, split, chk_num1, chk_name1, chk_amt1, exp_mon1, exp_yr1, auth1, ref1, chk_num2, chk_name2, chk_amt2, exp_mon2, exp_yr2, auth2, ref2, b_fname, b_lname, b_company, b_addr1, b_addr2, b_city, b_state, b_zip, s_phone, s_fname, s_lname, s_company, s_addr1, s_addr2, s_city, s_state, s_zip, orig_by, orig_dt, edit_by, edit_dt, security_cd1, security_cd2, export_date, exec_email, s_fax, do_not_ship, ship_email, "CommissionType", "OrderType", "PaymentPlan", "sfOrderId"), ENCODING 'UTF8' );

COPY orders.ord_items FROM E'//XXXXXXXXXXX/Product_Licensing_Orders/Development/ord_items.csv' WITH (format csv, DELIMITER ',', header true, force_null (item_num, ord_num, prod_cd, qty, uprice, dt, bank_id, ship_via, bank_stat, "sfOrderLineId"), ENCODING 'UTF8' );

COPY orders.ord_pkg FROM E'//XXXXXXXXXXX/Product_Licensing_Orders/Development/ord_pkg.csv' WITH (format csv, DELIMITER ',', header true, force_null ("rowID", parent_cd, child_cd, season, sequence_number), ENCODING 'UTF8' );

COPY orders.ord_stock FROM E'//XXXXXXXXXXX/Product_Licensing_Orders/Development/ord_stock.csv' WITH (format csv, DELIMITER ',', header true, force_null ("rowID", prod_cd, season, idescr, qty, uprice, backorder, shippable, bank_dependent, derived, taxable, pkg_id1, pkg_id2, act_code, "Discountable", "UserActivationTypeID", "eFileActive"), ENCODING 'UTF8' );

COPY orders.orders_ext FROM E'//XXXXXXXXXXX/Product_Licensing_Orders/Development/orders_ext.csv' WITH (format csv, DELIMITER ',', header true, force_null (ord_num, svc_fee, rest_fee, ref_credit), ENCODING 'UTF8' );

COPY orders.sls_comm4ord FROM E'//XXXXXXXXXXX/Product_Licensing_Orders/Development/sls_comm4ord.csv' WITH (format csv, DELIMITER ',', header true, force_null (ord_comm_id, ord_num, sales_exec, amount, upd_by, upd_dt, com_dt), ENCODING 'UTF8' );

RETURN True;

 

 

Thanks,

 

chetty

 

From: Wells Oliver <wells.oliver@gmail.com>
Sent: Wednesday, January 8, 2020 8:54 AM
To: pgsql-admin <pgsql-admin@postgresql.org>
Subject: COPY from STDIN vs file with large CSVs

 

I have a CSV that's ~30GB. Some 400m rows. Would there be a meaningful performance difference to run COPY from STDIN using: cat f.csv | psql "COPY .. FROM STDIN WITH CSV" versus just doing "COPY ... FROM 'f.csv' WITH CSV"?

 

Thanks. It took about four hours to copy one and I felt that was a little much.


--

Attachments

Re: COPY from STDIN vs file with large CSVs

From: Ron
Date:
On 1/8/20 10:54 AM, Wells Oliver wrote:
> I have a CSV that's ~30GB. Some 400m rows. Would there be a meaningful 
> performance difference to run COPY from STDIN using: cat f.csv | psql 
> "COPY .. FROM STDIN WITH CSV" versus just doing "COPY ... FROM 'f.csv' 
> WITH CSV"?
>
> Thanks. It took about four hours to copy one and I felt that was a little 
> much.

catting the file starts another process, and opens a pipe.  That can't be 
faster than "COPY ... FROM ... WITH CSV".

pg_bulkload (which might be in your repository) is probably what you really 
want.

-- 
Angular momentum makes the world go 'round.



Re: COPY from STDIN vs file with large CSVs

From: Scott Ribe
Date:
> On Jan 8, 2020, at 10:05 AM, Indirani Venkopa Chetty <ivchetty@crosslinktax.com> wrote:
>
> I am trying to import a CSV file into Amazon RDS PostgreSQL.
> I am getting the following error; on our internal PostgreSQL it works.
> <image001.jpg>
>

The error message is telling you that you will have to connect to postgres as a user with pg superuser privileges.




Re: COPY from STDIN vs file with large CSVs

From: Wells Oliver
Date:
Thanks, I had looked into pg_bulkload a bit but it does not seem to be available for PG 12? It's not in the extension directory, it's not available through apt-cache search, and I have all kinds of issues with it finding pgcommon and pgport when I try to build from source. Using Ubuntu 18 LTS...
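
In case it helps, a hedged sketch of the build prerequisites that usually clear up missing libpgcommon/libpgport references on Ubuntu 18.04; it assumes the PGDG apt repository for PostgreSQL 12 is configured, and the package names and paths follow the usual PGDG layout rather than anything confirmed in this thread:

    # headers, pg_config, and the static libpgcommon/libpgport for PG 12
    sudo apt-get install build-essential libpq-dev postgresql-server-dev-12

    # make sure the PG 12 pg_config is found first, then build from the source tree
    export PATH=/usr/lib/postgresql/12/bin:$PATH
    make
    sudo make install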

On Wed, Jan 8, 2020 at 9:09 AM Ron <ronljohnsonjr@gmail.com> wrote:
On 1/8/20 10:54 AM, Wells Oliver wrote:
> I have a CSV that's ~30GB. Some 400m rows. Would there be a meaningful
> performance difference to run COPY from STDIN using: cat f.csv | psql
> "COPY .. FROM STDIN WITH CSV" versus just doing "COPY ... FROM 'f.csv'
> WITH CSV"?
>
> Thanks. It took about four hours to copy one and I felt that was a little
> much.

catting the file starts another process, and opens a pipe.  That can't be
faster than "COPY ... FROM ... WITH CSV".

pg_bulkload (which might be in your repository) is probably what you really
want.

--
Angular momentum makes the world go 'round.




--

Re: COPY from STDIN vs file with large CSVs

From: David G. Johnston
Date:
On Wed, Jan 8, 2020 at 9:31 AM Indirani Venkopa Chetty <ivchetty@crosslinktax.com> wrote:

Hi Wells,

I am trying to import a CSV file into Amazon RDS PostgreSQL.

I am getting the following error; on our internal PostgreSQL it works.

"COPY with a file name instructs the PostgreSQL server to directly read from or write to a file." -  i.e., Even had the server accepted your command it wasn't going to find your file.

Use either psql's "\copy" meta command or "COPY FROM STDIN" (search around for the needed syntax to make that work).
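
For illustration, the first COPY in the script above rewritten as a client-side \copy; a sketch only, with the file path kept exactly as posted (the file must be readable from the machine where psql runs, and the whole command must stay on one line):

    \copy product."tblProductLicenseType" from '//XXXXXXXXXXX/Product_Licensing_Orders/Development/tblProductLicenseType.csv' with (format csv, header true, delimiter ',', null '\N', quote '"')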

David J.

Attachments

Re: COPY from STDIN vs file with large CSVs

From: bricklen
Date:


On Wed, Jan 8, 2020 at 8:55 AM Wells Oliver <wells.oliver@gmail.com> wrote:
I have a CSV that's ~30GB. Some 400m rows. Would there be a meaningful performance difference to run COPY from STDIN using: cat f.csv | psql "COPY .. FROM STDIN WITH CSV" versus just doing "COPY ... FROM 'f.csv' WITH CSV"?

If you're looking to speed up the loading - and your disk subsystem is decent - consider running your csv through the "split" command to decompose it into smaller CSV files. You can then load them in parallel using multiple psql sessions.
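
As a minimal sketch of that approach (chunk size, database, and table names are placeholders; note that a naive line-based split assumes no newlines embedded inside quoted CSV fields, and any header row should be stripped first):

    # break the big CSV into ~50M-line pieces named chunk_aa, chunk_ab, ...
    split -l 50000000 f.csv chunk_

    # load the pieces through parallel client-side sessions
    for f in chunk_*; do
        psql -d mydb -c "\copy my_table from '$f' with csv" &
    done
    wait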