Re: Testing of a fast method to bulk insert a Pandas DataFrame into Postgres

Поиск
Список
Период
Сортировка
От Steve Midgley
Тема Re: Testing of a fast method to bulk insert a Pandas DataFrame into Postgres
Дата
Msg-id CAJexoSJ8nXoO9mxORayYrhiP+AOwz0Sifj8Y=x02cHLFCV9bVA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Testing of a fast method to bulk insert a Pandas DataFrame into Postgres  (Shaozhong SHI <shishaozhong@gmail.com>)
Список pgsql-sql

On Mon, Oct 4, 2021 at 1:19 PM Shaozhong SHI <shishaozhong@gmail.com> wrote:
Hello, Steve,

That is interesting.

Send me the link for Python doing "Copy---From....CSV".

I would like to review and test.

Regards,

David

On Mon, 4 Oct 2021 at 18:30, Steve Midgley <science@misuse.org> wrote:


On Mon, Oct 4, 2021 at 8:55 AM Shaozhong SHI <shishaozhong@gmail.com> wrote:
Has anyone tested this one?

I tried psql_insert_copy method, but I got the following error message.

to_sql() got an unexpected keyword argument 'method'

Can anyone shed light on this?

Interesting - I wasn't aware there was a faster method than "COPY ... FROM ... CSV" 

Maybe my best input is that I've always found COPY/FROM/CSV to be insanely fast and really easy to write from any standard console or ORM, so maybe the fractional performance increase from other methods isn't worth it, given how performant, reliable and easy to use this approach is? Not really an answer but food for thought..


The way I've implemented that (using Ruby, but very similar) is to actually copy from STDIN rather than CSV. In my implementation I wasn't running my import code on the Postgres server, so I couldn't get the CSV file onto the local DB machine. So I had to feed the data over the wire via STDIN. This is slower than if you can point Postgres to a CSV file that the Pg server itself can access on the file system, but it's more flexible and still remarkably fast. It can be even (a bit) faster if you don't feed STDIN line by line, but in my experience it's a nightmare to debug if you don't get a line number when an import file barfs.

Anyway, here's the Ruby code that shows the fast import concept: https://gist.github.com/science/393907d4123c87ed767bc81e9dd5a7da

I wrote this a LONG time ago, but I think the concepts are still relevant. IIRC, feeding Pg from STDIN was ~400x faster than using "insert into" or other SQL type commands to get the data in.

I hope this is helpful - I would think this code is easily portable to Python, as you just need to get a raw connection to Postgres from your ORM adapter and then model the commands the same way in the code above. Good luck and write with questions..

Steve

В списке pgsql-sql по дате отправления:

Предыдущее
От: Shaozhong SHI
Дата:
Сообщение: Re: Testing of a fast method to bulk insert a Pandas DataFrame into Postgres
Следующее
От: Shaozhong SHI
Дата:
Сообщение: Is it possible to combine the power of Postgres and Python in do statement?