Thread: Most effective and fast way to load a few TB of data from flat files into PostgreSQL

Most effective and fast way to load a few TB of data from flat files into PostgreSQL

From:
Dirk Krautschick
Date:
Hi,

What would be the fastest or most effective way to load a few (5-10) TB of data from flat files
into a PostgreSQL database, including some 1 TB tables and blobs?

There is the COPY command, but there is no native parallelism, right? I have found pg_bulkload
but haven't tested it yet. As far as I can see, EDB has its EDB*Loader as a commercial option.

Anything else to recommend?

Thanks and best regards

Dirk

Re: Most effective and fast way to load a few TB of data from flat files into PostgreSQL

From:
"Peter J. Holzer"
Date:
On 2020-08-24 21:17:36 +0000, Dirk Krautschick wrote:
> What would be the fastest or most effective way to load a few (5-10) TB
> of data from flat files into a PostgreSQL database, including some 1 TB
> tables and blobs?
>
> There is the COPY command, but there is no native parallelism, right?
> I have found pg_bulkload but haven't tested it yet. As far as I can
> see, EDB has its EDB*Loader as a commercial option.

A single COPY isn't parallel, but you can run several of them in
parallel (that's what pg_restore -j N does). So the total time may be
dominated by your largest table (or I/O bandwidth).
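
For example, a minimal sketch in shell, assuming one CSV file per table under
/data, each named after its table, with connection settings coming from the
PG* environment variables (the layout and names here are hypothetical):

    #!/bin/bash
    # Run up to four COPYs at once, one per table -- essentially what
    # pg_restore -j 4 does for table data. Assumes /data/<table>.csv
    # holds each table's rows and PGHOST/PGDATABASE/PGUSER are set.
    max_jobs=4
    for f in /data/*.csv; do
        t=$(basename "$f" .csv)
        psql -q -c "\copy $t from '$f' with (format csv)" &
        while (( $(jobs -rp | wc -l) >= max_jobs )); do
            wait -n    # block until one loader finishes (bash >= 4.3)
        done
    done
    wait               # let the last loaders drain

Each psql session is a separate backend, so the loads really do run in
parallel; the largest single file still sets the floor on total time.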

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Re: [E] Re: Most effective and fast way to load a few TB of data from flat files into PostgreSQL

From:
"Saha, Sushanta K"
Date:
You can also explore "pgloader".
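
For instance, a minimal sketch of a pgloader command file (file, table, and
connection names are placeholders; check the pgloader docs for the exact
option names). Put something like this in orders.load:

    LOAD CSV
         FROM '/data/orders.csv'
         INTO postgresql://loader@localhost/target?tablename=orders
         WITH fields terminated by ',',
              skip header = 1,
              batch rows = 25000;

and run it with:

    pgloader orders.load

pgloader batches rows on its own, which already helps compared with naive
row-by-row inserts.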

.... Sushanta


On Tue, Aug 25, 2020 at 7:24 AM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
> On 2020-08-24 21:17:36 +0000, Dirk Krautschick wrote:
> > What would be the fastest or most effective way to load a few (5-10) TB
> > of data from flat files into a PostgreSQL database, including some 1 TB
> > tables and blobs?
> >
> > There is the COPY command, but there is no native parallelism, right?
> > I have found pg_bulkload but haven't tested it yet. As far as I can
> > see, EDB has its EDB*Loader as a commercial option.
>
> A single COPY isn't parallel, but you can run several of them in
> parallel (that's what pg_restore -j N does). So the total time may be
> dominated by your largest table (or I/O bandwidth).


--

Sushanta Saha|MTS IV-Cslt-Sys Engrg|WebIaaS_DB Group|HQ - VerizonWireless
O 770.797.1260  C 770.714.6555 Iaas Support Line 949-286-8810

Re: Most effective and fast way to load a few TB of data from flat files into PostgreSQL

From:
Shaozhong SHI
Date:


On Tue, 25 Aug 2020 at 12:24, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
> On 2020-08-24 21:17:36 +0000, Dirk Krautschick wrote:
> > What would be the fastest or most effective way to load a few (5-10) TB
> > of data from flat files into a PostgreSQL database, including some 1 TB
> > tables and blobs?
> >
> > There is the COPY command, but there is no native parallelism, right?
> > I have found pg_bulkload but haven't tested it yet. As far as I can
> > see, EDB has its EDB*Loader as a commercial option.
>
> A single COPY isn't parallel, but you can run several of them in
> parallel (that's what pg_restore -j N does). So the total time may be
> dominated by your largest table (or I/O bandwidth).


This topic is interesting. Are there any examples of parallel COPY?

Regards,

SS
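
For a single huge file, a minimal sketch is to split it into line-aligned
chunks first and COPY them into the same table from several sessions at once
(GNU split; the file and table names are placeholders, and the file is
assumed to have no header row):

    #!/bin/bash
    # Split one big CSV into 8 pieces without breaking lines (GNU split),
    # then COPY every piece into the same table concurrently.
    split -n l/8 /data/bigtable.csv /tmp/chunk_
    for f in /tmp/chunk_*; do
        psql -q -c "\copy bigtable from '$f' with (format csv)" &
    done
    wait

Concurrent COPYs into one table do not block each other, so this scales until
disk or WAL writing becomes the bottleneck.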