Discussion: Generate test data inserts - 1GB


Generate test data inserts - 1GB

From
Shital A
Date:
Hello

PostgreSQL 9.6

I need to generate 1 GB of test data in as little time as possible. I found some techniques online, but they take around 40 minutes for 400GB. Is there a quicker way?


Thanks.

Re: Generate test data inserts - 1GB

From
Adrian Klaver
Date:
On 8/9/19 4:12 AM, Shital A wrote:
> Hello
> 
> PostgreSQL 9.6
> 
> I need to generate 1 GB of test data in as little time as possible. I found
> some techniques online, but they take around 40 minutes for 400GB. Is there
> a quicker way?

1) Postgres version?

2) Data going into single table or multiple tables?

3) What is the time frame you are trying to achieve?

4) What techniques have you tried?

5) If you need only 1GB, why the 400GB number?


> 
> 
> Thanks.
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Generate test data inserts - 1GB

From
Shital A
Date:


On Fri, 9 Aug 2019, 20:08 Adrian Klaver, <adrian.klaver@aklaver.com> wrote:
On 8/9/19 4:12 AM, Shital A wrote:
> Hello
>
> PostgreSQL 9.6
>
> I need to generate 1 GB of test data in as little time as possible. I found
> some techniques online, but they take around 40 minutes for 400GB. Is there
> a quicker way?

1) Postgres version?

2) Data going into single table or multiple tables?

3) What is the time frame you are trying to achieve?

4) What techniques have you tried?

5) If you need only 1GB, why the 400GB number?


>
>
> Thanks.
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Hello,

Sorry, 400GB was a typo. It's 400 MB. Details are below:

1) Postgres version?
9.6

2) Data going into single table or multiple tables?
A single table with multiple columns.

3) What is the time frame you are trying to achieve?
As quickly as possible.

4) What techniques have you tried?
An INSERT INTO with a WITH statement, inserting 2,000,000 rows at a time. This takes 40 minutes (a simplified sketch is at the end of this list).

5) If you need only 1GB why the 400GB number?
That was 400 MB. I am checking the size using the \l+ option.
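
For reference, the statement mentioned in (4) is roughly of the following shape (the table and column names here are simplified placeholders; the real table has more columns):

-- batched insert of 2,000,000 generated rows
WITH gen AS (
    SELECT g AS id, md5(g::text) AS payload
    FROM generate_series(1, 2000000) AS g
)
INSERT INTO test_table (id, payload)
SELECT id, payload FROM gen;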


Thanks. 

Re: Generate test data inserts - 1GB

From
Adrian Klaver
Date:
On 8/9/19 8:14 AM, Shital A wrote:
> 
> 
> On Fri, 9 Aug 2019, 20:08 Adrian Klaver, <adrian.klaver@aklaver.com 
> <mailto:adrian.klaver@aklaver.com>> wrote:
> 
>     On 8/9/19 4:12 AM, Shital A wrote:
>      > Hello
>      >
>      > PostgreSQL 9.6
>      >
>      > I need to generate 1 GB of test data in as little time as possible. I
>      > found some techniques online, but they take around 40 minutes for
>      > 400GB. Is there a quicker way?
> 
>     1) Postgres version?
> 
>     2) Data going into single table or multiple tables?
> 
>     3) What is the time frame you are trying to achieve?
> 
>     4) What techniques have you tried?
> 
>     5) If you need only 1GB, why the 400GB number?
> 
> 
>      >
>      >
>      > Thanks.
>      >
> 
> 
>     -- 
>     Adrian Klaver
>     adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
> 
> 
> 
> Hello,
> 
> Sorry, 400GB was a typo. It's 400 MB. Details are below:
> 
> 1) Postgres version?
> 9.6
> 
> 2) Data going into single table or multiple tables?
> A single table with multiple columns.
> 
> 3) What is the time frame you are trying to achieve?
> As quickly as possible.
> 
> 4) What techniques have you tried?
> An INSERT INTO with a WITH statement, inserting 2,000,000 rows at a time.
> This takes 40 minutes.

Might take a look at COPY:

https://www.postgresql.org/docs/11/sql-copy.html

or its psql equivalent:

\copy.

COPY runs as the server user and sees files relative to the server location.
\copy runs as the psql (client) user.

In either case, suitability will depend on where you are sourcing the
test data from and what format it is in to begin with.
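
For example, assuming the data is already in a CSV file (the file path and table name below are made up):

-- server side: the file must be readable by the server process
COPY test_table FROM '/tmp/test_data.csv' WITH (FORMAT csv);

-- client-side equivalent from psql: the file is read on the client machine
\copy test_table FROM 'test_data.csv' WITH (FORMAT csv)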


If you want a more sophisticated wrapper over the above then:

https://ossc-db.github.io/pg_bulkload/pg_bulkload.html

> 
> 5) If you need only 1GB why the 400GB number?
> That was 400 MB. I am checking the size using the \l+ option.
> 
> 
> Thanks.
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Generate test data inserts - 1GB

From
Adrian Klaver
Date:
On 8/9/19 8:14 AM, Shital A wrote:
> 

> Hello,

> 
> 4) What techniques have you tried?
> An INSERT INTO with a WITH statement, inserting 2,000,000 rows at a time.
> This takes 40 minutes.
> 

To add to my previous post: if you already have data in a Postgres
database, then you could do:

pg_dump -d db -t some_table -a -f test_data.sql

That will dump only the data for that table, in COPY format. Then you
could apply it to your test database (after a TRUNCATE on the table,
assuming you want to start fresh):

psql -d test_db -f test_data.sql




-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Generate test data inserts - 1GB

From
Shital A
Date:


On Fri, 9 Aug 2019, 21:25 Adrian Klaver, <adrian.klaver@aklaver.com> wrote:
On 8/9/19 8:14 AM, Shital A wrote:
>

> Hello,

>
> 4) What techniques have you tried?
> An INSERT INTO with a WITH statement, inserting 2,000,000 rows at a time.
> This takes 40 minutes.
>

To add to my previous post: if you already have data in a Postgres
database, then you could do:

pg_dump -d db -t some_table -a -f test_data.sql

That will dump only the data for that table, in COPY format. Then you
could apply it to your test database (after a TRUNCATE on the table,
assuming you want to start fresh):

psql -d test_db -f test_data.sql




--
Adrian Klaver
adrian.klaver@aklaver.com

Thanks for the reply, Adrian.

I missed one requirement. Will these methods generate the WAL needed for replication?

Actually, the data is to check whether replication catches up. Below is the scenario:

1. Have a master/slave cluster with replication set up.

2. Kill the master so that the standby takes over. We are using Pacemaker for
automatic failover. Insert 1 GB of data into the new master while replication
is broken (a rough sketch of the insert is below, after step 3).

3. Start the old node as standby and check whether the 1 GB of data gets replicated.

As such testing might be frequent, we need to spend minimal time generating
the data. The master and slave are on the same network.
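
For step 2, the insert on the new master would be something along these lines (the row count is a rough estimate for ~1 GB, and the table name is a placeholder):

-- ~10 million rows at roughly 100+ bytes each comes to about 1 GB
INSERT INTO test_table (id, payload)
SELECT g, repeat(md5(g::text), 3)
FROM generate_series(1, 10000000) AS g;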

Thanks!

Re: Generate test data inserts - 1GB

From
Rob Sargent
Date:




On 8/9/19 9:51 AM, Shital A wrote:
> Thanks for the reply, Adrian.
>
> I missed one requirement. Will these methods generate the WAL needed for
> replication?
>
> Actually, the data is to check whether replication catches up. Below is the
> scenario:
>
> 1. Have a master/slave cluster with replication set up.
>
> 2. Kill the master so that the standby takes over. We are using Pacemaker
> for automatic failover. Insert 1 GB of data into the new master while
> replication is broken.
>
> 3. Start the old node as standby and check whether the 1 GB of data gets
> replicated.
>
> As such testing might be frequent, we need to spend minimal time generating
> the data. The master and slave are on the same network.
>
> Thanks!
How quickly does your production instance grow, in terms of GB per unit of time?

Re: Generate test data inserts - 1GB

From
Adrian Klaver
Date:
On 8/9/19 9:51 AM, Shital A wrote:
> 
> 
> On Fri, 9 Aug 2019, 21:25 Adrian Klaver, <adrian.klaver@aklaver.com 
> <mailto:adrian.klaver@aklaver.com>> wrote:
> 
>     On 8/9/19 8:14 AM, Shital A wrote:
>      >
> 
>      > Hello,
> 
>      >
>      > 4) What techniques have you tried?
>      > An INSERT INTO with a WITH statement, inserting 2,000,000 rows at a
>      > time. This takes 40 minutes.
>      >
> 
>     To add to my previous post: if you already have data in a Postgres
>     database, then you could do:
> 
>     pg_dump -d db -t some_table -a -f test_data.sql
> 
>     That will dump only the data for that table, in COPY format. Then you
>     could apply it to your test database (after a TRUNCATE on the table,
>     assuming you want to start fresh):
> 
>     psql -d test_db -f test_data.sql
> 
> 
> 
> 
>     -- 
>     Adrian Klaver
>     adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
> 
> 
> Thanks for the reply, Adrian.
> 
> I missed one requirement. Will these methods generate the WAL needed for
> replication?

For COPY, AFAIK yes.

To verify, set up a small test table, COPY to it, and see if the data
shows up on the standby.
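
A minimal check could look like this (the table and file names are arbitrary):

-- on the primary:
CREATE TABLE copy_test (id int, val text);
\copy copy_test FROM 'sample.csv' WITH (FORMAT csv)

-- then on the standby:
SELECT count(*) FROM copy_test;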

pg_bulkload:

https://ossc-db.github.io/pg_bulkload/pg_bulkload.html

"IMPORTANT NOTE: Under streaming replication environment, pg_bulkload 
does not work properly. See here for details.



> 
> Actually, the data is to check whether replication catches up. Below is the scenario:
> 
> 1. Have a master/slave cluster with replication set up.
> 
> 2. Kill the master so that the standby takes over. We are using Pacemaker for
> automatic failover. Insert 1 GB of data into the new master while replication
> is broken.
> 
> 3. Start the old node as standby and check whether the 1 GB of data gets replicated.
> 
> As such testing might be frequent, we need to spend minimal time generating
> the data. The master and slave are on the same network.
> 
> Thanks!


-- 
Adrian Klaver
adrian.klaver@aklaver.com