Thread: Moving data from one set of tables to another?

Moving data from one set of tables to another?

From:
Howard Eglowstein
Date:
I have three tables called 'data_a', 'data_b' and 'data_c' which each
have 50 columns. One of the columns in each is 'id' and is used to keep
track of which data in data_b and data_c corresponds to a row in
data_a.  If I want to get all of the data in all 150 fields for this
month (for example), I can get it with:

select * from (data_a, data_b, data_c) where data_a.id=data_b.id AND
data_a.id = data_c.id AND timestamp >= '2008-09-01 00:00:00' and
timestamp <= '2008-09-30 23:59:59'

What I need to do is execute this search, which might return several
thousand rows, and write the same structure into 'new_a', 'new_b' and
'new_c'. What I'm doing now in a C program is executing the search
above. Then I execute:

INSERT INTO new_a (timestamp, field1, field2 ...[imagine 50 of them])
VALUES ('2008-09-01 00:00:00', 'ABC', 'DEF', ...);
Get the ID that was assigned to this row, since 'id' is a serial field
and the number is assigned sequentially. Say it comes back as '1'.
INSERT INTO new_b (id, field1, field2 ...[imagine 50 of them]) VALUES
('1', 'ABC', 'DEF', ...);
INSERT INTO new_c (id, field1, field2 ...[imagine 50 of them]) VALUES
('1', 'ABC', 'DEF', ...);

That moves a copy of the three rows of data from the three tables into
the three separate new tables.
From the original group of tables, the id for these rows was, let's
say, '1234'. Then I execute:

DELETE FROM data_a where id='1234';
DELETE FROM data_b where id='1234';
DELETE FROM data_c where id='1234';

That deletes the old data.

This works fine and gives me exactly what I wanted, but is there a
better way? This is 7 SQL calls and it takes about 3 seconds per moved
record on our Linux box.

Any thoughts or suggestions would be appreciated.

Thanks,

Howard

Re: Moving data from one set of tables to another?

From:
Carol Walter
Date:
What do you want for your end product?  Are the old tables empty
after you put the data into the new tables?

Carol

On Sep 18, 2008, at 3:02 PM, Howard Eglowstein wrote:

> I have three tables called 'data_a', 'data_b' and 'data_c' which
> each have 50 columns. One of the columns in each is 'id' and is
> used to keep track of which data in data_b and data_c corresponds
> to a row in data_a.  If I want to get all of the data in all 150
> fields for this month (for example), I can get it with:
>
> select * from (data_a, data_b, data_c) where data_a.id=data_b.id
> AND data_a.id = data_c.id AND timestamp >= '2008-09-01 00:00:00'
> and timestamp <= '2008-09-30 23:59:59'
>
> What I need to do is execute this search which might return several
> thousand rows and write the same structure into 'new_a', 'new_b'
> and 'new_c'. What i'm doing now in a C program is executing the
> search above. Then I execute:
>
> INSERT INTO data_a (timestamp, field1, field2 ...[imagine 50 of
> them]) VALUES ('2008-09-01 00:00:00', 'ABC', 'DEF', ...);
> Get the ID that was assigned to this row since 'id' is a serial
> field and the number is assigned sequentially. Say it comes back as
> '1'.
> INSERT INTO data_b (id, field1, field2 ...[imagine 50 of them])
> VALUES ('1', 'ABC', 'DEF', ...);
> INSERT INTO data_c (id, field1, field2 ...[imagine 50 of them])
> VALUES ('1', 'ABC', 'DEF', ...);
>
> That moves a copy of the three rows of data form the three tables
> into the three separate new tables.
> From the original group of tables, the id for these rows was, let's
> say, '1234'. Then I execute:
>
> DELETE FROM data_a where id='1234';
> DELETE FROM data_b where id='1234';
> DELETE FROM data_c where id='1234';
>
> That deletes the old data.
>
> This works fine and gives me exactly what I wanted, but is there a
> better way? This is 7 SQL calls and it takes about 3 seconds per
> moved record on our Linux box.
>
> Any thoughts or suggestions would be appreciated.
>
> Thanks,
>
> Howard
>


Re: Moving data from one set of tables to another?

From:
Howard Eglowstein
Date:
Somewhat empty, yes. The single set of 'data_' tables contains 3 years'
worth of data. I want to move 2 years' worth out into the 'new_' tables.
When I'm done, there will still be 1 year's worth of data left in the
original tables.

Howard

Carol Walter wrote:
> What do you want for your end product?  Are the old tables empty after
> you put the data into the new tables?
>
> Carol
>
> On Sep 18, 2008, at 3:02 PM, Howard Eglowstein wrote:
>
>> I have three tables called 'data_a', 'data_b' and 'data_c' which each
>> have 50 columns. One of the columns in each is 'id' and is used to
>> keep track of which data in data_b and data_c corresponds to a row in
>> data_a.  If I want to get all of the data in all 150 fields for this
>> month (for example), I can get it with:
>>
>> select * from (data_a, data_b, data_c) where data_a.id=data_b.id AND
>> data_a.id = data_c.id AND timestamp >= '2008-09-01 00:00:00' and
>> timestamp <= '2008-09-30 23:59:59'
>>
>> What I need to do is execute this search which might return several
>> thousand rows and write the same structure into 'new_a', 'new_b' and
>> 'new_c'. What i'm doing now in a C program is executing the search
>> above. Then I execute:
>>
>> INSERT INTO data_a (timestamp, field1, field2 ...[imagine 50 of
>> them]) VALUES ('2008-09-01 00:00:00', 'ABC', 'DEF', ...);
>> Get the ID that was assigned to this row since 'id' is a serial field
>> and the number is assigned sequentially. Say it comes back as '1'.
>> INSERT INTO data_b (id, field1, field2 ...[imagine 50 of them])
>> VALUES ('1', 'ABC', 'DEF', ...);
>> INSERT INTO data_c (id, field1, field2 ...[imagine 50 of them])
>> VALUES ('1', 'ABC', 'DEF', ...);
>>
>> That moves a copy of the three rows of data form the three tables
>> into the three separate new tables.
>> From the original group of tables, the id for these rows was, let's
>> say, '1234'. Then I execute:
>>
>> DELETE FROM data_a where id='1234';
>> DELETE FROM data_b where id='1234';
>> DELETE FROM data_c where id='1234';
>>
>> That deletes the old data.
>>
>> This works fine and gives me exactly what I wanted, but is there a
>> better way? This is 7 SQL calls and it takes about 3 seconds per
>> moved record on our Linux box.
>>
>> Any thoughts or suggestions would be appreciated.
>>
>> Thanks,
>>
>> Howard
>>


Re: Moving data from one set of tables to another?

From:
"Sean Davis"
Date:
On Thu, Sep 18, 2008 at 6:39 PM, Howard Eglowstein
<howard@yankeescientific.com> wrote:
> Somewhat empty, yes. The single set of 'data_' tables contains 3 years worth
> of data. I want to move 2 years worth out into the 'new_' tables. When I'm
> done, there will still be 1 year's worth of data left in the original table.
>
> Howard
>
> Carol Walter wrote:
>>
>> What do you want for your end product?  Are the old tables empty after you
>> put the data into the new tables?
>>
>> Carol
>>
>> On Sep 18, 2008, at 3:02 PM, Howard Eglowstein wrote:
>>
>>> I have three tables called 'data_a', 'data_b' and 'data_c' which each
>>> have 50 columns. One of the columns in each is 'id' and is used to keep
>>> track of which data in data_b and data_c corresponds to a row in data_a.  If
>>> I want to get all of the data in all 150 fields for this month (for
>>> example), I can get it with:
>>>
>>> select * from (data_a, data_b, data_c) where data_a.id=data_b.id AND
>>> data_a.id = data_c.id AND timestamp >= '2008-09-01 00:00:00' and timestamp
>>> <= '2008-09-30 23:59:59'

You might want to look at insert into ... select ...

You should be able to do this with 1 query per new table (+ the
deletes, obviously).  For a few thousand records, I would expect that
the entire process might take a few seconds.
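
For example, a rough sketch for one of the tables (assuming 'new_a'
already exists with the same column layout as 'data_a', and that the
timestamp column lives in data_a):

-- copy September's rows from data_a into new_a, keeping the original ids
INSERT INTO new_a
SELECT data_a.*
FROM data_a
WHERE data_a.timestamp >= '2008-09-01 00:00:00'
  AND data_a.timestamp <= '2008-09-30 23:59:59';

new_b and new_c would be the same idea, with the select joined back to
data_a on id to pick out the matching rows.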

Sean

>>> What I need to do is execute this search which might return several
>>> thousand rows and write the same structure into 'new_a', 'new_b' and
>>> 'new_c'. What i'm doing now in a C program is executing the search above.
>>> Then I execute:
>>>
>>> INSERT INTO data_a (timestamp, field1, field2 ...[imagine 50 of them])
>>> VALUES ('2008-09-01 00:00:00', 'ABC', 'DEF', ...);
>>> Get the ID that was assigned to this row since 'id' is a serial field and
>>> the number is assigned sequentially. Say it comes back as '1'.
>>> INSERT INTO data_b (id, field1, field2 ...[imagine 50 of them]) VALUES
>>> ('1', 'ABC', 'DEF', ...);
>>> INSERT INTO data_c (id, field1, field2 ...[imagine 50 of them]) VALUES
>>> ('1', 'ABC', 'DEF', ...);
>>>
>>> That moves a copy of the three rows of data form the three tables into
>>> the three separate new tables.
>>> From the original group of tables, the id for these rows was, let's say,
>>> '1234'. Then I execute:
>>>
>>> DELETE FROM data_a where id='1234';
>>> DELETE FROM data_b where id='1234';
>>> DELETE FROM data_c where id='1234';
>>>
>>> That deletes the old data.
>>>
>>> This works fine and gives me exactly what I wanted, but is there a better
>>> way? This is 7 SQL calls and it takes about 3 seconds per moved record on
>>> our Linux box.
>>>
>>> Any thoughts or suggestions would be appreciated.
>>>
>>> Thanks,
>>>
>>> Howard
>>>

Re: Moving data from one set of tables to another?

From:
Howard Eglowstein
Date:
What confuses me is that I need to do the one select with all three
tables and then do three inserts, no? The result is that the 150 fields
I get back from the select have to be split into 3 groups of 50 fields
each and then written into three tables.

What you're suggesting is that there is some statement which could do
the select and the three inserts at once?

Howard

Sean Davis wrote:
> You might want to look at insert into ... select ...
>
> You should be able to do this with 1 query per new table (+ the
> deletes, obviously).  For a few thousand records, I would expect that
> the entire process might take a few seconds.
>
> Sean
>

> On Thu, Sep 18, 2008 at 6:39 PM, Howard Eglowstein
> <howard@yankeescientific.com> wrote:
>
>> Somewhat empty, yes. The single set of 'data_' tables contains 3 years worth
>> of data. I want to move 2 years worth out into the 'new_' tables. When I'm
>> done, there will still be 1 year's worth of data left in the original table.
>>
>> Howard
>>
>> Carol Walter wrote:
>>
>>> What do you want for your end product?  Are the old tables empty after you
>>> put the data into the new tables?
>>>
>>> Carol
>>>
>>> On Sep 18, 2008, at 3:02 PM, Howard Eglowstein wrote:
>>>
>>>
>>>> I have three tables called 'data_a', 'data_b' and 'data_c' which each
>>>> have 50 columns. One of the columns in each is 'id' and is used to keep
>>>> track of which data in data_b and data_c corresponds to a row in data_a.  If
>>>> I want to get all of the data in all 150 fields for this month (for
>>>> example), I can get it with:
>>>>
>>>> select * from (data_a, data_b, data_c) where data_a.id=data_b.id AND
>>>> data_a.id = data_c.id AND timestamp >= '2008-09-01 00:00:00' and timestamp
>>>> <= '2008-09-30 23:59:59'
>>>>
>
>
>
>>>> What I need to do is execute this search which might return several
>>>> thousand rows and write the same structure into 'new_a', 'new_b' and
>>>> 'new_c'. What i'm doing now in a C program is executing the search above.
>>>> Then I execute:
>>>>
>>>> INSERT INTO data_a (timestamp, field1, field2 ...[imagine 50 of them])
>>>> VALUES ('2008-09-01 00:00:00', 'ABC', 'DEF', ...);
>>>> Get the ID that was assigned to this row since 'id' is a serial field and
>>>> the number is assigned sequentially. Say it comes back as '1'.
>>>> INSERT INTO data_b (id, field1, field2 ...[imagine 50 of them]) VALUES
>>>> ('1', 'ABC', 'DEF', ...);
>>>> INSERT INTO data_c (id, field1, field2 ...[imagine 50 of them]) VALUES
>>>> ('1', 'ABC', 'DEF', ...);
>>>>
>>>> That moves a copy of the three rows of data form the three tables into
>>>> the three separate new tables.
>>>> From the original group of tables, the id for these rows was, let's say,
>>>> '1234'. Then I execute:
>>>>
>>>> DELETE FROM data_a where id='1234';
>>>> DELETE FROM data_b where id='1234';
>>>> DELETE FROM data_c where id='1234';
>>>>
>>>> That deletes the old data.
>>>>
>>>> This works fine and gives me exactly what I wanted, but is there a better
>>>> way? This is 7 SQL calls and it takes about 3 seconds per moved record on
>>>> our Linux box.
>>>>
>>>> Any thoughts or suggestions would be appreciated.
>>>>
>>>> Thanks,
>>>>
>>>> Howard
>>>>


Re: Moving data from one set of tables to another?

From:
"Harold A. Giménez Ch."
Date:
What would be wrong with simply copying the data with INSERT INTO [...] SELECT [...]? This way you keep the same ids in the 'new_' tables as in the 'data_' tables.
Then you can reset the seq values:
select setval('data_a_seq', (SELECT MAX(id) FROM data_a)+1);
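
For example, something roughly like this for each new table (a sketch,
assuming the 'id' column in new_a is a serial column so that
pg_get_serial_sequence can look up its sequence name):

-- keep the sequence in step with the ids copied into new_a,
-- so later inserts into new_a don't collide with existing ids
SELECT setval(pg_get_serial_sequence('new_a', 'id'),
              (SELECT MAX(id) FROM new_a));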



On Thu, Sep 18, 2008 at 7:20 PM, Sean Davis <sdavis2@mail.nih.gov> wrote:
On Thu, Sep 18, 2008 at 6:39 PM, Howard Eglowstein
<howard@yankeescientific.com> wrote:
> Somewhat empty, yes. The single set of 'data_' tables contains 3 years worth
> of data. I want to move 2 years worth out into the 'new_' tables. When I'm
> done, there will still be 1 year's worth of data left in the original table.
>
> Howard
>
> Carol Walter wrote:
>>
>> What do you want for your end product?  Are the old tables empty after you
>> put the data into the new tables?
>>
>> Carol
>>
>> On Sep 18, 2008, at 3:02 PM, Howard Eglowstein wrote:
>>
>>> I have three tables called 'data_a', 'data_b' and 'data_c' which each
>>> have 50 columns. One of the columns in each is 'id' and is used to keep
>>> track of which data in data_b and data_c corresponds to a row in data_a.  If
>>> I want to get all of the data in all 150 fields for this month (for
>>> example), I can get it with:
>>>
>>> select * from (data_a, data_b, data_c) where data_a.id=data_b.id AND
>>> data_a.id = data_c.id AND timestamp >= '2008-09-01 00:00:00' and timestamp
>>> <= '2008-09-30 23:59:59'

You might want to look at insert into ... select ...

You should be able to do this with 1 query per new table (+ the
deletes, obviously).  For a few thousand records, I would expect that
the entire process might take a few seconds.

Sean

>>> What I need to do is execute this search which might return several
>>> thousand rows and write the same structure into 'new_a', 'new_b' and
>>> 'new_c'. What i'm doing now in a C program is executing the search above.
>>> Then I execute:
>>>
>>> INSERT INTO data_a (timestamp, field1, field2 ...[imagine 50 of them])
>>> VALUES ('2008-09-01 00:00:00', 'ABC', 'DEF', ...);
>>> Get the ID that was assigned to this row since 'id' is a serial field and
>>> the number is assigned sequentially. Say it comes back as '1'.
>>> INSERT INTO data_b (id, field1, field2 ...[imagine 50 of them]) VALUES
>>> ('1', 'ABC', 'DEF', ...);
>>> INSERT INTO data_c (id, field1, field2 ...[imagine 50 of them]) VALUES
>>> ('1', 'ABC', 'DEF', ...);
>>>
>>> That moves a copy of the three rows of data form the three tables into
>>> the three separate new tables.
>>> From the original group of tables, the id for these rows was, let's say,
>>> '1234'. Then I execute:
>>>
>>> DELETE FROM data_a where id='1234';
>>> DELETE FROM data_b where id='1234';
>>> DELETE FROM data_c where id='1234';
>>>
>>> That deletes the old data.
>>>
>>> This works fine and gives me exactly what I wanted, but is there a better
>>> way? This is 7 SQL calls and it takes about 3 seconds per moved record on
>>> our Linux box.
>>>
>>> Any thoughts or suggestions would be appreciated.
>>>
>>> Thanks,
>>>
>>> Howard
>>>

Re: Moving data from one set of tables to another?

From:
"Sean Davis"
Date:
On Thu, Sep 18, 2008 at 7:28 PM, Howard Eglowstein
<howard@yankeescientific.com> wrote:
> What confuses me is that I need to do the one select with all three tables
> and then do three inserts, no? The results is that the 150 fields I get back
> from the select have to be split into 3 groups of 50 fields each and then
> written into three tables.

You do the insert part of the command three times, once for each new
table, so three separate SQL statements.  The select remains basically
the same for all three, with only the column selection changing
(data_a.* when inserting into new_a, data_b.* when inserting into
new_b, etc.).  Just leave the ids the same as in the first set of
tables.  In nearly every case there is no need to change them.  If
you need to add a new ID column, you can do that as a serial column in
the new tables, but I would stick to the original IDs, if possible.
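
In other words, something along these lines (just a sketch, assuming the
timestamp column lives in data_a and the 'new_' tables have the same
column layouts as their 'data_' counterparts):

-- each statement copies its own table's 50 columns, keeping the original ids
INSERT INTO new_a
SELECT data_a.*
FROM data_a
WHERE data_a.timestamp >= '2008-09-01 00:00:00'
  AND data_a.timestamp <= '2008-09-30 23:59:59';

INSERT INTO new_b
SELECT data_b.*
FROM data_b, data_a
WHERE data_b.id = data_a.id
  AND data_a.timestamp >= '2008-09-01 00:00:00'
  AND data_a.timestamp <= '2008-09-30 23:59:59';

INSERT INTO new_c
SELECT data_c.*
FROM data_c, data_a
WHERE data_c.id = data_a.id
  AND data_a.timestamp >= '2008-09-01 00:00:00'
  AND data_a.timestamp <= '2008-09-30 23:59:59';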

Sean

> What you're suggesting is that there is some statement which could do the
> select and the three inserts at once?
>
> Howard
>
> Sean Davis wrote:
>>
>> You might want to look at insert into ... select ...
>>
>> You should be able to do this with 1 query per new table (+ the
>> deletes, obviously).  For a few thousand records, I would expect that
>> the entire process might take a few seconds.
>>
>> Sean
>>
>
>> On Thu, Sep 18, 2008 at 6:39 PM, Howard Eglowstein
>> <howard@yankeescientific.com> wrote:
>>
>>>
>>> Somewhat empty, yes. The single set of 'data_' tables contains 3 years
>>> worth
>>> of data. I want to move 2 years worth out into the 'new_' tables. When
>>> I'm
>>> done, there will still be 1 year's worth of data left in the original
>>> table.
>>>
>>> Howard
>>>
>>> Carol Walter wrote:
>>>
>>>>
>>>> What do you want for your end product?  Are the old tables empty after
>>>> you
>>>> put the data into the new tables?
>>>>
>>>> Carol
>>>>
>>>> On Sep 18, 2008, at 3:02 PM, Howard Eglowstein wrote:
>>>>
>>>>
>>>>>
>>>>> I have three tables called 'data_a', 'data_b' and 'data_c' which each
>>>>> have 50 columns. One of the columns in each is 'id' and is used to keep
>>>>> track of which data in data_b and data_c corresponds to a row in
>>>>> data_a.  If
>>>>> I want to get all of the data in all 150 fields for this month (for
>>>>> example), I can get it with:
>>>>>
>>>>> select * from (data_a, data_b, data_c) where data_a.id=data_b.id AND
>>>>> data_a.id = data_c.id AND timestamp >= '2008-09-01 00:00:00' and
>>>>> timestamp
>>>>> <= '2008-09-30 23:59:59'
>>>>>
>>
>>
>>
>>>>>
>>>>> What I need to do is execute this search which might return several
>>>>> thousand rows and write the same structure into 'new_a', 'new_b' and
>>>>> 'new_c'. What i'm doing now in a C program is executing the search
>>>>> above.
>>>>> Then I execute:
>>>>>
>>>>> INSERT INTO data_a (timestamp, field1, field2 ...[imagine 50 of them])
>>>>> VALUES ('2008-09-01 00:00:00', 'ABC', 'DEF', ...);
>>>>> Get the ID that was assigned to this row since 'id' is a serial field
>>>>> and
>>>>> the number is assigned sequentially. Say it comes back as '1'.
>>>>> INSERT INTO data_b (id, field1, field2 ...[imagine 50 of them]) VALUES
>>>>> ('1', 'ABC', 'DEF', ...);
>>>>> INSERT INTO data_c (id, field1, field2 ...[imagine 50 of them]) VALUES
>>>>> ('1', 'ABC', 'DEF', ...);
>>>>>
>>>>> That moves a copy of the three rows of data form the three tables into
>>>>> the three separate new tables.
>>>>> From the original group of tables, the id for these rows was, let's
>>>>> say,
>>>>> '1234'. Then I execute:
>>>>>
>>>>> DELETE FROM data_a where id='1234';
>>>>> DELETE FROM data_b where id='1234';
>>>>> DELETE FROM data_c where id='1234';
>>>>>
>>>>> That deletes the old data.
>>>>>
>>>>> This works fine and gives me exactly what I wanted, but is there a
>>>>> better
>>>>> way? This is 7 SQL calls and it takes about 3 seconds per moved record
>>>>> on
>>>>> our Linux box.
>>>>>
>>>>> Any thoughts or suggestions would be appreciated.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Howard
>>>>>

Re: Moving data from one set of tables to another?

From:
Howard Eglowstein
Date:
So you'd agree then that I'll need 7 SQL statements but that I could
stack the INSERT and the first SELECT if I wanted to? Cool. That's what
I ended up with in C code and it's working pretty well. I did some
indexing on the database and got the whole transaction down to about
150ms for the sequence. I guess that's as good as it's going to get.

Thanks for the explanation!

Howard

Sean Davis wrote:
> On Thu, Sep 18, 2008 at 7:28 PM, Howard Eglowstein
> <howard@yankeescientific.com> wrote:
>
>> What confuses me is that I need to do the one select with all three tables
>> and then do three inserts, no? The results is that the 150 fields I get back
>> from the select have to be split into 3 groups of 50 fields each and then
>> written into three tables.
>>
>
> You do the insert part of the command three times, once for each new
> table, so three separate SQL statements.  The select remains basically
> the same for all three, with only the column selection changing
> (data_a.* when inserting into new_a, data_b.* when inserting into
> new_b, etc.).  Just leave the ids the same as in the first set of
> tables.  There isn't a need to change them in nearly every case.  If
> you need to add a new ID column, you can do that as a serial column in
> the new tables, but I would stick to the original IDs, if possible.
>
> Sean
>
>
>> What you're suggesting is that there is some statement which could do the
>> select and the three inserts at once?
>>
>> Howard
>>
>> Sean Davis wrote:
>>
>>> You might want to look at insert into ... select ...
>>>
>>> You should be able to do this with 1 query per new table (+ the
>>> deletes, obviously).  For a few thousand records, I would expect that
>>> the entire process might take a few seconds.
>>>
>>> Sean
>>>
>>>
>>> On Thu, Sep 18, 2008 at 6:39 PM, Howard Eglowstein
>>> <howard@yankeescientific.com> wrote:
>>>
>>>
>>>> Somewhat empty, yes. The single set of 'data_' tables contains 3 years
>>>> worth
>>>> of data. I want to move 2 years worth out into the 'new_' tables. When
>>>> I'm
>>>> done, there will still be 1 year's worth of data left in the original
>>>> table.
>>>>
>>>> Howard
>>>>
>>>> Carol Walter wrote:
>>>>
>>>>
>>>>> What do you want for your end product?  Are the old tables empty after
>>>>> you
>>>>> put the data into the new tables?
>>>>>
>>>>> Carol
>>>>>
>>>>> On Sep 18, 2008, at 3:02 PM, Howard Eglowstein wrote:
>>>>>
>>>>>
>>>>>
>>>>>> I have three tables called 'data_a', 'data_b' and 'data_c' which each
>>>>>> have 50 columns. One of the columns in each is 'id' and is used to keep
>>>>>> track of which data in data_b and data_c corresponds to a row in
>>>>>> data_a.  If
>>>>>> I want to get all of the data in all 150 fields for this month (for
>>>>>> example), I can get it with:
>>>>>>
>>>>>> select * from (data_a, data_b, data_c) where data_a.id=data_b.id AND
>>>>>> data_a.id = data_c.id AND timestamp >= '2008-09-01 00:00:00' and
>>>>>> timestamp
>>>>>> <= '2008-09-30 23:59:59'
>>>>>>
>>>>>>
>>>
>>>
>>>>>> What I need to do is execute this search which might return several
>>>>>> thousand rows and write the same structure into 'new_a', 'new_b' and
>>>>>> 'new_c'. What i'm doing now in a C program is executing the search
>>>>>> above.
>>>>>> Then I execute:
>>>>>>
>>>>>> INSERT INTO data_a (timestamp, field1, field2 ...[imagine 50 of them])
>>>>>> VALUES ('2008-09-01 00:00:00', 'ABC', 'DEF', ...);
>>>>>> Get the ID that was assigned to this row since 'id' is a serial field
>>>>>> and
>>>>>> the number is assigned sequentially. Say it comes back as '1'.
>>>>>> INSERT INTO data_b (id, field1, field2 ...[imagine 50 of them]) VALUES
>>>>>> ('1', 'ABC', 'DEF', ...);
>>>>>> INSERT INTO data_c (id, field1, field2 ...[imagine 50 of them]) VALUES
>>>>>> ('1', 'ABC', 'DEF', ...);
>>>>>>
>>>>>> That moves a copy of the three rows of data form the three tables into
>>>>>> the three separate new tables.
>>>>>> From the original group of tables, the id for these rows was, let's
>>>>>> say,
>>>>>> '1234'. Then I execute:
>>>>>>
>>>>>> DELETE FROM data_a where id='1234';
>>>>>> DELETE FROM data_b where id='1234';
>>>>>> DELETE FROM data_c where id='1234';
>>>>>>
>>>>>> That deletes the old data.
>>>>>>
>>>>>> This works fine and gives me exactly what I wanted, but is there a
>>>>>> better
>>>>>> way? This is 7 SQL calls and it takes about 3 seconds per moved record
>>>>>> on
>>>>>> our Linux box.
>>>>>>
>>>>>> Any thoughts or suggestions would be appreciated.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Howard
>>>>>>


Re: Moving data from one set of tables to another?

From:
"Sean Davis"
Date:
On Fri, Sep 19, 2008 at 7:42 AM, Howard Eglowstein
<howard@yankeescientific.com> wrote:
> So you'd agree then that I'll need 7 SQL statements but that I could stack
> the INSERT and the first SELECT if I wanted to? Cool. That's what I ended up
> with in C code and it's working pretty well. I did some indexing on the
> database and got the whole transaction down to about 150ms for the sequence.
> I guess that's as good as it's going to get.

Keep in mind that the INSERT [...] SELECT [...] is done server-side,
so the data never goes over the wire to the client.  This is very
different from doing the select, accumulating the data on the client,
and then doing the insert, and is likely to be much faster.  150ms is
already pretty fast, but the principle of doing as much on the server
as possible is an important one when looking for efficiency,
especially when data sizes are large.
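
As a sketch, the deletes can be pushed server-side the same way, and the
whole move wrapped in one transaction (same predicates as the inserts, so
either everything moves or nothing does):

BEGIN;
-- the three INSERT INTO new_... SELECT ... statements go here
-- then delete the child rows, matching them through data_a
DELETE FROM data_b USING data_a
WHERE data_b.id = data_a.id
  AND data_a.timestamp >= '2008-09-01 00:00:00'
  AND data_a.timestamp <= '2008-09-30 23:59:59';
DELETE FROM data_c USING data_a
WHERE data_c.id = data_a.id
  AND data_a.timestamp >= '2008-09-01 00:00:00'
  AND data_a.timestamp <= '2008-09-30 23:59:59';
-- and the parent rows last
DELETE FROM data_a
WHERE timestamp >= '2008-09-01 00:00:00'
  AND timestamp <= '2008-09-30 23:59:59';
COMMIT;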

Glad to hear that it is working.

Sean

> Sean Davis wrote:
>>
>> On Thu, Sep 18, 2008 at 7:28 PM, Howard Eglowstein
>> <howard@yankeescientific.com> wrote:
>>
>>>
>>> What confuses me is that I need to do the one select with all three
>>> tables
>>> and then do three inserts, no? The results is that the 150 fields I get
>>> back
>>> from the select have to be split into 3 groups of 50 fields each and then
>>> written into three tables.
>>>
>>
>> You do the insert part of the command three times, once for each new
>> table, so three separate SQL statements.  The select remains basically
>> the same for all three, with only the column selection changing
>> (data_a.* when inserting into new_a, data_b.* when inserting into
>> new_b, etc.).  Just leave the ids the same as in the first set of
>> tables.  There isn't a need to change them in nearly every case.  If
>> you need to add a new ID column, you can do that as a serial column in
>> the new tables, but I would stick to the original IDs, if possible.
>>
>> Sean
>>
>>
>>>
>>> What you're suggesting is that there is some statement which could do the
>>> select and the three inserts at once?
>>>
>>> Howard
>>>
>>> Sean Davis wrote:
>>>
>>>>
>>>> You might want to look at insert into ... select ...
>>>>
>>>> You should be able to do this with 1 query per new table (+ the
>>>> deletes, obviously).  For a few thousand records, I would expect that
>>>> the entire process might take a few seconds.
>>>>
>>>> Sean
>>>>
>>>>      On Thu, Sep 18, 2008 at 6:39 PM, Howard Eglowstein
>>>> <howard@yankeescientific.com> wrote:
>>>>
>>>>
>>>>>
>>>>> Somewhat empty, yes. The single set of 'data_' tables contains 3 years
>>>>> worth
>>>>> of data. I want to move 2 years worth out into the 'new_' tables. When
>>>>> I'm
>>>>> done, there will still be 1 year's worth of data left in the original
>>>>> table.
>>>>>
>>>>> Howard
>>>>>
>>>>> Carol Walter wrote:
>>>>>
>>>>>
>>>>>>
>>>>>> What do you want for your end product?  Are the old tables empty after
>>>>>> you
>>>>>> put the data into the new tables?
>>>>>>
>>>>>> Carol
>>>>>>
>>>>>> On Sep 18, 2008, at 3:02 PM, Howard Eglowstein wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> I have three tables called 'data_a', 'data_b' and 'data_c' which each
>>>>>>> have 50 columns. One of the columns in each is 'id' and is used to
>>>>>>> keep
>>>>>>> track of which data in data_b and data_c corresponds to a row in
>>>>>>> data_a.  If
>>>>>>> I want to get all of the data in all 150 fields for this month (for
>>>>>>> example), I can get it with:
>>>>>>>
>>>>>>> select * from (data_a, data_b, data_c) where data_a.id=data_b.id AND
>>>>>>> data_a.id = data_c.id AND timestamp >= '2008-09-01 00:00:00' and
>>>>>>> timestamp
>>>>>>> <= '2008-09-30 23:59:59'
>>>>>>>
>>>>>>>
>>>>
>>>>
>>>>>>>
>>>>>>> What I need to do is execute this search which might return several
>>>>>>> thousand rows and write the same structure into 'new_a', 'new_b' and
>>>>>>> 'new_c'. What i'm doing now in a C program is executing the search
>>>>>>> above.
>>>>>>> Then I execute:
>>>>>>>
>>>>>>> INSERT INTO data_a (timestamp, field1, field2 ...[imagine 50 of
>>>>>>> them])
>>>>>>> VALUES ('2008-09-01 00:00:00', 'ABC', 'DEF', ...);
>>>>>>> Get the ID that was assigned to this row since 'id' is a serial field
>>>>>>> and
>>>>>>> the number is assigned sequentially. Say it comes back as '1'.
>>>>>>> INSERT INTO data_b (id, field1, field2 ...[imagine 50 of them])
>>>>>>> VALUES
>>>>>>> ('1', 'ABC', 'DEF', ...);
>>>>>>> INSERT INTO data_c (id, field1, field2 ...[imagine 50 of them])
>>>>>>> VALUES
>>>>>>> ('1', 'ABC', 'DEF', ...);
>>>>>>>
>>>>>>> That moves a copy of the three rows of data form the three tables
>>>>>>> into
>>>>>>> the three separate new tables.
>>>>>>> From the original group of tables, the id for these rows was, let's
>>>>>>> say,
>>>>>>> '1234'. Then I execute:
>>>>>>>
>>>>>>> DELETE FROM data_a where id='1234';
>>>>>>> DELETE FROM data_b where id='1234';
>>>>>>> DELETE FROM data_c where id='1234';
>>>>>>>
>>>>>>> That deletes the old data.
>>>>>>>
>>>>>>> This works fine and gives me exactly what I wanted, but is there a
>>>>>>> better
>>>>>>> way? This is 7 SQL calls and it takes about 3 seconds per moved
>>>>>>> record
>>>>>>> on
>>>>>>> our Linux box.
>>>>>>>
>>>>>>> Any thoughts or suggestions would be appreciated.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Howard
>>>>>>>

Re: Moving data from one set of tables to another?

From:
Howard Eglowstein
Date:
Absolutely true, and if the data weren't stored on the same machine
which is running the client, I would have worked harder to combine
statements. In this case though, the server and the data are on the same
machine and the client application doing the SELECT, INSERT and DELETEs
is also on the same machine.

I'd like to see how to have done this with combined statements if I ever
have to do it again in a different setup, but it is working well now.
It's moved about 1/2 million records so far since last night.

Thanks again!

Howard

Sean Davis wrote:
> On Fri, Sep 19, 2008 at 7:42 AM, Howard Eglowstein
> <howard@yankeescientific.com> wrote:
>
>> So you'd agree then that I'll need 7 SQL statements but that I could stack
>> the INSERT and the first SELECT if I wanted to? Cool. That's what I ended up
>> with in C code and it's working pretty well. I did some indexing on the
>> database and got the whole transaction down to about 150ms for the sequence.
>> I guess that's as good as it's going to get.
>>
>
> Keep in mind that the INSERT [...] SELECT [...] is done server-side,
> so the data never goes over the wire to the client.  This is very
> different than doing the select, accumulating the data, and then doing
> the insert and is likely to be much faster, relatively.  150ms is
> already pretty fast, but the principle of doing as much on the server
> as possible is an important one when looking for efficiency,
> especially when data sizes are large.
>
> Glad to hear that it is working.
>
> Sean
>
>
>> Sean Davis wrote:
>>
>>> On Thu, Sep 18, 2008 at 7:28 PM, Howard Eglowstein
>>> <howard@yankeescientific.com> wrote:
>>>
>>>
>>>> What confuses me is that I need to do the one select with all three
>>>> tables
>>>> and then do three inserts, no? The results is that the 150 fields I get
>>>> back
>>>> from the select have to be split into 3 groups of 50 fields each and then
>>>> written into three tables.
>>>>
>>>>
>>> You do the insert part of the command three times, once for each new
>>> table, so three separate SQL statements.  The select remains basically
>>> the same for all three, with only the column selection changing
>>> (data_a.* when inserting into new_a, data_b.* when inserting into
>>> new_b, etc.).  Just leave the ids the same as in the first set of
>>> tables.  There isn't a need to change them in nearly every case.  If
>>> you need to add a new ID column, you can do that as a serial column in
>>> the new tables, but I would stick to the original IDs, if possible.
>>>
>>> Sean
>>>
>>>
>>>
>>>> What you're suggesting is that there is some statement which could do the
>>>> select and the three inserts at once?
>>>>
>>>> Howard
>>>>
>>>> Sean Davis wrote:
>>>>
>>>>
>>>>> You might want to look at insert into ... select ...
>>>>>
>>>>> You should be able to do this with 1 query per new table (+ the
>>>>> deletes, obviously).  For a few thousand records, I would expect that
>>>>> the entire process might take a few seconds.
>>>>>
>>>>> Sean
>>>>>
>>>>>      On Thu, Sep 18, 2008 at 6:39 PM, Howard Eglowstein
>>>>> <howard@yankeescientific.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>>> Somewhat empty, yes. The single set of 'data_' tables contains 3 years
>>>>>> worth
>>>>>> of data. I want to move 2 years worth out into the 'new_' tables. When
>>>>>> I'm
>>>>>> done, there will still be 1 year's worth of data left in the original
>>>>>> table.
>>>>>>
>>>>>> Howard
>>>>>>
>>>>>> Carol Walter wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> What do you want for your end product?  Are the old tables empty after
>>>>>>> you
>>>>>>> put the data into the new tables?
>>>>>>>
>>>>>>> Carol
>>>>>>>
>>>>>>> On Sep 18, 2008, at 3:02 PM, Howard Eglowstein wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> I have three tables called 'data_a', 'data_b' and 'data_c' which each
>>>>>>>> have 50 columns. One of the columns in each is 'id' and is used to
>>>>>>>> keep
>>>>>>>> track of which data in data_b and data_c corresponds to a row in
>>>>>>>> data_a.  If
>>>>>>>> I want to get all of the data in all 150 fields for this month (for
>>>>>>>> example), I can get it with:
>>>>>>>>
>>>>>>>> select * from (data_a, data_b, data_c) where data_a.id=data_b.id AND
>>>>>>>> data_a.id = data_c.id AND timestamp >= '2008-09-01 00:00:00' and
>>>>>>>> timestamp
>>>>>>>> <= '2008-09-30 23:59:59'
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>
>>>>>>>> What I need to do is execute this search which might return several
>>>>>>>> thousand rows and write the same structure into 'new_a', 'new_b' and
>>>>>>>> 'new_c'. What i'm doing now in a C program is executing the search
>>>>>>>> above.
>>>>>>>> Then I execute:
>>>>>>>>
>>>>>>>> INSERT INTO data_a (timestamp, field1, field2 ...[imagine 50 of
>>>>>>>> them])
>>>>>>>> VALUES ('2008-09-01 00:00:00', 'ABC', 'DEF', ...);
>>>>>>>> Get the ID that was assigned to this row since 'id' is a serial field
>>>>>>>> and
>>>>>>>> the number is assigned sequentially. Say it comes back as '1'.
>>>>>>>> INSERT INTO data_b (id, field1, field2 ...[imagine 50 of them])
>>>>>>>> VALUES
>>>>>>>> ('1', 'ABC', 'DEF', ...);
>>>>>>>> INSERT INTO data_c (id, field1, field2 ...[imagine 50 of them])
>>>>>>>> VALUES
>>>>>>>> ('1', 'ABC', 'DEF', ...);
>>>>>>>>
>>>>>>>> That moves a copy of the three rows of data form the three tables
>>>>>>>> into
>>>>>>>> the three separate new tables.
>>>>>>>> From the original group of tables, the id for these rows was, let's
>>>>>>>> say,
>>>>>>>> '1234'. Then I execute:
>>>>>>>>
>>>>>>>> DELETE FROM data_a where id='1234';
>>>>>>>> DELETE FROM data_b where id='1234';
>>>>>>>> DELETE FROM data_c where id='1234';
>>>>>>>>
>>>>>>>> That deletes the old data.
>>>>>>>>
>>>>>>>> This works fine and gives me exactly what I wanted, but is there a
>>>>>>>> better
>>>>>>>> way? This is 7 SQL calls and it takes about 3 seconds per moved
>>>>>>>> record
>>>>>>>> on
>>>>>>>> our Linux box.
>>>>>>>>
>>>>>>>> Any thoughts or suggestions would be appreciated.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Howard
>>>>>>>>


Re: Moving data from one set of tables to another?

From:
"Sean Davis"
Date:
On Fri, Sep 19, 2008 at 10:48 AM, Howard Eglowstein
<howard@yankeescientific.com> wrote:
> Absolutely true, and if the data weren't stored on the same machine which is
> running the client, I would have worked harder to combine statements. In
> this case though, the server and the data are on the same machine and the
> client application doing the SELECT, INSERT and DELETEs is also on the same
> machine.
>
> I'd like to see how to have done this with combined statements if I ever
> have to do it again in a different setup, but it is working well now. It's
> moved about 1/2 million records so far since last night.

So the 150ms was per row?  Not to belabor the point, but I have done
this with tables with tens of millions of rows in the space of seconds
to minutes (for the entire move, not per row), depending on the exact
details of the table(s).  No overnight run involved.  The network is one
issue (which you have avoided by staying local), but the encoding and
decoding overhead of going out to a client is another, and a server-side
statement avoids it entirely.  When you have some free time, do
benchmark, as I think the difference could be substantial.
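
A quick way to compare, when you do, is psql's \timing plus one of the
server-side statements from earlier, e.g.:

\timing
-- one month moved in a single server-side statement
INSERT INTO new_a
SELECT data_a.*
FROM data_a
WHERE data_a.timestamp >= '2008-09-01 00:00:00'
  AND data_a.timestamp <= '2008-09-30 23:59:59';

against the time the per-row loop takes for the same month.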

Sean

> Sean Davis wrote:
>>
>> On Fri, Sep 19, 2008 at 7:42 AM, Howard Eglowstein
>> <howard@yankeescientific.com> wrote:
>>
>>>
>>> So you'd agree then that I'll need 7 SQL statements but that I could
>>> stack
>>> the INSERT and the first SELECT if I wanted to? Cool. That's what I ended
>>> up
>>> with in C code and it's working pretty well. I did some indexing on the
>>> database and got the whole transaction down to about 150ms for the
>>> sequence.
>>> I guess that's as good as it's going to get.
>>>
>>
>> Keep in mind that the INSERT [...] SELECT [...] is done server-side,
>> so the data never goes over the wire to the client.  This is very
>> different than doing the select, accumulating the data, and then doing
>> the insert and is likely to be much faster, relatively.  150ms is
>> already pretty fast, but the principle of doing as much on the server
>> as possible is an important one when looking for efficiency,
>> especially when data sizes are large.
>>
>> Glad to hear that it is working.
>>
>> Sean
>>
>>
>>>
>>> Sean Davis wrote:
>>>
>>>>
>>>> On Thu, Sep 18, 2008 at 7:28 PM, Howard Eglowstein
>>>> <howard@yankeescientific.com> wrote:
>>>>
>>>>
>>>>>
>>>>> What confuses me is that I need to do the one select with all three
>>>>> tables
>>>>> and then do three inserts, no? The results is that the 150 fields I get
>>>>> back
>>>>> from the select have to be split into 3 groups of 50 fields each and
>>>>> then
>>>>> written into three tables.
>>>>>
>>>>>
>>>>
>>>> You do the insert part of the command three times, once for each new
>>>> table, so three separate SQL statements.  The select remains basically
>>>> the same for all three, with only the column selection changing
>>>> (data_a.* when inserting into new_a, data_b.* when inserting into
>>>> new_b, etc.).  Just leave the ids the same as in the first set of
>>>> tables.  There isn't a need to change them in nearly every case.  If
>>>> you need to add a new ID column, you can do that as a serial column in
>>>> the new tables, but I would stick to the original IDs, if possible.
>>>>
>>>> Sean
>>>>
>>>>
>>>>
>>>>>
>>>>> What you're suggesting is that there is some statement which could do
>>>>> the
>>>>> select and the three inserts at once?
>>>>>
>>>>> Howard
>>>>>
>>>>> Sean Davis wrote:
>>>>>
>>>>>
>>>>>>
>>>>>> You might want to look at insert into ... select ...
>>>>>>
>>>>>> You should be able to do this with 1 query per new table (+ the
>>>>>> deletes, obviously).  For a few thousand records, I would expect that
>>>>>> the entire process might take a few seconds.
>>>>>>
>>>>>> Sean
>>>>>>
>>>>>>     On Thu, Sep 18, 2008 at 6:39 PM, Howard Eglowstein
>>>>>> <howard@yankeescientific.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Somewhat empty, yes. The single set of 'data_' tables contains 3
>>>>>>> years
>>>>>>> worth
>>>>>>> of data. I want to move 2 years worth out into the 'new_' tables.
>>>>>>> When
>>>>>>> I'm
>>>>>>> done, there will still be 1 year's worth of data left in the original
>>>>>>> table.
>>>>>>>
>>>>>>> Howard
>>>>>>>
>>>>>>> Carol Walter wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> What do you want for your end product?  Are the old tables empty
>>>>>>>> after
>>>>>>>> you
>>>>>>>> put the data into the new tables?
>>>>>>>>
>>>>>>>> Carol
>>>>>>>>
>>>>>>>> On Sep 18, 2008, at 3:02 PM, Howard Eglowstein wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> I have three tables called 'data_a', 'data_b' and 'data_c' which
>>>>>>>>> each
>>>>>>>>> have 50 columns. One of the columns in each is 'id' and is used to
>>>>>>>>> keep
>>>>>>>>> track of which data in data_b and data_c corresponds to a row in
>>>>>>>>> data_a.  If
>>>>>>>>> I want to get all of the data in all 150 fields for this month (for
>>>>>>>>> example), I can get it with:
>>>>>>>>>
>>>>>>>>> select * from (data_a, data_b, data_c) where data_a.id=data_b.id
>>>>>>>>> AND
>>>>>>>>> data_a.id = data_c.id AND timestamp >= '2008-09-01 00:00:00' and
>>>>>>>>> timestamp
>>>>>>>>> <= '2008-09-30 23:59:59'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>
>>>>>>
>>>>>>>>>
>>>>>>>>> What I need to do is execute this search which might return several
>>>>>>>>> thousand rows and write the same structure into 'new_a', 'new_b'
>>>>>>>>> and
>>>>>>>>> 'new_c'. What i'm doing now in a C program is executing the search
>>>>>>>>> above.
>>>>>>>>> Then I execute:
>>>>>>>>>
>>>>>>>>> INSERT INTO data_a (timestamp, field1, field2 ...[imagine 50 of
>>>>>>>>> them])
>>>>>>>>> VALUES ('2008-09-01 00:00:00', 'ABC', 'DEF', ...);
>>>>>>>>> Get the ID that was assigned to this row since 'id' is a serial
>>>>>>>>> field
>>>>>>>>> and
>>>>>>>>> the number is assigned sequentially. Say it comes back as '1'.
>>>>>>>>> INSERT INTO data_b (id, field1, field2 ...[imagine 50 of them])
>>>>>>>>> VALUES
>>>>>>>>> ('1', 'ABC', 'DEF', ...);
>>>>>>>>> INSERT INTO data_c (id, field1, field2 ...[imagine 50 of them])
>>>>>>>>> VALUES
>>>>>>>>> ('1', 'ABC', 'DEF', ...);
>>>>>>>>>
>>>>>>>>> That moves a copy of the three rows of data form the three tables
>>>>>>>>> into
>>>>>>>>> the three separate new tables.
>>>>>>>>> From the original group of tables, the id for these rows was, let's
>>>>>>>>> say,
>>>>>>>>> '1234'. Then I execute:
>>>>>>>>>
>>>>>>>>> DELETE FROM data_a where id='1234';
>>>>>>>>> DELETE FROM data_b where id='1234';
>>>>>>>>> DELETE FROM data_c where id='1234';
>>>>>>>>>
>>>>>>>>> That deletes the old data.
>>>>>>>>>
>>>>>>>>> This works fine and gives me exactly what I wanted, but is there a
>>>>>>>>> better
>>>>>>>>> way? This is 7 SQL calls and it takes about 3 seconds per moved
>>>>>>>>> record
>>>>>>>>> on
>>>>>>>>> our Linux box.
>>>>>>>>>
>>>>>>>>> Any thoughts or suggestions would be appreciated.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Howard
>>>>>>>>>

Re: Moving data from one set of tables to another?

From:
Howard Eglowstein
Date:
There are a lot of issues at work here: the speed of the machine, the
rest of the machine's workload, the database configuration, etc. This
machine is about 3 years old and not as fast as a test machine I have at
my desk. It's also running three web services and accepting new data
into the current year's tables at the rate of one set of rows every few
seconds. The database had no indices at all when I started. I indexed a
few columns, which seemed to help tremendously (a factor of 10 at
least), and perhaps a few more might help.
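
For reference, the obvious candidates for the queries in this thread are
indexes roughly like these (the names are just examples): one on the
timestamp column being filtered on, and one on each id column being
joined on.

CREATE INDEX data_a_timestamp_idx ON data_a (timestamp);
CREATE INDEX data_b_id_idx ON data_b (id);
CREATE INDEX data_c_id_idx ON data_c (id);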

Considering that a search across the whole database, with each record
split into 3 rows, still takes a minute or more, I suspect there are
still organizational issues that could be addressed to speed up all PG
operations. For now I'm far more concerned with robustness, and I'm not
too keen on trying too many experiments until I get the data broken up
and backed up again.

I doubt this machine could perform 7 SQL operations on 1.5 million rows
in each of 3 tables in a few seconds or minutes on a good day, with the
wind, rolling downhill. I'd like to be proven wrong, though...

Howard

Sean Davis wrote:
> On Fri, Sep 19, 2008 at 10:48 AM, Howard Eglowstein
> <howard@yankeescientific.com> wrote:
>
>> Absolutely true, and if the data weren't stored on the same machine which is
>> running the client, I would have worked harder to combine statements. In
>> this case though, the server and the data are on the same machine and the
>> client application doing the SELECT, INSERT and DELETEs is also on the same
>> machine.
>>
>> I'd like to see how to have done this with combined statements if I ever
>> have to do it again in a different setup, but it is working well now. It's
>> moved about 1/2 million records so far since last night.
>>
>
> So the 150ms was per row?  Not to belabor the point, but I have done
> this with tables with tens-of-millions of rows in the space of seconds
> to minutes (for the entire move, not per row), depending on the exact
> details of the table(s).  No overnight involved.  The network is one
> issue (which you have avoided by being local), but the encoding and
> decoding overhead to go to a client is another one that is entirely
> avoided.  When you have some free time, do benchmark, as I think the
> difference could be substantial.
>
> Sean


Re: Moving data from one set of tables to another?

From
Carol Walter
Date:
Database tuning can really be an issue.  I have a development copy
and a production copy of most of my databases.  They are on two
different machines.  The databases used to be tuned the same way;
however, one of the machines has more processing power and one has
more memory.  When we retuned the databases to take advantage of
each machine's strong points, it cut the time it took to run some
queries by a factor of four.

Carol

P.S.  If I understand your process correctly, you're deleting the
records as you go, and that would make me really nervous.  As soon
as you start, you no longer have an intact table that has all the
data in it.  While modern database engines do a lot to protect your
data, there is always some quirk that can happen.  If you have
enough space, you might consider running the delete after the tables
are created.
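
To make that concrete, a deferred cleanup along the lines Carol suggests might
look like the following sketch, assuming the ids are carried over unchanged
into the new_ tables:

-- Run only after the copy into new_a/new_b/new_c has finished and been verified.
DELETE FROM data_b WHERE id IN (SELECT id FROM new_b);
DELETE FROM data_c WHERE id IN (SELECT id FROM new_c);
DELETE FROM data_a WHERE id IN (SELECT id FROM new_a);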


On Sep 19, 2008, at 11:07 AM, Howard Eglowstein wrote:

> There are a lot of issues at work here. The speed of the machine,
> the rest of the machine's workload, the database configuration,
> etc. This machine is about 3 years old and not as fast as a test
> machine I have at my desk. It's also running three web services and
> accepting new data into the current year's tables at the rate of
> one set of rows every few seconds. The database when I started
> didn't have any indices applied. I indexed a few columns which
> seemed to help tremendously (a factor of 10 at least) and perhaps a
> few more might help.
>
> Considering that a search across the whole database, with each
> record's data split across the 3 tables, now takes a minute or more,
> I suspect that there are still organizational issues that could be
> addressed to speed up all PG operations. I'm far more concerned
> with robustness, and I'm not too keen on trying too many experiments
> until I get the data broken up and backed up again.
>
> I doubt this machine could perform 7 SQL operations on 1.5 million
> rows in each of 3 tables in a few seconds or minutes on a good day,
> with the wind, rolling downhill. I'd like to be proven wrong
> though...
>
> Howard
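
The columns Howard indexed aren't named in the thread, but given the join and
date filter in the original query, indexes along these lines would be the
natural candidates (the index names and column choices here are assumptions):

CREATE INDEX data_a_timestamp_idx ON data_a (timestamp);
CREATE INDEX data_b_id_idx ON data_b (id);
CREATE INDEX data_c_id_idx ON data_c (id);
-- If data_a.id is not already a primary key or unique column, index it as well:
CREATE INDEX data_a_id_idx ON data_a (id);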


Re: Moving data from one set of tables to another?

From
Howard Eglowstein
Date:
Yes, I have been deleting them as I go. I thought about running one pass
to move the data over and a second one to then delete the records. The
data in two of the tables is only loosely linked to the data in the
first by the value in one column, and I was concerned about how to know
where to restart the process if it stopped and I had to restart it
later. Deleting the three rows after the database reported successfully
writing the three new ones seemed like a good idea at the time. I don't
want to stop the process now, but I'll look at having the program keep
track of its progress and then go back and delete the old data when it's
done.
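
One way to sidestep the restart question is to make the copy pass itself
restartable, so that rerunning it simply skips rows that were already copied.
A sketch, assuming (as in the original post) that only data_a carries the
timestamp, that id is unique within each table, and using a placeholder cutoff
for whatever range is being moved:

-- Safe to rerun: rows already present in new_a are skipped.
INSERT INTO new_a
SELECT data_a.*
FROM data_a
WHERE data_a.timestamp < '2008-01-01 00:00:00'   -- placeholder cutoff for the years being moved
  AND NOT EXISTS (SELECT 1 FROM new_a WHERE new_a.id = data_a.id);

-- data_b and data_c have no timestamp, so key them off the ids already copied into new_a.
INSERT INTO new_b
SELECT data_b.*
FROM data_b
WHERE EXISTS (SELECT 1 FROM new_a WHERE new_a.id = data_b.id)
  AND NOT EXISTS (SELECT 1 FROM new_b WHERE new_b.id = data_b.id);

INSERT INTO new_c
SELECT data_c.*
FROM data_c
WHERE EXISTS (SELECT 1 FROM new_a WHERE new_a.id = data_c.id)
  AND NOT EXISTS (SELECT 1 FROM new_c WHERE new_c.id = data_c.id);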

And yes, I do have a complete backup of the data from before I started
any of this. I can easily go back to where I was and try again or tweak
the process as needed. The database tuning is a problem I think we've had
since before this procedure, and I'll have to look at it again after this
data is moved around.

Howard

Carol Walter wrote:
> Database tuning can really be an issue.  I have a development copy and
> a production copy of most of my databases.  They are on two different
>> machines.  The databases used to be tuned the same way; however, one of
>> the machines has more processing power and one has more memory.  When
>> we retuned the databases to take advantage of each machine's strong
>> points, it cut the time it took to run some queries by a factor of four.
>
> Carol
>
>> P.S.  If I understand your process correctly, you're deleting the records
>> as you go, and that would make me really nervous.  As soon as you start,
>> you no longer have an intact table that has all the data in it.  While
>> modern database engines do a lot to protect your data, there is
> always some quirk that can happen.  If you have enough space, you
> might consider running the delete after the tables are created.


Re: Moving data from one set of tables to another?

From
"Sean Davis"
Date:
On Fri, Sep 19, 2008 at 12:04 PM, Howard Eglowstein
<howard@yankeescientific.com> wrote:
> Yes, I have been deleting them as I go. I thought about running one pass to
> move the data over and a second one to then delete the records. The data in
> two of the tables is only loosely linked to the data in the first by the
> value in one column, and I was concerned about how to know where to restart
> the process if it stopped and I had to restart it later. Deleting the three
> rows after the database reported successfully writing the three new ones
> seemed like a good idea at the time. I don't want to stop the process now,
> but I'll look at having the program keep track of its progress and then go
> back and delete the old data when it's done.
>
> And yes, I do have a complete backup of the data from before I started any
> of this. I can easily go back to where I was and try again or tweak the
> process as needed. The database tuning is a problem I think we have from
> before this procedure and I'll have to look at again after this data is
> moved around.

You might want to do all of this--inserts and deletes--within a
transaction.  Then, if ANY step fails, the entire process can be
rolled back.
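
A minimal sketch of what that could look like, using the September date range
from the original post purely as an example batch and assuming the ids are
carried over unchanged into the new_ tables:

BEGIN;

INSERT INTO new_a
SELECT data_a.* FROM data_a, data_b, data_c
WHERE data_a.id = data_b.id AND data_a.id = data_c.id
  AND data_a.timestamp BETWEEN '2008-09-01 00:00:00' AND '2008-09-30 23:59:59';

INSERT INTO new_b
SELECT data_b.* FROM data_a, data_b, data_c
WHERE data_a.id = data_b.id AND data_a.id = data_c.id
  AND data_a.timestamp BETWEEN '2008-09-01 00:00:00' AND '2008-09-30 23:59:59';

INSERT INTO new_c
SELECT data_c.* FROM data_a, data_b, data_c
WHERE data_a.id = data_b.id AND data_a.id = data_c.id
  AND data_a.timestamp BETWEEN '2008-09-01 00:00:00' AND '2008-09-30 23:59:59';

-- Delete only what was copied above; inside the transaction these see the new rows.
DELETE FROM data_b WHERE id IN (SELECT id FROM new_b);
DELETE FROM data_c WHERE id IN (SELECT id FROM new_c);
DELETE FROM data_a WHERE id IN (SELECT id FROM new_a);

COMMIT;  -- or ROLLBACK; if anything went wrong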
