Re: [GENERAL] efficiently migrating 'old' data from one table toanother

Поиск
Список
Период
Сортировка
От Adrian Klaver
Тема Re: [GENERAL] efficiently migrating 'old' data from one table toanother
Дата
Msg-id cda6e688-ad35-096e-24ab-3d5c2d789a55@aklaver.com
обсуждение исходный текст
Ответ на [GENERAL] efficiently migrating 'old' data from one table to another  (Jonathan Vanasco <postgres@2xlp.com>)
Ответы Re: [GENERAL] efficiently migrating 'old' data from one table to another
Список pgsql-general
On 01/12/2017 12:06 PM, Jonathan Vanasco wrote:
> I'm just wondering if there's a more efficient way of handling a certain periodic data migration.
>
> We have a pair of tables with this structure:
>
>     table_a__live
>         column_1 INT
>         column_2 INT
>         record_timestamp TIMESTAMP
>
>     table_a__archive
>         column_1 INT
>         column_2 INT
>         record_timestamp TIMESTAMP
>
> periodically, we must migrate items that are 'stale' from `table_a__live ` to `table_a__archive`.  The entries are
copiedover to the archive, then deleted. 
>
> The staleness is calculated based on age--  so we need to use INTERVAL.  the "live" table can have anywhere from 100k
to20MM records. 
>
> the primary key on `table_a__live` is a composite of column_1 & column_2,
>
> In order to minimize scanning the table, we opted to hint migrations with a dedicated column:
>
>     ALTER TABLE table_a__live ADD is_migrate BOOLEAN DEFAULT NULL;
>     CREATE INDEX idx_table_a__live_migrate ON table_a__live(is_migrate) WHERE is_migrate IS NOT NULL;
>
> so our migration is then based on that `is_migrate` column:
>
>     BEGIN;
>     UPDATE table_a__live SET is_migrate = TRUE WHERE record_timestamp < transaction_timestamp() AT TIME ZONE 'UTC' -
INTERVAL'1 month'; 
>     INSERT INTO table_a__archive (column_1, column_2, record_timestamp) SELECT column_1, column_2, record_timestamp
FROMtable_a__live WHERE is_migrate IS TRUE; 
>     DELETE FROM table_a__live WHERE is_migrate IS TRUE;
>     COMMIT;
>
> The inserts & deletes are blazing fast, but the UPDATE is a bit slow from postgres re-writing all the rows.

Maybe I am missing something, but why do the UPDATE?

Why not?:

BEGIN;

    INSERT INTO
        table_a__archive (column_1, column_2, record_timestamp)
    SELECT
        column_1, column_2, record_timestamp
    FROM
    table_a__live
    WHERE
        record_timestamp < transaction_timestamp() AT TIME ZONE 'UTC' - INTERVAL '1 month';

    DELETE FROM
       table_a__live
    WHERE
       record_timestamp < transaction_timestamp() AT TIME ZONE 'UTC' - INTERVAL '1 month';
COMMIT;

With an index on record_timestamp.

>
> can anyone suggest a better approach?
>
> I considered copying everything to a tmp table then inserting/deleting based on that table -- but there's a lot of
disk-ioon that approach too. 
>
>
> fwiw we're on postgres9.6.1
>


--
Adrian Klaver
adrian.klaver@aklaver.com


В списке pgsql-general по дате отправления:

Предыдущее
От: John R Pierce
Дата:
Сообщение: Re: [GENERAL] efficiently migrating 'old' data from one table toanother
Следующее
От: "David G. Johnston"
Дата:
Сообщение: Re: [GENERAL] efficiently migrating 'old' data from one table to another