Re: PL/PGSQL + inserts+updates+limit - Postgres 9.3

Поиск
Список
Период
Сортировка
От David G. Johnston
Тема Re: PL/PGSQL + inserts+updates+limit - Postgres 9.3
Дата
Msg-id CAKFQuwZvt1DFKAKTRtsKEGrCFV+=wfcuavaKpGTeKoWfGfLTtA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: PL/PGSQL + inserts+updates+limit - Postgres 9.3  (Patrick Baker <patrickbakerbr@gmail.com>)
Список pgsql-general
On Thu, Jun 2, 2016 at 5:03 PM, Patrick Baker <patrickbakerbr@gmail.com> wrote:


2016-06-03 2:10 GMT+12:00 David G. Johnston <david.g.johnston@gmail.com>:
On Thu, Jun 2, 2016 at 1:04 AM, Patrick Baker <patrickbakerbr@gmail.com> wrote:

It's all working, except the LIMIT...  if possible can you please give me an example of that LIMIT in some of those queries?

​​
You also should use ORDER BY when using LIMIT and OFFSET; though depending on the setup it could be omitted.  Usually as long as the second execution cannot select any of the records the first execution touched you can choose a random quantity.  But if you want random then using OFFSET is pointless.

​SELECT *
FROM generate_series(1, 10)
ORDER BY 1
LIMIT 5
OFFSET 3

generate_series
----------------------
4
5
6
7
8
You are going to have difficultly finding people willing to help when you cannot put together a self-contained and syntax error free example (I think the last one is...) of what you want to do.  The PostgreSQL parser is very good at reading code and telling you what it doesn't like.  I'm not inclined to spend time reading queries that obviously cannot run and point out those same problems.  If you can a particular error you don't understand I'll be happy to try and explain what it is trying to tell you.

​You probably need to reformulate your update to read:

UPDATE tbl
FROM (
SELECT 50 RECORDS
)​ src
WHERE src = tbl;

​And ensure that the 50 being selected each time through are a different 50.

Writeable CTEs will probably help here.


​David J.



Hi David.

The SQLs inside the function works....

​Really?  ​You seem to have lost your FOR loop for starters, and your RETURN statement, and a semi-colon after END, and I doubt crtRow.file_id works, should I go on...so, yes, you can run the four individual SQL statements correctly but the function itself is bogus.
 
I'm just having problem about limiting the query to the number of rows I want, and also, to teach the update SQL to only touch the records the other SQLs inside the function have touched.
This is the function updated:

CREATE or REPLACE FUNCTION function_data_1(rows integer)
RETURNS INTEGER AS $$

declare
  completed integer;
  offset_num integer;
  crtRow record;

BEGIN
  offset_num = 0;

INSERT INTO table2_y_b (note_id, size, file_id, full_path)
    (
            SELECT
                    t1.note_id,
                    t1.size,
                    t1.file_id,
                    t1.full_path
            FROM
                    table1_n_b t1
            JOIN
                    table3_n_b t3 ON t3.file_id = t1.file_id
    );

UPDATE table2_y_b t2 SET segment_data =
    (
            SELECT
                    o1.data
            FROM
                    original_table1_b o1
            JOIN
                    table3_n_b t3 ON t3.file_id = o1.file_id
            WHERE
                    t2.migrated = 0
            AND
                    t2.file_id = o1.file_id
    );

UPDATE table2_y_b SET migrated = 1 WHERE file_id = crtRow.file_id AND migrated = 0;

UPDATE original_table1_b SET data = NULL WHERE file_id = crtRow.file_id;

END

$$ language 'plpgsql';




- As you can see, the first insert, inserts data into a new table from another select. This query must be limited by the number of rows I'll provide when calling the function; example:

select function_data_1(5000);
select function_data_1(60000);
select function_data_1(15000);

- The first update, copies the BLOBS from the original_table1_b table into the new one (as above). Here, I also need the query knows to only touch those records that have been touched by the above query.


- The second update, set the table2_y_b.migrated column from 0 to 1, telling me that, that record has been touched by the query. So the next call ( select function_data_1(60000); ) will already know that it does not need to touch that record; example:

WHERE
                    t2.migrated = 0

- The third and last update, deletes (set the blobs column as null) the blobs that have already been touched by the above queries.... Still.. don't know how to tell postgres to only touches the rows that have been touched by the above queries....


​Here's a fish - though you will still need to clean it.​

​This is not tested, and I haven't ever build this exact query for real, but it should work in theory...

--assumes that to be migrated records have previously had their migrated flag set to 0

function name (number_of_rows_to_process integer)
LANGUAGE sql -- this no longer requires procedural logic so no need for plpgsql
RETURNS SETOF bigint --returns the affected ids
AS $$
WITH the_records_I_want_to_affect AS (
-- pick N records to process
SELECT id, ...
FROM source_tbl
WHERE migrated = 0
​ORDER BY ...
LIMIT number_of_rows_to_process  -- your function argument goes here
FOR UPDATE
)​,
migrate_the_data AS (
-- place a copy of them into the archive table
INSERT INTO migration_table
SELECT id, ...
FROM the_records_I_want_to_affect 
RETURNING *
),
mark_as_migrated AS (
-- mark them as having been archived and nullify the blob data
UPDATE source_tbl
SET migrated = 1, data = null
FROM migrate_the_data recs
RETURNING source_tbl.id
)
SELECT id FROM mark_as_migrated;
$$

​I am sure a fully working version of this idiom in present in one and more places on the internet.  Feel free to search out fully working examples with additional commentary.​

You can make a FOR loop version of this work, and had to many years ago before writable CTEs were implemented.

David J.

В списке pgsql-general по дате отправления:

Предыдущее
От: Kevin Grittner
Дата:
Сообщение: Re: dumb question
Следующее
От: "David G. Johnston"
Дата:
Сообщение: Re: dumb question