Re: Deletion Challenge

Поиск
Список
Период
Сортировка
От David G. Johnston
Тема Re: Deletion Challenge
Дата
Msg-id CAKFQuwbohNEjm+6A=HY8CV9sUKFOJ2LvuvnXTqKsnjCrrGrOZw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Deletion Challenge  (Adrian Klaver <adrian.klaver@aklaver.com>)
Ответы Re: Deletion Challenge  (Steve Crawford <scrawford@pinpointresearch.com>)
Список pgsql-general
On Wed, Dec 9, 2015 at 1:31 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 12/09/2015 12:24 AM, Berend Tober wrote:
Adrian Klaver wrote:
On 12/05/2015 08:08 AM, Berend Tober wrote:
/*

Deletion Challenge

I want to delete all but the most recent transaction, per person, from a
table that records a transaction history because at some point the
transaction history grows large enough to adversely effect performance,
and also becomes less relevant for retention.

...


test=> delete from cash_journal where ARRAY[click, cash_journal_id]
NOT in (select max(ARRAY[click,cash_journal_id]) from cash_journal
group by fairian_id);
DELETE 7

test=> SELECT * FROM cash_journal order by fairian_id, click,
cash_journal_id;
  click | cash_journal_id | fairian_id | debit | credit | balance
|           description
-------+-----------------+------------+-------+--------+---------+----------------------------------

    412 |               1 |          7 |     5 |        |      14 |
Sold food quantity 7 units.
     37 |               7 |          8 |     8 |        |       8 |
Ratified contract f1abd670358e03
     37 |               9 |          9 |     7 |        |       7 |
Ratified contract 1574bddb75c78a
     36 |              14 |         18 |     0 |      0 |       0 |
initial cash balance
    413 |               1 |         25 |       |    995 |       0 |
Redeemed bond 7719a1c782a1ba
(5 rows)


Nice.

The idea of a NOT IN query had occurred to me briefly, but I failed to
pursue it because at some point in the distant past I had gained the
impression that NOT IN queries were not computationally efficient.
During one round of testing I had like a million rows. I'll have to run
some EXPLAIN query testing with a larger data sample for comparison.
Thanks!

Plan B:

WITH d AS
    (SELECT * FROM
        cash_journal
    LEFT JOIN
        (SELECT
            MAX(ARRAY[click,cash_journal_id]) AS mx
        FROM
            cash_journal
        GROUP BY
            fairian_id)
        AS
            mxa
    ON
        mxa.mx=ARRAY[click, cash_journal_id]
    WHERE
        mx IS NULL)
DELETE FROM
    cash_journal
USING
    d
WHERE
    d.click = cash_journal.click
AND
    d.cash_journal_id = cash_journal.cash_journal_id;


​Couldn't the LEFT JOIN relation in the CTE be better written using "SELECT DISTINCT ON (click, cash_journal_id) click, cash_journal_id [...] ORDER BY click DESC, cash_journal_id" or something similar?  It doesn't seem like you should need to introduce an array and an aggregate here.

​It does have the negative property of only providing a single row; which excludes using it for the "last 5" part but I suspect it will be considerably faster for the single version.

David J.

В списке pgsql-general по дате отправления:

Предыдущее
От: Adrian Klaver
Дата:
Сообщение: Re: Deletion Challenge
Следующее
От: Steve Crawford
Дата:
Сообщение: Re: Deletion Challenge