Re: Postgres code for a query intermediate dataset

Поиск
Список
Период
Сортировка
От Atri Sharma
Тема Re: Postgres code for a query intermediate dataset
Дата
Msg-id CAOeZVied1mQ5CUKGL7rMTOG0tmh1yd+Wow_VHwGWMmisWE7g+g@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Postgres code for a query intermediate dataset  (Mark Kirkwood <mark.kirkwood@catalyst.net.nz>)
Список pgsql-hackers


On Sun, Sep 14, 2014 at 1:30 PM, Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote:
On 14/09/14 19:25, Atri Sharma wrote:


On Sunday, September 14, 2014, Mark Kirkwood
<mark.kirkwood@catalyst.net.nz <mailto:mark.kirkwood@catalyst.net.nz>>

wrote:

    On 14/09/14 05:36, Rohit Goyal wrote:

        Hi All,

        I want to work on the code of intermediate dataset of select and
        update
        query.

        For example.

        Rohit's salary has been updated 4 times, so it has 4 different
        version
        of salary.

        I want to select  salary of person named Rohit. Now suppose , in
        intermediate result, I found 4 different versions of the data. I
        want to
        know the code portion which i need to look for working on all 4
        versions
        in dataset. :)



    Hi Rohit,

    Currently in Postgres, these intermediate versions all exist -
    however a given session can only see one of them. Also VACUUM is
    allowed to destroy versions that no other transactions can see.

    So if I'm understanding you correctly, you would like to have some
    way for a session to see *all* these versions (and I guess
    preventing VACUUM from destroying them).



Any modifications of that sort are bound to introduce lots of pain, not
to mention performance degradation and the added responsibility of
ensuring that dead tuples don't bloat up the system (prevent vacuum from
running at regular intervals and you can have a xid wraparound).

I just mentioned that in case you are planning to go in that direction.
If you only want the data, use the triggers as Gavin mentioned.


Obviously in the general case sure - but (as yet) we don't have much idea about Rohit's use case and workload. If retrieving past versions is the *primary* workload bias and high update concurrency is not required then this could well work better than a trigger based solution.

And it does not seem too onerous to have the ability to switch this on as required, viz:

ALTER TABLE table1 VERSIONING;

(or similar syntax) which makes VACUUM leave this table alone. It might make more sense to make such a concept apply to a TABLESPACE instead mind you (i.e things in here are for archive/versioning purposes)...




What I think can be done is have a tuplestore which has the delta of updated rows i.e. only have the changes made in an update statement stored in a tuplestore (it could be a part of RelationData). It should be simple enough to have tuplestore store the oid of the inserted tuple and the difference between new tuple and the old tuple. No changes need to be done for old tuple since it can be marked as deleted and VACUUM can remove it as normal logic.

Not a clean way, but should work for what you proposed.

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Atri Sharma
Дата:
Сообщение: Re: Postgres code for a query intermediate dataset
Следующее
От: Mark Kirkwood
Дата:
Сообщение: Re: Postgres code for a query intermediate dataset