Обсуждение: Implementing incremental client updates

Поиск
Список
Период
Сортировка

Implementing incremental client updates

От
Nii Gii
Дата:
Dear all,

I am a newcomer to postgres and love it so far. I've given this problem a lot of thought already, RTFM to the best of my ability, but hit a dead end, so I need a nudge in the right direction.

I'm designing a database where each entity of interest has a "rowversion" column that gets assigned a value from a global sequence. So, in the simplest scenario, if I have two rows in table "emps": emp1 with rowversion@3 and emp2 with rowversion@5, then I know emp2 was modified after emp1. 

This is to form the foundation of a data sync scenario, where a client knows they have everything up until @3 and now they need the latest updates (SELECT * FROM emps WHERE rowversion>3 and rowversion<=new_anchor). The problem here is, there's no way to interrogate the database for the "latest committed @rowversion that has none pending before it". An example scenario:

@3 - committed
@4 - committed
@5 - committed
@6 - in progress - not committed yet
@7 - in progress - not committed yet
@8 - committed
@9 - committed

When client asks for updated records, we ask the database for an appropriate new_anchor. Since the rows with rowversion @6 and @7 are still in progress, new_anchor has to be @5, so that our range query doesn't miss any uncommitted updates. Now the client can be confident it has everything up until @5.

So the actual problem distilled: how can this new_anchor be safely determined each time?

As you can probably tell I've borrowed this idea from SQL Server, where this problem is trivially solved by the min_active_rowversion() function. This function would return @6 in the above scenario, so your new_anchor is always going to be "min_active_rowversion()-1". I sort of had an idea how this could be implemented in postgres using an "active_rowversions" table, and a "SELECT min(id) FROM active_rowversions" but that would require READ UNCOMMITTED isolation, which is not available in postgres.

I would really appreciate any help or ideas.

Best regards,
Nick

Re: Implementing incremental client updates

От
Glyn Astill
Дата:
> From: Nii Gii <nkgm.realm@gmail.com>
>To: pgsql-sql@postgresql.org 
>Sent: Tuesday, 10 February 2015, 11:39
>Subject: [SQL] Implementing incremental client updates
> 
>
>
>Dear all,
>
>
>I am a newcomer to postgres and love it so far. I've given this problem a lot of thought already, RTFM to the best of
myability, but hit a dead end, so I need a nudge in the right direction. 
 
>
>
>I'm designing a database where each entity of interest has a "rowversion" column that gets assigned a value from a
globalsequence. So, in the simplest scenario, if I have two rows in table "emps": emp1 with rowversion@3 and emp2 with
rowversion@5,then I know emp2 was modified after emp1. 
 
>
>
>This is to form the foundation of a data sync scenario, where a client knows they have everything up until @3 and now
theyneed the latest updates (SELECT * FROM emps WHERE rowversion>3 and rowversion<=new_anchor). The problem here is,
there'sno way to interrogate the database for the "latest committed @rowversion that has none pending before it". An
examplescenario:
 
>
>
>@3 - committed
>@4 - committed
>@5 - committed
>@6 - in progress - not committed yet
>@7 - in progress - not committed yet
>@8 - committed
>@9 - committed
>
>
>When client asks for updated records, we ask the database for an appropriate new_anchor. Since the rows with
rowversion@6 and @7 are still in progress, new_anchor has to be @5, so that our range query doesn't miss any
uncommittedupdates. Now the client can be confident it has everything up until @5.
 
>
>
>So the actual problem distilled: how can this new_anchor be safely determined each time?
>
>
>As you can probably tell I've borrowed this idea from SQL Server, where this problem is trivially solved by the
min_active_rowversion()function. This function would return @6 in the above scenario, so your new_anchor is always
goingto be "min_active_rowversion()-1". I sort of had an idea how this could be implemented in postgres using an
"active_rowversions"table, and a "SELECT min(id) FROM active_rowversions" but that would require READ UNCOMMITTED
isolation,which is not available in postgres.
 
>
>
>I would really appreciate any help or ideas.

>

I guess one way to tackle it would be to try and make the assignment of rowversions from the sequence transactional in
someway.  I'm not sure if that's possible without reinventing some sort of sequence behaviour with locking, but you
mightbe able to achieve something workable by using a deferred trigger on the table to assign the sequence after the
initialmodification at transaction commit.  There could be a race condition with this approach, but something like: 
 


CREATE OR REPLACE FUNCTION update_rowversion() 
RETURNS trigger AS $$
BEGIN   UPDATE emps SET rowversion = nextval('rowversion_seq') WHERE <some pk> = NEW.<some pk>;   RETURN NEW;
END;
$$ LANGUAGE plpgsql VOLATILE;

CREATE CONSTRAINT TRIGGER emps_insert_rowversion
AFTER INSERT ON emps
DEFERRABLE INITIALLY DEFERRED
FOR EACH ROW 
EXECUTE PROCEDURE update_rowversion();

CREATE CONSTRAINT TRIGGER emps_update_rowversion
AFTER UPDATE ON emps
DEFERRABLE INITIALLY DEFERRED
FOR EACH ROW
WHEN (OLD.rowversion IS NOT DISTINCT FROM NEW.rowversion)
EXECUTE PROCEDURE update_rowversion();

Obviously the downsides are that you're now doing extra work; an extra update, and the overhead of maintaining the
deferred-triggeraction queue. 
 

Another way to keep track of what is committed is to track the current snapshot, but there are complications there such
asfreezing of rows and xids wrapping around every 4 billion transcations.