Re: Finding latest record for a number of groups in an INSERT-only table

Поиск
Список
Период
Сортировка
От Alban Hertroys
Тема Re: Finding latest record for a number of groups in an INSERT-only table
Дата
Msg-id 9AC6F5A9-35DF-442D-87F0-D16AFE7F8148@solfertje.student.utwente.nl
обсуждение исходный текст
Ответ на Re: Finding latest record for a number of groups in an INSERT-only table  (Daniel Farina <daniel@heroku.com>)
Ответы Re: Finding latest record for a number of groups in an INSERT-only table  (Daniel Farina <daniel@heroku.com>)
Re: Finding latest record for a number of groups in an INSERT-only table  (Chris Travers <chris.travers@gmail.com>)
Список pgsql-general
On 5 Jul 2011, at 9:13, Daniel Farina wrote:

>> Setup a materialized view.
>
> This rather defeats the point AFAIK, because keeping the materialized
> view up to date (being more than thirty seconds out of date is not
> desirable) will be expensive.  Maintaining the index on the (key,
> recency) pair is, in effect, a materialization of sorts already.

Except that you can't query an index directly and you can't create an index that only contains the latest rows without
markingthem as such, which is practically what a materialized view does. 

Your concerns about the overhead of the materialized view seem rather pessimistic. The materialization adds some
overheadto every insert/update/delete to test whether the current record in the materialized view should be replaced or
kept,but since it's a much smaller table, that should be quite fast. 

It's basically just a set of insert/update/delete triggers that fire once for every row, or once for every statement
(thatcan be more efficient with insert-heavy tables, as long as you group your inserts into a single statement). 

Whether you can handle the extra overhead depends on your insert-load.

You could also go with an intermediate solution, where you cache the latest record per key in the new inserts in a
batchand send those to a different table once the batch finishes processing. The insert-load on that table will be much
lower,so there will be more time to execute triggers of some kind to update the materialized view. Of course, you also
introducesome lag there with respect to the visibility of those rows. 

> In any case, as I was saying: there are terrible workarounds for this,
> but I think this is a rather common problem with INSERT-mostly
> relations that effectively want row-versioning of a sort, so I was
> hoping that lucid solutions to this issue have grown since 2008, when
> the thread I linked to transpired.


You really only want to know that a row is the latest in a group of similar rows. With MVCC, you can only really
achievethat by updating the latest row and the last latest row to reflect their status (latest or not). That means you
dotwo DELETE and INSERT operations every time you want to update what the latest row is. 

That's hardly different from using triggers anyway.

The alternatives are the various SELECT queries that you've already seen.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,4e1340c712095745717687!



В списке pgsql-general по дате отправления:

Предыдущее
От: Alban Hertroys
Дата:
Сообщение: Re: Read MS-SQL data into Postgres via ODBC link?
Следующее
От: c k
Дата:
Сообщение: Python UCS4 error