Re: counting algorithm for incremental matview maintenance

Поиск
Список
Период
Сортировка
От Kevin Grittner
Тема Re: counting algorithm for incremental matview maintenance
Дата
Msg-id 1368711134.33716.YahooMailNeo@web162904.mail.bf1.yahoo.com
обсуждение исходный текст
Ответ на Re: counting algorithm for incremental matview maintenance  (Josh Berkus <josh@agliodbs.com>)
Ответы Re: counting algorithm for incremental matview maintenance
Re: counting algorithm for incremental matview maintenance
Список pgsql-hackers
<div style="color:#000; background-color:#fff; font-family:times new roman, new york, times, serif;font-size:12pt">Josh
Berkus<josh@agliodbs.com> wrote:<br /><br />> It's fairly common for matviews to be constructed such that<br
/>>updates to them are strictly appends.  For example, a matview<br />> which has a daily summary would just get
appendedto each day,<br />> and existing rows would not change barring a major historical<br />> database
cleanup.<br/>><br />> It seems like we could ... and ought to ... optimize for this<br />> pattern somehow for
incrementalupdates.  That is, if the user<br />> knows that we're going to be only appending new rows and not<br
/>>modifying any old ones, s/he ought to be able to tell the<br />> database that somehow and avoid the overhead
ofchecking.  While<br />> the overhead of checking a count wouldn't be that high for a few<br />> hundred rows,
I'vedealt with matviews which were thousands to<br />> millions of rows.<br /><br />Thanks for the suggestion; I
willkeep an eye out for ways this<br />insight might allow an optimization.  I think there might be some<br
/>misunderstandingof the counting algorithm, though -- there is no<br />need to do a sequential pass through the
matviewexamining the<br />counts.<br /><br />I don't want to replicate the content of a fairly dense (in the<br />sense
ofhaving a lot of content per page) 10 page computer science<br />paper in this email, but for purposes of illustration
Iwill take a<br />very simple case and show how it works.  (This is not geared toward<br />your particular case,
becausethat could get kinda long to explain<br />here and now, but hopefully this will give an insight into the<br
/>techniqueoverall.)<br /><br />Let's say there is a table and matview like this:<br /><br />create table foo (fooid
intprimary key, val int not null);<br />create materialized view bar as select distinct val from foo;<br /><br />Let's
saythere are millions of rows in both, and that we have<br />flagged the view for incremental maintenance.  (Syntax
omittedto<br />avoid distracting bikeshedding on that when the point is the<br />algorithm.)<br /><br />Now, someone
runsthis:<br /><br />update foo set val = val + 1 where fooid between 1 and 10;<br /><br />What will happen is this:<br
/><br/>Since foo will be flagged as a relation which is referenced by an<br />incrementally maintained matview, a delta
relationwill be<br />materialized for this update, which will contain the net change to<br />the underlying table in
thecount_t system column.  "Before" tuples<br />will have a count of -1; "after" tuples will have a count of 1.<br
/>Thenthe query defining the view will be run *against the delta*,<br />resulting in a relation with a count_t column
reflectingthe net<br />change for each val.  Anything with a zero for the net change will<br />be dropped.  We will run
alogical UNION of this relation and the<br />bar matview.  In this case, we obviously want this to be done in a<br
/>waythat for each row in this "net change" relation, we do an index<br />scan against the bar matview; if not found,
weinsert the new row<br />into the matview with its count from the net change relation.<br />(That had better be
positiveor we have a bug -- so elog ERROR if<br />negative.)  If bar does contain a matching row, update count_t in<br
/>thatrow with the sum of its old value and the value from the "net<br />change" relation.  Of course, that new value
alsohad better be<br />positive or zero -- if zero we delete the old row rather than<br />updating count_t.<br /><br
/>Thecount_t column saves us from having to scan foo for all the old<br />val values.  It does not require any scan of
theentire bar<br />matview.  It allows us to zero in on exactly the right rows, and<br />lets us know what needs
doing.<br/><br />Clearly we want the planner to find the best plans for the interim<br />steps rather than hard-coding
particularrule-based plans; I just<br />used an example where it's pretty clear what a reasonable plan<br />would
be.<br/><br />Hopefully this makes it fairly clear that the only thing that an<br />optimization around the
"append-only"assertion for a matview would<br />be the ability to skip the probe for an existing record *related to<br
/>rowswhich are in the delta*.  As long as there is reasonable<br />indexing on the matview, maintenance for the
append-onlycase would<br />not involve scanning the entire matview.<br /><br />-- <br />Kevin Grittner<br
/>EnterpriseDB:http://www.enterprisedb.com<br />The Enterprise PostgreSQL Company<br /><br /></div> 

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Jon Nelson
Дата:
Сообщение: Re: fallocate / posix_fallocate for new WAL file creation (etc...)
Следующее
От: Noah Misch
Дата:
Сообщение: Re: Parallel Sort