Re: why does count take so long?
От | Jean-Luc Lachance |
---|---|
Тема | Re: why does count take so long? |
Дата | |
Msg-id | 3F5F406B.9E8B5138@nsd.ca обсуждение исходный текст |
Ответ на | why does count take so long? (Joseph Shraibman <jks@selectacast.net>) |
Список | pgsql-general |
Christopher, You did not quite understand. The counts in question is the actual counts (deltas) for the transtactions. the tuple should be something like: xid, reltype, insert_count, update_count, delete_count When a COUNT(*) is issued the commited tuples are totaled up stored as xid 0 or whatever and the commited tuples deleted (all this with appropriate locks). I am not sure if there is a need for the update count. JLL Christopher Browne wrote: > > In the last exciting episode, jllachan@nsd.ca (Jean-Luc Lachance) wrote: > > How about keeping counts of inserts, deletes and updates per table per > > transaction as part of the live statistics? > > Aye, there's the rub. > > That's exactly what _won't_ work, and that's exactly the case that is > somewhat pathological under MVCC. > > With MVCC, data "appears as if by magic" when its transaction COMMITs, > thereby revealing the rows to the world. > > Unfortunately, there's no simple way of making updates to counts > "simply appear" when they take effect, not without turning the updates > into a concurrency bottleneck. > > Here's a bit of a wild thought... > > Assume a table with schema as follows: > create table pg_count ( > xid integer, --- Actually, it's not an integer, right, Andrew? :-( > reltype oid, > count integer > ); > > Every transaction, "xid," affects some number of tuples. So that for > a transaction, #2134151 that adds 5 rows to table with oid 12345 and deletes 4 > rows from table with 45678, part of the transaction would include > inserting these rows: > > insert into pg_count (xid, reltype, count) values (2134151, 12345, 5); > insert into pg_count (xid, reltype, count) values (2134151, 45678, -4); > > In order to get the count for table 12345, you could then go to > pg_count and request: > select sum(count) from pg_count where reltype = 12345; > > The size of this table would increase every time a transaction gets > committed, so presumably part of VACUUM TABLE would be a > collection/summarization, thus... > > -- Collect existing stats into 1 row > insert into pg_count(xid, reltype, count) values (currxid, > currtable, select sum(count) from pg_count where reltype = > currtable); > > -- Delete the old stats > delete from pg_count where reltype = currtable and xid <> currxid; > > This will cope with concurrency reasonably well (modulo having to make > sure that the "collect/delete" transaction is a serialized one). > > Unfortunately, if a table is updated frequently, the summary > select sum(count) from pg_count where reltype = 12345; > will have to collect together quite a large number of entries, which > makes this "less cheap." > > That suggests an optimization; any time the COUNT is selected, the old > stats can and should be collected into 1 row and the old data deleted. > -- > wm(X,Y):-write(X),write('@'),write(Y). wm('aa454','freenet.carleton.ca'). > http://www3.sympatico.ca/cbbrowne/nonrdbms.html > Sturgeon's Law: 90% of *EVERYTHING* is crud. > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster
В списке pgsql-general по дате отправления:
Предыдущее
От: Mike MascariДата:
Сообщение: Re: Constructing column from different individual fields