Re: Access statistics
| От | Jan Wieck |
|---|---|
| Тема | Re: Access statistics |
| Дата | |
| Msg-id | 200106011331.f51DV5P02078@jupiter.us.greatbridge.com обсуждение исходный текст |
| Ответ на | Re: Access statistics (Peter Eisentraut <peter_e@gmx.net>) |
| Ответы |
Re: Access statistics
|
| Список | pgsql-hackers |
Peter Eisentraut wrote:
> Jan Wieck writes:
>
> > In the next couple of hours (at least tomorrow) I would be
> > ready to commit the backend changes for table-/index-access
> > statistics and current backend activity views.
> >
> > Should I apply the patches or provide a separate patch for
> > review first?
>
> Maybe you could describe what it's going to do and how it's going to work.
Real programmers don't comment - if it was hard to write it should be hard to read :-)
So outing myself not beeing a *real programmer*, this is what I have so far:
* On startup the postmaster creates an INET domain UDP socket and bind(2)'s it to localhost:0, meaning the
kernel will assign a yet unused, unprivileged port that could be seen with getsockaddr(2).
It then starts two background processes of which one is simply a wraparound buffer doing recvfrom(2) on
thesocket, checking that the source address of the received packets is the sockets own address (!) and
forwardingapproved ones over a pipe to the second one, discribed later.
* Backends call some collector functions at various places now (these will finally be macros), that count
uptable scans, tuples returned by scans, buffer fetches/hits and the like. At the beginning of a statement
thebackends send a message telling the first couple of hundred bytes of the querystring and after the
statementis done (just before getting ready for the next command) they send the collected access numbers.
Tables, indexes etc. in these statistics are identified by OID, so the data doesn't tell much so far.
* The second background process fired by the postmaster collects these numbers into hashtables and as
longas it receives messages, it'll write out a summary file every 500 or so milliseconds, telling a snapshot
ofcurrent stats. Using select(2) with timeouts ensures that a complete idle DB instance not to waste a single
CPUcycle or IO to write these snapshots.
On startup it tries to read the last snapshot file in, so the collected statistics survive a postmaster
restart.
Vacuum reads the file too and sends bulk delete messages for objects that are gone. So the stats
don't grow infinitely.
* A bunch of new builtin functions gain access to the snapshot file. At first call of one of these
functions during a transaction, the backend will read the current file and return the numbers from in memory
then.
Based on these functions a couple of views can tell these collected stats. Information from the
databasessystem catalog is of course required to identify the objects in the stats, but I think those
informationsshould only be visible to someone who identified herself as a valid DB user anyway.
The visibility of querystrings (this info is available cross DB) is restricted to DB superusers.
There has been discussion already about using an INET vs. UNIX UDP socket for the communication. At least
forLinux I found INET to be the most effective way of communication. And for security concerns: If someone else
thanroot can really send packets to that socket that show up with a source address of 127.0.0.1:n, where
n is a portnumber actually occupied by your own socket, be sure you'll have more severe problems than modified
accessstatistics.
The views should be considered examples. The final naming and layout is subject for discussion.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #
_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com
В списке pgsql-hackers по дате отправления: