Re: Explain buffers display units.
От | Robert Haas |
---|---|
Тема | Re: Explain buffers display units. |
Дата | |
Msg-id | 603c8f071002151848o5cea7d02hc58c739b84728a2a@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Explain buffers display units. (Greg Stark <stark@mit.edu>) |
Ответы |
Re: Explain buffers display units.
(Greg Stark <stark@mit.edu>)
|
Список | pgsql-hackers |
On Mon, Feb 15, 2010 at 6:44 PM, Greg Stark <stark@mit.edu> wrote: > I did respond to it. The whole point is that the text output is for a > human to read. It should be printed in human-readable units. Not some > arbitrary internal unit of accounting that they then have to do > arithmetic on to make sense of. Well, I disagree with your statement the previous output was not printed in human-readable units: it was printed in blocks, which I find to be a perfectly good unit. It's true that the basic unit of blocks can be converted into kilobytes, but so what? We aren't really measuring kilobytes; we're measuring blocks. We could equally well convert the sort and hash output from kilobytes into blocks, but it would be equally wrong: the sort and hash statistics are measuring memory usage by adding up actual memory allocations. The buffer statistics are simply counting the number of blocks that are read or written. Multiplying by the block size makes it sound as if all the memory was read or used, which is simply not the case - especially for things like buffer hits, which don't actually read or allocate any memory at all. > We do *not* display raw block numbers anywhere else. Generally I think > we should have a policy of outputing human-readable standard units of > memory whenever displaying a memory quantity. Actually I thought we > already had that policy, hence things like: > > postgres=# show shared_buffers; > shared_buffers > ---------------- > 28MB > (1 row) > > postgres=# show checkpoint_timeout; > checkpoint_timeout > -------------------- > 5min > (1 row) We certainly do that for GUCs, and in that context it seems to me to make sense. If you set your shared buffers to a gigabyte, PG will use an additional GB of memory. But if you hit a "gigabyte" of shared buffers, you may be examining anywhere from one 8K block over and over again all the way up to a full GB of memory. Block hits and reads just don't add in the same way that actual memory allocations do. And at any rate, what we DON'T do for GUCs is produce differing output format for the same parameter based on the magnitude of the output value, as you've done here. We accept input in several different formats, but there is only one canonical output formal for any particular GUC, which is furthermore always chosen in such a way that the exact value of the setting is preserved (again, unlike what you've done here). > The other examples you name are all internal or machine-readable > fomats which have to be formatted somehow using sql queries or tools > if you want to inspect the values directly. The user is free to format > the output of the pg_stat* functions using pg_size_pretty() though > it's annoying that it's not in the same base unit that > pg_relation_size() outputs but these are the only interface to these > internal counters so there's no way to know if they're being used for > human-readable output or for gathering raw data for statistics or > other purposes. So, you're saying we shouldn't look at the way that the pg_stat functions format the output because somebody might write a view over it that formats it in some different way that may or may not match what you've done for the EXPLAIN output? What makes you think that people don't just look at the raw numbers? I certainly have, and there's no suggestion in the documentation that users should do anything else. pg_stat_statements doesn't do what you're suggesting either; it, too, presents raw numbers, and lets the user make of it what they will. They might, for example, want to compute a hit ratio, as in the example provided in the docs. In the case of EXPLAIN of an index scan, they might want to estimate the number of seeks, on the theory that an inner-indexscan is going to be all random IO. >> I think this is a really terrible idea. You've got a lot of very >> specific formatting code in explain.c which anyone who wants to use >> the JSON and XML output will very possibly need to reimplement. I >> have worked really hard to keep the text format in sync with all the >> others, and up until now they have been. > > You're assuming the JSON and XML program is planning to display the > measurements? They might not be. They might be gathering them for > charting or for alerts or all kinds of other things. Even if they do > plan to output them they'll want to format it in way that makes sense > for the context it's used in which might include more or fewer digits > or plug into some widget which requires raw values and does the > formatting automatically. Yes, they might want to write their own formatting code, but they also might not. They might want to calculate hit ratios, or they might want to alter the number of decimal places, or they might just want to output the exact same information as the text format, but in a GUI format rather than using ASCII art. > Whereas the human-readable format should display values in a form > humans can parse, the machine-readable output should include the raw > measurements with enough information for the tool to make sense of it. > Probably the XML schema should include the units as an attribute for > each tag so tools don't have to hard-code knowledge about what unit > each tag is in. This doesn't seem to be a very carefully thought out proposal, because you haven't explained how it would work for JSON or YAML output. A format-neutral solution which we've already used for sort and hash information (and for GUCs) is to include the unit designator in the output.. But I generally think that trying to make the EXPLAIN output self-documenting to the point where programs don't need any specific knowledge of the meanings of the various properties is rather hopeless. If we want to provide that knowledge, it should be encoded in some sort of separate XML schema or data dictionary rather than encoded in the output of every run of EXPLAIN. I think Andres Freund actually wrote a relaxNG schema or something of that sort, which might or might not be a useful starting point, but nobody felt motivated to do anything with it. Upon further review, I also notice that this patch seems to have falsified the EXPLAIN documentation - both the description of the BUFFERS option and the description of the FORMAT option are no longer accurate. ...Robert
В списке pgsql-hackers по дате отправления:
Следующее
От: "Kevin Grittner"Дата:
Сообщение: Re: Error when building postgresql with contrib functions