Re: Improvement of pg_stat_statement usage about buffer hit ratio

Поиск
Список
Период
Сортировка
От KONDO Mitsumasa
Тема Re: Improvement of pg_stat_statement usage about buffer hit ratio
Дата
Msg-id 528B006D.5090001@lab.ntt.co.jp
обсуждение исходный текст
Ответ на Re: Improvement of pg_stat_statement usage about buffer hit ratio  (Peter Geoghegan <pg@heroku.com>)
Ответы Re: Improvement of pg_stat_statement usage about buffer hit ratio  (Peter Geoghegan <pg@heroku.com>)
Список pgsql-hackers
(2013/11/19 12:03), Peter Geoghegan wrote:
> On Mon, Nov 18, 2013 at 6:12 PM, KONDO Mitsumasa
> <kondo.mitsumasa@lab.ntt.co.jp> wrote:
>> I confirmed that Itagaki-san and Mr Cerdic disscution. He said that raw
>> values be just simple. However, were his changes just simple? I cannot
>> understand his aesthetics sense and also you, too:-(
>
>> It's too complicated, and do you know how to tuning PG from information of
>> local_* and temp_*?
>> At least, I think that most user cannot tuning from these information, and
>> it might not be useful information only part of them.
>
> All of those costs are cumulative aggregates. If we didn't aggregate
> them, then the user couldn't possibly determine them on their own, to
> any approximation. That's the difference. If you think the local_* and
> temp_* aren't very useful, I'm inclined to agree, but it's too late to
> do anything about that now.
I regret past decision of Itagaki-san's patch, and improvement might not be 
possible. However, we can change it, if we get have logical reason to change it.

>> No. It's not for geek tools and people having pre-packaged solution in big
>> company, but also for common DBA tools.
>
> I don't think that the tool needs to be expensive. If selecting from
> the pg_stat_statements view every 1-3 seconds is too expensive for
> such a tool, we can have a discussion about being smarter, because
> there certainly are ways to optimize it.
I can understand why you say my patch is heavy now! Your monitoring methods are 
redically heavy. In general, we get pg_stat_statements view every 1 min - 5min. 
It is because monitoring SQLs must not heavier than common main SQLs. If we 
measure the real performance, we don't measure with monitoring SQL's cost. And, I 
cannot still understand you'd like to collect drastic detail performance of 
statements. I'd like to only know max, avg and stddev in each statement. They are 
enough, because we can improve them by using these information.

> Regarding your min/max patch: I'm opposed to adding even more to the
> spinlock-protected counters struct, so that we can get an exact answer
> to a question where an approximate answer would very likely be good
> enough. And as Itagaki-san said 4 years ago, who is to say that what
> you've done here for buffers (or equally, what you've done in your
> min/max patch) is more interesting than the same thing but for another
> cost? The point of having what you've removed from the
> pg_stat_statements docs about calculating averages is that it is an
> example that can be generalized from. I certainly think there should
> be better tooling to make displaying costs over time easier, or
> characterizing the distribution, but unfortunately this isn't it.
>
> Something like pg_stat_statements is allowed to be approximate. That's
> considered an appropriate trade-off. Most obviously, today there can
> be hash table collisions, and some of the entries can therefore be
> plain wrong. Previously, I put the probability of 1 collision in the
> hash table at about 1% when pg_stat_statements.max is set to 10,000.
> So if your min/max patch was "implemented in userspace", and an
> outlier is lost in the noise with just one second of activity, I'm not
> terribly concerned about that. It's a trade-off, and if you don't
> think it's the correct one, well then I'm afraid that's just where you
> and I differ. As I've said many times, if you want to have a
> discussion about making aggressive snapshotting of the
> pg_stat_statements view more practical, I think that would be very
> useful.
In summary of your comment, your patch is lower cost than I proposed patch. 
Because my patch has long lock problem, on the other hands your patch doesn't 
these problem. Is it right? If it true, I can understand your theoretically, but 
I'm not sure without real practice or benchmark that it is really or not. So we 
will need benchmark test in my patch and yours. I try it.

>> By the way, MySQL and Oracle database which are very popular have these
>> statistics. I think that your argument might disturb people who wants to
>> migration from these database and will accelerate popularity of these
>> database more.
>
> I think that there should be better tooling built on top of
> pg_stat_statements. I don't know what Oracle does, but I'm pretty sure
> that MySQL has nothing like pg_stat_statements. Please correct me if
> I'm mistaken.
I joined the db tech show case 2013 which is held in japan last week. Oracle 
speaker intoroduced performance schema and like these in MySQL 5.6. This is the 
slide of his. It's almost in japanese, but please see it since 31page. It is 
wirtten in SQL.
http://www.slideshare.net/yoyamasaki/20131110-tuning-onmysql56

In MySQL 5.6, it has information which are sum_time, min_time, avg_time, max_time 
and sum_lock_time. I think it is useful for understanding our executing statements.

>>> As I said on the min/max thread, if we're not
>>> doing enough to help people who would like to build such a tool, we
>>> should discuss how we can do better.
>>
>> Could you tell me how to get min/max statistics with low cost?
>
> See my previous comments on the other thread about making
> pg_stat_statements only return changed entries, and only sending the
> query text once.
OK.

>> Repeatedly, I think that if we want to get drastic detail statistics, we
>> have to create another tools of statistics. Your patch will be these
>> statistics tools. However, pg_stat_statement sholud be just light weight.
>
> This is incomprehensible. As with the cumulative aggregate costs, how
> is a consumer of pg_stat_statements possibly going to get the full
> query text from anywhere else? It *has* to come from
> pg_stat_statements directly.
I think that it is necessary to make it clear with real test.
I will read your patch and execute benchmark with each our patches.
However, I will travel on business tomorrow until this weekend.
I'm very sorry, but please wait it more.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Noah Misch
Дата:
Сообщение: Re: pre-commit triggers
Следующее
От: Tom Lane
Дата:
Сообщение: Re: lock on object is already held