Re: [HACKERS] 64-bit queryId?

Поиск
Список
Период
Сортировка
От Greg Stark
Тема Re: [HACKERS] 64-bit queryId?
Дата
Msg-id CAM-w4HMDJYf5+kGefBtW7X7EQGA2RQ6rfmVAtt0FhSQMkp_hoA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] 64-bit queryId?  (Alexander Korotkov <a.korotkov@postgrespro.ru>)
Ответы Re: [HACKERS] 64-bit queryId?  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On 30 September 2017 at 21:03, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

> I heard from customers that they periodically dump contents of
> pg_stat_statements and then build statistics over long period of time.  If
> even they leave default pg_stat_statements.max unchanged, probability of
> collision would be significantly higher.

Indeed. It's simple enough to export stats to prometheus with queryid
as the key. Then even if the query ages out of the database stats you
have graphs and derivative metrics going further back.

I have to admit this was my first reaction to the idea of using sha1
hashes for git commits as well. But eventually I came around. If the
chances of a hash collision are smaller than a cosmic ray flipping a
bit or a digital electronics falling into a meta-stable state then I
had to admit there's not much value in being theoretically more
correct.

In practice if the query has aged out of pg_stat_statements and you're
exporting pg_stat_statements metrics to longer-term storage there's
really nothing more "correct" than just using a long enough hash and
assuming there are no collisions anyways. If 64-bits is still not
sufficient we could just go to 160-bit sha1 or 256-bit sha256.

Actually there's a reason I'm wondering if we shouldn't use a
cryptographic hash or even an HMAC. Currently if you're non-superuser
we, quite sensibly, hide the query text. But we also hide the queryid.
The latter is really inconvenient since it really makes the stats
utterly useless. I'm not sure what the rationale was but the only
thing I can think of is a fear that it's possible to reverse engineer
the query using brute force. An HMAC, or for most purposes even a
simple cryptographic hash with a secret salt would make that
impossible.

(I have other advances in pg_stat_statements I would love to see. It
would be so much more helpful if pg_stat_statements also kept a few
examples of query parameters such as the most recent set, the set that
caused the longest execution, maybe the set with the largest of each
metric.)

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Pavel Stehule
Дата:
Сообщение: Re: [HACKERS] why subplan is 10x faster then function?
Следующее
От: Peter Eisentraut
Дата:
Сообщение: Re: [HACKERS] list of credits for release notes