On Tue, May 6, 2014 at 11:31 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The source code says that "query strings are normalized on a best effort
> basis", so perhaps we ought to say the same in the documentation.
Perhaps.
> It would be rather expensive to provide a guarantee of normalization:
> basically, we'd have to compute the normalized query string during parsing
> *even when the hashtable entry already exists*, and then store it
> somewhere where it'd survive till ExecutorEnd (but, preferably, not be
> leaked if we never get to ExecutorEnd; which makes this hard). I think
> most people would find that a bad tradeoff.
I certainly would.
> One cheap-and-dirty solution is to throw away the execution stats if we
> get to the end and find the hash table entry no longer exists, rather than
> make a new entry with a not-normalized string. Not sure if that cure is
> better than the disease or not.
I am certain that it is. Consider long-running queries that don't
manage to get the benefit of the "aggressive decay for sticky entries"
technique, because there is consistent contention.
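
To make the two behaviours under discussion concrete, here is a toy
sketch. It is only a model: a Python dict stands in for the
pg_stat_statements hash table, and every name in it (StatsTable,
on_parse, on_executor_end, discard_if_missing) is hypothetical and does
not mirror the extension's actual C code.

```python
# Toy model only: a dict stands in for the pg_stat_statements hash table.
# All names are hypothetical; this is not the extension's real logic.


class StatsTable:
    def __init__(self):
        # query_id -> {"text": str, "calls": int, "total_time": float}
        self.entries = {}

    def on_parse(self, query_id, normalized_text):
        # The normalized text is only cheaply available at parse time,
        # so the entry is created here if it does not exist yet.
        self.entries.setdefault(
            query_id,
            {"text": normalized_text, "calls": 0, "total_time": 0.0},
        )

    def reset(self):
        # Models an intentional reset (or an eviction) that happens
        # while the query is still running.
        self.entries.clear()

    def on_executor_end(self, query_id, raw_text, elapsed, discard_if_missing):
        entry = self.entries.get(query_id)
        if entry is None:
            if discard_if_missing:
                # Proposed alternative: drop the stats for this execution.
                return False
            # Best-effort behaviour: re-create the entry, but only the
            # raw (non-normalized) query text is available at this point.
            entry = {"text": raw_text, "calls": 0, "total_time": 0.0}
            self.entries[query_id] = entry
        entry["calls"] += 1
        entry["total_time"] += elapsed
        return True
```

Calling on_parse, then reset, then on_executor_end reproduces the case
in question: with discard_if_missing=False the table ends up holding the
raw query text, and with discard_if_missing=True the execution is not
recorded at all. A long-running query whose entry is evicted under
contention would hit the same branch.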
> Another thought, though it's not too relevant to this particular scenario
> of intentional resets, is that we could raise the priority of entries
> for statements-in-progress even further. I notice for example that if
> entry_alloc finds an existing hashtable entry, it does nothing to raise
> the usage count of that entry.
To do otherwise would create an artificial prejudice against prepared
queries, though.
>>> This is a bit counterintuitive if you rely on the query to be normalised,
>>> e.g. for privacy reasons where you don’t want to leak query constants like
>>> password hashes or usernames.
>
> The bigger picture here is that relying on query normalization for privacy
> doesn't seem like a bright idea. Consider making sure that
> security-relevant values are passed as parameters rather than being
> embedded in the query text in the first place.
I agree.
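
A minimal sketch of what passing values as parameters looks like from
the client side. It uses Python's built-in sqlite3 module purely as a
self-contained stand-in for a PostgreSQL driver, and the table, column
names, and find_user helper are made up for illustration. The point is
only that the query text handed to the server never contains the
sensitive value, so nothing depends on normalization to hide it.

```python
import sqlite3

# In-memory database as a stand-in; a PostgreSQL client library would
# be used in practice. Table and data are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, pw_hash TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("alice", "5f4dcc3b"))


def find_user(conn, name, pw_hash):
    # The query text is a constant; the values travel separately as
    # bound parameters and are never spliced into the SQL string.
    query = "SELECT name FROM users WHERE name = ? AND pw_hash = ?"
    row = conn.execute(query, (name, pw_hash)).fetchone()
    return query, row


query, row = find_user(conn, "alice", "5f4dcc3b")
# The text that a statement-statistics facility could record is `query`,
# which is identical for every user and every hash.
```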
--
Peter Geoghegan