On 12/12/2012 1:12 PM, Simon Riggs wrote:
> Currently, ANALYZE collects data on all columns and stores these
> samples in pg_statistic where they can be seen via the view pg_stats.
>
> In some cases we have data that is private and we do not wish others
> to see it, such as patient names. This becomes more important when we
> have row security.
>
> Perhaps that data can be protected, but it would be even better if we
> simply didn't store value-revealing statistic data at all. Such
> private data is seldom the target of searches, or if it is, it is
> mostly evenly distributed anyway.
Would protecting it the same way we protect the passwords in pg_authid
be sufficient?
Jan
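
A minimal sketch of the comparison, assuming a stock installation and
the table foo from the example further down. pg_authid is readable only
by superusers, and the public pg_roles view masks the password.
pg_statistic follows the same pattern: the table itself is
superuser-only, and the pg_stats view shows only columns the caller has
SELECT privilege on.

```sql
-- Run as an ordinary (non-superuser) role.

-- The base catalog is locked down; the public view masks the password.
SELECT rolname, rolpassword FROM pg_authid;   -- fails: permission denied
SELECT rolname, rolpassword FROM pg_roles;    -- works, rolpassword shows ********

-- Statistics follow the same pattern.
SELECT * FROM pg_statistic;                   -- fails: permission denied
SELECT attname, most_common_vals, histogram_bounds
FROM pg_stats
WHERE tablename = 'foo';                      -- only columns this role may SELECT
```
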
>
> It would be good if we could collect the overall stats
> * NULL fraction
> * average width
> * ndistinct
> yet without storing either the MFVs or histogram.
> Doing that would avoid inadvertent leaking of potentially private information.
>
> SET STATISTICS 0
> simply skips collection of statistics altogether
>
> To implement this, one way would be to allow
>
> ALTER TABLE foo
> ALTER COLUMN foo1 SET STATISTICS PRIVATE;
>
> Or we could use another magic value like -2 to request this case.
>
> (Yes, I am aware we could use a custom datatype with a custom
> typanalyze for this, but that breaks other things)
>
> Thoughts?
>
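
For reference, a short sketch of what exists today, using the foo/foo1
names from the example above. A statistics target of 0 skips collection
for the column and -1 reverts to the default target. SET STATISTICS
PRIVATE and a -2 target are only proposals in this thread, not existing
syntax.

```sql
-- Stop ANALYZE from collecting statistics for this column.
ALTER TABLE foo ALTER COLUMN foo1 SET STATISTICS 0;
ANALYZE foo;

-- Inspect what is currently visible for the column, including the
-- value-revealing fields (MFVs and histogram) discussed above.
SELECT null_frac, avg_width, n_distinct,
       most_common_vals, histogram_bounds
FROM pg_stats
WHERE tablename = 'foo' AND attname = 'foo1';

-- Revert to the system default statistics target.
ALTER TABLE foo ALTER COLUMN foo1 SET STATISTICS -1;
```
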
--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin