Re: Multivariate MCV stats can leak data to unprivileged users

Поиск
Список
Период
Сортировка
От Dean Rasheed
Тема Re: Multivariate MCV stats can leak data to unprivileged users
Дата
Msg-id CAEZATCWMjB7i_4TM49036heLPEjRFThPK2HF2sq38nx-aPUrdQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Multivariate MCV stats can leak data to unprivileged users  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Ответы Re: Multivariate MCV stats can leak data to unprivileged users  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Список pgsql-hackers
On Thu, 6 Jun 2019 at 21:33, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> Hi,
>
> Attached are three patches tweaking the stats - two were already posted
> in this thread, the third one is just updating docs.
>
> 1) 0001 - split pg_statistic_ext into definition + data
>
> This is pretty much the patch Dean posted some time ago, rebased to
> current master (fixing just minor pgindent bitrot).
>
> 2) 0002 - update sgml docs to reflect changes from 0001
>
> 3) 0003 - define pg_stats_ext view, similar to pg_stats
>

Seems reasonable on a quick read-through, except I spotted a bug in
the view (my fault) -- the statistics_owner column should come from
s.stxowner rather than c.relowner.


> The question is whether we want to also redesign pg_statistic_ext_data
> per Tom's proposal (more about that later), but I think we can treat
> that as an additional step on top of 0001. So I propose we get those
> changes committed, and then perhaps also switch the data table to the
> EAV model.
>
> Barring objections, I'll do that early next week, after cleaning up
> those patches a bit more.
>
> One thing I think we should fix is naming of the attributes in the 0001
> patch. At the moment both catalogs use "stx" prefix - e.g. "stxkind" is
> in pg_statistic_ext, and "stxmcv" is in pg_statistic_ext_data. We should
> probably switch to "stxd" in the _data catalog. Opinions?
>

Yes, that makes sense. Especially when joining the 2 tables, since it
makes it more obvious which table a given column is coming from in a
join clause.


> Now, back to the proposal to split the _data catalog rows to EAV form,
> with a new data type replacing the multiple types we have at the moment.
> I've started hacking on it today, but the more I work on it the less
> useful it seems to me.
>
> My understanding is that with that approach we'd replace the _data
> catalog (which currently has one column per statistic type, with a
> separate data type) with 1:M generic rows, with a generic data type.
> That is, we'd replace this
>
>     CREATE TABLE pg_statistic_ext_data (
>         stxoid OID,
>         stxdependencies pg_dependencies,
>         stxndistinct pg_ndistinct,
>         stxmcv pg_mcv_list,
>         ... histograms ...
>     );
>
> with something like this:
>
>     CREATE TABLE pg_statistiex_ext_data (
>         stxoid OID,
>         stxkind CHAR,
>         stxdata pg_statistic_ext_type
>     );
>
> where pg_statistic_ext would store all existing statistic types. along
> with a "flag" saying which value it actually stored (essentially a copy
> of the stxkind column, which we however need to lookup a statistic of a
> certain type, without having to detoast the statistic itself).
>
> As I mentioned before, I kinda dislike the fact that this obfuscates the
> actual statistic type by hiding it behing the "wrapper" type.
>
> The other thing is that we have to deal with 1:M relationship every time
> we (re)build the statistics, or when we need to access them. Now, it may
> not be a huge amount of code, but it just seems unnecessary. It would
> make sense if we planned to add large number of additional statistic
> types, but that seems unlikely - I personally can think of maybe one new
> statistic type, but that's about it.
>
> I'll continue working on it and I'll share the results early next week,
> after playing with it a bit, but I think we should get the existing
> patches committed and then continue discussing this as an additional
> improvement.
>

I wonder ... would it be completely crazy to just use a JSON column to
store the extended stats data?

It wouldn't be as compact as your representation, but it would allow
for future stats kinds without changing the catalog definitions, and
it wouldn't obfuscate the stats types. You could keep the 1:1
relationship, and have top-level JSON keys for each stats kind built,
and you wouldn't need the pg_mcv_list_items() function because you
could just put the MCV data in JSON arrays, which would be much more
transparent, and would make the user-accessible view much simpler. One
could also imagine writing regression tests that checked for specific
expected MCV values like "stxdata->'mcv'->'frequency'->0".

Just a thought.

Regards,
Dean



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Asim R P
Дата:
Сообщение: Re: POC: Cleaning up orphaned files using undo logs
Следующее
От: Alvaro Herrera
Дата:
Сообщение: Re: Should we warn against using too many partitions?