Hi everyone,
thanks for the reviews! Attached is v23 of the patch series, addressing
most of the points raised in the reviews.
A quick summary of the changes (I'll respond to the other threads for
points that deserve a bit more detailed discussion):
0) Rebase to current master. The main culprit was the pesky logical
replication patch committed a week ago, because SUBSCRIPTION and
STATISTICS are right next to each other in gram.y, various switches etc.
1) Many typos, mentioned by all the reviewers.
2) I've added a short explanation (in alter_table.sgml) of how ALTER
TABLE ... DROP COLUMN handles multivariate statistics, i.e. that those
are only dropped if there would be a single remaining column.
3) I've reworded 'thoroughly' to 'in more detail' in planstats.sgml, to
make Petr happy ;-)
4) Added missing comments to get_statistics_oid, RelationGetMVStatList,
update_mv_stats, ndistinct_for_combination. Also update_mv_stats() was
not used outside common.c, so I've made it static and removed the
prototype from mvstats.h.
5) I've changed 'statistics does not exist' to 'statistics do not exist'
on a number of places.
6) Removed XXX about checking for duplicates in CreateStatistics. I
agree with Petr that we shouldn't do such checks, as we're not doing
that for other objects (e.g. indexes).
7) I've moved moved the code loading statistics from get_relation_info
into a new function get_relation_statistics, to get rid of the
if (true)
{
...
}
block, which was there due to mimicking how index details are loaded
without having hasindex-like flag. I like this better than merging the
block into get_relation_info directly.
8) I've changed 'a statistics' to 'multivariate statistics' on a few
places in sgml docs, to make it clear it's not referring to the
'regular' statistics (e.g. at CREATE/DROP STATISTICS, mentioned by
Ideriha Takeshi).
9) I've changed the link in README.dependencies to
https://en.wikipedia.org/wiki/Functional_dependency as proposed by
Ideriha Takeshi. I'm pretty sure the wiki page about database
normalization, referenced by the original link, included a nice
functional dependency example some time ago, but it seems to have
changed and the new link is better.
But perhaps it's not a good idea to link to wikipedia, as the pages
clearly change quite significantly?
10) The CREATE STATISTICS now reports a nice 'already exists' message,
instead of the 'duplicate key', pointed out by Dilip.
11) MVNDistinctItem/MVNDistinctData now use FLEXIBLE_ARRAY_MEMBER for
the array, just like the other structs.
On 01/26/2017 12:01 PM, Kyotaro HORIGUCHI wrote:
> dependencies.c:
>
> dependency_dgree():
>
> - The k is assumed larger than 1. I think assertion is required.
>
> - "/* end of the preceding group */" seems to be better if it
> is just after the "if (multi_sort.." currently just after it.
>
> - The following comment seems mis-edited.
> > * If there is a single are no contradicting rows, count the group
> > * as supporting, otherwise contradicting.
>
> maybe this would be like the following? The varialbe counting
> the first "contradiction" is named "n_violations". This seems
> somewhat confusing.
>
> > * If there are no violating rows up to here, count the group
> > * as supporting, otherwise contradicting.
>
> - "/* first columns match, but the last one does not"
> else if (multi_sort_compare_dims((k - 1), (k - 1), ...
>
> The above comparison should use multi_sort_compare_dim, not
> dims
>
> - This function counts "n_contradicting_rows" but it is not
> referenced. Anyway n_contradicting_rows = numrows -
> n_supporing_rows so it and n_contradicting seem
> unncecessary.
>
Yes, absolutely. This was clearly unnecessary remainder of the original
implementation, and I failed to clean it up after adopting Dean's idea
of continuous dependency degree.
I've also reworked the method a bit, moving handling of the last group
into the main loop (instead of doing that separately right after the
loop, which I think was a bit ugly anyway). Can you check if you're
happy with the code & comments now?
>
> mvstats.h:
>
> - struct MVDependencyData/ MVDependenciesData
>
> The varialbe length member at the last of the structs should
> be defined using FLEXIBLE_ARRAY_MEMBER, from the convention.
>
Yes, fixed. The other structures already used that macro, but I failed
to notice MVDependencyData/ MVDependenciesData need that fix too.
>
> - I'm not sure how much it impacts performance, but some
> struct members seems to have a bit too wide types. For
> example, MVDepedenciesData.type is of int32 but it can have
> only '1' for now and it won't be two-digits. Also ndeps
> cannot be so large.
>
I doubt the impact on performance is measurable, particularly for the
global fields (e.g. nbuckets is tiny compared to the space needed for
the buckets themselves).
But I think you're right we shouldn't use fields wider than actually
needed (e.g. using uint32 for nbuckets is a bit insane, and uint16 would
be just fine). It's not just a matter of performance, but also a way to
document expected values etc.
I'll go through the fields and use smaller data types where appropriate.
>
> general:
> This patch uses int16 as the type of attrubute number but it
> might be better to use AttrNumber for the purpose.
> (Specifically it seems defined as the type for an attribute
> index but also used as the varialbe for number of attributes)
>
Agreed. Will check with the struct members.
>
> Sorry for the random comment in advance. I'll learn this further.
>
Thanks for the review!
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers