Re: SQL Property Graph Queries (SQL/PGQ)
| От | Henson Choi |
|---|---|
| Тема | Re: SQL Property Graph Queries (SQL/PGQ) |
| Дата | |
| Msg-id | CAAAe_zCPm3W7ZTSBfKUgVJPrTn4-D99agGsz5P2MBLZN=tME3g@mail.gmail.com обсуждение исходный текст |
| Ответ на | Re: SQL Property Graph Queries (SQL/PGQ) (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>) |
| Список | pgsql-hackers |
SQL/PGQ Patch Review Report
============================
Patch: SQL Property Graph Queries (SQL/PGQ)
Commitfest: https://commitfest.postgresql.org/patch/4904
Review Methodology:
This review focused on quality assessment, not line-by-line code audit.
Key code paths and quality issues were examined with surrounding context
when concerns arose. Documentation files were reviewed with AI-assisted
grammar and typo checking. Code coverage was measured using gcov and
custom analysis tools.
Limitations:
- Not a security audit or formal verification
- Parser and executor internals reviewed at module level, not exhaustively
- Performance characteristics not benchmarked
TABLE OF CONTENTS
-----------------
1. Executive Summary
2. Issues Found
2.1 Critical / Major
2.2 Minor Issues
2.3 Suggestions for Discussion
3. Test Coverage
3.1 Covered Areas
3.2 Untested Items
3.3 Unimplemented Features (No Test Needed)
4. Code Coverage Analysis
4.1 Overall Coverage
4.2 Coverage by File
4.3 Uncovered Code Risk Assessment
5. Recommended Additional Tests
5.1 Black-box Tests (Functional)
5.2 White-box Tests (Coverage)
5.3 Untestable Code (Defensive)
6. Recommendations
7. Conclusion
1. EXECUTIVE SUMMARY
--------------------
Overall assessment: GOOD
The SQL/PGQ patch demonstrates solid implementation quality within its
defined scope. Code follows PostgreSQL coding standards, test coverage
is comprehensive at 90.5%, and documentation is thorough with only
minor typos.
Rating by Area:
- Code Quality: Excellent (PostgreSQL style compliant, 1 FIXME)
- Test Coverage: Good (90.5% line coverage, ~180 test cases)
- Documentation: Good (Complete, 5 minor issues)
- Build/Regress: Pass (make check-world, 56 test suites passed)
2. ISSUES FOUND
---------------
2.1 Critical / Major
(None)
2.2 Minor Issues
#1 [Code] src/backend/commands/propgraphcmds.c:1632
FIXME: Missing index for plppropid column causes sequential scan.
Decision needed: (a) add index, or (c) allow seq scan for rare path.
#2 [Doc] doc/src/sgml/ref/alter_property_graph.sgml:286
Compatibility section incorrectly states "CREATE PROPERTY GRAPH"
Should be: "ALTER PROPERTY GRAPH"
#3 [Doc] doc/src/sgml/ref/alter_property_graph.sgml:65
Synopsis shows VERTEX/NODE and EDGE/RELATIONSHIP as separate clauses,
but Description merges them as {VERTEX|NODE|EDGE|RELATIONSHIP}.
#4 [Doc] doc/src/sgml/ref/alter_property_graph.sgml:80
Grammar error: "the graph removed" should be "the graph is removed"
#5 [Doc] doc/src/sgml/queries.sgml:2929,2931
TODO comments for unimplemented features without clear limitation notes.
#6 [Doc] doc/src/sgml/ddl.sgml:5717
Typo: "infrastucture" should be "infrastructure"
2.3 Alternative Approaches for Discussion
#1 Support CREATE PROPERTY GRAPH IF NOT EXISTS
Rationale: PostgreSQL-style extension, consistent with other DDL.
#2 Return 0 rows for non-existent labels instead of error
Current: SELECT * FROM GRAPH_TABLE(g MATCH (a:NonExistent) ...) raises error
Alternative: Return empty result set instead
Rationale: Better for exploratory queries, similar to SQL WHERE on
non-existent values returning 0 rows rather than error.
#3 Return 0 rows when same variable has different labels
Current: SELECT * FROM GRAPH_TABLE(g MATCH (a:Person)-[e]-(a:Company) ...)
raises error because variable 'a' has conflicting labels.
Alternative: Return empty result set instead
Rationale: Logically, a node cannot be both Person AND Company,
so 0 rows is the correct result. Consistent with standard SQL
WHERE semantics (impossible condition = 0 rows, not error).
3. TEST COVERAGE
----------------
3.1 Covered Areas
- CREATE PROPERTY GRAPH: Empty graph, VERTEX/EDGE TABLES, KEY, LABEL, PROPERTIES
- ALTER PROPERTY GRAPH: ADD/DROP VERTEX/EDGE, ADD/DROP LABEL, ADD/DROP PROPERTIES
- DROP PROPERTY GRAPH: Basic DROP, IF EXISTS
- RENAME TO / SET SCHEMA: Tested in alter_generic.sql
- OWNER TO: Permission change tests
- GRAPH_TABLE queries: Pattern matching, WHERE, COLUMNS, LATERAL, etc.
- RLS integration: PERMISSIVE/RESTRICTIVE policies, inheritance/partition tables
- Error cases: Type/collation mismatch, non-existent objects, etc.
- psql integration: \dG, \d commands
- pg_dump: Tested in t/002_pg_dump.pl
3.2 Untested Items
- CREATE TEMP PROPERTY GRAPH (Medium priority)
- DROP ... CASCADE (Low priority)
- DROP ... RESTRICT (Low priority)
3.3 Unimplemented Features (No Test Needed)
- Multiple path patterns: Parser supports, execution not implemented
- Quantifier {n,m}: Parser supports, execution not implemented
4. CODE COVERAGE ANALYSIS
-------------------------
4.1 Overall Coverage
90.5% (2,029 / 2,243 lines)
4.2 Coverage by File
propgraphcmds.c: 94.6% (DDL command processing)
rewriteGraphTable.c: 94.6% (GRAPH_TABLE rewrite)
parse_graphtable.c: 97.6% (Graph query parser)
parse_clause.c: 98.0% (FROM clause parser)
ruleutils.c: 85.1% (pg_get_propgraphdef, etc.)
pg_dump.c: 84.0% (Dump/restore)
describe.c: 88.9% (psql \dG command)
4.3 Uncovered Code Risk Assessment
Low Risk (13 items):
- Defensive code, debug functions, unimplemented feature guards
Medium Risk (1 item):
- TEMP graph functionality untested
Conclusion: Most uncovered code consists of error handling, unimplemented
feature guards, and debug utilities. No security or functional risk.
5. RECOMMENDED ADDITIONAL TESTS
-------------------------------
5.1 Black-box Tests (Functional)
Based on feature specification, independent of code structure.
Feature Test Case Priority
-------------------------------------------------------------------------------
TEMP graph CREATE TEMP PROPERTY GRAPH Medium
TEMP graph TEMP graph with permanent table reference Medium
TEMP graph ALTER ADD permanent table to TEMP graph Medium
CASCADE DROP ... CASCADE (dependent view cascade) Low
RESTRICT DROP ... RESTRICT (dependent object error) Low
5.2 White-box Tests (Coverage)
Based on uncovered code paths identified in coverage analysis.
File:Line Test Case Target Branch
-------------------------------------------------------------------------------
propgraphcmds.c:136,179,254-262 TEMP table in graph creation Auto TEMP conversion
propgraphcmds.c:1323,1364 ALTER ADD TEMP to perm graph Error branch
propgraphcmds.c:724-734 CREATE without LABEL clause Default label else
propgraphcmds.c:1300,1303 ALTER IF EXISTS non-existent missing_ok branch
propgraphcmds.c:116 CREATE UNLOGGED attempt UNLOGGED error
execMain.c:1162-1163 DML on graph RELKIND_PROPGRAPH
rewriteGraphTable.c:120 Multiple path patterns Length check
rewriteGraphTable.c:205 Quantifier usage Quantifier error
ruleutils.c:7939-7963 Complex label VIEW deparsing T_BoolExpr branch
5.3 Untestable Code (Defensive)
File:Line Reason
-------------------------------------------------------------------------------
rewriteGraphTable.c:202 PAREN_EXPR blocked at parser level
rewriteGraphTable.c:297,304,319 Cyclic pattern edge conflict - unreachable
rewriteGraphTable.c:801-818 get_gep_kind_name - error path only
nodeFuncs.c:4585-4596,4755-4774 Walker branches - internal implementation
6. RECOMMENDATIONS
------------------
6.1 Code Review Required (Minor, Decision Needed)
Location: propgraphcmds.c:1632
Issue: FIXME - No single-column index for plppropid
Current: InvalidOid causes sequential scan
Purpose: Check property references across all labels when deleting orphaned properties
Options:
(a) Add index: Create new single-column index on plppropid
(b) Use existing: NOT POSSIBLE (cannot specify plpellabelid, need all labels)
(c) Allow: Rare path (property deletion), sequential scan acceptable
6.2 Documentation Fixes (Minor)
- alter_property_graph.sgml: 3 typos/errors to fix
- queries.sgml: Add clear limitation notes for unimplemented features
- ddl.sgml: Fix "infrastucture" typo
6.3 Test Additions (Optional)
- CREATE TEMP PROPERTY GRAPH tests
- DROP ... CASCADE/RESTRICT tests
7. CONCLUSION
-------------
Test Quality: GOOD
Core functionality is thoroughly tested with comprehensive error case and
security (RLS) integration tests.
The patch is well-implemented within its defined scope. Identified issues
are minor documentation typos and one code decision point (FIXME).
No critical or major issues found.
Recommended actions before commit:
1. Decide on FIXME at propgraphcmds.c:1632 (index vs. allow seq scan)
2. Fix 5 documentation issues
3. Consider adding TEMP graph tests (optional but recommended)
Points for discussion (optional):
4. CREATE PROPERTY GRAPH IF NOT EXISTS support
5. Error vs. 0 rows behavior for non-existent/conflicting labels
Attachment:
- coverage_report.tar.gz (HTML coverage report generated by gcov)
============================
Patch: SQL Property Graph Queries (SQL/PGQ)
Commitfest: https://commitfest.postgresql.org/patch/4904
Review Methodology:
This review focused on quality assessment, not line-by-line code audit.
Key code paths and quality issues were examined with surrounding context
when concerns arose. Documentation files were reviewed with AI-assisted
grammar and typo checking. Code coverage was measured using gcov and
custom analysis tools.
Limitations:
- Not a security audit or formal verification
- Parser and executor internals reviewed at module level, not exhaustively
- Performance characteristics not benchmarked
TABLE OF CONTENTS
-----------------
1. Executive Summary
2. Issues Found
2.1 Critical / Major
2.2 Minor Issues
2.3 Suggestions for Discussion
3. Test Coverage
3.1 Covered Areas
3.2 Untested Items
3.3 Unimplemented Features (No Test Needed)
4. Code Coverage Analysis
4.1 Overall Coverage
4.2 Coverage by File
4.3 Uncovered Code Risk Assessment
5. Recommended Additional Tests
5.1 Black-box Tests (Functional)
5.2 White-box Tests (Coverage)
5.3 Untestable Code (Defensive)
6. Recommendations
7. Conclusion
1. EXECUTIVE SUMMARY
--------------------
Overall assessment: GOOD
The SQL/PGQ patch demonstrates solid implementation quality within its
defined scope. Code follows PostgreSQL coding standards, test coverage
is comprehensive at 90.5%, and documentation is thorough with only
minor typos.
Rating by Area:
- Code Quality: Excellent (PostgreSQL style compliant, 1 FIXME)
- Test Coverage: Good (90.5% line coverage, ~180 test cases)
- Documentation: Good (Complete, 5 minor issues)
- Build/Regress: Pass (make check-world, 56 test suites passed)
2. ISSUES FOUND
---------------
2.1 Critical / Major
(None)
2.2 Minor Issues
#1 [Code] src/backend/commands/propgraphcmds.c:1632
FIXME: Missing index for plppropid column causes sequential scan.
Decision needed: (a) add index, or (c) allow seq scan for rare path.
#2 [Doc] doc/src/sgml/ref/alter_property_graph.sgml:286
Compatibility section incorrectly states "CREATE PROPERTY GRAPH"
Should be: "ALTER PROPERTY GRAPH"
#3 [Doc] doc/src/sgml/ref/alter_property_graph.sgml:65
Synopsis shows VERTEX/NODE and EDGE/RELATIONSHIP as separate clauses,
but Description merges them as {VERTEX|NODE|EDGE|RELATIONSHIP}.
#4 [Doc] doc/src/sgml/ref/alter_property_graph.sgml:80
Grammar error: "the graph removed" should be "the graph is removed"
#5 [Doc] doc/src/sgml/queries.sgml:2929,2931
TODO comments for unimplemented features without clear limitation notes.
#6 [Doc] doc/src/sgml/ddl.sgml:5717
Typo: "infrastucture" should be "infrastructure"
2.3 Alternative Approaches for Discussion
#1 Support CREATE PROPERTY GRAPH IF NOT EXISTS
Rationale: PostgreSQL-style extension, consistent with other DDL.
#2 Return 0 rows for non-existent labels instead of error
Current: SELECT * FROM GRAPH_TABLE(g MATCH (a:NonExistent) ...) raises error
Alternative: Return empty result set instead
Rationale: Better for exploratory queries, similar to SQL WHERE on
non-existent values returning 0 rows rather than error.
#3 Return 0 rows when same variable has different labels
Current: SELECT * FROM GRAPH_TABLE(g MATCH (a:Person)-[e]-(a:Company) ...)
raises error because variable 'a' has conflicting labels.
Alternative: Return empty result set instead
Rationale: Logically, a node cannot be both Person AND Company,
so 0 rows is the correct result. Consistent with standard SQL
WHERE semantics (impossible condition = 0 rows, not error).
3. TEST COVERAGE
----------------
3.1 Covered Areas
- CREATE PROPERTY GRAPH: Empty graph, VERTEX/EDGE TABLES, KEY, LABEL, PROPERTIES
- ALTER PROPERTY GRAPH: ADD/DROP VERTEX/EDGE, ADD/DROP LABEL, ADD/DROP PROPERTIES
- DROP PROPERTY GRAPH: Basic DROP, IF EXISTS
- RENAME TO / SET SCHEMA: Tested in alter_generic.sql
- OWNER TO: Permission change tests
- GRAPH_TABLE queries: Pattern matching, WHERE, COLUMNS, LATERAL, etc.
- RLS integration: PERMISSIVE/RESTRICTIVE policies, inheritance/partition tables
- Error cases: Type/collation mismatch, non-existent objects, etc.
- psql integration: \dG, \d commands
- pg_dump: Tested in t/002_pg_dump.pl
3.2 Untested Items
- CREATE TEMP PROPERTY GRAPH (Medium priority)
- DROP ... CASCADE (Low priority)
- DROP ... RESTRICT (Low priority)
3.3 Unimplemented Features (No Test Needed)
- Multiple path patterns: Parser supports, execution not implemented
- Quantifier {n,m}: Parser supports, execution not implemented
4. CODE COVERAGE ANALYSIS
-------------------------
4.1 Overall Coverage
90.5% (2,029 / 2,243 lines)
4.2 Coverage by File
propgraphcmds.c: 94.6% (DDL command processing)
rewriteGraphTable.c: 94.6% (GRAPH_TABLE rewrite)
parse_graphtable.c: 97.6% (Graph query parser)
parse_clause.c: 98.0% (FROM clause parser)
ruleutils.c: 85.1% (pg_get_propgraphdef, etc.)
pg_dump.c: 84.0% (Dump/restore)
describe.c: 88.9% (psql \dG command)
4.3 Uncovered Code Risk Assessment
Low Risk (13 items):
- Defensive code, debug functions, unimplemented feature guards
Medium Risk (1 item):
- TEMP graph functionality untested
Conclusion: Most uncovered code consists of error handling, unimplemented
feature guards, and debug utilities. No security or functional risk.
5. RECOMMENDED ADDITIONAL TESTS
-------------------------------
5.1 Black-box Tests (Functional)
Based on feature specification, independent of code structure.
Feature Test Case Priority
-------------------------------------------------------------------------------
TEMP graph CREATE TEMP PROPERTY GRAPH Medium
TEMP graph TEMP graph with permanent table reference Medium
TEMP graph ALTER ADD permanent table to TEMP graph Medium
CASCADE DROP ... CASCADE (dependent view cascade) Low
RESTRICT DROP ... RESTRICT (dependent object error) Low
5.2 White-box Tests (Coverage)
Based on uncovered code paths identified in coverage analysis.
File:Line Test Case Target Branch
-------------------------------------------------------------------------------
propgraphcmds.c:136,179,254-262 TEMP table in graph creation Auto TEMP conversion
propgraphcmds.c:1323,1364 ALTER ADD TEMP to perm graph Error branch
propgraphcmds.c:724-734 CREATE without LABEL clause Default label else
propgraphcmds.c:1300,1303 ALTER IF EXISTS non-existent missing_ok branch
propgraphcmds.c:116 CREATE UNLOGGED attempt UNLOGGED error
execMain.c:1162-1163 DML on graph RELKIND_PROPGRAPH
rewriteGraphTable.c:120 Multiple path patterns Length check
rewriteGraphTable.c:205 Quantifier usage Quantifier error
ruleutils.c:7939-7963 Complex label VIEW deparsing T_BoolExpr branch
5.3 Untestable Code (Defensive)
File:Line Reason
-------------------------------------------------------------------------------
rewriteGraphTable.c:202 PAREN_EXPR blocked at parser level
rewriteGraphTable.c:297,304,319 Cyclic pattern edge conflict - unreachable
rewriteGraphTable.c:801-818 get_gep_kind_name - error path only
nodeFuncs.c:4585-4596,4755-4774 Walker branches - internal implementation
6. RECOMMENDATIONS
------------------
6.1 Code Review Required (Minor, Decision Needed)
Location: propgraphcmds.c:1632
Issue: FIXME - No single-column index for plppropid
Current: InvalidOid causes sequential scan
Purpose: Check property references across all labels when deleting orphaned properties
Options:
(a) Add index: Create new single-column index on plppropid
(b) Use existing: NOT POSSIBLE (cannot specify plpellabelid, need all labels)
(c) Allow: Rare path (property deletion), sequential scan acceptable
6.2 Documentation Fixes (Minor)
- alter_property_graph.sgml: 3 typos/errors to fix
- queries.sgml: Add clear limitation notes for unimplemented features
- ddl.sgml: Fix "infrastucture" typo
6.3 Test Additions (Optional)
- CREATE TEMP PROPERTY GRAPH tests
- DROP ... CASCADE/RESTRICT tests
7. CONCLUSION
-------------
Test Quality: GOOD
Core functionality is thoroughly tested with comprehensive error case and
security (RLS) integration tests.
The patch is well-implemented within its defined scope. Identified issues
are minor documentation typos and one code decision point (FIXME).
No critical or major issues found.
Recommended actions before commit:
1. Decide on FIXME at propgraphcmds.c:1632 (index vs. allow seq scan)
2. Fix 5 documentation issues
3. Consider adding TEMP graph tests (optional but recommended)
Points for discussion (optional):
4. CREATE PROPERTY GRAPH IF NOT EXISTS support
5. Error vs. 0 rows behavior for non-existent/conflicting labels
Attachment:
- coverage_report.tar.gz (HTML coverage report generated by gcov)
---
End of Report
End of Report
2025년 12월 30일 (화) PM 8:27, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>님이 작성:
On Tue, Dec 30, 2025 at 3:14 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Thu, Dec 18, 2025 at 2:45 PM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> >
> > >
> > > I did another investigation about whether this level of checking is
> > > necessary. I think according to the letter of the SQL standard, the
> > > typmods must indeed match. It seems Oracle does not check (the example
> > > mentioned above came from an Oracle source). But I think it's okay to
> > > keep the check. In PostgreSQL, it is much less common to write like
> > > varchar(1000). And we can always consider relaxing it later.
> >
> > +1.
> >
> > Attached patch adds a couple more test statements.
> >
>
> Squashed this into the main patchset.
>
> > >
> > > 2) I had it in my notes to consider whether we should support the colon
> > > syntax for label expressions. I think we might have talked about that
> > > before.
> > >
> > > I'm now leaning toward not supporting it in the first iteration. I
> > > don't know that we have fully explored possible conflicts with host
> > > variable syntax in ecpg and psql and the like. Maybe avoid that for now.
> > >
> >
> > I was aware of ecpg and I vaguely remember we fixed something in ECPG
> > to allow : in MATCH statement. Probably following changes in
> > psqlscan.l and pgc.l
> > -self [,()\[\].;\:\+\-\*\/\%\^\<\>\=]
> > +self [,()\[\].;\:\|\+\-\*\/\%\^\<\>\=]
> >
> > Those changes add | after : (and not the : itself) so maybe they are
> > not about supporting : . Do you remember what those are?
>
> I reverted those changes from both the files and ran "meson test". I
> did not observe any failure. It seems those changes are not needed.
> But adding them as a separate commit (0004) in case CI bot reveals any
> failures without them.
>
> I noticed that there were no ECPG tests for SQL/PGQ. Added a basic
> test in patch 0003.
>
> >
> > I spotted some examples that use : in ddl.sgml.
> > <programlisting>
> > SELECT customer_name FROM GRAPH_TABLE (myshop MATCH
> > (c:customer)-[:has]->(o:"order" WHERE o.ordered_when = current_date)
> > COLUMNS (c.name AS customer_name));
> > </programlisting>
> >
> > The query demonstrates that one can use label names in a way that will
> > make the pattern look like an English sentence. Replacing : with IS
> > defeats that purpose.
> >
> > As written in that paragraph, the labels serve the purpose of exposing
> > the table with a different logical view (using different label and
> > property names). So we need that paragraph, but I think we should
> > change the example to use IS instead of :. Attached is suggested
> > minimal change, but I am not happy with it. Another possibility is we
> > completely remove that paragraph; I don't think we need to discuss all
> > possible usages the users will come up with.
> >
> > The patch changes one more instance of : by IS. But that's straight forward.
> >
> > In ddl.sgml I noticed a seemingly incomplete sentence
> > A property graph is a way to represent database contents, instead of using
> > relational structures such as tables.
> >
> > Represent the contents as what? I feel the complete sentence should be
> > one of the following
> > property graph is a way to represent database contents as a graph,
> > instead of representing those as relational structures OR
> > property graph is another way to represent database contents instead
> > of using relational structures such as tables
> >
> > But I can't figure out what was originally intended.
>
>
> 0002 contains some edits to this part of documentation. I think the
> paragraph reads better than before. Let me know what you think.
>
> Please let me know which of 0002 to 0004 look good to you. I will
> squash those into the patchset in the next version.
The previous patchset had a whitespace difference in ECPG expected
output files. Fixed in the attached patchset.
--
Best Wishes,
Ashutosh Bapat
Вложения
В списке pgsql-hackers по дате отправления: