Re: SQL Property Graph Queries (SQL/PGQ)

Поиск
Список
Период
Сортировка
От Henson Choi
Тема Re: SQL Property Graph Queries (SQL/PGQ)
Дата
Msg-id CAAAe_zAW7DQEqXDwroUaYdTgK1qpjEgTCbhDtMiYbxL+VVByvQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: SQL Property Graph Queries (SQL/PGQ)  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
Список pgsql-hackers
Hi hackers,

I ran Valgrind memcheck against the SQL/PGQ (Property Graph) implementation
to verify there are no memory issues introduced by the new graph features.

== Test Environment ==

- PostgreSQL: 19devel (master branch)
- Valgrind: 3.22.0
- Platform: Linux x86_64 (RHEL 9)
- Tests executed: run_pgq_tests.sql
  - create_property_graph.sql
  - graph_table.sql
  - graph_table_rls.sql
  - alter_generic.sql
  - object_address.sql

== Executive Summary ==

  +----------------------------------------------------------+
  | SQL/PGQ Implementation: NO MEMORY LEAKS DETECTED         |
  +----------------------------------------------------------+

All detected leaks originate from existing PostgreSQL core components.
No graph-related functions appear in any leak stack traces.

== Backend Process 487751: Itemized Leak Analysis ==

Total: 555 bytes definitely lost in 47 blocks, 21 unique leak contexts

+================================================================================+
| #  | Bytes | Blocks | Allocator           | Root Cause       | PGQ? | Verdict  |
+================================================================================+
| 1  | 1     | 1      | text_to_cstring     | SQL func cache   | NO   | HARMLESS |
| 2  | 3     | 1      | pstrdup             | PL/pgSQL lexer   | NO   | HARMLESS |
| 3  | 3     | 1      | pstrdup             | PL/pgSQL lexer   | NO   | HARMLESS |
| 4  | 12    | 2      | downcase_identifier | PL/pgSQL parser  | NO   | HARMLESS |
| 5  | 12    | 2      | downcase_identifier | PL/pgSQL parser  | NO   | HARMLESS |
| 6  | 12    | 2      | downcase_identifier | PL/pgSQL parser  | NO   | HARMLESS |
| 7  | 13    | 2      | downcase_identifier | PL/pgSQL parser  | NO   | HARMLESS |
| 8  | 21    | 3      | downcase_identifier | PL/pgSQL parser  | NO   | HARMLESS |
| 9  | 25    | 3      | downcase_identifier | PL/pgSQL parser  | NO   | HARMLESS |
| 10 | 27    | 1      | detoast_attr        | SQL func cache   | NO   | HARMLESS |
| 11 | 27    | 1      | detoast_attr        | SQL func cache   | NO   | HARMLESS |
| 12 | 33    | 5      | downcase_identifier | PL/pgSQL parser  | NO   | HARMLESS |
| 13 | 33    | 5      | downcase_identifier | PL/pgSQL parser  | NO   | HARMLESS |
| 14 | 33    | 5      | downcase_identifier | PL/pgSQL parser  | NO   | HARMLESS |
| 15 | 37    | 1      | strdup              | Postmaster init  | NO   | HARMLESS |
| 16 | 38    | 5      | downcase_identifier | PL/pgSQL parser  | NO   | HARMLESS |
| 17 | 68    | 1      | deconstruct_array   | SQL func cache   | NO   | HARMLESS |
| 18 | 84    | 1      | deconstruct_array   | SQL func cache   | NO   | HARMLESS |
| 19 | 164   | 2      | lappend             | PL/pgSQL parser  | NO   | HARMLESS |
| 20 | 164   | 2      | lappend             | PL/pgSQL parser  | NO   | HARMLESS |
| 21 | 358   | 1      | plpgsql_ns_push     | PL/pgSQL parser  | NO   | HARMLESS |
+================================================================================+

     TOTAL: 555 bytes in 47 blocks
     PGQ-RELATED LEAKS: 0

== SQL/PGQ Verification ==

All 21 leak stack traces were examined. Key functions NOT present:

  - CreatePropertyGraph, DropPropertyGraph, AlterPropertyGraph
  - transformGraphTable, ExecGraphTableScan
  - Any function from src/backend/commands/propertygraphcmds.c
  - Any function from src/backend/parser/parse_graph.c
  - Any GRAPH_TABLE related executor nodes

The SQL/PGQ tests exercised CREATE PROPERTY GRAPH, DROP PROPERTY GRAPH,
and GRAPH_TABLE queries, yet none of these code paths appear in leaks.

== Process Summary ==

+--------+----------------+-----------------+----------+--------------------+
| PID    | Type           | Definitely Lost | Errors   | Notes              |
+--------+----------------+-----------------+----------+--------------------+
| 487724 | postmaster     | 37 bytes        | 1        | strdup at startup  |
| 487751 | backend        | 555 bytes       | 30       | See above analysis |
| 487735 | checkpointer   | 37 bytes        | 1        | strdup at startup  |
| 487736 | bgwriter       | 37 bytes        | 1        | strdup at startup  |
| 487737 | walwriter      | 37 bytes        | 1        | strdup at startup  |
| 487738 | autovacuum     | 37 bytes        | 1        | strdup at startup  |
| 487739 | logical rep    | 37 bytes        | 1        | strdup at startup  |
| 487740 | syslogger      | 37 bytes        | 1        | strdup at startup  |
| Others | aux workers    | 37 bytes each   | 1 each   | strdup at startup  |
+--------+----------------+-----------------+----------+--------------------+


== Possibly Lost (LLVM JIT) ==

Additionally, ~15KB in ~155 blocks reported as "possibly lost" from LLVM JIT:
  - llvm::MDTuple, llvm::User allocations
  - Origin: llvm_compile_expr -> jit_compile_expr -> ExecReadyExpr
  - These are LLVM's internal caches, not PostgreSQL leaks

== Other Memory Errors ==

+--------------------------------------------------------------------------+
| NO memory access errors detected:                                        |
|   - Invalid read/write: 0                                                |
|   - Use of uninitialized value: 0                                        |
|   - Conditional jump on uninitialized value: 0                           |
|   - Syscall param errors: 0                                              |
|   - Overlap in memcpy/strcpy: 0                                          |
+--------------------------------------------------------------------------+

Suppressed errors (via valgrind.supp):
  - 487751 (backend): 330 errors from 3 contexts suppressed
  - 487762 (aux worker): 9 errors suppressed
  - Other processes: ~9 errors each suppressed

Suppression rules in valgrind.supp (12 categories):
  1. Padding        - write() syscalls with uninitialized struct padding
  2. CRC            - CRC32 calculation on padded buffers
  3. Overlap        - memcpy overlap in bootstrap/relmap
  4. glibc          - Dynamic loader internals
  5. Regex/Pattern  - RE_compile_and_cache, patternsel
  6. Planner        - query_planner memcpy overlap
  7. Cache          - RelCache, CatCache, TypeCache possible leaks
  8. XLogReader     - WAL reader allocations
  9. Parallel       - ParallelWorkerMain allocations
  10. AutoVacuum    - AutoVacWorkerMain allocations
  11. GUC/Backend   - set_config_option, InitPostgres
  12. PL/pgSQL      - plpgsql_compile, SPI_prepare, CachedPlan

All suppressed errors are known, harmless PostgreSQL behaviors.
See attached valgrind.supp for details.

== Conclusion ==

The SQL/PGQ implementation introduces NO memory leaks. All detected issues
are pre-existing PostgreSQL behaviors related to:

  1. PL/pgSQL function compilation caching
  2. SQL function metadata caching
  3. Postmaster startup allocations
  4. LLVM JIT internal caching

These are by-design patterns that trade minimal memory for performance.

== Files Attached ==

- valgrind.supp: Suppression file used for this test
- 487724.log: Postmaster log
- 487751.log: Backend log with full leak details (126KB)
- 4877xx.log: Auxiliary process logs

--
Regards

2025년 12월 30일 (화) PM 8:27, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>님이 작성:
On Tue, Dec 30, 2025 at 3:14 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Thu, Dec 18, 2025 at 2:45 PM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> >
> > >
> > > I did another investigation about whether this level of checking is
> > > necessary.  I think according to the letter of the SQL standard, the
> > > typmods must indeed match.  It seems Oracle does not check (the example
> > > mentioned above came from an Oracle source).  But I think it's okay to
> > > keep the check.  In PostgreSQL, it is much less common to write like
> > > varchar(1000).  And we can always consider relaxing it later.
> >
> > +1.
> >
> > Attached patch adds a couple more test statements.
> >
>
> Squashed this into the main patchset.
>
> > >
> > > 2) I had it in my notes to consider whether we should support the colon
> > > syntax for label expressions.  I think we might have talked about that
> > > before.
> > >
> > > I'm now leaning toward not supporting it in the first iteration.  I
> > > don't know that we have fully explored possible conflicts with host
> > > variable syntax in ecpg and psql and the like.  Maybe avoid that for now.
> > >
> >
> > I was aware of ecpg and I vaguely remember we fixed something in ECPG
> > to allow : in MATCH statement. Probably following changes in
> > psqlscan.l and pgc.l
> > -self                   [,()\[\].;\:\+\-\*\/\%\^\<\>\=]
> > +self                   [,()\[\].;\:\|\+\-\*\/\%\^\<\>\=]
> >
> > Those changes add | after : (and not the : itself) so maybe they are
> > not about supporting : . Do you remember what those are?
>
> I reverted those changes from both the files and ran "meson test". I
> did not observe any failure. It seems those changes are not needed.
> But adding them as a separate commit (0004) in case CI bot reveals any
> failures without them.
>
> I noticed that there were no ECPG tests for SQL/PGQ. Added a basic
> test in patch 0003.
>
> >
> > I spotted some examples that use : in ddl.sgml.
> > <programlisting>
> > SELECT customer_name FROM GRAPH_TABLE (myshop MATCH
> > (c:customer)-[:has]->(o:"order" WHERE o.ordered_when = current_date)
> > COLUMNS (c.name AS customer_name));
> > </programlisting>
> >
> > The query demonstrates that one can use label names in a way that will
> > make the pattern look like an English sentence. Replacing : with IS
> > defeats that purpose.
> >
> > As written in that paragraph, the labels serve the purpose of exposing
> > the table with a different logical view (using different label and
> > property names). So we need that paragraph, but I think we should
> > change the example to use IS instead of :. Attached is suggested
> > minimal change, but I am not happy with it. Another possibility is we
> > completely remove that paragraph; I don't think we need to discuss all
> > possible usages the users will come up with.
> >
> > The patch changes one more instance of : by IS. But that's straight forward.
> >
> > In ddl.sgml I noticed a seemingly incomplete sentence
> >    A property graph is a way to represent database contents, instead of using
> >    relational structures such as tables.
> >
> > Represent the contents as what? I feel the complete sentence should be
> > one of the following
> > property graph is a way to represent database contents as a graph,
> > instead of representing those as relational structures OR
> > property graph is another way to represent database contents instead
> > of using relational structures such as tables
> >
> > But I can't figure out what was originally intended.
>
>
> 0002 contains some edits to this part of documentation. I think the
> paragraph reads better than before. Let me know what you think.
>
> Please let me know which of 0002 to 0004 look good to you. I will
> squash those into the patchset in the next version.

The previous patchset had a whitespace difference in ECPG expected
output files. Fixed in the attached patchset.

--
Best Wishes,
Ashutosh Bapat
Вложения

В списке pgsql-hackers по дате отправления: