Обсуждение: Questions about PostgreSQL implementation details

Поиск
Список
Период
Сортировка

Questions about PostgreSQL implementation details

От
Julien Delplanque
Дата:
Hello PostgreSQL hackers,

I hope I am posting on the right mailing-list.

I am actually doing a PhD related to relational databases and software engineering.

I use PostgreSQL for my research.

I have a few questions about the internals of PostgreSQL and I think they require experts knowledge.

I could not find documentation about that in the nice PostgreSQL documentation but maybe I missed something? Tell me if
itis the case.
 

My Questions:

Q1. Are PostgreSQL's meta-description tables (such as pg_class) the "reality" concerning the state of the DB or are
theyjust a virtual representation ?
 

What I would like to know with this question is: would it be possible to implement DDL queries (e.g. CREATE TABLE, DROP
TABLE,CREATE VIEW, ALTER TABLE, etc.) as DML queries that modify the meta-data stored in meta-description tables?
 

For example, something like:

INSERT INTO pg_class [...];

To create a new table (instead of the CREATE TABLE DDL query).

Q1.1 If it is possible, is what is done in reality? I have the feeling that it is not the case and that DDL queries are
implementedin C directly.
 

Q1.2 If it is possible and not done, what is the reason?

--

Q2. Are PostgreSQL's "meta-constraints" (i.e. constraints related to database structure such as "a table can only have
asingle primary key") implemented in C code or via data constraints on PostgreSQL's meta-description tables?
 

I guess, again, they are implemented in C code, but I might be wrong.

Q2.1 If they are not implemented via data constraints on meta-description tables, why ?

Q2.2 Is there somewhere in the documentation a list of such "meta-constraints" implemented by PostgreSQL?


Thanks in advance for answering my questions, any help or pointers to existing documentation will be appreciated.

Cheers,

Julien Delplanque




Re: Questions about PostgreSQL implementation details

От
Mark Dilger
Дата:

On 12/9/19 7:35 AM, Julien Delplanque wrote:
> Hello PostgreSQL hackers,
> 
> I hope I am posting on the right mailing-list.
> 
> I am actually doing a PhD related to relational databases and software 
> engineering.
> 
> I use PostgreSQL for my research.
> 
> I have a few questions about the internals of PostgreSQL and I think 
> they require experts knowledge.
> 
> I could not find documentation about that in the nice PostgreSQL 
> documentation but maybe I missed something? Tell me if it is the case.
> 
> My Questions:
> 
> Q1. Are PostgreSQL's meta-description tables (such as pg_class) the 
> "reality" concerning the state of the DB or are they just a virtual 
> representation ?

Not all of them are real tables; some of the pg_catalog relations are
views over others of them.  But many of them are real tables with C
structs that back them.  Take a look in src/include/catalog/pg_class.h
and you'll see the C struct definition, somewhat obscured by some
macros that make it less obvious to people not familiar with the
postgresql sources.

On line 29:

CATALOG(pg_class,1259,RelationRelationId) BKI_BOOTSTRAP 
BKI_ROWTYPE_OID(83,RelationRelation_Rowtype_Id) BKI_SCHEMA_MACRO
{
...
}

That's a typedef.  See genbki.h where it defines the macro:

#define CATALOG(name,oid,oidmacro)    typedef struct CppConcat(FormData_,name)

> What I would like to know with this question is: would it be possible to 
> implement DDL queries (e.g. CREATE TABLE, DROP TABLE, CREATE VIEW, ALTER 
> TABLE, etc.) as DML queries that modify the meta-data stored in 
> meta-description tables?
> 
> For example, something like:
> 
> INSERT INTO pg_class [...];
> 
> To create a new table (instead of the CREATE TABLE DDL query).

You are not allowed to insert into the pg_class table directly.  There
are good reasons for that.  Simply inserting a row into this table would
not cause all the infrastructure that backs a table to pop into
existence.  So you have to use the DDL commands.

> 
> Q1.1 If it is possible, is what is done in reality? I have the feeling 
> that it is not the case and that DDL queries are implemented in C directly.

See src/backend/commands/tablecmds.c, function DefineRelation.



-- 
Mark Dilger



Re: Questions about PostgreSQL implementation details

От
Tom Lane
Дата:
Julien Delplanque <julien.delplanque@inria.fr> writes:
> I have a few questions about the internals of PostgreSQL and I think they require experts knowledge.

> Q1. Are PostgreSQL's meta-description tables (such as pg_class) the "reality" concerning the state of the DB or are
theyjust a virtual representation ? 

The system catalogs are reality as far as the effects of DDL go.  In the
particular case of pg_class, there is additional reality, which is that
(most) pg_class rows represent one or more data files on-disk.  You could
in fact simulate many DDL operations by manual inserts/deletes/updates on
system catalogs; but that would not result in any actions on the data
files, so it falls down for the specific cases of CREATE/DROP TABLE,
CREATE/DROP INDEX, etc.

> What I would like to know with this question is: would it be possible to implement DDL queries (e.g. CREATE TABLE,
DROPTABLE, CREATE VIEW, ALTER TABLE, etc.) as DML queries that modify the meta-data stored in meta-description tables? 

Underneath, many of those operations are just catalog manipulations,
so yes up to the point where you need to do something that impacts
user data storage.

(In practice, getting all the details right from a SQL client
would be a pretty painful thing, so I'm not sure I see the point.
Usually, modifying the PG C code or writing an extension would be
a saner approach to modifying the system's behavior.)

> Q2. Are PostgreSQL's "meta-constraints" (i.e. constraints related to database structure such as "a table can only
havea single primary key") implemented in C code or via data constraints on PostgreSQL's meta-description tables? 

Mostly code.  The only real constraints on the system catalogs, in the
sense of something that would reject an ill-advised low-level update,
are the unique indexes.  There are also NOT NULL markers on many of
the catalogs' columns, but those are only enforced against manual
SQL updates not updates made by C code.  This is all somewhat historical,
I suppose, but it's worked well enough for us.

> Thanks in advance for answering my questions, any help or pointers to existing documentation will be appreciated.

There's no substitute for reading the source code.  Unlike some other
systems, PG was developed as open source from the beginning, so there
is not much in the way of a "theory of operations manual" or other
separate internals documentation --- people with these sorts of
questions are expected to go read the code.  You could perhaps start
by skimming

https://www.postgresql.org/docs/devel/internals.html

and then go look for README files in the portions of the source
tree that interest you.

            regards, tom lane



Re: Questions about PostgreSQL implementation details

От
Heikki Linnakangas
Дата:
Hi Julien!

On 09/12/2019 17:35, Julien Delplanque wrote:
> Q1. Are PostgreSQL's meta-description tables (such as pg_class) the "reality" concerning the state of the DB or are
theyjust a virtual representation ?
 

Yes, the catalog tables are the authoritative source. The system uses 
those tables internally to get the information too.

Some of the pg_* relations are just views over other catalog tables, though.

> What I would like to know with this question is: would it be possible to implement DDL queries (e.g. CREATE TABLE,
DROPTABLE, CREATE VIEW, ALTER TABLE, etc.) as DML queries that modify the meta-data stored in meta-description tables?
 
> 
> For example, something like:
> 
> INSERT INTO pg_class [...];
> 
> To create a new table (instead of the CREATE TABLE DDL query).
> 
> Q1.1 If it is possible, is what is done in reality? I have the feeling that it is not the case and that DDL queries
areimplemented in C directly.
 
> 
> Q1.2 If it is possible and not done, what is the reason?

The C code for the DDL commands do some things in addition to modifying 
the catalog tables. Notably for CREATE TABLE, it creates the relation 
file in the data directory, where all the data is stored. It also 
handles locking, invalidating various caches, firing event triggers etc. 
Except for creating relation files, those other things happen just in 
memory, though.

It is not supported, and please don't do it in production, but you could 
try it out. Set "allow_system_table_mods=on", and insert to pg_class, 
pg_attribute, etc. See how well it works. Beware that there are internal 
caches, called "syscaches", in backends over the catalog tables, so if 
you modify them directly, you may need to restart for the changes to 
take effect.

> Q2. Are PostgreSQL's "meta-constraints" (i.e. constraints related to database structure such as "a table can only
havea single primary key") implemented in C code or via data constraints on PostgreSQL's meta-description tables?
 
> 
> I guess, again, they are implemented in C code, but I might be wrong.

In C code.

> Q2.1 If they are not implemented via data constraints on meta-description tables, why ?

I think there are some restrictions that cannot easily be represented as 
constraints. Also, we've never supported constraints on catalog tables, 
so no one's given much thought to what it would look like if we did.

> Q2.2 Is there somewhere in the documentation a list of such "meta-constraints" implemented by PostgreSQL?

Hmm, I don't think there is. Your best bet is to just look at the C 
code, I'm afraid.

- Heikki



Re: Questions about PostgreSQL implementation details

От
Mark Dilger
Дата:

On 12/9/19 7:52 AM, Mark Dilger wrote:
>>
>> Q1.1 If it is possible, is what is done in reality? I have the feeling 
>> that it is not the case and that DDL queries are implemented in C 
>> directly.
> 
> See src/backend/commands/tablecmds.c, function DefineRelation.

I realize I could be a bit more helpful, here.  For a SQL command
like "CREATE TABLE", you can first look in src/backend/parser/gram.y
for the grammar rule.  In this case, searching for CREATE TABLE
leads you to a banner comment around line 3132.  The rule for
CreateStmt creates a node of type CreateStmt.  That leads you
to a struct of the same name in src/include/nodes/parsenodes.h.
You can see all the fields of that struct, and reconcile those
against what the code in gram.y is doing to populate those
fields.  You can then look in src/backend/tcop/utility.c for
the T_CreateStmt nodeTag, and you'll find that it appears in
a few switch statements.  One of those in ProcessUtilitySlow
calls DefineRelation along with a bunch of other stuff.

That's how you can trace these commands.  Let's take DROP TABLE
as a second example.  This one is harder, because "DROP TABLE"
doesn't exist exactly.  It's part of a larger grammar production
for DropStmt that includes other DROP commands.  All the same,
you can see there is a rule for DropStmt that creates a node of
type DropStmt.  Looking in src/include/nodes/parsenodes.h you
can see the struct of the same name.  Looking in tcop/utility.c
you see T_DropStmt is handled in a few switch statements, and
seem to run through ProcessUtilitySlow and ExecDropStmt.  The
function ExecDropStmt has a switch over stmt->removeType, which
was populated back in gram.y to the value OBJECT_TABLE.  That
now serves to select the RemoveRelations function.

This sort of analysis is fairly useful for getting from a SQL
statement to the code in src/backend/commands/ that implements
the guts of that statement.

I hope this helps.

-- 
Mark Dilger



Re: Questions about PostgreSQL implementation details

От
Tom Lane
Дата:
Mark Dilger <hornschnorter@gmail.com> writes:
> [ useful tips about finding the code that implements a SQL command ]

BTW, if it wasn't obvious already, you *really* want to have some kind
of tool that easily finds the definition of a particular C symbol.
You can fall back on "grep -r" or "git grep", but lots of people use
ctags or etags or some other C-aware indexing tool.

            regards, tom lane



Re: Questions about PostgreSQL implementation details

От
Julien Delplanque
Дата:
Thanks for your answer, I will dive into the C code then.

Le 9/12/19 à 16:52, Mark Dilger a écrit :
>
> Not all of them are real tables; some of the pg_catalog relations are
> views over others of them.  But many of them are real tables with C
> structs that back them.  Take a look in src/include/catalog/pg_class.h
> and you'll see the C struct definition, somewhat obscured by some
> macros that make it less obvious to people not familiar with the
> postgresql sources. 

Indeed, I should have say it in my email, I saw in the documentation 
that some of them are views. :-)

Cheers,

Julien

Le 9/12/19 à 16:52, Mark Dilger a écrit :
>
>
> On 12/9/19 7:35 AM, Julien Delplanque wrote:
>> Hello PostgreSQL hackers,
>>
>> I hope I am posting on the right mailing-list.
>>
>> I am actually doing a PhD related to relational databases and 
>> software engineering.
>>
>> I use PostgreSQL for my research.
>>
>> I have a few questions about the internals of PostgreSQL and I think 
>> they require experts knowledge.
>>
>> I could not find documentation about that in the nice PostgreSQL 
>> documentation but maybe I missed something? Tell me if it is the case.
>>
>> My Questions:
>>
>> Q1. Are PostgreSQL's meta-description tables (such as pg_class) the 
>> "reality" concerning the state of the DB or are they just a virtual 
>> representation ?
>
> Not all of them are real tables; some of the pg_catalog relations are
> views over others of them.  But many of them are real tables with C
> structs that back them.  Take a look in src/include/catalog/pg_class.h
> and you'll see the C struct definition, somewhat obscured by some
> macros that make it less obvious to people not familiar with the
> postgresql sources.
>
> On line 29:
>
> CATALOG(pg_class,1259,RelationRelationId) BKI_BOOTSTRAP 
> BKI_ROWTYPE_OID(83,RelationRelation_Rowtype_Id) BKI_SCHEMA_MACRO
> {
> ...
> }
>
> That's a typedef.  See genbki.h where it defines the macro:
>
> #define CATALOG(name,oid,oidmacro)    typedef struct 
> CppConcat(FormData_,name)
>
>> What I would like to know with this question is: would it be possible 
>> to implement DDL queries (e.g. CREATE TABLE, DROP TABLE, CREATE VIEW, 
>> ALTER TABLE, etc.) as DML queries that modify the meta-data stored in 
>> meta-description tables?
>>
>> For example, something like:
>>
>> INSERT INTO pg_class [...];
>>
>> To create a new table (instead of the CREATE TABLE DDL query).
>
> You are not allowed to insert into the pg_class table directly. There
> are good reasons for that.  Simply inserting a row into this table would
> not cause all the infrastructure that backs a table to pop into
> existence.  So you have to use the DDL commands.
>
>>
>> Q1.1 If it is possible, is what is done in reality? I have the 
>> feeling that it is not the case and that DDL queries are implemented 
>> in C directly.
>
> See src/backend/commands/tablecmds.c, function DefineRelation.
>
>
>



Re: Questions about PostgreSQL implementation details

От
Craig Ringer
Дата:
On Tue, 10 Dec 2019 at 01:21, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Mark Dilger <hornschnorter@gmail.com> writes:
> [ useful tips about finding the code that implements a SQL command ]

BTW, if it wasn't obvious already, you *really* want to have some kind
of tool that easily finds the definition of a particular C symbol.
You can fall back on "grep -r" or "git grep", but lots of people use
ctags or etags or some other C-aware indexing tool.

 
I strongly recommend cscope with editor integration for your preferred editor btw.
 
--
 Craig Ringer                   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise

Re: Questions about PostgreSQL implementation details

От
Craig Ringer
Дата:

On Mon, 9 Dec 2019 at 23:35, Julien Delplanque <julien.delplanque@inria.fr> wrote:
Hello PostgreSQL hackers,

I hope I am posting on the right mailing-list.

I am actually doing a PhD related to relational databases and software engineering.

I use PostgreSQL for my research.

I have a few questions about the internals of PostgreSQL and I think they require experts knowledge.

I could not find documentation about that in the nice PostgreSQL documentation but maybe I missed something? Tell me if it is the case.

There are a bunch of README files in the source tree that concern various innards of PostgreSQL. They're not always referred to by any comments etc, so you have to know they exist. They're usually well worth reading, though it can take a while before you understand enough of PostgreSQL's architecture for them to make sense...

Try

    find src/ -name README\*
 
Q1. Are PostgreSQL's meta-description tables (such as pg_class) the "reality" concerning the state of the DB or are they just a virtual representation ?

That's been largely answered. But I want to point out an important caveat that isn't obvious to new people: The oid of a relation (pg_class.oid) is not the same thing as the pg_class.relfilenode, which is usually the base of the filename of the on-disk storage for the relation. On an idle or new database most relations  are created with an equal oid and relfilename, so it's easy to think the oid maps to the on-disk name of a relation, but it doesn't. The relation oid will not change so long as the relation exists, but the relfilenode may change if the table contents are rewritten, etc. Additionally, there are special tables that are "relmapped" such that they don't have a normal relfilenode at all, instead access is indirected via a separate mapping. IIRC that's mainly necessary so we can bootstrap access to the catalog tables that tell us how to read the catalogs.

What I would like to know with this question is: would it be possible to implement DDL queries (e.g. CREATE TABLE, DROP TABLE, CREATE VIEW, ALTER TABLE, etc.) as DML queries that modify the meta-data stored in meta-description tables?

Not really.

PostgreSQL has a caching layer - sycache, relcache, catcache - and invalidation scheme that it relies on. It doesn't execute regular queries on the system catalogs. It also has simplifying rules around how they are updated and accessed. See the logic in genam.c etc. Catalogs may also represent things that aren't just other DB rows - for example, pg_class entries are associated with files on disk for individual database tables.

You can't just insert into pg_class, pg_attribute, etc and expect that to safely create a table. Though it's surprising how much you can get away with by hacking the catalogs if you're very careful and you trick PostgreSQL into firing appropriate invalidations. I'd quite like to have a SQL-exposed way to do a forced global cache flush and invalidation for use in emergency scary catalog hacking situations.

So you can do quite a bit with direct catalog surgery, but it's dangerous and if you break the database, you get to keep the pieces. 

Q1.1 If it is possible, is what is done in reality? I have the feeling that it is not the case and that DDL queries are implemented in C directly.

Right. See standard_ProcessUtility() and friends.

Q1.2 If it is possible and not done, what is the reason?

Speed - no need to run the full executor. Simplification of catalog access. Caching and invalidations. Chicken/egg problems: how do you "CREATE TABLE pg_class"? . Lots more.
 
Q2. Are PostgreSQL's "meta-constraints" (i.e. constraints related to database structure such as "a table can only have a single primary key") implemented in C code or via data constraints on PostgreSQL's meta-description tables?

System catalogs are not permitted to have CONSTRAINTs (CHECK constraints, UNIQUE constraints, PRIMARY KEY constraints, FOREIGN KEY constraints, etc).

All such management is done in C level logic with the assistance of the pg_depend catalog and the relationships it tracks.
 
Q2.1 If they are not implemented via data constraints on meta-description tables, why ?

Same as above.
 
Q2.2 Is there somewhere in the documentation a list of such "meta-constraints" implemented by PostgreSQL?

Not AFAIK.

Why?

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise