some notes about implementation of SQL/PSM in PostgreSQL

Поиск
Список
Период
Сортировка
От Pavel Stehule
Тема some notes about implementation of SQL/PSM in PostgreSQL
Дата
Msg-id AANLkTi=fqsdx6fW5DiXmAeLe7TFO6c2vNTRocZdTvK06@mail.gmail.com
обсуждение исходный текст
Список pgsql-hackers
Hello

I try to implement SQL/PSM in PostgreSQL. I experimented with
rewriting a plpgsql - you can check this implementation
http://postgres.cz/index.php/SQL/PSM_Manual.

I though so we can have some common PL stack. But I don't think it
now. It means a some more uglification of current code or uglification
of our SQL/PSM implementation.

Why? PL/pgSQL language is extremely adjusted to PostgreSQL - it's some
combination of strong static language based on relative high dynamic
interpret.

For example:
* variables are static (left part of assign statement)
* assign statement and expressions are "pseudo static" - type is
specified when query is first evaluated, and cannot be changes later
* there are not a strong relation between left part and right part of
assign statement - PLpgSQL use a cast or IO cast when it's necessary

PL/pgSQL needs to know all objects before first using - it doesn't to
use a "reflection" on SQL level - see different a usage of plpgsql FOR
IN query and PL/SQL FOR IN query

Current implementation has some large advantages  or disadvantages - I
like PLpgSQL and I am sure so it is impossible to change current
behave now. This is just recapitulation:

+ there is a less dependency between functions - a function has to
exists just in moment when it is exactly executed
+ there isn't a complication with recompilation of functions - all
work is transparent
+ current implementation is fast - use a three level for access to
variables scalar, row, record
+ there is zero relation between tables and functions
+ functions is a dynamic - respect a changes in search_path (sometimes)
+ language is relative rich and allow very comfortable work with dynamic SQL
+ there are not standard - we are able to enhance language

- current implementation is too complex - there are a three different
access to variables
- there is relative big overhead when functions is starting first time
- there are lot of SQL parsing
- current validation checking  only syntax - it doesn't check a
references to SQL objects - so some errors can be found too late
- the zero relation between expression and variable means a lot of
casts and some IO casts
- older code can be quietly broken by ALTER TABLE ADD COLUMN statement
- there can be a collision between PL and SQL identifiers
- functions is too dynamic (search_path can be a security hole)
- there are a still some dependency issues - somewhere is necessary to
logout/loging
- isn't possible to call plpgsql function like coroutines - so there
are not possibility to effectively limit SRF functions
- there is issue with blind SQL optimalization - bad plans
- there is issue with slow varlena data type modifications in large loops
- there is issue with iteration over record

I don't thing so all these issues can be solved - some needs a
dependency between functions and it means a problem with updates on
production's servers. Solving a other can means a breaking
compatibility etc..

Because SQL/PSM is different language and because it's not implemented
yet, we can to address some issues - absolutely not all
* simpler interpret - there must not be a different access to scalar,
row or record variable
* language can be a more dynamic or more static - current level of
PLpgSQL is little bit strange - with large impact on interpret
complexity
* more static behave can accelerate a first start - SQL are analyzed once time
* we can use a simple interpret where is possible to store state in
every moment and call PSM function like couroutine (stack can be
stored inside fn_extra).

I am thinking about some low level PL language - like PLSPI or some
similar - SQL/PSM functions should be compiled to this low level
language. This language can be simply parsed (first start will be
faster) and can be simply serialized and deserialized - it can be
important for people who afraid about their source codes - they can
compile to PLPSM and they can distribute only this pcode.

There are a two issues of standard

1. it doesn't to know RETURN NEXT, and what I know all SQL/PSM
implementation doesn't support similar construct - so It's should not
be in pgPSM too.

2. there are very specific statement "FOR IN query" - the PL variables
are created automatically based on query result. This isn't possible
in PL/pgSQL, because we must not to analyze query in validation time.
This is real problem - it can have a impact on pg_dump, because it
needs to dump functions with empty body first, and then dump full
function again.

Because PL/pgSQL has a long history and we cannot to leave it, I am
inclined to think so better variant for implementation of SQL/PSM is
starting with a different runtime. Probably only one interesting part
that can be shared is expression evaluator. I am thinking, so this
moving this to SPI can be interesting for all PL.

Regards

Pavel Stehule


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Heikki Linnakangas
Дата:
Сообщение: Latches with weak memory ordering (Re: max_wal_senders must die)
Следующее
От: Itagaki Takahiro
Дата:
Сообщение: Re: MULTISET and additional functions for ARRAY