Обсуждение: Using relations in the SELECT part

Поиск
Список
Период
Сортировка

Using relations in the SELECT part

От
Giuseppe Sacco
Дата:
Hello,
I am writing to this list since I wrote a query that I cannot really
understand. So, thanks to anyone who will light my darkness :-)

I have a table with two columns, the first one is a key, the second one
is a list of car plates. What I need to extract is a result set that
contains two columns, the first one should always be the key, the
second one only one plate. If the record contains many plates, the
result set should contains a row for each plate.

Example:

postgres=# create temporary table t
 (key varchar primary key, plates varchar);
postgres=# insert into t values ('000000','AA888BB CC777DD GG333JJ'),
 ('111111','ZZ888KK');
INSERT 0 2
postgres=# select * from t;
  key   |         plates          
--------+-------------------------
 000000 | AA888BB CC777DD GG333JJ
 111111 | ZZ888KK


This is what I would like to extract from t:
  key   |         plate
--------+-------------------------
 000000 | AA888BB
 000000 | CC777DD
 000000 | GG333JJ
 111111 | ZZ888KK

the solution I found is:

postgres=# select key,
 unnest(regexp_split_to_array(plates, E'\\s+')) AS plate from t;
  key   |  plate  
--------+---------
 000000 | AA888BB
 000000 | CC777DD
 000000 | GG333JJ
 111111 | ZZ888KK


What did I write? The first operation is to convert the original space
separated list into an array, then convert that array to a relation
that contains many records.


Early questions:

1. why may I put in the SELECT part (instead of the FROM) a relation?
When I studied SQL, I was told to put all relations in FROM, and put in
the SELECT part only the colmns or expressions with columns for
formatting the output.

2. why postgresql create a cartesian product using a first element (a
single columns "key") and a second element (a relation "plate")?

3. how postgresql define the second element? it is not "static" since
it depends fomr the first element: it depends on the current record.
For every "key", there a different "plate" result set.


Furthermore, let's assume postgres does a cartesian product, if I add a
new relation as third element, does it create 4x3 product? Let's see:

postgres=# select key,
 unnest(regexp_split_to_array(plates, E'\\s+')) AS plate1,
 unnest(regexp_split_to_array(plates, E'\\s+')) AS plate2 from t;
  key   | plate1  | plate2  
--------+---------+---------
 000000 | AA888BB | AA888BB
 000000 | CC777DD | CC777DD
 000000 | GG333JJ | GG333JJ
 111111 | ZZ888KK | ZZ888KK

4. what happened? May this be somewhta related to IMMUTABLE function?
Is unnest an immutable function? And, in any case, why this this is not
a cartesia product?

Let's try in a different way, with a different array:

postgres=# select key,
 unnest(regexp_split_to_array(plates, E'\\s+')) AS plate1, 
 unnest('{1,2}'::int[]) AS array2 from t;
  key   | plate1  | array2 
--------+---------+--------
 000000 | AA888BB |      1
 000000 | CC777DD |      2
 000000 | GG333JJ |      1
 000000 | AA888BB |      2
 000000 | CC777DD |      1
 000000 | GG333JJ |      2
 111111 | ZZ888KK |      1
 111111 | ZZ888KK |      2

this time it is a cartesian product. Why postgresql acts differently?

Thank you,
Giuseppe


Re: Using relations in the SELECT part

От
Tom Lane
Дата:
Giuseppe Sacco <giuseppe@eppesuigoccas.homedns.org> writes:
> the solution I found is:
> postgres=# select key,
>  unnest(regexp_split_to_array(plates, E'\\s+')) AS plate from t;

> 1. why may I put in the SELECT part (instead of the FROM) a relation?
> When I studied SQL, I was told to put all relations in FROM, and put in
> the SELECT part only the colmns or expressions with columns for
> formatting the output.

If you don't like it, don't do it ;-).  A more theoretically pure approach
is

select key, plate
 from t,
      lateral unnest(regexp_split_to_array(plates, E'\\s+')) as plate;

although really you chose the wrong regexp function and should have used

select key, plate
 from t,
      lateral regexp_split_to_table(plates, E'\\s+') as plate;

Also, the keyword "lateral" is optional here per SQL standard, although
I think it's better to include it to make it clearer what's happening.

The fact that Postgres allows set-returning functions in the
SELECT targetlist is a hangover from Berkeley QUEL, which at this
point we sort of regret not having ripped out twenty years ago;
it's a real wart both semantically and implementation-wise.  But it's
hard to get rid of such things.  Putting a set-returning function
in LATERAL is cleaner and more standards-compliant, though.

> 2. why postgresql create a cartesian product using a first element (a
> single columns "key") and a second element (a relation "plate")?

It's not really very different from what happens with LATERAL, at least
for the case with just one SRF in the targetlist.

> Furthermore, let's assume postgres does a cartesian product, if I add a
> new relation as third element, does it create 4x3 product?

You've hit on the reason why it's semantically ugly: it's not very
clear what to do with multiple SRFs in one targetlist.  LATERAL,
together with the ROWS FROM construct, allows clear specification
of both of the useful behaviors (cartesian product and eval-set-
returning-functions-in-lockstep).  The multiple-SRFs-in-targetlist
behavior that we inherited from Berkeley is just a mess, as it
effectively runs the SRFs until reaching the least common multiple of
their periods.  We're changing that for v10 though.  You might find
this commit informative (at least the commit message and documentation
changes):

https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=69f4b9c85

            regards, tom lane