Обсуждение: need clarification on CTE/join

Поиск
Список
Период
Сортировка

need clarification on CTE/join

От
Marc Millas
Дата:
Hi,

if I do:
postgres=# with numb as(select ceiling(2582*random()) rand, generate_series(1,5) as monnum) select rand, monnum from numb;
 rand | monnum
------+--------
 1543 |      1
 2299 |      2
  205 |      3
  523 |      4
  677 |      5
(5 lines)

ok, fine. The random numbers are at random...and the generate_series are ordered...

I have a table firstnames(id serial, firstname text) with 2582 lines containing firstnames sorted in alphabetical order.

--when I do, with the same CTE:
postgres=# with numb as(select ceiling(2582*random()) rand, generate_series(1,5) as monnum) select monnum, firstname from numb, firstnames where numb.rand= firstnames.id ;
 monnum |  firstname
--------+-----------
      2 | Christine
      1 | Firas
      4 | Firmin
      3 | Rawane
      5 | Titania
(5 lignes)

which mean that what I get is a set of firstnames ordered according to the firstnames table, and NOT to the result of the CTE.


--Now if I cast the result of the ceiling function to int:
postgres=# with numb as(select ceiling(2582*random())::int rand, generate_series(1,5) as monnum) select monnum, firstname from numb,firstnames where numb.rand=firstnames.id;
 monnum |  prenom
--------+----------
      1 | Dexter
      2 | Harrison
      3 | Angilbe
      4 | Narcisse
      5 | Marcel
(5 lignes)

Now its ordered according to the CTE. (and the firstname list is at random)

I did test the same thing after putting the result of the CTE in a table, with the very same behaviour.

So.. I would like to understand the "why" of this behaviour, ie. the change of order when I do the cast.
(The original reason was that I was creating a test env with millions of rows with this kind of CTE and was quite surprised to discover that the result table was ordered...which was not at all my goal)

thanks,


Marc MILLAS
Senior Architect
+33607850334

Re: need clarification on CTE/join

От
"David G. Johnston"
Дата:
On Tue, Mar 23, 2021 at 6:45 PM Marc Millas <marc.millas@mokadb.com> wrote:
So.. I would like to understand the "why" of this behaviour, ie. the change of order when I do the cast.

I believe the "why" is immaterial here.  Your queries do not contain order by so your results are unordered - even if there appears to be an apparent ordering for any particular result.

David J.

Re: need clarification on CTE/join

От
Marc Millas
Дата:
Hi,

I cannot agree.
I did an explain analyze with and without the cast: its extremely different:

postgres=# explain analyze with numb as(select ceiling(2582*random())::int rand, generate_series(1,5) as monnum) select monnum, prenom from numb,prenoms where numb.rand=prenoms.id;
                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.33..37.65 rows=5 width=11) (actual time=0.059..0.147 rows=5 loops=1)
   CTE numb
     ->  ProjectSet  (cost=0.00..0.05 rows=5 width=8) (actual time=0.021..0.022 rows=5 loops=1)
           ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.000 rows=1 loops=1)
   ->  CTE Scan on numb  (cost=0.00..0.10 rows=5 width=8) (actual time=0.023..0.025 rows=5 loops=1)
   ->  Index Scan using prenoms_pkey on prenoms  (cost=0.28..7.50 rows=1 width=11) (actual time=0.024..0.024 rows=1 loops=5)
         Index Cond: (id = numb.rand)
 Planning Time: 0.111 ms
 Execution Time: 0.201 ms
(9 lignes)


postgres=# explain analyze with numb as(select ceiling(2582*random()) rand, generate_series(1,5) as monnum) select monnum, prenom from numb,prenoms where numb.rand=prenoms.id;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=0.21..54.59 rows=65 width=11) (actual time=0.105..0.291 rows=5 loops=1)
   Hash Cond: ((prenoms.id)::double precision = numb.rand)
   CTE numb
     ->  ProjectSet  (cost=0.00..0.05 rows=5 width=12) (actual time=0.002..0.003 rows=5 loops=1)
           ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.000 rows=1 loops=1)
   ->  Seq Scan on prenoms  (cost=0.00..40.82 rows=2582 width=11) (actual time=0.011..0.131 rows=2582 loops=1)
   ->  Hash  (cost=0.10..0.10 rows=5 width=12) (actual time=0.012..0.012 rows=5 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  CTE Scan on numb  (cost=0.00..0.10 rows=5 width=12) (actual time=0.004..0.005 rows=5 loops=1)
 Planning Time: 0.070 ms
 Execution Time: 0.313 ms
(11 lignes)


Marc MILLAS
Senior Architect
+33607850334



On Wed, Mar 24, 2021 at 3:22 AM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Tue, Mar 23, 2021 at 6:45 PM Marc Millas <marc.millas@mokadb.com> wrote:
So.. I would like to understand the "why" of this behaviour, ie. the change of order when I do the cast.

I believe the "why" is immaterial here.  Your queries do not contain order by so your results are unordered - even if there appears to be an apparent ordering for any particular result.

David J.

Re: need clarification on CTE/join

От
"David G. Johnston"
Дата:
On Tuesday, March 23, 2021, Marc Millas <marc.millas@mokadb.com> wrote:
Hi,

I cannot agree.
I did an explain analyze with and without the cast: its extremely different:

postgres=# explain analyze with numb as(select ceiling(2582*random())::int rand, generate_series(1,5) as monnum) select monnum, prenom from numb,prenoms where numb.rand=prenoms.id;
                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.33..37.65 rows=5 width=11) (actual time=0.059..0.147 rows=5 loops=1)
   CTE numb
     ->  ProjectSet  (cost=0.00..0.05 rows=5 width=8) (actual time=0.021..0.022 rows=5 loops=1)
           ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.000 rows=1 loops=1)
   ->  CTE Scan on numb  (cost=0.00..0.10 rows=5 width=8) (actual time=0.023..0.025 rows=5 loops=1)
   ->  Index Scan using prenoms_pkey on prenoms  (cost=0.28..7.50 rows=1 width=11) (actual time=0.024..0.024 rows=1 loops=5)
         Index Cond: (id = numb.rand)
 Planning Time: 0.111 ms
 Execution Time: 0.201 ms
(9 lignes)


postgres=# explain analyze with numb as(select ceiling(2582*random()) rand, generate_series(1,5) as monnum) select monnum, prenom from numb,prenoms where numb.rand=prenoms.id;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=0.21..54.59 rows=65 width=11) (actual time=0.105..0.291 rows=5 loops=1)
   Hash Cond: ((prenoms.id)::double precision = numb.rand)
   CTE numb
     ->  ProjectSet  (cost=0.00..0.05 rows=5 width=12) (actual time=0.002..0.003 rows=5 loops=1)
           ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.000 rows=1 loops=1)
   ->  Seq Scan on prenoms  (cost=0.00..40.82 rows=2582 width=11) (actual time=0.011..0.131 rows=2582 loops=1)
   ->  Hash  (cost=0.10..0.10 rows=5 width=12) (actual time=0.012..0.012 rows=5 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  CTE Scan on numb  (cost=0.00..0.10 rows=5 width=12) (actual time=0.004..0.005 rows=5 loops=1)
 Planning Time: 0.070 ms
 Execution Time: 0.313 ms
(11 lignes)


If I’m reading that correctly since prenoms.id is an integer if you don’t cast the ceiling(random) away from double you cannot use the index since its not the same type - the integer has to become double, not the reverse.  So you get a different execution and thus different result ordering since the executor doesn’t have to care about row order.

David J.

Re: need clarification on CTE/join

От
Marc Millas
Дата:
Got it :-)
thanks !

Marc MILLAS
Senior Architect
+33607850334



On Wed, Mar 24, 2021 at 4:21 AM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Tuesday, March 23, 2021, Marc Millas <marc.millas@mokadb.com> wrote:
Hi,

I cannot agree.
I did an explain analyze with and without the cast: its extremely different:

postgres=# explain analyze with numb as(select ceiling(2582*random())::int rand, generate_series(1,5) as monnum) select monnum, prenom from numb,prenoms where numb.rand=prenoms.id;
                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.33..37.65 rows=5 width=11) (actual time=0.059..0.147 rows=5 loops=1)
   CTE numb
     ->  ProjectSet  (cost=0.00..0.05 rows=5 width=8) (actual time=0.021..0.022 rows=5 loops=1)
           ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.000 rows=1 loops=1)
   ->  CTE Scan on numb  (cost=0.00..0.10 rows=5 width=8) (actual time=0.023..0.025 rows=5 loops=1)
   ->  Index Scan using prenoms_pkey on prenoms  (cost=0.28..7.50 rows=1 width=11) (actual time=0.024..0.024 rows=1 loops=5)
         Index Cond: (id = numb.rand)
 Planning Time: 0.111 ms
 Execution Time: 0.201 ms
(9 lignes)


postgres=# explain analyze with numb as(select ceiling(2582*random()) rand, generate_series(1,5) as monnum) select monnum, prenom from numb,prenoms where numb.rand=prenoms.id;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=0.21..54.59 rows=65 width=11) (actual time=0.105..0.291 rows=5 loops=1)
   Hash Cond: ((prenoms.id)::double precision = numb.rand)
   CTE numb
     ->  ProjectSet  (cost=0.00..0.05 rows=5 width=12) (actual time=0.002..0.003 rows=5 loops=1)
           ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.000 rows=1 loops=1)
   ->  Seq Scan on prenoms  (cost=0.00..40.82 rows=2582 width=11) (actual time=0.011..0.131 rows=2582 loops=1)
   ->  Hash  (cost=0.10..0.10 rows=5 width=12) (actual time=0.012..0.012 rows=5 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  CTE Scan on numb  (cost=0.00..0.10 rows=5 width=12) (actual time=0.004..0.005 rows=5 loops=1)
 Planning Time: 0.070 ms
 Execution Time: 0.313 ms
(11 lignes)


If I’m reading that correctly since prenoms.id is an integer if you don’t cast the ceiling(random) away from double you cannot use the index since its not the same type - the integer has to become double, not the reverse.  So you get a different execution and thus different result ordering since the executor doesn’t have to care about row order.

David J.