Discussion: question about efficiency
Howdy:
Running PostgreSQL 7.2.1 on RedHat Linux 7.2.
Have a question about a query and its efficiency:
One of the users created a query that looks like this:
[snip]
"create table dev_lbpclaimsum as
SELECT
  claimsum2001.c_contract_num,
  claimsum2001.c_mbr_num,
  claimsum2001.c_proc_cd,
  claimsum2001.c_proc_mod,
  claimsum2001.d_from_dt,
  claimsum2001.d_thru_dt,
  claimsum2001.i_pd,
  claimsum2001.c_serv_prov
FROM
  claimsum2001 a join dev_pb_proc b on a.c_proc_cd = substr (b.c_proc_cd,1,6)
[/snip]
After monitoring the query, it seems the job ran for over a day and still
never completed.  I looked at it, replaced the 'claimsum2001' qualifiers
with 'a', and did an explain on both:
[snip]
[with alias 'a' ]
Nested Loop  (cost=0.00..64051744.91 rows=15892944 width=84)
  ->  Seq Scan on dev_pb_proc b  (cost=0.00..20.00 rows=1000 width=10)
  ->  Index Scan using stateclaim01_proc_cd_i on claimsum2001 a  (cost=0.00..63813.33 rows=15893 width=74)
[/with alias]
[original way]
psql:./jans_stuff.sql:14: NOTICE:  Adding missing FROM-clause entry for table "claimsum2001"
psql:./jans_stuff.sql:14: NOTICE:  QUERY PLAN:
Nested Loop  (cost=0.00..19629582277130.89 rows=214697805924711 width=95)
  ->  Nested Loop  (cost=0.00..64051744.91 rows=15892944 width=21)
        ->  Seq Scan on dev_pb_proc b  (cost=0.00..20.00 rows=1000 width=10)
        ->  Index Scan using stateclaim01_proc_cd_i on claimsum2001 a  (cost=0.00..63813.33 rows=15893 width=11)
  ->  Seq Scan on claimsum2001  (cost=0.00..1100019.02 rows=13509002 width=74)
EXPLAIN
[/original way]
[/snip]
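For reference, a sketch of the modified statement (assuming the only change
is that the SELECT list uses the alias a instead of the table name):

[snip]
create table dev_lbpclaimsum as
SELECT
  a.c_contract_num,       -- every column now references the alias a
  a.c_mbr_num,
  a.c_proc_cd,
  a.c_proc_mod,
  a.d_from_dt,
  a.d_thru_dt,
  a.i_pd,
  a.c_serv_prov
FROM
  claimsum2001 a join dev_pb_proc b on a.c_proc_cd = substr (b.c_proc_cd,1,6);
[/snip]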
I created the table in the modified way; it took maybe an hour or so (which is great!)
But, my question is:
Can someone explain why there's a difference between using 
claimsum2001.(whatever) and a.(whatever)?   
I wouldn't have thought it would matter for efficiency, just be more convenient.
Thanks!
-X
On Mon, 11 Nov 2002, Johnson, Shaunn wrote:

> One of the users created a query that looks like this:
>
> "create table dev_lbpclaimsum as
> SELECT
>   claimsum2001.c_contract_num,
>   [... the other claimsum2001 columns ...]
> FROM
>   claimsum2001 a join dev_pb_proc b on a.c_proc_cd = substr (b.c_proc_cd,1,6)

This query is (afaik) illegal in plain SQL, and postgres assumes you want a
join with claimsum2001 as claimsum2001.  See the notice:

> psql:./jans_stuff.sql:14: NOTICE:  Adding missing FROM-clause entry for
> table "claimsum2001"

It's basically converted it into a three way join.

> Can someone explain why there's a difference between using
> claimsum2001.(whatever) and a.(whatever)?

Once you alias claimsum2001 as a, there isn't a table claimsum2001 in the
query, but you're asking for columns from that non-existent table in the
select clause.  PostgreSQL tries to be helpful by adding a FROM entry for
the table, whereas technically it'd be illegal.
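To make that concrete, here is a sketch (not PostgreSQL output, just an
illustration of the implicit rewrite) of what the original statement is
effectively treated as once the missing FROM-clause entry is added:

[snip]
create table dev_lbpclaimsum as
SELECT
  claimsum2001.c_contract_num,   -- these refer to the implicitly added
  claimsum2001.c_mbr_num,        -- claimsum2001 entry, not to the alias a
  -- (remaining claimsum2001.* columns omitted)
FROM
  claimsum2001 a
  join dev_pb_proc b on a.c_proc_cd = substr (b.c_proc_cd,1,6),
  claimsum2001    -- the entry the parser adds; no condition references it,
                  -- so every row of the a/b join pairs with every row here
[/snip]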
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
> It's basically converted it into a three way join.
More to the point, an *unconstrained* join between "claimsum2001
as claimsum2001" and the intended joined table.  The reason it
ran for so long was it was generating a huge number of rows ---
as many as would be in the cross product.
            regards, tom lane
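As a rough sanity check on that, the top-level row estimate in the
[original way] plan is essentially the product of the two inputs being
paired up without any join condition:

[snip]
  15,892,944 rows (a join b)  x  13,509,002 rows (seq scan on claimsum2001)
    ~= 2.15 * 10^14  ~= the planner's 214,697,805,924,711-row estimate
[/snip]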
			
> More to the point, an *unconstrained* join between "claimsum2001
> as claimsum2001" and the intended joined table.  The reason it
> ran for so long was it was generating a huge number of rows ---
> as many as would be in the cross product.

Would it be possible to have some sort of configurable limit on this:

  max_result_set_size = 10,000,000
  max_result_set_action = ignore | debug | warn | error

The planner knows how many rows it's expecting to return, and a warning
message would flag unconstrained queries like this.

- Richard Huxton
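For illustration only, the proposed settings might look something like this
in postgresql.conf (purely hypothetical; neither parameter exists in any
PostgreSQL release):

[snip]
# act when a statement's planner-estimated result exceeds the limit
max_result_set_size = 10000000       # rows (hypothetical setting)
max_result_set_action = warn         # ignore | debug | warn | error
[/snip]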
Richard Huxton <dev@archonet.com> writes:
> Would it be possible to have some sort of configurable limit on this:
> max_result_set_size = 10,000,000
> max_result_set_action = ignore | debug | warn | error
> The planner knows how many rows it's expecting to return,
Personally, I would never trust the planner's estimate far enough
to use it as the basis for triggering an error ;-)
            regards, tom lane