Обсуждение: Simulating an outer join
I have been thinking about how to simulate an outer join. It seems the best way is to do: SELECT tab1.col1, tab2.col3 FROM tab1, tab2 WHERE tab1.col1 = tab2.col2 UNION ALL SELECT tab1.col1, NULL FROM tab1 WHERE tab1.col1 NOT IN (SELECT tab2.col2 FROM tab2) Comments? I know someone was asking about this recently. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian wrote: > > I have been thinking about how to simulate an outer join. It seems the > best way is to do: > > SELECT tab1.col1, tab2.col3 > FROM tab1, tab2 > WHERE tab1.col1 = tab2.col2 > UNION ALL > SELECT tab1.col1, NULL > FROM tab1 > WHERE tab1.col1 NOT IN (SELECT tab2.col2 FROM tab2) > > Comments? I know someone was asking about this recently. > I wouldn't use IN ;-) SELECT table1.key, table2.value FROM table1, table2 WHERE table1.key = table2.key UNION ALL SELECT table1.key, NULL FROM table1 WHERE NOT EXISTS (SELECT table2.key FROM table2 WHERE table1.key = table2.key); Mike Mascari
Can somebody comment on using EXISTS vs. IN in a subselect? I have some statements with subselects, and I'd like to understand the ramifications of choosing EXISTS or IN. Sarah Officer officers@aries.tucson.saic.com Mike Mascari wrote: > > Bruce Momjian wrote: > > > > I have been thinking about how to simulate an outer join. It seems the > > best way is to do: > > > > SELECT tab1.col1, tab2.col3 > > FROM tab1, tab2 > > WHERE tab1.col1 = tab2.col2 > > UNION ALL > > SELECT tab1.col1, NULL > > FROM tab1 > > WHERE tab1.col1 NOT IN (SELECT tab2.col2 FROM tab2) > > > > Comments? I know someone was asking about this recently. > > > > I wouldn't use IN ;-) > > SELECT table1.key, table2.value > FROM table1, table2 > WHERE table1.key = table2.key > UNION ALL > SELECT table1.key, NULL > FROM table1 WHERE NOT EXISTS > (SELECT table2.key FROM table2 WHERE table1.key = table2.key); > > Mike Mascari > > ************
> Bruce Momjian wrote: > > > > I have been thinking about how to simulate an outer join. It seems the > > best way is to do: > > > > SELECT tab1.col1, tab2.col3 > > FROM tab1, tab2 > > WHERE tab1.col1 = tab2.col2 > > UNION ALL > > SELECT tab1.col1, NULL > > FROM tab1 > > WHERE tab1.col1 NOT IN (SELECT tab2.col2 FROM tab2) > > > > Comments? I know someone was asking about this recently. > > > > I wouldn't use IN ;-) > > SELECT table1.key, table2.value > FROM table1, table2 > WHERE table1.key = table2.key > UNION ALL > SELECT table1.key, NULL > FROM table1 WHERE NOT EXISTS > (SELECT table2.key FROM table2 WHERE table1.key = table2.key); Yes, this is our brain-damaged parser/optmizer that likes the usually slower EXISTS with correlated subquery to the much clearer NOT IN. Bummer. I want to avoid having to put this workaround into my book, but I may have no choice. The work around is so non-obvious as to be a terrible hinderance for normal users. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
> Can somebody comment on using EXISTS vs. IN in a subselect? I have > some statements with subselects, and I'd like to understand the > ramifications of choosing EXISTS or IN. We have some brain-damaged code that is faster with EXISTS than IN. With IN, the subquery is evaluated and the result put in a temp relation. Every test for IN sequentially scans the subquery result looking for a match. EXISTS uses an index on the subquery result. Not sure why we can't just fix this, but I don't understand enough to know the reason. People who do understand say there is no good solution until we redesign the query tree in 7.1. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
> Bruce Momjian wrote: > > > > I have been thinking about how to simulate an outer join. It seems the > > best way is to do: > > > > SELECT tab1.col1, tab2.col3 > > FROM tab1, tab2 > > WHERE tab1.col1 = tab2.col2 > > UNION ALL > > SELECT tab1.col1, NULL > > FROM tab1 > > WHERE tab1.col1 NOT IN (SELECT tab2.col2 FROM tab2) > > > > Comments? I know someone was asking about this recently. Mike Mascari wrote: > > I wouldn't use IN ;-) > > SELECT table1.key, table2.value > FROM table1, table2 > WHERE table1.key = table2.key > UNION ALL > SELECT table1.key, NULL > FROM table1 WHERE NOT EXISTS > (SELECT table2.key FROM table2 WHERE table1.key = table2.key); FWIW, that's exactly Joe Celko's SQL-89 workaround for OUTER JOINs in 'SQL for Smarties'. Well in fact he uses (SELECT * FROM table2 WHERE table1.key = table2.key) as the subquery, but I presume that's an insignificant difference. Julian Scarfe
> > UNION ALL > > SELECT table1.key, NULL > > FROM table1 WHERE NOT EXISTS > > (SELECT table2.key FROM table2 WHERE table1.key = table2.key); > > FWIW, that's exactly Joe Celko's SQL-89 workaround for OUTER JOINs in 'SQL for > Smarties'. Well in fact he uses (SELECT * FROM table2 WHERE table1.key = > table2.key) as the subquery, but I presume that's an insignificant difference. I am just writing the EXISTS section from my book. I don't think it matters what fields are returned from an EXISTS subquery. If I am wrong, someone please let me know. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian wrote: > I am just writing the EXISTS section from my book. I don't think it > matters what fields are returned from an EXISTS subquery. If I am > wrong, someone please let me know. Celko also writes (in his chapter on EXISTS in "SQL for Smarties"): "In general the SELECT * option should perform better than the actual column. It lets the query optimizer decide which column to use. If a column has an index on it, then simply seeing a pointer to the index is enough to determine that something exists." Obviously you're in a much better position than me to judge whether that's the case in pgsql! But it might be worth a test. Julian Scarfe
> Bruce Momjian wrote: > > > I am just writing the EXISTS section from my book. I don't think it > > matters what fields are returned from an EXISTS subquery. If I am > > wrong, someone please let me know. > > Celko also writes (in his chapter on EXISTS in "SQL for Smarties"): > > "In general the SELECT * option should perform better than the actual column. > It lets the query optimizer decide which column to use. If a column has an > index on it, then simply seeing a pointer to the index is enough to determine > that something exists." > > Obviously you're in a much better position than me to judge whether that's the > case in pgsql! But it might be worth a test. In psql, I think * would generate all the columns, then throw it away, while a specific column would only carry around that column in the subquery result. so a single column is better. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
I need to transfer data from pg to m$sql. m$sql has bcp (like pg's copy). bcp use delimiter. there are 4 ways 1) use pg's copy then m$sql's bcp: bcp does not recognize null. so, I have to use pg's copy with null as '' (BTW, it does not work in 6.5.1, right?) a more serious problem is that text field usually contains ugly-dirty charactors, so, what is the safe delimiter? -- the key here is that delimiter approach does not escape within the content (for speed). 2) to solve that headache, I try to use "pg_dump -D" . however, pg put a lot of escape there (e.g. \015 ). also, int8 can not be used here. -- I rechecked the design, and change it to int4 and then dump it :-) more seriously, m$sql is strangely sensitive to the word "exit" (quotation mark not included) within the sql single quotation mark! m$shit! 3) dynamically connect two db. using perl. or, if can not connected directly, use perl write a "pg_dump-D-like" program. 4) bcp can use fix-length format. However, seems pg's copy can not. of course, I can write a perl to do that. but text field can not have fixed length! Do I have to use 3) ? is there a short cut?