Обсуждение: Re: Am I really stupid???
Original message from: Fabrice Scemama >On finit par penser que ta question comporte en elle >sa propre reponse :-) >informe-toi sur les joins en particulier et le SQL en general. >Tu critiqueras la liste et les gens ensuite. > >(translation to English-speaking mailing-list users: > I replied 'RTFM' :-) > >Fabrice Scemama > >Dragos Stoichita wrote: Let's get rid of those kind of messages. When you write a message saying you are a beginner, there's always a little bunch of people like you who particularly like to say: "go read the docs newbie" or "go learn that" etc. I am not someone who never reads the docs before asking a question. I know SQL, I know joins, and all the people who told me to rewrite the query would better have not answered my message, because I already knew that. I take your message as a personal insult, because I don't criticize people without reason, and I did criticize this list because it was justified. I need help about only once every 2 or three months, because I am able to use the docs and do my own work alone. HERE I AM NOT ASKING HELP! I asked a simple question: why the query with intersects is not only slower, but with a factor of 10^something slower than the join one. If the select returns a table of 1 column with n rows all integers, sorting them ~n*logn, unique is ~n and intersect is ~n. I have here a book on Algorithmics and have validated a course on this, I know what I talk about. Why, with performant sorting, unique and intersect algorithms, it takes 13 seconds??? I just want a real serious person that has knowledge to answer this precise question, using an algorithmic demonstration. I only talk about maths and algorithms here. Please be serious in your answer next time. Dragos Stoichita.
Dragos Stoichita wrote: > Original message from: Fabrice Scemama > >On finit par penser que ta question comporte en elle > >sa propre reponse :-) > >informe-toi sur les joins en particulier et le SQL en general. > >Tu critiqueras la liste et les gens ensuite. > > > >(translation to English-speaking mailing-list users: > > I replied 'RTFM' :-) > > > >Fabrice Scemama > > > >Dragos Stoichita wrote: > > Let's get rid of those kind of messages. I agree. > When you write a message > saying you are a beginner, there's always a little bunch of people like > you who particularly like to say: "go read the docs newbie" or > "go learn that" etc. > > I am not someone who never reads the docs before asking a question. > I know SQL, I know joins, and all the people who told me to rewrite > the query would better have not answered my message, because > I already knew that. Unfortunantely email often does not include a knowlege meter that tells others what is know or not. Some people are better at making guesses that others, but it's not always easy. > I take your message as a personal insult, because I don't criticize > people without reason, and I did criticize this list because it was > justified. I'm not convinced that your criticism of the list was justified (you are of course welcome to draw your own conclusions). > I need help about only once every 2 or three months, because I am > able to use the docs and do my own work alone. > > HERE I AM NOT ASKING HELP! > > I asked a simple question: why the query with intersects is not only > slower, but with a factor of 10^something slower than the join one. > If the select returns a table of 1 column with n rows all integers, > sorting them ~n*logn, unique is ~n and intersect is ~n. I have here > a book on Algorithmics and have validated a course on this, I know > what I talk about. Why, with performant sorting, unique and intersect > algorithms, it takes 13 seconds??? > > I just want a real serious person that has knowledge to answer this > precise question, using an algorithmic demonstration. I only talk about > maths and algorithms here. If you wish to talk about the actual implemented algorithms, you might have done better posting to the hackers mailing list. But remember, unless you pay for support, you have no intrinsic right to expect that others will take the problem as seriously as you do. Particularly if you cannot be VERY clear about what the problem is and what questions you want answerred. That being said, the developers have always treated me well. As for your specific question, I'd like to ask if you've run the vaious queries and subqueries using 'explain' -- it is sometimes the case that indexes do not behave as you would expect or hope on complex queries. I think that historical experience with this fact is what caused some well meaning responders to suggest rewrite alternatives for you. I do not think it was out of ill will or a desire to insult your intelligence. For example, if you look through the archive you will see quite a few discussions about slow returns from "SELECT * FROM foo WHERE x in (...)" -- this is a fact about the current implementation of PostgreSQL and the (only?) solution is to rewrite using "WHERE EXISTS ...." Though I forget the exact reason, it turns out that it is not a trivial matter to have the optimizer rewrite the first query into the second (and of course the first is slow because it cannot use inidces). Your question could be a similar case, or it could be a repairable shortcoming in the planner/optimizer. In either case, explain will help and the developers/hackers may be able to give specific insight, as may browsing the code. > Please be serious in your answer next time. I am being as serious as I can. Please do not take offense at anything I have said -- offense or insult is explicitly not my intent. My intent is to help. And maybe to gently suggest that in most cases where insult is percieved on these lists, it was not intended, and that it always takes two parties to escalate the sense of insult. Karl DeBisschop
Karl DeBisschop <kdebisschop@h00a0cc3b7988.ne.mediaone.net> writes:
> For example, if you look through the archive you will see quite a few
> discussions about slow returns from "SELECT * FROM foo WHERE x in (...)"
> -- this is a fact about the current implementation of PostgreSQL and the
> (only?) solution is to rewrite using "WHERE EXISTS ...."  Though I forget
> the exact reason, it turns out that it is not a trivial matter to have the
> optimizer rewrite the first query into the second (and of course the first
> is slow because it cannot use inidces).  Your question could be a similar
> case, or it could be a repairable shortcoming in the planner/optimizer.
In fact, IN (subselect), INTERSECT, and EXCEPT are all pretty much the
same thing, and they're all pretty slow in the current code :-(, because
they all work by rescanning the inner query for each outer tuple --- in
other words, they're all implemented like plain nestloop joins.  EXISTS
is marginally better because the planner can figure out how to use an
index on the inner table, if there is one.
The right way to fix this is to promote these operations into
full-fledged join types so that the optimizer can consider alternatives
like mergejoin (which is basically the method Dragos is talking about),
hashjoin, index-driven nestloop, etc.  That's not a small task.  I'm
hoping to see it happen as part of the querytree redesign scheduled for
7.2, which will also give us outer joins.  If you think about it, all
of these are variants on the theme of outer join...
            regards, tom lane
			
		At 11:18 AM 17-05-2000 -0400, Tom Lane wrote: >In fact, IN (subselect), INTERSECT, and EXCEPT are all pretty much the >same thing, and they're all pretty slow in the current code :-(, because >they all work by rescanning the inner query for each outer tuple --- in >other words, they're all implemented like plain nestloop joins. EXISTS >is marginally better because the planner can figure out how to use an >index on the inner table, if there is one. Does that mean for two tables, one 1000 rows another 2000 rows, it's a total of 1000 X 2000 rows scanned? I suppose one way to slightly reduce the number of rows would be to select stuff into temp tables first. Cheerio, Link.
Lincoln Yeoh <lylyeoh@mecomb.com> writes:
> At 11:18 AM 17-05-2000 -0400, Tom Lane wrote:
>> In fact, IN (subselect), INTERSECT, and EXCEPT are all pretty much the
>> same thing, and they're all pretty slow in the current code :-(, because
>> they all work by rescanning the inner query for each outer tuple --- in
>> other words, they're all implemented like plain nestloop joins.  EXISTS
>> is marginally better because the planner can figure out how to use an
>> index on the inner table, if there is one.
> Does that mean for two tables, one 1000 rows another 2000 rows, it's a
> total of 1000 X 2000 rows scanned?
You got it :-(.  (Unless we find what we're looking for before scanning
all the way through --- so M*N is the worst case, but the average case
is probably less.)
> I suppose one way to slightly reduce the number of rows would be to select
> stuff into temp tables first.
7.0 is smart enough to do that if the inner query looks complicated.
If it's just a sequential scan itself, then of course there's no win...
            regards, tom lane