Обсуждение: where (x,y,z) in ((x1,y1, z1), (x1,y1, z1), (x1,y1, z1), (x2,y2, z2)) (not) optimized
where (x,y,z) in ((x1,y1, z1), (x1,y1, z1), (x1,y1, z1), (x2,y2, z2)) (not) optimized
От
Grzegorz Jaśkiewicz
Дата:
Hey folks, I have question really for all mighty developers, but don't want to spam -hackers with it. why : select * from foo where X in (1,1,1,1,1,1,1,1) --- same values in search. or select * from foo where (x,y) in ((1,2),(1,2),(1,2),(1,2),(1,2),(1,2),(1,2)); never gets optimized by planner, etc ? Is it just not worth optimizing from pg side? I am sure, it would make sense to actually reorder these values, so that index/whatnot could pick it up faster. Just another one of those, 'why' (not) questions from my side. thanks. -- GJ
Re: where (x,y,z) in ((x1,y1, z1), (x1,y1, z1), (x1,y1, z1), (x2,y2, z2)) (not) optimized
От
Filip Rembiałkowski
Дата:
2009/1/26 Grzegorz Jaśkiewicz <gryzman@gmail.com>
I would guess that optimizing silly-written queries was always a low-priority task...
IMHO this is good topic for -hackers list.. and probably not so hard to implement :)
BTW, test on CVS HEAD:
CREATE TABLE atest(id integer primary key);
insert into atest select x from generate_series(1,100000) x(x);
ANALYZE atest;
EXPLAIN ANALYZE SELECT * FROM atest where id in (1,2,3,3,3,3,3,3,3,5,5,5,5,5,5,5,5,5,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1);
EXPLAIN ANALYZE SELECT * FROM atest where id in (1,2,3,5);
shows that second query is 2.5 times faster than the first ( 0.170 ms / 0.070 ms).
Hey folks,
I have question really for all mighty developers, but don't want to
spam -hackers with it.
why :
select * from foo where X in (1,1,1,1,1,1,1,1) --- same values in search.
or select * from foo where (x,y) in ((1,2),(1,2),(1,2),(1,2),(1,2),(1,2),(1,2));
never gets optimized by planner, etc ?
I would guess that optimizing silly-written queries was always a low-priority task...
IMHO this is good topic for -hackers list.. and probably not so hard to implement :)
BTW, test on CVS HEAD:
CREATE TABLE atest(id integer primary key);
insert into atest select x from generate_series(1,100000) x(x);
ANALYZE atest;
EXPLAIN ANALYZE SELECT * FROM atest where id in (1,2,3,3,3,3,3,3,3,5,5,5,5,5,5,5,5,5,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1);
EXPLAIN ANALYZE SELECT * FROM atest where id in (1,2,3,5);
shows that second query is 2.5 times faster than the first ( 0.170 ms / 0.070 ms).
Is it just not worth optimizing from pg side? I am sure, it would make
sense to actually reorder these values, so that index/whatnot could
pick it up faster.
Just another one of those, 'why' (not) questions from my side.
thanks.
--
GJ
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
--
Filip Rembiałkowski
Re: where (x,y,z) in ((x1,y1, z1), (x1,y1, z1), (x1,y1, z1), (x2,y2, z2)) (not) optimized
От
Grzegorz Jaśkiewicz
Дата:
On Mon, Jan 26, 2009 at 1:58 PM, Filip Rembiałkowski <plk.zuber@gmail.com> wrote: > 2009/1/26 Grzegorz Jaśkiewicz <gryzman@gmail.com> >> >> Hey folks, >> >> I have question really for all mighty developers, but don't want to >> spam -hackers with it. >> >> why : >> select * from foo where X in (1,1,1,1,1,1,1,1) --- same values in search. >> or select * from foo where (x,y) in >> ((1,2),(1,2),(1,2),(1,2),(1,2),(1,2),(1,2)); >> >> never gets optimized by planner, etc ? > > I would guess that optimizing silly-written queries was always a > low-priority task... > EXPLAIN ANALYZE SELECT * FROM atest where id in > (1,2,3,3,3,3,3,3,3,5,5,5,5,5,5,5,5,5,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1); > EXPLAIN ANALYZE SELECT * FROM atest where id in (1,2,3,5); > > shows that second query is 2.5 times faster than the first ( 0.170 ms / > 0.070 ms). the difference isn't so small than :) silly or not, sometimes you end up with such collection passed on in some silly languages. -- GJ
=?UTF-8?Q?Grzegorz_Ja=C5=9Bkiewicz?= <gryzman@gmail.com> writes: > why : > select * from foo where X in (1,1,1,1,1,1,1,1) --- same values in search. > never gets optimized by planner, etc ? We actually did that, before 8.0 or thereabouts. It was removed because it cost a lot of planner cycles for *every* query, and helped for only a small number of stupidly-written ones. In general, you can probably remove such duplicates on the client side more efficiently than the database could anyway (realizing that whatever it might do has to be datatype-independent). regards, tom lane