Discussion: max length of a SQL SELECT statement?
Hi All!

I was searching the archive and was wondering why nobody has asked this strange(!) question (or have I just not found it?):

"What is the maximum allowed length of a SQL statement or query?"

I want to combine hundreds or thousands of 'OR's within a SELECT statement. Is that possible or not? E.g.:

SELECT * FROM table WHERE col='a' OR col='b' OR col='c' OR ...

This would be a very "dirty" or, some would say, a "horrible" solution; but how do you search through your table with SQL when your SELECT is against a collection of different elements (e.g. an array of strings, each of which should match one column, returning all records that fit)?

Hope for help.
Cheers, Markus - SQL beginner ;-)
Depending on your original problem, EXISTS or IN may be usable. EXISTS is efficient, and IN can be used efficiently in the 7.4 version of PostgreSQL.

regds
mallah.
On Mon, 2003-07-07 at 17:57, mallah@trade-india.com wrote:
> Depending on ur original problem EXISTS or IN may be usable
> EXISTS is efficient and IN can be used efficiently in 7.4 version
> of postgresql
Could that be a solution?!
The question is: how long can the IN list be?

I mean, if I write something like:

SELECT * FROM table WHERE columnX IN ('a', 'b', 'c', 'd', 'e', ... );

How long can the collection (list) within IN be? Thousands of elements, too?

And what does "efficient" mean? Does the DB go through the table only once?

Cheers, Markus
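A minimal sketch of the pattern under discussion - building a long parameterized IN list - using SQLite via Python's sqlite3 as a stand-in for PostgreSQL (the table and values are made up for illustration):

```python
import sqlite3

# Hypothetical table standing in for the one in the thread; SQLite is used
# here only because it is easy to run, and limits differ per database/driver.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (columnX TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [(f"val{i}",) for i in range(5000)])

# Build an IN list with hundreds of bound parameters. Servers and drivers
# impose their own ceilings (e.g. older SQLite builds cap bound parameters
# at 999; PostgreSQL 7.x capped expression nesting), so very long literal
# lists may still fail elsewhere.
wanted = [f"val{i}" for i in range(0, 5000, 10)]        # 500 values
placeholders = ",".join("?" * len(wanted))
rows = conn.execute(
    f"SELECT columnX FROM t WHERE columnX IN ({placeholders})", wanted
).fetchall()
```
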
> On Mon, 2003-07-07 at 17:57, mallah@trade-india.com wrote:
>> Depending on ur original problem EXISTS or IN may be usable
>> EXISTS is efficient and IN can be used efficiently in 7.4 version of
>> postgresql
>
> Could be a solution?!
> The question is - how long could the IN be?
>
> I mean, if I write something like:
> SELECT * FROM table WHERE columnX IN ('a', 'b', 'c', 'd', 'e', ... );
> How long can the collection (list) within IN be? Also thousands of
> elements?
Well, I DO NOT know the exact limit.
Maybe someone else can answer that accurately.

But you could produce the list within IN using a subselect;
that again depends on the exact problem.

regds
Mallah.
>
> And what means efficient? Goes the DB only once through the table?
>
> Cheers, Markus
> Could be a solution?!
> The question is - how long could the IN be?

I'm not sure about IN specifically, but I know you can do:

SELECT * FROM table WHERE col = '<1GB long file>';

It tends not to be friendly for RAM though :)
On Mon, 2003-07-07 at 18:13, mallah@trade-india.com wrote:
> Well i DO NOT know the exact limit.
> May be someone else can answer it accurately.
>
> But you could produce the list within IN using a subselect
> that again depends on the exact problem.

Maybe anybody knows how many?

Anyway, my exact problem is, in words, quite easy:

col1 | col2
------------
 123 |  958
 143 |  394
 124 |  345
 324 |  345
 346 |  541
 743 |  144
 346 |  986

Imagine this table is really big (millions of records). Now I want to retrieve all records where either col1 OR col2 matches one of 123, 124, 144, 541 (and a few thousand more ...).

As far as I understood you:

SELECT * FROM table WHERE col1 IN (123,124,144,541,...)
    OR col2 IN (123,124,144,541,...);

Cheers, Markus
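For what it's worth, the query sketched above behaves as intended on the sample rows; a quick check using SQLite via Python's sqlite3, standing in for PostgreSQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INTEGER, col2 INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [
    (123, 958), (143, 394), (124, 345), (324, 345),
    (346, 541), (743, 144), (346, 986),
])

ids = (123, 124, 144, 541)
marks = ",".join("?" * len(ids))
rows = conn.execute(
    f"SELECT col1, col2 FROM t WHERE col1 IN ({marks}) OR col2 IN ({marks})",
    ids + ids,
).fetchall()
# Hits: (123,958) and (124,345) via col1; (346,541) and (743,144) via col2.
```
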
On 7 Jul 2003, markus brosch wrote:

> I was searching the archive and was wondering why nobody asked this
> strange(!) question (or I've not found it?):
>
> "What is the max allowed length of a sql statement or query?"

AFAIR, in recent versions it's effectively limited only by resources (how much bandwidth/memory do you want to use).

> I want to combine hundrets or thousands 'OR' within a select statement.
> Possible or not (e.g: SELECT * FROM table WHERE col='a' OR col='b' OR
> col='c' OR ...... )

It should be possible.

> This would be a very "dirty" or some would say, a "horrible" solution;
>
> but how are you searching through your table with sql, when your SELECT
> is against a collection of different elements (e.g. Array of Strings,
> which should fit on one column and return all records which fit)

Are the 'a', 'b', etc. fixed or generated on the fly? I'm not sure that the plan for a thousand OR clauses (or, equivalently, a thousand entries in a static IN) is necessarily going to be terribly good. You might have better luck setting it up to do a join with a table (possibly a temporary table if they're generated on the fly).
> > but how are you searching through your table with sql, when your SELECT
> > is against a collection of different elements (e.g. Array of Strings,
> > which should fit on one column and return all records which fit)
>
> Are the 'a', 'b', etc... fixed or generated on the fly? I'm not sure
> that the plan for a thousand OR clauses (or equivalently a thousand
> entries in a static IN) is going to necessarily be terribly good. You
> might have better luck setting it up to do a join with a table (possibly a
> temporary table if they're generated on the fly).

Generated "on the fly" by JDBC within Java ;-)
I already tried the join, but it takes ages *g*
On Mon, 2003-07-07 at 18:20, Rod Taylor wrote:
> > Could be a solution?!
> > The question is - how long could the IN be?
>
> I'm not sure about IN specifically, but I know you can do:
> SELECT * FROM table WHERE col = '<1GB long file>';
> It tends not to be friendly for Ram though :)
Hi again!
After I decided to use this statement (discussion two days ago!):
SELECT * FROM table WHERE columnX IN ('a', 'b', 'c', 'd', 'e', ... );
I got this error :-(((
java.sql.SQLException: ERROR: Expression too complex: nesting depth
exceeds max_expr_depth = 10000
I was searching the web and only found this discussion:
http://groups.google.de/groups?hl=de&lr=&ie=UTF-8&oe=UTF-8&threadm=3B2EA1E9.843D940A%40selectacast.net&rnum=1&prev=/groups%3Fhl%3Dde%26lr%3D%26ie%3DUTF-8%26oe%3DUTF-8%26q%3DExpression%2Btoo%2Bcomplex%253A%2Bnesting%2Bdepth%2Bexceeds%2Bmax_expr_depth%2B%253D%2B10000%26btnG%3DGoogle%2BSuche
Maybe we can discuss that problem here again?! What exactly does
"max_expr_depth" mean? Thanks for any help.
Cheers Markus
> Maybe we can disuss that problem here again?! What exactly means
> "max_expr_depth"? Thanks for any help

If I'm not mistaken, max_expr_depth is used to catch runaway recursion (view a is "select * from b", view b is "select * from a").

It's a tunable in postgresql.conf. Toss a couple of 0's behind the number and uncomment the line.
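For reference, the line Rod means looks roughly like this in a 7.x-era postgresql.conf (the exact default varies by version, and the setting no longer exists in modern PostgreSQL, so treat this as historical):

```
#max_expr_depth = 10000    # as shipped (the limit in the error above)
max_expr_depth = 100000    # uncommented and raised
```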
On Wed, 2003-07-09 at 17:45, Rod Taylor wrote:
> > Nobody a better idea? Why is the join of a temporary table (the IN
> > parameters) and the original table so slow? Any tricks here?
>
> Did you index and ANALYZE the temporary table?

No!

I have to do this "strange" and "long" statement a few thousand times ... so I would always have to create the temp table again and again, and if I index the table it costs me more time.

As I am relatively new to SQL, I ask you: where's the difference between an indexed temp (!!!) table and an unindexed one? The main table is of course fully indexed ;-)

Cheers, Markus
On Wed, 2003-07-09 at 15:51, markus brosch wrote:
> On Wed, 2003-07-09 at 17:45, Rod Taylor wrote:
> > > Nobody a better idea? Why is the join of a temporary table (the IN
> > > paramters) and the original table so slow? Any tricks here?
> >
> > Did you index and ANALYZE the temporary table?
>
> No!
>
> I have to do this "strange" and "long" statement also a view thousand
> times ... so I always have to create the temp table again and again and
> if I index the table it costs me more time.

So... which costs more: building the index plus the fast query, or doing the slow query? If you have anything more than a thousand lines in the temp table, I bet indexing it would be the better way to go.

> As far I am relativly new to SQL, I ask you:
> Where's the difference between and indexed temp (!!!) table and an
> unindexed one? The main table is of course fully indexed ;-)

One has an index, which means that other (much faster) methods may be available for the database to use.
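Rod's advice can be sketched end to end; a toy version using SQLite via Python's sqlite3 (table names and sizes are invented for the sketch; in PostgreSQL the same three steps - CREATE INDEX, ANALYZE, then the join - apply):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE binaryInteraction (bait INT, prey INT, gogokey INT)")
conn.executemany("INSERT INTO binaryInteraction VALUES (?, ?, ?)",
                 [(i % 50, i % 70, i % 1000) for i in range(10000)])

# Load the wanted keys into a temporary table, index it, and refresh
# statistics so the planner can choose an index-assisted join.
conn.execute("CREATE TEMP TABLE wanted (gogokey INT)")
conn.executemany("INSERT INTO wanted VALUES (?)",
                 [(k,) for k in range(0, 1000, 7)])
conn.execute("CREATE INDEX wanted_idx ON wanted (gogokey)")
conn.execute("ANALYZE wanted")

count = conn.execute(
    "SELECT COUNT(*) FROM binaryInteraction b"
    " JOIN wanted w ON b.gogokey = w.gogokey"
).fetchone()[0]
```

Whether indexing pays off depends on the sizes involved, which is exactly Rod's point: for more than a handful of keys, the one-time index build is usually cheaper than repeated sequential scans.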
It's me again - with the easy and complex statement at the same time ;-)
Who can help me?!

So far, my solution is still unsatisfactory. I'll explain my full task this time, otherwise it is hard to understand!
My tables:
(redundant for speed reasons - that doesn't matter, the tables are only queried!)
===============================================================
table: goGraph
primarykey(parent,child)
index (B-tree) on parent column and on child column.
parent | child
--------------------------
GO:0000001 | GO:0000002
GO:0000001 | GO:0000003
GO:0000002 | GO:0000005
GO:0000002 | GO:0000006
GO:0000003 | GO:0000006
GO:0000003 | GO:0000007
... | ...
about 15000 elements more!
This represents a directed acyclic graph:

          GO:0000001
           /      \
          v        v
   GO:0000002    GO:0000003
     /      \    /      \
    v        v  v        v
GO:0000005  GO:0000006  GO:0000007
===============================================================
table: binaryInteraction
primarykey(baitProtein, preyProtein, gogoKey)
index (B-tree) on gogokey.
baitProtein | preyProtein | "baitGoId" | "preyGoId" | gogoKey
--------------------------------------------------------------
123 | 123 | GO:0000001 | GO:0000001 | 11
123 | 123 | GO:0000020 | GO:0000001 | 120
123 | 123 | GO:0000020 | GO:0000020 | 2020
123 | 345 | GO:0000001 | GO:0000001 | 11
123 | 345 | GO:0000020 | GO:0000001 | 120
... | ... | ... | ... | ...
up to several million entries!
===============================================================
So, what I have to do is:

Someone chooses 2 goIds. For these ids and all their children (information from goGraph), I want the count(*) of "fits" within the table binaryInteraction. That means baitGoId and preyGoId must each be either one of the two chosen goIds or one of the related children. I use the gogoKey for this task: I compute integers (within Java) from the goIds and concatenate them ordered as smallerGoId.concatenate(biggerGoId) - so I have unique keys for baitGo/preyGo pairs.

One more problem: the goGraph (see table) is a graph ... and for that reason one node can have multiple parents. If I choose e.g. GO:0000002 and GO:0000003 from the example above and want to compute their children, I don't want to take GO:0000006 into account (it has two parents in which I am interested). That means, whenever I ask for the children of two nodes, I want a DISTINCT SET of children.
An example of how I am processing the data at the moment:

The user chooses GoId1: GO:0000005 and GO:0000008;

Java:
- get all children of GO:0000005 via JDBC from goGraph - collection "one"
- get all children of GO:0000006 via JDBC from goGraph - collection "two"
- compute the intersection of the two collections
- subtract the intersection from each collection
- result: distinct children of the two goIds
Now the binaryInteraction table comes into play ;-)
I apply the distinct collection "one" and collection "two" to gogoKey:

-----------------------
1. SELECT count(*) FROM
     (SELECT DISTINCT bait, prey FROM binaryInteraction
      WHERE gogoKey IN (gogokey1, gogokey2, gogokey3, ...)) AS foo;

Each gogokeyX is produced by Java -> it generates the unique gogokey for each go-go combination.

Speed: medium.

Limit: only 10000 elements within the IN statement can be processed by default. In postgresql.conf I can change this limit, as Rod Taylor already posted. But the manual warns of a buffer overflow if the value is too high ...

If I reach the limit, I get something like:

java.sql.SQLException: ERROR: Expression too complex: nesting depth
exceeds max_expr_depth = 10000

BTW: Why nesting depth? Within IN I compute thousands of "OR"s, but why nested ... ???
-----------------------
2. For all gogokeys I generate a temporary table and do a join with the binaryInteraction table:

SELECT COUNT(*)
FROM (SELECT DISTINCT binaryInteraction.bait, binaryInteraction.prey
      FROM binaryInteraction, tempTable
      WHERE binaryInteraction.gogokey = tempTable.gogokey) AS temp;

Speed: very slow, but no limit!
-----------------------
After all, I want to compute and precalculate every possible goId-goId combination ;-) So each of the things I explained here I have to do millions of times! I tried to find a recursive solution - impossible!

Who can help? Maybe there is a way to compute it in one, but more complex, SQL statement. As I am still new to SQL, I don't know the "tricks"! If there were a way to calculate the intersection ... then maybe a solution is not that far off ... I was also thinking about a denormalized goGraph table: parent -> allChildren. Then I compute (within SQL?) the intersection between the 2 goIds and then apply it to binaryInteraction.

Thanks for any ideas, and thanks at least for reading this long story ;-)

Cheers Markus

PS: If you have any additional questions, please feel free to ask!
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
> ... I don't want to take GO:000006 into account (two parents in which
> I am intested in). That menas, whenever I ask for children of two nodes,
> I want a DISTINCT SET of children.
To start with, you can avoid the Java and do this in SQL:
SELECT child FROM gograph WHERE parent='GO:0000002' OR parent='GO:0000005'
EXCEPT
(SELECT child FROM gograph WHERE parent='GO:0000002'
INTERSECT
SELECT child FROM gograph WHERE parent='GO:0000005');
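As a sanity check, the same set logic can be run against the sample gograph rows from earlier in the thread, using Markus's example parents GO:0000002 and GO:0000003 (which share child GO:0000006). This sketch uses SQLite via Python's sqlite3 as a stand-in, with a FROM-subquery instead of the parenthesized compound, since SQLite doesn't accept that form:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gograph (parent TEXT, child TEXT)")
conn.executemany("INSERT INTO gograph VALUES (?, ?)", [
    ("GO:0000001", "GO:0000002"), ("GO:0000001", "GO:0000003"),
    ("GO:0000002", "GO:0000005"), ("GO:0000002", "GO:0000006"),
    ("GO:0000003", "GO:0000006"), ("GO:0000003", "GO:0000007"),
])

# All children of either parent, minus the children they share.
children = sorted(r[0] for r in conn.execute("""
    SELECT child FROM gograph
    WHERE parent = 'GO:0000002' OR parent = 'GO:0000003'
    EXCEPT
    SELECT child FROM (
        SELECT child FROM gograph WHERE parent = 'GO:0000002'
        INTERSECT
        SELECT child FROM gograph WHERE parent = 'GO:0000003'
    )
"""))
# children is now the distinct, non-shared set Markus asked for.
```
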
And yes, I would certainly start by normalizing things a little bit:

CREATE SEQUENCE goid_seq;

CREATE TABLE goid (
  idname TEXT,
  id     INTEGER NOT NULL DEFAULT nextval('goid_seq')
);

INSERT INTO goid(idname) SELECT DISTINCT parent FROM gograph;

INSERT INTO goid(idname)
  SELECT DISTINCT child FROM gograph
  WHERE NOT EXISTS (SELECT 1 FROM goid WHERE idname = child);

CREATE TABLE gomap (
  parent INTEGER,
  child  INTEGER
);

INSERT INTO gomap SELECT
  (SELECT id FROM goid WHERE idname = parent),
  (SELECT id FROM goid WHERE idname = child)
FROM gograph;
As far as the binaryInteraction table goes, a little more information is needed: how is each of these tables being populated? Why the DISTINCT? Is it because there may be duplicate rows in the table? The reason I ask is that it might be better to use triggers to compute some of the information as it comes in, depending on which tables are changed and how often.
- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200307151035
-----BEGIN PGP SIGNATURE-----
Comment: http://www.turnstep.com/pgp.html
iD8DBQE/FBGrvJuQZxSWSsgRAlEfAKCL4ttDdTRHxPRW9N00nowPh1/q/QCgqrkv
e7Ncj4al4aJ4ihktEyweJJo=
=Z/rk
-----END PGP SIGNATURE-----