Обсуждение: Alternative to a temporary table
I have a PHP web-based application where a temporary list of servers and their characteristics (each represented by a unique numeric server_id) is extracted from a master server list based on a number of dynamic and user-selected criteria (can the user view the server, is it on-line, is it a member of a specific group, etc). Once the user selects the set of criteria (and servers), it rarely change during the rest of the browser session. The resulting temporary list of servers is then joined against other tables with different sets of information about each of the servers, based on the server_id. I currently create a temporary table to hold the selected server_id's and characteristics. I then join this temp table with other data tables to produce my reports. My reason for using the temporary table method is that the SQL for the initial server selection is dynamically created based on the user's selections, and is complex enough that it does not lend itself to being easily incorporated into any of the other data extraction queries (which may also contain dynamic filtering). Unfortunately, the PHP connection to the database does not survive from webscreen to webscreen, so I have to re-generate the temporary server_id table each time it is needed for a report screen. An idea I had to make this process more efficient was instead of re-creating the temporary table over and over each time it is needed, do a one-time extraction of the list of user-selected server_id's, store the list in a PHP global variable, and then use the list in a dynamically-created WHERE clause in the rest of the queries. The resulting query would look something like SELECT * FROM some_data_table WHERE server_id IN (sid1,sid5,sid6,sid17,sid24...) Simple enough, however in rare cases the list of server_id's can range between 6,000 and 10,000. My question to the group is, can having so many values in a WHERE/IN clause effect query performance? Am I being naive about this and is there a different, better way? The server_id field is of course indexed, but it is possible that the list of selected sid's can contain almost all of the values in the some_data_table server_id index (in the situation where _all_ of the records are requested I wouldn't use the WHERE clause in the query). The some_data_table can contain millions of records for thousands of servers, so every bit of efficiency helps. If this is not the proper group for this kind of question, please point me in the right direction. Thanks! --- Steve ___________________________________________________________________________________ Steven Rosenstein IT Architect/Developer | IBM Virtual Server Administration Voice/FAX: 845-689-2064 | Cell: 646-345-6978 | Tieline: 930-6001 Text Messaging: 6463456978 @ mobile.mycingular.com Email: srosenst @ us.ibm.com "Learn from the mistakes of others because you can't live long enough to make them all yourself." -- Eleanor Roosevelt
On Mon, Oct 03, 2005 at 11:47:52AM -0400, Steven Rosenstein wrote: > I currently create a temporary table to hold the selected server_id's and > characteristics. I then join this temp table with other data tables to > produce my reports. My reason for using the temporary table method is that > the SQL for the initial server selection is dynamically created based on > the user's selections, and is complex enough that it does not lend itself > to being easily incorporated into any of the other data extraction queries > (which may also contain dynamic filtering). > > Unfortunately, the PHP connection to the database does not survive from > webscreen to webscreen, so I have to re-generate the temporary server_id > table each time it is needed for a report screen. An idea I had to make > this process more efficient was instead of re-creating the temporary table > over and over each time it is needed, do a one-time extraction of the list > of user-selected server_id's, store the list in a PHP global variable, and > then use the list in a dynamically-created WHERE clause in the rest of the > queries. The resulting query would look something like > > SELECT * > FROM some_data_table > WHERE server_id IN (sid1,sid5,sid6,sid17,sid24...) > > Simple enough, however in rare cases the list of server_id's can range > between 6,000 and 10,000. > > My question to the group is, can having so many values in a WHERE/IN clause > effect query performance? Probably, yes. As always, benchmark a test case, but last time I checked (in 7.4) you'd be better creating a new temporary table for every query than use an IN list that long. It's a lot better in 8.0, I believe, so you should benchmark it there. > Am I being naive about this and is there a > different, better way? The server_id field is of course indexed, but it is > possible that the list of selected sid's can contain almost all of the > values in the some_data_table server_id index (in the situation where _all_ > of the records are requested I wouldn't use the WHERE clause in the query). > The some_data_table can contain millions of records for thousands of > servers, so every bit of efficiency helps. Don't use a temporary table. Instead use a permanent table that contains the server ids you need and the PHP session token. Then you can create your list of server_ids once and insert it into that table associated with your sessionid. Then future queries can be simple joins against that table. SELECT some_data_table.* FROM some_data_table, session_table WHERE some_data_table.server_id = session_table.server_id AND session_table.session_id = 'foobar' You'd need a reaper process to delete old data from that table to prevent it from growing without limit, and probably a table associating session start time with sessionid to make reaping easier. Cheers, Steve