Обсуждение: Alternative to a temporary table

Поиск
Список
Период
Сортировка

Alternative to a temporary table

От
Steven Rosenstein
Дата:



I have a PHP web-based application where a temporary list of servers and
their characteristics (each represented by a unique numeric server_id) is
extracted from a master server list based on a number of dynamic and
user-selected criteria (can the user view the server, is it on-line, is it
a member of a specific group, etc).  Once the user selects the set of
criteria (and servers), it rarely change during the rest of the browser
session.  The resulting temporary list of servers is then joined against
other tables with different sets of information about each of the servers,
based on the server_id.

I currently create a temporary table to hold the selected server_id's and
characteristics.  I then join this temp table with other data tables to
produce my reports.  My reason for using the temporary table method is that
the SQL for the initial server selection is dynamically created based on
the user's selections, and is complex enough that it does not lend itself
to being easily incorporated into any of the other data extraction queries
(which may also contain dynamic filtering).

Unfortunately, the PHP connection to the database does not survive from
webscreen to webscreen, so I have to re-generate the temporary server_id
table each time it is needed for a report screen.  An idea I had to make
this process more efficient was instead of re-creating the temporary table
over and over each time it is needed, do a one-time extraction of the list
of user-selected server_id's, store the list in a PHP global variable, and
then use the list in a dynamically-created WHERE clause in the rest of the
queries.  The resulting query would look something like

     SELECT *
     FROM some_data_table
     WHERE server_id IN (sid1,sid5,sid6,sid17,sid24...)

Simple enough, however in rare cases the list of server_id's can range
between 6,000 and 10,000.

My question to the group is, can having so many values in a WHERE/IN clause
effect query performance?  Am I being naive about this and is there a
different, better way?  The server_id field is of course indexed, but it is
possible that the list of selected sid's can contain almost all of the
values in the some_data_table server_id index (in the situation where _all_
of the records are requested I wouldn't use the WHERE clause in the query).
The some_data_table can contain millions of records for thousands of
servers, so every bit of efficiency helps.

If this is not the proper group for this kind of question, please point me
in the right direction.

Thanks!
--- Steve
___________________________________________________________________________________

Steven Rosenstein
IT Architect/Developer | IBM Virtual Server Administration
Voice/FAX: 845-689-2064 | Cell: 646-345-6978 | Tieline: 930-6001
Text Messaging: 6463456978 @ mobile.mycingular.com
Email: srosenst @ us.ibm.com

"Learn from the mistakes of others because you can't live long enough to
make them all yourself." -- Eleanor Roosevelt


Re: Alternative to a temporary table

От
Steve Atkins
Дата:
On Mon, Oct 03, 2005 at 11:47:52AM -0400, Steven Rosenstein wrote:

> I currently create a temporary table to hold the selected server_id's and
> characteristics.  I then join this temp table with other data tables to
> produce my reports.  My reason for using the temporary table method is that
> the SQL for the initial server selection is dynamically created based on
> the user's selections, and is complex enough that it does not lend itself
> to being easily incorporated into any of the other data extraction queries
> (which may also contain dynamic filtering).
>
> Unfortunately, the PHP connection to the database does not survive from
> webscreen to webscreen, so I have to re-generate the temporary server_id
> table each time it is needed for a report screen.  An idea I had to make
> this process more efficient was instead of re-creating the temporary table
> over and over each time it is needed, do a one-time extraction of the list
> of user-selected server_id's, store the list in a PHP global variable, and
> then use the list in a dynamically-created WHERE clause in the rest of the
> queries.  The resulting query would look something like
>
>      SELECT *
>      FROM some_data_table
>      WHERE server_id IN (sid1,sid5,sid6,sid17,sid24...)
>
> Simple enough, however in rare cases the list of server_id's can range
> between 6,000 and 10,000.
>
> My question to the group is, can having so many values in a WHERE/IN clause
> effect query performance?

Probably, yes. As always, benchmark a test case, but last time I
checked (in 7.4) you'd be better creating a new temporary table for
every query than use an IN list that long. It's a lot better in 8.0, I
believe, so you should benchmark it there.

> Am I being naive about this and is there a
> different, better way?  The server_id field is of course indexed, but it is
> possible that the list of selected sid's can contain almost all of the
> values in the some_data_table server_id index (in the situation where _all_
> of the records are requested I wouldn't use the WHERE clause in the query).
> The some_data_table can contain millions of records for thousands of
> servers, so every bit of efficiency helps.

Don't use a temporary table. Instead use a permanent table that
contains the server ids you need and the PHP session token. Then you
can create your list of server_ids once and insert it into that table
associated with your sessionid. Then future queries can be simple
joins against that table.

    SELECT some_data_table.*
    FROM some_data_table, session_table
    WHERE some_data_table.server_id = session_table.server_id
      AND session_table.session_id = 'foobar'

You'd need a reaper process to delete old data from that table to
prevent it from growing without limit, and probably a table associating
session start time with sessionid to make reaping easier.

Cheers,
  Steve