Discussion: optimizing large query with IN (...)
Guys,
I got a Java program to tune. It connects via JDBC to a PostgreSQL 7.4.1
server running on Linux.
The program needs to update a counter on a somewhat large number of
rows, about 1200 in a ~130k-row table. The query is something like
the following:
UPDATE table SET par = par + 1
WHERE table.key IN ('value1', 'value2', ... , 'value1200' )
This query runs in a transaction: autocommit is disabled with a call to
setAutoCommit(false), and commit() is issued right after the query
is sent to the backend.
The process of committing and updating the values is painfully slow
(no surprises here). Any ideas?
Thanks.
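A sketch of what this pattern actually sends to the backend (table and
column names are the placeholders used above):

```sql
BEGIN;                                 -- effect of setAutoCommit(false)
UPDATE table SET par = par + 1
WHERE key IN ('value1', 'value2');     -- ~1200 literals in the real query
COMMIT;                                -- commit()
```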
> UPDATE table SET par = par + 1
> WHERE table.key IN ('value1', 'value2', ... , 'value1200' )
How fast is the query alone, i.e.
SELECT * FROM table
WHERE table.key IN ('value1', 'value2', ... , 'value1200' )
>>UPDATE table SET par = par + 1
>>WHERE table.key IN ('value1', 'value2', ... , 'value1200' )
>
>
> How fast is the query alone, i.e.
>
> SELECT * FROM table
> WHERE table.key IN ('value1', 'value2', ... , 'value1200' )
Also, post the output of '\d table' and EXPLAIN ANALYZE UPDATE...
Chris
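For anyone following along, the diagnostics being asked for would look
something like this in a psql session (names are placeholders from the
original post):

```sql
-- show the table definition, including any index on "key":
\d table

-- then time the statement itself:
EXPLAIN ANALYZE
UPDATE table SET par = par + 1
WHERE key IN ('value1', 'value2');
```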
On Wed, Mar 10, 2004 at 12:35:15AM -0300, Marcus Andree S. Magalhaes wrote:
> Guys,
>
> I got a Java program to tune. It connects via JDBC to a PostgreSQL 7.4.1
> server running on Linux.
>
> The program needs to update a counter on a somewhat large number of
> rows, about 1200 in a ~130k-row table. The query is something like
> the following:
>
> UPDATE table SET par = par + 1
> WHERE table.key IN ('value1', 'value2', ... , 'value1200' )
>
> This query runs in a transaction: autocommit is disabled with a call to
> setAutoCommit(false), and commit() is issued right after the query
> is sent to the backend.
>
> The process of committing and updating the values is painfully slow
> (no surprises here). Any ideas?
I posted an analysis of use of IN () like this a few weeks ago on
pgsql-general.
The approach you're using is optimal for < 3 values.
For any more than that, insert value1 ... value1200 into a temporary
table, then do
UPDATE table SET par = par + 1
WHERE table.key IN (SELECT value from temp_table);
Indexing the temporary table marginally increases the speed, but not
significantly.
Cheers,
Steve
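Putting that suggestion together, the rewrite might look like the sketch
below (temp_table and its column name "value" are assumed names; note
that 7.4 has no multi-row VALUES, so the rows go in one INSERT at a
time, or via COPY):

```sql
BEGIN;
CREATE TEMP TABLE temp_table (value text);
INSERT INTO temp_table VALUES ('value1');
INSERT INTO temp_table VALUES ('value2');
-- ... one INSERT per key, ~1200 in all, or a single COPY ...
UPDATE table SET par = par + 1
WHERE key IN (SELECT value FROM temp_table);
COMMIT;
```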
Hmm... from the 'performance' point of view, since the data comes from
a quite complex select statement, isn't it better/quicker to have this
select replaced by a select into and creating a temporary database?

> The problem, as I understand it, is that 7.4 introduced massive
> improvements in handling moderately large in() clauses, as long as they
> can fit in sort_mem, and are provided by a subselect.
>
> So, creating a temp table with all the values in it and using in() on
> the temp table may be a win:
>
> begin;
> create temp table t_ids(id int);
> insert into t_ids(id) values (123); <- repeat a few hundred times
> select * from maintable where id in (select id from t_ids);
> ...
I'm not sure exactly what you're saying here. If the data in the in()
clause comes from a complex select, then just use that select in there
and bypass the temporary table idea.

I'm not sure what a temporary database is - did you mean a temporary
table? If so, then my above comment addresses that point.

On Wed, 10 Mar 2004, Marcus Andree S. Magalhaes wrote:

> Hmm... from the 'performance' point of view, since the data comes from
> a quite complex select statement, isn't it better/quicker to have this
> select replaced by a select into and creating a temporary database?
>
> > The problem, as I understand it, is that 7.4 introduced massive
> > improvements in handling moderately large in() clauses, as long as they
> > can fit in sort_mem, and are provided by a subselect.
> >
> > So, creating a temp table with all the values in it and using in() on
> > the temp table may be a win:
> >
> > begin;
> > create temp table t_ids(id int);
> > insert into t_ids(id) values (123); <- repeat a few hundred times
> > select * from maintable where id in (select id from t_ids);
> > ...
On Wed, Mar 10, 2004 at 02:02:23PM -0300, Marcus Andree S. Magalhaes wrote:

> Hmm... from the 'performance' point of view, since the data comes from
> a quite complex select statement, isn't it better/quicker to have this
> select replaced by a select into and creating a temporary database?

Definitely - why loop the data into the application and back out again
if you don't need to?

> > The problem, as I understand it, is that 7.4 introduced massive
> > improvements in handling moderately large in() clauses, as long as they
> > can fit in sort_mem, and are provided by a subselect.
> >
> > So, creating a temp table with all the values in it and using in() on
> > the temp table may be a win:
> >
> > begin;
> > create temp table t_ids(id int);
> > insert into t_ids(id) values (123); <- repeat a few hundred times
> > select * from maintable where id in (select id from t_ids);
> > ...

Cheers,
Steve
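The server-side variant under discussion, where the keys never leave the
database, might be sketched as follows (complex_source stands in for the
"quite complex select"; maintable and t_ids are taken from the example
in this thread):

```sql
BEGIN;
-- SELECT INTO a temp table straight from the complex query,
-- instead of fetching the ids into the Java program first:
SELECT id INTO TEMP t_ids FROM complex_source;
UPDATE maintable SET par = par + 1
WHERE id IN (SELECT id FROM t_ids);
COMMIT;
```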
On Wed, 10 Mar 2004, Marcus Andree S. Magalhaes wrote:
>
> Guys,
>
> I got a Java program to tune. It connects via JDBC to a PostgreSQL 7.4.1
> server running on Linux.
>
> The program needs to update a counter on a somewhat large number of
> rows, about 1200 in a ~130k-row table. The query is something like
> the following:
>
> UPDATE table SET par = par + 1
> WHERE table.key IN ('value1', 'value2', ... , 'value1200' )
>
> This query runs in a transaction: autocommit is disabled with a call to
> setAutoCommit(false), and commit() is issued right after the query
> is sent to the backend.
>
> The process of committing and updating the values is painfully slow
> (no surprises here). Any ideas?
The problem, as I understand it, is that 7.4 introduced massive
improvements in handling moderately large in() clauses, as long as they
can fit in sort_mem, and are provided by a subselect.
So, creating a temp table with all the values in it and using in() on the
temp table may be a win:
begin;
create temp table t_ids(id int);
insert into t_ids(id) values (123); <- repeat a few hundred times
select * from maintable where id in (select id from t_ids);
...
Marcus,

> The problem, as I understand it, is that 7.4 introduced massive
> improvements in handling moderately large in() clauses, as long as they
> can fit in sort_mem, and are provided by a subselect.

Also, this problem may be fixed in 7.5, when it comes out. It's a known
issue.

--
Josh Berkus
Aglio Database Solutions
San Francisco
Marcus Andree S. Magalhaes wrote:
> Guys,
>
> I got a Java program to tune. It connects via JDBC to a PostgreSQL 7.4.1
> server running on Linux.
>
> The program needs to update a counter on a somewhat large number of
> rows, about 1200 in a ~130k-row table. The query is something like
> the following:
>
> UPDATE table SET par = par + 1
> WHERE table.key IN ('value1', 'value2', ... , 'value1200' )
>
How often do you update this counter? Under MVCC, each update writes a
new row version and invalidates the old one, and the dead versions then
stick around until the next vacuum.
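The maintenance this implies can be sketched as below; on 7.4 vacuuming
is not automatic unless the contrib pg_autovacuum daemon is running, so
something has to issue it periodically:

```sql
-- reclaim the dead row versions left behind by the counter updates
-- and refresh the planner's statistics:
VACUUM ANALYZE table;
```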