Обсуждение: Most efficient way to hard-sort records
Hello, I have a table with cca 100,000 records. I need to hard-sort the records by a key from second table. By hard-sorting I mean storing position of each row in the main table. Here is what my tables look like: main_table: id, name, position key_table: id, main_table_id, key, value Here is how I need to sort the records: SELECT * FROM main_table INNER JOIN key_table ON main_table.id = key_table.main_table_id WHERE key = 'param' ORDER BY value I currently collect all ids from main_table in sorted order and then update the position field for each row in the main_table one-by-one. Is there a better/faster/more efficient solution? Thank you for your suggestions. -- Miroslav Šulc
Вложения
Hi, Miroslav, Miroslav Šulc schrieb: > I have a table with cca 100,000 records. I need to hard-sort the records > by a key from second table. By hard-sorting I mean storing position of > each row in the main table. Here is what my tables look like: > > main_table: id, name, position > key_table: id, main_table_id, key, value > > Here is how I need to sort the records: > SELECT * FROM main_table > INNER JOIN key_table ON main_table.id = key_table.main_table_id > WHERE key = 'param' > ORDER BY value > > I currently collect all ids from main_table in sorted order and then > update the position field for each row in the main_table one-by-one. Is > there a better/faster/more efficient solution? Create an SQL function that selects the sort value from the key table when given id as parameter, and then create a functional index on the table, and CLUSTER the table on the index. Scratch-Code (untested): CREATE FUNCTION getvalue (ID int4) RETURNS int4 AS " SELECT value FROM key_table WHERE value=$1 LIMIT 1" LANGUAGE SQL STRICT; CREATE INDEX main_table_order_idx ON main_table (getvalue(id)); CLUSTER main_table_order_idx ON main_table; HTH, Markus
> main_table: id, name, position > key_table: id, main_table_id, key, value > > Here is how I need to sort the records: > SELECT * FROM main_table > INNER JOIN key_table ON main_table.id = key_table.main_table_id > WHERE key = 'param' > ORDER BY value > > I currently collect all ids from main_table in sorted order and then > update the position field for each row in the main_table one-by-one. Is > there a better/faster/more efficient solution? A cheap solution if you don't care about the position value as long as sort order is ok. 1) # SELECT main_table.id into temp_table FROM main_table INNER JOIN key_table ON main_table.id = key_table.main_table_id ORDER BY value; 2) # update main_table set position = (select oid from temp_table where id = main_table.id ); I guess I'll get a set of consecutive oids by this. You can make the number begin at arbitrary number, by 2-a) # update main_table set position = ( (select oid::int4 from temp_table where id = main_table.id ) - (select min(oid::int4) from temp_table) + 1) ; I read that oid wraps around (after ~ billions) so you might want to check your current oid. Regards, Ben K. Developer http://benix.tamu.edu
Is it possible to do this : CREATE TABLE sorted (order_no SERIAL PRIMARY KEY, other columns...) INSERT INTO sorted (columns) SELECT * FROM main_table INNER JOIN key_table ON main_table.id = key_table.main_table_id WHERE key = 'param' ORDER BY value SELECT The SERIAL will automatically generate the order_no you want, which corresponds to the position in the sorted set. Then, to get the records in-order : SELECT * FROM sorted ORDER BY order_no As the records have been inserted in-order in the "sorted" table, this table is, in fact, clustered, so a full table scan using the index on "order_no" will be very fast.Of course this is only interesting if this data is quite static, because you'll have to re-generate the table when the data changes. There is another solution : CREATE INDEX on key_table( key, value ) Now, the index can optimize ordering by (key,value), which is equivalent to ordering by value if key = constant. A bit of query manipulation might get you what you want ; I suppose all rows in "key_table" reference a row in "main_table" ; so it is faster to sort (and limit) first on key_table, then grab the rows from main_table : SELECT k.value, m.* FROM key_table k LEFT JOIN main_table m ON m.id=k.main_table_id WHERE k.key='param' ORDER BY k.key, k.value If key_table REFERENCES main_table, LEFT JOIN is equivalent to INNER JOIN ; however if the planner is smart enough, it might notice that it can index-scan key_table in key,value order, grabbing rows from main_table in order and skip the sort entirely. On Sun, 07 May 2006 08:53:46 +0200, Ben K. <bkim@coe.tamu.edu> wrote: >> main_table: id, name, position >> key_table: id, main_table_id, key, value >> >> Here is how I need to sort the records: >> SELECT * FROM main_table >> INNER JOIN key_table ON main_table.id = key_table.main_table_id >> WHERE key = 'param' >> ORDER BY value >> >> I currently collect all ids from main_table in sorted order and then >> update the position field for each row in the main_table one-by-one. Is >> there a better/faster/more efficient solution? > > > A cheap solution if you don't care about the position value as long as > sort order is ok. > > 1) > # SELECT main_table.id into temp_table FROM main_table INNER JOIN > key_table ON main_table.id = key_table.main_table_id ORDER BY value; > > 2) > # update main_table set position = (select oid from temp_table where id > = main_table.id ); > > I guess I'll get a set of consecutive oids by this. > > You can make the number begin at arbitrary number, by > > 2-a) > # update main_table set position = ( (select oid::int4 from temp_table > where id = main_table.id ) - (select min(oid::int4) from temp_table) > + 1) ; > > I read that oid wraps around (after ~ billions) so you might want to > check your current oid. > > > > > Regards, > > Ben K. > Developer > http://benix.tamu.edu > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings
> CREATE TABLE sorted (order_no SERIAL PRIMARY KEY, other columns...)
> INSERT INTO sorted (columns) SELECT * FROM main_table INNER JOIN
> key_table ON main_table.id = key_table.main_table_id WHERE key = 'param' ORDER
> BY value SELECT
> The SERIAL will automatically generate the order_no you want, which
> corresponds to the position in the sorted set.
> Then, to get the records in-order :
> SELECT * FROM sorted ORDER BY order_no
Good ... I just got myself into the habit of not recreating a table since
I have to clean up permissions and what not. I guess it depends.
Another version along that line ?
# create sequence counterseq start 1;
-- (set/reset whenever a counter is needed)
# select main_table.*, nextval('counterseq') as position2 into sorted_main_table from main_table, keytable where
main_table.id= keytable.main_table_id order by value;
Regards,
Ben K.
Developer
http://benix.tamu.edu
> Another version along that line ?
>
> # create sequence counterseq start 1;
> -- (set/reset whenever a counter is needed)
>
> # select main_table.*, nextval('counterseq') as position2
> into sorted_main_table
> from main_table, keytable where main_table.id =
> keytable.main_table_id
> order by value;
You could also use generate_series(), but I don't know if it can generate
unbounded series...
Hi PFC,
the generate_series() seems to be the right thing I need in my scenario
but I didn't figure out how to create a SELECT stattement that will
contain sorted main_table.id in first column and generated series in
second column which I would use to update the main_table.position. Is it
possible? How can I do that? Or is there another approach to employ
generate_series()?
Miroslav Šulc
PFC napsal(a):
>
>> Another version along that line ?
>>
>> # create sequence counterseq start 1;
>> -- (set/reset whenever a counter is needed)
>>
>> # select main_table.*, nextval('counterseq') as position2
>> into sorted_main_table
>> from main_table, keytable where main_table.id =
>> keytable.main_table_id
>> order by value;
>
> You could also use generate_series(), but I don't know if it can
> generate unbounded series...
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
Вложения
On May 6, 2006, at 17:13 , Miroslav Šulc wrote: > I have a table with cca 100,000 records. I need to hard-sort the > records > by a key from second table. By hard-sorting I mean storing position of > each row in the main table. Here is what my tables look like: > > main_table: id, name, position > key_table: id, main_table_id, key, value > > Here is how I need to sort the records: > SELECT * FROM main_table > INNER JOIN key_table ON main_table.id = key_table.main_table_id > WHERE key = 'param' > ORDER BY value I don't know about faster or more efficient, but here's an alternative that I haven't seen yet: Michael Glaesemann grzm seespotcode net