Обсуждение: SQL problem: bank account
It seems to me this is a simple problem, but the solution eludes me. I have a table: bank_account ( transaction_id int not null serial, customer_id int not null references customer(id), ts timestamp not nulldefault now(), amount float not null, balance float not null, primary key(transaction_id) ) I need to get the most recent transaction for each customer. I need only the transaction ID, but the entire row would be best. I have two solutions, both of which are too slow for use in my interactive web-based interface: Solution1: Outer left self join: SELECT ba1.* FROM bank_account ba1 LEFT OUTER JOIN bank_account ba2 ON ba1.customer_id = ba2.customer_id AND ba1.ts < ba2.ts WHERE ba2.ts IS NULL; This query works great on tables of less than a few thousand rows. For my 300k row table, it takes several hours. Solution2: max-concat trick SELECT split_part(max( extract(EPOCH from ts)::VARCHAR || ' ' || transaction_id::VARCHAR), ' ', 2)::INT FROM bank_account GROUP BY customer_id This is an ugly and obviously inefficient solution, but it does the job in about 1/2 hour. Still too long though. I've been working on this problem for days, and consulting friends. No elegant, fast solution is presenting itself. As I said, I feel I'm not seeing the obvious solution in front of my face. In the mean-time I can use this query to do the job on a per-customer basis: select * from bank_account where id = <the customer's ID> and ts = (select max(ts) from bank_account ba2 where ba2.customer_id = bank_account.customer_id); However, doing this for all 40,000 customers is not workable as a manual process. My last resort is to do it this way to pre-generate the report, but I'd far rather do it real-time. Help! My brain hurts! -- Erik G. Burrows - KG6HEA www.erikburrows.com PGP Key: http://www.erikburrows.com/egb@erikburrows.com.pgpkey
> I need to get the most recent transaction for each customer. I need only > the transaction ID, but the entire row would be best. > Why don't you alter the customer table to hold the transaction ID of the most recent transaction? Some questions though: Do you vacuum the database regularly? Do you have any indexes created? Some things to think about. Andy
> It seems to me this is a simple problem, but the solution eludes me. > > I have a table: > > bank_account ( > transaction_id int not null serial, > customer_id int not null references customer(id), > ts timestamp not null default now(), > amount float not null, > balance float not null, > primary key(transaction_id) > ) > > I need to get the most recent transaction for each customer. I need only > the transaction ID, but the entire row would be best. For the sake of being explicit, change your table definition (though what you have above is a-okay and works): CREATE SEQUENCE transaction_id_seq; CREATE TABLE bank_account ( transaction_id int not null DEFAULT NEXTVAL('transaction_id_seq'::TEXT), customer_id int notnull references customer(id), ts timestamp not null default now(), amount float not null, balance float not null, primarykey(transaction_id) ); Once you insert a value into the bank_account table, SELECT CURRVAL('transaction_id_seq') will be what you're looking for. Read up on CURRVAL() at: http://www.postgresql.org/docs/view.php?version=7.3&idoc=1&file=functions-sequence.html -sc -- Sean Chittenden
How about: select max(transaction_id) from bank_account group by customer_id Should work... Detlef -----Ursprüngliche Nachricht----- Von: pgsql-sql-owner@postgresql.org [mailto:pgsql-sql-owner@postgresql.org]Im Auftrag von Erik G. Burrows Gesendet: Montag, 2. Juni 2003 07:30 An: pgsql-sql@postgresql.org Betreff: [SQL] SQL problem: bank account It seems to me this is a simple problem, but the solution eludes me. I have a table: bank_account ( transaction_id int not null serial, customer_id int not null references customer(id), ts timestamp not nulldefault now(), amount float not null, balance float not null, primary key(transaction_id) ) I need to get the most recent transaction for each customer. I need only the transaction ID, but the entire row would be best. I have two solutions, both of which are too slow for use in my interactive web-based interface: Solution1: Outer left self join: SELECT ba1.* FROM bank_account ba1 LEFT OUTER JOIN bank_account ba2 ON ba1.customer_id = ba2.customer_id AND ba1.ts < ba2.ts WHERE ba2.ts IS NULL; This query works great on tables of less than a few thousand rows. For my 300k row table, it takes several hours. Solution2: max-concat trick SELECT split_part(max( extract(EPOCH from ts)::VARCHAR || ' ' || transaction_id::VARCHAR), ' ', 2)::INT FROM bank_account GROUP BY customer_id This is an ugly and obviously inefficient solution, but it does the job in about 1/2 hour. Still too long though. I've been working on this problem for days, and consulting friends. No elegant, fast solution is presenting itself. As I said, I feel I'm not seeing the obvious solution in front of my face. In the mean-time I can use this query to do the job on a per-customer basis: select * from bank_account where id = <the customer's ID> and ts = (select max(ts) from bank_account ba2 where ba2.customer_id = bank_account.customer_id); However, doing this for all 40,000 customers is not workable as a manual process. My last resort is to do it this way to pre-generate the report, but I'd far rather do it real-time. Help! My brain hurts! -- Erik G. Burrows - KG6HEA www.erikburrows.com PGP Key: http://www.erikburrows.com/egb@erikburrows.com.pgpkey ---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly
"Erik G. Burrows" <eburrows@erikburrows.com> writes: > I need to get the most recent transaction for each customer. I need only > the transaction ID, but the entire row would be best. If you don't mind a not-standard-SQL solution, the SELECT DISTINCT ON construct is designed for this sort of thing. See the "weather report" example in the SELECT reference page. regards, tom lane
I am afraid, this looks even uglier then your second solution, but should work, and be quicker... -- You need this to avoid having to rescan the whole table for each customerid every time and resort the results create index customer_txstamp_idx on bank_account (customer_id, ts); select ba.* from bank_account ba where transaction_id = (select transaction_id from bank_account where customer_id = ba.customer_id order by customer_id desc, ts desc limit 1); Now, note that, if you have lots of different customers in that table, it will still take a while to fetch them all (although, it should still be a lot quicker then half an hour) - in that case, you may consider either getting them one-by-one (by adding ... and customer_id=? to the above query) or using cursors... I hope, it helps... Dima Erik G. Burrows wrote: >It seems to me this is a simple problem, but the solution eludes me. > >I have a table: > >bank_account ( > transaction_id int not null serial, > customer_id int not null references customer(id), > ts timestamp not null default now(), > amount float not null, > balance float not null, > primary key(transaction_id) >) > >I need to get the most recent transaction for each customer. I need only >the transaction ID, but the entire row would be best. > > > >I have two solutions, both of which are too slow for use in my >interactive web-based interface: > >Solution1: Outer left self join: >SELECT > ba1.* >FROM > bank_account ba1 > LEFT OUTER JOIN bank_account ba2 > ON ba1.customer_id = ba2.customer_id AND ba1.ts < ba2.ts >WHERE > ba2.ts IS NULL; > >This query works great on tables of less than a few thousand rows. For >my 300k row table, it takes several hours. > > >Solution2: max-concat trick >SELECT > split_part(max( > extract(EPOCH from ts)::VARCHAR || ' ' || > transaction_id::VARCHAR), ' ', 2)::INT >FROM > bank_account >GROUP BY > customer_id > >This is an ugly and obviously inefficient solution, but it does the job >in about 1/2 hour. Still too long though. > > > >I've been working on this problem for days, and consulting friends. No >elegant, fast solution is presenting itself. As I said, I feel I'm not >seeing the obvious solution in front of my face. > >In the mean-time I can use this query to do the job on a per-customer >basis: > >select > * >from > bank_account >where > id = <the customer's ID> > and ts = (select max(ts) from bank_account ba2 where ba2.customer_id = >bank_account.customer_id); > > >However, doing this for all 40,000 customers is not workable as a manual >process. My last resort is to do it this way to pre-generate the report, >but I'd far rather do it real-time. > >Help! My brain hurts! > > >
On Monday 02 June 2003 00:49, listrec wrote: > How about: > > select max(transaction_id) from bank_account group by customer_id > And if you want the rest of the data in the rows: SELECT b.* FROM bank_account b JOIN (SELECT max(transaction_id) AS transaction_idFROM bank_accountGROUP BY customer_id ) AS subselect ON subselect.transaction_id=b.transaction_id ; I use a similar query here at work to find the first time a guy has visited our site through an advertiser. -- Jonathan Gardner <jgardner@jonathangardner.net> (was jgardn@alumni.washington.edu) Live Free, Use Linux!
On Monday 02 June 2003 00:49, listrec wrote: > How about: > > select max(transaction_id) from bank_account group by customer_id > And if you want the rest of the data in the rows: SELECT b.* FROM bank_account b JOIN (SELECT max(transaction_id) AS transaction_idFROM bank_accountGROUP BY customer_id ) AS subselect ON subselect.transaction_id=b.transaction_id ; I use a similar query here at work to find the first time a guy has visited our site through an advertiser. -- Jonathan M. Gardner Smooth Corporation - Perl Programmer jgardner@smoothcorp.com - (425) 460-4780 Live Free, Use Linux!
This is the solution I was looking for! 20 seconds to compile the whole list. Thanks everyone for your help, I very much appreciate it. Even though this is kind of 'hackey' being that it's non-standard SQL, it keeps the database from having to to 40,000 selects, as would have to do with any correlated sub-select statement. Thanks! > "Erik G. Burrows" <eburrows@erikburrows.com> writes: > > I need to get the most recent transaction for each customer. I need only > > the transaction ID, but the entire row would be best. > > If you don't mind a not-standard-SQL solution, the SELECT DISTINCT ON > construct is designed for this sort of thing. See the "weather report" > example in the SELECT reference page. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faqs/FAQ.html -- Erik G. Burrows - KG6HEA www.erikburrows.com PGP Key: http://www.erikburrows.com/egb@erikburrows.com.pgpkey