Обсуждение: SQL problem: bank account

Поиск
Список
Период
Сортировка

SQL problem: bank account

От
"Erik G. Burrows"
Дата:
It seems to me this is a simple problem, but the solution eludes me.

I have a table:

bank_account ( transaction_id int not null serial, customer_id int not null references customer(id), ts timestamp not
nulldefault now(), amount float not null, balance float not null, primary key(transaction_id)
 
)

I need to get the most recent transaction for each customer. I need only
the transaction ID, but the entire row would be best.



I have two solutions, both of which are too slow for use in my
interactive web-based interface:

Solution1: Outer left self join:
SELECT ba1.*
FROM bank_account ba1 LEFT OUTER JOIN bank_account ba2   ON ba1.customer_id = ba2.customer_id AND ba1.ts < ba2.ts
WHERE ba2.ts IS NULL;

This query works great on tables of less than a few thousand rows. For
my 300k row table, it takes several hours.


Solution2: max-concat trick
SELECT split_part(max(     extract(EPOCH from ts)::VARCHAR || '    ' ||     transaction_id::VARCHAR), '    ', 2)::INT
FROM bank_account
GROUP BY customer_id

This is an ugly and obviously inefficient solution, but it does the job
in about 1/2 hour. Still too long though.



I've been working on this problem for days, and consulting friends. No
elegant, fast solution is presenting itself. As I said, I feel I'm not
seeing the obvious solution in front of my face.

In the mean-time I can use this query to do the job on a per-customer
basis:

select *
from bank_account
where id = <the customer's ID> and ts = (select max(ts) from bank_account ba2 where ba2.customer_id =
bank_account.customer_id);


However, doing this for all 40,000 customers is not workable as a manual
process. My last resort is to do it this way to pre-generate the report,
but I'd far rather do it real-time.

Help! My brain hurts!

-- 
Erik G. Burrows - KG6HEA                   www.erikburrows.com
PGP Key: http://www.erikburrows.com/egb@erikburrows.com.pgpkey


Re: SQL problem: bank account

От
"Andrew J. Kopciuch"
Дата:
> I need to get the most recent transaction for each customer. I need only
> the transaction ID, but the entire row would be best.
>

Why don't you alter the customer table to hold the transaction ID of the most 
recent transaction?



Some questions though:

Do you vacuum the database regularly?
Do you have any indexes created?


Some things to think about.



Andy




Re: SQL problem: bank account

От
Sean Chittenden
Дата:
> It seems to me this is a simple problem, but the solution eludes me.
>
> I have a table:
> 
> bank_account (
>   transaction_id int not null serial,
>   customer_id int not null references customer(id),
>   ts timestamp not null default now(),
>   amount float not null,
>   balance float not null,
>   primary key(transaction_id)
> )
> 
> I need to get the most recent transaction for each customer. I need only
> the transaction ID, but the entire row would be best.

For the sake of being explicit, change your table definition (though
what you have above is a-okay and works):

CREATE SEQUENCE transaction_id_seq;
CREATE TABLE bank_account ( transaction_id int not null DEFAULT NEXTVAL('transaction_id_seq'::TEXT), customer_id int
notnull references customer(id), ts timestamp not null default now(), amount float not null, balance float not null,
primarykey(transaction_id)
 
);

Once you insert a value into the bank_account table, SELECT
CURRVAL('transaction_id_seq') will be what you're looking for.  Read
up on CURRVAL() at:

http://www.postgresql.org/docs/view.php?version=7.3&idoc=1&file=functions-sequence.html

-sc

-- 
Sean Chittenden


Re: SQL problem: bank account

От
"listrec"
Дата:
How about:

select max(transaction_id) from bank_account group by customer_id

Should work...

Detlef

-----Ursprüngliche Nachricht-----
Von: pgsql-sql-owner@postgresql.org
[mailto:pgsql-sql-owner@postgresql.org]Im Auftrag von Erik G. Burrows
Gesendet: Montag, 2. Juni 2003 07:30
An: pgsql-sql@postgresql.org
Betreff: [SQL] SQL problem: bank account


It seems to me this is a simple problem, but the solution eludes me.

I have a table:

bank_account ( transaction_id int not null serial, customer_id int not null references customer(id), ts timestamp not
nulldefault now(), amount float not null, balance float not null, primary key(transaction_id)
 
)

I need to get the most recent transaction for each customer. I need only
the transaction ID, but the entire row would be best.



I have two solutions, both of which are too slow for use in my
interactive web-based interface:

Solution1: Outer left self join:
SELECT ba1.*
FROM bank_account ba1 LEFT OUTER JOIN bank_account ba2   ON ba1.customer_id = ba2.customer_id AND ba1.ts < ba2.ts
WHERE ba2.ts IS NULL;

This query works great on tables of less than a few thousand rows. For
my 300k row table, it takes several hours.


Solution2: max-concat trick
SELECT split_part(max(     extract(EPOCH from ts)::VARCHAR || '    ' ||     transaction_id::VARCHAR), '    ', 2)::INT
FROM bank_account
GROUP BY customer_id

This is an ugly and obviously inefficient solution, but it does the job
in about 1/2 hour. Still too long though.



I've been working on this problem for days, and consulting friends. No
elegant, fast solution is presenting itself. As I said, I feel I'm not
seeing the obvious solution in front of my face.

In the mean-time I can use this query to do the job on a per-customer
basis:

select *
from bank_account
where id = <the customer's ID> and ts = (select max(ts) from bank_account ba2 where ba2.customer_id =
bank_account.customer_id);


However, doing this for all 40,000 customers is not workable as a manual
process. My last resort is to do it this way to pre-generate the report,
but I'd far rather do it real-time.

Help! My brain hurts!

--
Erik G. Burrows - KG6HEA                   www.erikburrows.com
PGP Key: http://www.erikburrows.com/egb@erikburrows.com.pgpkey

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly



Re: SQL problem: bank account

От
Tom Lane
Дата:
"Erik G. Burrows" <eburrows@erikburrows.com> writes:
> I need to get the most recent transaction for each customer. I need only
> the transaction ID, but the entire row would be best.

If you don't mind a not-standard-SQL solution, the SELECT DISTINCT ON
construct is designed for this sort of thing.  See the "weather report"
example in the SELECT reference page.
        regards, tom lane


Re: SQL problem: bank account

От
Dmitry Tkach
Дата:
I am afraid, this looks even uglier then your second solution, but 
should work, and be quicker...

-- You need this to avoid having to rescan the whole table for each 
customerid every time and resort the results
create index customer_txstamp_idx on bank_account (customer_id, ts);

select ba.* from bank_account ba where
transaction_id = (select transaction_id from bank_account where 
customer_id = ba.customer_id order by customer_id desc, ts desc limit 1);

Now, note that, if you have lots of different customers in that table, 
it will still take a while to fetch them all (although, it should still 
be a lot quicker then half an hour) - in that case, you may consider 
either getting them one-by-one (by adding ... and customer_id=? to the 
above query) or using cursors...

I hope, it helps...

Dima


Erik G. Burrows wrote:

>It seems to me this is a simple problem, but the solution eludes me.
>
>I have a table:
>
>bank_account (
>  transaction_id int not null serial,
>  customer_id int not null references customer(id),
>  ts timestamp not null default now(),
>  amount float not null,
>  balance float not null,
>  primary key(transaction_id)
>)
>
>I need to get the most recent transaction for each customer. I need only
>the transaction ID, but the entire row would be best.
>
>
>
>I have two solutions, both of which are too slow for use in my
>interactive web-based interface:
>
>Solution1: Outer left self join:
>SELECT
>  ba1.*
>FROM
>  bank_account ba1
>  LEFT OUTER JOIN bank_account ba2
>    ON ba1.customer_id = ba2.customer_id AND ba1.ts < ba2.ts
>WHERE
>  ba2.ts IS NULL;
>
>This query works great on tables of less than a few thousand rows. For
>my 300k row table, it takes several hours.
>
>
>Solution2: max-concat trick
>SELECT
>  split_part(max(
>      extract(EPOCH from ts)::VARCHAR || '    ' ||
>      transaction_id::VARCHAR), '    ', 2)::INT
>FROM
>  bank_account
>GROUP BY
>  customer_id
>
>This is an ugly and obviously inefficient solution, but it does the job
>in about 1/2 hour. Still too long though.
>
>
>
>I've been working on this problem for days, and consulting friends. No
>elegant, fast solution is presenting itself. As I said, I feel I'm not
>seeing the obvious solution in front of my face.
>
>In the mean-time I can use this query to do the job on a per-customer
>basis:
>
>select
>  *
>from
>  bank_account
>where
>  id = <the customer's ID>
>  and ts = (select max(ts) from bank_account ba2 where ba2.customer_id =
>bank_account.customer_id);
>
>
>However, doing this for all 40,000 customers is not workable as a manual
>process. My last resort is to do it this way to pre-generate the report,
>but I'd far rather do it real-time.
>
>Help! My brain hurts!
>
>  
>




Re: SQL problem: bank account

От
Jonathan Gardner
Дата:
On Monday 02 June 2003 00:49, listrec wrote:
> How about:
>
> select max(transaction_id) from bank_account group by customer_id
>

And if you want the rest of the data in the rows:

SELECT b.*
FROM bank_account b
JOIN  (SELECT max(transaction_id) AS transaction_idFROM bank_accountGROUP BY customer_id
) AS subselect ON subselect.transaction_id=b.transaction_id
;

I use a similar query here at work to find the first time a guy has visited 
our site through an advertiser.

-- 
Jonathan Gardner <jgardner@jonathangardner.net>
(was jgardn@alumni.washington.edu)
Live Free, Use Linux!


Re: SQL problem: bank account

От
Jonathan Gardner
Дата:
On Monday 02 June 2003 00:49, listrec wrote:
> How about:
>
> select max(transaction_id) from bank_account group by customer_id
>

And if you want the rest of the data in the rows:

SELECT b.*
FROM bank_account b
JOIN  (SELECT max(transaction_id) AS transaction_idFROM bank_accountGROUP BY customer_id
) AS subselect ON subselect.transaction_id=b.transaction_id
;

I use a similar query here at work to find the first time a guy has visited 
our site through an advertiser.

-- 
Jonathan M. Gardner
Smooth Corporation - Perl Programmer
jgardner@smoothcorp.com - (425) 460-4780
Live Free, Use Linux!


Re: SQL problem: bank account

От
"Erik G. Burrows"
Дата:
This is the solution I was looking for! 20 seconds to compile the whole
list.

Thanks everyone for your help, I very much appreciate it. Even though
this is kind of 'hackey' being that it's non-standard SQL, it keeps the
database from having to to 40,000 selects, as would have to do with any
correlated sub-select statement.

Thanks!

> "Erik G. Burrows" <eburrows@erikburrows.com> writes:
> > I need to get the most recent transaction for each customer. I need only
> > the transaction ID, but the entire row would be best.
> 
> If you don't mind a not-standard-SQL solution, the SELECT DISTINCT ON
> construct is designed for this sort of thing.  See the "weather report"
> example in the SELECT reference page.
> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
> 
> http://www.postgresql.org/docs/faqs/FAQ.html
-- 
Erik G. Burrows - KG6HEA                   www.erikburrows.com
PGP Key: http://www.erikburrows.com/egb@erikburrows.com.pgpkey