Re: More of a SQL question, I guess.
От | Frank Bax |
---|---|
Тема | Re: More of a SQL question, I guess. |
Дата | |
Msg-id | 46C1BDEA.6050209@sympatico.ca обсуждение исходный текст |
Ответ на | More of a SQL question, I guess. (Howard Eglowstein <howard@yankeescientific.com>) |
Ответы |
Re: More of a SQL question, I guess.
(Tom Lane <tgl@sss.pgh.pa.us>)
|
Список | pgsql-novice |
Howard Eglowstein wrote: > I have a database with 150 fields per row. For historical reasons, it's > broken into three databases, data_a, data_b and data_c. One of the > fields is a timestamp and one of them is a serial number that came in > from a remote machine that reports its status every day. > > If I use a command like 'SELECT MAX(Timestamp) FROM data_a WHERE field1 > = '0001', I can get back the time of the latest report. I can then issue > the command 'SELECT * from data_a, data_b, data_c WHERE > data_a.id=data_b.id AND data_a.id=data_c.id and field1='0001' to get the > 150 fields for that report. It works fine, but it takes a while to > respond when the database is the better part of a million records. > > If it were just one serial number, the two queries would be okay because > It really only takes a minute. The problem I have is that we're hoping > to have thousands of machine in the field (this is a just a test > database) and clearly this approach won't work for thousands of serial > numbers. The second method I did was to simply use 'SE:ECT * from > data_a.....' to get all million records, and have my C code look for > serial numbers in each line and keep the latest by timestamp. That takes > about as long as doing the first procedure 3 times, but it gives me the > latest data for all of the serial numbers in the system. That's > perfectly cool, except that it won't scale nicely. If the web code that > does the search isn't on the same machine that holds the data, we'll > have to ship gigabytes of data over the network for each search. > > What I'd *like* is something that uses groups and MAX() to do this in > one SQL command. It would group the data by the serial number (field1), > find the record in each group with the maximum timestamp value and > return all 150 fields. Is this possible? I thought of using unions, but > I think I have to issue a pretty long command for each group and the PG > buffers probably will max out after some relatively small number, no? > > Any thoughts would be appreciated. Start with SELECT field1,MAX(timestamp) as ts FROM data_a GROUP BY field1 to get a timestamp for each value of field1; then use self join to get the serial number from these rows. At this point you have a problem if there are any duplicate timestamp values. SELECT id, a.f1, ts from data_a join (select f,max(timestamp) as ts from data_a group by f1) as a on a.ts=data_a.timestamp; Call this select SQL1 and join itto your longer sql statement: SELECT * from data_a, data_b, data_c JOIN (SQL1) as aa on aa.id = data_a.id WHERE data_a.id=data_b.id AND data_a.id=data_c.id It seems to me the underlying question here comes up often. Might I suggest that we add a new question to section 4 of FAQ: Q: How can I retrieve other fields in same row as result returned by an aggregate function such as max()? A: Is my "SQL1" the best/only answer to this question?
В списке pgsql-novice по дате отправления:
Предыдущее
От: "Patrick Lindeman"Дата:
Сообщение: Re: Postmaster start up problems (can't create lock file )
Следующее
От: "Jasbinder Singh Bali"Дата:
Сообщение: Re: Postmaster start up problems (can't create lock file )