Обсуждение: how to remove a for-loop from programming language and put it into the query?

Поиск
Список
Период
Сортировка

how to remove a for-loop from programming language and put it into the query?

От
Pedro Zorzenon Neto
Дата:
Hello!

Can someone help me to develop a query?

Things are more complex than this example, but with this simple example
I can explain what I need and get an answer.

Table: "diagnose_logs"
Fields:
  - id serial
  - hardware_id integer
  - diag_value integer
  - ts timestamp

So I collect many diagnose information from many hardwares.

So, I need to get a report of all diagnostics of all hardware on
december 25th.

(external programming language)
for ($i = 1; $i < 500; $i++) {
  // return me the "most recent" diag_value from a hardware_id $i
  // at the desired timestamp
  runquery("select diag_value from diagnose_logs where ts <= '2009-12-25
23:59:59' and hardware_id = $i order by ts desc limit 1");
}

Currently I have an index on diagnose_logs(ts,hardware_id)
I have 3 milion registers of 500 different hardware_id.

The time to run 500 times this query is long... about 1 minute. When I
need a montly day-by-day report of 500 hardwares, it takes about half an
hour.

can I turn this for-loop into a single query to run in postgres?

Thanks,
Pedro


Re: how to remove a for-loop from programming language and put it into the query?

От
Thom Brown
Дата:
On 5 July 2010 15:48, Pedro Zorzenon Neto <pedro2009@mandic.com.br> wrote:
> Hello!
>
> Can someone help me to develop a query?
>
> Things are more complex than this example, but with this simple example
> I can explain what I need and get an answer.
>
> Table: "diagnose_logs"
> Fields:
>  - id serial
>  - hardware_id integer
>  - diag_value integer
>  - ts timestamp
>
> So I collect many diagnose information from many hardwares.
>
> So, I need to get a report of all diagnostics of all hardware on
> december 25th.
>
> (external programming language)
> for ($i = 1; $i < 500; $i++) {
>  // return me the "most recent" diag_value from a hardware_id $i
>  // at the desired timestamp
>  runquery("select diag_value from diagnose_logs where ts <= '2009-12-25
> 23:59:59' and hardware_id = $i order by ts desc limit 1");
> }
>
> Currently I have an index on diagnose_logs(ts,hardware_id)
> I have 3 milion registers of 500 different hardware_id.
>
> The time to run 500 times this query is long... about 1 minute. When I
> need a montly day-by-day report of 500 hardwares, it takes about half an
> hour.
>
> can I turn this for-loop into a single query to run in postgres?
>
> Thanks,
> Pedro

I'm probably misunderstanding the problem, but can't you just do:

SELECT
    diag_value
FROM
    diagnose_logs
WHERE
    ts <= '2009-12-25 23:59:59'
AND
    hardware_id BETWEEN 1 AND 500
ORDER BY
    ts DESC
LIMIT 1

Regards

Thom

Re: how to remove a for-loop from programming language and put it into the query?

От
Sam Mason
Дата:
On Mon, Jul 05, 2010 at 11:48:37AM -0300, Pedro Zorzenon Neto wrote:
> for ($i = 1; $i < 500; $i++) {
>   // return me the "most recent" diag_value from a hardware_id $i
>   // at the desired timestamp
>   runquery("select diag_value from diagnose_logs where ts <= '2009-12-25
> 23:59:59' and hardware_id = $i order by ts desc limit 1");
> }
>
> can I turn this for-loop into a single query to run in postgres?

You want to be using DISTINCT ON or some sort of WINDOW function.
DISTINCT ON works with older version of PG, but isn't as standards'
conforming.  The following should do the trick with DISTINCT ON:

  SELECT DISTINCT ON (hardware_id) hardware_id, diag_value, ts
  FROM diagnose_logs
  WHERE ts <= '2009-12-25 23:59:59'
  ORDER BY hardware_id, ts DESC;

You can obviously put in the normal clauses to limit the hardware_ids to
be things you consider important in the normal ways.

--
  Sam  http://samason.me.uk/

Re: how to remove a for-loop from programming language and put it into the query?

От
Pedro Zorzenon Neto
Дата:
> I'm probably misunderstanding the problem, but can't you just do:
>
> SELECT
>     diag_value
> FROM
>     diagnose_logs
> WHERE
>     ts <= '2009-12-25 23:59:59'
> AND
>     hardware_id BETWEEN 1 AND 500
> ORDER BY
>     ts DESC
> LIMIT 1

Hi Thom,

Yes, I think you misunderstood.

An example of a table:

hardware_id | ts                  | diag_value
          1 | 2009-12-25 14:00:00 | 43.5 (*)
          1 | 2009-12-26 15:00:00 | 43.6
          1 | 2009-12-24 13:00:00 | 43.7
          2 | 2009-12-24 15:00:00 | 43.8 (*)
          2 | 2009-12-24 14:00:00 | 43.9
          2 | 2009-12-24 14:16:00 | 43.9
          2 | 2009-12-27 14:00:00 | 44.0

I need to get the "most recent" value before "2009-12-25 23:59:59" from
every hardware_id.

For hardware_id=1, the value would be:
          1 | 2009-12-25 14:00:00 | 43.5
for hardware_id=2, the value would be:
          2 | 2009-12-24 15:00:00 | 43.8

I need a query that will return me those lines marked with (*) :-) is
this possible?

Thanks!


Re: how to remove a for-loop from programming language and put it into the query?

От
Tim Landscheidt
Дата:
Pedro Zorzenon Neto <pedro2009@mandic.com.br> wrote:

> [...]
> So, I need to get a report of all diagnostics of all hardware on
> december 25th.

> (external programming language)
> for ($i = 1; $i < 500; $i++) {
>   // return me the "most recent" diag_value from a hardware_id $i
>   // at the desired timestamp
>   runquery("select diag_value from diagnose_logs where ts <= '2009-12-25
> 23:59:59' and hardware_id = $i order by ts desc limit 1");
> }

> Currently I have an index on diagnose_logs(ts,hardware_id)
> I have 3 milion registers of 500 different hardware_id.

> The time to run 500 times this query is long... about 1 minute. When I
> need a montly day-by-day report of 500 hardwares, it takes about half an
> hour.

> can I turn this for-loop into a single query to run in postgres?

Another month, another case for "DISTINCT ON":

| SELECT DISTINCT ON (hardware_id)
|   hardware_id, diag_value
|   FROM diagnose_logs
|   WHERE ts <= '2009-12-25 23:59:59'
|   ORDER BY hardware_id, ts DESC;

BTW, I'd prefer "WHERE ts < '2009-12-26'" as otherwise you
don't catch a timestamp '2009-12-25 23:59:59.5' (not to
speak of leap seconds).

Tim

Re: how to remove a for-loop from programming language and put it into the query?

От
Thom Brown
Дата:
On 5 July 2010 16:26, Pedro Zorzenon Neto <pedro2009@mandic.com.br> wrote:
>> I'm probably misunderstanding the problem, but can't you just do:
>>
>> SELECT
>>       diag_value
>> FROM
>>       diagnose_logs
>> WHERE
>>       ts <= '2009-12-25 23:59:59'
>> AND
>>       hardware_id BETWEEN 1 AND 500
>> ORDER BY
>>       ts DESC
>> LIMIT 1
>
> Hi Thom,
>
> Yes, I think you misunderstood.
>
> An example of a table:
>
> hardware_id | ts                  | diag_value
>          1 | 2009-12-25 14:00:00 | 43.5 (*)
>          1 | 2009-12-26 15:00:00 | 43.6
>          1 | 2009-12-24 13:00:00 | 43.7
>          2 | 2009-12-24 15:00:00 | 43.8 (*)
>          2 | 2009-12-24 14:00:00 | 43.9
>          2 | 2009-12-24 14:16:00 | 43.9
>          2 | 2009-12-27 14:00:00 | 44.0
>
> I need to get the "most recent" value before "2009-12-25 23:59:59" from
> every hardware_id.
>
> For hardware_id=1, the value would be:
>          1 | 2009-12-25 14:00:00 | 43.5
> for hardware_id=2, the value would be:
>          2 | 2009-12-24 15:00:00 | 43.8
>
> I need a query that will return me those lines marked with (*) :-) is
> this possible?
>
> Thanks!
>
>

D'oh!  I completely ignored that "LIMIT 1".  okay... what Sam said.

Thom

Re: how to remove a for-loop from programming language and put it into the query?

От
Pedro Zorzenon Neto
Дата:
Em 05-07-2010 12:22, Sam Mason escreveu:
>
> You want to be using DISTINCT ON or some sort of WINDOW function.
> DISTINCT ON works with older version of PG, but isn't as standards'
> conforming.  The following should do the trick with DISTINCT ON:
>
>   SELECT DISTINCT ON (hardware_id) hardware_id, diag_value, ts
>   FROM diagnose_logs
>   WHERE ts <= '2009-12-25 23:59:59'
>   ORDER BY hardware_id, ts DESC;
>
> You can obviously put in the normal clauses to limit the hardware_ids to
> be things you consider important in the normal ways.

Hi Sam!

It worked ok! your solution solves what I need. The process time went
from 60 to 20 seconds. nice!

Can you help me to discover why the "Seq Scan" in explain analyse? I
tried to create some indexes to change seq scan to index scan, but
couldn't do it.

Now the real table and field names...

explain analyse select distinct on (callbox_id) callbox_id, ts, imei,
temperatura from diag_resultados where ts <= '2010-06-15 00:00:00' order
by callbox_id, ts desc;
                                                               QUERY
PLAN

-----------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=408118.90..417725.43 rows=406 width=18) (actual
time=19608.347..22626.744 rows=458 loops=1)
   ->  Sort  (cost=408118.90..412922.17 rows=1921306 width=18) (actual
time=19608.345..21503.135 rows=1905941 loops=1)
         Sort Key: callbox_id, ts
         ->  Seq Scan on diag_resultados  (cost=0.00..58795.50
rows=1921306 width=18) (actual time=0.024..4886.113 rows=1905941 loops=1)
               Filter: (ts <= '2010-06-15 00:00:00-03'::timestamp with
time zone)
 Total runtime: 22762.754 ms

Pg is old in this machine. 7.4.17

Re: how to remove a for-loop from programming language and put it into the query?

От
Sam Mason
Дата:
On Mon, Jul 05, 2010 at 12:44:55PM -0300, Pedro Zorzenon Neto wrote:
> Em 05-07-2010 12:22, Sam Mason escreveu:
> >   SELECT DISTINCT ON (hardware_id) hardware_id, diag_value, ts
> >   FROM diagnose_logs
> >   WHERE ts <= '2009-12-25 23:59:59'
> >   ORDER BY hardware_id, ts DESC;
>
> It worked ok! your solution solves what I need. The process time went
> from 60 to 20 seconds. nice!

Always nice when less code is faster!

> Can you help me to discover why the "Seq Scan" in explain analyse? I
> tried to create some indexes to change seq scan to index scan, but
> couldn't do it.

It's because the only way PG knows how to do a DISTINCT ON is to sort
the whole table and then pull out the appropriate values.  Sorting the
whole of a table is generally going to be faster than referring to an
index for every row and hence PG won't use an index.

I'm not sure if that's changed more recently, but for 7.4 I'm pretty
sure that's the case anyway.

--
  Sam  http://samason.me.uk/

Re: how to remove a for-loop from programming language and put it into the query?

От
Sim Zacks
Дата:
> (external programming language)
> for ($i = 1; $i < 500; $i++) {
>   // return me the "most recent" diag_value from a hardware_id $i
>   // at the desired timestamp
>   runquery("select diag_value from diagnose_logs where ts <= '2009-12-25
> 23:59:59' and hardware_id = $i order by ts desc limit 1");
> }
>
> can I turn this for-loop into a single query to run in postgres?
>
> Thanks,
> Pedro
>
Try:

SELECT diag_value  FROM diagnose_logs a where id in
 (
     SELECT id FROM diagnose_logs b
       WHERE a.hardware_id=b.hardware_id
    and ts <= '2009-12-25 23:59:59'
    and hardware_id between 1 and 500
       ORDER BY ts LIMIT 1)
     ORDER BY hardware_id;