Обсуждение: Searching for big differences between values

Поиск
Список
Период
Сортировка

Searching for big differences between values

От
Durumdara
Дата:
Hello! Somewhere the users made mistakes on prices (stock). I need to search for big differences between values. For example: 20 21 21,5 30 28 .. 46392 <- 46392 <- But it could be: 42300 43100 44000 43800 65000 <- 42100 Human eye could locate these values, but there we need to check 30.000 articles and 450.000 values. Do you have any idea, how to this with SQL? In most cases the first values are ok, the second interval (after a date) it could be wrong... I don't know how to define the solution, but I think PGSQL have intelligent solution for this problem. We need to search for elements have bigger value like base price * tolerance. But the base price is calculated dynamically from the lower values... The tolerance is lower on highest base prices. Thank you for any help! Best regards dd

Re: Searching for big differences between values

От
Chris Travers
Дата:
On Nov 30, 2017 08:35, "Durumdara" wrote: Hello! Somewhere the users made mistakes on prices (stock). I need to search for big differences between values. For example: 20 21 21,5 30 28 .. 46392 <- 46392 <- But it could be: 42300 43100 44000 43800 65000 <- 42100 Human eye could locate these values, but there we need to check 30.000 articles and 450.000 values. Do you have any idea, how to this with SQL? In most cases the first values are ok, the second interval (after a date) it could be wrong... I don't know how to define the solution, but I think PGSQL have intelligent solution for this problem. We need to search for elements have bigger value like base price * tolerance. But the base price is calculated dynamically from the lower values... The tolerance is lower on highest base prices. The good news is relational dbs excel at performing. However you have not provided nearly enough info to help write a query. How are such things calculated? Thank you for any help! Best regards dd

Re: Searching for big differences between values

От
Chris Mair
Дата:
> I need to search for big differences between values.
[...]

Hi,

from an SQL point of view this is not difficult, but you need to
carefully define a criteria for the outliers.

For example, to find values that are more than a standard deviation
away from the mean, do something like this:

chris=# select * from val;   x
-------    20    21  21.5    30    28 46392 46393    40
(8 rows)

chris=# select * from val where x > (select avg(x) + stddev(x) from val) or x < (select avg(x) - stddev(x) from val);
x
------- 46392 46393
(2 rows)

Try with n*stddev(x) for n = 2, 3, 4, 5, ... to see
if you can get to your outliers...

Bye,
Chris.



Re: Searching for big differences between values

От
Rory Campbell-Lange
Дата:
On 30/11/17, Durumdara (durumdara@gmail.com) wrote:
> Somewhere the users made mistakes on prices (stock).
> 
> I need to search for big differences between values.
> For example:
> 
> 20
> 21
> 21,5
> 30
> 28
> ..
> 46392 <-
> 46392 <-

You could use window functions
https://www.postgresql.org/docs/current/static/functions-window.html

Eg   costings=> create table vals (num integer);
   costings=> insert into vals values (20), (21), (30), (28), (46392);
   costings=> select num, num - lag(num, 1, num) over () as diff from vals   order by num;     num  | diff
-------+-------      20 |     0       21 |     1       28 |    -2       30 |     9    46392 | 46364   (5 rows)
 

Although you might want to use averaging or percentages to throw up
errors instead.


Re: Searching for big differences between values

От
Laurenz Albe
Дата:
Durumdara wrote:
> Somewhere the users made mistakes on prices (stock).
> 
> I need to search for big differences between values.
> For example:
> 
> 20
> 21
> 21,5
> 30
> 28
> ..
> 46392 <-
> 46392 <-
> 
> But it could be:
> 
> 42300
> 43100
> 44000
> 43800
> 65000 <-
> 42100
> 
> Human eye could locate these values, but there we need to check 30.000 articles and 450.000 values.
> 
> Do you have any idea, how to this with SQL?

You could use some variant of
 SELECT id, price FROM (SELECT id, price,              avg(price) OVER (ORDER BY id ROWS BETWEEN 3 PRECEDING AND 3
FOLLOWING)AS av       FROM stock) q WHERE NOT (price BETWEEN 0.8 * av AND 1.25 * av);
 

Yours,
Laurenz Albe