Обсуждение: Searching for big differences between values
Hello!
Somewhere the users made mistakes on prices (stock).
I need to search for big differences between values.
For example:
20
21
21,5
30
28
..
46392 <-
46392 <-
But it could be:
42300
43100
44000
43800
65000 <-
42100
Human eye could locate these values, but there we need to check 30.000
articles and 450.000 values.
Do you have any idea, how to this with SQL?
In most cases the first values are ok, the second interval (after a date)
it could be wrong...
I don't know how to define the solution, but I think PGSQL have intelligent
solution for this problem.
We need to search for elements have bigger value like base price *
tolerance.
But the base price is calculated dynamically from the lower values... The
tolerance is lower on highest base prices.
Thank you for any help!
Best regards dd
On Nov 30, 2017 08:35, "Durumdara" wrote:
Hello!
Somewhere the users made mistakes on prices (stock).
I need to search for big differences between values.
For example:
20
21
21,5
30
28
..
46392 <-
46392 <-
But it could be:
42300
43100
44000
43800
65000 <-
42100
Human eye could locate these values, but there we need to check 30.000
articles and 450.000 values.
Do you have any idea, how to this with SQL?
In most cases the first values are ok, the second interval (after a date)
it could be wrong...
I don't know how to define the solution, but I think PGSQL have intelligent
solution for this problem.
We need to search for elements have bigger value like base price *
tolerance.
But the base price is calculated dynamically from the lower values... The
tolerance is lower on highest base prices.
The good news is relational dbs excel at performing. However you have not
provided nearly enough info to help write a query. How are such things
calculated?
Thank you for any help!
Best regards dd
> I need to search for big differences between values. [...] Hi, from an SQL point of view this is not difficult, but you need to carefully define a criteria for the outliers. For example, to find values that are more than a standard deviation away from the mean, do something like this: chris=# select * from val; x ------- 20 21 21.5 30 28 46392 46393 40 (8 rows) chris=# select * from val where x > (select avg(x) + stddev(x) from val) or x < (select avg(x) - stddev(x) from val); x ------- 46392 46393 (2 rows) Try with n*stddev(x) for n = 2, 3, 4, 5, ... to see if you can get to your outliers... Bye, Chris.
On 30/11/17, Durumdara (durumdara@gmail.com) wrote: > Somewhere the users made mistakes on prices (stock). > > I need to search for big differences between values. > For example: > > 20 > 21 > 21,5 > 30 > 28 > .. > 46392 <- > 46392 <- You could use window functions https://www.postgresql.org/docs/current/static/functions-window.html Eg costings=> create table vals (num integer); costings=> insert into vals values (20), (21), (30), (28), (46392); costings=> select num, num - lag(num, 1, num) over () as diff from vals order by num; num | diff -------+------- 20 | 0 21 | 1 28 | -2 30 | 9 46392 | 46364 (5 rows) Although you might want to use averaging or percentages to throw up errors instead.
Durumdara wrote: > Somewhere the users made mistakes on prices (stock). > > I need to search for big differences between values. > For example: > > 20 > 21 > 21,5 > 30 > 28 > .. > 46392 <- > 46392 <- > > But it could be: > > 42300 > 43100 > 44000 > 43800 > 65000 <- > 42100 > > Human eye could locate these values, but there we need to check 30.000 articles and 450.000 values. > > Do you have any idea, how to this with SQL? You could use some variant of SELECT id, price FROM (SELECT id, price, avg(price) OVER (ORDER BY id ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING)AS av FROM stock) q WHERE NOT (price BETWEEN 0.8 * av AND 1.25 * av); Yours, Laurenz Albe