Re: Find inconsistencies in data with date range
Re: Find inconsistencies in data with date range
От:
David G Johnston <david.g.johnston@gmail.com>
Дата:
Re: Find inconsistencies in data with date range
От:
Jason Aleski <jason.aleski@gmail.com>
Дата:
In case anyone else needs similar code, I was able to get this working. Below is the code that pulls the missing dates using a cursor and returns the information into a table. I'm sure there may be a way to make the code more efficient, but considering this will only get ran maybe once a quarter (for quarterly reports), it works for me. With 700+ stores, it takes about 30 minutes to fully run from a reporting server. I have a JAVA program that queries the function "SELECT * FROM eod_missing_dates();" Then sends all the missing dates to a RabbitMQ server to with a worker program to try to rebuild the missing eod summaries and if not, it will send a message to the store managers. Hopefully this code will help someone else!
CREATE OR REPLACE FUNCTION eod_missing_dates()
RETURNS TABLE(store_id uuid, location character varying, missing_ts timestamp with time zone) AS
$BODY$
DECLARE
location_cursor CURSOR FOR SELECT * FROM locations ORDER BY store_id;
store_rec location%ROWTYPE;
BEGIN
CREATE TEMP TABLE dr_temptable(dr_ts, dr_dow) AS (SELECT t1.GenDate as gendate, extract(dow from GenDate) AS dayofweek
FROM (SELECT date as GenDate
FROM generate_series('1950-01-01'::date,CURRENT_TIMESTAMP::date,'1 day'::interval) date
) AS t1
WHERE extract(dow from GenDate) NOT IN (0,6));
OPEN location_cursor;
LOOP
FETCH location_cursor INTO store_rec;
EXIT WHEN store_rec IS NULL;
IF NOT FOUND THEN
EXIT;
END IF;
RAISE INFO '%', 'Checking data/dates for ' || store_rec.location;
RETURN QUERY SELECT store_rec.row_id as store_id, store_rec.location AS location, dr_ts AS missing_ts FROM dr_temptable AS t1
WHERE dr_ts NOT IN (SELECT t1.eod_ts FROM daily_salessummary AS t1
JOIN location AS t2 ON t1.store_id = t2.row_id
WHERE t2.location = store_rec.location)
AND dr_ts > (SELECT start_date FROM locations WHERE location=store_rec.location);
END LOOP;
CLOSE location_cursor;
DROP TABLE dr_temptable;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
;
CREATE OR REPLACE FUNCTION eod_missing_dates()
RETURNS TABLE(store_id uuid, location character varying, missing_ts timestamp with time zone) AS
$BODY$
DECLARE
location_cursor CURSOR FOR SELECT * FROM locations ORDER BY store_id;
store_rec location%ROWTYPE;
BEGIN
CREATE TEMP TABLE dr_temptable(dr_ts, dr_dow) AS (SELECT t1.GenDate as gendate, extract(dow from GenDate) AS dayofweek
FROM (SELECT date as GenDate
FROM generate_series('1950-01-01'::date,CURRENT_TIMESTAMP::date,'1 day'::interval) date
) AS t1
WHERE extract(dow from GenDate) NOT IN (0,6));
OPEN location_cursor;
LOOP
FETCH location_cursor INTO store_rec;
EXIT WHEN store_rec IS NULL;
IF NOT FOUND THEN
EXIT;
END IF;
RAISE INFO '%', 'Checking data/dates for ' || store_rec.location;
RETURN QUERY SELECT store_rec.row_id as store_id, store_rec.location AS location, dr_ts AS missing_ts FROM dr_temptable AS t1
WHERE dr_ts NOT IN (SELECT t1.eod_ts FROM daily_salessummary AS t1
JOIN location AS t2 ON t1.store_id = t2.row_id
WHERE t2.location = store_rec.location)
AND dr_ts > (SELECT start_date FROM locations WHERE location=store_rec.location);
END LOOP;
CLOSE location_cursor;
DROP TABLE dr_temptable;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
;
Jason Aleski / IT Specialist
On 6 March 2015 at 22:38, Jason Aleski <jason.aleski@gmail.com> wrote:I know I can do this Java, but I'd rather have this running as a Stored Procedure. What I am wanting to do is identify and potentially correct the summary data for date inconsistencies. We have policies/red flag reports in place to keep this from happening, but we are now cleaning up history. The query below works on a per store basis, but I'd like to be able to run this for all stores in the location table.
I've looked at some procedure codes regarding looping, but everything I try to create seems to give me problems. THe code I'm trying is also below. Does anyone have any suggestions on how to accomplish this?
Working Tables
locations - table contains store information, startup date, address, etc
daily_salessummary - table holds daily sales summary by store (summary should be updated nightly). eod_ts is End of Day Timestamp.
Query
WITH datelist AS(
SELECT t1.GenDate as gendate, extract(dow from GenDate) AS dayofweek
FROM (SELECT date as GenDate
FROM generate_series('1985-01-01'::date,CURRENT_TIMESTAMP::date,'1 day'::interval) date
) AS t1
)
SELECT gendate FROM datelist AS t1
WHERE gendate NOT IN (SELECT t1.eod_ts FROM daily_salessummary AS t1
JOIN locations AS t2 ON t1.location_id = t2.row_id
WHERE t2.locationCode = 'US_FL_TAMPA_141')
AND gendate > (SELECT start_date FROM locations WHERE locationCode = 'US_FL_TAMPA_141')
Desired Output - could output to an exceptions table
StoreCode 'US_FL_TAMA_141' missing daily summary for 1998-01-01
StoreCode 'MX_OAXACA_SALINA_8344' missing daily summary for 2011-06-05
ProcedureSQL (contains unknown errors)
DECLARE
CURSOR location_table IS
SELECT locationCode FROM locations;
BEGIN
FOR thisSymbol IN ticker_tables LOOP
EXECUTE IMMEDIATE 'WITH datelist AS(
SELECT t1.GenDate as gendate, extract(dow from GenDate) AS dayofweek
FROM (SELECT date as GenDate
FROM generate_series('1985-01-01'::date,CURRENT_TIMESTAMP::date,'1 day'::interval) date
) AS t1
)
SELECT gendate FROM datelist AS t1
WHERE gendate NOT IN (SELECT t1.eod_ts FROM daily_salessummary AS t1
JOIN locations AS t2 ON t1.location_id = t2.row_id
WHERE t2.locationCode = '' || location_table.locationCode || '')
AND gendate > (SELECT start_date FROM locations WHERE locationCode = '' || location_table.locationCode || '')';
END LOOP;
END;-- Jason Aleski / IT Specialist
Re: Find inconsistencies in data with date range
От:
s d <daku.sandor@gmail.com>
Дата:
Hi,
Something like this?
It inserts error records into a table called locationrep.
create or replace function finderror() returns void as $$
declare
startd date;
daterec record;
begin
select into startd min(startdate) from location; --identify the earliest opening date
--iterating trough dates from that date until now
for daterec in select generate_series::date as repdate from generate_series(startd,now()::date,'1 day'::interval) loop
--insert ito the error table all the shops which not have an entry from the current date and opened before the said date
insert into locationrep select shop,daterec.repdate from location where startdate<=daterec.repdate and not exists(select 1 from daily_salessummary where shop=location.shop and reportdate=daterec.repdate);
end loop;
end;
$$ language plpgsql;
Regards,
Sándor Daku
On 6 March 2015 at 22:38, Jason Aleski <jason.aleski@gmail.com> wrote:
I know I can do this Java, but I'd rather have this running as a Stored Procedure. What I am wanting to do is identify and potentially correct the summary data for date inconsistencies. We have policies/red flag reports in place to keep this from happening, but we are now cleaning up history. The query below works on a per store basis, but I'd like to be able to run this for all stores in the location table.
I've looked at some procedure codes regarding looping, but everything I try to create seems to give me problems. THe code I'm trying is also below. Does anyone have any suggestions on how to accomplish this?
Working Tables
locations - table contains store information, startup date, address, etc
daily_salessummary - table holds daily sales summary by store (summary should be updated nightly). eod_ts is End of Day Timestamp.
Query
WITH datelist AS(
SELECT t1.GenDate as gendate, extract(dow from GenDate) AS dayofweek
FROM (SELECT date as GenDate
FROM generate_series('1985-01-01'::date,CURRENT_TIMESTAMP::date,'1 day'::interval) date
) AS t1
)
SELECT gendate FROM datelist AS t1
WHERE gendate NOT IN (SELECT t1.eod_ts FROM daily_salessummary AS t1
JOIN locations AS t2 ON t1.location_id = t2.row_id
WHERE t2.locationCode = 'US_FL_TAMPA_141')
AND gendate > (SELECT start_date FROM locations WHERE locationCode = 'US_FL_TAMPA_141')
Desired Output - could output to an exceptions table
StoreCode 'US_FL_TAMA_141' missing daily summary for 1998-01-01
StoreCode 'MX_OAXACA_SALINA_8344' missing daily summary for 2011-06-05
ProcedureSQL (contains unknown errors)
DECLARE
CURSOR location_table IS
SELECT locationCode FROM locations;
BEGIN
FOR thisSymbol IN ticker_tables LOOP
EXECUTE IMMEDIATE 'WITH datelist AS(
SELECT t1.GenDate as gendate, extract(dow from GenDate) AS dayofweek
FROM (SELECT date as GenDate
FROM generate_series('1985-01-01'::date,CURRENT_TIMESTAMP::date,'1 day'::interval) date
) AS t1
)
SELECT gendate FROM datelist AS t1
WHERE gendate NOT IN (SELECT t1.eod_ts FROM daily_salessummary AS t1
JOIN locations AS t2 ON t1.location_id = t2.row_id
WHERE t2.locationCode = '' || location_table.locationCode || '')
AND gendate > (SELECT start_date FROM locations WHERE locationCode = '' || location_table.locationCode || '')';
END LOOP;
END;-- Jason Aleski / IT Specialist