Обсуждение: Find inconsistencies in data with date range

Поиск
Список
Период
Сортировка

Find inconsistencies in data with date range

От
Jason Aleski
Дата:
I know I can do this Java, but I'd rather have this running as a Stored Procedure.  What I am wanting to do is
identifyand potentially correct the summary data for date inconsistencies.  We have policies/red flag reports in place
tokeep this from happening, but we are now cleaning up history.  The query below works on a per store basis, but I'd
liketo be able to run this for all stores in the location table.<br /><br /> I've looked at some procedure codes
regardinglooping, but everything I try to create seems to give me problems.  THe code I'm trying is also below.  Does
anyonehave any suggestions on how to accomplish this?<br /><br /><br /><br /><u>Working Tables</u><br />   locations -
tablecontains store information, startup date, address, etc<br />   daily_salessummary - table holds daily sales
summaryby store (summary should be updated nightly).  eod_ts is End of Day Timestamp.<br /><br /><u>Query</u><br />
WITHdatelist AS(<br />   SELECT t1.GenDate as gendate, extract(dow from GenDate) AS dayofweek<br />   FROM (SELECT date
asGenDate<br />         FROM generate_series('1985-01-01'::date,CURRENT_TIMESTAMP::date,'1 day'::interval) date<br />
      ) AS t1  <br /> )<br /> SELECT gendate FROM datelist AS t1<br /> WHERE gendate NOT IN (SELECT t1.eod_ts FROM
daily_salessummaryAS t1<br />                       JOIN locations AS t2 ON t1.location_id = t2.row_id<br />
                     WHERE t2.locationCode = 'US_FL_TAMPA_141')<br />                       <br />   AND gendate >
(SELECTstart_date FROM locations WHERE locationCode = 'US_FL_TAMPA_141')<br /><br /><br /><u>Desired Output</u> - could
outputto an exceptions table<br /> StoreCode 'US_FL_TAMA_141' missing daily summary for 1998-01-01<br /> StoreCode
'MX_OAXACA_SALINA_8344'missing daily summary for 2011-06-05<br /><br /><br /><u>ProcedureSQL</u> (contains unknown
errors)<br/> DECLARE<br />   CURSOR location_table IS<br />     SELECT locationCode FROM locations;<br />   BEGIN<br />
   FOR thisSymbol IN ticker_tables LOOP<br />       EXECUTE IMMEDIATE 'WITH datelist AS(<br />
                                             SELECT t1.GenDate as gendate, extract(dow from GenDate) AS dayofweek<br />
                                               FROM (SELECT date as GenDate<br />
                                                     FROM generate_series('1985-01-01'::date,CURRENT_TIMESTAMP::date,'1
day'::interval)date<br />                                                      ) AS t1  <br />
                                             )<br />                                               SELECT gendate FROM
datelistAS t1<br />                                               WHERE gendate NOT IN (SELECT t1.eod_ts FROM
daily_salessummaryAS t1<br />                                                                     JOIN locations AS t2
ONt1.location_id = t2.row_id<br />                                                                     WHERE
t2.locationCode= '' || location_table.locationCode  || '')                    <br />
                                            AND gendate > (SELECT start_date FROM locations WHERE locationCode = ''
||location_table.locationCode  || '')';<br />     END LOOP;<br />   END;<br /><br /><br /><br /><pre
class="moz-signature"cols="72">-- 
 
Jason Aleski / IT Specialist</pre>

Re: Find inconsistencies in data with date range

От
Adrian Klaver
Дата:
On 03/06/2015 01:38 PM, Jason Aleski wrote:
> I know I can do this Java, but I'd rather have this running as a Stored
> Procedure.  What I am wanting to do is identify and potentially correct
> the summary data for date inconsistencies.  We have policies/red flag
> reports in place to keep this from happening, but we are now cleaning up
> history.  The query below works on a per store basis, but I'd like to be
> able to run this for all stores in the location table.
>
> I've looked at some procedure codes regarding looping, but everything I
> try to create seems to give me problems.  THe code I'm trying is also
> below.  Does anyone have any suggestions on how to accomplish this?
>
>
>
> _Working Tables_
>    locations - table contains store information, startup date, address, etc
>    daily_salessummary - table holds daily sales summary by store
> (summary should be updated nightly).  eod_ts is End of Day Timestamp.
>
> _Query_
> WITH datelist AS(
>    SELECT t1.GenDate as gendate, extract(dow from GenDate) AS dayofweek
>    FROM (SELECT date as GenDate
>          FROM
> generate_series('1985-01-01'::date,CURRENT_TIMESTAMP::date,'1
> day'::interval) date
>         ) AS t1
> )
> SELECT gendate FROM datelist AS t1
> WHERE gendate NOT IN (SELECT t1.eod_ts FROM daily_salessummary AS t1
>                        JOIN locations AS t2 ON t1.location_id = t2.row_id
>                        WHERE t2.locationCode = 'US_FL_TAMPA_141')
>
>    AND gendate > (SELECT start_date FROM locations WHERE locationCode =
> 'US_FL_TAMPA_141')


First in above and in variation below I would probably do some alias 
renaming. I pretty sure t1 means different things throughout the query, 
but is hard to follow exactly what.

>
>
> _Desired Output_ - could output to an exceptions table
> StoreCode 'US_FL_TAMA_141' missing daily summary for 1998-01-01
> StoreCode 'MX_OAXACA_SALINA_8344' missing daily summary for 2011-06-05
>
>
> _ProcedureSQL_ (contains unknown errors)
> DECLARE
>    CURSOR location_table IS
>      SELECT locationCode FROM locations;
>    BEGIN
>      FOR thisSymbol IN ticker_tables LOOP
>        EXECUTE IMMEDIATE 'WITH datelist AS(
>                                                SELECT t1.GenDate as
> gendate, extract(dow from GenDate) AS dayofweek
>                                                  FROM (SELECT date as
> GenDate
>                                                        FROM
> generate_series('1985-01-01'::date,CURRENT_TIMESTAMP::date,'1
> day'::interval) date
>                                                       ) AS t1
>                                                )
>                                                SELECT gendate FROM
> datelist AS t1
>                                                WHERE gendate NOT IN
> (SELECT t1.eod_ts FROM daily_salessummary AS t1
> JOIN locations AS t2 ON t1.location_id = t2.row_id
> WHERE t2.locationCode = '' || location_table.locationCode  || '')
>                                               AND gendate > (SELECT
> start_date FROM locations WHERE locationCode = '' ||
> location_table.locationCode  || '')';
>      END LOOP;
>    END;

I do not use cursors enough in plpgsql to be sure, but I think the above 
definition is incorrect:

http://www.postgresql.org/docs/9.4/interactive/plpgsql-cursors.html

To reduce the moving parts I would write the function without the cursor 
and just hardwire the location information to start with to get a 
working sample.

>
>
>
> --
> Jason Aleski / IT Specialist
>


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Find inconsistencies in data with date range

От
David G Johnston
Дата:
Jason Aleksi wrote
> I've looked at some procedure codes regarding looping, but everything I 
> try to create seems to give me problems.  THe code I'm trying is also 
> below.  Does anyone have any suggestions on how to accomplish this?

I would build a master table of stores and dates and then write a query to
update a third field from null to the number of records found for the given
combination.  When all the nulls are gone you can scan for zeros to figure
out what combinations are missing data.  If you have a matching index the
queries should execute reasonably efficiently and you either call it from a
function in the database or externally on one or more threads depending on
where you expect to encounter the processing bottleneck.  You can process
more than one day or store at a time if so desired but there will likely be
a point of diminishing returns depending on the volume of data.  I would
probably do all days for one store in a given year at a time.

David J.



--
View this message in context:
http://postgresql.nabble.com/Find-inconsistencies-in-data-with-date-range-tp5840865p5840891.html
Sent from the PostgreSQL - sql mailing list archive at Nabble.com.



Re: Find inconsistencies in data with date range

От
s d
Дата:
Hi,

Something like this?
It inserts error records into a table called locationrep.

create or replace function finderror() returns void as $$
    declare
startd date;
daterec record;
    begin
select into startd min(startdate) from location; --identify the earliest opening date
        --iterating trough dates from that date until now
for daterec in select generate_series::date as repdate from generate_series(startd,now()::date,'1 day'::interval) loop
            --insert ito the error table all the shops which not have an entry from the current date and opened before the said date
   insert into locationrep select shop,daterec.repdate from location where startdate<=daterec.repdate and not exists(select 1 from daily_salessummary where shop=location.shop and reportdate=daterec.repdate);
end loop;
    end;

$$ language plpgsql;

Regards,
Sándor Daku

On 6 March 2015 at 22:38, Jason Aleski <jason.aleski@gmail.com> wrote:
I know I can do this Java, but I'd rather have this running as a Stored Procedure.  What I am wanting to do is identify and potentially correct the summary data for date inconsistencies.  We have policies/red flag reports in place to keep this from happening, but we are now cleaning up history.  The query below works on a per store basis, but I'd like to be able to run this for all stores in the location table.

I've looked at some procedure codes regarding looping, but everything I try to create seems to give me problems.  THe code I'm trying is also below.  Does anyone have any suggestions on how to accomplish this?



Working Tables
  locations - table contains store information, startup date, address, etc
  daily_salessummary - table holds daily sales summary by store (summary should be updated nightly).  eod_ts is End of Day Timestamp.

Query
WITH datelist AS(
  SELECT t1.GenDate as gendate, extract(dow from GenDate) AS dayofweek
  FROM (SELECT date as GenDate
        FROM generate_series('1985-01-01'::date,CURRENT_TIMESTAMP::date,'1 day'::interval) date
       ) AS t1 
)
SELECT gendate FROM datelist AS t1
WHERE gendate NOT IN (SELECT t1.eod_ts FROM daily_salessummary AS t1
                      JOIN locations AS t2 ON t1.location_id = t2.row_id
                      WHERE t2.locationCode = 'US_FL_TAMPA_141')
                     
  AND gendate > (SELECT start_date FROM locations WHERE locationCode = 'US_FL_TAMPA_141')


Desired Output - could output to an exceptions table
StoreCode 'US_FL_TAMA_141' missing daily summary for 1998-01-01
StoreCode 'MX_OAXACA_SALINA_8344' missing daily summary for 2011-06-05


ProcedureSQL (contains unknown errors)
DECLARE
  CURSOR location_table IS
    SELECT locationCode FROM locations;
  BEGIN
    FOR thisSymbol IN ticker_tables LOOP
      EXECUTE IMMEDIATE 'WITH datelist AS(
                                              SELECT t1.GenDate as gendate, extract(dow from GenDate) AS dayofweek
                                                FROM (SELECT date as GenDate
                                                      FROM generate_series('1985-01-01'::date,CURRENT_TIMESTAMP::date,'1 day'::interval) date
                                                     ) AS t1 
                                              )
                                              SELECT gendate FROM datelist AS t1
                                              WHERE gendate NOT IN (SELECT t1.eod_ts FROM daily_salessummary AS t1
                                                                    JOIN locations AS t2 ON t1.location_id = t2.row_id
                                                                    WHERE t2.locationCode = '' || location_table.locationCode  || '')                   
                                             AND gendate > (SELECT start_date FROM locations WHERE locationCode = '' || location_table.locationCode  || '')';
    END LOOP;
  END;



-- 
Jason Aleski / IT Specialist

Re: Find inconsistencies in data with date range

От
Jason Aleski
Дата:
In case anyone else needs similar code, I was able to get this working.  Below is the code that pulls the missing
datesusing a cursor and returns the information into a table. I'm sure there may be a way to make the code more
efficient,but considering this will only get ran maybe once a quarter (for quarterly reports), it works for me.  With
700+stores, it takes about 30 minutes to fully run from a reporting server.  I have a JAVA program that queries the
function"SELECT * FROM eod_missing_dates();"  Then sends all the missing dates to a RabbitMQ server to with a worker
programto try to rebuild the missing eod summaries and if not, it will send a message to the store managers.  Hopefully
thiscode will help someone else!<br /><br /><br /> CREATE OR REPLACE FUNCTION eod_missing_dates()<br />   RETURNS
TABLE(store_iduuid, location character varying, missing_ts timestamp with time zone) AS<br /> $BODY$<br /> DECLARE<br
/>  location_cursor CURSOR FOR SELECT * FROM locations ORDER BY store_id;<br />   store_rec location%ROWTYPE;<br />
BEGIN<br/><br />   CREATE TEMP TABLE dr_temptable(dr_ts, dr_dow) AS (SELECT t1.GenDate as gendate, extract(dow from
GenDate)AS dayofweek<br />   FROM (SELECT date as GenDate<br />         FROM
generate_series('1950-01-01'::date,CURRENT_TIMESTAMP::date,'1day'::interval) date<br />        ) AS t1  <br />   WHERE
extract(dowfrom GenDate) NOT IN (0,6));<br /><br />   OPEN location_cursor;<br />   LOOP<br />     FETCH
location_cursorINTO store_rec;<br />     EXIT WHEN store_rec IS NULL;<br /><br />     IF NOT FOUND THEN<br />      
EXIT;<br/>     END IF;<br />     <br />     RAISE INFO '%', 'Checking data/dates for ' || store_rec.location;<br />    
RETURNQUERY SELECT store_rec.row_id as store_id, store_rec.location AS location, dr_ts AS missing_ts FROM dr_temptable
ASt1<br />                  WHERE dr_ts NOT IN (SELECT t1.eod_ts FROM daily_salessummary AS t1<br />
                     JOIN location AS t2 ON t1.store_id = t2.row_id<br />                       WHERE t2.location =
store_rec.location)                     <br />                       AND dr_ts > (SELECT start_date FROM locations
WHERElocation=store_rec.location);    <br />   END LOOP;<br />   CLOSE location_cursor;  <br />   DROP TABLE
dr_temptable;<br/> END;<br /> $BODY$<br />   LANGUAGE plpgsql VOLATILE<br /> ;<br /><br /><br /><pre
class="moz-signature"cols="72">Jason Aleski / IT Specialist</pre><br /><blockquote
cite="mid:CAKyoTgbVfWULPdG1-BX3FSfm=hT3aFRs=avM+kg=0ozhP_RCwg@mail.gmail.com"type="cite"><div class="gmail_extra"><br
/><divclass="gmail_quote">On 6 March 2015 at 22:38, Jason Aleski <span dir="ltr"><<a
href="mailto:jason.aleski@gmail.com"moz-do-not-send="true" target="_blank">jason.aleski@gmail.com</a>></span>
wrote:<br/><blockquote class="gmail_quote" style="margin:0 0 0           .8ex;border-left:1px #ccc
solid;padding-left:1ex"><divbgcolor="#FFFFFF" text="#000000"> I know I can do this Java, but I'd rather have this
runningas a Stored Procedure.  What I am wanting to do is identify and potentially correct the summary data for date
inconsistencies. We have policies/red flag reports in place to keep this from happening, but we are now cleaning up
history. The query below works on a per store basis, but I'd like to be able to run this for all stores in the location
table.<br/><br /> I've looked at some procedure codes regarding looping, but everything I try to create seems to give
meproblems.  THe code I'm trying is also below.  Does anyone have any suggestions on how to accomplish this?<br /><br
/><br/><br /><u>Working Tables</u><br />   locations - table contains store information, startup date, address, etc<br
/>  daily_salessummary - table holds daily sales summary by store (summary should be updated nightly).  eod_ts is End
ofDay Timestamp.<br /><br /><u>Query</u><br /> WITH datelist AS(<br />   SELECT t1.GenDate as gendate, extract(dow from
GenDate)AS dayofweek<br />   FROM (SELECT date as GenDate<br />         FROM
generate_series('1985-01-01'::date,CURRENT_TIMESTAMP::date,'1day'::interval) date<br />        ) AS t1  <br /> )<br />
SELECTgendate FROM datelist AS t1<br /> WHERE gendate NOT IN (SELECT t1.eod_ts FROM daily_salessummary AS t1<br />
                     JOIN locations AS t2 ON t1.location_id = t2.row_id<br />                       WHERE
t2.locationCode= 'US_FL_TAMPA_141')<br />                       <br />   AND gendate > (SELECT start_date FROM
locationsWHERE locationCode = 'US_FL_TAMPA_141')<br /><br /><br /><u>Desired Output</u> - could output to an exceptions
table<br/> StoreCode 'US_FL_TAMA_141' missing daily summary for 1998-01-01<br /> StoreCode 'MX_OAXACA_SALINA_8344'
missingdaily summary for 2011-06-05<br /><br /><br /><u>ProcedureSQL</u> (contains unknown errors)<br /> DECLARE<br />
 CURSOR location_table IS<br />     SELECT locationCode FROM locations;<br />   BEGIN<br />     FOR thisSymbol IN
ticker_tablesLOOP<br />       EXECUTE IMMEDIATE 'WITH datelist AS(<br />                                              
SELECTt1.GenDate as gendate, extract(dow from GenDate) AS dayofweek<br />                                              
 FROM (SELECT date as GenDate<br />                                                       FROM
generate_series('1985-01-01'::date,CURRENT_TIMESTAMP::date,'1day'::interval) date<br />
                                                    ) AS t1  <br />                                               )<br
/>                                              SELECT gendate FROM datelist AS t1<br />
                                             WHERE gendate NOT IN (SELECT t1.eod_ts FROM daily_salessummary AS t1<br />
                                                                   JOIN locations AS t2 ON t1.location_id =
t2.row_id<br/>                                                                     WHERE t2.locationCode = '' ||
location_table.locationCode || '')                    <br />                                              AND gendate
>(SELECT start_date FROM locations WHERE locationCode = '' || location_table.locationCode  || '')';<br />     END
LOOP;<br/>   END;<span class="HOEnZb"><font color="#888888"><br /><br /><br /><br /><pre cols="72">-- 
 
Jason Aleski / IT Specialist</pre> </font></span></div></blockquote></div><br /></div></blockquote><br />