Обсуждение: Obvious data mismatch in View2 which basically SELECT * from View1

Поиск
Список
Период
Сортировка

Obvious data mismatch in View2 which basically SELECT * from View1

От
Ben
Дата:
Dear list,

Recently I am getting feedback, data in my analytic report is not 
repeatable. From time to time they get different data for the same time 
span.
(but IIRC previously it was OK). Therefore I started debuging the View 
chain for that report, during which I bumped into this issue/phenomenon.

In a over -simplified version:

CREATE VIEW2 AS SELECT * FROM VIEW1;
SELECT  col1 FROM VIEW2 WHERE cond1=True;
SELECT  col1 FROM VIEW1 WHERE cond1=True;

Now col1 from both views looks different. I don't know where to start to 
solve this problem.

The actual situation is a bit more than that, the following is the 
actual query:


     -- trying to audit utlog weighed stat
     with t as (
     select '2020-07-01 00:00:00'::timestamp t0, '2020--07-02 
0:0:0'::timestamp t1
     )
     --select * from t;
     select *
     -- from utlog.cache_stats_per_shift_per_reason_weighed_stats
     -- from utlog.stats_per_shift_filtered_per_reason
     from utlog.stats_per_shift_filtered                     (let's call 
it #View2 for short)
     -- from utlog.stats_per_shift_filtered_b0206      (let's call it 
#View1 for short)
     -- from utlog.stats_per_shift
     cross join t
     where wline = 'F02'  and wts >= t.t0 and wts < t.t1 and wsft ='D'
     limit 100
     ;

The Result for #View2

     wts                 | wsft | wspan  | wstate | wline | rcodes
     --------------------+------+--------+--------+-------+-------
     2020-07-01 08:00:00 | D    |      0 | S00    | F02   | {PDCB}
     2020-07-01 09:50:01 | D    | 12.533 | S00    | F02   | {PDCB}
     2020-07-01 11:35:46 | D    | 12.217 | S00    | F02   | {CDSO}
     2020-07-01 13:22:58 | D    |   5.15 | S00    | F02   | {PDCB}
     2020-07-01 14:57:38 | D    |    6.8 | S00    | F02   | {PDCB}

     INDEX | COLUMN_NAME | DATA_TYPE
     ------+-------------+------------
     1     | wts         | timestamptz
     3     | wsft        | varchar
     4     | wspan       | float8
     5     | wstate      | varchar
     6     | wline       | varchar
     7     | rcodes      | text[]


Same query, the Result for #View1

     wts                 | wsft | wspan | wstate | wline | rcodes
     --------------------+------+-------+--------+-------+-------
     2020-07-01 08:00:00 | D    |     5 | S00    | F02   | {PDCB}
     2020-07-01 09:50:01 | D    |    13 | S00    | F02   | {PDCB}
     2020-07-01 11:35:46 | D    |    12 | S00    | F02   | {CDSO}
     2020-07-01 13:22:58 | D    |     5 | S00    | F02   | {PDCB}
     2020-07-01 14:57:38 | D    |     7 | S00    | F02   | {PDCB}

     INDEX | COLUMN_NAME | DATA_TYPE
     ------+-------------+------------
     1     | wts         | timestamptz
     3     | wsft        | varchar
     4     | wspan       | float8
     5     | wstate      | varchar
     6     | wline       | varchar
     7     | rcodes      | varchar[]

Reuslts in `wspan` column is inaccurate while both type are float8. Most 
weird thing is the 5 to 0 change. for Row 1.

The `_b0206`(#View1) is just a version of 
`stats_per_shift_filtered`(#View2) from past revisions.
I am sure the original CREATE statement for (#View2) is `CREATE VIEW ... 
AS SELECT * FROM ...._b0206`

Definition of View2 in SQLWorkbench/J generated schema:


     CREATE OR REPLACE VIEW utlog.stats_per_shift_filtered (#View2)
     (
     wts,
     wdate,
     wsft,
     wspan,
     wstate,
     wline,
     rcodes
     )
     AS
     SELECT stats_per_shift_filtered_u0206.wts,
         stats_per_shift_filtered_u0206.wsft::character varying AS wsft,
         stats_per_shift_filtered_u0206.wspan,
         stats_per_shift_filtered_u0206.wstate,
         stats_per_shift_filtered_u0206.wline,
         stats_per_shift_filtered_u0206.rcodes
     FROM utlog.stats_per_shift_filtered_u0206;  (as #View1 in this post)


It feels like the utlog.stats_per_shift_filtered_u0206 in 
utlog.stats_per_shift_filtered definition is a different object from 
utlog.stats_per_shift_filtered_u0206?

I am totally out of clues. Any help would be appreciated. Thanks.


Regards,

Ben





Re: Obvious data mismatch in View2 which basically SELECT * from View1

От
Magnus Hagander
Дата:


On Wed, Sep 16, 2020 at 9:26 AM Ben <bentenzha@outlook.com> wrote:
Dear list,

Recently I am getting feedback, data in my analytic report is not
repeatable. From time to time they get different data for the same time
span.
(but IIRC previously it was OK). Therefore I started debuging the View
chain for that report, during which I bumped into this issue/phenomenon.

In a over -simplified version:

CREATE VIEW2 AS SELECT * FROM VIEW1;
SELECT  col1 FROM VIEW2 WHERE cond1=True;
SELECT  col1 FROM VIEW1 WHERE cond1=True;

Now col1 from both views looks different. I don't know where to start to
solve this problem.

The actual situation is a bit more than that, the following is the
actual query:


     -- trying to audit utlog weighed stat
     with t as (
     select '2020-07-01 00:00:00'::timestamp t0, '2020--07-02
0:0:0'::timestamp t1
     )
     --select * from t;
     select *
     -- from utlog.cache_stats_per_shift_per_reason_weighed_stats
     -- from utlog.stats_per_shift_filtered_per_reason
     from utlog.stats_per_shift_filtered                     (let's call
it #View2 for short)
     -- from utlog.stats_per_shift_filtered_b0206      (let's call it
#View1 for short)
     -- from utlog.stats_per_shift
     cross join t
     where wline = 'F02'  and wts >= t.t0 and wts < t.t1 and wsft ='D'
     limit 100
     ;


Not sure if it might be something lost in your simplification here, but you have a LIMIT with no ORDER BY there. That basically means "give me 100 random rows" (but not with a very good random level). It does not return rows in a consistent/predictable order. So as long as that query is part of what you're doing, you should not be surprised if you get the rows in an inconsistent/unpredictable order, with whatever follow-on effects that might have. (And it can lead to weird follow-on effects like the ones you're talking about when used in larger query structures)

--

Re: Obvious data mismatch in View2 which basically SELECT * from View1

От
Ron
Дата:
On 9/15/20 10:40 PM, Ben wrote:
> Dear list,
>
> Recently I am getting feedback, data in my analytic report is not 
> repeatable. From time to time they get different data for the same time span.
> (but IIRC previously it was OK). Therefore I started debuging the View 
> chain for that report, during which I bumped into this issue/phenomenon.
>
> In a over -simplified version:
>
> CREATE VIEW2 AS SELECT * FROM VIEW1;
> SELECT  col1 FROM VIEW2 WHERE cond1=True;
> SELECT  col1 FROM VIEW1 WHERE cond1=True;
>
> Now col1 from both views looks different. I don't know where to start to 
> solve this problem.
>
> The actual situation is a bit more than that, the following is the actual 
> query:
>
>
>     -- trying to audit utlog weighed stat
>     with t as (
>     select '2020-07-01 00:00:00'::timestamp t0, '2020--07-02 
> 0:0:0'::timestamp t1
>     )
>     --select * from t;
>     select *
>     -- from utlog.cache_stats_per_shift_per_reason_weighed_stats
>     -- from utlog.stats_per_shift_filtered_per_reason
>     from utlog.stats_per_shift_filtered                     (let's call it 
> #View2 for short)
>     -- from utlog.stats_per_shift_filtered_b0206      (let's call it 
> #View1 for short)
>     -- from utlog.stats_per_shift
>     cross join t
>     where wline = 'F02'  and wts >= t.t0 and wts < t.t1 and wsft ='D'
>     limit 100
>     ;
>
> The Result for #View2
>
>     wts                 | wsft | wspan  | wstate | wline | rcodes
>     --------------------+------+--------+--------+-------+-------
>     2020-07-01 08:00:00 | D    |      0 | S00    | F02   | {PDCB}
>     2020-07-01 09:50:01 | D    | 12.533 | S00    | F02   | {PDCB}
>     2020-07-01 11:35:46 | D    | 12.217 | S00    | F02   | {CDSO}
>     2020-07-01 13:22:58 | D    |   5.15 | S00    | F02   | {PDCB}
>     2020-07-01 14:57:38 | D    |    6.8 | S00    | F02   | {PDCB}
>
>     INDEX | COLUMN_NAME | DATA_TYPE
>     ------+-------------+------------
>     1     | wts         | timestamptz
>     3     | wsft        | varchar
>     4     | wspan       | float8
>     5     | wstate      | varchar
>     6     | wline       | varchar
>     7     | rcodes      | text[]
>
>
> Same query, the Result for #View1
>
>     wts                 | wsft | wspan | wstate | wline | rcodes
>     --------------------+------+-------+--------+-------+-------
>     2020-07-01 08:00:00 | D    |     5 | S00    | F02   | {PDCB}
>     2020-07-01 09:50:01 | D    |    13 | S00    | F02   | {PDCB}
>     2020-07-01 11:35:46 | D    |    12 | S00    | F02   | {CDSO}
>     2020-07-01 13:22:58 | D    |     5 | S00    | F02   | {PDCB}
>     2020-07-01 14:57:38 | D    |     7 | S00    | F02   | {PDCB}
>
>     INDEX | COLUMN_NAME | DATA_TYPE
>     ------+-------------+------------
>     1     | wts         | timestamptz
>     3     | wsft        | varchar
>     4     | wspan       | float8
>     5     | wstate      | varchar
>     6     | wline       | varchar
>     7     | rcodes      | varchar[]
>
> Reuslts in `wspan` column is inaccurate while both type are float8. Most 
> weird thing is the 5 to 0 change. for Row 1.
>
> The `_b0206`(#View1) is just a version of 
> `stats_per_shift_filtered`(#View2) from past revisions.
> I am sure the original CREATE statement for (#View2) is `CREATE VIEW ... 
> AS SELECT * FROM ...._b0206`
>
> Definition of View2 in SQLWorkbench/J generated schema:
>
>
>     CREATE OR REPLACE VIEW utlog.stats_per_shift_filtered (#View2)
>     (
>     wts,
>     wdate,
>     wsft,
>     wspan,
>     wstate,
>     wline,
>     rcodes
>     )
>     AS
>     SELECT stats_per_shift_filtered_u0206.wts,
>         stats_per_shift_filtered_u0206.wsft::character varying AS wsft,
>         stats_per_shift_filtered_u0206.wspan,
>         stats_per_shift_filtered_u0206.wstate,
>         stats_per_shift_filtered_u0206.wline,
>         stats_per_shift_filtered_u0206.rcodes
>     FROM utlog.stats_per_shift_filtered_u0206;  (as #View1 in this post)
>
>
> It feels like the utlog.stats_per_shift_filtered_u0206 in 
> utlog.stats_per_shift_filtered definition is a different object from 
> utlog.stats_per_shift_filtered_u0206?
>
> I am totally out of clues. Any help would be appreciated. Thanks.

Try the queries in a serializable read only transaction.  That should any 
possible changes in the underlying data.

START TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY;
SELECT  col1 FROM VIEW2 WHERE cond1=True;
SELECT  col1 FROM VIEW1 WHERE cond1=True;
COMMIT;

-- 
Angular momentum makes the world go 'round.



Re: Obvious data mismatch in View2 which basically SELECT * from View1

От
Ben
Дата:

Hi Magnus,

Thanks for the heads up.

For the data posted in this email, the listed result is the full result set for that query.
I have checked the data in view2 and view1 with various conditions.
Both view has many difference in column wspan::float8.
The condition in the shown query is intended to show just a small set of them.
But you are right, I should be more cautious. Thanks for the headsup.

Regards,
Ben


On 9/16/20 3:35 PM, Magnus Hagander wrote:


On Wed, Sep 16, 2020 at 9:26 AM Ben <bentenzha@outlook.com> wrote:
Dear list,

Recently I am getting feedback, data in my analytic report is not
repeatable. From time to time they get different data for the same time
span.
(but IIRC previously it was OK). Therefore I started debuging the View
chain for that report, during which I bumped into this issue/phenomenon.

In a over -simplified version:

CREATE VIEW2 AS SELECT * FROM VIEW1;
SELECT  col1 FROM VIEW2 WHERE cond1=True;
SELECT  col1 FROM VIEW1 WHERE cond1=True;

Now col1 from both views looks different. I don't know where to start to
solve this problem.

The actual situation is a bit more than that, the following is the
actual query:


     -- trying to audit utlog weighed stat
     with t as (
     select '2020-07-01 00:00:00'::timestamp t0, '2020--07-02
0:0:0'::timestamp t1
     )
     --select * from t;
     select *
     -- from utlog.cache_stats_per_shift_per_reason_weighed_stats
     -- from utlog.stats_per_shift_filtered_per_reason
     from utlog.stats_per_shift_filtered                     (let's call
it #View2 for short)
     -- from utlog.stats_per_shift_filtered_b0206      (let's call it
#View1 for short)
     -- from utlog.stats_per_shift
     cross join t
     where wline = 'F02'  and wts >= t.t0 and wts < t.t1 and wsft ='D'
     limit 100
     ;


Not sure if it might be something lost in your simplification here, but you have a LIMIT with no ORDER BY there. That basically means "give me 100 random rows" (but not with a very good random level). It does not return rows in a consistent/predictable order. So as long as that query is part of what you're doing, you should not be surprised if you get the rows in an inconsistent/unpredictable order, with whatever follow-on effects that might have. (And it can lead to weird follow-on effects like the ones you're talking about when used in larger query structures)

--

Re: Obvious data mismatch in View2 which basically SELECT * from View1

От
Ben
Дата:
Dear List,

Some further investigation.

Creating a fresh View3 on View1 gives exactly the same result as View1.

The View1 View2 are both years old in a production database, in use for quite some time. (The database is production duty but not hosted in server room with UPS. It's like a edge PC in industry monitoring. Now am more concerned with its data integrity)

The problem with the final report is reported recently. I am not sure what's broken in the database.

I haven't replaced the broken View2 yet. Hope someone can point me to some further investigation.

My concern is that if there are other views inside that database having similar integrity issue, how can I find them all (if any).

It's beyond my regular SQL ability. I guess I really need help from people with maintenance experience.

Any help will be appreciated, thanks in advance.

Ben



On September 16, 2020 3:40:34 AM UTC, Ben <bentenzha@outlook.com> wrote:
Dear list,

Recently I am getting feedback, data in my analytic report is not
repeatable. From time to time they get different data for the same time
span.
(but IIRC previously it was OK). Therefore I started debuging the View
chain for that report, during which I bumped into this issue/phenomenon.

In a over -simplified version:

CREATE VIEW2 AS SELECT * FROM VIEW1;
SELECT  col1 FROM VIEW2 WHERE cond1=True;
SELECT  col1 FROM VIEW1 WHERE cond1=True;

Now col1 from both views looks different. I don't know where to start to
solve this problem.

The actual situation is a bit more than that, the following is the
actual query:


    -- trying to audit utlog weighed stat
    with t as (
    select '2020-07-01 00:00:00'::timestamp t0, '2020--07-02
0:0:0'::timestamp t1
    )
    --select * from t;
    select *
    -- from utlog.cache_stats_per_shift_per_reason_weighed_stats
    -- from utlog.stats_per_shift_filtered_per_reason
    from utlog.stats_per_shift_filtered                     (let's call
it #View2 for short)
    -- from utlog.stats_per_shift_filtered_b0206      (let's call it
#View1 for short)
    -- from utlog.stats_per_shift
    cross join t
    where wline = 'F02'  and wts >= t.t0 and wts < t.t1 and wsft ='D'
    limit 100
    ;

The Result for #View2

    wts                 | wsft | wspan  | wstate | wline | rcodes
    --------------------+------+--------+--------+-------+-------
    2020-07-01 08:00:00 | D    |      0 | S00    | F02   | {PDCB}
    2020-07-01 09:50:01 | D    | 12.533 | S00    | F02   | {PDCB}
    2020-07-01 11:35:46 | D    | 12.217 | S00    | F02   | {CDSO}
    2020-07-01 13:22:58 | D    |   5.15 | S00    | F02   | {PDCB}
    2020-07-01 14:57:38 | D    |    6.8 | S00    | F02   | {PDCB}

    INDEX | COLUMN_NAME | DATA_TYPE
    ------+-------------+------------
    1     | wts         | timestamptz
    3     | wsft        | varchar
    4     | wspan       | float8
    5     | wstate      | varchar
    6     | wline       | varchar
    7     | rcodes      | text[]


Same query, the Result for #View1

    wts                 | wsft | wspan | wstate | wline | rcodes
    --------------------+------+-------+--------+-------+-------
    2020-07-01 08:00:00 | D    |     5 | S00    | F02   | {PDCB}
    2020-07-01 09:50:01 | D    |    13 | S00    | F02   | {PDCB}
    2020-07-01 11:35:46 | D    |    12 | S00    | F02   | {CDSO}
    2020-07-01 13:22:58 | D    |     5 | S00    | F02   | {PDCB}
    2020-07-01 14:57:38 | D    |     7 | S00    | F02   | {PDCB}

    INDEX | COLUMN_NAME | DATA_TYPE
    ------+-------------+------------
    1     | wts         | timestamptz
    3     | wsft        | varchar
    4     | wspan       | float8
    5     | wstate      | varchar
    6     | wline       | varchar
    7     | rcodes      | varchar[]

Reuslts in `wspan` column is inaccurate while both type are float8. Most
weird thing is the 5 to 0 change. for Row 1.

The `_b0206`(#View1) is just a version of
`stats_per_shift_filtered`(#View2) from past revisions.
I am sure the original CREATE statement for (#View2) is `CREATE VIEW ...
AS SELECT * FROM ...._b0206`

Definition of View2 in SQLWorkbench/J generated schema:


    CREATE OR REPLACE VIEW utlog.stats_per_shift_filtered (#View2)
    (
    wts,
    wdate,
    wsft,
    wspan,
    wstate,
    wline,
    rcodes
    )
    AS
    SELECT stats_per_shift_filtered_u0206.wts,
        stats_per_shift_filtered_u0206.wsft::character varying AS wsft,
        stats_per_shift_filtered_u0206.wspan,
        stats_per_shift_filtered_u0206.wstate,
        stats_per_shift_filtered_u0206.wline,
        stats_per_shift_filtered_u0206.rcodes
    FROM utlog.stats_per_shift_filtered_u0206;  (as #View1 in this post)


It feels like the utlog.stats_per_shift_filtered_u0206 in
utlog.stats_per_shift_filtered definition is a different object from
utlog.stats_per_shift_filtered_u0206?

I am totally out of clues. Any help would be appreciated. Thanks.


Regards,

Ben



--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: Obvious data mismatch in View2 which basically SELECT * from View1

От
Ravi Krishna
Дата:


>I haven't replaced the broken View2 yet. >Hope someone can point me to some >further investigation.

Did you look at the actual definition of view2. Like all RDBMS PG materializes the Ddl as it existed at the time of creation and converts it into an in line sql.
There is a possibility it may be different than what you think.

Re: Obvious data mismatch in View2 which basically SELECT * from View1

От
Jerry Sievers
Дата:
Ben <bentenzha@outlook.com> writes:

> Dear List,
>
> Some further investigation.
>
> Creating a fresh View3 on View1 gives exactly the same result as
> View1.
>
> The View1 View2 are both years old in a production database, in use
> for quite some time. (The database is production duty but not hosted
> in server room with UPS. It's like a edge PC in industry monitoring.
> Now am more concerned with its data integrity)
>
> The problem with the final report is reported recently. I am not sure
> what's broken in the database.
>
> I haven't replaced the broken View2 yet. Hope someone can point me to
> some further investigation.

Already mentioned downthread, but have a look at the view definitions
by...

select pg_get_viewdef('$your-view');

Or...

pg_dump --table $your-view

Expect to see something different if you repeat the above for the old
view giving undesired results and the new correct version.

HTH



> My concern is that if there are other views inside that database
> having similar integrity issue, how can I find them all (if any).
>
> It's beyond my regular SQL ability. I guess I really need help from
> people with maintenance experience.
>
> Any help will be appreciated, thanks in advance.
>
> Ben
>
>
>
> On September 16, 2020 3:40:34 AM UTC, Ben <bentenzha@outlook.com>
> wrote:
>
>     Dear list,
>     
>     Recently I am getting feedback, data in my analytic report is not 
>     repeatable. From time to time they get different data for the same time 
>     span.
>     (but IIRC previously it was OK). Therefore I started debuging the View 
>     chain for that report, during which I bumped into this issue/phenomenon.
>     
>     In a over -simplified version:
>     
>     CREATE VIEW2 AS SELECT * FROM VIEW1;
>     SELECT  col1 FROM VIEW2 WHERE cond1=True;
>     SELECT  col1 FROM VIEW1 WHERE cond1=True;
>     
>     Now col1 from both views looks different. I don't know where to start to 
>     solve this problem.
>     
>     The actual situation is a bit more than that, the following is the 
>     actual query:
>     
>     
>          -- trying to audit utlog weighed stat
>          with t as (
>          select '2020-07-01 00:00:00'::timestamp t0, '2020--07-02 
>     0:0:0'::timestamp t1
>          )
>          --select * from t;
>          select *
>          -- from utlog.cache_stats_per_shift_per_reason_weighed_stats
>          -- from utlog.stats_per_shift_filtered_per_reason
>          from utlog.stats_per_shift_filtered                     (let's call 
>     it #View2 for short)
>          -- from utlog.stats_per_shift_filtered_b0206      (let's call it 
>     #View1 for short)
>          -- from utlog.stats_per_shift
>          cross join t
>          where wline = 'F02'  and wts >= t.t0 and wts < t.t1 and wsft ='D'
>          limit 100
>          ;
>     
>     The Result for #View2
>     
>          wts                 | wsft | wspan  | wstate | wline | rcodes
>          --------------------+------+--------+--------+-------+-------
>          2020-07-01 08:00:00 | D    |      0 | S00    | F02   | {PDCB}
>          2020-07-01 09:50:01 | D    | 12.533 | S00    | F02   | {PDCB}
>          2020-07-01 11:35:46 | D    | 12.217 | S00    | F02   | {CDSO}
>          2020-07-01 13:22:58 | D    |   5.15 | S00    | F02   | {PDCB}
>          2020-07-01 14:57:38 | D    |    6.8 | S00    | F02   | {PDCB}
>     
>          INDEX | COLUMN_NAME | DATA_TYPE
>          ------+-------------+------------
>          1     | wts         | timestamptz
>          3     | wsft        | varchar
>          4     | wspan       | float8
>          5     | wstate      | varchar
>          6     | wline       | varchar
>          7     | rcodes      | text[]
>     
>     
>     Same query, the Result for #View1
>     
>          wts                 | wsft | wspan | wstate | wline | rcodes
>          --------------------+------+-------+--------+-------+-------
>          2020-07-01 08:00:00 | D    |     5 | S00    | F02   | {PDCB}
>          2020-07-01 09:50:01 | D    |    13 | S00    | F02   | {PDCB}
>          2020-07-01 11:35:46 | D    |    12 | S00    | F02   | {CDSO}
>          2020-07-01 13:22:58 | D    |     5 | S00    | F02   | {PDCB}
>          2020-07-01 14:57:38 | D    |     7 | S00    | F02   | {PDCB}
>     
>          INDEX | COLUMN_NAME | DATA_TYPE
>          ------+-------------+------------
>          1     | wts         | timestamptz
>          3     | wsft        | varchar
>          4     | wspan       | float8
>          5     | wstate      | varchar
>          6     | wline       | varchar
>          7     | rcodes      | varchar[]
>     
>     Reuslts in `wspan` column is inaccurate while both type are float8. Most 
>     weird thing is the 5 to 0 change. for Row 1.
>     
>     The `_b0206`(#View1) is just a version of 
>     `stats_per_shift_filtered`(#View2) from past revisions.
>     I am sure the original CREATE statement for (#View2) is `CREATE VIEW ... 
>     AS SELECT * FROM ...._b0206`
>     
>     Definition of View2 in SQLWorkbench/J generated schema:
>     
>     
>          CREATE OR REPLACE VIEW utlog.stats_per_shift_filtered (#View2)
>          (
>          wts,
>          wdate,
>          wsft,
>          wspan,
>          wstate,
>          wline,
>          rcodes
>          )
>          AS
>          SELECT stats_per_shift_filtered_u0206.wts,
>              stats_per_shift_filtered_u0206.wsft::character varying AS wsft,
>              stats_per_shift_filtered_u0206.wspan,
>              stats_per_shift_filtered_u0206.wstate,
>              stats_per_shift_filtered_u0206.wline,
>              stats_per_shift_filtered_u0206.rcodes
>          FROM utlog.stats_per_shift_filtered_u0206;  (as #View1 in this post)
>     
>     
>     It feels like the utlog.stats_per_shift_filtered_u0206 in 
>     utlog.stats_per_shift_filtered definition is a different object from 
>     utlog.stats_per_shift_filtered_u0206?
>     
>     I am totally out of clues. Any help would be appreciated. Thanks.
>     
>     
>     Regards,
>     
>     Ben
>     
>     
>     
>
> --
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
>
>

-- 
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net



Re: Obvious data mismatch in View2 which basically SELECT * from View1

От
Ben
Дата:
Hi Sievers,  Krishna


You are right none of them is what I originally used to create them 
which should be
`select * from utlog.stats_per_shift_filtered_b0206`, but they do look 
different,
maybe the prev version is created before a pg update ? b0206 could be 
20190206 or 20180206...

the output:


lets_db=#     select pg_get_viewdef('utlog.stats_per_shift_filtered');
                             pg_get_viewdef
-----------------------------------------------------------------------
   SELECT stats_per_shift_filtered_u0206.wts,                          +
stats_per_shift_filtered_u0206.wdate,                            +
      (stats_per_shift_filtered_u0206.wsft)::character varying AS wsft,+
stats_per_shift_filtered_u0206.wspan,                            +
stats_per_shift_filtered_u0206.wstate,                           +
stats_per_shift_filtered_u0206.wline,                            +
stats_per_shift_filtered_u0206.rcodes                            +
     FROM utlog.stats_per_shift_filtered_u0206;
(1 row)

(result for the same query, wspan is the column in question:

  wts           |   wdate    | wsft |      wspan       | wstate | wlin
------------------------+------------+------+------------------+--------+-----
2020-07-01 14:57:38+08 | 2020-07-01 | D    |              6.8 | S00    | F02
  2020-07-01 13:22:58+08 | 2020-07-01 | D    |             5.15 | S00    
| F02
2020-07-01 11:35:46+08 | 2020-07-01 | D    | 12.2166666666667 | S00    | F02
2020-07-01 09:50:01+08 | 2020-07-01 | D    | 12.5333333333333 | S00    | F02
2020-07-01 08:00:00+08 | 2020-07-01 | D    |                0 | S00    | F02
(5 rows) )

lets_db=#     select pg_get_viewdef('utlog.view_test1');
                 pg_get_viewdef
-----------------------------------------------
   SELECT stats_per_shift_filtered_b0206.wts,  +
      stats_per_shift_filtered_b0206.wdate,    +
      stats_per_shift_filtered_b0206.wsft,     +
      stats_per_shift_filtered_b0206.wspan,    +
      stats_per_shift_filtered_b0206.wstate,   +
      stats_per_shift_filtered_b0206.wline,    +
      stats_per_shift_filtered_b0206.rcodes    +
     FROM utlog.stats_per_shift_filtered_b0206;
(1 row)

(the result for the same query:

wts |   wdate    | wsft | wspan | wstate | wline | rcodes
------------------------+------------+------+-------+--------+-------+----------------+---------------------
  2020-07-01 09:50:01+08 | 2020-07-01 | D    |    13 | S00    | F02   | 
{PDCB}
2020-07-01 11:35:46+08 | 2020-07-01 | D    |    12 | S00    | F02   | {CDSO}
2020-07-01 14:57:38+08 | 2020-07-01 | D    |     7 | S00    | F02   | {PDCB}
2020-07-01 08:00:00+08 | 2020-07-01 | D    |     5 | S00    | F02   | {PDCB}
2020-07-01 13:22:58+08 | 2020-07-01 | D    |     5 | S00    | F02   | {PDCB}
(5 rows)
)

The result in returned column looks different but

definition of the column in question (wspan::float8) looks identical in 
both case.


Regards,

Ben


On 9/17/20 10:41 PM, Jerry Sievers wrote:
> Ben <bentenzha@outlook.com> writes:
>
>> Dear List,
>>
>> Some further investigation.
>>
>> Creating a fresh View3 on View1 gives exactly the same result as
>> View1.
>>
>> The View1 View2 are both years old in a production database, in use
>> for quite some time. (The database is production duty but not hosted
>> in server room with UPS. It's like a edge PC in industry monitoring.
>> Now am more concerned with its data integrity)
>>
>> The problem with the final report is reported recently. I am not sure
>> what's broken in the database.
>>
>> I haven't replaced the broken View2 yet. Hope someone can point me to
>> some further investigation.
> Already mentioned downthread, but have a look at the view definitions
> by...
>
> select pg_get_viewdef('$your-view');
>
> Or...
>
> pg_dump --table $your-view
>
> Expect to see something different if you repeat the above for the old
> view giving undesired results and the new correct version.
>
> HTH
>
>
>
>> My concern is that if there are other views inside that database
>> having similar integrity issue, how can I find them all (if any).
>>
>> It's beyond my regular SQL ability. I guess I really need help from
>> people with maintenance experience.
>>
>> Any help will be appreciated, thanks in advance.
>>
>> Ben
>>
>>
>>
>> On September 16, 2020 3:40:34 AM UTC, Ben <bentenzha@outlook.com>
>> wrote:
>>
>>      Dear list,
>>      
>>      Recently I am getting feedback, data in my analytic report is not
>>      repeatable. From time to time they get different data for the same time
>>      span.
>>      (but IIRC previously it was OK). Therefore I started debuging the View
>>      chain for that report, during which I bumped into this issue/phenomenon.
>>      
>>      In a over -simplified version:
>>      
>>      CREATE VIEW2 AS SELECT * FROM VIEW1;
>>      SELECT  col1 FROM VIEW2 WHERE cond1=True;
>>      SELECT  col1 FROM VIEW1 WHERE cond1=True;
>>      
>>      Now col1 from both views looks different. I don't know where to start to
>>      solve this problem.
>>      
>>      The actual situation is a bit more than that, the following is the
>>      actual query:
>>      
>>      
>>           -- trying to audit utlog weighed stat
>>           with t as (
>>           select '2020-07-01 00:00:00'::timestamp t0, '2020--07-02
>>      0:0:0'::timestamp t1
>>           )
>>           --select * from t;
>>           select *
>>           -- from utlog.cache_stats_per_shift_per_reason_weighed_stats
>>           -- from utlog.stats_per_shift_filtered_per_reason
>>           from utlog.stats_per_shift_filtered                     (let's call
>>      it #View2 for short)
>>           -- from utlog.stats_per_shift_filtered_b0206      (let's call it
>>      #View1 for short)
>>           -- from utlog.stats_per_shift
>>           cross join t
>>           where wline = 'F02'  and wts >= t.t0 and wts < t.t1 and wsft ='D'
>>           limit 100
>>           ;
>>      
>>      The Result for #View2
>>      
>>           wts                 | wsft | wspan  | wstate | wline | rcodes
>>           --------------------+------+--------+--------+-------+-------
>>           2020-07-01 08:00:00 | D    |      0 | S00    | F02   | {PDCB}
>>           2020-07-01 09:50:01 | D    | 12.533 | S00    | F02   | {PDCB}
>>           2020-07-01 11:35:46 | D    | 12.217 | S00    | F02   | {CDSO}
>>           2020-07-01 13:22:58 | D    |   5.15 | S00    | F02   | {PDCB}
>>           2020-07-01 14:57:38 | D    |    6.8 | S00    | F02   | {PDCB}
>>      
>>           INDEX | COLUMN_NAME | DATA_TYPE
>>           ------+-------------+------------
>>           1     | wts         | timestamptz
>>           3     | wsft        | varchar
>>           4     | wspan       | float8
>>           5     | wstate      | varchar
>>           6     | wline       | varchar
>>           7     | rcodes      | text[]
>>      
>>      
>>      Same query, the Result for #View1
>>      
>>           wts                 | wsft | wspan | wstate | wline | rcodes
>>           --------------------+------+-------+--------+-------+-------
>>           2020-07-01 08:00:00 | D    |     5 | S00    | F02   | {PDCB}
>>           2020-07-01 09:50:01 | D    |    13 | S00    | F02   | {PDCB}
>>           2020-07-01 11:35:46 | D    |    12 | S00    | F02   | {CDSO}
>>           2020-07-01 13:22:58 | D    |     5 | S00    | F02   | {PDCB}
>>           2020-07-01 14:57:38 | D    |     7 | S00    | F02   | {PDCB}
>>      
>>           INDEX | COLUMN_NAME | DATA_TYPE
>>           ------+-------------+------------
>>           1     | wts         | timestamptz
>>           3     | wsft        | varchar
>>           4     | wspan       | float8
>>           5     | wstate      | varchar
>>           6     | wline       | varchar
>>           7     | rcodes      | varchar[]
>>      
>>      Reuslts in `wspan` column is inaccurate while both type are float8. Most
>>      weird thing is the 5 to 0 change. for Row 1.
>>      
>>      The `_b0206`(#View1) is just a version of
>>      `stats_per_shift_filtered`(#View2) from past revisions.
>>      I am sure the original CREATE statement for (#View2) is `CREATE VIEW ...
>>      AS SELECT * FROM ...._b0206`
>>      
>>      Definition of View2 in SQLWorkbench/J generated schema:
>>      
>>      
>>           CREATE OR REPLACE VIEW utlog.stats_per_shift_filtered (#View2)
>>           (
>>           wts,
>>           wdate,
>>           wsft,
>>           wspan,
>>           wstate,
>>           wline,
>>           rcodes
>>           )
>>           AS
>>           SELECT stats_per_shift_filtered_u0206.wts,
>>               stats_per_shift_filtered_u0206.wsft::character varying AS wsft,
>>               stats_per_shift_filtered_u0206.wspan,
>>               stats_per_shift_filtered_u0206.wstate,
>>               stats_per_shift_filtered_u0206.wline,
>>               stats_per_shift_filtered_u0206.rcodes
>>           FROM utlog.stats_per_shift_filtered_u0206;  (as #View1 in this post)
>>      
>>      
>>      It feels like the utlog.stats_per_shift_filtered_u0206 in
>>      utlog.stats_per_shift_filtered definition is a different object from
>>      utlog.stats_per_shift_filtered_u0206?
>>      
>>      I am totally out of clues. Any help would be appreciated. Thanks.
>>      
>>      
>>      Regards,
>>      
>>      Ben
>>      
>>      
>>      
>>
>> --
>> Sent from my Android device with K-9 Mail. Please excuse my brevity.
>>
>>



Re: Obvious data mismatch in View2 which basically SELECT * from View1

От
Tom Lane
Дата:
Ben <bentenzha@outlook.com> <MWHPR06MB2400E1FA4AAD12440EA96B79B93E0@MWHPR06MB2400.namprd06.prod.outlook.com> writes:
> You are right none of them is what I originally used to create them
> which should be
> `select * from utlog.stats_per_shift_filtered_b0206`, but they do look
> different,

This discrepancy could explain a lot:

>       (stats_per_shift_filtered_u0206.wsft)::character varying AS wsft,

vs

>       stats_per_shift_filtered_b0206.wsft,

Your problem query has a WHERE clause testing wsft, so this difference
is subtly affecting what that WHERE clause means.  That could result in
a plan change, thereby changing the row output order, which as already
noted would be sufficient to explain the inconsistency.

If I had to bet I'd suspect that the extra cast is blocking use of an
index on wsft.

            regards, tom lane