Обсуждение: Help ... Unexpected results when using limit/offset with select statement..DB corruption?

Поиск
Список
Период
Сортировка

Help ... Unexpected results when using limit/offset with select statement..DB corruption?

От
"Barbara Cosentino"
Дата:

I got some weird results when processing select statements with limit and offset.   I think its some kind of database corruption but I was wondering what other’s think.

 
Background:
 
The table I’m having the issue with is described below.  The thing to note is the primary key 
 
ice=# \d nc_host_datum
        Table "public.nc_host_datum"
        Column        |  Type   | Modifiers
----------------------+---------+-----------
 host_id              | bigint  | not null
 host_datum_type_id   | integer | not null
 host_datum_source_id | integer | not null
 data                 | text    | not null
Indexes:
    "nc_host_datum_pkey" PRIMARY KEY, btree (host_id, host_datum_type_id)
Foreign-key constraints:
    "foreign_key_01" FOREIGN KEY (host_id) REFERENCES nc_host(host_id) ON UPDATE CASCADE ON DELETE CASCADE
    "foreign_key_02" FOREIGN KEY (host_datum_type_id) REFERENCES nc_host_datum_type(host_datum_type_id) ON UPDATE RESTRICT ON DELETE RESTRICT
    "foreign_key_03" FOREIGN KEY (host_datum_source_id) REFERENCES nc_host_datum_source(host_datum_source_id) ON UPDATE RESTRICT ON DELETE RESTRICT

Problem:
  
I perform the following select (notice that the group by is by the primary key). 
 
select host_id, host_datum_type_id, count(*)   
from nc_host_datum where host_id in
  ( select host_id 
  from nc_host 
  where audit_id=2041) 
group by host_id, host_datum_type_id;
 
and get the following result (There are many more rows but these are all the rows for host_id =   963711):
 
host_id   | host_datum_type_id | count 
-------------+------------------------------+---------
  963711 |                 58           |     1
  963711 |                 54           |     1
  963711 |                 39           |     1
  963711 |                 28           |     1
  963711 |                 27           |     1
 
 
Notice that there are 5 rows for host_id 963711 and the host_datum_type_id's are all unique 
 
Then I perform the following selects
 
SELECT host_id, host_datum_type_id, host_datum_source_id, data  
FROM nc_host_datum  INNER JOIN nc_host USING (host_id)  
WHERE audit_id=2041  
ORDER BY host_id  
LIMIT 49 OFFSET 1372;
 
And  
 
SELECT host_id, host_datum_type_id, host_datum_source_id, data  
FROM nc_host_datum  INNER JOIN nc_host USING (host_id)  
WHERE audit_id=2041  
ORDER BY host_id  
LIMIT 49 OFFSET 1421;
 
A portion of the output follows.  
 
 host_id | host_datum_type_id | host_datum_source_id |     data     
---------+--------------------+----------------------+--------------
  :
  :
 
  963710 |                 58 |                   17| harrish
  963711 |                 27 |                    3 | 1
  963711 |                 28 |                    3 | 1
(49 rows)
 
 
 host_id | host_datum_type_id | host_datum_source_id |     data     
---------+--------------------+----------------------+--------------
  963711 |                 28 |                    3 | 1
  963711 |                 58 |                   17 | lmitchel
  963711 |                 39 |                    3 | us.aegon.com 
:
: 
(49 rows)
 
Notice that host_id = 963711 and host_datum_type_id = 28 is repeated twice.  Since the offset is not overlapping, how can this happen?  Any ideas on how to fix this?
 
Thanks,
 
Barbara
 

 

Re: Help ... Unexpected results when using limit/offset with

От
Stephan Szabo
Дата:
On Thu, 18 Jan 2007, Barbara Cosentino wrote:

> Then I perform the following selects
>
> SELECT host_id, host_datum_type_id, host_datum_source_id, data
> FROM nc_host_datum  INNER JOIN nc_host USING (host_id)
> WHERE audit_id=2041
> ORDER BY host_id
> LIMIT 49 OFFSET 1372;
>
> And
>
> SELECT host_id, host_datum_type_id, host_datum_source_id, data
> FROM nc_host_datum  INNER JOIN nc_host USING (host_id)
> WHERE audit_id=2041
> ORDER BY host_id
> LIMIT 49 OFFSET 1421;
>
> A portion of the output follows.
>
>  host_id | host_datum_type_id | host_datum_source_id |     data
> ---------+--------------------+----------------------+--------------
>   :
>   :
>
>   963710 |                 58 |                   17| harrish
>   963711 |                 27 |                    3 | 1
>   963711 |                 28 |                    3 | 1
> (49 rows)
>
>
>  host_id | host_datum_type_id | host_datum_source_id |     data
> ---------+--------------------+----------------------+--------------
>   963711 |                 28 |                    3 | 1
>   963711 |                 58 |                   17 | lmitchel
>   963711 |                 39 |                    3 | us.aegon.com
> :
> :
> (49 rows)
>

> Notice that host_id = 963711 and host_datum_type_id = 28 is repeated
> twice.  Since the offset is not overlapping, how can this happen?

I'd suggest adding host_datum_type_id to the order by so that you have a
guarantee of the order that the rows for a given host_id will come,
otherwise I don't think you can assume anything within one host_id which
means you could get the same row at different effective offsets in
different runs of the base query (especially if you hit a point where the
plan changes).