Обсуждение: Help ... Unexpected results when using limit/offset with select statement..DB corruption?
Help ... Unexpected results when using limit/offset with select statement..DB corruption?
От
"Barbara Cosentino"
Дата:
I got some weird results when processing select statements with limit and offset. I think its some kind of database corruption but I was wondering what other’s think.
Background: The table I’m having the issue with is described below. The thing to note is the primary key ice=# \d nc_host_datum Table "public.nc_host_datum" Column | Type | Modifiers----------------------+---------+----------- host_id | bigint | not null host_datum_type_id | integer | not null host_datum_source_id | integer | not null data | text | not nullIndexes: "nc_host_datum_pkey" PRIMARY KEY, btree (host_id, host_datum_type_id)Foreign-key constraints: "foreign_key_01" FOREIGN KEY (host_id) REFERENCES nc_host(host_id) ON UPDATE CASCADE ON DELETE CASCADE "foreign_key_02" FOREIGN KEY (host_datum_type_id) REFERENCES nc_host_datum_type(host_datum_type_id) ON UPDATE RESTRICT ON DELETE RESTRICT "foreign_key_03" FOREIGN KEY (host_datum_source_id) REFERENCES nc_host_datum_source(host_datum_source_id) ON UPDATE RESTRICT ON DELETE RESTRICT
Problem: I perform the following select (notice that the group by is by the primary key). select host_id, host_datum_type_id, count(*) from nc_host_datum where host_id in ( select host_id from nc_host where audit_id=2041) group by host_id, host_datum_type_id; and get the following result (There are many more rows but these are all the rows for host_id = 963711): host_id | host_datum_type_id | count -------------+------------------------------+--------- 963711 | 58 | 1 963711 | 54 | 1 963711 | 39 | 1 963711 | 28 | 1 963711 | 27 | 1 Notice that there are 5 rows for host_id 963711 and the host_datum_type_id's are all unique Then I perform the following selects SELECT host_id, host_datum_type_id, host_datum_source_id, data FROM nc_host_datum INNER JOIN nc_host USING (host_id) WHERE audit_id=2041 ORDER BY host_id LIMIT 49 OFFSET 1372; And SELECT host_id, host_datum_type_id, host_datum_source_id, data FROM nc_host_datum INNER JOIN nc_host USING (host_id) WHERE audit_id=2041 ORDER BY host_id LIMIT 49 OFFSET 1421; A portion of the output follows. host_id | host_datum_type_id | host_datum_source_id | data ---------+--------------------+----------------------+-------------- : : 963710 | 58 | 17| harrish 963711 | 27 | 3 | 1 963711 | 28 | 3 | 1(49 rows) host_id | host_datum_type_id | host_datum_source_id | data ---------+--------------------+----------------------+-------------- 963711 | 28 | 3 | 1 963711 | 58 | 17 | lmitchel 963711 | 39 | 3 | us.aegon.com :: (49 rows) Notice that host_id = 963711 and host_datum_type_id = 28 is repeated twice. Since the offset is not overlapping, how can this happen? Any ideas on how to fix this? Thanks, Barbara
On Thu, 18 Jan 2007, Barbara Cosentino wrote: > Then I perform the following selects > > SELECT host_id, host_datum_type_id, host_datum_source_id, data > FROM nc_host_datum INNER JOIN nc_host USING (host_id) > WHERE audit_id=2041 > ORDER BY host_id > LIMIT 49 OFFSET 1372; > > And > > SELECT host_id, host_datum_type_id, host_datum_source_id, data > FROM nc_host_datum INNER JOIN nc_host USING (host_id) > WHERE audit_id=2041 > ORDER BY host_id > LIMIT 49 OFFSET 1421; > > A portion of the output follows. > > host_id | host_datum_type_id | host_datum_source_id | data > ---------+--------------------+----------------------+-------------- > : > : > > 963710 | 58 | 17| harrish > 963711 | 27 | 3 | 1 > 963711 | 28 | 3 | 1 > (49 rows) > > > host_id | host_datum_type_id | host_datum_source_id | data > ---------+--------------------+----------------------+-------------- > 963711 | 28 | 3 | 1 > 963711 | 58 | 17 | lmitchel > 963711 | 39 | 3 | us.aegon.com > : > : > (49 rows) > > Notice that host_id = 963711 and host_datum_type_id = 28 is repeated > twice. Since the offset is not overlapping, how can this happen? I'd suggest adding host_datum_type_id to the order by so that you have a guarantee of the order that the rows for a given host_id will come, otherwise I don't think you can assume anything within one host_id which means you could get the same row at different effective offsets in different runs of the base query (especially if you hit a point where the plan changes).