Обсуждение: BUG #8629: Strange resultset when using CTE or a subselect
The following bug has been logged on the website: Bug reference: 8629 Logged by: Jonathan Camile Email address: jonathan.camile@gmail.com PostgreSQL version: 9.2.4 Operating system: Ubuntu 10.04.4 LTS Description: Hey folks! I have a bit of an issue with a query and I don't understand why. It might be not very elegant but here it is, when I use the following query the last result will always be the same whatsoever the values of LIMIT and OFFSET. ``` WITH filtred_table AS ( SELECT c.id FROM my_table t WHERE t.enabled = true AND (t.hdata->'field')::integer = ANY ('{16788}') ) SELECT my_table.id FROM my_table WHERE mycontract.id IN (SELECT filtred_table.id FROM filtred_table) ORDER BY my_table.hdata->'field' DESC LIMIT 5 OFFSET 0 ``` Returns ``` 392200574 367046368 375126026 407430954 385242163 ``` ``` WITH filtred_table AS ( SELECT c.id FROM my_table t WHERE t.enabled = true AND (t.hdata->'field')::integer = ANY ('{16788}') ) SELECT my_table.id FROM my_table WHERE mycontract.id IN (SELECT filtred_table.id FROM filtred_table) ORDER BY my_table.hdata->'field' DESC LIMIT 5 OFFSET 5 ``` Returns ``` 186939712 350754246 408507328 381550486 385242163 ``` If I don't `ORDER BY` the field used to filter the issue doesn't happen. Thanks.
On Mon, Nov 25, 2013 at 2:27 AM, <jonathan.camile@gmail.com> wrote: > > WITH filtred_table AS ( > SELECT c.id > FROM my_table t > WHERE t.enabled = true > AND (t.hdata->'field')::integer = ANY ('{16788}') > ) > SELECT my_table.id > FROM my_table > WHERE mycontract.id IN (SELECT filtred_table.id FROM filtred_table) > ORDER BY my_table.hdata->'field' DESC > LIMIT 5 > OFFSET 5 > Try with OFFSET 5 LIMIT 5. The offset should be applied *before* the LIMIT.
Hi, I tried with OFFSET before LIMIT but I have the same problem. It's always the same record that I have last. -- Jonathan Camile On 25 November 2013 18:32, bricklen <bricklen@gmail.com> wrote: > > On Mon, Nov 25, 2013 at 2:27 AM, <jonathan.camile@gmail.com> wrote: >> >> WITH filtred_table AS ( >> SELECT c.id >> FROM my_table t >> WHERE t.enabled = true >> AND (t.hdata->'field')::integer = ANY ('{16788}') >> ) >> SELECT my_table.id >> FROM my_table >> WHERE mycontract.id IN (SELECT filtred_table.id FROM filtred_table) >> ORDER BY my_table.hdata->'field' DESC >> LIMIT 5 >> OFFSET 5 >> > > > Try with OFFSET 5 LIMIT 5. The offset should be applied *before* the LIMIT. >
jonathan.camile wrote > The following bug has been logged on the website: > > Bug reference: 8629 > Logged by: Jonathan Camile > Email address: > jonathan.camile@ > PostgreSQL version: 9.2.4 > Operating system: Ubuntu 10.04.4 LTS > Description: > > Hey folks! > > > I have a bit of an issue with a query and I don't understand why. > It might be not very elegant but here it is, when I use the following > query > the last result will always be the same whatsoever the values of LIMIT and > OFFSET. When debugging ORDER BY/LIMIT it is good practice to output the columns being ordered, and others if applicable, and omit the LIMIT (or makes it considerably larger than needed) to see what raw table data the LIMIT clause is seeing. At minimum you need; "SELECT my_table.id, my_table.hdata->'field' FROM ..." for the output and either remove the limit or show at least 30 records initially. The small subset of the data you have provided is insufficient to determine whether you are making bad assumptions about your data or whether there is actually a problem. The fact that it is not self-contained makes debugging difficult as well. Note that LIMIT and OFFSET can appear in either order. While there are some parts of a select statement for which the syntax dictates an order these do not appear to be in the group. If they could not then LIMIT 5 OFFSET 100 would make no sense... -- View this message in context: http://postgresql.1045698.n5.nabble.com/BUG-8629-Strange-resultset-when-using-CTE-or-a-subselect-tp5780187p5780199.html Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.
> >> Try with OFFSET 5 LIMIT 5. The offset should be applied *before* the >> LIMIT. >> > David Johnston just proved my suggestion about the order of ORDER BY and OFFSET was full of it; I should have tested before suggesting that.
jonathan.camile wrote > CREATE TABLE public.testme AS SELECT generate_series(134800, 348008) as > id, trunc(random() * 9 + 1) as status; > > Then if you play with the following query, you will reproduce it. > > WITH filtred_test AS ( > SELECT c.id > FROM public.testme c > WHERE c.status = ANY ('{5}') > ) > SELECT mytest.id, mytest.status > FROM public.testme mytest > WHERE mytest.id IN (SELECT filtred_test.id FROM filtred_test) > ORDER BY mytest.status DESC > OFFSET 35 > LIMIT 10 Strange but not a bug - though I haven't tested it myself to prove out anything. You are ordering by a single field that, by definition, has the same value for every record that the LIMIT/OFFSET sees. The ORDER BY is effectively a no-op in this situation and the order of your output is going to be random. Why it just happens that the same record is always in your random output I have no clue but it is your query that is lacking here - not PostgreSQL. You need a secondary order by field, like ID, if you want to guarantee that different ranges provide different rows. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/BUG-8629-Strange-resultset-when-using-CTE-or-a-subselect-tp5780187p5780342.html Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.
Hi David, Thanks for your reply. Here a query to create table with which I can reproduce this behavior : CREATE TABLE public.testme AS SELECT generate_series(134800, 348008) as id, trunc(random() * 9 + 1) as status; Then if you play with the following query, you will reproduce it. WITH filtred_test AS ( SELECT c.id FROM public.testme c WHERE c.status = ANY ('{5}') ) SELECT mytest.id, mytest.status FROM public.testme mytest WHERE mytest.id IN (SELECT filtred_test.id FROM filtred_test) ORDER BY mytest.status DESC OFFSET 35 LIMIT 10 I always have the same last result regardless of the offset or the limit I use. If you have any clue about what's wrong I'll be very glad. By advance, thanks. David Johnston wrote > > jonathan.camile wrote >> The following bug has been logged on the website: >> >> Bug reference: 8629 >> Logged by: Jonathan Camile >> Email address: >> jonathan.camile@ >> PostgreSQL version: 9.2.4 >> Operating system: Ubuntu 10.04.4 LTS >> Description: >> >> Hey folks! >> >> >> I have a bit of an issue with a query and I don't understand why. >> It might be not very elegant but here it is, when I use the following >> query >> the last result will always be the same whatsoever the values of LIMIT >> and >> OFFSET. > When debugging ORDER BY/LIMIT it is good practice to output the columns > being ordered, and others if applicable, and omit the LIMIT (or makes it > considerably larger than needed) to see what raw table data the LIMIT > clause is seeing. At minimum you need; "SELECT my_table.id, > my_table.hdata->'field' FROM ..." for the output and either remove the > limit or show at least 30 records initially. > > > The small subset of the data you have provided is insufficient to > determine whether you are making bad assumptions about your data or > whether there is actually a problem. The fact that it is not > self-contained makes debugging difficult as well. > > Note that LIMIT and OFFSET can appear in either order. While there are > some parts of a select statement for which the syntax dictates an order > these do not appear to be in the group. If they could not then LIMIT 5 > OFFSET 100 would make no sense... David Johnston wrote > > jonathan.camile wrote >> The following bug has been logged on the website: >> >> Bug reference: 8629 >> Logged by: Jonathan Camile >> Email address: >> jonathan.camile@ >> PostgreSQL version: 9.2.4 >> Operating system: Ubuntu 10.04.4 LTS >> Description: >> >> Hey folks! >> >> >> I have a bit of an issue with a query and I don't understand why. >> It might be not very elegant but here it is, when I use the following >> query >> the last result will always be the same whatsoever the values of LIMIT >> and >> OFFSET. > When debugging ORDER BY/LIMIT it is good practice to output the columns > being ordered, and others if applicable, and omit the LIMIT (or makes it > considerably larger than needed) to see what raw table data the LIMIT > clause is seeing. At minimum you need; "SELECT my_table.id, > my_table.hdata->'field' FROM ..." for the output and either remove the > limit or show at least 30 records initially. > > > The small subset of the data you have provided is insufficient to > determine whether you are making bad assumptions about your data or > whether there is actually a problem. The fact that it is not > self-contained makes debugging difficult as well. > > Note that LIMIT and OFFSET can appear in either order. While there are > some parts of a select statement for which the syntax dictates an order > these do not appear to be in the group. If they could not then LIMIT 5 > OFFSET 100 would make no sense... David Johnston wrote > > jonathan.camile wrote >> The following bug has been logged on the website: >> >> Bug reference: 8629 >> Logged by: Jonathan Camile >> Email address: >> jonathan.camile@ >> PostgreSQL version: 9.2.4 >> Operating system: Ubuntu 10.04.4 LTS >> Description: >> >> Hey folks! >> >> >> I have a bit of an issue with a query and I don't understand why. >> It might be not very elegant but here it is, when I use the following >> query >> the last result will always be the same whatsoever the values of LIMIT >> and >> OFFSET. > When debugging ORDER BY/LIMIT it is good practice to output the columns > being ordered, and others if applicable, and omit the LIMIT (or makes it > considerably larger than needed) to see what raw table data the LIMIT > clause is seeing. At minimum you need; "SELECT my_table.id, > my_table.hdata->'field' FROM ..." for the output and either remove the > limit or show at least 30 records initially. > > > The small subset of the data you have provided is insufficient to > determine whether you are making bad assumptions about your data or > whether there is actually a problem. The fact that it is not > self-contained makes debugging difficult as well. > > Note that LIMIT and OFFSET can appear in either order. While there are > some parts of a select statement for which the syntax dictates an order > these do not appear to be in the group. If they could not then LIMIT 5 > OFFSET 100 would make no sense... -- View this message in context: http://postgresql.1045698.n5.nabble.com/BUG-8629-Strange-resultset-when-using-CTE-or-a-subselect-tp5780187p5780294.html Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.
On Tue, Nov 26, 2013 at 1:50 AM, jonathan.camile <jonathan.camile@gmail.com>wrote: > > Here a query to create table with which I can reproduce this behavior : > > CREATE TABLE public.testme AS SELECT generate_series(134800, 348008) as id, > trunc(random() * 9 + 1) as status; > > Then if you play with the following query, you will reproduce it. > > WITH filtred_test AS ( > SELECT c.id > FROM public.testme c > WHERE c.status = ANY ('{5}') > ) > SELECT mytest.id, mytest.status > FROM public.testme mytest > WHERE mytest.id IN (SELECT filtred_test.id FROM filtred_test) > ORDER BY mytest.status DESC > OFFSET 35 > LIMIT 10 > > I always have the same last result regardless of the offset or the limit I > use. > I cannot reproduce your problem in 9.2.5 nor 9.3.1. Rerunning the query with different OFFSET values gives me different results each time.
bricklen <bricklen@gmail.com> writes: > On Tue, Nov 26, 2013 at 1:50 AM, jonathan.camile > <jonathan.camile@gmail.com>wrote: >> I always have the same last result regardless of the offset or the limit I >> use. > I cannot reproduce your problem in 9.2.5 nor 9.3.1. Rerunning the query > with different OFFSET values gives me different results each time. I do reproduce the behavior, at least when offset+limit is small enough that the executor will use a top-N heapsort. As already stated, it's not a bug; the ordering is underdetermined so any set of the correct number of rows with status = 5 is a valid output. As for what's actually happening: the row that always shows up seems to be the physically first one with status = 5. I think that's starting out as the top of the top-N heap, and it just stays there because there's nothing that can displace it. regards, tom lane
Hi Tom, Thanks for the answer. It's clearer now, I might avoid this behavior ordering by id at last. I still have the impression that is a bit buggy or misleading. Even with simple query such as : SELECT c.id FROM public.testme c WHERE c.status = ANY ('{5}') ORDER BY c.status DESC OFFSET 150 LIMIT 200 the last is always the same but not the penultimate. Even if the ordering is underdetermined it shouldn't change then LIMIT/OFFSET shouldn't return the same value for 2 non overlapping ranges. I find it difficult to understand how it works completely, but my knowledge are quite limited. Thanks anyway. :) -- View this message in context: http://postgresql.1045698.n5.nabble.com/BUG-8629-Strange-resultset-when-using-CTE-or-a-subselect-tp5780187p5780519.html Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.