Обсуждение: Cursor with hold emits the same row more than once across commits in 8.3.7
Short Desc: Cursor with hold emits the same row more than once across commits in 8.3.7 Os : Debian Etch amd64 / Ubuntu Jaunty amd64 Pg : 8.3.7 Build options: Official package and also compiled from source with --enable-integer-datetimes Detailed Desc: A construction of the form DECLARE cur CURSOR WITH HOLD FOR SELECT * FROM obj loop FETCH 1000 FROM cur process 'em COMMIT results in some of the same rows being emitted more than once, altho the final rowcount is correct (i.e some rows end up being never seen). Originally discovered using a perl DBI program, and we wondered if the version of DBD::Pg might be an issue, so a c library program was written to test this - and it exhibits the problem too (see attached for schema and program). The table rows are reasonably wide: select attname,n_distinct,avg_width from pg_stats where tablename='obj'; attname | n_distinct | avg_width -------------+------------+----------- obj_id | -1 | 4 obj_link_id | 5 | 4 obj_fil | 13035 | 1188 which may be a factor(tuplestore issues?)... The table is reasonably sizable (10000000 rows). I can attach the generation program for this dataset if required. regards Mark
Вложения
Mark Kirkwood <mark.kirkwood@catalyst.net.nz> writes: > A construction of the form > DECLARE cur CURSOR WITH HOLD FOR SELECT * FROM obj > loop > FETCH 1000 FROM cur > process 'em > COMMIT > results in some of the same rows being emitted more than once, altho the > final rowcount is correct (i.e some rows end up being never seen). I poked into this a bit, and it looks sort of nasty. Mark's immediate complaint is a consequence of the synchronize_seqscan patch, but there are other issues too. The problem comes from the fact that a WITH HOLD cursor is initially treated the same as a regular cursor, ie, we just fetch on demand. If it's still open at transaction commit, we do this: ExecutorRewind();fetch all the rows into a tuplestore;advance the tuplestore past the number of rows previously fetched; and then later transactions can fetch-on-demand from the tuplestore. The reason for the bug is that with synchronize_seqscan on, a SeqScan node that gets rewound does not necessarily restart from the same point in the table that it initially started reading from. So the initial fetch grabs 1000 rows, but then when we rewind, the first 1000 rows loaded into the tuplestore may come from a different range of the table. This does not only affect cursors WITH HOLD. Some paths in the cursor MOVE logic also rely on ExecutorRewind to behave sanely. We could probably fix this specific issue by refactoring things in such a way that the seqscan start point is frozen on the first read and re-used after rewinds. However, it strikes me also that a cursor query containing volatile functions is going to cause some similar issues --- you can't just re-execute the query for "the same" rows and expect to get stable results. What should we do about that? The technically best solution is probably similar to what Materialize nodes do, ie, read the query only once and be careful to stash rows aside into a tuplestore as they are read. This would fix both issues with one patch. The problem with that is that if the user doesn't actually do any backwards fetching, you waste all the tuplestore maintenance work. Or we could just document that cursors containing volatile functions don't behave stably if you try to read backwards; or try to enforce that you can't do so. The volatile-function issue has been there since the dawn of time, and we've never had a complaint about it AFAIR. So maybe trying to "fix" it isn't a good thing and we should just document the behavior. But the syncscan instability is new and probably ought to be dealt with. Thoughts? regards, tom lane
Re: [HACKERS] Cursor with hold emits the same row more than once across commits in 8.3.7
От
Alvaro Herrera
Дата:
Robert Haas escribió: > I suppose in theory you could try to figure out whether > materialization is really necessary (let's see... no seqscans here and > no volatile functions... ok, so we can cheat...) but that seems > likely to lead to future bugs as the rules for which things are safe > change. Another thing we could do is disable syncscan if we see a WITH HOLD cursor, but I guess it's not future-proof either. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Robert Haas <robertmhaas@gmail.com> writes: > This seems like the only option that will produce correct answers, so > it gets my vote. How much is the performance penalty for > materializing the tuplestore? I'm inclined to think that whatever it > is, you just have to pay it if you ask for a WITH HOLD cursor. I don't mind paying it for a WITH HOLD cursor, since by definition you're asking for a more expensive behavior there. The thing that is bothering me more is whether we want to pay a price for a *non* WITH HOLD cursor. You can get instability for seqscan or volatile functions if you just try MOVE BACKWARD ALL and re-read. regards, tom lane
On Tue, 2009-06-09 at 12:07 -0400, Tom Lane wrote: > We could probably fix this specific issue by refactoring things in such > a way that the seqscan start point is frozen on the first read and > re-used after rewinds. I don't know what you mean by "frozen" exactly, but the start point of a synchronized scan is stored in shared memory; otherwise, it wouldn't know where to stop. Regards,Jeff Davis
On Tue, 2009-06-09 at 10:51 -0700, Jeff Davis wrote: > On Tue, 2009-06-09 at 12:07 -0400, Tom Lane wrote: > > We could probably fix this specific issue by refactoring things in such > > a way that the seqscan start point is frozen on the first read and > > re-used after rewinds. > > I don't know what you mean by "frozen" exactly, but the start point of a > synchronized scan is stored in shared memory; otherwise, it wouldn't > know where to stop. > Correction: I didn't actually mean _shared_ memory there. It's just backend-local memory. Regards,Jeff Davis
Re: [HACKERS] Cursor with hold emits the same row more than once across commits in 8.3.7
От
"Kevin Grittner"
Дата:
Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> This seems like the only option that will produce correct answers, >> so it gets my vote. How much is the performance penalty for >> materializing the tuplestore? I'm inclined to think that whatever >> it is, you just have to pay it if you ask for a WITH HOLD cursor. > > I don't mind paying it for a WITH HOLD cursor, since by definition > you're asking for a more expensive behavior there. The thing that > is bothering me more is whether we want to pay a price for a *non* > WITH HOLD cursor. You can get instability for seqscan or volatile > functions if you just try MOVE BACKWARD ALL and re-read. I would expect to pay more for a scrollable cursor than non- scrollable; and in fact, the fine manual says "Depending upon the complexity of the query's execution plan, specifying SCROLL might impose a performance penalty on the query's execution time." That would tend to argue in favor of taking the time to produce correct answers. It does raise a question, though, about another sentence in the same paragraph: "The default is to allow scrolling in some cases; this is not the same as specifying SCROLL." Either we make people pay for this when they haven't specified SCROLL but PostgreSQL has historically given it to them anyway, or we might break existing applications. -Kevin
On Tue, Jun 9, 2009 at 12:07 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > The technically best solution is probably similar to what Materialize > nodes do, ie, read the query only once and be careful to stash rows > aside into a tuplestore as they are read. This would fix both issues > with one patch. The problem with that is that if the user doesn't > actually do any backwards fetching, you waste all the tuplestore > maintenance work. This seems like the only option that will produce correct answers, so it gets my vote. How much is the performance penalty for materializing the tuplestore? I'm inclined to think that whatever it is, you just have to pay it if you ask for a WITH HOLD cursor. I suppose in theory you could try to figure out whether materialization is really necessary (let's see... no seqscans here and no volatile functions... ok, so we can cheat...) but that seems likely to lead to future bugs as the rules for which things are safe change. ...Robert
Re: [HACKERS] Cursor with hold emits the same row more than once across commits in 8.3.7
От
Jaime Casanova
Дата:
On Tue, Jun 9, 2009 at 1:00 PM, Kevin Grittner<Kevin.Grittner@wicourts.gov> wrote: > > the same paragraph: "The default is to allow scrolling in some cases; "in some cases"... like in "but not all"... ? this doesn't sound like a vow to me. if the user really wants SCROLLing ability he should have been specified that way... i say pay the price for WITH HOLD and SCROLL and don't allow scrolling ability if SCROLL hasn't been specified -- Atentamente, Jaime Casanova Soporte y capacitación de PostgreSQL Asesoría y desarrollo de sistemas Guayaquil - Ecuador Cel. +59387171157
On Tue, Jun 9, 2009 at 1:47 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> This seems like the only option that will produce correct answers, so >> it gets my vote. How much is the performance penalty for >> materializing the tuplestore? I'm inclined to think that whatever it >> is, you just have to pay it if you ask for a WITH HOLD cursor. > > I don't mind paying it for a WITH HOLD cursor, since by definition > you're asking for a more expensive behavior there. The thing that is > bothering me more is whether we want to pay a price for a *non* WITH > HOLD cursor. You can get instability for seqscan or volatile functions > if you just try MOVE BACKWARD ALL and re-read. [ reads the fine manual ] It seems like we need to materialize if you ask for WITH HOLD or SCROLL. I guess the question is what to do if you haven't specified either and then try to scroll anyway. The manual says that it may fail, but it doesn't say that might seem to work but actually return wrong answers. ...Robert
Jeff Davis <pgsql@j-davis.com> writes: > On Tue, 2009-06-09 at 10:51 -0700, Jeff Davis wrote: >> I don't know what you mean by "frozen" exactly, but the start point of a >> synchronized scan is stored in shared memory; otherwise, it wouldn't >> know where to stop. > Correction: I didn't actually mean _shared_ memory there. It's just > backend-local memory. Well, wherever it's stored, it's a demonstrable fact that we're not getting the same rows after ExecutorRewind(); and that we do get the same rows out if we disable synchronize_seqscans in Mark's test case. I haven't got round to identifying exactly what to change if we decide to go for a narrow fix instead of storing the query results at a higher level. regards, tom lane
Given that RC freeze is nearly upon us for 8.4, and that we need a reasonably non-invasive fix for 8.3 anyway, I propose that for now we just deal with the syncscan issue by tweaking heap_rescan so that rs_startblock doesn't get changed. It looks like that's about a three-line patch. The question of how cursors should behave with respect to volatile functions can be documented and left for another time. regards, tom lane
Re: [HACKERS] Cursor with hold emits the same row more than once across commits in 8.3.7
От
Mark Kirkwood
Дата:
Tom Lane wrote: > Given that RC freeze is nearly upon us for 8.4, and that we need a > reasonably non-invasive fix for 8.3 anyway, I propose that for now > we just deal with the syncscan issue by tweaking heap_rescan so that > rs_startblock doesn't get changed. It looks like that's about a > three-line patch. The question of how cursors should behave with > respect to volatile functions can be documented and left for another > time. > Sounds like a good approach. Mark