Re: [EXTERNAL]Re: BUG #19094: select statement on postgres 17 vs postgres 18 is returning different/duplicate results
| От | Richard Guo | 
|---|---|
| Тема | Re: [EXTERNAL]Re: BUG #19094: select statement on postgres 17 vs postgres 18 is returning different/duplicate results | 
| Дата | |
| Msg-id | CAMbWs49MBMkc_HhCW2v_cNSPH8=tCKDKd0AgEX6OEp-xOkbFfw@mail.gmail.com обсуждение исходный текст  | 
		
| Ответ на | Re: [EXTERNAL]Re: BUG #19094: select statement on postgres 17 vs postgres 18 is returning different/duplicate results (Tom Lane <tgl@sss.pgh.pa.us>) | 
| Ответы | 
                	
            		Re: [EXTERNAL]Re: BUG #19094: select statement on postgres 17 vs postgres 18 is returning different/duplicate results
            		
            		 Re: [EXTERNAL]Re: BUG #19094: select statement on postgres 17 vs postgres 18 is returning different/duplicate results  | 
		
| Список | pgsql-bugs | 
On Tue, Oct 28, 2025 at 5:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > As far as I can see, in a parallel hash join node->hj_CurTuple will > be pointing into a shared hash table, so that this code is fooling > with a shared HasMatch flag bit with no sort of lock whatsoever. > This query doesn't have any joinqual at the PHJ level if I'm reading > EXPLAIN correctly, so the window to get it wrong isn't very wide. > Nonetheless, if two parallel workers inspect the same inner tuple > at about the same time, this code would allow both of them to > return it. > > I'm unsure if we've got any infrastructure that'd allow setting the > tuple's match bit in a more atomic fashion. We might have to revert > aa86129e1. Thanks Lori for the report and test case. Thanks Tom for the analysis. I think you're right. In a parallel hash join, the inner relation is stored in a shared global hash table for probing, allowing multiple workers to access the same tuples concurrently. If two workers probe the same inner tuple at the same time, it can lead to duplicate emissions of that tuple, violating the semantics of a right semi join. AFAICT, there are 3 possible options for a fix. 1) Revert aa86129e1. 2) Modify the code to perform atomic operations on the matched flag using a CAS (or a similar) mechanism when running in parallel execution. 3) Disable parallel right semi joins in the planner. For option #2, it seems to me that it would require non-trivial changes, which might not be suitable for back-patching. We would also need to evaluate the performance impact of introducing atomic operations. I'm somewhat inclined toward option #3 instead. We can implement the disablement in hash_inner_and_outer(), which would only require a small modification (perhaps two lines, plus some comment updates). If option #2 doesn't work out, I'm open to option #1. (I'm still trying to understand why concurrent access to the matched flag in cases other than right semi joins (such as right or full joins) doesn't lead to concurrency issues.) - Richard
В списке pgsql-bugs по дате отправления: