Re: Qual push down to table AM
От | Julien Tachoires |
---|---|
Тема | Re: Qual push down to table AM |
Дата | |
Msg-id | 20250829083832.f2clwltkdzsfd2ns@poseidon.home.virt обсуждение исходный текст |
Ответ на | Re: Qual push down to table AM (Andres Freund <andres@anarazel.de>) |
Список | pgsql-hackers |
Hi, On Wed, Aug 27, 2025 at 05:50:01PM -0400, Andres Freund wrote: > On 2025-08-27 22:27:37 +0200, Julien Tachoires wrote: > > Please find attached a patch set proposal intended to implement WHERE > > clauses (qual) push down to the underlying table AM during > > table/sequential scan execution. > > > > The primary goal of this project is to convert quals to ScanKeys and > > pass them to the table AMs. Table AMs are then allowed to apply early > > tuple filtering during table (sequential) scans. Applying filtering at > > the table storage level is something necessary for non row-oriented > > table storage like columnar storage. Index organized table is another > > table storage that would need quals push down. > > > > AFAIK, CustomScan is the one and only way to go for having table scan > > using quals pushed down, but each table AM must implement its own > > mechanism. IMHO, having this feature available in core would help the > > development of new table AMs. About Heap, some performance testing > > (detailed at the end of this message) shows between 45% and 60% > > improvement in seq scan execution time when only one tuple is returned > > from the table. > > One problem with doing that in the case of heapam is that you're evaluating > scan keys with the buffer lock held - with basically arbitrary expressions > being evaluated. That's an easy path to undetected deadlocks. You'd have to > redesign the relevant mechanism to filter outside of the lock... Thank you for this quick feedback. One potential approach to solve this in heapgettup() would be: 1. hold the buffer lock 2. get the tuple from the buffer 3. if the tuple is not visible, move to the next tuple, back to 2. 4. release the buffer lock 5. if the tuple does not satisfy the scan keys, take the buffer lock, move to the next tuple, back to 2. 6. return the tuple Do you see something fundamentally wrong here? In practice, I might be wrong, but I think this problem affects heapgettup() only, heapgettup_pagemode() does not hold the buffer lock when HeapKeyTest() is called. To reach this problematic part we need two conditions: non-NULL ScanKey array and not to be in the page-at-a-time mode (pagemode). The only table_beginscan_something() function able to meet these conditions before calling the scan_begin() API is table_beginscan_sampling(): the caller can choose to use pagemode. The only place where table_beginscan_sampling() is called is in tablesample_init(), in this case the ScanKey array is NULL, so we cannot reach the problematic part of heapgettup(). There is only one other case where we disable the pagemode in heapam: when the snapshot is non-MVCC. Do you know any other code path/scenario I missed that can lead to reach this problematic part? Best regards, -- Julien Tachoires
В списке pgsql-hackers по дате отправления: