RE: Recovery performance of standby for multiple concurrenttruncates on large tables
| От | Jamison, Kirk | 
|---|---|
| Тема | RE: Recovery performance of standby for multiple concurrenttruncates on large tables | 
| Дата | |
| Msg-id | D09B13F772D2274BB348A310EE3027C6326856@g01jpexmbkw24 обсуждение исходный текст | 
| Ответ на | Re: Recovery performance of standby for multiple concurrent truncateson large tables (Thomas Munro <thomas.munro@enterprisedb.com>) | 
| Список | pgsql-hackers | 
Hi, I appreciate the feedback and suggestions. On Tue, Jul 31, 2018 at 8:01 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> How would this work if a relfilenode number that belonged to an old >> relation got recycled for a new relation? >> .. >> I think something like this could be made to work -- both on the >> master and the standby, and not just while waiting for a failover -- >> ... >> It's not clear to me whether it would be worth the overhead of doing >> something like this. Making relation drops faster at the cost of >> making buffer cleaning slower could be a loser. The deferred list has the information such as relfilenode and the head/top page number of invalid pages. The standby in the promote mode only registers info in the deferred list when redoing the COMMIT record. However, it scans the shared buffer cache only once before the end of the promote, and discards the page related to the table included in the deferred list. After removing, increment the head page number of the abovementioned invalid page. In this way, if ever while promoting an INSERT progresses (I'm not sure if it's possible), the discard progresses too, & the app can refer to it. As mentioned by Tsunakawa-san above, the purpose of the design is to make the failover process as shorter as possible and not really to make the drop of relations faster. My initial thought was not to touch the regular buffer invalidation process, but I am also not sure of the level of complexity that the proposed design would affect and how crucial the requirement (shorter failover) would make, so I asked for community's feedback. On Tue, July 31, 2018 8:27 AM, Thomas Munro wrote: > Anyway, it's a lot of complexity, and it falls back to a worst cases > like today, and can also transfer work to innocent foreground processes. > I see why Andres says we should just get a better data structure so we > can make the guy doing the dropping pay for it up front, but more > efficiently. I suspect we may also want an ordered data structure for > write clustering purposes one day. I also understand the value of solving the root cause of the problem that's why I asked Andres if we could expect a development from the community for V12 regarding the radix tree approach for buffer management, or even just from anyone who could start a WIP patch regardless if it's radix tree or not. And perhaps we'd like to get involved as well as this is also our customer's problem. So I am just curious about the radix tree approach's design. Maybe we should start discussing what kind of data structures, processing, etc. are involved? I also read other design solutions from another thread [1]. a. Fujii-san's proposal Add $SUBJECT for performance improvement; reloption to prevent vacuum from truncating empty pages b. Pavan's comment > What if we remember the buffers as seen by count_nondeletable_pages() and > then just discard those specific buffers instead of scanning the entire > shared_buffers again? Surely we revisit all to-be-truncated blocks before > actual truncation. So we already know which buffers to discard. And we're > holding exclusive lock at that point, so nothing can change underneath. Of > course, we can't really remember a large number of buffers, so we can do > this in small chunks. Scan last K blocks, remember those K buffers, discard > those K buffers, truncate the relation and then try for next K blocks. If > another backend requests lock on the table, we give up or retry after a > while. [1] https://www.postgresql.org/message-id/flat/20180419203802.hqrb4o2wexjnb2ft%40alvherre.pgsql#7d4a8c56a01392a3909b2150371e6495 Now, how do we move forward? Thank you everyone. Regards, Kirk Jamison
В списке pgsql-hackers по дате отправления: