Eliminating PD_ALL_VISIBLE, take 2
От | Jeff Davis |
---|---|
Тема | Eliminating PD_ALL_VISIBLE, take 2 |
Дата | |
Msg-id | 1369886097.23418.0.camel@jdavis обсуждение исходный текст |
Ответы |
Re: Eliminating PD_ALL_VISIBLE, take 2
|
Список | pgsql-hackers |
Continuation of: http://www.postgresql.org/message-id/1353551097.11440.128.camel@sussancws0025 Rebased patch attached; no other changes. First of all, I'd like to take a step back and describe the benefit that I'm trying to achieve. If you have an analytic or data warehouse workload, then a data load may result in separate writes for 1. Data load itself 2. Setting hint bits 3. Setting PD_ALL_VISIBLE If you have very low concurrency, short transactions, or data is small enough to fit in memory; then #2 and #3 might be combined into one write. But an analytic or data warehouse workload has long-running queries and there are typically at least some queries going at any given time. That means that SELECTs or VACUUMs that happen soon after the data load will start setting hint bits, and the shared buffers will fill up and those pages will be evicted (without PD_ALL_VISIBLE because there are still concurrent queries that can't see the tuples). Then, a later VACUUM will come along and rewrite the whole table, just to set PD_ALL_VISIBLE. The visibility map bit and PD_ALL_VISIBLE have the same meaning, so this proposal is merely to eliminate PD_ALL_VISIBLE and the associated logic. The benefits are: * eliminate extra page writes when the only change is setting PD_ALL_VISIBLE (when checksums are enabled, this is particularly expensive because it requires the pages to be written to WAL as well) * simplify the code by removing some complex logic that defies the ordinary rules in transam/README The costs are: * during a scan, we need to check the VM so that we don't need to read the hints on each tuple individually * inserts, updates, and deletes need to check the VM to know whether to unset the page's bit The costs are primarily the need to pin VM pages in situations where we would have just used PD_ALL_VISIBLE before. Pinning pages requires locking and maintenance in shared buffers, so this is a real concern. However, I believe the previous discussion was derailed because of fear of contention if we pin any pages at all. I'd like this discussion to be about whether we can spread the cost of pinning a VM page over enough other work to be negligible. Heikki pointed out one degenerate case where the cost of pinning pages showed up: http://www.postgresql.org/message-id/50FD11C5.1030700@vmware.com But as he says: "This is a worst-case scenario, and the slowdown isn't huge, so maybe it's a worthwhile tradeoff. But it shows that removing PD_ALL_VISIBLE is not completely free." Also, there is another proposal to eliminate freezing as we know it: http://www.postgresql.org/message-id/20130523175148.GA29374@alap2.anarazel.de I don't see any obvious conflict between the two proposals, but they are related and one may affect the other. They may even be complimentary. Regards, Jeff Davis
Вложения
В списке pgsql-hackers по дате отправления: