COPY with hints, rebirth
От | Simon Riggs |
---|---|
Тема | COPY with hints, rebirth |
Дата | |
Msg-id | CA+U5nMJ8CdApEcm0+Ln0WXVx194k3wJHihb=-0xDHen1v0hhDg@mail.gmail.com обсуждение исходный текст |
Ответы |
Re: COPY with hints, rebirth
Re: COPY with hints, rebirth |
Список | pgsql-hackers |
A long time ago, in a galaxy far away, we discussed ways to speed up data loads/COPY. http://archives.postgresql.org/pgsql-hackers/2007-01/msg00470.php In particular, the idea that we could mark tuples as committed while we are still loading them, to avoid negative behaviour for the first reader. Simple patch to implement this is attached, together with test case. Current behaviour is shown here Run COPY and then... SELECT count(*) FROM table with no indexes 1st SELECT Time: 1518.571 ms <--- slowed dramatically by setting hint bits 2nd SELECT Time: 914.141 ms 3rd SELECT Time: 914.921 ms With this patch I observed the following results 1st SELECT Time: 890.820 ms 2nd SELECT Time: 884.799 ms 3rd SELECT Time: 882.405 ms What exactly does it do? Previously, we optimised COPY when it was loading data into a newly created table or a freshly truncated table. This patch extends that and actually sets the tuple header flag as HEAP_XMIN_COMMITTED during the load. Doing so is simple 2 lines of code. The patch also adds some tests for corner cases that would make that action break MVCC - though those cases are minor and typical data loads will benefit fully from this. In the link above, Tom suggested reworking HeapTupleSatisfiesMVCC() and adding current xid to snapshots. That is an invasive change that I would wish to avoid at any time and explains the long delay in tackling this. The way I've implemented it, is just as a short test during XidInMVCCSnapshot() so that we trap the case when the xid == xmax and so would appear to be running. This is much less invasive and just as performant as Tom's original suggestion. Why do we need this now? Setting checksums on page requires us to write WAL for hints, so the situation of the 1st SELECT after a load would get somewhat worse when page_checksums are enabled, but we already know there is a price. However, this is a situation we can solve, and add value for all cases, not just when checksums enabled. So I'm posting this as a separate patch rather than including that as a tuning feature of the checksums patch. Your input will be generously received, -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
В списке pgsql-hackers по дате отправления: