Обсуждение: Patch to speed up pg_dump

Поиск
Список
Период
Сортировка

Patch to speed up pg_dump

От
"Vincze, Tamas"
Дата:
Hi,

We have a database with tens of millions of large objects, none
of them with any comments. When running pg_dump it spends several
hours looking for BLOB comments, finding none at the end but taxing
the server so much that the simplest query takes seconds to complete.

The attached patch fixes this by fetching the description only
of those BLOBs that may have it.

For those interested, some more info:

This query takes about 2 hours to execute:
sw2=# select count(*) from pg_largeobject;
    count
-----------
  135807552
(1 row)

I'm throttling the transfer rate on pg_dump's stdout so that it
doesn't affect server performance a lot, but obviously it didn't
help the function saving (looking for) BLOB comments.

Regards,
Tamas
--- postgresql-8.3.7/src/bin/pg_dump/pg_dump.c.orig    2009-03-31 15:47:28.000000000 -0400
+++ postgresql-8.3.7/src/bin/pg_dump/pg_dump.c    2009-04-01 10:25:39.000000000 -0400
@@ -1759,7 +1759,13 @@

     /* Cursor to get all BLOB comments */
     if (AH->remoteVersion >= 70200)
-        blobQry = "DECLARE blobcmt CURSOR FOR SELECT loid, obj_description(loid, 'pg_largeobject') FROM (SELECT
DISTINCTloid FROM pg_largeobject) ss"; 
+        /* Get comments for BLOBs that have a matching pg_description row. When there are many
+         * (millions) of BLOBs without comments this avoids fetching and then ignoring them,
+         * potentionally saving hours of backup time.
+         * Note that it may still select BLOBs that have no comment if a pg_description row's objoid
+         * matches a BLOB's loid, but references an object contained in a different system catalog,
+         * thus the PQgetisnull() check below is still needed to ignore them. */
+        blobQry = "DECLARE blobcmt CURSOR FOR SELECT loid, obj_description(loid, 'pg_largeobject') FROM (SELECT
DISTINCTloid FROM pg_description JOIN pg_largeobject ON (pg_description.objoid = pg_largeobject.loid)) ss"; 
     else if (AH->remoteVersion >= 70100)
         blobQry = "DECLARE blobcmt CURSOR FOR SELECT loid, obj_description(loid) FROM (SELECT DISTINCT loid FROM
pg_largeobject)ss"; 
     else

Re: Patch to speed up pg_dump

От
Tom Lane
Дата:
"Vincze, Tamas" <vincze@neb.com> writes:
> +         * Note that it may still select BLOBs that have no comment if a pg_description row's objoid
> +         * matches a BLOB's loid, but references an object contained in a different system catalog,

... seems like that would be easy to fix ...
        regards, tom lane


Re: Patch to speed up pg_dump

От
"Vincze, Tamas"
Дата:
Tom Lane wrote:
> "Vincze, Tamas" <vincze@neb.com> writes:
>> +         * Note that it may still select BLOBs that have no comment if a pg_description row's objoid
>> +         * matches a BLOB's loid, but references an object contained in a different system catalog,
>
> ... seems like that would be easy to fix ...

Yes, it wasn't that hard. The revised patch is attached.

Originally I didn't want to add more dependencies on the system
catalogs. Also, I've left the DECLARE statements untouched for
pre-v7.2 backends, so the NULL check on the comment is still
needed for those cases and if the description itself is NULL.

Regards,
Tamas
--- postgresql-8.3.7/src/bin/pg_dump/pg_dump.c.orig    2009-03-31 15:47:28.000000000 -0400
+++ postgresql-8.3.7/src/bin/pg_dump/pg_dump.c    2009-04-01 14:07:55.000000000 -0400
@@ -1759,7 +1759,18 @@

     /* Cursor to get all BLOB comments */
     if (AH->remoteVersion >= 70200)
-        blobQry = "DECLARE blobcmt CURSOR FOR SELECT loid, obj_description(loid, 'pg_largeobject') FROM (SELECT
DISTINCTloid FROM pg_largeobject) ss"; 
+        /* Get comments for BLOBs that have a matching pg_description row. When there are many
+         * (millions) of BLOBs without comments this avoids fetching and then ignoring them,
+         * potentionally saving hours of backup time. */
+        blobQry = "DECLARE blobcmt CURSOR FOR SELECT loid, obj_description(loid, 'pg_largeobject') FROM ("
+                "SELECT DISTINCT loid FROM pg_description JOIN pg_largeobject "
+                "ON (pg_description.objoid = pg_largeobject.loid) "
+                "WHERE classoid = ("
+                    "SELECT oid FROM pg_class WHERE relname = 'pg_largeobject' AND relnamespace = ("
+                        "SELECT oid FROM pg_namespace WHERE nspname='pg_catalog'"
+                    ")"
+                ") AND objsubid = 0"
+              ") ss";
     else if (AH->remoteVersion >= 70100)
         blobQry = "DECLARE blobcmt CURSOR FOR SELECT loid, obj_description(loid) FROM (SELECT DISTINCT loid FROM
pg_largeobject)ss"; 
     else

Re: Patch to speed up pg_dump

От
Tom Lane
Дата:
"Vincze, Tamas" <vincze@neb.com> writes:
> Tom Lane wrote:
>> "Vincze, Tamas" <vincze@neb.com> writes:
> +         * Note that it may still select BLOBs that have no comment if a pg_description row's objoid
> +         * matches a BLOB's loid, but references an object contained in a different system catalog,
>> 
>> ... seems like that would be easy to fix ...

> Yes, it wasn't that hard. The revised patch is attached.

Applied to HEAD and 8.3, using regclass cast to simplify ...
        regards, tom lane