Update performance ... is 200,000 updates per hour what I should expect?

Поиск
Список
Период
Сортировка
От Erik Norvelle
Тема Update performance ... is 200,000 updates per hour what I should expect?
Дата
Msg-id A461D6B8-24DF-11D8-BDFB-000A9583BF06@norvelle.net
обсуждение исходный текст
Ответы Re: Update performance ... is 200,000 updates per hour  (Stephan Szabo <sszabo@megazone.bigpanda.com>)
Re: Update performance ... is 200,000 updates per hour what I should expect?  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Update performance ... is 200,000 updates per hour what I should expect?  (Greg Stark <gsstark@mit.edu>)
Список pgsql-performance
<fixed><fontfamily><param>Courier New</param>Folks:


I´m running a query which is designed to generate a foreign key for a
table of approx. 10 million records (I've mentioned this in an earlier
posting).  The table is called "indethom", and each row contains a
single word from the works of St. Thomas Aquinas, along with
grammatical data about the word form, and (most importantly for my
current problem) a set of columns identifying the particular
work/section/paragraph that the word appears in.


This database is completely non-normalized, and I'm working on
performing some basic normalization, beginning with creating a table
called "s2.sectiones" which (naturally) contains a complete listing of
all of the sections of all the works of St. Thomas.  I will then
eliminate this information from the original "indethom" table,
replacing it with the foreign key I am currently generating.


** My question has to do with whether or not I am getting maximal
speed out of PostgreSQL, or whether I need to perform further
optimizations.  I am currently getting about 200,000 updates per hour,
and updating the entire 10 million rows thus requires 50 hours, which
seems a bit much.


Here's the query I am running:

update indethom

    set query_counter = nextval('s2.query_counter_seq'),           --
Just for keeping track of how fast the query is running

    sectref = (select clavis from s2.sectiones where

        s2.sectiones.nomeoper = indethom.nomeoper

        and s2.sectiones.refere1a = indethom.refere1a and
s2.sectiones.refere1b = indethom.refere1b

        and s2.sectiones.refere2a = indethom.refere2a  and
s2.sectiones.refere2b = indethom.refere2b

        and s2.sectiones.refere3a = indethom.refere3a  and
s2.sectiones.refere3b = indethom.refere3b

        and s2.sectiones.refere4a = indethom.refere4a and
s2.sectiones.refere4b = indethom.refere4b);


Here´s the query plan:

                            QUERY PLAN
              

-------------------------------------------------------------------------------------

 Seq Scan on indethom  (cost=0.00..1310352.72 rows=10631972 width=212)

   SubPlan

     ->  Index Scan using sectiones_ndx on sectiones  (cost=0.00..6.03
rows=1 width=4)

           Index Cond: ((nomeoper = $0) AND (refere1a = $1) AND
(refere1b = $2) AND (refere2a = $3) AND (refere2b = $4) AND (refere3a
= $5) AND (refere3b = $6) AND (refere4a = $7) AND (refere4b = $8))

(4 rows)


<smaller>Note:  I have just performed a VACUUM ANALYZE on the indethom
table, as suggested by this listserve.</smaller>


Here's the structure of the s2.sectiones table:

it=> \d s2.sectiones

        Table "s2.sectiones"

  Column  |     Type     | Modifiers

----------+--------------+-----------

 nomeoper | character(3) |

 refere1a | character(2) |

 refere1b | character(2) |

 refere2a | character(2) |

 refere2b | character(2) |

 refere3a | character(2) |

 refere3b | character(2) |

 refere4a | character(2) |

 refere4b | character(2) |

 clavis   | integer      |

Indexes: sectiones_ndx btree (nomeoper, refere1a, refere1b, refere2a,
refere2b, refere3a, refere3b, refere4a, refere4b)


Finally, here is the structure of indethom (some non-relevant columns
not shown):

it=> \d indethom

              Table "public.indethom"

    Column     |         Type          | Modifiers

---------------+-----------------------+-----------

 numeoper      | smallint              | not null

 nomeoper      | character(3)          | not null

 editcrit      | character(1)          |

 refere1a      | character(2)          |

 refere1b      | character(2)          |

 refere2a      | character(2)          |

 refere2b      | character(2)          |

 refere3a      | character(2)          |

 refere3b      | character(2)          |

 refere4a      | character(2)          |

 refere4b      | character(2)          |

 refere5a      | character(2)          | not null

 refere5b      | smallint              | not null

 referen6      | smallint              | not null

 ... several columns skipped ...

 verbum        | character varying(22) | not null

 ... other columns skipped ...

 poslinop      | integer               | not null

 posverli      | smallint              | not null

 posverop      | integer               | not null

 clavis        | integer               | not null

 articref      | integer               |

 sectref       | integer               |

 query_counter | integer               |

Indexes: indethom_pkey primary key btree (clavis),

         indethom_articulus_ndx btree (nomeoper, refere1a, refere1b,
refere2a, refere2b, refere3a, refere3b),

         indethom_sectio_ndx btree (nomeoper, refere1a, refere1b,
refere2a, refere2b, refere3a, refere3b, refere4a, refere4b),

         verbum_ndx btree (verbum)


Thanks for your assistance!

-Erik Norvelle</fontfamily></fixed>Folks:

I´m running a query which is designed to generate a foreign key for a
table of approx. 10 million records (I've mentioned this in an earlier
posting).  The table is called "indethom", and each row contains a
single word from the works of St. Thomas Aquinas, along with
grammatical data about the word form, and (most importantly for my
current problem) a set of columns identifying the particular
work/section/paragraph that the word appears in.

This database is completely non-normalized, and I'm working on
performing some basic normalization, beginning with creating a table
called "s2.sectiones" which (naturally) contains a complete listing of
all of the sections of all the works of St. Thomas.  I will then
eliminate this information from the original "indethom" table,
replacing it with the foreign key I am currently generating.

** My question has to do with whether or not I am getting maximal speed
out of PostgreSQL, or whether I need to perform further optimizations.
I am currently getting about 200,000 updates per hour, and updating the
entire 10 million rows thus requires 50 hours, which seems a bit much.

Here's the query I am running:
update indethom
    set query_counter = nextval('s2.query_counter_seq'),           -- Just
for keeping track of how fast the query is running
    sectref = (select clavis from s2.sectiones where
        s2.sectiones.nomeoper = indethom.nomeoper
        and s2.sectiones.refere1a = indethom.refere1a and
s2.sectiones.refere1b = indethom.refere1b
        and s2.sectiones.refere2a = indethom.refere2a  and
s2.sectiones.refere2b = indethom.refere2b
        and s2.sectiones.refere3a = indethom.refere3a  and
s2.sectiones.refere3b = indethom.refere3b
        and s2.sectiones.refere4a = indethom.refere4a and
s2.sectiones.refere4b = indethom.refere4b);

Here´s the query plan:
                             QUERY PLAN
------------------------------------------------------------------------
-------------
  Seq Scan on indethom  (cost=0.00..1310352.72 rows=10631972 width=212)
    SubPlan
      ->  Index Scan using sectiones_ndx on sectiones  (cost=0.00..6.03
rows=1 width=4)
            Index Cond: ((nomeoper = $0) AND (refere1a = $1) AND
(refere1b = $2) AND (refere2a = $3) AND (refere2b = $4) AND (refere3a =
$5) AND (refere3b = $6) AND (refere4a = $7) AND (refere4b = $8))
(4 rows)

Note:  I have just performed a VACUUM ANALYZE on the indethom table, as
suggested by this listserve.

Here's the structure of the s2.sectiones table:
it=> \d s2.sectiones
         Table "s2.sectiones"
   Column  |     Type     | Modifiers
----------+--------------+-----------
  nomeoper | character(3) |
  refere1a | character(2) |
  refere1b | character(2) |
  refere2a | character(2) |
  refere2b | character(2) |
  refere3a | character(2) |
  refere3b | character(2) |
  refere4a | character(2) |
  refere4b | character(2) |
  clavis   | integer      |
Indexes: sectiones_ndx btree (nomeoper, refere1a, refere1b, refere2a,
refere2b, refere3a, refere3b, refere4a, refere4b)

Finally, here is the structure of indethom (some non-relevant columns
not shown):
it=> \d indethom
               Table "public.indethom"
     Column     |         Type          | Modifiers
---------------+-----------------------+-----------
  numeoper      | smallint              | not null
  nomeoper      | character(3)          | not null
  editcrit      | character(1)          |
  refere1a      | character(2)          |
  refere1b      | character(2)          |
  refere2a      | character(2)          |
  refere2b      | character(2)          |
  refere3a      | character(2)          |
  refere3b      | character(2)          |
  refere4a      | character(2)          |
  refere4b      | character(2)          |
  refere5a      | character(2)          | not null
  refere5b      | smallint              | not null
  referen6      | smallint              | not null
  ... several columns skipped ...
  verbum        | character varying(22) | not null
  ... other columns skipped ...
  poslinop      | integer               | not null
  posverli      | smallint              | not null
  posverop      | integer               | not null
  clavis        | integer               | not null
  articref      | integer               |
  sectref       | integer               |
  query_counter | integer               |
Indexes: indethom_pkey primary key btree (clavis),
          indethom_articulus_ndx btree (nomeoper, refere1a, refere1b,
refere2a, refere2b, refere3a, refere3b),
          indethom_sectio_ndx btree (nomeoper, refere1a, refere1b,
refere2a, refere2b, refere3a, refere3b, refere4a, refere4b),
          verbum_ndx btree (verbum)

Thanks for your assistance!
-Erik Norvelle

В списке pgsql-performance по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: A question on the query planner
Следующее
От: Stephan Szabo
Дата:
Сообщение: Re: Update performance ... is 200,000 updates per hour