Re: huge difference in performance between MS SQL and pg 8.3 on UPDATE with full text search

Поиск
Список
Период
Сортировка
От Allan Kamau
Тема Re: huge difference in performance between MS SQL and pg 8.3 on UPDATE with full text search
Дата
Msg-id AANLkTimxpMyH1RuZhf0Nu+yebUpPjHEeX5BeN99GaaFU@mail.gmail.com
обсуждение исходный текст
Ответ на huge difference in performance between MS SQL and pg 8.3 on UPDATE with full text search  (Ivan Sergio Borgonovo <mail@webthatworks.it>)
Ответы Re: huge difference in performance between MS SQL and pg 8.3 on UPDATE with full text search  (Sam Mason <sam@samason.me.uk>)
Список pgsql-general
On Tue, Sep 28, 2010 at 1:49 PM, Ivan Sergio Borgonovo
<mail@webthatworks.it> wrote:
> I know I'm comparing apples and orange but still the difference in
> performance was quite astonishing.
>
> I've 2 tables that look like:
>
> create table products(
>  id bigint
>  price double precision, /* legacy, don't ask */
>  sometextfield1 varchar(128),
>  sometextfield2 varchar(128),
>  ...
> );
>
> one on a MS SQL 2005 and another one on pg 8.3.
>
> MS SQL has full text search on the text fields (I don't know the
> details).
>
> pg product table has a tsvector field and a gin index defined on it +
> trigger that update the tsvector field when the textfields change.
> The trigger is made in a way that it actually update the tsvector
> just if the text fields are changed.
>
> The hardware on the 2 machines is a bit different.
> MS SQL 2005 runs on a box that is 2 years old, 2 SATA drives on RAID
> 1 hw, 2 Xeon dual core (I can't check details right now)
> PG runs on a box that has more than 5 years, 3 SCSI drives on RAID 5
> hw, 2 Xeon single core HT (Intel(R) Xeon(TM) CPU 3.20GHz, cpu fam
> 14, model 4)
>
> Both have 4Gb of ram.
> shared_buffers is 240Mb.
> Both share a similar workload.
> Both boxes were in the same "price class" when they were bought.
>
> In both tables I've to update price.
> VACUUM FULL was performed just before updating the prices.
>
> MS SQL receives a large sql file that contain all the UPDATE
> statements.
> PG receive a csv file that is loaded into a table with COPY and then
> does the update as
> update products set price=p.price from temp_price where id=p.id and
> price<>p.price;
>
> MS SQL ingurgitate the whole sql file in around 10sec.
> pg takes more than 5 min to just run the single update statement.
>
> I'd like to know if such a large difference can be justified just by
> HW difference or by a difference in the process on how data are
> loaded [1] or by the difference in performance of the 2 servers on
> this kind of workload or by some postgres config before I decide how
> to manage my time to redesign the import procedure.
>
> If HW can justify such huge difference I'll devote my time to other
> problems.
>
> I'd say that a potential culprit could be the gin index. No matter
> if the tsvector is updated or not, if the row is changed I think the
> index is going to be updated anyway.
> Somehow MS SQL circumvent this problem, possibly by building the
> equivalent of a tsvector column in a "hidden" table that
> automatically join to the "text" table.
> This add a join but reduce the cost of table modification since
> simpler (btree) indexes are faster to update.
>
> Still huge updates are rare and that table is mostly read and very
> rarely written. During unusually huge updates I may consider to drop
> the gin index.
>
> [1] I'd expect that excluding the time it takes to load the csv a
> single update should run faster than a huge list of single statement
> update
>
> --
> Ivan Sergio Borgonovo
> http://www.webthatworks.it
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Restarting the computer on which PG is running may help. I have access
to a server running PG 8.4 on Ubuntu and I have noticed that after a
day of intense use the PG slows down significantly, "free -g" reports
almost no free memory available (something seems to leak memory on
this Ubuntu box). But when I restart the OS (Ubuntu), the PG executes
my queries in good time. I seem not to have similar problems on the
other servers running Fedora 12 and 13. But it could be my
configuration(s) on the Ubuntu box at fault, I am still investigating.

Allan.

В списке pgsql-general по дате отправления:

Предыдущее
От: Ivan Sergio Borgonovo
Дата:
Сообщение: huge difference in performance between MS SQL and pg 8.3 on UPDATE with full text search
Следующее
От: Vick Khera
Дата:
Сообщение: Re: Scaling PostgreSQL-9