Обсуждение: Adding TEXT columns tanks performance?

Поиск
Список
Период
Сортировка

Adding TEXT columns tanks performance?

От
Arturo Perez
Дата:
Hi all,

Saturday I changed a table to add a varchar(24) and a TEXT column.
It's used for some reporting purposes (small potatoe stuff really)
and the TEXT column remains mostly empty.  However, this week
performance has gotten terrible.  Queries joining against the
aforementioned table have gone from 40s to 1500s.  The schema change
is the only explanation I have for the 30x slower queries.  The
queries have definitely gotten disk-bound (I can see the connection
process sitting in the D state for several minutes).

This is pg8.1.4 on Linux RedHat. Would adding a TEXT column impact
things that much?  The change to the schema was

alter table foo add column title text;

tia,
arturo


Re: Adding TEXT columns tanks performance?

От
Ron Johnson
Дата:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 02/08/07 20:50, Arturo Perez wrote:
> Hi all,
>
> Saturday I changed a table to add a varchar(24) and a TEXT column.  It's
> used for some reporting purposes (small potatoe stuff really) and the
> TEXT column remains mostly empty.  However, this week performance has
> gotten terrible.  Queries joining against the aforementioned table have
> gone from 40s to 1500s.  The schema change is the only explanation I
> have for the 30x slower queries.  The queries have definitely gotten
> disk-bound (I can see the connection process sitting in the D state for
> several minutes).
>
> This is pg8.1.4 on Linux RedHat. Would adding a TEXT column impact
> things that much?  The change to the schema was
>
> alter table foo add column title text;

Could the issue be record fragmentation?  Does PG even suffer from
record fragmentation?

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFFy/YRS9HxQb37XmcRAlM3AKCmcuEQR0OBrNrs6vtdfrxtMgmswgCggVKS
d8W9YEzDUmCfT53jorbCq9w=
=xpiF
-----END PGP SIGNATURE-----

Re: Adding TEXT columns tanks performance?

От
"Merlin Moncure"
Дата:
On 2/8/07, Arturo Perez <aperez@hayesinc.com> wrote:
> Hi all,
>
> Saturday I changed a table to add a varchar(24) and a TEXT column.
> It's used for some reporting purposes (small potatoe stuff really)
> and the TEXT column remains mostly empty.  However, this week
> performance has gotten terrible.  Queries joining against the
> aforementioned table have gone from 40s to 1500s.  The schema change
> is the only explanation I have for the 30x slower queries.  The
> queries have definitely gotten disk-bound (I can see the connection
> process sitting in the D state for several minutes).
>
> This is pg8.1.4 on Linux RedHat. Would adding a TEXT column impact
> things that much?  The change to the schema was
>
> alter table foo add column title text;

explain analyze please.

no reason for adding text column to do that.  especially since you
didn't default the column which would effectively update the entire
table.

merlin

Re: Adding TEXT columns tanks performance?

От
Arturo Perez
Дата:
On Feb 9, 2007, at 11:43 AM, Merlin Moncure wrote:

> On 2/8/07, Arturo Perez <aperez@hayesinc.com> wrote:
>> Hi all,
>>
>> Saturday I changed a table to add a varchar(24) and a TEXT column.
>> It's used for some reporting purposes (small potatoe stuff really)
>> and the TEXT column remains mostly empty.  However, this week
>> performance has gotten terrible.  Queries joining against the
>> aforementioned table have gone from 40s to 1500s.  The schema change
>> is the only explanation I have for the 30x slower queries.  The
>> queries have definitely gotten disk-bound (I can see the connection
>> process sitting in the D state for several minutes).
>>
>> This is pg8.1.4 on Linux RedHat. Would adding a TEXT column impact
>> things that much?  The change to the schema was
>>
>> alter table foo add column title text;
>
> explain analyze please.
>
> no reason for adding text column to do that.  especially since you
> didn't default the column which would effectively update the entire
> table.
>
> merlin


Here's the explain analyze.  Note that this query was never very fast
but's it has literally gotten two orders of magnitude slower. This is
with 8.1.4 on linux with nothing special in terms of disks (I think
it's a mirrored system drive).  It's a hosted environment kind of thing.

Shared buffers is 160MB (20000) and effective cache is 1GB.  The
user_tracking
table has about 2M rows and the extended_user table has about 6K.
I'm not very
good at reading these but it looks like sort memory might be too
low?  work_mem is 1024,
the default.  I did this "set session work_mem to 100000;" in psql
but it still takes
quite a while.

iht=> explain analyze SELECT session_id, action, count(ACTION) as hits
iht->                                 FROM extended_user LEFT JOIN
user_tracking USING (user_id)
iht->                                 WHERE subscription_id = 1147
iht->                                 GROUP BY session_id, action
iht->                                 HAVING count(ACTION) > 0;

              QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
----------------------------------
GroupAggregate  (cost=172717.49..173695.46 rows=32599 width=60)
(actual time=411713.041..411761.857 rows=7309 loops=1)
    Filter: (count("action") > 0)
    ->  Sort  (cost=172717.49..172798.99 rows=32599 width=60) (actual
time=411712.907..411732.032 rows=16012 loops=1)
          Sort Key: user_tracking.session_id, user_tracking."action"
          ->  Merge Left Join  (cost=0.00..169571.78 rows=32599
width=60) (actual time=147593.828..411070.706 rows=16012 loops=1)
                Merge Cond: ("outer".user_id = "inner".user_id)
                ->  Index Scan using extended_user_pkey on
extended_user  (cost=0.00..236.92 rows=117 width=4) (actual
time=1.627..154.499 rows=119 loops=1)
                      Filter: (subscription_id = 1147)
                ->  Index Scan using user_tracking_user_id_idx on
user_tracking  (cost=0.00..164008.04 rows=2000218 width=64) (actual
time=0.010..408731.064 rows=2000620 loops=1)
Total runtime: 411781.174 ms
(10 rows)



Re: Adding TEXT columns tanks performance?

От
Tom Lane
Дата:
Arturo Perez <aperez@hayesinc.com> writes:
> Saturday I changed a table to add a varchar(24) and a TEXT column.

You didn't actually say which of these tables you changed?

> I'm not very good at reading these but it looks like sort memory might
> be too low?

The runtime seems to be entirely in the index scan on user_tracking.
I'm surprised it doesn't do something to avoid a full-table indexscan
--- in this case, hashing with extended_user as the inner relation would
seem like the obvious thing.  Is user_id a hashable datatype?

It's possible that adding the columns would have affected the plan by
making it look like a sort or hash would take too much memory, but if
that were it then your hand increase in work_mem should have fixed it.
Tis odd.  I don't suppose you know what plan was used before?

            regards, tom lane

Re: Adding TEXT columns tanks performance?

От
Arturo Perez
Дата:
On Feb 10, 2007, at 12:34 PM, Tom Lane wrote:

> Arturo Perez <aperez@hayesinc.com> writes:
>> Saturday I changed a table to add a varchar(24) and a TEXT column.
>
> You didn't actually say which of these tables you changed?

Sorry, I changed extended_user.

>
>> I'm not very good at reading these but it looks like sort memory
>> might
>> be too low?
>
> The runtime seems to be entirely in the index scan on user_tracking.
> I'm surprised it doesn't do something to avoid a full-table indexscan
> --- in this case, hashing with extended_user as the inner relation
> would
> seem like the obvious thing.  Is user_id a hashable datatype?

user_id is an integer;  Here are the table definitions, since this
seems like a problem
that won't go away anytime soon.

                 Table "public.extended_user"
       Column       |            Type             | Modifiers
-------------------+-----------------------------+-----------
create_date       | timestamp without time zone | not null
email             | character varying(99)       |
first_name        | character varying(99)       | not null
last_name         | character varying(99)       | not null
license_agreement | boolean                     | not null
license_date      | timestamp without time zone |
password          | character varying(32)       | not null
subscription_id   | integer                     | not null
user_id           | integer                     | not null
user_name         | character varying(99)       | not null
active            | boolean                     | not null
phone             | character varying(24)       |
title             | text                        |
Indexes:
     "extended_user_pkey" PRIMARY KEY, btree (user_id) CLUSTER
     "user_name_uq" UNIQUE, btree (user_name)
     "extended_user_subscription_id_idx" btree (subscription_id)
Foreign-key constraints:
     "extended_user_subscription_id_fkey" FOREIGN KEY
(subscription_id) REFERENCES subscription(subscription_id) DEFERRABLE
INITIALLY DEFERRED

\d user_tracking
                                                Table
"public.user_tracking"
       Column      |            Type
|                                Modifiers
------------------+-----------------------------
+-----------------------------------------------------------------------
---
action           | character varying(255)      | not null
entry_date       | timestamp without time zone | not null
note             | text                        |
report_id        | integer                     |
session_id       | character varying(255)      | not null
user_id          | integer                     |
user_tracking_id | integer                     | not null default
nextval('user_tracking_user_tracking_id_seq'::regclass)
Indexes:
     "user_tracking_pkey" PRIMARY KEY, btree (user_tracking_id)
     "user_tracking_monthly_idx" btree (date_part('year'::text,
entry_date), date_part('month'::text, entry_date))
     "user_tracking_quarterly_idx" btree (date_part('year'::text,
entry_date), date_part('quarter'::text, entry_date))
     "user_tracking_report_id_idx" btree (report_id)
     "user_tracking_user_id_idx" btree (user_id)
Foreign-key constraints:
     "user_tracking_report_id_fkey" FOREIGN KEY (report_id)
REFERENCES article(article_id) DEFERRABLE INITIALLY DEFERRED
     "user_tracking_user_id_fkey" FOREIGN KEY (user_id) REFERENCES
extended_user(user_id) DEFERRABLE INITIALLY DEFERRED


>
> It's possible that adding the columns would have affected the plan by
> making it look like a sort or hash would take too much memory, but if
> that were it then your hand increase in work_mem should have fixed it.
> Tis odd.  I don't suppose you know what plan was used before?
>
>             regards, tom lane

No, sorry.  Further information:  on disk the user_tracking table is
over 500MB, I can't
increase shared_buffers (currently 20000) because of SHMMAX limits
(for now, scheduled outage
and all that).

Any suggestions on how to improve the situation?

tias,
-arturo


Re: Adding TEXT columns tanks performance?

От
Arturo Perez
Дата:
On Feb 10, 2007, at 12:34 PM, Tom Lane wrote:

> Arturo Perez <aperez@hayesinc.com> writes:
>> Saturday I changed a table to add a varchar(24) and a TEXT column.
>
> You didn't actually say which of these tables you changed?
>
>> I'm not very good at reading these but it looks like sort memory
>> might
>> be too low?
>
> The runtime seems to be entirely in the index scan on user_tracking.
> I'm surprised it doesn't do something to avoid a full-table indexscan
> --- in this case, hashing with extended_user as the inner relation
> would
> seem like the obvious thing.  Is user_id a hashable datatype?
>
> It's possible that adding the columns would have affected the plan by
> making it look like a sort or hash would take too much memory, but if
> that were it then your hand increase in work_mem should have fixed it.
> Tis odd.  I don't suppose you know what plan was used before?
>
>             regards, tom lane


I did this and now the thing is nicely faster:

iht=> alter table user_tracking alter column user_id set statistics 500;
ALTER TABLE
iht=> analyze user_tracking;
ANALYZE
iht=> explain analyze SELECT session_id, action, count(ACTION) as hits
iht->                                 FROM extended_user LEFT JOIN
user_tracking USING (user_id)
iht->                                 WHERE subscription_id = 1147
iht->                                 GROUP BY session_id, action
iht->                                 HAVING count(ACTION) > 0;

         QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
-------------------------
GroupAggregate  (cost=125961.69..127082.82 rows=37371 width=60)
(actual time=679.115..725.317 rows=7312 loops=1)
    Filter: (count("action") > 0)
    ->  Sort  (cost=125961.69..126055.12 rows=37371 width=60) (actual
time=679.067..697.588 rows=16017 loops=1)
          Sort Key: user_tracking.session_id, user_tracking."action"
          ->  Nested Loop Left Join  (cost=5.64..122319.43 rows=37371
width=60) (actual time=0.160..118.177 rows=16017 loops=1)
                ->  Index Scan using
extended_user_subscription_id_idx on extended_user
(cost=0.00..161.08 rows=134 width=4) (actual time=0.066..1.289
rows=119 loops=1)
                      Index Cond: (subscription_id = 1147)
                ->  Bitmap Heap Scan on user_tracking
(cost=5.64..905.77 rows=469 width=64) (actual time=0.162..0.730
rows=135 loops=119)
                      Recheck Cond: ("outer".user_id =
user_tracking.user_id)
                      ->  Bitmap Index Scan on
user_tracking_user_id_idx  (cost=0.00..5.64 rows=469 width=0) (actual
time=0.139..0.139 rows=135 loops=119)
                            Index Cond: ("outer".user_id =
user_tracking.user_id)
Total runtime: 732.520 ms
(12 rows)


thanks all,
arturo