
How can this be optimized, if possible?

From:
"Net Virtual Mailing Lists"
Date:
Hello,


My database has grown far faster than expected, and a query which used to
run acceptably now does not.   I'm trying to figure out a way to make
this operate faster and scale better.   I'm very open to the idea that
this does not need to be done with a SQL query at all - right now I
really just need some conceptual/architectural help on this one.


So I have two tables:

         Table "category"
    Column    |       Type        | Modifiers
--------------+-------------------+-----------
 head_title   | character varying |
 cat_title    | character varying |
 subcat_title | character varying |
 category     | ltree             |



                          Table "test"
   Column   |  Type   |                Modifiers
------------+---------+------------------------------------------------------
 id         | integer | not null default nextval('master.test_id_seq'::text)
 category   | ltree[] |


... there are other fields in the test table, but these are really the
only two relevant to this.


The query I want to run against these two tables is something like this:

SELECT
 count(*) as count,
 category.category,
 nlevel(category.category) AS level,
 subpath(category.category,0,nlevel(category.category)-1) as parent,
 category.head_title,
 category.cat_title,
 category.subcat_title
FROM
 test,
 category
WHERE
 test.category <@ category.category
GROUP BY
  category.category, category.head_title, category.cat_title,
  category.subcat_title;


Many times the "WHERE" clause will contain additional search criteria on
the 'test' table.  What I am trying to get is a count of how many rows
from the test table fall into each category, limited by the search
criteria.

This query is starting to take an enormous amount of time (30+ seconds),
and I really need the results in a couple of seconds at most.  I can do
a "SELECT category FROM test" and it completes in about 0.5 seconds.  The
category table currently has only 225 rows, the test table approximately
30,000.  "SELECT count(category), category FROM test GROUP BY category"
is quite slow, and I thought of making a materialized view of it, but
then of course I don't see any way to limit that result by my search
criteria.

I am completely open to re-architecting this entirely - performance of
this query is critical to my application - I really just am not sure
where to start.  It seems like everything I do is worse than what I
started with.

... It *almost* seems as if I need to build some sort of "search engine
like" tool which performs all the queries against the database, has tons
of memory, and caches the category attributes for each record in memory.
That sure seems like a lot of work, though - I sincerely hope there is an
easier way.....

Thanks for your help, as always!

- Greg


Re: How can this be optimized, if possible?

From:
Greg Stark
Date:
"Net Virtual Mailing Lists" <mailinglists@net-virtual.com> writes:

> The query I want to run against these two tables is something like this:
>
> SELECT
>  count(*) as count,
>  category.category,
>  nlevel(category.category) AS level,
>  subpath(category.category,0,nlevel(category.category)-1) as parent,
>  category.head_title,
>  category.cat_title,
>  category.subcat_title
> FROM
>  test,
>  category
> WHERE
>  test.category <@ category.category
> GROUP BY
>   category.category, category.head_title, category.cat_title,
>   category.subcat_title;
>
>
> Many times the "WHERE" clause will contain additional search criteria on
> the 'test' table.  What I am trying to get is a count of how many rows
> from the test table fall into each category, being limited by the search
> criteria.

Post the output of EXPLAIN ANALYZE SELECT ...

Also list any indexes you have on the tables. Do you have a GIST index on
the ltree column in test?
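
If there isn't one, a GiST index can be created like this (a sketch - the index name is illustrative; the table and column names come from the schema above):

```sql
-- GiST index on test.category; supports the <@ containment operator
CREATE INDEX test_category_gist_idx ON test USING gist (category);

-- Then check with EXPLAIN ANALYZE whether the planner actually uses it:
EXPLAIN ANALYZE
SELECT count(*)
  FROM test, category
 WHERE test.category <@ category.category;
```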

There are a number of ways of rewriting this query; you'll probably have some
success with one of them.

You could for example write it:

SELECT *,
 (SELECT count(*)
    FROM test
   WHERE category <@ category.category
    [AND search criteria...]) as count
 FROM category

Normally I would say your form with the join gives the planner the maximum
flexibility, but I don't think the planner is going to be able to do any
better than nested loops with a join clause like that so I don't think this
will be any worse than the join. And it might have a better chance of using an
index on test.category.

But note that it's still got to do 300 scans of the test index. If each one
takes 0.5s then this query is still going to take 150s or so. But with a GiST
index on the test.category column it may be more like 10s total. It will depend
partly on how many categories you have that span a large number of records in
test. That is, how many "parent" categories you have.


--
greg

Re: How can this be optimized, if possible?

From:
"Dann Corbit"
Date:
Would it help if you created a trigger that puts a category count into a
statistics table for each table?  Perhaps you could gather most of the
information you need from such a process.  If an estimate is good enough
and you are using the statistics collector and if you have an index on
that column, perhaps you could even query the system statistics tables.
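
A minimal sketch of the trigger idea, assuming PostgreSQL 8.0+ with PL/pgSQL; the table and function names other than "test" are hypothetical:

```sql
-- Hypothetical running-count table, one row per category
CREATE TABLE category_counts (
    category ltree PRIMARY KEY,
    cnt      integer NOT NULL DEFAULT 0
);

-- Bump the counter for every category an inserted row belongs to
CREATE OR REPLACE FUNCTION bump_category_counts() RETURNS trigger AS $$
BEGIN
    FOR i IN array_lower(NEW.category, 1) .. array_upper(NEW.category, 1) LOOP
        UPDATE category_counts
           SET cnt = cnt + 1
         WHERE category = NEW.category[i];
        IF NOT FOUND THEN
            INSERT INTO category_counts (category, cnt)
            VALUES (NEW.category[i], 1);
        END IF;
    END LOOP;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER test_category_count
    AFTER INSERT ON test
    FOR EACH ROW EXECUTE PROCEDURE bump_category_counts();
```

Corresponding DELETE/UPDATE triggers would be needed to keep the counts exact, and a precomputed table like this only helps when the extra search criteria are not in play.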

Do you have a large number of categories, or is it only a few?

If it is only 10 categories or so, and if they do not change, then
perhaps you can reformulate it as a collection of equijoins.

What does the query plan look like?  What sort of hardware are you
using?  What version of PostgreSQL are you using?

Your query speed does seem shockingly slow for only 30,000 rows in the
largest table.

> -----Original Message-----
> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Net Virtual Mailing Lists
> Sent: Monday, July 04, 2005 9:47 PM
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] How can this be optimized, if possible?
>
> [original message snipped; quoted in full above]