Обсуждение: Query Analysis

Поиск
Список
Период
Сортировка

Query Analysis

От
Дата:
Direct Cc: would be MUCH appreciated!

I'm using PostgreSQL 7.1.3

My FIRST question is:

How come I can't seem to get any of that nifty profiling output to
/var/log/messages?

[aside]
Not in /var/log/pgsql nor in /var/lib/pgsql/data/pg.log either.  Don't
care where it goes, so long as I can find it...  While I realize that this
is very configurable, some "clues" to newbies about the usual places would
have been most welcome in the docs.
[/aside]


I have:
  Altered postgresql.conf to turn "on" the show_query_stats (et al) as
well as syslog = 2
  Altered /etc/rc.d/init/postgresql to be:
su -l postgres -s /bin/sh -c "/usr/bin/pg_ctl  -D $PGDATA -o '-i -s ' -p
/usr/bin/postmaster start  > /dev/null 2>&1" < /dev/null
  Altered /var/lib/pgsql/postmaster.opts to be:
/usr/bin/postmaster '-D' '/var/lib/pgsql/data' '-i' '-s'

Once I'm in psql, I use SET to turn them on as well.

This resulted in all my ERROR and NOTICE messages going into
/var/log/messages, but *NOT* any sort of nifty query analysis type stuff.

So what did I miss?  Is there another client/server spot where I need to
get that '-s' in there?

Is there another switch to actually kick-start it?  The docs are probably
real clear to y'all, but I'm obviously missing something simple here...


Of course, the root problem is a monster query that suddenly takes far far
too long...

I realize that I'm trying to do a full-text search, *BUT* a similar query
"works fine"...

Why does this take minutes:

SELECT DISTINCT *, 0 + (0 + 10 * (lower(title) like '%einstein%') ::int +
10 * (lower(author_flattened) like '%einstein%') ::int + 30 *
(lower(subject_flattened) like '%einstein%') ::int + 30 * (lower(text)
LIKE '%einstein%') ::int + 9 * (substring(lower(title), 1, 20) like
'%einstein%') ::int + 25 * (substring(lower(text), 1, 20) LIKE
'%einstein%') ::int ) AS points FROM article WHERE TRUE AND (FALSE OR
(lower(title) like '%einstein%') OR (lower(author_flattened) like
'%einstein%') OR (lower(subject_flattened) like '%einstein%') OR
(lower(text) LIKE '%einstein%') ) ORDER BY points desc, volume, number,
article.article LIMIT 10, 0

while this takes seconds:

SELECT *, 0 + 3 * ( title like '%Einstein%' )::int + 3 * ( author like
'%Einstein%' )::int + ( ( 1 + 1 * ( lower(text) like '%einstein%' )::int )
+ ( 0 + ( subject like '%Einstein%' )::int ) ) AS points FROM article
WHERE TRUE AND title like '%Einstein%' AND author like '%Einstein%' AND (
( TRUE AND lower(text) like '%einstein%' ) OR ( FALSE OR subject like
'%Einstein%' ) ) ORDER BY points desc, volume, number, article.article
LIMIT 10, 0


Is it the function calls to lower() which I have yet to implement on the
second query?

Is it the sheer number of rows being returned?

Do a lot of "OR" sub-parts to the WHERE drag it down?

Article has ~17000 records in it.
The 'text' field is the actual contents of a magazine article.

I would ask if it was the ~* (REGEXP) but that hasn't even kicked in for
this single-term ('Einstein') input! :-^

We're talking about minutes instead of seconds here.

All fields are of type 'text'

VACUUM VERBOSE ANALYZE is running nightly

/proc/cpuinfo sez:
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 11
model name      : Intel(R) Pentium(R) III CPU family      1400MHz
stepping        : 1
cpu MHz         : 1406.005
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 mmx fxsr sse
bogomips        : 2804.94

Finally, any "rules of thumb" about that one 512 RAM size thingie in
postmaster.conf would be especially appreciated...

If you're willing to actually poke at the search engine with other inputs,
I'd be happy to provide a URL off-list.




Re: Query Analysis

От
Josh Berkus
Дата:
"typea":

> Why does this take minutes:
>
> SELECT DISTINCT *, 0 + (0 + 10 * (lower(title) like '%einstein%') ::int +
> 10 * (lower(author_flattened) like '%einstein%') ::int + 30 *
> (lower(subject_flattened) like '%einstein%') ::int + 30 * (lower(text)
> LIKE '%einstein%') ::int + 9 * (substring(lower(title), 1, 20) like
> '%einstein%') ::int + 25 * (substring(lower(text), 1, 20) LIKE
> '%einstein%') ::int ) AS points FROM article WHERE TRUE AND (FALSE OR
> (lower(title) like '%einstein%') OR (lower(author_flattened) like
> '%einstein%') OR (lower(subject_flattened) like '%einstein%') OR
> (lower(text) LIKE '%einstein%') ) ORDER BY points desc, volume, number,
> article.article LIMIT 10, 0
>
> while this takes seconds:
>
> SELECT *, 0 + 3 * ( title like '%Einstein%' )::int + 3 * ( author like
> '%Einstein%' )::int + ( ( 1 + 1 * ( lower(text) like '%einstein%' )::int )
> + ( 0 + ( subject like '%Einstein%' )::int ) ) AS points FROM article
> WHERE TRUE AND title like '%Einstein%' AND author like '%Einstein%' AND (
> ( TRUE AND lower(text) like '%einstein%' ) OR ( FALSE OR subject like
> '%Einstein%' ) ) ORDER BY points desc, volume, number, article.article
> LIMIT 10, 0

It's probably mostly the SELECT DISTINCT, which aggregates records and is
therefore slow.    Try running EXPLAIN ANALYZE to see what steps are actually
taking the time.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: Query Analysis

От
Дата:
Since it's 7.1.3 I don't have the "ANALYZE" bit in EXPLAIN, but:

archive_beta=> explain SELECT DISTINCT *, 0 + (0 + 10 * (lower(title) like
'%einstein%') ::int + 10 * (lower(author_flattened) like '%einstein%')
::int + 30 * (lower(subject_flattened) like '%einstein%') ::int + 30 *
(lower(text) LIKE '%einstein%') ::int + 9 * (substring(lower(title), 1,
20) like '%einstein%') ::int + 25 * (substring(lower(text), 1, 20) LIKE
'%einstein%') ::int ) AS points FROM article WHERE TRUE AND (FALSE OR
(lower(title) like '%einstein%') OR (lower(author_flattened) like
'%einstein%') OR (lower(subject_flattened) like '%einstein%') OR
(lower(text) LIKE '%einstein%') ) ORDER BY points desc, volume, number,
article.article LIMIT 10, 0 ;
NOTICE:  QUERY PLAN:

Limit  (cost=1374.97..1375.02 rows=1 width=212)
  ->  Unique  (cost=1374.97..1375.02 rows=1 width=212)
        ->  Sort  (cost=1374.97..1374.97 rows=1 width=212)
              ->  Seq Scan on article  (cost=0.00..1374.96 rows=1 width=212)

EXPLAIN
archive_beta=> explain  SELECT *, 0 + 3 * ( title like '%Einstein%' )::int
+ 3 * ( author like  '%Einstein%' )::int + ( ( 1 + 1 * ( lower(text) like
'%einstein%' )::int )  + ( 0 + ( subject like '%Einstein%' )::int ) ) AS
points FROM article  WHERE TRUE AND title like '%Einstein%' AND author
like '%Einstein%' AND (  ( TRUE AND lower(text) like '%einstein%' ) OR (
FALSE OR subject like  '%Einstein%' ) ) ORDER BY points desc, volume,
number, article.article  LIMIT 10, 0;
NOTICE:  QUERY PLAN:

Limit  (cost=1243.48..1243.48 rows=1 width=212)
  ->  Sort  (cost=1243.48..1243.48 rows=1 width=212)
        ->  Seq Scan on article  (cost=0.00..1243.47 rows=1 width=212)

While the first one is higher, these two do not seem drastically different
to me -- Those numbers are accumulative, right?  So the top row is my
"final answer"  The extra Unique row doesn't seem to be adding
significantly to the numbers as far as EXPLAIN can tell...

And yet the queries are orders of magnitude apart in actual performance.

'Course, I don't claim to completely understand the output of EXPLAIN yet
either.

I also took out the DISTINCT in the first one, just to test.  It was
certainly "faster" but not nearly so much that it "caught up" to the other
query.

Thanks in advance for any help!




Re: Query Analysis

От
Дата:
I think I've narrowed down my problem space a bit.

Playing with various "fast" versus "slow" queries leads me to ask:

GIVEN:

15,000 reocrds with a 'text' field named 'text'
Average 'text' length about 10 K.
Full text search using lower() and LIKE and even ~* sometimes on that
field with a keyword.

What can be done to maximize performance on such large chunks of text?

More RAM?
Tweak that 512 number in postmaster.conf?
Faster CPU?
Is my only option to resort to a concordance?




Re: Query Analysis

От
Tom Lane
Дата:
<typea@l-i-e.com> writes:
> 15,000 reocrds with a 'text' field named 'text'
> Average 'text' length about 10 K.
> Full text search using lower() and LIKE and even ~* sometimes on that
> field with a keyword.

Consider using a full-text-indexing method (there are a couple in
contrib, and OpenFTS has a website that was mentioned recently in
the mail lists).

            regards, tom lane