Re: Slow query on OS X box

From: Patrick Hatcher
Subject: Re: Slow query on OS X box
Date:
Msg-id: OF2F8C390D.61ED20AE-ON88256CB7.00006621@fds.com
In response to: Slow query on OS X box  ("Patrick Hatcher" <PHatcher@macys.com>)
List: pgsql-performance
Thanks everyone.  I'll give your suggestions a try and report back.

Patrick Hatcher
Macys.Com
Legacy Integration Developer
415-422-1610 office
HatcherPT - AIM





          
Rod Taylor <rbt@rbt.ca>
Sent by: pgsql-performance-owner@postgresql.org
To: Patrick Hatcher <PHatcher@macys.com>
cc: Postgresql Performance <pgsql-performance@postgresql.org>
Subject: Re: [PERFORM] Slow query on OS X box
Date: 01/22/2003 02:54 PM

Yup, since you still need to pull everything off the disk (the slowest
part), which is quite a bit of data.  You're simply dealing with a lot
of data for a single query -- not much you can do.

Is this a dedicated box -- one client doing big selects like this?

Knock your shared_buffers down to about 2000, and bump your sort_mem up to
around 32MB (128MB or so if it's a dedicated box where the vast majority of
queries look like the one below).


Okay, now to do something about the rest of the data.  13 million rows * 2k
each is a big number.  Do you have a set of columns that are rarely used?  If
so, toss them into a separate table and link via a unique identifier
(int4).  It'll cost extra when you do hit them, but pulling a few of the
large, rarely-touched columns out of the main table would buy quite a bit.
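
A sketch of that kind of split -- the column names here are hypothetical,
since the actual search_log schema wasn't posted:

-- Hypothetical: move wide, rarely-used columns into a side table
-- keyed by the same int4 identifier as search_log.
CREATE TABLE search_log_detail (
    log_id     int4 PRIMARY KEY,   -- matches an assumed search_log.log_id
    user_agent varchar(1024),      -- wide and rarely queried
    referrer   varchar(1024)
);

-- Queries that do need those columns pay one extra join:
SELECT s.wizard, d.user_agent
  FROM search_log s JOIN search_log_detail d ON d.log_id = s.log_id;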

Now, wizard.  For that particular query it would be best to make entries for
all the values of wizard in a lookup table, and change search_log.wizard
into a reference to that entry in the lookup.  Index the lookup table well
(one index on the int4 primary key, and a unique index on the 'wizard'
varchar).  Group by the number, then join to the lookup table for the name.
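
Roughly like this (table and column names are illustrative, not from your
schema):

-- Lookup table for the repetitive wizard names.
CREATE TABLE wizard_lookup (
    wizard_id int4 PRIMARY KEY,
    wizard    varchar NOT NULL
);
CREATE UNIQUE INDEX wizard_lookup_wizard_key ON wizard_lookup (wizard);

-- search_log.wizard becomes an int4 wizard_id referencing the lookup.
-- Group by the number, then join for the name:
SELECT l.wizard, c.cnt
  FROM (SELECT wizard_id, count(*) AS cnt
          FROM search_log
         WHERE sdate BETWEEN '2002-12-01' AND '2003-01-15'
         GROUP BY wizard_id) AS c
  JOIN wizard_lookup l ON l.wizard_id = c.wizard_id;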

Any other values with highly repetitive data?  Might want to consider
doing the same for them.

In search_log, index the numeric representation of 'wizard' (the key from
the lookup table), but don't bother indexing values that occur very
frequently.  Look up how to create a partial index.  I.e., the value
'Keyword' could be skipped, as it shows up in roughly one of every four
tuples -- too often for an index to be useful.


On Wed, 2003-01-22 at 15:49, Patrick Hatcher wrote:
> Sorry I'm being really dense today.  I didn't even notice the 3.2 million
> rows being returned. :(
>
> To answer your question, no, all fields would not have data.  The data we
> receive is from a Web log file.  It's parsed and then uploaded to this
> table.
>
> I guess the bigger issue is that when trying to do aggregates, grouping by
> the wizard field, it takes just as long.
>
> Ex:
> mdc_oz=# explain analyze select wizard,count(wizard) from search_log where
> sdate between '2002-12-01' and '2003-01-15' group by wizard;
>                                  QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=1083300.43..1112411.55 rows=388148 width=10) (actual time=229503.85..302617.75 rows=14 loops=1)
>    ->  Group  (cost=1083300.43..1102707.84 rows=3881482 width=10) (actual time=229503.60..286014.83 rows=3717161 loops=1)
>          ->  Sort  (cost=1083300.43..1093004.14 rows=3881482 width=10) (actual time=229503.57..248415.81 rows=3717161 loops=1)
>                Sort Key: wizard
>                ->  Seq Scan on search_log  (cost=0.00..575217.57 rows=3881482 width=10) (actual time=91235.76..157559.58 rows=3717161 loops=1)
>                      Filter: ((sdate >= '2002-12-01'::date) AND (sdate <= '2003-01-15'::date))
>  Total runtime: 302712.48 msec
> (7 rows)

> On Wed, 2003-01-22 at 13:26, Patrick Hatcher wrote:
> > I have a table that contains over 13 million rows.  This query takes an
> > extremely long time to return.  I've vacuum full, analyzed, and re-indexed
> > the table.  Still the results are the same.  Any ideas?
> >
> > mdc_oz=# explain analyze select wizard from search_log where wizard
> > ='Keyword' and sdate between '2002-12-01' and '2003-01-15';
> >                                  QUERY PLAN
> > -----------------------------------------------------------------------------------------------------------------------------
> >  Seq Scan on search_log  (cost=0.00..609015.34 rows=3305729 width=10) (actual time=99833.83..162951.25 rows=3280573 loops=1)
> >    Filter: ((wizard = 'Keyword'::character varying) AND (sdate >= '2002-12-01'::date) AND (sdate <= '2003-01-15'::date))
> >  Total runtime: 174713.25 msec
> > (3 rows)
--
Rod Taylor <rbt@rbt.ca>

PGP Key: http://www.rbt.ca/rbtpub.asc
(See attached file: signature.asc)

