Re: selects from large tables

Поиск
Список
Период
Сортировка
От Nikk Anderson
Тема Re: selects from large tables
Дата
Msg-id DA1274E682D3734B8802904A9B36124C298A99@nic-nts1.nic.parallel.ltd.uk
обсуждение исходный текст
Ответ на selects from large tables  (Nikk Anderson <Nikk.Anderson@parallel.ltd.uk>)
Список pgsql-performance

Hi Charlie,
We do a vacuum analyze every night at midnight.  I thought that perhaps the analyzing was not being done correctly, so I manually did a vacuum analyze and the estimated row counts were way still out. 

I will look into clustering the data and see what effect that may have.

Thanks

Nikk

-----Original Message-----
From: Charles H. Woloszynski [mailto:chw@clearmetrix.com]
Sent: 18 November 2002 15:46
To: Nikk Anderson
Cc: 'Stephan Szabo'; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] selects from large tables

Nikk:

Are you doing vaccums on these tables?  I was under the understanding
that the estimated row count should be close to the real row count
returned, and when it is not (as it looks in your case), the primary
reason for the disconnect is that the stats for the tables are
out-of-date. 

Since it used the indexes, I am not sure if the old stats are causing
any issues, but I suspect they are not helping. 

Also, do you do any clustering of the data (since the queries are mostly
time limited)?  I am wondering if the system is doing lots of seeks to
get the data (implying that the data is all over the disk and not
clustered). 

Charlie

Nikk Anderson wrote:

> Hi,
> Thanks for the reply Stephen, the data is 'somewhat' realistic.....
>
> The data in the table is actually synthetic, but the structure is the
> same as our live system, and the queries are similar to those we
> actually carry out.
>
> As the data was synthetic there was a bit of repetition (19 million
> rows of repetition!! ) of the item used in the where clause, meaning
> that most of the table was returned by the queries - oops!  So, I have
> done is some more realistic queries from our live system, and put the
> time it takes, and the explain results.  Just to note that the
> explain's estimated number of rows is way out - its guesses are way
> too low.
>
> Typically a normal query on our live system returns between 200 and
> 30000 rows depending on the reports a user wants to generate.  In
> prior testing, we noted that using SELECT COUNT( ..   was slower than
> other queries, which is why we though we would test counts first.
>
>
> Here are some more realistic results, which still take a fair whack of
> time........
>
>
> Starting query 0
> Query 0: SELECT * FROM xx WHERE time BETWEEN '2002-11-17 14:08:58.021'
> AND '2002-11-18 14:08:58.021' AND job_id = 335
> Time taken = 697 ms
> Index Scan using http_timejobid on xx  (cost=0.00..17.01 rows=4 width=57)
> This query returns 500 rows of data
>
>
> Starting query 1
> Query 1: SELECT * FROM xx WHERE time BETWEEN '2002-11-11 14:08:58.021'
> AND '2002-11-18 14:08:58.021' AND job_id = 335
> Time taken = 15 seconds
> Index Scan using http_timejobid on xx  (cost=0.00..705.57 rows=175
> width=57)
> This query return 3582 rows
>
> Starting query 2
> Query 2: SELECT * FROM xx WHERE time BETWEEN '2002-10-19 15:08:58.021'
> AND '2002-11-18 14:08:58.021' AND job_id = 335;
> Time taken = 65 seconds
> Index Scan using http_timejobid on xx  (cost=0.00..3327.55 rows=832
> width=57)
> This query returns 15692 rows
>
> Starting query 3
> Query 3: SELECT * FROM xx_result WHERE time BETWEEN '2002-08-20
> 15:08:58.021' AND '2002-11-18 14:08:58.021' AND job_id = 335;
>
> Time taken = 241 seconds
> Index Scan using http_timejobid on xx  (cost=0.00..10111.36 rows=2547
> width=57)
> This query returns 48768 rows
>
>
> Cheers
>
> Nikk
>
>
>
>
> -----Original Message-----
> From: Stephan Szabo [mailto:sszabo@megazone23.bigpanda.com]
> Sent: 18 November 2002 13:02
> To: Nikk Anderson
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] selects from large tables
>
>
>
> On Mon, 18 Nov 2002, Nikk Anderson wrote:
>
> > Any ideas on how we can select data more quickly from large tables?
>
> Are these row estimates realistic? It's estimating nearly 20 million rows
> to be returned by some of the queries (unless I'm misreading the
> number - possible since it's 5am here).  At that point you almost
> certainly want to be using a cursor rather than plain queries since even a
> small width result (say 50 bytes) gives a very large (1 gig) result set.
>
> > - Queries and explain plans
> >
> > select count(*) from table_name;
> > NOTICE:  QUERY PLAN:
> > Aggregate  (cost=488700.65..488700.65 rows=1 width=0)
> >   ->  Seq Scan on table_name  (cost=0.00..439527.12 rows=19669412
> width=0)
> >
> > hawkdb=# explain select count(job_id) from table_name;
> > NOTICE:  QUERY PLAN:
> > Aggregate  (cost=488700.65..488700.65 rows=1 width=4)
> >   ->  Seq Scan on table_name  (cost=0.00..439527.12 rows=19669412
> width=4)
> >
> > hawkdb=# explain select * from table_name;
> > NOTICE:  QUERY PLAN:
> > Seq Scan on table_name  (cost=0.00..439527.12 rows=19669412 width=57)
> >
> > hawkdb=# explain select count(*) from table_name where job_id = 13;
> > NOTICE:  QUERY PLAN:
> > Aggregate  (cost=537874.18..537874.18 rows=1 width=0)
> >   ->  Seq Scan on table_name  (cost=0.00..488700.65 rows=19669412
> width=0)
> >
> > hawkdb=# explain select * from table_name where job_id = 13;
> > NOTICE:  QUERY PLAN:
> > Seq Scan on http_result  (cost=0.00..488700.65 rows=19669412 width=57)
> >
> > hawkdb=# explain select * from table_name where job_id = 1;
> > NOTICE:  QUERY PLAN:
> > Index Scan using http_result_pk on table_name  (cost=0.00..5.01 rows=1
> > width=57)
> >
> > hawkdb=#explain select * from table_name where time > '2002-10-10';
> > NOTICE:  QUERY PLAN:
> > Seq Scan on table_name  (cost=0.00..488700.65 rows=19649743 width=57)
> >
> > hawkdb=# explain select * from http_result where time < '2002-10-10';
> > NOTICE:  QUERY PLAN:
> > Index Scan using table_name_time on table_name  (cost=0.00..75879.17
> > rows=19669 width=57)
>

--

Charles H. Woloszynski

ClearMetrix, Inc.
115 Research Drive
Bethlehem, PA 18015

tel: 610-419-2210 x400
fax: 240-371-3256
web: www.clearmetrix.com



---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

В списке pgsql-performance по дате отправления:

Предыдущее
От: Stephan Szabo
Дата:
Сообщение: Re: selects from large tables
Следующее
От: Nikk Anderson
Дата:
Сообщение: Re: selects from large tables