Re: raising the default default_statistics_target

From: scott.marlowe
Subject: Re: raising the default default_statistics_target
Date:
Msg-id: Pine.LNX.4.33.0403091009280.6001-100000@css120.ihs.com
In reply to: Re: raising the default default_statistics_target  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: raising the default default_statistics_target  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
On Sun, 7 Mar 2004, Tom Lane wrote:

> Neil Conway <neilc@samurai.com> writes:
> > Tom Lane wrote:
> >> This is something we need to consider, but we'll need more evidence
> >> before making a choice.  One thing that we have very little data about
> >> is how much difference it makes in the quality of planner choices.
> 
> > Right, but is there a practical way to actually get this data?
> 
> I haven't thought of one yet, but perhaps someone will have an idea.

Hi Tom.  I ran some very simple tests of analyze times and query plan 
times on a very simple table, with data randomly distributed.  The index 
was on a date field, since that's what I was testing last.

This was all done on my workstation, a 1.1 GHz Celeron with 512 MB of 
memory and an IDE drive.  I'd love more input on better testing 
methodologies here...

The table has 100k or 1M rows that look kinda like this (I'll test 10M 
rows later, which means the dataset won't fit in memory, so there'll be 
lots of disk access going on; right now the 1M row table is 80 MB):
select * from test2 limit 5;
                    info                     |         dt          |   id
---------------------------------------------+---------------------+---------
 Francize perfectible swirling fluctuates    | 2004-05-20 20:12:04 | 2721995
 Fields chauffeur attentionality grandmother | 2004-04-07 14:36:02 | 2721996
 Belgium bilked explosively defendant        | 2004-09-16 16:27:22 | 2721997
 perspectives Buenos Pollux discriminates    | 2004-11-11 12:28:31 | 2721998
 Victorianize Savonarola blackmails sufficed | 2004-02-27 21:17:20 | 2721999
(5 rows)
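For completeness, here is a rough sketch of how a table like this could be populated.  This is illustrative only, not the script I actually used; `md5(random())` and random timestamp offsets stand in for my word-list generator:

```sql
-- Illustrative sketch only: build a similar randomly-distributed table.
CREATE TABLE test2 (info text, dt timestamp, id serial);

-- Random text plus timestamps spread uniformly across one year.
INSERT INTO test2 (info, dt)
SELECT md5(random()::text),
       timestamp '2004-01-01' + random() * interval '365 days'
FROM generate_series(1, 1000000);

CREATE INDEX test2_dt_idx ON test2 (dt);
```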

Here's what I get with different statistics targets for analyze (and plan) 
times:

        100k       1M         1M
        analyze    analyze    plan
target  ms         ms         ms
10      250        875        2
20      350        1250
30      430        1500
40      520        1725
50      580        1900
60      690        2100
70      775        2175
80      850        2300
90      950        2400
100     1000       2600       2.5
200     1806       3700
300     2600       4800
400     2600       5900
500     2600       7200
700     2600       9500
1000    2600       13000      5

Since this data is randomly distributed, I didn't bother doing a lot of 
testing to see how accurate each target setting was.  If that would be 
useful to know I'd gladly test it, but I was only setting out to test the 
time to analyze and the time to plan.
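For reference, the sweep above can be scripted roughly like this (run in psql with \timing on and the reported times recorded by hand; this is just one way to do it):

```sql
-- Sketch of the measurement loop.  default_statistics_target applies
-- to any column that has no per-column override.
SET default_statistics_target = 10;
ANALYZE test2;     -- record the elapsed time

SET default_statistics_target = 100;
ANALYZE test2;

-- Alternatively, a per-column target persists in the catalog:
ALTER TABLE test2 ALTER COLUMN dt SET STATISTICS 100;
ANALYZE test2 (dt);
```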

Note that I only tested 3 targets for planning time, as it didn't seem to 
make a very big difference.  The query was:

select * from test2 where dt between 'march 11, 2004' and 'march 13, 2004';
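To see whether the extra statistics actually change the chosen plan (which is the question of plan quality, separate from planning time), one could diff EXPLAIN output at each target; a minimal sketch:

```sql
-- Compare the plan chosen at a low and a high target.
SET default_statistics_target = 10;
ANALYZE test2;
EXPLAIN select * from test2
 where dt between 'march 11, 2004' and 'march 13, 2004';

SET default_statistics_target = 1000;
ANALYZE test2;
EXPLAIN select * from test2
 where dt between 'march 11, 2004' and 'march 13, 2004';
```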

I also ran some quick tests on smaller tables (1000 and 10k rows) and 
there, the plateau that we see in the 100k analyze shows up much quicker, 
at something like 50 or so.  I.e. the analyze time flattened out quickly 
and higher numbers cost very little if anything.

Since this query was quite an easy plan, I'd expect to need a much more 
complex one to test the increase in planning time, say something that has 
to look at a lot of statistics.  Any particular join type or something 
that's likely to do that?
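The kind of thing I have in mind is a multi-way join with selective predicates on several analyzed columns, so the planner has to consult both range estimates and join selectivities.  A purely hypothetical shape, self-joining the same test table:

```sql
-- Hypothetical: planning this has to estimate two join selectivities
-- plus a range predicate and an inequality, all from the statistics.
SELECT a.id, b.info, c.dt
FROM test2 a
JOIN test2 b ON a.id = b.id + 1
JOIN test2 c ON b.dt = c.dt
WHERE a.dt BETWEEN 'march 1, 2004' AND 'june 1, 2004'
  AND c.id < 500000;
```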



