Re: testing/predicting optimization using indexes

Поиск
Список
Период
Сортировка
От TJ O'Donnell
Тема Re: testing/predicting optimization using indexes
Дата
Msg-id 41F819C3.4080507@acm.org
обсуждение исходный текст
Ответ на Re: testing/predicting optimization using indexes  (PFC <lists@boutiquenumerique.com>)
Ответы Re: testing/predicting optimization using indexes  (PFC <lists@boutiquenumerique.com>)
Список pgsql-sql
I was puzzled as to why my search slowed down when I added columns.
The VACUUM did not restore the former speed,
which I had obtained before adding the columns.
So, I rebuilt the table with only the smiles column and my original
speed was again obtained (not surprising).
After I added the extra columns, it slowed down again.
Finally, I built the table with all the additional columns created
during the initial creation of the table.  The original speed was obtained!
I conclude that the addition of columns after building all the rows of
a table somehow makes the table access less efficient.  Is this generally
true?  Is there a more efficient way to add columns to a table after its
initial construction?

The secondary issue was one of using an index on the additional columns.
This greatly speeds up the overall search, by limiting the number of
rows needing to use oe_matches.  I am currently working on optimizing the
number and nature of these extra columns.  However, my initial question
still remains.  Once I find a good set of columns to use as an index,
will I then get even greater speed by defining a new data type and an
index method equivalent to my multi-column index?

Here are the data you requested.  I think this is less important now that
I know I should create all my columns from the beginning.
Thanks for the tip on how to compute average time spent in my
oe_matches functions.  This will be very useful for future optimization.

SELECT count(*) FROM structure
237597

SELECT avg(length(smiles)) FROM structure
37.6528912402092619

VACUUM FULL ANALYZE structure
(no output)

EXPLAIN ANALYZE SELECT oe_matches(smiles,'c1ccccc1CC(=O)NC') FROM  structure
Seq Scan on structure  (cost=0.00..7573.96 rows=237597 width=41) (actual time=17.443..15025.974 rows=237597 loops=1)
Total runtime: 16786.542 ms

EXPLAIN ANALYZE SELECT smiles FROM structure
Seq Scan on structure  (cost=0.00..6979.97 rows=237597 width=41) (actual time=0.067..735.884 rows=237597 loops=1)
Total runtime: 1200.661 ms


TJ


PFC wrote:
> 
>> I'm quite happy with the speedup in 3, but puzzled over the slowdown 
>> in  2.
> 
>     Could you provide :
> 
>     - SELECT count(*) FROM structure;
>     => NRows
>     - SELECT avg(length(smiles)) FROM structure;
> 
>     Then VACUUM FULL ANALYZE structure
>     Redo your timings and this time post EXPLAIN ANALYZE
> 
>     Also your query returns 1313 rows, so wan you post :
> 
> EXPLAIN ANALYZE SELECT oe_matches(smiles,'c1ccccc1CC(=O)NC') FROM  
> structure;
> => time T1
> EXPLAIN ANALYZE SELECT smiles FROM structure;
> => time T2
> 
> (T1-T2)/(NRows) will give you an estimate of the time spent in each  
> oe_matches call.
> 
>     Also note that for postgres (a,b) > (c,d) means ((a>c) and (b>d)), 
> which  can be misleading, but I think that's what you wanted.


В списке pgsql-sql по дате отправления:

Предыдущее
От: Andrew Hammond
Дата:
Сообщение: Re: same question little different test MSSQL vrs Postgres
Следующее
От: PFC
Дата:
Сообщение: Re: testing/predicting optimization using indexes