Re: Performance Optimization for Dummies 2 - the SQL

От: Carlo Stonebanks
Тема: Re: Performance Optimization for Dummies 2 - the SQL
Дата: ,
Msg-id: eg6f8l$2det$1@news.hub.org
(см: обсуждение, исходный текст)
Ответ на: Performance Optimization for Dummies 2 - the SQL  ("Carlo Stonebanks")
Ответы: Re: Performance Optimization for Dummies 2 - the SQL  ("Merlin Moncure")
Список: pgsql-performance

Скрыть дерево обсуждения

Performance Optimization for Dummies 2 - the SQL  ("Carlo Stonebanks", )
 Re: Performance Optimization for Dummies 2 - the SQL  ("Merlin Moncure", )
  Re: Performance Optimization for Dummies 2 - the SQL  (Alex Stapleton, )
   Re: Performance Optimization for Dummies 2 - the SQL  (Markus Schaber, )
 Re: Performance Optimization for Dummies 2 - the SQL  ("Carlo Stonebanks", )
 Re: Performance Optimization for Dummies 2 - the SQL  ("Carlo Stonebanks", )
  Re: Performance Optimization for Dummies 2 - the SQL  ("Merlin Moncure", )
 Re: Performance Optimization for Dummies 2 - the SQL  ("Carlo Stonebanks", )
  Re: Performance Optimization for Dummies 2 - the SQL  ("Merlin Moncure", )
 Re: Performance Optimization for Dummies 2 - the SQL  ("Carlo Stonebanks", )
  Re: Performance Optimization for Dummies 2 - the SQL  ("Merlin Moncure", )
  index growth problem  (Graham Davis, )
   Re: index growth problem  ("Jim C. Nasby", )
    Re: index growth problem  (Graham Davis, )
     Re: index growth problem  ("Jim C. Nasby", )
    Re: index growth problem  (Tom Lane, )
 Re: Performance Optimization for Dummies 2 - the SQL  ("Carlo Stonebanks", )
  Re: Performance Optimization for Dummies 2 - the SQL  (Tom Lane, )
  Re: Performance Optimization for Dummies 2 - the SQL  ("Merlin Moncure", )
   Re: Performance Optimization for Dummies 2 - the SQL  ("Jim C. Nasby", )
    Re: Performance Optimization for Dummies 2 - the SQL  ("Merlin Moncure", )
  Re: Performance Optimization for Dummies 2 - the SQL  ("Carlo Stonebanks", )
   Re: Performance Optimization for Dummies 2 - the SQL  ("Jim C. Nasby", )
 Re: Performance Optimization for Dummies 2 - the SQL  ("Carlo Stonebanks", )
  Re: Performance Optimization for Dummies 2 - the SQL  (Tom Lane, )
 Re: Performance Optimization for Dummies 2 - the SQL  ("Carlo Stonebanks", )
 Re: Performance Optimization for Dummies 2 - the SQL  ("Carlo Stonebanks", )
  Re: Performance Optimization for Dummies 2 - the SQL  (Scott Marlowe, )
   Re: Performance Optimization for Dummies 2 - the SQL  ("Merlin Moncure", )
    Re: Performance Optimization for Dummies 2 - the SQL  ("Jim C. Nasby", )
 Re: Performance Optimization for Dummies 2 - the SQL  ("Carlo Stonebanks", )
  Re: Performance Optimization for Dummies 2 - the SQL  ("Merlin Moncure", )
 Re: Performance Optimization for Dummies 2 - the SQL  ("Carlo Stonebanks", )
  Re: Performance Optimization for Dummies 2 - the SQL  (Tom Lane, )
  Re: Performance Optimization for Dummies 2 - the SQL  ("Merlin Moncure", )
  Re: Performance Optimization for Dummies 2 - the SQL  ("Merlin Moncure", )
 Re: Performance Optimization for Dummies 2 - the SQL  ("Carlo Stonebanks", )
 Re: Performance Optimization for Dummies 2 - the SQL  ("Carlo Stonebanks", )
  Re: Performance Optimization for Dummies 2 - the SQL  ("Jim C. Nasby", )
 Re: Performance Optimization for Dummies 2 - the SQL  ("Carlo Stonebanks", )
  Re: Performance Optimization for Dummies 2 - the SQL  (Shaun Thomas, )
 Re: Performance Optimization for Dummies 2 - the SQL  ("Carlo Stonebanks", )
 Re: Performance Optimization for Dummies 2 - the SQL  ("Carlo Stonebanks", )

> how did you determine that it is done every 500 rows? this is the

The import program pages the import table - it is currently set at 500 rows
per page. With each page, I run an ANALYZE.

> default autovacuum paramater.  if you followed my earlier
> recommendations, you are aware that autovacuum (which also analyzes)
> is not running during bulk inserts, right?

It's intuitivly obvious, but I can't do bulk inserts. It's just not the
nature of what we are doing with the data.

> imo, best way to do big data import/conversion is to:
> 1. turn off all extra features, like stats, logs, etc

done

> 2. use copy interface to load data into scratch tables with probably
> all text fields

done

> 3. analyze (just once)

I think this doesn't apply in our case, because we aren't doing bulk
inserts.

> 4. use big queries to transform, normalize, etc

This is currently being done programmatically. The nature of what we're
doing is suited for imperitive, navigational logic rather than declarative,
data set logic; just the opposite of what SQL likes, I know! If there's some
way to replace thousands of lines of analysis and decision trees with
ultrafast queries - great...

> important feature of analyze is to tell the planner approx. how big
> the tables are.

But the tables grow as the process progresses - would you not want the
server to re-evaluate its strategy periodically?

Carlo

>
> merlin
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>               http://archives.postgresql.org
>




В списке pgsql-performance по дате сообщения:

От: "Carlo Stonebanks"
Дата:
Сообщение: Re: Performance Optimization for Dummies 2 - the SQL
От: "Craig A. James"
Дата:
Сообщение: Simple join optimized badly?