Michael Artz wrote:
>> > I have primary table that holds ip information
>> > and two other tables that hold event data for the specific IP in with
>> > a one-to-many mapping between them, ie:
>> [snip]
>> > There is quite a bit of commonality between the network_events and
>> > host_events schemas, but they do not currently share an ancestor.
>> > ip_info has about 13 million rows, the network_events table has about
>> > 30 million rows, and the host_events table has about 7 million rows.
>> > There are indexes on all the rows.
>>
>> What indexes though. Do you have (name,ip) on the two event tables?
>
> All the columns are indexed individually. The tables are completely
> static, as I reload the whole DB with new data every day.
The point of a (name,ip) index would be to let you read off ip numbers
in order easily.
>> How selective is "name" - are there many different values or just a few?
>> If lots, it might be worth increasing the statistics gathered on that
>> column (ALTER COLUMN ... SET STATISTICS).
>> http://www.postgresql.org/docs/8.2/static/sql-altertable.html
>
> I guess that is the heart of my question. "name" is not very
> selective (there are only 20 or so choices) however other columns are
> fairly selective for certain cases, such as 'port'. When querying on
> and unusual port, the query is very fast, and the single UNIONed
> subselect returns quickly. When 'port' is not very selective (like
> port = '80', which is roughly 1/2 of the rows in the DB), the dual
> subselect query wins, hands-down.
>
> And I have altered the statistics via the config file:
> default_statistics_target = 100
> Perhaps this should be even higher for certain columns?
You're probably better off leaving it at 10 and upping it for the vital
columns. 25 for names should be a good choice.
You could try partial indexes for those cases where you have
particularly common values of name/port:
CREATE INDEX idx1 ON host_events (ip) WHERE port=80;
--
Richard Huxton
Archonet Ltd