Re: IP addresses, NetBlocks, and ASNs

Поиск
Список
Период
Сортировка
От Gary Warner
Тема Re: IP addresses, NetBlocks, and ASNs
Дата
Msg-id 384920706.1031297.1398027711602.JavaMail.root@cis.uab.edu
обсуждение исходный текст
Ответ на Re: IP addresses, NetBlocks, and ASNs  (Steve Atkins <steve@blighty.com>)
Список pgsql-performance
Steve,

The "indexable IPv4 range" described on that pgfoundry site sounds like exactly what I've been trying to figure out.

That said, I also agree with your approach of "lookup and store with each record at fetch time" for exactly the reason
yougive.  If the location of that IP address changes, i don't want the CURRENT location, I want the "at incident time"
location.

Thanks for the tips (though other suggestions are also very welcome...)

I'll contact you off-list for more regarding the project, Steve.

Thanks again!

----------------------------------------------------------

Gary Warner
Director of Research in Computer Forensics
The University of Alabama at Birmingham
Center for Information Assurance and Joint Forensics Research
205.422.2113
gar@cis.uab.edu

-----------------------------------------------------------

----- Original Message -----
From: "Steve Atkins" <steve@blighty.com>
To: "Postgres Performance" <pgsql-performance@postgresql.org>
Cc: "Gary Warner" <gar@cis.uab.edu>
Sent: Sunday, April 20, 2014 9:57:48 AM
Subject: Re: [PERFORM] IP addresses, NetBlocks, and ASNs


On Apr 19, 2014, at 7:12 PM, Gary Warner <gar@cis.uab.edu> wrote:

> Does anyone have some good tricks for mapping IP addresses to ASN numbers in very large volumes?
>
> This is probably more a "how would you approach this problem?" than "can you help me tweak this query"
>
> I have a very large number of IP addresses (many millions) that are found in some security related logs.  I need a
fastway to categorize these back to their ASN numbers and Country codes.  I have a table that was generated from BGP
routingfiles to create a list of netblocks and their corresponding ASNs.  For small groups of IP addresses (hundreds or
thousands)it has always been fast enough to easily just do something like: 
>
> select count(*), asn_number from logaddr a, ipv4 b where a.ip <<= b.net_block

For decent performance, you might want to look at http://ip4r.projects.pgfoundry.org. That'll let you use indexes for
"isthis ip address in any of the ranges in this table.". 

(I see hints that there'll be similar support for inet in 9.5, but I've not looked at the code).

> where logaddr has information about IP addresses encountered and
> where "ipv4" has a list of mappings of which netblocks belong to which ASNs (yeah, it is 525,000+ Netblocks ...)
>
> I'm trying to figure out if there is a better way to use the "cidr" netblock to speed up the look up and matching.
>
> I've tried playing with going from ASN to the matching IP addresses, or the other way around ... for example, here is
theExplain  for looking at all the IP addresses in the logaddr (currently about 9 million records) that use ASN = 2119: 
>
>
> explain select count(distinct ip), country_code, asn_number from ipv4 a, logaddr b
> where b.ip <<= a.net_block and a.asn_number = 2119
> group by country_code, asn_number;
>
>
> "GroupAggregate  (cost=36851922.32..38215407.51 rows=4 width=18)"
> "  ->  Sort  (cost=36851922.32..37192793.61 rows=136348515 width=18)"
> "        Sort Key: a.country_code, a.asn_number"
> "        ->  Nested Loop  (cost=0.00..4448316.05 rows=136348515 width=18)"
> "              Join Filter: ((b.ip)::inet <<= (a.net_block)::inet)"
> "              ->  Seq Scan on logaddr b  (cost=0.00..347394.01 rows=9089901 width=7)"
> "              ->  Materialize  (cost=0.00..10466.66 rows=30 width=18)"
> "                    ->  Seq Scan on ipv4 a  (cost=0.00..10466.51 rows=30 width=18)"
> "                          Filter: (asn_number = 2119)"
>
>
> Just the first netblock from that ASN (2.148.0.0/14) occurs around 7,300 times, and does ok if I was just looking for
that... but I need a way to do "big aggregate queries" like "what were the most common ASNs in this whole log file?"
or"how often does country XX occur in this log file?" 

ip4r is pretty good (and does support v6 too, despite the name).

If you need yet more performance (and you probably don't) the approach I've moved to using is to tag the records with
ASNon import, using an in-core patricia trie kept in sync with the address ranges listed in the database to do the
work.It's faster still, at the cost of a fair bit more work during import. (It's also a little more accurate in some
cases,as the mapping of IP address to list of ASNs is dynamic, and you usually want the ASN at the time of an incident,
notthe one now.) 

Cheers,
  Steve



В списке pgsql-performance по дате отправления:

Предыдущее
От: Steve Atkins
Дата:
Сообщение: Re: IP addresses, NetBlocks, and ASNs
Следующее
От: Merlin Moncure
Дата:
Сообщение: Re: Fast distinct not working as expected