CAIDA Internet eXchange Points (IXPs) Dataset
An Internet eXchange Point (IX or IXP) is a physical infrastructure used by Internet service providers (ISPs) and content delivery networks (CDNs) to exchange Internet traffic between their networks (Autonomous Systems, or ASes). An IXP can be distributed across numerous data centers (also known as facilities), and a single facility can contain multiple IXPs. Each IXP has a prefix, or a collection of prefixes, that companies/ASes use to address machines within the IXP infrastructure. An AS connected to a given IXP is known as a member of that IXP. Traffic exchange through an IXP uses the Border Gateway Protocol (BGP), which identifies ISPs and CDNs by their Autonomous System Numbers (ASNs).
Sources of Data
In order to make the most complete list of IXPs we combined information available from the following sources:
- PeeringDB (PDB)
- Hurricane Electric (HE)
- Packet Clearing House (PCH)
We also used GeoNames data (readme,download) to derive relevant geographic information.
Methodology
First, we downloaded the GeoNames data set and created a local SQLite database of geographic coordinates indexed on the name, asciiname, and alternative names of cities and villages. If the name of the city where an IXP is located did not match any of the location strings in the database, we assigned that IXP a negative geo_id.
Next, we tried to identify the cases where IXPs listed in different data sources are in fact the same. This is a non-trivial task, since IXP names, cities, and addresses could be (and are) spelled differently. We first merged IXPs found in different sources that have the same set of prefixes. For the remaining IXPs, we calculated the Levenshtein distance between the IXP names. IXPs whose names are more than 4 characters long and whose distance is less than 2 were assumed to be identical, unless the difference is in the first or last character of the strings. For example, the Levenshtein distance between "Equinix Sào Paulo" and "Equinix Sao Paulo" is 1 (one character differs between those names); therefore, we decide that both designate the same IXP. The names "BIX" and "CIX" also differ by one character, but they are only 3 characters long, so we treat them as referring to two different IXPs. Finally, "FICIX2" and "FICIX3" are long enough and differ by only one character, but it is the last character of each string, so we conclude that the "2" and "3" indicate different IXPs.
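The name-matching rule above can be sketched in Python as follows. This is only an illustration of the rule as described, not the code used to build the dataset: the function names levenshtein and same_ixp_name are hypothetical, names are compared without any case or accent normalization, and treating two identical strings as a match is an added assumption.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def same_ixp_name(a: str, b: str) -> bool:
    """Hypothetical helper: True if two names are treated as one IXP."""
    if a == b:
        return True  # assumption: identical names trivially match
    if min(len(a), len(b)) <= 4:
        return False  # names must be more than 4 characters long
    if levenshtein(a, b) >= 2:
        return False  # distance must be less than 2
    # Distance is exactly 1 here; reject the match when the single
    # difference sits in the first or last character.
    return a[0] == b[0] and a[-1] == b[-1]
```

With the three examples from the text, same_ixp_name accepts the two "Equinix" spellings and rejects both the "BIX"/"CIX" and the "FICIX2"/"FICIX3" pairs.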
Although many IXPs are distributed across multiple facilities, only the PDB database provides detailed location information about multiple facilities for individual IXPs. We use the PDB information directly to create facility records with all their geographic fields (street address, zipcode, city, state, country, and region) populated. In contrast, both PCH and HE include only a single facility location for each IXP in their databases, and typically localize it only with city-level accuracy. Thus, we create a facility placeholder record from PCH and HE data using the most specific geographic data these databases provide. To populate the geographic fields of an IXP record, we assign a specific value to a given field only if this value is the same in all facility or facility placeholder records for this IXP.
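The last rule (a field is set on the IXP record only when every facility record agrees) could look roughly like the sketch below. The function name consensus_fields and the default field list are hypothetical, and the handling of missing values is an assumption: a record that lacks a field counts as a disagreement.

```python
def consensus_fields(records, fields=("city", "state", "country", "region")):
    """Return only those field values on which every record agrees."""
    result = {}
    for field in fields:
        # Collect the distinct values seen for this field; a record
        # without the field contributes None (assumed to be a mismatch).
        values = {rec.get(field) for rec in records}
        if len(values) == 1:
            value = values.pop()
            if value is not None:
                result[field] = value
    return result
```

For an IXP with facilities in two cities of the same country, this keeps the country and drops the city.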
Format
All files are in JSONL (JSON Lines) format: comment lines start with '#', and every other line contains a single object in JSON format. JSONL can be converted to JSON with the jsonl_to_json.py tool. Each file begins with a commented metadata line showing when the file was produced.
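Because of the '#' comment lines, the files cannot be passed line by line to a JSON parser without filtering. A minimal reader, assuming only the format described above, might look like this (read_jsonl is a hypothetical helper, not part of the dataset tooling, and skipping blank lines is an added assumption):

```python
import json


def read_jsonl(lines):
    """Yield one parsed object per line, skipping '#' comments and blanks."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        yield json.loads(line)
```

Any iterable of strings works as input, so an open file handle can be passed directly, for example: with open("ixs.jsonl") as f: ixs = list(read_jsonl(f)).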
File ixs.jsonl contains information about individual IXPs. Each IXP is assigned its own "ix_id". The "pch_id" and "pdb_id" values match the IXP ids in the original sources, Packet Clearing House (PCH) and PeeringDB (PDB) respectively. (IXP entries in the Hurricane Electric (HE) database do not have a similar id field.) Among those sources, PDB is the only one that provides organizational information. Therefore, our "org_id" values are the same as "pdb_org_id".
{ "ix_id": 3, "pch_id": 1461, "pdb_id": 639, "pdb_org_id": 8375, "name": "Calgary Internet Exchange", "alternatenames": [ "YYCIX Calgary Internet Exchange", "YYCIX" ], "geo_id": 5913490, "city": "Calgary", "state": "AB", "country": "CA", "region": "North America", "sources": [ "pdb", "pch", "he" ], "url": [ "https:\/\/www.yycix.ca\/", "http:\/\/yycix.ca" ], "prefixes": { "ipv4": [ "206.126.225.0\/24" ], "ipv6": [ "2001:504:2f::\/64" ] } }
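One use of the "prefixes" field is to check whether a given IP address falls inside an IXP's address space. The sketch below uses Python's standard ipaddress module; find_ixps is a hypothetical helper, and the records are assumed to be already parsed into dictionaries shaped like the example above.

```python
import ipaddress


def find_ixps(address, ix_records):
    """Return the ix_id of every IXP with a prefix containing the address."""
    addr = ipaddress.ip_address(address)
    key = "ipv4" if addr.version == 4 else "ipv6"
    matches = []
    for ix in ix_records:
        for prefix in ix.get("prefixes", {}).get(key, []):
            # strict=False tolerates prefixes with host bits set
            if addr in ipaddress.ip_network(prefix, strict=False):
                matches.append(ix["ix_id"])
                break  # one matching prefix per IXP is enough
    return matches
```

A linear scan like this is fine for a handful of lookups; for bulk lookups a prefix tree would be the more usual choice.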
File facilities.jsonl contains information about individual facilities. The "clli" value is the CLLI (COMMON LANGUAGE Location Identifier) code, an identifier used within the North American telecommunications industry. Other fields are self-explanatory.
{ "fac_id": 1110, "pdb_fac_id": 2410, "pdb_org_id": 12757, "name": "City of Calgary - City Hall ", "latitude": 51.04551, "longitude": -114.056326, "address": "800 Macleod Trail S.E.", "zipcode": "T2P 2M5", "state": "AB", "country": "CA", "city": "Calgary", "clli": "calgar", "sources": [ "pdb" ] }
File ix-facilites.jsonl contains the mapping between facilities and IXPs. (Note that this is a many-to-many mapping, since the same IXP can be present in a number of facilities and a given facility can host many IXPs.) The example below means that the IXP with an ix_id value of 3 (Calgary Internet Exchange, shown in the first listing) has a presence at the facility with a fac_id value of 1110 (shown in the listing above). For IXPs present at multiple facilities, our data set contains multiple records with the same ix_id and different fac_id values.
{ "ix_id": 3, "fac_id": 1110 }
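Since the mapping is many-to-many, it is often convenient to index it in both directions. A small sketch, assuming the records have already been parsed into dictionaries (index_ix_facilities is a hypothetical name):

```python
from collections import defaultdict


def index_ix_facilities(pairs):
    """Build lookups in both directions from {ix_id, fac_id} records."""
    facs_by_ix = defaultdict(set)
    ixs_by_fac = defaultdict(set)
    for rec in pairs:
        facs_by_ix[rec["ix_id"]].add(rec["fac_id"])
        ixs_by_fac[rec["fac_id"]].add(rec["ix_id"])
    return facs_by_ix, ixs_by_fac
```

After indexing, facs_by_ix[3] gives the set of facilities where IXP 3 is present, and ixs_by_fac[1110] gives the set of IXPs hosted at facility 1110.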
File ix-asns.jsonl shows IP addresses used at a given IXP by each member AS.
{ "asn": "23467", "ipv4": [ "206.108.115.28", "206.108.115.27" ], "ipv6": [ "2001:504:38:1:0:a502:3467:2", "2001:504:38:1:0:a502:3467:1" ], "ix_id": 6 }
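The member list of each IXP can be derived by grouping these records on "ix_id". Note that "asn" appears as a string in the example above, so the sketch below converts it to an integer; members_by_ix is a hypothetical helper and the conversion is a convenience, not something the dataset requires.

```python
from collections import defaultdict


def members_by_ix(records):
    """Map each ix_id to the set of member ASNs (as integers)."""
    members = defaultdict(set)
    for rec in records:
        members[rec["ix_id"]].add(int(rec["asn"]))
    return members
```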
File organizations.jsonl contains the information about each organization learned from PDB. These records can be linked to the corresponding facilities records by matching their respective pdb_org_id values.
{ "org_id": 229, "pdb_org_id": 229, "name": "Init7 (Switzerland) Ltd", "address": "Technoparkstrasse 5", "zipcode": "8406", "city": "Winterthur", "state": "ZH", "country": "CH", "url": "http:\/\/www.init7.net\/" }
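Linking facilities to organizations through pdb_org_id amounts to a simple dictionary join. In the sketch below, attach_org_names and the added "org_name" key are hypothetical; facilities whose pdb_org_id has no matching organization record get None.

```python
def attach_org_names(facilities, organizations):
    """Return copies of facility records with the owning org's name added."""
    org_by_pdb_id = {org["pdb_org_id"]: org for org in organizations}
    joined = []
    for fac in facilities:
        org = org_by_pdb_id.get(fac.get("pdb_org_id"))
        # "org_name" is an illustrative field, not part of the dataset
        joined.append({**fac, "org_name": org["name"] if org else None})
    return joined
```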
File locations.jsonl is similar to the geoname locations, but contains negative "geo_id"s for those entries where geographic locations of IXPs were not found in the geonames dataset. A full description of the fields can be found here.
{ "geo_id": 5391811, "geoname_id": 5391811, "name": "San Diego", "asciiname": "San Diego", "alternatenames": [ "davis' folly", "didacopolis", "gorad san-dyega", "graytown", "lungsod ng san diego", "new san diego", "san", "san diegas", "san diego", "san diegu", "san dijego", "san diyego", "san diy\u00e9go" ], "latitude": 32.71533, "longitude": -117.15726, "feature_class": "P", "feature_code": "PPLA2", "country_code": "US", "cc2": "", "admin1_code": "CA", "admin2_code": "073", "admin3_code": "", "admin4_code": "", "population": 1394928, "elevation": 20, "dem": 31 }
Data Access
Please read the terms of the CAIDA Acceptable Use Agreement (AUA) for Publicly Accessible Datasets below:
As required by the AUA, if you use this dataset in any publication (including but not limited to: papers, presentations, web pages, and papers published by a third party) please include the following reference:
The CAIDA UCSD IXPs Dataset, <date range used>
https://www.caida.org/catalog/datasets/ixps/
Please report all your publications (papers, presentations, class projects, websites, etc.) to CAIDA.