Understanding the Internet AS Topology & its Applications
Autonomous Systems (ASes) in the Internet use BGP to perform interdomain routing. The import and export policies of an AS determine the contents of its routing table, and these policies reflect the business relationships between ASes. Since AS relationships are not publicly available, several studies have proposed heuristic algorithms for inferring them from publicly available BGP data. Content Delivery Network (CDN) servers placed around the world serve the clients that access their content. Because the majority of Internet traffic today is content delivery traffic, and the routing paths from users to content servers are not under the control of content providers, it is important to study the efficiency of these paths. Netflix and Akamai are two major CDN providers. When a user accesses content hosted on these networks, Quality of Service (QoS) is crucial for a seamless user experience. Hence, it is important for a CDN to choose a server that optimizes QoS when a user requests content from its network. Because BGP does not authenticate routes, prefixes are prone to being hijacked by ASes to which they do not belong. Existing mechanisms address this problem by detecting a hijack after it has happened and reacting to it. A preventive mechanism is needed to stop hijacks from happening in the first place. A recent work introduced the identification of serial hijackers, which would enable such a solution. Unfortunately, the available ground truth on serial hijackers is very small.
We present a machine learning approach to edge type inference in AS graphs. We use our method to train classifiers for three AS graphs derived from different data sources: a BGP graph, a traceroute graph, and an IRR graph. The classifiers annotate each edge as either provider-to-customer (p2c) or peer-to-peer (p2p). We merge the three individual graphs to obtain a combined graph and propose a method to compute edge types in the combined graph. We analyze the characteristics of the three individual graphs and the combined graph, and show that combining the individual graphs gives a significantly more complete view of both the p2p and p2c ecosystems in the Internet. We also present a method to compute the customer cones of peering networks using PCH data.
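As an illustration of what a customer cone is, the following minimal sketch uses the commonly used recursive definition: the cone of an AS is the AS itself plus every AS reachable by following p2c edges downward. This is an assumption made for illustration; the abstract does not describe how the PCH data enters the computation, and the function name `customer_cones` and the AS numbers are hypothetical.

```python
from collections import defaultdict


def customer_cones(p2c_edges):
    """Map each AS to its customer cone (the AS itself plus all ASes
    reachable by following provider-to-customer edges).

    p2c_edges: iterable of (provider_asn, customer_asn) pairs.
    Assumption: the recursive cone definition, not necessarily the
    method used in the dissertation.
    """
    customers = defaultdict(set)
    nodes = set()
    for provider, customer in p2c_edges:
        customers[provider].add(customer)
        nodes.update((provider, customer))

    cones = {}
    for asn in nodes:
        cone = {asn}
        stack = [asn]
        # Iterative depth-first search; the membership check also
        # protects against cycles in inferred relationships.
        while stack:
            current = stack.pop()
            for cust in customers.get(current, ()):
                if cust not in cone:
                    cone.add(cust)
                    stack.append(cust)
        cones[asn] = cone
    return cones


# Hypothetical annotated edges: AS1 -> AS2 -> AS3, AS1 -> AS4, AS5 -> AS3.
edges = [(1, 2), (2, 3), (1, 4), (5, 3)]
cones = customer_cones(edges)
print(sorted(cones[1]))
```

The size of a cone is often used as a rough measure of how much of the Internet an AS reaches through its customers, which is why missing p2c edges in a single data source can understate it.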
We conduct a case study of Netflix to understand the efficiency of the AS paths from various access ISPs to Netflix servers deployed at IXPs in different regions of the world. We discover inefficient AS paths in Europe, North America, and South America. Paths in South America are especially inefficient, as many of them leave the continent. We also analyze long paths in each region, explore their causes, and propose ways to avoid them.
We measure the latency of paths from residential ISPs to Akamai servers in the United States and observe how path quality varies with the time of day and with client and server location, using active measurements from RIPE Atlas probes and httping measurements at Iowa State University. Based on our observations, we propose a server selection strategy that selects a low-latency or maximum-throughput server with an accuracy of over 98% at two different client locations in the United States. We observe that Akamai's server choice is not always the minimum-latency or maximum-throughput server, and that the optimal server in terms of throughput or latency is not always the geographically closest one.
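To make the idea of latency-based server selection concrete, here is a minimal sketch that picks the candidate server with the lowest median round-trip time from a set of measurements. It is only an assumed baseline, not the strategy proposed in the dissertation, which the abstract does not spell out; the function name `select_server`, the server labels, and the sample values are hypothetical.

```python
from statistics import median


def select_server(latency_ms):
    """Return the server with the lowest median RTT.

    latency_ms: dict mapping a server label to a list of RTT samples
    in milliseconds. Servers with no samples are ignored.
    Assumption: median RTT is used as the quality metric.
    """
    medians = {
        server: median(samples)
        for server, samples in latency_ms.items()
        if samples
    }
    if not medians:
        raise ValueError("no latency samples available")
    return min(medians, key=medians.get)


# Hypothetical measurements from one client location.
measurements = {
    "server-a": [30.0, 32.0, 100.0],
    "server-b": [20.0, 25.0, 22.0],
    "server-c": [],
}
print(select_server(measurements))
```

Using the median rather than the mean keeps a single outlier, such as the 100 ms sample above, from dominating the choice. A throughput-based variant would follow the same shape with `max` in place of `min`.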
We make gathering serial hijacker ground truth easier than manually going through the available mailing list by building a document classifier that identifies sources of interest from which serial hijacker information can be derived. The resulting classifier identifies the document sources of such BGP hijacking information with 89% accuracy.
Committee: Lu Ruan (major professor), Jin Tian, Ying Cai, Kris De Brabanter, Carl Chang, and Pavan Aduri