"Exploiting Feature Distributions in Anomaly Diagnosis"
Abstract:
Both operators and users of the Internet are increasingly concerned with the problem of network anomalies:
attacks, infections, misconfigurations, and other unusual events. The increasing practicality of large-scale flow
capture makes it possible to conceive of traffic analysis methods that detect and identify a large and diverse
set of anomalies. However, the challenge of effectively analyzing this massive data source for anomaly diagnosis
is as yet unmet. In this talk I will argue that the distributions of packet features (IP addresses and ports)
observed in flow traces reveal both the presence and the structure of a wide range of anomalies. Using entropy
as a summarization tool, I will show that the analysis of feature distributions leads to significant advances on
two fronts: (1) it enables highly sensitive detection of a wide range of anomalies, augmenting detections by
volume-based methods, and (2) it enables automatic classification of anomalies via unsupervised learning. Using
data from two backbone networks (Abilene and Geant), I will show that, when described by their feature distributions, anomalies
naturally fall into distinct and meaningful clusters. These clusters can be used to automatically classify
anomalies and to uncover new anomaly types.
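
As a rough illustration of entropy as a summarization tool, the sketch below computes the sample entropy of one traffic feature (destination port) over a set of flows. The function name and the port lists are hypothetical and invented for illustration; they are not taken from the Abilene or Geant data, and this is not necessarily the exact estimator used in the talk. It only shows the general idea that a dispersed feature distribution (as a port scan might produce) yields higher entropy than a concentrated one.

```python
import math
from collections import Counter


def sample_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no observations: treat entropy as zero
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical destination ports seen in one time bin (illustrative only).
normal_ports = [80] * 6 + [443] * 3 + [53]   # traffic concentrated on a few ports
scan_ports = list(range(1000, 1010))         # one flow to each of ten distinct ports

h_normal = sample_entropy(normal_ports)
h_scan = sample_entropy(scan_ports)

print(f"normal bin entropy: {h_normal:.3f} bits")
print(f"scan-like bin entropy: {h_scan:.3f} bits")
```

Computing one such value per feature (source and destination address, source and destination port) in each time bin would give a small vector per bin, which is the kind of compact summary that detection and clustering could then operate on.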