Benchmark Data Sets for Intrusion Detection Systems in Bag of System Calls Representation
March 7, 2005, Dae-Ki Kang
Very short introduction
Here, we provide intrusion detection benchmark data sets in the bag of system calls representation.
The data sets we transformed to the bag of system calls representation originally come from:
- the Sequence Time-Delay Embedding (STIDE) benchmark data sets at the University of New Mexico (UNM), and
- the Defense Advanced Research Projects Agency (DARPA) evaluation data sets at the Massachusetts Institute of Technology (MIT) Lincoln Lab.
In the bag of system calls representation, the relative order of the system calls in a trace (or sequence) is not preserved.
Instead, only the integer frequency count of each system call in the sequence is kept.
Detailed information can be found in the reference papers [1,2].
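As an illustration only, the transformation can be sketched in a few lines of Python. This is not the authors' program. It assumes a fixed vocabulary of system calls that defines the attribute order. The call names used here are made up, and any hashable identifier (for example, an integer system call number) works the same way. Calls outside the vocabulary are silently ignored in this sketch.

```python
from collections import Counter
from typing import Hashable, Sequence


def bag_of_system_calls(trace: Sequence[Hashable],
                        vocabulary: Sequence[Hashable]) -> list[int]:
    """Map a trace to a fixed-length vector of per-call frequency counts.

    Order information is discarded: only how often each call occurs is kept.
    The vocabulary (an assumption of this sketch) fixes the attribute order.
    """
    counts = Counter(trace)
    # Counter returns 0 for calls that never occur in the trace.
    return [counts[call] for call in vocabulary]


vocab = ["open", "read", "write", "close"]        # hypothetical vocabulary
trace = ["open", "read", "read", "write", "close"]
print(bag_of_system_calls(trace, vocab))          # [1, 2, 1, 1]
```

Reversing or shuffling the trace yields the same vector, which is exactly the loss of order information described above.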
Data files
UNM's benchmark data sets
For each trace generated by a process, our program constructs an ordered list of frequency counts together with the trace's class label, "intrusive" or "normal".
For each benchmark data set, the program stores these ordered lists in a data file in WEKA's ARFF format.
- Live lpr ARFF
- Live lpr MIT ARFF
- Synthetic sendmail ARFF
- Synthetic sendmail CERT ARFF
- Denial of Service (DoS) ARFF
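The ARFF files listed above pair each vector of frequency counts with its class label. The following is a minimal sketch of how such rows could be serialized, assuming one numeric attribute per system call and a nominal class attribute with the values normal and intrusive. The attribute names (the `call_` prefix) and the relation name are hypothetical and need not match the distributed files. The prefix is there because ARFF attribute names must begin with a letter. Relation or attribute names containing spaces would additionally need quoting, which this sketch does not do.

```python
from typing import Sequence


def to_arff(relation: str,
            vocabulary: Sequence[str],
            rows: Sequence[tuple[Sequence[int], str]]) -> str:
    """Serialize (counts, label) rows as ARFF text.

    Assumes one numeric attribute per system call (hypothetical "call_" names)
    followed by a nominal class attribute {normal,intrusive}.
    """
    lines = [f"@relation {relation}", ""]
    for call in vocabulary:
        lines.append(f"@attribute call_{call} numeric")
    lines.append("@attribute class {normal,intrusive}")
    lines += ["", "@data"]
    for counts, label in rows:
        # One comma-separated instance per trace: counts first, label last.
        lines.append(",".join(str(c) for c in counts) + f",{label}")
    return "\n".join(lines) + "\n"


print(to_arff("lpr", ["open", "read"],
              [([1, 2], "normal"), ([0, 5], "intrusive")]))
```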
MIT Lincoln Lab's DARPA evaluation data sets
From the omnibus file for each day, our program generates a set of sequences and their associated class labels by parsing the "exec" system calls in the omnibus file according to the network analyzer's output.
- Monday, 4th week, 1998 ARFF
- Tuesday, 4th week, 1998 ARFF
- Thursday, 4th week, 1998 ARFF
- Friday, 4th week, 1998 ARFF
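One way to read the exec-based splitting described above is sketched below. This is a hypothetical sketch and not the authors' parser. It assumes the omnibus file has already been decoded into (pid, call) events in time order. It also assumes that each "exec" starts a new sequence for that process, and that the network analyzer's output has been reduced to a set of intrusive process IDs. The `intrusive_pids` set and the event tuples are illustrative assumptions. Events a process emits before its first exec are dropped.

```python
from typing import Iterable


def split_at_exec(events: Iterable[tuple[int, str]],
                  intrusive_pids: set[int]) -> list[tuple[list[str], str]]:
    """Group (pid, call) events into labeled per-process sequences.

    Hypothetical reading: each "exec" closes the process's current sequence
    and opens a new one; labels come from an assumed set of intrusive pids.
    """
    open_seqs: dict[int, list[str]] = {}
    finished: list[tuple[list[str], str]] = []

    def close(pid: int) -> None:
        seq = open_seqs.pop(pid, None)
        if seq:
            label = "intrusive" if pid in intrusive_pids else "normal"
            finished.append((seq, label))

    for pid, call in events:
        if call == "exec":
            close(pid)             # finish the previous sequence, if any
            open_seqs[pid] = [call]
        elif pid in open_seqs:     # calls before the first exec are ignored
            open_seqs[pid].append(call)

    for pid in list(open_seqs):    # flush sequences still open at end of file
        close(pid)
    return finished
```

Each resulting sequence can then be reduced to frequency counts exactly as for the UNM traces.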
Apart from discarding position information and keeping frequency counts to produce the bag of system calls representation, no further processing was applied to the data sets.
Programs
Available upon request.
Send email to
.
References
[1] D.-K. Kang, D. Fuller, and V. Honavar, "Learning Classifiers for Misuse Detection Using a Bag of System Calls Representation," Proceedings of the IEEE International Conference on Intelligence and Security Informatics (ISI-2005), Atlanta, GA, USA, May 19-20, 2005; Lecture Notes in Computer Science, Vol. 3495, pp. 511-516, Springer-Verlag, 2005.
[2] D.-K. Kang, D. Fuller, and V. Honavar, "Learning Classifiers for Misuse and Anomaly Detection Using a Bag of System Calls Representation," Proceedings of the 6th IEEE Systems, Man and Cybernetics Information Assurance Workshop (IAW), West Point, NY, USA, June 15-17, 2005.
Example decision trees