Algorithms for Large Data Sets: Theory and Practice (Online)

Course

Course Catalog URL:

Identifier:

COM S 535

Offered during Fall Semester each year.

Credits and contact hours: 3 credits, 3 contact hours
Instructor’s or course coordinator’s name: Pavankumar Aduri
Text book, title, author, and year: None required
Other supplemental materials: Mining of Massive Data Sets, Leskovec, Rajaramn and Ullman

Specific course information

Brief description of the content of the course: Algorithmic challenges involved in solving computational problems on massive data sets. Probabilistic data structures, Curse of Dimensionality and dimensionality reduction, locality sensitive hashing, similarity measures, matrix decompositions. Optimization problems in massive data analysis. Computational problems that arise in the context of web search, social network analysis, online advertising etc. Practical aspects include implementation and performance evaluation of the algorithms on real world data sets. Graduate credit requires a written report on current research.
Prerequisites or co-requisites: COM S 311 or equivalent; for graduate credit: graduate standing or permission of instructor
Required, elective, or selected elective? Selected Elective

Specific goals for the course

Ability to apply mathematical ideas and algorithmic principles in modeling computational problems that arise in the context of massive data sets. (1)
Ability to use the theoretical tools in a practical environment and design programs to solve computational problems that arise in the context of massive data sets. (6)

Brief list of topics to be covered

Basics of Probability Theory
- Random Variables
- Expectation, variance
- Markov, Chebyshev and Chernoff Bounds
Probabilistic Data Structures
- Theory behind hashing
- Memory efficient data structures: Bloom Filters, CountMin Sketch
Similar Items
- Measuring similarity among items
- Document similarity
- Dimensionality reduction
- Locality sensitive hashing
- Latent semantic analysis
Components of Search Engine
- Crawling and ranking
  - Web crawlers
  - Google’s page rank algorithm
  - Hubs and authorities in web
Social Network Analysis
- Substructures and communities
- Influence propagation
Matrix Decompositions
Online Adverting and Recommendation Systems
- Advertising on Web, placing ads along with search results, Ad words problem
- Predicting items an individual user might like

Search form