by Angie Hagerty
illustrated by Keo Pierron
The Dependable Data Driven Discovery Institute (D4) is a lot like a jigsaw puzzle that has taken four years to assemble. Hridesh Rajan, interim chair of the Department of Computer Science and Kingland Professor of Data Analytics, leads the effort by connecting experts in computer science, statistics, mathematics and engineering. Their unique blend of talent has built an innovative research hub for sharing data science expertise.
It’s no surprise to Rajan that the pieces continue to fall into place for the D4 Institute.
“When accomplished researchers from multiple disciplines collaborate, you can create something special that is bigger than what each of you can do alone,” Rajan said. “Teaming up allowed us to multiply our potential.”
The core group includes Pavan Aduri, professor of computer science; Chinmay Hegde, assistant professor of electrical and computer engineering; Daniel Nettleton, department chair and Distinguished Professor of Statistics; and Eric Weber, professor of mathematics. The project also involved Michael Cantanzaro, assistant professor of mathematics; Kevin Liu, assistant professor of computer science; Namrata Vaswani, professor of electrical and computer engineering; Vinodchandran Variyam, professor of computer science at University of Nebraska, Lincoln; Li Wang, associate professor of statistics; and Zhengyuan Zhu, professor of statistics.
What began as an idea that Rajan sketched in a notebook has blossomed into a state-of-the-art clearinghouse for studying the data science pipeline. The group recently secured funding from the National Science Foundation and is laser-focused on educating practitioners and data science experts about the ethical, legal, social and economic impacts of living in a data-driven society.
Research that examines the ‘data pipeline’
“Data permeates nearly every corner of our lives, and every day millions of data-driven decisions happen across the globe,” Rajan said. “The D4 Institute will study the entire data science life cycle in order to improve the reliability of data-driven decisions that impact an untold number of people.”
Rajan notes that it’s no longer enough to focus solely on the accuracy of data or machine-learning algorithms. Studying the data science life cycle holistically is just as critical, because the processes within that life cycle can profoundly affect whether data-driven discoveries and decisions are sound.
In a data-driven world, sound data science life cycles are imperative. Data-driven decisions can have profound and life-altering impacts on people’s lives, careers and futures, and even on whole societies.
“Whether a person is admitted into a college, approved for a mortgage—or serves a shorter or longer prison sentence—is determined today by a data science life cycle,” Rajan said. “It’s critical to fully examine the systems that gather, store and extract the knowledge and that’s what we’re doing at the D4 Institute.
“It’s very important to broaden the scope and analyze the entire data pipeline to understand how data-driven decisions are made and if those systems are producing reliable, timely and trustworthy data,” he said. “Our research attempts to expose and correct flaws in these systems.”
The ‘corner pieces’ of the D4 Institute
Google’s Engineering Director, Rajesh Parekh (MS ’93 computer science, Ph.D. ’98 computer science), addresses a packed Gold Room in the Memorial Union.
When someone begins a puzzle, they usually lay out the corner pieces first. This jump-starts the process and begins the work of building out that initial foundation. The 2016 Midwest Big Data Summer School was one of the first ‘corner pieces’ from which the D4 Institute was built.
Rajan and Aduri were in uncharted waters when they agreed to organize the first Midwest Big Data Summer School, an intensive, week-long curriculum of workshops and lectures for students and early-career faculty who are interested in delving into data science research.
The program is geared toward a broad audience that extends beyond the computer science arena—from students studying sociology and economics, to faculty members from psychology, journalism and more.
“The first year, registration for the event filled in 24 hours and we had to secure a bigger venue,” Rajan said. “This was a pleasant surprise, signaling a very real and significant interest in data science at Iowa State. Pavan Aduri and I were both worried and excited at the same time.”
The program’s success led Rajan’s group to lobby for the development of a data science program at Iowa State. They pointed to the overwhelming success of the first Midwest Big Data Summer School as evidence that a program would thrive on campus. Their voices were heard and the group assisted in the development of Iowa State’s inaugural data science curriculum. Today, students from all academic corners of the university can earn a data science certificate, minor or major.
“We built on all of these initial successes. Then we quickly pivoted to research and other ambitious projects,” said Pavan Aduri, professor of computer science. “Our group met regularly, rebranded as the Theoretical and Applied Data Science (TADS) Initiative and applied for NSF funding.”
Their hard work paid off. In the fall of 2019, the team secured a $1.5 million NSF grant through the highly competitive TRIPODS (Transdisciplinary Research in Principles of Data Science) program. The funds seeded the development of the D4 Institute and will fund its research until 2022.
Tracy Kimbrel, program director at the NSF, has high praise for the group’s commitment to improving the data science life cycle.
“The D4 Institute at Iowa State brings a unique focus on all aspects of end-to-end dependability in data science to the program’s portfolio,” Kimbrel said. “This project has the potential to increase confidence and reduce risk in the data-driven decision-making that increasingly impacts the lives of our citizens on a daily basis.”
Filling in essential pieces for students
Sumon Biswas (’21 computer science, Ph.D.) was immediately drawn to Iowa State’s computer science program, in part because of Rajan’s group. Research opportunities related to the data science field matched nicely with his career goals. Biswas was particularly drawn to the group’s commitment to researching the data science pipeline.
“My research interests are very specific and tailored. My career focus blends software engineering, programming languages and data science,” Biswas said. “The varied research opportunities at Iowa State, in particular with the D4 Institute, allowed me to become an entrepreneur of sorts and design my own career that fit with my goals.”
Rajan has provided Biswas with a rich array of opportunities that have shaped his career path. In addition to engaging in cutting-edge research on the data science life cycle, Biswas made significant contributions to the successful TRIPODS NSF grant proposal. He also attended the Midwest Big Data Summer School, where he learned research methods that drew him further into studying the data science life cycle.
“It’s been incredible,” Biswas said. “I’ve learned novel research ideas from D4 researchers and practitioners who have introduced me to studying the data science pipeline and its properties.”
Biswas is close to publishing his own research, which he conducted at the D4 Institute.
“It’s exciting to be involved in research that could improve software systems, which affect many people who are impacted by data-driven decisions,” he said.
Rajan and his team plan to hire additional undergraduates, graduate students and postdocs at the D4 Institute. More students, like Biswas, will benefit from the experience of conducting NSF-funded research and working with seasoned experts who collaborate on studies.
Completing the D4 picture
Rajan and the team will continue to add capabilities and resources that allow the D4 Institute to expand its research efforts and gather additional data.
The pieces continue to come together for the D4 Institute as the group’s longstanding members look to the future.
“We want to train and grow the next generation of data science researchers,” Rajan said. “Our goal is to use the D4 Institute to establish Iowa State University as the epicenter of research on dependable data-driven discoveries.”
Long-term goals for the D4 Institute include facilitating additional cross-disciplinary research, creating a hub for sharing data science expertise, educating data scientists, and spreading awareness about the importance of a fair and efficient data science pipeline that fosters honest and ethical decisions.
“In just a few years, we’ve expanded from a small team planning our first project into a successful NSF-funded research group that will be working hard to impact the study of the data science life cycle,” Rajan said.