NSF Org: | CCF Division of Computing and Communication Foundations |
Recipient: | Cornell University |
Initial Amendment Date: | August 23, 2017 |
Latest Amendment Date: | June 16, 2022 |
Award Number: | 1740822 |
Award Instrument: | Standard Grant |
Program Manager: | Phillip Regalia, pregalia@nsf.gov, (703) 292-2981, CCF Division of Computing and Communication Foundations, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | October 1, 2017 |
End Date: | September 30, 2023 (Estimated) |
Total Intended Award Amount: | $1,496,655.00 |
Total Awarded Amount to Date: | $1,496,655.00 |
Recipient Sponsored Research Office: | 341 Pine Tree Rd, Ithaca, NY, US 14850-2820, (607) 255-5014 |
Primary Place of Performance: | 107 Hoy Road, Ithaca, NY, US 14853-7501 |
NSF Program(s): | TRIPODS Transdisciplinary Research in Principles of Data Science, Office of Multidisciplinary Activities, INSPIRE |
Primary Program Source: | 01001718DB NSF Research & Related Activities |
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
The researchers propose to create a center of data science for improved decision-making that combines expertise from computer science, information science, mathematics, operations research, and statistics. Their goal is to pursue basic research that will contribute to the theoretical foundations of data science. The chosen research topics have applications that can benefit society as a whole, and they integrate the perspectives of the disciplines the project brings together. The five research directions are: Privacy and Fairness, Learning on Social Graphs, Learning to Intervene, Uncertainty Quantification, and Deep Learning. The Center aims to advance knowledge in these areas and to broaden the range of disciplines and perspectives that contribute to these challenging problems. The researchers plan to engage the community beyond Cornell through online seminars, workshops, and student conferences.
The research findings will provide an urgently needed foundation for data science in several topic areas of importance to society. Because the center sits at the intersection of multiple disciplines, its intellectual merit spans all of them, and its findings may translate into new algorithms and approaches in each.
The research focus spans five core areas.
1. Privacy and Fairness. As data science becomes pervasive across society, and as it is increasingly used to aid decision-making in sensitive domains, it becomes crucial to protect individuals by guaranteeing privacy and fairness. The investigators propose to research the theoretical foundations for providing such guarantees and to identify their inherent limitations.
2. Learning on Social Graphs. Many of the fundamental questions in applying data science to the interactions between individuals and larger social systems involve the social networks that underpin the connections between individuals. The researchers will develop new techniques for understanding both the structure of these networks and the processes that take place within them.
3. Learning to Intervene. Data-driven approaches to learning good interventions (including policies, recommendations, and treatments) inspire challenging questions about the foundations of sequential experimental design, counterfactual reasoning, and causal inference.
4. Uncertainty Quantification. Quantifying uncertainty about specific predictions or conclusions is a key need in data science, especially when predictions inform decisions with potential consequences for human subjects. The researchers will develop statistical tools and theoretical guarantees to assess the uncertainty of predictions made by popular data science algorithms.
5. Deep Learning. Deep learning algorithms have made impressive advances in practical settings. Although their basic building blocks are well understood, it remains unclear what they learn and why they generalize so well. There are indications that they may learn data manifolds and that the choice of optimization algorithm influences generalization.
Advances in our theoretical understanding of these phenomena require combined efforts from optimization, statistics, and mathematics, but could yield insights for all aspects of data science.
Funds for the project come from CISE Computing and Communication Foundations, the MPS Division of Mathematical Sciences, the MPS Office of Multidisciplinary Activities, and Growing Convergence Research. (Convergence can be characterized as the deep integration of knowledge, techniques, and expertise from multiple fields to form new and expanded frameworks for addressing scientific and societal challenges and opportunities. This project promotes Convergence by bringing together communities representing many disciplines, including mathematics, statistics, and theoretical computer science, as well as engaging communities that apply data science to practical research problems.)
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
The Cornell University TRIPODS Center for Data Science has become a hub for research and learning in data science. The center has advanced the theoretical foundations of data science in areas such as privacy and decision-making, while also supporting educational initiatives to train a new generation of data scientists.
- Research Development: The center's research has led to notable progress in several domains. Protecting individual privacy and ensuring the fairness of predictive models in sensitive sectors have been central themes. Understanding and exploiting social graph data, and learning from interventions, have been another focus. Key projects have quantified the uncertainty in predictions generated by algorithms, providing more confidence in data science applications that affect human subjects. Substantial progress has also been made in deep learning, particularly in the efficiency of neural networks, reducing computational requirements without sacrificing performance.
- Educational Initiatives and Community Building: Cornell's TRIPODS Center has fostered a vibrant data science community across the university through its weekly machine learning seminars and new undergraduate data science courses. These initiatives have encouraged dialogue and collaborative research across departments.
- Postdoctoral Scholar Contributions: The center has benefited from recruiting postdoctoral researchers who have contributed to advances on complex problems, including the limits and guarantees of the widely used EM algorithm, efficient neural network designs, and methods that use Kernel Optimal Matching and other advanced statistical techniques to recover missing entries in matrices.
- Workshops and Events: The TRIPODS Center has hosted workshops that brought together experts from a range of disciplines to focus on the intersections among machine learning, optimization, and causal inference. These events have helped shape the discussion of algorithmic decision-making and data science modeling.
- Software and Tools Development: The center developed the GPyTorch library, which provides tools for fast and scalable Gaussian process inference and has been adopted worldwide by the data science community. GPyTorch is currently one of the most widely used libraries for Gaussian process inference (https://gpytorch.ai/).
- Youth and Community Engagement: The CSMore summer program reflects the center's commitment to education and to diversifying the computer science field. The program introduced first-year students to computer science fundamentals and gave them early research exposure, establishing a strong foundation for their future studies.
Last Modified: 01/10/2024
Modified by: Kilian Weinberger