Award Abstract # 1740822
TRIPODS: Data Science for Improved Decision-Making: Learning in the Context of Uncertainty, Causality, Privacy, and Network Structures

NSF Org: CCF
Division of Computing and Communication Foundations
Recipient: CORNELL UNIVERSITY
Initial Amendment Date: August 23, 2017
Latest Amendment Date: June 16, 2022
Award Number: 1740822
Award Instrument: Standard Grant
Program Manager: Phillip Regalia
pregalia@nsf.gov
 (703)292-2981
CSE
 Direct For Computer & Info Scie & Enginr
Start Date: October 1, 2017
End Date: September 30, 2023 (Estimated)
Total Intended Award Amount: $1,496,655.00
Total Awarded Amount to Date: $1,496,655.00
Funds Obligated to Date: FY 2017 = $1,496,655.00
History of Investigator:
  • Kilian Weinberger (Principal Investigator)
    kilianweinberger@cornell.edu
  • Steven Strogatz (Co-Principal Investigator)
  • Jon Kleinberg (Co-Principal Investigator)
  • David Shmoys (Co-Principal Investigator)
  • Giles Hooker (Co-Principal Investigator)
Recipient Sponsored Research Office: Cornell University
341 PINE TREE RD
ITHACA
NY  US  14850-2820
(607)255-5014
Sponsor Congressional District: 19
Primary Place of Performance: Cornell University
107 Hoy Road
Ithaca
NY  US  14853-7501
Primary Place of Performance Congressional District: 19
Unique Entity Identifier (UEI): G56PUALJ3KT5
Parent UEI:
NSF Program(s): TRIPODS Transdisciplinary Rese,
OFFICE OF MULTIDISCIPLINARY AC,
INSPIRE
Primary Program Source: 01001617DB NSF RESEARCH & RELATED ACTIVIT
01001718DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 047Z, 060Z, 062Z
Program Element Code(s): 041Y00, 125300, 807800
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

The researchers propose to create a center of data science for improved decision-making that combines expertise from computer science, information science, mathematics, operations research, and statistics. Their goal is to pursue basic research that will contribute to the theoretical foundations of data science. The research topics chosen have applications that can benefit society as a whole and integrate the perspectives of the disciplines that the project brings together. The five concrete research directions proposed are: Privacy and Fairness, Learning on Social Graphs, Learning to Intervene, Uncertainty Quantification, and Deep Learning. The aim of the Center is to advance knowledge in these areas and to broaden the range of disciplines and perspectives that can provide contributions to these challenging issues. The researchers plan to incorporate the community beyond Cornell through online seminars, workshops, and student conferences.

The research findings will provide an urgently needed foundation for data science in several topic areas of importance to society. As the center is placed at the intersection of multiple disciplines, the intellectual merit spans all disciplines involved and findings may translate to new algorithms and approaches in each one of them.

The research focus spans five core areas.

1. Privacy and Fairness. As data science becomes pervasive across many areas of society, and as it is increasingly used to aid decision-making in sensitive domains, it becomes crucial to protect individuals by guaranteeing privacy and fairness. The investigators propose to research the theoretical foundations for providing such guarantees and to surface their inherent limitations.

2. Learning on Social Graphs. Many of the fundamental questions in applying data science to the interactions between individuals and larger social systems involve the social networks that underpin the connections between individuals. The researchers will develop new techniques for understanding both the structure of these networks and the processes that take place within them.

3. Learning to Intervene. Data-driven approaches to learning good interventions (including policies, recommendations, and treatments) inspire challenging questions about the foundations of sequential experimental design, counterfactual reasoning, and causal inference.

4. Uncertainty Quantification. Quantifying uncertainty about specific predictions or conclusions represents a key need in data science, especially when applied to decision-making with potential consequences to human subjects. The researchers will develop statistical tools and theoretical guarantees to assess the uncertainties of predictions made by popular algorithms in data science.

5. Deep Learning. Deep Learning algorithms have made impressive advances in practical settings. Although their basic building blocks are well understood, there is still ambiguity about what they learn and why they generalize so well. There are indications that they may learn data manifolds and that the type of optimization algorithm influences generalization.

Advances in our theoretical understanding of these phenomena require combined efforts from optimization, statistics, and mathematics but could lead to insights for all aspects of data science.

Funds for the project come from CISE Computing and Communication Foundations, MPS Division of Mathematical Sciences, MPS Office of Multidisciplinary Activities, and Growing Convergent Research. (Convergence can be characterized as the deep integration of knowledge, techniques, and expertise from multiple fields to form new and expanded frameworks for addressing scientific and societal challenges and opportunities. This project promotes Convergence by bringing together communities representing many disciplines, including mathematics, statistics, and theoretical computer science, as well as engaging communities that apply data science to practical research problems.)

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Note:  When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

(Showing: 1 - 10 of 18)
Sreekumar, Sreejith and Goldfeld, Ziv "Soft-covering via Constant-composition Superposition codes" Proceedings of the IEEE International Symposium on Information Theory (ISIT), 2021 https://doi.org/10.1109/ISIT45174.2021.9518258
Sreekumar, Sreejith and Zhang, Zhengxin and Goldfeld, Ziv "Non-asymptotic Performance Guarantees for Neural Estimation of f-divergences" Proceedings of Machine Learning Research, v.130, 2022
Sreejith Sreekumar, Zhengxin Zhang "Non-asymptotic Performance Guarantees for Neural Estimation of f-Divergences" Proceedings of the International Workshop on Artificial Intelligence and Statistics, 2021
S. Sreekumar, A. Bunin "The secrecy capacity of cost-constrained wiretap channels" IEEE Transactions on Information Theory, 2021
Naman Agarwal, Sham Kakade "Leverage Score Sampling for Faster Accelerated Regression and ERM" Algorithmic Learning Theory, 2019
Rahul Kidambi, Aravind Rajeswaran "MOReL: Model-Based Offline Reinforcement Learning" NeurIPS, 2020
Su, Yi and Wang, Lequn and Santacatterina, Michele and Joachims, Thorsten "CAB: Continuous Adaptive Blending Estimator for Policy Evaluation and Learning" Proceedings of Machine Learning Research, 2019
Gao Huang, Shichen Liu "CondenseNet: An Efficient DenseNet using Learned Group Convolutions" CVPR, 2018
Huang, Gao and Liu, Shichen and Van der Maaten, Laurens and Weinberger, Kilian "CondenseNet: An Efficient DenseNet using Learned Group Convolutions" CVPR 2018, 2018
Benson, Austin R and Kumar, Ravi and Tomkins, Andrew "Sequences of Sets" The 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2018
Benson, Austin R and Kleinberg, Jon "Found Graph Data and Planted Vertex Covers" arXiv.org, 2018

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The Cornell University TRIPODS Center for Data Science has made significant strides as an epicenter for research and learning in the field of data science. The center has excelled in molding the foundational theories of data science to enhance aspects like privacy and decision-making, while also facilitating educational initiatives to cultivate a new generation of data scientists.


- Research Development:  The center's research initiatives have led to notable progress in several domains. Protecting individual privacy and ensuring the fairness of predictive models in sensitive sectors have been central themes, as have efforts to understand and leverage social graph data and to learn from interventions. Key projects have taken steps to quantify the uncertainty in predictions generated by algorithms, providing more confidence in data science applications affecting human subjects. The center has also made headway in deep learning, particularly in the efficiency of neural networks, reducing computational requirements without sacrificing performance.
- Educational Initiatives and Community Building:  Cornell's TRIPODS Center has spurred the creation of a vibrant data science community across the university through its weekly machine learning seminars and the introduction of new data science courses at the undergraduate level. These initiatives have facilitated productive dialogue and collaborative research, contributing significantly to the interdepartmental synergy.
- Postdoctoral Scholar Contributions:  The center has benefitted from the recruitment of postdoctoral researchers who have catalyzed breakthroughs in understanding complex issues such as the limits and guarantees of the widely used EM algorithm, efficient neural network layouts, and methods to recover missing entries in matrices using Kernel Optimal Matching and other advanced statistical techniques.
- Workshops and Events:  The TRIPODS Center has hosted workshops that brought together an interdisciplinary range of experts to focus on the intersections between machine learning, optimization, and causal inference. These events have contributed to shaping the discourse around algorithmic decision-making and data science modeling.
- Software and Tools Development:  The successful creation and global adoption of the GPyTorch library is a testament to the center's dedication to practical innovation, providing essential tools for fast and scalable inference within the data science community. GPyTorch is currently one of the most widely used libraries for Gaussian Process inference. ( https://gpytorch.ai/ )
- Youth and Community Engagement:  The CSMore program is an embodiment of the center's dedication to education and diversifying the computer science field. This summer program engaged freshmen with computer science fundamentals and research exposure, establishing a strong foundation for their future studies.


Last Modified: 01/10/2024
Modified by: Kilian Weinberger

Please report errors in award information by writing to: awardsearch@nsf.gov.
