Award Abstract # 1740855
TRIPODS: Berkeley Institute on the Foundations of Data Analysis

NSF Org: CCF
Division of Computing and Communication Foundations
Recipient: REGENTS OF THE UNIVERSITY OF CALIFORNIA, THE
Initial Amendment Date: August 23, 2017
Latest Amendment Date: July 26, 2019
Award Number: 1740855
Award Instrument: Continuing Grant
Program Manager: Yuliya Gorb
ygorb@nsf.gov
(703)292-2113
CCF Division of Computing and Communication Foundations
CSE Directorate for Computer and Information Science and Engineering
Start Date: September 1, 2017
End Date: August 31, 2022 (Estimated)
Total Intended Award Amount: $1,499,999.00
Total Awarded Amount to Date: $1,499,999.00
Funds Obligated to Date: FY 2017 = $499,999.00
FY 2018 = $500,000.00
FY 2019 = $500,000.00
History of Investigator:
  • Michael Mahoney (Principal Investigator)
    mmahoney@icsi.berkeley.edu
  • Bin Yu (Co-Principal Investigator)
  • Peter Bartlett (Co-Principal Investigator)
  • Fernando Perez (Co-Principal Investigator)
  • Michael Jordan (Co-Principal Investigator)
  • Richard Karp (Former Co-Principal Investigator)
Recipient Sponsored Research Office: University of California-Berkeley
1608 4TH ST STE 201
BERKELEY
CA  US  94710-1749
(510)643-3891
Sponsor Congressional District: 12
Primary Place of Performance: University of California-Berkeley
CA  US  94704-5940
Primary Place of Performance Congressional District: 12
Unique Entity Identifier (UEI): GS3YEVSS12N6
Parent UEI:
NSF Program(s): TRIPODS Transdisciplinary Rese
Primary Program Source: 01001718DB NSF RESEARCH & RELATED ACTIVIT
01001819DB NSF RESEARCH & RELATED ACTIVIT
01001920DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 047Z, 062Z
Program Element Code(s): 041Y00
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

In response to NSF's TRIPODS Phase I initiative, the PIs, with expertise in theoretical and applied statistics, computer science, and mathematics at the University of California, Berkeley, will create a Foundations of Data Analysis (FODA) Institute to address cutting-edge foundational issues in interdisciplinary data science. The Institute will advance foundational research and the application of foundational methods through an intensive program of cross-disciplinary outreach to application domains in and beyond the campus research community. In parallel with the massive technological and methodological advances in the underlying disciplines over the past decade, a thriving array of data-related research and training programs has emerged across campus. Yet none of these programs within the campus data science ecosystem is devoted to addressing the interdisciplinary foundations of data analysis in a focused, mission-driven manner. The FODA Institute will address this crucial unmet need. This interdisciplinary project will lay the groundwork for more productive interactions between theoretically inclined data science researchers and researchers in diverse domains that rely upon, but do not always explicitly appreciate, foundational concepts. Advances in this area will lead to more principled extraction of insights from data across a wide range of domains. The three-year Phase I pilot will pave the way for institutionalization of the project as a larger center that will be the subject of a potential Phase II application.

The technical research component of the project addresses four fundamental challenges in data science: characterizing what is, and what is not, possible in terms of upper and lower bounds for inferential optimization problems; probing more deeply the notion of stability as a computational-inferential principle; exploring the complementary role of randomness as a statistical resource, as an algorithmic resource, and as a tool for data-driven computational mathematics; and developing methods to combine science-based with data-driven models in a principled manner. Each of these challenges addresses old questions in light of new needs, each has important synergies with the other challenges, and each is situated squarely at the interface of theoretical computer science, theoretical statistics, and applied mathematics. The project will bridge the underlying interdisciplinary gaps to address some of the most important questions at the heart of data science today. Funds for the project come from CISE Computing and Communication Foundations and MPS Division of Mathematical Sciences.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Tan, Yan Shuo and Ronen, Omer and Saarinen, Theo and Yu, Bin "The Computational Curse of Big Data for Bayesian Additive Regression Trees: A Hitting Time Analysis", 2024
Singh, Chandan and Ha, Wooseok and Yu, Bin "Interpreting and Improving Deep-Learning Models with Reality Checks", Lecture Notes in Computer Science, 2022, https://doi.org/10.1007/978-3-031-04083-2_12
Ha, Wooseok and Singh, Chandan and Lanusse, Francois and Upadhyayula, Srigokul and Yu, Bin "Adaptive wavelet distillation from neural networks through interpretations", Advances in Neural Information Processing Systems, 2021

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Researchers with expertise in theoretical and applied statistics, computer science, and mathematics at UC Berkeley created the Foundations of Data Analysis (FODA) Institute to address cutting-edge foundational issues in interdisciplinary data science. The Institute advanced foundational research and the application of foundational methods through an intensive program of cross-disciplinary outreach to application domains in and beyond the UC Berkeley campus research community. Prior to the creation of FODA, a thriving array of data-related research and training programs had emerged across campus. None, however, was devoted to addressing the interdisciplinary foundations of data analysis. Research outcomes focused on the training of graduate students and postdoctoral researchers on topics including the characterization of what is and what is not possible in terms of upper and lower bounds for inferential optimization problems; the notion of stability as a computational-inferential principle; the complementary role of randomness as a statistical resource, as an algorithmic resource, and as a tool for data-driven computational mathematics; and the development of methods to combine science-based with data-driven models in a principled manner. The Institute developed collaborations with UC Berkeley campus partners, including the Berkeley Institute for Data Science (BIDS), the Simons Institute for the Theory of Computing, and the RISE/AMP Labs, as well as with Lawrence Berkeley National Laboratory and with the NSF Big Data Regional Innovation Hub. These collaborations ensured that the work of the FODA Institute was disseminated across the existing data science ecosystem and beyond. TRIPODS+X projects applied the methods to problems in materials science, cosmology, and beyond.
Research outcomes were also integrated into the undergraduate and graduate curricula at Berkeley, producing researchers and scholars in diverse domains with rigorous training in the critical analysis of data as it appears in their everyday experience.  In partnership with MIT, Boston University, Bryn Mawr College, Harvard University, Howard University, and Northeastern University, FODA served as a three-year Phase I pilot that led to collaboration with other Phase I partners and others to create the TRIPODS Phase II Foundations of Data Science Institute (FODSI).


Last Modified: 01/10/2023
Modified by: Michael Mahoney

Please report errors in award information by writing to: awardsearch@nsf.gov.
