research-article
Open access

Do Datasets Have Politics? Disciplinary Values in Computer Vision Dataset Development

Published: 18 October 2021

Abstract

Data is a crucial component of machine learning. The field relies on data to train, validate, and test models. With increased technical capabilities, machine learning research has boomed in both academic and industry settings, and one major focus has been computer vision. Computer vision is a popular domain of machine learning that is increasingly pertinent to real-world applications, from facial recognition in policing to object detection for autonomous vehicles. Given computer vision's propensity to shape machine learning research and impact human life, we seek to understand disciplinary practices around dataset documentation: how data is collected, curated, annotated, and packaged into datasets that computer vision researchers and practitioners use for model tuning and development. Specifically, we examine what dataset documentation communicates about the underlying values of vision data and the larger practices and goals of computer vision as a field. To conduct this study, we collected a corpus of about 500 computer vision datasets, from which we sampled 114 dataset publications across different vision tasks. Through both a structured and a thematic content analysis, we document a number of values around accepted data practices, what makes data desirable, and the treatment of humans in the dataset construction process. We discuss how computer vision dataset authors value efficiency at the expense of care, universality at the expense of contextuality, impartiality at the expense of positionality, and model work at the expense of data work. Many of the silenced values we identify sit in opposition to social computing practices. We conclude with suggestions on how to better incorporate silenced values into the dataset creation and curation process.




      Information

      Published In

      Proceedings of the ACM on Human-Computer Interaction, Volume 5, Issue CSCW2
      October 2021
      5376 pages
      EISSN: 2573-0142
      DOI: 10.1145/3493286
      This work is licensed under a Creative Commons Attribution International 4.0 License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 18 October 2021
      Published in PACMHCI Volume 5, Issue CSCW2


      Badges

      • Best Paper

      Author Tags

      1. computer vision
      2. datasets
      3. machine learning
      4. values in design
      5. work practice

      Qualifiers

      • Research-article

      Bibliometrics & Citations

      Article Metrics

      • Downloads (last 12 months): 1,494
      • Downloads (last 6 weeks): 166
      Reflects downloads up to 01 Nov 2024

      Cited By

      • (2024) When Being a Data Annotator Was Not Yet a Job: The Laboratory Origins of Dispersible Labor in Computer Vision Research. Socius: Sociological Research for a Dynamic World 10. DOI: 10.1177/23780231241259617. Online publication date: 24-Jun-2024.
      • (2024) Resisting Dehumanization in the Age of “AI”. Current Directions in Psychological Science 33:2, 114-120. DOI: 10.1177/09637214231217286. Online publication date: 2-Feb-2024.
      • (2024) Who's in and who's out? A case study of multimodal CLIP-filtering in DataComp. Proceedings of the 4th ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization, 1-17. DOI: 10.1145/3689904.3694702. Online publication date: 29-Oct-2024.
      • (2024) Combating Islamophobia: Compromise, Community, and Harmony in Mitigating Harmful Online Content. ACM Transactions on Social Computing 7:1-4, 1-32. DOI: 10.1145/3641510. Online publication date: 27-Apr-2024.
      • (2024) Missed Opportunities for Human-Centered AI Research: Understanding Stakeholder Collaboration in Mental Health AI Research. Proceedings of the ACM on Human-Computer Interaction 8:CSCW1, 1-24. DOI: 10.1145/3637372. Online publication date: 26-Apr-2024.
      • (2024) Attitudes Toward Facial Analysis AI: A Cross-National Study Comparing Argentina, Kenya, Japan, and the USA. Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, 2273-2301. DOI: 10.1145/3630106.3659038. Online publication date: 3-Jun-2024.
      • (2024) A Critical Analysis of the Largest Source for Generative AI Training Data: Common Crawl. Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, 2199-2208. DOI: 10.1145/3630106.3659033. Online publication date: 3-Jun-2024.
      • (2024) Data, Annotation, and Meaning-Making: The Politics of Categorization in Annotating a Dataset of Faith-based Communal Violence. Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, 2148-2156. DOI: 10.1145/3630106.3659030. Online publication date: 3-Jun-2024.
      • (2024) Machine learning data practices through a data curation lens: An evaluation framework. Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, 1055-1067. DOI: 10.1145/3630106.3658955. Online publication date: 3-Jun-2024.
      • (2024) The "Colonial Impulse" of Natural Language Processing: An Audit of Bengali Sentiment Analysis Tools and Their Identity-based Biases. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-18. DOI: 10.1145/3613904.3642669. Online publication date: 11-May-2024.
