DOI: 10.1145/3097983.3098021
research-article
Open access

TFX: A TensorFlow-Based Production-Scale Machine Learning Platform

Published: 13 August 2017

Abstract

Creating and maintaining a platform for reliably producing and deploying machine learning models requires careful orchestration of many components: a learner for generating models based on training data, modules for analyzing and validating both data and models, and finally infrastructure for serving models in production. This becomes particularly challenging when data changes over time and fresh models need to be produced continuously. Unfortunately, such orchestration is often done ad hoc using glue code and custom scripts developed by individual teams for specific use cases, leading to duplicated effort and fragile systems with high technical debt.
We present TensorFlow Extended (TFX), a TensorFlow-based general-purpose machine learning platform implemented at Google. By integrating the aforementioned components into one platform, we were able to standardize the components, simplify the platform configuration, and reduce the time to production from the order of months to weeks, while providing platform stability that minimizes disruptions.
We present the case study of one deployment of TFX in the Google Play app store, where the machine learning models are refreshed continuously as new data arrive. Deploying TFX led to reduced custom code, faster experiment cycles, and a 2% increase in app installs resulting from improved data and model analysis.
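
The component flow named in the abstract (data validation, a learner, model validation, and serving, repeated as new data arrive) can be illustrated with a minimal sketch. This is not TFX code and uses no TFX API: every name below (`Model`, `validate_data`, `train`, `mse`, `run_pipeline`) is hypothetical, the "learner" is a toy least-squares fit, and the validation rules are assumptions chosen only to show how one gated pipeline step might replace ad hoc glue code.

```python
import math
from dataclasses import dataclass
from statistics import mean
from typing import Optional, Sequence, Tuple

# Toy stand-in for real training data: (feature, label) pairs.
Example = Tuple[float, float]


@dataclass
class Model:
    """Hypothetical one-parameter model: prediction = slope * x."""
    slope: float

    def predict(self, x: float) -> float:
        return self.slope * x


def validate_data(batch: Sequence[Example]) -> bool:
    """Illustrative data check: non-empty batch containing only finite numbers."""
    return len(batch) > 0 and all(
        math.isfinite(x) and math.isfinite(y) for x, y in batch
    )


def train(batch: Sequence[Example]) -> Model:
    """Toy learner: least-squares fit of a line through the origin."""
    sxx = sum(x * x for x, _ in batch)
    sxy = sum(x * y for x, y in batch)
    return Model(slope=sxy / sxx if sxx else 0.0)


def mse(model: Model, batch: Sequence[Example]) -> float:
    """Mean squared error of the model on a batch."""
    return mean((model.predict(x) - y) ** 2 for x, y in batch)


def run_pipeline(
    train_batch: Sequence[Example],
    eval_batch: Sequence[Example],
    serving: Optional[Model],
) -> Optional[Model]:
    """Return the model that should be serving after this batch arrives."""
    # Data validation gate: bad data never reaches the learner.
    if not (validate_data(train_batch) and validate_data(eval_batch)):
        return serving
    candidate = train(train_batch)
    # Model validation gate: push only if the candidate is no worse
    # than the currently serving model on held-out data.
    if serving is not None and mse(candidate, eval_batch) > mse(serving, eval_batch):
        return serving
    return candidate


# Continuous refresh: call run_pipeline once per incoming batch.
serving = run_pipeline([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)], [(4.0, 8.0)], None)
print(serving)  # Model(slope=2.0)
```

The design choice illustrated is that both gates return the currently serving model unchanged on failure, so a bad batch or a regressed candidate cannot disrupt production.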

Supplementary Material

MP4 File (cheng_machine_learning.mp4)

Published In

KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2017
2240 pages
ISBN: 9781450348874
DOI: 10.1145/3097983
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. continuous training
  2. end-to-end platform
  3. large-scale machine learning

Qualifiers

  • Research-article

Conference

KDD '17
Acceptance Rates

KDD '17 Paper Acceptance Rate: 64 of 748 submissions, 9%
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
