TFX: A TensorFlow-Based Production-Scale Machine Learning Platform

KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Pages 1387 - 1395

https://doi.org/10.1145/3097983.3098021

Published: 13 August 2017

Abstract

Creating and maintaining a platform for reliably producing and deploying machine learning models requires careful orchestration of many components: a learner for generating models based on training data, modules for analyzing and validating both data and models, and finally infrastructure for serving models in production. This becomes particularly challenging when data changes over time and fresh models need to be produced continuously. Unfortunately, such orchestration is often done ad hoc using glue code and custom scripts developed by individual teams for specific use cases, leading to duplicated effort and fragile systems with high technical debt.

We present TensorFlow Extended (TFX), a TensorFlow-based general-purpose machine learning platform implemented at Google. By integrating the aforementioned components into one platform, we were able to standardize the components, simplify the platform configuration, and reduce the time to production from the order of months to weeks, while providing platform stability that minimizes disruptions.

We present the case study of one deployment of TFX in the Google Play app store, where the machine learning models are refreshed continuously as new data arrive. Deploying TFX led to reduced custom code, faster experiment cycles, and a 2% increase in app installs resulting from improved data and model analysis.

Information & Contributors

Information

Published In

KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 2017

2240 pages

ISBN:9781450348874

DOI:10.1145/3097983

General Chairs: Stan Matwin (Dalhousie University), Shipeng Yu (LinkedIn), Faisal Farooq (IBM)

Copyright © 2017 Owner/Author.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

KDD '17

KDD '17: The 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 13 - 17, 2017

Halifax, NS, Canada

Acceptance Rates

KDD '17 Paper Acceptance Rate 64 of 748 submissions, 9%

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
