DOI: 10.1145/2934872.2934874

Robotron: Top-down Network Management at Facebook Scale

Published: 22 August 2016

Abstract

Network management facilitates a healthy and sustainable network. However, its practice is not well understood outside the network engineering community. In this paper, we present Robotron, a system for managing a massive production network in a top-down fashion. The system's goal is to reduce effort and errors on management tasks by minimizing direct human interaction with network devices. Engineers use Robotron to express high-level design intent, which is translated into low-level device configurations and deployed safely. Robotron also monitors devices' operational state to ensure it does not deviate from the desired state. Since 2008, Robotron has been used to manage tens of thousands of network devices connecting hundreds of thousands of servers globally at Facebook.

Supplementary Material

MP4 File (p426.mp4)

    Information

    Published In

    SIGCOMM '16: Proceedings of the 2016 ACM SIGCOMM Conference
    August 2016
    645 pages
    ISBN: 9781450341936
    DOI: 10.1145/2934872
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Facebook
    2. Network Management
    3. Robotron

    Qualifiers

    • Research-article

    Conference

    SIGCOMM '16
    Sponsor:
    SIGCOMM '16: ACM SIGCOMM 2016 Conference
    August 22 - 26, 2016
    Florianopolis, Brazil

    Acceptance Rates

    SIGCOMM '16 Paper Acceptance Rate 39 of 231 submissions, 17%;
    Overall Acceptance Rate 462 of 3,389 submissions, 14%

    Article Metrics

    • Downloads (Last 12 months): 433
    • Downloads (Last 6 weeks): 61
    Reflects downloads up to 13 Sep 2024

    Cited By

    • (2024) A Review of Intelligent Configuration and Its Security for Complex Networks. Chinese Journal of Electronics 33:4 (920-947). DOI: 10.23919/cje.2023.00.001. Online publication date: Jul-2024
    • (2024) Detecting Inconsistency between Network Design and Current State Based on Network Ontology Bonsai. Proceedings of the Asian Internet Engineering Conference 2024 (76-84). DOI: 10.1145/3674213.3674222. Online publication date: 9-Aug-2024
    • (2024) Topaz: Declarative and Verifiable Authoritative DNS at CDN-Scale. Proceedings of the ACM SIGCOMM 2024 Conference (891-903). DOI: 10.1145/3651890.3672240. Online publication date: 4-Aug-2024
    • (2024) Relational Network Verification. Proceedings of the ACM SIGCOMM 2024 Conference (213-227). DOI: 10.1145/3651890.3672238. Online publication date: 4-Aug-2024
    • (2024) In-Network Address Caching for Virtual Networks. Proceedings of the ACM SIGCOMM 2024 Conference (735-749). DOI: 10.1145/3651890.3672213. Online publication date: 4-Aug-2024
    • (2024) Enhancing Network Data Plane Analysis with Native Graph Database. NOMS 2024-2024 IEEE Network Operations and Management Symposium (1-9). DOI: 10.1109/NOMS59830.2024.10575228. Online publication date: 6-May-2024
    • (2024) Toward Autonomous Trusted Networks-From Digital Twin Perspective. IEEE Network 38:3 (84-91). DOI: 10.1109/MNET.2024.3353180. Online publication date: May-2024
    • (2024) CloudPlanner: Minimizing Upgrade Risk of Virtual Network Devices for Large-Scale Cloud Networks. IEEE INFOCOM 2024 - IEEE Conference on Computer Communications (741-750). DOI: 10.1109/INFOCOM52122.2024.10621109. Online publication date: 20-May-2024
    • (2024) Exploring Use of Symbolic Execution for Service Analysis. 2024 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks - Supplemental Volume (DSN-S) (12-16). DOI: 10.1109/DSN-S60304.2024.00014. Online publication date: 24-Jun-2024
    • (2023) Weaver Meets KANVAS: An Autonomous Closed-Loop Network Management System. Proceedings of the 18th Asian Internet Engineering Conference (28-36). DOI: 10.1145/3630590.3630594. Online publication date: 12-Dec-2023
