Research article. DOI: 10.1145/3230543.3230546

FBOSS: building switch software at scale

Published: 07 August 2018

Abstract

The conventional software running on network devices, such as switches and routers, is typically vendor-supplied, proprietary, and closed-source; as a result, it tends to contain extraneous features that a single operator is unlikely to fully utilize. Furthermore, cloud-scale data center networks often have software and operational requirements that may not be well addressed by switch vendors.
In this paper, we present our ongoing experience in overcoming the complexity and scaling issues that we face when designing, developing, deploying, and operating in-house software built to manage and support the set of features required for the data center switches of a large-scale Internet content provider. We present FBOSS, our own data center switch software, which is designed around our switch-as-a-server and deploy-early-and-iterate principles. We treat the software running on data center switches like any other software service that runs on a commodity server. We also build and deploy only a minimal set of features and iterate on them. These principles allow us to rapidly iterate on, test, deploy, and manage FBOSS at scale. Our experience over the last five years shows that FBOSS's design principles allow us to quickly build a stable and scalable network. As evidence, we have successfully grown the number of FBOSS instances running in our data center by over 30x over a two-year period.

Published In

SIGCOMM '18: Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication
August 2018
604 pages
ISBN: 9781450355674
DOI: 10.1145/3230543
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. FBOSS
  2. Facebook
  3. data center networks
  4. network management
  5. network monitoring
  6. switch software design

Qualifiers

  • Research-article

Conference

SIGCOMM '18: ACM SIGCOMM 2018 Conference
August 20 - 25, 2018
Budapest, Hungary

Acceptance Rates

Overall Acceptance Rate 462 of 3,389 submissions, 14%
