Data Infrastructure
Our data infrastructure team’s mission is to build world-class data infrastructure that ultimately helps us deliver value to our members. Through highly operable, high-leverage, and easy-to-use online and nearline infrastructure for data storage, indexing, streaming, media, information retrieval, and derived data applications, we’re powering LinkedIn and the products and services our members use every day.
Teams & Project Spotlights
Streams Infrastructure
Developed at LinkedIn, Apache Kafka, Apache Samza, and Brooklin form a world-class data processing infrastructure that powers our community of more than 660 million members.
Storage Infrastructure
Our infrastructure must be able to store large volumes of data, while handling a high volume of queries per second (QPS). We’ve built tools, such as Espresso, Venice, and Ambry, to ensure efficient storage at scale.
Feed Infrastructure
Feed Infrastructure owns multiple large-scale distributed systems that power the feeds and many of the search experiences core to our members’ use of LinkedIn. Our technology domain includes information retrieval, machine learning, and distributed datastores.
Machine Learning Infrastructure
What is the point of learning if you don't apply the learning to change yourself and the world? In partnership with AI and our sister teams in AI Infrastructure, the Machine Learning Infrastructure team facilitates the robust, efficient, and straightforward application of machine-learned capabilities to LinkedIn's mission.
Search Infrastructure
Our members use search to find people, jobs, companies, groups, and other professional content. To power these solutions, our search platform brings together information retrieval, machine learning, distributed systems, big data, and other fundamental areas of computer science.
Graph
Datasets on the scale of the Economic Graph cannot fit in the storage of a single computer, so we designed a distributed system that can scale to support one of the world’s largest social network graphs, both now and in the future.
Data Productivity
Data Productivity’s mission is high productivity via easy and powerful interfaces to data systems. The data productivity team focuses on the experiences of creating, using, or reviewing data entities, with the goal to intelligently improve developer agility.
Media Infra (Vector)
Vector is LinkedIn’s media processing and serving infrastructure. Vector handles the creation, processing, storage, and presentation of all media content, including images, videos, documents, audio, and related metadata. Vector provides both internal and member-facing APIs, powering use cases like LinkedIn profile images, feed videos, messaging attachments, LinkedIn Live, and more. More than 100 million images and videos are processed per day and served at more than a million QPS at peak.