Skip to main content
Microsoft Security

CISO Series: Lessons learned from the Microsoft SOC—Part 2a: Organizing people

In the second post in our series, we focus on the most valuable resource in the security operations center (SOC)—our people. This series is designed to share our approach and experience with operations, so you can use what we learned to improve your SOC. In Part 1: Organization, we covered the SOC’s organizational role and mission, culture, and metrics.

The lessons in the series come primarily from Microsoft’s corporate IT security operation team, one of several specialized teams in the Microsoft Cyber Defense Operations Center (CDOC). We also include lessons our Detection and Response Team (DART) have learned helping our customers respond to major incidents.

People are the most valuable asset in the SOC—their experience, skill, insight, creativity, and resourcefulness are what makes our SOC effective. Our SOC management team spends a lot of time thinking about how to ensure our people are set up with what they need to succeed and stay engaged. As we’ve improved our processes, we’ve been able to decrease the time it takes to ramp people up and increase employee enjoyment of their jobs.

Today, we cover the first two aspects of how to set up people in the SOC for success:

Empower humans with automation

Rapidly sorting out the signal (real detections) from the noise (false positives) in the SOC requires investing in both humans and automation. We strongly believe in the power of automation and technology to reduce human toil, but ultimately, we’re dealing with human attack operators and human judgement is critical to the process.

In our SOC, automation is not about using efficiency to remove humans from the process—it is about empowering humans. We continuously think about how we can automate repetitive tasks from the analyst’s job, so they can focus on the complex problems that people are uniquely able to solve.

Automation empowers humans to do more in the SOC by increasing response speed and capturing human expertise. The toil our staff experiences comes mostly from repetitive tasks and repetitive tasks come from either attackers or defenders doing the same things over and over. Repetitive tasks are ideal candidates for automation.

We also found that we need to constantly refine the automation because attackers are creative and persistent, constantly innovating to avoid detections and preventive controls. When an effective attack method is identified (like phishing), they exploit it until it stops working. But they also continually innovate new tactics to evade defenses introduced by the cybersecurity community. Given the profit potential of attacks, we expect the challenges of evolving attacks to continue for the foreseeable future.

When repetitive and boring work is automated, analysts can apply more of their creative minds and energy to solving the new problems that attackers present to them and proactively hunting for attackers that got past the first lines of defense. We’ll discuss areas where we use automation and machine learning in “Part 3: Technology.”

Microsoft SOC teams and tiers model

At Microsoft, we organized our SOC into specialized teams, allowing them to better develop and apply deep expertise, which supports the overall goals of reducing time to acknowledge and remediate.

This diagram represents the key SOC functions: threat intelligence, incident management, and SOC analyst tiers:

Image showing key SOC functions: threat intelligence, incident management, and SOC analysts (tiers 1, 2, and 3).

Threat intelligence—We have several threat intelligence teams at Microsoft that support the SOC and other business functions. Their role is to both inform business stakeholders of risk and provide technical support for incident investigations, hunting operations, and defensive measures for known threats. These strategic (business) and tactical (technical) intelligence goals are related but distinctly different from each other. We task different teams for each goal and ensure processes are in place (such as daily standup meetings) to keep them in close contact.

Incident management—Enterprise-wide coordination of incidents, impact assessment, and related tasks are handled by dedicated personnel separate from technical analyst teams. At Microsoft, these incident response teams work with the SOC and business stakeholders to coordinate actions that may impact services or business units. Additionally, this team brings in legal, compliance, and privacy experts as needed to consult and advise on actions regarding regulatory aspects of incidents. This is particularly important at Microsoft because we’re compliant with a large number of international standards and regulations.

SOC analyst tiers—This three-tier model for SOC analysts will probably look familiar to seasoned SOC professionals, though there are some subtleties in our model we don’t see widely in the industry.

Image showing Microsoft's Corporate IT SOC tiers and tools: alert queue (hot path), proactive hunt (cold path), tiers 3, 2, and 1, and finally, automation.

Our organization uses the term hot path and cold path to describe how we discover adversaries and optimize processes to handle them.

Roles and functions of the SOC analyst tiers

Tier 1—This team is the primary front line for and focuses on high-speed remediation over a large volume of incidents. Tier 1 analysts respond to a very specific set of alert sources and follow prescriptive instructions to investigate, remediate, and document the incidents. The rule of thumb for alerts that Tier 1 handles is that it can be typically remediated within seconds to minutes. The incidents will be escalated to Tier 2 if the incident isn’t covered by a documented Tier 1 procedure or it requires involved/advanced remediation (for example, device isolation and cleanup).

In addition:

Tier 2—This team is focused on incidents that require deeper analysis and remediation. Many Tier 2 incidents have been escalated from Tier 1 analysts, but Tier 2 also directly monitors alerts for sensitive assets and known attacker campaigns. These incidents are usually more complex and require an approach that is still structured, but much more flexible than Tier 1 procedures. Additionally, some Tier 2 analysts also proactively hunt for adversaries (typically using lower priority alerts from the same Microsoft Threat Protection tools they use to manage reactive incidents).

Tier 3—This team is focused primarily on advanced hunting and sophisticated analysis to identify anomalies that may indicate advanced adversaries. Most incidents are remediated at Tiers 1 and 2 (96 percent) and only unprecedented findings or deviations from norms are escalated to Tier 3 teams. Tier 3 team members have a high degree of freedom to bring their different skills, backgrounds, and approaches to the goal of ferreting out red team/hidden adversaries. Tier 3 team members have backgrounds as security professionals, data scientists, intelligence analysts, and more. These teams use different tools (Microsoft, custom, and third-party) to sift through a number of different datasets to uncover hidden adversary activity. A favorite of many analysts is the use of Kusto Query Language (KQL) queries across Microsoft Threat Protection tool datasets.

The structure of Tier 3 has changed over time, but has recently gravitated to four different functions:

Learn more

For more insights into Microsoft’s approach to using technology to empower people, watch Ann Johnson’s keynote at RSA 2019 and download our poster. For information on organizational culture and goals, read Lessons learned from the Microsoft SOC—Part 1: Organization. In addition, see our CISO series to learn more.

Stayed tuned for the second segment in “Lessons learned from the Microsoft SOC—Part 2,” where we’ll cover career paths and readiness programs for people in our SOC. And finally, we’ll wrap up this series with “Part 3: Technology,” where we’ll discuss the technology that enables our people to accomplish their mission.

For more discussion on some of these topics, see John and Kristina’s session (starting at 1:05:48) at Microsoft’s recent Virtual Security Summit.

Read more from this series