reliability-cosmos-mongodb.md

title: Reliability in Azure Cosmos DB for MongoDB vCore
description: Find out about reliability in Azure Cosmos DB for MongoDB vCore
author: anaharris-ms
ms.author: gahllevy
ms.reviewer: sidandrews
ms.service: azure-cosmos-db
ms.subservice: mongodb
ms.topic: concept-article
ms.date: 03/11/2024
ms.custom: references_regions, subject-reliability
CustomerIntent: As a cloud architect/engineer, I need general guidance on reliability in Azure Cosmos DB for MongoDB vCore

Reliability in Azure Cosmos DB for MongoDB vCore

[!INCLUDE MongoDB vCore]

This article describes regional resiliency with availability zones, as well as cross-region disaster recovery and business continuity, for Azure Cosmos DB for MongoDB vCore.

For an architectural overview of reliability in Azure, see Azure reliability.

Availability zone support

[!INCLUDE Availability zone description]

To gain availability zone support, you must enable High availability (HA).

HA avoids database downtime by maintaining standby replicas of every shard in a cluster. If a shard goes down, Azure Cosmos DB for MongoDB vCore switches incoming connections from the failed shard to its standby replica.

When HA is enabled in a region that supports availability zones, HA replica shards are provisioned in a different availability zone from their primary shards. HA replicas don't receive requests from clients unless their primary shard fails.

If HA is disabled, each shard has its own locally redundant storage (LRS) with three synchronous replicas maintained by Azure Storage service. If there's a single replica failure, the Azure Storage service detects the failure, and transparently re-creates the relevant data. For LRS storage durability, see Summary of redundancy options. However, in the case of a region failure, you run the risk of extensive downtime and possible data loss.

Create a resource with availability zones enabled

To enable availability zones, you must enable High availability (HA) when creating a cluster or in the Scale section of an existing cluster in the Azure portal.

Cross-region disaster recovery and business continuity

[!INCLUDE introduction to disaster recovery]

Azure Cosmos DB for MongoDB vCore doesn't provide built-in automatic cross-region failover or disaster recovery. Planning for high availability is a critical step as your solution scales.

Disaster recovery in single-region geography

To maximize your uptime, plan ahead to maintain business continuity and prepare for disaster recovery with Azure Cosmos DB for MongoDB vCore.

While Azure services are designed to maximize uptime, unplanned service outages might occur. A disaster recovery plan ensures that you have a strategy in place for handling regional service outages.

Azure Cosmos DB for MongoDB vCore automatically takes backups of your data at regular intervals. The backups run in the background without affecting the performance or availability of database operations, and they're stored separately from the source data in a storage service. These automatic backups are useful when you accidentally delete or modify resources and later need the original versions.

Automatic backups are retained for different periods, depending on whether the cluster is currently active or was recently deleted.

| | Retention period |
| --- | --- |
| Active clusters | 35 days |
| Deleted clusters | 7 days |
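
As a rough illustration of the retention periods above, the following Python sketch computes when a retention window ends. The helper name `retention_deadline` and the idea of counting the window from a caller-supplied reference time are assumptions made for this example; they aren't part of the service or of any SDK.

```python
from datetime import datetime, timedelta, timezone

# Retention periods taken from the table above.
RETENTION = {
    "active": timedelta(days=35),
    "deleted": timedelta(days=7),
}


def retention_deadline(cluster_state: str, reference_time: datetime) -> datetime:
    """Return the moment the retention window ends.

    Hypothetical helper: reference_time is the moment the window is counted
    from, which this sketch assumes the caller knows and supplies.
    """
    if cluster_state not in RETENTION:
        raise ValueError(f"unknown cluster state: {cluster_state!r}")
    return reference_time + RETENTION[cluster_state]


# Example: a window counted from 11 March 2024 for a deleted cluster.
reference = datetime(2024, 3, 11, tzinfo=timezone.utc)
print(retention_deadline("deleted", reference).date())  # 2024-03-18
```

A check like this can help a runbook flag when a restore request is getting close to the end of the window.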

Design for high availability

Enable high availability (HA) for critical Azure Cosmos DB for MongoDB vCore clusters that run production workloads. In an HA-enabled cluster, each primary shard is paired with a hot-standby (secondary) shard provisioned in another availability zone. Replication between the primary and the secondary shard is synchronous by default. Any modification to the database is persisted on both the primary and the secondary shards before the database returns a response to the client.
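
To make the idea of synchronous replication concrete, here is a toy in-memory model in Python. It isn't the service's implementation; the `Shard` class and `synchronous_write` function are hypothetical names that only illustrate the rule that a write is acknowledged after both copies have persisted it.

```python
class Shard:
    """In-memory stand-in for a shard's durable storage (toy model)."""

    def __init__(self, name):
        self.name = name
        self.log = []          # records persisted on this shard
        self.available = True

    def persist(self, record):
        if not self.available:
            raise ConnectionError(f"{self.name} is unavailable")
        self.log.append(record)


def synchronous_write(primary, standby, record):
    """Acknowledge a write only after both shards have persisted it.

    If either persist call fails, the exception propagates and no
    acknowledgment is returned. This toy model doesn't roll back a
    partially applied write.
    """
    for shard in (primary, standby):
        shard.persist(record)
    return "ack"


primary, standby = Shard("primary"), Shard("standby")
synchronous_write(primary, standby, "insert {_id: 1}")
print(primary.log == standby.log)  # True
```

Because the acknowledgment is returned only after the standby has the record, every acknowledged write in this model exists on both copies.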

The service maintains health checks and heartbeats to each primary and secondary shard of the cluster. If a primary shard becomes unavailable due to a zone or regional outage, the secondary shard is automatically promoted to become the new primary, and a new secondary shard is built for the new primary. Likewise, if a secondary shard becomes unavailable, the service automatically creates a new secondary shard with a full copy of the data from the primary.
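
The promotion and rebuild rules described above can be sketched as a small decision function. This is a simplified illustration under assumed names (`ShardPair`, `reconcile`, and the shard labels are all hypothetical), not a description of how the service is implemented internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShardPair:
    primary: str
    secondary: str


def reconcile(pair, healthy, new_replica):
    """Return the shard pair after applying one round of health results.

    healthy maps a shard name to True or False; new_replica is the name
    given to a freshly built secondary (both are assumptions of this sketch).
    """
    primary_ok = healthy[pair.primary]
    secondary_ok = healthy[pair.secondary]
    if not primary_ok and not secondary_ok:
        raise RuntimeError("both shards unavailable; outside this sketch")
    if not primary_ok:
        # Promote the secondary and build a new secondary for it.
        return ShardPair(primary=pair.secondary, secondary=new_replica)
    if not secondary_ok:
        # Keep the primary and rebuild the secondary from its data.
        return ShardPair(primary=pair.primary, secondary=new_replica)
    return pair


pair = ShardPair("shard-a", "shard-b")
print(reconcile(pair, {"shard-a": False, "shard-b": True}, "shard-c"))
# ShardPair(primary='shard-b', secondary='shard-c')
```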

If the service triggers a failover from the primary to the secondary shard, connections are transparently rerouted to the new primary shard.
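
Even when rerouting is handled by the service, an application may want to tolerate brief connection errors while a failover is in progress. Whether in-flight operations can fail during promotion isn't stated here, so the following Python sketch is a defensive pattern under that assumption. The `run_with_retry` helper is a hypothetical name, and `ConnectionError` stands in for whatever transient error types your driver raises.

```python
import time


def run_with_retry(operation, attempts=5, base_delay=0.5,
                   retryable=(ConnectionError,), sleep=time.sleep):
    """Call operation(), retrying transient failures with exponential backoff.

    The delay doubles after each failed attempt (0.5 s, 1 s, 2 s, ...).
    The last failure is re-raised so the caller can handle it.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except retryable:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Example with a stand-in operation that fails twice, then succeeds.
state = {"calls": 0}

def flaky_query():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("connection reset during failover")
    return "ok"

print(run_with_retry(flaky_query, base_delay=0.01))  # ok
```

Retry only operations that are safe to repeat, because a write that failed on the client side may already have been applied.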

Synchronous replication between the primary and secondary shards guarantees no data loss if there's a failover.

Next steps