title | description | ms.service | ms.subservice | ms.topic | ms.date | author | ms.author |
---|---|---|---|---|---|---|---|
Tutorial: Create a Jupyter Notebook to analyze data in your Azure Cosmos DB for NoSQL account
|
Learn how to use Visual Studio Code Jupyter notebooks to import data to Azure Cosmos DB for NoSQL and analyze the data.
|
azure-cosmos-db |
nosql |
overview |
10/09/2023 |
markjbrown |
mjbrown |
Tutorial: Create a Jupyter Notebook to analyze data in your Azure Cosmos DB for NoSQL account using Visual Studio Code Jupyter notebooks
[!INCLUDENoSQL]
This tutorial walks through how to use the Visual Studio Code Jupyter notebooks to interact with your Azure Cosmos DB for NoSQL account. You'll see how to connect to your account, import data, and run queries.
- An existing Azure Cosmos DB for NoSQL account.
- If you have an existing Azure subscription, create a new account.
- No Azure subscription? You can try Azure Cosmos DB free with no credit card required.
- Install Visual Studio Code and setup your environment to use notebooks.
- An existing Azure Cosmos DB for NoSQL account.
- If you have an existing Azure subscription, create a new account.
- No Azure subscription? You can try Azure Cosmos DB free with no credit card required.
- Install Visual Studio Code and setup your environment to use notebooks.
- Install the Polyglot notebooks extension for Visual Studio Code.
In this section, you'll create the Azure Cosmos database, container, and import the retail data to the container.
- Open Visual Studio Code.
- Run the Create: New Jupyter Notebook command from the Command Palette (Ctrl+Shift+P) or create a new .ipynb file in your workspace.
-
Open Visual Studio Code.
-
Run the Polyglot Notebook: Create new blank notebook command from the Command Palette (Ctrl+Shift+P).
:::image type="content" source="media/tutorial-create-notebook-vscode/create-notebook-csharp.png" alt-text="Screenshot of Create new Polyglot notebook command in Visual Studio Code.":::
-
Select the .ipynb file extension.
-
Select C# as the default language.
Tip
Now that the new notebook has been created, you can save it and name it something like AnalyzeRetailData.ipynb.
-
Start in the default code cell.
-
Install the Azure.cosmos package. Run this cell before continuing.
%pip install azure.cosmos
-
Import any packages you require for this tutorial.
import azure.cosmos from azure.cosmos.partition_key import PartitionKey from azure.cosmos import CosmosClient
-
Create a new instance of CosmosClient.
endpoint = "<FILL ME>" key = "<FILL ME>" cosmos_client = CosmosClient(url=endpoint, credential=key)
-
Create a database named RetailIngest using the built-in SDK.
database = cosmos_client.create_database_if_not_exists('RetailIngest')
-
Create a container named WebsiteMetrics with a partition key of
/CartID
.container = database.create_container_if_not_exists(id='WebsiteMetrics', partition_key=PartitionKey(path='/CartID'))
-
Select Run to create the database and container resource.
:::image type="content" source="media/tutorial-create-notebook-vscode/run-cell-python.png" alt-text="Screenshot of Execute cell in Visual Studio Code Jupyter notebook.":::
-
Start in the default code cell.
-
Install the Microsoft.Azure.Cosmos NuGet package. Run this cell before proceeding.
#r "nuget: Microsoft.Azure.Cosmos"
-
Create a new code cell.
-
Import any packages you require for this tutorial.
using Microsoft.Azure.Cosmos;
-
Create a new instance of the client type using the built-in SDK. Fill in the URI endpoint and key of your Azure Cosmos DB account. You can find these values in the Keys page in your Azure Cosmos DB account.
var endpoint = "<FILL ME>"; var key = "<FILL ME>"; var cosmosClient = new CosmosClient(Cosmos.Endpoint, Cosmos.Key);
-
Create a database named RetailIngest.
Database database = await cosmosClient.CreateDatabaseIfNotExistsAsync("RetailIngest");
-
Create a container named WebsiteMetrics with a partition key of
/CartID
.Container container = await database.CreateContainerIfNotExistsAsync("WebsiteMetrics", "/CartID");
-
Select Execute Cell to create the database and container resource.
:::image type="content" source="media/tutorial-create-notebook-vscode/run-cell-csharp.png" alt-text="Screenshot of Execute cell in Visual Studio Code Jupyter C# notebook.":::
-
Add a new code cell
-
Within the code cell, add the following code to upload data from this url:
<https://cosmosnotebooksdata.blob.core.windows.net/notebookdata/websiteData.json>
.import urllib.request import json with urllib.request.urlopen("https://cosmosnotebooksdata.blob.core.windows.net/notebookdata/websiteData.json") as url: docs = json.loads(url.read().decode()) for doc in docs: container.upsert_item(doc)
-
Run the cell. This will take 45 seconds to 1 minute to run.
-
Add a new code cell.
-
In the code cell, create a new C# class to represent an item in the container. Run the cell.
public class Record { public string id { get; set; } public int CartID { get; set; } public string Action { get; set; } public decimal Price { get; set; } public string Country { get; set; } public string Item { get; set; } }
-
Add a new code cell.
-
Within the code cell, add the following code to upload data from this url:
<https://cosmosnotebooksdata.blob.core.windows.net/notebookdata/websiteData.json>
.using System.Net.Http; using System.Text.Json; using System.IO; var dataURL = "https://cosmosnotebooksdata.blob.core.windows.net/notebookdata/websiteData.json"; var jsonData = new HttpClient().GetStringAsync(dataURL).Result; Record[] result = JsonSerializer.Deserialize<Record[]>(jsonData); foreach (Record record in result) { await container.UpsertItemAsync<Record>(record, new PartitionKey(record.CartID)); //43 seconds }
-
Run the cell. This will take 45 seconds to 1 minute to run.
-
Create another new code cell.
-
In the code cell, use a SQL query to populate a Pandas DataFrame. Run this cell.
import pandas as pd from pandas import DataFrame QUERY = "SELECT c.Action, c.Price as ItemRevenue, c.Country, c.Item FROM c" results = container.query_items( query=QUERY, enable_cross_partition_query=True ) df_cosmos = pd.DataFrame(results)
-
Create another new code cell.
-
In the code cell, output the top 10 items from the dataframe. Run this cell.
df_cosmos.head(10)
-
Observe the output of running the command.
Action ItemRevenue Country Item 0 Purchased 19.99 Macedonia Button-Up Shirt 1 Viewed 12.00 Papua New Guinea Necklace 2 Viewed 25.00 Slovakia (Slovak Republic) Cardigan Sweater 3 Purchased 14.00 Senegal Flip Flop Shoes 4 Viewed 50.00 Panama Denim Shorts 5 Viewed 14.00 Senegal Flip Flop Shoes 6 Added 14.00 Senegal Flip Flop Shoes 7 Added 50.00 Panama Denim Shorts 8 Purchased 33.00 Palestinian Territory Red Top 9 Viewed 30.00 Malta Green Sweater -
Create another new code cell.
-
In the code cell, import the pandas package to customize the output of the dataframe. Run this cell.
import pandas as pd df_cosmos.groupby("Item").size().reset_index()
-
Observe the output of running the command.
Item Test 0 Flip Flop Shoes 66 1 Necklace 55 2 Athletic Shoes 111 ... ... ... 45 Windbreaker Jacket 56
-
Create a new code cell.
-
In the code cell, add code to execute a SQL query using the SDK storing the output of the query in a variable of type xref:System.Collections.Generic.List%601 named results.
using System.Collections.Generic; var query = new QueryDefinition( query: "SELECT c.Action, c.Price, c.Country, c.Item FROM c" ); FeedIterator<Record> feed = container.GetItemQueryIterator<Record>( queryDefinition: query ); var results = new List<Record>(); while (feed.HasMoreResults) { FeedResponse<Record> response = await feed.ReadNextAsync(); foreach (Record result in response) { results.Add(result); } }
-
Create another new code cell.
-
In the code cell, create a dictionary by adding unique permutations of the Item field as the key and the sum of the Price field as the value. This gives the total sales for each item. Run this cell.
var dictionary = new Dictionary<string, decimal>(); foreach(var result in results) { if (dictionary.ContainsKey(result.Item)) { dictionary[result.Item] += result.Price; } else { dictionary.TryAdd (result.Item, result.Price); } } dictionary
-
Observe the output with unique combinations of the Item and Price fields.
... Black Tee: 603 Flannel Shirt: 1199.40 Socks: 210.00 Rainjacket: 2695 ...