Protected material detection

The Protected material text API flags known text content (for example, song lyrics, articles, recipes, and selected web content) that might be output by large language models.

The Protected material code API flags protected code content (from known GitHub repositories, including software libraries, source code, algorithms, and other proprietary programming content) that might be output by large language models.
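As a rough illustration of how a client might call the two APIs described above, the following Python sketch builds the HTTP request and reads the result. The REST paths, API versions, header name, request fields, and the `protectedMaterialAnalysis.detected` response field are assumptions that are not stated on this page; check them against the current REST reference before relying on them. The helper names `build_request` and `is_protected` are hypothetical.

```python
import json
import urllib.request

# Assumed operation names, API versions, and body fields (not taken from this page).
OPERATIONS = {
    "text": ("text:detectProtectedMaterial", "2024-09-01", "text"),
    "code": ("text:detectProtectedMaterialForCode", "2024-09-15-preview", "code"),
}


def build_request(endpoint: str, key: str, kind: str, content: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the text or code API."""
    operation, api_version, field = OPERATIONS[kind]
    url = f"{endpoint.rstrip('/')}/contentsafety/{operation}?api-version={api_version}"
    body = json.dumps({field: content}).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,  # assumed authentication header
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def is_protected(response_body: str) -> bool:
    """Read the assumed `protectedMaterialAnalysis.detected` flag from a JSON response."""
    analysis = json.loads(response_body).get("protectedMaterialAnalysis", {})
    return bool(analysis.get("detected", False))
```

To send the request, pass the result of `build_request` to `urllib.request.urlopen` and give the decoded response body to `is_protected`.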

Caution

The content safety service's code scanner/indexer is only current through November 6, 2021. Code that was added to GitHub after this date will not be detected. Use your own discretion when using Protected Material for Code to detect recent bodies of code.
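One way to apply this caution in practice is to treat a negative result as inconclusive when the code in question was first published after the index cutoff. The sketch below is illustrative only: the function names are hypothetical, and it assumes you already know (or can estimate) when the code first appeared on GitHub.

```python
from datetime import date

# Index cutoff stated in the caution above.
INDEX_CUTOFF = date(2021, 11, 6)


def may_be_indexed(first_published: date) -> bool:
    """Return True if code first published on this date could be in the index."""
    return first_published <= INDEX_CUTOFF


def interpret_scan(detected: bool, first_published: date) -> str:
    """Combine a scan result with the publication date of the code being checked."""
    if detected:
        return "match"
    if not may_be_indexed(first_published):
        # The index cannot contain this code, so "not detected" proves nothing.
        return "inconclusive"
    return "no match"
```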

By detecting and preventing the display of protected material, organizations can ensure compliance with intellectual property laws, maintain content originality, and protect their reputations.

This guide provides details about the kinds of content that the protected material API detects.

User scenarios

Content generation platforms for creative writing

  • Scenario: A content generation platform that uses generative AI for creative writing (for example, blog posts, stories, marketing copy) integrates the Protected Material for Text feature to prevent the generation of content that closely matches known copyrighted material.
  • User: Platform administrators and content creators.
  • Action: The platform uses Azure AI Content Safety to scan AI-generated content before it's provided to users. If the generated text matches protected material, the content is flagged and either blocked or revised.
  • Outcome: The platform avoids potential copyright infringements and ensures that all generated content is original and compliant with intellectual property laws.
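
The flag-then-block-or-revise flow in this scenario could be sketched as follows. This is a minimal illustration, not the platform's actual implementation: `gate_output`, `GateResult`, and the injected `detect` and `revise` callables are hypothetical names, and `detect` stands in for whatever wraps the call to the Protected Material for Text feature.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class GateResult:
    released: bool  # True when the text may be shown to the user
    text: str       # the text to show, or "" when blocked
    reason: str     # "clean", "revised", or "blocked"


def gate_output(
    generated: str,
    detect: Callable[[str], bool],
    revise: Optional[Callable[[str], str]] = None,
) -> GateResult:
    """Release generated text only if it does not match protected material."""
    if not detect(generated):
        return GateResult(True, generated, "clean")
    if revise is not None:
        revised = revise(generated)
        # Rescan the revision so a flagged rewrite is never released.
        if not detect(revised):
            return GateResult(True, revised, "revised")
    return GateResult(False, "", "blocked")
```

Passing the detector in as a callable keeps the gate independent of any particular client library and makes it easy to exercise with a stub in tests.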

Automated social media content creation

  • Scenario: A digital marketing agency uses generative AI to automate social media content creation. The agency integrates the Protected Material for Text feature to avoid publishing AI-generated content that includes copyrighted text, such as song lyrics or excerpts from books.
  • User: Digital marketers and social media managers.
  • Action: The agency employs Azure AI Content Safety to check all AI-generated social media content for matches against a database of protected material. Content that matches is flagged for revision or blocked from posting.
  • Outcome: The agency maintains compliance with copyright laws and avoids reputation risks associated with posting unauthorized content.

AI-assisted news writing

  • Scenario: A news outlet uses generative AI to assist journalists in drafting articles and reports. To ensure the content does not unintentionally replicate protected news articles or other copyrighted material, the outlet uses the Protected Material for Text feature.
  • User: Journalists, editors, and compliance officers.
  • Action: The news outlet integrates Azure AI Content Safety into its content creation workflow. AI-generated drafts are automatically scanned for protected content before submission for editorial review.
  • Outcome: The news outlet prevents accidental copyright violations and maintains the integrity and originality of its reporting.

E-learning platforms using AI for content generation

  • Scenario: An e-learning platform employs generative AI to generate educational content, such as summaries, quizzes, and explanatory text. The platform uses the Protected Material for Text feature to ensure the generated content does not include protected material from textbooks, articles, or academic papers.
  • User: Educational content creators and compliance officers.
  • Action: The platform integrates the feature to scan AI-generated educational materials. If any content matches known protected academic material, it's flagged for revision or automatically removed.
  • Outcome: The platform maintains educational content quality and complies with copyright laws, avoiding the use of protected material in AI-generated learning resources.

AI-powered recipe generators

  • Scenario: A food and recipe website uses generative AI to generate new recipes based on user preferences. To avoid generating content that matches protected recipes from famous cookbooks or websites, the website integrates the Protected Material for Text feature.
  • User: Content managers and platform administrators.
  • Action: The website uses Azure AI Content Safety to check AI-generated recipes against a database of known protected content. If a generated recipe matches a protected one, it's flagged and revised or blocked.
  • Outcome: The website ensures that all AI-generated recipes are original, reducing the risk of copyright infringement.

Protected material text examples

Refer to this table for details of the major categories of protected material text detection. All four categories are applied when you call the API.

Category Scope Considered acceptable Considered harmful
Recipes
Scope: Copyrighted content related to Recipes. Other harmful or sensitive text is out of scope for this task, unless it intersects with Recipes IP copyright harm.
Considered acceptable:
  • Links to web pages that contain information about recipes
  • Any content from recipes that have no or low IP/Copyright protections:
    • Lists of ingredients
    • Basic instructions for combining and cooking ingredients
  • Rejection or refusal to provide copyrighted content:
    • Changing a topic to avoid sharing copyrighted content
    • Refusal to share copyrighted content
    • Providing nonresponsive information
Considered harmful:
  • Other literary content in a recipe:
    • Matching anecdotes, stories, or personal commentary about the recipe (40 characters or more)
    • Creative names for the recipe that are not limited to the well-known name of the dish, or a plain descriptive summary of the dish indicating what the primary ingredient is (40 characters or more)
    • Creative descriptions of the ingredients or steps for combining or cooking ingredients, including descriptions that contain more information than needed to create the dish, rely on imprecise wording, or contain profanity (40 characters or more)
  • Methods to access copyrighted content:
    • Ways to bypass paywalls to access recipes
Web Content
Scope: All websites that have webmd.com as their URL domain name. Only focuses on issues of copyrighted content around Selected Web Content. Other harmful or sensitive text is out of scope for this task, unless it intersects Selected Web Content harm.
Considered acceptable:
  • Links to web pages
  • Short excerpts or snippets of Selected Web Content as long as:
    • They are relevant to the user's query
    • They are fewer than 200 characters
Considered harmful:
  • Substantial content of Selected Web Content:
    • Response sections longer than 200 characters that bear substantial similarity to a block of text from the Selected Web Content
    • Excerpts from Selected Web Content that are longer than 200 characters
    • Quotes from Selected Web Content that are longer than 200 characters
  • Methods to access copyrighted content:
    • Ways to bypass paywalls or DRM protections to access copyrighted Selected Web Content
News
Scope: Only focuses on issues of copyrighted content around News. Other harmful or sensitive text is out of scope for this task, unless it intersects News IP Copyright harm.
Considered acceptable:
  • Links to web pages that host news or information about news, magazines, or blog articles as long as:
    • They have legitimate permissions
    • They have licensed news coverage
    • They are authorized platforms
  • Links to authorized web pages that contain embedded audio/video players as long as:
    • They have legitimate permissions
    • They have licensed news coverage
    • They are authorized streaming platforms
    • They are official YouTube channels
  • Short excerpts/snippets like headlines or captions from news articles as long as:
    • They are relevant to the user's query
    • They are not a substantial part of the article
    • They are not the entire article
  • Summary of news articles as long as:
    • It is relevant to the user's query
    • It is brief and factual
    • It does not copy/paraphrase a substantial part of the article
    • It is clearly and visibly cited as a summary
  • Analysis/Critique/Review of news articles as long as:
    • It is relevant to the user's query
    • It is brief and factual
    • It does not copy/paraphrase a substantial part of the article
    • It is clearly and visibly cited as an analysis/critique/review
  • Any news content that has no IP/Copyright protections:
    • News/Magazines/Blogs that are in the public domain
    • News/Magazines/Blogs for which Copyright protection has elapsed, been surrendered, or never existed
  • Rejection or refusal to provide copyrighted content:
    • Changing topic to avoid sharing copyrighted content
    • Refusal to share copyrighted content
    • Providing nonresponsive information
Considered harmful:
  • Links to PDF or any other file containing full text of news/magazine/blog articles, unless:
    • They are sourced from authorized platforms with legitimate permissions and licenses
  • News content:
    • More than 200 characters taken verbatim from any news article
    • More than 200 characters substantially similar to a block of text from any news article
    • Direct access to news/magazine/blog articles that are behind paywalls
  • Methods to access copyrighted content:
    • Steps to download news from an unauthorized website
    • Ways to bypass paywalls or DRM protections to access copyrighted news or videos
Lyrics
Scope: Only focuses on issues of copyrighted content around Songs. Other harmful or sensitive text is out of scope for this task, unless it intersects Songs IP Copyright harm.
Considered acceptable:
  • Links to web pages that contain information about songs such as:
    • Lyrics of the songs
    • Chords or tabs of the associated music
    • Analysis or reviews of the song/music
  • Links to authorized web pages that contain embedded audio/video players as long as:
    • They have legitimate permissions
    • They have licensed music
    • They are authorized streaming platforms
    • They are official YouTube channels
  • Short excerpts or snippets from lyrics of the songs as long as:
    • They are relevant to the user's query
    • They are not a substantial part of the lyrics
    • They are not the entire lyrics
    • They are not more than 11 words long
  • Short excerpts or snippets from chords/tabs of the songs as long as:
    • They are relevant to the user's query
    • They are not a substantial part of the chords/tabs
    • They are not the entire chords/tabs
  • Any content from songs that have no IP/Copyright protections:
    • Songs/Lyrics/Chords/Tabs that are in the public domain
    • Songs/Lyrics/Chords/Tabs for which Copyright protection has elapsed, been surrendered, or never existed
  • Rejection or refusal to provide copyrighted content:
    • Changing topic to avoid sharing copyrighted content
    • Refusal to share copyrighted content
    • Providing nonresponsive information
Considered harmful:
  • Lyrics of a song:
    • Entire lyrics
    • Substantial part of the lyrics
    • Part of lyrics that contain more than 11 words
  • Chords or Tabs of a song:
    • Entire chords/tabs
    • Substantial part of the chords/tabs
  • Links to webpages that contain embedded audio/video players that:
    • Do not have legitimate permissions
    • Do not have licensed music
    • Are not authorized streaming platforms
    • Are not official YouTube channels
  • Methods to access copyrighted content:
    • Steps to download songs from an unauthorized website
    • Ways to bypass paywalls or DRM protections to access copyrighted songs or videos
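
The size thresholds quoted in the table (40 characters for recipe literary content, 200 characters for Selected Web Content and news, 11 words for lyrics) can be summarized in a small helper. This sketch only encodes those numbers; it does not reproduce how the service matches text, and it assumes the span passed in has already been identified as matching protected material. The name `exceeds_limit` is hypothetical. The table does not say how a Web Content excerpt of exactly 200 characters is treated, so the sketch arbitrarily treats it as within the limit.

```python
# Thresholds quoted in the table above; the matching logic itself is not specified there.
LYRICS_WORD_LIMIT = 11      # harmful when more than 11 words
RECIPE_CHAR_MINIMUM = 40    # harmful when 40 characters or more
EXCERPT_CHAR_LIMIT = 200    # web content and news: harmful when more than 200 characters


def exceeds_limit(category: str, matched_span: str) -> bool:
    """Return True if a span that matches protected text is over its category's size limit."""
    name = category.strip().lower()
    if name == "lyrics":
        return len(matched_span.split()) > LYRICS_WORD_LIMIT
    if name == "recipes":
        return len(matched_span) >= RECIPE_CHAR_MINIMUM
    if name in ("web content", "news"):
        # Assumption: exactly 200 characters counts as within the limit.
        return len(matched_span) > EXCERPT_CHAR_LIMIT
    raise ValueError(f"unknown category: {category!r}")
```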

Next steps

Follow the quickstart to get started using Azure AI Content Safety to detect protected material.